  1. Nov 2023
    1. eLife assessment

      This fundamental work by Cao et al. advances our understanding of the role of senescent osteoclasts (SnOCs) in the pathogenesis of spine instability. The authors provide compelling evidence that SnOCs induce sensory nerve innervation. Subsequently, reduction of SnOCs by the senolytic drug Navitoclax markedly reduces spinal pain sensitivity. This work will be of broad interest to regenerative biologists working on spinal pain.

    2. Reviewer #1 (Public Review):

      Summary:
      In this study, Pan DY et al. discovered that the clearance of senescent osteoclasts can lead to a reduction in sensory nerve innervation. This reduction is achieved through the attenuation of Netrin-1 and NGF levels, as well as the regulation of H-type vessels, resulting in a decrease in pain-related behavior. The experiments are well-designed. The results are clearly presented, and the legends are also clear and informative. Their findings represent a potential treatment for spine pain utilizing senolytic drugs.

      Strengths:
      Rigorous data, well-designed experiments as well as significant innovation make this manuscript stand out.

      Weaknesses:
      Quantification of histology and detailed statistical analysis will further strengthen this manuscript.

    3. Reviewer #2 (Public Review):

      Summary:
      This manuscript examined the underlying mechanisms between senescent osteoclasts (SnOCs) and lumbar spine instability (LSI) or aging. They first showed that greater numbers of SnOCs are observed in mouse models of LSI or aging, and these SnOCs are associated with induced sensory nerve innervation, as well as the growth of H-type vessels, in the porous endplate. Then, the deletion of senescent cells by administration of the senolytic drug Navitoclax (ABT263) results in significantly less spinal hypersensitivity, spinal degeneration, porosity of the endplate, sensory nerve innervation, and H-type vessel growth in the endplate. Finally, they also found that there is greater SnOC-mediated secretion of Netrin-1 and NGF, two well-established sensory nerve growth factors, compared to non-senescent OCs. The study is well conducted and data strongly support the idea. However, some minor issues need to be addressed.

    4. Reviewer #3 (Public Review):

      Summary:
      This research article reports that a greater number of senescent osteoclasts (SnOCs), which produce Netrin-1 and NGF, are responsible for innervation in the LSI and aging animal models.

      Strengths:
      The research is based on previous findings in the authors' lab and the fact that the IVD structure was restored by treatment with ABT263. The logic is clear and clarifies the pathological role of SnOCs, suggesting the potential utilization of senolytic drugs for the treatment of LBP. Generally, the study is of good quality and the data is convincing.

      Weaknesses:
      There are some points that can be improved:
      1. Since this work primarily focuses on ABT263, it resembles a pharmacological study for this drug. It is preferable to provide references for the ABT263 concentration and explain how the administration was determined.
      2. It would strengthen the study to include at least 6 mice per group for each experiment and analysis, which would provide a more robust foundation.
      3. In Figure 4, either use "adult" or "young" consistently, but not both. Additionally, it's important to define "sham," "young," and "adult" explicitly in the methods section.
      4. Assess the protein expression of Netrin-1 and NGF.

    1. eLife assessment

      The findings of this article provide valuable information on the changes of cell clusters induced by chronic periodontitis. The observation of a new fibroblast subpopulation, named AG fibroblasts, is interesting, and the strength of evidence presented is solid.

    2. Reviewer #1 (Public Review):

      In this article, the authors found a distinct fibroblast subpopulation named AG fibroblasts, which are capable of regulating myeloid cells, T cells and ILCs, and proposed that AG fibroblasts function as a previously unrecognized surveillant to orchestrate chronic gingival inflammation in periodontitis. Generally speaking, this article is innovative and interesting.

    3. Reviewer #2 (Public Review):

      This study proposed the AG fibroblast-neutrophil-ILC3 axis as a mechanism contributing to pathological inflammation in periodontitis. Single-cell transcriptomic analysis was performed, but the signaling mechanism behind this axis was not evaluated.

      The authors achieved their aims, and the results partially support their conclusions.

      The mouse ligature-induced periodontitis model differs from clinical periodontitis in humans, but this study supplies a basis for future research in humans.

    1. Author Response

      Reviewer #3 (Public Review):

      The authors sought to directly compare the predictions of two models of somatosensory processing: The attenuation model, which states that the sensation of touch on one hand is reduced when it is the predictable result of an active movement by the other hand; and the enhancement model, which states that the sensation of touch is actually increased, as long as the active hand does not receive touch simultaneously with the passive hand (no double stimulation). The authors achieved their aims, with results clearly demonstrating (1) attenuation in the case of self touch, (2) that previously-observed enhancement is a consequence of the comparison condition (false enhancement), and (3) that attenuation involves predictive mechanisms and does not result simply from double stimulation. These findings, and the methodology, should particularly impact future studies of perceptual attenuation, sensory prediction error, and motor control more generally. That opposite conclusions can be obtained by selecting different comparison conditions is particularly striking.

      Experiment 1 affirms that a touch to the passive finger caused by the active finger tapping a force sensor is perceived as weaker (attenuated) compared to a baseline not involving the active finger, but that if double stimulation is prevented (active finger moves, but no contact), neither attenuation nor enhancement occurs. Experiment 2 includes the three original conditions, plus the no-go condition used as a comparison in these earlier studies. Results suggest that the comparisons used by previous studies would result in the false appearance of enhancement. Finally, Experiment 3 tests the hypothesis that the lack of attenuation in the no-contact condition is due to the absence of double stimulation rather than predictive mechanisms. Contact and no-contact trials were mixed in an 80:20 ratio, such that participants would form predictions about the consequence of their active finger movement even if some trials lacked contact. In this case, attenuation was observed for both contact and no-contact trials, supporting the idea that attenuation is related to predictive processes linked to moving the active finger, and is not a simple consequence of double stimulation.

      The methodology and analysis plans for all three experiments were pre-registered prior to data collection. We can therefore be very confident that the results were not influenced by hypotheses developed only after seeing the data. The three experiments were each performed in a new set of participants. Experiments 2 and 3 included conditions that replicated the Experiment 1 effects, allowing us to be very confident that the results are robust.

      While the study has significant strengths, some aspects of the interpretation need to be clarified. In particular, the authors' interpretation depends on the idea that attenuation is absent in the no-contact condition because this action-sensory consequence relationship is an "arbitrary mapping." It is not clear what makes it arbitrary. The self-touch contact condition could also be considered somewhat arbitrary and different from real self-touch; the 2N test force was triggered by the right finger tapping a force sensor. If participants' tapping forces were recorded, it would be useful to include this information, particularly about how variable participants' taps were. In other words, unlike real self-touch, in this paradigm the force of the active finger tap did not affect the force delivered to the passive finger.

      By ‘arbitrary’, we refer to nonecological mappings between a movement and a somatosensory stimulus. In other words, a mapping that does not resemble how one touches their body (natural self-touch). Examples of such arbitrary mappings are moving the right finger in the air and receiving simultaneous touch on the other hand, as in Thomas et al. (2022), or moving a joystick or potentiometer with one hand and receiving a touch on the other hand. These joystick or potentiometer conditions are typically used as a control condition when studying somatosensory attenuation because they include an arbitrary sensorimotor mapping (Shergill et al., 2005, 2003; Teufel et al., 2010; Wolpe et al., 2016).

      We understand the reviewer’s point about the relationship between the forces applied with the right hand and the forces received on the left hand. First, we would like to clarify that we recorded the forces that the participants applied to the sensor in every experiment. We have now added a figure (Figure 3 – figure supplement 3) showing the forces over time across all participants in every experiment, which is referred to in the Methods on Lines 727-730. As we wrote in the Methods (Lines 720-727), and in line with previous studies (Asimakidou et al., 2022; Kilteni et al., 2021; Kilteni and Ehrsson, 2022), we asked participants to tap, neither too weakly nor too strongly, with their right index finger, “as if tapping the screen of their smartphone”. We did so because participants do not have an intuitive sense of how strong a force of 2 N is, and this instruction allowed them to apply forces of similar magnitude from trial to trial while receiving the same touch on their left index finger. Indeed, as shown in Figure 3 – figure supplement 3 (D-F), participants showed low trial-to-trial variability in the applied forces, with an average variability (s.e.m.) of only ± 0.13 N in Experiment 1, ± 0.12 N in Experiment 2 and ± 0.11 N in Experiment 3. In other words, they generated similar forces with their right index finger across all trials while receiving the same force on their left index finger, establishing an approximately constant gain between movement and touch and a perceived causality between the two (Bays and Wolpert, 2008; Kilteni, 2023). Critically, Bays and Wolpert (Experiment 1 in that book chapter) previously showed that the magnitude of attenuation remains unaffected when halving or doubling the gain between the force applied by the active finger and the force delivered on the passive hand as long as the gain remains constant throughout the experiment (Bays and Wolpert, 2008). 
      This should not be surprising given that when one finger transmits a force through an object to another finger, the resulting force also depends on the object's properties (e.g., shape, material and contact area) and the angle at which the finger contacts the object. This is outlined in Lines 733-736 of the manuscript.

      One additional potential weakness is that participants' vision was occluded in Experiment 3, but not in Experiments 1 and 2. The authors do not discuss whether this difference could confound any of the analyses that compare results across experiments.

      We thank the reviewer for the comment. We do not think that blindfolding is a weakness of our study, as we designed our experiment to take this factor into account. Specifically, we blindfolded participants to ensure that they would not know when the force sensor was retracted on (unexpected) no-contact trials. This was essential for establishing an expectation that they would contact the force sensor. Importantly, participants were blindfolded in all conditions of Experiment 3 (contact, no-contact and baseline), so any effect of blindfolding was present across all conditions of Experiment 3. Since in the analyses of Experiment 3 (Lines 342-354), we always compared between conditions, blindfolding per se could not explain any differences between conditions, as any putative effects of blindfolding are effectively removed when contrasting two conditions in which participants were blindfolded. Notably, this argument also applies to the comparisons that we made between Experiment 3 and Experiments 1 and 2, since all these analyses (Lines 362-376) compare the difference between contact and no-contact trials (e.g., PSE values) between the experiments. Once again, any putative effects from blindfolding were effectively removed. We should also emphasize that the participants’ left index finger as well as the motor that delivered the force to their left index finger were occluded from view in Experiments 1 and 2. This was done to prevent participants from using any visual cues to discriminate between the two forces. This has been included in the Methods section (Lines 772-775).

      In conclusion, blindfolding cannot explain the results of Experiment 3, and it did not alter the interpretation of any of our results derived by comparing the experiments. We have clarified this point in the manuscript (Lines 823-827).

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript the authors perform a detailed analysis of the impact of food type on reproduction in C. elegans. They find that, in comparison with the standard OP50 strain of E. coli that is ubiquitously used to maintain C. elegans in the laboratory setting, the CS180 strain results in a reduction in the number of progeny that may be a consequence of an early transition from spermatogenesis to oogenesis that reduces total sperm number. They also find that the rate of oocyte fertilization is increased in animals fed CS180 vs. OP50. Using mutants and laser ablations, the authors show that, whereas the insulin-like peptide INS-6 acts in the ASJ sensory neurons to mediate the food type effect on total progeny and early oogenesis, the increased fertilization rate phenotype does not require ASJ or insulin-like signaling and instead requires the AWA olfactory neurons.

      The major strengths of the manuscript are the establishment of INS-6 as a link between food type and reproduction and the detail and rigor with which the experiments were executed. The results presented generally support the authors' model. This role of insulin-like signaling in connecting food type and reproduction makes it a plausible target for evolutionary forces that may have shaped insulin-like signaling in invertebrates. As such, this work contributes broadly to our understanding of how insulin signaling may have evolved prior to the emergence of vertebrates.

      We thank the Reviewer for these nice comments.

      A weakness of the work is the epistasis analysis of insulin-like pathway components, which is incomplete and at times difficult to interpret.

      We conducted an epistasis analysis between ins-6 and daf-16 with regard to early oogenesis onset on the CS180 diet. Through recombination of lin-41::GFP with the daf-16 deletion mutation on chromosome I, we showed that daf-16 mutants exhibit early oogenesis at mid L4 on CS180 (Figure 5C and F), which is unlike the ins-6 deletion (null) mutants or the reduction-of-function mutations in daf-2. Both ins-6 and daf-2 mutants exhibit delayed oogenesis on CS180 (Figure 5B, D, and F). Interestingly, the delayed oogenesis phenotype of ins-6 null mutants was not rescued by loss of daf-16, suggesting that wild-type ins-6 promotes early oogenesis independent of daf-16 (Figure 5F). This is reminiscent of the Arur lab’s findings, where daf-2 promotes germline meiotic progression independent of daf-16 in response to food availability (Lopez et al., Dev Cell 2013, vol 27, pp 227-240).

      Reviewer #2 (Public Review):

      The manuscript by Mishra et al. examines how different bacterial foods modulate the nervous system to influence reproductive phenotypes, specifically onset of oogenesis, fertilization rate, and progeny production. Defining how animal reproduction could be modulated by bacterial food cues through neuroendocrine signaling is a fascinating subject of study for which C. elegans is well-suited. However, the overall scope of the current study is limited, and some of the central data do not provide compelling evidence for the authors' underlying hypothesis and model.

      1) Two strains of E. coli are examined, the standard C. elegans bacterial food strain OP50 and an E. coli strain that Alcedo and colleagues have previously characterized to influence aging and longevity through nervous system modulation. While the authors determine that differences in LPS structure between the strains do not account for the food-dependent effects, there is little further insight regarding the bacterial features that contribute to the observed differences in reproductive physiology. Moreover, at least two of the phenotypes examined (total progeny and fertilization rate) are known to be affected by bacterial food quality and may be affected by bacteria in many ways, so the description of these phenotypes is somewhat less compelling than the study of the onset of oogenesis.

      Our study focused on how specific sensory neurons mediate the effects of different bacterial diets on three different aspects of C. elegans reproductive physiology—total progeny, oogenesis onset and fertilization rates. We examined the effects of three different bacteria, E. coli OP50, CS180 and CS2429, on these three phenotypes and the effects of two Serratia marcescens strains, Db11 and Db1140, on oogenesis onset. Of these five bacteria, only CS180 and its derivative CS2429, promote early C. elegans oogenesis.

      In the revised manuscript, we included the effects of a fourth E. coli strain, the K-12 strain HT115, on total progeny (Figure 2—supplement 1), oogenesis onset (Figure 2E) and fertilization rates (Figure 2F). We found that HT115 does not elicit the same response as CS180 on oogenesis onset and fertilization rates. Thus, the oogenic-inducing and fertilization-enhancing cue(s) appear to be specific to CS180 and its derivative CS2429. We started characterizing the potential nature of these CS180-derived cue(s). So far, we found that these cues are unlikely to be free, small metabolites, since they were lost upon filtration of the CS180-conditioned LB media through a nylon membrane that has a pore size of 0.45 µm (Figure 2G and H). While we agree with the Reviewer that the identification of these cues is important, we believe that it is beyond the scope of this manuscript.

      More importantly, we showed that the sensory neuron ASJ does modulate the timing of oogenesis and that this involves the insulin-like peptide ins-6 (please see our responses to the Essential Revisions section and Figures 5 and 6). We also showed that ASJ (Figure 7G and K) or ins-6 (Figure 8D) does not affect the food type-dependent fertilization rates, which are modulated by a different sensory neuron, the olfactory neuron AWA (Figure 7J and K). AWA in turn has no effect on the timing of oogenesis (Figure 7L). Thus, this manuscript links specific sensory neurons and insulin-like peptides to distinct aspects of oocyte biology, which we believe is a significant advance in the field of reproductive biology.

      2) The onset of oogenesis phenotype, using the lin-41::GFP reporter, seems more specific and tractable, and the authors nicely decouple this phenotype from the total progeny and fertilization rate phenotypes through experiments that shift animals to different bacterial food at specific developmental stages.

      We thank the Reviewer for this comment.

      However, as it stands, the data regarding the role of ins-6 and ASJ in modulating this phenotype, and the model that exposure to CS180 bacterial food causes a change in the ASJ expression of ins-6, which is sufficient to promote the earlier onset of oogenesis at the mid-L4 stage, seem somewhat incomplete and have some inconsistencies to be addressed.

      a) The ins-6 mutant phenotype is rescued by genomic ins-6 and partially rescued by ins-6 expressed under an ASJ-specific promoter. The lack of rescue from an ASI promoter is puzzling given the secreted nature of ins-6.

      We address this in Essential Revisions, point 3. Briefly, we disagree that this is puzzling, since several labs have already shown that there are functional differences between the INS-6 produced from ASI versus the INS-6 produced from ASJ, using different experimental approaches (Chen et al., 2013; Tang et al., 2023; and this work). Indeed, the cell-specific activity of a secreted signal is not limited to INS-6, but has also been described for other secreted peptides, such as INS-1 (Kodama et al., 2006; Tomioka et al., 2006; Takeishi et al., eLife 2020, vol 9, e61167). Thus, the interesting question is why functional differences exist between the INS-6 peptides from the two neurons. This is a fascinating question, but beyond the scope of this manuscript.

      b) The ins-6 mutant phenotype with regard to delaying the early expression of lin-41::GFP on CS180 appears weaker than the daf-2 mutant phenotype. This is difficult to reconcile with what is known about the relative strength of the daf-2 mutant alleles relative to ins-6 for a wide range of phenotypes.

      There is evidence in the literature that the ins-6 mutant phenotype will not look exactly like that of daf-2 (Chen et al., 2013; Cornils et al., Development 2011, vol 138, pp 1183-93; Fernandes de Abreu et al., PLoS Genet 2014, vol 10, e1004225). The DAF-2 insulin-like receptor is predicted to bind multiple insulin-like peptides (Pierce et al., Genes Dev 2001, vol 15, pp 672-686), some of which can act antagonistically to DAF-2 function (Pierce et al., 2001; Cornils et al., 2011; Chen et al., 2013; Fernandes de Abreu et al., 2014). Thus, the oogenic effects of the reduction-of-function mutations in daf-2 are likely the sum of multiple insulin-like peptides, some of which might also delay oogenesis. This could explain why the manipulation of an individual insulin-like peptide, INS-6, which could bind DAF-2 to promote oogenesis, does not closely resemble the phenotype of daf-2 mutants.

      c) The daf-16 loss-of-function phenotype and suppression of daf-2 and ins-6 mutant phenotypes are not shown for the lin-41::GFP expression phenotype.

      We address this in the Public Review comments of Reviewer 1. Briefly, we focused on the epistasis analysis between ins-6 and daf-16 and showed that ins-6 promotes early oogenesis independent of daf-16.

      d) The modest difference in ins-6p::mCherry expression in the ASJ neurons (Figure 5D) makes the idea that this difference causes onset of oogenesis somewhat implausible.

      We disagree that this change is modest and that the oogenic effect of such a change is implausible.

      First, the change in ins-6p::mCherry expression in ASJ on CS180 is comparable to other physiologically-important expression changes that have been reported for other genes (for example, Entchev et al., eLife 2015, vol 4, e06259, for the tryptophan hydroxylase tph-1 and the TGF-β daf-7; and Tataridas-Pallas et al., PLoS Genet 2021, vol 17, e1009358, for the neuronally expressed NRF transcription factor skn-1b). Second, it is worth noting that we were using a single-copy reporter for ins-6 expression, where detected changes will be smaller but should be closer to physiological responses. It is possible that multiple-copy reporters will give larger changes, but that would be further from a physiological response. Third, the change in ins-6p::mCherry expression is comparable in scale to the ins-6 mutant phenotype. Our results showed that the 35% increase in ASJ expression of ins-6 is due to food type (Figure 6A; mean fluorescence on OP50 = 1526 ± 94; mean fluorescence on CS180 = 2056 ± 104). This change in magnitude is similar to the loss of lin-41::GFP expression in mid L4 of ins-6 mutants versus controls. About 30% to 43% of control worms express lin-41::GFP, whereas 0% of ins-6 mutants express the same reporter at mid L4 on CS180 (Figure 5 and its associated supplement).
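
      The 35% figure can be checked directly from the two mean fluorescence values quoted above. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative and do not come from the manuscript, and the error terms are not propagated.

```python
# Mean ins-6p::mCherry fluorescence in ASJ, as quoted in the response (Figure 6A).
mean_op50 = 1526.0   # animals fed OP50
mean_cs180 = 2056.0  # animals fed CS180


def percent_change(baseline: float, treated: float) -> float:
    """Relative change of `treated` versus `baseline`, in percent."""
    return (treated - baseline) / baseline * 100.0


increase = percent_change(mean_op50, mean_cs180)
print(f"{increase:.1f}% higher on CS180 than on OP50")
```

      The result is roughly 34.7%, consistent with the rounded value of 35% given in the response.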

      e) The strain carrying a genetic ablation of ASJ appears to have a markedly different baseline of kinetics of lin-41::GFP expression (even at lethargus, less than half of the animals appear to express lin-41::GFP). Given this phenotype, it seems difficult to draw conclusions about bacterial food-dependent effects on expression of lin-41::GFP. Additional characterization corroborating timing of oogenesis independent of the lin-41::GFP marker may be helpful, but something seems amiss.

      We address this in Essential Revisions, point 4. Briefly, we disagree that the kinetics of lin-41::GFP expression in ASJ-ablated animals is puzzling, compared to the kinetics observed in insulin signaling mutants. Besides ins-6, ASJ expresses multiple signals (Taylor et al., 2021), some of which might also regulate the multiple functions of oogenic lin-41::GFP. Thus, it should not be surprising that loss of ASJ will have a markedly different effect on oogenesis than the loss of ins-6.

      Reviewer #3 (Public Review):

      I very much enjoyed reading this paper by Shashwat Mishra and team from Joy Alcedo's and from Queelim Ch'ng's laboratories dissecting how sensory signals regulate reproduction in worms. The mechanisms by which sensory inputs affect the function of the germline, the balance between growth and differentiation within this tissue, are of broad interest not only to those interested in reproduction and differentiation, but also to those interested in the mechanisms of plasticity that enable organisms to adjust to changing environmental conditions. These mechanisms are only now beginning to be characterized. Here the focus is on the role of insulin signals expressed in sensory neurons. This work builds on previous findings by the Alcedo lab that sensory perception of bacterial-type dependent signals regulates C. elegans lifespan. Here their focus is on the effects on reproduction, and on the communication of that information by insulin-like signals.

      We thank the Reviewer for these nice comments.

      Worms have a huge family of 40 insulin-like genes, which the Alcedo and Ch'ng labs have been studying for many years. The paper starts with the interesting premise that the brood size of the worms is food type dependent. The authors show that this is due to effects on the timing of the onset of oogenesis during larval development (which constrains the size of the pool of sperm available for subsequent oocyte fertilization) as well as on effects on the rate of oocyte fertilization during adulthood. Using clever timing for food switching, they show that the effects on oogenesis onset and on fertilization rate are separable. In addition, these effects did not appear to be merely the outcome of indirect effects of food ingestion, but were, instead, at least in part, due to the perception of environmental information by specific sensory neurons. Using mutants affecting transduction of sensory information in specific neurons and genetic ablation of specific neurons, the authors show that the onset of oogenesis and the rate of reproduction were controlled by different sensory neurons, ASJ and AWA, respectively. One of these neurons, ASJ, transmitted environmental information via the ins-6 neuropeptide.

      Altogether, the paper advances our understanding of how environmental determinants influence reproduction.

      We thank the Reviewer for these nice comments.

    1. Author Response

      Reviewer #1 (Public Review):

      However, the authors are cautioned to tone down some of the sentences with the human diabetic samples as they rely heavily on extrapolation rather than experimental tests.

      Thank you for this feedback. We have added an experimental test to support the CellChat results. We found that, in accordance with the CellChat analysis, more macrophage Gas6 expression is observed in diabetic wounds via IF. These data are now included in Figures 3C-D. We have additionally edited the text relating to Figure 3 to indicate that these results are not fully conclusive.

      For instance, the antibody inhibition of Axl had minimal effect on the clearance of apoptotic cells in the wound and this would be expected with the redundancy endowed by other TAM receptors.

      Thank you for this point. We have made a note of this in the text in lines 289-291.

      For instance, in Figure 6, the number of TUNEL+ cells seems to be higher in the IgG samples compared to the anti-Timd4 treatment, but this is not the case in the quantification.

      Thank you for this comment. We have replaced these with more representative images, which appear in Figure 6A. We also repeated the staining with antibodies for cleaved caspase 3, which appear in Fig. 6 – Fig. supplement 1A, which showed similar results.

      Reviewer #2 (Public Review):

      I suggest repeating the quantification of cells containing active caspase-3 with an anti-cleaved caspase-3 antibody. Here the authors use an antibody recognizing phospho-S150, which is far from generally accepted to be a marker for active caspase-3. It would also be good to quantify the apoptotic cells observed in the sections (Fig 1 I and J) and compare to control treatment on sections. It is not clear from the data presented whether the number of apoptotic cells increases or not in the time frame analyzed since the controls are lacking.

      Thank you for this important suggestion. We have repeated the IF staining using an antibody for cleaved caspase 3 (Cell Signaling 9661S) and quantified the apoptotic cells present. We found that apoptotic cells were rare but present at both 24h and 48h after injury, and that significantly more cleaved caspase 3+ cells were present in 48h wounds than 24h wounds. These data are now included in Figure 1H-J and Fig. 1 – Fig. supplement 1F. We have also used this antibody in IF staining in Fig. 5 – Fig. supplement 1B and Fig. 6 – Fig. supplement 1A.

      In a FACS analysis (Fig S1 H), the authors show that there is no increase in dead cells in a time frame of 48 hrs. Could it be that the majority of the cells that may have died in vivo were lost during the tissue digestion procedure? Dead cells tend to aggregate.

      Based on these comments and the inconsistency in these data due to potential technical challenges, we have removed the FACS data quantifying Annexin V. We now include the quantification of cleaved caspase 3 and an efferocytosis assay to analyze the kinetics of efferocytosis.

      On line 104 the authors refer to the apoptosis-inducing activity of G0s2. Please, realize that there is little or no in vivo evidence for a role of G0s2 in apoptosis.

      Thank you for this helpful comment. We have removed this gene from our analysis and text.

      The authors state that Axl is uniquely expressed in DC and fibroblasts (Fig 2). Are the Axl-positive cells in panel G (red, Fig 2) that do not stain for the Pdgfra marker (green) then all DCs? Please clarify or show with a triple staining that these cells are indeed DCs.

      Thank you for this comment. To clarify, our intention was to show that both DCs and fibroblasts express Axl, not to say conclusively that only DCs and fibroblasts express Axl. Indeed, in Figure 5, we show that a portion of macrophages also express Axl (at day 3), so some of the Axl+ cells in 2G may be macrophages rather than DCs. We have made this clearer in the text in lines 163-166.

      In addition, it is not clear to me to what reference level exactly the expression levels are compared in Fig 2A. Is this between the 24 and 48h time points after wounding (as mentioned in the legend)? If so, the analysis may indicate up or down regulation but not necessarily expression or no expression.

      Thank you for making this point. The heatmaps display scaled log-normalized mRNA counts for the entire dataset, not a comparison between the two timepoints. We have clarified this in the figure legends.
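
      The distinction drawn above, between dataset-wide scaling and a comparison between two timepoints, can be made concrete with a minimal sketch. It assumes a common convention (library-size normalization to 10,000 counts per cell, a log1p transform, then a per-gene z-score across all cells); the response does not state the exact procedure, so the scale factor, the function names, and the toy counts are all illustrative assumptions.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Library-size normalize each cell, then apply log(1 + x).

    counts: list of cells, each a list of raw counts per gene.
    scale_factor is an assumed, commonly used default.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([math.log1p(c / total * scale_factor) for c in cell])
    return normalized

def scale_genes(matrix):
    """Z-score each gene (column) across ALL cells in the dataset."""
    n_cells, n_genes = len(matrix), len(matrix[0])
    scaled = [[0.0] * n_genes for _ in range(n_cells)]
    for g in range(n_genes):
        column = [row[g] for row in matrix]
        mean = sum(column) / n_cells
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n_cells - 1))
        for i in range(n_cells):
            scaled[i][g] = (column[i] - mean) / sd if sd > 0 else 0.0
    return scaled

# Hypothetical toy data: 4 cells x 3 genes.
raw_counts = [[10, 0, 5], [20, 2, 1], [0, 8, 4], [3, 3, 9]]
scaled = scale_genes(log_normalize(raw_counts))
```

      Under these assumptions, each heatmap value is a gene's expression relative to its mean over the whole dataset, so a high value means "above the dataset average for this gene" rather than "upregulated at 48h versus 24h".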

      2) Human diabetic wounds display increased and altered efferocytosis signaling via Axl. This conclusion is solely based on CellChat analysis and should be tuned down or validated.

      Thank you for this suggestion. We have experimentally validated this conclusion using IF staining for Gas6. We found more Gas6 staining in CD68+ macrophages in diabetic foot ulcers than in nondiabetic foot wounds. These data are now included in Figure 3C-D.

      The authors conclude that anti-Axl treatment leads to healing defects based on lack of granulation tissue and larger scabs, a reduction of fibroblast repopulation and revascularization. The differences in the last two parameters mentioned above are obvious, however the other parameters, as granulation tissue and scabs are less clear to me. Is this quantified in any way? In Fig S4 D there is also a large scab visible in the control treatment image. Therefore, it would be good if these parameters could be better substantiated.

      Thank you for this comment. We have edited the text in lines 301-304 to de-emphasize these qualitative changes.

      In view of the lack of revascularization, are there differences in the mRNA expression levels of angiogenic factors such as VEGF and others at this time point? Does revascularization occur at later stages?

      Thank you for this helpful suggestion. We have used qPCR to measure Vegfa mRNA expression, and these data are now included in Figure 5I. We found no significant difference in Vegfa expression 5 days after injury.

      Based on the FACS analysis the authors claim that there are no differences at the level of DCs. However, the plots shown in Fig 5C do not convincingly show the detection of DC (as boxed in the lower panel). Based on the density plots one would presume this is just the continuation of the CD11b+ population and not a separate CD11c+ population. To get a better view of that, it would be better to show dot plots instead of density plots.

      Thank you for this insightful comment. We have created new plots as suggested to show that this is not exactly the case. In the wound bed, unlike in blood isolates, full separation of populations is often elusive, so we use single-stain controls to set the gates. In Author response image 1 we provide the same data as dot plots, as requested, alongside the single-stain control, to show that the CD11c+ population is not simply a continuation of the CD11b+ population and that the gating strategy is adequate. We acknowledge that in dissociated tissues the population outlines are sometimes not as clean as those obtained from immunological samples.

      Author response image 1.

      Finally, the authors state (line 265-266) that anti-Axl treatment leads to non-significantly increased expression of IL1alpha and IL6 after one day of injury (Fig S4C). If the difference between the control-treated and the anti-Axl-treated group is statistically not significant I would not conclude there is an increase. Please adapt phrasing or include more mice in the experiment (now only 4) to substantiate the observation and clarify whether it is increased or not.

      Thank you for this comment. We have altered the text in lines 286-289 to better reflect this.

      The authors conclude that overall healing was not affected but that the wound beds appeared more fragile. What is meant with 'appeared more fragile' is not clear. In addition, this seems to me a quite subjective interpretation. What are the objective parameters to come this conclusion?

      Thank you for this point. We have altered the text to remove this subjective language.

      Similar to inhibition of Axl, inhibition of Timd4 led to a defect in revascularization as witnessed by the absence of CD31 staining. Also in this experiment one can raise similar questions as in the anti-Axl experiment: 1) does revascularization occur at a later timepoint; 2) what about the expression of angiogenic factors?

      Thank you for this helpful suggestion. To further investigate the impact of Axl inhibition on angiogenesis, we have assayed for Vegfa by qPCR. We found no significant difference in Vegfa expression 5 days after injury. These data are now included in Figure 5I.

      In the anti-Timd4 treated wounds the authors observe more TUNEL-positive cells and conclude that this is due to a defect in efferocytosis. However, the formal experimental proof for this in the current model is lacking. How do the authors exclude the possibility that anti-Timd4 treatment attracts more infiltrating cells that then undergo treatment, or that the treatment with anti-Timd4 leads to more apoptosis of certain cells in the wound bed. What is the nature of these apoptotic cells (neutrophils, T cells, others)? It has been shown that Timd4 can have stimulatory effects on other cells, such as T cells. Could deprivation of Timd4 signaling in certain conditions lead to more dying cells in this model?

      Thank you for this insightful comment. To investigate this, we have repeated this experiment with IF staining for cleaved caspase 3 and found similar results, indicating an increase in apoptotic cells upon Timd4 inhibition (Fig. 6 – Fig. supplement 1A). We have also added text in lines 326-327 acknowledging the possibility of an increase in apoptosis.

      Reviewer #3 (Public Review):

      They never do show that there is an increase in apoptotic cells in the wounds, which then go down (which would be a sign that the cells are being cleared via efferocytosis). In addition, they are looking for apoptotic cells at very early time points (24-48 hours), times at which large numbers of apoptotic cells would not be expected. As an example, neutrophil infiltration peaks at 24-48 hours and efferocytosis of apoptotic neutrophils would be expected after that. Other types of apoptotic cells would likely be cleared even later. Finally, several of the panels showing apoptotic cells were done with a very small number of samples (1-3 per group) in some cases so it is unclear how rigorous the data are. I would recommend that the authors at the very least soften the wording related to these conclusions and discuss the limitations of their experimental design; ideally data from more samples would be included to provide clear support for those statements.

      Thank you for raising this important point. In order to support these claims, we have undertaken two additional experiments. Firstly, we have repeated the immunofluorescence staining with a new antibody for activated caspase 3 and quantified the number of apoptotic cells present in 24h and 48h wound beds. We found that apoptotic cells significantly increased in 48h wound beds compared to 24h wounds (Figures 1H and Fig. 1 – Fig. supplement 1F).

      We have also undertaken a new experiment to show the temporal regulation of efferocytosis. We injected stained apoptotic neutrophils into 1D, 3D, and 5D wound beds and quantified the stained cells remaining after 1 hour in order to quantify the clearance of cells from the wound bed at different timepoints. We found that significantly more stained cells undergoing efferocytosis remained in 5D wounds, and that the rate of efferocytosis was approximately constant over this timeline. These data are now included in Figures 2H-M.

      While we would be interested to determine the identities of cells engaging in efferocytosis of the labeled apoptotic neutrophils, we found that co-staining for additional cell markers was impossible while maintaining the fluorescent labeling on the injected neutrophils.

      2) The human RNA-seq data is also quite limited, as non-diabetic wound tissue was all from one patient. Again, this limitation should be acknowledged.

      Thank you for this feedback. We have analyzed new data sets that include 5 individuals with diabetic foot ulcers and 4 individuals with non-diabetic wounds. These data are now included in Figure 3.

      Also, there are some important published papers by Sashwati Roy's group indicating that there are defects in efferocytosis in diabetic wounds, which may go against what the authors are showing here to some degree. The authors' work should be discussed in relation to these other studies.

      Thank you for this suggestion. We have added discussion of this work to the text in lines 192-193.

      3) For anti-Axl and anti-Timd4 experiments, the authors conclude that inhibition of Axl does not affect TUNEL+ cells and that Timd4 does not affect reepithelialization. However, in some cases the sample size was only 3 mice per group when measuring these parameters. That is a very small number of samples to draw conclusions about apoptotic cells or reepithelialization since these parameters are key for the overall conclusions of the experiments. Given that these are key data, it would be important to include more than n=3. Additionally, as stated above, a time point later than 24 h may be necessary to actually see changes in apoptotic cells.

      Thank you for this suggestion. We have repeated the staining for apoptotic cells using a new antibody for cleaved caspase 3 and stained wound beds from additional mice. In the anti-Axl experiments, we now show data for cleaved caspase 3 staining of 3- and 5-day wound beds with N=4 (Fig. 5 – Fig. supplement 1B). In the anti-Timd4 experiments, we now have N=6-11 for the TUNEL staining at 5 days after injection and injury (Figure 6B).

      4) In Fig 6, there look to be many more TUNEL+ cells in the wound bed of IgG control samples compared to anti-Timd4-treated samples, which contradicts the graph. Perhaps the authors could clarify where they were taking their measurements for panels with image analysis results.

      Thank you for this helpful point. We have updated this figure to be more representative of the quantification (Figure 6A-B), as well as repeated the staining with antibodies for cleaved caspase 3 (Fig. 6 – Fig. supplement 1A).

      Another question related to this experiment is how it is possible that efferocytosis is so drastically different yet there are no changes in wound healing (this is one reason why a larger sample size for reepithelialization may be critical) - this would seem to suggest that efferocytosis is not important in wound healing, which is confusing. Further discussion on this might be useful.

      Thank you for this point. Indeed, we see that there is a defect in revascularization when Timd4 is inhibited (Figure 6E-F), which indicates that efferocytosis is important to normal healing. This is discussed in lines 333-335.

    1. Author Response

      Reviewer #3 (Public Review):

      Comment 1: I'm having some difficulty understanding the logic of Figure 5 in determining cis processing. It is an inverse of figure 4, and in my view, provides further evidence of trans processing. A better experiment would be to use WT-citrine tagged protein with catalytic dead mcherry and image them together. This would show WT cis processing occurs faster than trans processing as citrine specks should appear earlier than the mCherry ones. Can also do colocalization and FRET-based assays with the pair.

      We thank the reviewer for pointing this out. While our data demonstrate that the same molecule must be catalytically active and competent for processing at the IDL (Figure 5), we agree that the data do not rule out trans-processing as a mechanism for speck formation. We have therefore modified the interpretation of these findings accordingly (pp. 7-8). We agree that some of the quantitative assays the reviewer has suggested would strengthen this logic, and we are making efforts to carry out a kinetic FRET-based assay for our upcoming biochemistry-focused manuscript to better characterize the enzymatic affinity of Casp11 for cis- vs. trans- based autoprocessing, and how either impacts Casp11 speck assembly.

      Comment 2: Do those casp11 specks still contain CARDs?- i.e. is the second cleavage necessary for speck formation? Is CARD necessary at all? Would adding the TEV site at CDL and b/w p20 and p10 rescue? i.e. trans-activate?

      We are grateful to the reviewer for these insightful questions, which we also had considered. We addressed this question in two ways – first by replacing the CARD with a DmrB dimerizable domain that undergoes inducible dimerization of Casp11 in the presence of the dimerizing drug AP20187. Critically, inducible dimerization of DmrB-ΔCARD-Casp11-mCherry significantly enhances Casp11-mCherry speck formation, and this speck formation requires catalytic activity, even in the presence of dimerizer (Figure 6A-C). Moreover, we generated CARD-less Casp11-mCherry constructs containing wild-type p20-p10 and catalytically inactive p20-p10. Intriguingly, the CARD was dispensable for spontaneous Casp11-mCherry speck formation, which again was dependent on catalytic activity (Figure 6-figure supplement 2A-B). While we do not currently have data with a TEV-cleavable CDL construct, our data here demonstrate that the CARD is dispensable for speck formation in an overexpression system, implying that the p20/p10 contains all the information that is necessary and sufficient to mediate spontaneous assembly of Casp11 specks in HEK293T cells. Nonetheless, as forced dimerization enhances speck formation (Figure, we hypothesize that CARD-LPS interactions act to facilitate catalytic activity and push cooperative assembly of the Casp11 speck.

      To address whether both the N-terminal CARD and C-terminal p10 domains are present in Casp11 specks, we performed a dual-fluorophore co-localization assay in which we transiently expressed C-terminal mCherry-tagged Casp11 constructs (Casp11-mCherry) in HEK293T cells that stably express N-terminal Flag-tagged Casp11 (2xFLAG-Casp11). As expected, Casp11-mCherry formed specks spontaneously in this setting (Figure 3-figure supplement 1). Critically, both the N-terminal FLAG and C-terminal mCherry were found together in these specks, indicating the presence of both Casp11 N- and C- termini within the specks. Moreover, the wild-type Casp11-mCherry also recruited catalytically inactive 2xFLAG-Casp11C254A, again supporting the finding that wild-type Casp11 can recruit a catalytic mutant to noncanonical inflammasome complexes.

      Comment 3: What are the equations that fit experimental data points and R2 for? E.g. Figure 1E. What are the parameters being fitted/compared and how are those interpreted? A table of fitted values and proper interpretation should be provided.

      We thank the reviewer for this request to clarify how the curves were fit to the experimental data points. We have modified our ‘Statistical Analysis’ section and all figure legends that contain dose-response curves to reflect the equations used to fit each curve. Additionally, please find a table of raw values in the corresponding source data provided for each dose-response curve (Figure 2 Source Data 5; Figure 4 Source Data 3, 6; Figure 5 Source Data 3, 4; Figure 7 Source Data 2; and Figure 4-figure supplement 1 Source Data 1).
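
      As a minimal sketch of what "fitted parameters and R2" mean for a dose-response curve, the snippet below evaluates a four-parameter logistic (Hill-type) curve and computes the coefficient of determination against data. The equation form, the parameter names (`bottom`, `top`, `ec50`, `hill`), and all numbers are assumptions for illustration; the response does not reproduce the equations actually used, and the parameters here are hand-picked rather than fitted.

```python
def four_param_logistic(dose, bottom, top, ec50, hill):
    """Assumed curve form: response rises from `bottom` to `top`;
    `ec50` is the dose giving the half-maximal response and `hill`
    controls the steepness."""
    return bottom + (top - bottom) / (1.0 + (ec50 / dose) ** hill)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up data, purely illustrative (not from the manuscript).
doses = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
responses = [2.0, 8.0, 27.0, 52.0, 70.0, 77.0]

# Hand-picked parameters; a real analysis would estimate them by
# nonlinear least squares.
params = dict(bottom=0.0, top=80.0, ec50=2.0, hill=1.0)
predicted = [four_param_logistic(d, **params) for d in doses]
r2 = r_squared(responses, predicted)
```

      Read this way, a table of fitted values would list one `ec50`, `hill`, `top`, and `bottom` per curve, with R2 summarizing how much of the variance in the measured responses the curve accounts for.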

    1. Author Response

      Reviewer #1 (Public Review):

      This paper examines different signaling networks and attempts to give general results for when the network will exhibit biphasic behavior, which is the situation when the output of the network is a non-monotonic function of its inputs. The strength of the paper is in the approach it takes. It starts with the simplest network motifs that produce biphasic behavior and then asks what happens when these motifs are parts of larger networks. Their approach is in contrast to the usual way in which this question is tackled, which tends to be within the confines of a specific signaling network, where general results like the ones that the authors are after, might be hard to spot.
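
      The definition of a biphasic response given above (output is a non-monotonic function of input) can be illustrated with a toy example. The two response functions below are generic textbook-style forms chosen for illustration, not any of the models from the manuscript; the constants `k` and `ki` and the dose grid are arbitrary assumptions.

```python
def is_non_monotonic(values, tolerance=1e-9):
    """True if the sequence both rises and falls somewhere,
    i.e. it is neither non-decreasing nor non-increasing."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    rises = any(d > tolerance for d in diffs)
    falls = any(d < -tolerance for d in diffs)
    return rises and falls

def saturating(x, k=1.0):
    """Monotonic reference: output only increases with dose."""
    return x / (k + x)

def with_competing_effect(x, k=1.0, ki=10.0):
    """Toy response with a built-in competing term (x**2 / ki):
    activating at low dose, suppressive at high dose."""
    return x / (k + x + x * x / ki)

# Logarithmically spaced doses from 0.01 to 1000.
doses = [0.01 * 10 ** (i / 4) for i in range(21)]

monotonic_curve = [saturating(x) for x in doses]
biphasic_curve = [with_competing_effect(x) for x in doses]
```

      In this sketch only the curve with the competing term rises and then falls; the saturating curve is monotonic for any positive `k`.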

      We thank the reviewer for the careful reading of the manuscript and for the comments and appreciate the fact the reviewer regards the approach as the strength of the paper.

      The weakness of the paper, in my opinion, is the rather formal description of the results which I am afraid will be of rather limited utility to experimental groups seeking to make use of them. The paper attempts to provide general rules for when to expect biphasic behavior and it was hard to assess to what extent such rules exist as behaviors can change depending on the context of a larger network in which the smaller biphasic one is embedded. The other thing that made assessing the generality of the results difficult is that the input-output functions shown in all the figures are computed for a specific choice of parameters and I was left wondering how different choices of parameters might change the reported behaviors. The lack of specific proposals for how their results should guide future experiments on different signaling networks is another weakness.

      We address these points in a number of ways. Initially our presentation was intended to highlight unambiguously which systems (especially the substrate modification building blocks) were capable of biphasic response and which were not, and to highlight the dependence on intrinsic kinetic parameters. Based on both referees' comments, we have made a number of changes:

      (a) We highlight the rationale for choosing the suite of biochemical substrate modification systems: enzyme/substrate sharing is a key driver for the origins of biphasic responses and the suite of systems we employ allows us to systematically explore this (see Response to Essential Revisions). These are building blocks of many pathways.

      (b) Biphasic responses emerge from a built in competing effect. In every instance of substrate modification systems, we now highlight the mechanistic underpinning which gives rise to the competing effect responsible for the biphasic response. This will help experimentalists and modellers alike obtain insights into how such behaviour may arise, and the associated ingredients which facilitate that (which may be relevant in other systems). Similarly, we highlight how altered behaviour at the network level may arise from a biphasic interaction pattern, providing the intuition therein and guide further experimental investigation (also see Response to Essential Revisions).

      (c) With regard to parameters (also see Response to Essential Comments), we first emphasize that we completely characterize, at the substrate modification level, whether biphasic responses are possible as a function of intrinsic kinetic constants. This is done for every system studied. In Fig 2 we depict this, along with sample biphasic dose responses. The essential point, however, is that the dependence on intrinsic kinetic parameters is completely characterized. We indicate in which cases biphasic responses are impossible irrespective of intrinsic kinetic parameters, where they can be obtained for every value of the intrinsic kinetic parameters, and where there are partial restrictions in the intrinsic kinetic parameter space for obtaining this. In the revision we have performed further parametric analysis to assess the impact of species total amounts, providing further insights. We have also shown that in all these systems biphasic responses can be obtained in ranges of kinetic parameters similar to those found experimentally (e.g., Wistel et al 2018) and for species total amounts that are reasonable in systems and synthetic biology. This is analyzed and depicted in Figure 2-figure supplement 3 and Figure 2-figure supplement 4.

      (d) Also, in response to another comment (about behaviour changing in networks): we first emphasize that we start at the substrate modification level to uncover drivers of biphasic responses at this level. Biphasic responses arise from an inbuilt competing effect, and we demonstrate different ways in which such an effect arises, through sharing of enzymes or substrates. While it is true that the behaviour can change as part of a network, two points remain. First, these in-built competing effects (involving both substrates and enzymes) can still generate biphasic responses, and this can manifest at a pathway or network level under suitable conditions. Second, the fact that behaviour at a network level may be altered is exactly why we consider studies at the network level, showing both biphasic patterns in interaction (where the overall behaviour is determined by the motif and the biphasic pattern of interaction) and studies involving the interaction of biphasic responses at both the network and substrate modification levels (subsection: The network level).

      (e) We have also expanded on a paragraph on testable predictions in the conclusions (p10).

      Taken together, we believe that these results should interest both experimentalists and modellers and have intrinsic value as well.

      While I appreciate that the authors adopted a style of presenting their results such that all the mathematics is buried in the figures, I found that it made reading the paper quite difficult, and contributed to my confusion about which results are general and insensitive to parameter choices and which are not. I believe a narrative that integrated the math with some simple intuition might have been more effective. For example, when the authors say in the text that model M0 is incapable of displaying biphasic response, how general is that result? Later on, when discussing model M2, they provide a criterion for biphasic response in terms of products of rate constants satisfying an inequality, but the meaning of this condition is not described. Such things make it hard to learn from the authors' work.

      This has indeed been incorporated, and we agree that presenting the intuition and mechanistic underpinning for the behaviour aids readability. In addition to the points about parameters, which are now explained at length in the paper, there are a number of paragraphs providing the mechanistic underpinning and intuition for why the behaviour is obtained. Both of these are discussed at length in Response to Essential Revisions. Thus, both the mechanistic intuition and the role of parameters are addressed in detail in the revision.

      When M0 is mentioned to be incapable of yielding biphasic responses we mean just that: irrespective of any parameter choice in the model. The meaning of the criterion in Model M2 is now discussed. We take the point about not being able to learn from the work seriously and have made various changes both on the intuition and clarifying the impact of parameters.

      The text is sprinkled with statements like "this reveals the plurality of information processing behaviors..." where the meaning is quite opaque (for this example, there is no description of "information processing" and what it might mean in this context) and therefore it makes it hard to understand what are the lessons learned from these calculations. Another example is found in the description of Erk regulation where the authors speak of "significant robustness" but what is meant by "significant" is also unclear.

      Yes, we agree that these phrases are distracting and not adding much and so we have removed them.

      Overall, I think this is an interesting attempt to provide a general mathematical framework for analyzing biphasic response of signaling networks, but the authors fall short for the reasons described above. I think a lot can be fixed by improving the way the results are presented.

      We have indeed taken these comments on board and aimed to improve the presentation.

      Reviewer #2 (Public Review):

      Biphasic responses are widely observed in biological systems and the determination of general design principles underlying biphasic responses is an important problem. The authors attempt to study this problem using a range of biochemical signaling models ranging from simple enzymatic modification and de-modification of a single substrate to systems with multiple enzymes and substrates. The authors used analytical and computational calculations to determine conditions such as network topology, range of concentrations, and rate parameters that could give rise to biphasic responses. I think the approach and the result of their investigation are interesting and can be potentially useful. However, the conditions for biphasic responses are described in terms of parameter ranges or relationships in particular biochemical models, and these parameters have not been connected to the values of concentrations or rates in real biological systems. This makes it difficult to evaluate how these findings would be applicable in nature or in experiments. It might also help if some general mechanisms in terms of competition/cooperation of time scales/processes are gleaned which potentially can be used to analyze biphasic responses in real biological systems.

      We thank the reviewer for a careful reading of the manuscript and for the various comments and are happy to see the reviewer find the approach interesting. We address these comments in more detail below.

      Reading these comments, we recognized how various analysis and algebraic equations could appear opaque to a reader both in terms of what it conveys and its import. To address this, we made a number of changes.

      1. First and foremost, we provide the mechanistic underpinning and intuition for why a competing effect emerges in the first place. We do this for every substrate modification system we analyze and make further comments in the subsection focussing on the network level as well as ERK. This intuition should help a reader see where the result is coming from and then judge whether it might apply in a quite different system. This is discussed in detail in Response to Essential Revisions.

      2. Secondly, we have discussed many aspects of the parameters in more detail. Our goal, especially in substrate modification systems, was to completely characterize the role of intrinsic kinetic parameters: whether biphasic responses were impossible irrespective of parameters, whether they were possible for every value of intrinsic kinetic parameters, or whether they were possible in a subset of kinetic parameter space. This has been done for every substrate modification system, and has been discussed more explicitly in the revision. Furthermore, when biphasic responses were possible, we aimed to determine the impact of the species total amounts which facilitated the response. Here we performed additional analytical and semi-analytical work. Additionally, with the semi-analytical work and parameters chosen in ranges very similar to those found experimentally (e.g., Wistel et al 2018), we are able to show that biphasic responses can indeed be obtained in experimentally feasible ranges. Further aspects of the parameters are discussed in detail in the Response to Essential Revisions. In particular, a number of new paragraphs (p2-3, p6) and plots (Figure 2-figure supplement 3 and Figure 2-figure supplement 4) specifically deal with this.

      Taken together, these address the reviewer's points.

    1. Author Response

      Reviewer #1 (Public Review):

      This interesting manuscript sets out to develop for the mouse a series of important concepts and models that this group has previously developed for models of monkey brains, where they showed that in a large-scale model, anterior → posterior spatial gradients such as spine density (and thus inferred strength of local coupling) lead to a transition from transient stimulus responses to persistent responses, capable of supporting working memory (WM). No such spine density gradient is found in the mouse. Here, the authors propose and use modeling to explore the idea, that the corresponding gradient may be that of density of inhibitory PV cells in different regions of the brain.

      The goal of the study - a large-scale, anatomically-constrained model of WM - is an extremely valuable one, and the authors' efforts in this direction should be supported. That said, some of the main claims in the manuscript were not, at least as currently written, clearly supported by the data, a number of important clarifications need to be made, and some claims of novelty are made in a way that, for a typical reader, may obscure the actual contribution being made.

      The biggest issue is that one of the main claims, that together with cell-type specific long-range targeting, "density of cell classes define working memory representations" (abstract), is not terribly clear. For example, Figs. 2D and 2E show that a brain region's hierarchical location tightly predicts its persistent firing rate (2D), but that PV cell fraction has a far weaker correlation (2E). Is hierarchical location sufficient? If PV cell fraction were constant across model brain regions, would we still get persistent activity modes? It seems likely that the answer may be "yes", but the answer, easily within reach of the authors, is surprisingly not in the current version of the manuscript. Figure 3D, for the thalamocortical model, shows no significant correlation of firing rate with PV density.

      Given the claim about PV density (in the abstract and the first main point of the discussion), this is a big concern. Yet it seems easily addressable: e.g. if indeed the authors found that hierarchy was sufficient and PV density immaterial, the model would be no less interesting. And if the authors demonstrated clearly that a PV density gradient is required, that would make the claim a solid one. If, within the model, such a causal demonstration is present, this reader at least missed it.

      MAJOR CONCERNS:

      (1) The model appears to be a model of a single side of the brain. Perhaps each brain region in the model could be considered an amalgam of that region across both sides of the brain. Yet given results like Li et al. Nature 2016, who show that persistent activity is robust to inhibition of one side, but not both sides of ALM, at the very least discussion of the issue is warranted.

      The model is indeed a one-hemisphere model, and an expansion to a bihemispheric model is considered for future work. We have added the following sentence in the Discussion section:

      “Future versions of the large-scale model may consider different interneuron types to understand their contributions to activity patterns in the cortex (Kim et al., 2017; Meng et al., 2023; Tremblay and Rudy, 2016; Nigro et al., 2022), the role of interhemispheric projections in providing robustness for short-term memory encoding (Ni et al., 2016), and the inclusions of populations with tuning to various stimulus features and/or task parameters that would allow for switching across tasks (Yang et al., 2018).”

      (2) The authors make an interesting attempt to distinguish core WM regions from other regions such as "readout" regions, defined as showing persistent activity yet not having an effect on persistent activity elsewhere in the network.

      However, this definition seemed problematic: for example, consider a network that consists of 20 brain regions, all interconnected to each other, and all equivalent to each other, capable of displaying persistent activity thanks to mutual connectivity. Imagine that inhibition of any one of these regions is not sufficient to significantly perturb persistent activity in the other 19. Then they would all be labeled as "readout". Yet, by construction in this thought experiment, they are all equivalent to each other and are all core areas. Such redundancy may well be present in the brain. How would the authors address this redundancy issue?

      We acknowledge the importance of this thought experiment. Although we initially restricted the definition of core area to how a single area contributes to working memory, we proceeded with concurrent inhibition of multiple readout areas (see Essential Revisions response 6 above).
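
      The thought experiment and the concurrent-inhibition control can be illustrated with a toy rate model. This is a minimal sketch only, not the large-scale model discussed in this exchange: the sigmoid transfer function, the parameters `J`, `theta` and `s`, and the function `simulate` are all assumptions chosen for illustration.

```python
import numpy as np

def simulate(n_regions=20, silenced=(), J=6.0, theta=3.0, s=0.5,
             dt=0.1, n_steps=2000):
    """Toy network of identical, all-to-all coupled regions (hypothetical).

    Every region receives the mean activity of the whole network, scaled by J.
    Regions listed in `silenced` are clamped to zero (simulated inhibition).
    Returns the final mean rate of the regions that were not silenced.
    """
    mask = np.ones(n_regions)
    mask[np.asarray(list(silenced), dtype=int)] = 0.0
    r = mask.copy()                   # start in the persistent (high) state
    for _ in range(n_steps):
        drive = J * r.mean()          # identical recurrent input to every region
        target = 1.0 / (1.0 + np.exp(-(drive - theta) / s))
        r = r + dt * (-r + target)    # forward Euler step
        r = r * mask                  # keep silenced regions at zero
    return r[mask > 0].mean()

# Inhibit one region at a time, then ten regions together.
single = [simulate(silenced=[i]) for i in range(20)]
joint = simulate(silenced=range(10))
print("lowest rate after any single inhibition:", min(single))
print("rate after inhibiting ten regions jointly:", joint)
```

      In this toy setting no single inhibition abolishes persistent activity, so every region would be labeled "readout" under a single-area criterion, whereas joint inhibition collapses the memory state. This is the redundancy the reviewer describes.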

      (3) Also important to discuss would be the fact that every brain region in this model is set up as composed of two populations, and when long-range interactions are strong and the attractors strongly coupled, the entire brain is set up as a 1-bit working memory. How would results and the approach be impacted by considering WM for more flexible situations?

      We have used a model of two populations as the simplest way to integrate large-scale connectivity and inhibitory gradients. Indeed, future work should consider more realistic connectivity and populations with various degrees of tuning to different task parameters (see Reviewer 1 response 1 above).

      (4) Another concern that is important yet easily addressed is the authors' use of the term "novel cell-type specific graph theory measures". Describing in the abstract and elsewhere the fact that what they mean is to take into account the sign of connections, not just their magnitude, would transmit to readers the essence of the contribution in a manner very simple to understand. Most readers would fail to grasp the essential point of the current labeling, which sounds potentially very vague and complex.

      We have reworded the abstract - see also Essential Revisions response 2 above.

      (5) Finally, the overall significance of the study, and advances over previous work, were not entirely clear. In the discussion, the authors identify three major findings: (1) WM function is shaped by the PV cell density gradient. But as above, further work is required to make it clear that this claim is supported by the model. (2) if local recurrent excitation is insufficient to generate persistent activity, then long-range recurrent excitation is needed to generate it. I had trouble understanding why a model was needed to reach this conclusion - it seems as if it is simply a question of straightforward logic. The discussion states that in this regard, the work here "offers specific predictions to be tested experimentally", but I had trouble identifying what these specific predictions are. (3) Taking into account sign, not only magnitude, of connections, is important. This last point once again seemed a matter of straightforward logic, making its novelty difficult to assess.

      We thank the reviewer; we have addressed these issues in Essential Revisions response 3 above.

      Reviewer #2 (Public Review):

      This paper uses the mouse mesoscale connectome, combined with data on the number and fraction of PV-type interneurons, to build a large-scale model of working memory activity in response to inputs from various sensory modalities. The key claims of the paper are two-fold. First, previous work has shown that there does not appear to be an increase in the number of excitatory inputs (spines) per pyramidal neuron along the cortical hierarchy (and this increase was previously suggested to underlie working memory activity occurring preferentially in higher areas along the cortical hierarchy). Thus, the claim is that a key alternative mechanism in the mouse is the heterogeneity in the fraction of PV interneurons. Second, the authors claim to develop novel cell type-specific graph theory.

      I liked seeing the authors put all of the mouse connectomic information into a model to see how it behaved and expect that this will be useful to the community at large as a starting point for other researchers wishing to use and build upon such large-scale models. However, I have significant concerns about both primary scientific claims. With regard to the PV fraction, this does not look like a particularly robust result. First, it's a fairly weak result to start, much smaller than the simple effect of the location of an area along the cortical hierarchy (compare Figs. 2D, 2E; 3C, 3D). Second, the result seems to be heavily dependent upon having subdivided the somatosensory cortex into many separate points and focusing the main figures of the paper (and the only ones showing rates as a function of PV cell fraction) solely on simulations in which the sensory input is provided to the visual cortex. With regards to the claim of novel cell type-specific graph theory, there doesn't appear to be anything particularly novel. The authors simply make sure to assign negative rather than positive weights to inhibitory connections in their graph-theoretic analyses.

      Major issues:

      1) Weakness of result on effect of PV cell fraction. Comparing Figures 2D and 2E, or 3C and 3D, there is a very clear effect of cortical hierarchy on firing rate during the delay period in Figures 2D and 3C. However, in Figure 2E relating delay period firing rate to PV cell fraction, the result looks far weaker. (And similarly for Figs. 3C, 3D, with the latter result not even significant). Moreover, the PV cell fraction results are dominated by the zero firing rate brain regions (as opposed to being a nice graded set of rates, both for zeros and non-zeros, as with the cortical hierarchy results of Figure 2D), and these zeros are particularly contributed to by subdividing somatosensory (SS) into many subregions, thus contributing many points at the lower right of the graph.

      Further, it should be noted that Figure 2E is for visual inputs. In the supplementary Figure 2 - supplement 1, the authors do apply sensory inputs to auditory and somatosensory cortex...but then only show the result that the delay period firing rate increases along the cortical hierarchy (as in Figure 2D for the visual input), but strikingly omit the plots of firing rate versus PV cell fraction. This omission suggests that the result is even weaker for inputs to other sensory modalities, and thus difficult to justify as a defining principle.

      We have now made an effort to exhaustively compare the contributions of PV versus hierarchy in defining the firing rate activity patterns in the model - see Essential Revisions response 1 above. Moreover, we included plots of firing rate versus PV cell fraction for other sensory modalities, and the results would still support a common architecture for short-term memory maintenance.

      2) Graph theoretic analyses. The main comparison made is between graph-theoretic quantities when the quantities account for or do not account for, PV cells contributing negative connection strengths. This did not seem particularly novel.

      See Essential Revisions response 2 above

      3) It was not clear to me how much the cell-type specific loop strength results were a result of having inhibitory cell types, versus were a result of the assumption ('counter-stream inhibitory bias') that there is a different ratio of excitation to inhibition in top-down versus bottom-up connections. It seems like the main results were more a function of this assumed asymmetry in top-down vs. bottom-up than it was a function of just using cell-type per se. That is, if one ignored inhibitory neurons but put in the top-down vs. bottom-up asymmetry, would one get the same basic results? And, likewise, if one didn't assume asymmetry in the excitatory vs. inhibitory connectivity in top-down versus bottom-up connections, but kept the Pyramidal and PV cell fraction data, would the basic result go away?

      We have addressed the issue of cell-type specific loop strength in Essential Revisions response 2 above.

      4) In the Discussion, there is a third 'main finding' claimed: "when local recurrent excitation is not sufficient to sustain persistent activity...distributed working memory must emerge from long-range interactions between parcellated areas". Isn't this essentially true by definition?

      We have addressed this important issue in Essential Revisions response 3 above.

      5) I don't know if it's even "CIB" that's important or just "any asymmetry (excitatory or inhibitory) between top-down vs. bottom-up directions along the hierarchy". This is worth clarifying and thinking more about, as assigning this to inhibition may be over-attributing a more basic need for asymmetry to a particular mechanism.

      We found that this asymmetry is indeed crucial, which may be provided by CIB or, in some regimes, it is sufficient that a PV gradient is present - see Essential Revisions response 1 above.

      Other questions:

      1) Is it really true that less than 2% of neurons are PV neurons for some areas? Are there higher fractions of other inhibitory interneuron types for these areas, and does this provide a confound for interpreting model results that don't include these other types?

      Maybe related to the above, the authors write in the Results that local excitation in the model is proportional to PV interneuron density. However, in the methods, it looks like there are two terms: a constant inhibition term and a term proportional to density. Maybe this former term was used to account for other cell types. Also, is local excitation in the model likewise proportional to pyramidal neuron density (and, if not, why not?)?

      The reviewer is correct in pointing out that the ‘constant inhibition term’, which we interpret as a minimal inhibition, accounts for other cell types. We have added the respective explanation in the Methods section. Future versions of the model may include different interneuron types - see Reviewer 1 Response 1 above.

      2) Non-essential areas. The categorization of areas as 'non-essential' as opposed to, e.g. "inputs" is confusing. It seems like the main point is that, since the delay period activity as a whole is bistable, certain areas' contributions may be small enough that, alone, they can't flip the network between its bistable down and up states. However, this does not mean that such areas (such as the purple 'non-essential' area in Figure 5a) are 'non-essential' in the more common sense of the word. Rather, it seems that the purple area is just a 'weaker input' area, and it's confusing to thus label it as 'non-essential' (especially since I'd guess that, whether or not an area flips on/off the bistability may also depend on the assumed strength of the external input signal, i.e. if one made the labeled 'input area' a bit too weak to alone trigger the bistability, then the purple area might become 'essential' to cross the threshold for triggering a bistable-up state).

      This is an important point, and a similar point was also raised by Reviewer 1. For simplicity, we have restricted the definition of the function of an area (e.g., input vs. core vs. non-essential) to how a single area contributes to working memory. The existence of ‘subnetworks’ for any of these functions is indeed plausible, and potentially important, but we have left this for future modeling work (see Essential Revisions response 6 above). The point that distinguishes ‘input’ and ‘non-essential’ areas is simply whether inhibiting said area during the stimulus period affects stimulus-specific persistent activity. Surely some of the areas that we have classified as ‘non-essential’ have important roles, even for the contents of working memory; however, they are not essential to produce the activity pattern we observe here.

      3) Relation between 'core areas' and loop strength. The measure underlying 'prediction accuracy = 0.93' in Figure 6D and the associated results seems incomplete by being unidirectional. It captures the direction: 'given high cell-type specific loop strength, then core area' but it does not capture the other direction: 'given a cell is part of a core area, is its predicted cell-type specific loop strength strong?'. It would be good to report statistics for both directions of association between loop strength and core area.

      Indeed, the prediction accuracy refers to the direction loop strength->core area, for which we estimate how well a continuous variable (loop strength) predicts a binary variable (whether core area or not). A prediction in the reverse direction, namely of a continuous variable from a binary variable, is not well defined, so the reverse association may be only indirectly inferred from Figure 8D.
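
      One common way to score how well a continuous variable predicts a binary label is the accuracy of the best single-threshold rule. The sketch below illustrates that idea only: the function `best_threshold_accuracy` and the numbers are hypothetical, and this is not necessarily how the accuracy reported in the manuscript was computed.

```python
import numpy as np

def best_threshold_accuracy(score, is_core):
    """Accuracy of the best rule of the form `score > t` => core area."""
    score = np.asarray(score, dtype=float)
    is_core = np.asarray(is_core, dtype=bool)
    srt = np.sort(score)
    # Candidate thresholds: one below the minimum, then midpoints between
    # consecutive sorted scores.
    candidates = np.concatenate(([srt[0] - 1.0], (srt[:-1] + srt[1:]) / 2.0))
    accs = [float(np.mean((score > t) == is_core)) for t in candidates]
    best = int(np.argmax(accs))
    return float(candidates[best]), accs[best]

# Synthetic numbers for illustration only (not data from the manuscript).
loop_strength = [0.1, 0.2, 0.25, 0.4, 0.6, 0.7, 0.9, 1.1]
core_label = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, accuracy = best_threshold_accuracy(loop_strength, core_label)
print(threshold, accuracy)
```

      The rule maps a continuous score to a binary label, which is why the association is naturally reported in that direction.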

      4) More justification would be useful on the assumption that the reticular nucleus provides tonic inhibition across the entire thalamus.

      Relatively little is known about how specific this inhibition may be. We have included references in the Discussion section that speak to this fact (Crabtree, 2018; Hardinger et al., 2023).

      5) Is NMDA/AMPA ratio constant across areas and is this another difference between mice and monkeys? I am aware of early work in the mouse (Myme et al., J. Neurophys., 2003) suggesting no changes at least in comparing two brain regions' layer 2/3, but has more work been performed related to this?

      Recent anatomical in-vitro autoradiography work in the macaque shows that the NMDA/AMPA ratio (in terms of receptor density) varies across the cortical hierarchy (Klatzmann et al., 2022). Functionally, NMDA receptors seem important in PFC L2/3 for persistent activity, while in V1 they contribute relatively little to the stimulus response, which is dominated by AMPA-mediated excitation. This was shown by a recent physiological study in the macaque (Yang et al., 2018). This could indeed point to a species difference, although like-for-like comparisons of equivalent experiments across species are lacking in the literature. We have included this and other related references in our Discussion - see Essential Revisions response 4 above.

      6) Are bilateral connections between the left and right sides of a given area omitted and could those be important?

      These potentially important connections were omitted from the model for simplicity; please see Reviewer 1 Responses 1 and 3 above.

      Reviewer #3 (Public Review):

      Combining dynamical modelling and recent findings of mouse brain anatomy, Ding et al. developed a cell-type-specific connectome-based dynamical model of the mouse brain underlying working memory. The authors find that there is a gradient across the cortex in terms of whether mnemonic information can be sustained persistently or only transiently, and this gradient is negatively correlated to the local density of parvalbumin (PV) positive inhibitory cells but positively correlated with mesoscale-defined cortical hierarchy. In addition, weighing connectivity strength by PV density at target areas provides a more faithful relationship between input strength and delay firing rate. The authors also investigate a model where cortical persistent activity can only be sustained with thalamus input intact, although this result is rather separate from the rest of the study. The authors then use this model to test the causal contributions of different areas to working memory. Although some of the in silico perturbations are consistent with existing experimental data, others are rather surprising and need to be further discussed. Finally, the authors investigate patterns of attractor states as a result of different local and long-range connections and suggest that distinct attractor states could underlie different task demands.

      The importance of PV density as a predictor for working memory activity patterns in the mouse brain is in contrast to recent computational findings in the primate brain where the number of spines (excitatory synapses per pyramidal cell) is the key predictor. This finding reveals important species differences and provides complementary mechanisms that can shape distributed patterns of working memory representation across cortical regions. The method of biologically-based near-whole-brain dynamical modeling of a cognitive function is compelling, and the main conclusions are mostly well supported by evidence. However, some aspects of the method, result, and discussion need to be clarified and extended.

      1) Based on existing anatomical data, the authors reveal a negative correlation between cortical hierarchy (defined by mesoscale connectivity; this concept needs to be explicitly defined in the Results section, not just in the Methods section) and local PV density (Fig. 1). In the dynamical model, the authors find that working memory activity is positively (and strongly) correlated with cortical hierarchy and negatively (and less strongly) correlated with PV cell density (Fig. 2), and conclude that working memory activity depends on both. But could the negative correlation between activity and PV density simply result from the inherent relationship between hierarchy and PV density across regions? To strengthen this result, the authors should quantify the predictive power of local PV density on working memory activity beyond the predictive power of cortical hierarchy.

      We have systematically compared the relationship between PV and hierarchy in generating delay-patterns of activity - see Essential Revisions response 1 above.
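
      A standard way to ask whether PV density predicts delay activity beyond what hierarchy already explains is a partial correlation. The following is a minimal sketch on synthetic data, assuming linear relationships. The variable names, the number of areas, and the coefficients are hypothetical, and this is not necessarily the analysis reported in the revision.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

# Synthetic areas: PV density falls with hierarchy, and the delay rate depends
# on both variables (all numbers are made up for illustration).
rng = np.random.default_rng(0)
n_areas = 43
hierarchy = rng.uniform(0.0, 1.0, n_areas)
pv_density = 1.0 - 0.7 * hierarchy + 0.2 * rng.standard_normal(n_areas)
delay_rate = 5.0 * hierarchy - 2.0 * pv_density + 0.3 * rng.standard_normal(n_areas)

raw = float(np.corrcoef(pv_density, delay_rate)[0, 1])
partial = partial_corr(pv_density, delay_rate, hierarchy)
print("raw correlation:", raw)
print("partial correlation given hierarchy:", partial)
```

      A partial correlation near zero would indicate that the raw PV effect is inherited from hierarchy, while a clearly negative value would indicate an independent contribution.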

      2) In Fig. 4, the authors find that cell-type-specific graph measures more accurately predict delay-period firing rates. Specifically, the authors weigh connections with a cell-type-projection coefficient, which is smaller when the PV cell fraction is higher in the target area. Considering that local PV cell fraction is already correlated with delay activity patterns, weighing the input with the same feature will naturally result in a better input-output relationship. This result will be strengthened if there is a more independent measure of cell-type-projection coefficient, such as the spine density of PV vs excitatory cells across regions, or even the percentage of inhibitory versus excitatory cells targeted by upstream region (even just for an example set of brain regions).

      We have compared different measures of cell-type projection coefficients and how they predict delay-patterns of activity and whether an area is a core area - see Essential Revisions response 2 above.
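
      The effect of taking the sign of connections into account can be sketched in a few lines. The functional form of the coefficient below (`k0 - pv_fraction`) is a hypothetical stand-in that only follows the description in the review, namely a coefficient that is smaller when the PV cell fraction of the target area is higher. The manuscript's actual definition is not reproduced here.

```python
import numpy as np

def signed_weights(W, pv_fraction, k0=0.5):
    """Scale each connection by a coefficient that decreases with target PV fraction.

    W[i, j] is the non-negative anatomical strength from source j to target i.
    The coefficient is positive (net excitatory) for PV-poor targets and
    negative (net inhibitory) for PV-rich targets. Illustrative assumption.
    """
    coeff = k0 - np.asarray(pv_fraction, dtype=float)
    return coeff[:, None] * np.asarray(W, dtype=float)

def input_strength(W):
    """Total incoming weight of each area."""
    return np.asarray(W, dtype=float).sum(axis=1)

def loop_strength(W):
    """Sum over partners j of W[i, j] * W[j, i] (length-2 loops through area i)."""
    W = np.asarray(W, dtype=float)
    loops = W * W.T
    np.fill_diagonal(loops, 0.0)
    return loops.sum(axis=1)

# Three hypothetical areas with increasing PV fraction.
W = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
pv_fraction = [0.2, 0.5, 0.8]
W_signed = signed_weights(W, pv_fraction)

print("unsigned loop strength:", loop_strength(W))
print("signed loop strength:  ", loop_strength(W_signed))
```

      In this example the first area has the largest unsigned loop strength, yet its signed loop strength is negative because its strongest partner is PV-rich. Magnitude-only measures cannot capture this kind of reversal.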

      3) The authors aim to identify a core subnetwork that generates persistent activity across the cortex by characterising delay activity as well as the effects of perturbations during the stimulus and delay period. Consistent with existing data, the model identifies frontal areas and medial orbital areas as core areas. Surprisingly, areas such as the gustatory area are also part of the core areas. These more nuanced predictions from the model should be further discussed. Also surprisingly, the secondary motor cortex (MOs), which has been indicated as a core area for short-term memory and motor planning by many existing studies is classified as a readout area. The authors explain this potential discrepancy as a difference in task demand. The task used in this study is a visual delayed response task, and the task(s) used to support the role of MOs in short-term memory is usually a whisker-based delayed response task or an auditory delay response task. In all these tasks, activity in the delay period is likely a mixture of sensory memory, decision, and motor preparation signals. Therefore, task demand is unlikely the reason for this discrepancy. On the other hand, motor effectors (saccade, lick, reach, orient) could be a potential reason why some areas are recruited as part of the core working-memory network in one task and not in another task. The authors should further discuss both of these points.

      We have addressed this important point in Essential Revisions response 5 above.

      4) As a non-expert in the field, it is rather difficult to grasp the relationship between the results in Fig. 7 and the rest of the paper. Are all the attractor states related to working memory? If so, why are the core regions for different attractor states so different? And are the core regions identified in Fig. 5 based on arbitrary parameters that happen to identify certain areas as core (PL)? The authors should at least further clarify the method used and discuss these results in the context of previous results in this study.

      Attractor states that have a stable baseline are, by definition, related to working memory in that there is a baseline and a memory state associated with the model. Some areas, such as PL, are more likely to be associated with different core subnetworks given their position in the hierarchy. In the current version of the manuscript, we provide a motivation for the different attractor states and how they may relate to cognitive function.

    1. Author Response

      Reviewer #1

      While the article clearly outlines the strengths of the chosen approach, it lacks an equally clear exposition of its limitations and a more thorough comparison to established approaches. Two examples of limitations that should be stated more clearly, in my opinion: models need to be small enough to fit on a single machine (in contrast to e.g. NEURON and NEST which support distributed computation via MPI), and only single-compartment models are supported; both limitations are mentioned in passing in the discussion, but would merit a more upfront mention.

      We agree that our paper could be improved by more clearly stating the limitations of our approach and comparing it to established approaches. We have revised the paper and added two new subsections in the Discussion section to address these specific concerns:

      1. The Limitations subsection (L448 - L491) acknowledges restrictions of the BrainPy paradigm, which uses Python-based object-oriented programming. It highlights three main categories of limitations: (a) approach limitations, (b) functionality limitations, (c) parallelization limitations. These limitations identify areas where BrainPy may require further development to improve its functionality, performance, and compatibility with different modeling approaches.

      2. The Future Work subsection (L493 - L526) outlines development enhancements we aim to pursue in the near term. It emphasizes the need for further development in order to meet the demands of the field. Three key areas requiring attention are highlighted: (a) multi-compartment neuron models, (b) ultra-large-scale brain simulations, (c) bridging with acceleration computing systems.

      In addition to these changes, we have also made a number of other minor changes to the paper to improve its clarity and readability.

      The study does not verify the accuracy of the presented framework. While its basic approach (time-step-based simulation, standard numerical integration algorithms) is sufficiently similar to other software to not expect major discrepancies, an explicit comparison would remove any doubt. Quantitative measures of accuracy are particularly important in the context of benchmarks (see below), since simulations can be made arbitrarily fast by sacrificing accuracy.

      We agree that an explicit comparison would help alleviate any doubts and provide a more comprehensive understanding of our framework's accuracy. We have revised our manuscript to include a dedicated section, Appendix 11. In this section, we verified that all simulators generated consistent average firing rates for the given benchmark network models (figure 1 and figure 2 in Appendix 11). These verifications were performed under different network sizes (ranging from 4e3 to 4e5) and different computing platforms (CPU, GPU and TPU). We also qualitatively compared the overall network activity patterns produced by each simulator to ensure they exhibited the same dynamics (figure 3 and figure 4 in Appendix 11). While exact spike-to-spike reproducibility was not guaranteed between different simulator implementations, we confirmed that our simulator produced activity consistent with the reference simulators for both firing rates and network-level dynamics. Additionally, BrainPy did not sacrifice simulation accuracy for speed. Despite using single-precision floating point, BrainPy was able to produce consistent firing rates and raster diagrams across all simulations (see figure 3 and figure 4 in Appendix 11).
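
      The kind of check described here, comparing firing rates across numerical precisions, can be illustrated on a single toy neuron. This is a minimal sketch in plain NumPy, not BrainPy code and not one of the benchmark networks. The leaky integrate-and-fire parameters and the function `lif_spike_count` are assumptions chosen for illustration.

```python
import numpy as np

def lif_spike_count(dtype, n_steps=10000, dt=0.1, tau=10.0, drive=1.5,
                    v_th=1.0, v_reset=0.0):
    """Forward-Euler leaky integrate-and-fire neuron with constant drive.

    All state and parameters are cast to `dtype`, so the same model can be
    integrated in single or double precision.
    """
    alpha = dtype(dt / tau)
    drive, v_th, v_reset = dtype(drive), dtype(v_th), dtype(v_reset)
    v = v_reset
    count = 0
    for _ in range(n_steps):
        v = v + alpha * (drive - v)   # Euler update of the membrane potential
        if v >= v_th:                 # threshold crossing: spike and reset
            count += 1
            v = v_reset
    return count

duration_s = 10000 * 0.1 / 1000.0     # dt is in ms, so this is 1 s
rate32 = lif_spike_count(np.float32) / duration_s
rate64 = lif_spike_count(np.float64) / duration_s
rel_diff = abs(rate32 - rate64) / rate64
print(rate32, rate64, rel_diff)
```

      For network models the same comparison would be made on population-averaged rates, since individual spike times are not expected to match exactly across implementations.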

      We hope these revisions can ensure that our manuscript provides a clear and robust validation of the accuracy of our simulator.

      Benchmarking against other software is obviously important, but also full of potential pitfalls. The current article does not state clearly whether the results are strictly comparable. In particular: are the benchmarks on the different simulators calculating results to the same accuracy (use of single or double precision, same integration algorithm, etc.)? Does each simulator use the fastest possible execution mode (e.g. number of threads/processes for NEST, C++ standalone mode in Brian2, etc.)? What is exactly measured (compilation time, network generation time, simulation execution time, ...) - these components will scale differently with network size and simulation duration, so summing them up makes the results difficult to interpret. Details are also missing for the comparison between the XLA operator customization in C++ vs. Python: was the C++ variant written by the authors or by someone else? Does the NUMBA→XLA mechanism also support GPUs/TPUs? This comparison also seems to be missing from the GitHub repository provided for reproducing the paper results.

      We have carefully considered these comments and addressed each of these concerns regarding the benchmarks and examples presented in our paper.

      1. Lack of Details in Examples: In the revised version of the paper, we provide additional information and any other pertinent details to enhance the clarity and replicability of our results. Particularly, in Appendix 9, we provide the mathematical description, the number of neurons, the connection density, and delay times used in our multi-scale spiking network; in Appendix 10, we provide a detailed description of reservoir models, evaluation metrics, training algorithms, and their implementations in BrainPy; in Appendix 11, we elaborate on the hardware and software specifications and experimental details for benchmark comparisons.

      2. Inadequate Description of Benchmarking Procedures: In the revised paper, particularly in L328-L329 of the main text in the section "Efficient performance of BrainPy" and in Appendix 11, we elaborate on the integration methods, simulation time steps, and floating-point precision used in our experiments. We also ensure that these parameters are clearly stated and identical across all simulators involved in the benchmarking process; see "Accuracy evaluations" in Appendix 11 (L1550 - L1580).

      3. Clarification on Measured Time: In the revised paper, we state that all simulations only measured the model execution time, excluding model construction time, synapse creation time, and compilation time, see "Performance measurements" in Appendix 11 (L1539 - L1548).

      4. Consideration of Acceleration Modes: In the revised version, we provide the simulation speed of other brain simulators on different acceleration modes, see Figure 8. For instance, we utilize the fastest possible option --- the C++ standalone mode in Brian2 --- for speed evaluations. Furthermore, we have asked the developers of the comparison simulators to optimize the benchmark models, to ensure a fair and accurate comparison.

      5. Scaling Networks to Maintain Activity: In the revised manuscript, we adopt the suggestion of Reviewer #3 and apply the appropriate scaling techniques to maintain consistent network activity throughout our experiments. These details can be found in Appendix 11 (also see Appendix 11—figure 1 and Appendix 11—figure 2).
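
      Point 3 above, timing model execution separately from construction and compilation, can be sketched with a small helper. This is a generic illustration using only the Python standard library: `time_execution` and `toy_run` are hypothetical names, and this is not the benchmarking script used for the paper.

```python
import time

def time_execution(run, n_warmup=1, n_repeats=3):
    """Return the best wall-clock time of `run()` after warm-up calls.

    Warm-up calls absorb one-off costs such as just-in-time compilation, so
    only steady-state execution is timed. `run` is a hypothetical zero-argument
    callable that executes an already constructed model.
    """
    for _ in range(n_warmup):
        run()
    timings = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return min(timings)

calls = []

def toy_run():
    """Stand-in workload; a real benchmark would step the network here."""
    calls.append(1)
    return sum(i * i for i in range(10000))

elapsed = time_execution(toy_run)
print(f"execution time: {elapsed:.6f} s over {len(calls)} calls")
```

      With asynchronously dispatching backends such as JAX, `run` would also need to wait for the result before returning (for example by calling `.block_until_ready()` on the output array), otherwise the timer stops before the computation has finished.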

      Regarding the comparison between XLA operator customization in C++ and Python, we utilized our self-implemented C++ version, which is accessible in Appendix 8, Listing 2. Presently, the NUMBA→XLA mechanism does not support GPUs/TPUs; however, we are working on expanding this capability to other platforms. We have made this clarification in the revised manuscript as well (see L1278 - L1285).

      While the authors convincingly argue for the merits of their Python-based/object-oriented approach, in my opinion, they do not fully acknowledge the advantages of domain-specific languages (NMODL, NestML, equation syntax of ANNarchy and Brian2, ...). In particular, such languages aim at a strong decoupling of the mathematical model description from its implementation and other parts of the model. In contrast, models described with BrainPy's approach often need to refer to such details, e.g. be aware of differences between dense and sparse connectivity schemes, online, or batch mode, etc. It might also be worth mentioning descriptive approaches to synaptic connectivity as supported by other simulators (connection syntax in Brian2, Connection Set Algebra for NEST).

      We have made revisions to better acknowledge the merits of DSLs while providing a more comprehensive comparison. These revisions are incorporated in Discussion (L452 - L466) and Appendix 1 (L778 - L788).

      Reviewer #2

      While the results presented are impressive, publishing further details of the benchmarks in an appendix would be helpful for evaluating the claims and the overall conclusion would be more convincing if the performance benefits were demonstrated on a wider selection of test cases. Unsatisfyingly, the authors gave up on making a direct comparison to Brian running on GPUs with GeNN which would have been a fairer comparison than CPU-based simulations. The code for the chosen benchmarks is also likely to be highly optimised by the authors for running on BrainPy but less so for the other platforms - a fairer test would be to invite the authors of the other simulators to optimise the same models and re-evaluate the benchmarks.

      We have carefully considered these comments and addressed each of these concerns regarding the benchmarks and examples presented in our paper.

      1. Lack of Details in Examples: In the revised version of the paper, we provide additional information and any other pertinent details to enhance the clarity and replicability of our results. Particularly, in Appendix 9, we provide the mathematical description, the number of neurons, the connection density, and delay times used in our multi-scale spiking network; in Appendix 10, we provide a detailed description of reservoir models, evaluation metrics, training algorithms, and their implementations in BrainPy; in Appendix 11, we elaborate on the hardware and software specifications and experimental details for benchmark comparisons.

      2. Inadequate Description of Benchmarking Procedures: In the revised paper, particularly in L328-L329 of the main text in the section "Efficient performance of BrainPy" and in Appendix 11, we elaborate on the integration methods, simulation time steps, and floating-point precision used in our experiments. We also ensure that these parameters are clearly stated and identical across all simulators involved in the benchmarking process; see "Accuracy evaluations" in Appendix 11 (L1550 - L1580).

      3. Clarification on Measured Time: In the revised paper, we state that all simulations only measured the model execution time, excluding model construction time, synapse creation time, and compilation time, see "Performance measurements" in Appendix 11 (L1539 - L1548).

      4. Consideration of Acceleration Modes: In the revised version, we provide the simulation speed of other brain simulators on different acceleration modes, see Figure 8. For instance, we utilize the fastest possible option --- the C++ standalone mode in Brian2 --- for speed evaluations. Furthermore, we have asked the developers of the comparison simulators to optimize the benchmark models, to ensure a fair and accurate comparison.

      5. Scaling Networks to Maintain Activity: In the revised manuscript, we adopt the suggestion of Reviewer #3 and apply the appropriate scaling techniques to maintain consistent network activity throughout our experiments. These details can be found in Appendix 11 (also see Appendix 11—figure 1 and Appendix 11—figure 2).
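
      The measurement protocol in point 3 can be sketched as a small timing harness. This is a minimal illustration, not the benchmark code from Appendix 11: `build_model`, `warm_up`, and `run` are hypothetical callables standing in for a simulator's construction, compilation, and execution phases, and taking the minimum over repeats is our own assumption.

```python
import time


def time_execution_only(build_model, warm_up, run, n_repeats=3):
    """Return the best wall-clock time of `run`, excluding setup phases.

    All three callables are hypothetical placeholders:
      build_model() -> model   network and synapse construction (not timed)
      warm_up(model)           first call that triggers compilation (not timed)
      run(model)               the simulation proper (timed)
    """
    model = build_model()   # construction time is excluded
    warm_up(model)          # compilation time is excluded
    timings = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        run(model)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    # Toy stand-ins so that the sketch runs on its own.
    elapsed = time_execution_only(
        build_model=lambda: list(range(100_000)),
        warm_up=lambda m: sum(m),
        run=lambda m: sum(x * x for x in m),
    )
    print(f"execution-only time: {elapsed:.6f} s")
```

      Keeping the three phases as separate callables makes it explicit which costs are excluded, so the same harness can in principle wrap different simulators.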

      Regarding the wider selection of test cases, we understand the importance of demonstrating the performance benefits on a broader range of scenarios. Particularly, we have designed two kinds of benchmark models:

      • Sparse connection models. This category includes the COBA-LIF network and the COBA-HH network. The former is a standard E/I balanced network used to compare the simulation speed of brain simulators, while the latter uses the more complex and computationally expensive HH neuron model as its elements. Both models can effectively demonstrate the capability of a brain simulator for sparse and event-driven computation.

      • Dense connection models. The local circuits of a cortical column are usually densely connected (Science 366, 1093). In particular, we use the decision-making network proposed by Wang (2002) for evaluations.

      In the revised version, we include extensive experiments on these three test cases under different kinds of computing platforms (including CPU, GPU, and TPU) to strengthen the overall conclusion and provide a more comprehensive evaluation of our approach.

      Regarding the comparison to Brian running on GPUs with GeNN, we apologize for not including that in our initial submission. We have conducted the necessary experiments on all three benchmark models used in our evaluations and include these results in the revised version of the paper (see Figure 8). This addition enhances the credibility of our findings and allows for a more meaningful comparison between different simulation platforms. Furthermore, we have also reached out to the authors of the other simulators and invited them to optimize the same models used in our benchmarks. We believe this collaborative approach will ensure a more equitable evaluation of the simulators and provide a more robust and convincing analysis of our work.

      Furthermore, the manuscript reads like an advertisement for the platform with very little discussion of its limitations, weaknesses, or directions for further improvement. A more frank and balanced perspective would strengthen the manuscript and give the reader greater confidence in the platform.

      We agree that our paper could be improved by more clearly stating the limitations of our approach and comparing it to established approaches. We have revised the paper and added two new subsections in the Discussion section to address these specific concerns:

      1. The Limitations subsection (L448 - L491) acknowledges the restrictions of the BrainPy paradigm, which uses Python-based object-oriented programming. It describes three main categories of limitations: (a) approach limitations, (b) functionality limitations, and (c) parallelization limitations. These limitations indicate areas where BrainPy may require further development to improve its functionality, performance, and compatibility with different modeling approaches.

      2. The Future Work subsection (L493 - L526) outlines development enhancements we aim to pursue in the near term. It emphasizes the need for further development in order to meet the demands of the field. Three key areas requiring attention are highlighted: (a) multi-compartment neuron models, (b) ultra-large-scale brain simulations, (c) bridging with acceleration computing systems. In addition to these changes, we have also made a number of other minor changes to the paper to improve its clarity and readability.

      Since simulators wax and wane in popularity, it would be reassuring to see a roadmap for development with a proposed release cadence and a sustainable governance policy for the project. This would serve to both clearly indicate the areas of active development where community contributions would be most valuable and also to reassure potential users that the project is unlikely to be abandoned in the near future, ensuring that their time investment in learning to use the framework will not be wasted.

      We appreciate the reviewer raising the point about demonstrating the project's sustainability. In response to this feedback, we have made the following efforts.

      Firstly, we have added, and now maintain, a "Development roadmap" section on the BrainPy GitHub homepage (https://github.com/brainpy/BrainPy). This gives the community a clear understanding of the project's direction and the areas of active development. Additionally, the "Future work" section in our revised paper outlines a comprehensive roadmap for the next stages of BrainPy development.

      Secondly, to address the concern about the sustainability of our project and the potential risk of abandonment, we have added an ACKNOWLEDGMENTS.md file to the GitHub repository (https://github.com/brainpy/BrainPy/blob/master/ACKNOWLEDGMENTS.md) that outlines our sustainable funding support. This support demonstrates our commitment to the long-term maintenance and development of the project, and may help to dispel users' doubts about project abandonment.

      Similarly, a complex set of dependencies, which need to be modified for BrainPy, will likely make the project hard to maintain and so a similar plan to those given for the CI pipeline and documentation generation for automation of these modifications would be a good addition. It is also important to periodically reflect on whether it still makes sense to combine all the disparate tools into one framework as the codebase grows and starts to strain under modifications required to maintain its unification.

      We appreciate the reviewer's valuable suggestions on the BrainPy framework.

      First, BrainPy is a self-contained package designed specifically for brain dynamics programming. It boasts minimal dependencies, relying only on fundamental packages within the Python scientific computing ecosystem. In essence, BrainPy relies on numpy for array-based computations and utilizes jax and jaxlib for JIT compilation. While we currently utilize numba to customize dedicated operators, we can also remove this dependency by rewriting these operators with C++ code. We incorporate the use of brainpylib, a package developed by ourselves, which provides dedicated operators for CPUs and GPUs in the context of brain dynamics modeling. Additionally, BrainPy leverages mature solutions within the field for certain auxiliary functions. For instance, we integrate the use of tqdm to facilitate the display of a progress bar during model execution, and employ matplotlib for visualization purposes, capitalizing on its well-established capabilities in the scientific community.

      Second, we agree that there is a risk of overly complex dependencies and architectural strains. To mitigate this risk, we have taken the following measures:

      • We prioritize good software engineering practices such as loose coupling, high cohesion, and modularity in the framework design. This isolates dependencies and changes to specific components. For example, the brainpy.visualize module defines abstract visualization functions whose visualization backend can be changed at any time.

      • We invest in automating aspects of the build, test, and release process to relieve manual maintenance burdens. We make heavy use of GitHub Actions for testing BrainPy code and building the documentation.

      • We document dependencies clearly and maintain backwards compatibility when possible. For each new API, we will clearly state the BrainPy version from which it is supported, and deprecated APIs will be phased out over multiple release cycles.

      • We continuously monitor code complexity metrics and refactor/simplify the architecture when needed.

      • When new tools have significantly different requirements, we will consider spinning them off into separate projects rather than forcing them into the core framework.
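
      The deprecation policy in the third bullet above can be illustrated with a small decorator. This is a generic sketch of the pattern using only the Python standard library, not code taken from BrainPy; `old_api`, `new_api`, and the version strings are hypothetical placeholders.

```python
import functools
import warnings


def deprecated(since, removed_in, replacement):
    """Mark a function as deprecated while keeping it callable.

    `since`, `removed_in`, and `replacement` are free-form strings that are
    only used to build the warning message.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated since version {since} and "
                f"will be removed in version {removed_in}; "
                f"use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


def new_api(x):
    """Hypothetical replacement function."""
    return 2 * x


# "A.B" and "A.D" are placeholder version strings, not real releases.
@deprecated(since="A.B", removed_in="A.D", replacement="new_api")
def old_api(x):
    """Hypothetical legacy function kept for backwards compatibility."""
    return new_api(x)
```

      Existing user code that calls `old_api` keeps working across the intermediate releases and receives a warning that names the replacement.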

      Finally, a live demonstration would be a very useful addition to the project. For example, a Jupyter notebook hosted on mybinder.org or similar, and a fully configured Docker image, would each enable potential users to quickly experiment with BrainPy without having to install a stack of dependencies and troubleshoot version conflicts with their pre-existing setup. This would greatly lower the barrier to adoption and help to convince a larger base of modellers of the potential merits of BrainPy, which could be major, both in terms of the computational speed-up and ease of development for a wide range of modelling paradigms.

      We appreciate the reviewer's valuable feedback and suggestion. We have hosted a Jupyter notebook and a fully configured Docker image on mybinder.org (https://mybinder.org/v2/gh/brainpy/BrainPy-binder/main). Users can easily experiment with BrainPy without the need to install multiple dependencies or troubleshoot version conflicts.

      Reviewer #3

      One potential issue is that the scope of the neuro-simulator is not very clearly explained and the target audience is not well defined: is BrainPy primarily intended for computational neuroscientists or for neuro-AI practitioners? The simulator offers very detailed neural models (HH, fractional order models), classical point-models (LIF, AdEx), rate-coded models (reservoirs), but also deep learning layers (Conv, MaxPool, BatchNorm, LSTM). Is there an advantage to using BrainPy rather than PyTorch for purely deep networks? Is it possible to build hybrid models combining rate-coded reservoirs or convnets with a network of HH neurons? Without such a hybrid approach, it is unclear why the deep learning layers are needed.

      We appreciate the reviewer's concern regarding the scope of BrainPy and the need for clarification regarding the target audience.

      BrainPy is designed to cater to both computational neuroscientists and neuro-AI practitioners by integrating detailed neural models, classical point models, rate-coded models, and deep learning models. The platform aims to provide a general-purpose programming framework for modeling brain dynamics, allowing users to explore the dynamics of brain or brain-inspired models that combine insights from biology and machine learning.

      In particular, brain dynamics models (provided in the brainpy.dyn module) and deep learning models (provided in the brainpy.dnn module) are closely integrated with each other in BrainPy. First, to build brain dynamics models, users should use the building blocks in the brainpy.dnn module to create synaptic projections.

      Second, to build brain-inspired computing models for machine learning, users can take advantage of the neuronal and synaptic dynamics provided in the brainpy.dyn module.

      To that end, BrainPy provides building blocks of detailed conductance-based models like Hodgkin-Huxley, as well as common deep learning layers like convolutions.

      Regarding the advantage of using BrainPy over PyTorch for purely deep networks, we acknowledge that existing deep learning libraries like Flax in the JAX ecosystem provide extensive tools and examples for constructing traditional deep neural networks. While BrainPy does implement standard deep learning layers, our primary focus is not to compete directly with those libraries. Instead, we provide these models so that deep learning layers integrate seamlessly with BrainPy's core modeling abstractions, including variables and dynamical systems. This integration allows researchers to incorporate common deep learning layers into their brain models. Additionally, the deep learning layers included in BrainPy serve as examples for customization and facilitate the development of tailored layers for neuroscience research. Researchers can modify or extend the implementations to suit their specific needs.

      In summary, BrainPy's scope focuses on general-purpose brain dynamics programming. The target audience includes computational neuroscientists who want to incorporate insights from machine learning, as well as ML researchers interested in integrating brain-like components.

      In terms of plasticity, only external training procedures are implemented (backpropagation, FORCE, surrogate gradients). No local plasticity mechanism (Hebbian learning for rate-coded networks, STDP and its variants for spiking networks) seems to be implemented, apart from STP. Is it a planned feature? Appendix 8 refers to bp.synplast.STDP(), but it is not present in the current code (https://github.com/brainpy/BrainPy/tree/master/brainpy/_src/dyn/synplast). Spiking networks without STDP are not going to be very useful to computational neuroscientists, so this suggests that the simulator targets primarily neuro-AI, i.e. AI researchers interested in using spiking models in a machine learning approach.

      We appreciate the reviewer raising the limitations of BrainPy in terms of local plasticity mechanisms, and we apologize for the delay in implementing STDP models in BrainPy. We now provide a very general implementation of STDP. It is compatible with any synaptic model (such as Exponential, Dual Exponential, AMPA, GABA, and NMDA dynamics) and with common connection patterns (such as Dense and Sparse connection patterns).

      bp.dyn.STDP_Song2001(pre, post, delay, syn, comm, out)

      It can also easily be combined with short-term plasticity models. The modular design of BrainPy's framework also makes plasticity components straightforward to implement and integrate into existing models.
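
      For readers unfamiliar with STDP, the following is a minimal sketch of a generic pair-based rule driven by exponentially decaying spike traces. It is written in plain Python for illustration only and does not reproduce the internals of bp.dyn.STDP_Song2001, which are not shown in this response; the function name and all parameter values are assumptions.

```python
import math


def run_pair_stdp(pre_spikes, post_spikes, w=0.5, dt=1.0,
                  tau_pre=20.0, tau_post=20.0,
                  a_plus=0.01, a_minus=0.012, w_max=1.0):
    """Evolve one synaptic weight under pair-based STDP.

    pre_spikes and post_spikes are equal-length sequences of 0/1 flags,
    one entry per time step of length dt (same unit as the time constants).
    """
    decay_pre = math.exp(-dt / tau_pre)
    decay_post = math.exp(-dt / tau_post)
    x_pre = 0.0    # trace of recent presynaptic spikes
    y_post = 0.0   # trace of recent postsynaptic spikes
    for pre, post in zip(pre_spikes, post_spikes):
        # Both traces decay exponentially between spikes.
        x_pre *= decay_pre
        y_post *= decay_post
        # Weight changes use the traces left by earlier spikes only.
        if pre:
            w -= a_minus * y_post   # post fired before pre: depression
        if post:
            w += a_plus * x_pre     # pre fired before post: potentiation
        # Register the current spikes in the traces.
        if pre:
            x_pre += 1.0
        if post:
            y_post += 1.0
        # Keep the weight inside its allowed range.
        w = min(max(w, 0.0), w_max)
    return w
```

      With these illustrative parameters, a presynaptic spike followed five steps later by a postsynaptic spike increases the weight, and the reverse order decreases it.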

      A second weakness of the paper concerns the demos and benchmarks used to demonstrate the versatility and performance of BrainPy, which are not sufficiently described. In Fig. 4, it is for example not explained how the reservoirs are trained (only the readout weights, or also the recurrent ones? Using BPTT only makes sense when the recurrent weights are also trained.), nor how many neurons they have, what the final performance is, etc. The comparison with NEURON, NEST, and Brian2 is hard to trust without detailed explanations. Why are different numbers of neurons used for COBA and COBAHH? How long is the simulation in each setting? Which time is measured: the total time including compilation and network creation, or just the simulation time? Are the same numerical methods used for all simulators? It would also be interesting to discuss why the only result involving TPUs (Fig 8c) shows that it is worse than the V100 GPU. What could be the reason? Are there biologically-realistic networks that would benefit from a TPU? As the support for TPUs is a major selling point of BrainPy, it would be important to investigate its usage further.

      We appreciate the reviewer for raising the important question about the demos and benchmarks used to demonstrate the versatility and performance of BrainPy. To address these concerns, we have added more details in the revised paper, including:

      • For Fig. 4, we explain in Appendix 10 how the reservoirs are trained: only the readout weights are trained, using backpropagation, FORCE learning, and ridge regression, respectively. We also specify the number of neurons in each reservoir (see L1397) and the final performance of the reservoirs on the task (see Figure 4).

      • To enable readers to better interpret the simulator comparisons in Fig. 8, we have also added more detailed explanations of the comparison with NEURON, NEST, and Brian2 in Appendix 11.

      • In the revised paper, we provide a comprehensive analysis of BrainPy's compatibility with different hardware platforms, including TPUs, and identify the specific conditions under which TPUs may offer advantages (see Figure 8 and Appendix 11—figure 7). We have also discussed the potential benefits of TPUs for biologically realistic networks (see L514 - L521). In particular, for biological networks with arbitrary sparsity, TPUs do not show an advantage over GPUs (see Appendix 11—figure 7). TPUs are best at exploiting certain kinds of structured sparsity, for example block sparsity.
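
      To make the term "block sparsity" concrete, here is a minimal sketch of a block-sparse matrix-vector product in plain Python. It only illustrates the data layout (a few dense blocks, everything else zero), which lets the inner work be expressed as small dense products; it says nothing about how BrainPy or any TPU kernel is actually implemented, and the function name and storage format are assumptions.

```python
def block_sparse_matvec(blocks, block_size, n_block_rows, x):
    """Multiply a block-sparse matrix by a dense vector.

    `blocks` maps (block_row, block_col) to a dense block stored as a
    block_size x block_size list of lists. Blocks that are absent from the
    mapping are treated as all zeros and are never visited.
    """
    y = [0.0] * (n_block_rows * block_size)
    for (block_row, block_col), block in blocks.items():
        row0 = block_row * block_size
        col0 = block_col * block_size
        # Each stored block is a small dense matrix-vector product.
        for r in range(block_size):
            acc = 0.0
            for c in range(block_size):
                acc += block[r][c] * x[col0 + c]
            y[row0 + r] += acc
    return y
```

      By contrast, a matrix with arbitrary sparsity has no such dense sub-blocks, so each nonzero entry has to be addressed individually.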

    1. Author Response

      Reviewer #1 (Public Review):

      Due to complicated and often unpredictable idiosyncratic differences, comparing fMRI topography between subjects typically requires extra expensive scan time and extra laborious analysis steps, using specific functional localizer scan runs that contrast the fMRI responses of every subject to different stimulus categories. To overcome this challenge, hyperalignment tools have recently been developed (e.g., Guntupalli et al., 2016; Haxby et al., 2011) that align subjects' fMRI responses to watching a given movie in a high-dimensional space of voxels. In the present study, Jiahui and colleagues propose a significantly improved version of hyperaligning functional brain topography between individuals. This new version, based on fMRI connectivity, works robustly on datasets in which subjects watched different movies and were scanned with different parameters/scanners at different MRI centers.

      Robustness is the major strength of this study. Despite the fact that datasets from different subjects watching different movies at different MRI centers with different scan parameters were used, the results of functional brain topography from between-subject hyperalignment based on fMRI connectivity were comparable to the gold standard of within-subject functional localizations, and significantly better than regular surface anatomical alignments. These results also support the claim that the present approach is a useful improvement over previous hyperalignments based on time-locked fMRI voxel responses, which would require normative samples of subjects watching the same movie.

      We thank the reviewer for the appreciation of our work.

      Given the robustness, this new version of hyperalignment would provide much stronger statistical power for group-level comparisons, with less time and effort needed to collect and analyze data from large samples according to the current stringent standard, and is likely to be useful to the whole research community of functional neuroimaging. That said, more discussion of the limits of the present hyperalignment approach would be helpful to potential eLife readers. For example, to what extent would the present hyperalignment approach be applicable to individuals with atypical functional brain topography, such as brain lesion patients with, e.g., acquired prosopagnosia? Even in typical populations, while bilateral fusiform face areas can be identified in the majority through functional localizer scans, the left fusiform face area sometimes cannot be found. Moreover, many top-down factors are known to modulate functional brain topography. Due to these factors, brain responses and functional connectivity may be different even when the same subject watches the same movie twice (e.g., Cui et al., 2021).

      We thank the reviewer for the suggestion and agree that it would be fascinating if the predictions can be made with high fidelity in neuropsychological populations. Although we are optimistic that our algorithm is able to generalize across diverse populations, to date, no previous literature has provided empirical evidence to illustrate the effectiveness, including optimizations and special applications beyond typical brains. Besides the neuropsychological population, it would also be valuable to study the generalization across a broad age range, for example, from infants to the elderly. The brain changes across age both anatomically and functionally, so it is a challenge to predict functional topographies based on a normative group that only includes young participants. With all these potential applications in mind, future research is needed to illustrate the efficacy, build the pipeline, and construct the representative normative groups to meet the requirements of accurate individualized predictions in diverse populations.

      In typical populations, participants show great individual variability in their functional topographies; for instance, some participants have distinguishable patches of activation in their left ventral temporal cortex while others don’t. Our algorithm successfully captured these individual differences in the prediction. The figure below shows, as an example, two individuals with markedly different face-selective topographies in the left ventral temporal cortex. The left participant has prominent face-selective areas in the left ventral temporal cortex that are similar in size to those on the right side, while the right participant only has a few scattered small face-selective spots on the left side. No matter what their face-selective areas look like, our algorithm accurately recovered the individualized locations, shapes, and sizes, retaining the individual variability in the functional topographies.

      Functional connectivity profiles based on naturalistic stimuli are very stable across the cortex, even when participants watch different movies. In Figure 4-figure supplement 9, the mean correlations of the fine-scale connectomes for most searchlights (r = 15mm) when participants watched The Grand Budapest Hotel and Raiders of the Lost Ark were generally around 0.8. The mean correlations were about 0.9 between the first and second halves of the same movie, although the stimulus content differed between the two halves. Thus, the fine-grained functional connectivity profiles remain highly stable and reliable across movie contents, which contributes to the robustness of our algorithm's predictions across movies, time, and other parameters (e.g., scanner models, scanning parameters).

      We added a paragraph to the discussion section to address these concerns (pages 18-19):

      “This study successfully illustrated that accurate individualized predictions are both robust and applicable across a variety of conditions, including movie types, languages, scanning parameters, and scanner models. Importantly, the intricate connectivity profiles remain consistent even when participants view entirely different movies, as evidenced by Figure 4-figure supplement 9, reinforcing the prediction's stability in various scenarios. However, all four datasets in this study only included typical participants with anatomically intact brains. An unanswered question is whether individualized topographies of neuropsychological populations with atypical cortical function (e.g., developmental prosopagnosics) or with lesioned brains (e.g., acquired prosopagnosics) could also be accurately predicted using the hyperalignment-based methods. Up to now, as far as we know, no previous literature has investigated this question. Beyond neuropsychological groups, it is also valuable to investigate how well the predictions will be across a wide range of age, from infants to the elderly. Future research is essential to adapt our algorithms to diverse populations.”

      Reviewer #2 (Public Review):

      Guo and her colleagues develop a new approach to map the category-selective functional topographies in individual participants based on their movie-viewing fMRI data and functional localizer data from a normative sample. Connectivity hyperalignment is used to derive the transformation matrices between the participants according to their functional connectomes during movie watching. The transformation matrices are then used to project the localizer data from the normative sample into the new participant and create the idiosyncratic cortical topography for the participant. The authors demonstrate that a target participant's individualized category-selective topography can be accurately estimated using connectivity hyperalignment, regardless of whether different movies are used to calculate the connectome and regardless of other data collection parameters. The new approach allows researchers to estimate a broad range of functional topographies based on naturalistic movies and a normative database, making it possible to integrate datasets from laboratories worldwide to map functional areas for individuals. The topic is of broad interest to the neuroimaging community; the rationale of the study is straightforward and the experiments were well designed; the results are comprehensive. I have some concerns that the authors may want to address, particularly on the details of the pipeline used to map individual category-selective functional topographies.

      We thank the reviewer for the encouragement.

      1) How does the scan length of the movie-viewing fMRI affect the accuracy in predicting the idiosyncratic cortical topography? Similarly, how does the number of participants in the normative database affect the prediction of the category-selective topography? This information is important for researchers who are interested in using the approach in their studies.

      To investigate the influence of movie-viewing data length and the number of participants in the normative database on prediction performance, we systematically varied these parameters. Specifically, we altered the number of runs utilized in the analysis for both the normative and target data and experimented with varying the number of participants in the normative dataset using the Budapest and the Sraiders datasets. We have included a new Figure 4-figure supplement 5 to present a summary of these findings.

      The results reveal that both within-dataset and between-dataset prediction performances are positively correlated with the length of movie-viewing fMRI data used for both the normative and target groups. A similar trend was observed with respect to the number of participants included in the normative dataset. It is important to highlight, though, that even when analyzing as little as one run of movie-viewing data (roughly 10-15 minutes), our hyperalignment-based prediction performance was significantly higher than that achieved using traditional surface alignment. This held true even when the normative dataset included as few as five participants.

      In summary, our results show that prediction performance generally improves with longer movie-viewing sessions and larger normative datasets. However, it is noteworthy that even with minimal data—10 minutes of movie-viewing and a small number of participants in the normative dataset—our algorithm still outperforms traditional surface alignment methods significantly.

      We also added sentences in the discussion section (page 15):

      “We investigated the influence of naturalistic movie length and the size of the training group on the prediction accuracy of individualized functional topographies. By incrementally increasing both the number of movie runs in the training and target dataset and the participants in the training group in the Budapest and Sraiders dataset, we observed enhanced prediction accuracy (Figure 4-figure supplement 5). Notably, even with just one movie run in the training or target dataset, or with a mere five participants in the training group, our prediction performance (Pearson r) ranged from about 0.6 to 0.7. This accuracy significantly outperformed results obtained using surface-based alignment.”
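
      The prediction performance quoted above is a Pearson correlation. As a minimal sketch, assuming the predicted and the localizer-measured topographies are available as equal-length lists of vertex-wise contrast values (an assumption about data layout, not a description of our pipeline), it can be computed as follows; the function name is hypothetical.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation between two equal-length vertex-wise maps."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("maps must have the same length (at least 2)")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sums of products of deviations from the two means.
    cov = sum((p - mean_p) * (m - mean_m)
              for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0.0 or var_m == 0.0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / math.sqrt(var_p * var_m)
```

      A value of 1 means the predicted map matches the measured one up to a positive linear rescaling, so the measure is sensitive to the spatial pattern rather than to the overall amplitude.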

      2) The data show that category-selective topography can be accurately estimated using connectivity hyperalignment, regardless of whether different movies are used to calculate the connectome and regardless of other data collection parameters. I'm wondering whether the functional connectome from resting state fMRI can do the same job as the movie-watching fMRI. If so, it would extend the approach to a broader range of data.

      We agree with the reviewer that demonstrating the applicability of resting state data would expand the application scenarios of this approach. Most previous findings on resting state connectivity, including comparisons between the naturalistic and resting state paradigms, focused on macro-scale similarities and differences (e.g., Samara et al., 2023). Very few studies have investigated the fine-scale connectome based on resting state data. The study on connectivity hyperalignment (Guntupalli et al., 2018) demonstrated a shared fine-scale connectivity structure among individuals that co-exists with the common coarse-scale structure, and built an algorithm that successfully hyperaligns individuals to the shared fine-scale space. Another study from our lab (Feilong et al., 2021) revealed that fine-scale connectivity profiles in both resting and task states are highly predictive of general intelligence, indicating reliable and biologically relevant fine-scale resting state connectome structures. Thus, it is highly plausible that our approach can be generalized to resting state data, generating significantly better predictions of individualized functional topographies than traditional surface alignment. However, the current datasets do not include resting state data, so we cannot perform this analysis here. We are in the process of collecting new data to explore this hypothesis in future work.

      We added sentences to the discussion section to discuss this idea (page 18):

      “Studies comparing movie-viewing and resting state functional connectivity have shown that both paradigms yield overlapping macroscale cortical organizations (29), though naturalistic viewing introduces unique modality-specific hierarchical gradients. However, there remains a gap in research comparing the fine-scaled connectomes of naturalistic and resting state paradigms. Guntupalli and colleagues (14) revealed a shared fine-scale structure that coexists with the coarse-scale structure, and connectivity hyperalignment successfully improved intersubject correlations across a wide variety of tasks. Feilong et al. (13) noted that the fine-scaled connectivity profiles in both resting and task states are highly predictive of general intelligence. This suggests a reliable and biologically relevant fine-scale resting state connectivity structure among individuals. Therefore, it is plausible that individualized functional topography could be effectively estimated using resting state functional connectivity, expanding the applicability of our approach. Future studies are needed to explore this direction.”

      3) The authors averaged the hyper-aligned functional localizer data from all of the subjects to predict individual category-selective topographies. As there is large spatial variability in the functional areas across subjects, averaging the data from many subjects may blur the boundaries of the functional areas. A better solution might be to average only those subjects who show a connectome highly similar to that of the target subject.

      We appreciate the reviewer’s insightful suggestion about optimizing prediction performance by selecting participants most similar in functional connectivity to the target individuals. This is a promising direction and also a difficult problem. Our approach uses fine-scale connectomes to hyperalign participants, so different groups of participants may be similar to the target participant in different searchlights. In addition, based on the results discussed in the response to Q1, the more participants included in the normative dataset, the better the prediction performance. Thus, there is a trade-off between the number of participants included in the normative dataset for the prediction and the overall similarity of those participants to the target participant.

      To quantitatively explore this idea, we used a searchlight in the right ventral temporal cortex, roughly at the location of the posterior fusiform face area (pFFA). We sorted participants by their connectome similarity to each target participant and then examined prediction performance based on either the nine most similar participants or the nine least similar participants. Our results, presented in Figure 4-figure supplement 8, reveal that hyperalignment consistently outperforms surface alignment regardless of the subset of participants used. Notably, using the nine most similar participants did not significantly alter prediction performance (Tukey Test, z = -0.09, p = 0.996), while using the least similar participants did negatively impact it (Tukey Test, z = 2.492, p = 0.034). Interestingly, the stability of hyperalignment-based predictions remained high even when only a subset of participants was used, contrasting with the variability observed in surface-alignment-based predictions.

      Overall, these findings suggest that while selecting functionally similar participants is a promising avenue for future optimization, the process will require nuanced, searchlight-specific criteria. Each searchlight may necessitate its own set of optimal participants to balance between the performance boost from having more participants and the fidelity gained from participant similarity.
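
      The top-nine versus bottom-nine selection described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `select_by_similarity`, the random similarity scores, and the count of 19 training participants (20 Budapest participants minus the target) are assumptions made for the example. How connectome similarity is actually computed within a searchlight is not specified in the response, so the sketch takes one score per training participant as given.

```python
import numpy as np

def select_by_similarity(similarity, k=9):
    """Return indices of the k most and k least similar training participants.

    similarity: 1-D array with one (hypothetical) connectome-similarity score
    per training participant, for a single searchlight and a single target.
    """
    similarity = np.asarray(similarity, dtype=float)
    order = np.argsort(similarity)[::-1]  # indices sorted from most to least similar
    return order[:k], order[-k:]

# Toy data: 19 training participants with made-up similarity scores.
rng = np.random.default_rng(0)
sim = rng.uniform(0.1, 0.6, size=19)

top9, bottom9 = select_by_similarity(sim, k=9)
print("most similar:", top9)
print("least similar:", bottom9)
```

      Because the ranking is specific to one searchlight, repeating it over the cortex would generally give a different subset for each searchlight, which is the searchlight-specific criterion the response refers to.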

      We added the following to the discussion in the manuscript (page 16):

      “In our study, we used fine-scale connectomes, noting that some participants are more similar to the target participant in specific searchlights. It is an interesting question whether predictions could be enhanced by exclusively selecting those more similar participants for the target participant. To explore this option, we examined a searchlight in the right ventral temporal cortex that was roughly at the location of the posterior fusiform face area (pFFA), using the nine most similar and nine least similar participants for each target participant, as measured by their fine-scale connectome similarities in the Budapest dataset. Generally, using all or part of the participants for the prediction generated similar results (Figure 4-figure supplement 8). Compared to using all the participants, using only the top nine participants who are the most similar to the target participants did not significantly improve the prediction (Tukey Test, z = -0.09, p = 0.996), but using only the bottom nine participants generated significantly lower prediction accuracies (Tukey Test, z = 2.492, p = 0.034). This suggests a trade-off between the number of participants included in the prediction and the similarity of the participants. Future studies are needed to explore the optimal threshold for the number of participants included for each searchlight to refine the algorithm.”

      4) It is good to see that predictions made with hyperalignment were close to and sometimes even exceeded the reliability values measured by Cronbach's alpha. But, please clarify how the Cronbach's alpha is calculated.

      Cronbach’s alpha calculates the correlation score between localizer-based maps across the runs, and it reflects the amount of noise in maps based on individual localizer runs. Traditionally, the reliability was estimated based on split-half correlations. For example, Guntupalli et al. (2016) used correlations of category-selectivity maps between odd and even localizer runs as the measure of reliability. The odd/even split measure underestimated reliability and necessitated recalculation of correlations between maps for only half the data to provide valid comparisons. In contrast, Cronbach’s alpha involves all localizer runs and provides a more accurate statistical estimate of the reliability of the topographies estimated with localizer runs.

      Cronbach’s alpha has been used in many previously published works from our lab (e.g., Feilong et al., 2021; Jiahui et al., 2020, 2023). The code for implementing this metric is publicly accessible on the first author’s Github repository (https://github.com/GUOJiahui/face_DCNN/blob/main/code/cronbach_alpha.py).
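
      For readers who want a concrete picture of the statistic, the textbook form of Cronbach's alpha can be sketched as below, treating each localizer run as an "item" and each cortical vertex as an observation. This is an illustrative sketch under those assumptions and is not taken from the repository linked above; the function name `cronbach_alpha` and the simulated maps are hypothetical.

```python
import numpy as np

def cronbach_alpha(maps):
    """Textbook Cronbach's alpha for an (n_runs, n_vertices) array of maps.

    alpha = k / (k - 1) * (1 - sum of per-run variances / variance of the summed map)
    """
    maps = np.asarray(maps, dtype=float)
    k = maps.shape[0]                              # number of runs (items)
    run_var = maps.var(axis=1, ddof=1).sum()       # sum of per-run variances
    total_var = maps.sum(axis=0).var(ddof=1)       # variance of the summed map
    return k / (k - 1) * (1.0 - run_var / total_var)

# Simulated example: one shared "true" map plus independent noise per run.
rng = np.random.default_rng(0)
true_map = rng.standard_normal(5000)
runs = true_map + rng.standard_normal((4, 5000))   # 4 runs, noise variance equal to signal

print(round(cronbach_alpha(runs), 3))
```

      Unlike an odd/even split, every run contributes to the estimate, so no correction for having used only half of the data is needed.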

      We added the detailed explanation above to the Material and Methods section (page 24):

      “Cronbach’s alpha calculates the correlation score between localizer-based maps across the runs, and it reflects the amount of noise in maps based on individual localizer runs. Traditionally, the reliability was estimated based on split-half correlations. The common odd/even split measure underestimated reliability and necessitated recalculation of correlations between maps for only half the data to provide valid comparisons. In contrast, Cronbach’s alpha involves all localizer runs and provides a more accurate statistical estimate of the reliability of the topographies estimated with localizer runs.”

      5) Which algorithm was used to perform surface-based anatomical alignment? Can the state-of-the-art Multimodal Surface Matching (MSM) algorithm from HCP achieve better performance?

      We preprocessed our datasets using fMRIPrep, which employs algorithms from FreeSurfer’s recon-all for surface-based anatomical alignment. It is worth noting that different alignment methods can yield varying degrees of performance. For instance, a study by Coalson et al. (2018) compared the localization performance of multiple surface-based alignment methods, including Multimodal Surface Matching (MSM) and FreeSurfer. The study found that MSM outperformed FreeSurfer in terms of peak probabilities and spatial clustering, suggesting better overall localization.

      Additionally, Guntupalli et al. (2018) evaluated intersubject correlations (ISC) of functional connectivity from movie-viewing data using both Connectivity Hyperalignment (CHA) and MSM-All with the Human Connectome Project (HCP) dataset. The study showed that although MSM-All yielded marginally better ISC than traditional surface alignment, CHA’s performance was significantly superior.

      In summary, while using a more advanced alignment algorithm like MSM could marginally improve prediction performance, its advantages may not be substantial when compared to our CHA-based predictions. The combination of MSM and CHA represents an intriguing direction for future research, although it falls outside the scope of our current study.

      6) Is it necessary to project the time course of the functional localizer from the normative sample into the new participants? Does it work if we just project the contrast maps from the normative samples to the new subjects?

      This is an interesting question, and a practically useful one for researchers: are the time series of the localizer runs required to obtain reasonable predictions? In some scenarios, contrast maps may be the only data available for the analysis. To quantitatively explore this possibility, we applied transformation matrices derived from the movie data to the training participants’ individual pre-calculated contrast maps for all four categories and evaluated the predictions. We found nearly identical prediction performance between the two flavors, both within and across datasets (Figure 4-figure supplement 7). However, it is worth noting that applying transformation matrices directly to contrast maps did not yield as much improvement across the iterative steps of the advanced CHA as the time-series flavor did, perhaps due to the scale changes when multiple iterations were implemented and the difficulty of properly normalizing the t-maps compared to the regular time series.

      Overall, although our algorithm is originally designed to be used on the time course of the functional localizer runs, relatively comparable results can be generated even when the contrast maps are directly projected from the normative group to the target participant. However, to derive the best results with our approach, time series are recommended when the situation permits.
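
      One way to see why the two flavors can agree closely yet not exactly is sketched below with a generic ordinary-least-squares GLM. This is an illustration under stated assumptions, not the authors' pipeline: the design matrix, the random transformation `W`, the contrast vector, and the helper names `beta_contrast` and `t_contrast` are all hypothetical. For a contrast of regression coefficients, transforming the time series and then fitting is algebraically the same as fitting and then transforming the map; for t-values, the per-vertex noise normalization breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_reg, n_vert = 120, 4, 50
X = rng.standard_normal((n_time, n_reg))     # design matrix (4 category regressors)
Y = rng.standard_normal((n_time, n_vert))    # a training participant's localizer time series
W = rng.standard_normal((n_vert, n_vert))    # transformation into the target's space
c = np.array([1.0, -1 / 3, -1 / 3, -1 / 3])  # one category versus the other three

def beta_contrast(X, Y, c):
    """Contrast of OLS coefficients, one value per vertex."""
    betas, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return c @ betas

def t_contrast(X, Y, c):
    """OLS t-statistic for the contrast, one value per vertex."""
    betas, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ betas
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = (resid ** 2).sum(axis=0) / dof
    se = np.sqrt(sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c))
    return (c @ betas) / se

# Coefficient contrasts: both orders of operations give the same map.
beta_from_timeseries = beta_contrast(X, Y @ W, c)
beta_from_map = beta_contrast(X, Y, c) @ W
print(np.allclose(beta_from_timeseries, beta_from_map))

# t-maps: the two orders of operations no longer coincide.
t_from_timeseries = t_contrast(X, Y @ W, c)
t_from_map = t_contrast(X, Y, c) @ W
print(np.allclose(t_from_timeseries, t_from_map))
```

      The sketch only shows that t-maps do not commute with a linear transformation in general; it says nothing about how large the discrepancy is in real data, which is what Figure 4-figure supplement 7 addresses.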

      We have also added the contents into the Discussion section (page 16):

      “Our original algorithm is designed to apply transformation matrices to the time series of localizer data of training participants before generating contrast maps. To explore whether directly applying these matrices to pre-calculated contrast maps yields comparable results, we conducted an additional analysis across the four categories. Our findings indicate that the prediction outcomes were indeed quite similar between the two approaches for both the within- and across-datasets predictions (Figure 4-figure supplement 7). However, it is worth noting that the improvements observed with enhanced CHA were not as pronounced when applied directly to the contrast maps as opposed to the time series.”

      7) Saygin and her colleagues have demonstrated that structural connectivity fingerprints can predict cortical selectivity for multiple visual categories across cortex (Osher DE et al, 2016, Cerebral Cortex; Saygin et al, 2011, Nat. Neurosci). I think there's a connection between those studies and the current study. If the authors can discuss the connection between them, it may help us understand why CHA works so well.

      We thank the reviewer for raising this point, which gives us the chance to clarify how our approach differs from methods previously reported in the literature. The computational logic underlying our approach is that we derived the transformation matrices between the training and the target participants in the high-dimensional space based on functional connectivity calculated from the movie data. Then, we applied these transformation matrices to the training participants’ localizer data to accomplish the prediction. On the other hand, Saygin and colleagues directly used diffusion-weighted imaging (DWI) data and predicted participants’ functional responses based on the anatomical-functional correspondence. They evaluated the prediction by calculating the mean absolute errors (AE) of the difference between the actual and predicted contrast responses. Although AE linearly increases with the quality of the prediction, it is difficult to measure the prediction performance of the shape, size, and location of the functional areas precisely using this mean value. With our algorithm, we were able to predict the general location and size of the areas and recover the individualized shapes, generating more powerful predictions. We also used the searchlight analysis to evaluate the performance across the cortex systematically. In addition, Osher et al. (2016) and Saygin et al. (2012) always have a few participants failing to show better predictions based on the connectivity than the group averaged method. Our algorithm is more stable, as all participants across all four datasets had better predicted performance using our algorithm than using the group average. However, although we did not directly use the anatomical-functional correspondence with DWI, the relationships between individual structural connectivity and cortical visual category selectivity could be one of the biological underpinnings that contribute to this robust and accurate prediction.

      The Connectivity-Based Shared Response Model (cSRM, Nastase et al., 2020) offers an alternative framework for aligning individuals through functional connectivity. While the overarching aims of cSRM and our methodology converge, substantial differences emerge in the respective implementation and application between the two methods that make our approach more suitable for predicting individualized topographies. The most significant difference between the two is that, instead of focusing on within-individual connectivity profiles, cSRM used inter-subject functional connectivity (ISFC) in the initial step. This design requires that all participants must have time-locked time series, making the algorithm unusable for cross-content prediction and making it incompatible with resting-state data. Our approach, on the other hand, does not require time-locked stimuli, thereby offering a more flexible framework that permits generalization across different types of stimuli and experimental settings and enables bringing data across laboratories across the world together. Secondly, cSRM predominantly focuses on Region of Interest (ROI) analyses, whereas our model employs searchlight-based analyses designed to comprehensively cover the entire cortical sheet. Whole-brain coverage is needed to generate the topography that reflects the patterns across the cortex. Finally, with the optimized 1-step method, our approach directly hyperaligns the training and target participants together, avoiding the accumulation of errors from the intermediate common space. cSRM, with an implementation similar to the classic connectivity hyperalignment, creates and hyperaligns all participants to a shared information space. In summary, while our approach and cSRM share a similar theoretical foundation, our approach has been specifically optimized to address the challenges and complexities in predicting individualized whole-brain functional topographies. Moreover, our approach demonstrates a remarkable ability to generalize across a variety of contexts and stimuli, offering a significant advantage in dealing with diverse experimental settings and datasets.

      We have added the contents to the discussion section (page 16-17):

      “By leveraging transformation matrices obtained from hyperaligning participants based on movie-viewing data, we successfully mapped these relationships to the training participants’ localizer data, enabling robust predictions. Prior work employing diffusion-weighted imaging (DWI) has underscored the link between anatomical connectivity and category selectivity across diverse visual fields (22, 23) and has established a notable congruence between structural and functional connectivities (24). These findings suggest that the unique anatomical connectivity patterns of individuals may serve as a foundational mechanism, contributing to the stable fine-scale functional connectome that underpins our approach. The connectivity-based Shared Response Model (cSRM) proposed by Nastase and colleagues (25) used connectivity to functionally align individuals similar to the connectivity hyperalignment algorithm. While both approaches share overarching goals, they diverge considerably in implementation and application. First and most important, cSRM used inter-subject functional connectivity (ISFC) rather than within-subject functional connectivity to initially estimate the connectome. As a result, cSRM requires participants to have time-locked fMRI time series. Therefore, unlike our algorithm, the cSRM approach does not support cross-content applications and also is not suitable for use with resting-state data. Second, cSRM is implemented based on a predefined cortical parcellation rather than the overlapping, regularly-spaced cortical searchlights applied in our method which are not constrained by areal borders. For the application, cSRM has mainly been used to do ROI analysis rather than the estimation of the whole-brain topography that requires broader coverage of the cortex with a searchlight analysis. Third, our method is specifically designed to work in each individual’s space, while cSRM decomposes data across subjects into shared and subject-specific transformations, focusing on a communal connectivity space. In summary, although cSRM presents a promising alternative for similar aims, its current implementation precludes it from fulfilling the range of applications for which our method is optimized.”

      Reviewer #3 (Public Review):

      In this paper, Jiahui and colleagues propose a new method for learning individual-specific functional magnetic resonance imaging (fMRI) patterns from naturalistic stimuli, extending existing hyperalignment methods. They evaluate this method, enhanced connectivity hyperalignment (CHA), across four datasets, each comprising between nine (Raiders) and twenty (Budapest, Sraiders) participants.

      The work promises to address a significant need in existing functional alignment methods: while hyperalignment and related methods have been increasingly used in the field to compare participants scanned with overlapping stimuli (or lack thereof, in the case of resting state data), their use remains largely tied to naturalistic stimuli. In this case, having non-overlapping stimuli is a significant constraint on application, as many researchers may have access to only partially overlapping stimuli or wish to compare stimuli acquired under different protocols and at different sites.

      It is surprising, however, that the authors do not cite a paper that has already successfully demonstrated a functional alignment method that can address exactly this need: a connectivity-based Shared Response Model (cSRM; Nastase et al., 2020, NeuroImage). It would be relevant for the authors to consider the cSRM method in relation to their enhanced CHA method in detail. In particular, both the relative predictive performance as well as associated computational costs would be useful for researchers to understand in considering enhanced CHA for their applications.

      We thank the reviewer for raising this point, which gives us the chance to clarify how our approach differs from methods previously reported in the literature. The computational logic underlying our approach is that we derived the transformation matrices between the training and the target participants in the high-dimensional space based on functional connectivity calculated from the movie data. Then, we applied these transformation matrices to the training participants’ localizer data to accomplish the prediction. On the other hand, Saygin and colleagues directly used diffusion-weighted imaging (DWI) data and predicted participants’ functional responses based on the anatomical-functional correspondence. They evaluated the prediction by calculating the mean absolute errors (AE) of the difference between the actual and predicted contrast responses. Although AE linearly increases with the quality of the prediction, it is difficult to measure the prediction performance of the shape, size, and location of the functional areas precisely using this mean value. With our algorithm, we were able to predict the general location and size of the areas and recover the individualized shapes, generating more powerful predictions. We also used the searchlight analysis to evaluate the performance across the cortex systematically. In addition, Osher et al. (2016) and Saygin et al. (2012) always have a few participants failing to show better predictions based on the connectivity than the group averaged method. Our algorithm is more stable, as all participants across all four datasets had better predicted performance using our algorithm than using the group average. However, although we did not directly use the anatomical-functional correspondence with DWI, the relationships between individual structural connectivity and cortical visual category selectivity could be one of the biological underpinnings that contribute to this robust and accurate prediction.

      The Connectivity-Based Shared Response Model (cSRM, Nastase et al., 2020) offers an alternative framework for aligning individuals through functional connectivity. While the overarching aims of cSRM and our methodology converge, substantial differences emerge in the respective implementation and application between the two methods that make our approach more suitable for predicting individualized topographies. The most significant difference between the two is that, instead of focusing on within-individual connectivity profiles, cSRM used inter-subject functional connectivity (ISFC) in the initial step. This design requires that all participants must have time-locked time series, making the algorithm unusable for cross-content prediction and making it incompatible with resting-state data. Our approach, on the other hand, does not require time-locked stimuli, thereby offering a more flexible framework that permits generalization across different types of stimuli and experimental settings and enables bringing data across laboratories across the world together. Secondly, cSRM predominantly focuses on Region of Interest (ROI) analyses, whereas our model employs searchlight-based analyses designed to comprehensively cover the entire cortical sheet. Whole-brain coverage is needed to generate the topography that reflects the patterns across the cortex. Finally, with the optimized 1-step method, our approach directly hyperaligns the training and target participants together, avoiding the accumulation of errors from the intermediate common space. cSRM, with an implementation similar to the classic connectivity hyperalignment, creates and hyperaligns all participants to a shared information space. In summary, while our approach and cSRM share a similar theoretical foundation, our approach has been specifically optimized to address the challenges and complexities in predicting individualized whole-brain functional topographies. Moreover, our approach demonstrates a remarkable ability to generalize across a variety of contexts and stimuli, offering a significant advantage in dealing with diverse experimental settings and datasets.

      We have added the contents to the discussion section (page 16-17):

      “By leveraging transformation matrices obtained from hyperaligning participants based on movie-viewing data, we successfully mapped these relationships to the training participants’ localizer data, enabling robust predictions. Prior work employing diffusion-weighted imaging (DWI) has underscored the link between anatomical connectivity and category selectivity across diverse visual fields (22, 23) and has established a notable congruence between structural and functional connectivities (24). These findings suggest that the unique anatomical connectivity patterns of individuals may serve as a foundational mechanism, contributing to the stable fine-scale functional connectome that underpins our approach. The connectivity-based Shared Response Model (cSRM) proposed by Nastase and colleagues (25) used connectivity to functionally align individuals similar to the connectivity hyperalignment algorithm. While both approaches share overarching goals, they diverge considerably in implementation and application. First and most important, cSRM used inter-subject functional connectivity (ISFC) rather than within-subject functional connectivity to initially estimate the connectome. As a result, cSRM requires participants to have time-locked fMRI time series. Therefore, unlike our algorithm, the cSRM approach does not support cross-content applications and also is not suitable for use with resting-state data. Second, cSRM is implemented based on a predefined cortical parcellation rather than the overlapping, regularly-spaced cortical searchlights applied in our method which are not constrained by areal borders. For the application, cSRM has mainly been used to do ROI analysis rather than the estimation of the whole-brain topography that requires broader coverage of the cortex with a searchlight analysis. Third, our method is specifically designed to work in each individual’s space, while cSRM decomposes data across subjects into shared and subject-specific transformations, focusing on a communal connectivity space. In summary, although cSRM presents a promising alternative for similar aims, its current implementation precludes it from fulfilling the range of applications for which our method is optimized.”

      With this in mind, I noted several current weaknesses in the paper:

      First, while the enhanced CHA method is a promising update on existing CHA techniques, it is unclear why this particular six-step, iterative approach was adopted. That is: why were six steps chosen over any other number? At present, it is not clear if there is an explicit loss function that the authors are minimizing over their iterations. The relative computational cost of six iterations is also likely significant, particularly compared to previous hyperalignment algorithms. A more detailed theoretical understanding of why six iterations are necessary (or whether other researchers could adopt a variable number according to the characteristics of their data) would significantly improve the transferability of this method.

      In the advanced connectivity hyperalignment implementation, we gradually increased the number of targets. The six steps were not intentionally chosen but were the result of the increase to the maximum number of fine-grained targets, namely single cortical vertices.

      Our datasets were resampled to the cortical mesh with 18,742 vertices across both hemispheres (approximately 3 mm vertex spacing; icoorder 5; 20,484 vertices before removing non-cortical vertices). Step 1 was the classic standard connectivity hyperalignment implementation based on the anatomically-aligned data. Since using dense connectivity targets (e.g., using all 18,742 vertices on the surface) with anatomically-aligned data generates poor functional correspondence across participants (Busch et al., 2021), we used 1,284 vertices (icoorder 3, before removing the medial wall) as connectivity targets in step 1. However, after the first iteration of connectivity hyperalignment, it is beneficial to include more targets for calculating connectivity patterns, and repeated iterations lead to a better solution by gradually aligning the information at finer scales. To better align across participants, we iterated the alignment two more times (steps 2 and 3) with the same 1,284 coarse connectivity targets to ensure improved alignment before increasing the number of targets in the later steps. In step 4, we increased the number of targets to 5,124 (icoorder 4, before removing the medial wall) and iterated twice with this number of targets (steps 4 and 5) before using all vertices as targets. In the final step (step 6), all vertices were used as connectivity targets.
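
      The target counts quoted above follow from the standard vertex count of a subdivided icosahedral mesh, 10 × 4^n + 2 vertices per hemisphere at order n. The short sketch below reproduces the numbers for the six steps; the function name `ico_vertices` and the `schedule` list are illustrative names introduced here, and the 18,742 figure for step 6 is the post-masking cortical count reported in the response rather than something the formula yields.

```python
def ico_vertices(order):
    """Vertices per hemisphere of an icosahedral mesh subdivided `order` times."""
    return 10 * 4 ** order + 2

# (step, icosahedral order) pairs for the six-step procedure described above.
schedule = [(1, 3), (2, 3), (3, 3), (4, 4), (5, 4), (6, 5)]

for step, order in schedule:
    per_hemisphere = ico_vertices(order)
    both = 2 * per_hemisphere  # before removing the medial wall
    print(f"step {step}: order {order}, {per_hemisphere} x 2 = {both} vertices")

# Step 6 uses the 18,742 cortical vertices that remain of the 20,484
# order-5 vertices once non-cortical vertices are removed.
```

      Seen this way, the six steps are three resolutions (orders 3, 4, and 5), with the two coarser resolutions repeated before moving to the next one.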

      It is true that the multiple iteration steps substantially increased the computational cost compared to the classic connectivity hyperalignment, but the improvement in prediction was steady across all datasets, and performance became comparable to that of response hyperalignment, which requires time-locked stimuli. We did not use an explicit loss function in the algorithm, but followed the natural progression of the number of potential connectivity targets in the implementation. On the other hand, the difference between the performance of the improved and the classic connectivity hyperalignment was relatively small (difference of r < 0.05), which indicates the effectiveness of our classic algorithm. Researchers are free to choose the number of iterations and the pace at which the number of targets increases across steps. If computational resources are limited or if a shorter total computational time is the primary priority, using the classic connectivity hyperalignment may be the best option to balance the trade-offs.

      The Materials and Methods section had the details of the implementation (page 22-23):

      “Using dense connectivity targets (e.g., using all 18742 vertices on the surface) with anatomically-aligned data usually generates poor functional correspondence across participants (33). It is, however, beneficial to include more targets for calculating connectivity patterns after the first iteration of connectivity hyperalignment and repeated iterations to lead to a better solution by gradually aligning the information at finer scales.

      We used six steps to further improve the connectivity hyperalignment method. Step 1 was the initial connectivity hyperalignment step as described above that was based on the raw anatomically aligned movie data. The resultant transformation matrices were applied to those movie runs, and the hyperaligned data were then used in step 2 to calculate new connectivity patterns and calculate new transformation matrices. We repeated this procedure iteratively six times and derived transformation matrices for each step. In steps 1, 2, and 3, 642 × 2 (icoorder 3, before removing the medial wall) connectivity targets were defined with 13 mm searchlights. In steps 4 and 5, 2562 × 2 (icoorder 4, before removing the medial wall) connectivity targets were used with 7 mm searchlights to calculate target mean time series. In the final step 6, all 18742 vertices were included as separate connectivity targets, using each vertex’s time series rather than calculating the mean in a searchlight. Each step of this advanced connectivity hyperalignment algorithm increased the prediction performance (Figure 4-figure supplement 2).”

      But to help the readers understand the logic of the advanced connectivity hyperalignment algorithm used in this study, we expanded the discussion section (page 15):

      “Because using dense connectivity targets (e.g., using all vertices as connectivity targets) with anatomically aligned data often leads to suboptimal alignment across participants (33), we started with coarse connectivity targets and gradually increased the number of connectivity targets to form a denser representation of connectivity profiles. The iterations improved the prediction performance step by step, and at the final step (step 6, all vertices were used as connectivity targets) in this analysis, the enhanced CHA generated comparable performance with RHA (Figure 4-figure supplement 4).”

      Second, the existing evaluations for enhanced CHA appear to be entirely based on image-derived correlations. That is, the authors compare the predicted image from CHA with the ground-truth image using correlation. While this provides promising initial evidence, correlation-based measures are often difficult to interpret given their sensitivity to image characteristics such as smoothness. Including Cronbach's alpha reliability as a baseline does not address this concern, as it is similarly an image-based statistic. It would be useful to see additional predictive experiments using frameworks such as time-segment classification, intersubject decoding, or encoding models.

      We appreciate the reviewer’s concern regarding the stability of local correlations in relation to image characteristics. To address this, we conducted an additional analysis using different searchlight sizes (with radii of 10 mm, 15 mm, and 20 mm) to evaluate the predicted category-selective maps, focusing specifically on the Budapest dataset. The local correlations between the predicted category-selective maps (obtained using enhanced CHA) and participants’ own maps based on classic localizer runs were calculated for each searchlight. We averaged these correlations across participants and plotted the resulting maps, as shown in Figure 4-figure supplement 10. Although using a larger searchlight radius is similar to employing a larger smoothing kernel, the results remained relatively stable across different searchlight sizes, particularly in regions selectively responsive to the specific category. This stability suggests that while the evaluation may be influenced by image-related features, the conclusion would remain consistent under varying parameters.
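
      The local-correlation evaluation at several radii can be sketched as below. This is a toy illustration under simplifying assumptions, not the authors' implementation: vertices are random points in a cube rather than a cortical surface, neighborhoods use Euclidean rather than surface-based distance, and the function name `searchlight_correlations` and the simulated maps are hypothetical.

```python
import numpy as np

def searchlight_correlations(coords, predicted, actual, radius, min_vertices=3):
    """Pearson r between predicted and actual maps within a sphere around each vertex."""
    coords = np.asarray(coords, dtype=float)
    r = np.full(len(coords), np.nan)
    for i, center in enumerate(coords):
        in_sphere = np.linalg.norm(coords - center, axis=1) <= radius
        if in_sphere.sum() >= min_vertices:  # skip searchlights with too few vertices
            r[i] = np.corrcoef(predicted[in_sphere], actual[in_sphere])[0, 1]
    return r

# Toy data: 2000 "vertices" in a 60 mm cube, a true map, and a noisy prediction of it.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 60, size=(2000, 3))
actual = rng.standard_normal(2000)
predicted = actual + 0.5 * rng.standard_normal(2000)

for radius in (10, 15, 20):  # radii in mm, as in the analysis described above
    local_r = searchlight_correlations(coords, predicted, actual, radius)
    print(radius, round(float(np.nanmean(local_r)), 3))
```

      In the actual analysis these per-searchlight values would be averaged across participants and displayed on the cortical surface; the sketch only shows the per-searchlight computation.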

      As for the use of enhanced CHA, it serves as an optimized version of the classic CHA, specifically designed for predicting individualized functional topographies. Evaluating prediction performance in our study is based on t-value contrast maps for each participant. Given this, it's unclear how time-segment classification or other decoding/encoding models could be appropriately implemented for performance evaluation. However, prior research from our lab has already established the effectiveness of classic CHA. Specifically, Guntupalli et al. (2018) showed that classic CHA significantly improved intersubject correlations (ISC) of connectivity profiles across the cortex. They also revealed that CHA captured fine-scale variations in connectivity profiles for nearby cortical nodes across participants and led to improved between-subject multivariate pattern classification accuracies (bsMVPC) of movie segments. These findings serve as robust evidence for the effectiveness of classic CHA, laying the groundwork for our enhanced CHA approach.

      We added Figure 4-figure supplement 10 to the supplementary material:

      Addressing these concerns and considering cSRM as a comparison model would significantly strengthen the paper. There are also notable strengths that I would encourage the authors to further pursue. In particular, the authors have access to a unique dataset in which the same Raiders of the Lost Ark stimulus was scanned for participants within the Budapest (SRaiders) dataset as well as non-overlapping participants in the Raiders dataset. Exploring the relative performance for cross-movie prediction within a dataset as compared to a shared movie prediction across datasets is particularly interesting for methods development. I would encourage the authors to explicitly report results in this framework to highlight both this unique testing structure as well as the performance of their enhanced CHA method.

      We appreciate the reviewer's suggestion to examine a shared time-series but non-overlapping participants scenario using the Sraiders and Raiders datasets. However, there are significant differences between the two datasets that preclude such direct comparison. These differences include varying scanning parameters, MRI scanners, localizer types, and data collection procedures. Due to these methodological divergences, the datasets cannot be treated as identical time-series.

      Firstly, the scanning parameters vary considerably. Sraiders was scanned with TR = 1 s (TR/TE = 1000/33 ms, flip angle = 59°, resolution = 2.5 mm³ isotropic voxels, matrix size = 96 × 96, FoV = 240 × 240 mm, multiband acceleration factor = 4, and no in-plane acceleration), and Raiders was scanned with TR = 2.5 s (TR = 2.5 s, TE = 35 ms, flip angle = 90°, matrix size = 80 × 80, FoV = 240 × 240 mm, resolution = 0.938 mm × 0.938 mm × 1.0 mm).

      Secondly, participants in the Sraiders dataset were scanned with a 3 T Siemens Magnetom Prisma MRI scanner with a 32-channel head coil, whereas the Raiders dataset, collected more than 10 years ago, used a 3 T Philips Intera Achieva scanner with an eight-channel head coil.

      Thirdly, the stimulus presentations were different. In the Sraiders dataset, the movie Raiders of the Lost Ark was split into eight parts (~15 min each), and the first four parts were watched outside of the scanner prior to scanning (~56 min). The latter four parts were watched in the scanner (57 min) with audio. In the Raiders dataset, the audio-visual movie was also split into eight parts (~15 min each), but participants watched all eight parts in the scanner with audio (one part per run).

      Fourthly and critically, the two datasets included two types of localizers. The Sraiders dataset included dynamic localizer runs, and the Raiders dataset only contained a static localizer that was similarly designed as in the Forrest dataset.

      Given these four points, it is not suitable to treat the two datasets as identical time-series. The difference in the localizer type is a further issue. The topographies generated from the two types of localizers are dissimilar in many ways. For all categories, the dynamic localizer elicited stronger and broader category-selective activations than the static localizer, and the searchlight analysis showed that the dynamic localizer had higher reliabilities across the cortex, especially in regions that were selectively responsive to the target category. Due to these differences, cross-dataset predictions yielded lower correlations than within-dataset predictions. This is not indicative of methodological failure but reflects diverging topographies activated by different localizers.

      In the manuscript, we have extensively analyzed cross-dataset predictions (Figure 2-figure supplement 1 through Figure 4-figure supplement 4, and Figure 4-figure supplement 6).

      ● Figure 2-figure supplement 1 demonstrates that, despite the limitations of cross-localizer-type evaluation, both R-to-S (Raiders to Sraiders) and S-to-R (Sraiders to Raiders) predictions significantly outperformed surface alignment methods across categories.

      ● Figure 2-figure supplement 2 confirms that the prediction performance remained stable across individual participants, underscoring the robustness of our methodology.

      ● Figure 3-figure supplement 1 & Figure 3-figure supplement 2 display contrast maps generated from both native and alternate localizers, revealing that the maps share similar topographies irrespective of the dataset origin.

      ● Figure 4-figure supplement 1 presents a correlation analysis of local similarities in R-to-S and S-to-R predictions, highlighting particularly strong correlations in the ventral face regions.

      ● Figure 4-figure supplement 2 employs histograms to showcase performance across major cortices and furnishes additional evidence regarding the influence of localizer types on the results.

      ● Figure 4-figure supplement 3 offers a searchlight analysis for other categories, enriching the scope of our investigation.

      ● Figure 4-figure supplement 4 affirms that the advanced CHA is effective in both R-to-S and S-to-R predictions.

      ● Figure 4-figure supplement 6 compares the efficacy of 1-step vs. 2-step prediction methods for R-to-S and S-to-R, showing a clear advantage for the 1-step approach.

      These analyses affirmed that our approach outperforms surface alignment methods. But the inherent limitations in data collection and localizer types preclude a direct exploration of the reviewer’s hypothesis. These complexities necessitate further research to fully validate the proposed scenario.

      Overall, I share the authors' enthusiasm for the potential of cross-movie, cross-dataset prediction, and I believe that methods such as enhanced CHA are likely to significantly improve our ability to make these comparisons in the near future. At present, however, I find that the theoretical and experimental support for enhanced CHA is incomplete. It is therefore difficult to assess how enhanced CHA meets its goals or how successfully other researchers would be able to adopt this method in their own experiments.

      We hope our new analysis and replies addressed the reviewer’s concerns.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, the authors describe an elegant genetic screen for mutants that suppress defects of MCT1 deletions which are deficient in mitochondrial fatty acid synthesis. This screen identified many genes, including that for Sit4. In addition, genes for retrograde signaling factors (Rtg1, Rtg2 and Rtg3), proteins influencing proteasomal degradation (Rpn4, Ubc4) or ribosomal proteins (Rps17A, Rps29A) were found. From this mix of components, the authors selected Sit4 for further analysis. In the first part of the study, they analyzed the effect of Sit4 in the context of MCT1 mutant suppression. This more specific part is very detailed and thorough, and the experiments are well controlled and convincing. The second, more general part of the study focused on the effect of Sit4 on the level of the mitochondrial membrane potential. This part is of high general interest, but less well developed. Nevertheless, this study is very interesting as it shows for the first time that phosphate export from mitochondria is of general relevance for the membrane potential even in wild type cells (as long as they live from fermentation), that the Sit4 phosphatase is critical for this process and that the modulation of Sit4 activity influences processes relying on the membrane potential, such as the import of proteins into mitochondria. However, some aspects should be further clarified.

      1) It is not clear whether Sit4 is only relevant under fermentative conditions. Does Sit4 also influence the membrane potential in respiring cells? Fig. S2D shows the membrane potential in glucose and raffinose. Both carbon sources lead to fermentative growth. The authors should also test whether Sit4 levels influence the membrane potential when cells are grown under respiratory conditions, such as in ethanol, lactate or glycerol. Even if deletions of Sit4 affect respiration, mutants with altered activity can be easily analyzed.

      sit4Δ cells fail to grow on nonfermentable media, as shown by us (Figure 2—figure supplement 1C) and others (Arndt et al., 1989; Dimmer et al., 2002; Jablonka et al., 2006). In our opinion, the exact reason is unclear, but there is an interesting observation that addition of aspartate can partially restore growth on ethanol (Jablonka et al., 2006). Although this sit4Δ defect has not been thoroughly investigated, an early study speculated that it could be related to the cAMP-PKA pathway (Sutton et al., 1991). This study pointed out genetic interactions of SIT4 with multiple genes in the cAMP-PKA pathway (Sutton et al., 1991). In addition, sit4Δ cells have phenotypes similar to those of cAMP-PKA null mutants, such as glycogen accumulation, caffeine resistance, and failure to grow on nonfermentable media (Sutton et al., 1991). In our literature search, we did not find any sit4Δ mutants that can grow on nonfermentable media.

      2) The authors should give a name to the pathway shown in Fig. 4D. This would make it easier to follow the text in the results and the discussion. This pathway was proposed and characterized in the 90s by George Clark-Walker and others, but never carefully studied on a mechanistic level. Even if the flux through this pathway cannot be measured in this study, the regulatory role of Sit4 for this process is the most important aspect of this manuscript.

      We now refer to this mechanism as the mitochondrial ATP hydrolysis pathway.

      3) To further support their hypothesis, the authors should show that deletion of Pic1 or Atp1 wipes out the effect of a Sit4 deletion. In these petite-negative mutants, the phosphate export cycle cannot be carried out and thus, Sit4, should have no effect.

      The mitochondrial phosphate transport activity is electroneutral, as it also pumps a proton together with inorganic phosphate. The F1 subunit of the ATP synthase (Atp1 and Atp2) is widely suggested in the literature to be responsible for the ATP hydrolysis. We performed tetrad dissection to generate atp1Δ or atp2Δ in the pho85Δ background. After streaking single colonies onto a fresh plate, we noticed that atp1Δ mct1Δ and atp2Δ mct1Δ cells are lethal, and that knocking out PHO85 rescued this synthetic lethality. It is not surprising that atp1Δ mct1Δ or atp2Δ mct1Δ cells are lethal, since the F1 subunit is important for generating a minimal MMP in mct1Δ cells when the ETC is absent (i.e., rho0 cells). However, knocking out PHO85 can generate MMP independently of the F1 subunit of ATP synthase, as suggested by the viable atp1Δ mct1Δ pho85Δ and atp2Δ mct1Δ pho85Δ cells. In theory, many ATPases in the mitochondrial matrix could hydrolyze ATP for the ADP/ATP carrier to generate MMP. However, we do not currently know exactly which ATPase(s) is activated by phosphate starvation. These data are now included as Figure 5—figure supplement 1F-G.

      4) What is the relevance of Sit4 for the Hap complex which regulates OXPHOS gene expression in yeast? The supplemental table suggests that Hap4 is strongly influenced by Sit4. Is this downstream of the proposed role in phosphate metabolism or a parallel Sit4 activity? This is a crucial point that should be addressed experimentally.

      To investigate the role of the Hap complex in MMP generation in sit4Δ cells, we overexpressed and knocked out HAP4, the catalytic subunit of the Hap complex, separately in wild-type and sit4Δ cells. We confirmed the HAP4 overexpression by the enriched abundance of ETC complexes as shown in the BN-PAGE (Figure 2—figure supplement 1E). However, we did not observe any rescue of ETC or ATP synthase in mct1Δ cells when HAP4 was overexpressed. The enriched level of ETC complexes caused by HAP4 overexpression is not sufficient to rescue the MMP (Figure 2—figure supplement 1F).

      Next, we knocked out HAP4 in sit4Δ cells. Knocking out SIT4 could still increase MMP in hap4Δ cells with a much-reduced magnitude, which phenocopied ETC subunit and RPO41 deletion in sit4Δ cells (Figure 2—figure supplement 1G).

      In conclusion, the Hap complex is involved in the MMP increase when SIT4 is absent. However, it is not sufficient to increase MMP by overexpressing HAP4. The Hap complex discussion is now included in the manuscript, and the data is presented as Figure 2—figure supplement 1E-G.

      5) The authors use the accumulation of Ilv2 precursors as proxy for mitochondrial protein import efficiency. Ilv2 was reported before as a protein which, if import into mitochondria is slow, is deviated into the nucleus in order to be degraded (Shakya,..., Hughes. 2021, Elife). Is it possible that the accumulation of the precursor is the result of a reduced degradation of pre-Ilv2 in the nucleus rather than an impaired mitochondrial import? Since a number of components of the ubiquitin-proteasome system were identified with Sit4 in the same screen, a role of Sit4 in proteasomal degradation seems possible. This should be tested.

      We thank the reviewer for pointing out this potential caveat with our Ilv2-FLAG reporter. With limited search and tests, we could not find another reporter that behaves like Ilv2-FLAG. The reason Ilv2-FLAG is a well-suited reporter for this study is that, in wild-type cells, Ilv2-FLAG is not 100% imported. Therefore, we could demonstrate that mitochondria with higher MMP import more efficiently. Unfortunately, all of the other mitochondrial proteins that we tested were imported efficiently in wild-type cells. To identify other suitable mitochondrial proteins that behave like Ilv2-FLAG, we would need to conduct a more comprehensive screen.

      To address the concern of the involvement of protein degradation in obscuring the interpretation of Ilv2-FLAG import, we performed two experiments. First, we measured the proteasomal activity in wild-type and our mutants using a commercial kit (Cayman). We did not observe a statistically significant difference in 20S proteasomal activity between wild-type and sit4Δ cells.

      In the second experiment, we reduced the MMP of sit4Δ cells using CCCP treatment and measured the Ilv2-FLAG import. We first treated sit4Δ cells with different doses of CCCP for six hours and measured their MMP. sit4Δ cells treated with 75 µM CCCP had an MMP comparable to that of wild-type cells. When we treated sit4Δ cells with higher concentrations of CCCP, most of the cells did not survive after six hours. Next, we performed the Ilv2-FLAG import assay. We observed a similar level of unimported Ilv2-FLAG (marked with *) in sit4Δ cells treated with 75 µM CCCP. This result confirms that sit4Δ cells have an Ilv2-FLAG turnover mechanism and activity similar to those of wild-type cells, because when we lower the MMP in the sit4Δ background we observe a similar level of unimported Ilv2-FLAG. We thus feel confident in concluding that the Ilv2-FLAG import results are indeed an accurate proxy for MMP level. These data are now included as Figure 1—figure supplement 1H-J in the manuscript.

      Author response image 1.

      Reviewer #2 (Public Review):

      This study reports interesting findings on the influence of a conserved phosphatase on mitochondrial biogenesis and function. In its absence, many nucleus-encoded mitochondrial proteins, among which those involved in ATP generation, are expressed much better than in normal cells. In addition to a better understanding of the mechanisms that regulate mitochondrial function, this work may help in developing therapeutic strategies for diseases caused by mitochondrial dysfunction. However, there are a number of issues that need clarification.

      1) The rationale of the screening assay to identify genes required for the gene expression modifications observed in the mct1 mutant is not clear. Indeed, after crossing with the gene deletion library, the cells become heterozygous for the mct1 deletion and should no longer be deficient in mtFAS. Please clarify this and, if needed, adjust Figure S1D to indicate that the mated cells are heterozygous for the mct1 and xxx mutations.

      We updated the methods section and the graphic for the genetic screen to clarify these points within the SGA workflow overview. After we created the heterozygote by mating mct1Δ cells with the individual KO cells in the collection, these diploids underwent sporulation and selection for the desired double KO haploid. As a result, the luciferase assay was performed in haploid cells with MCT1 and one additional non-essential gene deleted.

      2) The tests shown in Fig. S1E should be repeated on individual subclones (at least 100) obtained after plating for single colonies a glucose culture of mct1 mutant, to determine the proportion of cells with functional (rho+) mtDNA in the mct1 glucose and raffinose cultures. With for instance a 50% proportion of rho- cells, this could substantially influence the results of the analyses made with these cells (including those aiming to evaluate the MMP).

      We agree that this would provide a more confident estimate for population-level characterization of these colonies. It is important to note that we randomly chose 10 individual subclones, and 100% of these colonies were verified to be rho+. This suggests that the population has functional mtDNA, and we thus felt confident in the identity of our populations.
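
      As a rough aid for reading these numbers, the sketch below asks how large a rho- fraction would still be compatible with observing only rho+ colonies in a sample of a given size. It is an illustration added for the reader, not an analysis from the manuscript; it assumes colonies are sampled independently from a large population, and the function name `max_undetected_fraction` is hypothetical.

```python
def max_undetected_fraction(n_clean, confidence=0.95):
    """Upper bound on the rho- fraction when all n_clean sampled colonies are rho+.

    If a fraction f of cells were rho-, the chance that every one of n
    independently sampled colonies is rho+ equals (1 - f) ** n. Setting
    that chance to 1 - confidence and solving for f gives the bound.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_clean)

# Sample sizes mentioned in the exchange: 10 tested, 100 requested.
bound_10 = max_undetected_fraction(10)
bound_100 = max_undetected_fraction(100)
print(f"10 of 10 rho+:   rho- fraction below {bound_10:.3f} at 95% confidence")
print(f"100 of 100 rho+: rho- fraction below {bound_100:.3f} at 95% confidence")
```

      Under these assumptions, ten clean colonies bound the rho- fraction at roughly a quarter, while one hundred would bound it at about three percent.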

      3) The mitochondria area in mct1 cells (Fig.S1G) does not seem to be consistent with the tests in Fig. 1C. that indicate a diminished mitochondrial content in mct1 cells vs wild-type yeast. A better estimate (by WB for instance) of the mitochondrial content in the analyzed strains would enable to better evaluate MMP changes monitored with Mitotracker since the amount of mitochondria in cells correlate with the intensity of the fluorescence signal.

      As this reviewer pointed out, we quantified mitochondrial area based on Tom70-GFP signal. This measurement is quantified as mitochondrial area over cell size. Cell size is an important parameter when measuring organelle size, as most organelles scale up and down with cell size. mct1Δ cells generally have a smaller cell size than WT cells. Therefore, the mitochondrial area of mct1Δ cells was not significantly different from that of WT cells when scaled to cell size. We believe this is the best method to compare mitochondrial area. As for quantifying MMP from these microscopy images, we measured the average MitoTracker Red fluorescence intensity of each mitochondrion defined by Tom70-GFP. This method inherently normalizes for mitochondrial area when quantifying MMP.
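
      The two normalizations described here can be illustrated with a small sketch. It is a simplification under stated assumptions, not the authors' image-analysis code: masks are taken as already segmented boolean arrays, intensity is averaged over the whole Tom70-GFP mask rather than per mitochondrion, and `mito_metrics` is a hypothetical name.

```python
import numpy as np

def mito_metrics(cell_mask, mito_mask, mitotracker):
    """Return (mitochondrial area / cell area, mean dye intensity in the mito mask)."""
    cell_mask = np.asarray(cell_mask, dtype=bool)
    # Only count mitochondrial pixels that lie inside the cell outline.
    mito_mask = np.asarray(mito_mask, dtype=bool) & cell_mask
    area_fraction = mito_mask.sum() / cell_mask.sum()
    mean_intensity = float(np.asarray(mitotracker, dtype=float)[mito_mask].mean())
    return area_fraction, mean_intensity

# Toy large cell: 100 pixels, 20 of them mitochondrial.
cell_a = np.ones((10, 10), dtype=bool)
mito_a = np.zeros((10, 10), dtype=bool)
mito_a[0:2, :] = True
signal_a = np.where(mito_a, 50.0, 5.0)

# Toy small cell: 20 pixels, 4 of them mitochondrial, same dye intensity.
cell_b = np.ones((5, 4), dtype=bool)
mito_b = np.zeros((5, 4), dtype=bool)
mito_b[0, :] = True
signal_b = np.where(mito_b, 50.0, 5.0)

frac_a, mean_a = mito_metrics(cell_a, mito_a, signal_a)
frac_b, mean_b = mito_metrics(cell_b, mito_b, signal_b)
```

      In this toy case the small cell has five times less mitochondrial area in absolute terms, yet both cells have the same area fraction and the same mean intensity, which is the sense in which a per-mask average does not depend on how much mitochondrial area a cell has.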

      4) Page 12: "These data demonstrate that loss of SIT4 results in a mitochondrial phenotype suggestive of an enhanced energetic state: higher membrane potential, hyper-tubulated morphology and more effective protein import." Furthermore, the sit4 mutant shows higher levels of OXPHOS complexes compared to WT yeast.

      Despite these beneficial effects on mitochondria, the sit4 deletion strain fails to grow on respiratory substrates. It would be good to know whether the authors have some explanation for this apparent contradiction.

      We agree that this was initially puzzling. We provide a more complete explanation above (see our response to reviewer #1, major concern #1). Briefly, the growth deficiency of sit4Δ cells in non-fermentable media was reported and studied by multiple groups (Arndt et al., 1989; Dimmer et al., 2002; Jablonka et al., 2006). These studies seem to indicate that sit4Δ cells contain more ETC complexes and have a higher OCR but cannot respire on a non-fermentable carbon source. However, we do not think there is yet a clear explanation for this phenotype. One interesting observation reported is that the addition of aspartate partly restores growth on ethanol (Jablonka et al., 2006). One early study speculates that this defect could be related to the cAMP-PKA pathway. Sutton et al. pointed out genetic interactions between SIT4 and multiple genes in the cAMP-PKA pathway (Sutton et al., 1991). In addition, sit4Δ cells have phenotypes similar to those of cAMP-PKA null mutants, such as glycogen accumulation, caffeine resistance, and failure to grow on non-fermentable media. However, to keep this manuscript succinct, we opted to stay focused on MMP.

      Reviewer #3 (Public Review):

      In this study, the authors investigate the genetic and environmental causes of elevated Mitochondrial Membrane Potential (MMP) in yeast, and also some physiological effects correlated with increased MMP.

      The study begins with a reanalysis of transcriptional data from a yeast mutant lacking the gene MCT1, whose deletion has been shown to cause defects in mitochondrial fatty acid synthesis. The authors note that in raffinose, mct1del cells, unlike WT cells, fail to induce expression of many genes that code for subunits of the Electron Transport Chain (ETC) and ATP synthase. The deletion of MCT1 also causes induction of genes involved in acetyl-CoA production after exposure to raffinose. The authors therefore conduct a screen to identify mutants that suppress the induction of one of these acetyl-CoA genes, Cit2. They then validate the hits from this screen to see which of their suppressor mutants also reduce expression in four other genes induced in a mct1del strain. This yielded 17 genes that abolished induction of all 5 genes tested in an mct1del background during growth on raffinose.

      The authors chose to focus on one of these hits, the gene coding for the phosphatase SIT4 (related to human PP6) which also caused an increase in expression of two respiratory chain genes. The authors then investigated MMP and mitochondrial morphology in strains containing SIT4 and MCT1 deletions and surprisingly saw that sit4del cells had highly elevated MMP, more reticular mitochondria, and were able to fully import the acetolactate synthase protein Ilv2p and form ETC and ATP synthase complexes, even in cells with an mct1del background, rescuing the low MMP, fragmented mitochondria, low import of Ilv2 and an inability to form ETC and ATP synthase complexes phenotypes of the mct1del strain. Surprisingly, the authors find that even though MMP is high and ETC subunits are present in the sit4del mct1del double deletion strain, that strain has low oxygen consumption and cannot grow under respiratory conditions, indicating that the elevated MMP cannot come from fully functional ETC subunits. The authors also observe that deleting key subunits of ETC complex III (QCR2) and IV (COX5) strongly reduced the MMP of the sit4del mutant, which would suggest that the majority of the increase in MMP of the sit4del mutant was dependent on a partially functional ETC. The authors note that there was still an increase in MMP in the qcr2del sit4del and cox4del sit4del strains relative to qcr2del and cox4del strains, indicating that some part of the increase in MMP was not dependent on the ETC.

      The authors dismiss the possibility that the increase in MMP could have been through the reversal of ATP synthase because they observe that inhibition of ATP synthase with oligomycin led to an increase of MMP in sit4del cells, indicating that ATP synthase is operating in a forward direction in sit4del cells.

      Noting that genes for phosphate starvation are induced in sit4del cells, the authors investigate the effects of phosphate starvation on MMP. They found that phosphate starvation caused an increase in MMP and increased Ilv2p import even in the absence of a mitochondrial genome. They find that inhibition of the ADP/ATP carrier (AAC) with bongkrekic acid (BKA) abolishes the increase of MMP in response to phosphate starvation. They speculate that phosphate starvation causes an increase in MMP through the import and conversion of ATP to ADP and subsequent pumping of ADP and inorganic phosphate out of the mitochondria.

      They further show that MMP is also increased when the cyclin dependent kinase PHO85 which plays a role in phosphate signaling is deleted and argue that this indicates that it is not a decrease in phosphate which causes the increase in MMP under phosphate starvation, but rather the perception of a decrease in phosphate as signalled through PHO85. Unlike in the case of SIT4 deletion, the increase in MMP caused by the deletion of pho85 is abolished when MCT1 is deleted.

      Finally they show an increase in MMP in immortalized human cell lines following phosphate starvation and treatment with the phosphate transporter inhibitor phosphonoformic acid (PFA). They also show an increase in MMP in primary hepatocytes and in midgut cells of flies treated with PFA.

      The link between phosphate starvation and elevated MMP is an important and novel finding and the evidence is clear and compelling. Based on their experiments in various mammalian contexts, this link appears likely to be generalizable, and they propose and begin to test an interesting hypothesis for how MMP might occur in response to phosphate starvation in the absence of the Electron Transport Chain.

      The link between phosphate starvation and deletion of the conserved phosphatase SIT4 is also interesting and important, and while the authors' experiments and analysis suggest some connection between the two observations, that connection is still unclear.

      Major points

      Mitotracker is a great fluorescent dye, but it measures membrane potential only indirectly. There is a danger that when cells change growth rates or ion concentrations, or when the pH changes, all MMP-indicating dyes change in fluorescence: their signal is confounded. A change in phosphate levels can possibly do both, altering pH and ion concentrations. Because all conclusions of the manuscript are based on a change in MMP, it would be a great precaution to use a dye-independent measure of membrane potential, and confirm at least some key results.

      Mitochondrial MMP does strongly influence amino acid metabolism, and indeed the SIT4 knockout has a quite striking amino acid profile, with histidine, lysine, arginine, and tyrosine being increased in concentration. http://ralser.charite.de/metabogenecards/Chr_04/YDL047W.html Could this amino acid profile support the conclusions of the authors? At least lysine and arginine are down in petites due to a lack of membrane potential and iron sulfur cluster export, and here they are up. Along these lines, according to the same data resource, the knock-outs CSR2, ASF1, SSN8, YLR0358 and MRPL25 share the same metabolic profile. Due to limited time I did not re-analyse the data provided by the authors, but it would be worth checking if any of these genes did come up in the screens of the authors.

      We took the mutants within the same cluster as SIT4 shown in this paper from the deletion collection and measured their MMP. ylr358cΔ cells have a high MMP similar to that observed in sit4Δ cells. However, this gene has an as yet undefined function. Beyond YLR358C, we did not observe similar MMP increases in other gene deletions from this panel, which does not support the notion that amino acids such as histidine, lysine, arginine, or tyrosine play a determining role in driving MMP.

      The media condition and strain used in the suggested paper are very different from what we used in our study. Instead of growing prototrophic cells in minimal media without any amino acids, we used auxotrophic yeast strains and grew them in media containing complete amino acids. So far, none of the other defects or signaling associated with SIT4 deletion could influence MMP as much as the phosphate signaling. We interpret these data to support the hypothesis that the MMP observation in sit4Δ cells is connected with the phosphate signaling as illustrated by the second half of the story in our manuscript.

      Author response image 2.

      One important claim in the manuscript attempts to explain a mechanism for the MMP increase in response to phosphate starvation which is independent of the ETC and ATP synthase.

      It seems to me the only direct evidence to support this claim is that inhibition of the AAC with BKA stops the increase of mitotracker fluorescence in response to phosphate starvation in both WT and rho0 cells (Figs 4B and 4C). It would strengthen the paper if the authors could provide some orthogonal evidence.

      This comment is similar to the one raised by reviewer #1 (major concern #3). We refer the reviewer to our discussion and the new data above. Briefly, we do not think the F1 subunit is responsible for the ATP hydrolysis activity that generates MMP under phosphate-depleted conditions. We believe there are additional ATPase(s) in the mitochondrial matrix that can be coupled to the ADP/ATP carrier for MMP generation during phosphate starvation. However, we have not identified the relevant ATPase(s) at this point, and it is likely that multiple ATPases could contribute to this activity.

      Introduction/Discussion: The authors might want to make the reader aware that the 'reversal' of the ATP synthase directionality, i.e. ATP hydrolysis by the ATP synthase as a mechanism to create a membrane potential (in petites), has always been a provocative idea, but one that thus far could never be fully substantiated. Indeed, some people who are very familiar with the topic are skeptical that this happens. For instance, Vowinckel et al. 2021 (PMID: 34799698) measured precise carbon balances for petite cells and found no evidence for a futile cycle: petites grow slower, but accumulate the same biomass from glucose as petites that re-evolve a fast growth rate. Perhaps the manuscript could be updated accordingly.

      We thank the reviewer for pointing out this additional relevant study. We have rephrased the referenced sentence in the introduction. The MMP generation in phosphate starvation is independent of the F1 portion of ATP synthase. Therefore, our data neither support nor refute either of these arguments.

      In the introduction and conclusion there is discussion of MMP set points. In particular the authors state:

      "Critically, we find that cells often prioritize this MMP setpoint over other bioenergetic priorities, even in challenging environments, suggesting an important evolutionary benefit."

      This does not seem to be consistent with the central finding of the manuscript that MMP changes under phosphate starvation. MMP doesn't seem so much to have a 'set point' but rather be an important physiological variable that reacts to stimuli such as phosphate starvation.

      The reviewer raises a rational alternative hypothesis to the one that we have proposed. In reality, both of these are complete speculations to explain the data and we can’t think of any way to test the evolutionary basis for the mechanisms that we describe. We recognize that untested/untestable speculative arguments have limitations and there are viable alternative hypotheses. We have softened our language to ensure that it is clear that this is only a speculation.

      The authors suggest that deletion of Pho85 causes an increase in MMP because of cellular signaling. However, they also state in the conclusion:

      "Unlike phosphate starvation, the pho85D mutant has elevated intracellular phosphate concentrations. This suggests that the phosphate effect on MMP is likely to be elicited by cellular signaling downstream of phosphate sensing rather than some direct effect of environmental depletion of phosphate on mitochondrial energetics."

      The authors should cite the study that shows deletion of PHO85 causes increased intracellular phosphate concentrations. It also seems possible that the 'cellular signaling' that causes the increase in MMP could be a result of this increase in intracellular phosphate concentrations, which could constitute a direct effect of an environmental overload of phosphate on mitochondrial energetics.

      We have now cited the literature that shows higher intracellular phosphate in pho85Δ cells (Gupta et al., 2019; Liu et al., 2017). Depleting phosphate in the media drastically reduced intracellular phosphate concentration, which is the opposite of the situation in pho85Δ cells. Nevertheless, we observed higher MMP in both situations. We concluded from these two observations that the increase in MMP is a response to the signaling activated by phosphate depletion rather than to the intracellular phosphate abundance.

      Related to this point, in the conclusion, the authors state:

      "We now show that intracellular signaling can lead to an increased MMP even beyond the wild-type level in the absence of mitochondrial genome."

      In sum, the data show that signaling is important here, but signaling alone is only the message, not the biophysical process that creates a membrane potential. The authors could then revise this slightly.

      We have rephrased this sentence as suggested, which now reads “We now show that intracellular signaling triggers a process that can lead to an increased MMP even beyond the wild-type level in the absence of mitochondrial genome”.

      The authors state in the conclusion that

      "We first made the observation that deletion of the SIT4 gene, which encodes the yeast homologue of the mammalian PP6 protein phosphatase, normalized many of the defects caused by loss of mtFAS, including gene expression programs, ETC complex assembly, mitochondrial morphology, and especially MMP (Fig. 1)"

      The data shown, though, indicate that, rather than normalizing the mtFAS defect in terms of MMP, deletion of SIT4 causes a huge increase (and a departure from normality) whether or not mct1 is present (Fig 1D).

      We changed the word “normalized” to “reversed”. In the discussion section, we also emphasized that many of these increases are independent of mitochondrial dysfunction induced by loss of mtFAS.

      The language "SIT4 is required for both the positive and negative transcriptional regulation elicited by mitochondrial dysfunction" feels strong. SIT4 seems to influence positive transcriptional regulation in response to mitochondrial dysfunction caused by MCT1 deletion (but may not be the only thing as there appears to be an increase in CIT2 expression in a sit4del background following a further deletion of MCT1). In terms of negative regulation, SIT4 deletion clearly affects the baseline, but MCT1 deletion still causes down regulation of both examples shown in Fig 1B, showing that negative transcriptional regulation can still occur in the absence of SIT4. The authors might consider showing fold change of expression as they do in later figures (Figs 4B and C) to help the reader evaluate the quantitative changes they demonstrate.

      We now display the fold change as suggested. This sentence now reads “These data suggest that SIT4 positively and negatively influences transcriptional regulation elicited by mitochondrial dysfunction”.

      The authors induce phosphate starvation by adding increasing amounts of potassium phosphate monobasic at a pH of 4.1 to phosphate dropout media supplemented with potassium. The authors did well to avoid confounding effects of removing potassium. The final pH of YNB is typically around 5.2. Is it possible that the authors are confounding a change in pH with phosphate starvation? One would expect the media in the phosphate starvation condition to have a higher pH than the phosphate replacement or control media. Is a change in pH possibly a confounding factor when interpreting phosphate starvation? Perhaps the authors could quantify the pH of the media they use for the experiment to understand how much of a factor that could be. One needs to be careful with MitoTracker and any other fluorescent dye when pH changes. Albeit having constraints of its own, MitoLoc, as a protein rather than a small-molecule marker of MMP, might be a good complement.

      We followed the protocol used by many other studies that depleted phosphate in the media. We and others adjust the media without inorganic phosphate to a pH of 4.1 because that is the pH of phosphate monobasic. From there, we could add phosphate monobasic to create +Pi media without changing the media pH. Therefore, media containing different concentrations of phosphate all have exactly the same pH. We now emphasize in the manuscript that all media containing different levels of inorganic phosphate have the same pH, to address this concern (see page 18).

      Even though all media have the same pH, we also provide complementary data from a parallel approach that measures MMP by assessing mitochondrial protein import, as demonstrated previously with Ilv2-FLAG, which shares the same principle as MitoLoc.

      References

      Arndt, K. T., Styles, C. A., & Fink, G. R. (1989). A suppressor of a HIS4 transcriptional defect encodes a protein with homology to the catalytic subunit of protein phosphatases. Cell, 56(4), 527–537. https://doi.org/10.1016/0092-8674(89)90576-X

      Dimmer, K. S., Fritz, S., Fuchs, F., Messerschmitt, M., Weinbach, N., Neupert, W., & Westermann, B. (2002). Genetic basis of mitochondrial function and morphology in Saccharomyces cerevisiae. Molecular Biology of the Cell, 13(3), 847–853. https://doi.org/10.1091/mbc.01-12-0588

      Gupta, R., Walvekar, A. S., Liang, S., Rashida, Z., Shah, P., & Laxman, S. (2019). A tRNA modification balances carbon and nitrogen metabolism by regulating phosphate homeostasis. eLife, 8, e44795. https://doi.org/10.7554/eLife.44795

      Jablonka, W., Guzmán, S., Ramírez, J., & Montero-Lomelí, M. (2006). Deviation of carbohydrate metabolism by the SIT4 phosphatase in Saccharomyces cerevisiae. Biochimica et Biophysica Acta (BBA) - General Subjects, 1760(8), 1281–1291. https://doi.org/10.1016/j.bbagen.2006.02.014

      Liu, N.-N., Flanagan, P. R., Zeng, J., Jani, N. M., Cardenas, M. E., Moran, G. P., & Köhler, J. R. (2017). Phosphate is the third nutrient monitored by TOR in Candida albicans and provides a target for fungal-specific indirect TOR inhibition. Proceedings of the National Academy of Sciences, 114(24), 6346–6351. https://doi.org/10.1073/pnas.1617799114

      Sutton, A., Immanuel, D., & Arndt, K. T. (1991). The SIT4 protein phosphatase functions in late G1 for progression into S phase. Molecular and Cellular Biology, 11(4), 2133–2148.

    1. Author Response

      Reviewer #2 (Public Review):

      Weaknesses:

      1) The authors demonstrate that Isw1 has a role in responding to antifungals in Cryptococcus. However, it is not clear if changes in Isw1 stability represent a general response to stress. This study would have benefited from experiments to test: (1) if levels of Isw1 change in response to other stressors (e.g., heat, osmotic, or oxidative stress) and (2) if loss of Isw1 impacts resistance to other stressors.

      A series of experiments were conducted to illustrate and measure phenotypic traits associated with virulence. These traits encompassed capsule formation, melanin synthesis, cell proliferation under stressful conditions, and Isw1 expression levels in response to diverse environmental stimuli. Please see Figure 3a, 3b, 3c, Figure 3-figure supplement 1 and line 237-241.

      2) The authors demonstrate a critical role in the acetylation of K97 and ubiquitination of K441 in regulating Isw1 stability. Additionally, this study shows that K113 is also likely involved in this process. However, it appears that K113 can be either acetylated or ubiquitinated, and it is, thus, less clear if one of the two modifications or both modifications is critical at this residue. Additional experiments may be required to answer this question. This study would have benefited from an additional discussion on the results related to the modification of K113.

      We are grateful for this insightful comment on the K113 site. Our mass spectrometry data revealed both acetylation and ubiquitination at K113, which suggests that a proportion of Isw1 is acetylated at this site while another proportion is ubiquitinated. To analyze the function of K113, we generated triple, double, and single mutations at positions K89, K97, and K113, using K-to-R (mimicking deacetylation) and K-to-Q (mimicking acetylation) substitutions. At K113, the K-to-R mutation represents the deacetylation and deubiquitylation status, while the K-to-Q mutation represents the acetylation and deubiquitylation status. Neither the K-to-R nor the K-to-Q single mutation of K113 produced any discernible drug resistance phenotype. This finding suggests that, within the physiological context of the Isw1 protein, both post-translational modifications (PTMs) of K113 have minimal or no impact on the regulation of drug resistance. This is because the acetylation of K97 imitates the process of ubiquitination of Isw1, hence reducing the interaction between Isw1 and Cdc4, which is an E3 ligase. The ubiquitination of K113 therefore does not play a crucial role in regulating Isw1 protein stability when K97 is completely acetylated. Nevertheless, upon deacetylation of K97, we observed a notable increase in the abundance of Isw1 protein when K113 is substituted with R. This finding strongly supports the notion that ubiquitination of K113 plays a crucial role in maintaining the stability of the Isw1 protein.
Hence, when K97 is acetylated, the PTMs of K113 are not required for maintaining Isw1 protein levels. However, when K97 is deacetylated, the ubiquitination of K113 becomes crucial in regulating protein stability. Given the intricate PTM regulation observed at the K113 site, it would be advantageous to generate antibodies specific to K113ac and K113ub in order to comprehensively investigate the functional role of K113 in these regulatory processes. However, antibodies targeting site-specific ubiquitination are rarely reported in the literature. We regret any confusion that may have arisen from the previous remark and have revised the manuscript to address this issue. Please refer to lines 485-500.

      3) The authors demonstrate that overexpression of ISW1 in select clinical isolates of Cryptococcus increases sensitivity to antifungals. However, these experiments would have benefited from additional controls, such as including overexpression of ISW1 in the wild-type strain (H99) and antifungal-sensitive isolate (CDLC120).

      In response to your concern, we successfully generated the strains as required. In the revised manuscript, we demonstrated that the overexpression of the stable variant of Isw1 in H99 and CDLC120 strains induces heightened susceptibility to antifungal drugs. Please see Figure 8e, 8i and line 404-413.

      Reviewer #3 (Public Review):

      1) ISWI chromatin remodellers are well-characterised in many organisms. How many ISWI proteins does Cryptococcus contain? Why did the authors focus on ISWI?

      We are grateful for this comment. Isw1 was identified in a follow-up to our previously published work (Li Y, 2019), in which the acetylome of C. neoformans was comprehensively analyzed and a series of knockout strains was created to investigate the relationship between fungal pathogenicity and acetylation. The isw1 mutant was found to be a modifier of drug resistance. Fungal paralogs of ISW genes were first identified in Saccharomyces cerevisiae, a yeast that has undergone whole genome duplication, which gave rise to the two paralogs Isw1 and Isw2 (Kellis M, 2004; Tsukiyama T, 1999; Wolfe KH, 1997). Because C. neoformans has not undergone this whole genome duplication event, its genome encodes only one copy of the ISW gene. Please see lines 129-134.

      2) What is the ISWI protein complex(es)? The Mass-Spec analysis should reveal this.

      Prior research in Saccharomyces cerevisiae has shown that the ISWI complex comprises several subunits, namely Isw1, Ioc genes, Itc1, Chd1, and Sua7 (Mellor J, 2004; Smolle M, 2012; Sugiyama and Nikawa, 2001; Vary JC Jr, 2003; Yadon AN, 2013). Upon a thorough examination of the C. neoformans genome, we were unable to identify a similar IOC gene family. This absence likely reflects an evolutionary loss of the IOC gene family in C. neoformans, as suggested on the FungiDB website. However, C. neoformans has Itc1, Chd1, and Sua7. While we agree that Mass-Spec data can elucidate potential protein-protein interactions and aid in identifying subunits of the ISWI complex, the PTM Mass-Spec methodology is employed solely to identify potential sites of protein modification. To comprehensively investigate the cryptococcal ISWI complex, we therefore performed a standard Isw1-Flag protein immunoprecipitation followed by Mass-Spec analysis. A total of 22 proteins that interact with Isw1 were found in our experimental data, 11 of which have previously been reported to be associated with regulatory networks that include Isw1. In the mass spectrometry results, Itc1 was co-immunoprecipitated with Isw1. Although the Mass-Spec analysis did not reveal the presence of Chd1 and Sua7, co-IP and immunoblotting demonstrated that Chd1 can be co-immunoprecipitated with Isw1. However, no interaction between Isw1 and Sua7 was detected with any of these methods. In brief, the cryptococcal ISWI regulatory machinery is distantly related to that of S. cerevisiae. Please see Figure 2 and lines 206-219.

      3) Is Cryptococcus ISWI a transcriptional activator or repressor?

      We regret the erroneous description of Isw1 in the previous version of the manuscript. Isw1 was misclassified as a transcriptional regulator, whereas it functions as a chromatin remodeler; the text has been revised accordingly. In the revised manuscript, we present a comprehensive transcriptome analysis of the isw1Δ strain with and without FLC treatment. This analysis offers valuable insights into the gene regulatory patterns associated with Isw1. In our dataset, Isw1 negatively regulates the expression of genes that encode drug pumps, while positively regulating the expression of genes that are essential for 5-FC resistance. Moreover, the ChIP-PCR study demonstrated the binding of Isw1 to the promoter regions of genes of interest. Hence, the chromatin remodeler Isw1 has a dual role: it facilitates the activation of certain genes and suppresses the expression of others, in response to different forms of drug resistance. Please see lines 142-153.

      4) Is ISWI function in drug resistance linked to its chromatin remodelling activity?

      To investigate whether the chromatin activity of Isw1 contributes to the modulation of multidrug resistance, we conducted protein truncation experiments. Specifically, we deleted the DNA binding domain, the helicase domain, and the SNF2 domain, which have previously been shown to regulate Isw1 chromatin activity in the model organism S. cerevisiae (Grune T, 2003; Mellor J, 2004; Pinskaya M, 2009; Rowbotham SP, 2011). The new data demonstrate that all truncation variants of Isw1 had a growth phenotype consistent with that of the deletion strain isw1Δ. In addition, the levels of gene expression in these strains were similar to those in the deletion strain isw1Δ. This finding provides evidence that the drug resistance mechanism depends on these critical domains involved in modifying chromatin activities. Moreover, the Isw1-Flag strain was used in chromatin immunoprecipitation and PCR experiments, which revealed that Isw1 can bind directly to the promoter regions of target genes. The new findings substantially support the hypothesis that the chromatin activity of Isw1 plays a crucial role in modulating its protein function and that Isw1 acts as a central regulator of drug resistance in C. neoformans. Please see revised Figure 1g, 1h, 1i and lines 186-199 in the revised manuscript text.

      5) Does ISWI interact with chromatin? If so, which are ISWI-target genes? Does drug treatment modulate chromatin binding?

      To address this concern, we pursued two distinct approaches to demonstrate the chromatin regulatory effects of Isw1. First, the DNA binding domain was removed through genetic manipulation. The truncated Isw1 mutants exhibited a multidrug-resistant growth phenotype that matches the growth phenotype of the isw1Δ deletion strain. In addition, the levels of gene expression in these strains were comparable to those in the deletion strain isw1Δ. These results provide empirical support for the notion that the drug resistance mechanism depends on the DNA binding capability of Isw1. Second, the Isw1-Flag strain was used in chromatin immunoprecipitation and PCR assays, which demonstrated direct binding of Isw1 to the promoter regions of target genes. Together, these data offer significant evidence that Isw1 interacts with chromatin and that its chromatin activity plays a pivotal role in modulating its protein function. This interaction serves as a central regulatory mechanism for drug resistance in C. neoformans. Furthermore, a transcriptome analysis was performed on both wild-type and isw1 deletion strains in the absence of FLC treatment. Comparing the results obtained with and without FLC revealed a notable difference in the control of gene expression between the two conditions. In the isw1 deletion strain exposed to FLC, a set of 21 genes, including those belonging to the ABC/MFS family and efflux pumps, displayed significant changes in expression: 9 genes were downregulated, while 12 genes were upregulated.
In contrast, in the absence of FLC supplementation, a total of 9 genes exhibited altered expression, with 3 genes downregulated and 6 genes upregulated. Therefore, the Isw1 protein plays a crucial role in the activation of certain genes while simultaneously suppressing others, and it reconfigures its regulatory apparatus in response to drugs. Although ChIP-seq analysis was necessary in this study, drug treatment of fungal cells resulted in a notable decrease in the abundance of the Isw1 protein, which can be attributed to the activation of Isw1 protein degradation. Consequently, there was insufficient Isw1 protein for successful enrichment and subsequent ChIP-seq analysis (please see Figure 4a and 4c). Nevertheless, the data collectively support the idea that Isw1 serves as a crucial master regulator of drug resistance in C. neoformans. The text has been revised to present our findings precisely and thoroughly. Please see Figure 1c, 1g, Supplementary File 2, and lines 145-153 and 186-188.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      Multimodal experiences that for example contain both visual and tactile components are encoded as associative memories. This manuscript is a valuable contribution supporting structural and functional brain plasticity following associative training protocols that pair together different types of sensory stimuli. The results provide solid support for this plasticity being a basis for cross-modal associative memories.

      We appreciate the eLife assessment of our discovery that associative memory neurons are recruited in the cerebral cortices as a hub for the fulfillment of first-order and second-order associative memory. Synapse interconnections among associative memory neurons mediate the reciprocal retrieval, conversion, and translation of associated signals learned over the life span.

      Reviewer #1 (Public Review):

      This manuscript by Xu and colleagues addresses the important question of how multi-modal associations are encoded in the rodent brain. They use behavioral protocols to link stimuli to whisker movement and discover that the barrel cortex can be a hub for associations. Based on anatomical correlations, they suggest that structural plasticity between different areas can be linked to training. Moreover, they provide electrophysiological correlates that link to behavior and structure. Knock-down of nlg3 abolishes plasticity and learning. This study provides an important contribution as to how multi-modal associations can be formed across cortical regions.

      We sincerely thank Reviewer #1 for these comments, which are a great driving force for us to move forward in revealing the specific roles of neural circuits in associative memory and its relevant cognitive activities and emotional reactions.

      Reviewer #2 (Public Review):

      This manuscript by Xu et al. explores the potential joint storage/retrieval of associated signals in learning/memory and how that is encoded by some associative memory neurons using a mouse model. The authors examined mouse associative learning by pairing multimodal signals, including olfactory, tactile, gustatory, and pain/tail-heating signals. The key finding is that after associative learning, barrel neurons respond to other multi-modal stimulations. They found these barrel cortical neurons interconnect with other structures including piriform cortex, S1-Tr and gustatory cortical neurons. Further studies showed that Neuroligin 3 mediated the recruitment of associative memory neurons in the paired stimulation group. The authors found that knocking down Neuroligin 3 in the barrel cortex suppressed the associative memory cell recruitment in the paired stimulation learning. Overall, while the findings of this study are interesting, the concept of associative learning involving multiple functionally connected cortical regions is not that novel. While some of the data presented are convincing, others seem to lack rigor. In addition, more details and clarification of the experimental methods are needed.

      Thank you very much for your comments on our study of the recruitment of associative memory neurons as a hub for the joint storage and reciprocal retrieval of multi-modal associated signals. You are right that the concept of associative memory neurons, and of newly established interconnections among cerebral cortices for the formation of associative memory, is not novel. The original finding was reported by the senior author’s lab many years ago and has also been presented in the book “Associative Memory Cells: Basic Units of Memory Trace” by Jin-Hui Wang (Springer Nature, 2019). In addition, we have made certain clarifications in our revision; detailed information about the experimental approaches and concepts can be found in our previous publications and in this book.

      Reviewer #1 (Recommendations For The Authors):

      I have two points that I find would strengthen the manuscript further:

      1. Associative memories are also based on specificity, which is not addressed in this manuscript. The authors could discuss this and also the magnitude of plasticity. In general, I would suggest also testing plasticity in response to a non-linked stimulus to prove specificity.

      This is a good point. We have shown the specificity of associative memory in our model in our previous studies, such as Wang et al., “Neurons in the barrel cortex turn into processing whisker and odor signals: a cellular mechanism for the storage and retrieval of associative signals”, Frontiers in Cellular Neuroscience 9:320, 1-17 (2015), and Jin-Hui Wang, “Associative Memory Cells: Basic Units of Memory Trace” (Springer Nature, 2019).

      1. Nlg3 knock-down is a strong intervention. The authors could discuss the implications of interfering with synapse assembly and mechanistic implications at the synaptic level. It could help to compare the consequences of this intervention to a post-training lesion.

      This is a good point. To rule out a lesion-like effect of the Nlg3 knockdown intervention itself, we used an shRNA-scramble control. In addition, a discussion of the Nlg3 knockdown intervention at the synapse level has been added to our Discussion.

      1. In general, the clarity of the wording in some sections/sentences could be improved.

      The rewording of certain sentences has been done in our revision.

      Reviewer #2 (Recommendations For The Authors):

      1. The writing of the manuscript needs major editing, there are grammatical errors even in the title. The extremely long introduction and discussion section with repeated details can be distracting from the main focus of the work.

      This point has been taken into account in our revision.

      1. Many bar graphs, such as Figure 5C and 5G, Figure 6C-6G, have low-resolution images, meaning that the axis titles and labels are unreadable.

      The resolution of the figures has been improved in our revision.

      1. The bar graph with data points and illustration in Figure 1E and 1G are misplaced.

      This mistake has been corrected in our revision.

      1. On page 23, Figure 2B, which layer(s) of the PC, S1Tr and GC were the images taken from? In the PSG group, why is there no red axon terminal signal observed in the three regions? Does it indicate that there is no significant projection from the BC axon to PC, S1Tr, or GC neurons? Given that Thy1-YFP labeled glutamatergic neurons at PC, S1Tr, and GC and there is no discernible co-localization of yellow and green cells, can we assume that the glutamatergic neurons at PC, S1Tr, and GC are not involved in the associative learning after the PSG paradigm? Lastly, the number of synapse contacts in Figure 2E is only 1-2 per 100 µm of dendrite, but this is not quite consistent with the confocal images in Figure 2D. In Figure 2D, there are at least three tdTomato boutons on the cropped dendrite, which is ~16 µm long according to the scale bar.

      If Figure 2B is magnified, red boutons are visible; they can be seen at higher magnification in Figure 2C. In addition, the distribution of synapse contacts is variable. Figure 2E reports synapse contacts averaged over dendrites, so a single representative image may not exactly match the statistical data.
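
      The gap between one cropped image and an averaged density can be illustrated with a short calculation. The sketch below is only an illustration, not the authors' quantification method: the function names and the additional segment counts are hypothetical, and only the figure of 3 boutons on a ~16 µm dendrite comes from the reviewer's reading of Figure 2D.

```python
from typing import Sequence, Tuple


def contacts_per_100um(n_contacts: int, dendrite_length_um: float) -> float:
    """Normalize a bouton count to contacts per 100 micrometers of dendrite."""
    if dendrite_length_um <= 0:
        raise ValueError("dendrite length must be positive")
    return 100.0 * n_contacts / dendrite_length_um


def pooled_density(segments: Sequence[Tuple[int, float]]) -> float:
    """Length-weighted density over many (contacts, length_um) segments."""
    total_contacts = sum(n for n, _ in segments)
    total_length = sum(length for _, length in segments)
    return contacts_per_100um(total_contacts, total_length)


# The reviewer's reading of the cropped dendrite in Figure 2D.
single_image = contacts_per_100um(3, 16.0)

# Hypothetical segments: one bouton-rich crop plus sparser stretches.
segments = [(3, 16.0), (0, 40.0), (1, 60.0), (0, 84.0)]
pooled = pooled_density(segments)

print(f"single image: {single_image:.2f} contacts per 100 um")
print(f"pooled:       {pooled:.2f} contacts per 100 um")
```

      Under these assumed numbers, a crop chosen because it shows boutons clearly gives a much higher local density than the pooled value, which is the kind of discrepancy the response describes.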

      1. Figure 4C and Figure 8C, how were the percentages of associative neurons calculated after LFP recording? More details are needed on the method of this in vivo LFP/single unit recordings, including the spike sorting algorithm.

      The total number of neurons recorded in each group is given in the Results section. For instance, 70 neurons were recorded from PSG mice (Figure 4), and this number was used as the denominator. The percentage of associative memory neurons recruited in associative learning was calculated as the number of neurons that responded to two or more signals divided by this total. This information has been added in our revision (please see the Results section).
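
      The calculation described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name and the response table are hypothetical, and the count of 28 responsive neurons is invented for illustration. Only the denominator of 70 recorded neurons and the "two or more signals" criterion come from the response.

```python
from typing import Sequence


def percent_associative(responses: Sequence[Sequence[bool]],
                        min_signals: int = 2) -> float:
    """Percentage of recorded neurons responding to at least `min_signals` signals.

    Each row is one neuron; each entry marks whether that neuron
    responded to one stimulus type (e.g. whisker, odor, tail, taste).
    """
    if not responses:
        raise ValueError("no neurons recorded")
    n_associative = sum(1 for row in responses if sum(row) >= min_signals)
    return 100.0 * n_associative / len(responses)


# Hypothetical table for 70 recorded neurons: 28 respond to two signals,
# 42 respond to the whisker signal only.
multi_responsive = [[True, True, False, False]] * 28
single_responsive = [[True, False, False, False]] * 42
recorded = multi_responsive + single_responsive

print(f"{percent_associative(recorded):.1f}% associative memory neurons")
```

      Using the number of recorded neurons as the denominator means the percentage depends on how units are isolated, which is why the reviewer's request for spike-sorting details bears on this figure.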

      1. The rationale for the authors choosing Neuroligin 3 as the target for investigating the formation of new synapse interconnections between BC, PC, S1Tr, and GC after PSG should be more clearly spelled out. Synaptic CAMs, including SynCAM, NCAM, Neurexin, and Cadherin, all play a role in new synapse formation. Neuroligin 1 is expressed specifically in the CNS at excitatory synapses. Why did the authors choose to study Neuroligin 3 instead of Neuroligin 1?

      This is a good point. Based on our previous data, miRNA-324, which degrades neuroligin-3 mRNA, is upregulated during associative learning in our mouse model. The role of neuroligin-3 in the formation of new synapses and the recruitment of associative memory neurons is therefore studied in this paper.

      1. The behavioral results in Figure 5B-5G indicated that after pair-stimulation of WS-OS, WS-TS, or WS-GS, the memory learned in piriform, S1-Tr and gustatory cortical neurons can be retrieved from each other, by jumping over the barrel cortex. Is it possible that there is some direct interconnection formed between piriform, S1-Tr, and gustatory cortical neurons? Maybe they can try barrel cortical lesion or chemogenetic inhibition after PSG training and then repeat the behavioral tests as in Figure 5B-5G.

      We have performed experiments to examine potential direct interconnections among piriform, S1-Tr and gustatory cortical neurons about twelve days after associative learning. We have no convincing data to support this possibility at this moment.

      1. Some of the images showing the location of virus injections look VERY similar, such as Figure 3A left and right, Figures 7A and 7D. Larger variability of different animals/injection sites is definitely expected.

      The viruses injected in Figure 3 and Figure 7 are different, since the AAV-carried fluorescent proteins differ between cortical areas. In addition, when the right and left panels of Figure 3A are carefully enlarged, the areas of AAV transfection can be seen to differ in morphology. The similarity of the injection areas noted by Reviewer #2 reflects the precision of our virus-injection sites.

      1. On page 49, are the green neurons in Figure 9B the BC cells? Just to be consistent, the authors should use the same color for BC cells as in Figure 9A. Also, label the primary and the secondary associative memory cells in Figure 9.

      Figure 9 has been thoroughly changed in our revision.

    2. eLife assessment

      Multimodal experiences that for example contain both visual and tactile components are encoded as associative memories. This manuscript is a valuable contribution supporting structural and functional brain plasticity following associative training protocols that pair together different types of sensory stimuli. The results provide solid support for this plasticity being a basis for cross-modal associative memories.

    3. Reviewer #1 (Public Review):

      This manuscript by Xu and colleagues addresses the important question of how multi-modal associations are encoded in the rodent brain. They use behavioral protocols to link stimuli to whisker movement and discover that the barrel cortex can be a hub for associations. Based on anatomical correlations, they suggest that structural plasticity between different areas can be linked to training. Moreover, they provide electrophysiological correlates that link to behavior and structure. Knock-down of nlg3 abolishes plasticity and learning.

      This study provides an important contribution as to how multi-modal associations can be formed across cortical regions.

    4. Reviewer #2 (Public Review):

      This manuscript by Xu et al. explores the potential joint storage/retrieval of associated signals in learning/memory and how that is encoded by some associative memory neurons using a mouse model. The authors examined mouse associative learning by pairing multimodal signals, including olfactory, tactile, gustatory, and pain/tail-heating signals. The key finding is that after associative learning, barrel neurons respond to other multi-modal stimulations. They found these barrel cortical neurons interconnect with other structures including piriform cortex, S1-Tr and gustatory cortical neurons. Further studies showed that Neuroligin 3 mediated the recruitment of associative memory neurons in the paired stimulation group. The authors found that knocking down Neuroligin 3 in the barrel cortex suppressed the associative memory cell recruitment in the paired stimulation learning. Overall, this is an interesting study that reveals novel modalities of associative learning involving multiple functionally connected cortical regions. The data presented generally support their conclusions after revision.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Soudi, Jahani et al. provide a valuable comparative study of local adaptation in four species of sunflowers and investigate the repeatability of observed genomic signals of adaptation and their link to haploblocks, known to be numerous and important in this system. The study builds on previous work in sunflowers that has investigated haploblocks in those species and on methodologies developed to look at repeated signals of local adaptations. The authors provide solid evidence of both genotype-environment associations (GEA) and genome-wide association study (GWAS), as well as phenotypic correlations with the environment, to show that part of the local adaptation signal is repeatable and significantly co-occurs in regions harboring haploblocks. Results also show that part of the signal is species-specific and points to high genetic redundancy. The authors rightfully point out the complexities of the adaptation process and that the truth must lie somewhere between two extreme models of evolutionary genetics, i.e. a population genetics view of large effect loci and a quantitative genetics model. The authors take great care in acknowledging and investigating the multiple biases inherent to the used methods (GEA and GWAS) and use a conservative approach to draw their conclusions. The multiplicity of analyses and their interdependence make them slightly hard to understand and the manuscript would benefit from more careful explanations of concepts and logical links throughout. This work will be of interest to evolutionary biologists and population geneticists in particular, and constitutes an additional applied example to the comparative local adaptation literature.

      Some thoughts on the last paragraph of the discussion (L481-497): I think it would be fine to have some more thoughts here on the processes that could contribute to the presence/absence of inversions, maybe in an "Ideas and Speculation" subsection. To me, your results point to the fact that though inversions are often presented as important for local adaptation, they seem to be highly contingent on the context of adaptation in each species. First, repeatability results are only at the window/gene level in your results, the specific mutations are not under scrutiny. Is it possible that inversions are only necessary when sets of small effect mutations are used, opposite to a large effect mutation in other species? Additionally, in a model with epistasis, fitness effects of mutations are dependent on the genomic background and it is possible that inversions were necessary in only certain contexts, even for the same mutations, i.e. some adaptive path contingency. Finally, do you have specific demographic history knowledge in this system that maps to the observations of the presence of inversions or not? For example, have the species "using" inversions been subject to more gene flow compared to others?

      Thank you for the great suggestions and helpful comments. Regarding the question of demography, each of the species actually harbours quite a large number of haploblocks (13 in H. annuus spanning 326 Mb, 6 in H. argophyllus spanning 114 Mb, and 18 in H. petiolaris spanning 467 Mb; see Todesco et al. 2020 for more details), so there does not seem to be any clear association with demography. We agree about the complexities that might underlie the evolution of inversions that you outline above, and have refined some of the text where we discuss their evolution in the Discussion.

      Reviewer #2 (Public Review):

      In this study the authors sought to understand the extent of similarity among species in intraspecific adaptation to environmental heterogeneity at the phenotypic and genetic levels. A particular focus was to evaluate if regions that were associated with adaptation within putative inversions in one species were also candidates for adaptation in another species that lacked those inversions. This study is timely for the field of evolutionary genomics, due to recent interest surrounding how inversions arise and become established in adaptation.

      Major strengths

      Their study system was well suited to addressing the aims, given that the different species of sunflower all had GWAS data on the same phenotypes from common garden experiments as well as landscape genomic data, and orthologous SNPs could be identified. Organizing a dataset of this magnitude is no small feat. The authors integrate many state-of-the-art statistical methods that they have developed in previous research into a framework for correlating genomic Windows of Repeated Association (WRA, also amalgamated into Clusters of Repeated Association based on LD among windows) with Similarity In Phenotype-Environment Correlation (SIPEC). The WRA/CRA methods are very useful and the authors do an excellent job at outlining the rationale for these methods.

      Thank you!

      Major weaknesses

      The study results rely heavily on the SIPEC measure, but I found the values reported difficult to interpret biologically. For example, in Figure 4 there is a range of SIPEC from 0 to 0.03 for most species pairs, with some pairs only as high as ~0.01. This does not appear to be a high degree of similarity in phenotype-environment correlation. For example, given the equation on line 517 for a single phenotype, if one species has a phenotype-environment correlation of 1.0 and the other has a correlation of 0.02, I would postulate that these two species do not have similar evolutionary responses, but the equation would give a value of (1 + 0.02) × 1 × 0.02 / 1 ≈ 0.02, which is a pretty typical "higher" value in Figure 4. I also question the logic behind using absolute values of the correlations for the SIPEC, because if a trait increases with an environment in one species but decreases with the environment in another species, I would not predict that the genetic basis of adaptation would be similar (as a side note, I would not question the logic behind using absolute correlations for associations with alleles, due to the arbitrary nature of signing alleles). I might be missing something here, so I look forward to reading the authors' responses on these thoughts.

      The reviewer makes a very good point about the range of SIPEC, and we have changed our analysis to reflect this, now reporting the maximum value of SIPEC for each environment (across the axes of the PCA on phenotypes that cumulatively explain 95% of the variance), in Figure 4 and Supplementary Figures S2 and S13. For consistency among manuscript versions and to illustrate the effect of this change, we retain the mean SIPEC value in one figure in the supplementary materials (S12), which shows the small effect of this change on the qualitative patterns. Figure 4 now shows that the maximum SIPEC value is regularly quite strong, which should address the reviewer’s concern that this is not being driven by anomalous and small values. We appreciate this point and think this change now more closely reflects how we are trying to estimate the biological feature of interest – that some axis of phenotypic space is strongly (or not) responding to selection from the environmental variable.

      With respect to the logic behind using absolute value, we still feel this is justified for traits, because if a trait evolves to be bigger or smaller, it may still use the same genes. For example, flowering time may change to be later or earlier, which would result in opposite correlations with a given environment, but might use the same gene (e.g. FT) for this. As such, we think keeping absolute value is more representative as otherwise species with strong but opposite patterns of adaptation would look like they were very different. We have added a statement on line 584 in the methods section to further clarify the reason for this choice.
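
      The arithmetic under discussion can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the per-phenotype term implied by the reviewer's worked example, (|r1| + |r2|) × |r1 × r2|, and the function names `sipec_term` and `summarise_sipec` are hypothetical. The exact normalisation used in the manuscript's equation is not reproduced here, and the correlation values are invented for illustration.

```python
def sipec_term(r1: float, r2: float) -> float:
    """Similarity term for one phenotype axis and one environmental variable.

    r1, r2: phenotype-environment correlations in species 1 and species 2.
    Assumed form, read off the reviewer's example: (|r1| + |r2|) * |r1 * r2|.
    """
    return (abs(r1) + abs(r2)) * abs(r1 * r2)


def summarise_sipec(correlation_pairs, how="max"):
    """Summarise the per-axis terms for one environmental variable.

    correlation_pairs: iterable of (r1, r2), one pair per phenotype PC axis.
    how: "mean" (original reporting) or "max" (revised reporting).
    """
    terms = [sipec_term(r1, r2) for r1, r2 in correlation_pairs]
    if not terms:
        raise ValueError("need at least one (r1, r2) pair")
    if how == "mean":
        return sum(terms) / len(terms)
    if how == "max":
        return max(terms)
    raise ValueError("how must be 'mean' or 'max'")


# Reviewer's example: a perfect correlation paired with a weak one.
weak_match = sipec_term(1.0, 0.02)

# Opposite signs give the same value as matching signs, because only
# absolute values enter the term.
same_sign = sipec_term(0.8, 0.8)
opposite_sign = sipec_term(0.8, -0.8)

# One strongly shared axis among several weak ones (made-up values):
# the mean is diluted by the weak axes, the max is not.
axes = [(0.9, 0.8), (0.05, 0.1), (0.0, 0.3), (0.1, 0.02)]
mean_value = summarise_sipec(axes, how="mean")
max_value = summarise_sipec(axes, how="max")
```

      In this toy input the mean is pulled down by the three weakly correlated axes, while the max reflects the single axis on which both species respond strongly, which is the behaviour the authors give as their reason for the change.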

      An additional potential problem with the analysis is that from the way the analysis is presented, it appears that the 33 environmental variables were essentially treated as independent data points (e.g. in Figure 4, Figure 5). It's not appropriate to treat the environmental variables independently because many of them are highly correlated. For example in Figure 4, many of the high similarity/CRA values tend to be categorized as temperature variables, which are likely to be highly correlated with each other. This seems like a type of pseudo replication and is a major weakness of the framework.

      This is a good point and we fully agree. It is for this reason that we didn’t present any p-values or statistical tests of the overall patterns that are shown in these figures (i.e. the linear relationship between SIPEC and number of CRAs in figure 4 and the tendency for most points to fall above the 1:1 line in figure 5). But to make sure this is even more clear, we have added statements to the captions of these figures to remind readers that points are non-independent. We still feel that in the absence of a formal test, the overall patterns are strongly consistent with this interpretation. A smaller number of non-pseudo-replicated points in Figure 4 would still likely show linear patterns. Similarly, there are almost no significant points falling below the 1:1 line in Figure 5, and it seems unlikely that pseudoreplication would generate this pattern.

      Below I highlight the main claims from the study and evaluate how well the results support the conclusions.

      "We find evidence of significant genome-wide repeatability in signatures of association to phenotypes and environments" (abstract)<br /> Given the questions above about SIPEC, I did not find this conclusion well supported with the way the data are presented in the manuscript.

      We have changed the reporting of the SIPEC metric so that it more clearly reflects whichever axis of phenotypic space is most strongly correlated with environment in both species (using max instead of mean). This shows similar qualitative patterns but illustrates that this happens across much higher values of SIPEC, showing that it is in fact driven by high correlations in each species (or non-similar correlations resulting in low values of SIPEC). While we agree about the pseudo-replication problem preventing a formal statistical test of this hypothesis, the visual pattern is striking and seems unlikely to be an artefact, so we think this does still support this conclusion.

      "We find evidence of significant genome-wide repeatability in signatures of association to phenotypes and environments, which are particularly enriched within regions of the genome harbouring an inversion in one species. " (Abstract) And "increased repeatability found in regions of the genome that harbour inversions" (Discussion)<br /> These claims are supported by the data shown in Figure 4, which shows that haploblocks are enriched for WRAs. I want to clarify a point about the wording here, as my understanding of the analysis is that the authors test if haploblocks are enriched with WRAs, not whether WRAs are enriched for haploblocks. The wording of the abstract is claiming the latter, but I think what they tested was the former. Let me know if I'm missing something here.

      We are actually not interested in whether WRAs are enriched for haploblocks; we want to know if WRAs tend to occur more commonly within haploblocks than outside of them. We have tried to clarify that this is our aim in various places in the manuscript. Our analysis for Figure 5 is the one supporting these claims, and it uses the Chi-square test statistic to assess the number of WRAs and non-WRAs that fall within vs. outside of inversions, and a permutation test to assess the significance of this observation, for each environmental variable and phenotype. We don’t think that this test has any direction to it – it’s simply testing if there is non-random association between the levels of the two factors. Thus, we think the wording we have used is consistent with the test result and our aims. Perhaps the confusion arose from the two methods that we present in the Methods (one is used for Figure 5, the other for Figure S6C & D), so we have added clarifications there.
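
      The kind of test described in this response can be sketched as below. This is a hedged illustration rather than the authors' implementation: it assumes each genomic window carries two labels (WRA or not, inside a haploblock or not), uses a plain Pearson chi-square statistic on the resulting 2×2 table, and builds the null by shuffling haploblock labels across windows. The actual permutation scheme in the manuscript may differ (for example, it may preserve the physical clustering of windows). All function names and the toy counts are hypothetical.

```python
import random


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = WRA / non-WRA, cols = inside / outside."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0  # a degenerate table carries no association signal
    return n * (a * d - b * c) ** 2 / margins


def contingency(is_wra, in_haploblock):
    """Count windows in each of the four WRA x haploblock cells."""
    a = b = c = d = 0
    for wra, hap in zip(is_wra, in_haploblock):
        if wra and hap:
            a += 1
        elif wra:
            b += 1
        elif hap:
            c += 1
        else:
            d += 1
    return a, b, c, d


def permutation_test(is_wra, in_haploblock, n_perm=2000, seed=1):
    """Return (observed statistic, permutation p-value)."""
    rng = random.Random(seed)
    observed = chi_square_2x2(*contingency(is_wra, in_haploblock))
    shuffled = list(in_haploblock)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between the two labels
        if chi_square_2x2(*contingency(is_wra, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Toy data (invented): 100 windows, 20 WRAs, 15 of them inside haploblocks.
is_wra = [True] * 20 + [False] * 80
in_haploblock = [True] * 15 + [False] * 5 + [True] * 10 + [False] * 70
observed, p_value = permutation_test(is_wra, in_haploblock)

# Swapping the roles of the two factors (transposing the table) leaves
# the statistic unchanged: the test has no direction.
transposed = chi_square_2x2(15, 10, 5, 70)
```

      The equality of `observed` and `transposed` illustrates the authors' point that the statistic only measures non-random association between the two factors. Whether the excess lies inside or outside haploblocks has to be read from the counts themselves, as the position relative to the 1:1 line does in Figure 5.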

      Notwithstanding the concerns about highly correlated environments potentially inflating some of the patterns in the manuscript, to my knowledge this is the first attempt in the literature to try this kind of comparison, and the results do generally suggest that inversions are more likely capturing, rather than accumulating, adaptive variation. However, I don't think the authors can claim that repeated signatures are enriched with haploblock regions, and the authors should take care to refrain from stating the relative importance of different regions of the genome to adaptation without an analysis.

      Actually, we don’t have a strong feeling about whether inversions are capturing vs. accumulating adaptive variation, as these results could be consistent with either. As described above, we do not understand why we can’t claim that repeated signatures are enriched within haploblocks. We thought the reviewer is perhaps referring to the fact that the points are pseudo-replicated in the figures due to environment? We note that a very large number of points are significantly different from random in terms of the distribution of WRAs within vs. outside of haploblocks (light- vs. dark-shaded symbols), and that almost all of them fall above the 1:1 line. While there may be pseudo-replication preventing a test of the bigger multi-environment/multi-species hypothesis across all phenotypes and environments, there is almost a complete lack of significant results in the other direction. This seems like quite strong evidence about enrichment of WRAs within haploblocks, across many environments/species contrasts. We have added some text to the description of patterns in figure 5 to try to clarify this.

      "While a large number of genomic regions show evidence of repeated adaptation, most of the strongest signatures of association still tend to be species-specific, indicating substantial genotypic redundancy for local adaptation in these species." (Abstract)<br /> Figure 3B certainly makes it look like there is very little similarity among species in the genetic basis of adaptation, which leaves the question as to how important the repeated signatures really are for adaptation if there are very few of them. (Is 3B for the whole genome or only that region?). This result seems to be at odds with the large number of CRAs and the claims about the importance of haploblock regions to adaptation, which extend from my previous point.

      Figure 3B is for the whole genome; we have added text to the figure caption to clarify this. We think that both interpretations are possible: that most of the regions of the genome that are driving adaptation are non-repeated, but that a small but significant proportion of regions driving adaptation are repeated above what would be expected at random. Thus, it seems that there is high redundancy, coupled with adaptation via some genes that seem particularly functionally important and non-redundant, and therefore repeated. We added clarifying text on lines 541-548.

      "we have shown evidence of significant repeatability in the basis of local adaptation (Figure 4, 5), but also an abundance of species-specific, non-repeated signatures (Figure 3)"<br /> While the claim is a solid one, I am left wondering how much of these genomes show repeated vs. non-repeated signatures, how much of these genomes have haploblocks, and how much overlap there really is. Finding a way to intuitively represent these unknowns would greatly strengthen the manuscript.

      We agree, and really struggled to find the best way to communicate both the repeated patterns and the large amount of non-repeated signatures. Unfortunately, we have more confidence in the validity of repeated patterns because for the non-repeated patterns, a strong signature of association to environment in only one species could just be the product of structure-environment correlation, as we didn’t control for population structure. Thus, trying to quantify the proportion of non-repeated signatures is difficult to do with any accuracy and we preferred to avoid putting too much emphasis on the simple calculation of the proportion of top candidate windows that were also WRAs.

      Overall, I think the main claims from the study, the statistical framework, and the results could be revised to better support each other.

      Although the current version of the manuscript has some potential shortcomings with regards to the statistical approaches, and the impact of this paper in its present form could be stifled because the biology tended to get lost in the statistics, these shortcomings may be addressed by the authors.

      With some revisions, the framework and data could have a high impact and be of high utility to the community.

      Thank you for your very helpful comments and suggestions on our paper, we really appreciate it.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      Editor's comments:

      The reviewers make a series of reasonable suggestions that I echo. I found the paper quite hard to follow, and got fairly lost in the various layers of analyses done. Partially, this represents the complexity of empirical genomic data, which rarely deliver simple stories of convergence at a few genes. However, the properties of the various statistics used to detail local adaptation and convergence are not particularly clear and the figures presented were not intuitive representations of the data. This leaves the reader with an incomplete view of how much weight to put in the various lines of evidence marshaled. I would suggest simplifying the presentation of the results considerably. I add a few additional comments below.

      Great suggestion, we’ve added a schematic overview of the methods and main research questions to Figure S1 in the supplementary materials.

      A figure would help showing some of the signals of SNPs with putative signals of convergent environmental correlations across species, e.g. frequencies plotted against climate variables. This would help readers get a sense of how strong these signals were. These could be accompanied by the statistics calculated for these SNPs, that would allow the reader to start to get some intuitive sense of what the numbers mean.

      Great suggestion, we have added a schematic overview of the methods to Figure S1 that shows some of the values and illustrates how the methods work using visual examples from our data.

      In general, the introduction and some of the discussion of the inversion results feel oddly framed:
      Abstract line 36: "This shows that while inversions may facilitate local adaptation, at least some of the loci involved can still make substantial contributions without the benefit of recombination suppression."

      We have changed “some of the loci involved can still make substantial contributions without the benefit of recombination suppression” here to “some of the loci involved can still harbour mutations that make substantial contributions without the benefit of recombination suppression in species lacking a segregating inversion” as it hopefully clarifies that we’re not talking about individual alleles that are present in both species.

      Models of the role of local adaptation in the establishment of inversions (Kirkpatrick & Barton) assume that there are multiple locally adapted alleles already present. It is the load created by these alleles being constantly maintained in the face of migration and subsequent recombination that allow an inversion to be selected for because it keeps together locally adapted alleles. Thus these models predict that there could well be standing local adaptation at these loci in the absence of the inversion in other species, and that these locally adapted alleles while not fixed may be at high frequency. (After establishment, inversions housing locally adapted alleles, can shield more weakly, locally beneficial alleles from migration allow other alleles to build up.) Empirically it's interesting to find signals of local adaptation in other species that don't contain putative inversions. But the logic of the different predictions is not particularly clear from the introduction, and only becomes somewhat clearer in the discussion.

      Thank you for pointing out this murkiness, we have re-written portions of both the Introduction and Discussion to clarify this aspect.

      From the introduction: Inversions have been implicated in local adaptation in many species (Wellenreuther and Bernatchez 2018), likely due to their effect to suppress recombination among inverted and noninverted haplotypes, and thereby maintain LD among beneficial combinations of locally adapted alleles (Rieseberg 2001; Noor et al. 2001; Kirkpatrick and Barton 2006). This has been approached by models studying the establishment of inversions that capture combinations of locally adapted alleles present as standing variation (e.g., Kirkpatrick and Barton 2006), as well as models examining the accumulation of locally adapted mutations within inversions (e.g., Schaal et al. 2022). If there is variation in the density of loci that can potentially contribute to local adaptation, inversions would be expected to preferentially establish and be retained in regions harbouring a high density of such loci (and this expectation would hold for both the capture and accumulation models). We would also expect to see stronger signatures of repeated local adaptation in such high density regions. Despite mounting evidence of their importance in adaptation, it is unclear how inversions may covary with repeatability of adaptation among species. A fundamental parameter of importance in these models is the relationship between migration rate and strength of selection on individual alleles, which may not make persistent contributions to local adaptation without the suppressing effects of recombination if selection is too weak (Yeaman and Whitlock 2011; Bürger and Akerman 2011). If most alleles have small effects relative to migration rate and can only contribute to local adaptation via the benefit of the recombination-suppressing effect of an inversion, then we would expect little repeatability at the site of an inversion – other species lacking the inversion would not tend to use that same region for adaptation because selection would be too weak for alleles to persist. 
      On the other hand, if some loci are particularly important for local adaptation and regularly yield mutations of large effect, with these patterns being conserved among species, repeatability within regions harbouring inversions may be substantial. Thus, studying whether adaptation at the same genomic region harbouring an inversion is observed in other species lacking the inversion can give insights about the underlying architecture of adaptation, and the evolution and maintenance of inversions.

      From the Discussion: The observed repeatability associated with inversions further supports the local adaptation model as an explanation for the long-term persistence of segregating inversions (at least in sunflowers), rather than mechanisms based on dominance or meiotic drive (Rieseberg 2001). If there is variation across the genome in the density of loci with the potential to be involved in local adaptation, then the establishment and maintenance of inversions would be biased towards regions harbouring a high density of such loci under this model. If the genomic basis for local adaptation is conserved amongst species, then these same regions are more likely to have high repeatability. Thus, our observation of genomic regions harbouring inversions also being enriched for WRAs is consistent with this general model for inversion evolution. Unfortunately, our observations do not provide much insight into whether inversions evolve through the capture (e.g. Kirkpatrick and Barton 2006) or accumulation (e.g. Schaal et al. 2022) type of model, as either model would be consistent with our results. Most of the sunflower inversions are >1 My old, and therefore predate any current local adaptation patterns, but likely do not predate the genes underlying local adaptation (which appear to be shared among the species we studied). As for the alleles underlying local adaptation, they may be younger than the inversions, but as our work suggests, these regions are prone to harbouring locally adaptive alleles so it is possible that they also harboured other ancestral locally adaptive alleles.

      As a minor comment, there's a fair number of places where a more nuanced view of the field is needed, e.g.:
      "Models in evolutionary genetics tend to focus on extremes: population genetic approaches explore cases where strong selection deterministically drives a change in allele frequency" --This seems like a strange strawman. Population genetic models span a huge parameter range. The empirical approaches of looking for sweeps by detecting genome-wide statistical outliers is predicated on strong selection, but there are numerous papers that have looked for signals of weak selection genome-wide.

      Good point, we have changed our wording here.

      Reviewer #1 (Recommendations For The Authors):

      Comments

      My main comment on the manuscript is that the different levels and diversity of analyses are slightly hard to follow on the first, and even second, read. As there are several layers of correlations and comparisons, as well as some independent analyses, I wonder if it might be helpful to have a summary schematic figure of how all analyses fit together.

      Great idea, we have added Figure S1 that summarizes the main flow of the methods and research questions.

      • L169-171: Would it be more accurate to say that SIPEC is maximized when both species have strong correlations for an environmental variable across the same phenotypes? But maybe I misunderstood the index.

      Good point, we have now simplified SIPEC, reporting the max instead of the mean, which we think better reflects when similar patterns are happening in both species for some phenotype.

      • L191: Given the discussion in the introduction and elsewhere about the correction for population structure, which version is used here? Same for Figure 3.

      We have added clarification there.

      • L348: One [environmental] variable?

      Added

      • L353: Maybe add a percentage indication for 387 so that it is comparable to the following 23.3%.

      Good point, added

      • L388 and paragraph: You mention "significant repeatability" but it is hard from the results at this point to have a broad idea of the amount of signal that is repeatable. Would it be possible to add here some quantitative measure of the proportion of signal repeatable or not, even if approximated?

      I wish we could, but I think the precision implied by such an approximation would involve a huge amount of uncertainty and likely inaccuracy. Because it is so hard to conclusively identify how many loci are significant but non-repeated, we really don’t have a good handle on the denominator here. We are pretty confident that the repeated loci are strongly enriched for true positives, but the non-repeated loci are also almost certainly strongly enriched for false positives. While we really want to be able to quantify this explicitly, we don’t think it’s possible given our data.

      • L415-418: "If there is variation [...] involved in local adaptation", I do not follow this argument, could you rephrase?

      Changed

      • L447-450: As you say in the supplementary methods, your analyses exclude 3/4 of the genome. Do you think this choice has a large impact on the number of outliers observed here as the genome-wide baseline would change?

      This is a very good question, but one that is quite complex and without a clear answer – we chose not to delve into it in the paper to keep the discussion streamlined. My (SY) feeling is that it is unlikely that regions harbouring transposable elements would contribute much to adaptation, but I think we really don’t know if that is true. Even excluding ¾ of the genome harbouring TEs, ¼ of the genome still constitutes a huge amount of sequence and a very large number of genes and it seems plausible that most genes and genic regions would not contribute to adaptation for a given trait, so I don’t think this would change the results too much in a qualitative way – but would almost certainly change the number of windows that are significant, etc.

      • L455-457: "As we are unable [...] potentially important drivers" Could you provide the logical link here between loci of small effect and them being important drivers. I presume you mean that the large effect loci found here only account for a small proportion of the heritability?

      Yes that’s what we meant here, so we’ve added some clarification.

      • L482: "enriched within inversions" should that be 'in genomic regions where there exist inversions in at least one species'?

      Thanks for catching that, yes. Changed.

      • Methods/SIPEC L512: Compared to the Results section it is unclear here what is referred to as an "environment" Is it a variable or a set of environment variables?

      This is done per environmental variable.

      I find the presence of the PCA for environment variables in Figure 2 misleading as my first interpretation was that PCs for environment were also used.

      Good point, we have clarified this on line 190-193.

      Maybe one potential addition to the formula would be to add an environment variable $j$ notation such that it reads "$SIPEC_j = \sum_i (|r_{ij,1}| + ...) ...$ where ... between environment variable $j$". I had initial difficulties to understand how this SIPEC was computed relating to environmental variables and this might help.

      Given the other changes we made to SIPEC, we felt it was simpler to just present it as a single calculation on a given combination of phenotype and environment for a pair of species, and then discuss taking the mean and maximum of this later.

      Finally, PCA axes explaining 95% of the variance are used, I would find it interesting to see how many PCs are used in comparison to the number of traits being measured.

      We have added the following sentence to the methods describing this:

      "For comparisons including H. argophyllus, 95% of the variance was typically explained by 8-10 PC axes (out of 28 or 29 phenotypes), whereas for comparisons among other taxa this included 21 or 22 PC axes (out of 65 or 66 phenotypes."

      Typos

      L52: --

      Changed

      L254: portions [of] their

      Changed

      L399: additional closing parenthesis

      Changed

      L458: signatures [of] repeated association

      Changed

      L554: performed [on]

      Changed

      L578: 5 ~~kp~~/kb windows

      Changed

      L601: ~~casual~~/causal SNPs

      Changed

      L615: ~~widow~~/window

      Changed

      L732: ~~Banding~~/Banting Postdoctoral Fellowship

      Changed

      L1002 & L960: [Supplementary] Figure

      Changed

      Supplementary: Some figure titles are in bold and others are not.

      Changed

      Reviewer #2 (Recommendations For The Authors):

      Overall I found the writing to be very clear and easy to follow. Despite my comments, it was clear that a lot of thought went into how to conduct the tests and visualize the results. I recommend ending the Discussion on a positive note, rather than an impossible test.

      Thanks for the positive suggestion, we have done this.

      In Figure 5, is the temperature variable missing in the legend and in the plot?

      No, for this plot we just combined the temperature/precipitation variables into one variable called “climate”.

    2. eLife assessment

      This is a valuable comparative study of adaptation across multiple species. The results provide a solid example of the application of genotype-environment associations to demonstrate that local adaptation is repeatable.

    3. Reviewer #1 (Public Review):

      Soudi, Jahani et al. provide a valuable comparative study of local adaptation in four species of sunflowers, and investigate the repeatability of observed genomic signals of adaptation and their link to haploblocks, known to be numerous and important in this system. The study builds on previous work in sunflowers that have investigated haploblocks in those species and on methodologies developed to look at repeated signals of local adaptations. The authors provide solid evidence of both genotype-environment associations (GEA) and genome-wide association study (GWAS), as well as phenotypic correlations with the environment, to show that part of the local adaptation signal is repeatable and significantly co-occur in regions harboring haploblocks. Results also show that part of the signal is species specific and points to high genetic redundancy. This work will be of interest to evolutionary biologists in general and population geneticists in particular, and constitutes a good example of comparative local adaptation. Importantly, this study helps in advancing our understanding of the genetic architecture implicated in the adaptation process.

      Strengths: The authors take great care in acknowledging and investigating the multiple biases inherent to the methods used (GEA and GWAS) and use conservative and well-thought-out statistical approaches to draw their conclusions. Additionally, I appreciated the nuanced discussion and can only agree with the authors that the adaptation process is complex and does not fully fit the classic simplified genetic models of either a few large-effect genes or only infinitesimal quantitative traits. I find the Summary figure added in this revised version (S1) extremely helpful in better understanding the different analysis steps and how they relate to the different questions.

      Weaknesses: After those revisions, I did not find any major weakness and am satisfied with the authors' responses.

    4. Reviewer #2 (Public Review):

      In this study the authors sought to understand the extent of similarity among species in intraspecific adaptation to environmental heterogeneity at the phenotypic and genetic levels. A particular focus was to evaluate if regions that were associated with adaptation within putative inversions in one species were also candidates for adaptation in another species that lacked those inversions. This study is timely for the field of evolutionary genomics, due to recent interest surrounding how inversions arise and become established in adaptation.

      Major strengths-

      Their study system was well suited to addressing the aims, given that the different species of sunflower all had GWAS data on the same phenotypes from common garden experiments as well as landscape genomic data, and orthologous SNPs could be identified. Organizing a dataset of this magnitude is no small feat. The authors integrate many state-of-the-art statistical methods that they have developed in previous research into a framework for correlating genomic Windows of Repeated Association (WRA, also amalgamated into Clusters of Repeated Association based on LD among windows) with Similarity In Phenotype-Environment Correlation (SIPEC). The WRA/CRA methods are very useful and the authors do an excellent job at outlining the rationale for these methods.

      Weaknesses-

      The authors did an excellent job responding to the first set of reviews and overall I found the manuscript more streamlined and easier to read. The main weakness in the manuscript is that correlations among environmental variables were not controlled for in their results, which is a potential source of pseudoreplication. The authors are clear about the results that are affected by pseudoreplication.

      The manuscript shows how to integrate many recent methods to study the repeatability of adaptation, and the methods and data are likely to be used in similar studies.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The first major issue relates to the imaging and tracking experiment used to examine the formation and migration of F-actin foci, as illustrated in Figure 3. The formation and centripetal migration of F-actin foci is a significant finding of this manuscript for promoting the switch of B cells from a spreading to a contraction response. I therefore recommend that the authors conduct a more rigorous fluorescent molecular tracking experiment to confirm this phenomenon. Molecular tracking usually requires low labeling density, and the LifeAct-GFP labeling used here does not meet this requirement, which may cause misidentification of the moving molecules. Permeable dye-based fluorescent speckle microscopy is recommended here to track the actin foci, if applicable (P. Risteski, Nat. Rev. Mol. Cell Biol., 2023, DOI: 10.1038/s41580-023-00588w & K. Hu, et al, Science, 2007, 315, 111-115).

      We thank the reviewer for the suggestion. We conducted the suggested experiment using membrane-permeable SiR-actin to track B-cell actin dynamics. Unfortunately, two significant issues prevented us from confirming the LifeAct-GFP results using fluorescent speckle microscopy. First, the concentration of SiR-actin required to visualize F-actin in the contact zone of mouse primary B-cells was relatively high due to their smaller sizes (~6 µm diameter) and non-adherent nature. With such a relatively high concentration of SiR-actin, we could not perform fluorescent speckle microscopy. Second, we observed that SiR-actin appeared to stabilize actin structures and reduce actin dynamics, further limiting its use in studying actin dynamics in B-cells.

      Additionally, kymographs are used for foci tracking in Figure 3 and Figure 4. A kymograph is indeed a powerful tool for tracking cell protrusion and retraction, but it is not well suited here, since an F-actin focus is a concentrated point that may not move strictly along the eight selected lines used to generate the kymographs. Another image processing method should be used to track the foci; for example, time series max projection is recommended, if applicable.

      We thank the reviewer for the suggestion and have tried the time series max projection. Unfortunately, it did not provide the resolution to identify individual actin foci, again probably due to the small size of primary mouse B-cells. While kymographs may not track the entire paths of these moving foci, we believe that the conclusions drawn from the kymograph analysis in Figures 3 and 4 are reasonable. We generated eight kymographs for each cell in Figure 3 and three kymographs for each cell in Figure 4 to follow as many actin foci as possible within the spreading to contraction transition time window. Our analysis in Figure 3 identifies the fraction of actin foci originating from lamellipodia. In Figure 4, we used the kymographs to trace the path of putative clusters and used these to calculate their relative lifetimes and speed. While this is not what was suggested by the reviewer, our analysis provides qualitatively similar information to the time series max projection and reasonable comparisons between contracted and non-contracted cells, inhibitor-treated and untreated cells, and wild-type and WASP KO cells.
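
      To make the two approaches in this exchange concrete, here is a toy sketch of what each one computes. It is not the authors' image-analysis pipeline: the frames are tiny hypothetical intensity grids, and the function names `max_projection` and `kymograph` are illustrative only.

```python
def max_projection(frames):
    """Pixel-wise maximum over time for a list of equally sized 2D frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[max(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]


def kymograph(frames, line):
    """Intensity along `line` (a list of (row, col) pixels),
    with one output row per time point."""
    return [[f[r][c] for r, c in line] for f in frames]


# Toy movie: one bright spot moving one pixel to the right per frame along row 1.
frames = []
for t in range(3):
    frame = [[0] * 4 for _ in range(3)]
    frame[1][t] = 9
    frames.append(frame)

projection = max_projection(frames)                       # spot path as a streak
kymo = kymograph(frames, line=[(1, c) for c in range(4)])  # position vs. time
```

      In the projection the moving spot appears as a streak along its whole path but timing is lost, whereas the kymograph keeps timing but only for pixels on the chosen line, which is the limitation the reviewer raises.
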

      The second major issue concerns the relationship between actin foci formation and NMII recruitment in Figure 5. The authors conclude that 'N-WASP and Arp2/3 mediated branched actin polymerization promotes the recruitment and the reorganization of NMII ring-like structures by generating inner F-actin foci in the contact zone'. However, there is a lack of strong evidence directly showing the mechanism by which myosin is recruited and the upstream-downstream relationship between actin foci migration and myosin recruitment. Since myosin-induced actin retrograde flow is a classical model in adherent cells, is it possible that, in activated B cells as well, the recruited myosin drives the formation and migration of actin foci? This reviewer recommends that the authors investigate whether myosin blocking (e.g., using Y27632) can eliminate F-actin foci formation and migration.

      This is an excellent suggestion! In the revised manuscript, we have included new data showing that treatment with the non-muscle myosin II motor inhibitor blebbistatin, which is known to inhibit B-cell contraction but not spreading on Fab’-PLB (Seeley-Fallen et al. 2022. Frontiers in Immunology), interferes with the formation of inner actin foci ring-like structures, which are associated with B-cell contraction. These results together suggest that the generation of inner actin foci ring-like structure depends on the coordination between N-WASP-mediated actin polymerization and myosin contractile activity. We chose to use blebbistatin rather than Y27632 to inhibit non-muscle myosin II because in addition to the ROCK pathway, myosin light chain kinase can also activate myosin II, and Y27632 may have additional effects besides inhibiting myosin activity. The new data are shown in Figure 5G and H and discussed in the revised manuscript.

      Reviewer #2 (Public Review):

      Weaknesses: Minor, as listed below. The working hypothesis of molecular crowding as a way to push out signalling molecules from the BCR dense foci is interesting. The authors provide evidence that this is an active process mediated by N-WASP - Arp2/3 induced actin foci. Another possibility is that BCR dense foci formation is an indirect consequence of lamellipodia retraction. Future work should define the specific role of N-WASP, Arp2/3 and actin in the process of forming BCR dense foci, especially as the BCR continues to signal in the cytoplasm.

      We thank the reviewer for the comments. We have included the possibility that lamellipodial retraction may be involved in increasing the molecular density of BCR clusters and suggested future studies on the potential roles of N-WASP-dependent inner actin foci and actomyosin structures in BCR internalization and intracellular signaling in the Discussion section.

      Reviewer #3 (Public Review):

      The authors prove their claims by means of thorough image analysis, mainly observing and quantifying the fluorescence and the dynamics of single clusters of antigen and actin foci and analyzing two-color dynamic images. They perform their observations in control cells, in pharmacologically perturbed cells where the action of Arp2/3 or N-WASP is inhibited, and in modified primary cells (derived from genetically engineered mice) in which N-WASP or WASP is silenced. The work is sound and complete, the experiments technically excellent and well explained. Some experiments and discussions are objectively harder to describe, and given the length of the work, readers might find themselves lost at times. A graphical abstract/summary of the main way N-WASP ultimately controls signal attenuation would solve this minor point.

      We greatly appreciate the reviewer’s confirmation of our data quality and are delighted to accept the reviewer’s suggestion. In the revised manuscript, we have included a new figure (Figure 10) in the Discussion section, summarizing the results presented in the manuscript as a working model.

      Reviewer #1 (Recommendations For The Authors):

      Some minor points: Figure 1C, E, G and I show three individual symbols, indicating three independent experiments as described in the legend. Please double check for accuracy.

      It is better to show statistical data from a representative repeat rather than the merged means of independent experiments. For example, Figure 1C even shows three "0" data points in CK-666 treated cells, meaning no contracting cells were found among ~75 cells, while other repeats show 45%-50% contracting cells. This applies to all figures involving individual cell imaging data, such as Figure 2D, in which 30 cells from three independent experiments were pooled. The authors should clearly state that those independent experiments are statistically indistinguishable before pooling the data.

      We agree with the reviewer’s comments that these data have variability from individual mice, the quality of isolated primary B-cells, and the lateral mobility of planar lipid bilayers. To show the variability, we displayed the data from each experiment as individual data points. In the revised manuscript, we have utilized three colors of dots to represent three independent experiments in Figure 1C, E, G, and I, Figure 2B-G, and new Figure 5H, which show that the data from the three experiments have the same trend despite the variability.

      In Figure 7B-C, Figure 8, and Figure 9, it was hard to tell which groups the significance tests compared. Please describe this in more detail in the figure legends or the Methods section.

      In the legend, the authors state that the blue points in Figure 7B represent individual pCD79a clusters within an equal number of BCR clusters from each time point. The authors used means to qualify the change in the distribution of blue points. These should be clearly stated in the Methods. Total BCR cluster numbers should also be shown. This applies to Figures 7B, 7C, and 7D and to all panels in Figures 8 and 9.

      We thank the reviewer for pointing this out. We have revised Figures 7-9, where we used square brackets to indicate the groups of clusters (blue points) being compared. We have also provided additional information in the figure legends and Methods section.

      Reviewer #2 (Recommendations For The Authors):

      199-200: What is the consequence of increased WASP activation in N-WASP knockout B cells? Is this evaluated as increased pWASP activity and/or increased actin polymerization of WASP knockout B cells? Do WASP and N-WASP have an additive or counteractive effect on each other during spreading and contraction?

      Indeed, the relationship between WASP and N-WASP, which are co-expressed in B-cells and other immune cells, is fascinating. Our previous studies, using WASP germline knockout, B-cell-specific N-WASP knockout, and WASP and N-WASP double knockout mice, showed that WASP and N-WASP have both additive and counteractive effects during B-cell spreading, but B-cell contraction only depends on N-WASP (Liu et al. 2013. PLoS Biol). Double knockout B-cells fail to spread, and WASP knockout B-cells show reduced spreading but still contract, showing their additive effects. However, WASP and N-WASP suppress each other for activation, as detected by their phosphorylation. Phosphorylated WASP increases in the B-cell contact zone first, and phosphorylated N-WASP increases later when the phosphorylated WASP level decreases. Knocking out one of them enhances the phosphorylation of the other. Consequently, N-WASP knockout B-cells show increased spreading, probably due to enhanced activation of WASP, but exhibit delayed contraction. The revised manuscript has expanded the discussion on this area to relate it to the results presented in this manuscript.

      560-563: Were Syk and SHIP-1 measured in the same cell? If not, the conclusion should be tempered.

      Unfortunately, antibodies specific for Syk and SHIP-1 were from the same host, which did not allow us to stain them in the same cells. The revised manuscript has discussed this as a shortcoming of our work.

      1204-1205: Explain better "three randomly positioned kymographs were generated" - how were they selected?

      We apologize for this unclear sentence. The three kymographs were positioned to track as many inner F-actin foci as possible.

      328: Change "abolished" to "reduced" to describe the data. 354-356: Unclear sentence, please edit. 1171: (H) should be (G). 1325: "PI" should be "FI".

      We thank the reviewer for finding these typos and unclear sentences. We have made the corrections accordingly.

      Methods: The description of the TIRF microscopy method is good. Regarding the image analysis, it is somewhat difficult to understand what was analyzed just by reading the text. Please show an example of the analysis pipeline, from a raw image through the processing steps.

      Figure 6-figure supplement 2 shows the image analysis process for tracking Fab’ clusters. We utilized the same approach for the image analysis of Figures 7-9.

      Discussion: Add a paragraph to state the limitations of the study. How do the findings here translate into in vivo activation of B cells and how can this be addressed based on the data presented in this study.

      We thank the reviewer for the suggestion. In several paragraphs of the revised Discussion section, we have brought up the limitations of the study and how these limitations affect the data interpretation. In addition, we have added Figure 10 and the associated text to present our working model, which explains how our findings reveal the cellular mechanism by which BCR surface signaling amplification transitions into attenuation, likely occurring in vivo.

      Figure 2: Add an example of the image analysis for foci determination. From the images, it is not always clear what is a focus and what is not, which makes the "number of foci" data difficult to evaluate.

      We have added arrows to Figure 2A to indicate all identified inner F-actin foci in images.

      Figure 3: add a kymograph for the WKO analysis.

      In the revised Figure 4, we have provided a kymograph of a WKO B cell.

      Figure 4M: the "relative speed" of the "WT" samples is lower than that of the other control samples, "DMSO" and "CK-689". The conclusion is that WKO cells have a similar "relative speed" to "WT" cells, but in fact the "WT" cells may have responded poorly in this experiment. What is the authors' experience and explanation?

      We agree that the relative speeds of inner actin foci in the contact zone of WT and WKO B-cells are relatively low compared to DMSO and CK-689. Based on our experience, this parameter is very sensitive to the lateral mobility of planar lipid bilayers. We could only perform one pair of conditions using live cell images each time. The WT and WKO experiments were done at the end and might use relatively aged liposomes. However, it did not affect the number of inner actin foci formed and their relative lifetime, consistent with their similar relative speeds. Unfortunately, we lost the LifeAct-GFP-expressing WKO mouse colony and cannot redo this experiment using freshly made liposomes within a reasonable time.

      Figure 7B-D: Add a more detailed legend for the black and brown lines in the dot plots.

      We have expanded the legend for Figure 7B-D to provide additional details.

      Figure 8-9: Show representative images for SYK, pSYK, SHIP-1 and pSHIP-1. Add a more detailed legend for the black and brown lines in the dot plots.

      We have provided representative images for Syk, pSyk, SHIP-1, and pSHIP-1 in revised Figure 8 and 9.

      Reviewer #3 (Recommendations For The Authors):

      From the paper one understands that NMII is recruited by the actin foci and that this recruitment pushes the foci towards the center of the synapse, in what resembles a positive feedback. Could the authors better elucidate this point? What happens at the peak of NMII recruitment? Could this be a mechanism used by the cell to end the contact and detach (which probably cannot be observed in this experimental setup)?

      This is an excellent comment! We have recently shown that NMIIA recruitment peaks right before B-cell contraction occurs, and inhibition of NMII by inhibitors or B-cell conditional knockout blocks B-cell contraction and enhances signaling (Seeley-Fallen et al. 2022. Frontiers in Immunology). In the revised manuscript, we have included new data showing that treatment with the NMII motor inhibitor blebbistatin, which is known to inhibit B-cell contraction but not spreading on Fab’-PLB (Seeley-Fallen et al. 2022. Frontiers in Immunology), interferes with the formation of inner actin foci associated with B-cell contraction. These results together suggest that the generation of inner actin foci depends on the coordination between N-WASP-activated actin polymerization and myosin contractile activity, supporting the reviewer’s comment. The new data are shown in Figure 5G and H and discussed in the revised manuscript.

      Whether the recruited NMII pulls B-cells away from antigen-presenting surfaces remains an interesting question. We have previously shown that high-affinity interaction of surface BCRs with membrane-anchored antigen can cause NMII-dependent B-cell membrane permeabilization, which triggers lysosome exocytosis and lysosomal enzyme-mediated antigen cleavage, allowing antigen internalization and presentation to T-cells (Maeda et al. 2021. eLife). Furthermore, NMII is required for B cells to internalize surface antigens (Natkanski et al. 2013. Science). These results support the possibility that actomyosin structures formed during B-cell contraction may further drive B-cells to internalize antigen. We have discussed this interesting point in the revised manuscript.

      Some experiments/quantifications are a bit more complex than others, and a reader might find them hard to follow (in particular Figs 7, 8 and 9). Comprehension could be improved by providing a guide to reading them. For example, it is not clear what the population distribution represents (and it is not particularly affected by any manipulation). How were the groups for the tests chosen? It seems they are based on intensity categories taken every 100 units: is that the case? Even if arbitrary, this should be stated in the legend.

      We thank the reviewer for understanding the complexity of image analysis and pointing out the unclear points. Based on the reviewer’s comments, we have revised Figures 7-9 and the figure legend. We utilized square brackets to indicate groups of clusters (blue points) being compared. The comparison groups were chosen arbitrarily based on Fab’ peak fluorescence intensity every 90 units for Figure 7 and 8 and every 100 units for Figure 9.
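
      The grouping rule described in this response (fixed-width bins of Fab' peak fluorescence intensity) can be sketched as follows. This is only an illustration under stated assumptions: the `(peak_intensity, value)` pairs, the per-bin mean as the summary statistic, and the name `bin_by_intensity` are hypothetical and are not taken from the manuscript.

```python
from collections import defaultdict
from statistics import mean


def bin_by_intensity(clusters, bin_width):
    """Group (peak_intensity, value) pairs into fixed-width intensity bins.

    Returns a dict mapping each bin's lower edge to the mean value of the
    clusters that fall in that bin.
    """
    groups = defaultdict(list)
    for intensity, value in clusters:
        lower_edge = int(intensity // bin_width) * bin_width
        groups[lower_edge].append(value)
    return {edge: mean(values) for edge, values in sorted(groups.items())}


# Hypothetical clusters: (Fab' peak intensity, some per-cluster measurement).
toy_clusters = [(30, 1.0), (85, 3.0), (95, 2.0), (170, 4.0), (185, 6.0)]
binned_90 = bin_by_intensity(toy_clusters, bin_width=90)
binned_100 = bin_by_intensity(toy_clusters, bin_width=100)
```

      Changing `bin_width` from 90 to 100 moves clusters between groups, which is why the reviewer asks for the bin width to be stated explicitly in the legend.
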

      Can the authors speculate on how the actin organization passes from actin foci to recruitment of NMII and arc formation? Is it a rearrangement of the actin network (percolation) or simply recruitment of monomers?

      Our previous and new results show that both N-WASP-activated Arp2/3 and NMII are required to form inner F-actin foci. Based on these results, we speculate that N-WASP and Arp2/3-mediated actin polymerization may initiate the process and recruit NMII, and recruited NMII coordinates with actin polymerization to reorganize actin structures, promoting inner actin foci maturation and arc formation. We have included these possibilities in the revised discussion.

      The role of SHIP recruitment as way to inhibit the signal downstream of the BCR is an interesting finding. Is this related to the termination of the synapse? Could we relate the time scales (accurately measured in this work) to contact times observed in vivo?

      The reviewer raises an interesting question. In the discussion section, we have speculated that the actomyosin structures responsible for B-cell contraction are potentially the precursor cytoskeleton structures for antigen internalization. However, the relationship of B-cell contraction and signaling attenuation with the termination of the synapse remains unclear.

      The BCR has been shown to be internalised mechanically: do these new data suggest a mechanism for force generation in antigen internalization at the actin foci? Related to that, how do the dynamics of N-WASP recruitment relate to the force measurements highlighted in Traction Force Microscopy experiments (see for example Wang Sci.Signal. 2018, Kumari Nat.Comm.2019)? What happens in situations where the actin foci cannot be transported, e.g. in the more classical antigen-on-coverslip configuration?

      Indeed, our results allow us to speculate that the actomyosin structures responsible for B-cell contraction potentially contribute to antigen internalization by mechanical forces. We previously showed that the B-cell-specific N-WASP knockout drastically reduced BCR internalization of soluble antigen (Liu et al. 2013. PLoS Biol), and that NMII is required for BCR internalization of membrane-associated antigen (Maeda et al. 2021. eLife and Natkanski et al. 2013. Science). The effect of N-WASP knockout on the internalization of membrane-associated antigen and traction forces generated at the contact membrane and whether traction forces are generated from the inner F-actin foci have not been determined but will be pursued in the future.

      Our previous publication compared the BCR and actin dynamics of B-cells interacting with Fab’ tethered to planar lipid bilayers (Fab’-PLB) and cover glass (Fab’-G) (Ketchum et al. 2014. Biophys J). B-cells interacting with Fab’-G do not contract and generate inner F-actin foci and exhibit less dynamic BCR clusters and actin cytoskeleton than B-cells interacting with Fab’-PLB. Actin foci remain coincident with Fab’ clusters on glass rather than being positioned behind Fab’ clusters on PLB, thus driving their centripetal movement.

      Minor remarks: When several experiments (mice) are presented in dot plots (e.g. fig 2D-G 4J-M), color dot plot (so called "smart plot") where each experiment is identified by a color, could be used to highlight the sample-to-sample variability.

      This is an excellent suggestion. In the revised manuscript, we have utilized three shades of dots to represent the data points from three independent experiments.

      Fig 6A: the fluorophore should be indicated in the picture (Fab'-AF546)

      The suggested correction has been made.

      Fig 6D: how is the contraction phase (purple rectangle) determined? Curve by curve or on the average curve? Please specify this in the legend.

      The contraction phase (purple rectangle) was determined using the average curve of the contact area by IRM over time. We have added this sentence to the revised figure legend.

      Minor typos in the Materials and Methods: in some cases C56BL/6 is written instead of C57BL/6.

      Corrected.

    2. eLife assessment

      This is an important study highlighting a distinct role of WASP dependent actin foci in B cell antigen receptor signalling. The evidence supporting the conclusions is compelling. The proposal of higher molecular density in B cell receptor clustering leading to kinase exclusion and attenuated signalling is provocative as it contrasts with models for other antigen receptors.

    3. Reviewer #1 (Public Review):

      In this study, the authors demonstrated a new model in which B cell contraction after antigen encounter depends on N-WASP-branched actin polymerization. This conclusion is reached through a systematic comparison of genetically modified mice vs wild-type mice and of inhibitor-treated cells vs control cells. By imaging how B cells interact with an antigen-coated planar lipid bilayer, the authors further suggested that the contraction event may provide B cells with a channel to dismiss downstream kinases in order to attenuate B cell activation signaling.

      In this revised version, the authors have fully addressed my concerns raised against the initial submission of their studies.

    4. Reviewer #2 (Public Review):

      Bhanja et al have examined how actin polymerization switches B-cell receptor (BCR) signaling from amplification to attenuation. The authors have examined B cell spreading and contraction using lipid bilayers to assess the molecular regulation of BCR signalling during the contraction phase. Their data provide evidence that N-WASP activated Arp2/3 generates centripetally moving actin foci and contractile actomyosin from lamellipodia actin networks. This generates BCR dense foci that push out both stimulatory kinases and inhibitory phosphatases. The study provides novel insight into how B cells, upon activation, attenuate BCR signalling by contraction of the actin cytoskeleton and clustering of BCR foci, and shows that this dynamic response is mediated by N-WASP and Arp2/3.

      Strengths: The manuscript is well written and results, methods, figures and legends described in detail making it easy to follow the experimental setup, analysis, and conclusions. The authors achieved their aims, and the results support their conclusions.

      Weaknesses: Minor. The working hypothesis of molecular crowding as a way to push out signalling molecules from the BCR dense foci is interesting. The authors provide evidence that this is an active process mediated by N-WASP - Arp2/3 induced actin foci. Another possibility discussed in the revised version is that BCR dense foci formation is an indirect consequence of lamellipodia retraction. Future work should define the specific role of N-WASP, Arp2/3 and actin in the process of forming BCR dense foci, especially as the BCR continues to signal in the cytoplasm.

    5. Reviewer #3 (Public Review):

      This work shows how, in the formation of the immune synapse, the B cell controls the contraction phase, the formation and retraction of actin structures concentrating the antigen (actin foci), and, ultimately, global signal attenuation. The authors use a combination of TIRF microscopy and original image quantification to show that Arp2/3 activated by N-WASP controls a pool of actin concentrated in foci (situated in the synapse), formed and transported centripetally towards the center of the synapse through myosin II mediated contractions. These contractions concentrate the B cell receptors (BCR) in the center and promote disassembly of the stimulatory kinase Syk as well as the disassociation from the BCR of the inhibitory phosphatase SHIP, a process that entails the attenuation of the BCR signal.

      The authors prove their claims by means of thorough image analysis, mainly observing and quantifying the fluorescence and the dynamics of single clusters of antigen and actin foci and analyzing two-color dynamic images. They perform their observations in control cells, in pharmacologically perturbed cells where the action of Arp2/3 or N-WASP is inhibited, and in modified primary cells (derived from genetically engineered mice) in which N-WASP or WASP is silenced. The work is sound and complete, the experiments technically excellent and well explained.

      In the revised manuscript, the authors respond to all of the referees' suggestions and add new data and comments. In particular, by suppressing NMII activation (with blebbistatin), they show that NMII contraction plays a role (in coordination with N-WASP mediated actin polymerization) in the generation of actin foci ring-like structures.

      This work adds important information to the current view of B cell activation; in particular, it links the contraction phase to the actin foci that have been recently characterized. Moreover, the late phase of immune synapse formation is poorly investigated, but it is crucial for the fate of the cell: this work provides an explanation for the attenuation of the signal that might lead to the termination of the synapse.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The investigators sought to determine whether Marco regulates the levels of aldosterone by limiting uptake of its parent molecule cholesterol in the adrenal gland. Instead, they identify an unexpected role for Marco on alveolar macrophages in lowering the levels of angiotensin-converting enzyme in the lung. This suggests an unexpected role of alveolar macrophages and lung ACE in the production of aldosterone.

      Strengths:

      The investigators suggest an unexpected role for ACE in the lung in the regulation of systemic aldosterone levels. The investigators suggest important sex-related differences in the regulation of aldosterone by alveolar macrophages and ACE in the lung. Studies to exclude a role for Marco in the adrenal gland are strong, suggesting an extra-adrenal source for the excess Marco observed in male Marco knockout mice.

      Weaknesses:

      While the investigators have identified important sex differences in the regulation of extrapulmonary ACE in the regulation of aldosterone levels, the mechanisms underlying these differences are not explored. The physiologic impact of the increased aldosterone levels observed in Marco -/- male mice on blood pressure or response to injury is not clear. The intracellular signaling mechanism linking lung macrophage levels with the expression of ACE in the lung is not supported by direct evidence.

      Reviewer #2 (Public Review):

      Summary:

      Tissue-resident macrophages are increasingly thought to exert key homeostatic functions and contribute to physiological responses. In the report of O'Brien and colleagues, the idea that the macrophage-expressed scavenger receptor MARCO could regulate adrenal corticosteroid output at steady-state was explored. The authors found that male MARCO-deficient mice exhibited higher plasma aldosterone levels and higher lung ACE expression as compared to wild-type mice, while the availability of cholesterol and the machinery required to produce aldosterone in the adrenal gland were not affected by MARCO deficiency. The authors take these data to conclude that MARCO in alveolar macrophages can negatively regulate ACE expression and aldosterone production at steady-state and that MARCO-deficient mice suffer from secondary hyperaldosteronism.

      Strengths:

      If properly demonstrated and validated, the fact that tissue-resident macrophages can exert physiological functions and influence endocrine systems would be highly significant and could be amenable to novel therapies.

      Weaknesses:

      The data provided by the authors currently do not support the major claim of the authors that alveolar macrophages, via MARCO, are involved in the regulation of a hormonal output in vivo at steady-state. At this point, there are two interesting but descriptive observations in male, but not female, MARCO-deficient animals, and overall, the study lacks key controls and validation experiments, as detailed below.

      Major weaknesses:

      1) According to the reviewer's own experience, the comparison between C57BL/6J wild-type mice and knock-out mice for which precise information about the genetic background and the history of breedings and crossings is lacking, can lead to misinterpretations of the results obtained. Hence, MARCO-deficient mice should be compared with true littermate controls.

      2) The use of mice globally deficient for MARCO combined with the fact that alveolar macrophages produce high levels of MARCO is not sufficient to prove that the phenotype observed is linked to alveolar macrophage-expressed MARCO (see below for suggestions of experiments).

      3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kaliemia would be lower in MARCO-deficient mice. In addition, co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      1. Corticosterone levels in male Marco -/- mice are not significantly different, but there is (by eye) substantially more variability in the knockout compared to the wild type. A power analysis should be performed to determine the number of mice needed to detect a similar % difference in corticosterone to the difference observed in aldosterone between male Marco knockout and wild-type mice. If necessary the experiments should be repeated with an adequately powered cohort.

      We thank the reviewer for their comments. We are prepared to carry out these power calculations and repeat the experiment if necessary.
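      A power calculation of the kind requested above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a two-sided, two-sample comparison of means under a normal approximation, and the function name `n_per_group` and the effect sizes used are hypothetical. The effect size is the expected group difference divided by the pooled standard deviation, which would have to be estimated from the existing corticosterone data.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate number of animals per group for a two-sided,
    two-sample comparison of means (normal approximation).

    effect_size is the standardized difference (Cohen's d):
    expected difference in means / pooled standard deviation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes, for illustration only.
    for d in (0.5, 1.0):
        print(d, n_per_group(d))
```

      Because the normal approximation slightly underestimates the sample size needed for a t-test with small groups, the result is best read as a lower bound.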

      2. All of the data throughout the MS (particularly data in the lung) should be presented in male and female mice. For example, the induction of ACE in the lungs of Marco-/- female mice should be absent. Similar concerns relate to the dexamethasone suppression studies. It would also be useful if the single cell data could be examined by sex; this should be possible even post hoc using Xist etc.

      We are prepared to measure the levels of Ace, biosynthetic enzyme expression in female mice by qPCR, and ACE protein expression by IF. Additionally, we will test females using the dexamethasone suppression study. The single cell RNA seq analysis was used primarily to inform our model, not for experimental readout. We will explore the dataset as the reviewer suggests and will add additional plots if the analysis substantively changes our previous findings.

      3. IF is notoriously unreliable in the lung, which has high levels of autofluorescence. This is the only method used to show ACE levels are increased in the absence of Marco. Orthogonal methods (e.g. immunoblots of flow-sorted cells, or ideally CITE-seq that includes both male and female mice) should be used.

      We have negative controls for antibody staining. Additionally, we also used qPCR to show an increase in Ace mRNA expression in the lung.

      4. Given the central importance of ACE staining to the conclusions, validation of the antibody should be included in the supplement.

      The vendor of this antibody has verified by cell treatment that the antibody binds to the stated antigen. We are prepared to additionally validate the antibody using other tissues as controls, though we point out that ACE is expressed, albeit at lower levels, in endothelial cells throughout the body and so some signal is to be expected in most if not all tissues.

      5. The link between alveolar macrophage Marco and ACE is poorly explored.

      We are prepared to do co-culture experiments of alveolar macrophages and endothelial cells and measure ACE/Ace expression as a consequence.

      6. Mechanisms explaining the substantial sex difference in the primary outcome are not explored.

      We argue that this would be outside the scope of this project, though we would consider exploring such experiments in future studies.

      7. Are there physiologic consequences either in homeostasis or under stress to the increased aldosterone (or lung ACE levels) observed in Marco-/- male mice?

      We are prepared to measure blood electrolytes and blood pressure in Marco-deficient and Marco-sufficient mice.

      Reviewer #2 (Recommendations For The Authors):

      Below is a suggestion of important control or validation experiments to be performed in order to support the authors' claims.

      1) It is imperative to validate that the phenotype observed in MARCO-deficient mice is indeed caused by the deficiency in MARCO. To this end, littermate mice issued from the crossing between heterozygous MARCO +/- mice should be compared to each other. C57BL/6J mice can first be crossed with MARCO-deficient mice in F0, and F1 heterozygous MARCO +/- mice should be crossed together to produce F2 MARCO +/+, MARCO +/- and MARCO -/- littermate mice that can be used for experiments.

      We thank the reviewer for their comments. We recognise the concern of the reviewer but due to limited experimenter availability we are unable to undertake such a breeding programme to address this particular concern.

      2) The use of mice in which AM, but not other cells, lack MARCO expression would demonstrate that the effect is indeed linked to AM. To this end, AM-deficient Csf2rb-deficient mice could be adoptively transferred with MARCO-deficient AM. In addition, the phenotype of MARCO-deficient mice should be restored by the adoptive transfer of wild-type, MARCO-expressing AM. Alternatively, bone marrow chimeras in which only the hematopoietic compartment is deficient in MARCO would be another option, albeit less specific for AM.

      We recognise the concern of the reviewer. We have access to an AM cell line which we plan to use to do co-culture experiments with an ACE-expressing endothelial cell line. In this way we will test whether this effect is linked to AMs.

      3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kaliemia would be lower in MARCO-deficient mice. Similar read-outs could also be performed in the models proposed in point 2).

      We are prepared to measure blood electrolytes and blood pressure (via tail cuff method) in Marco-deficient and Marco-sufficient mice.

      4) Co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

      To address this concern, we plan to do a co-culture experiment as outlined above.

      Broadly, we thank the reviewers for taking the time to critically appraise this manuscript. The reviewers' primary concern seems to be the lack of direct evidence of an effect of AMs on endothelial Ace expression, which we plan to address as outlined above. We will adjust our conclusions as appropriate based on the results of these experiments.

    2. eLife assessment

      O'Brien and co-authors addressed how statins reduce levels of aldosterone in humans and provide important data demonstrating that tissue-resident macrophages can exert physiological functions and influence endocrine systems. However, the strength of evidence, as of now, is incomplete, as the sole description of the phenotype of MARCO-deficient mice is insufficient to claim that MARCO in alveolar macrophages can negatively regulate ACE expression and aldosterone production at steady-state. The work will be of broad interest to cell biologists and immunologists.

    3. Reviewer #1 (Public Review):

      Summary:

      The investigators sought to determine whether Marco regulates the levels of aldosterone by limiting uptake of its parent molecule cholesterol in the adrenal gland. Instead, they identify an unexpected role for Marco on alveolar macrophages in lowering the levels of angiotensin-converting enzyme in the lung. This suggests an unexpected role of alveolar macrophages and lung ACE in the production of aldosterone.

      Strengths:

      The investigators suggest an unexpected role for ACE in the lung in the regulation of systemic aldosterone levels. The investigators suggest important sex-related differences in the regulation of aldosterone by alveolar macrophages and ACE in the lung. Studies to exclude a role for Marco in the adrenal gland are strong, suggesting an extra-adrenal source for the excess Marco observed in male Marco knockout mice.

      Weaknesses:

      While the investigators have identified important sex differences in the regulation of extrapulmonary ACE in the regulation of aldosterone levels, the mechanisms underlying these differences are not explored. The physiologic impact of the increased aldosterone levels observed in Marco -/- male mice on blood pressure or response to injury is not clear. The intracellular signaling mechanism linking lung macrophage levels with the expression of ACE in the lung is not supported by direct evidence.

    4. Reviewer #2 (Public Review):

      Summary:

      Tissue-resident macrophages are increasingly thought to exert key homeostatic functions and contribute to physiological responses. In the report of O'Brien and colleagues, the idea that the macrophage-expressed scavenger receptor MARCO could regulate adrenal corticosteroid output at steady-state was explored. The authors found that male MARCO-deficient mice exhibited higher plasma aldosterone levels and higher lung ACE expression as compared to wild-type mice, while the availability of cholesterol and the machinery required to produce aldosterone in the adrenal gland were not affected by MARCO deficiency. The authors take these data to conclude that MARCO in alveolar macrophages can negatively regulate ACE expression and aldosterone production at steady-state and that MARCO-deficient mice suffer from secondary hyperaldosteronism.

      Strengths:

      If properly demonstrated and validated, the fact that tissue-resident macrophages can exert physiological functions and influence endocrine systems would be highly significant and could be amenable to novel therapies.

      Weaknesses:

      The data provided by the authors currently do not support the major claim of the authors that alveolar macrophages, via MARCO, are involved in the regulation of a hormonal output in vivo at steady-state. At this point, there are two interesting but descriptive observations in male, but not female, MARCO-deficient animals, and overall, the study lacks key controls and validation experiments, as detailed below.

      Major weaknesses:

      1) According to the reviewer's own experience, the comparison between C57BL/6J wild-type mice and knock-out mice for which precise information about the genetic background and the history of breedings and crossings is lacking, can lead to misinterpretations of the results obtained. Hence, MARCO-deficient mice should be compared with true littermate controls.

      2) The use of mice globally deficient for MARCO combined with the fact that alveolar macrophages produce high levels of MARCO is not sufficient to prove that the phenotype observed is linked to alveolar macrophage-expressed MARCO (see below for suggestions of experiments).

      3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kaliemia would be lower in MARCO-deficient mice. In addition, co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Our comments on the initial eLife assessment

      “This study presents a useful inventory of the joint effects of genetic and environmental factors on psychotic-like experiences, and identifies cognitive ability as a potential underlying mediating pathway. The data were analyzed using solid and validated methodology based on a large, multi-center dataset. The claim that these findings are of relevance to psychosis risk and have implications for policy changes are partially supported by the results”

      We sincerely appreciate the editor and reviewers for their valuable feedback and their willingness to accommodate our perspectives in the first revision. In this revision, the comments from the reviewers have allowed us to further improve our manuscript. Regarding the eLife assessment, we would like to discuss two points.

      Firstly, regarding your point that our “findings are of relevance to psychosis risk…partially supported…”, we want to emphasize that our study is closely related to psychosis risk. Childhood psychotic-like experiences (PLEs) are closely linked to psychotic risk and have been shown to increase the risk of general psychopathology, as mentioned in our Introduction and Discussion.

      The reviewers asked for clearer differentiation between PLEs and schizophrenia, which we incorporated in this revision (line 100~111; line 419~430). So, this revised version now clearly points out that findings are relevant primarily to psychosis risk, and only partially relevant to schizophrenia risk.

      Secondly, regarding “…implications for policy changes are partially supported…”, we have described our study’s social contribution more clearly and specifically. Incorporating the comments, we have revised the text to state that our study offers insight for future studies by showing the importance of integrative approaches that consider multi-factorial neurocognition and psychopathology ranging from genes to environment (line 503~512), rather than offering direct policy implications.

      Our collaboration with eLife and the reviewers has proven satisfactory and enriching. The community, coupled with the innovative system and culture established around eLife, has significantly advanced the progression of scientific research. We are privileged to contribute to this endeavor.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I am happy with the revisions provided by the authors and I think most of my concerns have been addressed satisfactorily. One remaining concern is the authors' conflation of PLEs and schizophrenia. They stated, for example, that it is necessary to adjust for schizophrenia PGS. Even though studies have found a statistical relationship between schizophrenia PGS and PLEs, this relationship is not very strong (although statistically significant) and other studies have found no relationship. Similarly, having PLEs increases the risk of developing psychosis, but that does not necessarily mean that this risk is substantial or specific. I think this needs more nuance in the manuscript and the term 'schizophrenia' should be used sparsely and very carefully as the paper has focused on PLEs. Otherwise, great work on the revisions, thank you.

      Thank you for your comment on the use of PLEs and schizophrenia. We clearly understand the differences between the two and we made relevant corrections throughout the manuscript. In particular, we added that PLEs are not a direct predictor of schizophrenia and corrected any expressions that may imply that PLEs are closely related to schizophrenia in the Introduction.

      “Psychotic-like experiences (PLEs), which are prevalent in childhood, indicate the risk of psychosis (van der Steen et al., 2019; Van Os & Reininghaus, 2016). Although they are not a direct precursor of schizophrenia, children reporting PLEs in ages of 9-11 years are at higher risk of psychotic disorders in adulthood (Kelleher & Cannon, 2011; Poulton et al., 2000). PLEs also point towards the potential for other psychopathologies including mood, anxiety, and substance disorders (van der Steen et al., 2019), are linked to deficits in cognitive intelligence (Cannon et al., 2002; Kelleher & Cannon, 2011) and show a stronger association with environmental risk factors during childhood than other internalizing/externalizing symptoms (Karcher, Schiffman, et al., 2021).

      Maladaptive cognitive intelligence may act as a mediator for the effects of genetic and environmental risks on the manifestation of psychotic symptoms (Cannon et al., 2000; Keefe et al., 2006; Reichenberg et al., 2005).” (line 100~111)

      We also revised any expressions that could be perceived as implying relevance to schizophrenia in the Discussion. “Prior research identifying the mediation of cognitive intelligence focused on either genetic (Karcher, Paul, et al., 2021) or environmental factors (Lewis et al., 2020) alone. Studies with older clinical samples have shown that cognitive deficit may be a precursor for the onset of psychotic disorders (Eastvold et al., 2007; Fett et al., 2020; Vorstman et al., 2015). Our study advances this by demonstrating the integrated effects of genetic and environmental factors on PLEs through the cognitive intelligence in 9-11 years old children. Such comprehensive analysis contributes to assessing the relative importance of various factors influencing children's cognition and mental health, and it can aid future studies designed for identifying health policy implications. Considering the directions and magnitudes of the effects, though the effects of PGS remain significant, aggregated effects of environmental factors account for much greater degrees on PLEs.” (line 419~430)

      Reviewer #2 (Recommendations For The Authors):

      I thank the authors for addressing most of my comments. I feel the manuscript has already greatly improved.

      I have a few more comments.

      1) Although I did not make this comment, I find the authors' reply to the following comment by Reviewer #1 unclear: Original comment 'I like that the assessment of CP (cognitive performance) and self-reports PLEs is of good quality. However, I was wondering which 4 items from the parent-reported CBCL (Child Behavior Checklist) were used and how did they correlate with the child-reported PLEs? And how was distress taken into account in the child self-reported PLEs measurement? Which PLEs measures were used?'

      The authors' response refers to correlation coefficients, but I think Reviewer #1's inquiry was on more than these correlations.

      Thank you for your concern. We think that this comment was referring to our previous manuscript submitted elsewhere. In our initial submission to eLife, we already added the details about the four items from the parent-reported CBCL and how distress was considered in the child self-reported PLEs measurement (Appendix S1, page 48).

      2) Regarding the authors' reply that they have 'standardized the use of 'cognitive capacity' - I do not understand what this means. How exactly was this term standardized? In fact, I can find the term 'cognitive capacity' only once and it seemed to have been deleted from the manuscript. This is fine, but it doesn't clearly align with the statement that this term has been standardized.

      We apologize for causing such confusion. What we meant was that throughout our revised manuscript, we used the term “cognitive phenotypes” instead of “cognitive capacity”.

      3) Regarding my initial comment that 'it needs to be described how cognitive performance was defined in Lee 2018.' - I believe this is still not clarified. The authors write 'CP was measured as the respondent's score on cognitive ability assessments', but it remains unclear what exactly these assessments were.

      Thank you for pointing this out. We added that “CP, measured as the respondent's score on cognitive ability assessments of general cognitive function and verbal-numerical reasoning, was assessed in participants from the COGENT consortium and the UK Biobank” (line 204~206).

      4) Regarding the authors' reply to my comment 'In the 'Path Modeling' section, please explain what 'factors and components' concretely refer to. How is this different from a standard SEM with latent factors?'

      I can see that the authors explained 'components' (=the weighted sum of observed variables), but please also add what you mean by 'factors' - and how these are different from 'components' (line 284). Furthermore, I don't think it is correct that SEMs can only model latent factors, but not components (=measured variables). I also cannot see how using a weighted sum of observed variables controls more effectively for bias in estimation than latent factors. However, even though I do have some knowledge on this method, I'm not an expert and would appreciate the authors, other reviewer and/or editor to weigh in on this point.

      Thank you for pointing this out. We added that latent factors are indirectly measured indicators that explain the covariance among observed variables (line 263~271). We also added that the standard SEM method using latent factors assumes that observed variables within each construct share a common underlying factor; if this assumption is not met, the standard SEM method cannot effectively control for biases. This is why we used the IGSCA method, which addresses this limitation by allowing the use of both composite and latent factors as constructs.

      “Standard SEM using latent factors (i.e., indirectly measured indicators that explain the covariance among observed variables) to represent indicators such as PGS or family SES relies on the assumption that observed variables within each construct share a common underlying factor. If this assumption is violated, standard SEM cannot effectively control for estimation biases. The IGSCA method addresses this limitation by allowing for the use of composite indicators (i.e., components)—defined as a weighted sum of observed variables—as constructs in the model, more effectively controlling bias in estimation compared to the standard SEM. During estimation, the IGSCA determines weights of each observed variable in such a way as to maximize the variances of all endogenous indicators and components.” (line 263~271)
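      The definition of a component as a weighted sum of observed variables can be illustrated with a short sketch. This is not the IGSCA estimation procedure: IGSCA estimates the weights itself, whereas here the weights are fixed, hypothetical values, and the function names (`standardize`, `component_scores`) are introduced only for illustration.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale observations to zero mean and unit (population) variance."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


def component_scores(indicators, weights):
    """Return one composite score per participant.

    indicators: dict mapping variable name -> list of observations
    weights:    dict mapping variable name -> weight (hypothetical here;
                IGSCA would estimate these from the data)
    """
    lengths = {len(v) for v in indicators.values()}
    if len(lengths) != 1:
        raise ValueError("all indicators need the same number of observations")
    z = {name: standardize(vals) for name, vals in indicators.items()}
    n = lengths.pop()
    return [sum(weights[name] * z[name][i] for name in z) for i in range(n)]


if __name__ == "__main__":
    # Toy data for three participants; values and weights are made up.
    observed = {"income": [30.0, 50.0, 70.0], "education": [12.0, 16.0, 14.0]}
    print(component_scores(observed, {"income": 0.6, "education": 0.4}))
```

      A latent factor, by contrast, is not computed directly from the observed variables in this way; it is inferred as the common source of their covariance.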

      5) I overall disagree with the authors' following statement 'It has been suggested from prior studies that these variables (PGS, family SES, neighborhood SES, positive family and school environment, and PLEs) are less likely to share a common factor', but I appreciate the authors' argument.

      Thank you for your comment. To clarify our statement in the manuscript, we changed the sentence to “Considering that the observed variables of the PGSs, family SES, neighborhood SES, positive family and school environment, and PLEs are evaluated as a composite index by prior research, the IGSCA method can mitigate bias more effectively by representing these constructs as components” (line 274~277).

      6) Regarding 'genetic ethnicity': please describe your methods on how this was defined.

      Genetic ethnicity was defined as the genetic ancestry of participants, which is included as one of the observations in the original ABCD Study data. To avoid further confusion, we corrected ‘genetic ethnicity’ to ‘genetic ancestry’ throughout the manuscript.

      7) Regarding 'a more direct genetic predictor of PLEs' - I still don't understand what the contrast is here. More direct than what else?

      The description was unclear; we removed it from our manuscript.

      8) Regarding the factor loadings in Figure 3: I don't understand how deprivation loads positively on 'low neighborhood SES', but poverty loads negatively. Shouldn't they both show the same direction of effect/loading on neighbourhood SES, while 'years of residency' should show the opposite direction (i.e., deprivation and poverty = risk, while years of residency = protective)? Are these unexpected loadings?

      The authors did not yet respond to this point: 'Please also add the autocorrelations between the 3 PLE measures. I assume these were also modelled statistically, given the strong correlations between time points?' Were these correlations not modelled? Why not?

      Figure 3B is still unclear. Was intelligence included here? What is the difference between Figure 3A and B? The legend suggests that 3B shows the indirect effects, but figure 3B looks like a direct effect, while 3A seem to show the indirect effect.

      The reviewer’s confusion resulted from our incorrect description. The factor loadings of low neighborhood SES were marked incorrectly. The loading for ‘years of residence’ and ‘poverty’ should be switched: -0.3648 for ‘years of residence’ and +0.877 for ‘poverty’. This was a mistake when we were applying factor loadings in the Figure. We thank you for pointing this out.

      We apologize for missing your point on autocorrelation. Adding autocorrelations between the three PLEs is unrelated to our research goal. In this paper, we investigated how genetic and environmental factors explain the variations in PLEs between participants, regardless of changes over time. Since we used PLEs of multiple follow-ups to ensure that the results are robust irrespective of the timing of PLE measurements, taking autocorrelation into account is not necessary.

      The decision to add autocorrelation, which involves using the outcome variable at time (t-1) as a predictor for the outcome variable at time t, depends on the research focus. If your interest lies in explaining inter-individual variation in the rate of change in PLEs over a one-year period, then autocorrelation should be controlled for (typically, predictors measured at different time points are used in such cases). However, this was not the focus of this paper, which is why we did not apply autocorrelation in the SEM analysis.

      We apologize for the confusion between Figure 3A and 3B. To clarify, we added the titles “Direct effects” and “Indirect effects” to the figure images. We also changed the legend.

      “A. Direct pathways from PGS, high family SES, low neighborhood SES, and positive environment to cognitive intelligence and PLEs. Standardized path coefficients are indicated on each path as direct effect estimates (significance level *p<0.05). B. Indirect pathways to PLEs via intelligence were significant for polygenic scores, high family SES, low neighborhood SES, and positive environment, indicating the significant mediating role of intelligence.” (line 968~973)

      Figure 3A shows direct effects: i.e., the coefficients of paths from PGS, family SES, neighborhood SES, and positive environment to intelligence and PLEs, as well as the coefficient of the path from intelligence to PLEs. This is why Figure 3A shows colored arrows starting from PGS, family and neighborhood SES, and positive environment towards intelligence and PLEs, as well as the arrows from intelligence to PLEs. On the other hand, in Figure 3B, the colored arrows starting from PGS, family and neighborhood SES, and positive environment go through intelligence and head towards PLEs. This was meant to show that the indirect effects in Figure 3B indicate the specific effects of PGS, family SES, neighborhood SES, and positive environment on PLEs mediated by intelligence.

      In short, Figure 3 can be seen as a diagram drawn from Table 2: direct effects of the genetic and environmental variables on intelligence and PLEs, and direct effects of intelligence on PLEs are shown in Figure 3A; indirect effects of genetic and environmental variables on PLEs mediated by intelligence are shown in Figure 3B.
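      The relationship between the direct and indirect effects described above can be sketched numerically. The sketch assumes the standard product-of-coefficients convention for linear path models, in which the indirect effect of a predictor on PLEs via intelligence is the product of the predictor-to-intelligence path and the intelligence-to-PLEs path. The coefficients below are hypothetical and are not taken from Table 2.

```python
def mediation_effects(a, b, c_prime):
    """Decompose a simple one-mediator path model.

    a:       path from predictor to mediator (e.g. family SES -> intelligence)
    b:       path from mediator to outcome (e.g. intelligence -> PLEs)
    c_prime: direct path from predictor to outcome
    """
    indirect = a * b
    return {
        "direct": c_prime,
        "indirect": indirect,
        "total": c_prime + indirect,
    }


if __name__ == "__main__":
    # Hypothetical standardized coefficients, for illustration only.
    print(mediation_effects(a=0.5, b=-0.2, c_prime=-0.1))
```

      In this toy case a protective predictor lowers PLEs both directly and through higher intelligence, so the two effects have the same sign and add up in the total effect.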

      9) Regarding Supporting Information tables: to make these more digestible, I suggest using Excel and adding one table per sheet with a clear title and legend, indicating what each table shows. For example, Table S1 has 9(?) different subsections, all called the same (Linear Mixed Model: Multiethnic). It is not clear how each subsection differs from the others. Separate tables in separate excel sheets might be easier.

      Also, I think two decimal points might be good enough, enhancing readability of these tables.

      Thank you for your suggestion. We moved the supplementary tables into an external Excel file, with each sheet showing different tables, as well as titles, legends, and clear subsections.

      10) Regarding reporting exact p-values in Table 2: I don't understand. At the moment, categorical significance statements are reported. Were these not based on exact p-values (or how else was it decided if a finding was significant at a 0.05 (?) significance level).

      Either remove the significance column completely (as p-values cannot be estimated due to non-normality) or specify exactly/clarify what this column shows and how this was derived.

      We apologize for the confusion. In Table 2, we checked the significance of each path using 95% confidence intervals with 5,000 bootstrapping iterations. Since a 95% confidence interval that does not include zero is equivalent to a p-value below the 0.05 significance level, we believe this is an appropriate alternative for reporting the significance of each path in the SEM model.

      We specified the reason why we were not able to calculate exact p-values (clean copy: line 299~303). “As a trade-off for obtaining robust nonparametric estimates without distributional assumptions for normality, the IGSCA method does not return exact p-values (Hwang, Cho, Jung, et al., 2021). As a reasonable alternative, we obtained 95% confidence intervals based on 5,000 bootstrap samples to test the statistical significance of parameter estimates.”
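
      The bootstrap procedure described in this response can be illustrated with a minimal sketch. It is not the IGSCA estimator used by the authors: a Pearson correlation stands in for a standardized path coefficient, and the function names (`pearson_r`, `bootstrap_ci`, `excludes_zero`) are hypothetical. It only shows how a percentile 95% confidence interval from resampled data can serve as a significance check when exact p-values are unavailable.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation; an assumed stand-in for a standardized path coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(xs, ys, stat=pearson_r, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(xs, ys)."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_boot):
        # Resample cases (pairs) with replacement, keeping x and y together.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(stat([xs[i] for i in idx], [ys[i] for i in idx]))
    estimates.sort()
    lower = estimates[round((alpha / 2) * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def excludes_zero(lower, upper):
    """True when the interval lies entirely above or entirely below zero."""
    return lower > 0 or upper < 0
```

      With the default settings, `bootstrap_ci(xs, ys)` draws 5,000 resamples, matching the number quoted in the response; a path would then be flagged as significant at the 0.05 level when `excludes_zero(lower, upper)` is true.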

    2. eLife assessment

      This study presents a useful inventory of the joint effects of genetic and environmental factors on psychotic-like experiences and identifies cognitive ability as a potential underlying mediating pathway. The data were analyzed using a solid and validated methodology based on a large, multi-center dataset. The claim that these findings are of relevance to psychosis risk and have implications for policy changes is partially supported by the results.

    3. Joint Public Review:

      This paper aimed to assess the link between genetic and environmental factors and psychotic-like experiences, and the potential mediation of this link through cognitive ability. This study was based on data from the ABCD cohort, including 6,602 children aged 9-10 years. The authors report a mediating effect, suggesting that cognitive ability is a key mediating pathway in linking several genetic and environmental (risk and protective) factors to psychotic-like experiences.

      Strengths of the methods: The authors use a wide range of validated (genetic, self- and parent-reported, and cognitive) measures in a large dataset with a 2-year follow-up period. The statistical methods have the potential to address key limitations of previous research.

      Weaknesses of the methods: The GWASes used to generate PGSes were not the largest or most recent ones available.

      Strengths of the results: The authors included a comprehensive array of analyses.

      Weaknesses of the results: Results are only sometimes clearly described and presented.

      Appraisal: The authors suggest that their findings provide evidence for policy reforms (e.g., targeting residential environments, family SES, parenting, and schooling).

      Impact: Immediate impact is limited given the short follow-up period (2 years), possible concerns about selection bias and attrition in the data, and some methodological concerns. The authors are transparent about most of these limitations.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would first like to thank the reviewers for their time and effort in their critical review of our manuscript, and we appreciate the opportunity to address these comments. We thank the reviewers for appreciating that our experimental design is well crafted and contributes to the broader understanding of dietary and exercise recommendations for metabolic health and muscle development. We have revised the figures and text in accordance with the reviewers' recommendations, and hope that they appreciate the revised version.

      Reviewer #1:

      1) A significant limitation of this study pertains to the absence of a detailed exploration into the mechanistic underpinnings of the interaction between high protein intake and resistance exercise at the molecular level. The authors should provide a comprehensive discussion on potential avenues or prospective research directions to address this gap in understanding.

      We agree and have added some theories in the discussion on page 14.

      2) Figure 4 and Figure 7 can be moved to supplementary and text in the description can be arranged accordingly to make a better flow of the story.

      We agree with this suggestion and have made adjustments.

      3) The authors have used a high protein diet (36% of calories from protein) and a low protein diet (7% of calories from protein) for this study. The authors should explain whether this mouse diet is practically comparable to the human's high protein (2% of BW) and low protein diet (less than 0.8% BW) or not.

      The high protein diet is comparable to a human diet of 180 grams of protein ((0.36 x 2000 calories)/4 calories per gram = 180 g), which is in a range that some people consume, particularly bodybuilders and athletes. The low protein diet is equivalent to 35 grams of protein per day ((0.07 x 2000 calories)/4 calories per gram = 35 g), and a diet of just 7% protein is not recommended for humans per the Acceptable Macronutrient Distribution Range (AMDR) of 10-35% dietary protein set by the Institute of Medicine (IOM). We have addressed this on page 14.
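
      The conversion used in this response can be written out as a short sketch. The 2,000-calorie daily intake and the 4 calories per gram of protein are the figures quoted in the response; the function name `protein_grams_per_day` and its parameters are hypothetical and only illustrate the arithmetic.

```python
KCAL_PER_GRAM_PROTEIN = 4  # energy density of protein quoted in the response


def protein_grams_per_day(protein_percent, daily_kcal=2000):
    """Grams of protein per day for a diet deriving protein_percent of its calories from protein.

    daily_kcal defaults to the 2,000-calorie reference intake assumed in the response.
    """
    protein_kcal = protein_percent * daily_kcal / 100
    return protein_kcal / KCAL_PER_GRAM_PROTEIN


high_protein = protein_grams_per_day(36)  # 180.0 g, the high protein diet
low_protein = protein_grams_per_day(7)    # 35.0 g, the low protein diet
```

      Changing `daily_kcal` shows how strongly the gram equivalents depend on the assumed energy intake, which is worth keeping in mind when comparing mouse diets with human recommendations.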

      4) The color coding of the error bar and lines does not match with the group description in almost every figure. Maybe the authors could choose more contrasting colors.

      Thanks, we have adjusted the coloring of the error bars and lines in all figures.

      5) In Figure 3C-E it seems like the number of biological samples is not consistent in the LP+WP group. If the authors have excluded any outlier from the analysis, that should be included in the methodology.

      We did list outliers in the methodology in the statistics section (page 19): “Outliers were determined using GraphPad Prism Grubbs’ calculator (https://www.graphpad.com/quickcalcs/grubbs1/).”

      Reviewer #2:

      Very nice work! I do not have a whole lot to say in terms of experiments, analysis, or data to present other than what is in my public review (and you cannot really provide it as it was not in the experimental design). The manuscript is also very well written. My only question is about the following two sentences in the introduction:

      "Both exercise and amino acids activate the mechanistic target of TOR (mTOR) protein kinase, which stimulates the protein synthesis machinery needed to stimulate skeletal muscle hypertrophy (Schiaffino et al., 2021). Therefore, The Academy of Nutrition and Dietetics recommends consuming 1.2-2.0 grams of protein per kg of body weight (BW) per day in physically active individuals (Thomas et al., 2016)." I am not sure how the second sentence follows from the first, so I am not convinced that "therefore" is the right adverb in the right place.

      Thanks for pointing this out. We have added a clarifying transition to the text (page 3).

    2. eLife assessment

      This study presents a valuable finding on the relationship between a high protein diet and resistance exercise and their effects on fat accumulation and glucose homeostasis. The evidence supporting the claims of the authors is solid, although inclusion of mechanistic insight would have strengthened the study. The work will be of interest to dieticians and exercise biologists working to understand the synergy between diet and physical activity.

    3. Reviewer #1 (Public Review):

      Summary:

      The study conducted on mice establishes a noteworthy connection between dietary protein intake and resistance exercise impact on metabolic health and muscle development. In sedentary mice, a diet rich in protein resulted in excessive fat accumulation and compromised blood sugar regulation in comparison to a diet low in protein. Intriguingly, when mice followed the high protein diet alongside progressive resistance training, they exhibited protection against surplus fat gain, though blood glucose regulation remained impaired. The research also revealed that resistance training notably enhanced muscle hypertrophy induced by exercise, particularly in mice on the high protein diet. Although the maximum strength achieved was similar across diets, this highlights the potential synergy between high protein consumption and resistance exercise in promoting skeletal muscle growth.

      Strengths:

      The study possesses several significant strengths. Firstly, it combines controlled dietary manipulations with resistance exercise, providing a comprehensive understanding of their combined effects on metabolic health and muscle growth. The use of mouse models, while not directly translatable to humans, offers a controlled experimental environment, enabling precise measurements and observations. Moreover, the study reveals nuanced outcomes such as the differential impact of high protein intake on adiposity and muscle hypertrophy. The emphasis on both positive and negative findings lends balance to the conclusions, enhancing the overall credibility of the study. Additionally, the clear delineation of diet-exercise interactions contributes to the broader understanding of dietary and exercise recommendations for metabolic health and muscle development.

      Weaknesses:

      Certain limitations warrant consideration. Firstly, the study's exclusive reliance on mice might limit the generalizability of the findings to humans due to inherent physiological differences. Additionally, the absence of direct investigation into the underlying molecular mechanisms responsible for the observed outcomes leaves room for speculation. Moreover, the research's concentration on male and young mice raises questions about the applicability of these findings to female and older subjects. Lastly, the study's duration and the specific resistance exercise protocol utilized might not fully reflect long-term human scenarios, underscoring the need for further research in more diverse populations and over extended timeframes.

    4. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Trautman et al. set out to test the hypothesis that increased intake of dietary protein is deleterious to health when uncoupled from resistance training.

      Strengths:

      The experimental design is well crafted and the experiments provide useful information supporting the hypothesis. The authors take into account the limitations of their study in the discussion, and guide the reader through their results and the interpretation in a fair and measured way, without overstating claims.

      Weaknesses:

      As acknowledged by the authors in the discussion section, this study only features a small sample of male mice from a single strain. Thus the results may not hold when female mice and diverse genetic backgrounds are analyzed. The lack of repeated measures of physiological parameters is also a limitation of the study. Measurements of body weight, body composition, food (calorie) consumption, and locomotor/strength assays could have been provided throughout the study and compared to a baseline value for each animal.

    1. Reviewer #1 (Public Review):

      The manuscript by Lolicato and colleagues characterizes the role of FGF2 dimerization in unconventional secretion of this signaling molecule using a combination of cell-based and in vitro assays. FGF2 is a signaling molecule secreted from the cell via an unconventional mechanism because it lacks a signal sequence. Previous studies by the same group have established a compelling model in which FGF2 forms an oligomer in a PIP2-dependent manner at the plasma membrane, which drives its translocation to the cell exterior. The same group also reports two cysteine residues that are critical for FGF2 oligomerization and secretion.

      In this study, the authors analyzed the impact of single Cysteine to Alanine substitution on oligomerization and secretion of FGF2. They found that C95 but not C77 is required for PIP2 dependent membrane binding, FGF2 oligomerization and secretion. On the other hand, C77 is required for the interaction of FGF2 with the plasma membrane Na, K-ATPase, which is thought to enhance the FGF2-PIP2 interaction. Using a set of bi-functional crosslinkers, the authors were able to capture a fraction of the FGF2 homo-dimer, which is dependent on C95. They propose that FGF2 forms a disulfide-bridged dimer via C95, which serves as the building block for FGF2 oligomerization in the plasma membrane.

      The revised manuscript has carefully addressed my concerns. I should clarify that when I inquired about evidence for a disulfide-linked FGF2 dimer, I referred to in vivo evidence. I was aware of the authors' previous in vitro study, which demonstrated that FGF2 indeed can form a disulfide dimer under an in vitro condition. Although the new manuscript still contains no in vivo data on this issue, the authors have added numerous controls. In particular, the fact that the FGF2 C95S mutant is severely defective in secretion does provide strong support for the involvement of the thiol group of C95 in FGF2 secretion. The additional discussions on other examples of cytosolically-localized disulfide proteins and those in proximity to membranes further alleviate my concern.

    2. eLife assessment

      This manuscript reports important findings, demonstrating a critical role for a cysteine-containing dimerization interface in the secretion of FGF2 through an unconventional pathway. The authors provide compelling evidence, combining in vitro biochemical assays with structural simulation. The work will be of interest to researchers working on protein trafficking and secretion.

    3. Reviewer #2 (Public Review):

      Unconventional secretion refers to the release of cargoes without a signal peptide and is performed independently of ER-Golgi trafficking. One essential type of unconventional secretion is type I, in which a cargo can translocate directly across the plasma membrane. FGF2 is an excellent model for studying type I translocation, and the authors have focused on FGF2 secretion for decades. Many beautiful works have been performed to reveal the mechanism of FGF2 translocation step by step, and the picture gets clearer each time a new work from the lab is published. In the current work, the authors characterized the importance of disulfide bond formation on C95 of FGF2 in lipid binding and translocation. In addition, they clarified the role of another cysteine, C77, which is required for binding to the Na/K-ATPase that regulates the early step of FGF2 binding to the membrane. The authors also employed structural approaches and MD to provide mechanistic insights into the translocation process. In general, it is an important advance regarding the translocation of FGF2, and the data provided are brief, clear and convincing.

    4. Reviewer #3 (Public Review):

      In addition to ER-Golgi-dependent conventional protein secretion, a wide range of substrates lacking N-terminal signal peptides are secreted through diverse pathways collectively known as unconventional protein secretion (UPS). The translocation mechanism of these different substrates across the membrane remains a fascinating question in this field. In this manuscript, the authors employ a comprehensive combination of biochemistry, cell biology, and structural biology techniques to investigate the mechanism by which two crucial cysteine residues, C77 and C95, facilitate the secretion of FGF2. The key finding is that the C95-C95 disulfide bond mediates the formation of an FGF2 dimer, which is essential for pore formation and translocation. Additionally, it is revealed that C77 promotes FGF2 secretion by interacting with a cell surface factor called Na-K ATPase. This observation provides valuable mechanistic insights into a critical step of FGF2 secretion. Overall, the experimental results presented in this study are both clear and convincing.

      The authors have well addressed my concern about the formation of disulfide bond in the revision. In addition, the cross-linking mass spectrometry identified an additional dimerization interface, which would be of interest for future studies on its role in regulating high-order FGF2 oligomer formation and secretion.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This important study from Godneeva et al. establishes a Drosophila model system for understanding how the activity of Tif1 proteins is modified by SUMO. The authors nicely show that Bonus, like homologous mammalian Tif1 proteins, is a repressor, and that it interacts with other co-repressors Mi-2/NuRD and setdb1 in Drosophila ovaries and S2 cells. They also show that Bonus is SUMOylated by Su(var)2-10 on at least one lysine at its N-terminus to promote its interaction with setdb1. By combining nice biochemistry with an elegant reporter gene approach, they show that SUMOylation is important for Bonus interaction with setdb1, and that this SUMO-dependent interaction triggers high levels of H3K9me3 deposition and gene silencing. While there are still major questions of how SUMO molecularly promotes this process, this study is a valuable first step that opens the door for interesting future experimentation.

      Major Point:

      The RNAseq and ChIPseq data is not available. This is critical for the review of the paper and would help the readers and reviewers interpret the Bonus mutant phenotype and its mechanism of repressing genes.

      The sequencing data have been deposited to the NCBI GEO archive. The accession number for all other RNA-seq and ChIP-seq data reported in this paper is GEO: GSE241375.

      1) The authors' conclusion that Bonus SUMOylation is "essential for its chromatin localization" is not supported by the data. Figure 5F shows less 3KR mutant in the chromatin fraction but there is still significant signal.

      We appreciate the reviewer's feedback and agree that the term "essential" was not appropriate in this context. We have revised the manuscript to replace "essential" with "contributes to" to accurately reflect our findings.

      2) The authors' conclusion that Bonus is SUMOylated at a single site close to its N-terminus is not necessarily true. In several SUMO and Bonus blots throughout the paper (5B, 6C, S4A), there are >2 differentially migrating species that could represent more than one SUMO added to Bonus. While the single K20R mutation eliminates all of these species in Fig 5C, it is possible that K20R SUMOylation is required for additional SUMOylation events on other residues. One way to determine if Bonus is SUMOylated on multiple sites is to add recombinant SUMO protease to the extract and see if multiple higher molecular weight bands collapse into a single migrating species (implying multiple SUMOs) or multiple migrating species (implying something else is altering gel migration).

      We appreciate the suggestion made by the reviewer. While we acknowledge the presence of occasional multiple bands in SUMO Western blots, the predominant pattern is the presence of unmodified Bon and a single additional band corresponding to SUMO-modified Bon. To investigate the possibility of multi-site SUMOylation, we performed the requested experiment, in which we added SENP2 SUMO protease to the extract and checked Bon's SUMOylation. In the presence of NEM, we observed the unmodified form of Bon, as well as a single additional band representing a SUMO-modified form of Bon. Following SENP2 SUMO protease treatment, the SUMOylated form of Bon was completely abolished in all samples, leaving only the unmodified Bon band (Extended Data Fig. 4D). This indicates that Bon is not SUMOylated on multiple sites and that the observed differentially migrating species likely result from other factors affecting gel migration.

      3) The authors state that most upregulated genes in BonusGLKD are not highly enriched in H3K9me3. The heatmap in figure 3D is not an ideal presentation of this argument. The authors should show an example of what the signal on a highly enriched gene looks like for comparison. The authors also argue that because most upregulated genes in BonusGLKD are not highly enriched in H3K9me3, they must be indirectly repressed. Another possibility is that bonus-mediated H3K9me3 is only important (and present) during early nurse cell differentiation and is later lost and dispensable during the rapid endocycles. After bonus establishes repression through H3K9me3, it might be maintained through bonus-Mi2/Nurd, something else, or nothing at all. The authors could discuss this possibility or perform H3K9me3 ChIP during cyst formation and early nurse cell differentiation rather than in whole ovaries, which are enriched for later stages.

      We thank the reviewer for their thoughtful comments and suggestions. In our revised manuscript we have included the tracks of a gene that is highly enriched in H3K9me3 but remains unchanged upon Bon GLKD (Extended Data Fig. 3B). This addition allows for a visual comparison and better supports our argument that the majority of genes upregulated in Bon GLKD are not enriched in the H3K9me3 mark. We also appreciate the reviewer's suggestion regarding the potential temporal dynamics of Bon-mediated H3K9me3. It is indeed possible that Bon's role in establishing H3K9me3 might be more prominent during early nurse cell differentiation and less critical in later stages. We included a discussion of this possibility in the revised manuscript. To explore this further, it would be valuable to perform H3K9me3 ChIP during cyst formation and early nurse cell differentiation. However, given the limitations of our current resources and time, we were unable to perform these experiments for the revised manuscript.

      4) The BonusGLKD RNAseq analysis is underwhelming. The conclusion that "Bonus represses tissue-specific genes" has limited value. Every gene that is not expressed in ovaries is "tissue-specific." What subset of tissue-specific genes does Bonus repress? What common features do these genes have and how do they compare to other sets of tissue-specific genes, such as those reportedly repressed by setdb1, Polycomb proteins, small ovary, l(3)mbt, and stonewall (among others in female germ cells). Comparing these available data sets could help the authors understand the mechanism of Bonus repression and how BonusGLKD leads to sterility. The authors could also further analyze the differences between nos-Gal4 and MT-Gal4 to better understand why nos- but not MT-driven knockdown is sterile.

      We appreciate the reviewer's feedback regarding the RNA-seq analysis and acknowledge the importance of identifying the specific subset of tissue-specific genes. Figure 2C shows the specific tissues where genes derepressed upon Bon GLKD are normally expressed. These are tissues/organs such as the head, digestive system, and nervous system. The reviewer's suggestion to compare our findings with existing datasets is valid and could indeed provide a more comprehensive understanding of Bon repression and its implications in female germ cells. However, many of the published datasets are based on mutant fly lines or use different GAL4 drivers to induce knockdowns, making direct comparisons challenging. We have conducted a preliminary analysis of available data, specifically nos-Gal4>SetDB1KD (GSE109852), and identified an overlap of 135 genes out of the 464 genes upregulated upon nos-Gal4>BonusKD with those affected by SetDB1 knockdown. We have included this result in the revised manuscript.

      Main Study Limitations:

      1) It is unclear which genes are directly vs indirectly regulated by bonus, which makes it difficult to understand Bonus's repressive mechanism. Several lines of experiments could help resolve this issue. 1) Bonus ChIPseq, which the authors mentioned was difficult. 2) RNAseq of BonusGLKD rescued with the 3KR mutation. This would help separate SUMO/setdb1-dependent regulation from Mi-2 dependent regulation. Similarly, comparing differentially expressed genes in Su(var)2-10GLKD, setdb1GLKD, 3KR rescue, and MI-2 GLKD could identify overlapping targets and help refine how bonus represses subsets of genes through these different corepressors.

      We appreciate the reviewer's suggestions and agree that discriminating between direct and indirect Bon targets should be the next step in understanding Bon's repressive mechanism. We have previously attempted to determine direct Bon targets using a ChIP-seq approach. However, despite multiple efforts using both native Bon antibodies and GFP-tagged Bon fly lines, analysis of the ChIP-seq data did not reveal specific enrichment, indicating that Bon, similar to many other chromatin-bound proteins, is not amenable to ChIP. The recommendation for RNA-seq analysis of Bon GLKD rescued with the 3KR mutation is valuable, and we will certainly consider it for future investigations.

      We compared differentially expressed genes in Su(var)2-10 GLKD and Mi-2 GLKD and found limited overlap: out of the 231 genes affected by Bon GLKD, 39 genes were affected in Mi-2 GLKD and 42 in Su(var)2-10 GLKD. We acknowledge the importance of understanding which genes are directly or indirectly regulated by Bon and the potential for further experiments to address this question.

      2) The paper falls short in discussing how SUMO might promote repression. This is important when considering the conservation (or lack thereof) of SUMOylation sites in Tif1 proteins in distantly related animals. One piece of data that was not discussed is the apparent localization of SUMOylated bonus in the cytoplasmic fraction of the blot in Figure 5F. Su(var)2-10 is mostly a nuclear protein, so is bonus SUMOylated in the nucleus and then exported to the cytoplasm? Also, setdb1 is a nuclear protein, so it is unlikely that the SUMOylated bonus directly interacts with setdb1 on target genes. Together with Fig 5E (unSUMOylatable Bonus aggregates in the nucleus), one could make a model where SUMO solubilizes bonus (perhaps by disassembling aggregates) and indirectly allows it to associate with setdb1 and chromatin. It is also important to note that in Figure 5I, the 3KR mutation appears to lessen but not eliminate Bonus interaction with setdb1. This data again disfavors a model where SUMO establishes an interaction interface between setdb1 and Bonus. To determine which form of Bonus interacts with setdb1, the authors could perform a setdb1 pulldown and monitor the SUMOylation state of coIPed Bonus through mobility shift. If mostly unSUMOylated bonus interacts with setdb1, and SUMO indirectly promotes Bonus interaction with setdb1 (perhaps by disassembling Bonus aggregates), then the precise locations of Bonus SUMOylation sites could more easily shift during evolution, disfavoring the authors' convergent evolution hypothesis.

      We appreciate the reviewer's valuable feedback. Regarding the observation of SUMOylated Bon in the cytoplasmic fraction in Figure 5F, we recognize its significance. This finding has prompted us to consider a model in which SUMOylation may play a role in translocating Bon from the nucleus to the cytoplasm, potentially influencing interactions with SetDB1 and chromatin indirectly. Furthermore, Figure 5I, which shows only a partial reduction in Bon-SetDB1 interaction with the 3KR mutation, suggests that SUMO may not be the primary mediator of this interaction. We recognize the need for further investigations to clarify SUMO's exact role in this context. In response to the reviewer's suggestion, we conducted SetDB1 pulldown experiments in S2 cells. The results reveal that SetDB1 indeed primarily interacts with unmodified Bon, which is far more abundant than the SUMOylated form (Extended Data Fig. 5C). We think this experiment presents certain technical challenges, as the signal for Bon, when used as prey in co-IP experiments, is relatively faint, making it inherently difficult to detect the lower levels of SUMO-modified Bon. Additionally, in the revised manuscript we have added a new result identifying Bon interactors in the ovary by mass-spectrometry analysis, which showed that SetDB1 associates with wild-type, but not SUMO-deficient, Bon. While our data support the idea that SUMO may contribute to Bon solubilization, possibly by disassembling aggregates, thereby indirectly facilitating its association with SetDB1 and chromatin, we acknowledge that the precise mechanism remains unclear.

      Reviewer #2 (Public Review):

      Summary:

      The authors analyze the functions and regulation of Bon, the sole Drosophila ortholog of the TIF1 family of mammalian transcriptional regulators. Bon has been implicated in several developmental programs; however, the molecular details of its regulation have not been well understood. Here, the authors reveal the requirement of Bon in oogenesis, thus establishing a previously unknown biological function for this protein. Furthermore, careful molecular analysis convincingly established the role of Bon in transcriptional repression. This repressor function requires interactions with the NuRD complex and histone methyltransferase SetDB1, as well as sumoylation of Bon by the E3 SUMO ligase Su(var)2-10. Overall, this work represents a significant advance in our understanding of the functions and regulation of Bon and, more generally, the TIF1 family. Since Bon is the only TIF1 family member in Drosophila, the regulatory mechanisms delineated in this study may represent the prototypical and important modes of regulation of this protein family. The presented data are rigorous and convincing. As discussed below, this study can be strengthened by a demonstration of a direct association of Bon with its target genes, and by analysis of the biological consequences of the K20R mutation.

      Strengths:

      1. This study identified the requirement for Bon in oogenesis, a previously unknown function for this protein.
      2. Identified Bon target genes that are normally repressed in the ovary, and showed that the repression mechanism involves the repressive histone modification mark H3K9me3 deposition on at least some targets.
      3. Showed that Bon physically interacts with the components of the NuRD complex and SetDB1. These protein complexes are likely mediating Bon-dependent repression.
      4. Identified Bon sumoylation site (K20) that is conserved in insects. This site is required for repression in a tethering transcriptional reporter assay, and SUMO itself is required for repression and interaction with SetDB1. Interestingly, the K20-mutant Bon is mislocalized in the nucleus in distinct puncta.
      5. Showed that Su(var)2-10 is a SUMO E3 ligase for Bon and that Su(var)2-10 is required for Bon-mediated repression.

      Weaknesses:

      The study would be strengthened by demonstrating a direct recruitment of Bon to the target genes identified by RNA-seq. Given that the global ChIP-seq was not successful, a few possibilities could be explored. First, Bon ChIP-qPCR could be performed on the individual targets that were functionally confirmed (e.g. rbp6, pst). Second, a global Bon ChIP-seq has been reported in PMID: 21430782 - these data could be used to see if Bon is associated with specific targets identified in this study. In addition, it would be interesting to see if there is any overlap with the repressed target genes identified in Bon overexpression conditions in PMID: 36868234.

      We greatly appreciate the reviewer's suggestion to demonstrate the direct recruitment of Bon to the target genes. As described in our answer to reviewer #1, we attempted to determine direct Bon targets with a ChIP-seq approach, using both native Bon antibodies and GFP-tagged Bon fly lines. However, analysis of the ChIP-seq data did not reveal specific enrichment. Similarly, Bon ChIP-qPCR on individual targets showed the same results, suggesting that Bon, similar to many other chromatin-bound proteins, is not amenable to the ChIP protocol, at least in standard conditions. To further explore this issue, we have analyzed the results of a global Bon ChIP-seq reported in PMID: 21430782. We did not find Bon binding to individual targets but, even more importantly, we did not see clear Bon enrichment elsewhere in the genome, supporting the conclusion that Bon targets on chromatin cannot be determined by ChIP. Additionally, we explored the possibility of overlap between target genes repressed by Bon in our study and those observed under Bon overexpression conditions in PMID: 36868234. While we did identify 41 genes in common, it's important to note that the datasets are derived from different tissues (pupal eyes vs. ovaries), making direct comparison problematic.

      The second area where the manuscript can be improved is to analyze the biological function of the K20R mutant Bonus protein. The molecular data suggest that this residue is important for function, and it would be important to confirm this in vivo.

      We appreciate the reviewer's suggestion to analyze the biological function of the K20R mutant Bon protein. While we acknowledge that we did not use a single-site K20R mutant for in vivo experiments, we demonstrated that the mutant with the three-residue substitution (3KR) is incapable of inducing repression (Figure 5G). Given that other experiments consistently showed that K20 is the primary SUMOylation site, this result supports the conclusion that K20 SUMOylation plays an important role in Bon-mediated transcriptional silencing.

      Reviewer #1 (Recommendations for The Authors):

      Make the RNAseq and ChIPseq data publicly available!

      The sequencing data have been deposited to the NCBI GEO archive. The accession number for all other RNA-seq and ChIP-seq data reported in this paper is GEO: GSE241375.

      Reviewer #2 (Recommendations for The Authors):

      It would be interesting to identify the biological basis of aberrant ovary development in Bon depletion conditions. Previous studies (e.g. PMID: 11336699) suggested that Bon loss of function clones are cell lethal, and the developmental defects in oogenesis presented in the current study offer an opportunity to delve more into the causes of cell loss, e.g. by showing that the cells die via apoptosis.

      Thank you for your valuable suggestion. In response to your comment, we performed a TUNEL assay to investigate whether germ cells in nos-Gal4>BonusKD ovaries undergo apoptosis. Our results indeed indicate that germ cells in these ovaries exhibit apoptosis, as evidenced by the TUNEL signal (Extended Data Fig. 1C). This information has been included in the revised manuscript to provide insights into the biological basis of aberrant ovary development in Bon depletion conditions.

      The K20 residue could also be ubiquitinated. This possibility could at least be discussed, particularly given the presence of the RING Ub ligase domain in Bon that might potentially perform self-ubiquitination.

      Indeed, the possibility that Bon can be ubiquitinated is a valid consideration, and we have explored it. We did not detect any signal with the Ubiquitin antibody in Bon immunoprecipitates from either wild-type or triple-mutant [3KR] ovaries (in which K20 is also mutated) (Extended Data Fig. 4C). This suggests that K20 is more likely responsible for Bon SUMOylation than for ubiquitination. We appreciate the reviewer's suggestion and have included this information in the revised manuscript.

    1. Reviewer #1 (Public Review):

      Terzioglu and co-workers tested the provocative hypothesis that mitochondria maintain an internal temperature considerably higher than cytosolic/external environmental temperature due to the inherent thermodynamic inefficiency of mitochondrial oxidative phosphorylation. As a follow-up to a prior paper from some of the same authors, the goal of this study was to conduct additional experiments to assess mitochondrial temperature in cultured cells. In line with the prior work, the authors provide consistent evidence that the temperature of mitochondria in four different types of cultured mammalian cells, as well as cells from Drosophila (poikilotherms), is 15 °C or more above the external temperature at which cells are maintained (e.g., 37 °C). Additional evidence shows that mitochondria maintain higher temperatures under several different types of cellular metabolic stresses predicted to decrease the dependence on OxPhos, adding to the notion that natural thermodynamic inefficiency and heat generation may be an important, and potentially regulated, characteristic of mitochondrial metabolism.

      Strengths

      Demonstration that both a fluorescent (Mito Thermo Yellow) and a genetic-based (mito-gTEMP) mitochondrial targeted temperature probe elicit similar quantitative changes in mitochondrial temperature under different experimental conditions is a strength. The addition of the genetic probe to the current study supports prior findings using the fluorescent probe and thus achieves a primary objective of the study.

      The experiments are well designed and executed. Specific attention given to potential artifacts affecting probe signal and/or non-specific effects from the different experimental interventions is a strength.

      The use of different cultured cell lines from different organisms provides additional evidence of elevated temperature as a general property of functioning mitochondria, representing additional validation.

      Weakness:

      While the findings and potential interpretations put forward by the authors are intriguing, the severity of the interventions (e.g., mitochondrial complex-specific inhibitors, inhibition of protein synthesis) and the absence of simultaneous or parallel measurements of other key bioenergetic parameters (i.e., membrane potential, oxygen consumption rate, etc.) limit the ability to interpret potential cause and effect: whether the thermogenesis aspect of OxPhos is being sensed and regulated, or whether temperature changes are more of a byproduct of adjustments in OxPhos flux under the experimental circumstances. In other words, the physiological relevance of the findings remains unclear.

      Related, several of the interventions are employed to either increase or decrease dependence on OxPhos flux, but no outcome measures are reported to document whether the intended objective was achieved (e.g., increased OxPhos flux in low glucose plus galactose, decreased ATP demand-OxPhos flux with anisomycin, etc.).

    2. Reviewer #2 (Public Review):

      An important paper that confirms the validity of the initial findings of Chretien et al regarding the hot temperatures at which the mitochondrion is operating. The authors responded adequately to the reviewers' concerns.

    3. Reviewer #3 (Public Review):

      The goal of this study was to use a combination of fluorescent dyes and genetically encoded reporters to estimate the temperature of mitochondria. The authors provide additional evidence that they claim to support "hot" mitochondria.

      Strengths:<br /> 1. The authors use several methods, including a mitochondrial fluorescent reporter dye, as well as a genetically encoded gTEMP temperature probe, to estimate mitochondrial temperature.<br /> 2. The authors couple these measurements with other perturbations of mitochondria, such as OXPHOS inhibitors, to show consistency.

      Weaknesses:<br /> 1. The methodology for inferring mitochondrial temperature is not well-established to begin with and requires additional controls for interpretation.<br /> a. Very little benchmarking is done of the "basal" fluorescence ratio, and whether that fluorescence ratio actually reflects true organelle temperature. For instance, the authors should in parallel compare between different organelles to see if only mitochondria appear "hot" or whether this is some calibration error. Another control is to use different incubator temperatures and see how mitochondrial (vs other organelle) temperature varies as a function of external temperature.<br /> b. The authors do not rigorously control for other factors that may also be changing fluorescence and may be confounders to the delta fluorescence (e.g., delta calcium in response to mito inhibitors, membrane potential, redox status, ROS, etc.). There should be additional calibration for all potential confounders.<br /> c. Can these probes be used in isolated mitochondria and other isolated organelles? Such data would also help to clarify whether the high temperature is specific to mitochondria.<br /> 2. The authors should try to calibrate their fluorescence inference of temperature with an alternative method and benchmark to others in the field. For instance, Okabe et al Nat Comm 2012 used a polymeric thermometer to measure temperature and reported 33 °C cytoplasm and 35 °C nucleus. Can the authors also show a ~2 °C difference in their hands between those two compartments, and under those conditions are mitochondria still 10 °C hotter?

      Based on the aforementioned weaknesses, in my opinion, the authors did not achieve their aim of accurately determining the temperature of mitochondria. The results, while interesting, are preliminary and require additional controls before conclusions can be drawn. Previous studies have indicated intra-organelle temperature variations within cells; typically, previous reports have estimated that the variation is within a few degrees (Okabe et al Nat Comm 2012). Only one report has previously suggested that mitochondria are at 50 °C (Chretien et al., PLoS Biology 2018). The study does not substantially clarify the true temperature of mitochondria or resolve potential discrepancies in previous estimates of mitochondrial temperature.

    1. Reviewer #2 (Public Review):

      Tuller et al. first made the curious observation, that the first ∼30-50 codons in most organisms are encoded by scarce tRNAs and appear to be translated slower than the rest of the coding sequences (CDS). They speculated that this has evolved to pace ribosomes on CDS and prevent ribosome collisions during elongation - the "Ramp" hypothesis. Various aspects of this hypothesis, both factual and in terms of interpreting the results, have been challenged ever since. Sejour et al. present compelling results confirming the slower translation of the first ~40 codons in S. cerevisiae but providing an alternative explanation for this phenomenon. Specifically, they show that the higher amino acid sequence divergence of N-terminal ends of proteins and accompanying lower purifying selection (perhaps the result of de novo evolution) is sufficient to explain the prevalence of rare slow codons in these regions. These results are an important contribution in understanding how aspects of the evolution of protein coding regions can affect translation efficiency on these sequences and directly challenge the "Ramp" hypothesis proposed by Tuller et al.

      I believe the data is presented clearly and the results generally justify the conclusions.

    2. eLife assessment

      This is an important contribution to understanding the origins and translational consequences of the relatively low rate of translation elongation in the first ∼30-50 codons of genes in most organisms. The authors provide convincing evidence that the prevalence of rare codons in the first ~40 codons in yeast is due to the relatively recent evolution of these coding sequences, or of lower purifying selection operating on them, and that a preponderance of codons encoded by rare tRNAs near the N-terminus is not associated with higher translational efficiency in the manner proposed by the "translational ramp" hypothesis. The work is incomplete in that the results of reporter assays may have been confounded by alterations of mRNA sequence or structure that could have influenced their translation or mRNA stability; that the work cannot fully account for a greater enrichment of slowly translated codons in N-terminal vs. C-terminal regions; and that the work does not resolve whether translation elongation through N-terminal coding is truly slow.

    3. Reviewer #1 (Public Review):

      The manuscript by Sejour et al. tests the "translational ramp" model described previously by Tuller et al. in S. cerevisiae. The authors use bioinformatics and reporter-based experimental approaches to test whether "rare codons" in the first 40 codons of gene coding sequences increase translation efficiency and regulate the abundance of translation products in yeast cells. Using a new set of reporters and bioinformatics analyses, the authors conclude that the "translation ramp" model is not supported. The strength of the bioinformatic evidence and the experimental analyses (even if very limited) of rare codon insertion in the reporter make a compelling case for the authors' claims. However, the major weakness of the manuscript is that the authors do not take into account other models that previously disputed the "rare or slow codon" model of Tuller et al. and overstate their own results, which are rather limited. This remains the weak part of the manuscript even in the revised form.

      The studies that the authors do not mention argue against the "translation ramp" model and show more thorough analyses of the translation initiation to elongation transition as well as the early elongation "slow down" in ribosome profiling data. Moreover, several studies have used bioinformatic analyses to point out the evolution of N-terminal sequences in multiple model organisms including yeast, focusing on either upstream ORFs (uORFs) or already annotated ORFs. The authors did not mention many of these studies in their revised manuscript and did not comment on their own results in the context of these previous studies. As such, the authors' approach to data presentation, writing and data discussion makes the manuscript rather biased, focused on criticizing the Tuller et al. study and short on discussing multiple other possible reasons for slow translation elongation at the beginning of protein synthesis. Altogether, this leaves the manuscript very limited.

    1. Author Response

      We very much appreciate all the reviewers’ positive feedback and additional comments and suggestions for this manuscript!

      In this provisional reply, we’d like to quickly address only one selected key point, for which we have already collected relevant experimental data:

      Reviewer 1 suggests that ‘it would have been more rigorous for the authors to independently reproduce the kinetics reported for nsp8/9 using their specific experimental conditions.’ We absolutely agree with this and have already carried out these kinetic experiments while our paper was under review. We have now measured kinetic parameters for cleavage of the nsp8/9 peptide in our own hands under the same conditions as we used for nsp4/5 and TRMT1. We measured kcat and KM values of 0.019 ± 0.002 s⁻¹ and 40 ± 7.5 µM, respectively, for nsp8/9 cleavage; these data are very much in line with the previously reported values from MacDonald et al (kcat = 0.013 ± 0.001 s⁻¹, KM = 36 ± 6.0 µM) that we used for comparison in Figure 4 and listed in Table S2. We will add our own measured kinetic values for nsp8/9 in the next version of our manuscript, but wanted to report these numbers as soon as possible, because this further supports and validates our claim that the human TRMT1 sequence is cleaved at a similar rate to the known nsp8/9 viral polypeptide cleavage site.
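
      As an illustrative aside, the two nsp8/9 parameter sets quoted above can be put on a single scale by computing the catalytic efficiency kcat/KM. The sketch below is not part of the authors' analysis: the function name is hypothetical, the choice of kcat/KM as the comparison metric is an assumption, and the reported uncertainties are ignored for simplicity.

```python
def catalytic_efficiency(kcat_per_s, km_micromolar):
    """Return kcat/KM in M^-1 s^-1, converting KM from micromolar to molar."""
    km_molar = km_micromolar * 1e-6
    return kcat_per_s / km_molar


# Central values quoted in the response above for nsp8/9 peptide cleavage.
# Error bars are deliberately left out of this sketch.
measured = catalytic_efficiency(0.019, 40.0)  # values measured by the authors
reported = catalytic_efficiency(0.013, 36.0)  # values from MacDonald et al.

print(f"measured kcat/KM: {measured:.0f} M^-1 s^-1")
print(f"reported kcat/KM: {reported:.0f} M^-1 s^-1")
print(f"measured / reported: {measured / reported:.2f}")
```

      On these central values the two efficiencies come out at roughly 475 and 361 M⁻¹ s⁻¹, within about 1.3-fold of each other, which fits the statement that the measurements are in line with the earlier report.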

      We will provide a detailed, point-by-point reply to all reviewer comments accompanying the forthcoming revised manuscript, in which we intend to have new and updated data and additional MD simulations that directly address key questions raised by the reviewers.

    2. eLife assessment

      This manuscript provides important structural insights into the recognition and degradation of the host tRNA methyltransferase by SARS-CoV-2 protease nsp5 (Mpro). The data convincingly support the main conclusions of the paper. These results will be of interest to researchers studying structures and substrate recognition and specificity of viral proteases.

    3. Reviewer #1 (Public Review):

      D'Oliviera et al. have demonstrated cleavage of human TRMT1 by the SARS-CoV-2 main protease in vitro. Following this, they solved the structure of Mpro-C145A bound to a TRMT1 substrate peptide, revealing a binding conformation distinct from most viral substrates. Overall, this work enhances our understanding of substrate specificity for a key drug target of CoV2. The paper is well-written and the data is clearly presented. It complements the companion article by demonstrating the interaction between Mpro and TRMT1 and TRMT1 cleavage under isolated conditions in vitro. Importantly, the revelation of flexible substrate binding of Nsp5 is fundamental for understanding Nsp5 as a drug target. TRMT1 cleavage assays revealed similar kinetics for TRMT1 cleavage as compared to the nsp8/9 viral polyprotein cleavage site; however, it would have been more rigorous for the authors to independently reproduce the kinetics reported for nsp8/9 using their specific experimental conditions. The finding that murine TRMT1 lacks a conserved consensus sequence is interesting, but is not experimentally tested here and is reported elsewhere. I am unable to comment critically on the structural analyses as it is outside of my expertise. Overall, I think that these findings are important for confirming TRMT1 as a substrate of Mpro and defining substrate binding and cleavage parameters for an important drug target of SARS-CoV-2.

    4. Reviewer #2 (Public Review):

      Summary:<br /> The manuscript 'Recognition and Cleavage of Human tRNA Methyltransferase TRMT1 by the SARS-CoV-2 Main Protease' from Angel D'Oliviera et al., uncovers that TRMT1 can be cleaved by SARS-CoV-2 main protease (Mpro) and defines the structural basis of TRMT1 recognition by Mpro. They use both recombinant TRMT1 and Mpro as well as endogenous TRMT1 from HEK293T cell lysates to convincingly show cleavage of TRMT1 by the SARS-CoV-2 protease. To understand how Mpro recognizes TRMT1, they solved a co-crystal structure of Mpro bound to a peptide derived from the predicted cleavage site of TRMT1. This structure revealed important protein-protein interfaces and highlights the importance of the conserved Q530 for cleavage by Mpro. They then compared their structure with previous X-ray crystal structures of Mpro bound to substrate peptides derived from the viral polyprotein and proposed the concept of two distinct binding conformations to Mpro: P3´-out and P3´-in conformations (here P3´ stands for the third residue downstream of the cleavage site). It remains unknown what is the physiological role of these two binding conformations on Mpro function, but the authors established that Mpro has dramatically different cleavage efficiencies for three distinct substrates. In an effort to rationalize this observation, a series of mutations in Mpro's active site and the substrate peptide were tested but unexpectedly had no significant impact on cleavage efficiency. While molecular dynamic simulations further confirmed the propensity of certain substrates to adopt the P3´-out or P3´-in conformation, they did not provide additional insights into the dramatic differences in cleavage efficiencies between substrates. This led the authors to propose that the discrimination of Mpro for preferred substrates might occur at a later stage of catalysis after binding of the peptide. 
Overall, this work will be of interest to biologists studying proteases and substrate recognition by enzymes as well as help efforts to target Mpro with peptide-like drugs.

      Strengths:<br /> • The authors' statements are well supported by their data, and they used relevant controls when needed. Indeed, they used the Mpro C145A inactive variant to unambiguously show that the TRMT1 cleavage detected in vitro is solely due to Mpro's activity. Moreover, they used two distinct polyclonal antibodies to probe TRMT1 cleavage.

      • Their 1.9 Å crystal structure is of high quality and increases the confidence in the reported protein-protein contacts seen between TRMT1-derived peptide and Mpro.

      • Their extensive in vitro kinetic assay was performed in ideal conditions although it is unclear how many replicates were performed.

      • The authors test multiple hypotheses to rationalize the preference of Mpro for certain substrates.

      • While this reviewer is not able to comment on the rigor of the MD simulations, the interpretations made by the authors seem reasonable and convincing.

      • The concept of two binding conformations (P3´-out or P3´-in) for the substrate in the active site of Mpro is significant and can guide drug design.

      Weaknesses:<br /> • While the authors convincingly show that TRMT1 is cleaved by Mpro, the exact cleavage site was never confirmed experimentally. It is most likely that the predicted site is the main cleavage site as proposed by the authors (region 527-534). Nevertheless, in Fig 1C (first lane from the right) there are two bands clearly observed for the cleavage product containing the MT Domain. If the predicted site was the only cleavage site recognized by Mpro, then a single band for the MT domain would be expected. This observation suggests that there might be two cleavage sites for Mpro in TRMT1. Indeed, residues RFQANP (550-555) in TRMT1 might be a secondary weaker cleavage site for Mpro, which would explain the two observed bands in Fig 1C. A mass spectrometry analysis of the cleaved products would clarify this.

      • A control is missing in Fig 1D. Since the authors use western blots to show the gradual degradation of endogenous TRMT1, a control with a protein that does not change in abundance over the course of the measurement is important. This is required to show that the differences in intensity of TRMT1 by western blotting are not due to loading differences etc.

      • The two polyclonal antibodies used by the authors seem to have strong non-specific binding to proteins other than TRMT1, but this did not impact the authors' conclusions. This is a limitation of the commercially available antibodies for TRMT1, and unless the authors select a new monoclonal antibody specific to TRMT1 (a costly and lengthy process), this limitation seems out of their control.

      • The recombinantly purified TRMT1 seems to have some non-negligible impurities (extra bands in Fig 1C). This does not impact the conclusions of the authors but might be relevant to readers interested in working with TRMT1 for biochemical, structural, or other purposes.

      • Despite the reasonable efforts of the authors, it remains unknown why Mpro shows higher cleavage efficiency for the nsp4/5 sequence compared to TRMT1 or nsp8/9 sequences.

      • The peptide cleavage kinetic assay used by the authors relies on a peptide labelled with a fluorophore (MCA) on the N-terminus and a quencher (Dpn) on the C-terminus. This design allows high-throughput measurements compatible with plate readers and is a robust and convenient tool. Nevertheless, the authors did not control for the impact of the labels (MCA and Dpn) on the activity of Mpro. It is possible that the differences in cleavage efficiencies between peptides are due to unexpected conformational changes in the peptide upon labelling. Moreover, the TRMT1 peptide has an E at the N-terminus and an R at the C-terminus (while the nsp4/5 peptide has an S and M, respectively). It is possible that these two terminal residues form a salt bridge in the TRMT1 peptide that might constrain the conformation of the peptide and thus reduce its accessibility and cleavage by Mpro. Enzymatic assays in the absence of labels and MD simulations with the bona fide peptides (including the labels) used in the kinetic measurements are needed to prove that the cleavage efficiencies are not biased by the fluorescence assay.

      • The authors used A431S variant in TRMT1-derived peptide to disrupt the P3´-in conformation. While this reviewer agrees with the rationale behind A431S design, it is important to confirm experimentally that the mutation disrupted the P3´-in conformation in favor of the P3´-out conformer. The authors could use their MD simulations to determine if the TRMT1 A431S variant favors the P3´-out conformation.

      • An unanswered question not addressed by the authors is if the peptides undergo conformational changes upon Mpro binding or if they are pre-organized to adopt the P3´-out and P3´-in conformations.

      • While the authors describe at great length the hydrogen bonds involved in substrate recognition by Mpro, they neglected to highlight important stacking interactions at this interface. For instance, Phe533 from TRMT1 stacks with Met49 while L529 from TRMT1 packs against His41 of Mpro. Both hydrogen bonding and stacking interactions seem important for recognition of the TRMT1-derived peptide by Mpro.

    5. Reviewer #3 (Public Review):

      Summary:<br /> In this manuscript, the authors have used a combination of enzymatic, crystallographic, and in silico approaches to provide compelling evidence for substrate selectivity of SARS-CoV-2 Mpro for human TRMT1.

      Strengths:<br /> In my opinion, the authors came close to achieving their intended aim of demonstrating the structural and biochemical basis of Mpro catalysis and cleavage of human TRMT1 protein. The combination of orthogonal approaches is highly commendable.

      Weaknesses:<br /> The work would have had higher scientific impact if the consequences of TRMT1 cleavage by Mpro on cellular metabolism had been examined. Furthermore, assays to investigate the effect of inhibition of this Mpro activity on SARS-CoV-2 propagation and infection would have been extremely useful in providing insights into host-SARS-CoV-2 interactions.

    1. Author Response

      We thank the reviewers for their suggestions for improving the manuscript. We are currently working on a formal revision and plan to submit a revised manuscript in the near future. However, we would be remiss if we did not address concerns regarding the conceptual merits of the paper. Below we speak to major points of note that address select reviewer comments and the eLife assessment of our manuscript.

      eLife assessment:

      However, the strength of evidence is incomplete due to the concern that larval contraction is a result of chilling the nervous system and muscles, which causes spreading depolarization and mechanical contraction of the body, rather than an active sensorimotor response to cold.

      Reviewer #3:

      The scientific premise is that a full body contraction in larvae that are exposed to noxious cold is a sensorimotor behavioral pathway. This premise is, to start with, questionable. A common definition of behavior is a set of "orderly movements with recognizable and repeatable patterns of activity produced by members of a species (Baker et al., 2001)." In the case of nociception behaviors, the patterns of movement are typically thought to play a protective role and to protect from potential tissue damage.

      Does noxious cold elicit a set of orderly movements with a recognizable and repeatable pattern in larvae? Can the patterns of movement that are stimulated by noxious cold allow the larvae to escape harm? Based on the available evidence, the answer to both questions is seemingly no.

      We thank the reviewer for their questions and clarify here. Exposure to cold temperatures does elicit a recognizable and repeatable pattern of behavior across multiple strains, including both wildtype and genetic control strains (w1118, Oregon R) and numerous control conditions that have been previously published (Himmel et al., 2021, Himmel et al., 2023, Patel et al., 2022, Turner et al., 2016, Turner et al., 2018, Tenedini et al., 2019). Our initial publication on Drosophila cold nociception demonstrated a variety of cold-evoked behavioral responses, including head and/or tail raising of the larva as well as contraction behavior. These behaviors were repeatedly observed in assays involving either local cold stimulation with a cold probe or global cold stimulation on a cold plate. Head and/or tail raise behaviors are consistent with behavior that displaces the larval body from the cold surface; however, exposure to increasingly colder temperatures leads to an increasing level of cold-evoked contraction (CT) responses, which result in a reduction of larval area (Turner et al., 2016). Presumably, increasing the level of CIII md neuron activation leads to greater activation of downstream circuitry. We previously performed optogenetic dose response assays to further clarify the increased prevalence of the CT response to strong noxious cold stimuli and investigated how CIII md neurons discriminate between innocuous touch and noxious cold stimuli. Here, we found that lower-level activation of CIII md neurons led to predominantly touch-evoked behaviors whereas high-level activation led predominantly to cold-evoked responses (Turner et al., 2016). These analyses were coupled with stimulus-evoked calcium imaging, which revealed that touch-evoked Ca2+ levels were significantly lower than cold-evoked Ca2+ levels (Turner et al., 2016).

      In this manuscript, we confirm our previously published findings that neural silencing of CIII md neurons, either by tetanus toxin expression or by impairing action potential propagation, results in impaired cold-evoked CT responses (Turner et al., 2016, Turner et al., 2018). However, neural silencing of CIII md neurons did not eliminate cold-evoked CT responses. We interpret this finding as evidence that some component of the cold-evoked CT response may be due to cold-induced muscle contraction. Furthermore, in this manuscript, we implicate chordotonal (Ch) neurons in cold-evoked CT and demonstrate cold-evoked Ca2+ increases in Ch neurons. Moreover, neural silencing of multiple sensory neuron types (CIII + Ch or CIII + CII) resulted in greater deficits in cold-evoked behaviors (Turner et al., 2016). Thus, the noxious cold stimulus is detected by multiple peripheral sensory neurons, and inhibiting neural activity in CIII md neurons alone cannot eliminate cold-evoked CT responses.

      In this manuscript and in several other publications, studies have shown that optogenetic activation of CIII md neurons, or CIII neurons plus CII neurons or Ch neurons elicits CT-like responses (Hwang et al., 2007, Shearin et al., 2013, Turner et al., 2016). Conversely, optogenetic stimulation of CIII md neurons knocked down for paralytic, the α-subunit of voltage-gated sodium channel, did not elicit blue light-evoked CT responses due to impaired action potential propagation. These analyses collectively indicate that CIII md neuron activation is sufficient for eliciting CT-like responses. Additionally, we have previously published electrophysiological recordings of CIII md neurons under cold exposure. To address potential confounds of cold-induced muscle contraction on cold-induced electrical activity of CIII md neurons, we performed these analyses on de-muscled fillets revealing that CIII neural activity is not dependent upon muscles in response to cold. Exposure to noxious cold stimuli results in temperature-dependent increases in CIII neuron firing pattern consisting of both bursting and tonic firing (Himmel et al., 2021, Himmel et al., 2023, Maksymchuk et al., 2022, Patel et al., 2022, Himmel et al., 2022, Maksymchuk et al., 2023).

      Reviewer #3:

      Can the patterns of movement that are stimulated by noxious cold allow the larvae to escape harm?

      We were similarly curious about the neuroethological and/or protective implications of cold-evoked behaviors. In Drosophila larvae, noxious mechanical stimuli-evoked body rolling allows for lateral escape from predatory wasp (Hwang et al., 2007). Reducing the overall surface area that is exposed to cold (e.g., huddling behavior) serves as a protective strategy in many species (Canals et al., 1997, Contreras, 1984, Gilbert et al., 2006, Vickery and Millar, 1984, Hayes et al., 1992). Low temperatures can be fatal to poikilotherms (e.g., insects), however, many species have evolved the ability to cold acclimate thereby increasing their cold tolerance. To explore the potential evolutionary benefit of CIII-mediated contraction response to cold, we previously published work revealing a neural basis for cold acclimation in Drosophila larvae implicating these neurons (Himmel et al., 2021). We demonstrated that cold-evoked CT behavior is evolutionarily conserved across 11 different drosophilid species and that other cold-induced behaviors (e.g., tail raise) were also observed. Furthermore, drosophilid species adapted to rapid temperature swings were more likely to retain the ability to locomote even at lower temperatures (Himmel et al., 2021). Next, we elucidated the role of CIII md neurons in cold acclimation. Silencing CIII md neurons resulted in the inability to cold acclimate. We additionally investigated roles of Ch or CII md neurons, which alone did not inhibit the ability of larvae to cold acclimate. However, combinatorial silencing of CIII with CII or Ch neurons resulted in an inability to cold acclimate but did not obviously increase baseline cold tolerance. We explored how developmental exposure to noxious cold temperature impacts CIII md neuron cold-evoked firing pattern. Electrophysiological analyses revealed that cold acclimation results in hypersensitization in CIII md neurons (Himmel et al., 2021). 
Lastly, developmental optogenetic activation of CIII md neurons led to increased cold tolerance. Therefore, CIII md neurons are necessary and sufficient for cold tolerance, and our collective evidence demonstrates that CIII-mediated cold nociception constitutes a peripheral neural basis for Drosophila larval cold acclimation (Himmel et al., 2021).

      Reviewer #3:

      “It should be noted that this actuator drives very strong activation, and other studies with milder optogenetic stimulation of Class III neurons have shown that these cells produce behavioral responses that resemble gentle touch responses (Tsubouchi et al 2012 and Yan et al 2013)…The latter makes the reported Calcium responses to cold difficult to interpret in light of the fact that the strong muscle contractions driven by cold may actually be driving mechanosensory responses in these cells (ie through deformation of the mechanosensitive dendrites)…. Are the cIII calcium signals still observed in a preparation where cold induced muscle contractions are prevented?”

      We agree with the reviewer that mild activation of CIII md neurons results in gentle touch-like responses. In this manuscript, and other previously published work, it has been shown that optogenetic activation of CIII neurons, or CIII neurons and other sensory neurons, using a variety of optogenetic actuators (ChR2, ChETA, and CsChrimson) promotes bilateral contraction of the larval body along the anterior-posterior axis (Shearin et al., 2013, Hwang et al., 2007, Meloni et al., 2020, Turner et al., 2016, Patel and Cox, 2017, Patel et al., 2022, Himmel et al., 2023).

      As described above, in our initial publication documenting larval cold nociception in Drosophila, we investigated how CIII md neurons discriminate multimodal stimuli to elicit stimulus-relevant behavioral responses. We reported that increased activation of CIII md neurons results in cold-evoked behaviors, whereas lower activation results in touch-evoked behaviors. Subsequent calcium analyses revealed a greater stimulus-evoked calcium response to noxious cold and a milder calcium response to gentle touch (Turner et al., 2016).

      Though we have not performed cold-evoked Ca2+ imaging of CIII md neurons in larval preparations without muscles, we have recorded electrical responses of CIII md neurons in the absence of muscle contractions, using de-muscled larval fillets to analyze cold-evoked firing patterns of CIII md neurons (Himmel et al., 2021, Himmel et al., 2022, Himmel et al., 2023, Patel et al., 2022, Maksymchuk et al., 2022, Maksymchuk et al., 2023). These studies demonstrate that cold-evoked CIII neural activity is not dependent upon muscles.

      Reviewer #3:

      “A major weakness of the study is that none of the second or third order neurons (that are downstream of CIII neurons) are found to trigger the CT behavioral responses even when strongly activated with the ChETA actuator (Figure 2 Supplement 2). These findings raise major concerns for this and prior studies and it does not support the hypothesis that the CIII neurons drive the CT behaviors.”

      We conducted extensive screening of interneuron populations post-synaptically connected to CIII neurons in an effort to identify post-synaptic partners that were sufficient to trigger CT response. Much to our surprise, we were unable to find any individual neuron type or driver line that was sufficient to elicit a CT response. However, we provide substantial supporting evidence for our co-activation experiments including neural silencing, EM connectivity and calcium imaging. We also report necessity for the reported second/third order neurons in cold-evoked behavioral responses, where inhibiting neural activity resulted in reduced cold-evoked behavior. Second/third order neurons also exhibit cold-evoked calcium responses. Lastly, we also report CIII-evoked (using optogenetics) increases in calcium response in downstream post-synaptic neurons.

      Previously published literature investigating CIV md neuron circuitry has implicated downstream neurons that are not sufficient to elicit rolling behavior upon activation. In CIV md neuron circuit dissection, select neurons are reported as acting downstream of CIV md neurons but require additional circuit components in order to execute rolling behavior. For example, A00c neuron activation alone does not lead to rolling behavior; however, co-activation of A00c and Basin-4 neurons facilitates the rolling response (Ohyama et al., 2015). Similarly, co-activation of Basin-1 and Basin-4 neurons significantly enhances rolling probability relative to Basin-4 alone (Ohyama et al., 2015). Further, DnB neurons require Goro command neuron activity to promote rolling behavior (Burgos et al., 2018). Thus, there is precedent for co-activation requirements to elicit robust behavioral output in sensorimotor circuits, and we employed a similar strategy after we discovered that activation of second or third order neurons alone did not elicit a CT response.

      Reviewer #3:

      “Later experiments in the paper that investigate strong CIII activation (with ChETA) in combination with other second and third order neurons does support the idea activating those neurons can facilitate body-wide muscle contractions. But many of the co-activated cells in question are either repeated in each abdominal neuromere or they project to cells that are found all along the ventral nerve cord, so it is therefore unsurprising that their activation would contribute to what appears to be a non-specific body-wide activation of muscles along the AP axis. Also, if these neurons are already downstream of the CIII neurons the logic of this co-activation approach is not particularly clear.”

      We agree with the reviewer’s comment that various cell types that were investigated are repeated in every abdominal neuromere; however, only select post-synaptic neurons (Basin 1-4, DnB, mCSI, and Chair neurons) are segmentally repeated in every abdominal segment. Conversely, other projection and ascending neurons we investigated (A09e, A00c, A05q, Goro, TePn04/05, and A08n) are not segmentally repeated in every segment. We used connectome evidence to guide our choice of neuronal populations to explore in cold-evoked behavior and, as alluded to above, our co-activation approach was driven by the observation that no individual subpopulation of connected interneurons was found to be sufficient to elicit CT behavior. That said, it does not change the finding that inhibition of neural activity in these subpopulations impairs cold-evoked behavior, nor does it change the observation that connected interneurons exhibit cold-evoked Ca2+ responses that can also be observed with optogenetic activation of CIII neurons.

      Reviewer #3:

      “The authors argument that the co-activation studies support "a population code" for cold nociception is a very optimistic interpretation of a brute force optogenetics approach that ultimately results in an enhancement of a relatively non-specific body-wide muscle convulsion.”

      Many studies exploring circuit bases of behavior have applied large-scale optogenetic screens, including co-activation strategies, or silencing screens to identify circuit components involved in the specific behaviors under investigation. We employed similar methods in our circuit-based dissection, and our conclusions are not solely based upon optogenetic analyses.

      References:

      BURGOS, A., HONJO, K., OHYAMA, T., QIAN, C. S., SHIN, G. J.-E., GOHL, D. M., SILIES, M., TRACEY, W. D., ZLATIC, M., CARDONA, A. & GRUEBER, W. B. 2018. Nociceptive interneurons control modular motor pathways to promote escape behavior in Drosophila. eLife, 7:e26016.

      CANALS, M., ROSENMANN, M. & BOZINOVIC, F. 1997. Geometrical aspects of the energetic effectiveness of huddling in small mammals. Acta Theriologica, 42(3):321-328.

      CONTRERAS, L. C. 1984. Bioenergetics of Huddling: Test of a Psycho-Physiological Hypothesis. Journal of Mammalogy, 65, 256-262.

      GILBERT, C., ROBERTSON, G., LE MAHO, Y., NAITO, Y. & ANCEL, A. 2006. Huddling behavior in emperor penguins: Dynamics of huddling. Physiol Behav, 88, 479-88.

      HAYES, J. P., SPEAKMAN, J. R. & RACEY, P. A. 1992. The Contributions of Local Heating and Reducing Exposed Surface Area to the Energetic Benefits of Huddling by Short-Tailed Field Voles (Microtus agrestis). Physiological Zoology, 65, 742-762.

      HIMMEL, N. J., LETCHER, J. M., SAKURAI, A., GRAY, T. R., BENSON, M. N., DONALDSON, K. J. & COX, D. N. 2021. Identification of a neural basis for cold acclimation in Drosophila larvae. iScience, 24, 102657.

      HIMMEL, N. J., SAKURAI, A., DONALDSON, K. J. & COX, D. N. 2022. Protocols for measuring cold-evoked neural activity and cold tolerance in Drosophila larvae following fictive cold acclimation. STAR Protoc, 3, 101510.

      HIMMEL, N. J., SAKURAI, A., PATEL, A. A., BHATTACHARJEE, S., LETCHER, J. M., BENSON, M. N., GRAY, T. R., CYMBALYUK, G. S. & COX, D. N. 2023. Chloride-dependent mechanisms of multimodal sensory discrimination and nociceptive sensitization in Drosophila. eLife, 12:e76863.

      HWANG, R. Y., ZHONG, L., XU, Y., JOHNSON, T., ZHANG, F., DEISSEROTH, K. & TRACEY, W. D. 2007. Nociceptive Neurons Protect Drosophila Larvae from Parasitoid Wasps. Current Biology, 17, 2105-2116.

      MAKSYMCHUK, N., SAKURAI, A., COX, D. N. & CYMBALYUK, G. 2022. Transient and Steady-State Properties of Drosophila Sensory Neurons Coding Noxious Cold Temperature. Frontiers in Cellular Neuroscience, 16:831803.

      MAKSYMCHUK, N., SAKURAI, A., COX, D. N. & CYMBALYUK, G. S. 2023. Cold-Temperature Coding with Bursting and Spiking Based on TRP Channel Dynamics in Drosophila Larva Sensory Neurons. Int J Mol Sci, 24(19):14638.

      MELONI, I., SACHIDANANDAN, D., THUM, A. S., KITTEL, R. J. & MURAWSKI, C. 2020. Controlling the behaviour of Drosophila melanogaster via smartphone optogenetics. Scientific Reports, 10, 17614.

      OHYAMA, T., SCHNEIDER-MIZELL, C. M., FETTER, R. D., ALEMAN, J. V., FRANCONVILLE, R., RIVERA-ALBA, M., MENSH, B. D., BRANSON, K. M., SIMPSON, J. H., TRUMAN, J. W., CARDONA, A. & ZLATIC, M. 2015. A multilevel multimodal circuit enhances action selection in Drosophila. Nature, 520, 633-639.

      PATEL, A. & COX, D. 2017. Behavioral and Functional Assays for Investigating Mechanisms of Noxious Cold Detection and Multimodal Sensory Processing in Drosophila Larvae. BIO-PROTOCOL, 7(13):e2388.

      PATEL, A. A., SAKURAI, A., HIMMEL, N. J. & COX, D. N. 2022. Modality specific roles for metabotropic GABAergic signaling and calcium induced calcium release mechanisms in regulating cold nociception. Front Mol Neurosci 15:942548.

      SHEARIN, H. K., DVARISHKIS, A. R., KOZELUH, C. D. & STOWERS, R. S. 2013. Expansion of the Gateway MultiSite Recombination Cloning Toolkit. PLoS ONE, 8, e77724-e77724.

      TENEDINI, F. M., SÁEZ GONZÁLEZ, M., HU, C., PEDERSEN, L. H., PETRUZZI, M. M., SPITZWECK, B., WANG, D., RICHTER, M., PETERSEN, M., SZPOTOWICZ, E., SCHWEIZER, M., SIGRIST, S. J., CALDERON DE ANDA, F. & SOBA, P. 2019. Maintenance of cell type-specific connectivity and circuit function requires Tao kinase. Nature Communications, 10, 3506.

      TURNER, H. N., ARMENGOL, K., PATEL, A. A., HIMMEL, N. J., SULLIVAN, L., IYER, S. C., BHATTACHARYA, S., IYER, E. P. R., LANDRY, C., GALKO, M. J. & COX, D. N. 2016. The TRP Channels Pkd2, NompC, and Trpm Act in Cold-Sensing Neurons to Mediate Unique Aversive Behaviors to Noxious Cold in Drosophila. Curr Biol, 26, 3116-3128.

      TURNER, H. N., PATEL, A. A., COX, D. N. & GALKO, M. J. 2018. Injury-induced cold sensitization in Drosophila larvae involves behavioral shifts that require the TRP channel Brv1. PLoS One, 13, e0209577.

      VICKERY, W. L. & MILLAR, J. S. 1984. The Energetics of Huddling by Endotherms. Oikos, 43, 88-93.

    2. eLife assessment

      This is a useful study that investigates neural circuits mediating behavioral responses to cold in Drosophila larvae. Using a combination of behavioral analysis, neuronal manipulation, EM connectomics, and reporters of calcium activity, the authors convincingly show that cold-induced body contraction is mediated by specific central neurons. However, the strength of evidence is incomplete due to the concern that larval contraction is a result of chilling the nervous system and muscles, which causes spreading depolarization and mechanical contraction of the body, rather than an active sensorimotor response to cold. With these concerns addressed, this paper would be of interest to neuroscientists interested in temperature sensing.

    3. Reviewer #1 (Public Review):

      Summary. The authors' goal was to map the neural circuitry underlying cold-sensitive contraction in Drosophila. The circuitry underlying most sensory modalities has been characterized, but noxious cold sensory circuitry has not been well studied. The authors achieve their goal and map out sensory and post-sensory neurons involved in this behavior.

      Strengths. The manuscript provides convincing evidence for sensory and post sensory neurons involved in noxious cold sensitive behavior. They use both connectivity data and functional data to identify these neurons. This work is a clear advance in our understanding of noxious cold behavior. The experiments are done with a high degree of experimental rigor.

      Positive comments

      -CaMPARI is nicely done to map cold-responsive neurons, although it doesn't give data on individual neurons.

      -Chrimson and TNT experiments are nicely done.

      -Cold temperature activates basin neurons, it's a solid and convincing result.

      Weaknesses. Among the few weaknesses in this manuscript are the failure to trace the circuit from sensory neuron to motor neuron and the lack of analysis of the muscles driving cold-induced contraction. The authors also need to elaborate more on the novel aspects of their work in the introduction or abstract.

      Major comments.

      -Class III sensory neuron connectivity is known, and its role in the cold response is known (Turner et al., 2016, 2018). The authors need to make clearer what the novelty of the experiments is.

      -Why focus on premotor neurons in mechano-nociceptive pathways? Why not focus on PMNs innervating longitudinal muscles, which are likely involved in longitudinal larval contraction? Especially since the chosen premotor neurons have only weak effects on cold-induced contraction?

    4. Reviewer #2 (Public Review):

      Patel et al. analyze neurons in a somatosensory network involved in responses to noxious cold in Drosophila larvae. Using a combination of behavioral experiments, calcium imaging, optogenetics, and synaptic connectivity analysis in the Drosophila larva, they assess the function of circuit elements in the somatosensory network downstream of multimodal somatosensory neurons involved in sensing innocuous and noxious stimuli, and probe their function in noxious cold processing. Consistent with their previous findings, they find the multidendritic class III neurons to be the key cold-sensing neurons that are both required and sufficient for the CT behavioral response (shown to be evoked by noxious cold). They further investigate the downstream neurons, identified based on the literature and EM connectivity, at different stages of sensory processing, characterize the different phenotypes upon activating or silencing those neurons, and monitor their responses to noxious cold. The work reveals diverse phenotypes for the different neurons studied and provides the groundwork for understanding how information is processed in the nervous system from sensory input to motor output and how information from different modalities is processed by neuronal networks. However, at times the writing could be clearer and some interpretations of the results more rigorous.

      Specific comments

      1) In Figure 1 -supplement 6D-F (Cho co-activation)

      The authors find that Ch neurons are cold sensitive and required for cold nociceptive behavior but do not facilitate behavioral responses induced by CIII neurons.

      The authors show that coactivating mdIII and cho inhibits the CT (a typically observed cold-induced behavioral response) in the second part of the stimulation period, while Cho was required for cold-induced CT. Different levels of activation of md III and Cho (different light intensities) could bring some insights into the observed phenotypes upon Cho manipulation as different levels activate different downstream networks that could correspond to different stimuli. Also, it would be interesting to activate chordotonal during exposure to cold to determine how a behavioral response to cold is affected by the activation of chordotonal sensory neurons.

      2) Throughout the paper the co-activation experiments investigate whether co-activating the different candidate neurons and md III neurons facilitates the md III-induced CT response. However, the cold noxious stimuli will presumably activate different neurons downstream than optogenetic activation of MdIII and thus can reveal more accurately the role of the different candidate neurons in facilitating cold nociception.

      3) Use of blue lights in behavioral and imaging experiments

      Strong blue and UV light have been shown to activate mdIV neurons (Xiang, Y., Yuan, Q., Vogt, N. et al. Light-avoidance-mediating photoreceptors tile the Drosophila larval body wall. Nature 468, 921-926 (2010). https://doi.org/10.1038/nature09576), and some of the neurons tested receive input from mdIV. In their experiments, the authors used blue light to optogenetically activate CIII neurons and then monitored calcium responses in Basin neurons, premotor neurons, and ascending neurons, and UV light is necessary for photoconversion in the CaMPARI experiments. Therefore, some of the neurons monitored could be activated by blue light and not by CIII activation. Indeed, responses of Basin-4 neurons can be observed in the no-ATR condition (Fig 3HI), as can quite strong responses of DnB neurons (Figure 6E). How do the authors discern that the effects they see on the different neurons are indeed due to cold nociception and not to the synergy of cold and blue light responses? This could especially be the case for DnB, which could have a role in facilitating the response to cold in a multisensory context (where mdIV neurons are activated by light). In addition, the silencing of DnB neurons during cold stimulation does not seem to give very robust phenotypes (no significant CT decrease compared to the empty GAL4 control).

      It would be important, for example, to show that DnB facilitates the mdIII activation or cold-induced CT even in the absence of blue light, by using red light and Chrimson or TrpA activation (for coactivation with mdIII).

      Alternatively, in some other cases, the phenotype upon co-activation could be inhibited by blue light (e.g. chair-1 (Figure 5 H-I)).

      More generally, given the multimodal nature of the stimuli activating mdIV, mdIII (and Cho) and their shared downstream circuitry, it is important either to control for the use of blue light in these stimuli or to take the presence of this stimulus into account in interpreting the results, as the coactivation of, for example, Cho and mdIII using blue light could also activate mdIV (and downstream neurons) and alter the state of the network in a way that could inhibit the mdIII-induced CT responses.

      Assessing the differences in behavioral phenotypes in the different conditions could give an idea of the influence of combining different modalities in these assays. For example, did the authors observe any other behaviors upon co-activation of mdIII and Cho (at the expense of CT in the second part of the stimulation), or did the larvae resume crawling? Blue light typically induces reorientation behavior. What about when co-activating mdIII and Basin-4?

      Using Chrimson and red light or TrpA in some key experiments, e.g. with Cho, Basin-4, and DnB, would clarify the implication of these neurons in cold nociception.

      4) Basins
      - Page 17 line 442-3 "Neural silencing of all Basin (1-4) neurons, using two independent driver lines (R72F11GAL4 and R57F07GAL4)"
      Did the authors check the expression profile of the R57F07 line that they use to probe "all basins"? The expression profile published previously (Ohyama et al, 2015, extended data) shows one basin neuron (identified as Basin-4) and some neurons in the brain lobes. Also, the split GAL4 that labels Basin-4 (SS00740) is the intersection between R72F11 and R57F07 neurons. Thus R57F07 likely labels Basin-4, and if that is the case, the data in Figure 2 (and supplement) and Figure 3 related to this driver line should be annotated as Basin-4, and the results and their interpretation modified to take into account the different phenotypes for all Basins and for Basin-4 neurons.

      Page 19 l. 521-525 I am confused by these sentences, as the authors claim that Basin-4 showed reduced calcium responses upon repetitive activation of CIII md neurons but then say that they exhibit sensitization. Looking at the plots in Fig 3 F-I, the Basin-4 responses upon repeated activation do indeed seem to decrease on the second repetition compared to the first. What is the sensitization the authors refer to?

      On Page 47, in this section of the discussion, the authors propose an interesting hypothesis that the Basin-1 neuron could modulate the gain of behavioral responses. While this is an interesting idea, I wonder what would explain the finding that co-activation of Cho and mdIII does not facilitate cold nociceptive responses. Would activation of Basin-1 facilitate the cold response in different contexts (in addition to Cho-mediated stimuli)?

      Page 48 Thus the implication of the inhibitory network in cold processing should be better contextualized

      The authors attribute the lower Basin-2 Ca2+ response to cold/mdIII activation (compared to Basin-4), despite stronger connectivity, to stronger inputs from inhibitory neurons onto Basin-2 (compared to Basin-4). The previously described inhibitory neurons that synapse onto Basin-2 receive a rather small fraction of their inputs from the class III sensory neurons. The differences in response to cold could potentially be attributed to the activation of the inhibitory neurons by the cold-sensing Cho neurons. However, that cannot explain the differences in responses induced by class III neurons. Do the authors refer to additional inhibitory neurons that would receive significant input from mdIII?

      Alternative explanations could exist for this difference in activation: electrical synapses from mdIII onto Basin-4, or stronger inputs from mdIV (compared to Basin-2) in the case of responses to the cold stimulus (cold induces responses in mdIV sensory neurons). Different subtypes of CIII may differentially respond to cold, and the cold-sensing ones could synapse preferentially onto Basin-4, etc.

      5) A00c
      Page 26 Figure 4F-I line While Goro may not be involved in cold nociception the A00c (and A05q) seems to be.
      A00c could convey information to other neurons other than Goro and thus be part of a pathway for cold-induced CT.

      6) Page 31 766-768 the conclusion that "premotor function is required for and can facilitate cold nociception" seems odd to stress, as one would assume that some premotor neurons would be involved in controlling the behavioral responses to a stimulus. It would be more pertinent in the summary to specify which premotor neurons are involved and what their function is.

      7) There are several Split GAL4 lines used in the study (with transgenes inserted in the attP40 and attP2 sites). A recent study points to a mutation related to attP40 that can have an effect on muscle function: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9750024/. The controls used in behavioral experiments do not contain the attP40 site. It would be important to check a control genotype bearing an attP40 site, characterize the different parameters of the CT behavior to cold, and take this into account in interpreting the results of the experiments using the Split-GAL4 lines.

    5. Reviewer #3 (Public Review):

      Summary:
      The authors follow up on prior studies where they have argued for the existence of cold nociception in Drosophila larvae. In the proposed pathway, mechanosensitive Class III multidendritic neurons are the noxious cold responding sensory cells. The current study attempts to explore the potential roles of second and third order neurons, based on information of the Class III neuron synaptic outputs that have been obtained from the larval connectome.

      Strengths:

      The major strength of the manuscript is the detailed discussion of the second and third order neurons that are downstream of the mechanosensory Class III multidendritic neurons. These will be useful in further studies of gentle touch mechanosensation and mechanonociception both of which rely on sensory input from these cells. Calcium imaging experiments on Class III activation with optogenetics support the wiring diagram.

      Weaknesses:

      The scientific premise is that a full body contraction in larvae that are exposed to noxious cold is a sensorimotor behavioral pathway. This premise is, to start with, questionable. A common definition of behavior is a set of "orderly movements with recognizable and repeatable patterns of activity produced by members of a species (Baker et al., 2001)." In the case of nociception behaviors, the patterns of movement are typically thought to play a protective role and to protect from potential tissue damage.

      Does noxious cold elicit a set of orderly movements with a recognizable and repeatable pattern in larvae? Can the patterns of movement that are stimulated by noxious cold allow the larvae to escape harm? Based on the available evidence, the answer to both questions is seemingly no. In response to noxious cold stimulation many, if not all, of the muscles in the larva, simultaneously contract (Turner et al., 2016), and as a result the larva becomes stationary. In response to cold, the larva is literally "frozen" in place and it is incapable of moving away. This incapacitation by cold is the antithesis of what one might expect from a behavior that protects the animals from harm.

      Extensive literature has investigated the physiological responses of insects to cold (reviewed in Overgaard and MacMillan, 2017). In numerous studies of insects across many genera (excluding cold adapted insects such as snow flies), exposure to very cold temperatures quickly incapacitates the animal and induces a state that is known as a chill coma. During a chill coma, the insect becomes immobilized by the cold exposure, but if the exposure to cold is very brief the insect can often be revived without apparent damage. Indeed, it is common practice for many laboratories that use adult Drosophila for studies of behavior to use a brief chilling on ice as a form of anesthesia because chilling is less disruptive to subsequent behaviors than the more commonly used carbon dioxide anesthesia. If flies were to perceive cold as a noxious nociceptive stimulus, then this "chill coma" procedure would likely be disruptive to behavioral studies but is not. Furthermore, there is no evidence to suggest that larval sensation of "noxious cold" is aversive.

      The insect chill coma literature has investigated the effects of extreme cold on the physiology of nerves and muscles and the consensus view of the field is that the paralysis that results from cold is due to complex and combined action of direct effects of cold on muscle and on nerves (Overgaard and MacMillan, 2017). Electrophysiological measurements of muscles and neurons find that they are initially depolarized by cold, and after prolonged cold exposure they are unable to maintain potassium homeostasis and this eventually inhibits the firing of action potentials (Overgaard and MacMillan, 2017). The very small thermal capacitance of a Drosophila larva means that its entire neuromuscular system will be quickly exposed to the effect of cold in the behavioral assays under consideration here. It would seem impossible to disentangle the emergent properties of a complex combination of effects on physiology (including neuronal, glial, and muscle homeostasis) on any proposed sensorimotor transformation pathway.

      Nevertheless, the manuscript before us makes a courageous attempt at this. A number of GAL4 drivers tested in the paper are found to affect parameters of contraction behavior (CT) in cold exposed larvae in silencing experiments. However, notably absent from all of the silencing experiments are measurements of larval mobility following cold exposure. Thus, it is not known from the study if these manipulations are truly protecting the larvae from paralysis following cold exposure, or if they are simply reducing the magnitude of the initial muscle contraction that occurs immediately following cold (i.e. reducing CT). The strongest effect of silencing occurs with the 19-12-GAL4 driver which targets Class III neurons (but is not completely specific to these cells).

      Optogenetic experiments for Class III neurons relying on the 19-12-GAL4 driver combined with a very strong optogenetic actuator (ChETA) show the CT behavior that was reported in prior studies. It should be noted that this actuator drives very strong activation, and other studies with milder optogenetic stimulation of Class III neurons have shown that these cells produce behavioral responses that resemble gentle touch responses (Tsubouchi et al 2012 and Yan et al 2013). As well, these neurons express mechanoreceptor ion channels such as NompC and Rpk that are required for gentle touch responses. The latter makes the reported calcium responses to cold difficult to interpret in light of the fact that the strong muscle contractions driven by cold may actually be driving mechanosensory responses in these cells (i.e. through deformation of the mechanosensitive dendrites). Are the CIII calcium signals still observed in a preparation where cold-induced muscle contractions are prevented?

      A major weakness of the study is that none of the second or third order neurons (that are downstream of CIII neurons) are found to trigger the CT behavioral responses even when strongly activated with the ChETA actuator (Figure 2 Supplement 2). These findings raise major concerns for this and prior studies and it does not support the hypothesis that the CIII neurons drive the CT behaviors.

      Later experiments in the paper that investigate strong CIII activation (with ChETA) in combination with other second and third order neurons do support the idea that activating those neurons can facilitate body-wide muscle contractions. But many of the co-activated cells in question are either repeated in each abdominal neuromere or they project to cells that are found all along the ventral nerve cord, so it is therefore unsurprising that their activation would contribute to what appears to be a non-specific body-wide activation of muscles along the AP axis. Also, if these neurons are already downstream of the CIII neurons the logic of this co-activation approach is not particularly clear. A more convincing experiment would be to silence the different classes of cells in the context of the optogenetic activation of CIII neurons to test for a block of the effects, a set of experiments that is notably absent from the study.

      The authors' argument that the co-activation studies support "a population code" for cold nociception is a very optimistic interpretation of a brute force optogenetics approach that ultimately results in an enhancement of a relatively non-specific body-wide muscle convulsion.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary: The current study reports a cryo-EM structure of MFS transporter MelB trapped in an inward-facing state by a conformationally selective nanobody. The authors compare this structure to previously-resolved crystal structures of outward-facing MelB. Additionally, the authors report H/D exchange/ mass spec experiments that identify accessible residues in the protein.

      Strengths: The authors overcame very significant technical challenges to solve the first inward-facing structure of the small, model MFS transporter MelB by cryo-EM. The use of conformation-trapping nanobodies (which had been reported previously by this group) is particularly nice.

      We appreciate reviewer #1’s positive comments.

      Weaknesses: Maps and coordinates were not provided by the authors, which presents a gap in this assessment.

      We were not aware of a specific request for maps and coordinates at the time of the initial submission, but we will provide them upon request.

      The authors highlight the use of HDX experiments as a measurement of protein conformational dynamics. However, this experiment does not measure the conformational dynamics of the transporter, since in these experiments exchange is not initiated by ligand addition or another trigger. The experiment instead measures the accessibility of different residues, and of course, a freely-exchanging sodium bound transporter would have more exchangeable positions than when a conformation-trapping nanobody is bound. It is not clear what new mechanistic information this provides, since this property of the nanobody has already been established.

      We thank you for your comment. We will address your and reviewer 2’s similar questions later.

      Based on the evidence presented, it is somewhat speculative that the structure represents the EIIa-bound regulatory state.

      We believe that we have presented convincing evidence obtained by ITC and gel-filtration chromatography to support this statement. The effects of Nb725 or EIIAGlc on MelB functions are similar: little change in Na+ binding, little change in Nb725 or EIIAGlc binding in the absence or presence of the EIIAGlc or Nb725, but a great reduction in sugar-binding affinity (sFigs. 2&3; tables 1&2; published in two papers in J. Biol. Chem. 2014; 289: 33012-33019 and 2023; 299:104967). To make this clear, we will add the related data from the two JBC papers to table 2. Nb725 and EIIAGlc can concurrently bind to MelBSt (sFigs. 2&3; tables 1&2). Further, we will provide a new figure to show that a complex composed of all three proteins can be isolated by gel-filtration chromatography. We have also established this finding with another nanobody, Nb733, from the same family (JBC, 2023; 299:104967). However, given that the EIIAGlc-bound structure has not been resolved yet, we will tone down the related argument.

      Reviewer #2 (Public Review):

      Summary: In this manuscript, Hariharan and colleagues present an elegant study regarding the mechanistic basis of sugar transport by the prototypical Na+-coupled transporter MelB. The authors identified a nanobody (Nb 725) that reduces melibiose binding but not Na+ binding. In vitro (ITC) experiments suggest that the conformation targeted by this nanobody is different from the published outward-open structures. They go on to solve the structure of this other conformation by cryo-EM using the nanobody grafted with a fiducial marker and enhancer and, as predicted, capture a new conformation of MelB, namely the inward-open conformation. Through MD simulations and ITC measurements, they demonstrate that this state has a reduced affinity for sugar but that Na+ binding is mostly unaffected. A detailed observation and comparison between previously published structures in the outward-open conformation and this new conformational intermediate strengthens and develops the mobile barrier hypothesis underpinning sugar transport. The conformational transition to the inward-facing state leads to the formation of a barrier on the extracellular side that directly affects the amino acid arrangement of the sugar binding site, leading to a decreased affinity that drives the direction of transport. In contrast, the Na+ binding remains the same. This structural data is complemented with dynamic insights from HDX-MS experiments conducted in the presence and absence of the Nb. These measurements highlight the overall protective effect of nanobody binding, consistent with the stabilization of one conformational intermediate.

      Strengths: The experimental strategy to isolate this elusive conformational intermediate is smart and well-executed. The biochemical and biophysical data were obtained in a lipid system (nanodiscs), which allows questions about detergent-induced artefacts to be dismissed. The new conformation observed is of great interest and allows a better mechanistic understanding of ion-coupled sugar transport. The comparison between the two structures and the mobile barrier mechanism hypothesis is convincingly depicted and tested.

      We appreciate the reviewer’s insightful understanding of our novel findings and the associated explanations on the cation-coupled symport mechanisms.

      Weaknesses: This is excellent experimental work. My recommendations stem mostly from concerns regarding the interpretation of the observed results. In particular, I am somewhat puzzled by the important role the authors give to the regulatory protein EIIa with little structural or biophysical data to back up their claims. The hypothesis that the conformation captured by the Nb is physiologically and functionally equivalent to that caused by EIIa binding is definitely a worthy hypothesis, but it is not an experimental result. Evidence in support could include a structure with EIIa bound. Since it does not bind at the same location as the Nb, it seems feasible. Or, the authors could have performed HDX-MS in the presence of EIIa to determine if the effect is similar to that of Nb_725 binding. In the absence of these experiments, discussion about EIIa should be limited. Along the same lines, I find it misleading to put in the abstract a sentence such as "It is the first structure of a major facilitator superfamily (MFS) transporter with experimentally determined cation binding, and also a structure mimicking the physiological regulatory state of MelB under the global regulator EIIAGlc of the glucose-specific phosphoenolpyruvate:phosphotransferase system." None of this is supported by the experimental work presented in this article: the Na+ is modelled (with great confidence, but still) and whether this structure mimics the physiological state of MelB bound to EIIa is not known. The results of the paper are strong and interesting enough per se, and there is no need to inflate them with hypotheses that belong to the discussion section.

      As stated in the response to reviewer 1, we believe that we presented strong data to argue for a structure mimicking the physiological regulatory state of MelB. The only missing piece is a structure determination of the EIIA-bound state. We will change the title and tone down the related discussions in a new version.

      Regarding our statement in our abstract that “It is the first structure of a major facilitator superfamily (MFS) transporter with experimentally determined cation binding”, we believe that our claim is supported by the resolved Na+ binding in the cryoEM structure. To our knowledge, no experimentally determined cation at its canonical binding site has been reported so far.

      I also note that the HDX-MS experiments do not distinguish between two conformational states, but rather an ensemble of states vs one state.

      We will address both reviewers 1 and 2 together. We agree with your comments: we compared one (inward) state with ensembles of (predominantly outward) states. Many published data have demonstrated that the WT MelBSt predominantly populates outward-facing states, especially in the presence of Na+. The major differences in HDX-MS between the inward-facing state in the presence of the Nb and the outward-facing ensembles in the absence of the Nb should be related to the conformational changes between the inward- and outward-facing states, although not quantitatively. The type of measurement we performed does not contain information on the rates of conformational changes, but this study identified the dynamic regions involved in this conformational switch.

      Reviewer #3 (Public Review):

      Summary: The manuscript authored by Lan Guan and colleagues reveals the structure of the cytosol-facing conformation of the MelB sodium/Li coupled permease using the nab-Fab approach and cryoEM for structure determination. The study reveals the conformational transitions in the MelB transport cycle and allows an understanding of the role of sugar and ion specificities within this transporter.

      Strengths: The study employs a very exciting strategy of transferring the CDRs of a conformation-specific nanobody to the nab-Fab system to determine the inward-open structure of MelB. The resolution of the structure is reasonable enough to support the major conclusions of the study. This is overall a well-executed study.

      Thank you for your positive comments.

      Weaknesses: The authors seem to have mixed up the exothermic and endothermic aspects of ITC binding in their description. Positive heats correspond to endothermic heat changes in ITC and negative heat changes correspond to exothermic heats. The authors seem to suggest the opposite.

      This is consistently observed throughout the manuscript.

      All of our ITC data are correctly presented. Our data were collected on a NanoITC (TA Instruments, Inc), which directly measures the heat release/enthalpic changes and plots exothermic heats as positive values. This is in contrast to the MicroCal device, which detects heat changes through voltage compensation and depicts exothermic heats as negative values. We will further emphasize this in related figure legends.
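
      The two plotting conventions described above can be illustrated with a minimal sketch. It assumes that the only difference between them is the sign of the plotted heat; the function names and the injection heats are hypothetical and are not taken from the manuscript or from either instrument's software.

```python
def to_other_convention(heats):
    """Flip the sign of each heat, switching between the exotherm-positive
    and exotherm-negative plotting conventions."""
    return [-q for q in heats]


def is_exothermic(heat, exotherm_positive=True):
    """Classify one injection heat under the stated plotting convention."""
    return heat > 0 if exotherm_positive else heat < 0


# Hypothetical injection heats (arbitrary units), plotted exotherm-positive.
exotherm_positive_heats = [12.0, 8.5, 3.0]
exotherm_negative_heats = to_other_convention(exotherm_positive_heats)
```

      Under this assumption the same binding event reads as a positive peak in one convention and a negative peak in the other, so the sign alone does not identify a reaction as endothermic without knowing which convention was used.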

    2. eLife assessment

      In a valuable study that will be of interest to the mechanistic membrane transport community, the authors capture the first cryo-EM structure of the inward-facing melibiose transporter MelB, a well-studied model transporter from the major facilitator (MFS) superfamily. Cryo-EM experiments and supporting biophysical experiments provide solid evidence for transporter conformational changes. The supporting evidence is incomplete in that the maps were not provided for review.

    1. eLife Assessment

      This valuable paper presents a new protocol for quantifying tRNA aminoacylation levels by deep sequencing. The improved methods for discrimination of aminoacyl-tRNAs from non-acylated tRNAs, more efficient splint-assisted ligation to modify the tRNAs' ends for the following RT-PCR reaction, and the use of an error-tolerating mapping algorithm to map the tRNA sequencing reads provide new tools for anyone interested in tRNA concentrations and functional states in different cells and organisms. The results and conclusions are solid with well-designed tests to optimize the protocol under different conditions.

    2. Reviewer #1 (Public Review):

      Summary:<br /> The manuscript of Davidsen and Sullivan describes an improved tRNA-seq protocol to determine aminoacyl-tRNA levels. The improvements include: (i) optimizing the Whitfeld or oxidation reaction to select aminoacyl-tRNAs from oxidation-sensitive non-acylated tRNAs; (ii) using a splint-assisted ligation to modify the tRNAs' ends for the following RT-PCR reaction; (iii) using an error-tolerating mapping algorithm to map the tRNA sequencing reads that contain mismatches at modified nucleotides.

      Strengths:<br /> The oxidation and the splint-assisted ligation are both yield-diminishing steps; thus, the protocol of Davidsen and Sullivan is an important improvement over current protocols that enhances the quantification of aminoacyl-tRNAs.

      Weaknesses:<br /> The oxidation and the selection of aminoacyl-tRNA is the first step in all protocols. Thereafter they differ on whether blunt ligation, hairpin (DM-tRNA-seq, YAMAT-seq, QuantM-seq, mim tRNA-seq, LOTTE tRNA-seq), or splint ligation is used and finally what detection method is applied (i-tRAP, tRNA microarrays). What is the correlation to those alternative approaches (e.g. i-tRAP (PMID 36283829), tRNA microarrays (PMID: 31263264) etc.)? What is the correlation with other approaches with which this improved protocol shares some steps (DM-tRNA-seq, mim-tRNA-seq)?

    3. Reviewer #2 (Public Review):

      Davidsen and Sullivan present an improved method for quantifying tRNA aminoacylation levels by deep sequencing. By combining recent advances in tRNA sequencing with lysine-based chemistry that is more gentle on RNA, splint oligo-based adapter ligation, and full alignment of tRNA reads, they generate an interesting new protocol. The lab protocol is complemented by a software tool that is openly available on Github. Many of the points highlighted in this protocol are not new but have been used in recent protocols such as Behrens et al. (2021) or McGlincy and Ingolia (2017). Nevertheless, a strength of this study is that the authors carefully test different conditions to optimize their protocol using a set of well-designed controls.

      The conclusions of the manuscript appear to be well supported by the data presented. However, there are a few points that need to be clarified.

      1) One point that remains unsatisfactory is the benchmarking against the state of the art. It is currently impossible to estimate how much the results of this new protocol differ from alternative methods and in particular from Behrens et al. (2021). Here it will be helpful to perform experiments with samples similar to those used in the mim-tRNAseq study and not with H1299 cells.

      2) While the protocol aims to implement an improved method for quantification of tRNA aminoacylation, it can also be used for tRNA quantification and analysis of tRNA modifications. It will increase the impact of this study if the authors benchmark the outcomes of their protocol with other tRNA sequencing protocols with samples similar to these papers, which will be important for certain research teams that are unlikely to implement two different tRNA sequencing methods. Are there any possible adaptations that would allow the analysis of tRNA fragments?

      3) Like Behrens et al. (2021), Davidsen and Sullivan use TGIRT-III RT for their analyses. The enzyme is not currently available in a form suitable for tRNA-seq. It would be very helpful to test different new RT enzymes that are commercially available. The example of Maxima RT - Figure 2 Supp 6 - shows significantly lower performance than the presented TGIRT-III RT data. In lines 296-298, the authors mention improvements to the protocol by using ornithine. Why are these improvements not included?

      4) A technical concern: The samples are purified multiple times using a specific RNA purification kit. Did the authors test different methods to purify the RNA and does this influence the result of the method?

      5) The study would benefit from an explicit step-by-step protocol, including the choice of adapters that are shown to work best in the protocol.

    1. eLife assessment

      This valuable study reports on the potential of neural networks to emulate simulations of human ventricular cardiomyocyte action potentials for various ion channel parameters with the advantage of saving simulation time in certain conditions. The evidence supporting the claims of the authors is solid, although the inclusion of open analysis of drop-off accuracy and validation of the neural network emulators against experimental data would have strengthened the study. The work will be of interest to scientists working in cardiac simulation and quantitative pharmacology.

    2. Reviewer #1 (Public Review):

      Summary:<br /> The authors present a neural network (NN)-based approach to computationally cheaper emulation of simulations of biophysically relatively detailed cardiac cell models based on systems of ordinary differential equations. Relevant case studies are used to demonstrate the performance in the prediction of standard action potentials, as well as action potentials manifesting early depolarizations. Application to the "reverse problem" (inferring the effect of pharmacological compounds on ion channels based on action potential data before and after drug treatment) is also explored, which is a task of generally high interest.

      Strengths:<br /> This is a well-designed study, which explores an area that many in the cardiac simulation community will be interested in. The article is well written and I particularly commend the authors on transparency of methods description, code sharing, etc. - it feels rather exemplary in this regard and I only wish more authors of cardiac simulation studies took such an approach. The training speed of the network is encouraging and the technique is accessible to anyone with a reasonably strong GPU, not needing specialized equipment.

      Weaknesses:<br /> Below are several points that I consider to be weaknesses and/or uncertainties of the work:

      1. I am not convinced by the authors' premise that there is a great need for further acceleration of cellular cardiac simulations - it is easy to simulate tens of thousands of cells per day on a workstation computer, using simulation conditions similar to those of the authors. I do not really see an unsolved task in the field that would require further speedup of single-cell simulations.

      At the same time, simulations offer multiple advantages, such as the possibility to dissect mechanisms of the model behaviour, and the capability to test its behaviour in a wide array of protocols - whereas a NN is trained for a single purpose/protocol, and does not enable a deep investigation of mechanisms. Therefore, I am not sure the cost/benefit ratio is that strong for single-cell emulation currently.

      An area that is definitely in need of acceleration is simulations of whole ventricles or hearts, but it is not clear how much potential for speedup the presented technology would bring there. I can imagine interesting applications of rapid emulation in such a setting, some of which could be hybrid in nature (e.g. using simulation for the region around the wavefront of propagating electrical waves, while emulating the rest of the tissue, which is behaving more regularly/predictably, and is likely to be emulated well), but this is definitely beyond the scope of this article.

      2. The authors run a cell simulation for 1000 beats, training the NN emulator to mimic the last beat. It is reported that the simulation of a single cell takes 293 seconds, while emulation takes only milliseconds, implying a massive speedup. However, I consider the claimed speedup achieved by emulation to be highly context-dependent, and somewhat too flattering to the presented method of emulation. Two specific points below:

      First, it appears that a not overly efficient (fixed-step) numerical solver scheme is used for the simulation. On my (comparable, also a Threadripper) CPU, using the same model ("ToR-ORd-dyncl"), but a variable step solver ode15s in Matlab, a simulation of a cell for 1000 beats takes ca. 50 seconds, rather than the authors' 293 seconds. This can be further sped up by parallelization when more cells than available cores are simulated: on 32 cores, this translates into ca. 2 seconds amortized time per cell simulation (I suspect that the NN-based approach cannot be parallelized in a similar way?). By amortization, I mean that if 32 models can be simulated at once, a simulation of X cells will not take X*50 seconds, but (X/32)*50 seconds (with only minor overhead, as this task scales well across cores).

      Second, and this is perhaps more important - the reported speed-up critically depends on the number of beats in the simulation - if I am reading the article correctly, the runtime compares a simulation of 1000 beats versus the emulation of a single beat. If I run a simulation of a single beat across multiple simulated cells (on a 32-core machine), the amortized runtime is around 20 ms per cell, which is only marginally slower than the NN emulation. On the other hand, if the model was simulated for aeons, comparing this to a fixed runtime of the NN, one can get an arbitrarily high speedup.

      Therefore, I'd probably emphasize the concrete speedup less in an abstract and I'd provide some background on the speedup calculation such as above, so that the readers understand the context-dependence. That said, I do think that a simulation for anywhere between 250 and 1000 beats is among the most reasonable points of comparison (long enough for reasonable stability, but not too long to beat an already stable horse; pun with stables was actually completely unintended, but here it is...). I.e., the speedup observed is still valuable and valid, albeit in (I believe) a somewhat limited sense.
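
      The amortization argument in this point can be written out as simple arithmetic. The sketch below uses the runtimes quoted above (293 s and ca. 50 s for 1000 beats, ca. 20 ms amortized for a single beat, 32 cores); the function names are hypothetical, and the 10 ms emulation time is an assumed placeholder, since only "milliseconds" is reported.

```python
def amortized_seconds_per_cell(seconds_per_cell, n_cores):
    """Amortized wall-clock time per cell when n_cores cells run at once."""
    return seconds_per_cell / n_cores


def total_seconds(n_cells, seconds_per_cell, n_cores):
    """Idealized total runtime (X / cores) * t, ignoring parallel overhead."""
    return (n_cells / n_cores) * seconds_per_cell


def speedup(simulation_seconds, emulation_seconds):
    """Ratio of simulation time to emulation time for one cell."""
    return simulation_seconds / emulation_seconds


# Assumed placeholder: the text only states that emulation takes milliseconds.
EMULATION_SECONDS = 0.01

scenarios = {
    "1000 beats, fixed-step solver, one core": 293.0,
    "1000 beats, variable-step solver, one core": 50.0,
    "1000 beats, variable-step solver, 32 cores": amortized_seconds_per_cell(50.0, 32),
    "1 beat, variable-step solver, 32 cores": 0.020,
}

for label, seconds in scenarios.items():
    print(f"{label}: speedup ~ {speedup(seconds, EMULATION_SECONDS):.0f}x")
```

      With these inputs the apparent speedup spans several orders of magnitude depending only on the solver, the core count and the number of simulated beats, which is the context-dependence referred to above.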

      3. It appears that the accuracy of emulation drops off relatively sharply with increasing real-world applicability/relevance of the tasks it is applied to. That said, the authors are to be commended on declaring this transparently, rather than withholding such analyses. I particularly enjoyed the discussion of the not-always-amazing results of the inverse problem on the experimental data. The point on low parameter identifiability is an important one and serves as a warning against overconfidence in our ability to infer cellular parameters from action potentials alone. On the other hand, I'm not that sure the difference between small tissue preps and single cells which authors propose as another source of the discrepancy will be that vast beyond the AP peak potential (probably much of the tissue prep is affected by the pacing electrode?), but that is a subjective view only. The influence of coupling could be checked if the simulated data were generated from 2D tissue samples/fibres, e.g. using the Myokit software.

      Given the points above (particularly the uncertain need for further speedup compared to running single-cell simulations), I am not sure that the technology generated will be that broadly adopted in the near future. However, this does not make the study uninteresting in the slightest - on the contrary, it explores something that many of us are thinking about, and it is likely to stimulate further development in the direction of computationally efficient emulation of relatively complex simulations.

    3. Reviewer #2 (Public Review):

      Summary:<br /> This study provided a neural network emulator of the human ventricular cardiomyocyte action potential. The inputs are the corresponding maximum conductances and the output is the action potential (AP). It used the forward and inverse problems to evaluate the model. The forward problem was solved for synthetic data, while the inverse problem was solved for both synthetic and experimental data. The NN emulator tool enables the acceleration of simulations, maintains high accuracy in modeling APs, effectively handles experimental data, and enhances the overall efficiency of pharmacological studies. This, in turn, has the potential to advance drug development and safety assessment in the field of cardiac electrophysiology.

      Strengths:<br /> (1) Low computational cost: The NN emulator demonstrated a massive speed-up of more than 10,000 times compared to the simulator. This substantial increase in computational speed has the potential to expedite research and drug development processes.

      (2) High accuracy in the forward problem: The NN emulator exhibited high accuracy in solving the forward problem when tested with synthetic data. It accurately predicted normal APs and, to a large extent, abnormal APs with early afterdepolarizations (EADs). High accuracy is a notable advantage over existing emulation methods, as it ensures reliable modeling and prediction of AP behavior.

      Weaknesses:<br /> (1) Input space constraints: The emulator relies on maximum conductances as inputs, which explain a significant portion of the AP variability between cardiomyocytes. Expanding the input space to include channel kinetics parameters might be challenging when solving the inverse problem with only AP data available.

      (2) Simplified drug-target interaction: In reality, drug interactions can be time-, voltage-, and channel state-dependent, requiring more complex models with multiple parameters compared to the oversimplified model that represents the drug-target interactions by scaling the maximum conductance at control. The complex model could also pose challenges when solving the inverse problem using only AP data.

      (3) Limited data variety: The inverse problem was solved using AP data obtained from a single stimulation protocol, potentially limiting the accuracy of parameter estimates. Including AP data from various stimulation protocols and incorporating pacing cycle length as an additional input could improve parameter identifiability and the accuracy of predictions.

      (4) Larger inaccuracies in the inverse problem using experimental data: The reasons for this result are not quite clear. It may be attributable to low parameter identifiability or to the fact that the training data set was collected in small tissue preparations.
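
      The simplified drug model named in weakness (2) amounts to multiplying each control maximum conductance by a scaling factor. A minimal sketch of that idea follows; the channel names, the control values and the scaling factor are hypothetical illustrations, not values from the study.

```python
def apply_drug_scaling(control_conductances, scaling_factors):
    """Scale each control maximum conductance by its drug factor.

    Channels without an entry in scaling_factors are left unchanged
    (factor 1.0). Time-, voltage- and state-dependent drug effects are
    not represented, which is the simplification discussed above.
    """
    return {
        channel: g_max * scaling_factors.get(channel, 1.0)
        for channel, g_max in control_conductances.items()
    }


# Hypothetical control conductances, normalized to 1.0 (arbitrary units).
control = {"GKr": 1.0, "GCaL": 1.0, "GNa": 1.0}

# Hypothetical drug that halves GKr and leaves the other channels untouched.
drug_effect = {"GKr": 0.5}

with_drug = apply_drug_scaling(control, drug_effect)
```

      In this representation the inverse problem reduces to estimating one scaling factor per channel from the AP, which is why richer drug-binding models with several parameters per channel would be harder to identify from AP data alone.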

    4. Reviewer #3 (Public Review):

      Summary:<br /> Grandits and colleagues were trying to develop a new tool to accelerate pharmacological studies by using neural networks to emulate the human ventricular cardiomyocyte action potential (AP). The AP is a complex electrical signal that governs the heartbeat, and it is important to accurately model the effects of drugs on the AP to assess their safety and efficacy. Traditional biophysical simulations of the AP are computationally expensive and time-consuming. The authors hypothesized that neural network emulators could be trained to predict the AP with high accuracy and that these emulators could also be used to quickly and accurately predict the effects of drugs on the AP.

      Strengths:<br /> One of the study's major strengths is that the authors use a large and high-quality dataset to train their neural network emulator. The dataset includes a wide range of APs, including normal and abnormal APs exhibiting EADs. This ensures that the emulator is robust and can be used to predict the AP for a variety of different conditions.

      Another major strength of the study is that the authors demonstrate that their neural network emulator can be used to accelerate pharmacological studies. For example, they use the emulator to predict the effects of a set of known arrhythmogenic drugs on the AP. The emulator is able to predict the effects of these drugs, even though it had not been trained on these drugs specifically.

      Weaknesses:<br /> One weakness of the study is the limited validation against experimental data, which is needed to ensure that neural network emulators are accurate and reliable. The authors do this to some extent, but further validation would be beneficial, in particular for the inverse problem, where the estimation of pharmacological parameters was very challenging and led to particularly large inaccuracies.

      Additional context:<br /> The work by Grandits et al. has the potential to revolutionize the way that pharmacological studies are conducted. Neural network emulation has the promise to reduce the time and cost of drug development and to improve the safety and efficacy of new drugs. The methods and data presented in the paper are useful to the community because they provide a starting point for other researchers to develop and improve neural network emulators for the human ventricular cardiomyocyte AP. The authors have made their code and data publicly available, which will facilitate further research in this area.

      It is important to note that neural network emulation is still a relatively new approach, and there are some challenges that need to be addressed before it can be widely adopted in the pharmaceutical industry. For example, neural network emulators need to be trained on large and high-quality datasets. Additionally, it is important to validate neural network emulators against experimental data to ensure that they are accurate and reliable. Despite these challenges, the potential benefits of neural network emulation for pharmacological studies are significant. As neural network emulation technology continues to develop, it is likely to become a valuable tool for drug discovery and development.

    1. eLife assessment

      This study reports a new approach to determine the architecture of peptidoglycan (PG), the primary component of the bacterial cell wall, validating the pipeline through an architectural analysis of several members of the human gut microbiota. The technique is potentially valuable for this sub-field as it would enable researchers interested in peptidoglycan in a range of organisms to easily assess muropeptide composition in an easy, automated manner. However, there is some uncertainty about whether the pipeline was fully automated and it was noted that the pipeline requires prior knowledge of the peptidoglycan composition of an organism. Additionally, the use of the technique to investigate whether PG cross-bridge length is a determinant of cell wall stiffness produced evidence that would need more direct support and is therefore so far incomplete.

    2. Reviewer #1 (Public Review):

      The paper from Hsu and co-workers describes a new automated method for analyzing the cell wall peptidoglycan composition of bacteria using liquid chromatography and mass spectrometry (LC/MS) combined with newly developed analysis software. The work has great potential for determining the composition of bacterial cell walls from diverse bacteria in high throughput, allowing new connections between cell wall structure and other important biological functions like cell morphology or host-microbe interactions to be discovered. A downside to the method is that it does require some prior knowledge of an organism's peptidoglycan composition to generate the database for automated analysis. Nevertheless, the automation will allow rapid analysis of peptidoglycan composition under a variety of conditions and/or between closely related organisms once the general peptidoglycan structure is known. The methodology described will therefore be useful for the field.

      The potential connection between the structure of different cell walls from bifidobacteria and cell stiffness proposed in the report is weak. The cells analyzed are from different strains such that there are many possible reasons for the change in physical measurements made by AFM. Conclusions relating cell wall composition to stiffness would be best drawn from a single strain of bacteria genetically modified to have an altered content of 3-3 crosslinks.

    3. Reviewer #2 (Public Review):

      The authors introduce "HAMA", a new automated pipeline for architectural analysis of the bacterial cell wall. Using MS/MS fragmentation and a computational pipeline, they validate the approach using well-characterized model organisms and then apply the platform to elucidate the PG architecture of several members of the human gut microbiota. They discover differences in the length of peptide crossbridges between two species of the genus Bifidobacterium and then show that these species also differ in cell envelope stiffness, resulting in the conclusion that PG "compactness" determines stiffness.

      The pipeline is solid and revealing the poorly characterized PG architecture of the human gut microbiota is worthwhile and significant. However, it is unclear if or how their pipeline is superior to other existing techniques - PG architecture analysis is routinely done by many other labs; the only difference here seems to be that the authors chose gut microbes to interrogate.

      I do not agree with their conclusions about the correlation between compactness and cell envelope stiffness. These experiments are done on two different species of bacteria and their experimental setup therefore does not allow them to isolate crossbridge length (which they propose indicates more or less compact PG) as the only differential property that can influence stiffness. These two species likely also differ in other ways that could modulate stiffness, e.g. turgor pressure, overall PG architecture (not just crossbridge length), membrane properties, teichoic acid composition etc.

    1. eLife assessment

      This important study describes the coordinated regulation of cellular size and protein translation in response to chronic stress as an adaptive mechanism, termed the 'rewiring stress response' regulated by the heat shock response. The evidence supporting this conclusion is solid, utilizing diverse methods to monitor and manipulate cell size and evaluate stress resistance. The study could be strengthened by the inclusion of more experiments focused on defining the mechanistic basis of this coordination and broadening the scope of the specific role of the 'rewiring stress response' across different chronic cellular stresses. This work will be of broad interest to researchers interested in diverse fields including cellular proteostasis, stress-responsive signaling, and aging and senescence.

    2. Reviewer #1 (Public Review):

      The manuscript describes that cultured mammalian cells adapt to chronic stress by increasing their size and protein translation through Hsp90. The authors extensively use Hsp90 knockout cells and mass spectrometry to provide solid evidence that chronic heat shock response is accompanied by cell size changes and stress resistance in large cells. The major strength of the work is the authors' ability to document the heat shock response in detail. The increased stress resistance of large cells is conceptually important and provides one potential explanation why cells need to control their size. This work adds to our understanding of how cellular stress is managed, and while stress responses have been observed previously in relation to cell size, this work provides evidence for increased stress resistance in larger cells.

    3. Reviewer #2 (Public Review):

      The authors have done a number of additional experiments and textual changes to address referee comments from the first round of review that have improved some aspects of the manuscript. However, they did not fully address two major issues brought up in my previous public review, reiterated below.

      1) What is the specific role for HSP90a/b in regulating protein translation during chronic stress through the ISR or related pathways? The authors indicate that the induction of the eIF2a phosphatase GADD34 is not impacted in HSP90-deficient cells, so what role does HSP90 have in this process? Is HSP90 required for proper folding of GADD34? Would you see similar effects in protein translation recovery if other ISR activators are used in HSP90-deficient cells?

      2) Are similar effects observed in non-dividing cells? Does chronic stress lead to increases in size and regulation of protein translation in primary cell models that are not undergoing division?

      This leaves the study as an interesting observational study that correlates increases in cell size and protein translation. However, it doesn't really answer some of the most important questions related to mechanisms defining this correlation. Regardless, this remains an interesting jumping off point to continue exploring this interesting finding correlating cell size and stress signaling that will be further pursued in subsequent manuscripts, which will likely continue to reveal the importance and mechanistic basis of this 'rewiring stress response' during stress and in disease.

    1. eLife assessment

      This manuscript provides important insights into the degradation of a host tRNA modification enzyme TRMT1 by SARS-CoV-2 protease nsp5. The data convincingly support the main conclusions of the paper. These results will be of interest to virologists interested in studying the alterations in tRNA modifications, host methyltransferases, and viral infections.

    2. Reviewer #1 (Public Review):

      Zhang et al. investigate the hypothesis that tRNA methyl transferase 1 (TRMT1) is cleaved by Nsp5 (nonstructural protein 5 or MPro), the SARS-CoV-2 main protease, during SARS-CoV-2 infection. They provide solid evidence that TRMT1 is a substrate of Nsp5, revealing an Nsp5 target consensus sequence and evidence of TRMT1 cleavage in cells. Their conclusions are exceptionally strong given the co-submission by D'Oliveira et al. showing cleavage of TRMT1 in vitro by Nsp5. Separately, the authors convincingly demonstrate widespread downregulation of RNA modifications during CoV-2 infection, including a requirement for TRMT1 in efficient viral replication. This finding is congruent with the authors' previous work defining the impact of TRMT1 and m2,2G on global translation, which is most likely necessary to support infection and virion production. What still remains unclear is the functional relevance of TRMT1 cleavage by Nsp5 during infection. Based on the data provided here, TRMT1 cleavage may be an act by CoV-2 to self-limit replication, as the expression of a non-cleavable TRMT1 (versus wild-type TRMT1) supports enhanced viral RNA expression at certain MOIs. Theoretically, TRMT1 cleavage should inactivate the modification activity of TRMT1, which the authors thoroughly and elegantly investigate with rigorous biochemical assays. However, only a minority of TRMT1 undergoes cleavage during infection in this study and thus whether TRMT1 cleavage serves an important functional role during CoV-2 replication will be an important topic for future work. The authors fairly assess their work in this regard. This study pushes forward the idea that control of tRNA expression and functionality is an important and understudied area of host-pathogen interaction.

      Weaknesses noted:<br /> The detection of the N-terminal TRMT1 fragment by western blot is not robust. The polyclonal antibody used to detect TRMT1 in this work cross-reacts with a non-specific protein product. Unfortunately, this obstructs the visualization of the predicted N-terminal TRMT1 fragment. It is unclear how the authors were able to perform densitometry, given the interference of the non-specific band. Additionally, the replicates in the source data make it clear that the appearance of the N-terminal fragment "wisp" under the non-specific band is not seen in every replicate. Though the disappearance of this wisp with mutant Nsp5 and uncleavable TRMT1 is reassuring, the detection of the N-terminal fragment with the TRMT1 antibody should be assessed critically. Considering this group has strong research interests in TRMT1, I assume that attempts to make other antibodies have proved unfruitful. Additionally, N-terminal tagging of TRMT1 is predicted to disrupt the mitochondrial targeting signal, eliminating the potential for using alternative antibodies to see the N-terminal fragment. These technical issues reiterate the fact that the functional significance of TRMT1 cleavage during CoV-2 infection remains unclear. However, this study demonstrates an important finding that the tRNA modification landscape is altered during CoV-2 infection and that TRMT1 is an important host factor supporting CoV-2 replication.

    3. Reviewer #2 (Public Review):

      Summary:<br /> The manuscript titled 'Proteolytic cleavage and inactivation of the TRMT1 tRNA modification enzyme by SARS-CoV-2 main protease' from K. Zhang et al. demonstrates that several RNA modifications are downregulated during SARS-CoV-2 infection including the widespread m2,2G methylation, which potentially contributes to changes in host translation. To understand the molecular basis behind this global hypomodification of RNA during infection, the authors focused on the human methyltransferase TRMT1 that catalyzes the m2,2G modification. They reveal that TRMT1 not only interacts with the main SARS-CoV-2 protease (Nsp5) in human cells but is also cleaved by Nsp5. To establish if TRMT1 cleavage by Nsp5 contributes to the reduction in m2,2G levels, the authors show compelling evidence that the TRMT1 fragments are incapable of methylating the RNA substrates due to loss of RNA binding by the catalytic domain. They further determine that expression of full-length TRMT1 is required for optimal SARS-CoV-2 replication in 293T cells. Nevertheless, the cleavage of TRMT1 was dispensable for SARS-CoV-2 replication hinting at the possibility that TRMT1 could be an off-target or fortuitous substrate of Nsp5. Overall, this study will be of interest to virologists and biologists studying the role of RNA modification and RNA modifying enzymes in viral infection.

      Strengths:<br /> • The authors use a state-of-the-art mass spectrometry approach to quantify RNA modifications in human cells infected with SARS-CoV-2.<br /> • The authors go to great lengths to demonstrate that the SARS-CoV-2 main protease, Nsp5, interacts with and cleaves TRMT1 in cells, and they perform important controls when needed. They use a series of overexpression experiments with strategically placed tags on both TRMT1 and Nsp5 to strengthen their observations.<br /> • The use of an inactive Nsp5 mutant (C145A) strongly supports the claim of the authors that Nsp5 is solely responsible for TRMT1 cleavage in cells.<br /> • Although the direct cleavage was not experimentally determined, the authors convincingly show that TRMT1 Q530N is not cleaved by Nsp5 suggesting that the predicted cleavage site at this position is most likely the bona fide region processed by Nsp5 in cells.<br /> • To understand the impact of TRMT1 cleavage on its RNA methylation activity, the authors rigorously test four protein constructs for their capacity not only to bind RNA but also to introduce the m2,2G modification. They demonstrate that the fragments resulting from TRMT1 cleavage are inactive and cannot methylate RNA. They further establish that the C-terminal region of TRMT1 (containing a zinc-finger domain) is the main binding site for RNA.<br /> • While 293T cells are unlikely to be an ideal model system to study SARS-CoV-2 infection, the authors use two cell lines and well-designed rescue experiments to uncover that TRMT1 is required for optimal SARS-CoV-2 replication.

      Weaknesses:<br /> • Immunoblotting is extensively used to probe for TRMT1 degradation by Nsp5 in this study. Regretfully, the polyclonal antibody used by the authors shows strong non-specific binding to other epitopes. This complicates the data interpretation and quantification since the cleaved TRMT1 band migrates very closely to a main non-specific band detected by the antibody (for instance Fig 3A). While this reviewer is concerned about the cross-contamination during quantification of the N-TRMT1, the loss of this faint cleaved band with the TRMT1 Q530N mutant is reassuring. Nevertheless, the poor behavior of this antibody for TRMT1 detection was already reported and the authors should have taken better precautions or designed a different strategy to circumvent the limitation of this antibody by relying on additional tags.

      • While 293T cells are convenient to use, they are not a well-suited model system to study SARS-CoV-2 infection and replication. Therefore, some of the conclusions from this study might not apply to better-suited cell systems such as Vero E6 cells or might not be observed in patient-infected cells.

      • The reduction of bulk TRMT1 levels is minor during infection of MRC5 cells with SARS-CoV-2 (Fig 1). This does not seem to agree with the more dramatic reduction in m2,2G modification levels. Cellular localization experiments of TRMT1 would help clarify this. While TRMT1 is found in the cytoplasm and nucleus, it is possible that TRMT1 is more dramatically degraded in the cytoplasm due to easier access by Nsp5.

      • In Fig 6, the authors show that TRMT1 is required for optimal SARS-CoV-2 replication. This can be rescued by expressing TRMT1 (Fig 7). Nevertheless, it is unknown if the methylation activity of TRMT1 is required. The authors could have expressed an inactive TRMT1 mutant (by disrupting the SAM binding site) to establish if the RNA modification by TRMT1 is important for SARS-CoV-2 replication or if it is the protein backbone that might contribute to other processes.

      • Fig 7, the authors used the Q530N variant to rescue SARS-CoV-2 replication in TRMT1 KO cells. This is an important experiment and unexpectedly reveals that TRMT1 cleavage by Nsp5 is not required for viral replication. To strengthen the claim of the authors that TRMT1 is required to promote viral replication and that its cleavage inhibits RNA methylation, the authors could express the TRMT1 N-terminal construct in the TRMT1 KO cells to assess if viral replication is restored or not to similar levels as WT TRMT1. This will further validate the potential biological importance of TRMT1 cleavage by Nsp5.

      • Fig 7 shows that the TRMT1 Q530N variant rescues SARS-CoV-2 replication to greater levels than WT TRMT1. The authors should discuss this in greater detail and its possible implications with their proposed statement. For instance, are m2,2G levels higher in Q530N compared to WT? Does Q530N co-elute with Nsp5 or is the interaction disrupted in cells?

    4. Reviewer #3 (Public Review):

      Summary:<br /> In this manuscript, the authors have used biochemical approaches to provide compelling evidence for the cleavage of TRMT1 by SARS-CoV-2 Nsp5 protease. This work is of wide interest to biochemists, cell biologists, and structural biologists in the coronavirus (CoV) field. Furthermore, it substantially advances the understanding of how CoVs interact with host factors during infection and modify cellular metabolism.

      Strengths:<br /> The authors provide multiple lines of biochemical evidence to report a TRMT1-Nsp5 interaction during SARS-CoV-2 infection. They show that the host enzyme TRMT1 is cleaved at a specific site and that it generates fragments that are incapable of functioning properly. This is an important result because TRMT1 is a critical player in host protein synthesis. This also advances our understanding of virus-host interactions during SARS-CoV-2 infections.

      Weaknesses:<br /> The major weakness is the lack of mechanistic insights into TRMT1-Nsp5 interactions. The authors have provided commendable biochemical data on proving the TRMT1-Nsp5 interaction but without clear mechanistic insights into when this interaction takes place in the context of SARS-CoV-2 propagation, what are the functional consequences of this interaction on host biology, and does this somehow benefit the infecting virus? I feel that the authors played it a bit safe despite having access to several reagents and an extremely promising research direction.

    1. eLife assessment

      This study reports biochemical and structural analysis of two PLP decarboxylase enzymes from plants. The findings are useful due to the utility of these enzymes in industrial theanine production. While certain aspects of the study are solid, other components elucidating the role of a Zn(II)-binding motif are incomplete. In addition, some of the findings could be presented more clearly, including the connections between the structural findings and the reaction mechanism. The work will be of interest to enzymologists studying PLP enzymes and those interested in enzyme engineering in plants.

    2. Reviewer #1 (Public Review):

      In this study, the structural characteristics of plant AlaDC and SerDC were analyzed to understand the mechanism of functional differentiation, deepen the understanding of substrate specificity and catalytic activity evolution, and explore effective ways to improve the initial efficiency of theanine synthesis.

      On the basis of previous solid work, the authors successfully obtained the X-ray crystal structures of CsAlaDC and AtSerDC, key proteins related to the synthesis of ethylamine, a theanine precursor, and found a unique zinc finger structure in these two crystal structures that is not found in other Group II PLP-dependent amino acid decarboxylases. Through a series of experiments, they show that this characteristic zinc finger motif may be key to the folding of the CsAlaDC and AtSerDC proteins; this discovery is novel and forward-looking in the study of theanine synthesis.

      In addition, the authors identified Phe106 of CsAlaDC and Tyr111 of AtSerDC as key sites of substrate specificity by comparing substrate binding regions and identified amino acids that inhibit catalytic activity through mutation screening based on protein structure. It was found that the catalytic activity of the CsAlaDC L110F/P114A mutant was 2.3 times higher than that of CsAlaDC. At the same time, the key substrate recognition motifs of CsAlaDC and AtSerDC were used to carry out evolutionary analysis of the protein sequences that are highly homologous to CsAlaDC in embryos, and 13 potential alanine decarboxylases were found, which laid a solid foundation for subsequent studies related to theanine synthesis.

      In general, this study has a solid foundation: the research idea is clear, the experimental design is reasonable, and the experimental results provide strong evidence for the authors' point of view. Through a large number of experiments, the key steps in the theanine synthesis pathway are studied in depth, an effective way to improve the initial efficiency of theanine synthesis is found, and its molecular mechanism is explained. The study is novel and forward-looking, and sheds light on a new direction for the efficient industrial synthesis of theanine.

    3. Reviewer #2 (Public Review):

      Summary:<br /> The manuscript focuses on the comparison of two PLP-dependent enzyme classes that perform amino acyl decarboxylations. The goal of the work is to understand the substrate specificity and factors that influence the catalytic rate in an enzyme linked to theanine production in tea plants.

      Strengths:<br /> The work includes x-ray crystal structures of modest resolution of the enzymes of interest. These structures provide the basis for the design of mutagenesis experiments to test hypotheses about substrate specificity and the factors that control catalytic rate. These ideas are tested via mutagenesis and activity assays, in some cases both in vitro and in plants.

      Weaknesses:<br /> The manuscript could be more clear in explaining the contents of the x-ray structures and how the complexes studied relate to the reactant and product complexes. The structure and mechanism section would also be strengthened by including a diagram of the reaction mechanism and including context about reactivity. As it stands, much of the structural results section consists of lists of amino acids interacting with certain ligands without any explanation of why these interactions are important or the role they play in catalysis. The experiments testing the function of a novel Zn(II)-binding domain also have serious flaws. I don't think anything can be said at this point about the function of the Zn(II) due to a lack of key controls and problems with experimental design.

    4. Reviewer #3 (Public Review):

      In the manuscript titled "Structure and Evolution of Alanine/Serine Decarboxylases and the Engineering of Theanine Production," Wang et al. solved and compared the crystal structures of Alanine Decarboxylase (AlaDC) from Camellia sinensis and Serine Decarboxylase (SerDC) from Arabidopsis thaliana. Based on this structural information, the authors conducted both in vitro and in vivo functional studies to compare enzyme activities using site-directed mutagenesis and subsequent evolutionary analyses. This research has the potential to enhance our understanding of amino acid decarboxylase evolution and the biosynthetic pathway of the plant-specialized metabolite theanine, as well as to further its potential applications in the tea industry.

    1. eLife assessment

      This comprehensive study provides valuable information on the cooperation of Ikaros with Foxp3 to establish and regulate a major portion of the epigenome and transcriptome of T-regulatory cells. However, the characterization is incomplete in that incontrovertible evidence that these are intrinsic features regulating biological function and not outcomes of the inflammatory micro-environment of the genetically manipulated mice is missing.

    2. Joint Public Review:

      This study investigates the role of Ikaros, a zinc finger family transcription factor related to Helios and Eos, in T-regulatory (Treg) cell functionality in mice. Through genome-wide association studies and chromatin accessibility studies, the authors find that Ikaros shares similar binding sites to Foxp3. Ikaros cooperates with Foxp3 to establish a major portion of the Treg epigenome and transcriptome. Ikaros-deficient Treg cells exhibit Th1-like gene expression with abnormal expression of IL-2, IFNg, TNFa, and factors involved in Wnt and Notch signalling. Further, two models of inflammatory/autoimmune diseases - Inflammatory Bowel Disease (IBD) and organ transplantation - are employed to examine the functional role of Ikaros in Treg-mediated immune suppression. The authors provide a detailed analysis of the epigenome and transcriptome of Ikaros-deficient Treg cells.

      These studies establish Ikaros as a factor required in Treg cells for tolerance and the control of inflammatory immune responses. The data are of high quality. Overall, the study is well organized, and reports new data consolidating mechanistic aspects of the Foxp3-mediated gene expression program in Treg cells.

      Strengths:<br /> The authors have performed biochemical studies focusing on mechanistic aspects of molecular functions of the Foxp3-mediated gene expression program and complemented these with functional experiments using two models of autoimmune diseases, thereby strengthening the study. The studies are comprehensive at both the cellular and molecular levels. The manuscript is well organized and presents a plethora of data regarding the transcriptomic landscape of these cells.

      Weakness:<br /> The authors claim that the mice have no pathologic signs of autoimmune disease even at a relatively old age, yet mice have an increased number of activated CD4+ T cells and T-follicular helper cells (even at the age of 6 weeks) as well as reduced naïve T-cells. Thus, immune homeostasis is perturbed in these mice even at a young age and the effect of inflammatory microenvironments on cellular functions cannot be ruled out. Further, clear conclusions from the genome-wide studies are lacking.

    1. eLife assessment

      This valuable study advances our understanding of the relationship between different mammalian ligands and receptors of the Notch signaling pathway. The authors systematically evaluate the effects of different combinations of ligands and receptors on levels of pathway activation. The convincing though not always complete data uncover interesting and unexpected differences, which provide a foundation for interpreting Notch signaling events in normal and disease contexts where this pathway operates.

    2. Reviewer #1 (Public Review):

      Summary:<br /> The Notch signaling pathway plays an important role in many developmental and disease processes. Although well-studied there remain many puzzling aspects. One is the fact that as well as activating the receptor through trans-activation, the transmembrane ligands can interact with receptors present in the same cell. These cis-interactions are usually inhibitory, but in some cases, as in the assays used here, they may also be activating. With a total of 6 ligands and 4 receptors, there is potentially a wide array of possible outcomes when different combinations are co-expressed in vivo. Here the authors set out to make a systematic analysis of the qualitative and quantitative differences in the signaling output from different receptor-ligand combinations, generating sets of "signaling" (ligand expressing) and "receiving" (receptor +/- ligand expressing cells).

      The readout of pathway activity is transcriptional, relying on the fusion of GAL4 in the intracellular part of the receptor. Positive ligand interactions result in the proteolytic release of Gal4 that turns on the expression of H2B-citrine. As an indicator of ligand and receptor expression levels, they are linked via TA to H2B mCherry and H2B mTurq expression respectively. The authors also manipulate the expression of the glycosyltransferase Lunatic-Fringe (LFng) that modifies the EGF repeats in the extracellular domains impacting their interactions. The testing of multiple ligand-receptor combinations at varying expression levels is a tour de force, with over 50 stable cell lines generated, and yields valuable insights although as a whole, the results are quite complex.

      Strengths:<br /> Taking a reductionist approach to systematically testing differences in the signaling strength, binding strength, and cis-interactions of the different ligands in the context of the Notch1 and Notch2 receptors (they justify well the choice of players to test via this approach) produces a baseline understanding of the different properties and leads to some unexpected and interesting findings. Notably:

      - Jag1 ligand expressing cells failed to activate Notch1 receptor although were capable of activating Notch2. Conversely, Jag2 cells elicited the strongest activation of both receptors. The results with Jag1 are surprising also because it exhibits some of the strongest binding to plate-bound ligands. The failure to activate Notch1 has major functional significance and it will be important in the future to understand the mechanistic basis.

      - Jagged ligands have the strongest cis-inhibitory effects and the receptors differ in their sensitivity to cis-inhibition by Dll ligands. These observations are in keeping with earlier in vivo and cell culture studies. More referencing of those would better place the work in context but it nicely supports and extends previous studies that were conducted in different ways.

      - Responses to most trans-activating ligands showed a degree of ultrasensitivity but this was not the case for cis-interactions where effects were more linear. This has implications for the way the two mechanisms operate and for how the signaling levels will be impacted by ligand expression levels.

      - Qualitatively similar results are obtained in a second cell line, suggesting they reflect fundamental properties of the ligands/receptors.

      Weaknesses:<br /> One weakness is that the methods used to quantify the expression of ligands and receptors rely on the co-translation of tagged nuclear H2B proteins. These may not accurately capture surface levels/correctly modified transmembrane proteins. In general, the multiple conditions tested partly compensate for the concerns - for example, as Jag1 cells do activate Notch2 even if they do not activate Notch1 some Jag1 must be getting to the surface. But even with Notch2, Jag1 activities are on the lower side, making it important to clarify, especially given the different outcomes with the plated ligands. Similarly, is the fact that all ligands "signalled strongest to Notch2" an inherent property or due to differences in surface levels of Notch2 compared to Notch1? The results would be considerably strengthened by calibration of the ligand/receptor levels (and ideally their sub-cellular localizations). Assessing the membrane protein levels would be relatively straightforward to perform on some of the basic conditions because their ligand constructs contain Flag tags, making it plausible to relate surface protein to H2B, and there are antibodies available for Notch1 and Notch2.

      Cis-activation as a mode of signaling has only emerged from these synthetic cell culture assays, raising questions about its physiological relevance. Cis-activation is only seen at the higher ligand (Dll1, Dll4) levels; how physiological are the expression levels of the ligands/receptors in these assays? Is it likely that this would make a major contribution in vivo? Is it possible that the cells convert themselves into "signaling" and "receiving" sub-populations within the culture by a post-translational mechanism? Again, some analysis of the ligand/receptors in the cultures would be a valuable addition to show whether or not there are major heterogeneities.

      It is hard to appreciate how much cell-to-cell variability in the "output" there is. For example, low "outputs" could arise from fewer cells becoming activated or from all cells being activated less. As presented, only the latter is considered. That may be already evident in their data, but not easy for the reader to distinguish from the way they are presented. For example, in many of the graphs, data have been processed through multiple steps of normalization. Some discussion/consideration of this point is needed.

      Impact:<br /> Overall, cataloguing the outcomes from the different ligand-receptor combinations, both in cis and trans, yields a valuable baseline for those investigating their functional roles in different contexts. There is still a long way to go before it will be possible to make a predictive model for outcomes based on expression levels, but this work gives an idea about the landscape and the complexities. This is especially important now that signaling relationships are frequently hypothesised based on single-cell transcriptomic data. The results presented here demonstrate that the relationships are not straightforward when multiple players are involved.

    3. Reviewer #2 (Public Review):

      Summary:<br /> In this manuscript, the authors extend their previous studies on trans-activation, cis-inhibition (PMID: 25255098), and cis-activation (PMID: 30628888) of the Notch pathway. Here they create a large number of cell lines using CHO-K1 and C2C12 cells expressing either Notch1-Gal4 or Notch2-Gal4 receptors which express a fluorescent protein upon receptor activation (receiver cells). For cis-inhibition and cis-activation assays, these cells were engineered to express one of the four canonical Notch ligands (Dll1, Dll4, Jag1, Jag2) under tetracycline control. Some of the receiver cells were also transfected with a Lunatic fringe (Lfng) plasmid to produce cells with a range of Lfng expression levels. Sender cells expressing all of the canonical ligands were also produced. Cells were mixed in a variety of co-culture assays to highlight trans-activation, cis-activation, and cis-inhibition. All four ligands were able to trans-activate Notch1 and Notch2, except that Jag1 did not trans-activate Notch1. Lfng enhanced trans-activation of both Notch receptors by Dll1 and Dll4, and inhibited Notch1 activation by Jag2 and Notch2 activation by both Jag1 and Jag2. Cis-expression of all four ligands was predominantly inhibitory, but Dll1 and Dll4 showed strong cis-activation of Notch2. Interestingly, cis-ligands preferentially inhibited trans-activation by the same ligand, with varying effects on other trans-ligands.

      Strengths:<br /> This represents the most comprehensive and rigorous analysis of the effects of canonical ligands on cis- and trans-activation, and cis-inhibition, of Notch1 and Notch2 in the presence or absence of Lfng so far. Studying cis-inhibition and cis-activation is difficult in vivo due to the presence of multiple Notch ligands and receptors (and Fringes) that often occur in single cells. The methods described here are a step towards generating cells expressing more complex arrays of ligands, receptors, and Fringes to better mimic in vivo effects on Notch function.

      In addition, the fact that their trans-activation results with most ligands on Notch1 and Notch2 in the presence or absence of Lfng were largely consistent with previous publications provides confidence that the authors' assays are working properly.

      Weaknesses:<br /> It was unusual that the engineered CHO cells expressing Notch1-Gal4 were not activated at all by co-culture with Jag1-expressing CHO cells. Many previous reports have shown that Jag1 can activate Notch1 in co-culture assays, including when Notch1 was expressed in CHO cells. Interestingly, when the authors used Jag1-Fc in a plate coating assay, it did activate Notch1 and could be inhibited by the expression of Lfng.

      The cell surface level of the ligands was determined by flow cytometry of a co-translated fluorescent protein. Some calibration of the actual cell surface levels with the fluorescent protein would strengthen the results.

    4. Reviewer #3 (Public Review):

      Summary:<br /> This manuscript reports a comprehensive analysis of Notch-Delta/Jagged signaling inclusive of the human Notch1 and Notch2 receptors and DLL1, DLL4, JAG1, and JAG2 ligands. Measurements encompassed signaling activity for ligand trans-activation, cis-activation, cis-inhibition, and activity modulation by Lfng. The most striking observations of the study are that JAG1 has no detectable activity as a Notch1 ligand when presented on a cell (though it does have activity when immobilized on a surface), even though it is an effective cis-inhibitor of Notch1 signaling by other ligands, and that DLL1 and DLL4 exhibit cis-activating activity for Notch1 and especially for Notch2. Notwithstanding the artificiality of the system and some of its shortcomings, the results should be a valuable resource for the Notch signaling community.

      Strengths:<br /> 1) The work is systematic and comprehensive, addressing questions that are of importance to the community of researchers investigating mammalian Notch proteins, their activation by ligands, and the modulation of ligand activity by Lfng.<br /> 2) A quantitative and thorough analysis of the data is presented.

      Weaknesses:<br /> 1) The manuscript is primarily descriptive and does not delve into the underlying, mechanistic origin or source of the different ligand activities.

      2) The amount of ligand or receptor expressed is inferred from the flow cytometry signal of a co-translated fluorescent protein-histone fusion, and is not directly measured. The work would be more compelling if the amount of ligand present on the cell surface were directly measured with anti-ligand antibodies, rather than inferred from measurements of the fluorescent protein-histone fusion.

      3) It would be helpful to see plots of the raw activity data before transformation and normalization, because the plots present data after several processing steps, and it is not clear how the processed data relate to the original values determined in each measurement.

      4) The authors use sparse plating of engineered cells with parental cells (expressing no ligand or receptor) to measure cis-activation. However, the cells divide within the culture period of 22-24 h and can potentially trans-activate each other.

    1. eLife assessment

      In this valuable study, the authors propose a new regulatory role for one of the most abundant circRNAs, circHIPK3, mediated by the RNA binding protein IGF2BP2. While the study presents interesting and largely solid evidence, part of the work is incomplete, requiring additional controls to more robustly support the major claims. The work would also benefit from further discussion addressing the apparently contradictory effects of circHIPK3 and STAT3 depletion in cancer progression.

    2. Reviewer #1 (Public Review):

      In this work, the authors propose a new regulatory role for one of the most abundant circRNAs, circHIPK3, by showing that it interacts with an RNA binding protein (IGF2BP2) and, by sequestering it, regulates the expression of hundreds of genes containing a sequence (11-mer motif) in their 3' untranslated regions (3'-UTRs). This sequence is also present in circHIPK3, precisely where IGF2BP2 binds. The study further focuses on one specific case, the STAT3 gene, whose mRNA product is downregulated upon circHIPK3 depletion apparently through sequestering IGF2BP2, which otherwise binds to and stabilizes STAT3 mRNA. The study presents mechanistic insight into the interactions, sequence motifs, and stoichiometries of the molecules involved in this new mode of regulation. Altogether, this new mechanism seems to underlie the effects of circHIPK3 in cancer progression.

      Strengths:<br /> The authors show mechanistic insight into a proposed novel "sponging" function of circHIPK3 which is not mediated by sequestering miRNAs but rather by a specific RNA binding protein (IGF2BP2). They address the stoichiometry of the molecules involved in the interaction, which is a critical aspect that is frequently overlooked in this type of study. They provide both genome-wide analysis and a specific case (STAT3) that is relevant for cancer progression.

      Weaknesses:<br /> One of the central conclusions of the manuscript, namely that circHIPK3 sequesters IGF2BP2 and thereby regulates target mRNAs, lacks more direct experimental evidence such as rescue experiments where both species are simultaneously knocked down. CircRNA overexpression lacks a demonstration of circularization efficiencies. There seem to be contradictory effects of circHIPK3 and STAT3 depletion in cancer progression, namely that while circHIPK3 is frequently downregulated in cancer, circHIPK3 downregulation in this study leads to downregulation of STAT3. This does not seem to fit the fact that STAT3 is normally activated in a wide diversity of cancers and is positively associated with cell proliferation. Nor is the result consistent with the fact that circHIPK3 expression positively correlates with good clinical outcomes. Overall, the authors have achieved some of their aims, but additional controls would be advisable to fully support their conclusions.

    3. Reviewer #2 (Public Review):

      The manuscript by Okholm and colleagues identified an interesting new instance of ceRNA involving a circular RNA. The data are clearly presented and support the conclusions. Quantification of the copy number of circRNA and quantification of the protein were performed, and this is important to support the ceRNA mechanism.

    4. Reviewer #3 (Public Review):

      In Okholm et al., the authors evaluate the functional impact of circHIPK3 in bladder cancer cells. By knocking it down and performing an RNA-seq analysis, the authors found thousands of deregulated genes that appear unaffected by a miRNA-sponging function and that are, instead, enriched for an 11-mer motif. Further investigations showed that the 11-mer motif is shared with circHIPK3 and is able to bind the IGF2BP2 protein. The authors validated the binding of IGF2BP2 and demonstrated that IGF2BP2 KD antagonizes the effect of circHIPK3 KD and leads to the upregulation of genes containing the 11-mer. Among the genes affected by circHIPK3 KD and IGF2BP2 KD (resulting in downregulation and upregulation, respectively) the authors found the STAT3 gene. This was accompanied by consistent concomitant upregulation of one of its targets, TP53. The authors propose a mechanism of competition between circHIPK3 and IGF2BP2 triggered by IGF2BP2 nucleation, potentially via phase separation.

      Strengths:<br /> The number of known circRNAs continues to grow drastically; however, the field lacks detailed molecular investigations. The presented work critically addresses some of the major pitfalls in the field of circRNAs, with careful analysis of aspects that are frequently poorly investigated. The time-point KD followed by RNA-seq, the investigation of the miRNA-sponge function of circHIPK3, the identification of the 11-mer motif, the identification and validation of IGF2BP2, and the analysis of the copy number ratio between circHIPK3 and IGF2BP2 in assessing the potential ceRNA mode of action have been extensively explored and, taken together, are convincing.

      Weaknesses:<br /> In some parts, the manuscript lacks appropriate internal controls (e.g., comparison with normal bladder cells, linear transcript measurements upon the KD, RIP internal controls/WB analysis, etc.), statistical analysis and significance (in some qPCRs), an exhaustive description in the methods of microscopy and image analysis and of western blotting, and a separate section on the cell lines used. The rationale for using bladder cancer cells versus non-bladder cells in some experiments is also unclear.

      Overall, the presented study adds new knowledge in describing circHIPK3 function, its capability to regulate some downstream genes, and its interaction and competition for IGF2BP2. However, whereas the experimental part appears technically logical, the overall goal of this study and its final conclusions remain unclear. The proposed mechanism of condensation, although interesting and encouraging, would need further experimental support and information, especially in the context of cancer.

      In summary, this study is a promising step forward in the comprehension of the functional role of circHIPK3. These data could possibly help to better understand the circHIPK3 role in cancer.

    1. eLife assessment

      This is an important study describing the function of Laminin γ1-dependent basement membranes in the development of the olfactory placode, including morphogenesis of the placode, boundary formation, and olfactory axonal pathfinding. The study uses elegant live imaging approaches, and detailed mutant analyses to provide a convincing description of the role of Laminin in olfactory placode development, although the mechanisms by which Laminin γ1 regulates these processes are not conclusive. In addition to the contributions this study makes to understanding olfactory placode development, it will also be of broader interest to individuals interested in extracellular matrix regulation of tissue morphogenesis, and neural development including neuronal pathfinding.

    2. Reviewer #1 (Public Review):

      Summary:<br /> The authors describe the dynamic distribution of laminin in the olfactory system and forebrain. Using immunohistochemistry and transgenic lines, they found that the olfactory system and adjacent brain tissues are enveloped by BMs from the earliest stages of olfactory system assembly. They also found that laminin deposits follow the trajectory of axons. They performed a functional analysis of the sly mutant to analyse the function of laminin γ1 in the development of the zebrafish olfactory system. Their study revealed that laminin enables the shape and position of placodes to be maintained at late stages in the face of major morphogenetic movements in the brain, and its absence promotes the local entry of sensory axons into the brain and their navigation towards the olfactory bulb.

      Strengths:<br /> -They showed that in the sly mutants, no BM staining of laminin and Nidogen could be detected around the OP and the brain. The authors then elegantly used electron microscopy to analyse the ultrastructure of the border between the OP and the brain in control and sly mutant conditions.<br /> -To analyse the role of laminin γ1-dependent BMs in OP coalescence, the authors used the cluster size of Tg(neurog1:GFP)+ OP cells at 22 hpf as a marker. They found that the mediolateral dimension increased specifically in the mutants. However, proliferation did not seem to be affected, although apoptosis appeared to increase slightly at a later stage. This increase could therefore be due to a dispersal of cells in the OP. To test this hypothesis, the authors then analysed the cell trajectories and extracted 3D mean square displacements (MSD), a measure of the volume explored by a cell in a given period of time. Their conclusion indicates that although brain cell movements are increased in the absence of BM during coalescence phases, overall OP cell movements occur within normal parameters and allow OPs to condense into compact neuronal clusters in sly mutants. The authors also analysed the dimensions of the clusters composed of OMP+ neurons. Their results show an increase in cluster size along the dorso-ventral axis. These results were to be expected since, compared with BM, early neurog1+ neurons should compact along the medio-lateral axis, and those that are OMP+ essentially along the dorso-ventral axis. In addition to the DV elongation of OP tissue, the authors show the existence of isolated and ectopic (misplaced) YFP+ cells in sly mutants.<br /> -To understand the origin of these phenotypes, the authors analysed the dynamic behaviour of brain cells and OPs during forebrain flexion. 
The authors then quantitatively measured brain versus OPs in the sly mutant and found that the OP-brain boundary was poorly defined in the sly mutant compared with the control. Once again, the methods (cell tracks, brain size, and proliferation/apoptosis, and the shape of the brain/OP boundary) are elegant but the results were expected.<br /> -They then analysed the dynamic behaviour of the axon using live imaging. Thus, olfactory axon migration is drastically impaired in sly mutants, demonstrating that Laminin γ1-dependent BMs are essential for the growth and navigation of axons from the OP to the olfactory bulb.<br /> -The authors therefore performed a quantitative analysis of the loss of function of Laminin γ1. They propose that the BM of the OP prevents its deformation in response to mechanical forces generated by morphogenetic movements of the neighbouring brain.
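
      For readers unfamiliar with the mean square displacement (MSD) referred to above, the following is a minimal sketch of a time-averaged 3D MSD for a single cell track. It illustrates the general quantity only and is not necessarily the procedure used in the manuscript under review; the function name, the track format (equally spaced `(x, y, z)` positions), and the example track are assumptions made for illustration.

```python
def mean_square_displacement(track, lag):
    """Time-averaged 3D MSD of one track at a given frame lag.

    track: list of (x, y, z) positions sampled at equal time intervals
           (hypothetical input format, assumed for this sketch).
    lag:   number of frames separating the two positions being compared.
    """
    if lag < 1 or lag >= len(track):
        raise ValueError("lag must be between 1 and len(track) - 1")
    # Squared displacement between every pair of positions 'lag' frames apart.
    squared = [
        sum((b - a) ** 2 for a, b in zip(track[i], track[i + lag]))
        for i in range(len(track) - lag)
    ]
    return sum(squared) / len(squared)


# Hypothetical track: a cell moving 1 unit per frame along x.
straight = [(float(t), 0.0, 0.0) for t in range(5)]
msd_by_lag = {lag: mean_square_displacement(straight, lag) for lag in (1, 2, 3)}
```

      For this directed track the MSD grows with the square of the lag; a randomly moving cell would instead show roughly linear growth, which is why the shape of the MSD curve is commonly used to compare how cells explore space under different conditions.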

      Weaknesses:<br /> - The authors did not analyse neurog1+ axonal migration at the level of the single cell and instead made a global analysis. An analysis at the cell level would strengthen their hypotheses.<br /> - Rescue experiments by locally inducing Laminin expression would have strengthened the paper.<br /> - The paper lacks clarity regarding the two neuronal populations described (early EONs and late OSNs).<br /> - The authors quantitatively measured brain versus OPs in the sly mutant and found that the OP-brain boundary was poorly defined in the sly mutant compared with the control. Once again, the methods (cell tracks, brain size, proliferation/apoptosis, and the shape of the brain/OP boundary) are elegant but the results were expected.<br /> - A missing point in the paper is the effect of Laminin γ1 on the migration of cranial NCCs that interact with OP cells. The authors could have analysed the dynamic distribution of neural crest cells in the sly mutant.

    3. Reviewer #2 (Public Review):

      Summary:<br /> This manuscript addresses the role of the extracellular matrix in olfactory development. Despite the importance of these extracellular structures, the specific roles and activities of matrix molecules are still poorly understood. Here, the authors combine live imaging and genetics to examine the role of laminin gamma 1 in multiple steps of olfactory development. The work comprises a descriptive but carefully executed, quantitative assessment of the olfactory phenotypes resulting from loss of laminin gamma. Overall, this is a constructive advance in our understanding of extracellular matrix contributions to olfactory development, with a well-written Discussion with relevance to many other systems.

      Strengths:<br /> The strengths of the manuscript are in the approaches: the authors have combined live imaging, careful quantitative analyses, and molecular genetics. The work presented takes advantage of many zebrafish tools including mutants and transgenics to directly visualize the laminin extracellular matrix in living embryos during the developmental process.

      Weaknesses:<br /> The weaknesses are primarily in the presentation of some of the imaging data. In certain cases, it was not straightforward to evaluate the authors' interpretations and conclusions based on the single confocal sections included in the manuscript. For example, it was difficult to assess the authors' interpretation of when and how laminin openings arise around the olfactory placode and brain during olfactory axon guidance.

    4. Reviewer #3 (Public Review):

      This is a beautifully presented paper combining live imaging and analysis of mutant phenotypes to elucidate the role of laminin γ1-dependent basement membranes in the development of the zebrafish olfactory placode. The work is clearly illustrated and carefully quantified throughout. There are some very interesting observations based on the analysis of wild-type, laminin γ1, and foxd3 mutant embryos. The authors demonstrate the importance of a Laminin γ1-dependent basement membrane in olfactory placode morphogenesis, and in establishing and maintaining both boundaries and neuronal connections between the brain and the olfactory system. Notable among the observations is the identification of different mechanisms for axons to cross basement membranes, either by taking advantage of incompletely formed membranes at early stages, or by actively perforating the membrane at later ones.

      This is a valuable and important study but remains quite descriptive. In some cases, hypotheses for mechanisms are stated but are not tested further. For example, the authors propose that olfactory axons must actively disrupt a basement membrane to enter the brain and suggest alternative putative mechanisms for this, but these are not tested experimentally. In addition, the authors propose that the basement membrane of the olfactory placode acts to resist mechanical forces generated by the morphogenetic movement of the developing brain, and thus to prevent passive deformation of the placode, but this is not tested anywhere, for example by preventing or altering the brain movements in the laminin γ1 mutant.

    1. eLife assessment

      This important study convincingly shows that the less common D-serine stereoisomer is transported in the kidney by the neutral amino acid transporter ASCT2 and that it is a non-canonical substrate for the sodium-coupled monocarboxylate transporters (SMCTs). With a multi-hierarchical approach, the study further shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption of D-serine carried out, in part, by ASCT2.

    2. Reviewer #1 (Public Review):

      Most amino acids occur as the L-enantiomer, but natural D-serine has also been detected in mammals and its levels have been shown to be connected to a number of different pathologies. Here, the authors convincingly show that D-serine is transported in the kidney by the neutral amino acid transporter ASCT2 and that it is a non-canonical substrate for the sodium-coupled monocarboxylate transporters (SMCTs). Although both transport D-serine, this important study further shows in a mouse model for acute kidney injury that ASCT2 has the dominant role.

      Strengths:<br /> The paper combines proteomics, animal models, ex vivo transport analyses, and in vitro transport assays using purified components. The exhaustive methods employed provide compelling evidence that both transporters can translocate D-serine in the kidney.

      Weakness:<br /> In the model for acute kidney injury, the SMCT proteins did not show a significant change in expression levels and were instead analysed based on other, circumstantial evidence. Although it is clear that SMCTs can transport D-serine, their physiological role is less obvious than that of ASCT2.

    3. Reviewer #2 (Public Review):

      Summary:<br /> The manuscript "A multi-hierarchical approach reveals D-serine as a hidden substrate of sodium-coupled monocarboxylate transporters" by Wiriyasermkul et al. is a resubmission of a manuscript that focused first on the proteomic analysis of apical membranes isolated from mouse kidney with early Ischemia-Reperfusion Injury (IRI), a well-known acute kidney injury (AKI) model. In the second part, the transport of D-serine by Asct2, Smct1, and Smct2 has been characterized in detail in different model systems, such as transfected cells and proteoliposomes.

      Strengths:<br /> A major problem with the first submission was the explanation of the link between the two parts of the manuscript: it was not very clear why the focus on Asct2, Smct1, and Smct2 was a consequence of the proteomic analysis. In the present version of the manuscript, the authors have focused on the expression of membrane transporters in the proteome analysis, thus making the reason for studying the Asct2, Smct1, and Smct2 transporters clearer. In addition, the authors used 2D-HPLC to measure the enantiomers of 20 amino acids in plasma and urine samples from sham and Ischemia-Reperfusion Injury (IRI) mice. The results of this analysis demonstrated the value of D-serine as a potential marker of renal injury. These changes have greatly improved the manuscript and made it more convincing.

    4. Reviewer #3 (Public Review):

      Summary:<br /> The main objective of this work has been to delve into the mechanisms underlying the increment of D-serine in serum, as a marker of renal injury.

      Strengths:<br /> With a multi-hierarchical approach, the work shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption of D-serine that, at least in part, is due to the increased expression of the apical transporter ASCT2. In this way, the authors revealed that SMCT1 also transports D-serine.

      The manuscript also supports that increased expression of ASCT2, even together with the parallel decreased expression of SMCT1, in renal proximal tubules underlies the increased reabsorption of D-serine responsible for the increment of this enantiomer in serum in a murine model of Ischemia-Reperfusion Injury.

      Weaknesses:<br /> It remains to be clarified whether ASCT2 has substantial stereospecificity in favor of D- versus L-serine to sustain a ~10-fold decrease in the ratio D-serine/L-serine in the urine of mice under Ischemia-Reperfusion Injury (IRI).<br /> It is not clear how the increment in the expression of ASCT2, in parallel with the decreased expression of SMCT1, results in increased renal reabsorption of D-serine in IRI.

    1. eLife assessment

      Antibodies are some of the most critical tools in biomedical research. However, their quality and specificity vary significantly. This fundamental study provides guidelines for how the quality of an antibody should be assessed and recorded and provides compelling data on the selected antibodies. This paper will be of interest to researchers working in experimental cell biology.

    2. Author Response

      Reviewer #1:

      We thank Reviewer #1 for their review of our manuscript.

      Reviewer #1, comment #1: “The authors of this manuscript are from the Canadian, public interest open-science company YCharos.”.

      It is important to state that none of the authors work for YCharOS. The YCharOS company has created an open ecosystem consisting of antibody manufacturers, knockout cell line providers, academics, granting agencies and publishers. The Antibody Characterization Group (participating authors are affiliated with the Department of Neurology and Neurosurgery, Structural Genomics Consortium, The Montreal Neurological Institute, McGill University) works in collaboration with YCharOS to have access to commercial antibodies and knockout cell lines donated by YCharOS’ manufacturer partners.

      Reviewer #1, comment #2: In regard to ZENODO antibody characterization reports prepared by this group, Reviewer #1 wrote: “While the results are convincing, they could be more accessible. In the current format, researchers have to download reports for each target and look through all images to identify the most useful antibodies from the images. The reports I reviewed did not draw conclusions on performance. A searchable database that returns validated antibodies for each application seems necessary.”

      After careful consideration and consultation with YCharOS industry partners, we decided not to rate the performance of the antibodies tested. It was determined that antibody selection is best left to the user, who should analyze all parameters, including the type of antibody to be chosen (recombinant-monoclonal, recombinant-polyclonal, monoclonal), the species used to generate the antibody, the species predicted to react with the antibody, performance in a specific application, antigen sequences, and antibody cost.

      Reviewer #1, comment #3: “A key question is to what extent off-target binding was predictable from the WBs provided by the manufacturers. Thus, how often did the authors find multiple bands when the catalogue image showed a single band and vice versa?”

      In many cases, the antibodies were tested on cell lines other than those used by the manufacturers. Given that protein expression is specific to each line, we can't answer this question properly.

      Reviewer #1, comment #4: “Cross-reactive proteins will generally not be detected when blots are stained with an antibody reactive with a different epitope than the one used for IP. Possible solutions to overcome this limitation such as the use of mass spectrometry as readout should be discussed (Nature Methods volume 12, pages 725-731 (2015))”.

      Our protocols only inform whether an antibody can capture the intended target, without any evaluation of the extent to which unwanted, cross-reactive proteins are captured. Thus, our data can only be used to aid in selection of the best performing antibodies for IP – our data do not inform profiling of non-specific interactions.

      IP/mass spec is an excellent approach for evaluating antibody performance for IP, and authors on this manuscript are experts in proteomics and recognize the importance of this methodology. We have considered implementing IP/mass spec in our platform. However, there are limitations, such as the cost of the approach and the difficulty of detecting smaller proteins or proteins with a certain amino acid composition (high presence of Cys, Arg or Lys). Fundamentally, we have decided to prioritize throughput over this level of detail.

      Reviewer #1, comment #5: “Performance in immunofluorescence microscopy was performed on cells that were fixed in 4% paraformaldehyde and then permeabilized with 0.1% Triton-X100. It seems reasonable to assume that this treatment mainly yields folded proteins wherein some epitopes are masked due to cross-linking. The expectation is therefore that results from IP are more predictive for on-target binding in IF than are WB results (Nature Methods volume 12, pages 725-731 (2015)). It is therefore surprising that IP and WB were found to have similar predictive value for performance in IF (supplemental Fig. 3). It would be useful to know if failure in IF was defined as lack of signal, lack of specificity (i.e. off-target binding) or both. Again, it is important to note the IP/western protocol used here does not test for specificity.”

      The assessment of antibody performance is biased by how antibodies were originally tested by suppliers. Manufacturers primarily validate their antibodies by WB, so most antibodies immunodetect their intended target by WB. In retrospect, we therefore tested a biased pool of antibodies that detect linear epitopes. Still, we observed that a large cohort of antibodies show specificity for their target across all three applications or for specific combinations of applications. This slightly challenges the idea that antibodies are fit-for-purpose reagents that recognize either linear or native epitopes: a significant number of antibodies can specifically detect both types of epitope.

      Reviewer #1, comment #6: “The authors report that recombinant antibodies perform better than standard monoclonals/mAbs or polyclonal antibodies. Again, a key question is to what extent this was predictable from the validation data provided by the manufacturers. It seems possible that the recombinant antibodies submitted by the manufacturers had undergone more extensive validation than standard mAbs and polyclonals”.

      Our antibody manufacturing partners indicated that the recombinant antibodies are more recent products and have been more extensively characterized relative to standard polyclonal or monoclonal antibodies.

      The main message is that recombinant antibodies can be used in all applications once validated. Although recombinant antibodies are available for many proteins, the scientific community is not adopting these renewable reagents as we believe it should. We hope that the data provided will encourage scientists to adopt recombinant technologies when available to improve research reproducibility.

      Reviewer #1, comment #7: “Overall, the manuscript describes a landmark effort for systematic validation of research antibodies. The results are of great importance for the very large number of researchers who use antibodies in their research. The main limitations are the high cost and low throughput. While thorough testing of 614 antibodies is impressive and important, the feasibility of testing hundreds of thousands of antibodies on the market should be discussed in more detail.”

      We thank the reviewer for this comment. One of our challenges is to increase the platform's throughput to succeed in our mission to characterize antibodies for all human gene products. We will continue to test antibodies using protocols agreed upon with our partners, commonly used in the laboratory, to ensure that ZENODO reports can serve as a guide to the wider community.

      In terms of development, our marketing efforts have been substantially accelerated by our new partnership with the journal F1000. We have begun to convert our reports into peer-reviewed papers (20 ZENODO reports were converted into F1000 articles). This conversion allows researchers to find our work via PubMed, and easily cite any study. Producing peer-reviewed articles also further enhances the credibility of our research and our project as a whole: https://f1000research.com/ycharos

      Colleagues have published a letter to Nature explaining the problem and our technology platform: (Kahn, et al., Nature, 2023, DOI: https://doi.org/10.1038/d41586-023-02566-w).

      This project has been presented worldwide, with a presence at major antibody conferences, such as the annual Antibody Validation meeting in Bath (PSM attended the meeting in September 2023). The authors are organizing a sponsored mini-symposium on antibody validation at the next American Society for Cell Biology (ASCB) meeting in December 2023 (Boston, USA): https://plan.core-apps.com/ascbembo2023/event/6fb928f06b0d672e088c6fa88e4d77fb

      Colleagues have prepared petitions addressed to various governmental organizations (US, Canada, UK) to support characterization and validation of renewable antibodies: https://www.thesgc.org/news/support-characterization-and-validation-renewable-antibodies.

      Reviewer #2

      We thank Reviewer #2 for the review of the antibody characterization reports we have uploaded to ZENODO. A manuscript describing the full standard operating procedures of the platform, which has been used in all reports, is in preparation and should be available on a preprint server before the end of the year. Our protocols were reviewed and approved by each of YCharOS' manufacturer partners. Moreover, a recent editorial describes the platform used here and gives advice on how to interpret the data (https://doi.org/10.12688/f1000research.141719.1).

      Reviewer #2, comment #1: “A discussion of how the working concentrations of antibodies are selected and validated is required. Based on the dilutions described in the reports, it seems that dilutions suggested by the manufacturer were used - For LRRK2 it seems that antibody concentrations ranging from 0.06 to over 5 µg/ml for WB were used. Often commercial antibody comes in a BSA-containing buffer making it hard to validate the concentration of the antibody claimed by the manufacturer”.

      The concentration recommended by the manufacturer is our starting point. For WB, when the signal is at the level of detectability, we will repeat with a ~5-10 fold increase in antibody concentration. For >80% of the antibodies tested, the use of the recommended concentration led to the detection of bands (specific or not to the target protein).

      Reviewer #2, comment #2: “In the authors' experience are the manufacturer's concentrations reliable? Additionally, if the information regarding applications provided by the manufacturers is unreliable how do the authors suggest working concentrations for antibodies to be assessed”?

      We do not evaluate the concentration of antibodies internally. In the immunoprecipitation experiments, we use 2.0 µg of antibody for each IP, based on the concentration provided by the manufacturers. On Ponceau staining of membranes, we can observe the heavy and light chains of the primary antibodies used, giving an indication of the amount of antibodies added to the cell lysate. In most cases, the intensity of the heavy and light chains is comparable.

      Reviewer #2, comment #3: “We understand that it would not be feasible to test every antibody at different concentrations, but this is an issue that should at least be mentioned. An antibody might be put in the wrong performance category solely because of the wrong concentration being used. Ie if an excellent antibody is used at too high a concentration, it may detect non-specific proteins that are not seen at lower dilutions where the antibody still picks up the desired antigen well”.

      We agree with Reviewer #2: we do not use an optimal concentration for all tested antibodies. As mentioned previously, the concentration recommended by the manufacturer is our starting point. By testing multiple antibodies side-by-side against a single target protein, we can generally identify one or more specific and selective antibodies. We leave it to users of our reports to optimize the antibody concentration to suit their experimental needs.

      Reviewer #2, comment #4: “Do the authors check different WB conditions ie 2h primary antibody with BSA or milk vs. overnight at 4 degrees with BSA or Milk”?

      All primary antibodies are always tested in milk overnight at 4 degrees. The overnight incubation is convenient in the timeline of the protocol. All protocols were agreed upon after careful consultation with our partners.

      Reviewer #2, comment #5: “Do the authors provide detailed WB protocols that include the description of the electrophoresis and type of gels used, transfer buffer and transfer method and time used, and conditions for all the primary and secondary blotting including times, buffers and dilutions of all antibodies and other reagents”?

      This information is included in all ZENODO reports.

      Reviewer #2, comment #6: “Do the authors discuss detection approaches- we have noticed for some antibodies there are significant different results using LICOR, ECL and other detection methods, with certain especially weaker antibodies preferring ECL-based methods”.

      We only use ECL-based methods.

      Reviewer #2, comment #7: “For IPs the amount of antibody needed can also vary-for some we can use 1 microgram or less, but for others, we need 5 to 10 micrograms. The amount of antibody needed to get maximal IP should be stated”.

      We use 2.0 µg of antibodies and we have found this to be adequate for lower abundance proteins (e.g. Parkin - https://zenodo.org/records/5747356) and higher abundance proteins (e.g. PRDX6 - https://zenodo.org/records/4730953). Abundance is based on PaxDb.com. For Parkin and PRDX6, we were able to enrich the expected target in the IP and observe depletion in the unbound fraction. Optimization of the IP conditions is left to the antibody users.

      Reviewer #2, comment #8: “Doing IPs with commercial antibodies can be very expensive or infeasible if many micrograms are needed especially if only packages of 10 micrograms for several hundred dollars are provided”.

      This is a major advantage of the side-by-side comparison: the reader is free to choose between high-performance antibodies from different manufacturers, with varying antibody costs. We also work in partnership with the Developmental Studies Hybridoma Bank (DSHB), which supplies antibodies on a cost recovery basis.

      Reviewer #2, comment #9: “For IPs it is important to determine the percentage of antigen that is depleted from the supernatant for each IP. We think that this should be calculated and recorded in the Zenodo data. Some antibodies will only IP 10% of antigen whereas others may do 50% and others 80-90%. One rarely sees 100% depletion. For IPs the buffer detergent and salt concentration might also strongly influence the degree of IP and therefore these should be clearly stated”.

      In Box 1, we define criteria of success. For IP, “under the conditions used, a successful primary antibody immunocaptures the target protein to at least 10% of the starting material”. Colleagues have written an editorial on how to interpret and analyze antibody performance (https://f1000research.com/articles/12-1344).

      The cell lysis buffer is a critical reagent when considering IP experiments. We use a commercial buffer consisting of 25 mM Tris-HCl pH 7.4, 150 mM NaCl, 1 mM EDTA, 1% NP-40 and 5% glycerol (Thermo Fisher, cat. #87787). This buffer is efficient to extract the target proteins we have studied thus far.
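
      As a minimal sketch of the depletion arithmetic raised in comment #9, and assuming that the target band has been quantified by densitometry in equal proportions of the starting material and the unbound fraction, the depleted fraction could be estimated as shown below. The function names, the 10% default threshold applied to depletion (rather than to captured material), and the example numbers are illustrative assumptions, not part of the published protocol.

```python
def fraction_depleted(input_intensity: float, unbound_intensity: float) -> float:
    """Estimate the fraction of target protein removed from the lysate by an IP.

    Both intensities are hypothetical densitometry readings of the target band,
    measured on equal proportions of the starting material and unbound fraction.
    """
    if input_intensity <= 0:
        raise ValueError("input_intensity must be positive")
    depleted = 1.0 - unbound_intensity / input_intensity
    # Measurement noise can push the estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, depleted))


def passes_ip_threshold(input_intensity: float,
                        unbound_intensity: float,
                        threshold: float = 0.10) -> bool:
    """Assumed rule of thumb: call the IP successful if depletion >= threshold."""
    return fraction_depleted(input_intensity, unbound_intensity) >= threshold


if __name__ == "__main__":
    # Made-up readings: 1000 units in the input, 400 left in the unbound fraction.
    print(fraction_depleted(1000.0, 400.0))
    print(passes_ip_threshold(1000.0, 950.0))
```

      Recording this fraction alongside the lysis buffer composition would let readers compare antibodies that clear the threshold by very different margins.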

      Reviewer #2, comment #10: “Whether antibodies cross-react with human, mouse and other species of antigens is always a major question. It is always good to test human and mouse cell lines if possible. If antibodies cross-react in WB, in the authors' experience will they also cross-react for IF and IP”?

      The authors started this initiative by focusing on the 20,000 human proteins, defining an end point. We and our collaborators found that most of the cherry-picked selective antibodies for WB for human proteins, which manufacturers claim react with the murine version of the target proteins, were selective for murine tissue lysates.

      Indeed, poorly performing antibodies in WB mostly failed IF and IP. However, selective antibodies for IF or specific for IP were generally (>90%) selective for WB.

      Reviewer #2, comment #11: “Cell lines express proteins at vastly different levels and it is possible that the selected cell line does not express the antigen or expresses it at very low levels - this could be a reason for wrongly assessing an antibody not working. It would be useful to use cell lines in which MS data has defined the copy number of protein per cell and this figure could be included in the antibody data if available. This MS data is available for the vast majority of commonly used cells”.

      We agree with Reviewer #2 that MS data are useful for target protein selection. At the moment, our approach using transcriptomic data provided on DepMap.org proved to be a successful mechanism for cell line selection. We have identified a specific antibody for WB for each target, enabling the validation of expression in the cell line selected.

      For some protein targets, the parental line corresponding to the only commercial or academic knockout line available has weak protein expression. We thus needed to generate a KO clone in a second cell line background with high expression, and indeed found that some antibodies which failed in the first commercial line were successful in the new higher-expressing line (e.g. CHCHD10 - https://zenodo.org/records/5259992).

      Reviewer #2, comment #12: “Some proteins are glycosylated, ubiquitylated or degraded rapidly making them hard to see in WB analysis”.

      We used the full gel/membrane length when analyzing antibody performance by WB. Indeed, proteins can show different isoforms and molecular weights compared to that based on amino acid sequence (e.g. SLC19A1 - https://zenodo.org/records/7324605).

      Reviewer #2, comment # 13: “We have occasionally had proteins that appear unstable when heated with SDS-sample buffer before WB. For these, we still use SDS-Sample buffer but omit the heating step. I often wonder how necessary the heating step is”.

      For WB, samples are heated to 65 degrees, then spun to remove any precipitate.

      Reviewer #2, comment # 14: “For IF the methods by which cells are fixed and stained, and the microscope and settings, can significantly influence the final result. It would be important to carefully record all the methods and the microscope used”.

      We agree with Reviewer #2 that many parameters influence antibody performance for imaging purposes. We are progressively implementing the OMERO software to monitor any experimental parameters and information (metadata) about the microscope itself.

      Reviewer #2, comment # 15: “How do the authors recommend antibodies are stored? These should be very stable, but I have had reports from the lab that some antibodies become less good when stored and others that recommend storing at 4 degrees”.

      Antibodies are aliquoted to avoid freeze-thaw cycles and stored at -20 degrees. If it is recommended to store antibodies at 4 degrees, we add glycerol to a final concentration of 50% and store them at -20 degrees.

      Reviewer #2, comment # 16: “Would other researchers not part of the authors' team, be able to add their own data to this database validating or de-validating antibodies? This would rapidly increase the number of antibodies for which useful data would be available for. It would be nice to greatly expand the number of antibodies being used in research and this is not feasible for a single team to undertake”.

      Yes! We believe that only a community effort can resolve the antibody liability crisis. We partner with the Antibody Registry (antibodyregistry.org - led by co-author Anita Bandrowski). In the Registry, each antibody is labelled with a unique identifier, and third-party validation information can be easily tagged to any antibody. Antibody users are invited to upload information about an antibody they have characterized into the Registry.

    1. eLife assessment

      This is an important study on how behavioral context affects decision making in the nematode C. elegans. Behavioral analyses at multiple time scales combined with genetic and neuronal manipulations revealed how arousal states affect decision making. The results and interpretations are convincing. This work will be of interest to both neuroscientists and ecologists.

    2. Author Response

      In this paper, we examine the behavioral context that generates foraging decisions at the boundaries of food patches in the nematode C. elegans. By analyzing animal locomotion at high spatial and temporal resolution, we identify discrete behavioral responses to encountering the edge of a food patch that can be understood as a decision: either to remain inside the food patch or to leave it. We find that the decision to leave a food patch is associated with increased behavioral arousal that unfolds on long and short timescales. The coupling of increased arousal to lawn leaving decisions is preserved across genetic, neuronal, and environmental manipulations that alter global arousal levels. However, genetic inactivation of a set of chemosensory neurons disrupts the coupling of arousal and lawn leaving, revealing a potential site of integration between internal signals and external sensation that governs foraging.

      We appreciate the reviewers’ thoughtful engagement with this work. In addition to modifications in the text to address minor concerns and ambiguities, we have conducted new analyses and made text and figure edits to strengthen or explain our conclusions. We have also investigated possible confounding explanations to our interpretation of the data.

      In newly added analysis, we show that increased arousal does not result in increased proximity to the lawn boundary, which would be a trivial reason why roaming animals leave more than dwelling ones (new Figure 2-Supplement 1E).

      We also addressed the concern that classifying the brief speed acceleration motif as a roaming state would inflate the apparent coupling of roaming to leaving. By measuring the duration of roaming states prior to leaving, we in fact found the opposite: roaming states that precede leaving are slightly longer than other roaming states, not short acceleration events (new Figure 2-Supplement 4).

      The reviewers also asked reasonable questions about variability between batches of experiments. In particular, reviewers pointed out high levels of roaming in wild type controls accompanying npr-1 mutants. Indeed, the simultaneously-tested wild type animals roamed more than usual in this experiment (Fig. 4C,K) and less than usual in other panels (Fig. 4A,B,I,J) in these small datasets. There is more to do here, but the results support the general point that roaming and leaving are correlated in several neuromodulatory mutants that regulate roaming. We have included a new sentence in the Figure 4 legend to draw the reader’s attention to the potential limitations of these results, and to explicitly state that results should not be compared across panels. Similarly, there is more to be done to understand tax-4, as we did not test all tax-4-expressing sensory neurons for their effects on roaming and leaving.

      In private comments, reviewers also asked about experimental design and statistics and were concerned that certain assays conducted on just a few days may not represent independent experiments. We have updated the Methods section to improve the description of the behavioral experiments, including more information about the behavioral chambers and imaging conditions. We note that for all experiments we tested all relevant genotypes in the same batches and days, enabling comparisons of experimental animals with matched controls conducted at the same time.

      Reviewers asked us to compare our results to those generated by Rhoades, et al. (2019) and Cermak, et al. (2020). To the best of our knowledge, our results are fully consistent with those studies. The study by Rhoades and co-authors is primarily concerned with behavioral slowing upon first encountering a food patch, and thus does not include data regarding roaming or lawn leaving (Rhoades et al., 2019). As we mention in the text, we were initially surprised that tph-1 did not eliminate regulation of roaming by feeding, but there are straightforward explanations (redundant transmitters, other neurons). tph-1 did have a significant, albeit small, effect. The study by Cermak and co-authors presents an alternative Hidden Markov Model that uses whole animal postures to segment on-food behavior into 9 states including 8 dwelling states and a single roaming state (Cermak et al., 2020); we refer to this analysis in the discussion. Cermak’s paper and ours differ in experimental conditions, the behaviors measured, and the models used to analyze them. The animals in the Cermak paper are exposed to a large bacterial lawn of uniform density, whereas animals in our study are recorded on small bacterial lawns with thick edges. The analysis tools also differ in their use of animal posture (Cermak only) and autoregressive dynamics (our work only). Further studies of the neurons and molecules involved may help to fully harmonize these models.

      References

      Cermak, N., Yu, S.K., Clark, R., Huang, Y.C., Baskoylu, S.N., and Flavell, S.W. (2020). Whole-organism behavioral profiling reveals a role for dopamine in state-dependent motor program coupling in C. elegans. eLife 9, 1–34.

      Rhoades, J.L., Nelson, J.C., Nwabudike, I., Yu, S.K., McLachlan, I.G., Madan, G.K., Abebe, E., Powers, J.R., Colón-Ramos, D.A., and Flavell, S.W. (2019). ASICs Mediate Food Responses in an Enteric Serotonergic Neuron that Controls Foraging Behaviors. Cell 176, 85-97.e14.

    3. Reviewer #1 (Public Review):

      Genetic, physiological, and environmental manipulations that increase roaming increase leaving rates. The connection between increased roaming and increased leaving is lost when tax-4-expressing sensory neurons are inactivated. This study is conceptually important in its characterization of worm behaviors as time-series of discrete states, a promising framework for understanding behavioral decisions as algorithms that govern state transitions. This framework is well-established in other animals, but relatively new to worms.

      A key discovery is that lawn leaving behavior is probabilistically favored in states of behavioral arousal. I like the use of response-triggered averages (triggered on leaving events) that illustrate a "state-dependent receptive field" of the behavioral response. Response-triggered averages are common in sensory neuroscience, used, for example, to characterize the diverse "stimulus-dependent receptive fields" of different retinal ganglion cell types. It's nice to adapt the idea to illustrate the state-dependence of behavioral state transitions.
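
      The response-triggered average mentioned above can be sketched as follows. This is a minimal illustration that assumes a single speed trace sampled at regular intervals and a list of sample indices at which leaving events occur; the function name, the window length, and the toy data are hypothetical and are not the authors' analysis code.

```python
from statistics import mean


def event_triggered_average(speed, event_indices, window):
    """Average the `window` samples before each event, plus the event sample.

    speed         : list of speed values, one per time step
    event_indices : sample indices at which an event (e.g. lawn leaving) occurs
    window        : number of samples to include before each event
    Events that occur too early to have a full window are skipped.
    """
    snippets = [speed[i - window:i + 1] for i in event_indices if i - window >= 0]
    if not snippets:
        raise ValueError("no event has a complete preceding window")
    # zip(*snippets) groups samples that share the same lag relative to the event.
    return [mean(samples_at_lag) for samples_at_lag in zip(*snippets)]


if __name__ == "__main__":
    toy_speed = [0, 0, 1, 2, 3, 0, 0, 2, 3, 4]
    # Two toy leaving events, each preceded by a ramp in speed.
    print(event_triggered_average(toy_speed, event_indices=[4, 9], window=2))
```

      Aligning every snippet to the event time is what turns scattered leaving events into a single average speed profile that can be compared across behavioral states.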

      The simplest metric of arousal state is crawling speed. When animals crawl faster, they are more likely to leave lawns. A more sophisticated metric of behavioral context is whether the animal is in a "roaming" or "dwelling" state, two-state HMM modeling from previous work (Flavell et al., 2013). Roaming animals are more likely to leave lawns than dwelling animals. Different autoregressive HMM tools can segment worm behavior into 4-states. Also with ARHMMs, the most aroused state is again the state that promotes lawn-leaving. HMM analysis disentangles effects that were lumped by the simpler metric of overall speed.

      The authors use diverse environmental, genetic, and optogenetic perturbations to regulate the roaming state, thereby regulating the statistics of leaving in the expected manner. One surprise is that feeding inhibition evokes roaming and lawn-leaving in both pdfr-1 and tph-1 mutants, even though the tph-1-expressing NSM neurons have been shown to sense bacterial ingestion and food availability.

      Another surprise is that evoking roaming does not evoke leaving in tax-4 mutants. Without sensory neuron activity, worms are only more likely to roam for a minute before leaving rather than roaming for several minutes before leaving like wild-type (Figure 6C). ASJ seems to be the most important sensory neuron in this coupling between roaming and leaving (which is uncoupled when sensory neurons are inactivated).

    4. Reviewer #2 (Public Review):

      Here, the authors use quantitative behavioral analyses to describe in unprecedented detail the various behavioral choices animals make when encountering the lawn edge. They report that leaving the lawn is a rare outcome compared to other choices such as pausing or reversing back into the lawn. It occurs predominantly out of the roaming state and has a characteristic preceding fast crawling profile. They developed a refined analysis method, the result of which suggests that the arousal state of animals on food can be described by a 4-state behavior (as opposed to the 2-state roaming - dwelling classification); leaving the lawn occurs predominantly from "state 3", which corresponds to the highest level of arousal during roaming. They further show that various manipulations, such as optogenetic inhibition of feeding, stimulation of RIB neurons, or mutations of neuromodulator pathways, all of which have previously been reported to affect crawling speed and/or roaming/dwelling, maintain the coupling between roaming states and leaving, suggesting a dedicated mechanism for coupling leaving to the roaming state. Finally, they use genetics to implicate chemosensory neurons as neuronal circuit elements mediating this coupling.

      How arousal states affect decision making is an active area of neuroscience research; therefore, the current manuscript will impact the field beyond the small community of C. elegans researchers. Also, in the past, roaming/dwelling and leaving have been treated as independent behaviors; the current manuscript is very intriguing, demonstrating both the interconnectedness of different behavioral programs and the importance of the animal's behavioral context for specific decisions.

      In this current revision, the authors have made a good effort at addressing most of my previous comments, especially to clarify the sample sizes and how independent assays were performed.

      My major concern, however, remains: when leaving, animals apparently accelerate their locomotion speed starting about 30s prior to the leaving events (Fig. 2A, D, G). By the authors' analysis, these episodes are assigned to roaming or 'state 3'. Note that even within these states the behavior seems to be distinctively faster than baseline roaming or 'state 3' speed (Fig. 2A, D, G). If leaving is indeed preceded by a stereotypic acceleration phase, this phase should be assigned to the leaving event, not to roaming or 'state 3'. If this is done, the distribution of roaming/dwelling states prior to acceleration-leaving could get closer to 50/50 (draw a vertical line at 30s onto Figure 2C, and then count the fraction of prior roaming/dwelling states). I would conclude that the probability of leaving is also high out of the dwelling state. This interpretation challenges the major conclusion of the study, which is that the roaming behavioral state is a major determinant of the leaving decision. The analysis in Figure 2 S1E shows interesting results hinting that leaving is indeed not fully independent of the roaming history, but does not directly address the issue described above.

      I think that the work is otherwise overall very well done and the results are extremely interesting. But I would interpret the results differently unless the authors provide a more tailored analysis that rules out my concern.

    5. Reviewer #3 (Public Review):

      Scheer and Bargmann use a combination of computational and experimental approaches in C. elegans to investigate the neuronal mechanisms underlying the regulation of foraging decisions by the state of arousal. They showed that, in C. elegans, the decision to leave food substrates is linked to a high arousal state, roaming, and that an increase in speed at different timescales preceded the food leaving decisions. They found that mutants that exhibit increased roaming also leave food substrates more frequently and that both behaviors can be triggered if food intake is inhibited. They further identify a set of chemosensory neurons that express the transduction channel tax-4 that couple the roaming state and the food-leaving decisions. The authors postulate that these neurons integrate foraging decisions with behavioral states and internal feeding cues.

      The strength of the paper relies on using quantitative and detailed behavioral analysis over multiple time scales in combination with manipulation of genes and neurons to tackle the state-dependent control of behavioral decisions in C. elegans. The evidence is convincing, the analysis rigorous, and the writing is clear and to the point.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      This manuscript describes a set of four passage-reading experiments which are paired with computational modeling to evaluate how task-optimization might modulate attention during reading. Broadly, participants show faster reading and modulated eye-movement patterns of short passages when given a preview of a question they will be asked. The attention weights of a Transformer-based neural network (BERT and variants) show a statistically reliable fit to these reading patterns above-and-beyond text- and semantic-similarity baseline metrics, as well as a recurrent-network-based baseline. Reading strategies are modulated when questions are not previewed, and when participants are L1 versus L2 readers, and these patterns are also statistically tracked by the same transformer-based network.

      I should note that I served as a reviewer on an earlier version of this manuscript at a different venue. I had an overall positive view of the paper at that point, and the same opinion holds here as well.

      Strengths:

      • Task-optimization is a key notion in current models of reading and the current effort provides a computationally rigorous account of how such task effects might be modeled

      • Multiple experiments provide reasonable effort towards generalization across readers and different reading scenarios

      • Use of RNN-based baseline, text-based features, and semantic features provides a useful baseline for comparing Transformer-based models like BERT

      Thank you for the accurate summary and positive evaluation.

      Weaknesses:

      1) Generalization across neural network models seems, to me, somewhat limited: The transformer-based models differ from baseline models in numerous ways (model size, training data, scoring algorithm); it is thus not clear what properties of these models necessarily support their fit to human reading patterns.

      Thank you for the insightful comment. To dissociate the effect of model architecture and the effect of training data, we have now compared the attention weights across three transformer-based models that have the same architecture but different training data/task: randomized (with all model parameters being randomized), pretrained, and fine-tuned models. Remarkably, even without training on any data, the attention weights in randomly initialized models exhibited significant similarity to human attention patterns (Figure 3A). The predictive power of randomly initialized transformer-based models outperformed that of the SAR model. Through subsequent pre-training and fine-tuning, the predictive capacity of the models was further elevated. Therefore, both model architecture and the training data/task contribute to human-like attention distribution in the transformer models. We have now reported this result:

      “The attention weights of randomly initialized transformer-based models could predict the human word reading time and the predictive power, which was around 0.3, was significantly higher than the chance level and the SAR (Fig. 3A, Table S1). The attention weights of pre-trained transformer-based models could also predict the human word reading time, and the predictive power was around 0.5, significantly higher than the predictive power of heuristic models, the SAR, and randomly initialized transformer-based models (Fig. 3A, Table S1). The predictive power was further boosted for local but not global questions when the models were fine-tuned to perform the goal-directed reading task (Fig. 3A, Table S1).”

      In addition, we reported how training influenced the sensitivity of attention weights to text features and question relevance. As shown in Figure 4AB, attention in the randomized models was sensitive to text features across all layers. After pretraining, the models exhibited increased sensitivity to text features in the shallow layers and decreased sensitivity to text features in the deep layers. Subsequent fine-tuning on the reading comprehension task further attenuated the encoding of text features in deep layers but strengthened the sensitivity to task-relevant information.

      2) Inferential statistics are based on a series of linear regressions, but these differ markedly in model size (BERT models involve 144 attention-based regressors, while the RNN-based model uses just 1 attention-based regressor). How are improvements in model fit balanced against changes in model size?

      Thank you for pointing out this issue. The performance of linear regressions was evaluated based on 5-fold cross-validation, and the performance we reported was the performance on the test set. To match the number of parameters, we have now predicted human attention using the average of all heads. The predictive power of the average head was still significantly higher than the predictive power of the SAR model. We have now reported this result in our revised manuscript:

      “For the fine-tuned models, we also predict the human word reading time using an unweighted average of the 144 attention heads and the predictive power was 0.3, significantly higher than that achieved by the attention weights of SAR (P = 4 × 10⁻⁵, bootstrap).”
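
      The evaluation described in this response (averaging the attention heads into a single regressor, then scoring a linear regression on held-out data) can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: predictive power is taken to be the Pearson correlation between pooled held-out predictions and observed reading times, folds are assigned by index modulo k, and all function names are hypothetical. The manuscript's actual pipeline may differ in each of these respects.

```python
from statistics import mean


def average_heads(head_weights):
    """Collapse per-head attention (one list per head) into one value per word."""
    return [mean(values) for values in zip(*head_weights)]


def fit_line(x, y):
    """Ordinary least squares for a single regressor; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0


def cross_validated_power(x, y, k=5):
    """Fit on k-1 folds, predict the held-out fold, then correlate all predictions."""
    n = len(x)
    predictions = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            predictions[i] = intercept + slope * x[i]
    return pearson(predictions, y)


if __name__ == "__main__":
    # Toy data: two "heads" over ten words and made-up reading times.
    heads = [[float(i) for i in range(10)], [float(2 * i) for i in range(10)]]
    reading_time = [200.0 + 15.0 * i for i in range(10)]
    print(cross_validated_power(average_heads(heads), reading_time))
```

      Because every score is computed on words the regression never saw, a model with 144 regressors gains no automatic advantage from its extra parameters, and the averaged-head variant matches the single regressor of the SAR baseline.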

      Also, it was not clear to me how participant-level variance was accounted for in the modeling effort (mixed-effects regression?) These questions may well be easily remedied by more complete reporting.

      In the previous manuscript, the word reading time was averaged across participants, and we did not consider the variance between participants. We have now analyzed the eye movements of each participant and used a linear mixed effects model to test how different factors affected human word reading time, accounting for participant-level and item-level variances.

      “Furthermore, a linear mixed effect model also revealed that more than 85% of the DNN attention heads contribute to the prediction of human reading time when considering text features and question relevance as covariates (Supplementary Results).”

      “Supplementary Methods: To characterize the influences of different factors on human word reading time, we employed linear mixed effects models [5] implemented in the lmerTest package [6] of R. For the baseline model, we treated the type of questions (local vs. global; local = baseline) and all text/task-related features as fixed factors, and considered the interaction between the type of questions and these text/task-related features. We included participants and items (i.e., questions) as random factors, each with associated random intercepts…”

      “Supplementary Results: The baseline mixed model revealed significant fixed effects for question type and all text/task-related features, as well as significant interactions between question type and these text/task-related features (Table S7). Upon involving SAR attention, we observed a statistically significant fixed effect associated with SAR attention. When involving attention weights of randomly initialized BERT, the mixed model revealed that most attention heads exhibited significant fixed effects, suggesting their contributions to the prediction of human word reading time. A broader range of attention heads showed significant fixed effects for both pre-trained and fine-tuned BERT.”

      3) Experiment 1 was paired with a relatively comprehensive discussion of how attention weights mapped to reading times, but the same sort of analysis was not reported for Exps 2-4; this seems like a missed opportunity given the broader interest in testing how reading strategies might change across the different parameters of the four experiments.

      Thank you for the valuable suggestion. We have now also characterized how different reading measures, e.g., gaze duration and counts of rereading, were affected by text and task-related features in Experiments 2-4.

      For Experiment 2: “For local questions, consistent with Experiment 1, the effects of question relevance significantly increased from early to late processing stages that are separately indexed by gaze duration and counts of rereading (Fig. S9A, Table S3).”

      For Experiment 3: “For local questions, the layout effect was more salient for gaze duration than for counts of rereading. In contrast, the effect of word-related features and task relevance was more salient for counts of rereading than gaze duration (Fig. S9B, Table S3).”

      For Experiment 4: “Both the early and late processing stages of human reading were significantly affected by layout and word features, and the effects were larger for the late processing stage indexed by counts of rereading (Fig. S9C, Table S3).”

      4) Comparison of predictive power of BERT weights to human annotations of text relevance is limited: The annotation task asked participants to choose the 5 "most relevant" words for a given question; if >5 words carried utility in answering a question, this would not be captured by the annotation. It seems to me that the improvement of BERT over human annotations discussed around page 10-11 could well be due to this arbitrary limitation of the annotations.

      Thank you for the insightful comment. We allowed each participant to label only 5 words because we wanted participants to label only the most important information. As the reviewer pointed out, five words may not be enough. However, this problem is alleviated by having >26 annotators per question. Although each participant could label up to 5 words, pooling the results across >26 annotators yields a nonzero relevance rating for an average of 21.1 words for local questions and 26.1 words for global questions. More importantly, as outlined in Experimental Materials, we asked additional participants to answer questions based on only 5 annotated keywords. The question-answering accuracy was 75.9% for global questions and 67.6% for local questions, which was close to the accuracy achieved when the complete passage was present (Fig. 1B), suggesting that even 5 keywords could support question answering.

      5) Abstract ln 35: This concluding sentence didn't really capture the key contribution of the paper which, at least from my perspective, was something closer to "we offer a computational account of how task optimization modulates attention during reading"

      p 4 ln 66: I think this sentence does a good job capturing the main contributions of this paper

      Thanks for your suggestion. We have modified our conclusion in Abstract accordingly.

      6) p 4 ln 81: "therefore is conceptually similar" maybe "may serve a conceptually similar role"

      We have rewritten the sentence.

      “Attention in DNN also functions as a mechanism to selectively extract useful information, and therefore attention may potentially serve a conceptually similar role in DNN.”

      7) p. 7 ln 140: "disproportional to the reading time" I didn't understand this sentence

      Sorry for the confusion. We have rewritten the sentence.

      “In Experiment 1, participants were allowed to read each passage for 2 minutes. Nevertheless, to encourage the participants to develop an effective reading strategy, the monetary reward the participant received decreased as they spent more time reading the passage (see Materials and Methods for details).”

      8) p 8 ln 151: This was another sentence that helped solidify the main research contributions for me; I wonder if this framing could be promoted earlier?

      Thank you for the suggestion and we have moved the sentence to Introduction.

      9) p. 33: I may be missing something here, but I didn't follow the reasoning behind quantifying model fit against eye-tracking measures using accuracy in a permutation test. Models are assessed in terms of the proportion of random shuffles that show a greater statistical correlation. Does that mean that an accuracy value like 0.3 (p. 10 ln 208) means that 0.7 random permutations of word order led to higher correlations between attention weights and RT? Given that RT is continuous, I wonder if a measure of model fit such as RMSE or even R^2 could be more interpretable.

      We have now realized that the term “prediction accuracy” was not clearly defined and has caused confusion. Therefore, in the revised manuscript, we have replaced this term with “predictive power”. Additionally, we have now introduced a clear definition of “predictive power” at its first mention in Results:

      “…the predictive power, i.e., the Pearson correlation coefficient between the predicted and real word reading time, was around 0.2”

      The permutation test was used to test whether the predictive power is above chance. Specifically, if the predictive power is higher than the 95th percentile of the chance-level predictive power estimated using permutations, the predictive power is significantly above chance at a significance level of 0.05. We have explained this in Statistical tests.
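
      The permutation procedure described here can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the chance level is estimated by shuffling the predicted values across words, and the function names, the number of permutations, and the toy data are all hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.dot(xc, yc) / np.sqrt(np.dot(xc, xc) * np.dot(yc, yc)))

def permutation_test(predicted, observed, n_perm=2000, seed=0):
    """Return (predictive power, 95th percentile of chance level, p value)."""
    rng = np.random.default_rng(seed)
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    power = pearson_r(predicted, observed)
    # Chance level: correlation after shuffling the predictions across words
    chance = np.array([pearson_r(rng.permutation(predicted), observed)
                       for _ in range(n_perm)])
    threshold = float(np.percentile(chance, 95))
    # One-sided p value with the usual +1 correction
    p_value = float((np.sum(chance >= power) + 1) / (n_perm + 1))
    return power, threshold, p_value

# Toy data (hypothetical): predictions that share signal with the observations
rng = np.random.default_rng(1)
observed_rt = rng.normal(size=300)
predicted_rt = 0.5 * observed_rt + rng.normal(size=300)
power, threshold, p_value = permutation_test(predicted_rt, observed_rt)
```

      In this sketch, a predictive power above `threshold` corresponds to significance at the 0.05 level, which is the criterion stated in the response.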

      10) p. 33: FDR-based multiple comparisons are noted several times, but wasn't clear to me what the comparison set is for any given test; more details would be helpful (e.g. X comparisons were conducted across passages/model-variants/whatever)

      Sorry for missing this important information. We have now specified which comparisons are corrected:

      “…Furthermore, the predictive power was higher for global than local questions (P = 4 × 10⁻⁵, bootstrap, FDR corrected for comparisons across 3 features, i.e., layout features, word features, and question relevance)…”
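
      A correction across a small comparison set such as the 3 features named here can be sketched as below. The response does not say which FDR procedure was used, so the Benjamini-Hochberg procedure is an assumption, and the raw p values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return BH-adjusted p values and a boolean mask of rejected hypotheses."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p value by m / rank
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted, adjusted <= alpha

# Hypothetical raw p values for layout features, word features, question relevance
raw_p = [0.001, 0.020, 0.300]
adjusted_p, rejected = benjamini_hochberg(raw_p)
# adjusted_p is [0.003, 0.03, 0.3]; the first two comparisons survive at alpha = 0.05
```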

      Reviewer #2:

      In this study, researchers aim to understand the computational principles behind attention allocation in goal-directed reading tasks. They explore how deep neural networks (DNNs) optimized for reading tasks can predict reading time and attention distribution. The findings show that attention weights in transformer-based DNNs predict reading time for each word. Eye tracking reveals that readers focus on basic text features and question-relevant information during initial reading and rereading, respectively. Attention weights in shallow and deep DNN layers are separately influenced by text features and question relevance. Additionally, when readers read without a specific question in mind, DNNs optimized for word prediction tasks can predict their reading time. Based on these findings, the authors suggest that attention in real-world reading can be understood as a result of task optimization.

      The research question pursued by the study is interesting and important. The manuscript was well written and enjoyable to read. However, I do have some concerns.

      We thank the reviewer for the accurate summary and positive evaluation.

      1) In the first paragraph of the manuscript, it appears that the purpose of the study was to test the optimization hypothesis in natural tasks. However, the cited papers mainly focus on covert visual attention, while the present study primarily focuses on overt attention (eye movements). It is crucial to clearly distinguish between these two types of attention and state that the study mainly focuses on overt attention at the beginning of the manuscript.

      Thank you for pointing out this issue. We have explicitly mentioned that we focus on overt attention in the current study. Furthermore, we have also discussed that native readers may rely more on covert attention so that they do not need to spend more time overtly fixating on the task-relevant words.

      In Introduction:

      “Reading is one of the most common and most sophisticated human behaviors [16, 17], and it is strongly regulated by attention: Since readers can only recognize a couple of words within one fixation, they have to overtly shift their fixation to read a line of text [3]. Thus, eye movements serve as an overt expression of attention allocation during reading [3, 18].”

      In Discussion:

      “Therefore, it is possible that when readers are more skilled and when the passage is relatively easy to read, their processing is so efficient that they do not need extra time to encode task-relevant information and may rely on covert attention to prioritize the processing of task-relevant information.”

      2) The manuscript correctly describes attention in DNN as a mechanism to selectively extract useful information. However, eye-movement measures such as gaze duration and total reading time are primarily influenced by the time needed to process words. Therefore, there is a doubt whether the argument stating that attention in DNN is conceptually similar to the human attention mechanism at the computational level is correct. It is strongly suggested that the authors thoroughly discuss whether these concepts describe the same or different things.

      Thank you for bringing up this very important issue and we have added discussions about why human and DNN may generate similar attention distributions. For example, we found that both DNN and human attention distributions are modulated by task relevance and word properties, which include word length, word frequency, and word surprisal. The influence of task relevance is relatively straightforward since both human readers and DNN should rely more on task relevant words to answer questions. The influence of word properties is less apparent for models than for human readers and we have added discussions:

      For DNN’s sensitivity to word surprisal:

      “The transformer-based DNN models analyzed here are optimized in two steps, i.e., pre-training and fine-tuning. The results show that pre-training leads to text-based attention that can well explain general-purpose reading in Experiment 4, while the fine-tuning process leads to goal-directed attention in Experiments 1-3 (Fig. 4B & Fig. 5A). Pre-training is also achieved through task optimization, and the pre-training task used in all the three models analyzed here is to predict a word based on the context. The purpose of the word prediction task is to let models learn the general statistical regularity in a language based on large corpora, which is crucial for model performance on downstream tasks [21, 22, 33], and this process can naturally introduce the sensitivity to word surprisal, i.e., how unpredictable a word is given the context.”

      For DNN’s sensitivity to word length:

      “Additionally, the tokenization process in DNN can also contribute to the similarity between human and DNN attention distributions: DNN first separates words into tokens (e.g., “tokenization” is separated into “token” and “ization”). Tokens are units that are learned based on co-occurrence of letters, and are not strictly linked to any linguistically defined units. Since longer words tend to be separated into more tokens, i.e., fragments of frequently co-occurring letters, longer words receive more attention even if the model pays uniform attention to each of its inputs, i.e., each token.”
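
      The word-length argument in this passage can be illustrated with a toy example. The fixed-length splitter below is a hypothetical stand-in for a learned subword tokenizer (it splits “tokenization” into three fragments rather than the two given in the quote), and the function names are illustrative only.

```python
def toy_tokenize(word, piece_len=5):
    """Split a word into fixed-length fragments (crude stand-in for subword tokens)."""
    return [word[i:i + piece_len] for i in range(0, len(word), piece_len)]

def word_attention_under_uniform_tokens(words, piece_len=5):
    """Give every token equal attention, then sum attention over each word's tokens."""
    tokens_per_word = [len(toy_tokenize(w, piece_len)) for w in words]
    total_tokens = sum(tokens_per_word)
    return [n / total_tokens for n in tokens_per_word]

words = ["the", "tokenization", "works"]
attention = word_attention_under_uniform_tokens(words)
# "tokenization" spans 3 of the 5 tokens, so it receives 0.6 of the attention
```

      Even though no token is favored, the longest word accumulates the most attention simply because it contributes the most tokens.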

      3) When reporting how reading time was predicted by attention weights, the authors used "prediction accuracy." While this measure is useful for comparing different models, it is less informative for readers to understand the quality of the prediction. It would be more helpful if the results of regression models were also reported.

      Sorry for the confusion. The prediction accuracy was defined as the correlation coefficient between the predicted and actual eye-tracking measures. We have now realized that the term “prediction accuracy” might have caused confusion. Therefore, in the revised manuscript, we have replaced this term with “predictive power”. Additionally, we have now introduced a clear definition of “predictive power” at its first mention in Results:

      “…the predictive power, i.e., the Pearson correlation coefficient between the predicted and real word reading time, was around 0.2”

      4) The motivations of Experiments 2 and 3 could be better described. In their current form, it is challenging to understand how these experiments contribute to understanding the major research question of the study.

      Thank you for pointing out this issue. In Experiment 1, different types of questions were presented in separate blocks, and all the participants were L2 readers. Therefore, we conducted Experiments 2 and 3 to examine how reading behaviors were modulated when different types of questions were presented in a mixed manner, or when participants were L1 readers. We have now clarified the motivations:

      “In Experiment 1, different types of questions were presented in blocks which encouraged the participants to develop question-type-specific reading strategies. Next, we ran Experiment 2, in which questions from different types were mixed and presented in a randomized order, to test whether the participants developed question-type-specific strategies in Experiment 1.”

      “Experiments 1 and 2 recruited L2 readers. To investigate how language proficiency influenced task modulation of attention and the optimality of attention distribution, we ran Experiment 3, which was the same as Experiment 2 except that the participants were native English readers.”

      Reviewer #3:

      This paper presents several eyetracking experiments measuring task-directed reading behavior where subjects read texts and answered questions.

      It then models the measured reading times using attention patterns derived from deep-neural network models from the natural language processing literature.

      Results are taken to support the theoretical claim that human reading reflects task-optimized attention allocation.

      STRENGTHS:

      1) The paper leverages modern machine learning to model a high-level behavioral task (reading comprehension). While the claim that human attention reflects optimal behavior is not new, the paper considers a substantially more high-level task in comparison to prior work. The paper leverages recent models from the NLP literature which are known to provide strong performance on such question-answering tasks, and is methodologically well grounded in the NLP literature.

      2) The modeling uses text- and question-based features in addition to DNNs, specifically evaluates relevant effects, and compares vanilla pretrained and task-finetuned models. This makes the results more transparent and helps assess the contributions of task optimization. In particular, besides finetuned DNNs, the role of the task is further established by directly modeling the question relevance of each word. Specifically, the claim that human reading is predicted better by task-optimized attention distributions rests on (i) a role of question relevance in influencing reading in Expts 1-2 but not 4, and (ii) the fact that fine-tuned DNNs improve prediction of gaze in Expts 1-2 but not 4.

      3) The paper conducts experiments on both L2 and L1 speakers.

      We thank the reviewer for the accurate summary and positive evaluation.

      WEAKNESSES:

      1) The paper aims to show that human gaze is predicted by the DNN-derived task-optimal attention distribution, but the paper does not actually derive a task-optimal attention distribution. Rather, the DNNs are used to extract 144 different attention distributions, which are then put into a regression with coefficients fitted to predict human attention. As a consequence, the model has 144 free parameters without apparent a-priori constraint or theoretical interpretation. In this sense, there is a slight mismatch between what the modeling aims to establish and what it actually does.

      Regarding Weakness (1): This weakness should be made explicit, at least by rephrasing line 90. The authors could also evaluate whether there is either a specific attention head, or one specific linear combination (e.g. a simple average of all heads) that predicts the human data well.

      Thank you for pointing out this issue. On the one hand, we have now also predicted human attention using the average of all heads, i.e., the simple average suggested by the reviewer. The predictive power of the average head was still significantly higher than the predictive power of the SAR model. We have now reported this result in our revised manuscript.

      “For the fine-tuned models, we also predicted the human word reading time using an unweighted average of the 144 attention heads and the predictive power was 0.3, significantly higher than that achieved by the attention weights of SAR (P = 4 × 10⁻⁵, bootstrap).”
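
      The unweighted-average analysis can be sketched as follows. This is an illustration on synthetic data, not the authors' pipeline: the array shapes, the function names, and the assumption that heads share a common signal are hypothetical; only the number of heads (144) comes from the text.

```python
import numpy as np

def average_head_attention(attention):
    """attention: array of shape (n_heads, n_words); unweighted mean over heads."""
    attention = np.asarray(attention, dtype=float)
    return attention.mean(axis=0)

def predictive_power(predictor, reading_time):
    """Pearson correlation between a predictor and word reading time."""
    return float(np.corrcoef(predictor, reading_time)[0, 1])

# Synthetic data (hypothetical): each head is a noisy copy of a shared signal
rng = np.random.default_rng(0)
n_heads, n_words = 144, 200
shared_signal = rng.normal(size=n_words)
head_attention = shared_signal[None, :] + 3.0 * rng.normal(size=(n_heads, n_words))
reading_time = shared_signal + rng.normal(size=n_words)

mean_attention = average_head_attention(head_attention)
power_of_average = predictive_power(mean_attention, reading_time)
```

      Unlike the regression over all heads, this predictor has no fitted weights, which is why it addresses the reviewer's concern about 144 free parameters.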

      On the other hand, since different attention weights may contribute differently to the prediction of human reading time, we have now also reported the weights assigned to individual attention heads during the original regression analysis (Fig. S4). It was observed that the weight was highly distributed across attention heads and was not dominated by a single head.

      Even more importantly, we have now rephrased the statement in line 90 of the previous manuscript:

      “We employed DNNs to derive a set of attention weights that are optimized for the goal-directed reading task, and tested whether such optimal weights could explain human attention measured by eye tracking.”

      Furthermore, in Discussion, we mentioned that:

      “Furthermore, we demonstrate that both humans and transformer-based DNN models achieve task-optimal attention distribution in multiple steps… Similarly, the DNN models do not yield a single attention distribution, and instead generate multiple attention distributions, i.e., heads, for each layer. Here, we demonstrate that basic text features mainly modulate the attention weights in shallow layers, while the question relevance of a word modulates the attention weights in deep layers, reflecting hierarchical control of attention to optimize task performance. The attention weights in both the shallow and deep layers of DNN contribute to the explanation of human word reading time (Fig. S4).”

      2) While Experiment 1 tests questions from different types in blocks, and the paper mentions that this might encourage the development of question-type-specific reading strategies -- indeed, this specifically motivates Experiment 2, and is confirmed indirectly in the comparison of the effects found in the two experiments ("all these results indicated that the readers developed question-type-specific strategies in Experiment 1") -- the paper seems to miss the opportunity to also test whether DNNs fine-tuned for each of the question-types predict specifically the reading times on the respective question types in Experiment 1. Testing not only whether DNN-derived features can differentially predict normal reading vs targeted reading, but also different targeted reading tasks, would be a strong test of the approach.

      Regarding Weakness (2): results after finetuning for each question type could be reported.

      Thank you for the valuable suggestion. We have now fine-tuned the models separately based on global and local questions. The hyperparameters used for fine-tuning are presented in Author response table 1.

      Author response table 1.

      Hyperparameters for fine-tuning DNN models on specific question types.

      The fine-tuning process yielded a slight reduction in loss (i.e., the negative logarithmic score of the correct option) on the validation set. Specifically, for BERT, the loss decreased from 1.08 to 0.96; for ALBERT, it decreased from 1.16 to 0.76; for RoBERTa, it went down from 0.68 to 0.54. Nevertheless, the fine-tuning process did not improve the prediction of reading time (Author response image 1). A likely reason is that the number of global and local questions for training is limited (local questions: 520; global questions: 280), and similar questions also exist in the RACE dataset that is used for the original fine-tuning (sample size: 87,866). Therefore, a small number of questions can significantly change the reading strategy of human readers, but using these questions to effectively fine-tune a model seems to be a more challenging task.
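
      For readers who want to interpret these loss values, the sketch below computes the negative log score of the correct option. It assumes a softmax over option scores and four options per question, as in RACE; the function name and the example scores are hypothetical.

```python
import math

def multiple_choice_loss(logits, correct_index):
    """Negative log of the softmax probability assigned to the correct option."""
    m = max(logits)
    # Log of the softmax normalizer, computed stably
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[correct_index]

# A model that cannot tell four options apart scores ln(4), about 1.386
chance_loss = multiple_choice_loss([0.0, 0.0, 0.0, 0.0], correct_index=0)
# A model that favors the correct option scores lower
confident_loss = multiple_choice_loss([4.0, 0.0, 0.0, 0.0], correct_index=0)
```

      Under these assumptions, all of the reported validation losses (0.54 to 1.16) lie below the chance value of about 1.386.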

      Author response image 1.

      Fine-tuning based on local and global questions does not significantly modulate the prediction of human reading time. Lighter-color symbols show the results for the 3 BERT-family models (i.e., BERT, ALBERT, and RoBERTa) and the darker-color symbols show the average over the 3 BERT-family models. trans_fine: model fine-tuned based on the RACE dataset; trans_local: models additionally fine-tuned using local questions; trans_global: models additionally fine-tuned using global questions.

      3) The paper compares the DNN-derived features to word-related features such as frequency and surprisal and reports that the DNN features are predictive even when the others are regressed out (Figure S3). However, these features are operationalized in a way that puts them at an unfair disadvantage when compared to the DNNs: word frequency is estimated from the BNC corpus; surprisal is estimated from the same corpus using a trigram model. The BNC corpus contains 100 million words, whereas BERT was trained on several billion words. Relatedly, trigram models are now far surpassed by DNN-based language models. Specifically, it is known that such models do not fit human eyetracking reading times as well as modern DNN-based models (e.g., Figure 2 Dundee in: Wilcox et al, On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior, CogSci 2020). This means that the predictive power of the word-related features is likely to be underestimated and that some residual predictive power is contained in the DNNs, which may implicitly compute quantities related to frequency and surprisal, but were trained on more data. In order to establish that the DNN models are predictive over and above word-related features, and to reliably quantify the predictive power gained by this, the authors could draw on (1) frequency estimated from the corpora used for BERT (BookCorpus + Wikipedia), (2) either train a strong DNN language model, or simply estimate surprisal from a strong off-the-shelf model such as GPT-2.

      This concern does not fundamentally cast doubt on the conclusions, since the authors found a clear effect of the task relevance of individual words, which by definition is not contained in those baseline models. However, Figure S3 -- specifically Figure S3C -- is likely to inflate the contribution of the DNN model over and above the text-based features.

      Thank you for pointing out these issues. Following the valuable suggestion of the reviewer, we have now 1) computed word frequencies based on BookCorpus and Wikipedia and 2) calculated word surprisal using GPT-2.

      “The word features included word length, logarithmic word frequency estimated based on the BookCorpus [62] and English Wikipedia using SRILM [68], and word surprisal estimated from GPT-2 Medium [69].”

      The recalculated word frequency and surprisal are correlated with the original measures (word frequency: 0.98; surprisal: 0.59), and the updated results are also closely aligned with those reported in the previous manuscript.
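
      Surprisal is the negative log probability of a word given its context. The sketch below shows the calculation with a toy add-one-smoothed bigram model, which is only a stand-in for GPT-2 Medium; the corpus and the function names are hypothetical.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count context words, bigrams, and the vocabulary of a token list."""
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens[:-1], tokens[1:]))
    vocab = set(tokens)
    return contexts, bigrams, vocab

def surprisal(prev, word, model):
    """Surprisal in bits: -log2 p(word | prev), with add-one smoothing."""
    contexts, bigrams, vocab = model
    p = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
    return -math.log2(p)

corpus = "the cat sat on the mat the cat ate".split()
model = train_bigram(corpus)
s_expected = surprisal("the", "cat", model)    # frequent continuation
s_unexpected = surprisal("the", "ate", model)  # unseen continuation
```

      A continuation that the model has seen often gets low surprisal, and an unseen one gets high surprisal; a stronger language model changes the probabilities but not the definition.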

      Others:

      1) How does the statistical modeling take into account that measures are repeated both within the items (same texts read by different subjects) and within the subjects (some subjects read multiple texts)? I only see the items-level repetition being addressed in line 715-721 in comparing between local and global questions, but not elsewhere. The standard approach in the literature on human reading times (e.g. the Wilcox et al paper mentioned above, or ref. 44) is to use mixed-effects regression with appropriate random effects for items and subjects. The same question applies to the calculation of chance accuracy (line 702-709), which is done by shuffling words within a passage. Relatedly, how exactly was cross-validation (line 681) calculated? On the level of subjects, individual words, trials, texts, ...?

      Thank you for raising this issue. In the previous manuscript, the word reading time was averaged across participants. The cross-validation was conducted on the level of texts (i.e., passages). Following the valuable suggestion, we have now separately analyzed each participant and applied the linear mixed effects models.

      “Furthermore, a linear mixed effect model also revealed that more than 85% of the DNN attention heads contribute to the prediction of human reading time when considering text features and question relevance as covariates (Supplementary Results).”

      “Supplementary Methods: To characterize the influences of different factors on human word reading time, we employed linear mixed effects models [5] implemented in the lmerTest package [6] of R. For the baseline model, we treated the type of questions (local vs. global; local = baseline) and all text/task-related features as fixed factors, and considered the interaction between the type of questions and these text/task-related features. We included participants and items (i.e., questions) as random factors, each with associated random intercepts…”
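
      One way to write the baseline model described in this passage is given below. The notation is an assumption made for illustration and does not appear in the manuscript: RT is the reading time of word w for participant p on item i, Q_i codes question type (0 = local, 1 = global), and x_k are the text/task-related features.

```latex
\begin{aligned}
\mathrm{RT}_{p i w} &= \beta_0 + \beta_Q Q_i
  + \sum_{k} \beta_k \, x_{k, i w}
  + \sum_{k} \gamma_k \, Q_i \, x_{k, i w}
  + u_p + v_i + \varepsilon_{p i w}, \\
u_p &\sim \mathcal{N}(0, \sigma_u^2), \qquad
v_i \sim \mathcal{N}(0, \sigma_v^2), \qquad
\varepsilon_{p i w} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

      Here the β terms are the fixed effects, the γ terms are the interactions with question type, and u_p and v_i are the random intercepts for participants and items.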

      Supplementary Results: The baseline mixed model revealed significant fixed effects for question type and all text/task-related features, as well as significant interactions between question type and these text/task-related features (Table S7). Upon involving SAR attention, we observed a statistically significant fixed effect associated with SAR attention. When involving attention weights of randomly initialized BERT, the mixed model revealed that most attention heads exhibited significant fixed effects, suggesting their contributions to the prediction of human word reading time. A broader range of attention heads showed significant fixed effects for both pre-trained and fine-tuned BERT.

      2) I could not find any statement about code availability (only about data availability). Will the source code and statistical analysis code also be made available?

      We have added the code availability statement.

      “The code is now available at https://github.com/jiajiezou/TOA.”

      3) The theoretical claim, and some basic features of the research, are quite similar to other recent work (Hahn and Keller, Modeling task effects in human reading with neural network-based attention, Cognition, 2023; cited with very little discussion as ref 44), which also considered task-directed reading in a question-answering task and derived task-optimized attention distributions. There are various differences, and the paper under consideration has both weaknesses and strengths when compared to that existing work -- e.g., that paper derived a single attention distribution from task optimization, but the paper under consideration provides more detailed qualitative analysis of the task effects, uses questions requiring more high-level reasoning, and uses more state-of-the-art DNNs.

      The paper would benefit from being more explicit about how the work under review provides a novel angle over Ref 44 (Hahn and Keller, Cognition, 2023).

      Thanks for bringing up this issue. We have now incorporated a more comprehensive discussion that compares the current study with the recent work conducted by Hahn and Keller:

      “When readers read a passage to answer a question that can be answered using a word-matching strategy [45], a recent study has demonstrated that the specific reading goal modulates the word reading time and the effect can be modeled using an RNN model [46]. Here, we focus on questions that cannot be answered using a word-matching strategy (Fig. 1B) and demonstrate that, for these challenging questions, attention is still modulated by the reading goal but the attention modulation cannot be explained by a word-matching model (Fig. S3). Instead, the attention effect is better captured by transformer models than an advanced RNN model, i.e., the SAR (Fig. 3A). Combining the current study and the study by Hahn et al. [46], it is possible that the word reading time during a general-purpose reading task can be explained by a word prediction task, the word reading time during a simple goal-directed reading task that can be solved by word matching can be modeled by an RNN model, while the word reading time during a more complex goal-directed reading task involving inference is better modeled using a transformer model. The current study also further demonstrates that elongated reading time on task-relevant words is caused by counts of rereading and further studies are required to establish whether earlier eye movement measures can be modulated by, e.g., a word matching task.”

      4) In Materials&Methods, line 599-636, specifically when "pretraining" is mentioned (line 632), it should be mentioned what datasets these DNNs were pretrained on.

      We have now mentioned this in the revised manuscript:

      “The pre-training process aimed to learn general statistical regularities in a language based on large corpora, i.e., BooksCorpus [62] and English Wikipedia…”

    1. Reviewer #1 (Public Review):

      Rai1 encodes the transcription factor retinoic acid-induced 1 (RAI1), which regulates expression of factors involved in neuronal development and synaptic transmission. Rai1 haploinsufficiency leads to the monogenic disorder Smith-Magenis syndrome (SMS), which is associated with excessive feeding, obesity and intellectual disability. Consistent with findings in human subjects, Rai1+/- mice and mice with conditional deletion of Rai1 in Sim+ neurons, which are abundant in the paraventricular nucleus (PVN), exhibit hyperphagia, obesity and increased adiposity. Furthermore, RAI1-deficient mice exhibit reduced expression of brain-derived neurotrophic factor (BDNF), a satiety factor essential for the central control of energy balance. Notably, overexpression of BDNF in PVN of RAI1-deficient mice mitigated their obesity, implicating this neurotrophin in the metabolic dysfunction these animals exhibit. In this follow up study, Javed et al. interrogated the necessity of RAI1 in BDNF+ neurons promoting metabolic health.

      Consistent with previous reports, the authors observed reduced BDNF expression in hypothalamus of Rai1+/- mice. Moreover, proteomics analysis indicated impairment in neurotrophin signaling in the mutants. Selective deletion of Rai1 in BDNF+ neurons in the brain during development resulted in increased body weight, fat mass and reduced locomotor activity and energy expenditure without changes in food intake. There was also a robust effect on glycemic control, with mutants exhibiting glucose intolerance. Selective depletion of RAI1 in BDNF+ neurons in PVN in adult mice also resulted in increased body weight, reduced locomotor activity and glucose intolerance without affecting food intake. Blunting RAI1 activity also increases the inhibitory tone and decreases the intrinsic excitability of BDNF+ neurons in the PVN.

      Overall, the experiments are well designed and multidisciplinary approaches are employed to demonstrate that RAI1 deficits in BDNF+ neurons diminish hypothalamic BDNF signaling and produce metabolic dysfunction. The most significant advance relative to previous reports is the finding from electrophysiological studies showing that blunting RAI1 activity increases the inhibitory tone and decreases the intrinsic excitability of BDNF+ neurons in the PVN. A further advance is the finding that intact RAI1 function is required in BDNF+ neurons for the regulation of glucose homeostasis.

      Depleting RAI1 in BDNF+ neurons had a robust effect compromising glycemic control while playing a lesser part driving deficits in energy balance regulation. Accordingly, both global central depletion of Rai1 in BDNF+ neurons during development and deletion of Rai1 in BDNF+ neurons in the adult PVN elicited modest effects on body weight (less than 18% increase) and did not affect food intake. This contrasts with mice with selective Bdnf deletion in the adult PVN, which are hyperphagic and dramatically obese (90% heavier than controls). Therefore, the results suggest that deficits in RAI1 in PVN or the whole brain only moderately affect BDNF actions influencing energy homeostasis and that other signaling cascades and neuronal populations play a more prominent role driving the phenotypes observed in Rai1+/- mice, which are hyperphagic and 95% heavier than controls. The results from the proteomic analysis of hypothalamic tissue of Rai1 mutant mice and controls could be useful in generating alternative hypotheses.

    1. eLife assessment

      In this valuable study, the authors characterize the role of splicing factor SRSF1 during spermatogenesis with a conditional knockout for Srsf1 in male mouse germ cells. The requirement of SRSF1 for maturation of postnatal gonocytes into spermatogonia, and the molecular role of SRSF1 in regulating alternative splicing in juvenile testes are convincingly supported. The paper also provides strong evidence that the mRNA encoding Tial, a factor relevant for spermatogonial maintenance and male fertility, is alternatively spliced in testis and that this splicing is regulated by SRSF1. The work will be of interest to reproductive biologists and stem cell biologists.

    1. eLife assessment

      This valuable study provides extensive high-quality imaging data and new insights into the process of the endothelial-to-hematopoietic transition (EHT), which generates nascent hematopoietic stem cells from the ventral wall of the dorsal aorta. This study provides strong evidence that, based on apicobasal cell polarity, different morphologies exist for emergent hematopoietic stem cells. The study is incomplete at present in that it does not yet support the additional claim that there are functional consequences, as altered cell fate related to these different morphologies has not been definitively shown.

    1. eLife assessment

      Studies of synaptic development and plasticity in the nematode C. elegans have been limited by the difficulty of rapid, accurate assessments of synaptic structure. In this valuable work, the authors convincingly introduce and validate a computational pipeline, "WormPsyQi," that allows rapid, reproducible quantitation of fluorescent synaptic puncta while minimizing human error and bias. The authors also describe a new set of strains carrying synaptic markers. Together, these tools should provide many groups studying this model system with the ability to quantitatively characterize chemical and electrical synapses, even in densely packed regions in 3D space such as the nerve ring.

    2. Reviewer #1 (Public Review):

      Summary:

      The paper by Majeed et al has a valuable and worthwhile aim: to provide a set of tools to standardize the quantification of synapses using fluorescent markers in the nematode C. elegans. Using current approaches, the identification of synapses using fluorescent markers is tedious and subject to significant inter-experimenter variability. Majeed et al successfully developed and validated a computational pipeline called "WormPsyQi" that overcomes some of these obstacles and will be a powerful resource for many C. elegans neurobiologists.

      Strengths:

      The computational pipeline is rigorously validated and shown to accurately quantitate fluorescent puncta, at least as well as human experimenters. The inclusion of a mask - a region of interest defined by a cytoplasmic marker - is a powerful and useful approach. Users can take advantage of one of four pre-trained neural networks, or train their own. The software is freely available and appears to be user-friendly. A series of rigorous experiments demonstrate the utility of the pipeline for measuring differences in the number of synaptic puncta between sexes and across developmental stages. Neuron-to-neuron heterogeneity in patterns of synaptic growth during development is convincingly demonstrated. Weaknesses and caveats are realistically discussed.

    3. Reviewer #2 (Public Review):

      Summary:

      This paper nicely introduces WormPsyQi, an imaging analysis pipeline that effectively quantifies synaptically localized fluorescent signals in C. elegans through high-throughput automation. This toolkit is particularly valuable for the analysis of densely packed regions in 3D space, such as the nerve ring. The authors applied WormPsyQi to various aspects, including the examination of sexually dimorphic synaptic connectivity, presynaptic markers in eight head neurons, five GRASP reporters, electrical synapses, the enteric nervous system, and developmental synapse comparisons. Furthermore, they validated WormPsyQi's accuracy by comparing its results to manual analysis.

      Strengths:

      Overall, the experiments are well done, and their toolkit demonstrates significant potential and offers a valuable resource to the C. elegans community. This will expand the range of possibilities for studying synapses in the central nervous system in C. elegans.

      Weaknesses:

      1. The authors effectively validated sexually dimorphic synaptic connectivity by comparing the synapse puncta numbers of PHB>AVA, PHA>AVG, PHB>AVG, and ADL>AVA. However, these differences appear to be quite robust. Knowing how well WormPsyQi can detect more subtle changes at the synapses, such as 10-20% changes in puncta number and fluorescence intensity, will require further study.

      2. The authors mentioned that having a cytoplasmic reporter in the background of the synaptic reporter enhanced performance. However, comparative results with and without cytoplasmic reporters, particularly for scenarios involving dim signals or densely distributed signals, are not provided, making it difficult to rigorously assess the importance of this step.

      3. In some cases, the authors note discrepancies between WormPsyQi and human quantification. While they provide some potential explanations for these, the areas of discrepancy are not always highlighted in the images. This may make it difficult for users to know which types of signals are or are not well-suited for analysis by WormPsyQi.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors present a new automated image analysis pipeline named WormPsyQi which allows researchers to quantify various parameters of synapses in C. elegans. Using a collection of newly generated transgenic strains in which synaptic proteins are tagged with fluorescent proteins, the authors showed that WormPsyQi can reliably detect puncta of synaptic proteins, and measure several parameters, including puncta number, location, and size.

      Strengths:

      Image analysis of fluorescently labeled synaptic (or other types of) puncta patterns requires extensive experience, so that one can tell which puncta likely represent bona fide synapses and which are background noise. The authors showed that WormPsyQi nicely reproduced the quantifications done manually for most of the marker strains they tested. Many researchers conducting such quantifications would save significant time by using the pipeline developed by the authors. The collection of new markers would also help researchers examine synapse patterning in different neuron types, which may have unique mechanisms of synapse assembly and specificity.

      Weaknesses:

      As the authors note, the limitations of using fluorescently tagged proteins expressed from concatemeric transgenes apply directly to WormPsyQi. While I appreciate that WormPsyQi could help researchers with repetitive, time-consuming and tedious quantifications, it remains unclear whether there are particular kinds of quantifications that WormPsyQi handles better than human experimenters.

    1. Reviewer #3 (Public Review):

      Summary:

      This paper presents novel and innovative force measurements of the biophysics of gliding cyanobacteria filaments. These measurements allow for estimates of the resistive force between the cell and substrate and provide potential insight into the motility mechanism of these cells, which remains unknown.

      Strengths:

      The authors used well-designed microfabricated devices to measure the bending modulus of these cells and to determine the critical length at which the cells buckle. I especially appreciated the way the authors constructed an array of pillars and used it to do 3-point bending measurements, and the arrangement the authors used to direct cells into a V-shaped corner to examine the length at which the cells buckled. By examining the gliding speed of the cells before buckling events, the authors were able to determine how strongly the buckling length depends on the gliding speed, which could be an indicator of how the force exerted by the cells depends on cell length; however, the authors did not comment on this directly.

      Weaknesses:

      There were two minor weaknesses in the paper.

      First, the authors investigate the buckling of these gliding cells using an Euler beam model. A similar mathematical analysis was used to estimate the bending modulus and gliding force for Myxobacteria (C.W. Wolgemuth, Biophys. J. 89: 945-950 (2005)). A similar mathematical model was also examined in G. De Canio, E. Lauga, and R.E Goldstein, J. Roy. Soc. Interface, 14: 20170491 (2017). The authors should have cited these previous works and pointed out any differences between what they did and what was done before.

      The second weakness is that the authors claim that their results favor a focal adhesion-based mechanism for cyanobacterial gliding motility. This is based on their result that friction and adhesion forces correlate strongly. They then conjecture that this is due to more intimate contact with the surface, with more contacts producing more force and pulling the filaments closer to the substrate, which produces more friction. They then claim that a slime-extrusion mechanism would necessarily involve more force and lower friction. Is it necessarily true that this latter statement is correct? (I admit that it could be, but is it a requirement?)

      Related to this, the authors use a model with isotropic friction. They claim that this is justified because they are able to fit the cell shapes well with this assumption. How would assuming a non-isotropic drag coefficient affect the shapes? It may be that it does equally well, in which case the quality of the fits would not be informative about whether the drag was isotropic.

    2. Reviewer #1 (Public Review):

      The paper "Quantifying gliding forces of filamentous cyanobacteria by self-buckling" combines experiments on freely gliding cyanobacteria, buckling experiments using two-dimensional V-shaped corners, and micropipette force measurements with theoretical models to study gliding forces in these organisms. The aim is to quantify these forces and use the results to perhaps discriminate between competing mechanisms by which these cells move. A large data set of possible collision events is analyzed, buckling events are evaluated, and critical buckling lengths are estimated. A line elasticity model is used to analyze the onset of buckling and estimate the effective (viscous type) friction/drag that controls the dynamics of the rotation that ensues post-buckling. This value of the friction/drag is compared to a second estimate obtained by consideration of the active forces and speeds in freely gliding filaments. The authors find that these two independent estimates of friction/drag correlate with each other and are comparable in magnitude. The experiments are conducted carefully, the device fabrication is novel, the data set is interesting, and the analysis is solid. The authors conclude that the experiments are consistent with the propulsion being generated by adhesion forces rather than slime extrusion. While consistent with the data, this conclusion is inferred.

      Summary:

      The paper addresses important questions on the mechanisms driving the gliding motility of filamentous cyanobacteria. The authors aim to understand these by estimating the elastic properties of the filaments, and by comparing the resistance to gliding under a) freely gliding conditions, and b) in post-buckled rotational states. Experiments are used to estimate the propulsion force density on freely gliding filaments (assuming over-damped conditions). Experiments are combined with a theoretical model based on Euler beam theory to extract friction (viscous) coefficients for filaments that buckle and begin to rotate about the pinned end. The main results are estimates for the bending stiffness of the bacteria, the propulsive tangential force density, the buckling threshold in terms of the length, and estimates of the resistive friction (viscous drag) providing the dissipation in the system and balancing the active force. It is found that experiments on the two bacterial species yield nearly identical values of 𝑓 (albeit with rather large variations). The authors conclude that the experiments are consistent with the propulsion being generated by adhesion forces rather than slime extrusion.

      Strengths of the paper:

      The strengths of the paper lie in the novel experimental setup and measurements that allow for the estimation of the propulsive force density, critical buckling length, and effective viscous drag forces for movement of the filament along its contour - the axial (parallel) drag coefficient, and the normal (perpendicular) drag coefficient (I assume this is the case, since the post-buckling analysis assumes the bent filament rotates at a constant frequency). These direct measurements are important for serious analysis and discrimination between motility mechanisms.

      Weaknesses:

      There are aspects of the analysis and discussion that may be improved. I suggest that the authors take the following comments into consideration while revising their manuscript.

      The conclusion that adhesion via focal adhesions is the cause for propulsion rather than slime protrusion is consistent with the experimental results that the frictional drag correlates with propulsion force. At the same time, it is hard to rule out other factors that may result in this (friction) viscous drag - (active) force relationship while still being consistent with slime production. More detailed analysis aiming to discriminate between adhesion vs slime protrusion may be outside the scope of the study, but the authors may still want to elaborate on their inference. It would help if there was a detailed discussion on the differences in terms of the active force term for the focal adhesion-based motility vs the slime motility.

      Can the authors comment on possible mechanisms (perhaps from the literature) that indicate how isotropic friction may be generated in settings where focal adhesions drive motility? A key aspect here would probably be estimating the extent of this adhesion patch and comparing it to a characteristic contact area. Can lubrication theory be used to estimate characteristic areas of contact (knowing the radius of the filament, and assuming a height above the substrate)? If the focal adhesions typically cover areas smaller than this lubrication area, it may suggest the possibility that bacteria essentially present a flat surface insofar as adhesion is concerned, leading to a transversely isotropic response in terms of the drag. Of course, we will still require the effective propulsive force to act along the tangent.

      I am not sure why the authors mention that the power of the gliding apparatus is not rate-limiting. The only way to verify this would be to put these in highly viscous fluids where the drag of the external fluid comes into the picture as well (if focal adhesions are on the substrate-facing side, and the upper side is subject to ambient fluid drag). Also, the friction referred to here has the form of a viscous drag (no memory effect, and thus not viscoelastic or gel-like), and it is not clear if forces generated by adhesion involve other forms of drag such as chemical friction via temporary bonds forming and breaking. In quasi-static settings and under certain conditions such as the separation of chemical and elastic time scales, bond friction may yield overall force proportional to local sliding velocities.

      For readers from a non-fluids background, some additional discussion of the drag forces, and the forms of friction would help. For a freely gliding filament, if 𝑓 is the force density (per unit length), then steady gliding with a viscous frictional drag would suggest (as mentioned in the paper) 𝑓 ∼ 𝑣₀ 𝐿 𝜂∥. The critical buckling length is then dependent on 𝑓 and on 𝐵 the bending modulus. Here the effective drag is defined per length. I can see from this that if the active force is fixed, and the viscous component resulting from the frictional mechanism is fixed, the critical buckling length will not depend on the velocity (unless I am missing something in their argument), since the velocity is not a primitive variable, and is itself an emergent quantity.
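The argument that the gliding speed is an emergent quantity can be illustrated with a minimal sketch. It assumes an overdamped force balance per unit length, f = η∥ v, together with the buckling criterion f L³/B ≈ 30.6 quoted in Reviewer #2's report. The function names and all numerical values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
# Illustrative sketch only: parameter values are hypothetical, not measured data.

BUCKLING_NUMBER = 30.6  # critical value of f * L**3 / B quoted by Reviewer #2


def gliding_speed(force_density, drag_per_length):
    """Overdamped balance per unit length: f = eta * v, so v = f / eta."""
    return force_density / drag_per_length


def critical_length(force_density, bending_modulus):
    """Length at which f * L**3 / B reaches the buckling number."""
    return (BUCKLING_NUMBER * bending_modulus / force_density) ** (1.0 / 3.0)


f = 1.0   # active force density, arbitrary units (assumed fixed)
B = 50.0  # bending modulus, arbitrary units (assumed fixed)

results = []
for eta in (0.5, 1.0, 2.0):  # hypothetical drag coefficients per unit length
    results.append((eta, gliding_speed(f, eta), critical_length(f, B)))

for eta, v, L_c in results:
    print(f"eta={eta:.1f}  v={v:.2f}  L_c={L_c:.3f}")
```

Under these assumptions the speed changes with the drag coefficient while the critical length stays the same, because the speed never enters the buckling criterion once f and B are fixed.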

    3. eLife assessment

      This paper describes innovative force measurements of the bending modulus of gliding cyanobacteria, along with measurements of the critical buckling length of the cells, which combined lead to valuable insight into how these cells produce the force necessary to move. The major findings are well supported by the data; however, the evidence that the results favor an adhesion-based mechanism is currently incomplete.

    4. Reviewer #2 (Public Review):

      In the presented manuscript, the authors first use structured microfluidic devices with gliding filamentous cyanobacteria inside in combination with micropipette force measurements to measure the bending rigidity of the filaments.

      Next, they use triangular structures to trap the bacteria with the front against an obstacle. Depending on the length and rigidity, the filaments buckle under the propulsive force of the cells. The authors use theoretical expressions for the buckling threshold to infer propulsive force, given the measured length and stiffnesses. They find nearly identical values for both species, 𝑓 ∼ (1.0 ± 0.6) nN∕µm, nearly independent of the velocity.

      Finally, they measure the shape of the filament dynamically to infer friction coefficients via Kirchhoff theory. This last part seems a bit inconsistent with the previous inference of propulsive force. Before, they assumed the same propulsive force for all bacteria and showed only a very weak correlation between buckling and propulsive velocity. In this section, they report a strong correlation with velocity, and report propulsive forces that vary over two orders of magnitude. I might be misunderstanding something, but I think this discrepancy should have been discussed or explained.

      From a theoretical perspective, not many new results are presented. The authors repeat the well-known calculation for filaments buckling under propulsive load and arrive at the literature result of buckling when the dimensionless number (f L^3/B) is larger than 30.6, as previously derived by Sekimoto et al in 1995 [1] (see [2] for a clamped boundary condition and simulations). Other theoretical predictions for pushed semi-flexible filaments [1-4] are not discussed or compared with the experiments.

      Finally, the authors use molecular dynamics type simulations similar to [2-4] to reproduce the buckling dynamics from the experiments. Unfortunately, no systematic comparison is performed.
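As a worked illustration of the threshold mentioned above, the following minimal sketch evaluates the dimensionless number f L³/B and inverts the criterion to infer a force density from a critical length. The helper names and the input values are hypothetical; only the critical value 30.6 comes from the review.

```python
# Illustrative sketch only: the input values below are hypothetical.

CRITICAL_NUMBER = 30.6  # threshold of f * L**3 / B, attributed in the review to [1]


def buckling_number(force_density, length, bending_modulus):
    """Dimensionless load f * L**3 / B for a filament pushed against an obstacle."""
    return force_density * length ** 3 / bending_modulus


def is_buckled(force_density, length, bending_modulus):
    """True when the dimensionless load exceeds the critical number."""
    return buckling_number(force_density, length, bending_modulus) > CRITICAL_NUMBER


def inferred_force_density(critical_length, bending_modulus):
    """Invert the threshold: f = 30.6 * B / L_c**3."""
    return CRITICAL_NUMBER * bending_modulus / critical_length ** 3


# Hypothetical inputs in consistent units (B in nN*um^2, L in um, f in nN/um).
B = 1000.0
L_c = 100.0
f = inferred_force_density(L_c, B)

print(f)
print(is_buckled(f, 80.0, B), is_buckled(f, 120.0, B))
```

With these assumed inputs, a filament shorter than the critical length stays below the threshold and a longer one exceeds it, which is the comparison the buckling experiments rely on.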

      [1] K. Sekimoto, N. Mori, K. Tawada and Y. Toyoshima, Phys. Rev. Lett., 1995, 75, 172-175.

      [2] R. Chelakkot, A. Gopinath, L. Mahadevan and M. F. Hagan, J. R. Soc., Interface, 2014, 11, 20130884.

      [3] R. E. Isele-Holder, J. Elgeti and G. Gompper, Soft Matter, 2015, 11, 7181-7190.

      [4] R. E. Isele-Holder, J. Jager, G. Saggiorato, J. Elgeti and G. Gompper, Soft Matter, 2016, 12, 8495.

    1. Author Response

      Reviewer 1 (Public Review)

      Summary: The authors have made a novel and important effort to distinguish and include different sources of active deformations for fitting C. elegans embryo development: cyclic muscle contractions and actomyosin circumferential stresses. The combination and synchronisation of both contributions are, according to the model, responsible for different elongation rates, and can induce bending and torsion deformations, which are a priori not expected from purely contractile forces. The model can be applied to other growth processes in initially cylindrical shapes.

      Strengths: The model allows us to fit and deduce specific growth patterns, frequencies, and locations of contractions that yield the observed axial elongation during the 240 min of the studied process.

      The deformation gradient is decomposed according to muscle and actomyosin activity, which can be distinguished and quantified. An energy-transferring process allows for the retrieval of the necessary permanent deformations that embryo development requires.

      Weaknesses: Despite the completeness of the model, the explanation of the methodology needs to be improved. Parameters and quantities are not always explained in the main text and are introduced on some occasions in an unordered manner. This makes the comprehension and deduction of methodology difficult. There are some minor comments that are listed below. The most important points are:

      How are the authors sure that there is a torsional deformation? Without tracking the muscle fibers, bending with respect to different angles for different Zs may yield a shape similar to the one in Figure 6E. Furthermore, it is unclear why the model yields torsion deformation. If material points of actomyosin rings do not change in reference configuration, no helicoidal growth should be happening.

      Our torsional deformations were obtained computationally, and the results are plotted in Figure 6 according to our formalism. In our approach, the torsional deformation results from the interaction between the vertical muscles and the circumferential actin network: the muscles bend the cylinder and the bending modifies the direction of the actin fibers, as demonstrated in the experiment.

      -The triple decomposition 𝐹 = 𝐹𝑒 ⋅ 𝐺𝑖 ⋅ 𝐺0 seems to complicate the expressions of growth and requires the use of angles alpha and beta due to the initial deformation 𝐺0. Why not use a simpler decomposition 𝐹 = 𝐹𝑒 ⋅ 𝐺, where 𝐺 contains all contributions from actomyosin and muscle contractions in a material frame? This would avoid considering angles alpha and beta.

      𝐺0 represents the active strain during the early elongation stage and 𝐺𝑖 the active strain during the late elongation stage. Such a decomposition, which is not mandatory, allows a better understanding. In addition, in the late elongation stage, both muscle and actin networks must be considered, and their orientation changes with deformation. Therefore, it is clearer and simpler to express the active strain in terms of alpha and beta angles.
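The relation between the triple decomposition and the single-factor form raised by the reviewer can be illustrated with a minimal sketch. It assumes simple diagonal active strains in a cylindrical (r, θ, z) frame and a made-up elastic tensor; the paper's actual 𝐺𝑖 depends on the fiber angles alpha and beta, which this sketch does not model. All names and values are hypothetical.

```python
# Illustrative sketch only: all stretch values are hypothetical.

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def diag(x, y, z):
    """Diagonal 3x3 matrix."""
    return [[x, 0.0, 0.0], [0.0, y, 0.0], [0.0, 0.0, z]]


def det(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Hypothetical active strains: G_0 for early elongation, G_i for late elongation.
G_0 = diag(1.0, 0.8, 1.0)   # circumferential contraction (assumed value)
G_i = diag(1.0, 1.0, 0.9)   # axial muscle contraction (assumed value)
# Hypothetical elastic response: axial stretch with a small shear.
F_e = [[0.9, 0.0, 0.0], [0.0, 1.0, 0.1], [0.0, 0.0, 1.5]]

F_triple = matmul(F_e, matmul(G_i, G_0))   # F = F_e . G_i . G_0
G = matmul(G_i, G_0)                       # single combined active strain
F_double = matmul(F_e, G)                  # F = F_e . G

print(F_triple == F_double)
print(det(F_triple), det(F_e) * det(G_i) * det(G_0))
```

In this sketch the two forms give the same total deformation gradient whenever 𝐺 is defined as the product 𝐺𝑖 ⋅ 𝐺0, so the choice between them is one of bookkeeping: the triple form keeps the early and late contributions separately identifiable.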

      The section "Energy transformation and Elongation" is unclear. Indeed, stresses need to relax, otherwise, the removal of muscle and actin activity would send the embryo back to its initial state. However, the rationale behind the energy transfer is not explained. Authors seem to impose 𝑊𝑐 = 𝑊𝑟, and from this deduce the necessary actin contraction after muscle relaxation. Why should energy be maintained when muscle relaxes? Which mechanism physically imposes this energy transfer? Muscle contraction could indeed induce elongation if traction forces at the opposite side of the contracting muscle relax. In fact, an alternative approach for obtaining stress relaxation and axial elongation would be converting part of the elastic deformation 𝐹𝑒 to a permanent deformation 𝐹𝑝.

      In this section, we do assume that all the energy accumulated by the muscle contractions will be converted into the energy necessary for elongation, and as our estimate in the article shows, 𝑊𝑐 is indeed greater than 𝑊𝑟, indicating that a significant fraction of 𝑊𝑐 is converted into dissipation and friction, but also into the reorganization of the actin cables. Indeed, elongation of the cylinder induces a significant reduction in the experimentally observed and also in the actin cable density. However, this reduction in cable density is not observed experimentally. Thus, elongation requires a reorganization of the actin network, which is part of the energy consumption and which explains the existence of a permanent deformation 𝐹𝑝.

      Self contact is ignored. This may well be a shape generator and responsible for bending deformations. The convoluted shape of the embryo in the confined space deserves at least commenting on this limitation of the model.

      Thank you for your suggestion. We have considered the effect of contact between C. elegans and the eggshell in the energy dissipation section but we also agree that the self-contact of the worm in confinement will be important. Here, we focus mainly on active filaments: actomyosin and muscle, and we restrict ourselves to a cylindrical shell that is far from the embryo.

      Reviewer 2 (Public Review)

      Summary

      During C. elegans development, embryos undergo elongation of their body axis in the absence of cell proliferation or growth. This process relies in an essential way on periodic contractions of two pairs of muscles that extend along the embryo’s main axis. How contraction can lead to extension along the same direction is unknown.

      To address this question, the authors use a continuum description of a multicomponent elastic solid. The various components are the interior of the animal, the muscles, and the epidermis. The different components form separate compartments and are described as hyperelastic solids with different shear moduli. For simplicity, a cylindrical geometry is adopted. The authors consider first the early elongation phase, which is driven by contraction of the epidermis, and then late elongation, where contraction of the muscles injects elastic energy into the system, which is then released by elongation. The authors get elongation that can be successfully fitted to the elongation dynamics of wild-type worms and two mutant strains.

      Strengths

      The work proposes a physical mechanism underlying a puzzling biological phenomenon. The framework developed by the authors could be used to explain phenomena in other organisms and could be exploited in the design of soft robots.

      Weaknesses

      1) This reviewer considers that the quality of the writing is poor. Because of this the main result of this work, how elongation is achieved by contraction, remains unclear to me. In the opinion of this reviewer, the work is not accessible to a biologist. This is a real pity because the findings are potentially of great interest to developmental biologists and engineers alike.

      We regret that, despite a general introduction and a number of figures, the work does not seem accessible to biologists.

      2) The authors assume that the embryo is elastic throughout all stages of development. Is this assumption appropriate? In my opinion, the authors need to critically discuss this assumption and provide justification. Would this still be true for the adult? If so could the adult relax back to the state prior to elongation? The embryo should be able to do that, if the contractility of the epidermis were sufficiently reduced, right?

      Soft tissues are elastic, and the modeling of soft tissues, even with large deformations, is now well established. The difference between a worm embryo and an adult is first of all the quality of the tissues, their low degree of heterogeneity, the weakness of the muscles and the absence of bones. As for the question of complete relaxation of the stresses, the fact that different components are attached to each other limits complete relaxation. We keep our fingerprints and cortical undulations, although they originate from an elastic instability that occurs in fetal life. It never disappears.

      3) The authors impose strains rather than stress. Since they want to understand the final deformation, I find this surprising. Maybe imposing strain or stress is equivalent, but then you should discuss this.

      Perhaps, the referee has in mind the question of active strain versus active stress and is concerned about the representation of biological forces such as those produced by actomyosin or muscle. In fact, both exist in morphoelasticity and are, of course, related. Usually, the choice is dictated by the simplicity of deriving quantitative results for comparison with experiments.

      4) Does your mechanism need 4 muscle strands or would 2 be sufficient?

      First, the 4 muscle strands are consistent with real C. elegans structures, and second, although we assume that two muscles on the same side contract simultaneously, their size and position affect the deformation results. Also, the time period we consider is just before the worm hatches. After that, the worm has to slide on the ground. So efficient muscles are needed.

      5) It is sometimes hard to understand, whether the authors are talking about the model or the worm.

      It will be corrected in the new version.

    2. eLife assessment

      Using the continuum theory of elastic solids, the authors suggest that periodic muscle contraction leads to elongation of C. elegans embryos by storing elastic energy that is subsequently released by extending the embryo's long axis. This important finding could apply to other developmental processes and be exploited in soft robotics. While the presented evidence is in principle convincing, features of the theory are not explained in sufficient detail.

    3. Reviewer #1 (Public Review):

      Summary:

      The authors have made a novel and important effort to distinguish and include different sources of active deformations for fitting C. elegans embryo development: cyclic muscle contractions and actomyosin circumferential stresses. The combination and synchronisation of both contributions are, according to the model, responsible for different elongation rates, and can induce bending and torsion deformations, which are a priori not expected from purely contractile forces. The model can be applied to other growth processes in initially cylindrical shapes.

      Strengths:

      The model allows us to fit and deduce specific growth patterns, frequencies, and locations of contractions that yield the observed axial elongation during the 240 min of the studied process.

      The deformation gradient is decomposed according to muscle and actomyosin activity, which can be distinguished and quantified. An energy-transferring process allows for the retrieval of the necessary permanent deformations that embryo development requires.

      Weaknesses:

      Despite the completeness of the model, the explanation of the methodology needs to be improved. Parameters and quantities are not always explained in the main text and are introduced on some occasions in an unordered manner. This makes the comprehension and deduction of methodology difficult. There are some minor comments that are listed below. The most important points are:

      -How are the authors sure that there is a torsional deformation? Without tracking the muscle fibers, bending with respect to different angles for different Zs may yield a shape similar to the one in Figure 6E. Furthermore, it is unclear why the model yields torsion deformation. If material points of actomyosin rings do not change in reference configuration, no helicoidal growth should be happening.

      -The triple decomposition F=F_e*G_i*G_0 seems to complicate the expressions of growth and requires the use of angles alpha and beta due to the initial deformation G_0. Why not use a simpler decomposition F=F_e*G, where G contains all contributions from actomyosin and muscle contractions in a material frame? This would avoid considering angles alpha and beta.

      The section "Energy transformation and Elongation" is unclear. Indeed, stresses need to relax, otherwise, the removal of muscle and actin activity would send the embryo back to its initial state. However, the rationale behind the energy transfer is not explained. Authors seem to impose W_c=W_r, and from this deduce the necessary actin contraction after muscle relaxation. Why should energy be maintained when muscle relaxes? Which mechanism physically imposes this energy transfer? Muscle contraction could indeed induce elongation if traction forces at the opposite side of the contracting muscle relax. In fact, an alternative approach for obtaining stress relaxation and axial elongation would be converting part of the elastic deformation F_e to a permanent deformation F_p.

      -Self-contact is ignored. This may well be a shape generator and responsible for bending deformations. Given the convoluted shape of the embryo in the confined space, this limitation of the model deserves at least a comment.

    4. Reviewer #2 (Public Review):

      Summary:

      During C. elegans development, embryos undergo elongation of their body axis in the absence of cell proliferation or growth. This process relies in an essential way on periodic contractions of two pairs of muscles that extend along the embryo's main axis. How contraction can lead to extension along the same direction is unknown.

      To address this question, the authors use a continuum description of a multicomponent elastic solid. The various components are the interior of the animal, the muscles, and the epidermis. The different components form separate compartments and are described as hyperelastic solids with different shear moduli. For simplicity, a cylindrical geometry is adopted. The authors consider first the early elongation phase, which is driven by contraction of the epidermis, and then late elongation, where contraction of the muscles injects elastic energy into the system, which is then released by elongation. The authors get elongation that can be successfully fitted to the elongation dynamics of wild-type worms and two mutant strains.

      Strengths:

      The work proposes a physical mechanism underlying a puzzling biological phenomenon. The framework developed by the authors could be used to explain phenomena in other organisms and could be exploited in the design of soft robots.

      Weaknesses:

      1) This reviewer considers that the quality of the writing is poor. Because of this, the main result of this work, how elongation is achieved by contraction, remains unclear to me. In the opinion of this reviewer, the work is not accessible to a biologist. This is a real pity because the findings are potentially of great interest to developmental biologists and engineers alike.

      2) The authors assume that the embryo is elastic throughout all stages of development. Is this assumption appropriate? In my opinion, the authors need to critically discuss this assumption and provide justification. Would this still be true for the adult? If so could the adult relax back to the state prior to elongation? The embryo should be able to do that, if the contractility of the epidermis were sufficiently reduced, right?

      3) The authors impose strains rather than stress. Since they want to understand the final deformation, I find this surprising. Maybe imposing strain or stress is equivalent, but then you should discuss this.

      4) Does your mechanism need 4 muscle strands or would 2 be sufficient?

      5) It is sometimes hard to understand whether the authors are talking about the model or the worm.

    1. Author Response

      The following is the authors’ response to the original reviews.

      The authors thank the reviewers for their thoughtful and constructive comments. We address each comment below and have uploaded a revised manuscript.

      Public Reviews

      1) One key point that could use further clarification is how to interpret densities in the reconstruction that do overlap with the template. If the omitted regions can be reliably reconstructed, and the density is smooth throughout, it implies the detected particles are not only (mostly) true positives but also their poses must be essentially correct. Therefore, why cannot the entire reconstruction be trusted, including portions overlapping with the template? In the "Future applications" section, the authors state that in order to obtain a reconstruction that is entirely devoid of template bias, it would be necessary to successively omit parts of the template structure through its entirety. I wonder if that is really necessary and if the presented approach of omitting template portions could be better framed as a "gold-standard" validation procedure.

      Our assumption is indeed that the entire reconstruction can be trusted if the omitted features are faithfully reproduced in the reconstruction. We have added a sentence in the discussion to clarify this. However, we think that assessing template bias will still require the omit test (see also our reply below). Also, as discussed in the manuscript, there is likely a little bias left, even if it is not directly visible in the reconstruction. Therefore, if the goal is an entirely unbiased reconstruction, the only way will be to successively omit parts of the template structure throughout the template.

      2) In other words, given the compelling evidence provided by the reconstructions in the omitted areas, I find it hard to imagine how the procedure would be "hallucinating" features in the rest of the structure, as the entire reconstruction depends on the same pose and defocus parameters. A possible experiment to test this hypothesis would be to go the opposite way, deliberately adding an unrealistic feature to the bait and checking whether it comes up in the reconstruction, while at the same time checking how it behaves in omitted parts.

      Template bias might be generated in different ways. A common situation is the presence of noise, which causes biased deviations of the best template match from their “true” match that would just align the target signal to the template. Another type of bias may occur when there is a mismatch between the template and the detected target. The target may still be detected if there is sufficient structural overlap with the template. Since there might not be a clear “correct” alignment of a mismatching target to the template, the best alignment may again be biased, generating artificial density in the reconstruction. This second case may produce bias that is more pronounced in the mismatching regions. The different origins of bias will have to be investigated more thoroughly in another study. For the present study, however, we maintain that unless there is some assessment of bias in a given location, one cannot completely rule out bias based on the absence of it elsewhere in the reconstruction.

      3) When assessing their approach to in situ data (the yeast ribosome), it is intriguing to see that the resolution downgraded from 3.1 to 8 Å when refinement of the particle poses against the current reconstruction was attempted. The authors do provide some possible explanations, such as the reduced signal of the reconstruction at high resolution and the crowded background, but it leaves one to wonder if this means that a 3.1 Å reconstruction could never be obtained from these data by conventional single-particle analysis procedures.

      The refinement results with our in situ data do indeed appear to be limited to low resolution when using the conventional single-particle pipeline and software. It might be possible to improve refinement by introducing certain priors, filters and masking functions that are optimized for the increased background and spectral properties of in situ data. Also, we have not tested all available software, and some might perform better than others. It is worth noting that in a different study using our data, by Cheng et al (2023) and cited in our manuscript, the resolution of the refined reconstruction using different software was ~7 Å, i.e., close to what we report here. Finally, refinement of the detected targets against a high-resolution template does work, but since it involves the template, we regard this as part of the template matching process.

      4) Furthermore, in the section "Quantifying template bias", the authors make the intriguing statement that there can still be some overfitting of noise even in true positives. I understand this overfitting would occur in the form of errors in the pose and defocus estimation, but a clarification would be helpful.

      We have added a sentence in the Discussion to clarify where this bias may come from.

      5) In the Discussion, the claim that "it is not necessary to use tomography to generate high-resolution reconstructions of macromolecular complexes in cells" is a misconception, at least in part. As demonstrated in works by the same group and others (https://doi.org/10.1016/j.xinn.2021.100166, https://doi.org/10.1038/s41467-023-36175-y, https://doi.org/10.1038/s41586-023-05831-0), 2D imaging of native cellular environments does offer a faster and better way to obtain high-resolution reconstructions compared to tomography. However, tomography provides the entire 3D context of the macromolecules, such as their localization to membranes and the cellular architecture, which can be readily visualized in a tomogram even at low resolution, so methods for structure determination from tilt series data such as subtomogram averaging remain of paramount importance. Most likely, a combination of 2D and 3D imaging approaches will be necessary to retrieve both the highest structural resolution and their cellular context to address biological questions.

      We agree and have modified our statement accordingly.

      6) The "Materials and Methods" section lacks a description of transmission electron microscopy data collection.

      We are sorry for this oversight and have added these details.

      7) Finally, the preprint version of this work posted on bioRxiv (https://doi.org/10.1101/2023.07.03.547552) contains the following competing interests statement, which is missing from the submitted version: "The authors are listed as inventors on a closely related patent application named "Methods and Systems for Imaging Interactions Between Particles and Fragments", filed on behalf of the University of Massachusetts."

      This is correct. The statement was missing in the first version of the uploaded manuscript and was added after consultation with the eLife editorial office.

      8) Quantification of the amount of model bias is then performed using omit maps, where every 20th residue is removed from the template and corresponding reconstructions are compared (for those residues) with the full-template reconstructions. As expected, model bias increases with lower thresholds for the picking. Some model bias (Omega=8%) remains even for very high thresholds. The authors state this may be due to overfitting of noise when template-matching true particles, instead of introducing false positives. Probably, that still represents some sort of problem. Especially because the authors then go on to show that their expectation of the number of false positives does not always match the correct number of false positives, probably due to inaccuracies in the noise model for more complicated images. This may warrant further in-depth discussion in a revised manuscript.

      We have added further thoughts regarding the mismatch between expected and actual number of false positives in the Discussion section. A full understanding of the issue likely requires further study, which is currently underway.

      9) The authors evaluate the effect of high-resolution 2D template matching on template bias in reconstructions, and provide a quantitative metric for overfitting. It is an interesting manuscript that made me reevaluate and correct some mistakes in my understanding of overfitting and template bias, and I'm sure it will be of great use to others in the field. However, its main point is to promote high-resolution 2D template matching (2DTM) as a more universal analysis method for in vitro and, more importantly, in situ data. While the experiments performed to that end are sound and well-executed in principle, I fail to make that specific conclusion from their results.

      We do not see 2DTM as a more universal analysis method for in vitro and in situ data, but simply as another method that can be used. We have added a sentence in the introduction to clarify this.

      10) The authors correctly point out that overfitting is largely enabled by the presence of false-positives in the data set. They go on to perform their in situ experiments with ribosomes, which provide an extremely favorable amount of signal that is unrealistic for the vast majority of the proteome. This seems cherry-picked to keep the number of false-positives and false-negatives low. The relationship between overfitting/false-positive rate and the picking threshold will remain the same for smaller proteins (which is a very useful piece of knowledge from this study). However, the false-negative rate will increase a lot compared to ribosomes if the same high picking threshold is maintained. This will limit the applicability of 2DTM, especially for less-abundant proteins.

      The reviewer is correct that the lower SNR of smaller targets poses a fundamental limit to 2DTM. We have stated this in previous studies and have added a sentence in the introduction of the current manuscript to clarify this.

      11) I would like to see an ablation study: Take significantly smaller segments of the ribosome (for which the authors already have particle positions from full-template matching, which are reasonably close to the ground-truth), e.g. 50 kDa, 100 kDa, 200 kDa etc., and calculate the false-negative rate for the same picking threshold. If the resulting number of particles does plummet, it would be very helpful to discuss how that affects the utility of 2DTM for non-ribosomes in situ.

      The suggested ablation study is a good idea and was reported by Rickgauer et al (2020), cited in our manuscript. We added our own analysis for this dataset in Figure 4-figure supplement 1 and show the proportion of LSUs detected as a function of template mass, indicating a detection limit of ~300 kDa. We also added a note in the Results section to explain that the threshold we use to limit false positives means that there are also false negatives, with a rate that depends on their molecular mass.

      12) Another point of concern is the dramatic resolution decrease to 8 Å after multiple iterations of refinement against experimental reconstructions described in line 159. Was this a local search from the poses provided by 2DTM, or something more global? While this is not a manifestation of overfitting as the authors have conclusively shown, I think it adds an important point to the ongoing "But do we really need tomograms, or can we just 2D everything?" debate in the field, which is also central to the 2D part of 2DTM. Reaching 8 Å with 12k ribosome particles would be considered a rather poor subtomogram averaging result these days. Being in the "we need tilt series to be less affected by non-Gaussian noise" camp myself, I wonder if this indicates 2D images are inherently worse for in situ samples. If they are, the same limitations would extend to template matching. In that case, shouldn't the authors advocate for 3DTM instead of 2DTM? It may not be needed for ribosomes, but could give smaller proteins the necessary edge.

      We have extensively discussed the advantages and disadvantages of both tomography and 2DTM (Lucas et al, 2021) and think it is not useful to talk in terms of “better” and “worse”. Instead, each technique has its areas of application, and we maintain that a combination of the two may give the best results. The limitation of 8 Å does not apply to reconstructions aligned against high-resolution templates, as demonstrated in the present study. Regarding noise models, there is also need for these in 3DTM, as explained in recent publications: Maurer et al (2023), bioRxiv, doi.org/10.1101/2023.09.06.556487; Cruz-León et al (2023), bioRxiv, doi.org/10.1101/2023.09.05.556310; Chaillet et al (2023), Int. J. Mol. Sci. 24, 13375.

      13) Right now, this study is also an invitation to practitioners who do not understand the picking threshold used here and cannot relate it to other template-matching programs to do a lot of questionable template matching and claim that the results are true because templates are "unoverfittable". I think such undesirable consequences should be discussed prominently.

      We have added a discussion of this point in the Discussion section.

      Recommendations for the authors

      1) Lines 58-59: What does "nominally untilted" mean? Has the lamella pre-tilt (milling angle) been taken into account or not? If yes, how?

      The lamella milling angle was not taken into account, so there is a tilt built into the sample of about 8° that was not compensated for by a counter-tilt of the microscope goniometer. We have added a note to explain this in the text of the manuscript.

      2) Lines 113-114: A brief explanation of the threshold calculation method from Rickgauer et al, 2017 to achieve an expected false positive rate of one per micrograph would be helpful here.

      We describe the equation for estimating the false discovery rate later in the manuscript. We have added a note in the text to point the reader to the relevant section of the manuscript.
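
      As a rough illustration of how such a threshold can be set (a minimal sketch, not necessarily the exact expression used by Rickgauer et al, 2017): if the noise-only match scores are assumed to follow a standard normal distribution and to be independent across search locations, the threshold for a chosen expected number of false positives follows from the Gaussian tail probability. The function name `detection_threshold` and the example search-space size are hypothetical.

```python
import math
from statistics import NormalDist


def detection_threshold(n_searches, expected_false_positives=1.0):
    """Return the threshold t with n_searches * P(z > t) = expected_false_positives.

    Assumes noise-only scores z are standard normal and independent across
    search locations (an illustrative assumption, not a claim about real data).
    """
    tail_probability = expected_false_positives / n_searches
    if not 0.0 < tail_probability < 1.0:
        raise ValueError("expected_false_positives must lie between 0 and n_searches")
    # The normal distribution is symmetric, so P(z > t) = p  <=>  t = -inv_cdf(p).
    return -NormalDist().inv_cdf(tail_probability)


# Hypothetical search-space size: image pixels x orientations x defocus planes.
n_searches = 1e13
threshold = detection_threshold(n_searches)

# Sanity check: the Gaussian tail above the threshold, multiplied by the number
# of searches, recovers the requested expected number of false positives.
expected = n_searches * 0.5 * math.erfc(threshold / math.sqrt(2.0))
print(f"threshold = {threshold:.2f}, expected false positives = {expected:.3f}")
```

      Under these assumptions, a larger search space calls for a higher threshold; real in situ images may deviate from the Gaussian noise model, which is one reason the expected and observed numbers of false positives can differ.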

      3) For consistency, it would be interesting to include a plot of the SNR peaks found by 2DTM in the in situ dataset, that could be directly compared to Figure 1 - figure supplement 1B.

      We have added this to Figure 2 - figure supplement 1A-C, to directly compare to Figure 1 – figure supplement 1A-C.

      4) Showing model-map FSC curves between the density retrieved from the omitted areas and their respective models would provide further evidence not only that they are correct but to what extent.

      An FSC calculation would be challenging for small regions, such as side chains and drugs, due to masking artifacts. Moreover, the model was built into an in vitro determined map and was not fit into the in vivo map calculated here. Therefore, deviations between the map and model may reflect differences between the two conditions and may not reflect the agreement of the map to the in vivo structure.

      5) Lines 128-130: The figure references are wrong. Here, Figure 1B should probably be Figure 1A (or 1B), and Figure 1C clearly refers to Supplementary Figure 1F (FSC curve).

      We have corrected the incorrect figure references.

      6) Line 125: Wrong figure reference, Figure 1A here refers to Supplementary Figure 1B (cross-correlation peaks).

      We have corrected the incorrect figure references.

      7) I haven't been able to find mention of code availability in the manuscript. Given that it is a major outcome of the study, I think it should be provided.

      The code is available from the cisTEM repository, github.com/timothygrant80/cisTEM, and an executable version of the program measure_template_bias has been posted for download on the cisTEM webpage, cistem.org. We have added a note in the Methods section to point the readers to these resources.

      8) Line 50: "An additional complication of subtomogram averaging for in situ imaging is the selection of valid targets" - This is not specific to subtomogram averaging, but to in situ samples.

      We agree and have updated the text to reflect this.

      9) Line 77: "if this is true for high-resolution features, which are more susceptible to noise overfitting" - This is not intuitive to me. High-resolution features require more information to be overfitted with a constant set of model parameters, thus making their overfitting harder.

      The reviewer is correct that there is more information at high resolution, partially compensating for the low SNR. However, the overall refinement behavior is still dominated by overfitting at high resolution, as we have demonstrated in an earlier publication in Stewart & Grigorieff (2004), Ultramicroscopy 102, 67–84.

      10) Line 316: "Baited reconstruction is substantially faster and a more streamlined" - To back this and other similar statements, it would be helpful if the authors provided some time measurements for the execution of their potentially very computationally expensive search.

      The current implementation of 2DTM requires 45 GPU hours per template per K3 image to search 13 defocus planes. For a fair comparison, however, the manual work for annotation, as well as the additional processing to align and classify sub-tomograms to generate high-resolution averages, should also be considered. These are highly project-dependent and can exceed the time required for 3DTM manifold. We have clarified this in our Discussion section.

      11) Line 319: "We expect focused classification to identify sub-populations to further improve the resolution" - How would this work if refining the 2D data without a high-resolution template resulted in significantly worse resolution even for a ribosome? Or is this meant to be done with prior knowledge of every state?

      Classification can be done using existing single particle software. To avoid alignment errors, as described above, particle alignment angles and shifts are fixed during classification. This leaves only the particle occupancy per class to be refined, which appears to lead to good classification. We have added a brief note to explain this strategy. However, since this is not shown in this manuscript, we have not added a more extensive discussion of particle classification.

      12) Line 354: "without requiring manual intervention or expert knowledge" - Previous expert knowledge was arguably provided in the form of a high-resolution structure.

      We agree with the reviewer and have clarified our statement.

    2. eLife assessment

      This is an important demonstration of how the false-positive rate of high-resolution 2D template matching to find particles of a given target structure in 2D cryo-EM images (2DTM) relates to overfitting the data towards the template. The authors present new methods to measure the amount of model bias that gets introduced in high-resolution features of such maps, with compelling evidence that high-resolution features that are not present in the template can still be reconstructed in 3D from images obtained by 2DTM.

    3. Reviewer #1 (Public Review):

      This work continues a series of recent publications from the Grigorieff lab (https://doi.org/10.7554/eLife.25648, https://doi.org/10.7554/eLife.68946, https://doi.org/10.7554/eLife.79272, https://doi.org/10.1073/pnas.2301852120) showcasing the development of high-resolution 2D template matching (2DTM) for detection and reconstruction of macromolecules in cryo-electron microscopy (cryo-EM) images of crowded cellular environments. It is well known in the field of cryo-EM that searching noisy images with a template can result in retrieval of the template itself when averaging the candidate particles detected, an effect known as "Einstein-from-noise" (https://doi.org/10.1073/pnas.1314449110). Briefly, this occurs because it is statistically likely to find a match to an arbitrary motif over a large noisy dataset just by chance. The effect can be mitigated for example by limiting the resolution of the template, but this prevents the accurate detection of macromolecules in a crowded environment, as their "fingerprint" lies in the high-resolution range (https://doi.org/10.7554/eLife.25648). Here, the authors show through several experiments on in vitro and in situ data that features as small as drug compounds and water molecules can be reliably retrieved by 2DTM if they are searched by a template (the "bait") that contains expected neighboring features but not the targets themselves.
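
      The "Einstein-from-noise" effect described above can be reproduced with a toy 1D example. The sketch below is an illustration only, not the 2DTM procedure: the two-peak template, the trace length, and the number of traces are all hypothetical. Pure-noise traces are each shifted to the position that best matches the template and then averaged; the average resembles the template even though no trace contains any signal.

```python
import numpy as np

rng = np.random.default_rng(0)
length, n_traces = 64, 2000

# Hypothetical 1D "template": a zero-mean, unit-norm two-peak profile.
x = np.arange(length)
template = (np.exp(-0.5 * ((x - 20) / 2.0) ** 2)
            + 0.6 * np.exp(-0.5 * ((x - 40) / 3.0) ** 2))
template -= template.mean()
template /= np.linalg.norm(template)

# Pure Gaussian noise: none of these traces contains the template.
noise = rng.standard_normal((n_traces, length))

# Circular cross-correlation with the template via FFT:
# cc[i, k] = sum_j noise[i, (j + k) % length] * template[j]
cc = np.fft.irfft(
    np.fft.rfft(noise, axis=1) * np.conj(np.fft.rfft(template)),
    n=length,
    axis=1,
)
best_shift = cc.argmax(axis=1)

# Align every trace to its best-matching shift, then average.
idx = (x[None, :] + best_shift[:, None]) % length
aligned_average = np.take_along_axis(noise, idx, axis=1).mean(axis=0)
unaligned_average = noise.mean(axis=0)

biased_cc = np.corrcoef(aligned_average, template)[0, 1]
control_cc = np.corrcoef(unaligned_average, template)[0, 1]
print(f"aligned noise vs template:   {biased_cc:.2f}")
print(f"unaligned noise vs template: {control_cc:.2f}")
```

      The aligned average correlates strongly with the template while the unaligned control does not, which is why features that were present in the template cannot by themselves validate a reconstruction.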

      The ideas are generally clearly presented with appropriate references to related work, and claims are well supported by the data. The experiments for verifying the density of the ribosomal protein L7A and the systematic removal of residues from the template model to assess bias are particularly clever.

      The revised version of the manuscript addresses essentially all of the concerns raised previously by this reviewer, with the addition of figures and extended discussion of the key concepts.

    4. Reviewer #2 (Public Review):

      This paper by Lucas et al follows on from earlier work by the same group. They use high-resolution 2D template matching (2DTM) to find particles of a given target structure in 2D cryo-EM images, either of in vitro single-particle samples or of more complicated samples, such as FIB-milled cells (which would otherwise perhaps be used for 3D electron tomography). One major concern for high-resolution template matching has been the amount of model bias that gets introduced into a reconstruction that is calculated straight from the orientations and positions identified by the projection matching algorithm. This paper assesses the amount of model bias that gets introduced in high-resolution features of such maps.

      For a high-signal-to-noise in vitro single-particle cryo-EM data set, the authors show that their approach does not yield much model bias. This is probably not very surprising, as their method is basically a low false-positive particle picker, which works very well on such data. Still, I guess that is the whole point of it, and it is good to see that they can reconstruct density for a small-molecule compound that was not present in the original template.

      For FIB-milled lamellae of yeast cells with stalled ribosomes, the SNR is much lower and the dangers of model bias will be higher. This is also evidenced by the observation that further refinement of initial 2DTM-identified orientations and positions worsens the map. This is obviously a more relevant SNR regime to assess their method. Still, they show convincing density for the GHX compound that was not present in the template, but was there in the reconstruction from the identified particles.

      Quantification of the amount of model bias is then performed using omit maps, where every 20th residue is removed from the template and corresponding reconstructions are compared (for those residues) with the full-template reconstructions. As expected, model bias increases with lower thresholds for the picking. Some model bias (Omega=8%) remains even for very high thresholds. The authors state this may be due to overfitting of noise when template-matching true particles, instead of introducing false positives. Probably, that still represents some sort of problem. Especially because the authors then go on to show that their expectation of the number of false positives does not always match the correct number of false positives, probably due to inaccuracies in the noise model for more complicated images. This may warrant further in-depth discussion in a revised manuscript.

      Overall, I think this paper is well written and it has made me think differently (again) about the 2DTM technique and its usefulness in various applications, as outlined in the Discussion. Therefore, it will be a constructive contribution to the field.

      After the first round of review, the authors addressed most points raised in a satisfying manner, which has led to a further (relatively minor) improvement of the manuscript.

    5. Reviewer #3 (Public Review):

      The authors evaluate the effect of high-resolution 2D template matching on template bias in reconstructions and provide a quantitative metric for overfitting. It is an interesting manuscript that made me reevaluate and correct some mistakes in my understanding of overfitting and template bias, and I'm sure it will be of great use to others in the field.

      The revised version of this manuscript addresses all of my concerns. The newly added Figure 4 supplement 1 provides a sobering outlook for the fraction of the proteome we can hope to identify in situ.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      Summary:

      Huang and colleagues present a method for approximation of linkage disequilibrium (LD) matrices. The problem of computing LD matrices is the problem of computing a correlation matrix. In the cases considered by the authors, the number of rows (n), corresponding to individuals, is small compared to the number of columns (m), corresponding to the number of variants. Computing the correlation matrix has cubic time complexity, O(nm^2), which is prohibitive for large samples. The authors approach this using three main strategies:

      1. they compute a coarsened approximation of the LD matrix by dividing the genome into variant-wise blocks over which statistics are effectively averaged;

      2. they use a trick to get the coarsened LD matrix from a coarsened genomic relatedness matrix (GRM), which, with O(n^2m) time complexity, is faster when n << m;

      3. they use the Mailman algorithm to improve the speed of basic linear algebra operations by a factor of log(max(m,n)). The authors apply this approach to several datasets.
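
      The core identity behind strategy 2 can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not the X-LD implementation: it uses simulated, hypothetical genotypes and omits the sampling corrections and the Mailman speed-up. For a column-standardized genotype matrix X (n x m), the mean squared entry of the m x m LD matrix X^T X / n equals the mean squared entry of the n x n GRM X X^T / m, and the mean squared LD between two variant blocks equals the mean of the elementwise product of the two block-wise GRMs.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 400  # few individuals, many variants (n << m)

# Hypothetical unlinked genotypes coded 0/1/2; monomorphic variants are dropped.
freqs = rng.uniform(0.1, 0.9, size=m)
genotypes = rng.binomial(2, freqs, size=(n, m)).astype(float)
genotypes = genotypes[:, genotypes.std(axis=0) > 0]
m = genotypes.shape[1]

# Standardize each variant to zero mean and unit (population) variance.
X = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)

# Direct route: m x m LD (correlation) matrix, O(n m^2).
ld = X.T @ X / n
mean_ld_direct = np.mean(ld ** 2)

# GRM route: n x n relatedness matrix, O(n^2 m).
grm = X @ X.T / m
mean_ld_grm = np.mean(grm ** 2)

# Coarsened version for two variant blocks a and b.
half = m // 2
Xa, Xb = X[:, :half], X[:, half:]
between_direct = np.mean((Xa.T @ Xb / n) ** 2)
grm_a = Xa @ Xa.T / Xa.shape[1]
grm_b = Xb @ Xb.T / Xb.shape[1]
between_grm = np.mean(grm_a * grm_b)

print(mean_ld_direct, mean_ld_grm)
print(between_direct, between_grm)
```

      Both routes agree up to floating-point error because the squared Frobenius norms of X^T X and X X^T are identical; the GRM route only ever forms n x n matrices, which is where the saving for n << m comes from.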

      Strengths:

      The authors demonstrate that their proposed method performs in line with theoretical explanations.

      The coarsened LD matrix is useful for describing global patterns of LD, which do not necessarily require variant-level resolution.

      They provide an open-source implementation of their software.

      Weaknesses:

      The coarsened LD matrix is of limited utility outside of analyzing macroscale LD characteristics. The method still essentially has cubic complexity--albeit the factors are smaller and Mailman reduces this appreciably. It would be interesting if the authors were able to apply randomized or iterative approaches to achieve more fundamental gains. The algorithm remains slow when n is large and/or the grid resolution is increased.

      Thanks for your positive and accurate evaluation! We acknowledge the weakness and include some sentences in Discussion.

      “The weakness of the proposed method is obvious that the algorithm remains slow when the sample size is large or the grid resolution is increased. With the availability of such as UK Biobank data (Bycroft et al., 2018), the proposed method may not be adequate, and much advanced methods, such as randomized implementation for the proposed methods, are needed.”  

      Reviewer #2 (Public Review)

      Summary:

      In this paper, the authors point out that the standard approach of estimating LD is inefficient for datasets with large numbers of SNPs, with a computational cost of O(nm^2), where n is the number of individuals and m is the number of SNPs. Using the known relationship between the LD matrix and the genomic-relatedness matrix, they can calculate the mean level of LD within the genome or across genomic segments with a computational cost of O(n^2m). Since in most datasets, n<<m, this can lead to major computational improvements. They have produced software written in C++ to implement this algorithm, which they call X-LD. Using the output of their method, they estimate the LD decay and the mean extended LD for various subpopulations from the 1000 Genomes Project data.

      Strengths:

      Generally, for computational papers like this, the proof is in the pudding, and the authors appear to have been successful at their aim of producing an efficient computational tool. The most compelling evidence of this in the paper is Figure 2 and Supplementary Figure S2. In Figure 2, they report how well their X-LD estimates of LD compare to estimates based on the standard approach using PLINK. They appear to have very good agreement. In Figure S2, they report the computational runtime of X-LD vs PLINK, and as expected X-LD is faster than PLINK as long as it is evaluating LD for more than 8000 SNPs.

      Weakness:

      While the X-LD software appears to work well, I had a hard time following the manuscript enough to make a very good assessment of the work. This is partly because many parameters used are not defined clearly or at all in some cases. My best effort to intuit what the parameters meant often led me to find what appeared to be errors in their derivation. As a result, I am left worrying if the performance of X-LD is due to errors cancelling out in the particular setting they consider, making it potentially prone to errors when taken to different contexts.

      Thank you for your critical reading and evaluation. We apologize for the typos, which have been corrected, and the symbols are now clearly defined (see Eq 1 and Table 1). In addition, we have included more detailed mathematical steps, which explain how the LD decay regression is constructed and how its interpretation follows (see the detailed derivation steps between Eq 3 and Eq 4).

      Impact:

      I feel like there is value in the work that has been done here if there were more clarity in the writing. Currently, LD calculations are a costly step in tools like LD score regression and Bayesian prediction algorithms, so a more efficient way to conduct these calculations would be useful broadly. However, given the difficulty I had following the manuscript, I was not able to assess when the authors’ approach would be appropriate for an extension such as that.

      See our replies below in responding to your more detailed questions.

      Reviewer #1 (Recommendations For The Authors)

      There are numerous linguistic errors throughout, making it challenging to read.

      It is unclear how the intercepts were chosen in Figure S2. Since theory only gives you the slopes, it seems like it would make more sense to choose the intercept such that it aligns with the empirical results in some way.

      Thank you for your critical evaluation. We apologize for the typos; we have read the manuscript through and clarified the text as much as possible. In addition, we have included Table 1, which introduces the mathematical symbols used in the paper.

      In Figure S2, the two algorithms being compared have different software implementations, PLINK vs X-LD. Their real performance depended not only on the time complexity of the algorithms (right-side y-axis), but also on how the software was coded. PLINK is known for its excellent programming. If we could have programmed as well as Chris Chang, the performance of X-LD would have been even better and would have approached the ratio m/n. However, even with less skilled programming, X-LD outperformed PLINK.

      Reviewer #2 (Recommendations For The Authors):

      Thank you for the chance to review your manuscript. It looks like compelling work that could be improved by greater detail. Providing the level of detail necessary may require creating a Supplementary Note that does a lot of hand-holding for readers like me who are mathematically literate but who don’t have the background that you do. Then you can refer readers to the Supplement if they can’t follow your work.

      We have fixed the problems and style issues as far as we can.

      Regarding the weakness section in the public review, here are a few examples of where I got confused, though this list is not exhaustive.

      1) Consider Equation 1 (line 100), which I believe must be incorrect. Imagine that g consists of two SNPs on different chromosomes with correlation rho. Then ell_g (which is defined as the average squared elements of the correlation matrix) would be

      ell_g = 1/4 (1 + 1 + rho^2 + rho^2) = (1+rho^2)/2.

      But ell_1=1 and ell_2=1 and ell_12=rho^2 (The average squared elements of the chromosome-specific correlation matrices and the cross-chromosome correlation matrix, respectively). So

      sum(ell_i)+sum(ell_ij) = 1 + 1 + rho^2 + rho^2 = (1+rho^2)*2.

      I believe your formulas would hold if you defined your LD values as the sum of squared correlations instead of the mean, but then I don’t know if the math in the subsequent sections holds. I think this problem also holds for Eq 2 and therefore makes Eqs 3 and 4 difficult to interpret.
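
The two-SNP example above can be checked numerically. The sketch below uses a hypothetical helper and an arbitrary illustrative value of rho; it only contrasts the mean and the sum of the squared correlations and says nothing about the authors' corrected Eq 1.

```python
def mean_and_sum_of_squared_correlations(rho):
    """Return (mean, sum) of the squared entries of a 2x2 correlation matrix."""
    correlation = [[1.0, rho], [rho, 1.0]]
    squares = [value * value for row in correlation for value in row]
    return sum(squares) / len(squares), sum(squares)


rho = 0.5  # illustrative value only
mean_sq, sum_sq = mean_and_sum_of_squared_correlations(rho)
# mean_sq follows (1 + rho^2) / 2, while sum_sq follows 2 * (1 + rho^2),
# so the two definitions differ by the number of matrix entries (here 4).
print(mean_sq, sum_sq)
```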

      Thanks for your attentive review and invaluable suggestions. We acknowledge the typo in calculating the mean in Eq 1, resulting in difficulties in understanding the equations. We sincerely apologize for this oversight. To address this issue and ensure clarity in the interpretation of Eq 3 and Eq 4, we have provided more detailed explanations (see the derivation between Eq 3 and Eq 4).

      2) I didn’t know what the parameters are in Equation 3. The vector ell needs to be defined. Is it the vector of ell_i for each chromosomal segment i? I’m also confused by the definition of m_i, which is defined on line 113 as the “SNP number of the i-th chromosome.” Do the authors mean the number of SNPs on the i-th chromosomal segment? If so, it wasn’t clear to me how Eq 2 and Eq 3 imply Eq 4. Further, it wasn’t clear to me why E(b1) quantifies the average LD decay of the genome. I’m used to seeing plots of average LD as a function of distance between SNPs to calculate this, though I’m admittedly not a population geneticist, so maybe this is standard. Standard or not, readers deserve to have their hands held a bit more through this either in the text or in a Supplementary Note.

      Thanks for your insightful feedback. When we were writing this paper, our actual focus was Eq 3: establishing the relationship between chromosomal LD and the reciprocal of the chromosome length (Fig 6A), with length represented by the number of SNPs, that is, the correlation between ell_i and 1/m_i.

      We asked friends who are population geneticists, and they anticipated the correlation between chromosomal LD (ell) and 1/m. The rationale is simple if one knows the basics of population genetics. A long chromosome experiences more recombination, which weakens LD for a pair of loci. In particular, for a pair of loci, D_t=D_0 (1-c)^t, where D_t is the LD at generation t, D_0 is the LD at generation 0, and c is the recombination fraction. As recombination hotspots are nearly evenly distributed along the genome, as reported in Science 2019;363:eaau8861, the chromosome will be broken into the shape shown in Author response image 1 (Fig 1C, newly added). Along the diagonal there are tight LD blocks, which will vanish over time as predicted by the D_t equation, and loci far away from each other will not be in LD unless it is raised by other factors such as population structure. Ideally, we assume that the diagonal blocks have an average size of m×m, that the average LD of a SNP with other SNPs inside the diagonal block (red) is l_u, and that, in contrast, the off-diagonal average LD (light red) is l_uv. This logic is implicit in, but employed by, methods such as LD score regression and PRS refinement using LD structure.
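
The decay relation D_t=D_0 (1-c)^t quoted above can be sketched as follows. The function name and the parameter values are hypothetical and chosen only to show that a larger recombination fraction erodes LD faster.

```python
def ld_after_generations(d0, recombination_fraction, generations):
    """LD between two loci after t generations: D_t = D_0 * (1 - c)^t."""
    if not 0.0 <= recombination_fraction <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return d0 * (1.0 - recombination_fraction) ** generations


# Illustrative values: tightly linked loci (c = 0.01) vs loosely linked (c = 0.5).
for c in (0.01, 0.5):
    trajectory = [ld_after_generations(0.25, c, t) for t in (0, 1, 10, 100)]
    print(c, trajectory)
```

With c = 0.5 (unlinked loci) LD halves every generation, whereas with c = 0.01 it persists much longer, which is the intuition behind tight diagonal blocks and weak off-diagonal LD.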

      Author response image 1.

      But how does one estimate chromosomal LD (ell)? As our friends said, it is overwhelming! So Figure 6A is what a seasoned population geneticist would logically anticipate, but it has never been realized because it is a nightmare to compute. Often, such signature patterns should have been used as showcases when releasing new reference data, such as HapMap. However, to our knowledge, this signature linear relationship has never been illustrated in those reference data.

      If you further ask a population geneticist whether any chromosome will deviate from this line (Fig 6A), the answer will most likely be chromosome 6, because of the tight LD in the HLA region. However, it is chromosome 11, because it has the most completely sequenced centromere. Chr 11 is a surprise! In a T2T-sequenced population, we predict that Chr 11 will not deviate much.

      However, because we doubted whether people would appreciate this point, we shifted our focus to the efficient computation of LD, which is more likely to be understood. We acknowledge the lack of clarity in notation definitions and the absence of the derivation for the interpretation of b1 and b0 for LD decay regression. So, we have added a table explaining the notation (see Table 1) and provided additional derivations, which explain how the LD decay regression was derived (see the derivation between Eq 3 and Eq 4). Figure 1C illustrates the underlying assumption about LD.

      The technique to bridge Eq 2~3 to Eq 4 is called “building interpretation”. It was once one of the core tasks of population genetics or statistical genetics, and a classical example is Haseman-Elston regression (Behavior Genetics, 1972, 2:3-19). As the field moves towards a data-driven style, the culture becomes “shut up, calculate”. Finding an interpretation for a regression is a vanishing craftsmanship, and people often end up with unclear results!

      3) In line 135, it’s not clear to me what is meant by . If it is , then wouldn’t the resulting matrix be a matrix of zeros since is zero everywhere except the lower off-diagonal? So maybe it is ? But then later in that line, you say that the square of this matrix is the sum of several terms of the form . Are these the scalar elements of the G matrix? But then the sum is a scalar, which can’t be true since is a matrix.

      Thanks for your attentive review. We indeed confused the definition of matrices and their elements, and should refer to the stacked off-diagonal elements of matrix . So, is a vector for variable – the relationship between sample i and j. We assume the reviewer uses R software; then corresponds to mean .

      See the text between Eq 5 and Eq 6.

      “We extract two vectors , which stacks the off-diagonal elements of , and , which takes the diagonal elements of .”

      In addition, , so the ground truth is that , but not zero.

      To clarify these math symbols, we replace G with K, so as to be consistent with our other works (see Table 1).

      To derive the means and the sampling variances for and , Eq 7 can be established by some modifications of the Delta method, as exemplified in Appendix I of Lynch and Walsh’s book (Lynch and Walsh, 1998). We added this sentence near Eq 7 in the main text.

    2. eLife assessment

      This study presents a useful new approach for efficient computation of statistics on correlations between genetic variants (linkage disequilibrium, or LD), which the authors apply to quantify the extent of LD across chromosomes. The method and its derivation are solid. The authors document that cross-chromosome LD can be substantial, which has implications for geneticists who are interested in population structure and its impact on genetic association studies.

    3. Joint Public Review:

      Summary:

      In this paper, the authors point out that the standard approach of estimating LD is inefficient for datasets with large numbers of SNPs, with a computational cost of O(nm^2), where n is the number of individuals and m is the number of SNPs. Using the known relationship between the LD matrix and the genomic-relatedness matrix, they can calculate the mean level of LD within the genome or across genomic segments with a computational cost of O(n^2m). Since in most datasets, n<<m, this can lead to major computational improvements.

      Strengths:

      Generally, for computational papers like this, the proof is in the pudding, and the authors have been successful at their aim of producing an efficient computational tool. The most compelling evidence of this in the paper are Figure 2 and Supplementary Figure S2. In Figure 2, they report how well their X-LD estimates of LD compare to estimates based on the standard approach using PLINK. They appear to have very good agreement. In Figure S2, they report the computational runtime of X-LD vs PLINK, and as expected X-LD is faster than PLINK as long as it is evaluating LD for more than 8000 SNPs.

      Weakness:

      This method seems to be limited to calculating average levels of LD in broad regions of the genome. While it would be possible to make the regions more fine-grained, doing so appears to make this approach much less efficient. As such, applications of this method may be limited to those proposed in the paper, for questions where average LD of large chromosomal segments is informative.

      Impact:

      This approach seems to produce real gains for settings where broad average levels of LD are useful to know, but it will likely have less of an impact in settings where fine-grained levels of LD are necessary (e.g., accounting for LD in GWAS summary statistics).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for your consideration and insightful comments on our article.

      We have gone through all the reviewers' comments and addressed all their questions and concerns point by point.

      As per their recommendation, we have amended our manuscript by providing more information about the experimental procedure and statistical analysis followed, and removed some analyses with a reduced number of imaging sessions. In addition, as a Resource and Tools article, the claim of our paper has been adjusted to a proof-of-concept paper showing robust and reliable preliminary results. In the meantime, we have provided 3 new Supplementary Figures, including one showing data from all individual animals.

      Reviewer #1 (Public Review):

      The authors apply a new approach to monitor brain-wide changes in sensory-evoked hemodynamic activity after focal stroke in fully conscious rats. Using functional ultrasound (fUS), they report immediate and lasting (up to 5 days) depression of sensory-evoked responses in somatosensory thalamic and cortical regions.

      Strengths: This is a technically challenging proof-of-concept study that employs new methods to study brain-wide changes in sensory-evoked neural activity, inferred from changes in cerebral blood flow. Despite the minor typos/grammatical errors and small sample size, the authors provide compelling images and rigorous analysis to support their conclusions. Overall, this was a very technically difficult study that was well executed. I believe that it will pave the way for more extensive studies using this methodological approach. Therefore I support this study and my recommendations to improve it are relatively minor in nature and should be simple for the authors to address.

      Weaknesses: The primary weakness of this paper is the small sample sizes. Drawing conclusions based on the small sham control group (n=2) or 5-day stroke recovery group (n=2), is rather tenuous. One way to alleviate some uncertainty with regard to the conclusions would be to state in the discussion that the findings (ie. loss of thalamocortical function after stroke) are perfectly consistent with previous studies that examined thalamocortical function after stroke. The authors missed some of these supporting studies in their reference list (see PMID: 28643802, 1400649). A second issue that can easily be resolved is their analysis of the 69 brain regions. This seems like a very important part of the study and one of the primary advantages of employing efUS. As presented, I had difficulty seeing the data. I think it would be worthwhile to expand Fig 3 (especially 3C) into a full-page figure with an accompanying table in the Supplementary info section describing the % change in CBF for each brain region.

      Other Recommendations for the authors:

      • Since there is variability in spreading depolarizations, was there any trend in the relationship between # SD's and ischemic volume? I know there are few data points but a scatterplot might be of interest.

      • For statistical comparisons of 'response curves' in Fig 3 and 4, what exactly was the primary dependent measure: changes in peak amplitude (%) or area under the curve?

      • There are several typos and minor grammatical errors in the manuscript. Some editing is recommended.

      We thank the reviewer for the comments and suggestions. We have adapted our message to that of a proof-of-concept paper showing robust and reliable preliminary results. We also thank the reviewer for pointing out important references that support our observation and have added them to our article. We have provided a supplementary full-page version of the current Figure 3C (see Supplementary Figure 3).

      Regarding the recommendations, we strongly agree that it would be of interest to link SDs and ischaemia, but unfortunately this can't be done because our experimental design, i.e. narrow cranial window and single static plane, does not allow brain-wide quantification of ischemic volume. This would be possible either by scanning the brain or by using a matrix array (also discussed in the manuscript).

      For statistical analysis of the hemodynamic response curves, we have adapted them to compare the area under the curve (AUC). In addition, we have provided a new Supplementary Figure 4 showing the associated values and statistics.

      We have edited typos and errors.

      Reviewer #2 (Public Review):

      Brunner et al. present a new and promising application of functional ultrasound (fUS) imaging to follow the evolution of perfusion and haemodynamics upon thrombotic stroke in awake rats. The authors leveraged a chemically induced occlusion of the rat Middle Cerebral Artery (MCA) with ferric chloride in awake rats, while imaging cerebral perfusion with fUS at high spatial and temporal resolution (100µm x 110µm x 300µm x 0.8s). The authors also measured evoked haemodynamic response at different timepoints following whisker stimulation.

      As the fUS setup of the authors is limited to 2D imaging, Brunner and colleagues focused on a single coronal slice where they identified the primary Somatosensory Barrel Field of the Cortex (S1BF), directly perfused by the MCA and relay nuclei of the Thalamus: the Posterior (Po) and the Ventroposterior Medial (VPM) nuclei of the Thalamus. All these regions are involved in the sensory processing of whisker stimulation. By investigating these regions the authors present the hyper-acute effect of the stroke with these main results:

      • MCA occlusion results in a fast and important loss of perfusion in the ipsilesional cortex.

      • Thrombolysis is followed by Spreading Depolarisation measured in the Retrosplenial cortex.

      • Stroke-induced hypo-perfusion is associated with a significant drop in ipsilesional cortical response to whisker stimulation, and a milder one in ipsilesional subcortical relays.

      • Contralesional hemisphere is almost not affected by stroke with the exception of the cortex which presents a mildly reduced response to the stimulation.

      In addition, the authors demonstrate that their protocol allows stroke evolution to be followed for up to five days post-induction. They further show that fUS can estimate the size of the infarcted volume with brilliance mode (B-mode), confirming the presence of the identified lesional tissue with post-mortem cresyl violet staining.

      Upon measuring functional response to whisker stimulation 5 days after stroke induction, the authors report that:

      • The ipsilesional cortex presents no response to the stimulation

      • The ipsilesional thalamic relays are less activated than hyper acutely

      • The contralesional cortex and subcortical regions are also less activated 5d after the stroke.

      These observations mainly validate the new method as a way to chronically image the longitudinal sequelae of stroke in awake animals. However, the potentially more intriguing results the authors describe in terms of functional reorganization of functional activity following stroke appear to be preliminary and underpowered (N = 5 animals were imaged to describe the hyper-acute session, and N = 2 in a five-day follow-up). While highly preliminary, the research model proposed by the authors (where the loss of the infarcted cortex induces reduced activity in connected regions, whether by cortico-thalamic or cortico-cortical loss of excitatory drive) is interesting. This hypothesis would require a greatly expanded, sufficiently powered study to be validated (or disproven).

      We thank the reviewer for the careful and accurate description of our work. We have addressed all the comments, recommendations and concerns raised by providing details of the experimental procedure and statistical analysis followed, and by removing some analyses associated with a reduced number of imaging sessions (at d5, n=2).

      Reviewer #3 (Public Review):

      The authors set out to demonstrate the utility of functional ultrasound for evaluating changes in brain hemodynamics elicited acutely and subacutely by the middle cerebral artery occlusion model of ischemic stroke in awake rats.

      Functional ultrasound affords a distinct set of tradeoffs relative to competing imaging modalities. Acclimatization of rats for awake imaging has proven difficult with most, and the high quality of presented data in awake rats is a major achievement. The major weakness of the approach is in its being restricted to single-slice acquisitions, which also complicates the registration of acquisition across multiple imaging sessions within the same animal. Establishing that awake imaging represents an advancement in relation to studies under anesthesia hinges upon the establishment of the level of stress experienced by the animals in the course of imaging, i.e., requires providing data on the assessment of stress over the course of these long imaging sessions. This is particularly significant given how significant a stressor physical restraint has been established to be in rodent models of stress. Furthermore, assessment of the robustness of these measurements is of particular significance for supporting the wide applicability of this approach to preclinical studies of brain injury: the individual animal data (effect sizes, activation areas, kinetics) should thus be displayed and the statistical analysis expanded. Both within-subject, within/across sessions, and across-subjects variability should be evaluated. Thoughtful comments on the relationship between power doppler signal and cerebral blood volume are important to include and facilitate comparisons to studies recording other blood volume-weighted signals. Finally, the contextualization of the observations with respect to other studies examining acute and subacute changes in brain hemodynamics post focal ischemic stroke in rats is needed. It is also quite helpful, for establishing the robustness of the approach, when the statistical parametric maps are shown in full (i.e. unmasked).

      We would like to thank the reviewer for the comments, recommendations and concerns he/she/they raised. We have addressed all the points to clarify our article and make it more relevant and informative for readers.

      Reviewer #2 (Recommendations For The Authors):

      The work described by Brunner et al is primarily a methodological paper, with potentially interesting, yet not robust enough, novel biological insight into the mechanisms of stroke. Nonetheless, the method employed is interesting and potentially well-validated.

      General comments/suggestions

      1- One potential concern I have is related to the relatively low sample size used, with n=5 for the main results and only n=2 for the follow-up after 5d. I am not sure much can be generalized using only two animals in any research study and this N = 2 dataset should probably be removed entirely from the study. Moreover, I found the statistical methods used were only superficially described, which prevented me from assessing whether the results reported by the authors are biologically relevant or not (including some significant differences in rCBV well below 1% estimated over two individuals).

      We fully agree with the reviewer’s comment and have balanced our claims by presenting this work as a proof of concept on brain imaging of multiple aspects of stroke hemodynamics (ischemia, spreading depolarization-like events, cortico-thalamic functions) in awake head-fixed rats. Therefore, we attenuated our message throughout the manuscript to prevent misunderstanding and overstatement (e.g., Lines 356, 441, 455); we also removed the statistics from the analysis at d5 post-stroke, see Figure 4 and the associated paragraph from Line 356.

      2- Based on their investigations, the authors propose a model where the loss of infarcted cortex induces reduced activity in connected regions, whether by cortico-thalamic or cortico-cortical loss of excitatory drive. This is an intriguing framework but this hypothesis would require a more complete, well-powered study to be substantiated.

      I think a clear recognition of the fact that these findings are just preliminary and not validated should be more explicitly reported. I also marginally note here that these results are in contrast with previous reports from the same team where occlusion of the MCA induced increased response to whisker stimulation in anaesthetised rats. These contradictory findings are not discussed in this manuscript.

      As mentioned above, we are now more explicit about the proof-of-concept nature of this work and clearly state that the findings described here are preliminary. We attenuated our message throughout the manuscript to prevent misunderstanding and overstatement (e.g., Lines 348, 433, 447); we also removed the statistics from the analysis at d5 post-stroke, see Figure 4 and the associated paragraph from Line 348.

      We thank the reviewer for pointing out the missing link with our previous work performed under anesthesia. We have therefore added a discussion point on this contradictory finding (Line 441).

      3- In a previous study from the same group perfusion was imaged in 3D either by means of a motorized probe or by using a 2D matrix arrays. It would be interesting to discuss why a 2D approach was chosen in this study over those previous methods.

      Indeed, brain-wide coverage would be of great interest in such an experimental context. As mentioned by the reviewer, two strategies can be used:

      • One can scan the brain using a motorized probe as performed for different purposes by Sieu et al., Nature Methods, 2015; Hingot, Brodin et al., Theranostics 2020; Macé et al., Neuron 2019 and also by our group in Sans-Dublanc, Chrzanowska et al., Neuron, 2022; Brunner et al. Frontiers in Neuroscience 2022 and Brunner et al., JCBFM 2023. (This list of publications is not exhaustive.)

      • A second approach aims at using a 2D matrix array to capture functions at brain-wide scale. So far, this strategy has been employed in a couple of studies (Rabut et al., Nature Methods, 2019 and Brunner, Grillet et al., Neuron, 2020).

      The strategy of scanning (manually or with a motor) strongly limits the investigation of brain functions, because accurately covering the functional regions requires extensive and time-consuming scanning: brain functions must be probed several times to capture a reliable and robust signal for every brain section scanned (see Brunner et al., 2022). Unfortunately, this strategy prevents us from accurately capturing other brain hemodynamics, such as the dynamics of the ischemia or the spreading depolarization events.

      On the other hand, volumetric functional ultrasound imaging (vfUSI) would be suited to brain-wide coverage, capturing large-scale brain functions (see Brunner, Grillet et al. Neuron 2020) and hemodynamic events (see Rabut et al., Nature Methods, 2019), but at the cost of resolution, frame rate, and a larger cranial window. Unfortunately, this technology was not available when this work was conducted.

      Such experimental opportunities have been suggested at the end of the manuscript: “To overcome such limitation, one can extend the size of the cranial window to allow for larger scale imaging either by sequentially scanning the brain27,28,31,32,59,69,71,72, or by using the recently developed volumetric fUS which provides whole-brain imaging capabilities in anesthetized73 and awake rats30.“

      4- Overall the registration scheme seems suboptimal which ultimately questions the specificity of the findings in thalamic regions. It would be interesting to validate this procedure, especially the probe repositioning five days after the stroke.

      Positioning was not a difficult part of this experiment. First, all head-posts were implanted in the same position relative to the skull references bregma and lambda. Second, the head fixation ensures the same placement of the head-post for all animals. Finally, fine adjustments of the ultrasound probe position were made using a micromanipulator by finding key landmarks in the µDoppler image. In practice, minimal adjustments were needed to recover the same imaging plane. We have added information about the positioning to the Materials and Methods section.

      New text – Line 126: “Positionning.

      The mechanical fixation of the head-post ensures an easy and repeatabe positionning of the ultrasound probe across imaging session. The ultrasound probe is indeed fixed to a micromanipulator enabling light adjustements To find the plane of interest (containing both S1BF and thalamic relays: bregma - 3.4mm), we used brain landmarks (e.g., surface of the brain, hippocampus, superior sagittal sinus, large vessels). Note that as the headpost was carefully placed in the same position relative to the skulls landmarks (bregma and lambda), the position of the region of interest was minimal across animals.”

      Second, at d5 post-stroke, we positioned the ultrasound probe over the imaging window as described in the Materials and Methods section and used brain landmarks from the baseline/post-stroke images to best match the position of the brain image. We now describe the procedure in more detail.

      Original text: “First, we used the vascular markers and the shape of the hippocampus31,32 to find back the coronal cross-section imaged during the pre-stroke session. Five days after the MCA occlusion,….”

      New text – Line 360 :“Five days after the MCA occlusion, we first placed the ultrasound probe over the imaging window and adjusted its position (using micromanipulator) to find back the recording plane from Pre-Stroke session using Bmode (morphological mode) and µDoppler imaging using brain vascular landmarks (i.e., vascular patterns, brain surface and hippocampus34,35; see Figure 2B).”

      More detailed questions/comments/suggestions

      Methods

      ARRIVE methodology

      • Point 2b: sample size is not adequately explained, especially the use of n = 2 animals for 5d follow up

      We have made the sample size explicit by adding a short paragraph at the beginning of the Results section. We have also made Supplementary Table 1 more accurate.

      New text – Line 239: “Animals

      Report on animal use, experimentation, exclusion criteria can be found in Supplementary Table 1. Rat#1 was excluded after the control session as the imaging window was too anterior to capture both cortical and thalamic responses. Rat#2 was excluded as hemodynamic responses were inconsistent during baseline (pre-stroke) period. Rat#3 showed early post-stroke reperfusion and was excluded from stroke analysis, the control session (pre-stroke) from Rat#3 was analyzed.”

      • Point 7: statistical methods: The quantification used to assess significant differences in stimulation traces is poorly described.

      We have amended the Materials and Methods section about statistics and provided Supplementary Figure 4.

      New text – Line 221: “Activated brain regions were detected from hemodynamic response time-courses using GLM followed by t-test across animals as proposed in Brunner, Grillet et al.,34. The area under the curve (AUC) from hemodynamic response time-courses was computed for individual trials in S1BF, VPM and Po regions, for all the periods of the recording and for all rats included in this work. AUC were compared and analysed using a non-parametric Kruskal-Wallis test corrected for multiple comparison using a Dunn’s test. Tests were performed using GraphPad Prism 10.0.1. “
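
As an illustration of the AUC measure mentioned above, the following sketch integrates a response time-course with the trapezoidal rule. The integration rule, the function name and the toy trace are assumptions; the response does not specify how the AUC was computed, nor over which window or baseline.

```python
from math import isclose


def trapezoid_auc(times_s, values):
    """Area under a sampled curve using the trapezoidal rule (an assumed choice)."""
    if len(times_s) != len(values) or len(times_s) < 2:
        raise ValueError("need two equally long sequences with at least 2 samples")
    area = 0.0
    for i in range(1, len(times_s)):
        width = times_s[i] - times_s[i - 1]
        area += width * (values[i] + values[i - 1]) / 2.0
    return area


# Hypothetical single-trial trace: relative signal change (%) sampled every 0.8 s.
times = [0.0, 0.8, 1.6, 2.4, 3.2]
trace = [0.0, 5.0, 10.0, 5.0, 0.0]
print(trapezoid_auc(times, trace))
```

Per-trial AUC values obtained this way could then be passed, grouped by recording period, to a rank-based test such as Kruskal-Wallis, as described in the quoted text.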

      Functional Ultrasound Imaging acquisition

      • References 26 and 28 imply 2.5Hz and 2Hz acquisition rates, respectively. Why does the same method result in a 1.25Hz acquisition rate here? Can you confirm the same spatial resolution in these conditions?

      The spatial resolution is independent of the temporal resolution (frame rate). The spatial resolution depends on the resolution of the compound image, and the temporal resolution is set by the number of compound images used to generate a single Doppler image (exposure time). Increasing the number of compound images decreases the frame rate while increasing the signal-to-noise ratio and sensitivity. In some studies, a pause is inserted between two frames, mostly because of technical limitations in the software (processing time, or execution of a real-time display/processing by the user); however, this reduces the frame rate.

      Author response table 1.

      Comparing with the sequences used in references 26 and 28, we have the following timing parameters

      In this work, we decided to reduce the frame rate to have fewer images but with higher SNR. The 0.3s pause was added for technical reasons in this specific implementation.

      New text – Line 158:“ To obtain a single vascular image we acquired a set of 250 compound images in 0.5s, an extra 0.3s pause is included between each image to have some processing time to display the images for real-time monitoring of the experiment. “
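      The timing stated above determines the 1.25Hz frame rate; a short worked calculation makes the arithmetic explicit. The constant and function names below are hypothetical, and only the three numbers (500Hz compound rate, 250 compound images, 0.3s pause) come from the response.

```python
COMPOUND_RATE_HZ = 500.0  # compound-image rate stated in the text
N_COMPOUND = 250          # compound images per power Doppler image
PAUSE_S = 0.3             # processing/display pause between Doppler images


def doppler_frame_rate(n_compound, compound_rate_hz, pause_s):
    """Return (acquisition time, full frame period, frame rate)."""
    acquisition_s = n_compound / compound_rate_hz  # 250 / 500 = 0.5 s
    period_s = acquisition_s + pause_s             # 0.5 + 0.3 = 0.8 s
    return acquisition_s, period_s, 1.0 / period_s


acq, period, rate = doppler_frame_rate(N_COMPOUND, COMPOUND_RATE_HZ, PAUSE_S)
print(f"{acq:.1f} s acquisition + {PAUSE_S:.1f} s pause = {period:.1f} s -> {rate:.2f} Hz")
# prints "0.5 s acquisition + 0.3 s pause = 0.8 s -> 1.25 Hz"
```

      Without the pause, the same 250-image ensemble would give 2Hz, which is why the pause alone accounts for the lower rate.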

      Activity Maps

      • How is the use of a 40s window motivated?

      The 40s window was chosen to better compare hemodynamic responses to either left or right whisker stimulation, with the period of interest centered on the start of the stimulation. Original text:” Pre- and post-stroke recordings are reshaped in shorter 40-s sessions, i.e., 50 frames, …”

      New text – Line 206:“ Pre- and post-stroke recordings are reshaped in 40-s sessions, i.e., 50 frames, centered on the start of the stimulation (at 20s), …”
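      The reshaping into 40-s windows can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes one frame every 0.8s, a per-region signal stored as a plain list, and stimulation onsets given in seconds; the function name and the edge-handling rule are hypothetical.

```python
FRAME_PERIOD_S = 0.8  # one power Doppler image every 0.8 s (1.25 Hz)
WINDOW_S = 40.0       # window length stated in the text
HALF_FRAMES = round(WINDOW_S / FRAME_PERIOD_S) // 2  # 25 frames on each side


def extract_windows(signal, stim_onsets_s):
    """Return one 50-frame window per stimulation, onset at index 25 (20 s)."""
    windows = []
    for onset_s in stim_onsets_s:
        onset_frame = round(onset_s / FRAME_PERIOD_S)
        start, stop = onset_frame - HALF_FRAMES, onset_frame + HALF_FRAMES
        if start < 0 or stop > len(signal):
            continue  # assumption: drop trials too close to the recording edges
        windows.append(signal[start:stop])
    return windows


# Toy recording of 200 frames whose values equal their frame index.
recording = list(range(200))
trials = extract_windows(recording, [8.0, 40.0, 120.0])
```

      With this layout, the first 25 frames of each window form the pre-stimulation period and the stimulation starts at 20s into the window.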

      • I think the manuscript would benefit from the use of an established, event-based GLM for activity mapping.

      We thank the reviewer for this suggestion. Here we used a z-score for activity mapping, which is largely established in the neuroimaging realm.

      • The statistical thresholds used should account for multiple comparisons.

      We have amended the Materials and Methods section, and figure captions about statistics and provided Supplementary Figure 4.

      Statistical analyses

      • Overall this section is only superficially described, and lacks detailed information.

      We have amended the Materials and Methods section about statistics and provided Supplementary Figure 4.

      New text – Line 221 : “Activated brain regions were detected from hemodynamic response time-courses using GLM followed by t-test across animals as proposed in Brunner, Grillet et al.,34. The area under the curve (AUC) from hemodynamic response time-courses was computed for individual trials in S1BF, VPM and Po regions, for all the periods of the recording and for all rats included in this work. AUC were compared and analysed using a non-parametric Kruskal-Wallis test corrected for multiple comparison using a Dunn’s test. Tests were performed using GraphPad Prism 10.0.1. “

      • Are average rCBV changes referred to in the 40s window?

      The rCBV changes are referring to the pre-stimulation baseline. We have modified the text accordingly (Line 206).

      • Were normality and variance equality requirements verified in the group with n=2?

      Based on the reviewers' comments on the limited number of recordings at 5d, we have decided to remove this statistical analysis. The manuscript, figure and caption were corrected accordingly.

      • There is no method for cresyl violet staining

      We thank the reviewer for highlighting this omission. We have provided a paragraph in the Materials & Methods section detailing the histology procedure – Line 228:

      “Histopathology

      Rats were killed 24hrs after the occlusion for histological analysis of the infarcted tissue. Rats received a lethal injection of pentobarbital (100mg/kg i.p. Dolethal, Vetoquinol, France). Using a peristaltic pump, they were transcardially perfused with phosphate-buffered saline followed by 4% paraformaldehyde (Sigma-Aldrich, USA). Brains were collected and post-fixed overnight. 50-μm thick coronal brain sections across the MCA territory were sliced on a vibratome (VT1000S, Leica Microsystems, Germany) and analyzed using the cresyl violet (Electron Microscopy Sciences, USA) staining procedure (see Open Lab Book for procedure). Slices were mounted with DPX mounting medium (Sigma-Aldrich, USA) and scanned using a bright-field microscope.”

      Results 1: Real time imaging of stroke induction in awake rats

      • Why is the window so narrow in the anteroposterior direction?

      The imaging window was defined based on the brain regions investigated in this work, namely the primary somatosensory cortex (S1BF) and the ventroposterior medial thalamic relay (VPM). According to the Paxinos atlas, the position of interest is located at Bregma -3.4mm. The cranial window was made accordingly and restricted to a couple of mm to avoid unnecessary surgery and brain exposure. We added a new sentence in the Materials & Methods section – Line 116: “This cranial window aims to cover bilateral thalamo-cortical circuits of the somatosensory whisker-to-barrel pathway.”

      • What validation was employed for the habituation protocol? Are animals stressed by the procedure? Do you have cortisol data to show? Are animal weights available throughout the procedure?

      The habituation protocol employed in this work follows recommendations from experts in the field and peers (Martin et al., Journal of Neuroscience Methods, 2002; Martin et al., Neuroimage 2006; Topchiy et al., Behav Brain Res 2009). We have amended the corresponding paragraph in the Materials & Methods section detailing the habituation procedure:

      Original text: “Body restraint and head fixation.

      Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada), progressively increasing the restraining period from minutes to hours33,34. After the headpost implantation (see below), rats were habituated to be head-fixed while restrained in the sling. The period of fixation was progressively increased from minutes to hours. Water and food gel (DietGel, ClearH2O, USA) were provided along the habituation session. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      New text - Line 90:“ Body restraint and head fixation.

      The body restraint and head fixation procedures are adapted from published protocols and setup dedicated for brain imaging of awake rats39–41. Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada) by progressively increasing restraining periods from minutes (5mins, 10mins, 30mins) to hours (1 and 3hrs) for one or two weeks. The habituation to head-fixation started by short (5 to 30s) and gentle head-fixation of the headpost between fingers. The headpost was then secured between clamps for fixation periods progressively increased following the same procedure as with the sling. For both body restraint and head fixation, the initial struggling and vocalization diminished over sessions. Water and food gel (DietGel, ClearH2O, USA) were provided for all body restraint and head-fixation habituation sessions. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      • The observation of contralateral oligemia is based only on RSG traces.

      We provided contralesional perfusion changes for all regions in Supplementary Figure 1.

      • The spatial and temporal distribution of Bmode measured hyperechogenicity is surprising and should be discussed. Reference 29 describes for instance non-overlap with an area of hypo-perfusion. Overlap between hypo-perfused and infarct volumes should be systematically investigated and coregistered with histology. Moreover, reference 40, while using a different model, presents hyperechogenicity at 5h.

      The B-mode images in Figure 2B are presented as an illustration of the potential morphological changes detected at different timepoints. However, our study focuses on functional responses and not on the evolution of the morphological changes. Indeed, these B-mode images remain difficult to interpret, as they show a structural reorganization at the level of the ultrasound scatterers which has not been directly linked with tissue infarction, oedema, or other histological conditions.

      Regarding reference 40, the authors found hyper-echogenicity at 5h, a time window that is not covered by our protocol. In reference 29, we indeed detailed a mismatch between the µDoppler images and histopathology. As suggested by the reviewer, seeking other potential mismatches/overlaps between B-mode/µDoppler and histopathology is an interesting field of investigation, but remains out of the scope of this work.

      Results 3: Delayed alteration of the somatosensory thalamocortical pathway

      • These results are underpowered and as such should probably be removed entirely from the paper (or substantiated with greater Ns of animals).

      Based on the reviewers' comments on the limited number of recordings at 5d, we have decided to remove this statistical analysis. The manuscript, figure and caption were corrected accordingly.

      • If I am not mistaken, reference 28 describes a protocol for awake mouse imaging, and thereby does not introduce any hippocampal landmark allowing effective positioning of the probe.

      We thank the reviewer for this comment. While not used in the figure detailing image registration in reference 28, step 42 (page 17) of the protocol mentions the use of a hippocampal landmark to position the imaged brain relative to the atlas. The hippocampal landmark is also used in Brunner et al., JCBFM 2023; we have added this reference, which is more appropriate to this work (i.e., rat model, digitalized Paxinos atlas, linear ultrasound transducer).

      • Significant difference in ipsilesional VPM with post-stroke period looks spurious.

      We have amended the Materials and Methods section about statistics and provided Supplementary Figure 4.

      Discussion:

      The sentence "might result from the direct loss of the excitatory corticothalamic feedback to the VPM" should be moderated in the absence of electrophysiology support. Such a decrease could be explained by reduced perfusion due to the challenge.

      The reviewer is right; we believe the wording used in the sentence already tempers the claim. However, we clarified how such a result could be better validated.

      Original text: “Further work will need to dissect the complex and long-lasting post-stroke alterations of the functional whisker-to-barrel pathway, including at the neuronal level, as fUS only reports on hemodynamics as a proxy of local neuronal activity27,28,60,66–68“

      New text – Line 445: “Therefore, further studies will be needed to accurately dissect the complex and long-lasting post-stroke alterations of the functional whisker-to-barrel pathway, including at the neuronal level by direct electrophysiology recordings and imaging, as fUS only reports on hemodynamics as a proxy of local neuronal activity30,31,63,74–76.“

      Figure 2

      • Panel B would be more informative if presented as an average.

      The aim of this figure is to show the raw data of a typical case. Averaging µDoppler images wouldn’t be illustrative as individual vessels will not be visible anymore. Because the vessels are in different positions from one animal to another, an average image would be blurred.

      • Panel C lacks contralateral S1BF trace.

      We have provided contralesional perfusion changes for all regions in Supplementary Figure 1.

      • Methods for detection of SDs refer to non-peer-reviewed reference 29, where SD is defined as 50% over baseline level. What is the actual threshold/method used to define a SD in this study?

      We have described this procedure in more detail in the Materials & Methods section - Line 195: “The detection of hemodynamic events associated with spreading depolarizations (SDs) was performed based on the temporal analysis of the rCBV signal in the retrosplenial granular (RSGc) and dysgranular (RSD) cortices of the left hemisphere (ipsi-lesional). SDs were defined as transient increase of rCBV signal (+25%) detected with a temporal delay of <10 frames (i.e., 8secs) between the two regions of interest, validating both the hyperemia and spreading features of hemodynamic events associated with spreading depolarizations.”
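      The detection rule quoted above (a +25% rCBV increase seen in both regions with less than 10 frames between them) can be sketched as below. This is only one possible reading of the rule, not the authors' code: it assumes rCBV is expressed as percent change from baseline, defines an event onset as an upward threshold crossing, and pairs each onset in the first region with the first onset in the second region that falls within the allowed delay. All names are hypothetical.

```python
THRESHOLD_PCT = 25.0   # rCBV increase over baseline, from the text
MAX_DELAY_FRAMES = 10  # delay must be < 10 frames (8 s at 0.8 s per frame)


def onsets(rcbv_pct, threshold=THRESHOLD_PCT):
    """Frame indices where the trace crosses the threshold upward."""
    return [i for i in range(1, len(rcbv_pct))
            if rcbv_pct[i] >= threshold and rcbv_pct[i - 1] < threshold]


def detect_sd_events(roi_a, roi_b, max_delay=MAX_DELAY_FRAMES):
    """Pair upward crossings present in both ROIs within max_delay frames."""
    events = []
    b_onsets = onsets(roi_b)
    for a in onsets(roi_a):
        # Assumption: take the first onset in ROI B that is close enough.
        match = next((b for b in b_onsets if abs(b - a) < max_delay), None)
        if match is not None:
            events.append((a, match))
    return events


# Toy traces: a hyperemic wave reaching the second region 4 frames later.
rsgc = [0.0] * 20 + [30.0] * 5 + [0.0] * 20
rsd = [0.0] * 24 + [30.0] * 5 + [0.0] * 16
sd_events = detect_sd_events(rsgc, rsd)
```

      Requiring the crossing in both regions is what captures the spreading feature; a hyperemic transient confined to one region would not be counted.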

      • For panel F, a measure of variance would be more suited to show stereotypic profile across animals as the number of SDs varies between animals.

      Figure 2F indeed shows the average profile of hemodynamic events associated with spreading depolarizations (black line) with the variance (95% confidence interval error bands in gray). We have adjusted the corresponding figure caption to make this information more clear.

      Figure 3

      • The exact stimulation employed is not clear as the methods describe a 1.33 min delay between two whisker pad stimulations, but the figure reports 40s. The description is thereby ambiguous.

      We thank the reviewer for pointing out this potential confusion, which allowed us to correct a mistake:

      • The effective delay between two stimulations delivered to the whisker pads is 40 seconds

      • The effective delay between two stimulations delivered to the same whisker pad is 80 seconds from start to start or 75 seconds from end to start.

      The text was amended accordingly in line 144: “Thus, the effective delay between two stimulations delivered to the same whisker pad is 80 seconds from start to start.“
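      The corrected timing can be checked with a small schedule sketch. It assumes that stimulations alternate between the two whisker pads every 40s and last 5s each (the duration is inferred from the difference between the 80s start-to-start and 75s end-to-start delays); which pad is stimulated first is an arbitrary choice here, and the names are hypothetical.

```python
INTER_STIM_S = 40.0    # start-to-start delay between successive stimulations
STIM_DURATION_S = 5.0  # inferred: 80 s start-to-start minus 75 s end-to-start


def schedule(n_stims, first_pad="left"):
    """Return (onset in seconds, pad) pairs, alternating pads each time."""
    pads = ("left", "right") if first_pad == "left" else ("right", "left")
    return [(i * INTER_STIM_S, pads[i % 2]) for i in range(n_stims)]


events = schedule(4)
left_starts = [t for t, pad in events if pad == "left"]
same_pad_start_to_start = left_starts[1] - left_starts[0]              # 80.0 s
same_pad_end_to_start = same_pad_start_to_start - STIM_DURATION_S      # 75.0 s
```

      The earlier 1.33 min figure corresponds to the 80s same-pad delay, while the 40s in the figure is the delay between successive stimulations regardless of side.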

      • In panel B the choice of colormap and transparency for template overlay is not explained and is confusing given the employed threshold of 1.6. Which mask was used to overlay the activation map on the template? Why black color to represent a supposedly significant difference?

      We thank the reviewer for pointing out this potential confusion. We have adjusted the colormap in Figures 3 and 4.

      • The pre-stroke thalamic response is clearly localized in VPM for left stimulation, while it overlaps VPM and Po for the right stimulation. This questions the accuracy of the employed registration scheme and consequently the choice of these ROIs, which appear quite small as compared to the resolution and this positioning precision.

      We see the reviewer's point; here the apparent difference arises because the brain is slightly tilted. By adjusting the angle for both activity maps (see Author response image 1), we confirm that both maps are very similar, including for the activated areas VPM and Po.

      Author response image 1.

      • It would be interesting to see the same activation maps for all animals in supplementary.

      We have provided the Supplementary Figure 5 that contains both ipsilateral and contralateral responses to whiskers stimulation (from both left and right pads) for all trials and all rats included in this work.

      • Looking at panel C, more cortical regions seem to respond to the stimulation above S1BF.

      The reviewer is right and we have indeed mentioned this point several times in the original manuscript in:

      • the result section: “We also detected significant increase of activity in S2, AuD, Ect (*p<0.0001) and PRh (p<0.001) cortices and VPL nucleus (**p<0.01; the list of acronyms is provided in Supplementary Table 2), brain regions receiving direct efferent projections from the S1BF45,48,49, VPM or Po nuclei50–52.”

      • the caption of Figure 4: “S1BF, S2, AuD, VPM, VPL and Po regions are brain regions significantly activated (all p-values<0.01; GLM followed by t-test).”

      • the conclusion section : “Functional responses to mechanical whisker stimulation were detected in several regions relaying the information from the whisker to the cortex, including the VPM and Po nuclei of the thalamus, and S1BF, the somatosensory barrel-field cortex. Responses were also observed in the S2 cortex involved in the multisensory integration of the information43,44,61, the auditory cortex as it receives direct efferent projection from S1BF45,61, and the VPL nuclei of the thalamus connected via corticothalamic projections45.“

      • It would be interesting to see bilateral traces as supplementary figures.

      We have provided the Supplementary Figure 5 that contains both ipsilateral and contralateral responses to whiskers stimulation (from both left and right pads) for all trials and all rats included in this work.

      • In both panels C and D, n=5 is reported, but methods state the use of 7 animals. Please clarify how animals have been used in the different studies

      We have clarified the report on animal use and amended the Supplementary Table 1 accordingly.

      • In Panel D, the 95% CI intervals seem particularly narrow. Might this be the result of considering multiple trials as independent events? A GLM analysis would avoid this statistical fallacy.

      We have provided the Supplementary Figure 5 that contains both ipsilateral and contralateral responses to whiskers stimulation (from both left and right pads) for all trials and all rats included in this work. The statistical analysis has been adjusted (see Materials and Methods) and complemented with Supplementary Figure 4.

      Figure 4 - See comments above for Figure 3

      We have adjusted Figure 3 according to the reviewer's suggestions.

      Reviewer #3 (Recommendations For The Authors):

      1) Introduction: Given the emphasis on the awake state, it would be helpful to note that a significant portion of strokes occur during sleep - as well as comment on its hemodynamic difference with respect to an awake state.

      We agree with the reviewer that some strokes occur during sleep. However, here the awake state, which has been poorly addressed in the literature, is contrasted with anesthesia, a condition largely used to investigate brain functions after stroke. We added a point and corresponding references about wake-up stroke, see Line 49.

      2) The effects of anesthetics on stroke are quite variable and the literature data on the topic is rather divergent: it would be helpful for the introduction to reflect the large level of discord in the literature and the wide-ranging mechanisms of action of different anesthetics.

      We thank the reviewer for this comment. We have completed our original sentence in the introduction to better reflect the various effects of anesthetics on stroke, see Line 50

      3) The reference list (14-17) to other studies of brain hemodynamic changes post ischemic stroke is egregiously short. Please expand. Similarly, the list of citations to other functional ultrasound rodent studies in the literature (23-24) is misleading: other groups have published similar work and ought to be cited.

      We thank the reviewer for this comment and have added complementary references. However, we believe that references 14-17 pointed out by the reviewer do not refer only to brain hemodynamic changes but mostly to network and function, as stated in the manuscript. Regarding the references on fUS (23-24) mentioned by the reviewer, we did not limit our citations on functional ultrasound imaging to those 2 articles but included 15+ from 4 different research groups.

      4) It would be helpful if the authors used "spreading depolarization" the way it has been utilized in the many decades of research on them in the literature, namely, as waves of hyper/hypoactivity in the electrophysiological signals. Please use a distinct term to refer to waves of changes in the hemodynamic state.

      We have amended the terminology used in the manuscript. “Spreading depolarization” has been replaced by “hemodynamic events associated with spreading depolarizations” or similar.

      5) Why is this investigation restricted to male rats?

      As this is a proof of concept, we did not perform experiments in female rats. We agree that further investigation would require including both sexes. We added a line in the discussion.

      New text – Line 455:” Finally, it is important to note that this proof-of-concept work did not specifically focus the impact of sex dimorphism on the stroke or early behavioral outcomes following the insult that would greatly enhance the translational value of such preclinical stroke study80.”

      6) Were the animals tested during their active phase? If not, why not, and what are the implications of testing their responses during the sleep phase?

      We think there is a misunderstanding here, as we investigated brain functions in awake head-fixed rats. Therefore, the sleep/active phases were neither investigated nor mentioned in the manuscript.

      7) How is the level of stress monitored/established?

      In this work, we followed established procedures used to reduce the stress and discomfort of the rats throughout the experiment. The procedure used is now described in more detail in the Materials and Methods section. However, the level of stress was not monitored; this would be of interest to consider in future experiments.

      8) What are the sequelae of stress on brain hemodynamics, especially given 1-4 hour long sessions.

      This is a good remark. While we cannot state how stress impacts brain hemodynamics, the data show that hemodynamic response functions were stable and robust over hour-long recordings (see control and pre-stroke sessions in Supplementary Figure 5).

      9) How is the animal prepared for stroke induction? In general, the methodological steps surrounding animal handling and preparation are exceedingly terse.

      We provided more details about the handling and preparation of the rats in the Materials and Methods section.

      Original text: “Body restraint and head fixation.

      Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada), progressively increasing the restraining period from minutes to hours33,34. After the headpost implantation (see below), rats were habituated to be head-fixed while restrained in the sling. The period of fixation was progressively increased from minutes to hours. Water and food gel (DietGel, ClearH2O, USA) were provided along the habituation session. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      New text - Line 90:“ Body restraint and head fixation.

      The body restraint and head fixation procedures are adapted from published protocols and setup dedicated for brain imaging of awake rats39–41. Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada) by progressively increasing restraining periods from minutes (5mins, 10mins, 30mins) to hours (1 and 3hrs) for one or two weeks. The habituation to head-fixation started by short (5 to 30s) and gentle head-fixation of the headpost between fingers. The headpost was then secured between clamps for fixation periods progressively increased following the same procedure as with the sling. For both body restraint and head fixation, the initial struggling and vocalization diminished over sessions. Water and food gel (DietGel, ClearH2O, USA) were provided for all body restraint and head-fixation habituation sessions. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      10) What is the reproducibility of the chemo-thrombotic model timeline? What are its limitations?

      We have provided more information on the chemo-thrombotic model and its limitations in the discussion section.

      New text – Line 402:” However, to adequately and efficiently occlude the vessel of interest, removing a piece of skull remains required. As mentioned in the report on animal use, one rat was excluded from the analysis as the MCA spontaneously reperfuses, thus dropping the success rate of such model.”

      11) What is the motivation behind the 5-days post stroke timepoint selection?

      In addition to demonstrating the feasibility of imaging brain functions at different timepoints following the ischemia, the motivation to perform this delayed session was to capture functional diaschisis, which is known to occur a few days after the initial insult. More frequent imaging sessions covering a longer post-stroke period would be of high interest to better capture the impact of ischemia on both brain hemodynamics and functions.

      12) How predictive is hyperacute hemodynamics imaging of the long-term outcome?

      We thank the reviewer for this question, which remains of major interest in the stroke field. However, predicting long-term outcome would require capturing brain hemodynamics at a larger scale, as performed in Hingot et al., Theranostics 2020 and Brunner et al., JCBFM 2023, a coverage not accessible with the imaging window proposed in this work.

      13) It would be greatly reassuring if the authors presented the statistical parametric maps without masking regions of interest (eg Fig3B).

      We thank the reviewer for pointing out this potential confusion. In the first version of the figure, the colormap used for the activity maps was indeed not optimal. Therefore, we i) adjusted the colormap used in Fig 3 and 4 and ii) provided non-thresholded z-score maps for all rats in Supplementary Figure 5.

      14) Fig 3C is hard to make out.

      We provided a full page version of the Figure 3C in Supplementary Figure 3.

      15) Figs 3,4 should incorporate box and whisker plots of data across all rats and scatter plots of individual animal data.

      We are not sure which kind of data the reviewer wants to have displayed here. However, we have provided the Supplementary Figure 5 that contains both ipsilateral and contralateral responses to whiskers stimulation (from both left and right pads) for all trials and for individual animal included in this work.

      16) The final panels in Figures 3,4 would more tellingly include the plots of the linear models fitted.

      Based on all reviewers’ comments, we have adjusted and clarified the statistical analysis performed (see Materials and Method) and completed with a Supplementary Figure 4.

      17) The frame rate calculations are not adding up unless averaging and pauses are included so some more details should be stated. Are tilted plane waves averaged before compounding as in prior publications?

      Each plane-wave angle is averaged 6 times before compounding to increase the signal-to-noise ratio, and there is a pause of 0.3s between each Doppler image. See also the question “Functional Ultrasound Imaging acquisition” from reviewer 2. We also provided supplementary and key information about the sequence used in this work.

      We have provided complementary information in the manuscript:

      Original text:” The ultrasound sequence generated by the software is the same as in Macé et al.,26 and Brunner, Grillet et al., Briefly, the ultrafast scanner images the brain with 5 tilted plane-waves (-6°, -3°, +0.5°, +3°, +6°) at a 10-kHz frame rate. The 5 plane-wave images are added to create compound images at a frame rate of 500Hz. Each set of 250 compound images is filtered to extract the blood signal. Finally, the intensity of the filtered images is averaged to obtain a vascular image of the rat brain at a frame rate of 1.25Hz. Then, the acquired images are processed with a dedicated GPU architecture, displayed in real-time for data visualization, and stored for subsequent off-line analysis.”

      New text – Line 146:” The ultrasound sequence generated by the software is adapted from Macé et al.31 and Brunner, Grillet et al.34 Ultrafast images of the brain were generated using 5 tilted plane-waves (-6°, -3°, +0.5°, +3°, +6°). Each plane wave is repeated 6 times and the recorded echoes are averaged to increase the signal to noise ratio. The 5 plane-wave images are added to create compound images at a frame rate of 500Hz. To obtain a single vascular image we acquired a set of 250 compound images in 0.5s, an extra 0.3s pause is included between each image to have some processing time to display the images for real-time monitoring of the experiment. The set of 250 compound images has a mixed information of blood and tissue signal. To extract the blood signal we apply a low pass filter (cut off 15Hz) and an SVD filter that eliminates 20 singular values. This filter aims to select all the signal from blood moving with an axial velocity higher than ~1mm/s. To obtain a vascular image we compute the intensity of the blood signal i.e., Power Doppler image. This image is in first approximation proportional to the cerebral blood volume26,28. Overall, this process enables a continuous acquisition of power Doppler images at a frame rate of 1.25Hz during several hours.”

      18) Ultrasound data processing: The filtering process should have more description. It would be highly instructive to explain that the power Doppler signal is being used and comment clearly on its relationship to blood volume, commenting on stalled flow microvessels/RBC-devoid microvessels, and considerations of vessel orientation.

      The compound image contains a mix of blood and tissue signal. To extract the blood signal, we applied a low pass filter (cut off 15Hz) and an SVD filter that eliminates 20 singular values. This filter selects all the signal from blood moving with an axial velocity higher than ~1mm/s. To obtain a vascular image we compute the intensity of the blood signal (Power Doppler image). This power Doppler image is in first approximation proportional to the cerebral blood volume.

      This information has been added to the Materials and Methods section of the manuscript.

      19) Does the SVD processing have the same cut off (20 singular values) as in prior publications as a standard value, or is that adjusted for each study? There are enough minor differences between sequences that these details are uncertain. Do the overall hemodynamics measurements (Fig 2) include all data acquired, or do they exclude the whisker stimulation events, and if so, how long of a window is excluded? The explanation of the activity maps should be rephrased e.g. "... recordings are segmented in shorter 40-s time windows encompassing the whisker stimulation trials..."

      We agree that these details are important; all this information has been added to the manuscript:

      • SVD processing: We eliminate 20 singular values as in cited studies.

      • Sequence: we have included more details about the sequence.

      • Processing: all data during the whisker stimulation is used.

      • We have rephrased the explanation about the activity maps.

      20) Discuss the methodology behind histological data shown in Fig. 1.

      We thank the reviewer for highlighting this omission. We have provided a paragraph in the Materials & Methods section detailing the histology procedure (Line 228):

      “Histopathology

      Rats were killed 24hrs after the occlusion for histological analysis of the infarcted tissue. Rats received a lethal injection of pentobarbital (100mg/kg i.p. Dolethal, Vetoquinol, France). Using a peristaltic pump, they were transcardially perfused with phosphate-buffered saline followed by 4% paraformaldehyde (Sigma-Aldrich, USA). Brains were collected and post-fixed overnight. 50-μm thick coronal brain sections across the MCA territory were sliced on a vibratome (VT1000S, Leica Microsystems, Germany) and analyzed using the cresyl violet (Electron Microscopy Sciences, USA) staining procedure (see Open Lab Book for procedure). Slices were mounted with DPX mounting medium (Sigma-Aldrich, USA) and scanned using a bright-field microscope.”

    2. eLife assessment

      This important proof-of-concept study strongly supports the utility of functional ultrasound imaging for evaluating cerebral hemodynamics in rat models of brain injury. Functional ultrasound affords a distinct coverage/spatial/temporal resolution tradeoff when compared to other modalities for studying brain hemodynamics. The solid data presented indicate high fidelity of the recordings, a particular feat given that the rats were awake. On the other hand, single slice imaging and complexity of registration of subsequent imaging sessions limit the usefulness of the approach, particularly for quantitative imaging, and the small sample size will need to be followed up with and verified by future studies. This work will be of interest to researchers working in functional neuroimaging and more precisely with preclinical models of stroke in rodents.

    3. Reviewer #1 (Public Review):

      Summary: The authors apply a new approach to monitor widespread changes in sensory evoked hemodynamic activity after focal stroke in fully conscious rats. Using functional ultrasound (fUS), they report immediate and lasting (up to 5 days) depression of sensory evoked responses in somatosensory thalamic and cortical regions.

      Strengths: This is a technically challenging study that employs new methods to study more distributed changes in sensory evoked neural activity, inferred from changes in cerebral blood flow. The authors provide compelling images and rigorous analysis to support their conclusions.

      The primary weakness of this paper was the small sample size used for drawing conclusions. The authors have added additional references that help support their preliminary findings.

      Ultimately, it is a proof of concept paper showing the potential of this imaging approach for examining brain wide changes in activity before and after stroke in awake animals. In that sense, I think this paper will be well appreciated by researchers trying to understand how stroke leads to distributed changes in brain function.

    4. Reviewer #2 (Public Review):

      Brunner et al. present a new and promising application of functional ultrasound (fUS) imaging to follow the evolution of perfusion and haemodynamics upon thrombotic stroke in awake rats. The authors leveraged a chemically induced occlusion of the rat Middle Cerebral Artery (MCA) with ferric chloride in awake rats, while imaging cerebral perfusion with fUS at high spatial and temporal resolution (100µm x 110µm x 300µm x 0.8s). The authors also measured evoked haemodynamic responses at different timepoints following whisker stimulation.

      As the fUS setup of the authors is limited to 2D imaging, Brunner and colleagues focused on a single coronal slice where they identified the primary Somatosensory Barrel Field of the Cortex (S1BF), directly perfused by the MCA and relay nuclei of the Thalamus: the Posterior (Po) and the Ventroposterior Medial (VPM) nuclei of the Thalamus. All these regions are involved in the sensory processing of whisker stimulation. By investigating these regions the authors present the hyper-acute effect of the stroke with these main results:

      - MCA occlusion results in a fast and important loss of perfusion in the ipsilesional cortex.
      - Thrombolysis is followed by Spreading Depolarisation measured in the Retrosplenial cortex.
      - Stroke-induced hypo-perfusion is associated with a significant drop in ipsilesional cortical response to whisker stimulation, and a milder one in ipsilesional subcortical relays.
      - Contralesional hemisphere is almost not affected by stroke with the exception of the cortex which presents a mildly reduced response to the stimulation.

      In addition, the authors demonstrate that their protocol allows stroke evolution to be followed for up to five days post-induction. They further show that fUS can estimate the size of the infarcted volume with brilliance mode (Bmode), confirming the presence of the identified lesional tissue with post-mortem cresyl violet staining.

      Upon measuring functional response to whisker stimulation 5 days after stroke induction, the authors report that:

      - The ipsilesional cortex presents no response to the stimulation
      - The ipsilesional thalamic relays are less activated than hyper acutely

      These observations mainly validate a new method to chronically image the longitudinal sequelae of stroke in awake animals. However, the potentially more intriguing results the authors describe in terms of functional reorganization of functional activity following stroke will require additional data to be validated. While highly preliminary, the research model proposed by the authors (where the loss of the infarcted cortex induces reduced activity in connected regions, whether by cortico-thalamic or cortico-cortical loss of excitatory drive) is interesting. This hypothesis would require a greatly expanded, sufficiently powered study to be validated (or disproven).

    5. Reviewer #3 (Public Review):

      The authors set out to demonstrate the utility of functional ultrasound for evaluating changes in brain hemodynamics elicited acutely and subacutely by the middle cerebral artery occlusion model of ischemic stroke in awake rats. Functional ultrasound affords a distinct set of tradeoffs relative to competing imaging modalities. Acclimatization of rats for awake imaging has proven difficult with most, and the high quality of presented data in awake rats is a major achievement. The major weakness of the approach is in its being restricted to single slice acquisitions, which also complicates registration of acquisition across multiple imaging sessions within the same animal. Establishing that awake imaging represents an advancement in relation to studies under anesthesia hinges upon establishment of the level of stress experienced by the animals in the course of imaging, i.e., requires providing data on the assessment of stress over the course of these long imaging sessions, which was not undertaken. This is particularly significant given that physical restraint has been established to be a particularly potent stressor in experimental models of stress. Assessment of the robustness of these measurements in a larger cohort of animals under varying conditions is of particular significance for supporting its wide applicability.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We were pleased with seeing our work published as a Reviewed Preprint online so swiftly. Now, we would like to take the opportunity to include our responses to the comments made by the reviewers into the Reviewed Preprint and also submit a revised version of the manuscript, in which we have incorporated and addressed the reviewers’ comments.

      We believe that our revisions have significantly improved the quality of the manuscript. Specifically, we have described our results more precisely and explained certain decisions that were made in the analysis pipeline more clearly. For example, Figure 4 was improved substantially, by incorporating a schematic representation of how ERP traces were extracted from neural data. Furthermore, we have added three paragraphs in the Discussion where we elaborate on 1) the two observed interaction effects between attention and drug condition, 2) the relation between behavioral, computational, and neural effects, and 3) the statistical robustness of our findings. As such, we believe our interpretation of the results and their robustness now more faithfully represents our observations.

      Moreover, we have incorporated the Supplementary Information and Figures, initially presented as a separate section of the manuscript, into the main manuscript and its accompanying supplementary figures. Thereby, the structure of the paper now better follows the eLife format. As a result, some of the previously included supplementary figures are now described in the text of the main manuscript.

      Reviewer #1 comments:

      In the results section on page 6, the authors conclude that "Attention and ATX both enhanced the rate of evidence accumulation towards a decision threshold, whereas cholinergic effects were negligible." I believe "negligible" is wrong here: the corresponding effects of donepezil had p-values of .09 (effect of donepezil on drift rate), .07 (effect of donepezil on the cue validity effect on drift rate) and .09 (effect of donepezil on non-decision time), and were all in the same direction as the effects of atomoxetine, and would presumably have been significant with a somewhat larger sample size. I would say the effects of donepezil were "in the same direction but less robust" (or at the very least "less robust") instead of "negligible".

      We agree with the reviewer that ‘negligible’ may not properly capture the effects of DNP on DDM parameter estimates. Although we do feel that caution is warranted in interpreting the effects of DNP on computational parameter estimates, we have now described these effects in line with the reviewer’s suggestion: in the same direction as the effects of ATX, but not (or less) statistically robust.

      "In the results section on page 8, the authors conclude that "Summarizing, we show that drug condition and cue validity both affect the CPP, but they do so by affecting different features of this component (i.e. peak amplitude and slope, respectively)." This conclusion is a bit problematic for two reasons. First, drug condition had a significant effect not only on peak amplitude but also on slope. Second, cue validity had a significant effect not only on slope but also on peak amplitude. It may well be that some effects were more significant than others, but I think this does not warrant the authors' conclusion.

      Indeed, we observed that cue validity affected both CPP peak amplitude and slope and some effects were more significant than others. As such, we agree with the reviewer that the conclusion that cue validity and drug condition affect different features of the CPP was too strongly formulated. We have changed this statement in the manuscript to reflect the observed data pattern more appropriately. We would however like to point out that this does not undermine our main conclusion. Spatial attention and drug condition showed only limited interaction effects in terms of behavior and neural data and their effects on occipital activity were separable in terms of timing and spatial profile. Therefore, our conclusion that catecholamines and spatial attention jointly shape perceptual decision-making remains valid.

      In the discussion section on page 11, the authors conclude that "First, although both attention and catecholaminergic enhancement affected centro-parietal decision signals in the EEG related to evidence accumulation (O'Connell et al., 2012; Twomey et al., 2015), attention mainly affected the build-up rate (slope) whereas ATX increased the amplitude of the CPP component (Figure 3D-F)." As I wrote above, I believe it is not correct that "attention mainly affected the build-up rate or slope", given that the effect of cue-validity on CPP slope was also significant. Also, while the authors' data do support the conclusion that ATX increased the amplitude and not the slope of the CPP component, a previous study in humans found the opposite: ATX increased the slope but did not affect the peak amplitude of the CPP (Loughnane et al 2019, JoCN, https://pubmed.ncbi.nlm.nih.gov/30883291). Although the authors cite this study (as from 2018 instead of 2019), they do not draw attention to this important discrepancy between the two studies. I encourage the authors to dedicate some discussion to these conflicting findings.

      We thank the reviewer for spotting this error: we cited the preprint version (from 2018) of Loughnane and colleagues and not the published JoCN paper (from 2019). We have changed this in the updated version of the manuscript. We further thank the reviewer for asking about this interesting discrepancy between our observation that ATX increased CPP peak amplitude in the absence of slope effects and the observation by Loughnane et al. (2019, JoCN) that ATX increased CPP slope, but not amplitude. We first would like to point out that the peak amplitude effect in Loughnane et al. (2019) was in the same direction as our reported effect, with numerically higher peak amplitudes for ATX compared to PLC (Figure 2A – right panel in Loughnane et al., 2019). However, as their omnibus main effect of drug condition on CPP peak amplitude was not significant, they did not provide statistics for a pairwise comparison of ATX and PLC in terms of CPP peak amplitude, which makes it hard to compare the effects directly. Regardless, Loughnane et al. (2019) did observe an effect on CPP slope, whereas we did not. Speculatively, this difference could be related to the behavioral tasks that were used in both studies. Below we have added a new paragraph from the Discussion in which we elaborate on this more.

      In Discussion, page 15:

      Here, we demonstrated that response accuracy and response speed are differentially represented in the CPP, with correct vs. erroneous responses resulting in a higher slope and peak amplitude, whereas fast vs. slow responses are only associated with increased slopes (Figure 3A-B). Speculatively, the specific effect of any (pharmacological) manipulation on the CPP may depend on task-setting. For example, Loughnane et al. (2019) used a visual task on which participants did not make many errors (hit rate>98%, no false alarms), whereas we applied a task in which participants regularly made errors (roughly 25% of all trials). Possibly, the effects of ATX from Loughnane et al. (2019) in terms of behavior (RT effect, not accuracy/d’) and CPP feature (slope effect, not peak) may therefore have been different from the effects of ATX we observed on behavior (d’ effect, not RT) and CPP feature (peak effect, not slope). Regardless, when we compared subjects with high and low drift rates (Figure 3C), we observed that both CPP slope and CPP peak were increased for the high vs. low drift group (independent of the drug or attentional manipulation). This indicates that both CPP slope and CPP peak were associated with drift rate from the DDM. Clearly, more work is needed to fully understand how evidence accumulation unfolds in neural systems, which could consequently inform future behavioral models of evidence accumulation as well.

      On page 12 and page 14 the authors suggest a selective effect of ATX on tonic catecholamine activity, but to my knowledge the exact effects of ATX on phasic vs. tonic catecholamine activity are unknown. Although microdialysis studies have shown that a single dose of atomoxetine increases catecholamine concentrations in rodents, it is unknown whether this reflects an increase in tonic and/or phasic activity, due to the limited temporal resolution of microdialysis. Thus, atomoxetine may affect tonic and/or phasic catecholamine activity, and which of these two effects dominates is still unknown, I think.

      We agree with the reviewer that the direct effects of ATX on tonic versus phasic catecholaminergic activity are not as clear as initially stated in the manuscript. Equally problematic, previous work has demonstrated that changes in tonic neuromodulation shape evoked neuromodulatory discharge (Aston-Jones & Cohen, 2005, Annu. Rev. Neurosci; Knapen et al., 2016, PLoS ONE). As such, any effect of ATX on tonic neuromodulatory drive would probably have affected phasic catecholaminergic responses as well, although this claim will have to be experimentally addressed. We think that because of the close relation between tonic and phasic neuromodulation, it may indeed be better to refrain from the simplistic interpretation that ATX (and DNP) solely and specifically affects tonic neuromodulation. We have used more neutral language in that regard in the updated version of the manuscript, for example by only mentioning elevated neuromodulator levels (not specifying tonic or phasic). Moreover, we have extended a part of our previous Discussion, to elaborate on this issue in more detail. An excerpt of this paragraph, consisting of previous and newly added text, can be seen below.

      In Discussion, page 14:

      In contrast with recent work associating catecholaminergic and cholinergic activity with attention by virtue of modulating prestimulus alpha-power shifts (Bauer et al., 2012; Dahl et al., 2020, 2022) and attentional cue-locked gamma-power (Bauer et al., 2012; Howe et al., 2017), the current work shows that the effects of neuromodulator activity are relatively global and non-specific, whereas the effects of spatial attention are more specific to certain locations in space. Our findings are, however, not necessarily at odds with these previous studies. Most recent work associates phasic (event-related) arousal with selective attention (for reviews see: Dahl et al., 2022; Thiele & Bellgrove, 2018). For example, cue detection in visual tasks is known to be related to cholinergic transients occurring after cue onset (Howe et al., 2017; Parikh et al., 2007). Contrarily, in our work we aimed to investigate the effects of increased baseline levels of neuromodulation by suppressing the reuptake of catecholamines and the breakdown of acetylcholine throughout cortex and subcortical structures. Tonic and phasic neuromodulation have previously been shown to differentially modulate behavior and neural activity (de Gee et al., 2014, 2020, 2021; McGinley et al., 2015; McGinley, Vinck, et al., 2015; van Kempen et al., 2019). Note, however, that it is difficult to investigate causal effects of tonic neuromodulation in isolation of changes in phasic neuromodulation, mostly because phasic and tonic activity are thought to be anti-correlated, with lower phasic responses following high baseline activity and vice versa (Aston-Jones & Cohen, 2005; de Gee et al., 2020; Knapen et al., 2016). As such, pharmacologically elevating tonic neuromodulator levels may have resulted in changes in phasic neuromodulatory responses as well. Concurrent and systematic modulations of tonic (e.g. with pharmacology) and phasic (e.g. with accessory stimuli; Bruel et al., 2022; Tona et al., 2016) neuromodulator activity may be necessary to disentangle the respective and interactive effects of tonic and phasic neuromodulator activity on human perceptual decision-making.

      Reviewer #2 comments:

      The main weakness of the paper lies in the strength of evidence provided, and how the results tally with each other. To begin with, there are a lot of significance tests performed here, increasing the chances of false positives. Multiple comparison testing is only performed across time in the EEG results, and not across post-hoc comparisons throughout the paper. In and of itself, it does not invalidate any result per se, but it does colour the interpretation of any results of weak significance, of which there are quite a few. For example, the effect of Drug on d' and subsequent post-hoc comparisons, also effect of ATX on CPP amplitude and others.

      We agree with the reviewer that the statistical evidence for some of the results presented in this study is limited. This issue mostly concerns the effects of the pharmacological manipulation (effects of attention were strong and robust), which is unfortunately often the case given the high inter-individual variability in responses to pharmaceutical agents. We have added a paragraph to the Discussion in which we discuss this limitation of the current study. Furthermore, we discuss our findings in the context of previous work, thereby showing that, although not always robust, most of the reported drug effects were in the direction that could be expected based on previous literature. We have pasted that paragraph below.

      In Discussion, page 16:

      Although the effects of the attentional manipulation were generally strong and robust, the statistical reliability of the effects of the pharmacological manipulation was more modest for some comparisons. This may partly be explained by high inter-individual variability in responses to pharmaceutical agents. For example, initial levels of catecholamines may modulate the effect of catecholaminergic stimulants on task performance, as task performance is supposed to be optimal at intermediate levels of catecholaminergic neuromodulation (Cools & D’Esposito, 2011). While acknowledging this, we would like to highlight that many of the observed effects of ATX were in the expected direction and in line with previous work. First, pharmacologically enhancing catecholaminergic levels has previously been shown to increase perceptual sensitivity (d’) (Gelbard-Sagiv et al., 2018), a finding that we have replicated here. Second, methylphenidate (MPH), a pharmaceutical agent that elevates catecholaminergic levels as well, has been shown to increase drift rate as derived from drift diffusion modeling on visual tasks (Beste et al., 2018) in line with our ATX observations. Third, a previous study using ATX to elevate catecholaminergic levels observed that ATX increased CPP slope (Loughnane et al., 2019). Although in our case ATX increased the CPP peak and not its slope, this provides causal evidence that centro-parietal ERP signals related to sensory evidence accumulation are modulated by the catecholaminergic system (Nieuwenhuis et al., 2005). Fourth, we observed that elevated levels of catecholamines affected stimulus driven occipital activity relatively late in time and close to the behavioral response, which resonates with previous observations (Gelbard-Sagiv et al., 2018). Finally, ATX had robust effects on physiological responses (heart rate, blood pressure, pupil size), cue-locked ERP signals and oscillatory power dynamics in the alpha-band, leading up to stimulus presentation. We concur, however, that more work is needed to firmly establish how (various forms of) attention and catecholaminergic neuromodulation affect perceptual decision-making.

      The lack of an overall RT effect of Drug leaves any DDM result a little underwhelming. How do these results tally? One potential avenue for lack of RT effect in ATX condition is increased drift rate but also increased non-decision time, working against each other. However, it may be difficult to validate these results theoretically.

      As the reviewer remarks, an increase in performance/d’ in absence of any RT effects can be algorithmically explained by a combination of increased drift rate and prolonged non-decision time. This is indeed what we observed for ATX. Non-decision time is generally thought to reflect the time necessary for stimulus encoding and motor execution and as such is seen as separate from the evidence-accumulation decision process. We deem it possible that ATX simultaneously prolonged stimulus encoding/motor execution (reflected in changes in non-decision time) and sped up evidence accumulation (reflected in changes in drift rate). Although our neural data did not provide evidence for this claim, previous work has demonstrated that increased baseline (pupil-linked) arousal/neuromodulation is associated with a decreased build-up rate of a neural signal associated with motor execution (β-power over motor cortex, Van Kempen et al., 2019, eLife), potentially linking increased non-decision time under ATX to slowing down of motor execution processes. The same authors also report relationships between baseline (pupil-linked) arousal/neuromodulation and activity over occipital and centroparietal cortices, respectively associated with sensory processing and sensory evidence accumulation, suggesting that baseline neuromodulation may affect all stages leading up to a decision (sensory processing, evidence accumulation and motor execution). Note also that the attentional manipulation seems to simultaneously increase drift rate and shorten non-decision time in our case, as one would expect (Figure 2E, Figure 2 – Supplements 4&5).
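      The algorithmic argument above can be illustrated with a minimal sketch. It assumes the textbook closed-form expressions for an unbiased two-boundary drift diffusion model (starting point midway between the bounds, noise scale s = 1), which may differ from the model variant fitted in the manuscript. The function name and all parameter values are illustrative assumptions, not the reported estimates.

```python
import math


def ddm_predictions(v, a, t0, s=1.0):
    """Closed-form accuracy and mean RT for an unbiased two-boundary DDM.

    v  : drift rate
    a  : boundary separation (starting point assumed at a / 2)
    t0 : non-decision time in seconds
    s  : within-trial noise scale (assumed to be 1.0 here)
    """
    accuracy = 1.0 / (1.0 + math.exp(-v * a / s**2))
    mean_decision_time = (a / (2.0 * v)) * math.tanh(v * a / (2.0 * s**2))
    return accuracy, t0 + mean_decision_time


# Illustrative (hypothetical) parameter values, not fitted estimates.
baseline = ddm_predictions(v=1.0, a=1.5, t0=0.30)

# A higher drift rate alone raises accuracy and shortens mean RT.
higher_drift = ddm_predictions(v=1.3, a=1.5, t0=0.30)

# Lengthening non-decision time by the saved decision time cancels the RT gain
# while leaving the accuracy gain untouched.
extra_t0 = baseline[1] - higher_drift[1]
compensated = ddm_predictions(v=1.3, a=1.5, t0=0.30 + extra_t0)

print(f"baseline:     accuracy={baseline[0]:.3f}, mean RT={baseline[1]:.3f} s")
print(f"higher drift: accuracy={higher_drift[0]:.3f}, mean RT={higher_drift[1]:.3f} s")
print(f"compensated:  accuracy={compensated[0]:.3f}, mean RT={compensated[1]:.3f} s")
```

      Under these assumptions, accuracy depends only on drift rate and boundary separation, so a change in non-decision time shifts mean RT without affecting performance. This is why the two parameter changes can offset each other in RT while accuracy still improves.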

      There is an interaction between ATX and Cue in terms of drift rate, this goes against the main thesis of the paper of distinct and non-interacting contributions of neuromodulators and attention. This finding is then ignored. There is also a greater EDAN later for ATX compared to PLA later in the results, which would also indicate interaction of neuromodulators and attention but this is also somewhat ignored.

      There are indeed some interesting interaction effects between ATX and spatial attention (cue), as pointed out by the reviewer. However, we did also observe striking differences in the effects of ATX and attention on stimulus-locked occipital activity (in timing and spatial specificity) as well as independent (main) effects on CPP amplitude and pre-stimulus alpha power. Therefore, throughout the paper we tried to carefully describe the effects of attention and ATX as largely independently and jointly modulating perceptual decision-making, while at the same time highlighting the interaction effects that we observed, where present. We have highlighted the effects the reviewer refers to even more explicitly in a separate paragraph that we added to the discussion, pasted below.

      In Discussion, page 13-14:

      We did observe two striking interaction effects between the catecholaminergic system and spatial attention. First, effects of attention on drift rate were increased under catecholaminergic enhancement (Figure 2D). Although this interaction effect was not reflected in CPP slope/peak amplitude, this does suggest that catecholamines and spatial attention might together shape sensory evidence accumulation in a non-linear manner. Second, the amplitude of the cue-locked early lateralized ERP component (resembling the EDAN) was increased under ATX as compared to PLC. The underlying neural processes driving the EDAN ERP, as well as its associated functions, have been a topic of debate. Some have argued that the EDAN reflects early attentional orienting (Praamstra & Kourtis, 2010) but others have claimed it is merely a visually evoked response and reflects visual processing of the cue (Velzen & Eimer, 2003). Thus, whether this effect reflects a modulation of ATX on early attentional processes or rather a modulation of early visual responses to sensory input in general is a matter for future experimentation.

      The CPP results are somewhat unclear. Although there is an effect of ATX on drift rate algorithmically, there is no effect of ATX on CPP slope. On the other hand, even though there is no effect of DNP on drift rate, there is an effect of DNP on CPP slope. Perhaps one may say that the effect of DNP on drift rate trended towards significance, but overall the combination of effects here is a little unconvincing. In addition, there is an effect of ATX on CPP amplitude, but how does this tally with behaviour? Would you expect greater CPP amplitude to lead to faster or slower RTs? The authors do recognise this discrepancy in the Discussion, but discount it by saying the relationship between algorithmic and CPP parameters in terms of DDM is unclear, which undermines the reasoning behind the CPP analyses (and especially the one correlating CPP slope with DDM drift rate).

      We thank the reviewer for pointing out this dissociation of drug effects in terms of the algorithmic (DDM) and neural (CPP) ‘implementations’ of the evidence accumulating process underlying perceptual decisions. We have added a new paragraph to the discussion where we interpret the effects of ATX on the neural and algorithmic levels of evidence accumulation. Below we have pasted that paragraph:

      In Discussion, page 14-15:

      We reported attentional and neuromodulatory effects on algorithmic (DDM, Figure 2) and neural (CPP, Figure 3) markers of sensory evidence accumulation. Recent work has started to investigate the association of these two descriptors of the accumulation process, aiming to uncover whether neural activity over centroparietal regions reflects evidence accumulation, as proposed by computational accumulation-to-threshold models (Kelly & O’Connell, 2015; O’Connell et al., 2018; O’Connell & Kelly, 2021; Twomey et al., 2015). Currently, the CPP is often thought to reflect the decision variable, i.e. the (unsigned) evidence for a decision (Twomey et al., 2015), and consequently its slope should correspond with drift rate, whereas its amplitude at any time should correspond with the so-far accumulated evidence. As, computationally, the decision is reached when evidence crosses a decision bound (the threshold), it may be argued that the peak amplitude of the CPP (roughly) corresponds with the decision boundary. This seems to contradict our observation that 1) ATX modulated drift rate, but not CPP slope and 2) ATX did not modulate boundary separation, but did modulate CPP peak. Note, however, that previous studies using pharmacology or pupil-linked indexes of (catecholaminergic) neuromodulation have also demonstrated effects on both CPP peak (van Kempen et al., 2019) and CPP slope (Loughnane et al., 2019).

      The posterior component effects are problematic. The main issue is the lack of clarification of and justification for the choice of posterior component. The analysis is introduced in the context of the target selection signal the N2pc/N2c, but the component which follows is defined relative to Cue, albeit post-target. Thus this analysis tells us the effect of Cue on early posterior (possibly) visual ERP components, but it is not related to target selection as it is pooled across target/distractor. Even if we ignore this, the results themselves wrt Drug lack context. There is a trending lower amplitude for ATX at later latencies at temporo-parietal electrodes, and more positive for DNP, relative to PLA. Is this what one would expect given behaviour? This is where the issue of correct component identification becomes critical in order to inform any priors on expected ERP results given behaviour.

      We thank the reviewer for raising this issue with the occipital ERP analysis, allowing us to clarify our decisions regarding the analyses and our interpretations of the results. First, the selection of electrodes was based on, and identical to, previous studies investigating lateralized target selection signals in visual tasks containing bilateral visual stimuli (Loughnane et al., 2016; Newman et al., 2017; Papaioannou & Luck, 2020; van Kempen et al., 2019). Second, the ERPs were defined relative to both the direction of the cue as well as the location of the target. As cue direction and target location were not always congruent (cue validity=80%), we could adopt a 2x2 (cue direction x stimulus identity) design for our ERP analyses (we are ignoring drug condition for explanation purposes). For example, for validly cued target trials we extracted two ERP traces: 1) from the hemisphere contralateral to both the cue and the target stimulus (representing processing of cued target stimulus) and 2) from the hemisphere ipsilateral to the cue and the target stimulus (representing processing of non-cued noise stimulus). However, for invalidly cued trials, ERP traces were extracted from 3) the hemisphere contralateral to cue direction and ipsilateral to the target stimulus (reflecting processing of cued noise stimuli) as well as 4) from the hemisphere ipsilateral to cue direction but contralateral to the target stimulus (reflecting processing of non-cued target stimuli). By defining our ERPs as such, we were able to gauge effects of cue direction (reflecting general shifts in attention), stimulus identity (reflecting target vs. noise selection processes) and their interaction (reflecting cue validity) on occipito-temporal activity. Third, we did not pool data (across target/noise stimuli) for statistical analyses, but only for visualization purposes. To clarify how we extracted ERP traces, we have changed Figure 4 substantially. The updated figure now contains a schematic of how these four distinct ERP traces (cue x stimulus identity) were extracted from neural activity. Moreover, for clarity's sake, we now show all 12 ERP traces (3x2x2, drug condition x cue direction x stimulus identity) as well as the three main effects that we observed after performing a 3x2x2 repeated measures (rm)ANOVA over time.
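      The 2x2 labelling of ERP traces described above can be sketched as follows. This is only an illustration of the stated logic (contralateral to the cue means "cued", contralateral to the target means "target"). The function name, the argument names and the string encoding of sides are hypothetical and are not taken from the authors' analysis code.

```python
VALID_SIDES = ("left", "right")


def label_erp_trace(cue_side, target_side, hemisphere):
    """Label the stimulus processed by one hemisphere on a single trial.

    A hemisphere processes the stimulus in the opposite visual hemifield, so it
    is 'cued' when contralateral to the cue direction and handles the 'target'
    when contralateral to the target location (otherwise the noise stimulus).
    """
    for name, value in (("cue_side", cue_side),
                        ("target_side", target_side),
                        ("hemisphere", hemisphere)):
        if value not in VALID_SIDES:
            raise ValueError(f"{name} must be 'left' or 'right', got {value!r}")

    attention = "cued" if hemisphere != cue_side else "non-cued"
    stimulus = "target" if hemisphere != target_side else "noise"
    return f"{attention} {stimulus}"


# Valid trial (cue and target both on the left): two traces.
print(label_erp_trace("left", "left", "right"))   # hemisphere contralateral to both
print(label_erp_trace("left", "left", "left"))    # hemisphere ipsilateral to both

# Invalid trial (cue left, target right): the two remaining traces.
print(label_erp_trace("left", "right", "right"))  # contralateral to cue only
print(label_erp_trace("left", "right", "left"))   # contralateral to target only
```

      Crossing these four labels with the three drug conditions gives the 12 traces (3x2x2) mentioned in the response.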

      We observed robust (cluster-corrected) effects of cue direction (not validity) on early occipital activity (Fig. 4C – left panel) and of stimulus identity (target/noise) and drug condition on later occipital activity (Fig. 4C – middle and right panel). These results crucially highlight the different temporal (early/late) and spatial (lateralized/not lateralized) profiles of cue, target and drug effects on occipital activity. Moreover, we observed a specific order of drug effects on late occipital activity (DNP>PLC>ATX). The behavioral relevance of this pattern of effects remains elusive. Although the effects of drug condition coincide in time with those of target selection (i.e. when activity contralateral and ipsilateral to the target stimulus was different), the effects of drug were bilateral, meaning that occipito-temporal activity related to the processing of the target (task-relevant) stimulus and non-target (task-irrelevant) stimulus was equally modulated by these pharmaceutical agents. One might argue that these effects show that neither ATX nor DNP modulated the signal-to-noise ratio (SNR), a feature that describes how well relevant stimulus information (signal) can be discerned from irrelevant information (noise). Although it may be tempting to extrapolate this finding to behavior, by suggesting that on the basis of these drug effects neither ATX nor DNP could have modulated d’ (behavioral measure describing how well signal is separated from noise), we would like to point out that our behavioral task specifically concerned a discrimination task about the (orientation of the) target stimulus in which the difference between signal and noise was only relevant for localization purposes and thus has a less direct relation with task performance. As such it is difficult to grasp how the modulation of late occipito-temporal activity by ATX and DNP relates to their behavioral effects. Moreover, the bilateral effect of both ATX and DNP also suggests an absence of interaction effects between drug conditions and visuo-spatial attention, as the effects of ATX/DNP were similar across all cue and target identity conditions.

    2. eLife assessment

      This important study shows that pharmacologically enhanced catecholamine levels and increased voluntary spatial attention have overlapping as well as dissociable effects on performance on a visuospatial attention task and corresponding EEG markers. The findings provide solid evidence regarding how neuromodulatory arousal and selective spatial attention jointly shape perceptual decision-making.

    3. Joint Public Review:

      The authors aimed to contrast the effects of pharmacologically enhanced catecholamine and acetylcholine levels versus the effects of voluntary spatial attention on decision making in a standard spatial cuing paradigm. In a meticulously reported study, the authors show that atomoxetine, a norepinephrine reuptake inhibitor, and cue validity both enhance model-based evidence accumulation rate, but have several distinct effects on EEG signatures of pre-stimulus cortical excitability, evoked sensory EEG potentials and perceptual evidence accumulation. The results are based on a reasonable sample size (N=28) and state-of-the-art modeling and EEG methods.

      The authors' EEG findings provide solid evidence for the overall conclusion that selective attention and neuromodulatory systems shape perception in "similar, unique, and interactive" ways. This is an important conclusion because neuromodulatory systems and selective spatial attention are both known to regulate the neural gain of task-relevant single neurons and neural networks. Apparently, these effects on neural gain affect decision making in partly overlapping and partly dissociable ways.

      The effects of donepezil, a cholinesterase inhibitor, were generally less strong than those of atomoxetine, and in various analyses went in the opposite direction. The authors fairly conclude that more work is necessary to determine the effects of cholinergic neuromodulation on perceptual decision making.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript by Hariani et al. presents experiments designed to improve our understanding of the connectivity and computational role of Unipolar Brush Cells (UBCs) within the cerebellar cortex, primarily lobes IX and X. The authors develop and cross several genetic lines of mice that express distinct fluorophores in subsets of UBCs, combined with immunocytochemistry that also distinguishes subtypes of UBCs, and they use confocal microscopy and electrophysiology to characterize the electrical and synaptic properties of subsets of so-labelled cells, and their synaptic connectivity within the cerebellar cortex. The authors then generate a computer model to test the possible computational functions of such interconnected UBCs.

      Using these approaches, the authors report that:

      1) GRP-driven TDtomato is expressed exclusively in a subset (20%) of ON-UBCs, defined electrophysiologically (excited by mossy fiber afferent stimulation via activation of UBC AMPA and mGluR1 receptors) and immunocytochemically by their expression of mGluR1.

      2) UBCs ID'd/tagged by mCitrine expression in Brainbow mouse line P079 are expressed in a similar minority subset of OFF-UBCs defined electrophysiologically (inhibited by mossy fiber afferent stimulation via activation of UBC mGluR2 receptors) and immunocytochemically by their expression of Calretinin. However, such mCitrine expression was also detected in some mGluR1 positive UBCs, which may not have shown up electrophysiologically because of the weaker fluorophore expression without antibody amplification.

      This is correctly stated with the exception that the P079 mouse line itself expresses mCitrine. The Brainbow mouse line was used in the connectivity study by crossing it to the GRP-Cre or Calretinin-Cre lines.

      3) Confocal analysis of crossed lines of mice (GRP X P079) stained with antibodies to mGluR1 and calretinin documented the existence of all possible permutations of interconnectivity between cells (ON-ON, ON-OFF, OFF-OFF, OFF-ON), but their overall abundance was low, and neither their absolute nor relative abundance was quantified.

      They were certainly rare to observe using our approaches, but we reasoned that the densities of such connections are not possible to estimate accurately. Please see discussion below.

      4) A computational model (NEURON) indicated that the presence of an intermediary UBC (in a polysynaptic circuit from MF to UBC to UBC) could prolong bursts (MF-ON-ON), prolong pauses (MF-ON-OFF), cause a delayed burst (MF-OFF-OFF), or cause a delayed pause (MF-OFF-ON) relative to solely MF to UBC synapses, which would simply exhibit long bursts (MF-ON) or long pauses (MF-OFF).

      The authors thus conclude that the pattern of interconnected UBCs provides an extended and more nuanced pattern of firing within the cerebellar cortex that could mediate longer-lasting sensorimotor responses.

      The cerebellum's long-known role in motor skills and reflexes, and associated disorders, combined with our nascent understanding of its role in cognitive, emotional, and appetitive processing, makes understanding its circuitry and processing functions of broad interest to the neuroscience and biomedical community. The focus on UBCs, which are largely restricted to vestibular lobules of the cerebellum reduces the breadth of likely interest somewhat. The overall design of specific experiments is rigorous and the use of fluorophore expressing mouse lines is creative. The data that is presented and the writing are clear. However, the overall experimental design has issues that reduce overall interpretation (please see specific issues for details), which combined with a lack of thorough analysis of the experimental outcomes severely undermines the value of the NEURON model results and the advance in our understanding of cerebellar processing in situ (again, please see specific issues for details).

      Specific issues:

      1) All data gathered with inhibition blocked. All of the UBC response data (Fig. 1) was gathered in the presence of GABAAR and Glycine R blockers. While such an approach is appropriate generally for isolating glutamatergic synaptic currents, and specifically for examining and characterizing monosynaptic responses to single stimuli, it becomes problematic in the context of assaying synaptic and action potential response durations for long-lasting responses, and in particular for trains of stimuli, when feed-forward and feed-back inhibition modulates responses to afferent stimulation. That is, even for single MF stimuli, given the >500ms duration of UBC synaptic currents, there is plenty of time for feedback inhibition from Golgi cells (or feedforward, from MF to Golgi cell excitation) to interrupt AP firing driven by the direct glutamatergic synaptic excitation. This issue is compounded further for all of the experiments examining trains of MF stimuli. Beyond the impact of feedback inhibition on the AP firing of any given UBC, it would also obviously reduce/alter/interrupt that UBC's synaptic drive of downstream UBCs. This issue fundamentally undermines our ability to interpret the simulation data of Vm and AP firing of both the modeled intermediate and downstream UBC, in terms of applying it to possible cerebellar cortical processing in situ.

      The goal of Figure 1 was to determine the cell types of labeled UBCs in transgenic mouse lines, which is determined entirely by their synaptic responses to glutamate (Borges-Merjane and Trussell, 2015). Thus, blocking inhibition was essential to produce clear results in the characterization of GRP and P079 UBCs. While GABAergic/glycinergic feedforward and feedback inhibition is certainly important in the intact circuit, it was not our intention, nor was it possible, to study its contribution in the present study. Leaving inhibition unblocked does not lead to a physiologically realistic stimulation pattern in acute brain slices, because electrical stimulation produces synchronous excitation and inhibition by directly exciting Golgi cells, rather than their synaptic inputs. The main inhibition that UBCs receive that is crucial to determining burst or pause durations is not via GABA/glycine, but instead through mGluR2, which lasts for 100-1000s of milliseconds. The main excitation that drives UBC firing is mGluR1 and AMPA, which both last 100-1000s of milliseconds. Thus, these large conductances are unlikely to be significantly shaped by 1-10 ms IPSCs from feedforward and feedback GABA/glycine inhibition. Recent studies that examined the duration of bursting or pausing in UBCs had inhibition blocked in their experiments, presumably for the reasons outlined above (Guo et al., 2021; Huson et al., 2023).

      In Author response image 1 is an example showing the synaptic currents and firing patterns in an ON UBC before and after blocking inhibition. The GABA/glycinergic inhibition is fast, occurs soon after the stimuli and has little to no effect on the slow inward current that develops after the end of stimulation, which is what drives firing for 100s of milliseconds.

      Author response image 1.

      Example showing small effect of GABAergic and glycinergic inhibition on excitatory currents and burst duration. A) Excitatory postsynaptic currents in response to a train of 10 presynaptic stimuli at 50 Hz before (black) and after (grey) blocking GABA and glycine receptors. The slow inward current that occurs at the end of stimulation is little affected. B) Expanded view of the synaptic currents evoked during the train of stimuli. GABA/glycine receptors mediate the fast outward currents that occur immediately after the first couple of stimuli. C) Three examples of the bursts caused by the 50 Hz stimulation in the same cell without blocking GABA and glycine receptors. D) Three examples in the same cell after blocking GABA and glycine receptors.

      2) No consideration for the involvement of polysynaptic UBCs driving UBC responses to MF stimulation in electrophysiology experiments. Given the established existence (in this manuscript and Dino et al. 2000 Neurosci, Dino et al. 2000 ProgBrainRes, Nunzi and Mugnaini 2000 JCompNeurol, Nunzi et al. 2001 JCompNeurol) of polysynaptic connections from MFs to UBCs to UBCs, the MF-evoked UBC responses established in this manuscript, especially responses to trains of stimuli, could be mediated by direct MF inputs, by polysynaptic UBC inputs, or possibly both (to my awareness not established either way). Thus the response durations could already include extension of duration by polysynaptic inputs, and so would overestimate the duration of monosynaptic inputs, and thus polysynaptic amplification/modulation, observed in the NEURON model.

      We are confident that the synaptic responses shown are monosynaptic for several reasons. UBCs receive a single mossy fiber input on their dendritic brush, and thus if our stimulation produces a reliable, short-latency response consistent with a monosynaptic input, then there is not likely to be a disynaptic input, because the main input is accounted for by the monosynaptic response. In all cells included in our data set, the fast AMPA receptor-mediated currents always occurred with short latency (1.24 ± 0.29 ms; mean ± SD; n = 13), high reliability (no failures to produce an EPSC in any of the 13 GRP UBCs in this data set), and low jitter (SD of latency; 0.074 ± 0.046 ms; mean ± SD; n = 13). These measurements have been added to the results section. In some rare cases, we did observe disynaptic currents, which were easily distinguishable because a single electrical stimulation produced a burst of EPSCs at variable latencies. Please see example in Author response image 2. These cases of disynaptic input, which have been reported by others (Diño et al., 2000; Nunzi and Mugnaini, 2000; van Dorp and De Zeeuw, 2015) support the conclusion that UBCs receive input from other UBCs.

      Author response image 2.

      Example of GRP UBC with disynaptic input. Three examples of the effect of a single presynaptic stimulus (triangle) in a GRP UBC with presumed disynaptic input. Note the variable latency of the first evoked EPSC, bursts of EPSCs, and spontaneous EPSCs.
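
The monosynaptic criteria described in the response above (short latency, no failures, low jitter) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name, the per-trial latency values, and the use of `None` to mark a failed trial are all assumptions made for illustration. Jitter is taken, as in the response, to be the standard deviation of the first-EPSC latency across trials within one cell.

```python
import statistics

def summarize_epsc_latencies(latencies_ms):
    """Summarize first-EPSC latencies for one cell.

    latencies_ms: per-trial latency in ms, or None when the stimulus
    failed to evoke an EPSC (hypothetical encoding).
    """
    detected = [t for t in latencies_ms if t is not None]
    failures = len(latencies_ms) - len(detected)
    return {
        "mean_latency_ms": statistics.mean(detected),
        # Jitter: sample SD of latency across trials in the same cell.
        "jitter_ms": statistics.stdev(detected),
        "failure_rate": failures / len(latencies_ms),
    }

# Hypothetical trials from a single cell (values are made up).
trials = [1.20, 1.30, 1.25, 1.15, 1.30]
summary = summarize_epsc_latencies(trials)
print(summary)
```

A cell with a reliable, short-latency, low-jitter response would show a failure rate of zero and a jitter that is small relative to the mean latency, whereas a presumed disynaptic input would show a much larger latency spread.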

      3) Lack of quantification of subtypes of UBC interconnectivity. Given that it is already established that UBCs synapse onto other UBCs (see refs above), the main potential advance of this manuscript in terms of connectivity is the establishment and quantification of ON-ON, ON-OFF, OFF-ON, and OFF-OFF subtypes of UBC interconnections. But, the authors only establish that each type exists, showing specific examples, but no quantification of the absolute or relative density was provided, and the authors' unquantified wording explicitly or implicitly states that they are not common. This lack of quantification and likely small number makes it difficult to know how important or what impact such synapses have on cerebellar processing, in the model and in situ.

      As noted by the reviewer, the connections between UBCs were rare to observe. We decided against attempting to quantify the absolute or relative density of connections for several reasons. A major reason for the rare observations of anatomical connections between UBCs is likely the sparse labeling. First, the GRP mouse line only labels 20% of ON UBCs and we are unable to test whether postsynaptic connectivity of GRP ON UBCs is the same as that of the rest of the population of ON UBCs that are not labeled in the GRP mouse line. Second, the Brainbow reporter mouse only labels a small population of Cre-expressing cells for unknown reasons. Third, the Brainbow reporter expression was so low that antibody amplification was necessary, which then limited the labeled cells to those close to the surface of the brain slices, because of known antibody penetration difficulties. Therefore, we refrained from estimating the density of these connections, because each of these variables reduced the labeling to unknown degrees and we reasoned that extrapolating our rare observations to the total population would be inaccurate.

      A paper that investigated UBC connectivity using organotypic slice cultures from P8 mice suggests that 2/3 of the UBC population receives UBC input, based on the observation that 2/3 of the mossy fibers did not degenerate as would be expected after 2 days in vitro if they were severed from a distant cell body (Nunzi and Mugnaini, 2000). It remains to be seen if this high proportion is due to the young age of these mice or is also the case in adult mice. Even if these connections are indeed rare, they are expected to have profound effects on the circuit, as each UBC has multiple mossy fiber terminals (Berthie and Axelrad, 1994), and mossy fiber terminals are estimated to contact 40 granule cells each (Jakab and Hamori, 1988). We have added a comment regarding this point to the discussion.

      4) Lack of critical parameters in NEURON model.

      A) The model uses # of molecules of glutamate released as the presumed quantal content, and this factor is constant. However, no consideration of changes in # of vesicles released from single versus trains of APs from MFs or UBCs is included. At most simple synapses, two sequential APs alter release probability, either up or down, and release probability changes dynamically with trains of APs. It is therefore reasonable to imagine UBC axon release probability is at least as complicated, and given the large surface area of contact between two UBCs, the number of vesicles released for any given AP is also likely more complex.

      B) The model does not include desensitization of AMPA receptors, which in the case of UBCs can paradoxically reduce response magnitude as vesicle release and consequent glutamate concentration in the cleft increases (Linney et al. 1997 JNeurophysiol, Lu et al. 2017 Neuron, Balmer et al. 2021 eLife), as would occur with trains of stimuli at MF to ON-UBCs.

      A) The model produces synaptic AMPA and mGluR2 currents that reproduce those we recorded in vitro. We did not find it necessary to implement changes in glutamate release during a train as the model was fit to UBC data with the assumption that the glutamate transient did not change during the train. If there is a change in neurotransmitter release during a train, it is therefore built into the model, which has the advantage of reducing its complexity. UBCs are a special case where the postsynaptic currents are mediated mostly by the total amount of transmitter released. Most of the evoked current occurs tens to hundreds of milliseconds after neurotransmitter release and is therefore much more sensitive to total release and less sensitive to how it is released during the train. Author response image 3 shows the effect of reducing the amount of glutamate released by 10% on each stimulus in the model. Despite a significant change in the pattern of neurotransmitter release, as well as a reduction in the total amount of glutamate, the slow EPSC still decays over the course of hundreds of milliseconds.

      Author response image 3.

      Effect of short-term depression of neurotransmitter release. A) The top trace shows the glutamate transient that drives the AMPA receptor model used in our study. No change in release is implemented, although the slow tail of each transient summates during the train. The bottom trace shows the modeled AMPA receptor mediated current. B) In this model the amount of glutamate released is reduced by 10% on each stimulus. The duration of the slow AMPA current that develops at the end of stimulation is similar, despite a profound change in the pattern of neurotransmitter exposure.
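
The depression scenario in panel B can be made concrete with a small arithmetic sketch. It is not the NEURON model itself: it only tabulates relative glutamate release per stimulus, and it assumes that "reduced by 10% on each stimulus" means multiplicative scaling (each release is 0.9 times the previous one). The train length of 10 stimuli is likewise an assumption, borrowed from the 50 Hz train described for Author response image 1.

```python
def release_per_stimulus(n_stimuli, depression=0.10, initial=1.0):
    """Relative glutamate release on each stimulus of a train.

    Assumes multiplicative depression: every stimulus releases
    (1 - depression) times the amount of the previous stimulus.
    """
    return [initial * (1.0 - depression) ** k for k in range(n_stimuli)]

n_stimuli = 10  # assumed train length
constant = release_per_stimulus(n_stimuli, depression=0.0)
depressed = release_per_stimulus(n_stimuli, depression=0.10)

# Fraction of total glutamate released relative to the constant-release case.
relative_total = sum(depressed) / sum(constant)
print(f"last/first release: {depressed[-1] / depressed[0]:.3f}")
print(f"relative total release: {relative_total:.3f}")
```

Under these assumptions the total glutamate released over the train falls to roughly two thirds of the constant-release case, which gives a sense of the size of the perturbation that the response reports as leaving the slow current duration similar.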

      B) The detailed kinetic AMPA receptor model used here accurately reproduces desensitization, and in fact recovery from desensitization is what mediates the slow ON UBC current. This AMPA receptor is a 13-state model, including 4 open states with 1-4 glutamates bound, 4 closed states with 1-4 glutamates bound, 4 desensitized states with 1-4 glutamates bound, and 5 closed states with 0-4 glutamates bound. The forward and reverse rates between different states in the model were fit to AMPA receptor currents recorded from dissociated UBCs and they accurately reproduced the ON UBC currents evoked by synaptic stimulation in our previous work (Balmer et al., 2021).

      5) Lack of quantification of various electrophysiological responses. UBCs are defined (ON or OFF) based on inward or outward synaptic response, but no information is provided about the range of the key parameter of duration across cells, which seems most critical to the current considerations. There is a similar lack of quantification across cells of AP duration in response to stimulation or current injections, or during baseline. The latter lack is particularly problematic because, in agreement with previous publications, the raw data in Fig. 1 shows ON UBCs as quiescent until MF stimulation and OFF UBCs firing spontaneously until MF stimulation, but, for example, at least one ON UBC in the NEURON model is firing spontaneously until synaptically activated by an OFF UBC (Fig. 11A), and an OFF UBC is silent until stimulated by a presynaptic OFF UBC (Fig. 11C). This may be expected/explainable theoretically, but then such cells should be observed in the raw data.

      To address this reasonable concern of a general lack of quantification of electrophysiological responses we have added data characterizing the slow inward and outward currents evoked by synaptic stimulation in GRP and P079 UBCs in the results section and in new panels in Figure 1. We report the action potential pause lengths in P079 UBCs and burst lengths in ON UBCs in the results section. However, we favor the duration of the currents over the length of burst and pause, because the currents do not depend on a stable resting membrane potential, which is itself difficult to determine in intracellular recordings of these small cells. We have added peak times and decay time constants of the slow inward and outward currents in ON and OFF UBCs in the results section and have added new panels to Figure 1.

      In a series of recent publications that focused on UBC firing, the authors argue that cell-attached recordings are necessary to determine accurately the burst and pause lengths, as well as spontaneous firing rates (Guo et al., 2021; Huson et al., 2023). (The trade-off of these extracellular recordings is that the monosynaptic nature of the input is nearly impossible to confirm.) Spontaneous firing rates were variable within both GRP and P079 UBCs, from silent to firing regularly or in bursts, as previously reported for UBCs (Kim et al., 2012; van Dorp and De Zeeuw, 2015). For clarity, we chose to model the GRP UBCs as silent unless receiving synaptic input and P079 UBCs as active unless receiving synaptic input. As the reviewer suggests, we have observed UBCs firing in patterns similar to those shown in the model UBCs that have input from a spontaneously active presynaptic UBC. Some examples are shown in Author response image 4.

      Author response image 4.

      Examples of UBCs that receive spontaneous input. A) Three ON UBCs that had spontaneous EPSCs, suggesting the presence of an active presynaptic UBC. B) Two OFF UBCs that had spontaneous outward currents.

      Reviewer #2 (Public Review):

      In this paper, the authors presented a compelling rationale for investigating the role of UBCs in prolonging and diversifying signals. Based on the two types of UBCs known as ON and OFF UBC subtypes, they have highlighted the existing gaps in understanding UBCs connectivity and the need to investigate whether UBCs target UBCs of the same subtype, different subtypes, or both. The importance of this knowledge is for understanding how sensory signals are extended and diversified in the granule cell layer.

      The authors designed very interesting approaches to study UBCs connectivity by utilizing transgenic mice expressing GFP and RFP in UBCs, Brainbow approach, immunohistochemical and electrophysiological analysis, and computational models to understand how the feed-forward circuits of interconnected UBCs transform their inputs.

      This study provided evidence for the existence of distinct ON and OFF UBC subtypes based on their electrophysiological properties, anatomical characteristics, and expression patterns of mGluR1 and calretinin in the cerebellum. The findings support the classification of GRP UBCs as ON UBCs and P079 UBCs as OFF UBCs and suggest the presence of synaptic connections between the ON and OFF UBC subtypes. In addition, they found that GRP and P079 UBCs form parallel and convergent pathways and have different membrane capacitance and excitability. Furthermore, they showed that UBCs of the same subtype provide input to one another and modify the input to granule cells, which could provide a circuit mechanism to diversify and extend the pattern of spiking produced by mossy fiber input. Accordingly, they suggested that these transformations could provide a circuit mechanism for maintaining a sensory representation of movement for seconds.

      Overall, the article is well written in a sound detailed format, very interesting with excellent discovery and suggested model, however, I have some comments/suggestions that may help to improve this manuscript:

      • The discovery of UBCs innervating each other and their own subtypes, suggesting the presence of feed-forward networks in the cerebellum, is an incredibly fascinating and exciting finding followed by an intriguing model by the authors. However, it is worth considering an alternative model as well. I acknowledge that visualizing such interactions using current tools and methods can be challenging ("The approaches used here were not able to determine the existence of networks of more than 2 UBCs connected one after the other. If present, 3 or more UBCs in series could extend and transform the input in even more dramatic ways. The temporal diversity that UBC circuits generate may underlie the flexibility of the cerebellum to coordinate movements over a broad range of behaviors."). Therefore, if more than 2 UBCs are indeed connected one after the other, then an alternative model that perhaps resembles the basal nuclei, with its direct and indirect circuits, can be considered (maybe a type of circular model). The basal nuclei circuits are also regulated by modulators such as D1 dopamine receptors in the direct pathway, causing depolarization, and D2 dopamine receptors in the indirect pathway, resulting in hyperpolarization upon dopamine activation. This approach could involve using computational models to gain insight into potential alternatives within this pathway (maybe a future direction).

      Thank you for this suggestion to consider the potentially similar circuit interactions in the basal nuclei. We will certainly investigate this further as we move forward with modeling the feed-forward networks in the cerebellum.

      • GRP UBCs are more densely distributed in lobes VI-IX, while P079 UBCs are more densely distributed in the dorsal leaflet of lobe X in sagittal sections. While the cerebellum is well known for its characteristic stripy pattern, are UBC distributions the same in coronal/transverse section?

      UBCs of different types, based on their expression of specific proteins, have overlapping but somewhat distinct distributions in coronal sections. The densities of calretinin-expressing UBCs are higher within Zebrin II positive zones and form sagittal stripes, whereas the densities of mGluR1-expressing and PLCb4-expressing UBCs vary less but are highest at the midline (Chung et al., 2009; Sekerkova et al., 2014). The difference noted by the reviewer between the dorsal and ventral leaflets of lobe X is the most distinct that we know of in the GRP and P079 populations.

      • The extension of the axons from both subtypes of UBCs show they are long enough to pass several UBCs and even projections are directed toward the white matter (e.g. Fig 9A), suggesting targeting the UBCs or granule cells in other lobules. Is it suggesting UBCs connectivity between different lobules (perhaps longitudinal connectivity)? Is there any observation or information in coronal/transverse section to visualize mediolateral connectivity?

      This is certainly worth exploring in future work. UBCs have been reported to project their axons into and across the white matter (Diño et al., 2000). To our knowledge, whether UBCs project their axons out of one lobule and into another has not been examined.

      • The limitation in identifying networks involving more than two sequentially connected UBCs was briefly noted. I suggest including a paragraph describing limitations and discussing the implications of the findings would enhance the overall impact of the research and broaden our understanding of cerebellar function.

      • It is a pity that there is no clear conclusion to the discussion of this very interesting study. I suggest providing the key points as a conclusion.

      Thank you for these suggestions. Limitations and implications are included throughout the discussion section and we feel that the summary figure and significance statement now sufficiently convey the key conclusions of the study.

      • Please make the correction in Figure 2A by relabeling it as IXa, IXb, and IXc to correct the typographical error.

      Fixed

      • I recommend rotating Figure 7A to align its orientation with the other figures for consistency.

      Fixed

      Reviewer #1 (Recommendations For The Authors):

      Minor comments that should be addressed for clarity:

      1) In the NEURON model, why was the reversal potential for the leak conductance and Gmax for Ih different for the two types of UBCs? Relatedly, why is Erev for GABAB -95mV if Ek is -90mV?

      The h-current (Ih) was estimated from a hyperpolarizing current step in both cell types and these data have been added to the results section and as a panel in Figure 1. The conductance of Ih in the model cells was adjusted accordingly, with OFF UBCs having ~3 times that of ON UBCs, and approximated the measured voltage sag, as we now describe in the methods section. The reversal potential of the model mGluR2 current (which is based on a model of GABAB) has been fixed.

      2) Line 69 justification for their dual genetic approach is a bit too strong: "Paired recordings not possible". It may be difficult, but it is certainly possible.

      Reworded

      3) Confusing wording, only one stat for two parameters? Line 93: These currents were produced by both mGluR1 and AMPA receptors, as they were blocked by their antagonists JNJ16259685 and GYKI53655, respectively (92.86% ± 3.25; paired t-test; P=0.0066; n = 9; mean ± SEM) (Fig 1D-E).

      Reworded

      References

      Balmer TS, Borges-Merjane C, Trussell LO (2021) Incomplete removal of extracellular glutamate controls synaptic transmission and integration at a cerebellar synapse. eLife 10:e63819.

      Berthie B, Axelrad H (1994) Granular layer collaterals of the unipolar brush cell axon display rosette-like excrescences. A Golgi study in the rat cerebellar cortex. Neuroscience Letters 167:161–165.

      Borges-Merjane C, Trussell LO (2015) ON and OFF unipolar brush cells transform multisensory inputs to the auditory system. Neuron 85:1029–1042.

      Chung SH, Sillitoe RV, Croci L, Badaloni A, Consalez G, Hawkes R (2009) Purkinje cell phenotype restricts the distribution of unipolar brush cells. Neuroscience 164:1496–1508.

      Diño MR, Schuerger RJ, Liu Y-B, Slater NT, Mugnaini E (2000) Unipolar brush cell: a potential feedforward excitatory interneuron of the cerebellum. Neuroscience 98:625–636.

      Guo C, Huson V, Macosko EZ, Regehr WG (2021) Graded heterogeneity of metabotropic signaling underlies a continuum of cell-intrinsic temporal responses in unipolar brush cells. Nat Commun 12:5491.

      Huson V, Newman LN, Regehr WG (2023) A continuum of response properties across the population of Unipolar Brush Cells in the Dorsal Cochlear Nucleus. J Neurosci Available at: https://www.jneurosci.org/content/early/2023/07/26/JNEUROSCI.0873-23.2023 [Accessed August 15, 2023].

      Jakab RL, Hamori J (1988) Quantitative morphology and synaptology of cerebellar glomeruli in the rat. Anatomy and embryology 179:81–88.

      Kim JA, Sekerkova G, Mugnaini E, Martina M (2012) Electrophysiological, morphological, and topological properties of two histochemically distinct subpopulations of cerebellar unipolar brush cells. Cerebellum 11:1012–1025.

      Nunzi M-G, Mugnaini E (2000) Unipolar brush cell axons form a large system of intrinsic mossy fibers in the postnatal vestibulocerebellum. Journal of Comparative Neurology 422:55–65.

      Sekerkova G, Watanabe M, Martina M, Mugnaini E (2014) Differential distribution of phospholipase C beta isoforms and diaglycerol kinase-beta in rodents cerebella corroborates the division of unipolar brush cells into two major subtypes. Brain structure & function 219:719–749.

      van Dorp S, De Zeeuw CI (2015) Forward signaling by unipolar brush cells in the mouse cerebellum. Cerebellum 14:528–533.

    2. Reviewer #1 (Public Review):

      The manuscript by Hariani et al. presents experiments designed to improve our understanding of the connectivity and computational role of Unipolar Brush Cells (UBCs) within the cerebellar cortex, primarily lobes IX and X. The authors develop and cross several genetic lines of mice that express distinct fluorophores in subsets of UBCs, combined with immunocytochemistry that also distinguishes subtypes of UBCs, and they use confocal microscopy and electrophysiology to characterize the electrical and synaptic properties of subsets of so-labelled cells, and their synaptic connectivity within the cerebellar cortex. The authors then generate a computer model to test possible computational functions of such interconnected UBCs.

      Using these approaches, the authors report that:<br /> 1) GRP-driven TDtomato is expressed exclusively in a subset (20%) of ON-UBCs, defined electrophysiologically (excited by mossy fiber afferent stimulation via activation of UBC AMPA and mGluR1 receptors) and immunocytochemically by their expression of mGluR1.

      2) mCitrine in Brainbow mouse line P079 is expressed in a similar minority subset of OFF-UBCs, defined electrophysiologically (inhibited by mossy fiber afferent stimulation via activation of UBC mGluR2 receptors) and immunocytochemically by their expression of calretinin. However, such mCitrine expression was also detected in some mGluR1-positive UBCs, which may not have shown up electrophysiologically because of the weaker fluorophore expression without antibody amplification.

      3) Confocal analysis of crossed lines of mice (GRP X P079) stained with antibodies to mGluR1 and calretinin documented the existence of all possible permutations of interconnectivity between cells (ON-ON, ON-OFF, OFF-OFF, OFF-ON), but their overall abundance was low, and neither their absolute nor relative abundance was quantified.

      4) A computational model (NEURON) indicated that the presence of an intermediary UBC (in a polysynaptic circuit from MF to UBC to UBC) could prolong bursts (MF-ON-ON), prolong pauses (MF-ON-OFF), cause a delayed burst (MF-OFF-OFF), or cause a delayed pause (MF-OFF-ON) relative to solely MF to UBC synapses, which would simply exhibit long bursts (MF-ON) or long pauses (MF-OFF).

      The authors thus conclude that the pattern of interconnected UBCs provides an extended and more nuanced pattern of firing within the cerebellar cortex that could mediate longer lasting sensorimotor responses.
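      The qualitative behavior summarized in point 4 can be illustrated with a minimal rate-based sketch. This is not the authors' NEURON model: each UBC is reduced to a single first-order low-pass filter, and the time constant, gain, baseline rate, and burst timing below are arbitrary assumptions chosen only for illustration.

```python
# Toy rate model (illustrative assumption, not the authors' NEURON model).
# Each UBC low-pass filters its input with one slow time constant.

def slow_filter(inp, tau_ms=200.0, dt_ms=1.0):
    """Forward-Euler first-order low-pass filter, started at steady state."""
    s = inp[0]
    out = []
    for x in inp:
        out.append(s)
        s += dt_ms * (x - s) / tau_ms
    return out

def on_ubc(inp, gain=1.0):
    """ON cell: output rate follows the slow excitatory drive."""
    return [gain * s for s in slow_filter(inp)]

def off_ubc(inp, baseline=0.2, gain=1.0):
    """OFF cell: spontaneous rate is reduced by the slow inhibitory drive."""
    return [max(0.0, baseline - gain * s) for s in slow_filter(inp)]

def width_at_half_max(trace, dt_ms=1.0):
    """Time (ms) the trace spends at or above half of its peak."""
    half = max(trace) / 2.0
    return dt_ms * sum(1 for v in trace if v >= half)

# Mossy fiber input: normalized rate, 100 ms burst starting at 200 ms.
mf = [1.0 if 200 <= t < 300 else 0.0 for t in range(3000)]

mf_on = on_ubc(mf)            # long burst
mf_on_on = on_ubc(mf_on)      # later and broader burst
mf_off = off_ubc(mf)          # pause in spontaneous firing
mf_off_off = off_ubc(mf_off)  # silent at rest, delayed burst during the pause
mf_off_on = on_ubc(mf_off)    # delayed dip below the resting rate
```

      In this sketch the second ON cell in the chain peaks later and stays above half-maximum longer than the first, and the downstream OFF cell is silent until its presynaptic OFF cell pauses. Because the sketch contains no inhibition, short-term plasticity, or receptor desensitization, it is subject to the same limitations raised in the specific issues of this review.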

      The cerebellum's long known role in motor skills and reflexes, and associated disorders, combined with our nascent understanding of its role in cognitive, emotional, and appetitive processing, makes understanding its circuitry and processing functions of broad interest to the neuroscience and biomedical community. The focus on UBCs, which are largely restricted to vestibular lobes of the cerebellum reduces the breadth of likely interest somewhat. The overall design of specific experiments is rigorous and the use of fluorophore expressing mouse lines is creative. The data that is presented and the writing are clear. However, despite some additional analysis in response to the initial review, the overall experimental design still has issues that reduce overall interpretation (please see specific issues for details), which combined with a lack of thorough analysis of the experimental outcomes undermines the value of the NEURON model results and the advance in our understanding of cerebellar processing in situ (again, please see specific issues for details).

      Specific issues:<br /> 1) All data gathered with inhibition blocked. All of the UBC response data (Fig. 1) was gathered in the presence of GABAAR and Glycine R blockers. While such an approach is appropriate generally for isolating glutamatergic synaptic currents, and specifically for examining and characterizing monosynaptic responses to single stimuli, it becomes problematic in the context of assaying synaptic and action potential response durations for long lasting responses, and in particular for trains of stimuli, when feed-forward and feed-back inhibition modulates responses to afferent stimulation. I.e. even for single MF stimuli, given the >500ms duration of UBC synaptic currents, there is plenty of time for feedback inhibition from Golgi cells (or feedforward, from MF to Golgi cell excitation) to interrupt AP firing driven by the direct glutamatergic synaptic excitation. This issue is compounded further for all of the experiments examining trains of MF stimuli. Beyond the impact of feedback inhibition on the AP firing of any given UBC, it would also obviously reduce/alter/interrupt that UBC's synaptic drive of downstream UBCs. This issue fundamentally undermines our ability to interpret the simulation data of Vm and AP firing of both the modeled intermediate and downstream UBC, in terms of applying it to possible cerebellar cortical processing in situ.

      The authors' response to the initial concern is (to paraphrase), "it's not possible to do and it's not important", neither of which are soundly justified.

      As stated in the original review, it is fully understandable and appropriate to use GABAAR/GlycineR antagonists to isolate glutamatergic currents, to characterize their conductance kinetics. That was not the issue raised. The issue raised was that then using only such information to generate a model of in situ behavior becomes problematic, given that feedback and lateral inhibition will sculpt action potential output, which of course will then fundamentally shape their synaptic drive of secondary UBCs, which will be further sculpted by their own inhibitory inputs. This issue undermines interpretation of the NEURON model.

      The argument that taking inhibition into account is not possible because of assumed or possible direct electrical excitation of Golgi cells is confusing for two interacting reasons. First, one can certainly stimulate the mossy fiber bundle to get afferent excitation of UBCs (and polysynaptic feedback/lateral inhibitory inputs) without directly stimulating the Golgi cells that innervate any recorded UBC. Yes, one might be stimulating some Golgi cells near the stimulating electrode, but one can position the stimulating electrode far enough down the white matter track (away from the recorded UBC), such that mossy fiber inputs to the recorded UBC can be stimulated without affecting Golgi cells near or synaptically connected to the recorded UBC. Moreover, if the argument were true, then presumably the stimulation protocol would be just as likely to directly stimulate neighboring UBCs, which then drove the recorded UBC's responses. Thus, it is both doable and should be ensured that stimulation of the white matter is distant enough to not be directly activating relevant, connected neurons within the granule cell layer.

      Finally, the authors present three examples of UBC recordings with and without inhibitory inputs blocked, and state "Thus, these large conductances are unlikely to be significantly shaped by 1-10 ms IPSCs from feedforward and feedback GABA/glycine inhibition" and "GABA/glycinergic inhibition...has little to no effect on the slow inward current that develops after the end of stimulation". This response reflects on original concerns about lack of quantification or consideration of important parameters. In particular, while the traces with and without inhibition are qualitatively similar, quantitative considerations indicate otherwise. First, unquantified examples are not adequate to drive conclusions. Regardless, the main issue (how inhibition affects actual responses in situ) is actually highlighted by the authors' current clamp recordings of UBC responses, before and after blocking inhibition. The output response is dramatically different, both at early and late time points, when inhibition is blocked. Again, a lack of quantification (of adequate n's) makes it hard to know exactly how important, but quick "eye ball" estimates of impact include: 1) a switch from only low frequency APs initially (without inhibition blocked) to immediate burst of high frequency APs (high enough to not discern individual APs with given figure resolution) when inhibition is blocked, 2) Slow rising to a peak EPSP, followed by symmetrical return to baseline (without inhibition blocked) versus immediate rise to peak, followed by prolonged decay to baseline (with inhibition blocked), 3) substantially shorter duration (~34% shorter) secondary high frequency burst (individual APs not discernible) of APs (with inhibition blocked versus without inhibition blocked), and 4) substantial reduction in number of long delayed APs (with inhibition blocked versus without inhibition blocked).
Thus, clearly, feedback/lateral inhibition is actually sculpting AP output at all phases of the UBC response to trains of afferent stimulations. Importantly, the single voltage clamp trace showing little impact of transient IPSCs on the slow EPSC does not take into account likely IPSC influences on voltage-activated conductances that would not occur in voltage-clamp recordings but would be free to manifest in current clamp, and thereby influence AP output, as observed.

      So again, our ability to understand how interconnected UBCs behave in the intact system is undermined by the lack of consideration and quantification of the impact of inhibition, and it not being incorporated into the model. At the very least a strong proviso about lack of inclusion of such information, given the authors' data showing its importance in the few examples shown, should be added to the discussion.

      2) No consideration for involvement of polysynaptic UBCs driving UBC responses to MF stimulation in electrophysiology experiments. Given the established existence (in this manuscript and Dino et al. 2000 Neurosci, Dino et al. 2000 ProgBrainRes, Nunzi and Mugnaini 2000 JCompNeurol, Nunzi et al. 2001 JCompNeurol) of polysynaptic connections from MFs to UBCs to UBCs, the MF evoked UBC responses established in this manuscript, especially responses to trains of stimuli could be mediated by direct MF inputs, or to polysynaptic UBC inputs, or possibly both (to my awareness not established either way). Thus the response durations could already include extension of duration by polysynaptic inputs, and so would overestimate the duration of monosynaptic inputs, and thus polysynaptic amplification/modulation, observed in the NEURON model.

      Author response: "UBCs receive a single mossy fiber input on their dendritic brush, and thus if our stimulation produces a reliable, short-latency response consistent with a monosynaptic input, then there is not likely to be a disynaptic input."

      This statement is not congruent with the literature, with early work by Mugnaini and colleagues (Mugnaini et al. 1994 Synapse; Mugnaini and Flores 1994 J. Comp. Neurol.) indicating that UBCs are innervated by 1-2 mossy fibers, which are as likely other UBC terminals as MFs. This leaves open the possibility that so called monosynaptic responses do, as originally suggested, already include polysynaptic feedforward amplification of duration. While the authors also indicate that isolated disynaptic currents can be observed when they occur in isolation, a careful examination and objective documentation of "monosynaptic" responses would address this issue. Presumably, if potential disynaptic UBC inputs occur during a monosynaptic MF response, it would be detected as an abrupt biphasic inward/outward current, due to additional AMPA receptor activation but further desensitization of those already active (as observed by Kinney et al. 1997 J. Neurophysiol: "The delivery of a second MF stimulus at the peak of the slow EPSC evoked a fast EPSC of reduced amplitude followed by an undershoot of the subsequent slow current"). If such polysynaptic inputs are truly absent and are "rare" in isolation, some estimation of how common or not such synaptic amplification is, would improve our understanding of the overall significance of these inputs.

      3) Lack of quantification of subtypes of UBC interconnectivity. Given that it is already established that UBCs synapse onto other UBCs (see refs above), the main potential advance of this manuscript in terms of connectivity is the establishment and quantification of ON-ON, ON-OFF, OFF-ON, and OFF-OFF subtypes of UBC interconnections. But, the authors only establish that each type exists, showing specific examples, but no quantification of the absolute or relative density was provided, and the authors' unquantified wording explicitly or implicitly states that they are not common. This lack of quantification and likely small number makes it difficult to know how important or what impact such synapses have on cerebellar processing, in the model and in situ.

      To address this issue, the authors added the following text to the discussion section: "We did not estimate the density of these UBC to UBC connections, because the sparseness of labeling using these approaches made an accurate calculation impossible. Previous work using organotypic slice cultures from P8 mice estimated that 2/3 of the UBC population receives input from other UBCs (Nunzi & Mugnaini, 2000), although it is unclear whether this is the case in older mice."

      While accurate, the addition doesn't really address the situation, which is that apparently the reported connections are rare. Adding the information about 2/3 of UBCs having UBC inputs in culture, implies the opposite might be true (i.e. that they might be quite common), which is in contrast to the authors' data, so should be reworded for clarity, which should also incorporate the considerations covered in point #2 above. I.e. if the authors do establish that none of their recordings have polysynaptic inputs, and if they determine that the number of cells that showed isolated di-synaptic inputs is indeed rare, then it suggests that these specific polysynaptic connections are in fact rare.

      4) Lack of critical parameters in NEURON model.<br /> A) The model uses # of molecules of glutamate released as the presumed quantal content, and this factor is constant. However, no consideration of changes in # of vesicles released from single versus trains of APs from MFs or UBCs is included. At most simple synapses, two sequential APs alter release probability, either up or down, and release probability changes dynamically with trains of APs. It is therefore reasonable to imagine UBC axon release probability is at least as complicated, and given the large surface area of contact between two UBCs, the number of vesicles released for any given AP is also likely more complex.

      B) The model does not include desensitization of AMPA receptors, which in the case of UBCs can paradoxically reduce response magnitude as vesicle release and consequent glutamate concentration in the cleft increases (Kinney et al. 1997 JNeurophysiol, Lu et al. 2017 Neuron, Balmer et al. 2021 eLife), as would occur with trains of stimuli at MF to ON-UBCs.

      While the authors have not added the suggested additional parameters, their clarifications regarding the implications of existing parameters, and demonstration of reasonable fits to experimental data, and lack of substantial effect of simulating reduced vesicle release probability, provided by the authors, adequately addresses this concern.
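      The concern in point 4A, that release per action potential is not constant during a train, can be made concrete with a minimal depression-only resource model. The release fraction `U`, the recovery time constant `tau_rec_ms`, and the train parameters are illustrative assumptions and are not values from the manuscript or its NEURON model.

```python
import math

def train_release(n_spikes, rate_hz, U=0.5, tau_rec_ms=500.0):
    """Fraction of the releasable pool released by each spike in a regular train.

    Depression-only sketch: each spike releases a fraction U of the available
    resources R, and R recovers exponentially toward 1 between spikes.
    """
    isi_ms = 1000.0 / rate_hz
    R = 1.0
    released = []
    for _ in range(n_spikes):
        rel = U * R
        released.append(rel)
        R -= rel
        # Exponential recovery of the depleted pool during the inter-spike interval.
        R = 1.0 - (1.0 - R) * math.exp(-isi_ms / tau_rec_ms)
    return released

# Hypothetical example: 10 spikes at 100 Hz.
example = train_release(10, 100.0)
```

      Under these assumptions the first spike releases the full fraction `U` and every later spike releases less, so a model with a fixed amount of glutamate per spike would overestimate release late in the train. A facilitating synapse would need an additional state variable that is not shown here.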

      5) Lack of quantification of various electrophysiological responses. UBCs are defined (ON or OFF) based on inward or outward synaptic response, but no information is provided about the range of the key parameter of duration across cells, which seems most critical to the current considerations. There is a similar lack of quantification across cells of AP duration in response to stimulation or current injections, or during baseline. The latter lack is particularly problematic because in agreement with previous publications, the raw data in Fig. 1 shows ON UBCs as quiescent until MF stimulation and OFF UBCs firing spontaneously until MF stimulation, but, for example, at least one ON UBC in the NEURON model is firing spontaneously until synaptically activated by an OFF UBC (Fig. 11A), and an OFF UBC is silent until stimulated by a presynaptic OFF UBC (Fig. 11C). This may be expected/explainable theoretically, but then such cells should be observed in the raw data.

      The authors have added additional analysis and discussion, which adequately addresses this concern.

    3. eLife assessment

      This study presents valuable findings about synaptic connectivity among subsets of unipolar brush cells (UBCs), a specialized interneuron primarily located in the vestibular lobules of the cerebellar cortex. The evidence supporting the claims is interesting although incomplete in some areas. The work will be of interest to cerebellar neuroscientists as well as those focussed on synaptic properties and mechanisms. Although several compelling pieces of data were presented, substantial work remains to confirm how the hypothesis and predictions of the manuscript play out in the actual brain circuit and how they would impact the processing of feedback or feedforward activity that would be required to promote behavior.

    4. Reviewer #2 (Public Review):

      In this paper, the authors presented a compelling rationale for investigating the role of UBCs in prolonging and diversifying signals. Based on the two types of UBCs known as ON and OFF UBC subtypes, they have highlighted the existing gaps in understanding UBC connectivity and the need to investigate whether UBCs target UBCs of the same subtype, different subtypes, or both. This knowledge is important for understanding how sensory signals are extended and diversified in the granule cell layer.

      The authors designed very interesting approaches to study UBC connectivity by utilizing transgenic mice expressing GFP and RFP in UBCs, the Brainbow approach, immunohistochemical and electrophysiological analysis, and computational models to understand how the feed-forward circuits of interconnected UBCs transform their inputs.

      This study provided evidence for the existence of distinct ON and OFF UBC subtypes based on their electrophysiological properties, anatomical characteristics, and expression patterns of mGluR1 and calretinin in the cerebellum. The findings support the classification of GRP UBCs as ON UBCs and P079 UBCs as OFF UBCs and suggest the presence of synaptic connections between the ON and OFF UBC subtypes. In addition, they found that GRP and P079 UBCs form parallel and convergent pathways and have different membrane capacitance and excitability. Furthermore, they showed that UBCs of the same subtype provide input to one another and modify the input to granule cells, which could provide a circuit mechanism to diversify and extend the pattern of spiking produced by mossy fiber input. Accordingly, they suggested that these transformations could provide a circuit mechanism for maintaining a sensory representation of movement for seconds.

      Overall, the article is well written in a sound, detailed format and is very interesting, with an excellent discovery and suggested model.

      I believe the authors have provided appropriate responses and have consequently revised the manuscript in a convincing manner. Although I am not an expert in physiology, I find the explanations and clarifications to be acceptable.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Cook, Watt, and colleagues previously reported that a mouse model of Spinocerebellar ataxia type 6 (SCA6) displayed defects in BDNF and TrkB levels at an early disease stage. Moreover, they have shown that one month of exercise elevated cerebellar BDNF expression and improved ataxia and cerebellar Purkinje cell firing rate deficits. In the current work, they attempt to define the mechanism underlying the pathophysiological changes occurring in SCA6. For this, they carried out RNA sequencing of cerebellar vermis tissue in 12-month-old SCA6 mice, a time when the disease is already at an advanced stage, and identified widespread dysregulation of many genes involved in the endo-lysosomal system. Focusing on BDNF/TrkB expression, localization, and signaling, they found that, in 7-8 month-old SCA6 mice, early endosomes are enlarged and accumulate BDNF and TrkB in Purkinje cells. Curiously, TrkB appears to be reduced in the recycling endosomes compartment, despite the fact that recycling endosomes are morphologically normal in SCA6. In addition, the authors describe a reduction in the late endosomes in SCA6 Purkinje cells associated with reduced BDNF levels and a probable deficit in late endosome maturation.

      We would like to thank the reviewers for their careful reading of the paper; their feedback has helped us to add information and experiments to the paper that enhance the clarity of the findings.

      Strengths:

      The article is well written, and the findings are relevant for the neuropathology of different neurodegenerative diseases where dysfunction of early endosomes is observed. The authors have provided a detailed analysis of the endo-lysosomal system in SCA6 mice. They have shown that TrkB recycling to the cell membrane in recycling endosomes is reduced, and the late endosome transport of BDNF for degradation is impaired. The findings will be crucial in understanding underlying pathology. Lastly, the deficits in early endosomes are rescued by chronic administration of 7,8-DHF.

      We thank the reviewers for their positive feedback on this work.

      Weaknesses:

      The specificity of BDNF and TrkB immunostaining requires additional controls, as it has been very difficult to detect immunostaining of BDNF. In addition, in many of the figures, the background or outside of Purkinje cell boundaries also exhibits a positive signal.

      We agree with the reviewers that the performance of the BDNF and TrkB antibodies is an important concern. We have ourselves had difficulties with the performance of many antibodies and the images in this paper are the result of many years of optimization. We have therefore added further detail about the antibody optimization to the methods section of this paper, and have carried out new staining experiments with additional controls. We have added 2 new figure panels in supplementary figures 3 and 4 to demonstrate these tests.

      In the case of anti-BDNF antibodies, we have tested several antibodies and staining protocols and found that in our hands, the only antibody that reliably stained BDNF with a good signal to noise ratio was the one used in this paper (abcam ab108319). Even for this antibody, the staining was greatly enhanced by the use of a heat induced epitope retrieval (HIER) step, which allowed the visualization of BDNF within intracellular structures such as endosomes. When we quantified the intensity of this staining in our previous paper, the results were in agreement with those from a BDNF ELISA used to measure levels of BDNF in the cerebellar vermis of WT and SCA6 mice (Cook et al., 2022), which corroborates these results. As the staining was carried out in tissue sections and not dissociated cells, we also see positive signal from the BDNF staining outside of the Purkinje cells, since BDNF acts on cell-surface receptors and is thus released into the extracellular space around cells (Kuczewski et al., 2008) and is detectable in the extracellular matrix (Lam et al., 2019) and presynaptic terminals around neurons (Camuso et al., 2022; Choo et al., 2017). This is in contrast to studies that image BDNF mRNA with in-situ hybridization, which labels BDNF mRNA predominantly found in cells, and cannot tell us about sub-cellular or extracellular localization of BDNF protein. Together, these factors explain why we observe staining that is not cell-limited, but extends into the space around the cells of interest.

      We have added an additional supplemental figure to demonstrate the importance of using HIER when staining slices with anti-BDNF (Supplementary figure 3). We tested HIER protocols that involved heating the slices to 95°C in a variety of buffers. The buffers tested were sodium citrate buffer (10 mM sodium citrate, 0.05% Tween 20, pH 6), Tris buffer (10 mM TBS, 0.05% Tween 20, pH 10), EDTA buffer (1 mM EDTA, 0.05% Tween 20, pH 8) and neutral PBS. The PBS produced the best result, enhancing the staining of both anti-BDNF and anti-EEA1 antibodies (Supplementary figure 3). Therefore all slices stained using those antibodies were heated to 95°C in PBS using a heat block or thermocycler for 10 minutes, then allowed to cool before staining proceeded.

      The antibody we use (abcam ab108319) has been used in hundreds of other publications, including Javed et al., 2021 who ectopically expressed BDNF and noted colocalization between the antibody staining and the GFP tag of the BDNF construct, and Lejkowska et al., 2019 who overexpressed BDNF and saw a dramatic increase in antibody staining as well. The colocalization between ectopically expressed BDNF and the antibody in these studies demonstrates the specificity of the antibody.

      However, to further validate antibody specificity we used liver tissue as a negative control. In liver tissue from rodents and humans, the majority of the liver contains negligible levels of BDNF (Koppel et al., 2009; Vivacqua et al., 2014), see also the Human Protein Atlas. The exception is some cholangiocytes: epithelial cells that express BDNF at high levels (Vivacqua et al., 2014). We obtained liver tissue from a WT mouse that was undergoing surgery for an unrelated project and fixed and processed the tissue as we did for brain tissue (outlined in methods section). As we would expect, most of the cells in the liver showed BDNF immunoreactivity that was comparable to background levels (Supplementary figure 3). Interestingly, we were also able to detect sparse highly BDNF-positive cells in the liver, presumed cholangiocytes (Supp. Fig. 3). This pattern of liver BDNF expression is as predicted in the literature, and thus acts as a control for our antibody. We therefore believe that in our hands this antibody is able to stain BDNF with an appropriate degree of specificity.

      We also carried out staining experiments using a second anti-TrkB antibody that we had previously used to detect TrkB via Western blotting. We carried out immunohistochemistry as previously described using tissue sections from a WT mouse. The staining with the two different antibodies was carried out at the same time and all other reagents were kept constant. We found that both antibodies labelled TrkB in a similar pattern of localization, including in the early endosomes of the Purkinje cells (Supplementary figure 4). The second antibody however did have a lower signal to noise ratio and so we believe that the original anti-TrkB antibody used in this manuscript (EMD Millipore ab9872) is optimal for staining cerebellar tissue sections in our hands.

      One important concern about the conclusions is that the RNAseq experiment was conducted in 12-month-old SCA6 mice suggesting that the defects in the endo-lysosomal system may be caused by other pathophysiological events and, likewise, the impairment in BDNF signaling may also be indirect, as also noted by the authors. Indeed, Purkinje cells in SCA6 mice have an impaired ability to degrade other endocytosed cargo beyond BDNF and TrkB, most likely because of trafficking deficits that result in a disruption in the transport of cargo to the lysosomes and lysosomal dysfunction.

      We agree with the reviewers that the defects in the endo-lysosomal system may be caused by other events occurring in the course of disease progression. As mentioned by the reviewers, we have noted this possibility in the text. Detailed investigation into the sequence of events and the root causes of signaling disruption in SCA6 merits future study and we aim to address this in future work. We have expanded this explanation in the text.

      Moreover, the beneficial effects of 7,8-DHF treatment on motor coordination may be caused by 7,8-DHF properties other than the putative agonist role on TrkB. Indeed, many reservations have been raised about using 7,8-DHF as an agonist of TrkB activity. Several studies have now debunked (Todd et al. PlosONE 2014, PMID: 24503862; Boltaev et al. Sci Signal 2017, PMID: 28831019) or at the very least questioned (Lowe D, Science 2017: see Discussion: https://www.science.org/content/blog-post/those-compounds-aren-t-what-you-think-they-are; Wang et al. Cell 2022 PMID: 34963057) this claim. Another interpretation is that 7,8-DHF possesses antioxidant activity and neuroprotection against cytotoxicity in HT-22 and PC12 cells, both of which do not express TrkB (Chen et al. Neurosci Lett 201, PMID: 21651962; Han et al. Neurochem Int. 2014, PMID: 24220540). Thus, while this flavonoid may have a beneficial effect on the pathophysiology of SCA6, it is most unlikely that mechanistically this occurs through a TrkB agonistic effect considering the potent anti-oxidant and anti-inflammatory roles of flavonoids in neurodegenerative diseases (Jones et al. Trends Pharmacol Sci 2012, PMID: 22980637).

      We thank the reviewers for raising this important point. We have noted in our previous paper (Cook et al., 2022) that 7,8-DHF may not be acting as a TrkB agonist in SCA6 mice, and are in agreement that other explanations are possible. We have now added information to the text of this paper to highlight this possibility. We did show in our previous paper that 7,8-DHF administration activates Akt signaling in the cerebellum of SCA6 mice, a signaling event that is known to take place downstream of TrkB activation. Additionally, 7,8-DHF treatment led to the increase of TrkB levels in the cerebellum of SCA6 mice (Cook et al., 2022), implicating TrkB in the mechanism of action, even if mechanistically, this is not via direct TrkB activation alone. However, even if the mechanism is currently incompletely explained, we believe that 7,8-DHF remains a valuable treatment strategy for SCA6. We have tried to rewrite the Discussion to highlight what we think is the most important takeaway: that 7,8-DHF can rescue endosomal and other deficits in SCA6, even if we do not currently know the full mechanism of action. We have therefore amended the text to add more detail about other potential explanations for the mechanism of action of 7,8-DHF.

      References

      Camuso S, La Rosa P, Fiorenza MT, Canterini S. 2022. Pleiotropic effects of BDNF on the cerebellum and hippocampus: Implications for neurodevelopmental disorders. Neurobiol Dis. doi:10.1016/j.nbd.2021.105606

      Choo M, Miyazaki T, Yamazaki M, Kawamura M, Nakazawa T, Zhang J, Tanimura A, Uesaka N, Watanabe M, Sakimura K, Kano M. 2017. Retrograde BDNF to TrkB signaling promotes synapse elimination in the developing cerebellum. Nat Commun 8:195. doi:10.1038/s41467-017-00260-w

      Cook AA, Jayabal S, Sheng J, Fields E, Leung TCS, Quilez S, McNicholas E, Lau L, Huang S, Watt AJ. 2022. Activation of TrkB-Akt signaling rescues deficits in a mouse model of SCA6. Sci Adv 8:3260. doi:10.1126/sciadv.abh3260

      Javed S, Lee YJ, Xu J, Huang WH. 2021. Temporal dissection of Rai1 function reveals brain-derived neurotrophic factor as a potential therapeutic target for Smith-Magenis syndrome. Hum Mol Genet 31:275–288. doi:10.1093/HMG/DDAB245

      Koppel I, Aid-Pavlidis T, Jaanson K, Sepp M, Pruunsild P, Palm K, Timmusk T. 2009. Tissue-specific and neural activity-regulated expression of human BDNF gene in BAC transgenic mice. BMC Neurosci 10:68. doi:10.1186/1471-2202-10-68

      Kuczewski N, Porcher C, Ferrand N, Fiorentino H, Pellegrino C, Kolarow R, Lessmann V, Medina I, Gaiarsa JL. 2008. Backpropagating action potentials trigger dendritic release of BDNF during spontaneous network activity. J Neurosci 28:7013–7023. doi:10.1523/JNEUROSCI.1673-08.2008

      Lam D, Enright HA, Cadena J, Peters SKG, Sales AP, Osburn JJ, Soscia DA, Kulp KS, Wheeler EK, Fischer NO. 2019. Tissue-specific extracellular matrix accelerates the formation of neural networks and communities in a neuron-glia co-culture on a multi-electrode array. Sci Rep 9. doi:10.1038/s41598-019-40128-1

      Lejkowska R, Kawa MP, Pius-Sadowska E, Rogińska D, Łuczkowska K, Machaliński B, Machalińska A. 2019. Preclinical Evaluation of Long-Term Neuroprotective Effects of BDNF-Engineered Mesenchymal Stromal Cells as Intravitreal Therapy for Chronic Retinal Degeneration in Rd6 Mutant Mice. Int J Mol Sci 20:777. doi:10.3390/IJMS20030777

      Vivacqua G, Renzi A, Carpino G, Franchitto A, Gaudio E. 2014. Expression of brain derivated neurotrophic factor and of its receptors: TrKB and p75NT in normal and bile duct ligated rat liver. Ital J Anat Embryol 119:111–129. doi:10.13128/IJAE-15138

    2. eLife assessment

      This manuscript provides valuable insights into the underlying mechanism of Spinocerebellar ataxia 6 (SCA6), which involves defective endolysosomal trafficking of BDNF and its receptor TrkB. The findings are compelling and significant in understanding the underlying pathology of SCA6. The authors have acknowledged the experimental weaknesses and recognize there may be multiple mechanisms to explain the findings.

    3. Joint Public Review:

      Cook, Watt, and colleagues previously reported that a mouse model of Spinocerebellar ataxia type 6 (SCA6) displayed defects in BDNF and TrkB levels at an early disease stage. Moreover, they have shown that one month of exercise elevated cerebellar BDNF expression and improved ataxia and cerebellar Purkinje cell firing rate deficits. In the current work, they attempt to define the mechanism underlying the pathophysiological changes occurring in SCA6. For this, they carried out RNA sequencing of cerebellar vermis tissue in 12-month-old SCA6 mice, a time when the disease is already at an advanced stage, and identified widespread dysregulation of many genes involved in the endo-lysosomal system. Focusing on BDNF/TrkB expression, localization, and signaling, they found that, in 7-8-month-old SCA6 mice, early endosomes are enlarged and accumulate BDNF and TrkB in Purkinje cells. Curiously, TrkB appears to be reduced in the recycling endosome compartment, despite the fact that recycling endosomes are morphologically normal in SCA6. In addition, the authors describe a reduction in late endosomes in SCA6 Purkinje cells, associated with reduced BDNF levels and a probable deficit in late endosome maturation.

      Strengths:

      The article is well written, and the findings are relevant for the neuropathology of different neurodegenerative diseases where dysfunction of early endosomes is observed. The authors have provided a detailed analysis of the endo-lysosomal system in SCA6 mice. They have shown that TrkB recycling to the cell membrane in recycling endosomes is reduced, and the late endosome transport of BDNF for degradation is impaired. The findings will be crucial in understanding underlying pathology. Lastly, the deficits in early endosomes are rescued by chronic administration of 7,8-DHF.

      Weaknesses:

      The specificity of BDNF and TrkB immunostaining requires additional controls, as it has been very difficult to detect immunostaining of BDNF.

      The revised manuscript has included additional analysis using epitope retrieval and a negative liver control with the Abcam antibody against BDNF. An alternative antibody that may be considered for BDNF detection is from Icosagen AS. This antibody has been found to be effective for immunofluorescence and immunoblot purposes.

      Two other issues were brought up in the initial review process--

      1) One important concern about the conclusions is that the RNAseq experiment was conducted in 12-month-old SCA6 mice suggesting that the defects in the endo-lysosomal system may be caused by other pathophysiological events and, likewise, the impairment in BDNF signaling may also be indirect, as also noted by the authors. Indeed, Purkinje cells in SCA6 mice have an impaired ability to degrade other endocytosed cargo beyond BDNF and TrkB, most likely because of trafficking deficits that result in a disruption in the transport of cargo to the lysosomes and lysosomal dysfunction.

      This concern was acknowledged in the revision and will require further analysis.

      2) Moreover, the beneficial effects of 7,8-DHF treatment on motor coordination may be caused by 7,8-DHF properties other than the putative agonist role on TrkB. Indeed, many reservations have been raised about using 7,8-DHF as an agonist of TrkB activity. Several studies have now debunked this role (Todd et al. PlosONE 2014, PMID: 24503862; Boltaev et al. Sci Signal 2017, PMID: 28831019) or at the very least questioned it (Lowe D, Science 2017: see Discussion: https://www.science.org/content/blog-post/those-compounds-aren-t-what-you-think-they-are; Wang et al. Cell 2022 PMID: 34963057). Another interpretation is that 7,8-DHF possesses antioxidant activity and neuroprotection against cytotoxicity in HT-22 and PC12 cells, both of which do not express TrkB (Chen et al. Neurosci Lett 201, PMID: 21651962; Han et al. Neurochem Int. 2014, PMID: 24220540). Thus, while this flavonoid may have a beneficial effect on the pathophysiology of SCA6, it is most unlikely that mechanistically this occurs through a TrkB agonistic effect considering the potent anti-oxidant and anti-inflammatory roles of flavonoids in neurodegenerative diseases (Jones et al. Trends Pharmacol Sci 2012, PMID: 22980637).

      The authors have acknowledged alternative explanations for the action of 7,8-DHF and have qualified the discussion of this issue.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editor for their thoughtful and careful evaluation of our manuscript. We appreciate your time and effort and have incorporated many of these suggestions to improve our revised manuscript.

      Reviewer #1 (Public Review):

      Summary: Cullinan et al. explore the hypothesis that the cytoplasmic N- and C-termini of ASIC1a, not resolved in x-ray or cryo-EM structures, form a dynamic complex that breaks apart at low pH, exposing a C-terminal binding site for RIPK1, a regulator of necrotic cell death. They expressed channels tagged at their N- and C-termini with the fluorescent, non-canonical amino acid ANAP in CHO cells using amber stop-codon suppression. Interaction between the termini was assessed by FRET between ANAP and colored transition metal ions bound either to a cysteine-reactive chelator attached to the channel (TETAC) or metal-chelating lipids (C18-NTA). A key advantage to using metal ions is that they are very poor FRET acceptors, i.e. they must be very close to the donor for FRET to occur. This is ideal for measuring small distances/changes in distance on the scales expected from the initial hypothesis. In order to apply chelated metal ions, CHO cells were mechanically unroofed, providing access to the inner leaflet of the plasma membrane. At high pH, the N- and C-termini are close enough for FRET to be measured, but apparently too far apart to be explained by a direct binding interaction. At low pH, there was an apparent increase in FRET between the termini. FRET between ANAP on the N- and C-termini and metal ions bound to the plasma membrane suggests that both termini move away from the plasma membrane at low pH. The authors propose an alternative hypothesis whereby close association with the plasma membrane precludes RIPK1 binding to the C-terminus of ASIC1a.

      Strengths: The findings presented here are certainly valuable for the ion channel/signaling field and the technical approach only increases the significance of the work. The choice of techniques is appropriate for this study and the results are clear and high quality. Sufficient evidence is presented against the starting hypothesis.

      Weaknesses: I have a few questions about certain controls and assumptions that I would like to see discussed more explicitly in the manuscript.

      My biggest concern is with the C-terminal citrine tag. Might this prevent the hypothesized interaction between the N- and C-termini? What about the serine to cysteine mutations? The authors might consider a control experiment in channels lacking the C-terminal FP tag.

      While it is certainly possible that the C-terminal citrine tag is preventing the hypothesized interaction between the intracellular termini, there are a few things that mitigate (but do not eliminate) this concern. First, previous work looking at the interaction between the intracellular termini used FPs on both the N- and C-termini and concluded that in fact there is an interaction (PMID:31980622). Our channels have only a single FP, and we use a higher resolution FRET approach. Second, we attach our citrine tag with an 11-residue linker, allowing for enhanced flexibility of the region and hopefully allowing for more space for an interaction that was posited to be between the very proximal part of the C-terminus (near the membrane and away from the tag) and the untagged N-terminus. Third, we previously showed that Stomatin, a much larger protein than the NTD, could bind the distal C-terminus of rASIC3 with a large fluorescent protein connected by the same linker on the C-terminus. In the case of Stomatin, the interaction involved the residues at the distal portion of the C-terminus close to the bulky FP. Interestingly, while we did not publish this, without this flexible linker, Stomatin could not regulate the channel and likely did not bind.

      Despite this, we agree that this is possible and have added a statement in our limitations section explicitly saying this.

      Figure 2 supplement 1 shows apparent read-through of the N-terminal stop codons. Given that most of the paper uses N-terminal ANAP tags, this figure should be moved out of the supplement. Do N-terminally truncated subunits form functional channels? Do the authors expect N-terminally truncated subunits to co-assemble in trimers with full-length subunits? The authors should include a more explicit discussion regarding the effect of truncated channels on their FRET signal in the case of such co-assembly.

      The positions that show readthrough (E6, L18, H515) were not used in the study. We eliminated them largely on the basis of these westerns. We elected to put the bulk of the blots in the supplement simply because of how many there were. We believe this is the best compromise. It allows us to show representative blots for all our positions without making an illegible figure with 7 blots.

      The N-terminally truncated subunits would create very short peptides that are not able to create functional channels. A premature stop at say E8 would create a 7-mer. Our longest N-terminal truncation would only create a protein of 32 amino acids. These don’t contain the transmembrane segments and thus cannot make functional channels.

      As the epitope used for the western blots in Figure 2 and supplements is part of the C-terminal tag, these blots do not provide an estimate of the fraction of C-terminally truncated channels (those that failed to incorporate ANAP at the stop codon). What effect would C-terminally truncated channels have on the FRET signal if incorporated into trimers with full-length subunits?

      Alternatively, C-terminally truncated subunits would be able to form functional channels because they contain the full N-terminus, the transmembrane domains, the extracellular domain and a portion of the C-terminus. We don’t think this is a major contaminant to our experiments. The only two C-terminal ANAP positions we use are 464 and 505. In each of these cases, they are only used for memFRET. The ones that do not contain ANAP are essentially “invisible” to the experiment. Since we are measuring their proximity to the membrane, having some missing should not matter. However, there is some chance that truncations in some subunits could allosterically affect the position of the CT in other subunits. We have added a discussion of this in the manuscript.

      Some general discussion of these results in the context of trimeric channels would be helpful. Is the putative interaction of the termini within or between subunits? Are the distances between subunits large enough to preclude FRET between donors on one subunit and acceptor ions bound on multiple subunits?

      Thank you for this comment. We did not directly test whether the distances are within or between subunits. We considered using a concatemer to do this, however, the concatemeric channels do not express particularly well. Then, UAA incorporation hurts the expression as well. It was unlikely we would be able to get sufficient expression for tmFRET.

      However, the Maclean group has previously tested this using FRET between concatenated subunits and determined that FRET is stronger within than between subunits. We have updated the manuscript to reflect a more thorough discussion of our results in the context of their trimeric assembly.

      The authors conclude that the relatively small amount of FRET between the cytoplasmic termini suggests that the interaction previously modeled in Rosetta is unlikely. Is it possible that the proposed structure is correct, but labile? For example, could it be that the FRET signal is the time average of a state in which the termini directly interact (as in the Rosetta model) and one in which they do not?

      The proposed Rosetta model does not include the reentrant loop of the channel, so it is probable that this model would change if it were redone to include this new feature of the channel.

      However, we do discuss the limitation of FRET as a method that measures a time average that is weighted towards closest approach in our discussion section. The termini are most certainly dynamic and it is possible that they spend some time in close proximity. Given that FRET is biased towards closest approach, we actually think this strengthens our argument that the termini don’t spend a great deal of time in complex. In addition, our MST data suggest that the termini do not bind. We have added some commentary on this to the discussion section for clarity.
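
      The closest-approach bias mentioned above can be illustrated with a minimal sketch. It assumes the standard Förster relation, a hypothetical R0 of 15 Å, and two hypothetical, equally populated states at 10 Å and 30 Å; none of these values are taken from the manuscript.

```python
def fret_efficiency(r, r0):
    """Forster relation: efficiency at donor-acceptor separation r (same units as r0)."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def apparent_distance(efficiency, r0):
    """Invert the Forster relation to get the distance implied by an efficiency."""
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


R0 = 15.0  # hypothetical Forster distance in angstroms (assumption)

# Hypothetical two-state ensemble: (distance in angstroms, fractional occupancy).
states = [(10.0, 0.5), (30.0, 0.5)]

# A slow measurement reports the occupancy-weighted mean efficiency.
mean_efficiency = sum(p * fret_efficiency(r, R0) for r, p in states)

# Arithmetic mean of the distances, for comparison.
mean_distance = sum(p * r for r, p in states)

# Distance one would infer from the time-averaged efficiency.
r_app = apparent_distance(mean_efficiency, R0)

print(f"mean distance     = {mean_distance:.2f} A")
print(f"apparent distance = {r_app:.2f} A")
```

      Under these assumptions the apparent distance falls well below the arithmetic mean of the two states, because the close state dominates the averaged efficiency. A labile complex that formed even part of the time would therefore tend to pull the measured distance down, not hide from it.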

      Reviewer #2 (Public Review):

      Summary:

      The authors use previously characterised FRET methods to measure distances between intracellular segments of ASIC and with the membrane. The distances are measured across different conditions and at multiple positions in a very complete study. The picture that emerges is that the N- and C-termini do not associate.

      Strengths:

      Good controls, good range of measurements, advanced, well-chosen and carefully performed FRET measurements. The paper is a technical triumph. Particularly, given the weak fluorescence of ANAP, the extent of measurements and the combination with TETAC is noteworthy.

      The distance measurements are largely coherent and favour the interpretation that the N and C terminus are not close together as previously claimed.

      Weaknesses:

      One difficulty is that we do not have a positive control for what binding of something to either N- or C-terminus would look like (either in FRET or otherwise).

      We acknowledge that this is a challenge for the approach. Having a positive control for binding would be great but we are not sure such a thing exists. You could certainly imagine a complex between two domains where the labels (ANAP and TETAC) are pointed away from one another (giving comparatively modest quenching) or one where they are very close (giving comparatively large quenching), both of which could still be bound. This is essentially a less significant version of the problem with using FPs to measure proximity: they are not very good proxies for the position of the termini. These small labels are certainly better proxies but still not perfect. Our conclusion here is based more on the totality of the data. We tried many combinations and saw no sign of distances closer than ~20 Å at resting pH. We think the simplest explanation is that they are not close to one another but we tried to lay out the limitations in the discussion.

      One limitation that is not mentioned is the unroofing. The concept of interaction with intracellular domains is being examined. But the authors use unroofing to measure the positions, fully disrupting the cytoplasm. Thus it is not excluded that the unroofing disrupts that interaction. This should be mentioned as a possible (if unlikely) limitation.

      Thank you for your comment. We discuss unroofing as a potential limitation because it exposes both sides of the plasma membrane to changes in pH. We have updated this section to include acknowledgement of the possibility that unroofing disrupts the interaction via washout of other critical proteins.

      Reviewer #3 (Public Review):

      Summary: The manuscript by Cullinan et al., uses ANAP-tmFRET to test the hypothesis that the NTD and CTD form a complex at rest and to probe these domains for acid-induced conformational changes. They find convincing evidence that the NTD and CTD do not have a propensity to form a complex. They also report these domains are parallel to the membrane and that the NTD moves towards, and the CTD away, from the membrane upon acidification.

      Strengths:

      The major strength of the paper is the use of tmFRET, which excels at measuring short distances and is insensitive to orientation effects. The donor-acceptor pairs here are also great choices as they are minimally disruptive to the structure being studied.

      Furthermore, they conduct these measurements over several positions with the N and C tails, both between the tails and to the membrane. Finally, to support their main point, MST is conducted to measure the association of recombinant N and C peptides, finding no evidence of association or complex formation.

      Weaknesses:

      While tmFRET is a strength, using ANAP as a donor requires the cells to be unroofed to eliminate background signal. This causes two problems. First, it removes any possible low affinity interacting proteins such as actinin (PMID 19028690). Second, the pH changes now occur to both 'extracellular' and 'intracellular' lipid planes. Thus, it is unclear if any conformational changes in the N and CTDs arise from desensitization of the receptor or protonation of specific amino acids in the N or CTDs or even protonation of certain phospholipid groups such as in phosphatidylserine. The authors do comment that prolonged extracellular acidification leads to intracellular acidification as well. But the concerns over disruption by unroofing/washing and relevance of the changes remain.

      We acknowledge that unroofing is a limitation of our approach and noted it in the discussion. However, we have updated the section to include the possibility that the act of unroofing and washing could also disrupt the potential interaction between the intracellular domains as well as between these domains and other intracellular proteins. This was the best approach we could use to address our questions and it required that we unroof the cells. However, we look forward to future studies or new techniques that do not require the unroofing of the cells.

      The distances calculated depend on the R0 between donor and acceptor. In turn, this depends on the donor's emission spectrum and quantum yield. The spectrum and yield of ANAP are very sensitive to local environment. It is a useful fluorophore for patch fluorometry for precisely this reason, and gating-induced conformational changes in the CTD have been reported just from changes in ANAP emission alone (PMID 29425514). Therefore, using a single R0 value for all positions (and both pHs at a single position) is inappropriate. The authors should either include this caveat and give some estimate of how big an impact changes in spectrum and yield might have, or actually measure the emission spectra at all positions tested.

      This is a reasonable concern and one we considered. Measuring the quantum yield would be quite difficult. However, we have measured spectra at a number of positions and see a relatively minimal shift in the peak. Most positions peak between 481 and 484 nm. If you calculate the difference in R0 using theoretical spectra with a blue shift of 20 nm, the difference in R0 is only ~1.5 Å. A shift of 20 nm is on the higher side of anything we have seen in the literature (PMID 30038260) and since even with that large a shift, the difference is minimal, we do not think measuring spectra for each position would impact the overall conclusions presented. As you noted, though, the quantum yield also changes. Assuming a change in yield from 0.22 to 0.47, the largest we found reported in the literature (PMID:29923827), the R0 would increase by 2 Å. This same paper showed that the blue-shifted position was the one with the higher extinction coefficient, so these changes would be working in opposition to one another, making the difference in R0 even smaller. It is important to note that while tmFRET is a much more powerful measure of distance than standard FRET, these distances, as you point out, are quite challenging to measure precisely. Our conclusions are based less on the absolute distances and more on the observation that no positions show large quenching and that if there is any change upon acidification, it is in the wrong direction.
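
      The size of the quantum-yield effect can be checked with a short sketch. In Förster theory R0 scales with the sixth root of the donor quantum yield when spectral overlap, orientation factor, and refractive index are held fixed. The yields 0.22 and 0.47 are the values quoted above; the baseline R0 of 15 Å is a hypothetical value chosen only for illustration and is not taken from the manuscript.

```python
def scale_r0_for_yield(r0, yield_old, yield_new):
    """Rescale a Forster distance for a change in donor quantum yield.

    Assumes everything else in the Forster equation (spectral overlap,
    orientation factor, refractive index) is unchanged, so that R0 is
    proportional to the sixth root of the quantum yield.
    """
    return r0 * (yield_new / yield_old) ** (1.0 / 6.0)


R0_BASE = 15.0  # hypothetical baseline R0 in angstroms (assumption)

r0_new = scale_r0_for_yield(R0_BASE, yield_old=0.22, yield_new=0.47)
delta = r0_new - R0_BASE

print(f"R0 at yield 0.22: {R0_BASE:.2f} A")
print(f"R0 at yield 0.47: {r0_new:.2f} A (change of {delta:.2f} A)")
```

      With this assumed baseline, a more than twofold change in yield moves R0 by about 2 Å, which illustrates how strongly the sixth-root dependence damps the effect of environmental sensitivity on the calculated distances.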

      Overall, the writing and presentation of figures could be much improved with specific points mentioned in the recommendations for authors section.

      See below.

      The authors argue that the CTD is largely parallel to the plasma membrane, yet appear to base this conclusion on ANAP to membrane FRET of positions S464 and M505. Two positions is insufficient evidence to support such a claim. Some intermediate positions are needed.

      We do not see where in the paper we suggest that the CTD is parallel. Your point that we could try to determine whether this is the case is well taken; however, we attempted to create several other CTD TAG mutants but struggled with readthrough and poor expression of these mutants, so we opted to include only S464 and M505. Our point from these data is only that the distal CTD (505) must spend significant time near the membrane to explain our FRET data.

      Upon acidification, NTD position Q14 moves towards the plasma membrane (Figure 8B). Q14 also gets closer to C515 or doesn't change relative to 505 (Figures 7C and B) upon acidification. Yet position 505 moves away from the membrane (Figure 8D). How can the NTD move closer to the membrane, and to the CTD but yet the CTD move further from the membrane? Some comment or clarification is needed.

      This is a reasonable question and one that is hard to definitively answer. Our goal here was to test the hypothesis that the termini are bound at rest. Mapping the precise positions of the termini is difficult for reasons we will enumerate in the question that asks why we didn’t make a model. There are potentially multiple explanations but the easiest one would be that the CTD could move away from the membrane but closer to Q14, for instance, if the distal termini, say, rotated towards the NTD. This would move 505 closer and have no impact on whether or not the NTD and CTD moved away or toward the membrane.

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns

      The authors show the spectrum of ANAP attached to beads and use this spectrum to calculate R0 for their FRET measurements. Peak ANAP fluorescence is dependent on local environment and many reports show ANAP in protein blue-shifted relative to the values reported here. How would this affect the distance measurements reported?

      This is an important point. See above for the answer.

      Could the lack of interaction between the N- and C-terminal peptides in Figure 7 arise from the cysteine to serine mutations or lack of structure in the synthetic peptides? How were peptide concentrations measured/verified for the experiment?

      It is possible that cysteine to serine mutations could prevent the interaction. It is also possible that these peptides are not capable of adopting their native fold without the presence of the plasma membrane or due to being synthetically created. However, the termini are thought to be largely unstructured. We received these peptides in lyophilized form at >95% purity and resuspended to our desired stock concentration (3 mM C-terminus, 1 mM N-terminus). Even if our concentration was off, we see no signs of interaction up to quite a high concentration.

      How was photobleaching measured for correcting the data?

      We executed several mock experiments at various TAG positions using either pH 8 or pH 6, where we performed the experiments as usual but with a mock solution exchange when we would normally add the metal. We normalized the L-ANAP fluorescence to the first image and averaged together these values for pH 8 and pH 6. We then corrected using Equation 2 in the manuscript.

      We have updated the methods to include how we adjusted for bleaching.
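
      The normalization and averaging steps described above can be sketched as follows. All intensity values are hypothetical, and the final correction step (dividing the normalized trace by the mean mock trace) is an assumption made for illustration; it is one common choice and is not a reproduction of Equation 2 in the manuscript.

```python
def normalize_to_first(trace):
    """Express each frame as a fraction of the first frame's intensity."""
    first = trace[0]
    return [value / first for value in trace]


def mean_trace(traces):
    """Average several equal-length traces frame by frame."""
    count = len(traces)
    return [sum(frame_values) / count for frame_values in zip(*traces)]


def correct_for_bleaching(trace, bleach_fraction):
    """Divide a normalized trace by the expected bleaching fraction per frame.

    ASSUMPTION: simple division; the manuscript's Equation 2 may differ.
    """
    normalized = normalize_to_first(trace)
    return [value / b for value, b in zip(normalized, bleach_fraction)]


# Hypothetical mock experiments (no metal added), three frames each:
# initial image, image at the mock solution exchange, image after washout.
mock_runs = [
    [1000.0, 950.0, 900.0],
    [800.0, 768.0, 728.0],
]
bleach = mean_trace([normalize_to_first(run) for run in mock_runs])

# Hypothetical real experiment: initial, with metal (quenched), recovery.
measured = [1200.0, 684.0, 1086.0]
corrected = correct_for_bleaching(measured, bleach)

print("bleaching fraction per frame:", [round(b, 3) for b in bleach])
print("corrected fluorescence:", [round(c, 3) for c in corrected])
```

      In this toy example the recovery frame returns to the initial level after correction, so the remaining drop in the middle frame can be attributed to quenching and not to photobleaching.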

      The authors may wish to make it more explicit that their Zn2+ controls also preclude the possibility that a changing FRET signal between ANAP and citrine may affect their data.

      Thank you for this comment. We agree, it would strengthen the manuscript to include this statement. We have now included this.

      It might be useful to the reader if the authors could include (as a supplement) plots of their data (like in Figure 6), in which FRET efficiency has been converted to distance.

      We considered this idea as well but felt like showing the actual data in the figures and the distances in a table would be best.

      Figure 5D is mentioned in the text before any other figures. This is unconventional. Could this panel be moved to Figure 1 or the mention moved to later?

      Changed

      western blot is not capitalized.

      Changed.

      Figure 1, the ANAP structure shown is the methyl ester, which is presumably cleaved before ANAP is conjugated to the tRNA. The authors may wish to replace this with the free acid structure.

      This is a fair point. We originally used the methyl ester structure to indicate the version of ANAP we chose to use. However, you are correct that the methyl ester is cleaved before conjugation to the tRNA. We replaced the methyl ester with the free acid structure to clarify this.

      Figures 1 and 4 should have scale bars for the images.

      Scale bars have been added to figures 1, 4, and 5.

      In Figure 3, the letters in the structures (particularly TETAC) are way too small. Please increase the font size.

      Changed

      In Figure 3 and Figure 3 supplement 1, the axes are labeled "Absorbance (M-1cm-1)." Absorbance is dimensionless. The authors are likely reporting the extinction coefficient.

      Thank you for catching this. We adjusted the axes to extinction coefficient.

      In Figures 5 B and C, it might be clearer if the headers read "Initial, +Cu2+/TETAC, DTT" rather than "Initial, FRET, Recovery."

      Changed

      The panel labels for Figure 8 seem to be out of order.

      Changed

      The L for L-ANAP should be rendered, by convention, in small caps.

      This is a good example of learning something new from the review process. This is the first I have ever heard of small caps. We can find no other papers that use small caps for L-ANAP so I am not 100% sure what convention this is referring to and don’t want to change the wrong thing in the paper. We are happy to change if the editorial staff at eLife agree but have left this for now.

      Reviewer #2 (Recommendations For The Authors):

      With so many distances measured, why was not even a basic structural model attempted?

      We certainly considered it, but a number of things led us to conclude that it might imply more certainty about the structure of these termini than we hope to give. 1) Given that the FRET is a time average of positions, these distance constraints would not do much constraining. 2) Given that the termini are likely unstructured and flexible, this makes the problem in 1 worse. 3) There is no structural information to use as a starting point for a model. 4) The flexibility of the linkers for each FRET pair also introduces uncertainty. This can, in theory, be modeled as they do in EPR but all of this together made us decide not to do this. What we hope readers take home is that the overall picture of the data is not consistent with the original RIPK1 hypothesis.

      Maybe it would be good to draw a band on the graphs in Figure 6 for the FRET signal expected for interaction (and thus, disfavoured by these data). This would at least give context.

      We agree this could be helpful, but it is not so easy to do. What distance would we choose? We could put a line at ~5Å (the model predicted distance). As we noted above, a number of distances could be compatible with an interaction. However, we think it’s unlikely that if a complex was formed that none of our measurements would show a distance closer than 20Å at rest and that an unbinding event would then lead to a decrease in distance. This, to us, is the take home message.

      Minor points:

      "After unroofing the cells, only fluorescence associated with the "footprint", or dorsal surface, of the cell membrane is left behind."

      The authors use dorsal and ventral in this section to describe parts of an adherent cell. But in the first instance, they remove the dorsal part of the cell, and then in this phrase, the dorsal part is left behind... I am a bit confused.

      Thank you for pointing out this mistake, we have fixed this. It is indeed the ventral surface left behind.

      "bind at rest an" - and?

      Changed

      "One previous study used a different approach to try and map the topography of the intracellular termini of ASIC1a comparable to our memFRET experiments." I think a citation is due.

      Citation added

      "great deal of precedent" even if this result is from my own lab, I would prefer that the authors note that it's one study from one lab! I think best just to delete "great deal of".

      “Great deal of” deleted

      I think the column "Significance" in the tables is unnecessary when the P value is given.

      Thank you for this suggestion. We agree and have made the change.

      Figure 7a Q14TAG has a clearly bimodal distribution at pH 8. What could be the meaning of this result? The authors do not mention it that I could find. Perhaps there is no meaning. The authors should state what they think is (or is not) going on.

      This is a good question and we don’t have a good answer. It appears to be experimental variability. The data from the “low FRET” population in this experimental condition all came from the same day, so something was different that day. We considered that they might be outliers to exclude but thought showing all of our data was the better path. We reperformed the ANOVA here separating out the “outlier” day and nothing of substance changed. Both populations were still different with P value less than 0.001.

      Typo: Lumencore

      Changed

      Maybe just a matter of taste but the panel created with Biorender in Figure 8 is not attractive and depicts the channel differently to in Figure 5D, which is again different from Figure 1A. Surely one advantage of using computer-generated artwork could be to have consistency.

      We agree and have used the same cartoon for all of our images with the one exception being the schematics that are just meant to show the positions that are present in each bar graph.

      Figure 4A was squashed to fit (text aspect ratio is wrong).

      Fixed

      Reviewer #3 (Recommendations For The Authors):

      Citrine is used to report incorporation. Yet citrine has a strong tendency to dimerize (PMID 27240257). Did the authors use mCitrine or just Citrine? This is quite important in interpreting their data.

      Thank you for pointing out this important distinction. We use mCitrine, which we have added to the methods.

      The manuscript has numerous instances of imprecise language. For example, page 10, last para, first line, "previous studies have looked at..." or page 7, final paragraph "tell a similar story". Related, the figures could be much better. For example, in Figure 1, where the authors depict the anap chemical in red, as opposed to the blue one might expect of a blue emitting fluorophore. In Figure 6, ANAP is also in red with the quenching group in green. This is opposite to how one typically thinks of FRET with the warmer color being the acceptor not the donor. Moreover, the pH 6 condition is also colored the same shade of red as the ANAP. Labels of Cys positions would again be useful here. In Figure 3, the heteroatoms of TETAC and C18-NTA are very small and difficult to see. It would also be good to label these structures, and the spectra below, so the reader can tell at a glance without looking at the caption, what the structures and spectra arise from. Also, how are the absorption spectra normalized? This is not discussed in the methods. The lack of attention to presentation mars an otherwise nice study.

      Thank you for these points. We have made modifications to the manuscript to address these comments.

      Abstract, second last line "After prolonged acidification, ...", 'prolonged' could be interpreted as 'it takes a while for the domain to move' or 'the movement only happens after a while'. This is not what the authors intend to convey. Consider modifying to just 'After acidification,'

      We updated the main text to indicate that prolonged acidification is intended to describe acidification that occurs over the minutes timescale.

      Pdf page 6, bottom para on Anap incorporation not altering channel function: What is meant by 'steady state pH dependence of activation'? This implies the authors applied a pH stimulus, then waited until equilibrium was achieved ie. until desensitization was complete and measured the current at that point. It seems more likely they simply applied different pH stimuli and measured the peak response and that the use of 'steady state' here is a typo.

      We removed the phrase steady state.

      Same section, controls of electrophysiology allude to 485, 505 and 515 ANAP-containing channels. In fact, the authors have no way of determining what fraction (if any) of the pH evoked currents arise from channels containing Anap in those positions versus from simply having a translation stop but still functioning. This should be mentioned.

      This is correct. We cannot be sure the CTD TAG positions are not a mixture of ANAP-containing channels and truncations. See above for why we do not think this is a big concern for the FRET experiments. Functionally, though, you are correct that we cannot tell. We now mention this in the paper.

      Methods, the abbreviation for SBT should be defined somewhere.

      Added.

      Methods, unroofing section, middle paragraph, the authors use nM not nm to list wavelengths of light.

      Changed.

      Figure 3C-D: There's an unexpected blip in the Anap emission spectra at ~500 nm. Are the grating efficiency of the spectrograph and quantum efficiency of the camera accounted for in these spectra?

      This is a good question. The data are not corrected for either camera efficiency or grating efficiency. We don’t have easy access to the actual data (although we can see a pdf version of each). There is a little blip in the grating efficiency graph that could partly explain the blip in our spectra.

      Figure 5C, were recovery experiments routinely done? If so, would be good to show more than n = 1 in the plot to get an idea of reproducibility.

      Recovery experiments were done in every experiment but are not shown for simplicity. We have included all FRET and recovery data for position Q14TAG-C469 at pH 6 in figure 5C to show reproducibility of our FRET and recovery data.

      Table 1, consider adding a Δ distance column (pH 8 versus 6) so the magnitude of changes is more easily seen.

      This is a reasonable suggestion but we decided not to include a Δ distance column. The data are whole numbers and people can easily determine the Δ distance. We felt that including that column would bring too much focus on what we think are pretty small changes. Our hope is that readers take away that the data are not consistent with complex formation between the termini and focus less on absolute distances.

      Figure 7A, Q14tag pH 8 condition has quite a bit of spread and, likely, two populations. These data, as well as G11, are unlikely to be parametric and hence ANOVA is inappropriate. A normality test, and likely a Kruskal-Wallis test, is called for.

      After testing for normality, we found that the data for Q14TAG C485 pH 8 are not normally distributed. However, the Kruskal-Wallis test is a non-parametric alternative to a one-way ANOVA and is not applicable here. We separated the data into populations 1 and 2 and repeated the two-way ANOVA. When Q14TAG pH 8 is split into two populations, the statistics hardly change. When the data are not separated, Q14TAG pH 8 relative to pH 6 has a p-value <0.0001. When the two populations are separated, both populations relative to Q14TAG pH 6 still have a p-value of <0.0001.
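      To make the one-way nature of the Kruskal-Wallis test concrete, the following is a minimal sketch of its H statistic in plain Python. It is an illustration only and not the analysis used in the manuscript; the function names and the example values are hypothetical. The statistic pools all observations, ranks them, and compares rank sums across the levels of a single grouping factor, which is why it has no direct counterpart for a two-factor (position by pH) design.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of groups (one factor only)."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)
    # Standard tie correction; it is zero only if every value is identical.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n_total ** 3 - n_total)
    return h / correction if correction > 0 else 0.0


# Hypothetical, fully separated groups (not data from the study).
example_h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

      Under the null hypothesis, H is usually compared against a chi-squared distribution with (number of groups − 1) degrees of freedom; that step is omitted here to keep the sketch free of external dependencies.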

    2. eLife assessment

      This valuable study illuminates molecular movements of acid-sensing ion channels by combining advanced chemical biology and biophysical techniques. The evidence for the main claim, lack of interaction of molecular termini, is compelling and challenges prior models. This work is expected to pique interest in the ion channel signaling field, providing a fresh perspective.

    3. Reviewer #1 (Public Review):

      Cullinan et al. explore the hypothesis that the cytoplasmic N- and C-termini of ASIC1a, not resolved in x-ray or cryo-EM structures, form a dynamic complex that breaks apart at low pH, exposing a C-terminal binding site for RIPK1, a regulator of necrotic cell death. They expressed channels tagged at their N- and C-termini with the fluorescent, non-canonical amino acid ANAP in CHO cells using amber stop-codon suppression. Interaction between the termini was assessed by FRET between ANAP and colored transition metal ions bound either to a cysteine-reactive chelator attached to the channel (TETAC) or metal-chelating lipids (C18-NTA). A key advantage to using metal ions is that they are very poor FRET acceptors, i.e. they must be very close to the donor for FRET to occur. This is ideal for measuring small distances/changes in distance on the scales expected from the initial hypothesis. In order to apply chelated metal ions, CHO cells were mechanically unroofed, providing access to the inner leaflet of the plasma membrane. At high pH, the N- and C-termini are close enough for FRET to be measured, but apparently too far apart to be explained by a direct binding interaction. At low pH, there was an apparent increase in FRET between the termini. FRET between ANAP on the N- and C-termini and metal ions bound to the plasma membrane suggests that both termini move away from the plasma membrane at low pH. The authors propose an alternative hypothesis whereby close association with the plasma membrane precludes RIPK1 binding to the C-terminus of ASIC1a.
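      The point about metal ions being poor acceptors can be illustrated with the standard Förster relation, E = 1 / (1 + (r / R0)^6), where R0 is the distance at which transfer is 50% efficient. The sketch below is a generic illustration, not the authors' analysis pipeline; the function names are hypothetical and the R0 of 12 Å is an assumed placeholder rather than a value taken from the manuscript. A short R0 confines the steep part of the curve to short distances, which is what makes such pairs sensitive to small movements.

```python
def fret_efficiency(distance, r0):
    """Förster relation: transfer efficiency at a donor-acceptor distance."""
    return 1.0 / (1.0 + (distance / r0) ** 6)


def distance_from_efficiency(efficiency, r0):
    """Invert the Förster relation; efficiency must lie strictly in (0, 1)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must be between 0 and 1 (exclusive)")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


ASSUMED_R0 = 12.0  # Å; placeholder for illustration, not a measured value

# Efficiency falls off steeply once the distance exceeds R0.
for r in (8.0, 12.0, 16.0, 24.0):
    print(f"r = {r:4.1f} Å  ->  E = {fret_efficiency(r, ASSUMED_R0):.3f}")
```

      Because efficiency depends on the sixth power of r / R0, doubling the distance from R0 to 2·R0 drops the efficiency from 1/2 to 1/65, so pairs with a short R0 report essentially nothing unless donor and acceptor are close.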

      The findings presented here are certainly valuable for the ion channel/signaling field and the technical approach only increases the significance of the work. The choice of techniques is appropriate for this study and the results are clear and high quality. Sufficient evidence is presented against the starting hypothesis. I have a few questions about certain controls and assumptions that I would like to see discussed more explicitly in the manuscript.

      --As discussed by the authors, the C-terminal citrine could potentially disrupt the hypothesized interaction between the N- and C-termini.

      --There is apparent read-through of some of the stop codons in the absence of ANAP, which could complicate interpretation of the experiments. The largest amount of read-through is for the E6TAG, L18TAG, and H515TAG constructs, which were not used for further experiments. However, some degree of read-through is evident from western blots for V10TAG, Q14TAG, L41TAG, and A44TAG as well.

      Since the epitope used for western blots is on the C-terminus of the protein, the blots do not show the fraction of truncated protein. As discussed by the authors, N-terminally truncated constructs would be too small to assemble into channels. In constructs with the TAG codon towards the C-terminus, there is the potential for co-assembly of full-length and truncated subunits into trimers. Truncated subunits would not contribute directly to the fluorescence signal, but could potentially have allosteric effects on the position of the C-termini of full-length ANAP-tagged constructs in the context of a mixed channel.

    4. Reviewer #2 (Public Review):

      Summary:<br /> The authors use previously characterised FRET methods to measure distances between intracellular segments of ASIC and with the membrane. The distances are measured across different conditions and at multiple positions in a very complete study. The picture that emerges is that the N- and C-termini do not associate.

      Strengths:<br /> Good controls, good range of measurements, advanced, well-chosen and carefully performed FRET measurements. The paper is a technical triumph. Particularly, given the weak fluorescence of ANAP, the extent of measurements and the combination with TETAC is noteworthy.

      The distance measurements are largely coherent and favour the interpretation that the N and C terminus are not close together as previously claimed.

      Weaknesses:<br /> One difficulty, which admittedly is hard to address, is that we do not have a positive control for what binding of something to either N- or C-terminus would look like (either in FRET or otherwise).

      One limitation is unroofing. The concept of interaction with intracellular domains is being examined. But the authors use unroofing to measure the positions, fully disrupting the cytoplasm. Thus it is not excluded that the unroofing disrupts that interaction. But this limitation is discussed adequately in the text.

    5. Reviewer #3 (Public Review):

      Summary: The manuscript by Cullinan et al., uses ANAP-tmFRET to test the hypothesis that the NTD and CTD form a complex at rest and to probe these domains for acid-induced conformational changes. They find convincing evidence that the NTD and CTD do not have a propensity to form a complex. They also report these domains are parallel to the membrane and that the NTD moves towards, and the CTD away, from the membrane upon acidification.

      Strengths:<br /> The major strength of the paper is the use of tmFRET, which excels at measuring short distances and is insensitive to orientation effects. The donor-acceptor pairs here are also great choices as they are minimally disruptive to the structure being studied.

      Furthermore, they conduct these measurements over several positions with the N and C tails, both between the tails and to the membrane. Finally, to support their main point, MST is conducted to measure the association of recombinant N and C peptides, finding no evidence of association or complex formation.

      Weaknesses:<br /> While tmFRET is a strength, using ANAP as a donor requires the cells to be unroofed to eliminate background signal. This causes two problems. First, it removes any possible low affinity interacting proteins such as actinin (PMID 19028690). Second, the pH changes now occur to both 'extracellular' and 'intracellular' lipid planes. Thus, it is unclear if any conformational changes in the N and CTDs arise from desensitization of the receptor or protonation of specific amino acids in the N or CTDs or even protonation of certain phospholipid groups such as in phosphatidylserine. The authors do mention this caveat. But until a new approach is developed, the concerns over disruption by unroofing/washing and relevance of the changes remain.

      Upon acidification, NTD position Q14 moves towards the plasma membrane (Figure 8B). Q14 also gets closer to C515 or doesn't change relative to 505 (Figures 7C and B) upon acidification. Yet position 505 moves away from the membrane (Figure 8D). It's unclear how the NTD moves closer to the membrane, and to the CTD but yet the CTD moves further from the membrane. Future experiments or approaches may refine this model.

    1. Reviewer #2 (Public Review):

      Summary:<br /> ECM components are prominent constituents of the pericellular environment of CNS cells and form complex and dynamic interactomes in the pericellular spaces. Based on bioinformatic analysis, more than 300 genes have been attributed to the so-called matrisome, many of which are detectable in the CNS. Yet, not much is known about their functions while increasing evidence suggests important contributions to developmental processes, neural plasticity, and inhibition of regeneration in the CNS. In this respect, the present work offers new insights and adds interesting aspects to the facets of ECM contributions to neural development. This is even more relevant in view of the fact that neurocan has recently been identified as a potential risk gene for neuropsychiatric diseases. Because ECM components occur in the interstitial space and are linked in interactomes their study is very difficult. A strength of the manuscript is that the authors used several approaches to shed light on ECM function, including proteome studies, the generation of knockout mouse lines, and the analysis of in vivo labeled neural progenitors. This multi-perspective approach permitted to reveal hitherto unknown properties of the ECM and highlighted its importance for the overall organization of the CNS.

      Strengths:<br /> Systematic analysis of the ternary complex between neurocan, TNC, and hyaluronic acid; establishment of KO mouse lines to study the function of the complex; use of in utero electroporation to investigate the impact on neuronal migration.

      Weaknesses:<br /> The analysis is focused on neuronal progenitors; however, the potential impact of the molecules of interest, in particular their removal, on differentiation and/or survival of neural stem/progenitor cells is not addressed. The potential receptors involved are not considered. It also seems that rather the passage to the outer areas of the forming cortex is compromised, which is not the same as the migration process. The movement of the cells is not included in the analysis.

    2. Reviewer #1 (Public Review):

      Summary:<br /> In the present study, the authors identified the ternary complex formed by NCAN, TNC, and HA as an important factor facilitating the multipolar to bipolar transition in the intermediate zone (IZ) of the developing cortex. NCAN binds HA via the N-terminal Link modules, while TNC cross-links NCAN through the CDL domain at the C-terminal. The expression and correct localization of these three factors facilitate the multipolar-bipolar transition necessary for immature neurons to migrate radially. TNC and NCAN are also involved in neuronal morphology. The authors used a wide range of techniques to study the interaction between these three molecules in the developing cortex. In addition, single and double KO mice for NCAN and TNC were analyzed to decipher the role of these molecules in neuronal migration and morphology.

      Strengths:<br /> The study of the formation of the cerebral cortex is crucial to understanding the pathophysiology of many neurodevelopmental disorders associated with malformation of the cerebral cortex. In this study, the authors showed, for the first time, that the ternary complex formed by NCAN, TNC, and HA promotes neuronal migration. The results regarding the interaction between the three factors forming the ternary complex are convincing.

      Weaknesses:<br /> However, regarding the in vivo experiments, the authors should consider some points for the interpretation of the results:<br /> -The authors did not use the proper controls in their experiments. For embryonic analysis, such as cortical migration, neuronal morphology, and protein distribution (Fig. 6, 7, and 9), mutant mice should be compared with control littermates, since differences in the results could be due to differences in embryonic stages. For example, in Fig. 6 the dKO is more developed than the WT embryo.<br /> -The authors claim that NCAN and TNC are involved in neuronal migration from experiments using single KO embryos. This is a strong statement considering the mild results, with no significant difference in the case of TNC KO embryos, and once again, using embryos from different litters.<br /> -The measurement of immunofluorescence intensity is not the right method to compare the relative amount of protein between control and mutant embryos unless there is proper normalization.

    1. Reviewer #3 (Public Review):

      Summary:<br /> The authors sought to determine, at the level of individual presubiculum pyramidal cells, how allocentric spatial information from the retrosplenial cortex was integrated with egocentric information from the anterior thalamic nuclei. Employing a dual opsin optogenetic approach with patch clamp electrophysiology, Richevaux and colleagues found that around three-quarters of layer 3 pyramidal cells in the presubiculum receive monosynaptic input from both brain regions. While some interesting questions remain (e.g. the role of inhibitory interneurons in gating the information flow through different layers of presubiculum), this paper provides valuable insights into the microcircuitry of this brain region and the role that it may play in spatial navigation.

      Strengths:<br /> One of the main strengths of this manuscript was that the dual opsin approach allowed the direct comparison of different inputs within an individual neuron, helping to control for what might otherwise have been an important source of variation. The experiments were well-executed and the data was rigorously analysed. The conclusions were appropriate to the experimental questions and were well-supported by the results. These data will help to inform in vivo experiments aimed at understanding the contribution of different brain regions in spatial navigation and could be valuable for computational modelling.

      Weaknesses:<br /> Some attempts were made to gain mechanistic insights into how inhibitory neurotransmission may affect processing in the presubiculum (e.g. Figure 5) but these experiments were a little underpowered and the analysis carried out could have been more comprehensively undertaken, as was done for other experiments in the manuscript.

    2. eLife assessment

      Richevaux and colleagues conducted a valuable study that investigated the integration of thalamic and retrosplenial inputs in the dorsal presubiculum, an essential hippocampal region involved in spatial navigation and memory. Through ex vivo optogenetic electrophysiological experiments, they discovered that many presubicular pyramidal cells receive convergent inputs from both the anterior thalamus and the retrosplenial cortex. These solid findings provide a potential cellular mechanism for anchoring the brain's internal compass to external landmarks, shedding light on how the brain integrates spatial information with an animal's sense of its position in space.

    3. Reviewer #1 (Public Review):

      Summary:<br /> In this manuscript, the authors use anatomical tracing and slice physiology to investigate the integration of thalamic (ATN) and retrosplenial cortical (RSC) signals in the dorsal presubiculum (PrS). This work will be of interest to the field, as the postsubiculum is thought to be a key region for integrating internal head direction representations with external landmarks. The main result is that ATN and RSC inputs drive the same L3 PrS neurons, which exhibit superlinear summation to near-coincident inputs. Moreover, this activity can induce bursting in L4 PrS neurons, which can pass the signals to LMN (perhaps gated by cholinergic input).

      Strengths:<br /> The slice physiology experiments are carefully done. The analyses are clear and convincing, and the figures and results are well-composed. Overall, these results will be a welcome addition to the field.

      Weaknesses:<br /> The conclusions about the circuit-level function of L3 PrS neurons sometimes outstrip the data, and their model of the integration of these inputs is unclear. I would recommend some revision of the introduction and discussion. I also had some minor comments about the experimental details and analysis.

      Specific major comments:<br /> 1) I found that the authors' claims sometimes outstrip their data, given that there were no in vivo recordings during behavior. For example, in the abstract, their results indicate "that layer 3 neurons can transmit a visually matched HD signal to medial entorhinal cortex", and in the conclusion they state "[...] cortical RSC projections that carry visual landmark information converge on layer 3 pyramidal cells of the dorsal presubiculum". However, they never measured the nature of the signals coming from ATN and RSC to L3 PrS (or signals sent to downstream regions). Their claim is somewhat reasonable with respect to ATN, where the majority of neurons encode HD, but neurons in RSC encode a vast array of spatial and non-spatial variables other than landmark information (e.g., head direction, egocentric boundaries, allocentric position, spatial context, task history to name a few), so making strong claims about the nature of the incoming signals is unwarranted.

      2) Related to the first point, the authors hint at, but never explain, how coincident firing of ATN and RSC inputs would help anchor HD signals to visual landmarks. Although the lesion data (Yoder et al. 2011 and 2015) support their claims, it would be helpful if the proposed circuit mechanism was stated explicitly (a schematic of their model would be helpful in understanding the logic). For example, how do neurons integrate the "right" sets of landmarks and HD signals to ensure stable anchoring? Moreover, it would be helpful to discuss alternative models of HD-to-landmark anchoring, including several studies that have proposed that the integration may (also?) occur in RSC (Page & Jeffrey, 2018; Yan, Burgess, Bicanski, 2021; Sit & Goard, 2023). Currently, much of the Discussion simply summarizes the results of the study; this space could be better used in mapping the findings to the existing literature on the overarching question of how HD signals are anchored to landmarks.

    4. Reviewer #2 (Public Review):

      Richevaux et al investigate how anterior thalamic (AD) and retrosplenial (RSC) inputs are integrated by single presubicular (PrS) layer 3 neurons. They show that these two inputs converge onto single PrS layer 3 principal cells. By performing dual-wavelength photostimulation of these two inputs in horizontal slices, the authors show that in most layer 3 cells, these inputs summate supra-linearly. They extend the experiments by focusing on putative layer 4 PrS neurons, and show that they do not receive direct anterior thalamic nor retrosplenial inputs; rather, they are (indirectly) driven to burst firing in response to strong activation of the PrS network.

      This is a valuable study, that investigates an important question - how visual landmark information (possibly mediated by retrosplenial inputs) converges and integrates with HD information (conveyed by the AD nucleus of the thalamus) within PrS circuitry. The data indicate that near-coincident activation of retrosplenial and thalamic inputs leads to non-linear integration in target layer 3 neurons, thereby offering a potential biological basis for landmark + HD binding.

      The main limitations relate to the anatomical annotation of 'putative' PrS L4 neurons, and to the presentation of retrosplenial/thalamic input modularity. Specifically, more evidence should be provided to convincingly demonstrate that the 'putative L4 neurons' of the PrS are not distal subicular neurons (as the authors' anatomy and physiology experiments seem to indicate). The modularity of thalamic and retrosplenial inputs could be better clarified in relation to the known PrS modularity.

    1. Author Response

      eLife assessment

      This paper by Aitchison and colleagues describes nanobody neutralizing and binding activity against various SARS-CoV-2 variants of concern. The findings are important in that the described nanobodies may have broad therapeutic relevance against current and future variants of concern and may be able to avoid significant resistance. The claims are incomplete: while the study is well-executed and uses a nice balance of biochemical and cellular assays, the efficacy of the proposed nanobody library against VOCs is not completely supported as IC50 values appear to increase against newer variants and are higher than previously used therapeutic bNAbs, animal data showing in vivo efficacy is lacking, and protection against future possible variants is not proven.

      This manuscript is a follow-up of our previous eLife manuscript “Highly synergistic combinations of nanobodies that target SARS-CoV-2 and are resistant to escape” https://elifesciences.org/articles/73027 where we described an “impressive collection of hundreds of new nanobodies binding SARS-CoV-2 spike by combining in vivo antibody affinity maturation and proteomics. [Editor’s evaluation]”. As a follow-up, this submission extends the findings of our previous eLife publication and thus focuses on how our repertoire functions in the context of a rapidly evolving SARS-CoV-2 virus, relying on the established methodologies and approaches of the original paper. We explore how nanobody functions have been influenced by the emergence of SARS-CoV-2 variants containing extensive mutations in spike protein, which largely reduced the usefulness of therapeutic monoclonal antibodies. Our findings show that while some nanobodies lost efficacy in binding to and neutralizing these evolved spikes, a surprising number of nanobodies retained their binding and neutralization activity. This is an important finding, because these efficacious nanobodies target regions that appear rarely targetable by monoclonal antibodies. We also provide experimental validation of the importance of the interplay between binding and neutralization in synergy experiments, where even weakened binding still contributed to strongly enhancing the neutralization.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Ketaren, Mast, Fridy et al. assessed the ability of a previously generated llama nanobody library (Mast, Fridy et al. 2021) to bind and neutralize SARS-CoV-2 delta and omicron variants. The authors identified multiple nanobodies that retain neutralizing and/or binding capacity against delta, BA.1 and BA.4/5. Nanobody epitope mapping on spike proteins using structural modeling revealed possible mechanisms of immune evasion by viral variants as well as mechanisms of cross-variant neutralization by nanobodies. The authors additionally identified two nanobody pairs involving non-neutralizing nanobodies that exhibited synergy in neutralization against the delta variant. These results enabled the refinement of target epitopes of the nanobody repertoire and the discovery of several pan-variant nanobodies for further preclinical development.

      Strengths:

      Overall, this study is well executed and provides a valuable framework for assessing the impact of emerging SARS-CoV-2 variants on nanobodies using a combination of in vitro biochemical and cellular assays as well as computational approaches. There are interesting insights generated from the epitope mapping analyses, which offer possible explanations for how delta and omicron variants escape nanobody responses, as well as how some nanobodies exhibit cross-variant neutralization capacity. These analyses laid out a clear path forward for optimizing these promising next-gen therapeutics, particularly in the face of rapidly emerging SARS-CoV-2 variants. This work will be of interest to researchers in the fields of antibody/nanobody engineering, SARS-CoV-2 therapeutics, and host-virus interaction.

      Weaknesses:

      A main weakness of the study is that the efficacy statement is not thoroughly supported. While the authors comprehensively characterized the neutralizing ability of nanobodies in vitro, there is no animal data involving mice or hamsters to demonstrate the real protective efficacy in vivo. Yet, in the title and throughout the manuscript, the authors repeatedly used phrases like "retains efficacy" or "remains efficacious" to describe the nanobodies' neutralization or binding capacities.

      This claim is not well supported by the data and underestimates the impact of variants on the nanobodies, especially the omicron sublineages. For example, the authors showed that S1-RBD-15 had a ~100-fold reduction in neutralization titer against Omicron, with an IC50 at around 1 uM. This is much higher than the IC50 value of a typical anti-ancestral RBD nanobody reported in the previous study (Mast, Fridy et al. 2021). In fact, the authors themselves ascribe nanobodies with an IC50 above 1 uM as weak neutralizers. And there were many in the range of 0.1-1 uM.

      Furthermore, many nanobodies selected for affinity measurement against BA.4/5 had no detectable binding.

      Without providing in vivo protection data or including monoclonal antibodies that are known to be efficacious against variants in the in vitro assays as a benchmark, it is difficult to evaluate the efficacy just with the IC50 values.

      We respectfully disagree with the reviewer on several aspects of this critique.

      As to our use of the word efficacy - the quality of being successful in producing an intended result; effectiveness - we were specific to nanobody binding and in vitro neutralization of the variant spike proteins tested in the manuscript. Indeed, our manuscript made no claim of efficacy outside of this intended meaning. However, to prevent misinterpretation we will modify the final paragraph of our introduction to state explicitly that the nanobody repertoire retains efficacy in binding and neutralizing variants of spike. The final paragraph of the Introduction will include the following:

      “Here, we demonstrate that a subset of our previously published repertoire of nanobodies, generated against spike from the ancestral SARS-CoV-2 virus (Mast, Fridy et al. 2021), retains binding and in vitro neutralization efficacy against circulating variants of concern (VoC), including omicron BA.4/BA.5.”

      We agree that in vivo neutralization data would be an important complement to the in vitro binding and neutralization data. Experiments along these lines are ongoing, but are not considered part of a follow-up to our original paper where in vivo data were not included.

      We disagree with the Reviewer that “This claim is not well supported by the data and underestimates the impact of variants on the nanobodies, especially the omicron sublineages.” As we specifically state: “In comparison, groups I, I/II, I/IV, V, VII, VIII and the anti-S2 nanobodies contained the majority of omicron BA.1 neutralizers, though here the neutralization potency of many nanobodies was decreased compared to wild-type. This decrease in neutralization potency largely correlates with the accumulation of omicron BA.1 specific mutations throughout the RBD, which likely alters the epitope-binding site of these nanobodies, weakening their interaction with BA.1 spike (Fig. 1B). (emphasis added)”

      Naturally, we expected that some of our nanobodies would lose the ability to bind BA.4/BA.5. This enabled us to determine which areas on spike remained susceptible to our nanobodies. We show that 10/29 nanobodies tested retained binding to BA.4/5. We did not test our entire repertoire, just a subset was selected for. We stated the following:

      “Of the nanobodies that neutralized both delta and omicron BA.1, representatives from each of the nanobody epitope groups were selected for SPR analysis, where S1 binders with mapped epitopes that neutralized one or both variants well, were prioritized.”

      Reviewer #2 (Public Review):

      Summary:

      Interest in using nanobodies for therapeutic interventions in infectious diseases is growing due to their ability to bind hidden or cryptic epitopes that are inaccessible to conventional immunoglobulins. In the present study, the authors were posed (sic) to characterize nanobodies derived from the library produced earlier with the Wuhan strain of SARS-CoV-2, map their epitopes on SARS-CoV-2 spike protein, and demonstrate that some nanobodies retain binding and even neutralization against antigenically distant Variants of Concern (VOCs) that are currently circulating.

      Strengths:

      The authors demonstrate that some nanobodies - despite being obtained against the ancestral virus strain - retain high affinity binding to antigenically distant SARS-CoV-2 strains. This is despite the majority of the repertoire losing binding. Although limited to only two nanobody combinations, the demonstration of synergy in virus neutralization between nanobodies targeting different epitopes is compelling.

      We thank the Reviewer for this positive summary of the strengths of our study. In our previous work, we applied stringent criteria for the down-selection of nanobodies based on their affinity and diversity, as elaborated on in https://elifesciences.org/articles/73027. The current dataset is a further judiciously curated subset, featuring 41 nanobodies chosen to represent and inform on the 10 structurally mapped epitope groups that we initially identified. This subset is but the tip of an iceberg. For each nanobody demonstrating high-affinity binding and neutralization, we possess multiple sequence variants, offering alternative avenues for investigation. Moreover, our repertoire has since been further elaborated by use of a yeast display library (Cross et al., 2023 JBC) providing additional nanobodies capable of targeting the same epitopes. Our findings presented here thus serve as a heuristic, enabling us to distill the much larger repertoire into manageable and informative clusters of data. We will modify our manuscript to be more explicit about these facts.

      Weaknesses:

      The authors imply that nanobodies that retain binding/neutralization of early Omicron sublineages will be active against currently circulating and future virus strains. Unfortunately, no reasoning for such a conclusion nor data supporting this prediction are provided.

      The nanobodies we propose to retain binding to current and emerging omicron sublineages at the time (Fig. 4) are those that still bind to omicron BA.1, BA.4/5. The structures of XBB and BQ.1 are not divergent enough from these aforementioned omicron sublineages in the regions we propose our nanobodies retain binding (Fig. 4) to result in loss of binding. Thus, we hypothesize that the epitopes where these nanobodies bind or are predicted to bind (outlined in black (Fig. 4)), represent regions on spike vulnerable to nanobody intervention. Importantly, we also now have further experimental data to support our predictions that these nanobodies in Fig. 4 will retain binding (see plot in Author response image 1). We will provide additional data and complements to key figures to help illustrate this in the revised manuscript.

      Author response image 1.

    2. eLife assessment

      This paper by Aitchison and colleagues describes nanobody neutralizing and binding activity against various SARS-CoV-2 variants of concern. The findings are important in that the described nanobodies may have broad therapeutic relevance against current and future variants of concern and may be able to avoid significant resistance. The claims are incomplete: while the study is well-executed and uses a nice balance of biochemical and cellular assays, the efficacy of the proposed nanobody library against VOCs is not completely supported as IC50 values appear to increase against newer variants and are higher than previously used therapeutic bNAbs, animal data showing in vivo efficacy is lacking, and protection against future possible variants is not proven.

    3. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Ketaren, Mast, Fridy et al. assessed the ability of a previously generated llama nanobody library (Mast, Fridy et al. 2021) to bind and neutralize SARS-CoV-2 delta and omicron variants. The authors identified multiple nanobodies that retain neutralizing and/or binding capacity against delta, BA.1 and BA.4/5. Nanobody epitope mapping on spike proteins using structural modeling revealed possible mechanisms of immune evasion by viral variants as well as mechanisms of cross-variant neutralization by nanobodies. The authors additionally identified two nanobody pairs involving non-neutralizing nanobodies that exhibited synergy in neutralization against the delta variant. These results enabled the refinement of target epitopes of the nanobody repertoire and the discovery of several pan-variant nanobodies for further preclinical development.

      Strengths:

      Overall, this study is well executed and provides a valuable framework for assessing the impact of emerging SARS-CoV-2 variants on nanobodies using a combination of in vitro biochemical and cellular assays as well as computational approaches. There are interesting insights generated from the epitope mapping analyses, which offer possible explanations for how delta and omicron variants escape nanobody responses, as well as how some nanobodies exhibit cross-variant neutralization capacity. These analyses laid out a clear path forward for optimizing these promising next-gen therapeutics, particularly in the face of rapidly emerging SARS-CoV-2 variants. This work will be of interest to researchers in the fields of antibody/nanobody engineering, SARS-CoV-2 therapeutics, and host-virus interaction.

      Weaknesses:

      A main weakness of the study is that the efficacy statement is not thoroughly supported. While the authors comprehensively characterized the neutralizing ability of nanobodies in vitro, there is no animal data involving mice or hamsters to demonstrate the real protective efficacy in vivo. Yet, in the title and throughout the manuscript, the authors repeatedly used phrases like "retains efficacy" or "remains efficacious" to describe the nanobodies' neutralization or binding capacities. This claim is not well supported by the data and underestimates the impact of variants on the nanobodies, especially the omicron sublineages. For example, the authors showed that S1-RBD-15 had a ~100-fold reduction in neutralization titer against Omicron, with an IC50 at around 1 uM. This is much higher than the IC50 value of a typical anti-ancestral RBD nanobody reported in the previous study (Mast, Fridy et al. 2021). In fact, the authors themselves ascribe nanobodies with an IC50 above 1 uM as weak neutralizers. And there were many in the range of 0.1-1 uM. Furthermore, many nanobodies selected for affinity measurement against BA.4/5 had no detectable binding. Without providing in vivo protection data or including monoclonal antibodies that are known to be efficacious against variants in the in vitro assays as a benchmark, it is difficult to evaluate the efficacy just with the IC50 values.

    4. Reviewer #2 (Public Review):

      Summary:

      Interest in using nanobodies for therapeutic interventions in infectious diseases is growing due to their ability to bind hidden or cryptic epitopes that are inaccessible to conventional immunoglobulins. In the present study, the authors set out to characterize nanobodies derived from the library produced earlier with the Wuhan strain of SARS-CoV-2, map their epitopes on SARS-CoV-2 spike protein, and demonstrate that some nanobodies retain binding and even neutralization against antigenically distant Variants of Concern (VOCs) that are currently circulating.

      Strengths:

      The authors demonstrate that some nanobodies - despite being obtained against the ancestral virus strain - retain high affinity binding to antigenically distant SARS-CoV-2 strains. This is despite the majority of the repertoire losing binding. Although limited to only two nanobody combinations, the demonstration of synergy in virus neutralization between nanobodies targeting different epitopes is compelling.

      Weaknesses:

      The authors imply that nanobodies that retain binding/neutralization of early Omicron sublineages will be active against currently circulating and future virus strains. Unfortunately, no reasoning for such a conclusion nor data supporting this prediction are provided.

    1. Reviewer #1 (Public Review):

      Summary:

      The authors were attempting to determine the extent to which CIH altered swallowing motor function; specifically, the timing and probability of the activation of the laryngeal and submental motor pools. The paper describes a variety of different motor patterns elicited by optogenetic activation of individual neuronal phenotypes within PiCo in a group of mice exposed to CIH. They show that there are a variety of motor patterns that emerge in CIH mice; this is apparently different from the more consistent motor patterns elicited by PiCo activation in normoxic mice (previously published).

      Strengths:

      The preparation is technically challenging and gives valuable information related to the role of PiCo in the pattern of motor activation involved in swallowing and its timing with phrenic activity. Genetic manipulations allow for the independent activation of the individual neuronal phenotypes of PiCo (glutamatergic, cholinergic) which is a strength.

      Weaknesses:

      1. The data presented are largely descriptive in terms of the effect of PiCo activation on the probability of swallowing and the pattern of motor activation changes following CIH. Comparisons made between experimental data acquired currently and those obtained in a previous cohort of animals (possibly years before) are extremely problematic, with the potential confounding influence of changing environments, genetics, and litter effects. The statistical analyses (i.e. comparing CIH with normoxic) appear insufficiently robust. Exactly how the data were compared is not described.

      2. There is limited mechanistic insight into how PiCo manipulation alters the pattern and probability of motor activation. For example, does CIH alter PiCo directly, or some other component of the circuit (NTS)? Techniques that silence or activate projections to/from PiCo should be used to interrogate this. This is required to further delineate and define the swallowing circuit, which remains enigmatic.

      3. The functional significance of the altered (non-classic) patterns is unclear.

    2. Reviewer #2 (Public Review):

      Summary:

      In this study, the authors investigated the role of a medullary region, named Postinspiratory Complex (PiCo), in the mediation of swallow/laryngeal behaviours, their coordination with breathing, and the possible impact on the reflex exerted by chronic intermittent hypoxia (CIH). This region is characterized by the presence of glutamatergic/cholinergic interneurons. Thus, experiments have been performed in single allelic and intersectional allelic recombinase transgenic mice to specifically excite cholinergic/glutamatergic neurons using optogenetic techniques, while recording from relevant muscles involved in swallowing and laryngeal activation. The data indicate that in anaesthetized transgenic mice exposed to CIH, the optogenetic activation of PiCo neurons triggers swallow activity characterized by variable motor patterns. In addition, these animals show an increased probability of triggering a swallow when stimulation is applied during the first part of the respiratory cycle.

      They conclude that the PiCo region may be involved in the occurrence of swallow and other laryngeal behaviours. These data interestingly improve the ongoing discussion on neural pathways involved in swallow-breathing coordination, with specific attention to factors leading to disruption that may contribute to dysphagia under some pathological conditions.

      The Authors' conclusions are partially justified by their data. However, it should be acknowledged that the impact of the study is to a certain extent limited by the lack of knowledge on the source of excitatory inputs to PiCo during swallowing under physiological conditions, i.e. during water-evoked swallowing. Also the connectivity between this region and the swallowing CPG, a structure not well defined, or other brain regions involved in the reflex is not known.

      Strengths:

      Major strengths of the manuscript:

      - The methodological approach is refined and well-suited for the experimental question. The in vivo mouse preparation developed for this study takes advantage of selective optogenetic stimulation of specific cell types with the simultaneous EMG recordings from upper airway muscles involved in respiration and swallowing to assess their motor patterns. The animal model and the chronic intermittent hypoxia protocol have already been published in previous papers (Huff et al. 2022, 2023).

      - The choice of the topic. Swallow disruption may contribute to the dysphagia under some pathological conditions, such as obstructive sleep apnea. Investigations aimed at exploring and clarifying neural structures involved in this behaviour as well as the connectivity underpinning muscle coordination are needed.

      - This study fits in with previous works. This work is a logical extension of previous studies from this group on swallowing-breathing coordination with further advances using a mouse model for obstructive sleep apnea.

      Weaknesses:

      Major weaknesses of the manuscript:

      - The Authors should be more cautious in concluding that the PiCo is critical for the generation of swallowing itself. It remains to be demonstrated that PiCo is necessary for swallowing and laryngeal function in a more physiological situation, i.e. swallow of a bolus of water or food. It would be interesting to investigate the effects of silencing PiCo cholinergic/glutamatergic neurons on normal swallowing. In this perspective, the title should be slightly modified to avoid "swallow pattern generation" (e.g. Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production).

      - The duration of swallows evoked by optogenetic stimulation of PiCo is considerably shorter in comparison with the duration of swallows evoked by a physiological stimulus (water). This makes it hard to compare the timing and the pattern of motor response in CIH-exposed mice. In Figure 1, the trace time scale should be the same for water-triggered and PiCo-triggered swallows. In addition, it is not clear if exposure to CIH alters the ongoing respiratory activity. Is the respiratory rhythm altered by hypoxia? If a disturbed or irregular pattern of breathing is already present in CIH-exposed mice, could this alteration interfere with the swallowing behaviour?

    3. Reviewer #3 (Public Review):

      In the present study, the authors investigated the effects of CIH on the swallowing and breathing responses to PICO stimulation. Their conclusion is that glutamatergic-cholinergic neurons from PICO are not only critical for the gating of post-inspiratory and swallow activity, but also play important roles in the generation of swallow motor patterns. There are several aspects that deserve the authors' attention and comments, mainly related to the study's conclusions.

      - The authors refer to PICO as the generator of post-inspiratory rhythm. However, evidence points to this region as a modulator of post-inspiratory activity rather than a rhythmogenic site (Toor et al., 2019 - 10.1523/JNEUROSCI.0502-19.2019; Oliveira et al., 2021 - 10.1016/j.neuroscience.2021.09.015). For example, sustained activation of PICO for 10 s barely affected the vagus or laryngeal post-inspiratory activity (Huff et al., 2023 - 10.7554/eLife.86103).

      - The optogenetic activation of glutamatergic and cholinergic neurons from PICO evoked submental and laryngeal responses, and CIH changed these motor responses. Therefore, the authors proposed that PICO is directly involved in swallow pattern generation and that CIH disrupts the connection between PICO and SPG (swallow pattern generator). However, the experiments of the present study did not provide evidence about connections between these two regions nor their possible disruption after CIH, or even whether PICO is part of SPG.

      - CIH affects several brainstem regions which might contribute to generating abnormal motor responses to PICO stimulation. For example, Bautista et al. (1995 - 10.1152/japplphysiol.01356.2011) documented that intermittent hypoxia induces changes in the activity of laryngeal motoneurons by neural plasticity mechanisms involving serotonin.

      - To support the hypothesis that PICO is directly involved in swallow pattern generation the authors should perform the inhibition of Vglut2-ChAT neurons from PICO and then evoke swallow motor responses. If swallow is abolished when the neurons from this region are inhibited, it would indicate that PICO is crucial to generate this behavior.

      - In almost all the data presented, the authors observed different patterns of changes in the motor submental and laryngeal responses to PICO activation, including that animals submitted to CIH (6%) presented a "normal" motor response. However, the authors did not discuss the possible explanations and functional implications of this variability.

      - In Figure 4, the authors need to present low magnification sections showing the PICO transfected neurons as well as the absence of transfection in the ventral respiratory column. The authors could also check the scale since the cAmb seems very small.

      - Finally, the title does not reflect the study. The present study did not demonstrate that PICO is a swallow pattern generator.

    1. eLife assessment

      Miyano et al. study the impact of RIM-BP2 deletion at mossy fiber synapses, using direct electrophysiological recordings from mossy terminals and STED super-resolution microscopy. The paper addresses an important question in the field of synaptic transmission and provides compelling evidence demonstrating reduced calcium channel abundance in mossy terminals upon RIM-BP2 removal.

    2. Reviewer #1 (Public Review):

      This nice study by Miyano et al. combines slice electrophysiology and superresolution microscopy to address the role of RBP2 in Ca2+ channel clustering and neurotransmitter release at hippocampal mossy fiber terminals. While a number of studies demonstrated a critical role for RBPs in clustering Ca2+ channels at other synapses and some provided evidence for a role of the protein in molecular coupling of Ca2+ channels and release sites, the present study targets another key synapse that is an important model for presynaptic studies and offers access to a microdomain controlled synaptic vesicle (SV) release mechanism with low initial release probability.

      Summarizing a large body of high-quality work, the authors demonstrate reduced Ca2+ currents and a reduced release probability. They attribute the latter to the reduced Ca2+ influx and can restore release by increasing Ca2+ influx. Moreover, they propose an altered fusion competence of the SVs, which is not so strongly supported by the data in my view.

      The effects are relatively small, but I think the careful analysis of the RBP role at the mossy fiber synapse is an important contribution.

    3. Reviewer #2 (Public Review):

      The proper expression and organization of CaV channels at the presynaptic release sites are subject to coordinative and redundant control of many active zone-specific molecules including RIM-BPs. Previous studies have demonstrated that ablation of RIM-BPs in various mammalian synapses causes significant impairment of synaptic transmission, either by reducing CaV expression or decoupling CaV from synaptic vesicles. The mechanisms remain unknown.

      In the manuscript, Sakaba and colleagues aimed to examine the specific role of RIM-BP2 at the hippocampal mossy fiber-CA3 pyramidal cell synapse, which is well-characterized by low initial release probability and strong facilitation during repetitive stimulation. By directly recording Ca2+ currents and capacitance jumps from the MF boutons, which is very challenging but feasible, they showed that depolarization-evoked Ca2+ influx was reduced significantly (~39%) by KO of RIM-BP2, but that Ca2+-induced exocytosis and the RRP (measured by capacitance change) were unaffected. They used STED microscopy to image the spatial distribution of the CaV2.1 cluster and found no change in the cluster number but a slight decrease in cluster intensity (~20%). They concluded that RIM-BP2 loss acts at tonic synapses by reducing CaV expression, and thus differently from phasic synapses, where it decouples CaV from SVs.

      In general, they provide solid data showing that RIM-BP2 KO reduces Ca2+ influx at the MF-CA3 synapse, but the phenotype is not new, as Moser and colleagues have also used presynaptic recording and capacitance measurement and shown that RIM-BP2 KO reduces Ca2+ influx at the hair cell active zone (Krinner et al., 2017), albeit in a different synapse model expressing CaV1.3 instead of CaV2.1. Further, the concept that RIM-BP2 plays diverse functions in transmitter release at different central synapses has also been proposed with solid evidence (Brockmann et al., 2019).

    1. eLife assessment

      This manuscript provides a useful demonstration that distractor effects in multi-attribute decision-making correlate with the form of attribute integration (additive vs. multiplicative). The evidence supporting the conclusions is generally solid, but there are concerns regarding the robustness of the statistical analyses. In addition, the lack of a clear theoretical motivation complicates the interpretation. The manuscript will be interesting to decision-making researchers in neuroscience, psychology, and related fields.

    2. Reviewer #1 (Public Review):

      Summary:

      The current study provided a follow-up analysis using published datasets focused on the individual variability of both the distraction effect (size and direction) and the attribute integration style, as well as the association between the two. The authors tried to answer the question of whether the multiplicative attribute integration style concurs with a more pronounced and positively oriented distraction effect.

      Strengths:

      The analysis extensively examined the impacts of various factors on decision accuracy, with a particular focus on using two-option trials as control trials, following the approach established by Cao & Tsetsos (2022). The statistical significance results were clearly reported.

      The authors meticulously conducted supplementary examinations, incorporating the additional term HV+LV into GLM3. Furthermore, they replaced the utility function from the expected value model with values from the composite model.

      Weaknesses:

      There are several weaknesses in terms of theoretical arguments and statistical analyses.

      First, the manuscript suggests in the abstract and at the beginning of the introduction that the study reconciled the "different claims" about "whether distraction effect operates at the level of options' component attributes rather than at the level of their overall value" (see lines 13-14), but the analysis conducted was not for that purpose. Integrating choice attributes in either an additive or multiplicative way only reflects individual differences in combining attributes into the overall value. The authors seemed to assume that the multiplicative way generated the overall value ("Individuals who tended to use a multiplicative approach, and hence focused on overall value", lines 20-21), but such an implicit assumption is at odds with the statement in lines 77-79 that people may use a simpler additive rule to combine attributes, which means overall value can come from the additive rule.

      The second weakness is sort of related but is more about the lack of coherent conceptual understanding of the "additive rule", or "distractor effect operates at the attribute level". In an assertive tone (lines 77-80), the manuscript suggests that a weighted sum integration procedure of implementing an "additive rule" is equal to assuming that people compare pairs of attributes separately, without integration. But they are mechanistically distinct. The additive rule (implemented using the weighted sum rule to combine probability and magnitude within each option and then applying the softmax function) assumes value exists before comparing options. In contrast, if people compare pairs of attributes separately, preference forms based on the within-attribute comparisons. Mathematically these two might be equivalent only if no extra mechanisms (such as inhibition, fluctuating attention, evidence accumulation, etc) are included in the within-attribute comparison process, which is hardly true in the three-option decision.
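      The distinction drawn above can be made concrete with a minimal sketch. It assumes normalized magnitude and probability attributes, an equal attribute weight, and a softmax choice rule; the function names, the weight, the inverse temperature and the two example options are hypothetical and are not taken from the manuscript under review. It shows that the additive and multiplicative rules can rank the same two options differently, and that in the two-option case without extra mechanisms a within-attribute comparison yields the same choice probability as the additive rule.

```python
import math


def softmax(values, inverse_temperature=1.0):
    """Convert option values into choice probabilities."""
    highest = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(inverse_temperature * (v - highest)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def additive_value(magnitude, probability, weight=0.5):
    """Weighted sum of attributes: a value exists before options are compared."""
    return weight * magnitude + (1.0 - weight) * probability


def multiplicative_value(magnitude, probability):
    """Expected-value style integration of the two attributes."""
    return magnitude * probability


def attribute_wise_preference(option_a, option_b, weight=0.5, inverse_temperature=1.0):
    """Compare attribute pairs separately, then pool the differences.

    No inhibition, attention or accumulation mechanism is included here.
    """
    magnitude_difference = option_a[0] - option_b[0]
    probability_difference = option_a[1] - option_b[1]
    evidence = weight * magnitude_difference + (1.0 - weight) * probability_difference
    return 1.0 / (1.0 + math.exp(-inverse_temperature * evidence))


BETA = 5.0  # hypothetical inverse temperature
option_a = (0.9, 0.3)  # (magnitude, probability), hypothetical and normalized
option_b = (0.5, 0.6)

additive_choice = softmax(
    [additive_value(*option_a), additive_value(*option_b)], BETA
)
multiplicative_choice = softmax(
    [multiplicative_value(*option_a), multiplicative_value(*option_b)], BETA
)
attribute_wise_choice = attribute_wise_preference(
    option_a, option_b, inverse_temperature=BETA
)
```

      In this sketch the additive rule favours the first option and the multiplicative rule favours the second, while the attribute-wise comparison reproduces the additive choice probability. Any added mechanism in the attribute-wise comparison, or a third option, would break that equivalence.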

      Could the authors comment on the generalizability of the current result? The reward magnitude and probability information are displayed using rectangular bars of different colors and orientations. Would that bias subjects to choose an additive rule instead of the multiplicative rule? Also, could the conclusion be extended to other decision contexts such as quality and price, where a multiplicative rule is hard to formulate?

      The authors did careful analyses on quantifying the "distractor effect". While I fully agree that it is important to use the matched two-option trials and examine the interaction terms (DV-HV)T as a control, the interpretation of the results becomes tricky when looking at the effects in each trial type. Figure 2c shows a positive DV-HV effect in two-option trials whereas the DV-HV effect was not significantly stronger in three-option trials. Further in Figure 5b,c, in the Multiplicative group, the effect of DV-HV was absent in the two-option trials and present in the three-option trials. In the Additive group, however, the effect of DV-HV was significantly positive in the two-option trials but was significantly lowered in the three-option trials. Hence, it seems the different distractor effects were driven by the different effects of DV-HV in the two-option trials, rather than the three-option trials?

      Note that the pattern described above was different in Supplementary Figure 2, where the effect of DV-HV on the two-option trials was negative for both Multiplicative and Additive groups. I would suggest considering using Supplementary Figure 2 as the main result instead of Figure 5, as it does not rely on multiplicative EV to measure the distraction effect, and it shows the same direction of DV-HV effect on two-option trials, providing a better basis to interpret the (DV-HV)T effect.

    3. Reviewer #2 (Public Review):

      This paper addresses the empirical demonstration of "distractor effects" in multi-attribute decision-making. It continues a debate in the literature on the presence (or not) of these effects, which domains they arise in, and their heterogeneity across subjects. The domain of the study is a particular type of multi-attribute decision-making: choices over risky lotteries. The paper reports a re-analysis of lottery data from multiple experiments run previously by the authors and other laboratories involved in the debate.

      Methodologically, the analysis assumes a number of simple forms for how attributes are aggregated (additively, multiplicatively, or both) and then applies a "reduced form" logistic regression to the choices with a number of interaction terms intended to control for various features of the choice set. One of these interactions, modulated by ternary/binary treatment, is interpreted as a "distractor effect."

      The claimed contribution of the re-analysis is to demonstrate a correlation in the strength/sign of this treatment effect with another estimated parameter: the relative mixture of additive/multiplicative preferences.

      Major Issues

      1) How to Interpret GLM 1 and 2

      This paper, and others before it, have used a binary logistic regression with a number of interaction terms to attempt to control for various features of the choice set and how they influence choice. It is important to recognize that this modelling approach is not derived from a theoretical claim about the form of the computational model that guides decision-making in this task, nor an explicit test for a distractor effect. This can be seen most clearly in the equations after line 321 and its corresponding log-likelihood after 354, which contain no parameter or test for "distractor effects". Rather the computational model assumes a binary choice probability and then shoehorns the test for distractor effects via a binary/ternary treatment interaction in a separate regression (GLM 1 and 2). This approach has already led to multiple misinterpretations in the literature (see Cao & Tsetsos, 2022; Webb et al., 2020). One of these misinterpretations occurred in the datasets the authors studied, in which the lottery stimuli contained a confound with the interaction that Chau et al., (2014) were interpreting as a distractor effect (GLM 1). Cao & Tsetsos (2022) demonstrated that the interaction was significant in binary choice data from the study, therefore it can not be caused by a third alternative. This paper attempts to address this issue with a further interaction with the binary/ternary treatment (GLM 2). Therefore the difference in the interaction across the two conditions is claimed to now be the distractor effect. The validity of this claim brings us to what exactly is meant by a "distractor effect."

      The paper begins by noting that "Rationally, choices ought to be unaffected by distractors" (line 33). This is not true. There are many normative models that allow for the value of alternatives (even low-valued "distractors") to influence choices, including a simple random utility model. Since Luce (1959), it has been known that the axiom of "Independence of Irrelevant Alternatives" (that the probability ratio between any two alternatives does not depend on a third) is an extremely strong axiom, and only a sufficiency axiom for a random utility representation (Block and Marschak, 1959). It is not a necessary condition of a utility representation, and if this is our definition of rational (which is highly debatable), not necessary for it either. Countless empirical studies have demonstrated that IIA is falsified, and a large number of models can address it, including a simple random utility model with independent normal errors (i.e. a multivariate Probit model). In fact, it is only the multinomial Logit model that imposes IIA. It is also why so much attention is paid to the asymmetric dominance effect, which is a violation of a necessary condition for random utility (the Regularity axiom).
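      The contrast between the multinomial Logit model and a random utility model with independent normal errors can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the three option values are hypothetical, errors have unit variance, and the Probit choice probabilities are obtained by simple trapezoidal integration rather than by any method used in the manuscript under review. It compares the ratio P(A)/P(B) when the third option is very weak and when it is as strong as A.

```python
import math


def logit_choice_probabilities(values):
    """Multinomial Logit (softmax) choice probabilities."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_choice_probabilities(values, lower=-8.0, upper=8.0, steps=4000):
    """Random utility with independent standard normal errors.

    P(i chosen) is the integral over x of pdf(x) times the product over
    j != i of cdf(x + v_i - v_j), evaluated with the trapezoidal rule.
    """
    width = (upper - lower) / steps
    probabilities = []
    for i, v_i in enumerate(values):
        total = 0.0
        for k in range(steps + 1):
            x = lower + k * width
            integrand = normal_pdf(x)
            for j, v_j in enumerate(values):
                if j != i:
                    integrand *= normal_cdf(x + v_i - v_j)
            total += integrand * (0.5 if k in (0, steps) else 1.0)
        probabilities.append(total * width)
    return probabilities


def ratio_a_to_b(probabilities):
    return probabilities[0] / probabilities[1]


# Hypothetical values for options A, B and a third option C.
weak_third_option = [1.0, 0.5, -10.0]
strong_third_option = [1.0, 0.5, 1.0]

logit_ratios = (
    ratio_a_to_b(logit_choice_probabilities(weak_third_option)),
    ratio_a_to_b(logit_choice_probabilities(strong_third_option)),
)
probit_ratios = (
    ratio_a_to_b(probit_choice_probabilities(weak_third_option)),
    ratio_a_to_b(probit_choice_probabilities(strong_third_option)),
)
```

      Under these assumptions the Logit ratio is identical in both choice sets, whereas the Probit ratio shifts with the value of the third option even though every option is valued independently of the others.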

      So what do the authors even mean by a "distractor effect"? It is true that the form of IIA violations (i.e. their path through the probability simplex as the low-option varies) tells us something about the computational model underlying choice (after all, different models will predict different patterns). However, we do not know how the interaction terms in the binary logit regression relate to the pattern of the violations because there is no formal theory that relates them. Any test for relative value coding is a joint test of the computational model and the form of the stochastic component (Webb et al, 2020). These interaction terms may simply be picking up substitution patterns that can be easily reconciled with some form of random utility. While we cannot check all forms of random utility in these datasets (because the class of such models is large), this paper doesn't even rule any of these models out.

      2) How to Interpret the Composite (Mixture) model?

      On the other side of the correlation are the results from the mixture model for how decision-makers aggregate attributes. The authors report that most subjects are best represented by a mixture of additive and multiplicative aggregation models. The authors justify this with the proposal that these values are computed in different brain regions and then aggregated (which is reasonable, though raises the question of "where" if not the mPFC). However, an equally reasonable interpretation is that the improved fit of the mixture model simply reflects a misspecification of two extreme aggregation processes (additive and EV), so the log-likelihood is maximized at some point in between them.
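      One way to picture the mixture discussed here is a single weight that interpolates between the additive and the multiplicative value of an option. The sketch below is a hypothetical illustration, not the authors' fitting procedure: the parameter name `eta`, the fixed attribute weight and inverse temperature, the grid search, and the four example trials are all assumptions made for the example.

```python
import math


def composite_value(magnitude, probability, eta, weight=0.5):
    """Mixture of multiplicative (eta = 1) and additive (eta = 0) integration."""
    additive = weight * magnitude + (1.0 - weight) * probability
    multiplicative = magnitude * probability
    return eta * multiplicative + (1.0 - eta) * additive


def probability_choose_first(option_a, option_b, eta, inverse_temperature=10.0):
    """Binary logistic choice rule applied to the composite values."""
    difference = composite_value(*option_a, eta) - composite_value(*option_b, eta)
    return 1.0 / (1.0 + math.exp(-inverse_temperature * difference))


def log_likelihood(trials, eta):
    """Sum of log probabilities of the observed choices for a given eta."""
    total = 0.0
    for option_a, option_b, chose_a in trials:
        p = probability_choose_first(option_a, option_b, eta)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p if chose_a else 1.0 - p)
    return total


def fit_eta(trials, grid_size=101):
    """Grid search over eta in [0, 1]; returns the best-fitting grid point."""
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    return max(grid, key=lambda eta: log_likelihood(trials, eta))


# Hypothetical trials: ((magnitude, probability), (magnitude, probability), chose_first)
trials = [
    ((0.9, 0.3), (0.5, 0.6), True),
    ((0.9, 0.3), (0.5, 0.6), False),
    ((0.8, 0.2), (0.4, 0.5), False),
    ((0.7, 0.7), (0.9, 0.4), True),
]

best_eta = fit_eta(trials)
```

      A best-fitting weight strictly between 0 and 1 would show only that neither extreme maximizes the likelihood for the data at hand; on its own it does not distinguish two separate processes from a single misspecified one.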

      One possibility is a model with utility curvature. How much of this result is just due to curvature in valuation? There are many reasonable theories for why we should expect curvature in utility for human subjects (for example, limited perception: Robson, 2001; Khaw, Li & Woodford, 2019; Netzer et al., 2022) and of course many empirical demonstrations of risk aversion for small stakes lotteries. The mixture model, on the other hand, has parametric flexibility.
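
      To make the concern concrete, the sketch below writes out the two extreme aggregation rules, a mixture between them, and a curved expected-utility alternative. It is an illustration under stated assumptions, not the authors' model code: magnitudes and probabilities are assumed to be rescaled to the unit interval, the additive rule is assumed to weight both attributes equally, and the names `eta` and `alpha` are hypothetical labels for the integration coefficient and the curvature exponent.

```python
def additive_value(x, p):
    """Additive integration: equal-weight average of magnitude and probability."""
    return 0.5 * (x + p)

def multiplicative_value(x, p):
    """Multiplicative integration: expected value, magnitude times probability."""
    return x * p

def composite_value(x, p, eta):
    """Mixture: eta = 1 is purely multiplicative, eta = 0 purely additive."""
    return eta * multiplicative_value(x, p) + (1.0 - eta) * additive_value(x, p)

def curved_value(x, p, alpha):
    """Expected utility with power curvature; alpha = 1 recovers expected value."""
    return p * x ** alpha

def preferred(option_a, option_b, value_fn):
    """Return 'A' or 'B' depending on which (x, p) pair has the higher value."""
    return "A" if value_fn(*option_a) > value_fn(*option_b) else "B"
```

      For the pair A = (0.9, 0.3) and B = (0.28, 0.95), expected value ranks A first, while both a purely additive rule (`eta = 0`) and a concave utility (`alpha = 0.5`) rank B first. A departure from expected value in this direction could therefore be absorbed by either parameter, and the labelling of an option as HV or LV depends on which value function is assumed.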

      There is also a large literature on testing expected utility jointly with stochastic choice, and the impact of these assumptions on parameter interpretation (Loomes & Sugden, 1998; Apesteguia & Ballester, 2018; Webb, 2019). This relates back to the point above: the mixture may reflect the joint assumption of how choice departs from deterministic EV.

      3) So then how should we interpret the correlation that the authors report?

      On one side we have the impact of the binary/ternary treatment which demonstrates some impact of the low value alternative on a binary choice probability. This may reflect some deep flaws in existing theories of choice, or it may simply reflect some departure from purely deterministic expected value maximization that existing theories can address. We have no theory to connect it to, so we cannot tell. On the other side of the correlation, we have a mixture between additive and multiplicative preferences over risk. This result may reflect two distinct neural processes at work, or it may simply reflect a misspecification of the manner in which humans perceive and aggregate attributes of a lottery (or even just the stimuli in this experiment) by these two extreme candidates (additive vs. EV). Again, this would entail some departure from purely deterministic expected value maximization that existing theories can address.

      It is entirely possible that the authors are reporting a result that points to the more exciting of these two possibilities. But it is also possible (and perhaps more likely) that the correlation is more mundane. The paper does not guide us to theories that predict such a correlation, nor reject any existing ones. In my opinion, we should be striving for theoretically-driven analyses of datasets, where the interpretation of results is clearer.

      4) Finally, the results from these experiments might not have external validity for two reasons. First, the normative criterion for multi-attribute decision-making differs depending on whether the attributes are lotteries or not (i.e. multiplicative vs additive). Whether it does so for humans is a matter of debate. Therefore if the result is unique to lotteries, it might not be robust for multi-attribute choice more generally. The paper largely glosses over this difference and mixes literature from both domains. Second, the lottery information was presented visually and there is literature suggesting this form of presentation might differ from numerical attributes. Which is more ecologically valid is also a matter of debate.

      Minor Issues:<br /> The definition of EV as a normative choice baseline is problematic. The analysis requires that EV is the normative choice model (this is why the HV-LV gap is analyzed and the distractor effect defined in relation to it). But if the binary/ternary interaction effect can be accounted for by curvature of a value function, this should also change the definition of which lottery is HV or LV for that subject!

      References<br /> Apesteguia, J. & Ballester, M. Monotone stochastic choice models: The case of risk and time preferences. Journal of Political Economy (2018).

      Block, H. D. & Marschak, J. Random Orderings and Stochastic Theories of Responses. Cowles Foundation Discussion Papers (1959).

      Khaw, M. W., Li, Z. & Woodford, M. Cognitive Imprecision and Small-Stakes Risk Aversion. Rev. Econ. Stud. 88, 1979-2013 (2020).

      Loomes, G. & Sugden, R. Testing Different Stochastic Specifications of Risky Choice. Economica 65, 581-598 (1998).

      Luce, R. D. Individual Choice Behaviour. (John Wiley and Sons, Inc., 1959).

      Netzer, N., Robson, A. J., Steiner, J. & Kocourek, P. Endogenous Risk Attitudes. SSRN Electron. J. (2022) doi:10.2139/ssrn.4024773.

      Robson, A. J. Why would nature give individuals utility functions? Journal of Political Economy 109, 900-914 (2001).

      Webb, R. The (Neural) Dynamics of Stochastic Choice. Manage Sci 65, 230-255 (2019).

    4. Reviewer #3 (Public Review):

      Summary:<br /> The way an unavailable (distractor) alternative impacts decision quality is of great theoretical importance. Previous work, led by some of the authors of this study, had converged on a nuanced conclusion wherein the distractor can both improve (positive distractor effect) and reduce (negative distractor effect) decision quality, contingent upon the difficulty of the decision problem. In very recent work, Cao and Tsetsos (2022) reanalyzed all relevant previous datasets and showed that once distractor trials are referenced to binary trials (in which the distractor alternative is not shown to participants), distractor effects are absent. Cao and Tsetsos further showed that human participants heavily relied on additive (and not multiplicative) integration of rewards and probabilities.

      The present study by Wong et al. puts forward a novel thesis according to which interindividual differences in the way of combining reward attributes underlie the absence of detectable distractor effect at the group level. They re-analysed data from the 144 human participants and classified participants into a "multiplicative integration" group and an "additive integration" group based on a model parameter, the "integration coefficient", that interpolates between the multiplicative utility and the additive utility in a mixture model. They report that participants in the "multiplicative" group show a positive distractor effect while participants in the "additive" group show a negative distractor effect. These findings are extensively discussed in relation to the potential underlying neural mechanisms.

      Strengths:<br /> The study is forward-looking, integrating previous findings well, and offering a novel proposal on how different integration strategies can lead to different choice biases.<br /> The authors did an excellent job of connecting their thesis with previous neural findings. This is a very encompassing perspective that is likely to motivate new studies towards a better understanding of how humans and other animals integrate information in decisions under risk and uncertainty.<br /> Although some aspects of the paper are very technical, methodological details are well explained and the paper is very well written.

      Weaknesses:<br /> The authors quantify the distractor variable as "DV - HV", i.e., the relative distractor variable. Do the conclusions hold when the distractor is quantified in absolute terms (as "DV", see also Cao & Tsetsos, 2023)? Similarly, the authors show in Suppl. Figure 1 that the inclusion of a HV + LV regressor does not alter their conclusions. However, the (HV + LV)*T regressor was not included in this analysis. Does including this interaction term alter the conclusions considering there is a high correlation between (HV + LV)*T and (DV - HV)*T? More generally, it will be valuable if the authors assess and discuss the robustness of their findings across different ways of quantifying the distractor effect.<br /> The central finding of this study is that participants who integrate reward attributes multiplicatively show a positive distractor effect while participants who integrate additively show a negative distractor effect. This is a very interesting and intriguing observation. However, there is no explanation as to why the integration strategy covaries with the direction of the distractor effect. It is unlikely that the mixture model generates any distractor effect as it combines two "context-independent" models (additive utility and expected value) and is fit to the binary-choice trials. The authors can verify this point by quantifying the distractor effect in the mixture model. If that is the case, it will be important to highlight that the composite model is not explanatory; and defer a mechanistic explanation of this covariation pattern to future studies.<br /> Correction for multiple comparisons (e.g., Bonferroni-Holm) was not applied to the regression results. Is the "negative distractor effect in the Additive Group" (Fig. 5c) still significant after such correction? Although this does not affect the stark difference between the distractor effects in the two groups (Fig. 5a), the classification of the distractor effect in each group is important (i.e., should future modelling work try to capture both a negative and a positive effect in the two integration groups? Or just a null and a positive effect?).
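
      For reference, the Bonferroni-Holm step-down procedure mentioned in the last point can be sketched in a few lines. This is a generic illustration, not the authors' analysis; the function name is hypothetical and the p-values in the usage note are made up for the example.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order.

    The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k),
    capped at 1, and forced to be at least as large as the previous
    adjusted value so that the adjusted values stay monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

      For three tests with raw p-values 0.01, 0.04 and 0.03, the adjusted values are 0.03, 0.06 and 0.06, so only the first would remain below a 0.05 threshold.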

    1. eLife assessment

      The study by Ghafari et al. addresses a question that is highly relevant for the field of attention as it connects structural differences in subcortical regions with oscillatory modulations during attention allocation. Using a combination of magnetoencephalography (MEG) and magnetic resonance imaging (MRI) data in human subjects, inter-individual differences in the lateralization of alpha oscillations are explained by asymmetry of subcortical brain regions. The results are important, and the strength of the evidence is convincing. Yet, clarifying the rationale, reporting the data in full, a more comprehensive analysis, and a more detailed discussion of the implications will strengthen the manuscript further.

    2. Reviewer #1 (Public Review):

      Summary:<br /> The authors re-analysed the data of a previous study in order to investigate the relation between asymmetries of subcortical brain structures and the hemispheric lateralization of alpha oscillations during visual spatial attention. The visual spatial attention task crossed the factors of target load and distractor salience, which made it possible to also test the specificity of the relation of subcortical asymmetries to lateralized alpha oscillations for specific attentional load conditions. Asymmetry of globus pallidus, caudate nucleus, and thalamus explained inter-individual differences in attentional alpha modulation in the left versus right hemisphere. Multivariate regression analysis revealed that the explanatory potential of these regions' asymmetries varies as a function of target load and distractor salience.

      Strengths:<br /> The analysis pipeline is straightforward and follows in large parts what the authors have previously used in Mazzetti et al (2019). The authors use an interesting study design, which allows for testing of effects specific to different dimensions of attentional load (target load/distractor salience). The results are largely convincing and in part replicate what has previously been shown. The article is well-written and easy to follow.

      Weaknesses:<br /> While the article is interesting to read for researchers studying alpha oscillations in spatial attention, I am somewhat sceptical about whether this article is of high interest to a broader readership. Although I read the article with interest, the conceptual advance made here can be considered mostly incremental. As the authors describe, the present study's main advance is that it does not include reward associations (as in previous work) and includes different levels of attentional load. While these design features and the obtained results indeed improve our general understanding of how asymmetries of subcortical structures relate to lateralized alpha oscillations, the conceptual advance is somewhat limited.

      While the analysis of the relation of individual subcortical structures to alpha lateralization in different attentional load conditions is interesting, I am not convinced that the present analysis is suited to draw strong conclusions about the subcortical regions' specificity. For example, the thalamus (Fig. 5) shows a significant negative beta estimate only in one condition (low-load target, non-salient distractor) but not in the other conditions. However, the actual specificity of the relation of thalamus asymmetry to lateralized alpha oscillations would require that the beta estimate for this one condition is significantly different from the beta estimates for the other three conditions, which has not been tested as far as I understand.
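
      One simple way to test whether two beta estimates differ is a Wald test on their difference. The sketch below is an assumption-laden illustration rather than a recommendation for this dataset: the function name is hypothetical, and because the four conditions come from the same participants, the estimates are unlikely to be independent, so the covariance term (or a joint model with a condition-by-asymmetry interaction) would be needed in practice.

```python
import math

def coef_difference_test(b1, se1, b2, se2, cov=0.0):
    """Wald z-test for H0: b1 == b2. Returns (z, two_sided_p).

    Var(b1 - b2) = se1^2 + se2^2 - 2 * cov. Setting cov = 0 assumes the
    two estimates are independent, which is an assumption, not a given.
    """
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * cov
    if var_diff <= 0.0:
        raise ValueError("variance of the difference must be positive")
    z = (b1 - b2) / math.sqrt(var_diff)
    # two-sided p-value under a standard normal: 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

      A significant estimate in one condition and a non-significant estimate in another do not by themselves imply that this test rejects; only the test on the difference speaks to specificity.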

    3. Reviewer #2 (Public Review):

      Summary:<br /> In this study, Ghafari et al. explored the correlation between hemispheric asymmetry in the volume of various subcortical regions and lateralization of posterior alpha-band oscillations in a spatial attention task with varying cognitive demands. To this end, they combined structural MRI and task MEG to investigate the relationship between hemispheric differences in the volume of basal ganglia, thalamus, hippocampus, and amygdala and hemisphere-specific modulation of alpha-band power. The authors report that differences in the thalamus, caudate nucleus, and globus pallidus volume are linked to the attention-related changes in alpha band oscillations with differential correlations for different regions in different conditions of the design (depending on the salience of the distractor and/or the target).

      Strengths:<br /> The manuscript contributes to filling an important gap in current research on attention allocation which commonly focuses exclusively on cortical structures. Because it is not possible to reliably measure subcortical activity with non-invasive electrophysiological methods, they correlate volumetric measurements of the relevant subcortical regions with cortical measurements of alpha band power. Specifically, they build on their own previous finding showing a correlation between hemispheric asymmetry of basal ganglia volumes and alpha lateralization by assessing a task without an explicit reward component. Furthermore, the authors use differences in saliency and perceptual load to disentangle the individual contributions of the subcortical regions.

      Weaknesses:<br /> The theoretical bases of several aspects of the design and analyses remain unclear. Specifically, we missed statements in the introduction about why it is reasonable, from a theoretical perspective, to expect:<br /> (i) a link between volumetric measurements and task activity;<br /> (ii) a specific link with hemispheric asymmetry in subcortical structures (While focusing on hemispheric lateralization might circumvent the problem of differences in head size, it would be better to justify this focus theoretically, which requires for example a short review of evidence showing ipsilateral vs contralateral connections between the relevant subcortical and cortical structures);<br /> (iii) effects not only in basal ganglia and thalamus, but also hippocampus and amygdala (a justification of selection of all ROIs);<br /> (iv) effects that depend on distractor versus target salience (a rationale for the specific two-factor design is missing);<br /> (v) effects in the absence of reward (why it is important to show that the effect seen previously in a task with reward is seen also in a task without reward);<br /> (vi) effects on rapid frequency tagging.

      Second, the results are not fully reported. The model space and the results from the model comparison are omitted. Behavioral data and rapid frequency tagging results are not shown. Without having access to the data or the results of the analyses, the reader cannot evaluate whether the null effect corresponds to the absence of evidence or (as claimed in the discussion) evidence of absence.

      Third, it remains unclear whether the MMS is the best approach to analyzing effects as a function of target and distractor salience. To address the question of whether the effects of subcortical volumes on alpha lateralization vary with task demands (which we assume is the primary research question of interest, given the factorial design), we would like to evaluate some sort of omnibus interaction effect, e.g., by having target and distractor saliency interact with the subcortical volume factors to predict alpha lateralization. Without such analyses, the results are very hard to interpret. What are the implications of finding the differential effects of the different volumes for the different task conditions without directly assessing the effect of the task manipulation? Moreover, the report would benefit from a further breakdown of the effects into simple effects on unattended and attended alpha, to evaluate whether effects as a function of distractor (vs target) salience are indeed accompanied by effects on unattended (vs attended) alpha.

      The fourth concern is that the discussion section is not quite ready to help the reader appreciate the implications of key aspects of the findings. What are the implications for our understanding of the roles of different subcortical structures in the various psychological component processes of spatial attention? Why does the volumetric asymmetry of different subcortical structures have diametrically opposite effects on alpha lateralization? Instead, the discussion section highlights that the different subcortical structures are connected in circuits: "Globus pallidus also has wide projections to the thalamus and can thereby impact the dorsal attentional networks by modulating prefrontal activities." If this is true, then why does the effect of the GP dissociate from that of the thalamus? Also, what is it about the current behavioural paradigm that makes the behavioral readout insensitive to variation in subcortical volume (or alpha lateralization?)?

    1. eLife assessment

      The authors present a software package called osl-dynamics that uses generative models (Hidden Markov Model and Dynamic Network Modes) that can be adapted to the data, and the latent states and transition across states obtained through the model can be used to describe spectro-temporal characteristics of the brain signals, as well as for oscillatory burst detection. This approach is important and adds to the repertoire of techniques that can be used to study high-dimensional data and having access to this software (with tutorials and examples) will help other researchers test the usefulness of their approach. The evidence is convincing, but could further benefit from an objective way by which the output of their model can be compared/judged or through results from synthetic data with known properties.

    2. Reviewer #1 (Public Review):

      Summary:<br /> These types of analyses use many underlying assumptions about the data, which are not easy to verify. Hence, one way to test how the algorithm is performing in a task is to study its performance on synthetic data in which the properties of the variable of interest can be fixed a priori. For example, for burst detection, synthetic data can be generated by injecting bursts of known durations and checking whether the algorithm is able to pick them up. Burst detection is difficult in the spectral domain since direct spectral estimators have high variance (see Subhash Chandran et al., 2018, J Neurophysiol). Therefore, detected burst lengths are typically much lower than injected burst lengths (see Figure 3). This problem can be solved by doing burst estimation in the time domain itself, for example, using Matching Pursuit (MP). I think the approach presented in this paper would also work since this model is also trained on data in the time domain. Indeed, the synthetic data can be made more "challenging" by injecting multiple oscillatory bursts that are overlapping in time, for which a greedy approach like MP may fail. It would be very interesting to test whether this method can "keep up" as the data is made more challenging. While showing results from brain signals directly (e.g., Figure 7) is nice, it will be even more impactful if it is backed up with results obtained from synthetic data with known properties.
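
      A synthetic test signal of the kind described here can be produced with very little code. The following is a minimal sketch using only the Python standard library; it does not use or imitate the osl-dynamics API, and the function names, the 10 Hz burst frequency, the amplitudes, and the noise model (white Gaussian noise) are all illustrative assumptions.

```python
import math
import random

def make_burst_signal(fs, duration_s, bursts, freq_hz=10.0, amp=2.0,
                      noise_sd=1.0, seed=0):
    """Return (signal, mask): Gaussian noise plus sinusoidal bursts.

    bursts is a list of (onset_s, length_s) pairs; mask[i] is 1 inside an
    injected burst and 0 elsewhere, giving the ground truth for evaluation.
    """
    rng = random.Random(seed)  # seeded so the signal is reproducible
    n = int(round(fs * duration_s))
    signal = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    mask = [0] * n
    for onset_s, length_s in bursts:
        start = int(round(onset_s * fs))
        stop = min(n, start + int(round(length_s * fs)))
        for i in range(start, stop):
            signal[i] += amp * math.sin(2.0 * math.pi * freq_hz * i / fs)
            mask[i] = 1
    return signal, mask

def burst_durations(mask, fs):
    """Durations in seconds of each run of consecutive 1s in a 0/1 mask."""
    durations, run = [], 0
    for value in mask:
        if value:
            run += 1
        elif run:
            durations.append(run / fs)
            run = 0
    if run:
        durations.append(run / fs)
    return durations
```

      Applying `burst_durations` to the ground-truth mask and to a detector's output mask gives injected and detected burst lengths that can be compared directly. For overlapping bursts at different frequencies, one mask per frequency would be needed, since a single mask merges overlapping runs.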

      I was wondering about what kind of "synthetic data" could be used for the results shown in Figures 8-12 but could not come up with a good answer. Perhaps data in which different sensory systems are activated (visual versus auditory) or sensory versus movement epochs are compared to see if the activation maps change as expected. We see similarities between states across multiple runs (reproducibility analysis) and across tasks (e.g. Figure 8 vs 9) and even methods (Figure 8 vs 10), which is great. However, we should also expect the emergence of new modes specific to sensory activation (say auditory cortex for an auditory task). This will allow us to independently check the performance of this method.

      The authors should explain the reproducibility results (variational free energy and best run analysis) in the Results section itself, to better orient the reader on what to look for.

      Page 15: the comparison across subjects is interesting, but it is not clear why sensory-motor areas show a difference and the mean lifetime of the visual network decreases. Can you please explain this better? The promised discussion in section 3.5 can be expanded as well.

    3. Reviewer #2 (Public Review):

      Summary:<br /> The authors have developed a comprehensive set of tools to describe dynamics within a single time-series or across multiple time-series. The motivation is to better understand interacting networks within the human brain. The time-series used here are from direct estimates of the brain's electrical activity; however, the tools have been used with other metrics of brain function and would be applicable to many other fields.

      Strengths:<br /> The methods described are principled, and based on generative probabilistic models.<br /> This makes them compact descriptors of the complex time-frequency data.<br /> Few initial assumptions are necessary in order to reveal this compact description.<br /> The methods are well described and demonstrated within multiple peer-reviewed articles.<br /> This toolbox will be a great asset to the brain imaging community.

      Weaknesses:<br /> The only question I had was how to objectively/quantitatively compare different network models. This is possibly easily addressed by the authors.

    1. eLife assessment

      This study presents a useful differentiation method that produces syndetome-like cells from human induced pluripotent stem cells as determined through single-cell RNA sequencing analysis. Nevertheless, it is essential to note that the authors' assertion that the efficiency of syndetome differentiation can be enhanced by inhibiting BMP and Wnt requires further substantiation, as the evidence provided remains incomplete. The major weaknesses of the manuscript center on issues related to data representation in figures and their subsequent interpretation. The work holds relevance for scholars in the field of musculoskeletal research who are dedicated to advancing translational medicine for the benefit of patients.

    2. Reviewer #1 (Public Review):

      Papalamprou et al. established a methodology to differentiate iPSCs to the syndetome stage and validated it by marker gene expression and scRNA-seq analysis. They further found that inhibition of WNT signaling enhanced the homogeneity of the cell population after identifying a group of branching-off cells that overexpressed WNT. Their results will be helpful in developing cell therapy systems for tendon injuries. However, there are several issues to address to improve the manuscript:

      IPA analysis was performed after scRNA-seq. Although it is knowledge-based software with convenient graphic utilities, it is questionable whether an unbiased genome-level analysis was performed. Therefore, it is not convincing if WNT is the only and best signal for the branching-off marker. Perhaps independent approaches, such as GO, pathway, or module analyses, should be performed to validate the finding.

      According to the method section, two iPSC lines were used for the study. However, throughout the manuscript, it is not clearly described which line was used for which experiment. Did they show similar efficiency in differentiation and in responses to WNTi? It is also worrisome if using only two lines is the norm in the stem cell field. Please provide a rationale for using only two lines, which will restrict the observation of individual-specific differential responses throughout the study.

      How similar are syndetome cells with or without WNTi? It would be interesting to check if there are major DEGs that differentiate these two groups of cells.

      Please discuss the improvement of the current study compared to previous ones (e.g., PMID 36203346, 35083031, 35372337).

    3. Reviewer #2 (Public Review):

      Summary:<br /> Dr. Sheyn and colleagues report the step-wise induction of syndetome-like cells from human induced pluripotent stem cells (iPSCs), following a previously published protocol which they adjusted. The progression of the cells through each stage (i.e. presomitic mesoderm (PSM), somitic mesoderm (SM), sclerotome (SCL), and syndetome (SYN)) is characterized using FACS, RT-qPCR and immunofluorescence staining (IF). The authors also performed single-cell RNA sequencing (scRNAseq) analysis of their step-wise induced cells and identified signaling pathways which are potentially involved in and possibly necessary for syndetome induction. They then optimized their protocol by simultaneous inhibition of BMP and Wnt signaling pathways, which led to an increase in syndetome induction while inhibiting off-target differentiation into neural lineages.

      Strengths:<br /> The authors conducted scRNAseq analysis of each step of their protocol from iPSCs to syndetome-like cells and employed pathway analysis to uncover further insights into somitic mesoderm (SM) and syndetome (SYN) differentiation. They found that BMP inhibition, in conjunction with the inhibition of WNT signaling, plays a role in driving syndetome differentiation. Analyzing their scRNAseq results, they could improve the syndetome induction efficiency of their protocol from 47.6% to 67%-78% while off-target differentiation into neural lineages could be reduced.

      Weaknesses:<br /> The authors demonstrated the efficiency of syndetome induction solely by scRNA-seq data analysis before and after pathway inhibition, without using e.g. FACS analysis or immunofluorescence (IF)-staining based assessment. A functional assessment and validation of the induced cells is also completely missing.

      The following points are not clear and need to be addressed by the authors:

      1. Notably, in Figure 1D, certain PSM markers (TBXT, MSGN1, WNT3A) show higher expression on day 3. If the authors initiate SM induction on day 3 instead of day 4, could this potentially enhance the efficiency of syndetome-like cell induction?

      2. In the third paragraph of the result section the authors note, "Interestingly, SCX, a prominent tenogenic transcription factor, was significantly downregulated at the SCL stage compared to iPSC, but upregulated during the differentiation from SCL to SYN." Despite this increase, the expression level of SCX in SYN remains lower than that in iPSCs in Fig.1G and Fig.3C. Can the authors provide an explanation for this? Can the authors provide IF data using iPSCs and compare it with in vitro-induced SYN cells? Can the authors provide e.g. additional scRNAseq data which could support this statement?

      3. In the fourth paragraph of the result section the authors state, "SM markers (MEOX1, PAX3) and SCL markers (PAX1, PAX9, NKX3.2, SOX9) were upregulated in a stepwise manner." However, the data for MEOX1 and NKX3.2 seems to be missing from Figure 3B-C. The authors should provide this data and/or additional support for their claim.

      4. In Figures 2B and 2E, the background of the red channel seems extremely high. Are there better images available, particularly for MEOX1? Given the expected high expression of MEOX1 in SM cells, the authors should observe a strong signal in the nucleus of the stained somitic mesoderm-like cells, but that is not the case in the shown figure. The authors should provide separate channel images instead of merged ones for clarity. The antibody which the authors used might not be specific. Can the authors provide images using an antibody which has been shown to work previously e.g. antibody by ATLAS (Cat#: HPA045214)?

      5. In Fig. 2C and Supplementary Fig. 1, the authors present data from immunofluorescence (IF) staining and FACS analysis using a DLL1 antibody. While FACS analysis indicates an efficiency of 96.2% for DLL1+ cells, this was not clearly observed in their IF data. How can the authors explain this discrepancy? Could the authors quantify their IF data and compare it with the corresponding FACS data?

      6. In Fig. 2G, PAX9 is expected to be expressed in the nucleus, but the shown IF staining does not appear to be localized to the nucleus. Could the authors provide improved or alternative images to clarify this? The authors should use antibodies shown to work with high specificity as already reported by other groups.

      7. Why did the authors choose to display day 10 data for SYN induction in Fig. 4A? Could they provide information about the endpoint of their culture at day 21?

      8. In Supplementary Fig. 5, the authors depicted the expression level of SCX, a SYN marker, which peaked at day 14 and then decreased. By day 21, it reached a level comparable to that of iPSCs. Given this observation, could the authors provide a characterization of the cells at day 21 during SYN induction using IF? What was the rationale behind selecting 21 days for SYN induction? The authors also need to show 'n numbers'; how many times were the experiments repeated independently (independent experiments)?

      9. Overall the shown immunofluorescence (IF) data does not appear convincing. Could the authors please provide clearer images, including separate channel images, a bright field image, and magnified views of each staining?

      10. As stated by the authors in the manuscript, another research group performed FACS analysis to assess the efficiency of syndetome induction using SCX antibody, and/or quantification of immunofluorescence (IF) with SCX, MKX, COL1A1, or COL2A1 antibodies. Could the authors conduct a comparative analysis of syndetome induction efficiency both before and after protocol optimization, utilizing FACS analysis in conjunction with an SCX reporter line or antibody staining, e.g. quantifying induction efficiency via immunofluorescence (IF) staining with syndetome-specific marker genes?

      11. To enhance the paper's significance, the authors should conduct functional validation experiments and proper assessment of their induced syndetome-like cells. They could perform e.g. xeno-transplantation experiments with syndetome cells into SCID-mice or injury models. They could also assess whether the in vitro induced cells could be applied for in vitro tendon/ligament formation.

      12. The authors should also compare their scRNA-seq data with actual human embryo data sets, something which could be done given the recent increase in available human embryo scRNA-seq data sets.

    4. Reviewer #3 (Public Review):

      Papalamprou et al sought to fine-tune existing tenogenic differentiation protocols to develop a robust multi-step differentiation protocol to induce tendon cells from human GMP-ready iPSCs. In so doing, they found that while existing protocols are capable of driving cells towards a syndetome-like fate, the resultant cultures contain highly heterogeneous cell populations with sub-optimal cell survival. Through single-cell transcriptomic analysis, they identify WNT signaling as a potential driver of an off-target neural population and show that inhibition of WNT signaling at the later 2 stages of differentiation can be used to promote higher efficiency of generation of syndetome-like cells.

      This paper includes a useful paradigm for identifying transcriptional modulators of cell fate during differentiation and a clear example where transcriptional data can be used to guide the chemical modulation of a differentiation protocol to improve cell output. The paper's conclusions are mostly well supported by the data, but the image analysis and figure presentation need to be improved to strengthen the impact.

      The data outlining the differences between the differentiation outcome of the two tested iPSCs is intriguing, but the authors fail to comment on potential differences between the two iPSC lines that could result in drastically different cell outputs from the same differentiation protocol. This is a critically important point, as the majority of the SCX+ cells generated from the 007i cells using their WNTi protocol were found in the FC subpopulation that failed to form from the 83i line under the same protocol. From the analysis of only these 2 cell lines in vitro, it is difficult to assess whether this WNTi protocol can be broadly used to generate tenogenic cells.

      The authors make claims to changes in protein expression but fail to quantify either fluorescence intensity or percent cell expression from their immunofluorescence analyses to substantiate these claims. These claims are not fully supported by the data as presented as it is unclear whether there is increased expression of tendon markers at the protein level or more cells surviving the protocol. Additionally, in images where 3 channels are merged, it would be helpful to show individual channels where genes are shown in similar spectra (ie. Fig 2I SCX/MKX). Furthermore, the current layout and labelling scheme of Figure 4 makes it very difficult to compare conditions between SYN and SYNWNTi protocols.

      Individual data points should also be presented for all qPCR experiments (ie. Fig 4A). Biological replicate information is missing from several experiments, particularly the immunofluorescence data, and it is unclear whether the qPCR data was generated from technical or biological replicates.

    1. eLife assessment

      This study presents valuable structural data for the bacterial adhesin PrgB, an atypical microbial cell surface-anchored polypeptide that binds DNA. There is convincing support for the claims regarding the overall function and importance of individual domains, which integrate a wide range of new and previously published experimental data. The structure-based model of PrgB molecular activity will be impactful in the field of bacterial adhesins, conjugation, and biofilm formation, especially because it focuses on a clinically relevant Gram-positive pathogen, whereas most work in the field has been focused on Gram-negative model systems.

    1. eLife assessment

      This fundamental study identifies the homeodomain transcription factor and suspected autism-candidate gene Meis2 as a transcriptional regulator of maturation and end-organ innervation of low-threshold mechanoreceptors (LTMRs) in the dorsal root ganglia (DRG) of mice. For a few years, the view on autism spectrum disorders (ASD) has shifted from a disorder that exclusively affects the brain to a condition that also includes the peripheral somatosensory system, even though our knowledge about the genes involved is not complete. The study by Desiderio and colleagues is therefore not only scientifically interesting but may also have clinical relevance. The work is convincing, with appropriate and validated methodology in line with current state-of-the-art and the findings contribute both to understanding and potential application.

    1. eLife assessment

      This paper presents a new method called MINT that is simple yet effective at BCI-style decoding tasks in stereotyped settings. While the reviewers raise caveats, overall they believe the work is a valuable study for the field of motor control, and the evidence to support their claims is solid.

    1. eLife assessment

      What makes one member of the species behave differently from another is a core problem in behavioral neuroscience. The authors studied the specific case of odor preference behavior in fruit flies, and searched for links to activity in the first and second stages of the olfactory system. This is a valuable study, but the results are overstated and the evidence incomplete. It is difficult to discern robust links between neural structure/function and behavior in the data set as presented here.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      We now make clear throughout the manuscript that our proposition, holding the fast cassette as central to control over powerful movements governed by the PMn, remains a hypothesis. However, we provide additional rationale for our thinking that this is the case based on functional distinctions between the PMns and SMns. Both reviewers 1 and 2 also questioned why so few synaptic and ion channel genes are seen for the SMn type. As pointed out by the reviewer, the idea of small differences in birthdates between Mn types seems like an unlikely explanation and was removed. Now, we better develop the idea that the low levels of expression of both ion channel and synaptic genes in SMns are consistent with the findings from electrophysiology that point to greatly lowered levels of transmitter release, compared to PMns. Additionally, for the purpose of identifying all synaptic and ion channel genes shared equally between Mn types, we re-examined the transcriptome. Figure 7A & B now reflect all genes in these two categories detected above threshold in PMn and SMn types, and not just examples.

      Reviewer 2

      We have added cell types in mammalian circuits shown to express the ion channel cassette members. Examples include the calyx of Held in the auditory circuit and the cerebellar Purkinje neurons. As we show with zebrafish PMn, these mammalian neurons form fast, reliable circuits. In these cases, it is noteworthy that our proposal is the first to link all three as functional partners in fast AP firing and high-fidelity synaptic transmission. The suggestion that pancreatic cells would be represented in our data is deemed highly unlikely as our technique separated out the spinal cords prior to dissociation. Finally, as suggested, we added the disclaimer that we cannot exclude the possibility that clusters sharing both glia and neuronal markers may represent cell doublets. Other minor corrections were all made.

      Reviewer 3

      First, we agree that the role of PMns is not restricted to escape behavior. They have been shown to participate in the highest speed of swimming as well. We have made this clear throughout the paper.

      Second, we are at odds with this reviewer over the Type I and Type II V2a recruitment during high speed swimming. We agree that both V2a types of interneurons are involved in high speed swimming and likely escape, as both directly innervate the PMns, as pointed out by the reviewer in Figure 2c of Menelaou and McLean 2019. However, the reviewer interprets Figure 2c to show that Type I, not Type II, V2a is more highly recruited over the range of higher swimming speeds whereas we conclude just the opposite. These data, along with other papers we cited, have been firmed up in the text to support a central role played by Type II.

      Third, the reviewer recommends we remove Figures 6b and 6c relating to our two newly discovered SMn markers, fox1b and alcamb. Our data shown in Figure 6a show that these markers label SMn somas in two distinct layers along the dorsal-ventral axis in the spinal cord. The reviewer objects to Figures 6b and 6c which compare the location of our two markers to the distributions of two well studied SMn labeling transgenic lines, islet:GFP and gata2:GFP. The correspondence is not absolute but suggests that the fox1b labels islet SMns and alcamb labels the gata2 SMns. In the previous version of the paper, we suggested that this correspondence might further signal different dorsal-ventral projections. This suggestion was based solely on reports that islet and gata2 transgenic lines preferentially label SMns with different projections. We do not view this particular point as important and in light of the controversy surrounding these projections, as noted by the reviewer, we removed all reference to the subject of muscle target areas. We focus instead on our finding of two new markers that label different dorsal-ventral soma layers which MAY correspond to previously described SMn types. This reasoning is made clear in the manuscript and, because of its potential importance, we elected to retain Figures 6b and 6c as a call for future testing.

      The reviewer makes other suggestions that were all incorporated. The CoLo estimates indeed were too high, as questioned by the reviewer, because, early on, we inadvertently counted two clusters rather than the single cluster that was later authenticated. This has been corrected to reflect 1.1% in Table 1. The evx1 and evx2 data have been added to Figure 4C. Nomenclature is corrected for KA neurons. We make clear that the axonal projections for CoLo were made with mCherry expression not the in-situ label. The Hayashi reference was added.

    2. eLife assessment

      In zebrafish, primary motor neurons (PMNs) control escape movements, and a more heterogeneous population of secondary motor neurons (SMNs) regulate the speed of rhythmic swimming. Using single cell RNA sequencing (scRNAseq), the authors have obtained compelling evidence that PMNs, and two types of interneurons innervating them, express a set of three genes encoding voltage-gated ion channels enabling rapid firing. The PMNs also express high transcript levels of proteins involved in exocytosis, which would be expected to support rapid neurotransmitter release. These results will be important for those working on spinal cord function and zebrafish genomics/transcriptomics.

    3. Reviewer #1 (Public Review):

      This manuscript by Kelly et al. reports results from single-cell transcriptomic analysis of spinal neurons in zebrafish. The work builds on a strong foundation of literature and the objective, to discern gene expression patterns specializing functionally distinct motor circuits, is well rationalized. Specifically, they compared the transcriptomes in the escape and swimming circuits.

      The authors discovered, in the motor neurons of the escape circuit, two functional groups or "cassettes" of genes related to excitability and vesicle release, respectively. Expression of these genes makes sense for a "fast" circuit. This finding will be important to the field and form the basis for subsequent studies differentiating the escape circuit from others.

    4. Reviewer #2 (Public Review):

      Summary: Kelly et al. strategically leverage the unique strengths of the zebrafish larval model and scRNA-seq to uncover genes that determine the stereotypic output of different neuronal circuits. The results lead to the identification of ion channel and synapse associated genes that distinguish a fast reliable neuronal circuit.

      Strengths:
      - Well-established neuronal markers allow the transcriptomic analyses to match a majority of the transcriptomic clusters to specific spinal neuron subtypes.
      - One transcriptomic cluster reveals the presence in zebrafish of a spinal neuron subtype previously identified in mammals.
      - The primary motor neuron and specific interneurons of the circuit mediating strong and fast swimming share expression of ion channel and synapse-related gene cassettes that sculpt fast and strong synaptic transmission.
      - Results are optimally placed in the context of the rich background and literature regarding zebrafish spinal neuron physiology.

      Weaknesses:
      - The revised version has addressed previous concerns.

      Likely Impact:
      - The ion channel and synapse-related gene cassettes that distinguish the primary motor neuron circuit are shared with some mammalian circuits that also generate fast, reliable synaptic transmission.
      - The transcriptomic data have been deposited in the publicly accessible Gene Expression Omnibus allowing others to mine the rich data set that also included glial cells that were not the focus of this study.

    5. Reviewer #3 (Public Review):

      Functional and anatomical studies of spinal circuitry in vertebrates have formed the basis of our understanding of neuronal control of movements. Larval zebrafish provide a simplified system for deciphering spinal circuitry. In this manuscript, the authors performed scRNAseq on spinal cord neurons in larval zebrafish, identifying major classes of neuronal and glial types. Through transcriptome analysis, they validated several key interneuron types previously implicated in zebrafish locomotion circuitry. The authors went beyond identifying transcriptional markers and explored synaptic molecules associated with the strength of motor output. They discovered molecular distinctions causally related to the unique physiology of primary motoneuron (PMn) function, which involves providing strong synaptic outputs for escapes and fast swimming. They defined functional 'cassettes' comprising specific combinations of voltage-dependent ion channel types and synaptic proteins, likely responsible for generating maximal motor outputs.

      Comments on revised version:

      "However, the reviewer interprets Figure 2c to show that Type I, not Type II, V2a is more highly recruited over the range of higher swimming speeds whereas we conclude just the opposite."

      BRE: The preceding is the authors' response to the Reviewer's critique of Version 1 of the manuscript and refers to Figure 2C of Menelaou and McLean, Nat Commun. 10:4197, 2019; PMID: 31519892; PMCID: PMC6744451. Below the Reviewer's second critique elaborates on this point. The authors chose not to modify the manuscript further.

      This is not what I would like to maintain in my previous report. Both Type I and Type II V2a neurons are recruited during very fast swimming (70 Hz). The degree of the de-recruitment of Type I V2a neurons during slower swimming (40-60 Hz) is larger than Type II. Thus, what I would like to say is that Type I V2a neurons are more analogous to PMns than Type II V2a neurons (Both PMns and SMns are recruited during very fast swimming, and PMns tend to be de-recruited during slower swimming).

      In this sense, I don't like the authors' way of relating Type II V2a neurons to escapes and very fast swimming. However, if the authors insist on the current form of the manuscript, I do not strongly object.

    1. eLife assessment

      This study presents a valuable set of experiments to test whether Bombus terrestris bumblebees can detect lethal-level doses of a series of pesticides in nectar-mimicking sugary solutions. Behavioural assays were coupled with electrophysiological measurements to show that B. terrestris mouthparts cannot detect high levels of the tested pesticides. If confirmed using pesticide formulas, and other bumblebee species, the study will be of general interest in environmental science research. Most experimental data are compelling, and the conclusions are sound, but the write-up would benefit from a broader ecological context.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for The Authors)

      MAJOR CONCERNS

      1) Not addressed, but perhaps relevant, is that most of the postembryonic fish growth results from stem cells located in the ciliary marginal zone that make new neurons and Muller glia throughout the fish's life. Thus, Muller cell heterogeneity may result from the central to the peripheral gradient of Muller glial cell maturation.

      1a. Müller glial cell heterogeneity needs to be confirmed using, for example, in situ hybridization studies with gene-specific probes identified in the scRNAseq that distinguish these 2 populations. An additional approach could be the use of transgenic lines harboring tagged endogenous or transgene that reflects the promoter activity of the Muller glia subtype-specific gene.

      We thank the reviewer for the insightful comments and agree on the importance of substantiating the Müller glia heterogeneity in our manuscript. Our study is not the only study that provides evidence for Müller glia heterogeneity. In particular, we would like to refer to a recent publication (Krylov et al., 2023). Using single cell RNA sequencing, Krylov et al. detect Müller glia heterogeneity in the uninjured retina, as well as upon selective, genetic ablation of distinct subtypes of photoreceptors (e.g. long and short wavelength sensitive cones, as well as rods). They observe six distinct clusters of quiescent Müller glia that show differential spatial distribution along the dorsal/ventral retinal axis. For instance, they report a ventral quiescent Müller glia population that shares some marker genes (aldh1a3, rdh10a, smoc1) with our non-reactive Müller glia 2 (cluster 2, supplementary files 1 and 2). Moreover, the authors report that Müller glia located at different positions along the dorsal/ventral axis exhibit distinct patterns of pcna upregulation as well as subsequent re-activation upon photoreceptor ablation. We have added the supportive information from Krylov et al. in the discussion section (lines: 781-789) of our manuscript.

      2) Most interesting, but also least substantiated, is the authors' report of 2 different quiescent Muller glial cell populations in the uninjured retina and 2 different reactive Muller cell populations in the injured retina. If these populations exist independently of each other, it would be important to investigate if they differentially impacted retina regeneration.

      2a. CRISPR knockdown in F0 of factors thought to be involved in specific Müller glia-derived progenitor trajectories would be important to lend some functional significance to the data.

      We fully agree with the reviewer that addition of functional data would enrich the manuscript with valuable information. However, we don't believe that the suggested CRISPR knockdown of selected genes in F0 animals (also known as crispants) represents a suitable approach. Crispants have been used successfully to investigate genetic contributions in embryonic-to-larval stages (the first few days) of zebrafish development, as injection of multiple gRNAs targeting the same gene is sufficient to achieve a bi-allelic knockout of the gene of up to 90% (Kroll et al., 2021). However, unless both alleles of the target gene(s) are mutated early on with nearly 100% efficiency, it is unlikely that the gRNA inactivation would work equally well during subsequent development into adult stages (several months later, and after exponential growth and volume increase of the animal). Even if biallelic inactivation in the crispants does work early on, it remains unclear whether and how crispants survive to adulthood, which will be necessary in order to address gene function in the context of retina regeneration. Moreover, since we observe that the genetic events during adult retina regeneration are highly similar to the events during retina development, we would rather expect the crispants to already display developmental phenotypes, which would further hamper the study of potential regeneration-specific phenotypes in adult animals. We are convinced that only ‘clean’ conditional gene inactivation studies will be suitable to address the impact of Müller glia and derived progenitor trajectories on retina regeneration. In this respect, we have recently developed the new conditional Cre-Controlled CRISPR mutagenesis system (Hans et al., Nature Comm 2021). We are currently establishing stable lines to enable controlled and specific gene inactivation, but have only obtained preliminary results so far; the final analysis will take much more time and is, therefore, beyond the scope of this work.

      3) The discussion should be modified to relate the data here presented with those described in Hoang et al., 2020.

      We followed the suggestions of the reviewer and compared our single cell RNA sequencing dataset to that described in Hoang et al., 2020. As one might expect, the comparison between the two datasets showed similarities but also significant differences due to the different experimental set-ups. We show the results of this comparison in additional main (new Figure 9) and supplementary figures (new Figure 9-figure supplement 1). In order to compare our newly obtained scRNAseq dataset of MG and MG-lineage-derived cells of the regenerating zebrafish retina to the previously published dataset of light-lesioned retina (Hoang et al., 2020), we employed the ingestion method (Scanpy, https://scanpy-tutorials.readthedocs.io/en/latest/integrating-data-using-ingest.html) and mapped the clusters identified by Hoang and colleagues to our clusters (new Figure 9). While we applied a short-term lineage tracing strategy and only sequenced the enriched population of FAC-sorted MG and MG-derived cells of the regenerating zebrafish retina, Hoang and colleagues sequenced all retinal cells in the light-lesioned retina. Consequently, comparison between the two datasets uncovered similarities, but also significant differences, due to the different experimental set-ups (Figure 9A). Consistently, the cluster annotated as resting MG in Hoang et al. mapped to clusters annotated as non-reactive MG 1 and 2 in our dataset (new Figure 9B). The cluster annotated as activated MG in Hoang et al. mapped to clusters annotated as reactive MG 1 and 2, as well as to the cluster with hybrid identity of MG/progenitors in our dataset. Interestingly, some cells annotated as activated MG in Hoang et al. mapped also to neurogenic progenitor 2 and 3 clusters in our dataset (Figure 9B). The cluster annotated as progenitors in Hoang et al. mapped to the progenitor area in our dataset, which included neurogenic progenitors 2, 3 as well as photoreceptor and horizontal cell precursors (new Figure 9B). Finally, retinal ganglion cells, cones, GABAergic amacrine cells and bipolar cells annotated in Hoang et al. perfectly mapped to retinal ganglion cells, cone, amacrine and bipolar cells in our dataset (new Figure 9B). While we did not detect a mature horizontal cell cluster, Hoang and colleagues annotated a horizontal cell cluster, whose cells mapped to reactive MG 2, MG/progenitors 1 and part of progenitors 3 in our dataset (new Figure 9B). Moreover, Hoang and colleagues annotated rod photoreceptors that mapped to progenitors 3, photoreceptor precursors, red and blue cones, horizontal cell precursors and bipolar cells in our dataset (new Figure 9B). Finally, due to the different cell isolation protocol, Hoang and colleagues annotated additional cell clusters that did not map to any cluster in our more selective dataset, and included oligodendrocytes, pericytes, retinal pigmented epithelial cells as well as vascular/endothelial cells (new Figure 9B). Next, we selected representative marker genes per cluster from our scRNAseq dataset and checked their expression in the dataset by Hoang and colleagues (Figure 9-figure supplement 1). The dot plot showing the expression of selected gene candidates per cluster further corroborated the large overlap between clusters annotated in the present study with those annotated in the study by Hoang and colleagues. These novel comparisons to the data of Hoang et al. are now included in the resubmitted version, and are described and discussed in an additional paragraph in the results (lines: 482-517) as well as discussion (lines: 766-807) sections.
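
      For readers unfamiliar with this kind of cluster mapping, the following is a minimal sketch of the label-transfer idea behind the comparison described above; it is not the authors' pipeline and not the Scanpy implementation. It assumes that cells from both datasets have already been placed in a shared low-dimensional embedding and that each query cell receives the majority cluster label of its nearest reference cells. The coordinates, the choice of k and the function names are hypothetical and serve only as illustration; in Scanpy itself the corresponding step is performed by `sc.tl.ingest`.

```python
import math
from collections import Counter


def transfer_labels(ref_points, ref_labels, query_points, k=3):
    """Give each query cell the majority label of its k nearest reference cells."""
    transferred = []
    for q in query_points:
        # Rank reference cells by Euclidean distance in the shared embedding.
        order = sorted(range(len(ref_points)),
                       key=lambda i: math.dist(q, ref_points[i]))
        votes = Counter(ref_labels[i] for i in order[:k])
        transferred.append(votes.most_common(1)[0][0])
    return transferred


def cross_tabulate(original_labels, transferred_labels):
    """Count how often each original annotation lands in each reference cluster."""
    return dict(Counter(zip(original_labels, transferred_labels)))


# Made-up 2D coordinates (NOT real data): six reference cells in two clusters.
ref_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
              (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
ref_labels = ["non-reactive MG 1"] * 3 + ["reactive MG 1"] * 3

# Three query cells carrying their own, independently assigned annotations.
query_points = [(0.05, 0.1), (5.0, 5.1), (0.15, 0.0)]
query_labels = ["resting MG", "activated MG", "resting MG"]

mapped = transfer_labels(ref_points, ref_labels, query_points, k=3)
table = cross_tabulate(query_labels, mapped)
```

      The resulting table counts how many cells with a given original annotation fall into each reference cluster, which is the kind of summary reported in the mapping described above.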

      MINOR CONCERNS

      1) Fig 1C is difficult to interpret. I am also confused by the color coding which is not presented in the figure legend - why 3 shades of red and two of blue? Please define each (for example, what's the difference between red, purple, and light red in the 6dpl panel?). What are the white areas outlined by blue and red circles/cells (looks like a topography plot)? It appears that there is a fairly large amount of pcna:EGFP expression in the uninjured retina - what are these cells?

      We have replaced Figure 1C with a better one and rephrased/extended the explanation of the figure in the results (lines: 192-195). Figure 1C depicts contour plots, which represent the relative frequency of data. Each contour line encloses an equal percentage of events (that is, cells), and contour lines that are closely packed indicate a high concentration of events. In flow cytometry, contour plots are used to represent highly frequent events, as this kind of plot is independent of sample size.

      Concerning the observed pcna:EGFP expressing cells in the uninjured retina, we interpret them as proliferating cells coming from the ciliary marginal zone and from Müller glia of the central retina, which represent progenitors and Müller glia that have re-entered the cell cycle to generate rod progenitors, respectively. Consistent with that, we observe pcna:EGFP-positive cells in the ciliary marginal zone as well as central retina using immunofluorescence, as shown in Figure 1-figure supplement 1.

      2) Results, lines 186-188 are not presented clearly: EGFP+ cells may persist for some time after they leave the cell cycle, so stating EGFP+ cells are proliferating may not be correct. How long does PCNA promoter activity and EGFP expression remain after Muller cells exit the cell cycle? mCherry+/EGFP- cells may be non-reactive Muller glia or reactive Muller glia that have not entered the cell cycle. It seems likely that Muller glia start reprogramming before undergoing cell division.

      We agree with the reviewer that EGFP persists for some time after the cells have left the cell cycle, which we actually describe and use to our benefit in our study. We do not know for how long exactly the pcna promoter is active within the cell cycle, but EGFP is known to have a half-life of approximately 24 hours (Li et al., 1998). Even though we cannot make a statement about EGFP persistence in Müller glia, we note that previous reports (Lahne et al., 2015; Nagashima et al., 2013; Nelson et al., 2013; Thummel et al., 2008) and our study (Figure 3-figure supplement 2) show PCNA at the protein level in Müller glia cells between 24 and 48 hpl, including our sampled 44 hpl time point (lines: 69-73). We also agree with the reviewer that Müller glia will become reactive to the injury most likely prior (lines: 67-69) to activation of the pcna promoter, meaning that Müller glia are EGFP-negative at this time point due to the immature status of EGFP after translation. However, we are confident that our data also comprises this cell state (early phase of Müller glia activation) because we sampled proliferating (EGFP- and mCherry-double positive cells) as well as non-proliferating Müller glia (mCherry-only positive cells) at all time points (lines: 213-215 and Figure 1C). We interpret that the early phase of Müller glia activation corresponds to Müller glia transitioning from a non-reactive to a reactive state. With respect to our UMAP, we map this cell state in cluster 1 localizing to the top left part of the cluster, abutting cluster 3, the reactive Müller glia 1 (Figure 2B).

      3) I am concerned by the observation that microglia were identified by scRNAseq as a contaminating cell population. Since FACS was based on gfap:mCherry expression, why did microglia end up in the mix? Also, what are the ‘...low-quality cells expressing many ribosomal transcripts...’ and why, if they are low-quality cells, did they pass the sequencing quality control as stated on lines 208-209?

      The reviewer is right that microglia should actually not end up in the sample when using the gfap:mCherry line. However, microglia always displayed a certain level of autofluorescence in our experimental set-up (possibly because they may have ingested some cell debris), which may have contributed to their presence in the FACS samples. In contrast to the reviewer, we were not concerned about this ‘contamination’, because the microglia could be easily identified and sorted out using bioinformatics. This is supported by the presented supplementary figure in which microglia separate from the core of clusters containing Müller glia and Müller glia-derived cells in the UMAP of the full dataset (Figure 2-figure supplement 1).

      We also acknowledge that ‘low quality cells’ is not an appropriate term for cells in the cluster expressing ribosomal mRNAs at high levels, as ribosomal enrichment actually does not give any information concerning their quality. We referred to them as ‘low quality’ because the enrichment in ribosomal transcripts masks their identity considerably. To correct this, we now renamed cells in this cluster descriptively as ‘ribosomal gene-enriched’ cells (Figure 2-figure supplement 1, line: 226).

      4) Fig. 2: please list in the text or fig legend the specific genes used to identify each cell cycle state. Why is cluster 3 considered a reactive Muller population when expressing S phase markers and PCNA? These features seem to distinguish cluster 3 from 4 and may suggest cluster 3 is a progenitor population. Further explanation is necessary to understand the assignments.

      Information about the specific genes used to identify each cell cycle state is provided in the paragraph “Bioinformatic analysis” (lines: 925-934) in the Materials and Methods section. We considered listing all the markers in either the results or the figure legends as well, but decided against it, as it impairs readability in our opinion. Nevertheless, we have now highlighted also in the results (line: 261) that the list of cell cycle markers is available in the Materials and Methods section.

      We understand the reviewer's point that cluster 3 represents progenitors and not Müller glia, when PCNA expression is considered as a sole marker of progenitors or of Müller glia reprogrammed to a progenitor state (Hoang et al., 2020). However, we disagree with this view for three reasons. First, although PCNA is used as a marker of Müller glia reprogrammed to a progenitor state and of progenitors in Hoang et al., 2020, it should be noted that PCNA-positive Müller glia cells are present in the central retina already in uninjured conditions, when regeneration-associated, Müller glia-derived progenitors are not detectable. Second, cluster 3 is evident only at 44 hpl, a time point at which Müller glia cells are about to divide or have undergone their first and only cell division. In this regard, we would like to refer to the discussion about Müller glia and Müller glia-derived progenitors as distinct populations in Lenkowski and Raymond, 2014. Third, we have performed in situ hybridization for starmaker (stm), a marker gene highly specific for cells in cluster 3 (supplementary files 1 and 3), combined with immunohistochemistry for GFAP and PCNA. The results of the staining are depicted in a new Figure 3-figure supplement 2. In strong agreement with our sequencing results, we observe stm expression only at 44 hpl, whereas no signal is detected in the uninjured as well as 4 and 6 dpl retina (Figure 3-figure supplement 2). Virtually all stm-positive cells at 44 hpl are also PCNA- and GFAP-double positive cells displaying a clear Müller glia morphology (Figure 3-figure supplement 2). Hence, we interpret cells in cluster 3 as reactive Müller glia, indicating that pcna can be used as a marker of progenitors, but not exclusively of progenitors, prevalently at later stages. At 44 hpl, Müller glia express pcna in order to undergo asymmetric cell division giving rise to the renewed Müller glia and the multipotent progenitor that will continue dividing.

      5) I am confused by the crlf1a scRNAseq data indicating it is associated with proliferating PCNA+ reactive Muller glia Cluster 3 and PCNA- reactive Muller glia Cluster4 at 44 hpl (Fig. 3), yet in Fig. 4 crlf1a in situ signal is exclusively associated with proliferating Muller glia at 44 hpl. Why don't we observe the crlf1a+/PCNA- cell population?

      We highlight that crlf1a expression is actually detected also at 4 dpl (Fig. 3). We also note that immunofluorescence in Fig. 3 shows crlf1a mRNA and PCNA protein, whereas single cell RNA sequencing detects crlf1a and pcna transcripts. In this context, it is possible that crlf1a-, PCNA-double positive cells detected at 4 dpl are still positive for the PCNA protein, but no longer express the pcna transcript. Double in situ hybridization for pcna and crlf1a would be needed to fully address whether crlf1a-positive cells are still pcna-positive at 4 dpl. It is also possible that crlf1a-, GFAP-double positive, PCNA-negative Müller glia are fewer and only masked in the crowd of crlf1a-, PCNA-double positive, GFAP-negative progenitors at 4 dpl (Raymond et al., 2006). We amended the discussion section with this information (lines: 634-654).

      6) scRNAseq cluster 3 is a proliferating population that is assigned "reactive Muller glia", whereas cluster 5 is assigned Muller glia/progenitor and in the Discussion referred to as MG about to go or already underwent asymmetric division to generate a progenitor (lines 568-571). I don't understand why cluster 3 is not referred to as the one harboring reactive MG/progenitors that underwent or are undergoing asymmetric cell division - The timing is right, as are the markers.

      We would like to refer the reviewer to the discussion in point 4, including the changes we introduced in the Materials and Methods (Lines 925-934). As mentioned above, we do not agree that PCNA alone represents an exclusive marker of progenitors; rather, we consider it a marker of cells undergoing proliferation. Moreover, we note that the first and only division of Müller glia occurs between 31 and 48 hpl. Finally, as mentioned above, expression of stm is a unique marker for cluster 3, which is only evident at 44 hpl, but not of cluster 5, which is evident at 4 dpl.

      It seems cluster 5 might better fit the amplifying progenitor stage where some MG markers are retained but diluted by cell division. Please clarify the reasoning behind the labeling of this cluster. It is not clear why this cluster has to contain self-renewed Muller glia - why wouldn't these Muller cells partition to quiescent MG clusters 1 and 2 or reactive Muller glia in clusters 3 and 4?

      We partially agree with the reviewer that cluster 5 might better fit the amplifying progenitor state, and this is why we indicate this cluster as a “crossroad in the trajectory” in the discussion (lines: 613-631). However, we cannot entirely exclude that cells in cluster 5 contain self-renewed Müller glia (differential gene expression analysis highlights glial markers too, see Figure 3A, supplementary file 6). Cells that we interpret as self-renewing Müller glia do not partition back to quiescent Müller glia (cluster 1 and 2) because they are on the way to be quiescent Müller glia again, yet they did not reach that point, maybe due to sampling reasons. Unfortunately, our short-term lineage tracing strategy ceases at 6 dpl. We also speculate in the discussion (lines: 679-682) that if we had sampled at later time points (e.g. at 14 dpl), we might have been able to detect the density of the cells in the glial area moving back to clusters 1 or 2 (cell density plots, Figure 2B).

      I also have trouble understanding cluster 4's assignment. The Discussion states it represents cells at the crossroad of glial and neurogenic trajectory containing self-renewed Muller glia as well as first-born MG-derived progenitors. However, it is populated by cells after 44 hpl (Fig. 2B) which is when reactive Muller glia are detected and lacks proliferative markers.

      We think that there is a misunderstanding here. We never refer to cluster 4 as a crossroad in the glial and neurogenic trajectory. We state that cluster 5 is actually the crossroad between the two trajectories (line 629). We further propose that self-renewed MG close the cycle via late reactive MG (cluster 4) and return into non-reactive Müller glia (clusters 1 and 2, red, dashed line in Figure 10) (now described in lines 631-633). The cell density plots support the direction of the cycle closing towards non-reactive Müller glia, in particular at 4 and 6 dpl (Figure 2B).

      Might cluster 4 represent a population of reactive MG remaining at 4 dpl that never entered the cell cycle and therefore would be devoid of Muller glia-derived progenitors?

      As stated in the manuscript, we actually think that marker expression as well as the cell density plots support our assignment of cluster 4 to represent self-renewed Müller glia closing the cycle to non-reactive Müller glia. Our assignment also fits well with the expected events following asymmetric cell division. However, as we cannot entirely rule out the reviewer's idea, we included the suggestion in the updated discussion (lines 651-654).

      7) Results, lines 163-164; Please provide a reference for "..... consistent with the previously described....."

      We thank the reviewer for this observation and we added the appropriate references (Fimbel et al., 2007; Lenkowski and Raymond, 2014; Thummel et al., 2008) in the updated version of the manuscript (lines: 171-172).

      Reviewer #2 (Recommendations For The Authors):

      Overall, this very thorough study provides interesting and unexpected results. The published data set will be useful for many subsequent studies. I have only a few remarks that the authors may consider discussing. Their cluster analysis revealed most of the expected cell clusters with some interesting surprises. One relates to photoreceptors where the authors describe well-separated clusters for red and green cones, while rods, UV and blue cones do not form clusters. For rods, this is discussed, but I miss a brief discussion on the "missing" cone subtypes.

      We thank the reviewer for the insightful comments. It is correct that we indeed detect only red and blue cones, as indicated by their expression of red-sensitive opsin gene (opn1lw2) and the blue-sensitive opsin gene (opn1sw2), respectively. It is possible that missing cone subtypes are born later than 6 dpl. As the reviewer suggested, we amended the discussion and added information about the missing cone subtypes (lines: 724-726).

      I am also intrigued by the two, quite separated amacrine cell clusters, while bipolar cells cluster in one cluster, without separation in (say) ON and OFF bipolar cells. This may also merit a discussion. What are their ideas on the small and quite separated amacrine cell cluster (cluster 14).

      Bipolar cells in cluster 15 are very sparse in our dataset, with only 40 cells in total. Hence, the sample size might be too small to be separated into ON and OFF subtypes. Alternatively, cells might be still immature, as we use 6 dpl as our latest sampled time point. Concerning cells in cluster 14, we think they are starburst amacrine cells, as indicated by their simultaneous expression of gad1b and chata (Figure 8-figure supplement 2B), which is a characteristic feature of starburst amacrine cells in mouse (O'Malley et al., 1992). We added this observation in the discussion (lines: 706-712).

    2. eLife assessment

      Müller glial cells of the zebrafish retina can differentiate into all neural cell classes following injury, providing full regenerative capabilities of the zebrafish retina. This valuable study presents a description of transcriptional changes of Müller glia cells in the adult and regenerating retina using single-cell RNA sequencing. The overall evidence supporting the main claims of the authors is solid.

    3. Reviewer #1 (Public Review):

      Muller glia function as retinal stem cells in the adult zebrafish retina. Following retinal injury, Muller glia are reprogrammed (reactive Muller glia), and then divide to produce a progenitor that amplifies and differentiates into retinal neurons. Previous scRNAseq analysis used total retinal RNA from uninjured and injured retinas isolated at time points when Muller glia are quiescent, being reprogrammed, and proliferating to reveal genes and gene regulatory networks underlying these events (Hoang et al., 2020). The manuscript by Celotto et al., used double transgenic zebrafish that allow them to purify by FACS quiescent and reactive Muller glia, Muller glia-derived progenitors, and their differentiating progeny at different times post retinal damage. RNA from these cell populations was used in scRNAseq studies to identify the transcriptomes associated with these cell populations. Importantly, they report two quiescent and two reactive Muller glia populations. These results raise the interesting possibility that Muller glia are a heterogeneous population whose members may exhibit different regenerative responses to retinal injury. However, without further experimentation, the validity and significance of this result remain unclear. In addition to putative Muller cell heterogeneity, Celotto et al., identified multiple progenitor classes, some of which are specified to regenerate specific retinal neuron types. Because of its focus on Muller glia and Muller glia-derived progenitors at mid to late stages of retina regeneration, this new scRNAseq data will be a useful resource to the research community for further interrogation of gene expression changes underlying retina regeneration.

    4. Reviewer #2 (Public Review):

      In this publication, the authors provide a comprehensive trajectory of transcriptional changes in Müller glia cells (MG) in the regenerating retina of zebrafish. These resident glia cells of the retina can differentiate into all neural cell classes following injury, providing full regenerative capabilities of the zebrafish retina. The authors achieved this by using single-cell RNA sequencing of Müller glia, progenitors, and regenerated progeny, comparing uninjured and light-lesioned retinae.

      The isolation strategy involves using two transgenic strains, one labelling dividing cells and their immediate progeny, and the other Müller glia cells. This allowed them to separate injury-induced proliferating and non-reactive Müller glia cells. Subsequent single-cell transcriptomics showed that MG could be non-reactive under both uninjured and lesioned conditions and reactive MG gives rise to a cell population that both replenishes the pool of MG and replenishes neurogenic retinal precursor cells. These precursor cells produce regenerated neurons in a developmental time series with ganglion cells being born first and bipolar cells being born last. Interestingly hybrid populations have been detected that co-share characteristics of photoreceptor precursors and reactive glia.

      This is the first study of its kind following the dynamic changes of transcriptional changes during retinal regeneration, providing a rich data source of genes involved in regeneration. Their finding of transcriptionally separable MG populations is intriguing.

      This study focuses on the light-lesioned retina and leaves open the question if the observed transcriptional trajectories of regenerating neurons are generalizable to other lesion models (e.g. chemical or mutational lesions) or are specific to the light-damaged retina.

    1. Author Response

      The following is the authors’ response to the original reviews.

      The Authors wish to thank the Reviewers for their detailed and insightful comments. By properly addressing these critiques, we sincerely believe our finished product will be substantially improved and provide greater insight to the academic community.

      Both Reviewers noted the importance of identifying the limitations of our study with particular emphasis on embedded implant heating due to switching gradient coils. Understanding the limitations of any model and/or simulation process is critical when adopting its use, especially when estimating the safety of embedded devices. For this reason, we have included the following text and corresponding references in our Discussion section:

      While the workflow presented herein establishes a validated approach to estimate RF heating due to the presence of a passive implant within a human subject undergoing an MR procedure, certain limitations and proper use stipulations of this methodology should be identified. These include:

      1) The approach of embedding a given passive implant must be carefully considered and supervised by an orthopaedic subject matter expert, preferably an orthopaedic surgeon. While the procedures described above focus on insertion and registration of an implant to make it numerically suitable for simulation, relevant anatomic and physiological considerations must also be addressed to ensure a physically realistic and appropriate result. This will enable a proper simulated fit and no empty spaces or unintended tissue deformations.

      2) Temperature changes presented are due only to RF energy deposition. The results do not take into account the impact of low-frequency induction heating of metallic implants naturally caused by the switching gradient coils. Important work on this subject matter has recently been reported in [21],[22],[23],[24],[25],[26],[27]. Unless an orthopaedic implant has a loop path, heating due to gradient fields is typically less than heating due to RF energy deposition. The present testbed would be applicable to the induction heating of implants (and the expected temperature rise of nearby tissues), after switching from Ansys HFSS (the full wave electromagnetic FEM solver) to Ansys Maxwell (the eddy current FEM solver). Two examples of this kind have already been considered in [25],[45].

      3) The procedures presented in this work have been based on the response of a single human model of advanced age and high morbidity.

      4) Finally, validation was achieved using available published data [42]-[44] and relies upon the legitimacy and veracity of that data. Coil geometry, power settings, and other relevant parameters were taken explicitly from these sources and modeled to enable a faithful comparison.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Heitmann et al introduce a novel method for predicting the potential of drug candidates to cause Torsades de Pointes using simulations. Despite the fact that a multitude of such methods have been proposed in the past decade, this approach manages to provide novelty in a way that is potentially paradigm-shifting. The figures are beautiful and manage to convey difficult concepts intuitively.

      Strengths:

      (1) Novel combination of detailed mechanistic simulations with rigorous statistical modeling

      (2) A method for predicting drug safety that can be used during drug development

      (3) A clear explication of difficult concepts.

      Weaknesses:

      (1) In this reviewer's opinion, the most important scientific issue that can be addressed is the fact that when a drug blocks multiple channels, it is not only the IC50 but also the Hill coefficient that can differ. By the same token, two drugs that block the same channel may have identical IC50s but different Hill coefficients. This is important to consider since concentration-dependence is an important part of the results presented here. If the Hill coefficients were to be significantly different, the concentration-dependent curves shown in Figure 6 could look very different.

      See our response below.

      (2) The curved lines shown in Figure 6 can initially be difficult to comprehend, especially when all the previous presentations emphasized linearity. But a further issue is obscured in these plots, which is the fact that they show a two-dimensional projection of a 4-dimensional space. Some of the drugs might hit the channels that are not shown (INaL & IKs), whereas others will not. It is unclear, and unaddressed in the manuscript, how differences in the "hidden channels" will influence the shapes of these curves. An example, or at least some verbal description, could be very helpful.

      See our response below.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript is generally well-written (with one important exception, see below). The manuscript can be improved with a few suggested modifications, ordered from most important to least important.

      (1) In this reviewer's opinion, the most important scientific issue that the authors need to address is the fact that when a drug blocks multiple channels, it is not only the IC50 but also the Hill coefficient that can differ. By the same token, two drugs that block the same channel may have identical IC50s but different Hill coefficients. This is important to consider since concentration-dependence is an important part of the results presented here.

      In a recent study (Varshneya et al, CPT PSP 2021 (PMID: 33205613)) they originally ran simulations with Hill coefficients of 1 for all the 4 drugs and 7 channels, then re-ran the simulations with differing Hill coefficients. The results were quantitatively quite different than what was originally obtained, even though the overall trends were identical. A look at the table provided in that paper's supplement shows that the estimated Hill coefficients range from 0.5 to 1.9, which is a pretty wide range.

      In this case, I don't think the authors should re-run the entire analysis. That would require entirely too much work and potentially detract from the elegant presentation of the manuscript in its current form. Although I haven't looked at the Llopis-Lorente dataset recently, I doubt that reliable Hill coefficients have been obtained for all 105 drugs. However, the Crumb et al dataset (PMID: 27060526) does provide this information for 30 drugs.

      Perhaps the authors could choose an example of two drugs that affect similar channels but with differences in the estimated Hill coefficients. Or even a carefully-designed hypothetical example could be of value. At the very least, Hill coefficients need to be mentioned as a limitation, but this would be stronger if it were coupled with at least some novel analyses.

      We fixed the Hill coefficients to h=1 because there is no evidence for co-operative drug binding in the literature that would require coefficients other than one. There is also the practical matter that only 17 of the 109 drugs in the dataset have a complete set of Hill coefficients. We have revised the Methods (Drug datasets) to make these justifications explicit:

      Lines 560-566: “… We also fixed the Hill coefficients at h = 1 because (i) there is no evidence for co-operative drug binding in the literature, and thus no theoretical justification for using coefficients other than one; (ii) only 17 of the 109 drugs in the dataset had a complete set of Hill coefficients (hCaL, hKr, hNaL, hKs) anyway. …”

      Out of interest, we re-ran our analysis using only those n=17 drugs (Amiodarone, Amitriptyline, Bepridil, Chlorpromazine, Diltiazem, Dofetilide, Flecainide, Mibefradil, Moxifloxacin, Nilotinib, Ondansetron, Quinidine, Quinine, Ranolazine, Saquinavir, Terfenadine and Verapamil). When the Hill coefficients were fixed at h=1, the prediction accuracy was 88.2% irrespective of the dosage (Author response image 1). When we used the estimated (free) Hill coefficients, the prediction accuracy remained unchanged (88.2%) for all doses except the lowest (1x to 2x) where it dropped to 82.4%. We concluded that using the Hill coefficients from the dataset made little difference to the results.

      Author response image 1.
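
      The role of the Hill coefficient discussed in this exchange can be illustrated with the standard Hill equation for fractional channel block. The following is a minimal sketch, not the authors' code: the function names and the IC50 value of 100 nM are hypothetical and chosen only for illustration, while the Hill coefficients 0.5 and 1.9 are the range quoted by the reviewer and h = 1 is the value fixed by the authors.

```python
def fractional_block(conc, ic50, hill=1.0):
    """Fraction of a channel's current blocked at drug concentration `conc`.

    Standard Hill equation: block = 1 / (1 + (IC50 / conc) ** h).
    `conc` and `ic50` must share the same units (e.g. nM).
    """
    if conc <= 0:
        return 0.0
    return 1.0 / (1.0 + (ic50 / conc) ** hill)


def conductance_scale(conc, ic50, hill=1.0):
    """Fraction of channel conductance that remains (1 - block)."""
    return 1.0 - fractional_block(conc, ic50, hill)


if __name__ == "__main__":
    ic50 = 100.0  # hypothetical IC50 in nM, for illustration only
    for hill in (0.5, 1.0, 1.9):  # range of Hill coefficients quoted by the reviewer
        blocks = [fractional_block(c, ic50, hill) for c in (10.0, 100.0, 1000.0)]
        print(hill, [round(b, 3) for b in blocks])
```

      Whatever the Hill coefficient, the block at a concentration equal to the IC50 is one half; the coefficient only changes how steeply the block rises around that point. Drugs with identical IC50s therefore differ most at concentrations far from the IC50.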

      (2) I initially had a hard time understanding the curved lines shown in Figure 6 when all the previous presentations emphasized linearity. After thinking for a while, I was able to get it, but there was a further issue that I still struggle with. That is the fact that the plots all show a two-dimensional projection of a 4-dimensional space. Some of the drugs might hit the channels that are not shown (INaL & IKs), whereas others will not. How will differences in the "hidden channels" influence the shapes of these curves? An example, or at least some verbal description, could be very helpful.

      We omitted GKs and GNaL from Figure 6 because they added little to the story. Those “hidden” channels operate in the same manner as GKr and GCaL. They are shown in Supplementary Dataset S1. We have included more explicit references to the Supplementary in both the main text and the caption of Figure 6. We have also rewritten the section on ‘The effect of dosage on multi-channel block’ (lines 249-268) to better convey that the drug acts in four dimensions.

      (3) I also struggled a bit with Figure 3 and the section "Drug risk metric." What made this confusing was the PQR notation on the figure and the equations represented as A and B. Can these be presented in a common notation, or can the relationship be defined?

      We have replaced the PQR notation in Figure 3A with vector notation A and B to be consistent with the equations.

      Also in Figure 3B, I was unclear about the units on the x-axis. Is each step (e.g. from 0 to 1) the same distance as a single log unit along the abscissa or ordinate in Figure 3A?

      Yes it is. We have revised the caption for Figure 3B to explain it better.
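
      The idea of scoring a drug by how far it moves the cell along an axis in log-conductance space can be sketched as a vector projection. This is an illustrative sketch only and not the authors' implementation: the function names, the axis components, the sign convention (positive meaning pro-arrhythmic) and the treatment of missing IC50 values as zero shift are all assumptions made here, and the placeholder axis must not be read as the published axis of arrhythmia.

```python
import math

CHANNELS = ("GCaL", "GKr", "GNaL", "GKs")

# Placeholder direction, NOT the published axis: signs and magnitudes are
# assumed purely so that the example runs.
PLACEHOLDER_AXIS = (1.0, -1.0, 1.0, -1.0)


def log_conductance_shift(conc, ic50s, hill=1.0):
    """Per-channel change in log10 conductance caused by a drug at `conc`.

    `ic50s` maps channel name -> IC50 (same units as `conc`). Channels with
    no IC50 are treated as unaffected (an assumption of this sketch).
    """
    shift = []
    for channel in CHANNELS:
        ic50 = ic50s.get(channel)
        if ic50 is None:
            shift.append(0.0)
        else:
            remaining = 1.0 / (1.0 + (conc / ic50) ** hill)
            shift.append(math.log10(remaining))
    return shift


def risk_score(shift, axis=PLACEHOLDER_AXIS):
    """Signed length of `shift` projected onto the unit vector along `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    return sum(s * a for s, a in zip(shift, axis)) / norm


if __name__ == "__main__":
    # Hypothetical drug that blocks only IKr, dosed at its IC50.
    shift = log_conductance_shift(100.0, {"GKr": 100.0})
    print(shift, risk_score(shift))
```

      Because the score is a projection in log units, one step along the score axis corresponds to one log unit of displacement along the chosen direction, which matches the reading of Figure 3B confirmed in the response above.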

      (4) The manuscript manages to explain difficult concepts clearly, and it is generally well-written. The important exception, however, is that the manuscript contains far too many sentence fragments. These often occur when the authors explain a difficult concept, then follow up with something that is essentially "and this in addition" or "with the exception of this."

      Lines 220-223: "In comparison, Linezolid is an antibacterial agent that has no clinical evidence of Torsades (Class 4) even though it too blocks IKr. Albeit less than it blocks ICaL (Figure 5A, right)."

      Lines 242-245: "Conversely, Linezolid shifts the population 1.18 units away from the ectopic regime. So only 0.0095% of those who received Linezolid would be susceptible. A substantial drop from the baseline rate of 0.93%."

      There are several others that I didn't note, so the authors should perform a careful copy edit of the entire manuscript.

      Thank you. We have remediated the fragmented sentences throughout.

      Reviewer #2 (Public Review):

      Summary:

      In the paper from Heitmann, Vandenberg, and Hill entitled "Assessing drug safety by identifying the axis of arrhythmia in cardiomyocyte electrophysiology", the authors define a new metric, the "axis of arrhythmia", that essentially describes the parameter space of ion channel conductance combinations where early afterdepolarizations can be observed.

      Strengths:

      There is an elegance to the way the authors have communicated the scoring system. The method is potentially useful because of its simplicity, accessibility, and ease of use. I do think it adds to the field for this reason - a number of existing methods are overly complex and unwieldy and not necessarily better than the simple parameter regime scan presented here.

      Weaknesses:

      The method described in the manuscript suffers from a number of weaknesses that plague current screening methods. Included in these are the data quality and selection used to inform the drug-blocking profile. It's well known that drug measurements vary widely, depending on the measurement conditions.

      We agree and have added a new section to describe these limitations, as follows:

      Lines 467-478: Limitations. The method was evaluated using a dataset of drugs that were drawn from multiple sources and diverse experimental conditions (Llopis-Lorente et al., 2020). It is known that such measurements differ prominently between laboratories and recording platforms (Kramer et al., 2020). Some drugs in the dataset combined measurements from disparate experiments while others had missing values. Of all the drugs in the dataset, only 17 had a complete set of IC50 values for ICaL, IKr, INaL and IKs. The accuracy of the predictions is therefore limited by the quality of the drug potency measurements.

      There doesn't seem to be any consideration of pacing frequency, which is an important consideration for arrhythmia triggers, resulting from repolarization abnormalities, but also depolarization abnormalities.

      It is true that we did not consider the effect of pacing frequency. We have included this in the limitations:

      Lines 479-485: The accuracy of the axis of arrhythmia is likewise limited by the quality of the biophysical model from which it is derived. The present study only investigated one particular variant of the ORd model (O’Hara et al., 2011; Krogh-Madsen et al., 2017) paced at 1 Hz. Other models and pacing rates are likely to produce differing estimates of the axis.

      Extremely high doses of drugs are used to assess the population risk. But does the method yield important information when realistic drug concentrations are used?

      Yes it does. The drugs were assessed across a range of doses from 1x to 32x therapeutic dose (Figure 8A). The prediction accuracy at low doses is 88.1%.

      In the discussion, the comparison to conventional approaches suggests that the presented method isn't necessarily better than conventional methods.

      The comparison is not just about accuracy. Our method achieves the same results at greatly reduced computational cost without loss of biophysical interpretation. We emphasise this in the Conclusion:

      Lines 446-465: Conclusion. Our approach resolves the debate between model complexity and biophysical realism by combining both approaches into the same enterprise. Complex biophysical models were used to identify the relationship between ion channels and torsadogenic risk — as it is best understood by theory. Those findings were then reduced to a simpler linear model that can be applied to novel drugs without recapitulating the complex computer simulations. The reduced model retains a biophysical description of multi-channel drug block, but only as far as necessary to predict the likelihood of early after-depolarizations. It does not reproduce the action potential itself. Our approach thus represents a convergence of biophysical and simple models which retains the essential biophysics while discarding the unnecessary details. We believe the benefits of this approach will accelerate the adoption of computational assays in safety pharmacology and ultimately reduce the burden of animal testing.

      In conclusion, I have struggled to grasp the exceptional novelty of the new metric as presented, especially when considering that the badly needed future state must include a component of precision medicine.

      Safety pharmacology has a different aim to precision medicine. The former concerns the population whereas the latter concerns the individual. The novelty of our metric lies in reducing the complexity of multi-channel drug effects to a linear model that retains a biophysical interpretation.

      Reviewer #2 (Recommendations For The Authors):

      A large majority of drugs have more complex effects than a simple reduction in channel conductance. Some of these are included in the 109 drugs shown in Figure 7. An example is ranolazine, which is well known to have potent late sodium channel blocking effects - how are such effects included in the model as presented? I think at least suggesting how the approach can be expanded for broader applicability would be important to discuss.

      Our method does consider the simultaneous effect of the drug on multiple ion channels, specifically the L-type calcium current (ICaL), the delayed rectifier potassium currents (IKr and IKs), and the late sodium current (INaL). In the case of ranolazine (class 3 risk), the dose-responses for all four ion channels, based on IC50s published in Llopis-Lorente et al. are given in Supplementary Dataset S1.

      The response curves in Author response image 2 show that in this dataset, ranolazine blocks IKr and INaL almost equally - being only slightly less potent against IKr. There are two issues to consider here that potentially contribute to ranolazine being misclassified as pro-arrhythmic. First, the cell model is more sensitive to block of IKr than INaL. As a result, in the context of an equipotent drug, the prolonging effect of IKr block outweighs the balancing effect of INaL block, resulting in a pro-arrhythmic risk score. Second, the potency of IKr block in this dataset may be overestimated which in turn exaggerates the risk score. For example, measurements of ranolazine block of IKr from our own laboratory (Windley et al J Pharmacol Toxicol 87, 99–107, 2017) suggest that the IC50 of IKr is higher (35700 nM) than that reported in the Llopis-Lorente dataset (12000 nM). If this were taken into account, there would be less block of IKr relative to INaL, resulting in a safer risk score.

      Author response image 2.
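
      The effect of the two IKr IC50 values quoted above can be made concrete with a short worked calculation. It assumes the Hill equation with h = 1, as fixed elsewhere in this response, and evaluates the block at a concentration of C = 12000 nM. That concentration is chosen here only because it makes the arithmetic transparent; it is not a therapeutic dose taken from the manuscript.

```latex
\[
  b(C) = \frac{C}{C + \mathrm{IC}_{50}}
\]
\[
  b_{\text{dataset}} = \frac{12000}{12000 + 12000} = 0.50,
  \qquad
  b_{\text{Windley}} = \frac{12000}{12000 + 35700} \approx 0.25
\]
```

      Under these assumptions the higher IC50 roughly halves the predicted IKr block at that concentration, which is the direction of change the authors describe as yielding a safer risk score.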