10,000 Matching Annotations
  1. Sep 2025
    1. Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated the change of aggression strategies by the social experience and its possible biological significance by using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, high-frequency but weak mode, and tussling, low-frequency but more vigorous mode. Previous studies have mainly focused on the lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing, while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons pC1[SS2] that induce the tussling specifically. These neurons express doublesex (dsx), a sex-determination factor, and knockdown of dsx strongly suppresses the induction of tussling. In order to further explore the ecological significance of the aggression mode change in group-rearing, a new behavioral experiment was performed to examine the territorial control and the mating competition. And finally, the authors found that differences in the social experience (group vs. solitary rearing) are important in these biologically significant competitions. These results add a new perspective to the study of aggression behavior in Drosophila. Furthermore, this study discusses an interesting general model in which the social experience modified behavioral changes play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low-frequency, would be very useful. The experimental setup itself is relatively simple, just addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011 etc), the fact that the behavioral mode itself changes significantly has rarely been addressed, and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of this study in the neurobiology is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Furthermore, the results showing that the regulation of aggression by pC1[SS2] neurons is based the function of the dsx gene will bring a new perspective to the field. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflicting between the two aggression modes. The experimental systems examining the territory control and the reproductive competition in Fig. 6 are novel and have advantages in exploring their biological significance. It is important to note that, in addition to showing the effects of age and social experience on territorial and mating behaviors, the authors suggested that an altered fighting strategy has effects with respect to these behaviors.

      Weaknesses:

      New experimental paradigm in Fig. 6 is quite useful, but as the authors mentioned, still the future investigations are needed to reveal a direct relationship between aggression strategies and reproductive success.

    2. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating. Prior social isolation is known to increase aggression in males, manifesting as increased lunging, which is suppressed by group housing (GH). However, it is also known that single housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al., develop a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, with a virgin female immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA sensing Or67d neurons in promoting high frequency lunging, similar to earlier studies, whereas Or47b neurons promote low frequency but higher intensity tussling. Optogenetic activation revealed that three pairs of pC1SS2 neurons increase tussling. Cell-type-specific DsxM manipulations combined with morphological analysis of pC1SS2 neurons and side-by-side tussling quantification link the developmental role of DsxM to the functional output of these aggression-promoting cells. In contrast, although optogenetic activation of P1a neurons in the dark did not increase tussling, thermogenetic activation under visible light drove aggressive tussling. Using a further modified aggression assay, GH males exhibit increased tussling and maintain territorial control, which could contribute to a mating advantage over SH males, although direct measures of reproductive success are still needed.

      Strengths:

      Through a series of clever neurogenetic and behavioral approaches, the authors implicate specific subsets of ORNs and pC1 neurons in promoting distinct forms of aggressive behavior, particularly tussling. They have devised a refined territorial control paradigm, which appears more robust than earlier assays using a food cup (Chen et al., 2002). This new setup is relatively clutter-free and could be amenable to future automation using computer vision approaches. The updated Figure 5, which combines cell-type-specific developmental manipulation of pC1SS2 neurons with behavioral output, provides a link between developmental mechanisms and functional aggression circuits. The manuscript is generally well written, and the claims are largely supported by the data.

      Thank you for the precise summary of the manuscript and acknowledgment of the novelty and significance of the study.

      Weakness:

      Although most concerns have been addressed, the manuscript still lacks a rigorous, objective method for quantifying lunging and tussling. Because scoring appears to have been done manually and a single lunge in a 30 fps video spans only 2-3 frames, the 0.2 s cutoff seems arbitrary, and there are no objective criteria distinguishing reciprocal lunging from tussling. Despite this, the study offers valuable insights into the neural and behavioral mechanisms of Drosophila aggression.

      Thank you for this comment. The duration of each lunge was measured by analyzing the videos frame by frame—from the frame before the initiation of the lunge to the frame after its completion—resulting in an average span of 3–5 frames. Given a frame rate of 30 fps, this corresponds to approximately 0.1–0.17 seconds. We acknowledge that there are certain limitations for manually quantifying the two types of aggressive behaviors, which has now been stated in the newly added “Limitations of the Study” section in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated the change of aggression strategies by the social experience and its biological significance by using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, high-frequency but weak mode, and tussling, low-frequency but more vigorous mode. Previous studies have mainly focused on the lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing, while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons pC1[SS2] that induce the tussling specifically. These neurons express doublesex (dsx), a sex-determination factor, and knockdown of dsx strongly suppresses the induction of tussling. In order to further explore the ecological significance of the aggression mode change in group-rearing, a new behavioral experiment was performed to examine the territorial control and the mating competition. And finally, the authors found that differences in the social experience (group vs. solitary rearing) and the associated change in aggression strategy are important in these biologically significant competitions. These results add a new perspective to the study of aggression behavior in Drosophila. Furthermore, this study proposes an interesting general model in which the social experience modified behavioral changes play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low-frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011 etc), the fact that the behavioral mode itself changes significantly has rarely been addressed, and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of this study in neurobiology is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Furthermore, the results showing that the regulation of aggression by pC1[SS2] neurons is based on the function of the dsx gene will bring a new perspective to the field. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes. The experimental systems examining the territory control and the reproductive competition in Fig. 6 are novel and have advantages in exploring their biological significance. It is important to note that in addition to showing the effects of age and social experience on territorial and mating behaviors, the authors experimentally demonstrated that altered fighting strategy has effects with respect to these behaviors.

      Thank you for your precise summary of our study and being very positive on the novelty and significance of the study.

      Reviewer #3 (Public review):

      In this revised manuscript, Gao et al. presented a series of well-controlled behavioral data showing that tussling, a form of high-intensity fighting among male fruit flies (Drosophila melanogaster) is enhanced specifically among socially experienced and relatively old males. Moreover, results of behavioral assays led authors to suggest that increased tussling among socially experienced males may increase mating success. They also concluded that tussling is controlled by a class of olfactory sensory neurons and sexually dimorphic central neurons that are distinct from pathways known to control lunges, a common male-type attack behavior.

      A major strength of this work is that it is the first attempt to characterize behavioral function and neural circuit associated with Drosophila tussling. Many animal species use both low-intensity and high-intensity tactics to resolve conflicts. High-intensity tactics are mostly reserved for escalated fights, which are relatively rare. Because of this, tussling in the flies, like high-intensity fights in other animal species, have not been systematically investigated. Previous studies on fly aggressive behavior have often used socially isolated, relatively young flies within a short observation duration. Their discovery that 1) older (14-days old) flies tend to tussle more often than younger (2 to 7-days-old) flies, 2) group-reared flies tend to tussle more often than socially isolated flies, and 3) flies tend to tussle at later stage (mostly ~15 minutes after the onset of fighting), are the result of their creativity to look outside of conventional experimental settings. These new findings are key for quantitatively characterizing this interesting yet under-studied behavior.

      Newly presented data have made several conclusions convincing. Detailed descriptions of methods to quantify behaviors help understand the basis of their claims by improving transparency. However, I remain concerned about authors' persistent attempt to link the high intensity aggression to reproductive success. The authors' effort to "tone down" the link between the two phenomena remains insufficient. There are purely correlational. I reiterate this issue because the overall value of the manuscript would not change with or without this claim.

      Thank you for acknowledging the novelty and significance of the study. Regarding the relationship you mentioned between high-intensity aggression and reproductive success, we further toned down the statement between them throughout the manuscript in the revised manuscript. We also modified the title to “Social Experience Shapes Fighting Strategies in Drosophila”. In addition, we now added a ‘Limitations of the Study’ section to clearly state the correlation between tussling and reproductive success.

      Reviewer #1 (Recommendations for the authors):

      If possible, mention the EM-connectome data showing the minimal interneuronal path from Or47b ORNs to pC1SS2 neurons (even if derived from the female connectome), which can strengthen the model of parallel sensory-central pathways.

      Thank you for this comment. According to data from the EM connectome, connecting Or47b ORNs to pC1d neurons requires at least two intermediate neurons. An example minimal pathway is: ORN_VA1v (L) → AL-AST1 (L) → PLP245 (L) → pC1d (R). We have added this point in the Discussion section of the revised manuscript.

      I'm not convinced that labeling lunges as "gentle" combat behavior works, either in the abstract or elsewhere. While lunging is indeed a lower-intensity form of aggression compared to tussling, applying anthropomorphic descriptors risks misleading readers.

      Thank you for this comment. We now use “low-intensity” instead of “gentle” to describe lunging.

      In Materials & Methods, please cross-check all figure-panel references after the recent re-numbering (e.g. "Figure 5A6A" etc.).

      Thank you for this comment. We have thoroughly verified the figure panel references in the Materials & Methods section.

      Ensure that Table S1 is clearly cited in the main text where you first describe fly genotypes.

      Thank you for this comment. We have now cited Table S1 in the main text.

      There are multiple grammatical errors and typos throughout the manuscript. Please correct them. Some examples are below, but this is not an exhaustive list:

      Line 98-102 requires rephrasing as the results are already published and not being observed by the authors.

      Thank you for this comment. We have revised the manuscript to “we occasionally observed the high-intensity boxing and tussling behavior in male flies as previously reported (Chen et al., 2002; Nilsen et al., 2004), which….”

      line 116- lower not 'lowed'.

      Corrected.

      line 942 & 945- knock-down males not 'knocking down males'.

      Corrected. Thank you very much for these comments.

      Reviewer #2 (Recommendations for the authors):

      The authors have almost completely answered the major comments I have noted on the ver.1 manuscript: (1) They clearly show changes in fighting strategy in the territory control behavior experiment in Fig. 6-figure supplements. (2) A detailed description of how aggressive behavior is measured. Thus, I am convinced by this revision.

      Thank you for these comments that make the manuscript a better version.

      Furthermore, in Fig. 5, which examined the relationship of pC1[SS2] characteristics with the function of dsx, is a novel data and very interesting. I look forward to further developments.

      Thank you. We will continue to explore this part in our future study.

      However, one point still concerns me.

      Line 192: Although the authors describe it as "usage-dependent," the trans-Tango technique is essentially a postsynaptic cell-labeling technique. It is possible that the labeling intensity in postsynaptic cells increases from the change in expression levels of the Or47b gene due to GH. However, there is no difference in the expression level of the Or47b gene labeled by GFP between SH and GH. Therefore, we cannot conclude that the expression of the Or47b gene is increased by rearing conditions.

      The original paper on trans-TANGO (Talay et al., 2017) does not discuss the usage-dependency. A review of trans-synaptic labeling techniques (Ni, Front Neural Circuits. 2021) discusses that the increase in trans-TANGO signaling with aging may be related to synaptic strength, but there is no experimental evidence for this. In my opinion, the results in Figure 3-figure supplement 2 only weakly suggest that the increase in trans-TANGO signaling may be explained by an increase in synaptic strength due to group rearing.

      We appreciate the reviewer’s insightful comment regarding the interpretation of the trans-Tango signal. Indeed, the original trans-Tango study (Talay et al., 2017) does not claim that the method is usage-dependent. The observed increase in trans-Tango labeling with age, as reported in their supplemental figures, may reflect accumulation over time, potentially influenced by synaptic maturation or increased component expression. To avoid overstating our results, we have revised the relevant statement in the manuscript to remove the term "usage-dependent" and now describe the change in trans-Tango signal more cautiously.  

      Reviewer #3 (Recommendations for the authors):

      Below are the cases where their professed attempts to "tone down the statement" appear ignored:

      Lines 27-29:

      "Our findings... suggest how social experience shapes fighting strategies to optimize reproductive success".

      We have now revised the manuscript to “Our findings… suggest that social experience may shape fighting strategies to optimize reproductive success.”

      Lines 85-86:

      "... discover that this infrequent yet intense form of combat is... crucial for territory dominance and mating competition".

      We have now revised the manuscript to “…discover that this infrequent yet intense form of combat is enhanced by social enrichment, while the low-intensity lunging is suppressed by social enrichment.” 

      Lines 335-339:

      "Here, we found that... GH males tend to... increase the high-intensity tussling, which enhances their territorial and mating competition."

      We have removed “which enhances their territorial and mating competition” in the revised manuscript.

      Lines 343-344:

      "... presenting a paradox between social experience, aggression and reproductive success. Our result resolved this paradox..."

      We have now revised the manuscript to “...Our results provide an explanation for this paradox…”

      Lines 355-358:

      "Interestingly, we found that the mating advantage gained through social enrichment can even offset the mating disadvantage associated with aging, further supporting the vital role of shifting fighting strategies in experienced, aged males."

      We have removed “further supporting the vital role of shifting fighting strategies in experienced, aged males” in the revised manuscript.

      Lines 361-362:

      "These results separate the function of the two fighting forms and rectify out understanding of how social experiences regulate aggression and reproductive success."

      We have removed this sentence in the revised manuscript.

      Some may say that a speculative statement is harmless, but I think it indeed is harmful unless it is clearly indicated as a speculation. It is regrettable that authors remain reluctant to change their claim without providing any new supporting evidence. All three reviewers raised the same concern in the first round of review.

      We apologize for not making the speculative nature of the statement clearer in the previous version. In the revised manuscript, we have now explicitly rephrased sentences to only suggest a correlation but not a causal link between tussling and reproductive success.

      I have no choice but to keep my evaluation of the manuscript as "Incomplete" unless the authors thoroughly eliminate any attempt to link these two. This must go beyond changing a few words in the lines listed above.

      Thank you for this comment. In addition to the lines listed above, we carefully checked all statements regarding the correlation between fighting strategies and reproductive success throughout the full text. Furthermore, we have also added a “Limitations of the Study” section to address the shortcomings of this study in the revised manuscript.

      I do not have the same level of concern over the interpretation of Fig. 6A-C, because this is directly linked to aggressive interactions. Even if the socially isolated males do not engage in tussling, it is not a leap to assume that a different fighting tactic of socially experienced males can give them an advantage in defending a territory. To me, this is a sufficient ethological link with the observed behavioral change.

      Thank you for this insightful comment.

      The following are relatively minor, although important, concerns.

      I beg to differ over the authors' definition of "tussling". Supplemental movies S1 and S2 appear to include "tussling" bouts in which 2 flies lunging at each other in rapid succession, and supplemental movie S3 appears to include bouts of "holding", in which one fly holds the opponent's wings and shakes vigorously. These cases suggest that the definition of "tussling" as opposed to "lunging" has a subjective element. However, I would not delve on this matter further because it is impossible to be completely objective over behavioral classification, even by using a computational method. An important point is that the definition is applied consistently within the publication. I have no reason to doubt that this was not the case.

      Thank you for this comment. Since the analysis of tussling behavior was conducted manually, it is challenging to achieve complete objectivity. However, we made every effort to apply consistent criteria throughout the analysis. We have added a “Limitations of the Study” section in the revised manuscript to clearly state this caveat. We appreciate your understanding.

      Authors now state that "all tester flies were loaded by cold anesthesia" (lines 432-433). I would like to draw attention to the well-known fact that anesthesia, whether by ice or by CO2, are long known to affect fly's subsequent behaviors (for aggression, see Trannoy S. et al., Learn. Mem. 2015. 22: 64-68). It will be prudent to acknowledge the possibility that this handling method could have contributed to unusually high levels of spontaneous tussling, which has not been reported elsewhere before.

      Thank you for this comment. The increased tussling behavior observed in our study is unlikely due to cold anesthesia, as noted by Trannoy S. et al. (2015), cold anesthesia profoundly reduces locomotion and general aggressiveness in flies. We acknowledge that the use of cold anesthesia in behavioral experiments may have potential effects on aggression. To minimize this influence, we allowed the flies to recover and adapt for at least 30 minutes before behavioral recording. Moreover, both control and experimental groups were treated in exactly the same manner to ensure consistency.

      It is intriguing that pC1SS2 neurons are dsx+ but fru-. Authors convincingly demonstrated that these neurons are clearly distinct from the P1a neurons, a well-characterized hub for male social behaviors. It is possible that pC1SS2 neurons overlap with previously characterized dsx+ neurons that are important for male aggressions (measured by lunges), such as in Koganezawa et al., Curr. Biol. 2016 and Chiu et al., Cell 2020, a point authors could have explicitly raised.

      Thank you for this comment. We have added this point into the Discussion section of the revised manuscript, as follows: “That tussling-promoting… aggression (Koganezawa et al., 2016). Moreover, the anatomical features of pC1<sup>SS2</sup> neurons are highly similar to the male-specific aggression-promoting (MAP) neurons identified by another previous study (Chiu et al., 2021).

      I acknowledge the authors' courage to initiate an investigation to a less characterized, high intensity fighting behavior. Tussling requires the simultaneous engagement of two flies. Even if there are confusion over the distinction between lunges and tussling, authors' conclusion that socially experienced flies and socially isolated flies employ distinct fighting strategy is convincing. The concern I raised above is about the interpretation of the data, not about the quality of data.

      Thank you for your constructive comments to make this manuscript better.

    1. eLife Assessment

      This fundamental work provides novel insights into the blood flow-dependent mechanisms of neuronal migration and the role of Gherlin signaling in the adult brain. The authors present convincing evidence that newborn rostral migratory stream (RMS) neurons are closely situated alongside blood vessels, preferentially along arterioles, and that migratory speed is correlated with blood flow. They also provide evidence (in vitro and some in vivo) that Ghrelin from blood is involved in augmenting RMS neuron migration speed.

    2. Reviewer #1 (Public review):

      Summary:

      This study provides compelling evidence suggesting that ghrelin, a molecule released in the surrounding of the major adult brain neurogenic niche (V-SVZ) by blood vessels with high blood flow controls the migration of newborn interneurons towards the olfactory bulbs.

      Strengths:

      This study is a tour de force as it provides a solid set of data obtained by time lapse recordings in vivo. The data demonstrate that the migration and guidance of newborn neurons relies on factors released by selective type of blood vessels.

      Weaknesses:

      Some intermediate conclusions are weak and may be reinforced by additional experiments.

      Comments on revisions: The manuscript has improved.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary: 

      This study provides compelling evidence suggesting that ghrelin, a molecule released in the surroundings of the major adult brain neurogenic niche (V-SVZ) by blood vessels with high blood flow, controls the migration of newborn interneurons towards the olfactory bulbs. 

      Strengths:

      This study is a tour de force as it provides a solid set of data obtained by time-lapse recordings in vivo. The data demonstrate that the migration and guidance of newborn neurons rely on factors released by selective types of blood vessels. 

      Weaknesses:

      Some intermediate conclusions are weak and may be reinforced by additional experiments. 

      We thank the reviewer for the thoughtful evaluation and constructive comments outlined in the “Recommendations for The Authors”. In response, we have incorporated additional data, revised relevant figures, and clarified explanations in the revised manuscript.

      Reviewer #2 (Public review)

      Summary: 

      The authors establish a close spatial relationship between RMS neurons and blood vessels. They demonstrated that high blood flow was correlated with migratory speed. In vitro, they demonstrate that Ghrelin functions as a motogen that increases migratory speed through augmentation of actin cup formation. The authors proceed to demonstrate through the knockdown of the Ghrelin receptor that fewer RMS neurons reach the OB.

      They show the opposite is true when the animal is fasted. 

      Strengths: 

      Compelling evidence of close association of RMS neurons with blood vessels (tissue clearing 3D), preferentially arterioles. Good use of 2-photon imaging to demonstrate migratory speed and its correlation with blood flow. In vitro analysis of Ghrelin administration to cultured RMS neurons, actin visualization, Ghsr1KD, is solid and compelling. 

      We sincerely thank the reviewer for the encouraging comments and helpful suggestions. As noted, our original manuscript lacked sufficient in vivo evidence connecting blood flow with ghrelin signaling. To address this, we have added new data and revised the explanations throughout the manuscript as described below.

      Weaknesses: 

      (1) Novelty of findings attenuated due to prior work, especially Li et al., Experimental Neurology 2014. Here, the authors demonstrated that Ghrelin enhances migration in adultborn neurons in the SVZ and RMS. 

      We agree with the reviewer that the idea that ghrelin enhances migration of new neurons is not entirely novel. The study by Li et al. (2014) provided critical insights that guided our investigation into ghrelin as a blood-derived factor promoting neuronal migration. However, our study expands on this by demonstrating that ghrelin directly stimulates migration via GHSR1a in cultured new neurons, and we further identified the cellular and cytoskeletal mechanisms involved. Specifically, we showed that ghrelin enhances somal translocation by activating actin dynamics at the rear of the cell soma. We have revised the Results and Discussion sections accordingly to emphasize these novel aspects as follows:

      “A previous study demonstrated that the migration of V-SVZ-derived new neurons was attenuated in ghrelin knockout mice (Li et al., 2014). In our study, we found that the migration of cultured new neurons was enhanced by the application of ghrelin to the culture medium, and this effect was abolished by Ghsr1a knockdown (KD). These findings suggest that ghrelin directly stimulates neuronal migration through its receptor, GHSR1a, on new neurons. A previous study showed that GHSR1a is expressed in various regions of the brain (Zigman et al., 2006). In our experiments, new neuron-specific KD of Ghsr1a indicated that ghrelin signaling acts in a cell-autonomous manner to regulate neuronal migration.” (Discussion, page 13, lines 10–18)

      “Furthermore, we identified the cellular and cytoskeletal mechanisms underlying this effect on migration. The results indicate that ghrelin enhances somal translocation during migration by activating actin cytoskeletal dynamics at the rear of the neuronal soma.” (Discussion, page 13, lines 24–26)

      (2) The evidence for blood delivery of Ghrelin is not very convincing. Fluorescently-labeled Ghrelin appears to be found throughout the brain parenchyma, irrespective of the distance from vessels. It is also not clear from the data whether there is a link between increased blood flow and Ghrelin delivery. 

      We agree that the correlation between blood flow and ghrelin transcytosis is not very convincing in our study. As the reviewer pointed out, Figure 3A gives the impression that fluorescent-labeled ghrelin is uniformly distributed throughout the brain parenchyma. However, high-magnification images newly added in Figure 3 show that some, but not all, vessels have particularly strong fluorescent signals in the parenchymal area adjacent to the abluminal side of vascular endothelial cells, visualized by CD31 immunostaining (Feng et al., 2004) (Figure 3A′, A′′). To quantify these observations, we defined two regions: Area I (perivascular area), within 10 μm of the abluminal surface of CD31-positive endothelium; and Area II (distant area), located 10–20 μm away (Figure 3E). Of note, Area I corresponds to the perivascular region where new neurons are frequently observed (Figure 1).

      Importantly, we found strong ghrelin signals in vascular endothelial cells of endomucin-negative high-flow vessels (Figure 3C, D). This suggests that transcytosis of blood-derived ghrelin may occur more frequently in high-flow vessels due to increased endocytosis at the endothelium. To test this, we quantified signal gradients in the extra-vessel regions as fold changes (Area I / Area II), as illustrated in Figure 3E. The proportion of vessel segments with >1.5-fold increases was significantly higher in endomucin-negative vessels than in endomucin-positive ones (Figure 3F). Furthermore, vessels with >2-fold increases were observed exclusively in the endomucinnegative group (6.48% ± 1.18%). 

      These data suggest that, in high-flow vessels, blood-derived ghrelin accumulates more in the immediate perivascular region than in areas further away. This supports the possibility that elevated blood flow delivers a larger amount of ghrelin to the vascular endothelium, enhancing its transcytosis into adjacent brain parenchyma. This mechanism may underlie the preferential migration of new neurons along perivascular regions with high blood flow, as shown in Figure 1.  We have incorporated this new data in Figure 3 and corresponding explanations into the Results, Figure legend and Methods

      (3) The in vivo link between Ghsr1KD and migratory speed is not established. Given the strong work to open the study on blood flow and migratory speed and the in vitro evidence that migratory speed is augmented by Ghrelin, the paper would be much stronger with direct measurement of migration speed upon Ghsr1KD. Indeed, blood flow should also be measured in this experiment since it would address concerns in 2. If blood flow and ghrelin delivery are linked, one would expect that Ghsr1KD neurons would not exhibit increased migratory speed when associated with slow or fast blood flow vessels. 

      In Figure 3, we showed that ghrelin transcytosis occurs preferentially in high-flow vessels, suggesting a role for ghrelin in mediating the effects of blood flow on neuronal migration. However, whether this dependence is solely attributable to ghrelin signaling remains unclear. 

      To address this, we tested whether Ghsr1a-KD modifies the impact of reduced blood  flow on neuronal migration by combining Ghsr1a-KD with bilateral common carotid artery stenosis (BCAS), a chronic cerebral hypoperfusion model (Figure S9A). We found that BCAS decreased the percentage of Ghsr1a-KD new neurons reaching the OB, similar to the effect seen in control neurons (Figure S9B, see also Figure 2A–C). This suggests that blood flow influences neuronal migration even under Ghsr1a-KD conditions. 

      Furthermore, we analyzed the distribution of Ghsr1a-KD neurons with respect to vessel flow characteristics. Even under Ghsr1a-KD, a higher proportion of new neurons were located in the area of endomucin-negative (high-flow) vessels compared with endomucin-positive (low-flow) vessels (Figure S9C), indicating that Ghsr1a-KD does not abolish the preferential association of migrating neurons with high flow vessels. These findings suggest that although ghrelin signaling contributes to blood flow-dependent migration, it is not the sole factor. Other blood-derived signals may also mediate this effect. We have included these new data in Figure S9 and updated the corresponding sections in the Results

      Reviewer #1 (Recommendations for the authors) :

      Major 

      Page 6, Line 13. Please provide in the result section some explanation about how photothrombic clot is induced.  

      We added the following explanation to the Results section to clarify the method used to induce photothrombotic clot formation.

      “For clot formation, a restricted area of selected vessels was irradiated by a two-photon laser immediately after intravenous injection of rose bengal.” (Results, Page 7, lines 27–28)

      Page 6, Line 18. The authors use the marmoset as an additional experimental model. Here, V-SVZ-derived newborn neurons migrate in other brain regions as compared to rodents. Please provide a clear rationale for moving from rodents to "common marmosets" as an experiment model. And why use marmosets only for this set of experiments? 

      We clarified the rationale for using common marmosets in addition to mice as follows:

      “Because blood vessel-guided neuronal migration in the adult brain is a conserved phenomenon across species (Kishimoto et al., 2011; Akter et al., 2021; Shvedov et al., 2024), we hypothesized that blood flow may also influence neuronal migration in other brain regions of primates. The neocortex, which supports higher-order brain functions and has undergone evolutionary expansion in primates, was selected as a target region. In common marmosets, but not in mice, V-SVZ-derived new neurons migrate toward the neocortex and ventral striatum (Akter et al., 2021) (Supplemental Movies S4 and S5).” (Results, Page 6, lines 19–25)

      Figure 2B. The experimental setup is possibly problematic as the lentiviral tracing measurement does not take into consideration the rate of neurogenesis or newborn neuron survival. Can authors assess the rate of proliferation and survival in the VSVZ/RMS upon BCAS to decipher whether the reduced number of cells observed in the OB only results from migration changes? (comparable remark stands for Figure 5) 

      To evaluate whether the reduction in the number of new neurons observed in the OB after BCAS (Figure 2B, C) is due solely to impaired migration, we assessed cell proliferation and survival in the V-SVZ and RMS. Specifically, we quantified the density of Ki67+ proliferating cells and cleaved caspase-3+ apoptotic cells in the sham and BCAS groups. BCAS significantly decreased cell proliferation and increased cell death in both the V-SVZ and RMS (Figure S4), suggesting that reduced neurogenesis and/or survival may contribute to the decreased neuronal distribution in the OB. 

      Although we cannot exclude the possibility that changes in cell proliferation or survival contributed to this effect, our photothrombotic clot formation experiments are better suited to directly examine how acute reduction in blood flow affects neuronal migration. These experiments allowed us to measure the migration speed of new neurons shortly after inducing localized blood flow inhibition. We found that clot formation significantly reduced the migration speed of new neurons (Figure 2E, H), indicating that blood flow changes directly impair neuronal migration in the adult brain. 

      We have included these new data in Figure S4 and updated the corresponding text in the Results, Discussion, Figure legend, and Methods as follows:

      Figure 3. About ghrelin signaling. It is unclear whether its transcytosis occurs in endomucin-negative because of the high bloodstream flow. How can this be explained? What happens upon BCAS, is there still a close relation between ghrelin transcytosis, blood flow, and neuron migration? 

      As correctly noted, our initial explanation and data did not provide sufficient evidence that higher blood flow delivers a larger amount of ghrelin into the brain parenchyma. We found that some vessels had particularly strong fluorescent signals in the parenchymal area adjacent to the abluminal surface of vascular endothelial cells, as visualized by CD31 immunostaining (Feng et al., 2004) (Figure 3A′, A′′). On the basis of our observation that strong fluorescent signals were detected in vascular endothelial cells of endomucin-negative (high-flow) vessels (Figure 3C, D), we hypothesized that ghrelin transcytosis may occur more frequently in high-flow vessels due to increased endocytosis at the vessel endothelium. 

      To test this hypothesis, we quantified signal gradients in the extra-vessel regions by calculating fold changes in fluorescent intensity between two zones: Area I (0–10 μm from the abluminal surface of the endothelium) and Area II (10–20 μm away), as illustrated in Figure 3E. Area I corresponds to the perivascular region where new neurons are frequently found (Figure 1). We found that the proportion of vessel segments with >1.5-fold signal increase in Area I relative to Area II was significantly higher in endomucin-negative vessels than endomucin-positive ones (Figure 3F). Furthermore, vessel segments with >2-fold increases were observed exclusively in the endomucin-negative group (6.48% ± 1.18%). These results support the idea that higher blood flow increases the amount of ghrelin that reaches the luminal surface of vascular endothelial cells, thereby increasing the possibility of ghrelin transcytosis into the brain parenchyma.

      We also examined whether blood flow inhibition–induced by BCAS or photothrombotic clot formation–affects the relationship between ghrelin transcytosis, blood flow, and neuronal migration. The above results suggest that blood flow reduction may decrease ghrelin transcytosis, thereby contributing to impaired neuronal migration. To further explore this, we analyzed the distribution of new neurons around high- versus low-flow vessels under BCAS conditions. In the BCAS group, we still observed a higher density of new neurons in the region of high-flow (endomucin-negative) vessels compared with in low-flow (endomucin-positive) ones (Figure S9C). This suggests that even under reduced blood flow, neuronal migration preferentially occurs near high-flow vessels. Taken together, these results suggest that ghrelin transcytosis, blood flow and neuronal migration are connected, and that this relationship persists under conditions of blood flow reduction.

      Figure 4. Is ghrelin controlling both individual Dcx+ neuron migration as well as chain migration (cells moving more together)? This should be assessed and clarified. 

      How is ghrelin controlling actin dynamics in newborn migrating neurons? Since somal translocation speed and somal stride length are both modulated by ghrelin, this factor may also control MT remodeling, could that be checked? 

      We have revised the manuscript to better explain the role of ghrelin in both modes of neuronal migration–chain and individual. Initially, we demonstrated that ghrelin enhances the migration of new neurons in V-SVZ culture (Figure 4A, B), where these neurons migrate outward as chains, indicating that ghrelin facilitates chain migration. In subsequent in vitro experiments (Figure 4C–M), we showed that ghrelin also enhances the migration of individual neurons. To examine this in vivo, we injected Ghsr1a-KD and control lentiviruses into two different anatomical regions: the V-SVZ, where chain migration originates, and the OB core, where new neurons migrate individually. These experiments enabled us to assess the role of ghrelin signaling in each mode of migration independently. We found that ghrelin enhanced both chain migration in the RMS and individual migration in the OB. These results indicate that ghrelin signaling facilitates both forms of neuronal migration. We added the following text in the Results section:

      “To assess the direct effect of ghrelin on neuronal migration, we applied recombinant ghrelin to V-SVZ cultures, in which new neurons emerge and migrate as chains (Figure 4A). Ghrelin significantly increased the migration distance of these neurons (Figure 4B), indicating enhanced chain migration. We then used super-resolution time-lapse imaging to examine individually migrating neurons with or without knockdown (KD) of growth hormone secretagogue receptor 1a (GHSR1a), a ghrelin receptor expressed in V-SVZ-derived new neurons (Li et al., 2014) (Figure 4C). Ghrelin enhanced the migration speed of control cells (lacZ-KD) cells, indicating that it also facilitates individual migration (Figure 4D).” (Results, Page 9, lines 5–12)

      “Of the total labeled Dcx+ cells, the percentage of Dcx+ cells reaching the GL was significantly lower in the Ghsr1a-KD group than in the control group (Figure 5B, C), suggesting that ghrelin enhances individual radial migration of new neurons in the OB.” (Results, Page 10, lines 5–8) “These data indicate that ghrelin signaling facilitates both individual migration in the OB and chain migration in the RMS.” (Results, Page 10, lines 17–18)

      We also added discussion on how ghrelin may regulate cytoskeletal dynamics in migrating neurons. Ghrelin signaling has been reported to control actin cytoskeletal remodeling in astrocytoma cells (Dixit et al., 2006), which led us to investigate similar effects in migrating neurons. Rac, a member of the Rho GTPase family, was shown to mediate this actin remodeling in astrocytoma migration, suggesting it may also be involved in ghrelin-induced actin cup formation in new neurons. Furthermore, because somal translocation depends not only on actin but also on microtubule dynamics (Kaneko et al., 2017), it is possible that ghrelin influences both systems. Supporting this idea, ghrelin signaling was shown to modulate microtubule behavior via SFK-dependent phosphorylation of α-tubulin (Slomiany and Slomiany, 2017). These findings suggest that ghrelin may enhance somal translocation through coordinated regulation of both the actin and microtubule systems. We added following text in the Results and Discussion sections:

      “Ghrelin signaling has been reported to regulate actin cytoskeletal dynamics in astrocytoma cells (Dixit et al., 2006), which led us to examine whether a similar mechanism operates in migrating neurons.”(Results, Page 9, lines 23–25)

      “Further studies are needed to elucidate how ghrelin promotes actin cup formation in migrating neurons. Given that Rac, a Rho family GTPase, mediates actin remodeling downstream of ghrelin in astrocytoma cells (Dixit et al., 2006), it is possible that Rac may also be involved in ghrelininduced cytoskeletal regulation in new neurons.” (Discussion, Page 13, lines 28–31)

      “In addition to actin remodeling, ghrelin may regulate microtubule dynamics. Ghrelin signaling was shown to modulate microtubules via SFK-dependent phosphorylation of α-tubulin (Slomiany and Slomiany, 2017), raising the possibility that ghrelin promotes somal translocation of new neurons through coordinated regulation of both actin and microtubule networks (Kaneko et al., 2017).” (Discussion, Page 13, line 31–Page 14, line 2)

      It would also be informative to provide immunolabeling of Ghsr1 in the V-SVZ / RMS/ OB to have a clear picture of the expression pattern of this receptor. Newborn neurons migrate along blood vessels, which are surrounded by astrocytes that have also been reported to express Ghsr1, thus could newborn neuron migration change may also arise from activation of Ghsr1 in their surrounding astrocytes? 

      A previous study reported that GHSR1a is expressed in DCX+ new neurons in the RMS and OB, and in V-SVZ neural progenitor cells (Li et al., 2014). To visualize the spatial expression pattern of Ghsr1a, we performed RNAscope in situ hybridization because specific anti-GHSR1a antibodies suitable for immunohistochemistry were not available. Consistent with the previous report, we detected Ghsr1a mRNA in DCX+ new neurons in the VSVZ, RMS, and OB (Figure S5A), indicating that new neurons directly receive ghrelin signaling. 

      Moreover, our KD experiments demonstrated that ghrelin enhanced the migration of new neurons in a cell-autonomous manner via GHSR1a (Figure 4, 5). Nevertheless, a recent study (Stark et al., 2024) showed that GHSR1a was expressed in various cell types, including glutamatergic and GABAergic neurons, suggesting that ghrelin may also exert non-cellautonomous effects on neuronal migration. Given the presence of diverse cell types, including neurons, microglia, pericytes, and astrocytes, along the migratory route, it remains possible that GHSR1a activation in these neighboring cells contributes to the overall regulation of neuronal migration. 

      Figure 5. About the in vivo knockdown of Ghsr1a. The results section (page 9, line 3) mentioned that mice were either injected with one or the other construct but Figure 5 shows coincidence of GFP and dsRed positive cells. Were control and Ghsr1a shRNAs injected together into the same mouse? Could you quantify the number of cells in green (control), red (Ghsr1a KD), and yellow (both)? Won't they mostly be yellow? Have you tried injecting control and Ghsr1a separately? If yes, do you get the same result? Such analysis would be important to separate cell autonomous from noncell autonomous effects. 

      To minimize variability in injection conditions, we initially coinjected control and Ghsr1a-KD lentiviruses into the same mice and analyzed their migration using a paired design. As the reviewer correctly noted, some cells were coinfected and expressed both EmGFP and DsRed (18.7% ± 2.86% of EmGFP+ cells and 10.8% ± 0.533% of DsRed+ cells). To ensure that this overlap did not affect our analysis, we excluded EmGFP+/DsRed+ double-positive cells and focused solely on EmGFP+/DsRed− (control) and EmGFP−/DsRed+ (Ghsr1a-KD) single-positive cells. 

      We agree with the reviewer that coinjection could lead to reciprocal interactions between control and Ghsr1a-KD cells, potentially masking cell-autonomous effects. To address this, we performed an independent experiment in which control and Ghsr1a-KD lentiviruses were injected separately into different mice (Figure S7A), as suggested. Consistent with the results of the coinjection experiment, we found that the Ghsr1a-KD cells showed significantly reduced distribution in the GL compared with that in control cells (Figure S7B). Although we cannot exclude the possibility of a non-cell-autonomous effect of ghrelin, this result supports the conclusion that ghrelin signaling enhances neuronal migration in a cell-autonomous manner. 

      Who is expressing Ghsr1a, newborn neurons, and or their progenitors? The production and survival of newborn V-ZVS cells should be assessed upon knockdown of the ghrelin receptor too. 

      To determine whether the altered distribution of new neurons observed upon Ghsr1aKD is due to impaired migration rather than decreased cell production or survival, we examined the effects of Ghsr1a-KD on the proliferation and survival of new neurons and their progenitors, which express GHSR1a (Li et al., 2014). 

      We compared the proportion of cleaved caspase-3+ cells and Ki67+ cells from the total labeled cells in the V-SVZ and RMS between the control and Ghsr1a-KD groups. There was no significant difference in the proportion of cleaved caspase-3+ cells between the groups (Control: 874 cells from 5 mice; Ghsr1a-KD: 678 cells from 7 mice), suggesting that ghrelin signaling does not affect the survival of new neurons and their progenitors. 

      Similarly, the proportion of Ki67+ cells in the RMS did not differ significantly between the two groups (Figure S8), indicating that Ghsr1a-KD does not impair cell proliferation in the RMS. However, it remains technically difficult to evaluate whether Ghsr1a-KD affects proliferation in the VSVZ, because lentivirus injection into the VSVZ may interfere with GHSR1a expression not only in new neurons and neural progenitors, but also in other cell types known to express GHSR1a (Zigman et al., 2006). A previous study reported that ghrelin signaling promoted cell proliferation in the V-SVZ (Li et al., 2014), thus we cannot exclude the possibility that Ghsr1a-KD may affect V-SVZ proliferation.

      To overcome this limitation, we assessed the effects of Ghsr1a-KD on neuronal migration using in vitro KD experiments (Figure 4C–J) and in vivo OB-core lentivirus injections (Figure 5A–C), both of which did not interfere with proliferation in the V-SVZ. These complementary approaches consistently demonstrated that Ghsr1a-KD reduces the migration speed of new neurons. 

      “To determine whether the altered distribution of new neurons after Ghsr1a-KD is due to impaired migration rather than changes in cell production or survival, we assessed the effects of Ghsr1aKD on the proliferation and survival of new neurons and their progenitors, which express GHSR1a (Li et al., 2014). We quantified the proportion of cleaved caspase-3+ cells and Ki67+ cells from the total labeled cells in the V-SVZ and RMS in both control and Ghsr1a-KD groups. We found no significant difference in cleaved caspase-3+ cell proportions between the groups (Control: 874 cells from 5 mice; Ghsr1a-KD: 678 cells from 7 mice), suggesting that ghrelin signaling does not influence the survival of new neurons and their progenitors. Similarly, the percentage of Ki67+ cells in the RMS was similar between the two groups (Figure S8), indicating that Ghsr1a-KD does not impair cell proliferation in the RMS. However, technical limitations prevented a reliable evaluation of proliferation in the V-SVZ, as lentivirus injection into this region may interfere with GHSR1a expression in not only neural progenitors and new neurons, but also other GHSR1aexpressing cell types (Zigman et al., 2006). Although ghrelin signaling has been reported to promote cell proliferation in the V-SVZ (Li et al., 2014), our complementary in vitro KD experiments (Figure 4C–J) and in vivo OB-core lentivirus injections (Figure 5A–C), which did not affect the V-SVZ, consistently demonstrated that Ghsr1a-KD reduces neuronal migration. Taken together, our results suggest that blood-derived ghrelin enhances neuronal migration in the RMS and OB by stimulating actin cytoskeleton contraction in the cell soma, rather than by altering cell proliferation or survival.” (Results, Page 10, line 19–Page 11, line 4)

      “rat anti-Ki67 (1:500, #14-5698-82, eBioscience); and rabbit anti-cleaved caspase-3 (1:200, #9661, Cell Signaling Technology)” (Methods, Page 48, lines 14–16)

      How much is ghrelin/Ghsr1 signaling conserved in marmosets? 

      How ghrelin signaling is conserved between mice and common marmosets is important to clarify. A previous study reported the existence of a ghrelin homolog in common marmoset, which shares high sequence similarity with that in mice (Takemi et al., 2016). Moreover, the GHSR1a homolog in the common marmoset (https://www.ncbi.nlm.nih.gov/protein/380748978) shares 95.36% amino acid identity with its mouse counterpart. These findings suggest that blood-derived ghrelin may similarly promote neuronal migration in the marmoset brain, as observed in mice. 

      We have added the following text in the Discussion section:

      “Our data showed that new neurons preferentially migrate along arteriole-side vessels rather than venule-side vessels in both mouse and common marmoset brains, suggesting that the mechanism of blood flow-dependent neuronal migration is conserved across rodent and primate species, as well as across brain regions. A previous study identified a ghrelin homolog in the common marmoset with high sequence similarity to the murine version (Takemi et al., 2016). In addition, the marmoset GHSR1a homolog shares 95.36% amino acid identity with that of the mouse (https://www.ncbi.nlm.nih.gov/protein/380748978). These findings suggest that bloodderived ghrelin promotes neuronal migration in the common marmoset brain in a manner similar to that in mice.” (Discussion, Page 15, lines 8–16)

      Page 9. Starvation has been shown to boost ghrelin blood levels. What is the exact protocol used in this experiment and is this indeed increasing Ghrelin release from blood vessels in the V-SVZ? What about Ghsr1 expression level in newborn neurons? 

      We have clarified the calorie restriction (CR) protocol used in our experiments. We adopted a 70% CR protocol, which was previously shown to enhance hippocampal neurogenesis when administered for 14 days (Hornsby et al., 2016). In our study, the daily food intake under ad libitum (AL) conditions was first measured, and CR mice were then fed 70% of that amount for 5 consecutive days (see Figure 5I and Figure S10A). 

      To assess whether CR enhances ghrelin transcytosis into the brain parenchyma, we performed ELISA to quantify ghrelin levels in the OB and RMS. However, ghrelin concentrations were below the detection limit in both groups, precluding a direct comparison.

      We also considered whether CR modulates the expression level of the ghrelin receptor GHSR1a. A recent study reported that fasting increased GHSR1a expression in the OB (Stark et al., 2024), raising the possibility that CR may exert a similar effect. To test this, we performed in situ hybridization and quantified Ghsr1a mRNA puncta in Dcx+ cells in the OB. No significant difference was found between the AL and CR groups (Figure S5B), suggesting that CR does not alter GHSR1a expression levels in new neurons. 

      Although we cannot exclude the possibility that CR increases GHSR1a expression in other OB cell types, our combined CR and Ghsr1a-KD experiments strongly support a cellautonomous contribution of ghrelin signaling to the enhanced neuronal migration observed under CR conditions. Corresponding data and text have been added to Figure S5 and the Results, Discussion, and the Figure legend sections as follows:

      Minor 

      Page 4 

      Line 19 In Supplemental movies 1 and 2, it is unclear where to see the GFP+ new neurons interact with BV. Can you add arrows as an indication for the readers? It will be better to add the anatomy term for orientation, caudal, or rostral in the video. (The same for Supplemental movies 3, 4, and 5).  

      To clarify the regions of interest in Supplemental Movies 1 and 2, where neuron–vessel interactions in the RMS are highlighted, we added dotted lines indicating the RMS boundaries. In addition, we created a new movie (Supplemental Movie S1′) showing a high-magnification view of Supplemental Movie S1, in which arrows mark EGFP+ new neurons interacting with blood vessels. We also added orientation indicators (e.g., caudal and rostral) and arrows to highlight new neuron–vessel interactions in Supplemental Movies S1–S5. 

      The following descriptions have been added to the Figure legends:

      “Supplemental Movie S1′ 

      High-magnification view extracted from Supplemental Movie S1. Arrows indicate EGFP+ cells interacting with blood vessels.” (Figure legend, Page 46, lines 6–8)

      “Arrows indicate EGFP+ cells interacting with blood vessels.” (Figure legend, Supplemental Movie S3, Page 46, lines 16–17)

      “Arrows indicate Dcx+ cells interacting with blood vessels.” (Figure legend, Supplemental Movies S4 and S5, Page 46, lines 21–22, 26–27)

      Blood vessels are labeled in the Supplemental movies 2 and 3 by employing Flt1DsRed transgenic mice instead of RITC-Dex-GMA. However, Flt1-DsRed transgenic mice are not mentioned in the results section. 

      We have now included an explanation regarding the use of Flt1-DsRed mice, in which vascular endothelial cells were labeled with DsRed.

      “To visualize blood vessels, we also used Flt1-DsRed transgenic mice, in which vascular endothelial cells were specifically labeled with DsRed (Matsumoto et al., 2012). Using DcxEGFP/Flt1-DsRed double transgenic mice, we observed close spatial relationships between new neurons and blood vessels (Supplemental Movies S2 and S3).” (Results, Page 4, lines 22– 26)

      Figure 5. Can you indicate (in the figure legend and the result section) the stage of the adult brain used for this experiment? 

      We used 6- to 12-week-old adult male mice in all experiments in this study. To specify this, we have added the age of animals to both the Results and the relevant Figure legends as follows:

      “Therefore, we first studied blood vessel-guided neuronal migration in the RMS and OB using three-dimensional imaging in 6- to 12-week-old adult mice, which enabled analysis of the in vivo spatial relationship between new neurons and blood vessels.” (Results, Page 4, lines 14–16)

      “Figure 1 New neurons migrate along blood vessels with abundant flow in the adult brain.” (Figure legend, Page 25, line 4)

      “(B, C) Three-dimensional reconstructed images of a new neuron (green) and blood vessels (red) in the rostral migratory stream (RMS) (B) and glomerular layer (GL) (C) of 6- to 12-weekold adult mice.” (Figure legend, Page 25, lines 6–8)

      “(E) Transmission electron microscopy image of a new neuron (green) in close contact with a blood vessel (red) in the GL of a 6- to 12-week-old adult mouse.” (Figure legend, Page 26, lines 4–5)

      “(F) Time-lapse images of a migrating neuron (indicated by asterisks) in the GL of a 6- to 12week-old Dcx-EGFP mouse.” (Figure legend, Page 26, lines 6–7)

      “Figure 3 Ghrelin is delivered from the bloodstream to the RMS and OB in the adult brain (A) Representative images of the OB and cortex of a fluorescent ghrelin-infused mouse (6 to 12 weeks old).” (Figure legend, Page 30, lines 1–3)

      “Lentivirus injection into the OB core (A) and the VSVZ (D) was performed in 6- to 12-week-old adult mice.” (Figure legend, Page 33, lines 3–4)

      Reviewer #2 (Recommendations for author):

      Major:

      Ghsr1KD and blood flow 2-photon experiments to directly measure migratory speed. Could also do the same with fasting with or without Ghsr1KD.  

      We thank the reviewer for the valuable suggestion to strengthen our study. As pointed out in the Public Review, we agree that direct in vivo measurement of neuronal migration speed under Ghsr1a-KD conditions is important to clarify the link between ghrelin signaling and blood flow. 

      Two-photon imaging is the most suitable method for this purpose. Although we attempted two-photon imaging of Ghsr1a-KD new neurons, the number of virus-infected cells observed in vivo was too low to yield reliable data. Therefore, we chose an alternative strategy, combining Ghsr1a-KD with blood flow reduction using the BCAS model (Figure S9A), in which migration speed can be quantified based on the percentage of labeled cells reaching the OB. As stated in the Public Review response, BCAS significantly decreased the migration speed of Ghsr1a-KD new neurons (Figure S9B), indicating that Ghsr1a-KD does not abolish the influence of blood flow reduction. These findings suggest that ghrelin signaling is involved, but is not essential, for blood flow-dependent neuronal migration. 

      As suggested by the reviewer, direct observation of migration dynamics (e.g., somal translocation, leading process extension, stationary and migratory phases) is needed, especially in calorie restriction experiments. Although our data indicate that ghrelin signaling is required for fasting-induced increases in migration speed of new neurons, calorie restriction could also change concentrations of other factors in blood (Bonnet et al., 2020; Wu et al., 2024; Alogaiel et al., 2025), which may independently affect behavior of migrating neurons. Given that ghrelin is not the sole factor contributing to blood flow-dependent neuronal migration, other circulating factors could affect behavior of migrating neurons in a different manner during fasting. In vivo twophoton imaging would be a powerful approach to determine whether fasting-induced neuronal migration is caused by upregulated somal translocation speed, which would further support a role for ghrelin in this process.

      We have added the following text in the Discussion:

      “Although our data indicate that ghrelin signaling is essential for fasting-induced acceleration of neuronal migration, calorie restriction may also alter the concentrations of other circulating factors (Bonnet et al., 2020; Wu et al., 2024; Alogaiel et al., 2025), which could independently influence the behavior of migrating neurons.” (Discussion, Page 14, lines 25–29)

      Minor: 

      (1) Show fluorescent Ghreliin in Figure 3 for all brain areas measured in Figure 1 (GL, EPL, GCL, and RMS) for direct comparison.  

      To allow for direct comparison across brain regions, we added a new Supplemental figure showing the distribution of fluorescently labeled ghrelin in the OB, including the GL, EPL, GCL and RMS. This comprehensive view highlights ghrelin localization relative to vasculature and migrating neurons in the regions analyzed in Figure 1.

      (1) Figure 1, panel I is presented in a confusing manner. High blood flow points to 0 degrees, low blood flow to 180 degrees. It implies (unintentionally, I am sure) that low blood flow results in migration away from OB. Maybe plot separately?

      We agree that the original presentation of Figure 1I could be misinterpreted as referring to anatomical orientation (i.e., toward or away from the OB). To avoid confusion, we revised the figure to categorize new neuron–vessel interactions into four groups according to (1) the angle between the migration direction and vessel axis (small or large), and (2) whether the new neuron is migrating toward or away from the direction of higher blood flow. This new presentation avoids implying a fixed anatomical direction and better reflects the relationship between local blood flow and neuronal migration behavior. The revised figure is presented as Supplemental Figure S1.

    1. eLife Assessment

      This important work begins to understand how BDNF regulates the phosphorylation and activity of LRRK2. The overall strength of evidence has been assessed as compelling, though some claims are only partially supported. The work will be of interest for those that might pursue specific LRRK2 interactions and mutational effects on these pathways as the work continues to develop.

    2. Reviewer #1 (Public review):

      Summary:

      LRRK2 protein is familially linked to Parkinson's disease by the presence of several gene variants that all confer a gain-of-function effect on LRRK2 kinase activity.

      The authors examine the effects of BDNF stimulation in immortalized neuron-like cells, cultured mouse primary neurons, hIPSC-derived neurons, and brain tissue from genetically modified mice. They examine a LRRK2 regulatory phosphorylation residue, LRRK2 binding relationships, and measures of synaptic structure and function.

      Strengths:

      The study addresses an important research question: how does a PD-linked protein interact with other proteins, and contribute to responses to a well-characterized neuronal signalling pathway involved in the regulation of synaptic function and cell health.

      They employ a range of good models and techniques to fairly convincingly demonstrate that BDNF stimulation alters LRRK2 phosphorylation and binding to many proteins. IN this revised manuscript, aspects are well validated e.g., drebrin binding, but there is a disconnect between these findings and alterations to LRRK2 substrates. A convincing phosphoproteomic analysis of PD mutant Knock-in mouse brain is included. Overall the links between LRRK2, LRRK2 activity, and the changes to synaptic molecules, structures, and activity are intriguing.

      Weaknesses:

      The data sets remain disjointed, conclusions are sweeping, and not always in line with what the data is showing. Validation of 'omics' data is light. Some inconsistencies with the major conclusions are ignored. Several of the assays employed (western blotting especially) are underpowered, findings key to their interpretation are addressed in only one or other of the several models employed, and supporting observations are lacking.

      Main Conclusions of Abstract:

      (1) Increase in pLRRK2 Ser935 and pRAB after BDNF in SH-SY5Y & mouse neurons

      Well supported, but only for pLRRK2 in neurons, why not pERK pAkt & pRab?

      (2) Omics Proteome remodelling of LRRK2 interactome with BDNF & different in G2019S mouse neurons.

      Supports that the phosphoproteome of G2019S is different. Drebrin interaction with LRRK2 very well supported. Link between drebrin and LRRK2 activity somewhat supported (pS935 site), but the consequence (non-specific pRab8) not supported, as there is no evidence of a change in LRRK2 substrate(s).

      (3) Golgi 1 month LKO mouse altered dendritic spines, transient at 1m not older.

      Supported but very small transient change in spines, disconnected to other results (e.g., drebrin).

      (4) iPSC-derived neurons BDNF increases mEPSC frequency (transient at 70 not 50 or 90 days) in WT not KO "which appear to bypass this regulation through developmental compensation"

      Weak, not clear what is being bypassed.

      Main Conclusions Based on Old and New Figure / Data:

      (1) Increase in pLRRK2 Ser935 and pRAB after BDNF in SH-SY5Y & mouse neurons

      Well supported, but only for pLRRK2 in neurons, why not ERK Akt & Rab?

      (2) BDNF promotes LRRK2 interaction with "post-synaptic actin cytoskeleton components"

      Tone down, only one postsynaptic validated - drebrin strong BUT CONTRADICTORY; link between drebrin and LRRK2 activity (pS935 site) supported, consequence (non-specific pRab8) broken, no evidence of change in LRRK2 substrate.

      (3) LRRK2 G2019S striatal phosphoproteome is different from WT.

      It is different. Where is link to BDNF or Drebrin?

      (4) BDNF signaling is impaired in Lrrk2 knockout neurons

      TrkB changes seem higher in SHSY5Y. pAKT impaired, pERK not convincing. Primary neurons Akt slower but it and Erk mostly intact. MLi-2 did not block pAkt or pErk in WT or KO (higher in latter). Whatever is happening in KO, Mli-2 not really blocking effect in WT. If we are to assume that studying the KO was a means to understand LRRK2 function, the authors data should explain why we care if an effect is absent in LKO, if LRRK2 isn't doing the same job in WT?

      BDNF increases synaptic puncta in WT not LKO (which start higher?). Is this BDNF increase blocked by LRRK2 inhibition?

      (5) Postsynaptic structural changes in Lrrk2 knockout neurons

      Golgi impregnation shows some very small spine changes at 1m. Not sustained over age. mRNA changes are very small (10% not even a fold... very weak and should be written as so). Derbrin levels reduced clearly at 1m, but probably also at 4 & 18. Underpowered, disconnected time course from the spine changes.

      (6) An effect on "spontaneous electrical activity" at Div70

      Weak. What is so special at 70 days that means we should be confident in the differences, or be satisfied that the other time points are legitimately ignored? These are 10-11 cells from 3 cultures assayed at 3 time points but only one is presented (rest in supplement). This should be a 2 (time) or 3 way (+culture RM) ANOVA. As it stands, in WT there is a little - no activity at 50 days, little to no at 70 days, and variable to lots or none at 90. BDNF did nothing at 50 or 90 but may have at 70. In KO low activity stable at 50 & 70, tanks at 90. BDNF would seem to have a similar effect on KO at 90 as WT at 70, but as there are only 7 cells it remains inconclusive. Thus the conclusion that BDNF signalling is broken in LKO is not well supported by the ephys data, nor is the BDNF effect in WT cells (even at the 70 day time point) shown to be susceptible to LRRK2 inhibition.

    3. Reviewer #2 (Public review):

      The data show that BDNF regulates the PD-associated kinase LRRK2, they place LRRK2 within well-described BDNF pathways biochemically, and they show that LRRK2 can play a role mediating BDNF-driven synaptic outcomes at excitatory synapses. The chief strength is that the data provide a potential focal point for multiple observations that have been made across many labs. The findings will be of broad interest because LRRK2 has emerged as a protein that is likely to be part of Parkinson's pathology and its normal and pathological actions remain poorly understood.

      A major strength of the study is the multiple approaches that were used (biochemistry, bioinformatics, light and electron microscopy and electrophysiology) across different experimental models (cells, primary neurons, human neurons, mice) to identify and examine the impact of BDNF on LRRK2 signaling and functions. Noteworthy is also the employment of LRRK2KO preparations to validate outcomes and to place LRRK2 actions up or downstream.

      The demonstration that LRRK2 and drebrin interact directly is important and suggests that other interacting proteins identified biochemically and bioinformatically in the paper will be important to pursue.

      Some data from different models do not fit well with one another (like mouse and human neurons). This is likely due to inherent differences in the preparations. Since different experiments were carried out on the different preps, however, it is not possible to cross compare. The lack of this information is viewed more as an open question than a cause for concern.

    1. eLife Assessment

      This manuscript presents a valuable and insightful contribution to the understanding of how Legionella pneumophila remodels its vacuolar niche through coordinated ubiquitination mechanisms. The identification of Rab5 as a target of both canonical and phosphoribosyl ubiquitination, and the demonstration of a detergent-resistant ubiquitin "cloud" surrounding the LCV, represent significant advances in the field. The findings are supported by rigorous experimental design, robust quantitative analyses, and clear mechanistic insight, meeting a standard of evidence that is compelling and exceeds current state-of-the-art approaches.

    2. Reviewer #1 (Public review):

      Summary:

      In the submitted manuscript, Steinbach et al describe the formation of a detergent-resistant "cloud" around the Legionella-containing vacuole (LCV) that functions as a protective barrier. The authors show that formation of the "cloud" barrier is contingent upon the phosphoribosyl-ubiquitination activity of the SidE/SdeABC effector family, and is temporally regulated, with the assembly and subsequent disassembly of the "cloud" coinciding with replication and vacuolar expansion. The authors postulate a model of "cloud" barrier formation that relies upon a wave of initial ubiquitination by the SidC effector family, after which the SidE/SdeABC family expands the ubiquitination and forms cross-links that render the ubiquitin cloud resistant to harsh detergents. Additionally, Steinbach et al. also demonstrate that Rab5 is recruited to the LCV and remains associated for a considerable period.

      Strengths:

      This manuscript is very well written, with clear justification provided for experiments that make it very easy to follow along with the experimental logic. The figures have clearly been designed with much thought and are easy to interpret. Steinbach et al have also done a commendable job of addressing the previous reviewers' comments, even though some may suggest that some of these comments could be viewed as slightly unreasonable. This work would be of interest to both the Legionella and ubiquitin fields. Legionella researchers would potentially be interested to explore the proposed barrier model as the function for the ubiquitin "cloud," whereas ubiquitin researchers may be interested in exploring the mechanisms underlying SidE's crosslinking ability.

      Weaknesses:

      While the work is important and describes the physical nature of the ubiquitin cloud on the Legionella vacuole, it is somewhat descriptive in nature and does not dig deeply into what purpose this cloud serves. This is a complicated topic that will certainly stimulate additional research in this area.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript "Canonical and phosphoribosyl ubiquitination coordinate to stabilize a proteinaceous structure surrounding the Legionella-containing vacuole" by Steinbach et al. is well written and presents strong evidence that satisfactorily supports the main hypothesis and research objectives. The authors have clearly demonstrated the presence of cloud-like, detergent-resistant GTPase Rab5 surrounding the LCV, and formation of the structure is dependent on the SidE family of effectors. The study provides insights into the relevant (associated with described phenotype) ubiquitination pathways. The findings advance our understanding of Legionella pneumophila vacuole remodeling during intracellular infection and open directions for future research to establish broader implications of this structure on Legionella pathogenesis.

      Strengths:

      The manuscript convincingly demonstrates the presence of a cloud-like, detergent-resistant GTPase Rab5 surrounding the LCV through elegant microscopy. The experimental evidence about the dependence of the observed phenotype on the SidE family of effectors is compelling and presented with strong scientific rigor. The introduction is well-written, and the discussion is thorough and satisfactory. The article is thought-provoking and shows preliminary evidence for ubiquitin-mediated protection and spatial organization of the LCV.

      Weaknesses:

      The manuscript is well-organized and detailed, and it is hard to find weaknesses under the set goals of the research. A few weaknesses are that the molecular determinants or the regulatory mechanisms that drive selective versus non-selective incorporation of host proteins into this structure are unclear, and, as the authors mentioned, further work is required to establish the precise biophysical basis of the detergent resistance and expansive morphology of the ubiquitinated GTPase "cloud". Currently, the function or purpose of the structure is completely speculative. The effects or importance of the structure on bacterial replication is also not established in the current study. Figure 2D, right panel, Western blot results, the authors suggested the signal present in all four lanes between 37 and 25 kDa is 'nonspecific', which is probably a 'too intense' signal to be called so. Mass spec analysis would be interesting in order to identify sources of such intense signals. With these few limitations, the research presented in this manuscript is experimentally rigorous and opens avenues for future research.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Mukherjee and colleagues extended earlier studies on the coordination of the SidC and SidE effector families on the generation of a unique ubiquitin layer on the surface of the vacuoles containing the bacterial pathogen Legionella pneumophila (LCV).

      Strengths:

      The main strength of the manuscript is the identification of the small GTPase Rab5 as a major "carrier" of these differently modified ubiquitin and ubiquitin chains, which was nicely quantified.

      Weaknesses:

      (1) The results are mostly descriptive, based on mechanistic studies from earlier works.

      (2) The majority of the work was dedicated to the characterization of the unique ubiquitin layer on the LCV. One important question was ignored: what is the role of Rab5 in this process? Is the GTPase activity of Rab5 required for its ubiquitination by SidC and SidE? The authors should create a Rab5 KO cell line, complement the line with different mutants of Rab5, and examine their ubiquitination and association with the LCV.

      (3) The finding that Rab5 is associated with the LCV supports the notion that the LCV has characteristics of endo- or/late endosomes. The positioning of the LCV in the endocytic pathway should be discussed in the context of earlier studies (e.g.,PMID: 38739652; PMID: 11067875; PMID: 11067875).

    1. eLife Assessment

      Sanchez-Vasquez et al establish an innovative approach to induce aneuploidy in preimplantation embryos. This important study extends the author's previous publications evaluating the consequences of aneuploidy in the mammalian embryo. In this work, the authors investigate the developmental potential of aneuploid embryos and characterize changes in gene expression profiles under normoxic and hypoxic culture conditions. Using a solid methodology they identify sensitivity to Hif1alpha loss in aneuploid embryos, and in further convincing experiments they assess how levels of DNA damage and DNA repair are altered under hypoxic and normoxic conditions.

    2. Reviewer #1 (Public review):

      Summary:

      This paper developed a model of chromosome mosaicism by using a new aneuploidy-inducing drug (AZ3146), and compared this to their previous work where they used reversine, to demonstrate the fate of aneuploid cells during murine preimplantation embryo development. They found that AZ3146 acts similarly to reversine in inducing aneuploidy in embryos, but interestingly showed that the developmental potential of embryos is higher in AZ3146-treated vs. reversine-treated embryos. This difference was associated with changes in HIF1A, p53 gene regulation, DNA damage, and fate of euploid and aneuploid cells when embryos were cultured in a hypoxic environment.

      Strengths:

      In the current study, the authors investigate the fate of aneuploid cells in the preimplantation murine embryo using a specific aneuploidy-inducing compound to generate embryos that were chimeras of euploid and aneuploid cells. The strength of the work is that they investigate the developmental potential and changes in gene expression profiles under normoxic and hypoxic culture conditions. Further, they also assessed how levels of DNA damage and DNA repair are altered in these culture conditions. They also assessed the allocation of aneuploid cells to the divergent cell lineages of the blastocyst stage embryo.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1(Public review):

      We deeply appreciate the reviewer comments on our manuscript. Following up the revisions, our manuscript has been improved thanks to their insightful remarks. We have proceeded with all the required changes.  

      Weaknesses:

      The authors have still not addressed the inconsistent/missing description for sample size, the appropriate number of * for each figure panel, and the statistical tests used.

      Description of sample size, specific P value and statistical test used has been added it both in the main text, figures and figure legends.

      The authors assign 5% oxygen as hypoxia. This is not the case as the in vivo environment is close to this value. 5% is normoxia. Clinical IVF/embryo culture occurs at 5% O2. Please adjust your narrative around this.

      We define in our manuscript “normoxia” as the standard atmospheric oxygen levels in tissue culture incubators, which range from about 20–21% oxygen. Our definition of hypoxia is 5% concentration of oxygen, taking into consideration the standard levels of oxygen in the IVF clinics. Physiological oxygen in mouse varies from ~1.5% to 8% (Alva et al 2022). Considering that these levels of oxygen are the standard levels in tissue culture practices, a paragraph has been added to the discussion and materials and methods for further clarification   

      Reviewer #2 (Public review):

      Weakness:

      Given that this is a study on the induction of aneuploidy, it would be meaningful to assess aneuploidy immediately after induction, and then again before implantation. This is also applicable to the competition experiments on page 7/8. What is shown is the competitiveness of treated cells. Because the publication centers around aneuploidy, inclusion of such data in the main figure at all relevant points would strengthen it. There is some evaluation of karyotypes only in the supplemental - why? Would be good not to rely on a single assay that the authors appear to not give much importance.

      This is an excellent point. However, due to the stochasticity of the arising of aneuploidies when embryos are treated with AZ3146 and reversine (Bolton et al 2016), every treatment is likely to generate different levels of aneuploidy. Due to this, and to the technical limitations of generating single-cell genomic DNA sequencing at the blastocyst stage, we were unable to determine the karyotype of all cells after different conditions. Nevertheless, Regin et al 2024 (eLife) showed similar results on the overall transcriptome changes of different dosages of aneuploidy: high dosage embryos overexpress p53, like reversine-treated embryos; meanwhile, low dosage embryos overexpress the hypoxic pathway, including HIF1A, similar to embryos treated with AZ3146.  

      Reviewer #1 (Recommendations for the authors):

      Corrections required before final publishing:

      Please ensure that the number of asterisks is in alignment with standard convention (* <0.05; ** <0.01; *** <0.001; **** <0.0001). If you want to describe an exact P -vale it should be presented as P = 0.0004. line 108 *** is <0.0004. line 263 * P<0.0044

      Same issue appears in lines 697, 711, 722, 753, 685

      Specific values have been added in the figures and modified in the text. 

      Line 199: "...viable E9.5 embryos" missing "Figure S1D"

      Modified in manuscript

      Line 120: "...decidua" please add "Figure S1C"

      Modified in manuscript

      Line 126-127: Please add a description for the results (morula) in Fig 1D, e.g., It appears that YH2Ax persists from 8-cell to morula when treated with Reversine but not AZ3146"

      At the morula stage, the levels of γH2A.X in reversine- and AZ3146-treated embryos are similar (Fig. 1E). However, at the blastocyst stage, high levels of γH2A.X are maintained in reversine-treated embryos and reduced in AZ3146-treated embryos, suggesting some level of DNA repair between the morula-to-blastocyst stages (Fig. S2A). In contrast, in hypoxia, the levels of γH2A.X are low in the three treatments at the morula stages, suggesting that DNA repair can be enhanced under hypoxic conditions. Similar results have been reported in somatic cells (Marti et al., 2021; Pietrzak et al., 2018).

      Line 213: PARP1 levels were also similar under all conditions; but Fig3E, top right shows PARP1 was significantly lower with Reversine treatment; also please correct me if i am wrong, but does the phrase "all conditions" cross reference yH2AX and PARP1 between Fig 3 and Fig 1 to show the impact of hypoxia? Because from my understanding Fig 1 was done in 20% oxygen, but Fig 3 was done in 5% oxygen – hypoxia.

      This is correct. Modification in the manuscript has been performed for clarification

      Line 264: extra forward dash? "Reversine/AZ3146/ aggregation"

      Modified in manuscript

      Line 644: you don't have a control for IDF treatment, so how did you differentiate between impact of aneuploid drugs vs IDF treatment alone? Would the impact observed be due to compounding effect of aneuploidy drugs + IDF?

      This is a great observation. We previously demonstrated that IDF-1174 treatments in embryos do not affect pre-implantation development (Fig. S3).

      Line 681: change their behaviour is a vague statement. Be specific.

      Modified in manuscript

      Line 676 missing bracket "E)"

      Modified in manuscript

      Line 680: "...significantly on" should be "for"

      Modified in manuscript

      Line 682-685: "...hypoxia favours the survival of reversine-induced aneuploid cells." does it? the statement before this says in Rev/AZ chimeras, AZ blastomeres contribute similarly to reversine-blastomeres to the TE and PE but significantly increase contributions to the EPI.Wouldn't this mean hypoxia favours survival of AZ aneuploid cells in EPI?

      In normoxic conditions, AZ3146 treated cells in Rev/AZ chimeras contributed mostly to the EPI and TE but not PE. In contrast, in normoxic conditions, Rev-treated cells contributed similarly to all the lineages. This result seems to be due to a better survival of Rev-treated cells under normoxic conditions (Fig. 4D-E)

      Line 720: (b) shows blastocyst staining from what group? DMSO? Rev/AZ? Or are the 3 blastocysts shown here, 3 separate examples of Reversine-treated blastocysts? Would require labelling Fig S2B, and adding a short description in the corresponding figure legend

      Figure (B) shows the expression pattern of PARP1 at the blastocyst stage. Modified in manuscript

      Figure 2, Figure S3 and Figure S6: were these experiments performed at 5% or 20% O2, please add detail.

      Modified in manuscript

      Reviewer #2 (Recommendations for the authors):

      Lines 45-46 understanding of reduction of aneuploidy should mention/discuss the paper of attrition/selection, of the kind by the Brivanlou lab for instance, or others. As well as allocation to specific lineages, including the authors' work.

      A section in the discussion has been added in response to this recommendation. Comparison between models is debatable.

      The response does not clarify whether other papers were cited instead, or the authors own work that has shown preferential allocation to TE.

    1. eLife Assessment

      This important study provides a novel approach for delineating subcortical-cortical white matter bundles. The authors provide convincing evidence by harnessing state-of-the-art methods and cross-species data. Together, this effort will be of interest to scientists across multiple subfields and accelerate progress in a biologically critical but methodologically challenging area.

    2. Reviewer #1 (Public review):

      The authors note that it is challenging to perform diffusion MRI tractography consistently in both humans and macaques, particularly when deep subcortical structures are involved. The scientific advance described in this paper is effectively an update to the tracts that the XTRACT software supports. The changes to XTRACT are soundly motivated in theory (based on anatomical tracer studies) and practice (changes in seeding/masking for tractography).

    3. Reviewer #2 (Public review):

      Summary:

      In this article, Assimopoulos et al. expand the FSL-XTRACT software to include new protocols for identifying cortical-subcortical tracts with diffusion MRI, with a focus on tracts connecting to the amygdala and striatum. They show that the amygdalofugal pathway and divisions of the striatal bundle/external capsule can be successfully reconstructed in both macaques and humans while preserving large-scale topographic features previously defined in tract tracing studies. The authors set out to create an automated subcortical tractography protocol, and they accomplish this for a subset of specific subcortical connections.

      Strengths:

      The main strength of the current study is the translation of established anatomical knowledge to a tractography protocol for delineating cortical-subcortical tracts that are difficult to reconstruct. Diffusion MRI-based tractography is highly prone to false positives; thus, constraining tractography outputs by known anatomical priors is important. The authors used existing tracing literature to create anatomical constraints for tracking specific cortical-subcortical connections and refined their protocol through an iterative process and in collaboration with multiple neuroanatomists. Key additional strengths include 1) the creation of a protocol that can be applied to both macaque and human data; 2) demonstration that the protocol can be applied to be high quality data (3 shells, > 250 directions, 1.25 mm isotropic, 55 minutes) and lower quality data (2 shells, 100 directions, 2 mm isotropic, 6.5 minutes); and 3) validation that the anatomy of cortical-subcortical tracts derived from the new method are more similar in monozygotic twins than in siblings and unrelated individuals.

      Overall Appraisal:

      This new method will accelerate research on anatomically validated cortical-subcortical white matter pathways. The work has utility for diffusion MRI researchers across fields.

      Editors' note:

      Both reviewers were satisfied with the responses to their feedback.

    1. eLife Assessment

      This valuable study presents an analysis of evolutionary conservation in intrinsically disordered regions, identified as key drivers of phase separation, leveraging a protein language model. The strength of evidence is convincing, but a clearer justification of the methods and analyses is needed to fully support the main claims.

    2. Reviewer #1 (Public review):

      The manuscript by Zhang et al describes the use of a protein language model (pLM) to analyse disordered regions in proteins, with a focus on those that may be important in biological phase separation. While the paper is relatively easy to read overall, my main comment is that the authors could perhaps make it clearer which observations are new, and which support previous work using related approaches. Further, while the link to phase separation is interesting, it is not completely clear which data supports the statements made, and this could also be made clearer.

      Major comments:

      (1) With respect to putting the work in a better context of what has previously been done before, this is not to say that there is not new information in it, but what the authors do is somewhat closely related to work by others. I think it would be useful to make those links more directly. Some examples:

      (1a) Alderson et al (reference 71) analysed in detail the conservation of IDRs (via pLDDT, which is itself related to conservation) to show, for example, that conserved residues fold upon binding. This analysis is very similar to the analysis used in the current study (using ESM2 as a different measure of conservation). Thus, the approach (pages 7-8) described as "This distinction allows us to classify disordered regions into two types: "flexible disordered" regions, which show high ESM2 scores and greater mutational tolerance, and "conserved disordered" regions, which display low ESM2 scores, indicating varying levels of mutational constraint despite a lack of stable folding." is fundamentally very similar to that used by Alderson et al. Thus, the result that "Given that low ESM2 scores generally reflect mutational constraint in folded proteins, the presence of region a among disordered residues suggests that certain disordered amino acids are evolutionarily conserved and likely functionally significant" is in some ways very similar to the results of that paper.

      (1b) Dasmeh et al (https://doi.org/10.1093/genetics/iyab184), Lu et al (https://doi.org/10.1371/journal.pcbi.1010238) and Ho & Huang (https://doi.org/10.1002/pro.4317) analysed conservation in IDRs, including aromatic residues and their role in phase separation

      (1c) A number of groups have performed proteomewide saturation scans using pLMs, including variants of the ESM family, including Meier (reference 89, but cited about something else) and Cagiada et al (https://doi.org/10.1101/2024.05.21.595203) that analysed variant effects in IDRs using a pLM. Thus, I think statements such as "their applicability to studying the fitness and evolutionary pressures on IDRs has yet to be established" should possibly be qualified.

      (2) On page 4, the authors write, "The conserved residues are primarily located in regions associated with phase separation." These results are presented as a central part of the work, but it is not completely clear what the evidence is.

      (3) It would be useful with an assessment of what controls the authors used to assess whether there are folded domains within their set of IDRs.

    3. Reviewer #2 (Public review):

      This manuscript uses the ESM2 language model to map the evolutionary fitness landscape of intrinsically disordered regions (IDRs). The central idea is that mutational preferences predicted by these models could be useful in understanding eventual IDR-related behavior, such as disruption of otherwise stable phases. While ESM2-type models have been applied to analyze such mutational effects in folded proteins, they have not been used or verified for studying IDRs. Here, the authors use ESM2 to study membraneless organelle formation and the related fitness landscape of IDRs.

      Through this, their key finding in this work is the identification of a subset of amino acids that exhibit mutation resistance. Their findings reveal a strong correlation between ESM2 scores and conservation scores, which if true, could be useful for understanding IDRs in general. Through their ESM2-based calculations, the authors conclude that IDRs crucial for phase separation frequently contain conserved sequence motifs composed of both so-called sticker and spacer residues. The authors note that many such motifs have been experimentally validated as essential for phase separation.

      Unfortunately, I do not believe that the results can be trusted. ESM2 has not been validated for IDRs through experiments. The authors themselves point out its little use in that context. In this study, they do not provide any further rationale for why this situation might have changed. Furthermore, they mention that experimental perturbations of the predicted motifs in in vivo studies may further elucidate their functional importance, but none of that is done here. That some of the motifs have been previously validated does not give any credibility to the use of ESM2 here, given that such systems were probably seen during the training of the model.

      I believe that the authors should revamp their whole study and come up with a rigorous, scientific protocol where they make predictions and test them using ESM2 (or any other scientific framework).

    4. Reviewer #3 (Public review):

      Summary:

      This is a very nice and interesting paper to read about motif conservation in protein sequences and mainly in IDRs regions using the ESM2 language model. The topic of the paper is timely, with strong biological significance. The paper can be of great interest to the scientific community in the field of protein phase transitions and future applications using the ESM models. The ability of ESM2 to identify conserved motifs is crucial for disease prediction, as these regions may serve as potential drug targets. Therefore, I find these findings highly significant, and the authors strongly support them throughout the paper. The work motivates the scientific community towards further motif exploration related to diseases.

      Strengths:

      (1) Revealing conserved regions in IDRs by the ESM-2 language model.

      (2) Identification of functionally significant residues within protein sequences, especially in IDRs.

      (3) Findings supported by useful analyses.

      Weaknesses:

      (1) Lack of examples demonstrating the potential biological functions of these conserved regions

      (2) Very limited discussion of potential future work and of limitations.

    1. eLife Assessment

      This Review Article takes an original angle and covers several aspects of the leukocytes extravasation process with a focus on the role of ECM proteins. It is a timely piece with an original viewpoint. The current manuscript would benefit from improvement in writing and organization.

    2. Reviewer #1 (Public review):

      Summary:

      In this review, the author covered several aspects of the inflammation response, mainly focusing on the mechanisms controlling leukocyte extravasation and inflammation resolution.

      Strengths:

      This review is based on an impressive number of sources, trying to comprehensively present a very broad and complex topic.

      Weaknesses:

      (1) This reviewer feels that, despite the title, this review is quite broad and not centred on the role of the extracellular matrix.

      (2) The review will benefit from a stronger focus on the specific roles of matrix components and dynamics, with more informative subheadings.

      (3) The macrophage phenotype section doesn't seem well integrated with the rest of the review (and is not linked to the ECM).

      (4) Table 1 is difficult to follow. It could be reformatted to facilitate reading and understanding

      (5) Figure 2 appears very complex and broad.

      (6) Spelling and grammar should be thoroughly checked to improve the readability.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript is a timely and comprehensive review of how the extracellular matrix (ECM), particularly the vascular basement membrane, regulates leukocyte extravasation, migration, and downstream immune function. It integrates molecular, mechanical, and spatial aspects of ECM biology in the context of inflammation, drawing from recent advances. The framing of ECM as an active instructor of immune cell fate is a conceptual strength.

      Strengths:

      (1) Comprehensive synthesis of ECM functions across leukocyte extravasation and post-transmigration activity.

      (2) Incorporation of recent high-impact findings alongside classical literature.

      (3) Conceptually novel framing of ECM as an active regulator of immune function.

      (4) Effective integration of molecular, mechanical, and spatial perspectives.

      Weaknesses:

      (1) Insufficient narrative linkage between the vascular phase (Sections 2-6) and the in-tissue phase (Sections 7-10).

      (2) Underrepresentation of lymphocyte biology despite mention in early sections.

      (3) The MIKA macrophage identity framework is only loosely tied to ECM mechanisms.

      (4) Limited discussion of translational implications and therapeutic strategies.

      (5) Overly dense figure insets and underdeveloped links between ECM carryover and downstream immune phenotypes.

      (6) Acronyms and some mechanistic details may limit accessibility for a broader readership.

    4. Reviewer #3 (Public review):

      Summary & Strengths:

      This review by Yu-Tung Li sheds new light on the processes involved in leukocyte extravasation, with a focus on the interaction between leukocytes and the extracellular matrix. In doing so, it presents a fresh perspective on the topic of leukocyte extravasation, which has been extensively covered in numerous excellent reviews. Notably, the role of the extracellular matrix in leukocyte extravasation has received relatively little attention until recently, with a few exceptions, such as a study focusing on the central nervous system (J Inflamm 21, 53 (2024) doi.org/10.1186/s12950-024-00426-6) and another on transmigration hotspots (J Cell Sci (2025) 138 (11): jcs263862 doi.org/10.1242/jcs.263862). This review synthesizes the substantial knowledge accumulated over the past two decades in a novel and compelling manner.

      The author dedicates two sections to discussing the relevant barriers, namely, endothelial cell-cell junctions and the basement membrane. The following three paragraphs address how leukocytes interact with and transmigrate through endothelial junctions, the mechanisms supporting extravasation, and how minimal plasma leakage is achieved during this process. The subsequent question of whether the extravasation process affects leukocyte differentiation and properties is original and thought-provoking, having received limited consideration thus far. The consequences of the interaction between leukocytes and the extracellular matrix, particularly regarding efferocytosis, macrophage polarization, and the outcome of inflammation, are explored in the subsequent three chapters. The review concludes by examining tissue-specific states of macrophage identity.

      Weaknesses:

      Firstly, the first ten sections provide a comprehensive overview of the topic, presenting logical and well-formulated arguments that are easily accessible to a general audience. In stark contrast, the final section (Chapter 11) fails to connect coherently with the preceding review and is nearly incomprehensible without prior knowledge of the author's recent publication in Cell. Mol. Life Sci. CMLS 772 82, 14 (2024). This chapter requires significantly more background information for the general reader, including an introduction to the Macrophage Identity Kinetics Archive (MIKA), which is not even introduced in this review, its basis (meta-analysis of published scRNA-seq data), its significance (identification of major populations), and the reasons behind the revision of the proposed macrophage states and their further development. Secondly, while the attempt to integrate a vast amount of information into fewer figures is commendable, it results in figures that resemble a complex puzzle. The author may consider increasing the number of figures and providing additional, larger "zoom-in" panels, particularly for the topics of clot formation at transmigration hotspots and the interaction between ECM/ECM fragments and integrins. Specifically, the color coding (purple for leukocyte α6-integrins, blue for interacting laminins, also blue for EC α6 integrins, and red for interacting 5-1-1 laminins) is confusing, and the structures are small and difficult to recognize.

    1. eLife Assessment

      This important study explored a number of issues related to citations in the peer review process. An analysis of more than 37000 peer reviews at four journals found that: i) during the first round of review, reviewers were less likely to recommend acceptance if the article under review cited the reviewer's own articles; ii) during the second and subsequent rounds of review, reviewers were more likely to recommend acceptance if the article cited the reviewer's own articles; iii) during all rounds of review, reviewers who asked authors to cite the reviewer's own articles (a practice known as 'coercive citation') were less likely to recommend acceptance. However, when an author agreed to cite work by the reviewer, the reviewer was more likely to recommend acceptance of the revised article. The evidence is convincing, but article would benefit from a clearer presentation of the results and a more nuanced discussion of the motivations of reviewers.

    2. Reviewer #1 (Public Review):

      Summary:

      The work used open peer reviews and followed them through a succession of reviews and author revisions. It assessed whether a reviewer had requested the author include additional citations and references to the reviewers' work. It then assessed whether the author had followed these suggestions and what the probability of acceptance was based on the authors decision.

      Strengths and weaknesses:

      The work's strengths are the in-depth and thorough statistical analysis it contains and the very large dataset it uses. The methods are robust and reported in detail. However, this is also a weakness of the work. Such thorough analysis makes it very hard to read! It's a very interesting paper with some excellent and thought provoking references but it needs to be careful not to overstate the results and improve the readability so it can be disseminated widely. It should also discuss more alternative explanations for the findings and, where possible, dismiss them.

    3. Reviewer #2 (Public Review):

      Summary:

      This article examines reviewer coercion in the form of requesting citations to the reviewer's own work as a possible trade for acceptance and shows that, under certain conditions, this happens.

      Strengths:

      The methods are well done and the results support the conclusions that some reviewers "request" self-citations and may be making acceptance decisions based on whether an author fulfills that request.

      Weaknesses:

      The author needs to be more clear on the fact that, in some instances, requests for self-citations by reviewers is important and valuable.

    4. Reviewer #3 (Public Review):

      Summary:

      In this article, Barnett examines a pressing question regarding citing behavior of authors during the peer review process. In particular, the author studies the interaction between reviewers and authors, focusing on the odds of acceptance, and how this may be affected by whether or not the authors cited the reviewers' prior work, whether the reviewer requested such citations be added, and whether the authors complied/how that affected the reviewer decision-making.

      Strengths:

      The author uses a clever analytical design, examining four journals that use the same open peer review system, in which the identities of the authors and reviewers are both available and linkable to structured data. Categorical information about the approval is also available as structured data. This design allows a large scale investigation of this question.

      Weaknesses:

      My concerns pertain to the interpretability of the data as presented and the overly terse writing style.

      Regarding interpretability, it is often unclear what subset of the data are being used both in the prose and figures. For example, the descriptive statistics show many more Version 1 articles than Version 2+. How are the data subset among the different possible methods?

      Likewise, the methods indicate that a matching procedure was used comparing two reviewers for the same manuscript in order to control for potential confounds. However, the number of reviews is less than double the number of Version 1 articles, making it unclear which data were used in the final analysis. The methods also state that data were stratified by version. This raises a question about which articles/reviews were included in each of the analyses. I suggest spending more space describing how the data are subset and stratified. This should include any conditional subsetting as in the analysis on the 441 reviews where the reviewer was not cited in Version 1 but requested a citation for Version 2. Each of the figures and tables, as well as statistics provided in the text should provide this information, which would make this paper much more accessible to the reader. [Note from editor: Please see "Editorial feedback" for more on this]

      Finally, I would caution against imputing motivations to the reviewers, despite the important findings provided here. This is because the data as presented suggest a more nuanced interpretation is warranted. First, the author observes similar patterns of accept/reject decisions whether the suggested citation is a citation to the reviewer or not (Figs 3 and 4). Second, much of the observed reviewer behavior disappears or has much lower effect sizes depending on whether "Accept with Reservations" is considered an Accept or a Reject. This is acknowledged in the results text, but largely left out of the discussion. The conditional analysis on the 441 reviews mentioned above does support a more cautious version of the conclusion drawn here, especially when considered alongside the specific comments left by reviewers that were mentioned in the results and information in Table S.3. However, I recommend toning the language down to match the strength of the data.

    5. Reviewer #4 (Public Review):

      Summary:

      This work investigates whether a citation to a referee made by a paper is associated with a more positive evaluation by that referee for that paper. It provides evidence supporting this hypothesis. The work also investigates the role of self citations by referees where the referee would ask authors to cite the referee's paper.

      Strengths:

      This is an important problem: referees for scientific papers must provide their impartial opinions rooted in core scientific principles. Any undue influence due to the role of citations breaks this requirement. This work studies the possible presence and extent of this.

      Barring a few issues discussed below, the methods are solid and well done. The work uses a matched pair design which controls for article-level confounding and further investigates robustness to other potential confounds.

      It is surprising that even in these investigated journals where referee names are public, there is prevalence of such citation-related behaviors.

      Weaknesses:

      Some overall claims are questionable:

      "Reviewers who were cited were more likely to approve the article, but only after version 1" It also appears that referees who were cited were less likely to approve the article in version 1. This null or slightly negative effect undermines the broad claim of citations swaying referees. The paper highlights only the positive results while not including the absence (and even reversal) of the effect in version 1 in its narrative.

      "To the best of our knowledge, this is the first analysis to use a matched design when examining reviewer citations" Does not appear to be a valid claim based on the literature reference [18]

      It will be useful to have a control group in the analysis associated to Figure 5 where the control group comprises matched reviews that did not ask for a self citation. This will help demarcate words associated with approval under self citation (as compared to when there is no self citation). The current narrative appears to suggest an association of the use of these words with self citations but without any control.

      More discussion on the recommendations will help: For the suggestion that "the reviewers initially see a version of the article with all references blinded and no reference list" the paper says "this involves more administrative work and demands more from peer reviewers". I am afraid this can also degrade the quality of peer review, given that the research cannot be contextualized properly by referees. Referees may not revert back to all their thoughts and evaluations when references are released afterwards.

    6. Author response:

      There was a common theme across the reviews to provide a more cautious interpretation and to consider the key question of whether peer reviewers who include citations are being purely self-serving or are highlighting important missing context. I will include a suggested new text analysis to cover this and will expand the discussion on this key question. Reviewers highlighted some confusion around the sample sizes for the different analyses, and I will clarify all sample sizes in the next version.

    1. eLife Assessment

      This is a useful study, describing transcriptome-based PPGL subtypes and exploring the mutations, immune correlates, and disease progression of cases in each subtype. The cohort is a reasonable size, and a second cohort is included from TCGA. The identification of driver mutations in PPGL is incomplete, and this compromises characterisation for prognostic purposes. This is a reasonable starting point from which to further elucidate PPGL subtypes.

    2. Reviewer #1 (Public review):

      This study presents an exploration of PPGL tumour bulk transcriptomics and identifies three clusters of samples (labeled as subtypes C1-C3). Each subtype is then investigated for the presence of somatic mutations, metabolism-associated pathways and inflammation correlates, and disease progression.

      The proposed subtype descriptions are presented as an exploratory study. The proposed potential biomarkers from this subtype are suitably caveated, and will require further validation in PPGL cohorts together with a mechanistic study.

      The first section uses WGCNA (a method to identify clusters of samples based on gene expression correlations) to discover three transcriptome-based clusters of PPGL tumours.

      The second section inspects a previously published snRNAseq dataset, and labels some of the published cells as subtypes C1, C2, C3 (Methods could be clarified here), among other cells labelled as immune cell types. Further details about how the previously reported single-nuclei were assigned to the newly described subtypes C1-C3 require clarification.

      The tumour samples are obtained from multiple locations in the body (Figure 1A). It will be important to see further investigation of how the sample origin is distributed among the C1-C3 clusters, and whether there is a sample-origin association with mutational drivers and disease progression.

    3. Reviewer #2 (Public review):

      Summary:

      A study that furthers the molecular definition of PPGL (where prognosis is variable) and provides a wide range of sub-experiments to back up the findings. One of the key premises of the study is that identification of driver mutations in PPGL is incomplete and that compromises characterisation for prognostic purposes. This is a reasonable starting point on which to base some characterisation based on different methods.

      Strengths:

      The cohort is a reasonable size, and a useful validation cohort in the form of TCGA is used. Whilst it would be resource-intensive (though plausible given the rarity of the tumour type) to perform RNAseq on all PPGL samples in clinical practice, some potential proxies are proposed.

      Weaknesses:

      The performance of some of the proxy markers for transcriptional subtype is not presented.

      There is limited prognostic information available.

    4. Author response:

      Reviewer #1 (Public Review):

      This study presents an exploration of PPGL tumour bulk transcriptomics and identifies three clusters of samples (labeled as subtypes C1-C3). Each subtype is then investigated for the presence of somatic mutations, metabolism-associated pathways and inflammation correlates, and disease progression. The proposed subtype descriptions are presented as an exploratory study. The proposed potential biomarkers from this subtype are suitably caveated and will require further validation in PPGL cohorts together with a mechanistic study.

      The first section uses WGCNA (a method to identify clusters of samples based on gene expression correlations) to discover three transcriptome-based clusters of PPGL tumours. The second section inspects a previously published snRNAseq dataset, and labels some of the published cells as subtypes C1, C2, C3 (Methods could be clarified here), among other cells labelled as immune cell types. Further details about how the previously reported single-nuclei were assigned to the newly described subtypes C1-C3 require clarification.

      Thank you for your valuable suggestion. In response to the reviewer’s request for further clarification on “how previously published single-nuclei data were assigned to the newly defined C1-C3 subtypes,” we have provided additional methodological details in the revised manuscript (lines 103-109). Specifically, we aggregated the single-nucleus RNA-seq data to the sample level by summing gene counts across nuclei to generate pseudo-bulk expression profiles. These profiles were then normalized for library size, log-transformed (log1p), and z-scaled across samples. Using genesets scores derived from our earlier WGCNA analysis of PPGLs, we defined transcriptional subtypes within the Magnus cohort (Supplementary Figure. 1C). We further analyzed the single-nucleus data by classifying malignant (chromaffin) nuclei as C1, C2, or C3 based on their subtype scores, while non-malignant nuclei (including immune, stromal, endothelial, and others) were annotated using canonical cell-type markers (Figure. 4A).

      The tumour samples are obtained from multiple locations in the body (Figure 1A). It will be important to see further investigation of how the sample origin is distributed among the C1-C3 clusters, and whether there is a sample-origin association with mutational drivers and disease progression.

      Thank you for your valuable suggestion. In the revised manuscript (lines 74-79), Figure. 1A, Table S1 and Supplementary Figure. 1A, we harmonized anatomic site annotations from our PPGL cohort and the TCGA cohort and analyzed the distribution of tumor origin (adrenal vs extra-adrenal) across subtypes. The site composition is essentially uniform across C1-C3—approximately 75% pheochromocytoma (PC) and 25% paraganglioma (PG)—with only minimal variation. Notably, the proportion of extra-adrenal origin (paraganglioma origin) is slightly higher in the C1 subtype (see Supplementary Figure 1A), which aligns with the biological characteristics of tumors from this anatomical site, which typically exhibit more aggressive behavior.

      Reviewer #2 (Public Review):

      A study that furthers the molecular definition of PPGL (where prognosis is variable) and provides a wide range of sub-experiments to back up the findings. One of the key premises of the study is that identification of driver mutations in PPGL is incomplete and that compromises characterisation for prognostic purposes. This is a reasonable starting point on which to base some characterisation based on different methods. The cohort is a reasonable size, and a useful validation cohort in the form of TCGA is used. Whilst it would be resource-intensive (though plausible given the rarity of the tumour type) to perform RNA-seq on all PPGL samples in clinical practice, some potential proxies are proposed.

      We sincerely thank the reviewer for their positive assessment of our study’s rationale. We fully agree that RNA sequencing for all PPGL samples remains resource-intensive in current clinical practice, and its widespread application still faces feasibility challenges. It is precisely for this reason that, after defining transcriptional subtypes, we further focused on identifying and validating practical molecular markers and exploring their detectability at the protein level.

      In this study, we validated key markers such as ANGPT2, PCSK1N, and GPX3 using immunohistochemistry (IHC), demonstrating their ability to effectively distinguish among molecular subtypes (see Figure. 5). This provides a potential tool for the clinical translation of transcriptional subtyping, similar to the transcription factor-based subtyping in small cell lung cancer where IHC enables low-cost and rapid molecular classification.

      It should be noted that the subtyping performance of these markers has so far been preliminarily validated only in our internal cohort of 87 PPGL samples. We agree with the reviewer that larger-scale, multi-center prospective studies are needed in the future to further establish the reliability and prognostic value of these markers in clinical practice.

      The performance of some of the proxy markers for transcriptional subtype is not presented.

      We agree with your comment regarding the need to further evaluate the performance of proxy markers for transcriptional subtyping. In our study, we have in fact taken this point into full consideration. To translate the transcriptional subtypes into a clinically applicable classification tool, we employed a linear regression model to compare the effect values (β values) of candidate marker genes across subtypes (Supplementary Figure. 1D-F). Genes with the most significant β values and statistical differences were selected as representative markers for each subtype.

      Ultimately, we identified ANGPT2, PCSK1N, and GPX3—each significantly overexpressed in subtypes C1, C2, and C3, respectively, and exhibiting the most pronounced β values—as robust marker genes for these subtypes (Figure. 5A and Supplementary Figure. 1D-F). These results support the utility of these markers in subtype classification and have been thoroughly validated in our analysis. 

      There is limited prognostic information available.

      Thank you for your valuable suggestion. In this exploratory revision, we present the available prognostic signal in Figure. 5C. Given the current event numbers and follow-up time, we intentionally limited inference. We are continuing longitudinal follow-up of the PPGL cohort and will periodically update and report mature time-to-event analyses in subsequent work.

    1. eLife Assessment

      This valuable study presents an analysis of evolutionary conservation in intrinsically disordered regions, identified as key drivers of phase separation, leveraging a protein language model. The strength of evidence presented is convincing overall, though the theoretical grounding could benefit from further development.

    2. Reviewer #1 (Public review):

      The manuscript by Zhang et al describes the use of a protein language model (pLM) to analyse disordered regions in proteins, with a focus on those that may be important in biological phase separation. This is an interesting study that supports, complements and extends previous related analyses on the conservation and mutational tolerance of disordered regions, with a particular focus on disordered regions in proteins that are found in condensates.

    3. Reviewer #2 (Public review):

      This manuscript uses the ESM2 language model to map the evolutionary fitness landscape of intrinsically disordered regions (IDRs). The central idea is that mutational preferences predicted by these models could be useful in understanding eventual IDR-related behavior, such as disruption of otherwise stable phases. While ESM2-type models have been applied to analyze such mutational effects in folded proteins, they have not been used or verified for studying IDRs. Here, the authors use ESM2 to study membraneless organelle formation and the related fitness landscape of IDRs.

      Through this, their key finding in this work is the identification of a subset of amino acids that exhibit mutation resistance. Their findings reveal a strong correlation between ESM2 scores and conservation scores, which if true, could be useful for understanding IDRs in general. Through their ESM2-based calculations, the authors conclude that IDRs crucial for phase separation frequently contain conserved sequence motifs composed of both so-called sticker and spacer residues. The authors note that many such motifs have been experimentally validated as essential for phase separation.

      Comments on revisions:

      Unfortunately my concerns about lack of theoretical grounding and validation (especially critical in lack of theoretical grounding) persist. The argument about correlation between ESM2 scores and MSA conservation is circular. Protein language models already encode residue‑level conservation, so agreement with conservation does not establish new predictive power. For IDRs, conservation is a poor surrogate for function because many functions are mediated by short, degenerate SLiMs that are frequently gained and lost. Sequence‑only predictions therefore need orthogonal (preferably experimental or at the least in silico) tests. Finally, without a family‑level holdout (e.g., cluster de‑duplication at low identity) and prospective tests, overlap with known motifs cannot rule out training‑data memorization/near‑duplicates.

    1. eLife Assessment

      This investigation presents a valuable contribution by elucidating the genetic determinants of growth and fitness across multiple clinical strains of Mycobacterium intracellulare, an understudied non-tuberculous mycobacterium. Employing transposon sequencing (Tn-seq), the authors identify a core set of 131 genes essential for bacterial viability, offering a solid foundation for anti-mycobacterial drug discovery. However, there are minor but nonetheless significant concerns about data organization, which need to be addressed for greater scientific impact.

    2. Reviewer #1 (Public review):

      Summary:

      In this descriptive study, Tateishi et al. report a Tn-seq based analysis of genetic requirements for growth and fitness in 8 clinical strains of Mycobacterium intracellulare Mi), and compare the findings with a type strain ATCC13950. The study finds a core set of 131 genes that are essential in all nine strains, and therefore are reasonably argued as potential drug targets. Multiple other genes required for fitness in clinical isolates have been found to be important for hypoxic growth in the type strain.

      Strengths:

      The study has generated a large volume of Tn-seq datasets of multiple clinical strains of Mi from multiple growth conditions, including from mouse lungs. The dataset can serve as an important resource for future studies on Mi, which despite being clinically significant remains a relatively understudied species of mycobacteria.

      Weaknesses:

      The primary claim of the study that the clinical strains are better adapted for hypoxic growth is yet to be comprehensively investigated. However, this reviewer thinks such an investigation would require a complex experimental design and perhaps forms an independent study.

    3. Reviewer #4 (Public review):

      Summary:

      In this study Tateishi et al. used TnSeq to identify 131 shared essential or growth defect-associated genes in eight clinical MAC-PD isolates and the type strain ATCC13950 of Mycobacterium intracellulare which are proposed as potential drug targets. Genes involved in gluconeogenesis and the type VII secretion system which are required for hypoxic pellicle-type biofilm formation in ATCC13950 also showed increased requirement in clinical strains under standard growth conditions. These findings were further confirmed in a mouse lung infection model.

      Strengths:

      This study has conducted TnSeq experiments in reference and 8 different clinical isolates of M. intracellulare thus producing large number of datasets which itself is a rare accomplishment and will greatly benefit the research community.

      Weaknesses:

      (1) A comparative growth study of pure and mixed cultures of clinical and reference strains under hypoxia will be helpful in supporting the claim that clinical strains adapt better to such conditions. This should be mentioned as future directions in the discussion section along with testing the phenotype of individual knockout strains.<br /> (2) Authors should provide the quantitative value of read counts for classifying a gene as "essential" or "non-essential" or "growth-defect" or "growth-advantage". Merely mentioning "no insertions in all or most of their TA sites" or "unusually low read counts" or "unusually high low read counts" is not clear.<br /> (3) One of the major limitations of this study is the lack of validation of TnSeq results with individual gene knockouts. Authors should mention this in the discussion section.

    4. Reviewer #5 (Public review):

      Summary:

      In the research article, "Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare" Tateshi et al focussed their research on pulmonary disease caused by Mycobacterium avium-intracellulare complex which has recently become a major health concern. The authors were interested in identifying the genetic requirements necessary for growth/survival within host and used hypoxia and biofilm conditions that partly replicate some of the stress conditions experienced by bacteria in vivo. An important finding of this analysis was the observation that genes involved in gluconeogenesis, type VII secretion system and cysteine desulphurase were crucial for the clinical isolates during standard culture while the same were necessary during hypoxia in the ATCC type strain.

      Strength of the study:

      Transposon mutagenesis has been a powerful genetic tool to identify essential genes/pathways necessary for bacteria under various in vitro stress conditions and for in vivo survival. The authors extended the TnSeq methodology not only to the ATCC strain but also to the recently clinical isolates to identify the differences between the two categories of bacterial strains. Using this approach they dissected the similarities and differences in the genetic requirement for bacterial survival between ATCC type strains and clinical isolates. They observed that the clinical strains performed much better in terms of growth during hypoxia than the type strain. These in vitro findings were further extended to mouse infection models and similar outcomes were observed in vivo further emphasising the relevance of hypoxic adaptation crucial for the clinical strains which could be explored as potential drug targets.

      Weakness:

      The authors have performed extensive TnSeq analysis but fail to present the data coherently. The data could have been well presented both in Figures and text. In my view this is one of the major weakness of the study.

    1. eLife Assessment

      This important study provides evidence supporting the idea that postnatal experience plays an instructive role in shaping the patterns of functional connectivity between extrastriate visual cortex and frontal regions during development, by comparing neonates, blind and sighted adults. The evidence supporting the authors' claim is solid. Nevertheless, substantial weaknesses remain in mechanistic interpretation and alignment with relevant developmental frameworks. This study will be of significant interest to neuroscientists and neuroimaging researchers focused on vision, plasticity and development.

    2. Reviewer #1 (Public review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between human extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience plays an instructive role in shaping these cortical networks. This study provides valuable insights into the impact of visual experience on the development of functional connectivity in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the main conclusion regarding postnatal experience-driven shaping of visual-frontal connectivity.

      The inclusion of neonates offers a unique and valuable developmental anchor for interpreting divergence between blind and sighted adults. This is a major advance over prior studies limited to adult comparisons.

      Convergence with prior findings in the blind and sighted adult groups reinforces the reliability and external validity of the present results.

      The split-half reliability analysis in the infant data increases confidence in the robustness of the reported group differences.

      Weaknesses:

      The manuscript risks overstating a mechanistic distinction between sighted and blind development by framing visual experience as "instructive" and blindness as "reorganizing." Similarly, the binary framing of visual experience and blindness as independent may oversimplify shared plasticity mechanisms.

      The interpretation of changes in temporal correlations as altered neural communication does not adequately consider how shifts in shared variance across networks may influence these measures without reflecting true biological reorganization.

      The discussion does not substantively engage with the longstanding debate over whether sensory experience plays an instructive or permissive role in cortical development.

      The relationship between resting-state and task-based findings in blindness remains unclear.

    3. Reviewer #2 (Public review):

      Summary:

      Tian et al. explore the developmental origins of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. Here, Tian et al. explore how this organization arises over development. Is the "starting state" more like the blind pattern, or more like the adult pattern? Their analyses reveal that the answer depends on the particular networks investigated. Some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults.

      Strengths:

      The paper addresses very important questions about the starting state in the developing visual cortex, and how cortical networks are shaped by experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data.

      Weaknesses:

      While potential roles of experience (e.g., visual, cross-modal) are discussed in detail, little consideration is given to the role of experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. It is possible then that the sighted adult pattern may still emerge later in infancy or childhood, regardless of infant visual experience. If so, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). In short, it is not clear that birth, or the first couple weeks of life, are a clear cut "starting point" for development, after which all change can be attributed to experience.

    4. Reviewer #3 (Public review):

      Summary

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of infants lies between that of sighted adults (showing stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (showing stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of infants resembled those of sighted adults more than those of blind adults, but infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths

      The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects including, neonates/infants and blind adults, is highly original.

      Overall, the presented analyses are solid and well detailed, and the results and discussion are convincing.

      Weaknesses

      While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development and when the specialization of functional connections is completed. This would require investigating the evolution of functional connectivity of the visual system as a function of visual experience and thus as a function of age, at least during toddlerhood given the early and intense maturation of the visual system after birth. This could be achieved by analyzing different developmental periods using open databases such as the Baby Connectome Project.

      The rationale for grouping full-term neonates and preterm infants (scanned at term-equivalent age) is not understandable when seeking to perform comparisons with adults. Even if the study results do not show differences between full-terms and preterms in terms of functional connectivity differences between regions and of connectivity patterns, preterms group had different neurodevelopment and post-natal (including visual) experiences (even a few weeks might have an impact). And actually they show reduced connectivity strength systematically for all regions compared with full-terms (Sup Fig 7). Considering a more homogeneous group of neonates would have strengthen the study design.

      The rationale for presenting results on the connectivity of secondary visual cortices before the one of primary cortices (V1) could be clarified.

      The authors acknowledge the methodological difficulties for defining regions of interest (ROIs) in infants in a similar way as adults. Since the brain development is not homogeneous and synchronous across brain regions (in particular with the frontal and parietal lobes showing a delayed growth), this poses major problems for registration. This raises the question of whether the study findings could be biased by differences in ROI positioning across groups.

    5. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience plays an instructive role in shaping these cortical networks. This study provides valuable insights into the impact of visual experience on the development of functional connectivity in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the study's main conclusion regarding experience-driven changes in functional connectivity profiles between visual and frontal regions.

      In general, the findings in sighted adult and congenitally blind groups replicate previous studies and enhance the confidence in the reliability and robustness of the current results.

      Split-half analysis provides a good measure of robustness in the infant data.

      Weaknesses:

      There is some ambiguity in determining which aspects of these networks are shaped by experience.

      This uncertainty is compounded by notable differences in data acquisition and preprocessing methods, which could result in varying signal quality across groups. Variations in signal quality may, in turn, have an impact on the observed correlation patterns.

      The study's findings could benefit from being situated within a broader debate surrounding the instructive versus permissive roles of experience in the development of visual circuits.

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. explore the developmental organs of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. In this paper, Tian et al. ask: how does this organization arise over development? Is the "starting state" more like the blind pattern, or more like the adult pattern? Their analyses reveal that the answer depends on the particular networks investigated; some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults. 

      Strengths:

      The question raised in this paper is extremely important: what is the starting state in development for visual cortical regions, and how is this organization shaped by experience? This paper is among the first to examine this question, particularly by comparing infants not only with sighted adults but also blind adults, which sheds new light on the role of visual (and cross-modal) experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data. 

      Weaknesses:

      A central claim is that "infant secondary visual cortices functionally resemble those of blind more than sighted adults" (abstract, last paragraph of intro). I see two potential issues with this claim. First, a minor change: given the approaches used here, no claims should be made about the "function" of these regions, but rather their "functional correlations". Second (and more importantly), the claim that the secondary visual cortex in general resembles blind more than sighted adults is still not fully supported by the data. In fact, this claim is only true for one aspect of secondary visual area functional correlations (i.e., their connectivity to A1/M1/S1 vs. PFC). In other analyses, the infant secondary visual cortex looks more like sighted adults than blind adults (i.e., in within vs. across hemisphere correlations), or shows a different pattern from both sighted and blind adults (i.e., in occipito-frontal subregion functional connectivity). It is not clear from the manuscript why the comparison to PFC vs. non-visual sensory cortex is more theoretically important than hemispheric changes or within-PFC correlations (in fact, if anything, the within-PFC correlations strike me as the most important for understanding the development and reorganization of these secondary visual regions). It seems then that a more accurate conclusion is that the secondary visual cortex shows a mix of instructive effects of vision and reorganizing effects of blindness, albeit to a different extent than the primary visual cortex.

      Relatedly, group differences in overall secondary visual cortex connectivity are particularly striking as visualized in the connectivity matrices shown in Figure S1. In the results (lines 105-112), it is noted that while the infant FC matrix is strongly correlated with both adult groups, the infant group is nonetheless more strongly correlated with the blind than sighted adults. I am concerned that these results might be at least partially explained by distance (i.e., local spread of the bold signal), since a huge portion of the variance in these FC matrices is driven by stronger correlations between regions within the same system (e.g., secondary-secondary visual cortex, frontal-frontal cortex), which are inherently closer together, relative to those between different systems (e.g., visual to frontal cortex). How do results change if only comparisons between secondary visual regions and non-visual regions are included (i.e., just the pairs of regions within the bold black rectangle on the figure), which limits the analysis to long-rang connections only? Indeed, looking at the off-diagonal comparisons, it seems that in fact there are three altogether different patterns here in the three groups. Even if the correlation between the infant pattern and blind adult pattern survives, it might be more accurate to claim that infants are different from both adult groups, suggesting both instructive effects of vision and reorganizing effects of blindness. It might help to show the correlation between each group and itself (across independent sets of subjects) to better contextualize the relative strength of correlations between the groups. 

      It is not clear that differences between groups should be attributed to visual experience only. For example, despite the title of the paper, the authors note elsewhere that cross-modal experience might also drive changes between groups. Another factor, which I do not see discussed, is possible ongoing experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. Although no effects of age are detected, it is possible that cortex is still undergoing experience-independent maturation at this very early stage of development. For example, consider Figure 2; perhaps V1 connectivity is not established at 2 weeks, but eventually achieves the adult pattern later in infancy or childhood. Further, consider the possibility that this same developmental progression would be found in infants and children born blind. In that case, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). To deal with these issues, the authors should add a discussion of the role of maturation vs. experience and temper claims about the role of visual experience specifically (particularly in the title). 

      The authors measure functional correlations in three very different groups of participants and find three different patterns of functional correlations. Although these three groups differ in critical, theoretically interesting ways (i.e., in age and visual/cross-modal experience), they also differ in many uninteresting ways, including at least the following: sampling rate (TR), scan duration, multi-band acceleration, denoising procedures (CompCor vs. ICA), head motion, ROI registration accuracy, and wakefulness (I assume the infants are asleep).

      Addressing all of these issues is beyond the scope of this paper, but I do feel the authors should acknowledge these confounds and discuss the extent to which they are likely (or not) to explain their results. The authors would strengthen their conclusions with analyses directly comparing data quality between groups (e.g., measures of head motion and split-half reliability would be particularly effective).

      Response #1: We appreciate the reviewer’s comments. In response, we have revised the paper to provide a more balanced summary of the data and clarified in the introduction which signatures the paper focuses on and why. Additionally, we have included several control analyses to account for other plausible explanations for the observed group differences. Specifically, we randomly split the infant dataset into two halves and performed split-half cross-validation. Across all comparisons, the results from the two halves were highly similar, suggesting that the effects are robust (see Supplementary Figures S3 and S4).

      Furthermore, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults) and found no significant differences between them (details in response #6). Finally, we repeated our analysis after excluding infants with a radiology score of 4 or 5, and the results remained consistent, indicating that our findings are not confounded by potential brain anomalies (details in response #2).

      We hope these control analyses help strengthen our conclusions.

      Reviewer #3 (Public Review):

      Summary:

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in sighted infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of sighted infants lies between that of sighted adults (stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of sighted infants resembled those of sighted adults more than those of blind adults, but sighted infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths:

      The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects including, neonates/infants and blind adults, is highly original.

      -Overall, the analyses considered are solid and well-detailed. The results are quite convincing, even if the interpretation might need to be revised downwards, as factors other than visual experience may play a role in the development of functional connections with the visual system.

      Weaknesses:

      While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development and when the specialization of functional connections is completed. This would require investigating when experience-dependent mechanisms are important for the setting- establishment of multiple functional connections within the visual system. This could be achieved by analyzing different developmental periods in the same way, using open databases such as the Baby Connectome Project. Given the early, "condensed" maturation of the visual system after birth, we might expect sighted infants to show connectivity patterns similar to those of adults a few months after birth.

      The rationale for mixing full-term neonates and preterm infants (scanned at term-equivalent age) from the dHCP 3rd release is not understandable since preterms might have a very different development related to prematurity and to post-natal (including visual) experience. Although the authors show that the difference between the connectivity of visual and other sensory regions, and the one of visual and PFC regions, do not depend on age at birth, they do not show that each connectivity pattern is not influenced by prematurity. Simply not considering the preterm infants would have made the analysis much more robust, and the full-term group in itself is already quite large compared with the two adult groups. The current study setting and the analyses performed do not seem to be an adequate and sufficient model to ascertain that "a few weeks of vision after birth is ... insufficient to influence connectivity".

      In a similar way, excluding the few infants with detected brain anomalies (radiological scores higher or equal to 4) would strengthen the group homogeneity by focusing on infants supposed to have a rather typical neurodevelopment. The authors quote all infants as "sighted" but this is not guaranteed as no follow-up is provided.

      Response #2: We appreciate the reviewer’s suggestion. We re-analyzed the infant cohort after excluding all cases with radiological scores ≥4 (n =39 infants excluded). The revised analysis confirmed that the connectivity patterns reported in the main text remain statistically unchanged (see Supplementary Fig. S11). This demonstrates the robustness of our findings to potential confounding effects from potential brain anomalies. We have explicitly clarified this in the revised Methods section (page 14, line 391in the manuscript).

      In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      The post-menstrual age (PMA) at scan of the infants is also not described. The methods indicate that all were scanned at "term-equivalent age" but does this mean that there is some PMA variability between 37 and 41 weeks? Connectivity measures might be influenced by such inter-individual variability in PMA, and this could be evaluated.

      The rationale for presenting results on the connectivity of secondary visual cortices before one of the primary cortices (V1) was not clear to understand. Also, it might be relevant to better justify why only the connectivity of visual regions to non-visual sensory regions (S1-M1, A1) and prefrontal cortex (PFC) was considered in the analyses, and not the ones to other brain regions.

      In relation to the question explored, it might be informative to reposition the study in relation to what others have shown about the developmental chronology of structural and functional long-distance and short-distance connections during pregnancy and the first postnatal months.

      The authors acknowledge the methodological difficulties in defining regions of interest (ROIs) in infants in a similar way as adults. The reliability and the comparability of the ROIs positioning in infants is definitely an issue. Given that brain development is not homogeneous and synchronous across brain regions (in particular with the frontal and parietal lobes showing delayed growth), the newborn brain is not homothetic to the adult brain, which poses major problems for registration. The functional specialization of cortical regions is incomplete at birth. This raises the question of whether the findings of this study would be stable/robust if slightly larger or displaced regions had been considered, to cover with greater certainty the same areas as those considered in adults. And have other cortical parcellation approaches been considered to assess the ROIs robustness (e.g. MCRIB-S for full-terms)?

      Recommendations for the Authors:

      Reviewer #1(Recommendations for the authors):

      Further consideration should be given to the underlying changes in network architecture that may account for differences in functional correlations across groups. An increase (or decrease) in correlation between two regions could signify an increase (decrease) in connection or communication between those regions. Alternatively, it might reflect an increase in communication or connection with a third region, while the physical connections/interactions between the two original regions remain unchanged. These possibilities lead to distinct mechanistic interpretations. For example, there are substantial changes in connectivity during early visual (e.g. Burkhalter A. 1993, Cerebral Cortex) and visuo-motor development (e.g., Csibra et al. 2000 Neuroreport). It's not clear whether increases in communication within the visual network and improvements in visuo-motor behavior (e.g., Yizhar et al. 2023 Frontiers in Neuroscience) wouldn't produce a qualitatively similar pattern of results.

      Relatedly, the within-network correlation patterns between visual ROIs and frontal ROIs appear markedly different between sighted adults and infants (Supplementary Figure S1). To what extent do the differences in long-range correlations between visual and frontal regions reflect these within-network differences in functional organization?

      Response #3: The reviewer is raising some interesting questions about possible mechanisms and network changes. Resting state studies are indeed always subject to possibility that some effects are mediated by a third, unobserved region. Prior whole-cortex connectivity analyses have observed primarily changes in occipito-frontal connectivity in blindness, so there is not a clear cortical ‘third region’ candidate (Deen et al., 2015). However, some thalamic affects have also been observed and could contribute to the phenomenon (Bedny et al., 2011). Resting state changes in correlation between two areas do not imply changes in strength of long-range anatomical connectivity. Indeed, in the current case they may well reflect differential functional coupling, rather than strengthening or weakening of anatomical connections. We now discuss this in the Discussion section on page 12, line 301 as follows:

      “Despite these insights, many questions remain regarding the neurobiological mechanisms underlying experience-based functional connectivity changes and their relationship to anatomical development. Long-range anatomical connections between brain regions are already present in infants—even prenatally—though they remain immature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017). Functional connectivity changes may stem from local synaptic modifications within these stable structural pathways, consistent with findings that functional connectivity can vary independently of structural connection strength (Fotiadis et al., 2024). Moreover, functional connectivity has been shown to outperform structural connectivity in predicting individual behavioral differences, suggesting that experience-based functional changes may reflect finer-scale synaptic or network-level modulations not captured by macrostructural measures (Ooi et al., 2022). Prior studies also suggest that, even in adults, coordinated sensory-motor experience can lead to enhancement of functional connectivity across sensory-motor systems, indicating that large-scale changes in functional connectivity do not necessarily require corresponding changes in anatomical connectivity (Guerra-Carrillo et al., 2014; Li et al., 2018).”

      It is not clear how changes in correlation patterns among visual areas would produce the connectivity between visual areas and prefrontal areas reported in the current study. Activity in visual areas drives correlations both among visual areas and between visual and prefrontal areas and the same is true of prefrontal corticies.

      The findings from this study should be more closely linked to the extensive literature surrounding the debate on whether experience plays an instructive or permissive role in visual development (e.g., Crair 1999 Current Opin Neurobiol; Sur et al. 1999 J Neurobiol; Kiorpes 2016 J Neurosci; Stellwagen & Shatz 2002 Neuron; Roy et al. 2020 Nature Communications).

      Response #4: The instructive role suggests that specific experiences or patterns of neural activity directly shape and organize neural circuitry, while the permissive role indicates that such experiences or activity merely enable other factors, such as molecular signals, to influence neural circuit formation(Crair, 1999; Sur et al., 1999). To distinguish whether experience plays an instructive or permissive role, it is essential to manipulate the pattern or information content of neural activity while maintaining a constant overall activity level (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002). However, both the sighted and blind adult groups have had extensive experience and neural activity in the visual cortices. For the sighted group, activity in the visual cortex is partly driven by bottom-up input from the external environment, through the retina, LGN, and ultimately to the cortex. In contrast, the blind group’s visual cortex activity is partially driven by top-down input from non-visual networks. The precise role of this activity in shaping the observed connectivity patterns remains unclear. Although our study cannot speak to this issue directly, we now link to the relevant literature on page 12,line 320 of the manuscript in the Discussion section as follows:

      “The current findings reveal both effects of vision and effects of blindness on the functional connectivity patterns of the visual cortex. A further open question is whether visual experience plays an instructive or permissive role in shaping neural connectivity patterns. An instructive role suggests that specific sensory experiences or patterns of neural activity directly shape and organize neural circuitry. In contrast, a permissive role implies that sensory experience or neural activity merely facilitates the influence of other factors—such as molecular signals—on the formation and organization of neural circuits (Crair, 1999; Sur et al., 1999). Studies with animals that manipulate the pattern or informational content of neural activity while keeping overall activity levels constant could distinguish between these hypotheses (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002).”

      The assertion that a few weeks of vision after birth is insufficient to influence connectivity is provocative. Though supported by the study's results, it would benefit from integration with research in animal models showing considerable malleability of networks from early experience (e.g., Akerman et al. 2002 Neuron; Li et al. 2006 Nature Neuroscience; Stacy et al. 2023 J Neuroscience).

      Response #5: We thank the reviewer for their suggestion. The present study found that several weeks of postnatal visual experience is insufficient to significantly alter the long-term connectivity patterns of the visual cortices. While animal studies have shown that acute visual experience, or even exposure to visual stimuli through unopened eyelids, can robustly influence visual system development(Akerman et al., 2002; Li et al., 2008; Van Hooser et al., 2012). We think this discrepancy may be attributed to the substantial differences in developmental timelines between species. The human lifespan is much longer, and so is the human critical period, making it unclear how to map duration from one species to another. We briefly touched upon the time course issue in page 11 line 289 in the Discussion section as follows:

      “The present results reveal the effects of experience on development of functional connectivity between infancy and adulthood, but do not speak to the precise time course of these effects. Infants in the current sample had between 0 and 20 weeks of visual experience. Comparisons across these infants suggests that several weeks of postnatal visual experience is insufficient to produce a sighted-adult connectivity profile. The time course of development could be anywhere between a few months and years and could be tested by examining data from children of different ages.”

      Substantial differences between the groups are evident in several key aspects of the study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To clarify how these differences might have impacted correlation differences between groups, it would be essential to include information on the noise ceilings for each correlation analysis within each group.

      Response #6: We thank the reviewer for their suggestion. We now report the split-half noise ceiling for adult and infant groups. For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al (2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056,blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (One-way ANOVA<sub>,</sub> F(2,552) = 2.348, p = 0.097). Therefore, we believe that overall signal quality is unlikely to impact our results. We also add the relevant context in the Method section in page 16 Line 447 as follows:

      “Substantial differences between the groups exist in this study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To address this concern, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults). For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al (Lage-Castellanos et al., 2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056, blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (One-way ANOVA, F (2,552) = 2.348, p = 0.097). Therefore, overall signal quality is unlikely to impact our results.”

      In general, it appears that the infant correlations are stronger compared to the other groups. While this could reflect increased coherence or lack of differentiation, it is also possible that it is simply due to the presence of a non-neuronal global signal. Such a signal has the potential to substantially limit the effective range of functional correlations and comparisons with adults. To address this, it is advisable to conduct control analyses aimed at assessing and potentially removing global signals.

      Response #7: We agree with the reviewer that global signal regression (GSR) may help reduce non-neuronal artifacts, such as motion, cardiac, and respiratory signals, which are known to correlate with the global signal. However, the global signal also contains neural signals from gray matter, and removing it can introduce unwanted artifacts, especially for the current study. First, GSR can reduce the physiological accuracy of functional connectivity (FC); second, GSR may have differential effects across groups, potentially introducing additional artifacts in between-group comparisons, as noted by Murphy et al (Murphy & Fox, 2017). The CompCor method (Behzadi et al., 2007; Whitfield-Gabrieli & Nieto-Castanon, 2012) is capble to estimate the global non-neuronal artifacts like the GSR method. Meanwhile as it estimate global non-neuronal artifacts from signals within the white matter (WM) and cerebrospinal fluid (CSF) masks, but not the gray matter (GM), CompCor could introduce minimal unwanted bias to the GM signal.

      Was there a difference in correlations for preterm vs term neonates? Recent research has suggested that preterm births can have an impact on functional networks, particularly in frontal cortices. e.g., Tokariev et al. 2019, Li et al. 2021 elife; Zhang et al. 2022 Fronteirs in Neuroscience.

      Response #8: We have compared preterm and term neonates for all the main results, including the connectivity from the secondary visual cortex/V1 to non-visual sensory cortices versus prefrontal cortices, the laterality of occipito-frontal connectivity, and the specialization across different fronto-occipital networks. This information is reported in Page 6 line 169 and Supplementary Figure S7. The connectivities of full-term infants are generally higher than those of preterm infants. However, the connectivity patterns of term and preterm infants are very similar.

      The consistency between the current results and prior work (e.g., Burton et al. 2014) is notable, particularly in the observed greater correlations in prefrontal regions and weaker correlations in somato-motor regions for early blind individuals compared to sighted. However, almost all visual-frontal correlations in both groups were negative in that prior study. Some discussion on why positive correlations were found in the current study could help to clarify.

      Response #9: Many other papers have reported positive correlations similar to those found in our study (e.g., Deen et al., 2015; Kanjlia et al., 2021). In contrast, Burton's study identified predominantly negative visual-frontal correlations, we think this is likely because the global signal was regressed out during preprocessing. This methodological choice can lead to an increase in negative connections (Murphy & Fox, 2017).

      The term "secondary visual areas" used throughout the paper lacks specificity, and its usage in terms of underlying anatomical and functional areas has been inconsistent in the literature. It would be advisable to adopt a more precise characterization based on functional and/or anatomical criteria.

      Response #10: We specified in the article that Tthe occipital ROIs were defined in the current study are functional areas in people born blind identified in prior studies as regions that respond to three non-visual tasks such as language, math, or executive function, and show functional connectivity changes in blind adults in previous studies (Kanjlia et al., 2016, 2021; Lane et al., 2015). These regions respond to language, math and executivie function in the congenitally blind population (see Figure 1.) The are refered collectively as ‘secondary visual areas’ to destinguish them from V1. Anatomically, these three regions cover the majority of the lateral occipital cortex and part of the ventral occipital cortex, providing a good sample of the connectivity profile of higher-order visual areas. Thus, we are using the term "secondary visual areas" to refer to these regions. In blind individuals, although these regions respond to non-visual tasks, their exact functions are unknown.

      The inclusion of the ventral temporal cortex in the visual ROIs is currently only depicted in Supplementary Figure S7. To enhance the clarity of the areas of interest analyzed, it would be advisable to illustrate the ventral temporal areas in the main text. Were there notable differences in the frontal correlations between the lateral occipital visual areas and ventral temporal areas?

      Response #11: We thank the reviewer for pointing out this issue. We added a statement about the ventral visual cortex in describing the location of the ROI and added the ventral view of ROIs in the Figure 1. The language-responsive and math -responsive ROIs covers both the lateral and ventral visual cortex, whereas executive function (response-conflict) regions cover only the lateral visual cortex. We compared the connectivity patterns of these three regions and found no differences (see supplementary Fig S2).

      The blind group results are characterized as reflecting a reorganization in comparison to sighted adults while the results for sighted adults compared to infants are discussed more as a maturation ("adult pattern isn't default but requires experience to establish"). Both the sighted and blind adult groups showed differences from the infant group, and these differences are attributed to the role of experience. Why use "reorganization" for one result and maturation for another?

      Response #12: We agree with the reviewer that both of the adult groups should be thought of as equal in relation to the infants. In other words, the brain develops under one set of experiential conditions or another. We do not think that the adult sighted pattern reflects maturation. Rather, the sighted adult pattern reflects the combined influence of maturation and visual experience. The adult blind pattern reflects the combined influence of maturation and blindness. We use the term ‘reorganization’ to label differences in the blind adults relative to sighted infants. We do so for the purpose of clarity and to remain consistent with terminology in prior liaterature. However, we agree with the reviewer that the blind group does not reflect ‘reorganization’ intrinsically any more than the sighted adult group.

      The statement that "visual experience is required to set up long-range functional connectivity" is unclear, especially since the infant and blind groups showed stronger long-range functional correlations with PFC.

      Response #13: We revised this sentence to specifically as “visual experience establishes elements of the sighted-adult long-range connectivity” in tha Abstract line 17.

      The statement that the visual ROIS roughly correspond to "the anatomical location of areas such as V5/MT+, LO, V3a, and V4v" appears imprecise. From Supplementary Figure S7, these areas cover anterior portions of ventral temporal cortex (do these span the anatomical location of putative category-selective areas?) and into the intraparietal sulcus.

      Response #14: Thanks to the reviewer for the clarification. The ventral ROIs cover the middle and part of the anterior portion of the ventral temporal lobe, including the putative category-selective areas. Additionally, the dorsal ROIs extend beyond the occipital lobe to the intraparietal sulcus and superior parietal lobule. We have added a more detailed description of the anatomical location of the ROI in the Methods section Page 17 line 489 as follows:

      “Each functional ROI spans multiple anatomical regions and together the secondary visual ROIs tile large portions of lateral occipital, occipito-temporal, dorsal occipital and occipito-parietal cortices. In sighted people, the secondary visual occipital ROIs include the anatomical locations of functional regions such as motion area V5/MT+, the lateral occipital complex (LO), category specific ventral occipitotemporal cortices and dorsally, V3a and V4v.  The occipital ROI also covers the middle of the ventral temporal lobe. Dorsally, it extended to the intraparietal sulcus and superior parietal lobule.”

      The motivation for assessing correlations with motor and frontal regions was briefly discussed in the introduction. It would be helpful to reiterate this motivation when first introducing the analyses in the results.

      Response #15: Thank you for the thoughtful suggestion. Upon reflection, we chose to substantially revise the Introduction to more clearly and comprehensively explain the rationale for examining the couplings with motor and frontal regions, rather than reiterating it in the Results section. We believe this revised framing provides a stronger foundation for the analyses that follow, while avoiding redundancy across sections. We hope this addresses the reviewer’s concern.

      Reviewer #2 (Recommendations for the authors):

      Congratulations on a well-written paper and an interesting set of results.

      Reviewer #3 (Recommendations for the authors):

      Abstract:

      Mentioning "sighted infants" does not seem adequate.

      Response #16: In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      In sentences after "Specifically...", it was not clear whether the authors referred to V1 connectivity.

      Response #17: We thank the reviewer for this comment. In the revised abstract, we have removed the original "Specifically..." phrasing and clarified the results.

      Introduction

      Talking about the "instructive effects" of vision might be confusing or misleading. Visual experiences like exposure to oral language are part of the normal/spontaneous environment that allows the infant behavioral acquisitions (contrarily with learnings that occur later during development with instruction like for reading).

      Response #18: We appreciate the reviewer’s concern and would like to clarify that the term “instructive effect” is used here derived from neurodevelopmental studies (Crair, 1999; Sur et al., 1999). In this context, “instructive” refers to activity-dependent mechanisms where patterns of neural activity actively guide the organization of synaptic connectivity, emphasizing that spontaneous or sensory-driven activity (e.g., retinal waves, visual experience) can directly shape circuit refinement, as seen in ocular dominance column formation. In the context of our study, we emphasize that vision plays an instructive role in setting up the balance of connectivity between occipital cortex and non-visual networks.

      For references on the development of connectivity, I would advise citing MRI studies but also studies based on histological approaches (see for example the detailed review by Kostovic et al, NeuroImage 2019).

      Response #19: We thank the reviewer for this suggestion. We have incorporated a discussion on the long-range anatomical connections that emerge as early as infancy, referencing studies that employed diffusion MR imaging and histological methods, as detailed below.

      “Many long-range anatomical connections between brain regions are already established in infants, even before birth, although they are not yet mature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017).” (Page 12, line 303 in the manuscript)

      Results

      P7 l170: It might be helpful to be precise that this is "compared with inter-hemispheric connectivity".

      Response #20: We thank the reviewer for this suggestion. To align with our established terminology, we have revised the statement to explicitly contrast within-hemisphere connectivity with between-hemisphere connectivity. The modified text now reads (page 7, line 183 in the manuscript):

      “Compared to sighted adults, blind adults exhibited a stronger dominance of within-hemisphere connectivity over between-hemisphere connectivity. That is, in people born blind, left visual networks are more strongly connected to left PFC, whereas right visual networks are more strongly connected to right PFC.

      L176-181: It was not clear to me what was the difference between "across" and "between hemisphere connectivity". Would it be informative to test the difference between blind and sighted adults?

      Response #21: We clarify that there is no distinction between the terms “across” and “between hemisphere connectivity”—they refer to the same concept. To ensure consistency, we have revised the text to exclusively use “between hemisphere connectivity” throughout the manuscript. Regarding the comparison between blind and sighted adults, we conducted statistical comparisons between these groups in our analysis, and the results have been incorporated into the revised version (Page 7, line 187 in the manuscript).

      Adding statistics on Figure 3, but also on Figures 1 and 2 might help the reading.

      Response #22: We have added the statistics in Figure 1-4.

      Adding the third comparison in Figure 4 would be possible in my view.

      Response #23: We explored integrating the response-conflict region into Figure 4, but this would require a 3x3 bar chart with pairwise statistical significance markers, which introduced excessive visual complexity that hindered readers’ ability to grasp our intended message. To ensure clarity, we retained the original Figure 4 while providing the complete three-region analysis (including all statistical comparisons) in Supplementary Figure S8 to ensure completeness.

      Methods

      The authors might have to specify ages at birth, and ages at scan (median + range?).

      Response #24: We have added that information in the Methods section as follows:

      “The average age from birth at scan = 2.79 weeks (SD = 3.77, median = 1.57, range = 0 – 19.71); average gestational age at scan = 41.23 weeks (SD = 1.77, median = 41.29, range = 37 – 45.14); average gestational age at birth = 38.43 weeks (SD = 3.73, median = 39.71, range = 23 – 42.71).” (Page 14, line 379 in the manuscript)

      It might be relevant to comment on the range of available fMRI volumes, and the fact that connectivity measures might then be less robust in infants.

      Response #25: We report the range of fMRI volumes in the Methods section (Page 16, Line 449). Adult participants (blind and sighted) underwent 1–4 scanning sessions, each containing 240 volumes (mean scan duration: 710.4 seconds per participant). For infants, all subjects had 2300 fMRI volumes, and we retained a subset of 1600 continuous volumes per subject with the minimum number of motion outliers. While infant connectivity measures may inherently exhibit lower robustness due to developmental and motion-related factors, our infant cohort’s large sample size (n=475) and stringent motion censoring criteria enhance the reliability of group-level inferences. We have integrated this clarification into the Methods section (Page 16, Line 444) as follows:

      "While infant connectivity estimates may be less robust at the individual level compared to adults due to shorter scan durations and higher motion, our cohort’s large sample size (n=475) and rigorous motion censoring mitigate these limitations for group-level analyses. "

      The mention of dHCP 2nd release should be removed from the paragraph on data availability.

      Response #26: We have removed it.

    1. eLife Assessment

      This important study highlights the novel role of RSPO mimetic SZN-043 in the activation of hepatic WNT signaling and promoting hepatocyte regeneration. The authors provide convincing evidence of SZN-043 increasing hepatocytes proliferation in various mouse models, including a humanized mouse liver model, ALD model and CCL4 fibrosis model. This study will be of interest to researchers in liver regeneration and repair mechanisms.

    2. Reviewer #1 (Public review):

      Summary:

      The work by Fisher et al describes the role of novel RSPO mimetics in the activation of WNT signaling and hepatocyte regeneration. However, the results of the experiments and weaknesses of the methods used do not support the conclusions of the authors that the new therapy can promote liver regeneration in alcohol-induced liver cirrhosis.

      Strengths:

      Similarly to its precursor, aASGR1-RSPO2-RA-IgG, SZN-043 can upregulate Wnt target genes and promote hepatocyte proliferation in the liver.

      Comments on revisions:

      The authors responded to all my comments and concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The study by Fisher et al investigates therpauetic role for SZN-043, a hepatocyte-targeted R-spondin mimetic, for its potential role in restoring Wnt signaling and promoting liver-regeneration in alcohol-associated liver disease (ALD). Using multiple preclinical models, the compound was shown to promote hepatocyte proliferation and reduce fibrosis. This study highlights the efficacy in promoting liver regeneration while maintaining controlled signaling. Limitations include a need for further exploration of off-target effects and fibrosis mechanisms. The findings support SZN-043 as a promising candidate for ALD therapy, warranting further clinical evaluation. This is a well deigned study with thorough investigation using multiple disease models.

      Strengths:

      (1) Well-written manuscript with clear design, robust methods, and discussion.

      (2) Using multiple models strengthens the findings and expands beyond ALD.

      (3) Identification of SZN-043 as a novel potent drug for liver regeneration.

    4. Author response:

      Response to Comments from reviewer #1

      Many thanks for appreciating that SZN-043 can promote hepatocyte proliferation via the Wnt-signaling pathway.

      (1) The reviewer is concerned with using only CYP1A2 expression as an endpoint to make a conclusion about the effect of SZN-043 on Wnt activity in human ALD samples. The reviewer raises a good point as the more commonly used Wnt target gene, AXIN2, is not consistantly changed in both cohorts. We were at first also surprised by this finding. However, upon closer analysis we found that the expression of hepatocyte-specific target genes such as CYP1A2 (Figure 2), CYP2E1, OAT, LGR5, GLUL (Table 1) and ZNRF3 were mostly expressed in hepatocytes and ductal cells were all down-regulated in ALD samples. Others Wnt target genes expressed in epithelial and mesenchymal liver cell populations, such as AXIN2, CCND1 and NOTUM are indeed not consistently and significantly changed. Given that SZN-043 is not active on mesenchymal cells, this discrepancy could be best explained by the large increase in mesenchymal cells in ALD tissue samples, thereby confounding the results. We have now clarified this in the discussion. Another method to assess Wnt activity is to measure b-catenin phosphorylation and nuclear transfer. In our hands, this method was found to be better suited for tissue culture than histological sections from in vivo studies. We have also amended the manuscript title to refer to expression of Wnt target genes, rather than Wnt activity.

      (2) We have now added a supplemental figure to show the lack of Ki-67+ human hepatocytes in the cirrhotic tissue samples to confirm the absence of hepatocyte proliferation (Figure S1).

      (3) The differences in amino acid sequence between SZN-043 and its precursor, αASGR1-RSPO2-RAIgG, can be found in the material and method section. These changes in amino acid sequences improved the biophysical properties of the final clinical candidate, such as oxidation and nonspecific binding. The biochemical analysis of those differences exceeds the scope of the current manuscript. We present here the pharmacokinetic properties of SZN-043 only, as this was the only molecule advanced to clinical trial and used in the studies presented here.

      (4) The reviewer suggests to assess the effect of SZN-043 in Ctnnb1-KO mice to confirm that SZN043 acts via a canonical Wnt pathway. Indeed, there were several reports on the ability of Rspondin to act on other pathways besides the Wnt signaling pathway (for recent review, Niehrs et al, 2024, Bioessays). However, while an interesting suggestion, this line of investigation belongs to MOA studies and exceeds the scope of the current manuscript. An additional manuscript presenting MOA studies for SZN-043 was recently submitted elsewhere. Still, we have added this possibility in the discussion section.

      (5) The reviewer is asking how SZN-043 is affecting liver functions in general. Indeed, we have observed a consistent reduction in the international normalized ratio of prothrombin time using the thioacetamide (TAA)-induced fibrosis model and previously published those findings (Zhang, 2020). In our hands, the TAA is the only liver injury model that significantly increases INR. This increase is modest compared to that observed in clinical patients. Therefore, we do not report INR findings for other models. We have not seen any effects of SZN-043 on hepatocyte differentiation markers such as HNF4A (data not shown) and the hepatocyte specific ASGR1/2 as shown in Figure 5. Rather we focused on proliferation as the main potentially beneficial endpoint, to restore the parenchymal mass in injured livers. Finally, consistent with what was reported in the literature, we have observed a transient and reciprocal effect on albumin and alfa-fetoprotein expression during the proliferative phase of liver regeneration. These results are detailed in an additional manuscript presenting MOA studies for SZN-043, which was recently submitted elsewhere.

      (6) We have used females only in the ethanol-induced injury models because there are numerous reports in the literature stating that males are not as susceptible to those injuries.  

      (7) The reviewer questions the relevance of the ethanol-induced injury model used to evaluate SZN043 efficacy. Indeed, none of the disease model developed to date reproduce the severity and complexity of alcohol-associated liver diseases, although some, such as the ethanol supplemented Lieber DeCarli diet, are more commonly used than others – which is the reason why this model was selected. 

      (8) The reviewer questions the relevance of the fibrosis model used to evaluate SZN-043 efficacy. Indeed, none of the fibrosis models developed to date reproduce the severity and complexity of cirrhosis in human livers. While combining ethanol with CCl4 would lead to more severe fibrotic livers, CCl4 itself is not involved in ALD in humans. Both models are likely to result in similar pericentral fibrosis with central-to-central bridging. In this study, we were mostly interested in addressing the effects of SZN-043 in a tissue affected by fibrotic scars.  

      (9) The sex of CCl4-treated mice is male. We added this information in the methods section.

      (10) A summary of histology and fibrosis assessment data for alcohol-fed mice was added in supplemental Table S3. In our hands, the use of aging mice did not induce the presence of fibrosis, in contrast to published results.  

      (11) The rationale for using 13.5-month-old mice in the alcohol studies and scid mice in the CCl4 studies has been clarified in the results and discussion sections. 

      a. Briefly, aging mice were reported to be more susceptible to ethanol-induced injury than young mice and to include induction of fibrosis. However, we were unable to reproduce the presence of fibrosis reported in the literature.  

      b. Scid mice were used in the CCl4 studies to test whether a stronger response could be observed in the absence of a potential anti-drug antibodies response. While a modest reduction in fibrosis was observed in both B6 and scid mice following the SZN-043 treatment, the effect size did not seem affected by the mouse strain. 

      Response to Comments from reviewer #2

      Many thanks for appreciating that the use of multiple disease models to identify SZN-043 as a potential novel drug for liver regeneration.

      (1) The importance of restoring liver regeneration capacity to reduce the need for liver transplantation had been emphasized in the introduction.

      (2) There is continuous damage to the mouse hepatocytes in the FRG mice, due to the Fah mutation. They undergo repair mechanisms favoring the proliferation of human hepatocytes during the production period. Injury models that affect the human hepatocytes population have been developed in these mice. However, the primary goal of this study was to confirm that SZN043 was efficacious in inducing human hepatocytes proliferation, a feature difficult to reproduce in primary hepatocyte cultures. Given the artefactual nature of the chimeric liver in FRG mice and the high cost of these mice, further studies were not judged to be necessary.

      (3) Corrected

      (4) A figure including DAPI staining has now been included in supplemental Figure S2.

      (5) Clarification that the 8 weeks alcohol feeding used in our study design is a modification of the NIAAA model. While some ASGR1 has been reported on the surface of macrophages, additional data from MOA studies strongly suggest that the effect of SZN-043 is mediated via a hepatocytespecific mechanism (submitted manuscript).

      (6) The reviewer inquired about the potential role of macrophages in promoting an antiinflammatory state in response to SZN-043. While a direct effect is unlikely, a potential effect of macrophages in response to SZN-043 is plausible. Wnt activation is known to induce the secretion of hepatokines, such as LECT2, which in turn can influence macrophage activity. This possibility is discussed in the discussion section.

      (7) The potential off-target effects of SZN-043 such as stellate cell activation is discussed in the discussion section.

      (8) The discussion of the limitations of current models has been included in the discussion section of the manuscript.

      (9) We have now included a discussion of prior RSPO-based therapies, such as OMP-131R10. We explain why the hepatocyte-targeting of RSPO activity minimizes undesired effects.

    1. eLife Assessment

      This study presents a valuable finding that the blood-brain barrier (BBB) may be modulated through specific modes of electroacupuncture stimulation. The data were collected and analyzed using a solid and validated methodology, and can be used as a starting point for functional studies of the BBB for drug delivery across healthy and diseased states. The work will be of broad interest to scientists working in the field of drug delivery and drug development.

    2. Joint Public Review:

      This study employs single-cell RNA sequencing to investigate how electroacupuncture (EA) stimulation alters the transcriptional profiles of central nervous system cell types following blood-brain barrier (BBB) opening. The authors seek to characterize changes in gene expression and pathway activities across diverse neural cells in response to electroacupuncture (EA) stimulation using high-resolution transcriptomics. This approach has the potential to elucidate the cellular mechanisms underlying EA stimulation and their implications for therapeutic intervention. The work engages with a timely and biologically significant question regarding noninvasive stimulation methods to manipulate BBB permeability. However, no in vivo/in vitro functional assays are provided to validate the changes in BBB permeability or cytokine release in the tested models. The experimental rationale remains inadequately explained, and key details regarding the magnitude, duration, and spatial distribution of BBB opening in this system are still lacking.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The work from this paper successfully mapped transcriptional landscape and identified EA-responsive cell types (endothelial, microglia). Data suggest EA modulates BBB via immune pathways and cell communication. However, claims of "BBB opening" are not directly proven (no permeability data).

      (1) No in vivo/in vitro assays confirm BBB permeability changes (e.g., Evans blue leakage, TEER).

      (2) Only male rats were used, ignoring sex-specific BBB differences.

      (3) Pericytes and neurons, critical for the BBB, were not captured, likely due to dissociation artifacts.

      (4) Protein-level validation (Western blot, IHC) absent for key genes (e.g., LY6E, HSP90).

      (5) Fixed stimulation protocol (2/100 Hz, 40 min); no dose-response or temporal analysis.

      We sincerely apologize for the oversight regarding the description of changes in blood-brain barrier permeability. In fact, our team conducted a series of preliminary studies that verified this aspect, and we hace provided a more detailed introduction in the introduction section, in lines 60-71 of the manuscript.

      We are very grateful to the reviewers for pointing out the important and meaningful issue of "gender-specific BBB differences." We will make this a focal point in our future research.

      As for pericytes and neurons, we acknowledge their importance in the function of the blood-brain barrier. We acknowledge the importance of pericytes and neurons in the blood-brain barrier. However, neurons are absent because our sample processing method involves dissociation. During the dissociation procedure, neuronal axons, which are relatively long, are filtered out during the frequent cell suspension steps and cannot enter the downstream microfluidic system for analysis, so they are not present in our data. Since this experiment is primarily focused on non-neuronal cells, we did not choose to use nucleus extraction for sample processing. As for pericytes, we believe they are not captured because their proportion in our samples is extremely low, which is why they are not present in the data. Further research may require single-nucleus transcriptomics or the separate isolation of these two cell types for study. Of course, in our current mechanistic studies, we are also fully considering the important roles these two cell types play in BBB function.

      In addition, to validate the results at the protein level, we have recently conducted some experiments. However, as several proteins are currently at a critical stage of further experimental validation, it is not appropriate to present them in the manuscript at this time. Instead, we have uploaded the relevant data as an appendix for your review. This includes a figure of several protein markers we examined, as well as a table of the antibodies used.

      This section is also further elaborated in the introduction and its references.

      Reviewer #2 (Public review):

      Summary:

      This study uses single-cell RNA sequencing to explore how electroacupuncture (EA) stimulation alters the brain's cellular and molecular landscape after blood-brain barrier (BBB) opening. The authors aim to identify changes in gene expression and signaling pathways across brain cell types in response to EA stimulation using single-cell RNA sequencing. This direction holds promise for understanding the consequences of noninvasive methods of BBB opening for therapeutic drug delivery across the BBB.

      (1) The work falls short in its current form. The experimental design lacks a clear justification, and readers are not provided with sufficient background information on the extent, timing, or regional specificity of BBB opening in this EA model. These details, established in prior work, are critical to understanding the rationale behind the current transcriptomic analyses.

      (2) Further, the results are often presented with minimal context or interpretation. There is no model of intercellular or molecular coordination to explain the BBB-opening process, despite the stated goal of identifying such mechanisms. The statement that EA induces a "unique frontal cortex-specific transcriptome signature" is not supported, as no data from other brain regions are presented. Biological interpretation is at times unclear or inaccurate - for instance, attributing astrocyte migration effects to endothelial cell clusters or suggesting microglial tight junction changes without connecting them meaningfully to endothelial function.

      (3) The study does include analyses of receptor-ligand signaling and cell-cell communication, which could be among its most biologically rich outputs. However, these are relegated to supplementary material and not shown in the leading figures. This choice limits the utility of the manuscript as a hypothesis-generating resource.

      (4) Overall, while the dataset may be of interest to BBB researchers and those developing technologies for drug delivery across the BBB, the manuscript in its current form does not yet fulfill its interpretive goals. A more integrated and biologically grounded analysis would be beneficial.

      This section is also further elaborated in the introduction and its references.

      Our current study is actually based on previous findings that electroacupuncture can open the BBB, with a more pronounced effect observed in the frontal lobe (this aspect should be further described in the research background). Building on this foundation, our aim is to delineate the potential biological mechanisms involved. Therefore, we selected frontal lobe tissue as our primary choice for sequencing and have not yet investigated differences across other brain regions, although this may become a focus of future research. Additionally, we recognize that the mechanism underlying BBB opening is complex, and at present, we cannot determine whether it is driven by a single direct factor or by coordinated actions between cells or molecules. As such, our results are presented only briefly for now, and we will carefully consider whether to supplement our findings by incorporating insights from other studies.

      Considering the overall data layout and the length of the article, we ultimately decided not to make any changes to the presentation of the article's data. The images included in the supplementary materials are also thoroughly described and referenced in the manuscript, allowing readers to selectively view any data they are interested in.

      Indeed, our current dataset and analysis tend to present objective data results. We are also conducting a series of validations that may be related to the biology of the blood-brain barrier, and we look forward to sharing and discussing any future research findings with you and everyone.

      Reviewer #1 (Recommendations for the authors):

      (1) Figures 3-7: Label treatment groups (CON vs. EA) consistently in legends.

      (2) Methods: Specify rat strain (Sprague-Dawley) in the abstract.

      (3) Clarify Limitations: Explicitly state that BBB opening is inferred, not proven.

      This section has been revised at lines 743-733, 748, 949, 754-755, and 759-760 of the manuscript.

      Revised at line 31 of the manuscript.

      Thank you for your feedback. The background information on the open evidence of BBB has been added to the introduction.

      Reviewer #2 (Recommendations for the authors):

      (1) Abstract and Introduction

      • Include specific key findings in the abstract to improve clarity and reader engagement.

      • Expand the introduction to situate this work in the context of other BBB-opening methods (e.g., ultrasound) and the known consequences of BBB disruption.

      • Clarify the rationale for choosing electroacupuncture.

      • Include information (perhaps summarized from previous studies) about the extent, timeline, and functional assessment of BBB opening in this model to help justify the single-cell RNA-seq design.

      (2) Experimental Rationale and Context

      • Reiterate experimental design and rationale in each results section, rather than relying exclusively on the Methods section.

      • Specify the time point of tissue collection relative to the EA intervention.

      • Describe the anatomical sites of acupuncture stimulation and their physiological relevance.

      (3) Data Presentation

      • Replace the human brain cartoon in Figure 1 with an anatomically appropriate rat brain schematic.

      • Reevaluate which data are presented in the main versus supplementary figures. Highlight biologically meaningful results, such as cell-cell communication and ligand-receptor interactions, in the main figures rather than supplementary data.

      (4) Interpretation and Modeling

      • More carefully link transcriptional changes (e.g., Wnt signaling in microglia) to biologically plausible mechanisms of BBB regulation-e.g., microglial signaling to endothelial cells.

      • Clarify whether the presence of granulocytes and T cells might result from a lack of perfusion prior to brain dissection.

      • Consider proposing a model (even speculative) of how EA leads to BBB opening based on observed transcriptional changes.

      First, for the sake of brevity in the abstract, we did not present specific results in this section. Second, since BBB opening via EA is a unique strategy, our previous studies have examined the opening time window and the recovery of the BBB after EA intervention (as mentioned in the introduction). We believe its characteristics differ from those of ultrasound-induced BBB opening and BBB disruption, so we did not conduct comparative discussions, but objectively presented our research findings. In further functional validation experiments, we may consider integrating other opening strategies in our studies. Additionally, the choice of electroacupuncture was based on our previous series of studies, which have already been outlined in the research background. Finally, we did indeed determine the experimental design of this study based on prior research, as described in the background section of the introduction.

      We decided not to make changes to this section in the manuscript after careful consideration. The setup of electroacupuncture intervention and controls has been thoroughly discussed in our previous studies (as referenced in the introduction), so we have not repeated it in this manuscript. Overall, building on all our previous findings, this study focuses primarily on the potential mechanisms of EA intervention. The anatomical sites of acupuncture stimulation and their physiological relevance are another key area of our research, and we are currently conducting a series of related studies. We look forward to sharing these findings with you in the future.

      We have already changed the human brain diagram in Figure 1 to a rat brain diagram, and have replaced Figure 1 in the files with the revised version. However, considering the overall data layout and the length of the article, we ultimately decided not to make changes to the data presentation in the manuscript. The images in the supplementary materials are also thoroughly described and referenced in the manuscript, allowing readers to selectively view the data they are interested in.

      This section has provided us with excellent suggestions for further exploration, although no changes have been made to the manuscript at this time. In the future, we may conduct more detailed transcriptomic studies focusing on sex differences and different brain regions, which will allow for a more comprehensive analysis of the biological mechanisms involved in BBB regulation.

    1. eLife Assessment

      This valuable study explores the role of the chromatin regulator ATAD2 in mouse spermatogenesis. It convincingly demonstrates that ATAD2 is essential for proper chromatin remodeling in haploid spermatids, influencing gene accessibility, H3.3-mediated transcription, and histone eviction. Using Atad2 knockout (KO) mice, the authors link ATAD2 to the DNA-replication-independent incorporation of sperm-specific proteins like protamines and histone H3.3. Although the findings highlight chromatin abnormalities and impaired in vitro fertilization in KO mice, natural fertility remains unaffected, suggesting possible in vivo compensatory mechanisms. However, in its current form, the study lacks mechanistic insight and provides only partial evidence for ATAD2's molecular role, limiting its functional conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors analyzed the expression of ATAD2 protein in post-meiotic stages and characterized the localization of various testis-specific proteins in the testis of the Atad2 knockout (KO). By cytological analysis as well as the ATAC sequencing, the study showed that increased levels of HIRA histone chaperone, accumulation of histone H3.3 on post-meiotic nuclei, defective chromatin accessibility and also delayed deposition of protamines. Sperm from the Atad2 KO mice reduces the success of in vitro fertilization. The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.

      Strengths:

      The paper describes the role of ATAD2 AAA+ ATPase in the proper localization of sperm-specific chromatin proteins such as protamine, suggesting the importance of the DNA replication-independent histone exchanges with the HIRA-histone H3.3 axis.

      Weaknesses:

      (1) Some results lack quantification.

      (2) The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Liakopoulou et al. presents a comprehensive investigation into the role of ATAD2 in regulating chromatin dynamics during spermatogenesis. The authors elegantly demonstrate that ATAD2, via its control of histone chaperone HIRA turnover, ensures proper H3.3 localization, chromatin accessibility, and histone-to-protamine transition in post-meiotic male germ cells. Using a new well-characterized Atad2 KO mouse model, they show that ATAD2 deficiency disrupts HIRA dynamics, leading to aberrant H3.3 deposition, impaired transcriptional regulation, delayed protamine assembly, and defective sperm genome compaction. The study bridges ATAD2's conserved functions in embryonic stem cells and cancer to spermatogenesis, revealing a novel layer of epigenetic regulation critical for male fertility.

      Strengths:

      The MS first demonstration of ATAD2's essential role in spermatogenesis, linking its expression in haploid spermatids to histone chaperone regulation by connecting ATAD2-dependent chromatin dynamics to gene accessibility (ATAC-seq), H3.3-mediated transcription, and histone eviction. Interestingly and surprisingly, sperm chromatin defects in Atad2 KO mice impair only in vitro fertilization but not natural fertility, suggesting unknown compensatory mechanisms in vivo.

      Weaknesses: The MS is robust and there are not big weaknesses

    4. Reviewer #3 (Public review):

      Summary:

      The authors generated knockout mice for Atad2, a conserved bromodomain-containing factor expressed during spermatogenesis. In Atad2 KO mice, HIRA, a chaperone for histone variant H3.3, was upregulated in round spermatids, accompanied by an apparent increase in H3.3 levels. Furthermore, the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis were partially disrupted in the absence of ATAD2, possibly due to delayed histone removal. Despite these abnormalities, Atad2 KO male mice were able to produce offspring normally.

      Strengths:

      The manuscript addresses the biological role of ATAD2 in spermatogenesis using a knockout mouse model, providing a valuable in vivo framework to study chromatin regulation during male germ cell development. The observed redistribution of H3.3 in round spermatids is clearly presented and suggests a previously unappreciated role of ATAD2 in histone variant dynamics. The authors also document defects in the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis, providing phenotypic insight into chromatin transitions in late spermatogenic stages. Overall, the study presents a solid foundation for further mechanistic investigation into ATAD2 function.

      Weaknesses:

      While the manuscript reports the gross phenotype of Atad2 KO mice, the findings remain largely superficial and do not convincingly demonstrate how ATAD2 deficiency affects chromatin dynamics. Moreover, the phenotype appears too mild to elucidate the functional significance of ATAD2 during spermatogenesis.

      (1) Figures 4-5: The analyses of differential gene expression and chromatin organization should be more comprehensive. First, Venn diagrams comparing the sets of significantly differentially expressed genes between this study and previous work should be shown for each developmental stage. Second, given the established role of H3.3 in MSCI, the effect of Atad2 knockout on sex chromosome gene expression should be analyzed. Third, integrated analysis of RNA-seq and ATAC-seq data is needed to evaluate how ATAD2 loss affects gene expression. Finally, H3.3 ChIP-seq should be performed to directly assess changes in H3.3 distribution following Atad2 knockout.

      (2) Figure 3: The altered distribution of H3.3 is compelling. This raises the possibility that histone marks associated with H3.3 may also be affected, although this has not been investigated. It would therefore be important to examine the distribution of histone modifications typically associated with H3.3. If any alterations are observed, ChIP-seq analyses should be performed to explore them further.

      (3) Figure 7: While the authors suggest that pre-PRM2 processing is impaired in Atad2 KO, no direct evidence is provided. It is essential to conduct acid-urea polyacrylamide gel electrophoresis (AU-PAGE) followed by western blotting, or a comparable experiment, to substantiate this claim.

      (4) HIRA and ATAD2: Does the upregulation of HIRA fully account for the phenotypes observed in Atad2 KO? If so, would overexpression of HIRA alone be sufficient to phenocopy the Atad2 KO phenotype? Alternatively, would partial reduction of HIRA (e.g., through heterozygous deletion) in the Atad2 KO background be sufficient to rescue the phenotype?

      (5) The mechanism by which ATAD2 regulates HIRA turnover on chromatin and the deposition of H3.3 remains unclear from the manuscript and warrants further investigation.

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      The authors analyzed the expression of ATAD2 protein in post-meiotic stages and characterized the localization of various testis-specific proteins in the testis of the Atad2 knockout (KO). By cytological analysis as well as the ATAC sequencing, the study showed that increased levels of HIRA histone chaperone, accumulation of histone H3.3 on post-meiotic nuclei, defective chromatin accessibility and also delayed deposition of protamines. Sperm from the Atad2 KO mice reduces the success of in vitro fertilization. The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin. 

      We would like to take this opportunity to highlight that the present study builds on our previously published work, which examined the function of ATAD2 in both yeast S. pombe and mouse embryonic stem (ES) cells (Wang et al., 2021). In yeast, using genetic analysis we showed that inactivation of HIRA rescues defective cell growth caused by the absence of ATAD2. This rescue could also be achieved by reducing histone dosage, indicating that the toxicity depends on histone over-dosage, and that HIRA toxicity, in the absence of ATAD2, is linked to this imbalance.

      Furthermore, HIRA ChIP-seq performed in mouse ES cells revealed increased nucleosome-bound HIRA, particularly around transcription start sites (TSS) of active genes, along with the appearance of HIRA-bound nucleosomes within normally nucleosome-free regions (NFRs). These findings pointed to ATAD2 as a major factor responsible for unloading HIRA from nucleosomes. This unloading function may also apply to other histone chaperones, such as FACT (see Wang et al., 2021, Fig. 4C).

      In the present study, our investigations converge on the same ATAD2 function in the context of a physiologically integrated mammalian system—spermatogenesis. Indeed, in the absence of ATAD2, we observed H3.3 accumulation and enhanced H3.3-mediated gene expression. Consistent with this functional model of ATAD2— unloading chaperones from histone- and non-histone-bound chromatin—we also observed defects in histone-toprotamine replacement.

      Together, the results presented here and in Wang et al. (2021) reveal an underappreciated regulatory layer of histone chaperone activity. Previously, histone chaperones were primarily understood as factors that load histones. Our findings demonstrate that we must also consider a previously unrecognized regulatory mechanism that controls assembled histone-bound chaperones. This key point was clearly captured and emphasized by Reviewer #2 (see below).

      Strengths: 

      The paper describes the role of ATAD2 AAA+ ATPase in the proper localization of sperm-specific chromatin proteins such as protamine, suggesting the importance of the DNA replication-independent histone exchanges with the HIRA-histone H3.3 axis. 

      Weaknesses: 

      (1) Some results lack quantification. 

      We will consider all the data and add appropriate quantifications where necessary.

      (2) The work was performed well, and most of the results are convincing. However, this manuscript does not suggest a molecular mechanism for how ATAD2 promotes the formation of testis-specific chromatin. 

      Please see our comments above.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript by Liakopoulou et al. presents a comprehensive investigation into the role of ATAD2 in regulating chromatin dynamics during spermatogenesis. The authors elegantly demonstrate that ATAD2, via its control of histone chaperone HIRA turnover, ensures proper H3.3 localization, chromatin accessibility, and histone-toprotamine transition in post-meiotic male germ cells. Using a new well-characterized Atad2 KO mouse model, they show that ATAD2 deficiency disrupts HIRA dynamics, leading to aberrant H3.3 deposition, impaired transcriptional regulation, delayed protamine assembly, and defective sperm genome compaction. The study bridges ATAD2's conserved functions in embryonic stem cells and cancer to spermatogenesis, revealing a novel layer of epigenetic regulation critical for male fertility. 

      Strengths: 

      The MS first demonstration of ATAD2's essential role in spermatogenesis, linking its expression in haploid spermatids to histone chaperone regulation by connecting ATAD2-dependent chromatin dynamics to gene accessibility (ATAC-seq), H3.3-mediated transcription, and histone eviction. Interestingly and surprisingly, sperm chromatin defects in Atad2 KO mice impair only in vitro fertilization but not natural fertility, suggesting unknown compensatory mechanisms in vivo. 

      Weaknesses:

      The MS is robust and there are not big weaknesses 

      Reviewer #3 (Public review): 

      Summary: 

      The authors generated knockout mice for Atad2, a conserved bromodomain-containing factor expressed during spermatogenesis. In Atad2 KO mice, HIRA, a chaperone for histone variant H3.3, was upregulated in round spermatids, accompanied by an apparent increase in H3.3 levels. Furthermore, the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis were partially disrupted in the absence of ATAD2, possibly due to delayed histone removal. Despite these abnormalities, Atad2 KO male mice were able to produce offspring normally. 

      Strengths: 

      The manuscript addresses the biological role of ATAD2 in spermatogenesis using a knockout mouse model, providing a valuable in vivo framework to study chromatin regulation during male germ cell development. The observed redistribution of H3.3 in round spermatids is clearly presented and suggests a previously unappreciated role of ATAD2 in histone variant dynamics. The authors also document defects in the sequential incorporation and removal of TH2B and PRM1 during spermiogenesis, providing phenotypic insight into chromatin transitions in late spermatogenic stages. Overall, the study presents a solid foundation for further mechanistic investigation into ATAD2 function. 

      Weaknesses:

      While the manuscript reports the gross phenotype of Atad2 KO mice, the findings remain largely superficial and do not convincingly demonstrate how ATAD2 deficiency affects chromatin dynamics. Moreover, the phenotype appears too mild to elucidate the functional significance of ATAD2 during spermatogenesis. 

      We respectfully disagree with the statement that our findings are largely superficial. Based on our investigations of this factor over the years, it has become evident that ATAD2 functions as an auxiliary factor that facilitates mechanisms controlling chromatin dynamics (see, for example, Morozumi et al., 2015). These mechanisms can still occur in the absence of ATAD2, but with reduced efficiency, which explains the mild phenotype we observed.

      This function, while not essential, is nonetheless an integral part of the cell’s molecular biology and should be studied and brought to the attention of the broader biological community, just as we study essential factors. Unfortunately, the field has tended to focus primarily on core functional actors, often overlooking auxiliary factors. As a result, our decade-long investigations into the subtle yet important roles of ATAD2 have repeatedly been met with skepticism regarding its functional significance, which has in turn influenced editorial decisions.

      We chose eLife as the venue for this work specifically to avoid such editorial barriers and to emphasize that facilitators of essential functions do exist. They deserve to be investigated, and the underlying molecular regulatory mechanisms must be understood.

      (1) Figures 4-5: The analyses of differential gene expression and chromatin organization should be more comprehensive. First, Venn diagrams comparing the sets of significantly differentially expressed genes between this study and previous work should be shown for each developmental stage. Second, given the established role of H3.3 in MSCI, the effect of Atad2 knockout on sex chromosome gene expression should be analyzed. Third, integrated analysis of RNA-seq and ATAC-seq data is needed to evaluate how ATAD2 loss affects gene expression. Finally, H3.3 ChIP-seq should be performed to directly assess changes in H3.3 distribution following Atad2 knockout.  

      (1) In the revised version, we will include Venn diagrams to illustrate the overlap in significantly differentially expressed genes between this study and previous work. However, we believe that the GSEAs presented here provide stronger evidence, as they indicate the statistical significance of this overlap (p-values). In our case, we observed p-value < 0.01 (**) and p < 0.001 (***).

      (2) Sex chromosome gene expression was analyzed and is presented in Fig. 5C.

      (3) The effect of ATAD2 loss on gene expression is shown in Fig. 4A, B, and C as histograms, with statistical significance indicated in the middle panels.

      (4) Although mapping H3.3 incorporation across the genome in wild-type and Atad2 KO cells would have been informative, the available anti-H3.3 antibody did not work for ChIP-seq, at least in our hands. The authors of Fontaine et al., 2022, who studied H3.3 during spermatogenesis in mice, must have encountered the same problem, since they tagged the endogenous H3.3 gene to perform their ChIP experiments.

      (2) Figure 3: The altered distribution of H3.3 is compelling. This raises the possibility that histone marks associated with H3.3 may also be affected, although this has not been investigated. It would therefore be important to examine the distribution of histone modifications typically associated with H3.3. If any alterations are observed, ChIP-seq analyses should be performed to explore them further.  

      Based on our understanding of ATAD2’s function—specifically its role in releasing chromatin-bound HIRA—in the absence of ATAD2 the residence time of both HIRA and H3.3 on chromatin increases. This results in the detection of H3.3 not only on sex chromosomes but across the genome. Our data provide clear evidence of this phenomenon. The reviewer is correct in suggesting that the accumulated H3.3 would carry H3.3-associated histone PTMs; however, we are unsure what additional insights could be gained by further demonstrating this point.

      (3) Figure 7: While the authors suggest that pre-PRM2 processing is impaired in Atad2 KO, no direct evidence is provided. It is essential to conduct acid-urea polyacrylamide gel electrophoresis (AU-PAGE) followed by western blotting, or a comparable experiment, to substantiate this claim. 

      Figure 7 does not suggest that pre-PRM2 processing is affected in Atad2 KO; rather, this figure—particularly Fig. 7B—specifically demonstrates that pre-PRM2 processing is impaired, as shown using an antibody that recognizes the processed portion of pre-PRM2. ELISA was used to provide a more quantitative assessment; however, in the revised manuscript we will also include a western blot image.

      (4) HIRA and ATAD2: Does the upregulation of HIRA fully account for the phenotypes observed in Atad2 KO? If so, would overexpression of HIRA alone be sufficient to phenocopy the Atad2 KO phenotype? Alternatively, would partial reduction of HIRA (e.g., through heterozygous deletion) in the Atad2 KO background be sufficient to rescue the phenotype? 

      These are interesting experiments that require the creation of appropriate mouse models, which are not currently available.

      (5)The mechanism by which ATAD2 regulates HIRA turnover on chromatin and the deposition of H3.3 remains unclear from the manuscript and warrants further investigation. 

      The Reviewer is absolutely correct. In addition to the points addressed in response to Reviewer #1’s general comments (see above), it would indeed have been very interesting to test the segregase activity of ATAD2 (likely driven by its AAA ATPase activity) through in vitro experiments using the Xenopus egg extract system described by Tagami et al., 2004. This system can be applied both in the presence and absence (via immunodepletion) of ATAD2 and would also allow the use of ATAD2 mutants, particularly those with inactive AAA ATPase or bromodomains. However, such experiments go well beyond the scope of this study, which focuses on the role of ATAD2 in chromatin dynamics during spermatogenesis

      Reference

      Wang T, Perazza D, Boussouar F, Cattaneo M, Bougdour A, Chuffart F, Barral S, Vargas A, Liakopoulou A, Puthier D, Bargier L, Morozumi Y, Jamshidikia M, Garcia-Saez I, Petosa C, Rousseaux S, Verdel A, Khochbin S. ATAD2 controls chromatin-bound HIRA turnover. Life Sci Alliance. 2021 Sep 27;4(12):e202101151. doi: 10.26508/lsa.202101151. PMID: 34580178; PMCID: PMC8500222.

      Morozumi Y, Boussouar F, Tan M, Chaikuad A, Jamshidikia M, Colak G, He H, Nie L, Petosa C, de Dieuleveult M, Curtet S, Vitte AL, Rabatel C, Debernardi A, Cosset FL, Verhoeyen E, Emadali A, Schweifer N, Gianni D, Gut M, Guardiola P, Rousseaux S, Gérard M, Knapp S, Zhao Y, Khochbin S. Atad2 is a generalist facilitator of chromatin dynamics in embryonic stem cells. J Mol Cell Biol. 2016 Aug;8(4):349-62. doi: 10.1093/jmcb/mjv060. Epub 2015 Oct 12. PMID: 26459632; PMCID: PMC4991664.

      Fontaine E, Papin C, Martinez G, Le Gras S, Nahed RA, Héry P, Buchou T, Ouararhni K, Favier B, Gautier T, Sabir JSM, Gerard M, Bednar J, Arnoult C, Dimitrov S, Hamiche A. Dual role of histone variant H3.3B in spermatogenesis: positive regulation of piRNA transcription and implication in X-chromosome inactivation. Nucleic Acids Res. 2022 Jul 22;50(13):7350-7366. doi: 10.1093/nar/gkac541. PMID: 35766398; PMCID: PMC9303386.

      Tagami H, Ray-Gallet D, Almouzni G, Nakatani Y. Histone H3.1 and H3.3 complexes mediate nucleosome assembly pathways dependent or independent of DNA synthesis. Cell. 2004 Jan 9;116(1):51-61. doi:10.1016/s0092-8674(03)01064-x. PMID: 14718166.

    1. eLife Assessment

      This useful work identifies new monoclonal antibodies produced by cystic fibrosis patients against Pseudomonas aeruginosa type three secretion system. The evidence supporting authors' claim is solid. Nonetheless, the manuscript may benefit from a more in depth description of what the authors learned from their structure-based analyses of antibodies targeting PcrV.

    2. Reviewer #1 (Public review):

      Summary:

      Desveaux et al. describe human mAbs targeting protein from the Pseudomonas aeruginosa T3SS, discovered by employing single cell B cell sorting from cystic fibrosis patients. The mAbs were directed at the proteins PscF and PcrV. They particularly focused on two mAbs binding the T3SS with the potential of blocking activity. The supplemented biochemical analysis was crystal structures of P3D6 Fab complex. They also compared the blocking activity with mAbs that were described in previous studies, using an assay that evaluated the toxin injection. They conducted mechanistic structure analysis and found that these mAbs might act through different mechanisms by preventing PcrV oligomerization and disrupting PcrVs scaffolding function.

      The antibiotic resistance crisis requires the development of new solutions to treat infections cause by MDR bacteria. The development of antibacterial mAbs holds great potential. In that context, this report is important as it paves the way for the development of additional mAbs targeting various pathogens that harbor the T3SS. In this report the authors present a comparative study of their discovered mAbs vs. a commercial mAb currently in clinical testing resulting in valuate data with applicative implications. The authors investigated the mechanism of action of the mAbs using advanced methods and assays for characterization of antibody and antigen interaction, underlining the effort to determine the discovered mAbs suitability for downstream application.

    3. Reviewer #2 (Public review):

      Summary:

      Desveaux et al. performed Elisa and translocation assays to identify among 34 cystic fibrosis patients which ones produced antibodies against P. aeruginosa type three secretion system (T3SS). Authors were especially interested in antibodies against PcrV and PcsF, two key components of the T3SS. The authors leveraged their binding assays and flow cytometry to isolate individual B cells from the two most promising sera, and then obtained monoclonal antibodies for the proteins of interest. Among the tested monoclonal antibodies, P3D6 and P5B3 emerged as the best candidates due to their inhibitory effect on the ExoS-Bla translocation marker (with 24% and 94% inhibition, respectively). The authors then showed that P5B3 binds to the five most common variants of PcrV, while P3D6 seems to recognize only one variant. Furthermore, the authors showed that P3D6 inhibits translocon formation, measured as cell death of J774 macrophages. To get insights into the P3D6-PcrV interaction, the authors defined the crystal structure of the P3D6-PcrV complex. Finally, the authors compared their new antibodies with two previous ones (i.e., MEDI3902 and 30-B8).

      Strengths:

      • Article is well written.

      • Authors used complementary assays to evaluate protective effect of candidate monoclonal antibodies.

      • Authors offered crystal structure with insights into the P3D6 antibody-T3SS interaction (e.g., interactions with monomer vs pentamers).

      • Authors put their results in context by comparing their antibodies with respect to previous ones.

      Weaknesses:

      • Results shown in Fig. 6 should be initially described in the Results section and not in the Discussion section.

      • The authors should describe, in the Discussion (and also in L146-147), in more detail the gained insights into how anti-PcrV antibodies work. This is especially important given previous reports of more potent antibodies (e.g., Simonis et al.) that significantly reduces the novelty of their work. Hence, authors could explicitly highlight how their study differentiate from previous work, and what unique insights were gained (in the current version is not completely obvious).

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Desveaux et al. describe human mAbs targeting protein from the Pseudomonas aeruginosa T3SS, discovered by employing single cell B cell sorting from cystic fibrosis patients. The mAbs were directed at the proteins PscF and PcrV. They particularly focused on two mAbs binding the T3SS with the potential of blocking activity. The supplemented biochemical analysis was crystal structures of P3D6 Fab complex. They also compared the blocking activity with mAbs that were described in previous studies, using an assay that evaluated the toxin injection. They conducted mechanistic structure analysis and found that these mAbs might act through different mechanisms by preventing PcrV oligomerization and disrupting PcrVs scaffolding function.

      Strengths:

      The antibiotic resistance crisis requires the development of new solutions to treat infections caused by MDR bacteria. The development of antibacterial mAbs holds great potential. In that context, this report is important as it paves the way for the development of additional mAbs targeting various pathogens that harbor the T3SS. In this report, the authors present a comparative study of their discovered mAbs vs. a commercial mAb currently in clinical testing resulting in valuable data with applicative implications. The authors investigated the mechanism of action of the mAbs using advanced methods and assays for the characterization of antibody and antigen interaction, underlining the effort to determine the discovered mAbs suitability for downstream application.

      Weaknesses:

      Although the information presented in this manuscript is important, previous reports regarding other T3SS structures complexed with antibodies, reduce the novelty of this report. Nevertheless, we provide several comments that may help to improve the report. The structural analysis of the presented mAbs is incomplete and unfortunately, the authors did not address any developability assessment. With such vital information missing, it is unclear if the proposed antibodies are suited for diagnostic or therapeutic usage. This vastly reduces the importance of the possibly great potential of the authors' findings. Moreover, the structural information does not include the interacting regions on the mAb which may impede the optimization of the mAb if it is required to improve its affinity.

      As described in the manuscript (Fig. 6), our mAbs are markedly less effective in every in vitro T3SS inhibition assay than the mAbs recently described by Simonis et al. They are therefore very unlikely to outperform these mAbs in in vivo animal models of P. aeruginosa infection. Considering the high cost of animal experiments and ethical concerns-and in accordance with the Reduction principal of the 3Rs guidelines-we chose not to pursue in vivo experiments. Instead, we focused on leveraging the new isolated mAbs to investigate the mechanisms of action and structural features of anti-PcrV mAbs.

      Following the reviewer's suggestion, we have now added mAb interaction features into the structural data presented in the manuscript. However, based on the efficiency data, the structural analysis and the mechanistic insights presented, we do not consider further therapeutic use and optimization of our mAbs to be warranted.

      Reviewer #2 (Public review):

      Summary:

      Desveaux et al. performed Elisa and translocation assays to identify among 34 cystic fibrosis patients which ones produced antibodies against P. aeruginosa type three secretion system (T3SS). The authors were especially interested in antibodies against PcrV and PcsF, two key components of the T3SS. The authors leveraged their binding assays and flow cytometry to isolate individual B cells from the two most promising sera, and then obtained monoclonal antibodies for the proteins of interest. Among the tested monoclonal antibodies, P3D6 and P5B3 emerged as the best candidates due to their inhibitory effect on the ExoS-Bla translocation marker (with 24% and 94% inhibition, respectively). The authors then showed that P5B3 binds to the five most common variants of PcrV, while P3D6 seems to recognize only one variant. Furthermore, the authors showed that P3D6 inhibits translocon formation, measured as cell death of J774 macrophages. To get insights into the P3D6PcrV interaction, the authors defined the crystal structure of the P3D6-PcrV complex. Finally, the authors compared their new antibodies with two previous ones (i.e., MEDI3902 and 30-B8).

      Strengths:

      (1) The article is well written.

      (2) The authors used complementary assays to evaluate the protective effect of candidate monoclonal antibodies.

      (3) The authors offered crystal structure with insights into the P3D6 antibody-T3SS interaction (e.g., interactions with monomer vs pentamers).

      (4) The authors put their results in context by comparing their antibodies with respect to previous ones.

      Weaknesses:

      The authors used a similar workflow to the one previously reported in Simonis et al. 2023 (antibodies from cystic fibrosis patients that included B cell isolation, antibody-PcrV interaction modeling, etc.) but the authors do not clearly explain how their work and findings differentiate from previous work.   

      We employed a similar mAb isolation pipeline to that used by Simonis et al., beginning with the screening of a cohort of cystic fibrosis patients chronically infected with P. aeruginosa. As in Simonis et al., we isolated specific B cells using a recombinant PcrV bait, followed by single-cell PCR amplification of immunoglobulin genes. The main differences in methodology between the two studies are as follows: i) the use of individuals from different cohorts, and therefore having different Ab repertoires; ii) the nature of the screening assays, although in both cases the screening was focused on the inhibition of T3SS function; iii) the PcrV labeling strategy, with Simonis et al. employing direct labeling, whereas we used a biotinylated tag combined with streptavidin;

      The number of specific mAbs obtained and produced was higher in Simonis et al. (47 versus 9 in our study). They sorted B cells from three individuals compared to two in our work and possibly started with a larger amount of PBMCs per donor, which may account for the higher number of specific B cells and mAbs isolated. Considering that the strategies were overall very similar, the greater number of mAbs isolated in Simonis et al. likely explains, to a large extent, why they identified mAbs targeting different epitopes compared to ours, including highly potent mAbs that we did not recover. 

      Our modeling study, unlike that of Simonis et al., which relied on an AlphaFold prediction of the multimeric structure of P. aeruginosa PcrV, was based on the experimentally determined structure of the homologous Salmonella SipD pentamer, as described in the manuscript. Furthermore, we compared our mAb P3D6 not only with 30-B8 from Simonis et al., but also with MEDI3902. Finally, in contrast to the approach of Simonis et al., we used functional assays to investigate the differences in mechanisms of action among these mAbs, which target three distinct epitopes.

      (2) Although new antibodies against P. aeruginosa T3SS expand the potential space of antibodybased therapies, it is unclear if P3D6 or P5B3 are better than previous antibodies. In fact, in the discussion section authors suggested that the 30-B8 antibody seems to be the most effective of the tested antibodies.  

      As explained above and shown in the Results section (Figure 6), the 30-B8 mAb is markedly more effective at inhibiting T3SS activity in both in vitro assays used.

      (3) The authors should explain better which of the two antibodies they have discovered would be better suited for follow-up studies. It is confusing that the authors focused the last sections of the manuscript on P3D6 despite P3D6 having a much lower ExoS-Bla inhibition effect than P5B3 and the limitation in the PcrV variant that P3D6 seems to recognize. A better description of this comparison and the criteria to select among candidate antibodies would help readers identify the main messages of the paper. 

      The P3D6 mAb shows stronger inhibitory activity than P5B3 in the two assays used, as shown in Supplementary Figure 1. An error in the table in Figure 2B was corrected and this table now reflects the results presented in Supplementary Figure 1. 

      The final sections of the manuscript focus on P3D6, which is more potent than P5B3, and for which we successfully determined a co-crystal structure with PcrV*. All parallel attempts to obtain a structure of P5B3 in complex with PcrV* failed. The P3D6-PcrV* structure was used to analyze epitope recognition and mechanisms of action in comparison to previously described mAbs. As previously mentioned, we do not consider further studies aimed at therapeutic development and optimization of our mAbs to be justified given the current data. Therefore, we believe that the main message of the paper is adequately captured in the title.

      (4) This work could strongly benefit from two additional experiments:

      (a) In vivo experiments: experiments in animal models could offer a more comprehensive picture of the potential of the identified monoclonal antibodies. Additionally, this could help to answer a naïve question: why do the patients that have the antibodies still have chronic P. aeruginosa infections? 

      As explained above, the mAbs we isolated are significantly less potent than those described by Simonis et al., and are therefore unlikely to outperform the best anti-PcrV candidates in vivo. In light of the data, and considering ethical concerns related to animal use in research and budgetary constraints, we decided not to proceed with in vivo experiments.

      There are a number of reasons that may explain why patients with anti-PcrV Abs blocking the T3SS can still be chronically infected with Pa. First these Abs may be at limiting concentration, particularly in sites where Pa replicates, and thus unable to clear infection. in addition, it has been described that the T3SS is downregulated in chronic infection in cystic fibrosis patients. This suggests that a therapeutic intervention with T3SS inhibiting Abs may be more efficient if done early in cystic fibrosis patients to prevent colonization when Pa possesses an active T3SS. Finally, T3SS is not the only virulence mechanism employed by P. aeruginosa during infection. Indeed, multiple protein adhesins and polysaccharides are important factors facilitating the formation of bacterial biofilms that are crucial for establishing chronic persistent infection. In this regard, a combination of Abs targeting different factors on the P. aeruginosa surface may be needed to treat chronic infections.  

      (b) Multi-antibody T3SS assays (i.e., a combination of two or more monoclonal antibodies evaluated with the same assays used for characterization of single ones). This could explore the synergistic effects of combinatorial therapies that could address some of the limitations of individual antibodies. 

      Given the high potency of the Simonis mAbs and the mechanisms of action highlighted by our analysis, it is unlikely that our mAbs would synergize with those described by Simonis. Additionally, since our two mAbs cross-compete for binding, synergy between them is also improbable.

      Reviewer #1 (Recommendations for the authors):

      Line 166: How was the serum-IgG purified? (e.g., protein A, protein G). 

      Protein A purification was used, as now mentioned in the manuscript. Purified Igs were thus predominantly IgG1, IgG2 and IgG4, as indicated.

      (2) Line 196: When mentioning affinities, it is preferable to present in molar units. 

      To facilitate comparisons, Ab concentrations were presented in µg/mL as in Simonis et al.

      (3) Line 206: The author states that P3D6 displays significantly reduced ExoS-Bla injection (Figure 2B), but according to the presented table, ExoS-Bla inhibition was higher for P5B3. Additionally, when using "significantly", what was the statistical test that was used to evaluate the significance? Please clarify.

      We thank the reviewer for pointing out this inconsistency. Indeed, the names of P3D6 and P5B3 were exchanged when building the table related to Figure 2B. The corrected version of this figure is now presented in the new version of the manuscript. An ANOVA was performed to evaluate the significance of the observed difference (adjusted p-values < 0.001) and it is now mentioned in the figure caption.  

      (4) Line 215: "P3B3" typo.

      This was corrected.

      (5) Figure 3B: Could the author explain the higher level of ExoS-Bla injection when using VRCO1 antibody compared to no antibody.  

      A slightly higher level of the median is observed in the case of three variants out of five. However, this difference is not statistically significant (p-value > 0.05).

      (6) Supplement Figure 1: the presented grey area is not clear (is it the 95%CI?) and how was the IC50 calculated? With what model was it projected? Are the values for IC50 beyond the 100µg/mL mark a projection? It seems that projecting such greater values (such as the IC50 of over 400µg/mL for variant 5) is prone to high error probability.

      The grey area represents the 95% confidence interval (95% CI) and it is now mentioned in the figure caption. The IC50 and 95% CI were both inferred by the dose-response drc R package based on a three-parameters log-logistic model and it is now explained in the Materials & Methods section. The p-values for IC50 beyond the 100µg/mL were below 0.05 but we agree that such extrapolation should be considered with precaution (see below our response to comment number 7).

      (7) Line 227: The author describes that P5B3 has similar IC50 values towards variants 1-4, but the  IC50 towards variant 5 is substantially higher with 400µg/mL, albeit the only difference between variant 4 and 5 is the switch position 225 Arg -> Lys which are very similar in their properties. Please provide an explanation. 

      As explained in our response to comment number 6, we agree that the comparison of IC50 that are estimated to be close or higher than the highest experimental concentration is somehow speculative. Indeed, we performed further statistical analysis that showed no significant difference between the IC50 toward the five PcrV variants of mAb P5B3. In contrast, the difference between the IC50 of mAbs P5B3 and P3D6 toward variant 1 is statistically significant. This is now explained in the manuscript.

      (8) Line 233: Pore assembly: It is not clear how the data was normalized. The authors mention the methods normalization against the wildtype strain in the absence of antibodies, but did not elaborate clearly if the mutant strain has the same base cytotoxicity as the wild type. It would be helpful to show the level of cytotoxicity of the wild type compared to the mutant in the absence of antibodies to understand the baseline of cytotoxicity of both strains.  

      In these experiments we did not use the wild-type strain. As explained, the only strain that allows the measurement of pore formation by translocators PopB/PopD is the one lacking all effectors. All the experiments were done with this strain, and all the measurements were normalized accordingly. 

      (9) Figure 4: The explanation is redundant as it is clearly stated in the results. It would be better for the caption to describe the figure and leave interpretation to the results section. Overall, this comment is relevant to all figure captions, as it will reduce redundancy. My suggestion is to keep the figure caption as a road map to understand what is shown in the figure. For example, the Figure 4 caption should include that the concentration is presented in logarithmic scale, what is the dashed line, what is the grey area (what interval does it represent?), what each circle represents, and what is the regression model used? 

      Figure captions have been improved as suggested. 

      (10) Line 432: The authors apparently misquoted the original article describing the chimeric form PcrV* by describing the fusion of amino acids 1-17 and 136-249. I quote the original article by Tabor et al. "[...] we generated a truncated PcrV fragment (PcrVfrag) comprising PcrV amino acids 1-17 fused to amino acids 149-236 [...]". Additionally, how does the absence of amino acid 21 in the variant affect the conclusion? 

      Our construct was inspired by the one described in Tabor et al. but was not identical. We have therefore replaced "was constructed based on a construct by Tabor et al." for "whose design was inspired by the construct described in Tabor et al."

      Amino acid 21 is only absent in the construct used for crystallization experiments; all other experiments looking at Ab activity were performed with bacteria bearing full-length PcrV. The difference in P3D6 activity between variants V1 and V2-appears to be explained by the nature of the residue at position 225, according to the structural data, as explained now in more detail in the manuscript. Accordingly, the difference in efficiency of P3D6 against the V1 and V2  variants is explained by the residue at position 225, as both variants have the same residue at position 21. However, while the nature of the residue at position 225 appears to explain the absence of efficiency of the Ab for the variants studied, an impact of residue 21 could not be totally ruled out in putative variants with a Ser at 225 but different amino acids at 21.

      (11) Line 569: Missing word - ESRF stands for European Synchrotron Radiation Facility. 

      This has been corrected.

      (12) Line 268-269 (Figure 5A): The description of the alpha helices in relation to the figure is incomplete. Helices 2,3 and 5 are not indicated. 

      Indeed, since the structure is well-known and in the interest of visibility and simplicity, we only included the most relevant secondary structure features.

      (13) Line 271-272: It would be good to elaborate on the exact binding platform between LC and HC of the Fab and the residues on the PcrV side. For example, the author could apply the structure to PDBePISA (EMBL-EBI) which will provide details about the interface between the PcrV and the antibody. It is very interesting to learn what regions of the antibody are in charge of the binding, such as: is the H-CDR3 the major contributor of the binding or are other CDRs more involved? Additionally, in line 275 they state that the substitution of Ser 225 with Arg or Lys is consistent with the P3D6 insufficient binding. What contributed to this result on the antibodies side? 

      In order to address this question, we are now providing a LigPlot figure (supplementary Figure 3) in which specific interactions between PcrV* and the Fab are shown.

      (14) Line 291: It is unclear from what data the authors concluded that anti-PscF targets 3 distinct regions of PscF. 

      The data are shown in Supplementary Table 2, as mentioned in the manuscript. We have now modified the order of the anti-PcrV mAbs in the table to better illustrate the three identified epitope clusters (Sup table 2). Similarly, the anti-PscF mAbs appear to group into three clusters as P3G9 and P5E10 only compete with themselves, while mabs P3D6 and P5B3 compete with themselves and each other.

      (15) Line 315: It is preferable to introduce results in the results section instead of the discussion. 

      While preparing the manuscript, we initially included these results as a separate paragraph in the Results section, but ultimately chose the current format to improve flow and avoid redundancy.

      (16) Supplement Figure 2: What was the regression model used to evaluate IC50, and what is presented in the graph? What is the dashed line (see comment for Figure 4 above)? 

      The regression is based on a three-parameters log-logistic model and the light-colors area correspond to the 95% IC. The dashed lines visually represents 100% of ExoS-Bla injection. These information are now mentioned in the figure caption.

      (17) Figure 6B: It would be better to show an additional rotation of the PcrV bound by Fab 30-B8 that corresponds to the same as the one represented with Fab MEDI3092. This would clear up the differences in binding regions. Same for Fab P3D6. 

      Figure 6 already depicts two orientations. Despite the fact that we agree that additional orientations could be of interest, we believe that this would add unnecessary complexity to the figure, and would prefer to maintain the figure as is, if possible.

      (18) Line 356-358: The author proposes an experiment to support the suggested mechanism of P3D6, it would follow up with a bio-chemical analysis showing the prevention of PcrV oligomerization in its presence. 

      We understand the reviewers’ comment regarding the potential use of biochemical approaches to test our hypothesis. However, this not currently feasible as we have been unable to achieve in vitro oligomerization of PcrV alone, possibly due to the absence of other T3SS components, such as the polymerized PscF needle.

      (19) Line 456: Missing details about how the ELISA was conducted including temperature, how the antigen was absorbed, plate type, etc. 

      Experimental details have been added.

      (20) Line 460: Missing substrate used for alkaline phosphatase. 

      The nature of the substrate was added to the methods.

    1. eLife Assessment

      This study makes the valuable claim that people track, specifically, the elasticity of control (that is, the degree to which outcome depends on how many resources - such as money - are invested), and that control elasticity is impaired in certain types of psychopathologies. A novel task is introduced that provides solid evidence that this learning process occurs and that human behavior is sensitive to changes in the elasticity of control. Evidence that elasticity inference is distinct from more general learning mechanisms and is related to psychopathology remains incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the elasticity of controllability by developing a task that manipulates the probability of achieving a goal with a baseline investment (which they refer to as inelastic controllability) and the probability that additional investment would increase the probability of achieving a goal (which they refer to as elastic controllability). They found that a computational model representing the controllability and elasticity of the environment accounted better for the data than a model representing only the controllability. They also found that prior biases about the controllability and elasticity of the environment was associated with a composite psychopathology score. The authors conclude that elasticity inference and bias guide resource allocation.

      Strengths:

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment, and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform understanding of control across domains, which is a topic of great importance.

      Weaknesses:

      In its revised form, the manuscript addresses most of my previous concerns. The main remaining weakness pertains to the analyses aimed at addressing my suggesting of Bayesian updating as an alternative to the model proposed by the authors. My suggestion was to assume that people perform a form of function approximation to relate resource expenditure to success probability. The authors performed a version of this where people were weighing evidence for a few canonical functions (flat, step, linear), and found that this model underperforms theirs. However, this Bayesian model is quite constrained in its ability to estimate the function relating resources. A more robust test would be to assume a more flexible form of updating that is able to capture a wide range of distributions (e.g., using basis functions, gaussian processes, or nonparametric estimators); see, e.g., work by Griffiths on human function learning). The benefit of testing this type of model is that it would make contact with a known form of inference that individuals engage in across various settings, and therefore could offer a more parsimonious and generalizable account of function learning, whereby learning of resource elasticity is a special case. I defer to the authors as to whether they'd like to pursue this direction, but if not I think it's still important that they acknowledge that they are unable to rule out a more general process like this as an alternative to their model. This also pertains to inferences about individual differences, which currently hinge on their preferred model being the most parsimonious.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors test whether controllability beliefs and associated actions/resource allocation are modulated by things like time, effort, and monetary costs (what they call "elastic" as opposed to "inelastic" controllability). Using a novel behavioral task and computational modeling, they find that participants do indeed modulate their resources depending on whether they are in an "elastic," "inelastic," or "low controllability" environment. The authors also find evidence that psychopathology is related to specific biases in controllability.

      Strengths:

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output, and yielded interesting results. Notably, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals important findings about how people consider components of controllability.

      Weaknesses:

      The authors have gone to great lengths to revise the manuscript to clarify their definitions of "elastic" and "inelastic" and bolster evidence for their computational model, resulting in an overall strong manuscript that is valuable for elucidating controllability dynamics and preferences. One minor weakness is that the justification for the analysis technique for the relationships between the model parameters and the psychopathology measures remains lacking given the fact that simple correlational analyses did not reveal any significant associations nor were there results of any regression analyses. That said, the authors did preregister the CCA analysis, so while perhaps not the best method, it was justified to complete it. Regardless of method, the psychopathology results are not particularly convincing, but provide an interesting jumping-off point for further exploration in future work.

    4. Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome. In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way, and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a *very good idea*.

      The more concrete contributions, however, are not as strong. In particular, evidence for the paper's most striking claims is weak. Quoting the abstract, these claims are (1) "the elasticity of control [is] a distinct cognitive construct guiding adaptive behavior" and (2) "overestimation of elasticity is associated with elevated psychopathology involving an impaired sense of control."

      Main issues

      I'll highlight the key points.

      - The task cannot distinguish elasticity inference from general learning processes

      - Participants were explicitly instructed about elasticity, with labeled examples

      - The psychopathology claims rely on an invalid interpretation of CCA, and are contradicted by simple correlations (elasticity bias and the sense of agency scale is r=0.03)

      Distinct construct

      Starting with claim 1, there are three subclaims here. (1A) People's behavior is sensitive to differences in elasticity; (1B) there are mental processes specific to elasticity inference, i.e., not falling out of general learning mechanisms; and, implicitly, (1C) people infer elasticity naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not well supported.

      (1B) The data cannot support the "distinct cognitive construct" claim because the task is too simple to dissociate elasticity inference from more general learning processes (also raised by Reviewer 1). The key behavioral signature for elasticity inference (vs. generic controllability inference) is the transfer across ticket numbers, illustrated in Fig 4. However, this pattern is also predicted by a standard Bayesian learner equipped with an intuitive causal model of the task. Each ticket gives you another chance to board and the agent infers the probability that each attempt succeeds. Crucially, this logic is not at all specific to elasticity or even control. An identical model could be applied to inferring the bias of a coin from observations of whether any of N tosses were heads-a task that is formally identical to this one (at least, the intuitive model of the task; see first minor comment).

      Importantly, this point cannot be addressed by showing that the author's model fits data better than this or any other specific Bayesian model. It is not a question of whether one particular updating rule explains data better than another. Rather, it is a question of whether the task can distinguish between biases in *elasticity* inference versus biases in probabilistic inference more generally. The present task cannot make this distinction because it does not make separate measurements of the two types of inference. To provide compelling evidence that elasticity inference is a "distinct cognitive construct", one would need to show that there are reliable individual differences in elasticity inference that generalize across contexts but do not generalize to computationally similar types of probabilistic inference (e.g. the coin flipping example).

      (1C) The implicit claim that people infer elasticity outside of the experimental task is undermined by the experimental design. The authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips."

      In the revisions, the authors seem to go back and forth on whether they are claiming that people infer elasticity without instruction (I won't quote it here). I'll just note that the examples they provide in the most recent rebuttal are all cases in which one never receives explicit labels about elasticity. If people only infer elasticity when it is explicitly labeled, I struggle to see its relevance for understanding human cognition and behavior.

      Psychopathology

      Finally, I turn to claim 2, that "overestimation of elasticity is associated with elevated psychopathology involving an impaired sense of control." The CCA analysis is in principle unable to support this claim. As the authors correctly note in their latest rebuttal, the CCA does show that "there is a relationship between psychopathology traits and task parameters". The lesion analysis further shows that "elasticity bias specifically contributes to this relationship" (and similarly for the Sense of Agency scale). Crucially, however, this does *not* imply that there is a relationship between those two variables. The most direct test of that relationship is the simple correlation, which the authors report only in a supplemental figure: there is no relationship (r=0.03). Although it is of course possible that there is a relationship that is obscured by confounding variables, the paper provides no evidence-statistical or otherwise-that such a relationship exists.

      Minor comments

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p - p^2 for two tickets; the p^2 captures the fact that you only take the second attempt if you fail on the first. A consequence of this is buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, the researcher could infer "biases" in elasticity inference that are probably better characterized as effective use of prior information (encoded in the causal model).

      The model is heuristically defined and does not reflect Bayesian updating. For example, it over-estimates maximum control by not using losses with less than 3 tickets (intuitively, the inference here depends on what your beliefs about elasticity). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform understanding of control across domains, which is a topic of great importance.

      We thank the Reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study’s conclusion. 

      In its revised form, the manuscript addresses most of my previous concerns. The main remaining weakness pertains to the analyses aimed at addressing my suggesting of Bayesian updating as an alternative to the model proposed by the authors. My suggestion was to assume that people perform a form of function approximation to relate resource expenditure to success probability. The authors performed a version of this where people were weighing evidence for a few canonical functions (flat, step, linear), and found that this model underperformed theirs. However, this Bayesian model is quite constrained in its ability to estimate the function relating resources. A more robust test would be to assume a more flexible form of updating that is able to capture a wide range of distributions (e.g., using basis functions, gaussian processes, or nonparametric estimators); see, e.g., work by Griffiths on human function learning). The benefit of testing this type of model is that it would make contact with a known form of inference that individuals engage in across various settings and therefore could offer a more parsimonious and generalizable account of function learning, whereby learning of resource elasticity is a special case. I defer to the authors as to whether they'd like to pursue this direction, but if not I think it's still important that they acknowledge that they are unable to rule out a more general process like this as an alternative to their model. This pertains also to inferences about individual differences, which currently hinge on their preferred model being the most parsimonious.

      We thank the Reviewer for this thoughtful suggestion. We acknowledge that more flexible function learning approaches could provide a stronger test in favor of a more general account. Our Bayesian model implemented a basis function approach where the weights of three archetypal functions (flat, step, linear) are learned from experience Testing models with more flexible basis functions would likely require a task with more than three levels of resource investment (1, 2, or 3 tickets). This would make an interesting direction for future work expanding on our current findings. We now incorporate this suggestion in more detail in our updated manuscript (335-341):

      “Second, future models could enable generalization to levels of resource investment not previously experienced. For example, controllability and its elasticity could be jointly estimated via function approximation that considers control as a function of invested resources. Although our implementation of this model did not fit participants’ choices well (see Methods), other modeling assumptions drawn from human function learning [30] or experimental designs with continuous action spaces may offer a better test of this idea.”

      Reviewer #2 (Public review):

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Notably, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals important findings about how people consider components of controllability. The authors have gone to great lengths to revise the manuscript to clarify their definitions of "elastic" and "inelastic" and bolster evidence for their computational model, resulting in an overall strong manuscript that is valuable for elucidating controllability dynamics and preferences. 

      We thank the Reviewer for their constructive feedback throughout the review process, which has substantially strengthened our manuscript and clarified our theoretical framework.

      One minor weakness is that the justification for the analysis technique for the relationships between the model parameters and the psychopathology measures remains lacking given the fact that simple correlational analyses did not reveal any significant associations.

      We note that the existence of bivariate relationships is not a prerequisite for the existence of multivariate relationships. Conditioning the latter on the former, therefore, would risk missing out on important relationships existing in the data. Ultimately, correlations between pairs of variables do not offer a sensitive test for the general hypothesis that there is a relationship between two sets of variables. As an illustration, consider that elasticity bias correlated in our data (r = .17, p<.001) with the difference between SOA (sense of agency) and SDS (self-rating depression). Notably, SOA and SDS were positively correlated (r = .47, p<.001), and neither of them was correlated with elasticity bias (SOA: r=.04 p=.43, SDS: r=-.06, p=.16). It was a dimension that ran between them that mapped onto elasticity bias. This specific finding is incidental and uncorrected for multiple comparisons, hence we do not report it in the manuscript, but it illustrates the kinds of relationships that cannot be accounted for by looking at bivariate relationships alone.  

      Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome.

      In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a very good idea.

      We thank the Reviewer for their thoughtful engagement with our manuscript. We appreciate their recognition of elasticity as a key dimension of control that has the potential to advance our understanding of psychopathology and healthy decision-making.

      Starting with theory, the authors do not provide a strong formal characterization of the proposed notion of elasticity. There are existing, highly general models of controllability (e.g., Huys & Dayan, 2009; Ligneul, 2021) and the elasticity idea could naturally be embedded within one of these frameworks. The authors gesture at this in the introduction; however, this formalization is not reflected in the implemented model, which is highly task-specific.

      Our formal definition of elasticity, detailed in Supplementary Note 1, naturally extends the reward-based and information-theoretic definitions of controllability by Huys & Dayan (2009) and Ligneul (2021). We now further clarify how the model implements this formalized definition (lines 156-159).

      “Conversely, in the ‘elastic controllability model’, the beta distributions represent a belief about the maximum achievable level of control (𝑎<sub>Control</sub>, 𝑏<sub>Control</sub>) coupled with two elasticity estimates that specify the degree to which successful boarding requires purchasing at least one (𝑎<sub>elastic≥1</sub>, 𝑏<sub>elastic≥1</sub>) or specifically two (𝑎<sub>elastic2</sub>, 𝑏<sub>elastic2</sub>) extra tickets. As such, these elasticity estimates quantify how resource investment affects control. The higher they are, the more controllability estimates can be made more precise by knowing how much resources the agent is willing and able to invest (Supplementary Note 1).”

      Moreover, the authors present elasticity as if it is somehow "outside of" the more general notion of controllability. However, effort and investment are just specific dimensions of action; and resources like money, strength, and skill (the "highly trained birke") are just specific dimensions of state. Accordingly, the notion of elasticity is necessarily implicitly captured by the standard model. Personally, I am compelled by the idea that effort and resource (and therefore elasticity) are particularly important dimensions, ones that people are uniquely tuned to. However, by framing elasticity as a property that is different in kind from controllability (rather than just a dimension of controllability), the authors only make it more difficult to integrate this exciting idea into generalizable models.

      We respectfully disagree that we present elasticity as outside of, or different in kind from, controllability. Throughout the manuscript, we explicitly describe elasticity as a dimension of controllability (e.g., lines 70-72, along many other examples). This is also expressed in our formal definition of elasticity (Supplementary Note 1). 

      The argument that vehicle/destination choice is not trivial because people occasionally didn't choose the instructed location is not compelling to me-if anything, the exclusion rate is unusually low for online studies. The finding that people learn more from non-random outcomes is helpful, but this could easily be cast as standard model-based learning very much like what one measures with the Daw two-step task (nothing specific to control here). Their final argument is the strongest, that to explain behavior the model must assume "a priori that increased effort could enhance control." However, more literally, the necessary assumption is that each attempt increases the probability of success-e.g. you're more likely to get a heads in two flips than one. I suppose you can call that "elasticity inference", but I would call it basic probabilistic reasoning.

      We appreciate the Reviewer’s concerns but feel that some of the more subjective comments might not benefit from further discussion. We only note that controllability and its elasticity are features of environmental structure, so in principle any controllability-related inference is a form of model-based learning. The interesting question is whether people account in their world model for that particular feature of the environment.   

      The authors try to retreat, saying "our research question was whether people can distinguish between elastic and inelastic controllability." I struggle to reconcile this with the claim in the abstract "These findings establish the elasticity of control as a distinct cognitive construct guiding adaptive behavior". That claim is the interesting one, and the one I am evaluating the evidence in light of.

      In real-world contexts, it is often trivial that sometimes further investment enhances control and sometimes it does not. For example, students know that if they prepare more extensively for their exams they will likely be able to achieve better grades, but they also know that there is uncertainty in this regard – their grades could improve significantly, modestly, or in some cases, they might not improve at all, depending on the type of exams their study program administers and the knowledge or skills being tested. Our research question was whether in such contexts people learn from experience the degree to which controllability is elastic to invested resources and adapt their resource investment accordingly. Our findings show that they do. 

      The authors argue for CCA by appeal to the need to "account for the substantial variance that is typically shared among different forms of psychopathology". I agree. A simple correlation would indeed be fairly weak evidence. Strong evidence would show a significant correlation after *controlling for* other factors (e.g. a regression predicting elasticity bias from all subscales simultaneously). CCA effectively does the opposite, asking whether-with the help of all the parameters and all the surveys-one can find any correlation between the two sets of variables. The results are certainly suggestive, but they provide very little statistical evidence that the elasticity parameter is meaningfully related to any particular dimension of psychopathology.

      We agree with the Reviewer on the relationship between elasticity and any particular dimension of psychopathology. The CCA asks a different question, namely, whether there is a relationship between psychopathology traits and task parameters, and whether elasticity bias specifically contributes to this relationship. 

      I am very concerned to see that the authors removed the discussion of this limitation in response to my first review. I quote the original explanation here:

      - In interpreting the present findings, it needs to be noted that we designed our task to be especially sensitive to overestimation of elasticity. We did so by giving participants free 3 tickets at their initial visits to each planet, which meant that upon success with 3 tickets, people who overestimate elasticity were more likely to continue purchasing extra tickets unnecessarily. Following the same logic, had we first had participants experience 1 ticket trips, this could have increased the sensitivity of our task to underestimation of elasticity in elastic environments. Such underestimation could potentially relate to a distinct psychopathological profile that more heavily loads on depressive symptoms. Thus, by altering the initial exposure, future studies could disambiguate the dissociable contributions of overestimating versus underestimating elasticity to different forms of psychopathology.

      The logic of this paragraph makes perfect sense to me. If you assume low elasticity, you will infer that you could catch the train with just one ticket. However, when elasticity is in fact high, you would find that you don't catch the train, leading you to quickly infer high elasticity eliminating the bias. In contrast, if you assume high elasticity, you will continue purchasing three tickets and will never have the opportunity to learn that you could be purchasing only one-the bias remains.

      The authors attempt to argue that this isn't happening using parameter recovery. However, they only report the *correlation* in the parameter, whereas the critical measure is the *bias*. Furthermore, in parameter recovery, the data-generating and data-fitting models are identical-this will yield the best possible recovery results. Although finding no bias in this setting would support the claims, it cannot outweigh the logical argument for the bias that they originally laid out. Finally, parameter recovery should be performed across the full range of plausible parameter values; using fitted parameters (a detail I could only determine by reading the code) yields biased results because the fitted parameters are themselves subject to the bias (if present). That is, if true low elasticity is inferred as high elasticity, then you will not have any examples of low elasticity in the fitted parameters and will not detect the inability to recover them.

      The logic the Reviewer describes breaks down when one considers the dynamics of participants’ resource investment choices. A low elasticity bias in a participant’s prior belief would make them persist for longer in purchasing a single ticket despite failure, as compared to a person without such a bias. Indeed, the ability of the experimental design to demonstrate low elasticity biases is evidenced by the fact that the majority of participants were fitted with a low elasticity bias (μ = .16 ± .14, where .5 is unbiased). 

      Originally, the Reviewer was concerned that elasticity bias was being confounded with a general deficit in learning. The weak inter-parameter correlations in the parameter recovery test resolved this concern, especially given that, as we now noted, the simulated parameter space encompassed both low and high elasticity biases (range=[.02,.76]). Furthermore, regarding the Reviewer's concern about bias in the parameter recovery, we found no such significant bias with respect to the elasticity bias parameter (Δ(Simulated, Recovered)= -.03, p=.25), showing that our experiment could accurately identify low and high elasticity biases.

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p – p<sup>^</sup>2 for two tickets; the p<sup>^</sup>2 captures the fact that you only take the second attempt if you fail on the first. A consequence of this is buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, they will appear to "underestimate" the elasticity of control. I don't think this seriously jeopardizes the key results, but any follow-up work should ensure that the task's structure is consistent with the intuitive causal model.

      We thank the Reviewer for this comment, and agree the participants may have employed the intuitive understanding the Reviewer describes. This is consistent with our model comparison results, which showed that participants did not assume that control increases linearly with resource investment (lines 677-692). Consequently, this is also not assumed by our model, except perhaps by how the prior is implemented (a property that was supported by model comparison). In the text, we acknowledge that this aspect of the model and participants’ behavior deviates from the true task's structure, and it would be worthwhile to address this deviation in future studies. 

      That said, there is no reason that this will make participants appear to be generally underestimating elasticity. Following exposure to outcomes for one and three tickets, any nonlinear understanding of probabilities would only affect the controllability estimate for two tickets. This would have contrasting effects on the elasticity estimated to the second and third tickets, but on average, it would not change the overall elasticity estimated. On the other hand, such a participant is only exposed to outcomes for two and three tickets, they would come to judge the difference between the first and second tickets too highly, thereby overestimating elasticity.  

      The model is heuristically defined and does not reflect Bayesian updating. For example, it overestimates maximum control by not using losses with less than 3 tickets (intuitively, the inference here depends on what your beliefs about elasticity). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

      Note that we have tested a fully Bayesian model (lines 676-691), but found that this model fitted participants’ choices worse. 

      You're right; saying these analyses provides "no information" was unfair. I agree that this is a useful way to link model parameters with behavior, and they should remain in the paper. However, my key objection still holds: these analyses do not tell us anything about how *people's* prior assumptions influence behavior. Instead, they tell us about how *fitted model parameters* depend on observed behavior. You can easily avoid this misreading by adding a small parenthetical, e.g.

      Thus, a prior assumption that control is likely available **(operationalized by \gamma_controllability)** was reflected in a futile investment of resources in uncontrollable environments.

      We thank the Reviewer for the suggestion and have added this parenthetical (lines 219, 225).

    1. eLife Assessment

      This study provides valuable insights with solid evidence into altered tactile perception in a mouse model of ASD (Fmr1 mice), paralleling sensory abnormalities in Fragile X and autism. Its main strength lies in the use of a novel tactile categorization task and the careful dissection of behavioral performance across training and difficulty levels, suggesting that deficits may stem from an interaction between sensory and cognitive processes. However, while the experiments are well executed, the reported effects are subtle and sometimes non-significant. The interpretation of results may be over-extended given the nature of the data (solely behavioral) and the absence of mechanistic, causal, or computational approaches limits the strength of the broader conclusions. The work will be relevant to those interested in autism, cognition, and/or sensory processing.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.

      Strengths:

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism.

      Weaknesses:

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure).

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task.

      Strengths:

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands.

      Weaknesses:

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative. Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered. Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations.

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited.

      (3) Statistical analysis:

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on non-significant findings undermines confidence in the conclusions.

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations.

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as t-tests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test.

      (4) Emphasis on theoretical models:

      The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed.

    4. Reviewer #3 (Public review):

      Summary:

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice

      Strengths:

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD.

      Weaknesses:

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.  

      We appreciate the reviewer’s statement highlighting the importance of our study. 

      Strengths: 

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism. 

      We thank the reviewer for recognizing the quality of our experiments and the relevance of our findings for understanding tactile perception and cognition in autism.

      Weaknesses: 

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure). 

      We thank the reviewer for the helpful comments. We understand that the analyses were difficult to follow, and we will work on the clarity of the Results section. However, we would like to emphasize that every d′ measure is accompanied by analyses of response rates (i.e., correct and incorrect choice rates). In addition, we applied standard psychometric analyses whenever possible. Specifically, psychometric functions were fitted to the data using logistic regression. We will rework the text to clarify these points.

      During training, only two stimulus amplitudes were presented, which precluded the construction of psychometric curves. For the categorization task, however, psychometric analyses were feasible and conducted (Figure 2). These analyses revealed no evidence of categorization bias (as measured by threshold) or accuracy (as measured by the slope) across stimulus strengths.

      The calculation of d’ is included in the Methods, but we will also report and explain its use in each part of the Results section where it has been included.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task. 

      Strengths: 

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands. 

      We thank the reviewer for emphasizing the strengths of our task design and analysis approach, and we appreciate that the potential of this platform for future mechanistic investigations is recognized.

      Weaknesses: 

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative. 

      We thank the reviewer for the careful reading of our manuscript and for the constructive feedback. The reviewer raises a valid point. We agree that our study is primarily descriptive and focused on behavioral data, and we appreciate the opportunity to clarify the scope and interpretation of our findings. Our primary goal was to characterize behavioral patterns during tactile discrimination and categorization, and the psychometric analyses were intended to provide a detailed description of these patterns. We do not claim to provide direct neural, causal, or computational evidence. 

      Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered. 

      Alternative explanations of our findings, such as differences in motivation, fatigue, satiety, stereotyped licking, and reward valuation have indeed been considered. We will revise the manuscript to present these points more clearly. 

      Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      This was not done intentionally. We do not claim to have tested the Load Theory; rather, inspired by it, we assessed behavioral patterns in our tactile categorization task. We agree that referring to the Adaptive Resonance Theory, which is based on artificial neural network models, might be misleading since we focus on behavioral results, and we will revise the text accordingly. However, our task allowed us to examine the impact of categorization on discrimination, confirming that Fmr1<sup>-/y</sup>ation can amplify perceptual differences between stimuli belonging to different categories and reduce perceived differences within a category in WT mice but not in the mice when low-salience stimuli were experienced. Finally, we do not claim to have tested the Weak Central Coherence theory, although our results suggest reduced use of categories in low-salience tactile discrimination. 

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations. 

      We agree with the reviewer that our current experiments are behavioral in nature and do not provide direct mechanistic evidence for top-down pathway dysfunction. Our goal was to carefully characterize tactile responses and behavioral patterns in Fmr1<sup>-/y</sup> mice. The notion of “top-down” is used at the behavioral level, referring to the influence of higher-level cognitive processes (e.g., categorization, attention) on perception, rather than to underlying neural circuits. We will revise the manuscript to more clearly emphasize that our conclusions are based on behavioral observations, and we will frame mechanistic inferences as hypotheses rather than established findings. We will also explicitly note that future work using neural recordings or causal manipulations will be required to directly test these hypotheses.

      We also note that identifying the precise top-down circuits involved will require extensive additional experimentation. For example, one would first need to pinpoint the specific top-down pathway that modulates the influence of categorization on discrimination without directly altering categorization itself. After such a circuit is identified, further work would then be needed to rescue or manipulate this pathway in the Fmr1<sup>-/y</sup> model. These steps represent a substantial program of mechanistic research that, while important, goes well beyond the scope of the present study.

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited. 

      We recognize that “reduced top-down categorization influence” and “choice consistency bias” are based on behavioral observations. However, we respectfully disagree that this makes these constructs inherently speculative. Similar behavioral inferences have been applied in previous clinical studies to characterize cognitive tendencies (Soulières et al., 2007; Feigin et al., 2021). The translational impact of our work lies in the highly translational platform we have developed – and in highlighting the complexity of tactile measures and additional analyses that can be conducted in clinical studies.

      We agree with the reviewer that the neural-based experiments would indeed provide valuable mechanistic insight into our observed behavioral alterations, and we believe future studies should therefore focus on their underlying neurobiological substrate.

      We will revise the language throughout the manuscript to clarify that all conclusions are based on behavioral measures.  

      (3) Statistical analysis: 

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on nonsignificant findings undermines confidence in the conclusions.  

      Several trends are evident in complex measures, such as d’ analyses on task sensitivity or responses pooled across different amplitudes. Additional analyses revealed which component of these measures showed a statistically significant difference across genotypes, namely the low-salience incorrect choices accounting for low task sensitivity. We chose to present all analyses to be transparent and to highlight that commonly used complex measures (like d’ analyses) may mask important findings. In the text, we described p-values between 0.05 and 0.1 as observed trends without over-interpreting their significance. 

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations. 

      The number of mice used in each genotype group is consistent with standard practices in behavioral studies using mice and sensory tasks. We have performed effect size measures (e.g., Cohen’s d) alongside some of the statistical comparisons, showing a medium effect size (>0.5). 

      As the reviewer correctly noted, no mice were excluded based on outlier analyses, since the observed variability reflects true biological differences rather than experimental or technical errors. We will reexamine our dataset for potential outliers. If any are identified, we will perform analyses both with and without the outlier and report any effects that are sensitive to single animals. These procedures and results will be explicitly described in the Methods and Results sections.

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.  

      We thank the reviewer for raising this important point and we will include a clear statement on multiple comparisons in the Methods section. 

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as ttests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test. 

      We thank the reviewer for raising this point. This was not done intentionally. A repeated-measures ANOVA on miss rates for low-salience stimuli during categorization confirmed that there are statistically significant differences both across stimulus amplitudes and between genotypes. Additional correction for multiple comparisons will be performed and explained in the Methods section.  

      (4) Emphasis on theoretical models: The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed. 

      As mentioned above, our goal was not to directly test these theories but rather to apply them within our translational framework. The Discussion section will be reframed to highlight that our findings are consistent with predictions from certain cognitive theories rather than implying that these frameworks were directly tested.

      Reviewer #3 (Public review): 

      Summary: 

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice 

      Strengths: 

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD. 

      We appreciate the reviewer’s positive assessment of our study’s translational value and the importance of our behavioral findings.

      Weaknesses: 

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.  

      We thank the reviewer for these helpful suggestions. We agree that visualizing behavioral patterns, such as raster and density plots of licks, as well as learning rate over time, could provide additional insights into learning dynamics. This analysis will be conducted and added into the revised manuscript.

      There was no assessment of reversal learning in Fmr1<sup>-/y</sup> mice in this study. While it is an interesting and important question based on previous findings in preclinical and clinical studies, it falls outside the scope of the current manuscript.    

      Feigin H, Shalom-Sperber S, Zachor DA, Zaidel A (2021) Increased influence of prior choices on perceptual decisions in autism. Elife 10.

      Soulières I, Mottron L, Saumier D, Larochelle S (2007) At ypical categorical perception in autism: Autonomy of discrimination? J Autism Dev Disord 37:481–490.

    1. eLife Assessment

      This important study reports an endometrial organoid culture system mimicking the window of implantation. The evidence supporting the conclusion drawn is convincing. The data will be of interest to embryologists and investigators working on reproductive biology and medicine.

    2. Reviewer #1 (Public review):

      Summary:

      This study generated 3D cell constructs from endometrial cell mixtures that were seeded in the Matrigel scaffold. The cell assemblies were treated with hormones to induce a "window of implantation" (WOI) state. Although many bioinformatic analyses point in this direction, there are major concerns that must be addressed.

      Strengths:

      The addition of 3 hormones to enhance the WOI state (although not clearly supported in comparison to the secretory state).

      Comments on revisions:

      The authors did their best to revise their study according to the Reviewers' comments. However, the study remains unconvincing, incomplete and at the same time still too dense and not focused enough.

    3. Reviewer #2 (Public review):

      Zhang et al. have developed an advanced three-dimensional culture system of human endometrial cells, termed a receptive endometrial assembloid, that models the uterine lining during the crucial window of implantation (WOI). During this mid-secretory phase of the menstrual cycle, the endometrium becomes receptive to an embryo, undergoing distinctive changes. In this work, endometrial cells (epithelial glands, stromal cells, and immune cells from patient samples) were grown into spheroid assembloids and treated with a sequence of hormones to mimic the natural cycle. Notably, the authors added pregnancy-related factors (such as hCG and placental lactogen) on top of estrogen and progesterone, pushing the tissue construct into a highly differentiated, receptive state. The resulting WOI assembloid closely resembles a natural receptive endometrium in both structure and function. The cultures form characteristic surface structures like pinopodes and exhibit abundant motile cilia on the epithelial cells, both known hallmarks of the mid-secretory phase. The assembloids also show signs of stromal cell decidualization and an epithelial mesenchymal transition, like process at the implantation interface, reflecting how real endometrial cells prepare for possible embryo invasion.

      Although the WOI assembloid represents an important step forward, it still has limitations: the supportive stromal and immune cell populations decrease over time in culture, so only early-passage assembloids retain full complexity. Additionally, the differences between the WOI assembloid and a conventional secretory-phase organoid are more quantitative than absolute; both respond to hormones and develop secretory features, but the WOI assembloid achieves a higher degree of differentiation due to the addition of "pregnancy" signals. Overall, while it's a reinforced model (not an exact replica of the natural endometrium), it provides a valuable in vitro system for implantation studies and testing potential interventions, with opportunities to improve its long-term stability and biological fidelity in the future.

    1. eLife Assessment

      This study presents a valuable finding on whether executive resources mediate the impact of language predictability in reading in the context of aging. The presentation of evidence is incomplete; further conceptual clarifications, methodological details, and addressing potential confounds would strengthen the study. The work will be of interest to cognitive neuroscientists working on reading, language comprehension, and executive control.

    2. Reviewer #1 (Public review):

      This manuscript reports a dual-task experiment intended to test whether language prediction relies on executive resources, using surprisal-based measures of predictability and an n-back task to manipulate cognitive load. While the study addresses a question under debate, the current design and modeling framework fall short of supporting the central claims. Key components of cognitive load, such as task switching, word prediction vs integration, are not adequately modeled. Moreover, the weak consistency in replication undermines the robustness of the reported findings. Below unpacks each point.

      Cognitive load is a broad term. In the present study, it can be at least decomposed into the following components:

      (1) Working memory (WM) load: news, color, and rank.

      (2) Task switching load: domain of attention (color vs semantics), sensorimotor rules (c/m vs space).

      (3) Word comprehension load (hypothesized against): prediction, integration.

      The components of task switching load should be directly included in the statistical models. Switching of sensorimotor rules may be captured by the "n-back reaction" (binary) predictor. However, the switching of attended domains and the interaction between domain switching and rule complexity (1-back or 2-back) were not included. The attention control experiment (1) avoided useful statistical variation from the Read Only task, and (2) did not address interactions. More fundamentally, task-switching components should be directly modeled in both performance and full RT models to minimize selection bias. This principle also applies to other confounding factors, such as education level. While missing these important predictors, the current models have an abundance of predictors that are not so well motivated (see later comments). In sum, with the current models, one cannot determine whether the reduced performance or prolonged RT was due to affecting word prediction load (if it exists) or merely affecting the task switching load.

      The entropy and surprisal need to be more clearly interpreted and modeled in the context of the word comprehension process. The entropy concerns the "prediction" part of the word comprehension (before seeing the next word), whereas surprisal concerns the "integration" part as a posterior. This interpretation is similar to the authors writing in the Introduction that "Graded language predictions necessitate the active generation of hypotheses on upcoming words as well as the integration of prediction errors to inform future predictions [1,5]." However, the Results of this study largely ignored entropy (treating it as a fixed effect) and only focus on surprisal without clear justification.

      In Table S3, with original and replicated model fitting results, the only consistent interaction is surprisal x age x cognitive load [2-back vs. Reading Only]. None of the two-way interactions can be replicated. This is puzzling and undermines the robustness of the main claims of this paper.

    3. Reviewer #2 (Public review):

      Summary:

      This paper considers the effects of cognitive load (using an n-back task related to font color), predictability, and age on reading times in two experiments. There were main effects of all predictors, but more interesting effects of load and age on predictability. The effect of load is very interesting, but the manipulation of age is problematic, because we don't know what is predictable for different participants (in relation to their age). There are some theoretical concerns about prediction and predictability, and a need to address literature (reading time, visual world, ERP studies).

      Strengths/weaknesses

      It is important to be clear that predictability is not the same as prediction. A predictable word is processed faster than an unpredictable word (something that has been known since the 1970/80s), e.g., Rayner, Schwanenfluegel, etc. But this could be due to ease of integration. I think this issue can probably be dealt with by careful writing (see point on line 18 below). To be clear, I do not believe that the effects reported here are due to integration alone (i.e., that nothing happens before the target word), but the evidence for this claim must come from actual demonstrations of prediction.

      The effect of load on the effects of predictability is very interesting (and also, I note that the fairly novel way of assessing load is itself valuable). Assuming that the experiments do measure prediction, it suggests that they are not cost-free, as is sometimes assumed. I think the researchers need to look closely at the visual world literature, most particularly the work of Huettig. (There is an isolated reference to Ito et al., but this is one of a large and highly relevant set of papers.)

      There is a major concern about the effects of age. See the Results (161-5): this depends on what is meant by word predictability. It's correct if it means the predictability in the corpus. But it may or may not be correct if it refers to how predictable a word is to an individual participant. The texts are unlikely to be equally predictable to different participants, and in particular to younger vs. older participants, because of their different experiences. To put it informally, the newspaper articles may be more geared to the expectations of younger people. But there is also another problem: the LLM may have learned on the basis of language that has largely been produced by young people, and so its predictions are based on what young people are likely to say. Both of these possibilities strike me as extremely likely. So it may be that older adults are affected more by words that they find surprising, but it is also possible that the texts are not what they expect, or the LLM predictions from the text are not the ones that they would make. In sum, I am not convinced that the authors can say anything about the effects of age unless they can determine what is predictable for different ages of participants. I suspect that this failure to control is an endemic problem in the literature on aging and language processing and needs to be systematically addressed.

      Overall, I think the paper makes enough of a contribution with respect to load to be useful to the literature. But for discussion of age, we would need something like evidence of how younger and older adults would complete these texts (on a word-by-word basis) and that they were equally predictable for different ages. I assume there are ways to get LLMs to emulate different participant groups, but I doubt that we could be confident about their accuracy without a lot of testing. But without something like this, I think making claims about age would be quite misleading.

    4. Author response:

      Reviewer #1 (Public review):

      Cognitive Load and Task-Switching Components:

      We agree that cognitive load is multi-faceted and encompasses dimensions not fully captured in our present models, including domain and rule switching. For the revision, we will explicitly model these components in the statistical analyses by incorporating predictors reflecting attended domain switching and rule complexity, as suggested. We will also explain our inclusion of n-back reaction predictors and justify their relationship with theoretical constructs of executive function. Full details of coding schemes will be provided.

      Modeling Entropy and Surprisal:

      We appreciate the reviewer’s suggestion to further explain the distinction between entropy (predictive uncertainty) and surprisal (integration difficulty), and acknowledge that our treatment of entropy warrants extension. In the revision, we will expand the results and discussion on entropy, providing clearer theoretical motivation for its inclusion and conducting supplementary analyses to examine its role alongside surprisal.

      Replicability of Findings:

      We note the concern regarding two-way vs. three-way interactions in model replication. In the revised manuscript, we will report robustness analyses on subsets of our data (e.g., matched age and education groups), clarify degrees of freedom and group sizes, and transparently report any discrepancies.

      Predictors and Statistical Modeling:

      We will add clarifications on predictor selection, data structure, and rationale for model hierarchy. The functions of d-prime, comprehension accuracy, and performance modeling will be described in more detail, including discussion of block-level vs. participant-level effects.

      Reviewer #2 (Public review):

      Distinction Between Prediction and Predictability:

      We recognize the importance of clearly communicating the difference between prediction and predictability, as well as integration-based vs. prediction-based effects. We will clarify these distinctions throughout the introduction, methods, and discussion sections, citing the relevant theoretical literature (e.g., Pickering & Gambi 2018; Federmeier 2007; Staub 2015; Frisson 2017).

      Aging, Corpus Predictability, and Individual Differences:

      We appreciate the critical point regarding age, corpus-based predictability, and potential cohort effects in language model estimates. In the revision, we will provide conceptual clarifications on how surprisal and entropy might differ for different age groups and discuss limitations in extrapolating these metrics to participant-specific predictions. The limitations inherent in relying on LLM-derived estimates and text materials will be more directly addressed.

      Coverage of Literature and Paradigms:

      We will broaden the literature review as requested, particularly on the N400 effects and behavioral traditions in prediction research. These additions should help contextualize the present work within both neuroscience and psycholinguistics.

      Experimental Context and Predictability Metrics:

      We will address concerns regarding the context window for prediction estimation, describing more precisely how context was defined and whether broader textual cues may improve predictability metrics.

      References

      Pickering, M.J. & Gambi, C. (2018). Predicting while comprehending language: A theory and review. Psychol. Bull., 144(10), 1002–1044.

      Federmeier, K.D. (2007). Thinking ahead: The role and roots of prediction in language comprehension. Psychophysiology, 44(4), 491–505.

      Frisson, S. (2017). Can prediction explain the lexical processing advantage for short words? J. Mem. Lang., 95, 121–138.\

      Staub, A. (2015). The effect of lexical predictability on eye movements in reading: Critical review and theoretical interpretation. Lang. Linguist. Compass, 9(8), 311–327.Huettig, F. & Mani, N. (2016). Is prediction necessary to understand language? Probably not. Trends Cogn. Sci., 20(10), 484–492.We appreciate the reviewers’ constructive comments and believe their suggestions will meaningfully strengthen the paper. Our planned revisions will address each of the above points with additional analyses, clarifications, and expanded discussion.

    1. eLife Assessment

      This study used a conditional knockout mouse line to remove Ptbp1 in retinal progenitors and showed that its deletion has no effect on retinal neurogenesis or cell fate specification, thereby challenging the prevailing view of Ptbp1 as a master regulator of neuronal fate. The findings are convincing, supported by transcriptome analysis, histology, and proliferation assays. This study is important, though the genetic tools employed may not fully capture Ptbp1's potential role during the earliest stages of retinal development.

    2. Reviewer #1 (Public review):

      Summary:

      The researchers sought to determine whether Ptbp1, an RNA-binding protein formerly thought to be a master regulator of neuronal differentiation, is required for retinal neurogenesis and cell fate specification. They used a conditional knockout mouse line to remove Ptbp1 in retinal progenitors and analyzed the results using bulk RNA-seq, single-cell RNA-seq, immunohistochemistry, and EdU labeling. Their findings show that Ptbp1 deletion has no effect on retinal development, since no defects were found in retinal lamination, progenitor proliferation, or cell type composition. Although bulk RNA-seq indicated changes in RNA splicing and increased expression of late-stage progenitor and photoreceptor genes in the mutants, and single-cell RNA-seq detected relatively minor transcriptional shifts in Müller glia, the overall phenotypic impact was low. As a result, the authors conclude that Ptbp1 is not required for retinal neurogenesis and development, thus contradicting prior statements about its important role as a master regulator of neurogenesis. They argue for a reassessment of this stated role. While the findings are strong in the setting of the retina, the larger implications for other areas of the CNS require more investigation. Furthermore, questions about potential reimbursement from Ptbp2 warrant further research.

      Strengths:

      This study calls into doubt the commonly held belief that Ptbp1 is a critical regulator of neurogenesis in the CNS, particularly in retinal development. The adoption of a conditional knockout mouse model provides a reliable way for eliminating Ptbp1 in retinal progenitors while avoiding the off-target effects often reported in RNAi experiments. The combination of bulk RNA-seq, scRNA-seq, and immunohistochemistry enables a thorough examination of molecular and cellular alterations at both embryonic and postnatal stages, which strengthens the study's findings. Furthermore, using publicly available RNA-Seq datasets for comparison improves the investigation of splicing and expression across tissues and cell types. The work is well-organized, with informative figure legends and supplemental data that clearly show no substantial phenotypic changes in retinal lamination, proliferation, or cell destiny, despite identified transcriptional and splicing modifications.

      Weaknesses:

      The retina-specific method raises questions regarding whether Ptbp1 is required in other CNS locations where its neurogenic roles were first proposed. The claim that Ptbp1 is "fully dispensable" for retinal development may be toned down, given the transcriptional and splicing modifications identified. The possibility of subtle or transitory impacts, such as ectopic neuron development followed by cell death, is postulated, but not completely investigated. Furthermore, as the authors point out, the compensating potential of increased Ptbp2 warrants additional exploration. Although the study performs well in transcriptome and histological analyses, it lacks functional assessments (such as electrophysiological or behavioral testing) to determine if small changes in splicing or gene expression affect retinal function. While 864 splicing events have been found, the functional significance of these alterations, notably the 7% that are neuronal-enriched and the 35% that are rod-specific, has not been thoroughly investigated. The manuscript might be improved by describing how these splicing changes affect retinal development or function.

    3. Reviewer #2 (Public review):

      Summary:

      Ptbp1 has been proposed as a key regulator of neuronal fate through its role in repressing neurogenesis. In this study, the authors conditionally inactivated Ptbp1 in mouse retinal progenitor cells using the Chx10-Cre line. While RNA-seq analysis at E16 revealed some changes in gene expression, there were no significant alterations in retinal cell type composition, and only modest transcriptional changes in the mature retina, as assessed by immunofluorescence and scRNAseq. Based on these findings, the authors conclude that Ptbp1 is not essential for cell fate determination during retinal development.

      Strengths:

      Despite some effects of Ptbp1 inactivation (initiated around E11.5 with the onset of Chx10-Cre activity) on gene expression and splicing, the data convincingly demonstrate that retinal cell type composition remains largely unaffected. This study is highly significant since it challenges the prevailing view of Ptbp1 as a central repressor of neurogenesis and highlights the need to further investigate, or re-evaluate, its role in other model systems and regions of the CNS.

      Weaknesses:

      A limitation of the study is the use of the Chx10-Cre driver, which initiates recombination around E11. This timing does not permit assessment of Ptbp1 function during the earliest phases of retinal development, if expressed at that time.

    1. eLife Assessment

      This valuable study presents a mechanistic model of predictive coding by medial entorhinal cortex grid cells, implemented with biologically detailed conductance-based neurons. The evidence supporting the emergence of this coding scheme from specific membrane currents and the anatomical connectivity among inhibitory neurons is solid. However, the justification for the choice of connectivity patterns and other network parameters remains somewhat incomplete. This work will be of interest to neuroscientists working on spatial navigation, circuit dynamics, and neuronal coding.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors aim to elucidate the mechanisms by which grid cells in the medial entorhinal cortex generate predictive representations of spatial location. To address this, they built a computational model integrating intrinsic neuronal dynamics with structured network connectivity. Specifically, they combine a conductance-based single-cell model incorporating biologically realistic HCN channels with a continuous attractor network that reflects known properties of grid cell circuitry. Their simulations show that HCN conductance can shift grid fields forward by approximately 5% of their diameter, consistent with experimental observations in layer II grid cells. Additionally, by introducing asymmetry in the connectivity of interneurons, the model produces larger forward shifts, which parallel properties observed in layer III grid cells. Together, these two mechanisms provide a unified framework for explaining layer-specific predictive coding in the entorhinal cortex.

      Strengths:

      A major strength of the study lies in its conceptual contribution. The authors propose two distinct mechanisms to generate forward-shifted grid fields for predictive coding. One mechanism is intrinsic and depends on the time constants associated with HCN channels. The other is network-based and results from asymmetries in interneuron connectivity. These two mechanisms correspond to different observed properties of grid cells in layer II and layer III, respectively. The modeling is based on previously validated frameworks of continuous attractor network models (e.g., Burak & Fiete; Kang & DeWeese), but it incorporates several novel features, including the incorporation of biophysically realistic HCN channels, a network architecture that excludes stellate-stellate connections and relies on interneurons, and asymmetric interneuron connectivity.

      Weaknesses:

      One of the proposed mechanisms for predictive coding, namely asymmetric interneuron connectivity, is a novel idea. However, this type of connectivity has not yet been demonstrated experimentally in the medial entorhinal cortex. Therefore, the biological plausibility of this mechanism remains uncertain and will need to be evaluated in future empirical studies.

    3. Reviewer #2 (Public review):

      Summary:

      This study proposes that predictive spatial representations in medial entorhinal cortex (MEC) grid cells arise through two distinct biophysical mechanisms: (1) HCN conductance-dependent temporal dynamics, which generate modest forward shifts (~5% of grid field diameter) in Layer II cells, and (2) network asymmetry, enabling larger predictive shifts (~25% of grid field diameter) in Layer III cells. The model further predicts a dorsoventral gradient in predictive coding magnitude, correlating with observed HCN conductance variations. These results provide a mechanistic framework for understanding how intrinsic cellular properties and circuit architecture collectively enable prospective spatial coding in the MEC. This is an important study.

      Strengths:

      These findings reveal how cellular properties and circuit design enable prospective spatial coding. This novel, impactful study will be of interest to the field.

      Weaknesses:

      Some of the models are too mathematical and do not fit with the biological observation.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Shaikh and Assisi addresses a timely and important question related to the neural circuit mechanisms underlying spatial representations during navigation. Concretely, they present a model of the medial entorhinal cortex (MEC) with biophysically detailed conductance-based stellate cells that can perform path integration and reveal two potential mechanisms underlying two forms of predictive coding by grid cells in the MEC. One mechanism uses HCN channels to explain predictive coding in MEC layer II grid cells equivalent to ~5% of the diameter of a grid field, and the other uses asymmetric connections between interneurons and stellate cells, resulting in a ~25% predictive bias of layer III grid cells. The methods and model are technically sound, and the model is expected to be useful for computational neuroscientists studying the neural mechanisms of spatial navigation.

      Strengths:

      One strength of the model is its use of conductance-based neuron models of stellate cells and interneurons, adding important biophysical constraints and details to existing continuous attractor network models of grid cells. The model fills a gap in the literature by providing mechanisms for predictive coding constrained by biophysical properties of stellate cells and simplified network topology.

      Weaknesses:

      A weakness of the model is that the neural network is relatively small (five sheets with 71 × 71 neurons each), and the 2-D toroidal topology is further simplified to a 1-D ring attractor consisting of three rings with 192 neurons each. The model incorporates biophysical detail at the single-neuron level, but not at the network level. For example, it includes only stellate cells and a generic interneuron type, and does not implement data-driven connectivity patterns.

      The restricted network size and the limited experimental knowledge about connectivity among stellate cells, principal cells, and different interneuron types in the MEC could be addressed in more detail. Moreover, the manuscript lacks a thorough discussion of assumptions common to most continuous attractor network (CAN) models of grid cells, such as the use of "hand-crafted" connections between direction-sensitive conjunctive grid cells and network cells to drive attractor shifts. Including such a discussion would strengthen the manuscript. This is especially relevant given the authors' explicit claim that they have revealed two mechanisms underlying the emergence of a predictive code in the MEC. In this reviewer's view, the work demonstrates a potential mechanism, but one that requires experimental verification. The significance of the model would thus be increased by providing more experimentally testable predictions of the model.

    1. eLife Assessment

      This fundamental study shows how past experiences shape perception across short, medium, and long time scales, using a single behavioural paradigm and reanalysed EEG data. It provides convincing evidence for two processes across all scales: an attention-dependent mechanism that speeds responses to expected events, and an attention-independent mechanism where expected events are encoded less precisely, consistent with feedforward dampening. The work offers a unifying account of temporal context effects, though stronger brain-behaviour links, integration with serial dependence attraction and repulsion models, and extension to other timescale definitions would further strengthen the contribution.

    2. Reviewer #1 (Public review):

      Summary:

      This paper addresses an important and topical issue: how temporal context - at various time scales - affects various psychophysical measures, including reaction times, accuracy and localization. It offers interesting insights, with separate mechanisms for different phenomena, which are well discussed.

      Strengths:

      The paradigm used is original and effective. The analyses are rigorous.

      Comments on revised version:

      I think the authors have dealt adequately with my issues, none of which were fundamental.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the influence of prior stimuli over multiple time scales in a position discrimination task, using pupillometry data and a reanalysis of EEG data from an existing dataset. The authors report consistent history-dependent effects across task-related, task-unrelated, and stimulus-related dimensions, observed across different time scales. These effects are interpreted as reflecting a unified mechanism operating at multiple temporal levels, framed within predictive coding theory.

      Strengths:

      The authors have done a good job in their revision, clarifying important points and stating the limitations of the study clearly.

      I also think they made a valid effort to address and correct issues arising from the temporal dependency confound, although I still wonder whether the best approach would have been to design an experiment in a way that avoided this confound in the first place.<br /> Overall, this is a substantially improved version, and I particularly appreciate the clarification and correction regarding the direction of the bias in the EEG data (repulsive rather than attractive).

      Weaknesses:

      These are now relatively minor points.

      I believe this latter aspect, the repulsive bias, may deserve further discussion, especially in relation to their behavioral findings and, in particular, to earlier work proposing multi-stage frameworks of serial dependence, where low-level repulsion interacts with attractive biases at higher-level stages (Fritsche et al., 2020; Pascucci et al., 2019; Sheehan & Serences, 2022). The authors may also consider to cite some key reviews on serial dependence that discuss both repulsion and attraction in forced-choice and reproduction tasks (Manassi et al., 2023; Pascucci et al., 2023).

      Related to this, after finding the opposite pattern, is the sentence in line 472-473 ("Further, we found an attractive...") and the related argument still valid?

      Regarding my earlier point about former line 197 and Figure 3b,c: what I noticed-similar to the patterns reported in the studies I referenced-is that the data cannot be simply described as showing faster and more accurate responses for small deltas. Responses also appear faster and more accurate for very large deltas, with performance being worse in between. Indeed, as the authors state: "The peak in precision for large Deltas locations is consistent with alternate events being encoded more precisely, while the peak for small offsets may be explained by the attractive bias towards the previous target." I wonder whether it is necessary, or unequivocally supported by the data, to hypothesize two separate mechanisms here. An alternative could be interference effects between consecutive stimuli that are neither identical nor completely different-making the previous one more likely to interfere with the current stimulus representation.

      Finally, this is definitely a minor point, but I still find the reply to my comment about the prediction of stable retinal input rather speculative. Such a prediction would seem more plausible in world-centered coordinates.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The manuscript is quite dense, with some concepts that may prove difficult for the non-specialist. I recommend spending a few more words (and maybe some pictures) describing the difference between task-relevant and task-irrelevant planes. Nice technique, but not instantly obvious. Then we are hit with "stimulus-related", which definitely needs some words (also because it is orthogonal to neither of the above). 

      We agree that the original description of the planes was too terse and have expanded on this in the revised manuscript.

      Line 85 - To test the influence of attention, trials were sorted according to two spatial reference planes, based on the location of the stimulus: task-related and task-unrelated (Fig. 1b). The task-related plane corresponded to participants’ binary judgement (Fig 1b, light cyan vertical dashed line) and the task-unrelated plane was orthogonal to this (Fig 1b, dark cyan horizontal dashed line). For example, if a participant was tasked with performing a left-or-right of fixation judgement, then their task-related plane was the vertical boundary between the left and right side of fixation, while their task-unrelated plane was the horizontal boundary. The former (left-right) axis is relevant to their task while the latter (top-bottom) axis is orthogonal and task irrelevant. This orthogonality can be leveraged to analyze the same data twice (once according to the task-related plane and again according to the taskunrelated plane) in order to compare performance when the relative location of an event is either task relevant or irrelevant.

      Line 183 - whereas task planes were constant, the stimulus-related plane was defined by the location of the stimulus on the previous trial, and thus varied from trial to trial. That is, on each trial, the target is considered a repeat if it changes location by <|90°| relative to its location on the previous trial, and an alternate if it moves by >|90°|.

      (2) While I understand that the authors want the three classical separations, I actually found it misleading. Firstly, for a perceptual scientist to call intervals in the order of seconds (rather than milliseconds), "micro" is technically coming from the raw prawn. Secondly, the divisions are not actually time, but events: micro means one-back paradigm, one event previously, rather than defined by duration. Thirdly, meso isn't really a category, just a few micros stacked up (and there's not much data on this). And macro is basically patterns, or statistical regularities, rather than being a fixed time. I think it would be better either to talk about short-term and long-term, which do not have the connotations I mentioned. Or simply talk about "serial dependence" and "statistical regularities". Or both. 

      We agree that the temporal scales defined in the current study are not the only way one could categorize perceptual time. We also agree that by using events to define scales, we ignore the influence of duration. In terms of the categories, we selected these for two reasons: 1) they conveniently group previous phenomena, and 2) they loosely correspond to iconic-, short- and long-term memory. We agree that one could also potentially split it up into two categories (e.g., short- and long-term), but in general, we think any form of discretization will have limitations. For example, Reviewer 1 suggests that the meso category is simply a few micros stacked together. However, there is a rich literature on phenomena associated with sequences of an intermediate length that do not appear to be entirely explained by stacking micro effects (e.g., sequence learning and sequential dependency). We also find that when controlling for micro level effects, there are clear meso level effects. Also, by the logic that meso level effects are just stacked micro effects, one could also argue the same for macro effects. We don’t think this argument is incorrect, rather we think it exemplifies the challenge of discretising temporal scales. Ultimately, the current study was aimed to test whether seemingly disparate phenomena identified in previous work could be captured by unifying principles. To this end we found that these categories were the most useful. However, we have included a “Limitations and future directions” section in the Discussion of the revised manuscript that acknowledges both the alternative scheme proposed by Reviewer 1, and the value of extending this work to consider the influence of duration (as well as events).

      Line 488 - Limitations and future directions. One potential limitation of the current study is the categorization of temporal scales according to events, independent of the influence of event duration. While this simplification of time supports comparison between different phenomena associated with each scale (e.g., serial dependence, sequential dependencies, statistical learning), future work could investigate the role of duration to provide a more comprehensive understanding of the mechanisms identified in the current study.

      Related to this, while the temporal scales applied here conveniently categorized known sensory phenomena, and partially correspond to iconic-, short-, and long-term memory, they are but one of multiple ways to delineate time. For example, temporal scales could alternatively be defined simply as short- and long-term (e.g., by combining micro and meso scale phenomena). However, this could obscure meaningful differences between phenomena associated with sensory persistence and short-term memory, or qualitative differences in the way that shortsequences of events are processed.

      (3) More serious is the issue of precision. Again, this is partially a language problem. When people use the engineering terms "precision" and "accuracy" together, they usually use the same units, such as degrees. Accuracy refers to the distance from the real position (so average accuracy gives bias), and precision is the clustering around the average bias, usually measured as standard deviation. Yet here accuracy is percent correct: also a convention in psychology, but not when contrasting accuracy with precision, in the engineering sense. I suggest you change "accuracy" to "percent correct". On the other hand, I have no idea how precision was defined. All I could find was: "mixture modelling was used to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively". I do not know what that means.

      In the case of a binary decision, is seems reasonable to use the term “accuracy” to refer to the correspondence between the target state and the response on a task. However, we agree that while our (main) task is binary, the target is not and nor is the secondary task. We thank the reviewer for bringing this to our attention, as we agree that this will be a likely cause of confusion. To avoid confusion we have specifically referred to “task accuracy” throughout the revised manuscript.

      With regards to precision, our measure of precision is consistent with what Reviewer 1 describes as such, i.e., the clustering of responses. In particular, the von Mises distribution is essentially a Gaussian distribution in circular space, and the kappa parameter defines the width of the distribution, regardless of the mean, with larger values of kappa indicating narrower (more precise) distributions. We could have used standard deviation to assess precision; however, this would incorrectly combine responses on which participants failed to encode the target (e.g., because of a blink) and were simply guessing. To account for these trials, we applied mixture modelling of guess and genuine responses to isolate the precision of genuine responses, as is standard in the visual working memory literature. However, we agree that this was not sufficiently described in the original manuscript and have elaborated on this method in the revised version.

      Line 598 - From the reproduction task, we sought to estimate participant’s recall precision. It is likely that on some trials participants failed to encode the target and were forced to make a response guess. To isolate the recall precision from guess responses, we used mixture modelling to estimate the precision and guess rate of reproduction responses, based on the concentration (k) and height of von Mises and uniform distributions, respectively (Bays et al., 2009). The k parameter of the von Mises distribution reflects its width, which indicates the clustering of responses around a common location.

      (4) Previous studies show serial dependence can increase bias but decrease scatter (inverse precision) around the biased estimate. The current study claims to be at odds with that. But are the two measures of precision relatable? Was the real (random) position of the target subtracted from each response, leaving residuals from which the inverse precision was calculated? (If so, the authors should say so..) But if serial dependence biases responses in essentially random directions (depending on the previous position), it will increase the average scatter, decreasing the apparent precision. 

      Previous studies have shown that when serial dependence is attractive there is a corresponding increase in precision around small offsets from the previous item (citations). Indeed, attractive biases will lead to reduced scattering (increased precision) around a central attracter. Consistent with previous studies, and this rational, we also found an attractive bias coupled with increased precision. To clarify, for the serial dependency analysis, we calculated bias and precision by binning reproduction responses according to the offset between the current and previous target and then performing the same mixture modelling described above to estimate the mean (bias) and kappa (precision) parameters of the von Mises distribution fit to the angular errors. This was not explained in the original manuscript, so we thank Reviewer 1 for bringing this to our attention and have clarified the analysis in the revised version.

      Line 604 - For the serial dependency analysis, we calculated bias and precision by binning reproduction responses according to the angular offset between the current and previous target and then performing mixture modelling to estimate the mean (bias) and k (precision) parameters of the von Mises distribution.

      (5) I suspect they are not actually measuring precision, but location accuracy. So the authors could use "percent correct" and "localization accuracy". Or be very clear what they are actually doing. 

      As explained in our response to Reviewer 1’s previous comment, we are indeed measuring precision.

      Reviewer #2 (Public review):

      (1) The abstract should more explicitly mention that conclusions about feedforward mechanisms were derived from a reanalysis of an existing EEG dataset. As it is, it seems to present behavioral data only.

      It is not clear what relevance the fact that the data has been analyzed previously has to the results of the current study. However, we do think that it is important to be clear that the EEG recordings were collected separately from the behavioural and eyetracking data, so we have clarified this in the revised abstract.

      Line 7 - By integrating behavioural and pupillometry recordings with electroencephalographical recordings from a previous study, we identify two distinct mechanisms that operate across all scales.

      (2) The EEG task seems quite different from the others, with location and color changes, if I understand correctly, on streaks of consecutive stimuli shown every 100 ms, with the task involving counting the number of target events. There might be different mechanisms and functions involved, compared to the behavioral experiments reported. 

      As stated above, we agree that it is important that readers are aware that the EEG recordings were collected separately to the behavioural and eyetracking data. We were forthright about this in the original manuscript and how now clarified this in the revised abstract. We agree that collecting both sets of data in the same experiment would be a useful validation of the current results and have acknowledged this in a new Limitations and future directions section of the Discussion of the revised manuscript.

      Line 501 - Another limitation of the current study is that the EEG recordings were collected in the separate experiment to the behavioural and pupillometry data. The stimuli and task were similar between experiments, but not identical. For example, the EEG experiment employed coloured arc stimuli presented at a constant rate of ~3.3 Hz and participants were tasked with counting the number of stimuli presented at a target location. By contrast, in the behavioural experiment, participants viewed white blobs presented at an average rate of ~2.8 Hz and performed a binary spatial task coupled with an infrequent reproduction task. An advantage of this was that the sensory responses to stimuli in the EEG recordings were not conflated with motor responses; however, future work combining these measures in the same experiment would serve as a validation for the current results.

      (3) How is the arbitrary choice of restricting EEG decoding to a small subset of parieto-occipital electrodes justified? Blinks and other artifacts could have been corrected with proper algorithms (e.g., ICA) (Zhang & Luck, 2025) or even left in, as decoders are not necessarily affected by noise. Moreover, trials with blinks occurring at the stimulus time should be better removed, and the arbitrary selection of a subset of electrodes, while reducing the information in input to the decoder, does not account for trials in which a stimulus was missed (e.g., due to blinks).

      Electrode selection was based on several factors: 1) reduction of eye movement/blink artifacts (as noted in the original manuscript), 2) consistency with the previous EEG study (Rideaux, 2024) and other similar decoding studies (Buhmann et al., 2024; Harrison et al., 2023; Rideaux et al., 2023), 3) improved signal-to-noise by including only sensors that carry the most position information (as shown in Supplementary Figure 1a and the previous EEG study). We agree that this was insufficiently explained in the original manuscript and have clarified our sensor selection in the revised version.

      Line 631 - We only included the parietal, parietal-occipital, and occipital sensors in the analyses to i) reduce the influence of signals produced by eye movements, blinks, and non-sensory cortices, ii) for consistency with similar previous decoding studies (Buhmann et al., 2024; Rideaux, 2024; Rideaux et al., 2025), and iii) to improve decoding accuracy by restricting sensors to those that carried spatial position information (Supplementary Fig. 1a).

      (4) The artifact that appears in many of the decoding results is puzzling, and I'm not fully convinced by the speculative explanation involving slow fluctuations. I wonder if a different high-pass filter (e.g., 1 Hz) might have helped. In general, the nature of this artifact requires better clarification and disambiguation.

      We agree that the nature of this artifact requires more clarification and disambiguation. Due to relatively slow changes in the neural signal, which are not stimulus-related, there is a degree of temporal autocorrelation in the recordings. This can be filtered out, for example, by using a stricter high-pass filter; however, we tried a range of filters and found that a cut-off of at least 0.7 Hz is required to remove the artifact, and even a filter of 0.2 Hz introduces other (stimulus-related) artifacts, such as above-chance decoding prior to stimulus onset. These stimulus-related artifacts are due to the temporal smearing of data, introduced by the filtering, and have a more pronounced and complex influence on the results and are more difficult to remove through other means, such as the baseline correction applied in the original manuscript.

      The temporal autocorrelation is detected by the decoder during training and biases it to classify/decode targets that are presented nearby in time as similar. That is, it learns the neural pattern for a particular stimulus location based on the activity produced by the stimulus and the temporal autocorrelation (determined by slow stimulus unrelated fluctuations). The latter only accounts for a relatively smaller proportion of the variance in the neural recordings under normal circumstances and would typically go undetected when simply plotting decoding accuracy as a function of position. However, it becomes weakly visible when decoding accuracy is plotted as a function of distance from the previous target, as now the bias (towards temporally adjacent targets) aligns with the abscissa. Further, it becomes highly visible when the stimulus labels are shuffled, as now the decoder can only learn from the variance associated with the temporal autocorrelation (and not from the activity produced by the stimulus).

      In the linear discriminant analysis, this led to temporally proximal items being more likely to be classified as on the same side. This is why there is above-chance performance for repeat trials (Supplementary Figure 2b), and below-chance performance for alternate trials, even when the labels are shuffled – the temporal autocorrelation produces a general bias towards classifying temporally proximate stimuli as on the same side, which selectively improves the classification accuracy of repeat trials. Fortunately, the bias is relatively constant as a function of time within the epoch and is straightforward to estimate by shuffling the labels, which means that it can be removed through a baseline correction. However, to further demonstrate that the autocorrelation confound cannot account for the differences observed between repeat and alternate trials in the micro classification analysis, we now additionally show the results from a more strictly filtered version of the data (0.7 Hz). These results show a similar pattern as the original, with the additional stimulusrelated artifacts introduced by the strict filter, e.g., above chance decoding prior to stimulus onset.

      In the inverted encoding analysis, the same temporal autocorrelation manifests as temporally proximal trials being decoded as more similar locations. This is why there is increased decoding accuracy for targets with small angular offsets from the previous target, even when the labels are shuffled (Supplementary Figure 3c), because it is on these trials that the bias happens to align with the correct position. This leads to an attractive bias towards the previous item, which is most prominent when the labels are shuffled.

      To demonstrate the phenomenon, we simulated neural recordings from a population of tuning curves and performed the inverted encoding analysis on a clean version of the data and a version in which we introduced temporal autocorrelation. We then repeated this after shuffling the labels. The simulation produced very similar results to those we observed in the empirical data, with a single exception: while precision in the simulated shuffled data was unaffected by autocorrelation, precision in the unshuffled data was clearly affected by this manipulation. This may explain why we did not find a correlation between the shuffled and unshuffled precision in the original manuscript. 

      These results echo those from the classification analysis, albeit in a more continuous space. However, whereas in the classification analysis it was straightforward to perform a baseline correction to remove the influence of general temporal dependency, the more complex nature of the accuracy, precision, and bias parameters over the range of time and delta location makes this approach less appropriate. For example, the bias in the shuffled condition ranged from -180 to 180 degrees, which when subtracted from the bias in the unshuffled condition would produce an equally spurious outcome, i.e., the equal opposite of this extreme bias. Instead for the inverted encoding analysis, we used the data high-pass filtered at 0.7 Hz. As with the classification analysis, this removed the influence of general temporal dependencies, as indicated by the results of the shuffled data analysis (Supplementary Figure 3f), but it also temporally smeared the stimulus-related signal, resulting in above chance decoding accuracy prior to stimulus onset (Supplementary Figure 3d). However, given thar we were primarily interested in the pattern of accuracy, precision, and bias as a function of delta location, and less concerned with the precise temporal dynamics of these changes, which appeared relatively stable in the filtered data. Thus, this was the more suitable approach to removing the general temporal dependencies in the inverted encoding analysis and the one that is presented in Figure 3.

      We have updated the revised manuscript in light of these changes, including a fuller description of the artifact and the results from the abovementioned control analyses.

      Figure 3 updated.

      Figure 3 caption - e) Decoding accuracy for stimulus location, from reanalysis of previously published EEG data (17). Inset shows the EEG sensors included in the analysis (blue dots), and black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). f) Decoding accuracy for location, as a function of time and D location. Bright colours indicate higher decoding accuracy; absolute accuracy values can be inferred from (e). g-i) Average location decoding  (g) accuracy, (h) precision, and (h) bias from 50 – 500 ms following stimulus onset. Horizontal bar in (e) indicates cluster corrected periods of significance; note, all time points were significantly above chance due to temporal smear introduced by strict high-pass filtering (see Supplementary Figure 3 for full details). Note, the temporal abscissa is aligned across (e & f). Shaded regions indicate ±SEM.

      Line 218 - To further investigate the influence of serial dependence, we applied inverted encoding modelling to the EEG recordings to decode the angular location of stimuli. We found that decoding accuracy of stimulus location sharply increased from ~60 ms following stimulus onset (Fig. 3e). Note, to reduce the influence of general temporal dependencies, we applied a 0.7 Hz high-pass filter to the data, which temporally smeared the stimulus-related information, resulting in above chance decoding accuracy prior to stimulus presentation (for full details, see Supplementary Figure 3). To understand how serial dependence influences the representation of these features, we inspected decoding accuracy for location as a function of both time and D location (Fig. 3f). We found that decoding accuracy varied depending not only as a function of time, but also as a function of D location. To characterise this relationship, we calculated the average decoding accuracy from 50 ms until the end of the epoch (500 ms), as a function of D location (Fig. 3g). This revealed higher accuracy for targets with larger D location. We found a similar pattern of results for decoding precision (Fig. 3h). These results are consistent with the micro temporal context (behavioural) results, showing that targets that alternated were recalled more precisely. Lastly, we calculated the decoding bias as a function of D location and found a clear repulsive bias away from the previous item (Fig. 3i). While this result is inconsistent with the attractive behavioural bias, it is consistent with recent studies of serial dependence suggesting an initial pattern of repulsion followed by an attractive bias during the response period (20–22).

      Line 726 - As shown in Supplementary Figure 3, we found the same general temporal dependencies in the decoding accuracy computed using inverted encoding that were found using linear discriminant classification. However, as a baseline correction would not have been appropriate or effective for the parameters decoded with this approach, we instead used a high-pass filter of 0.7 Hz to remove the confound, while being cautious about interpreting the timing of effects produced by this analysis due to the temporal smear introduced by the filter.

      Supplementary Figure 2 updated.

      Supplementary Figure 2 caption - Removal of general micro temporal dependencies in EEG responses. We found that there were differences in classification accuracy for repeat and alternate stimuli in the EEG data, even when stimulus labels were shuffled. This is likely due to temporal autocorrelation within the EEG data due to low frequency signal changes that are unrelated to the decoded stimulus dimension. This signal trains the decoder to classify temporally proximal stimuli as the same class, leading to a bias towards repeat classification. For example, in general, the EEG signal during trial one is likely to be more similar to that during trial two than during trial ten, because of low frequency trends in the recordings. If the decoder has been trained to classify the signal associated with trial one as a leftward stimulus, then it will be more likely to classify trial two as a leftward stimulus too. These autocorrelations are unrelated to stimulus features; thus, to isolate the influence of stimulus-specific temporal context, we subtracted the classification accuracy produced by shuffling the stimulus labels from the unshuffled accuracy (as presented in Figure 2e, f). We confirmed that using a stricter high-pass filter (0.7 Hz) removes this artifact, as indicated by the equal decoding accuracy between the two shuffled conditions. However, the stricter high-pass filter temporally smears the stimulus-related signal, which introduces other (stimulus-related) artifacts, e.g., above-chance decoding accuracy prior to stimulus presentation, that are larger and more complex, i.e., changing over time. Thus, we opted to use the original high pass filter (0.1 Hz) and apply a baseline correction. a) The uncorrected classification  accuracy along task related and unrelated planes. Note that these results are the same as the corrected version shown in Figure 2e, because the confound is only apparent when accuracy is grouped according to temporal context.

      b) Same as (a), but split into repeat and alternate stimuli, along (left) task-related and (right) unrelated planes. Classification  accuracy when labels are shuffled is also shown. Inset in (a) shows the EEG sensors included in the analysis (blue dots). (c, d) Same as (a, b), but on data filtered using a 0.7 Hz high-pass filter. Black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). Shaded regions indicate ±SEM.

      Supplementary Figure 3 updated.

      Supplementary Figure 3 caption - Removal of general temporal dependencies in EEG responses for inverted encoding analyses. As described in Methods - Neural Decoding, we used inverted encoding modelling of EEG recordings to estimate the decoding accuracy, precision, and bias of stimulus location. Just as in the linear discriminant classification analysis, we also found the influence of general temporal dependencies in the results produced by the inverted encoding analysis. In particular, there was increased decoding accuracy for targets with low D location. This was weakly evident in the period prior to stimulus presentation, but clearly visible when the labels were shuffled. These results are mirror those from the classification analysis, albeit in a more continuous space. However, whereas in the classification analysis it was straightforward to perform a baseline correction to remove the influence of general temporal dependency, the more complex nature of the accuracy, precision, and bias parameters over the range of time and D location makes this approach less appropriate. For example, the bias in the shuffled condition ranged from -180° to 180°, which when subtracted from the bias in the unshuffled condition would produce an equally spurious outcome, i.e., the equal opposite of this extreme bias. Instead for the inverted encoding analysis, we used the data high-pass filtered at 0.7 Hz. As with the classification analysis, this significantly reduced the influence of general temporal dependencies, as indicated by the results of the shuffled data analysis, but it also temporally smeared the stimulus-related signal, resulting in above chance decoding accuracy prior to stimulus onset. However, we were primarily interested in the pattern of accuracy, precision, and bias as a function of D location, and less concerned with the precise temporal dynamics of these changes. Thus, this was the more suitable approach to removing the general temporal dependencies in the inverted encoding analysis and the one that is presented in Figure 3. (a) Decoding accuracy as a function of time for the EEG data filtered using a 0.1 Hz high-pass filter. Inset shows the EEG sensors included in the analysis (blue dots), and black rectangles indicate the timing of stimulus presentations (solid: target stimulus, dashed: previous and subsequent stimuli). (b, c) The same as (a), but as a function of time and D location for (b) the original data and (c) data with shuffled labels. (d-f) Same as (a-c), but for data filtered using a 0.7 Hz high-pass filter. Shaded regions in (a, d) indicate ±SEM. Horizontal bars in (a, d) indicate cluster corrected periods of significance; note, all time points in (d) were significantly above chance. Note, the temporal abscissa is vertically aligned across plots (a-c & d-f).

      In the process of performing these additional analyses and simulations, we became aware that the sign of the decoding bias in the inverted encoding analyses had been interpreted in the wrong direction. That is, where we previously reported an initial attractive bias followed by a repulsive bias relative to the previous target, we have in fact found the opposite, an initial repulsive bias followed by an attractive bias relative to the previous target. Based on the new control analyses and simulations, we think that the latter attractive bias was due to general temporal dependencies. That is, in the filtered data, we only observe a repulsive bias. While the bias associated with serial dependence was not a primary feature of the study, this (somewhat embarrassing) discovery has led to reinterpretation of some results relating to serial dependence. However, it is encouraging to see that our results now align with those of recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan et al. 2024).

      Line 385 - Our corresponding EEG analyses revealed better decoding accuracy and precision for stimuli preceded by those that were different and a bias away from the previous stimulus. These results are consistent with finding that alternating stimuli are recalled more precisely. Further, while the repulsive pattern of biases is inconsistent with the observed behavioural attractive biases, it is consistent with recent work on serial dependence indicating an initial period of repulsion, followed by an attractive bias during the response period (20–22). These findings indicate that serial dependence and first-order sequential dependencies can be explained by the same underlying principle.

      (5) Given the relatively early decoding results and surprisingly early differences in decoding peaks, it would be useful to visualize ERPs across conditions to better understand the latencies and ERP components involved in the task.

      A rapid presentation design was used in the EEG experiment, and while this is well suited to decoding analyses, unfortunately we cannot resolve ERPs because the univariate signal is dominated by an oscillation at the stimulus presentation frequency (~3 Hz). We agree that this could be useful to examine in future work.

      (6) It is unclear why the precision derived from IEM results is considered reliable while the accuracy is dismissed due to the artifact, given that both seem to be computed from the same set of decoding error angles (equations 8-9).

      This point has been addressed in our response to point (4).

      (7) What is the rationale for selecting five past events as the meso-scale? Prior history effects have been shown to extend much further back in time (Fritsche et al., 2020). 

      We used five previous items in the meso analyses to be consistent with previous research on sequential dependencies (Bertelson, 1961; Gao et al., 2009; Jentzsch & Sommer, 2002; Kirby, 1976; Remington, 1969). However, we agree that these effects likely extend further and have acknowledged this in the revied version of the manuscript.

      Line 240 - Higher-order sequential dependences are an example of how stimuli (at least) as far back as five events in the past can shape the speed and task accuracy of responses to the current stimulus (9, 10); however, note that these effects have been observed for more than five events (20).

      (8) The decoding bias results, particularly the sequence of attraction and repulsion, appear to run counter to the temporal dynamics reported in recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan & Serences, 2022). 

      This point has been addressed in our response to point (4).

      (9) The repulsive component in the decoding results (e.g., Figure 3h) seems implausibly large, with orientation differences exceeding what is typically observed in behavior. 

      As noted in our response to point (4), this bias was likely due to the general temporal dependency confound and has been removed in the revised version of the manuscript.

      (10) The pattern of accuracy, response times, and precision reported in Figure 3 (also line 188) resembles results reported in earlier work (Stewart, 2007) and in recent studies suggesting that integration may lead to interference at intermediate stimulus differences rather than improvement for similar stimuli (Ozkirli et al., 2025).

      Thank you for bringing this to our attention, we have acknowledged this in the revised manuscript.

      Line 197 - Consistent with our previous binary analysis, and with previous work (19), we also found that responses were faster and more accurate when D location was small (Fig. 3b, c).

      (11) Some figures show larger group-level variability in specific conditions but not others (e.g., Figures 2b-c and 5b-c). I suggest reporting effect sizes for all statistical tests to provide a clearer sense of the strength of the observed effects. 

      Yes, as noted in the original manuscript, we find significant differences between the variance task-related and -unrelated conditions. We think this is due to opposing forces in the task-related condition: 

      “The increased variability of response time differences across the taskrelated plane likely reflects individual differences in attention and prioritization of responding either quickly or accurately. On each trial, the correct response (e.g., left or right) was equally probable. So, to perform the task accurately, participants were motivated to respond without bias, i.e., without being influenced by the previous stimulus. We would expect this to reduce the difference in response time for repeat and alternate stimuli across the taskrelated plane, but not the task-unrelated plane. However, attention may amplify the bias towards making faster responses for repeat stimuli, by increasing awareness of the identity of stimuli as either repeats or alternations (17). These two opposing forces vary with task engagement and strategy and thus would be expected produce increased variability across the task-related plane.” We agree that providing effect sizes may provided a clearer sense of the observed effects and have done so in the revised version of the manuscript.

      Line 739 - For Wilcoxon signed rank tests, the rank-biserial correlation (r) was calculated as an estimate of effect size, where 0.1, 0.3, and 0.5 indicate small, medium, and large effects, respectively (54). For Friedman’s ANONA tests, Kendal’s W was calculated as an estimate of effect size, where 0.1, 0.3, and 0.5 indicate small, medium, and large effects, respectively (55).

      (12) The statement that "serial dependence is associated with sensory stimuli being perceived as more similar" appears inconsistent with much of the literature suggesting that these effects occur at post-perceptual stages (Barbosa et al., 2020; Bliss et al., 2017; Ceylan et al., 2021; Fischer et al., 2024; Fritsche et al., 2017; Sheehan & Serences, 2022). 

      In light of the revised analyses, this statement has been removed from the manuscript.

      (13) If I understand correctly, the reproduction bias (i.e., serial dependence) is estimated on a small subset of the data (10%). Were the data analyzed by pooling across subjects?

      The dual reproduction task only occurred on 10% of trials. There were approximately 2000 trials, so ~200 reproduction responses. For the micro and macro analyses, this was sufficient to estimate precision within each of the experimental conditions (repeat/alternate, expected/unexpected). However, it is likely that we were not able to reproduce the effect of precision at the meso level across both experiments because we lacked sufficient responses to reliably estimate precision when split across the eight sequence conditions. Despite this, the data was always analysed within subjects.

      (14) I'm also not convinced that biases observed in forced-choice and reproduction tasks should be interpreted as arising from the same process or mechanism. Some of the effects described here could instead be consistent with classic priming. 

      We agree that the results associated with the forced-choice task (response time task accuracy) were likely due to motor priming, but that a separate (predictive) mechanism may explain the (precision) results associated with the reproduction task. These are two mechanisms we think are operating across the three temporal scales investigated in the current study.

      Reviewing Editor Comments:

      (1) Clarify task design and measurement: The dense presentation makes it difficult to understand key design elements and their implications. Please provide clearer descriptions of all task elements, and how they relate to each other (EEG vs. behaviour, stimulus plane vs. TR and TU plane, reproduction vs. discrimination and role of priming), and clearly explain how key measures were computed for each of these (e.g., precision, accuracy, reproduction bias).

      In the revised manuscript, we have expanded on descriptions of the source and nature of the data (behavioural and EEG), the different planes analyzed in the behavioural task, and how key metrics (e.g., precision) were computed.

      (2) Offer more insight into underlying data, including original ERP waveforms to aid interpretation of decoding results and the timing of effects. In particular, unpack the decoding temporal confound further.

      In the revised manuscript, we have considerably offered more insight into the decoding results, in particular, the nature of the temporal confound. We were unable to assess ERPs due to the rapid presentation design employed in the EEG experiment.

      (3) Justify arbitrary choices such as electrode selection for EEG decoding (e.g., limiting to parieto-occipital sensors), number of trials in meso scale, and the time terminology itself.

      In the revised manuscript, we have clarified the reasons for electrode selection.

      (3) Discuss deviations from literature: Several findings appear to contradict or diverge from previous literature (e.g., effects of serial dependence). These discrepancies could be discussed in more depth. 

      Upon re-analysis of the serial dependence bias and removal of the temporal confound, the results of the revised manuscript now align with those from previous literature, which has been acknowledged.

      Reviewer #1 (Recommendations for the authors):

      (1) would like to use my reviewer's prerogative to mention a couple of relevant publications. 

      Galluzzi et al (Journal of Vision, 2022) "Visual priming and serial dependence are mediated by separate mechanisms" suggests exactly that, which is relevant to this study.

      Xie et al. (Communications Psychology, 2025) "Recent, but not long-term, priors induce behavioral oscillations in peri-saccadic vision" also seems relevant to the issue of different mechanisms. 

      Thank you for bringing these studies to our attention. We agree that they are both relevant have referenced both appropriately in the revised version of the manuscript.

      Reviewer #2 (Recommendations for the authors): 

      (1) I find the discussion on attention and awareness (from line 127 onward) somewhat vague and requiring clarification.

      We agree that this statement was vague and referred to “awareness” without operationation. We have revised this statement to improve clarity.

      Line 135 - However, task-relatedness may amplify the bias towards making faster responses for repeat stimuli, by increasing attention to the identity of stimuli as either repeats or alternations (17).

      (2) Line 140: It's hard to argue that there are expectations that the image of an object on the retina is likely to stay the same, since retinal input is always changing. 

      We agree that retinal input is often changing, e.g., due to saccades, self-motion, and world motion. However, for a prediction to be useful, e.g., to reduce metabolic expenditure or speed up responses, it must be somewhat precise, so a prediction that retinal input will change is not necessarily useful, unless it can specify what it will change to. Given retinal input of x at time t, the range of possible values of x at time t+1 (predicting change) is infinite. By contrast, if we predict that x=x at time t+1 (no change), then we can make a precise prediction. There is, of course, other information that could be used to reduce the parameter space of predicted change from x at time t, e.g., the value of x at time t-1, and we think this drives predictions too. However, across the infinite distribution of changes from x, zero change will occur more frequently than any other value, so we think it’s reasonable to assert that the brain may be sensitive to this pattern.

      (3) Line 564: The gambler's fallacy usually involves sequences longer than just one event.

      Yes, we agree that this phenomenon is associated with longer sequences. This section of the manuscript was in regards to previous findings that were not directly relevant to the current study and has been removed in the revised version.

      (4) In the shared PDF, the light and dark cyan colors used do not appear clearly distinguishable. 

      I expect this is due to poor document processing or low-quality image embeddings. I will check that they are distinguishable in the final version.

      References: 

      Barbosa, J., Stein, H., Martinez, R. L., Galan-Gadea, A., Li, S., Dalmau, J., Adam, K. C. S., Valls-Solé, J., Constantinidis, C., & Compte, A. (2020). Interplay between persistent activity and activity-silent dynamics in the prefrontal cortex underlies serial biases in working memory. Nature Neuroscience, 23(8), Articolo 8. https://doi.org/10.1038/s41593-020-0644-4

      Bliss, D. P., Sun, J. J., & D'Esposito, M. (2017). Serial dependence is absent at the time of perception but increases in visual working memory. Scientific reports, 7(1), 14739. 

      Ceylan, G., Herzog, M. H., & Pascucci, D. (2021). Serial dependence does not originate from low-level visual processing. Cognition, 212, 104709. https://doi.org/10.1016/j.cognition.2021.104709

      Fischer, C., Kaiser, J., & Bledowski, C. (2024). A direct neural signature of serial dependence in working memory. eLife, 13. https://doi.org/10.7554/eLife.99478.1

      Fritsche, M., Mostert, P., & de Lange, F. P. (2017). Opposite effects of recent history on perception and decision. Current Biology, 27(4), 590-595. 

      Fritsche, M., Spaak, E., & de Lange, F. P. (2020). A Bayesian and efficient observer model explains concurrent attractive and repulsive history biases in visual perception. eLife, 9, e55389. https://doi.org/10.7554/eLife.55389

      Gekas, N., McDermott, K. C., & Mamassian, P. (2019). Disambiguating serial effects of multiple timescales. Journal of vision, 19(6), 24-24. 

      Luo, M., Zhang, H., Fang, F., & Luo, H. (2025). Reactivation of previous decisions repulsively biases sensory encoding but attractively biases decision-making. PLOS Biology, 23(4), e3003150. https://doi.org/10.1371/journal.pbio.3003150

      Ozkirli, A., Pascucci, D., & Herzog, M. H. (2025). Failure to replicate a superiority effect in crowding. Nature Communications, 16(1), 1637. https://doi.org/10.1038/s41467025-56762-5

      Sheehan, T. C., & Serences, J. T. (2022). Attractive serial dependence overcomes repulsive neuronal adaptation. PLoS biology, 20(9), e3001711. 

      Stewart, N. (2007). Absolute identification is relative: A reply to Brown, Marley, and

      Lacouture (2007).  Psychological  Review, 114, 533-538. https://doi.org/10.1037/0033-295X.114.2.533

      Treisman, M., & Williams, T. C. (1984). A theory of criterion setting with an application to sequential dependencies. Psychological review, 91(1), 68. 

      Zhang, G., & Luck, S. J. (2025). Assessing the impact of artifact correction and artifact rejection on the performance of SVM- and LDA-based decoding of EEG signals. NeuroImage, 316, 121304. https://doi.org/10.1016/j.neuroimage.2025.121304

    1. eLife Assessment

      Complementing previous work (Namiki et al, 2018), this study provides an important resource for the Drosophila community as it reports 500 lines targeting descending neurons (DN), in addition to compiling 306 existing DN lines from the literature. The compelling work characterizes 146 DNs and makes a critical link with the DNs identified in Electron microscopy (EM). The lines in this paper will be of interest to Drosophila neuroscientists who will be able to use the reported genetic drivers for further functional characterization of DNs and circuit mapping in conjunction with existing EM datasets.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Zung et al. describes a curated library of genetic lines labeling a class of important neurons called Descending Neurons in the fruit fly, Drosophila melanogaster. These neurons are especially important in their critical role in relaying information from the brain to motor circuits within the ventral nerve cord - the insect analogy of the vertebrate spinal cord. The authors screened through a vast resource of Gal4 lines to generate 500 new genetic lines that allow for the precise labeling of 190 (40%) of all Descending Neurons. The tools introduced here will allow researchers to perform precise circuit dissection of the exact roles these neurons play in linking the brain to the ventral nerve cord.

      Strengths:

      This manuscript represents an important follow-up to the author's 2018 paper in the extension of the genetic toolkit from 178 genetic lines that target 65 Descending Neuron (DN) classes to 806 lines that target 190 DN classes. The presentation of this toolkit is comprehensive with confocal images, informative classifications of lines based on specificity/consistency, and identification of the neuron types - when possible - in the EM dataset.

      Weaknesses:

      No weaknesses were identified by this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      Descending neurons (DNs) are critical nodes in the neural computation underlying sensorimotor transformation. Building on their earlier work, the authors have substantially expanded the genetic resources for labeling these cell types in D. melanogaster, offering a valuable public resource.

      Strengths:

      The authors identified 146 additional DN types and generated 500 new DN driver lines, expanding the genetic reagents from labeling 98 cell types to 244, representing approximately 50% of all DN types estimated by EM connectomes. While the EM connectomes offer unprecedented resolution of neuronal cell types and their connectivity, genetic access to these cell types remains essential for studying their functions and testing hypotheses. Given the broad interest in DNs, the reagents generated in this study will be of important value for addressing a wide range of questions in sensorimotor transformation.

      The organization of the dataset is overall intuitive and comprehensive. The authors also provided clear information and guidance on accessing the relevant resources, such as stack images and fly lines. In addition, the authors have thoughtfully handled the information updated from the earlier collection they generated (Namiki et al. 2018) and incorporated previously published DN lines, providing a consolidated and up-to-date resource for the DN community.

      Weaknesses:

      No weaknesses were identified by this reviewer.

    4. Reviewer #3 (Public review):

      Summary:

      This study provides the Drosophila community with a large collection of new split-Gal4 descending neuron genetic lines. They extend previous efforts to characterize and identify genetic lines for this important class of neurons by providing images of descending neurons and a metric for genetic lines based on specificity and consistency. Their discussion highlights several applications of this collection, for example, to understand the function of new descending neurons through optogenetic and/or physiological characterization. They also helpfully discuss caveats, encouraging users of this collection to validate expression patterns and to be careful when interpreting optogenetic experimental results, considering potential off-target labeling in the lines. Overall, members of the Drosophila community interested in understanding the function of descending neurons and their role in behavior will find this a helpful resource.

      Strengths:

      (1) The authors extend the previous genetic access of descending neurons in Drosophila to over 800 split-Gal4 lines and 190 cell types (nearly half of the known population of descending neurons). The authors update and at times correct the previous identification of descending neurons from a previous, large-scale analysis. The authors extend and, at times, correct previous efforts at characterizing these neurons.

      (2) Clear images of descending neurons labeled by new genetic lines are presented in the main figure papers for reference.

      (3) This study classifies lines labeling descending neurons using a quality score to indicate specificity and consistency. They provide this for the entire set of genetic lines, a valuable assessment for researchers interested in targeting these neurons for optogenetic or physiological characterization.

      Weaknesses:

      Although this paper represents a substantial effort and useful contribution to the Drosophila community, a few weaknesses, primarily regarding the specificity and reliability of genetic lines, remain:

      (1) The authors state that optogenetic activation of DN types using the new split-GAL4 lines is expected to reliably activate the target neurons with virtually no off-target effects in the rest of the central nervous system. More data supporting this conclusion, including both qualitative and quantitative anatomical evidence, would strengthen this claim.

      (2) The authors do recommend that researchers using these lines examine expression patterns themselves to evaluate line cleanliness and consistency, but some analysis by the authors would be useful, for example, providing guidelines for best practices to perform this evaluation.

      (3) Changes in expression patterns after several generations are noted by the authors, weakening confidence somewhat in the long-term usefulness of this collection of genetic lines.

    1. eLife Assessment

      This important study presents the development of a novel inhibitor for SARS-CoV-2 Mac1 that has potential utility both as an antiviral therapeutic and as a tool for probing the molecular mechanisms by which infection-induced ADP-ribosylation triggers robust host antiviral responses. Though minor gaps in understanding the compound's precise molecular mechanism of action and its ability to target Mac1 from other coronaviruses remain, the evidence for its effects on SARS-CoV-2 in relevant biological models is compelling.

    2. Reviewer #1 (Public review):

      SARS-CoV-2 encodes a macrodomain (Mac1) within the nsp3 protein that removes ADP-ribose groups from proteins. However, its role during infection is not well understood. Evidence suggests that Mac1 antagonizes the host interferon response by counteracting the wave of ADP ribosylation that occurs during infection. Indeed, several PARPs are interferon-stimulated genes. While multiple targets have been proposed, the mechanistic links between ADP ribosylation and a robust antiviral response remain unclear.

      Genetic inactivation of Mac1 abrogates viral replication in vivo, suggesting that small-molecule inhibitors of Mac1 could be developed into antivirals to treat COVID-19 and other emerging coronaviruses. The authors report a potent and selective small molecule inhibitor targeting Mac1 (AVI-4206) that demonstrates efficacy in human airway organoids and animal models of SARS-CoV-2 infection. While these results are compelling and provide proof of concept for the therapeutic targeting of Mac1, I am particularly intrigued by the potential of this compound as a probe to elucidate the mechanistic connections between infection-induced ADP ribosylation and the host antiviral response.

      The precise function of Mac1 remains unclear. Given its presence in multiple viruses, it likely acts on a fundamental host immune pathway(s). AVI-4206, while promising as a lead compound for the development of antivirals targeting coronaviruses, could also be a valuable tool for uncovering the function of the Mac1 domain. This may lead to fundamental insights into the host immune response to viral infection.

    3. Reviewer #2 (Public review):

      Summary:

      The authors describe the development of a novel inhibitor (AVI-4206) for the first macrodomains of the nsp3 protein of SARS-CoV-2 (Mac1). This involves both medical chemical synthesis, structural work as well as biochemical characterisation. Subsequently the authors present their finding of the efficacy of the inhibitor both on cell culture as well as animal models of SARS-CoV-2 infection. They find that despite high affinity for Mac1 and the known replicatory defects of catalytically inactive Mac1 only moderate beneficial effects can be observed in their chosen models.

      Strengths:

      The authors employ a variety of different assay to study the affinity, selectivity and potency of the novel inhibitor and thus the in vitro data are very compelling.<br /> Similarly, the authors use several cell culture and in vivo models to strengthen their findings. In addition, the authors address several aspects of the health impact of coronaviral infections from animal survival, over viral load to histological assessment of lung damage.

      Weaknesses:

      The selection of Targ1 and MacroD2 as off-target human macrodomains is sub-optimal as several studies have shown that the first macrodomains of PARP9 and PARP14 are much closer related to coronaviral macrodomains and both macrodomains are implicated in antiviral defence and immunity. However, the authors address this issue by providing modeling data that show clashes with AVI-4206 similarly to their models with MacroD2 and TARG1.

      Comments on revisions:

      While the authors have not addressed all my suggestions experimentally, I would like to nevertheless congratulate them on a significantly strengthened manuscript that will provide a valuable contribution to the field.

    4. Reviewer #3 (Public review):

      Summary:

      The authors were trying to validate SARS-CoV-2 Mac1 as a drug discovery target and by extension other viral macrodomains.

      Strengths:

      The medicinal chemistry and structure based optimization is exemplary. Macrodomains and ADPribosyl hydrolases have the reputation for being undruggable, yet the authors managed to optimize hits from a fragment screen using structure based approaches and fragment linking to make a 20nM inhibitor as a tool compound to validate the target.<br /> In addition, the in vivo work is also a strength. The ability to reduce the viral count at a rate comparable to nirmatrelvir is impressive. Tracking the cytokine expression levels also supports much of the genetic data and mechanism of action for macrodomains.

      Weaknesses:

      The main compound AVI-4206, while being very potent and selective is not appreciably orally bioavailable. The fact that they have to use high doses of the compound IP to see in vivo effects may lead to questions regarding off target effects. The authors acknowledge this and point it out as a potential avenue for further optimization.

      The cellular models are not as predictive of antiviral activity as one would expect. However, the authors had enough chutzpah to test the compound in vivo knowing that cellular models might not be an accurate representation of a living system with a fully functional immune system all of which is most likely needed in an antiviral response to test the importance of Mac1 as a target.

      Comments on revisions:

      All previous suggestions were addressed. I am satisfied with the author's modifications.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for the authors):

      Although this study is rigorous and the paper is well-written, I have a few concerns that the authors should address before publication.

      (1) Cellular levels of protein ADP-ribosylation should be analyzed using anti-ADPR antibodies following infection, both with and without Mac1 and AVI-4206 treatment. While the authors have provided impressive in vivo data, these experiments could ideally be conducted in mice. However, I would be amenable to these analyses being performed in human airway organoids, as they demonstrate clear phenotypes following AVI-4206 treatment post-infection. For a more in-depth exploration, the authors could consider affinity purifying ADP-ribosylated proteins and identifying them via mass spectrometry. I would find it particularly compelling if this approach revealed components of the NF-kB signaling pathway, given the intriguing results presented in Fig. 5. I am also curious if there are differences in ADP ribosylated proteins when comparing Mac1 KO SARS-C0V-2 to AVI-4206 treatment.

      We note that despite the recent flurry of activity around Mac1, there is a surprising lack of public data on overall ADPr levels or targets. While we will address the literature precedence for PARP14 signals specifically below (Reviewer 2 point (h)) by immunofluorescence, we note that overall levels have not been characterized biochemically previously. Recent PARP14 papers and the ASAP AViDD preprint show changes by immunofluorescence only: and the evidence in that preprint is quite modest - see Figure 7B - https://pmc.ncbi.nlm.nih.gov/articles/PMC11370477/.

      We suspect the difficulty in tracking changes biochemically is due to multiple factors that influence the overall detectability and reproducibility. First, with regard to detectability - it is quite possible that only a small change in the ADPr status of a small number of targets is responsible for the phenotypes in vivo. Virus levels are very low in the organoid system and the variability in ADPr levels from tissue samples from in vivo experiments is high. Given the difficulty in translating back to cellular models, this problem is therefore magnified further. Second, with regard to reproducibility - we observe a great deal of reagent dependence on ADPr signals by Western blot+/- Mac1 expression in both cellular and tissue lysates (including when stimulated with H2O2, interferon, or during viral infection). Similarly, we do not observe reproducible proteins that pulldown with Mac1 when assayed by mass spectrometry. It is quite likely that these issues are a result of tissue/sample preparation that results in a loss of the ADPr modification during preparation (especially for acidic residue modifications). This also explains the reliance on IF assays in the PARP14 literature. A very good discussion of these issues is also contained in this paper: https://doi.org/10.1042/BSR20240986.

      Nonetheless we have attempted one final experiment. Here, we have measured ADPr modification of cellular lysates upon uninfected conditions as well as upon infection with either WT or N40D mutant virus. For all conditions, this was done with or without treatment of cells with 100 μM of AVI-4206. Measurement of ADPr modifications by western blot using a  pan-ADPr antibody revealed a single prominent band with a molecular weight of ~130kDa, that showed a uniform increase in signal upon treatment of cells with AVI-4206 regardless of infection status. While this general trend was also observed with the mono-ADPr antibody, it was not statistically significant in its regulation upon AVI-4206 treatment. We suspect that the major band observed in these western blots is PARP1, as upon enrichment of ADPr proteins from these lysates by Af1521 immunoprecipitation, we find PARP1 to be among the most abundant proteins detected within this molecular weight range. We note that there is a baseline increase in polyADPr detection upon infection of virus with WT Mac1 (relative to uninfected and virus with N40D) and further increase when treated with AVI-4206. This compound-dependent increase is paralleled in the uninfected and N40D conditions. The counterintuitive increase upon WT Mac1 virus infection, which should erase ADPr marks, and the compound-dependent increase in the uninfected condition suggest that there are many indirect effects on ADPr signalling dynamics in this experiment. These results are difficult to reconcile with the specificity profiling of AVI-4206 (Supplementary Figure5: Thermal proteome profiling in A549 cellular lysates). As mentioned above, the lack of consistent signal across reagents for ADPr detection and the timing of monitoring ADPr levels are additional complicating factors.

      We added to the results:

      “However, we observed no strong consistent signals of global pan-ADP-ribose (panADPr) or mono-ADP-ribose (monoADPr) accumulation in infected cells treated with AVI-4206 in immunoblot analyses (Supplementary Figure 8).”

      Methods for experiment:

      Calu3 cells were obtained from ATCC and cultured in Advanced DMEM (Gibco) supplemented with 2.5% FBS, 1x GlutaMax, and 1x Penicillin-Streptomycin at 37°C and 5% CO<sub>2</sub>. 5x10<sup>6</sup> cells were plated in 15-cm dishes and media was changed every 2-3 days until the cells were 80% confluent. The cells were treated with INFy 50 ng/mL (R&D Systems) w/without AVI-4206 100 μM. After 6 hours, the cells were infected with WA1 or WA1 NSP3 Mac1 N40D at a multiplicity of infection (MOI) of 1 for 36 hours. The cells were washed with PBS x 3 and scraped in Pierce IP Lysis Buffer (ThermoFisher) containing 1x HALT protease and phosphatase inhibitor mix (ThermoFisher) on ice. The lysate was stored at -80C until further processing.

      The cell lysate was incubated for 5 minutes at room temperature with recombinant benzonase. Following incubation, the lysate was centrifuged at 13,000 rpm at 4°C for 20 minutes, and the supernatant was collected. The samples were then boiled for 5 minutes at 95°C in 1x NuPAGE LDS sample buffer (Invitrogen) with a final concentration of 1X NuPAGE sample reducing agent (Invitrogen). For the detection of ADPr levels in whole-cell lysates, the samples were subjected to SDS-PAGE and Immunoblotting. All primary and secondary antibodies (pan-ADP-ribose antibody (MABE1016, Millipore), Mono-ADP-ribose antibody (AbD33204, Bio-Rad), HRP-conjugated (Cell signaling), used at a 1:1000 dilution were diluted in 5% non-fat dry milk in TBST. Signals were detected by chemiluminescence (Thermo) and visualized using the ChemiDoc XRS+ System (Bio-Rad). Densitometric analysis was performed using Image Lab (Bio-Rad). Quantification was normalized to Actin. The data are expressed as mean ± SD. Statistical differences were determined using an unpaired t-test in GraphPad Prism 10.3.1.

      (2) SARS-CoV-2 escape mutants for AVI-4206 should be generated, sequenced, and evaluated for both ADP-ribosyl hydrolase activity and their susceptibility to inhibition by AVI-4206.

      We thank the reviewer for this suggestion. These are indeed key experiments which are currently hampered by the lack of a cell line that is fully responsive to drug treatment. Although infected organoids and macrophages show an effect in response to AVI-4206, viral levels are ~3 logs lower than in cell lines and difficult to sequence. In the absence of a system that would allow meaningful screening for outgrowth of resistant viruses, we have conducted mass spectrometry studies that showed that Mac1 is the only significant hit for AVI-4206 (SupplementaryFigure 5). The suggested outgrowth experiments will be conducted once a responsive cell line model has been established.

      (3) Given that Mac1 is found in several coronaviruses, it would be insightful for the authors to test a selection of Mac1 homologs from divergent coronaviruses to assess whether AVI-4206 can inhibit their activity in vitro.

      As mentioned above, inconsistencies in ADPr staining limit our ability to directly measure cellular activity. As an alternative approach to measure AVI-4206 selectivity in cells, we have adapted our CETSA assay for SARS-1 and MERs macrodomain proteins and find evidence that AVI-4206 can shift the melting temperature of both proteins, albeit to a lesser degree than that seen for Mac1. In line with MERS being more structurally divergent than SARS-1 from SARS CoV2, the ΔTagg for SARS-1 and MERS are 4℃ and 1℃, respectively, compared to 9℃ for Mac1.  These data have been added as Supplementary Fig S3C. Development of broader spectrum pan-inhibitors is on our radar for future work which will more thoroughly assess homologs from divergent coronaviruses.

      We added the following sentence to the main results:

      “Encouragingly, we were also able to adapt our CETSA assay for SARS-1 and MERs macrodomain proteins and find that AVI-4206 can shift the melting temperature of both proteins, albeit to a lesser degree than that seen for Mac1 (Supplementary Figure 3C).”

      We also added this supplementary figure 3:

      Minor

      (1) Line 88, "respectively.heir potency"

      Fixed, thank you!

      (2) Line 149 add a period after proteome

      Fixed, thank you!

      Reviewer #2 (Recommendations for the authors):

      (a) The authors assess inhibition of MacroD2 and Targ1 as of-targets for AVI-4206. However, Mac1 belongs to the MacroD-type class of macrodomains of which MacroD1, MacroD2 and MOD1s of PARP9 and PARP14 are the human members. In contrast Targ1 belongs to the ALC1-like class, which is only very distantly related to Mac1. Furthermore, recent studies have shown that the first macrodomains of PARP9 and PARP4 (MOD1 of PARP9/14) are much closer related to Mac1 and PARP9/14 were implicated in antiviral immunity. As such the authors should include assays showing the activity of their compounds against MacroD1 and MOD1s of PARP9/14.

      We emphasize that we detect no significant shift for any protein other than Mac1 in A549 cells by CETSA-MS (Supplementary Figure 6). For Mac1 CESTA, we see an average of 6 PARP14 spectral counts across conditions and did not detect PARP9.  In addition, for separate work in MPro, we ran similar CETSA experiments where we observed an average of 2 PARP9 and 15 PARP14 spectral counts across conditions. Although PARP9 and PARP14 massively increase expression upon IFN treatment in A549 cells, both proteins have been detected by Western Blot in A549 cells previously at baseline.

      Nonetheless, we have included modeling of more diverse macrodomains as a supplemental figure and added to the text:

      Modeling of other diverse macrodomains, including those within human PARP9 and PARP14 further suggests that AVI-4206 is selective for Mac1 (Supplementary Figure 4)

      (b) In the context of SARS-CoV-2 superinfection are a known major complication of infections. These superinfections are associated with lung damage and therefore it would be good if the authors could assess lung damage, e.g. by histology, to see if their treatment has a positive impact on lung damage and thus may help to suppress complications.

      We performed histology and the results are inconclusive, but suggest that AVI-4206 treatment could lower apoptosis.There is no difference in pathology between the N40D cohort and vehicle with these markers. This could suggest that AVI-4206 provides an additional mechanism that results in protection.  We added to the results:

      Caspase 3 staining shows that AVI-4206 treatment reduces apoptosis in the lungs compared to vehicle controls. Additionally, Masson's Trichrome staining reveals  a significant reduction in collagen deposition, a surrogate for lung pathology, in the lungs of AVI-4206 treated animals.(Supplementary Figure 9).

      Histology:

      Mouse lung tissues were fixed in 4% PFA (Sigma Aldrich, Cat #47608) for 24 hours, washed three times with PBS and stored in 70% ethanol. All the stainings were performed at Histo-Tec Laboratory (Hayward, CA). Samples were processed, embedded in paraffin, and sectioned at 4μm. The slides were dewaxed using xylene and alcohol-based dewaxing solutions. Epitope retrieval was performed by heat-induced epitope retrieval (HIER) of the formalin-fixed, paraffin-embedded tissue using citrate-based pH 6 solution (Leica Microsystems, AR9961) for 20 mins at 95°C. The tissues were stained for H&E, caspase-3 (Biocare #CP229c 1:100), and trichrome, dried, coverslipped (TissueTek-Prisma Coverslipper), and visualized using Axioscan 7 slide scanner (ZEISS) at 40X. Image quantification was performed with Image J software and GraphPad Prism.

      (c) Fig. 1D labelling is wrong

      Thank you - fortunately the data were plotted correctly and it was just the inset table of values that was incorrect. This is now fixed!

      (d) Line 88: "T" missing at start of sentence

      Fixed, thank you!

      (e) Line 118: NudT5/AMP-Glo assay was developed in https://doi.org/10.1021/acs.orglett.8b01742

      We have added this foundational reference, thank you!

      (f) Line 147ff: It would be good if the authors could highlight that the TPP methodology has known limitations (e.g. detection of low abundance proteins and low thermal shift of some binders) and thus is not an absolute proof that AVI-4206 "engage with high specificity for Mac1"

      We added this important context to the concluding sentence of this paragraph:

      “While this assay may not be sensitive to detection of proteins with low abundance proteins or low thermal shift upon ligand binding, collectively, these results indicate that AVI-4206 can cross cellular membranes and engage with high specificity for Mac1.”

      (g) The authors use their well established in vitro Mac1 model as well as the SARS-CoV-2 WA strain. Given the ongoing diversification of SARS-CoV-2 and the current prevalence of the Omicron VOC it would be good if the authors could investigate whether alteration in Mac1 occurred or are detected which could influence the efficacy of their inhibitor. Similarly, it would be interesting to know how effective their drug is on other clinically relevant beta-CoV Mac1, e.g. from MERS or SARS1.

      We thank the reviewer for the suggestion. Mac1 is one of the more conserved areas of the SARS-CoV-2 genome as there has only been one nonsynonymous mutation V34L (Orf1a:V1056L) that recently emerged in the BA.2.86 lineage and is now in all of the JN.1 derivatives. Currently, the mutation is only ~80% penetrant in circulating SARS-CoV-2 sequences suggesting that it might revert to wild-type and is not associated with a fitness benefit. Based on our structural analysis (shown in Supplementary Figure4D above), we do not believe this mutation affects AVI-4206 binding, but we are including this variant in our future in vitro and in vivo studies as well as other beta-CoV.  For SARS and MERS, see response to Reviewer 1 using CETSA to show that these targets are engaged by AVI-4206.

      (h) As methods to detect PARP14-derived ADP-ribosylation are available and it was shown that Mac1 can reverse this modification in cells. It would be good if the authors could investigate the impact of AVI-4206 on ADP-ribosylation in vivo.

      To test this idea we adapted the IF assay used by others in the field and show an effect of AVI-4206. We have added to the text:

      Although the IFN response was not sufficient to control viral replication, it is possible that the changes in ADP-ribosylation, in particular marks catalyzed by PARP14, downstream of IFN treatment could serve as a marker for Mac1 efficacy  (Ribeiro et al. 2025). To investigate whether downstream signals from PARP14 were specifically erased by Mac1, we used an immunofluorescence assay that showed that Mac1 could remove IFN-γ-induced ADP-ribosylation that is mediated by PARP14 (Kar et al. 2024).  We stably expressed wild-type Mac1 and the N40D mutant Mac1 in A549 cells. The data showed that Mac1 expression decreased IFN-γ-induced ADP-ribosylation, whereas the Mac1-N40D mutant did not (Figure 3E, F), indicating that Mac1 mediates the hydrolysis of IFN-γ-induced ADP-ribosylation. The PARP14 inhibitor RBN012759 completely blocked IFN-γ-induced ADP-ribosylation (Figure 3E, F), further confirming that IFN-γ-induced ADP-ribosylation is mediated by PARP14. AVI-4206 reversed the Mac1-induced hydrolysis of ADP-ribosylation and enhanced the ADP-ribosylation signal in Mac1-overexpressing cells (Figure 3E, F), further demonstrating its ability to inhibit the hydrolase activity of Mac1. We further validated this result using different ADP-ribosylation antibodies for immunofluorescence (Supplementary Figure 7). However, we observed no strong consistent signals of global pan-ADP-ribose (panADPr) or mono-ADP-ribose (monoADPr) accumulation in infected cells treated with AVI-4206 in immunoblot analyses (Supplementary Figure 8). Collectively, these results provide further evidence that simple cellular models are insufficient to explore the effects of Mac1 inhibition and that monitoring specific PARP14-mediated ADP-ribosylation patterns can provide an accessible biomarker for the efficacy of Mac1 inhibition.

      A549 Mac1 expression cell construction

      Mac1 wild-type (Mac1) and N1062D mutant (Mac1 N1062D) gene fragments were loaded into pLVX-EF1α-IRES-Puro (empty vector, EV) using Gibson cloning kit (NEB E5510). Lentivirus was prepared as previously described (PMID: 30449619; DOI: 10.1016/j.cell.2018.10.024). Briefly, 15 million HEK293T cells were grown overnight on 15 cm poly-L-Lysine coated dishes and then transfected with 6 ug pMD2.G (Addgene plasmid # 12259 ; http://n2t.net/addgene:12259 ; RRID:Addgene_12259), 18 ug dR8.91 (since replaced by second generation compatible pCMV-dR8.2, Addgene plasmid #8455) and 24 ug pLVX-EF1α-IRES-Puro (EV, Mac1, Mac1-N1062D) plasmids using the lipofectamine 3000 transfection reagent per the manufacturer’s protocol (Thermo Fisher Scientific, Cat #L3000001). pMD2.G and dR8.91 were a gift from Didier Trono. The following day, media was refreshed with the addition of viral boost reagent at 500x as per the manufacturer’s protocol (Alstem, Cat #VB100). Viral supernatant was collected 48 hours post transfection and spun down at 300 g for 10 minutes, to remove cell debris. To concentrate the lentiviral particles, Alstem precipitation solution (Alstem, Cat #VC100) was added, mixed, and refrigerated at 4°C overnight. The virus was then concentrated by centrifugation at 1500 g for 30 minutes, at 4°C. Finally, each lentiviral pellet was resuspended at 100x of original volume in cold DMEM+10%FBS+1% penicillin-streptomycin and stored until use at -80°C. To generate Mac1 overexpressing cells, 2 million A549 cells were seeded in 10 cm dishes and transduced with lentivirus in the presence of 8 μg/mL polybrene (Sigma, TR-1003-G). The media was changed after 24h and, after 48 hours, media containing 2μg/ml puromycin was added. Cells were selected for 72 hours and then expanded without selection. The expression of Mac1 was confirmed by Western Blot.

      Immunofluorescence assay:

      To assess the effect of Mac1 on IFN-induced ADP-ribosylation. A549-pLVX-EV, A549-pLVX-Mac1 and A549-pLVX-Mac1-N1062D cells were seeded in 96-well plate (10,000 cells/well). Cells were pre-treated with medium or 100 unit/mL IFN-γ (Sigma, SRP3058) for 24 hours to induce the expression of ADP-ribosylation. These 3 cell lines were then treated the next day with the indicated concentrations of AVI-4206 or RBN012759 (Medchemexpress, HY-136979). After 24 hours of exposure to drugs, treated cells were fixed in pre-cooled methanol at -20°C for 20 min, blocked in 3% bovine serum albumin for 15 min, incubated with Poly/Mono-ADP Ribose (E6F6A) Rabbit mAb (CST, 83732S) or Poly/Mono-ADP Ribose (D9P7Z) Rabbit mAb (CST, 89190S) antibodies for 1 h, and then incubated with Goat anti-Rabbit IgG Secondary Antibody, Alexa Fluor 488 (ThermoFisher, A-11008) secondary antibodies for 30 min and stained with DAPI for 10 minutes. Fluorescent cells were imaged with an IN Cell Analyzer 6500 System (Cytiva) and analyzed using IN Carta software (Cytiva).

      Reviewer #3 (Recommendations for the authors):

      Just a couple of observations/details that might help strengthen the article:

      (1) The caco-1 data for AVI4206 would suggest that there is some sort of efflux going on, yet there is no mention of it in the paper. This might be useful in the optimization paradigm moving forward.

      We thank the reviewer for this observation and suggestion.  Indeed, we believe that efflux is behind the low oral bioavailability of AVI-4206.  We are working specifically to remove this liability in next-generation analogs, using the caco2 assay to guide this ongoing effort. Keep an eye out for a preprint on this soon!  We have added to the discussion:

      “In addition to dissecting such molecular mechanisms of macrodomain function and inhibition, future efforts will focus on improving pharmacokinetic properties, including a cellular efflux liability that results in low oral bioavailability of AVI-4206. ”

      (2) There are some spectroscopic anomalies/mistakes in the NMR data. The carbon NMR for 1-((8-amino-9H-pyrimido[4,5-b]indol-4-yl)amino)pyrrolidin-2-one should only have 14 unique carbons, but the authors report 15. The HNMR for AVI1500 should only have 19 H's, but the authors list 20. The HNMR data for AVI3762/3763 should have 16 H's, but the authors only report 13. The CNMR for AVI4206 should only have 19 unique carbons, but the authors report 20.

      Thank you for noting these inconsistencies regarding the reported NMR spectra. We have rectified them by more closely examining the spectra and in some cases acquiring new data. We identified one peak (47.9) in the 13C NMR of 1-((8-amino-9H-pyrimido[4,5-b]indol-4-yl)amino)pyrrolidin-2-one that is apparently an artifact of the automated peak picking in the data analysis software.  In the 1H NMR of AVI-1500, the triplet peak at 7.20 integrates to 1H, but was erroneously reported as 2H in the original manuscript.  This error has been corrected.  Spectra were re-acquired for AVI-3762, AVI-3763, and AVI-4206 with longer acquisition times, and/or on a 600 MHz spectrometer to afford the complete line lists now reported in the revised manuscript. Please note AVI-4206 has 18 distinct 13C resonances due to the equivalence of the gem-dimethyl methyl groups.

    1. eLife Assessment

      This study reanalyzed previously published scRNA-seq and TCR-seq data to examine the proportion and characteristics of dual-TCR-expressing Treg cells in mice, presenting some useful insights into TCR diversity and immune regulation. However, the evidence is incomplete, particularly with respect to data interpretation, statistical rigor, and the functionality of dual -TCR Treg cells. The study is potentially of interest to immunologists studying T-cell biology.

    2. Reviewer #2 (Public review):

      Summary:

      The manuscript, by Xu and Peng, et al. investigates whether co-expression of 2 T cell receptor (TCR) clonotypes can be detected in FoxP3+ regulatory CD4+ T cells (Tregs) and if it is associated with identifiable phenotypic effects. This paper presents data reanalyzing publicly available single-cell TCR sequencing and transcriptional analysis, convincingly demonstrating that dual TCR co-expression can be detected in Tregs, both in peripheral circulation as well as among Tregs in tissues. They then compare metrics of TCR diversity between single-TCR and dual TCR Tregs, as well as between Tregs in different anatomic compartments, finding the TCR repertoires to be generally similar though with dual TCR Tregs exhibiting a less diverse repertoire and some moderate differences in clonal expansion in different anatomic compartments. Finally, they examine the transcriptional profile of dual TCR Tregs in these datasets, finding some potential differences in expression of key Treg genes such as Foxp3, CTLA4, Foxo3, Foxo1, CD27, IL2RA, and Ikzf2 associated with dual TCR-expressing Tregs, which the authors postulate implies a potential functional benefit for dual TCR expression in Tregs.

      Strengths:

      This report examines an interesting and potentially biologically significant question, given recent demonstrations that dual TCR co-expression is a much more common phenomenon than previously appreciated (approximately 15-20% of T cells) and that dual TCR co-expression has been associated with significant effects on the thymic development and antigenic reactivity of T cells. This investigation leverages large existing datasets of single-cell TCRseq/RNAseq to address dual TCR expression in Tregs. The identification and characterization of dual TCR Tregs is rigorously demonstrated and presented, providing convincing new evidence of their existence.

      Weaknesses:

      The existence of dual TCR expression by Tregs has previously been demonstrated in mice and humans, limiting the novelty of the reported findings. The presented results should be considered in the context of these prior important findings. The focus on self-citation of their previous work, using the same approach to measure dual TCR expression in other datasets. limits the discussion of other more relevant and impactful published research in this area. Also, Reference #7 continues to list incorrect authors. The authors do not present a balanced or representative description of the available knowledge about either dual TCR expression by T cells or TCR repertoires of Tregs.

      The approach used follows a template used previously by this group for re-analysis of existing datasets generated by other research groups. The descriptions and interpretations of the data as presented are still shallow, lacking innovative or thoughtful approaches that would potentially be innovation or provide new insight.

      This demonstration of dual TCR Tregs is notable, though the authors do not compare the frequency of dual TCR co-expression by Tregs with non-Tregs. This limits interpreting the findings in the context of what is known about dual TCR co-expression in T cells. The response to this criticism in a previous review is considered non-responsive and does not improve the data or findings.

      Comparison of gene expression by single- and dual TCR Tregs is of interest, but as presented is difficult to interpret. The interpretations of the gene expression analyses are somewhat simplistic, focusing on single-gene expression of some genes known to have function in Tregs. However, the investigators continue to miss an opportunity to examine larger patterns of coordinated gene expression associated with developmental pathways and differential function in Tregs (Yang. 2015. Science. 348:589; Li. 2016. Nat Rev Immunol. Wyss. 2016. 16:220; Nat Immunol. 17:1093; Zenmour. 2018. Nat Immunol. 19:291). No attempt to define clusters is made. No comparison is made of the proportions of dual TCR cells in transcriptionally-defined clusters. The broad assessment of key genes by single- and dual TCR cells is conceptually interesting, but likely to be confounded by the heterogeneity of the Treg populations. This would need to be addressed and considered to make any analyses meaningful.

      The study design, re-analysis of existing datasets generated by other scientific groups, precludes confirmation of any findings by orthogonal analyses.

    3. Reviewer #3 (Public review):

      Summary:

      This study addressed the TCR pairing types and CDR3 characteristics of Treg cells. By analyzing scRNA and TCR-seq data, it claims that 10-20% of dual TCR Treg cells exist in mouse lymphoid and non-lymphoid tissues and suggests that dual TCR Treg cells in different tissues may play complex biological functions.

      Strengths:

      The study addresses an interesting question of how dual-TCR-expressing Treg cells play roles in tissues.

      Weaknesses:

      This study is inadequate, particularly regarding data interpretation, statistical rigor, and the discussion of the functional significance of Dual TCR Tregs.

      Comments on revisions:

      Although the authors have provided brief explanations in response to the reviewers' comments, they do not present any additional analyses that would address the fundamental concerns in a convincing manner.

      Moreover, the in silico analyses presented in the manuscript alone are insufficient to support the conclusions, and the functional experiments requested by the reviewers have not been conducted.

      In the current rebuttal, while some textual additions have been made to the manuscript, the only substantial revision to the figures appears to be the inclusion of statistical significance annotations (e.g., Fig. 1G, Fig. 3G). These changes do not adequately strengthen the overall data or address the core issues raised.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      (1) The use of single-cell RNA and TCR sequencing is appropriate for addressing potential relationships between gene expression and dual TCR.

      Thank you for your detailed review and suggestions. The main advantages of scRNA+TCR-seq are as follows: (1) It enables comparative analysis of features such as the ratio of single TCR paired T cells to dual TCR paired T cells at the level of a large number of individual T cells, through mRNA expression of the α and β chains. In the past, this analysis was limited to a small number of T cells, requiring isolation of single T cells, PCR amplification of the α and β chains, and Sanger sequencing; (2) While analyzing TCR paired T cell characteristics, it also allows examination of mRNA expression levels of transcription factors in corresponding T cells through scRNA-seq.

      (2) The data confirm the presence of dual TCR Tregs in various tissues, with proportions ranging from 10.1% to 21.4%, aligning with earlier observations in αβ T cells.

      Thank you very much for your detailed review and suggestions. Early studies on dual TCR αβ T cells have been very limited in number, with reported proportions of dual TCR T cells ranging widely from 0.1% to over 30%. In contrast, scRNA+TCR-seq can monitor over 5,000 single and paired TCRs, including dual paired TCRs, in each sample, enabling more precise examination of the overall proportion of dual TCR αβ T cells. It is important to note that our analysis focuses on T cells paired with functional α and β chains, while T cells with non-functional chain pairings and those with a single functional chain without pairing were excluded from the total cell proportion analysis. Previous studies generally lacked the ability to determine expression levels of specific chains in T cells without dual TCR pairings.

      (3) Tissue-specific patterns of TCR gene usage are reported, which could be of interest to researchers studying T cell adaptation, although these were more rigorously analyzed in the original works.

      Thank you very much for your detailed review and suggestions. T cell subpopulations exhibit tissue specificity; thus, we conducted a thorough investigation into Treg cells from different tissue sites. This study builds upon the original by innovatively analyzing the differences in VDJ rearrangement and CDR3 characteristics of dual TCR Treg cells across various tissues. This provides new insights and directions for the potential existence of “new Treg cell subpopulations” in different tissue locations. The results of this analysis suggest the necessity of conducting functional experiments on dual TCR Treg cells at both the TCR protein level and the level of effector functional molecules.

      (4) Lack of Novelty: The primary findings do not substantially advance our understanding of dual TCR expression, as similar results have been reported previously in other contexts.

      Thank you for your detailed review and suggestions. Early research on dual TCR T cells primarily relied on transgenic mouse models and in vitro experiments, using limited TCR alpha chain or TCR beta chain antibody pairings. Flow cytometry was used to analyze a small number of T cells to estimate dual TCR T cell proportion. No studies have yet analyzed dual TCR Treg cell proportion, V(D)J recombination, and CDR3 characteristics at high throughput in physiological conditions. The scRNA+TCR-seq approach offers an opportunity to conduct extensive studies from an mRNA perspective. With high-throughput advantages of single-cell sequencing technology, researchers can analyze transcriptomic and TCR sequence characteristics of all dual TCR Treg cells within a study sample, providing new ideas and technical means for investigating dual TCR T cell proportions, characteristics, and origins under different physiological and pathological states.

      (5) Incomplete Evidence: The claims about tissue-specific differences lack sufficient controls (e.g., comparison with conventional T cells) and functional validation (e.g., cell surface expression of dual TCRs).

      Thank you for your detailed review and suggestions. This study indeed only analyzed dual  TCR Treg cells from different tissue locations based on the original manuscript, without a comparative analysis of other dual TCR T cell subsets corresponding to these tissue locations. The main reason for this is that, in current scRNA+TCR-seq studies of different tissue locations, unless specific T cell subsets are sorted and enriched, the number of T cells obtained from each subset is very low, making a detailed comparative analysis impossible. In the results of the original manuscript, we observed a relatively high proportion of dual TCR Treg cell populations in various tissues, with differences in TCR composition and transcription factor expression. Following the suggestions, we have included additional descriptions in R1, citing the study by Tuovinen et al., which indicates that the proportion of dual TCR Tregs in lymphoid tissues is higher than other T cell types. This will help understand the distribution characteristics of dual TCR Treg cells in different tissues and provide a basis for mRNA expression levels to conduct functional experiments on dual TCR Treg cells in different tissue locations.

      (6) Methodological Weaknesses: The diversity analysis does not account for sample size differences, and the clonal analysis conflates counts and clonotypes, leading to potential misinterpretation.

      We thank you for your review and suggestions. In response to your question about whether the diversity analysis considered the sample size issue, we conducted a detailed review and analysis. This study utilized the inverse Simpson index to evaluate TCR diversity of Treg cells. A preliminary analysis compared the richness and evenness of single TCR Treg cell and dual TCR Treg cell repertoires. The two datasets analyzed were from four mouse samples with consistent processing and sequencing conditions. However, when analyzing single TCR Tregs and dual TCR Tregs from various tissues, differences in detected T cell numbers by sequencing cannot be excluded from the diversity analysis. Following recommendations, we provided additional explanations in R1: CDR3 diversity analysis indicates TCR composition of dual TCR Treg cells exhibits diversity, similar to single TCR Treg cells; however, diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparison. Regarding the "clonal analysis" you mentioned, we define clonality based on unique TCR sequences; cells with identical TCR sequences are part of the same clone, with ≥2 counts defined as expansion. For example, in Blood, there are 958 clonal types and 1,228 cells, of which 449 are expansion cells. In R1, we systematically verified and revised clonal expansion cells across all tissue samples according to a unified standard.

      (7) Insufficient Transparency: The sequence analysis pipeline is inadequately described, and the study lacks reproducibility features such as shared code and data.

      Thank you for your review and suggestions. Based on the original manuscript, we have made corresponding detailed additions in R1, providing further elaboration on the analysis process of shared data, screening methods, research codes, and tools. This aims to offer readers a comprehensive understanding of the analytical procedures and results.

      (8) Weak Gene Expression Analysis: No statistical validation is provided for differential gene expression, and the UMAP plots fail to reveal meaningful clustering patterns.

      Thank you very much for your review and suggestions. Based on your recommendations, we conducted an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with statistical significance determined by Padj < 0.05. Regarding the clustering patterns in the UMAP plots, since the analyzed samples consisted of isolated Treg cell subpopulations that highly express immune suppression-related genes, we did not perform a more detailed analysis of subtypes and expression gene differences. This study primarily aims to explore the proportions of single TCR and dual TCR Treg cells from different tissue sources, as well as the characteristics of CDR3 composition, with a focus on showcasing the clustering patterns of samples from different tissue origins and various TCR pairing types.

      (9) A quick online search reveals that the same authors have repeated their approach of reanalysing other scientists' publicly available scRNA-VDJ-seq data in six other publications,In other words, the approach used here seems to be focused on quick re-analyses of publicly available data without further validation and/or exploration.

      Thank you for your review and suggestions. Most current studies utilizing scRNA+TCR-seq overlook analysis of TCR pairing types and related research on single TCR and dual TCR T cell characteristics. Through in-depth analysis of shared scRNA+TCR-seq data from multiple laboratories, we discovered a significant presence of dual TCR T cells in high-throughput T cell research results that cannot be ignored. In this study, we highlight the higher proportion of dual TCR Tregs in different tissue locations, which exhibits a certain degree of tissue specificity, suggesting these cells may participate in complex functional regulation of Tregs. This finding provides new ideas and a foundation for further research into dual TCR Treg functions. However, as reviewers pointed out, findings from scRNA+TCR-seq at the mRNA level require additional functional experiments on dual TCR T cells at the protein level. We have supplemented our discussion in R1 based on these suggestions.

      Reviewer #2 (Public review):

      (1) The existence of dual TCR expression by Tregs has previously been demonstrated in mice and humans (Reference #18 and Tuovinen. 2006. Blood. 108:4063; Schuldt. 2017. J Immunol. 199:33, both omitted from references). The presented results should be considered in the context of these prior important findings.

      Thank you very much for your review and suggestions. Based on the original manuscript, we have supplemented our reading, understanding, and citation of closely related literature (Tuovinen, 2006, Blood, 108:4063 (line 44,line175 in R1); Schuldt, 2017, J Immunol, 199:33 (line 44,line178 in R1)). We once again appreciate the valuable comments from the reviewers, and we will refer to these in our subsequent dual TCR T cell research.

      (2) This demonstration of dual TCR Tregs is notable, though the authors do not compare the frequency of dual TCR co-expression by Tregs with non-Tregs. This limits interpreting the findings in the context of what is known about dual TCR co-expression in T cells.

      Thank you very much for your review and suggestions. This analysis is primarily based on the scRNA+TCR-seq study of sorted Treg cells, where we found the proportions and distinguishing features of dual TCR Treg cells in different tissue sites. Given the diversity and complexity of Treg function, conducting a comparative analysis of the origins of dual TCR Treg cells and non-T cells with dual TCRs will be a meaningful direction. Currently, peripheral induced Treg cells can originate from the conversion of non-Treg cells; however, little is known about the sources and functions of dual TCR Treg cell subsets in both central and peripheral sites. In R1, we have supplemented the discussion regarding the possible origins and potential applications of the "novel dual TCR Treg" subsets.

      (3) Comparison of gene expression by single- and dual TCR Tregs is of interest, but as presented is difficult to interpret. Statistical analyses need to be performed to provide statistical confidence that the observed differences are true.

      Thank you very much for your review and suggestions. Based on your recommendations, we performed an initial differential expression analysis of the top 10 mRNA molecules in single TCR Treg and dual TCR Treg cells using the DESeq2 R package in R1, with a statistical significance threshold of Padj<0.05 for comparisons.

      (4) The interpretations of the gene expression analyses are somewhat simplistic, focusing on the single-gene expression of some genes known to have a function in Tregs. However, the investigators miss an opportunity to examine larger patterns of coordinated gene expression associated with developmental pathways and differential function in Tregs (Yang. 2015. Science. 348:589; Li. 2016. Nat Rev Immunol. Wyss. 2016. 16:220; Nat Immunol. 17:1093; Zenmour. 2018. Nat Immunol. 19:291).

      Thank you for your review and suggestions. This study is based on publicly available scRNA+TCR-seq data from different organ sites generated by the original authors, focusing on sorted and enriched Treg cells within each tissue sample. However, there was no corresponding research on other cell types in each tissue sample, preventing analysis of other cells and factors involved in development and differentiation of single TCR Treg and dual TCR Treg. The literature suggested by the reviewer indicates that development, differentiation, and function of Treg cells have been extensively studied, resulting in significant advances. It also highlights complexity and diversity of Treg origins and functions. This research aims to investigate "novel dual TCR Treg cell subpopulations" that may exhibit tissuespecific differences found in the original authors' studies of Treg cells across different organ sites. This suggests further experimental research into their development, differentiation, origin, and functional gene expression as an important direction, which we have supplemented in the discussion section of R1.

      Reviewer #3 (Public review):

      (1) Definition of Dual TCR and Validity of Doublet Removal:This study analyzes Treg cells with Dual TCR, but it is not clearly stated how the possibility of doublet cells was eliminated. The authors mention using DoubletFinder for detecting doublets in scRNA-seq data, but is this method alone sufficient?We strongly recommend reporting the details of doublet removal and data quality assessment in the Supplementary Data.

      Thank you very much for your review and suggestions. In the analysis of the shared scRNA+TCR-seq data across multiple laboratories, as you mentioned, this study employed the DoubletFinder R package to exclude suspected doublets. Additionally, we used the nCount values of individual cells (i.e., the total sequencing reads or UMI counts for each cell) as auxiliary parameters to further optimize the assessment of cell quality. Generally, due to the possibility that doublet cells may contain gene expression information from two or more cells, their nCount values are often abnormally high. In this study, all cells included in the analysis had nCount values not exceeding 20,000. Among the five tissue sample datasets, we further utilized hashtag oligonucleotide (HTO) labeling (where HTO labeling provides each cell with a unique barcode to differentiate cells from different tissue sources. By analyzing HTO labels, doublets and negative cells can be accurately identified) to eliminate doublets and negative cells.After the removal of chimeric cells, all samples exhibited T cells that possessed two or more TCR clones. This phenomenon validates the reliability of the methodological approach employed in this study and indicates that the analytical results accurately reflect the proportion of dual TCR T cells. Based on the recommendations of the reviewers, we have supplemented and clarified the methods and discussion sections in the manuscript. It is particularly noteworthy that in our analysis, the discussed dual TCR Treg cells and single TCR Treg cells specifically refer to those T cells that possess both functional α and β chains, which are capable of forming TCR. We have excluded from this analysis any Treg cells that possess only a single functional α or β chain and do not form TCR pairs, as well as those Treg cells in which the α or β chains involved in TCR pairing are non-functional.

      (2) In Figure 3D, the proportion of Dual TCR T cells (A1+A2+B1+B2) in the skin is reported to be very high compared to other tissues. However, in Figure 4C, the proportion appears lower than in other tissues, which may be due to contamination by non-Tregs. The authors should clarify why it was necessary to include non-Tregs as a target for analysis in this study. Additionally, the sensitivity of scRNA-seq and TCR-seq may vary between tissues and may also be affected by RNA quality and sequencing depth in skin samples, so the impact of measurement bias should be assessed.

      We deeply appreciate your review and constructive comments. Based on the original manuscript, we have further supplemented and elaborated on the uniqueness and relative proportions of double TCR T cell pairs in skin tissue samples in Section R1. Due to the scarcity of T cells in skin samples, we included some non-Treg cells during single-cell RNA sequencing and TCR sequencing to obtain a sufficient number of cells for effective analysis. The presence of non-regulatory T cells may indeed impact the statistical representation of double TCR T cells as well as the related comparative analyses, as noted by the reviewer. T cells with A1+A2+B1+B2 type double TCR pairings are primarily found within the non-regulatory T cell population in the skin. In response to this point, we have provided a detailed explanation of this analytical result in the revised manuscript R1. Furthermore, concerning the two datasets included in the study, we conducted a comparative analysis in R1, exploring how factors such as sequencing depth at different tissue sites might introduce biases in our findings, which we have thoroughly elaborated upon in the discussion section. We thank you once again for your valuable suggestions. 

      (3) Issue of Cell Contamination:In Figure 2A, the data suggest a high overlap between blood, kidney, and liver samples, likely due to contamination. Can the authors effectively remove this effect? If the dataset allows, distinguishing between blood-derived and tissue-resident Tregs would significantly enhance the reliability of the findings. Otherwise, it would be difficult to separate biological signals from contamination noise, making interpretation challenging.

      We thank you for your review and suggestions. We have carefully verified data sources for tissues such as blood, kidneys, and liver. In the study by Oliver T et al., various techniques were employed to differentiate between leukocytes from blood and those from tissues, ensuring accurate identification of leukocytes from tissue samples. First, anti-CD45 antibody was injected intravenously to label cells in the vasculature, verifying that analyzed cells were indeed resident in the tissue. Second, prior to dissection and cell collection, authors performed perfusion on anesthetized mice to reduce contamination of tissue samples by leukocytes from the vasculature. Additionally, during single-cell sequencing, authors utilized HTO technology to avoid overlap between cells from different tissues.

      Analysis of the scRNA+TCR-seq data shared by the original authors revealed highly overlapping TCR sequences in blood, kidney, and liver, despite distinct cell labels associated with each tissue. While these techniques minimize overlap of cells from different sources, they cannot completely rule out the potential impact of this technical issue. As suggested, we have provided additional clarification in R1 of the manuscript regarding this phenomenon of high overlap in the kidney, liver, and blood, indicating that the possibility of Treg migration from blood to kidney and liver cannot be entirely excluded.

      (4) Inconsistency Between CDR3 Overlap and TCR Diversity:The manuscript states that Single TCR Tregs have a higher CDR3 overlap, but this contradicts the reported data that Dual TCR Tregs exhibit lower TCR diversity (higher 1/DS score). Typically, when TCR diversity is low (i.e., specific clones are concentrated), CDR3 overlap is expected to increase. The authors should carefully address this discrepancy and discuss possible explanations.

      Thank you for your review and suggestions. Regarding the potential relationship between CDR3 overlap and TCR diversity, in samples with consistent sequencing depth, lower diversity indeed corresponds to a higher proportion of CDR3 overlap. In our analysis of scRNA+TCR-seq data, we found that single TCR Tregs exhibit both higher diversity and CDR3 overlap, seemingly presenting contradictory analytical results (i.e., dual TCR Tregs show lower TCR diversity and CDR3 overlap). In R1, we supplemented the analysis of possible reasons: the presence of multiple TCR chains in dual TCR Treg cells may lead to a higher uniqueness of CDR3 due to multiple rearrangements and selections, resulting in lower CDR3 overlap; the lower diversity of dual TCR Tregs may be related to the number of T cells sequenced in each sample. The CDR3 diversity analysis in this study merely suggests that the TCR composition of dual TCR Treg cells is diverse, similar to that of single TCR Tregs. However, the diversity indices of single TCR Tregs and dual TCR Tregs are not suitable for statistical comparative analysis. A more in-depth and specific analysis of the diversity and overlap of the VDJ recombination mechanisms and CDR3 composition in dual TCR Tregs during development will be an important technical means to elucidate the function of dual TCR Treg cells.

      (5) Functional Evaluation of Dual TCR Tregs:This study indicates gene expression differences among tissue-resident Dual TCR T cells, but there is no experimental validation of their functional significance. Including functional assays, such as suppression assays or cytokine secretion analysis, would greatly enhance the study's impact.

      We sincerely appreciate your review and suggestions: In this analysis of scRNA+TCR-seq data, we innovatively discovered a higher proportion of dual TCR Treg cells in different tissue sites, which exhibited differences in tissue characteristics. Furthermore, we conducted a comparative analysis of the homogeneity and heterogeneity between single TCR Treg and dual TCR Treg cells. This result provides a foundation for further research on the origin and characteristics of dual TCR Treg cells in different tissue sites, offering new insights for understanding the complexity and functional diversity of Treg cells. Based on your suggestions, we have supplemented R1 with the feasibility of further exploring the functions of tissue-resident dual TCR T cells and the necessity for potential application research.

      (6) Appropriateness of Statistical Analysis:When discussing increases or decreases in gene expression and cell proportions (e.g., Figure 2D), the statistical methods used (e.g., t-test, Wilcoxon, FDR correction) should be explicitly described. They should provide detailed information on the statistical tests applied to each analysis.

      Thank you for your review and suggestions: Based on the original manuscript, we have supplemented the specific statistical methods for the differences in cell proportions and gene expression in R1.

    1. eLife Assessment

      This study proposes an important new approach to analyzing cell-count data, which are often undersampled and cannot be accurately assessed using traditional statistical methods. The case studies presented in the article provide compelling evidence of the superiority of the proposed methodology over existing approaches, which could promote the use of Bayesian statistics among neuroscientists. The authors have taken steps to make the methodology accessible, although some implementation difficulties are likely to remain.

    2. Reviewer #1 (Public review):

      Summary:

      This work proposes a new approach to analyse cell-count data from multiple brain regions. Collecting such data can be expensive and time-intensive, so, more often than not, the dimensionality of the data is larger than the number of samples. The authors argue that Bayesian methods are much better suited to correctly analyse such data compared to classical (frequentist) statistical methods. They define a hierarchical structure, partial pooling, in which each observation contributes to the population estimate to more accurately explain the variance in the data. They present two case studies in which their method proves more sensitive in identifying regions where there are significant differences between conditions, which otherwise would be hidden.

      Strengths:

      The model is presented clearly, and the advantages of the hierarchical structure are strongly justified. Two alternative ways are presented to account for the presence of zero counts. The first involves the use of a horseshoe prior, which is the more flexible option, while the second involves a modified Poisson likelihood, which is better suited to datasets with a large number of zero counts, perhaps due to experimental artifacts. The results show a clear advantage of the Bayesian method for both case studies.<br /> The code is freely available, and it does not require a high-performance cluster to execute for smaller datasets. As Bayesian statistical methods become more accessible in various scientific fields, the whole scientific community will benefit from the transition away from p-values. Hierarchical Bayesian models are an especially useful tool that can be applied to many different experimental designs. However, while conceptually intuitive, their implementation can be difficult. The authors provide a good framework with room for improvement.

      Weaknesses:

      As with any Bayesian model, the choice of prior can significantly influence the results. The authors explain how the methodology can be adapted to different data properties, though selecting an appropriate prior or likelihood may not always be straightforward. They propose a 'standard workflow' as an alternative to traditional approaches, which could and should be used alongside established methods while Bayesian techniques continue to evolve and improve.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “Alternative possibilities are discussed regarding the prior and likelihood of the model. Given that the second case study inspired the introduction of the zero-inflation likelihood, it is not clear how applicable the general methodology is to various datasets. If every unique dataset requires a tailored prior or likelihood to produce the best results, the methodology will not easily replace more traditional statistical analyses that can be applied in a straightforward manner. Furthermore, the differences between the results produced by the two Bayesian models in case study 2 are not discussed. In specific regions, the models provide conflicting results (e.g., regions MH, VPMpc, RCH, SCH, etc.), which are not addressed by the authors. A third case study would have provided further evidence for the generalizability of the methodology.”

      We hope in this paper to propose a ‘standard workflow’ for these data; this standard workflow uses the horseshoe prior and we propose that this is the approach used to describe cell count data instead of the better established, but to our thinking, inefficient, t-testing approach.

      The horseshoe prior is robust and allows a partially-pooled model to used while weighing-up the contribution of different data points. This is an analogue of excluding outliers and, in any analysis it is normal to investigate further if there are points being excluded as outliers. Often this reveals a particular challenge with the data, in the case of the data here, there are a lot of zeros, indicating that some samples should be excluded because the preparation failed to tag cells rather than because there were no cells to tag. This idea behind the ZIP example is to show that the Bayesian method can allow for this sort of further investigation and, indeed, as the reviewer notes this sort of extended analysis is often bespoke, tailored to the data.

      We have clearly failed to explain that the ‘standard workflow’ we propose replace the more traditional methods is the first one we describe, with the horseshoe prior; this produces better results on both datasets than the traditional approach. However, we also feel it is useful to show how a more tailored follow-on can be useful; we need to make it clear that this is intended as an illustration of an ‘optional extra’ rather than a part of the more straightforward ‘standard workflow’.

      To make this clearer we have made altered the text in several locations:

      • end of Introduction: added clarifying sentence “Here, our aim is to introduce a ‘standard’ Bayesian model for cell count data. We illustrate the application of this model to two datasets, one related to neural activation and the other to developmental lineage. For the second dataset, we also demonstrate a second example extension Bayesian model.”

      • Section Hierarchical modeling: “Our goal in both cases is to quantify group differences in the data. We present a ‘standard’ hierarchical model. This model reflects the experimental features common to cell count experiments and reflects the hierarchical structure of cell count data; the standard model is designed to deal robustly and efficiently with noise. On some occasions, to reflect a specific hypotheses, the structure of a particular experiment or an observed source of noise, this model can be further refined or changed to target the analysis. We will give an example of this for our second dataset.”

      • Section Horseshoe prior: “The alternative is via a flexible prior such as the horseshoe Carvalho et al., 2010; Piironen and Vehtari, 2017. This more generic option may be suitable as a default ‘standard’ approach in the typical case where outliers are poorly understood.”

      • Discussion: word ‘standard’ added to sentence: “Our standard workflow uses a horseshoe prior, along with the partial pooling, this allows our model to deal effectively with outliers.”

      • Discussion: modified sentence “The horseshoe prior model workflow we have exhibited here is intended as a standard approach.”

      Indeed, because the horseshoe prior deals robustly with outliers, whereas the ZIP is intended to model the outliers, any substantial difference between the two should be examined carefully. The referee is right to point out that we have not explained this in any detail and has helpfully listed a few brain regions were there are differences. This is useful, particularly since the examples listed illustrate in a useful way the opportunities and hazards this sort of data presents. To address this, we have added a new version of Figure 6 to the revised manuscript

      Previously Figure 6 showed two example brain regions: MPN and TMd. We have now added MH and SCH to the figure, and new text commenting on the insights the plots provide, both in the Results and Discussion.

      Reviewer #2 (Public review):

      “A clearer link between the experimental data and model-structure terminology would be a benefit to the non-expert reader.”

      This is a very good point and we are acutely aware through our own work how difficult it can be moving between fields with different research goals, different scientific cultures and different technical vocabularies. Just as it can be difficult translating from one language to another without losing nuance and meaning, it can be a real challenge finding technical terms that are useful for the non-expert reader while retaining the precision the application requires! In the long run, we hope that, just as some of the very specialized vocabulary that surrounds frequentist statistics has become familiar to to the working experimental scientists, the precise terminology involved in Bayesian modelling will become familiar and transparent. However, in advance of that day, we have included a glossary of terms at the end of the main text, and have made numerous small tweaks to make sure that link between data and model terminology is clearer and better explained.

      Reviewer #1 (Recommendations fro the authors):

      (1) “I would strongly recommend that the authors include more case studies in the manuscript, and address the qualitative differences between the different versions of the model.”

      We agree that our method will only become established when it is applied to more datasets, we hope to contribute to further analysis and we know other people are already using the approach on their own data. We do, however, feel that adding more datasets to this paper will make it longer and more complex; the plan, instead, is to use the method on novel datasets to test specific hypotheses, so that the results will include novel scientific findings as well as adding another illustration of the Bayesian approach applied to data that is already well studied.

      (2) “Figure 6 is not discussed in the main text.”

      We had discussed the results presented in Figure 6 in the second paragraph of the section “Case study two – Ontogeny of inhibitory interneurons of the mouse thalamus”, however the reviewer is right in that we did not directly refer to the Figure – this was an oversight. In any case, in the revised manuscript we present a new version of Figure 6 (in response to above comment), which is now explicitly cited in the text.

      Revised Figure 6: Example data and inferences highlighting model discrepancies. On the left under ‘data’: boxplots with medians and interquartile ranges for the raw data for four example brain regions. The shape of each point pairs left and right hemisphere readings in each of the five animals. On the right under ‘inference’: HDIs and confidence intervals are plotted. Purple is the Bayesian horseshoe model, pink is the Bayesian ZIP model, and orange is the sample mean. The Bayesian estimates are not strongly influenced by the zero-valued observations (MPN, SCH, TMd) or large-valued outliers (MH) and have means close to the data median. This explains the advantage of the Bayesian results over the confidence interval.

      Reviewer #2 (Recommendations from the authors):

      (1) “This is a generally well-written methodology paper that also provides the underlying code as a resource. As a reviewer outside both cell-count modelling and hierarchical-Bayesian approaches (though with a general interest in the topics) I found the method a little difficult to follow and would have liked to have been left with a better understanding of how the method is applied to the data. For example, in Figure 1 we are introduced to brain region count, animal count, and “items”. Then in the next line: pooling, model, structure, population and etc in subsequent lines. It is not clear what the subscripts (the pools?) are referring to: are they different regions R or animals N? These terms need to be better linked to the data and/or trimmed. Having said that, the later results look like a solid contribution to the field with a significant reduction in uncertainty from the Bayesian approach over the frequentist one. A future version of the manuscript, therefore, would benefit from greater precision of language as well as an economy and greater focus of terms linking the method to the biology. This is particularly the case around the exposition parts in Figure 1, Figure 2, and the “Hierarchical modelling” section.”

      This is another important point. We have now made numerous small changes to tighten up the text in the paper, in response to both this point and the next point.

      (2) “Language throughout could be sharpened. Subjectivity like “surprising outliers” could be removed and quirky grammar like “often small, ten is a typical” improved. There are also typos “an rate” etc that should be tidied up.”

      As per previous response, we have made numerous tweaks and small improvements and feel that the paper is stronger in this respect.

      (3) “Figure 1 caption. “It is a spectrum that depends” Is spectrum the right word here? Also, “thicker stroke” what does this refer to? Wasn’t immediately clear. In A, why is the whole animal within the R bracket that signifies brain regions, and then the brain regions are within the N bracket that signifies whole animals? Apart from the teal colouring, what are the other coloured regions in the image referring to? Improving this first figure would greatly help a reader unfamiliar with the context of the approach.”

      We have replaced the word “spectrum” with “continuum”. We have replaced “ Observed quantities have been highlighted with a thicker stroke in the graphical model.” with “The observed data quantities, y<sub>i</sub> to y<sub>n</sub>, are highlighted with a thick line in the model diagrams”. We have added the following text to describe the red and green lines in panel A: “green and red lines indicate regions labeled as damaged”.

      (4) “On P2 there is no discussion of priors when running through the advantage of the Bayesian approach. Is this a choice or an oversight? Priors do have a role in the later analysis.”

      A short additional paragraph has been added to the introduction outlining the advantage of having a prior, but also noting that the obligation to pick a prior can be intimidating and that suggesting priors is one of the contributions of our paper: “A Bayesian model also includes a set of probability distributions, referred to as the prior, which represent those beliefs it is reasonable to hold about the statistical model parameters before actually doing the experiment. The prior can be thought of as an advantage, it allows us to include in our analysis our understanding of the data based on previous experiments. The prior also makes explicit in a Bayesian model assumptions that are often implicit in other approaches. However, having to design priors is often considered a challenge and here we hope to make this more straightforward by suggesting priors that are suitable for this class of data.”

      (5) “On P4 more explanation would help greatly. Formulas like 23*10*4 or 50*6+50*4 are presented without explanation. What are the various numbers being multiplied? Regions, animals? Again, a clearer link between biological data and model structure would be advantageous.”

      We have now modified this line to clearly state the numbers’ sources: “The index i runs over the full set of samples, which in this case comprises 23 brain regions ×10 animals ×4 groups ≈920 datapoints in the first study, and 50 brain regions × 6 HET animals + 50 brain regions × 4 KO animals ≈500 datapoints in the second.”

      (6) “P6 and Results. Is it possible to show examples of the data set sampled from? Perhaps an image or two for the two experiments. Both Figures 4 and 5 as they currently are could be made slightly smaller to provide space for a small explanatory sub-panel. This would help ground the results.”

      This is a good idea. We have now added heatmap visualisations of both entire datasets to revised versions of Figures 4 and 5 (assuming that this is what the reviewer was suggesting).

    1. eLife Assessment

      Using single-cell transcriptomic data from adult mouse inner ear hair cells, the authors identify the differences and similarities of the four hair cell types. They make an important finding: that vestibular hair cells can express many ciliary motility-related genes. Some hair cell kinocilia display motility, suggesting that the kinocilium of vestibular hair cells may function as an active force generator to increase sensitivity. The evidence is incomplete as to whether all kinocilia beat and what the function of kinocilia movement is.

    2. Reviewer #1 (Public review):

      Summary

      Xu et al. use transcriptomic comparisons of mouse cochlear and vestibular hair to show that the vestibular hair cells alone are enriched in gene expression for proteins necessary for cilia motility and to further argue that such motility is a normal function of the kinocilia.

      Background:

      Cilia are prominent in sensory receptors, including vertebrate photoreceptors, olfactory neurons, and mechanosensitive hair cells of the inner ear and lateral line. Cilia can be motile or nonmotile depending on their axonemal structure: motile cilia require dynein and the inner 2 singlet microtubules of the 9+2 array. Primary cilia, present early in development, are considered to have sensory functions and to be nonmotile (Mill et al., Nature Rev Gen 2023).

      In hair cells, the kinocilium anchors and polarizes the mechanosensitive hair bundle of specialized microvilli. The kinocilium matures from the primary cilium of a newborn hair cell; behind it, the bundle of mechanosensory microvilli rises in a descending staircase of rows. During maturation of the mammalian cochlea, all hair cells lose the kinocilium, though not the associated basal body. The consensus for many years has been that most vertebrate kinocilia, and especially mammalian kinocilia, are nonmotile, based largely on the lack of spontaneous motility in excised mammalian vestibular organs, but also on the impression that the rare examples of spontaneous beating motility even in non-mammalian hair cells are associated with deterioration of the preparation (Rüsch & Thurm 1990).

      Strengths

      In comparing RNA expression across the 4 major types of mouse hair cells - 2 cochlear and 2 vestibular - Xu et al. noted that some ciliary genes related to motility are expressed by vestibular but not cochlear hair cells. They curated the ciliary genes into types known to be associated with different aspects of beating motility, and also investigated the expression of genes typical of primary cilia, which are considered to have sensory and cell signaling functions and to be nonmotile. They add immunostaining to back up some of the RNA data, and also evaluate relative expression by neonatal mouse cochlear and vestibular hair cells from a published dataset. The focus on kinociliary genes is an appropriate use of the comparative expression data for cochlear and vestibular hair cells, and the paper overall is readable and interesting. The transcriptome data are rounded off by comparing the authors' results in adult hair cells with published neonatal mouse cochlear and vestibular transcriptomes.

      Weaknesses:

      (1) Data:

      a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rusch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that perhaps the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

    4. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Data:

      a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      Our single-cell RNA-seq findings show that genes related to motile cilia are specifically expressed in vestibular hair cells. This has not been demonstrated before. We have also provided supporting evidence using electrophysiology and imaging from bullfrogs and mice. Although no ultrastructural images of mouse vestibular kinocilia were provided in our study, transmission electron micrograph of mouse vestibular kinocilia has been published (O’Donnell and Zheng, 2022). The mouse vestibular kinocilia have a “9+2” microtubule configuration with nine doublet microtubules surrounding two central singlet microtubules. This finding contrasts with a previous study, which demonstrated that the vestibular kinocilia from guinea pigs lack central singlet microtubules and inner dynein arms, whereas outer dynein arms and radial spokes are present (Kikuchi et al., 1989). The central pair of microtubules is absent at the end of the bullfrog saccular kinocilium (Fig. 7A).  We would like to point out that the dual identity of primary and motile cilia is not just based on the TEM images. The kinocilium has long been considered a specialized cilium, and its role as a primary cilium during development has been demonstrated before (Moon et al., 2020; Shi et al., 2022).  

      In most motile cilia, the central pair complex (CPC) does not originate directly from the basal body; instead, it begins a short distance above the transition zone, a feature that already illustrates variation in CPC assembly across systems (Lechtreck et al., 2013). The CPC can also show variation in its spatial extent: for example, in mammalian sperm axonemes, it can terminate before reaching the distal end of the axoneme (Fawcett and Ito, 1965). In addition, CPC orientation differs across organisms: in metazoans and Trypanosoma, the CPC is fixed relative to the outer doublets, whereas in Chlamydomonas and ciliates it twists within the axoneme (Lechtreck et al., 2013). Such variation has been described in multiple motile cilia and flagella and is therefore not unique to vestibular kinocilia. What appears more unusual in our data is the organization at the distal tip, where a distinct distal head is present, similar to cilia tip morphologies recently described in human islet cells (Polino et al., 2023). Although this feature is intriguing, we interpret it primarily as a structural signature rather than as evidence for a specialized motile adaptation, and we will moderate our interpretation accordingly in the revision.

      b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      We aimed to show that kinocilia in neonatal cochlear and vestibular hair cells are largely similar, except that neonatal cochlear hair cells lack key genes and proteins required for the motile apparatus. While these genes (e.g., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) appear more highly expressed in P2 cochlear hair cells, they are not uniquely associated with the axoneme. For example, Dynll1/2 and Dynlrb1 are components of the cytoplasmic dynein-1 complex (Pfister et al., 2006), Cetn2 has multiple basic cellular functions beyond cilia (e.g., centrosome organization, DNA repair), and Mdh1 encodes a cytosolic malate dehydrogenase involved in central metabolic pathways such as the citric acid cycle and malate–aspartate shuttle. This contrasts with axonemal dyneins, which are uniquely required for cilia motility. To avoid ambiguity, we will mark such cytoplasmic or multifunctional genes with stars in both Figure 5G and Figure 6D together with legend in the revised manuscript.

      Although those genes (i.e., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) are highly expressed in neonatal cochlear hair cells, key genes for motile machinery are not detected. For example, Dnah6, Dnah5, and Wdr66 are not expressed in the P2 cochlear hair cells.  Dnah6 and Dnah5 encode axonemal dynein and are part of inner and outer dynein arms while Wdr66 is a component of radial spokes. Importantly, we did not detect the expression of CCDC39 and CCDC40 in kinocilia of P2 cochlear hair cells.  Axonemal CCDC39 and CCDC40 are the molecular rulers that organize the axonemal structure in the 96-nm repeating interactome and are required for the assembly of IDAs and N-DRC for ciliary motility (Becker-Heck et al., 2011; Merveille et al., 2011; Oda et al., 2014). We will modify Figure 6D to highlight the key difference between P2 cochlear and vestibular hair cells in the revised manuscript. We will also revise the text so that the key differences will clearly be described.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rusch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Spontaneous flagella-like rhythmic beating of kinocilia in vestibular HCs in frogs and eels (Flock et al., 1977; Rüsch and Thurm, 1990) and in zebrafish early otic vesicle (Stooke-Vaughan et al., 2012; Wu et al., 2011) has been reported previously. Based on Rüsch and Thurm (1990), spontaneous kinocilia motility occurred under non-physiological conditions and was interpreted as a sign of cellular deterioration rather than a normal feature. We speculate that deterioration under non-physiological conditions may lead to the disruption of lateral links between the kinocilium and the stereociliary bundle, effectively unloading the kinocilium and allowing it to move more freely. Additionally, fluctuations in intracellular ATP levels may contribute, as ciliary motility is highly ATP-dependent; when ATP is depleted, beating ceases. Similar phenomena have been documented in respiratory epithelia, where ciliary activity can temporarily pause. Nevertheless, the fact that kinocilia can exhibit spontaneous motility under these conditions indicates that they possess the motile machinery necessary for such beating. Irrespective of the condition, cilia without the molecular machinery required for motility will not be able to move.

      We agree with the reviewer that, based on the present data, it is difficult to know the functional role of kinocilia and whether the presence of such autonomous rhythm would interfere with temporal fidelity. Spontaneous bundle motion, driven by the active process associated with mechanotransduction, was observed in bullfrog saccular hair cells (Benser et al., 1996; Martin et al., 2003). We will revise the discussion to clarify this important point of the reviewer. Specifically, we will emphasize that our observations of ciliary beating in the ex vivo conditions may not reflect its properties in the mature in vivo context, but rather a byproduct of motile machinery clearly present in the kinocilia. We speculate that this machinery in mature hair cells could operate in a more subtle mode—modulating the rigor state of dynein arms or related axonemal structures to influence kinociliary mechanics and, in turn, bundle stiffness in response to stimuli or signaling cues. Such a mechanism could either enhance sensitivity or introduce filtering properties, thereby contributing to the fine control of mechanosensory function without compromising temporal fidelity. Future studies using loss-of-function approach will be needed to reveal the unexplored role(s) of kinocilia for vestibular hair cells in vertebrates. 

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

      We thank the reviewer for these excellent suggestions. We agree that kinociliary motility could plausibly serve roles during development, for example by guiding hair bundle formation or by contributing to early mechanosensitivity and spontaneous activity before mature stimulation mechanisms are established. It is also possible that the motility machinery represents a latent capacity in mature vestibular hair cells that could be reactivated under stress or pathological conditions. We will revise the Discussion to address these possibilities and to provide a more nuanced consideration of whether the observed motility is normal and what potential functions it might serve.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that perhaps the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

      We would like to thank reviewer 2 for his/her comments and hope that the datasets provided in this manuscript will be a useful resource for researchers in the auditory and vestibular neuroscience community.

      Joint Recommendations:

      We will make changes in the revision based on the joint recommendations of the two reviewers.

      References

      Becker-Heck, A., Zohn, I.E., Okabe, N., Pollock, A., Lenhart, K.B., Sullivan-Brown, J., McSheene, J., Loges, N.T., Olbrich, H., Haeffner, K., Fliegauf, M., Horvath, J., Reinhardt, R., Nielsen, K.G., Marthin, J.K., Baktai, G., Anderson, K.V., Geisler, R., Niswander, L., Omran, H., Burdine, R.D., 2011. The coiled-coil domain containing protein CCDC40 is essential for motile cilia function and left-right axis formation. Nat Genet 43, 79–84. https://doi.org/10.1038/ng.727

      Benser, M.E., Marquis, R.E., Hudspeth, A.J., 1996. Rapid, Active Hair Bundle Movements in Hair Cells from the Bullfrog’s Sacculus. J. Neurosci. 16, 5629–5643. https://doi.org/10.1523/JNEUROSCI.16-18-05629.1996

      Fawcett, D.W., Ito, S., 1965. The fine structure of bat spermatozoa. American Journal of Anatomy 116, 567–609. https://doi.org/10.1002/aja.1001160306

      Flock, Å., Flock, B., Murray, E., 1977. Studies on the Sensory Hairs of Receptor Cells in the Inner Ear. Acta Oto-Laryngologica 83, 85–91. https://doi.org/10.3109/00016487709128817

      Kikuchi, T., Takasaka, T., Tonosaki, A., Watanabe, H., 1989. Fine structure of guinea pig vestibular kinocilium. Acta Otolaryngol 108, 26–30.https://doi.org/10.3109/00016488909107388

      Lechtreck, K.-F., Gould, T.J., Witman, G.B., 2013. Flagellar central pair assembly in Chlamydomonas reinhardtii. Cilia 2, 15. https://doi.org/10.1186/2046-2530-2-15

      Martin, P., Bozovic, D., Choe, Y., Hudspeth, A.J., 2003. Spontaneous Oscillation by Hair Bundles of the Bullfrog’s Sacculus. J. Neurosci. 23, 4533–4548. https://doi.org/10.1523/JNEUROSCI.23-11-04533.2003

      Merveille, A.-C., Davis, E.E., Becker-Heck, A., Legendre, M., Amirav, I., Bataille, G., Belmont, J., Beydon, N., Billen, F., Clément, A., Clercx, C., Coste, A., Crosbie, R., de Blic, J., Deleuze, S., Duquesnoy, P., Escalier, D., Escudier, E., Fliegauf, M., Horvath, J., Hill, K., Jorissen, M., Just, J., Kispert, A., Lathrop, M., Loges, N.T., Marthin, J.K., Momozawa, Y., Montantin, G., Nielsen, K.G., Olbrich, H., Papon, J.-F., Rayet, I., Roger, G., Schmidts, M., Tenreiro, H., Towbin, J.A., Zelenika, D., Zentgraf, H., Georges, M., Lequarré, A.-S., Katsanis, N., Omran, H., Amselem, S., 2011. CCDC39 is required for assembly of inner dynein arms and the dynein regulatory complex and for normal ciliary motility in humans and dogs. Nat Genet 43, 72–78. https://doi.org/10.1038/ng.726

      Moon, K.-H., Ma, J.-H., Min, H., Koo, H., Kim, H., Ko, H.W., Bok, J., 2020. Dysregulation of sonic hedgehog signaling causes hearing loss in ciliopathy mouse models. eLife 9, e56551. https://doi.org/10.7554/eLife.56551

      Oda, T., Yanagisawa, H., Kamiya, R., Kikkawa, M., 2014. A molecular ruler determines the repeat length in eukaryotic cilia and flagella. Science 346, 857–860. https://doi.org/10.1126/science.1260214

      O’Donnell, J., Zheng, J., 2022. Vestibular Hair Cells Require CAMSAP3, a Microtubule Minus-End Regulator, for Formation of Normal Kinocilia. Front Cell Neurosci 16, 876805. https://doi.org/10.3389/fncel.2022.876805

      Pfister, K.K., Shah, P.R., Hummerich, H., Russ, A., Cotton, J., Annuar, A.A., King, S.M., Fisher, E.M.C., 2006. Genetic Analysis of the Cytoplasmic Dynein Subunit Families. PLOS Genetics 2, e1. https://doi.org/10.1371/journal.pgen.0020001

      Polino, A.J., Sviben, S., Melena, I., Piston, D.W., Hughes, J.W., 2023. Scanning electron microscopy of human islet cilia. Proceedings of the National Academy of Sciences 120, e2302624120. https://doi.org/10.1073/pnas.2302624120

      Rüsch, A., Thurm, U., 1990. Spontaneous and electrically induced movements of ampullary kinocilia and stereovilli. Hearing Research 48, 247–263. https://doi.org/10.1016/0378-5955(90)90065-W

      Shi, H., Wang, H., Zhang, C., Lu, Y., Yao, J., Chen, Z., Xing, G., Wei, Q., Cao, X., 2022. Mutations in OSBPL2 cause hearing loss associated with primary cilia defects via sonic hedgehog signaling [WWW Document]. https://doi.org/10.1172/jci.insight.149626

      Stooke-Vaughan, G.A., Huang, P., Hammond, K.L., Schier, A.F., Whitfield, T.T., 2012. The role of hair cells, cilia and ciliary motility in otolith formation in the zebrafish otic vesicle. Development 139, 1777–1787. https://doi.org/10.1242/dev.079947

      Wu, D., Freund, J.B., Fraser, S.E., Vermot, J., 2011. Mechanistic Basis of Otolith Formation during Teleost Inner Ear Development. Developmental Cell 20, 271–278. https://doi.org/10.1016/j.devcel.2010.12.00

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      The authors aim to explore the effects of the electrogenic sodium-potassium pump (Na<SUP>+</SUP>/K<SUP>+</SUP>ATPase) on the computational properties of highly active spiking neurons, using the weakly-electric fish electrocyte as a model system. Their work highlights how the pump's electrogenicity, while essential for maintaining ionic gradients, introduces challenges in neuronal firing stability and signal processing, especially in cells that fire at high rates. The study identifies compensatory mechanisms that cells might use to counteract these effects, and speculates on the role of voltage dependence in the pump's behavior, suggesting that Na<SUP>+</SUP>/K<SUP>+</SUP>-ATPase could be a factor in neuronal dysfunctions and diseases

      Strengths:

      (1) The study explores a less-examined aspect of neural dynamics-the effects of Na<SUP>+</SUP>/K<SUP>+</SUP>-ATPase electrogenicity. It offers a new perspective by highlighting the pump's role not only in ion homeostasis but also in its potential influence on neural computation.

      (2) The mathematical modeling used is a significant strength, providing a clear and controlled framework to explore the effects of the Na<SUP>+</SUP>/K<SUP>+</SUP>-ATPase on spiking cells. This approach allows for the systematic testing of different conditions and behaviors that might be difficult to observe directly in biological experiments.

      (3) The study proposes several interesting compensatory mechanisms, such as sodium leak channelsand extracellular potassium buffering, which provide useful theoretical frameworks for understanding how neurons maintain firing rate control despite the pump's effects.

      Weaknesses:

      (1) While the modeling approach provides valuable insights, the lack of experimental data to validate the model's predictions weakens the overall conclusions.

      (2)The proposed compensatory mechanisms are discussed primarily in theoretical terms without providing quantitative estimates of their impact on the neuron's metabolic cost or other physiological parameters.

      Comments on revisions:

      The revised manuscript is notably improved.

      We thank the reviewer for their concise and accurate summary and appreciate the constructive feedback on the article’s strengths and weaknesses. Experimental work is beyond the scope of our modeling-based study. However, we would like our work to serve as a framework for future experimental studies into the role of the electrogenic pump current (and its possible compensatory currents) in disease, and its role in evolution of highly specialized excitable cells (such as electrocytes).

      Quantitative estimates of metabolic costs in this study are limited to the ATP that is required to fuel the Na<SUP>+</SUP>/K<SUP>+</SUP> pump. By integrating the net pump current over time and dividing by one elemental charge, one can find the rate of ATP that is consumed by the Na<SUP>+</SUP>/K<SUP>+</SUP> pump for either compensatory mechanism. The difference in net pump current is thus proportional to ATP consumption, which allows for a direct comparison of the cost efficiency of the Na<SUP>+</SUP>/K<SUP>+</SUP> pump for each proposed compensatory mechanism. The Na<SUP>+</SUP>/K<SUP>+</SUP> pump is however not the only ATP-consuming element in the electrocyte, and some of the compensatory mechanisms induce other costs related to cell ‘housekeeping’ or presynaptic processes. We now added a section in the appendix titled ‘Considerations on metabolic costs of compensatory mechanisms’ (section 11.4), where we provide rough estimates on the influence of the compensatory mechanisms on the total metabolic costs of the cell and membrane space occupation. Although we argue that according these rough estimates, the impact of discussed compensatory mechanisms could be significant, due to the absence of more detailed experimental quantification, a plausible quantitative cost estimate on the whole cell level remains beyond the scope of this article.

      Reviewer #1 (Recommendations for the authors):

      I just have a few recommendations on the updated manuscript.

      (1) When exploring the different roles of Na<SUP>+</SUP>/K<SUP>+</SUP>-ATPase in the Results section, the authors employed many different models. For instance, the voltage equation on page 15, voltage equation (2) on page 22, voltage equation (12) on page 24, voltage equation (30) on page 32, and voltage equation (38) on page 35 are presented as the master equations for their respective biophysical models. Meanwhile, the phase models are presented on page 29 and page 33. I would recommend that the authors clearly specify which equations correspond to each subsection of the Results section and explicitly state which equations were used to generate the data in each figure. This would help readers more easily follow the connections between the models, the results, and the figures.

      We thank the reviewer for pointing out that the links of the different voltage equations to the results could be expressed more explicitly in the article. All simulations were done using the ‘master equation’  expressed in Eq. 2, and the other voltage equations that are specified in the article (in the new version of the article Eqs. 13, 31, and 39) are reformulations of Eq. 2 to analytically show different properties of the voltage equation (Eq. 2). This has now been mentioned in the article when formulating the voltage equations, and the equation for the total leak current (in the new version Eq. 3) has been added for completeness.

      (2) The authors may want to revisit their description and references concerning Eigenmannia virescens. For example, wave-type weakly electric fish (e.g., Eigenmannia) and pulse-type weakly electric fish (e.g., Gymnotus carapo) exhibit large differences, making references 52-55 may be inappropriate for subsection 4.3.1, as these studies focus on Gymnotus carapo. Additionally, even within wave-type species, chirp patterns vary. For example, Eigenmannia can exhibit short "pauses"-type chirps, whereas Apteronotus leptorhynchus (another waver-form fish) does not (https://pubmed.ncbi.nlm.nih.gov/14692494/).

      We thank the reviewer for pointing this out. The citations and phrasing in sections 4.3.1 and 4.3.2 have been updated to specifically refer to the weakly electric fish e. Virescens.

      (3) Table on page 21: Please explain why the parameter value (13.5mM) of [Na<SUP>^</SUP>+]_{in} is 10 timeslarger than its value (1.35mM) in reference [26]? How does this value (13.5mM) compare with the range of variable [Na<SUP>^</SUP>+]_{in} in equation (6)?

      The intracellular sodium concentration in reference [26] was reported to be 1.35 mM, but the authors also reported an extracellular sodium concentration of 120 mM, and a sodium reversal potential of 55 mV. Upon calculating the sodium reversal potential, we found that an intracellular sodium concentration of 1.35 mM would give a sodium reversal potential of 113 mV. An intracellular sodium concentration of 13.5 mM, on the other hand, leads to the reported and physiological reversal potential of 55 mV. This has now been clarified in the article, and the connection between this value and Eq. 6 (Eq. 7 in the new version) has also been clarified.

      Reviewer #2 (Public review):

      Summary:

      The paper by Weerdmeester, Schleimer, and Schreiber uses computational models to present the biological constraints under which electrocytes - specialized, highly active cells that facilitate electro-sensing in weakly electric fish-may operate. The authors suggest potential solutions that these cells could employ to circumvent these constraints.

      Electrocytes are highly active or spiking (greater than 300Hz) for sustained periods (for minutes to hours), and such activity is possible due to an influx of sodium and efflux of potassium ions into these cells after each spike. The resulting ion imbalance must be restored, which in electrocytes, as with many other biological cells, is facilitated by the Na-K pumps at the expense of biological energy, i.e., ATP molecules. For each ATP molecule the pump uses, three positively charged sodium ions from the intracellular space are exchanged for two positively charged potassium ions from the extracellular space. This creates a net efflux of positive ions into the extracellular space, resulting in hyperpolarized potentials for the cell over time. For most cells, this does not pose an issue, as their firing rate is much slower, and other compensatory mechanisms and pumps can effectively restore the ion imbalances. However, in the electrocytes of weakly electric fish, which spike at exceptionally high rates, the net efflux of positive ions presents a challenge. Additionally, these cells are involved in critical communication and survival behaviors, underscoring their essential role in reliable functioning.

      In a computational model, the authors test four increasingly complex solutions to the problem of counteracting the hyperpolarized states that occur due to continuous NaK pump action to sustain baseline activity. First, they propose a solution for a well-matched Na leak channel that operates in conjunction with the NaK pump, counteracting the hyperpolarizing states naturally. Their model shows that when such an orchestrated Na leak current is not included, quick changes in the firing rates could have unexpected side effects. Secondly, they study the implications of this cell in the context of chirps-a means of communication between individual fish. Here, an upstream pacemaking neuron entrains the electrocyte to spike, which ceases to produce a so-called chirp - a brief pause in the sustained activity of the electrocytes. In their model, the authors demonstrate that including the extracellular potassium buffer is necessary to obtain a reliable chirp signal. Thirdly, they tested another means of communication in which there was a sudden increase in the firing rate of the electrocyte, followed by a decay to the baseline. For this to occur reliably, the authors emphasize that a strong synaptic connection between the pacemaker neuron and the electrocyte is necessary. Finally, since these cells are energy-intensive, they hypothesize that electrocytes may have energy-efficient action potentials, for which their NaK pumps may be sensitive to the membrane voltages and perform course correction rapidly.

      Strengths:

      The authors extend an existing electrocyte model (Joos et al., 2018) based on the classical Hodgkin and Huxley conductance-based models of sodium and potassium currents to include the dynamics of the sodium-potassium (NaK) pump. The authors estimate the pump's properties based on reasonable assumptions related to the leak potential. Their proposed solutions are valid and may be employed by weakly electric fish. The authors explore theoretical solutions to electrosensing behavior that compound and suggest that all these solutions must be simultaneously active for the survival and behavior of the fish. This work provides a good starting point for conducting in vivo experiments to determine which of these proposed solutions the fish employ and their relative importance. The authors include testable hypotheses for their computational models.

      Weaknesses:

      The model for action potential generation simplifies ion dynamics by considering only sodium and potassium currents, excluding other ions like calcium. The ion channels considered are assumed to be static, without any dynamic regulation such as post-translational modifications. For instance, a sodium-dependent potassium pump could modulate potassium leak and spike amplitude (Markham et al., 2013).

      This work considers only the sodium-potassium (NaK) pumps to restore ion gradients. However, in many cells, several other ion pumps, exchangers, and symporters are simultaneously present and actively participate in restoring ion gradients. When sodium currents dominate action potentials, and thus when NaK pumps play a critical role, such as the case in Eigenmannia virescens, the present study is valid. However, since other biological processes may find different solutions to address the pump's non-electroneutral nature, the generalizability of the results in this work to other fast-spiking cell types is limited. For example, each spike could include a small calcium ion influx that could be buffered or extracted via a sodium-calcium exchanger.

      We thank the reviewer for the detailed summary and the updated identified strengths and weaknesses. The current article indeed focuses on and isolates the interplay between sodium currents, potassium currents, and sodium-potassium pump currents. As discussed in section 5.1, in excitable cells where these currents are the main players in action-potential generation, the results presented in this article are applicable. The contribution of post-translational effects of ion channels, other ionic currents, and other active transporters and pumps, could be exciting avenues for further studies

      .

      Reviewer #2 (Recommendations for the authors):

      Thank you for addressing my comments.

      All the figures are now consistent. The color schema used is clear.

      The methods and discussions expansions improve the paper.

      Including the model assumptions and simplifications is appreciated.

      Including internal references is helpful.

      The equations are clear, and the references have been fixed.

      I am content with the changes. I have updated my review accordingly.

      We thank the reviewer for their initial constructive comments that lead to the significant improvement of the article.

      Page : 3 Line : 113 Author : Unknown Author 07/24/2025 

      Although this is technically correct, the article is about electrocommunication signals and does not focus on sensing.

      Page : 3 Line : 153 Author : Unknown Author 07/24/2025

      electrocommunication

      Page : 4 Line : 164 Author : Unknown Author 07/24/2025 

      Judging from the cited article, I think this should be a sodium-dependent potassium current.

    2. Reviewer #2 (Public review):

      Summary:

      The paper by Weerdmeester, Schleimer, and Schreiber uses computational models to present the biological constraints under which electrocytes - specialized, highly active cells that facilitate electro-sensing in weakly electric fish-may operate. The authors suggest potential solutions that these cells could employ to circumvent these constraints.

      Electrocytes are highly active or spiking (greater than 300Hz) for sustained periods (for minutes to hours), and such activity is possible due to an influx of sodium and efflux of potassium ions into these cells after each spike. The resulting ion imbalance must be restored, which in electrocytes, as with many other biological cells, is facilitated by the Na-K pumps at the expense of biological energy, i.e., ATP molecules. For each ATP molecule the pump uses, three positively charged sodium ions from the intracellular space are exchanged for two positively charged potassium ions from the extracellular space. This creates a net efflux of positive ions into the extracellular space, resulting in hyperpolarized potentials for the cell over time. For most cells, this does not pose an issue, as their firing rate is much slower, and other compensatory mechanisms and pumps can effectively restore the ion imbalances. However, in the electrocytes of weakly electric fish, which spike at exceptionally high rates, the net efflux of positive ions presents a challenge. Additionally, these cells are involved in critical communication and survival behaviors, underscoring their essential role in reliable functioning.

      In a computational model, the authors test four increasingly complex solutions to the problem of counteracting the hyperpolarized states that occur due to continuous NaK pump action to sustain baseline activity. First, they propose a solution for a well-matched Na leak channel that operates in conjunction with the NaK pump, counteracting the hyperpolarizing states naturally. Their model shows that when such an orchestrated Na leak current is not included, quick changes in the firing rates could have unexpected side effects. Secondly, they study the implications of this cell in the context of chirps-a means of communication between individual fish. Here, an upstream pacemaking neuron entrains the electrocyte to spike, which ceases to produce a so-called chirp - a brief pause in the sustained activity of the electrocytes. In their model, the authors demonstrate that including the extracellular potassium buffer is necessary to obtain a reliable chirp signal. Thirdly, they tested another means of communication in which there was a sudden increase in the firing rate of the electrocyte, followed by a decay to the baseline. For this to occur reliably, the authors emphasize that a strong synaptic connection between the pacemaker neuron and the electrocyte is necessary. Finally, since these cells are energy-intensive, they hypothesize that electrocytes may have energy-efficient action potentials, for which their NaK pumps may be sensitive to the membrane voltages and perform course correction rapidly.

      Strengths:

      The authors extend an existing electrocyte model (Joos et al., 2018) based on the classical Hodgkin and Huxley conductance-based models of sodium and potassium currents to include the dynamics of the sodium-potassium (NaK) pump. The authors estimate the pump's properties based on reasonable assumptions related to the leak potential. Their proposed solutions are valid and may be employed by weakly electric fish. The authors explore theoretical solutions to electrosensing behavior that compound and suggest that all these solutions must be simultaneously active for the survival and behavior of the fish. This work provides a good starting point for conducting in vivo experiments to determine which of these proposed solutions the fish employ and their relative importance. The authors include testable hypotheses for their computational models.

    3. Reviewer #1 (Public review):

      Summary:

      The authors aim to explore the effects of the electrogenic sodium-potassium pump (Na+/K+-ATPase) on the computational properties of highly active spiking neurons, using the weakly-electric fish electrocyte as a model system. Their work highlights how the pump's electrogenicity, while essential for maintaining ionic gradients, introduces challenges in neuronal firing stability and signal processing, especially in cells that fire at high rates. The study identifies compensatory mechanisms that cells might use to counteract these effects, and speculates on the role of voltage dependence in the pump's behavior, suggesting that Na+/K+-ATPase could be a factor in neuronal dysfunctions and diseases

      Strengths:

      (1) The study explores a less-examined aspect of neural dynamics-the effects of Na+/K+-ATPase electrogenicity. It offers a new perspective by highlighting the pump's role not only in ion homeostasis but also in its potential influence on neural computation.

      (2) The mathematical modeling used is a significant strength, providing a clear and controlled framework to explore the effects of the Na+/K+-ATPase on spiking cells. This approach allows for the systematic testing of different conditions and behaviors that might be difficult to observe directly in biological experiments.

      (3) The study several interesting compensatory mechanisms, such as sodium leak channels and extracellular potassium buffering, which provide useful theoretical frameworks for understanding how neurons maintain firing rate control despite the pump's effects.

      Comments on revisions:proposes

      The revised manuscript is notably improved.

    4. eLife Assessment

      This important study provides new insights into the lesser-known effects of the sodium-potassium pump on how nerve cells process signals, particularly in highly active cells like those of weakly electric fish. The computational methods used to establish the claims in this work are compelling and can be used as a starting point for further studies.

    1. eLife Assessment

      This important study presents a sequence-based method for predicting drug-interacting residues in intrinsically disordered proteins (IDPs), addressing a significant challenge in understanding small-molecule:IDP interactions. The findings have solid support through examples underscoring the role of aromatic interactions. While predicted binding sites remain coarse, validation was done on a total of 10 IDPs at varying depths. The method builds on the authors' previous work and, with ad hoc modifications, is poised to benefit this emerging field.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDP, based on their recent work, to predict the transverse relaxation rates (R2) of IDP trained on 45 IDP sequences and their corresponding R2 values. The discovery is that the IDPs interact with drugs mostly using aromatic residues that are easy to understand, as most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential druginteracting residues in IDP. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations and NMR studies are needed.

      Weaknesses:

      The method does not depend on the information of binding compounds, which may give general features of IDP-drug binding. However, due to the size and chemical structures of the compounds (for example, how many aromatic rings), the number of interacting residues varies, which is not considered in this work. Lacking specific information may restrict its application in compound optimization, aiming to derive specific and potent binding compounds.

      We fully recognize that different compounds may have different interaction propensity profiles along the IDP sequence. In future studies, we will investigate compound-specific parameter values. The limiting factor is training data, but such data are beginning to be available.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts druginteracting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow

      Weaknesses:

      (1) The DIRseq method is based on SeqDYN, which itself is a simple (which I do not mean as a negative - simple is good!) statistical predictor for R2 relaxation rates. The challenge here is that R2 rates cover a range of timescales, so the physical intuition as to what exactly elevated R2 values mean is not necessarily consistent with "drug interacting". Presumably, the authors are not using the helix boost component of SeqDYN here (it would be good to explicitly state this). This is not necessarily a weakness, but I think it would behove the authors to compare a few alternative models before settling on the DIRseq method, given the somewhat ad hoc modifications to SeqDYN to get DIRseq.

      Actually, the factors that elevate R2 are well-established. These are local interactions and residual secondary structures (if any). The basic assumption of our method is that intra-IDP interactions that elevate R2 convert to IDP-drug interactions. This assumption was supported by our initial observation that the drug interaction propensity profiles predicted using the original SeqDYN parameters already showed good agreement with CSP profiles. We only made relatively small adjustments to the parameters to improve the agreement. Indeed we did not apply the helix boost portion of SeqDYN to DIRseq, and now state as such (p. 4, second last paragraph). We now also compare DIRseq with several alternative models, as summarized in new Table S2.

      Specifically, the authors previously showed good correlation between the stickiness parameter of Tesei et al and the inferred "q" parameter for SeqDYN; as such, I am left wondering if comparable accuracy would be obtained simply by taking the stickiness parameters directly and using these to predict "drug interacting residues", at which point I'd argue we're not really predicting "drug interacting residues" as much as we're predicting "sticky" residues, using the stickiness parameters. It would, I think, be worth the authors comparing the predictive power obtained from DIRseq with the predictive power obtained by using the lambda coefficients from Tesei et al in the model, local density of aromatic residues, local hydrophobicity (note that Tesei at al have tabulated a large set of hydrophobicity scores!) and the raw SeqDYN predictions. In the absence of lots of data to compare against, this is another way to convince readers that DIRseq offers reasonable predictive power.

      We now compare predictions of these various parameter sets, and report the results in Table S2.  In short, among all the tested parameter sets, DIRseq has the best performance as measured by (1) strong correlations between prediction scores and CSPs and (2) high true positives and low false positives (p. 7-9).

      (2) Second, the DIRseq is essentially SeqDYN with some changes to it, but those changes appear somewhat ad hoc. I recognize that there is very limited data, but the tweaking of parameters based on physical intuition feels a bit stochastic in developing a method; presumably (while not explicitly spelt out) those tweaks were chosen to give better agreement with the very limited experimental data (otherwise why make the changes?), which does raise the question of if the DIRseq implementation of SeqDYN is rather over-parameterized to the (very limited) data available now? I want to be clear, the authors should not be critiqued for attempting to develop a model despite a paucity of data, and I'm not necessarily saying this is a problem, but I think it would be really important for the authors to acknowledge to the reader the fact that with such limited data it's possible the model is over-fit to specific sequences studied previously, and generalization will be seen as more data are collected.

      We have explained the rationale for the parameter tweaks, which were limited to q values for four amino-acid types, i.e., to deemphasize hydrophobic interactions and slightly enhance electrostatic interactions (p. 4-5). We now add that these tweaks were motivated by observations from MD simulations of drug interactions with a-syn (ref 13). As already noted in the response to the preceding comment, we now also present results for the original parameter values as well as for when the four q values are changed one at a time.

      (3) Third, perhaps my biggest concern here is that - implicit in the author's assumptions - is that all "drugs" interact with IDPs in the same way and all drugs are "small" (motivating the change in correlation length). Prescribing a specific length scale and chemistry to all drugs seems broadly inconsistent with a world in which we presume drugs offer some degree of specificity. While it is perhaps not unexpected that aromatic-rich small molecules tend to interact with aromatic residues, the logical conclusion from this work, if one assumes DIRseq has utility, is that all IDRs bind drugs with similar chemical biases. This, at the very least, deserves some discussion.

      The reviewer raises a very important point. In Discussion, we now add that it is important to further develop DIRseq to include drug-specific parameters when data for training become available (p. 12-13). To illustrate this point, we use drug size as a simple example, which can be modeled by making the b parameter dependent on drug molecule size.

      (4) Fourth, the authors make some general claims in the introduction regarding the state of the art, which appear to lack sufficient data to be made. I don't necessarily disagree with the author's points, but I'm not sure the claims (as stated) can be made absent strong data to support them. For example, the authors state: "Although an IDP can be locked into a specific conformation by a drug molecule in rare cases, the prevailing scenario is that the protein remains disordered upon drug binding." But is this true? The authors should provide evidence to support this assertion, both examples in which this happens, and evidence to support the idea that it's the "prevailing view" and specific examples where these types of interactions have been biophysically characterized.

      We now cite nine studies showing that IDPs remain disordered upon drug binding.

      Similarly, they go on to say:

      "Consequently, the IDP-drug complex typically samples a vast conformational space, and the drug molecule only exhibits preferences, rather than exclusiveness, for interacting with subsets of residues." But again, where is the data to support this assertion? I don't necessarily disagree, but we need specific empirical studies to justify declarative claims like this; otherwise, we propagate lore into the scientific literature. The use of "typically" here is a strong claim, implying most IDP complexes behave in a certain way, yet how can the authors make such a claim? 

      Here again we add citations to support the statement.

      Finally, they continue to claim:

      "Such drug interacting residues (DIRs), akin to binding pockets in structured proteins, are key to optimizing compounds and elucidating the mechanism of action." But again, is this a fact or a hypothesis? If the latter, it must be stated as such; if the former, we need data and evidence to support the claim.

      We add citations to both compound optimization and mechanism of action.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should compare the sequences of the IDPs in the case studies with the 45 IDPs in training the SeqDYN model to make sure that they are not included in the training dataset or are highly homologous.

      Please note that the data used for training SeqDYN were R2 rates, which are independent of the property being studied here, i.e., drug interacting residues. Therefore whether the IDPs studied here were in the training set for SeqDYN is immaterial.

      (2) The authors manually tuned four parameters in SeqDYN to develop the model for predicting drug-interacting residues without giving strict testing or explanations. More explanations, testing of more values, and ablation testing should be given.

      As responded above, we now both expand the explanation and present more test results.

      (3) The authors changed the q values of L, I, and M to the value of V. What are the results if these values are not changed?

      These results are shown in Table S2 (entry named SeqDYN_orig).

      (4) Only one b value is chosen based on the assumption that a drug molecule interacts with 3-4 residues at a time. However, the number of interacting residues is related to the size of the drug molecule. Adjusting the b value with the size of the ligand may provide improvement. It is better to test the influence of adjusting b values. At least, this should be discussed.

      Good point! We now state that b potentially can be adjusted according to ligand size (p. 12-13). In addition, we also show the effect of varying b on the prediction results (Table S2; p. 8, last paragraph).

      (5) The authors add 12 Q to eliminate end effects. However, explanations on why 12 Qs are chosen should be given. How about other numbers of Q or using other residues (e.g., the commonly used residues in making links, like GS/PS or A?

      As we already explained, “Gln was selected because its 𝑞 value is at the middle of the 20 𝑞 values.” (p. 5, second paragraph). Also, 12 Qs are sufficient to remove any end effects; a higher number of Qs does not make any difference.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors make reference to the "C-terminal IDR" in cMyc, but the region they note is found in the bHLH DNA binding domain (which falls from residue ~370-420).

      We now clarify that this region is disordered on its own but form a helix-loop-loop structure upon heterodimerization with Max (p. 11, last paragraph).

      (2) Given the fact that X-seq names are typically associated with sequencing-based methods, it's perhaps confusing to name this method DIRseq?

      We appreciate the reviewer’s point, but by now the preprint posted in bioRxiv is in wide circulation, and the DIRseq web server has been up for several months, so changing its name would cause a great deal of confusion.

      (3) I'd encourage the authors just to spell out "drug interacting residues" and retain an IDR acronym for IDRs. Acronyms rarely make writing clearer, and asking folks to constantly flip between IDR and DIR is asking a lot of an audience (in this reviewer's opinion, anyway).

      The reviewer makes a good point; we now spell out “drug-interacting residues”.

      (4) The assumption here is that CSPs result from direct drug:IDR interactions. However, CSPs result from a change in the residue chemical environment, which could in principle be an indirect effect (e.g., in the unbound state, residues A and B interact; in the bound state, residue A is now free, such that it experiences a CSP despite not engaging directly). While I recognize such assumptions are commonly made, it behoves the authors to explicitly make this point so the reader understands the relationship between CSPs and binding.

      We did add caveats of CSP in Introduction (p. 3, second paragraph).

      (5) On the figures, please label which protein is which figure, as well as provide a legend for the annotations on the figures (red line, blue bar, cyan region, etc.)

      We now label protein names in Fig. 1. For annotation of display items, it is also made in the Figs. 2 and 3 captions; we now add it to the Fig. 4 caption.

      (6) abstract: "These successes augur well for deciphering the sequence code for IDP-drug binding." - This is not grammatically correct, even if augur were changed to agree. Suggest rewriting.

      “Augur well” means to be a good sign (for something). We use this phrase here in this meaning.

      (6) page 5: "we raised the 𝑞 value of Asp to be the same as that of Glu" → suggested "increased" instead of raised.

      We have made the suggested change.

      (7) The authors should consider releasing the source code (it is available via the .js implementation on the server, but this is not very transferable/shareable, so I'd encourage the authors to provide a stand-alone implementation that's explicitly shareable).

      We have now added a link for the user to download the source code.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors introduce DIRseq, a fast, sequence-based method that predicts drug-interacting residues (DIRs) in IDPs without requiring structural or drug information. DIRseq builds on the authors' prior work looking at NMR relaxation rates, and presumes that those residues that show enhanced R2 values are the residues that will interact with drugs, allowing these residues to be nominated from the sequence directly. By making small modifications to their prior tool, DIRseq enables the prediction of residues seen to interact with small molecules in vivo.

      Strengths:

      The preprint is well written and easy to follow.

    4. Reviewer #1 (Public review):

      Summary:

      The authors developed a sequence-based method to predict drug-interacting residues in IDP, based on their recent work, to predict the transverse relaxation rates (R2) of IDP trained on 45 IDP sequences and their corresponding R2 values. The discovery is that the IDPs interact with drugs mostly using aromatic residues that are easy to understand, as most drugs contain aromatic rings. They validated the method using several case studies, and the predictions are in accordance with chemical shift perturbations and MD simulations. The location of the predicted residues serves as a starting point for ligand optimization.

      Strengths:

      This work provides the first sequence-based prediction method to identify potential drug-interacting residues in IDP. The validity of the method is supported by case studies. It is easy to use, and no time-consuming MD simulations and NMR studies are needed.

      Weaknesses:

      The method does not depend on the information of binding compounds, which may give general features of IDP-drug binding. However, due to the size and chemical structures of the compounds (for example, how many aromatic rings), the number of interacting residues varies, which is not considered in this work. Lacking specific information may restrict its application in compound optimization, aiming to derive specific and potent binding compounds.

      Comments on revised version:

      I'm satisfied with the authors' response and the public review does not need further changes.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      The authors examine the effect of cell-free chromatin particles (cfChPs) derived from human serum or from dying human cells on mouse cells in culture and propose that these cfChPs can serve as vehicles for cell-to-cell active transfer of foreign genetic elements. The work presented in this paper is intriguing and potentially important, but it is incomplete. At this stage, the claim that horizontal gene transfer can occur via cfChPs is not well supported because it is only based on evidence from one type of methodological approach (immunofluorescence and fluorescent in situ hybridization (FISH)) and is not validated by whole genome sequencing.

      We disagree with the eLife assessment that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate technology. Rather, eLife should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells.

      The reviewer is mistaken. We do not claim that the internalized cfChPs are incorporated into the nucleus. We show throughout the paper that the cfChPs perform their novel functions autonomously outside the genome without being incorporated into the nucleus. This is clearly seen in all our chromatin fibre images, metaphase spreads and our video abstract. Occasionally, when the cfChPs fluorescent signal overlie the chromosomes, we have been careful to state that the cfChPs are associated with the chromosomes without implying that they have integrated.

      These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Again the reviewer makes the same mistake. We do not claim that the internalized cfChPs are incorporated into the chromosomes. We have addressed this issue above.

      We have a feeling that the reviewer has not understood our work – which is the discovery of “satellite genomes” which function autonomously outside the nuclear genome.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChiP-seq (to validate chromatin state).

      We disagree with the reviewer that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate approach. Rather, the reviewer should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed on Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer has raised a related issue below and we have responded to both of them together.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I thank the authors for taking my comments and those of the other reviewer into account and for adding new material to this new version of the manuscript. Among other modifications/additions, they now mention that they think that NIH3T3 cells treated with cfChPs die out after 250 passages because of genomic instability which might be caused by horizontal transfer of cfChPs DNA into the genome of treated cells (pp. 45-46, lines 725-731). However, no definitive formal proof of genomic instability and horizontal transfer is provided.

      We mention that the NIH3T3 cells treated with cfChPs die out after 250 passages in response to the reviewer’s earlier comment “Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism”.

      We have agreed with the reviewer and have simply speculated that the cells may die because of extreme genomic instability. We have left it as a speculation without diverting our paper in a different direction to prove genomic instability.

      The authors now refer to an earlier study they conducted in which they Illumina-sequenced NIH3T3 cells treated with cfChPs (pp. 48, lines. 781-792). This study revealed the presence of human DNA in the mouse cell culture. However, it is unclear to me how the author can conclude that the human DNA was inside mouse cells (rather than persisting in the culture medium as cfChPs) and it is also unclear how this supports horizontal transfer of human DNA into the genome of mouse cells. Horizontal transfer implies integration of human DNA into mouse DNA, through the formation of phosphodiester bounds between human nucleotides and mouse nucleotides. The previous Illumina-sequencing study and the current study do not show that such integration has occured. I might be wrong but I tend to think that DNA FISH signals showing that human DNA lies next to mouse DNA does not necessarily imply that human DNA has integrated into mouse DNA. Perhaps such signals could result from interactions at the protein level between human cfChPs and mouse chromatin?

      With due respect, our earlier genome sequencing study that the reviewer refers to was done on two single cell clones developed following treatment with cfChPs. So, the question of cfChPs lurking in the culture medium does not arise.

      The authors should be commended for doing so many FISH experiments. But in my opinion, and as already mentioned in my earlier review of this work, horizontal transfer of human DNA into mouse DNA should first be demonstrated by strong DNA sequencing evidence (multiple long and short reads supporting human/mouse breakpoints; discarding technical DNA chimeras) and only then eventually confirmed by FISH.

      As mentioned earlier, we disagree with the reviewer that our study is incomplete because we did not perform whole genome sequencing. Tens of thousands of genomes have been sequenced, and yet they have failed to detect the presence of the numerous “satellite genomes” that we describe in our paper. To that extent whole genome sequencing has proved to be an inappropriate approach. Rather, the reviewer should have commended us for the numerous control experiments that we have done to ensure that our FISH probes and antibodies are target specific and do not cross-react.

      Regarding my comment on the quantity of human cfChPs that has been used for the experiments, the authors replied that they chose this quantity because it worked in a previous study. Could they perhaps explain why they chose this quantity in the earlier study? Is there any biological reason to choose 10 ng and not more or less? Is 10 ng realistic biologically? Could it be that 10 ng is orders of magnitude higher than the quantity of cfChPs normally circulating in multicellular organisms and that this could explain, at least in part, the results obtained in this study?

      The reviewer again raises the same issue to which we have already addressed in our revised manuscript. To quote “We chose to use 10ng based on our earlier report in which we had obtained robust biological effects such as activation of DDR and activation of apoptotic pathways using this concentration of cfChPs (Mittra I et. al., 2015)”.

      It is also mentioned in the response that RNA-seq has been performed on mouse cells treated with cfChPs, and that this confirms human-mouse fusion (genomic integration). Since these results are not included in the manuscript, I cannot judge how robust they are and whether they reflect a biological process rather than technical issues (technical chimeras formed during the RNA-seq protocol is a well-known artifact). In any case, I do not think that genomic integration can be demonstrated through RNA-seq as junction between human and mouse RNA could occur at the RNA level (i.e. after transcription). RNA-seq could however show whether human-mouse chimeras that have been validated by DNA-sequencing are expressed or not.

      We did perform transcriptome sequencing as suggested earlier by the reviewer, but realized that the amount of material required to be incorporated into the manuscript to include “material and methods”, “results”, “discussion”, “figures” and “legends to figures” and “supplementary figures and tables” would be so massive that it will detract from the flow of our work and hijack it in a different direction. We have, therefore, decided to publish the transcriptome results as a separate manuscript.

      Given these comments, I believe that most of the weaknesses I mentioned in my review of the first version of this work still hold true.

      An important modification is that the work has been repeated in other cell lines, hence I removed this criticism from my earlier review.

      Additional changes made

      (1) We have now rewritten the “Abstract” to 250 words to fit in eLife’s instructions. (It was not possible to reduce the word count further.

      (2) We have provided the Video 1 as separate file instead of link.

      (3) Some of Figure Supplements (which were stand-alone) are now given as main figures. We have re-arranged Figures and Figure Supplements in accordance with eLife’s instructions.

      (4) We have now provided a list of the various cell lines used in this study, their tissue origin and procurement source in Supplementary File 3.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Horizontal gene transfer is the transmission of genetic material between organisms through ways other than reproduction. Frequent in prokaryotes, this mode of genetic exchange is scarcer in eukaryotes, especially in multicellular eukaryotes. Furthermore, the mechanisms involved in eukaryotic HGT are unknown. This article by Banerjee et al. claims that HGT occurs massively between cells of multicellular organisms. According to this study, the cell free chromatin particles (cfChPs) that are massively released by dying cells are incorporated in the nucleus of neighboring cells. These cfChPs are frequently rearranged and amplified to form concatemers, they are made of open chromatin, expressed, and capable of producing proteins. Furthermore, the study also suggests that cfChPs transmit transposable elements (TEs) between cells on a regular basis, and that these TEs can transpose, multiply, and invade receiving cells. These conclusions are based on a series of experiments consisting in releasing cfChPs isolated from various human sera into the culture medium of mouse cells, and using FISH and immunofluorescence to monitor the state and fate of cfChPs after several passages of the mouse cell line.

      Strengths:

      The results presented in this study are interesting because they may reveal unsuspected properties of some cell types that may be able to internalize free-circulating chromatin, leading to its chromosomal incorporation, expression, and unleashing of TEs. The authors propose that this phenomenon may have profound impacts in terms of diseases and genome evolution. They even suggest that this could occur in germ cells, leading to within-organism HGT with long-term consequences.

      Weaknesses:

      The claims of massive HGT between cells through internalization of cfChPs are not well supported because they are only based on evidence from one type of methodological approach: immunofluorescence and fluorescent in situ hybridization (FISH) using protein antibodies and DNA probes. Yet, such strong claims require validation by at least one, but preferably multiple, additional orthogonal approaches. This includes, for example, whole genome sequencing (to validate concatemerization, integration in receiving cells, transposition in receiving cells), RNA-seq (to validate expression), ChiP-seq (to validate chromatin state).

      We have responded to this criticism under “Reviewer #1 (Recommendations for the authors, item no. 1-4)”.

      Another weakness of this study is that it is performed only in one receiving cell type (NIH3T3 mouse cells). Thus, rather than a general phenomenon occurring on a massive scale in every multicellular organism, it could merely reflect aberrant properties of a cell line that for some reason became permeable to exogenous cfChPs. This begs the question of the relevance of this study for living organisms.

      We have responded to this criticism under “Reviewer #1 (Recommendations for the authors, item no. 6)”.

      Should HGT through internalization of circulating chromatin occur on a massive scale, as claimed in this study, and as illustrated by the many FISH foci observed in Fig 3 for example, one would expect that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome for a given organism. Yet, telomere-to-telomere genomes have been produced for many eukaryote species, calling into question the conclusions of this study.

      The reviewer is right in expecting that the level of somatic mosaicism may be so high that it would prevent assembling a contiguous genome. This is indeed the case, and we find that beyond ~ 250 passages the cfChPs treated NIH3T3 cells begin to die out apparently become their genomes have become too unstable for survival. This point will be highlighted in the revised version (pp. 45-46, lines 725-731).

      Reviewer #2 (Public review):

      I must note that my comments pertain to the evolutionary interpretations rather than the study's technical results. The techniques appear to be appropriately applied and interpreted, but I do not feel sufficiently qualified to assess this aspect of the work in detail.

      I was repeatedly puzzled by the use of the term "function." Part of the issue may stem from slightly different interpretations of this word in different fields. In my understanding, "function" should denote not just what a structure does, but what it has been selected for. In this context, where it is unclear if cfChPs have been selected for in any way, the use of this term seems questionable.

      We agree. We have removed the term “function” wherever we felt we had used it inappropriately.

      Similarly, the term "predatory genome," used in the title and throughout the paper, appears ambiguous and unjustified. At this stage, I am unconvinced that cfChPs provide any evolutionary advantage to the genome. It is entirely possible that these structures have no function whatsoever and could simply be byproducts of other processes. The findings presented in this study do not rule out this neutral hypothesis. Alternatively, some particular components of the genome could be driving the process and may have been selected to do so. This brings us to the hypothesis that cfChPs could serve as vehicles for transposable elements. While speculative, this idea seems to be compatible with the study's findings and merits further exploration.

      We agree with the reviewer’s viewpoint. We have replaced the term “predatory genome” with a more realistic term “satellite genome” in the title and throughout the manuscript. We have also thoroughly revised the discussion section and elaborated on the potential role of LINE-1 and Alu elements carried by the concatemers in mammalian evolution. (pp. 46-47, lines 743-756).

      I also found some elements of the discussion unclear and speculative, particularly the final section on the evolution of mammals. If the intention is simply to highlight the evolutionary impact of horizontal transfer of transposable elements (e.g., as a source of new mutations), this should be explicitly stated. In any case, this part of the discussion requires further clarification and justification.

      As mentioned above, we have revised the “discussion” section taking into account the issues raised by the reviewer and highlighted the potential role of cfChPs in evolution by acting as vehicles of transposable elements.

      In summary, this study presents important new findings on the behavior of cfChPs when introduced into a foreign cellular context. However, it overextends its evolutionary interpretations, often in an unclear and speculative manner. The concept of the "predatory genome" should be better defined and justified or removed altogether. Conversely, the suggestion that cfChPs may function at the level of transposable elements (rather than the entire genome or organism) could be given more emphasis.

      As mentioned above, we have replaced the term “predatory genome” with “satellite genome” and revised the “discussion” section taking into account the issues raised by the reviewer.

      Reviewer #1 (Recommendations for the authors):

      (1) I strongly recommend validating the findings of this study using other approaches. Whole genome sequencing using both short and long reads should be used to validate the presence of human DNA in the mouse cell line, as well as its integration into the mouse genome and concatemerization. Breakpoints between mouse and human DNA can be searched in individual reads. Finding these breakpoints in multiple reads from two or more sequencing technologies would strengthen their biological origin. Illumina and ONT sequencing are now routinely performed by many labs, such that this validation should be straightforward. In addition to validating the findings of the current study, it would allow performance of an in-depth characterization of the rearrangements undergone by both human cfChPs and the mouse genome after internalization of cfChPs, including identification of human TE copies integrated through bona fide transposition events into the mouse genome. New copies of LINE and Alu TEs should be flanked by target site duplications. LINE copies should be frequently 5' truncated, as observed in many studies of somatic transposition in human cells.

      (2) Furthermore, should the high level of cell-to-cell HGT detected in this study occur on a regular basis within multicellular organisms, validating it through a reanalysis of whole genome sequencing data available in public databases should be relatively easy. One would expect to find a high number of structural variants that for some reason have so far gone under the radar.

      (3) Short and long-read RNA-seq should be performed to validate the expression of human cfChPs in mouse cells. I would also recommend performing ChIP-seq on routinely targeted histone marks to validate the chromatin state of human cfChPs in mouse cells.

      (4) The claim that fused human proteins are produced in mouse cells after exposing them to human cfChPs should be validated using mass spectrometry.

      The reviewer has suggested a plethora of techniques to validate our findings. Clearly, it is neither possible to undertake all of them nor to incorporate them into the manuscript. However, as suggested by the reviewer, we did conduct transcriptome sequencing of cfChPs treated NIH3T3 cells and were able to detect the presence of human-human fusion sequences (representing concatemerisation) as well as human-mouse fusion sequences (representing genomic integration). However, we realized that the amount of material required to be incorporated into the manuscript to include “material and methods”, “results”, “discussion”, “figures” and “legends to figures” and “supplementary figures and tables” would be so massive that it will detract from the flow of our work and hijack it in a different direction. We have, therefore, decided to publish the transcriptome results as a separate manuscript. However, to address the reviewer’s concerns we have now referred to results of our earlier whole genome sequencing study of NIH3T3 cells similarly treated with cfChPs wherein we had conclusively detected the presence of human DNA and human Alu sequences in the treated mouse cells. These findings have now been added as an independent paragraph (pp. 48, lines. 781-792).

      (5) It is unclear from what is shown in the paper (increase in FISH signal intensity using Alu and L1 probes) if the increase in TE copy number is due to bona fide transposition or to amplification of cfChPs as a whole, through mechanisms other than transposition. It is also unclear whether human TEs end up being integrated into the neighboring mouse genome. This should be validated by whole genome sequencing.

      Our results suggest that TEs amplify and increase their copy number due to their association with DNA polymerase and their ability to synthesize DNA (Figure 14a and b). Our study design cannot demonstrate transposition which will require real time imaging.

      The possibility of incorporation of TEs into the mouse genome is supported by our earlier genome sequencing work, referred to above, wherein we detected multiple human Alu sequences in the mouse genome (pp. 48, lines. 781-792).

      (6) In order to be able to generalize the findings of this study, I strongly encourage the authors to repeat their experiments using other cell types.

      We thank the reviewer for this suggestion. We have now used four different cell lines derived from four different species and demonstrated that horizontal transfer of cfChPs occur in all of them suggesting that it is a universal phenomenon. (pp. 37, lines 560-572) and (Supplementary Fig. S14a-d).

      We have also mentioned this in the abstract (pp. 3, lines 52-54).

      (7) Since the results obtained when using cfChPs isolated from healthy individuals are identical to those shown when using cfChPs from cancer sera, I wonder why the authors chose to focus mainly on results from cancer-derived cfChPs and not on those from healthy sera.

      Most of the experiments were conducted using cfChPs isolated from cancer patients because of our especial interest in cancer, and our earlier results (Mittra et al., 2015) which had shown that cfChPs isolated from cancer patients had significantly greater activity in terms of DNA damage and activation of apoptotic pathways than those isolated from healthy individuals. We have now incorporated the above justification on (pp. 6, lines. 124-128).

      (8) Line 125: how was the 10-ng quantity (of human cfChPs added to the mouse cell culture) chosen and how does it compare to the quantity of cfChPs normally circulating in multicellular organisms?

      We chose to use 10ng based on our earlier report in which we had obtained robust biological effects such as activation of DDR and apoptotic pathways using this concentration of cfChPs (Mittra I et. al. 2015). We have now incorporated the justification of using this dose in our manuscript (pp. 51-52, lines. 867-870).

      (9) Could the authors explain why they repeated several of their experiments in metaphase spreads, in addition to interphase?

      We conducted experiments on metaphase spreads in addition to those on chromatin fibres because of the current heightened interest in extra-chromosomal DNA in cancer, which have largely been based on metaphase spreads. We were interested to see how the cfChP concatemers might relate to the characteristics of cancer extrachromosomal DNA and whether the latter in fact represent cfChPs concatemers acquired from surrounding dying cancer cells. We have now mentioned this on pp. 7, lines 150-155.

      (10) Regarding negative controls consisting in checking whether human probes cross-react with mouse DNA or proteins, I suggest that the stringency of washes (temperature, reagents) should be clearly stated in the manuscript, such that the reader can easily see that it was identical for controls and positive experiments.

      We were fully aware of these issues and were careful to ensure that washing steps were conducted meticulously. The careful washing steps have been repeatedly emphasized under the section on “Immunofluorescence and FISH” (pp. 54-55, lines. 922-944).

      (11) I am not an expert in Immuno-FISH and FISH with ribosomal probes but it can be expected that ribosomal RNA and RNA polymerase are quite conserved (and thus highly similar) between humans and mice. A more detailed explanation of how these probes were designed to avoid cross-reactivity would be welcome.

      We were aware of this issue and conducted negative control experiment to ensure that the human ribosomal RNA probe and RNA polymerase antibody did not cross-react with mouse. Please see Supplementary Fig. S4c.

      (12) Finally, I could not understand why the cfChPs internalized by neighboring cells are called predatory genomes. I could not find any justification for this term in the manuscript.

      We agree and this criticism has also been made by #Reviewer 2. We have now replaced the term “predatory” genomes with “satellite” genomes.

      Reviewer #2 (Recommendations for the authors):

      (1) P2 L34: The term "role" seems to imply "what something is supposed to do" (similar to "function"). Perhaps "impact" would be more neutral. Additionally, "poorly defined" is vague-do you mean "unknown"?

      We thank the reviewer for this suggestion. We have now rephrased the sentence to read “Horizontal gene transfer (HGT) plays an important evolutionary role in prokaryotes, but it is thought to be less frequent in mammals.” (pp. 2, lines. 26-27).

      (2) P2 L35: It seems that the dash should come after "human blood."

      Thank you, we have changed the position of the dash (pp. 2, line. 29).

      (3) P2 L37: Must we assume these structures have a function? Could they not simply be side effects of other processes?

      We think this is a matter of semantics, especially since we show that cfChPs once inside the cell perform many functions such as replication, DNA synthesis, RNA synthesis, protein synthesis etc. We, therefore, think the word “function” is not inappropriate.

      (4) Abstract: After reading the abstract, I am unclear on the concept of a "predatory genome." Based on the summarized results, it seems one cannot conclude that these elements provide any adaptive value to the genome.

      We agree. We have now replaced the term “predatory” genomes with a more realistic term viz. “satellite” genomes.

      (5) Video abstract: The video abstract does not currently stand on its own and needs more context to be self-explanatory.

      Thank you for pointing this out. We have now created a new and much more professional video with more context which we hope will meet with the reviewer’s approval.

      (6) P4 L67: Again, I am uncertain that HGT should be said to have "a role" in mammals, although it clearly has implications and consequences. Perhaps "role" here is intended to mean "consequence"?

      We have now changed the sentence to read as follows “However, defining the occurrence of HGT in mammals has been a challenge” (pp. 4, line. 73).

      (7) P6 L111: The phrase "to obtain a new perspective about the process of evolution" is unclear. What exactly is meant by this statement?

      We have replaced this sentence altogether which now reads “The results of these experiments are presented in this article which may help to throw new light on mammalian evolution, ageing and cancer” (pp. 5-6, lines 116-118).

      (8) P38 L588: The term "predatory genome" has not been defined, making it difficult to assess its relevance.

      This issue has been addressed above.

      (9) P39 L604: The statement "transposable elements are not inherent to the cell" suggests that some TEs could originate externally, but this does not rule out that others are intrinsic. In other words, TEs are still inherent to the cell.

      This part of the discussion section has been rewritten and the above sentence has been deleted.

      (10) P39 L609: The phrase "may have evolutionary functions by acting as transposable elements" is unclear. Perhaps it is meant that these structures may serve as vehicles for TEs?

      This sentence has disappeared altogether in the revised discussion section.

      (11) P41 L643: "Thus, we hypothesize ... extensively modified to act as foreign genetic elements." This sentence is unclear. Are the authors referring to evolutionary changes in mammals in general (which overlooks the role of standard mutational processes)? Or is it being proposed that structural mutations (including TE integrations) could be mediated by cfChPs in addition to other mutational mechanisms?

      We have replaced this sentence which now reads “Thus, “within-self” HGT may occur in mammals on a massive scale via the medium of cfChP concatemers that have undergone extensive and complex modifications resulting in their behaviour as “foreign” genetic elements” (pp. 47, lines 763-766).

      (12) P41 L150: The paragraph beginning with "It has been proposed that extreme environmental..." transitions too abruptly from HGT to adaptation. Is it being proposed that cfChPs are evolutionary processes selected for their adaptive potential? This idea is far too speculative at this stage and requires clarification.

      We agree. This paragraph has been removed.

      (13) P43 L681: This summary appears overly speculative and unclear, particularly as the concept of a "predatory genome" remains undefined and thus cannot be justified. It suggests that cfChPs represent an alternative lifestyle for the entire genome, although alternative explanations seem far more plausible at this point.

      We have now replaced the term “predatory” genome with “satellite” genome. The relevant part of the summary section has also been partially revised (pp. 49-50, lines 817-831).

      Changes independent of reviewers’ comments.

      We have made the following additions / modifications.

      (1) The abstract has been modified and it’s “conclusion” section has been rewritten.

      (2) Section 1.14 has been newly added together with accompanying Figures 15 a,b and c.

      (3) The “Discussion” section has been greatly modified and parts of it has been rewritten.

    1. eLife Assessment

      This fundamental study reveals that aging in yeast leads to chromosome mis-segregation due to asymmetric partitioning of chromosomes, driven by disruption of the nuclear pore complex and pre-mRNA leakage. The findings are convincingly supported by carefully-designed experimental data with a combination of genetic, molecular biology and cell biology approaches.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors explore a novel mechanism linking aging to chromosome mis-segregation and aneuploidy in yeast cells. They reveal that, in old yeast mother cells, chromosome loss occurs through asymmetric partitioning of chromosomes to daughter cells, a process coupled with the inheritance of an old Spindle Pole Body. Remarkably, the authors identify that remodeling of the nuclear pore complex (NPC), specifically the displacement of its nuclear basket, triggers these asymmetric segregation events. This disruption also leads to the leakage of unspliced pre-mRNAs into the cytoplasm, highlighting a breakdown in RNA quality control. Through genetic manipulation, the study demonstrates that removing introns from key chromosome segregation genes is sufficient to prevent chromosome loss in aged cells. Moreover, promoting pre-mRNA leakage in young cells mimics the chromosome mis-segregation observed in old cells, providing further evidence for the critical role of nuclear envelope integrity and RNA processing in aging-related genome instability.

      Strengths:

      The findings presented are not only intriguing but also well-supported by robust experimental data, highlighting a previously unrecognized connection between nuclear envelope integrity, RNA processing, and genome stability in aging cells, deepening our understanding of the molecular basis of chromosome loss in aging.

      Weaknesses:

      The authors have satisfactorily addressed my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The authors make the interesting discovery of increased chromosome non-dysjunction in aging yeast mother cells. The phenotype is quite striking and well supported with solid experimental evidence. This is quite significant to a haploid cell (as used here) - loss of an essential chromosome leads to death soon thereafter. The authors then work to tie this phenotype to other age-associated phenotypes that have been previously characterized: accumulation of extrachromosomal rDNA circles that then correlate with compromised nuclear pore export functions, which correlates with "leaky" pores that permit unspliced mRNA messages to be inappropriately exported to the cytoplasm. They then infer that three intron containing mRNAs that encode portions in resolving sister chromatid separation during mitosis, are unspliced in this age-associated defect and thus lead to the non-dysjunction problem.

      Strengths:

      The discovery of age-associated chromosome non-dysjunction is an interesting discovery, and it is demonstrated in a convincing fashion with "classic" microscopy-based single cell fluorescent chromosome assays that are appropriate and seem robust. The correlation of this phenotype with other age-associated phenotypes - specifically extrachromosomal rDNA circles and nuclear pore dysfunction - is supported by in vivo genetic manipulations that have been well-characterized in the past.

      In addition, the application of the single cell mRNA splicing defect reporter showed very convincingly that general mRNA splicing is compromised in aged cells. Such a pleiotropic event certainly has big implications.

      Weaknesses:

      The authors have addressed my major concerns with experimentation or clarification.

    4. Reviewer #3 (Public review):

      Summary:

      Mirkovic et al explore the cause underlying development of aneuploidy during aging. This paper provides a compelling insight into the basis of chromosome missegregation in aged cells, tying this phenomenon to the established Nuclear Pore Complex architecture remodeling that occurs with aging across a large span of diverse organisms. The authors first establish that aged mother cells exhibit aberrant error correction during mitosis. As extrachromosomal rDNA circles (ERCs) are known to increase with age and lead to NPC dysfunction that can result in leakage of unspliced pre-mRNAs, Mirkovic et al search for intron-containing genes in yeast that may be underlying chromosome missegregation, identifying three genes in the aurora B-dependent error correction pathway: MCM21, NBL1, and GLC7. Interestingly, intron-less mutants in these genes suppress chromosome loss in aged cells, with a significant impact observed when all three introns were deleted (3x∆i). The 3x∆i mutant also suppresses the increased chromosome loss resulting from nuclear basket destabilization in a mlp1∆ mutant. The authors then directly test if aged cells do exhibit aberrant mRNA export, using RNA FISH to identify that old cells indeed leak intron-containing pre-mRNA into the cytoplasm, as well as a reporter assay to demonstrate translation of leaked pre-mRNA, and that this is suppressed in cells producing less ERCs. Mutants causing increased pre-mRNA leakage are sufficient to induce chromosome missegregation, which is suppressed by the 3x∆i.

      Strengths:

      The finding that deleting the introns of 3 genes in the Aurora B pathway can suppress age-related chromosome missegregation is highly compelling. Additionally, the rationale behind the various experiments in this paper is well-reasoned and clearly explained.

      Weaknesses:

      My main concerns have been thoroughly addressed by the authors.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this study, the authors explore a novel mechanism linking aging to chromosome mis-segregation and aneuploidy in yeast cells. They reveal that, in old yeast mother cells, chromosome loss occurs through asymmetric partitioning of chromosomes to daughter cells, a process coupled with the inheritance of an old Spindle Pole Body. Remarkably, the authors identify that remodelling of the nuclear pore complex (NPC), specifically the displacement of its nuclear basket, triggers these asymmetric segregation events. This disruption also leads to the leakage of unspliced pre-mRNAs into the cytoplasm, highlighting a breakdown in RNA quality control. Through genetic manipulation, the study demonstrates that removing introns from key chromosome segregation genes is sufficient to prevent chromosome loss in aged cells. Moreover, promoting pre-mRNA leakage in young cells mimics the chromosome mis-segregation observed in old cells, providing further evidence for the critical role of nuclear envelope integrity and RNA processing in aging-related genome instability. 

      Strengths: 

      The findings presented are not only intriguing but also well-supported by robust experimental data, highlighting a previously unrecognized connection between nuclear envelope integrity, RNA processing, and genome stability in aging cells, deepening our understanding of the molecular basis of chromosome loss in aging. 

      We thank the reviewer for this very positive assessment of our work

      Weaknesses: 

      Further analysis of yeast aging data from microfluidic experiments will provide important information about the dynamic features and prevalence of the key aging phenotypes, e.g. pre-mRNA leakage and chromosome loss, reported in this work. 

      We thank the reviewer for bringing this point, which we have addressed in the revised version of the manuscript.  In short, chromosome loss is an abrupt, late event in the lifespan of the cells. To examine its prevalence, we have quantified the combined loss frequency of two chromosomes when both are labelled in the same cell. Whereas single chromosomes are lost at a frequency of 10-15% per cell, less than 5% of the cells lose both at the same time.  Thus, the different chromosomes are lost largely but not fully independently from each other. Based on these data, and on the fact that yeast cells have 16 chromosomes, we evaluate that about half of the cells lose at least one chromosome in their final cell cycle.

      We also tried to estimate the prevalence of the pre-mRNA leakage phenotype, based on the increased mCherry to GFP ratio observed between 0h and 24 hours of aging for 146 individual cells. For this analysis, we compared the mCherry/GFP ratio at 0 and 24h for the same individual cell. This analysis indicates that 81% of the cells show a fold change strictly above 1 as they age. Furthermore, the data appears to be unimodal. Thus, we can conservatively conclude that a majority of the cells show premRNA leakage at 24 hours.  Since not all cells are at the end of their life at that time, this is possibly an underestimate.

      In addition, a discussion would be needed to clarify the relationship between "chromosome loss" in this study and "genomic missegregation" reported previously in yeast aging. 

      Genomic mis-segregation is characterized by the entry of both SPBs and all the chromosomes into the daughter cell compartment (PMID: 31714209).  We have observed these events in our movies as well.  However, the chromosome loss phenotype that we are focusing on affects only some chromosomes (as discussed above) and takes place under proper elongation of the spindle, with one SPB remaining in the mother cell whereas the other one goes to the bud, as shown in the manuscript’s Figure 2.  In our movies, chromosome loss is at least three-fold more frequent (for a single chromosome) than full genome mis-segregation (Sup Fig 1A-B). Furthermore, whereas chromosome loss is alleviated by the removal of the introns of MCM21, NBL1 and GLC7, genomic mis-segregation is not (Sup Fig 1B).  Thus, genomic mis-segregation mentioned by the reviewer is a process distinct from the chromosome loss that we report.  This discussion and the relevant data have been added to the manuscript.

      We thank the reviewer for bringing up the possible confusion between these two phenotypes, allowing us to clarify this point.

      Reviewer #2 (Public review): 

      Summary: 

      The authors make the interesting discovery of increased chromosome non-dysjunction in aging yeast mother cells. The phenotype is quite striking and well supported with solid experimental evidence. This is quite significant to a haploid cell (as used here) - loss of an essential chromosome leads to death soon thereafter. The authors then work to tie this phenotype to other age-associated phenotypes that have been previously characterized: accumulation of extrachromosomal rDNA circles that then correlate with compromised nuclear pore export functions, which correlates with "leaky" pores that permit unspliced mRNA messages to be inappropriately exported to the cytoplasm. They then infer that three intron containing mRNAs that encode portions in resolving sister chromatid separation during mitosis, are unspliced in this age-associated defect and thus lead to the non-dysjunction problem. 

      Strengths: The discovery of age-associated chromosome non-dysjunction is an interesting discovery, and it is demonstrated in a convincing fashion with "classic" microscopy-based single cell fluorescent chromosome assays that are appropriate and seem robust. The correlation of this phenotype with other age-associated phenotypes - specifically extrachromosomal rDNA circles and nuclear pore dysfunction - is supported by in vivo genetic manipulations that have been well-characterized in the past. 

      In addition, the application of the single cell mRNA splicing defect reporter showed very convincingly that general mRNA splicing is compromised in aged cells. Such a pleiotropic event certainly has big implications. 

      We thank the reviewer for this assessment of our work.  To avoid confusion, we would like to stress out, however, that our data do not show that splicing per se is defective in old cells.  Actually, we specifically show that the cells are unlikely to show splicing defect (last figure of the original and the revised version of the manuscript). Our data specifically show that unspliced mRNAs tend to leak out of the nucleus of old cells.

      Weaknesses: 

      The biggest weakness is "connecting all the dots" of causality and linking the splicing defect to chromosome disjunction. I commend the authors for making a valiant effort in this regard, but there are many caveats to this interpretation. While the "triple intron" removal suppressed the non-dysjunction defect in aged cells, this could simply be a kinetic fix, where a slowdown in the relevant aspects of mitosis, could give the cell time to resolve the syntelic attachment of the chromatids.  

      The possibility that intron-removal leads to a kinetic fix is an interesting idea that we have now considered.  In the revised manuscript, we now provide measurements of mitotic duration in the “triple intron” mutant compared to wild type cells and the duration of their last cell cycle (See supplementary figure 3A-D). There is no evidence that removing these introns slows down mitosis.  Thus, the kinetic fix hypothesis is unlikely to explain our observation about the effect of intron removal.

      To this point, I note that the intron-less version of GLC7, which affects the most dramatic suppression of the three genes, is reported by one of the authors to have a slow growth rate (Parenteau et al, 2008 - https://doi.org/10.1091/mbc.e07-12-1254)

      The reviewer is right, removing the intron of GLC7 reduces the expression levels of the gene product (PMID: 16816425) to about 50% of the original value and causes a slow growth phenotype.  However, the cells revert fairly rapidly through duplication of the GLC7-∆i gene (see supplementary Figure 3EF).  As a consequence, neither the GLC7-∆i nor the 3x∆i mutant strains show noticeable growth phenotypes by spot assays.  We now document these findings in supplementary figure 3.  

      Lastly, the Herculean effort to perform FISH of the introns in the cytoplasm is quite literally at the statistical limit of this assay. The data were not as robust as the other assays employed through this study. The data show either "no" signal for the young cells or a signal of 0, 1, or 2 FISH foci in the aged cells. In a Poisson distribution, which this follows, it is improbable to distinguish between these differences. 

      This is correct, this experiment was not the easiest of the manuscript... However, despite the limitations of the assay, the data presented in figure 7B are very clear.  300 cells aged by MEP were analysed, divided in the cohorts of 100 each, and the distribution of foci (nuclear vs cytoplasmic) in these aged cells were compared to the distribution in three cohorts of young cells.  For all 3 aged cohorts, over 70% of the visible foci were cytoplasmic, while in the young cells, this figure was around 3%.  A t-test was conducted to compare these frequencies between young and old cells (Figure 7B). The difference is highly significant.  Therefore, we are clearly not at the statistical limit.

      What the reviewer refers to is the supplementary Figure 4, where we were simply asking i) is the signal lost in cells lacking the intron of GLC7 (the response is unambiguously yes) and ii) what is the general number of dots per cell between young and old wild type cells (without distinguishing between nuclear and cytoplasmic) and the information to be taken from this last quantification is indeed that there is no clearly distinguishable difference between these two population of cells, as the reviewer rightly concludes.  In other word, the reason why there are more dots in the cytoplasm of the old cells in the Figure 7B is not because the old cells have much more dots in general (see supplementary Figure 4C).  We hope that these clarifications help understand the data better.  We have edited the manuscript to avoid confusion.

      Reviewer #3 (Public review): 

      Summary: 

      Mirkovic et al explore the cause underlying development of aneuploidy during aging. This paper provides a compelling insight into the basis of chromosome missegregation in aged cells, tying this phenomenon to the established Nuclear Pore Complex architecture remodelling that occurs with aging across a large span of diverse organisms. The authors first establish that aged mother cells exhibit aberrant error correction during mitosis. As extrachromosomal rDNA circles (ERCs) are known to increase with age and lead to NPC dysfunction that can result in leakage of unspliced pre-mRNAs, Mirkovic et al search for intron-containing genes in yeast that may be underlying chromosome missegregation, identifying three genes in the aurora B-dependent error correction pathway: MCM21, NBL1, and GLC7. Interestingly, intron-less mutants in these genes suppress chromosome loss in aged cells, with a significant impact observed when all three introns were deleted (3x∆i). The 3x∆i mutant also suppresses the increased chromosome loss resulting from nuclear basket destabilization in a mlp1∆ mutant. The authors then directly test if aged cells do exhibit aberrant mRNA export, using RNA FISH to identify that old cells indeed leak intron-containing pre-mRNA into the cytoplasm, as well as a reporter assay to demonstrate translation of leaked pre-mRNA, and that this is suppressed in cells producing less ERCs. Mutants causing increased pre-mRNA leakage are sufficient to induce chromosome missegregation, which is suppressed by the 3x∆i. 

      Strengths: 

      The finding that deleting the introns of 3 genes in the Aurora B pathway can suppress age-related chromosome missegregation is highly compelling. Additionally, the rationale behind the various experiments in this paper is well-reasoned and clearly explained. 

      We thank the reviewer for their very positive assessment of our work

      Weaknesses:  

      In some cases, controls for experiments were not presented or were depicted in other figures. 

      We are sorry about this confusion.  We have improved our presentation of the controls, bringing them back each time they are relevant.  We have also added those that were missing (such as those mentioned by reviewer 2, see above). Note that the frequencies of centromeric plasmid loss at 0h in Figure 1C is not meaningful and therefore not presented. Since the cells were grown on selective medium before loading on to the ageing chip, we cannot report a plasmid loss frequency here. The ageing experiments themselves were subsequently conducted in full medium, to allow for centromeric plasmid loss without killing the cell. We explain this in the materials and methods section.

      High variability was seen in chromosome loss data, leading to large error bars. 

      We thank the reviewer for this comment. The variance in those two figures (3A and 5D) comes from the suboptimal plotting of this data. This is now corrected as follows.  We divided the available data into 4 cohorts and then plotted the average loss frequency across these cohorts for the indicated age groups.  This filters out much of the noise and improves the statistical resolution.

      The text could have been more polished. 

      Thank you for this comment.  We have gone through the manuscript again in detail.

      Reviewer #1 (Recommendations for the authors):

      (1) A previous study (PMID: 31714209). showed that aging yeast cells undergo genomic missegregation in which material was abnormally segregated to the daughter cells, leading to cell cycle arrest. After that, the missegregation is either corrected by returning aberrantly segregated genetic material to the mother cells so that they can resume cell cycles, or if not corrected, the mother cells will terminally exist the cell cycle and eventually die. That paper also showed that this agedependent genomic missegregation is related to rDNA instability. Is the chromosome loss in this work related to the genomic missegregation reported before? Is it partially reversible like genomic missegregation? Are all the chromosomes lost in one cell division, like in the case of genomic missegregation? Some additional characterization and a discussion would be helpful. 

      As mentioned above, indeed the phenotype of full genome mis-segregation described by Crane et al. (2019) is observable in our data as well. At 24h ~3% of the cells segregate both SPBs to the bud, as they previously described (Supp Figure 1A and B).  This phenomenon is clearly distinct from asymmetric chromosome partition, where cells undergo anaphase, separate the SPBs and segregate one to the mother cell and one to the bud (Figure 2A).  Also, asymmetric chromosome partitioning affects only a subset of the chromosomes (see below), not the entire genome. Finally, unlike asymmetric chromosome partitioning, the frequency of genome mis-segregation in ageing was not alleviated by intron removal (Supp Figure 1B). Thus, these two processes are clearly distinct and driven by different mechanisms. Note that asymmetric chromosome partitioning appears 3 to 5 times more frequently than genomic mis-segregation.

      Supporting further the notion that these two processes are distinct, chromosome loss seals the end of the life of the cell, as we reported, indicating that this is not a reversible event.  Also, it does not involve all chromosomes at once. Cells that contain the labelled versions of both chromosome II and IV at the same time, the loss frequency of both chromosomes is less than 5%, whereas each chromosome is lost in 10-15% of the cells (Figure 1C). Thus, most cells lose one and keep the other. Furthermore, this indicates that there are many more cells losing at least one chromosome than the 15% that lose chromosome IV for example, probably 50% or more.  Thus, chromosome loss by asymmetric segregation is much more frequent than the partly transient transfer of the entire nucleus to the bud.

      (2) What percentage of aging WT cells undergo pre-mRNA leakage (using the GFP/mCherry reporter) during their entire lifespan? Is it a sporadic, reversible process or an accumulative, one-way deterioration? Previous studies (PMID: 32675375; PMID: 24332850; PMID: 36194205; PMID: 31291577) showed that only a fraction of yeast cells age with rDNA instability and ERC accumulation, as indicated by excessive rRNA transcription and nucleolar enlargement. Are they the same fraction of aging cells that undergo pre-mRNA leakage and chromosome loss? This information will indicate the prevalence of the key aging phenotypes reported in this work and should be readily obtainable from microfluidic experiments. In addition, a careful discussion would be helpful. 

      Pre-mRNA leakage is relatively widespread in the population, but it is difficult to put a precise number on it. Analysis of how the mCherry/GFP ratio changes in 146 individual cells between 0 and 24 hours and imaging in our microfluidics platform indicates that ~80% show an increase and 50% of the cells show an increase above 1.5-fold. Therefore, the frequencies of pre-mRNA leakage and chromosome loss are probably similar.  We have modified the discussion to account for these considerations.  This would be in the same range as the frequency of aging by ERC accumulation (mode 1) estimated by PMID: 32675375. 

      Reviewer #2 (Recommendations for the authors)

      The manuscript could use a bit of editing in places - please go through it once more. 

      Editing suggestions: 

      Line 80 – irrespective

      Corrected.

      Line 97 - these are not "rates" but frequencies. Please correct this error throughout. 

      Replaced “rate” with “frequency throughout the manuscript and the figures, when pertaining to chromosome loss

      Line 328 - increase in chromosome... 

      Corrected.

      Line 379 - tampering 

      Reviewer #3 (Recommendations for the authors):

      Specific Feedback to Authors 

      (a) Major Points 

      (i) While the proposed connection between ERC-mediated nuclear basket removal and erroneous error correction was clearly stated, this connection is correlative and was not directly tested. Specifically, although mutants impacting ERC levels were tested for missegregation, it was not directly tested if increased missegregation levels occurred due to ERC tethering to the NPC and subsequent nuclear basket removal. It is possible that the increased ERCs may be driving missegregation via a different pathway. Authors should consider experiments to strengthen this idea, such as looking at chromosome loss frequency in a sir2∆ 3x∆i double mutant, or a sir2∆ sgf73∆ double mutant. 

      This connection is addressed in the original version of the manuscript, where we show that preventing attachment of ERCs to the NPC, by removing the linker protein Sgf73, alleviates chromosome loss.  The link is further substantiated by the fact that removing the basket on its own promote chromosome loss and that in both cases, namely during normal aging, i.e., upon ERC accumulation, and upon basket removal the mechanism of chromosome loss is the same.  In both cases, it depends on the introns of the GLC7, MCM21 and NBL1 genes.  

      However, we acknowledge that the mutants tested have pleiotropic effects, making interpretation somewhat difficult, even when examining chromosome loss in multiple mutants that affect ERC formation and NPC remodelling, as we have done.  As recommended by the reviewer, we have characterized the phenotype of the sir2∆ 3x∆i mutant strain. Intron removal in the sir2∆ mutant cells largely rescued the elevated chromosome loss frequency of these cells and slightly extended their replicative lifespan (Figure 6D-E). We conclude that intron removal can remedy the chromosome loss phenotype of the sir2∆. Although clearly significant, the effect on the replicative lifespan was not very strong, likely due to the sir2∆ affecting other ageing processes.

      Touching on this question, we added a new set of experiments asking whether any accumulating DNA circle causes chromosome loss in an intron-dependent manner.  Thus, we have introduced a noncentromeric replicative plasmid in wild type and 3x∆i mutant strains carrying the labelled version of chromosome II (Figure 6A-C).  These studies show that these cells age much faster than wild type cells, as expected, and lose chromosomes at a higher frequency than non-transformed cells.  Finally, the effect is at least in part alleviated by removing the introns of NBL1, MCM21 and GLC7.

      Therefore, after adding this new and more direct test of the role of DNA circles in chromosome loss, we are confidently concluding that ERC-mediated basket removal is the trigger of chromosome loss in old cells.

      (b) Minor Points 

      (i) In Figure 1C, the text (lines 91-92) argues that chromosome loss happens abruptly as cells age; however the data only show loss at young and old time points, not an intermediate, which leaves open the possibility that chromosome loss is occurring gradually. While cells that lost chromosomes should fail to divide further, we don't know if these events happened and were simply excluded.

      We agree with the reviewer that formally the conclusion drawn in the lines 91-92 (of the original manuscript), namely that chromosome loss takes place abruptly as cells age, cannot be drawn from the Figure 1C alone but only from subsequent observations. However, since chromosome loss is lethal in haploid, as we mention in the text and the reviewer notes as well, it is difficult to envision how cells could lose chromosomes before the end of their lifespan and must therefore increase abruptly as the cells reach that point.  This is now underlined in the revised version of the manuscript. Accordingly, the frequency of chromosome loss per age group, which is depicted in Figure 3A, shows that the wild type cells that have budded less than 10 times show no chromosome loss. The chromosome loss frequency starts to ramp up only pass that point. Therefore, chromosome loss does not increase linearly with age.

      Additionally, cells that lost minichromosome should not arrest. We suggest that the interpretation of these data should be softened in the text, or that chromosome loss fraction could be more effectively portrayed as a Kaplan-Meier survival curve depicting cells that have not lost chromosomes, if these data are easily available. Or, chromosome loss at an intermediate time point could be depicted. 

      Since we cannot visualize more than 2 chromosomes at a time, it is not possible to plot the KaplanMeier curve of cells that have not lost chromosomes. However, as mentioned above, the chromosome loss frequencies at intermediate time points are depicted in Figure 3A and Figure 4B and shows that it increases with age.

      (ii) Also regarding Figure 1, it would be helpful to expound on the purpose of the minichromosomes, as well as how the Ubi-GFP minichromosome is constructed. 

      We now explained why we tested the loss of minichromosome, namely, as a mean to test whether the centromere is necessary and sufficient to drive the loss of the genetic material linked to it, i.e., chromosomes, in old cells.  Concerning the Ubi-GFP minichromosome, the Materials and methods section is now updated and reports plasmid construction, backbone used, primers as well as the plasmid sequence being available in the supplementary data.

      The purpose of the minichromosome initially appears to be the engineering of an eccDNA (ERC) with a CEN to demonstrate distinct behaviour, but it is unclear whether this was actually conducted or if the minichromosome are simply CEN plasmids and/or if this was the intended goal. Furthermore, lines 102-103 state that the presence of a centromere was necessary and sufficient for minichromosome loss. However, since no constructs lacking a centromere were tested, necessity cannot be concluded. Please clarify this in the text and include experimental details to help readers understand what was tested. 

      We apologize for having been too short here. The behaviour of the CEN-less version of this plasmid has been characterized in detail in previous studies (Shcheprova et al., 2008; Denoth-Lippuner 2014, Meinema et al 2022). Here we focused on the behaviour of the CEN+ version of an otherwise Identical plasmid.  We now clarify in the text that this plasmid is retained in the mother cell when CEN-less and cite the relevant literature. 

      (iii) It is unclear how cells at 0-3 budding events were identified in assays using the microfluidics platform. Can the authors clarify the known "age" of the cells once captured, i.e. how do the authors know how many divisions a cell has undergone prior to capture? 

      The reviewer is right; we do not know the exact age of these cells.  However, in any asynchronous population of yeast cells, which is what we start from, 50% of the cells are newborn daughters, 25% have budded once, 12.5 have budded twice, 6.25 % have budded three times…  Therefore, at the time of loading, 93% of the cells have budded between 0 and 3 times.  For this reason, we report to this population as cells age 0-3 CBE. We acknowledge that this is an approximation, but it remains a relatively safe one.  

      (iv) While the schematic in Figure 2D is generally helpful, a different depiction of the old and new SPBs would be beneficial in cases where the new SPB and TetR-GFP are depicted as colocalized, it is difficult to see that the red is fainter for the new SPB. 

      We have corrected this issue by completely separating the SPB and the Chromosome signals in the Figure 2D.

      (v) In Figure 2F, the grey colour of the 12h Ipl1-321 data bar did not have high enough contrast when the manuscript was printed-would recommend changing this to a darker shade. 

      We have corrected this issue by using a darker shade of grey.

      (vi) In Figure 3A, 'Budding' is misspelled on X-axis label  

      We have corrected this error.

      (vii) In Figure 4, the authors should clarify the differences between the analyses in panels B and C. The distinction is not immediately clear and may be difficult to grasp upon initial reading. 

      We have corrected this issue in the main text as well as figure legend.

      (viii) In Figure 5, It would aid comparisons to depict the 3x∆i only as well on panels B, D, and E. 

      We have added 3x∆i data to Figure 5,6 and 8.

      (ix) In Figure 6D, it is unclear why there was an appreciable level of unspliced RNA in the wild-type and sir2∆ young cells. Additionally, it is unclear why there is so much signal observed in the Merge image for the old wild-type cell, especially regarding the apparent bright spot. Is that nuclear signal? Please clarify. 

      The pre-mRNA processing reporter is not very efficiently spliced. It was selected as such during design (Sorenson et al 2014; DOI: 10.1261/rna.042663.113) to provide sensitivity. As for the bright spot occurring, translation of the unspliced reporter produces the N-terminal part of a ribosomal protein, a fraction of which forms some sort of nuclear aggregate in a fraction of the population. 

      (x) In Figure 6E, why does the sir2∆ exhibit higher mCherry/GFP than the wild-type and fob1∆ at "young age"? Is this due to disrupted proteostasis in the sir2∆, or a different pleiotropic effect of sir2∆? Please comment on this observation in the text.

      Indeed, as we have stated in the text the sir2∆ mutation already perturbs pre-mRNA processing in young cells. We do not know the reason of this but indeed it is most probably reflective of its pleiotropic function. Following the reviewer’s request, we now state this in the text. For example, Sir2 may regulate the acetylation state of the basket itself.  The genetic interactions observed between sir2∆ and quite a few nucleoporin mutations seem to support this possibility. 

      (xi) Throughout, the authors switch between depicting aging in Completed Budding Events versus hours, which made it difficult to compare data across figures

      Ideally, all the data in this manuscript should be plotted according to the CBE age of the cell. To ensure that the major findings are plotted in such a way, we have done so for over ~3000 combined cells and thousands of replicative divisions in Figures 3,5-7. All the measurements of chromosome loss at a specific CBE had to be done manually, due to the absence of algorithms that would be able to accurately detect chromosome loss and replicative age. Therefore, doing this for the entirety of our dataset, encompassing well over 50 ageing chips and tens of thousands of cells is not easily doable at this stage. 

      (xii) Typo on line 12 (Sindle Pole Body) 

      We have corrected this error.

      (xiii) The phrase should be 'chromosome partitioning' rather than 'chromosome partition', throughoutfor example, line 17 

      Replaced “chromosome partition” with “chromosome partitioning” throughout the text.

      (xiv) There are inconsistencies between plural and singular references throughout sentences-example, lines 35-37, and lines 44-45. 

      We carefully combed through the manuscript again and hope that we caught all inconsistencies.

    1. eLife Assessment

      This important study of artificial selection in microbial communities shows that the possibility of selecting a desired fraction of slow and fast-growing types is impacted by their initial fractions. The evidence, which relies on mathematical analysis and simulations of a stochastic model, is compelling. It highlights the tension between selection at the strain and the community level. This study should be of interest to researchers interested in ecology, both theoretical and experimental.

    2. Reviewer #1 (Public review):

      Summary:

      The authors demonstrate with a simple stochastic model that the initial composition of the community is important in achieving a target frequency during the artificial selection of a community.

      Strengths:

      To my knowledge, the intra-collective selection during artificial selection has not been seriously theoretically considered. However, in many cases, the species dynamics during the incubation of each selection cycle is important and relevant to the outcome of the artificial selection experiment. Stochasticity from birth and death (demographic stochasticity) plays a big role in these species' abundance dynamics. This work uses a simple framework to tackle this idea meticulously.

      This work may or may not be related to hysteresis (path dependency). If this is true, maybe it would be nice to have a discussion paragraph talking about how this may be the case. Then, this work would even attract the interest of people studying dynamical systems.

      Weaknesses:

      (1) Connecting structure and function.<br /> In typical artificial selection literature, most of them select the community based on collective function. Here in this paper, the authors are selecting a target composition. Although there is a schematic cartoon illustrating the relationship between collective function (y-axis) and the community composition in the main figure 1, there is no explicit explanation or justification of what may be the origin of this relationship. I think giving the readers a naïve idea about how this structure-function relationship arises in the introduction section would help. This is because the conclusion of this paper is that the intra-collective selection makes it hard to artificially select for a community that has an intermediate frequency of f (or s). If there is really evidence or theoretical derivation from this framework that indeed the highest function comes from the intermediate frequency of f, then the impact of this paper would increase because the conclusions of this stochastic model could allude to the reasons for the prevalent failures of artificial selection in literature.

      (2) Explain intra-collective and inter-collective selection better for readers.<br /> The abstract, the introduction, and the result section use these terms or intra-collective and inter-collective selection without much explanation. For the wide readership of eLife, a clear definition in the beginning would help the audience grasp the importance of this paper, because these concepts are at the core of this work.

      (3) Achievable target frequency strongly depending on the degree of demographic stochasticity.<br /> I would expect that the experimentalists would find these results interesting and would want to consider these results during their artificial selection experiments. The main figure 4 indicates that the Newborn size N0 is a very important factor to consider during the artificial selection experiment. This would be equivalent to how much bottleneck you impose on the artificial selection process in every iteration step (i.e., the ratio of serial dilution experiment). However, with a low population size, all target frequencies can be achieved, and therefore in these regimes, the initial frequency now does not matter much. It would be great for the authors to provide what the N0 parameter actually means during the artificial selection experiments. Maybe relative to some other parameter in the model. I know this could be very hard. But without this, the main result of this paper (initial frequency matters) cannot be taken advantage of by the experimentalists.

      (4) Consideration of environmental stochasticity.<br /> The success (gold area of Figure 2d) in this framework mainly depends on the size of the demographic stochasticity (birth-only model) during the intra-collective selection. However, during experiments, a lot of environmental stochasticity appears to be occurring during artificial selection. This may be out of the scope of this study. But it would definitely be exciting to see how much environmental stochasticity relative to the demographic stochasticity (variation in the Gaussian distribution of F and S) matters in succeeding in achieving the target composition from artificial selection.

      (5) Assumption about mutation rates<br /> If setting the mutation rates to zero does not change the result of the simulations and the conclusion, what is the purpose of having the mutation rates \mu? Also, is the unidirectional (S -> F -> FF) mutation realistic? I didn't quite understand how the mutations could fit into the story of this paper.

      (6) Minor points<br /> In Figure 3b, it is not clear to me how the frequency difference for the Intra-collective and the Inter-collective selection is computed.<br /> In Figure 5b, the gold region (success) near the FF is not visible. Maybe increase the size of the figure or have an inset for zoom-in. Why is the region not as big as the bottom gold region?

      Comments on revisions:

      I thank the authors for addressing many points raised by the reviewers. Overall, the readability of the manuscript has improved with more context provided around why they were solving this specific problem. However, I've found many of the responses to be too terse. It would have been nicer if there had been more discussion and description of the thought process that led up to the conclusions they made for each comment or question. Instead, many of the responses only showed the screenshot of the text they added.

      Most of my comments or questions were answered. Below are my comments on some of the authors' responses.

      (2) Explain intra-collective and inter-collective selection better for readers.<br /> In the Abstract and Introduction, you've added more sentences about the intra-collective or inter-collective selection. However, these are either making analogies to the waterfall or just describing the result of the intra/inter-collective selection. I would still appreciate a proper definition of those terms, which is paramount for readers to understand the entire paper.

      (4) Consideration of environmental stochasticity.<br /> I think providing the reason 'why' the paper focuses on demographic stochasticity and not environmental stochasticity will greatly justify the paper's work. For example, citing papers that actually performed artificial selection and pointing out that your model captures the stochasticity from those kinds of experiments would be great.

      (5) Assumption about mutation rates.<br /> It would be great if you could add a citation in the added sentence to support your claim: "This scenario is encountered in biotechnology: .....".

    3. Reviewer #3 (Public review):

      The authors address the process of community evolution under collective-level selection for a prescribed community composition. They mostly consider communities composed of two types that reproduce at different rates, and that can mutate one into the other. Due to such difference in 'fitness' and to the absence of density dependence, within-collective selection is expected to always favour the fastest grower, but collective-level selection can oppose this tendency, to a certain extent at least. By approximating the stochastic within-generation dynamics and solving it analytically, the authors show that not only high frequencies of fast growers can be reproducibly achieved, aligned with their fitness advantage. Small target frequencies can also be maintained, provided that the initial proportion of fast growers is sufficiently small. In this regime, similar to the 'stochastic corrector' model, variation upon which selection acts is maintained by a combination of demographic stochasticity and of sampling at reproduction. These two regions of achievable target compositions are separated by a gap, encompassing intermediate frequencies that are only achievable when the bottleneck size is small enough or the number of communities is (disproportionately) large.

      A similar conclusion, that stochastic fluctuations can maintain the system over evolutionary time far from the prevalence of the faster-growing type, is then confirmed by analyzing a three-species community, suggesting that the qualitative conclusions of this study are generalizable to more complex communities.

      I expect that these results will be of broad interest to the community of researchers who strive to improve community-level selection but are often limited to numerical explorations, with prohibitive costs for a full characterization of the parameter space of such embedded populations. The realization that not all target collective functions can be as easily achieved and that they should be adapted to the initial conditions and the selection protocol is also a sobering message for designing concrete applications.

      A major strength of this work is that the qualitative behaviour of the system is captured by an analytically solvable approximation so that the extent of the 'forbidden region' can be directly and generically related to the parameters of the selection protocol.

      The phenomenon the authors characterize is ecological in nature, though it is maintained even when switching between types is possible. Calling this dynamics community evolution reflects a widespread ambiguity in the field, not ascribable just to this work.

      Although different types compete for being represented in the next generation's propagules, within-generation ecology is here representative of exponential growth. As species interactions are commonly manifest in lab serial dilution experiments, it would be interesting if future work explores the extent of the robustness of these results to density-dependent demography.

    1. eLife Assessment

      In this important quantitative study of HIV-1 evolution in humans and rhesus macaques, selection coefficients are inferred at scale over the HIV genome. Selection coefficients are similar in humans and macaques, providing compelling evidence that these coefficients are representative of the fitness landscapes of these viruses within hosts. This work will be of interest to the community working on quantitative evolution and fitness landscape inference, and the finding that rapid fitness gains in the HIV population predict bNAb emergence has significant implications for HIV vaccine design.

    2. Reviewer #1 (Public review):

      Summary:

      The present work studies the coevolution of HIV-1 and the immune response in clinical patient data. Using the Marginal Path Likelihood (MPL) framework, they infer selection coefficients for HIV mutations from time-series data of virus sequences as they evolve in a given patient.

      Strengths:

      The authors analyze data from two human patients, consisting of HIV population sequence samples at various points in time during the infection. They inferred selection coefficients from the observed changes in sequence abundance using MPL. Most beneficial mutations appear in viral envelop proteins. The authors also analyze SHIV samples in rhesus macaques, and find selection coefficients that are compatible with those found in the corresponding human samples.

      The manuscript is well written and organized.

      Comments on revisions:

      In their revised version the authors have addressed most of these points satisfactorily.

    3. Reviewer #2 (Public review):

      This paper combines a biological topic of interest with the demonstration of important theoretical/methodological advances. Fitness inference is the foundation of the quantitative analysis of adapting systems. It is a hard and important problem and this paper highlights a compelling approach (MPL) first presented in (1) and refined in (2), roughly summarized in equation 3.

      The authors find that positive selection shapes the variable regions of env in shared patterns across two patient donors. The patterns of positive selection are interesting in and of themselves, they confirm the intuition that hyper-variation in env is the result of immune evasion rather than a broadly neutral landscape (flatness). They show that the immune evasion patterns due to CD8 T and naive B-cell selection are shared across patients. Furthermore, they suggest that a particular evolutionary history (larger flux to high fitness states) is associated with bNAb emergence. Mimicking this evolutionary pattern in vaccine design may help us elicit bNAbs in patients in the future.

      The fitness landscape of env in multiple hosts is immensely valuable especially because of how often SHIV has used as proxy for HIV. The strength of reversion-to-consensus selection is a known pattern of HIV post-infection populations but they are nicely quantified here. Agreement between SHIV and HIV evolution is shown. They find selection is larger for autologous antibodies than the bNAbs themselves (perhaps bNAbs are just too small a component of the host response to drive the bulk of selection?), and that big fitness increases precede antibody breadth in rhesus-macaques, suggesting that this fitness increase is the immune challenge required to draw forth a bNAb. All of high interest to HIV researchers.

      (1) Sohail, M. S., Louie, R. H., McKay, M. R. & Barton, J. P. Mpl resolves genetic linkage in fitness inference from complex evolutionary histories. Nature biotechnology 39, 472-479 (2021).

      (2) Shimagaki, K. & Barton, J. P. Bézier interpolation improves the inference of dynamical models from data. Physical Review E 107, 024116 (2023).

      Strength of evidence:

      Equation 3 is a beautiful and intuitive tool that accounts for linkage and can be solved precisely even in the presence of detailed mutational and selection models. They have addressed my earlier concerns the effects of incomplete observations of the frequency bias fitness inference on rare sites.

      Whether the fact that fitness increases occured before or after the presence of the bnab remains incompletely known. bNAb detection is different from bNAb presence and the possibility that fitness increases occurred after the bNAbs appeared remains. Still, their conclusion is plausible and fits in with the other observations which form a coherent and compelling picture.

      Overall this is a convincing paper. It is a valuable introduction to a practical method of fitness inference at the scale of the entire env gene and how this information can be leveraged to learn some interesting biology.

    4. Reviewer #3 (Public review):

      Summary:

      Shimagaki et al. investigate the virus-antibody coevolutionary processes that drive the development of broadly neutralizing antibodies (bnAbs). The study's primary goal is to characterize the evolutionary dynamics of HIV-1 within hosts that accompany the emergence of bnAbs, with a particular focus on inferring the landscape of selective pressures shaping viral evolution. To assess the generality of these evolutionary patterns, the study extends its analysis to rhesus macaques (RMs) infected with simian-human immunodeficiency viruses (SHIV) incorporating HIV-1 Env proteins derived from two human individuals.

      Strengths:

      A key strength of the study is its rigorous assessment of the similarity in evolutionary trajectories between humans and macaques. This cross-species comparison is particularly compelling, as it quantitatively establishes a shared pattern of viral evolution using a sophisticated inference method. The finding that similar selective pressures operate in both species adds robustness to the study's conclusions and suggests broader biological relevance. In the revised version, the Authors included a simple but clear explanation of the statistical method for inferring the model's parameters in the main text. Moreover, I find the potential implications of the methodology absent in the original submission very interesting.

      Conclusions:

      Overall, the study presents a compelling analysis of HIV-1 evolution and its parallels in SHIV-infected macaques.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The present work studies the coevolution of HIV-1 and the immune response in clinical patient data. Using the Marginal Path Likelihood (MPL) framework, they infer selection coefficients for HIV mutations from time-series data of virus sequences as they evolve in a given patient.

      Strengths:

      The authors analyze data from two human patients, consisting of HIV population sequence samples at various points in time during the infection. They infer selection coefficients from the observed changes in sequence abundance using MPL. Most beneficial mutations appear in viral envelop proteins. The authors also analyze SHIV samples in rhesus macaques, and find selection coefficients that are compatible with those found in the corresponding human samples.

      Weaknesses:

      The MPL method used by the authors considers only additive effects of mutations, thus ignoring epistasis.

      As suggested, we have now addressed this limitation by inferring epistatic fitness landscapes for CH505, CH848, SHIV.CH505, and SHIV.CH848. Indeed, the computational burden of the epistasis inference procedure was one constraint that motivated us to consider only additive fitness in the previous version of our paper. The original approach developed by Sohail et al. (2022) tested only sequences with <50 sites due to this limitation, far smaller than the ones we consider. Beyond this computational constraint, we also believed that 1) an additive fitness model may suffice to capture local fitness landscapes, and practically, 2) epistatic interactions are more challenging to validate than the effects of individual mutations, making the interpretation of the model more complex.

      However, after performing the analyses described in this paper, we developed a new approach for identifying epistatic interactions that can scale to much longer sequences (Shimagaki et al., Genetics, in press). We therefore applied this method to infer an epistatic fitness landscape for the HIV and SHIV data sets that we studied. As in that work, we focused on short-range (<50 bp) interactions which we could more confidently estimate from data. We have added a section in the SI describing the epistatic fitness model and our analysis. 

      Overall, we found substantial agreement between the epistatic and purely additive models in terms of the estimated fitness effects of individual mutations (new Supplementary Fig. 8) and overall fitness (Supplementary Fig. 9). Consistent with our prior work, we did not find substantial evidence for very strong epistatic interactions (Supplementary Fig. 10). This does not necessarily mean that strong epistatic interactions do not exist; rather, this shows that strong interactions don’t substantially improve the fit of the model to data, and thus many are regularized toward zero. While the biological validation of epistatic interactions is challenging, we found that the largest epistatic interactions, which we defined as the top 1% of all shortrange interactions, were modestly but significantly enriched in the CD4 binding site, V1 and V5 regions for CH505 and in the CD4 binding site, V4, and V5 for CH848. In addition, mutation pairs N280S/V281A and E275K/V281G, which confer resistance to CH235, ranked in the top 15% of all epistatic interactions in CH505.

      We have now included an additional section in the Results, “Robustness of inferred selection to changes in the fitness model and finite sampling”, which discusses our epistatic analyses (page 6, lines 415-464), along with the above Supplementary Figures and a technical section in the SI summarizing the epistasis inference approach.

      Although the evolution of broadly neutralizing antibodies (bnAbs) is a motivating question in the introduction and discussion sections (and the title), the relevance of the analysis and results to better understanding how bnAbs arise is not clear. The only result presented in direct connection to bnAbs is Figure 6.

      It is true that, while bnAb development is a major motivator of our study, our analysis focuses on HIV-1 and does not directly consider antibody evolution. We have now brought attention to this point as a limitation directly in the Discussion. Following the suggestion below in the “Recommendations for the authors,” we have edited our manuscript to place more emphasis on viral fitness and somewhat reduce the emphasis on bnAbs, though this remains an important motivating factor. Specifically, the Abstract now begins

      Human immunodeficiency virus (HIV)-1 evolves within individual hosts to escape adaptive immune responses while maintaining its capacity for replication. Coevolution between the HIV-1 and the immune system generates extraordinary viral genetic diversity. In some individuals, this process also results in the development of broadly neutralizing antibodies (bnAbs) that can neutralize many viral variants, a key focus of HIV-1 vaccine design. However, a general understanding of the forces that shape virusimmune coevolution within and across hosts remains incomplete. Here we performed a quantitative study of HIV-1 evolution in humans and rhesus macaques, including individuals who developed bnAbs.

      We have similarly modified the Discussion to focus first on viral fitness. In response to comments from Reviewer 3, we have also more clearly articulated how our work might contribute to the understanding of bnAb development in the Discussion.

      Questions or suggestions for further discussion:

      I list here a number of points for which I believe the paper would benefit if additional discussion/results were included.

      The MPL method used by the authors considers only additive effects of mutations, thus ignoring epistasis. In Sohail et al (2022) MBE 39(10), p. msac199  (https://doi.org/10.1093/molbev/msac199) an extension of MPL is developed allowing one to infer epistasis. Can the authors comment on why this was not attempted here?

      I presume one possible reason is that epistasis inference requires considerably more computational effort (and more data). However, since the authors find most beneficial mutations occurring in Env, perhaps restricting the analysis to Env genes only (e.g. the trimer shown in Figure 2) can lead to tractable inference of epistasis within this segment (instead of the full genome).

      As described above, we have now addressed this comment by inferring epistatic fitness landscapes for the data sets that we consider. Our overall results using the epistatic fitness model are consistent with the ones that we previously obtained with an additive model.

      Do the authors find correlations in the inferred selection coefficients of the two samples CH505 and CH848? I could not find any discussion of this in the manuscript. Only correlations between Humans and RM are discussed.

      To address this question, we compared the fitness values and individual selection coefficients across CH505 and CH848 data sets. We found little correlation between CH505 and CH848 fitness values (shown in a new Supplementary Fig. 6) or selection coefficients. We found only 199 common mutations between HIV-1 amino acid sequences from CH505 and CH848 out of 868 and 1,406 total mutations, respectively. Thus, we were not surprised to find no strong relationship between fitness estimates from CH505 and CH848 data sets. 

      Reviewer #2 (Public review):

      Summary:

      This paper combines a biological topic of interest with the demonstration of important theoretical/methodological advances. Fitness inference is the foundation of the quantitative analysis of adapting systems. It is a hard and important problem and this paper highlights a compelling approach (MPL) first presented in (1) and refined in (2), roughly summarized in equation 12.

      (1) Sohail, M. S., Louie, R. H., McKay, M. R. & Barton, J. P. Mpl resolves genetic linkage in fitness inference from complex evolutionary histories. Nature biotechnology 39, 472-479 (2021).

      (2) Shimagaki, K. & Barton, J. P. Bézier interpolation improves the inference of dynamical models from data. Physical Review E 107, 024116 (2023).

      The authors find that positive selection shapes the variable regions of env in shared patterns across two patient donors. The patterns of positive selection are interesting in and of themselves, they confirm the intuition that hyper-variation in env is the result of immune evasion rather than a broadly neutral landscape (flatness). They show that the immune evasion patterns due to CD8 T and naive B-cell selection are shared across patients. Furthermore, they suggest that a particular evolutionary history (larger flux to high fitness states) is associated with bNAb emergence. Mimicking this evolutionary pattern in vaccine design may help us elicit bNAbs in patients in the future.

      There is a lot of information to be found in the full fitness landscape of env. The enormous strength of reversion-to-consensus in the patterns is a known pattern of HIV post-infection populations but they are nicely quantified here. Agreement between SHIV and HIV evolution is shown. They find selection is larger for autologous antibodies than the bNAbs themselves (perhaps bNAbs are just too small a component of the host response to drive the bulk of selection?), and that big fitness increases precede antibody breadth in rhesus macaques, suggesting that this fitness increase is the immune challenge required to draw forth a bNAb. This is all of high interest to HIV researchers.

      Strength of evidence:

      One limitation is, of course, that the fitness model is constant in time when the immune challenge is variable and changing. This simplification may complicate some interpretations.

      We agree that this is a limitation of our current approach. In prior work, we have found that the constant fitness effects of mutations that we infer typically reflect the time-averaged fitness effect when the selection changes over time (Gao and Barton, PNAS 2025; Lee et al., Nat Commun 2025). It could be difficult, however, to capture changes in selection that fluctuate rapidly with underlying immune responses. We have added a new paragraph in the Discussion that more clearly sets out some of the limitations of our analysis, including our assumption of constant selection coefficients.

      There are additional methodological and technical limitations that should be considered in the interpretation of our results. Most notably, we assume that the viral fitness landscape is static in time. While we do not expect selection for effective replication (“intrinsic” fitness) to change substantially over time, pressure for immune escape could vary along with the immune responses that drive them. In prior work, we have found that constant selection coefficients typically reflect the average fitness effect of a mutation when its true contribution to fitness is time-varying [42,43]. This may not adequately description mutational effects that undergo large or rapid shifts in time. Future work should also examine temporal patterns in selection for individual mutations.

      Equation 12 in the methods is really a beautiful tool because it is so simple, but accounts for linkage and can be solved precisely even in the presence of detailed mutational and selection models. However, the reliance on incomplete observations of the frequency leads to complications that must be carefully (re)addressed here.

      For instance, the consistent finding of strong selection in hypervariable regions is biologically intuitive but so striking, that I worry that it might be the result of a bias for selection in high entropy regions. 

      Thank you for this suggestion. We agree that it is important to carefully interrogate these results. To assess the effects of general sequence variability on inferred selection, we first computed a position-specific entropy measure, H<sub >i</sub >, for each site i. We first defined the time-dependent entropy H<sub >i</sub >(t) = - ∑<sub >a</sub> x<sub>i</sub> (a, t) log x<sub>i</sub> (a, t)), where x<sub>i</sub> (a, t) represents the frequency of amino acid/nucleotide a at position i and time t, at each sample time. We then computed H<sub>i</sub> as the average of H<sub>i</sub>(t) across all sample times. A new Supplementary Fig. 1 plots the entropy against the inferred selection coefficients. Although some sequence variation must be observed in order for us to infer that a mutation is beneficial, we did not find a systematic bias toward larger (more beneficial) selection coefficients at more variable sites. Overall, we found only a modest correlation between inferred selection coefficients and entropy (Pearson’s r = 0.33 and 0.29 for CH505 and CH848, respectively), which appears to be partly driven by the tendency for mutations inferred to be significantly deleterious to occur at sites with low entropy. In addition to the new Supplementary Figure, we have added a reference to this analysis in the main text:

      To test whether our results might be biased by overall sequence variability, we examined the relationship between our inferred selection coefficients and entropy, a common measure of sequence variability. Overall, we found only a modest correlation between selection and entropy, suggesting that the signs of selection that we observe are not due to increased sequence variability alone (Supplementary Fig. 1).

      Mutational and covariance terms in equation 12 might be underestimated, due to finite sampling effect in highly diverse populations. Sampling effects lead to zeros in x(t) when actual frequency zeros might be rare at the population sizes of HIV viral loads and mutation rates. Both mutational flux and C underestimation will bias selection upward in eq. 12. 

      The prior papers (1) and (2) seem to show robustness to finite sampling effects, but, again, more care needs to be shown that this robustness transfers to the amino acid inference under these conditions. That synonymous sites are rarely selected for in the nucleotide level is a good sign, and it may be a matter of simply fully explaining the amino-acid level model.

      As above, we agree that these tests are important. To assess the robustness of our results to finite sampling, we performed bootstrap sampling on the viral sequences and inferred selection coefficients using the resampled sequences. Specifically, we resampled the same number of sequences as in the original data at each time point and repeated this for all time points across all HIV-1 and SHIV data sets. A new Supplementary Fig. 11 shows a typical comparison of the original selection coefficients vs. those obtained through bootstrap resampling. Overall, we observe a high degree of consistency between the selection coefficients in each case, which is surely aided by the long time series in these data sets. As pointed out by the reviewer, uncertainty in low-frequency mutations is a particular concern, though the effects on inferred selection are mitigated by regularization. 

      We have added a section in the Results, “Robustness of inferred selection to changes in the fitness model and finite sampling”, which includes this analysis:

      Finite sampling of sequence data could also affect our analyses. To further test the robustness of our results, we inferred selection coefficients using bootstrap resampling, where we resample sequences from the original ensemble, maintaining the same number of sequences for each time point and subject. The selection coefficients from the bootstrap samples are consistent with the original data (see Supplementary Fig. 11), with Pearson’s r values of around 0.85 for HIV-1 data sets and 0.95 for SHIV data sets, respectively.

      Uncertainty propagates to the later parts of the paper, eg. HIV and SIV shared patterns might be the result of shared biases in the method application. However, this worry does not extend to the apples-to-apples comparison of fitness trajectories across individuals (Figures 5 and 6) which I think are robust (for these sample sizes). 

      One way to address this uncertainty is to compare the fitness values and individual selection coefficients across CH505 and CH848 data sets, which was also requested by Reviewer 1. Overall, we found little correlation between CH505 and CH848 fitness values (shown in a new Supplementary Fig. 6) or selection coefficients. This suggests that similarities between HIV-1 and SHIV landscapes are not solely determined by potential biases in the inference approach. We have now added a reference to this point in the main text:

      In contrast, the inferred fitness landscapes of CH505 and CH848, which share few mutations in common, are poorly correlated (Supplementary Fig. 6). This suggests that the similarities between viral fitness values in humans and RMs are not artifacts of the model, but rather stem from similarities in underlying evolutionary drivers.

      The timing evidence is slightly weakened by the fact that bNAb detection is different from bNAb presence and the possibility that fitness increases occurred after the bNAbs appeared remains. Still, their conclusion is plausible and fits in with the other observations which form a coherent and compelling picture.

      Yes, we agree that this is a limitation of our analysis — bNAbs may have been present at low levels before they were detected, and we cannot definitively reject selection by bNAbs. Nonetheless, in at least one case (RM5695), rapid fitness gains were substantially separated in time from bNAb detection (roughly 2 weeks after infection vs. 16 weeks, respectively). We have now added this point in a new paragraph in the Discussion:

      While we found a strong relationship between viral fitness dynamics and the emergence of bnAbs, it may not be true that the former stimulates the latter. For example, bnAbs may have been present within each host before they were experimentally detected. Rapid viral fitness gains within hosts that developed broad antibody responses could then have been driven by undetected bnAb lineages. However, we did not find strong selection for known bnAb resistance mutations, and in at least one case (RM5695), rapid fitness gains (roughly 2 weeks after infection) substantially preceded bnAb detection (16 weeks). Still, given the limited size of the data set that we studied, it is unclear the extent to which our results will transfer to larger and broader data sets.

      Overall thisrpretations could provide valuable insights into the broader significance of these results. is a convincing paper, part of a larger admirable project of accurately inferring complete fitness landscapes.

      Reviewer #3 (Public review):

      Summary:

      Shimagaki et al. investigate the virus-antibody coevolutionary processes that drive the development of broadly neutralizing antibodies (bnAbs). The study's primary goal is to characterize the evolutionary dynamics of HIV-1 within hosts that accompany the emergence of bnAbs, with a particular focus on inferring the landscape of selective pressures shaping viral evolution. To assess the generality of these evolutionary patterns, the study extends its analysis to rhesus macaques (RMs) infected with simianhuman immunodeficiency viruses (SHIV) incorporating HIV-1 Env proteins derived from two human individuals.

      Strengths:

      A key strength of the study is its rigorous assessment of the similarity in evolutionary trajectories between humans and macaques. This cross-species comparison is particularly compelling, as it quantitatively establishes a shared pattern of viral evolution using a sophisticated inference method. The finding that similar selective pressures operate in both species adds robustness to the study's conclusions and suggests broader biological relevance.

      Weaknesses:

      However, the study has some limitations. The most significant weakness is that the authors do not sufficiently discuss the implications of the observed similarities. While the identification of shared evolutionary patterns (e.g., Figure 5) is intriguing, the study would benefit from a more explicit discussion of what these findings mean for instance, in the context of HIV vaccine design, immunotherapy, or fundamental viral-host interactions. Even speculative inte

      Thank you for this suggestion. We have now clarified the potential implications of our work in several areas. While speculative, one possible application is in vaccine design: it may be beneficial to design sequential immunogens to mimic the patterns of viral evolution associated with rapid fitness gains. This “population-based” design principle is different from typical approaches, which have focused on molecular details of virus surface proteins. 

      We have extended our discussion of our results in the context of viral evolution within and across hosts and related host species. Overall, our work suggests that there may be relatively few paths to significantly higher viral fitness in vivo. Evolutionary “contingencies” such as shifting immune pressure or epistatic interactions could influence the direction of evolution, but not so dramatically that the dynamics that we see in different hosts are not comparable. We have also connected our work more broadly to the literature in evolutionary parallelism in HIV-1 in different contexts.

      A secondary, albeit less critical, limitation is the placement of methodological details in the Supplementary Information. While it is understandable that the authors focus on results in the main text - especially since the methodology is not novel and has been previously described in earlier publications - some readers might benefit from a more thorough presentation of the method within the main paper.

      We have now modified the main text to add a new section, “Model overview,” that lays out the key steps of our approach. While we reserve technical details for the Methods, we believe that this new section provides more intuition about how our results were obtained (including a discussion of the important Eq. 12, now Eq. 3 in the main text) and our underlying assumptions.

      Conclusions:

      Overall, the study presents a compelling analysis of HIV-1 evolution and its parallels in SHIV-infected macaques. While the quantitative comparison between species is a notable contribution, a deeper discussion of its broader implications would strengthen the paper's impact.

      Reviewer #1 (Recommendations for the authors):

      I suggest de-emphasizing bnAbs and focusing on selection landscape inference, which seems to be the actual focus of the paper.

      While we do not directly study antibody development in this work, bnAb development is certainly an important motivating factor. As described in the responses above, we have now modified the Abstract and Discussion to place relatively more emphasis on fitness comparisons and to relatively less focus on bnAb development.  

      Reviewer #2 (Recommendations for the authors):

      Please make sure that the MPL method is defined in this paper and its limitations are at least partially repeated.

      As noted in responses above, we have now included more methodological details in the main text of the paper, which we hope will make the intuition and assumptions involved in our analysis clearer.

      I'd like the code to better show or describe the model, I could not figure out the model details by looking at the code. It seems mostly just to be csv exporting for use with preexisting MPL code. A longer code readme would be helpful.

      We have now updated the README on GitHub to include a conceptual overview of our inference approach, which references how each step is implemented in the code.

      Reviewer #3 (Recommendations for the authors):

      Try to give some more details (not necessarily giving the full mathematical derivation) on the statistical method utilized.

      As noted above, we have now expanded our discussion of the statistical methods and assumptions in the main text.

      Figures 3 and 4 are somewhat 'messy'. Although I do not have a constructive suggestion here, I feel that with a little more effort maybe the authors could come up with something more clean.

      It is true that the mutation frequency dynamics are somewhat “choppy” and difficult to follow intuitively. To attempt to make these figures easier to parse visually, we have increased the transparency on the lines and added exponential smoothing to the mutation frequencies, resulting in smoother trajectories. The trajectories without smoothing are retained in Supplementary Fig. 3. Here we also note that this smoothing is for visual purposes only; we use the original frequency trajectories for inference, rather than the smoothed ones.

    1. eLife Assessment

      This valuable study characterizes the emergence of the membrane-associated periodic cytoskeleton (MPS) in the axons of human motor neurons derived from induced pluripotent stem cells. Super-resolution imaging of beta-II spectrin provides convincing evidence for the patterned assembly of spectrin-poor gaps and spectrin-rich MPS in the medial region of the axons and its enhancement by the kinase inhibitor staurosporine. The data advocates against gap formation by cytoskeleton disassembly in a continuous MPS. Instead, a continuous MPS may result from nascent MPS patches and their maturation, a model that would benefit from live imaging for validation.

    2. Reviewer #1 (Public review):

      Summary:

      Ever since the surprising discovery of the membrane-associated Periodic Skeleton (MPS) in axons, a significant body of published work has been aimed at trying to understand its assembly mechanism and function. Despite this, we still lack a mechanistic understanding of how this amazing structure is assembled in neuronal cells. In this article, the authors report a "gap-and-patch" pattern of labelled spectrin in iPSC-derived human motor neurons grown in culture. The mid-sections of these axons exhibit patches with reasonably well-organized MPS that are separated by gaps lacking any detectable MPS and having low spectrin content. Further, they report that the intensity modulation of spectrin is correlated with intensity modulations of tubulin as well. However, neurofilament fluorescence does not show any correlation. Using DIC imaging, the authors show that often the axonal diameter remains uniform across segments, showing a patch-gap pattern. Gaps are seen more abundantly in the midsection of the axon, with the proximal section showing continuous MPS and the distal segment showing continuous spectrin fluorescence but no organized MPS. The authors show that spectrin degradation by caspase/calpain is not responsible for gap formation, and the patches are nascent MPS domains. The gap and patch pattern increases with days in culture and can be enhanced by treating the cells using the general kinase inhibitor staurosporine. Treatment with the actin depolymerizing agent Latrunculin A reduces gap formation. The reasons for the last two observations are not well understood/explained.

      Strengths:

      The claims made in the paper are supported by extensive imaging work and quantification of MPS. Overall, the paper is well written and the findings are interesting. Although much of the reported data are from axons treated with staurosporine, this may be a convenient system to investigate the dynamics of MPS assembly, which is still an open question.

      Weaknesses:

      Much of the analysis is on staurosporine-treated cells, and the effects of this treatment can be broad. The increase in patch-gap pattern with days in culture is intriguing, and the reason for this needs to be checked carefully. It would have been nice to have live cell data on the evolution of the patch and gap pattern using a GFP tag on spectrin. The evolution of individual patches and possible coalescence of patches can be observed even with confocal microscopy if live cell super-resolution observation is difficult.

      Some more comments:

      (1) Axons can undergo transient beading or regularly spaced varicosity formation during media change if changes in osmolarity or chemical composition occur. Such shape modulations can induce cytoskeletal modulations as well (the authors report modulations in microtubule fluorescence). The authors mention axonal enlargements in some instances. Although they present DIC images to argue that the axons showing gaps are often tubular, possible beading artefacts need to be checked. Beading can be transient and can be checked by doing media changes while observing the axons on a microscope.

      (2) Why do microtubules appear patchy? One would imagine the microtubule lengths to be greater than the patch size and hence to be more uniform.

      (3) Why do axons with gaps increase with days in culture? If patches are nascent MPS that progressively grow, one would have expected fewer gaps with increasing days in culture. Is this indicative of some sort of degeneration of axons?

      (4) It is surprising that Latrunculin A reduces gap formation induced by staurosporine (also seems to increase MPS correlation) while it decreases actin filament content. How can this be understood? If the idea is to block actin dynamics, have the authors tried using Jasplakinolide to stabilize the filaments?

      (5) The authors speculate that the patches are formed by the condensation of free spectrins, which then leaves the immediate neighborhood depleted of these proteins. This is an interesting hypothesis, and exploring this in live cells using spectrin-GFP constructs will greatly strengthen the article. Will the patch-gap regions evolve into continuous MPS? If so, do these patches expand with time as new spectrin and actin are recruited and merge with neighboring patches, or can the entire patch "diffuse" and coalesce with neighboring patches, thus expanding the MPS region?

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gazal et al. describe the presence of unique gaps and patches of BetaII-spectrin in medial sections of long human motor neuron axons. BII-spectrin, along with Alpha-spectrin, forms horizontal linkers between 180nm spaced F-actin rings in axons. These F-actin rings, along with the spectrin linkers, form membrane periodic structures (MPS) which are critical for the maintenance of the integrity, size, and function of axons. The primary goal of the authors was to address whether long motor axons, particularly those carrying familial mutations associated with the neurodegenerative disorder ALS, show defects in gaps and patches of BetaII-spectrin, ultimately leading to degradation of these neurons.

      Strengths:

      The experiments are well-designed, and the authors have used the right methods and cutting-edge techniques to address the questions in this manuscript. The use of human motor neurons and the use of motor neurons with different familial ALS mutations is a strength. The use of isogenic controls is a positive. The induction of gaps and patches by the kinase inhibitor staurosporine and their rescue by Latrunculin A is novel and well-executed. The use of biochemical assays to explore the role of calpains is appropriate and well-designed. The use of STED imaging to define the periodicity of MPS in the gaps and patches of spectrin is a strength.

      Weaknesses:

      The primary weakness is the lack of rigorous evaluation to validate the proposed model of spectrin capture from the gaps into adjacent patches by the use of photobleaching and live imaging. Another point is the lack of investigation into how gaps and patches change in axons carrying the familial ALS mutations as they age, since 2 weeks is not a time point when neurodegeneration is expected to start.

    4. Reviewer #3 (Public review):

      Summary:

      Gazal et al present convincing evidence supporting a new model of MPS formation where a gap-and-patch MPS pattern coalesces laterally to give rise to a lattice covering the entire axon shaft.

      Strengths:

      (1) This is a very interesting study that supports a change in paradigm in the model of MPS lattice formation.

      (2) Knowledge on MPS organization is mainly derived from studies using rat hippocampal neurons. In the current manuscript, Gazal et al use human IPS-derived motor neurons, a highly relevant neuron type, to further the current knowledge on MPS biology.

      (3) The quality of the images provided, specifically of those involving super-resolution, is of a high standard. This adequately supports the conclusions of the authors.

      Weaknesses:

      (1) The main concern raised by the manuscript is the assumption that staudosporine-induced gap and patch formation recapitulates the physiological assembly of gaps and patches of betaII-spectrin.

      (2) One technical challenge that limits a more compelling support of the new model of MPS formation is that fixed neurons are imaged, which precludes the observation of patch coalescence.

    5. Author response:

      Reviewer #1 (Public review)

      Summary:

      Ever since the surprising discovery of the membrane-associated Periodic Skeleton (MPS) in axons, a significant body of published work has been aimed at trying to understand its assembly mechanism and function. Despite this, we still lack a mechanistic understanding of how this amazing structure is assembled in neuronal cells. In this article, the authors report a "gap-and-patch" pattern of labelled spectrin in iPSC-derived human motor neurons grown in culture. The mid-sections of these axons exhibit patches with reasonably well-organized MPS that are separated by gaps lacking any detectable MPS and having low spectrin content. Further, they report that the intensity modulation of spectrin is correlated with intensity modulations of tubulin as well. However, neurofilament fluorescence does not show any correlation. Using DIC imaging, the authors show that often the axonal diameter remains uniform across segments, showing a patch-gap pattern. Gaps are seen more abundantly in the midsection of the axon, with the proximal section showing continuous MPS and the distal segment showing continuous spectrin fluorescence but no organized MPS. The authors show that spectrin degradation by caspase/calpain is not responsible for gap formation, and the patches are nascent MPS domains. The gap and patch pattern increases with days in culture and can be enhanced by treating the cells using the general kinase inhibitor staurosporine. Treatment with the actin depolymerizing agent Latrunculin A reduces gap formation. The reasons for the last two observations are not well understood/explained.

      We thank the reviewer for the detailed and accurate description of the data shown and its relevance to further our understanding of MPS assembly mechanism and function.

      Strengths:

      The claims made in the paper are supported by extensive imaging work and quantification of MPS. Overall, the paper is well written and the findings are interesting. Although much of the reported data are from axons treated with staurosporine, this may be a convenient system to investigate the dynamics of MPS assembly, which is still an open question.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      Much of the analysis is on staurosporine-treated cells, and the effects of this treatment can be broad. The increase in patch-gap pattern with days in culture is intriguing, and the reason for this needs to be checked carefully. It would have been nice to have live cell data on the evolution of the patch and gap pattern using a GFP tag on spectrin. The evolution of individual patches and possible coalescence of patches can be observed even with confocal microscopy if live cell super-resolution observation is difficult.

      We will consider the inclusion of live imaging experiments using the expressión of C-terminus-tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we will explore how to develop these experiments to generate data for inclusion in a revised submission.

      Some more comments:

      (1) Axons can undergo transient beading or regularly spaced varicosity formation during media change if changes in osmolarity or chemical composition occur. Such shape modulations can induce cytoskeletal modulations as well (the authors report modulations in microtubule fluorescence). The authors mention axonal enlargements in some instances. Although they present DIC images to argue that the axons showing gaps are often tubular, possible beading artefacts need to be checked. Beading can be transient and can be checked by doing media changes while observing the axons on a microscope.

      We don´t discard the presence of “nano beads” in these axons. It was recently suggested that the normal morphology of axons is indeed resembling “pearls-on-a-string” (Griswold et al., 2025), with “nano beads” separated by thin tubular "connectors" (also referred to as NSV, for non-synaptic varicosities). However, it is unlikely that the gap-patch pattern of beta2-spectrin can be attributed to such a morphology, given we used formaldehyde as fixative, and Griswold and colleagues show that the use of aldehyde-based fixatives do not preserve NSVs. We are able to see scattered axonal enlargements (“micro beads”), as we described in distal portions in Fig. 1C(C2) and E. However, the number, appearance and staining of these are not compatible with the gap-patch pattern in beta2-spectrin. Moreover, we would have expected to see these NSVs in our extensive STED imaging, yet we did not. We will discuss this further in the resubmission.

      (2) Why do microtubules appear patchy? One would imagine the microtubule lengths to be greater than the patch size and hence to be more uniform.

      Our stainings are for tubulin protein isoforms beta-III and alpha-II. That is, they would label microtubules, but free tubulin as well. The slight decrease in intensity for tubulin within gaps is indeed something to investigate, but we don´t interpret this as “patchy microtubules”. If the Reviewer refers to Fig. 2C-D, it is actually difficult to anticipate the slight decrease in intensity by the naked eye. To further support this, we will consider including stainings and quantitative analyses for microtubules in the resubmission. We are familiar with the use of permeabilizing conditions during fixation (in protocols known as “cytoskeletal fixation” to label microtubules (and not free tubulin).

      (3) Why do axons with gaps increase with days in culture? If patches are nascent MPS that progressively grow, one would have expected fewer gaps with increasing days in culture. Is this indicative of some sort of degeneration of axons?

      We agree with the apparent discrepancy. However, one has to take into account that these axons are still elongating even at 2 weeks in culture. Hence, at any time point, there is a new axonal compartment recently added, and hence, with low beta2-spectrin and no MPS. Also, the dynamical evolution of the MPS has to take into account beta2-spectrin supply. If supply is somehow lower than a given threshold, it is expected that there will be more gaps, given the new, more distant parts of the axons have a lower supply of beta2-spectrin . To explore this formally, we are working on simulations of these multifactorial dynamic systems to better understand this, that together with key experimental observations would enhance our understanding into overall MPS assembly in growing axons. However, findings for this project will be the subject of another manuscript.

      (4) It is surprising that Latrunculin A reduces gap formation induced by staurosporine (also seems to increase MPS correlation) while it decreases actin filament content. How can this be understood? If the idea is to block actin dynamics, have the authors tried using Jasplakinolide to stabilize the filaments?

      The results with the co-treatment with Latrunculin A and Staurosporine are indeed intriguing, and provide clear evidence that the gap-and-patch pattern arises from local assembly of the MPS, requiring new actin filaments. However, the fact that F-actin within the pre-formed MPS seems unaffected is not surprising. There are many different populations of F-actin in axons (i.e. MPS rings, longitudinal filaments, actin patches, actin trails). Latrunculin A affects filaments indirectly. The target of Latrunculin A is not actin filaments, but free monomers. It ultimately affects actin filaments as they end up losing monomers, and devoid of new monomers, filaments get shorter and eventually disappear. The drastic decrease in F-actin in our axons reflects that. The fact that F-actin in the MPS is preserved only speaks to the fact that these filaments are stable -if they are not losing monomers in the time frame of the treatment, the filament remains unaffected. We will support this with more observations and imaging and with a more extensive discussion summarizing the literature on the matter in the resubmission.

      On the other hand, the use of F-actin stabilizing drugs (like Jasplakinolide) would have a different effect. We will study how an experiment with these drugs could be informative of the process under investigation for the resubmission

      (5) The authors speculate that the patches are formed by the condensation of free spectrins, which then leaves the immediate neighborhood depleted of these proteins. This is an interesting hypothesis, and exploring this in live cells using spectrin-GFP constructs will greatly strengthen the article. Will the patch-gap regions evolve into continuous MPS? If so, do these patches expand with time as new spectrin and actin are recruited and merge with neighboring patches, or can the entire patch "diffuse" and coalesce with neighboring patches, thus expanding the MPS region?

      We agree with the reviewer's interpretation. A virtue of our experimental model and our interpretations of the observations in fixed cells is that it gives rise to informative questions such as the ones posed by the reviewer. As stated above, we will consider the inclusion of live imaging experiments using the expressión of C-terminus tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we think we can provide the evidence suggested.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gazal et al. describe the presence of unique gaps and patches of BetaII-spectrin in medial sections of long human motor neuron axons. BII-spectrin, along with Alpha-spectrin, forms horizontal linkers between 180nm spaced F-actin rings in axons. These F-actin rings, along with the spectrin linkers, form membrane periodic structures (MPS) which are critical for the maintenance of the integrity, size, and function of axons. The primary goal of the authors was to address whether long motor axons, particularly those carrying familial mutations associated with the neurodegenerative disorder ALS, show defects in gaps and patches of BetaII-spectrin, ultimately leading to degradation of these neurons.

      We thank the reviewer for the detailed and accurate description of the data shown.

      Strengths:

      The experiments are well-designed, and the authors have used the right methods and cutting-edge techniques to address the questions in this manuscript. The use of human motor neurons and the use of motor neurons with different familial ALS mutations is a strength. The use of isogenic controls is a positive. The induction of gaps and patches by the kinase inhibitor staurosporine and their rescue by Latrunculin A is novel and well-executed. The use of biochemical assays to explore the role of calpains is appropriate and well-designed. The use of STED imaging to define the periodicity of MPS in the gaps and patches of spectrin is a strength.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      The primary weakness is the lack of rigorous evaluation to validate the proposed model of spectrin capture from the gaps into adjacent patches by the use of photobleaching and live imaging. Another point is the lack of investigation into how gaps and patches change in axons carrying the familial ALS mutations as they age, since 2 weeks is not a time point when neurodegeneration is expected to start.

      We will consider the inclusion of live imaging experiments using the expressión of tagged human beta2-spectrin in the revised version of the manuscript. We are familiar with live-imaging and FRAP experiments and we believe we can provide the evidence suggested. We don't discard the notion that axons carrying familial ALS mutations will show defects in MPS formation and/or stability when observed at longer culture times, or under culture conditions that promote neuronal aging (Guix et al., 2021). Thus, we will continue to work with these cells, but the goal of that project lies well beyond the primary message of the present manuscript, and we anticipate that the revised version will not include new data on this matter. 

      Reviewer #3 (Public review):

      Summary:

      Gazal et al present convincing evidence supporting a new model of MPS formation where a gap-and-patch MPS pattern coalesces laterally to give rise to a lattice covering the entire axon shaft.

      Strengths:

      (1) This is a very interesting study that supports a change in paradigm in the model of MPS lattice formation.

      (2) Knowledge on MPS organization is mainly derived from studies using rat hippocampal neurons. In the current manuscript, Gazal et al use human IPS-derived motor neurons, a highly relevant neuron type, to further the current knowledge on MPS biology.

      (3) The quality of the images provided, specifically of those involving super-resolution, is of a high standard. This adequately supports the conclusions of the authors.

      We thank the reviewer for the positive comments on the manuscript, the techniques used and the proposed model.

      Weaknesses:

      (1) The main concern raised by the manuscript is the assumption that staudosporine-induced gap and patch formation recapitulates the physiological assembly of gaps and patches of betaII-spectrin.

      We will further explore the inclusion of more measurements of other parameters and variables towards establishing whether these gaps-and-patches patterns are equivalent structures in control and staurosporine-treated cells. 

      (2) One technical challenge that limits a more compelling support of the new model of MPS formation is that fixed neurons are imaged, which precludes the observation of patch coalescence.

      As stated before regarding similar comments by other reviewers, we will consider the inclusion of live imaging experiments in the revised version of the manuscript.

      Nicolas Unsain, PhD, and Thomas Durcan, PhD.

      References

      Griswold, J.M., Bonilla-Quintana, M., Pepper, R. et al. Membrane mechanics dictate axonal pearls-on-a-string morphology and function. Nat Neurosci 28, 49–61 (2025). https://doi.org/10.1038/s41593-024-01813-1

      Guix F.X., Marrero Capitán A., Casadomé-Perales A., Palomares-Pérez .I, López Del Castillo I., Miguel V., Goedeke L., Martín M.G., Lamas S., Peinado H., Fernández-Hernando C., Dotti C.G. Increased exosome secretion in neurons aging in vitro by NPC1-mediated endosomal cholesterol buildup. Life Sci Alliance. 2021 Jun 28;4(8):e202101055. doi: 10.26508/lsa.202101055. Print 2021 Aug.

    1. eLife Assessment

      The effort is timely and the paper carries valuable insights into the function of UTR mutations. There are still significant concerns about both the quality of the screen data, and its ability to detect significant changes in translation and their direction. Therefore, the ability of the screen to support the extensive downstream statistical analysis is limited and leaves the paper incomplete.

    2. Reviewer #1 (Public review):

      The authors describe a massively parallel reporter assays (MPRA) screen focused at identifying polymorphisms in 5' and 3' UTRs that affect translation efficiency and thus might have a functional impact on cells. The topic is of timely interest, and indeed, several related efforts have recently been published and preprinted (e.g., https://pubmed.ncbi.nlm.nih.gov/37516102/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635273/). This study has several major issues with the results and their presentation.

      Major comments:

      • The main issue remains that it appears that the screen has largely failed, and the reasons for that remain unclear, which make it difficult to interpret how useful is the resulting data. The authors mention batch effects as a potential contributor. The authors start with a library that includes ~6,000 variants, which makes it a medium-size MPRA. But then, only 483 pairs of WT/mutated UTRs yield high confidence information, which is already a small number for any downstream statistical analysis, particularly since most don't actually affect translation in the reporter screen setting (which is not unexpected). It is unclear why >90% of the library did not give high-confidence information. The profiles presented as base-case examples in Fig. 2B don't look very informative or convincing. All the subsequent analysis is done on a very small set of UTRs that have an effect, and it is unclear to this reviewer how these can yield statistically significant and/or biologically-relevant associations.

      • From the variants that had an effect, the authors go on to carry out some protein-level validations, and see some changes, but it is not clear if those changes are in the same direction was observed in the screen. In their rebuttal the authors explain that they largely can not infer directionality of changes form the screen, which further limits its utility.

      • It is particularly puzzling how the authors can build a machine learning predictor with >3,000 features when the dataset they use for training the model has just a few dozens of translation-shifting variants.

      Comments on revisions:

      It appears that the authors have extracted the information they could from the problematic dataset they obtained. Repeating the experiments in a cleaner setting, obtaining data for the >6000 UTRs they intended will allow the authors to achieve the goals they set out to achieve in establishing the screen.

    1. eLife Assessment

      In their study, Cummings et al. provide a valuable advance in understanding the hierarchical regulation of tubulin polyglycylation, demonstrating that TTLL8 initiates monoglycylation which is a prerequisite for TTLL10-mediated polyglycylation. The evidence supporting these mechanistic insights is solid, relying on a compelling combination of purified biochemical assays, mass spectrometry, and microscopy. The work is further valued for revealing an unexpected crosstalk between polyglycylation and polyglutamylation that ensures a balanced post-translational modification landscape for proper cilia function.

    2. Reviewer #1 (Public review):

      Summary:

      In their current study, Cummings et al have approached this fundamental biochemical problem using a combination of purified enzyme-substrate reactions, MS/MS and microscopy in vitro to provide key insights into the hierarchy of generating polyglycylation in cilia and flagella. They first establish that TTLL8 is a monoglycylase, with the potential to add multiple mono glycine residues on both α- and β-tubulin. They then go on to establish that the monoglycylation is essential for TTLL10 binding and catalytic activity, which progressively reduces as the level of polyglycylation increases. This provides an interesting mechanism of how level of polyglycylation is regulated in the absence of a deglycylase. Finally, the authors also establish that for efficient TTLL10 activity, it is not just monoglycylation, but also polyglutamylation that is necessary, giving a key insight into how both these modifications interact with each other to ensure there is a balanced level of PTMs on the axonemes for efficient cilia function.

      Strengths:

      The manuscript is well written, and experiments are succinctly planned and outlined. The experiments used provide the conclusions to what the authors were hypothesising and provide some new novel possible mechanistic insights into the whole process of regulation of tubulin glycylation in motile cilia.

      Weaknesses:

      There were some weaknesses in the initial submission of the manuscript, but the authors have addressed these in their revised version either by giving clear explanations in the text or through additional experiments.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “In their current study, Cummings et al have approached this fundamental biochemical problem using a combination of purified enzyme-substrate reactions, MS/MS, and microscopy in vitro to provide key insights into the hierarchy of generating polyglycylation in cilia and flagella. They first establish that TTLL8 is a monoglycylase, with the potential to add multiple mono glycine residues on both α- and β-tubulin. They then go on to establish that monoglycylation is essential for TTLL10 binding and catalytic activity, which progressively reduces as the level of polyglycylation increases. This provides an interesting mechanism of how the level of polyglycylation is regulated in the absence of a deglycylase. Finally, the authors also establish that for efficient TTLL10 activity, it is not just monoglycylation, but also polyglutamylation that is necessary, giving a key insight into how both these modifications interact with each other to ensure there is a balanced level of PTMs on the axonemes for efficient cilia function.”

      Strengths: 

      The manuscript is well-written, and experiments are succinctly planned and outlined. The experiments were used to provide the conclusions to what the authors were hypothesising and provide some new novel possible mechanistic insights into the whole process of regulation of tubulin glycylation in motile cilia.”

      We thank the reviewer for their support of our study and recognition of its importance to understanding microtubule glycylation and its regulation.  

      “The initial part of the manuscript where the authors discuss about the requirement of monoglycylation by TTLL8 is not new. This was established back in 2009 when Rogowski et al (2009) showed that polyglycylation of tubulin by TTLL10 occurs only when co-expressed in cells with TTLL3 or TTLL8. So, this part of the study adds very little new information to what was known. “

      Our study provides the first in vitro evidence with purified recombinant components that human TTLL8 is exclusively a monoglycylase (Figure 1) and that polyglycylation by TTLL10 requires previous priming with monoglycylation (Figure 2). Studies with purified recombinant components are the gold standard for establishing the activity of an enzyme as cellular work can be obfuscated by the activity of other regulators. We did cite in our original submission the work by Rogowski, Gaertig and Janke from 2009 (reference 15 in the original submission) as well as that Ikegami and Setou 2009 work (reference 26 in the original submission) that established that TTLL10 polygyclylase activity requires co-expression with TTLL8 in cells. Specifically, we stated in our original submission and in the revised manuscript:

      “Cellular overexpression studies coupled with the use of antibodies that recognize mono- and polyglycylation indicate that TTLL8 is also a glycyl-initiase, while TTLL10 a glycyl-elongase (15, 26).  However, direct biochemical evidence with purified enzymes for segregated initiation and elongation activity for glyclases is still lacking as does knowledge of their substrate specificity and regulation.” 

      In addition to citing the Setou study, we now cite again the Rogowski, Gaertig and Janke 2009 study later in the manuscript when the cellular data are mentioned again.  Specifically, we state in the revised manuscript: 

      “This is consistent with cellular overexpression data which showed that polyglycylation signal was detected via antibody only in tubulin from cells that co-expressed TTLL8 and TTLL10, but not TTLL10 alone (15, 26).”

      “The study also fails to discuss the involvement of the other monoglycylase, TTLL3 in the entire study, which is a weakness as in vivo, in cells, both the monoglycylases act in concert and so, may play a role in regulating the activity of TTLL10. “

      We previously showed that purified recombinant TTLL3, like TTLL8, adds only monoglycines, with a preference for the b-tubulin tail (Garnham et al., PNAS 2017). Given that TTLL10 requires priming by monoglycylation, we expect that, similarly to TTLL8, TTLL3 will allow elongation of the initial monoglycyline chains by TTLL10. 

      (1) From the mass spec data, it appears that the Xaenopus Laevis TTLL10 can add up to 18 residues. However, the numbers indicated in Figure 2E seem to suggest that it is a maximum of 23 residues only at a particular position. Does this mean that the 13-18 residues observed are a collection of multiple short-chain polyglycylations or are there positions that the authors observed where there were chains of longer than 3 glycine residues? This would be an interesting point to note as when it was discovered in Paramecium, the polyglycyl chains were reported to be up to 34 residues (Redeker et al., Science 1994). If the authors could test the TTLL10 from Paramecium to observe if this is a consistent phenomenon across evolution or is there a biologically significant difference that is being developed, would be interesting to know.”

      Figure 2E shows a subset of the modified tails that we identified and where the position of the posttranslationally added glycine can be mapped to a specific position, or range of positions. Additional species exist. We note that the mass spectra in Figure 2B are intact LC/MS, while those in Figure 2E are MS/MS. The ionization of tubulin tail peptides with larger number of glycines is not as efficient as for shorter glycine chains, reducing the sensitivity of detection of species that have higher number of glycines. This is not as pronounced when the mass spectra are obtained from the intact protein (Figure 2B). In summary, our data supports the fact that TTLL10 elongates polyglycine chains at multiple positions in the tubulin tail (shown in Figure 2E), however, we cannot ascertain the maximum polyglycine chain length, only the total number of glycyines added.

      Testing the enzyme from Paramecium is an interesting proposal but outside the scope of this manuscript. 

      (2) While it is interesting to know that the TTLL10 binds to TTLL8-modified tubulin with a much higher affinity than unmodified tubulin, in vivo, the microtubules will be a mixture of both TTLL3- and TTLL8-modified tubulin. It would be good to see the binding of the enzyme to a tubulin that is modified by both TTLL3 and TTLL8 if the two have a greater influence on TTLL10 binding.”

      Our previous work showed that purified recombinant TTLL3 has purely monoglycylase activity, with a preference for b-tubulin (Garnham et al., PNAS 2017). The sites of monoglycylation by TTLL3 overlap with those introduced by TTLL8 on b-tubulin (the difference being mainly that TTLL3 is more selective towards b-tubulin and thus has lower activity on a-tubulin). TTLL8 introduces additional monoGlys on the a-tubulin tail. Therefore, it is unlikely that TTLL10 will have a different response to microtubules that carry similar numbers of Gly residues, regardless of whether introduced by TTLL8 or TTLL3 and 8. Our data show that TTLL10 binding increases with Gly number, but that the gains in affinity plateau as the density of glycine residues on the tails increases above a certain threshold, likely because one TTLL10 molecule recognizes one monoGly branch, and steric hindrance on the tubulin tail prevents further recruitment of additional TTLL10 molecules.  

      (3) The authors have always increased the number of monoglycines in beta-tubulin more than in alpha-tubulin. Is there a rationale for this? Since TTLL8 is known to predominantly modify alphatubulin (Rogowski et al., 2009; Gadadhar et al., 2017) why did the authors not check for the increased binding of the TTLL10 on dimers where the number of monoglycines on alpha-tubulin is higher than 1.1? Especially when they themselves observe in their mass spec that even on alphatubulin there are 1, 2, and 3 glycines added. I would like to see what happens if the ratio is high alpha-G + low beta-G”

      As our spectra in Figure 1 show, we find that TTLL8 is able to modify robustly in vitro both a- and b-tubulin but that it shows a slight preference for b-tubulin (Figure 1B). The work from the Janke group that the reviewer is referring to (Rogowski et al., 2009 and Gadahar et al., 2017) did not use recombinant, purified enzymes and unmodified microtubules as substrates and used axonemal tubulin (which carries many modifications), and so it is possible that the a-tubulin preference observed in that system when TTLL8 is overexpressed, is likely to other factors that do not reflect the biochemical property of the enzyme alone (for example, it could be because btubulin site are not available because they are already glutamylated). As can be seen from Figure 3D, the gain in affinity when increasing the number of glycines from one glycine is small, compared to the initial monoglycine added to the a- and the b-tubulin tail, likely reflecting that one tail cannot bind more than one TTLL10 at one time because of steric hindrance. Moreover, it is important here to note that glutamylation and glycylases compete for the same sites on the tubulin tails, as we have for example shown for TTLL3 and TTLL7 (Garnham et al., 2017), therefore the activity of these enzymes in vivo or with non-naïve substrates are context dependent and influences also what sites are available for TTLL10 to modify. In conclusion, by using recombinant enzymes and naïve tubulin we gain insight into the intrinsic property of these enzymes and therefore provide a framework for the interpretation of in vitro and in vivo observations. 

      (4) I wonder why the authors did not use the human TTLL10 to test if this also shows similar binding to the glycylated tubulin despite the fact that it is enzymatically inactive. If it does, then it would be interesting to see the kinetics of binding of this enzyme to see if the fall off of the enzyme from the tubulin is solely driven by the level of polyglycylation only, or if it has any other mechanism involved as well.”

      Work with human recombinant TTLL10, a TTLL10 homolog which was proposed to be inactive, will be an interesting future direction but outside the scope of this manuscript. We did note in our previous manuscript (Garnham et al., 2017, Figure S5) that the residues which are mutated in the human enzyme compared to other mammals are on the dorsal face of the enzyme, far away from the active site, raising an interesting question of how they inactivate the enzyme.   We need however to emphasize that our work clearly shows that it is polyglycylation on the microtubules that reduces binding of TTL10 to microtubules because experiments done in the absence of glycylating activity i.e. with enzyme that was incubated with microtubules that were pre-modified with polyglycline chains, but in the absence of glycyine substrate (precluding any glycylation activity during the binding assay) show that the binding decreases monotonically with the number of polyglycines  on the microtubule (Figures 4A, B).  

      (5) In Figure 5, the authors use monoglycylated tubulin that is either glutamylated or not to show that the activity of TTLL10 is enhanced by the extent of polyglutamylation present on the tubulin. However, there is no evidence of the enzyme binding to microtubules that are only glutamylated. It would be good to test this to determine if the binding is also dependent on both monoglycylation and glutamylation or is it only the enzyme activity.

      Figure 5E shows that TTLL10 binding increases with monoglycylation alone, and that glutamylation is additive and Figures 4A, B show that it is not the enzyme activity that affects the binding, but the glycylation state of the microtubule. We did not determine binding to microtubules that were only glutamylated, because TTLL10 would not be able to elongate polyglycine chains on those microtubules, even if it bound. 

      (6) The level of polyglycylation used in Figure 5 is quite low. It would be good to see how the length of the polyglycine chain impacts TTLL10 activity in the presence of polyglutamylation, and whether this has any cooperative effect leading to longer chain polyglycylation than what is seen with only monoglycylated tubulin.

      We expect longer chain polyglycylation to have an inhibitory effect as we show in Figure 4. 

      “(7) In the overall study, the authors fail to discuss whether the activity of both the glycylases at different sites on tubulin is sequential, or modifications at different residues happen all at once. If the authors were to do a sequential time course of the modification followed by MS/MS analysis, they could get some indications about this.”

      As the data in Figure 3D shows, the effect of adding more monoGly site on a tubulin tail has a muted effect on binding, indicating that the additional mono-Gly branches do not lead to more TTLL10 recruitment because of steric hindrance i.e. multiple TTLL10 enzymes cannot be accommodated on the same tail at the same time efficiently. This is consistent with the overall dimensions of the enzyme and the positions of its active site, which were modeled initially in our previous publication (Garnham et al., PNAS 2017).  The site of TTL10 action is pre-determined by the position of the mono-Gly branch introduced by TTLL3 or TTLL8. The length of the tubulin tail and the proximity of mono-Gly sites to each other precludes TTLL10 acting at multiple positions at once on the same tail.

      “(8) Do the modifications have any cooperative effect with respect to the sites of modification? Does modifying a particular site enhance the kinetics of modification of the other sites? Can the authors test this?”

      This would be an interesting line of future investigations.  

      “Minor points:

      (1’) The authors opine that the level of polyglycylation is regulated by the decreased binding of the TTLL10 to the polyglycylated tubulin. While this is an interesting argument, which could be a possibility based on the data they present, it would still not answer if this is a mechanism followed by TTLL10 of all species or not. If they could test the efficacy of TTLL10 from another species, to see the binding efficiency of that enzyme, it could potentially strengthen their argument of this possible mechanism.”

      The differences between the properties of TTLL10 from different organisms will be an interesting focus of future investigations, but outside the scope of this present study. However, we would like to point out that the level of sequence conservation between TTLL10 makes it unlikely that other TTLL10 do not follow a similar mechanism, albeit with possible differences in the extent of the response.  We also note that we have shown that polyglycylation also inhibits binding to the microtubule of the severing enzyme katanin (Szczesna et al., Dev. Cell 2022). Therefore, these studies suggests that polyglycylation might be a more general mechanism for reducing microtubule binding affinity since glycylation reduces the negative charge on the tubulin tails, which frequently interact with positively charged domains or interfaces in microtubule associated proteins.  

      “(2) The authors indicate that glycylases act on pre-glutamylated microtubules. However, in their assays, they use unmodified tubulin, which I would presume is also not glutamylated. If this is the case, how can they justify that the enzymes prefer pre-glutamylated microtubules? This is a bit unclear. Do they mean that their tubulin is already pre-glutamylated? Have they tested this?”

      The statement regarding the action of these enzymes on glutamylated microtubules refer to the in vivo situation where polyglycylated microtubules appear in cilia biogenesis after the microtubules in the axoneme are already glutamylated. In vitro, by using microtubules that are only monoglycylated and microtubules that are both glutamylated and monoglycylated, we show that glutamylation further increases recruitment of TTLL10 to microtubules that are monoglycyated. Therefore, glutamylated microtubules will be polyglycylated preferentially over those that are not glutamylated. 

      We state: “Axonemal microtubules are abundantly glutamylated. Glutamylation appears during cilia development first, followed by glycylation (12, 13), indicating that in this scenario glycylases act on pre-glutamylated microtubule substrates.”

      “(3) In continuation with the previous point, an immunoblot of their purified tubulin showing no reactivity to anti-glycylation or anti-glutamylation antibodies, which upon treatment with TTLL8 reacts to the anti-glycylation antibody would be confirmatory evidence to show that the isolated tubulin was indeed unmodified.”

      We have now included a Western blot of our TOG-purified tubulin as Figure S3 in our revised manuscript.  This shows a faint signal with the pep-G1 antibody and a very strong signal after TTLL8 treatment. We are not sure whether the low signal with the pep-G1 antibody for the unmodified tubulin is due to low bona fide monoglycylation-specific signal or a low affinity nonspecific interaction of this antibody (raised against mono-glycylated tubulin tail peptides) with the unmodified tubulin. We note that this signal is clearly visible only when loading at least 0.2 micrograms of the purified tubulin. At this loading level the signal for the glycylated species is saturated. It is also important to note that we have not detected glycylated species in this tubulin either by LC-MS or MS/MS. Therefore, our data strongly indicate that the tubulin purified from tsA201 cells is not glycylated or has at most extremely low levels of glycylation. Importantly, this potential trace level of monoglycylated tubulin does not affect any of the conclusions in this study. The Western blot also shows no detectable signal with the polyglycyation antibody in the unmodified tubulin and a very strong, saturated signal after the tubulin was treated with both TTLL8 and TTLL10.  We also added an additional Figure S8 that shows that the tSA201 tubulin does not give a detectable signal for glutamylation. Please see also Figure 3 from Vemu et al., Methods Enzymology 2017 where we also published a Western blot from our TOG-purified tubulin using anti-glutamylation antibodies. 

      “(4) In their study, the authors have used polyglycylation of up to 10-13 residues. This brings me to my first point that in the case of Paramecium, the number was identified to be up to 34, which would mean that this enzyme has higher binding or catalytic activity. I would like to know the authors' perspective on this, as to what could potentially determine the difference in the activities of TTLL10 across species.”

      The Xenopus TTLL10 enzyme can add more glycines than the 10-13 range that we show here if the enzyme is incubated for longer periods. The fact that glycine numbers as high as 34 were detected in Paramecium does not necessarily mean that the Paramecium enzyme is more active since there is no equivalent data to compare it with from Xenopus. The only way to address potential species differences in enzyme specific activity is to purify enzymes from different species and compare their activity side-by-side.  

      (5) How was the completion of the reaction of monoglycylation and polyglycylation determined? If the enzymes were left for more than 20 minutes, did TTLL8/ TTLL10 add more glycines? What is the reason for using less tubulin (1:20 enzyme:tubulin molar ratio) for monoglycylation by TTLL8, and more tubulin (1:50 enzyme:tubulin molar ratio) for polyglycylation by TTLL10?

      Yes, if the enzymes were incubated longer, they added more glycines. The extent of glycylation was determined from the LC-MS and the incubation time was varied to obtain samples with fewer or more glycines.   The lower ratio used for TTLL10 is because of the higher specific activity of that enzyme compared to TTLL8.  

      (6) Figure S2 A, b2 ion is not indicated in the peptide sequence, while it is shown in the m/z graph.

      We thank the reviewer for the careful reading. We have corrected this in our MS/MS spectrum. 

      Reviewer #2 (Public review):

      “In their manuscript, Cummings et al. focus on the enzymatic activities of TTLL3, TTLL8, and TTLL10, which catalyze the glycylation of tubulin, a crucial posttranslational modification for cilia maintenance and motility. The experiments are beautifully performed, with meticulous attention to detail and the inclusion of appropriate controls, ensuring the reliability of the findings. The authors utilized in vitro reconstitution to demonstrate that TTLL8 functions exclusively as a glycyl initiase, adding monoglycines at multiple positions on both α- and β-tubulin tails. In contrast, TTLL10 acts solely as a tubulin glycyl elongase, extending existing glycine chains. A notable finding is the differential substrate recognition between TTLL glycylases and TTLL glutamylases, highlighting a broader substrate promiscuity in glycylases compared to the more selective glutamylases. This observation aligns with the greater diversification observed among glutamylases. The study reveals a hierarchical mechanism of enzyme recruitment to microtubules, where TTLL10 binding necessitates prior monoglycylation by TTLL8. This binding is progressively inhibited by increasing polyglycine chain length, suggesting a self-regulatory mechanism for polyglycine chain length control. Furthermore, TTLL10 recruitment is enhanced by TTLL6mediated polyglutamylation, illustrating a complex interplay between different tubulin modifications. In addition, they uncover that polyglutamylation stimulates TTLL10 recruitment without necessarily increasing glycylation on the same tubulin dimer, due to the potential for TTLLs to interact with neighboring tubulin dimers. This mechanism could lead to an enrichment of glycylation on the same microtubule, contributing to the complexity of the tubulin code. The article also addresses a significant challenge in the field: the difficulty of generating microtubules with controlled posttranslational modifications for in vitro studies. By identifying the specific modification sites and the interplay between TTLL activities, the authors provide a valuable tool for creating differentially glycylated microtubules. This advancement will facilitate further studies on the effects of glycylation on microtubule-associated proteins and the broader implications of the tubulin code. In summary, this study substantially contributes to our knowledge of posttranslational enzymes and their regulation, offering new insights into the biochemical mechanisms underlying microtubule modifications. The rigorous experimental approach and the novel findings presented make this a pivotal addition to the field of cellular and molecular biology.”

      We thank the reviewer for their support of our work.

    1. eLife Assessment

      This study provides convincing evidence of coordinated spiking activity of neurons in the anterior cingulate cortex (ACC), and correlated activity in the CA1 subregion of the hippocampus, during observational learning. The authors also show coordinated ACC-CA1 neural activity during rest periods prior to the performance of the observationally learned task. The important findings significantly advance the field's understanding of neural mechanisms underlying social learning.

    2. Reviewer #1 (Public review):

      Summary:

      Mou and Ji investigate the relationship between firing rates in the anterior cingulate cortex (ACC) and CA1 neurons during observational learning. They found trajectory-selective responses in the ACC, coordinated activity between ACC and CA1 place cells for specific trajectories, and reactivation of these ensembles during sharp-wave ripples (SWRs), particularly during hippocampal replay events. The study is methodologically sound, the data are clearly presented, and the conclusions are well supported. The work is both novel and highly relevant to our understanding of social learning. Compared to the previous version of the paper, they have added substantial characterization of neuronal properties related to their firing during the task and replay events. I believe that the authors have therefore addressed most of my concerns and recommend the paper for publication as is.

      Strengths:

      The study is well designed, the data presented is very clear and the conclusions are appropriate regarding their results. The study is novel and of high relevance for the understanding of social learning.

      Weaknesses:

      All previous weaknesses have been addressed.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript, Xiang Mou and Daoyun JI investigate how ACC neurons activated by observational learning communicate with the hippocampus. They assess this line of communication through a complex behavioral technique, in vivo electrophysiology, pharmacological approaches, and data analytical techniques. Firstly, authors find that observational performance is dependent on the ACC, and that the ACC possess neurons that show side selectivity (trajectory related) in both the observation box, when shuttling to reward, and during subsequent maze running, shuttling to the corresponding same side for reward. The side-selective activation appears stronger for correct trials compared to error trials specifically during observation of Demo rats. They compare how the CA1 of the hippocampus encodes these two environments and find that ACC side-selective neurons show correlation with side-selective CA1 ensembles during maze behavior, water consumption, and sharp-wave ripples.

      Strengths:

      Overall, the paper provides strong evidence that ACC neurons are activated by observational learning and that this activation seems to be correlated with CA1 activity.

      Weaknesses:

      Concerns, however, surround the strength of evidence that links ACC and CA1 activity during observational learning. Only weak correlations between the two regions are shown, and it is unclear if the ACC may lead CA1 activity or vice versa. It is possible that these processes reflect two parallel pathways. Without manipulation of ACC, it is difficult to assess whether ACC activity influences hippocampal replay.

      Comments on revisions:

      Lines 361-362: R and P values do not match that of Figure 5C.

    4. Reviewer #3 (Public review):

      Summary:

      Mou and Ji investigated neuro-computational mechanisms behind observational spatial learning in rats and reported several signs of functional coupling between the ACC and CA1 at the single neuron level. Using multi-site tetrode recording, they found that ACC cells encoding a path in a maze were activated while a rat observed another rat taking that path. This activation was also correlated with the activation of CA1 cells encoding the same path and facilitated their replay during sharp-wave ripples (SWRs) before the recording rat ran on the maze by itself. These activity patterns were associated with correct path choice during self-running and were absent in control conditions where the recording rat did not learn the correct choice through observations. Based on these findings, the authors argue that ACC cells capture the critical information during observation to organize hippocampal cell activity for subsequent spatial decisions.

      Strengths:

      The authors used multiple outcome measures to build a strong case for path-specific spike coordination between ACC and CA1 cells. The analyses were conducted carefully, and proper control measures were used to establish the statistical significance of the detected effects. The authors also demonstrated tight correlations between the spike coordination patterns and the successful use of observed information for future decisions.

      Weaknesses:

      (1) As evidence for the activation of path information in the ACC during observation, the authors showed positive correlations between firing rates during observation and those during self-running. This argument will be solidified if the authors use a decoding approach to demonstrate the activation of path-selective ACC ensemble activity patterns during observation. This approach will also open the door to uncovering how the activation of ACC path representation is related to the moment-to-moment position of the demonstrator rat and whether it is coupled with the timing of SWRs.

      (2) The authors argued that the ACC biases the content of awake replay in CA1 during SWRs in the observation period. The reviewer wonders if a similar bias also occurs during SWRs in the self-run period (i.e., water consumption after the correct choice). This analysis will help test whether the biased replay occurs due to the need to convert observed information into future choices.

      (3) Although the authors demonstrated the necessity of the ACC for the task, it remains to be determined whether firing coordination between the ACC and CA1 during observation is necessary for the correct path choice during self-runs. Some discussion on this point, along with future direction, would be beneficial for readers.

      Comments on revisions:

      The authors fully addressed my recommendations. I do not have any further comments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Minor:

      (1) In Figure 2, only the right or left selective neurons are presented for the comparison, it would be helpful to also compare these with the neurons that are not selective for any of the sides and maybe include them in the supplemental materials

      We have included all non-selective neurons in Figure 2D and supplemental Figure 2B. Their differences in firing rate between left and right sides are quantified by their selective indices (SIs). 

      (2) The authors should provide controls of speed during NMDA infusion and vehicle.

      We have quantified and compared the duration of running laps, which is equivalent to speed.

      (3) In Figure 1d, the trend shows that even during NMDA infusion, the animals learn as shown by a higher proportion of correct trials in the 3rd compared to the 1st trial

      We thank the reviewer for pointing that out. We noticed that NMDAlesioned ACC animal showed a trend of improved performance in the track, and we believe this is due to re-learning of the task, which we point out in the main text. However, we emphasize that, compared to the Vehicle control, the overall performance of NMDA-lesioned animals was significantly impaired.

      (4) Clarify the implications of the NMDA experiments, as it is not straightforward to interpret that an interplay between ACC-CA1 is involved in this task as per this experiment.

      Rather than stating the involvement of ACC-CA1 interplay, we use the results of NMDA lesion experiment to demonstrate that ACC is also required, besides CA1, for the task.

      (5) In Figure 4b, there seems to be a lag between CA1 and ACC correlations; the authors could provide a quantification of this temporal delay between CA1 and ACC.

      Figure 4B shows the cross-correlation between one example ACC cell and its associated CA1 ensembles on the left and opposite sides. There was a broad peak around time lag 0. Our further investigation did not identify a significant, systemic delay for all ACC cells, which led us to quantify the correlation at time lag 0 in Figure 4C and D.

      (6) The example correlation provided in 5c for the opposite, doesn't seem representative of the population trend as shown in 5d, since both the Same and the Opposite for the demo show a positive trend. It would be best to choose an example that represents the population better.

      Following the reviewer’s suggestion, we have replaced the original plot with another ACC cell in Figure 5C.

      (7) Almost the same can be applied to Figure 6.

      Following the reviewer’s suggestion, we have replaced the original plot with another ACC cell in Figure 6E.

      (8) The results in Figure 7 are convincing, in my opinion, as they show that the trend is lost for the opposite side (contrary to the coactivation shown in Figures 5 and 6 that showed the same trends for the same and opposite during Demo). Do the authors have any interpretation of this? Is it due to co-activity reflecting other task-relevant features different than the spatial trajectory being observed?

      The correlation on the opposite side between CA1 and ACC shown in Figure 5C-D and Figure 6E-F is likely due to a general interaction between CA1 activities around SWRs with prefrontal cortical areas including ACC, as shown in previous studies (Jadhav et al., 2016; Remondes and Wilson, 2015).  We would like to point out that this correlation only quantifies the coactivation between CA1 ensemble firing rates and individual ACC cells’ firing rate. This raw correlation does not consider the content of spikes generated by CA1 ensembles, neglecting the sequential firing patterns of CA1 cells. The replay analysis in Fig. 7 examines the order of spikes generated by individual CA1 cells. The result in Fig. 7 shows that the sequential activation of CA1 place cells more accurately reflects the distinction between the same- and opposite-side trajectories. We consider Fig. 7 is more refined analysis than Figs. 5 and 6.

      (9) For all the figures regarding SWR activities, the authors should provide average PSTH for CA1 as well as ACC, perhaps also examples of neurons that are selectively active during one side or the opposite side runs.

      Following the reviewer’s suggestion, we have added data to show PSTH for CA1 and ACC cells surrounding SWR peaks (Figure S5E, F). 

      Reviewer #2 (Recommendations For The Authors):

      Below are additional notes for improvements.

      (1) Figure 1C. Unclear what Time 0 indicates.

      We specify it (OB's poke time) in the figure legend. 

      (2) Figure 2C. Unclear what the numbers above datapoints mean.

      Those numbers are selection indices (SIs), as specified in the legend. 

      (3) Figure 5: Line 374-375. Given the repetitive nature of the task, it is unclear whether SWRs are encoding upcoming or past spatial trajectories or whether they are encoding trajectories at all. The authors would need to show that SWRs-ACC communication is predictive of task outcome to claim it is specifically necessary for future outcomes rather than consolidating past trajectories.

      We agree with the reviewer and have made changes to reflect that the ACC-CA1 correlation in Fig.5 is specific to the same side of their selectivity, not exactly to future trajectories. Regarding the repetitive nature of the task (same-side rule), we have specifically addressed the advantage and limitation of this task design in the discussion. Regarding the observer's own past vs. future trajectories, our past publication (Mou et al., 2022) shows that the CA1 replay in SWRs more likely encode the correct, future trajectories. 

      (4) Figure 7. It appears that the correlation was conducted between ACC activity and CA1 replays recorded at distinct time windows (delay period vs. water consumption). It is unclear how ACC activity could influence CA1 replays when they occur hundreds of milliseconds apart or even longer.

      We thank the reviewer for raising this important question. We have shown that the higher same-side ACC activity during observation continues during water consumption. However, our added data in Fig.S5E show that this enhancement did not occur precisely within SWRs. We thus propose a possibility that the overall enhanced activity of same-side ACC cells during water consumption provides an overall, background excitation boost to same-side CA1 cells to enhance their replay within SWRs. We have revised the discussion section to present this model. 

      (5) Abstract: lines 24-25 Discussion: lines 475-476 Based on the data there is no certainty whether ACC biases or coordinates CA1 replays. The data simply shows that they are correlated with one another.

      We have modified those sentences to clarify the non-causal nature of the interaction.

      Reviewer #3 (Recommendations For The Authors):

      Please see below for the list of minor corrections and suggestions:

      (1) Line 136-143: On the data shown in Figure 1D, I recommend using two-way mixed ANOVA with sessions as a within-subjects factor and groups as a between-subjects factor.

      We thank the reviewer for this point. We indeed use two-way ANOVA for those comparisons. We have specified out in the text.

      (2) Line 219-228: I recommend expanding the explanation of two control conditions here. It was written in the method section, but the readers would appreciate the gist of these conditions in the result section. In particular, it was unclear how box SI was calculated in the Empty condition. Also, the plots of poke rates in the control conditions will be useful to show that rats did not learn the correct choice from observation in these control conditions.

      We have added more explanation of the two control conditions in the text. The quantifications of poke rates for Demo and two control conditions (Object, Empty) are provided in our previous publication (Mou et al., 2022).

      (3) Line 610: Please specify the number of three types of sessions each rat underwent and the order of these session types.

      We revise the texts in the Method section and provide the numbers.

      (4) In Figure 2c legend, please specify what the number (e.g., -0.41) indicates.

      Those numbers are selection indices (SIs), as specified in the legend.

    1. eLife Assessment

      This valuable study introduces a data augmentation approach based on generative unsupervised models to address data imbalance in immune receptor modeling. Support for the findings is solid, showing that the use of generated data increases the performance of downstream supervised prediction tasks, e.g., TCR-peptide interaction prediction. However, the validation, mainly relying on synthetic data, could be completed, especially regarding unseen epitopes, and given the exclusive focus on CDR3β. The results should be of interest to the communities working on immunology and biological sequence data analysis.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript presents a deep learning framework for predicting T cell receptor (TCR) binding to antigens (peptide-MHC) using a combination of data augmentation techniques to address class imbalance in experimental datasets, and introduces both peptide-specific and pan-specific models for TCR-MHC-I binding prediction. The authors leverage a large, curated dataset of experimentally validated TCR-MHC-I pairs and apply a data augmentation strategy based on generative modeling to generate new TCR sequences. The approach is evaluated on benchmark datasets, and the resulting models demonstrate improved accuracy and robustness.

      Strengths:

      The most significant contribution of the manuscript lies in its data augmentation approach to mitigate class imbalance, particularly for rare but immunologically relevant epitope classes. The authors employ a generative strategy based on two deep learning architectures:

      (1) a Restricted Boltzmann Machine (RBM) and

      (2) a BERT-based language model, which is used to generate new CDR3B sequences of TCRs that are used as synthetic training data for creating a class balance of TCR-pMHC binding pairs.

      The distinction between peptide-specific (HLA allele-specific) and pan-specific (generalized across HLA alleles) models is well-motivated and addresses a key challenge in immunogenomics: balancing specificity and generalizability. The peptide-specific models show strong performance on known HLA alleles, which is expected, but the pan-specific model's ability to generalize across diverse HLA types, especially those not represented in training, is critical.

      Weaknesses:

      The paper would benefit from a more rigorous analysis of the biological validity of the augmented data. Specifically, how do the synthetic CDR3B sequences compare to real CDR3B in terms of sequence similarity, motif conservation? The authors should provide a quantitative assessment (via t-SNE or UMAP projections) of real vs. augmented sequences, or by measuring the overlap in known motif positions, before and after augmentation. Without such validation, the risk of introducing "hallucinated" sequences that distort model learning remains a concern. Moreover, it would strengthen the argument if the authors demonstrated that performance gains are not merely due to overfitting on synthetic data, but reflect genuine generalization to unseen real data. Ultimately, this can only be performed through elaborate experimental wet-lab validation experiments, which may be outside the scope of this study.

      While generative modeling for sequence data is increasingly common, the choice of RBM, which is a relatively older architecture, could benefit from stronger justification, especially given the emergence of more powerful and scalable alternatives (e.g., ProGen, ESM, or diffusion-based models). While BERT was used, it will be valuable in the future to explore other architectures for data augmentation.

      The manuscript would be more compelling if the authors performed a deeper analysis of the pan-specific model's behavior across HLA supertypes and allele groups. Are the learned representations truly "pan" or merely a weighted average of the most common alleles? The authors should assess whether the pan-specific model learns shared binding motifs (anchor residue preferences) and whether these features are interpretable through attention maps. A failure to identify such patterns would raise concerns about the model's interpretability and biological relevance.

      The exclusive focus on CDR3β for TCR modeling is biologically problematic. TCRs are heterodimers composed of α and β chains, and both CDR1, CDR2, and CDR3 regions of both chains contribute to antigen recognition. The CDR3β loop is often more diverse and critical, but CDR3α and the CDR1/2 loops also play significant roles in binding affinity and specificity. By generating only CDR3B sequences and not modeling the full TCR αβ heterodimer, the authors risk introducing a systematic bias toward β-chain-dominated recognition, which will not reflect the full complexity of TCR-peptide-MHC interactions.

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents a thoughtful and well-motivated strategy to address a major challenge in TCR-epitope binding prediction: data imbalance, particularly the scarcity of positive (binding) TCR, peptide pairs. The authors introduce a two-step pipeline combining data balancing, via undersampling and generative augmentation, and a supervised CNN-based classifier. Notably, the use of Restricted Boltzmann Machines (RBMs) and BERT-style transformer models to generate synthetic CDR3β sequences is shown to improve model performance. The proposed method is applied to both peptide-specific and pan-specific settings, yielding notable performance improvements, especially for in-distribution peptides. Generative augmentation also leads to measurable gains for out-of-distribution epitopes, particularly those with high sequence similarity to the training set.

      Strengths:

      (1) The authors tackle the well-known but under-addressed issue of class imbalance in TCR-epitope binding data, where negatives vastly outnumber positive (binding) pairs. This imbalance undermines classifier reliability and generalization.

      (2) The model is tested on both in-distribution (seen epitopes) and out-of-distribution (unseen epitopes) scenarios. Including a synthetic lattice protein benchmark allows the authors to dissect generalization behavior in a controlled environment.

      (3) The paper shows a measurable benefit of generative. For example, AUC improvements of up to +0.11 are observed for peptides closely related to those seen during training, demonstrating the method's practical impact.

      (4) A direct comparison between RBM- and Transformer-based sequence generators adds value, offering the community guidance on trade-offs between different generative architectures in TCR modeling applications.

      Weaknesses:

      (1) Generalization degrades with epitope dissimilarity

      The performance drops substantially as the test epitope becomes more dissimilar to the training set. This is expected, but it highlights an essential limitation of the generative models: they help only when the test epitope is similar to one already seen. Table 1 shows that the performance gain from generative augmentation decreases as the test epitope becomes more dissimilar to the training epitopes. For epitopes with a Levenshtein distance of 1 from the training set, the average AUC improvement is approximately +0.11. This gain drops to around +0.06 for epitopes at distance 2. It becomes minimal for those at distance 4, indicating an explicit limitation in the model's ability to generalize to more distant epitopes. The authors should quantify more explicitly how far the model can generalize effectively. What is the performance degradation threshold as a function of Levenshtein distance?

      (2) What is the minimal number of positive samples needed for data augmentation to help?

      The approach has an intrinsic catch-22: generative models require data to learn the underlying distribution and cannot be applied to epitopes with insufficient data. As a result, the method is unlikely to be effective for completely new epitopes. Could the authors quantify the minimum number of real binders needed for effective generative augmentation? This would be particularly relevant for zero-shot or few-shot prediction scenarios, where only 0-10 positive samples are available. Such experiments would help clarify the practical limits of the proposed strategy.

      (3) Lack of end-to-end evaluation on unseen epitopes as inputs

      The authors frame peptide-specific models as classification over a few known epitopes, a closed-set formulation. While this is useful for evaluating generation effects, it's not representative of the more practical open-set task of predicting binding to truly novel epitopes. A stronger test would include models that take peptides as input (e.g., pan-specific, peptide-conditioned classifiers), including unseen epitopes at test time. Could the authors attempt an evaluation on benchmarks like IMMREP25 or other datasets where test epitopes are excluded from training?

      (4) Focus on β-chain limits generalizability

      The current pipeline is trained exclusively on CDR3β sequences. However, the field is increasingly moving toward single-cell sequencing, which provides paired α/β TCR chain data. Understanding how the proposed approach performs when both chains are available would be valuable. Could the authors evaluate the performance gains on paired α/β information, even in a small subset of single-cell data?

      (5) Synthetic lattice proteins (LPs) have limited biological fidelity

      While the LP-based benchmark presented in Figure 5 is a clever and controlled tool for probing model generalization, it remains conceptually and biophysically distant from real TCR-peptide interactions. Its utility as a toy model is valid, but its limitations should be more explicitly acknowledged:

      a) Over-simplified binding landscape: The LP system is designed for tractability, with a simplified sequence-structure mapping and fixed lattice constraints. As shown in Figure 5c, the LP binding landscape is linearly separable, in stark contrast to the complex and often degenerate nature of real TCR-epitope interactions, where multiple structurally distinct TCRs can bind the same peptide and vice versa.

      b) Absence of immunological context: The LP model abstracts away key biological factors such as MHC restriction, α/β chain pairing, peptide presentation, and structural constraints of the TCR-pMHC complex. These are essential for understanding binding specificity in actual immune repertoires.

      c) Overestimation of generalization: While performance drops on more distant LP structures, even these are structurally and statistically more similar to the training data than truly novel biological epitopes. Thus, the LP benchmark likely underestimates the true difficulty of out-of-distribution generalization in real-world TCR prediction tasks.

      d) Simplified biophysics: The LP simulations rely on coarse-grained energy models and empirical potentials that do not capture conformational dynamics, side-chain flexibility, or realistic binding energetics of TCR-peptide interfaces.

      In summary, while the LP benchmark helps isolate specific generalization behaviors and for sanity-checking model performance under controlled perturbations, its biological relevance is limited. The authors should explicitly frame these assumptions and limitations to prevent overinterpreting results from this synthetic system.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a method to address class imbalance in T cell receptor (TCR)-epitope binding datasets by generating synthetic positive binding examples using generative models, specifically BERT-based architectures and Restricted Boltzmann Machines (RBMs). They hypothesize that improving class balance can enhance model performance in predicting TCR-peptide binding.

      Strengths:

      (1) Interesting biological as well as technical topic.

      (2) Solid technical foundations.

      Weaknesses:

      (1) Fundamental Biological Oversight:

      While the computational strategy of augmenting positive samples via generative models is technically interesting, the manuscript falls short in addressing key biological considerations. Specifically, the authors simulate and evaluate only CDR3β-peptide binding interactions. However, antigen recognition by T cells involves both the α- and β-chains of the TCR. The omission of CDR3α undermines the biological realism and limits the generalizability of the findings.

      (2) Validation of Simulated Data:

      The central claim of the manuscript is that simulated positive examples improve predictive performance. However, there is no rigorous validation of the biological plausibility or realism of the generated TCR sequences. Without independent evaluation (e.g., testing whether synthetic TCR-peptide pairs are truly binding), it remains unclear whether the performance gains are biologically meaningful or merely reflect artifacts of the generation process.

      (3) Risk of Bias and Overfitting:

      Training and evaluating models with generated data introduces a risk of circularity and bias. The observed improvements may not reflect better generalization to real-world TCR-epitope interactions but could instead arise from overfitting to synthetic patterns. Additional testing on independent, biologically validated datasets would help clarify this point.

    5. Author response:

      We would like to thank editors and reviewers for their time spent on our work, fair assessments and constructive criticism. We plan to address their concerns in the future revision as follows, detailed by topic.

      (1) Limitations of focusing on CDR3β only

      In its current state, our work tested the proposed pipeline of data augmentation for binding prediction on benchmark datasets limited to peptide+CDR3β sequence pairs only. As pointed out by all the reviewers, the TCR-peptide interaction is more complex and involves also other regions of the receptor (such as the CDR3α chain) and the MHC presenting the peptide as well. To investigate how the inclusion of additional information impacts results, we plan to apply our pipeline in a setting where the generative protocol is extended to generate paired α and β. The supervised classifier will then receive a concatenation of α+β chains as inputs. We will compare the performance of this classifier with the one using β chains only, and add this analysis to the revised manuscript.

      (1) Validation of generated sequences and interpretation of the features learned by the generative model

      The reliability of the generative model in augmenting the training set with biologically sensible sequences is a crucial assumption of our approach, and we agree with the reviewers raising this as a main concern. Before stating our strategy to improve the soundness of the method, let us first point out a few aspects already considered in the present manuscript:

      • The test set of the classifier is always composed of real sequences: in this way, an increase in performance due to data augmentation cannot be due to overfitting to synthetic, possibly unrealistic, sequences.

      • The generative protocol is initialized from real sequences, and used to generate sequences not too far from them. In this respect, it could be taken as a way to “regularize” the simplest strategy of data augmentation, random oversampling (taking multiple copies of sequences at random to rebalance the data). This procedure avoids generating “wildly hallucinated” sequences with unreliable models. We will better quantify this statement (see below).

      • The training protocol is tailored to push the generative model towards learning binding features between peptide and CDR3β sequences (and not merely fitting their local statistics separately). For example, in the pan-specific setting, during training of the generative model on peptide+CDR3β sequences, the masked language modeling task is modified to force the model to recover the missing amino acid using only the other sequence context.

      We will better stress these points in the revised manuscript. To further validate the generative protocol in the future revision, we will carry out additional sanity checks on the generated data to confirm that the synthetic sequences remain biologically plausible and comparable to real ones.

      (1) Assessment of the performance of the pan-specific protocol for out-of-distribution data:

      To better clarify how the degradation in performance of a classifier tested on out-of-distribution data is impacted by the dissimilarity between test and training data distribution, we will improve the synthetic analysis currently reported in Table 1, adding confidence intervals for accuracy, quantifying thresholds on the distance for the method to work, providing t-SNE embeddings of in- and out-of distribution data.

      (2) Quantification of the threshold for the number of examples per class in order to train the generative model and obtain a performance increase

      In the paper, we adopted an operative common-sense threshold of at least 100 sequences per class in order to apply our data augmentation pipeline. We will quantify this effect testing this threshold in the revised manuscript, in order to (i) emphasize the limits of this two-step generative protocol in the low-data regime and to (ii) assess if the generative model falls back to a random oversampling strategy (due to strong overfitting) when few data are available for training.

      (3) Motivation for the use of RBMs:

      While RBMs have known limitations, their use in our pipeline (together with the more modern TCR-BERT, that we also test) is mainly motivated by the fact that they provide measurable increases in performance with data augmentation despite their simple 2-layer architecture. We stress that simpler generative (profile) models are unable to show this increase, see Appendix 3. In this respect, the RBM provides a minimal generative model allowing us to augment data successfully, and a lower bound to the increase of performance with respect to more complex architectures trained on more data. We will report this point of view in the text.

      (4) Clarification on the role of lattice proteins as an oversimplified toy model for protein interaction

      We agree with the points raised by Reviewer #2 on the limitations of lattice proteins as a model for protein interaction. Indeed, we used it merely as a toy model for phenomenology, a strategy whose validity has been fairly acknowledged by the reviewer. We will report in the main text all the drastic simplifications and reasons why the reader should take the comparison to real data with great care.

    1. eLife Assessment

      This valuable study combines microscopy and CRISPR screening in two different cell lines to identify factors involved in global chromatin organization, using centromere clustering as a proxy. Follow-up cell cycle synchronisation studies confirm roles in centromere clustering in mitosis. However, incomplete characterisation of the cell lines used limits the interpretation of the findings. The study will interest researchers studying genome organisation in mitosis.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guin and colleagues establish a microscopy-based CRISPR screen to find new factors involved in global chromatin organization. As a proxy of global chromatin organization, they use centromere clustering in two different cell lines. They find 52 genes whose CRISPR depletion leads to centromere clustering defects in both cell lines. Using cell cycle synchronisation, they demonstrate that centromeres-redistribution upon depletion of these hits necessitates cell cycle progression through mitosis.

      Strengths:

      This manuscript explores the mechanisms of global chromatin organization, which is a scale of chromatin organization that remains poorly understood. The imaging-based CRISPR screen is very elegant, and the use of appropriate positive and negative controls reinforces the solidity of the findings.

      Weaknesses:

      Although the data are generally solid and well interpreted, a control showing that protein depletion works properly in cell-cycle arrested cells is lacking, both when using siRNAs and degron-based depletion.

    3. Reviewer #2 (Public review):

      The authors begin by highlighting the importance of genome organisation in cellular compartmentalisation and identity. They focus their study on centromeres - key chromosomal features required for segregation-and aim to identify proteins responsible for their spatial distribution in interphase nuclei. However, none of the experimental data addresses broader aspects of genome architecture, such as individual chromosome territories or A/B compartments. As such, the title of the article may be misleading and would benefit from being more specific, for example: "Identification of factors influencing centromere positioning in interphase."

      Strengths:

      One of the strengths of the paper is the comprehensive CRISPR-based screening and the comparative analysis between two distinct cell lines.

      Including further investigation into factors that behave differently across these cell lines - particularly in relation to expression levels or the unique "inverted architecture" of RPE cells-would have added valuable depth.

      Weaknesses:

      The filtering strategy used in the screen imposes significant constraints, as it selects only for non-essential or functionally redundant genes. This is a critical point, as key regulators of chromatin organisation - such as components of the condensin and cohesin complexes-are typically essential for viability. Similarly, known effectors of centromere behaviour (e.g., work by the Fachinetti's lab) often lead to aneuploidy, micronuclei formation, and cell cycle arrest in G1. The implication of this selection criterion should be clearly discussed, as it fundamentally shapes the interpretation of the study's findings.

      A major limitation of the study is the lack of connection between centromere clustering and its biological significance. It remains unclear whether this clustering is a meaningful proxy for higher-order genome organisation. Additionally, the study does not explore potential links to cell identity or transcriptional landscapes. Readers may struggle to grasp the broader relevance of the findings: if gene knockouts that alter centromere positioning do not affect cell viability or cell cycle progression, does this imply that centromere clustering - and by extension, interphase genome organisation - is not biologically significant?

      Another point requiring clarification is the conclusion that the four identified genes represent independent pathways regulating centromere clustering. In reality, all of these proteins localise to centromeres. For example, SPC24 and NUF2 are components of the NDC80 complex; Ki-67, a chromosome periphery protein, has been mapped to centromeres; and CAP-Hs, a subunit of the condensin II complex that during G1 promotes CENP-A deposition. Given their shared localisation, it would be informative to assess aneuploidy indices following depletion of each factor. Chromosome-specific probes could help determine whether centromere dysfunction leads to general mis-segregation or reflects distinct molecular mechanisms. Additionally, exploring whether Ki-67 mutants that affect its surfactant-like properties influence centromere clustering could provide a more mechanistic insight.

      Finally, the additive effects observed in double knockdowns do not necessarily confirm pathway independence. It is possible that mild mis-segregation effects are amplified when two proteins within the same pathway are depleted. This possibility should be considered in the interpretation of the data.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Guin et al. use a CRISPR KO screen of ~1000 candidates in two human cell lines, along with high-throughput image analysis, to demonstrate that orderly progression through mitosis shapes centromere organization. They identify ~50 genes that perturb centromere clustering when depleted in both RPE1 and HCT116 cells and validate many of these hits using RNAi. They then use auxin-mediated acute depletion of four factors (NCAPH2, KI67, SPC24, and NUF2) to demonstrate that their effects on centromere clustering require passage through mitosis. They further suggest that the lack of these factors during mitosis leads to the disorganization of centromeres on the mitotic spindle, and these effects persist in the subsequent interphase. Overall, the manuscript is clear, well-written, the experiments performed are appropriate, and the data are interpreted accurately. In my opinion, the main strength of this manuscript is the discovery of several hits associated with altered centromere organization. These hits will serve as a solid foundation for future work investigating genome organization in human cells. On the other hand, how the changes in centromere organization relate to other aspects of interphase genome architecture (A/B compartments, chromosome territories, etc) remains unclear and represents the main shortcoming of this manuscript.

      Comments:

      (1) Given the authors' suggestion that disorderly mitotic progression underlies the changes in centromere clustering in the subsequent interphase, I think it would be beneficial to showcase examples of disorderly mitosis in the AID samples and perhaps even quantify the misalignment on the metaphase plate.

      (2) I don't quite agree with the description that centromeres cluster into chromocenters (p4 para 2, p17 para 1, and other instances in the manuscript). To the best of my knowledge, chromocenters primarily consist of clustered pericentromeric heterochromatin, while the centromeres are studded on the chromocenter surface. This has been beautifully demonstrated in mouse cells (Guenatri et al., JCB, 2004), but it is true in other systems like flies and plants as well.

    1. eLife Assessment

      This important study reports that two distinct waves of ovarian follicles contribute to oocyte production in mice. The paper provides large amounts of data that will benefit future studies, although the methods and analysis are considered incomplete at present. Justification for the criteria of wave 1 follicles would benefit from further explanation and discussion. This work will be of interest to ovarian biologists and physicians working on female infertility.

    2. Reviewer #1 (Public review):

      Multiple waves of follicles have been proven to exist in multiple species, and different waves of follicles contribute differently to oogenesis and fertility. This work characterizes the wave 1 follicles in mouse comprehensively and compares different waves of follicles regarding their cellular and molecular features. Elegant mouse genetics methods are applied to provide lineage tracing of the wave 1 folliculogenesis, together with sophisticated microscopic imaging and analyses. Single-cell RNA-seq is further applied to profile the molecular features of cells in mouse ovaries from week 2 until week 6. While extensive details about the wave 1 follicles, especially the atresia process, are provided, the authors also identified another group of follicles located in the medullary-cortical boundary, which could also be labeled by the FoxL2-mediated lineage tracing method. The "boundary" or "wave 1.5" follicles are proposed by the authors to be the earliest wave 2 follicles, which contribute to the early fertility of puberty mice, instead of the wave 1 follicles, which undergo atresia with very few oocytes generated. The wave 1 follicle atresia, which degrades most oocytes, on the other hand, expands the number of theca cells and generates the interstitial gland cells in the medulla, where the wave 1 follicles are located. These gland cells likely contribute to the generation of androgen and estrogen, which shape oogenesis and animal development. By comparing scRNA-seq data from cells collected from week 2 until week 6 ovaries, the author profiled the changes in numbers of different cells and identified key genes that differ between wave 1 and wave 2 follicles, which could potentially be another driver of different waves of folliculogenesis. In summary, the authors provide a high amount of new results with good quality that illustrate the molecular and cellular features of different waves of mouse follicles, which could be further reused by other researchers in related fields. The findings related to the boundary follicles could potentially bring many new findings related to oogenesis.

      This paper is overall well-written with solid and intriguing conclusions that are well supported. The reviewer only has some minor comments for the authors' consideration that could potentially help with the readability of the paper.

      (1) The authors identify the wave 1.5 follicles at the medullary-cortex boundary, which begin to develop shortly after 2 weeks. Since the authors already collected scRNA-seq data from week 2 until week 6, could any special gene expression patterns be identified that make wave 1.5 follicle cells different from wave 1 and wave 2?

      (2) Are Figures 1C and 1E Z projections from multiple IF slices? If so, please provide representative IF slice(s) from Figures 1C and 1E and clearly label wave 1 and wave 2 follicles to better illustrate how the wave 1 follicles are clarified and quantified.

      (3) For Figure 3D, please also provide an image showing the whole ovary section, like in Figures 3A and 3C, to better illustrate the localization and abundance of different cells.

      (4) In Figure 4H, expressions of HSD3B1 and PLIN1 seem to be detected in almost all medulla cells. Does this mean all medulla cells gain gland cell features? Or there is only a subset of the medulla cells that are actively expressing these 2 proteins. Please provide image(s) with higher magnification to show more clearly how the expression of these 2 proteins differs among different cells.

      (5) Figure 5: The authors discussed cell number changes for different types of cells from week 2 to week 6. A table, or some plots, visualizing numbers of different cell types, instead of just providing original clusters in Dataset S6, at different time points, would make the changes easier to observe.

      (6) Figure S7: It would be more helpful to directly show the number of wave 1 follicles.

      (7) Did the fluorescence cryosection staining (Line 587 - 595) use the same buffers as in the whole-mount staining (Line 575 - 586)? Please clarify.

      (8) In Line 618, what tissue samples were collected? Please point out clearly.

    3. Reviewer #2 (Public review):

      Summary:

      This study explores an important question concerning the developmental trajectory of wave 1 ovarian follicles, leveraging valuable tools such as lineage tracing and single-cell RNA sequencing. These approaches position the authors well to dissect early follicle dynamics. The study would benefit from more in-depth analysis, including quantification using the lineage-traced ovaries, and comparison of wave 1 and 2 follicular cells per stage within the single cell dataset.

      Strengths:

      This study aims to address an important question regarding the developmental trajectories of wave 1 ovarian follicles and how they differ from wave 2 follicles that contribute to long-term fertility. This is an important topic, as many studies on ovarian follicle development rely on samples collected at perinatal timepoints in the mouse, which primarily represent wave 1 follicles, to infer later fertility. The research group has the tools and expertise necessary to tackle these questions.

      Weaknesses:

      Wave 1 follicles are quantified based on the criteria of oocytes larger than 20 µm located within the medullary region, using whole-mount staining. However, the boundary between the medulla and cortex appears somewhat arbitrary. Quantification using FOXL2-lineage-traced ovaries provides a more reliable method for identifying wave 1 follicles. As the developmental trajectory of wave 1 follicles has been well described in Zhang et al. 2013, it would be valuable to provide a more detailed quantification of both labeled and unlabeled follicles by specific follicle stages. In fact, in Zhang et al. 2013, the authors demonstrated that lineage-labeled primordial follicles can be found at the cortex-medulla boundary, suggesting that the observation of labeled "border follicles" is not unexpected. Quantification by follicle stage would provide greater insight into the timing and development of these follicles.

      Similarly, the analysis of wave 1 follicle loss should be performed on lineage-traced ovaries using cell death markers to demonstrate the loss of oocytes and granulosa cells, while confirming the preservation of theca and interstitial cells. In particular, granulosa cell loss should be assessed directly with cell death markers in lineage-traced ovaries, rather than from the loss of tamoxifen-labeled cells, as labeling efficiency varies between follicles (Figure 2G).

      Single-cell RNA sequencing presents a valuable dataset capturing the development of first-wave follicles. The use of a 40µm cell strainer during cell collection for the 10x platform may explain the exclusion of larger oocytes. However, it is still surprising that no oocytes were captured at all. The central question, how wave 1 follicular cells differ from wave 2 cells, should be investigated in more depth, with results validated on FOXL2-lineage-traced ovaries (i.e., Wnt4 staining in wave 1 antral follicles versus wave 2 using lineage-traced ovaries). This analysis should span all stages of follicle development. It also appears to be a missed opportunity that the single-cell sequencing analysis was not performed on lineage-traced ovaries, which would have enabled more definitive identification of wave 1-derived cells.

      Finally, this study does not directly assess fertility outcomes and should therefore refrain from drawing conclusions about the fertility potential of wave 1 follicles.

    4. Author response:

      The eLife assessment states that our manuscript is important only as a source of data for others to use in the future. Our methods and analysis of wave 1 follicles were said to be "incomplete" because one of two reviewers claimed we did not prove that 80% of wave 1 oocytes turn over by 5 wk.

      We believe that this assessment is simply wrong because critical supporting data already present in the existing manuscript was not understood by one reviewer. Wave 1 follicular oocyte turnover was said to be unproven and to remain uncertain because evidence of death was based only on a lack of Ddx4 staining. New experiments documenting expression of cell death markers, were said to be needed to show the oocytes died. However, our work was not based on the analysis of sectioned material, but used whole mount 3D reconstruction microscopy of cleared ovary preparations. Oocyte death was determined by the absence of an oocyte in fully reconstructed follicles and its replacement with an empty cavity, not just the absence of antibody staining. We included images and complete 3D reconstruction movies documenting these methods. The paper also documents that the holes frequently still contained zona pellucida remnants indicating the former presence of an oocyte. Moreover, we observed many intermediates of oocyte death- shrunken and deformed oocytes- and deformations of follicle structures due to the presence of the empty cavities. Controls showed that Ddx4 staining in the context of 3D imaging always revealed an obvious giant labeled oocyte in 100% of wave 1 follicles prior to death, and in wave 1.5 and wave 2 follicles at all stages. Thus, our methodology is already fully reliable. The reviewer is correct that the entire program of wave 1 development including their programmed turnover would be interesting to explore further. We already provided a large amount of new gene expression information, and documented the first examples of wave 1-specific gene expression. Further studies are not needed for the major conclusions of the paper and can wait for a follow up study.

      Secondly, the existence of wave 1.5 is not "speculative," as stated by the reviewing editor. We extensively validated and quantified the existence of wave 1.5 primordial follicles following Foxl2-cre activation at E16.5, and analysis at 2 wks in multiple experiments. Additionally, we showed wave 1.5 follicles were present at the medullar/cortex border at 2 wks even after activation of Foxl2-cre at E14.5. Our paper also connected for the first time wave 1.5 follicles to a population of non-growing, "poised" primordial follicles at this identical location near the medulla/cortex boundary by Meinsohn et al. in 2021. These follicles had not started to develop yet, and their ultimate fate was not known. We followed the development of these follicles and determined several differences in wave 1.5 follicle gene expression compared to wave 1. As noted in the assessment, our findings on wave 1.5 are now already being extended to other systems such as primate ovaries (adopting our name "wave 1.5" from our bioRxiv manuscript). The simultaneous claims that our discovery of wave 1.5 exists is speculative, and also that other people are finding wave 1.5 follicles in the species they are studying are logically incompatible.

      Response to reviewer 2:

      Line 239-245: Please note that Zhang et al. 2013 also show that lineage-labeled primordial follicles can be found at the cortex-medulla boundary (see their Figure 1B).

      The single image in the Zheng et al. 2014 paper may or may not show mosaic primordial follicles, but it would not be surprising since the experiment was identical to experiments in the paper. However, that single picture is only meaningful in the context of our subsequent work reported in the current manuscript. There was no mention of these follicles in the text of Zheng et al. 2014, no documentation or quantitation of their numbers, and no discussion or understanding of their significance. The incorrect conclusions of the paper were that wave 1 follicles- meaning rapidly developing follicles in the medulla- give rise to most early offspring. This conclusion reversed the previously accepted (and essentially correct) view that wave 1 follicles did not contribute significantly to fertility.

      "Finally, this study does not directly assess fertility outcomes and should therefore refrain from drawing conclusions about the fertility potential of wave 1 follicles." 

      We showed by lineage marking that only about 25 of about 200 wave 1 follicles survive even to wk 5. This clearly does prove our conclusion that the great majority of wave 1 follicles do not contribute to fertility.

    1. eLife Assessment

      This important study reports that higher genetically predicted BMI is associated with a modestly increased risk of head and neck cancer. The convincing evidence is supported by rigorous Mendelian Randomization approaches, using multiple genetic instruments and models that reduce sensitivity to pleiotropy. However, results from pleiotropy-robust analyses were less consistent, which limits the strength of causal inference. The work will be of interest to researchers studying cancer risk factors and genetic epidemiology.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have conducted the largest to date Mendelian Randomization (MR) analysis of the association between genetically predicted measures of adiposity and risk of head and neck cancer (HNC) overall and by subsites within HNC. MR uses genetic predictors of an exposure, such as gene variants associated with high BMI or tobacco use, rather than data from individual physical exams or questionnaires and if it can be done in its idealized state, there should be no problems with confounding. Traditional epidemiologic studies have reported a variety of associations between BMI (and a few other measures of adiposity) and risk of HNC that typically differs by the smoking status of the subjects. Those findings are controversial given the complex relationship between tobacco and both BMI and HNC risk. Tobacco smokers are often thinner than no-smokers so this could create an artificial ('confounded') association that may not be fully adjusted away in risk models. The findings of a BMI-HNC association are often attributed to residual confounding and this seems ripe for an MR approach if suitable genetic instrumental variables can be created. Here the authors built a variety of genetic instrumental variables for BMI and other measures of adiposity as well as two instrumental variables for smoking habits and then tested their hypotheses in a large case-controls set of HNC and controls with genetic data.

      The authors found that the genetic model for BMI was associated with HNC risk in simple models, but this association disappeared when using models that better accounted for pleiotropy, the condition when genetic variants are associated with more than one trait such as both BMI and tobacco use. When they used both adiposity and tobacco use genetic instruments in a single model, there was a strong association with genetically predicted tobacco use (as is expected) but there was no remaining association with genetic predictors of adiposity. They conclude that high BMI/adiposity is not a risk factor for HNC.

      Strengths:

      The primary strength was the expansive use of a variety of different genetic instruments for BMI/adiposity/body size along with employing a variety of MR model types, several of which are known to be less sensitive to pleiotropy. They also used the largest case-control sample size to date.

      Weaknesses:

      The lack of pleiotropy is an unconfirmable assumption of MR and the addition of those models is therefore quite important as this is a primary weakness of the MR approach. Given that concern, I read the sensitivity analyses using pleiotropy-robust models as the main result and in that case, they are more limited in their ability to test their hypothesis as these models do not show a robust BMI instrumental variable association.

      Comments on the revised manuscript:

      After the first round of review, the authors have improved the manuscript by (1) adding the requested power calculations and adding text to help the reader integrate that additional information; (2) adding the main effects for the tobacco instruments; (3) updating the comparison of their results to the prior literature; (4) and some other edits to the text. They have declined to include the smoking stratified estimates and provide a rationale for this decision that references the potential for collider bias. While true that yet another bias might be introduced, that gets added to the list and the careful reader would know that. Many important questions in cancer etiology can only be addressed via observational approaches and each observational approach has the potential for a long list of biases. The best inference comes from integrating the totality of the data and realizing that most conclusions are subject to updating as we conduct more work and learn more.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Weaknesses:

      The lack of pleiotropy is an unconfirmable assumption of MR, and the addition of those models is therefore quite important, as this is a primary weakness of the MR approach. Given that concern, I read the sensitivity analyses using pleiotropy-robust models as the main result, and in that case, they can't test their hypotheses as these models do not show a BMI instrumental variable association. The other weakness, which might be remedied, is that the power of the tests here is not described. When a hypothesis is tested with an under-powered model, the apparent lack of association could be due to inadequate sample size rather than a true null. Typically, when a statistically significant association is reported, power concerns are discounted as long as the study is not so small as to create spurious findings. That is the case with their primary BMI instrumental variable model - they find an association so we can presume it was adequately powered. But the primary models they share are not the pleiotropy-robust methods MR-Egger, weighted median, and weighted mode. The tests for these models are null, and that could mean a couple of things: (1) the original primary significant association between the BMI genetic instrument was due to pleiotropy, and they therefore don't have a robust model to explore the effects of the tobacco genetic instrument. (2) The power for the sensitivity analysis models (the pleiotropy-robust methods) is inadequate, and the authors share no discussion about the relative power of the different MR approaches. If they do have adequate power, then again, there is no need to explore the tobacco instrument.

      Reviewing Editor Comments:

      We suggest that the authors add power estimates to assess whether the sample size is sufficient, given the strength and variability of the genetic instruments. It would also be helpful to present effect estimates for the tobacco instruments alone, to clarify their independent contribution and improve the interpretation of the joint models. In addition, the role of pleiotropy should be addressed more clearly, including which model is considered primary. Stratified analyses by smoking status are encouraged, as prior studies indicate that BMI-HNC associations may differ between smokers and non-smokers. Finally, the comparison with previous studies should be revised, as most reported null findings without accounting for tobacco instruments. If this study finds an association, it should not be framed as a replication

      We would like to highlight that post-hoc power calculations are often considered redundant since the statistical power estimated for an observed association is directly related to its p-value[1]. In other words, the uncertainty of the association is already reflected in its 95% confidence interval. However, we understand power calculations may still be of interest to the reader, so we have incorporated them in the revised manuscript. We have edited the text as follows (lines 151-155):“Consequently, we used the total R<sup>2</sup> values to examine the statistical power in our study[42]. However, we acknowledge that the value of post-hoc power calculations is limited, since the statistical power estimated for an observed association is already reflected in the 95% confidence interval presented alongside the point estimate[43].” We have also added supplementary figures 1 and 2.

      We can see that when using the latest HEADSpAcE data we were able to detect BMI-HNC ORs as small as 1.16 with 80% power, while the GAME-ON dataset only permitted the detection of ORs as small as 1.26 using the same BMI instruments (Figure B). We have explained these figures in the results section as follows (lines 257-263): “Using the BMI genetic instruments (total R<sup>2</sup>= 4.8%) and an α of 0.05, we had 80% statistical power to detect an OR as small as 1.16 for HNC risk (Supplementary Figure 1). For WHR (total R<sup>2</sup>= 3.1%) and WC (total R<sup>2</sup>= 4.4%), we could detect odds ratios (ORs) as small as 1.20 and 1.17, respectively. This is an improvement in terms of statistical power compared to the GAME-ON analysis published by Gormley et al.[28], for which there was 80% power to detect an OR as small as 1.26 using the same BMI genetic instruments (Supplementary Figure 2).”

      The reason we use inverse variance weighted (IVW) Mendelian randomization (MR) to obtain our main results rather than the pleiotropy-robust methods mentioned by the reviewer/editors (i.e., MR-Egger, weighted median and weighted mode) is that the former has greater statistical power than the latter[2]. Hence, instead of focussing on the statistical significance of the pleiotropy-robust analyses, we consider it is of more value to compare the consistency of the effect sizes and direction of the effect estimates across methods. Any evidence of such consistency increases our confidence in our main findings, since each method relies on different assumptions. As we cannot be sure about the presence and nature of horizontal pleiotropy, it is useful to compare results across methods even though they are not equally powered. It is true that our results for the genetically predicted effects of body mass index (BMI) on the risk of head and neck cancer (HNC) differ across methods. This is precisely what led us to question the validity of our main finding (suggesting a positive effect of BMI on HNC risk). We have now clarified this in the methods section of the revised manuscript as advised. Lines 165-171:

      “Because the IVW method assumes all genetic variants are valid instruments[44], which is unlikely the case, three pleiotropy-robust two-sample MR methods (i.e., MR-Egger[45], weighted median[46] and weighted mode[47]) were used in sensitivity analyses. When the magnitude and direction of effect estimates are consistent across methods that rely on different assumptions, the main findings are more convincing. As we cannot be sure about the presence and nature of horizontal pleiotropy, it is useful to compare results across methods even if they are not equally powered.”

      We understand that the reviewer/editors are concerned that we do not have a robust model to explore the role of tobacco consumption in the link between BMI and HNC. However, we have a different perspective on the matter. If indeed, the main IVW finding for BMI and HNC is due to pleiotropy (since some of the pleiotropy-robust methods suggest conflicting results), then the IVW multivariable MR method is a way to explore the potential source of this bias[3]. We were particularly interested in exploring the role of smoking in the observed association because smoking and adiposity are known to influence each other [4-9] and share a genetic basis[10, 11].

      We agree that it would be useful to present the univariable MR effect estimates for smoking behaviour and HNC risk along those obtained using multivariable MR. We have now included the univariable MR estimates for both smoking behaviour variables as a note under Supplementary Table 11 and in the manuscript (lines 316-318): “In univariable IVW MR, both CSI and SI were linked to an increased risk of HNC (CSI OR=4.47 per 1-SD higher CSI, 95%CI 3.31–6.03, p<0.001; SI OR=2.07 per 1-SD higher SI 95%CI 1.60–2.68, p<0.001) (Additional File 2: note in Supplementary Table 11).”

      We understand the appeal of conducting stratified MR analyses by smoking status. However, we anticipate such analyses would hinder the interpretation of our findings as they can induce collider bias which could spuriously lead to different effect estimates across strata[12, 13].

      We thank the reviewer/editors for their comment regarding the way we frame of our findings. We have now edited the discussion section to highlight our study results are different to those obtained in studies that do not account for smoking behaviour. Lines 398-401: “With a much larger sample (N=31,523, including 12,264 cases), our IVW MR analysis suggested BMI may play a role in HNC risk, in contrast to previous studies. However, our sensitivity analyses implied that causality was uncertain.”

      Reviewer #1 (Recommendations for the authors):

      The authors do share a table of the percent variance explained of the different genetic instruments, which vary widely, and that table is very welcome because we can get some sense of their utility. The problem is that they don't translate that into a power estimate for the case-control study size that they use. They say that it is the biggest to date, which is good, but without some formal power estimate, it is not particularly reassuring. A framework for MR study power estimates was reported in PMID: 19174578, but that was using very simple MR constructs in use in 2009, and it isn't clear to me if that framework can be used here. That power paper suggests that weak genetic instruments need very large sample sizes, far larger than what is used in the current manuscript. I am unable to estimate the true strength of the instruments used here, and so I am unsure of whether power is an issue or not.

      We have now included power calculations in our manuscript to address the reviewer’s concerns. Nevertheless, as mentioned above, post-hoc power calculations are of limited value, as statistical power is already reflected in the uncertainty around the point estimates (the 95% confidence intervals). Hence, it is important to avoid drawing conclusions regarding the likelihood of true effects or false negatives based on these calculations.

      Although the hypothesis here is that smoking accounts for the apparent BMI association previously reported for HNC, it would have been preferable to see the estimates for their 2 genetic instruments for tobacco alone. The current results only show the BMI instruments alone and then with the tobacco instruments. I would like to see what the risk estimates are for the tobacco instrument alone, so that I can judge for myself what happens in the joint models. As presented, one can only do that for the BMI instruments.

      We thank the reviewer for this comment. The univariable IVW MR estimate of smoking initiation was OR=2.07 (95%CI 1.60 to 2.68, p<0.001), while the one for comprehensive smoking index was OR=4.47 (95%CI 3.31 to 6.03, p<0.001). We have included this information in the manuscript as requested (please see response to reviewing editor above).

      On line 319, they write that "We did not find evidence against bias due to correlated pleiotropy..." I find this difficult to parse, but I think it means that they should believe that correlated pleiotropy remains a problem. So again, they seem to see their primary model as compromised, and so do I. This limitation is again stated by the authors on lines 351-352.

      We apologise if the wording of the sentence was not easy to understand. When using the CAUSE method, we did not find evidence to reject the null hypothesis that the sharing (correlated pleiotropy) model fits the data at least as well as the causal model. In other words, our CAUSE finding and the inconsistencies observed across our other sensitivity analyses led us to believe that our main IVW MR estimate for BMI-HNC was likely biased by correlated pleiotropy. We believe it is important to explore the source of this bias, which is why we used multivariable MR to investigate the direct effect of BMI on HNC risk while accounting for smoking behaviour.

      In the following paragraphs (lines 358-369), the authors state that their findings are consistent with prior reports, but that doesn't seem to be the case if we take their primary BMI instrument as representing the outcome of this manuscript. Here, they find an association between the BMI instrument and HNC risk, but in each of the other papers they present the primary finding was null without the extensive model changes or the aim of accounting for tobacco with another instrument. I don't see that as replication.

      This is a good point. We have now edited the discussion of our manuscript to avoid giving the impression that our findings replicate those from studies that do not account for smoking behaviour in their analyses. We have edited lines 384-401 as follows:

      “Previous MR studies suggest adiposity does not influence HNC risk[27-29]. Gormley et al.[28] did not find a genetically predicted effect of adiposity on combined oral and oropharyngeal cancer when investigating either BMI (OR=0.89 per 1-SD, 95% CI 0.72–1.09, p=0.26), WHR (OR=0.98 per 1-SD, 95% CI 0.74–1.29, p=0.88) or waist circumference (OR=0.73 per 1-SD, 95% CI 0.52–1.02, p=0.07) as risk factors. Similarly, a large two-sample MR study by Vithayathil et al.[29] including 367,561 UK Biobank participants (of which 1,983 were HNC cases) found no link between BMI and HNC risk (OR=0.98 per 1-SD higher BMI, 95% CI 0.93–1.02, p=0.35). Larsson et al.[27] meta-analysed Vithayathil et al.’s[29] findings with results obtained using FinnGen data to increase the sample size even further (N=586,353, including 2,109 cases), but still did not find a genetically predicted effect of BMI on HNC risk (OR=0.96 per 1-SD higher BMI, 95% CI 0.77–1.19, p=0.69). With a much larger sample (N=31,523, including 12,264 cases), our IVW MR analysis suggested BMI may play a role in HNC risk, in contrast to previous studies. However, our sensitivity analyses implied that causality was uncertain.”

      We also deleted part of a sentence in the discussion section, so lines 416-418 now look as follows: “An important strength of our study was that the HEADSpAcE consortium GWAS used had a large sample size which conferred more statistical power to detect effects of adiposity on HNC risk compared to previous MR analyses[27-29].”

      On lines 384-386 they note a strength is that this is the largest study to date, but I would reiterate that larger and more powerful does not equate to adequately powered.

      This is true. We have included power calculations in the manuscript as requested.

      It's well known that different HNC subsites have different etiologies, as they mention on lines 391-392, and it is implicit in their use of data on HPV positive and negative oropharyngeal cancer. They say that they did not find evidence for heterogeneity in this study, but that would only be true for the null BMI instrument. The effect sizes for their smoking instruments are strikingly different between the subsites.

      We agree and are sorry for the confusion we may have caused by the way we worded our findings. We have edited the text to clarify that the lack of subsite heterogeneity only applied to our results for BMI/WHC/WC-HNC risk. Lines 418-424 now read as follows:

      “Furthermore, the availability of data on more HNC subsites, including oropharyngeal cancers by HPV status, allowed us to investigate the relationship between adiposity and HNC risk in more detail than previous MR studies which limited their subsite analyses to oral cavity and overall oropharyngeal cancers[28, 68]. This is relevant because distinct HNC subsites are known to have different aetiologies[69], although we did not find evidence of heterogeneity across subsites in our analyses investigating the genetically predicted effects of BMI, WHR and WC on HNC risk.”

      Finally, the literature on mutational patterns gives us strong reason to believe that HNC caused by tobacco are biologically distinct from tumors not caused by tobacco. The authors report in the introduction that traditional observational studies of BMI and HNC have reported different findings in smokers versus never smokers, so I would assume there is a possibility that the BMI instrument could have different associations with tumors of the tobacco-induced phenotype and tumors with a non-tobacco induced phenotype. I would assume that authors have access to the data on self-reported tobacco use behavior, even if they can't separate these tumors by molecular types. Stratifying their analysis by tobacco users or not might reveal different results with the BMI instrument.

      We appreciate the reviewer’s comment. We agree that it would have been interesting to present stratified analyses by smoking status along our main findings. However, we decided against this because of the risk of inducing collider bias in our MR analyses i.e., where stratifying on smoking status may induce spurious associations between the adiposity instruments and confounding factors. Multivariable MR is considered a better way of investigating the direct effects of an exposure (adiposity) on an outcome (HNC) accounting for a third variable (smoking)[14], which is why we opted for this method instead.

      References:

      (1) Heinsberg LW, Weeks DE: Post hoc power is not informative. Genet Epidemiol 2022, 46(7):390-394.

      (2) Burgess S, Butterworth A, Thompson SG: Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol 2013, 37(7):658-665.

      (3) Burgess S, Davey Smith G, Davies NM, Dudbridge F, Gill D, Glymour MM, Hartwig FP, Kutalik Z, Holmes MV, Minelli C et al: Guidelines for performing Mendelian randomization investigations: update for summer 2023. Wellcome Open Res 2019, 4:186.

      (4) Morris RW, Taylor AE, Fluharty ME, Bjorngaard JH, Asvold BO, Elvestad Gabrielsen M, Campbell A, Marioni R, Kumari M, Korhonen T et al: Heavier smoking may lead to a relative increase in waist circumference: evidence for a causal relationship from a Mendelian randomisation meta-analysis. The CARTA consortium. BMJ Open 2015, 5(8):e008808.

      (5) Taylor AE, Morris RW, Fluharty ME, Bjorngaard JH, Asvold BO, Gabrielsen ME, Campbell A, Marioni R, Kumari M, Hallfors J et al: Stratification by smoking status reveals an association of CHRNA5-A3-B4 genotype with body mass index in never smokers. PLoS Genet 2014, 10(12):e1004799.

      (6) Taylor AE, Richmond RC, Palviainen T, Loukola A, Wootton RE, Kaprio J, Relton CL, Davey Smith G, Munafo MR: The effect of body mass index on smoking behaviour and nicotine metabolism: a Mendelian randomization study. Hum Mol Genet 2019, 28(8):1322-1330.

      (7) Asvold BO, Bjorngaard JH, Carslake D, Gabrielsen ME, Skorpen F, Smith GD, Romundstad PR: Causal associations of tobacco smoking with cardiovascular risk factors: a Mendelian randomization analysis of the HUNT Study in Norway. Int J Epidemiol 2014, 43(5):1458-1470.

      (8) Carreras-Torres R, Johansson M, Haycock PC, Relton CL, Davey Smith G, Brennan P, Martin RM: Role of obesity in smoking behaviour: Mendelian randomisation study in UK Biobank. BMJ 2018, 361:k1767.

      (9) Freathy RM, Kazeem GR, Morris RW, Johnson PC, Paternoster L, Ebrahim S, Hattersley AT, Hill A, Hingorani AD, Holst C et al: Genetic variation at CHRNA5-CHRNA3-CHRNB4 interacts with smoking status to influence body mass index. Int J Epidemiol 2011, 40(6):1617-1628.

      (10) Thorgeirsson TE, Gudbjartsson DF, Sulem P, Besenbacher S, Styrkarsdottir U, Thorleifsson G, Walters GB, Consortium TAG, Oxford GSKC, consortium E et al: A common biological basis of obesity and nicotine addiction. Transl Psychiatry 2013, 3(10):e308.

      (11) Wills AG, Hopfer C: Phenotypic and genetic relationship between BMI and cigarette smoking in a sample of UK adults. Addict Behav 2019, 89:98-103.

      (12) Coscia C, Gill D, Benitez R, Perez T, Malats N, Burgess S: Avoiding collider bias in Mendelian randomization when performing stratified analyses. Eur J Epidemiol 2022, 37(7):671-682.

      (13) Hamilton FW, Hughes DA, Lu T, Kutalik Z, Gkatzionis A, Tilling K, Hartwig FP, Davey Smith G: Non-linear Mendelian randomization: evaluation of effect modification in the residual and doubly-ranked methods with simulated and empirical examples. Eur J Epidemiol 2025.

      (14) Sanderson E, Davey Smith G, Windmeijer F, Bowden J: An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int J Epidemiol 2019, 48(3):713-727.

    1. eLife Assessment

      This important work examines how microexons contribute to brain activity, structure, and behavior. The authors find that loss of microexon sequences generally has subtle impacts on these metrics in larval zebrafish, with few exceptions. The evidence is solid, using modern high-throughput phenotyping methodology in zebrafish. Overall, this work will be of interest to neuroscientists and generate further studies of interest to the field.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use high-throughput gene editing technology in larval zebrafish to address whether microexons play important roles in the development and functional output of larval circuits. They find that individual microexon deletions rarely impact behavior, brain morphology, or activity, and raise the possibility that behavioral dysregulation occurs only with more global loss of microexon splicing regulation. Other possibilities exist: perhaps microexon splicing is more critical for later stages of brain development, perhaps microexon splicing is more critical in mammals, or perhaps the behavioral phenotypes observed when microexon splicing is lost are associated with loss of splicing in only a few genes.

      Strengths:

      - The authors provide a qualitative analysis of microexon inclusion during early zebrafish development

      - The authors provide comprehensive phenotyping of microexon mutants, addressing the role of individual microexons in the regulation of brain morphology, activity, and behavior.

    3. Reviewer #3 (Public review):

      Summary:

      This paper sought to understand how microexons influence early brain function. By selectively deleting a large number of conserved microexons and then phenotyping the mutants with a behavior and brain activity assays, the authors find that most microexons have minimal effects on the global brain activity and broad behaviors of the larval fish-- although a few do have phenotypes.

      Strengths:

      The work takes full advantage of the scale that is afforded in zebrafish, generating a large mutant collection that is missing microexons and systematically phenotyping them with high throughput behaviour and brain activity assays. The work lays an important foundation for future studies that seek to uncover the likely subtle roles that single microexons will play in shaping development and behavior.

      Weaknesses:

      Although the manuscript includes evidence for many mutants that microexon deletion has minimal effect on full length transcript levels, some of the microexon loss does alter transcript levels. Since the mutations usually yielded no phenotype, these effects on full-length transcripts are unlikely to be a major confound. For mircoexon mutants displaying phenotypes, future work will have to tease apart whether secondary effects on the transcripts are contributing to the phenotype.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors use high-throughput gene editing technology in larval zebrafish to address whether microexons play important roles in the development and functional output of larval circuits. They find that individual microexon deletions rarely impact behavior, brain morphology, or activity, and raise the possibility that behavioral dysregulation occurs only with more global loss of microexon splicing regulation. Other possibilities exist: perhaps microexon splicing is more critical for later stages of brain development, perhaps microexon splicing is more critical in mammals, or perhaps the behavioral phenotypes observed when microexon splicing is lost are associated with loss of splicing in only a few genes.

      A few questions remain:

      (1) What is the behavioral consequence for loss of srrm4 and/or loss-of-function mutations in other genes encoding microexon splicing machinery in zebrafish?

      It has been established that srrm4 mutants exhibit no overt morphological phenotypes and are not visually impaired (Ciampi et al., 2022). We are coordinating our publication with Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860), which shows that srrm4 mutants also have minimal behavioral phenotypes. In contrast, srrm3 mutants have severe vision loss, early mortality, and numerous neural and behavioral phenotypes (Ciampi et al., 2022; Lopez-Blanch et al., 2024). We now point out the phenotypes of srrm3/srrm4 mutants in the manuscript.

      We chose not to generate and characterize the behavior and brain activity of srrm3/srrm4 mutants for two reasons: 1) we were aware of two other labs in the zebrafish community that had generated srrm3 and/or srrm4 mutants (Ciampi et al., 2022 and Gupta et al., 2024, https://doi.org/10.1101/2024.11.29.626094; Lopez-Blanch et al., 2024, https://doi.org/10.1101/2024.10.23.619860), and 2) we were far more interested in determining the importance of individual microexons to protein function, rather than loss of the entire splicing program. Microexon inclusion can be controlled by different splicing regulators, such as srrm3 (Ciampi et al., 2022) and possibly other unknown factors. Genetic compensation in srrm4 mutants could also result in microexons still being included through actions of other splicing regulators, complicating the analysis of these regulators. We mention srrm4 in the manuscript to point out that some selected microexons are adjacent to regulatory elements expected of this pathway. We did not, however, choose microexons to mutate based on whether they were regulated by Srrm4, making the characterization of srrm3/srrm4 mutants disconnected from our overarching project goal.

      We have edited the Introduction as follows to clarify our goal: “Studies of splicing regulators such as srrm4 impact the entire splicing program, making it impossible to determine the importance of individual microexons to protein function. Further, microexons could still be differentially included in a regulatory mutant via compensation by other splicing factors ...”

      (2) What is the consequence of loss-of-function in microexon splicing genes on splicing of the genes studied (especially those for which phenotypes were observed).

      We are unclear whether “microexon splicing genes” refers to the splicing regulators srrm3/srrm4, which we choose not to study in this work (see response to point #1 above), or the genes that contain microexons. The severe visual phenotypes of srrm3 mutants confounds the study of microexon splicing in this line because altered splicing levels could be due to downstream changes in this significantly different developmental context. A detailed discussion of splicing consequences on removal of microexons from microexoncontaining genes is in the response to point #4 below.

      (3) For the microexons whose loss is associated with substantial behavioral, morphological, or activity changes, are the same changes observed in loss-of-function mutants for these genes?

      In the first version of the manuscript, we had included two explicit comparisons of microexon loss with a standard loss-of-function allele, one with a phenotype and one without, in Figure S1 (now Figures S3 and S4) of this manuscript. Beyond the two pairs we had included, Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860) described mild behavioral phenotypes for a microexon removal for kif1b, and we showed developmental abnormalities for the kif1b loss-of-function allele (now Figure S3). We have now added a predicted protein-truncating allele for ppp6r3. This new line has phenotypes that are similar but slightly stronger in brain activity and structure than the mutant that lacks only the microexon. The prior Figure S1 (now Figures S3 and S4) was only briefly mentioned in the first version of the manuscript, and we now clarify this point in the Results: “Protein-truncating mutations in eleven additional genes that contain microexons revealed developmental and neural phenotypes in zebrafish (Figure S3, Figure S4), indicating that the genes themselves are involved in biologically relevant pathways. Three of these genes– tenm4, sptan1, and ppp6r3 – are also in our microexon line collection.”

      Additionally, we can draw expected conclusions from the literature, as some genes with our microexon mutations have been studied as typical mutants in zebrafish or mice. We have modified our manuscript to include a discussion of both loss-of-function zebrafish and mouse mutants. See the response to below point #4.

      (4) Do "microexon mutations" presented here result in the precise loss of those microexons from the mRNA sequence? E.g. are there other impacts on mRNA sequence or abundance?

      We acknowledge that unexpected changes to the mRNA of the tested mutants could occur following microexon removal. In particular, all regulatory elements should be removed from the region surrounding the microexon, as any remaining elements could drive the inclusion of unexpected exons that result in premature stop codons.

      First, we have clarified our generated mutant alleles by adding a figure (Figure S1) that details the location of the gRNA cut sites in relation to the microexon, its predicted regulatory elements, and its neighboring exons.

      Second, we have experimentally determined whether the mRNA was modified as expected for a subset of mutants with phenotypes. In all eight tested lines (Figure S2), the microexon was precisely eliminated without causing any other effects on the sequence of the transcript in the neighboring region. We did, however, observe an effect on transcript abundance for one homozygous mutant (vav2). It is possible that complex forms of genetic regulation are occurring that are not induced by unexpected isoforms or premature stop codons. Interestingly, Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860) eliminated a different microexon in vav2 and also observed a subtle well center preference. If their allele from an entirely different intronic region also results in transcript downregulation, it would support the hypothesis of genetic compensation through atypical pathways. If not, it is likely this phenotype is due specifically to removal of the microexon protein sequence. Not all mutants with phenotypes could be assessed with qRT-PCR because some were no longer present in the lab. All lines were generated in a similar way, however, removing both the microexon and neighboring regulatory elements while avoiding the neighboring exons. Accordingly, we now also explicitly point out those where the clean loss of the microexon was confirmed (eif4g3b, ppp6r3, sptan1, vti1a, meaf6, nrxn1a, tenm3) and those with possibly interesting phenotypes that were not confirmed (ptprd-1, ptprd-2, rapgef2, dctn4, dop1a, mapk8ip3).

      Third, we have further emphasized in the manuscript that these observed phenotypes are extremely mild compared to those observed in over one hundred protein-truncating mutations we have assessed in previous (Thyme et al., 2019; Capps et al., 2024) and unpublished ongoing work. We showed data for one mutant, tcf7l2, which we consider to have moderately strong neural phenotypes, and we have extended this comparison in the revision (new Figure 3G). Additionally, loss-of-function alleles for some microexoncontaining genes have strong developmental phenotypes, as we showed in Figure S1 (now Figures S3 and S4) of this manuscript in addition to our published work (Thyme et al., 2019; Capps et al., 2024). It is known from the literature that the loss-of-function mutants for mapk8ip3 are stronger than we observed here (Tuttle., et al., 2019), suggesting that only the microexon is removed in our line. The microexons in Ptprd are also well-studied in mice, and we expect that only the microexon was removed in our lines. Both Dctn4 and Rapgef2 are completely lethal prior to weaning in mice (the International Mouse Phenotyping Consortium).

      (5) Microexons with a "canonical layout" (containing TGC / UC repeats) were selected based on the likelihood that they are regulated by srrm4. Are there other parallel pathways important for regulating the inclusion of microexons? Is it possible to speculate on whether they might be more important in zebrafish or in the case of early brain development?

      The microexons were not selected based on the likelihood that they were regulated by Srrm4. We have clarified the manuscript regarding this point. There are parallel pathways that can control the inclusion of microexons, such as Srrm3 (Ciampi et al., 2022). It is wellknown that loss of srrm3 has a stronger impact on zebrafish development than srrm4 (Ciampi et al., 2022). The goal of our work was not to investigate these splicing regulators but instead to determine the individual importance of these highly conserved protein changes.

      Strengths:

      (1) The authors provide a qualitative analysis of splicing plasticity for microexons during early zebrafish development.

      (2) The authors provide comprehensive phenotyping of microexon mutants, addressing the role of individual microexons in the regulation of brain morphology, activity, and behavior.

      We thank the reviewer for their support. The pErk brain activity mapping method is highly sensitive, significantly minimizing the likelihood that the field has simply not looked hard enough for a neural phenotype in these microexon mutants. In our published work (Thyme et al., 2019), we show that brain activity can be drastically impacted without manifesting in differences in those behaviors assessed in a typical larval screen (e.g., tcf4, cnnm2, and more).

      Weaknesses:

      (1) It is difficult to interpret the largely negative findings reported in this paper without knowing how the loss of srrm4 affects brain activity, morphology, and behavior in zebrafish.

      See response to point 1.

      (2) The authors do not present experiments directly testing the effects of their mutations on RNA splicing/abundance.

      See response to point 4.

      (3) A comparison between loss-of-function phenotypes and loss-of-microexon splicing phenotypes could help interpret the findings from positive hits.

      See response to points 3 and 4.

      Reviewer #2 (Public review):

      Summary:

      The manuscript from Calhoun et al. uses a well-established screening protocol to investigate the functions of microexons in zebrafish neurodevelopment. Microexons have gained prominence recently due to their enriched expression in neural tissues and misregulation in autism spectrum disease. However, screening of microexon functionality has thus far been limited in scope. The authors address this lack of knowledge by establishing zebrafish microexon CRISPR deletion lines for 45 microexons chosen in genes likely to play a role in CNS development. Using their high throughput protocol to test larval behaviour, brain activity, and brain structure, a modest group of 9 deletion lines was revealed to have neurodevelopmental functions, including 2 previously known to be functionally important.

      Strengths:

      (1) This work advances the state of knowledge in the microexon field and represents a starting point for future detailed investigations of the function of 7 microexons.

      (2) The phenotypic analysis using high-throughput approaches is sound and provides invaluable data.

      We thank the reviewer for their support.

      Weaknesses:

      (1) There is not enough information on the exact nature of the deletion for each microexon.

      To clarify the nature of our mutant alleles, we have added a figure (Figure S1) that details the location of the microexon in relation to its predicted neighboring exons, deletion boundaries, guide RNAs, and putative regulatory elements.

      (2) Only one deletion is phenotypically analysed, leaving space for the phenotype observed to be due to sequence modifications independent of the microexon itself.

      We have determined whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see point #4 responses to Reviewer 1 for details). Our findings for three microexon mutants (ap1g1, vav2, and vti1a) are corroborated by LopezBlanch et al. (https://doi.org/10.1101/2024.10.23.619860). We have also already compared the microexon removal to a loss-of-function mutant for two lines (Figures S3 and S4), and we have made this comparison more obvious as well as increasing the discussion of the expected phenotypes from typical loss-of-function mutants (see point #3 response to reviewer 1).

      Unlike protein-coding truncations, clean removal of the microexon and its regulatory elements is unlikely to yield different phenotypic outcomes if independent lines are generated (with the exception of genetic background effects). When generating a proteintruncating allele, the premature stop codon can have different locations and a varied impact on genetic compensation. In previous work (Capps et al., 2024), we have observed different amounts of nonsense-mediated decay-induced genetic compensation (El-Brolosy, et al., 2019) depending on the location of the mutation. As they lack variable premature stop codons (the expectation of a clean removal), two mutants for the same microexons should have equivalent impacts on the mRNA.

      We now address the concern of subtle genetic background effects in the Methods: “Even with using sibling controls and collecting multiple biological replicates from individual parents, the possibility remains that linked genetic variation may have contributed to the mild phenotypes we observed, as only a single line was generated.”

      Reviewer #3 (Public review):

      Summary:

      This paper sought to understand how microexons influence early brain function. By selectively deleting a large number of conserved microexons and then phenotyping the mutants with behavior and brain activity assays, the authors find that most microexons have minimal effects on the global brain activity and broad behaviors of the larval fish-- although a few do have phenotypes.

      Strengths:

      The work takes full advantage of the scale that is afforded in zebrafish, generating a large mutant collection that is missing microexons and systematically phenotyping them with high throughput behaviour and brain activity assays. The work lays an important foundation for future studies that seek to uncover the likely subtle roles that single microexons will play in shaping development and behavior.

      We thank the reviewer for their support.

      Weaknesses:

      The work does not make it clear enough what deleting the microexon means, i.e. is it a clean removal of the microexon only, or are large pieces of the intron being removed as well-- and if so how much? Similarly, for the microexon deletions that do yield phenotypes, it will be important to demonstrate that the full-length transcript levels are unaffected by the deletion. For example, deleting the microexon might have unexpected effects on splicing or expression levels of the rest of the transcript that are the actual cause of some of these phenotypes.

      To clarify the nature of our mutant alleles, we have added a figure (Figure S1) that details the location of the microexon in relation to its predicted neighboring exons, deletion boundaries, guide RNAs, and putative regulatory elements. We have determined whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see point #4 responses to Reviewer 1 for details).

      Reviewer #1 (Recommendations for the authors):

      (1) For most ME mutations, 4 guide sequences are provided. More description / a diagram could be helpful to interpret how ME mutations were generated.

      We have added diagrams to the Supplementary Materials (new Figure S1) to show where the guide RNAs, cut sites, and putative regulatory elements are in relationship to the microexon and its neighboring exons. We have also added the following point to the text: “Four guide RNAs were used, two on each side of the microexon (Table S2, Figure S1).”

      (2) Figure 1 indicates that there are 45 microexons (MEs) but the text initially indicates that there are 44 that exist in a canonical layout (the text later indicates there are 45). This could be made more clear.

      The 45 refers to the mutants that were generated, not the microexons with putative Srrm4 regulatory elements. We did not choose microexons to mutate based on whether they were regulated by Srrm4. We have clarified these points in the manuscript as follows: “Of these 95 microexons, 42 exist in a canonical layout in the zebrafish genome, with both a UGC and UC repeat – or similar polypyrimidine tract – directly upstream of the alternatively spliced exon (Gonatopoulos-Pournatzis et al., 2018) (Table S1), indicating that Srrm4 likely controls their inclusion. Of the remaining microexons, 44 are organized similarly to the canonical layout, typically with either a UGC or UC repeat. Thus, they may also be regulated by Srrm4.” and “Using CRISPR/Cas9, we generated lines that removed 45 conserved microexons  (Table S2) and assayed larval brain activity, brain structure, and behavior (Figure 1A). Four guide RNAs were used, two on each side of the microexon (Table S2, Figure S1). For microexons with upstream regulatory elements that are likely important for splicing, these elements were also removed (Figure S1).”

      (3) The description of the "canonical layout" as containing TGC / UC repeats could be rewritten as either "containing a UGC motif and UC repeats" or "containing a TGC motif and TC repeats."

      This error has been corrected.

      (4) Why was tcf7l2 selected as a control for MAP mapping?

      The mutant for tcf7l2 is an example of a moderately strong phenotype from a recent study we completed (Capps et al., 2025). This mutant was selected because it has both increased and decreased activity and structure and is ideal for setting the range of the graph. We now include a comparison to additional mutants from this study of autism genes (Capps et al., 2025) to further demonstrate how mild the phenotypes are in the microexon removal mutants (new Figure 3G). We also include the activity and structure maps of tcf7l2 mutants in Supplementary Figures 9 and 11.

      (5) What does it mean that of the remaining microexons, most are similar to canonical layout?

      Typically, they would have one of the two regulatory elements instead of both or the location of the possible elements would be slightly farther away than expected. We have clarified this point in the manuscript as follows: “Of these 95 microexons, 42 exist in a canonical layout in the zebrafish genome, with both a UGC and UC repeat  or similar polypyrimidine tract – directly upstream of the alternatively spliced exon (Gonatopoulos-Pournatzis et al., 2018) (Table S1), indicating that Srrm4 likely controls their inclusion. Of the remaining microexons, 44 are organized similarly to the canonical layout, typically with either a UGC or UC repeat. Thus, they may also be regulated by Srrm4.”

      (6) Figure 2A is very difficult to see - most are either up or down - suggest splitting into 2 figures - one = heat map, second can summarize values that were both up and down.

      We prefer to retain this information for accuracy. The bubble location is offset to effectively share the box between the orange (decreased) and purple (increased) measures. For example, and as noted in the methods and now expanded upon, a measure can change between 4 and 6 dpf or a measure such as bout velocity could be increased while the distance traveled is decreased (both are magnitude measures). The offset of the bubbles is consistently 0.2 data units in x and y from the center of the box.

      (7) The authors apply rigorous approaches to testing the importance of microexons. I especially appreciate the inclusion of separate biological replicates in the main figures!

      We thank the reviewer for their positive feedback.

      (8) Page 5 line 5 - suggest "compared to homozygous mutants".

      The change has been made.

      (9) For Eif5g3b dark flash phenotype, it's not clear what "p-values are not calculated for response plots" means. A p-value is provided in the plot for ppp6r3 response freq.

      The eif4g3b plot is the actual response trace measuring through pixel changes whereas the ppp6r3 is the frequency of response. While informative, the response plot is time-based data with a wide dynamic range, making the average signal across the entire time window meaningless. We include the p-values for a related measure, the latency for the first 10 dark flashes in block 1 (day6dpfdf1a_responselatency) in the legend.

      (10) The ptprd phenotype in 2D is not described in the text.

      The change has been made.

      (11) Page 7 line 7: "mild" is repeated.

      This error has been corrected.

      Reviewer #2 (Recommendations for the authors):

      Specific points for needed improvement:

      (1) The title should be adjusted to more accurately describe the results. The term 'minimal' is under-representing the findings. 9/45 (20%) of targets in their screen have some phenotype, indicating that a significant number have indeed an important function. Moreover, the phenotypic analysis is limited, leaving space for missed abnormalities (as discussed by the authors). I would therefore suggest a more neutral title such as 'Systematic genetic deletion of microexons uncovers their roles in zebrafish brain development and larval behaviour'.

      While some microexon mutants do have repeatable phenotypes, these phenotypes are far milder than phenotypes observed in other mutant sets. We now include a comparison to additional mutants from this study of autism genes (Capps et al.,2025) to further demonstrate how mild the phenotypes are in the microexon removal mutants (new Figure 3G). The title states that these microexons have a minimal impact on larval zebrafish brain morphology and function, leaving room for the possibility of adult phenotypes. Thus, we prefer to retain this title.

      (2) Do the 45 chosen microexons correspond to the 44 with a canonical layout with TGC and UC repeats? If so, it needs to be explicitly stated in the text that exons were chosen for mutation based on the potential for SRRM4 regulation. If not, then the rationale for the choice of the 45 mutants from the 95 highly conserved events needs to be explained further.

      The 45 refers to the mutants that were generated, not the microexons with putative Srrm4 regulatory elements. We did not choose microexons to mutate based on whether they were regulated by Srrm4. We have clarified these points in the manuscript as follows: “Of these 95 microexons, 42 exist in a canonical layout in the zebrafish genome, with both a UGC and UC repeat – or similar polypyrimidine tract – directly upstream of the alternatively spliced exon (Gonatopoulos-Pournatzis et al., 2018) (Table S1), indicating that Srrm4 likely controls their inclusion. Of the remaining microexons, 44 are organized similarly to the canonical layout, typically with either a UGC or UC repeat. Thus, they may also be regulated by Srrm4.” and “Using CRISPR/Cas9, we generated lines that removed 45 conserved microexons (Table S2) and assayed larval brain activity, brain structure, and behavior (Figure   1A). Four guide RNAs were used, two on each side of the microexon (Table S2, Figure S1). For microexons with upstream regulatory elements that are likely important for splicing, these elements were also removed (Figure S1).”

      There was no clear rationale for those that were selected. We attempted to generate all 95 and some mutants were not successfully generated in our initial attempt. As we found minimal phenotypes, we elected to not continue to make the remaining ones on the list.

      (3) More detail regarding the design of guides for CRISPR is required in the text in the methods section. From Table S2, 4 guides were used per microexon. Were these designed to flank the microexon? How far into the intronic sequence were the guides designed? Were the splicing regulatory sequences (polypyrimidine tract, branchpoint) also removed? The flanking sequences of each of the 45 deletion lines need to be provided.

      We have added diagrams to the Supplementary Materials (new Figure S1) to show where the guide RNAs, cut sites, and putative regulatory elements are in relationship to the microexon and its neighboring exons. We removed the microexon and the surrounding area that contains the putative regulatory elements.

      (4) Following on from the previous point, to ascertain that the phenotype observed is truly due to lack of microexon (rather than other event linked to removed intronic sequences) - for the 7 exons newly identified as functionally important, at least one added deletion line has to be shown, presenting the same phenotype. If making 7 more lines can't be achieved in a reasonable time (we are aware this is a big ask), a MO experiment blocking microexon splicing needs to be provided (may not be ideal for analysis at 6 dpf). For the existing mutants and the new ones (or morphants), sequencing of the mRNAs for the 7 genes in mutants and siblings also needs to be added to check any possible change in other variants.

      Unlike protein-coding truncations, clean removal of the microexon and its regulatory elements is unlikely to yield different phenotypic outcomes if independent lines are generated (with the exception of genetic background effects). When generating a protein-truncating allele, the premature stop codon can have different locations and a varied impact on genetic compensation. In previous work (Capps et al., 2024), we have observed different amounts of nonsense-mediated decay-induced genetic compensation (El-Brolosy, et al., 2019) depending on the location of the mutation. As they lack variable premature stop codons (the expectation of a clean removal), two mutants for the same microexons should have equivalent impacts on the mRNA. We acknowledge that we inadequately described the generation of these alleles, and we now provide Figure S1 to show the microexon’s relationship to possible regulatory elements that impact splicing in unexpected ways if they remain.

      We now acknowledge the concern of subtle genetic background effects in the Methods: “Even with using sibling controls and collecting multiple biological replicates from individual parents, the possibility remains that linked genetic variation may have contributed to the mild phenotypes we observed, as only a single line was generated.”

      Given the caveats of MOs and transient microinjection for the study of 6 dpf phenotypes, we disagree that this suggested experiment would provide value. The phenotypic assays we use are highly sensitive, and we would not even trust CRISPANTs to yield reliable data. We have added an additional loss-of-function allele for ppp6r3 from the Sanger knockout project, which has a similar but stronger size change to the ppp6r3 microexon-removal line. In addition, our findings for three microexon mutants (ap1g1, vav2, and vti1a) are corroborated by Lopez-Blanch et al. (https://doi.org/10.1101/2024.10.23.619860).

      To support that these we generated clean removal of these microexons, we experimentally determined whether the mRNA is impacted in unanticipated ways for a subset of mutants with mild phenotypes (see the point #4 public response to Reviewer 1). We also have already compared the microexon removal to a loss-offunction mutant for two lines (Figure S1), and we have made that outcome more obvious as well as increasing the discussion of the expected phenotypes from typical loss-of-function mutants (see point #3 public response to Reviewer 1).

      (5) Figure 3: An image of control tcf7l2 mutant brain activity as a reference should be included.

      We now include the activity and structure maps of tcf7l2 mutants in Supplementary Figures 9 and 11.

      (6) Figure 3a/b. The gene names on the y-axis of the pERK and structure comparisons should be reordered to be alphabetical so that phenotypes can be compared by the reader for the same microexon across the two assays.

      These data are clustered so that any similarities between maps can be recognized. We prefer to retain the clustering to compare lines to each other.

      (7) Figure S6 legend. Including graph titles like "day3msdf_dpix_numberofbouts_60" is not comprehensible to the reader so should be replaced with more descriptive text. As should jargon such as "combo plot" and"habituation_day5dpfhab1post_responsefrequency_1_a1f1000d5p" etc.

      The legend has been edited to describe the experiments. Subsections of the prior names are maintained in parentheses to enable the reader to connect the plots in this figure to the specific image and underlying data in Zenodo.

      (8) Page 2 line 21 "to enable proper".

      The change has been made.

      (9) Page 7 line 7. Repeatable phenotypes were mild mild.

      This error has been corrected.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1B is confusingly laid out.

      We are unclear how to modify Figure 1B, as it is a bar plot. We have modified several figures to improve clarity.

      (2) Figure 1E-there are some pictures of zebrafish but to what end? They aren't labelled. The dark "no expression" looks really similar to the dark green, "high expression".

      The zebrafish images represent the ages assessed for microexon inclusion. We have added labels to clarify this point.

      (3) The main text says "microexons were removed by Crispr" but there is no detail in the main text about this at all-- and barely any in the methods. What does it mean to be removed? Cleanly? Or including part of the introns on either side? Etc. How selected, raised, etc? I can glean some of this from the Table S2 if I do a lot of extra work, but at least some notes about this would be important.

      We have added diagrams to the Supplementary Materials (new Figure S1) to show where the guide RNAs, cut sites, and putative regulatory elements are in relationship to the microexon and its neighboring exons. We removed the microexon and the surrounding area that contains the putative regulatory elements.

      (4) Figure 2 - There are no Ns, at least for the plots on the right. The reader shouldn't have to dig deep in Table S2 to find that. It is also unclear why heterozygous fish are not included in these analyses, since there are sibling data for all. Removed for readability of the plots might be warranted, but this should be made explicitly clear.

      The Ns for these plots have been added to the legend. The legend was also modified as follows: “Comparisons to the heterozygous larvae are removed for clarity and available in the Supplementary Materials, as they often have even milder phenotypes than homozygous.”

      (5) Needed data: for those with phenotypes, some evidence should be presented that the full-length transcripts that encode proteins without the microexons are still expressed at the same level and without splicing errors/NMD. Otherwise, some of these phenotypes that were found could be due to knockdown or LOF (or I suppose even overexpression) of the targeted gene.

      We have added a new Supplementary Figure S2 confirming clean removal of the microexons with RT-PCR for a subset of mutants with phenotypes. This figure also includes qRT-PCR for the same subset. We now discuss these findings: Results: “For eight mutant lines, we confirmed that the microexon was eliminated from the transcripts as expected (Figure S2). Although our genomic deletion did not yield unexpected isoforms, qRT-PCR on these eight lines revealed significant downregulation for the homozygous vav2 mutant (Figure S2), indicating possibly complex genetic regulation.”

    1. eLife Assessment

      This fundamental study explores a novel cellular mechanism underlying the degeneration of locus coeruleus neurons during chronic restraint stress. The evidence supporting the overexcitation of LC neurons after chronic stress is compelling. The topic is timely, the proposed mechanistic pathway is innovative, and the findings have translational relevance, particularly regarding therapeutic strategies targeting α2A-AR internalization in neurodegenerative diseases.

    2. Reviewer #1 (Public review):

      This study aims to elucidate the mechanisms by which stress-induced α2A-adrenergic receptor (α2A-AR) internalization leads to cytosolic noradrenaline (NA) accumulation and subsequent neuronal dysfunction in the locus coeruleus (LC). While the manuscript presents an interesting but ambitious model involving calcium dynamics, GIRK channel rundown, and autocrine NA signaling, several key limitations undermine the strength of the conclusions.

      First, the revision does not include new experiments requested by reviewers to validate core aspects of the mechanism. Specifically, there is no direct measurement of cytosolic NA levels or MAO-A enzymatic activity to support the link between receptor internalization and neurochemical changes. The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence.

      Second, the behavioral analysis remains insufficient to support claims of cognitive impairment. The use of a single working memory test following an anxiety test is inadequate to verify memory dysfunction behaviors. Additional cognitive assays, such as the Morris Water Maze or Novel Object Recognition, are recommended but not performed.

      Third, concerns regarding the lack of rigor in differential MAO-A expression in fluorescence imaging were not addressed experimentally. Instead of clarifying the issue, the authors moved the figure to supplementary data without providing further evidence (e.g., an enzymatic assay or quantitative reanalysis of Western blot, or re-staining of IF for MAO-A) to support their interpretation.

      Fourth, concerns regarding TH staining remain unresolved. In Figure S7, the α2A-AR signal appears to resemble TH staining, and vice versa, raising the possibility of labeling errors. It is recommended that the authors re-examine this issue by either double-checking the raw data or repeating the immunostaining to validate the staining.

      Overall, the manuscript offers a potentially interesting framework but falls short in providing the experimental rigor necessary to establish causality. The reliance on indirect reasoning and reorganizing existing data, rather than generating new evidence, limits the overall impact and interpretability of the study.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the mechanism by which chronic stress induces degeneration of locus coeruleus (LC) neurons. The authors demonstrate that chronic stress leads to the internalization of α2A-adrenergic receptors (α2A-ARs) on LC neurons, causing increased cytosolic noradrenaline (NA) accumulation and subsequent production of the neurotoxic metabolite DOPEGAL via monoamine oxidase A (MAO-A). The study suggests a mechanistic link between stress-induced α2A-AR internalization, disrupted autoinhibition, elevated NA metabolism, activation of asparagine endopeptidase (AEP), and Tau pathology relevant to Alzheimer's disease (AD). The conclusions of this paper are well-supported mainly by the data, but some aspects of image acquisition require further examination.

      Strengths:

      This study clearly demonstrates the effects of chronic stimulation on the excitability of LC neurons using electrophysiological techniques. It also elucidates the role of α2-adrenergic receptor (α2-AR) internalization and the associated upstream and downstream signaling pathways of GIRK-1, using a range of pharmacological agents, highlighting the innovative nature of the work. Additionally, the study identifies the involvement of the MAO-A-DOPEGAL-AEP pathway in this process. The topic is timely, the proposed mechanistic pathway is compelling, and the findings have translational relevance, particularly in relation to therapeutic strategies targeting α2A-AR internalization in neurodegenerative diseases.

      Weaknesses:

      (1) The manuscript reports that chronic stress for 5 days increases MAO-A levels in LC neurons, leading to the production of DOPEGAL, activation of AEP, and subsequent tau cleavage into the tau N368 fragment, ultimately contributing to neuronal damage. However, the authors used wild-type C57BL/6 mice, and previous literature has indicated that AEP-mediated tau cleavage in wild-type mice is minimal and generally insufficient to cause significant behavioral alterations. Please clarify and discuss this apparent discrepancy.

      (2) It is recommended that the authors include additional experiments to examine the effects of different durations and intensities of stress on MAO-A expression and AEP activity. This would strengthen the understanding of stress-induced biochemical changes and their thresholds.

      (3) Please clarify the rationale for the inconsistent stress durations used across Figures 3, 4, and 5. In some cases, a 3-day stress protocol is used, while in others, a 5-day protocol is applied. This discrepancy should be addressed to ensure clarity and experimental consistency.

      (4) The abbreviation "vMAT2" is incorrectly formatted. It should be "VMAT2," and the full name (vesicular monoamine transporter 2) should be provided at first mention.

      Comments on revisions:

      The authors have addressed all of the reviewers' comments.

    4. Reviewer #3 (Public review):

      Summary:

      The authors present a technically impressive dataset showing that repeated excitation or restraint stress internalises somatodendritic α2A adrenergic autoreceptors (α2A ARs) in locus coeruleus (LC) neurons. Loss of these receptors weakens GIRK-dependent autoinhibition, raises neuronal excitability, and is accompanied by higher MAO A, DOPEGAL, AEP, and tau N368 levels. The work combines rigorous whole-cell electrophysiology with barbadin-based trafficking assays, qPCR, Western blotting, and immunohistochemistry. The final schematic is appealing and, in principle, could explain early LC hyperactivity followed by degeneration in ageing and Alzheimer's disease.

      Strengths:

      - Multi-level approach - The study integrates electrophysiology, pharmacology, mRNA quantification, and protein-level analysis.

      -Use of barbadin to block β-arrestin/AP-2-dependent internalisation is both technically precise and mechanistically informative

      -Well-executed electrophysiology

      -translation relevance

      -converges to a model that peers discussed (scientists can only discuss models - not data!)

      Weaknesses:

      Nevertheless, the manuscript currently reads as a sequence of discrete experiments rather than a single causal chain.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer # 1 (Public review)

      This study aims to elucidate the mechanisms by which stress-induced α2A-adrenergic receptor (α2A-AR) internalization leads to cytosolic noradrenaline (NA) accumulation and subsequent neuronal dysfunction in the locus coeruleus (LC). While the manuscript presents an interesting but ambitious model involving calcium dynamics, GIRK channel rundown, and autocrine NA signaling, several key limitations undermine the strength of the conclusions. 

      (1) First, the revision does not include new experiments requested by reviewers to validate core aspects of the mechanism. Specifically, there is no direct measurement of cytosolic NA levels or MAO-A enzymatic activity to support the link between receptor internalization and neurochemical changes. The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence. 

      Although the reviewer #1 commented that “The authors argue that such measurements are either not feasible or beyond the scope of the study, leaving a significant gap in the mechanistic chain of evidence”, we believe that this comment may be unfair. 

      It may be unfair for the reviewer #1 to neglect our responses to the original reviewer comments regarding the direct measurement of cytosolic NA levels. It is true that none of the recommended methods to directly measure cytosolic NA levels are not feasible as described in the original authors’ response (see the original authors’ response to the comment raised by the Reviewer #1 as Recommendations for the authors (2)). To measure extracellular NA with GRAB-NE photometry, α2A-ARs must be expressed in the cell membrane. GRAB-NE photometry is not applicable unless α2A-ARs are expressed, whereas increases in cytosolic NA levels are caused by internalization of α2A-ARs in our study.

      In our study, we elaborated to detect the change in MAO-A protein with Western blot method, instead of examining MAO-A enzymatic activity. Because the relative quantification of active AEP and Tau N368 proteins by Western blot analysis should accurately reflect the change in the MAO-A enzymatic activity, enzymatic assay may not be necessarily required while we admit the necessity of enzymatic assay to better demonstrate the MAO-A activities as discussed in the previously revised manuscript (R1, page 10, lines 314-315). 

      We used the phrase “beyond the scope of the current study” for “the mechanism how Ca<sup>2+</sup> activates MAO-A” as described in the original authors’ responses (see the original authors’ response to the comment raised by the Reviewer #1 as Weakness (3)). We do not think that this mechanism must be investigated in the present study because the Ca<sup>2+</sup> dependent nature of MAO-A activity is already known (Cao et al., 2007). 

      On the other hand, because it is not possible to measure cytosolic NA levels with currently available methods, the quantification of the connection between α2A-AR internalization and increased cytosolic NA levels must be considered outside the scope of the study. However, our study demonstrated the qualitative relationship between α2A-AR internalization and active-AEP/TauN-368 reflecting increased cytosolic NA levels, leaving “a small gap in the mechanistic chain of evidence.” Therefore, it may be unreasonable to criticize our study as “leaving a significant gap in the mechanistic chain of evidence” with the phrase “beyond the scope of the current study.” 

      (2) Second, the behavioral analysis remains insufficient to support claims of cognitive impairment. The use of a single working memory test following an anxiety test is inadequate to verify memory dysfunction behaviors. Additional cognitive assays, such as the Morris Water Maze or Novel Object Recognition, are recommended but not performed.

      As described in the original authors’ response (see the original authors’ response to the comment raised by the Reviewer #1 as Weakness (4)), we had already done another behavioral test using elevated plus maze (EPM) test. By combining the two tests, it may be possible to more accurately evaluate the results of Y-maze test by differentiating the memory impairment from anxiety. However, the results obtained by these behavioral tests showed that chronic RS mice displayed both anxiety-like and memory impairment-like behaviors. Accordingly, we have softened the implication of anxiety and memory impairment (page 13, lines 396-399) and revised the abstract (page 2, line 59) in the revised manuscript (R2).  

      (3) Third, concerns regarding the lack of rigor in differential MAO-A expression in fluorescence imaging were not addressed experimentally. Instead of clarifying the issue, the authors moved the figure to supplementary data without providing further evidence (e.g., an enzymatic assay or quantitative reanalysis of Western blot, or re-staining of IF for MAO-A) to support their interpretation.

      Because the quantification of MAO-A expression can be performed with greater accuracy by means of Western blot than by immunohistochemistry, we have moved the immunohistochemical results (shown in Figure 5) to the supplemental data (Figure S8) following the suggestion made by the Reviewer #3. As the relative quantification of active AEP and Tau N368 proteins by Western blot analysis may accurately reflect changes in the MAO-A enzymatic activity which is consistent with the result of Western blot analysis of MAO-A, enzymatic assay or re-staining of immunofluorescence for MAO-A may not be necessarily required. We do not think that a new experiment of Western blot analysis is necessary to re-evaluate MAO-A just because of the lack of the less-reliable quantification of immunohistochemical staining.

      (4) Fourth, concerns regarding TH staining remain unresolved. In Figure S7, the α2A-AR signal appears to resemble TH staining, and vice versa, raising the possibility of labeling errors. It is recommended that the authors re-examine this issue by either double-checking the raw data or repeating the immunostaining to validate the staining.

      The reviewer #3 is misunderstanding Figure S7. In Figure S7, there are two types of α2A-AR expressing neurons; one is TH-positive LC neuron and the other is TH-negative neuron in mesencephalic trigeminal nucleus (MTN). This clearly indicates that TH staining is specific. Furthermore, α2A-AR staining was much more extensive in MTN neurons than in LC neurons. Thus, α2A-AR signal is not similar to TH signal and there are no labeling errors, which is also evident in the merged image (Figure S7C).

      (5) Overall, the manuscript offers a potentially interesting framework but falls short in providing the experimental rigor necessary to establish causality. The reliance on indirect reasoning and reorganizing of existing data, rather than generating new evidence, limits the overall impact and interpretability of the study.

      Overall, the reviewer #1 was not satisfied with our revision regardless of the authors’ responses. As detailed above in our responses to the replies (1)~(4), we believe that in the original authors’ responses and in the above-described responses we effectively responded to the criticisms by the reviewer #1.

      Reviewer #2 (Public review): 

      Comments on revisions: 

      The authors have addressed all of the reviewers' comments.

      We appreciate constructive and helpful comments made by the reviewer #2.

      Reviewer #3 (Public review): 

      Weaknesses:  

      Nevertheless, the manuscript currently reads as a sequence of discrete experiments rather than a single causal chain. Below, I outline the key points that should be addressed to make the model convincing.

      Please see the responses to the recommendation for the authors made by reviewer #3.

      Reviewer #3 (Recommendations for the authors):

      (1) Causality across the pathway  

      Each step (α2A internalisation, GIRK rundown, Ca<sup>2+</sup> rise, MAO-A/AEP upregulation) is demonstrated separately, but no experiment links them in a single preparation. Consider in vivo Ca<sup>2+</sup> or GRAB NE photometry during restraint stress while probing α2A levels with i.p. clonidine injection or optogenetic over excitation coupled to biochemical readouts. Such integrated evidence would help to overcome the correlational nature of the manuscript to a more mechanistic study. 

      Authors response: It is not possible to measure free cytosolic NA levels with GRAB NE photometry when α2A AR is internalized as described above (see the response to the comment made by reviewer #1 as the recommendation for the authors).

      The core idea behind my comment, as well as that of Reviewer 1, was to encourage integrating your individual findings into a more cohesive in vivo experiment. Using GRAB-NE to measure extracellular NA could serve as an indirect readout of NA uptake via NAT, and ultimately, cytosolic NA levels. Connecting these experiments would significantly strengthen the manuscript and enhance its overall impact. 

      It may be true that the measurement of extracellular NA could serve as an indirect readout of NA uptake via NAT, and ultimately cytosolic NA levels. However, the reviewer #3 is still misunderstanding the applicability of GRAB-NE method to detect NE in our study. As described in the original authors’ response, there appeared to be no fluorescence probe to label cytosolic NA at present. Especially, the GRAB-NE method recommended by the reviewers #1 and #3 is limited to detect NA only when α2A-AR is expressed in the cell membrane.Therefore, when increases in cytosolic NA levels are caused by internalization of α2A-ARs, NA measurement with GRAB-NE photometry is not applicable.

      (2) Pharmacology and NE concentration  

      The use of 100 µM noradrenaline saturates α and β adrenergic receptors alike. Please provide ramp measurements of GIRK current in dose-response at 1-10 µM NE (blocked by atipamezole) to confirm that the rundown really reflects α2A activity rather than mixed receptor effects. 

      Authors response: It is true that 100 µM noradrenaline activates both α and β adrenergic receptors alike. However, it was clearly showed that enhancement of GIRK-I by 100 µM noradrenaline was completely antagonized by 10 µM atipamezole and the Ca<sup>2+</sup> dependent rundown of NA-induced GIRK-I was prevented by 10 µM atipamezole. Considering the Ki values of atipamezole for α2A AR (=1~3 nM) (Vacher et al., 2010, J Med Chem) and β AR (>10 µM) (Virtanen et al., 1989, Arch Int Pharmacodyn Ther), these results really reflect α2A AR activity but not β AR activity (Figure S5). Furthermore, because it is already well established that NA-induced GIRK-I was mediated by α2A AR activity in LC neurons (Arima et al., 1998, J Physiol; Williams et al., 1985, Neuroscience), it is not necessarily need to re-examine 1-10 µM NA on GIRK-I.

      While the milestone papers by Williams remain highly influential, they should be re-evaluated in light of more recent findings, given that they date back over 40 years. Advances in our understanding now allow for a more nuanced interpretation of some of their results. For example, see McKinney et al. (eLife, 2023). This study demonstrates that presynaptic β-adrenergic receptors-particularly β2-can enhance neuronal excitability via autocrine mechanisms. This suggests that your post-activation experiments using atipamezole may not fully exclude a contribution of β-adrenergic signaling. Such a role might become apparent when conducting more detailed titration experiments.

      The reviewer #3 may be misunderstanding the report by McKinney et al. (eLife, 2013). This paper did not demonstrate that presynaptic β-adrenergic receptors-particularly β2- can enhance neuronal excitability via autocrine mechanisms. It is impossible for LC neurons to increase their excitability by activating β-adrenergic receptors, as we have clearly shown that enhancement of GIRK-I by 100 µM noradrenaline was completely antagonized by 10 µM atipamezole. Considering the difference in Ki values of atipamezole for α2-AR (= 2~4 nM) (Vacher et al., 2010, J Med Chem) and β-AR (>10 µM) (Virtanen et al., 1989, Arch Int Pharmacodyn Ther), such a complete antagonization (of 100 µM NA-induced GIRK-I) by 10 µM atipamezole really reflect α2A-AR activity but not β-AR activity (Figure S5). Furthermore, it is already well established that NA-induced GIRK-I was mediated by α2-AR activity in LC neurons (Arima et al., 1998, J Physiol). McKinney et al. (eLife, 2023) have just found the absence of lateral inhibition on adjacent LC neurons by NA autocrine caused respective spike activity. This has nothing to do with autoinhibition.

      (4) Age mismatch and disease claims 

      All electrophysiology and biochemical data come from juvenile (< P30) mice, yet the conclusions stress Alzheimer-related degeneration. Key endpoints need to be replicated in adult or aged mice, or the manuscript should soften its neurodegenerative scope. 

      Authors response: As described in the section of Conclusion, we never stress Alzheimer-related degeneration, but might give such an impression. To avoid such a misunderstanding, we have added a description “However, the present mechanism must be proven to be valid in adult or old mice, to validate its involvement in the pathogenesis of AD.” (R1, page 14, lines 448-450).

      It would be great to see this experiment performed in aged mice-you are the one who has everything in place to do it right now! 

      In our future separate studies, we would like to prove that the present mechanism is valid in aged mice, to validate its involvement in the pathogenesis of AD. This is partly because the patch-clamp study in aged mice is extremely difficult and takes much time.

      Authors response: In the abstract, you suggest that internalization of α2A-adrenergic receptors could represent a therapeutic target for Alzheimer's disease. "...Thus, it is likely that internalization of α2A-AR increased cytosolic NA, as reflected in AEP increases, by facilitating reuptake of autocrine-released NA. The suppression of α2A-AR internalization may have a translational potential for AD treatment."

      α2A-AR internalization was involved in the degeneration of LC neurons. Because we confirmed that spike-frequency adaptation reflecting α2A-AR-mediated autoinhibition can be induced in adult mice as prominently as in juvenile mice (Figure S10), it is not inadequate to suggest that the suppression of α2A-AR internalization may have a translational potential for anxiety/AD treatment (see Discussion; R2, page 14, lines 445-449).

      (6) Quantitative histology  

      Figure 5 presents attractive images, but no numerical analysis is provided. Please provide ROI-based fluorescence quantification (with n values) or move the images to the supplement and rely on the Western blots. 

      Author response: We have moved the immunohistochemical results in Fig. 5 to the supplement, as we believe the quantification of immunohistochemical staining is not necessarily correct.   

      What do you mean by that " ...immunohistochemical staining is not necessarily correct."  

      It is evident that in terms of quantification, Western blot analysis is a more accurate method than immunohistochemical staining. In this sense, it is the contention of our study that the ROI-based fluorescence quantification of immunohistochemical staining is not necessarily an accurate or correct procedure, compared to the quantification by Western blot analysis.

    1. eLife Assessment

      The analysis of neural morphology across Heliconiini butterfly species revealed brain area-specific changes associated with new foraging behaviours. While the volume of the centre for learning and memory, the mushroom bodies, was known to vary widely across species, new, valuable results show conservation of the volume of a center for navigation, the central complex. The presented evidence is convincing for both volumetric conservation in the central complex and fine neuroanatomical differences associated with pollen feeding, delivered by experimental approaches that are applicable to other insect species. This work will be of interest to evolutionary biologists, entomologists, and neuroscientists.

    2. Reviewer #1 (Public review):

      The authors previously reported that Heliconius, one genus of the Heliconiini butterflies, evolved to be efficient foragers to feed pollen of specific plants and have massively expanded mushroom bodies. Using the same image dataset, the authors segmented the central complex and associated brain regions and found that the volume of the central complex relative to the rest of the brain is largely conserved across the Heliconiini butterflies. By performing immunostaining to label a specific subset of neurons, the authors found several potential sites of evolutionary divergence in the central complex neural circuits, including the number of GABAergic ellipsoid body ring neurons and the innervation patterns of Allatostatin A expressing neurons in the noduli. These neuroanatomical data will be helpful to guide future studies to understand the evolution of the neural circuits for vector-based navigation.

      Strengths:

      The authors used a sufficiently large scale of dataset from 307 individuals of 41 species of Heliconiini butterflies to solidify the quantitative conclusions and present new microscopy data for fine neuroanatomical comparison of the central complex.

      Weaknesses:

      (1) Although the figures display a concise summary of anatomical findings, it would be difficult for non-experts to learn from this manuscript to identify the same neuronal processes in the raw confocal stacks. It would be helpful to have instructive movies to show a step-by-step guide for identification of neurons of interest, segmentations, and 3D visualizations (rotation) for several examples, including ER neurons (to supplement texts in line 347-353) and Allatostatin A neurons.

      (2) Related to (1), it was difficult for me to assess if the data in Figure 7 support the author's conclusions that ER neuron number increased in Heliconius Melpomene. By my understanding, the resolution of this dataset isn't high enough to trace individual axons and therefore authors do not rule out that the portion of "ER ring neurons" in Heliconius may not innervate the ER, as stated in Line 635 "Importantly, we also found that some ER neurons bypass the ellipsoid body and give rise to dense branches within distinct layers in the fan-shaped body (ER-FB)". If they don't innervate the ellipsoid body, why are they named as "ER neurons"?

      (3) Discussions around the lines 577-584 require the assumption that each ellipsoid body (EB) ring neuron typically arborises in a single microglomerulus to form a largely one-to-one connection with TuBu neurons within the bulb (BU), and therefore, the number of BU microglomeruli should provide an estimation of the number of ER neurons. Explain this key assumption or provide an alternative explanation.

      (4) The details of antibody information are missing in the Key resource table. Instead of citing papers, list the catalogue numbers and identifier for commercially available antibodies, and describe the antigen, and whether they are monoclonal or polyclonal. Are antigens conserved across species?

      (5) I did not understand why authors assume that foraging to feed on pollens is a more difficult cognitive task than foraging to feed on nectar. Would it be possible that they are equally demanding tasks, but pollen feeding allows Heliconius to pass more proteins and nucleic acids to their offspring and therefore they can develop larger mushroom bodies?

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Farnsworth et al. ask whether the previously established expansion of mushroom bodies in the pollen foraging Heliconius genus of Heliconiini butterflies co-evolved with adaptations in the central complex. Heliconius trap line foraging strategies to acquire pollen as a novel resource require advanced spatial memory mediated by larger mushroom bodies, but the authors show that related navigation circuits in the central complex are highly conserved across the Heliconiini tribe, with a few interesting exceptions. Using general immunohistochemical stains and 3D reconstruction, the authors compared volumes of central complex regions, and unlike the mushroom bodies, there was no evidence of expansion associated with pollen feeding. However, a second dataset of neuromodulator and neuropeptide antibody labeling reveals more subtle differences between pollen and non-pollen foragers and highlights sub-circuits that may mediate species-specific differences in behavior. Specifically, the authors found an expansion of GABAergic ER neurons projecting to the fan-shaped body in Heliconius, which may enhance their ability to path-integrate. They also found differences in Allatostatin A immunoreactivity, particularly increased expression in the noduli associated with pollen feeding. These differences warrant closer examination in future studies to determine their functional implication on navigation and foraging behaviors.

      Strengths:

      The authors leveraged a large morphological data set from the Heliconiini to achieve excellent phylogenetic coverage across the tribe with 41 species represented. Their high-quality histology resolves anatomical details to the level of specific, identifiable tracts and cell body clusters. They revealed differences at a circuit level, which would not be obvious from a volumetric comparison. The discussion of these adaptations in the context of central complex models is useful for generating new hypotheses for future studies on the function of ER-FB neurons and the role of Allatostatin A modulation in navigation.

      The conclusions drawn in this paper are measured and supported by rigorous statistics and evidence from micrographs.

      Weaknesses:

      The majority of results in this study do not reveal adaptations in the central complex associated with pollen foraging. However, reporting conserved traits is useful and illustrates where developmental or functional constraints may be acting. The implied hypothesis in the introduction is that expansion of mushroom bodies in Heliconius co-evolved with central complex adaptations, so it may be helpful to set up the alternate hypotheses in the beginning.

      In the main text, the authors describe differences in GABAergic neurons "across several species" but only one Heliconius and one outgroup species seem to be represented in the figures. ER numbers in Figure 7H are only compared for these two species. If this data is available for other species, it would strengthen the paper to add them to the analysis, since this was one of the most intriguing findings in the study. I would want to know if the increased ER number is a trend in Heliconius or specific to H. melpomene.

    4. Author response:

      We thank the two reviewers for their constructive criticisms which we will address in the coming weeks, and we are confident doing so will benefit the manuscript.

      We will aim to address all comments, but there are two main areas in particular that we highlight here:

      (1)  Both reviewers make important suggestions to improve the readers’ understanding of the anatomical complexities and raw files we provide. We will generate annotated confocal stacks and simplify the nomenclature to better guide the reader through the more complex details of the anatomy of the central complex, and the neuron types we characterized more closely.

      (2)  Both reviewers also pointed to several parts of our interpretations and discussion that should be clarified. We will do so by improving the language we use at certain sections to offer more precision, and by offering alternative explanations where possible.

    1. eLife Assessment

      This study offers a valuable theoretical framework for quantifying molecular transport across interfaces between coexisting liquid phases, emphasizing interfacial resistance as a central factor governing transport kinetics. The mathematical derivations are solid. To enhance the paper's relevance and broaden its appeal, it would be helpful to clarify how the key equations connect to existing literature and to elucidate the physical mechanisms underlying scenarios that give rise to substantial interfacial resistance.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors theoretically address the topic of interface resistance between a phase-separated condensate and the surrounding dilute phase. In a nutshell, "interface resistance" occurs if material in the dilute phase can only slowly pass through the interface region to enter the dense phase. There is some evidence from FRAP experiments that such a resistance may exist, and if it does, it could be biologically relevant insofar as the movement of material between dense and dilute phases can be rate-limiting for biological processes, including coarsening. The current study theoretically addresses interface resistance at two levels of description: first, the authors present a simple way of formulating interface resistance for a sharp interface model. Second, they derive a formula for interface resistance for a finite-width interface and present two scenarios where the interface resistance might be substantial.

      Strengths:

      The topic is of broad relevance to the important field of intracellular phase separation, and the work is overall credible.

      Weaknesses:

      There are a few problems with the study as presented - mainly that the key formula for the latter section has already been derived and presented in Reference 6 (notably also in this journal), and that the physical basis for the proposed scenarios leading to a large interface resistance is not clearly supported.

      (1) As noted, Equation 32 of the current study is entirely equivalent to Equation 8 of Reference 6, with a very similar derivation presented in Appendix 1 of that paper. In fact, Equation 8 in Reference 6 takes one more step by combining Equations 32 and 35 to provide a general expression for the interface resistance in an integral form. These prior results should be properly cited in the current work - the existing citations to Reference 6 do not make this overlap apparent.

      (2) The authors of the current study go on to examine cases where this shared equation (here Equation 32) might imply a large interface resistance. The examples are mathematically correct, but physically unsupported. In order to produce a substantial interface resistance, the current authors have to suppose that in the interface region between the dense and dilute phases, either there is a local minimum of the diffusion coefficient or a local minimum of the density. I am not aware of any realistic model that would produce either of these minima. Indeed, the authors do not present sufficient examples or physical arguments that would support the existence of such minima.

      In my view, these two issues limit the general interest of the latter portion of the current manuscript. While point 1 can be remedied by proper citation, point 2 is not so simple to address. The two ways the authors present to produce a substantial interface resistance seem to me to be mathematical exercises without a physical basis. The manuscript will improve if the authors can provide examples or compelling arguments for a minimum of either diffusion coefficient or density between the dense and dilute phases that would address point 2.

    3. Reviewer #2 (Public review):

      Summary:

      This work provides a general theoretical framework for understanding molecular transport across liquid-liquid phase boundaries, focusing on interfacial resistance arising from deviations from local equilibrium. By bridging sharp and continuous interface descriptions, the authors demonstrate how distinct microscopic mechanisms can yield similar effective kinetics and propose practical experimental validation strategies.

      Strengths:

      (1) Conceptually rich and physically insightful interface resistance formulation in sharp and continuous limits.

      (2) Strong integration of non-equilibrium thermodynamics with biologically motivated transport scenarios.

      (3) Thorough numerical and analytical support, with thoughtful connection to current and emerging experimental techniques.

      (4) Relevance to various systems, including biomolecular condensates and engineered aqueous two-phase systems.

      Weaknesses:

      (1) The work remains theoretical, mainly, with limited direct comparison to quantitative experimental data.

      (2) The biological implications are only briefly explored; further discussion of specific systems where interface resistance might play a functional role would enhance the impact.

      (3) Some model assumptions (e.g., symmetric labeling or idealized diffusivity profiles) could be further contextualized regarding biological variability.

    4. Reviewer #3 (Public review):

      The manuscript investigated the kinetics of molecule transport across interfaces in phase-separated mixtures. Through the development of a theoretical approach for a binary mixture in a sharp interface limit, the authors found that interface resistance leads to a slowdown in interfacial movement. Subsequently, they extended this approach to multiple molecular species (incorporating both labeled and unlabeled molecules) and continuous transport models. Finally, they proposed experimental settings in vitro and commented on the necessary optical resolution to detect signatures of interfacial kinetics associated with resistance.

      The investigation of transport kinetics across biomolecular condensate interfaces holds significant relevance for understanding cellular function and dysfunction mechanisms; thus, the topic is important and timely. However, the current manuscript presentation requires improvement. Firstly, the inclusion of numerous equations in the main text substantially compromises readability, and relocation of a part of the formulae and derivations to the Appendix would be more appropriate. Secondly, the manuscript would benefit from more comprehensive comparisons with existing theoretical studies on molecular transport kinetics. The text should also be written to be more approachable for a general readership. Modifications and sufficient responses to the specific points outlined below are recommended.

      (1) The authors introduced a theoretical framework to study the kinetics of molecules across an interface between two coexisting liquid phases and found that interface resistance leads to a slowdown in interfacial movement in a binary mixture and a decelerated molecule exchange between labeled and unlabeled molecules across the phase boundary. However, these findings appear rather expected. The work would be strengthened by a more thorough discussion of the kinetics of molecule transport across interfaces (such as the physical origin of the interface resistance and its specific impact on transport kinetics).

      (2) The formulae in the manuscript should be checked and corrected. Notably, Equation 10 contains "\phi_2\ln\phi_2" while Eq. 11b shows "n^{-1}\ln\phi_2", suggesting a missing factor of "n^{-1}". Similarly, Equation 18 obtained from Equation 11: the logarithmic term in Eq.11a is "n^{-1}\ln phi_1-\ln(1-\phi)" but the pre-exponential factor in Equation 18a is just "\phi_1/(1-\phi*)", where is "n^{-1}"? Additionally, there is a unit inconsistency in Equation 36, where the unit of \rho (s/m) does not match that of the right-hand side expression (s/m^2).

      (3) The authors stated that the numerical solutions are obtained using a custom finite difference scheme implemented in MATLAB in the Appendix. The description of numerical methods is insufficiently detailed and needs to be expanded, including specific equations or models used to obtain specific figures, the introduction of initial and boundary conditions, the choices of parameters and their reasons in terms of the biology.

      (4) The authors claimed that their framework naturally extends to multiple molecular species, but only showed the situation of labeled and unlabeled molecules across a phase boundary. How about three or more molecular species? Does this framework still work? This should be added to strengthen the manuscript and confirm the framework's general applicability.

    5. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors theoretically address the topic of interface resistance between a phase-separated condensate and the surrounding dilute phase. In a nutshell, "interface resistance" occurs if material in the dilute phase can only slowly pass through the interface region to enter the dense phase. There is some evidence from FRAP experiments that such a resistance may exist, and if it does, it could be biologically relevant insofar as the movement of material between dense and dilute phases can be rate-limiting for biological processes, including coarsening. The current study theoretically addresses interface resistance at two levels of description: first, the authors present a simple way of formulating interface resistance for a sharp interface model. Second, they derive a formula for interface resistance for a finite-width interface and present two scenarios where the interface resistance might be substantial. 

      Strengths: 

      The topic is of broad relevance to the important field of intracellular phase separation, and the work is overall credible. 

      Weaknesses: 

      There are a few problems with the study as presented - mainly that the key formula for the latter section has already been derived and presented in Reference 6 (notably also in this journal), and that the physical basis for the proposed scenarios leading to a large interface resistance is not clearly supported. 

      (1) As noted, Equation 32 of the current study is entirely equivalent to Equation 8 of Reference 6, with a very similar derivation presented in Appendix 1 of that paper. In fact, Equation 8 in Reference 6 takes one more step by combining Equations 32 and 35 to provide a general expression for the interface resistance in an integral form. These prior results should be properly cited in the current work - the existing citations to Reference 6 do not make this overlap apparent. 

      We agree and will make the overlap explicit, acknowledging priority and clarifying what is new here. The initial version of the preprint of Zhang et al. (2022) (https://www.biorxiv.org/content/10.1101/2022.03.16.484641v1) lacked the derivation (it referenced a Supplementary Note not yet available); it was added during the eLife submission. We worked from the preprint and missed this update, which we will now correct.

      (2) The authors of the current study go on to examine cases where this shared equation (here Equation 32) might imply a large interface resistance. The examples are mathematically correct, but physically unsupported. In order to produce a substantial interface resistance, the current authors have to suppose that in the interface region between the dense and dilute phases, either there is a local minimum of the diffusion coefficient or a local minimum of the density. I am not aware of any realistic model that would produce either of these minima. Indeed, the authors do not present sufficient examples or physical arguments that would support the existence of such minima. 

      We respectfully disagree with the reviewer on the physical plausibility of these scenarios there is both concrete experimental and theoretical evidence for the scenarios we discussed.

      Experimental: Strom et al. (2017) (our reference 11) describes a substantially reduced protein diffusion coefficient at an in vivo phase boundary, while Hahn et al. (2011a) and Hahn et al. (2011b) (our references 27 and 28) describe transient accumulation of molecules at a phase boundary, which they attribute to the Donnan potential, but conceivably a lowered mobility could play a role.

      Theoretical: Recent work (e.g., Majee et al. (2024)) shows that charged layers could form at phase boundaries, which could either repel or attract incoming molecules, depending on their charge, thus altering the local volume fraction, resulting in a trough or peak. Arguably, the model put forth by Zhang et al. (2024) could be mapped to a potential wall, where particles are reflected, unless in a certain state. We will add sentences to the corresponding results section, as well as the discussion to make this plausibility more apparent.

      In my view, these two issues limit the general interest of the latter portion of the current manuscript. While point 1 can be remedied by proper citation, point 2 is not so simple to address. The two ways the authors present to produce a substantial interface resistance seem to me to be mathematical exercises without a physical basis. The manuscript will improve if the authors can provide examples or compelling arguments for a minimum of either diffusion coefficient or density between the dense and dilute phases that would address point 2. 

      We believe we will be able to address both issues.

      Reviewer #2 (Public review): 

      Summary: 

      This work provides a general theoretical framework for understanding molecular transport across liquid-liquid phase boundaries, focusing on interfacial resistance arising from deviations from local equilibrium. By bridging sharp and continuous interface descriptions, the authors demonstrate how distinct microscopic mechanisms can yield similar effective kinetics and propose practical experimental validation strategies. 

      Strengths: 

      (1) Conceptually rich and physically insightful interface resistance formulation in sharp and continuous limits. 

      (2) Strong integration of non-equilibrium thermodynamics with biologically motivated transport scenarios. 

      (3) Thorough numerical and analytical support, with thoughtful connection to current and emerging experimental techniques. 

      (4) Relevance to various systems, including biomolecular condensates and engineered aqueous two-phase systems. 

      Weaknesses: 

      (1) The work remains theoretical, mainly, with limited direct comparison to quantitative experimental data. 

      We agree with the reviewer, an experimental manuscript is in progress.

      (2) The biological implications are only briefly explored; further discussion of specific systems where interface resistance might play a functional role would enhance the impact.

      We thank the reviewer for this comment. We will add several such scenarios to the discussion, including the possibility to use interface resistance as a way of ordering biochemical reactions in time, as well as their potential to exclude molecules from condensates for long time periods, which, while not effective in the long-time limit, could help on cellular timescales of minutes to hours to respond to transient events.

      (3) Some model assumptions (e.g., symmetric labeling or idealized diffusivity profiles) could be further contextualized regarding biological variability. 

      The treatment of labelled and unlabelled molecules as physically identical is well supported by our experiments. Droplets under typical experimental conditions, i.e. when bleaching is not too strong, do not markedly change size or volume fraction of molecules, which would be expected if the physical properties like molecular volume or interaction strength were significantly changed. However, we do agree that in more extreme bleaching regimes the bleach step itself will change the droplet properties, but this can be avoided by tuning the FRAP laser power and dwell times accordingly.

      Our diffusivity profiles are chosen in the simplest possible way to handle typical experimental constraints (large D outside, lower D inside, potentially lowered D at the boundary) and allow for a mean-field treatment. To the best of our knowledge, the precise make-up and concentration profiles of phase boundaries in biomolecular condensates are not currently known, due to limitations in optical resolution.

      Reviewer #3 (Public review): 

      The manuscript investigated the kinetics of molecule transport across interfaces in phase-separated mixtures. Through the development of a theoretical approach for a binary mixture in a sharp interface limit, the authors found that interface resistance leads to a slowdown in interfacial movement. Subsequently, they extended this approach to multiple molecular species (incorporating both labeled and unlabeled molecules) and continuous transport models. Finally, they proposed experimental settings in vitro and commented on the necessary optical resolution to detect signatures of interfacial kinetics associated with resistance. 

      The investigation of transport kinetics across biomolecular condensate interfaces holds significant relevance for understanding cellular function and dysfunction mechanisms; thus, the topic is important and timely. However, the current manuscript presentation requires improvement. Firstly, the inclusion of numerous equations in the main text substantially compromises readability, and relocation of a part of the formulae and derivations to the Appendix would be more appropriate. Secondly, the manuscript would benefit from more comprehensive comparisons with existing theoretical studies on molecular transport kinetics. The text should also be written to be more approachable for a general readership. Modifications and sufficient responses to the specific points outlined below are recommended. 

      (1) The authors introduced a theoretical framework to study the kinetics of molecules across an interface between two coexisting liquid phases and found that interface resistance leads to a slowdown in interfacial movement in a binary mixture and a decelerated molecule exchange between labeled and unlabeled molecules across the phase boundary. However, these findings appear rather expected. The work would be strengthened by a more thorough discussion of the kinetics of molecule transport across interfaces (such as the physical origin of the interface resistance and its specific impact on transport kinetics). 

      We thank the reviewer for this comment and will discuss possible mechanisms and how they map to our meanfield model in more detail, both in the corresponding results section, and in the discussion, as also outlined in our response to Reviewer #1.

      (2) The formulae in the manuscript should be checked and corrected. Notably, Equation 10 contains "\phi_2\ln\phi_2" while Eq. 11b shows "n^{-1}\ln\phi_2", suggesting a missing factor of "n^{-1}". Similarly, Equation 18 obtained from Equation 11: the logarithmic term in Eq.11a is "n<sup>^</sup>{-1}\ln phi_1-\ln(1-\phi)" but the pre-exponential factor in Equation 18a is just "\phi_1/(1-\phi*)", where is "n<sup>^</sup>{-1}"? Additionally, there is a unit inconsistency in Equation 36, where the unit of \rho (s/m) does not match that of the right-hand side expression (s/m<sup>^</sup>2). 

      We thank the reviewer. We identified that the error originates in the inline definition of the exchange chemical potential, already before equation 11. We inadvertently dropped a prefactor of n, which then shows up in the following equation as an exponent to (1-phi<sup>^</sup>*). Very importantly this means the main result eq. 25 still holds, and in the revised manuscript we will correct the ensuing typographical mistakes.

      (3) The authors stated that the numerical solutions are obtained using a custom finite difference scheme implemented in MATLAB in the Appendix. The description of numerical methods is insufficiently detailed and needs to be expanded, including specific equations or models used to obtain specific figures, the introduction of initial and boundary conditions, the choices of parameters and their reasons in terms of the biology.

      We will substantially expand the Appendix for the numerical solutions and add an explanatory file to the repository to make clear how the code can be run, as well as its dependencies.

      (4) The authors claimed that their framework naturally extends to multiple molecular species, but only showed the situation of labeled and unlabeled molecules across a phase boundary. How about three or more molecular species? Does this framework still work? This should be added to strengthen the manuscript and confirm the framework's general applicability. 

      We have shown in Bo et al. (2021) that the labelling approach can be carried over to multi-component systems. Each species may, for example, encounter its own interface resistance. We will discuss this in more detail in the revised manuscript.

    1. eLife Assessment

      This valuable work advances our understanding of the relation between multimodal MRI, cognition, and mental health. Convincing use of statistical learning techniques in UK Biobank data shows that 48% of the variance between an 11-task derived g-factor and imaging data can be explained. Overall, this paper contributes to the study of brain-behaviour relations and will be of interest for both its methods and its findings on how much variance in g can be explained.

      [Editorial note: a previous version was reviewed by Biological Psychiatry]

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to examine how the covariation between cognition (represented by a g-factor based on 12 features of 11 cognitive tasks) and mental health (represented by 133 diverse features) is reflected in MR-based neural markers of cognition, as measured through multimodal neuroimaging (structural, rsfMRI, and diffusion MR). To integrate multiple neuroimaging phenotypes across MRI modalities, they used a so-called stacking approach, which employs two levels of machine learning. First, they built a predictive model from each neuroimaging phenotype to predict a target variable. Next, in the stacking level, they used predicted values (i.e., cognition predicted from each neuroimaging phenotype) from the first level as features to predict the target variable. To quantify the contribution of the neural indicators of cognition explaining the relationship between cognition and mental health, they conducted commonality analyses. Results showed that when they stacked neuroimaging phenotypes within dwMRI, rsMRI, and sMRI, they captured 25.5%, 29.8%, and 31.6% of the predictive relationship between cognition and mental health, respectively. By stacking all 72 neuroimaging phenotypes across three MRI modalities, they enhanced the explanation to 48%. Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship.

      Strengths:

      (1) A big study population (UK Biobank with 14000 subjects).

      (2) The description of the methods (including Figure 1) is helpful in understanding the approach.

      (3) This revised manuscript is much improved compared to the previous version.

      Weaknesses:

      (1) Although the background and reason for the study are better described in this version of the manuscript, the relevance of the question is, in my opinion, still questionable. The authors aimed to determine whether neural markers of cognition explain the covariance between cognition and mental health and which of the 72 MRI-based features contribute to explaining most of the covariance. I would like to invite the authors to make a stronger case for the relevance, keeping the clinical and scientific relevance in mind (what would you explain to the clinician, what would you explain to the people with lived experience, and how can this knowledge contribute to innovation in mental health care?).

      (2) The discussion on the interpretation of the positive and negative PLRS loadings is not very convincing, and the findings are partly counterintuitive. For example (1) how to explain that distress has a positive loading and anxiety/trauma has a negative loading?; (2) how to explain that mental health features like wellbeing and happiness load in the same direction as psychosis and anxiety/trauma? From both a clinical and a neuroscientific perspective, this is hard to interpret.

      (3) The analysis plan has not been preregistered (e.g. at OSF).

      Note: the computational aspects of the methods fall beyond my expertise.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this manuscript was to examine whether neural indicators explain the relationship between cognition and mental health. The authors achieved this aim by showing that the combination of MRI markers better predicted the cognition-mental health covariation.

      Strengths:

      The evidence supporting the conclusions is compelling. There is a large sample (UK biobank data) and a clear description of advanced analyses.

      Weaknesses:

      In the previous version of the paper, it was not completely clear what it means to look at the overlap between cognition and mental health. The authors have addressed this in the current version.

    4. Author response:

      Notes to Editors

      We previously received comments from three reviewers at Biological Psychiatry, which we have addressed in detail below. The following is a summary of the reviewers’ comments along with our responses.

      Reviewers 1 and 2 sought clearer justification for studying the cognition-mental health overlap (covariation) and its neuroimaging correlates. In the revised manuscripts, we expanded the Introduction and Discussion to explicitly outline the theoretical implications of investigating this overlap with machine learning. We also added nuance to the interpretation of the observed associations.

      Reviewer 1 raised concerns about the accessibility of the machine learning methodology for readers without expertise in this field. We revised the Methods section to provide a clearer, step-by-step explanation of our machine learning approach, particularly the two-level machine learning through stacking. We also enhanced the description of the overall machine learning design, including model training, validation, and testing.

      In response to Reviewer 2’s request for deeper interpretation of our findings and stronger theoretical grounding, we have expanded our discussion by incorporating a thorough interpretation of how mental health indices relate to cognition, material that was previously included only in supplementary materials due to word limit constraints. We have further strengthened the theoretical justification for our study design, with particular emphasis on the importance of examining shared variance between cognition and mental health through the derivation of neural markers of cognition. Additionally, to enhance the biological interpretation of our results, we included new analyses of feature importance across neuroimaging modalities, providing clearer insights into which neural features contribute most to the observed relationships.

      Notably, Reviewer 3 acknowledged the strength of our study, including multimodal design, robust analytical approach, and clear visualization and interpretation of results. Their comments were exclusively methodological, underscoring the manuscript’s quality.

      Reviewer 1:

      The authors try to bridge mental health characteristics, global cognition and various MRI-derived (structural, diffusion and resting state fMRI) measures using the large dataset of UK Biobank. Each MRI modality alone explained max 25% of the cognitionmental health covariance, and when combined together 48% of the variance could be explained. As a peer-reviewer not familiar with the used methods (machine learning, although familiar with imaging), the manuscript is hard to read and I wonder what the message for the field might be. In the end of the discussion the authors state '... we provide potential targets for behavioural and physiological interventions that may affect cognition', the real relevance (and impact) of the findings is unclear to me.

      Thank you for your thorough review and practical recommendations. We appreciate your constructive comments and suggestions and hope our revisions adequately address your concerns.

      Major questions

      (1) The methods are hard to follow for people not in this specific subfield, and therefore, I expect that for readers it is hard to understand how valid and how useful the approach is.

      Thank you for your comment. To enhance accessibility for readers without a machine learning background, we revised the Methods section to clarify our analyses while retaining important technical details needed to understand our approach. Recognizing that some concepts may require prior knowledge, we provide detailed explanations of each analysis step, including the machine learning pipeline in the Supplementary Methods.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outerfold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination ( R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.

      To model the relationship between mental health and cognition, we employed Partial Least Squares Regression (PLSR) to predict the g-factor from 133 mental health variables. To model the relationship between neuroimaging data and cognition, we used a two-step stacking approach [15–17,61] to integrate information from 72 neuroimaging phenotypes across three MRI modalities. In the first step, we trained 72 base (first-level) PLSR models, each predicting the g-factor from a single neuroimaging phenotype. In the second step, we used the predicted values from these base models as input features for stacked models, which again predicted the g-factor. We constructed four stacked models based on the source of the base predictions: one each for dwMRI, rsMRI, sMRI, and a combined model incorporating all modalities (“dwMRI Stacked”, “rsMRI Stacked”, “sMRI Stacked”, and “All MRI Stacked”, respectively). Each stacked model was trained using one of four machine learning algorithms – ElasticNet, Random Forest, XGBoost, or Support Vector Regression – selected individually for each model (see Supplementary Materials, S6).

      For rsMRI phenotypes, we treated the choice of functional connectivity quantification method – full correlation, partial correlation, or tangent space parametrization – as a hyperparameter. The method yielding the highest performance on the outer-fold training set was selected for predicting the g-factor (see Supplementary Materials, S5).

      To prevent data leakage, we standardized the data using the mean and standard deviation derived from the training set and applied these parameters to the corresponding test set within each outer fold. This standardization was performed at three key stages: before g-factor derivation, before regressing out modality-specific confounds from the MRI data, and before stacking. Similarly, to maintain strict separation between training and testing data, both base and stacked models were trained exclusively on participants from the outer-fold training set and subsequently applied to the corresponding outer-fold test set.

      To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g_factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (_r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”

      (2) If only 40% of the cognition-mental health covariation can be explained by the MRI variables, how to explain the other 60% of the variance? And related to this %: why do the author think that 'this provides us confidence in using MRI to derive quantitative neuromarkers of cognition'?

      Thank you for this insightful observation. Using the MRI modalities available in the UK Biobank, we were able to account for 48% of the covariation between cognition and mental health. The remaining 52% of unexplained variance may arise from several sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research from our group and others has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank.

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the Research Domain Criteria (RDoC) framework, brain circuits represent only one level of neurobiological analysis relevant to cognition. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. We have now incorporated these considerations into the Discussion section.

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Regarding our confidence in using MRI to derive neural markers for cognition, we base this on the predictive performance of MRI-based models. As we note in the Discussion (Line 554: “Consistent with previous studies, we show that MRI data predict individual differences in cognition with a medium-size performance (r ≈ 0.4) [15–17, 28, 61, 67, 68].”), the medium effect size we observed (r ≈ 0.4) agrees with existing literature on brain-cognition relationships, confirming that machine learning leads to replicable results. This effect size represents a moderate yet meaningful association in neuroimaging studies of aging, consistent with reports linking brain to behaviour in adults (Krämer et al., 2024; Tetereva et al., 2022). For example, a recent meta-analysis by Vieira and colleagues (2022) reported a similar effect size (r = 0.42, 95% CI [0.35;0.50]). Our study includes over 15000 participants, comparable to or more than typical meta-analyses, allowing us to characterise our work as a “mega-analysis”. And on top of this predictive performance, we found our neural markers for cognition to capture half of the cognition-mental health covariation, boosting our confidence in our approach.

      Krämer C, Stumme J, da Costa Campos L, Dellani P, Rubbert C, Caspers J, et al. Prediction of cognitive performance differences in older age from multimodal neuroimaging data. GeroScience. 2024;46:283–308.

      Tetereva A, Li J, Deng JD, Stringaris A, Pat N. Capturing brain cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage. 2022;263:119588.

      (3) Imagine that we can increase the explained variance using multimodal MRI measures, why is it useful? What does it learn us? What might be the implications?

      We assume that by variance, Reviewer 1 referred to the cognition-mental health covariation mentioned in point 2) above.

      If we can increase the explained cognition-mental health covariation using multimodal MRI measures, it would mean that we have developed a reasonable neuromarker that is close to RDoC’s neurobiological unit of analysis for cognition. RDoC treats cognition as one of the main basic functional domains that transdiagnostically underly mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. This means RDoC aims to discover neural markers of cognition that explain the covariation between cognition and mental health. For us, we approach the development of such neural markers using multimodal neuroimaging. We have now explained the motivation of our study in the first paragraph of the Introduction.

      Line 43: “Cognition and mental health are closely intertwined [1]. Cognitive dysfunction is present in various mental illnesses, including anxiety [2, 3], depression [4–6], and psychotic disorders [7–12]. National Institute of Mental Health’s Research Domain Criteria (RDoC) [13,14] treats cognition as one of the main basic functional domains that transdiagnostically underly mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. In this study, we aim to examine how the covariation between cognition and mental health is reflected in neural markers of cognition, as measured through multimodal neuroimaging.”

      More specific issues:

      Introduction

      (4) In the intro the sentence 'in some cases, altered cognitive functioning is directly related to psychiatric symptom severity' is in contrast to the next sentence '... are often stable and persist upon alleviation of psychiatric symptoms'.

      Thank you for pointing this out. The first sentence refers to cases where cognitive deficits fluctuate with symptom severity, while the second emphasizes that core cognitive impairments often remain stable even during symptom remission. To avoid this confusion, we have removed these sentences.

      (5) In the intro the text on the methods (various MRI modalities) is not needed for the Biol Psych readers audience.

      We appreciate your comment. While some members of our target audience may have backgrounds in neuroimaging, machine learning, or psychiatry, we recognize that not all readers will be familiar with all three areas. To ensure accessibility for those who are not familiar with neuroimaging, we included a brief overview of the MRI modalities and quantification methods used in our study to provide context for the specific neuroimaging phenotypes. Additionally, we provided background information on the machine learning techniques employed, so that readers without a strong background in machine learning can still follow our methodology.

      (6) Regarding age of the study sample: I understand that at recruitment the subjects' age ranges from 40 to 69 years. At MRI scanning the age ranges between about 46 to 82. How is that possible? And related to the age of the population: how did the authors deal with age in the analyses, since age is affecting both cognition as the brain measures?

      Thank you for noticing this. In the Methods section, we first outline the characteristics of the UK Biobank cohort, including the age at first recruitment (40-69 years). Table 1 then shows the characteristics of participant subsamples included in each analysis. Since our study used data from Instance 2 (the second in-person visit), participants were approximately 5-13 years older at scanning, resulting in the age range of 46 to 82 years. We clarified the Table 1 caption as follows:

      Line 113: “Table 1. Demographics for each subsample analysed: number, age, and sex of participants who completed all cognitive tests, mental health questionnaires, and MRI scanning”

      We acknowledge that age may influence cognitive and neuroimaging measures. In our analyses, we intentionally preserved age-related variance in brain-cognition relationships across mid and late adulthood, as regressing out age completely would artificially remove biologically meaningful associations. At the same time, we rigorously addressed the effects of age and sex through additional commonality analyses quantifying age and sex contributions to the relationship between cognition and mental health.

      As noted by Reviewer 1 and illustrated in Figure 8, age and sex shared substantial overlapping variance with both mental health and neuroimaging phenotypes in explaining cognitive outcomes. For example, in Figure 8i, age and sex together accounted for 43% of the variance in the cognition-mental health relationship:

      (2.76 + 1.03) / (2.76 + 1.03 + 3.52 + 1.45) ≈ 0.43

      Furthermore, neuromarkers from the all-MRI stacked model explained 72% of this age/sexrelated variance:

      2.76 / (2.76 + 1.03) ≈ 0.72

      This indicates that our neuromarkers captured a substantial portion of the cognition-mental health covariation that varied with age and sex, highlighting their relevance in age/sex-sensitive cognitive modeling.

      In the Methods, Results, and Discussion, we say:

      Methods

      Line 263: “To understand how demographic factors, including age and sex, contribute to this relationship, we also conducted a separate set of commonality analyses treating age, sex, age2, age×sex, and age2×sex as an additional set of explanatory variables (Fig. 1).”

      Results

      Line 445: “Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship. Multimodal neural marker of cognition based on three MRI modalities (“All MRI Stacked”) explained 72% of this age and sex-related variance (Fig. 8i–l and Table S21).”

      Discussion

      Line 660: “We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.”

      (7) Regarding the mental health variables: where characteristics with positive value (e.g. happiness and subjective wellbeing) reversely scored (compared to the negative items, such as anxiety, addition, etc)?

      We appreciate you noting this. These composite scores primarily represent standard clinical measures such as the GAD-7 anxiety scale and N-12 neuroticism scale. We did not reverse the scores to keep their directionality, therefore making interpretability consistent with the original studies the scores were derived from (e.g., Davis et al., 2020; Dutt et al., 2022). Complete descriptive statistics for all mental health indices and detailed derivation procedures are provided in the Supplementary Materials (S2). On Page 6, Supplementary Methods, we say:

      Line 92: “Composite mental health scores included the Generalized Anxiety Disorder (GAD-7), the Posttraumatic Stress Disorder (PTSD) Checklist (PCL-6), the Alcohol Use Disorders Identification Test (AUDIT), the Patient Health Questionnaire (PHQ-9) [12], the Eysenck Neuroticism (N-12), Probable Depression Status (PDS), and the Recent Depressive Symptoms (RDS-4) scores [13, 14]. To calculate the GAD-7, PCL-6, AUDIT, and PHQ-9, we used questions introduced at the online follow-up [12]. To obtain the N-12, PDS, and RDS-4 scores [14], we used data collected during the baseline assessment [13, 14].

      We subcategorized depression and GAD based on frequency, current status (ever had depression or anxiety and current status of depression or anxiety), severity, and clinical diagnosis (depression or anxiety confirmed by a healthcare practitioner). Additionally, we differentiated between different depression statuses, such as recurrent depression, depression triggered by loss, etc. Variables related to self-harm were subdivided based on whether a person has ever self-harmed with the intent to die.

      To make response scales more intuitive, we recorded responses within the well-being domain such that the lower score corresponded to a lesser extent of satisfaction (“Extremely unhappy”) and the higher score indicated a higher level of happiness (“Extremely happy”). For all questions, we assigned the median values to “Prefer not to answer” (-818 for in-person assessment and -3 for online questionnaire) and “Do not know” (-121 for in-person assessment and -1 for online questionnaire) responses. We excluded the “Work/job satisfaction” question from the mental health derivatives list because it included a “Not employed” response option, which could not be reasonably coded.

      To calculate the risk of PTSD, we used questions from the PCL-6 questionnaire. Following Davis and colleagues [12], PCL-6 scores ranged from 6 to 29. A PCL-6 score of 12 or below corresponds to a low risk of meeting the Clinician-Administered PTSD Scale diagnostic criteria. PCL-6 scores between 13 and 16 and between 17 and 25 are indicative of an increased risk and high risk of PTSD, respectively. A score of above 26 is interpreted as a very high risk of PTSD [12, 15]. PTSD status was set to positive if the PCL-6 score exceeded or was equal to 14 and encompassed stressful events instead of catastrophic trauma alone [12].

      To assess alcohol consumption, alcohol dependence, and harm associated with drinking, we calculated the sum of the ten questions from the AUDIT questionnaire [16]. We additionally subdivided the AUDIT score into the alcohol consumption score (questions 1-3, AUDIT-C) and the score reflecting problems caused by alcohol (questions 4-10, AUDIT-P) [17]. In questions 2-10 that followed the first trigger question (“Frequency of drinking alcohol”), we replaced missing values with 0 as they would correspond to a “Never” response to the first question.

      An AUDIT score cut-off of 8 suggests moderate or low-risk alcohol consumption, and scores of 8 to 15 and above 15 indicate severe/harmful and hazardous (alcohol dependence or moderate-severe alcohol use disorder) drinking, respectively [16, 18]. Subsequently, hazardous alcohol use and alcohol dependence status correspond to AUDIT scores of ≥ 8 and ≥ 15, respectively. The “Alcohol dependence ever” status was set to positive if a participant had ever been physically dependent on alcohol. To reduce skewness, we logx+1-transformed the AUDIT, AUDIT-C, and AUDIT-P scores [17].”

      Davis KAS, Coleman JRI, Adams M, Allen N, Breen G, Cullen B, et al. Mental health in UK Biobank – development, implementation and results from an online questionnaire completed by 157 366 participants: a reanalysis. BJPsych Open. 2020;6:e18.

      Dutt RK, Hannon K, Easley TO, Griffis JC, Zhang W, Bijsterbosch JD. Mental health in the UK Biobank: A roadmap to selfreport measures and neuroimaging correlates. Hum Brain Mapp. 2022;43:816–832.  

      (8) In the discussion section (page 23, line 416-421), the authors refer to specific findings that are not described in the results section > I would add these findings to the main manuscript (including the discussion / interpretation).

      We appreciate your careful reading. We agree that our original Results section did not explicitly describe the factor loadings for mental health in the PLSR model, despite discussing their implications later in the paper. We needed to include this part of the discussion in the Supplementary Materials to meet the word limit of the original submission. However, in response to your suggestion, we have now added the results regarding factor loadings to the Results section. We also moved the discussion of the association between mental health features and general cognition from the Supplementary Material to the manuscript’s Discussion.

      Results

      Line 298: “On average, information about mental health predicted the g-factor at  R<sup>2</sup><sub>mean</sub> = 0.10 and r<sub>mean</sub> \= 0.31 (95% CI [0.291, 0.315]; Fig. 2b and 2c and Supplementary Materials, S9, Table S12). The magnitude and direction of factor loadings for mental health in the PLSR model allowed us to quantify the contribution of individual mental health indices to cognition. Overall, the scores for mental distress, alcohol and cannabis use, and self-harm behaviours relate positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events relate negatively to cognition.”

      Discussion

      Line 492: “Factor loadings derived from the PLSR model showed that the scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.

      Limited evidence supports a positive association between self-harm behaviours and cognitive abilities, with some studies indicating higher cognitive performance as a risk factor for non-suicidal self-harm. Research shows an inverse relationship between cognitive control of emotion and suicidal behaviours that weakens over the life course [73,74]. Some studies have found a positive correlation between cognitive abilities and the risk of nonsuicidal self-harm, suicidal thoughts, and suicidal plans that may be independent of or, conversely, affected by socioeconomic status [75,76]. In our study, the magnitude of the association between self-harm behaviours and cognition was low (Fig. 2), indicating a weak relationship.

      Positive PLSR loadings of features related to alcohol and cannabis may also indicate the influence of other factors. Overall, this relationship is believed to be largely affected by age, income, education, social status, social equality, social norms, and quality of life [79–80]. For example, education level and income correlate with cognitive ability and alcohol consumption [79,81–83]. Research also links a higher probability of having tried alcohol or recreational drugs, including cannabis, to a tendency of more intelligent individuals to approach evolutionary novel stimuli [84,85]. This hypothesis is supported by studies showing that cannabis users perform better on some cognitive tasks [86]. Alternatively, frequent drinking can indicate higher social engagement, which is positively associated with cognition [87]. Young adults often drink alcohol as a social ritual in university settings to build connections with peers [88]. In older adults, drinking may accompany friends or family visits [89,90]. Mixed evidence on the link between alcohol and drug use and cognition makes it difficult to draw definite conclusions, leaving an open question about the nature of this relationship.

      Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].

      In agreement with our findings, cognitive deficits are often found in psychotic disorders [104,105]. We treated neurological and mental health symptoms as predictor variables and did not stratify or exclude people based on psychiatric status or symptom severity. Since no prior studies have examined isolated psychotic symptoms (e.g., recent unusual experiences, hearing unreal voices, or seeing unreal visions), we avoid speculating on how these symptoms relate to cognition in our sample.

      Finally, negative PLSR loadings of the features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

      (9) In the discussion section (page 24, line 440-449), the authors give an explanation on why the diffusion measure have limited utility, but the arguments put forward also concern structural and rsfMRI measures.

      Thank you for this important observation. Indeed, the argument about voxel-averaged diffusion components (“… these metrics are less specific to the properties of individual white matter axons or bundles, and instead represent a composite of multiple diffusion components averaged within a voxel and across major fibre pathways”) could theoretically apply across other MRI modalities. We have therefore removed this point from the discussion to avoid overgeneralization. However, we maintain our central argument about the biological specificity of conventional tractography-derived diffusion metrics as their particular sensitivity to white matter microstructure (e.g., axonal integrity, myelin content) may make them better suited for detecting neuropathological changes than dynamic cognitive processes. This interpretation aligns with the mixed evidence linking these metrics to cognitive performance, despite their established utility in detecting white matter abnormalities in clinical populations (e.g., Bergamino et al., 2021; Silk et al., 2009). We clarify this distinction in the manuscript.

      Line 572: “The somewhat limited utility of diffusion metrics derived specifically from probabilistic tractography in serving as robust quantitative neuromarkers of cognition and its shared variance with mental health may stem from their greater sensitivity and specificity to neuronal integrity and white matter microstructure rather than to dynamic cognitive processes. Critically, probabilistic tractography may be less effective at capturing relationships between white matter microstructure and behavioural scores cross-sectionally, as this method is more sensitive to pathological changes or dynamic microstructural alterations like those occurring during maturation. While these indices can capture abnormal white matter microstructure in clinical populations such as Alzheimer’s disease, schizophrenia, or attention deficit hyperactivity disorder (ADHD) [117–119], the empirical evidence on their associations with cognitive performance is controversial [114, 120–126].”

      Bergamino M, Walsh RR, Stokes AM. Free-water diffusion tensor imaging improves the accuracy and sensitivity of white matter analysis in Alzheimer’s disease. Sci Rep. 2021;11:6990.

      Silk TJ, Vance A, Rinehart N, Bradshaw JL, Cunnington R. White-matter abnormalities in attention deficit hyperactivity disorder: a diffusion tensor imaging study. Hum Brain Mapp. 2009;30:2757–2765.

      Reviewer 2:

      This is an interesting study combining a lot of data to investigate the link between cognition and mental health. The description of the study is very clear, it's easy to read for someone like me who does not have a lot of expertise in machine learning.

      We thank you for your thorough review and constructive feedback. Your insightful comments have helped us identify conceptual and methodological aspects that required improvement in the manuscript. We have incorporated relevant changes throughout the paper, and below, we address each of your points in detail.

      Comment 1: My main concern with this manuscript is that it is not yet clear to me what it exactly means to look at the overlap between cognition and mental health. This relation is r=0.3 which is not that high, so why is it then necessary to explain this overlap with neuroimaging measures? And, could it be that the relation between cognition and mental health is explained by third variables (environment? opportunities?). In the introduction I miss an explanation of why it is important to study this and what it will tell us, and in the discussion I would like to read some kind of 'answer' to these questions.

      Thank you. It’s important to clarify why we investigated the relationship between cognition and mental health, and what we found using data from the UK Biobank.

      Conceptually, our work is grounded in the Research Domain Criteria (RDoC; Insel et al., 2010) framework. RDoC conceptualizes mental health not through traditional diagnostic categories, but through core functional domains that span the full spectrum from normal to abnormal functioning. These domains include cognition, negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. Within this framework, cognition is considered a fundamental domain that contributes to mental health across diagnostic boundaries. Meta-analytic evidence supports a link between cognitive functioning and mental health (Abramovitch, et al., 2021; East-Richard, et al., 2020). In the context of a large, population-based dataset like the UK Biobank, this implies that cognitive performance – as measured by various cognitive tasks – should be meaningfully associated with available mental health indicators.

      However, because cognition is only one of several functional domains implicated in mental health, we do not expect the covariation between cognition and mental health to be very high. Other domains, such as negative and positive valence systems, arousal and regulatory systems, or social processing, may also play significant roles. Theoretically, this places an upper bound on the strength of the cognition-mental health relationship, especially in normative, nonclinical samples.

      Our current findings from the UK Biobank reflect this. Most of the 133 mental health variables showed relatively weak individual correlations with cognition (mean r \= 0.01, SD = 0.05, min r \= –0.08, max r \= 0.17; see Figure 2). However, using a PLS-based machine learning approach, we were able to integrate information across all mental-health variables to predict cognition, yielding an out-of-sample correlation of r = 0.31 [95% CI: 0.29, 0.32].  

      We believe this estimate approximates the true strength of the cognition-mental health relationship in normative samples, consistent with both theoretical expectations and prior empirical findings. Theoretically, this aligns with the RDoC view that cognition is one of several contributing domains. Empirically, our results are consistent with findings from our previous mega-analysis in children (Wang et al., 2025). Moreover, in the field of gerontology, an effect size of r = 0.31 is not considered small. According to Brydges (2019), it falls around the 70th percentile of effect sizes reported in gerontological studies and approaches the threshold for a large effect (r \= 0.32). Given that most studies report within-sample associations, our out-of-sample results are likely more robust and generalizable (Yarkoni & Westfall, 2017).

      To answer, “why is it then necessary to explain this overlap with neuroimaging measures”, we again draw on the conceptual foundation of the RDoC framework. RDoC emphasizes that each functional domain, such as cognition, should be studied not only at the behavioural level but also across multiple neurobiological units of analysis, including genes, molecules, cells, circuits, physiology, and behaviour.

      MRI-based neural markers represent one such level of analysis. While other biological systems (e.g., genetic, molecular, or physiological) also contribute to the cognition-mental health relationship, neuroimaging provides unique insights into the brain mechanisms underlying this association – insights that cannot be obtained from behavioural data alone.

      In response to the related question, “Could the relationship between cognition and mental health be explained by third variables (e.g., environment, opportunities)?”, we note that developing a neural marker of cognition capable of capturing its relationship with mental health is the central aim of this study. Using the MRI modalities available in the UK Biobank, we were able to account for 48% of the covariation between cognition and mental health.

      The remaining 52% of unexplained variance may stem from several sources. According to the RDoC framework, neuromarkers could be further refined by incorporating additional neuroimaging modalities (e.g., task-based fMRI, PET, ASL, MEG/EEG, fNIRS) and integrating other units of analysis such as genetic, molecular, cellular, and physiological data.

      Once more comprehensive neuromarkers are developed, capturing a greater proportion of the cognition-mental health covariation, they may also lead to new research direction – to investigate how environmental factors and life opportunities influence these markers. However, exploring those environmental contributions lies beyond the scope of the current study.

      We discuss these considerations and explain the motivation of our study in the revised Introduction and Discussion.

      Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Introduction

      Line 43: “Cognition and mental health are closely intertwined [1]. Cognitive dysfunction is present in various mental illnesses, including anxiety [2, 3], depression [4–6], and psychotic disorders [7–12]. National Institute of Mental Health’s Research Domain Criteria (RDoC) [13,14] treats cognition as one of the main basic functional domains that transdiagnostically underly mental health. According to RDoC, mental health should be studied in relation to cognition, alongside other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. RDoC further emphasizes that each domain, including cognition, should be investigated not only at the behavioural level but also through its neurobiological correlates. In this study, we aim to examine how the covariation between cognition and mental health is reflected in neural markers of cognition, as measured through multimodal neuroimaging.”

      Discussion

      Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, et al. Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. AJP. 2010;167:748–751.

      Abramovitch, A., Short, T., & Schweiger, A. (2021). The C Factor: Cognitive dysfunction as a transdiagnostic dimension in psychopathology. Clinical Psychology Review, 86, 102007.

      East-Richard, C., R. -Mercier, A., Nadeau, D., & Cellard, C. (2020). Transdiagnostic neurocognitive deficits in psychiatry: A review of meta-analyses. Canadian Psychology / Psychologie Canadienne, 61(3), 190–214.

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025.14:RP105537.

      Brydges CR. Effect Size Guidelines, Sample Size Calculations, and Statistical Power in Gerontology. Innovation in Aging. 2019;3(4):igz036.

      Yarkoni T, Westfall J. Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspect Psychol Sci. 2017;12(6):1100-1122.

      Comment 2 Title: - Shouldn't it be "MRI markers" (plural)?

      We used the singular form (“marker”) intentionally, as it refers to the composite neuroimaging marker derived from all three MRI modalities in our stacked model. This multimodal marker represents the combined predictive power of all modalities and captures the highest proportion of the mental health-cognition relationship in our analyses.

      Comment 3: Introduction - I miss an explanation of why it is useful to look at cognition-mental health covariation

      We believe we have sufficiently addressed this comment in our response to Reviewer 2, comment 1 above.

      Comment 4: - "Demonstrating that MRI-based neural indicators of cognition capture the covariation between cognition and mental health will thereby support the utility of such indicators for understanding the etiology of mental health" (page 4, line 56-58) - how/why?

      Previous research has largely focused on developing MRI-based neural indicators that accurately predict cognitive performance (Marek et al., 2022; Vieira et al., 2020). Building on this foundation, our findings further demonstrate that the predictive performance of a neural indicator for cognition is closely tied to its ability to explain the covariation between cognition and mental health. In other words, the robustness of a neural indicator – its capacity to capture individual differences in cognition – is strongly associated with how well it reflects the shared variance between cognition and mental health.

      This insight is particularly important within the context of the RDoC framework, which seeks to understand the etiology of mental health through functional domains (such as cognition) and their underlying neurobiological units of analysis (Insel et al., 2010). According to RDoC, for a neural indicator of cognition to be informative for mental health research, it must not only predict cognitive performance but also capture its relationship with mental health.

      Furthermore, RDoC emphasizes the integration of neurobiological measures to investigate the influence of environmental and developmental factors on mental health. In line with this, our neural indicators of cognition may serve as valuable tools in future research aimed at understanding how environmental exposures and developmental trajectories shape mental health outcomes. We discuss this in more detail in the revised Discussion.

      Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Marek S, Tervo-Clemmens B, Calabro FJ, Montez DF, Kay BP, Hatoum AS, et al. Reproducible brain-wide association studies require thousands of individuals. Nature. 2022;603:654–660.

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, et al. Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. AJP. 2010;167:748–751.

      Comment 5: - The explanation about the stacking approach is not yet completely clear to me. I don't understand how the target variable can be the dependent variable in both step one and two. Or are those different variables? It would be helpful to also give an example of the target variable in line 88 on page 5

      Thank you for this excellent question. In our stacking approach, the same target variable, the g-factor, is indeed used across both modeling stages, but with a key distinction in how predictions are generated and integrated.

      In the first-level models, we trained separate Partial Least Squares Regression (PLSR) models for each of the 72 neuroimaging phenotypes, each predicting the g-factor independently. The predicted values from these 72 models were then used as input features for the second-level stacked model, which combined them to generate a final prediction of the g-factor. This twostage framework enables us to integrate information across multiple imaging modalities while maintaining a consistent prediction target.

      To avoid data leakage, both modeling stages were conducted entirely within the training set for each cross-validation fold. Only after the second-level model was trained was it applied to the outer-fold test participants who were not involved in any part of the model training process.

      To improve accessibility, we have revised the Methods section (see Page 10) to clarify this approach, ensuring that the description remains technically accurate while being easier to follow.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outerfold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination ( R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.

      To model the relationship between mental health and cognition, we employed Partial Least Squares Regression (PLSR) to predict the g-factor from 133 mental health variables. To model the relationship between neuroimaging data and cognition, we used a two-step stacking approach [15–17,61] to integrate information from 72 neuroimaging phenotypes across three MRI modalities. In the first step, we trained 72 base (first-level) PLSR models, each predicting the g-factor from a single neuroimaging phenotype. In the second step, we used the predicted values from these base models as input features for stacked models, which again predicted the g-factor. We constructed four stacked models based on the source of the base predictions: one each for dwMRI, rsMRI, sMRI, and a combined model incorporating all modalities (“dwMRI Stacked”, “rsMRI Stacked”, “sMRI Stacked”, and “All MRI Stacked”, respectively). Each stacked model was trained using one of four machine learning algorithms – ElasticNet, Random Forest, XGBoost, or Support Vector Regression – selected individually for each model (see Supplementary Materials, S6).

      For rsMRI phenotypes, we treated the choice of functional connectivity quantification method – full correlation, partial correlation, or tangent space parametrization – as a hyperparameter. The method yielding the highest performance on the outer-fold training set was selected for predicting the g-factor (see Supplementary Materials, S5).

      To prevent data leakage, we standardized the data using the mean and standard deviation derived from the training set and applied these parameters to the corresponding test set within each outer fold. This standardization was performed at three key stages: before g-factor derivation, before regressing out modality-specific confounds from the MRI data, and before stacking. Similarly, to maintain strict separation between training and testing data, both base and stacked models were trained exclusively on participants from the outer-fold training set and subsequently applied to the corresponding outer-fold test set.

      To evaluate model performance and assess statistical significance, we aggregated the predicted and observed gfactor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”

      Comment 6: Methods - It's not clear from the text and Figure 1 which 12 scores from 11 tests are being used to derive the g-factor. Figure 1 shows only 8 bullet points with 10 scores in A and 13 tests under 'Cognitive tests' in B. Moreover, Supplement S1 describes 12 tests and 14 measures (Prospective Memory test is in the text but not in Supplementary Table 1).

      Thank you for identifying this discrepancy. In the original Figure 1b and in the Supplementary Methods (S1), the “Prospective Memory” test was accidentally duplicated, while it was present in the Supplementary Table 1 (Line 53, Supplementary Table 1). We have now corrected both figures for consistency. To clarify: Figure 1a presents the global mental health and cognitive domains studied, while Figure 1b now accurately lists 1) the 12 cognitive scores from 11 tests used to derive the g-factor (with the Trail Making Test contributing two measures – numeric and alphabetic trails) and 2) the three main categories of mental health indices used as machine learning features.

      We also corrected the Supplementary Materials to remove the duplicate test from the first paragraph. In Supplementary Table 1, there were 11 tests listed, and for the Trail Making test, we specified in the “Core measures” column that this test had 2 derivative scores: duration to complete the numeric path (Trail 1) and duration to complete the alphabetic path (Trail 2).

      Supplementary Materials, Line 46: “We used twelve scores from the eleven cognitive tests that represented the following cognitive domains: reaction time and processing speed (Reaction Time test), working memory (Numeric Memory test), verbal and numerical reasoning (Fluid Intelligence test), executive function (Trail Making Test), non-verbal fluid reasoning (Matrix Pattern Completion test), processing speed (Symbol Digit Substitution test), vocabulary (Picture Vocabulary test), planning abilities (Tower Rearranging test), verbal declarative memory (Paired Associate Learning test), prospective memory (Prospective Memory test), and visual memory (Pairs Matching test) [1].”

      Comment 7: - For the mental health measures: If I understand correctly, the questionnaire items were used individually, but also to create composite scores. This seems counterintuitive, because I would assume that if the raw data is used, the composite scores would not add additional information to that. When reading the Supplement, it seems like I'm not correct… It would be helpful to clarify the text on page 7 in the main text.

      You raise an excellent observation regarding the use of both individual questionnaire items and composite scores. This dual approach was methodologically justified by the properties of Partial Least Squares Regression (PLSR), our chosen first-level machine learning algorithm, which benefits from rich feature sets and can handle multicollinearity through dimensionality reduction. PLSR transforms correlated features into latent variables, meaning both individual items and composite scores can contribute unique information to the model. We elaborate on PLSR's mathematical principles in Supplementary Materials (S5).

      To directly address this concern, we conducted comparative analyses showing that the PLSR model (a single 80/20% training/test split), incorporating all 133 mental health features (both items and composites), outperformed models using either type alone. The full model achieved superior performance (MSE = 0.458, MAE = 0.537, \= 0.112, Pearson r = 0.336, p-value = 6.936e-112) compared to using only composite scores (93 features; MSE = 0.461, MAE = 0.538, R<sup>2</sup> = 0.107, Pearson r = 0.328, p-value = 5.8e-106) or only questionnaire items (40 features; MSE = 0.499, MAE = 0.561, R<sup>2</sup> = 0.033, Pearson r = 0.184, p-value = 2.53e-33). These results confirm that including both data types provide complementary predictive value. We expand on these considerations in the revised Methods section.

      Line 123: “Mental health measures encompassed 133 variables from twelve groups: mental distress, depression, clinical diagnoses related to the nervous system and mental health, mania (including bipolar disorder), neuroticism, anxiety, addictions, alcohol and cannabis use, unusual/psychotic experiences, traumatic events, selfharm behaviours, and happiness and subjective well-being (Fig. 1 and Tables S4 and S5). We included both selfreport questionnaire items from all participants and composite diagnostic scores computed following Davis et al. and Dutt et al. [35,36] as features in our first-level (for explanation, see Data analysis section) Partial Least Squares Regression (PLSR) model. This approach leverages PLSR’s ability to handle multicollinearity through dimensionality reduction, enabling simultaneous use of granular symptom-level information and robust composite measures (for mental health scoring details, see Supplementary Materials, S2). We assess the contribution of each mental health index to general cognition by examining the direction and magnitude of its PLSR-derived loadings on the identified latent variables”

      Comment 8: - Results - The colors in Figure 4 B are a bit hard to differentiate.

      We have updated Figure 4 to enhance colour differentiation by adjusting saturation and brightness levels, improving visual distinction. For further clarity, we split the original figure into two separate figures.

      Comment 9: - Discussion - "Overall, the scores for mental distress, alcohol and cannabis use, and self-harm behaviours relate positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events relate negatively to cognition," - this seems counterintuitive, that some symptoms relate to better cognition and others relate to worse cognition. Could you elaborate on this finding and what it could mean?

      We appreciate you highlighting this important observation. While some associations between mental health indices and cognition may appear counterintuitive at first glance, these patterns are robust (emerging consistently across both univariate correlations and PLSR loadings) and align with previous literature (e.g., Karpinski et al., 2018; Ogueji et al., 2022). For instance, the positive relationship between cognitive ability and certain mental health indicators like help-seeking behaviour has been documented in other population studies (Karpinski et al., 2018; Ogueji et al., 2022), potentially reflecting greater health literacy and access to care among cognitively advantaged individuals. Conversely, the negative associations with conditions like psychotic experiences mirror established neurocognitive deficits in these domains.

      As was initially detailed in Supplementary Materials (S12) and now expanded in our Discussion, these findings likely reflect complex multidimensional interactions. The positive loadings for mental distress indicators may capture: (1) greater help-seeking behaviour among those with higher cognition and socioeconomic resources, and/or (2) psychological overexcitability and rumination tendencies in high-functioning individuals. These interpretations are particularly relevant to the UK Biobank's assessment methods, where mental distress items focused on medical help-seeking rather than symptom severity per se (e.g., as a measure of mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress).

      Line 492: “Factor loadings derived from the PLSR model showed that the scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.

      Limited evidence supports a positive association between self-harm behaviours and cognitive abilities, with some studies indicating higher cognitive performance as a risk factor for non-suicidal self-harm. Research shows an inverse relationship between cognitive control of emotion and suicidal behaviours that weakens over the life course [73,74]. Some studies have found a positive correlation between cognitive abilities and the risk of nonsuicidal self-harm, suicidal thoughts, and suicidal plans that may be independent of or, conversely, affected by socioeconomic status [75,76]. In our study, the magnitude of the association between self-harm behaviours and cognition was low (Fig. 2), indicating a weak relationship.

      Positive PLSR loadings of features related to alcohol and cannabis may also indicate the influence of other factors. Overall, this relationship is believed to be largely affected by age, income, education, social status, social equality, social norms, and quality of life [79–80]. For example, education level and income correlate with cognitive ability and alcohol consumption [79,81–83]. Research also links a higher probability of having tried alcohol or recreational drugs, including cannabis, to a tendency of more intelligent individuals to approach evolutionary novel stimuli [84,85]. This hypothesis is supported by studies showing that cannabis users perform better on some cognitive tasks [86]. Alternatively, frequent drinking can indicate higher social engagement, which is positively associated with cognition [87]. Young adults often drink alcohol as a social ritual in university settings to build connections with peers [88]. In older adults, drinking may accompany friends or family visits [89,90]. Mixed evidence on the link between alcohol and drug use and cognition makes it difficult to draw definite conclusions, leaving an open question about the nature of this relationship.

      Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].

      In agreement with our findings, cognitive deficits are often found in psychotic disorders [104,105]. We treated neurological and mental health symptoms as predictor variables and did not stratify or exclude people based on psychiatric status or symptom severity. Since no prior studies have examined isolated psychotic symptoms (e.g., recent unusual experiences, hearing unreal voices, or seeing unreal visions), we avoid speculating on how these symptoms relate to cognition in our sample.

      Finally, negative PLSR loadings of the features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

      Karpinski RI, Kinase Kolb AM, Tetreault NA, Borowski TB. High intelligence: A risk factor for psychological and physiological overexcitabilities. Intelligence. 2018;66:8–23.

      Ogueji IA, Okoloba MM. Seeking Professional Help for Mental Illness: A Mixed-Methods Study of Black Family Members in the UK and Nigeria. Psychol Stud. 2022;67:164–177.

      Comment 10: - All neuroimaging factors together explain 48% of the variance in the cognition-mental health relationship. However, this relationship is only r=0.3 - so then the effect of neuroimaging factors seems a lot smaller… What does it mean?

      Thank you for raising this critical point. We have addressed this point in our response to Reviewer 1, comment 2, Reviewer 1, comment 3 and Reviewer 2, comment 1.

      Briefly, cognition is related to mental health at around r = 0.3 and to neuroimaging phenotypes at around r = 0.4. These levels of relationship strength are consistent to what has been shown in the literature (e.g., Wang et al., 2025 and Vieira et al., 2020). We discussed the relationship between cognition and mental health in our response to Reviewer 2, comment 1 above. In short, this relationship reflects just one functional domain – mental health may also be associated with other domains such as negative and positive valence systems, arousal and regulatory systems, social processes, and sensorimotor functions. Moreover, in the context of gerontology research, this effect size is considered relatively large (Brydges et al., 2019).

      We conducted a commonality analysis to investigate the unique and shared variance of mental health and neuroimaging phenotypes in explaining cognition.  As we discussed in our response to Reviewer 1, comment 2, we were able to account for 48% of the covariation between cognition and mental health using the MRI modalities available in the UK Biobank. The remaining 52% of unexplained variance may arise from several sources.

      One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research from our group and others has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank (Tetereva et al., 2025).

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      We have now incorporated these considerations into the Discussion section.

      Line 481: “Our analysis confirmed the validity of the g-factor [31] as a quantitative measure of cognition [31], demonstrating that it captures nearly half (39%) of the variance across twelve cognitive performance scores, consistent with prior studies [63–68]. Furthermore, we were able to predict cognition from 133 mental health indices, showing a medium-sized relationship that aligns with existing literature [69,70]. Although the observed mental health-cognition association is lower than within-sample estimates in conventional regression models, it aligns with our prior mega-analysis in children [69]. Notably, this effect size is not considered small in gerontology. In fact, it falls around the 70th percentile of reported effects and approaches the threshold for a large effect at r = 0.32 [71]. While we focused specifically on cognition as an RDoC core domain, the strength of its relationship with mental health may be bounded by the influence of other functional domains, particularly in normative, non-clinical samples – a promising direction for future research.”

      Line 658: “Although recent debates [18] have challenged the predictive utility of MRI for cognition, our multimodal marker integrating 72 neuroimaging phenotypes captures nearly half of the mental health-explained variance in cognition. We demonstrate that neural markers with greater predictive accuracy for cognition also better explain cognition-mental health covariation, showing that multimodal MRI can capture both a substantial cognitive variance and nearly half of its shared variance with mental health. Finally, we show that our neuromarkers explain a substantial portion of the age- and sex-related variance in the cognition-mental health relationship, highlighting their relevance in modeling cognition across demographic strata.

      The remaining unexplained variance in the relationship between cognition and mental health likely stems from multiple sources. One possibility is the absence of certain neuroimaging modalities in the UK Biobank dataset, such as task-based fMRI contrasts, positron emission tomography, arterial spin labeling, and magnetoencephalography/electroencephalography. Prior research has consistently demonstrated strong predictive performance from specific task-based fMRI contrasts, particularly those derived from tasks like the n-Back working memory task and the face-name episodic memory task, none of which is available in the UK Biobank [15,17,61,69,114,142,151].

      Moreover, there are inherent limitations in using MRI as a proxy for brain structure and function. Measurement error and intra-individual variability, such as differences in a cognitive state between cognitive assessments and MRI acquisition, may also contribute to the unexplained variance. According to the RDoC framework, brain circuits represent only one level of neurobiological analysis relevant to cognition [14]. Other levels, including genes, molecules, cells, and physiological processes, may also play a role in the cognition-mental health relationship.

      Nonetheless, neuroimaging provides a valuable window into the biological mechanisms underlying this overlap – insights that cannot be gleaned from behavioural data alone. Ultimately, our findings validate brain-based neural markers as a fundamental neurobiological unit of analysis, advancing our understanding of mental health through the lens of cognition.”

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025.14:RP105537.

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Brydges CR. Effect Size Guidelines, Sample Size Calculations, and Statistical Power in Gerontology. Innovation in Aging. 2019;3(4):igz036.

      Tetereva A, Knodt AR, Melzer TR, et al. Improving Predictability, Reliability and Generalisability of Brain-Wide Associations for Cognitive Abilities via Multimodal Stacking. Preprint. bioRxiv. 2025;2024.05.03.589404.

      Reviewer 3:

      Buianova et al. present a comprehensive analysis examining the predictive value of multimodal neuroimaging data for general cognitive ability, operationalized as a derived g-factor. The study demonstrates that functional MRI holds the strongest predictive power among the modalities, while integrating multiple MRI modalities through stacking further enhances prediction performance. The inclusion of a commonality analysis provides valuable insight into the extent to which shared and unique variance across mental health features and neuroimaging modalities contributes to the observed associations with cognition. The results are clearly presented and supported by highquality visualizations. Limitations of the sample are stated clearly.

      Thank you once more for your constructive and encouraging feedback. We appreciate your careful reading and valuable methodological insights. Your expertise has helped us clarify key methodological concepts and improve the overall rigour of our study.

      Suggestions for improvement:

      (1) The manuscript would benefit from the inclusion of permutation testing to evaluate the statistical significance of the predictive models. This is particularly important given that some of the reported performance metrics are relatively modest, and permutation testing could help ensure that results are not driven by chance.

      Thank you, this is an excellent point. We agree that evaluating the statistical significance of our predictive models is essential.

      In our original analysis, we assessed model performance by generating a bootstrap distribution of Pearson’s r, resampling the data with replacement 5,000 times (see Figure 3b). In response to your feedback, we have made the following updates:

      (1) Improved Figure 3b to explicitly display the 95% confidence intervals.

      (2) Supplemented the results by reporting the exact confidence interval values.

      (3) Clarified our significance testing procedure in the Methods section.

      We considered model performance statistically significant when the 95% confidence interval did not include zero, indicating that the observed associations are unlikely to have occurred by chance.

      We chose bootstrapping over permutation testing because, while both can assess statistical significance, bootstrapping additionally provides uncertainty estimates in the form of confidence intervals. Given the large sample size in our study, significance testing can be less informative, as even small effects may reach statistical significance. Bootstrapping offers a more nuanced understanding of model uncertainty.

      Line 233: “To evaluate model performance and assess statistical significance, we aggregated the predicted and observed g-factor values from each outer-fold test set. We then computed a bootstrap distribution of Pearson’s correlation coefficient (r) by resampling with replacement 5 000 times, generating 95% confidence intervals (CIs) (Fig. 1). Model performance was considered statistically significant if the 95% CI did not include zero, indicating that the observed associations were unlikely to have occurred by chance.”

      (2) Applying and testing the trained models on an external validation set would increase confidence in generalisability of the model.

      We appreciate this excellent suggestion. While we considered this approach, implementing it would require identifying an appropriate external dataset with comparable neuroimaging and behavioural measures, along with careful matching of acquisition protocols and variable definitions across sites. These challenges extend beyond the scope of the current study, though we fully agree that this represents an important direction for future research.

      Our findings, obtained from one of the largest neuroimaging datasets to date with training and test samples exceeding most previous studies, align closely with existing literature: the predictive accuracy of each neuroimaging phenotype and modality for cognition matches the effect size reported in meta-analyses (r ≈ 0.4; e.g., Vieira et al., 2020). The ability of dwMRI, rsMRI and sMRI to capture the cognition-mental health relationship is, in turn, consistent with our previous work in pediatric populations (Wang et al., 2025; Pat et al., 2022).

      Vieira S, Gong QY, Pinaya WHL, et al. Using Machine Learning and Structural Neuroimaging to Detect First Episode Psychosis: Reconsidering the Evidence. Schizophr Bull. 2020;46(1):17-26.

      Wang Y, Anney R, Pat N. The relationship between cognitive abilities and mental health as represented by cognitive abilities at the neural and genetic levels of analysis. eLife. 2025.14:RP105537.

      Pat N, Wang Y, Anney R, Riglin L, Thapar A, Stringaris A. Longitudinally stable, brain-based predictive models mediate the relationships between childhood cognition and socio-demographic, psychological and genetic factors. Hum Brain Mapp. 2022;43:5520–5542.

      (3) The rationale for selecting a 5-by-10-fold cross-validation scheme is not clearly explained. Clarifying why this structure was preferred over more commonly used alternatives, such as 10-by-10 or 5-by-5 cross-validation, would strengthen the methodological transparency.

      Thank you for this important methodological question. Our choice of a 5-by-10-fold crossvalidation scheme was motivated by the need to balance robust hyperparameter tuning with computational efficiency, particularly memory and processing time. Retaining five outer folds allowed us to rigorously assess model performance across multiple data partitions, leading to an outer-fold test set at least n = 4 000 and providing a substantial amount of neuroimaging data involved in model training. In contrast, employing ten inner folds ensured robust and stable hyperparameter tuning that maximizes the reliability of model selection. Thus, the 5-outer-fold with our large sample provided sufficient out-of-sample test set size for reliable model evaluation and efficient computation, while 10 inner folds enabled robust hyperparameter tuning. We now provide additional rationale for this design decision on Page 10.

      Line 188: “We employed nested cross-validation to predict cognition from mental health indices and 72 neuroimaging phenotypes (Fig. 1). Nested cross-validation is a robust method for evaluating machine-learning models while tuning their hyperparameters, ensuring that performance estimates are both accurate and unbiased. Here, we used a nested cross-validation scheme with five outer folds and ten inner folds.

      We started by dividing the entire dataset into five outer folds. Each fold took a turn being held out as the outerfold test set (20% of the data), while the remaining four folds (80% of the data) were used as an outer-fold training set. Within each outer-fold training set, we performed a second layer of cross-validation – this time splitting the data into ten inner folds. These inner folds were used exclusively for hyperparameter tuning: models were trained on nine of the inner folds and validated on the remaining one, cycling through all ten combinations.

      We then selected the hyperparameter configuration that performed best across the inner-fold validation sets, as determined by the minimal mean squared error (MSE). The model was then retrained on the full outer-fold training set using this hyperparameter configuration and evaluated on the outer-fold test set, using four performance metrics: Pearson r, the coefficient of determination ( R<sup>2</sup>), the mean absolute error (MAE), and the MSE. This entire process was repeated for each of the five outer folds, ensuring that every data point is used for both training and testing, but never at the same time. We opted for five outer folds instead of ten to reduce computational demands, particularly memory and processing time, given the substantial volume of neuroimaging data involved in model training. Five outer folds led to an outer-fold test set at least n = 4 000, which should be sufficient for model evaluation. In contrast, we retained ten inner folds to ensure robust and stable hyperparameter tuning, maximising the reliability of model selection.”

      (4) A more detailed discussion of which specific brain regions or features within each neuroimaging modality contributed most strongly to the prediction of cognition would enhance neurobiological relevance of the findings.

      Thank you for this thoughtful suggestion. To address this point, we have included feature importance plots for the top-performing neuroimaging phenotypes within each modality (Figure 5 and Figures S2–S4), demonstrating the relative contributions of individual features to the predictive models. While we maintain our primary focus on cross-modality performance comparisons in the main text, as this aligns with our central aim of evaluating multimodal MRI markers at the integrated level, we outline the contribution of neuroimaging features with the highest predictive performance for cognition in the revised Results and Discussion.

      Methods

      Line 255: “To determine which neuroimaging features contribute most to the predictive performance of topperforming phenotypes within each modality, while accounting for the potential latent components derived from neuroimaging, we assessed feature importance using the Haufe transformation [62]. Specifically, we calculated Pearson correlations between the predicted g-factor and scaled and centred neuroimaging features across five outer-fold test sets. We also examined whether the performance of neuroimaging phenotypes in predicting cognition per se is related to their ability to explain the link between cognition and mental health. Here, we computed the correlation between the predictive performance of each neuroimaging phenotype and the proportion of the cognition-mental health relationship it captures. To understand how demographic factors, including age and sex, contribute to this relationship, we also conducted a separate set of commonality analyses treating age, sex, age<sup>2</sup>, age×sex, and age<sup>2</sup>×sex as an additional set of explanatory variables (Fig. 1).”

      Results

      dwMRI

      Line 331: “Overall, models based on structural connectivity metrics performed better than TBSS and probabilistic tractography (Fig. 3). TBSS, in turn, performed better than probabilistic tractography (Fig. 3 and Table S13). The number of streamlines connecting brain areas parcellated with aparc MSA-I had the best predictive performance among all dwMRI neuroimaging phenotypes (R<sup>2</sup><sub>mean</sub> = 0.052, r<sub>mean</sub> = 0.227, 95% CI [0.212, 0.235]). To identify features driving predictions, we correlated streamline counts in aparc MSA-I parcellation with the predicted g_factor values from the PLSR model. Positive associations with the predicted _g-factor were strongest for left superior parietal-left caudal anterior cingulate, left caudate-right amygdala, and left putamen-left hippocampus connections. The most marked negative correlations involved left putamen-right posterior thalamus and right pars opercularis-right caudal anterior cingulate pathways (Fig. 5 and Supplementary Fig. S2).”

      rsMRI

      Line 353: “Among RSFC metrics for 55 and 21 ICs, tangent parameterization matrices yielded the highest performance in the training set compared to full and partial correlation, as indicated by the cross-validation score. Functional connections between the limbic (IC10) and dorsal attention (IC18) networks, as well as between the ventral attention (IC15) and default mode (IC11) networks, displayed the highest positive association with cognition. In contrast, functional connectivity between the limbic (IC43, the highest activation within network) and default mode (IC11) and limbic (IC45) and frontoparietal (IC40) networks, between the dorsal attention (IC18) and frontoparietal (IC25) networks, and between the ventral attention (IC15) and frontoparietal (IC40) networks, showed the highest negative association with cognition (Fig. 5 and Supplementary Fig. S3 and S4)”

      sMRI

      Line 373: “FreeSurfer subcortical volumetric subsegmentation and ASEG had the highest performance among all sMRI neuroimaging phenotypes (R<sup>2</sup><sub>mean</sub> = 0.068, r<sub>mean</sub> = 0.244, 95% CI [0.237, 0.259] and R<sup>2</sup><sub>mean</sub> = 0.059, r<sub>mean</sub> = 0.235, 95% CI [0.221, 0.243], respectively). In FreeSurfer subcortical volumetric subsegmentation, volumes of all subcortical structures, except for left and right hippocampal fissures, showed positive associations with cognition. The strongest relations were observed for the volumes of bilateral whole hippocampal head and whole hippocampus (Fig. 5 and Supplementary Fig. S5 for feature importance maps). Grey matter morphological characteristics from ex vivo Brodmann Area Maps showed the lowest predictive performance (R<sup>2</sup><sub>mean</sub> = 0.008, r<sub>mean</sub> = 0.089, 95% CI [0.075, 0.098]; Fig. 3 and Table S15).”

      Discussion

      dwMRI

      Line 562: “Among dwMRI-derived neuroimaging phenotypes, models based on structural connectivity between brain areas parcellated with aparc MSA-I (streamline count), particularly connections with bilateral caudal anterior cingulate (left superior parietal-left caudal anterior cingulate, right pars opercularis-right caudal anterior cingulate), left putamen (left putamen-left hippocampus, left putamen-right posterior thalamus), and amygdala (left caudate-right amygdala), result in a neural indicator that best reflects microstructural resources associated with cognition, as indicated by predictive modeling, and more importantly, shares the highest proportion of the variance with mental health-g, as indicated by commonality analysis.”

      rsMRI

      Line 583: “We extend findings on the superior performance of rsMRI in predicting cognition, which aligns with the literature [15, 28], by showing that it also explains almost a third of the variance in cognition that mental health captures. At the rsMRI neuroimaging phenotype level, this performance is mostly driven by RSFC patterns among 55 ICA-derived networks quantified using tangent space parameterization. At a feature level, these associations are best captured by the strength of functional connections among limbic, dorsal attention and ventral attention, frontoparietal and default mode networks. These functional networks have been consistently linked to cognitive processes in prior research [127–130].”

      sMRI

      Line 608: “Integrating information about brain anatomy by stacking sMRI neuroimaging phenotypes allowed us to explain a third of the link between cognition and mental health. Among all sMRI neuroimaging phenotypes, those that quantified the morphology of subcortical structures, particularly volumes of bilateral hippocampus and hippocampal head, explain the highest portion of the variance in cognition captured by mental health. Our findings show that, at least in older adults, volumetric properties of subcortical structures are not only more predictive of individual variations in cognition but also explain a greater portion of cognitive variance shared with mental health than structural characteristics of more distributed cortical grey and white matter. This aligns with the Scaffolding Theory that proposes stronger compensatory engagement of subcortical structures in cognitive processing in older adults [138–140].”

      (5) The formatting of some figure legends could be improved for clarity - for example, some subheadings were not formatted in bold (e.g., Figure 2 c)

      Thank you for noticing this. We have updated the figures to enhance clarity, keeping subheadings plain while bolding figure numbers and MRI modality names.

    1. eLife Assessment

      This valuable paper investigates how fish avoid thermal disturbances that occur on fast timescales. The authors use a creative experimental approach that quickly creates a vertical thermal interface, which they combine with careful behavioral analyses. The evidence supporting their results is solid, but there is a potential confounding factor between temperature and vertical positioning, and characterization of the thermal interface would greatly assist in interpreting the results.

    2. Reviewer #1 (Public review):

      Summary:

      The experiment is interesting and well executed and describes in high detail fish behaviour in thermally stratified waters. The evidence is strong but the experimental design cannot distinguish between temperature and vertical position of the treatments.

      Strengths:

      High statistical power, solid quantification of behaviour.

      Weaknesses:

      A major issue with the experimental design is the vertical component of the experiment. Many thermal preference and avoidance experiments are run using horizontal division in shuttlebox systems or in annular choice flumes. These remove the vertical stratification component so that hot and cold can be compared equally, without the vertical layering as a confounding factor. The method chosen, with its vertical stratification, is inherently unable to control for this effect because warm water is always above, and cold water is always below. This complicates the interpretations.

    3. Reviewer #2 (Public review):

      The paper by Naudascher et al., investigates an interesting question: How do fish react to and avoid thermal disturbances from the optimum that occur on fast timescales. Previous work has identified potential strategies of warm avoidance in fish on short timescales while strategies for cold avoidance are far more elusive. The work combines a clever experimental paradigm with careful analysis to show that trout parr avoid cold water by limiting excursions across a warm-cold thermal interface. While direct measurements of the interface are lacking, thermal dynamics simulations suggest that trout parr avoid the warm-cold interface in the absence of gradient information.

      The authors assume that the thermal interface triggers the upward turning behavior, possibly leading to the formation of an associative memory. However, an alternative explanation is that exposure to cold water during initial excursions increases the tendency for upward turns. In other words, exposure to a cold interface changes the behavioral state leading to increases in gravity controlled upward turning. This could be an adaptive strategy since for temperatures > 4C swimming upwards is a good strategy to reach warmer water. That being said, the vertical design offers new insight and is ecologically relevant.

    1. eLife Assessment

      This important study uses the delay line axon model in the chick brainstem auditory circuit to examine the interactions between oligodendrocytes and axons in the formation of internodal distances. This is a significant and actively studied topic, and the authors have used this preparation to support the hypothesis that regional heterogeneity in oligodendrocytes underlies the observed variation in internodal length. In a solid series of experiments, the authors have used enhanced tetanus neurotoxin light chains, a genetically encoded silencing tool, to inhibit vesicular release from axons and support the hypothesis that regional heterogeneity among oligodendrocytes may underlie the biased nodal spacing pattern in the sound localization circuit.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #2 (Public review):

      Summary:

      Egawa et al describe the developmental timeline of the assembly of nodes of Ranvier in the chick brainstem auditory circuit. In this unique system, the spacing between nodes varies significantly in different regions of the same axon from early stages, which the authors suggest is critical for accurate sound localization. Egawa et al set out to determine which factors regulate this differential node spacing. They do this by using immunohistological analyses to test the correlation of node spacing with morphological properties of the axons, and properties of oligodendrocytes, glial cells that wrap axons with the myelin sheaths that flank the nodes of Ranvier. They find that axonal structure does not vary significantly, but that oligodendrocyte density and morphology varies in the different regions traversed by these axons, which suggests this is a key determinant of the region-specific differences in node density and myelin sheath length. They also find that differential oligodendrocyte density is partly determined by secreted neuronal signals, as (presumed) blockage of vesicle fusion with tetanus toxin reduced oligodendrocyte density in the region where it is normally higher. Based on these findings, the authors propose that oligodendrocyte morphology, myelin sheath length, and consequently nodal distribution are primarily determined by intrinsic oligodendrocyte properties rather than neuronal factors such as activity.

      Major comments:

      (1) The authors should test the efficiency of TeNT to validate that vesicular release is indeed inhibited from expressing neurons. Additionally, the authors should clarify if their TeNT expression system results in the whole tract being silenced, or results in sparse vesicular release inhibition in only a few neurons.

      (2) The authors should revise their statistical analyses throughout, and supply additional information to explain the rationale for the statistical tests used, including e.g. data normality, paired sampling, number of samples/independent biological replicates.

      (3) The main finding of the study is that the density of nodes differs between two regions of the chicken auditory circuit, probably due to morphological differences in the respective oligodendrocytes. Can the authors discuss if this finding is likely to be specific to the avian auditory circuit?

      (4) The study shows a correlation between node spacing and oligodendrocyte density, but the authors did not manipulate oligodendrocyte density per se (i.e. cell-autonomously). The authors should either include such experiments, or discuss their value in supporting the interpretation of their results.

      (5) The authors should discuss very pertinent prior studies, in particular to contextualize their findings with (a) known neuron-autonomous modes of node formation prior to myelination, (b) known effects of vesicular fusion directly on myelinating capacity and oligodendrogenesis, (c) known correlation of myelin length and thickness with axonal diameter, (d) regional heterogeneity in the oligodendrocyte transcriptome.

      Significance:

      In our view the study tackles a fundamental question likely to be of interest to a specialized audience of cellular neuroscientists. This descriptive study is suggestive that in the studied system, oligodendrocyte density determines the spacing between nodes of Ranvier, but further manipulations of oligodendrocyte density per se are needed to test this convincingly.

    3. Reviewer #3 (Public review):

      Summary:

      The authors have investigated the myelination pattern along the axons of chick avian cochlear nucleus. It has already been shown that there are regional differences in the internodal length of axons in the nucleus magnocellularis. In the tract region across the midline, internodes are longer than in the nucleus laminaris region. Here the authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons. However, the demonstration falls rather short of being convincing.

      Major comments:

      (1) The authors neglect the possibility that nodal cluster may be formed prior to myelin deposition. They have investigated stages E12 (no nodal clusters) and E15 (nodal cluster plus MAG+ myelin). Fig. 1D is of dubious quality. It would be important to investigate stages between E12 and E15 to observe the formation of pre-nodes, i.e., clustering of nodal components prior to myelin deposition.

      (2) The claim that axonal diameter is constant along the axonal length need to be demonstrated at the EM level. This would also allow to measure possible regional differences in the thickness of the myelin sheath and number of myelin wraps.

      (3) The observation that internodal length differs is explain by heterogeneity of sources of oligodendrocyte is not convincing. Oligodendrocytes a priori from the same origin remyelinate shorter internode after a demyelination event.

      Significance:

      The authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons.

      Comments on revised version:

      This revised version is in large improved and the responses to reviewers' comments are generally relevant. However, the response regarding pre-nodes is not satisfactory. I understand that the authors prefer to avoid further experimentations, but I think this is an important point that needs to be clarified. Exploring stages between E12 and E15 are therefore of importance. When carefully examining some of the figures (Fig. 1E or 2D) I think that at E15 they may well be pre-nodes formation prior to myelin deposition, on structure the authors considered to be heminodes. To be convincing they should use double or triple labeling with, in addition to the nodal proteins (ankG and/or Nav pan), a good myelin marker such as antiPLP. The rat monoclonal developed by late Pr Ikenaka would give a sharper staining than the anti MAG they used. (I assume the clone must still be available in Okazaki ).

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Evidence, reproducibility and clarity

      The manuscript by Egawa and colleagues investigates differences in nodal spacing in an avian auditory brain stem circuit. The results are clearly presented and data are of very high quality. The authors make two main conclusions:

      (1) Node spacing, i.e. internodal length, is intrinsically specified by the oligodendrocytes in the region they are found in, rather than axonal properties (branching or diameter).

      (2) Activity is necessary (we don't know what kind of signaling) for normal numbers of oligodendrocytes and therefore the extent of myelination.

      These are interesting observations, albeit phenomenon. I have only a few criticisms that should be addressed:

      (1) The use of the term 'distribution' when describing the location of nodes is confusing. I think the authors mean rather than the patterns of nodal distribution, the pattern of nodal spacing. They have investigated spacing along the axon. I encourage the authors to substitute node spacing or internodal length for node distribution.

      Thanks for your suggestion to avoid confusion. We used the phrase "nodal spacing" instead of "nodal distribution" throughout the revised manuscript.

      (2) In Seidl et al. (J Neurosci 2010) it was reported that axon diameter and internodal length (nodal spacing) were different for regions of the circuit. Can the authors help me better understand the difference between the Seidl results and those presented here?

      As a key distinction, our study focuses specifically on the main trunk of the contralateral projection of NM axons. This projection features a sequential branching structure known as the delay line, where collateral branches form terminal arbors and connect to the ventral dendritic layer of NL neurons. This structural organization plays a critical role in influencing the dynamic range of ITD detection by regulating conduction delays along the NM axon trunk.

      The study by Seidl et al. (2010) is a pioneering work that measured diameter of NM axon using electron microscopy, providing highly reliable data. However, due to the technical  limitations of electron microscopy, which does not allow for the continuous tracing of individual axons, it is not entirely clear whether the axons measured in the ventral NL region correspond to terminal arbors of collateral branches or the main trunk of NM axons (see Figure 9E, F in their paper). Instead, they categorized axon diameters based on their distance from NL cell layer, showing that axon diameter increases distally (see Figure 9G in their paper). Notably, the diameters of ventral axons located more than 120 μm away from the NL cell layer is almost identical to those in the midline.

      As illustrated in our Figure 4D and Supplementary Video 2, the main trunk of the contralateral NM projection is predominantly located in these distal regions. Therefore, our findings complement those of Seidl et al. (2010) rather than contradicting them. We made this point as clear as possible in text (page 7, line 3).

      (3) The authors looked only in very young animals - are the results reported here applicable only to development, or does additional refinement take place with aging?

      In this study, we examined chick embryos from E9 to just before hatching (E21) and post-hatch chicks up to P9. Chickens begin to perceive sound around E12 and possess sound localization abilities at the time of hatching (Grier et al., 1967) (added to page 4, line 9). Therefore, by E21, the sound localization circuit is largely established.

      On the other hand, additional refinement of the circuit with aging is certainly possible. A key cue for sound localization, interaural time difference (ITD), depends on the distance between the two ears, which increases as the animal grows. As shown in Figure 2G, internodal length increased by approximately 20% between E18 and P9 while maintaining regional differences. Given that NM axons are nearly fully myelinated by E21 (Figure 4D, 6C), this suggests that myelin extends in proportion to the overall growth of the head and brain volume. We described this possibility in text (page 5, line 21)

      Thus, our study covers not only the early stages of myelination but also the post-functional maturation in the sound localization circuit.

      (4) The fact that internodal length is specified by the oligodendrocyte suggests that activity may not modify the location of nodes of Ranvier - although again, the authors have only looked during early development. This is quite different than this reviewer's original thoughts - that activity altered internodal length and axon diameter. Thus, the results here argue against node plasticity. The authors may choose to highlight this point or argue for or against it based on results in adult birds?

      In this study, we demonstrated that although vesicular release did not affect internodal length, it selectively promoted oligodendrogenesis, thereby supporting the full myelination and hence the pattern of nodal spacing along the NM axons. We believe that this finding falls within the broader scope of 'activity-dependent plasticity' involving oligodendrocytes and nodes.

      As summarized in the excellent review by Bonetto et al. (2021), activity-dependent plasticity in oligodendrocytes encompasses a wide range of phenomena, not limited to changes in internodal length but also including oligodendrogenesis. Moreover, the effects of neuronal activity are not uniform but likely depend on the diversity of both neurons and oligodendrocytes. For example, in the mouse visual cortex, activity-dependent myelination occurs in interneurons but not in excitatory neurons (Yang et al., 2020). Additionally, expression of TeNT in axons affected myelination heterogeneously in zebrafish; some axons were impaired in myelination and the others were not affected at all (Koudelka et al., 2016). In the mouse corpus callosum, neuronal activity influences oligodendrogenesis, which in turn facilitates adaptive myelination (Gibson et al., 2014).

      Thus, rather than refuting the role of activity-dependent plasticity in nodal spacing, our findings emphasize the diversity of underlying regulatory mechanisms. We described these explicitly in text (page 10, line 18).

      Significance

      This paper may argue against node plasticity as a mechanism for tuning of neural circuits. Myelin plasticity is a very hot topic right now and node plasticity reflects myelin plasticity. this seems to be a circuit where perhaps plasticity is NOT occurring. That would be interesting to test directly. One limitation is that this is limited to development.

      This paper does not argue against node plasticity, but rather demonstrates that oligodendrocytes in the NL region exhibit a form of plasticity; they proliferate in response to vesicular release from NM axons, yet do not undergo morphological changes, ensuring adequate oligodendrocyte density for the full myelination of the auditory circuit. Thus, activity-dependent plasticity involving oligodendrocytes would contributes in various ways to each neural circuit, which is presumably attributed to the fact that myelination is driven by complex multicellular interactions between diverse axons and oligodendrocytes. Oligodendrocytes are known to exhibit heterogeneity in morphology, function, responsiveness, and gene profiles (Foerster et al., 2019; Sherafat et al., 2021; Osanai et al., 2022; Valihrach et al., 2022), but functional significance of this heterogeneity remains largely unclear. This paper also provides insight into how oligodendrocyte heterogeneity may contribute to the fine-tuning of neural circuit function, adding further value to our findings. Importantly, our study covers the wide range of development in the sound localization circuit, from the pre-myelination (E9) to the postfunctional maturation (P9), revealing how the nodal spacing pattern along the axon in this circuit emerges and matures.

      Reviewer #2:

      Evidence, reproducibility and clarity

      Egawa et al describe the developmental timeline of the assembly of nodes of Ranvier in the chick brainstem auditory circuit. In this unique system, the spacing between nodes varies significantly in different regions of the same axon from early stages, which the authors suggest is critical for accurate sound localization. Egawa et al set out to determine which factors regulate this differential node spacing. They do this by using immunohistological analyses to test the correlation of node spacing with morphological properties of the axons, and properties of oligodendrocytes, glial cells that wrap axons with the myelin sheaths that flank the nodes of Ranvier. They find that axonal structure does not vary significantly, but that oligodendrocyte density and morphology varies in the different regions traversed by these axons, which suggests this is a key determinant of the region-specific differences in node density and myelin sheath length. They also find that differential oligodendrocyte density is partly determined by secreted neuronal signals, as (presumed) blockage of vesicle fusion with tetanus toxin reduced oligodendrocyte density in the region where it is normally higher. Based on these findings, the authors propose that oligodendrocyte morphology, myelin sheath length, and consequently nodal distribution are primarily determined by intrinsic oligodendrocyte properties rather than neuronal factors such as activity.

      Major points, detailed below, need to be addressed to overcome some limitations of the study.

      Major comments:

      (1) It is essential that the authors validate the efficiency of TeNT to prove that vesicular release is indeed inhibited, to be able to make any claims about the effect of vesicular release on oligodendrogenesis/myelination.

      eTeNT is a widely used genetically encoded silencing tool and constructs similar to the one used in this study have been successfully applied in primates and rodents to suppress target behaviors via genetic dissection of specific pathways (Kinoshita et al., 2012; Sooksawate et al., 2013). However, precisely quantifying the extent of vesicular release inhibition from NM axons in the brainstem auditory circuit is technically problematic.

      One major limitation is that while A3V efficiently infects NM neurons, its transduction efficiency does not reach 100%. In electrophysiological evaluations, NL neurons receive inputs from multiple NM axons, meaning that responses may still include input from uninfected axons. Additionally, failure to evoke synaptic responses could either indicate successful silencing or failure to stimulate NM axons, making a clear distinction difficult. Furthermore, unlike in motor circuits, we cannot assess the effect of silencing by observing behavioral outputs.

      Thus, we instead opted to quantify the precise expression efficiency of GFP-tagged eTeNT in the cell bodies of NM neurons. The proportion of NM neurons expressing GFP-tagged eTeNT was 89.7 ± 1.6% (N = 6 chicks), which is consistent with previous reports evaluating A3V transduction efficiency in the brainstem auditory circuit (Matsui et al., 2012). These results strongly suggest that synaptic transmission from NM axons was globally silenced by eTeNT at the NL region. We described these explicitly in text (page 8, line 2).

      (2) Related to 1, can the authors clarify if their TeNT expression system results in the whole tract being silenced? It appears from Fig. 6 that their approach leads to sparse expression of TeNT in individual neurons, which enables them to measure myelination parameters. Can the authors discuss how silencing a single axon can lead to a regional effect in oligodendrocyte number?

      Figure 6D depicts a representative axon selected from a dense population of GFP-positive axons in a 200-μm-thick slice after A3V-eTeNT infection to bilateral NM. As shown in Supplementary Video 1 and 2, densely labeled GFP-positive axons can be traced along the main trunk. To prevent any misinterpretation, we have revised the description of Figure 6 in the main text and Figure legend (page 31, line 9), and stated the A3V-eTeNT infection efficiency was 89.7 ± 1.6% in NM neurons, as mentioned above. Based on this efficiency, we interpreted that the global occlusion of vesicular release from most of the NM axons altered the pericellular microenvironment of the NL region, which led to the regional effect on the oligodendrocyte density.

      On the other hand, your question regarding whether sparse expression of eTeNT still has an effect is highly relevant. As we also discussed in our reply to comment 4 by Reviewer #1, the relationship between neuronal activity and oligodendrocytes is highly diverse. In some types of axons, vesicular release is essential for normal myelination, and this process was disrupted by TeNT (Koudelka et al., 2016), suggesting that direct interaction with oligodendrocytes via vesicle release may actively promote myelination in these types of axons.

      To clarify whether the phenotype observed in Figure 6 arises from changes in the pericellular microenvironment at the NL region or from the direct suppression of axon-oligodendrocyte interactions, we included a new Supplementary Figure (Figure 6—figure supplement 1). In this figure, we evaluated the node formation on the axon sparsely expressing eTeNT by electroporation into the unilateral NM. The results showed that sparse eTeNT expression did not increase the percentages of heminodes or unmyelinated segments. This finding supports our conclusion that the increased unmyelinated segments by A3V-eTeNT resulted from impaired synaptic transmission at NM terminals and subsequent alterations of  pericellular microenvironment at the NL region.

      (3) The authors need to fully revise their statistical analyses throughout and supply additional information that is needed to assess if their analyses are adequate:

      Thank you for your valuable suggestions to improve the rigor of our statistical analyses. We have reanalyzed all statistical tests using R software. In the revised Methods section and Figure Legends, we have clarified the rationale for selecting each statistical test, specified which test was used for each figure, and explicitly defined both n and N. After reevaluation with the Shapiro-Wilk test, we adjusted some analyses to non-parametric tests where appropriate. However, these adjustments did not alter the statistical significance of our results compared to the original analyses.

      (3.1) the authors use a variety of statistical tests and it is not always obvious why they chose a particular test. For example, in Fig. 2G they chose a Kruskal-Wallis test instead of a two-way ANOVA or MannWhitney U test, which are much more common in the field. What is the rationale for the test choice?

      We have revised the explanation of our statistical test choices to provide greater clarity and precision. For example, in Figure 2G, we first assessed the normality of the data in each of the four groups using the Shapiro-Wilk test, which revealed that some datasets did not follow a normal distribution. Given this, we selected the Kruskal-Wallis test, a commonly used non-parametric test for comparisons across three or more groups. Since the Kruskal-Wallis test indicated a significant difference, we conducted a post hoc Steel-Dwass test to determine which specific group comparisons were statistically significant.

      (3.2) in some cases, the choice of test appears wholly inappropriate. For example, in Fig. 3H-K, an unpaired t-test is inappropriate if the two regions were analysed in the same samples. In Fig. 5, was a ttest used for comparisons between multiple groups in the same dataset? If so, an ANOVA may be more appropriate.

      In the case of Figures 3H-K, we compared oligodendrocyte morphology between regions. However, since the number of sparsely labeled oligodendrocytes differs both between regions and across individual samples, there is no strict correspondence between paired measurements. On the other hand, in Figures 5B, C, and E, we compared the density of labeled cells between regions within the same slice, establishing a direct correspondence between paired data points. For these comparisons, we appropriately used a paired t-test.

      (3.3) in some cases, the authors do not mention which test was used (Fig 3: E-G no test indicated, despite asterisks; G/L/M - which regression test that was used? What does r indicate?)

      We have specified the statistical tests used for each figure in the Methods section and Figure Legends for better clarity. Additionally, we have revised the descriptions for Figure 4G, L, and M and their corresponding Figure Legends to explicitly indicate that Spearman’s rank correlation coefficient (rₛ) was used for evaluation.

      (3.4) more concerningly, throughout the results, data may have been pseudo-replicated. t-tests and ANOVAs assume that each observation in a dataset is independent of the other observations. In figures 1-4 and 6 there is a very large "n" number, but the authors do not indicate what this corresponds to. This leaves it open to interpretation, and the large values suggest that the number of nodes, internodal segments, or cells may have been used. These are not independent experimental units, and should be averaged per independent biological replicate - i.e. per animal (N).

      We have now clarified what “n” represents in each figure, as well as the number of animals (N) used in each experiment, in the Figure Legends.

      In this study, developmental stages of chick embryos were defined by HH stage (Hamburger and Hamilton, 1951), minimizing individual variability. Additionally, since our study focuses on the distribution of morphological characteristics of individual cells, averaging measurements per animal would obscure important cellular-level variability and potentially mislead interpretation of data. Furthermore, we employed a strategy of sparse genetic labeling in many experiments, which naturally results in variability in the number of measurable cells per animal. Given the clear distinctions in our data distributions, we believe that averaging per biological replicate is not essential in this case.

      To further ensure the robustness of our statistical analysis, data presented as boxplots were preliminarily assessed using PlotsOfDifferences, a web-based application that calculates and visualizes effect sizes and 95% confidence intervals based on bootstrapping (https://huygens.science.uva.nl/PlotsOfDifferences/; https://doi.org/10.1101/578575). Effect sizes can serve as a valuable alternative to p-values (Ho, 2018; https://www.nature.com/articles/s41592019-0470-3). The significant differences reported in our study are also supported by clear differences in effect sizes, ensuring that our conclusions remain robust regardless of the statistical approach used.

      If requested, we would be happy to provide PlotsOfDifferences outputs as supplementary source data files, similar to those used in eLife publications, for each figure.

      (3.5) related to the pseudo-replication issue, can the authors include individual datapoints in graphs for full transparency, per biological replicates, in addition or in alternative to bar-graphs (e.g. Fig. 5 and 6).

      We have now incorporated individual data points into the bar graphs in Figures 5 and 6.

      (4) The main finding of the study is that the density of nodes differs between two regions of the chicken auditory circuit, probably due to morphological differences in the respective oligodendrocytes. Can the authors discuss if this finding is likely to be specific to the bird auditory circuit?

      The morphological differences of oligodendrocytes between white and gray matter are well established (i.e. shorter myelin at gray matter), but their correspondence with the nodal spacing pattern along the long axonal projections of cortical neurons is not well understood. Future research may find similarities with our findings. Additionally, as mentioned in the final section of the Discussion, the mammalian brainstem auditory circuit is functionally analogous to the avian ITD circuit. Regional differences in nodal spacing along axons have also been observed in the mammalian system, raising the important question of whether these differences are supported by regional heterogeneity in oligodendrocytes. Investigating this possibility will facilitate our understanding of the underlying logic and mechanisms for determining node spacing patterns along axons, as well as provide valuable insights into evolutionary convergence in auditory processing mechanisms. We described these explicitly in text (page 11, line 34).

      (5) Provided the authors amend their statistical analyses, and assuming significant differences remain as shown, the study shows a correlation (but not causation) between node spacing and oligodendrocyte density, but the authors did not manipulate oligodendrocyte density per se (i.e. cell-autonomously). Therefore, the authors should either include such experiments, or revise some of their phrasing to soften their claims and conclusions. For example, the word "determine" in the title could be replaced by "correlate with" for a more accurate representation of the work. Similar sentences throughout the main text should be amended.

      As you summarized in your comment, our results demonstrated that A3V-eTeNT suppressed oligodendrogenesis in the NL region, leading to a reduction in oligodendrocyte density (Figures 6L, M), which caused the emergence of unmyelinated segments. While this is an indirect manipulation of oligodendrocyte density, it nonetheless provides evidence supporting a causal relationship between oligodendrocyte density and nodal spacing.

      The emergence of unmyelinated segments at the NL region further suggests that the myelin extension capacity of oligodendrocytes differs between regions, highlighting regional differences in intrinsic properties of oligodendrocyte as the most prominent determinant of nodal spacing variation. However, as you correctly pointed out, our findings do not establish direct causation.

      In the future, developing methods to artificially manipulate myelin length could provide a more definitive demonstration of causality. Given these considerations, we have modified the title to replace "determine" with "underlie", ensuring that our conclusions are presented with appropriate nuance.

      (6) The authors fail to introduce, or discuss, very pertinent prior studies, in particular to contextualize their findings with:

      (6.1) known neuron-autonomous modes of node formation prior to myelination, e.g. Zonta et al (PMID 18573915); Vagionitis et al (PMID 35172135); Freeman et al (PMID 25561543)

      (6.2) known effects of vesicular fusion directly on myelinating capacity and oligodendrogenesis, e.g. Mensch et al (PMID 25849985)

      (6.3) known correlation of myelin length and thickness with axonal diameter, e.g. Murray & Blakemore (PMID 7012280); Ibrahim et al (PMID 8583214); Hildebrand et al (PMID 8441812).

      (6.4) regional heterogeneity in the oligodendrocyte transcriptome (page 9, studies summarized in PMID 36313617)

      Thank you for your insightful suggestions. We have incorporated the relevant references you provided and revised the manuscript accordingly to contextualize our findings within the existing literature.

      Minor comments:

      (7) Can the authors amend Fig. 1G with the correct units of measurement, not millimetres.

      Response: 

      Thank you for your suggestion. We have corrected the units in Figure 1G to µm

      (8) The Olig2 staining in Fig 2C does not appear to be nuclear, as would be expected of a transcription factor and as is well established for Olig2, but rather appears to be excluded from the nucleus, as it is in a ring or donut shape. Can the authors comment on this?

      Oligodendrocytes and OPCs have small cell bodies, often comparable in size to their nuclei. The central void in the ring-like Olig2 staining pattern appears too small to represent the nucleus. Additionally, a similar ring-like appearance is observed in BrdU labeling (Figure 5G), suggesting that this staining pattern may reflect nuclear morphology or other structural features.

      Significance

      In our view the study tackles a fundamental question likely to be of interest to a specialized audience of cellular neuroscientists. This descriptive study is suggestive that in the studied system, oligodendrocyte density determines the spacing between nodes of Ranvier, but further manipulations of oligodendrocyte density per se are needed to test this convincingly.

      The main finding of our study is that the primary determinant of the biased nodal spacing pattern in the sound localization circuit is the regional heterogeneity in the morphology of oligodendrocytes due to their intrinsic properties (e.g., their ability to produce and extend myelin sheaths) rather than the density of the cells. This was based on our observations that a reduction of oligodendrocyte density by A3V-eTeNT expression caused unmyelinated segments but did not increase internodal length (Figure 6), further revealing the importance of oligodendrocyte density in ensuring full myelination for the axons with short internodes. Thus, we think that our study could propose the significance of oligodendrocyte heterogeneity in the circuit function as well as in the nodal spacing using experimental manipulation of oligodendrocyte density. 

      Reviewer #3:

      Evidence, reproducibility and clarity

      The authors have investigated the myelination pattern along the axons of chick avian cochlear nucleus. It has already been shown that there are regional differences in the internodal length of axons in the nucleus magnocellularis. In the tract region across the midline, internodes are longer than in the nucleus laminaris region. Here the authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons. However, the demonstration falls rather short of being convincing. I have some major concerns:

      (1) The authors neglect the possibility that nodal cluster may be formed prior to myelin deposition. They have investigated stages E12 (no nodal clusters) and E15 (nodal cluster plus MAG+ myelin). Fig. 1D is of dubious quality. It would be important to investigate stages between E12 and E15 to observe the formation of pre-nodes, i.e., clustering of nodal components prior to myelin deposition.

      Thank you for your insightful comment regarding the potential role of pre-nodal clusters in determining internodal length. Indeed, studies in zebrafish have suggested that pre-nodal clustering of node components prior to myelination may prefigure internodal length (Vagionitis et al., 2022). We have incorporated a discussion on whether such pre-nodal clusters could contribute to regional differences in nodal spacing in our manuscript (page 9, line 35).

      Whether pre-nodal clusters are detectable before myelination appears to depend on neuronal subpopulation (Freeman et al., 2015). To investigate the presence of pre-nodal clusters along NM axons in the brainstem auditory circuit, we previously attempted to visualize AnkG signals at E13 and E14. However, we did not observe clear structures indicative of pre-nodal clusters; instead, we only detected sparse fibrous AnkG signals with weak Nav clustering at their ends, consistent with hemi-node features. This result does not exclude the possibility of pre-nodal clusters on NM axons, as the detection limit of immunostaining cannot be ruled out. In brainstem slices, where axons are densely packed, nodal molecules are expressed at low levels across a wide area, leading to a high background signal in immunostaining, which may mask weak pre-nodal cluster signals prior to myelination. Regarding the comment on Figure 1D, we assume you are referring to Figure 2D based on the context. The lack of clarity in the high-magnification images in Figure 2D results from both the high background signal and the limited penetration of the MAG antibody. Furthermore, we are unable to verify Neurofascin accumulation at pre-nodal clusters, as there is currently no commercially available antibody suitable for use in chickens, despite our over 20 years of efforts to identify one for AIS research. Therefore, current methodologies pose significant challenges in visualizing pre-nodal clusters in our model. Future advancements, such as exogenous expression of fluorescently tagged Neurofascin at appropriate densities or knock-in tagging of endogenous molecules, may help overcome these limitations.

      However, a key issue to be discussed in this study is not merely the presence or absence of prenodal clusters, but rather whether pre-nodal clusters—if present—would determine regional differences in internodal length. To address this possibility, we have added new data in Figure 6I, measuring the length of unmyelinated segments that emerged following A3V-eTeNT expression.

      If pre-nodal clusters were fixed before myelination and predetermined internodal length, then the length of unmyelinated segments should be equal to or a multiple of the typical internodal length. However, our data showed that unmyelinated segments in the NL region were less than half the length of the typical NL internodal length, contradicting the hypothesis that fixed pre-nodal clusters determine internodal length along NM axons in this region.

      (2) The claim that axonal diameter is constant along the axonal length need to be demonstrated at the EM level. This would also allow to measure possible regional differences in the thickness of the myelin sheath and number of myelin wraps.

      As mentioned in our reply to comment 2 by Reviewer #1, the diameter of NM axons was already evaluated using electron microscopy (EM) in the pioneering study by Seidl et al., (2010). Additionally, EM-based analysis makes it difficult to clearly distinguish between the main trunk of NM axons and thin collateral branches at the NL region. Accordingly, we did not do the EM analysis in this revision. 

      In Figure 4, we used palGFP, which is targeted to the cell membrane, allowing us to measure axon diameter by evaluating the distance between two membrane signal peaks. This approach minimizes the influence of the blurring of fluorescence signals on diameter measurements. Thus, we believe that our method is sufficient to evaluate the relative difference in axon diameters between regions and hence to show that axon diameter is not the primary determinant of the 3-fold difference in internodal length between regions. 

      (3) The observation that internodal length differs is explain by heterogeneity of sources of oligodendrocyte is not convincing. Oligodendrocytes a priori from the same origin remyelinate shorter internode after a demyelination event.

      The heterogeneity in oligodendrocyte morphology would reflect differences in gene profiles, which, in turn, may arise from differences in their developmental origin and/or pericellular microenvironment of OPCs. We made this point as clear as possible in Discussion (page 9, line 21).

      Significance

      The authors suggest that the difference in internodal length is attributed to heterogeneity of oligodendrocytes. In the tract region oligodendrocytes would contribute longer myelin internodes, while oligodendrocytes in the nucleus laminaris region would synthesize shorter myelin internodes. Not only length of myelin internodes differs, but also along the same axon unmyelinated areas between two internodes may vary. This is an interesting contribution since all these differences contribute to differential conduction velocity regulating ipsilateral and contralateral innervation of coincidence detector neurons.

    1. eLife Assessment

      This valuable study investigates the self-assembly activity of death-fold domains. The data collected using advanced microscopy and distributed amphifluoric FRET-based flow cytometry methods provide solid evidence for the conclusions, although the interpretations based on these conclusions remain speculative in some cases. This paper is broad interest to those studying a variety of biological pathways involved in inflammatory responses and various forms of cell death.

    2. Reviewer #1 (Public review):

      Summary:

      This is a high-quality and extensive study that reveals differences in the self-assembly properties of the full set of 109 human death fold domains (DFDs). Distributed amphifluoric FRET (DAmFRET) is a powerful tool that reveals the self-assembly behaviour of the DFDs, in non-seeded and seeded contexts, and allows comparison of the nature and extent of self-assembly. The nature of the barriers to nucleation is revealed in the transition from low to high AmFRET. Alongside analysis of the saturation concentration and protein concentration in the absence of seed, the subset of proteins that exhibited discontinuous transitions to higher-order assemblies was observed to have higher concentrations than DFDs that exhibited continuous transitions. The experiments probing the ~20% of DFDs that exhibit discontinuous transition to polymeric form suggest that they populate a metastable, supersaturated form in the absence of cognate signal. This is suggestive of a high intrinsic barrier to nucleation.

      Strengths:

      The differences in self-assembly behaviour are significant and likely identify mechanistic differences across this large family of signalling adapter domains. The work is of high quality, and the evidence for a range of behaviours is strong. This is an important and useful starting point since the different assembly mechanisms point towards specific cellular roles. However, understanding the molecular basis for these differences will require further analysis.

      An impressive optogenetic approach was engineered and applied to initiate self-assembly of CASP1 and CASP9 DFDs, as a model for apoptosome initiation in these two DFDs with differing continuous or discontinuous assembly properties. This comparison revealed clear differences in the stability and reversibility of the assemblies, supporting the hypothesis that supersaturation-mediated DFD assembly underlies signal amplification in at least some of the DFDs.

      The study reveals interesting correlations between supersaturation of DFD adapters in short- and long-lived cells, suggestive of a relationship between the mechanism of assembly and cellular context. Additionally, the comprehensive nature of the study provides strong evidence that the interactions are almost all homomeric or limited to members of the same DFD subfamily or interaction network. Similar approaches with bacterial proteins from innate immunity operons suggest that their polymerisation may be driven by similar mechanisms.

      Weaknesses:

      Only a limited investigation of assembly morphology was conducted by microscopy. There was a tendency for discontinuous structures to form fibrillar structures and continuous to populate diffuse or punctate structures, but there was overlap across all categories, which is not fully explored. The methodology used to probe oligomeric assembly and stability (SDD-AGE) does not justify the conclusions drawn regarding stability and native structure within the assemblies.

      The work identifies important differences between DFDs and clearly different patterns of association. However, most of the detailed analysis is of the DFDs that exhibit a discontinuous transition, and important questions remain about the majority of other DFDs and why some assemblies should be reversible and others not, and about the nature of signalling arising from a continuous transition to polymeric form.

      Some key examples of well-studied DFDs, such as MyD88 and RIPK,1 deserve more discussion, since they display somewhat surprising results. More detailed exploration of these candidates, where much is known about their structures and the nature of the assemblies from other work, could substantiate the conclusions here and transform some of the conclusions from speculative to convincing.

      The study concludes with general statements about the relationship between stochastic nucleation and mortality, which provide food for thought and discussion but which, as they concede, are highly speculative. The analogies that are drawn with batteries and privatisation will likely not be clearly understood by all readers. The authors do not discuss limitations of the study or elaborate on further experiments that could interrogate the model.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript from Rodriguez Gama et al. proposes several interesting conclusions based on different oligomerization properties of Death-Fold Domains (DFDs) in cells, their natural abundance, and supersaturation properties. These ideas are:<br /> (1) DFDs broadly store the cell's energy by remaining in a supersaturated state;<br /> (2) Cells are constantly in a vulnerable state that could lead to cell death;<br /> (3) The cell's lifespan depends on the supersaturation levels of certain DFDs.

      Overall, the evidence supporting these claims is not completely solid. Some concerns were noted.

      Strengths:

      Systematic analysis of DFD self-assembly and its relationship with protein abundance, supersaturation, cell longevity, and evolution.

      Weaknesses

      (1) On page 2, it is stated, "Nucleation barriers increase with the entropic cost of assembly. Assemblies with large barriers, therefore, tend to be more ordered than those without. Ordered assembly often manifests as long filaments in cells," as a way to explain the observed results that DFDs assemblies that transitioned discontinuously form fibrils, whereas those that transitioned continuously (low-to-high) formed spherical or amorphous puncta. It is unlikely to be able to differentiate between amorphous and structured puncta by conventional confocal microscopy. Some DFDs self-assemble into structured puncta formed by intertwined fibrils. Such fibril nets are more structured and thus should be associated with a higher entropic cost. Therefore, the results in Figure 1B do not seem to agree with the reasoning described.

      (2) Errors for the data shown in Figure 1B would have been very useful to determine whether the population differences between diffuse, punctate, and fibrillar for the continuous (low-to-high) transition are meaningful.

      (3) A main concern in the data shown in Figure 1B and F is that the number of counts for discontinuous compared to continuous is small. Thus, the significance of the results is difficult to evaluate in the context of the broad function of DFDs as batteries, as stated at the beginning of the manuscript.

      (4) The proteins or domains that are self-seeded (Figure 1F) should be listed such that the reader has a better understanding of whether domains or full-length proteins are considered, whether other domains have an effect on self-seeding (which is not discussed), and whether there is repetition.

      (5) The authors indicate an anticorrelation between transcript abundance and Csat based on the data shown in Figure 2B; however, the data are scattered. It is not clear why an anticorrelation is inferred.

      (6) It would be useful to indicate the expected range of degree centrality. The differences observed are very small. This is specifically the case for the BC values. The lack of context and the small differences cast doubts on their significance. It would be beneficial to describe these data in the context of the centrality values of other proteins.

      (7) Page 3 section title: "Nucleation barriers are a characteristic feature of inflammatory signalosome adaptors." This title seems to contradict the results shown in Figure 2D, where full-length CARD9 and CARD11 are classified as sensors, but it has been reported that they are adaptor proteins with key roles in the inflammatory response. Please see the following references as examples: The adaptor protein CARD9 is essential for the activation of myeloid cells through ITAM-associated and Toll-like receptors. Nat Immunol 8, 619-629 (2007), and Mechanisms of Regulated and Dysregulated CARD11 Signaling in Adaptive Immunity and Disease. Front Immunol. 2018 Sep 19;9:2105.

      However, both CARD9 and CARD11 show discontinuous to continuous behavior for the individual DFDs versus full-length proteins, respectively, in contrast to the results obtained for ASC, FADD, etc. FADD plays a key role in apoptosis but shows the same behavior as BCL10 and ASC. However, the manuscript indicates that this behavior is characteristic of inflammatory signalosomes. What is the explanation for adaptor proteins behaving in different ways? This casts doubts about the possibility of deriving general conclusions on the significance of these observations, or the subtitles in the results section seem to be oversimplifications.

      (8) IFI16-PYD displays discontinuous behavior according to Figure S1H; however, it is not included in Figure 2D, but AIM 2 is.

      (9) To demonstrate that "Nucleation barriers facilitate signal amplification in human cells," constructs using APAF1 CARD, NLRC4 CARD, caspase-9 CARD, and a chimera of the latter are used to create what the authors refer to as apoptsomes. Even though puncta are observed, referring to these assemblies as apoptosomes seems somewhat misleading. In addition, it is not clear why the activity of caspase-9 was not measured directly, instead of that of capsae-3 and 7, which could be activated by other means. The polymerization of caspase-1 CARD with NLRC4 CARD, leading to irreversible puncta, could just mean that the polymers are more stable. In fact, not all DFDs form equally stable or identical complexes, which does not necessarily imply that a nucleation barrier facilitates signal amplification. Could this conclusion be an overstatement?

      (10) To demonstrate that "Innate immune adaptors are endogenously supersaturated," it is stated on page 5 that ASC clusters continue to grow for the full duration of the time course and that AIM2-PYD stops growing after 5 min. The data shown in Figure 4F indicate that AIM2-PYD grows after 5 mins, although slowly, and ASC starts to slow down at ~ 13 min. Because ASC has two DFDs, assemblies can grow faster and become bigger. How is this related to supersaturation?

    4. Author response:

      We appreciate constructive feedback from both reviewers. Reviewer 1 provided a very positive assessment and helpful suggestions for clarity, which we will incorporate.

      We also thank Reviewer 2 for their detailed comments. In some instances, their public review raised concerns about specific data or interpretations that are, in fact, already presented and justified in the original manuscript. This feedback has highlighted a need to improve the clarity of our presentation. 

      In our revised manuscript, we will make key information more prominent to prevent further misunderstandings. We will also provide additional statistical validation for our conclusions, additional data from the optogenetic experiments and high throughput imaging, and further elaborate on the behaviors of specific proteins (FADD, MyD88, and RIPK1). We are confident that these revisions will make our findings more transparent and accessible to readers, and we look forward to submitting our revised manuscript.

    1. eLife Assessment

      During the development of the unicellular eukaryote Dictyostelium discoideum, cells aggregate into mounds, which then form protrusions called tips, and the tips then become the front of migrating slugs and the top of fruiting bodies. This valuable study identifies a protein called adenosine deaminase-related growth factor (ADGF) as a key regulator of tip formation, and the authors convincingly show that ADGF catalyses the formation of ammonia from adenosine, allowing ammonia to initiate tip formation, and they then elucidate pathways upstream and downstream from ADGF. The authors discuss the intriguing possibility that mammalian ADGF may also regulate development in a similar manner.

    2. Reviewer #1 (Public review):

      Summary:

      This work shows that a specific adenosine deaminase protein in Dictyostelium generates the ammonia that is required for tip formation during Dictyostelium development. Cells with an insertion in the adgf gene aggregate but do not form tips. A remarkable result, shown by several different ways, is that the adgf mutant can be rescued by exposing the mutant to ammonia gas. The authors also describe other phenotypes of the adgf mutant such as increased mound size, altered cAMP signaling, and abnormal cell type differentiation. It appears that the adgf mutant has defects the expression of a large number of genes, resulting in not only the tip defect but also the mound size, cAMP signaling, and differentiation phenotypes.

      Strengths:

      The data and statistics are excellent.

      Weaknesses:

      The key weakness is understanding why the cells bother to use a diffusible gas like ammonia as a signal to form a tip and continue development. The rescue of the mutant by adding ammonia gas to the entire culture indicates that ammonia conveys no positional information within the mound. By the time the cells have formed a mound, the cells have been starving for several hours, and desperately need to form a fruiting body to disperse some of themselves as spores, and thus need to form a tip no matter what. One can envision that the local ammonia concentration is possibly informing the mound that some minimal number of cells are present (assuming that the ammonia concentration is proportional to the number of cells), but probably even a miniscule fruiting body would be preferable to the cells compared to a mound. This latter idea could be easily explored by examining the fate of the adgf cells in the mound - do they all form spores? Do some form spores? Or perhaps the ADGF is secreted by only one cell type, and the resulting ammonia tells the mound that for some reason that cell type is not present in the mound, allowing some of the cells to transdifferentiate into the needed cell type. Thus elucidating if all or some cells produce ADGF would greatly strengthen this puzzling story.

      Comments on revisions:

      Looks better, but I think you answered my questions (listed as weaknesses in the public review) in the reply to the reviewer but not in the paper. I'd suggest carefully thinking about my questions and addressing them in the Discussion. You did however do all of the things in the paper that were listed as "Recommendations for the authors"