  1. Jul 2025
    1. eLife Assessment

      This useful manuscript addresses some key molecular mechanisms on the neuroprotective roles of soluble TREM2 in neurodegenerative diseases. The study will advance our understanding of TREM2 mutations, particularly on the damaging effect of known TREM2 mutations, and also provides solid evidence why soluble TREM2 can antagonize Aβ aggregation.

    2. Reviewer #1 (Public review):

      In this manuscript, Saeb et al. report the mechanistic roles of the flexible stalk domain in sTREM2 function using molecular dynamics simulations. They report some interesting molecular bases explaining why sTREM2 shows protective effects during AD, such as the partial extracellular stalk domain promoting the binding preference and stability of sTREM2 with its ligand even in the presence of the known AD-risk mutation, R47H. Furthermore, they found that the stalk domain itself acts as a site for ligand binding by providing an "expanded surface", known as 'Expanded Surface 2', together with the Ig-like domain. Also, they observed no difference in the binding free energy of phosphatidyl-serine with wild-type TREM2-Ig and mutant TREM2-Ig, which is a bit inconsistent with previous experimental studies reported in Journal of Biological Chemistry 293, (2018), Alzheimer's and Dementia 17, 475-488 (2021), Cell 160, 1061-1071 (2015).

    3. Reviewer #2 (Public review):

      Significance:

      TREM2 is an immunomodulatory receptor expressed on myeloid cells and microglia in the brain. TREM2 consists of a single immunoglobulin (Ig) domain that leads into a flexible stalk, transmembrane helix, and short cytoplasmic tail. Extracellular proteases can cleave TREM2 in its stalk and produce a soluble TREM2 (sTREM2). TREM2 is genetically linked to Alzheimer's disease (AD), with the strongest association coming from an R47H variant in the Ig domain. Despite intense interest, the full TREM2 ligand repertoire remains elusive, and it is unclear what function sTREM2 may play in the brain. The central goal of this paper is to assess the ligand-binding role of the flexible stalk that is generated during the shedding of TREM2. To do this, the authors simulate the behavior of constructs with and without stalk. However, it is not clear why the authors chose to use the isolated Ig domain as a surrogate for full-length TREM2. Additionally, experimental binding evidence that is misrepresented by the authors contradicts the proposed role of the stalk.

      Summary and strengths:

      The authors carry out MD simulations of WT and R47H TREM2 with and without the flexible stalk. Simulations are carried out for apo TREM2 and for TREM2 in complex with various lipids. They compare results using just the Ig domain to results including the flexible stalk that is retained following cleavage to generate sTREM2. The computational methods are well-described and should be reproducible. The long simulations are a strength, as exemplified in Figure 2A where a CDR2 transition happens at ~400-600 ns. The stalk has not been resolved in structural studies, but the simulations suggest the intriguing and readily testable hypothesis that the stalk interacts with the Ig domain and thereby contributes to the stability of the Ig domain and to ligand binding. I suspect biochemists interested in TREM2 will make testing this hypothesis a high priority.

      Comments on latest version:

      The authors have addressed my critiques and carried out additional simulations, as requested. I would upgrade my assessment of the evidence to "solid."

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Review:

      Review #1 (Public review):

      Also, they observed no difference in the binding free energy of phosphatidyl-serine with wild-type TREM2-Ig and mutant TREM2-Ig, which is a bit inconsistent with previous experimental studies reported in Journal of Biological Chemistry 293, (2018), Alzheimer's and Dementia 17, 475-488 (2021), Cell 160, 1061-1071 (2015).

      We agree with the reviewer that our results do not fully recapitulate experimental findings and directly note this in the body of our work, particularly given the known limitations of free energy calculations in MD simulations, as outlined in the Limitations section. Our claim is that the loss-of-function effects of the R47H variant extend beyond decreased binding affinities which are likely due to variable binding patterns. We have also re-analyzed and highlighted statistically significant differences in interaction entropies. Ultimately, our claim is that mutational effects extend beyond experimentally confirmed differences in binding affinities.

      Perhaps the authors made significant efforts to run a number of simulations for multiple models, which is nearly 17 microseconds in total; none of the simulations has been repeated independently at least a couple of times, which makes me uncomfortable to consider this finding technically true. Most of the important conclusions that authors claimed, including the opposite results from previous research, have been made on the single run, which raises the question of whether this observation can be reproduced if the simulation has been repeated independently. Although the authors stated the sampling number and length of MD simulations in the current manuscript as a limitation of this study, it must be carefully considered before concluding rather than based on a single run.

      To address this comment, we have added numerous replicates to our simulations of WT and R47H (s)TREM2 without lipids and substantially increased the total simulation time. Each pure protein system now has six total microsecond-long technical replicates. The addition of replicates strengthens the validity of the work and allows us to make stronger novel conclusions than with one simulation alone, particularly for claims regarding the CDR2 loop and sTREM2 stalk.  In our models with phospholipids, running multiple independent biological replicates of the same system offers a more rigorous methodology than simply repeating simulations of the same docked model. This strategy allows us to sample several distinct starting configurations, thereby minimizing biases introduced by docking algorithms and single-model reliance.

      sTREM2 shows a neuroprotective effect in AD, even with the mutations with R47H, as evidenced by authors based on their simulation. sTREM2 is known to bind Aβ within the AD and reduce Aβ aggregation, whereas R47H mutant increases Aβ aggregation. I wonder why the authors did not consider Aβ as a ligand for their simulation studies. As a reader in this field, I would prefer to know the protective mechanism of sTREM2 in Aβ aggregation influenced by the stalk domain.

      Our initial approach for this study used Aβ as a ligand rather than phospholipids. However, we noted the difficulties in simulating Aβ, particularly in choosing relevant Aβ structures and oligomeric states (n-mers). We believe that phospholipids represent an equally pertinent ligand for TREM2, given its critical role in lipid sensing and metabolism. Furthermore, there is growing recognition in the AD research community of the need to move beyond Aβ and focus on other understudied pathological mechanisms.

      In a similar manner, why is only one mutation, "R47H", considered in the study? There are more severe mutations reported to disrupt tethering between these CDRs, such as T66M. Although this "T66M" is not associated with AD, I guess the stalk domain protective mechanism would not be biased among different diseases. Therefore, it would be interesting to see whether the findings are true for this T66M.

      In most previous studies, the mechanism of CDR destabilization by the mutant was explored, for example through changes in secondary structure and in the residue-wise interloop interaction pattern. This is not considered in this manuscript, nor are the detailed residue-wise interactions that are changed by the mutant or are important for "ligand binding" or the "stalk domain".

      These are both excellent points that deserve extensive investigation, although we note that our paper does include significant protein-protein and protein-ligand interaction mapping that encompasses both the CDR2 loop and stalk, analyses which were not performed in any previous papers. In a separate paper, we explored more detailed residue-wise interactions for the CDR2 loop (Lietzke et al., Alzheimer’s and Dementia, 2025). While R47H is the most common and prolific mutation in literature, an extensive catalog of other mutations is important to explore. To this end, we are currently preparing a separate publication that will explore a larger mutational library and include more detailed sTREM2 analyses. 

      The comparison between the wild and mutant and other different complex structures must be determined by particular statistical calculations to state the observed difference between different structures is significant. Since autocorrelation is one of the major concerns for MD simulation data for predicting statistical differences, authors can consider bootstrap calculations for predicting statistical significance.

      The addition of numerous replicates across systems negates potential effects from autocorrelation and allows us to include standard deviations to critically assess the validity of our claims.
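      The reviewer's bootstrap suggestion can be illustrated with a minimal sketch. This is not the authors' analysis, which relied on replicates and standard deviations. It assumes a moving-block bootstrap, a common way to resample autocorrelated data, applied to the mean of a single per-frame observable. The function name, the block length, and the synthetic AR(1) series are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def block_bootstrap_mean_ci(series, block_len, n_boot=2000, alpha=0.05, seed=0):
    """Moving-block bootstrap confidence interval for the mean of one time series.

    Resampling contiguous blocks (rather than single frames) preserves the
    short-range autocorrelation within each block.
    """
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    rng = random.Random(seed)
    n_blocks = -(-n // block_len)  # ceiling division
    last_start = n - block_len
    means = []
    for _ in range(n_boot):
        resampled = []
        for _ in range(n_blocks):
            start = rng.randint(0, last_start)  # inclusive on both ends
            resampled.extend(series[start:start + block_len])
        means.append(statistics.fmean(resampled[:n]))  # trim to original length
    means.sort()
    low = means[int((alpha / 2) * n_boot)]
    high = means[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Synthetic autocorrelated series (AR(1)); a stand-in for a per-frame observable.
rng = random.Random(1)
x, series = 0.0, []
for _ in range(1000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    series.append(x)

low, high = block_bootstrap_mean_ci(series, block_len=50)
print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
```

      The block length should exceed the correlation time of the observable; if it is too short, the interval will be too narrow. Intervals computed this way for two systems (for example, wild-type and mutant) could then be compared alongside the replicate-based standard deviations.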

      Review #2 (Public review):

      The authors state that reported differences in ligand binding between the TREM2 and sTREM2 remain unexplained, and the authors cite two lines of evidence. The first line of evidence, which is true, is that there are differences between lipid binding assays and lipid signaling assays. However, signaling assays do not directly measure binding. Secondly, the authors cite Kober et al 2021 as evidence that sTREM2 and TREM2 showed different affinities for Abeta1-42 in a direct binding assay. Unfortunately, when Kober et al measured the binding of sTREM2 and Ig-TREM2 to Abeta they reported statistically identical affinities (Kd = 3.8 ± 2.9 µM vs 5.1 ± 3.7 µM) and concluded that the stalk did not contribute measurably to Abeta binding.

      We appreciate the reviewer’s insight and acknowledge the need to clarify our interpretation of Kober et al. (2021). We have adjusted how we cite Kober et al. and reframed the first paragraph in the second results section.

      In line with these findings, our energy calculations reveal that sTREM2 exhibits weaker—but still not statistically significant—binding affinities for phospholipids compared to TREM2. These results suggest that while overall binding affinity might be similar, differences in binding patterns or specific lipid interactions could still contribute to functional differences observed between TREM2 and sTREM2.

      The authors appear to take simulations of the Ig domain (without any stalk) as a surrogate for the full-length, membrane-bound TREM2. They compare the Ig domain to a sTREM2 model that includes the stalk. While it is fully plausible that the stalk could interact with and stabilize the Ig domain, the authors need to demonstrate why the full-length TREM2 could not interact with its own stalk and why the isolated Ig domain is a suitable surrogate for this state.

      We believe that this is a major limitation of all computational work of TREM2 to-date, and of experimental work which only presents the Ig-like domain. This is extensively discussed in the limitations section of our paper and treated carefully throughout the text. We are currently working toward a separate manuscript that will represent the first biologically relevant model of full-length TREM2 in a membrane and will rigorously assess the current paradigm of using the Ig-like domain as an experimental surrogate for TREM2.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Perhaps the authors made significant efforts to run a number of simulations for multiple models, which is nearly 17 microseconds in total; none of the simulations has been repeated independently at least a couple of times, which makes me uncomfortable to consider this finding technically true. Most of the important conclusions that authors claimed, including the opposite results from previous research, have been made on the single run, which raises the question of whether this observation can be reproduced if the simulation has been repeated independently. Although the authors stated the sampling number and length of MD simulations in the current manuscript as a limitation of this study, it must be carefully considered before concluding rather than based on a single run.

      To address this comment, we have added numerous replicates to our simulations of WT and R47H (s)TREM2 without lipids and substantially increased the total simulation time. Each pure protein system now has six total microsecond-long technical replicates. The addition of replicates strengthens the validity of the work and allows us to make stronger novel conclusions than with one simulation alone, particularly for claims regarding the CDR2 loop and sTREM2 stalk.  In our models with phospholipids, running multiple independent biological replicates of the same system offers a more rigorous methodology than simply repeating simulations of the same docked model. This strategy allows us to sample several distinct starting configurations, thereby minimizing biases introduced by docking algorithms and single-model reliance. 

      (2) sTREM2 shows a neuroprotective effect in AD, even with the mutations with R47H, as evidenced by authors based on their simulation. sTREM2 is known to bind Aβ within the AD and reduce Aβ aggregation, whereas R47H mutant increases Aβ aggregation. I wonder why the authors did not consider Aβ as a ligand for their simulation studies. As a reader in this field, I would prefer to know the protective mechanism of sTREM2 in Aβ aggregation influenced by the stalk domain.

      Our initial approach for this study used Aβ as a ligand rather than phospholipids. However, we noted the difficulties in simulating Aβ, particularly in choosing relevant Aβ structures and oligomeric states (n-mers). We believe that phospholipids represent an equally pertinent ligand for TREM2, given its critical role in lipid sensing and metabolism. Furthermore, there is growing recognition in the AD research community of the need to move beyond Aβ and focus on other understudied pathological mechanisms.

      (3) In a similar manner, why is only one mutation, "R47H", considered in the study? There are more severe mutations reported to disrupt tethering between these CDRs, such as T66M. Although this "T66M" is not associated with AD, I guess the stalk domain protective mechanism would not be biased among different diseases. Therefore, it would be interesting to see whether the findings are true for this T66M.

      (4) In most previous studies, the mechanism of CDR destabilization by the mutant was explored, for example through changes in secondary structure and in the residue-wise interloop interaction pattern. This is not considered in this manuscript, nor are the detailed residue-wise interactions that are changed by the mutant or are important for "ligand binding" or the "stalk domain".

      These are both excellent points that deserve extensive investigation, although we note that our paper does include significant protein-protein and protein-ligand interaction mapping that encompasses both the CDR2 loop and stalk, analyses which were not performed in any previous papers. In a separate paper, we explored more detailed residue-wise interactions for the CDR2 loop (Lietzke et al., Alzheimer’s and Dementia, 2025). While R47H is the most common and prolific mutation in literature, an extensive catalog of other mutations is important to explore. To this end, we are currently preparing a separate publication that will explore a larger mutational library and include more detailed sTREM2 analyses.

      (5) The comparison between the wild and mutant and other different complex structures must be determined by particular statistical calculations to state the observed difference between different structures is significant. Since autocorrelation is one of the major concerns for MD simulation data for predicting statistical differences, authors can consider bootstrap calculations for predicting statistical significance.

      The addition of numerous replicates across systems negates potential effects from autocorrelation and allows us to include standard deviations to critically assess the validity of our claims.

      Reviewer #2 (Recommendations for the authors):

      Major points:

      (1) I encourage the authors to review Figure 5D and the text of section 2.7 from Kober et al 2021, which argued that "(t)he identical (within error) binding affinities indicated that the TREM2 Ig domain composes the majority (if not entirety) of the mAβ42 binding surface."

      We appreciate the reviewer’s insight and acknowledge the need to clarify our interpretation of Kober et al. (2021). We have adjusted how we cite Kober et al. and reframed the first paragraph in the second results section.

      (2) The abstract and text need extensive revision to address the major concerns, which jeopardize the biological premise and significance of the work.

      We have made changes to the abstract and text to reflect concerns and revisions.

      (3) The title and abstract should change to reflect the contents of the paper. The authors do not directly measure lipid binding, nor are any of the computations done in a membrane environment. The authors do not measure anything in the brain.

      We have modified the title to better reflect the content of the paper. The paper measures lipid binding in the form of free energy calculations and interaction maps.

      Minor points:

      (1) How does the conservation of the TREM2 stalk compare to the Ig domain as they relate to the TREM2 family?

      While this study may inspire further exploration of other TREM receptors, we do not believe that our results extend to other TREM family members because of relatively low homology.

      (2) Please show the locations of the glycosylation sites on a model in Figure 1 and discuss their potential contribution to the ligand binding surfaces.

      N-linked glycosylation points are now noted on the sequence map of Figure 1 and updated in the text.

      (3) There is an isoform of TREM2 that produces a secreted product that is similar to the sTREM2 produced by proteolysis. The authors should comment as to whether their findings would apply to secreted TREM2.

      We have addressed this with a new line in the ‘Ideas and Speculation’ section.

      (4) This sentence on p. 2, line 73 references a review, not a study:

      This has been corrected.

      (5) "Yet, one study suggested effective TREM2 stimulation by PLs may require co-presentation with other molecules, potentially reflecting the nature of lipoprotein endocytosis30"

      This has been corrected.

      (6) Is "inclusive" on line 88 a typo for inconclusive?

      This has been corrected.

      (7) "Further, there is a strong correlation between the levels of sTREM2 in the cerebrospinal fluid and that of Tau, however correlation with Aβ is inclusive"

      This has been corrected.

    1. eLife Assessment

      This study convincingly demonstrates that odors evoke a feeding response in Drosophila, mediated by gustatory receptors and observed as a proboscis extension. The evidence is comprehensive, encompassing behavior, functional imaging and electrophysiology. These important results on the molecular and cellular basis of multimodal integration across olfaction and gustation will be of interest for the study of chemosensation, sensory biology, and animal behavior.

    2. Reviewer #1 (Public review):

      Summary:

      Odor- and taste-sensing are mediated by two different systems, the olfactory and gustatory systems, and have different behavioral roles. In this study, Wei et al. challenge this dichotomy by showing that odors can activate gustatory receptor neurons (GRNs) in Drosophila to promote feeding responses, including the proboscis extension response (PER) that was previously thought to be driven only by taste. While previous studies suggested that odors can promote PER to appetitive tastants, Wei et al. go further to show that odors alone cause PER, this effect is mediated through sweet-sensing GRNs, and sugar receptors are required. The study also shows that odor detection by bitter-sensing GRNs suppresses PER. The authors' conclusions are supported by behavioral assays, calcium imaging, electrophysiological recordings, and genetic manipulations. The observation that both attractive and aversive odors promote PER leaves an open question as to why this effect is adaptive. Overall, the study sheds new light on chemosensation and multimodal integration by showing that odor and taste detection converge at the level of sensory neurons, a finding that is interesting and surprising while also being supported by another recent study (Dweck & Carlson, Sci Advances 2023).

      Strengths:

      (1) The main finding that odors alone can promote PER by activating sweet-sensing GRNs is interesting and novel.

      (2) The study uses video tracking of the proboscis to quantify PER rather than manual scoring, which is typically used in the field. The tracking method is less subjective and provides a higher-resolution readout of the behavior.

      (3) The study uses calcium imaging and electrophysiology to show that odors activate GRNs. These represent complementary techniques that measure activity at different parts of the GRN (axons versus dendrites, respectively) and strengthen the evidence for this conclusion.

      (4) Genetic manipulations show that odor-evoked PER is primarily driven by sugar GRNs and sugar receptors rather than olfactory neurons. This is a major finding that distinguishes this work from previous studies of odor effects on PER and feeding (e.g., Reisenman & Scott, 2019; Shiraiwa, 2008) that assumed or demonstrated that odors were acting through olfactory neurons.

      Weaknesses/Limitations:

      (1) Many of the odor effects on behavior or neuronal responses were only observed at very high concentrations. Most effects seemed to require concentrations of at least 10^-2 (0.01 v/v), which is at the high end of the concentration range used in olfactory studies (e.g., Hallem et al., 2004), and most experiments in the paper used a far higher concentration of 0.5 v/v. It is unclear whether these are concentrations that would be naturally encountered by flies. In addition, it is difficult to compare the concentrations used for electrophysiology and behavior given that they are presented in solution versus volatile form.

      (2) The timecourse of GRN activation by odors seems quite prolonged (and possibly delayed, depending on the exact timing of odor onset to the fly), and this timecourse is not directly compared with activation by tastes to determine whether it is a property of the calcium sensor or a real difference.

      (3) While the overall effect of different conditions is tested using appropriate statistical methods, post-hoc tests are not always used to determine which specific groups are different from each other (e.g., which odors and concentrations elicit significant PER compared to air or mineral oil controls in Fig. 1; which odors show impaired responses without olfactory organs in Fig. 2A).

      Discrepancies with previous studies:

      These discrepancies are important to note but should not necessarily be considered "weaknesses" of the present study.

      (1) It is not entirely clear why PER to odors alone has not been previously reported, especially as this study shows that it is a broad effect evoked by many different odors. Previous studies (Oh et al., 2021; Reisenman & Scott, 2019; Shiraiwa, 2008) tested the effect of odors on PER and only observed enhancement of PER to sugar rather than odor-evoked PER; some of these studies explicitly show no effect of odor alone or odor with low sugar concentration. In the Response to Reviewers, the authors propose that genetic background may explain discrepancies, but this is not discussed much in the paper itself. Differences in behavioral quantification (automated vs. manual scoring, quantification of PER duration versus probability) may also contribute.

      (2) The calcium imaging data showing that sugar GRNs respond to a broad set of odors contrasts with results from Dweck & Carlson (Sci Adv, 2023) who recorded sugar neurons with electrophysiology and observed responses to organic acids, but not other odors. This discrepancy is mentioned in the Discussion but the underlying reason is not clear.

    3. Reviewer #3 (Public review):

      Summary:

      Using flies, Kazama et al. combined behavioral analysis, electrophysiological recordings, and calcium imaging experiments to elucidate how odors activate gustatory receptor neurons (GRNs) and elicit a proboscis extension response, which is interpreted as a feeding response.

      The authors used DeepLabCut v2.0 to estimate the extension of the proboscis, which represents an unbiased and more precise method for describing this behavior compared to manual scoring.

      They demonstrated that the probability of eliciting a proboscis extension increases with higher odor concentrations. The most robust response occurs at a 0.5 v/v concentration, which, despite being diluted in the air stream, remains a relatively high concentration. Although the probability of response is not particularly high, it is higher than for control stimuli. Notably, flies respond with a proboscis extension to both odors that are considered positive and those regarded as negative.

      The authors used various transgenic lines to show that the response is mediated by GRNs. Specifically, inhibiting Gr5a reduces the response, while inhibiting Gr66a increases it in fed flies. Additionally, they find that odors induce a strong positive response in both types of GRNs, which is abolished when the labella of the proboscis are covered. This response was also confirmed through electrophysiological tip recordings.

      Finally, the authors demonstrated that the response increases when two stimuli of different modalities, such as sucrose and odors, are presented together, suggesting clear multimodal integration.

      Strengths:

      The integration of various techniques, which collectively supports the robustness of the results.

      The assessment of electrophysiological recordings in intact animals, preserving natural physiological conditions.

      Weaknesses:

      Only highly concentrated odours are capable of evoking positive responses and, even then, the proportion remains relatively low.

      The authors have incorporated my suggestions.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Review:

      Reviewer #1 (Public review): 

      Summary: 

      Odor- and taste-sensing are mediated by two different systems, the olfactory and gustatory systems, and have different behavioral roles. In this study, Wei et al. challenge this dichotomy by showing that odors can activate gustatory receptor neurons (GRNs) in Drosophila to promote feeding responses, including the proboscis extension response (PER) that was previously thought to be driven only by taste. While previous studies suggested that odors can promote PER to appetitive tastants, Wei et al. go further to show that odors alone cause PER, this effect is mediated through sweet-sensing GRNs, and sugar receptors are required. The study also shows that odor detection by bitter-sensing GRNs suppresses PER. The authors' conclusions are supported by behavioral assays, calcium imaging, electrophysiological recordings, and genetic manipulations. The observation that both attractive and aversive odors promote PER leaves an open question as to why this effect is adaptive. Overall, the study sheds new light on chemosensation and multimodal integration by showing that odor and taste detection converge at the level of sensory neurons, a finding that is interesting and surprising while also being supported by another recent study (Dweck & Carlson, Sci Advances 2023).

      Strengths: 

      (1) The main finding that odors alone can promote PER by activating sweet-sensing GRNs is interesting and novel.

      (2) The study uses video tracking of the proboscis to quantify PER rather than manual scoring, which is typically used in the field. The tracking method is less subjective and provides a higher-resolution readout of the behavior.

      (3) The study uses calcium imaging and electrophysiology to show that odors activate GRNs. These represent complementary techniques that measure activity at different parts of the GRN (axons versus dendrites, respectively) and strengthen the evidence for this conclusion. 

      (4) Genetic manipulations show that odor-evoked PER is primarily driven by sugar GRNs and sugar receptors rather than olfactory neurons. This is a major finding that distinguishes this work from previous studies of odor effects on PER and feeding (e.g., Reisenman & Scott, 2019; Shiraiwa, 2008) that assumed or demonstrated that odors were acting through olfactory neurons.

      We appreciate the reviewer’s positive assessment of the novelty and significance of our work.

      Weaknesses/Limitations: 

      (1) The authors may want to discuss why PER to odors alone has not been previously reported, especially as they argue that this is a broad effect evoked by many different odors. Previous studies testing the effect of odors on PER only observed odor enhancement of PER to sugar (Oh et al., 2021; Reisenman & Scott, 2019; Shiraiwa, 2008) and some of these studies explicitly show no effect of odor alone or odor with low sugar concentration; regardless, the authors likely would have noticed if PER to odor alone had occurred. Readers of this paper may also be aware of unpublished studies failing to observe an effect of odor alone on PER (including studies performed by this reviewer and unrelated work by other colleagues in the field), which of course the authors are not expected to directly address but may further motivate the authors to provide possible explanations.

      We appreciate the reviewer’s comment. We believe that the difference in genotype is likely the largest reason behind this point. This is because the strength of odor-evoked PER varied widely across genotypes and was quite weak in some strains, including the commonly used w[1118] empty Gal4 and w[1118] empty split Gal4, as shown in Figure 1-figure supplement 3 (Figure S3 in the original submission). However, given that we observed odor-evoked PER in various genotypes (many in the main Figures and three in Figure 1-figure supplement 3, including Drosophila simulans), the data illustrate that it is a general phenomenon in Drosophila. Indeed, although Oh et al. (2021) did not emphasize it in the text, their Fig. 1E showed that yeast odor evoked PER at a probability of 20%, which is much higher than the rate of spontaneous PER in many genotypes. Therefore, this literature may represent further support for the presence of odor-evoked PER. We have expanded our text in the Discussion to describe these issues.

      Another possibility is our use of DeepLabCut to quantitatively track the kinematics of proboscis movement, which may have facilitated the detection of PER.

      (2) Many of the odor effects on behavior or neuronal responses were only observed at very high concentrations. Most effects seemed to require concentrations of at least 10-2 (0.01 v/v), which is at the high end of the concentration range used in olfactory studies (e.g., Hallem et al., 2004), and most experiments in the paper used a far higher concentration of 0.5 v/v. It is unclear whether these are concentrations that would be naturally encountered by flies.

      We acknowledge that the concentrations used are on the higher side, suggesting that GRNs may need to be stimulated with relatively concentrated odors to induce PER. Although it is difficult to determine the naturalistic range of odor concentration, it is at least widely reported that olfactory neurons, including olfactory receptor neurons and projection neurons, do not saturate and exhibit odor identity-dependent responses at the concentration of 10<sup>-2</sup>, where odor-evoked PER can be observed. Furthermore, we have shown in Figure 6 that a low concentration (10<sup>-4</sup>) of banana odor, ethyl butyrate, and 4-methylcyclohexanol all significantly increased the rate of odor-taste multisensory PER even in olfactory organs-removed flies, suggesting that low concentration odors can influence feeding behavior via GRNs in a natural context where odors and tastants coexist at food sites. Finally, we note that odors were further diluted by a factor of 0.375 by mixing the odor stream with the main air stream before being applied to the flies, as described in Methods.
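
      As a rough illustration of the final point, the concentration reaching the fly can be estimated from the nominal concentration. The sketch below assumes that "diluted by a factor of 0.375" means the nominal value is multiplied by 0.375; the function and variable names are hypothetical and are not taken from the authors' code.

```python
# Assumption: the nominal (vial) concentration is multiplied by 0.375
# when the odor stream is mixed into the main air stream.
DILUTION_FACTOR = 0.375


def effective_concentration(nominal_v_v, dilution_factor=DILUTION_FACTOR):
    """Return the estimated concentration at the fly for a nominal v/v value."""
    return nominal_v_v * dilution_factor


# Nominal concentrations mentioned in the reviews and responses.
nominal_concentrations = [0.5, 1e-1, 1e-2, 1e-4]

for nominal in nominal_concentrations:
    print(f"nominal {nominal:g} -> at the fly {effective_concentration(nominal):g}")
```

      Under this assumption, the nominal 0.5 v/v stimulus would correspond to about 0.19 v/v at the fly.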

      (3) The calcium imaging data showing that sugar GRNs respond to a broad set of odors contrasts with results from Dweck & Carlson (Sci Adv, 2023) who recorded sugar neurons with electrophysiology and observed responses to organic acids, but not other odors. This discrepancy is not discussed.  

      As the reviewer points out, Dweck and Carlson (Sci Adv, 2023) reported, using single sensillum electrophysiology (base recording), that sugar GRNs only respond to organic acids, whereas we found, using calcium imaging from a group of axons and single sensillum electrophysiology (tip recording), that these GRNs respond to a wide variety of odors. Given that we observed odor responses using two methods, the discrepancy is likely due to the differences in the genotypes examined. We have now discussed this point in the text.

      (4) Related to point #1, it would be useful to see a quantification of the percent of flies or trials showing PER for the key experiments in the paper, as this is the standard metric used in most studies and would help readers compare PER in this study to other studies. This is especially important for cases where the authors are claiming that odor-evoked PER is modulated in the same way as previously shown for sugar (e.g., the effect of starvation in Figure S4).

      For starved flies, we would like to remind the reviewer that the percentage of trials showing PER is reported in Fig. 1E, which shows a similar trend as the integrated PER duration. For fed flies, we have analyzed the percentage of PER and added the result to Figure 2-figure supplement 1C (Figure S4 in original submission).

      (5) Given the novelty of the finding that odors activate sugar GRNs, it would be useful to show more examples of GCaMP traces (or overlaid traces for all flies/trials) in Figure 3. Only one example trace is shown, and the boxplots do not give us a sense of the reliability or time course of the response. A related issue is that the GRNs appear to be persistently activated long after the odor is removed, which does not occur with tastes. Why should that occur? Does the time course of GRN activation align with the time course of PER, and do different odors show differences in the latency of GRN activation that correspond with differences in the latency of PER (Figure S1A)?

      Following the reviewer’s suggestion, we now report GCaMP responses for all the trials in all the flies (both Gr5a>GCaMP and Gr66a>GCaMP flies), where the time course and trial-to-trial/animal-to-animal variability of calcium responses can be observed (Figure 3-figure supplement 2).

      Regarding the second point, we recorded responses to both sucrose and odors in some flies and found that calcium responses of GRNs are long-lasting not only to odors but also to sucrose, as shown in Author response image 1. This may be due in part to the properties of GCaMP6s and slower decay of intracellular calcium concentration as compared to spikes.

      Author response image 1.

      Example calcium responses to sucrose and odor (MCH) in the same fly (normalized by the respective peak responses to better illustrate the time course of responses). Sucrose (blue) and odor (orange) concentrations are 100 mM, and 10<sup>-1</sup> respectively. Odor stimulation begins at 5 s and lasts for 2 s. Sucrose was also applied at the same timing for the same duration although there was a limitation in controlling the precise timing and duration of tastant application. Because of this limitation, we did not quantify the off time constant of two responses.

      To address whether the time course of GRN activation aligns with the time course of PER, and whether different odors evoke different latencies of GRN activation that correspond to latencies of PER, we plotted the time course of GRN responses and PER, and further compared the response latencies across odors and across two types of responses in Gr5a>GCaMP6s flies. As shown in Author response image 2, no significant differences were found in response latency between the six odors for PER and odor responses. Furthermore, Pearson correlation between GRN response latencies and PER latencies was not significant (r = 0.09, p = 0.872).

      Author response image 2.

      (A) PER duration in each second in Gr5a-Gal4>UAS-GCaMP6s flies. The black lines indicate the mean and the shaded areas indicate standard error of the mean. n = 25 flies. (B) Time course of calcium responses (ΔF/F) to nine odors in Gr5a GRNs. n = 5 flies. (C) Latency to the first odor-evoked PER in Gr5a-Gal4>UAS-GCaMP6s flies. Green bar indicates the odor application period. p = 0.67, one-way ANOVA. Box plots indicate the median (orange line), mean (black dot), quartiles (box), and 5-95% range (bar). Dots are outliers. (D) Latency of calcium responses (10% of rise to peak time) in Gr5a GRNs. Green bar indicates the odor application period. p = 0.32, one-way ANOVA. Box plots indicate the median (orange line), mean (black dot), quartiles (box), and 5-95% range (bar). Dots are outliers.
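
      The latency comparison described above reduces to a Pearson correlation between per-odor GRN response latencies and per-odor PER latencies. The following is a minimal sketch of that coefficient in plain Python. The latency values are hypothetical placeholders, not the authors' data, and the p-value reported in the response would additionally require a t-distribution, which is omitted here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of squares and cross-products of the centered values.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-odor mean latencies in seconds (placeholders only).
grn_latency_s = [0.42, 0.55, 0.47, 0.61, 0.50, 0.58]
per_latency_s = [1.9, 1.4, 2.2, 1.6, 2.0, 1.5]

print(f"r = {pearson_r(grn_latency_s, per_latency_s):.2f}")
```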

      (6) Several controls are missing, and in some cases, experimental and control groups are not directly compared. In general, Gal4/UAS experiments should include comparisons to both the Gal4/+ and UAS/+ controls, at least in cases where control responses vary substantially, which appears to be the case for this study. These controls are often missing, e.g. the Gal4/+ controls are not shown in Figure 2C-G and the UAS/+ controls are not shown in Figure 2J-L (also, the legend for the latter panels should be revised to clarify what the "control" flies are). For the experiments in Figure S5, the data are not directly compared to any control group. For several other experiments, the control and experimental groups are plotted in separate graphs (e.g., Figure 2C-G), and they would be easier to visually compare if they were together. In addition, for each experiment, the authors should denote which comparisons are statistically significant rather than just reporting an overall p-value in the legend (e.g., Figure 2H-L).

      We thank the reviewer for the input. We have conducted additional experiments for four Gal4/+ controls in Figure 2 and added detailed information about control flies in the figure legend (Figure 2C-F).

      For the RNAi flies shown in Figure 2 and Figure 2-figure supplement 3, we used the recommended controls suggested by the VDRC. These control flies were crossed with tubulin-Gal4 lines to include both Gal4 and UAS control backgrounds.

      Regarding Figure S5 in original submission (current Figure 2-figure supplement 2), we now present the results of statistical tests which revealed that PER to certain odors is statistically significantly stronger than that to the solvent control (mineral oil) for both wing-removed and wing-leg-removed flies.

      For Figure 2C-F, we now plot the results for experimental and control groups side by side in each figure.

      Regarding the results of statistical tests, we have provided more information in the legend and also prepared a summary table (supplemental table). 

      (7) Additional controls would be useful in supporting the conclusions. For the Kir experiments, how do we know that Kir is effective, especially in cases where odor-evoked PER was not impaired (e.g., Orco/Kir)? The authors could perform controls testing odor aversion, for example. For the Gr5a mutant, few details are provided on the nature of the control line used and whether it is in the same genetic background as the mutant. Regardless, it would be important to verify that the Gr5a mutant retains a normal sense of smell and shows normal levels of PER to stimuli other than sugar, ruling out more general deficits. Finally, as the method of using DeepLabCut tracking to quantify PER was newly developed, it is important to show the accuracy and specificity of detecting PER events compared to manual scoring.  

      A previous study (Sato, 2023, Front Mol Neurosci) showed that the avoidance to 100 μM 2-methylthiazoline (2MT) was abolished, and the avoidance to 1 mM 2MT was partially impaired, in Orco>Kir2.1 flies. However, because Orco-Gal4 does not label all the ORNs, and because we have more concrete results on flies in which all the olfactory organs are removed as well as flies in which specific GRNs and Grs are manipulated, we decided to remove the data for Orco>Kir2.1 flies and have updated the text and Figure 2 accordingly.

      For the Gr5a mutant and its control, we have added detailed information about the genotype in the figure legend and in the Methods. We used the exact same lines as reported in Dahanukar et al. (2007), which we obtained from Dr. Dahanukar. Dahanukar et al. have already carefully shown that the Gr5a mutant loses responses only to certain types of sugars (it even retains normal responses to some other sugars), demonstrating that Gr5a mutants do not exhibit general deficits.

      As for the PER scoring method, we manually scored PER duration and compared the results with those obtained using DeepLabCut in wild type flies for the representative data. The two results were similar (no statistical difference). We have reported the result in Figure 1-figure supplement 1C.

      (8) The authors' explanation of why both attractive and aversive odors promote PER (lines 249-259) did not seem convincing. The explanation discusses the different roles of smell and taste but does not address the core question of why it would be adaptive for an aversive odor, which flies naturally avoid, to promote feeding behavior.  

      We have extended our explanation in the Discussion by adding the following possibility: “Enhancing PER to aversive odors might also be adaptive as animals often need to carry out the final check by tasting a trace amount of potentially dangerous substances to confirm that those should not be further consumed.”

      Reviewer #2 (Public review): 

      Summary: 

      A gustatory receptor and neuron enhances an olfactory behavioral response, proboscis extension. This manuscript clearly establishes a novel mechanism by which a gustatory receptor and neuron evokes an olfactory-driven behavioral response. The study expands recent observations by Dweck and Carlson (2023) that suggest new and remarkable properties among GRNs in Drosophila. Here, the authors articulate a clear instance of a novel neural and behavioral mechanism for gustatory receptors in an olfactory response.

      Strengths: 

      The systematic and logical use of genetic manipulation, imaging and physiology, and behavioral analysis makes a clear case that gustatory neurons are bona fide olfactory neurons with respect to proboscis extension behavior.

      Weaknesses: 

      No weaknesses were identified by this reviewer.  

      We appreciate the reviewer’s recognition of the novelty and significance of our work.

      Reviewer #3 (Public review): 

      Summary: 

      Using flies, Kazama et al. combined behavioral analysis, electrophysiological recordings, and calcium imaging experiments to elucidate how odors activate gustatory receptor neurons (GRNs) and elicit a proboscis extension response, which is interpreted as a feeding response. 

      The authors used DeepLabCut v2.0 to estimate the extension of the proboscis, which represents an unbiased and more precise method for describing this behavior compared to manual scoring.

      They demonstrated that the probability of eliciting a proboscis extension increases with higher odor concentrations. The most robust response occurs at a 0.5 v/v concentration, which, despite being diluted in the air stream, remains a relatively high concentration. Although the probability of response is not particularly high it is higher than control stimuli. Notably, flies respond with a proboscis extension to both odors that are considered positive and those regarded as negative.

      The authors used various transgenic lines to show that the response is mediated by GRNs.

      Specifically, inhibiting Gr5a reduces the response, while inhibiting Gr66a increases it in fed flies. Additionally, they find that odors induce a strong positive response in both types of GRNs, which is abolished when the labella of the proboscis are covered. This response was also confirmed through electrophysiological tip recordings.

      Finally, the authors demonstrated that the response increases when two stimuli of different modalities, such as sucrose and odors, are presented together, suggesting clear multimodal integration.

      Strengths: 

      The integration of various techniques, that collectively support the robustness of the results.

      The assessment of electrophysiological recordings in intact animals, preserving natural physiological conditions.

      We appreciate the reviewer’s recognition of the novelty and significance of our work.

      Weaknesses: 

      The behavioral response is observed in only a small proportion of animals.  

      We acknowledge that the probability of odor-evoked PER is lower than that of sucrose-evoked PER, which is close to 100% depending on the concentration. To further quantify what proportion of animals exhibit odor-evoked PER, we now report this number in addition to the probability of PER for each odor shown in Fig. 1E. We found that, in wild type Dickinson flies, 73% and 68% of flies exhibited PER to at least one odor presented at the concentration of 0.5 and 0.1, respectively.
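
      The two metrics mentioned here differ in what is counted: the per-odor probability counts trials, whereas the proportion of responding animals counts flies that showed PER to at least one odor. A minimal sketch of both follows. The data layout, fly identifiers, and odor labels are hypothetical and are used only to make the distinction concrete.

```python
def fraction_of_responding_flies(per_by_fly):
    """Fraction of flies showing PER in at least one trial of at least one odor.

    per_by_fly maps fly id -> {odor name -> list of booleans, one per trial}.
    """
    if not per_by_fly:
        raise ValueError("no flies provided")
    responders = sum(
        1
        for odors in per_by_fly.values()
        if any(any(trials) for trials in odors.values())
    )
    return responders / len(per_by_fly)


def per_probability(per_by_fly, odor):
    """Fraction of all trials with the given odor in which PER occurred."""
    trials = [t for odors in per_by_fly.values() for t in odors.get(odor, [])]
    if not trials:
        raise ValueError(f"no trials recorded for odor {odor!r}")
    return sum(trials) / len(trials)


# Hypothetical example: four flies, two odors, two trials per odor.
example = {
    "fly1": {"odorA": [True, False], "odorB": [False, False]},
    "fly2": {"odorA": [False, False], "odorB": [False, True]},
    "fly3": {"odorA": [False, False], "odorB": [False, False]},
    "fly4": {"odorA": [True, True], "odorB": [False, False]},
}

print(fraction_of_responding_flies(example))
print(per_probability(example, "odorA"))
```

      In this toy data set the per-trial probability for a single odor is well below the fraction of flies that respond to at least one odor, which is the pattern the response describes.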

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      Minor comments/suggestions: 

      - Define "MO" in Figure 1D.  

      We have defined it as mineral oil in the figure legend.

      - Clarify how peak response was calculated for GCaMP traces (is it just the single highest frame per trial?).

      We extended the description in the Methods as follows: “The peak stimulus response was quantified by averaging ΔF/F across five frames at the peak, followed by averaging across three trials for each stimulus. Odor stimulation began at frame 11, and the frames used for peak quantification were 12 to 16.” We made sure that information about the image acquisition frame rate was provided earlier in the text.
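
      The quoted procedure can be sketched as follows. This assumes that the frame numbers in the description are 1-indexed and that each trial is already a ΔF/F trace; the function name and the toy traces are hypothetical.

```python
from statistics import mean


def peak_response(trials, first_frame=12, last_frame=16):
    """Average dF/F over frames first_frame..last_frame, then over trials.

    Frame numbers are treated as 1-indexed and inclusive (an assumption),
    so frames 12 to 16 correspond to Python indices 11 to 15.
    """
    per_trial_peaks = [mean(trial[first_frame - 1:last_frame]) for trial in trials]
    return mean(per_trial_peaks)


def toy_trace(amplitude, n_frames=20):
    """Hypothetical trace: zero everywhere except frames 12 to 16."""
    return [amplitude if 12 <= frame <= 16 else 0.0 for frame in range(1, n_frames + 1)]


# Three hypothetical trials for one stimulus.
three_trials = [toy_trace(1.0), toy_trace(2.0), toy_trace(3.0)]
print(peak_response(three_trials))
```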

      - Clarify how the labellum was covered in Figure 3 and show that this does not affect the fly's ability to do PER (e.g., test PER to sugar stimulation on tarsus) - otherwise one might think that gluing the labella could affect PER.

      In Figure 3, only calcium responses were recorded, and PER was not recorded simultaneously from the same flies. To ensure stable recording from GRN axons in the SEZ, we kept the fly’s proboscis in an extended position as gently as possible using a strip of parafilm. In some of the imaging experiments, we covered the labellum with UV curable glue, whose purpose was not to fix the labellum in an extended position but to prevent the odors from interacting with GRNs on the labellum. We have added text to the Methods to explain how we covered the labellum.

      - Clarify how the coefficients for the linear equation were chosen in Figure 3G.  

      We used linear regression (implemented in Python using scikit-learn) to model the relationship between neural activity and behavior, aiming to predict the PER duration based on the calcium responses of two GRN types, Gr5a and Gr66a. The coefficients were estimated using the LinearRegression function. We added this description to the Methods. 

      - Typo in "L-type", Figure 4A.  

      We thank the reviewer for pointing out this error and have corrected it.

      - Clarify over what time period ephys recordings were averaged to obtain average responses.

      We have modified the description in the Methods as follows: “The average firing rate was quantified by using the spikes generated between 200 and 700 ms after the stimulus contact, following the convention to avoid contamination by motion artifacts (Dahanukar and Benton, 2023; Delventhal et al., 2014; Hiroi et al., 2002).”
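
      The windowed firing rate described in the quoted Methods text can be sketched as below. Spike times are assumed to be given in milliseconds relative to stimulus contact, and the window is treated as half-open (start included, end excluded), which is a choice made for this sketch and not stated in the source.

```python
def mean_firing_rate(spike_times_ms, start_ms=200.0, end_ms=700.0):
    """Firing rate in spikes/s within [start_ms, end_ms) after stimulus contact."""
    if end_ms <= start_ms:
        raise ValueError("end_ms must be greater than start_ms")
    n_spikes = sum(1 for t in spike_times_ms if start_ms <= t < end_ms)
    window_s = (end_ms - start_ms) / 1000.0
    return n_spikes / window_s


# Hypothetical spike times (ms after contact); early spikes are ignored
# because they fall inside the period prone to contact artifacts.
spikes = [50, 210, 300, 450, 699, 700, 900]
print(mean_firing_rate(spikes))
```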

      - The data and statistics indicate that MCH does not enhance feeding in Figure 6G, so the text in lines 207-208 is not accurate.

      We have modified the text as follows: “A similar result was observed with ethyl butyrate, and a slight, although not significant, increase was also observed with 4-methylcyclohexanol (Figure 6G).”

      - P-value for Figure S9 correlation is not reported.  

      We thank the reviewer for pointing this out. The p-value is 0.00044, and we have added it to the figure legend (current Figure 5-figure supplement 1).

      Reviewer #2 (Recommendations for the authors): 

      Honestly, I have no recommendations for improvement. The manuscript is extremely well-written and logical. The experiments are persuasive. A lapidary piece of work.

      We appreciate the reviewer for the positive assessment of our work.

      Reviewer #3 (Recommendations for the authors): 

      - I suggest explaining the rationale for selecting a 4-second interval, beginning 1 second after the onset of stimulation.

      Integrated PER duration was defined as the sum of PER duration over 4 s starting 1 s after the odor onset. This definition was set based on the following data.

      (1) We used a photoionization detector (PID) to measure the actual time that the odor reaches the position of a tethered fly, which was approximately 1.1 seconds after the odor valve was opened. Therefore, we began analyzing PER responses 1 second after the odor onset (valve opening) to align with the actual timing of stimulation.

      (2) As shown in Fig.1D and 1F, the majority of PER occurred within 4 s after the odor arrival.

      We have now added the above rationale in the Methods.
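
      The definition above can be sketched as a sum over a per-frame record of whether the proboscis is extended. The frame rate, the boolean representation, and all names below are assumptions for illustration; the source only specifies the 1 s delay and the 4 s window.

```python
def integrated_per_duration(extended, frame_rate_hz, odor_onset_s,
                            delay_s=1.0, window_s=4.0):
    """Total time (s) the proboscis is extended within the analysis window.

    extended      : list of booleans, one per video frame (True = extended)
    odor_onset_s  : time of odor valve opening, in seconds from video start
    The window starts delay_s after valve opening and lasts window_s.
    """
    start = round((odor_onset_s + delay_s) * frame_rate_hz)
    stop = round((odor_onset_s + delay_s + window_s) * frame_rate_hz)
    return sum(extended[start:stop]) / frame_rate_hz


# Hypothetical 10 s recording at 10 frames/s with valve opening at 2 s.
frames = [False] * 100
for i in range(25, 40):   # extension that starts before the window opens
    frames[i] = True
for i in range(65, 75):   # extension that outlasts the window
    frames[i] = True

print(integrated_per_duration(frames, frame_rate_hz=10, odor_onset_s=2.0))
```

      Only the parts of each extension that fall between 3 s and 7 s contribute, so extensions that begin before odor arrival or continue past the window are clipped.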

      - I could not find the statistical analysis for Figures 1E and 1G. If these figures are descriptive, I suggest the authors revise the sentences: 'Unexpectedly, we found that the odors alone evoked repetitive PER without an application of a tastant (Figures 1D-1G, and Movie S1). Different odors evoked PER with different probability (Figure 1E), latency (Figure S1A), and duration (Figures 1F, 1G, and S2)'.

      We have added the results of statistical analysis to the figure legend.

      - In Figure 2, the authors performed a Scheirer-Ray-Hare test, which, to my knowledge, is a nonparametric test for comparing responses across more than two groups with two factors. If this is the case, please provide the p-values for both factors and their interaction.

      We now show the p-values for both factors, odor and group as well as their interaction in the supplementary table. 

      - In line 83, I suggest the authors avoid claiming that 'these data show the olfactory system modulates but is not required for odor-evoked PER,' as they are inhibiting most, but not all olfactory receptor neurons. In this regard, is it possible to measure the olfactory response to odors in these flies?  

      We thank the reviewer for the comment. Because Orco-Gal4 does not label all the ORNs, and because we have more concrete results on flies in which all the olfactory organs are removed as well as flies in which specific GRNs and Grs are manipulated, we decided to remove the data for Orco>Kir2.1 flies and have updated the text and Figure 2 accordingly.

      - In Figure 2, I wonder if there are differences in the contribution of various receptors in detecting different odors. A more detailed statistical analysis might help address this question.

      Although it might be possible to infer the contribution of different gustatory receptors by constructing a quantitative model to predict PER, this is difficult because, except for Gr5a, it is the activity of individual GRNs and not of Grs that is manipulated in Figure 2. The idea could be tested in the future by more systematically manipulating the many Grs that are encoded in the fly genome.

      - For Figures 2J-L, please clarify which group serves as the control.  

      We have added this information to the legend. 

      - In Figure 3, I recommend including an air control in panels D and F to better appreciate the magnitude of the response under these conditions.

      The responses to all three controls, air, mineral oil and water, were almost zero. As the other reviewer suggested to present trial-to-trial variability as well, we now show responses to all the controls in all the trials in all the animals tested in Figure 3-figure supplement 2.

      - I had difficulty understanding Figure 3G. Could the authors provide a more detailed explanation of the model?

      We used linear regression (implemented in Python using scikit-learn) to model the relationship between neural activity and behavior, aiming to predict the PER duration based on the calcium responses of two GRN types, Gr5a and Gr66a. The weights for GRNs were estimated using the LinearRegression function. The weights for Gr5a and Gr66a were positive and negative, respectively, indicating that Gr5a contributes to enhancing PER whereas Gr66a contributes to reducing it.

      To evaluate the model performance, we calculated the coefficient of determination (R<sup>2</sup>), which was 0.81, meaning the model explained 81% of the variance in the PER data.

      The scatter plot in Fig. 3G shows the predicted PER duration (y-axis) plotted against the actual PER duration (x-axis); the tight relationship between the two demonstrates the strong predictive power of the model.

      We added the details to the Methods.
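
      For readers unfamiliar with this kind of model, the following is a minimal sketch of a two-predictor ordinary least squares fit and its coefficient of determination. It is not the authors' code: they report using scikit-learn's LinearRegression, whereas this dependency-free version solves the same least squares problem directly, and the calcium and PER values are synthetic placeholders generated from known weights.

```python
from statistics import mean


def fit_two_predictor_ols(x1, x2, y):
    """Least squares fit of y = w1*x1 + w2*x2 + b; returns (w1, w2, b)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    # Sums of squares and cross-products of the centered variables.
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are collinear or constant")
    # Solve the 2x2 normal equations by Cramer's rule.
    w1 = (s1y * s22 - s2y * s12) / det
    w2 = (s2y * s11 - s1y * s12) / det
    b = my - w1 * m1 - w2 * m2
    return w1, w2, b


def r_squared(y, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    my = mean(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_pred))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic placeholders: one value per odor for each GRN type.
gr5a_response = [0.1, 0.4, 0.8, 1.2, 0.6]
gr66a_response = [0.5, 0.2, 0.9, 0.3, 0.7]
# Generated with a positive Gr5a weight and a negative Gr66a weight.
per_duration = [2.0 * a - 1.0 * b + 0.5 for a, b in zip(gr5a_response, gr66a_response)]

w_gr5a, w_gr66a, intercept = fit_two_predictor_ols(gr5a_response, gr66a_response, per_duration)
predicted = [w_gr5a * a + w_gr66a * b + intercept
             for a, b in zip(gr5a_response, gr66a_response)]
print(w_gr5a, w_gr66a, intercept, r_squared(per_duration, predicted))
```

      Because the synthetic data contain no noise, the fit recovers the generating weights and R<sup>2</sup> is essentially 1; with real data, the signs of the fitted weights and an R<sup>2</sup> below 1 (0.81 in the response above) are what would be interpreted.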

      - In Figure S4a, the reported p-value is 0.88, which seems to be a typo, as the text indicates that PER is enhanced in a starved state.

      Thank you for pointing this out. We have modified the figure legend to describe that PER was enhanced in a starved state only for the experiments conducted with odors at 10<sup>-1</sup> concentration (current Figure 2-figure supplement 1).

    1. eLife Assessment

      This study reports important new insights into the roles of a long noncoding RNA, lnc-FANCI-2, in the progression of cervical cancer induced by a type of human papillomavirus. Through a blend of cell biological, biochemical, and genetic analyses of RNA and protein expression, protein-protein interaction, cell signaling, and cell morphology, the authors provide convincing evidence that lnc-FANCI-2 affects cervical cancer outcome by regulating the RAS signaling pathway. These findings will be of interest to scientists in the fields of cervical cancer, long noncoding RNA, and cell signaling.

    2. Reviewer #1 (Public review):

      Summary:

      The authors attempted to dissect the function of a long non-coding RNA, lnc-FANCI-2, in cervical cancer. They profiled lnc-FANCI-2 in different cell lines and tissues, generated knockout cell lines, and characterized the gene using multiple assays.

      Strengths:

      A large body of experimental data has been presented and can serve as a useful resource for the scientific community, including transcriptomics and proteomics datasets. The reported results also span different parts of the regulatory network and open up multiple avenues for future research.

      Weaknesses:

      The write-up is somewhat unfocused and lacks deep mechanistic insights in some places.

      Comments on revisions:

      The manuscript is much improved. I am satisfied with the authors' responses.

    3. Reviewer #3 (Public review):

      Summary:

      This group previously reported that a long noncoding RNA, lnc-FANCI-2, is regulated by the HPV E7 oncoprotein and a cellular transcription factor, YY1. The current study shows that the function of lnc-FANCI-2 in HPV-16-positive cervical cancer is to intrinsically regulate RAS signaling, thereby furthering our understanding of additional cellular alterations during HPV oncogenesis. The authors used advanced technical approaches such as KO, transcriptome and (IRPCRP) and LC-MS/MS analyses in the current study and concluded that lnc-FANCI-2 KO significantly increases RAS signaling, especially phosphorylation of Akt and Erk1/2.

      Strengths:

      (1) HPV E6E7 are required for full immortalization and maintenance of malignant phenotype of cervical cancer, but they are NOT sufficient for full transformation and tumorigenesis. This study helps further the understanding of other cellular alterations in HPV oncogenesis.

      (2) lnc-FANCI-2 is upregulated in cervical lesion progression from CIN1, CIN2-3 to cervical cancer, cancer cell lines and HPV transduced cell lines.

      (3) Viral E7 of high-risk HPVs and host transcription factor YY1 are two major factors promoting lnc-FANCI-2 expression.

      (4) Proteomic profiling of cytosolic and secreted proteins showed inhibition of MCAM, PODXL2 and ECM1 and increased levels of ADAM8 and TIMP2 in KO cells.

      (5) RNA-seq analyses revealed that KO cells exhibited significantly increased RAS signaling but decreased IFN pathways.

      (6) Increased phosphorylated Akt and Erk1/2, IGFBP3, MCAM, VIM, and CCND2 (cyclin D2) and decreased RAC3 were observed in KO cells.

      Comments on revisions:

      The revised manuscript has been significantly improved. The authors addressed all my concerns.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors attempted to dissect the function of a long non-coding RNA, lnc-FANCI-2, in cervical cancer. They profiled lnc-FANCI-2 in different cell lines and tissues, generated knockout cell lines, and characterized the gene using multiple assays.

      Strengths:

      A large body of experimental data has been presented and can serve as a useful resource for the scientific community, including transcriptomics and proteomics datasets. The reported results also span different parts of the regulatory network and open up multiple avenues for future research.

      Thanks for your positive comments on the strengths.

      Weaknesses:

      The write-up is somewhat unfocused and lacks deep mechanistic insights in some places.

      Because lnc-FANCI-2 is a novel lncRNA whose function had never been explored, our finding that it regulates RAS signaling is new. Thus, this report focuses on lnc-FANCI-2 and the RAS signaling pathway but also includes some important screening data, which help our readers understand how we arrived at RAS signaling.

      Reviewer #2 (Public review):

      The study by Liu et al provides a functional analysis of lnc-FANCI-2 in cervical carcinogenesis, building on their previous discovery of FANCI-2 being upregulated in cervical cancer by HPV E7.

      The authors conducted a comprehensive investigation by knocking out (KO) FANCI-2 in CaSki cells and assessing viral gene expression, cellular morphology, altered protein expression and secretion, altered RNA expression through RNA sequencing (verification of which by RT-PCR is well appreciated), protein binding, etc. Verification experiments by RT-PCR, Western blot, etc are notable strengths of the study.

      The KO and KD were related to increased Ras signaling and EMT and reduced IFN-γ/α responses.

      Thanks for your positive comments. It did take us a few years to reach this scientific point for understanding of lnc-FANCI-2 function.

      Although the large amount of data is well acknowledged, it is a limitation that most data come from CaSki cells, in which FANCI-2 localization is different from SiHa cells and cancer tissues (Figure 1). The cytoplasmic versus nuclear localization is somewhat puzzling.

      Regarding lnc-FANCI-2 localization, it can be both cytoplasmic and nuclear in cervical cancer tissues, HPV16- or HPV18-infected keratinocytes, and the HPV16+ cervical cancer cell line CaSki, which contains multiple integrated HPV16 DNA copies. Surprisingly, it is mostly detected in the nucleus in HPV16+ SiHa cells, which contain only one copy of integrated HPV16 DNA (Yu, L., et al. mBio 15: e00729-24, 2024). Regardless, knockdown of lnc-FANCI-2 expression in SiHa cells induces RAS signaling, leading to an increase in the expression of p-AKT and p-Erk1/2 (suppl. Fig. S6B).

      Reviewer #3 (Public review):

      Summary:

      This group previously reported that a long noncoding RNA, lnc-FANCI-2, is regulated by the HPV E7 oncoprotein and a cellular transcription factor, YY1. The current study shows that the function of lnc-FANCI-2 in HPV-16-positive cervical cancer is to intrinsically regulate RAS signaling, thereby furthering our understanding of additional cellular alterations during HPV oncogenesis. The authors used advanced technical approaches such as KO, transcriptome and (IRPCRP) and LC-MS/MS analyses in the current study and concluded that lnc-FANCI-2 KO significantly increases RAS signaling, especially phosphorylation of Akt and Erk1/2.

      Strengths:

      (1) HPV E6E7 are required for full immortalization and maintenance of the malignant phenotype of cervical cancer, but they are NOT sufficient for full transformation and tumorigenesis. This study helps further understanding of other cellular alterations in HPV oncogenesis.

      (2) lnc-FANCI-2 is upregulated in cervical lesion progression from CIN1, CIN2-3 to cervical cancer, cancer cell lines, and HPV transduced cell lines.

      (3) Viral E7 of high-risk HPVs and host transcription factor YY1 are two major factors promoting lnc-FANCI-2 expression.

      (4) Proteomic profiling of cytosolic and secreted proteins showed inhibition of MCAM, PODXL2, and ECM1 and increased levels of ADAM8 and TIMP2 in KO cells.

      (5) RNA-seq analyses revealed that KO cells exhibited significantly increased RAS signaling but decreased IFN pathways.

      (6) Increased phosphorylated Akt and Erk1/2, IGFBP3, MCAM, VIM, and CCND2 (cyclin D2) and decreased RAC3 were observed in KO cells.

      Thanks for your positive comments. It has taken us almost nine years to reach this point to gradually understand lnc-FANCI-2 functions, which are more complex than our initial thoughts.  

      Weaknesses:

      (1) The authors observed increased lnc-FANCI-2 in HPV16- and HPV18-transduced cells and in cervical cancer tissues; however, HPV-18-positive HeLa cells exhibited a different expression pattern of lnc-FANCI-2.

      Both HPV16 and HPV18 infections induce lnc-FANCI-2 expression in keratinocytes (Liu H., et al. PNAS, 2021). However, HPV18+ cervical cancer cell lines HeLa and C4II cells (Figure S1A and S1B) do not express lnc-FANCI-2 as we see in HPV-negative cell lines such as HCT116, HEK293, HaCaT, and BCBL1 cells. Although we don’t know why, our preliminary data show that the lnc-FANCI-2 promoter functions well and is sensitive to YY1 binding in lnc-FANCI-2 expressing CaSki and C33A cells in our dual luciferase assays but is much less sensitive to YY1 binding in HeLa and HCT116 cells, indicating some unknown cellular factors negatively regulating lnc-FANCI-2 promoter activity.

      Author response image 1.

      A firefly luciferase (FLuc) reporter containing either the wild-type (−600 wt) or YY1-binding-site-mutated lnc-FANCI-2 promoter was evaluated for promoter activity in CaSki, HeLa, C33A, and HCT116 cells, with Renilla luciferase (RLuc) activity driven by a TK promoter serving as an internal control. The two YY1-binding motifs (A and B), with an X marking each mutation, are illustrated in the diagram on the right.

      (2) Previous studies and data in the current study showed a steady increase in lnc-FANCI-2 during cancer progression; however, the authors did not observe significant changes in cell behavior (both morphology and proliferation) in lnc-FANCI-2 KO cells.

      Thanks. We do see decreases in cell proliferation, colony formation, and cell migration, accompanied by increased cell senescence, in the lnc-FANCI-2 KO cells relative to the parental WT cells. These data are now added to the revised Fig. 1 and the revised supplemental Fig. S3.

      (3) The authors observed the significant changes of RAS signaling (downstream) in KO cells, but they provided limited interpretations of how these results contributed to full transformation or tumorigenesis in HPV-positive cancer.

      As stated in the title, lnc-FANCI-2 intrinsically restricts RAS signaling and phosphorylation of Akt and Erk in HPV16-infected cervical cancer. Presumably, high RAS-AKT-ERK signaling inhibits tumor cell survival due to senescence induction, as we show in our new Figure 1 and supplemental Fig. S3. A similar observation was reported in a lung cancer study (Patricia Nieto, et al. Nature 548: 239-243, 2017).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) A major issue is that parts of the manuscript read like a collection of experimental results. However, some of the results do not contribute directly to the central story. Besides confusing the reader, the large amount of apparently disparate results can raise more questions. For example:

      a) Why is lnc-FANCI-2 highly expressed in HPV16-infected cervical cancer cell lines (but not in HPV18-infected cells)?

      b) How do p53 and RB repress the expression of lnc-FANCI-2?

      c) What regulates the sub-cellular localization of lnc-FANCI-2?

      d) How does lnc-FANCI-2 negatively regulate RAS signalling?

      e) How does MAP4K4 bind to lnc-FANCI-2?

      f) Do lnc-FANCI-2 and MAP4K4 require each other to regulate RAS signalling?

      g) How does RAS signalling regulate the transcription of MCAM and IGFBP3?

      h) How does MCAM feedback on RAS? Do the different MCAM isoforms impact on RAS signalling differently?

      i) How does IGFBP3 feedback on ERK but not AKT?

      j) How do the other mentioned proteins like ADAM8 fit into the regulatory network?

      k) Each question will require a lot more work to address. I think it would be good if the authors could think through carefully what the key message(s) in the current manuscript should be and then present a more focused write-up.

      Thanks for the critical comments. Because this study is the first to explore lnc-FANCI-2 functions, we would like to be comprehensive. We believe these data are important to guide any future studies. We really appreciate our reviewer listing many questions (a to k) related to HPV infection, cell biology, RAS signaling, and cancer biology. Addressing each question satisfactorily would require a separate study, but fortunately, our report points in such a direction with some preliminary data for future studies. Below are our responses to each question from a to k:

      a) Both HPV16 and HPV18 infection induce lnc-FANCI-2 expression in keratinocytes (Liu H., et al. PNAS, 2021). However, the HPV18+ cervical cancer cell lines HeLa and C4II (Figure S1A and S1B) do not express lnc-FANCI-2, similar to what we see in HPV-negative cell lines such as HCT116, HEK293, HaCaT, and BCBL1 cells. Although we don’t know why, our preliminary data show that the lnc-FANCI-2 promoter functions well and is sensitive to YY1 binding in lnc-FANCI-2-expressing CaSki and C33A cells but is much less sensitive to YY1 in HeLa and HCT116 cells, indicating that some unknown cellular factors negatively regulate lnc-FANCI-2 promoter activity.

      b) We don’t know whether p53 and pRB could repress the expression of lnc-FANCI-2, although C33A cells bearing a mutant p53 and a mutant pRB express a high amount of lnc-FANCI-2. However, KD of E2F1 had no effect on lnc-FANCI-2 promoter activity in CaSki cells (Liu, H., et al. PNAS, 2021).

      c) RNA cellular localization can be affected by many factors, including splicing, export, and polyadenylation. As lnc-FANCI-2 is a long non-coding RNA, the regulation of its cellular localization could be more complicated than that of mRNAs and thus could be a future research direction.

      d) The conclusion that lnc-FANCI-2 negatively regulates RAS signaling is based on both lnc-FANCI-2 KO and KD studies. Please see the proposed hypothetical model in Figure 8E.

      e) The MAP4K4 binding to lnc-FANCI-2 was demonstrated by our IRPCRP-Mass spectrometry (Fig. 8A and 8C), although the exact binding site on lnc-FANCI-2 was not explored. As you probably know, many enzymes have turned out to be RNA-binding proteins (Castello A., et al. Trends Endocrinol. Metab. 26: 746-757, 2015; Hentze MW., et al. Nat. Rev. Mol. Cell Biol. 19: 327-341, 2018).

      f) Yes, they rely slightly on each other in regulating RAS signaling. We found that KD of MAP4K4 in parental CaSki cells (Figure 8D) had a greater effect on RAS signaling (MCAM, IGFBP3, p-Akt) than KD in lnc-FANCI-2 KO ΔPr-A9 cells. In contrast, the latter displayed more p-Erk1/2 than that induced by KD of lnc-FANCI-2 in the parental CaSki cells (Figure S7C).

      g) We believe RAS signaling most likely regulates the transcription of MCAM and IGFBP3 through phosphorylated transcription factors (Figure 8E diagram).

      h) As a signal molecule with at least 13 ligands/coreceptors (Joshkon A., et al. Biomedicines 8: 633, 2020), the increased MCAM appears to sustain RAS signaling (Fig. 7J and Fig. 8E). We assume that the full-length cytoplasmic MCAM plays a predominant role in RAS signaling because it is more abundant than the cleaved nuclear MCAM, which lacks both the transmembrane and cytoplasmic regions. In addition, RAS signaling mainly occurs in the cytosol.

      i) The exact mechanism remains unknown. Lnc-FANCI-2 KO cells exhibit high expression levels of IGFBP3 RNA and protein and of p-Erk1/2, but not so much of p-Akt, possibly because IGFBP3 regulates MAPK for Erk phosphorylation but has much less effect on PI3K for Akt phosphorylation.

      j) The dysregulation of RAS signaling and ADAM protein activity is implicated in various cancers. ADAM proteins can modulate RAS signaling by cleaving and releasing ligands that activate or inactivate RAS-related pathways (Schafer B., et al. JBC 279: 47929-38, 2004; Ohtsu H., et al. Am J Physiol Cell Physiol 291: C1-C10, 2006; Dang M, et al. JBC 286: 17704-17713, 2011; Kleino I, et al. PLoS One 10: e0121301, 2015). Some ADAM proteins are involved in the migration and invasion of cancer cells, and their loss can promote the degradation of KRAS (Huang Y-K., et al. Nat Cancer 5: 400-419, 2024). In this revision, we have included a brief discussion of ADAMs and RAS signaling.

      k) We agree with our reviewer that each question will require a lot more work to address. However, as this study explores lnc-FANCI-2 function for the first time, we prefer to retain all of the data that we selected for this write-up. We hope reviewer 1 will be satisfied with our responses to questions a to j.

      (2) Figures S1A & S1C - Replicates are needed.

      Yes, we have repeated all of the experiments. The quantification shown in Figure S1A and S1C was performed in triplicate, and error bars have been added to the updated figure.

      (3) Figure S1D - There seems to be some lnc-FANCI-2 RNA in the nucleus of CaSki cells as well. Please quantify the relative amount of lnc-FANCI-2 in the nucleus vs cytoplasm.

      Yes, a small fraction of lnc-FANCI-2 is in the nucleus of CaSki cells, as we reported (Liu H., et al. PNAS, 2021, Movies S1 and S2). We quantified the relative amount of lnc-FANCI-2 in the nucleus vs cytoplasm by fractionation and RT-qPCR in Figure S1C.

      (4) Figure S2B - (a) For ΔPr-A9 cells, it looks like there is an increase in E6 and a decrease in E7, instead of "little change" as the authors claimed. (b) I suggest checking the protein levels for all the control and KO clones.

      Thanks for the questions. We had some variation in E6 and E7 detection, and the submitted blot was one representative example. We regrew the lnc-FANCI-2 KO clones A9 and B3 and reexamined the expression of HPV16 E6/E7 proteins and their downstream targets, p53 and E2F1. As shown in new Figure S3A expt II, we again saw some variation in detection (~20-30%), and this variation is not reflected in a noticeable change in their downstream targets. Thus, we do not consider these changes significant enough to draw a conclusion in our study; they most likely arise from sampling variation in the assays.

      (5) In the Proteome Profiler Human sReceptor Array analysis, multiple proteins were highlighted as having at least 30% change. But it is unclear how they relate to RAS signaling.

      Thanks for this comment. Cellular soluble receptors are essential for RAS signaling, the EMT pathway, and IFN responses. For example, the dysregulation of RAS signaling and ADAM protein activity is implicated in various cancers. ADAM proteins can modulate RAS signaling by cleaving and releasing ligands that activate or inactivate RAS-related pathways (Schafer B., et al. JBC 279: 47929-38, 2004; Ohtsu H., et al. Am J Physiol Cell Physiol 291: C1-C10, 2006; Dang M, et al. JBC 286: 17704-17713, 2011; Kleino I, et al. PLoS One 10: e0121301, 2015). Some ADAM proteins are involved in the migration and invasion of cancer cells, and their loss can promote the degradation of KRAS (Huang Y-K., et al. Nat Cancer 5: 400-419, 2024). In this revision, we have included a brief discussion of ADAMs and RAS signaling.

      (6) Does knockdown of MAP4K4 lead to an increase in MCAM and IGFBP3?

      Yes, MAP4K4 KD in parental WT CaSki cells does lead to an increase in MCAM (~70%) and IGFBP3 (~30%), similar to the knockdown of lnc-FANCI-2, as shown in the revised Figure 8D.

      Minor comments:

      (7) In the opinion of this reviewer the title is somewhat unwieldy.

      Thanks. We have shortened the title to “The lnc-FANCI-2 intrinsically restricts RAS signaling in HPV16-infected cervical cancer”.

      (8) The abstract can be more focused and doesn't have to mention so many gene names. In fact, the significance paragraph works better as an abstract. For the significance, the authors can provide another write-up on the implications of their research instead.

      Thanks. We have revised the abstract and added the implications of this research.

      (9) The last sentence of the introduction feels a little abrupt. It would be good to elaborate a little more on the key findings.

      Thanks for this critical comment. We have revised the sentence as follows: In this report, we demonstrate that lnc-FANCI-2 in HPV16-infected cells controls RAS signaling by interaction with MAP4K4 and other RNA-binding proteins. Ablation of lnc-FANCI-2 in the cells promotes RAS signaling and phosphorylation of Akt and Erk. High levels of lnc-FANCI-2 and low levels of MCAM expression in cervical cancer patients correlate with improved survival, indicating that lnc-FANCI-2 plays a critical role in regulating RAS signaling to affect cervical cancer progression and patient outcomes.

      (10) Typo on line 191: Should be ADAM8 and not ADMA8.

      Corrected.

      Reviewer #2 (Recommendations for the authors):

      The paper contains a vast amount of data and would greatly benefit from an expanded version of the schematic of Figure 8E summarizing the main results. Including additional details on FANCI-2 regulation by HPV (primarily from previous studies) and its implications for HPV16-driven carcinogenesis would provide a more comprehensive overview.

      Thanks for the suggestion. We have modified our Figure 8E to include HR-HPV E7 and YY1 in regulation of lnc-FANCI-2 transcription.

      Further specific comments:

      (1) The introduction may be shortened to increase readability (e.g. lines 77-90; 94-105).

      We have shortened the introduction by deleting lines 94-105 of our initial submission.

      (2) Lines 55-57 the number of cervical cancer diagnoses and mortality need to be updated to the latest literature. The reference is from 2012.

      Thanks. We have revised and updated accordingly with a new citation (Bray F., et al: Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 74, 229-263 (2024))

      (3) Line 61: Progression rate of CIN3 is incorrect (31% in 30 years according to reference 5).

      Thanks. Corrected.

      (4) Lines 108-112 are difficult to understand and should be rewritten.

      Thanks. Revised accordingly.

      (5) Line 116 Is this correct or should 'but' be 'and'?

      Thanks. Corrected accordingly.

      (6) Figure 1A top: The difference between cervical cancer and normal areas is hard to see in the top figure. The region labeled as "normal" does not resemble typical differentiating epithelium or normal glandular epithelium, though this is difficult to assess accurately from the image provided. I suggest adding HE staining and also the histotypes.

      We have added an H&E staining panel in the corresponding region to Figure 1A, which clearly shows the normal and cancer regions. Both cervical cancer tissues were cervical squamous cell carcinoma.

      (7) HFK-HPV16 & 18 cells (Figure 1B) are not described in the Materials & Methods.

      Thanks. We revised our Materials and Methods by citing our two previous publications.

      (8) Figure 2E (RNA scope on FANCI-2 KO) only shows 2 to 3 cells, which makes it somewhat difficult to assess downregulated expression in the KO. I suggest replacing these with pictures showing more cells (i.e. >10) to strengthen the results.

      We have replaced the image in Figure 2E to include more cells.

      (9) The spindle-like morphology in deltaPr-A9 cells shown in FigS2A is not very distinct. Including images at higher magnification could help clarify this feature.

      Good comment. We have enlarged the images for a better view and revised the text.

      (10) Both protein and RNA expression analysis have been performed on WT CaSki cells and FANCI-2 KO cells. If I am correct there is little overlap between the significantly changed gene products. What does this mean? Have you looked into the comparison?

      The DEGs identified from RNA-seq indicated a genome-wide transcriptome change, while the protein array we used covered only 105 soluble protein receptors. However, we did find that 9/15 (60%) membrane proteins in cell lysates (PODXL2, ECM1, NECTIN2, MCAM, ADAM9, CDH5, ADAM10, ITGA5, NOTCH1, SCARF2, ADAM8, TIMP2, LGALS3BP, CDH13, and ITGB6) exhibited consistent changes in expression (underlined) by both RNA-seq and protein array assays. We have revised the text with this information (page 11). The other six proteins (40%) showed inconsistent expression changes between the two assays, which could be due to post-translational mechanisms such as protein stability, modification, and secretion.

      (11) Figure S7, which represents TCGA data and survival is quite complex. It would be more effective to display a similar figure for FANCI-2, as was done for MCAM in Figure 7I, to simplify the comparison and enhance clarity.

      Thanks. However, the suggested figure for lnc-FANCI-2 was already published in our PNAS paper (Liu H., et al. PNAS, 2021). Figure S8 in this revision is the result from our in-house GradientScanSurv pipeline, a new way to correlate expression and survival more accurately.

      What do the Figures look like if you analyse only HPV16+ patients versus HPV18+ patients, considering that FANCI-2 upregulation in cell lines is related to HPV16 and not 18? Is there an effect of histotype? Or tumor stage?

      HPV18-infected keratinocytes express a high level of lnc-FANCI-2. The two HPV18+ cell lines HeLa and C4II, as well as HPV-negative cell lines such as HCT116, do not express lnc-FANCI-2, which could be due to the presence of some unknown repressive factors. In our dual luciferase assays, we found that the lnc-FANCI-2 promoter responds well to YY1 binding in CaSki and C33A cells, which express lnc-FANCI-2, but does not do so in HeLa and HCT116 cells.

      (12) It remains puzzling that FANCI-2 upregulation was previously shown to already occur in CIN lesions and increase further in cervical cancer, while the current data indicate that FANCI-2 suppresses AKT activation. If I am correct Akt activation has been linked to cervical carcinogenesis. Similarly, line 434 states that increased MCAM might promote cervical tumorigenesis, implying that low FANCI-2 would stimulate tumorigenesis. If I understand correctly, the increase in FANCI-2 observed in CIN lesions would reflect a "brake" on the carcinogenic pathway and its sustained increase in cancer might indicate that growth is still (partly) controlled. As mentioned earlier, a Figure illustrating the relation between FANCI-2, HPV, and the carcinogenic process would be beneficial for clarity.

      Yes. Increased MCAM, together with a low level of lnc-FANCI-2, correlates with poor cervical cancer survival. We have revised Figure 8E to better illustrate this relation.

      (13) May part of the potentially conflicting findings be explained by CaSki cells being of metastatic origin? Related to this, does the expression of FANCI-2 or MCAM depend on the tumor stage?

      Thanks for this important suggestion. Unfortunately, we found that the expression of lnc-FANCI-2 and MCAM is not associated with cervical cancer stage based on the TCGA data (http://gepia.cancer-pku.cn/index.html). See the data below:

      Author response image 2.

      Despite some lingering uncertainty, the extensive experiments conducted using KO and KD cells do provide compelling evidence that lnc-FANCI-2 function is linked to RAS signaling and EMT.

      Thanks for your positive review and instructive comments.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors observed increased lnc-FANCI-2 in HPV16- and HPV18-transduced cells, and in other cervical cancer tissues as well; however, HPV18-positive HeLa cells exhibited different expression of lnc-FANCI-2. I suggest the authors provide more discussion of this difference, for example, HPV genotypes, HPV genome status in host cells, or cell types.

      Thanks. We found that keratinocyte infection with HPV16, HPV18, and other HR-HPVs could induce lnc-FANCI-2 expression (Liu H., et al. PNAS, 2021). In this report, we found that HPV18+ HeLa and C4II cells and other HPV-negative cell lines do not express it. Our preliminary data from lnc-FANCI-2 promoter activity assays indicated the presence of negative regulatory factor(s) in cells that do not express lnc-FANCI-2. See the data in Author response image 1.

      We have revised our discussion to include these luciferase data as data not shown.

      (2) I suggest the authors discuss more details on how the changes of RAS signaling in KO cells help our further understanding of the molecular mechanisms for HPV-associated full-cell transformation and malignancy in addition to the well-known functions of HPV E6 and E7.

      Thanks. We have modified Figure 8E as suggested by reviewer 2 and revised the discussion further.

    1. eLife Assessment

      The paper addresses the question of gene epistasis and asks what is the correct null model for which we should declare no epistasis. By reanalyzing synthetic gene array datasets regarding single and double-knockout yeast mutants, and considering two theoretical models of cell growth, the authors reach the valuable conclusion that the product function is a good null model. While the justification of some assumptions is incomplete, the results have the potential to be of value to the field of gene epistasis.

    2. Reviewer #1 (Public review):

      Summary

      Detecting unexpected epistatic interactions between multiple mutations requires a robust null expectation, or neutral function, that predicts the combined effects of multiple mutations on phenotype based on the individual effects of single mutations. This study evaluated the relevance of the product neutrality function, where double-mutant fitness is represented as a multiplicative combination of single-mutant fitness in the absence of epistatic interactions. The authors used a recent large dataset on fitness, specifically yeast colony size, to analyze epistatic interactions.

      The study confirmed that the product function outperformed other neutral functions in predicting double-mutant fitness, showing no bias between negative and positive epistatic interactions. Additionally, in the theoretical portion of the study, the authors employed a previously established theoretical model of bacterial cell growth to simulate growth rates of both single- and double-mutants under multiple parameters. The simulations similarly demonstrated that the product function was superior to other functions in predicting the fitness of hypothetical double-mutants. Based on these findings, the authors concluded that the product function is a robust tool for analyzing epistatic interactions in growth fitness and effectively reflects how growth rates depend on the combination of multiple biochemical pathways.
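      To make the comparison of neutral functions concrete, the following is a minimal sketch rather than the authors' analysis code. It assumes fitness values expressed relative to wild type (wild type = 1.0), uses the standard definitions of the product and additive nulls, and includes a minimum null as one further commonly used alternative that the text above does not name. All function names and numerical values are hypothetical.

```python
# Illustrative sketch of candidate neutrality (null) functions for
# double-mutant fitness. Fitness is relative to wild type (wild type = 1.0).
# Function names and example values are hypothetical, not from the study.

def product_null(w_a: float, w_b: float) -> float:
    """Product neutrality: expected double-mutant fitness is w_a * w_b."""
    return w_a * w_b


def additive_null(w_a: float, w_b: float) -> float:
    """Additive neutrality: fitness defects (1 - w) add up."""
    return w_a + w_b - 1.0


def min_null(w_a: float, w_b: float) -> float:
    """Minimum neutrality: the sicker single mutant sets the expectation."""
    return min(w_a, w_b)


def epistasis(w_ab: float, w_a: float, w_b: float, null=product_null) -> float:
    """Deviation of the observed double-mutant fitness from the null.

    Negative values indicate negative (aggravating) epistasis and
    positive values indicate positive (alleviating) epistasis.
    """
    return w_ab - null(w_a, w_b)


if __name__ == "__main__":
    # Made-up single- and double-mutant fitness values.
    w_a, w_b, w_ab = 0.8, 0.5, 0.4
    for name, null in [("product", product_null),
                       ("additive", additive_null),
                       ("min", min_null)]:
        print(f"{name:8s} expected={null(w_a, w_b):.2f} "
              f"epsilon={epistasis(w_ab, w_a, w_b, null):+.2f}")
```

      In this made-up example the same observed double-mutant fitness is called neutral under the product null but epistatic under the other two, which is why the choice of null function matters for any downstream interaction call.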

      Strength

      By leveraging a previously published large dataset of yeast colony sizes for single- and double-knockout mutants, this study validated the relevance of the product function, which has frequently been used in genetics to analyze epistatic interactions. The confirmation that the product function provides a more reliable prediction of double-mutant fitness compared to other neutral functions is valuable for researchers analyzing epistatic interactions, particularly those working with the same dataset.

      Notably, this dataset has been previously used in studies exploring epistatic interactions with the product neutrality function. This study's findings affirm the validity of using the product function, which could enhance confidence in the conclusions drawn by those earlier studies. Consequently, both researchers utilizing this dataset and readers of prior research will benefit from the confirmation provided by this study.

      Weakness

      This study contains several serious problems, primarily stemming from the following issues: ignoring the substantial differences in the mechanisms regulating cell growth between prokaryotes and eukaryotes and adopting an overly specific and unrealistic set of assumptions in the mutation model. Below, the details are discussed.

      (1) Misapplication of prokaryotic growth models

      The mechanistic origin of the multiplicative model observed in yeast colony fitness is explained using a bacterial cell growth model. However, there is no valid justification for linking these two systems. The bacterial growth model, the Scott-Hwa model, relies heavily on specific molecular mechanisms, such as ppGpp-mediated regulation, which adjusts ribosome expression and activity during translation. In particular, this mechanism is critical to ensure the growth dependence of the ribosomal fraction of the proteome in the Scott-Hwa model [https://doi.org/10.1111/j.1462-2920.2010.02357.x; https://doi.org/10.1073/pnas.2201585119]. Yeast cells lack this regulatory mechanism, making it inappropriate to directly apply bacterial growth models to yeast.

      The Weiße model is based on a larger set of underlying equations and involves more parameters than the Scott-Hwa model. In the original paper by Weiße et al. (PNAS, 2015), however, the model parameters were fitted solely to experimental data from E. coli, and the model's applicability to yeast was never assessed. In summary, for neither the Scott-Hwa model nor the Weiße model has it been demonstrated that the entire model quantitatively fits experimental data from yeast. A positive correlation between growth rate and RNA/protein ratio, often observed in yeast, supports only a limited portion of either model, and does not constitute validation of the models as a whole.

      (2) Overly specific assumptions in the theoretical model

      The theoretical model assumes that two mutations affect only independent parameters of specific biochemical processes. However, this overly restrictive assumption weakens the model's validity in explaining the general occurrence of the multiplicative model in mutations. Furthermore, experimental evidence suggests limitations of this approach. For example, in most viable yeast deletion mutants with reduced growth rates, the expression of ribosomal proteins remained largely unchanged, contrary to the predictions of the Scott-Hwa model [https://doi.org/10.7554/eLife.28034]. This discrepancy highlights that the Scott-Hwa model and its derivatives cannot reliably explain mutants' growth rates based on current experimental evidence.

      (3) Limited reliability of the mechanistic origin of the multiplicative model

      The authors seem to regard growth-optimizing feedback as the mechanistic origin of the multiplicative model. However, the importance of growth-optimizing feedback in explaining product neutrality heavily depends on the very specific framework of the Scott-Hwa model. As I pointed out above, the Scott-Hwa model is a bacterial growth model that considers only a narrowly defined set of biochemical reactions. Using such a narrow model to explore the mechanistic origin of product neutrality observed on a genome-wide scale appears to be inappropriate. Arguments based on either the Scott-Hwa model or the Weiße model fail to account for the generality of product neutrality across diverse genetic perturbations. These models, in their current form, do not explain the broader patterns of product neutrality observed experimentally.

    3. Reviewer #2 (Public review):

      The paper deals with the important question of gene epistasis, focusing on asking what is the correct null model for which we should declare no epistasis.

      In the first part, they use the Synthetic Genetic Array dataset to claim that the effect of a double mutation on growth rate is well predicted by the product of the individual effects (much better than, e.g., the additive model). The second (main) part shows this is also the prediction of two simple, coarse-grained models for cell growth.

      I find the topic interesting, the paper well written, and the approach innovative.

      Comments on revisions:

      The authors have adequately addressed the comments raised in the review below, and I find that the paper has improved.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Detecting unexpected epistatic interactions among multiple mutations requires a robust null expectation - or neutral function - that predicts the combined effects of multiple mutations on phenotype, based on the effects of individual mutations. This study assessed the validity of the product neutrality function, where the fitness of double mutants is represented as the multiplicative combination of the fitness of single mutants, in the absence of epistatic interactions. The authors utilized a comprehensive dataset on fitness, specifically measuring yeast colony size, to analyze epistatic interactions.

      The study confirmed that the product function outperformed other neutral functions in predicting the fitness of double mutants, showing no bias between negative and positive epistatic interactions. Additionally, in the theoretical portion of the study, the authors applied a well-established theoretical model of bacterial cell growth to simulate the growth rates of both single and double mutants under various parameters. The simulations further demonstrated that the product function was superior to other functions in predicting the fitness of hypothetical double mutants. Based on these findings, the authors concluded that the product function is a robust tool for analyzing epistatic interactions in growth fitness and effectively reflects how growth rates depend on the combination of multiple biochemical pathways.

      Strengths:

      By leveraging a previously published extensive dataset of yeast colony sizes for single- and double-knockout mutants, this study validated the relevance of the product function, commonly used in genetics to analyze epistatic interactions. The finding that the product function provides a more reliable prediction of double-mutant fitness compared to other neutral functions offers significant value for researchers studying epistatic interactions, particularly those using the same dataset.

      Notably, this dataset has previously been employed in studies investigating epistatic interactions using the product neutrality function. The current study's findings affirm the validity of the product function, potentially enhancing confidence in the conclusions drawn from those earlier studies. Consequently, both researchers utilizing this dataset and readers of previous research will benefit from the confirmation provided by this study's results.

      Weaknesses:

      This study exhibits several significant logical flaws, primarily arising from the following issues: a failure to differentiate between distinct phenotypes, instead treating them as identical; an oversight of the substantial differences in the mechanisms regulating cell growth between prokaryotes and eukaryotes; and the adoption of an overly specific and unrealistic set of assumptions in the mutation model. Additionally, the study fails to clearly address its stated objective of investigating the mechanistic origin of the multiplicative model. Although it discusses conditions under which deviations occur, it falls short of achieving its primary goal. Moreover, the paper includes misleading descriptions and unsubstantiated reasoning, presented without proper citations, as if they were widely accepted facts. Readers should consider these issues when evaluating this paper. Further details are discussed below.

      (1) Misrepresentation of the dataset and phenotypes

      The authors analyze a dataset on the fitness of yeast mutants, describing it as representative of the Malthusian parameter of an exponential growth model. However, they provide no evidence to support this claim. They assert that the growth of colony size in the dataset adheres to exponential growth kinetics; in contrast, it is known to exhibit linear growth over time, as indicated in [Supplementary Note 1 of https://doi.org/10.1038/nmeth.1534]. Consequently, fitness derived from colony size should be recognized as a different metric and phenotype from the Malthusian parameter. Equating these distinct phenotypes and fitness measures constitutes a fundamental error, which significantly compromises the theoretical discussions based on the Malthusian parameter in the study.

      The reviewer is correct in pointing out that colony-size measurements are distinct from exponential growth kinetics. We acknowledge that our original text implied that the dataset directly measured the exponential growth rate (Malthusian parameter), when in fact it measured yeast colony expansion rates on solid media. Colony growth under these conditions often follows a biphasic pattern: there is typically an initial microscopic phase in which cells can grow exponentially, but as the colony expands further, the growth dynamics become more linear (Meunier and Choder 1999). We have revised our text to state clearly what the experiment measured.

      However, while colony size does not exhibit exponential growth kinetics, several studies have argued that the rate of colony expansion is related to the exponential growth rate of cells growing in non-limiting nutrient conditions in liquid culture. This is because colony growth is dominated by cells at the colony boundaries that have access to nutrients and are in exponential growth. Cells in the colony interior lack nutrients and therefore contribute little to colony growth. This has been shown both in theoretical and experimental studies, finding that the linear growth rate of the colony is directly linked to the single-cell exponential growth rate (Pirt 1967; Gray and Kirwan 1974; Korolev et al. 2012; Gandhi et al. 2016; Meunier and Choder 1999). In particular, the above studies suggest that the linear colony growth rate is directly proportional to the square root of the exponential growth rate. Therefore, one would expect that the validity of the product model for one fitness measure implies its validity for the other measure. In addition, colony size was found to be highly correlated with the exponential growth rate of cells in non-limiting nutrients in liquid culture (Baryshnikova et al. 2010; Zackrisson et al. 2016; Miller et al. 2022). For these reasons, we treated the colony size and exponential growth rate as interchangeable in our original manuscript. 

      To address the important point raised by the reviewer, we now explain more clearly in the text what the analyzed data on colony size show and why we believe it is reflective of the exponential growth rate. Finally, we note that our results supporting the product neutrality function are consistent with the work of (Mani et al. 2008), which used smaller datasets based on liquid culture growth rates (Jasnos and Korona 2007; Onge et al. 2007).

      The text in Section 2.3 now reads:

      “Having verified empirically that the Product neutrality function is supported by the latest data for cell proliferation, we now turn our attention to its origins. Addressing this question requires some mechanistic model of biosynthesis. However, most mechanistic models of growth apply directly to single cells in rich nutrient conditions, which may not directly apply to the SGA measurements of colony expansion rates. In particular, colony growth has been shown to follow a biphasic pattern (Meunier et al. 1999). A first exponential phase is followed by a slower linear phase as the colony expands. Previous modeling and empirical work indicates that this second linear expansion rate reflects the underlying exponential growth of cells in the periphery of the colony (Pirt 1967; Gray et al. 1974; Gandhi et al. 2016; Baryshnikova, Costanzo, S. Dixon, et al. 2010; Zackrisson et al. 2016; Miller et al. 2022). More precisely, mathematical models show the linear colony-size expansion rate is directly proportional to the square root of the exponential growth rate under non-limiting conditions. Intuitively, this relationship arises because colony growth is dominated by the expansion of the population of cells in an annulus at the colony border that are exposed to rich nutrient conditions. These cells expand at a rate similar to the exponential rate of cells growing in a rich nutrient liquid culture. In contrast, the cells in the interior of the colony experience poor nutrient conditions, grow very slowly, and do not contribute to colony growth.

      This intimate relationship between both proliferation rates allows us to explore the origin of the Product neutrality function in mechanistic models of cell growth. Indeed, if colony-based fitnesses follow a Product model, then

      where the superscript c indicates colony-based values for the fitness W and the growth rate λ. Taking into account the relationship between single-cell exponential growth rates and colony growth rates, we can write

      where the superscript l denotes liquid cultures. Combining these expressions, we obtain

      In other words, from the perspective of the Product neutrality function, fitnesses based on colony expansion rates are equivalent to fitnesses based on single-cell exponential growth rates. The prevalence of the Product neutrality model—both in the SGA data and in previous studies on datasets from liquid cultures (Jasnos et al. 2007; Onge et al. 2007; Mani et al. 2008)—encourages the exploration of its origin in mechanistic models of cell growth.”
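      The chain of reasoning in the quoted passage can be written out explicitly. The following is a sketch under assumed notation, not the manuscript's own equations: subscripts 1, 2, and 12 denote the two single mutants and the double mutant, wt denotes wild type, fitness is taken to be the mutant rate normalized to the wild-type rate, and α is a hypothetical proportionality constant assumed to be the same for all genotypes.

```latex
% Product model for colony-based fitness (superscript c)
W^{c}_{12} = W^{c}_{1}\, W^{c}_{2},
\qquad
W^{c}_{i} = \frac{\lambda^{c}_{i}}{\lambda^{c}_{\mathrm{wt}}}

% Colony expansion rate scales as the square root of the
% liquid-culture exponential growth rate (superscript l)
\lambda^{c}_{i} = \alpha \sqrt{\lambda^{l}_{i}}
\;\Longrightarrow\;
W^{c}_{i} = \sqrt{\frac{\lambda^{l}_{i}}{\lambda^{l}_{\mathrm{wt}}}} = \sqrt{W^{l}_{i}}

% Combining the two expressions
\sqrt{W^{l}_{12}} = \sqrt{W^{l}_{1}}\,\sqrt{W^{l}_{2}}
\;\Longleftrightarrow\;
W^{l}_{12} = W^{l}_{1}\, W^{l}_{2}
```

      The constant α cancels in every ratio, so only the functional form of the scaling matters for the equivalence.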

      (2) Misapplication of prokaryotic growth models

      The study attempts to explain the mechanistic origin of the multiplicative model observed in yeast colony fitness using a bacterial cell growth model, particularly the Scott-Hwa model. However, the application of this bacterial model to yeast systems lacks valid justification. The Scott-Hwa model is heavily dependent on specific molecular mechanisms such as ppGpp-mediated regulation, which plays a crucial role in adjusting ribosome expression and activity during translation. This mechanism is pivotal for ensuring the growth-dependency of the ribosome fraction in the proteome, as described in [https://doi.org/10.1073/pnas.2201585119]. Unlike bacteria, yeast cells do not possess this regulatory mechanism, rendering the direct application of bacterial growth models to yeast inappropriate and potentially misleading. This fundamental difference in regulatory mechanisms undermines the relevance and accuracy of using bacterial models to infer yeast colony growth dynamics.

      If the authors intend to apply a growth model with macroscopic variables to yeast double-mutant experimental data, they should avoid simply repurposing a bacterial growth model. Instead, they should develop and rigorously validate a yeast-specific growth model before incorporating it into their study.

      There is nothing prokaryote-specific in the Scott-Hwa model. It does not include the ppGpp-dependent mechanism for regulating the ribosome fraction, which does not exist in eukaryotes. The general features of the model, such as the proportionality between ribosome fraction and growth rate, have indeed been validated in yeast (Metzl-Raz et al. 2017; Elsemman et al. 2022; Xia et al. 2022). Performing a detailed physiological analysis of budding yeast across varying growth conditions in order to build a more extensive model is beyond the scope of this work. Finally, we note that the Weiße model, which we also analyzed, is also generic and has replicated empirical measurements from both bacteria and yeast (Weiße et al. 2015).

      To clarify this point in the text, we have added the following to Section 2.3: 

      “Experimental measurements in other organisms suggest that the observations leading to this model, including that the cellular ribosome fraction increases with growth rate, are in fact generic and also seen in the yeast S. cerevisiae (Metzl-Raz et al. 2017; Elsemman et al. 2022; Xia et al. 2022).”

      (3) Overly specific assumptions in the theoretical model

      The theoretical model in question assumes that two mutations affect only independent parameters of specific biochemical processes, an overly restrictive premise that undermines its ability to broadly explain the occurrence of the multiplicative model in mutations. Additionally, experimental evidence highlights significant limitations to this approach. For example, in most viable yeast deletion mutants with reduced growth rates, the expression of ribosomal proteins remains largely unchanged, in direct contradiction to the predictions of the Scott-Hwa model, as indicated in [https://doi.org/10.7554/eLife.28034]. This discrepancy emphasizes that the Scott-Hwa model and its derivatives do not reliably explain the growth rates of mutants based on current experimental data, suggesting that these models may need to be reevaluated or alternative theories developed to more accurately reflect the complex dynamics of mutant growth.

      In the data from the Barkai lab referenced by the reviewer (reproduced below), we see that the ribosomal transcript fraction is in fact proportional to growth rate in response to gene deletions, contrary to the reviewer’s interpretation. However, it is notable that the ribosomal transcript fraction is somewhat higher for a given growth rate if that growth rate results from a mutation rather than from a suboptimal nutrient condition. We know that the very simple Scott-Hwa model is not a perfect representation of the cell. Nevertheless, it does recapitulate important aspects of growth physiology, and we therefore thought it useful to analyze its response to mutations and compare those responses to the different neutrality functions. We never claimed the Scott-Hwa model was a perfect model and fully agree with the referee’s statement above that “... these models may need to be reevaluated, or alternative theories developed to more accurately reflect the complex dynamics of mutant growth.” Indeed, we say as much in our discussion where we wrote:

      “While we focused on coarse-grained models for their simplicity and mechanistic interpretability, they might be too simple to effectively model large double-mutant datasets and the resulting double-mutant fitness distributions. We therefore expect the combination of high throughput genetic data with the analysis of larger-scale models, for instance based on Flux Balance Analysis, Metabolic Control Analysis, or whole-cell modeling, to lead to important complementary insights regarding the regulation of cell growth and proliferation.”

      To further clarify this point, we now discuss and cite the Barkai lab data for gene deletions (see Figure 2 of Metzl-Raz et al. 2017).

      (4) Lack of clarity on the mechanistic origin of the multiplicative model

      The study falls short of providing a definitive explanation for its primary objective: elucidating the "mechanistic origin" of the multiplicative model. Notably, even in the simplest case involving the Scott-Hwa model, the underlying mechanistic basis remains unexplained, leaving the central research question unresolved. Furthermore, the study does not clearly specify what types of data or models would be required to advance the understanding of the mechanistic origin of the multiplicative model. This omission limits the study's contribution to uncovering the biological principles underlying the observed fitness patterns.

      We appreciate the reviewer’s interest in a more complete mechanistic explanation for the product model of fitness. The primary goal of this study was to explore the validity of the Product model from the perspective of coarse-grained models of cell growth, and to extract mechanistic insights where possible. We view our work as a first step toward a deeper understanding of how double-mutant fitnesses combine, rather than a final, all-encompassing theory. As the referee notes, we are limited by the current state of the field, which has an incomplete understanding of cell growth. 

      Nonetheless, our analysis does propose concrete, mechanistically informed explanations. For example, we highlight how growth-optimizing feedback—such as cells’ ability to reallocate ribosomes or adjust proteome composition—naturally leads to multiplicative rather than additive or minimal fitness effects. We also link the empirical deviations from pure multiplicative behavior to differences in how specific pathways re-balance under perturbation, and we suggest that a product-like rule emerges when multiple interconnected processes each partially limit cell growth.

      In the discussion, we clarify what additional data and models we think will be required to advance this question. Namely, we propose extending our approach through larger-scale, more detailed modeling frameworks that may include explicit modeling of ppGpp or TOR activities in bacteria or eukaryotic cells, respectively. We also emphasize the importance of refining the measurement of cell growth rates to uncover subtle deviations from the product rule that could yield greater mechanistic insight. By integrating high-throughput genetic data with next-generation computational models, it should be possible to home in on the specific biological principles (e.g., metabolic bottlenecks, resource reallocation) that underlie the multiplicative neutrality function.

      Reviewer #2 (Public review):

      The paper deals with the important question of gene epistasis, focusing on asking what is the correct null model for which we should declare no epistasis.

      In the first part, they use the Synthetic Genetic Array dataset to claim that the effects of a double mutation on growth rate are well predicted by the product of the individual effects (much more than e.g. the additive model). The second (main) part shows this is also the prediction of two simple, coarse-grained models for cell growth.

      I find the topic interesting, the paper well-written, and the approach innovative.

      One concern I have with the first part is that they claim that:

      "In these experiments, the colony area on the plate, a proxy for colony size, followed exponential growth kinetics. The fitness of a mutant strain was determined as the rate of exponential growth normalized to the rate in wild type cells."

      There are many works on "range expansions" showing that colonies expand at a constant velocity, the speed of which scales as the square root of the growth rate (these are called "Fisher waves", predicted in the 1940s, and there are many experimental works on them, e.g. https://www.pnas.org/doi/epdf/10.1073/pnas.0710150104). If that's the case, the area of the colony should be proportional to growth_rate X time^2, rather than exp(growth_rate*time), so the fitness they might be using here could be the log(growth_rate) rather than growth_rate itself? That could potentially have a big effect on the results.

      We thank the reviewer for their thoughtful remarks. As they rightly pointed out, a large body of literature supports that colonies expand at constant velocity both from a theoretical and experimental standpoint. 

      As discussed in the answer to the first question of Reviewer 1, this body of work also suggests that the linear expansion rate of the colony front is directly related to the single-cell exponential growth rate of the cells at the periphery. Hence, although the macroscopic colony growth may not be exponential in time, measuring colony size (or radial expansion) across different genotypes still provides a consistent and meaningful proxy for comparing their underlying growth capabilities. 

      In particular, these studies suggest (consistently with Fisher-wave theory) that the linear growth rate of the colony 𝐾 is proportional to the square root of the exponential growth rate 𝜆. Under the assumption that the product model is valid for a given double mutant and for the exponential growth rate, we would have that

      The associated wave-front velocities would then be predicted to be

      In other words, if the product model is valid for fitness measures based on exponential growth rates, it should also be valid for fitness measures based on linear colony growth rates. 
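      The invariance claimed here can be checked numerically. The snippet below is a minimal sketch, assuming the square-root scaling between the colony expansion rate 𝐾 and the exponential growth rate 𝜆 holds with the same constant for every genotype; the function name and the example fitness values are illustrative and do not come from the SGA data.

```python
import math


def colony_fitness(liquid_fitness: float) -> float:
    """Relative colony expansion rate implied by a relative exponential
    growth rate, under the assumed scaling K proportional to sqrt(lambda)."""
    return math.sqrt(liquid_fitness)


# Hypothetical single-mutant fitnesses (relative exponential growth rates).
pairs = [(0.9, 0.8), (0.7, 0.5), (0.95, 0.6)]

# Largest mismatch between the colony fitness of a product-model double mutant
# and the product of the two single-mutant colony fitnesses.
max_product_gap = max(
    abs(colony_fitness(a * b) - colony_fitness(a) * colony_fitness(b))
    for a, b in pairs
)

# For contrast: an additive double mutant (w1 + w2 - 1, an assumed definition)
# is not mapped onto an additive combination by the square root.
w1, w2 = 0.7, 0.5
additive_gap = abs(
    colony_fitness(w1 + w2 - 1.0)
    - (colony_fitness(w1) + colony_fitness(w2) - 1.0)
)

print(f"product-model mismatch:  {max_product_gap:.2e}")
print(f"additive-model mismatch: {additive_gap:.2e}")
```

      Under these assumptions the product-model mismatch is at the level of floating-point rounding, whereas the additive combination is visibly distorted by the square-root transformation.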

      We now include this discussion in the revised version of Section 2.3.

      Additional comments/questions:

      (1) What is the motivation for the model where the effect of two genes is the minimum of the two?

      The motivation for the minimal model is the notion that there might be a particular process that is rate-limiting for growth due to a mutation. In this case, a mutation in process X makes it really slow and process Y proceeds in parallel and has plenty of time to finish its job before cell division takes place. In this case, even a mutation to process Y might not slow down growth because there is an excess amount of time for it to be completed. Thus, the double mutant might then be anticipated to have the growth rate associated with the single mutation to process X. We now add a similar description when we introduce the different neutrality functions in Section 2.1.
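      To make the comparison between null models concrete, the sketch below writes three neutrality functions as code. It assumes the commonly used definitions (product, additive as w1 + w2 - 1, and minimum); the function names and the example fitness values are hypothetical and serve only as illustration.

```python
def product_model(w1: float, w2: float) -> float:
    """Expected double-mutant fitness if the single-mutant effects multiply."""
    return w1 * w2


def additive_model(w1: float, w2: float) -> float:
    """Expected double-mutant fitness if the fitness defects add."""
    return w1 + w2 - 1.0


def minimum_model(w1: float, w2: float) -> float:
    """Expected double-mutant fitness if the slower process alone limits growth."""
    return min(w1, w2)


def epistasis(w12_observed: float, w1: float, w2: float, neutrality) -> float:
    """Deviation of the observed double-mutant fitness from a chosen null model."""
    return w12_observed - neutrality(w1, w2)


# Hypothetical example: process X is strongly impaired, process Y only mildly.
w_x, w_y = 0.6, 0.9
for model in (product_model, additive_model, minimum_model):
    print(f"{model.__name__}: {model(w_x, w_y):.2f}")
```

      In the rate-limiting scenario described above, the minimum model predicts that the double mutant grows like the single mutant in process X, whereas the product and additive models both predict a further reduction.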

      (2) How seriously should we take the Scott-Hwa model? Should we view it as a toy model to explain the phenomenon or more than that? If the latter, then since the number of categories in the GO analysis is much more than two (47?) in many cases the analysis of the experimental data would take pairs of genes that both affect one process in the Scott-Hwa model - and then the product prediction should presumably fail? The same comment applies to the other coarse-grained model.

      From our perspective, models like the Scott-Hwa model constitute the simplest representation of growth based on data that is not trivial. Moreover, the Scott-Hwa model is able to incorporate interactions between two different biological processes. We believe models, like the Scott-Hwa and Weiße models, should be viewed as more than mere toy models because they have been backed up by some empirical data, such as that showing the ribosome fraction increases with growth rate. However, the Scott-Hwa model is inherently limited by its low dimensionality and relative simplicity. We do not claim that such models can provide a full picture of the cell. As argued in the main text, we have chosen to focus on such models because of their tractability and in the hope of extracting general principles. We nonetheless agree with the reviewer that they do not have the capacity to represent interactions between genes in the same biological process. We now note this limitation in the text. 

      (3) There are many works in the literature discussing additive fitness contributions, including Kauffman's famous NK model as well as spin-glass-type models (e.g. Guo and Amir, Science Advances 2019, Reddy and Desai, eLife 2021, Boffi et al., eLife 2023). These should be addressed in this context.

      We thank the reviewer for pointing out this part of the literature. We do believe these works constitute a relevant body of work tackling the emergence of epistasis patterns from a theoretical grounding, and now reference and discuss them in the text. 

      (4) The experimental data is for deletions, but it would be interesting to know the theoretical model's prediction for the expected effects of beneficial mutations and how they interact since that's relevant (as mentioned in the paper) for evolutionary experiments. Perhaps in this case the question of additive vs. multiplicative matters less since the fitness effects are much smaller.

      This is an interesting question. Since mutations increasing the growth rate generated by gene deletions or other systematic perturbations are rare, we did not focus on them. Of course, as the reviewer notes, in the case of evolution experiments, these fitness enhancing mutations are selected for. To address the reviewer's question, we can first consider the Scott-Hwa model. In this case, the analytical solution remains valid in the case of fitness enhancing mutations so that the fitness of the double mutant will be the product neutrality function multiplied by an additional interaction term (see Figure 3). The mathematical derivation predicts that the double mutant fitness can potentially grow indefinitely. Indeed, the denominator can be equal to zero in some cases. In simulations, we see that the observation for deleterious mutations does not seem to hold for beneficial mutations (new supplementary Figure S5 shown below). Indeed, no model seems to replicate double mutant fitnesses much better than any other. This suggests that the growth-optimizing feedback we discuss in section 2.3 may have compound effects that ultimately make double-mutant fitnesses much larger than any model predicts.

      We recognize this may be an important point, and discuss it in detail in the revised section 2.3 as well as in the discussion.

      Baryshnikova, Anastasia, Michael Costanzo, Scott Dixon, Franco J. Vizeacoumar, Chad L. Myers, Brenda Andrews, and Charles Boone. 2010. “Synthetic Genetic Array (SGA) Analysis in Saccharomyces Cerevisiae and Schizosaccharomyces Pombe.” Methods in Enzymology 470 (March):145–79.

      Elsemman, Ibrahim E., Angelica Rodriguez Prado, Pranas Grigaitis, Manuel Garcia Albornoz, Victoria Harman, Stephen W. Holman, Johan van Heerden, et al. 2022. “Whole-Cell Modeling in Yeast Predicts Compartment-Specific Proteome Constraints That Drive Metabolic Strategies.” Nature Communications 13 (1): 801.

      Gandhi, Saurabh R., Eugene Anatoly Yurtsev, Kirill S. Korolev, and Jeff Gore. 2016. “Range Expansions Transition from Pulled to Pushed Waves as Growth Becomes More Cooperative in an Experimental Microbial Population.” Proceedings of the National Academy of Sciences of the United States of America 113 (25): 6922–27.

      Gray, B. F., and N. A. Kirwan. 1974. “Growth Rates of Yeast Colonies on Solid Media.” Biophysical Chemistry 1 (3): 204–13.

      Jasnos, Lukasz, and Ryszard Korona. 2007. “Epistatic Buffering of Fitness Loss in Yeast Double Deletion Strains.” Nature Genetics 39 (4): 550–54.

      Korolev, Kirill S., Melanie J. I. Müller, Nilay Karahan, Andrew W. Murray, Oskar Hallatschek, and David R. Nelson. 2012. “Selective Sweeps in Growing Microbial Colonies.” Physical Biology 9 (2): 026008.

      Mani, Ramamurthy, Robert P. St Onge, John L. Hartman 4th, Guri Giaever, and Frederick P. Roth. 2008. “Defining Genetic Interaction.” Proceedings of the National Academy of Sciences of the United States of America 105 (9): 3461–66.

      Metzl-Raz, Eyal, Moshe Kafri, Gilad Yaakov, Ilya Soifer, Yonat Gurvich, and Naama Barkai. 2017. “Principles of Cellular Resource Allocation Revealed by Condition-Dependent Proteome Profiling.” eLife 6 (August). https://doi.org/10.7554/elife.28034.

      Meunier, J. R., and M. Choder. 1999. “Saccharomyces Cerevisiae Colony Growth and Ageing: Biphasic Growth Accompanied by Changes in Gene Expression.” Yeast (Chichester, England) 15 (12): 1159–69.

      Miller, James H., Vincent J. Fasanello, Ping Liu, Emery R. Longan, Carlos A. Botero, and Justin C. Fay. 2022. “Using Colony Size to Measure Fitness in Saccharomyces Cerevisiae.” PloS One 17 (10): e0271709.

      Onge, Robert P. St, Ramamurthy Mani, Julia Oh, Michael Proctor, Eula Fung, Ronald W. Davis, Corey Nislow, Frederick P. Roth, and Guri Giaever. 2007. “Systematic Pathway Analysis Using High-Resolution Fitness Profiling of Combinatorial Gene Deletions.” Nature Genetics 39 (2): 199–206.

      Pirt, S. J. 1967. “A Kinetic Study of the Mode of Growth of Surface Colonies of Bacteria and Fungi.” Journal of General Microbiology 47 (2): 181–97.

      Weiße, Andrea Y., Diego A. Oyarzún, Vincent Danos, and Peter S. Swain. 2015. “Mechanistic Links between Cellular Trade-Offs, Gene Expression, and Growth.” Proceedings of the National Academy of Sciences of the United States of America 112 (9): E1038–47.

      Xia, Jianye, Benjamin J. Sánchez, Yu Chen, Kate Campbell, Sergo Kasvandik, and Jens Nielsen. 2022. “Proteome Allocations Change Linearly with the Specific Growth Rate of Saccharomyces Cerevisiae under Glucose Limitation.” Nature Communications 13 (1): 2819.

      Zackrisson, Martin, Johan Hallin, Lars-Göran Ottosson, Peter Dahl, Esteban Fernandez-Parada, Erik Ländström, Luciano Fernandez-Ricaud, et al. 2016. “Scan-O-Matic: High-Resolution Microbial Phenomics at a Massive Scale.” G3 (Bethesda, Md.) 6 (9): 3003–14.

    1. eLife Assessment

      This paper highlights an important physiological function of PGAM in the differentiation and suppressive activity of Treg cells by regulating serine synthesis. This role is proposed to intersect with glycolysis and one-carbon metabolism. The study's conclusion is supported by solid evidence from in-vitro cellular and in-vivo mouse models.

    2. Reviewer #1 (Public review):

      Summary:

      This work provides a new potential tool to manipulate Tregs function for therapeutic use. It focuses on the role of PGAM in Tregs differentiation and function. The authors, interrogating publicly available transcriptomic and proteomic data of human regulatory T cells and CD4 T cells, state that Tregs express higher levels of PGAM (at both message and protein levels) compared to CD4 T cells. They then inhibit PGAM by using a known inhibitor, EGCG, and show that this inhibition affects Tregs differentiation. This result was also observed when they used antisense oligonucleotides (ASOs) to knock down PGAM1.

      PGAM1 catalyzes the conversion of 3PG to 2PG in the glycolysis cascade. However, the authors focused their attention on the additional role of 3PG: acting as starting material for the de novo synthesis of serine.

      They hypothesized that PGAM1 regulates Tregs differentiation by regulating the levels of 3PG that are available for de novo synthesis of serine, which has a negative impact on Tregs differentiation. Indeed, they tested whether the effect on Tregs differentiation observed by reducing PGAM1 levels was reverted by inhibiting the enzyme that catalyzes the synthesis of serine from 3PG.

      The authors continued by testing whether both synthesized and exogenous serine affect Tregs differentiation and continued with in vivo experiments to examine the effects of dietary serine restriction on Tregs function.

      In order to understand the mechanism by which serine impacts Tregs function, the authors assessed whether this depends on the contribution of serine to one-carbon metabolism and to DNA methylation.

      The authors therefore propose that extracellular serine and serine whose synthesis is regulated by PGAM1 induce methylation of Treg-associated genes, downregulating their expression and overall impacting Tregs differentiation and suppressive functions.

      Strengths:

      The strength of this paper is the number of approaches taken by the authors to verify their hypothesis. Indeed, by using both pharmacological and genetic tools in in vitro and in vivo systems they identified a potential new metabolic regulation of Tregs differentiation and function.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have tried to determine the regulatory role of phosphoglycerate mutase (PGAM), an enzyme involved in converting 3-phosphoglycerate to 2-phosphoglycerate in glycolysis, in the differentiation and suppressive function of regulatory CD4 T cells through de novo serine synthesis. This occurs through contributions to one-carbon metabolism and, ultimately, epigenetic regulation of Treg differentiation.

      Strengths:

      The authors have rigorously used inhibitors and antisense RNA to verify the contribution of these pathways in Treg differentiation in-vitro. This has also been verified in an in-vivo murine model of autoimmune colitis. This has further clinical implications in autoimmune disorders and cancer.

      [Editors' note: The authors addressed important comments by the reviewers.]

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This work provides a new potential tool to manipulate Tregs function for therapeutic use. It focuses on the role of PGAM in Tregs differentiation and function. The authors, interrogating publicly available transcriptomic and proteomic data of human regulatory T cells and CD4 T cells, state that Tregs express higher levels of PGAM (at both message and protein levels) compared to CD4 T cells. They then inhibit PGAM by using a known inhibitor, EGCG, and show that this inhibition affects Tregs differentiation. This result was also observed when they used antisense oligonucleotides (ASOs) to knock down PGAM1.

      PGAM1 catalyzes the conversion of 3PG to 2PG in the glycolysis cascade. However, the authors focused their attention on the additional role of 3PG: acting as starting material for the de novo synthesis of serine.

      They hypothesized that PGAM1 regulates Tregs differentiation by regulating the levels of 3PG that are available for de novo synthesis of serine, which has a negative impact on Tregs differentiation. Indeed, they tested whether the effect on Tregs differentiation observed by reducing PGAM1 levels was reverted by inhibiting the enzyme that catalyzes the synthesis of serine from 3PG.

      The authors continued by testing whether both synthesized and exogenous serine affect Tregs differentiation and continued with in vivo experiments to examine the effects of dietary serine restriction on Tregs function.

      In order to understand the mechanism by which serine impacts Tregs function, the authors assessed whether this depends on the contribution of serine to one-carbon metabolism and to DNA methylation.

      The authors therefore propose that extracellular serine and serine whose synthesis is regulated by PGAM1 induce methylation of Treg-associated genes, downregulating their expression and overall impacting Tregs differentiation and suppressive functions.

      Strengths:

      The strength of this paper is the number of approaches taken by the authors to verify their hypothesis. Indeed, by using both pharmacological and genetic tools in in vitro and in vivo systems they identified a potential new metabolic regulation of Tregs differentiation and function.

      We are grateful to the reviewer for their thoughtful and constructive consideration of our work. We appreciate their comment that the number of approaches taken to test our hypothesis represents a strength that increases confidence in the conclusions.

      Weaknesses:

      Using publicly available transcriptomic and proteomic data of human T cells, the authors claim that both ex vivo and in vitro polarized Tregs express higher levels of PGAM1 protein compared to CD4 T cells (naïve or cultured under Th0 polarizing conditions). The experiments shown in this paper have all been carried out in murine Tregs. Publicly available resources for murine data (ImmGen -RNAseq and ImmPRes - Proteomics) however show that Tregs do not express higher PGAM1 (mRNA and protein) compared to CD4 T cells. It would be good to verify this in the system/condition used in the paper.

      This is a fair comment. Although our pharmacologic and genetic studies demonstrated the importance of PGAM in Treg differentiation and suppressive function in murine cells, thereby corroborating the hypothesis formed based on human CD4 cell expression data, we agree that investigating PGAM expression in murine Tregs is important in the context of our work. In reviewing the ImmPRes proteomics database, we confirm the reviewer's observation that PGAM1 expression was not higher in iTregs compared to other subsets, including Th17 cells. However, when compared to other glycolytic enzymes, expression of PGAM1 increases out of proportion in iTregs. In particular, the ratio of PGAM1 to GAPDH expression is much greater in iTregs compared to Th17 cells. These data are now shown in the revised Figure S5. The disproportionate increase in PGAM1 expression is consistent with the regulatory role of PGAM in the Treg-Th17 axis via modulation of the concentration of 3PG, a metabolite that lies between GAPDH and PGAM in the glycolytic pathway. The divergent expression changes between GAPDH and PGAM furthermore support the conclusion that GAPDH and PGAM play opposite roles in Treg differentiation.

      It would also be good to assess the levels of both PGAM1 mRNA and protein in Tregs PGAM1 knockdown compared to scramble using different methods e.g. qPCR and western blot. However, due to the high levels of cell death and differentiation variability, that would require cells to be sorted.

      We appreciate this comment. As noted by the reviewer, assessing PGAM1 expression via qPCR and Western blot would require cell sorting, which we do not currently have the resources to pursue. However, we measured the effect of ASOs on PGAM1 protein expression using anti-PGAM1 antibody via flow cytometry, which allowed gating on viable cells. As shown in Figure S3A, PGAM-targeted ASOs led to an approximately 40% decrease in PGAM1 expression, as measured by mean fluorescence intensity (MFI). Furthermore, we now show in revised Figure S2 that ASO uptake was near-complete in our cultured CD4 cells.

      It is not specified anywhere in the paper whether cells were sorted for bulk experiments. Based on the variability of cell differentiation, it would be good if this was mentioned in the paper as it could help to interpret the data with a different perspective.

      Cells were not sorted for bulk experiments. In the revised manuscript, this point is made clear in the text, figure legends, and Methods. It is worth noting that all bulk experiments were conducted on samples with greater than 70% cell viability (greater than 90% for stable isotope tracing studies).

      Reviewer #2 (Public review):

      Summary:

      The authors have tried to determine the regulatory role of phosphoglycerate mutase (PGAM), an enzyme that converts 3-phosphoglycerate to 2-phosphoglycerate in glycolysis, in the differentiation and suppressive function of regulatory CD4 T cells through de novo serine synthesis. This is done by contributing to one-carbon metabolism and, eventually, epigenetic regulation of Treg differentiation.

      Strengths:

      The authors have rigorously used inhibitors and antisense RNA to verify the contribution of these pathways to Treg differentiation in vitro. This has also been verified in an in vivo murine model of autoimmune colitis. This has further clinical implications in autoimmune disorders and cancer.

      We very much appreciate these comments about the rigor of the work and its implications.

      Weaknesses:

      The authors have used inhibitors to study pathways involved in Treg differentiation. However, they have not studied the context of overexpression of PGAM, which was the actual reason to pursue this study.

      We appreciate this comment and agree that overexpression of PGAM would be an excellent way to complement and further corroborate our findings. Unfortunately, despite attempting several methods, we were unable to consistently induce overexpression of PGAM1 in our primary T cell cultures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would suggest increasing the font size for flow cytometry gates. Percentages are the focus of the analysis, and it is very hard to read any.

      We have increased the font size on all flow cytometry gates, as suggested.

      Moreover, most of the flow data show Treg polarization based on CD25 and FOXP3 expression. However, Figure 3A, Figure 4D and Figure S3 show Treg polarization based on FSC and Foxp3. Is there any reason for this?

      Antibody staining against CD25 was poor in the experiments noted, which is why Foxp3 alone was used to identify Treg cells in these experiments.

      Especially for Figure 3A, other cells could also express Foxp3 making interpretation difficult.

      This is a fair comment. With respect to Figures 4D and S3 (now revised Figure S4), these experiments were conducted in isolated CD4 cells, in which the population of CD25-Foxp3+ cells is minimal following Treg polarization (as evident in our other figures). Regarding Figure 3A, previous work has found minimal expression of Foxp3 in circulating non-T cells (Devaud et al., 2014, PMID 25063364), such that we have confidence the identified Foxp3 expressing cells are, in fact, Treg cells. Notably, Figure 3A was already gated on CD4+ T cells, and in the periphery of wild-type mice, these would be reasonably referred to as Tregs, although this does not apply to diseased states or specific cases such as the tumor microenvironment.

      The level of murine Treg differentiation varies a lot among experiments. The % of CD4+CD25+FOXP3+ cells ranges from 14% to 77% (controls). It would be good to understand and verify the cause of this variability in differentiation.

      For most of our Treg polarization experiments, differentiation in the control group falls within the 35–55% range. We found that treatment with ASOs (even scrambled control ASOs) tended to decrease Treg polarization overall, leading to lower percentages of Foxp3-expressing cells in these experiments. Differentiation was similarly low in a few experiments that did not involve ASOs, which we believe was caused by batch variability in the recombinant TGF-β used for polarization. Despite this variability, we performed enough independent experiments and biological replicates to observe consistent trends and to have confidence in the results, as corroborated by statistical testing and the wide variety of experimental approaches used to verify our conclusions. Notably, controls were run in every experiment, allowing accurate comparisons to be made within each individual experiment.

      Similar comments apply to the level of cell death observed in the cultures of polarizing Tregs.

      Although there was some variability in cell viability between experiments, flow cytometry experiments were always gated on live cells, and we believe concerns about reproducibility are substantially mitigated by the number of independent experiments, biological replicates, and distinct experimental approaches used for verification of the experimental findings. For all bulk experiments, cell viability was greater than 70% and equal across samples. For the flux studies, viability was greater than 90% and equal across samples.

      Figure 2 B and D: EGCG has been used at two different concentrations. Is it lower in Figure 2D because of one condition being a combination of inhibitors or is it a typo?

      The doses stated in the original legend are correct. Yes, drug doses were optimized for combination-treatment experiments. This point is now clarified in the figure legend.

      Figure 2G: The description in the results does not match figure legend - Text - serine/glycine-free media or control (serine/glycine-containing) media; figure legend - serine/glycine-free media or media containing 4 mM serine.

      We thank the reviewer for pointing out this discrepancy, which was an error in the text. The two conditions used were 1) serine/glycine-free media, and 2) serine/glycine-free media supplemented with 4 mM serine. The text and figure legend have both been updated to clarify this point.

      Figure 3 F and G: the graphs do not show the individual points.

      Individual points were not shown in these graphs because they are derived from scRNA-seq data, with SCFEA calculated from individual cells. As such, there are far too many data points to display all individual values.

      CD4+ T-cell isolation and culture: cells were cultured in 50%RPMI and 50% AIM-V.

      I thought that AIM-V medium was intended to be for human cultures. Could some of the conditions explain the low level of differentiation observed in some experiments? If there is such variability it might be because the conditions used are not optimal and therefore not reproducible.

      We appreciate this critique. Although AIM-V media is often used for ex vivo human T cell cultures, it can similarly be used for mouse T cell culture with the addition of β-mercaptoethanol, as suggested by ThermoFisher and as used in prior publications, such as PMID 36947105. As outlined in the responses above, the differentiation we observed was consistent in most experiments, with some variability based on experimental conditions (such as lower differentiation in the setting of ASO treatment). Furthermore, we believe the number of independent experiments, biological replicates, and independent experimental approaches used in the study supports the reproducibility of our findings.

      Figures S1 A, S2 B, and S4: the flow data are shown using both heights (FSC) and area (zombie NIR dye). It would be better to use areas for both parameters.

      In the revised manuscript, areas are now used on both the x- and y-axes for these figures.

      Figure S1 B and S2 C: The bar graphs are both showing proliferation index, however, the graphs are labelled differently in the two figures and in the legend (proliferation index -Fig S1 B; division index -Fig S2 C and replication index in the legend of Fig S2 C). The explanation of how the index has been calculated should probably go in the legend of the first figure that shows it.

      We thank the reviewer for this comment. In the revised manuscript, we have ensured consistency in the terminology (“proliferation index” is now used consistently), and the explanation of the proliferation index calculation is now included in the legend to Figure S1, where the proliferation index first appears.

      Were the PGAM1-KD Tregs used for RNAseq sorted or not? Based on the plots shown in Figure S2 B, there is ~50% cell death, which needs to be taken into consideration in the analysis if dead cells were not depleted.

      Similar question for all bulk experiments. It is not specified in the methods or figure legends.

      The cells used for RNAseq and other bulk experiments were not sorted. This point is now made clear in the text, figure legends, and Methods. However, cultures were only used for bulk analyses if the viability in those particular experiments was greater than 70%. Given the sensitivity of stable isotope tracing analyses, cultures were only analyzed for those studies if viability was greater than 90%. In these experiments, viability was similar across samples.

      It was mentioned in Figure 1 that the PGAM KD led to transcriptional changes that impacted MYC targets and mTORC1 signalling. It would be good to validate these findings maybe with more targeted experiments.

      We appreciate this suggestion and agree that validation and further investigation of these critical targets would be worthwhile. However, because of limitations to resources and the fact that these findings are not critical to the main conclusions of the study, we consider these experiments as future directions beyond the scope of the current work.

      Reviewer #2 (Recommendations for the authors):

      Here are a few suggestions and recommendations to improve the research study.

      (1) The authors have used the word 'vehicle' in most of the figures, however, this word is not explained well in the figure legend. The authors may want to clarify to readers whether vehicle is a plasmid or a solvent for control purposes. For example, in Figure 1D, if vehicle is a plasmid, then another sample for vehicle +/-EGCG should be considered for the rigor in results.

      Thank you for identifying this point of confusion. For all drug treatment experiments, vehicle controls consisted of solvent alone without drug. For ASO experiments, the control condition consisted of scrambled ASO. This point is now made clear in the Methods (“Drug and ASO Treatments” section) as well as in the main text. Furthermore, the figure legends and axes have been edited such that “vehicle” is only used to refer to drug experiments (in which solvent vehicle alone was used as control), and “control” is used to refer to ASO experiments (in which scrambled ASO served as control).

      (2) Figure 1H represents the RNAseq data for knockdown of PGAM1. It might be interesting to see similar data for the overexpression of PGAM1.

      We appreciate this comment and agree that overexpression of PGAM1 would be an excellent way to complement and further corroborate our findings using PGAM1 knockdown and pharmacologic inhibition. Unfortunately, despite attempting several methods, we were unable to consistently induce overexpression of PGAM1 in our primary T cell cultures.

      (3) The font in most of the data from flow cytometry experiments (for example 1I) is not legible. Please increase the font size to make it legible.

      Font sizes have been increased.

      (4) Figure S2, PGAM expression was measured by Flow cytometry experiments. A similar experiment using western Blot, the direct measurement of protein expression, will strengthen the evidence.

      We appreciate this comment. As noted in the public reviews, Western blot would require sorting of viable cells, and unfortunately we do not currently have the resources to conduct additional experiments with FACS. However, we respectfully note that assessing protein expression via flow cytometry quantifies protein levels based on antibody binding, similar to Western blot (or in-cell Western blot), while also allowing gating on viable cells. We also note that nearly 100% of cultured CD4 cells took up ASO, as shown in revised Figure S2.

      (5) Figure 1J: it is mentioned in the text that 10 datasets were studied. A normalized parameter such as overexpression or suppression could be studied together with its variance. It would be good to understand the variability in response among the different datasets.

      We thank the reviewer for the opportunity to clarify this data. This data was taken from a single published dataset (Dykema et al., 2023, PMID 37713507) in which 10 distinct subsets of tumor-infiltrating Tregs (TIL-Tregs) were identified, rather than from 10 distinct datasets. After identifying the Activated (1)/OX40hiGITRhi cluster of TIL-Tregs as a highly suppressive subset that correlates with resistance to immune checkpoint blockade, Dykema et al. compared gene expression in this subset to the bulked collection of the other 9 subsets, and the data shown in Figure 1J is derived from this analysis. As such, the data in Figure 1J is, indeed, a normalized parameter of overexpression, showing overexpression of PGAM1 in this highly suppressive subset versus other subsets, out of proportion to proximal rate-limiting glycolytic enzymes. The main text and figure/figure legend have been edited to clarify this point.

      (6) It will be good to rephrase that the roles of PGAM and GAPDH are opposite, this paragraph is confusing since words such as "supporting Treg differentiation" and "augments Treg differentiation" have been used, although the data in S3 and 1D are opposite. Any possible explanation for the opposing roles of PGAM and GAPDH, despite their involvement in the same pathway of glycolysis, can be added to build up the interest of readers. What is the comparison of the expression of GAPDH and PGAM in Figure 1J?

      We thank the reviewer for this comment, as we appreciate that the language used in our initial manuscript was confusing. We have edited the main text, in both the Results and Discussion sections, in order to clarify this point and provide an explanation as suggested. Indeed, our experimental data indicate that GAPDH and PGAM play opposing roles in Treg differentiation: whereas inhibiting GAPDH activity leads to greater Treg differentiation (shown in revised Figure S4 and our previously published work), inhibiting PGAM leads to diminished Treg differentiation. We view this point (that enzymes within the same glycolytic pathway can have divergent roles in T cells) as a primary implication of these findings, with the explanation that individual enzymes within the same pathway can differentially regulate the concentrations of key immunoactive metabolites. In our study, we identified 3PG as a key immunoactive metabolite whose concentration would be differentially impacted by GAPDH activity versus PGAM activity, since it lies downstream of GAPDH but upstream of PGAM.

      To provide further evidence for the opposing roles of GAPDH and PGAM, we analyzed existing datasets. In the revised Figure S5, we show that the PGAM1/GAPDH expression ratio increases in both human and mouse Tregs compared to other CD4 subsets.

      (7) Figure 2C, what is M+1, M+2 etc. Does it represent the number of hrs? If so, why are the results for 6 hrs are not shown since the study was for 6 hrs? And what is happening with M+2?

      We appreciate the opportunity to clarify this point and apologize for prior confusion. The terminology “M+n” refers to the mass shift produced by incorporation of 13-carbon. When a metabolite incorporates a single 13-carbon atom, it has a mass shift of one (M+1), whereas incorporation of three 13-carbon atoms produces a mass shift of three (M+3). Because we used uniformly 13-carbon labeled glucose, 3PG derived from the labeled glucose will have all three carbons labeled (M+3), as will serine that is newly synthesized from 3PG. Because serine can enter the downstream one-carbon cycle and be recycled, we also see the appearance of recycled serine with a single 13-carbon (M+1). The critical point in Figure 2C is that labeled serine is higher in Th17 versus Treg cells, demonstrating that de novo serine synthesis from glycolysis is greater in Th17 cells. The main text has been edited to clarify this important point.
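
      As a rough illustration of the “M+n” convention described above (not part of the authors' analysis), the sketch below labels isotopologues by the number of incorporated 13-carbon atoms and converts raw peak intensities into fractional abundances. The function names and the intensity values are hypothetical and chosen only for the example; the only physical constant used is the 13C/12C mass difference of about 1.003355 Da.

```python
# Mass difference between a 13C and a 12C atom, in daltons.
C13_C12_MASS_DIFFERENCE = 1.003355


def isotopologue_label(n_labeled_carbons):
    """Return the conventional label, e.g. 'M+3' for three 13C atoms."""
    return f"M+{n_labeled_carbons}"


def mass_shift(n_labeled_carbons):
    """Approximate mass shift (Da) caused by n incorporated 13C atoms."""
    return n_labeled_carbons * C13_C12_MASS_DIFFERENCE


def fractional_abundance(intensities):
    """Normalize isotopologue intensities so that they sum to 1."""
    total = sum(intensities.values())
    return {label: value / total for label, value in intensities.items()}


# Hypothetical serine intensities (arbitrary units, NOT data from the study):
# unlabeled (M+0), recycled via one-carbon metabolism (M+1), and newly
# synthesized from uniformly labeled glucose via 3PG (M+3).
example_serine = {
    isotopologue_label(0): 60.0,
    isotopologue_label(1): 10.0,
    isotopologue_label(3): 30.0,
}
example_fractions = fractional_abundance(example_serine)
```

      In this toy example the M+3 fraction would be read as the share of serine made de novo from the labeled glucose, and the M+1 fraction as recycled serine.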

      (8) Including the quantification of the inhibition and rescuing effects of EGCG and NCT will be helpful to readers.

      The inhibition and rescuing effects of these drugs are quantified in Figures 2D and 2E as they relate to Treg differentiation. The reviewer may be referring to quantification of relative effects on 3PG levels and serine synthesis. If so, we unfortunately do not have the resources to complete these studies, which would require large-scale quantitative mass spectrometry studies or enzyme activity assays.

      (9) Figure 2D and 2E: The authors could also experiment with a dose dependence curve on EGCG and NCT on this phenotype for Treg differentiation. That can help understand the balance between serine pathways and glycolysis pathways. Similarly, the dose dependence of 3PG for Figure 2E and comparing it to the kinetic constants of these enzymes involved and cellular concentrations, these details will be helpful to understand the metabolic dynamics, because this phenotype could be an interplay of both 3PG and serine concentrations.

      We appreciate this suggestion and agree that establishing detailed dose-dependence curves and relating these findings to enzyme kinetics would yield additional insights into the biochemical regulation provided by PGAM and PHGDH. Unfortunately we do not have the resources to pursue these additional studies, which therefore lie beyond the scope of our current work.

      (10) Figure 4: Explanation for no effect of methionine supplementation?

      Thank you for raising this point. We speculate that methionine supplementation had minimal effect because physiologic levels of serine were sufficient to provide basal substrates for the one-carbon cycle. On the other hand, eliminating methionine produced enough of a decrease in one-carbon metabolism to potentiate the effects of excess serine. This point is now briefly addressed in the text.

      (11) For direct connection between PGAM and methylation, methylation experiments could be worked out with NCT1 and SHIN1 (as in Figure 4H).

      We very much appreciate this suggestion, which we agree would provide a strong complementary approach. Unfortunately, we do not have the resources to pursue these studies currently. However, we regard the increased methylation observed following PGAM knockdown (Figure 4G) as strong evidence that PGAM activity directly modulates methylation.

    1. eLife Assessment

      This important study fills a gap in our knowledge of the evolution of GPCRs in holozoans, as well as the phylogeny of associated signaling pathway components such as G proteins, GRKs, and RIC8 proteins. The evidence supporting the conclusions is compelling, with the analysis of extensive new genomic data from choanoflagellates and other non-animal holozoans. Overall, the study is thorough and well-executed. It will be a resource for researchers interested in both the comparative genomics of multicellularity and GPCR biology more broadly, especially given the importance of GPCRs as highly druggable targets.

    2. Reviewer #1 (Public review):

      Summary:

      The authors strived for an inventory of GPCRs and GPCR pathway component genes within the genomes of 23 choanoflagellates and other close relatives of metazoans.

      Strengths:

      The authors generated a solid phylogenetic overview of the GPCR superfamily in these species. Intriguingly, they discover novel GPCR families, novel assortments of domain combinations, and novel insights into the evolution of those groups within the Opisthokonta clade. A particular focus is laid on adhesion GPCRs, for which the authors discover many hitherto unknown subfamilies based on Hidden Markov Models of the 7TM domain sequences, which were also reflected by combinations of extracellular domains of the homologs. In addition, the authors provide bioinformatic evidence that aGPCRs of choanoflagellates also contain a GAIN domain, which is self-cleavable, thereby reflecting the most remarkable biochemical feat of aGPCRs.

      Weaknesses:

      The chosen classification scheme for aGPCRs may require reassessment and amendment by the authors in order to prevent confusion with previously issued classification attempts of this family.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to characterise the GPCR family in choanoflagellates (and other unicellular holozoans). GPCRs are the most abundant gene family in many animal genomes, playing crucial roles in a wide range of physiological processes. Although they are known to evolve rapidly, GPCRs are an ancient feature of eukaryotic biology. Identifying conserved elements across the animal-protist boundary is therefore a valuable goal, and the increasing availability of genomes from non-animal holozoans provides new opportunities to explore evolutionary patterns that were previously obscured by limited taxon sampling. This study presents a comprehensive re-examination of GPCRs in choanoflagellates, uncovering examples of differential gene retention and revealing the dynamic nature of the GPCR repertoire in this group. As GPCRs are typically involved in environmental sensing, understanding how these systems evolved may shed light on how our unicellular ancestors adapted their signalling networks in the transition to complex multicellularity.

      Strengths:

      The paper combines a broad taxonomic scope with the use of both established and recently developed tools (e.g., Foldseek, AlphaFold), enabling a deep and systematic exploration of GPCR diversity. Each family is carefully described, and the manuscript also functions as an up-to-date review of GPCR classification and evolution. Although similar attempts to understand GPCR evolution were made over the last decade, the authors build on this foundation by identifying new families and applying improved computational methods to better predict structure and function. Notably, the presence of Rhodopsin-like GPCRs in some choanoflagellates and ichthyosporeans is intriguing, even though they do not fall within known animal subfamilies. The computational framework presented here is broadly applicable, offering a blueprint for surveying GPCR diversity in other non-model eukaryotes (and even in animal lineages), potentially revealing novel families relevant to drug discovery or helping revise our understanding of GPCR evolution beyond model systems.

      Weaknesses:

      While the study contributes several interesting observations, it does not radically revise the evolutionary history of the GPCR family. However, in an era increasingly concerned with the reproducibility of scientific findings, this is arguably a strength rather than a weakness. It is encouraging to see that previously established patterns largely hold, and that with expanded sampling and improved methods, new insights can be gained, especially at the level of specific GPCR subfamilies. In addition, no functional follow-ups are provided in the model system Salpingoeca rosetta, but I am sure functional work on GPCRs in choanoflagellates is set to reveal very interesting molecular adaptations in the future.

    1. eLife Assessment

      This important theoretical study examines the possibility of encoding genomic information in a collective of short overlapping strands (e.g., the Virtual Circular Genome (VCG) model). The study presents convincing theoretical arguments, simulations and comparisons to experimental data to point at potential features and limitations of such distributed collective encoding of information. The work should be of relevance to colleagues interested in molecular information processing and to those interested in pre-Central Dogma or prebiotic models of self-replication.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting theoretical study examining the viability of the Virtual Circular Genome (VCG) model, a recently proposed scenario of prebiotic replication in which a relatively long sequence is stored as a collection of its shorter subsequences (and their complements). It was previously pointed out that the VCG model is prone to so-called sequence scrambling, which limits the overall length of such a genome. In the present paper, additional limitations are identified. Specifically, it is shown that the VCG is well replicated when the oligomers are elongated by sufficiently short chains from the "feedstock" pool. However, ligation of oligomers from the VCG itself results in a high error rate. I believe the research is of high quality and well written. However, the presentation could be improved and the key messages could be clarified.

      (1) It is not clear from the paper whether the observed error has the same nature as sequence scrambling

      (2) The authors introduce two important lengths, LS1 and LS2, only in the conclusions and do not sufficiently explain why each of them is important. It would make sense to discuss this early in the manuscript.

      (3) It is not entirely clear why specific length distribution for VCG oligomers has to be assumed rather than emerged from simulations.

      (4) Furthermore, the problem has another important length, L0, that is never introduced or discussed: a minimal hybridization length with a lifetime longer than the ligation time. From the parameters given, it appears that L0 is sufficiently long (~10 bases). In other words, it appears that the study is done in a somewhat suboptimal regime: most hybridization events do not lead to a ligation. Am I right in this assessment? If that is the case, the authors might want to explore another regime, L0

      Strengths:

      High-quality theoretical modeling of an important problem is implemented.

    3. Reviewer #2 (Public review):

      Summary:

      This important theoretical and computational study by Burger and Gerland attempts to set environmental, compositional, kinetic, and thermodynamic constraints on the proposed virtual circular genome (VCG) model for the early non-enzymatic replication of RNA. The authors create a solid kinetic model using published kinetic and thermodynamic parameters for non-enzymatic RNA ligation and (de)hybridization, which allows them to test a variety of hypotheses about the VCG. Prominently, the authors find that the length (longer is better) and concentration (intermediate is better) of the VCG oligos have an outsized impact on the fidelity and yield of VCG production with important implications for future VCG design. They also identify that activation of only RNA monomers, which can be achieved using environmental separation of the activation and replication, can relax the constraints on the concentration of long VCG component oligos by avoiding the error-prone oligo-oligo ligation. Finally, in a complex scenario with multiple VCG oligo lengths, the authors demonstrate a clear bias for the extension of shorter oligos compared to the longer ones. This effect has been observed experimentally (Ding et al., JACS 2023) but was unexplained rigorously until now. Overall, this manuscript will be of interest to scientists studying the origin of life and the behavior of complex nucleic acid systems.

      Strengths:

      - The kinetic model is carefully and realistically created, enabling the authors to probe the VCG thoroughly.
      - Fig. 6 outlines important constraints for scientists studying the origin of life. It supports the claim that the separation of activation and replication chemistry is required for efficient non-enzymatic replication. One could easily imagine a scenario where activation of molecules occurs, followed by their diffusion into another environment containing protocells that encapsulate a VCG. The selective diffusion of activated monomers across protocell membranes would then result in only activated monomers being available to the VCG, which is the constraint outlined in this work. The proposed exclusive replication by monomers also mirrors the modern biological systems, which nearly exclusively replicate by monomer extension.
      - Another strength of the work is that it explains why shorter oligos extend better compared to the long ones in complex VCG mixtures. This point is independent of the activation chemistry used (it simply depends on the kinetics and thermodynamics of RNA base-pairing) so it should be very generalizable.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is an interesting theoretical study examining the viability of the Virtual Circular Genome (VCG) model, a recently proposed scenario of prebiotic replication in which a relatively long sequence is stored as a collection of its shorter subsequences (and their complements). It was previously pointed out that the VCG model is prone to so-called sequence scrambling, which limits the overall length of such a genome. In the present paper, additional limitations are identified. Specifically, it is shown that the VCG is well replicated when the oligomers are elongated by sufficiently short chains from the “feedstock” pool. However, ligation of oligomers from the VCG itself results in a high error rate. I believe the research is of high quality and well written. However, the presentation could be improved and the key messages could be clarified.

      Strengths:

      High-quality theoretical modeling of an important problem is implemented.

      Weaknesses:

      The conclusions are somewhat convoluted and could be presented better.

      (1) It is not clear from the paper whether the observed error has the same nature as sequence scrambling.

      We thank the Reviewer for pointing out that this important point was not clearly explained. The sequence errors observed in our model are indeed of the same nature as the sequence scrambling previously identified by Chamanian and Higgs (Chamanian and Higgs, PLoS Comp Biol 2022). The core issue is the ligation of two oligomers representing non-adjacent segments of the genome sequence, leading to the formation of “chimeric” products that are not part of the desired genome.

      Our analysis identifies the ligation of VCG oligomers (V+V reactions) as the primary mechanism driving sequence scrambling. This allowed us to propose two strategies to mitigate sequence scrambling: (i) tuning the length and concentration of the VCG oligomers, and (ii) considering scenarios where only feedstock monomers contribute to elongation (non-reactive VCG oligomers). We modified the Introduction and Results section of our manuscript to convey this connection more clearly.
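
      The notion of a scrambled (chimeric) ligation product can be sketched in a few lines, purely as an illustration and not as the authors' simulation code. The sketch assumes a circular RNA genome whose two strands are both encoded by the VCG; the toy genome `ACGGUAAC` and all function names are hypothetical.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq):
    """Return the reverse complement of an RNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def occurs_in_genome(oligo, genome):
    """True if the oligo matches either strand of the circular genome."""
    for strand in (genome, reverse_complement(genome)):
        # Append the start of the strand to its end to close the circle.
        wrapped = strand + strand[:len(oligo) - 1]
        if oligo in wrapped:
            return True
    return False


def is_scrambled(left, right, genome):
    """True if two genome-compatible oligos ligate into a chimeric product."""
    both_compatible = (occurs_in_genome(left, genome)
                       and occurs_in_genome(right, genome))
    return both_compatible and not occurs_in_genome(left + right, genome)


toy_genome = "ACGGUAAC"  # hypothetical 8-nt circular genome

# Adjacent segments: the ligation product is part of the genome.
adjacent_case = is_scrambled("ACGG", "UAAC", toy_genome)
# Non-adjacent segments: the product "ACGAAC" occurs on neither strand.
chimeric_case = is_scrambled("ACG", "AAC", toy_genome)
```

      Here `adjacent_case` is `False` and `chimeric_case` is `True`: both input oligomers are valid genome fragments in each case, but only the second ligation joins segments that are not neighbors in the genome.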

      (2) The authors introduce two important lengths, LS1 and LS2, only in the conclusions and do not sufficiently explain why each of them is important. It would make sense to discuss this early in the manuscript.

      We agree with the Reviewer and have followed the suggestion to introduce the two important length scales earlier in the manuscript (in the Model section of the main text). In the updated version, we refer to these length scales as the exhaustive coverage length L<sub>E</sub> (formerly LS1) and the unique subsequence length L<sub>U</sub> (formerly LS2). The exhaustive coverage length L<sub>E</sub> is defined as the maximum motif length for which all possible sequences of that length appear somewhere in the genome. In contrast, the unique subsequence length L<sub>U</sub> is the minimum motif length such that each subsequence of that length occurs only once in the genome, thus giving each motif a unique “address”.

      Generally, a genome of length L<sub>G</sub> contains at most 2L<sub>G</sub> distinct subsequences of any given length, implying that L<sub>E</sub> can be at most L<sup>max</sup><sub>E</sub> = ⌊log<sub>4</sub>(2L<sub>G</sub>)⌋, and L<sub>U</sub> must be at least L<sup>min</sup><sub>U</sub> = ⌈log<sub>4</sub>(2L<sub>G</sub>)⌉, where ⌊...⌋ and ⌈...⌉ denote the next lower and higher integer, respectively. While the previous version of the manuscript focused exclusively on the limiting case L<sub>E</sub> = L<sup>max</sup><sub>E</sub> and L<sub>U</sub> = L<sup>min</sup><sub>U</sub>, we have extended our analysis to genomes with a broader range of L<sub>E</sub> and L<sub>U</sub> values in the revised manuscript.

      This extended analysis reveals that, for accurate and efficient replication, the VCG oligomer length must always exceed L<sub>U</sub>, regardless of the choice of L<sub>E</sub>. The required margin beyond L<sub>U</sub> depends on the distribution of intermediate-length motifs (i.e., with L<sub>E</sub> < L < L<sub>U</sub>), but is typically only a few nucleotides.
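      The two length scales can be illustrated with a short sketch that counts motifs on both strands of a circular genome. It assumes a four-letter RNA alphabet and 2L<sub>G</sub> subsequences per motif length, and it compares the measured lengths with the counting bounds implied by 4<sup>L</sup> versus 2L<sub>G</sub>. The toy genome and function names are hypothetical and not taken from the manuscript.

```python
# Illustrative sketch only: toy genome and function names are hypothetical.
from collections import Counter

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def motif_counts(genome: str, length: int) -> Counter:
    """Count all motifs of the given length on both strands of a circular genome."""
    counts = Counter()
    for strand in (genome, reverse_complement(genome)):
        wrapped = strand + strand[: length - 1]  # allow motifs to wrap around
        for start in range(len(strand)):
            counts[wrapped[start : start + length]] += 1
    return counts


def exhaustive_coverage_length(genome: str) -> int:
    """Largest L such that all 4**L possible motifs of length L occur (L_E)."""
    length = 0
    while length + 1 <= len(genome) and len(motif_counts(genome, length + 1)) == 4 ** (length + 1):
        length += 1
    return length


def unique_subsequence_length(genome: str):
    """Smallest L such that every motif of length L occurs exactly once (L_U)."""
    for length in range(1, len(genome) + 1):
        if all(count == 1 for count in motif_counts(genome, length).values()):
            return length
    return None  # e.g. for genomes identical to their own reverse complement


def max_exhaustive_length(genome_length: int) -> int:
    """Counting bound: largest L with 4**L <= 2 * L_G (integer arithmetic)."""
    length = 0
    while 4 ** (length + 1) <= 2 * genome_length:
        length += 1
    return length


def min_unique_length(genome_length: int) -> int:
    """Counting bound: smallest L with 4**L >= 2 * L_G (integer arithmetic)."""
    length = 0
    while 4 ** length < 2 * genome_length:
        length += 1
    return length


genome = "AAC"
print(exhaustive_coverage_length(genome), max_exhaustive_length(len(genome)))  # 1 1
print(unique_subsequence_length(genome), min_unique_length(len(genome)))       # 2 2
```

      The bounds are computed with integer comparisons rather than a floating-point logarithm to avoid rounding problems when 2L<sub>G</sub> is an exact power of four.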

      We have integrated these new findings into the Results section of the main text and expanded the discussion of their implications for the prebiotic relevance of the VCG scenario in the Discussion section. Full methodological details are provided in the Supplementary Material (Sections S1 and S8).

      (3) It is not entirely clear why a specific length distribution for VCG oligomers has to be assumed rather than emerging from simulations.

      We thank the Reviewer for this insightful question. Our choice to assume specific length distributions for VCG oligomers is motivated by both conceptual and practical considerations. We explain our reasoning more clearly in the revised manuscript, at the beginning of the Model section of the main text.

      Conceptually, our study focuses on the propagation of sequence information by an already-formed VCG, rather than its emergence from a random pool. As discussed by Chamanian and Higgs, the spontaneous formation of a VCG from randomly interacting oligomers is a rare event. Our aim is to understand whether, once formed, such a structure can robustly replicate under prebiotic conditions. This question is best addressed when the genome and the oligomer pool (including their lengths and concentrations) can be systematically controlled.

      From a practical standpoint, working with a controllable pool of oligomers facilitates direct comparison to recent experimental studies that use predefined and well-characterized oligomer pools (Ding et al. JACS 2023). With our current methods and realistic rate constants, simulating the emergence of such pools from simple building blocks (e.g., monomers and dimers) would be computationally prohibitive, due to the low ligation rate. For example, in a system containing monomers (concentration 0.1 mM) and octamers (concentration 1 µM) in a volume of V = 3.3 µm<sup>3</sup>, simulating the time between two ligation events takes over 300 hours of compute time (see SI Fig. S2). This renders dynamic pool generation infeasible for the scope of our study.
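      For orientation, the concentrations and volume quoted above can be converted into approximate molecule numbers. The sketch below is a plain unit conversion (1 µm<sup>3</sup> = 10<sup>−15</sup> L) and makes no assumption about the simulation method itself; the function name is hypothetical.

```python
# Unit-conversion sketch; the function name is hypothetical.
AVOGADRO = 6.02214076e23  # molecules per mole


def molecule_count(concentration_molar: float, volume_um3: float) -> float:
    """Number of molecules at a given molar concentration in a volume given in µm^3."""
    volume_litres = volume_um3 * 1e-15  # 1 µm^3 = 1e-15 L
    return concentration_molar * volume_litres * AVOGADRO


monomers = molecule_count(0.1e-3, 3.3)  # 0.1 mM in 3.3 µm^3
octamers = molecule_count(1e-6, 3.3)    # 1 µM in 3.3 µm^3
print(f"{monomers:.3g} monomers, {octamers:.3g} octamers")
```

      This gives roughly 2 × 10<sup>5</sup> monomers and 2 × 10<sup>3</sup> octamers, which conveys how many (de)hybridization events a stochastic simulation may have to process between two rare ligation events.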

      (4) Furthermore, the problem has another important length, L0, that is never introduced or discussed: a minimal hybridization length with a lifetime longer than the ligation time. From the parameters given, it appears that L0 is sufficiently long (∼ 10 bases). In other words, it appears that the study is done in a somewhat suboptimal regime: most hybridization events do not lead to a ligation. Am I right in this assessment? If that is the case, the authors might want to explore another regime, L_0 < LS_1, by considering a higher ligation rate.

      Indeed, we assume that the ligation rate is smaller than both the hybridization and dehybridization rates for any oligomer typically included in the pool (up to length 10). In terms of effective length scales, this corresponds to L<sub>0</sub> ≈ 10nt, with L<sub>0</sub> defined as stated by the Reviewer, i.e., the hybridization length corresponding to a lifetime comparable to the ligation time. Most of our analysis actually exploits the small ligation rate, by employing an adiabatic approximation in which ligation is assumed to be slower than any hybridization or dehybridization process in the pool irrespective of oligomer length. As the Reviewer states, in this regime most hybridization events are transient, and will not result in ligation, since the complexes typically dissociate before ligation can occur.

      While we agree that this assumption limits the overall yield of replication, it has a beneficial effect on replication fidelity. Oligomers that hybridize with mismatches tend to unbind more quickly due to the destabilizing effect of mismatches. In the slow-ligation regime, such complexes are likely to dissociate before a ligation can occur, preventing the formation of incorrect products. In contrast, if the ligation rate were comparable to the unbinding rate of mismatched hybrids, these incorrect associations could undergo ligation, thereby lowering the fidelity of replication. We thus view the regime L<sub>0</sub> > L<sub>V</sub> as more favorable for studying the error-suppressing potential of the VCG mechanism, though we acknowledge that exploring the effects of faster ligation rates is an interesting question for future work.

      Reviewer #2 (Public review):

      Summary:

      This important theoretical and computational study by Burger and Gerland attempts to set environmental, compositional, kinetic, and thermodynamic constraints on the proposed virtual circular genome (VCG) model for the early non-enzymatic replication of RNA. The authors create a solid kinetic model using published kinetic and thermodynamic parameters for non-enzymatic RNA ligation and (de)hybridization, which allows them to test a variety of hypotheses about the VCG. Prominently, the authors find that the length (longer is better) and concentration (intermediate is better) of the VCG oligos have an outsized impact on the fidelity and yield of VCG production with important implications for future VCG design. They also identify that activation of only RNA monomers, which can be achieved using environmental separation of the activation and replication, can relax the constraints on the concentration of long VCG component oligos by avoiding the error-prone oligo-oligo ligation. Finally, in a complex scenario with multiple VCG oligo lengths, the authors demonstrate a clear bias for the extension of shorter oligos compared to the longer ones. This effect has been observed experimentally (Ding et al., JACS 2023) but was unexplained rigorously until now. Overall, this manuscript will be of interest to scientists studying the origin of life and the behavior of complex nucleic acid systems.

      Strengths:

      • The kinetic model is carefully and realistically created, enabling the authors to probe the VCG thoroughly.

      • Fig. 6 outlines important constraints for scientists studying the origin of life. It supports the claim that the separation of activation and replication chemistry is required for efficient non-enzymatic replication. One could easily imagine a scenario where activation of molecules occurs, followed by their diffusion into another environment containing protocells that encapsulate a VCG. The selective diffusion of activated monomers across protocell membranes would then result in only activated monomers being available to the VCG, which is the constraint outlined in this work. The proposed exclusive replication by monomers also mirrors the modern biological systems, which nearly exclusively replicate by monomer extension.

      • Another strength of the work is that it explains why shorter oligos extend better compared to the long ones in complex VCG mixtures. This point is independent of the activation chemistry used (it simply depends on the kinetics and thermodynamics of RNA base-pairing) so it should be very generalizable.

      We thank the Reviewer for the careful assessment of our work and this concise summary of our main points.

      Weaknesses:

      • Most of the experimental work on the VCG has been performed with the bridged 2-aminoimidazolium dinucleotides, which are not featured in the kinetic model of this work. Other studies by Szostak and colleagues have demonstrated that non-enzymatic RNA extension with bridged dinucleotides has superior kinetics (Walton et al. JACS 2016, Li et al. JACS 2017), fidelity (Duzdevich et al. NAR 2021), and regioselectivity (Giurgiu et al. JACS 2017) compared to activated monomers, establishing the bridged dinucleotides as important for non-enzymatic RNA replication. Therefore, the omission of these species in the kinetic model presented here can be perceived as problematic. The major claim that avoidance of oligo ligations is beneficial for VCGs may be irrelevant if bridged dinucleotides are used as the extending species, because oligo ligations (V + V in this work) are kinetically orders of magnitude slower than monomer extensions (F + V in this work) (Ding et al. NAR 2022). Formally adding the bridged dinucleotides to the kinetic model is likely outside of the scope of this work, but perhaps the authors could test if this should be done in the future by simply increasing the rate of monomer extension (F + V) to match the bridged dinucleotide rate without changing the rate of V + V ligation?

      We thank the Reviewer for this insightful comment. Indeed, we did not design our model to specifically describe the use of bridged 2-aminoimidazolium dinucleotides as feedstock for the VCG scenario. Adding the bridged dinucleotides to our model would require allowing for feedstock that effectively changes its length during the ligation reaction. As anticipated already by the Reviewer, this is outside the scope of our current modeling framework, which was chosen to explore the generic issue of sequence scrambling in the VCG scenario without distinguishing between different types of activation chemistries.

      Along the lines of the Reviewer’s suggestion, we clarified in the revised manuscript that we consider two limiting cases out of a family of models with two different ligation rate constants, k<sub>lig,1</sub> for ligations involving a monomer and k<sub>lig,>1</sub> for ligations involving no monomer, allowing for kinetic discrimination between these processes. We consider the two limiting cases where either k<sub>lig,1</sub> = k<sub>lig,>1</sub> or k<sub>lig,>1</sub>/k<sub>lig,1</sub> → 0. The latter case captures the behavior expected from an activation chemistry that enables fast primer extension but slow ligation, thereby suppressing sequence scrambling via V+V ligation events. The corresponding results, presented in Figures 6 and 7, indeed show that the VCG replication efficiency approaches 100% for pools that are rich in VCG oligomers.

      Our coarse-grained model, which does not explicitly describe the activation chemistry, was sufficient to capture important kinetic and thermodynamic constraints of the VCG scenario, and to qualitatively explain the experimental observation of a preferential extension of short over long VCG oligomers (Fig. 7B). For future work, we plan to extend our model to account for the activation chemistry in detail, to allow for a more quantitative comparison between theory and experiment.

      • The kinetic and thermodynamic parameters for oligo binding appear to be missing two potentially important components. First, base-paired RNA strands that contain gaps where an activated monomer or oligo can bind have been shown to display significantly different kinetics of ligation and binding/unbinding than complexes that do not contain such gaps (see Prywes et al. eLife 2016, Banerjee et al. Nature Nanotechnology 2023, and Todisco et al. JACS 2024). Would inclusion of such parameters alter the overall kinetic model?

      We thank the Reviewer for highlighting these recent studies. Todisco et al. (JACS 2024) report that complexes with gaps are well described by standard nearest-neighbor models, while stacking interactions at nick sites confer additional stability beyond these predictions. Our model is therefore expected to capture the thermodynamics of complexes with gaps accurately, but likely underestimates the stability of complexes containing nicks. In the VCG pool, all productive ligation complexes (F+F, F+V, V+V) inherently contain a nick and thus benefit from this stabilization, whereas unproductive complexes typically do not. The added stability is expected to increase the residence time of oligomers in productive complexes, thereby enhancing overall extension rates. However, since this stabilization applies uniformly across all productive complexes, it does not shift the relative contributions of different ligation pathways (in particular, correct vs. incorrect).

      This reasoning assumes that hybridization and dehybridization occur on timescales faster than ligation or primer extension. It is conceivable that this separation of timescales does not hold, particularly for oligomers binding to templates with gaps, where association is slower due to steric hindrance, while dissociation is further slowed by stabilizing nicks. As a result, the residence time of such complexes can become comparable to (or longer than) the ligation timescale. We now discuss this aspect more thoroughly in the revised Results and Discussion sections. Capturing the resulting effects in our analytical framework would require relaxing the adiabatic assumption, which is beyond the scope of this work. We recognize the relevance of the non-adiabatic regime of the dynamics, and hope to explore this regime in follow-up work.

      • Second, it has been shown that long base-paired RNA can tolerate mismatches to an extent that can result in monomer ligation to such mismatched duplexes (see Todisco et al. NAR 2024). Would inclusion of the parameters published in Todisco et al. NAR 2024 alter the kinetic model significantly?

      In contrast to complexes with nicks and gaps, mismatched complexes (Todisco et al. NAR 2024) will decrease replication fidelity relative to the results presented in our manuscript. Our current model assumes perfect base pairing, such that replication errors arise only from binding events involving regions too short to reliably identify the correct genomic position (sequence scrambling). Allowing mismatches will indeed introduce an additional error mechanism via imperfect yet sufficiently stable duplexes, thereby increasing the rate of incorrect extensions. However, we expect this effect to be limited. Due to the thermodynamic cost of internal loops, mismatched duplexes most often have their mismatches near the ends of the hybridized region, where their destabilizing effect is weakest (Todisco et al. NAR 2024). Terminal mismatches at the 3′ end of the primer have been shown to reduce the primer-extension rate significantly via a stalling effect (Rajamani et al. JACS 2010, Leu et al. JACS 2013). Hence, we would expect errors due to mismatched duplexes to primarily occur for mismatches at the 5′ end. Such errors could be mitigated by a VCG pool consisting only of oligomers that are sufficiently long relative to the unique motif length of the virtual genome.

      We have extended the Discussion section to address this interesting issue.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      • ’(apostrophes) should be prime symbols instead of apostrophes

      We thank the Reviewer for spotting this mistake, which we have now corrected.

      • In the Introduction, the section that discusses the fidelity of enzyme-free copying should include a reference to Duzdevich et al. NAR 2021, as that work measured the fidelity experimentally.

      We have included this reference together with other references on the kinetics of hybridization/dehybridization to nicks and gaps in the main text.

      • The term feedstock oligomers may be problematic, because these also include monomers. In the “Templated ligation” section of the Model, the statement “We consider pools in which all oligomers are activated, as well as pools in which only monomers are activated” is imprecise. “All oligomers, including monomers,...” would be better so as to avoid confusion in readers accustomed to standard RNA language.

      We thank the Reviewer for this helpful suggestion. In the revised manuscript, we now use the term feedstock (rather than feedstock oligomers) to avoid confusion. We have also revised the sentence in the “Templated ligation” section to read “all oligomers, including monomers, ...” as recommended.

      • The ”Experimentally determined association rate constants” reference 24-26, which measured the rate constants for DNA. Considering that the authors are modeling RNA, I wonder if Ashwood et al. Biophysical Journal 2023 contains any relevant RNA data that could help refine the model?

      We thank the Reviewer for pointing us to the study by Ashwood et al. We have added this reference to the corresponding paragraph in the revised manuscript. Their RNA association rate constant (∼ 5 × 10<sup>7</sup> M<sup>−1</sup> s<sup>−1</sup>) is larger than the one we used (∼ 1 × 10<sup>6</sup> M<sup>−1</sup> s<sup>−1</sup>); however, a larger association rate is in fact beneficial for the validity of our adiabatic approximation, and thus would not affect our results, as long as the thermodynamic stability remains the same. This is because faster association then also implies faster dissociation, so the (de)hybridization timescales become even shorter relative to the ligation timescale, which is the regime where the adiabatic approximation made in our analysis is justified.

      • In “Triplexes of type 1—8 and 1—9...”, the word triplexes will confuse readers with RNA expertise, as a triplex is simply a triple-stranded RNA.

      We thank the Reviewer for pointing out the potentially ambiguous nomenclature. To avoid confusion with triple-stranded RNA structures, we now refer to binary (ternary, ...) complexes instead of duplexes (triplexes, ...) throughout the revised manuscript.

    1. eLife Assessment

      This is a valuable study that uses single-cell RNA sequencing to define tumor-intrinsic transcriptional programs that characterize distinct types of small intestine neuroendocrine tumors. The evidence supporting the claims of the authors is solid, but would benefit from a larger sample size. The work will be of interest to cancer biologists studying neuroendocrine tumors, as well as those studying tumor heterogeneity more broadly.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have assembled a cohort of 10 SiNET, 1 SiAdeno, and 1 lung MiNEN samples to explore the biology of neuroendocrine neoplasms. They employ single-cell RNA sequencing to profile 5 samples (siAdeno, SiNETs 1-3, MiNEN) and single-nuclei RNA sequencing to profile seven frozen samples (SiNET 4-10).

      They identify two subtypes of siNETs, characterized by either epithelial or neuronal NE cells, through a series of DE analyses. They also report findings of higher proliferation in non-malignant cell types across both subtypes. Additionally, they identify a potential progenitor cell population in a single lung MiNEN sample.

      In the revised study, they have addressed my points and I have no further comments.

    3. Reviewer #2 (Public review):

      Summary:

      The research identifies two main SiNET subtypes (epithelial-like and neuronal-like) and reveals heterogeneity in non-neuroendocrine cells within the tumor microenvironment. The study validates findings using external datasets and explores unexpected proliferation patterns. While it contributes to understanding SiNET oncogenic processes, the limited sample size and depth of analysis present challenges to the robustness of the conclusions.

      Strengths:

      The studies effectively identified two subtypes of SiNET based on epithelial and neuronal markers. Key findings include the low proliferation rates of neuroendocrine (NE) cells and the role of the tumor microenvironment (TME), such as the impact of Macrophage Migration Inhibitory Factor (MIF).

      Weaknesses:

      However, the analysis faces challenges such as a small sample size and lack of clear biological interpretation in some analyses.

    4. Reviewer #3 (Public review):

      This study profiles small intestine NETs and one mixed lung NET at single cell resolution and identifies two subtypes of neuroendocrine cells, as well as explores the proliferation patterns in malignant and nonmalignant cell types, identifying MIF as a potential factor that promotes proliferation of B and plasma cells in siNETs. Furthermore, they explore the single-cell landscape of a mixed LCNEC and squamous cell carcinoma, from which they identify a putative stem cell population with expression of features from both lineages.

      Strengths:

      This work showcases single-cell profiling of a rare tumor type, which is very informative for the field of NETs. The authors highlight very interesting observations, including the identification of the epithelial and neuronal subtype of siNETs, which they validated with an independent bulk RNA sequencing cohort. Furthermore, the observation of low cycling in malignant cells and high cycling in nonmalignant cells is an interesting one which may be applicable to other NETs.

      Weaknesses:

      • The authors do not connect their findings to clinical outcome. For example, is the epithelial or neuronal subtype enriched in tumors with worse or better prognosis or high grade vs. low grade siNETs or in patients who metastasize vs. who don't? As the authors show they can identify epithelial vs. neuronal subtypes in bulk RNA seq, perhaps they can take advantage of these other studies with larger sample sizes to investigate this. Additionally, the authors identify that the phenomenon of higher B/plasma cell proliferation is particular to epithelial siNETs and write that "The implications of high B/plasma cell turnover, and of other downstream effects of high MIF expression, are unclear, but raise the possibility that MIF-CD74 interaction may constitute a relevant target for the epithelial-like SiNET subtype." However, if this interaction contributes to survival in these patients, targeting this interaction may not be beneficial. Thus, it is important for the authors to try to connect their finding to clinical outcomes to enhance the translational relevance of this paper.

      • The generalizability of this study would be enhanced if the authors analyzed other available single cell studies of NETs and found a similar phenomenon of high proliferating nonmalignant cell types. Although these studies are also very limited in sample size, seeing concordance in findings across independent cohorts and different experimental techniques would help to strengthen the findings. While the authors rationalize that these other studies are too distinct from their own due to enrichment for immune cells, this limitation should be noted but does not prevent such an analysis from being attempted.

      • On page 3, the authors claim that "Technical effects (e.g. single cell analysis of fresh samples vs. single nuclei analysis of frozen samples) could also impact the capture of distinct cell types, although we did not observe a clear pattern of such bias." Can the authors show that cell type frequencies are not significantly different between the samples profiled with these two methods?

      • Why did siNET3 and siNET9 have much lower recovery of neuroendocrine cells compared to other samples? It would be interesting to see how similar or different the transcriptional profiles are of the samples that were obtained from the same patient, considering that multifocal siNETs are found to derive from distinct clones, although this analysis is understandably not possible in this case due to the lack of neuroendocrine cells in one of two samples from the same patient.

      • It should be more clearly stated in the text that these samples were previously treated with somatostatin analogues, as this impacts the interpretation of the findings.

      • The identification of a potential progenitor subtype in the miNEN is very intriguing, albeit a case study and represents a distinct cancer from the lowly proliferating siNETs. While the authors mention this in the text, the case study feels rather tangential to the other parts of the paper.

      • How the authors compared the DE genes to known signatures for the fibroblast and endothelial cells should be clarified and discussed in the Methods section.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have assembled a cohort of 10 SiNET, 1 SiAdeno, and 1 lung MiNEN samples to explore the biology of neuroendocrine neoplasms. They employ single-cell RNA sequencing to profile 5 samples (siAdeno, SiNETs 1-3, MiNEN) and single-nuclei RNA sequencing to profile seven frozen samples (SiNET 4-10).

      They identify two subtypes of siNETs, characterized by either epithelial or neuronal NE cells, through a series of DE analyses. They also report findings of higher proliferation in non-malignant cell types across both subtypes. Additionally, they identify a potential progenitor cell population in a single-lung MiNEN sample.

      Strengths:

      Overall, this study adds interesting insights into this set of rare cancers that could be very informative for the cancer research community. The team probes an understudied cancer type and provides thoughtful investigations and observations that may have translational relevance.

      Weaknesses:

      The study could be improved by clarifying some of the technical approaches and aspects as currently presented, toward enhancing the support of the conclusions:

      (1) Methods: As currently presented, it is possible that the separation of samples by program may be impacted by tissue source (fresh vs. frozen) and/or the associated sequencing modality (single cell vs. single nuclei). For instance, two (SiNET1 and SiNET2) of the three fresh tissues are categorized into the same subtype, while the third (SiNET9) has very few neuroendocrine cells. Additionally, samples from patient 1 (SiNET1 and SiNET6) are separated into different subtypes based on fresh and frozen tissue. The current text alludes to investigations (i.e.: "Technical effects (e.g., fresh vs. frozen samples) could also impact the capture of distinct cell types, although we did not observe a clear pattern of such bias."), but the study would be strengthened with more detail.

      We thank the reviewer for the thoughtful and constructive review. Due to the difficulty in obtaining enough SiNET samples, we used two platforms to generate data - single cell analysis of fresh samples, and single nuclei analysis of frozen samples. We opted to combine both sample types in our analysis while being fully aware of the potential for batch effects. We therefore agree that this is a limitation of our work, and that differences between samples should be interpreted with caution.

      Nevertheless, we argue that the two SiNET subtypes that we have identified are very unlikely to be due to such batch effect. First, the epithelial SiNET subtype was not only detected in two fresh samples but also in one frozen sample (albeit with relatively few cells, as the reviewer correctly noted). Second, and more importantly, the epithelial SiNET subtype was also identified in analysis of an external and much larger cohort of bulk RNA-seq SiNET samples that does not share the issue of two platforms (as seen in Fig. 2f). Moreover, the proportion of samples assigned to the two subtypes is similar between our data and the external data. We therefore argue that the identification of two SiNET subtypes cannot be explained by the use of two data platforms. However, we agree that the results should be further investigated and validated by future studies.

      The reviewer also commented that two samples from the same patient which were profiled by different platforms (SiNET1 and SiNET6) were separated into different subtypes. We would like to clarify that this is not the case, since SiNET6 was not included in the subtype analysis due to too few detected Neuroendocrine cells, and was not assigned to any subtype, as noted in the text and as can be seen by its exclusion from Figure 2 where subtypes are defined. We apologize that our manuscript may have given the wrong impression about SiNET6 classification (it was labeled in Fig. 4a in a misleading manner). In the revised manuscript, we corrected the labeling in Fig. 4a and clarified that SiNET6 is not assigned to any subtype. We also further acknowledge the limitation of the two platforms and the arguments in favor of the existence of two SiNET subtypes.     

      (Additional specific recommendations for the authors are provided below)

      (2) Results:

      Heterogeneity in the SiNET tumor microenvironment: It is unclear if the current analysis of intratumor heterogeneity distinguishes the subtypes. It may be informative if patterns of tumor microenvironment (TME) heterogeneity were identified between samples of the same subtype. The team could also evaluate this in an extension cohort of published SiNET tumors (i.e. revisiting additional analyses using the SiNET bulk RNAseq from Alvarez et al 2018, a subset of single-cell data from Hoffman et al 2023, or additional bulk RNAseq validation cohorts for this cancer type if they exist [if they do not, then this could be mentioned as a need in Discussion])

      We agree that analysis of an independent cohort will assist in defining the association between TME and the SiNET subtype. However, the sample size required for that is significantly larger than the data available. In the revised manuscript we note that as a direction for future studies.

      (3) Proliferation of NE and immune cells in SiNETs: The observed proliferation of NE and immune cells in SiNETs may also be influenced by technical factors (including those noted above). For instance, prior studies have shown that scRNA-seq tends to capture a higher proportion of immune cells compared to snRNA-seq, which should be considered in the interpretation of these results. Could the team clarify this element?

      We agree that different platforms could affect the observed proportions of immune cells, and more generally the proportions of specific cell types. However, the low proliferation of Neuroendocrine cells and the higher proliferation of immune cells (especially B cells, but also T cells and macrophages) is consistently observed in both platforms, as shown in Fig. 4a, and therefore appears to be reliable despite the limitations of our work. We clarify this consistency in the revised manuscript. 

      (4) Putative progenitors in mixed tumors: As written, the identification of putative progenitors in a single lung MiNEN sample feels somewhat disconnected from the rest of the study. These findings are interesting - are similar progenitor cell populations identified in SiNET samples? Recognizing that ideally additional validation is needed to confidently label and characterize these cells beyond gene expression data in this rare tumor, this limitation could be addressed in a revised Discussion.

      We do not find evidence for similar progenitors in the SiNET samples, but they also do not contain two co-existing lineages of cancer cells within the same tumor, so this is harder to define. We agree about the need for additional validation for this specific finding and have noted that in the revised Discussion.

      Reviewer #2 (Public review):

      Summary:

      The research identifies two main SiNET subtypes (epithelial-like and neuronal-like) and reveals heterogeneity in non-neuroendocrine cells within the tumor microenvironment. The study validates findings using external datasets and explores unexpected proliferation patterns. While it contributes to understanding SiNET oncogenic processes, the limited sample size and depth of analysis present challenges to the robustness of the conclusions.

      Strengths:

      The studies effectively identified two subtypes of SiNET based on epithelial and neuronal markers. Key findings include the low proliferation rates of neuroendocrine (NE) cells and the role of the tumor microenvironment (TME), such as the impact of Macrophage Migration Inhibitory Factor (MIF).

      Weaknesses:

      However, the analysis faces challenges such as a small sample size, lack of clear biological interpretation in some analyses, and concerns about batch effects and statistical significance.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to profile small intestine neuroendocrine tumors (siNETs) using single-cell/nucleus RNA sequencing, an established method to characterize the diversity of cell types and states in a tumor. Leveraging this dataset, they identified distinct malignant subtypes (epithelial-like versus neuronal-like) and characterized the proliferative index of malignant neuroendocrine cells versus non-malignant microenvironment cells. They found that malignant neuroendocrine cells were far less proliferative than some of their non-malignant counterparts (e.g., B cells, plasma cells, epithelial cells) and there was a strong subtype association such that epithelial-like siNETs were linked to high B/plasma cell proliferation, potentially mediated by MIF signaling, whereas neuronal-like siNETs were correlated with low B/plasma cell proliferation. The authors also examined a single case of a mixed lung tumor (neuroendocrine and squamous) and found evidence of intermediate/mixed and stem-like progenitor states that suggest the two differentiated tumor types may arise from the same progenitor.

      Strengths:

      The strengths of the paper include the unique dataset, which is the largest to date for siNETs, and the potentially clinically relevant hypotheses generated by their analysis of the data.

      Weaknesses:

      The weaknesses of the paper include the relatively small number of independent patients (n = 8 for siNETs), lack of direct comparison to other published single-cell NET datasets, mixing of two distinct methods (single-cell and single-nucleus RNA-seq), lack of direct cell-cell interaction analyses and spatially-resolved data, and lack of in vitro or in vivo functional validation of their findings.

      The analytical methods applied in this study appear to be appropriate, but the methods used are fairly standard to the field of single-cell omics without significant methodological innovation. As the authors bring forth in the Discussion, the results of the study do raise several compelling questions related to the possibility of distinct biology underlying the epithelial-like and neuronal-like subtypes, the origin of mixed tumors, drivers of proliferation, and microenvironmental heterogeneity. However, this study was not able to further explore these questions through spatially-resolved data or functional experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Methods:

      a) Could the team clarify the discrepancy in subtype assignment between two samples from the same patient? i.e. are these samples from the same tumor? If so, what does the team think is the explanation for the difference in subtype assignment?

      As noted above in response to the public review of reviewer #1, SiNET6 was in fact not assigned to any subtype (due to insufficient NE cells) and hence there was no discrepancy. We apologize for the misleading labeling of SiNET6 in the previous version and have corrected this in the revised version of Figure 4.

      b) What is the rationale for scoring tumor-derived programs on samples with no tumor cells? For instance, SiNET3 does not contain NE cells, and SiNET9 has a very low fraction of NE cells. Please clarify how the scoring was performed on these samples, as the program assignments may be driven by other cell types in samples with little to no NE cells.

      Scoring for tumor-derived programs was done only for the NE cells. Accordingly, SiNET3 was not scored or assigned to any of the programs. SiNET9 was included in this analysis: although it had a relatively small fraction of NE cells, the absolute number of profiled cells was particularly high in this sample, and therefore the number of NE cells was 130, higher than our cutoff of 100 cells.

      c) Given the heterogeneity of cell types within each sample, would there be a way to provide a refined sense of confidence for certain cell type annotations? This would be helpful given the heterogeneity in marker gene expression and the absence of gold-standard markers for fibroblasts and endothelial cells in this cancer type. Additionally, there seems to be an unusually large proportion of NK and T cells - was there selection for this (given that these tumors are largely not immune infiltrated)?

      Author Response: Apart from the neuroendocrine cells, there are six TME cell types that we consistently find in multiple SiNET samples: macrophages, T cells, B/plasma cells, fibroblasts, endothelial and epithelial cells. Each of these cell types is identified as a discrete cluster in the analysis of the respective tumors (as shown in Fig. 1a,b and Fig. S1), and these are exactly the six most common non-malignant cell types that we and others found in single-cell analysis across various other tumor types (e.g. see Gavish et al. 2023, ref. #15). The signatures used to annotate these cell types are shown in Table S2, and they primarily consist of classical markers that are traditionally used to define those cell types. We therefore believe that the annotation of these typical tumor-associated cell types is robust and does not include major uncertainties. In addition to these five common cell types, there are three cell types that we find only in 1-2 of the samples: epithelial cells, plasma cells and NK cells. Again, we believe that their annotation is robust, and these cell types are mostly not used for further analysis.

      There was no selection for any specific cell types in this study. Nevertheless, single-cell (or single-nucleus) analysis may lead to biases towards specific cell types that we cannot evaluate directly from the data. NK cells were detected in only one tumor. T cells were detected in eight of the ten samples, but in four of those samples the frequency of T cells was lower than 5%, and in only one sample was the frequency above 20%. Therefore, while we cannot exclude a technical bias towards a high frequency of T/NK cells, we do not consider these frequencies high enough to suggest this specific type of bias. In the revised manuscript, we clarify that the commonly observed cell types in SiNETs are the same as those commonly observed in other tumors, and we acknowledge the possibility of a technical bias in cell type capture.

      d) Evaluating the expression of one gene at a time may not effectively demonstrate subtype-specific patterns, particularly when comparing NE cells from one tumor to non-NE cells from another, which may not be an appropriate approach for identifying differentially expressed genes. DE analysis coupled with concordance analysis, for example, could strengthen the results.

      We apologize, but we do not fully understand this comment. We note that the initial normalization by non-NE cells was done in order to decrease batch effects when combining the data of the two platforms. We also note that the two subtypes were identified by two distinct approaches, as shown in Fig. 2c and in Fig. 2f.

      (2) Results:

      See the above public review.

      (3) Minor Comments:

      a) Results: Single cell and single nuclei RNA-seq profiling of SiNETs

      The results say ten primary tumor samples from eight patients. Later in the paragraph it says, "After initial quality controls, we retained 29,198 cells from the ten patients." Please clarify to either ten samples or eight patients.

      Indeed these are ten samples rather than ten patients. We corrected that in the revised version and thank the reviewer for noticing our error.

      b) Methods:

      - Please specify which computational tools were used to perform quality control, signature scoring, etc.

      The approaches for quality control, scoring etc. are described in the methods. We implemented these approaches with R code and did not use other computational tools.

      - Minor point but be consistent with naming convention (ie, siAdeno vs SiAdeno) throughout the paper. For example, under "Sample Normalization, Filtering and annotations" change "siAdeno" to "SiAdeno."

      Thank you for noting this, we corrected that.

      - Add processing and analysis of MiNEN sample to the methods section. It is not mentioned in the methods at all.

      As noted in the revised manuscript, the MiNEN sample was analyzed in the same way as the SiNET fresh samples.

      c) Supplementary Figures:

      Figure S1: Change (A-H) to (A-I) to account for all panels in the figure.

      Figure S4: Add (C) after "the siAdeno sample" in the legend.

      Thank you for noting this, we corrected that.

      (4) Font size is quite small in the main figures.

      We enlarged the font in selected figure panels.

      Reviewer #2 (Recommendations for the authors):

      (1) The small number of samples used in some analyses affects the robustness of the findings. Increasing the sample size or including more validation data could improve the statistical reliability and make the results more convincing. The authors should consider expanding the cohort size or integrating additional external datasets to increase statistical power.

      We agree with the reviewer that adding more samples would improve the reliability of the results. However, the external data that we found was not comparable enough to enable integration with our data, and we are unable to profile additional SiNET samples in our lab. We hope that future studies would support our results and extend them further.

      (2) The biological significance of differentially expressed genes needs more depth, limiting the insights into SiNET biology. The authors should perform a comprehensive pathway enrichment analysis and integrate findings with existing literature. Tools like Gene Set Enrichment Analysis (GSEA) or Overrepresentation Analysis (ORA) could provide a more holistic view of altered biological processes.

      We thank the reviewer for this suggestion. We did examine the functional enrichment of differentially expressed genes and did not find additional enrichments that we felt were important to highlight beyond what we described. We report the genes in supplementary tables, enabling other researchers to examine these lists further. 

      (3) The unexpected finding of higher proliferation in non-malignant cells requires further investigation and plausible biological explanation. The authors should perform additional analyses to explore potential mechanisms, such as investigating cell cycle regulators or performing in vitro validation experiments. The authors should consider single-cell trajectory analysis to explore these highly proliferative non-malignant cells' potential differentiation or activation states.

      We agree that our results are descriptive and that we do not fully explain the mechanism for the high level of non-malignant cell proliferation. We did attempt to perform follow up computational analysis. These analyses raised the hypothesis that high levels of MIF are causing the proliferation of immune cells. Additional analyses that we performed were not sufficient to conclusively identify a mechanism, and we felt that they were not informative enough to be included in the manuscript. Further in vitro (or in vivo) studies are beyond the scope of the current work.

      (3) More details are required on methods used for p-value adjustment, and criteria for statistical significance should be clearly defined. Additionally, integrating scRNA-seq and snRNA-seq data needs a more thorough explanation, including batch effect mitigation and more explicit cell clustering representation. The authors should clearly describe p-value adjustments (e.g., FDR) and batch correction methods (e.g., Harmony, FastMNN integration) and include additional figures showing corrected UMAP plots or heatmaps post-batch correction to enhance the confidence in results.

      We now clarify in the Methods our use of FDR for p-value adjustments. As for batch correction, we have avoided the use of integration methods as we believe that they tend to distort the data and decrease tumor-specific signals. Instead, we primarily analyzed one tumor at a time and never directly compared cell profiles across distinct tumors but only compared the differences between subpopulations; specifically, we normalized the expression of NE cells by subtracting the expression of reference non-NE cells from the same tumor as a method to decrease batch effects. We now clarify this point in the Methods section.
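      The two steps described in this response can be sketched in a few lines. The following is a minimal illustration and not the authors' code (they report using R): it assumes a genes-by-cells matrix of log-scale expression for a single tumor, and it assumes that "FDR" refers to the Benjamini-Hochberg procedure, which the response does not state explicitly. The function names `relative_ne_expression` and `benjamini_hochberg` are hypothetical.

```python
import numpy as np


def relative_ne_expression(log_expr, is_ne):
    """Per-gene mean log-expression of NE cells minus that of non-NE
    reference cells from the SAME tumor (hypothetical helper).

    log_expr: genes x cells array of log-scale expression for one tumor.
    is_ne:    boolean mask over cells, True for neuroendocrine cells.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    is_ne = np.asarray(is_ne, dtype=bool)
    ne_mean = log_expr[:, is_ne].mean(axis=1)
    ref_mean = log_expr[:, ~is_ne].mean(axis=1)
    # A tumor-wide additive offset on the log scale cancels in this difference.
    return ne_mean - ref_mean


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out
```

      Because the reference cells come from the same tumor, any shift that affects all cells of that sample equally drops out of the difference, which is the rationale given above for preferring this to cross-sample integration.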

      (4) The lack of analysis of interactions between different cell types limits understanding of tumor microenvironment dynamics. The authors should employ cell-cell interaction analysis tools (e.g., CellPhoneDB, NicheNet) to explore potential communication networks within the tumor microenvironment. This could provide valuable insights into how different cell types influence tumor progression and maintenance.

      We thank the reviewer for this suggestion. We have tried to use such methods but found the results difficult to interpret since these approaches generated very long lists of potential cell-cell interactions that are largely not unique to the SiNET context and their relevance remains unclear without follow up experiments, which are beyond the scope of this work. We therefore focused only on ligand/receptors that came up robustly through specific analyses such as the differences between SiNET subtypes. In particular, MIF is highly expressed in the epithelial subtype, and remarkably, MIF upregulation is shared across multiple cell types. Thus, the cell-cell interactions that are suggested by the SiNET data as somewhat unique to this context are those involving MIF and its receptor (CD74 on immune cell types), while other interactions detected by the proposed methods primarily reflect the generic ligand/receptors expressed by corresponding TME cell types.   

      Reviewer #3 (Recommendations for the authors):

      (1) For a relatively small dataset, the mixing of single-cell versus single-nucleus RNA-seq should be discussed more. It would be nice to have 1-2 tumors that are analyzed by both methods to compare and increase our understanding of how these different approaches may affect the results. This could be accomplished by splitting a fresh tumor into two parts, processing it fresh for single-cell RNA-seq, and freezing the other part for single-nucleus RNA-seq.

      We agree with the reviewer that the different techniques may bias our results and we refer to this limitation in the Results and Discussion sections. However, it is important to note that we do not directly integrate the primary data across these modalities, but rather analyze each tumor separately and only combine the results across tumors. For example, we first compare the NE cells from each tumor to control non-NE cells from the same tumor and then only compare the sets of NE-specific genes across tumors. Moreover, the subtypes that we detect cannot be explained by these modalities, as the first subtype contains samples from both methods and these subtypes are further demonstrated in external bulk data. Similarly, the results regarding low proliferation of NE cells and high proliferation of B/plasma cells are observed across both modalities. We therefore argue that while the combination of methods is a limitation of this work it does not account for the main results.  

      (2) The authors state that they defined the siNET transcriptomic signature by comparing their siNET single-cell/nucleus data to other NETs profiled by bulk RNA-seq. Some of the genes in the signature, such as CHGA, are widely used as markers for NETs (and not specific for siNET). The authors should address this in more detail.

      To define the SiNET transcriptomic signature we first analyzed each tumor separately and compared the expression of Neuroendocrine (NE) cells to that of non-NE cells to detect NE-specific genes. Next, we compared the lists of NE-specific genes across the 8 SiNET patients and found a subset of 26 genes which were shared across most of the analyzed SiNET samples (Fig. 2a). Thus, the signature was defined only from analysis of SiNETs and not based on comparison to other types of NETs and hence it is expected that the signature could contain both SiNET-specific genes and more generic NET genes such as CHGA.

      Only after defining this signature, we went on to compare it between SiNETs and other types of NETs (pancreatic and rectal) based on external bulk RNA-seq data. In this comparison, we observed that the signature was clearly higher in SiNETs than in the other NETs (Fig. 2b). This result supports the accuracy of the signature and further suggests that it contains a fraction of SiNET-specific genes and not only generic NET genes such as CHGA. Thus, we would expect this signature to perform well also for distinguishing between SiNETs and other types of NETs, but it does contain a subset of genes that would be high in the other NETs. Finally, we note that even though CHGA is a generic NET marker, the bulk RNA-seq data would suggest that, at least at the mRNA level, this gene is still more highly expressed in SiNETs than in other NETs. To avoid confusion regarding the definition and specificity of the SiNET transcriptomic signature, we have extended the description of this section in the revised manuscript.

      (3) The authors only compare their data to bulk transcriptomic data on NETs. While in some instances this makes sense given the bulk dataset has >80 tumors, they should at least cite and do some comparison to other published single-cell RNA-seq datasets of NETs (e.g., PMID: 37756410, 34671197). The former study listed has 3 siNETs, 4 pNETs, and 1 gNET. Do the epithelial-like and neuronal-like signatures show up in this dataset too?

      We examined these studies but concluded that their data were inadequate to identify the two SiNET subtypes. The latter study was of pNETs, while the former study had 3 SiNET samples but only from 2 patients, and furthermore it enriched for immune cells, leaving only very low numbers of NE cells. Therefore, we now cite this work in the discussion but cannot use it to extend the results from our work.

      (4) How did the authors statistically handle patients with more than one tumor sample (true for n = 2)? These tumor samples would not be truly independent.

      In both cases where we had two distinct samples of the same patient, only one sample had sufficient NE cells to be included in NE-related analysis and therefore the other samples (SiNET3 and SiNET6) were excluded from all analysis of NE differential expression and subtypes. These samples were only included in the initial analysis (Fig. 1) and in TME-related analysis (Fig. 3-4) in which there was no statistical analysis of differences between patients and hence no problem with the inclusion of 2 samples for the same patient. We clarified this issue in the revised version.

      (5) The association between siNET subtype and B/plasma cell proliferation is very interesting, as is the hypothesis regarding MIF signaling. It would be illuminating for the authors to perform cell-cell interaction analyses with methods such as CellChat in this context rather than just relying on DE. Spatial mapping would be helpful too and while this may be outside the scope of this study, it should at least be expounded upon in the Discussion section.

      Indeed, spatial transcriptomic analysis would add interesting insight to our data and to SiNET biology. Unfortunately, this is not within the scope of the current project, but we note this interesting possibility in the Discussion. Regarding additional methods for cell-cell interactions, we have performed such analysis but found it not informative, as it highlighted a large number of interactions that are not unique to SiNETs and are difficult to interpret, and therefore we do not include this in the revised version.

      (6) The authors note that in the mixed lung tumor, the NE component was more proliferative than that observed with siNETs. How does the proliferation compare to pNETs, gNETs, in other published studies? How about assessing the clonality of the SCC and LNET malignant cells with various genomic or combined genomic/transcriptomic methods?

      The percentage of proliferating NE cells in the mixed lung tumor was higher than 60%. This is extremely high, approximately four-fold higher than the average that we found in a pan-cancer analysis and higher than the average of any of the >20 cancer types that we analyzed (Gavish et al. 2023, ref. #15). This remarkably high proliferation serves as a control for the low proliferation that we found in SiNET NE cells.

      (7) In the Discussion on page 13, the authors write "Second, proliferation of NE cells may be inhibited by prior treatments with somatostatin analogues." How many patients were treated in this manner? This information should be made more explicit in the manuscript.

      Details on pretreatment with somatostatin analogues are provided in Table S1. All patients were pretreated with somatostatin analogues, with the possible exception of one patient (P8, SiNET10) for which we could not confidently obtain this information.

      (8) On page 5, "bone-fide" is misspelled.

      (9) On page 8, "exact identify" is misspelled.

      We thank the reviewer and have corrected the typos.

    1. eLife Assessment

      This study offers a valuable assessment of the impact of antibiotics on the human gut microbiota across diverse observational cohorts. The findings presented are convincing, despite the observational design and residual confounding that may still contribute to discrepancies between the cross-sectional and longitudinal data. The work is relevant for researchers and clinicians interested in antimicrobial resistance and the impact of antibiotics on the host.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript by Peto et al., the authors describe the impact of different antimicrobials on gut microbiota in a prospective observational study of 225 participants (healthy volunteers, inpatients and outpatients). Both cross-sectional data (all participants) and longitudinal data (subset of 79 haematopoietic cell transplant patients) were used. Using metagenomic sequencing, they estimated the impact of antibiotic exposure on gut microbiota composition and resistance genes. In their models, the authors aim to correct for potential confounders (e.g. demographics, non-antimicrobial exposures and physiological abnormalities), and for differences in the recency and total duration of antibiotic exposure. I consider these comprehensive models an important strength of this observational study. Yet, the underlying assumptions of such models may have impacted the study findings and residual confounding is likely. Other strengths include the presence of both cross-sectional and longitudinal exposure data and presence of both healthy volunteers and patients. Together, these observational findings expand on previous studies (both observational and RCTs) describing the impact of antimicrobials on gut microbiota.

      Weaknesses:

      (1) The main weaknesses result from the observational design. This hampers causal interpretation and makes correction for potential confounding necessary. While the authors have used comprehensive models to correct for potential confounders and for differences between participants in duration of antibiotic exposure and time between exposure and sample collection, I believe residual confounding is likely (which is mentioned as a limitation in the discussion).

      For their models, the authors found a disruption half-life of 6 days to be the best fit based on Shannon diversity. Yet, the disruption caused by antimicrobials may be longer than represented in this model - as highlighted in the discussion.

      (2) Another consequence of the observational design of this study is the relatively small number of participants available for some comparisons (e.g. oral clindamycin was only used by 6 participants). Care should be taken when drawing any conclusions from such small numbers.

      Comments on revisions:

      The authors have adequately addressed all of my comments.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors provide a study among healthy individuals, general medical patients and patients receiving haematopoietic cell transplants (HCT) to study the gut microbiome through shotgun metagenomic sequencing of stool samples. The first two groups were sampled once, while the patients receiving HCT were sampled longitudinally. A range of metadata (including current and previous (up to 1 year before sampling) antibiotic use) was recorded for all sampled individuals. The authors then performed shotgun metagenomic sequencing (using the Illumina platform) and performed bioinformatic analyses on these data to determine the composition and diversity of the gut microbiota and the antibiotic resistance genes therein. The authors conclude, on the basis of these analyses, that some antibiotics had a large impact on gut microbiota diversity, and could select opportunistic pathogens and/or antibiotic resistance genes in the gut microbiota.

      Strengths:

      The major strength of this study is the considerable achievement of performing this observational study in a large cohort of individuals. Studies into the impact of antibiotic therapy on the gut microbiota are difficult to organise, perform and interpret, and this work follows state-of-the-art methodologies to achieve its goals. The authors have achieved their objectives, and the conclusions they draw on the impact of different antibiotics on the gut microbiota and its antibiotic resistance genes (the 'resistome', in short) are supported by the data presented in this work.

      Weaknesses:

      The weaknesses are the lack of information on the different resistance genes that have been identified and which could have been supplied as Supplementary Data.

      We have now supplied a list of individual resistance genes as supplementary data.

      In addition, no attempt is made to assess whether the identified resistance genes are associated with mobile genetic elements and/or (opportunistic) pathogens in the gut. While this is challenging with short-read data, alternative approaches like long-read metagenomics, Hi-C and/or culture-based profiling of bacterial communities could have been employed to further strengthen this work.

      We agree this is a limitation, and we now refer to this in the discussion. Unfortunately, we did not have funding to perform additional profiling of the samples that would have provided more information about the genetic context of the AMR genes identified.

      Unfortunately, the authors have not attempted to perform corrections for multiple testing because many antibiotic exposures were correlated.

      The reviewer is correct that we did not perform formal correction for multiple testing. This was because correlation between antimicrobial exposures meant we could not determine what correction would be appropriate and not overly conservative. We now describe this more clearly in the statistical analysis section.

      Impact:

      The work may impact policies on the use of antibiotics, as those drugs that have major impacts on the diversity of the gut microbiota and select for antibiotic resistance genes in the gut are better avoided. However, the primary rationale for antibiotic therapy will remain the clinical effectiveness of antimicrobial drugs, and the impact on the gut microbiota and resistome will be secondary to these considerations.

      We agree that the primary consideration guiding antimicrobial therapy will usually be clinical effectiveness. However, antimicrobial stewardship to minimise microbiome disruption and AMR selection is an increasingly important consideration, particularly as choices can often be made between different antibiotics that are likely to be equally clinically effective.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript by Peto et al., the authors describe the impact of different antimicrobials on gut microbiota in a prospective observational study of 225 participants (healthy volunteers, inpatients and outpatients). Both cross-sectional data (all participants) and longitudinal data (a subset of 79 haematopoietic cell transplant patients) were used. Using metagenomic sequencing, they estimated the impact of antibiotic exposure on gut microbiota composition and resistance genes. In their models, the authors aim to correct for potential confounders (e.g. demographics, non-antimicrobial exposures and physiological abnormalities), and for differences in the recency and total duration of antibiotic exposure. I consider these comprehensive models an important strength of this observational study. Yet, the underlying assumptions of such models may have impacted the study findings (detailed below). Other strengths include the presence of both cross-sectional and longitudinal exposure data and the presence of both healthy volunteers and patients. Together, these observational findings expand on previous studies (both observational and RCTs) describing the impact of antimicrobials on gut microbiota.

      Weaknesses:

      (1) The main weaknesses result from the observational design. This hampers causal interpretation and makes correction for potential confounding necessary. The authors have used comprehensive models to correct for potential confounders and for differences between participants in duration of antibiotic exposure and time between exposure and sample collection. I wonder if some of the choices made by the authors did affect these findings. For example, the authors did not include travel in the final model, but travel (most importantly, to South Asia) may result in the acquisition of AMR genes [Worby et al., Lancet Microbe 2023; PMID 37716364]. Moreover, non-antimicrobial drugs (such as proton pump inhibitors) were not included, but these have a well-known impact on gut microbiota and might be linked with exposure to antimicrobial drugs. Residual confounding may underlie some of the unexplained discrepancies between the cross-sectional and longitudinal data (e.g. for vancomycin).

      We agree that the observational design means there is the potential for confounding, which, as the reviewer notes, we attempt to account for as far as possible in the multivariable models presented. We cannot exclude the possibility of residual confounding, and we highlight this as a limitation in the discussion. We have expanded on this limitation, and mention it as a possible explanation for inconsistencies between longitudinal and cross-sectional models. Conducting randomised trials to assess the impacts of multiple antimicrobials in sick, hospitalised patients would be exceptionally difficult, and so it is hard to avoid reliance on observational data in these settings.

      We did record participants’ foreign travel and diet, but these exposures were not included in our models as they were not independently associated with an impact on the microbiome and their inclusion did not materially affect other estimates. However, because most participants were recruited from a healthcare setting, few had recent foreign travel and so this study was not well powered to assess the effects of travel on AMR carriage. We have added this as a limitation.

      In addition, the authors found a disruption half-life of 6 days to be the best fit based on Shannon diversity. If I'm understanding correctly, this results in a near-zero modelled exposure of a 14-day-course after 70 days (purple line; Supplementary Figure 2). However, it has been described that microbiota composition and resistome (not Shannon diversity!) remain altered for longer periods of time after (certain) antibiotic exposures (e.g. Anthony et al., Cell Reports 2022; PMID 35417701). The authors did not assess whether extending the disruption half-life would alter their conclusions.

      The reviewer is correct that the best fit disruption half-life of 6 days means the model assumes near-zero exposure by 70 days. We appreciate that antimicrobials can cause longer-term disruption than is represented in our model, and we refer to this in the discussion (we had cited two papers supporting this, and we are grateful for the additional reference above, which we have added). We agree that it is useful to clarify that the longer term effects may be seen in individual components of the microbiome or AMR genes, but not in overall measures of diversity, so have added this to the discussion.
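
      The arithmetic behind the "near-zero exposure by 70 days" statement can be sketched as follows. This is a minimal illustration, assuming that each day of a course contributes one unit of exposure that decays exponentially with the 6-day half-life, and that the contributions add up; the function names and the additive form are assumptions for illustration, not the authors' model code.

```python
HALF_LIFE_DAYS = 6.0  # best-fit disruption half-life quoted in the response


def residual_fraction(days_elapsed: float, half_life: float = HALF_LIFE_DAYS) -> float:
    """Fraction of one day's exposure effect remaining after days_elapsed."""
    return 0.5 ** (days_elapsed / half_life)


def modelled_exposure(course_days: int, days_since_last_dose: float,
                      half_life: float = HALF_LIFE_DAYS) -> float:
    """Sum the decayed contribution of every day of a course (assumed additive)."""
    return sum(
        residual_fraction(days_since_last_dose + day, half_life)
        for day in range(course_days)
    )


# Hypothetical 14-day course, evaluated at the last dose and 70 days later.
peak = modelled_exposure(14, 0)
late = modelled_exposure(14, 70)
print(f"remaining share of modelled exposure after 70 days: {late / peak:.5f}")
```

      Under these assumptions the remaining share depends only on the delay and the half-life (it equals 0.5 raised to 70/6), so a longer assumed half-life would be the only way for the model to represent the longer-lasting disruption the reviewer describes.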

      (2) Another consequence of the observational design of this study is the relatively small number of participants available for some comparisons (e.g. oral clindamycin was only used by 6 participants). Care should be taken when drawing any conclusions from such small numbers.

      We agree. Although our participants received a large number of different antimicrobial exposures, these were dependent on routine clinical practice at our centre and we lack data on many potentially important exposures. We had mentioned this in relation to antimicrobials not used at our centre, and have now clarified in the discussion that this also limits reliability of estimates for antimicrobials that were rarely used in study participants.

      (3) The authors assessed log-transformed relative abundances of specific bacteria after subsampling to 3.5 million reads. While I agree that some kind of data transformation is probably preferable, these methods do not address the compositional nature of microbiome data, and using a pseudocount (10<sup>-6</sup>) is necessary for absent (i.e. undetected) taxa [Gloor et al., Front Microbiol 2017; PMID 29187837]. Given the centrality of these relative abundances to their conclusions, a sensitivity analysis using compositionally-aware methods (such as a centred log-ratio (clr) transformation) would have added robustness to their findings.

      We agree that using a pseudocount is necessary for undetected taxa, which we have done assuming undetected taxa had an abundance of 10<sup>-6</sup> (based on the lower limit of detection at the depth we sequenced). We refer to this as truncation in the methods section, but for clarity we have now also described this as a pseudocount. Because our analysis focusses on major taxa that are almost ubiquitous in the human gut microbiome, a pseudocount was only used for 3 samples that had no detectable Enterobacteriaceae.
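
      A minimal sketch of the truncation described above, assuming the floor is applied as a lower bound on the relative abundance before taking the base-10 logarithm. The function name and the use of `max` are illustrative assumptions; only the floor of 10<sup>-6</sup> and the depth of 3.5 million reads come from the text.

```python
import math

DETECTION_FLOOR = 1e-6      # pseudocount / truncation value stated in the response
READS_PER_SAMPLE = 3_500_000  # subsampling depth stated in the review


def log10_relative_abundance(taxon_reads: int,
                             total_reads: int = READS_PER_SAMPLE,
                             floor: float = DETECTION_FLOOR) -> float:
    """Log10 relative abundance, with values below the floor set to the floor."""
    relative = taxon_reads / total_reads
    return math.log10(max(relative, floor))


# An undetected taxon no longer produces log10(0); it is reported at the floor.
print(log10_relative_abundance(0))
print(log10_relative_abundance(35_000))
```

      With this form of truncation, any taxon with fewer than four reads at this depth is also reported at the floor, since four reads in 3.5 million is the smallest count above 10<sup>-6</sup>.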

      We are aware that compositionally-aware methods are often used with microbiome data, and for some analyses these are necessary to avoid introducing spurious correlations. However the flaws in non-compositional analyses outlined in Gloor et al do not affect the analyses in this paper:

      (1) The problems related to differing sequence depths or inadequate normalisation do not apply to our dataset, as we took a random subset of 3.5 million reads from all samples (Gloor et al correctly point out that this method has the drawback of losing some information, but it avoids problems related to variable sequencing depth)

      (2) The remainder of Gloor et al critiques multivariate analyses that assess correlations between multiple microbiome measurements made on the same sample, starting with a dissimilarity matrix. With compositional data these can lead to spurious correlations, as measurements on an individual sample are not independent of other measurements made on the same sample. In contrast, our analyses do not use a dissimilarity matrix, but evaluate the association of multiple non-microbiome covariates (e.g. antibiotic exposures, age) with single microbiome measures. We use a separate model for each of 11 specified microbiome components, and display these results side by side. This does not lead to the same problem of spurious correlation as analyses of dissimilarity matrices. However, it does mean that estimates of effects on each taxon outcome have to be interpreted in the context of estimates on the other taxa. Specifically, in our models, the associations of antimicrobial exposure with different taxa/AMR genes are not necessarily independent of each other (e.g. if an antimicrobial eradicated only one taxon then it would be associated with an increase in others). This is not a spurious correlation, and makes intuitive sense when using relative abundance as the outcome. However, we agree this should be made more explicit.

      For these reasons, at this stage we would prefer not to increase the complexity of the manuscript by adding a sensitivity analysis.
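
      The dependence between relative abundances discussed in this exchange can be illustrated with a toy example. The counts below are hypothetical, and the centred log-ratio (clr) function is a generic textbook form included only to show what the reviewer's suggested transformation computes; neither is taken from the manuscript.

```python
import math


def relative_abundance(counts: dict) -> dict:
    """Convert raw counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def clr(proportions: dict, floor: float = 1e-6) -> dict:
    """Centred log-ratio: log of each part minus the mean log across parts."""
    logs = {taxon: math.log(max(p, floor)) for taxon, p in proportions.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {taxon: value - mean_log for taxon, value in logs.items()}


# Hypothetical sample before and after an antimicrobial removes only Clostridia.
before = {"Bacteroidetes": 500, "Clostridia": 400, "Enterobacteriaceae": 100}
after = dict(before, Clostridia=0)

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)

# Enterobacteriaceae counts are unchanged, yet their relative abundance rises.
print(rel_before["Enterobacteriaceae"], rel_after["Enterobacteriaceae"])
print(clr(rel_after))
```

      In this toy sample the unchanged taxa gain relative abundance purely because one component was removed, which is the interpretation caveat the authors describe for models that use relative abundance as the outcome.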

      (4) An overall description of gut microbiota composition and resistome of the included participants is missing. This makes it difficult to compare the current study population to other studies. In addition, for correct interpretation of the findings, it would have been helpful if the reasons for hospital visits of the general medical patients were provided.

      We have added a summary of microbiome and resistome composition (in the results section and new supplementary table 2), and we also now include microbiome and resistome profiles of all samples in the supplementary data. We also provide some more detail about the types of general medical patients included. We are not able to provide a breakdown of the initial reason for admission as this was not collected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Provide a supplementary table with information on the abundance of individual genes in the samples.

      This supplementary data is now included.

      (2) Engage with an expert in statistics to discuss how statistical analyses can be improved.

      An experienced biostatistician has been involved in this study since its conception, and was involved in planning the analysis and in the responses to these comments.

      (3) Typos and other minor corrections:

      Methods: it is my understanding that litre should be abbreviated with a lowercase l.

      Different journals have different house styles: we are happy to follow Editorial guidance.

      p. 9: abuindance should be corrected to abundance.

      Corrected

      p. 9: relative species should be relevant species?  

      Yes, corrected. Thank you.

      p. 9 - 10: can the apparent lack of effect of beta-lactams on beta-lactamase gene abundance be explained by the focus on a small number of beta-lactamase resistance genes that are found in Enterobacteriaceae and which are not particularly prevalent, while other classes of resistance genes (e.g. Bacteroidal beta-lactamases) were excluded?

      It is possible that including other beta-lactamases would have led to different results, but as a small number of beta-lactamases in Enterobacteriaceae are of major clinical importance we decided to focus on these (already justified in the Methods). A full list of AMR genes identified is now provided in the supplementary data.

      p. 10: beta-lactamse should be beta-lactamase

      Corrected

      Figure 3A: could the data shown for tetracycline resistance genes be skewed by tetQ, which is probably one of the most abundant resistance genes in the human gut and acts through ribosome protection?

      TetQ was included, but only accounted for 23% of reads assigned to tetracycline resistance genes so is unlikely to have skewed the overall result. We limited the analysis to a few major categories of AMR genes and, other than VanA, have avoided presenting results for single genes to limit the degree of multiple testing. We now include the resistome profile for each sample in the supplementary data so that readers can explore the data if desired.

      Reviewer #2 (Recommendations For The Authors):

      (1) Given the importance of obligate anaerobic gut microbiota for human health, it might be interesting to divide antibiotics into categories based on their anti-anaerobic activity and assess whether these antibiotics differ in their effects on gut microbiota.

      The large majority of antibiotics used in clinical practice have activity against aerobic bacteria and anaerobic bacteria, so it is not possible to easily categorise them this way. There are two main exceptions (metronidazole and aminoglycosides) but there was insufficient use of these drugs to clearly detect or rule out a difference between them, even when categorising antimicrobials by class, so we prefer not to frame the results in these terms. Also see our comments on this categorisation below.

      (2) For estimating the abundance of anaerobic bacteria, three major groups were assessed: Bacteroidetes, Actinobacteria and Clostridia. To me, this seems a bit aspecific. For example, the phylum Bacteroidetes contains some aerobic bacteria (e.g. Flavobacteriia). Would it be possible to provide a more accurate estimation of anaerobic bacteria?

      We think that an emphasis on a binary aerobic/anaerobic classification is less biologically meaningful than the more granular genetic classification we use, and its use largely reflects the previous reliance on culture-based methods for bacterial identification. Although some important opportunistic human pathogens are aerobic, it is not clear that the benefit or harm of most gut commensals relates to their oxygen tolerance, and all luminal bacteria exist in an anaerobic environment. As such we prefer not to perform an additional analysis using this category. We are also not sure that this could be done reliably, as many of the taxa are characterised poorly, or not at all.

      We appreciate that Bacteroidetes, Actinobacteria and Clostridia are diverse taxa that include many different species, so may seem non-specific, but these were chosen because:

      i) they are non-overlapping with Enterobacteriaceae and Enterococcus, the major opportunistic pathogens of clinical relevance, so could be used in parallel, and

      ii) they make up the large majority of the gut microbiome in most people and most species are of low pathogenicity, so it is plausible that their disruption might drive colonisation with more pathogenic organisms (or those carrying important AMR genes).

      We have more clearly stated this rationale.

      (3) A statement on the availability of data and code for analysis is missing. I would highly recommend public sharing of raw sequence data and R code for analysis. If possible, it would be very valuable if processed microbiome data and patient metadata could be shared.

      We agree, and these have been submitted as supplementary data. We have added the following statement “The data and code used to produce this manuscript are available in the supplementary material, including processed microbiome data, and pseudonymised patient metadata. The sequence data for this study have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEB86785.”

    1. eLife Assessment

      This study provides important insights into the evolution of pesticide resistance, demonstrating that resistance can arise rapidly and repeatedly, which complements prior work on parallel evolution across species. The combination of extensive temporal sampling in the field, experimental evolution, and genomics makes for compelling findings. The authors are to be commended for acknowledging the main limitations of their study in the Discussion. Framing the work in a broader context of resistance beyond arthropod pests would further increase the appeal of the study, which is of relevance for both agronomic practitioners and evolutionary biologists.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Cao et al. provides a compelling investigation into the role of mutational input in the rapid evolution of pesticide resistance, focusing on the two-spotted spider mite's response to the recent introduction of the acaricide cyetpyrafen. This well-documented introduction of the pesticide - and thus a clearly defined history of selection - offers a powerful framework for studying the temporal dynamics of rapid adaptation. The authors combine resistance phenotyping across multiple populations, extensive resequencing to track the frequency of resistance alleles, and genomic analyses of selection in both contemporary and historical samples. These approaches are further complemented by laboratory-based experimental evolution, which serves as a baseline for understanding the genetic architecture of resistance across mite populations in China. Their analyses identify two key resistance-associated genes, sdhB and sdhD, within which they detect 15 mutations in wild-collected samples. Protein modeling reveals that these mutations cluster around the pesticide's binding site, suggesting a direct functional role in resistance. The authors further examine signatures of selective sweeps and their distribution across populations to infer the mechanisms - such as de novo mutation or gene flow - driving the spread of resistance, a crucial consideration for predicting evolutionary responses to extreme selection pressure. Overall, this is a well-rounded, thoughtfully designed and well-written manuscript. It shows significant novelty, as it is relatively rare to integrate broad-scale evolutionary inference from natural populations with experimentally informed bioassays; however, follow-up work will be needed to fully resolve haplotype structure and the functional effects of resistance mutations in the system.

      Strengths:

      One of the most compelling aspects of this study is its integration of genomic time-series data in natural populations with controlled experimental evolution. By coupling genome sequencing of resistant field populations with laboratory selection experiments, the authors tease apart the individual effects of resistance alleles along with regions of the genome where selection is expected to occur, and compare that to the observed frequency in the wild populations over space and time. Their temporal data clearly demonstrates the pace at which evolution can occur in response to extreme selection. This type of approach is a powerful roadmap for the rest of the field of rapid adaptation.

      The study effectively links specific genetic changes to resistance phenotypes. The identification of sdhB and sdhD mutations as major drivers of cyetpyrafen resistance is well supported by allele frequency shifts in both field and experimental populations. The scope of their sampling clearly facilitated the remarkable number of observed mutations within these target genes, and the authors provide a careful discussion of the likelihood of these mutations from de novo or standing variation. Furthermore, the discovered cross-resistance that these mutations confer to other mitochondrial complex II inhibitors highlights the potential for broader resistance management and evolution.

      Weaknesses:

      (1) Pleiotropy without pesticide modes of action (cyflumetofen and cyetpyrafen) may also play a role in the rapid response to the focal pesticide in this study

      (2) Other aspects of the environment that might influence selection were not considered in the structure of resistance alleles (i.e. climate, elevation)

      (3) Very little data were used for haplotype reconstruction, only 8 SNPs, and this excluded all heterozygous alleles, which could dramatically influence the complexity of these inferred haplotype networks.

      (4) Single Mutations and Their Effects:

      - Allelic effects were not estimated in isogenic lines, so the effects presented also include heterogeneity from allelic interactions with the genomic background

      - The authors see populations that segregate for resistance mutations but that have no survival to pesticides. This suggests either not all of the resistance mutations studied here actually have functional effects or that dominance is playing a role in masking their effects in the heterozygous state.

    3. Reviewer #2 (Public review):

      Summary:

      This paper investigates the evolution of pesticide resistance in the two-spotted spider mite following the introduction of an SDHI acaricide, cyetpyrafen, in China. The authors make use of cyetpyrafen-naive populations collected before that pesticide was first used, as well as more recent populations (both sensitive and resistant) to conduct comparative population genomics. They report 15 different mutations in the insecticide target site from resistant populations, many reported here for the first time, and look at the mutation and selection processes underlying the evolution of resistance, through GWAS, haplotype mapping, and testing for loss of diversity indicating selective sweeps. None of the target site mutations found in resistant populations was found in pre-exposure populations, suggesting that the mutations may have arisen de novo rather than being present as standing variation, unless initially present at very low frequencies; a de novo origin is also supported by evidence of selective sweeps in some resistant populations. Furthermore, there is no significant evidence of migration of resistant genotypes between the sampled field populations, indicating multiple origins of common mutations. Overall, this indicates a very high mutation rate and a wide range of mutational pathways to resistance for this target site in this pest species. The series of population genomic analyses carried out here, in addition to the evolutionary processes that appear to underlie resistance development in this case, could have implications for the study of resistance evolution more widely.

      Strengths:

      This paper combines phenotypic characterisation with extensive comparative population genomics, made possible by the availability of multiple population samples (each with hundreds of individuals) collected before as well as after the introduction of the pesticide cyetpyrafen, as well as lab-evolved lines. This results in findings of mutation and selection processes that can be related back to the pesticide resistance trait of concern. Large numbers of mites were tested phenotypically to show the levels of resistance present, and the authors also made near-isogenic lines to confirm the phenotypic effects of key mutations. The population genomic analyses consider a range of alternative hypotheses, including mutations arising by de novo mutation or selection from standing genetic variation; and mutations in different populations arising independently or arriving by migration. The claim that mutations most likely arose by multiple repeated de novo mutations is therefore supported by multiple lines of evidence: the direct evidence of none of the mutations being found in over 2000 individuals from naive populations, and the indirect evidence from population genomics showing evidence of selective sweeps but not of significant migration between the sampled populations.

      Weaknesses:

      As acknowledged within the discussion, whilst evidence supports a de novo origin of the resistance associated mutations, this cannot be proven definitively as mutations may have been present at a very low frequency and therefore not found within the tested pesticide-naive population samples.

      Near-isofemale lines were made to confirm the resistance levels associated with five of the 15 mutations, but otherwise the genotype-phenotype associations are correlative as confirmation by functional genetics was beyond the scope of this study.

      Comments on revisions:

      My recommendations have all been addressed in the revised version.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Cao et al. provides a compelling investigation into the role of mutational input in the rapid evolution of pesticide resistance, focusing on the two-spotted spider mite's response to the recent introduction of the acaricide cyetpyrafen. This well-documented introduction of the pesticide - and thus a clearly defined history of selection - offers a powerful framework for studying the temporal dynamics of rapid adaptation. The authors combine resistance phenotyping across multiple populations, extensive resequencing to track the frequency of resistance alleles, and genomic analyses of selection in both contemporary and historical samples. These approaches are further complemented by laboratory-based experimental evolution, which serves as a baseline for understanding the genetic architecture of resistance across mite populations in China. Their analyses identify two key resistance-associated genes, sdhB and sdhD, within which they detect 15 mutations in wild-collected samples. Protein modeling reveals that these mutations cluster around the pesticide's binding site, suggesting a direct functional role in resistance. The authors further examine signatures of selective sweeps and their distribution across populations to infer the mechanisms - such as de novo mutation or gene flow - driving the spread of resistance, a crucial consideration for predicting evolutionary responses to extreme selection pressure. Overall, this is a well-rounded, thoughtfully designed, and well-written manuscript. It shows significant novelty, as it is relatively rare to integrate broad-scale evolutionary inference from natural populations with experimentally informed bioassays; however, some aspects of the methods and discussion have an opportunity to be clarified and strengthened.

      Strengths:

      One of the most compelling aspects of this study is its integration of genomic time-series data in natural populations with controlled experimental evolution. By coupling genome sequencing of resistant field populations with laboratory selection experiments, the authors tease apart the individual effects of resistance alleles along with regions of the genome where selection is expected to occur, and compare that to the observed frequency in the wild populations over space and time. Their temporal data clearly demonstrates the pace at which evolution can occur in response to extreme selection. This type of approach is a powerful roadmap for the rest of the field of rapid adaptation.

      The study effectively links specific genetic changes to resistance phenotypes. The identification of sdhB and sdhD mutations as major drivers of cyetpyrafen resistance is well-supported by allele frequency shifts in both field and experimental populations. The scope of their sampling clearly facilitated the remarkable number of observed mutations within these target genes, and the authors provide a careful discussion of the likelihood of these mutations from de novo or standing variation. Furthermore, the discovered cross-resistance that these mutations confer to other mitochondrial complex II inhibitors highlights the potential for broader resistance management and evolution.

      Weaknesses:

      (1) Experimental Evolution:

      - Additional information about the lab experimental evolution would be useful in the main text. Specifically, the dose of cyetpyrafen used should be clarified, especially with respect to the LD50 values. How does it compare to recommended field doses? This is expected to influence the architecture of resistance evolution. What was the sample size? This will help readers contextualize how the experimental design could influence the role of standing variation.

      The experimental design involved sampling approximately 6,000 individuals from the wild population ZJSX1, which were subsequently divided into two parallel cohorts under controlled laboratory conditions. The selection group (LabR) was subjected to continuous selection pressure using cyetpyrafen, while the control group (LabS) was maintained under identical laboratory conditions without exposure to cyetpyrafen. A dynamic selection regime was implemented wherein the acaricide dosage was systematically adjusted every two generations to maintain a consistent selection intensity, achieving a mortality rate of 60% ± 10% in the LabR population. This adaptive dosage strategy ensured sustained evolutionary pressure while preventing population collapse. The LC<sub>50</sub> values were tested at F1, F32, F54, F60, F62, and F66 generations using standardized bioassay protocols to quantify resistance development trajectories and optimize dosage for subsequent selection cycles. We provided the additional information in subsection 4.1 of the materials and methods section.

      - The finding that lab-evolved strains show cross-resistance is interesting, but potentially complicates the story. It would help to know more about the other mitochondrial complex II inhibitors used across China and their impact on adaptive dynamics at these loci, particularly regarding pre-existing resistance alleles. For example, a comparison of usage data from 2013, 2017, and 2019 could help explain whether cyetpyrafen was the main driver of resistance or if previous pesticides played a role. What happened in 2020 that caused such rapid evolution 3 years after launch?

      Although the introduction of the other two SDHI acaricides complicates the story, we would like to provide a complete background on the usage of acaricides with this mode of action in China. Although cyflumetofen was released in 2013 before cyetpyrafen, and cyenopyrafen was released in 2019 after cyetpyrafen, their market share is minor (about 3.2%) compared to cyetpyrafen (about 96.8%, personal communication). Since cross-resistance is reported among SDHIs, we could not exclude the contribution of cyflumetofen to the initial accumulation of resistance alleles, but the effect should be minor, both because of their minimal market share and because of the independent evolution of resistance in the field as found in our study. Although the contribution of cyflumetofen and cyenopyrafen cannot be entirely excluded, the rapid evolution of resistance seems likely to be mainly explained by the intensive application of cyetpyrafen. To clarify this issue, we added relevant information in the first paragraph of the discussion section.

      (2) Evolutionary history of resistance alleles:

      - It would be beneficial to examine the population structure of the sampled populations, especially regarding the role of migration. Though resistance evolution appears to have had minimal impact on genome-wide diversity (as shown in Supplementary Figure 2), could admixture be influencing the results? An explicit multivariate regression framework could help to understand factors influencing diversity across populations, as right now much is left to the readers' visual acuity.

      The genetic structure of the populations was examined by Treemix analysis. We detected only one migration event from JXNC to SHPD (no resistance data available for these two populations), suggesting a limited role for migration to resistance evolution. The multiple regression analysis revealed that overall genetic diversity and Tajima’s D across the genome were not significantly associated with resistance levels, genetic structure or geographic coordinates (P > 0.05), which all support a limited role of migration in resistance development.

      - It is unclear why lab populations were included in the migration/treemix analysis. We might suggest redoing the analysis without including the laboratory populations to reveal biologically plausible patterns of resistance evolution.

      Thank you for the constructive suggestion. The Treemix analysis was redone by removing laboratory populations and is now reported.

      - Can the authors explore isolation by distance (IBD) in the frequency of resistance alleles?

      Thank you for the constructive suggestion. No significant isolation-by-distance pattern was detected for resistance allele frequencies across all surveyed years (2020: P=0.73; 2021: P=0.52; 2023: P=0.16; Mantel test). We added these results to the text.
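
      For readers unfamiliar with the Mantel test mentioned above, the following is a minimal permutation-based sketch. The population positions and allele frequencies are hypothetical, and the one-sided Pearson-based implementation is a generic textbook version, not the software or settings the authors used.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def distance_matrix(values):
    """Pairwise absolute differences between scalar values."""
    return [[abs(a - b) for b in values] for a in values]


def upper_triangle(matrix, order):
    """Flatten the upper triangle after relabelling rows/columns by order."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test: returns (observed r, permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(dist_a)))
    flat_a = upper_triangle(dist_a, identity)
    observed = pearson(flat_a, upper_triangle(dist_b, identity))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel populations in one matrix only
        if pearson(flat_a, upper_triangle(dist_b, order)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical populations along a transect with one resistance allele frequency each.
positions_km = [0.0, 100.0, 200.0, 300.0, 400.0]
allele_freqs = [0.05, 0.10, 0.20, 0.35, 0.60]
r, p = mantel(distance_matrix(positions_km), distance_matrix(allele_freqs))
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

      Whole populations are permuted rather than individual matrix cells because pairwise distances that share a population are not independent, so an ordinary correlation test on the cells would overstate significance.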

      - Given the claim regarding the novelty of the number of pesticide resistance mutations, it is important to acknowledge the evolution of resistance to all pesticides (antibiotics, herbicides, etc.). ALS-inhibiting herbicides have driven remarkable repeatability across species based on numerous SNPs within the target gene.

      We appreciate this comment, which highlights the need to place our findings within the broader evolutionary context of pesticide resistance. We have investigated references relevant to the evolution of resistance to diverse pesticides. As far as we can tell, the 15 target mutations in eight amino acid residues are among the highest number of pesticide resistance mutations detected, especially within the context of animal studies. We have added relevant text to the second paragraph of the discussion.

      - Figure 5 A-B. Why not run a multivariate regression with status at each resistance mutation encoded as a separate predictor? It is interesting that focusing on the predominant mutation gives the strongest r2, but it is somewhat unintuitive and masks some interesting variation among populations.

      We conducted a multiple regression analysis to explore the influence of multiple mutations on resistance levels of field populations. However, of the 15 putative resistance mutations, only five were detected in more than three populations where bioassay data are available, i.e. I260T, I260V, D116G, R119C, R119L. The frequencies of three of these mutations, I260T (P = 0.00128), I260V (P = 0.00423) and D116G (P = 0.00058), are significantly correlated with the resistance level of field populations. This has been added.

      (3) Haplotype Reconstruction (Line 271-):

      - We are a bit sceptical of the methods taken to reconstruct these haplotypes. It seems as though the authors did so with Sanger sequencing (this should be mentioned in the text), focusing only on homozygous SNPs. How many such SNPs were used to reconstruct haplotypes, along what length of sequence? For how many individuals were haplotypes reconstructed? Nonetheless, I appreciated that the authors looked into the extent to which the reconstructed haplotypes could be driven by recombination. Can the authors elaborate on the calculations in line 296? Is that the census population size estimate or effective?

      Because haplotypes could not be determined when two or more loci were heterozygous, we detected haplotypes from sequencing data with at most one heterozygous locus. In total, 844 individuals and 696 individuals were used to detect haplotypes of sdhB and sdhD, respectively. We detected 11 haplotypes (with 8 SNPs) and 24 haplotypes (with 11 SNPs) along 216 bp of the sdhB and 155 bp of the sdhD genes, respectively. Please see the fifth paragraph of subsection 2.4. We used ρ = 4 × Ne × d (genetic distance) (Li and Stephens, 2003) to calculate the number of effective individuals for one recombination event.
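
      One way to read the calculation described above is to rearrange ρ = 4 × Ne × d for Ne at ρ = 1, that is, one expected recombination event across the region. The sketch below follows that reading, which is an assumption on our part; the per-base recombination rate is a placeholder chosen only to make the example run and is not a value from the manuscript.

```python
def effective_size_for_rho(rho: float, genetic_distance: float) -> float:
    """Rearrange rho = 4 * Ne * d to give Ne = rho / (4 * d)."""
    return rho / (4.0 * genetic_distance)


# Placeholder per-base recombination rate (hypothetical, per generation).
RATE_PER_BP = 1e-8
SDHB_REGION_BP = 216  # length of the sdhB region quoted in the response

d_sdhb = RATE_PER_BP * SDHB_REGION_BP           # genetic distance across the region
ne_for_one_event = effective_size_for_rho(1.0, d_sdhb)
print(f"Ne needed for rho = 1 across the region: {ne_for_one_event:,.0f}")
```

      Because Ne scales inversely with d, a short region such as this one requires a very large effective population before recombination among haplotypes becomes likely, which is the direction of the argument made in the response.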

      (4) Single Mutations and Their Effect (line 312-):

      - It's not entirely clear how the breeding scheme resulted in near-isogenic lines. Could the authors provide a clearer explanation of the process and its biological implications?

      To investigate the effect of single mutations or their combination on resistance levels, we isolated the females and males with the same homozygous/hemizygous genotypes for creating homozygous lines. Females from these lines were not near-isogenic, but homozygous for the critical mutations. We revised the description in the methods section to clearly define these lines.

      - If they are indeed isogenic, it's interesting that individual resistance mutations have effects on resistance that vary considerably among lines. Could the authors run a multivariate analysis including all potential resistance SNPs to account for interactions between them? Given the variable effects of the D116G substitution (ranging from 4-25%), could polygenic or epistatic factors be influencing the evolution of resistance?

      We couldn’t conduct multivariate analysis because most lines have only one resistant SNP. The four lines homozygous for 116G were from the same population. The variable mortality may reflect other unknown mechanisms but these are beyond the scope of this study.

      - Why are there some populations that segregate for resistance mutations but have no survival to pesticides (i.e., the green points in Figure 5)? Some discussion of this heterogeneity seems required in the absence of validation of the effects of these particular mutations. Could it be dominance playing a role, or do the authors have some other explanation?

      We didn’t investigate the degree of dominance of each mutation. The mutation I260V shows incompletely dominant inheritance (Sun, et al. 2022). To investigate survival rate of different populations, the two-spotted spider mite T. urticae was exposed to 1000 mg/L of cyetpyrafen, higher than the recommended field dose of 100 mg/L. Such a high concentration may lead to death of an individual heterozygous for certain mutations, such as I260V.

      - The authors mention that all resistance mutations co-localized to the Q-site. Is this where the pesticide binds? This seems like an important point to follow their argument for these being resistance-related.

      Yes. We revised Fig. 3c to show the Q-site.

      (5) Statistical Considerations for Allele Frequency Changes (Figure 3):

      - It might be helpful to use a logistic regression model to assess the rate of allele frequency changes and determine the strength of selection acting on these alleles (e.g., Kreiner et al. 2022; Patel et al. 2024). This approach could refine the interpretation of selection dynamics over time.

      Thank you for this suggestion. A logistic regression model was used to track allele frequency trajectories. The selection coefficient of each allele and their joint effects were estimated.
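      The idea behind such a logistic model can be sketched as follows. This is only an illustration under stated assumptions, not the authors' analysis: it assumes a deterministic haploid selection model, in which the odds of the allele are multiplied by (1 + s) each generation, so that logit(frequency) is linear in time with slope ln(1 + s). T. urticae is haplodiploid, so the haploid model is a simplification. The function names and the synthetic trajectory are hypothetical, and a real analysis would fit a binomial model to allele counts rather than a straight line to logit-transformed frequencies.

```python
import math


def logit(p):
    """Log-odds of an allele frequency p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def fit_selection_coefficient(generations, frequencies):
    """Least-squares fit of logit(frequency) against generation.

    Returns (slope, intercept). Under the assumed haploid model the
    slope equals ln(1 + s), where s is the selection coefficient.
    """
    ys = [logit(p) for p in frequencies]
    n = len(generations)
    mean_t = sum(generations) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in generations)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(generations, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return slope, intercept


def simulate_trajectory(p0, s, n_generations):
    """Synthetic trajectory (illustration only): p' = p(1 + s) / (1 + p*s)."""
    freqs = [p0]
    for _ in range(n_generations):
        p = freqs[-1]
        freqs.append(p * (1 + s) / (1 + p * s))
    return freqs


if __name__ == "__main__":
    # Synthetic example values, not data from the study.
    freqs = simulate_trajectory(p0=0.01, s=0.1, n_generations=60)
    slope, _ = fit_selection_coefficient(list(range(61)), freqs)
    s_hat = math.exp(slope) - 1.0  # convert the logit slope back to s
    print(f"estimated s = {s_hat:.3f}")
```

      Fitting on the logit scale keeps the relationship linear across the whole sweep, which is why a logistic model is convenient for estimating the strength of selection from time-series allele frequencies.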

      Reviewer #2 (Public review):

      Summary:

      This paper investigates the evolution of pesticide resistance in the two-spotted spider mite following the introduction of an SDHI acaricide, cyetpyrafen, in China. The authors make use of cyetpyrafen-naive populations collected before that pesticide was first used, as well as more recent populations (both sensitive and resistant) to conduct comparative population genomics. They report 15 different mutations in the insecticide target site from resistant populations, many reported here for the first time, and look at the mutation and selection processes underlying the evolution of resistance, through GWAS, haplotype mapping, and testing for loss of diversity indicating selective sweeps. None of the target site mutations found in resistant populations was found in pre-exposure populations, suggesting that the mutations may have arisen de novo rather than being present as standing variation, unless initially present at very low frequencies; a de novo origin is also supported by evidence of selective sweeps in some resistant populations. Furthermore, there is no significant evidence of migration of resistant genotypes between the sampled field populations, indicating multiple origins of common mutations. Overall, this indicates a very high mutation rate and a wide range of mutational pathways to resistance for this target site in this pest species. The series of population genomic analyses carried out here, in addition to the evolutionary processes that appear to underlie resistance development in this case, could have implications for the study of resistance evolution more widely.

      Strengths:

      This paper combines phenotypic characterisation with extensive comparative population genomics, made possible by the availability of multiple population samples (each with hundreds of individuals) collected before as well as after the introduction of the pesticide cyetpyrafen, as well as lab-evolved lines. This results in findings of mutation and selection processes that can be related back to the pesticide resistance trait of concern. Large numbers of mites were tested phenotypically to show the levels of resistance present, and the authors also made near-isogenic lines to confirm the phenotypic effects of key mutations. The population genomic analyses consider a range of alternative hypotheses, including mutations arising by de novo mutation or selection from standing genetic variation, and mutations in different populations arising independently or arriving by migration. The claim that mutations most likely arose by multiple repeated de novo mutations is therefore supported by multiple lines of evidence: the direct evidence of none of the mutations being found in over 2000 individuals from naive populations, and the indirect evidence from population genomics showing evidence of selective sweeps but not of significant migration between the sampled populations.

      Weaknesses:

      As acknowledged within the discussion, whilst evidence supports a de novo origin of the resistance-associated mutations, this cannot be proven definitively as mutations may have been present at a very low frequency and therefore not found within the tested pesticide-naive population samples.

      We agree that we could not definitively exclude the presence of a very low incidence of favoured mutations before the introduction of this novel acaricide.

      Near-isofemale lines were made to confirm the resistance levels associated with five of the 15 mutations, but otherwise, the genotype-phenotype associations are correlative, as confirmation by functional genetics was beyond the scope of this study.

      We hope that future functional studies will validate the effects of these mutations on resistance in both the two-spotted spider mite T. urticae and other spider mite species. This could be done by creating near-isogenic female lines or using CRISPR-Cas9 technology, as gene knockouts have recently been established for T. urticae.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Could the authors elaborate on the environmental context (e.g., climate, geography) of the sampled populations to give more nuance to the analysis of genetic differentiation and resistance evolution?

      We have explored the influence of geographic isolation on the frequency of resistance alleles by Mantel tests (isolation by distance). We didn’t investigate the influence of climate, because most of the samples were from greenhouses, where the climate to which the pest is exposed is unclear.

      (2) Line 161: is this supposed to be one R and one S?

      Yes, we added this information (LabR and LabS).

      (3) Line 207: variation is not saturated at the first two sites because the different combinations are not seen. This is a bit misleading.

      What we wanted to indicate was that the two codon positions are saturated, rather than their combinations. We revised this sentence by adding “of each codon position”.

      (4) Line 376: continuous selection did not "result in a new mutation arising". Rather, the mutation arose and was subsequently selected on.

      We revised the expression of this de novo mutation and selection process.

      (5) Line 402: can the authors explore what Ne would be necessary to drive the number of mutational origins they observe, as in (Karasov et al. 2010)?

      It is challenging to estimate Ne, especially when mutation rate data from the two-spotted spider mite T. urticae are unavailable. We observed 2.7 resistant mutations per population in samples collected in 2024, seven years after the release of cyetpyrafen. The estimated mutation rate (Θ) is 0.0193, given 20 generations per year for T. urticae. An effective population size (Ne) of 2.29*10<sup>6</sup> would be necessary to reach the number of de novo mutations observed in this study, given Θ = 3Neμ (haplodiploid sex determination of T. urticae) and a mutation rate of μ = 2.8*10<sup>-9</sup> per base pair per generation as estimated for Drosophila melanogaster (Keightley et al., 2014). The high reproductive capacity of T. urticae (> 100 eggs per female) and short generation time make it easier to reach such a population size in the field, as we now note.
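      The arithmetic in this response can be retraced with a short sketch. One assumption is ours rather than stated in the response: that Θ is obtained by dividing the 2.7 mutations per population by the number of elapsed generations (7 years × 20 generations per year), which does reproduce the quoted 0.0193. The function and parameter names are hypothetical.

```python
def required_ne(mutations_per_population, years, generations_per_year,
                mu, ploidy_factor=3):
    """Return (theta, Ne) from the relationship theta = ploidy_factor * Ne * mu.

    theta is taken here as mutations per population per generation
    (our assumption); ploidy_factor = 3 follows the haplodiploid form
    theta = 3 * Ne * mu quoted in the response.
    """
    theta = mutations_per_population / (years * generations_per_year)
    ne = theta / (ploidy_factor * mu)
    return theta, ne


if __name__ == "__main__":
    # Values quoted in the response; mu is the Drosophila melanogaster estimate.
    theta, ne = required_ne(
        mutations_per_population=2.7,
        years=7,
        generations_per_year=20,
        mu=2.8e-9,
    )
    print(f"theta = {theta:.4f}")
    print(f"Ne    = {ne:,.0f}")
```

      With these inputs the required Ne comes out just under 2.3 × 10<sup>6</sup>, in line with the figure given above. The result scales inversely with μ, so it is sensitive to the borrowed Drosophila mutation rate.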

      (6) Line 482: how did the authors precisely kill 60% of samples with their selection? What was the applied rate? In general, listing the rates of insecticide used in dose response would be useful to decipher if LD50s are projected outside of the doses used (seems like they are). In this case, authors should limit their estimates to those > the highest rate used in the dose response.

      It is difficult to control mortality precisely. We applied cyetpyrafen every two generations but did not determine the LC<sub>50</sub> every two generations. When mortality was lower than 60%, another round of spraying was applied by increasing the dosage of the pesticide. The LC<sub>50</sub> values were tested at F<sub>1</sub>, F<sub>32</sub>, F<sub>54</sub>, F<sub>60</sub>, F<sub>62</sub>, and F<sub>66</sub> generations to establish the trajectories around resistance.

      (7) The light pink genomic region in Figure 2 was distracting. Why is it included if there is no discussion of genomic regions outside the sdh genes? Generally, there was a lot going on in this figure, and some guiding categories (i.e., lab selected vs wild population) on the figure itself could help orient the reader.

      We included chromosome 2, colored in light pink/red, to show the selection signal across a wider genomic region. In the figure legend, we added a description of the lab-selected, field-resistant and field-susceptible populations. Very little common selection signal was detected among resistant populations on chromosome 2, indicating that this region is less likely to be involved in the evolution of resistance to cyetpyrafen in T. urticae. We also described this result briefly in the figure legend.

      Reviewer #2 (Recommendations for the authors):

      (1) The most significant aspect of this study is the use of multiple pest population samples taken before as well as after the introduction of a class of pesticides, allowing a thorough comparative population genomics study in a species where a range of resistance mutations have appeared within a few years. I would prefer to see a title conveying this significance, rather than the current study, which focuses on the total number of mutations and claimed notoriety of the (at that point unnamed) study species. Similarly, I would prefer an abstract that relies less on superlative claims and includes more details: the scientific name of the study species; the number of years in which resistance evolved; the number of historical specimens; how the resistance levels for single mutations were shown.

      (1) The title was changed by adding “the two-spotted spider mite Tetranychus urticae” and removing the “unprecedented number” to emphasize that “recurrent mutations drive rapid evolution”, i.e., “Recurrent Mutations Drive the Rapid Evolution of Pesticide Resistance in the Two-spotted Spider Mite Tetranychus urticae.”

      (2) The scientific name of the study species was added.

      (3) The number of years in which resistance evolved was added.

      (4) The number of historical specimens was added (2666).

      (5) Because we used homozygous lines rather than isogenic or gene-edited lines, our bioassay data could not provide direct evidence on the level of resistance conferred by each mutation. We revised our description of the results and removed this content from the abstract.

      Line 29: if you want to claim the number is unprecedented, please specify the context: unprecedented for a pesticide target in an arthropod pest? (more resistance mutations may have been found in bacteria/fungi...).

      We revised the sentence by adding “in an arthropod pest”.

      Line 30: rather than a claim of notoriety, it may be better to specify what damage this pest causes.

      Revised by describing it as an arthropod pest.

      Line 34: please clarify, was this all in different haplotypes, or were some mutations found in combination?

      Done: We identified 15 target mutations, including six mutations on five amino acid residues of subunit sdhB, and nine mutations on three amino acid residues of subunit sdhD, with as many as five substitutions on one residue.

      (2) The introduction begins by framing the context as resistance evolution in invertebrate pests. However, the evolutionary processes examined in the study are applicable to resistance in other systems, and potentially to other cases of rapid contemporary evolution. The authors could show wider significance for their work beyond the subfield of invertebrate pests by including more of this wider context in their introduction and discussion: even if this means they can no longer claim novelty based on the number of mutations alone, the study is a strong example of the use of population genomics combined with functional and phenotypic characterisation to investigate the evolutionary processes underlying the emergence of resistance, so could have wider importance than within its current framing.

      The background was revised as mentioned above to take this into account.

      For example, in lines 48-50, please clarify what is meant by pesticides here (insects/arthropods? weeds and pathogens too?) In lines 69-73, the opposite is sometimes seen in fungal pathogens, with large numbers of mutations generated in lab-evolved strains.

      We extended pesticides to those targeting arthropods, weeds and pathogens. We still emphasize the situation mainly with respect to arthropod pests.

      (3) Lines 91-93: how many modes of action? How recently were SDHI acaricides introduced?

      Added: at least 11 groups of acaricides based on their modes of action. SDHI was launched in 2007.

      (4) Line 98-102: Use in China is a useful background for the study populations, but the global context should be included too.

      Yes, four SDHI acaricides developed around the globe were introduced.

      (5) Line 113: They show diverse mutations, but all within the mechanism of target-site point mutations.

      We agree with your suggestion. This sentence has been removed as it repeats information stated above it.

      (6) Line 115-116: Yes, agreed; I think this is the main strength of the current study and should be emphasised sooner.

      Thanks.

      (7) Line 158: Selective sweep signals were clear in half of the resistant populations but not in the others. The suggestion that the others had undergone soft sweeps, with multiple mutations increasing in frequency simultaneously but no one reaching fixation, seems reasonable; but the authors could compare the populations that did show a sweep with those that did not (for example, was there greater diversity or evenness of genotypes in those that did not?).

      Five resistant populations with selection signals identified by PBE analysis (Figure 2b) showed corresponding decreases in π and Tajima’s D near the two SDH genes but not across the genome (Figure S1).

      (8) Line 313: please clarify "in combination with other mutations" within a mixed population or combined in one individual/haplotype? Also, the phrase "characterised the function" may be a little misleading, as this is a correlative analysis, not functional confirmation.

      None of the combinations of different resistant mutations was observed in a single haplotype. Here, we examine resistance levels associated with a single mutation or two mutations on sdhB and sdhD in one individual, i.e. sdhB_I260V and sdhD_R119C. We revised the sentences to avoid any implication of functional confirmation.

      (9) Line 358: again, please clarify the context: among arthropod pests?

      Done.

      (10) Line 360-363: please give some background on when and where these related compounds were introduced.

      Added.

      (11) Line 410: yes fitness costs may be a factor, but you could also give an example of a cost expressed in the absence of any pesticides, as well as the given example of negative cross-resistance.

      We added the example of the H258Y mutation which causes both fitness costs and negative cross-resistance.

      (12) Lines 419-438: this is one aspect where the situation for insecticides is in contrast with some other resistance areas.

      Yes, we restricted these statements to arthropod pests.

      (13) Line 466: some more detail could be given here: for example, SNP-specific monitoring would be less effective, but amplicon sequencing would be more suitable.

      Yes, revised.

      (14) Lines 472-475: Please list the numbers of field/lab, pre/post exposure, and sensitive/resistant populations within the main text.

      Done. The number of sensitive/resistant populations was reported in the result section.

      (15) Line 483: randomly selected individuals?

      Yes, added randomly selected individuals.

      (16) Line 556: Sanger sequencing to characterise populations? Or a number of individuals from each population?

      Revised.

      (17) References: there are some duplicate entries, please check this.

      Checked.

      (18) Figure 1e: consider a log(10) scale to better show large fold changes and avoid multiple axis breaks.

      Thanks for your suggestion. However, we did not log-scale the LC<sub>50</sub> values, because we wanted to show the specific impact of 1,000 mg/L. The breaks in the Y axis between 30 mg/L and 1,000 mg/L reveal that the LC<sub>50</sub> values of the resistant populations were all greater than 1,000 mg/L, while those of the susceptible populations were all below 30 mg/L. This justified the use of 1,000 mg/L as a discriminating dose to investigate resistance status and level in subsequent work.

    1. eLife Assessment

      This valuable study reports the first characterization of the CG14545 gene in Drosophila melanogaster, which the authors name "Sakura." Acting during germline stem cell fate and differentiation, Sakura is required for both oogenesis and female fertility, although some mechanistic details require further investigation. This solid study presents a wide-ranging and well-controlled characterization of Sakura, and accordingly the findings and associated reagents described will be of use to scientists interested in oogenesis and early development.

    2. Reviewer #1 (Public review):

      In this manuscript, Azlan et al. identified a novel maternal factor called Sakura that is required for proper oogenesis in Drosophila. They showed that Sakura is specifically expressed in the female germline cells. Consistent with its expression pattern, Sakura functioned autonomously in germline cells to ensure proper oogenesis. In sakura KO flies, germline cells were lost during early oogenesis and often became tumorous before degenerating by apoptosis. In these tumorous germ cells, piRNA production was defective and many transposons were derepressed. Interestingly, Smad signaling, a critical signaling pathway for the GSC maintenance, was abolished in sakura KO germline stem cells, resulting in ectopic expression of Bam in whole germline cells in the tumorous germline. A recent study reported that Bam acts together with the deubiquitinase Otu to stabilize Cyc A. In the absence of sakura, Cyc A was upregulated in tumorous germline cells in the germarium. Furthermore, the authors showed that Sakura co-immunoprecipitated Otu in ovarian extracts. A series of in vitro assays suggested that the Otu (1-339 aa) and Sakura (1-49 aa) are sufficient for their direct interaction. Finally, the authors demonstrated that the loss of otu phenocopies the loss of sakura, supporting their idea that Sakura plays a role in germ cell maintenance and differentiation through interaction with Otu during oogenesis.

      Latest comments:

      The reviewer acknowledges the importance of sharing the observed defects in sakura mutant ovaries and the possible physiological significance of the Sakura-Otu interaction with the research community, as this information could lay the groundwork for future functional analysis research.

    3. Reviewer #2 (Public review):

      In this study, the authors identified CG14545 (named it sakura), as a key gene essential for Drosophila oogenesis. Genetic analyses revealed that Sakura is vital for both oogenesis progression and ultimate female fertility, playing a central role in the renewal and differentiation of germ stem cells (GSC).

      The absence of Sakura disrupts the Dpp/BMP signaling pathway, resulting in abnormal bam gene expression, which impairs GSC differentiation and leads to GSC loss. Additionally, Sakura is critical for maintaining normal levels of piRNAs. Also, the authors convincingly demonstrate that Sakura physically interacts with Otu, identifying the specific domains necessary for this interaction, suggesting a cooperative role in germline regulation. Importantly, the loss of otu produces similar defects to those observed in sakura mutants, highlighting their functional collaboration.

      The authors provide compelling evidence that Sakura is a critical regulator of germ cell fate, maintenance, and differentiation in Drosophila. This regulatory role is mediated through modulation of pMad and Bam expression. However, the phenotypes observed in the germarium appear to stem from reduced pMad levels, which subsequently trigger premature and ectopic expression of Bam. This aberrant Bam expression could lead to increased CycA levels and altered transcriptional regulation, impacting piRNA expression. In this revised manuscript, the authors further investigated whether Sakura affects the function of Orb, a binding partner they identified, in deubiquitinase activity when Orb interacts with Bam.

      This elaborate study will be embraced by both germline-focused scientists and the developmental biology community.

      Latest comments:

      The authors answered all my persistent concerns and made changes according to the recommendations I incorporated for the revised version of the manuscript.

    4. Reviewer #3 (Public review):

      In this very thorough study, the authors characterize the function of a novel Drosophila gene, which they name Sakura. They start with the observation that sakura expression is predicted to be highly enriched in the ovary and they generate an anti-sakura antibody, a line with a GFP-tagged sakura transgene, and a sakura null allele to investigate sakura localization and function directly. They confirm the prediction that it is primarily expressed in the ovary and, specifically, that it is expressed in germ cells, and find that about 2/3 of the mutants lack germ cells completely and the remaining have tumorous ovaries. Further investigation reveals that Sakura is required for piRNA-mediated repression of transposons in germ cells. They also find evidence that sakura is important for germ cell specification during development and germline stem cell maintenance during adulthood. However, despite the role of sakura in maintaining germline stem cells, they find that sakura mutant germ cells also fail to differentiate properly such that mutant germline stem cell clones have an increased number of "GSC-like" cells. They attribute this phenotype to a failure in the repression of Bam by dpp signaling. Lastly, they demonstrate that sakura physically interacts with otu and that sakura and otu mutants have similar germ cell phenotypes. Overall, this study helps to advance the field by providing a characterization of a novel gene that is required for oogenesis. The data are generally high-quality and the new lines and reagents they generated will be useful for the field.

      Latest comments:

      As with my previous assessment, I remain supportive of publication of this manuscript. Though I agree with the other reviewers that additional experimentation would increase the value of this study even further, I feel it will also be a useful contribution to the field as is.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Azlan et al. identified a novel maternal factor called Sakura that is required for proper oogenesis in Drosophila. They showed that Sakura is specifically expressed in the female germline cells. Consistent with its expression pattern, Sakura functioned autonomously in germline cells to ensure proper oogenesis. In sakura KO flies, germline cells were lost during early oogenesis and often became tumorous before degenerating by apoptosis. In these tumorous germ cells, piRNA production was defective and many transposons were derepressed. Interestingly, Smad signaling, a critical signaling pathway for the GSC maintenance, was abolished in sakura KO germline stem cells, resulting in ectopic expression of Bam in whole germline cells in the tumorous germline. A recent study reported that Bam acts together with the deubiquitinase Otu to stabilize Cyc A. In the absence of sakura, Cyc A was upregulated in tumorous germline cells in the germarium. Furthermore, the authors showed that Sakura co-immunoprecipitated Otu in ovarian extracts. A series of in vitro assays suggested that the Otu (1-339 aa) and Sakura (1-49 aa) are sufficient for their direct interaction. Finally, the authors demonstrated that the loss of otu phenocopies the loss of sakura, supporting their idea that Sakura plays a role in germ cell maintenance and differentiation through interaction with Otu during oogenesis.

      Strengths:

      To my knowledge, this is the first characterization of the role of the CG14545 gene. Each experiment seems to be well-designed and adequately controlled.

      Weaknesses:

      However, the conclusions from each experiment are somewhat separate, and the functional relationships between Sakura's functions are not well established. In other words, although the loss of Sakura in the germline causes pleiotropic effects, the cause-and-effect relationships between the individual defects remain unclear.

      Comments on latest version:

      The authors have attempted to address my initial concerns with additional experiments and refutations. Unfortunately, my concerns, especially my specific comments 1-3, remain unaddressed. The present manuscript is descriptive and fails to describe the molecular mechanism by which Sakura exerts its function in the germline. Nevertheless, this reviewer acknowledges that the observed defects in sakura mutant ovaries and the possible physiological significance of the Sakura-Otu interaction are worth sharing with the research community, as they may lay the groundwork for future research in functional analysis.

      We thank the reviewer for valuable comments. We would like to investigate the molecular mechanism by which Sakura exerts its function in the germline in near future studies. 

      Reviewer #2 (Public review):

      In this study, the authors identified CG14545 (named it sakura), as a key gene essential for Drosophila oogenesis. Genetic analyses revealed that Sakura is vital for both oogenesis progression and ultimate female fertility, playing a central role in the renewal and differentiation of germ stem cells (GSC).

      The absence of Sakura disrupts the Dpp/BMP signaling pathway, resulting in abnormal bam gene expression, which impairs GSC differentiation and leads to GSC loss. Additionally, Sakura is critical for maintaining normal levels of piRNAs. Also, the authors convincingly demonstrate that Sakura physically interacts with Otu, identifying the specific domains necessary for this interaction, suggesting a cooperative role in germline regulation. Importantly, the loss of otu produces similar defects to those observed in sakura mutants, highlighting their functional collaboration.

      The authors provide compelling evidence that Sakura is a critical regulator of germ cell fate, maintenance, and differentiation in Drosophila. This regulatory role is mediated through modulation of pMad and Bam expression. However, the phenotypes observed in the germarium appear to stem from reduced pMad levels, which subsequently trigger premature and ectopic expression of Bam. This aberrant Bam expression could lead to increased CycA levels and altered transcriptional regulation, impacting piRNA expression. In this revised manuscript, the authors further investigated whether Sakura affects the function of Orb, a binding partner they identified, in deubiquitinase activity when Orb interacts with Bam.

      We appreciate the authors' efforts to address all our comments. While these revisions have greatly improved the clarity of certain sections, some of the concerns remain unclear, while details mentioned in the responses about these studies should be incorporated in the manuscript. Specifically, the manuscript still lacks the demonstration that Sakura co-localizes with Orb/Bam despite having the means for staining and visualization. This would bring insight into the selective binding of Orb with Bam vs. Sakura perhaps at different stages of oogenesis. Such analyses would allow for more specific conclusions, further alluding to the underlying mechanism, rather than the general observations currently presented.

      This elaborate study will be embraced by both germline-focused scientists and the developmental biology community.

      We thank the reviewer for valuable comments. We believe that the reviewer meant Otu, not Orb, for the binding partner of Sakura that we identified. We would like to investigate the colocalization of Sakura with other proteins, including Otu, and the molecular mechanism by which Sakura exerts its function in the germline in near-future studies.

      Reviewer #3 (Public review):

      In this very thorough study, the authors characterize the function of a novel Drosophila gene, which they name Sakura. They start with the observation that sakura expression is predicted to be highly enriched in the ovary and they generate an anti-sakura antibody, a line with a GFP-tagged sakura transgene, and a sakura null allele to investigate sakura localization and function directly. They confirm the prediction that it is primarily expressed in the ovary and, specifically, that it is expressed in germ cells, and find that about 2/3 of the mutants lack germ cells completely and the remaining have tumorous ovaries. Further investigation reveals that Sakura is required for piRNA-mediated repression of transposons in germ cells. They also find evidence that sakura is important for germ cell specification during development and germline stem cell maintenance during adulthood. However, despite the role of sakura in maintaining germline stem cells, they find that sakura mutant germ cells also fail to differentiate properly such that mutant germline stem cell clones have an increased number of "GSC-like" cells. They attribute this phenotype to a failure in the repression of Bam by dpp signaling. Lastly, they demonstrate that sakura physically interacts with otu and that sakura and otu mutants have similar germ cell phenotypes. Overall, this study helps to advance the field by providing a characterization of a novel gene that is required for oogenesis. The data are generally high-quality and the new lines and reagents they generated will be useful for the field.

      Comments on latest version:

      With these revisions, the authors have addressed my main concerns.

      We thank the reviewer for valuable comments.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The manuscript is much improved based on the changes made upon recommendations from the reviewers.

      Though most of our comments have been addressed, we have a few more we wish to recommend. For previous points we made, we replied with further clarification for the authors.

      Figure 1

      (1) B should be the supplemental figure.

      We moved the former Fig 1B to Supplemental Figure 1.

      • Previous Fig1B (sakura mRNA expression level) is now Fig S2, not S1. Please make this data as Fig S1.

      We moved Fig S1 to main Fig7A and renumbered Fig S2-S16 to Fig S1-S15.

      (2) C - How were the different egg chamber stages selected in the WB? Naming them 'oocytes' is deceiving. Recommend labeling them as 'egg chambers', since an oocyte is claimed to be just the one-cell of that cyst.

      We changed the labeling to egg chambers.

      • The labels on lanes for Stages 12-13 and Stage 14, still only say "chambers", not "egg chambers". Also there is no Stage 1-3 egg chamber. More accurately, the label should be "Germarium - Stage 11 egg chambers".

      We updated the labels on the lanes as suggested by the reviewer.

      (3) Is the antibody not detecting Sakura in IF? There is no mention of this anywhere in the manuscript.

      While our Sakura antibody detects Sakura in IF, it seems to detect some other proteins as well. Since we have a Sakura-EGFP fly strain (which fully rescues sakura null phenotypes) that lets us examine Sakura expression and localization without such non-specific signal issues, we relied on Sakura-EGFP rather than anti-Sakura antibodies for IF.

      • Please put this info into the Methods section.

      We added this info into the Methods section.

      (4) Expand on the reliance of the sakura-EGFP fly line. Does this overexpression cause any phenotypes?

      sakura-EGFP does not cause any phenotypes in the background of sakura[+/+] and sakura[+/-].

      • Please add this detail into the manuscript.

      We added this info into the Methods section.

      Figure 5

      (1) D - It might make more sense if this graph showed % instead of the numbers.

      We did not understand the reviewer's point. We think using numbers, not %, makes more sense.

      • Having a different 'n' number for each experiment does not allow one to compare anything except numbers of the egg chambers. This must be normalized.

      We still do not agree with the reviewer. In Fig 5D, we are showing the number of stage 14 oocytes per fly (= per pair of ovaries). ‘n’ is the number of flies (= number of pairs of ovaries) examined. We have now clarified this in the figure legend. A different ‘n’ for each group does not prevent us from comparing the numbers of stage 14 oocytes per fly. Therefore, we would like to keep the figure as it is now.

      (2) Line 213 - explain why RNAi 2 was chosen when RNAi 1 looks stronger.

      For some reason, the fly stock of RNAi line 2 is much healthier than that of RNAi line 1 (without being driven by Gal4). We were concerned that RNAi line 1 might carry an unwanted genetic background, so we chose RNAi line 2 to avoid such an issue.

      • Please add this information to the manuscript.

      We added this info into the Methods section.

      Figure 7/8 - can go to Supplemental.

      We moved Fig 8 to supplemental. However, we think the Fig 7 data are important and therefore we would like to present them as a main figure.

      • Current Fig S1 should go to Fig 7, to better understand the relationship between pMad and Bam expression.

      We moved Fig S1 to main Fig7A and renumbered Fig S2-S16 to Fig S1-S15.

      Figure 9C - Why the switch to S2 cells? Not able to use the Otu antibody in the IP of ovaries?

      We can use the Otu antibody in the IP of ovaries. However, in the anti-Sakura Western blot after anti-Otu IP, the antibody light chain bands of the Otu antibody overlap with the Sakura band. Therefore, we switched to S2 cells and used an epitope tag to avoid this issue.

      • Please add this info to the Methods section.

      We added this info into the Methods section.

      Figure 10- Some images would be nice here to show that the truncations no longer colocalize.

      We did not understand the reviewer's point. In our study, even for the full-length proteins, we have not shown any colocalization of Sakura and Otu in S2 cells or in ovaries, except that both are enriched in developing oocytes in egg chambers.

      • Based on your binding studies, we would expect them to colocalize in the egg chamber, and since there are antibodies and a GFP-line available, it would be important to demonstrate that via visualization.

      As we wrote in the response and now in the manuscript, our antibodies are not ideal for immunostaining. We will try to optimize the experimental conditions in future studies.

    1. eLife Assessment

      This valuable study identifies a population of CD81-positive fibroblasts showing senescence signatures that can activate neutrophils through the C3/C3aR1 axis, hence contributing to the inflammatory response in periodontitis. Solid evidence, combining in vitro and in vivo analyses and mouse and human data, supports these findings. The revised manuscript has addressed many concerns significantly. The work would be of interest to researchers working in the senescence and oral medicine fields.

    2. Reviewer #2 (Public review):

      Summary:

      The authors report the discovery of a population of gingival fibroblasts displaying the expression of cellular senescence markers P21 and P16 in human periodontitis samples and a murine ligature-induced periodontitis (LIP) model. They support this finding in the murine model through bulk RNA-sequencing and show that differentially expressed genes are significantly enriched in the SenMayo cellular senescence in aging dataset. They then show that LIP mice treated with the senomorphic drug metformin display overall diminished bone damage, reduced histomorphic alterations, and a reduction in P21 and P16 immunostaining signal. To explore the cell types expressing cellular senescence markers in periodontitis, the authors make use of a combination of bioinformatic analyses on publicly available scRNA-seq data, immunostainings on patient samples and their LIP model, and in vitro culture of healthy human gingival fibroblasts treated with LPS. They found that fibroblasts are a cell population expressing P16 in periodontitis which are also enriched for SenMayo genes, suggesting they have a senescent phenotype. They then point to a subgroup of fibroblasts expressing CD81 with the highest enrichment for a SASP geneset in periodontitis. They also show that treatment of LIP mice and human LPS-treated gingival fibroblasts with metformin leads to a reduction of P21 and P16-positive cells, as well as the senescence-associated beta-galactosidase (SA-beta-gal) marker. Finally, they show evidence suggesting that CD81+ senescent fibroblasts are the source of C3 complement protein, which they postulate signals through the C3AR1 receptor present in neutrophils in periodontitis. The authors observed that both CD81+ fibroblast and C3AR1+ neutrophil populations are expanded in periodontitis, that both populations appear to be in close contact, and that treatment with metformin reduced both C3 and the neutrophil marker MPO in their mouse LIP model.

      After a round of revision, the authors have made significant improvements to their manuscript, such as improving the quality of the data/evidence and also included new data from experiments using a well-known senolytic and the senomorphic metformin, which all together provide a solid support to their main claims.

      Strengths:

      The study implements several different techniques and tools on human samples, mouse models, fibroblast cultures, and publicly available data to support its conclusions. In summary, the authors provide solid evidence showing that in the context of periodontitis, there is an expansion of cells expressing the senescence markers P21 and P16, as well as members of the SASP, and that this includes CD81+ fibroblasts.

      Weaknesses:

      The fact that in this study the periodontitis samples belonged to patients with a significantly higher median age (all older than 50 years of age) and the healthy samples belonged to young adults (all younger than 35 years of age), raises the need for caution in interpretation due to a possible effect of aging in the accumulation of CD81+ senescent fibroblasts. However, the recruitment of similar age groups in this case is of course difficult due to the higher prevalence of periodontitis in older adults. In this regard it is important to note that the authors still support their findings using a mouse ligature model. Similar studies comparing healthy and periodontitic patients from similar age groups will be of great importance in the future.

    3. Author response:

      The following is the authors’ response to the original reviews

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      There are four main areas that need further clarification:

      (1) Further and more complete assessment of senescence and the fibroblasts must be done to support the claims. 

      We sincerely appreciate the Reviewing Editor's valuable suggestion regarding the addition of cellular senescence detection markers. In the revised manuscript, we have incorporated additional detection markers for cellular senescence, such as H3K9me3 and SA-β-gal staining, in healthy and periodontitis gingival samples to further validate our findings (Figure 1A, B in revised manuscripts).

      (2) Confusion between ageing and senescence throughout the manuscript.

      We fully understand the concerns raised by the Reviewing Editor and reviewers regarding the confusion between the concepts of ageing and senescence in the manuscript. Cellular senescence is a manifestation of ageing at the cellular level. In the revised manuscript, we have given priority to the term ‘senescence’ to describe the cell condition instead of ‘aging’.

      (3) The lipid metabolism mechanistic claims are very speculative and largely unsupported by experimental data. 

      We greatly appreciate the Reviewing Editor and reviewers for pointing out the incorrect statements regarding the role of lipid metabolism in regulating cellular senescence. Since the mechanism by which cellular metabolism regulates cellular senescence is not the core focus of this manuscript, we have moved the results of the metabolic analysis from the sc-RNA sequencing data to the figure supplement (Figure 4-figure supplement 1) and revised the related statements in the revised manuscript (Page 7-8, Line 186-194).

      (4) Concerns about the use of Metformin as a senotherapy vs other pleiotropic effects in periodontitis and the suggestion of using an alternative Senolytic drug (Bcl2 inhibitors, etc.). 

      We fully understand the concerns of the Reviewing Editor and reviewers regarding metformin as an anti-aging therapy. In the revised manuscript, we have included additional experiments using a senolytic drug, the Bcl2 inhibitor ABT-263, in the ligature-induced periodontitis mouse model. The corresponding results can be found in Figure 6 and on Page 9-10, Line 248-264 of the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      While most of the experiments are elegantly designed and the procedures well conducted there are several critical weaknesses that temper my enthusiasm for this solid and timely work. Considering my main points, I would recommend the following:

      (1) Potentiate the senescent assessment in vitro and, most importantly, in vivo. E.g. SABgal with fresh tissue, other senescent biomarkers like SAHFs (HP1g or H3K9me3), etc.

      We sincerely appreciate the reviewers' suggestion to potentiate the assessment of cellular senescence. In the revised manuscript, we performed SA-β-gal staining on fresh frozen samples, revealing a significantly higher number of SA-β-gal positive cells in periodontitis gingival tissue, particularly in the lamina propria, while few SA-β-gal positive cells were observed in healthy gingival tissue (Figure 1A). Additionally, we assessed the protein level changes of H3K9me3, a marker of senescence-associated heterochromatin foci (SAHF), in gingival tissues from healthy individuals and periodontitis patients. The results showed a notable increase in the number of H3K9me3 positive cells in periodontitis tissues, approximately double that found in healthy gingiva (Figure 1B). This trend aligns with our previous findings of elevated p16 and p21 levels. Collectively, these results further confirm that periodontitis gingival tissues contain a greater number of senescent cells compared to healthy gingiva.

      (2) Claims on disturbances in lipid metabolism as a driver of CD81+ fibroblast senescence require appropriate functional/mechanistic validations and experiments of metabolism rewiring.

      We sincerely appreciate the reviewers' suggestion for more experimental evidence regarding the role of lipid metabolism in driving CD81+ fibroblast senescence. The influence and mechanisms of lipid metabolism on cellular senescence is a complex and important scientific issue, but it is not the central focus of this manuscript. Therefore, to avoid causing confusion for the reviewers and readers, we have moved the metabolism analysis to Figure 4-figure supplement 1 and revised the presentation of the relevant results in the revised manuscript to ensure a more rigorous interpretation of our findings (Page 7-8, Line 186-194).

      (3) Do LPS-stimulated HGFS implementing the senescent programme secrete C3? Detection of complement C3 at the protein level (e.g. by ELISA) would reinforce the proposed mechanism.

      This is indeed a very interesting question. In response to the reviewers' suggestion, we measured the levels of C3 protein, one of the markers of the senescence-associated secretory phenotype (SASP), secreted by human gingival fibroblasts induced by Pg-LPS. The results indicated that, compared to untreated fibroblasts, those induced by Pg-LPS exhibited significantly higher levels of C3 secretion, approximately 1.5 times that of the control group (Figure 5G). Additionally, we found that primary gingival fibroblasts derived from periodontitis tissues secreted more complement C3 than those derived from healthy tissues (Figure 5F). These findings suggest that the increased secretion of complement C3 by gingival fibroblasts in periodontitis tissues may be related to Pg-LPS-induced cellular senescence.

      (4) The mechanism of Metformin to impair senescence and/or the SASP is not fully validated and Metformin can produce other pleiotropic effects. A key experiment (including therapeutic implications) is using a senolytic drug (e.g. Navitoclax) to causally connect the eradication of senescent CD81+ fibroblasts with the recruitment of neutrophils. If the hypothesis of the authors is correct, this approach should result in reduced levels of gingival CD81 and C3 positivity, prevention of neutrophil infiltration (reduced MPO positivity), and amelioration of bone damage in ligation-induced periodontitis murine models.

      We fully understand the reviewers' concerns regarding the role of metformin in alleviating cellular senescence and the possibility of it acting through non-senescent pathways. To clarify the role of cellular senescence in the recruitment of neutrophils by CD81+ fibroblasts through C3 in periodontitis, we treated a ligature-induced periodontitis mouse model with ABT-263, also known as Navitoclax. The results showed that after ABT-263 treatment, the number of p16-positive or H3K9me3-positive senescent cells in the periodontitis mice significantly decreased. Additionally, we observed reductions, to varying degrees, in the number of CD81+ fibroblasts, C3 protein levels, neutrophil infiltration, and osteoclasts in the LIP model after ABT-263 treatment (Figure 6). These results further support our hypothesis that the eradication of senescent CD81+ fibroblasts could reduce neutrophil infiltration and alveolar bone resorption.

      (5) Have the authors considered using any of the available C3/C3aR inhibitors to validate the involvement of neutrophils and the inflammatory response in periodontitis? A C3/C3aR inhibitor would be an elegant treatment group in parallel with the senolytic approach.

      Thank you very much for the reviewers' suggestion to investigate neutrophil infiltration and inflammatory responses after treating periodontitis with C3/C3aR inhibitors. In a clinical study by Hasturk et al. in 2021 (Reference 1), the C3 inhibitor AMY-101 effectively alleviated gingival inflammation in periodontitis patients. This was reflected in significant decreases in clinical indicators such as the modified gingival index and bleeding on probing, as well as a marked reduction in inflammatory tissue destruction markers, including MMP-8 and MMP-9. In addition, Maekawa et al. (Reference 2) demonstrated that a peptide inhibitor of complement C3 effectively reduced inflammation levels and the extent of bone resorption in periodontitis. Moreover, research by Guglietta et al. (Reference 3) clarified that complement C3 promotes neutrophil recruitment and the formation of neutrophil extracellular traps (NETs) via C3aR, and NETs are considered key pathological factors in causing sustained chronic inflammation in periodontitis (References 4 and 5). In summary, existing studies indicate that C3/C3aR inhibitors likely reduce neutrophil recruitment and inflammation in periodontitis.

      Reference

      (1) Hasturk, H., Hajishengallis, G., Forsyth Institute Center for Clinical and Translational Research staff, Lambris, J. D., Mastellos, D. C., & Yancopoulou, D. (2021). Phase IIa clinical trial of complement C3 inhibitor AMY-101 in adults with periodontal inflammation. The Journal of clinical investigation, 131(23), e152973.

      (2) Maekawa, T., Briones, R. A., Resuello, R. R., Tuplano, J. V., Hajishengallis, E., Kajikawa, T., Koutsogiannaki, S., Garcia, C. A., Ricklin, D., Lambris, J. D., & Hajishengallis, G. (2016). Inhibition of pre-existing natural periodontitis in non-human primates by a locally administered peptide inhibitor of complement C3. Journal of clinical periodontology, 43(3), 238–249.

      (3) Guglietta, S., Chiavelli, A., Zagato, E., Krieg, C., Gandini, S., Ravenda, P. S., Bazolli, B., Lu, B., Penna, G., & Rescigno, M. (2016). Coagulation induced by C3aR-dependent NETosis drives protumorigenic neutrophils during small intestinal tumorigenesis. Nature communications, 7, 11037.

      (4) Kim, T. S., Silva, L. M., Theofilou, V. I., Greenwell-Wild, T., Li, L., Williams, D. W., Ikeuchi, T., Brenchley, L., NIDCD/NIDCR Genomics and Computational Biology Core, Bugge, T. H., Diaz, P. I., Kaplan, M. J., Carmona-Rivera, C., & Moutsopoulos, N. M. (2023). Neutrophil extracellular traps and extracellular histones potentiate IL-17 inflammation in periodontitis. The Journal of experimental medicine, 220(9), e20221751.

      (5) Silva, L. M., Doyle, A. D., Greenwell-Wild, T., Dutzan, N., Tran, C. L., Abusleme, L., Juang, L. J., Leung, J., Chun, E. M., Lum, A. G., Agler, C. S., Zuazo, C. E., Sibree, M., Jani, P., Kram, V., Martin, D., Moss, K., Lionakis, M. S., Castellino, F. J., Kastrup, C. J., … Moutsopoulos, N. M. (2021). Fibrin is a critical regulator of neutrophil effector function at the oral mucosal barrier. Science (New York, N.Y.), 374(6575), eabl5450.

      Other comments

      (1) Figure 1. The authors report upregulation of the aging pathway in bulk RNAseq analyses. What about the upregulation of senescence-related pathways and differential expression of SASP-related genes in this experiment?

      Thanks for this interesting question. Through further analysis of the bulk RNA sequencing results of gingival tissues from the LIP mouse model, we found significant alterations in multiple senescence-associated secretory phenotype (SASP) genes and several cellular senescence-related pathways. SASP genes, such as Icam1, Mmp3, Nos3, Igfbp7, Igfbp4, Mmp14, Timp1, Ngf, Il6, Areg, and Vegfa, were markedly upregulated in the periodontitis samples of ligature-induced mice (Figure 1-figure supplement 2A). Moreover, we observed a significant reduction in oxidative phosphorylation levels and the tricarboxylic acid (TCA) cycle in the periodontitis group, suggesting that the occurrence of cellular senescence may be related to mitochondrial dysfunction (Figure 1-figure supplement 2B and C).

      Additionally, we noted the activation of the PI3K-AKT and MAPK pathways in the LIP model (Figure 1-figure supplement 2D and E), both of which can induce cellular senescence by activating the tumor suppressor pathway TP53/CDKN1A, leading to cell cycle arrest (References 1, 2). Furthermore, the NF-κB signaling pathway was also significantly enriched in the LIP model (Figure 1-figure supplement 2F), which is closely associated with the secretion of SASP factors (Reference 3).

      In summary, our bulk RNA sequencing results suggest enrichment of cellular senescence-related pathways in the periodontitis group, including mitochondrial metabolic dysregulation, senescence-related pathways, and alterations in the SASP. Related results were added into Page 56 of the revised manuscript.

      Reference

      (1) Tang Q, Markby GR, MacNair AJ, Tang K, Tkacz M, Parys M, Phadwal K, MacRae VE, Corcoran BM. TGF-β-induced PI3K/AKT/mTOR pathway controls myofibroblast differentiation and secretory phenotype of valvular interstitial cells through the modulation of cellular senescence in a naturally occurring in vitro canine model of myxomatous mitral valve disease. Cell Prolif. 2023 Jun;56(6):e13435. doi: 10.1111/cpr.13435.

      (2) Sayegh S, Fantecelle CH, Laphanuwat P, Subramanian P, Rustin MHA, Gomes DCO, Akbar AN, Chambers ES. Vitamin D3 inhibits p38 MAPK and senescence-associated inflammatory mediator secretion by senescent fibroblasts that impacts immune responses during ageing. Aging Cell. 2024 Apr;23(4):e14093.

      (3) Raynard C, Ma X, Huna A, Tessier N, Massemin A, Zhu K, Flaman JM, Moulin F, Goehrig D, Medard JJ, Vindrieux D, Treilleux I, Hernandez-Vargas H, Ducreux S, Martin N, Bernard D. NF-κB-dependent secretome of senescent cells can trigger neuroendocrine transdifferentiation of breast cancer cells. Aging Cell. 2022 Jul;21(7):e13632.

      (2) I wonder whether the authors could clarify how the semi quantifications for p21, p16, Masson's trichrome, C3, or MPO were done in Figures 1, 2, and 6.

      Thank you very much for the reviewer's suggestion. We have added the semi-quantitative methods for p21, p16, Masson's trichrome, C3, and MPO in the Methods section. Specifically, for semi-quantification of protein expressions, the mean optical density (MOD) of positive stains for p21, p16, and C3 was measured using the ImageJ2 software (version 2.14.0, National Institutes of Health, Bethesda, MD). The number of MPO-positive cells and collagen volume fractions (stained blue) for individual sections were also measured using the ImageJ2 software. (Page 19, Line 537-541 in the revised manuscripts).  
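
      As a rough illustration of the mean optical density measurement described above, the following Python sketch converts 8-bit grayscale intensities to optical density and averages over the positively stained pixels. The conversion OD = log10(255 / intensity), the threshold of 0.2, and the function name `mean_optical_density` are assumptions made for illustration only; the response does not state which ImageJ2 settings were used.

```python
import math


def mean_optical_density(pixels, od_threshold=0.2):
    """Return the mean optical density (OD) of positively stained pixels.

    pixels: iterable of 8-bit grayscale intensities (0-255).
    od_threshold: hypothetical cutoff separating stain from background.
    """
    ods = []
    for value in pixels:
        # Clip to [1, 255] so a fully black pixel cannot cause division by zero.
        clipped = min(max(float(value), 1.0), 255.0)
        od = math.log10(255.0 / clipped)
        if od > od_threshold:
            ods.append(od)
    # A field with no stained pixels is reported as zero.
    return sum(ods) / len(ods) if ods else 0.0


# Toy field: two background pixels (255) and two stained pixels (25.5).
field = [255.0, 25.5, 25.5, 255.0]
print(mean_optical_density(field))
```

      Averaging only over thresholded pixels keeps unstained background from diluting the value; whether the authors applied a comparable threshold is not stated.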

      (3) Figure 2. It is unclear whether N=6 refers to 6 mice, maxilla, or fields per group.

      Thank you very much for the reviewer's question. To avoid any misunderstandings for the reviewer and readers, we have added a definition of the sample size in the description of the micro-CT analysis method. Specifically, in the micro-CT quantitative analysis, the sample size n for each group consists of 6 mice, with the average value of the BV/TV of the bilateral maxillary alveolar bone taken as one sample for statistical analysis (Page 17-18, Line 488-490 in the revised manuscripts).  
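
      The sample definition above (one value per mouse, obtained by averaging the bilateral BV/TV readings) can be sketched in Python as follows. The mouse identifiers and BV/TV fractions are hypothetical placeholders, not data from the study, and `bv_tv_per_mouse` is an illustrative name.

```python
from statistics import mean


def bv_tv_per_mouse(readings):
    """Average left and right BV/TV so that each mouse contributes one value."""
    return {mouse: mean(sides) for mouse, sides in readings.items()}


# Hypothetical BV/TV fractions (left, right) for three mice in one group.
group = {
    "mouse_1": (0.40, 0.60),
    "mouse_2": (0.35, 0.45),
    "mouse_3": (0.50, 0.50),
}
samples = bv_tv_per_mouse(group)
print(len(samples))  # the statistical n counts mice, not maxilla sides
```

      Collapsing the two sides before testing avoids treating left and right measurements from the same animal as independent samples.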

      (4) Figure 4K. Please provide separated staining for p16, VIM, and CD81, and not only the Merge. It is difficult to identify the triple-positive cells. Also, the arrows are difficult to observe.

      Thank you very much for the reviewer's suggestion. In the revised manuscript, we have included separated staining for p16, VIM, and CD81, and the triple-positive cells are indicated with white arrows (Figure 5-figure supplement 1). 

      (5) Overall, improve the magnifications in the IF experiments and show where the magnified areas come from.

      Thank you very much for the reviewer's suggestion. We have enlarged the fluorescence result images.

      (6) Refer to the original datasets of the scRNAseq results in figure legends.

      Thank you very much for the reviewer's suggestion. We have indicated the source of the raw single-cell sequencing data in the figure legend.

      (7) Check English grammar and writing.

      Thank you for the reviewer's suggestion. We checked the grammar and writing in the revised manuscript with the assistance of a native English speaker and AI tools such as ChatGPT.

      Reviewer #2 (Recommendations For The Authors):

      (1) When the authors refer to accelerated aging and/or senescence, they are doing so in comparison to what?

      Thank you for the reviewer's question, which allows us to further clarify the concepts of accelerated aging and/or senescence. In section 2.1 and Figure 1 of this manuscript, we referred to accelerated aging and/or senescence. This indicates that the gingival tissues of periodontitis patients exhibit a higher number of senescent cells and elevated levels of senescence-related markers compared to healthy gingival tissues. In the title of this manuscript, we describe CD81+ fibroblasts as a unique subpopulation with accelerated cellular senescence. This means that CD81+ fibroblasts display higher expression levels of senescence-related genes, the cell cycle inhibitor p16, and SASP factors compared to other fibroblast subpopulations. To avoid any misunderstanding, we have deleted the text ‘accelerated senescence’ in the revised manuscript.

      (2) In general, the main text does not describe the results using exact and reproducible terminology. Phrases like "X was most active", "a significant increase was observed", "the highest proportion was", and "the level of aging increased" should be supported by adding quantification details and by detailing what these comparisons are made to, to improve the reproducibility of the results.

      Thank you for the reviewer's suggestion. To improve the reproducibility of the results, we have added quantification details in the results section and clarified what comparisons are being made through the whole manuscript.

      (3) In some sections of the main text and figure legends, it is not entirely clear which sequencing experiments were conducted by the authors, which analyses were conducted by the authors on publicly available sequencing data, and which analyses were conducted on their mouse sequencing data.

      Thank you for the valuable feedback from the reviewer. To further clarify the source of the sequencing data, we have clearly indicated the data source in both the results section and the figure legends. 

      (4) In Figure 3H, the images showing SA-beta-gal staining on LPS-treated fibroblasts do not show convincingly the difference between treatments that are represented in the graph.

      Thank you for the reviewer's suggestion. To show the differences between treatments more clearly, we have enlarged part of the SA-β-gal staining image, as shown in Figure 2-figure supplement 2 of the revised manuscript.

      (5) The choice of colors for Figure 4K is far from ideal as it is very difficult to tell apart red from purple channels and thus to visualize triple positive cells. A different LUT should be chosen, and separate individual channels should be shown to clearly identify triple-positive cells from others. Arrows also do not currently point at triple-positive cells.

      Thank you for the reviewer's suggestion. In the revised manuscript, we have included separated staining for p16, VIM, and CD81, and the triple-positive cells are marked with white arrows shown in Figure 5-figure supplement 1 of the revised manuscripts.  

      (6) The authors state that treatment with metformin "alleviated.... inflammatory cell infiltration (Figure 2C), and collagen degradation (Figure 2D) as observed through H&E and Masson staining." However, I cannot find a description of how the "relative fraction of collagen" in Figure 2Gc was calculated and how the H&E image they provide shows evidence of a reduction in inflammatory cells at that magnification.

      Thank you for the reviewer's suggestion. In the revised manuscript, we have added details in the methods section regarding the calculation of the "relative fraction of collagen" (Page 19, Line 539-541). Specifically, the collagen volume fractions (stained blue) for individual sections were measured using ImageJ2 software. Additionally, we have marked the infiltrating inflammatory cells in the gingiva in the H&E images with black arrows shown in Figure 7-figure supplement 1B of the revised manuscripts.

      (7) It appears that the in vivo experiment for metformin treatment was conducted with 6 animals per group, but this is not clear in the figures, main text, and methods.

      Thank you for the reviewer's suggestion. In the revised manuscript, we have included the number of mice in each group for the in vivo experiments, specifying that there are 6 mice per group in the figures, main text, and methods sections.

      (8) The methodology described for the bulk RNA-sequencing experiment in mice should describe the sequencing library characteristics and some reference to quality control thresholds that were implemented (mapped and aligned reads, sequencing depth and coverage, etc.).

      In the bulk RNA-sequencing experiment, the sequencing library characteristics and quality control thresholds were listed as follows:

      Sequencing Library Characteristics: We utilized the Illumina TruSeq RNA Library Construction Kit, generating libraries with an insert fragment length of approximately 400-500 bp.

      Quality Control Standards include the following:

      Alignment and Mapping Rates: The read data for all samples underwent preliminary quality control using FastQC (v0.11.9) and were aligned using HISAT2 (v2.2.1). The average mapping rate for each sample was over 90%.

      Sequencing Depth and Coverage: Each sample had a sequencing depth of 30M-40M paired reads to ensure sufficient transcript coverage. Detailed alignment statistics have been provided in the supplementary materials.

      Other Quality Control Measures: During the analysis, we also utilized RSeQC (v3.0.1) to evaluate the transcript coverage and GC bias of the sequencing data.

      The corresponding method description and reference were added on Page 19-20, Line 546-558 of the revised manuscript.
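
      For readers who want to apply similar checks, the figures reported above can be turned into a simple per-sample screen, sketched here in Python. Treating the reported mapping rate (over 90%) and sequencing depth (30M-40M paired reads) as pass/fail cutoffs is an assumption of this sketch, as are the names `passes_qc` and `failing_samples` and the example statistics; the response reports these numbers as observed values, not as enforced thresholds.

```python
MIN_MAPPING_RATE = 0.90        # assumed cutoff, taken from the reported ">90%" mapping rate
MIN_PAIRED_READS = 30_000_000  # assumed cutoff, taken from the reported 30M-40M depth


def passes_qc(mapping_rate, paired_reads):
    """Return True if a sample meets both assumed cutoffs."""
    return mapping_rate >= MIN_MAPPING_RATE and paired_reads >= MIN_PAIRED_READS


def failing_samples(stats):
    """stats maps sample name -> (mapping_rate, paired_reads)."""
    return sorted(
        name for name, (rate, reads) in stats.items() if not passes_qc(rate, reads)
    )


# Hypothetical alignment statistics for three libraries.
example_stats = {
    "control_1": (0.95, 34_000_000),
    "lip_1": (0.88, 36_000_000),  # mapping rate below the assumed cutoff
    "lip_2": (0.93, 21_000_000),  # depth below the assumed cutoff
}
print(failing_samples(example_stats))
```

      In practice the inputs would come from the aligner's summary output; parsing those files is omitted here because their exact layout depends on the tool version.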

      (9) Patients with periodontitis are labeled as diagnosed with "chronic periodontitis". I would like to know how the authors defined this chronic state of the disease in their inclusion criteria.

      Thank you very much for the reviewer’s question, which gives us the opportunity to further clarify the definition and diagnosis of chronic periodontitis. The diagnostic criteria for patients with chronic periodontitis in this study are based on the 1999 International Workshop for a Classification of Periodontal Diseases and Conditions (Reference 1). Chronic periodontitis is a type of periodontal disease distinct from aggressive periodontitis, and it is not diagnosed based on the rate of disease progression. Clinically, the diagnosis of chronic periodontitis is primarily based on clinical attachment loss (CAL) ≥ 4 mm or probing depth (PD) ≥ 5 mm.

      Reference

      (1) Armitage G. C. (2000). Development of a classification system for periodontal diseases and conditions. Northwest dentistry, 79(6), 31–35.

      (10) There is no detail about the age and sex of the donors for the healthy gingival fibroblast experiments. Are they some of the patients mentioned in Supplementary Table 1? Please clarify the source and number of independent primary cultures.

      We thank the reviewer for the opportunity to clarify the source and number of independent primary cultures. In the cell experiments, we used gingival fibroblasts derived from the gingival tissue of two healthy volunteers and two patients with periodontitis. This information is listed in Supplementary Table 1.

      (11) Can the authors explain why their age inclusion criteria were different for the healthy and periodontitis groups according to their methods (healthy 18-50 years old: periodontitis 18-35 years old?)

      We thank the reviewer for pointing this out. We noticed that there was an error in the age range indicated for the healthy and periodontitis groups in the inclusion criteria. Based on the original inclusion criteria, we have corrected the age range: individuals aged 18-65 years were included in both the healthy and periodontitis groups (Pages 14-15, Lines 396-404 of the revised manuscript).

      (12) The methodology for inclusion is confusing and does not reflect the actual information of the recruited patients and samples thus analyzed. In the text, the healthy group appears to have included 8 young adult individuals and 8 middle-aged individuals. However, the list of recruited patients shows all healthy patients were in the young adult range (below 35 years of age) while all chronic periodontitis patients were middle-aged (above 50 years of age). Please clarify.

      We thank the reviewer for pointing out these issues. This study included 8 periodontally healthy individuals and 8 patients with periodontitis (Page 14, Lines 396-398 and Supplementary Table 1 of the revised manuscript). Since periodontitis has a higher prevalence in middle-aged and elderly populations, the periodontitis samples included in this study were mostly from this demographic. In contrast, the healthy gingival samples were sourced from patients undergoing wisdom tooth extraction, who are primarily younger individuals. Therefore, given the limited sample size, we could not enforce strict age matching. To address this, we repeated the relevant experiments in more consistent mouse models, which confirmed the increase in senescent cells in periodontal tissues (Figure 1D of the revised manuscript). In summary, although the clinical samples were limited, the experimental results from the mouse models still support our conclusions.

      (13) The number of biological replicates for each group used in the bulk RNA-sequencing experiment is unclear. The methods state:" For those with biological duplication, we used DESeq2 [8] (version: 1.34.0) to screen differentially expressed gene sets between two biological conditions; for those without biological duplication, we used edgeR". Please clarify the number of mouse samples sequenced and the description of the groups.

      We thank the reviewer for pointing out this error. For transcriptome sequencing, we collected gingival tissues from 3 healthy mice and from 3 mice with ligature-induced periodontitis. Because biological replicates were available, we used DESeq2 (version 1.34.0) to identify differentially expressed genes. The corresponding description was revised on Page 20, Lines 554-555 of the revised manuscript.

      (14) Cluster group labels are misaligned in Figure 4C.

      Thank you very much for the reviewer's suggestion. The cluster group labels in Figure 3C of the revised manuscripts have been aligned.

      Reviewer #3 (Recommendations For The Authors):

      Major Comments for the Authors:

      (1) I do not find the immunohistochemical staining of p16 and p21 shown in Figures 2E and F to be particularly compelling. Especially as other stains of these markers used later in the manuscript are of higher quality (i.e. Figures 3F and G). Can this staining be improved to better reflect the quantifications in Figure 2G?

      Thank you very much for the reviewer's suggestion. In the revised manuscript, we have provided more representative images in Figure 7C to reflect the effect of metformin treatment on the number of p16-positive cells in periodontitis. In Figure 7-figure supplement 1D, we have marked p21-positive cells with black arrows to help readers identify them. Additionally, we have assessed the H3K9me3 marker, which is more specific, and the results similarly indicate that metformin treatment can alleviate the formation of senescent cells in periodontitis (Figure 7-figure supplement 1E of the revised manuscript).

      (2) On line 140, Supplementary Figure 2C, D is quoted to show "...an increase in senescence characteristics of fibroblasts with the severity of periodontitis." This figure panel does not appear to support this statement. Please revise.

      Thank you very much for pointing out this error. In the revised version, we have corrected this description to read: “The results showed a decline in fibroblast proportion along with increasing disease severity (Figure 2-figure supplement 1C and D)” (Page 6, Lines 153-154 of the revised manuscript).

      (3) I do not find the Western Blot experiment in Figure 4L to be particularly convincing. The text states that p21, p16, and CD81 increase in a context-dependent manner upon LPS stimulation, which doesn't appear to be very evident. I recommend repeating this experiment and showing both a representative blot alongside a blot density quantification where the bars have the error shown between experiments.

      Thank you very much for the reviewer’s suggestion regarding this result. During subsequent repeated experiments, we found that the result was not reproducible, and we have removed the related results.

      (4) The results state that metabolic profiling of senescent fibroblasts shows an increase in the biosynthesis of Linoleic acid, linolenic acid, arachidonic acid, and steroid. However, in Figure 5B only arachidonic acid and steroid biosynthesis appear to be elevated in CD81+ Fibroblasts, while Linoleic and linolenic acid appear to be decreased. Can the authors comment on this discrepancy? Moreover, in Figure 5C steroid biosynthesis is unchanged between healthy and periodontitis samples, contrary to the claimed increased trend in the results text. Please revise this section. Also, in Figures 5 B and C some of the terms are highlighted in a red or blue box. This is not discussed in the figure legend. Could the significance of this be explained or could these highlights be removed from the figure?

      Thank you very much for the reviewer’s correction. On Pages 7-8, Lines 186-194 of the revised manuscript, the statement was rewritten as “Pathways related to fatty acid biosynthesis, arachidonic acid metabolism, and steroid biosynthesis were significantly upregulated in CD81+ fibroblasts (Figure 4-figure supplement 1A)”. Moreover, we have removed the results from Figure 5C and the highlights in Figures 5B and C of the previous manuscript. Since the mechanism by which cellular metabolism regulates cellular senescence is not the core focus of this manuscript, we have moved the results of the metabolic analysis of the scRNA-sequencing data to the figure supplement (Figure 4-figure supplement 1) and revised the related statements accordingly.

      (5) The authors state that arachidonic acid can be converted to prostaglandins and leukotrienes through COXs (which are expressed in their CD81+ Fibroblasts), accentuating inflammatory responses. Have the authors profiled for the expression of prostaglandins and leukotrienes in their CD81+ Fibroblasts or between healthy and periodontitis samples? Such data would be a great inclusion in the manuscript.

      Thank you very much for the reviewer’s suggestion. Our results indicated that CD81+ gingival fibroblasts expressed higher levels of PTGS1 and PTGS2 than other fibroblast subpopulations. These genes encode COX-1 and COX-2, which are key enzymes in prostaglandin biosynthesis (Figure 4-figure supplement 1 of the revised manuscript). Additionally, previous studies have reported high levels of prostaglandins and leukotrienes in periodontal tissues, and these pro-inflammatory mediators contribute to tissue destruction in periodontitis (References 1 and 2).

      Reference

      (1) Van Dyke, T. E., & Serhan, C. N. (2003). Resolution of inflammation: a new paradigm for the pathogenesis of periodontal diseases. Journal of dental research, 82(2), 82–90.

      (2) Hikiji, H., Takato, T., Shimizu, T., & Ishii, S. (2008). The roles of prostanoids, leukotrienes, and platelet-activating factor in bone metabolism and disease. Progress in lipid research, 47(2), 107–126.

      (6) Lines 199 and 200 state "...the cellular senescence of CD81+ fibroblasts could be attributed to disturbances in lipid metabolism". While altered lipid metabolic profiles are shown in Figure 5 to correlate with senescent fibroblasts/periodontitis tissue, no evidence is shown to suggest that they are the driver or cause of fibroblast senescence. Could this sentence be amended to better reflect the conclusions that can be drawn from the data presented?

      Thank you very much for the reviewer’s suggestion. We have revised the related statement to say that “lipid metabolism might play a role in cellular senescence of the gingival fibroblasts” (Page 7, Line 189 of the revised manuscript).

      Minor Comments for the Authors:

      (1) There are some sentences without references that I feel would warrant referencing: - Line 112 - "Metformin, an anti-aging drug has shown potential in inhibiting cell senescence in various disease models (REFERENCE)."

      Thank you for the reviewer's suggestion. We have included the relevant references on Page 10, Lines 267-271 of the revised manuscript.

      Reference

      (1) Soukas, A. A., Hao, H., & Wu, L. (2019). Metformin as Anti-Aging Therapy: Is It for Everyone?. Trends in endocrinology and metabolism: TEM, 30(10), 745–755.

      (2) Kodali, M., Attaluri, S., Madhu, L. N., Shuai, B., Upadhya, R., Gonzalez, J. J., Rao, X., & Shetty, A. K. (2021). Metformin treatment in late middle age improves cognitive function with alleviation of microglial activation and enhancement of autophagy in the hippocampus. Aging cell, 20(2), e13277.

      - Line 210 - "Previous studies have demonstrated the importance of sustained neutrophil infiltration in the progression of periodontitis (REFERENCE)."

      Thank you for the reviewer's suggestion. We have included the relevant references on Page 8, Lines 211-214 of the revised manuscript.

      Reference

      (1) Song, J., Zhang, Y., Bai, Y., Sun, X., Lu, Y., Guo, Y., He, Y., Gao, M., Chi, X., Heng, B. C., Zhang, X., Li, W., Xu, M., Wei, Y., You, F., Zhang, X., Lu, D., & Deng, X. (2023). The Deubiquitinase OTUD1 Suppresses Secretory Neutrophil Polarization And Ameliorates Immunopathology of Periodontitis. Advanced science (Weinheim, Baden-Wurttemberg, Germany), 10(30), e2303207.

      (2) Kim, T. S., Silva, L. M., Theofilou, V. I., Greenwell-Wild, T., Li, L., Williams, D. W., Ikeuchi, T., Brenchley, L., NIDCD/NIDCR Genomics and Computational Biology Core, Bugge, T. H., Diaz, P. I., Kaplan, M. J., Carmona-Rivera, C., & Moutsopoulos, N. M. (2023). Neutrophil extracellular traps and extracellular histones potentiate IL-17 inflammation in periodontitis. The Journal of experimental medicine, 220(9), e20221751.

      (3) Ando, Y., Tsukasaki, M., Huynh, N. C., Zang, S., Yan, M., Muro, R., Nakamura, K., Komagamine, M., Komatsu, N., Okamoto, K., Nakano, K., Okamura, T., Yamaguchi, A., Ishihara, K., & Takayanagi, H. (2024). The neutrophil-osteogenic cell axis promotes bone destruction in periodontitis. International journal of oral science, 16(1), 18.

      (2) To improve the quality of several of the authors' claims I would recommend some further quantification of their experimental analyses. Namely:

      - Figures 3 F and G

      - Figures 4 I, J and K

      - Figures 6 F and G

      - Supplementary Figures 4 A, B, and C

      Thank you for the reviewer's suggestion. We have added quantitative analyses for some of the images, following the reviewer's recommendations, specifically in Figure 2G, Figure 3G, Figure 5-figure supplement 1A and B, Figure 5-figure supplement 2A, and Figure 7-figure supplement 3A-D of the revised manuscript.

      (3) Figure 1L has missing x-axis annotation.

      Thank you for the reminder from the reviewer. The X-axis label has been added in Figure 1-figure supplement 1D for the GO term annotation. 

      (4) Line 117 is missing a reference for the experimental schematic shown in Figure 2A.

      Thank you for the reminder from the reviewer. The experimental schematic, now shown in Figure 7A, is referenced on Page 10, Lines 275-277 of the revised manuscript.

      (5) The "BV/TV ratio" and "CEJ-ABC distance" should be briefly explained in the results text (Lines 118 and 119).

      Thank you for the reviewer's suggestion. We have added explanations of "BV/TV ratio" and "CEJ-ABC distance" on Pages 10-11, Lines 279-281 of the revised manuscript.

      (6) Figure 2 could be improved by having some annotation for the anatomical regions shown.

      Thank you for the reviewer’s valuable suggestion. We have labeled the relevant anatomical structures to enhance clarity in Figure 7 in the revised manuscripts. 

      (7) The positive signal for p16 and p21 is difficult to interpret in Figure 2. Could the clarity of this be improved either by using more evident images or annotation with arrowheads indicating positive cells?

      Thank you for the reviewer's suggestion. In the revised manuscript, we have provided more representative images in Figure 7C to reflect the effect of metformin treatment on the number of p16-positive cells in periodontitis. In Figure 7-figure supplement 1D, we have marked p21-positive cells with black arrows to help readers identify them. Additionally, we have assessed the H3K9me3 marker, which is more specific, and the results similarly indicate that metformin treatment can alleviate the formation of senescent cells in periodontitis (Figure 7-figure supplement 1E of the revised manuscript).

      (8) Figure 2Gc, d, and e are not mentioned in the results text. Please include references to these panels at the appropriate points.

      Thank you for the reminder. Figures 2G c, d, and e of the previous manuscript are now cited in the text on Page 11, Lines 284-289 of the revised manuscript.

      (9) Scale bars are missing in Supplementary Figure 2E.

      Thank you for the suggestion. The scale bar has been added to Figure 7-figure supplement 2B of the revised manuscript.

      (10) The order of the figure panels is not always mentioned in the order they are referred to in the text. For example, Figure 3 is presented in the order of A, B, D then C. Could this be changed to reflect the order in the results text?

      Thank you for the feedback. We have renumbered the figures according to the order mentioned in the original manuscript (Page 6, Line 146-149, Figure 2 in the revised manuscripts).

      (11) To improve reader clarity it would be good to briefly introduce the gene expression datasets analysed, such as GSE152042. I.e. what the experimental condition is from which it is derived.

      Thank you for the suggestion. We have included a brief description of the information and sources of the samples from GSE152042 in Page 6, Line 140-142 of the revised manuscripts. 

      (12) To improve reader clarity I would recommend signifying clearly in the figure if the data shown is from mouse or human samples. For example in Figure 3F and G.

      Thank you for the suggestion. We have moved all the results from the mouse experiments to the figures supplement (Figure 5-figure supplement 1 and 2 in the revised manuscripts).

      (13) The images shown in Figure 3H for SA-beta-Gal do not seem very convincing. Could this be improved?

      Thank you for the suggestion. To further illustrate the differences in SA-beta-Gal results between the groups, we have provided images at higher magnification in the Figure 2-figure supplement 2 of the revised manuscripts.  

      (14) Supplementary Figure 2E would benefit from small experimental schematics that would allow the reader to appreciate the timings of the treatment for this experiment.

      Thank you for the suggestion. We have added a schematic diagram in Figure 7-figure supplement 2A of the revised manuscripts to illustrate the LPS treatment, metformin treatment, and the timing of the assessments. 

      (15) Figure 4K would benefit from showing the merged image and single channels of each of the stains to better assess the degree of colocalisation.

      Thank you for the suggestion. We have included each individual fluorescence channel in Figure 5-figure supplement 1C of the revised manuscripts. 

      (16) The writing on the X-axis of Figure 6B is almost illegible to me, although this may just be a compression artefact. This makes the interpretation of the data quite difficult. Also, for Figures 6 B and C, the meaning of the (H) and (P) annotations should be clear on either the figure or figure legend. I surmise that they represent "Healthy" and "Periodontic" samples respectively.

      Thank you for the suggestion. We have enlarged Figure 6B of the previous manuscript to better display the X-axis; it is now shown as Figure 5B of the revised manuscript. Additionally, we have written out "Healthy" and "Periodontitis" in full in Figure 5C of the revised manuscript.

      (17) MPO-positive cells are introduced on line 216, however, no explanation is provided for what population or state the expression of this protein marks. I surmise the authors are using it to detect Neutrophil populations. If so, could the authors briefly state this the first time it is used?

      Thank you for the suggestion. In the revised manuscript, we have added an introduction to MPO. MPO, or myeloperoxidase, is considered one of the markers for neutrophils. (Page 9, Line 240-242 of the revised manuscripts)

      (18) Supplementary Figure 3D does not appear to be mentioned or discussed in the results text.

      Thank you for the reminder. Supplementary Figure 3D of the previous manuscript is now shown as Figure 5-figure supplement 2C and is referenced on Page 9, Lines 240-242 of the revised manuscript.

      (19) Figure 6E showing increased C3 expression in periodontic samples is not very convincing and differences in expression are not evident. Can the authors provide an image that more convincingly matches their quantification?

      Thank you for the suggestion. In the revised manuscript, we have provided more representative images shown in Figure 5E of the revised manuscript.

      (20) Figure 6I shows the expression of CD81 and SOD2 in healthy and periodontic tissue. The associated results texts (Lines 220 to 223) discuss the spatial coincidence of CD81 and MPO. Can the authors address this discrepancy in either the results text or the figure panel? Moreover, can Figure 6H and I be annotated to show the location of the gingival lamina propria to improve clarity?

      Thank you for the reminder. We have revised the relevant statements in the text: "Interestingly, spatial transcriptomic analysis of gingival tissue revealed that the regions expressing CD81 and SOD2, a neutrophil marker, in periodontitis overlapped in the gingival lamina propria, showing a high spatial correlation" in Page 9, Line 223-226 of the revised manuscripts. Additionally, we have labeled the gingival lamina propria (LP) in Figure 5H of the revised manuscripts.

      (21) I am confused about the purpose of Supplementary Figure 3E and what evidence it provides. Can the authors comment on this?

      Thank you for the reminder. To avoid any potential misunderstanding by readers, we have deleted the Supplementary Figure 3 image from the revised manuscript.

    1. eLife Assessment

      This is a valuable study showing that differentiated cells of the zebrafish skin form membrane protrusions called cytonemes that contact and likely transmit Notch signals to cells of the undifferentiated layer below. The data are convincing that cytoneme like protrusions from the periderm are required for proper periderm structure, proliferation, gene expression, and Notch signaling. Evidence that inflammatory signaling through IL-17 affects epidermal differentiation, Notch and cytoneme formation is solid, but whether these are through a single common or two parallel pathways requires further investigation.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, Wang et al show that differentiated peridermal cells of the zebrafish epidermis extend cytoneme-like protrusions toward the less differentiated, intermediate layer below. They present evidence that expression of a dominant-negative cdc42 inhibits cytoneme formation and leads to elevated expression of a marker of undifferentiated keratinocytes, krtt1c19e, in the periderm layer. It is demonstrated that Delta-Notch signaling is involved in keratinocyte differentiation and that loss of cytonemes correlates with a loss of Notch signaling. Finally, changes in expression of the inflammatory cytokine IL-17 and its receptors are shown to affect cytoneme number and periderm structure in a manner similar to Notch and cdc42 perturbations.

      Strengths:

      Overall, the idea that differentiated cells signal to underlying undifferentiated cells via membrane protrusions in skin keratinocytes is interesting and novel, and it is clear that periderm cells send out thin membrane protrusions that contain a Notch ligand. Further, perturbations that affect cytoneme number, Notch signaling, and IL-17 expression clearly lead to changes in periderm structure and gene expression.

      Weaknesses:

      The mechanisms by which IL-17 affects cytoneme formation require further investigation.

    3. Reviewer #2 (Public review):

      Summary:

      The aim of the study was to understand how cells of the skin communicate across dermal layers. The research group has previously demonstrated that cellular connections called airinemes contribute to this communication. The current work builds upon this knowledge by showing that differentiated keratinocytes also use cytonemes, specialized signaling filopodia, to communicate with undifferentiated keratinocytes. They show that cytonemes are the more abundant type of cellular extension used for communication between the differentiated keratinocyte layer and the undifferentiated keratinocytes. Disruption of cytoneme formation led to expansion of the undifferentiated keratinocytes into the periderm, mimicking skin diseases like psoriasis. The authors go on to show that disruption of cytonemes results in perturbations in Notch signaling between the differentiated keratinocytes of the periderm and the underlying proliferating undifferentiated keratinocytes. Further the authors show that Interleukin-17, also known to drive psoriasis, can restrict formation of periderm cytonemes, possibly through the inhibition of Cdc42 expression. This work suggests that cytoneme mediated Notch signaling plays a central role in normal epidermal regulation. The authors propose that disruption of cytoneme function may be an underlying cause of various human skin diseases.

      Strengths:

      The authors provide strong evidence that periderm keratinocyte cytonemes contain the Notch ligand DeltaC to promote Notch activation in the underlying intermediate layer to regulate accurate epidermal maintenance.

      Weaknesses:

      The impact of the study would be increased if the mechanism by which Interleukin-17 and Cdc42 collaborate to regulate cytonemes was defined. Experiments measuring Cdc42 activity, rather than just measuring expression, would strengthen the conclusions.

      Comments on revisions:

      The authors have sufficiently addressed my critiques from the initial round of evaluation. They have included useful representative images, clarified how they scored cytonemes and provided additional controls/experimental conditions that improve the rigor of the study. The results provided now support the key conclusions of the study.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Wang et al show that differentiated peridermal cells of the zebrafish epidermis extend cytoneme-like protrusions toward the less differentiated, intermediate layer below. They present evidence that expression of a dominant-negative cdc42 inhibits cytoneme formation and leads to elevated expression of a marker of undifferentiated keratinocytes, krtt1c19e, in the periderm layer. Data are presented suggesting the involvement of Delta-Notch signaling in keratinocyte differentiation. Finally, changes in expression of the inflammatory cytokine IL-17 and its receptors are shown to affect cytoneme number and periderm structure in a manner similar to Notch and cdc42 perturbations.

      Strengths:

      Overall, the idea that differentiated cells signal to underlying undifferentiated cells via membrane protrusions in skin keratinocytes is interesting and novel, and it is clear that periderm cells send out thin membrane protrusions that contain a Notch ligand. Further, perturbations that affect cytoneme number, Notch signaling, and IL-17 expression clearly lead to changes in periderm structure and gene expression.

      Weaknesses:

      More work is needed to determine whether the effects on keratinocyte differentiation are due to a loss of cytonemes themselves, or to broader effects of inhibiting cdc42. Moreover, more evidence is needed to support the claim that periderm cytonemes deliver Delta ligands to induce Notch signaling below. Without these aspects of the study being solidified, understanding how IL-17 affects these processes seems premature.

      Reviewer #2 (Public Review):

      Summary:

      The aim of the study was to understand how cells of the skin communicate across dermal layers. The research group has previously demonstrated that cellular connections called airinemes contribute to this communication. The current work builds upon this knowledge by showing that differentiated keratinocytes also use cytonemes, specialized signaling filopodia, to communicate with undifferentiated keratinocytes. They show that cytonemes are the more abundant type of cellular extension used for communication between the differentiated keratinocyte layer and the undifferentiated keratinocytes. Disruption of cytoneme formation led to the expansion of the undifferentiated keratinocytes into the periderm, mimicking skin diseases like psoriasis. The authors go on to show that disruption of cytonemes results in perturbations in Notch signaling between the differentiated keratinocytes of the periderm and the underlying proliferating undifferentiated keratinocytes. Further, the authors show that Interleukin-17, also known to drive psoriasis, can restrict the formation of periderm cytonemes, possibly through the inhibition of Cdc42 expression. This work suggests that cytoneme-mediated Notch signaling plays a central role in normal epidermal regulation. The authors propose that disruption of cytoneme function may be an underlying cause of various human skin diseases.

      Strengths:

      The authors provide strong evidence that periderm keratinocyte cytonemes contain the Notch ligand DeltaC to promote Notch activation in the underlying intermediate layer to regulate accurate epidermal maintenance.

      Weaknesses:

      The impact of the study would be increased if the mechanism by which Interleukin-17 and Cdc42 collaborate to regulate cytonemes was defined. Experiments measuring Cdc42 activity, rather than just measuring expression, would strengthen the conclusions.

      Reviewer #3 (Public Review):

      Summary:

      Leveraging zebrafish as a research model, Wang et al identified "cytoneme-like structures" as a mechanism for mediating cell-cell communications among skin epidermal cells. The authors further demonstrated that the "cytoneme-like structures" can mediate Notch signaling, and the "cytoneme-like structures" are influenced by IL17 signaling.

      Strengths:

      Elegant zebrafish genetics, reporters, and live imaging.

      Weaknesses: (minor)

      This paper focused on characterizing the "cytoneme-like structures" between different layers and the NOTCH signaling. However, these "cytoneme-like structures" observed in undifferentiated KC (Figure 2B), although at a slightly lower frequency, were not interpreted. In addition, it is unclear if these "cytoneme-like structures" can mediate other signaling pathways than NOTCH.

      We are currently investigating the role of cytoneme-like protrusions extended from undifferentiated keratinocytes. We believe that addressing the function of undifferentiated keratinocyte cytonemes and exploring whether peridermal cytonemes can mediate other signaling pathways is beyond the scope of the current manuscript. However, we hope to publish our discoveries about them soon. It is worth noting that cytonemes mediate other morphogenetic signals, such as Hh, Wnt, Fgf, and TGFbeta, in other contexts.

      Overall, this is a solid paper with convincing data reporting the "cytoneme-like structures" in vivo, and with compelling data demonstrating the roles in NOTCH signaling and the regulation by IL17.

      These findings provide a foundation for future work exploring the "cytoneme-like structures" in the mammalian system and other epithelial tissue types. This paper also suggests a potential connection between the "cytoneme-like structures" and psoriasis, which needs to be further explored in clinical samples.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points

      - In general, representative images from each experiment should accompany the graphs shown. The inclusion of still frames from time-lapse imaging experiments in the main figures would help the reader understand the morphology and dynamics of these protrusions in control, cdc42, and IL-17 manipulations.

      Thank you for the comments. We appreciate your suggestion to include representative images alongside the graphs to better illustrate the morphology and dynamics of these protrusions.

      In response, we have made the following additions to our main figures.

      Figure 3A now includes still images from time-lapse movies for both control and cdc42 manipulations.

      Figure 5A and 6A,C now include still images for il17 manipulations.

      - Data in Figure 3 are crucial, as they demonstrate that cdc42DN selectively impairs cytoneme extensions without affecting other actin-based structures. They also show that cdc42DN leads to upregulation of krtt1c19e in periderm. Therefore, these data should be presented in a comprehensive way. Still frames of high-magnification views of time-lapse images from control and cdc42DN should be included in the figure. Similarly, a counter label (E-Cadherin, perhaps) showing the presence of all three layers and goblet cells at different focal planes capturing the different layers of the skin should be included. It is stated that the goblet cell number is unaffected, but they seem to be absent in the image shown in Figure 3B.

      In this revised version, we have included magnified cross-sectional views. In addition to the images of the peridermal layer from the original version, we have now included the underlying intermediate and basal stem cell layers (Figure 3C-C”). We hope these data convincingly show that peridermal keratinocytes in cytoneme inhibited animals co-express krt4 and krtt1c19e markers, suggesting that peridermal keratinocytes are not fully differentiated.

      We agree that goblet cells appear largely absent in this particular image of the experimental group. However, when we quantified many animals, the number of goblet cells was not significantly different between control and experimental groups (Figure S2).

      - The effects on periderm architecture upon broad cdc42 inhibition may not be directly due to a loss of cytonemes. Performing this experiment in a mosaic manner to determine if the effects are local and in the range of cytoneme protrusion would strengthen the conclusions. Adding a secondary perturbation to inhibit cytoneme formation in periderm cells would also strengthen the conclusions that defects are not related specifically to cdc42 inhibition, but cytonemes themselves.

      Thank you for the suggestion. We confirmed that mosaic expression of cdc42DN in peridermal keratinocytes elicited local disorganization and elevated krtt1c19e expression, as we saw in the transgenic lines. Also, the cdc42DN-expressing cells exhibited significantly lower cytoneme extension frequency.

      In addition, we found that, like cdc42DN, rac1DN-expressing keratinocytes exhibited a significant decrease in cytoneme extension frequency, whereas rhoabDN showed no effect (new Figure S3). These data suggest that cytoneme extension is regulated by cdc42 and rac1 but not rhoab. Further investigation is required; however, these data at least suggest that the effects we observe are likely due to the loss of cytonemes and are not specific to cdc42 inhibition.

      - Figure 4. The inclusion of an endogenous reporter of Notch activity, like Hes or Hey immunofluorescence, would strengthen the conclusion that the intermediate layer is Notch responsive.

      Thank you for the suggestion. In this revised version, we have included immunostaining data in Figure 4D demonstrating that Her6 (the ortholog of human HES1) protein is expressed in the intermediate layer.

      - It is not clear where along a differentiation trajectory Notch signaling and cytonemes are needed. What happens to the intermediate layer when Notch signaling or cdc42 is inhibited? Do the cells become more basal-like? Or failing to become periderm? Meaning - is Notch promoting the basal to intermediate fate transition, or the intermediate to periderm transition? A more comprehensive characterization of basal, intermediate, and periderm differentiation with markers selective to each layer would help define which step in the process is being altered.

      Notch signaling is known to regulate keratinocyte terminal differentiation. Thus, it is required for the transition from intermediate to peridermal keratinocytes. We observed that peridermal keratinocytes still strongly express krt19, suggesting that their terminal differentiation is inhibited when cytoneme-mediated Notch signaling is compromised.

      As seen in Figure 3C”, peridermal keratinocytes express both krt4 and krtt1c19e markers while located in the peridermal layer, suggesting that they are not fully differentiated keratinocytes. In the included images of the intermediate and basal layers, we do not observe any noticeable defects in basal stem cells or complete depletion of intermediate keratinocytes (Fig 3C-C”). These observations suggest that Notch signaling, activated by cytonemes, is required for the differentiation of undifferentiated intermediate keratinocytes into peridermal keratinocytes.

      We included this interpretation in the main text.

      - A number of times in the text it is suggested that cytonemes, Notch, and IL-17 signaling are essential for keratinocyte differentiation and proliferation, but proliferation (% cells in S-phase and M-phase) is not measured. Also, #of keratinocytes @ periderm is not an accurate way to report the number of cells in the periderm unless every cell in the larvae has been counted. It should be # cells/unit area.

      In this revised version, we confirmed that the number of EdU+ cells among peridermal keratinocytes is significantly increased when cytonemes are inhibited (Figure 3F-G). Also, as indicated in the methods section, we counted the cells in a 290 µm × 200 µm field. We believe both datasets sufficiently suggest that the number of keratinocytes in the periderm is significantly increased due to the lack of proper cytoneme-mediated signaling.
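      The reviewer's request for cells per unit area can be met directly from a count taken in the fixed 290 µm × 200 µm field mentioned above. The following is only a minimal sketch of that conversion; the function name and the example count of 58 cells are hypothetical and are not taken from the manuscript.

```python
def cells_per_mm2(cell_count, width_um=290.0, height_um=200.0):
    """Convert a raw cell count in a rectangular field to cells per mm^2.

    The default field size (290 um x 200 um) is the counting window
    stated in the author response; everything else is illustrative.
    """
    if width_um <= 0 or height_um <= 0:
        raise ValueError("field dimensions must be positive")
    # 1 mm^2 = 1,000,000 um^2
    area_mm2 = (width_um * height_um) / 1_000_000.0
    return cell_count / area_mm2


# Hypothetical count, used only to show the call.
example_density = cells_per_mm2(58)
print(f"{example_density:.1f} cells/mm^2")
```

      Reporting the density instead of the raw count makes values comparable across fields of different size.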

      - If the model is correct that Delta ligands from the periderm signal to intermediate cells to promote their differentiation and inhibit their proliferation, then depletion of Delta from Krt4 expressing cells should recapitulate the periderm phenotype.

      This is a great suggestion. However, zebrafish skin expresses multiple Delta ligands, and we do not know which specific combination of Deltas is delivered via cytonemes. In this manuscript, we identified that Dlc is expressed along the cytonemes and in krt4+ cells (revised Figure S4); however, we are unsure whether other Delta ligands are involved in Notch activation. Nevertheless, cytoneme inhibition is performed specifically in krt4+ cells, and the downregulation of Notch activation is observed in krtt1c19e+ undifferentiated keratinocytes. In this revised version, we found that the Notch-responsive protein Her6 is exclusively expressed in the cytoneme target keratinocytes, and that cytoneme-extending cells (krt4+) do not express Notch receptors.

      - rtPCR data in Figure S3 is not properly controlled. Each gene should be tested in both krt4 and krtt1c19e expressing cells to determine their relative expression levels in different skin layers that are proposed to signal to one another. Are Notch ligands present in basal cells? These could be activating Notch in the intermediate layer.

      Our intention was merely to confirm that the Notch signaling components are expressed in cytoneme-extending and receiving cells. Based on the new panel of RT-PCRs for Notch signaling components, we confirmed again that dlc is expressed in cytoneme-extending cells but not in receiving cells. Basal cells are also krtt1c19e+, but we did not detect dlc in them. Interestingly, we found that notch 2 is exclusively expressed in krtt1c19e+ cells and not in krt4+ cytoneme-extending cells (now new Figure S4).

      - It is not intuitive why NICD (activation) and SuHDN (inhibition) of Notch signaling should result in a similar effect on the periderm. What is the effect of NICD expression on the TP1:H2BGFP reporter? Does it hyperactivate as expected?

      We agree with the reviewer’s concern. It is well documented that psoriasis patients exhibit either loss or gain of Notch signaling (Ota et al., 2014, Acta Histochem Cytochem; Abdou et al., 2012, Annals of Diagnostic Pathology). However, the underlying mechanisms remain unknown. We merely intended to show that our zebrafish manipulations recapitulate what is seen in human patients. We believe these data are not required for the overall conclusion and need further investigation to be explained. Thus, if the reviewers agree, we would like to omit them from this manuscript and leave them for future studies.

      - Due to the involvement of immune signaling in hyperproliferative skin diseases the paper then investigates the role of IL-17 on cytoneme formation by overexpressing two IL-17 receptors in the periderm. Fewer cytonemes were present in the receptor over-expressing periderm cells. The rationale for overexpressing the receptors was unclear. If relevant to endogenous cytokine signaling, the periderm would be expected to express IL-17 receptors normally and respond to elevated levels of IL-17.

      Our rationale for overexpressing the IL-17 receptors was to test whether the effect is autonomous to krt4+ peridermal cells. There is a debate over whether the onset of psoriasis is autonomous to keratinocytes or a non-autonomous effect of immune malfunction. In addition to the overexpression of IL-17 receptors, we showed that IL-17 ligand overexpression has the same effect on cytoneme extension (Fig. 6A-B).

      - Experiments overexpressing IL-17 in macrophages are also suggested to limit cytoneme number whereas heterozygous deletion elevates them. Representative images and movies should be included to support the data. Western blots or immunofluorescence showing that IL-17 and its receptors are indeed overexpressed in the relevant layers/cell types should also be included as controls. Knockout of IL-17 protein in the new Crispr deletion mutant should also be shown.

      In response to the reviewer’s comments, we have included representative images of peridermal keratinocytes in IL-17 ligand overexpressed and il17 CRISPR KO animals (Fig. 6A,C).

      We have confirmed the overexpression of Il17rd, Il17ra1a and Il17a in the transgenic animals. For the il17 receptors, we FACS-sorted differentiated keratinocytes and performed qRT-PCR. Similarly, for the il17 ligand, we isolated skin tissue and conducted qRT-PCR (new Figure S7).

      Additionally, we confirmed that IL-17 protein expression is undetectable in il17a CRISPR KO fish (Fig. S8C).

      - Evidence that the effect of IL-17 upregulation on periderm architecture is via cytonemes is suggestive but not conclusive. Can the phenotype be rescued by a constitutively active cdc42?

      We appreciate the reviewer’s suggestion. We are unsure whether constitutively active cdc42 expression can rescue the IL-17 overexpression-mediated reduction of cytoneme extension frequency. cdc42CA would be expected to stabilize actin polymerization and in turn produce more cytonemes. However, it is also known that sustained cdc42 activation can paradoxically lead to actin depolymerization. Thus, we are concerned that the result would likely be uninterpretable. In addition, we would need to generate a new transgenic line for this experiment, and the baseline control experiments and validations would take a substantial amount of time and effort with no assurance of success.

      We and others believe that cdc42 is a final effector molecule regulating cytoneme extension, given its role in actin polymerization. We provided evidence that IL-17 overexpression significantly reduced cdc42 and rac1 expression (Figure 6E), and that co-manipulation with IL17 overexpression and cdc42DN led to further down-regulation of cytoneme extension frequency in peridermal keratinocytes (Figure 6H).

      - In a final experiment, the authors mutate a psoriasis-associated gene, clint1a gene and show an effect on cytonemes, Notch output, and periderm structure. More information about what this gene encodes, where the mRNA is expressed, and where the cell the protein should localize would help place this result in context for the reader.

      In this revised manuscript, we included more information about clint1.

      “The clathrin interactor 1 (clint1), also referred to as enthoprotin and epsinR, functions as an adaptor molecule that binds SNARE proteins and plays a role in clathrin-mediated vesicular transport (Wasiak, 2002). It has also been reported that clint1 is expressed in epidermis and plays an important role in epidermal homeostasis and development in zebrafish (Dodd et al., 2009)”.

      Minor points

      - The architecture of zebrafish skin is notably distinct from that of humans and other mammals and whether parallels can be drawn with regards to cytoneme mediated signaling requires further investigation. For this reason, I believe the title should include the words 'in zebrafish skin'.

      In this version, we changed the title to ‘Cytoneme-mediated intercellular signaling in keratinocytes essential for epidermal remodeling in zebrafish’.

      - More details about the timing of cdc42 inhibition should be given in the main text to interpret the data. How many hours or days are the larvae treated? How does this compare to the rate of division and differentiation in the zebrafish larval epidermis?

      We apologize for omitting the detailed experimental conditions for cytoneme inhibition. We have revised the main text as follows “Although the cytoneme inhibition is evident after overnight treatment with the inducing drugs, noticeable epidermal phenotypes begin to appear after 3 days of treatment. This reflects the higher cytoneme extension frequency and their potential role during metamorphic stages, which takes a couple of weeks (Figure 1C)”

      - What are the genotypes of animals in Figure 4B where 'Notch expression' is being measured upon cdc42DN inhibition? Is this the TP1:H2B-GFP reporter? Again, details of the timing of this experiment are needed to evaluate the results.

      We indicated the reference supplementary figure for the Notch activity measurement in figure legend S4. We also added the following sentence to the main text: “Similar to the effects on the epidermis after cytoneme inhibition (Figure 3), it takes 3 days to observe a significant reduction in Notch signal in the undifferentiated keratinocytes.”

      Reviewer #2 (Recommendations For The Authors):

      - Figure 2B: the authors indicate that the undifferentiated keratinocytes (krtt1c19e+) do extend some cytonemes. Although this behavior is not a focus of the study, it would be helpful to see an image of krtt1c19e:lyn-tdTomato cytonemes. The discussion ends with an interesting statement about downward pointed protrusions coming off the undifferentiated keratinocytes. A representative image of this should be included in Figure 2.

      In this revised version, we included an image of krtt1c19e positive cell that extend cytonemes in Figure 2C.

      - The evidence for hyperproliferation of the undifferentiated keratinocytes would be strengthened by quantifying proliferation. Most experiments result in increased expression of krtt1c19e in the periderm layer, but it is unclear whether this is invasion, remodeling, or incomplete differentiation of the cells. Notch suppression with krtt1c19e:SuHDN and overactivation with krtt1c19e:NICD phenocopy each other. Are there differences in proliferation vs differentiation rates in these two genotypes that result in a similar phenotype?

      We appreciate the reviewer’s comments. In response to the feedback, we included EdU experiments that show increased cell proliferation in peridermal keratinocytes in the experimental groups. Additionally, we observed co-expression of the differentiated marker krt4 and the undifferentiated marker krtt1c19e in keratinocytes in the periderm. Since we did not observe depletion of the intermediate layer, we believe it is reasonable to conclude that the phenotype represents incomplete differentiation (new Figure 3). For the krtt1c19e:NICD question, please refer to our response to Reviewer #1’s comment.

      - Do Cdc42DN and il17rd or il17ra1a work in parallel or in a hierarchy of signaling events to regulate cytoneme formation?

      Cdc42 is widely recognized as a final effector in cytoneme extension, given its well-established role in actin polymerization, which is critical for cytoneme extension. Our data support a model where il17 signaling acts upstream of cdc42. We showed that the overexpression of il17rd or il17ra1a significantly reduced the expression of Cdc42 (Figure 6E). In double transgenic fish overexpressing il17rd and cdc42DN, we observed a more marked decrease in cytoneme extension compared to single transgenics (Figure 6H). These results collectively indicate that, at least partially, Cdc42 functions downstream of il17 signaling in the context of cytoneme formation. However, we acknowledge that additional regulatory mechanisms may be involved, given the complexity of cellular signaling networks.

      - Figure 6C: Are the effects of overexpression of il17rd specific to Cdc42, or are other Rho family GTPases like Rac and Rho also affected? Is the microridge defect (Figure 6D) also present in Tg(krt4:TetGBDTRE-v2a-cdc42DN) when induced, or could this be regulated by Rho/Rac?

      We used microridge formation as a readout to evaluate the effects of il17 receptor overexpression on actin polymerization. In this revision, we demonstrate that the expression of other small GTPases is also decreased in il17rd- or il17ra1a-overexpressing keratinocytes (Figure 6E). Also, we confirmed that microridges exhibit significantly shorter branch length when cdc42DN or rac1DN is overexpressed (new Figure S9). Of note, we have shown that cytonemes are regulated by cdc42 and rac1 (new Figure S3).

      - Please change the color of the individual data points from black to grey or another color so readers may better visualize the mean and error bars.

      We agree with this comment, and in response, we have revised the figures by changing the color of the individual data points to empty circles and now the error bars are better visualized.

      - Figure 1: What were the parameters used to identify an extension as a cytoneme? Please include the minimal length and max-width used in the analysis in the methods.

      Thank you for the comments. We have now included the method of how we defined cytonemes and measured as follows. In zebrafish keratinocytes, lamellipodial extensions are the dominant extension type, and most filopodial extensions are less than 1µm in length, both are not easily visible at the confocal resolution we used for this study. Thus, it is easy to distinguish filopodia from cytonemes, as cytonemes have a minimum length of 4.36µm in our observations. We did not use the width parameter since there are no other protrusions except cytonemes. We calculated the cytoneme extension frequency by counting how many cytonemes extended from a cell per hour. We analyzed movies with 3-minute intervals over a total of 10 hours, as described in the section above.
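      The frequency measure described above (cytonemes extended per cell per hour, from movies acquired at 3-minute intervals over 10 hours) can be sketched as follows. This is an illustrative calculation only: the function names, the input format (one maximum length per protrusion event for a single cell), and the use of 4.36 µm as a hard cutoff are assumptions, since the response reports 4.36 µm as the shortest cytoneme observed and does not state that it was applied as a threshold.

```python
MIN_CYTONEME_LENGTH_UM = 4.36  # shortest cytoneme reported in the response
FRAME_INTERVAL_MIN = 3.0       # imaging interval stated in the response


def movie_hours(n_frames, interval_min=FRAME_INTERVAL_MIN):
    """Duration of a time-lapse movie in hours, from its frame count."""
    if n_frames < 2:
        raise ValueError("need at least two frames to span any time")
    return (n_frames - 1) * interval_min / 60.0


def extension_frequency(event_lengths_um, n_frames,
                        min_length_um=MIN_CYTONEME_LENGTH_UM):
    """Cytoneme extensions per cell per hour for one tracked cell.

    event_lengths_um: maximum length (um) of each protrusion event seen
    for that cell (hypothetical input format). Events shorter than
    min_length_um are not counted as cytonemes.
    """
    n_cytonemes = sum(1 for length in event_lengths_um
                      if length >= min_length_um)
    return n_cytonemes / movie_hours(n_frames)


# Hypothetical cell: five protrusion events in a 201-frame (10 h) movie.
freq = extension_frequency([0.8, 4.36, 6.1, 12.0, 3.9], n_frames=201)
print(f"{freq:.2f} cytonemes per cell per hour")
```

      Averaging this per-cell value over all tracked cells in a condition would give the group-level frequency that is compared between genotypes.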

      - Line 149-150, (Figure S1) ML141 is a Cdc42 inhibitor, please correct the wording. Would the use of an actin polymerization inhibitor like Cytochalasin B or a depolymerizing agent (Latrunculin) increase the reduction in cytoneme formation?

      Thank you for pointing this out. We have revised it in this version. We tried Cytochalasin B and Latrunculin, but the treatments killed the animals.

      - Figure 2: What is the depth of the Z-axis images? Does the scale bar apply to the cross-sectional images as well? It may be beneficial to readers to expand the Z scale of the cross-section images for Figure 2C.

      Sure, we enlarged the cross-sectional images. Yes, the scale bar should apply to the cross-sectional images.

      - Figure 3B-B' cross-section images should be added to confirm images shown represent the periderm layer. Are there folds in the epidermis due to cdc42DN expression or are differentiated keratinocytes absent?

      In response, we have included z-stack images in the revised figure 3. We found that the epidermal tissue is not flat as compared to controls, presumably due to broad cdc42DN expression (Figure 3C”).

      - Figure S3: Do the EGFP+ and tdTomato+ cells have noticeable differential gene expression? The inclusion of RT-PCR analysis of all genes analyzed for both cell populations would bolster statements on lines 230-231 and 254-256.

      We agree with the reviewer’s comment and have revised the RT-PCR panel in this revised version (Figure S4).

      - Figure 4D-D', Please include cross-section images to indicate the focal plane for analysis.

      We included cross-section images in this revised version (Figure 4E-E”).

      - Figure 5B: Complementary images visualizing the reduction of Notch would be helpful.

      We apologize for not including these data. In this revised version, we included Notch reporter expression data comparing WT, Tg(krt4:il17rd), and Tg(krt4:il17ra1a) in Figure S5E.

      - Line 432-433: "Moreover, we have demonstrated that IL-17 can influence cytoneme extension by regulating Cdc42 GTPases, ultimately affecting actin polymerization." This claim would be strengthened by assaying for Cdc42 activity.

      This is a great idea, and we tried to address this issue. However, we realized that measuring activity with biosensors, especially in vivo, requires a substantial amount of time, effort, and validation, with no assurance that it would work in our hands. In addition, the current methods that work for in vitro samples still appear to have many limitations, such as sensitivity issues. Although we agree that a cdc42 activity measurement would bolster our findings, it seems very challenging to apply it to the zebrafish in vivo system.

      - Line 445-447: "Clint1(Clathrin Interactor 1) plays an important role in vesicle trafficking, and it is well established that endocytic pathways are critical for multiple steps in cytoneme-mediated morphogen delivery (Kalthoff et al., 2002)." Please add references to the "endocytic pathways are critical for multiple steps in cytoneme-mediated morphogen delivery" portion of the sentence.

      We revised the sentence. It is “well established” -> it is “suggested”, and added a reference (Daly et al., 2022).

      Reviewer #3 (Recommendations For The Authors):

      The details of the "cytoneme inhibition" experiments need to be better clarified. How long was the dox treatment? How soon did the cells start to show "disorganization"? How soon did the KC in the periderm start to show increased proliferation?

      Thank you for the valuable comment. In response, we have revised the main text as follows: “Although the cytoneme inhibition is evident after overnight treatment with the inducing drugs, noticeable epidermal phenotypes begin to appear after 3 days of treatment. This reflects the higher cytoneme extension frequency and their potential role during metamorphic stages, which takes a couple of weeks (Figure 1C)”

    1. eLife Assessment

      This fundamental manuscript presents a practical modification of the orthogonal hybridization chain reaction (HCR) technique, a promising yet underutilized method with broad potential for future applications across various fields. The authors advance this technique by integrating peptide ligation technology and nanobody-based antibody mimetics (cost-effective and scalable alternatives to conventional antibodies) into a DNA-immunoassay framework that merges oligonucleotide-based detection with immunoassay methodologies. Notably, they demonstrate with compelling evidence that this approach facilitates a modified ELISA platform capable of simultaneously quantifying multiple target protein expression levels within a single protein mixture sample.

    2. Reviewer #1 (Public review):

      Summary:

      This fundamental study presents a practical modification of the orthogonal hybridization chain reaction (HCR) technique, a promising yet underutilized method with broad potential for future applications across various fields. The authors advance this technique by integrating peptide ligation technology and nanobody-based antibody mimetics (cost-effective and scalable alternatives to conventional antibodies) into a DNA-immunoassay framework that merges oligonucleotide-based detection with immunoassay methodologies. They demonstrate with compelling evidence that this approach facilitates a modified ELISA platform capable of simultaneously quantifying multiple target protein expression levels within a single protein mixture sample.

      Strengths:

      The hybridization chain reaction (HCR) technique was initially developed to enable the simultaneous detection of multiple mRNA expression levels within the same tissue. This method has since evolved into immuno-HCR, which extends its application to protein detection by utilizing antibodies. A key requirement of immuno-HCR is the coupling of oligonucleotides to antibodies, a process that can be challenging due to the inherent difficulties in expressing and purifying conventional antibodies.

      In this study, the authors present an innovative approach that circumvents these limitations by employing nanobody-based antibody mimetics, which recognize antibodies, instead of directly coupling oligonucleotides to conventional antibodies. This strategy facilitates oligonucleotide conjugation (designed to target the initiator hairpin oligonucleotide of HCR) through peptide ligation and click chemistry.

      Weaknesses:

      The sandwich-format technique presented in this study, which employs a nanobody that recognizes primary IgG antibodies, may have limited scalability compared to existing methods that directly couple oligonucleotides to primary antibodies. This limitation arises because the C-region types of primary antibodies are relatively restricted, meaning that the use of nanobody-based detection may constrain the number of target proteins that can be analyzed simultaneously. In contrast, the conventional approach of directly conjugating oligonucleotides to primary antibodies allows for a broader range of protein targets to be analyzed in parallel.

      Additionally, in the context of HCR-based protein detection, the number of proteins that can be analyzed simultaneously is inherently constrained by fluorescence wavelength overlap in microscopy, which limits its multiplexing capability. By comparison, direct coupling oligonucleotides to primary antibodies can facilitate the simultaneous measurement of a significantly greater number of protein targets than the sandwich-based nanobody approach in the barcode-ELISA/NGS-based technique.

      Comments on revisions:

      The previous suggestions were well incorporated in the revised manuscript.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a practical modification of the orthogonal hybridization chain reaction (HCR) technique, a promising yet underutilized method with broad potential for future applications across various fields. The authors advance this technique by integrating peptide ligation technology and nanobody-based antibody mimetics - cost-effective and scalable alternatives to conventional antibodies - into a DNA-immunoassay framework that merges oligonucleotide-based detection with immunoassay methodologies. Notably, they demonstrate that this approach facilitates a modified ELISA platform capable of simultaneously quantifying multiple target protein expression levels within a single protein mixture sample.

      Strengths:

      The hybridization chain reaction (HCR) technique was initially developed to enable the simultaneous detection of multiple mRNA expression levels within the same tissue. This method has since evolved into immuno-HCR, which extends its application to protein detection by utilizing antibodies. A key requirement of immuno-HCR is the coupling of oligonucleotides to antibodies, a process that can be challenging due to the inherent difficulties in expressing and purifying conventional antibodies.

      In this study, the authors present an innovative approach that circumvents these limitations by employing nanobody-based antibody mimetics, which recognize antibodies, instead of directly coupling oligonucleotides to conventional antibodies. This strategy facilitates oligonucleotide conjugation - designed to target the initiator hairpin oligonucleotide of HCR - through peptide ligation and click chemistry.

      Weaknesses:

      The sandwich-format technique presented in this study, which employs a nanobody that recognizes primary IgG antibodies, may have limited scalability compared to existing methods that directly couple oligonucleotides to primary antibodies. This limitation arises because the C-region types of primary antibodies are relatively restricted, meaning that the use of nanobody-based detection may constrain the number of target proteins that can be analyzed simultaneously. In contrast, the conventional approach of directly conjugating oligonucleotides to primary antibodies allows for a broader range of protein targets to be analyzed in parallel.

      We would like to clarify that MaMBA was specifically designed to address and overcome the limitations imposed by relying on primary antibodies’ Fc types for multiplexing. MaMBA utilizes DNA oligo-conjugated nanobodies that selectively and monovalently bind to the Fc region of IgG. This key feature allows us to barcode primary IgGs targeting different antigens independently. These barcoded IgGs can then be pooled together after barcoding, effectively minimizing the potential for cross-reactivity or crossover. Therefore, IgGs barcoded using MaMBA are functionally equivalent to those barcoded via conventional direct conjugation approaches with respect to multiplexing capability.

      Additionally, in the context of HCR-based protein detection, the number of proteins that can be analyzed simultaneously is inherently constrained by fluorescence wavelength overlap in microscopy, which limits its multiplexing capability. By comparison, direct coupling of oligonucleotides to primary antibodies can facilitate the simultaneous measurement of a significantly greater number of protein targets than the sandwich-based nanobody approach in the barcode-ELISA/NGS-based technique.

      As we have responded above, MaMBA barcoding of primary IgGs that target various antigens can be conducted separately. Once barcoded, these IgGs can then be combined into a single pool. Therefore, for BLISA (i.e., the barcode-ELISA/NGS-based technique), IgGs barcoded through MaMBA offer the same multiplexing capability as those barcoded using traditional direct conjugation methods.

      In in situ protein imaging, spectral overlap can indeed limit the throughput of multiplexed HCR fluorescent imaging. There are two strategies to address this challenge. As demonstrated in this work with _mis_HCR and _mis_HCRn, removing the HCR amplifiers allows for multiplexed detection using a limited number of fluorescence wavelengths. This is achieved through sequential rounds of HCR amplification and imaging. Alternatively, recent computational approaches offer promising solutions for “one-shot” multiplexed imaging. These include combinatorial multiplexing (PMID: 40133518) and spectral unmixing (PMID: 35513404), which can be applied to _mis_HCR to deconvolute overlapping spectra and increase multiplexing capacity in a single imaging acquisition.

      Reviewer #1 (Recommendations for the authors):

      (1) The introduction of nanobody and peptide ligation technology is a key highlight of this study. To strengthen the manuscript, the authors should provide a more detailed discussion of the principles and applications of HCR in the Introduction or Discussion sections.

      We have added a brief discussion of the HCR reaction to the revised manuscript.

      (2) It would also be beneficial to include results and/or discussion on how the affinity of nanobody binding to IgG influences the success and accuracy of the technique.

      We have added a brief discussion of the IgG nanobodies we used in MaMBA to the revised manuscript.

      (3) Additionally, a more detailed explanation of the recognition specificity of the AEP peptide ligase used in this study should be included in the Discussion section. Prior studies have reported on the specificity of amino acid residues positioned at the C-terminus of target A (-5 to -1) and the N-terminus of target B (1 to 3) in AEP-mediated ligation, and integrating this context would enhance clarity.

      We have added a brief discussion of the AEP-mediated ligation to the revised manuscript.

    1. eLife Assessment

      With a computational analysis of a neuroanatomical network model in C. elegans, this valuable work investigates the synaptic mechanism for memory-dependent klinotaxis, i.e., salt concentration chemotaxis. By incorporating experimental data altering the ASER neuron's basal glutamate release into their model, the authors demonstrate the possibility of a transition between excitatory and inhibitory signaling at the ASER-AIY synapse, depending on environmental and cultivated salt concentrations. These solid findings offer a proposal for how synaptic plasticity plays a role in sensorimotor navigation, and will be of interest to worm biologists and theoretical neuroscientists.

    2. Reviewer #2 (Public review):

      Summary:

      This study explores how a simple sensorimotor circuit in the nematode C. elegans enables it to navigate salt gradients based on past experiences. Using computational simulations and previously described neural connections, the study demonstrates how a single neuron, ASER, can change its signaling behavior in response to different salt conditions, with which the worm is able to "remember" prior environments and adjust its navigation toward "preferred" salinity accordingly.

      Strengths:

      The key novelty and strength of this paper is the explicit demonstration of computational neurobehavioral modeling and evolutionary algorithms to elucidate the synaptic plasticity in a minimal neural circuit that is sufficient to replicate memory-based chemotaxis. In particular, with changes in ASER's glutamate release and sensitivity of downstream neurons, the ASER neuron adjusts its output to be either excitatory or inhibitory depending on ambient salt concentration, enabling the worm to navigate toward or away from salt gradients based on prior exposure to salt concentration.

      Weaknesses:

      While the model successfully replicates some behaviors observed in previous experiments, some key assumptions of the work still need to be verified by biological validation of further experiments.

      Comments on revisions:

      Thank you for the authors' response. The revision and their response have substantially addressed my concerns.

    3. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment 

      The authors utilize a valuable computational approach to exploring the mechanisms of memory-dependent klinotaxis, with a hypothesis that is both plausible and testable. Although they provide a solid hypothesis of circuit function based on an established model, the model's lack of integration of newer experimental findings, its reliance on predefined synaptic states, and oversimplified sensory dynamics, make the investigation incomplete for both memory and internal-state modulation of taxis.

      We would like to express our gratitude to the editor for the assessment of our work. However, we respectfully disagree with the assessment that our investigation is incomplete, if the negative assessment is primarily due to the impact of AIY interneuron ablation on the chemotaxis index (CI) reported in Reference [1]. It is crucial to acknowledge that the experimentally determined CI incorporates contributions from both klinokinesis and klinotaxis [1]. It is plausible that the impact of AIY ablation was not adequately reflected in the CI value. Consequently, the experimental observation does not necessarily diminish the role of AIY in klinotaxis. Anatomical evidence provided by the database (http://ims.dse.ibaraki.ac.jp/ccep-tool/) substantiates that ASE sensory neurons and AIZ interneurons, which have been demonstrated to play a crucial role in klinotaxis [Matsumoto et al., PNAS 121 (5) e2310735121], have a much higher number of synaptic connections with AIY interneurons. These findings provide substantial evidence supporting the validity of the presented minimal neural network responsible for salt klinotaxis.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This research focuses on C. elegans klinotaxis, a chemotactic behavior characterized by gradual turning, aiming to uncover the neural circuit mechanism responsible for the context-dependent reversal of salt concentration preference. The phenomenon observed is that the preferred salt concentration depends on the difference between the pre-assay cultivation conditions and the current environmental salt levels. 

      We would like to express our gratitude for the time and consideration you have dedicated to reviewing our manuscript.

      The authors propose that a synaptic-reversal plasticity mechanism at the primary sensory neuron, ASER, is critical for this memory- and context-dependent switching of preference. They build on prior findings regarding synaptic reversal between ASER and AIB, as well as the receptor composition of AIY neurons, to hypothesize that similar "plasticity" between ASER and AIY underpins salt preference behavior in klinotaxis. This plasticity differs conceptually from the classical one as it does not rely on any structural changes but rather synaptic transmission is modulated by the basal level of glutamate, and can switch from inhibitory to excitatory. 

      To test this hypothesis, the study employs a previously established neuroanatomically grounded model [4] and demonstrates that reversing the ASER-AIY synapse sign in the model agent reproduces the observed reversal in salt preference. The model is parameterized using a computational search technique (evolutionary algorithm) to optimize unknown electrophysiological parameters for chemotaxis performance. Experimental validity is ensured by incorporating constraints derived from published findings, confirming the plausibility of the proposed mechanism. 

      Finally, the circuit mechanism allowing C. elegans to switch behaviour to an exploration run when starved is also investigated. This extension highlights how internal states, such as hunger, can dynamically reshape sensory-motor programs to drive context-appropriate behaviors.

      We would like to thank the reviewer for the appropriate summary of our work. 

      Strengths and weaknesses: 

      The authors' approach of integrating prior knowledge of receptor composition and synaptic reversal with the repurposing of a published neuroanatomical model [4] is a significant strength. This methodology not only ensures biological plausibility but also leverages a solid, reproducible modeling foundation to explore and test novel hypotheses effectively.

      The evidence produced that the original model has been successfully reproduced is convincing.

      The writing of the manuscript needs revision as it makes comprehension difficult.  

      We would like to thank the reviewer for recognizing the usefulness of our approach. In the revised version, we improved the explanation according to your suggestions.  

      One major weakness is that the model does not incorporate key findings that have emerged since the original model's publication in 2013, limiting the support for the proposed mechanism. In particular, ablation studies indicate that AIY is not critical for chemotaxis, and other interneurons may play partially overlapping roles in positive versus negative chemotaxis. These findings challenge the centrality of AIY and suggest the model oversimplifies the circuit involved in klinotaxis.

      We would like to express our gratitude for the constructive feedback we have received. We concur with some of your assertions. In fact, our model is the minimal network for salt klinotaxis, which includes solely the interneurons that are connected to each other via the highest number of synaptic connections. It is important to note that our model does not consider redundant interneurons that exhibit overlapping roles. Consequently, the model is not applicable to the study of the impact of interneuron ablation. In the reference [1], the influence of interneuron ablations on the chemotaxis index (CI) has been investigated. The experimentally determined CI value incorporates the contributions from both klinokinesis and klinotaxis. Consequently, it is plausible that the impact of AIY ablation was not significantly reflected in the CI value. The experimental observation does not necessarily diminish the role of AIY in klinotaxis. 

      Reference [1] also shows that ASER neurons exhibit complex, memory- and context-dependent responses, which are not accounted for in the model and may have a significant impact on chemotactic model behaviour. 

      As the reviewer has noted, our model does not incorporate the context-dependent response of the ASER. Instead, the present study examined in detail the impact of the salt concentration-dependent glutamate release from the ASER [S. Hiroki et al. Nat Commun 13, 2928 (2022)], which results from the ASER responses.

      The hypothesis of synaptic reversal between ASER and AIY is not explicitly modeled in terms of receptor-specific dynamics or glutamate basal levels. Instead, the ASER-to-AIY connection is predefined as inhibitory or excitatory in separate models. This approach limits the model's ability to test the full range of mechanisms hypothesized to drive behavioral switching.  

      We would like to express our gratitude to the reviewer for their constructive feedback. As you correctly noted, the hypothesized synaptic reversal between ASER and AIY is not explicitly modeled in terms of the sensitivity of the receptors in the AIY and the basal glutamate levels released by the ASER. On the other hand, in the present study, considering the substantial difference in the sensitivity of the two glutamate receptors on the AIY, we sought to elucidate the impact of salt-concentration-dependent basal glutamate levels on klinotaxis. To this end, we conducted a comprehensive examination of the full range of gradual change in the ASER-to-AIY connection from inhibitory to excitatory, as illustrated in Figures S4 and S5.

      While the main results - such as response dependence on step inputs at different phases of the oscillator - are consistent with those observed in chemotaxis models with explicit neural dynamics (e.g., Reference [2]), the lack of richer neural dynamics could overlook critical effects. For example, the authors highlight the influence of gap junctions on turning sensitivity but do not sufficiently analyze the underlying mechanisms driving these effects. The role of gap junctions in the model may be oversimplified because, as in the original model [4], the oscillator dynamics are not intrinsically generated by an oscillator circuit but are instead externally imposed via $z_\text{osc}$. This simplification should be carefully considered when interpreting the contributions of specific connections to network dynamics. Lastly, the complex and context-dependent responses of ASER [1] might interact with circuit dynamics in ways that are not captured by the current simplified implementation. These simplifications could limit the model's ability to account for the interplay between sensory encoding and motor responses in C. elegans chemotaxis.

      We may not have fully understood the substance of your comments. However, we understand the point that the oscillator dynamics were not intrinsically generated by an oscillator neural circuit explicitly incorporated into our model. On the other hand, the present study focuses on how the sensory input and resulting interneuron dynamics regulate the oscillatory behavior of SMB motor neurons to generate klinotaxis. The neuron dynamics via gap junctions result from the equilibration of the membrane potential y<sub>i</sub> of two neurons connected by gap junctions rather than the output z<sub>i</sub>. We added this explanation in the revised manuscript as follows.

      “The hyperpolarization signals in the AIZL are transmitted to the AIZR via the gap junction (Figs. S1d and S1f and Fig. 3d). This is because the neuron dynamics via gap junctions results from the equilibration of the membrane potential y<sub>i</sub> of two neurons connected by gap junctions rather than the z<sub>i</sub>.”

      In the limitation, we added the following sentence:

      “In the present study, the oscillator components of the SMB are not intrinsically generated by an oscillator circuit but are instead externally imposed via 𝑧<sub>i</sub><sup>OSC</sup>. Furthermore, the complex and context-dependent responses of ASER {Luo:2014et} were not taken into consideration. It should be acknowledged as a limitation of this study that these omitted factors may interact with circuit dynamics in ways that are not captured by the current simplified implementation.”
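
      The gap-junction argument in the response above can be illustrated with a minimal sketch. It assumes a generic leaky-integrator neuron in which the gap-junction current is proportional to the difference in membrane potential between the two coupled cells; the equation form, the parameter values, and the names `simulate_gap_pair`, `i_left`, and `g_gap` are illustrative assumptions and are not taken from the manuscript.

```python
import math


def sigmoid(x):
    """Logistic output nonlinearity; used only to report z, never for gap coupling."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_gap_pair(i_left, g_gap, tau=0.1, dt=0.001, steps=2000):
    """Euler-integrate two leaky neurons coupled only by a gap junction.

    Assumed dynamics (hypothetical, for illustration):
        tau * dy_i/dt = -y_i + g_gap * (y_k - y_i) + I_i
    Only the left neuron receives external input i_left.
    Returns the final membrane potentials (y_left, y_right).
    """
    y_left, y_right = 0.0, 0.0
    for _ in range(steps):
        # Gap-junction current depends on the potential difference y_k - y_i,
        # not on the sigmoidal outputs z.
        gap_into_left = g_gap * (y_right - y_left)
        gap_into_right = g_gap * (y_left - y_right)
        dy_left = (-y_left + gap_into_left + i_left) / tau
        dy_right = (-y_right + gap_into_right) / tau
        y_left += dt * dy_left
        y_right += dt * dy_right
    return y_left, y_right


if __name__ == "__main__":
    for g in (0.0, 1.0):
        y_l, y_r = simulate_gap_pair(i_left=-1.0, g_gap=g)
        print(f"g_gap={g}: y_left={y_l:.3f}, y_right={y_r:.3f}, "
              f"z_right={sigmoid(y_r):.3f}")
```

      Under these assumed dynamics the steady state is y_left = I(1 + g)/(1 + 2g) and y_right = I g/(1 + 2g), so a hyperpolarizing input to one cell pulls its gap-coupled partner negative as well, whereas with g = 0 the partner stays at rest.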

      Appraisal: 

      The authors show that their model can reproduce memory-dependent reversal of preference in klinotaxis, demonstrating that the ASER-to-AIY synapse plays a key role in switching chemotactic preferences. By switching the ASER-AIY connection from excitatory to inhibitory they indeed show that salt preference reverses. They also show that the curving/turn rate underlying the preference change is gradual and depends on the weight between ASER-AIY. They further support their claim by showing that curving rates also depend on cultivated (set-point).  

      We would like to thank the reviewer for assessing our work.

      Thus within the constraints of the hypothesis and the framework, the model operates as expected and aligns with some experimental findings. However, significant omissions of key experimental evidence raise questions on whether the proposed neural mechanisms are sufficient for reversal in salt-preference chemotaxis.  

      We agree with your opinion. The present hypothesis should be verified by experiments.

      Previous work [1] has shown that individually ablating the AIZ or AIY interneurons has essentially no effect on the Chemotactic Index (CI) toward the set point ([1] Figure 6). Furthermore, in [1] the authors report that different postsynaptic neurons are required for movement above or below the set point. The manuscript should address how this evidence fits with their model by attempting similar ablations. It is possible that the CI is rescued by klinokinesis but this needs to be tested on an extension of this model to provide a more compelling argument.  

      We would like to express our gratitude for the constructive feedback we have received. Reference [1] investigated the influence of interneuron ablations on the chemotaxis index (CI). It is important to acknowledge that the experimentally determined CI value encompasses the contributions of both klinokinesis and klinotaxis. It is plausible that the impact of AIY ablation was not reflected in the CI value. Consequently, these experimental observations do not necessarily diminish the role of AIY in klinotaxis. The neural circuit model employed in the present study constitutes a minimal network for salt klinotaxis, encompassing solely interneurons that are connected to each other via the highest number of synaptic connections. Anatomical evidence provided by the database (http://ims.dse.ibaraki.ac.jp/ccep-tool/) substantiates that ASE sensory neurons and AIZ interneurons, which have been demonstrated to play a crucial role in klinotaxis [Matsumoto et al., PNAS 121 (5) e2310735121], have a much higher number of synaptic connections with AIY interneurons. Our model does not take into account redundant interneurons with overlapping roles, and it is therefore not applicable to the study of the effects of interneuron ablation.

      The investigation of dispersal behaviour in starved individuals is rather limited to testing by imposing inhibition of the SMB neurons. Although a circuit is proposed for how hunger states modulate taxis in the absence of food, this circuit hypothesis is not explicitly modelled to test the theory or provide novel insights.  

      As the reviewer noted, the experimentally identified neural circuit that inhibits the SMB motor neurons in starved individuals is not incorporated in our model. Instead of incorporating this circuit explicitly, we examined whether our minimal network model could reproduce dispersal behavior under starvation conditions solely due to the experimentally demonstrated inhibitory effect of SMB motor neurons.

      Impact: 

      This research underscores the value of an embodied approach to understanding chemotaxis, addressing an important memory mechanism that enables adaptive behavior in the sensorimotor circuits supporting C. elegans chemotaxis. The principle of operation - the dependence of motor responses to sensory inputs on the phase of oscillation - appears to be a convergent solution to taxis. Similar mechanisms have been proposed in Drosophila larvae chemotaxis [2], zebrafish phototaxis [3], and other systems. Consequently, the proposed mechanism has broader implications for understanding how adaptive behaviors are embedded within sensorimotor systems and how experience shapes these circuits across species.

      We would like to express our gratitude for the useful suggestion. We added this argument to the Discussion of the revised manuscript as follows.

      “The principle of operation, in which the dependence of motor responses to sensory inputs on the phase of motor oscillation, appears to be a convergent solution for taxis and navigation across species. In fact, analogous mechanisms have been postulated in the context of chemotaxis in Drosophila larvae chemotaxis {Wystrach:2016bt} and phototaxis in zebrafish {Wolf:2017ei}. Consequently, the synaptic reversal mechanism highlighted in this study offers the framework for understanding how the behaviors that are adaptive to the environment are embedded within sensorimotor systems and how experience shapes these neural circuits across species.”

      Although the reported reversal of synaptic connection from excitatory to inhibitory is an exciting phenomenon of broad interest, it is not entirely new, as the authors acknowledge similar reversals have been reported in ASER-to-AIB signaling for klinokinesis (Hiroki et al., 2022). The proposed reversal of the ASER-to-AIY synaptic connection from inhibitory to excitatory is a novel contribution in the specific context of klinotaxis. While the ASER's role in gradient sensing and memory encoding has been previously identified, the current paper mechanistically models these processes, introducing a hypothesis for synaptic plasticity as the basis for bidirectional salt preference in klinotaxis.

      The research also highlights how internal states, such as hunger, can dynamically reshape sensory-motor programs to drive context-appropriate behaviors.  

      The methodology of parameter search on a neural model of a connectome used here yielded the valuable insight that connectome information alone does not provide enough constraints to reproduce the neural circuits for behaviour. It demonstrates that additional neurophysiological constraints are required.  

      We would like to acknowledge the appropriate recognition of our work.

      Additional Context 

      Oscillators with stimulus-driven perturbations appear to be a convergent solution for taxis and navigation across species. Similar mechanisms have been studied in zebrafish phototaxis [3], Drosophila larvae chemotaxis [2], and have even been proposed to underlie search runs in ants. The modulation of taxis by context and memory is a ubiquitous requirement, with parallels across species. For example, Drosophila larvae modulate taxis based on current food availability and predicted rewards associated with odors, though the underlying mechanism remains elusive. The synaptic reversal mechanism highlighted in this study offers a compelling framework for understanding how taxis circuits integrate context-related memory retrieval more broadly.  

      We would like to express our gratitude for the insightful commentary. In the revised manuscript, we incorporated into the Discussion the argument that a similar oscillator mechanism with stimulus-driven perturbations has been observed for zebrafish phototaxis [3] and Drosophila larvae chemotaxis [2].

      As a side note, an interesting difference emerges when comparing C. elegans and Drosophila larvae chemotaxis. In Drosophila larvae, oscillatory mechanisms are hypothesized to underlie all chemotactic reorientations, ranging from large turns to smaller directional biases (weathervaning). By contrast, in C. elegans, weathervaning and pirouettes are treated as distinct strategies, often attributed to separate neural mechanisms. This raises the possibility that their motor execution could share a common oscillator-based framework. Re-examining their overlap might reveal deeper insights into the neural principles underlying these maneuvers. 

      We would like to acknowledge your thoughtfully articulated comment. As the reviewer pointed out, the anatomical database (http://ims.dse.ibaraki.ac.jp/ccep-tool/) shows that the neural circuits underlying weathervaning and pirouettes in C. elegans are predominantly distinct but exhibit partial overlap. When we restrict our search to the neurons that are connected to each other with the highest number of synaptic connections, we identify projections from the weathervaning circuit to the pirouette circuit; however, we observed no projections in the reverse direction. This finding suggests that the neural circuit of weathervaning, namely, our minimal neural network, is not likely to be affected by that of pirouettes, which consists of AIB interneurons and their downstream interneurons and motor neurons.

      (1) Luo, L., Wen, Q., Ren, J., Hendricks, M., Gershow, M., Qin, Y., Greenwood, J., Soucy, E.R., Klein, M., Smith-Parker, H.K., & Calvo, A.C. (2014). Dynamic encoding of perception, memory, and movement in a C. elegans chemotaxis circuit. Neuron, 82(5), 1115-1128. 

      (2) Antoine Wystrach, Konstantinos Lagogiannis, Barbara Webb (2016) Continuous lateral oscillations as a core mechanism for taxis in Drosophila larvae eLife 5:e15504. 

      (3) Wolf, S., Dubreuil, A.M., Bertoni, T. et al. Sensorimotor computation underlying phototaxis in zebrafish. Nat Commun 8, 651 (2017). 

      (4) Izquierdo, E.J. and Beer, R.D., 2013. Connecting a connectome to behavior: an ensemble of neuroanatomical models of C. elegans klinotaxis. PLoS computational biology, 9(2), p.e1002890. 

      Reviewer #2 (Public review): 

      Summary: 

      This study explores how a simple sensorimotor circuit in the nematode C. elegans enables it to navigate salt gradients based on past experiences. Using computational simulations and previously described neural connections, the study demonstrates how a single neuron, ASER, can change its signaling behavior in response to different salt conditions, with which the worm is able to "remember" prior environments and adjust its navigation toward "preferred" salinity accordingly.  

      We would like to express our gratitude for the time and consideration the reviewer has dedicated to reviewing our manuscript.

      Strengths: 

      The key novelty and strength of this paper is the explicit demonstration of computational neurobehavioral modeling and evolutionary algorithms to elucidate the synaptic plasticity in a minimal neural circuit that is sufficient to replicate memory-based chemotaxis. In particular, with changes in ASER's glutamate release and sensitivity of downstream neurons, the ASER neuron adjusts its output to be either excitatory or inhibitory depending on ambient salt concentration, enabling the worm to navigate toward or away from salt gradients based on prior exposure to salt concentration.

      We would like to thank the reviewer for appreciating our research. 

      Weaknesses: 

      While the model successfully replicates some behaviors observed in previous experiments, many key assumptions lack direct biological validation. As to the model output readouts, the model considers only endpoint behaviors (chemotaxis index) rather than the full dynamics of navigation, which limits its predictive power. Moreover, some results presented in the paper lack interpretation, and many descriptions in the main text are overly technical and require clearer definitions.  

      We would like to thank the reviewer for the constructive feedback. As the reviewer noted, the fundamental assumptions posited in the study have yet to be substantiated by biological validation, and consequently, these assumptions must be directly assessed by biological experimentation. The model performance for salt klinotaxis has been evaluated by multiple factors, including not only a chemotaxis index but also the curving rate vs. bearing (Fig. 4a, the bearing is defined in Fig. A3) and the curving rate vs. normal gradient (Fig. 4c). These two parameters work to characterize the trajectory during salt klinotaxis. In the revised version, we meticulously revised the manuscript according to the reviewer’s suggestions. We would like to express our sincere gratitude for your insightful review of our work.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      An interesting and engaging methodology combining theoretical and computational approaches. Overall I found the manuscript up to discussion a difficult read, and I would suggest revising it. I would also recommend introducing the general operating principle of the oscillator with sensory perturbations before jumping into the implementation details of signal propagation specific to C. elegans.

      In order to elucidate the relation between the general operating principle of the oscillator with sensory perturbations and the results shown by the two graphs from the bottom in Fig. 3d, the following statement was added on page 12.

      “It is remarkable that this regulatory mechanism derived via the optimization of the CI has been observed in the context of chemotaxis in Drosophila larvae chemotaxis {Wystrach:2016bt} and phototaxis in zebrafish {Wolf:2017ei}. The principle of operation, in which the dependence of motor responses to sensory inputs on the phase of motor oscillation, therefore, may serve as a convergent solution for taxis and navigation across species.”

      The abstract could benefit from a clarification of terms to benefit a broader audience:  The term "salt klinotaxis" is used without prior introduction or definition. It would be beneficial to briefly explain this term, as it may not be familiar to all readers. 

      Owing to the word limit of the abstract, an explanation of salt klinotaxis could not be included.

      Although ASER is introduced as a right-side head sensory neuron, AIY neurons are not similarly introduced. It may also benefit to introduce here that ASER integrates memory with current salt gradients, tuning its output to produce context-appropriate behaviour.  

      Owing to the word limit of the abstract, we could not add further explanation.

      "it can be anticipated that the ASER-AIY synaptic transmission will undergo a reversal due to alterations in the basal glutamate release": Where is this expectation drawn from? Is it derived from biophysical or is it a functional expectation to explain the network's output constraints?

      As delineated before this sentence, it is derived from a comprehensive consideration of the sensitivity of excitatory/inhibitory glutamate receptors expressed on the postsynaptic AIY interneurons, in conjunction with varying the basal level of glutamate transmission from ASER.

      The statement that the model "revealed the modular neural circuit function downstream of ASE" could be more explicit. What specific insights about the downstream circuit were uncovered? Highlighting one or two key findings would strengthen the impact.

      Owing to the word limit of the abstract, no further details could be added here, but the sentence was revised to “revealed that the circuit downstream of ASE functions as a module that is responsible for salt klinotaxis.” This is because the salt-concentration-dependent behaviors in klinotaxis can be reproduced through the modulation of the ASER-AIY synaptic connections alone, despite the absence of alterations in the neural circuit downstream of AIY.

      I believe the authors should cite Luo et al. 2014, which also studies how chemotactic behaviours arise from neural circuit dynamics, including the dynamic encoding of salt concentration by ASER, and the crucial downstream interaction with AIY for chemotactic actions. 

      We would like to express our gratitude for the useful suggestion. We cited Luo et al. 2014 in the discussion of the limitations of our work.

      The introduction could also be improved for clarity. Specifically in the last paragraph authors should clarify how the observed synchrony of ASER excitation to the AIZ (Matsumoto et al., 2024), validates the resulting network.  

      We would like to express our gratitude for the useful suggestion. We added the following explanation in the last paragraph of the introduction.

      “Specifically, the synchrony of the excitation of the ASER and AIZ {Matsumoto:2024ig} taken together with the experimentally identified inhibitory synaptic transmission between the AIY and AIZ revealed that the ASER-AIY synaptic connections should be inhibitory, which was consistent with the network obtained from the most evolved model.”

      In addition, we added the following explanation after “It was then hypothesized that the ASER-AIY inhibitory synaptic connections are altered to become excitatory due to a decrease in the baseline release of glutamate from the ASER when individuals are cultured under C<sub>cult</sub> < C<sub>test</sub>.”

      This is due to the substantial difference in the sensitivity of excitatory/inhibitory glutamate receptors expressed on the postsynaptic AIY interneurons.

      I would also strongly recommend replacing the term "evolved model", with "Optimized Model" or "Best-Performing Model" to clarify this is a computational optimization process with limitations - optimization through GAs does not guarantee finding global optima.  

      We replaced "evolved model" with "optimized model" in the main and SI text.

      The text overall would benefit from editing for clarity and expression.  

      According to the revisions mentioned above, we revised “best optimized model” as “most optimized model” in the main and SI text.

      The font size on the plot axis in Figures 3 c&d should be increased for readability on the printed page. Label the left/right panel to indicate unconstrained / constrained evolution.  

      As you noted, the font size of the subscript on the vertical axis in Figs 3c and 3d was too small. We have revised the font size of the subscript in Figs. 3c and 3d and also in Fig. 5e. At your suggestion, “unconstrained” and “constrained” have been added as labels to the left and right panels in Fig. 3.

      There is no input/transmission to AIYR to step input in either model shown in Figure 3? 

      As shown in Figs. S1e and S1f, there are transmissions to the AIYR from the ASEL and ASER.

      Supplementary Figure 1 attempts to explain the interactions. There are inconsistent symbols used for inhibition and excitation between network schema (colours) and the z response plots (arrows vs circles), combined with different meanings for red/blue making it very confusing. 

      We could not address the inconsistency in the color of arrows and lines with an ending between Figs. S1c and S1d and Figs. S1a and S1b. On the other hand, Figs. S1e and S1f were revised so that the consistent symbols were used for inhibition, excitation, and electrical gap connections in Figs. S1c-S1f. The same revisions were made for Fig. S7c-S7f.

      Model parameters are given to 15 decimal precision, which seems excessive. Is model performance sensitive to that order? We would expect robustness around those values. The authors should identify relevant orders and truncate parameters accordingly. 

      We examined the influence of the parameter truncation on the trajectory and decided that the parameters with four decimal places were appropriate. According to this, we revised Table A4.
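
      The truncation step described in this response can be sketched as follows. This is a hedged illustration only: it rounds to the nearest value rather than truncating, the parameter names and values are hypothetical placeholders and not entries from Table A4, and the re-simulation of trajectories that the authors used to judge sensitivity is not reproduced here.

```python
def round_parameters(params, decimals=4):
    """Return a copy of the parameter dict rounded to `decimals` decimal places."""
    return {name: round(value, decimals) for name, value in params.items()}


def max_rounding_error(params, decimals=4):
    """Largest absolute change introduced by rounding any single parameter."""
    rounded = round_parameters(params, decimals)
    return max(abs(params[name] - rounded[name]) for name in params)


if __name__ == "__main__":
    # Hypothetical parameters quoted to 15 decimal places, for illustration only.
    full_precision = {
        "w_hypothetical": 1.234567890123456,
        "theta_hypothetical": -0.987654321098765,
    }
    print(round_parameters(full_precision))
    print(max_rounding_error(full_precision))
```

      Rounding to four decimals changes each parameter by at most 0.00005; whether a perturbation of that size leaves the simulated trajectories unchanged has to be checked by re-running the model, as the authors report having done.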

      Figure 3 caption typo "step changes I the salt concentration".  

      The typo was revised in Fig. 3 caption. 

      Reviewer #2 (Recommendations for the authors): 

      (1) Overall, the language of the paper is not properly organized, making the paper's logic and purpose hard to follow. In the Results Section, many observations or findings lack explicit interpretation. To address this issue, the authors should consider (1) adopting the context-content-conclusion scheme, (2) optimizing the logic flow by clearly identifying the context and goals prior to discussing their results and findings, (3) more explicitly interpreting their results, especially in a biological context.

      We are grateful for the helpful suggestions. Following the suggestions listed below, we revised the main and SI texts.

      (2) In Figure 2, trajectories from the model with AIY-AIZ constraints show a faster convergence than those from the constraint-free model. However, in the corresponding texts in the Results section, the authors claimed no significant difference. It seems that the authors made this argument only based on CI (Chemotaxis Index). Therefore, in order to address such inconsistency, the authors need more explanation on why only relying on CI, which is an endpoint metric, instead of the whole navigation.  

      I would like to thank you for the helpful comment. In the present study, not only the CI but also the curving rate shown in Fig. 4 was used to characterize the behavior in klinotaxis.

      According to your comments, we revised the related description in the main text as follows:

      “The difference between these CI values is slight, while the model optimized with the constraints exhibits a marginally accelerated attainment of the salt concentration peak, as shown by the trajectories. The slightly higher chemotaxis performance observed in the constrained model is not essentially attributed to the introduction of the AIY-AIZ synaptic constraints but rather depends on the specific individuals selected from the optimized individuals obtained from the evolutionary algorithm. In fact, even when the AIY-AIZ constraints are taken into consideration, the model retains a significant degree of freedom to reproduce salt klinotaxis due to the presence of a substantial parameter space. Consequently, the impact of the AIY-AIZ constraints on the optimization of the CI is expected to be negligible.”

      (3) In Figures 3a and b, some inter-neuron connections are relatively weak (e.g., AIYR to AIZR in Figure 3a) - thus it is unclear whether the polarity of such synapses would significantly influence the behavioral outcome or not. The authors could consider plotting the change of the connection strengths between neurons over the course of model optimization to get a sense of confidence in each inter-neuron connection. 

      In the evolutionary algorithm, the parameters of individuals vary discontinuously under selection, crossover, and mutation. Because this variation is non-systematic, it is not straightforward to extract information about parameter optimization from the parameter changes.

      (4) In Figure 3, the order of individual figure panels is incorrect: in the main text, Figure 3 a and b were mentioned after c and d. Also, the caption of Figure 3c "negative step changes I the" should be "in".  

      The main text was revised so that Figures 3a and 3b are described before Figures 3c and 3d. The typo was corrected.

      (5) In Figure 4, the order of individual figure panels is messed up: in the main text, Figure 4 a was mentioned after b.  

      The main text was revised so that Figure 4a is described before Figure 4b.

      (6) Also in Figure 4, the authors need to provide a definition/explanation of "Bearing" and "Translational Gradient". In Figure 4d, the definition of positive and negative components is not clear.  

      For the definition and explanation of the bearing and the translational gradient, we now refer to the “Normal and Translational Salt Concentration Gradient” subsection of the Methods. We added the following explanation of the positive and negative components.

      “The positive and negative components of the curving rate are respectively sampled from the trajectory during leftward turns (as illustrated in Fig. 4b) and rightward turns, respectively.”

      (7) Figure 5: the authors need to explain why c has an error bar and how they were calculated, as this result is from a computational model. Figure 5d is experimental results - the authors need to add error bars to the data points and provide a sample size. 

      As explained in the “Analysis of the Salt Preference Behavior in Klinotaxis” subsection of the Methods, the ensemble average of these quantities was determined by performing 100,000 sets of the simulation with randomized initial orientation, each for a simulation time of T_sim = 200 s. Error bars for the experimental data were added in Figs. 5c, 6a, and S9a.
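
      The way error bars arise from a deterministic model with randomized initial conditions can be sketched as follows. This is only an illustration: `run_once` is a hypothetical stand-in for a single simulation, the readout and noise level are invented, and the standard error of the mean is used here as an assumed choice of error measure (the response above does not state which measure was plotted). A smaller ensemble than 100,000 is used to keep the example fast.

```python
import math
import random
import statistics


def run_once(initial_orientation, rng):
    """Hypothetical stand-in for one simulation run; returns a scalar readout."""
    # Toy readout: depends on the initial orientation plus a little noise.
    return math.cos(initial_orientation) + rng.gauss(0.0, 0.1)


def ensemble_stats(n_runs, seed=0):
    """Mean and standard error of the mean over runs with random orientations."""
    rng = random.Random(seed)
    values = [
        run_once(rng.uniform(0.0, 2.0 * math.pi), rng) for _ in range(n_runs)
    ]
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n_runs)
    return mean, sem


mean, sem = ensemble_stats(10_000)
print(f"mean = {mean:.4f}, SEM = {sem:.4f}")
```

      Because the standard error shrinks with the square root of the number of runs, very large ensembles give error bars that can be smaller than the plotted symbols.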

      (8) On Page 14, the authors said, "To this end, this end, we used the best evolved network with the constraints, in which we varied the synaptic connections between ASER and AIY from inhibitory to excitatory." How did the model change the ASER-AIY signaling specifically? The authors should provide more explanation or at least refer to the Methods Section.  

      We now refer to the caption of Fig. S4 for the detailed method.

      (9) Page 15: "a subset a subset exhibited a slight curve...". This observation from the model simulation is contradictory to experiments. However, their explanation of that is hard to understand.  

      I would like to thank you for the helpful comment. To improve this, we added the following explanation:

      “In the case of step increases in 𝑧OFF as illustrated in the second right panel from the bottom in Fig.3d, the turning angle φ is increased from its ideal oscillatory component to a value close to zero, causing the model worm to deviate from the ideal sinusoidal trajectory and gradually turn toward lower salt concentrations. On the other hand, in the case of step increases in 𝑧ON as illustrated in the second left panel from the bottom in Fig.3d, the turning angle φ is again increased from its ideal oscillatory component to a value close to zero, causing the model worm to deviate from the ideal sinusoidal trajectory and gradually turn toward higher salt concentrations. The behaviors that are consistent with these analyses are observed in the trajectory illustrated in Fig. S8b.”

      (10) Last result section: inhibited SMB in starved worms is due to a mechanism unrelated to their neural network model upstream to SMB. Therefore, their results recapitulating the worms' dispersal behaviors cannot strengthen the validity of their model.

      We agree with your opinion. We think that the findings from the study of starved worms do not provide evidence to validate the neural network model upstream of SMB.   

      (11) Discussion: "in contrast, the remaining neurons...". This argument lacks evidence or references.  

      This argument is based on the results obtained from the present study. This sentence was revised as follows:

      “This regulatory process enables the reproduction of salt concentration memory-dependent reversal of preference behavior in klinotaxis, despite the remaining neurons further downstream of the ASER not undergoing alterations and simply functioning as a modular circuit to transmit the received signals to the motor systems. Consequently, the sensorimotor circuit allows a simple and efficient bidirectional regulation of salt preference behavior in klinotaxis.”

      (12) To increase the predictive power of their model, can the authors perform simulations on mutant worms, like those with altered glutamate basal level expression in ASER?  

      We would like to express our gratitude for the useful suggestion. The simulations in which the weight of the ASER-AIY synaptic connection is increased from negative (inhibitory connection) to positive (excitatory connection), as illustrated in Figure S4, provide insight into the relationship between varying basal glutamate levels from ASER and behavior in klinotaxis, as measured for example by the chemotaxis index.

    1. eLife Assessment

      This study presents a valuable finding on the molecular mechanisms that govern GABAergic inhibitory synapse function. The authors propose that Endophilin A1 serves as a novel regulator of GABAergic synapses by acting as a component of the inhibitory postsynaptic density. The authors have added substantial new analyses that take a wide range of approaches to provide solid support for their conclusions, although one of the reviewers concludes that the premise that gephyrin and endophilin A1 interact requires more robust analysis. The findings are likely to interest a broad audience of scientists focusing on inhibitory synaptic transmission, the excitation-inhibition balance, and its disruption in disorders such as epilepsy.

    2. Reviewer #1 (Public review):

      Summary:

      In the present study, Chen et al. investigate the role of Endophilin A1 in regulating GABAergic synapse formation and function. To this end, the authors use constitutive or conditional knockout of Endophilin A1 (EEN1) to assess the consequences on GABAergic synapse composition and function, as well as the outcome for PTZ-induced seizure susceptibility. The authors show that EEN1 KO mice show a higher susceptibility to PTZ-induced seizures, accompanied by a reduction in the GABAergic synaptic scaffolding protein gephyrin as well as specific GABAAR subunits and eIPSCs. The authors then investigate the underlying mechanisms, demonstrating that Endophilin A1 binds directly to gephyrin and GABAAR subunits, and identifying the subdomains of Endophilin A1 that contribute to this effect. Overall, the authors state that their study places Endophilin A1 as a new regulator of GABAergic synapse function.

      Strengths:

      Overall, the topic of this manuscript is very timely, since there has been substantial recent interest in describing the mechanisms governing inhibitory synaptic transmission at GABAergic synapses. The study will therefore be of interest to a wide audience of neuroscientists studying synaptic transmission and its role in disease. The manuscript is well written and contains a substantial quantity of data. In the revised version of the manuscript, the authors have increased the number of samples analyzed and have significantly improved the statistical analysis, thereby substantially strengthening the conclusions of their study.

      Comments on revised version:

      The authors have addressed all of my concerns, and the manuscript has been substantially improved.

    3. Reviewer #2 (Public review):

      Summary:

      The function of neural circuits relies heavily on the balance of excitatory and inhibitory inputs. Particularly, inhibitory inputs are understudied when compared to their excitatory counterparts due to the diversity of inhibitory neurons, their synaptic molecular heterogeneity, and their elusive signature. Thus, insights into these aspects of inhibitory inputs can inform us largely on the functions of neural circuits and the brain.

      Endophilin A1, an endocytic protein heavily expressed in neurons, has been implicated in numerous pre- and postsynaptic functions, however largely at excitatory synapses. Thus, whether this crucial protein plays any role at inhibitory synapses, and whether this regulates functions at the synaptic, circuit, or brain level remains to be determined.

      New Findings:

      (1) Endophilin A1 interacts with the postsynaptic scaffolding protein gephyrin at inhibitory postsynaptic densities within excitatory neurons.

      (2) Endophilin A1 promotes the organization of the inhibitory postsynaptic density and the subsequent recruitment/stabilization of GABA A receptors via Endophilin A1's membrane binding and actin polymerization activities.

      (3) Loss of Endophilin A1 in CA1 mouse hippocampal pyramidal neurons weakens inhibitory input and leads to susceptibility to epilepsy.

      (4) Thus the authors propose that via its role as a component of the inhibitory postsynaptic density within excitatory neurons, Endophilin A1 supports the organization, stability, and efficacy of inhibitory input to maintain the excitatory/inhibitory balance critical for brain function.

      (5) The conclusion of the manuscript is well supported by the data but will be strengthened by addressing our list of concerns and experiment suggestions.

      Comments on revised version:

      The authors addressed the concerns adequately. The three remaining concerns are:

      (1) The use of one-way ANOVA is not well justified.

      (2) The use of superplots to show culture to culture variability would make it more transparent.

      (3) Change EEN1 in Figure 8B to EndoA1.

    4. Reviewer #3 (Public review):

      Summary:

      The authors investigated a possible role of Endophilin A1 in the inhibitory postsynaptic density.

      Strengths:

      The authors used a broad array of experimental approaches to investigate this, including tests of seizure susceptibility, electrophysiology, biochemistry, neuronal culture and image analysis.

      Weaknesses:

      Many results are difficult to interpret, and data quality is not always convincing, unfortunately. The basic premise of the study, that gephyrin and endophilin A1 interact, requires more robust analysis to be convincing.

      Specific comments:

      The authors have made a substantial effort to improve their manuscript. A number of issues, related to numbers of observations mentioned by the reviewers, are clarified in the revised manuscript. The authors have also clarified some of the other questions from the reviewers. The long list of issues brought up by the reviewers and the many corrections needed still raise questions about data quality in this manuscript.

      In response to my comments (Point 2), the added experiment with PSD95.FingR and GPN.FingR in cultured neurons (Fig. S5A-D) is a good addition; the in vivo data using FingRs in Figure S3 look less convincing however. In response to my Point 5, the authors have added a cell-free binding assay (Figure 5I). This is a useful addition, but to convincingly make the point of interaction between Gephyrin and EndoA1, more rigorous biophysical quantitation of binding is needed. The legend in Figure 5I states that 4 independent experiments were performed, but the graph only shows 3 dots. This needs to be corrected.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the present study, Chen et al. investigate the role of Endophilin A1 in regulating GABAergic synapse formation and function. To this end, the authors use constitutive or conditional knockout of Endophilin A1 (EEN1) to assess the consequences on GABAergic synapse composition and function, as well as the outcome for PTZ-induced seizure susceptibility. The authors show that EEN1 KO mice show a higher susceptibility to PTZ-induced seizures, accompanied by a reduction in the GABAergic synaptic scaffolding protein gephyrin as well as specific GABAAR subunits and eIPSCs. The authors then investigate the underlying mechanisms, demonstrating that Endophilin A1 binds directly to gephyrin and GABAAR subunits, and identifying the subdomains of Endophilin A1 that contribute to this effect. Overall, the authors state that their study places Endophilin A1 as a new regulator of GABAergic synapse function.

      Strengths:

      Overall, the topic of this manuscript is very timely, since there has been substantial recent interest in describing the mechanisms governing inhibitory synaptic transmission at GABAergic synapses. The study will therefore be of interest to a wide audience of neuroscientists studying synaptic transmission and its role in disease. The manuscript is well-written and contains a substantial quantity of data.

      Weaknesses:

      A number of questions remain to be answered in order to be able to fully evaluate the quality and conclusions of the study. In particular, a key concern throughout the manuscript regards the way that the number of samples for statistical analysis is defined, which may affect the validity of the data analysed. Addressing this weakness will be essential to providing conclusive results that support the authors' claims.

      We would like to thank the reviewer for appreciating the value of our study and for the careful critique, which will help us improve the manuscript. We will correct the way that the number of samples for statistical analysis is defined throughout the manuscript as suggested and update figures, figure legends, and Materials and Methods accordingly. For example, we will average the values for all dendritic segments from one neuron, so that each data point represents one neuron in the graphs.

      Reviewer #2 (Public review):

      Summary:

      The function of neural circuits relies heavily on the balance of excitatory and inhibitory inputs. Particularly, inhibitory inputs are understudied when compared to their excitatory counterparts due to the diversity of inhibitory neurons, their synaptic molecular heterogeneity, and their elusive signature. Thus, insights into these aspects of inhibitory inputs can inform us largely on the functions of neural circuits and the brain.

      Endophilin A1, an endocytic protein heavily expressed in neurons, has been implicated in numerous pre- and postsynaptic functions, however largely at excitatory synapses. Thus, whether this crucial protein plays any role at inhibitory synapses, and whether this regulates functions at the synaptic, circuit, or brain level remains to be determined.

      New Findings:

      (1) Endophilin A1 interacts with the postsynaptic scaffolding protein gephyrin at inhibitory postsynaptic densities within excitatory neurons.

      (2) Endophilin A1 promotes the organization of the inhibitory postsynaptic density and the subsequent recruitment/stabilization of GABA A receptors via Endophilin A1's membrane binding and actin polymerization activities.

      (3) Loss of Endophilin A1 in CA1 mouse hippocampal pyramidal neurons weakens inhibitory input and leads to susceptibility to epilepsy.

      (4) Thus the authors propose that via its role as a component of the inhibitory postsynaptic density within excitatory neurons, Endophilin A1 supports the organization, stability, and efficacy of inhibitory input to maintain the excitatory/inhibitory balance critical for brain function.

      (5) The conclusion of the manuscript is well supported by the data but will be strengthened by addressing our list of concerns and experiment suggestions.

      We would like to thank the reviewer for their favorable impression of the manuscript. We also appreciate the valuable experiment suggestions, which will help us improve the manuscript.

      Weaknesses:

      Technical concerns:

      (1) Figure 1F and Figure 1H, Figures 7H,J:

      Can the authors justify using a paired-pulse interval of 50 ms for eEPSCs and an interval of 200 ms for eIPSCs? Otherwise, experiments should be repeated using the same paired pulse interval.

      We apologize for the confusion. As illustrated by the schematic current traces, the decay time constants of eEPSCs and eIPSCs in hippocampal CA1 neurons are different. The eEPSCs exhibit a faster channel closing rate, corresponding to a smaller time constant Tau. Thus, a shorter inter-stimulus interval (50 ms) was chosen for paired-pulse ratio recordings. In contrast, the eIPSCs display a slower channel closing rate, with a Tau value larger than that of eEPSCs, so a longer inter-stimulus interval (200 ms) was used for PPR. This protocol has been long-established and adopted in previous studies (please see below for examples).

      Contractor, A., Swanson, G. & Heinemann, S. F. Kainate receptors are involved in short- and long-term plasticity at mossy fiber synapses in the hippocampus. Neuron 29, 209-216, doi:10.1016/s0896-6273(01)00191-x (2001).

      Babiec, W. E., Jami, S. A., Guglietta, R., Chen, P. B. & O'Dell, T. J. Differential Regulation of NMDA Receptor-Mediated Transmission by SK Channels Underlies Dorsal-Ventral Differences in Dynamics of Schaffer Collateral Synaptic Function. Journal of neuroscience 37, 1950-1964, doi:10.1523/JNEUROSCI.3196-16.2017 (2017).

      (2) Figures 3G,H,I:

      While 3D representations of proteins of interest bolster claims made by superresolution microscopy, SIM resolution is unreliable when deciphering the localization of proteins at the subsynaptic level given the small size of these structures (<1 micrometer). In order to determine the actual location of Endophilin A1, especially given the known presynaptic localization of this protein, the authors should complete SIM experiments with a presynaptic marker, perhaps an active zone protein, so that the relative localization of Endophilin A1 can be gleaned. Currently, overlapping signals could stem from the presynapse given the poor resolution of SIM in this context.

      Thanks for your suggestions. It is certainly preferable to investigate the relative localization of endophilin A1 using both presynaptic and postsynaptic markers. For SIM imaging in Figure 3G-I, to visualize neuronal morphology, we immunostained GFP as cell fill, leaving two other channels for detection of immunofluorescent signals of endophilin A1 and another protein. We will try co-immunostaining of endophilin A1, the active zone protein bassoon (presynaptic marker) and gephyrin without morphology labeling. Alternatively, we will do co-staining of endophilin A1 and bassoon in GFP-expressing neurons. We agree that overlapping signals or proximal localization of presynaptic endophilin A1 with gephyrin or GABA<sub>A</sub>R γ2 cannot be ruled out. Of note, if image resolution is improved with the use of a more advanced imaging system, the overlap between two proteins will become smaller or even disappear. With the ~110 nm lateral resolution of SIM microscopy, the degree of overlap between the two proteins of interest is much lower than in confocal microscopy. Given the presynaptic localization of endophilin, we will most likely observe a small overlap (presynaptic) or proximal localization (postsynaptic) of endophilin A1 with bassoon. Nevertheless, we will complete the SIM experiments as suggested to improve the manuscript.

      Manuscript consistency:

      (1) Figure 2:

      The authors looked at VGAT and noticed a reduction of signals in hippocampal regions in their P21 slices, indicating that the proposed postsynaptic organization/stabilization functions of Endophilin A1 extend to the inhibitory presynapse, perhaps via Neuroligin 2-Neurexin. Simultaneously, hippocampal regions in P21 slices showed a reduction in PSD-95 signals, indicating that excitatory synapses are also affected. It would be crucial to also look at excitatory presynapses, via VGLUT staining, to assess whether EndoA1 -/- also affects presynapses. Given the extensive roles of Endophilin A1 in presynapses, especially in excitatory presynapses, this should be investigated.

      Thanks for the thoughtful comments. Given that both VGAT and PSD95 signals are reduced in hippocampal regions in P21 slices, it is conceivable that the proposed postsynaptic organization/stabilization functions of endophilin A1 extend to the inhibitory presynapse via Neuroligin-2-Neurexin and to the excitatory presynapse as well during development. Of note, endophilin A1 knockout did not impair the distribution of Neuroligin-2 in inhibitory postsynapses (immunoisolated with anti-GABA<sub>A</sub>R α1) in mature mice (Figure 3K), and endophilin A1 did not bind to Neuroligin-2 (Figure 4D), suggesting that endophilin A1 might function via other mechanisms. Nevertheless, as functions of endophilin A family members at the presynaptic site are well-established, the reduction of presynaptic signals in developing hippocampal regions of EndoA1<sup>-/-</sup> mice might result from the depletion of presynaptic endophilin A1. The presynaptic deficits may be compensated by other mechanisms as neurons mature. We will certainly do VGLUT staining of EndoA1<sup>-/-</sup> brain slices as suggested to assess the role of endophilin A1 in excitatory presynapses in vivo.

      (2) Figure 7C:

      The authors do not assess whether p140Cap overexpression rescues the GABAAR loss exhibited in Endophilin A1 KO, as they did for gephyrin. This would be an important data point to show, as p140Cap may somehow rescue receptor loss by another pathway. In fact, it is mentioned in the text that this experiment was done, "Consistently, neither p140Cap nor the endophilin A1 loss-of-function mutants could rescue the GABAAR clustering phenotype in EEN1 KO neurons (Figure 7C, D)" yet the data for p140Cap overexpression seem to be missing. This should be remedied.

      Thanks a lot for the thoughtful comment. We will determine whether p140Cap overexpression also rescues the GABA<sub>A</sub>R clustering phenotype in EndoA1<sup>-/-</sup> neurons by surface GABA<sub>A</sub>R γ2 staining in our revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      Chen et al. identify endophilin A1 as a novel component of the inhibitory postsynaptic scaffold. Their data show impaired evoked inhibitory synaptic transmission in CA1 neurons of mice lacking endophilin A1, and an increased susceptibility to seizures. Endophilin can interact with the postsynaptic scaffold protein gephyrin and promote assembly of the inhibitory postsynaptic element. Endophilin A1 is known to play a role in presynaptic terminals and in dendritic spines, but a role for endophilin A1 at inhibitory postsynaptic densities has not yet been described.

      Strengths:

      The authors used a broad array of experimental approaches to investigate this, including tests of seizure susceptibility, electrophysiology, biochemistry, neuronal culture, and image analysis.

      Weaknesses:

      Many results are difficult to interpret, and the data quality is not always convincing, unfortunately. The basic premise of the study, that gephyrin and endophilin A1 interact, requires a more robust analysis to be convincing.

      We greatly appreciate the positive comment on our study and the very valuable feedback, which will help us improve the manuscript. We will conduct additional experiments to improve our data quality and strengthen our evidence according to these constructive suggestions. To obtain strong evidence for the interaction between endophilin A1 and gephyrin, we will perform an in vitro pull-down assay with recombinant proteins from a bacterial expression system.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) For all of the electrophysiology experiments, only the number of neurons recorded is stated, but not the number of independent animals that these neurons were obtained from. The number of independent animals used should be stated for each panel. At least 3 independent animals should be used in each group, otherwise, more data needs to be added.

      We apologize for missing the information in the original manuscript. For all electrophysiological experiments, data were obtained from more than 3 experimental animals. The figure legends were updated to include the number of independent animals used for each panel.

      (2) For the cell culture experiments analyzing dendritic puncta at GABAergic synapses, the number of data points analysed appears to be the number of dendritic segments quantified, regardless of whether they originate from the same neuron or not. This analysis method is not valid, since dendritic segments from the same neuron cannot be counted as statistically independent samples. The authors need to average the values for all dendritic segments from one neuron, such that one neuron equals one data point. This alteration should be made for Figures 2B, 2D, 4H, 4J, 5B, 5C, 5E, 5J, 5L, 6B, 6D, 6F, 6H, 6J, 6K,7B, and 7D. In addition, the number of independent cultures from which the neurons were obtained should be stated for each panel. At least 3 independent cultures should be used in each group, otherwise, more data need to be added.

      Thanks for the criticism. We reanalyzed the data throughout the manuscript as suggested and updated the figure legends accordingly. Moreover, we increased the number of neurons from independent experiments to further confirm the results in our revised manuscript.

      In the revised manuscript, we averaged the values for all dendritic segments from a single neuron and updated the data in Figures 3B, 3D, 4H, 4J, 5B, 5C, 5E, 5K, 5M, 6B, 6D, 6F, 6H, 6J, 6K, 7B, and 7D.
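
      The per-neuron averaging described above can be sketched as follows. The function name, the neuron identifiers, and the values are all hypothetical and serve only to illustrate how several segment measurements collapse to one data point per neuron.

```python
from collections import defaultdict
from statistics import fmean


def average_per_neuron(segments):
    """Collapse (neuron_id, value) segment measurements to one mean per neuron."""
    by_neuron = defaultdict(list)
    for neuron_id, value in segments:
        by_neuron[neuron_id].append(value)
    return {neuron_id: fmean(values) for neuron_id, values in by_neuron.items()}


# Invented puncta densities: three dendritic segments of n1 and two of n2.
segments = [("n1", 4.0), ("n1", 6.0), ("n1", 5.0), ("n2", 2.0), ("n2", 4.0)]
per_neuron = average_per_neuron(segments)
print(per_neuron)
```

      With this grouping, five segment measurements become two statistically independent data points, so the sample size reported for the test is the number of neurons rather than the number of segments.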

      Neurons analyzed in each group were derived from at least 3 independent cultures. Because the efficiency of sparse transfection in primary cultured hippocampal neurons is very low, multiple experimental repetitions were necessary to obtain a sufficient number of neurons for analysis. We described the statistical analysis in the “Material and Methods” section of the original manuscript as follows:

      “For all biochemical, cell biological and electrophysiological recordings, at least three independent experiments were performed (independent cultures, transfections or different mice).”

      (3) Individual data points should be shown on all graphs, particularly in Figures 2C, 2F, 2I, 3F, 3K, and 3L.

      Thank you for the suggestion. We replaced the original graphs with scatterplots and mean ± S.E.M. in new Figures.

      (4) For each experiment, the authors should state explicitly in the methods section whether that experiment was conducted blind to genotype.

      Thank you for the suggestion. In the revised manuscript, we modified the description of blinded analysis for each experiment in the Methods section, for example: “Seizure susceptibility was measured blindly by rating seizures on a scale of 0 to 7 as follows…” and “Quantification of immunostaining were carried out blindly…”.

      (5) For each experiment, the authors should state whether they used male or female mice, and what age the mice were at the time of the experiment

      Thanks a lot for the suggestion. We use both male and female mice for neuron culture and behavioral tests. We observed no sex-related differences in PTZ-induced behaviors, so the results were pooled.

      Regarding ages, P0 pups were used for hippocampal neuron cultures and for virus injection in the electrophysiological recording and FingR probe assays. P14-21 mice were used for electrophysiological recording, immunofluorescent staining, and FingR probe detection in brain slices, while adult mice (P60) were used for behavioral tests, immunofluorescent staining in brain slices, and biochemical assays. In the revised manuscript, we modified the descriptions of the sex and age of the mice in the Methods section to read: “To evaluate seizure susceptibility, 8-10-week-old male and female EndoA1<sup>+/+</sup> or EndoA1<sup>-/-</sup> littermates or EndoA1<sup>fl/fl</sup> littermates were intraperitoneally administered… ”, “For virus injection, 8-9-week-old naive male and female littermates were anesthetized…”, “Male and female littermates (P21 or P60) were anesthetized and immediately perfused…”, “Hippocampi of female or male pups (P0) were rapidly dissected under sterile conditions…”, “PSD fractions from adult mouse brain were prepared as previously described…”, “Newborn EndoA1<sup>fl/fl</sup> littermates (male or female) were anesthetized on ice for 4-5 min…”.

      (6) For each experiment involving WT and KO mice, please state whether WTs and KOs were bred as littermates from heterozygous breeders

      Sorry for the confusion. In our study, EndoA1<sup>+/+</sup> and EndoA1<sup>-/-</sup> mice were bred as littermates from heterozygous breeders. We added the information in methods section as follows in our revised manuscript, “EndoA1<sup>+/+</sup> and EndoA1<sup>-/-</sup> mice were bred as littermates from heterozygous breeders…”, “To evaluate seizure susceptibility, 8-10-week-old male and female EndoA1<sup>+/+</sup> or EndoA1<sup>-/-</sup> littermates or EndoA1<sup>fl/fl</sup> littermates…”, “For virus injection, 8-9-week-old naive male and female littermates were anesthetized…”, “Male and female littermates (P21 or P60) were anesthetized and immediately perfused…”, “For co-IP from brain lysates, the whole brain from 8-10-week-old WT and KO littermates were dissected…”, “Newborn EndoA1<sup>fl/fl</sup> littermates (male or female) were anesthetized on ice for 4-5 min…”.

      (7) For experiments comparing three or more groups, the authors claim in the methods section to have used a one-way ANOVA for statistical analysis. However, no ANOVA values are given, only the post-hoc tests. Please add the ANOVA values for each experiment before stating the values of the post-hoc analysis.

      Sorry for the missing information. We used one-way ANOVA for comparing three or more groups in the original manuscript and have changed to two-way ANOVA for behavior data analysis in our revised manuscript as suggested in Recommendations (18). We added the ANOVA values (F & p values) for each experiment in new figures. For example, see Figure 1C.

      (8) In Figure 1A-C, seizure susceptibility was compared in EEN+/+ and EEN-/- mice, but the methods section states that seizure susceptibility was evaluated in 8-10-week-old male C57BL/6N mice (line 513). Was this meant to indicate that the EEN+/+ and EEN-/- mice were on a C57BL/6N background? How does this match with the statement that EEN1 -/- mice were generated on a C57BL/6J background (line 467)?

      We apologize for the mistake. In our study, EEN1<sup>-/-</sup> mice were generated on a C57BL/6J background, as stated in our previously published papers (Yang et al., 2021; Yang et al., 2018) and in “Animals” in the Material and Methods of our original manuscript. We have corrected the statement to “To evaluate seizure susceptibility, 8-10-week-old male and female EndoA1<sup>+/+</sup> or EndoA1<sup>-/-</sup> littermates…” in the Material and Methods of the revised manuscript.

      (9) In the electrophysiology experiments in Figure 1E-O, it is not clear to me which neurons were recorded in the control group. The methods section states that "Whole-cell recordings were performed on an AAV-infected neuron and a neighboring uninfected neuron" (line 736). However, the figure legends states that recordings were obtained from "10 control (Ctrl, mCherry alone) and 10 EEN1 KO (mCherry and Cre) pyramidal neurons" (line 1079), which would indicate that the controls are not uninfected neurons from the same animal, but AAV-mCherry infected neurons from a different animal. Please clarify which of the two descriptions is accurate.

      Thanks for catching the error! In all electrophysiological experiments, a neighboring uninfected neuron was used as the control in Figure 1E-O. This was incorrectly stated in the figure legend of the original manuscript. In the revised manuscript, the information has been corrected in figure legends of new Figure 1 (E-F).

      (10) The authors show that in Endophilin A1 KO animals, eIPSCs are reduced, but mIPSC frequency and amplitude are unaltered. How do they explain this finding in the context of the fact that gephyrin and GABAAR1.

      We apologize for the confusion about the electrophysiological recording data. eIPSCs are recorded in response to electrically evoked action potentials, which elicit a substantial release of neurotransmitter. In contrast, mIPSCs are small, spontaneous currents recorded in the presence of TTX during patch-clamp experiments, resulting from the release of neurotransmitter from presynaptic terminals in the absence of action potentials. The amplitude of mIPSCs typically reflects the quantal release of neurotransmitter, while their frequency can vary depending on synaptic activity and the state of the neuron.

      A number of molecules fine-tune presynaptic neurotransmitter release and the function of inhibitory postsynaptic receptors. In our study, inhibitory postsynapses were partially affected in endophilin A1 knockout neurons, while presynaptic endophilin A1 remained intact during electrophysiological recordings. Conceivably, the observed deficits in endophilin A1 knockout mice were mild. Following endophilin A1 depletion, inhibitory postsynaptic receptors appeared sufficient to respond to spontaneous neurotransmitter release but may be inadequate to respond to the large amount of neurotransmitter released upon an evoked action potential. Meanwhile, spontaneous synaptic activity and the state of the neuron were not obviously affected under basal conditions by endophilin A1 depletion during postnatal stages. Consequently, mIPSC frequency and amplitude remained unaltered, whereas eIPSCs were reduced compared to control neurons. This finding is consistent with the behavioral experiments, in which aggressive epileptic behaviors in endophilin A1 knockout mice were induced by PTZ rather than arising spontaneously.

      (11) Distribution of gephyrin, VGAT, and GABAARg2 differs substantially between the different layers of hippocampal area CA1, and the same goes for the other regions of the hippocampus. However, in Figure 2, it is not clear to me from the sample images which layers of each subregion the authors quantified, or indeed whether they paid attention to which layers they included in their analysis. This can lead to a substantial skewing of the data if different layers were preferentially included in the two genotypes. Please clarify which layers were analysed, and how comparability between WTs and KOs was ensured. This is particularly important given the authors' claim that Endophilin A1 acts equally at all subtypes of GABAergic synapses (lines 373- 376).

      Thank you for the careful comment. We distinguished each hippocampal subregion based on the anatomical structure in brain slices. Quantification of the mean fluorescence intensity of each synaptic protein across all layers of each subregion, as shown in new Figure 2 and Figure S2A-F, revealed that GABAergic synaptic proteins were reduced in both P21 and P60 KO mice.

      We further analyzed the fluorescent signal of core postsynaptic component, gephyrin, in individual layers of each subregion in the hippocampus of mature WT and KO mice, as presented in new Figures S2G-H. Our findings demonstrated a decrease in gephyrin levels across all layers of each subregion in KO mice. Additionally, we examined gephyrin clustering across the soma, axon initial segment (AIS), and dendrites in cultured mature endophilin A1 knockout hippocampal neurons, as shown in new Figure S5E-H. The results showed that gephyrin was affected in all subcellular regions following endophilin A1 knockout.

      Collectively, these data suggest that endophilin A1 functions across all subtypes of GABAergic postsynapses.

      (12) In Figure 3E-F, the authors state that there was no change in the total level of synaptic neurons in EEN1 KO neurons (line 188). However, there is no quantification of the total level of synaptic neurons shown, and based on the immunoblot in Figure 3E, it looks like there is a substantial reduction in NR1, NL2, and g2. The authors should present a quantification of the total levels of these proteins and adjust their statement accordingly if necessary.

      Thanks a lot for your comments. We quantified the total protein levels in Figure 3E and added the result to new Figure 3F, showing that total protein levels were not obviously affected in cultured KO neurons. When normalized to total protein levels, the surface levels of GABA<sub>A</sub> receptors were significantly compromised compared to surface GluN1 and NL2. Furthermore, the total protein levels were not affected in brains of KO mice, as shown in Figures 3K (input) and 3L (S1). Collectively, there was no change in the total level of synaptic proteins in KO neurons.

      (13) In Figure 3G-I, the authors claim, based on super-resolution images as presented here, that Endophilin A1 colocalizes with gephyrin and g2. However, no quantification of this colocalization is presented. The authors should add this quantification to support their claim and indicate how many GABAergic synapses contain Endophilin A1.

      Thank you for the thoughtful comments. The resolution of the images is significantly improved by super-resolution microscopy. As a result, the overlap between the two proteins will become smaller or even disappear. Since no two proteins can occupy the same physical space, they would show lower colocalization and instead exhibit proximal localization. As expected, in Figures 3G and 3H, we observed only small overlap or proximal localization of endophilin A1 with gephyrin or GABA<sub>A</sub>R γ2. To further confirm the localization of endophilin A1 in inhibitory synapses, we co-stained endophilin A1 with both pre- and post-synaptic proteins, gephyrin and Bassoon. Then we quantified the colocalization of endophilin A1 with gephyrin or with Bassoon using the method for super-resolution images described in the reference (Andrew D. McCall. Colocalization by cross-correlation, a new method of colocalization suited for super-resolution microscopy. McCall BMC Bioinformatics (2024) 25:55). The percentage of gephyrin or Bassoon puncta that were in close proximity with endophilin A1 was also calculated, as shown in new video 5 and new Figure S4B-G. These data have been added in the revised manuscript as follows, “We further detected the localization of endophilin A1 to inhibitory synapses by co-immunostaining with both pre- and post-synaptic markers (Figure. S4B and Video 5). Quantitative analysis of super-resolution localization maps revealed that ~ 47 % puncta of gephyrin or Bassoon were proximal to endophilin A1 (Figure. S4G, n = 14), with a mean distance between endophilin A1- and gephyrin-positive pixels of ∼ 120 nm, or between endophilin A1- and Bassoon-positive pixels of ∼ 130 nm (Figure. S4C-F).”
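      The idea of counting puncta that are "proximal" rather than strictly overlapping can be illustrated with a minimal sketch. This is not the cross-correlation method cited above; it is a simpler nearest-neighbour count, and the function name, the coordinates and the 150 nm cutoff are all hypothetical values chosen for illustration.

```python
from math import dist

def fraction_proximal(puncta, partners, max_distance_nm):
    """Fraction of `puncta` with at least one partner punctum within the cutoff.

    Coordinates are (x, y) centroids in nanometres. This is a plain
    nearest-neighbour count, not the cross-correlation analysis used
    in the manuscript.
    """
    if not puncta:
        return 0.0
    near = sum(
        1 for p in puncta
        if any(dist(p, q) <= max_distance_nm for q in partners)
    )
    return near / len(puncta)

# Hypothetical centroids (nm); not data from the study.
gephyrin = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0), (1000.0, 1000.0)]
endophilin = [(100.0, 0.0), (1000.0, 950.0)]

# Assumed cutoff of 150 nm, for illustration only.
frac = fraction_proximal(gephyrin, endophilin, 150.0)
print(f"{frac:.0%} of gephyrin puncta have an endophilin A1 punctum within 150 nm")
```

      A distance-based count of this kind depends strongly on the chosen cutoff, which is one reason a cross-correlation approach may be preferred for super-resolution data.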

      (14) In the quantification shown in Figure 3K-L, there are no error bars in the WT data sets. This presumably means that all values were normalized to WT. However, since this artificially eliminates the variance in the WT group, a t-test is no longer valid, since this assumes a normal distribution and normal variance, which are no longer given. The authors should either change the way they normalize their data to maintain the variance in the WT group or perform a different statistical test that can account for the artificial lack of variance in one of the groups.

      Thank you for the suggestions! We modified our analysis approach. Specifically, we used mean value of WTs to normalize data to preserve the variance in the WT group and performed unpaired t-tests to assess statistical significance in Figure 3K-L. Additionally, we replaced the bar graphs with modified graphs showing individual data points. Please see Response to Recommendation (12).
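      The difference between the two normalization schemes can be sketched as follows. Dividing every value by the WT group mean sets the WT average to 1 while keeping the spread of the individual WT values, so an unpaired t-test remains applicable. The numbers and function names below are hypothetical and only illustrate the arithmetic; they are not the values from Figure 3K-L.

```python
from math import sqrt
from statistics import mean, variance

def normalize_to_wt_mean(wt, ko):
    """Divide all values by the WT mean: WT averages 1.0 but keeps its variance."""
    wt_mean = mean(wt)
    return [v / wt_mean for v in wt], [v / wt_mean for v in ko]

def unpaired_t(a, b):
    """Student's unpaired t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical band intensities (arbitrary units), three samples per genotype.
wt_raw = [90.0, 100.0, 110.0]
ko_raw = [55.0, 60.0, 65.0]

wt_norm, ko_norm = normalize_to_wt_mean(wt_raw, ko_raw)
t_stat, dof = unpaired_t(wt_norm, ko_norm)
print(f"WT mean = {mean(wt_norm):.2f}, WT variance = {variance(wt_norm):.4f}")
print(f"t = {t_stat:.3f}, df = {dof}")
```

      By contrast, setting each WT value to 1 individually would give the WT group zero variance, which is the situation the reviewer objected to.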

      (15) What is the difference between the coIP experiment in Figure 4E and 3J, right panel? In both cases, an Endophilin A1 IP is performed, and gephyrin, GABAARg2, and GABAARa1 are assessed. However, Figure 3J's right panel indicates that Endophilin A1 does interact with the GABAAR subunits, whereas Figure 4E shows that it does not. How do the authors explain this discrepancy? Were these experiments performed more than once?

      Sorry for the confusion. Figure 3J and Figure 4E show data from immunoisolation assay and conventional co-immunoprecipitation (co-IP), respectively. Immunoisolation allows for the rapid and efficient separation of subcellular membrane compartments using antibodies conjugated to magnetic beads. In Figure 3J, we used antibodies against GABA<sub>A</sub>R α1 subunit or endophilin A1 to isolate the inhibitory postsynaptic membranes or endophilin A1-associated membranous compartments. In contrast, co-immunoprecipitation detects direct protein-protein interactions in detergent-solubilized lysates. For Figure 4E, we applied antibodies against endophilin A1 to precipitate its interaction partners. The results in Figure 3J and Figure 4E demonstrate that endophilin A1 is localized in the inhibitory postsynaptic compartment and directly interacts with gephyrin, but not with GABA<sub>A</sub>Rs. Detailed information regarding the methods used for co-IP and immunoisolation can be found in “GST-pull down, co-immunoprecipitation (IP), and immunoisolation” in the “Material and Methods” section of original manuscript.

      These experiments were repeated multiple times to ensure reliability. In fact, consistent data showing endophilin A1 localization in the inhibitory postsynaptic compartment were observed in Figure 3K, showing the quantified data as well.

      (16) For the colocalization analysis in Figure 5A-C, what percentage of gephyrin puncta contain g2 in the WT and Endophilin A1 KO? Currently, only a correlation coefficient is provided, but not the degree of overlap. Please add this information to the figure.

      Thanks for the comments on the colocalization analysis. We analyzed the percentage of gephyrin puncta overlapping with GABA<sub>A</sub>R γ2 and added the graphs in new Figure 5C.

      (17) Figure 6 investigates how actin depolymerization affects GABAergic synapse function, but does not assess how Endophilin A1 contributes to this process. The authors then provide an extremely short statement in the discussion, stating that their data are contradictory to a previous study (lines 412 - 417). This section of the discussion should be expanded to address the specific role of Endophilin A1 in the consequences of actin depolymerization.

      Thanks a lot for the advice. In the original manuscript, we discussed the specific role of endophilin A1 at inhibitory postsynapses as follows in Discussion:

      “As membrane-binding and actin polymerization-promoting activities of endophilin A1 are both required for its function in enhancing iPSD formation and g2–containing GABA<sub>A</sub>R clustering to iPSD, we propose that membrane-bound endophilin A1 promotes postsynaptic assembly by coordinating the plasma membrane tethering of the postsynaptic protein complex and its stabilization with the actin cytomatrix”

      Following your advice, we added a statement in the revised manuscript addressing the role of endophilin A1 in actin polymerization at inhibitory postsynapses, shown as follows, “In the present study, the impaired clustering of gephyrin and GABA<sub>A</sub> γ2 by F-actin depolymerization underscores the essential role of F-actin in the assembly and stabilization of the inhibitory postsynaptic machinery. Membrane-bound endophilin A1 promotes F-actin polymerization beneath the plasma membrane through its interaction with p140Cap, an F-actin regulatory protein, thereby facilitating and/or stabilizing the clustering of gephyrin and γ2-containing GABA<sub>A</sub> receptors at postsynapses.”

      (18) Which statistical analysis was conducted in Figure 7F? Given the nature of the data, a repeated measures ANOVA would be necessary to accurately assess the statistical accuracy.

      Sorry for the confusion. In the original Figure 7F, we conducted one-way ANOVA followed by Tukey's post hoc test at each time point. As suggested, we have now used repeated measures ANOVA followed by Tukey's post hoc test in new Figure 7F. We also reanalyzed the data in new Figure 1C with the same method, and modified the description in “Statistical analysis” and the figure legends for new Figures 1C and 7F in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Data presentation:

      (1) Figures 2A, B, D, E, G, H. Figures S2A, B, D:

      Add P21 or P60 labels to these figures so that the difference between similarly stained samples (e.g. Figures 2A, B) is obvious to the reader.

      Thanks! We added “P21” or “P60” labels in new Figure 2 and Figure S2 as suggested.

      (2) Figures 4C, D:

      The authors must make their coIP data annotation consistent. In Figure 4C, they use actual microgram amounts when, e.g., describing how much input was present, yet in Figure 4D they use + and -. The authors should pick one.

      Thanks for the comments. We made the data annotation consistent in new Figures 4C and 4D, and also changed the labels in Figure 4F to match.

      (3) Figure 5A

      GFP is gray in this figure, but in all other figures, it is blue. Consider changing for presentation reasons.

      Thanks a lot for pointing out the problem. We replaced gray with blue color to indicate GFP in new Figure 5A.

      (4) Figures 6A, C, E, G

      Label graphs as either short-term or long-term drug treatment.

      Thanks for the suggestion. We labeled the graphs as 60 min for short-term or 120 min for long-term drug treatment in new Figure 6A, C, E, G for convenient reading.

      Annotation, grammar, spelling, typing errors:

      (1) Figure 4G:

      Merge and GFP labels are seemingly swapped.

      Thanks a lot for your sharp eye. We corrected the labels in new Figure 4G.

      (2) Fig 4I:

      The authors use "Gephryin" instead of GPN. They should be consistent and choose one.

      Sorry for the mistake. We changed the label in new Figure 4I to be consistent with the other figures and rearranged the images in the figures for better presentation.

      (3) "One-hour or two-hour treatment of mature neurons with nocodazole..."

      Thanks for your advice. We modified the sentence to “Treatment of mature neurons with nocodazole, a microtubule depolymerizing reagent, for one hour (short-term) or two hours (long-term), caused…”.

      (4) The authors should indicate that one-hour is their short-term treatment and that two-hour is their long-term treatment so that when these terms are used later to describe LatA experiments, it is clearer to the reader.

      Thanks for your comments. We modified the statement as described in Response to Recommendation (3), so that it is clearer to the reader.

      (5) EEN1. The authors should use the more conventional term EndoA1 so that the manuscript can be searched easily.

      Thanks a lot for the suggestion. We replaced all of the term “EEN1” with “EndoA1” in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      Major Points

      (1) The number of observations for the electrophysiology experiments in Figure 1 (dots are neurons) is very low and it is not clear whether the data shown is derived from different mice. The same criticism applies to the data shown in Figures 7G-K.

      We apologize for the low neuron number in the electrophysiology experiments. In the patch-clamp experiments, more neurons were recorded than are shown in the figures. However, neurons with a membrane resistance (Rm) below 500 MΩ, indicating unstable seals or poor conditions, were excluded from the analysis. Additionally, we added the number of mice from which the data were derived for each group to the figure legends of Figures 1, 7 and S1. This point was also raised by Reviewer #1 (please see Response to Recommendation (1)).
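      The exclusion criterion and the reporting of both neuron and animal numbers can be written down as a short sketch. The 500 MΩ cutoff is the one stated above; the record layout, identifiers and resistance values are hypothetical and serve only to illustrate the bookkeeping.

```python
RM_THRESHOLD_MOHM = 500.0  # cutoff stated in the response above

def keep_stable_recordings(recordings, threshold=RM_THRESHOLD_MOHM):
    """Keep recordings whose membrane resistance is at or above the cutoff."""
    return [r for r in recordings if r["rm_mohm"] >= threshold]

# Hypothetical recordings; the field names are assumptions for illustration.
recordings = [
    {"mouse": "m1", "cell": "c1", "rm_mohm": 820.0},
    {"mouse": "m1", "cell": "c2", "rm_mohm": 310.0},
    {"mouse": "m2", "cell": "c3", "rm_mohm": 500.0},
    {"mouse": "m3", "cell": "c4", "rm_mohm": 450.0},
]

kept = keep_stable_recordings(recordings)
n_cells = len(kept)
n_mice = len({r["mouse"] for r in kept})
print(f"n = {n_cells} neurons from {n_mice} mice")
```

      Reporting the number of animals alongside the number of neurons, as done in the revised legends, makes it possible to judge whether the data points come from independent mice.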

      (2) Images in Figure 2 are shown at low magnification, statements on changes in intensity of inhibitory synaptic markers in the hippocampal region are impossible to interpret. Analysis of inhibitory synapses in vivo would require sparse neuronal labeling and 3D reconstruction, for instance using gephyrin-FingRs (Gross et al., Neuron 2013).

      Thanks for your insightful suggestion. We obtained pCAG_PSD95.FingR-eGFP-CCR5TC and pCAG_GPN.FingR-eGFP-CCR5TC constructs from Addgene (plasmids #46295 and #46296). We attempted in utero electroporation (IUE) to introduce the DNAs into cortical or hippocampal neurons at E14.5, unfortunately without success. After many repeated attempts, we could eventually obtain newborn pups of ICR mice after IUE. However, we failed to obtain any newborn pups of C57BL/6J mice due to abortion following the procedure. Furthermore, pregnant C57BL/6J mice (WTs or KOs) did not survive or remained in a poor state of health after surgery. Therefore, we were unable to analyze synapses through sparse labeling and 3D reconstruction by IUE. As an alternative, we obtained commercial AAVs carrying rAAV-EF1a-PSD95.FingR-eGFP-CCR5TC and rAAV-EF1a-mRuby2-Gephyrin.FingR-IL2RGTC and injected them into the CA1 region of EndoA1<sup>fl/fl</sup> mice at P0. The mice were fixed at P21 and the fluorescent signals in the CA1 region were examined. Consistent with immunostaining with antibodies, decreased mRuby2-Gephyrin.FingR or PSD95.FingR-eGFP was observed in dendrites of KO neurons at P21, as shown in new Figure S3. In combination with electrophysiological recording, PSD fractionation and immunoisolation from brains, these data support our conclusion regarding the effects of endophilin A1 knockout on the inhibitory synapses.

      Additionally, we transfected DIV12 cultured hippocampal neurons with pCAG_PSD95.FingR-eGFP-CCR5TC or pCAG_GPN.FingR-eGFP-CCR5TC and observed fluorescent signals on DIV16. Both the signal intensity and number of GPN.FingR-eGFP clusters were also significantly attenuated, with no obvious changes in PSD95.FingR-eGFP clusters in dendrites of mature neurons, as shown in new Figure S5A-D. We are very pleased that the result further strengthened our original conclusion. We have added the new pieces of data in our revised manuscript.

      (3) Figure 3: surface labeling of GluA1 or the GABAAR gamma 2 subunit is difficult to interpret: the patterns are noisy and the numerous puncta appear largely non-synaptic although this is difficult to judge in the absence of additional synaptic markers. It appears statistics are done on dendritic segments rather than the number of neurons. The legend does not mention how many independent cultures this data is derived from. In their previous study (Yang et al., Front Mol Neurosci 2018), the authors noted a decrease in surface GluA1 levels in the absence of endophilin A1. How do they explain the absence of an effect on surface GluA1 levels in the current study?

      Sorry for the concern and thanks for your comments. First, we assessed changes in the surface levels of excitatory and inhibitory receptors by co-immunostaining in cultured WT and KO hippocampal neurons. Given the very low transfection efficiency of neurons in high-density culture, numerous receptor puncta from adjacent non-transfected neurons were also detected, which may contribute to the noisy pattern observed in Figure 3A. In addition, the z-stack projections of dendrites at higher magnification may have introduced higher background signals. We have now replaced the original images with those from the most recent repeat in new Figure 3A. Moreover, we confirmed a decrease in the surface expression of GABA<sub>A</sub>R γ2 by the biotinylation assay, as shown in Figure 3E. We agree that some puncta in the surface labeling of receptors appear to be non-synaptic. To assess the decrease in synaptic proteins at synapses, we isolated the PSD fraction biochemically and found that gephyrin and GABA<sub>A</sub>R γ2, two major inhibitory postsynaptic components, were reduced in the PSD fraction from KO brains, as shown in Figure 3L. Their colocalization was also attenuated in the absence of endophilin A1, as shown in Figure 5A-C. Combined with electrophysiological recording, these data from multiple assays indicate that GluA1 at synapses was not obviously affected, whereas GABA<sub>A</sub>R γ2 at synapses was impaired, in endophilin A1 KO neurons in the present study.

      We have corrected the way the number of samples is defined for statistical analysis, as suggested. This point was also raised by Reviewer #1 (Recommendation (2)). We averaged the values from all dendritic segments of a single neuron, such that one neuron equals one data point. We have replaced the original Figures 3B and 3D (please see Response to Recommendation (2) by Reviewer #1). Additionally, we added the number of independent cultures from which these data were derived to the figure legends in the revised manuscript.
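      The change of statistical unit from dendritic segment to neuron amounts to a simple aggregation step, sketched below. The identifiers and intensity values are hypothetical; the sketch only shows how several segments per neuron collapse into one data point per neuron.

```python
from collections import defaultdict
from statistics import mean

def per_neuron_means(segments):
    """Average all dendritic-segment values belonging to the same neuron.

    `segments` is an iterable of (neuron_id, value) pairs; the result maps
    each neuron to a single value, so that n counts neurons, not segments.
    """
    by_neuron = defaultdict(list)
    for neuron_id, value in segments:
        by_neuron[neuron_id].append(value)
    return {neuron_id: mean(values) for neuron_id, values in by_neuron.items()}

# Hypothetical surface-intensity measurements: six segments from three neurons.
segments = [
    ("n1", 1.0), ("n1", 3.0),
    ("n2", 2.0), ("n2", 4.0), ("n2", 6.0),
    ("n3", 5.0),
]

neuron_values = per_neuron_means(segments)
print(f"n = {len(neuron_values)} neurons (from {len(segments)} segments)")
```

      Treating each segment as an independent observation would inflate n and tend to overstate statistical significance, because segments from the same neuron are not independent.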

      Previously, we observed a small decrease in surface GluA1 levels in spines under basal conditions and a more pronounced suppression of surface GluA1 accumulation in spines upon chemical LTP in endophilin A1 KO neurons from EndoA1<sup>-/-</sup> mice, in which endophilin A1 is knocked out from embryonic development stages (Figure 5C,H. Yang et al., Front Mol Neurosci, 2018). In Figure 3A and B of the current study, we analyzed surface receptor levels in GFP-positive dendrites, rather than spines, under basal conditions when endophilin A1 was depleted at a later developmental stage. We found a decrease in surface GABA<sub>A</sub>R γ2 levels but no significant effects on surface GluA1 levels in dendrites. These findings indicate that endophilin A1 primarily affects excitatory synaptic proteins in spines during synaptic plasticity and inhibitory synaptic proteins in dendrites under basal conditions in mature neurons.

      (4) Super-resolution images in Figure 3G, H, I: endophilin A1 puncta look different in panel 3I compared to 3G and 3H, which are very noisy. It is difficult to interpret how specific these EEN1 puncta are. Previous images showing EEN1 distribution in dendrites look different (Yang et al., Front Mol Neurosci 2018); is the same KO-verified antibody being used here? Colocalization of EEN1 with gephyrin or the GABAAR gamma 2 subunit is difficult to interpret; gephyrin mostly does not seem to colocalize with EEN1 in the example shown.

      Sorry for your concerns. As stated previously in Major Points (3), transfection efficiency was very low in cultured neurons and our cultured neurons were at relatively high density. As a result, numerous puncta of proteins located in the adjacent non-transfected neurons were also detected, which may contribute to the noisy signals observed in Figure 3G-I.

      In our previous paper, we confirmed the specificity of the antibody against endophilin A1 (Figure 5A,B. Yang et al., Front Mol Neurosci, 2018). We used the same antibody (rabbit anti-endophilin A1, Synaptic Systems GmbH, Germany) in the current study. While the previous images were obtained using confocal microscopy, the current images in Figures 3G, H, and I were acquired using super-resolution microscopy (SIM). The different patterns observed in the dendrites may be attributed to differences in image resolution, antibody dilution and reaction time.

      Reviewer #1 also points out the quantification of colocalization of gephyrin and GABA<sub>A</sub>R γ2 with endophilin A1. Please see Response to Recommendation (13) by Reviewer #1.

      (5) The interaction of gephyrin and endophilin A1 is based on coIP experiments in cells and brain tissue. To convincingly demonstrate that these proteins interact, biophysical experiments with purified proteins are necessary.

      Thanks a lot for your great suggestions on the interaction of endophilin A1 with gephyrin. To convincingly demonstrate their interaction, we performed pull-down assay with purified recombinant proteins and the result shows that both G and E domains of gephyrin were involved in the interaction with endophilin A1. The data has been added to the revised manuscript as new Figure 5I. We also modified the statement about the data and figure legends in the revised manuscript.

      (6) Figure 4G: the gephyrin images are not convincing; the inhibitory postsynaptic element typically looks somewhat elongated; these puncta are very noisy and do not appear to represent iPSDs. The same criticism applies to the images shown in Figures 5 and 7.

      Thanks for the comment. The gephyrin puncta in our images exhibited heterogeneous shapes and sizes, with some appearing somewhat elongated. To address this, we compared the puncta pattern of gephyrin with that shown in the reference. As illustrated in the figure from the reference, gephyrin puncta also displayed distinct shapes and sizes (Figure 3A-F, Neuron 78, 971–985, June 19, 2013). Please note that the images were z-stack projections at higher magnification, as described in the "Materials and Methods" section. This approach may introduce higher background signals and may contribute to the more heterogeneous appearance of the puncta in Figures 4, 5, and 7. As mentioned previously, the numerous gephyrin puncta located in the adjacent non-transfected neurons may also contribute to some of the noisy signals observed. We have replaced the original images with new images in new Figures 4G, 5 and 7.

      Moreover, to confirm the effects of endophilin A1 KO on gephyrin clustering, we also examined the endogenous clusters of gephyrin or PSD95 visualized by GPN.FingR-eGFP or PSD95.FingR-eGFP in cultured mature neurons. The results were consistent with immunostaining with antibodies against gephyrin. Please see Response to Recommendation (2).

      (7) Figure 7E, F: the rescue (Cre + WT) appears to perform better than the control (mCherry + GFP) in the PTZ condition; how do the authors explain this? Mixes of viral vectors were injected, would this approach achieve full rescue?

      Thanks for the thoughtful comment. Mixed viruses were injected bilaterally into the hippocampal CA1 regions. The results showed a full rescue effect by WT endophilin A1 in knockout mice during the early days, and a slightly better effect than the control group in the later days under the PTZ condition, as shown in Figures 7E and 7F. In the current study, overexpression of endophilin A1 increased the clustering of gephyrin and GABA<sub>A</sub>R γ2 in cultured neurons, as shown in Figures 4I-J and 5D-E. Presumably, the slightly better rescue effect observed in the behavioral tests is attributable to the enhanced clustering and/or stabilization of gephyrin/GABA<sub>A</sub>R γ2 by WT endophilin A1 expression in KO neurons in vivo. Moreover, the electrophysiological recording also showed full rescue effects on eIPSCs by WT endophilin A1 in KO neurons (Figure 7G-K).

      Minor Points

      (1) The authors mention that they previously found a decrease in eEPSC amplitude in EEN1 KO mice (Yang et al., Front Mol Neurosci 2018). The data in Fig. 1E suggests a decrease in eEPSC amplitude but is not significant here, likely due to the small number of observations. If both eEPSC and eIPSC amplitude are reduced in the absence of EEN1, would the E/I ratio still be significantly changed?

      We apologize for the confusion. In our previous study, AMPAR-mediated excitatory postsynaptic currents (eEPSCs) were found to be slightly but significantly reduced compared to the control group, while NMDAR-mediated excitatory postsynaptic currents showed no significant difference (Figure 4N,O. Yang et al., Front Mol Neurosci, 2018). In the current study, we adopted a different recording protocol, simultaneously measuring eEPSCs and eIPSCs from the same neuron to calculate the E/I ratio. Unlike in the previous study, we did not use inhibitors to suppress GABA receptor activity. As a result, the recorded signals reflect total eEPSCs without distinguishing AMPAR-mediated from NMDAR-mediated currents, which may explain the non-significant reduction observed compared to control neurons in this study.

      It is possible that the eEPSC amplitude would show a significant reduction if a larger number of neurons were recorded. Nevertheless, the larger suppression of eIPSCs in the absence of endophilin A1 indicates that the E/I ratio is significantly altered.

      (2) Page 7: the authors mention they aim to exclude effects on presynaptic terminals of deleting endophilin A1 in cultured neurons, is this because of a sparse transfection approach?

      Please clarify.

      Sorry for the confusion. In cultured neurons, transfection was always sparse because of the very low transfection efficiency (~0.5%). Therefore, we could examine the effects of endophilin A1 knockout specifically in postsynaptic neurons expressing Cre under the CamKIIa promoter, while endophilin A1 remained intact in the non-transfected presynaptic neurons.

      (3) The representative blot of the surface biotinylation experiment (Figure 3E) suggests that loss of endophilin A1 also affects GluN1 and Nlgn2 levels, and error bars in panel 3F (lacking individual data points) suggest these experiments were highly variable.

      Sorry for the confusion. Reviewer #1 raised the same question, and we have quantified the total levels of GluN1 and NL2 in Figure 3E. We have also replaced the original graphs with scatterplots showing means ± S.E.M. Please see the Response to Recommendation (3) & (12) by Reviewer #1.

      (4) Have other studies analyzing inhibitory synapse composition identified endophilin A1 as a component? The rationale for this study seems to be primarily based on the presence of epileptic seizures and E/I imbalance.

      Thank you for your questions. To date, no other studies have identified endophilin A1 as an inhibitory postsynaptic component. We observed that endophilin A1 localizes close to inhibitory postsynaptic proteins using super-resolution microscopy (SIM), and quantification showed that ~47% of gephyrin puncta correlated with endophilin A1 (Figure 3G-I and S4B-G). We further immunoisolated the inhibitory postsynaptic fraction using GABA<sub>A</sub> receptors and found that endophilin A1 was present in the isolated fraction, and vice versa (Figure 3J). Additionally, we demonstrated that endophilin A1 directly interacted with gephyrin through co-IP and pull-down assays (Figure 5J-I). Together, the data from immunolabeling, biochemical assays, electrophysiological recordings, and behavioral tests identify endophilin A1 as an inhibitory postsynaptic component.

      (5) Figure 3J: what are S100 and P100 labels? Is Nlgn2 part of the EEN1 complex? If it is, why are Nlgn2 surface levels not affected by EEN1 loss (Figure 3E, F, K)? Why does EEN1 not interact with Nlgn2 in HEK cells (Figure 4D)?

      Sorry for the confusion. Detailed information regarding S100 and P100 can be found under “GST-pull down, co-immunoprecipitation (IP), and immunoisolation” in the “Materials and Methods” section. S100 contains soluble proteins, while P100 refers to the membrane fraction after high-speed (100,000 × g) centrifugation.

      Figures 3J-K and 4C-F showed the data from immunoisolation and conventional co-immunoprecipitation assays, respectively. Immunoisolation, which uses antibodies coupled to magnetic beads, allows for the rapid and efficient separation of subcellular membrane compartments. In Figure 3J-K, we used antibodies against GABA<sub>A</sub>R α1 to isolate membrane protein complexes from the inhibitory postsynaptic fraction. In contrast, co-immunoprecipitation typically detects direct interactions between proteins solubilized by detergent treatment. For Figure 4C-F, FLAG beads were used in HEK293 lysates, or antibodies against endophilin A1 were employed in brain lysates to precipitate direct interaction partners. Combined with the results from Figure 3J-L, the data in 4C-F indicated that endophilin A1 was localized in the inhibitory postsynaptic compartment and directly bound to gephyrin but not to either GABA<sub>A</sub> receptors or Nlgn2 (NL2). This binding promoted the clustering of gephyrin and GABA<sub>A</sub>R γ2 at synapses, facilitating GABA<sub>A</sub>R assembly.

      Nlgn2 (NL2) is a key inhibitory postsynaptic component but does not directly bind to endophilin A1. Consequently, endophilin A1 failed to co-immunoprecipitate with NL2 in the presence of detergent in HEK293 cell lysates (Figure 4D). Furthermore, the surface levels of NL2 or its distribution in PSD fraction were unaffected by the loss of endophilin A1 (Figure 3E, F, K, L). This suggests that mechanisms independent of endophilin A1 orchestrate the surface expression and synaptic distribution of NL2.

      (6) How do the authors interpret the finding that endophilin A1, but not A2 or A3, binds gephyrin? What could explain these differences?

      Thanks for the thoughtful comment. Endophilin As contain BAR and SH3 domains. While the amino acid sequences of the BAR and SH3 domains are highly conserved, the intrinsically disordered loop region between them is highly variable. A study by the Verstreken lab revealed that a human mutation in the unstructured loop region of endophilin A1 increases the risk of Parkinson's disease. They also demonstrated that the disordered loop region controls protein flexibility, which fine-tunes protein-protein and protein-membrane interactions critical for endophilin A1 function (Bademosi et al., Neuron 111, 1402–1422, May 3, 2023). Our previous study showed that endophilin A1 and A3, but not A2, bind to p140Cap through their SH3 domains, despite the high sequence homology in the SH3 domains among these proteins (Figure 2A,B in Yang et al., Cell Research, 2015). These findings indicate that each endophilin A likely interacts with specific partners due to distinct key amino acids.

      Additionally, endophilin A1 is expressed at much higher levels than A2 and A3 in neurons, and the three proteins are distributed differently across brain regions. Our lab demonstrated that the function of A1 at postsynapses (both excitatory and inhibitory synapses) cannot be compensated by A2 or A3. Therefore, it is reasonable that endophilin A1, rather than A2 or A3, binds to gephyrin, even though the underlying mechanisms remain unclear.

      (7) Figure 4G: panels are mislabeled (GFP vs merge).

      Thank you for the careful reading, and we apologize for the mistake. We corrected the label in the new Figure 4G. Please see Response to Annotation, grammar, spelling, typing errors: (1) by Reviewer #2.

    1. eLife Assessment

      The manuscript by Ross, Miscik, and others describes an intriguing series of observations made when investigating the requirement for podxl during hepatic development in zebrafish. Understanding how genetic compensation pathways are involved in gene function is an important question. However, there is incomplete evidence provided in the manuscript at this point to conclude that discrepancies between observed phenotypes are due to genetic compensation.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Ross, Miscik, and others describes an intriguing series of observations made when investigating the requirement for podxl during hepatic development in zebrafish. Podxl morphants and CRISPants display a reduced number of hepatic stellate cells (HSCs), while mutants are either phenotypically wild type or display an increased number of HSCs.

      The absence of observable phenotypes in genetic mutants could indeed be attributed to genetic compensation, as the authors postulate. However, in my opinion, the evidence provided in the manuscript at this point is insufficient to draw a firm conclusion. Furthermore, the opposite phenotype observed in the two deletion mutants is not readily explainable by genetic compensation and invokes additional mechanisms.

      Major concerns:

      (1) Considering discrepancies in phenotypes, the phenotypes observed in podxl morphants and CRISPants need to be more thoroughly validated. To generate morphants, authors use "well characterized and validated ATG Morpholino" (lines 373-374). However, published morphants, in addition to kidney malformations, display gross developmental defects including pericardial edema, yolk sac extension abnormalities, and body curvature at 2-3 dpf (reference 7 / PMID: 24224085). Were these gross developmental defects observed in the knockdown experiments performed in this paper? If yes, is it possible that the liver phenotype observed at 5 dpf is, to some extent, secondary to these preceding abnormalities? If not, why were they not observed? Did kidney malformations reproduce? On the CRISPant side, were these gross developmental defects also observed in sgRNA#1 and sgRNA#2 CRISPants? Considering that morphants and CRISPants show very similar effects on HSC development and assuming other phenotypes are specific as well, they would be expected to occur at similar frequencies. It would be helpful if full-size images of all relevant morphant and CRISPant embryos were displayed, as is done for tyr CRISPant in Figure S2. Finally, it is very important to thoroughly quantify the efficacy of podxl sgRNA#1 and sgRNA#2 in CRISPants. The HRMA data provided in Figure S1 is not quantitative in terms of the fraction of alleles with indels. Figure S3 indicates a very broad range of efficacies, averaging out at ~62% (line 100). Assuming random distribution of indels among cells and that even in-frame indels result in complete loss of function (possible for sgRNA#1 due to targeting the signal sequence), only ~38% (0.62 × 0.62) of all cells will be mutated bi-allelically. That does not seem sufficient to reliably induce loss-of-function phenotypes. My guess is that the capillary electrophoresis method used in Figure S3 underestimates the efficiency of mutagenesis, and that much higher mutagenesis rates would be observed if mutagenesis were assessed by amplicon sequencing (ideally NGS but Sanger followed by deconvolution analysis would suffice). This would strengthen the claim that CRISPant phenotypes are specific.
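
      The bi-allelic estimate in this comment can be reproduced with a short calculation. This is a minimal sketch, assuming the two alleles in each cell are mutated independently with the same per-allele efficiency; the function name `biallelic_fraction` and the two higher efficiencies are illustrative choices, not values from the manuscript.

```python
def biallelic_fraction(allele_efficiency: float) -> float:
    """Fraction of cells with both alleles mutated.

    Assumes each allele is mutated independently with the same probability.
    """
    if not 0.0 <= allele_efficiency <= 1.0:
        raise ValueError("allele_efficiency must be between 0 and 1")
    return allele_efficiency ** 2


# 0.62 is the average efficiency cited from Figure S3; 0.80 and 0.95 are
# hypothetical values showing how the bi-allelic fraction would change
# if the true efficiency were higher.
for efficiency in (0.62, 0.80, 0.95):
    print(f"per-allele {efficiency:.2f} -> bi-allelic {biallelic_fraction(efficiency):.2f}")
```

      Under these assumptions the bi-allelic fraction rises with the square of the per-allele efficiency, which is why a more sensitive measurement of mutagenesis could change the interpretation substantially.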

      (2) In addition to confidence in morphant and CRISPant phenotypes, the authors' claim of genetic compensation rests on the observation that podxl (Ex1(p)_Ex7Δ) mutants are resistant to CRISPant effect when injected with sgRNA#1 (Figure 3L). Considering the issues raised in the paragraph above, this is insufficient. There is a very straightforward way to address both concerns, though. The described podxl(-194_Ex7Δ) and podxl(-319_ex1(p)Δ) deletions remove the binding site for the ATG morpholino. Therefore, deletion mutants should be refractive to the Morpholino (specificity assessment recommended in PMID: 29049395, see also PMID: 32958829). Furthermore, both deletion mutants should be refractive to sgRNA#1 CRISPant phenotypes, with the first being refractive to sgRNA#2 as well.

    3. Reviewer #2 (Public review):

      In this manuscript, Ross and Miscik et al. described the phenotypic discrepancies between F0 zebrafish mosaic mutant ("CRISPants") and morpholino knockdown (Morphant) embryos versus a set of 5 different loss-of-function (LOF) stable mutants in one particular gene involved in hepatic stellate cell development: podxl. While transient LOF and mosaic mutants showed a decrease in hepatic stellate cell number, stable LOF zebrafish did not. The authors analyzed the molecular causes of these phenotypic differences and concluded that LOF mutants are genetically compensated through the upregulation of the expression of many genes. Additionally, they ruled out other better-known and described mechanisms such as the expression of redundant genes, protein feedback loops, or transcriptional adaptation.

      While the manuscript is clearly written and conclusions are, in general, properly supported, there are some aspects that need to be further clarified and studied.

      (1) It would be convenient to apply a method to better quantify potential loss-of-function mutations in the CRISPants. This would reveal not only the percentage of mutations in those embryos but also what fraction of them actually generate an out-of-frame mutation likely to drive gene loss of function (since deletions of 3-6 nucleotides removing 1-2 amino acids will likely not affect protein activity unless those amino acids are essential for it). With this, the authors could also correlate phenotype penetrance with the level of loss of function when quantifying embryo phenotypes, which would help support their conclusions.

      (2) It is unclear that 4.93 ng of morpholino per embryo is totally safe. The amount of morpholino causing undesired effects can differ depending on the morpholino used. I would suggest performing some sanity check experiments to demonstrate that morpholino KD is not triggering other molecular outcomes, such as upregulation of p53 or innate immune response.

      (3) Although the authors made a set of controls to demonstrate the specificity of the CRISPant phenotypes, I believe that a rescue experiment could be beneficial to support their conclusions. Injecting an mRNA with podxl ORF (ideally with a tag to follow protein levels up) together with the induction of CRISPants could be a robust manner to demonstrate the specificity of the approach. A rescue experiment with morphants would also be good to have, although these are a bit more complicated, to ultimately demonstrate the specificity of the approach.

      (4) In lines 314-316, the authors speculate on a correlation between decreased HSC and Podxl levels. It would be interesting to actually test this hypothesis and perform RT-qPCR upon CRISPant induction or, even better and if antibodies are available, western blot analysis.

      (5) Similarly, in lines 337-338 and 342-344, the authors discuss the possibility that genes near the podxl locus could be upregulated in the mutants. Since they already have transcriptomic data, this seems an easy analysis to do that can address their own hypothesis.

      (6) Figures 4 and 5 would be easier to follow if panels B-F indicated which mutants are shown (beyond having them in the figure legend). Moreover, would it be more accurate and appropriate for the authors to group all three WT and mutant fish per panel instead of showing individual fish? Representing technical replicates does not demonstrate in vivo variability, which is actually meaningful in this context. Statistical analysis could then be done between WT and mutant per panel and per set of primers using these three independent 3-month-old zebrafish.

    4. Reviewer #3 (Public review):

      Summary:

      Ross et al. show that knockdown of zebrafish podocalyxin-like (podxl) by CRISPR/Cas or morpholino injection decreased the number of hepatic stellate cells (HSC). The authors then generated 5 different mutant alleles representing a range of lesions, including premature stop codons, in-frame deletion of the transmembrane domain, and deletions of the promoter region encompassing the transcription start site. However, unlike their knockdown experiment, HSC numbers did not decrease in podxl mutants; in fact, for two of the mutant alleles, the number of HSCs increased compared to the control. Injection of podxl CRISPR/Cas constructs into these mutants had no effect on HSC number, suggesting that the knockdown phenotype is not due to off-target effects but instead that the mutants are somehow compensating for the loss of podxl. The authors then present multiple lines of evidence suggesting that compensation is not exclusively due to transcriptional adaptation: evidence of mRNA instability and nonsense-mediated decay was observed in some but not all mutants; expression of the related gene endoglycan (endo) was unchanged in the mutants and endo knockdown had no effect on HSC numbers; and expression profiling by RNA sequencing did not reveal changes in other genes that share sequence similarity with podxl. Instead, their RNA-seq data showed hundreds of differentially expressed genes, especially ECM-related genes, suggesting that compensation in podxl mutants is complex and multi-genic.

      Strengths:

      The data presented is impressively thorough, especially in its characterization of the 5 different podxl alleles and exploration of whether these mutants exhibit transcriptional adaptation.

      Weaknesses:

      RNA sequencing expression profiling was done on adult livers. However, compensation of HSC numbers is apparent by 6 dpf, suggesting compensatory mechanisms would be active at larval or even embryonic stages. Although possible, it's not clear that any compensatory changes in gene expression would persist to adulthood.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Ross, Miscik, and others describes an intriguing series of observations made when investigating the requirement for podxl during hepatic development in zebrafish. Podxl morphants and CRISPants display a reduced number of hepatic stellate cells (HSCs), while mutants are either phenotypically wild type or display an increased number of HSCs.

      The absence of observable phenotypes in genetic mutants could indeed be attributed to genetic compensation, as the authors postulate. However, in my opinion, the evidence provided in the manuscript at this point is insufficient to draw a firm conclusion. Furthermore, the opposite phenotype observed in the two deletion mutants is not readily explainable by genetic compensation and invokes additional mechanisms.

      Major concerns:

      (1) Considering discrepancies in phenotypes, the phenotypes observed in podxl morphants and CRISPants need to be more thoroughly validated. To generate morphants, authors use "well characterized and validated ATG Morpholino" (lines 373-374). However, published morphants, in addition to kidney malformations, display gross developmental defects including pericardial edema, yolk sac extension abnormalities, and body curvature at 2-3 dpf (reference 7 / PMID: 24224085). Were these gross developmental defects observed in the knockdown experiments performed in this paper? If yes, is it possible that the liver phenotype observed at 5 dpf is, to some extent, secondary to these preceding abnormalities? If not, why were they not observed? Did kidney malformations reproduce? On the CRISPant side, were these gross developmental defects also observed in sgRNA#1 and sgRNA#2 CRISPants? Considering that morphants and CRISPants show very similar effects on HSC development and assuming other phenotypes are specific as well, they would be expected to occur at similar frequencies. It would be helpful if full-size images of all relevant morphant and CRISPant embryos were displayed, as is done for tyr CRISPant in Figure S2. Finally, it is very important to thoroughly quantify the efficacy of podxl sgRNA#1 and sgRNA#2 in CRISPants. The HRMA data provided in Figure S1 is not quantitative in terms of the fraction of alleles with indels. Figure S3 indicates a very broad range of efficacies, averaging out at ~62% (line 100). Assuming random distribution of indels among cells and that even in-frame indels result in complete loss of function (possible for sgRNA#1 due to targeting the signal sequence), only ~38% (0.62 × 0.62) of all cells will be mutated bi-allelically. That does not seem sufficient to reliably induce loss-of-function phenotypes. My guess is that the capillary electrophoresis method used in Figure S3 underestimates the efficiency of mutagenesis, and that much higher mutagenesis rates would be observed if mutagenesis were assessed by amplicon sequencing (ideally NGS but Sanger followed by deconvolution analysis would suffice). This would strengthen the claim that CRISPant phenotypes are specific.

      The reviewer points out some excellent caveats regarding the morphant experiments. We agree that at least some of the effects of the podxl morpholino may be related to its effects on kidney development and/or gross developmental defects that impede liver development. Because of these limitations, we focused our experiments on analysis of CRISPant and mutant phenotypes, including showing that podxl (Ex1(p)_Ex7Δ) mutants are resistant to CRISPant effects on HSC number when injected with sgRNA#1. We did not observe any gross morphologic defects in podxl CRISPants. Liver size was not significantly altered in podxl CRISPants (Figure 2A). We will add brightfield images of podxl CRISPant larvae to the supplemental data for the revised manuscript.

      We agree with the reviewer that HRMA is not quantitative with respect to the fraction of alleles with indels and that capillary electrophoresis likely underestimates mutagenesis efficiency. Nonetheless, even with 100% mutation efficiency, podxl CRISPant knockdown, like most CRISPR knockdowns, would not represent complete loss of function: ~1/3 of alleles will contain in-frame mutations and likely retain at least some gene function, so ~1/3 × 1/3 = 1/9 of cells will have no out-of-frame indels and contain two copies of at least partially functional podxl, and ~2 × 2/3 × 1/3 = 4/9 of cells will have one out-of-frame indel and one copy of at least partially functional podxl. Thus, the decrease in HSCs we observe in podxl CRISPants likely represents a partial loss-of-function phenotype in any case.
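
      The genotype classes discussed in this response can be enumerated explicitly. This is a minimal sketch, assuming each allele is edited independently, that one third of indels are in-frame, and that in-frame or unedited alleles retain at least partial function; the function name `frameshift_classes` and its parameters are hypothetical.

```python
from fractions import Fraction


def frameshift_classes(efficiency, in_frame=Fraction(1, 3)):
    """Distribution of cells by number of out-of-frame alleles (0, 1, or 2).

    efficiency: probability that a given allele carries any indel.
    in_frame:   assumed fraction of indels that preserve the reading frame.
    """
    p = efficiency * (1 - in_frame)  # allele carries an out-of-frame indel
    q = 1 - p                        # allele is unedited or edited in-frame
    # Exactly one out-of-frame allele can arise in two ways (either allele),
    # hence the factor of 2 in the middle term.
    return {0: q * q, 1: 2 * p * q, 2: p * p}


# With 100% editing efficiency the classes are 1/9, 4/9, and 4/9.
classes = frameshift_classes(Fraction(1))
for n_out_of_frame, fraction in classes.items():
    print(n_out_of_frame, fraction)
```

      Only the last class (two out-of-frame alleles) would correspond to a complete loss of function under these assumptions, and the three classes always sum to one.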

      (2) In addition to confidence in morphant and CRISPant phenotypes, the authors' claim of genetic compensation rests on the observation that podxl (Ex1(p)_Ex7Δ) mutants are resistant to CRISPant effect when injected with sgRNA#1 (Figure 3L). Considering the issues raised in the paragraph above, this is insufficient. There is a very straightforward way to address both concerns, though. The described podxl(-194_Ex7Δ) and podxl(-319_ex1(p)Δ) deletions remove the binding site for the ATG morpholino. Therefore, deletion mutants should be refractive to the Morpholino (specificity assessment recommended in PMID: 29049395, see also PMID: 32958829). Furthermore, both deletion mutants should be refractive to sgRNA#1 CRISPant phenotypes, with the first being refractive to sgRNA#2 as well.

      The reviewer proposes elegant experiments to address the specificity of the morpholino. For the revision, we plan to perform additional morpholino studies, including morpholino injections of podxl mutants and assessment of tp53 and other immune response/cellular stress pathway genes in podxl morphants.

      Reviewer #2 (Public review):

      In this manuscript, Ross and Miscik et al. described the phenotypic discrepancies between F0 zebrafish mosaic mutant ("CRISPants") and morpholino knockdown (Morphant) embryos versus a set of 5 different loss-of-function (LOF) stable mutants in one particular gene involved in hepatic stellate cell development: podxl. While transient LOF and mosaic mutants showed a decrease in hepatic stellate cell number, stable LOF zebrafish did not. The authors analyzed the molecular causes of these phenotypic differences and concluded that LOF mutants are genetically compensated through the upregulation of the expression of many genes. Additionally, they ruled out other better-known and described mechanisms such as the expression of redundant genes, protein feedback loops, or transcriptional adaptation.

      While the manuscript is clearly written and conclusions are, in general, properly supported, there are some aspects that need to be further clarified and studied.

      (1) It would be convenient to apply a method to better quantify potential loss-of-function mutations in the CRISPants. This would reveal not only the percentage of mutations in those embryos but also what fraction of them actually generate an out-of-frame mutation likely to drive gene loss of function (since deletions of 3-6 nucleotides removing 1-2 amino acids will likely not affect protein activity unless those amino acids are essential for it). With this, the authors could also correlate phenotype penetrance with the level of loss of function when quantifying embryo phenotypes, which would help support their conclusions.

      Reviewer #2 raises an excellent point that is similar to Reviewer #1’s first concern. Please see our response above. In general, we agree that correlating phenotype penetrance with level of loss-of-function is a very good way to support conclusions regarding specificity in knockdown experiments. Unfortunately, because the phenotype we are examining (HSC number) has a relatively large standard deviation even in control/wildtype larvae (for example, 63 ± 19 (mean ± standard deviation) HSCs per liver in uninjected control siblings in Figure 1), it would be technically very difficult to do this experiment for podxl.
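
      The impact of this variability on group size can be illustrated with a standard sample-size approximation. This is a rough sketch, assuming a two-sided comparison of two group means under a normal approximation with equal standard deviations; the 20% difference, the significance level, and the power are hypothetical choices for illustration and are not taken from the manuscript.

```python
import math
from statistics import NormalDist


def larvae_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate n per group needed to detect a difference in means.

    Uses the normal-approximation formula
        n = 2 * (z_{1 - alpha/2} + z_{power})**2 * (sd / difference)**2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * (sd / difference) ** 2)


mean_hsc, sd_hsc = 63, 19           # control values quoted from Figure 1
hypothetical_drop = 0.20 * mean_hsc  # assumed 20% change in HSC number
print(larvae_per_group(sd_hsc, hypothetical_drop))
```

      Smaller differences between loss-of-function levels would require larger groups, since the required number scales with the inverse square of the difference.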

      (2) It is unclear that 4.93 ng of morpholino per embryo is totally safe. The amount of morpholino causing undesired effects can differ depending on the morpholino used. I would suggest performing some sanity check experiments to demonstrate that morpholino KD is not triggering other molecular outcomes, such as upregulation of p53 or innate immune response.

      Reviewer #2 raises an excellent point that is similar to Reviewer #1’s second concern. Please see our response above. We acknowledge that some of the effects of the podxl morpholino may be non-specific. To address this concern in the revised manuscript, we plan to perform additional morpholino studies, including morpholino injections of podxl mutants and assessment of tp53 and other immune response/cellular stress pathway genes in podxl morphants.

      (3) Although the authors made a set of controls to demonstrate the specificity of the CRISPant phenotypes, I believe that a rescue experiment could be beneficial to support their conclusions. Injecting an mRNA with podxl ORF (ideally with a tag to follow protein levels up) together with the induction of CRISPants could be a robust manner to demonstrate the specificity of the approach. A rescue experiment with morphants would also be good to have, although these are a bit more complicated, to ultimately demonstrate the specificity of the approach.

      (4) In lines 314-316, the authors speculate on a correlation between decreased HSC and Podxl levels. It would be interesting to actually test this hypothesis and perform RT-qPCR upon CRISPant induction or, even better and if antibodies are available, western blot analysis.

      We appreciate the reviewer’s acknowledgement of the controls we performed to demonstrate the specificity of the CRISPant phenotypes. The proposed experiments (rescue, assessment of Podxl levels) would help bolster our conclusions but are technically difficult due to the relatively large standard deviation for the HSC number phenotype even in wildtype larvae and the lack of well-characterized zebrafish antibodies against Podxl.

      (5) Similarly, in lines 337-338 and 342-344, the authors discuss the possibility that genes near the podxl locus could be upregulated in the mutants. Since they already have transcriptomic data, this seems an easy analysis to do that can address their own hypothesis.

      Thank you for this suggestion. We were referring in these sections to genes that are near the podxl locus with respect to three-dimensional chromatin structure; such genes would not necessarily be near the podxl locus on chromosome 4. We will clarify the text in this paragraph for the revised manuscript. At the same time, we will examine our transcriptomic data to check expression of mkln1, cyb5r3, and other nearby genes on chromosome 4 as suggested and include this analysis in the revised manuscript.

      (6) Figures 4 and 5 would be easier to follow if panels B-F indicated which mutants are shown (beyond having them in the figure legend). Moreover, would it be more accurate and appropriate for the authors to group all three WT and mutant fish per panel instead of showing individual fish? Representing technical replicates does not demonstrate in vivo variability, which is actually meaningful in this context. Statistical analysis could then be done between WT and mutant per panel and per set of primers using these three independent 3-month-old zebrafish.

      Thank you for this suggestion. We will modify these figures to clarify our results.

      Reviewer #3 (Public review):

      Summary:

      Ross et al. show that knockdown of zebrafish podocalyxin-like (podxl) by CRISPR/Cas or morpholino injection decreased the number of hepatic stellate cells (HSC). The authors then generated 5 different mutant alleles representing a range of lesions, including premature stop codons, in-frame deletion of the transmembrane domain, and deletions of the promoter region encompassing the transcription start site. However, unlike their knockdown experiment, HSC numbers did not decrease in podxl mutants; in fact, for two of the mutant alleles, the number of HSCs increased compared to the control. Injection of podxl CRISPR/Cas constructs into these mutants had no effect on HSC number, suggesting that the knockdown phenotype is not due to off-target effects but instead that the mutants are somehow compensating for the loss of podxl. The authors then present multiple lines of evidence suggesting that compensation is not exclusively due to transcriptional adaptation: evidence of mRNA instability and nonsense-mediated decay was observed in some but not all mutants; expression of the related gene endoglycan (endo) was unchanged in the mutants and endo knockdown had no effect on HSC numbers; and expression profiling by RNA sequencing did not reveal changes in other genes that share sequence similarity with podxl. Instead, their RNA-seq data showed hundreds of differentially expressed genes, especially ECM-related genes, suggesting that compensation in podxl mutants is complex and multi-genic.

      Strengths:

      The data presented is impressively thorough, especially in its characterization of the 5 different podxl alleles and exploration of whether these mutants exhibit transcriptional adaptation.

      Thank you very much for appreciating the hard work that went into this manuscript.

      Weaknesses:

      RNA sequencing expression profiling was done on adult livers. However, compensation of HSC numbers is apparent by 6 dpf, suggesting compensatory mechanisms would be active at larval or even embryonic stages. Although possible, it's not clear that any compensatory changes in gene expression would persist to adulthood.

      This reviewer makes an excellent point. Our finding that the largest changes in gene expression were in extracellular matrix (ECM) genes, together with the fact that ECM modulation is a major function of HSCs, supports the hypothesis that genetic compensation is occurring in adults. Nonetheless, we agree that compensatory changes in adults may not fully reflect the compensatory changes during development, so it would bolster the conclusions of the paper to perform the RNA sequencing and qPCR experiments on zebrafish larval livers.

      We tried very hard to do this experiment proposed by Reviewer #3. In our hands, obtaining sufficient high-quality RNA for robust gene expression analysis typically requires pooling of ~10-15 larval livers. These larvae need to be obtained from a heterozygous in-cross in order to have matched wildtype sibling controls. Livers must be dissected from freshly euthanized (not fixed) zebrafish. Thus, this experiment requires genotyping live, individual larvae from a small amount of tissue (without sacrificing the larvae) before dissecting and pooling the livers. Unfortunately we were unable to confidently and reproducibly genotype individual live podxl larvae with these small amounts of tissue despite trying multiple approaches. Therefore we were not able to perform gene expression analysis on podxl mutant larval livers.

    1. eLife Assessment

      In this important study, the authors set out to determine the molecular interactions between the AQP2 from Trypanosoma brucei (TbAQP2) and the trypanocidal drugs pentamidine and melarsoprol in order to clarify the origins of clinically observed drug resistance and facilitate future drug design. Using cryo-EM, molecular dynamics simulations, and lysis assays, the authors present a solid theory for how drug resistance mutations in TbAQP2 prevent drug uptake. Overall, even though a few methodological issues still need minor clarification, this study will be of interest to those working on aquaporins and the development of drugs targeting aquaporins.

    2. Reviewer #1 (Public review):

      This study presents cryoEM-derived structures of the Trypanosome aquaporin AQP2, in complex with its natural ligand, glycerol, as well as two trypanocidal drugs, pentamidine and melarsoprol, which use AQP2 as an uptake route. The structures are high quality, and the density for the drug molecules is convincing, showing a binding site in the centre of the AQP2 pore.

      The authors then continue to study this system using molecular dynamics simulations. Their simulations indicate that the drugs can pass through the pore and identify a weak binding site in the centre of the pore, which corresponds with that identified through cryoEM analysis. They also simulate the effect of drug resistance mutations, which suggests that the mutations reduce the affinity for drugs and therefore might reduce the likelihood that the drugs enter into the centre of the pore, reducing the likelihood that they progress through into the cell.

      While the cryoEM and MD studies are well conducted, it is a shame that the drug transport hypothesis was not tested experimentally. For example, did they do cryoEM with AQP2 carrying drug resistance mutations and check whether the drugs are visible in these maps? They might not bind, but another possibility is that the binding site shifts, as seen in Chen et al. Do they have an assay for measuring drug binding? I think that some experimental validation of the drug binding hypothesis would strengthen this paper. Without this, I would recommend that the authors soften the statement of their hypothesis (i.e., lines 65-68), as this has not been experimentally validated.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present 3.2-3.7 Å cryo-EM structures of Trypanosoma brucei aquaglyceroporin-2 (TbAQP2) bound to glycerol, pentamidine, or melarsoprol and combine them with extensive all-atom MD simulations to explain drug recognition and resistance mutations. The work provides a persuasive structural rationale for (i) why positively selected pore substitutions enable diamidine uptake, and (ii) how clinical resistance mutations weaken the high-affinity energy minimum that drives permeation. These insights are valuable for chemotherapeutic re-engineering of diamidines and aquaglyceroporin-mediated drug delivery.

      My comments are on the MD part.

      Strengths:

      The study

      (1) Integrates complementary cryo-EM, equilibrium, applied voltage MD simulations, and umbrella-sampling PMFs, yielding a coherent molecular-level picture of drug permeation.

      (2) Offers direct structural rationalisation of long-standing resistance mutations in trypanosomes, addressing an important medical problem.

      Weaknesses:

      Unphysiological membrane potential. A field of 0.1 V nm⁻¹ (~1 V across the bilayer) was applied to accelerate translocation. From the traces (Figure 1c), it can be seen that the translocation occurred really quickly through the channel, suggesting that the field might have introduced some large changes in the protein. The authors state that they checked visually for this, but some additional analysis, especially of the residues next to the drug, would be welcome.

      Based on applied voltage simulations, the authors argue that the membrane potential would help get the drug into the cell, and that a high value of the potential was applied merely to speed up the simulation. At the same time, the barrier for translocation from PMF calculations is ~40 kJ/mol for WT. Is the physiological membrane voltage enough to overcome this barrier in a realistic time? In this context, I do not see how much value the applied voltage simulations have, as one can estimate the work needed to translocate the substrate on PMF profiles alone. The authors might want to tone down their conclusions about the role of membrane voltage in the drug translocation.
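
      The comparison raised here can be sketched with a back-of-the-envelope estimate. The snippet below computes the maximal electrical work, W = zFV, available to a ligand of charge +2 that crosses the full potential drop, and compares it with the ~40 kJ/mol PMF barrier quoted above. The 0.1 V "physiological" value is an illustrative assumption, not a number taken from the manuscript, and the function and variable names are hypothetical.

```python
FARADAY = 96485.33212  # Faraday constant in C/mol

PMF_BARRIER_KJ = 40.0  # approximate WT barrier quoted in the review (kJ/mol)
LIGAND_CHARGE = 2      # pentamidine modelled as +2


def electrical_work_kj_per_mol(charge: int, voltage_v: float) -> float:
    """Maximal work (kJ/mol) done on a charge crossing a potential drop."""
    return charge * FARADAY * voltage_v / 1000.0


# Assumed voltages: 0.1 V is illustrative; ~1 V is the simulated value.
for label, voltage in (("assumed physiological, 0.1 V", 0.1),
                       ("simulated, ~1 V", 1.0)):
    work = electrical_work_kj_per_mol(LIGAND_CHARGE, voltage)
    verdict = "exceeds" if work > PMF_BARRIER_KJ else "is below"
    print(f"{label}: {work:.1f} kJ/mol, which {verdict} the barrier")
```

      Under these assumptions, a 0.1 V potential supplies at most about 19 kJ/mol, roughly half of the barrier, whereas ~1 V supplies far more than the barrier. This is an upper bound, because the ligand need not experience the full potential drop between the bound state and the top of the barrier.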

      Pentamidine charge state and protonation. The ligand was modeled as +2, yet pKa values might change with the micro-environment. Some justification of this choice would be welcome.

      I don't follow the RMSD calculations. The authors state that this RMSD is small for the substrate and show plots in Figure S7a, with the bottom plot being presumably done for the substrate (the legends are misleading, though), levelling off at ~0.15 nm RMSD. However, in Figure S7a, we see one trace (light blue) deviating from the initial position by more than 0.2 nm - that would surely result in an RMSD larger than 0.15 nm, but this is somehow not reflected in the RMSD plots.
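
      One way to check whether the two plots are mutually consistent is to compute the RMSD with and without removing the ligand's overall translation. The sketch below (pure Python, invented coordinates in nm, hypothetical function names) shows that a rigid 0.2 nm shift gives an RMSD of 0.2 nm in a fixed frame, but essentially zero once each frame is centred on the ligand's own centroid. Which frame the authors used is not stated here, so this is only one possible explanation and not a description of their protocol.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized coordinate sets."""
    squared = sum((a - b) ** 2
                  for point_a, point_b in zip(coords_a, coords_b)
                  for a, b in zip(point_a, point_b))
    return math.sqrt(squared / len(coords_a))


def centred(coords):
    """Return the coordinates with their centroid moved to the origin."""
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    return [[p[i] - centroid[i] for i in range(3)] for p in coords]


# Invented three-atom "ligand" (nm) and the same ligand shifted 0.2 nm along z.
reference = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.1]]
shifted = [[x, y, z + 0.2] for x, y, z in reference]

fixed_frame = rmsd(reference, shifted)                     # translation kept
ligand_frame = rmsd(centred(reference), centred(shifted))  # translation removed

print(f"fixed frame: {fixed_frame:.3f} nm, ligand frame: {ligand_frame:.3f} nm")
```

      If the reported RMSD was computed after superposing on the ligand itself, a positional drift of the kind seen in the light blue trace would not appear in it; stating the fitting group in the legend would resolve the ambiguity.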

    4. Reviewer #3 (Public review):

      Summary:

      Recent studies have established that trypanocidal drugs, including pentamidine and melarsoprol, enter the trypanosomes via the aquaglyceroporin AQP2 (TbAQP2). Interestingly, drug resistance in trypanosomes is, at least in part, caused by recombination with the neighbouring gene, AQP3, which is unable to permeate pentamidine or melarsoprol. The effect of the drugs on cells expressing chimeric proteins is significantly reduced. In addition, controversy exists regarding whether TbAQP2 permeates drugs like an ion channel, or whether it serves as a receptor that triggers downstream processes upon drug binding. In this study the authors set out to achieve three objectives:

      (1) to determine if TbAQP2 acts as a channel or a receptor,

      (2) to understand the molecular interactions between TbAQP2 and glycerol, pentamidine, and melarsoprol, and

      (3) to determine the mechanism by which mutations that arise from recombination with TbAQP3 result in reduced drug permeation.

      Indeed, all three objectives are achieved in this paper. Using MD simulations and cryo-EM, the authors determine that TbAQP2 likely permeates drugs like an ion channel. The cryo-EM structures provide details of glycerol and drug binding, and show that glycerol and the drugs occupy the same space within the pore. Finally, MD simulations and lysis assays are employed to determine how mutations in TbAQP2 reduce drug permeation by making entry and exit of the drug energetically more costly. Overall, the strength of evidence used to support the authors' claims is solid.

      Strengths:

      The cryo-EM portion of the study is strong, and while the overall resolution of the structures is in the 3.5 Å range, the local resolution within the core of the protein and the drug binding sites is considerably higher (~2.5 Å).

      I also appreciated the MD simulations on the TbAQP2 mutants and the mechanistic insights that resulted from this data.

      Weaknesses:

      (1) The authors do not provide any empirical validation of the drug binding sites in TbAQP2. While the discussion mentions that the binding site should not be thought of as a classical fixed site, the MD simulations show that there's an energetically preferred slot (i.e., high occupancy interactions) within the pore for the drugs. For example, mutagenesis and a lysis assay could provide us with some idea of the contribution/importance of the various residues identified in the structures to drug permeation. This data would also likely be very valuable in learning about selectivity for drugs in different AQP proteins.

      (2) Given the importance of AQP3 in the shaping of AQP2-mediated drug resistance, I think a figure showing a comparison between the two protein structures/AlphaFold structures would be beneficial and appropriate.

      (3) A few additional figures showing cryo-EM density, from both full maps and half maps, would help validate the data.

      (4) Finally, this paper might benefit from including more comparisons with and analysis of data published in Chen et al (doi.org/10.1038/s41467-024-48445-4), which focus on similar objectives. Looking at all the data in aggregate might reveal insights that are not obvious from either paper on their own. For example, melarsoprol binds differently in structures reported in the two respective papers, and this may tell us something about the energy of drug-protein interactions within the pore.

    1. eLife assessment

      This valuable manuscript presents findings supported by solid data to identify a surprising glia-exclusive function for betapix in vascular integrity and angiogenesis. The manuscript also describes the optimisation of a modified CRISPR-based Zwitch approach to generate conditional knockouts in zebrafish.

    2. Reviewer #1 (Public review):

      The manuscript by Chiu et al describes the modification of the Zwitch strategy to efficiently generate conditional knockouts of zebrafish betapix. They leverage this system to identify a surprising glia-exclusive function of betapix in mediating vascular integrity and angiogenesis. Betapix has been previously associated with vascular integrity and angiogenesis in zebrafish, and betapix function in glia has also been proposed. However, this study identifies glial betapix in vascular stability and angiogenesis for the first time.

      The study derives its strength from the modified CRISPR-based Zwitch approach to identify the specific role of glial betapix (and not neuronal, mural, or endothelial). Using RNA-in situ hybridization and analysis of scRNA-Seq data, they also identify delayed maturation of neurons and glia and implicate a reduction in stathmin levels in the glial knockouts in mediating vascular homeostasis and angiogenesis. The study also implicates a betapix-zfhx3/4-vegfa axis in mediating cerebral angiogenesis.

      There is both technical (the generation of conditional KOs) and knowledge-related (the exclusive role of glial betapix in vascular stability/angiogenesis) novelty in this work that is going to benefit the community significantly.

      While the text is well written, it often elides details of experiments and relies on implicit understanding on the part of the reader. Similarly, the figure legends are laconic and often fail to provide all the relevant details.

      Specific comments:

      (1) While the evidence from cKOs implicating glial betapix in vascular stability/angiogenesis is exciting, glia-specific rescue of betapix in the global KOs/mutants (like those performed for stathmin) would be necessary to make a water-tight case for glial betapix.

      (2) Splice variants of betapix have been shown to have differential roles in haemorrhaging (Liu, 2007). What are the major glial isoforms, and are there specific splice variants in the glia that contribute to the phenotypes described?

      (3) Liu et al, 2012 demonstrated reduced proliferation of endothelial cells in bbh fish and linked it to deficits in angiogenesis. Are there proliferation/survival defects in endothelial cells in the glial KOs?

    3. Reviewer #2 (Public review):

      Summary:

      Using a conditional gene-trap model of beta-pix, the authors are able to control the spatio-temporal depletion of beta-pix, a gene with an established role in maintaining vascular integrity (shown elsewhere). This study provides strong in vivo evidence that glial beta-pix is essential to the development of the blood-brain barrier and to maintaining vascular integrity. Using genetic and biochemical approaches, the authors show that PAK1 and Stathmins are in the same signaling axis as beta-pix and act downstream of it, potentially regulating cytoskeletal remodeling and controlling glial migration. How exactly the glial-specific (beta-pix-driven) signaling influences angiogenesis or vascular integrity is not clear.

      Strengths:

      (1) Development of a conditional gene-trap model that allows tracking of a knock-in reporter driven by the endogenous promoter and also allows gene knockdown. This genetic model enabled the authors to address the scientific questions they were interested in, i.e., a) tracking expression of the beta-pix gene, and b) deleting the beta-pix gene in a cell-specific manner.

      (2) The study reveals the glial-specific role of beta-pix, which was previously unknown. This opens up avenues for further research. (For instance, how do such (multiple) cell-specific signaling pathways converge onto the endothelial cells that build the central artery and maintain the blood-brain barrier?)

      Weaknesses:

      Major:

      (1) The study clearly establishes a role of beta-pix in glial cells, which regulates the length of the central artery and keeps the hemorrhages under control. Nevertheless, it is not clear how this is accomplished.

      a. Is this phenotype (hemorrhage) a result of the direct interaction of glial cells and the adjacent endothelial cells? If direct, is the communication established through junctions or through secreted molecules?

      b. The authors do not exclude the possibility that the effects observed on endothelial cells (quantified as length of central artery) could be secondary to the phenotype observed with deletion of glial beta-pix. For instance, can glial beta-pix regulate angiogenic factors secreted by peri-vascular cells, which consequently regulate the length of the central artery or vascular integrity?

      c. The pictorial summary of the findings (Figure 7) does not include Zfhx or Vegfa. The data do not provide clarity on how these molecules contribute (directly or indirectly) to endothelial cell integrity. Vegfaa is expressed in the central artery, but the expression of the receptor in these endothelial cells is not shown. Similarly, all other experimental analyses for Zfhx and Vegfa expression were performed in glial cells. More experimental evidence is necessary to show the regulation of angiogenesis (of endothelial cells) by glial beta-pix. Is the Vegfaa receptor present on central arteries, and how does glial depletion of beta-pix affect its expression or the response of central artery endothelial cells (both pertaining to angiogenesis and vascular integrity)?

      (2) Microtubule stabilization via glial beta-pix, claimed in Figure 5M, is unclear. Magnified images for h-betapix OE and h-stmn-1 glial cells are absent. Is this migration regulated by beta-pix through its GEF activity for Cdc42/Rac?

      (3) Hemorrhages are caused by compromised vascular integrity, which was not measured (either qualitatively or quantitatively) throughout the manuscript. The authors do measure the length of the central artery in several gene deletion models (2I, 3C, 5F/J, 6G/K), which is indicative of artery growth/angiogenesis. How (if at all) defects in angiogenesis are an indication of hemorrhage should be explained or established. Do these angiogenic growth defects translate into junctional defects at later developmental timepoints? Formation and maintenance of endothelial cell junctions within the hemorrhaging arteries should be assessed in fish with beta-pix deleted from astrocytes.

      (4) More information is required about the quality control steps for 10X sequencing (Figure 4, number of cells, reads, etc.). What steps were taken to validate the data quality? The EC groups at 1 and 2 days post-KO are not visible in 4C. One appreciates that the progenitor group is affected the most 2 days post-KO. But since the effects are expected to be on the endothelial cell group as well (which is shown in in vivo data), an extensive analysis should be done on the EC group (like markers for junctional integrity, angiogenesis, mesenchymal interaction, etc.). Are Stathmins limited to glial cells? Are there indicators for angiogenic responses in endothelial cells?

    1. eLife Assessment

      This useful study provides a spatial transcriptomic analysis of the mouse adrenal gland that could have implications for future research and applications. The authors present solid results that allow the dissection of the cell signalling pathways and cellular composition of different zones of the adrenal glands in the mouse model; they propose new zone-specific gene markers and specific intra- and inter-zonal signaling pathways based on receptor-ligand expression patterns. Their web tool is user-friendly and will be helpful for adrenal scientists; however, validation of crucial results from the large dataset is necessary. There are also several contradictory results/interpretations, and the study misses the opportunity to dissect the sexually dimorphic gene expression pattern and mouse-human interspecies differences.

    2. Reviewer #1 (Public review):

      Summary:

      This study employs spatial transcriptomics to explore the molecular architecture of the adult mouse adrenal gland and the adjacent adipose tissue. The research aimed to identify zonation-specific genetic markers, elucidate cellular differentiation patterns, and investigate inter- and intra-zone communication within the adrenal gland. The findings support the centripetal differentiation model, highlighting the transition of cell populations across different cortical zones. The study also integrates ligand-receptor interaction analysis to uncover the adrenal gland's role in endocrine and neuroendocrine signaling, particularly in stress response. This high-resolution spatial transcriptomic map provides novel insights into adrenal gland biology and is a resource for further investigations.

      Strengths:

      Using current technologies and methods, including Visium CytAssist, UMAP and Seurat analysis, Gene Ontology (GO) and KEGG pathway enrichment analysis, Monocle3, and CellChat, the study performed a three-dimensional analysis, which has been challenging to achieve with the two-dimensional transcriptomics commonly used until now.

      Spatial transcriptomics demonstrated unique gene expression patterns for each adrenal zone (ZG, ZF, ZX, medulla). The centripetal differentiation model shows the migration of the progenitor cells from the adrenal capsule towards the inner cortex. Key genetic markers were identified in each adrenal zone and adjacent adipose tissues. In addition, CellChat analysis identified major signaling pathways, including Wnt signaling, Hedgehog signaling, IGF2-IGF2R interactions, and Neuropeptide Y (NPY) signaling in the medulla. All these results offer a valuable dataset for future adrenal biology research, with potential applications in disease modeling and therapeutic target identification.

      The results (high-resolution mapping of adrenal gland zonation, validation of the centripetal differentiation model, a perspective on cell-cell communication, and potential translational impact on human adrenal gland function and disorders) are quite notable.

      Weaknesses:

      The reviewer requests that the following issues be addressed in the text:

      (1) The study focuses only on adult male mice, which limits insights into developmental and sex-specific differences. What do the authors predict about sex and age differences?

      (2) Despite advanced methodologies, single-cell heterogeneity may not be fully captured, as Visium technology has limited spatial resolution.

      (3) While the study suggests that ZX might have a role in androgen synthesis, further functional validation is required.

      (4) The study is primarily descriptive, lacking in-depth mechanistic experiments to validate cell-cell communication interactions. It is quite interesting to suggest cell-cell communication, but the authors are still required to provide some evidence to support it.

      (5) The data supports the conclusions, particularly in validating the centripetal differentiation model using Monocle3 trajectory analysis. However, functional validation experiments (e.g., gene knockout studies) would strengthen the findings, especially regarding ZX function and ligand-receptor interactions.

    3. Reviewer #2 (Public review):

      This study by M. Blatkiewicz et al. seeks to define the spatial gene expression pattern of the adult male mouse adrenal gland using current spatial transcriptomic techniques. They propose new zone-specific gene markers and specific intra- and inter-zonal signaling pathways based on receptor-ligand expression patterns. Their web tool is user-friendly and will be helpful for adrenal scientists. The manuscript is easy to follow, but validation of crucial results from the large dataset is missing. There are also several contradictory results/interpretations, and the study misses the opportunity to dissect the sexually dimorphic gene expression pattern and mouse-human interspecies differences.

      (1) The authors used 10-week-old CD1 male mouse adrenal glands to assess the spatial transcriptomics of the adrenal gland. As they also mentioned, male mice typically lose their zone-X after puberty (around 6-8 weeks of age). However, their analysis in 10-week-old mice suggests that zone-X covers most of the adrenal cortex. As shown in Figure 3A, the dots between the zona glomerulosa and the medulla are mostly positive for zone-X, which would suggest that the zona fasciculata represents a relative minority of the overall adult adrenal cortex. Is this correct? Is the presence of zone-X in sexually mature adult male mice unique to the CD1 strain? Providing histology data in support of this conclusion, using zone-specific markers combined with RNA in situ hybridization or immunofluorescence techniques in the CD1 male adrenal gland, would help to interpret these data further. Given the relatively low resolution of their gene expression profiles, it is possible there is overlap between the zona fasciculata and the zone-X.

      (2) The pseudotime trajectory analysis confirms prior reports in the literature showing zonal transdifferentiation but does not provide novel insight. It would be nice to know what gene expression patterns correlate (positively or negatively) based on an unbiased analysis.

      (3) The authors suggest that they identified new zonal markers, but it would be nice to see confirmation of some of these markers (e.g., Frmpd4, Oca2, Sphkap for the ZG or Cited1, Nat8f5 for the ZF, etc. ) with in situ or immunofluorescence combined with known markers such as Dab2, Cyp11b2, or Cyp11b1.

      (4) The authors mention a gradual transition between the zones. It would be interesting to know whether transition zones exist between the zona glomerulosa and the zona fasciculata or the zona fasciculata and the zone-X.

      (5) The authors note using Visium CytAssist, but they do not discuss the advantages of this system compared to other systems. Explanation of the approximate resolution of their analysis (e.g., how many cells were pooled per spot) would help readers to interpret their data. It would also be nice to compare it to other spatial transcriptomic analyses of human adrenals, given the differences between the zonation of human and mouse adrenals.

      (6) Interestingly, CellChat analysis suggests possible communication between the medulla and the zona fasciculata and zona glomerulosa. How do the authors explain the transfer of these molecules from the medulla to the outer zones given centripetal blood flow in the adrenal? Also, how does the fact that Igf2 has been shown to be expressed in the capsule (PMID: 22266195) affect the interpretation of their data?

      (7) The study misses the opportunity to dissect sexually dimorphic gene expression patterns in the mouse adrenal. For example, the authors could have focused on the role of stem cells between male and female mouse adrenals, which have been reported to differ (PMID: 31104943). In addition, the authors could have focused on the sexually dimorphic zone-X and its regulation by sex hormone signaling.

      (8) The capsule is classified as a connective tissue, which may be misleading given its important role as a signaling center in the adrenal. Genes enriched in typical connective tissues do not include many of the genes that seem to define the adrenal capsule. Also, some of the capsule markers appear to be found in the zona glomerulosa. Is this a result of not being able to fully resolve the small layer of zG cells and the even smaller layer of capsular cells? Guided reclustering of the cells based on known markers and separation of capsule and connective tissue might help to present their data on adrenal zonation more clearly.

    4. Reviewer #3 (Public review):

      Summary:

      In summary, the scientists used Visium spatial transcriptomics technology to create a thorough spatial transcriptomic atlas of the adult male mouse adrenal gland and the adipose tissues that surround it. Their primary goals were to map the cell communication network, determine the differentiation direction of various cell types, and find marker genes for various adrenal zones.

      Strengths:

      (1) Undoubtedly, one of the biggest strengths of the manuscript is a spatial transcriptomic analysis of mouse adrenal gland tissue, which, to my knowledge, has not been done before.

      (2) Comprehensive Zonal Characterization: Seven distinct clusters were identified, corresponding to known anatomical and functional regions (ZG, ZF, ZX, medulla, connective tissue, brown and white adipose tissue), each with robust marker gene sets.

      (3) The authors manage to integrate advanced bioinformatic tools such as CellChatDB, Monocle3, and CARD to study the relationship between cell types and differentiation of the tissue.

      (4) The authors manage to identify novel marker genes for some adrenal zones.

      Weaknesses:

      (1) The study focused only on one adult male CD1 IGS mouse, which is a limiting factor for other strains, ages, or females, especially given the sexual dimorphism of the ZX. Although the authors claim that four slices of the adrenal gland have been processed on Visium and sequenced, for "clarity," they show only one, which might bias the results.

      (2) Lack of detailed QC analysis of the Visium slide.

      (3) The study misses the functional validation of the novel marker genes - this needs to be addressed.

      (4) What worries me a lot is the fact that, actually, there might be more than one cell present within a Visium spot, so the only way to define zones is by anatomical observation rather than cellular composition.

      (5) In the CellChat analysis, the authors show the strength of the interactions but omit the number of interactions.

      Conclusions:

      The authors' stated goals were mostly accomplished:

      By mapping the mouse adrenal gland's molecular landscape, they were able to clearly establish unique molecular signatures for every anatomical zone.

      Pseudotime study of the cell progression from the capsule through ZG, ZF, and ZX demonstrates that the data strongly support the centripetal differentiation concept. Conclusions on the functional importance of newly discovered marker genes are conjectural and need additional experimental support.

      Nevertheless, several findings are still tentative and will need more experimental support, especially when it comes to the significance of ZX persistence and the functional involvement of recently discovered marker genes.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study investigated how individuals living in urban slums in Salvador, Brazil, interact with environmental risk factors, particularly focusing on domestic rubbish piles, open sewers, and a central stream. The study applies step selection functions to telemetry data, a method that estimates how likely individuals are to move towards these environmental features, and differentiates among groups by gender, age, and leptospirosis serostatus. The results indicated that women tended to stay closer to the central stream while avoiding open sewers more than men. Furthermore, individuals who tested positive for leptospirosis tended to avoid open sewers, suggesting that behavioral patterns might influence exposure to risk factors for leptospirosis, which could inform more targeted interventions.
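
      For readers unfamiliar with the method, the core of a step selection function can be sketched as a conditional logit: within each stratum, the observed step is compared with randomly generated available steps, and each step is weighted by exp(beta · x). The snippet below is a minimal illustration under stated assumptions, not the authors' implementation; the function name, the single covariate (distance to the nearest open sewer, in metres), and all numbers are invented.

```python
import math


def step_selection_probabilities(covariates, beta):
    """Conditional-logit probability of each candidate step in one stratum.

    covariates: one row of covariate values per candidate step.
    beta: one selection coefficient per covariate.
    """
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# One stratum: the observed step (first row) plus three available steps.
# Covariate: invented distance (m) from the step endpoint to an open sewer.
stratum = [[5.0], [20.0], [35.0], [50.0]]
beta = [-0.05]  # hypothetical coefficient; negative favours shorter distances

probs = step_selection_probabilities(stratum, beta)
print([round(p, 3) for p in probs])
```

      With distance as the covariate, a negative coefficient corresponds to selection towards the feature and a positive coefficient to avoidance; a coefficient of zero makes all candidate steps equally likely. In practice the coefficients are estimated by maximising this likelihood over all strata.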

      Strengths:

      (1) The use of step selection functions to analyze human movement represents an innovative adaptation of a method typically used in animal ecology. This provides a robust quantitative framework for evaluating how people interact with environmental risk factors linked to infectious diseases (in this case, leptospirosis).

      (2) Detailed differentiation by gender and serological status allows for nuanced insights, which can help tailor targeted interventions and potentially improve public health measures in urban slum settings.

      (3) The integration of real-world telemetry data with epidemiological risk factors supports the development of predictive models that can be applied in future infectious disease research, helping to bridge the gap between environmental exposure and health outcomes.

      Weaknesses:

      (1) The sample size for the study was not calculated, although it was a nested cohort study.

      We thank Reviewer #1 for highlighting this weakness. We will make sure that this is explained in the next version of the manuscript. At the time of recruiting participants, we found no literature on how to perform a sample size calculation for movement studies involving GPS loggers and associated methods of analysis. Therefore, we aimed to recruit as many individuals as possible within the resource constraints of the study.

      (2) The step‐selection functions, though a novel method, may face challenges in fully capturing the complexity of human decision-making influenced by socio-cultural and economic factors that were not captured in the study.

      We agree with Reviewer #1 that this model may fail to capture the full breadth of human decision-making when it comes to moving through local environments. We included a section discussing the aspect of violence and how this influences residents’ choices, along with some possibilities on how to record and account for this. Although it is outside of the scope of this study, we believe that coupling these quantitative methods with qualitative studies would provide a comprehensive understanding of movement in these areas.

      (3) The study's context is limited to a specific urban slum in Salvador, Brazil, which may reduce the generalizability of its findings to other geographical areas or populations that experience different environmental or socio-economic conditions.

      (4) The reliance on self-reported or telemetry-based movement data might include some inaccuracies or biases that could affect the precision of the selection coefficients obtained, potentially limiting the study's predictive power.

      We agree that telemetry data has inherent inaccuracies, which we have tried to account for by using only those data points within the study areas. We would like to clarify that there is no self-reported movement data used in this study. All movement data was collected using GPS loggers.

      (5) Some participants with fewer than 50 relocations within the study area were excluded without clear justification, see line 149.

      We found that the SSF models would not run properly if there weren’t enough relocations. Therefore, we decided to remove these individuals from the analysis. They are also removed from any descriptive statistics presented.

      (6) Some figures are not clear (see Figure 4 A & B).

      We will try to improve the quality of this figure in the next version of the manuscript.

      (7) No statement on conflict of interest was included, considering sponsorship of the study.

      The conflict-of-interest forms for each author were sent to eLife separately. I believe these should be made available upon publication, but please reach out if these need to be re-sent.

      Reviewer #2 (Public review):

      Summary:

      Pablo Ruiz Cuenca et al. conducted a GPS logger study with 124 adult participants across four different slum areas in Salvador, Brazil, recording GPS locations every 35 seconds for 48 hours. The aim of their study was to investigate step-selection models, a technique widely used in movement ecology to quantify contact with environmental risk factors for exposure to leptospires (open sewers, community streams, and rubbish piles). The authors built two different types of models based on distance and based on buffer areas to model human environmental exposure to risk factors. They show differences in movement/contact with these risk factors based on gender and seropositivity status. This study shows the existence of modest differences in contact with environmental risk factors for leptospirosis at small spatial scales based on socio-demographics and infection status.

      Strengths:

      The authors assembled a rich dataset by collecting human GPS logger data, combined with field-recorded locations of open sewers, community streams, and rubbish piles, and testing individuals for leptospirosis via serology. This study was able to capture fine-scale exposure dynamics within an urban environment and shows differences by gender and seropositive status, using a method novel to epidemiology (step selection).

      Weaknesses:

      Due to environmental data being limited to the study area, exposure elsewhere could not be captured, despite previous research by Owers et al. showing that the extent of movement was associated with infection risk. Limitations of step selection for use in studying human participants in an urban environment would need to be explicitly discussed.

      The environmental factors used in the study required research teams to visit the sites and map the locations. Given that individuals travelled throughout the city of Salvador, performing this task at a large scale would be unachievable. Therefore, we limited the data to only those points within the study area boundaries to avoid any biases from interactions with unrecorded environmental factors. We will be including a more explicit discussion of the limitations of SSF in urban environmental settings with human participants in the next version of the manuscript.

    1. eLife Assessment

      This valuable study asks how the neural representation of individual finger movements changes during the early periods of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide solid evidence of an early, swift change in the brain regions correlated with sequence learning, including a set of previously unreported frontal cortical regions. The authors also show that offline contextualization during short rest periods is the basis for improved performance. Further confirmation of these results on multiple movement sequences would further strengthen the key claims.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements, and achieve an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain-machine interfaces: one can now decode individual elements within a sequence with high precision, but these representations are not static and instead develop over the course of learning.

      Strengths:

      The work follows from a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most, of the gains in behaviour (i.e., speed of finger movements) occur in these so-called micro-offline rest periods.

      The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%.

    3. Reviewer #2 (Public review):

      Summary:

      The current paper consists of two parts. The first part is the rigorous feature optimization of the MEG signal to decode individual finger identity performed in a sequence (4-1-3-2-4; 1~4 corresponds to little~index fingers of the left hand). By optimizing various parameters for the MEG signal, in terms of (i) reconstructed source activity in voxel- and parcel-level resolution and their combination, (ii) frequency bands, and (iii) time window relative to press onset for each finger movement, as well as the choice of decoders, the resultant "hybrid decoder" achieved extremely high decoding accuracy (~95%).

      In the second part of the paper, armed with the successful 'hybrid decoder,' the authors asked how neural representation of individual finger movement that is embedded in a sequence, changes during a very early period of skill learning and whether and how such representational change can predict skill learning. They assessed the difference in MEG feature patterns between the first and the last press 4 in sequence 41324 at each training trial and found that the pattern differentiation progressively increased over the course of early learning trials. Additionally, they found that this pattern differentiation specifically occurred during the rest period rather than during the practice trial. With a significant correlation between the trial-by-trial profile of this pattern differentiation and that for accumulation of offline learning, the authors argue that such "contextualization" of finger movement in a sequence (e.g., what-where association) underlies the early improvement of sequential skill. This is an important and timely topic for the field of motor learning and beyond.

      Strengths:

      The use of temporally rich neural information (MEG signal) has a significant advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. The finding of the early "contextualization" of the finger movement in a sequence and its correlation to early (offline) skill improvement is interesting and important. The comparison between "online" and "offline" pattern distance is a neat idea.

      Weaknesses:

      One potential weakness, in terms of generality, is that the study assessed a single sequence ("41324") across all participants. Future confirmation using different sequences would be important.

    4. Reviewer #3 (Public review):

      Summary:

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multi-scale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements. Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training, and correlates with a performance metric which the authors interpret as an indicator of offline learning.

      Strengths:

      A strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of concurrent distribution of neural coding across local circuits as well as large-scale networks.

      Weaknesses:

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, which partly arise from the experimental design, and which are described below, question the neurobiological implications proposed by the authors, and offer a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence casts doubt on this assumption.

      Specifically:

      The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence, and test the classifier on other sequences that require the same movements, but in different positions (Kornysheva et al., Neuron 2019). In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4). As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 3 - supplement 5 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the keypress, up to at least ±100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. Currently, the manuscript provides little evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context.

      During the review process, the authors pointed out that a "mixing" of temporally overlapping information from consecutive keypresses, as described above, should result in systematic misclassifications and therefore be detectable in the confusion matrices in Figures 3C and 4B, which indeed do not provide any evidence that consecutive keypresses are systematically confused. However, such absence of evidence (of systematic misclassification) should be interpreted with caution. The authors also reported that there was only a weak relation between inter-press intervals and "online contextualization" (Figure 5 - figure supplement 6); however, their analysis surprisingly includes a keypress transition that is shared between OP1 and OP5 ("4-4"), rather than focusing solely on the two distinctive transitions ("2-4" and "4-1").

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time, and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. During the review process, the authors pointed to the absence of evidence of a relation between tapping speed and "ordinal coding" (Figure 5 - figure supplement 7). However, a rigorous test of the idea that the mental representation of context changes would require a task design in which the physical context remains constant.

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence, but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses.

      A further complication in interpreting the results stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen. It is not clear why the authors introduced this complicating visual feedback in their task, besides consistency with their previous studies. The resulting systematic link between the pattern of visual stimulation (the number of asterisks on the screen) and the ordinal position of a keypress makes the interpretation of "contextual information" that differentiates between ordinal positions difficult. While the authors report the surprising finding that their eye-tracking data could not predict asterisk position on the task display above chance level, the mean gaze position seemed to vary systematically as a function of ordinal position of a movement - see Figure 4 - figure supplement 3.

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, to reach the conclusion that "the degree of representational differentiation -particularly prominent over rest intervals - correlated with skill gains.", the critical question is rather whether "offline differentiation" correlates with micro-offline gains (not with cumulative micro-offline gains). That is, does the degree to which representations differentiate "during" a given rest period correlate with the degree to which performance improves from before to after the same rest period (not: does "offline differentiation" in a given rest period correlate with the degree to which performance has improved "during" all rest periods up to the current rest period - but this is what Figure 5 - figure supplements 1 and 4 show).
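
      The distinction drawn above, between correlating with per-rest gains and correlating with cumulative gains, can be illustrated with a minimal sketch. All numbers below are made-up toy values, not data from the study, and the helper `pearson_r` and the variable names are hypothetical. The sketch only shows that a measure which rises across trials can correlate strongly with a running sum of gains while being unrelated (here, negatively related) to the per-rest gains themselves:

```python
import math
from itertools import accumulate

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy values (hypothetical): one entry per rest period.
offline_differentiation = [1, 2, 3, 4, 5, 6]   # rises steadily across trials
micro_offline_gain = [3, 1, 2, 1, 2, 1]        # gain over each individual rest
cumulative_gain = list(accumulate(micro_offline_gain))  # running sum of gains

r_per_rest = pearson_r(offline_differentiation, micro_offline_gain)
r_cumulative = pearson_r(offline_differentiation, cumulative_gain)
print(f"per-rest r = {r_per_rest:.3f}, cumulative r = {r_cumulative:.3f}")
```

      In this toy case the cumulative correlation is close to 1 simply because both series increase over trials, which is why the per-rest analysis is the more diagnostic one.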

      The authors follow the assumption that micro-offline gains reflect offline learning. However, there is no compelling evidence in the literature, and no evidence in the present manuscript, that micro-offline gains (during any training phase) reflect offline learning. Instead, emerging evidence in the literature indicates that they do not (Das et al., bioRxiv 2024), and that they reflect transient performance benefits when participants train with breaks, compared to participants who train without breaks; these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024). During the review process, the authors argued that differences in the design between Das et al. (2024) on the one hand (Experiments 1 and 2), and the study by Bönstrup et al. (2019) on the other hand, may have prevented Das et al. (2024) from finding the assumed (lasting) learning benefit by micro-offline consolidation. However, the Supplementary Material of Das et al. (2024) includes an experiment (Experiment S1) whose design closely follows the early learning phase of Bönstrup et al. (2019), and which, nevertheless, demonstrates that there is no lasting benefit of taking breaks for the acquired skill level, despite the presence of micro-offline gains.

      Along these lines, the authors argue that their practice schedule "minimizes reactive inhibition effects", in particular their short practice periods of 10 seconds each. However, 10 seconds are sufficient to result in motor slowing, as reported in Bächinger et al., eLife 2019, or Rodrigues et al., Exp Brain Res 2009.

      An important conceptual problem with the current study is that the authors conclude that performance improves, and representation manifolds differentiate, "during" rest periods. However, micro-offline gains (as well as offline contextualization) are computed from data obtained during practice, not rest, and may, thus, just as well reflect a change that occurs "online", e.g., at the very onset of practice (like pre-planning) or throughout practice (like fatigue, or reactive inhibition).

      The authors' conclusion that "low-frequency oscillations (LFOs) result in higher decoding accuracy compared to other narrow-band activity" should be taken with caution, given that the critical decoding analysis for this conclusion was based on data averaged across a time window of 200 ms (Figure 2), essentially smoothing out higher frequency components.
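
      The smoothing argument above can be made concrete with a short sketch. It assumes a plain, unweighted average of the signal over a 200 ms window, which may differ from the exact feature computation used in the manuscript; the function name `boxcar_gain` and the example frequencies are hypothetical. Under that assumption, a sinusoid of frequency f averaged over a window of length T is attenuated by the factor |sin(πfT) / (πfT)|:

```python
import math

def boxcar_gain(freq_hz, window_s):
    """Amplitude gain of a sinusoid after unweighted averaging over a window."""
    x = math.pi * freq_hz * window_s
    return 1.0 if x == 0 else abs(math.sin(x) / x)

window = 0.2  # 200 ms averaging window, as discussed in the review
gains = {f: boxcar_gain(f, window) for f in (1.0, 3.0, 12.5, 32.5)}
for freq, gain in gains.items():
    print(f"{freq:5.1f} Hz -> gain {gain:.3f}")
```

      Low frequencies pass almost unchanged while components above roughly 10 Hz retain only a small fraction of their amplitude, which is the sense in which such averaging favours low-frequency content.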

    5. Author response:

      The following is the authors’ response to the previous reviews

      Overview of reviewer's concerns after peer review: 

      As for the initial submission, the reviewers' unanimous opinion is that the authors should perform additional controls to show that their key findings may not be affected by experimental or analysis artefacts, and clarify key aspects of their core methods, chiefly:  

      (1) The fact that their extremely high decoding accuracy is driven by frequency bands that would reflect the key press movements and that these are located bilaterally in frontal brain regions (with the task being unilateral) are seen as key concerns, 

      The above statement that decoding was driven by bilateral frontal brain regions is not entirely consistent with our results. The confusion was likely caused by the way we originally presented our data in Figure 2. We have revised that figure to make it clearer that decoding performance at both the parcel- (Figure 2B) and voxel-space (Figure 2C) level is predominantly driven by contralateral (as opposed to ipsilateral) sensorimotor regions. Figure 2D, which highlights bilateral sensorimotor and premotor regions, displays accuracy of individual regional voxel-space decoders assessed independently. This was the criterion used to determine which regional voxel-spaces were included in the hybrid-space decoder. This result is not surprising given that motor and premotor regions are known to display adaptive interhemispheric interactions during motor sequence learning [1, 2], and particularly so when the skill is performed with the non-dominant hand [3-5]. We now discuss this important detail in the revised manuscript:

      Discussion (lines 348-353)

      “The whole-brain parcel-space decoder likely emphasized more stable activity patterns in contralateral frontoparietal regions that differed between individual finger movements [21,35], while the regional voxel-space decoder likely incorporated information related to adaptive interhemispheric interactions operating during motor sequence learning [32,36,37], particularly pertinent when the skill is performed with the non-dominant hand [38-40].”

      We now also include new control analyses that directly address the potential contribution of movement-related artefact to the results.  These changes are reported in the revised manuscript as follows:

      Results (lines 207-211):

      “An alternate decoder trained on ICA components labeled as movement or physiological artefacts (e.g. – head movement, ECG, eye movements and blinks; Figure 3 – figure supplement 3A, D) and removed from the original input feature set during the pre-processing stage approached chance-level performance (Figure 4 – figure supplement 3), indicating that the 4-class hybrid decoder results were not driven by task-related artefacts.”

      Results (lines 261-268):

      “As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41% ± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”
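
      The logic of the label-shuffling control quoted above can be sketched as follows. This is an illustrative toy, not the authors' pipeline: the function `shuffled_label_accuracy`, the number of permutations, and the balanced 5-class toy labels are all assumptions. It shows that even a perfect decoder scores close to 1/5 once the test labels are randomly permuted:

```python
import random

def shuffled_label_accuracy(predictions, labels, n_permutations=1000, seed=0):
    """Mean accuracy of fixed predictions against randomly permuted labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    total = 0.0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between data and labels
        hits = sum(p == y for p, y in zip(predictions, shuffled))
        total += hits / len(shuffled)
    return total / n_permutations

# Toy data (hypothetical): 5 balanced classes, 40 keypresses each.
labels = [k for k in range(5) for _ in range(40)]
predictions = list(labels)  # a decoder that is perfect on the true labels

true_accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
chance_accuracy = shuffled_label_accuracy(predictions, labels)
print(true_accuracy, round(chance_accuracy, 3))
```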

      Discussion (Lines 362-368):

      “Task-related movements—which also express in lower frequency ranges—did not explain these results given the near chance-level performance of alternative decoders trained on (a) artefact-related ICA components removed during MEG preprocessing (Figure 3 – figure supplement 3A-C) and on (b) task-related eye movement features (Figure 4 – figure supplement 3B, C). This explanation is also inconsistent with the minimal average head motion of 1.159 mm (± 1.077 SD) across the MEG recording (Figure 3 – figure supplement 3D).”

      (2) Relatedly, the use of a wide time window (~200 ms) for a 250-330 ms typing speed makes it hard to pinpoint the changes underpinning learning, 

      The revised manuscript now includes analyses carried out with decoding time windows ranging from 50 to 250ms in duration. These additional results are now reported in:

      Results (lines 258-261):

      “The improved decoding accuracy is supported by greater differentiation in neural representations of the index finger keypresses performed at positions 1 and 5 of the sequence (Figure 4A), and by the trial-by-trial increase in 2-class decoding accuracy over early learning (Figure 4C) across different decoder window durations (Figure 4 – figure supplement 2).”

      Results (lines 310-312):

      “Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C).”

      Discussion (lines 382-385):

      “This was further supported by the progressive differentiation of neural representations of the index finger keypress (Figure 4A) and by the robust trial-by-trial increase in 2-class decoding accuracy across time windows ranging between 50 and 250ms (Figure 4C; Figure 4 – figure supplement 2).”

      Discussion (lines 408-9):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1).”

      (3) These concerns make it hard to conclude from their data that learning is mediated by "contextualisation" ---a key claim in the manuscript; 

      We believe the revised manuscript now addresses all concerns raised in Editor points 1 and 2.

      (4) The hybrid voxel + parcel space decoder ---a key contribution of the paper--- is not clearly explained; 

      We now provide additional details regarding the hybrid-space decoder approach in the following sections of the revised manuscript:

      Results (lines 158-172):

      “Next, given that the brain simultaneously processes information more efficiently across multiple spatial and temporal scales [28, 32, 33], we asked if the combination of lower resolution whole-brain and higher resolution regional brain activity patterns further improve keypress prediction accuracy. We constructed hybrid-space decoders (N = 1295 ± 20 features; Figure 3A) combining whole-brain parcel-space activity (n = 148 features; Figure 2B) with regional voxel-space activity from a data-driven subset of brain areas (n = 1147 ± 20 features; Figure 2D). This subset covers brain regions showing the highest regional voxel-space decoding performances (top regions across all subjects shown in Figure 2D; Methods – Hybrid Spatial Approach).

      […]

      Note that while features from contralateral brain regions were more important for whole-brain decoding (in both parcel- and voxel-spaces), regional voxel-space decoders performed best for bilateral sensorimotor areas on average across the group. Thus, a multi-scale hybrid-space representation best characterizes the keypress action manifolds.”

      Results (lines 275-282):

      “We used a Euclidean distance measure to evaluate the differentiation of the neural representation manifold of the same action (i.e. - an index-finger keypress) executed within different local sequence contexts (i.e. - ordinal position 1 vs. ordinal position 5; Figure 5). To make these distance measures comparable across participants, a new set of classifiers was then trained with group-optimal parameters (i.e. – broadband hybrid-space MEG data with subsequent manifold extraction (Figure 3 – figure supplement 2) and LDA classifiers (Figure 3 – figure supplement 7) trained on 200ms duration windows aligned to the KeyDown event (see Methods, Figure 3 – figure supplement 5)).”
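
      A minimal sketch of such a distance measure is given below. It assumes that each keypress is summarized by a low-dimensional feature vector and that the distance is taken between the mean vectors of the two ordinal positions; both choices, the toy values, and the helper names `centroid` and `contextualization_distance` are hypothetical and are not taken from the manuscript:

```python
import math

def centroid(vectors):
    """Mean feature vector of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def contextualization_distance(op1_features, op5_features):
    """Euclidean distance between the centroids of two sets of feature vectors."""
    return math.dist(centroid(op1_features), centroid(op5_features))

# Toy 2-D features (hypothetical) for index-finger keypresses at each position.
op1 = [[0.0, 0.0], [2.0, 0.0]]  # ordinal position 1, centroid (1, 0)
op5 = [[4.0, 4.0], [4.0, 4.0]]  # ordinal position 5, centroid (4, 4)

distance = contextualization_distance(op1, op5)
print(distance)
```

      A larger distance corresponds to greater differentiation between the two contexts of the same finger movement.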

      Discussion (lines 341-360):

      “The initial phase of the study focused on optimizing the accuracy of decoding individual finger keypresses from MEG brain activity. Recent work showed that the brain simultaneously processes information more efficiently across multiple—rather than a single—spatial scale(s) [28, 32]. To this effect, we developed a novel hybrid-space approach designed to integrate neural representation dynamics over two different spatial scales: (1) whole-brain parcel-space (i.e. – spatial activity patterns across all cortical brain regions) and (2) regional voxel-space (i.e. – spatial activity patterns within select brain regions) activity. We found consistent spatial differences between whole-brain parcel-space feature importance (predominantly contralateral frontoparietal, Figure 2B) and regional voxel-space decoder accuracy (bilateral sensorimotor regions, Figure 2D). The whole-brain parcel-space decoder likely emphasized more stable activity patterns in contralateral frontoparietal regions that differed between individual finger movements [21, 35], while the regional voxel-space decoder likely incorporated information related to adaptive interhemispheric interactions operating during motor sequence learning [32, 36, 37], particularly pertinent when the skill is performed with the non-dominant hand [38-40]. The observation of increased cross-validated test accuracy (as shown in Figure 3 – Figure Supplement 6) indicates that the spatially overlapping information in parcel- and voxel-space time-series in the hybrid decoder was complementary, rather than redundant [41]. The hybrid-space decoder which achieved an accuracy exceeding 90%—and robustly generalized to Day 2 across trained and untrained sequences—surpassed the performance of both parcel-space and voxel-space decoders and compared favorably to other neuroimaging-based finger movement decoding strategies [6, 24, 42-44].”

      Methods (lines 636-647):

      “Hybrid Spatial Approach. First, we evaluated the decoding performance of each individual brain region in accurately labeling finger keypresses from regional voxel-space (i.e. - all voxels within a brain region as defined by the Desikan-Killiany Atlas) activity. Brain regions were then ranked from 1 to 148 based on their decoding accuracy at the group level. In a stepwise manner, we then constructed a “hybrid-space” decoder by incrementally concatenating regional voxel-space activity of brain regions—starting with the top-ranked region—with whole-brain parcel-level features and assessed decoding accuracy. Subsequently, we added the regional voxel-space features of the second-ranked brain region and continued this process until decoding accuracy reached saturation. The optimal “hybrid-space” input feature set over the group included the 148 parcel-space features and regional voxel-space features from a total of 8 brain regions (bilateral superior frontal, middle frontal, pre-central and post-central; N = 1295 ± 20 features).”
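
      The stepwise procedure described in this Methods excerpt can be sketched as a rank-ordered, greedy loop. This is only an illustration under stated assumptions: the function `build_hybrid_set`, the scoring callback `score_fn`, the saturation threshold `min_gain`, and the toy regions and accuracies are all hypothetical, and the manuscript does not specify how "saturation" was defined:

```python
def build_hybrid_set(region_accuracy, score_fn, min_gain=0.005):
    """Greedy, rank-ordered construction of a hybrid feature set.

    region_accuracy: dict mapping region name -> stand-alone decoding accuracy.
    score_fn: callable taking a list of selected regions and returning the
              accuracy of (parcel features + those regions' voxel features).
    min_gain: assumed saturation threshold; stop once the next ranked region
              improves accuracy by less than this amount.
    """
    ranked = sorted(region_accuracy, key=region_accuracy.get, reverse=True)
    selected = []
    best = score_fn(selected)  # parcel-space features only
    for region in ranked:
        candidate = score_fn(selected + [region])
        if candidate - best < min_gain:  # accuracy has saturated
            break
        selected.append(region)
        best = candidate
    return selected, best

# Toy stand-ins (hypothetical): four regions and an additive accuracy model.
toy_accuracy = {"A": 0.70, "B": 0.65, "C": 0.50, "D": 0.40}
toy_gain = {"A": 0.05, "B": 0.03, "C": 0.001, "D": 0.02}

def toy_score(regions):
    """Pretend decoder accuracy: parcel-only baseline plus per-region gains."""
    return 0.80 + sum(toy_gain[r] for r in regions)

selected, accuracy = build_hybrid_set(toy_accuracy, toy_score)
print(selected, round(accuracy, 3))
```

      In a real analysis `score_fn` would wrap cross-validated decoder training on the concatenated parcel-space and voxel-space features; here it is replaced by a toy function so that the control flow can be run on its own.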

      (5) More controls are needed to show that their decoder approach is capturing a neural representation dedicated to context rather than independent representations of consecutive keypresses; 

      These controls have been implemented and are now reported in the manuscript:

      Results (lines 318-328):

      “Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R<sup>2</sup> = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R<sup>2</sup> = 0.028, p = 0.41; Figure 5 – figure supplement 7).”

      Results (lines 385-390):

      “Further, the 5-class classifier—which directly incorporated information about the sequence location context of each keypress into the decoding pipeline—improved decoding accuracy relative to the 4-class classifier (Figure 4C). Importantly, testing on Day 2 revealed specificity of this representational differentiation for the trained skill but not for the same keypresses performed during various unpracticed control sequences (Figure 5C).”

      Discussion (lines 408-423):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within-subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4).

      Offline contextualization was not driven by trial-by-trial behavioral differences, including typing rhythm (Figure 5 – figure supplement 5) and adjacent keypress transition times (Figure 5 – figure supplement 6) nor by between-subject differences in overall typing speed (Figure 5 – figure supplement 7)—ruling out a reliance on differences in the temporal overlap of keypresses. Importantly, offline contextualization documented on Day 1 stabilized once a performance plateau was reached (trials 11-36), and was retained on Day 2, documenting overnight consolidation of the differentiated neural representations.”

      (6) The need to show more convincingly that their data is not affected by head movements, e.g., by regressing out signal components that are correlated with the fiducial signal;  

      We now include data in Figure 3 – figure supplement 3D showing that head movement was minimal in all participants (mean of 1.159 mm ± 1.077 SD).  Further, the requested additional control analyses have been carried out and are reported in the revised manuscript:

      Results (lines 204-211):

      “Testing the keypress state (4-class) hybrid decoder performance on Day 1 after randomly shuffling keypress labels for held-out test data resulted in a performance drop approaching expected chance levels (22.12%± SD 9.1%; Figure 3 – figure supplement 3C). An alternate decoder trained on ICA components labeled as movement or physiological artefacts (e.g. – head movement, ECG, eye movements and blinks; Figure 3 – figure supplement 3A, D) and removed from the original input feature set during the pre-processing stage approached chance-level performance (Figure 4 – figure supplement 3), indicating that the 4-class hybrid decoder results were not driven by task-related artefacts.”

      Results (lines 261-268):

      “As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41%± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”

      Discussion (Lines 362-368):

      “Task-related movements—which also express in lower frequency ranges—did not explain these results given the near chance-level performance of alternative decoders trained on (a) artefact-related ICA components removed during MEG preprocessing (Figure 3 – figure supplement 3A-C) and on (b) task-related eye movement features (Figure 4 – figure supplement 3B, C). This explanation is also inconsistent with the minimal average head motion of 1.159 mm (± 1.077 SD) across the MEG recording (Figure 3 – figure supplement 3D).”
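
      The label-shuffling control quoted above can be illustrated with a minimal sketch. It uses synthetic, balanced 4-class labels and a hypothetical decoder that is perfect on the true labels; none of the names or data come from the manuscript's pipeline. The point is only that accuracy against permuted test labels settles near 1/k for k balanced classes.

```python
import numpy as np

def shuffled_label_accuracy(predictions, true_labels, n_permutations=1000, seed=0):
    """Mean and SD of accuracy after randomly permuting the held-out test labels."""
    rng = np.random.default_rng(seed)
    predictions = np.asarray(predictions)
    true_labels = np.asarray(true_labels)
    accuracies = np.empty(n_permutations)
    for i in range(n_permutations):
        shuffled = rng.permutation(true_labels)  # breaks the label-feature link
        accuracies[i] = np.mean(predictions == shuffled)
    return accuracies.mean(), accuracies.std()

# Synthetic example (assumption): 4 balanced classes, 250 test samples each
labels = np.repeat(np.arange(4), 250)
predictions = labels.copy()  # hypothetical decoder, perfect on the true labels

mean_acc, sd_acc = shuffled_label_accuracy(predictions, labels)
print(f"shuffled-label accuracy: {mean_acc:.3f} +/- {sd_acc:.3f}")
```

      With unbalanced classes the expected shuffled-label accuracy is no longer exactly 1/k, so the empirical permutation distribution is the safer reference for "chance".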

      (7) The offline neural representation analysis as executed is a bit odd, since it seems to be based on comparing the last key press to the first key press of the next sequence, rather than focus on the inter-sequence interval

      While we previously evaluated replay of skill sequences during rest intervals, identification of how offline reactivation patterns of a single keypress state representation evolve with learning presents non-trivial challenges. First, replay events tend to occur in clusters with irregular temporal spacing as previously shown by our group and others.  Second, replay of experienced sequences is intermixed with replay of sequences that have never been experienced but are possible. Finally, and perhaps the most significant issue, replay is temporally compressed up to 20x with respect to the behavior [6]. That means our decoders would need to accurately evaluate spatial pattern changes related to individual keypresses over much smaller time windows (i.e. - less than 10 ms) than evaluated here. This future work, which is undoubtedly of great interest to our research group, will require more substantial tool development before we can address this question. We now articulate this future direction in the Discussion:

      Discussion (lines 423-427):

      “A possible neural mechanism supporting contextualization could be the emergence and stabilization of conjunctive “what–where” representations of procedural memories [64] with the corresponding modulation of neuronal population dynamics [65, 66] during early learning. Exploring the link between contextualization and neural replay could provide additional insights into this issue [6, 12, 13, 15].”

      (8) And this analysis could be confounded by the fact that they are comparing the last element in a sequence vs the first movement in a new one. 

      We have now addressed this control analysis in the revised manuscript:

      Results (Lines 310-316):

      “Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches).”

      Discussion (lines 408-416):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within-subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4).”

      It also seems to be the case that many analyses suggested by the reviewers in the first round of revisions that could have helped strengthen the manuscript have not been included (they are only in the rebuttal). Moreover, some of the control analyses mentioned in the rebuttal seem not to be described anywhere, neither in the manuscript, nor in the rebuttal itself; please double check that. 

      All suggested analyses carried out and mentioned are now in the revised manuscript.

      eLife Assessment 

      This valuable study investigates how the neural representation of individual finger movements changes during the early period of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide incomplete evidence of an early, swift change in the brain regions correlated with sequence learning…

      We have now included all the requested control analyses supporting “an early, swift change in the brain regions correlated with sequence learning”:

      The addition of more control analyses to rule out that head movement artefacts influence the findings, 

      We now include data in Figure 3 – figure supplement 3D showing that head movement was minimal in all participants (mean of 1.159 mm ± 1.077 SD).  Further, we have implemented the requested additional control analyses addressing this issue:

      Results (lines 207-211):

      “An alternate decoder trained on ICA components labeled as movement or physiological artefacts (e.g. – head movement, ECG, eye movements and blinks; Figure 3 – figure supplement 3A, D) and removed from the original input feature set during the pre-processing stage approached chance-level performance (Figure 4 – figure supplement 3), indicating that the 4-class hybrid decoder results were not driven by task-related artefacts.”

      Results (lines 261-268):

      “As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41%± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”

      Discussion (Lines 362-368):

      “Task-related movements—which also express in lower frequency ranges—did not explain these results given the near chance-level performance of alternative decoders trained on (a) artefact-related ICA components removed during MEG preprocessing (Figure 3 – figure supplement 3A-C) and on (b) task-related eye movement features (Figure 4 – figure supplement 3B, C). This explanation is also inconsistent with the minimal average head motion of 1.159 mm (± 1.077 SD) across the MEG recording (Figure 3 – figure supplement 3D).”

      and to further explain the proposal of offline contextualization during short rest periods as the basis for improvement performance would strengthen the manuscript. 

      We have edited the manuscript to clarify that the degree of representational differentiation (contextualization) parallels skill learning.  We have no evidence at this point to indicate that “offline contextualization during short rest periods is the basis for improvement in performance”.  The following areas of the revised manuscript now clarify this point:  

      Summary (Lines 455-458):

      “In summary, individual sequence action representations contextualize during early learning of a new skill and the degree of differentiation parallels skill gains. Differentiation of the neural representations developed during rest intervals of early learning to a larger extent than during practice in parallel with rapid consolidation of skill.”

      Additional control analyses are also provided supporting a link between offline contextualization and early learning:

      Results (lines 302-318):

      “The Euclidean distance between neural representations of Index<sub>OP1</sub> (i.e. - index finger keypress at ordinal position 1 of the sequence) and Index<sub>OP5</sub> (i.e. - index finger keypress at ordinal position 5 of the sequence) increased progressively during early learning (Figure 5A)—predominantly during rest intervals (offline contextualization) rather than during practice (online) (t = 4.84, p < 0.001, df = 25, Cohen's d = 1.2; Figure 5B; Figure 5 – figure supplement 1A). An alternative online contextualization determination equaling the time interval between online and offline comparisons (Trial-based; 10 seconds between Index<sub>OP1</sub> and Index<sub>OP5</sub> observations in both cases) rendered a similar result (Figure 5 – figure supplement 2B).

      Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches). Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3).”  
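
      The relationship between the reported correlation coefficient and its coefficient of determination can be checked with a short sketch. The helper names are hypothetical and no study data are used; the only input taken from the text is r = 0.903.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def r_squared_from_r(r):
    """For a simple linear relationship, R^2 is the square of Pearson's r."""
    return r ** 2

# Value quoted in the text
print(round(r_squared_from_r(0.903), 3))
```

      Squaring the rounded r gives approximately 0.815, which agrees with the reported R<sup>2</sup> = 0.816 up to rounding of r to three decimals.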

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements and accomplish an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain machine interfaces - one now can decode individual elements within a sequence with high precision, but these representations are not static but develop over the course of learning. 

      Strengths: 

      The work follows a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most of the gains in behaviour (ie speed of finger movements) occur in these so-called micro-offline rest periods. 

      The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%. 

      Weaknesses:  

      A formal analysis and quantification of how head movement may have contributed to the results should be included in the paper or supplemental material. The type of correlated head movements coming from vigorous key presses aren't necessarily visible to the naked eye, and even if arms etc are restricted, this will not preclude shoulder, neck or head movement necessarily; if ICA was conducted, for example, the authors are in the position to show the components that relate to such movement; but eye-balling the data would not seem sufficient. The related issue of eye movements is addressed via classifier analysis. A formal analysis which directly accounts for finger/eye movements in the same analysis as the main result (ie any variance related to these factors) should be presented.

      We now present additional data related to head (Figure 3 – figure supplement 3; note that average measured head movement across participants was 1.159 mm ± 1.077 SD) and eye movements (Figure 4 – figure supplement 3) and have implemented the requested control analyses addressing this issue. They are reported in the revised manuscript in the following locations: Results (lines 207-211), Results (lines 261-268), Discussion (Lines 362-368).

      This reviewer recommends inclusion of a formal analysis that the intra-vs inter parcels are indeed completely independent. For example, the authors state that the inter-parcel features reflect "lower spatially resolved whole-brain activity patterns or global brain dynamics". A formal quantitative demonstration that the signals indeed show "complete independence" (as claimed by the authors) and are orthogonal would be helpful.

      Please note that we never claim in the manuscript that the parcel-space and regional voxel-space features show “complete independence”.  More importantly, input feature orthogonality is not a requirement for the machine learning-based decoding methods utilized in the present study while non-redundancy is [7] (a requirement satisfied by our data, see below). Finally, our results show that the hybrid space decoder out-performed all other methods even after input features were fully orthogonalized with LDA (the procedure used in all contextualization analyses) or PCA dimensionality reduction procedures prior to the classification step (Figure 3 – figure supplement 2).

      Relevant to this issue, please note that if spatially overlapping parcel- and voxel-space time-series only provided redundant information, inclusion of both as input features should increase model over-fitting to the training dataset and decrease overall cross-validated test accuracy [8]. In the present study however, we see the opposite effect on decoder performance. First, Figure 3 – figure supplement 1 & 2 clearly show that decoders constructed from hybrid-space features outperform the other input feature (sensor-, whole-brain parcel- and whole-brain voxel-) spaces in every case (e.g. – wideband, all narrowband frequency ranges, and even after the input space is fully orthogonalized through dimensionality reduction procedures prior to the decoding step). Furthermore, Figure 3 – figure supplement 6 shows that hybrid-space decoder performance suffers when parcel time-series that spatially overlap with the included regional voxel-spaces are removed from the input feature set. 

      We state in the Discussion (lines 353-356):

      “The observation of increased cross-validated test accuracy (as shown in Figure 3 – Figure Supplement 6) indicates that the spatially overlapping information in parcel- and voxel-space time-series in the hybrid decoder was complementary, rather than redundant [41].”

      To gain insight into the complementary information contributed by the two spatial scales to the hybrid-space decoder, we first independently computed the matrix rank for whole-brain parcel- and voxel-space input features for each participant (shown in Author response image 1). The results indicate that whole-brain parcel-space input features are full rank (rank = 148) for all participants (i.e. - MEG activity is orthogonal between all parcels). The matrix rank of voxel-space input features (rank = 267± 17 SD), exceeded the parcel-space rank for all participants and approached the number of useable MEG sensor channels (n = 272). Thus, voxel-space features provide both additional and complementary information to representations at the parcel-space scale.  

      Author response image 1.

      Matrix rank computed for whole-brain parcel- and voxel-space time-series in individual subjects across the training run. The results indicate that whole-brain parcel-space input features are full rank (rank = 148) for all participants (i.e. - MEG activity is orthogonal between all parcels). The matrix rank of voxel-space input features (rank = 267 ± 17 SD), on the other hand, approached the number of useable MEG sensor channels (n = 272). Although not full rank, the voxel-space rank exceeded the parcel-space rank for all participants. Thus, some voxel-space features provide additional orthogonal information to representations at the parcel-space scale.  An expression of this is shown in the correlation distribution between parcel and constituent voxel time-series in Figure 2—figure Supplement 2.
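
      The rank comparison described above can be sketched on synthetic data. The parcel count (148) and sensor count (272) are taken from the text; the voxel count, sample count, random mixing matrices, and function name are assumptions made purely for illustration. Source-space time-series are modeled here as linear mixtures of sensor data, so their rank cannot exceed the number of sensors. Note that matrix rank counts linearly independent rows, which is a weaker condition than strict pairwise orthogonality.

```python
import numpy as np

def feature_rank(timeseries):
    """Numerical rank of a (n_features, n_samples) matrix (SVD-based)."""
    return int(np.linalg.matrix_rank(np.asarray(timeseries)))

rng = np.random.default_rng(0)
n_sensors, n_samples = 272, 2000   # sensor count from the text; sample count assumed
n_parcels, n_voxels = 148, 500     # parcel count from the text; voxel count assumed

# Synthetic sensor data and hypothetical linear projections to source space
sensors = rng.standard_normal((n_sensors, n_samples))
parcel_ts = rng.standard_normal((n_parcels, n_sensors)) @ sensors
voxel_ts = rng.standard_normal((n_voxels, n_sensors)) @ sensors

parcel_rank = feature_rank(parcel_ts)
voxel_rank = feature_rank(voxel_ts)
print(parcel_rank, voxel_rank)
```

      In this toy setting the parcel-space matrix is full rank, while the voxel-space rank is capped by the sensor count even though there are more voxels than sensors, mirroring the qualitative pattern reported in the author response.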

      Figure 2—figure Supplement 2 in the revised manuscript now shows that the degree of dependence between the two spatial scales varies over the regional voxel-space. That is, some voxels within a given parcel correlate strongly with the time-series of the parcel they belong to, while others do not. This finding is consistent with a documented increase in correlational structure of neural activity across spatial scales that does not reflect perfect dependency or orthogonality [9]. Notably, the regional voxel-spaces included in the hybrid-space decoder are significantly less correlated with the averaged parcel-space time-series than excluded voxels. We now point readers to this new figure in the results.

      Taken together, these results indicate that the multi-scale information in the hybrid feature set is complementary rather than orthogonal.  This is consistent with the idea that hybrid-space features better represent multi-scale temporospatial dynamics reported to be a fundamental characteristic of how the brain stores and adapts memories, and generates behavior across species [9].  

      Reviewer #2 (Public review): 

      Summary: 

      The current paper consists of two parts. The first part is the rigorous feature optimization of the MEG signal to decode individual finger identity performed in a sequence (4-1-3-2-4; 1~4 corresponds to little~index fingers of the left hand). By optimizing various parameters for the MEG signal, in terms of (i) reconstructed source activity in voxel- and parcel-level resolution and their combination, (ii) frequency bands, and (iii) time window relative to press onset for each finger movement, as well as the choice of decoders, the resultant "hybrid decoder" achieved extremely high decoding accuracy (~95%). This part seems driven almost by pure engineering interest in gaining as high decoding accuracy as possible. 

      In the second part of the paper, armed with the successful 'hybrid decoder,' the authors asked more scientific questions about how neural representation of individual finger movement that is embedded in a sequence, changes during a very early period of skill learning and whether and how such representational change can predict skill learning. They assessed the difference in MEG feature patterns between the first and the last press 4 in sequence 41324 at each training trial and found that the pattern differentiation progressively increased over the course of early learning trials. Additionally, they found that this pattern differentiation specifically occurred during the rest period rather than during the practice trial. With a significant correlation between the trial-by-trial profile of this pattern differentiation and that for accumulation of offline learning, the authors argue that such "contextualization" of finger movement in a sequence (e.g., what-where association) underlies the early improvement of sequential skill. This is an important and timely topic for the field of motor learning and beyond. 

      Strengths: 

      Each part has its own strength. For the first part, the use of temporally rich neural information (MEG signal) has a significant advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. For the second part, the finding of the early "contextualization" of the finger movement in a sequence and its correlation to early (offline) skill improvement is interesting and important. The comparison between "online" and "offline" pattern distance is a neat idea. 

      Weaknesses: 

      Despite the strengths raised, the specific goal for each part of the current paper, i.e., achieving high decoding accuracy and answering the scientific question of early skill learning, seems not to harmonize with each other very well. In short, the current approach, which is solely optimized for achieving high decoding accuracy, does not provide enough support and interpretability for the paper's interesting scientific claim. This reminds me of the accuracy-explainability tradeoff in machine learning studies (e.g., Linardatos et al., 2020). More details follow. 

      There are a number of different neural processes occurring before and after a key press, such as planning of upcoming movement and ahead around premotor/parietal cortices, motor command generation in primary motor cortex, sensory feedback related processes in sensory cortices, and performance monitoring/evaluation around the prefrontal area. Some of these may show learning-dependent change and others may not.  

      In this paper, the focus as stated in the Introduction was to evaluate “the millisecond-level differentiation of discrete action representations during learning”, a proposal that first required the development of more accurate computational tools.  Our first step, reported here, was to develop that tool. With that in hand, we then proceeded to test if neural representations differentiated during early skill learning. Our results showed they did.  Addressing the question the Reviewer asks is part of exciting future work, now possible based on the results presented in this paper.  We acknowledge this issue in the revised Discussion:  

      Discussion (Lines 428-434):

      “In this study, classifiers were trained on MEG activity recorded during or immediately after each keypress, emphasizing neural representations related to action execution, memory consolidation and recall over those related to planning. An important direction for future research is determining whether separate decoders can be developed to distinguish the representations or networks separately supporting these processes. Ongoing work in our lab is addressing this question. The present accuracy results across varied decoding window durations and alignment with each keypress action support the feasibility of this approach (Figure 3—figure supplement 5).”

      Given the use of whole-brain MEG features with a wide time window (up to ~200 ms after each key press) under the situation of 3~4 Hz (i.e., 250~330 ms press interval) typing speed, these different processes in different brain regions could have contributed to the expression of the "contextualization," making it difficult to interpret what really contributed to the "contextualization" and whether it is learning related. Critically, the majority of data used for decoder training has the chance of such potential overlap of signal, as the typing speed almost reached a plateau already at the end of the 11th trial and stayed until the 36th trial. Thus, the decoder could have relied on such overlapping features related to the future presses. If that is the case, a gradual increase in "contextualization" (pattern separation) during earlier trials makes sense, simply because the temporal overlap of the MEG feature was insufficient for the earlier trials due to slower typing speed.  Several direct ways to address the above concern, at the cost of decoding accuracy to some degree, would be either using the shorter temporal window for the MEG feature or training the model with the early learning period data only (trials 1 through 11) to see if the main results are unaffected would be some example. 

      We now include additional analyses carried out with decoding time windows ranging from 50 to 250ms in duration, which have been added to the revised manuscript as follows: 

      Results (lines 258-261):

      “The improved decoding accuracy is supported by greater differentiation in neural representations of the index finger keypresses performed at positions 1 and 5 of the sequence (Figure 4A), and by the trial-by-trial increase in 2-class decoding accuracy over early learning (Figure 4C) across different decoder window durations (Figure 4 – figure supplement 2).”

      Results (lines 310-312):

      “Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C).”

      Discussion (lines 382-385):

      “This was further supported by the progressive differentiation of neural representations of the index finger keypress (Figure 4A) and by the robust trial-by trial increase in 2-class decoding accuracy across time windows ranging between 50 and 250ms (Figure 4C; Figure 4 – figure supplement 2).”

      Discussion (lines 408-9):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1).”

      Several new control analyses are also provided addressing the question of overlapping keypresses:

      Reviewer #3 (Public review):

      Summary: 

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multi-scale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements.

      Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training and correlates with a performance metric which the authors interpret as an indicator of offline learning. 

      Strengths: 

      A strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of concurrent distribution of neural coding across local circuits as well as large-scale networks. A further strength of the study is the large number of tested dimension reduction techniques and classifiers. 

      Weaknesses: 

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, which partly arise from the experimental design (mainly the use of a single sequence) and which are described below, question the neurobiological implications proposed by the authors and provide a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence, described below, casts doubt on this assumption.  

      Please, see below for detailed response to each of these points.

      Specifically: The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence and test the classifier on other sequences that require the same movements, but in different positions (Kornysheva et al., Neuron 2019). In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4).

      A crucial difference between our present study and the elegant study from Kornysheva et al. (2019) in Neuron highlighted by the Reviewer is that while ours is a learning study, the Kornysheva et al. study is not. Kornysheva et al. included an initial separate behavioral training session (i.e. – performed outside of the MEG) during which participants learned associations between fractal image patterns and different keypress sequences. Then in a separate, later MEG session—after the stimulus-response associations had been already learned in the first session—participants were tasked with recalling the learned sequences in response to a presented visual cue (i.e. – the paired fractal pattern). 

      Our rationale for not including multiple sequences in the same Day 1 training session of our study design was that it would lead to prominent interference effects, as widely reported in the literature [10-12].  Thus, while we had to take the issue of interference into consideration for our design, the Kornysheva et al. study did not. While Kornysheva et al. aimed to “dissociate ordinal position information from information about the moving effectors”, we tested various untrained sequences on Day 2 allowing us to determine that the contextualization result was specific to the trained sequence. By using this approach, we avoided interference effects on the learning of the primary skill caused by simultaneous acquisition of a second skill.

      The revised manuscript states our findings related to the Day 2 Control data in the following locations:

      Results (lines 117-122):

      “On the following day, participants were retested on performance of the same sequence (4-1-3-2-4) over 9 trials (Day 2 Retest), as well as on the single-trial performance of 9 different untrained control sequences (Day 2 Controls: 2-1-3-4-2, 4-2-4-3-1, 3-4-2-3-1, 1-4-3-4-2, 3-2-4-3-1, 1-4-2-3-1, 3-2-4-2-1, 3-2-1-4-2, and 4-2-3-1-4). As expected, an upward shift in performance of the trained sequence (0.68 ± SD 0.56 keypresses/s; t = 7.21, p < 0.001) was observed during Day 2 Retest, indicative of an overnight skill consolidation effect (Figure 1 – figure supplement 1A).”

      Results (lines 212-219):

      “Utilizing the highest performing decoders that included LDA-based manifold extraction, we assessed the robustness of hybrid-space decoding over multiple sessions by applying it to data collected on the following day during the Day 2 Retest (9-trial retest of the trained sequence) and Day 2 Control (single-trial performance of 9 different untrained sequences) blocks. The decoding accuracy for Day 2 MEG data remained high (87.11% ± SD 8.54% for the trained sequence during Retest, and 79.44% ± SD 5.54% for the untrained Control sequences; Figure 3 – figure supplement 4). Thus, index finger classifiers constructed using the hybrid decoding approach robustly generalized from Day 1 to Day 2 across trained and untrained keypress sequences.”

      Results (lines 269-273):

      “On Day 2, incorporating contextual information into the hybrid-space decoder enhanced classification accuracy for the trained sequence only (improving from 87.11% for 4-class to 90.22% for 5-class), while performing at or below-chance levels for the Control sequences (≤ 30.22% ± SD 0.44%). Thus, the accuracy improvements resulting from inclusion of contextual information in the decoding framework was specific for the trained skill sequence.”

      As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 4 - supplement 2 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the keypress, up to at least +/-100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. 

      Currently, the manuscript provides little evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context. 

      During the review process, the authors pointed out that a "mixing" of temporally overlapping information from consecutive keypresses, as described above, should result in systematic misclassifications and therefore be detectable in the confusion matrices in Figures 3C and 4B, which indeed do not provide any evidence that consecutive keypresses are systematically confused. However, such absence of evidence (of systematic misclassification) should be interpreted with caution, and, of course, provides no evidence of absence. The authors also pointed out that such "mixing" would hamper the discriminability of the two ordinal positions of the index finger, given that "ordinal position 5" is systematically followed by "ordinal position 1". This is a valid point which, however, cannot rule out that "contextualization" nevertheless reflects the described "mixing".

      The revised manuscript contains several control analyses which rule out this potential confound.

      Results (lines 318-328):

      “Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R<sup>2</sup> = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R<sup>2</sup> = 0.028, p = 0.41; Figure 5 – figure supplement 7).”

      Results (lines 385-390):

      “Further, the 5-class classifier—which directly incorporated information about the sequence location context of each keypress into the decoding pipeline—improved decoding accuracy relative to the 4-class classifier (Figure 4C). Importantly, testing on Day 2 revealed specificity of this representational differentiation for the trained skill but not for the same keypresses performed during various unpracticed control sequences (Figure 5C).”

      Discussion (lines 408-423):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within-subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4).

      Offline contextualization was not driven by trial-by-trial behavioral differences, including typing rhythm (Figure 5 – figure supplement 5) and adjacent keypress transition times (Figure 5 – figure supplement 6) nor by between-subject differences in overall typing speed (Figure 5 – figure supplement 7)—ruling out a reliance on differences in the temporal overlap of keypresses. Importantly, offline contextualization documented on Day 1 stabilized once a performance plateau was reached (trials 11-36), and was retained on Day 2, documenting overnight consolidation of the differentiated neural representations.”

      During the review process, the authors responded to my concern that training of a single sequence introduces the potential confound of "mixing" described above, which could have been avoided by training on several sequences, as in Kornysheva et al. (Neuron 2019), by arguing that Day 2 in their study did include control sequences. However, the authors' findings regarding these control sequences are fundamentally different from the findings in Kornysheva et al. (2019), and do not provide any indication of effector-independent ordinal information in the described contextualization - but, actually, the contrary. In Kornysheva et al. (Neuron 2019), ordinal, or positional, information refers purely to the rank of a movement in a sequence. In line with the idea of competitive queuing, Kornysheva et al. (2019) have shown that humans prepare for a motor sequence via a simultaneous representation of several of the upcoming movements, weighted by their rank in the sequence. Importantly, they could show that this gradient carries information that is largely devoid of information about the order of specific effectors involved in a sequence, or their timing, in line with competitive queuing. They showed this by training a classifier to discriminate between the five consecutive movements that constituted one specific sequence of finger movements (five classes: 1st, 2nd, 3rd, 4th, 5th movement in the sequence) and then testing whether that classifier could identify the rank (1st, 2nd, 3rd, etc) of movements in another sequence, in which the fingers moved in a different order, and with different timings. Importantly, this approach demonstrated that the graded representations observed during preparation were largely maintained after this cross decoding, indicating that the sequence was represented via ordinal position information that was largely devoid of information about the specific effectors or timings involved in sequence execution. 
This result differs completely from the findings in the current manuscript. Dash et al. report a drop in detected ordinal position information (degree of contextualization in figure 5C) when testing for contextualization in their novel, untrained sequences on Day 2, indicating that context and ordinal information as defined in Dash et al. is not at all devoid of information about the specific effectors involved in a sequence. In this regard, a main concern in my public review, as well as the second reviewer's public review, is that Dash et al. cannot tell apart, by design, whether there is truly contextualization in the neural representation of a sequence (which they claim), or whether their results regarding "contextualization" are explained by what they call "mixing" in their author response, i.e., an overlap of representations of consecutive movements, as suggested as an alternative explanation by Reviewer 2 and myself.

      Again, as stated in response to a related comment by the Reviewer above, it is not surprising that our results differ from the study by Kornysheva et al. (2019). A crucial difference between the studies that the Reviewer fails to recognize is that while ours is a learning study, the Kornysheva et al. study is not. Our rationale for not including multiple sequences in the same Day 1 training session of our study design was that it would lead to prominent interference effects, as widely reported in the literature [10-12]. Thus, while we had to take the issue of interference into consideration for our design, the Kornysheva et al. study did not, since it was not concerned with learning dynamics. The strength of the elegant Kornysheva et al. study highlighted by the Reviewer (that the pre-planned queuing gradient of sequence actions was independent of the effectors or timings used) is precisely due to the fact that participants were selecting between sequence options that had been previously, and equivalently, learned. The decoders in the Kornysheva et al. study were trained, by design, to classify effector- and timing-independent sequence position information, so it is not surprising that this is the information they reflect.

      The questions asked in our study were different: 1) Do the neural representations of the same sequence action executed in different skill (ordinal sequence) locations differentiate (contextualize) during early learning?  and 2) Is the observed contextualization specific to the learned sequence? Thus, while Kornysheva et al. aimed to “dissociate ordinal position information from information about the moving effectors”, we tested various untrained sequences on Day 2 allowing us to determine that the contextualization result was specific to the trained sequence. By using this approach, we avoided interference effects on the learning of the primary skill caused by simultaneous acquisition of a second skill.

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. The authors seem to argue that their regression analysis in Figure 5 - figure supplement 3 speaks against any influence of tapping speed on "ordinal coding" (even though that argument is not made explicitly in the manuscript). However, Figure 5 - figure supplement 3 shows inter-individual differences in a between-subject analysis (across trials, as in panel A, or separately for each trial, as in panel B), and, therefore, says little about the within-subject dynamics of "ordinal coding" across the experiment. A regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject, or at a group-level, after averaging across subjects) could address this issue. Given the highly similar dynamics of "ordinal coding" on the one hand (Figure 4C), and tapping speed on the other hand (Figure 1B), I would expect a strong relationship between the two in the suggested within-subject (or group-level) regression.

      The aim of the between-subject regression analysis presented in the Results (see below) and in Figure 5—figure supplement 7 (previously Figure 5—figure supplement 3) of the revised manuscript, was to rule out a general effect of tapping speed on the magnitude of contextualization observed. If temporal overlap of neural representations was driving their differentiation, then participants typing at higher speeds should also show greater contextualization scores. We made the decision to use a between-subject analysis to address this issue since within-subject skill speed variance was rather small over most of the training session. 

      The Reviewer’s request that we additionally carry out a “regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject, or at a group-level, after averaging across subjects)” is essentially the same as the request of Reviewer 2 above. That request was to perform a modified simple linear regression analysis where the predictor is the sum of the 4-4 and 4-1 transition times, since these transitions are where any temporal overlaps of neural representations would occur. A new Figure 5 – figure supplement 6 in the revised manuscript includes a scatter plot showing the sum of adjacent index finger keypress transition times (i.e. – the 4-4 transition at the conclusion of one sequence iteration and the 4-1 transition at the beginning of the next sequence iteration) versus online contextualization distances measured during practice trials. Both the keypress transition times and online contextualization scores were z-score normalized within individual subjects, and then concatenated into a single data superset. As is clear from the data in the figure, the regression analysis showed a very weak linear relationship between the two (R<sup>2</sup> = 0.00507, F[1,3202] = 16.3). Thus, contextualization score magnitudes do not reflect the amount of overlap between adjacent keypresses when assessed either within- or between-subject.
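As a rough illustration of the analysis described above (not the authors' actual pipeline), the sketch below z-scores two measures within each subject, pools them, and computes the R<sup>2</sup> of a simple linear regression. The subject count, trial count, and simulated values are hypothetical. The helper `f_statistic` uses the standard single-predictor relation F = R<sup>2</sup> / (1 - R<sup>2</sup>) × df, which can be used to check that the reported R<sup>2</sup> and F values are mutually consistent.

```python
import math
import random


def zscore(values):
    """Standardize one subject's values to mean 0 and unit sample SD."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 of a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def f_statistic(r2, df_resid):
    """F for a single predictor: F = R^2 / (1 - R^2) * residual df."""
    return r2 / (1.0 - r2) * df_resid


# Hypothetical data: 3 subjects, 50 trials each, two unrelated measures.
rng = random.Random(0)
pooled_x, pooled_y = [], []
for subject in range(3):
    transition = [rng.gauss(0.3 + 0.1 * subject, 0.05) for _ in range(50)]
    context = [rng.gauss(1.0 + subject, 0.2) for _ in range(50)]
    # Within-subject z-scoring removes between-subject offsets before pooling.
    pooled_x += zscore(transition)
    pooled_y += zscore(context)

r2_demo = r_squared(pooled_x, pooled_y)
print(f"synthetic pooled R^2 = {r2_demo:.4f}")
print(f"F implied by R^2 = 0.00507 with df = 3202: {f_statistic(0.00507, 3202):.1f}")
```

With the reported R<sup>2</sup> = 0.00507 and 3202 residual degrees of freedom, this relation gives F of about 16.3, in agreement with the reported statistic. A significant F alongside such a small R<sup>2</sup> is expected when the pooled sample is this large.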

      The revised manuscript now states:

      Results (lines 318-328):

      “Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R<sup>2</sup> = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R<sup>2</sup> = 0.028, p = 0.41; Figure 5 – figure supplement 7).”

      Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. To draw that conclusion, the physical context should remain stable (or any changes to the physical context should be controlled for). 

      The revised manuscript now addresses specifically the question of mixing of temporally overlapping information:

      Results (Lines 310-328)

      “Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches). Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3). Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R<sup>2</sup> = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R<sup>2</sup> = 0.028, p = 0.41; Figure 5 – figure supplement 7).”

      Discussion (Lines 417-423)

      “Offline contextualization was not driven by trial-by-trial behavioral differences, including typing rhythm (Figure 5 – figure supplement 5) and adjacent keypress transition times (Figure 5 – figure supplement 6) nor by between-subject differences in overall typing speed (Figure 5 – figure supplement 7)—ruling out a reliance on differences in the temporal overlap of keypresses. Importantly, offline contextualization documented on Day 1 stabilized once a performance plateau was reached (trials 11-36), and was retained on Day 2, documenting overnight consolidation of the differentiated neural representations.”

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses. Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023).  

      The revised manuscript now addresses specifically the question of pre-planning:

      Results (lines 310-318):

      “Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches). Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3).”

      Discussion (lines 408-416):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within-subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4).”

      A further complication in interpreting the results stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen. It is not clear why the authors introduced this complicating visual feedback in their task, besides consistency with their previous studies. The resulting systematic link between the pattern of visual stimulation (the number of asterisks on the screen) and the ordinal position of a keypress makes the interpretation of "contextual information" that differentiates between ordinal positions difficult. During the review process, the authors reported a confusion matrix from a classification of asterisk position based on eye tracking data recorded during the task and concluded that the classifier performed at chance level and gaze was, thus, apparently not biased by the visual stimulation. However, the confusion matrix showed a huge bias that was difficult to interpret (a very strong tendency to predict one of the five asterisk positions, despite chance-level performance). Without including additional information for this analysis (or simply the gaze position as a function of the number of asterisks on the screen) in the manuscript, this important control analysis cannot be properly assessed, and is not available to the public.

      We now include the gaze position data requested by the Reviewer alongside the confusion matrix results in Figure 4 – figure supplement 3.

      Results (lines 207-211):

      “An alternate decoder trained on ICA components labeled as movement or physiological artefacts (e.g. – head movement, ECG, eye movements and blinks; Figure 3 – figure supplement 3A, D) and removed from the original input feature set during the pre-processing stage approached chance-level performance (Figure 4 – figure supplement 3), indicating that the 4-class hybrid decoder results were not driven by task-related artefacts.”

      Results (lines 261-268):

      “As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41% ± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”

      Discussion (Lines 362-368):

      “Task-related movements—which also express in lower frequency ranges—did not explain these results given the near chance-level performance of alternative decoders trained on (a) artefact-related ICA components removed during MEG preprocessing (Figure 3 – figure supplement 3A-C) and on (b) task-related eye movement features (Figure 4 – figure supplement 3B, C). This explanation is also inconsistent with the minimal average head motion of 1.159 mm (± 1.077 SD) across the MEG recording (Figure 3 – figure supplement 3D).”

      The rationale for the task design including the asterisks is presented below:

      Methods (Lines 500-514)

      “The five-item sequence was displayed on the computer screen for the duration of each practice round and participants were directed to fix their gaze on the sequence. Small asterisks were displayed above a sequence item after each successive keypress, signaling the participants' present position within the sequence. Inclusion of this feedback minimizes working memory loads during task performance [73]. Following the completion of a full sequence iteration, the asterisk returned to the first sequence item. The asterisk did not provide error feedback as it appeared for both correct and incorrect keypresses. At the end of each practice round, the displayed number sequence was replaced by a string of five "X" symbols displayed on the computer screen, which remained for the duration of the rest break. Participants were instructed to focus their gaze on the screen during this time. The behavior in this explicit, motor learning task consists of generative action sequences rather than sequences of stimulus-induced responses as in the serial reaction time task (SRTT). A similar real-world example would be manually inputting a long password into a secure online application in which one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user.”

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, this does not address the question whether there is a trial-by-trial relation between the degree of "contextualization" and the amount of micro-offline gains - i.e., the question whether performance changes (micro-offline gains) are less pronounced across rest periods for which the change in "contextualization" is relatively low. The single-subject correlation between contextualization changes "during" rest and micro-offline gains (Figure 5 - figure supplement 4) addresses this question, however, the critical statistical test (are correlation coefficients significantly different from zero) is not included. Given the displayed distribution, it seems unlikely that correlation coefficients are significantly above zero. 

      As recommended by the Reviewer, we now include one-way right-tailed t-test results which provide further support to the previously reported finding. The mean of within-subject correlations between offline contextualization and cumulative micro-offline gains was significantly greater than zero (t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76; see Figure 5 – figure supplement 4, left), while correlations for online contextualization versus cumulative micro-online (t = -1.14, p = 0.8669, df = 25, Cohen's d = -0.22) or micro-offline gains (t = -0.097, p = 0.5384, df = 25, Cohen's d = -0.019) were not. We have incorporated the significant one-way t-test for offline contextualization and cumulative micro-offline gains in the Results section of the revised manuscript (lines 313-318) and the Figure 5 – figure supplement 4 legend.
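For readers who want to check how the effect sizes above relate to the test statistics, here is a minimal sketch. It assumes the tests are one-sample t-tests on per-subject correlation coefficients with n = df + 1 = 26, in which case Cohen's d equals t divided by the square root of n. The function names are illustrative and are not taken from the authors' code.

```python
import math


def one_sample_t(values, popmean=0.0):
    """t statistic for testing whether the mean of `values` differs from popmean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return (mean - popmean) / (sd / math.sqrt(n))


def cohens_d_from_t(t_value, df):
    """One-sample Cohen's d recovered from t and degrees of freedom (n = df + 1)."""
    return t_value / math.sqrt(df + 1)


# t values as reported in the response, all with df = 25.
reported = {
    "offline contextualization vs micro-offline gains": 3.87,
    "online contextualization vs micro-online gains": -1.14,
    "online contextualization vs micro-offline gains": -0.097,
}
for label, t_value in reported.items():
    print(f"{label}: d = {cohens_d_from_t(t_value, 25):.3f}")
```

Under that assumption, the three reported t values correspond to d of about 0.76, -0.22, and -0.019, which agrees with the effect sizes quoted in the response.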

      The authors follow the assumption that micro-offline gains reflect offline learning.

      However, there is no compelling evidence in the literature, and no evidence in the present manuscript, that micro-offline gains (during any training phase) reflect offline learning. Instead, emerging evidence in the literature indicates that they do not (Das et al., bioRxiv 2024), and instead reflect transient performance benefits when participants train with breaks, compared to participants who train without breaks, however, these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024). During the review process, the authors argued that differences in the design between Das et al. (2024) on the one hand (Experiments 1 and 2), and the study by Bönstrup et al. (2019) on the other hand, may have prevented Das et al. (2024) from finding the assumed (lasting) learning benefit by micro-offline consolidation. However, the Supplementary Material of Das et al. (2024) includes an experiment (Experiment S1) whose design closely follows the early learning phase of Bönstrup et al. (2019), and which, nevertheless, demonstrates that there is no lasting benefit of taking breaks for the acquired skill level, despite the presence of micro-offline gains. 

      We thank the Reviewer for alerting us to this new data added to the revised supplementary materials of Das et al. (2024) posted to bioRxiv. However, despite the Reviewer’s claim to the contrary, a careful comparison between the Das et al. and Bönstrup et al. studies reveals more substantive differences than similarities, and does not support the claim that the Experiment S1 design “closely follows a large proportion of the early learning phase of Bönstrup et al. (2019)”.

      In the Das et al. Experiment S1, sixty-two participants were randomly assigned to “with breaks” or “no breaks” skill training groups. The “with breaks” group alternated 10 seconds of skill sequence practice with 10 seconds of rest over seven trials (2 min and 20 sec total training duration). This amounts to 66.7% of the early learning period defined by Bönstrup et al. (2019) (i.e. - eleven 10-second-long practice periods interleaved with ten 10-second-long rest breaks; 3 min 30 sec total training duration).
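The duration comparison above can be checked with a few lines of arithmetic. The sketch below assumes that every practice and rest period lasts exactly 10 seconds, that the “with breaks” schedule consists of seven practice periods each followed by a rest period, and that the Bönstrup et al. early learning period consists of eleven practice periods and ten rest periods. The function name is illustrative.

```python
def schedule_seconds(practice_periods, rest_periods, period_s=10):
    """Total duration of a schedule built from fixed-length practice and rest periods."""
    return (practice_periods + rest_periods) * period_s


def as_min_sec(seconds):
    """Format a duration in seconds as 'M min SS sec'."""
    return f"{seconds // 60} min {seconds % 60:02d} sec"


das_with_breaks = schedule_seconds(practice_periods=7, rest_periods=7)
bonstrup_early = schedule_seconds(practice_periods=11, rest_periods=10)
proportion = das_with_breaks / bonstrup_early

print(as_min_sec(das_with_breaks))   # duration of the "with breaks" training
print(as_min_sec(bonstrup_early))    # duration of the early learning period
print(f"{100 * proportion:.1f}%")    # share of the early learning period covered
```

Under these assumptions the “with breaks” training lasts 140 seconds and the early learning period lasts 210 seconds, which gives the 66.7% proportion stated above.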

      Also, please note that while no performance feedback nor reward was given in the Bönstrup et al. (2019) study, participants in the Das et al. study received explicit performance-based monetary rewards, a potentially crucial driver of differentiated behavior between the two studies:

      “Participants were incentivized with bonus money based on the total number of correct sequences completed throughout the experiment.”

      The “no breaks” group in the Das et al. study practiced the skill sequence for 70 continuous seconds. Both groups (despite one being labeled “no breaks”) follow training with a long 3-minute break (also note that since the “with breaks” group ends with 10 seconds of rest, their break is actually longer) before finishing with a skill “test” over a continuous 50-second-long block. During the 70 seconds of training, the “with breaks” group shows more learning than the “no breaks” group. Interestingly, following the long 3-minute break, the “with breaks” group displays a performance drop (relative to their performance at the end of training) that is stable over the full 50-second test, while the “no breaks” group shows an immediate performance improvement following the long break that continues to increase over the 50-second test.  

      Separately, there are important issues regarding the Das et al. study that should be considered through the lens of recent findings not referred to in the preprint. A major element of their experimental design is that both groups (“with breaks” and “no breaks”) actually receive quite a long 3-minute break just before the skill test. This long break is more than 2.5x the cumulative interleaved rest experienced by the “with breaks” group. Thus, although the design is intended to contrast the presence or absence of rest “breaks”, that difference between groups is no longer maintained at the point of the skill test. 
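
      The durations quoted in this comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the inputs are the practice, rest and break lengths stated in the paragraphs above.

```python
# All durations are in seconds and come from the designs as described above.
PRACTICE_S = 10
REST_S = 10

# Das et al. Experiment S1, "with breaks" group: seven practice/rest pairs.
das_total = 7 * (PRACTICE_S + REST_S)             # 140 s = 2 min 20 s

# Bönstrup et al. (2019) early learning: eleven practice periods, ten rests.
bonstrup_total = 11 * PRACTICE_S + 10 * REST_S    # 210 s = 3 min 30 s

# Share of the Bönstrup et al. early learning period covered by Experiment S1.
fraction = das_total / bonstrup_total             # 2/3, i.e. about 66.7%

# Long pre-test break relative to the cumulative interleaved rest (7 x 10 s).
long_break = 3 * 60
interleaved_rest = 7 * REST_S
break_ratio = long_break / interleaved_rest       # about 2.57

print(das_total, bonstrup_total, round(100 * fraction, 1), round(break_ratio, 2))
```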

      The Das et al. results are most consistent with an alternative interpretation of the data: that the “no breaks” group experiences offline learning during their long 3-minute break. This is supported by the recent work of Griffin et al. (2025) where micro-array recordings from primary and premotor cortex were obtained from macaque monkeys while they performed blocks of ten continuous reaching sequences up to 81.4 seconds in duration (see source data for Extended Data Figure 1h) with 90 seconds of interleaved rest. Griffin et al. observed offline improvement in skill immediately following the rest break that was causally related to neural reactivations (i.e. – neural replay) that occurred during the rest break. Importantly, the highest density of reactivations was present in the very first 90-second break between Blocks 1 and 2 (see Fig. 2f in Griffin et al., 2025). This supports the interpretation that both the “with breaks” and “no breaks” groups express offline learning gains, with these gains being delayed in the “no breaks” group due to the practice schedule.

      On the other hand, if offline learning can occur during this longer break, then why would the “with breaks” group show no benefit? Again, it could be that most of the offline gains for this group were front-loaded during the seven shorter 10-second rest breaks. Another possible, though not mutually exclusive, explanation is that the observed drop in performance in the “with breaks” group is driven by contextual interference. Specifically, similar to Experiments 1 and 2 in Das et al. (2024), the skill test is conducted under very different conditions than those under which the “with breaks” group practiced the skill (short bursts of practice alternating with equally short breaks). On the other hand, the “no breaks” group is tested (50 seconds of continuous practice) under quite similar conditions to their training schedule (70 seconds of continuous practice). Thus, it is possible that this dissimilarity between training and test could lead to reduced performance in the “with breaks” group.

      We made the following manuscript revisions related to these important issues: 

      Introduction (Lines 26-56)

      “Practicing a new motor skill elicits rapid performance improvements (early learning) [1] that precede skill performance plateaus [5]. Skill gains during early learning accumulate over rest periods (micro-offline) interspersed with practice [1, 6-10], and are up to four times larger than offline performance improvements reported following overnight sleep [1]. During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11]. Micro-offline gains observed during early learning are reproducible [7, 10-13] and are similar in magnitude even when practice periods are reduced by half to 5 seconds in length, thereby confirming that they are not merely a result of recovery from performance fatigue [11]. Additionally, they are unaffected by the random termination of practice periods, which eliminates the possibility of predictive motor slowing as a contributing factor [11]. Collectively, these behavioral findings point towards the interpretation that micro-offline gains during early learning represent a form of memory consolidation [1]. 

      This interpretation has been further supported by brain imaging and electrophysiological studies linking known memory-related networks and consolidation mechanisms to rapid offline performance improvements. In humans, the rate of hippocampo-neocortical neural replay predicts micro-offline gains [6]. Consistent with these findings, Chen et al. [12] and Sjøgård et al. [13] furnished direct evidence from intracranial human EEG studies, demonstrating a connection between the density of hippocampal sharp-wave ripples (80-120 Hz)—recognized markers of neural replay—and micro-offline gains during early learning. Further, Griffin et al. reported that neural replay of task-related ensembles in the motor cortex of macaques during brief rest periods— akin to those observed in humans [1, 6-8, 14]—are not merely correlated with, but are causal drivers of micro-offline learning [15]. Specifically, the same reach directions that were replayed the most during rest breaks showed the greatest reduction in path length (i.e. – more efficient movement path between two locations in the reach sequence) during subsequent trials, while stimulation applied during rest intervals preceding performance plateau reduced reactivation rates and virtually abolished micro-offline gains [15]. Thus, converging evidence in humans and non-human primates across indirect non-invasive and direct invasive recording techniques link hippocampal activity, neural replay dynamics and offline skill gains in early motor learning that precede performance plateau.”

      Next, in the Methods, we articulate important constraints formulated by Pan and Rickard and Bönstrup et al. for meaningful measurements:

      Methods (Lines 493-499)

      “The study design followed specific recommendations by Pan and Rickard (2015): 1) utilizing 10-second practice trials and 2) constraining analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur) that precede the emergence of “scalloped” performance dynamics strongly linked to reactive inhibition effects ( [29, 72]). This is precisely the portion of the learning curve Pan and Rickard referred to when they stated “…rapid learning during that period masks any reactive inhibition effect” [29].”

      We finally discuss the implications of neglecting some or all of these recommendations:

      Discussion (Lines 444-452):

      “Finally, caution should be exercised when extrapolating findings during early skill learning, a period of steep performance improvements, to findings reported after insufficient practice [67], post-plateau performance periods [68], or non-learning situations (e.g. performance of non-repeating keypress sequences in  [67]) when reactive inhibition or contextual interference effects are prominent. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. - memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.”

      Along these lines, the authors' claim, based on Bönstrup et al. 2020, that "retroactive interference immediately following practice periods reduces micro-offline learning", is not supported by that very reference. Citing Bönstrup et al. (2020), "Regarding early learning dynamics (trials 1-5), we found no differences in microscale learning parameters (micro online/offline) or total early learning between both interference groups." That is, contrary to Dash et al.'s current claim, Bönstrup et al. (2020) did not find any retroactive interference effect on the specific behavioral readout (micro-offline gains) that the authors assume to reflect consolidation. 

      Please note that the Bönstrup et al. 2020 paper abstract states: 

      “Third, retroactive interference immediately after each practice period reduced the learning rate relative to interference after passage of time (N = 373), indicating stabilization of the motor memory at a microscale of several seconds.”

      which is further supported by this statement in the Results: 

      “The model comprised three parameters representing the initial performance, maximum performance and learning rate (see Eq. 1, “Methods”, “Data Analysis” section). We then statistically compared the model parameters between the interference groups (Fig. 2d). The late interference group showed a higher learning rate compared with the early interference group (late: 0.26 ± 0.23, early: 2.15 ± 0.20, P=0.04). The effect size of the group difference was small to medium (Cohen’s d 0.15)[29]. Similar differences with a stronger rise in the learning curve of a late interference groups vs. an early interference group were found in a smaller sample collected in the lab environment (Supplementary Fig. 3).”

      We have modified the statement in the revised manuscript to specify that the difference observed was between learning rates:

      Introduction (Lines 30-32)

      “During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11].”

      The authors conclude that performance improves, and representation manifolds differentiate, "during" rest periods (see, e.g., abstract). However, micro-offline gains (as well as offline contextualization) are computed from data obtained during practice, not rest, and may, thus, just as well reflect a change that occurs "online", e.g., at the very onset of practice (like pre-planning) or throughout practice (like fatigue, or reactive inhibition).  

      The Reviewer raises again the issue of a potential confound of “pre-planning” on our contextualization measures as in the comment above: 

      “Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023).”

      The cited studies by Ariani et al. indicate that effects of pre-planning are likely to impact the first 3 keypresses of the initial sequence iteration in each trial. As stated in the response to this comment above, we conducted a control analysis of contextualization that ignores the first sequence iteration in each trial to partial out any potential pre-planning effect. This control analysis yielded comparable results, indicating that pre-planning is not a major driver of our reported contextualization effects. We now report this in the revised manuscript:

      We also state in the Figure 1 legend (Lines 99-103) in the revised manuscript that preplanning has no effect on the behavioral measures of micro-offline and micro-online gains in our dataset:

      The Reviewer also raises the issue of possible effects stemming from “fatigue” and “reactive inhibition” which inhibit performance and are indeed relevant to skill learning studies. We designed our task to specifically mitigate these effects. We now more clearly articulate this rationale in the description of the task design as well as the measurement constraints essential for minimizing their impact.

      We also discuss the implications of fatigue and reactive inhibition effects in experimental designs that neglect to follow these recommendations formulated by Pan and Rickard in the Discussion section and propose how this issue can be better addressed in future investigations.

      To summarize, the results of our study indicate that: (a) offline contextualization effects are not explained by pre-planning of the first action sequence iteration in each practice trial; and (b) the task design implemented in this study purposefully minimizes any possible effects of reactive inhibition or fatigue. Circling back to the Reviewer’s proposal that “contextualization…may just as well reflect a change that occurs "online"”, we show in this paper direct empirical evidence that contextualization develops to a greater extent across rest periods rather than across practice trials, contrary to the Reviewer’s proposal.  

      That is, the definition of micro-offline gains (as well as offline contextualization) conflates online and "offline" processes. This becomes strikingly clear in the recent Nature paper by Griffin et al. (2025), who computed micro-offline gains as the difference in average performance across the first five sequences in a practice period (a block, in their terminology) and the last five sequences in the previous practice period. Averaging across sequences in this way minimises the chance to detect online performance changes and inflates changes in performance "offline". The problem that "online" gains (or contextualization) are actually computed from data entirely generated online, and are therefore subject to processes that occur online, is inherent in the very definition of micro-online gains, whether or not they are computed from averaged performance.

      We would like to make it clear that the issue raised by the Reviewer with respect to averaging across sequences done in the Griffin et al. (2025) study does not impact our study in any way. The primary skill measure used in all analyses reported in our paper is not temporally averaged. We estimated instantaneous correct sequence speed over the entire trial. Once the first sequence iteration within a trial is completed, the speed estimate is then updated at the resolution of individual keypresses. All micro-online and -offline behavioral changes are measured as the difference in instantaneous speed at the beginning and end of individual practice trials.

      Methods (lines 528-530):

      “The instantaneous correct sequence speed was calculated as the inverse of the average KTT across a single correct sequence iteration and was updated for each correct keypress.”
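
      A minimal sketch of how such an instantaneous speed estimate, and the micro-online and micro-offline gains derived from it, could be computed is given below. It is not the authors' code. It assumes that "KTT" denotes the keypress transition time, that a sliding window of the most recent keypresses counts as one correct iteration whenever it matches a cyclic rotation of the target sequence, and that speed is expressed in keypresses per second. The five-element target sequence and all function names are hypothetical.

```python
from statistics import mean

def rotations(seq):
    """All cyclic rotations of the target sequence, as tuples."""
    return {tuple(seq[i:] + seq[:i]) for i in range(len(seq))}

def instantaneous_speed(keys, times, target):
    """Return (time, speed) pairs, one per keypress that completes a correct window.

    Speed is the inverse of the mean keypress transition time (KTT) across the
    most recent len(target) keypresses, so the estimate is refreshed at every
    correct keypress once the first full iteration has been typed.
    """
    n = len(target)
    valid = rotations(list(target))
    estimates = []
    for i in range(n - 1, len(keys)):
        window = tuple(keys[i - n + 1 : i + 1])
        if window in valid:
            t = times[i - n + 1 : i + 1]
            ktt = [b - a for a, b in zip(t, t[1:])]  # transition times in seconds
            estimates.append((times[i], 1.0 / mean(ktt)))
    return estimates

def micro_gains(trial_speeds):
    """Micro-online gain: last minus first speed within a trial.
    Micro-offline gain: first speed of the next trial minus last speed of this one."""
    online = [s[-1] - s[0] for s in trial_speeds]
    offline = [nxt[0] - cur[-1] for cur, nxt in zip(trial_speeds, trial_speeds[1:])]
    return online, offline

# Hypothetical trial: two error-free iterations typed at a steady 0.2 s per key.
target = [4, 1, 3, 2, 4]
keys = target * 2
times = [0.2 * i for i in range(len(keys))]
est = instantaneous_speed(keys, times, target)

# Hypothetical per-trial speed traces (keypresses per second).
online, offline = micro_gains([[2.0, 3.0], [3.5, 4.0]])
```

      Because the estimate is not averaged over several iterations, a change within a trial shows up as soon as the window slides past it, which is the property the response relies on.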

      The instantaneous speed measure used in our analyses, in fact, maximizes the likelihood of detecting changes in online performance, as the Reviewer indicates.  Despite this optimally sensitive measurement of online changes, our findings remained robust, consistently converging on the same outcome across our original analyses and the multiple controls recommended by the reviewers. Notably, online contextualization changes are significantly weaker than offline contextualization in all comparisons with different measurement approaches.

      Results (lines 302-309)

      “The Euclidean distance between neural representations of Index<sub>OP1</sub> (i.e. - index finger keypress at ordinal position 1 of the sequence) and Index<sub>OP5</sub> (i.e. - index finger keypress at ordinal position 5 of the sequence) increased progressively during early learning (Figure 5A), predominantly during rest intervals (offline contextualization) rather than during practice (online) (t = 4.84, p < 0.001, df = 25, Cohen's d = 1.2; Figure 5B; Figure 5 – figure supplement 1A). An alternative online contextualization determination equalling the time interval between online and offline comparisons (Trial-based; 10 seconds between Index<sub>OP1</sub> and Index<sub>OP5</sub> observations in both cases) rendered a similar result (Figure 5 – figure supplement 2B).”

      Results (lines 316-318)

      “Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3).”

      Results (lines 318-328)

      “Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69). These findings were not explained by behavioral changes of typing rhythm (t = -0.03, p = 0.976; Figure 5 – figure supplement 5), adjacent keypress transition times (R<sup>2</sup> = 0.00507, F[1,3202] = 16.3; Figure 5 – figure supplement 6), or overall typing speed (between-subject; R<sup>2</sup> = 0.028, p = 0.41; Figure 5 – figure supplement 7).”
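
      The distinction between online and offline contextualization in the passages quoted above can be sketched as two Euclidean distances per trial. This is an illustration under stated assumptions, not the authors' pipeline: each trial is assumed to contribute one feature vector for the index finger keypress at ordinal position 1 of its first sequence and one for ordinal position 5 of its last sequence, and the dictionary keys and function name are hypothetical.

```python
import math

def contextualization(trials):
    """Compute online and offline contextualization from per-trial feature vectors.

    trials: list of dicts with hypothetical keys
        'op1_first' - features of the index keypress at ordinal position 1, first sequence
        'op5_last'  - features of the index keypress at ordinal position 5, last sequence
    online[n]  = distance between op1_first and op5_last within trial n
    offline[n] = distance between op5_last of trial n and op1_first of trial n + 1
    """
    online = [math.dist(t["op1_first"], t["op5_last"]) for t in trials]
    offline = [
        math.dist(cur["op5_last"], nxt["op1_first"])
        for cur, nxt in zip(trials, trials[1:])
    ]
    return online, offline

# Toy two-dimensional feature vectors, chosen so the distances are easy to verify.
toy_trials = [
    {"op1_first": [0.0, 0.0], "op5_last": [3.0, 4.0]},
    {"op1_first": [6.0, 8.0], "op5_last": [6.0, 8.0]},
]
online_ctx, offline_ctx = contextualization(toy_trials)
```

      In this toy case the within-trial distance of the second trial is zero while the distance spanning the rest break is non-zero, which is the pattern the quoted results describe as predominantly offline contextualization.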

      We disagree with the Reviewer’s statement that “the definition of micro-offline gains (as well as offline contextualization) conflates online and "offline" processes”.  From a strictly behavioral point of view, it is obviously true that one can only measure skill (rather than the absence of it during rest) to determine how it changes over time.  While skill changes surrounding rest are used to infer offline learning processes, recovery of skill decay following intense practice is used to infer “unmeasurable” recovery from fatigue or reactive inhibition. In other words, the alternative processes proposed by the Reviewer also rely on the same inferential reasoning. 

      Importantly, inferences can be validated through the identification of mechanisms. Our experiment constrained the study to evaluation of changes in neural representations of the same action in different contexts, while minimizing the impact of mechanisms related to fatigue/reactive inhibition [13, 14]. In this way, we observed that behavioral gains and neural contextualization occur to a greater extent over rest breaks rather than during practice trials and that offline contextualization changes strongly correlate with the offline behavioral gains, while online contextualization does not. This result was supported by the results of all control analyses recommended by the Reviewers. Specifically:

      Methods (Lines 493-499)

      “The study design followed specific recommendations by Pan and Rickard (2015): 1) utilizing 10-second practice trials and 2) constraining analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur) that precede the emergence of “scalloped” performance dynamics strongly linked to reactive inhibition effects ( [29, 72]). This is precisely the portion of the learning curve Pan and Rickard referred to when they stated “…rapid learning during that period masks any reactive inhibition effect” [29].”

      And Discussion (Lines 444-448):

      “Finally, caution should be exercised when extrapolating findings during early skill learning, a period of steep performance improvements, to findings reported after insufficient practice [67], post-plateau performance periods [68], or non-learning situations (e.g. performance of non-repeating keypress sequences in  [67]) when reactive inhibition or contextual interference effects are prominent.”

      Next, we show that offline contextualization is greater than online contextualization and predicts offline behavioral gains across all measurement approaches, including all controls suggested by the Reviewer’s comments and recommendations. 

      Results (lines 302-318):

      “The Euclidean distance between neural representations of Index<sub>OP1</sub> (i.e. - index finger keypress at ordinal position 1 of the sequence) and Index<sub>OP5</sub> (i.e. - index finger keypress at ordinal position 5 of the sequence) increased progressively during early learning (Figure 5A), predominantly during rest intervals (offline contextualization) rather than during practice (online) (t = 4.84, p < 0.001, df = 25, Cohen's d = 1.2; Figure 5B; Figure 5 – figure supplement 1A). An alternative online contextualization determination equalling the time interval between online and offline comparisons (Trial-based; 10 seconds between Index<sub>OP1</sub> and Index<sub>OP5</sub> observations in both cases) rendered a similar result (Figure 5 – figure supplement 2B).

      Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches). Conversely, online contextualization (using either measurement approach) did not explain early online learning gains (i.e. – Figure 5 – figure supplement 3).”

      Results (lines 318-324)

      “Within-subject correlations were consistent with these group-level findings. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (Figure 5 – figure supplement 4, left; t = 3.87, p = 0.00035, df = 25, Cohen's d = 0.76) and stronger than correlations between online contextualization and either micro-online (Figure 5 – figure supplement 4, middle; t = 3.28, p = 0.0015, df = 25, Cohen's d = 1.2) or micro-offline gains (Figure 5 – figure supplement 4, right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen's d = 0.69).”

      Discussion (lines 408-416):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1). This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A). On the other hand, online contextualization did not predict learning (Figure 5 – figure supplement 3). Consistent with these results, the average within-subject correlation between offline contextualization and micro-offline gains was significantly stronger than within-subject correlations between online contextualization and either micro-online or micro-offline gains (Figure 5 – figure supplement 4).”

      We then show that offline contextualization is not explained by pre-planning of the first action sequence:

      Results (lines 310-316):

      “Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R<sup>2</sup> = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches).”

      Discussion (lines 409-412):

      “This result remained unchanged when measuring offline contextualization between the last and second sequence of consecutive trials, inconsistent with a possible confounding effect of pre-planning [30] (Figure 5 – figure supplement 2A).”

      In summary, none of the presented evidence in this paper—including results of the multiple control analyses carried out in response to the Reviewers’ recommendations— supports the Reviewer’s position. 

      Please note that the micro-offline learning "inference" has extensive mechanistic support across species and neural recording techniques (see Introduction, lines 26-56). In contrast, the reactive inhibition "inference," which is the Reviewer's alternative interpretation, has no such support yet [15].

      Introduction (Lines 26-56)

      “Practicing a new motor skill elicits rapid performance improvements (early learning) [1] that precede skill performance plateaus [5]. Skill gains during early learning accumulate over rest periods (micro-offline) interspersed with practice [1, 6-10], and are up to four times larger than offline performance improvements reported following overnight sleep [1]. During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11]. Micro-offline gains observed during early learning are reproducible [7, 10-13] and are similar in magnitude even when practice periods are reduced by half to 5 seconds in length, thereby confirming that they are not merely a result of recovery from performance fatigue [11]. Additionally, they are unaffected by the random termination of practice periods, which eliminates the possibility of predictive motor slowing as a contributing factor [11]. Collectively, these behavioral findings point towards the interpretation that micro-offline gains during early learning represent a form of memory consolidation [1]. 

      This interpretation has been further supported by brain imaging and electrophysiological studies linking known memory-related networks and consolidation mechanisms to rapid offline performance improvements. In humans, the rate of hippocampo-neocortical neural replay predicts micro-offline gains [6].

      Consistent with these findings, Chen et al. [12] and Sjøgård et al. [13] furnished direct evidence from intracranial human EEG studies, demonstrating a connection between the density of hippocampal sharp-wave ripples (80-120 Hz)—recognized markers of neural replay—and micro-offline gains during early learning. Further, Griffin et al. reported that neural replay of task-related ensembles in the motor cortex of macaques during brief rest periods— akin to those observed in humans [1, 6-8, 14]—are not merely correlated with, but are causal drivers of micro-offline learning [15]. Specifically, the same reach directions that were replayed the most during rest breaks showed the greatest reduction in path length (i.e. – more efficient movement path between two locations in the reach sequence) during subsequent trials, while stimulation applied during rest intervals preceding performance plateau reduced reactivation rates and virtually abolished micro-offline gains [15]. Thus, converging evidence in humans and non-human primates across indirect non-invasive and direct invasive recording techniques link hippocampal activity, neural replay dynamics and offline skill gains in early motor learning that precede performance plateau.”

      That said, absence of evidence is not evidence of absence, and for that reason we also state in the Discussion (lines 448-452):

      A simple control analysis based on shuffled class labels could lend further support to the authors' complex decoding approach. As a control analysis that completely rules out any source of overfitting, the authors could test the decoder after shuffling class labels. Following such shuffling, decoding accuracies should drop to chance-level for all decoding approaches, including the optimized decoder. This would also provide an estimate of actual chance-level performance (which is informative over and beyond the theoretical chance level). During the review process, the authors reported this analysis to the reviewers. Given that readers may consider following the presented decoding approach in their own work, it would have been important to include that control analysis in the manuscript to convince readers of its validity. 

      As requested, the label-shuffling analysis was carried out for both 4- and 5-class decoders and is now reported in the revised manuscript.

      Results (lines 204-207):

      “Testing the keypress state (4-class) hybrid decoder performance on Day 1 after randomly shuffling keypress labels for held-out test data resulted in a performance drop approaching expected chance levels (22.12% ± SD 9.1%; Figure 3 – figure supplement 3C).”

      Results (lines 261-264):

      “As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41% ± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C).”
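
      The logic of the label-shuffling control can be sketched with a toy classifier. The example below is not the hybrid-space decoder: it swaps in a nearest-centroid classifier on synthetic two-dimensional data with four well-separated classes, and every name and parameter in it is hypothetical. It only illustrates that scoring fixed predictions against shuffled held-out labels should bring accuracy down to roughly 1/(number of classes), here 25%.

```python
import random

def make_data(rng, centers, n_per_class, sigma=0.5):
    """Draw 2-D Gaussian samples around each class center (synthetic stand-in for features)."""
    X, y = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
            y.append(label)
    return X, y

def fit_centroids(X, y):
    """Nearest-centroid 'decoder': store the mean feature vector of each class."""
    sums = {}
    for (a, b), label in zip(X, y):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + a, sy + b, n + 1)
    return {k: (sx / n, sy / n) for k, (sx, sy, n) in sums.items()}

def predict(centroids, X):
    """Assign each sample to the class with the closest centroid."""
    def nearest(p):
        return min(
            centroids,
            key=lambda k: (p[0] - centroids[k][0]) ** 2 + (p[1] - centroids[k][1]) ** 2,
        )
    return [nearest(p) for p in X]

def accuracy(pred, y):
    return sum(p == t for p, t in zip(pred, y)) / len(y)

rng = random.Random(0)
centers = [(0, 0), (5, 0), (0, 5), (5, 5)]  # four hypothetical classes
X_train, y_train = make_data(rng, centers, 100)
X_test, y_test = make_data(rng, centers, 100)

model = fit_centroids(X_train, y_train)
pred = predict(model, X_test)
true_acc = accuracy(pred, y_test)

# Control: shuffle the held-out labels many times and re-score the same predictions.
shuffled_accs = []
for _ in range(200):
    y_shuffled = y_test[:]
    rng.shuffle(y_shuffled)
    shuffled_accs.append(accuracy(pred, y_shuffled))
empirical_chance = sum(shuffled_accs) / len(shuffled_accs)
```

      Averaging over many shuffles gives an empirical chance level and its spread, which can then be compared with the theoretical value.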

      Furthermore, the authors' approach to cortical parcellation raises questions regarding the information carried by varying dipole orientations within a parcel (which currently seems to be ignored?) and the implementation of the mean-flipping method (given that there are two dimensions - space and time - it is unclear what the authors refer to when they talk about the sign of the "average source", line 477). 

      The revised manuscript now provides a more detailed explanation of the parcellation and sign-flipping procedures implemented:

      Methods (lines 604-611):

      “Source-space parcellation was carried out by averaging all voxel time-series located within distinct anatomical regions defined in the Desikan-Killiany Atlas [31]. Since source time-series estimated with beamforming approaches are inherently sign-ambiguous, a custom Matlab-based implementation of the mne.extract_label_time_course with “mean_flip” sign-flipping procedure in MNE-Python [78] was applied prior to averaging to prevent within-parcel signal cancellation. All voxel time-series within each parcel were extracted and the time-series sign was flipped at locations where the orientation difference was greater than 90° from the parcel mode. A mean time-series was then computed across all voxels within the parcel after sign-flipping.”
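
      The sign-flipping step described in this passage can be illustrated with a short sketch. It is neither the authors' Matlab implementation nor the MNE-Python routine. As a simplifying assumption, the reference direction (the "parcel mode") is approximated here by the normalized mean of the voxel orientations, which may differ from how either implementation defines it. The function name and toy inputs are hypothetical.

```python
import math

def mean_flip(timeseries, orientations):
    """Average voxel time-series within a parcel after sign-flipping.

    timeseries:   list of per-voxel lists of equal length
    orientations: list of 3-vectors, one dipole orientation per voxel
    A voxel is flipped when its orientation is more than 90 degrees away from
    the reference direction, i.e. when their dot product is negative.
    """
    # Reference direction: normalized mean orientation (an assumption, see lead-in).
    ref = [sum(o[k] for o in orientations) for k in range(3)]
    norm = math.sqrt(sum(c * c for c in ref))
    if norm == 0:
        raise ValueError("orientations cancel out; no reference direction")
    ref = [c / norm for c in ref]

    signs = [
        -1.0 if sum(a * b for a, b in zip(o, ref)) < 0 else 1.0
        for o in orientations
    ]
    n_samples = len(timeseries[0])
    n_voxels = len(timeseries)
    averaged = [
        sum(s * ts[t] for s, ts in zip(signs, timeseries)) / n_voxels
        for t in range(n_samples)
    ]
    return averaged, signs

# Toy parcel: the third voxel points the opposite way and carries an inverted signal.
toy_ori = [(0, 0, 1), (0, 0, 1), (0, 0, -1)]
toy_ts = [[1.0, 2.0], [1.0, 2.0], [-1.0, -2.0]]
flipped_mean, toy_signs = mean_flip(toy_ts, toy_ori)
plain_mean = [sum(ts[t] for ts in toy_ts) / len(toy_ts) for t in range(2)]
```

      Without the flip, the inverted voxel partly cancels the other two in the plain average; after flipping, the parcel mean recovers the common waveform.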

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors): 

      Comments on the revision: 

      The authors have made large efforts to address all concerns raised. A couple of suggestions remain: 

      - formally show if and how movement artefacts may contribute to the signal and analysis; it seems that the authors have data to allow for such an analysis  

      We have implemented the requested control analyses addressing this issue. They are reported in: Results (lines 207-211 and 261-268), Discussion (Lines 362-368):

      - formally show that the signals from the intra- and inter-parcel spaces are orthogonal. 

      Please note that, despite the Reviewer’s statement above, we never claim in the manuscript that the parcel-space and regional voxel-space features show “complete independence”. 

      Furthermore, the machine learning-based decoding methods used in the present study do not require input feature orthogonality, but instead non-redundancy [7], which is a requirement satisfied by our data (see below and the new Figure 2 – figure supplement 2 in the revised manuscript). Finally, our results already show that the hybrid space decoder outperformed all other methods even after input features were fully orthogonalized with LDA or PCA dimensionality reduction procedures prior to the classification step (Figure 3 – figure supplement 2).

      We also highlight several additional results that are informative regarding this issue. For example, if spatially overlapping parcel- and voxel-space time-series only provided redundant information, inclusion of both as input features should increase model overfitting to the training dataset and decrease overall cross-validated test accuracy [8]. In the present study, however, we see the opposite effect on decoder performance. First, Figure 3 – figure supplements 1 & 2 clearly show that decoders constructed from hybrid-space features outperform the other input feature (sensor-, whole-brain parcel- and whole-brain voxel-) spaces in every case (e.g. – wideband, all narrowband frequency ranges, and even after the input space is fully orthogonalized through dimensionality reduction procedures prior to the decoding step). Furthermore, Figure 3 – figure supplement 6 shows that hybrid-space decoder performance suffers when parcel time-series that spatially overlap with the included regional voxel-spaces are removed from the input feature set. We state in the Discussion (lines 353-356):

      “The observation of increased cross-validated test accuracy (as shown in Figure 3 – Figure Supplement 6) indicates that the spatially overlapping information in parcel- and voxel-space time-series in the hybrid decoder was complementary, rather than redundant [41].”

      To gain insight into the complementary information contributed by the two spatial scales to the hybrid-space decoder, we first independently computed the matrix rank for whole-brain parcel- and voxel-space input features for each participant (shown in Author response image 1). The results indicate that whole-brain parcel-space input features are full rank (rank = 148) for all participants (i.e. - MEG activity is orthogonal between all parcels). The matrix rank of voxel-space input features (rank = 267 ± 17 SD) exceeded the parcel-space rank for all participants and approached the number of usable MEG sensor channels (n = 272). Thus, voxel-space features provide both additional and complementary information to representations at the parcel-space scale.
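
      The rank comparison described above can be illustrated with a minimal NumPy sketch on synthetic data. All array names, sizes, and the random "beamformer weights" are hypothetical stand-ins, not the authors' data or pipeline. Note that matrix rank measures linear independence among features, which is a weaker property than pairwise orthogonality.

```python
import numpy as np

# Synthetic stand-in sizes (assumptions for illustration only).
n_sensors, n_voxels, n_parcels, n_times = 20, 60, 6, 500
rng = np.random.default_rng(0)

# Sensor data and hypothetical linear source-reconstruction weights.
sensor_ts = rng.standard_normal((n_sensors, n_times))
weights = rng.standard_normal((n_voxels, n_sensors))

# Voxel time-series are linear mixtures of the sensors, so their rank
# cannot exceed the number of sensor channels.
voxel_ts = weights @ sensor_ts

# Parcel time-series: average the voxels assigned to each parcel.
labels = np.repeat(np.arange(n_parcels), n_voxels // n_parcels)
parcel_ts = np.vstack(
    [voxel_ts[labels == p].mean(axis=0) for p in range(n_parcels)]
)

voxel_rank = np.linalg.matrix_rank(voxel_ts)
parcel_rank = np.linalg.matrix_rank(parcel_ts)
print(f"parcel rank = {parcel_rank}, voxel rank = {voxel_rank}")
```

      In this toy setting the parcel features are full rank, while the voxel features reach the ceiling set by the number of sensor channels, which mirrors the qualitative pattern reported above.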

      Figure 2 – figure supplement 2 in the revised manuscript now shows that the degree of dependence between the two spatial scales varies over the regional voxel-space. That is, some voxels within a given parcel correlate strongly with the time-series of the parcel they belong to, while others do not. This finding is consistent with a documented increase in correlational structure of neural activity across spatial scales that does not reflect perfect dependency or orthogonality [9]. Notably, the regional voxel-spaces included in the hybrid-space decoder are significantly less correlated with the averaged parcel-space time-series than excluded voxels. We now point readers to this new figure in the Results.

      Taken together, these results indicate that the multi-scale information in the hybrid feature set is complementary rather than orthogonal. This is consistent with the idea that hybrid-space features better represent multi-scale temporospatial dynamics reported to be a fundamental characteristic of how the brain stores and adapts memories, and generates behavior across species [9].

      Reviewer #2 (Recommendations for the authors):  

      I appreciate the authors' efforts in addressing the concerns I raised. The responses generally made sense to me. However, I had some trouble finding several corrections/additions that the authors claim they made in the revised manuscript: 

      "We addressed this question by conducting a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4, and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis also affirmed that the possible alternative explanation that contextualization effects are simple reflections of increased mixing is not supported by the data (Adjusted R<sup>2</sup> = 0.00431; F = 5.62).  We now include this new negative control analysis in the revised manuscript."  

      This approach is now reported in the manuscript in the Results (Lines 324-328) and in the Figure 5 – figure supplement 6 legend.
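
      A minimal sketch of a negative-control regression of this kind, using synthetic data only. The helper names (`zscore`, `adjusted_r2`), the sample size, and the simulated transition times are assumptions for illustration; this is not the authors' analysis code. The sketch fits both three separate transition-time regressors and a single summed regressor, the latter being one way to sidestep collinearity among transition intervals.

```python
import numpy as np

def zscore(x):
    """Column-wise z-score normalization."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)

def adjusted_r2(X, y):
    """Adjusted R^2 of an ordinary least-squares fit with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)
n_sequences = 2000  # hypothetical number of correct sequences

# Synthetic transition times (columns: 4-1, 2-4, 4-4), in seconds.
raw_times = rng.gamma(2.0, 0.1, size=(n_sequences, 3))
# Synthetic distance score generated independently of the timings.
distance_score = zscore(rng.standard_normal(n_sequences))

# Three separate regressors versus one summed regressor.
adj_r2_separate = adjusted_r2(zscore(raw_times), distance_score)
adj_r2_summed = adjusted_r2(
    zscore(raw_times.sum(axis=1, keepdims=True)), distance_score
)
print(adj_r2_separate, adj_r2_summed)
```

      Because the simulated distance score is unrelated to the timings, both adjusted R² values sit near zero, which is the pattern expected when transition times do not explain the distance measure.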

      "We strongly agree with the Reviewer that the issue of generalizability is extremely important and have added a new paragraph to the Discussion in the revised manuscript highlighting the strengths and weaknesses of our study with respect to this issue." 

      Discussion (Lines 436-441)

      “One limitation of this study is that contextualization was investigated for only one finger movement (index finger or digit 4) embedded within a relatively short 5-item skill sequence. Determining if representational contextualization is exhibited across multiple finger movements embedded within for example longer sequences (e.g. – two index finger and two little finger keypresses performed within a short piece of piano music) will be an important extension to the present results.”

      "We strongly agree with the Reviewer that any intended clinical application must carefully consider the specific input feature constraints dictated by the clinical cohort, and in turn impose appropriate and complementary constraints on classifier parameters that may differ from the ones used in the present study. We now highlight this issue in the Discussion of the revised manuscript and relate our present findings to published clinical BCI work within this context."

      Discussion (Lines 441-444)

      “While a supervised manifold learning approach (LDA) was used here because it optimized hybrid-space decoder performance, unsupervised strategies (e.g. - PCA and MDS, which also substantially improved decoding accuracy in the present study; Figure 3 – figure supplement 2) are likely more suitable for real-time BCI applications.”

      and 

      "The Reviewer makes a good point. We have now implemented the suggested normalization procedure in the analysis provided in the revised manuscript." 

      Results (lines 275-282)

      “We used a Euclidean distance measure to evaluate the differentiation of the neural representation manifold of the same action (i.e. - an index-finger keypress) executed within different local sequence contexts (i.e. - ordinal position 1 vs. ordinal position 5; Figure 5). To make these distance measures comparable across participants, a new set of classifiers was then trained with group-optimal parameters (i.e. – broadband hybrid-space MEG data with subsequent manifold extraction (Figure 3 – figure supplement 2) and LDA classifiers (Figure 3 – figure supplement 7) trained on 200ms duration windows aligned to the KeyDown event (see Methods, Figure 3 – figure supplement 5)).”
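
      A minimal sketch of how a Euclidean distance between two context-specific representations could be computed. The arrays `pos1` and `pos5` are hypothetical stand-ins for low-dimensional features of index-finger keypresses at ordinal positions 1 and 5, and the centroid-based definition is an assumption made for illustration, not necessarily the exact procedure used in the manuscript.

```python
import numpy as np

def contextualization_distance(pos1_features, pos5_features):
    """Euclidean distance between the centroids of two feature sets.

    Each input has shape (n_keypresses, n_dimensions).
    """
    c1 = np.asarray(pos1_features, dtype=float).mean(axis=0)
    c5 = np.asarray(pos5_features, dtype=float).mean(axis=0)
    return float(np.linalg.norm(c1 - c5))

# Toy 2-D features: centroids at (1, 0) and (5, 4).
pos1 = np.array([[0.0, 0.0], [2.0, 0.0]])
pos5 = np.array([[4.0, 4.0], [6.0, 4.0]])
distance = contextualization_distance(pos1, pos5)
print(distance)
```

      A larger value indicates that the same action occupies more distinct regions of the feature space in the two sequence contexts; identical feature sets give a distance of zero.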

      Where are they in the manuscript? Did I read the wrong version? It would be more helpful to specify with page/line numbers. Please also add the detailed procedure of the control/additional analyses in the Method. 

      As requested, we now refer to all manuscript revisions with specific line numbers. We have also included all detailed procedures related to any additional analyses requested by reviewers.

      I also have a few other comments back to the authors' following responses: 

      "Thus, increased overlap between the "4" and "1" keypresses (at the start of the sequence) and "2" and "4" keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged. One must also keep in mind that since participants repeat the sequence multiple times within the same trial, a majority of the index finger keypresses are performed adjacent to one another (i.e. - the "4-4" transition marking the end of one sequence and the beginning of the next). Thus, increased overlap between consecutive index finger keypresses as typing speed increased should increase their similarity and mask contextualization-related changes to the underlying neural representations."  "We also re-examined our previously reported classification results with respect to this issue. 

      We reasoned that if mixing effects reflecting the ordinal sequence structure is an important driver of the contextualization finding, these effects should be observable in the distribution of decoder misclassifications. For example, "4" keypresses would be more likely to be misclassified as "1" or "2" keypresses (or vice versa) than as "3" keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3-figure supplement 3A display a distribution of misclassifications that is inconsistent with an alternative mixing effect explanation of contextualization." 

      "Based upon the increased overlap between adjacent index finger keypresses (i.e. - "4-4" transition), we also reasoned that the decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position, should show decreased performance as typing speed increases. However, Figure 4C in our manuscript shows that this is not the case. The 2-class hybrid classifier actually displays improved classification performance over early practice trials despite greater temporal overlap. Again, this is inconsistent with the idea that the contextualization effect simply reflects increased mixing of individual keypress features."  

      As the time window for the MEG feature is defined after the onset of each press, it is more likely that the feature overlap is between the current and the future presses, rather than between the current and the past presses (of course the three will overlap at very fast typing speeds). Therefore, for sequence 41324, if we denote the planning-related processes by Roman numerals, the overlapping features would be '4i', '1iii', '3ii', '2iv', and '4iv'. Assuming execution-related processes (e.g., 1) and planning-related processes (e.g., i) are not necessarily similar, especially at finer temporal resolution, the patterns for '4i' and '4iv' are well separated in terms of processes 'i' and 'iv', and this advantage will be larger at faster typing speeds. This also applies to the other presses. Thus, the authors' arguments about the masking of contextualization and misclassification due to pattern overlap seem odd. The most direct and probably easiest way to resolve this would be to use a shorter time window for the MEG feature. Some decrease in decoding accuracy in this case is entirely acceptable for scientific purposes.

      The revised manuscript now includes analyses carried out with decoding time windows ranging from 50 to 250ms in duration. These additional results are now reported in:

      Results (lines 258-268):

      “The improved decoding accuracy is supported by greater differentiation in neural representations of the index finger keypresses performed at positions 1 and 5 of the sequence (Figure 4A), and by the trial-by-trial increase in 2-class decoding accuracy over early learning (Figure 4C) across different decoder window durations (Figure 4 – figure supplement 2). As expected, the 5-class hybrid-space decoder performance approached chance levels when tested with randomly shuffled keypress labels (18.41% ± SD 7.4% for Day 1 data; Figure 4 – figure supplement 3C). Task-related eye movements did not explain these results since an alternate 5-class hybrid decoder constructed from three eye movement features (gaze position at the KeyDown event, gaze position 200ms later, and peak eye movement velocity within this window; Figure 4 – figure supplement 3A) performed at chance levels (cross-validated test accuracy = 0.2181; Figure 4 – figure supplement 3B, C).”

      Results (lines 310-316):

      “Offline contextualization strongly correlated with cumulative micro-offline gains (r = 0.903, R² = 0.816, p < 0.001; Figure 5 – figure supplement 1A, inset) across decoder window durations ranging from 50 to 250ms (Figure 5 – figure supplement 1B, C). The offline contextualization between the final sequence of each trial and the second sequence of the subsequent trial (excluding the first sequence) yielded comparable results. This indicates that pre-planning at the start of each practice trial did not directly influence the offline contextualization measure [30] (Figure 5 – figure supplement 2A, 1st vs. 2nd Sequence approaches).”

      Discussion (lines 380-385):

      “The first hint of representational differentiation was the highest false-negative and lowest false-positive misclassification rates for index finger keypresses performed at different locations in the sequence compared with all other digits (Figure 3C). This was further supported by the progressive differentiation of neural representations of the index finger keypress (Figure 4A) and by the robust trial-by-trial increase in 2-class decoding accuracy across time windows ranging between 50 and 250ms (Figure 4C; Figure 4 – figure supplement 2).”

      Discussion (lines 408-9):

      “Offline contextualization consistently correlated with early learning gains across a range of decoding windows (50–250ms; Figure 5 – figure supplement 1).”

      "We addressed this question by conducting a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence" 

      For the regression analysis, I recommend using total keypress time per sequence (or the sum of 4-1 and 4-4) instead of specific transition intervals, because there likely exists a specific correlational structure across the transition intervals. Using correlated regressors may distort the result.

      This approach is now reported in the manuscript:

      Results (Lines 324-328) and Figure 5 – figure supplement 6 legend.

      "We do agree with the Reviewer that the naturalistic, generative, self-paced task employed in the present study results in overlapping brain processes related to planning, execution, evaluation and memory of the action sequence. We also agree that there are several tradeoffs to consider in the construction of the classifiers depending on the study aim. Given our aim of optimizing keypress decoder accuracy in the present study, the set of tradeoffs resulted in representations reflecting more the latter three processes, and less so the planning component. Whether separate decoders can be constructed to tease apart the representations or networks supporting these overlapping processes is an important future direction of research in this area. For example, work presently underway in our lab constrains the selection of windowing parameters in a manner that allows individual classifiers to be temporally linked to specific planning, execution, evaluation or memory-related processes to discern which brain networks are involved and how they adaptively reorganize with learning. Results from the present study (Figure 4-figure supplement 2) showing hybrid-space decoder prediction accuracies exceeding 74% for temporal windows spanning as little as 25ms and located up to 100ms prior to the KeyDown event strongly support the feasibility of such an approach." 

      I recommend that the authors add this paragraph or a paragraph like this to the Discussion. This perspective is very important and still missing in the revised manuscript. 

      We have now included the following sections in the manuscript to address this point:

      Discussion (lines 334-338)

      “The main findings of this study during which subjects engaged in a naturalistic, self-paced task were that individual sequence action representations differentiate during early skill learning in a manner reflecting the local sequence context in which they were performed, and that the degree of representational differentiation, particularly prominent over rest intervals, correlated with skill gains.”

      Discussion (lines 428-434)

      “In this study, classifiers were trained on MEG activity recorded during or immediately after each keypress, emphasizing neural representations related to action execution, memory consolidation and recall over those related to planning. An important direction for future research is determining whether separate decoders can be developed to distinguish the representations or networks separately supporting these processes. Ongoing work in our lab is addressing this question. The present accuracy results across varied decoding window durations and alignment with each keypress action support the feasibility of this approach (Figure 3—figure supplement 5).”

      "The rapid initial skill gains that characterize early learning are followed by micro-scale fluctuations around skill plateau levels (i.e. following trial 11 in Figure 1B)"  Is this a mention of Figure 1 Supplement 1 A?  

      The sentence was replaced with the following: Results (lines 108-110)

      “Participants reached 95% of maximal skill (i.e. - Early Learning) within the initial 11 practice trials (Figure 1B), with improvements developing over inter-practice rest periods (micro-offline gains) accounting for almost all total learning across participants (Figure 1B, inset) [1].”

      The citation below seems to have been selected by mistake; 

      "9. Chen, S. & Epps, J. Using task-induced pupil diameter and blink rate to infer cognitive load. Hum Comput Interact 29, 390-413 (2014)." 

      We thank the Reviewer for bringing this mistake to our attention. This citation has now been corrected.

      Reviewer #3 (Recommendations for the authors):  

      The authors write in their response that "We now provide additional details in the Methods of the revised manuscript pertaining to the parcellation procedure and how the sign ambiguity problem was addressed in our analysis." I could not find anything along these lines in the (redlined) version of the manuscript and therefore did not change the corresponding comment in the public review.  

      The revised manuscript now provides a more detailed explanation of the parcellation, and sign-flipping procedure implemented:

      Methods (lines 604-611):

      “Source-space parcellation was carried out by averaging all voxel time-series located within distinct anatomical regions defined in the Desikan-Killiany Atlas [31]. Since source time-series estimated with beamforming approaches are inherently sign-ambiguous, a custom Matlab-based implementation of the mne.extract_label_time_course with “mean_flip” sign-flipping procedure in MNE-Python [78] was applied prior to averaging to prevent within-parcel signal cancellation. All voxel time-series within each parcel were extracted and the time-series sign was flipped at locations where the orientation difference was greater than 90° from the parcel mode. A mean time-series was then computed across all voxels within the parcel after sign-flipping.”
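
      The sign-flipping step quoted above can be sketched in NumPy as follows. This is an illustrative approximation in the spirit of the “mean_flip” mode, not the authors' Matlab implementation. The function name `sign_flip_mean` is hypothetical, and estimating the dominant parcel orientation from a singular value decomposition of the voxel orientations is an assumption about how the "parcel mode" could be defined.

```python
import numpy as np

def sign_flip_mean(voxel_ts, orientations):
    """Average voxel time-series within a parcel after sign-flipping.

    voxel_ts:     (n_voxels, n_times) source time-series
    orientations: (n_voxels, 3) unit source-orientation vectors
    """
    voxel_ts = np.asarray(voxel_ts, dtype=float)
    orientations = np.asarray(orientations, dtype=float)
    # Dominant orientation: first right singular vector (assumption).
    _, _, vh = np.linalg.svd(orientations, full_matrices=False)
    dominant = vh[0]
    # A negative dot product means the orientation differs by more
    # than 90 degrees from the dominant direction, so flip the sign.
    flips = np.sign(orientations @ dominant)
    flips[flips == 0] = 1.0
    return (flips[:, None] * voxel_ts).mean(axis=0)

# Toy parcel: the third voxel points the opposite way, so its
# time-series has the opposite sign.
t = np.linspace(0.0, 1.0, 200)
signal = np.sin(2 * np.pi * 5 * t)
orient = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
ts = np.vstack([signal, signal, -signal])

naive_mean = ts.mean(axis=0)             # partial cancellation
flipped_mean = sign_flip_mean(ts, orient)
```

      In this toy case the naive average retains only one third of the signal amplitude, while the sign-flipped average recovers the full amplitude up to a global sign, which remains arbitrary.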

      The control analysis based on a multivariate regression that assessed whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times, as briefly mentioned in the authors' responses to Reviewer 2 and myself, was not included in the manuscript and could not be sufficiently evaluated. 

      This approach is now reported in the manuscript: Results (Lines 324-328) and Figure 5 – figure supplement 6 legend.

      The authors argue that differences in the design between Das et al. (2024) on the one hand (Experiments 1 and 2), and the study by Bönstrup et al. (2019) on the other hand, may have prevented Das et al. (2024) from finding the assumed learning benefit by micro-offline consolidation. However, the Supplementary Material of Das et al. (2024) includes an experiment (Experiment S1) whose design closely follows a large proportion of the early learning phase of Bönstrup et al. (2019), and which, nevertheless, demonstrates that there is no lasting benefit of taking breaks with respect to the acquired skill level, despite the presence of micro-offline gains.  

      We thank the Reviewer for alerting us to this new data added to the revised supplementary materials of Das et al. (2024) posted to bioRxiv. However, despite the Reviewer’s claim to the contrary, a careful comparison between the Das et al. and Bönstrup et al. studies reveals more substantive differences than similarities, and does not support the statement that Experiment S1 “closely follows a large proportion of the early learning phase of Bönstrup et al. (2019)”.

      In the Das et al. Experiment S1, sixty-two participants were randomly assigned to “with breaks” or “no breaks” skill training groups. The “with breaks” group alternated 10 seconds of skill sequence practice with 10 seconds of rest over seven trials (2 min and 20 sec total training duration). This amounts to 66.7% of the early learning period defined by Bönstrup et al. (2019) (i.e. - eleven 10-second long practice periods interleaved with ten 10-second long rest breaks; 3 min 30 sec total training duration). Also, please note that while no performance feedback nor reward was given in the Bönstrup et al. (2019) study, participants in the Das et al. study received explicit performance-based monetary rewards, a potentially crucial driver of differentiated behavior between the two studies:

      “Participants were incentivized with bonus money based on the total number of correct sequences completed throughout the experiment.”

      The “no breaks” group in the Das et al. study practiced the skill sequence for 70 continuous seconds. Both groups (despite one being labeled “no breaks”) follow training with a long 3-minute break (also note that since the “with breaks” group ends with 10 seconds of rest, their break is actually longer), before finishing with a skill “test” over a continuous 50-second-long block. During the 70 seconds of training, the “with breaks” group shows more learning than the “no breaks” group. Interestingly, following the long 3-minute break the “with breaks” group displays a performance drop (relative to their performance at the end of training) that is stable over the full 50-second test, while the “no breaks” group shows an immediate performance improvement following the long break that continues to increase over the 50-second test.

      Separately, there are important issues regarding the Das et al study that should be considered through the lens of recent findings not referred to in the preprint. A major element of their experimental design is that both groups—“with breaks” and “no breaks”— actually receive quite a long 3-minute break just before the skill test. This long break is more than 2.5x the cumulative interleaved rest experienced by the “with breaks” group. Thus, although the design is intended to contrast the presence or absence of rest “breaks”, that difference between groups is no longer maintained at the point of the skill test. 
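
      The timing comparisons above follow from simple arithmetic, sketched here under the assumption that each of the seven “with breaks” trials consists of 10 seconds of practice followed by 10 seconds of rest, and that the Bönstrup et al. early learning period comprises eleven 10-second practice periods and ten 10-second rests. Variable names are illustrative only.

```python
PRACTICE_S = 10    # seconds of practice per trial
REST_S = 10        # seconds of rest per interleaved break
LONG_BREAK_S = 180 # 3-minute break before the skill test

# Das et al. Experiment S1, "with breaks" group: seven practice/rest pairs.
das_trials = 7
das_training_s = das_trials * (PRACTICE_S + REST_S)
das_cumulative_rest_s = das_trials * REST_S

# Bönstrup et al. early learning: 11 practice periods, 10 rest breaks.
bonstrup_training_s = 11 * PRACTICE_S + 10 * REST_S

fraction_of_early_learning = das_training_s / bonstrup_training_s
long_break_ratio = LONG_BREAK_S / das_cumulative_rest_s

print(f"Das et al. training duration: {das_training_s} s")
print(f"Bönstrup et al. early learning: {bonstrup_training_s} s")
print(f"Fraction covered: {fraction_of_early_learning:.1%}")
print(f"Long break vs. cumulative rest: {long_break_ratio:.2f}x")
```

      Under these assumptions the “with breaks” training lasts 140 seconds against 210 seconds of early learning, and the 3-minute break is roughly 2.6 times the 70 seconds of cumulative interleaved rest.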

      The Das et al. results are most consistent with an alternative interpretation of the data: that the “no breaks” group experiences offline learning during their long 3-minute break. This is supported by the recent work of Griffin et al. (2025) where micro-array recordings from primary and premotor cortex were obtained from macaque monkeys while they performed blocks of ten continuous reaching sequences up to 81.4 seconds in duration (see source data for Extended Data Figure 1h) with 90 seconds of interleaved rest. Griffin et al. observed offline improvement in skill immediately following the rest break that was causally related to neural reactivations (i.e. – neural replay) that occurred during the rest break. Importantly, the highest density of reactivations was present in the very first 90-second break between Blocks 1 and 2 (see Fig. 2f in Griffin et al., 2025). This supports the interpretation that both the “with breaks” and “no breaks” groups express offline learning gains, with these gains being delayed in the “no breaks” group due to the practice schedule.

      On the other hand, if offline learning can occur during this longer break, then why would the “with breaks” group show no benefit? Again, it could be that most of the offline gains for this group were front-loaded during the seven shorter 10-second rest breaks. Another possible, though not mutually exclusive, explanation is that the observed drop in performance in the “with breaks” group is driven by contextual interference. Specifically, similar to Experiments 1 and 2 in Das et al. (2024), the skill test is conducted under very different conditions from those under which the “with breaks” group practiced the skill (short bursts of practice alternating with equally short breaks). On the other hand, the “no breaks” group is tested (50 seconds of continuous practice) under quite similar conditions to their training schedule (70 seconds of continuous practice). Thus, it is possible that this dissimilarity between training and test could lead to reduced performance in the “with breaks” group.

      We made the following manuscript revisions related to these important issues: 

      Introduction (Lines 26-56)

      “Practicing a new motor skill elicits rapid performance improvements (early learning) [1] that precede skill performance plateaus [5]. Skill gains during early learning accumulate over rest periods (micro-offline) interspersed with practice [1, 6-10], and are up to four times larger than offline performance improvements reported following overnight sleep [1]. During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11]. Micro-offline gains observed during early learning are reproducible [7, 10-13] and are similar in magnitude even when practice periods are reduced by half to 5 seconds in length, thereby confirming that they are not merely a result of recovery from performance fatigue [11]. Additionally, they are unaffected by the random termination of practice periods, which eliminates the possibility of predictive motor slowing as a contributing factor [11]. Collectively, these behavioral findings point towards the interpretation that micro-offline gains during early learning represent a form of memory consolidation [1].

      This interpretation has been further supported by brain imaging and electrophysiological studies linking known memory-related networks and consolidation mechanisms to rapid offline performance improvements. In humans, the rate of hippocampo-neocortical neural replay predicts micro-offline gains [6]. Consistent with these findings, Chen et al. [12] and Sjøgård et al. [13] furnished direct evidence from intracranial human EEG studies, demonstrating a connection between the density of hippocampal sharp-wave ripples (80-120 Hz)—recognized markers of neural replay—and micro-offline gains during early learning. Further, Griffin et al. reported that neural replay of task-related ensembles in the motor cortex of macaques during brief rest periods— akin to those observed in humans [1, 6-8, 14]—are not merely correlated with, but are causal drivers of micro-offline learning [15]. Specifically, the same reach directions that were replayed the most during rest breaks showed the greatest reduction in path length (i.e. – more efficient movement path between two locations in the reach sequence) during subsequent trials, while stimulation applied during rest intervals preceding performance plateau reduced reactivation rates and virtually abolished micro-offline gains [15]. Thus, converging evidence in humans and non-human primates across indirect non-invasive and direct invasive recording techniques link hippocampal activity, neural replay dynamics and offline skill gains in early motor learning that precede performance plateau.”

      Next, in the Methods, we articulate important constraints formulated by Pan and Rickard (2015) and Bönstrup et al. (2019) for meaningful measurements:

      Methods (Lines 493-499)

      “The study design followed specific recommendations by Pan and Rickard (2015): 1) utilizing 10-second practice trials and 2) constraining analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur) that precede the emergence of “scalloped” performance dynamics strongly linked to reactive inhibition effects ([29, 72]). This is precisely the portion of the learning curve Pan and Rickard referred to when they stated “…rapid learning during that period masks any reactive inhibition effect” [29].”

      We finally discuss the implications of neglecting some or all of these recommendations:

      Discussion (Lines 444-452):

      “Finally, caution should be exercised when extrapolating findings during early skill learning, a period of steep performance improvements, to findings reported after insufficient practice [67], post-plateau performance periods [68], or non-learning situations (e.g. performance of non-repeating keypress sequences in  [67]) when reactive inhibition or contextual interference effects are prominent. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. - memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.”

      Personally, given that the idea of (micro-offline) consolidation seems to attract a lot of interest (and therefore cause a lot of future effort/cost public money) in the scientific community, I would find it extremely important to be cautious in interpreting results in this field. For me, this would include abstaining from the claim that processes occur "during" a rest period (see abstract, for example), given that micro-offline gains (as well as offline contextualization) are computed from data obtained during practice, not rest, and may, thus, just as well reflect a change that occurs "online", e.g., at the very onset of practice (like pre-planning) or throughout practice (like fatigue, or reactive inhibition). In addition, I would suggest to discuss in more depth the actual evidence not only in favour, but also against, the assumption of micro-offline gains as a phenomenon of learning.  

      We agree with the reviewer that caution is warranted. Based upon these suggestions, we have now expanded the manuscript to very clearly define the experimental constraints under which different groups have successfully studied micro-offline learning and its mechanisms, the impact of fatigue/reactive inhibition on micro-offline performance changes unrelated to learning, as well as the interpretation problems that emerge when those recommendations are not followed. 

      We clearly articulate the crucial constraints recommended by Pan and Rickard (2015) and Bönstrup et al. (2019) for meaningful measurements and interpretation of offline gains in the revised manuscript.

      Methods (Lines 493-499)

      “The study design followed specific recommendations by Pan and Rickard (2015): 1) utilizing 10-second practice trials and 2) constraining analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur) that precede the emergence of “scalloped” performance dynamics strongly linked to reactive inhibition effects ([29, 72]). This is precisely the portion of the learning curve Pan and Rickard referred to when they stated “…rapid learning during that period masks any reactive inhibition effect” [29].”

      In the Introduction, we review the extensive evidence emerging from LFP and microelectrode recordings in humans and monkeys (including causality of neural replay with respect to micro-offline gains and early learning in the Griffin et al. Nature 2025 publication):

      Introduction (Lines 26-56)

      “Practicing a new motor skill elicits rapid performance improvements (early learning) [1] that precede skill performance plateaus [5]. Skill gains during early learning accumulate over rest periods (micro-offline) interspersed with practice [1, 6-10], and are up to four times larger than offline performance improvements reported following overnight sleep [1]. During this initial interval of prominent learning, retroactive interference immediately following each practice interval reduces learning rates relative to interference after passage of time, consistent with stabilization of the motor memory [11]. Micro-offline gains observed during early learning are reproducible [7, 10-13] and are similar in magnitude even when practice periods are reduced by half to 5 seconds in length, thereby confirming that they are not merely a result of recovery from performance fatigue [11]. Additionally, they are unaffected by the random termination of practice periods, which eliminates the possibility of predictive motor slowing as a contributing factor [11]. Collectively, these behavioral findings point towards the interpretation that micro-offline gains during early learning represent a form of memory consolidation [1].

      This interpretation has been further supported by brain imaging and electrophysiological studies linking known memory-related networks and consolidation mechanisms to rapid offline performance improvements. In humans, the rate of hippocampo-neocortical neural replay predicts micro-offline gains [6]. Consistent with these findings, Chen et al. [12] and Sjøgård et al. [13] furnished direct evidence from intracranial human EEG studies, demonstrating a connection between the density of hippocampal sharp-wave ripples (80-120 Hz)—recognized markers of neural replay—and micro-offline gains during early learning. Further, Griffin et al. reported that neural replay of task-related ensembles in the motor cortex of macaques during brief rest periods— akin to those observed in humans [1, 6-8, 14]—are not merely correlated with, but are causal drivers of micro-offline learning [15]. Specifically, the same reach directions that were replayed the most during rest breaks showed the greatest reduction in path length (i.e. – more efficient movement path between two locations in the reach sequence) during subsequent trials, while stimulation applied during rest intervals preceding performance plateau reduced reactivation rates and virtually abolished micro-offline gains [15]. Thus, converging evidence in humans and non-human primates across indirect non-invasive and direct invasive recording techniques link hippocampal activity, neural replay dynamics and offline skill gains in early motor learning that precede performance plateau.”

      Following the reviewer’s advice, we have expanded our discussion in the revised manuscript of alternative hypotheses put forward in the literature and call for caution when extrapolating results across studies with fundamental differences in design (e.g. – different practice and rest durations, or presence/absence of extrinsic reward, etc). 

      Discussion (Lines 444-452):

      “Finally, caution should be exercised when extrapolating findings during early skill learning, a period of steep performance improvements, to findings reported after insufficient practice [67], post-plateau performance periods [68], or non-learning situations (e.g. performance of non-repeating keypress sequences in [67]) when reactive inhibition or contextual interference effects are prominent. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. - memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.”

      References

      (1) Zimerman, M., et al., Disrupting the Ipsilateral Motor Cortex Interferes with Training of a Complex Motor Task in Older Adults. Cereb Cortex, 2012.

      (2) Waters, S., T. Wiestler, and J. Diedrichsen, Cooperation Not Competition: Bihemispheric tDCS and fMRI Show Role for Ipsilateral Hemisphere in Motor Learning. J Neurosci, 2017. 37(31): p. 7500-7512.

      (3) Sawamura, D., et al., Acquisition of chopstick-operation skills with the nondominant hand and concomitant changes in brain activity. Sci Rep, 2019. 9(1): p. 20397.

      (4) Lee, S.H., S.H. Jin, and J. An, The difference in cortical activation pattern for complex motor skills: A functional near-infrared spectroscopy study. Sci Rep, 2019. 9(1): p. 14066.

      (5) Grafton, S.T., E. Hazeltine, and R.B. Ivry, Motor sequence learning with the nondominant left hand. A PET functional imaging study. Exp Brain Res, 2002. 146(3): p. 369-78.

      (6) Buch, E.R., et al., Consolidation of human skill linked to waking hippocampo-neocortical replay. Cell Rep, 2021. 35(10): p. 109193.

      (7) Wang, L. and S. Jiang, A feature selection method via analysis of relevance, redundancy, and interaction, in Expert Systems with Applications, Elsevier, Editor. 2021.

      (8) Yu, L. and H. Liu, Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 2004. 5: p. 1205-1224.

      (9) Munn, B.R., et al., Multiscale organization of neuronal activity unifies scale-dependent theories of brain function. Cell, 2024.

      (10) Borragan, G., et al., Sleep and memory consolidation: motor performance and proactive interference effects in sequence learning. Brain Cogn, 2015. 95: p. 54-61.

      (11) Landry, S., C. Anderson, and R. Conduit, The effects of sleep, wake activity and time-on-task on offline motor sequence learning. Neurobiol Learn Mem, 2016. 127: p. 56-63.

      (12) Gabitov, E., et al., Susceptibility of consolidated procedural memory to interference is independent of its active task-based retrieval. PLoS One, 2019. 14(1): p. e0210876.

      (13) Pan, S.C. and T.C. Rickard, Sleep and motor learning: Is there room for consolidation? Psychol Bull, 2015. 141(4): p. 812-34.

      (14) Bönstrup, M., et al., A Rapid Form of Offline Consolidation in Skill Learning. Curr Biol, 2019. 29(8): p. 1346-1351 e4.

      (15) Gupta, M.W. and T.C. Rickard, Comparison of online, offline, and hybrid hypotheses of motor sequence learning using a quantitative model that incorporate reactive inhibition. Sci Rep, 2024. 14(1): p. 4661.

  2. Jun 2025
    1. eLife Assessment

      The microbiome field is constantly providing insight on various health-related properties elicited by the commensals that inhabit their mammalian hosts. Harnessing the potential of these commensals for knowledge about host-microbe interactions, as well as properties with therapeutic implications, will likely remain a fruitful field for decades to come. In this valuable study, Wang et al use various methods, encompassing classic microbiology, genomics, chemical biology, and immunology, to identify a potent probiotic strain that protects nematode and murine hosts from Salmonella enterica infection. The authors provide compelling evidence identifying gut metabolites that are correlated with protection, and show that a single metabolite can recapitulate the effects of probiotic administration.

    2. Reviewer #1 (Public review):

      Summary:

      Diarrheal diseases represent an important public health issue. Among the many pathogens that contribute to this problem, Salmonella enterica serovar Typhimurium is an important one. Due to the rise in antimicrobial resistance and the problems associated with widespread antibiotic use, the discovery and development of new strategies to combat bacterial infections is urgently needed. The microbiome field is constantly providing us with various health-related properties elicited by the commensals that inhabit their mammalian hosts. Harnessing the potential of these commensals for knowledge about host-microbe interactions as well as useful properties with therapeutic implications will likely remain a fruitful field for decades to come. In this manuscript, Wang et al use various methods, encompassing classic microbiology, genomics, chemical biology, and immunology, to identify a potent probiotic strain that protects nematode and murine hosts from S. enterica infection. Additionally, the authors identify gut metabolites that are correlated with protection, and show that a single metabolite can recapitulate the effects of probiotic administration.

      Strengths:

      The utilization of varied methods by the authors, together with the impressive amount of data generated, to support the claims and conclusions made in the manuscript is a major strength of the work. Also, the ability to move beyond simple identification of the active probiotic, also identifying compounds that are at least partially responsible for the protective effects, is commendable.

      Weaknesses:

      No major weaknesses noted.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the investigators isolated one Lacticaseibacillus rhamnosus strain (P118), and determined this strain worked well against Salmonella Typhimurium infection. Then, further studies were performed to identify the mechanism of bacterial resistance, and a list of confirmatory assays were carried out to test the hypothesis.

      Strengths:

      The authors provided details regarding all assays performed in this work, and this reviewer trusted that the conclusion in this manuscript is solid. I appreciate the efforts of the authors to perform different types of in vivo and in vitro studies to confirm the hypothesis.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Diarrheal diseases represent an important public health issue. Among the many pathogens that contribute to this problem, Salmonella enterica serovar Typhimurium is an important one. Due to the rise in antimicrobial resistance and the problems associated with widespread antibiotic use, the discovery and development of new strategies to combat bacterial infections is urgently needed. The microbiome field is constantly providing us with various health-related properties elicited by the commensals that inhabit their mammalian hosts. Harnessing the potential of these commensals for knowledge about host-microbe interactions as well as useful properties with therapeutic implications will likely remain a fruitful field for decades to come. In this manuscript, Wang et al use various methods, encompassing classic microbiology, genomics, chemical biology, and immunology, to identify a potent probiotic strain that protects nematode and murine hosts from S. enterica infection. Additionally, the authors identify gut metabolites that are correlated with protection, and show that a single metabolite can recapitulate the effects of probiotic administration.

      We gratefully appreciate your positive and professional comments.

      Strengths:

      The utilization of varied methods by the authors, together with the impressive amount of data generated, to support the claims and conclusions made in the manuscript is a major strength of the work. Also, the ability to move beyond simple identification of the active probiotic, also identifying compounds that are at least partially responsible for the protective effects, is commendable.

      We gratefully appreciate your positive and professional comments.

      Weaknesses:

      No major weaknesses noted.

      We gratefully appreciate your positive comments.

      Reviewer #2 (Public review):

      Summary:

      In this work, the investigators isolated one Lacticaseibacillus rhamnosus strain (P118), and determined this strain worked well against Salmonella Typhimurium infection. Then, further studies were performed to identify the mechanism of bacterial resistance, and a list of confirmatory assays were carried out to test the hypothesis.

      We gratefully appreciate your positive and professional comments.

      Strengths:

      The authors provided details regarding all assays performed in this work, and this reviewer trusted that the conclusion in this manuscript is solid. I appreciate the efforts of the authors to perform different types of in vivo and in vitro studies to confirm the hypothesis.

      We gratefully appreciate your positive and professional comments.

      Weaknesses:

      I have mainly two questions for this work.

      Main point-1:

      The authors provided the information below about the sources from which Lacticaseibacillus rhamnosus was isolated. More details are needed. What were the criteria for choosing these samples? Where did these samples originate? How many strains of bacteria were obtained from which types of samples?

      Lines 486-488: Lactic acid bacteria (LAB) and Enterococcus strains were isolated from the fermented yoghurts collected from families in multiple cities of China and the intestinal contents from healthy piglets without pathogen infection and diarrhoea by our lab.

      We apologize for the ambiguous and limited information provided previously. More details have been added to the Materials and Methods section of the revised manuscript (see Line 482-493; the manuscript with marked changes is available under “Related Manuscript File” in the submission system). We gratefully appreciate your professional comments.

      Line 482-493: “Lactic acid bacteria (LAB) and Enterococcus strains were isolated from 39 samples: 33 fermented yoghurts samples (collected from families in multiple cities of China, including Lanzhou, Urumqi, Guangzhou, Shenzhen, Shanghai, Hohhot, Nanjing, Yangling, Dali, Zhengzhou, Shangqiu, Harbin, Kunming, Puer), and 6 healthy piglet rectal content samples without pathogen infection and diarrhea in pig farm of Zhejiang province (Table 1). Ten isolates were randomly selected from each sample. De Man-Rogosa-Sharpe (MRS) with 2.0% CaCO3 (a selective culture medium that favors the luxuriant cultivation of Lactobacilli) and Brain heart infusion (BHI) broths (Huankai Microbial, Guangzhou, China) were used for bacteria isolation and cultivation. Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry (MALDI-TOF MS, Bruker Daltonik GmbH, Bremen, Germany) method was employed to identify bacterial species with a confidence level ≥ 90% (He et al., 2022).”

      Lines 129-133: A total of 290 bacterial strains were isolated and identified from 32 samples of the fermented yoghurt and piglet rectal contents collected across diverse regions within China using MRS and BHI medium, which consists of 63 Streptococcus strains, 158 Lactobacillus/Lacticaseibacillus/Limosilactobacillus strains and 69 Enterococcus strains.

      We apologize for the ambiguous information. We have carefully revised this section and added more details (see Line 129-133). We gratefully appreciate your professional comments.

      Line 129-133: “After identified by MALDI-TOF MS, a total of 290 bacterial isolates were isolated and identified from 33 fermented yoghurts samples and 6 healthy piglet rectal content samples. Those isolates consist of 63 Streptococcus isolates, 158 Lactobacillus/Lacticaseibacillus/Limosilactobacillus isolates, and 69 Enterococcus isolates (Figure 1A, Table 1).”

      Main-point-2:

      As probiotics, Lacticaseibacillus rhamnosus has been widely studied. In fact, there are many commercially available products, and Lacticaseibacillus rhamnosus is the main bacteria in these products. There are also ATCC type strain such as 53103.

      I am sure the authors are also interested to know if P118 is better as a probiotics candidate than other commercially available strains. Also, would the mechanism described for P118 apply to other Lacticaseibacillus rhamnosus strains?

      It would be ideal if the authors could include one or two Lacticaseibacillus rhamnosus which are currently commercially used, or from the ATCC. Then, the authors can compare the efficacy and antibacterial mechanisms of their P118 with other strains. This would open the windows for future work.

      We gratefully appreciate your professional comments and valuable suggestions. We fully agree that including well-known, recognized, or commercial probiotics as positive controls would allow a more comprehensive evaluation of the isolated P118 strain as a probiotic candidate, particularly in comparison with other well-established probiotics, and would help assess whether the mechanisms described for P118 apply to other L. rhamnosus strains or to lactic acid bacteria in general. These issues will be fully taken into consideration and addressed in future work. In the meantime, we have left the door open for future research in the Conclusion section (see Line 477-479): “Further investigations are needed to assess whether the mechanisms observed in P118 are strain-specific or broadly applicable to other L. rhamnosus strains, or LAB species in general.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      This reviewer appreciates the efforts from the authors to provide the details related to this work. In the meantime, the manuscript shall be written in a way which is easy for the readers to follow.

      We have tried our best to revise and improve the whole manuscript to make it easier for readers to follow (e.g., see Line 27-30, Line 115-120, Line 129-133, Line 140-143, Line 325-328, Line 482-493, Line 501-502, Line 663-667, Line 709-710, Line 1003-1143). We gratefully appreciate your valuable suggestions.

      For example, under the sections of Materials and Methods, there are 19 sub-titles. The authors could consider combining some sections, and/or cite other references for the standard procedures.

      We gratefully appreciate your professional comments and valuable suggestions. Some sections have been combined according to the reviewer’s suggestions (see Line 501-710).

      Another example: the figures have great resolution, but they are way too busy. The figures 1 and 2 have 14-18 panels. Figure 5 has 21 panels. Please consider separating into more figures, or condensing some panels.

      We fully agree that some of the submitted figures are too busy, but it is not easy for us to move results into the supplementary information, because all of them are essential for fully supporting our hypothesis and conclusions. Nonetheless, some panels have been combined or condensed according to the reviewer’s suggestions (see Line 1003-1024, Line 1056-1075). We gratefully appreciate your professional comments and valuable suggestions.

      More minor comments:

      line 30: spell out "C." please.

      Done as requested (see Line 29, Line 31). We gratefully appreciate your valuable suggestions.

    1. eLife Assessment

      This valuable study identifies a novel bacteriophage that can use the exopolysaccharide Psl of Pseudomonas aeruginosa to infect and disrupt biofilms. The work is convincing and suggests a novel approach to control biofilms that is relevant to researchers working on biofilms, specifically in Pseudomonas, on phage physiology and discovery, and on alternatives to controlling bacterial pathogens.

    2. Reviewer #1 (Public review):

      Summary:

      Walton et al. set out to isolate new phages targeting the opportunistic pathogen Pseudomonas aeruginosa. Using a double ∆fliF ∆pilA mutant strain, they were able to isolate 4 new phages, CLEW-1, -3, -6 and -10, that were unable to infect the parental PAO1F Wt strain. Further experiments showed that the 4 phages were only able to infect a ∆fliF strain, indicating a role of the MS-protein in the flagellum complex. Through further mutational analysis of the flagellum apparatus, the authors were able to identify the involvement of c-di-GMP in phage infection. Depletion of c-di-GMP levels by an inducible phosphodiesterase renders the bacteria resistant to phage infection, while elevation of c-di-GMP through the Wsp system made the cells sensitive to infection by CLEW-1. Using TnSeq, the authors were able to not only reaffirm the involvement of c-di-GMP in phage infection but also able to identify the exopolysaccharide PSL as a downstream target for CLEW-1. C-di-GMP is a known regulator of PSL biosynthesis. The authors show that CLEW-1 binds directly to PSL on the cell surface and that deletion of the pslC gene resulted in complete phage resistance. The authors also provide evidence that the phage - PSL interaction happens during the biofilm mode of growth and that the addition of the CLEW-1 phage specifically resulted in a significant loss of biofilm biomass. Lastly, the authors set out to test if CLEW-1 could be used to resolve a biofilm infection using a mouse keratitis model. Unfortunately, while the authors noted a reduction in bacterial load assessed by GFP fluorescence, the keratitis did not resolve under the tested parameters.

      Strengths:

      The experiments carried out in this manuscript are thoughtful and rational, and sufficient explanation is provided for why the authors chose each specific set of experiments. The data presented strongly supports their conclusions and they present compelling explanations for any deviation. The authors have not only developed a new technique for screening for phages targeting P. aeruginosa, but also highlight the importance of looking for phages during the biofilm mode of growth, as opposed to the more standard techniques involving planktonic cultures.

      Weaknesses:

      The authors did not include host-range testing or resistance development in this study, which would have strengthened the paper. Additionally, further characterisation of the CLEW-1 interaction with PSL at the molecular level would also have been welcomed. However, this will likely be the subject of future studies.

    3. Reviewer #2 (Public review):

      This manuscript by Walton et al. suggests that they have identified a new bacteriophage that uses the exopolysaccharide Psl from Pseudomonas aeruginosa (PA) as a receptor. As Psl is an important component in biofilms, the authors suggest that this phage (and others similarly isolated) may be able to specifically target biofilm-growing bacteria.

      Comments on revised version:

      The authors have generally responded well to the reviewers' comments. This has served to improve this manuscript that has identified a new bacteriophage that uses the exopolysaccharide Psl from Pseudomonas aeruginosa as a receptor.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Walton et al. set out to isolate new phages targeting the opportunistic pathogen Pseudomonas aeruginosa. Using a double ∆fliF ∆pilA mutant strain, they were able to isolate 4 new phages, CLEW-1, -3, -6, and -10, which were unable to infect the parental PAO1F Wt strain. Further experiments showed that the 4 phages were only able to infect a ∆fliF strain, indicating a role of the MS-protein in the flagellum complex. Through further mutational analysis of the flagellum apparatus, the authors were able to identify the involvement of c-di-GMP in phage infection. Depletion of c-di-GMP levels by an inducible phosphodiesterase renders the bacteria resistant to phage infection, while elevation of c-di-GMP through the Wsp system made the cells sensitive to infection by CLEW-1. Using TnSeq, the authors were able to not only reaffirm the involvement of c-di-GMP in phage infection but also able to identify the exopolysaccharide PSL as a downstream target for CLEW-1. C-di-GMP is a known regulator of PSL biosynthesis. The authors show that CLEW-1 binds directly to PSL on the cell surface and that deletion of the pslC gene resulted in complete phage resistance. The authors also provide evidence that the phage-PSL interaction happens during the biofilm mode of growth and that the addition of the CLEW-1 phage specifically resulted in a significant loss of biofilm biomass. Lastly, the authors set out to test if CLEW-1 could be used to resolve a biofilm infection using a mouse keratitis model. Unfortunately, while the authors noted a reduction in bacterial load assessed by GFP fluorescence, the keratitis did not resolve under the tested parameters.

      Strengths: 

      The experiments carried out in this manuscript are thoughtful and rational and sufficient explanation is provided for why the authors chose each specific set of experiments. The data presented strongly supports their conclusions and they present compelling explanations for any deviation. The authors have not only developed a new technique for screening for phages targeting P. aeruginosa, but also highlight the importance of looking for phages during the biofilm mode of growth, as opposed to the more standard techniques involving planktonic cultures.

      Weaknesses: 

      While the paper is strong, I do feel that further discussions could have gone into the decision to focus on CLEW-1 for the majority of the paper. The paper also doesn't provide any detailed information on the genetic composition of the phages. It is unclear if the phages isolated are temperate or virulent. Many temperate phages enter the lytic cycle in response to QS signalling, and while the data as it is doesn't suggest that is the case, perhaps the paper would be strengthened by further elimination of this possibility. At the very least it might be worth mentioning in the discussion section. 

      Thank you for your review. The genomes of all Clew phages and Ocp-2 have been uploaded [Genbank accession# PQ790658.1, PQ790659.1, PQ790660.1, PQ790661.1, and PQ790662.1]. It turns out that the Clew phage are highly related, which is highlighted by the genomic comparison in the supplementary figure S1. It therefore made sense to focus our in-depth analysis on one of the phage. We have included a supplementary figure (S1A), demonstrating that the other Clew phage also require an intact psl locus for infection, to make that logic clearer. The phage are virulent (there is apparently a bit of a debate about this with regard to Bruynogheviruses, but we have not been able to isolate lysogens). This is now mentioned in the discussion.  

      Reviewer #2 (Public review): 

      This manuscript by Walton et al. suggests that they have identified a new bacteriophage that uses the exopolysaccharide Psl from Pseudomonas aeruginosa (PA) as a receptor. As Psl is an important component in biofilms, the authors suggest that this phage (and others similarly isolated) may be able to specifically target biofilm-growing bacteria. While an interesting suggestion, the manner in which this paper is written makes it difficult to draw this conclusion. Also, some of the results do not directly follow from the data as presented and some relevant controls seem to be missing. 

      Thank you for your review. We would argue that the combination of demonstrating Psl-dependent binding of Clew-1 to P. aeruginosa, as well as demonstration of direct binding of Clew-1 to affinity-purified Psl, indicates that the phage binds directly to Psl and uses it as a receptor. In looking at the recommendations, it appears that the remark about controls refers to not using the ∆pslC mutant alone (as opposed to the ∆fliF2 ∆pslC double mutant) as a control for some of the binding experiments. However, since the ∆fliF2 mutant is more permissive for phage infection, analyzing the effect of deleting pslC in the context of the ∆fliF2 mutant background is the more stringent test. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      First off, I would like to congratulate the authors on this study and manuscript. It is very well executed and the writing and flow of the paper are excellent. The findings are intriguing and I believe the paper will be very well received by both the phage, Pseudomonas, and biofilm communities. 

      Thank you for your kind review of our work!

      I have very little to critique about the paper but I have listed a few suggestions that I believe could strengthen the paper if corrected: 

      Comments and suggestions: 

      (1) The paper initially describes 4 isolated phages but no rationale is given for why they chose to continue with CLEW-1, as opposed to CLEW-3, -6, and -10. The paper would benefit from going into more detail with phage genomics and perhaps characterize the phage receptor binding to PSL. 

      Clew-1, -3, -6, and -10 are actually quite similar to one another. The genomes are now uploaded to Genbank [accession# PQ790658.1, PQ790659.1, PQ790660.1, and PQ790661.1]. They all require an intact Psl locus for infection; we have updated Fig. S1 to show this for the remaining Clew phage. In the end, it made sense to focus on one of these related phage and characterize it in depth.

      (2) PA14 was used in some experiments but not listed in the strain table. 

      Thank you, this has been added in the resubmission.

      (3) Would have been good to see more strains/isolates used.

      We are currently characterizing the host range of Clew-1. It appears to be pretty limited, but this will likely be included in another paper that will focus on host range, not only of Clew-1, but other biofilm-tropic phage that we have isolated since then.

      (4) Could purified PSL be added to make non-PSL strain (like PA14) susceptible? 

      We have tried adding purified Psl to a psl mutant strain, but this does not result in phage sensitivity. Further characterization of the Psl receptor is something we are currently working on, but it will likely be a much bigger story than can be easily accommodated in a revised manuscript.

      (5) No data on resistance development. 

      We have not done this as yet.

      (6) Alternative biofilm models. Both in vitro and in vivo. 

      We agree that exploring the interaction of Clew-1 with biofilms in greater detail is a logical next step. The revised manuscript does have data on the viability of P. aeruginosa biofilm bacteria after Clew-1 infection using either a bead biofilm model or LIVE/DEAD staining of static biofilms. However, expanding on this further (setting up flow-cell biofilms, developing reporters to monitor phage infection, etc.) is beyond the scope of this initial report and characterization of Clew-1.

      (7) There is a mistake in at least one reference. An unknown author is listed in reference 48. DA Garsin is not part of the paper. Might be worth looking into further mistakes in the reference list as I suspect this might be an issue related to the citation software.

      Thank you. Yes, odd how that extra author got snuck in. This has been corrected.

      (8) I don't seem to be able to locate a Genbank file or accession number. If it wasn't performed how was evolutionary relatedness data generated?

      The genomes of all Clew phages and Ocp-2 have been uploaded [Genbank accession# PQ790658.1, PQ790659.1, PQ790660.1, PQ790661.1, and PQ790662.1]

      (9) No genomic information about the isolated phages. Are they temperate or virulent? This would be important information as only strictly lytic phages are currently deemed appropriate for phage therapy. 

      These phage are virulent. We have only been able to isolate resistant bacteria from plaques, but they do not harbor the phage (as detected by PCR). This matches what other researchers have found for Bruynogheviruses.

      Reviewer #2 (Recommendations for the authors): 

      Others have used different PA mutants lacking known phage receptors to pan for new phages. However, it is not totally clear how the screen here was selected for the Psl-specific phage. The authors used flagella and pili mutants and found Clew-1, -3, -6, and -10. These were all Bruynogheviruses. They also isolated a phage that uses the O antigen as a receptor. The family of this latter phage and how it is known to use this as a receptor is not described. 

      Phage Ocp-2 is a Pbunavirus. We added new supplementary figure S3, addressing the O-antigen receptor.

      The authors focused on Clew-1, but the receptor for these other Clew phages is not presented. For Clew-1 the phage could plaque on the fliF deletion mutant but not the wild-type strain. The reason for this never appears to be addressed. The authors leap to consider the involvement of c-di-GMP, but how this relates to fliF appears to be lacking. 

      We have included a supplementary figure demonstrating that all the Clew phage require Psl for infection (Fig. S1A). As noted above, we have uploaded the genomic data that underpins the comparison in our supplementary figure. The phage are all closely related. It therefore made sense to focus on one of the phage for the analysis.  

      It is particularly unclear why this phage doesn't plaque on PAO1 as this strain does make Psl. Related to this, it actually looks like something is happening to PAO1 in Figure S4 (although what units are on the x-axis is not entirely clear).

      We hypothesize that the fraction of susceptible cells in the population dictates whether the phage can make overt plaques. The supplementary figure S4 indicates that a subpopulation of the wild-type culture is susceptible and this is borne out by the fraction of wild type cells that the phage can bind to (~50%). The fliF mutation increases this frequency of susceptible cells to 80-90% (Fig. 3).

      The Tnseq screen to identify receptors is clever and identifies additional phosphodiesterase genes, the deletion of which makes PAO1 susceptible. And the screen to find resistant fliF mutants identified genes involved in Psl. However, the link between the phosphodiesterase mutants and the amount of Psl produced never appears to be established. And the statement that Psl is required for infection (line 130) is never actually tested.

      The link between c-di-GMP and Psl production is well-established in the literature. I think the requirement for Psl in infection is demonstrated in multiple ways, including the lack of plaque formation on psl mutant strains, the lack of phage binding to strains that do not produce Psl, and the direct binding of the phage to affinity-purified Psl.

      Figure 2C describes using a ∆fliF2 strain but how this is different (or if it is different) from ∆fliF described in the text is never explained.

      The difference in the deletions is explained in table S1, in the description for the deletion constructs used in their construction, pEXG2-∆fliF and pEXG2-∆fliF2 (∆fliF2 is smaller than ∆fliF and can be complemented completely with our complementing plasmid, pP37-fliF, which is the reason why we used the ∆fliF2 mutation going forward, rather than the ∆fliF mutation on which the phage was originally isolated).

      Similarly, there is a sentence (line 138) that "Attachment of Clew-1 is Psl-dependent" but this would appear to have no context.

      The relevant figure, Fig. 3, is cited in the next sentence and is the subject of the remaining paragraphs in this section of the manuscript.

      For Figure 3B, why wasn't the single ∆pslC mutant visualized in this analysis? Similar questions relate to the data in Figure 4.

      Analyzing the effect of the pslC deletion in the context of the ∆fliF2 mutant background, which is more permissive for phage infection, is the more stringent test.  

      The efficacy of Clew-1 in the mouse keratitis model is intriguing but it is unclear why the CFU/eye are so variable. The description of how the experiment was actually carried out is not clear. Was only one eye scratched or both? Were controls included with a scratch and no bacteria (± phage)?

      One eye was infected. We did not conduct a no-bacteria control (just scratching the cornea is not sufficient to cause disease). The revised manuscript has an updated animal experiment in which we carried the infection forward to 72h with two phage treatments. Following this regimen, there is a significant decrease in CFU, as well as corneal opacity (disease). Variability of the data is a fairly common feature in animal experiments. A number of factors could explain this variability, such as whether the mouse blinks and removes some of the inoculum shortly after the bacteria are deposited, or some of the phage after each treatment.

    1. eLife Assessment

      This useful study analyzed 335 Mycobacterium tuberculosis Complex genomes and found that MTBC has a closed pangenome with few accessory genes. The research provides solid evidence for gene presence-absence patterns which support the appending conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, Behruznia and colleagues use long-read sequencing data for 335 strains of the Mycobacterium tuberculosis complex to study genome evolution in this clonal bacterial pathogen. They use both a "classical" pangenome approach that looks at the presence and absence of genes, and a pangenome graph based on whole genomes in order to investigate structural variants in non-coding regions. The comparison of the two approaches is informative and shows that much is missed when focussing only on genes. The two main biological results of the study are that 1) the MTBC has a small pangenome with few accessory genes, and that 2) pangenome evolution is driven by genome reduction. In the revised article, the description of the data set and the methods is much improved, and the comparison of the two pangenome approaches is more consistent. I still think, however, that the discussion of genome reduction suffers from a basic flaw, namely the failure to distinguish clearly between orthologs and homologs/paralogs.

      Strengths:

      The authors put together the so-far largest data set of long-read assemblies representing most lineages of the Mycobacterium tuberculosis complex, and covering a large geographic area. They sequenced and assembled genomes for strains of M. pinnipedii, L9, and La2, for which no high-quality assemblies were available previously. State-of-the-art methods are used to analyze gene presence-absence polymorphisms (Panaroo) and to construct a pangenome graph (PanGraph). Additional analysis steps are performed to address known problems with misannotated or misassembled genes.

      Weaknesses:

      The revised manuscript has gained much clarity and consistency. One previous criticism, however, has in my opinion not been properly addressed. I think the problem boils down to not clearly distinguishing between orthologs and paralogs/homologs. As this problem affects a main conclusion - the prevalence of deletions over insertions in the MTBC - it should be addressed, if not through additional analyses, then at least in the discussion.

      Insertions and deletions are now distinguished in the following way: "Accessory regions were further classified as a deletion if present in over 50% of the 192 sub-lineages or an insertion/duplication if present in less than 50% of sub-lineages." The outcome of this classification is suspicious: not a single accessory region was classified as an insertion/duplication. As a sanity check, I'd expect at least some insertions of IS6110 to show up, which has produced lineage- or sublineage-specific insertions (Roychowdhury et al. 2015, Shitikov et al. 2019). Why, for example, wouldn't IS6110 insertions in the single L8 strain show up here?

      In a fully clonal organism, any insertion/duplication will be an insertion/duplication of an existing sequence, and thus produce a paralog. If I'm correctly understanding your methods section, paralogs are systematically excluded in the pangraph analysis. Genomic blocks are summarized at the sublineage levels as follows (l.184 ): "The DNA sequences from genomic blocks present in at least one sub-lineage but completely absent in others were extracted to look for long-term evolution patterns in the pangenome." I presume this is done using blastn, as in other steps of the analysis.

      So a sublineage-specific copy of IS6110 would be excluded here, because IS6110 is present somewhere in the genome in all sublineages. However, the appropriate category of comparison, at least for the discussion of genome reduction, is orthology rather than homology: is the same, orthologous copy of IS6110, at the same position in the genome, present or absent in other sublineages? The same considerations apply to potential sublineage-specific duplicates of PE, PPE, and Esx genes. These gene families play important roles in host-pathogen interactions, so I'd argue that the neglect of paralogs is not a finicky detail, but could be of broader biological relevance.

    3. Reviewer #2 (Public review):

      Summary:

      The authors attempted to investigate the pangenome of MTBC by using a selection of state-of-the-art bioinformatic tools to analyse 324 complete and 11 new genomes representing all known lineages and sublineages. The aim of their work was to describe the total diversity of the MTBC and to investigate the driving evolutionary force. By using long read and hybrid approaches for genome assembly, an important attempt was made to understand why the reported MTBC pangenome size has varied between previous studies. This study provides strong evidence that the MTBC pangenome is closed and that genome reduction is the main driver of this species' evolution.

      Strengths:

      A stand-out feature of this work is the inclusion of non-coding regions as opposed to only coding regions which was a focus of previous papers and analyses which investigated the MTBC pangenome. A unique feature of this work is that it highlights sublineage-specific regions of difference (RDs) that were previously unknown. Another major strength is the utilisation of long-read whole-genome sequences, in combination with short-read sequences when available. It is known that using only short reads for genome assembly has several pitfalls. The parallel approach of utilizing both Panaroo and Pangraph for pangenomic reconstruction illuminated limitations of both tools while highlighting genomic features identified by both. This is important for any future work and perhaps alludes to the need for more MTBC-specific tools to be developed. Lastly, there is ample statistical support, in the form of Heaps' law and genome fluidity calculations for each pangenome, to demonstrate that they are indeed closed.

      Weaknesses:

      There are no major weaknesses in the revised version of this manuscript.

    4. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      The revised manuscript has gained much clarity and consistency. One previous criticism, however, has in my opinion not been properly addressed. I think the problem boils down to not clearly distinguishing between orthologs and paralogs/homologs. As this problem affects a main conclusion - the prevalence of deletions over insertions in the MTBC - it should be addressed, if not through additional analyses, then at least in the discussion.

      Insertions and deletions are now distinguished in the following way: "Accessory regions were further classified as a deletion if present in over 50% of the 192 sub-lineages or an insertion/duplication if present in less than 50% of sub-lineages." The outcome of this classification is suspicious: not a single accessory region was classified as an insertion/duplication. As a sanity check, I'd expect at least some insertions of IS6110 to show up, which has produced lineage- or sublineage-specific insertions (Roychowdhury et al. 2015, Shitikov et al. 2019). Why, for example, wouldn't IS6110 insertions in the single L8 strain show up here?

      In a fully clonal organism, any insertion/duplication will be an insertion/duplication of an existing sequence, and thus produce a paralog. If I'm correctly understanding your methods section, paralogs are systematically excluded in the pangraph analysis. Genomic blocks are summarized at the sublineage levels as follows (l.184 ): "The DNA sequences from genomic blocks present in at least one sub-lineage but completely absent in others were extracted to look for long-term evolution patterns in the pangenome." I presume this is done using blastn, as in other steps of the analysis.

      So a sublineage-specific copy of IS6110 would be excluded here, because IS6110 is present somewhere in the genome in all sublineages. However, the appropriate category of comparison, at least for the discussion of genome reduction, is orthology rather than homology: is the same, orthologous copy of IS6110, at the same position in the genome, present or absent in other sublineages? The same considerations apply to potential sublineage-specific duplicates of PE, PPE, and Esx genes. These gene families play important roles in host-pathogen interactions, so I'd argue that the neglect of paralogs is not a finicky detail, but could be of broader biological relevance.

      Within the analysis we undertook we did look at paralogous blocks in pangraph, based on copy number per genome. However, this could have been clearer in the text and we will rectify this. We also focussed on duplicated/deleted blocks that were present in two or more sub-lineages. This is noted in the Figure 4 legend but we will make this clearer in other sections of the manuscript.

      We agree that indeed the way paralogs are handled could still be optimised, and that gene duplicates of some genes could have biological importance. The reviewer is suggesting that a synteny analysis between genomes would be best for finding specific regions that are duplicated/deleted within a genome, and if those sections are duplicated/deleted in the same regions of the genome. Since Pangraph does not give such information readily, a larger amount of analysis would be required to confirm such genome position-specific duplications. While this is indeed important, we deem this to be out of scope for the current publication, but will note this as a limitation in the discussion. However, this does not fundamentally change the main conclusions of our analysis.
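
      The difference between sequence-level presence and position-specific presence can be illustrated with a minimal sketch. It assumes that each genome is available as a circular, ordered list of block identifiers and it ignores strand; the function names and the input layout are hypothetical and are not part of the Pangraph interface.

```python
def copies_with_context(path, block):
    """Return the (left, block, right) context of every copy of `block`.

    `path` is the ordered list of block identifiers of one circular genome,
    so the first and last entries are treated as neighbours.
    """
    n = len(path)
    return {
        (path[i - 1], block, path[(i + 1) % n])
        for i, current in enumerate(path)
        if current == block
    }


def context_presence(paths, block):
    """Map each genomic context of `block` to the genomes that carry it.

    `paths` maps a genome name to its ordered list of block identifiers.
    """
    contexts = {g: copies_with_context(p, block) for g, p in paths.items()}
    all_contexts = set().union(*contexts.values())
    return {
        ctx: {g for g, found in contexts.items() if ctx in found}
        for ctx in all_contexts
    }
```

      For two strains with block orders a, IS, b, c and a, IS, b, IS, c, a sequence-level test reports the IS block as present in both, whereas the context-level test shows that the copy between b and c is found only in the second strain.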


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Behruznia and colleagues use long-read sequencing data for 335 strains of the Mycobacterium tuberculosis complex to study genome evolution in this clonal bacterial pathogen. They use both a "classical" pangenome approach that looks at the presence and absence of genes, and a more general pangenome graph approach to investigate structural variants also in non-coding regions. The two main results of the study are that (1) the MTBC has a small pangenome with few accessory genes, and that (2) pangenome evolution is driven by deletions in sublineage-specific regions of difference. Combining the gene-based approach with a pangenome graph is innovative, and the former analysis is largely sound apart from a lack of information about the data set used. The graph part, however, requires more work and currently fails to support the second main result. Problems include the omission of important information and the confusing analysis of structural variants in terms of "regions of difference", which unnecessarily introduces reference bias. Overall, I very much like the direction taken in this article, but think that it needs more work: on the one hand by simply telling the reader what exactly was done, on the other by taking advantage of the information contained in the pangenome graph.

      Strengths:

      The authors put together a large data set of long-read assemblies representing most lineages of the Mycobacterium tuberculosis complex, covering a large geographic area. State-of-the-art methods are used to analyze gene presence-absence polymorphisms (Panaroo) and to construct a pangenome graph (PanGraph). Additional analysis steps are performed to address known problems with misannotated or misassembled genes in pangenome analysis.

      Weaknesses:

      The study does not quite live up to the expectations raised in the introduction. Firstly, while the importance of using a curated data set is emphasized, little information is given about the data set apart from the geographic origin of the samples (Figure 1). A BUSCO analysis is conducted to filter for assembly quality, but no results are reported. It is also not clear whether the authors assembled genomes themselves in the cases where, according to Supplementary Table 1, only the reads were published but not the assemblies. In the end, we simply have to trust that single-contig assemblies based on long-reads are reliable.

      We have now added a robust overview of the dataset to supplementary file 1. This is split into 3 sections: public genomes, which were assembled by others; sequenced genomes, which were created and assembled by us; the BUSCO information for all the genomes together. We did not assemble any public data ourselves but retrieved these from elsewhere. We have modified the text to be more specific on this (Line 114 onwards) and the supplementary file is updated to better outline the data.

      One issue with long read assemblies could be that high rates of sequencing errors result in artificial indels when coverage is low, which in turn could affect gene annotation and pangenome inference (e.g. Watson & Warr 2019, https://doi.org/10.1038/s41587-018-0004-z). Some of the older long-read data used by the authors could well be problematic (PacBio RSII), but also their own Nanopore assemblies, six of which have a mean coverage below 50 (Wick et al. 2023 recommend 200x for ONT, https://doi.org/10.1371/journal.pcbi.1010905). Could the results be affected by such assembly errors? Are there lineages, for example, for which there is an increased proportion of RSII data? Given the large heterogeneity in data quality on the NCBI, I think more information about the reads and the assemblies should be provided.

      We have now included an analysis where we looked to see if the sequencing platform influenced the resulting accessory genome size and the pseudogene count. The details of this are included in lines 207-219, and the results are outlined in lines 251-258. Essentially, we found no correlation between sequencing platform and genome characteristics, although less stringent cut-offs did suggest that PacBio SMRT-only assembled genomes may have larger accessory genomes. We do not believe this is enough to influence our larger inferences from this data. It should be noted that complete genomes, in general, give a better indication of pangenome size compared to draft genomes, as has been shown previously (e.g. Marin et al., 2024). Even with some small potential bias, this makes our analysis more robust than any previously published.

      In relation to the sequencing depth of our own data, all genomes had coverage above 30x, which Sanderson et al. (2024) have shown to be sufficient for highly accurate sequence recovery. We fixed an issue with the L9 isolate from the previous submission, which resulted in a better BUSCO score and overall quality of that isolate and the overall dataset.

      The part of the paper I struggled most with is the pangenome graph analysis and the interpretation of structural variants in terms of "regions of difference". To start with, the method section states that "multiple whole genomes were aligned into a graph using PanGraph" (l.159/160), without stating which genomes were for what reason. From Figure 5 I understand that you included all genomes, and that Figure 6 summarizes the information at the sublineage level. This should be stated clearly, at present the reader has to figure out what was done. It was also not clear to me why the authors focus on the sublineage level: a minority of accessory genes (107 of 506) are "specific to certain lineages or sublineages" (l. 240), so why conclude that the pangenome is "driven by sublineage-specific regions of difference", as the title states? What does "driven by" mean? Instead of cutting the phylogeny arbitrarily at the sublineage level, polymorphisms could be described more generally by their frequencies.

      We apologise for the ambiguity in the methodology. All the isolates were inputted to Pangraph to create the pangenome using this method. This is now made clearer in lines 175-177. Standard pangenome statistics (size, genome fluidity, etc.) derived from this Pangraph output are now present in the results section as well (lines 301-320).

      We then only looked at regions of difference at the sub-lineage level, meaning we grouped genomes by sub-lineage within the resulting graph and looked for blocks common between isolates of the same sub-lineage but absent from one or more other sub-lineages. We did this from both the Panaroo output and the Pangraph output and then retained only blocks found by both. The results of this are now outlined in lines 351-383.

      We focussed on these sub-lineage-specific regions to focus on long-term evolution patterns and not be influenced by single-genome short-term changes. We do not have enough genomes of closely related isolates to truly look at very recent evolution, although the small accessory genome indicates this is not substantial in terms of gene presence/absence. We also did not want potential mis-annotations in a single genome to heavily influence our findings due to the potential issues pointed out by the reviewer above. We state this more clearly in the introduction (lines 106-108), methods (lines 184-186) and results (lines 345-347), and we indicate the limitations in the Discussion, lines 452-457 and 471-473. We also changed the title to ‘shaped’ instead of ‘driven by’.
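
      The sub-lineage grouping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the manuscript: it assumes that the Panaroo and Pangraph results have already been converted to a common set of identifiers, and the function names and input layout are hypothetical.

```python
from collections import defaultdict


def sublineage_specific_blocks(presence, sublineage_of):
    """Blocks shared by every genome of one sub-lineage and absent from
    every genome of at least one other sub-lineage.

    presence:      dict genome -> set of block (or gene) identifiers
    sublineage_of: dict genome -> sub-lineage label
    """
    members = defaultdict(list)
    for genome, blocks in presence.items():
        members[sublineage_of[genome]].append(set(blocks))

    # Blocks carried by all genomes of a sub-lineage.
    shared = {sl: set.intersection(*sets) for sl, sets in members.items()}
    # Blocks carried by at least one genome of a sub-lineage.
    seen = {sl: set().union(*sets) for sl, sets in members.items()}

    specific = set()
    for sl, blocks in shared.items():
        for block in blocks:
            if any(block not in seen[other] for other in seen if other != sl):
                specific.add(block)
    return specific


def supported_by_both(panaroo_presence, pangraph_presence, sublineage_of):
    """Keep only the blocks reported by both methods."""
    return (
        sublineage_specific_blocks(panaroo_presence, sublineage_of)
        & sublineage_specific_blocks(pangraph_presence, sublineage_of)
    )
```

      Requiring a block to be present in every genome of a sub-lineage means that a block found in only one genome of a multi-genome sub-lineage is ignored, which is the buffering against single-genome artefacts described in the response.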

      I fully agree that pangenome graphs are the way to go and that the non-coding part of the genome deserves as much attention as the coding part, as stated in the introduction. Here, however, the analysis of the pangenome graph consists of extracting variants from the graph and blasting them against the reference genome H37Rv in order to identify genes and "regions of difference" (RDs) that are variable. It is not clear what the authors do with structural variants that yield no blast hit against H37Rv. Are they ignored? Are they included as new "regions of difference"? How many of them are there? etc. The key advantage of pangenome graphs is that they allow a reference-free, full representation of genetic variation in a sample. Here reference bias is reintroduced in the first analysis step.

      We apologise for the confusion here as indeed the RDs terminology is very MTBC-specific. Current RDs are always relative to H37Rv, as that is how original discovery of these regions was done and that is how RDScan works. We clarify this in the introduction (lines 67-68). If we found a large sequence polymorphism (e.g. by Pangraph) and searched for known RDs using RDScan, we then assigned a current RD name to this LSP. This uses H37Rv as a reference. If we did not find a known RD, we then classified the LSP as a new RD if it is present in H37Rv, or left the designation as an LSP if not in H37Rv, thus expanding the analysis beyond the H37Rv-centric approaches used by others previously. This is hopefully now made clearer in the methods, lines 187-194.
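
      The naming logic in this response can be written as a short decision function. The sketch below only restates the three cases; the inputs stand in for RDScan output and a search against H37Rv, and the function name, argument layout, and the format of the generated names are hypothetical.

```python
def label_variant(lsp_id, known_rds, in_h37rv, sublineage):
    """Assign a category and a name to a large sequence polymorphism (LSP).

    lsp_id:     identifier of the LSP
    known_rds:  dict LSP identifier -> name of the matching known RD
                (stand-in for the result of an RDScan search)
    in_h37rv:   set of LSP identifiers whose sequence is found in H37Rv
    sublineage: label used to build a provisional, lineage-related name
    """
    if lsp_id in known_rds:
        # Case 1: the region matches a previously described RD.
        return ("known RD", known_rds[lsp_id])
    if lsp_id in in_h37rv:
        # Case 2: no known RD, but the sequence exists in H37Rv.
        return ("new RD", f"RD_{sublineage}_{lsp_id}")
    # Case 3: the sequence is not in H37Rv, so it stays an LSP.
    return ("LSP", f"LSP_{sublineage}_{lsp_id}")
```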

      Along similar lines, I find the interpretation of structural variants in terms of "regions of difference" confusing, and probably many people outside the TB field will do so. For one thing, it is not clear where these RDs and their names come from. Did the authors use an annotation of RDs in the reference genome H37Rv from previously published work (e.g. Bespiatykh et al. 2021)? This is important basic information, its lack makes it difficult to judge the validity of the results. The Bespiatykh et al. study uses a large short-read data (721 strains) set to characterize diversity in RDs and specifically focuses on the sublineage-specific variants. While the authors cite the paper, it would be relevant to compare the results of the two studies in more detail.

      We have amended the introduction to explain this terminology better (lines 67-68). Naming of the RDs here came from using RDScan to assign current names to any accessory regions we found and if such a region was not a known RD, we gave it a lineage-related name, allowing for proper RD naming later (lines 187-194). Because the Bespiatykh et al. paper is the basis for RDScan, our work implicitly compares to this throughout, as any RDs we find which were not picked up by RDScan are thus novel compared to that paper.

      As far as I understand, "regions of difference" have been used in the tuberculosis field to describe structural variants relative to the reference genome H37Rv. Colloquially, regions present in H37Rv but absent in another strain have been called "deletions". Whether these polymorphisms have indeed originated through deletion or through insertion in H37Rv or its ancestors requires a comparison with additional strains. While the pangenome graph does contain this information, the authors do not attempt to categorize structural variants into insertions and deletions but simply seem to assume that "regions of difference" are deletions. This, as well as the neglect of paralogs in the "classical" pangenome analysis, puts a question mark behind their conclusion that deletion drives pangenome evolution in the MTBC.

      We have now amended the analysis to specifically designate a structural variant as a deletion if present in the majority of strains and absent in a minority, or an insertion/duplication if present in a minority and absent in a majority (lines 191-192). We also ran Panaroo without merging paralogs to examine duplication in this output; Pangraph implicitly includes paralogs already.

      From all these analyses we did not find any structural variants classed as insertions/duplications and did not find paralogs to be a major feature at the sub-lineage level (lines 377-383). While these features could be important on shorter timescales, we do not have enough closed genomes to confidently state this (limitation outlined in lines 452-457). Therefore, our assertion that deletions are a primary force shaping the long-term evolution in this group still holds.
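
      The frequency rule used to separate deletions from insertions/duplications can be sketched as follows. The 192 sub-lineages come from the quoted methods text; the function name is hypothetical, and the handling of a region present in exactly half of the sub-lineages is an assumption, since the rule as quoted does not cover that case.

```python
def classify_region(n_present, n_total=192):
    """Classify an accessory region by the number of sub-lineages carrying it.

    Present in more than 50% of sub-lineages  -> "deletion"
    Present in fewer than 50% of sub-lineages -> "insertion/duplication"
    Present in exactly 50%                    -> "unclassified" (assumption)
    """
    if not 0 < n_present < n_total:
        raise ValueError("an accessory region is present in some but not all sub-lineages")
    # Integer comparison avoids floating-point issues at the 50% boundary.
    if 2 * n_present > n_total:
        return "deletion"
    if 2 * n_present < n_total:
        return "insertion/duplication"
    return "unclassified"
```

      The rule infers the direction of change from frequency alone and does not use the phylogeny.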

      Reviewer #2 (Public Review):

      Summary:

      The authors attempted to investigate the pangenome of MTBC by using a selection of state-of-the-art bioinformatic tools to analyse 324 complete and 11 new genomes representing all known lineages and sublineages. The aim of their work was to describe the total diversity of the MTBC and to investigate the driving evolutionary force. By using long read and hybrid approaches for genome assembly, an important attempt was made to understand why the reported MTBC pangenome size has varied between previous studies.

      Strengths:

      A stand-out feature of this work is the inclusion of non-coding regions as opposed to only coding regions which was a focus of previous papers and analyses which investigated the MTBC pangenome. A unique feature of this work is that it highlights sublineage-specific regions of difference (RDs) that were previously unknown. Another major strength is the utilisation of long-read whole-genome sequences, in combination with short-read sequences when available. It is known that using only short reads for genome assembly has several pitfalls. The parallel approach of utilizing both Panaroo and Pangraph for pangenomic reconstruction illuminated the limitations of both tools while highlighting genomic features identified by both. This is important for any future work and perhaps alludes to the need for more MTBC-specific tools to be developed.

      Weaknesses:

      The only major weakness was the limited number of isolates from certain lineages and the over-representation of others, which was also acknowledged by the authors. However, since the case is made that the MTBC has a closed pangenome, the inclusion of additional genomes would not result in the identification of any new genes. This is a strong statement without an illustration/statistical analysis to support this.

      We have included a Heaps' law and genome fluidity calculation for each pangenome estimation to demonstrate that the pangenome is closed. This is detailed in lines 225-228 with results shown in lines 274-278 and 316-320 and Supplementary Figure 2. We agree that more closely related genomes would benefit a future version of this analysis and we indicate the limitations in the Discussion, lines 452-457 and 471-473.
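
      As an illustration of the second statistic, genome fluidity can be computed from gene presence sets with the usual pairwise definition: for each pair of genomes, the number of gene families found in only one of the two is divided by the sum of their gene family counts, and the ratios are averaged over all pairs. The sketch below is a minimal version under that definition; the function name and input layout are hypothetical, and it is not the implementation used in the manuscript.

```python
from itertools import combinations


def genome_fluidity(gene_sets):
    """Average pairwise ratio of unique gene families to total gene families.

    gene_sets: list of non-empty sets, one set of gene family identifiers
               per genome.
    """
    pairs = list(combinations(gene_sets, 2))
    if not pairs:
        raise ValueError("at least two genomes are required")
    total = 0.0
    for a, b in pairs:
        unique = len(a ^ b)  # families present in exactly one of the two genomes
        total += unique / (len(a) + len(b))
    return total / len(pairs)
```

      A value near zero means that any two genomes share almost all of their gene families, which is what is expected for a small accessory genome.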

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract

      l. 24, "with distinct genomic features". I'm not sure what you are referring to here.

      We refer to the differences in accessory genome and related functional profiles but did not want to bloat the abstract with such additional details.

      Introduction

      l. 40, "L1 to L9". A lineage 10 has been described recently: https://doi.org/10.3201/eid3003.231466.

      We have updated the text and the reference. Unfortunately, no closed genome for this lineage exists so we have not included it in the analyses. We note this in the results, line 232.

      l.62/3, "caused by the absence of horizontal gene transfer, plasmids, and recombination". Recombination is not absent in the MTBC, only horizontal gene transfer seems to be, which is what the cited studies show. Indeed a few sentences later homologous recombination is mentioned as a cause of deletions.

      This has now been removed from the introduction

      l. 67, "within lineage diversity is thought to be mostly driven by SNPs". Again I'm not sure what is meant here with "driven by". Point mutations are probably the most common mutational events, but duplications, insertions, deletions, and gene conversion also occur and can affect large regions and possibly important genes, as shown in a recent preprint (https://doi.org/10.1101/2024.03.08.584093).

      We have changed the text to say ‘mostly composed of’. While indeed other SNVs may be contributing, the prevailing thought at lineage level is that SNPs are the primary source of diversity. The linked pre-print is looking at within transmission clusters and this has not been described at the lineage level, which could be done in a future work.

      l. 100/1. "that can account for variations in virulence, metabolism, and antibiotic resistance". I would phrase this conservatively since the functional inferences in this study are speculative.

      This has now been tempered to be less specific.

      Methods

      l. 108. That an assembly has a single contig does not mean that it is "closed". Many single contig assemblies on NCBI are reference-guided short-read assemblies, that is, fragments patched together rather than closed assemblies. The same could be true for long-read assemblies.

      We specifically chose those listed as closed on NCBI so rely on their checks to ensure this is true. We have stated this better in the paper, line 117.

      l. 111. From Supplementary Table 1 I understand that for many genomes only the reads were available (no ASM number). Did you assemble these genomes? If yes, how? The assembly method is not indicated in the supplement, contrary to what is written here.

      All public genomes were downloaded in their assembled forms from the various sources. This is specified better in the text (line 118) and the supplementary table 1 now lists the accessions for all the assemblies.

      l. 113. How many assemblies passed this threshold? And is BUSCO actually useful to assess assembly quality in the MTBC? I assume the dynamic, repetitive gene families that cause problems for assembly and mapping in TB (PE, PPE, ESX) do not figure in the BUSCO list of single-copy orthologs.

      All assemblies passed the BUSCO thresholds for high-quality genomes as laid out in Supplementary Table 1. While indeed this does not include multi-copy genes such as PE/PPE we focussed on regions of difference at the sub-lineage level where two or more genomes represent that sub-lineage. This means any assembly issues in a single genome would need to be exactly the same in another of the same sub-lineage to be included in our results. Through this, we aimed to buffer out issues in individual assemblies.
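
      The buffering rule described above can be illustrated with a minimal sketch. It assumes a hypothetical layout in which each genome maps to the set of regions it contains and to a sub-lineage label; the function name, the `min_genomes` parameter, and the data structures are illustrative and are not taken from the study's pipeline.

```python
from collections import defaultdict


def sublineage_specific_absences(presence, sublineage_of, min_genomes=2):
    """Report regions absent from every genome of a sub-lineage.

    presence:      dict mapping genome name -> set of regions present (hypothetical layout)
    sublineage_of: dict mapping genome name -> sub-lineage label (hypothetical layout)
    min_genomes:   sub-lineages with fewer genomes are skipped, so that a
                   single (possibly misassembled) genome cannot define a deletion
    """
    # Group genomes by sub-lineage.
    members = defaultdict(list)
    for genome, sub in sublineage_of.items():
        members[sub].append(genome)

    # Every region seen in at least one genome.
    all_regions = set().union(*presence.values())

    result = {}
    for sub, genomes in members.items():
        if len(genomes) < min_genomes:
            continue
        # Start from all regions and discard any region present in some member,
        # leaving only regions missing from all members of the sub-lineage.
        absent_in_all = set(all_regions)
        for genome in genomes:
            absent_in_all -= presence[genome]
        result[sub] = absent_in_all
    return result


if __name__ == "__main__":
    presence = {
        "g1": {"A", "B"},
        "g2": {"A"},
        "g3": {"A", "B", "C"},
    }
    sublineage_of = {"g1": "L1.1", "g2": "L1.1", "g3": "L2.1"}
    print(sublineage_specific_absences(presence, sublineage_of))
```

      In this toy input, region B is missing only from g2, so it is not reported for L1.1, whereas region C is missing from both L1.1 genomes and is reported. The single-genome sub-lineage L2.1 is skipped entirely.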

      l. 147: Why is Panaroo used with -merge-paralogs? I understand that near-identical genes may not be too interesting from a functional perspective, but if the aim of the analysis is to make broad claims about processes driving genome evolution, paralogs should be considered.

      We chose to do so with merged paralogs to look for larger patterns of diversity beyond within-genome paralogs. Additionally, this was required to build the core phylogenetic tree. However, as the reviewer points out, this may bias our findings towards deletions and away from duplications as a primary evolutionary force.

      We repeated this without the merged paralogs option and indeed found a larger pangenome, as outlined in Table 1. However, at the sub-lineage level, this did not result in any new presence/absence patterns (lines 381-383). This means the paralogs tended to be in single genomes only. This still indicates that deletions are the primary force in the longer-term evolution of the complex but indeed on shorter spans this may be different.

      l. 153: remove the comment in brackets.

      This has been fixed and the proper URL placed in instead.

      l. 159: which genomes, and why those?

      This is now clarified to state all genomes were used for this analysis.

      l. 161, "gene blocks": since this analysis is introduced as capturing the non-coding part of the genome, maybe just call them "blocks"?

      All references to gene blocks are now changed to genomic blocks to be more specific.

      l. 162: what happens with blocks that yield no hits against RvD1, TbD1, and H37Rv?

      We named these with lineage-specific names (supplementary table 4) but did not assign RD names specifically.

      l. 164: where does the information about the regions of difference come from? How exactly were these regions determined?

      We have expanded this section to be more specific on the use of RDScan and the new naming, along with how we determine whether something is an RD/LSP.

      Results

      l. 185ff: This paragraph gives many details about the geographic origin of the samples, but what I'd expect here is a short description of assembly qualities, for example, the results of the BUSCO analysis, a description of your own Nanopore assemblies, or a small analysis of the number of indels/pseudogenes relative to sequencing technology or coverage (see comment in the public review).

      This section (lines 231-258) has been expanded considerably to give a better overview of the dataset and any potential biases. Supplementary table 1 has also been expanded to include more information on each strain.

      l. 187, "324 genomes published previously": 322 according to the methods section.

      The number has been fixed throughout to the proper total of public genomes (329).

      l. 201: define the soft core, shell, and cloud genes.

      This is now defined on line 262.

      l. 228, "defined primarily by RD105 and RD207 deletions": this claim seems to come from the analysis of variable importance (Factoextra), which should be made clear here.

      This has been clarified on line 333.

      l. 237, "L8, serving as the ancestor of the MTBC": this is incorrect, equivalent to saying that the Chimpanzee is the ancestor of Homo sapiens.

      We have changed this to basal to align with how it is described in the original paper.

      l. 239, "The accessory genome of the MTBC". It is a bit confusing that the same term, 'accessory genome', is used here for the graph-based analysis, which is presented as a way to look at the non-coding part of the genome.

      We have clarified the terminology on line 347 and improved consistency throughout.

      l. 240/1, "specific to certain lineages and sublineages". What exactly do you mean by "specific" to? Present only in members of a certain lineage/sublineage? In all members of a certain lineage/sublineage? Maybe an additional panel in Figure 5, showing examples of lineage- and sublineage-specific variants, would help the reader grasp this key concept.

      We have clarified this on line 349 and the legend of what is now figure 4.

      l. 241/2, "82 lineage and sublineage-specific genomic regions ranging from 270 bp to 9.8 kb". Were "gene blocks" filtered for a minimum size, or why are there no variants smaller than 270 bp? A short description of all the blocks identified in the graph could be informative (their sizes, frequencies ...).

      Yes, a minimum of 250bp was set for the blocks to only look at larger polymorphisms. This is clarified on line 177 and 304.

      A second point: It is not entirely clear to me what Figure 6 is showing. Are you showing here a single representative strain per sublineage? Or have you somehow summarized the regions of difference shown in Figure 5 at the sublineage level? What is the tree on the left? This should be made clear in the legend and maybe also in the methods/results.

      In figure 4 (which was figure 6), because each RD is common to all members of the same sub-lineage, we have placed a single branch for each sub-lineage. This has been clarified in the legend.

      l. 254, "this gene was classified as being in the core genome": why should a partially deleted gene not be in the core genome?

      You are correct, we have removed that statement.

      l. 258/259, "The Pangraph alignment approach identified partial gene deletion and non-coding regions of the DNA that were impacted by genomic deletion". I do not understand how you classify a structural variant identified in the pangenome graph as a deletion or an insertion.

      This has been clarified as relative to H37Rv, as this is standard practice for RDs and general evolutionary analyses in MTBC, as outlined above.

      l. 262/263 , "the accessory genome of the MTBC is small and is acquired vertically from a common ancestor within the lineage". If deletion is the main process involved here, "acquired" seems a bit strange.

      We agree and have changed the header to better reflect the discussion on mis-annotation issues.

      Figure 1: Good to know, but not directly relevant for the rest of the paper. Maybe move it to the supplement?

      This has been moved to Supplementary figure 1.

      Figure 2: the y-axis is labeled 'Variable genome size', but from the text and the legend I figure it should be 'Number of accessory genes'?

      This has been changed to ‘accessory genes’ in Figure 1 (which was figure 2 in previous version).

      Figure 4: too small.

      We will endeavour to ensure this is as large as possible in the final version.

      Discussion

      l. 271, "MTBC accessory genome is ... acquired vertically". See above.

      Changed, as outlined above.

      l. 292, "appeared to be fragmented genes caused by misassemblies". Is there a way to distinguish "true" pseudogenes from misassemblies? This could be a relevant issue for low-coverage long-read assemblies (see public review).

      Not that we are currently aware of, but we do know other groups which are working on this issue.

      l. 300/1, "the whole-genome approach could capture higher genetic variations". Do you mean the graph approach? I'm not sure that comparing the two approaches here makes sense, as they serve different purposes. A pangenome graph is a summary of all genetic variation, while the purpose of Panaroo is to study gene absence/presence. So by definition, the graph should capture more genetic variation.

      This statement was specifically to state that much genetic variation in MTBC is outside the coding genes and so traditional ‘pangenome’ analyses are actually not looking at the full genomic variation.

      l. 302/3, "this method identified non-coding regions of the genome that were affected by genomic deletions". See the comments above regarding deletions versus insertions. I'd say this method identifies coding and non-coding regions that were affected by genomic deletions and insertions.

      We have undertaken additional analyses to be sure these are likely deletions, as outlined above.

      l. 305: what are "lineage-independent deletions"?

      We labelled these as convergent evolution, now clarified on line 443.

      l. 329: How is RD105 "caused" by the insertion of IS6110? I did not find RD105 mentioned in the Alonso et al. paper. Similarly below, l. 331, how is RD207 "linked" to IS6110?

      The RD105 connection was misattributed as IS6110 insertion is related to RD152, not RD105. This has now been removed.

      RD207 is linked to IS6110 as its deletion is due to recombination between two such elements. This is now clarified on line 486.

      l. 345, "the growth advantage gene group": not quite sure what this is.

      We have fixed this on line 499 to state they are genes which confer growth advantages.

      l. 373ff: The role of genetic drift in the evolution of the MTBC is an open question, other studies have come to different conclusions than Hershberg et al. (this has been recently reviewed: https://doi.org/10.24072/pcjournal.322).

      We have outlined this debate better in lines 527-531

      l. 375/6, "Gene loss, driven by genetic drift, is likely to be a key contributor to the observed genetic diversity within the MTBC." This sentence would need some elaboration to be intelligible. How does genetic drift drive gene loss?

      We have removed this.

      l. 395/6, "... predominantly driven by genome reduction. This observation underlines the importance of genomic deletions in the evolution of the MTBC." See comments above regarding deletions. I'm not convinced that your study really shows this, as it completely ignores paralogs and the processes counteracting reductive genome evolution: duplication and gene amplification.

      As outlined above, we have undertaken additional analyses to more strongly support this statement.

      l. 399, "the accessory genome of MTBC is a product of gene deletions, which can be classified into lineage-specific and independent deletions". Again, I'm not sure what is meant by lineage-independent deletions.

      We have better defined this in the text, line 443, to be related to convergent evolution.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.

      In lines 120-121, it is mentioned that TB-profiler v4.4.2 was used for lineage classification, but this version was released in February 2023. As I understand there have been some changes (inclusion/exclusion) of certain lineage markers. Would it not be appropriate to repeat lineage classification with a more recent version? This would of course require extensive re-analysis, so could the lineage marker database perhaps also be cited.

      We have rerun all the genomes through TB-Profiler v6.5 and updated the text to state this; the exact database used is also now stated.

      Could the authors perhaps include the sequencing summary or quality of the nanopore sequences? The L9 (Mtb8) sample had a relatively lower depth and resulted in two contigs. Yet one contig was the initial inclusion criteria. It is unclear whether these samples were excluded from some of the analyses. Mtb6 also has relatively low coverage. Was the sequencing quality adequate to accurately identify all the lineage markers, in particular those with a lower depth of coverage? Could a hybrid approach be an inexpensive way to polish these assemblies?

      We reanalysed the L9 sample and, with some better cleaning, got it to a single contig with better depth and overall score. This is outlined in the Supplementary table 1 sheets. While depth is average, it is still above the recommended 30x, which is needed for good sequence recovery (Sanderson et al., 2024). We did indeed recover all lineage markers from these assemblies.

      Recommendations for improving the writing and presentation.

      The introduction is well-written and recent MTBC pangenomic studies have been incorporated, but I am curious as to why this paper was not referred to: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6922483/ I believe this was the first attempt to study the pangenome, albeit with a different research question. Nearly all previous analyses largely focused on utilizing the pangenome to investigate transmission.

      Indeed this study did look at a pangenome of sorts, but specifically SNPs and not genes or regions. Since the latter is the main basis for pangenome work these days, we chose not to include this paper.

      Minor corrections to the text and figures.

      In line 129, it is explained that DNA was extracted to be suitable for PacBio sequencing, but ONT sequencing was used for the 11 new sequences. Is this a minor oversight or do the authors feel that DNA extracted for PacBio would be suitable for ONT sequencing? It is a fair assumption.

      We apologise, this is a long-read extraction approach and not specific to PacBio. We have amended the text to state this.

      In line 153, this should be removed: (Conor, could you please add the script to your GitHub page?).

      This has been fixed now.

    1. eLife Assessment

      This study presents potentially valuable insights into the role of climbing fibers in cerebellar learning. The main claim is that climbing fiber activity is necessary for optokinetic reflex adaptation, but is dispensable for its long-term consolidation. There is evidence to support the first part of this claim, though it requires a clearer demonstration of the penetrance and selectivity of the manipulation. However, support for the latter part of the claim is incomplete owing to methodological concerns, including the robustness of the CF marking and manipulation approach and the unclear efficacy of longer-duration climbing fiber activity suppression.

    2. Reviewer #2 (Public review):

      Summary:

      The authors aimed to explore the role of climbing fibers (CFs) in cerebellar learning, with a focus on optokinetic reflex (OKR) adaptation. Their goal was to understand how CF activity influences memory acquisition, memory consolidation, and memory retrieval by optogenetically suppressing CF inputs at various stages of the learning process.

      Strengths:

      The study addresses a significant question in the cerebellar field by focusing on the specific role of CFs in adaptive learning. The authors use optogenetic tools to manipulate CF activity. This provides a direct method to test the causal relationship between CF activity and learning outcomes.

      Weaknesses:

      Despite shedding light on the potential role of CFs in cerebellar learning, the study is hampered by significant methodological issues that question the validity of its conclusions. The absence of detailed evidence on the effectiveness of CF suppression and concerns over tissue damage from optogenetic stimulation weaken the argument that CFs are not essential for memory consolidation. These challenges make it difficult to confirm whether the study's objectives were fully met or if the findings conclusively support the authors' claims. The research commendably attempts to unravel the temporal involvement of CFs in learning but also underscores the difficulties in pinpointing specific neural mechanisms that underlie the phases of learning. Addressing these methodological issues, investigating other signals that might instruct consolidation, and understanding CFs' broader impact on various learning behaviors are crucial steps for future studies.

      Comments on revisions:

      In this revision, the authors provide new data regarding the effect of eNpHR on CF-evoked complex spiking in vivo but fail to address the overall concerns about showing the functional effect that explains their causal results. Additionally, the paper has a narrow "CF-or-nothing" framing that leaves unanswered the central question of which signal instructs consolidation if CFs do not. Substantial new experiments and tighter logic are required before the work can serve as a definitive test of CF involvement in different memory processes.

    3. Reviewer #3 (Public review):

      Summary:

      The authors attempted to study connections from the inferior olive to the cerebellar cortex and analyze impacts on the optokinetic reflex using optogenetics to perturb the pathway. This is a commendable effort as these methods are very challenging due to the location of the inferior olive and recording methods.

      Strengths:

      The authors have shown that climbing fiber activity was altered due to the optogenetic perturbation. They have added an additional figure to show that complex spikes disappear with inhibitory optogenetics and the impacts on behavior are interesting.

      Weaknesses:

      The images provided to show injection region are difficult to see and specific cell types are not co-labeled. The data and strength of the results would benefit from high-resolution images demonstrating selectivity and expression, in particular for Figure 2A and 3A. In addition, while the processed recording data looks very striking, including the raw data, as done in Figure 2, would again support the conclusions.

      One major concern is that the viruses chosen are non-specific to the cell targets and a cre-based approach is lacking to draw conclusions on only the targeted pathway of interest. It is unclear based on the figures provided if the AAVs labeled only the pathway of interest. It would be interesting to know if typical memory acquisition returns in the same animals if inhibition stops and if animal movement was impacted by the perturbation.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Seo et al highlights knowledge gaps regarding the role of cerebellar complex spike (CS) activity during different phases of learning related to optokinetic reflex (OKR) in mice. The novelty of the approach is twofold: first, specifically perturbing the activity of climbing fibers (CFs) in the flocculus (as opposed to disrupting communication between the inferior olive (IO) and its cerebellar targets globally); and second, examining whether disruption of the CS activity during the putative "consolidation phase" following training affects OKR performance.

      The first part of the results provides adequate evidence supporting the notion that optogenetic disruption of normal CF-Purkinje neuron (PN) signaling results in the degradation of OKR performance. As no effects are seen in OKR performance in animals subjected to optogenetic irradiation during the memory consolidation or retrieval phases, the authors conclude that CF function is not essential beyond memory acquisition. However, the manuscript does not provide a sufficiently solid demonstration that their long-term activity manipulation of CF activity is effective, thus undermining the confidence of the conclusions.

      Strengths:

      The main strength of the work is the aim to examine the specific involvement of the CF activity in the flocculus during distinct phases of learning. This is a challenging goal, due to the technical challenges related to the anatomical location of the flocculus as well as the IO. These obstacles are counterbalanced by the use of a well-established and easy-to-analyse behavioral model (OKR), that can lead to fundamental insights regarding the long-term cerebellar learning process.

      Weaknesses:

      The impact of the work is diminished by several methodological shortcomings.

      Most importantly, the key finding on prolonged optogenetic inhibition of CFs (for 30 min to 6 hours after the training period) must be complemented by the demonstration that the manipulation maintains its efficacy. In its current form, the manuscript only shows inhibition by short-term optogenetic irradiation in the context of electrical-stimulation-evoked CSs in an ex vivo preparation. As the inhibitory effect of even the eNpHR3.0 is greatly diminished during seconds-long stimulations (especially when using the yellow laser, as is done in this work; see Zhang, Chuanqiang, et al. "Optimized photo-stimulation of halorhodopsin for long-term neuronal inhibition." BMC biology 17.1 (2019): 1-17), we remain skeptical of the extent of inhibition during the long manipulations. In short, without a demonstration of effective inhibition throughout the putative consolidation phase (for example by showing a significant decrease in CS frequency throughout the irradiation period), the main claim of the manuscript of phase-specific involvement of CF activity in OKR learning cannot be considered to be based on evidence.

      Second, the choice of viral targeting strategy leaves gaps in the argument for CF-specific mechanisms. CaMKII promoters are not selective for the IO neurons, and even the most precise viral injections always lead to the transfection of neurons in the surrounding brainstem, many of which project to the cerebellar cortex in the form of mossy fibers (MF). Figure 1Bii shows sparsely-labelled CFs in the flocculus, but possibly also MFs. While obtaining homogenous and strong labeling in all floccular CFs might be impossible, at the very least the authors should demonstrate that their optogenetic manipulation does not affect simple spiking in PNs.

      Finally, while the paper explicitly focuses on the effects of CF-evoked complex spikes in the PNs and not, for example, on those mediated by molecular layer interneurons or via direct interaction of the CF with vestibular nuclear neurons, it would be best if these other dimensions of CF involvement in cerebellar learning were candidly discussed.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to explore the role of climbing fibers (CFs) in cerebellar learning, with a focus on optokinetic reflex (OKR) adaptation. Their goal was to understand how CF activity influences memory acquisition, memory consolidation, and memory retrieval by optogenetically suppressing CF inputs at various stages of the learning process.

      Strengths:

      The study addresses a significant question in the cerebellar field by focusing on the specific role of CFs in adaptive learning. The authors use optogenetic tools to manipulate CF activity. This provides a direct method to test the causal relationship between CF activity and learning outcomes.

      Weaknesses:

      Despite shedding light on the potential role of CFs in cerebellar learning, the study is hampered by significant methodological issues that question the validity of its conclusions. The absence of detailed evidence on the effectiveness of CF suppression and concerns over tissue damage from optogenetic stimulation weaken the argument that CFs are not essential for memory consolidation. These challenges make it difficult to confirm whether the study's objectives were fully met or if the findings conclusively support the authors' claims. The research commendably attempts to unravel the temporal involvement of CFs in learning but also underscores the difficulties in pinpointing specific neural mechanisms that underlie the phases of learning. Addressing these methodological issues, investigating other signals that might instruct consolidation, and understanding CFs' broader impact on various learning behaviors are crucial steps for future studies.

      We appreciate the editors and reviewers for their constructive feedback and careful consideration of our manuscript. Despite their acknowledgment of the potential of our study to yield valuable insights into the role of CF activity in cerebellar learning and its phase-specific involvement, we have meticulously addressed all the methodological concerns raised by providing additional clarifications and explanations in this letter.

      In response to concerns regarding the efficacy of long-term optogenetic inhibition, we conducted additional in vivo monitoring of CF activity during the irradiation period, confirming sustained inhibition of complex spikes throughout the consolidation phase (Figure 2, lines 112-139). Although stable single-unit recording beyond 40 minutes was not feasible due to technical challenges, the robust suppression of CF-evoked complex spikes we observed during this period (Figure 2, lines 112–139) provides strong evidence that halorhodopsin-mediated inhibition persists over the longer irradiation intervals employed in our behavioral assays.

      Moreover, given that there is a concern regarding the CaMKII promoter also inducing expression in neighboring mossy fibers, potentially affecting simple spike activity, we have presented data in Figure 2C, which illustrates that PC simple spike firing rates remain unchanged during prolonged illumination. This finding confirms that our optogenetic manipulation selectively disrupts CF-mediated complex spikes without influencing mossy fiber to PC transmission. We have elucidated these results further in lines 128 to 136.

      Lastly, we have broadened our Discussion to consider alternative mechanisms of CF involvement in cerebellar learning, including the modulation of molecular layer interneurons (Rowan et al., 2018) and direct CF interactions with vestibular nuclear neurons (Balaban et al., 1981), thereby offering a more comprehensive perspective on the multifaceted role of CF signaling. Specific clarifications regarding these points are articulated from lines 222 to 242 and 243 to 254 in the manuscript. We are confident that these revisions adequately address the reviewers' concerns and further substantiate the specificity and significance of our study findings.

      (1) Rowan, Matthew JM, et al. "Graded control of climbing-fiber-mediated plasticity and learning by inhibition in the cerebellum." Neuron 99.5 (2018): 999-1015.

      (2) Balaban, Carey D., Yasuo Kawaguchi, and Eiju Watanabe. "Evidence of a collateralized climbing fiber projection from the inferior olive to the flocculus and vestibular nuclei in rabbits." Neuroscience letters 22.1 (1981): 23-29.

    1. eLife Assessment

      The study introduces new tools for measuring the intracellular calcium concentration close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. This approach yields important new information about the spatial and temporal profile of calcium concentrations near the site of entry at the plasma membrane. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in calcium domains. Some of the conclusions are strongly supported by the data, but a few gaps in the data presented mean that the evidence for other conclusions is incomplete.

    2. Reviewer #1 (Public review):

      This paper describes technically-impressive measurements of calcium signals near synaptic ribbons in goldfish bipolar cells. The data presented provides high spatial and temporal resolution information about calcium concentrations along the ribbon at various distances from the site of entry at the plasma membrane. This is important information. Important gaps in the data presented mean that the evidence for the main conclusions is currently inadequate.

      Strengths

      • The technical aspects of the measurements are impressive. The authors use calcium indicators bound to the ribbon and high speed line scans to resolve changes with a spatial resolution of ~250 nm and temporal resolution of less than 10 ms. These spatial and temporal scales are much closer to those relevant for vesicle release than previous measurements.

      • The use of calcium indicators with very different affinities and of different intracellular calcium buffers helps provide confirmation of key results.

      Weaknesses

      • Multiple key points of the paper lack a statistical test or summary data from populations of cells. For example, the text states that the proximal and distal calcium kinetics in Figure 2A differ. This is not clear from the inset to Figure 2A - where the traces look like scaled versions of each other. Values for time to half-maximal peak fluorescence are given for one example cell but no statistics or summary are provided. Figure 8 shows examples from one cell with no summary data. This issue comes up in other places as well.

      • The rise time measurements in Figure 2 are very different for low and high affinity indicators, but no explanation is given for this difference. Similarly, the measurements of peak calcium concentration in Figure 4 are very different with the two indicators. That might suggest that the high affinity indicator is strongly saturated, which raises concerns about whether that is impacting the kinetic measurements.

    3. Reviewer #2 (Public review):

      Summary:

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal.

      Strengths:

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM.

      Comments on revisions:

      Specific minor comments:

      (1) Rewrite the final sentence of the Abstract. It is difficult to understand.

      (2) Add a definition in the Introduction (and revisit in the Discussion) that delineates between micro- and nano-domain. A practical approach would be to round up and round down. If you round up from 0.6 um, then it is microdomain which means ~ 1 um or higher. Likewise, round down from 0.3 um to nanodomain? If you are using confocal, or even STED, the resolution for Ca imaging will be in the 100 to 300 nm range. The point of your study is that your new immobile Ca2-ribbon indicator may actually be operating on a tens of nm scale: nanophysiology. The Results are clearly written in a way that acknowledges this point but maybe make such a "definition" comment in the intro/discussion in order to: 1) demonstrate the power of the new Ca2+ indicator to resolve signals at the base of the ribbon (effectively nano), and 2) (Discussion) to acknowledge that some are achieving nanoscopic resolution (50 to 100nm?) with light microscopy (as you ref'd Neef et al., 2018 Nat Comm).

      (3) Suggested reference: Grabner et al. 2022 (Sci Adv, Supp video 13, and Fig S5). Here rod Cav channels are shown to be expressed on both sides of the ribbon, at its base, and they are within nanometers of other AZ proteins. This agrees with the conclusions from your imaging work.

      (4) In the Discussion, add a little more context to what is known about synaptic transmission in the outer and inner retina. First, state that the postsynaptic receptors (for example: mGluR6-OnBCs vs KARs-Off-BCs, vs. AMPAR-HCs), and possibly the synaptic cleft (ground squirrel), are known to have a significant impact on signaling in the outer retina. In the inner retina, there are many more unknowns. For example, when I think of the pioneering Palmer JPhysio study, which you cite, I think of NMDAR vs AMPAR, and uncertainty in what type of postsynaptic cell was patched (GC or AC....). Once you have informed the reader that the postsynapse is known to have a significant impact on signaling, then promote your experimental work that addresses presynaptic processes: "...the new tool and results allow us to explore release heterogeneity, ribbon by ribbon in dissociated preps, which we eventually plan to use at ribbon synapses within slices......to better understand how the presynapse shapes signaling......".

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors have developed a new Ca indicator conjugated to the peptide, which likely recognizes synaptic ribbons and have measured microdomain Ca near synaptic ribbons at retinal bipolar cells. This interesting approach allows one to measure Ca close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. Though microdomain Ca at the active zone of ribbon synapses has been measured by Hudspeth and Moser, the new study uses the peptide recognizing synaptic ribbons, potentially measuring the Ca concentration relatively proximal to the release sites.

      Strengths:

      The study is, in principle, technically well done, and the peptide approach is technically interesting, which allows one to image Ca near the particular protein complexes. The approach is potentially applicable to other types of imaging.

      Weaknesses:

      Peptides may not be entirely specific, and a genetic approach that tags particular active zone proteins with fluorescent Ca indicator proteins may well be more specific. Although the authors acknowledge this and the peptide approach is widely used for ribbon synapses, they should keep this caveat in mind when interpreting the results.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      This paper describes technically-impressive measurements of calcium signals near synaptic ribbons in goldfish bipolar cells. The data presented provides high spatial and temporal resolution information about calcium concentrations along the ribbon at various distances from the site of entry at the plasma membrane. This is important information. Important gaps in the data presented mean that the evidence for the main conclusions is currently inadequate.

      Thank you very much for this positive evaluation of our work. We would like to respectfully point out to the Reviewer that our current study was conducted using zebrafish as a model and not goldfish. We have revised the paper to eliminate any gaps in the data presentation.

      Strengths

      (1) The technical aspects of the measurements are impressive. The authors use calcium indicators bound to the ribbon and high-speed line scans to resolve changes with a spatial resolution of ~250 nm and a temporal resolution of less than 10 ms. These spatial and temporal scales are much closer to those relevant for vesicle release than previous measurements.

      (2) The use of calcium indicators with very different affinities and different intracellular calcium buffers helps provide confirmation of key results.

      Thank you very much for this positive evaluation of our work.

      Weaknesses

      (1) Multiple key points of the paper lack statistical tests or summary data from populations of cells. For example, the text states that the proximal and distal calcium kinetics in Figure 2A differ. This is not clear from the inset to Figure 2A - where the traces look like scaled versions of each other. Values for time to half-maximal peak fluorescence are given for one example cell but no statistics or summary are provided. Figure 8 shows examples from one cell with no summary data. This issue comes up in other places as well.

      Thank you for this feedback. We have addressed this in our revised manuscript where possible. We now include the results of paired t-tests to compare the amplitudes of proximal vs. distal calcium signals shown in Fig. 2A & C, Fig. 3C & D, Fig. 4C & D, Fig. 5A-D, and Fig. 8E & F. Because proximal and distal calcium signals were obtained from the same ribbons within 500-nm distances, as the Reviewer pointed out, “the traces look like scaled versions of each other”. For experiments where we make comparisons across cells or different calcium indicators, as shown in Fig. 3E & F, Fig. 5E, and Fig. 8B & C, we now include the results of an unpaired t-test. We have now included the t-test statistics in the respective figure legends in the revised version.

      Regarding the Reviewer’s concern that “values for time to half-maximal peak fluorescence are given for one example cell, but no statistics or summary are provided,” we estimated the fluorescence rise times by only fitting the average traces to compare the overall qualitative behavior of the corresponding calcium indicator fluorescence. We did attempt to analyze the uncertainty for the rise-time estimates, but the simultaneous fitting of the rise- and decay-behavior of time traces is notoriously sensitive to noise, and therefore, a much higher signal-to-noise ratio would be required to provide reliable uncertainty estimation for the corresponding rise-time and decay-time characteristics. This is now explicitly explained in the corresponding Methods subsection.

      In Figure 8, we now show example fluorescence traces from one cell at the bottom of the A and D panels, and the summary data is described in B-C and E-F, with statistics provided in the figure legends.

      (2) Figure 5 is confusing. The figure caption describes red, green, and blue traces, but the figure itself has only two traces in each panel and none are red, green, or blue. It's not possible currently to evaluate this figure.

      Thank you for pointing out this oversight. The figure shows the proximal and distal calcium signals, not the cytoplasmic ones. The figure caption was adjusted to correctly reflect what is shown in the figure.

      (3) The rise time measurements in Figure 2 are very different for low and high-affinity indicators, but no explanation is given for this difference. Similarly, the measurements of peak calcium concentration in Figure 4 are very different from the two indicators. That might suggest that the high-affinity indicator is strongly saturated, which raises concerns about whether that is impacting the kinetic measurements.

      We agree with the Reviewer and had mentioned in the text that we do believe that the high-affinity version of the dye is at least partially saturated. This will be especially a problem for strong depolarizations and signals near the membrane. We slightly changed the corresponding description of results on page 6 to acknowledge this point: “However, it should be noted that Cal520HA will be at least partially saturated at the Ca2+ levels expected in Ca2+ microdomains relevant for vesicle exocytosis, affecting both the amplitude and the kinetics of the fluorescence signal”. 
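      The saturation argument can be illustrated with a minimal sketch that assumes simple 1:1 equilibrium binding between Ca2+ and the indicator. The two dissociation constants below are hypothetical placeholders chosen only to contrast a high-affinity and a low-affinity dye; they are not the measured Kd values of Cal520HA or Cal520LA.

```python
def fraction_bound(ca_um, kd_um):
    """Equilibrium occupancy of an indicator, assuming 1:1 single-site binding."""
    return ca_um / (ca_um + kd_um)


# Hypothetical dissociation constants (illustrative only, not from the manuscript).
KD_HIGH_AFFINITY_UM = 0.3
KD_LOW_AFFINITY_UM = 10.0

if __name__ == "__main__":
    # Sweep free Ca2+ from resting-like to microdomain-like levels (in micromolar).
    for ca in (0.1, 1.0, 10.0, 50.0):
        ha = fraction_bound(ca, KD_HIGH_AFFINITY_UM)
        la = fraction_bound(ca, KD_LOW_AFFINITY_UM)
        print(f"[Ca2+] = {ca:5.1f} uM   high-affinity: {ha:.2f}   low-affinity: {la:.2f}")
```

      Under these assumptions, the high-affinity indicator is nearly fully occupied once free Ca2+ reaches tens of micromolar, so further increases barely change its fluorescence, whereas the low-affinity indicator still responds over that range.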

      Recommendations:

      (1) It would be good to describe the location of calcium channels relative to the ribbon in the introduction.

      We have provided this information in the discussion (please see p. 19: “The faster, smaller, and more spatially confined Ca2+ signals that are insensitive to the application of high concentrations of exogenous Ca2+ buffers, referred to here as ribbon proximal Ca2+ signals, could be due to Ca2+ influx through Cav channel clusters beneath the synaptic ribbon”). We have now provided this information in the last paragraph of the introduction as well.

      (2) The introduction is quite technical and would benefit from a more complete description of the findings of the paper (e.g. expanding the last sentence to a full paragraph).

      We have updated the last paragraph of the introduction as per the reviewer’s advice.

      (3) It is not clear that the capacitance measurements in Figure 1 are needed (I did not see them used anywhere else in the paper).

      We have removed the capacitance measurements from the figure.

      (4) Please add legends in the figures themselves defining different line colors and weights so that a reader does not need to search for them in the figure caption.

      We agree that such figure improvements facilitate reading. We have added legends in the figures themselves, where appropriate.

      (5) The insets with the expanded traces in many cases are too small - e.g. Figure 1F.

      We have enlarged the insets in applicable figures as much as possible to facilitate visualization. These changes can be seen in Figures 1, 2, 3, 4, 5, and 8, as well as Supplementary Figure 3.

      (6) Page 5, statistics for amplitude of calcium changes. Is p < 0.001 really correct here? The SEMs indicate an overlap of the two distributions of mean amplitudes - and later data for which you give p = 0.001 has much less overlap.

      Since the two data sets in question come from paired recordings, with a high Pearson correlation coefficient of 0.93, the p-values are, in fact, correct despite this significant overlap. We conducted paired t-tests to compare proximal vs. distal calcium signals obtained from a single calcium indicator shown in Fig. 2A & C, Fig. 3C & D, Fig. 4C & D, Fig. 5A-D, and Fig. 8E & F. For experiments where we make comparisons across cells or across different calcium indicators, as shown in Fig. 3E & F, Fig. 5E, and Fig. 8B & C, we performed an unpaired t-test. In response to the Reviewer’s comment, we now provide details on t-statistics in the respective figure legends in the revised version.
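      As a minimal sketch of why pairing matters, the example below computes a paired and an unpaired (Welch) t-statistic on the same numbers. The values are synthetic and chosen only so that the two groups overlap heavily while being strongly correlated; they are not data from the study, and the function names are illustrative.

```python
import math
import statistics


def paired_t(x, y):
    """t-statistic of a paired t-test: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(x, y):
    """t-statistic of an unpaired (Welch) t-test, which ignores the pairing."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


# Synthetic amplitudes (arbitrary units): each "distal" value is slightly
# smaller than its paired "proximal" value, but the spread across pairs is large.
proximal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
distal = [0.8, 1.7, 2.9, 3.6, 4.7, 5.5]

if __name__ == "__main__":
    print(f"Pearson r      = {pearson_r(proximal, distal):.3f}")
    print(f"paired t       = {paired_t(proximal, distal):.2f}")
    print(f"unpaired t     = {welch_t(proximal, distal):.2f}")
```

      With strongly correlated pairs, the between-pair spread cancels in the differences, so the paired statistic can be large even when the group means plus or minus SEM overlap; the unpaired statistic on the same numbers stays small.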

      (7) The text on page 6 describing Figure 3 appears to repeat several technical aspects of the measurements that have already been described in Figure 1. I would reduce that overlap as it is confusing for a reader.

      Since Fig.1 describes calcium measurements with free calcium indicator, whereas Fig.3 describes bound calcium indicator, we would prefer to keep the information for the sake of completeness, despite some small amount of repetition.

      (8) Figure 4A needs to be described in more detail.

      We have provided the vesicle pool details in the Supplementary Fig. 1.

      (9) The text in Figure 7 is too small.

      We have redone Fig. 7 and Supplemental Fig. 4 to ensure that the tick labels and other text are sufficiently large.

      (10) Are the units (nM) in Figure 8 correct?

      Thank you for pointing that out. The units were supposed to be µM and have been corrected in the figure.

      Reviewer #2 (Public review):

      Summary:

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal.

      Strengths:

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM.

      Thank you very much for this appreciation of our work.

      Weaknesses:

      Heterogeneity in the spatiotemporal dynamics of Ca2+ influx was not convincingly related to ribbon size, nor was the functional relevance of Ca2+ dynamics to rod bipolars demonstrated (e.g., exocytosis to different postsynaptic targets). In addition, the study would benefit from the inclusion of the Ca2+ currents that were recorded in parallel with the Ca2+ imaging.

      Thank you for this critique. We agree that our data do not establish the relationship between ribbon size and Ca2+ signal. By analogy to the hair cell literature, we believe that it is a reasonable hypothesis, but more studies will be necessary to definitively determine whether the signal relates to ribbon size or synaptic signaling. This will be addressed in future experiments.

      We have included the calcium current recorded in parallel with calcium imaging in Fig.1, when we show a single example. We now do the same for individual examples shown in Fig. 8 A and D, bottom. The calcium imaging data shown in Figs. 2-5 and Supp. Fig. 3 is the average trace, thus we have provided the averages of the peak calcium current and statistics. Since in Figure 8D-F some ribbons only have one reading, we have not conducted statistical analysis in this case. 

      Recommendations:

      The major conclusion of the work is that within bipolar cells, heterogeneity exists between Ca2+ microdomains formed at synaptic ribbons, which is supported by the results; however, what causes this is not clear. Most of the comments below are suggestions that hopefully help the authors strengthen the association of Ca2+ domain heterogeneity with features of ribbon AZs or at least offer additional options for the authors to communicate their work.

      (1) In the current study, anatomical segregation of SRs by size does not appear to exist across the ZF rod bipolar terminal, nor has this been reported for mouse rod bipolars. In the absence of this, the current study lacks the fortuitous attributes, and thus reasoning, utilized in the hair cell (HC) studies (those cited in the current MS). Namely, the HC studies utilized the following anatomical features to compare EM, IF, and physio results: a) identified differences in ribbon synapses along a tonotopic gradient (basal to apical cochlea), b) compared ribbons on different sides of an inner HC (pillar vs. modiolar), or c) examined age-dependent changes in HC ribbons.

      Thank you for this comment. We agree that we do not show any interesting systematic relationships between ribbon size and cell position or other large-scale morphological features. We added text on page 19 to stress this (“However, in comparing our findings with studies of ribbon size heterogeneity in hair cell…”). However, to our knowledge, diversity in ribbon size has never been reported in bipolar cells. 

      (2) In the absence of intrinsic topographical segregation in ribbon size within rod bipolars, then a) the imaging data attained from dissoc cells needs to be internally as sound as possible, and b) the parameters used to define ribbon dimensions in light (LM) and electron microscopy should be as communicative/interchangeable as possible.

      Thank you for this comment. Our confocal images show a moderate correlation between ribbon size measured as fluorescence of ribeye binding peptide vs. calcium hot spots.  Similarly, SBF-SEM images demonstrate that the ribbon active zone length vs width show a moderate correlation. We have summarized these findings in Figure 11. Thus, as the Reviewer pointed out, our confocal and SBF-SEM findings support each other.

      (3) It is not entirely clear how the authors distinguish rod bipolars (a subset of On-bipolars) from all other ON-bipolars? The two different preparations: dissoc or intact retina, present distinct challenges. In the example presented in Supplementary Figure 2B, the PKCalpha stained bipolar has an axon that is approx. 25 um long, but the expected length should be approx. 50um based on ZF retinal anatomy and recent study on rbc1/2 (Hellevik et al BioRxiv 2023). One could argue rather that the enzymatic treatment or mechanical shear forces caused the axon to shrink. If that is the line of reasoning, then present a low mag field of view with an assortment of dissoc bipolars stained for PKCalpha, zoom in, and describe cell morphologies and their assignment as PKCa + or -. Then you can summarize how axon terminal size, axon length, and PKC staining are or aren't correlated. Based on the results, one might have to perform IF on each dissoc cell that was assayed under LM (Ca2+ imaging) and ephys to verify it's a rod bipolar. In the case of the EM, the authors refer to the terminals analyzed as rbcs because they have larger terminals and less branching than the cbs. Since these are really nice EM images, data-rich, with better resolution than I have ever seen for retinal SBF-EM, do due diligence by tracing the terminals of neighboring bcs (ignoring details within terminals just outline terminals) and make a visual presentation that illustrates that those you selected as rbs have larger terminals than cbs (this can also give a sense of the density distribution of terminal types). Is there a published ephysio on the ZF rbcs which has been correlated with morphology? The Hellevik et al BioRxiv 2023 study shows light responses but not necessarily rbcs distinguished from other On-bcs.

      We have quantified the number of rod bipolar cells obtained from our isolation procedure using two approaches: 1. To fix the isolated bipolar cells and perform immunofluorescence with PKC alpha. 2. To isolate bipolar cells from Tg(vsx1: memCerulean)q19 transgenic zebrafish, labeling rod bipolar cell type 1 (RBC1) that we recently obtained from Dr. Yoshimatsu (Hellevik et al., 2024). Of note, the circuitry of RBC1 has been shown to be similar to the mammalian rod bipolar cell pathway (Hellevik et al., 2024). Below, we list our findings:

      The average terminal size of fixed bipolar cells labeled with PKC alpha was 5.9 ± 0.2 µm, whereas the freshly isolated living bipolar cells used for our physiology experiments had an average terminal size of 6.3 ± 0.2 µm, and the rod bipolar cells from the Tg(vsx1: memCerulean)q19 line had an average terminal size of 6.9 ± 0.2 µm. We also measured terminal size for fixed bipolar cells unlabeled with PKC alpha: 3.3 ± 0.2 µm, and for unlabeled cells from the Tg(vsx1: memCerulean)q19 line: 4.0 ± 0.2 µm.

      In addition, we also pay attention to the soma shape and dendrites, as the primary dendrite of the RBC is thick and short. Connaughton and Nelson have done a thorough analysis of morphological classification, but no measurements were given (https://onlinelibrary.wiley.com/doi/10.1002/cne.20261). Since the axon length is not retained during the isolation procedure, we do not use it as an identification marker for rod bipolar cells in our experiments.

      We re-imaged vsx1 with the DIC channel to compare the terminal sizes of fluorescently labeled RBC1 terminals with those of other BPCs in the DIC channel. Below are the images that can give a sense of the density distribution of terminal types and measurements.

      Author response image 1.

      Tracing all neighboring terminals in SBF-SEM is laborious and beyond the scope of this manuscript, but we will do full reconstructions in a future publication.

      (4) How to strengthen the description of heterogeneity within the dissoc measurements? There are two places in the LM data where heterogeneity may be relevant. The first point here is that Ribbon size (TAMRA- Ribeye binding peptide) and active zone size (Cal520HA/LA-RBP) measurements depend on labelling the ribbon/Ribeye; thus, Ribbon size and AZ size should be correlated on this basis alone. I would expect Pearson's r value to show a stronger association (r > 0.7) than what is reported in Figure 11B/C (r: 0.52 or 0.32). I would interpret a moderate to weak correlation (r < 0.5 to 0.3) as an indication that ribbons are heterogeneous (variability in Ca influx per unit ribbon size). Now to the second point, in Figure 8 and Supplementary Figure 5 there is time-signal amplitude heterogeneity. >>> My curiosity is whether signal amplitude is heterogeneous in space (ribbon size, my speculation) and in time (complex, but compare ribeye bound and free Ca2+ indicator)? It seems like the data in Figure 8 and 11 should cross over and possibly offer the authors more to say.

      We appreciate the Reviewer’s insightful observation and added a sentence at the very end of the Results section reflecting the Reviewer’s argument (“we note that a large correlation between the inferred ribbon size and active zone size…”)

      The Reviewer’s second point about the connection between heterogeneity of signal amplitude in space and in time is an interesting one as well and could be grounds for an additional investigation in the future.

      (5) As the authors know, a very powerful tool for exploring Ca microdomain dynamics is to exploit the Voltage dependence of Cavs (as exemplified in the numerous HC studies that are cited). An I-V protocol would provide a valuable means to illustrate different rates of saturating the LA and HA Ca indicators. More generally, the Ca currents and associated patch clamp parameters (Gm, leak...) can tell us much about the health of the cell and provide an added metric to assess normal variability between cells. A few places in the MS currents are mentioned yet this data is missing (Figure S5 , last line: Amplitude variability between two cells with similar Ca currents.).

      Thank you for the valuable suggestion. We will include an I-V protocol across several ribbons in future experiments. We have included the calcium currents for all the calcium transient traces. We have also included the statistics to compare those currents across conditions.

      Technical comments

      (6) Since the Ribeye-Ca2+ indicator covers the entire ribbon, it will contribute to a signal gradient. The proximal signal is assumed to be closest to the base of the ribbon where presumably the Cav channels are located, and the distal signal will originate from the top (apex) of the ribbon some 200 nm from the base of the ribbon. Have you tried to measure "ribbon lengths and widths" with the HA and LA Ca indicators? My guess would be that the LA will show a gradient, and give you a better indication of the base of the ribbon; whereas the HA signal will have dimensions similar to the TAMRA-peptide.

      Due to the point spread function limitation in the light microscopy, we obtained all ribbon measurements from the SBF-SEM images only. 

      As a surrogate for size in light microscopy, we used ribbon fluorescence, which we expect should scale with the number of ribeye molecules in the ribbon (Figure 11B).

      (7) Normalize proximal and distal LM data to highlight kinetic differences (Fig 2-5, 8), and when describing temporal heterogeneity please use a better description that includes time, such as time-to-pk, and decay1, decay 2....

      In the current manuscript, we only focus on the amplitude as it provides the information about the number of calcium channels. We used the rise time measurements to compare the time to reach the peak amplitude at the proximal vs. distal locations, demonstrating that proximal calcium signals reach the peak faster since the calcium channels are located beneath the ribbon.

      We tried to perform fittings to the individual traces. Since they are too noisy to pick out true kinetic differences between ribbons, we would need to average several traces from each ribbon. We plan to apply our high-resolution approach established in this paper to a longer stimulus and perform the fittings as per the Reviewer’s advice for a future paper.

      We now describe on pages 6-7 the two decay components for data in Figs. 2 and 3.

      (8) Why not measure ribbon length in EM as done in confocal and then compare lengths from LM and EM. In Figure S8, you have made a nice presentation of AZ Area from EM. Make similar plots for EM ribbon length (and width?), and compare the distributions to Figure 11 LM data. Maybe use other statistical descriptions like Coeff of Var or look for different populations by using multi-distribution fits. If the differences in length or area (EM data) can be segregated into short and long distances, then a similar feature might arise from the LM data. If no such morphological segregation exists, then the heterogeneity in Ca microdomains may arise from variable Cav channel density or gating, Ca buffer, etc.

      Due to the point spread function limitation in light microscopy, the ribbon dimensions cannot be reliably measured in light microscopy. As a surrogate, we used total fluorescence of the ribbon, which should correlate with the number of ribeye molecules in the ribbon. To obtain ribbon dimensions, we used measurements from the SBF-SEM images only. We summarized the distribution of ribbon width and length in Figures 11C and 11D. The distribution of the active zone size is summarized in Supplementary Figure 8. Pearson’s correlation coefficients are positive but weak, suggesting that multiple mechanisms likely contribute to heterogeneity in the local calcium signals, as the Reviewer pointed out.

      (9) Again, the quality of the EM data is great, and sufficient to make the assignment of SVs to different pools, as you have done in Fig S1. My only complaint is that the Ultrafast pool as indicated in the schematic of S1A seems to have a misassignment with respect to the green SV that is 15 nm from the PM. In the original Mennerick and Matthews 1996 study, the UF pool emptied in ~1msec. The morphological correlate for the UF has been assumed to be SVs touching the plasma membrane. 15 nm away is about 14 nm too far to be in the UF.

      Thank you for pointing that out. We have updated the vesicle labeling in Supplementary Figure 1 and Main Figure 4.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors have developed a new Ca indicator conjugated to the peptide, which likely recognizes synaptic ribbons, and have measured microdomain Ca near synaptic ribbons at retinal bipolar cells. This interesting approach allows one to measure Ca close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. Though microdomain Ca at the active zone of ribbon synapses has been measured by Hudspeth and Moser, the new study uses the peptide recognizing synaptic ribbons, potentially measuring the Ca concentration relatively proximal to the release sites.

      Thank you very much for this positive evaluation of our work.

      Strengths:

      The study is in principle technically well done, and the peptide approach is technically interesting, which allows one to image Ca near the particular protein complexes. The approach is potentially applicable to other types of imaging.

      Thank you very much for this appreciation.

      Weaknesses:

      Peptides may not be entirely specific, and the genetic approach tagging particular active zone proteins with fluorescent Ca indicator proteins may well be more specific. I also feel that "Nano-physiology" is overselling, because the measured Ca is most likely the local average surrounding synaptic ribbons. With this approach, nobody knows about the real release site Ca or the Ca relevant for synaptic vesicle replenishment. It is rather "microdomain physiology" which measures the local Ca near synaptic ribbons, relatively large structures responsible for fusion, replenishment, and recycling of synaptic vesicles.

      The peptide approach has been used fairly extensively in the ribbon synapse field and the evidence that it efficiently labels the ribbon is well established; however, we do acknowledge that the peptide is in equilibrium with a cytoplasmic pool. Thus, some of the signal arises from this cytoplasmic pool. The alternative of a genetically encoded Ca-indicator concatenated to a ribbon protein would not have this problem, but would offer less flexibility in changing calcium indicators. We believe both approaches have their merits, each with separate advantages and disadvantages.

      As for the nano vs. micro argument, we certainly do not want to suggest that we are measuring the same nano-domains, on the spatial scale of 10s of nanometers, that drive neurotransmitter release, but we do believe we are in the sub-micrometer -- 100s of nm -- range. We chose the term based on the usage by other authors to describe similar measurements (Neef et al., 2018; https://doi.org/10.1038/s41467-017-02612-y), but we see the reviewer’s point.

      Recommendations:

      I have no recommendation for additional experiments. However, the statement of "nanophysiology" is too much, and the authors should tone down the ms, recognizing some caveats.

      As we mention above, we chose the term based on the usage by other authors to describe similar measurements, and we do believe that we achieve resolution of a few hundred nanometers, and therefore would prefer to keep the current title of the manuscript. For example, Figure 5E shows that, with ribeye-bound low-affinity calcium indicator, the proximal calcium signals were preserved in the presence of BAPTA, rising and decaying abruptly, as expected for a nanodomain Ca2+ elevation. Thus, we believe that this measurement in particular describes a nanodomain-scale signal. However, we acknowledge that we are not currently able to resolve the spatial distribution of Ca2+ signals with a spatial resolution of 10s of nanometers.

    1. eLife Assessment

      The authors take a synthetic approach by introducing synaptic ribbon proteins into HEK cells to analyze how these assemblies cluster calcium channels at the active zone. Using a synapse-naive heterologous expression system and overexpression-based strategy is valuable, as it establishes a promising model for studying molecular interactions at the active zone. The study is built on a solid combination of super-resolution microscopy and electrophysiology, though it currently falls short of replicating the full functional properties of native ribbon synapses and instead resembles a multiprotein complex that partially mimics ribbon-type active zones.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors attempt to reconstitute some active zone properties by introducing synaptic ribbon proteins into HEK cells. This "ground-up" approach can be valuable for assessing the necessity of specific proteins in synaptic function. Here, the authors co-transfect a membrane-targeted bassoon, RBP2, calcium channel subunits and Ribeye to generate what they call "synthetic ribbons". The resultant structures show an ability to cluster calcium channels (Figure 4B) and a modest ability to concentrate calcium entry locations (Figure 7J). At the light level, the Ribeye aggregates look spherical and localize to the membrane through their interaction with the membrane-targeted bassoon, and at the EM level the structures resemble those observed when Ribeye is overexpressed alone. It is a nice proof-of-principle in establishing a useful experimental system for studying calcium channel localization and, with expression of other proteins, perhaps a means to understanding the structure and function of the ribbon. The paper does establish that previously described protein interactions can be reconstituted in a heterologous system and that the addition of Ribeye can increase the size of calcium channel patches via indirect interactions.

      Strengths:

      (1) The authors establish a new experimental system for the study of calcium channel localization to active zones.<br /> (2) The clustering of calcium channels to bassoon via RBP2 is a nice confirmation, in a cell-based system, of a previously described interaction between bassoon and calcium channels.<br /> (3) The "ground-up" approach is an attractive one and theoretically allows one to learn a lot about the essential interactions for building a ribbon structure.<br /> (4) The finding that introducing Ribeye can enhance the size of calcium channel patches is novel and interesting.

      Weaknesses:

      (1) The addition of EM is welcome, but the structures seem to resemble those created by overexpression of Ribeye alone, albeit at the membrane. It is unclear to me whether the interaction with Bsn or indirect interactions with other proteins has any effect on these structures. Also, while the abstract mentions that the size and shape are similar to ribbons, the EM seems to show that the size and shape are quite variable.<br /> (2) The clustering of channels is accomplished by taking advantage of previously described interactions between RBP2, Ca channels and bassoon. While it is nice to see that it can be reconstituted in a naive cell, the interactions were previously described. The localization of Ribeye to bassoon takes advantage of a previously described interaction between the two and the membrane localization of the complexes required introduction of a membrane-anchoring motif. These factors limit the novelty of the findings.<br /> (3) The difference in Ca imaging between SyRibbons and other locations is subtle. While there are reasonable explanations for why this could be the case, it may limit the utility of this system for studying Ca-channel-ribbon dynamics moving forward.

    3. Reviewer #2 (Public review):

      Summary:

      The authors show that co-expression of bassoon, RIBEYE, Cav1.3-alpha1, Cav-beta3, Cav-alpha2delta1, and RBP2 in a heterologous system (HEK293 cells) is sufficient to generate a protein complex resembling a presynaptic ribbon-type active zone both in morphology and in function (in clustering voltage-gated Ca channels and creating sites for localized Ca2+ entry). If the 3 separate Cav gene products are taken as a single protein (i.e. a Ca channel), the conclusion is that the core of a ribbon synapse comprises 4 proteins: bassoon holds the RIBEYE-containing ribbon to the plasma membrane, and RBP2 binds to bassoon and Ca channels, tethering the Ca channels to the presynaptic active zone.

      Strengths:

      (1) Good use of a heterologous system with generally appropriate controls provides convincing evidence that a presynaptic ribbon-type active zone (without the ability to support exocytosis), with the ability to support localized Ca2+ entry (a key feature of ribbon-type pre-synapses) can be assembled from a few proteins.<br /> (2) In the revised manuscript, the authors do a good job of addressing the limitations of their cultured cell-system.

      Weaknesses:

      (1) Relies on over-expression, which almost certainly diminishes the experimentally measured parameters (e.g. pre-synapse clustering, localization of Ca2+ entry).<br /> (2) Are HEK cells the best model? HEK cells secrete substances and have a well-studied endocytic pathway, but they do not create neurosecretory vesicles. Initially, I asked why the authors did not try to reconstitute a ribbon synapse in a cell that makes neurosecretory vesicles, like a PC12 cell, and the authors addressed this question in their revision.<br /> (3) Related to 1 and 2: the Ca channel localization observed is significant but not so striking given the presence of Cav protein and measurements of Ca2+ influx distributed across the membrane. Presumably, this is the result of overexpression and an absence of pathways for pre-synaptic targeting of Ca channels. But, still, it was surprising that Ca channel localization was so diffuse. I suppose that the authors tried to reduce the effect of over-expression by using an inducible Cav1.3? Even so, the accessory subunits were constitutively over-expressed.

    4. Reviewer #3 (Public review):

      Summary:

      Ribbon synapses are complex molecular assemblies responsible for synaptic vesicle trafficking in sensory cells of the eye and the inner ear. Ca2+-dependent exocytosis occurs at the active zone (AZ); however, the molecular mechanisms orchestrating the structure and function of the AZs of ribbon synapses are not well understood. To advance the understanding of those mechanisms, the authors present a novel and interesting experimental strategy pursuing the reconstitution of a minimal active zone of a ribbon synapse within a synapse-naïve cell line: HEK293 cells. The authors have used stably transfected HEK293 cells that express voltage-gated Ca2+ channel subunits (constitutive CaV beta3 and CaV alpha2 delta1, and inducible CaV1.3 alpha1). They have expressed in those cells several proteins of the ribbon synapse active zone: (1) RIBEYE, (2) a modified version of Bassoon that binds to the plasma membrane through artificial palmitoylation (Palm-Bassoon) and (3) RIM-binding protein 2 (RBP2), to induce the formation of a minimal active zone that they call SyRibbons. The formation of such structures is convincing; however, the evidence that such structures have a functional impact (for example, enhancing Ca2+-currents), as the authors claim, is weak. In conclusion, the novel approach shows that expression of a multiprotein complex partially reproduces properties, especially structural properties, of ribbon-type active zones in a heterologous system. Although the approach opens interesting possibilities for further experiments, the evidence supporting the functional properties of the so-called "synthetic ribbon synapses" is incomplete.

      Strengths of the study:

      (1) The study is carefully carried out using a remarkable combination of (1) superresolution, correlative light microscopy and cryo-electron tomography, to analyze the formation and subcellular distribution of molecular assemblies and (2) functional assessment of voltage-gated Ca2+ channels using patch-clamp recording of Ca2+-currents and fluorometry to correlate Ca2+ influx with the molecular assemblies formed by AZ proteins. The results are of high quality and are in general accompanied by the required control experiments.<br /> (2) The method opens new opportunities to further investigate the minimal and basic properties of AZ proteins that are difficult to study using in vivo systems. The cells that operate through ribbon synapses (e.g. photoreceptors and hair cells) are particularly difficult to manipulate, so setting up and validating the use of a heterologous system more suitable for molecular manipulations is highly valuable.<br /> (3) The structures formed by RIBEYE and Palm-Bassoon in HEK293 cells identified by STED nanoscopy and cryo-electron microscopy share relevant similarities with the AZs of ribbon synapses found in rat inner hair cells.

      Weaknesses of the study:

      (1) The functional properties of the "synthetic ribbon-type active zones" have been assessed only by their effect on the modulation of Ca2+-channel function, and that effect is rather weak. The authors provide reasonable explanations for such a weak effect, but it is difficult to conclude that the "synthetic ribbon-type active zones" are indeed bona fide functional multiprotein complexes.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      The authors use a synthetic approach to introduce synaptic ribbon proteins into HEK cells and analyze the ability of the resulting assemblies to cluster calcium channels at the active zone. The use of this ground-up approach is valuable as it establishes a system to study molecular interactions at the active zone. The work relies on a solid combination of super-resolution microscopy and electrophysiology, but would benefit from: (i) additional ultrastructural analysis to establish ribbon formation (in the absence of which the claim of these being synthetic ribbons might not be supported); (ii) data quantification (to confirm colocalization of different proteins); (iii) stronger validation of impact on Ca2+ function; (iv) in-depth discussion of problems derived from the use of an over-expression approach.

      We thank the editors and the reviewers for the constructive comments and appreciation of our work. Please find a detailed point-to-point response below. In response to the critique received, we have now (i) included an ultrastructural analysis of the SyRibbons using correlative light microscopy and cryo-electron tomography, (ii) performed quantifications to confirm the colocalisation of the various proteins, (iii) discussed and carefully rephrased our interpretation of the role of the ribbon in modulating Ca<sup>2+</sup> channel function and (iv) discussed concerns regarding the use of an overexpression system. 

      Public Reviews:

      Reviewer #1 (Public Review):

      We would like to thank the reviewer for the comments and advice to further improve our manuscript. We have completely overhauled the manuscript taking the suggestions of the reviewer into account.

      (1) Are these truly "synthetic ribbons"? The ribbon synapse is traditionally defined by its morphology at the EM level. To what extent these structures recapitulate ribbons is not shown. It has been previously shown that Ribeye forms aggregates on its own. Do these structures look any more ribbon-like than Ribeye aggregates in the absence of its binding partners?

      We thank reviewer 1 for their constructive feedback and critique of the work. 

      We agree that traditionally, ribbon synapses have always been defined by the distinct morphology observed at the EM level. However, since the discovery of the core components of ribbons (RIBEYE and Piccolino), confocal and super-resolution imaging of immunofluorescently labelled ribbons has gained importance for analysing ribbon synapses. A correspondence of RIBEYE immunofluorescent structures at the active zone to electron microscopy observations of ribbons has been established in numerous studies (Wong et al, 2014; Michanski et al, 2019, 2023; Maxeiner et al, 2016; Jean et al, 2018), even though direct correlative approaches have yet to be performed, to our knowledge. We have now analysed SyRibbons using cryo-correlative electron-light microscopy. We observe that GFP-positive RIBEYE spots corresponded well with electron-dense structures, as is characteristic for synaptic ribbons (Robertis & Franchi, 1956; Smith & Sjöstrand, 1961; Matthews & Fuchs, 2010). We could also observe SyRibbons within 100 nm of the plasma membrane (see Fig. 3). We have now added this qualitative ultrastructural analysis of SyRibbons in the main manuscript (lines 272 - 294, Fig. 3 and Supplementary Fig. 3).

      (2) No new biology is discovered here. The clustering of channels is accomplished by taking advantage of previously described interactions between RBP2, Ca channels and bassoon. The localization of Ribeye to bassoon takes advantage of a previously described interaction between the two. Even the membrane localization of the complexes required the introduction of a membrane-anchoring motif.

      We respectfully disagree with the overall assessment. Our study emphasizes the synthetic establishment of protein assemblies that mimic key aspects of ribbon-type active zones, defining minimum molecular requirements. Numerous previous studies have described the role of the synaptic ribbon in organising the spatial arrangement of Ca<sup>2+</sup> channels, regulating their abundance and possibly also modulating their physiological properties (Maxeiner et al, 2016; Frank et al, 2010; Jean et al, 2018; Wong et al, 2014; Grabner & Moser, 2021; Lv et al, 2016). We would like to highlight that there remain major gaps between existing in vitro and in vivo data; for instance, no evidence for direct or indirect interactions between Ca<sup>2+</sup> channels and RIBEYE has been demonstrated so far. While we do indeed take advantage of previously known interactions between RIBEYE and Bassoon (tom Dieck et al, 2005); between Bassoon, RBP2 and P/Q-type Ca<sup>2+</sup> channels (Davydova et al, 2014); and between RBP2 and L-type Ca<sup>2+</sup> channels (Hibino et al, 2002), our study tries to bridge these gaps by establishing the indirect link between the synaptic ribbon (RIBEYE) and L-type CaV1.3 Ca<sup>2+</sup> channels using a bottom-up approach, a link which has previously been only speculative. Our data show how, even in a synapse-naive heterologous expression system, ribbon synapse components assemble Ca<sup>2+</sup> channel clusters and even show a partial localisation of the Ca<sup>2+</sup> signal. Moreover, we argue that the established reconstitution approach provides other interesting insights, such as laying ground-up evidence supporting the anchoring of the synaptic ribbon by Bassoon. 
Finally, we expect that the established system will serve future studies aimed at deciphering the role of putative CaV1.3 or CaV1.4 interacting proteins in regulating Ca<sup>2+</sup> channels of ribbon synapses by providing a more realistic Ca<sup>2+</sup> channel assembly than has been available in the heterologous expression systems used so far. In response to the reviewer's comment, we have augmented the discussion accordingly.

      (3) The only thing ribbon-specific about these "syn-ribbons" is the expression of ribeye and ribeye does not seem to participate in the localization of other proteins in these complexes. Bsn, Cav1.3 and RBP2 can be found in other neurons.

      The synaptic ribbon made of RIBEYE is the key molecular difference in the molecular AZ ultrastructure of ribbon synapses in the eye and the ear. We hypothesize that the ribbon acts as a superscaffold that enables AZs with large Ca<sup>2+</sup> channel assemblies and readily releasable pools. In further support of this hypothesis, the present study on synthetic ribbons shows that CaV1.3 Ca<sup>2+</sup> channel clusters are larger in the presence of SyRibbons compared to SyRibbon-less CaV1.3 Ca<sup>2+</sup> channel clusters in tetra-transfected HEK cells (Ca<sup>2+</sup> channels, RBP, membrane-anchored Bassoon, and RIBEYE, Fig. 6). In response to the reviewer's comment, we have now added an analysis of triple-transfected HEK cells (Ca<sup>2+</sup> channels, RBP, membrane-anchored Bassoon), in which CaV1.3 Ca<sup>2+</sup> channel clusters again are significantly smaller than at the SyRibbons and indistinguishable from SyRibbon-less CaV1.3 Ca<sup>2+</sup> channel clusters (Fig. 6E, F).

      (4) As the authors point out, RBP2 is not necessary for some Ca channel clustering in hair cells, yet seems to be essential for clustering to bassoon here.

      Here we would like to clarify that RBP2 is indeed important in inner hair cells for promoting a larger complement of CaV1.3, and RBP2 KO mice show smaller CaV1.3 channel clusters and reduced whole-cell and single-AZ Ca<sup>2+</sup> influx amplitudes (Krinner et al, 2017). However, a key point of difference we emphasize is that even though CaV1.3 clusters appeared smaller, they did not appear broken or fragmented as they do upon genetic perturbation of Bassoon (Frank et al, 2010), RIBEYE (Jean et al, 2018) or Piccolino (Michanski et al, 2023). This highlights how there may be a hierarchy in the spatial assembly of CaV1.3 channels at the inner hair cell ribbon synapse (also described in the discussion section “insights into presynaptic Ca<sup>2+</sup> channel clustering and function”), with proteins like RBP2 regulating the abundance of CaV1.3 channels at the synapse and organising them into smaller clusters – what we have termed “nanoclustering”; while Bassoon and RIBEYE may serve as super-scaffolds further organizing these CaV1.3 nanoclusters into “microclusters”. Observations of fragmented Ca<sup>2+</sup> channel clusters and a broader spread of the Ca<sup>2+</sup> signal seen upon Ca<sup>2+</sup> imaging in RIBEYE and Bassoon mutants (Jean et al, 2018; Frank et al, 2010; Neef et al, 2018), and the absence of such a phenotype in RBP2 mutants (Krinner et al, 2017), may be explained by such a differential role of these proteins in organising Ca<sup>2+</sup> channel spatial assembly. The data of the present study on reconstituted ribbon-containing AZs are in line with these observations in inner hair cells: RBP2 appears important to tether Ca<sup>2+</sup> channels to Bassoon, and these AZ-like assemblies are organised to their full extent by the presence of RIBEYE. 
As mentioned in the response to point 3 of the reviewer, we have now further strengthened this point by adding the analysis of SyRibbon-less CaV1.3 Ca<sup>2+</sup> channel clusters in triple-transfected HEK cells (Ca<sup>2+</sup> channels, RBP, membrane-anchored Bassoon, Fig. 6E, F). Moreover, we have revised the discussion accordingly.

      (5) The difference in Ca imaging between SyRibbons and other locations is extremely subtle.

      We agree with the reviewer on the modest increase in Ca<sup>2+</sup> signal amplitude seen in the presence of  SyRibbons and provide the following reasoning for this observation: 

      (i) It is plausible that due to the overexpression approach, Ca<sup>2+</sup> channels (along with RBP2 and Palm-Bassoon) still show considerably high expression throughout the membrane even in regions where SyRibbons are not localised. Indeed, this is evident in the images shown in the lower panel in Fig. 6B, where Ca<sup>2+</sup> channel immunofluorescence is distributed across the plasma membrane with larger clusters formed underneath SyRibbons (for an opposing scenario, please see the cell in Fig. 6B upper panel with very localised CaV1.3 distribution underneath SyRibbons). This would of course diminish the difference in the Ca<sup>2+</sup> signals between membrane regions with and without SyRibbons. We note that while the contrast is greater for native synapses, extrasynaptic Ca<sup>2+</sup> channels have been described in numerous studies for hair cells alone (Roberts et al, 1990; Brandt, 2005; Zampini et al, 2010; Wong et al, 2014).

      (ii) Nevertheless, we do not expect a remarkably big difference in Ca<sup>2+</sup> influx due to the presence of SyRibbons in the first place. Ribbon-less AZs in inner hair cells of RIBEYE KO mice showed normal Ca<sup>2+</sup> current amplitudes at the whole-cell and the single-AZ level (Jean et al, 2018). However, it was the spatial spread of the Ca<sup>2+</sup> signal at the single-AZ level which appeared to be broader and more diffuse in these mutants in the absence of the ribbon, in contrast to the more confined Ca<sup>2+</sup> hotspots seen in the wild-type controls.

      So, in agreement with these published observations – it appears that presence of SyRibbons helps in spatially confining the Ca<sup>2+</sup> signal by super scaffolding nanoclusters into microclusters (see also our response to points 3 and 4 of the reviewer): this is evident from seeing some spatial confinement of Ca<sup>2+</sup> signals near SyRibbons on top of the diffuse Ca<sup>2+</sup> signal across the rest of the membrane as a result of overexpression in HEK cells. 

      We have now carefully rephrased our interpretation throughout the manuscript and added further explanation in the discussion section.   

      (6) The effect of the expression of palm-Bsn, RBP2 and the combination of the two on Ca-current is ambiguous. It appears that while the combination is larger than the control, it probably isn't significantly different from either of the other two alone (Fig 5). Moreover, expression of Ribeye + the other two showed no effect on Ca current (Figure 7). Also, why is the IV curve right shifted in Figure 7 vs Figure 5?

      We agree with the reviewer that co-expression of palm-Bassoon and RBP2 seems to augment Ca<sup>2+</sup> currents, while the additional expression of RIBEYE results in no change when compared to wild-type controls. We currently do not have an explanation for this observation and would refrain from making any claims without concrete evidence. As the reviewer also correctly pointed out, while the expression of the combination of palm-Bassoon and RBP2 raises Ca<sup>2+</sup> currents, current amplitudes are not significantly different when compared to the individual expression of the two proteins (P > 0.05, Kruskal-Wallis test). In light of this, we have now carefully rephrased our MS. Moreover, we would like to thank reviewer 1 for pointing out the right shift in the IV curve which was due to an error in the values plotted on the x-axis. This has been corrected in the updated version of the manuscript. 

      (7) While some of the IHC is quantified, some of it is simply shown as single images. EV2, EV3 and Figure 4a in particular (4b looks convincing enough on its own, but could also benefit from a larger sample size and quantification)

      We have now added quantifications for the colocalisations of the various transfection combinations depicted in the above-mentioned figures collectively in Supplementary Figure 7 and added the corresponding results and methods accordingly. 

      Reviewer #2 (Public Review):

      We would like to thank the reviewer for the comments and advice to further improve our manuscript.

      (1) Relies on over-expression, which almost certainly diminishes the experimentally-measured parameters (e.g. pre-synapse clustering, localization of Ca2+ entry).

      We acknowledge this limitation highlighted by the reviewer arising from the use of an overexpression system and have carefully rephrased our interpretation and discussed possible caveats in the discussion section. 

      (2) Are HEK cells the best model? HEK cells secrete substances and have a well-studied endocytic pathway, but they do not create neurosecretory vesicles. Why didn't the authors try to reconstitute a ribbon synapse in a cell that makes neurosecretory vesicles like a PC12 cell?

      This is a valid point for discussion that we also had here extensively. We indeed did consider pheochromocytoma cells (PC12 cells) for reconstitution of ribbon-type AZs and also performed initial experiments with these in the initial stages of the project. PC12 cells offer the advantage of providing synaptic-like microvesicles and also endogenously express several components of the presynaptic machinery such as Bassoon, RIM2, ELKS etc (Inoue et al, 2006) such that overexpression of exogenous AZ proteins would have to be limited to RIBEYE only. 

      However, a major drawback of PC12 cells as a model is the complex molecular background of these cells. We have also briefly described this in the discussion section (line 615 – 619). Naïve, undifferentiated PC12 cells show highly heterogeneous expression of various CaV channel types (Janigro et al, 1989); however, CaV1.3, the predominant type in ribbon synapses of the ear, does not seem to be expressed in these cells (Liu et al, 1996). Furthermore, our attempts at performing immunostainings against CaV1.3 and at overexpressing CaV1.3 in PC12 cells did not prove successful, and we decided to refrain from pursuing this further (data not shown). 

      In contrast, HEK293 cells, being “synapse-naïve”, provide the advantage of serving as a “blank canvas” for performing such reconstitutions, e.g. they lack voltage-gated Ca<sup>2+</sup> channels and multidomain proteins of the active zone. Moreover, an important practical aspect for our choice was the availability of the HEK293 cell line with stable (and inducible) expression of the CaV1.3 Ca<sup>2+</sup> channel complex. Finally, as described in lines 613 – 614 of the discussion section, even though HEK293 cells lack SVs and the molecular machinery required for their release, our work paves the way for future studies which could employ delivery of SV machinery via co-expression (Park et al, 2021), which could then be analyzed by the correlative light and electron microscopy workflow we worked out and added during revision. 

      (3) Related to 1 and 2: the Ca channel localization observed is significant but not so striking given the presence of Cav protein and measurements of Ca2+ influx distributed across the membrane. Presumably, this is the result of overexpression and an absence of pathways for pre-synaptic targeting of Ca channels. But, still, it was surprising that Ca channel localization was so diffuse. I suppose that the authors tried to reduce the effect of over-expression by using an inducible Cav1.3? Even so, the accessory subunits were constitutively over-expressed.

      We agree with the reviewer on the modest increase in Ca<sup>2+</sup> signal amplitude seen in the presence of SyRibbons. Yes, we employed inducible expression of the CaV1.3a subunit and tried to reduce the effect of overexpression by testing different induction times. However, we did not observe any major differences in expression and observed large variability in CaV1.3 expression across cells irrespective of induction duration. At all time points, there were cells with diffuse CaV1.3 localisation also in regions without SyRibbons which likely reduced the contrast of the Ca<sup>2+</sup> signal we observe. We provide the following reasoning for this observation: 

      (i) It is plausible that due to the overexpression approach, Ca<sup>2+</sup> channels (along with RBP2 and Palm-Bassoon) still show considerable expression along the membrane also in regions where SyRibbons are not localised. Indeed, this is evident in the images shown in the lower panel in Fig. 6B where Ca<sup>2+</sup> channel immunofluorescence is distributed across the plasma membrane with larger clusters formed underneath SyRibbons. This would of course diminish the difference in the Ca<sup>2+</sup> signals between membrane regions with and without SyRibbons. We note that while the contrast is greater for native synapses, extrasynaptic Ca<sup>2+</sup> channels have been described in numerous studies for hair cells alone (Roberts et al, 1990; Brandt, 2005; Zampini et al, 2010; Wong et al, 2014).

      (ii) Nevertheless, we do not expect a striking difference in Ca<sup>2+</sup> influx amplitude due to the presence of SyRibbons in the first place. Ribbon-less AZs in inner hair cells of RIBEYE KO mice showed normal Ca<sup>2+</sup> current amplitudes at the whole-cell and the single-AZ level (Jean et al, 2018). Instead, it was the spatial spread of the Ca<sup>2+</sup> signal at the single-AZ level which appeared to be broader and more diffuse in these mutants in the absence of the ribbon, in contrast to the more confined Ca<sup>2+</sup> hotspots seen in the wildtype controls. 

      So, in agreement with these published observations – it appears that presence of SyRibbons helps in spatially confining the Ca<sup>2+</sup> signal by super scaffolding nanoclusters into microclusters: this is evident from seeing some spatial confinement of Ca<sup>2+</sup> signals near SyRibbons on top of the diffuse Ca<sup>2+</sup> signal across the rest of the membrane as a result of overexpression in HEK cells. 

      We have now carefully rephrased our interpretation throughout the manuscript and added further explanation in the discussion section.   

      Reviewer #3 (Public Review):

      We would like to thank the reviewer for the comments and advice to further improve our manuscript.

      (1) The results obtained in a heterologous system (HEK293 cells) need to be interpreted with caution. They will importantly speed the generation of models and hypothesis that will, however, require in vivo validation.

      We acknowledge this limitation highlighted by Reviewer 3 arising from the use of an overexpression system and have carefully rephrased our interpretation and discussed possible caveats in the discussion section. We employed inducible expression of the CaV1.3a subunit and tried to reduce the effect of overexpression by testing different induction times. However, we did not observe any major differences in expression and observed large variability in CaV1.3 expression across cells irrespective of induction duration. At all time points, there were cells with diffuse CaV1.3 localisation, even in regions without SyRibbons and this could reduce the contrast of the Ca<sup>2+</sup> signal we observe. We provide the following reasoning for this observation: 

      (i) It is plausible that due to the overexpression approach, Ca<sup>2+</sup> channels (along with RBP2 and Palm-Bassoon) still show considerable expression along the membrane also in regions where SyRibbons are not localised. Indeed, this is evident in the images shown in the lower panel in Fig. 6B where Ca<sup>2+</sup> channel immunofluorescence is distributed across the plasma membrane with larger clusters formed underneath SyRibbons. This would of course diminish the difference in the Ca<sup>2+</sup> signals between membrane regions with and without SyRibbons. We note that while the contrast is greater for native synapses, extrasynaptic Ca<sup>2+</sup> channels have been described in numerous studies for hair cells alone (Roberts et al, 1990; Brandt, 2005; Zampini et al, 2010; Wong et al, 2014).

      (ii) Nevertheless, we do not expect a striking difference in Ca<sup>2+</sup> influx amplitude due to the presence of SyRibbons in the first place. Ribbon-less AZs in inner hair cells of RIBEYE KO mice showed normal Ca<sup>2+</sup> current amplitudes at the whole-cell and the single-AZ level (Jean et al, 2018). Instead, it was the spatial spread of the Ca<sup>2+</sup> signal at the single-AZ level which appeared to be broader and more diffuse in these mutants in the absence of the ribbon, in contrast to the more confined Ca<sup>2+</sup> hotspots seen in the wildtype controls. 

      So, in agreement with these published observations – it appears that presence of SyRibbons helps in spatially confining the Ca<sup>2+</sup> signal by super scaffolding nanoclusters into microclusters: this is evident from seeing some spatial confinement of Ca<sup>2+</sup> signals near SyRibbons on top of the diffuse Ca<sup>2+</sup> signal across the rest of the membrane as a result of overexpression in HEK cells. 

      (2) The authors analyzed the distribution of RIBEYE clusters in different membrane compartments and correctly conclude that RIBEYE clusters are not trapped in any of those compartments, but are soluble instead. The authors, however, did not carry out a similar analysis for Palm-Bassoon. It is therefore unknown if Palm-Bassoon binds to other membrane compartments besides the plasma membrane. That could occur because in non-neuronal cells GAP43 has been described to be in internal membrane compartments. This should be investigated to document the existence of ectopic internal SyRibbons beyond the plasma membrane because it might have implications for interpreting functional data in case Ca2+-channels become part of those internal SyRibbons.

      In response to this valid concern, we have now included the suggested experiment in Supplementary Figure 1. We investigated the subcellular localisation of Palm-Bassoon and did not find Palm-Bassoon puncta to colocalise with ER, Golgi, or lysosomal markers, which argues against binding to membrane compartments inside the cell. We have added the following sentence in the results section, line 145: “Palm-Bassoon does not appear to localize in the ER, Golgi apparatus or lysosomes (Supplementary Fig 1 D, E and F).”

      (3) The co-expression of RBP2 and Palm-Bassoon induces a rather minor but significant increase in Ca2+-currents (Figure 5). Such an increase does not occur upon expression of (1) Palm-Bassoon alone, (2) RBP2 alone or (3) RIBEYE alone (Figure 5). Intriguingly, the concomitant expression of Palm-Bassoon, RBP2 and RIBEYE does not translate into an increase of Ca2+-currents either (Figure 7).

      We agree with the reviewer that co-expression of Palm-Bassoon and RBP2 seems to augment Ca<sup>2+</sup> currents, while the additional expression of RIBEYE results in no change when compared to wild-type controls. We currently do not have an explanation for this observation and would refrain from making any claims without concrete evidence. We also highlight that, while the expression of the combination of Palm-Bassoon and RBP2 raises Ca<sup>2+</sup> currents, current amplitudes are not significantly different when compared to the individual expression of the two proteins (P > 0.05, Kruskal-Wallis test). In light of this, we have now carefully rephrased our MS.

      (4) The authors claim that Ca2+-imaging reveals increased Ca2+-signal intensity at synthetic ribbon-type AZs. That claim is a subject of concern because the increase is rather small and it does not correlate with an increase in Ca2+-currents.

      Thanks for the comment: please see our response to your first comment and the lines 585 – 610 in the discussion section.

      Recommendations for the authors:  

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors should have a better discussion of problems derived from over-expression.

      Done. Please see above. 

      (2) Ideally, the authors would repeat the study using a secretory cell line, but this is of course not possible. The idea could be brought forth, though.

      As described above in our response to the public review of reviewer 2, we have discussed this idea in the discussion section (refer to lines 615 – 619), emphasizing both the advantages and the limitations of using a secretory cell line (e.g. PC12 cells) instead of HEK293 cells as a model for performing such reconstitutions.

      Reviewer #3 (Recommendations For The Authors):

      (1) There are several figures in which colocalization between different proteins is studied only displaying images but without any quantitative data. This should be corrected by providing such a quantitative analysis.

      We have now added quantifications for the colocalisations of the various transfection combinations depicted in the above-mentioned figures collectively in Supplementary Figure 7 and added the corresponding results and methods accordingly. 

      (2) The small increase in Ca2+-currents and Ca2+-influx associated with the clustering of Ca2+-channels at SyRibbons is a concern. The authors should discuss whether such a minor increase (found only when Palm-Bassoon and RBP2 are co-expressed) would have physiological consequences in an actual synapse. They might compare those results with results obtained in genetically modified mice in which Ca2+-currents are affected upon the removal of AZ proteins. On the other hand, they should explain why Ca2+-currents do not increase when the SyRibbons are formed by RIBEYE, Palm-Bassoon and RBP2.

      Done. Please see above. 

      (3) The description of the patch-clamp experiments should be enriched by including representative currents. Did the authors measure tail currents?

      We would like to thank the reviewer for the valuable suggestion and have now added representative currents to the figures (see Supplementary Figure 5B). We agree with the reviewer on the importance of further characterizing the Ca<sup>2+</sup> currents in the presence and absence of SyRibbons by analysis of tail currents for counting the number of Ca<sup>2+</sup> channels by non-stationary fluctuation analysis but consider this to be out of scope of the current study and an objective for future studies. 

      (4) The current displayed in Figure 7 E should be explained better.

      Previous studies have shown that Ca<sup>2+</sup>-binding proteins (CaBPs) compete with Calmodulin to reduce Ca<sup>2+</sup>-dependent inactivation (CDI) and promote sustained Ca<sup>2+</sup> influx in Inner Hair Cells (Cui et al, 2007; Picher et al, 2017). In the absence of CaBPs, CaV1.3-mediated Ca<sup>2+</sup> currents show more rapid CDI, as is the case here upon heterologous expression in HEK cells (Koschak et al, 2001; see also Picher et al, 2017, where co-expression of CaBP2 with CaV1.3 inhibits CDI in HEK293 cells). The inactivation kinetics of CaV1.3 are also regulated by the subunit composition (Cui et al, 2007) along with modulation via interaction partners; given the reconstitution here, we do not find the currents very surprising.

      (5) Is the difference in Ca2+-influx still significantly higher upon the removal of the maximum value measured in positive SyRibbons spots (Figure 7, panel K)?

      Yes, on removing the maximum value, the P value increases from 0.01 to 0.03 but remains statistically significant. 

      (6) In summary, although the approach pioneered by the authors is exciting and provides relevant results, there is a major concern regarding the interpretation of the modulation of Ca2+ channels.

      We have now carefully rephrased our interpretation on the modulation of Ca<sup>2+</sup> channels.  

      References

      Brandt A (2005) Few CaV1.3 Channels Regulate the Exocytosis of a Synaptic Vesicle at the Hair Cell Ribbon Synapse. Journal of Neuroscience 25: 11577–11585

      Cui G, Meyer AC, Calin-Jageman I, Neef J, Haeseleer F, Moser T & Lee A (2007) Ca2+-binding proteins tune Ca2+-feedback to Cav1.3 channels in mouse auditory hair cells. The Journal of Physiology 585: 791–803

      Davydova D, Marini C, King C, Klueva J, Bischof F, Romorini S, Montenegro-Venegas C, Heine M, Schneider R, Schröder MS, et al (2014) Bassoon specifically controls presynaptic P/Q-type Ca(2+) channels via RIM-binding protein. Neuron 82: 181–194

      tom Dieck S, Altrock WD, Kessels MM, Qualmann B, Regus H, Brauner D, Fejtová A, Bracko O, Gundelfinger ED & Brandstätter JH (2005) Molecular dissection of the photoreceptor ribbon synapse: physical interaction of Bassoon and RIBEYE is essential for the assembly of the ribbon complex. J Cell Biol 168: 825–836

      Frank T, Rutherford MA, Strenzke N, Neef A, Pangršič T, Khimich D, Fejtova A, Gundelfinger ED, Liberman MC, Harke B, et al (2010) Bassoon and the synaptic ribbon organize Ca2+ channels and vesicles to add release sites and promote refilling. Neuron 68: 724–738

      Grabner CP & Moser T (2021) The mammalian rod synaptic ribbon is essential for Cav channel facilitation and ultrafast synaptic vesicle fusion. eLife 10: e63844

      Hibino H, Pironkova R, Onwumere O, Vologodskaia M, Hudspeth AJ & Lesage F (2002) RIM-binding proteins (RBPs) couple Rab3-interacting molecules (RIMs) to voltage-gated Ca2+ channels. Neuron 34: 411–423

      Inoue E, Deguchi-Tawarada M, Takao-Rikitsu E, Inoue M, Kitajima I, Ohtsuka T & Takai Y (2006) ELKS, a protein structurally related to the active zone protein CAST, is involved in Ca2+-dependent exocytosis from PC12 cells. Genes to Cells 11: 659–672

      Janigro D, Maccaferri G & Meldolesi J (1989) Calcium channels in undifferentiated PC12 rat pheochromocytoma cells. FEBS Letters 255: 398–400

      Jean P, Morena DL de la, Michanski S, Tobón LMJ, Chakrabarti R, Picher MM, Neef J, Jung S, Gültas M, Maxeiner S, et al (2018) The synaptic ribbon is critical for sound encoding at high rates and with temporal precision. eLife 7: e29275

      Koschak A, Reimer D, Huber I, Grabner M, Glossmann H, Engel J & Striessnig J (2001) alpha 1D (Cav1.3) subunits can form l-type Ca2+ channels activating at negative voltages. J Biol Chem 276: 22100–22106

      Krinner S, Butola T, Jung S, Wichmann C & Moser T (2017) RIM-Binding Protein 2 Promotes a Large Number of CaV1.3 Ca2+-Channels and Contributes to Fast Synaptic Vesicle Replenishment at Hair Cell Active Zones. Front Cell Neurosci 11: 334

      Liu H, Felix R, Gurnett CA, De Waard M, Witcher DR & Campbell KP (1996) Expression and Subunit Interaction of Voltage-Dependent Ca2+ Channels in PC12 Cells. J Neurosci 16: 7557–7565

      Lv C, Stewart WJ, Akanyeti O, Frederick C, Zhu J, Santos-Sacchi J, Sheets L, Liao JC & Zenisek D (2016) Synaptic Ribbons Require Ribeye for Electron Density, Proper Synaptic Localization, and Recruitment of Calcium Channels. Cell Reports 15: 2784–2795

      Matthews G & Fuchs P (2010) The diverse roles of ribbon synapses in sensory neurotransmission. Nat Rev Neurosci 11: 812–822

      Maxeiner S, Luo F, Tan A, Schmitz F & Südhof TC (2016) How to make a synaptic ribbon: RIBEYE deletion abolishes ribbons in retinal synapses and disrupts neurotransmitter release. The EMBO Journal 35: 1098–1114

      Michanski S, Kapoor R, Steyer AM, Möbius W, Früholz I, Ackermann F, Gültas M, Garner CC, Hamra FK, Neef J, et al (2023) Piccolino is required for ribbon architecture at cochlear inner hair cell synapses and for hearing. EMBO Rep 24: e56702

      Michanski S, Smaluch K, Steyer AM, Chakrabarti R, Setz C, Oestreicher D, Fischer C, Möbius W, Moser T, Vogl C, et al (2019) Mapping developmental maturation of inner hair cell ribbon synapses in the apical mouse cochlea. PNAS 116: 6415–6424

      Neef J, Urban NT, Ohn T-L, Frank T, Jean P, Hell SW, Willig KI & Moser T (2018) Quantitative optical nanophysiology of Ca2+ signaling at inner hair cell active zones. Nat Commun 9: 290

      Park D, Wu Y, Lee S-E, Kim G, Jeong S, Milovanovic D, Camilli PD & Chang S (2021) Cooperative function of synaptophysin and synapsin in the generation of synaptic vesicle-like clusters in non-neuronal cells. Nat Commun 12

      Picher MM, Gehrt A, Meese S, Ivanovic A, Predoehl F, Jung S, Schrauwen I, Dragonetti AG, Colombo R, Camp GV, et al (2017) Ca2+-binding protein 2 inhibits Ca2+-channel inactivation in mouse inner hair cells. PNAS 114: E1717–E1726

      Robertis ED & Franchi CM (1956) Electron Microscope Observations on Synaptic Vesicles in Synapses of the Retinal Rods and Cones. J Biophys Biochem Cytol 2: 307–318

      Roberts WM, Jacobs RA & Hudspeth AJ (1990) Colocalization of ion channels involved in frequency selectivity and synaptic transmission at presynaptic active zones of hair cells. J Neurosci 10: 3664–3684

      Smith CA & Sjöstrand FS (1961) A synaptic structure in the hair cells of the guinea pig cochlea. Journal of Ultrastructure Research 5: 184–192

      Wong AB, Rutherford MA, Gabrielaitis M, Pangršič T, Göttfert F, Frank T, Michanski S, Hell S, Wolf F, Wichmann C, et al (2014) Developmental refinement of hair cell synapses tightens the coupling of Ca2+ influx to exocytosis. EMBO J 33: 247–264

      Zampini V, Johnson SL, Franz C, Lawrence ND, Münkner S, Engel J, Knipper M, Magistretti J, Masetto S & Marcotti W (2010) Elementary properties of CaV1.3 Ca(2+) channels expressed in mouse cochlear inner hair cells. J Physiol 588: 187–199

    1. eLife Assessment

      The reported cryo-EM imaging of a pentameric ligand-gated ion channel in liposomes as opposed to nanodiscs has both broad implications and contributes valuable methodological advances to the structural investigation of membrane receptors. The comparison of structures assigned to distinct functional states in liposomes versus nanodiscs is convincing and will aid membrane protein structural biologists in selection of functionally relevant membrane reconstitution environments.

    2. Reviewer #1 (Public review):

      Summary:

      The authors, Dalal et al., determined cryo-EM structures of open, closed, and desensitized states of the pentameric ligand-gated ion channel ELIC reconstituted in liposomes, and compared them to structures determined in varying nanodisc diameters. They argue that the liposomal reconstitution method is more representative of functional ELIC channels, as they were able to test and recapitulate channel kinetics through a stopped-flow thallium flux liposomal assay. The authors and others have described channel interactions with membrane scaffold proteins (MSP), initially thought to be in a size-dependent manner. However, the authors reported that their cryo-EM ELIC structure interacts with the large nanodisc spNW25, contrary to their original hypotheses. This suggests that the channel's interactions with MSPs might alter its structure, possibly influencing the functional states of the channel. Thus, the authors argue that reconstitution in liposomes is more representative of the native structure and can recapitulate all channel states.

      Strengths:

      Cryo-EM structural determination from proteoliposomes is a promising methodology within the ion channel field due to their large surface area and lack of MSP or other membrane mimetics that could alter channel structure. The authors succeeded in comparing structures determined in liposomes to those in a wide range of nanodisc diameters. This comparison gives rise to important discussions for other membrane protein structural studies when deciding the best method for individual circumstances.

      Weaknesses:

      The overarching goal of the study was to determine structural differences of ELIC in detergent nanodiscs and liposomes. The authors stated they determined open, closed, and desensitized states of ELIC reconstituted in liposomes and suggest the desensitization gate is at the 9' region of the pore. However, limited functional data were provided to support the functional-state assignments, with most of the evidence deriving from structures, which provide only snapshots of the channel.

    3. Reviewer #2 (Public review):

      Summary

      The report by Dalal and colleagues introduces a significant novelty in the field of pentameric ligand-gated ion channels (pLGICs). Within this family of receptors, numerous structures are available, but a widely recognised problem remains in assigning structures to functional states observed in biological membranes. Here, the authors obtain both structural and functional information of a pLGIC in a liposome environment. The model receptor ELIC is captured in the resting, desensitised and open states. Structures in large nanodiscs, possibly biased by receptor-scaffold protein interactions, are also reported. Altogether these results set the stage for the adoption of liposomes as a proxy for the biological membranes, for cryoEM studies of pLGICs and membrane proteins in general.

      Strengths

      The structural data is comprehensive, with structures in liposomes in the 3 main states (and for each, both inward-facing and outward-facing), and an agonist-bound structure in the large spNW25 nanodisc (and a retreatment of previous data obtained in a smaller disc). It adds up to a series of work from the same team that constitutes a much-needed exploration of various types of environment for the transmembrane domain of pLGICs. The structural analysis is thorough.

      The tone of the report is particularly pleasant, in the sense that the authors' claims are not inflated. For instance, a sentence such as "By performing structural and functional characterization under the same reconstitution conditions, we increase our confidence in the functional annotation of these structures." is exemplary.

      Weakness

      All the details necessary to reproduce the work are present in the Methods. Nevertheless, the biochemistry might have been shown and discussed in greater detail. While I do believe that liposomes will be in most cases better than, say, nanodiscs, the process that leads from the protein in its membrane down to the liposome will play a big role in preserving the native structure.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors, Dalal et al., determined cryo-EM structures of open, closed, and desensitized states of the pentameric ligand-gated ion channel ELIC reconstituted in liposomes, and compared them to structures determined in varying nanodisc diameters. They argue that the liposomal reconstitution method is more representative of functional ELIC channels, as they were able to test and recapitulate channel kinetics through a stopped-flow thallium flux liposomal assay. The authors and others have described channel interactions with membrane scaffold proteins (MSP), initially thought to be in a size-dependent manner. However, the authors reported that their cryo-EM ELIC structure interacts with the large nanodisc spNW25, contrary to their original hypotheses. This suggests that the channel's interactions with MSPs might alter its structure, possibly not accurately representing/reflecting functional states of the channel.

      Strengths:

      Cryo-EM structural determination from proteoliposomes is a promising methodology within the ion channel field due to their large surface area and lack of MSP or other membrane mimetics that could alter channel structure. Comparing liposomal ELIC to structures in various-sized nanodiscs gives rise to important discussions for other membrane protein structural studies when deciding the best method for individual circumstances.

      Weaknesses:

      The overarching goal of the study was to determine structural differences of ELIC in detergent nanodiscs and liposomes. Including comparisons of the results to the native bacterial lipid environment would provide a more encompassing discussion of how the determined liposome structures might or might not relate to the native receptor in its native environment. The authors stated they determined open, closed, and desensitized states of ELIC reconstituted in liposomes and suggest the desensitization gate is at the 9' region of the pore. However, no functional studies were performed to validate this statement.

      The goal of this study was to determine structures of ELIC in the same lipid environment in which its function is characterized. However, it is also worth noting that phosphatidylethanolamine and phosphatidylglycerol, two lipids used for the liposome formation, are necessary for ELIC function (PMID 36385237) and are principal lipid components of the Gram-negative bacterial membranes in which ELIC is expressed.

      The desensitized structure of ELIC in liposomes shows a pore diameter at the hydrophobic L240 (9’) residue of 3.3 Å, which is anticipated to pose a large energetic barrier to the passage of ions due to the hydrophobic effect. We have included a graphical representation of pore diameters from the HOLE analysis for all liposome structures in Supplementary Figure 6B. While we have not tested the role of L240 in desensitization with functional experiments, it was shown by Gonzalez-Gutierrez and colleagues (PMID 22474383) that the L240A mutation apparently eliminates desensitization in ELIC. This finding is consistent with L240 (9’) being the desensitization gate of ELIC. We have referenced this study when discussing the desensitization gate in the Results.

      Reviewer #2 (Public review):

      Summary

      The report by Dalal and colleagues introduces a significant novelty in the field of pentameric ligand-gated ion channels (pLGICs). Within this family of receptors, numerous structures are available, but a widely recognised problem remains in assigning structures to functional states observed in biological membranes. Here, the authors obtain both structural and functional information of a pLGIC in a liposome environment. The model receptor ELIC is captured in the resting, desensitized, and open states. Structures in large nanodiscs, possibly biased by receptor-scaffold protein interactions, are also reported. Altogether, these results set the stage for the adoption of liposomes as a proxy for the biological membranes, for cryoEM studies of pLGICs and membrane proteins in general.

      Strengths

      The structural data is comprehensive, with structures in liposomes in the 3 main states (and for each, both inward-facing and outward-facing), and an agonist-bound structure in the large spNW25 nanodisc (and a retreatment of previous data obtained in a smaller disc). It adds up to a series of work from the same team that constitutes a much-needed exploration of various types of environment for the transmembrane domain of pLGICs. The structural analysis is thorough.

      The tone of the report is particularly pleasant, in the sense that the authors' claims are not inflated. For instance, a sentence such as "By performing structural and functional characterization under the same reconstitution conditions, we increase our confidence in the functional annotation of these structures." is exemplary.

      Weaknesses

      Core parts of the method are not described and/or discussed in enough detail. While I do believe that liposomes will be, in most cases, better than, say, nanodiscs, the process that leads from the protein in its membrane down to the liposome will play a big role in preserving the native structure, and should be an integral part of the report. Therefore, I strongly felt that biochemistry should be better described and discussed. The results section starts with "Optimal reconstitution of ELIC in liposomes [...] was achieved by dialysis". There is no information on why dialysis is optimal, what it was compared to, the distribution of liposome sizes using different preparation techniques, etc. Reading the title, I would have expected a couple of paragraphs and figure panels on liposome reconstitution. Similarly, potential biochemical challenges are not discussed. The methods section mentions that the sample was "dialyzed [...] over 5-7 days". In such a time window, most of the members of this protein family would aggregate, and it is therefore a protocol that cannot be directly generalised. This has to be mentioned explicitly, and a discussion on why this can't be done in two days, what else the authors tested (biobeads? ... ?) would strengthen the manuscript.

      To a lesser extent, the relative lack of both technical details and of a broad discussion also pertains to the cryoEM and thallium flux results. Regarding the cryoEM part, the authors focus their analysis on reconstructions from outward-facing particles on the basis of their better resolutions, yet there was little discussion about it. Is it common for liposome-based structures? Are inward-facing reconstructions worse because of the increased background due to electrons going through two membranes? Are there often impurities inside the liposomes (we see some in the figures)? The influence of the membrane mimetics on conformation could be discussed by referring to other families of proteins where it has been explored (for instance, ABC transporters, but I'm sure there are many other examples). If there are studies in other families of channels in liposomes that were inspirational, those could be mentioned. Regarding thallium flux assays, one argument is that they give access to kinetics and set the stage for time-resolved cryoEM, but if I did not miss it, no comparison of kinetics with other techniques, such as electrophysiology, nor references to eventual pioneer time-resolved studies are provided.

      Altogether, in my view, an updated version would benefit from insisting on every aspect of the methodological development. I may well be wrong, but I see this paper more like a milestone on sample prep for cryoEM imaging than being about the details of the ELIC conformations.

      Additions have been made to the Results and Discussion sections elaborating on the following points: 1) reconstitution of ELIC in liposomes using dialysis, the advantage of this over other methods such as biobeads, and whether the dialysis protocol can be shortened for other less stable proteins; 2) the issue of separating outward- and inward-facing channels; 3) referencing the effect of nanodiscs on ABC transporters, structures of membrane proteins in liposomes, and pioneering time-resolved cryo-EM studies; and 4) comparison of ELIC gating kinetics with electrophysiology measurements. With regard to the first point, it should be noted that all necessary details are provided in the Methods to reproduce the experiments, including the reconstitution and stopped-flow thallium flux assay. It is also important to note that the same preparation for making proteoliposomes was used for assessing function using the stopped-flow thallium flux assay and for determining the structure by cryo-EM. This is now stated in the Results.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major revisions:

      (1) The authors suggest that the desensitization gate is located at the 9' region within the pore. However, as stated by the authors, the 2' residues function as the desensitization gate in related channels. In a few of their HOLE analyzed structures (e.g. Figure 2B and 4B), there seems to be a constriction also at 2', but this finding is not discussed in the context of desensitization. Further functional testing of mutated 9' and/or 2' gates would bolster the argument for the location of the desensitization gate.

      As stated above, we have included HOLE plots of pore radius in Supplementary Fig. 6B and referenced the study showing that the L240A mutation (9’) in ELIC (PMID 22474383) appears to eliminate desensitization. This result along with the narrow pore diameter at 9’ in the desensitized structure suggests that 9’ is likely a desensitization gate in ELIC. In contrast, mutation of Q233 (2’) to a cysteine in a previous study produced a channel that still desensitizes (PMID 25960405). Since Q233 is a hydrophilic residue in contrast to L240, Q233 probably does not pose the same energetic barrier to ion translocation as L240 based on the structure.

      (2) In discussing functional states of ELIC and ELIC5 in different reconstitution methods, the authors reference constriction sites determined by HOLE analysis software. These constriction sites were key evidence for the authors to determine functional state, however, it is difficult to discern pore sizes based on the figures. Pore diameters and clear color designation (ie, green vs orange) with the figures would greatly aid their discussions.

      HOLE plots are displayed in Supplementary Fig. 6B and pore diameters are now provided in the text.

      (3) The authors had an intriguing finding that ELIC dimers are found in spNW25 scaffolds. Is there any functional evidence to suggest they could be functioning as dimers?

      There is no evidence that the function of ELIC or other pLGICs is altered by the formation of dimers of pentamers. Therefore, while this result is intriguing and likely facilitated by concentrating multiple ELIC pentamers within the nanodisc, it is not clear if these interactions have any functional importance. We have stated this in the Results.

      (4) A thallium flux assay was used to validate channel function within proteoliposomes. Proteoliposomes are known to be generally very leaky membranes; it would be good to have controls without ELIC added to determine baseline changes in fluorescence.

      We have established from multiple previous studies that liposomes composed of 2:1:1 POPC:POPE:POPG (PMID 36385237 and 31724949) do not show significant thallium flux as measured by the stopped-flow assay (PMID 29058195) in the absence of ELIC activity. Furthermore, in the present study, the data in Fig. 1A of WT ELIC shows a low thallium flux rate 60 seconds after exposure to agonist when the ion channel has mostly desensitized. Therefore, this data serves also as a control indicating that the high thallium flux rates in response to agonist (at earlier delay times) are not due to leak, but rather due to ELIC channel activity.

      Minor revisions:

      (1) Abstract and introduction. 'Liganded' should be ligand

      We removed this word and changed it to “agonist-bound” for consistency throughout the manuscript.

      (2) Inconsistent formatting of FSC graphs in Supplemental Figure 4

      The difference is a consequence of the different formatting between cryoSPARC and Relion FSC graphs.

      Reviewer #2 (Recommendations for the authors):

      Minor writing remarks:

      The present report builds on previous work from the same team, and to my eye it would be a plus if this were conveyed more explicitly. I see it as a strength to explore various developments in several papers that complement each other. E.g., in the introduction when citing reference 12 (Dalal 2024), and later when introducing ref 15 (Petroff 2022), I wish I was reminded of the main findings and how they fit with the new results.

      We have expanded on the Results and Discussion detailing key findings from these studies that are relevant to the current study.

      Suggestions for analysis:

      Data treatment. Maybe I missed it, but I wondered if C1 vs C5 treatment of the liposome data showed any interesting differences? When I think about the biological membrane, I picture it as a very crowded place with lots of neighbouring proteins. I would not be surprised if, similarly to what they do in discs, the receptor would tend to stick to, or bump into, anything present also in liposomes (a neighboring liposome, some undefined density inside the liposome).

      We attempted to perform C1 heterogeneous refinement jobs in cryoSPARC and C1 3D classification in Relion5. For the WT datasets, these did not produce 3D reconstructions that were of sufficient quality for further refinement. For ELIC5 with agonist, the C1 reconstructions were not different than the C5 reconstructions. Furthermore, there was no evidence of dimers of pentamers from the 2D or 3D treatments, unlike what was observed in the spNW25 nanodiscs. This is likely because the density of ELIC pentamers in the liposomes was too low to capture these transient interactions. We have included this information in the Methods.

      In data treatment, we sometimes find only what we're looking for. I wondered if the authors tried to find, for instance, the open and D conformations in the resting dataset during classifications.

      This is an interesting question since some population of ELIC channels could visit a desensitized conformation in the absence of agonist and this would not be detected in our flux assay. After extensive heterogeneous refinement jobs in cryoSPARC and 3D classification jobs in Relion5, we did not detect any unexpected structures such as open/desensitized conformations in the apo dataset.

      In the analysis of the M4 motions, is there info to be gained by looking at how it interacts with the rest of the TMD? For instance, I wondered if the buried surface area between M4 and the rest was changed. Also one could imagine to look at that M4 separately in outward-facing and inward-facing conformations (because the tension due to the bilayer will not be the same in the outer layer in both orientations - intuitively, I'd expect different levels of M4 motions)

      We have expanded our analysis of the structures as recommended. We determined the buried surface area between M4 and the rest of the channel in the liganded WT and ELIC5 structures in liposomes and nanodiscs, as well as the area between the TMD interfaces for these structures. There appears to be a pattern where liposome structures show less buried surface area between M4 and the rest of the channel, and less area at the TMD interfaces. Overall, this suggests that the liposome structures of ELIC in the open-channel or desensitized conformations are more loosely packed in the TMD compared to the nanodisc structures.

      We have also further discussed the issue of separating outward- and inward-facing conformations in the Results. The problem with classifying outward- and inward-facing orientations is that top/down or tilted views of the particles cannot be easily distinguished as coming from channels in one orientation or the other, unless there are conformational differences between outward- and inward-facing channels that would allow for their separation during 3D heterogeneous refinement or 3D classification. Furthermore, since the inward-facing reconstructions are of much lower resolution than the outward-facing reconstructions, we suspect that these particles are more heterogeneous possibly containing junk, multiple conformations, or particles that are both inward- and outward-facing. On the other hand, the outward-facing structures are of good quality, and therefore we are more confident that these come from a more homogeneous set of particles that are likely outward-facing (Note that most particles are outward facing based on side views of the 2D class averages). That said, when examining the conformation of M4 in outward- and inward-facing structures, we do not see any significant differences with the caveat that the inward-facing structures are of poor quality and that inward- and outward-facing particles may not have been well-separated.

    1. eLife Assessment

      This study makes the valuable claim that people track, specifically, the elasticity of control (that is, the degree to which outcome depends on how many resources - such as money - are invested), and that control elasticity is impaired in certain types of psychopathology. A novel task is introduced that provides solid evidence that this learning process occurs and that human behavior is sensitive to changes in the elasticity of control. Evidence that elasticity inference is distinct from more general learning mechanisms and is related to psychopathology remains incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the elasticity of controllability by developing a task that manipulates the probability of achieving a goal with a baseline investment (which they refer to as inelastic controllability) and the probability that additional investment would increase the probability of achieving a goal (which they refer to as elastic controllability). They found that a computational model representing the controllability and elasticity of the environment accounted better for the data than a model representing only the controllability. They also found that prior biases about the controllability and elasticity of the environment were associated with a composite psychopathology score. The authors conclude that elasticity inference and bias guide resource allocation.

      Strengths:

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform understanding of control across domains, which is a topic of great importance.

      Weaknesses:

      In its revised form, the manuscript addresses most of my previous concerns. The main remaining weakness pertains to the analyses aimed at addressing my suggestion of Bayesian updating as an alternative to the model proposed by the authors. My suggestion was to assume that people perform a form of function approximation to relate resource expenditure to success probability. The authors performed a version of this where people were weighing evidence for a few canonical functions (flat, step, linear), and found that this model underperformed theirs. However, this Bayesian model is quite constrained in its ability to estimate the function relating resources to success probability. A more robust test would be to assume a more flexible form of updating that is able to capture a wide range of distributions (e.g., using basis functions, Gaussian processes, or nonparametric estimators; see, e.g., work by Griffiths on human function learning). The benefit of testing this type of model is that it would make contact with a known form of inference that individuals engage in across various settings and therefore could offer a more parsimonious and generalizable account of function learning, whereby learning of resource elasticity is a special case. I defer to the authors as to whether they'd like to pursue this direction, but if not I think it's still important that they acknowledge that they are unable to rule out a more general process like this as an alternative to their model. This pertains also to inferences about individual differences, which currently hinge on their preferred model being the most parsimonious.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors test whether controllability beliefs and associated actions/resource allocation are modulated by things like time, effort, and monetary costs (what they call "elastic" as opposed to "inelastic" controllability). Using a novel behavioral task and computational modeling, they find that participants do indeed modulate their resources depending on whether they are in an "elastic," "inelastic," or "low controllability" environment. The authors also find evidence that psychopathology is related to specific biases in controllability.

      Strengths:

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Notably, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals important findings about how people consider components of controllability.

      Weaknesses:

      The authors have gone to great lengths to revise the manuscript to clarify their definitions of "elastic" and "inelastic" and bolster evidence for their computational model, resulting in an overall strong manuscript that is valuable for elucidating controllability dynamics and preferences. One minor weakness is that the justification for the analysis technique for the relationships between the model parameters and the psychopathology measures remains lacking given the fact that simple correlational analyses did not reveal any significant associations.

    4. Reviewer #3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome. In particular, the authors identify one key dimension: the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally argue that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea has the potential to change how we think about several major mental disorders in a substantial way and can additionally help us better understand how healthy people navigate challenging decision-making problems. More concisely, it is a very good idea.

      Unfortunately, my view is that neither the theoretical nor empirical aspects of the paper really deliver on that promise. In particular, most (perhaps all) of the interesting claims in the paper have weak empirical support.

      Starting with theory, the authors do not provide a strong formal characterization of the proposed notion of elasticity. There are existing, highly general models of controllability (e.g., Huys & Dayan, 2009; Ligneul, 2021) and the elasticity idea could naturally be embedded within one of these frameworks. The authors gesture at this in the introduction; however, this formalization is not reflected in the implemented model, which is highly task-specific. Moreover, the authors present elasticity as if it is somehow "outside of" the more general notion of controllability. However, effort and investment are just specific dimensions of action; and resources like money, strength, and skill (the "highly trained biker") are just specific dimensions of state. Accordingly, the notion of elasticity is necessarily implicitly captured by the standard model. Personally, I am compelled by the idea that effort and resource (and therefore elasticity) are particularly important dimensions, ones that people are uniquely tuned to. However, by framing elasticity as a property that is different in kind from controllability (rather than just a dimension of controllability), the authors only make it more difficult to integrate this exciting idea into generalizable models.

      Turning to experiment, the authors make two key claims: (1) people infer the elasticity of control, and (2) individual differences in how people make this inference are importantly related to psychopathology.

      Starting with claim 1, there are three subclaims here; implicitly, the authors make all three. (1A) People's behavior is sensitive to differences in elasticity, (1B) people actually represent/track something like elasticity, and (1C) people do so naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not strongly supported.

      (1B) The experiment cannot support the claim that people represent or track elasticity because effort is the only dimension over which participants can engage in any meaningful decision-making. The other dimension, selecting which destination to visit, simply amounts to selecting the location where you were just told the treasure lies. Thus, any adaptive behavior will necessarily come out in a sensitivity to how outcomes depend on effort.

      Notes on rebuttal: The argument that vehicle/destination choice is not trivial because people occasionally didn't choose the instructed location is not compelling to me; if anything, the exclusion rate is unusually low for online studies. The finding that people learn more from non-random outcomes is helpful, but this could easily be cast as standard model-based learning very much like what one measures with the Daw two-step task (nothing specific to control here). Their final argument is the strongest, that to explain behavior the model must assume "a priori that increased effort could enhance control." However, more literally, the necessary assumption is that each attempt increases the probability of success (e.g., you're more likely to get a heads in two flips than one). I suppose you can call that "elasticity inference", but I would call it basic probabilistic reasoning.

      For 1C, the claim that people infer elasticity outside of the experimental task cannot be supported because the authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips." (line 384).

      Notes on rebuttal: The authors try to retreat, saying "our research question was whether people can distinguish between elastic and inelastic controllability." I struggle to reconcile this with the claim in the abstract "These findings establish the elasticity of control as a distinct cognitive construct guiding adaptive behavior". That claim is the interesting one, and the one I am evaluating the evidence in light of.

      Finally, I turn to claim 2, that individual differences in how people infer elasticity are importantly related to psychopathology. There is much to say about the decision to treat psychopathology as a unidimensional construct (the authors claim otherwise, but see Fig 6C). However, I will keep it concrete and simply note that CCA (by design) obscures the relationship between any two variables. Thus, as suggestive as Figure 6B is, we cannot conclude that there is a strong relationship between Sense of Agency (SOA) and the elasticity bias---this result is consistent with any possible relationship (even a negative one). As it turns out, Figure S3 shows that there is effectively no relationship (r=0.03).

      Notes on rebuttal: The authors argue for CCA by appeal to the need to "account for the substantial variance that is typically shared among different forms of psychopathology". I agree. A simple correlation would indeed be fairly weak evidence. Strong evidence would show a significant correlation after *controlling for* other factors (e.g., a regression predicting elasticity bias from all subscales simultaneously). CCA effectively does the opposite, asking whether, with the help of all the parameters and all the surveys, one can find any correlation between the two sets of variables. The results are certainly suggestive, but they provide very little statistical evidence that the elasticity parameter is meaningfully related to any particular dimension of psychopathology.

      There is also a feature of the task that limits our ability to draw strong conclusions about individual differences about elasticity inference. In the original submission, the authors stated that the study was designed to be "especially sensitive to overestimation of elasticity". A straightforward consequence of this is that the resulting *empirical* estimate of estimation bias (i.e., the gamma_elasticity parameter) is itself biased. This immediately undermines any claim that references the directionality of the elasticity bias (e.g. in the abstract). Concretely, an undirected deficit such as slower learning of elasticity would appear as a directed overestimation bias.

      When we further consider that elasticity inference is the only meaningful learning/decision-making problem in the task (argued above), the situation becomes much worse. Many general deficits in learning or decision-making would be captured by the elasticity bias parameter. Thus, a conservative interpretation of the results is simply that psychopathology is associated with impaired learning and decision-making.

      Notes on rebuttal: I am very concerned to see that the authors removed the discussion of this limitation in response to my first review. I quote the original explanation here:

      - In interpreting the present findings, it needs to be noted that we designed our task to be especially sensitive to overestimation of elasticity. We did so by giving participants free 3 tickets at their initial visits to each planet, which meant that upon success with 3 tickets, people who overestimate elasticity were more likely to continue purchasing extra tickets unnecessarily. Following the same logic, had we first had participants experience 1 ticket trips, this could have increased the sensitivity of our task to underestimation of elasticity in elastic environments. Such underestimation could potentially relate to a distinct psychopathological profile that more heavily loads on depressive symptoms. Thus, by altering the initial exposure, future studies could disambiguate the dissociable contributions of overestimating versus underestimating elasticity to different forms of psychopathology.

      The logic of this paragraph makes perfect sense to me. If you assume low elasticity, you will infer that you could catch the train with just one ticket. However, when elasticity is in fact high, you would find that you don't catch the train, leading you to quickly infer high elasticity and thereby eliminating the bias. In contrast, if you assume high elasticity, you will continue purchasing three tickets and will never have the opportunity to learn that you could be purchasing only one, so the bias remains.

      The authors attempt to argue that this isn't happening using parameter recovery. However, they only report the *correlation* in the parameter, whereas the critical measure is the *bias*. Furthermore, in parameter recovery, the data-generating and data-fitting models are identical, which will yield the best possible recovery results. Although finding no bias in this setting would support the claims, it cannot outweigh the logical argument for the bias that they originally laid out. Finally, parameter recovery should be performed across the full range of plausible parameter values; using fitted parameters (a detail I could only determine by reading the code) yields biased results because the fitted parameters are themselves subject to the bias (if present). That is, if true low elasticity is inferred as high elasticity, then you will not have any examples of low elasticity in the fitted parameters and will not detect the inability to recover them.
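
      To illustrate why a recovery correlation alone cannot rule out a directional bias, here is a minimal sketch using made-up numbers (not data from the study): every recovered value is shifted upward by a constant, so the correlation is perfect while the mean bias is clearly nonzero. The function names and the offset of 0.3 are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_bias(true_values, recovered_values):
    """Average signed error (recovered minus true)."""
    diffs = [r - t for t, r in zip(true_values, recovered_values)]
    return sum(diffs) / len(diffs)


# Hypothetical "true" parameters spanning a range, and recovered values
# that are all overestimated by the same amount (assumed offset of 0.3).
true_params = [-1.0, -0.5, 0.0, 0.5, 1.0]
recovered_params = [t + 0.3 for t in true_params]

if __name__ == "__main__":
    print("correlation:", round(pearson_r(true_params, recovered_params), 3))
    print("mean bias:", round(mean_bias(true_params, recovered_params), 3))
```

      In a real recovery analysis the true parameters would be sampled across the full plausible range rather than taken from fitted values, and both statistics would be reported.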

      Minor comments:

      Below are things to keep in mind.

      The statistical structure of the task is inconsistent with the framing. In the framing, participants can make either one or two second boarding attempts (jumps) by purchasing extra tickets. The additional attempt(s) will thus succeed with probability p for one ticket and 2p - p^2 for two tickets; the p^2 captures the fact that you only take the second attempt if you fail on the first. A consequence of this is buying more tickets has diminishing returns. In contrast, in the task, participants always jumped twice after purchasing two tickets, and the probability of success with two tickets was exactly double that with one ticket. Thus, if participants are applying an intuitive causal model to the task, they will appear to "underestimate" the elasticity of control. I don't think this seriously jeopardizes the key results, but any follow-up work should ensure that the task's structure is consistent with the intuitive causal model.
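
      The discrepancy can be made concrete with a short sketch. It assumes, as in the framing, that each extra boarding attempt succeeds independently with probability p, and contrasts this with the linear rule described for the task. The function names and the example value p = 0.4 are illustrative only.

```python
def framing_success(p: float, extra_tickets: int) -> float:
    """Chance that at least one of `extra_tickets` independent attempts succeeds.

    For two tickets this equals 1 - (1 - p)^2 = 2p - p^2.
    """
    return 1.0 - (1.0 - p) ** extra_tickets


def task_success(p: float, extra_tickets: int) -> float:
    """Linear rule: two tickets give exactly double the one-ticket probability."""
    return min(1.0, p * extra_tickets)


if __name__ == "__main__":
    p = 0.4  # illustrative per-attempt success probability
    for n in (1, 2):
        print(n, round(framing_success(p, n), 4), round(task_success(p, n), 4))
```

      Under the framing, the second ticket adds less than the first (diminishing returns), whereas under the linear rule it adds the same amount.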

      The model is heuristically defined and does not reflect Bayesian updating. For example, it over-estimates maximum control by not using losses with less than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). Including forced three-ticket trials at the beginning of each round makes this less of an issue; but if you want to remove those trials, you might need to adjust the model. The need to introduce the modified model with kappa is likely another symptom of the heuristic nature of the model updating equations.

    5. Author response:

      The following is the authors’ response to the original reviews

      We thank the Reviewers for their thorough reading and thoughtful feedback. Below, we address each of the concerns raised in the public reviews, and outline our revisions that aim to further clarify and strengthen the manuscript.

      In our response, we clarify our conceptualization of elasticity as a dimension of controllability, formalizing it within an information-theoretic framework, and demonstrating that controllability and its elasticity are partially dissociable. Furthermore, we provide clarifications and additional modeling results showing that our experimental design and modeling approach are well-suited to dissociating elasticity inference from more general learning processes, and are not inherently biased to find overestimates of elasticity. Finally, we clarify the advantages and disadvantages of our canonical correlation analysis (CCA) approach for identifying latent relationships between multidimensional data sets, and provide additional analyses that strengthen the link between elasticity estimation biases and a specific psychopathology profile. 

      Public Reviews:

      Reviewer 1 (Public review): 

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment, and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform the understanding of control across domains, which is a topic of great importance.

      We thank the Reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study’s conclusion. 

      An overarching concern is that this paper is framed as addressing resource investments across domains that include time, money, and effort, and the introductory examples focus heavily on effort-based resources (e.g., exercising, studying, practicing). The experiments, though, focus entirely on the equivalent of monetary resources - participants make discrete actions based on the number of points they want to use on a given turn. While the same ideas might generalize to decisions about other kinds of resources (e.g., if participants were having to invest the effort to reach a goal), this seems like the kind of speculation that would be better reserved for the Discussion section rather than using effort investment as a means of introducing a new concept (elasticity of control) that the paper will go on to test.

      We thank the Reviewer for pointing out a lack of clarity regarding the kinds of resources tested in the present experiment. Investing additional resources in the form of extra tickets did not only require participants to pay more money. It also required them to invest additional time – since each additional ticket meant making another attempt to board the vehicle, extending the duration of the trial, and attentional effort – since every attempt required precisely timing a spacebar press as the vehicle crossed the screen. Given this involvement of money, time, and effort resources, we believe it would be imprecise to present the study as concerning monetary resources in particular. That said, we agree with the Reviewer that results might differ depending on the resource type that the experiment or the participant considers most. Thus, we now clarify the kinds of resources the experiment involved (lines 87-97): 

      “To investigate how people learn the elasticity of control, we allowed participants to invest different amounts of resources in attempting to board their preferred vehicle. Participants could purchase one (40 coins), two (60 coins), or three tickets (80 coins) or otherwise walk for free to the nearest location. Participants were informed that a single ticket allowed them to board only if the vehicle stopped at the station, while additional tickets provided extra chances to board even after the vehicle had left the platform. For each additional ticket, the chosen vehicle appeared moving from left to right across the screen, and participants could attempt to board it by pressing the spacebar when it reached the center of the screen. Thus, each additional ticket could increase the chance of boarding but also required a greater investment of resources—decreasing earnings, extending the trial duration, and demanding attentional effort to precisely time a button press when attempting to board.”

      In addition, in the revised discussion, we now highlight the open question of whether inferences concerning the elasticity of control generalize across different resource domains (lines 341-348):

      “Another interesting possibility is that individual elasticity biases vary across different resource types (e.g., money, time, effort). For instance, a given individual may assume that controllability tends to be highly elastic to money but inelastic to effort. Although the task incorporated multiple resource types (money, time, and attentional effort), the results may differ depending on the type of resources on which the participant focuses. Future studies could explore this possibility by developing tasks that separately manipulate elasticity with respect to different resource types. This would clarify whether elasticity biases are domain-specific or domain-general, and thus elucidate their impact on everyday decision-making.”

      Setting aside the framing of the core concepts, my understanding of the task is that it effectively captures people's estimates of the likelihood of achieving their goal (Pr(success)) conditional on a given investment of resources. The ground truth across the different environments varies such that this function is sometimes flat (low controllability), sometimes increases linearly (elastic controllability), and sometimes increases as a step function (inelastic controllability). If this is accurate, then it raises two questions.

      First, on the modeling front, I wonder if a suitable alternative to the current model would be to assume that the participants are simply considering different continuous functions like these and, within a Bayesian framework, evaluating the probabilistic evidence for each function based on each trial's outcome. This would give participants an estimate of the marginal increase in Pr(success) for each ticket, and they could then weigh the expected value of that ticket choice (Pr(success)*150 points) against the marginal increase in point cost for each ticket. This should yield similar predictions for optimal performance (e.g., opt-out for lower controllability environments, i.e., flatter functions), and the continuous nature of this form of function approximation also has the benefit of enabling tests of generalization to predict changes in behavior if there was, for instance, changes in available tickets for purchase (e.g., up to 4 or 5) or changes in ticket prices. Such a model would of course also maintain a critical role for priors based on one's experience within the task as well as over longer timescales, and could be meaningfully interpreted as such (e.g., priors related to the likelihood of success/failure and whether one's actions influence these). It could also potentially reduce the complexity of the model by replacing controllability-specific parameters with multiple candidate functions (presumably learned through past experience, and/or tuned by experience in this task environment), each of which is being updated simultaneously.

      We thank the Reviewer for suggesting this interesting alternative modeling approach. We agree that a Bayesian framework evaluating different continuous functions could offer advantages, particularly in its ability to generalize to other ticket quantities and prices. To test the Reviewer's suggestion, we implemented a Bayesian model where participants continuously estimate both controllability and its elasticity as a mixture of three archetypal functions mapping ticket quantities to success probabilities. The flat function provides no control regardless of how many tickets are purchased (corresponding to low controllability). The step function provides the same level of control as long as at least one ticket is purchased (inelastic controllability). The linear function increases control proportionally with each additional ticket (elastic controllability). The model computes the likelihood that each of the functions produced each new observation, and accordingly updates its beliefs. Using these beliefs, the model estimates the probability of success for purchasing each number of tickets, allowing participants to weigh expected control against increasing ticket costs. Despite its theoretical advantages for generalization to different ticket quantities, this continuous function approximation model performed significantly worse than our elastic controllability model (log Bayes Factor > 4100 on combined datasets). We surmise that the main advantage offered by the elastic controllability model is that it does not assume a linear increase in control as a function of resource investment – even though this linear relationship was actually true in our experiment and is required for generalizing to other ticket quantities, it likely does not match what participants were doing. We present these findings in a new section ‘Testing alternative methods’ (lines 686-701):

      “We next examined whether participant behavior would be better characterized as a continuous function approximation rather than the discrete inferences in our model. To test this, we implemented a Bayesian model where participants continuously estimate both controllability and its elasticity as a mixture of three archetypal functions mapping ticket quantities to success probabilities. The flat function provides no control regardless of how many tickets are purchased (corresponding to low controllability). The step function provides full control as long as at least one ticket is purchased (inelastic controllability). The linear function linearly increases control with the number of extra tickets (i.e., 0%, 50%, and 100% control for 1, 2, and 3 tickets, respectively; elastic controllability). The model computes the likelihood that each of the functions produced each new observation, and accordingly updates its beliefs. Using these beliefs, the model estimates the probability of success for purchasing each number of tickets, allowing participants to weigh expected control against increasing ticket costs. Despite its theoretical advantages for generalization to different ticket quantities, this continuous function approximation model performed significantly worse than the elastic controllability model (log Bayes Factor > 4100 on combined datasets), suggesting that participants did not assume that control increases linearly with resource investment.”
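
      For readers who want to see the general shape of such a model, the following is a minimal sketch of Bayesian updating over the three archetypal functions. It is not the authors' implementation. The control levels per ticket follow the description above; the mapping from control to success probability (`BASE`, `CEIL`) is an assumption made here for illustration, and the reward of 150 and ticket costs of 40, 60, and 80 are the values mentioned elsewhere in this exchange.

```python
# Control afforded by 1, 2, or 3 tickets under each archetypal function.
CONTROL = {
    "flat": {1: 0.0, 2: 0.0, 3: 0.0},    # low controllability
    "step": {1: 1.0, 2: 1.0, 3: 1.0},    # inelastic controllability
    "linear": {1: 0.0, 2: 0.5, 3: 1.0},  # elastic controllability
}

# ASSUMPTION: success probability rises from BASE (no control) to CEIL
# (full control). These two numbers are hypothetical.
BASE, CEIL = 0.2, 0.9

REWARD = 150
COSTS = {1: 40, 2: 60, 3: 80}


def p_success(function_name, tickets):
    """Success probability implied by one function for a ticket count."""
    return BASE + CONTROL[function_name][tickets] * (CEIL - BASE)


def update(beliefs, tickets, success):
    """Posterior over functions after one observed outcome (Bayes' rule)."""
    unnormalized = {}
    for name, prior in beliefs.items():
        p = p_success(name, tickets)
        unnormalized[name] = prior * (p if success else 1.0 - p)
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


def predicted_success(beliefs, tickets):
    """Belief-weighted success probability for a ticket count."""
    return sum(w * p_success(name, tickets) for name, w in beliefs.items())


def expected_values(beliefs):
    """Expected reward minus ticket cost for each ticket count."""
    return {n: predicted_success(beliefs, n) * REWARD - COSTS[n] for n in COSTS}


if __name__ == "__main__":
    beliefs = {name: 1.0 / 3.0 for name in CONTROL}  # uniform prior
    beliefs = update(beliefs, tickets=3, success=True)
    beliefs = update(beliefs, tickets=1, success=False)
    print(beliefs)
    print(expected_values(beliefs))
```

      In this sketch, a success with three tickets followed by a failure with one ticket shifts belief toward the linear (elastic) function, which in turn raises the expected value of buying more tickets.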

      We also refer to this analysis in our updated discussion (lines 326-339):

      “Second, future models could enable generalization to levels of resource investment not previously experienced. For example, controllability and its elasticity could be jointly estimated via function approximation that considers control as a function of invested resources. Although our implementation of this model did not fit participants’ choices well (see Methods), other modeling assumptions or experimental designs may offer a better test of this idea.”

      Second, if the reframing above is apt (regardless of the best model for implementing it), it seems like the taxonomy being offered by the authors risks a form of "jangle fallacy," in particular by positing distinct constructs (controllability and elasticity) for processes that ultimately comprise aspects of the same process (estimation of the relationship between investment and outcome likelihood). Which of these two frames is used doesn't bear on the rigor of the approach or the strength of the findings, but it does bear on how readers will digest and draw inferences from this work. It is ultimately up to the authors which of these they choose to favor, but I think the paper would benefit from some discussion of a common-process alternative, at least to prevent too strong of inferences about separate processes/modes that may not exist. I personally think the approach and findings in this paper would also be easier to digest under a common-construct approach rather than forcing new terminology but, again, I defer to the authors on this.

      We acknowledge the Reviewer's important point about avoiding a potential "jangle fallacy." We entirely agree with the Reviewer that elasticity and controllability inferences are not distinct processes. Specifically, we view resource elasticity as a dimension of controllability, hence the name of our ‘elastic controllability’ model. In response to this and other Reviewers’ comments, in the revised manuscript, we now offer a formal definition of elasticity as the reduction in uncertainty about controllability due to knowing the amount of resources available to the agent (lines 16-20; see further details in response to Reviewer 3 below).  

      With respect to how this conceptualization is expressed in the modeling, we note that the representation in our model of maximum controllability and its elasticity via different variables is analogous to how a distribution may be represented by separate mean and variance parameters. Even the model suggested by the Reviewer required a dedicated variable representing elastic controllability, namely the probability of the linear controllability function. More generally, a single-process account allows that different aspects of the said process would be differently biased (e.g., one can have an accurate estimate of the mean of a distribution but overestimate its variance). Therefore, our characterization of distinct elasticity and controllability biases (or to put it more accurately, 'elasticity of controllability bias' and 'maximum controllability bias') is consistent with a common construct account.

      To avoid misunderstandings, we have now modified the text to clarify that we view elasticity as a dimension of controllability that can only be estimated in conjunction with controllability. Here are a few examples:

      Lines 21-28: “While only controllable environments can be elastic, the inverse is not necessarily true – controllability can be high, yet inelastic to invested resources – for example, choosing between bus routes affords equal control over commute time to anyone who can afford the basic fare (Figure 1; Supplementary Note 1). That said, since all actions require some resource investment, no controllable environment is completely inelastic when considering the full spectrum of possible agents, including those with insufficient resources to act (e.g., those unable to purchase a bus fare or pay for a fixed-price meal).”

      Lines 45-47: “Experimental paradigms to date have conflated overall controllability and its elasticity, such that controllability was either low or elastic[16-20]. The elasticity of control, however, must be dissociated from overall controllability to accurately diagnose mismanagement of resources.”

      Lines 70-72: “These findings establish elasticity as a crucial dimension of controllability that guides adaptive behavior, and a computational marker of control-related psychopathology.”

      Lines 87-88: “To investigate how people learn the elasticity of control, we allowed participants to invest different amounts of resources in attempting to board their preferred vehicle.”

      Reviewer 2 (Public review):

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Interestingly, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals some important findings about how people consider components of controllability.

      We appreciate the Reviewer's positive assessment of our findings and computational approach to dissociating elasticity and overall controllability.

      The primary weakness of this research is that it is not entirely clear what is meant by "elastic" and "inelastic" and how these constructs differ from existing considerations of various factors/calculations that contribute to perceptions of and decisions about controllability. I think this weakness is primarily an issue of framing, where it's not clear whether elasticity is, in fact, theoretically dissociable from controllability. Instead, it seems that the elements that make up "elasticity" are simply some of the many calculations that contribute to controllability. In other words, an "elastic" environment is inherently more controllable than an "inelastic" one, since both environments might have the same level of predictability, but in an "elastic" environment, one can also partake in additional actions to have additional control over achieving the goal (i.e., expend effort, money, time).

      We thank the Reviewer for highlighting the lack of clarity about the concept of elasticity. We first clarify that elasticity cannot be entirely dissociated from controllability because it is a dimension of controllability. If no controllability is afforded, then there cannot be elasticity or inelasticity. This is why in describing the experimental environments, we only label high-controllability, but not low-controllability, environments as ‘elastic’ or ‘inelastic’. For further details on this conceptualization of elasticity, and associated revisions of the text, see our response above to Reviewer 1. 

      Second, we now clarify that controllability can also be computed without knowing the amount of resources the agent is able and willing to invest, for instance by assuming infinite resources available or a particular distribution of resource availabilities. However, knowing the agent’s available resources often reduces uncertainty concerning controllability. This reduction in uncertainty is what we define as elasticity. Since any action requires some resources, this means that no controllable environment is entirely inelastic if we also consider agents that do not have enough resources to commit any action. However, even in this case, environments can differ in the degree to which they are elastic. For further details on this formal definition, and associated revisions of the text, see our response to Reviewer 3.

      Importantly, whether an environment is more or less elastic does not fully determine whether it is more or less controllable. In particular, environments can be more controllable yet less elastic. This is true even if we allow that investing different levels of resources (i.e., purchasing 0, 1, 2, or 3 tickets) constitutes different actions, in conjunction with participants’ vehicle choices. Below, we show this using two existing definitions of controllability.

      Definition 1, reward-based controllability[1]: If control is defined as the fraction of available reward that is controllably achievable, and we assume all participants are in principle willing and able to invest 3 tickets, controllability can be computed in the present task as:

      𝜒 = max<sub>A,C</sub> P( S'= goal ∣ 𝑆, 𝐴, 𝐶 ) – min<sub>A,C</sub> P( S'= goal ∣ 𝑆, 𝐴, 𝐶 )

      where P( S'= goal ∣ 𝑆, 𝐴, 𝐶 ) is the probability of reaching the treasure from present state 𝑆 when taking action A and investing C resources in executing the action. In any of the task environments, the probability of reaching the goal is maximized by purchasing 3 tickets (𝐶 = 3) and choosing the vehicle that leads to the goal (𝐴 = correct vehicle). Conversely, the probability of reaching the goal is minimized by purchasing 3 tickets (𝐶 = 3) and choosing the vehicle that does not lead to the goal (𝐴 = wrong vehicle). This calculation is thus entirely independent of elasticity, since it only considers what would be achieved by maximal resource investment, whereas elasticity consists of the reduction in controllability that would arise if the maximal available 𝐶 is reduced. Consequently, any environment where the maximum available control is higher yet varies less with resource investment would be more controllable and less elastic. 

      Note that if we also account for ticket costs in calculating reward, this will only reduce the fraction of achievable reward and thus the calculated control in elastic environments.   

      Definition 2, information-theoretic controllability[2]: Here controllability is defined as the reduction in outcome entropy due to knowing which action is taken and how many resources are invested:

      I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C)

      where H(S'|S) is the conditional entropy of the distribution of outcomes S' given the present state S, and H(S'|S, A, C) is the conditional entropy of the outcome given the present state, action, and resource investment. 

      To compare controllability, we consider two environments with the same maximum control:

      • Inelastic environment: If the correct vehicle is chosen, there is a 100% chance of reaching the goal state with 1, 2, or 3 tickets. Thus, out of 7 possible action-resource investment combinations, three deterministically lead to the goal state (≥1 tickets and correct vehicle choice), three never lead to it (≥1 tickets and wrong vehicle choice), and one (0 tickets) leads to it 20% of the time (since walking leads to the treasure on 20% of trials).

      • Elastic environment: If the correct vehicle is chosen, the probability of boarding it is 0% with 1 ticket, 50% with 2 tickets, and 100% with 3 tickets. Thus, out of 7 possible action-resource investment combinations, one deterministically leads to the goal state (3 tickets and correct vehicle choice), one never leads to it (3 tickets and wrong vehicle choice), one leads to it 60% of the time (2 tickets and correct vehicle choice: 50% boarding + 50% × 20% when failing to board), one leads to it 10% of the time (2 tickets and wrong vehicle choice), and three lead to it 20% of the time (0-1 tickets).
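The two environments above can be encoded as tables of goal probabilities, which makes it easy to check that they share the same maximum control. The following is a minimal sketch, not the authors' code: the helper names (`goal_probs`, `max_control`) are hypothetical, and it assumes that reward-based control is read as the best minus the worst achievable goal probability.

```python
# Sketch only: encodes the two environments described above.
# Assumption: failing to board means walking, which reaches the goal 20% of the time.
WALK = 0.2

def goal_probs(p_board):
    """Map each (vehicle, tickets) combination to P(goal).

    p_board maps a ticket count (1-3) to the probability of boarding.
    """
    probs = {("walk", 0): WALK}  # 0 tickets: no vehicle is boarded
    for tickets, pb in p_board.items():
        probs[("correct", tickets)] = pb + (1 - pb) * WALK
        probs[("wrong", tickets)] = (1 - pb) * WALK
    return probs

inelastic = goal_probs({1: 1.0, 2: 1.0, 3: 1.0})
elastic = goal_probs({1: 0.0, 2: 0.5, 3: 1.0})

def max_control(probs):
    # Assumed reading of Definition 1: best minus worst goal probability.
    return max(probs.values()) - min(probs.values())

print(max_control(inelastic), max_control(elastic))
```

Under these assumptions both environments yield the same value, because the maximum and minimum are both attained with 3 tickets regardless of how boarding probability scales with fewer tickets.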

      Here we assume a uniform prior over actions, which renders the information-theoretic definition of controllability equal to another definition termed ‘instrumental divergence’[3,4]. We note that changing the uniform prior assumption would change the results for the two environments, but that would not change the general conclusion that there can be environments that are more controllable yet less elastic. 

      Step 1: Calculating H(S'|S)

      For the inelastic environment:

      P(goal) = (3 × 100% + 3 × 0% + 1 × 20%)/7 = .46, P(non-goal) = .54

      H(S'|S) = – [.46 × log<sub>2</sub>(.46) + .54 × log<sub>2</sub>(.54)] = 1 bit

      For the elastic environment:

      P(goal) = (1 × 100% + 1 × 0% + 1 × 60% + 1 × 10% + 3 × 20%)/7 = .33, P(non-goal) = .67

      H(S'|S) = – [.33 × log<sub>2</sub>(.33) + .67 × log<sub>2</sub>(.67)] = .91 bits

      Step 2: Calculating H(S'|S, A, C)

      Inelastic environment: Six action-resource investment combinations have deterministic outcomes entailing zero entropy, whereas investing 0 tickets has a probabilistic outcome (20%). The entropy for 0 tickets is: H(S'|C = 0) = – [.2 × log<sub>2</sub>(.2) + .8 × log<sub>2</sub>(.8)] = .72 bits. Since this action-resource investment combination is chosen with probability 1/7, the total conditional entropy is approximately .10 bits.

      Elastic environment: 2 actions have deterministic outcomes (3 tickets with correct/wrong vehicle), whereas the other 5 actions have probabilistic outcomes:

      2 tickets and correct vehicle (60% success): H(S'|A = correct, C = 2) = – [.6 × log<sub>2</sub>(.6) + .4 × log<sub>2</sub>(.4)] = .97 bits

      2 tickets and wrong vehicle (10% success): H(S'|A = wrong, C = 2) = – [.1 × log<sub>2</sub>(.1) + .9 × log<sub>2</sub>(.9)] = .47 bits

      0-1 tickets (20% success): H(S'|C = 0-1) = – [.2 × log<sub>2</sub>(.2) + .8 × log<sub>2</sub>(.8)] = .72 bits

      Thus the total conditional entropy of the elastic environment is: H(S'|S, A, C) = (1/7) × .97 + (1/7) × .47 + (3/7) × .72 = .52 bits

      Step 3: Calculating I(S'; A, C | S)

      Inelastic environment: I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C) = 1 – 0.1 = .9 bits 

      Elastic environment: I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C) = .91 – .52 = .39 bits

      Thus, the inelastic environment offers higher information-theoretic controllability (.9 bits) compared to the elastic environment (.39 bits). 
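The three steps can be reproduced numerically. The sketch below is illustrative rather than the authors' code: the function names are hypothetical, and it assumes a uniform prior over the 7 action-resource combinations and a binary outcome (goal versus non-goal), as in the calculation above.

```python
from math import log2

def h2(p):
    """Binary entropy in bits of an outcome with success probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def controllability_bits(goal_probs):
    """I(S'; A, C | S) under a uniform prior over action-resource combinations."""
    n = len(goal_probs)
    h_marginal = h2(sum(goal_probs) / n)                 # Step 1: H(S'|S)
    h_conditional = sum(h2(p) for p in goal_probs) / n   # Step 2: H(S'|S, A, C)
    return h_marginal - h_conditional                    # Step 3

# P(goal) for each of the 7 combinations, taken from the environment descriptions.
inelastic = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.2]
elastic = [1.0, 0.0, 0.6, 0.1, 0.2, 0.2, 0.2]

print(round(controllability_bits(inelastic), 2))
print(round(controllability_bits(elastic), 2))
```

Without rounding the intermediate entropies, the values come out at roughly .89 and .40 bits, so they differ slightly in the second decimal from the figures above, which were computed from rounded intermediate values. The ordering of the two environments is unchanged.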

      Of note, even if each combination of cost and success/failure to reach the goal is defined as a distinct outcome, then information-theoretic controllability is higher for the inelastic (2.81 bits) than for the elastic (2.30 bits) environment. These calculations are now included in the Supplementary materials (Supplementary Note 1). 

      In sum, for both definitions of controllability, we see that environments can be more elastic yet less controllable. We have also revised the manuscript to clarify this distinction (lines 21-28):

      “While only controllable environments can be elastic, the inverse is not necessarily true – controllability can be high, yet inelastic to invested resources – for example, choosing between bus routes affords equal control over commute time to anyone who can afford the basic fare (Figure 1; Supplementary Note 1). That said, since all actions require some resource investment, no controllable environment is completely inelastic when considering the full spectrum of possible agents, including those with insufficient resources to act (e.g., those unable to purchase a bus fare or pay for a fixed-price meal).”

      Reviewer 3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes (correctly, I think) that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome. In particular, the authors highlight the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally propose that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea thus has the potential to change how we think about mental disorders in a substantial way, and could even help us better understand how healthy people navigate challenging decision-making problems.

      Unfortunately, my view is that neither the theoretical nor empirical aspects of the paper really deliver on that promise. In particular, most (perhaps all) of the interesting claims in the paper have weak empirical support.

      We appreciate the Reviewer's thoughtful engagement with our research and recognition of the potential significance of distinguishing between different dimensions of control in understanding psychopathology. We believe that all the Reviewer’s comments can be addressed with clarifications or additional analyses, as detailed below.  

      Starting with theory, the elasticity idea does not truly "extend" the standard control model in the way the authors suggest. The reason is that effort is simply one dimension of action. Thus, the proposed model ultimately grounds out in how strongly our outcomes depend on our actions (as in the standard model). Contrary to the authors' claims, the elasticity of control is still a fixed property of the environment. Consistent with this, the computational model proposed here is a learning model of this fixed environmental property. The idea is still valuable, however, because it identifies a key dimension of action (namely, effort) that is particularly relevant to the notion of perceived control. Expressing the elasticity idea in this way might support a more general theoretical formulation of the idea that could be applied in other contexts. See Huys & Dayan (2009), Zorowitz, Momennejad, & Daw (2018), and Gagne & Dayan (2022) for examples of generalizable formulations of perceived control.

      We thank the Reviewer for the suggestion that we formalize our concept of elasticity to resource investment, which we agree is a dimension of action. We first note that we have not argued against the claim that elasticity is a fixed property of the environment. We surmise the Reviewer might have misread our statement that “controllability is not a fixed property of the environment”. The latter statement is motivated by the observation that controllability is often higher for agents that can invest more resources (e.g., a richer person can buy more things). We clarify this in our revision of the manuscript in lines 8-15 (changes in bold): 

      “The degree of control we possess over our environment, however, may itself depend on the resources we are willing and able to invest. For example, the control a biker has over their commute time depends on the power they are willing and able to invest in pedaling. In this respect, a highly trained biker would typically have more control than a novice. Likewise, the control a diner in a restaurant has over their meal may depend on how much money they have to spend. In such situations, controllability is not fixed but rather elastic to available resources (i.e., in the same sense that supply and demand may be elastic to changing prices[14]).”

      To formalize elasticity, we build on Huys & Dayan’s definition of controllability[1] as the fraction of reward that is controllably achievable, 𝜒 (though using information-theoretic definitions[2,3] would work as well). To the extent that this fraction depends on the amount of resources the agent is able and willing to invest (max 𝐶), this formulation can be probabilistically computed without information about the particular agent involved, specifically, by assuming a certain distribution of agents with different amounts of available resources. This would result in a probability distribution over 𝜒. Elasticity can thus be defined as the amount of information obtained about controllability due to knowing the amount of resources available to the agent: I(𝜒; max 𝐶). We have added this formal definition to the manuscript (lines 15-20): 

      “To formalize how elasticity relates to control, we build on an established definition of controllability as the fraction of reward that is controllably achievable[15], 𝜒. Uncertainty about this fraction could result from uncertainty about the amount of resources that the agent is able and willing to invest, 𝑚𝑎𝑥 𝐶. Elasticity can thus be defined as the amount of information obtained about controllability by knowing the amount of available resources: 𝐼(𝜒; 𝑚𝑎𝑥 𝐶).”
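To make the definition 𝐼(𝜒; 𝑚𝑎𝑥 𝐶) concrete, it can be evaluated for the two task environments discussed in the response to Reviewer 2. This is a sketch under stated assumptions, not the authors' implementation: agents' budgets (max 𝐶) are assumed to be uniformly distributed over 0 to 3 tickets, 𝜒 is assumed to be the best minus the worst goal probability available within budget, and the names `chi` and `elasticity_bits` are hypothetical.

```python
from collections import Counter
from math import log2

WALK = 0.2  # assumed chance of reaching the goal on foot

def chi(p_board, max_c):
    """Assumed controllability for an agent who can afford at most max_c tickets."""
    probs = [WALK]  # investing 0 tickets is always possible
    for tickets in range(1, max_c + 1):
        pb = p_board[tickets]
        probs += [pb + (1 - pb) * WALK,   # correct vehicle
                  (1 - pb) * WALK]        # wrong vehicle
    return round(max(probs) - min(probs), 6)

def elasticity_bits(p_board, budgets=(0, 1, 2, 3)):
    """I(chi; max C) for budgets drawn uniformly.

    chi is a deterministic function of the budget here, so the mutual
    information reduces to the entropy of chi.
    """
    counts = Counter(chi(p_board, b) for b in budgets)
    n = len(budgets)
    return -sum(k / n * log2(k / n) for k in counts.values())

inelastic = {1: 1.0, 2: 1.0, 3: 1.0}
elastic = {1: 0.0, 2: 0.5, 3: 1.0}

print(elasticity_bits(inelastic), elasticity_bits(elastic))
```

With these assumptions the elastic environment carries more information about controllability than the inelastic one. The inelastic environment is not at zero, since agents with no tickets have no control at all, which matches the point that no controllable environment is completely inelastic across the full spectrum of agents.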

      Turning to experiment, the authors make two key claims: (1) people infer the elasticity of control, and (2) individual differences in how people make this inference are importantly related to psychopathology. Starting with claim 1, there are three sub-claims here; implicitly, the authors make all three. (1A) People's behavior is sensitive to differences in elasticity, (1B) people actually represent/track something like elasticity, and (1C) people do so naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not supported. Starting with 1B, the experiment cannot support the claim that people represent or track elasticity because the effort is the only dimension over which participants can engage in any meaningful decision-making (the other dimension, selecting which destination to visit, simply amounts to selecting the location where you were just told the treasure lies). Thus, any adaptive behavior will necessarily come out in a sensitivity to how outcomes depend on effort. More concretely, any model that captures the fact that you are more likely to succeed in two attempts than one will produce the observed behavior. The null models do not make this basic assumption and thus do not provide a useful comparison.

      We appreciate the Reviewer's critical analysis of our claims regarding elasticity inference, which as detailed below, has led to an important new analysis that strengthens the study’s conclusions. However, we respectfully disagree with two of the Reviewer’s arguments. First, resource investment was not the only meaningful decision dimension in our task, since participants also needed to choose the correct vehicle to get to the right destination. That this was not trivial is evidenced by our exclusion of over 8% of participants who made incorrect vehicle choices more than 10% of the time. Included participants also occasionally erred in this choice (mean error rate = 3%, range [0-10%]; now specified in lines 363-366). 

      Second, the experimental task cannot be solved well by a model that simply tracks how outcomes depend on effort because 20% of the time participants reached the treasure despite failing to board their vehicle of choice. In such cases, reward outcomes and control were decoupled. Participants could identify when this was the case by observing the starting location (since depending on the starting location, the treasure location could have been automatically reached by walking), which was revealed together with the outcome. To determine whether participants distinguished between control-related and non-control-related reward, we have now fitted a variant of our model to the data that allows learning from each of these kinds of outcomes by means of a different free parameter. The results show that participants learned considerably more from control-related outcomes. They were thus not merely tracking outcomes, but specifically inferred when outcomes can be attributed to control. We now include this new analysis in the revised manuscript (Methods lines 648-661):

      “To ascertain that participants were truly learning latent estimates of controllability rather than simpler associations, we conducted two complementary analyses.

      First, we implemented a simple Q-learning model that directly maps ticket quantities to expected values based on reward prediction errors, without representing latent controllability. This associative model performed substantially worse than even our simple controllability model (log Bayes Factor ≥ 1854 on the combined datasets). Second, we fitted a variant of the elastic controllability model that compared learning from control-related versus chance outcomes via separate parameters (instead of assuming no learning from chance outcomes). Chance outcomes were observed by participants in the 20% of trials where reward and control were decoupled, in the sense that participants reached the treasure regardless of whether they boarded their vehicle of choice. Results showed that participants learned considerably more from control-related, as compared to chance, outcomes (mean learning ratio=1.90, CI= [1.83, 1.97]). Together, these analyses show that participants were forming latent controllability estimates rather than direct action-outcome associations.”
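For readers unfamiliar with the associative baseline, the kind of update it relies on can be sketched as follows. This is a generic delta-rule illustration and not the model fitted in the manuscript: the learning rate `alpha`, the zero initial values, and the omission of ticket costs and of a choice rule are all assumptions made for brevity.

```python
def q_update(q, tickets, reward, alpha=0.1):
    """One delta-rule step: move Q[tickets] toward the observed reward.

    q maps each ticket quantity (0-3) to an expected value. No latent
    controllability is represented; values change only via prediction errors.
    """
    prediction_error = reward - q[tickets]
    updated = dict(q)
    updated[tickets] = q[tickets] + alpha * prediction_error
    return updated

# Hypothetical usage: a trial with 3 tickets that ended at the treasure.
q = {tickets: 0.0 for tickets in range(4)}
q = q_update(q, tickets=3, reward=1.0)
print(q)
```

A learner of this kind treats a treasure reached by chance exactly like one reached through control, which is the distinction the second analysis in the quoted passage is designed to test.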

      Controllability inference by itself, however, still does not suffice to explain the observed behavior. This is shown by our ‘controllability’ model, which learns to invest more resources to improve control, yet still fails to capture key features of participants’ behavior, as detailed in the manuscript. This means that explaining participants’ behavior requires a model that not only infers controllability, beyond merely outcome probability, but also assumes a priori that increased effort could enhance control. Building this a priori assumption into the model amounts to embedding within it an understanding of elasticity – the idea that control over the environment may be increased by greater resource investment. 

      That being said, we acknowledge the value in considering alternative computational formulations of adaptation to elasticity, as now expressed in the revised discussion (lines 326-333; reproduced below in response to the Reviewer’s comment on updating controllability beliefs when losing with less than 3 tickets).

      For 1C, the claim that people infer elasticity outside of the experimental task cannot be supported because the authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips." (line 384).

      We thank the Reviewer for highlighting this point. We agree that our experimental design does not test whether people infer elasticity spontaneously. However, our research question was whether people can distinguish between elastic and inelastic controllability. The results strongly support that they can, and this does have potential implications for behavior outside of the experimental task. Specifically, to the extent that people are aware that in some contexts additional resource investment improves control, whereas in other contexts it does not, then our results indicate that they would be able to distinguish between these two kinds of contexts through trial-and-error learning. That said, we agree that investigating whether and how people spontaneously infer elasticity is an interesting direction for future work. We have now added this to the discussion of future directions (lines 287-295):

      “Additionally, real life typically doesn’t offer the streamlined recurrence of homogenized experiences that makes learning easier in experimental tasks, nor are people systematically instructed and trained about elastic and inelastic control in each environment. These complexities introduce substantial additional uncertainty into inferences of elasticity in naturalistic settings, thus allowing more room for prior biases to exert their influences. The elasticity biases observed in the present studies are therefore likely to be amplified in real-life behavior. Future research should examine how these complexities affect judgments about the elasticity of control to better understand how people allocate resources in real life.”

      Finally, I turn to claim 2, that individual differences in how people infer elasticity are importantly related to psychopathology. There is much to say about the decision to treat psychopathology as a unidimensional construct. However, I will keep it concrete and simply note that CCA (by design) obscures the relationship between any two variables. Thus, as suggestive as Figure 6B is, we cannot conclude that there is a strong relationship between Sense of Agency and the elasticity bias---this result is consistent with any possible relationship (even a negative one). The fact that the direct relationship between these two variables is not shown or reported leads me to infer that they do not have a significant or strong relationship in the data.

      We agree that CCA is not designed to reveal the relationship between any two variables. However, the advantage of this analysis is that it pulls together information from multiple variables. Doing so does not treat psychopathology as unidimensional. Rather, it seeks a particular dimension that most strongly correlates with different aspects of task performance.

      This is especially useful for multidimensional psychopathology data because such data are often dominated by strong correlations between dimensions, whereas the research seeks to explain the distinctions between the dimensions. Similar considerations apply to the multidimensional task parameters, which although less correlated, may still jointly predict the relevant psychopathological profile better than each parameter does in isolation. Thus, the CCA enabled us to identify a general relationship between task performance and psychopathology that accounts for different symptom measures and aspects of controllability inference. 

      Using CCA can thus reveal relationships that do not readily show up in two-variable analyses. Indeed, the direct correlation between Sense of Agency (SOA) and elasticity bias was not significant – a result that, for completeness, we now report in Supplementary Figure 3 along with all other direct correlations. We note, however, that the CCA analysis was preregistered and its results were replicated. Additionally, participants scoring higher on the psychopathology profile also overinvested resources in inelastic environments but did not futilely invest in uncontrollable environments (Figure 6A), providing external validation to the conclusion that the CCA captured meaningful variance specific to elasticity inference. Most importantly, an auxiliary analysis specifically confirmed the contributions of both elasticity bias (Figure 6D, middle plot) and, although not reported in the original paper, of the Sense of Agency score (SOA; p=.03 permutation test; see updated Figure 6D, bottom plot) to the observed canonical correlation. The results thus enable us to safely conclude that differences in elasticity inferences are significantly associated with a profile of control-related psychopathology to which SOA contributed significantly. We now report this when presenting the CCA results (lines 255-257): 

      “Loadings on the side of psychopathology were dominated by an impaired sense of agency (SOA; contribution to canonical correlation: p=.03, Figure 6D, bottom plot), along with obsessive compulsive symptoms (OCD), and social anxiety (LSAS) – all symptoms that have been linked to an impaired sense of control[22-25].”

      Finally, whereas interpretation of individual CCA loadings that were not specifically tested remains speculative, we note that the pattern of loadings largely replicated across the initial and replication studies (see Figure 6B), and aligns with prior findings. For instance, the positive loadings of SOA and OCD match prior suggestions that a lower sense of control leads to greater compensatory effort[7], whereas the negative loading for depression scores matches prior work showing reduced resource investment in depression[5-6].

      We have now revised the manuscript to clarify the justification for our analytical approach (lines 236-248):

      “To examine whether the individual biases in controllability and elasticity inference have psychopathological ramifications, we assayed participants on a range of self-report measures of psychopathologies previously linked to a distorted sense of control (see Methods, pg. 24). Examining the direct correlations between model parameters and psychopathology measures (reported in Supplementary Figure 3) does not account for the substantial variance that is typically shared among different forms of psychopathology. For this reason, we instead used a canonical correlation analysis (CCA) to identify particular dimensions within the parameter and psychopathology spaces that most strongly correlate with one another.”

      We also now include a cautionary note in the discussion (lines 309-315):

      “Whereas our pre-registered CCA effectively identified associations between task parameters and a psychopathological profile, this analysis method does not directly reveal relationships between individual variables. Auxiliary analyses confirmed significant contributions of both elasticity bias and sense of agency to the observed canonical correlation, but the contribution of other measures remains to be determined by future work. Such work could employ other established measures of agency, including both behavioral indices and subjective self-reports, to better understand how these constructs relate across different contexts and populations.”

      There is also a feature of the task that limits our ability to draw strong conclusions about individual differences in elasticity inference. As the authors clearly acknowledge, the task was designed "to be especially sensitive to overestimation of elasticity" (line 287). A straightforward consequence of this is that the resulting *empirical* estimate of estimation bias (i.e., the gamma_elasticity parameter) is itself biased. This immediately undermines any claim that references the directionality of the elasticity bias (e.g. in the abstract). Concretely, an undirected deficit such as slower learning of elasticity would appear as a directed overestimation bias. When we further consider that elasticity inference is the only meaningful learning/decision-making problem in the task (argued above), the situation becomes much worse. Many general deficits in learning or decision-making would be captured by the elasticity bias parameter. Thus, a conservative interpretation of the results is simply that psychopathology is associated with impaired learning and decision-making.

      We apologize for our imprecise statement that the task was ‘especially sensitive to overestimation of elasticity’, which justifiably led to the Reviewer’s concern that slower elasticity learning can be mistaken for elasticity bias. To make sure this was not the case, we made use of the fact that our computational model explicitly separates bias direction (𝜆) from the rate of learning through two distinct parameters, which initialize the prior concentration and mean of the model’s initial beliefs concerning elasticity (see Methods pg. 23). The higher the concentration of the initial beliefs (𝜖), the slower the learning. Parameter recovery tests confirmed that our task enables acceptable recovery of both the bias λ<sub>elasticity</sub> (r=.81) and the concentration 𝜖<sub>elasticity</sub> (r=.59) parameters. Importantly, the level of confusion between the parameters was low (confusion of 0.15 for 𝜖<sub>elasticity</sub> → λ<sub>elasticity</sub> and 0.04 for λ<sub>elasticity</sub> → 𝜖<sub>elasticity</sub>). This result confirms that our task enables dissociating elasticity biases from the rate of elasticity learning. 

      Moreover, to validate that the minimal level of confusion existing between bias and the rate of learning did not drive our psychopathology results, we re-ran the CCA while separating concentration from bias parameters. The results (figure below) demonstrate that differences in learning rate (𝜖) had virtually no contribution to our CCA results, whereas the contribution of the pure bias (𝜆) was preserved. 

      We now report on this additional analysis in the text (lines 617-627):

      “To capture prior biases that planets are controllable and elastic, we introduced parameters λ<sub>controllability</sub> and λ<sub>elasticity</sub>, each computed by multiplying the direction (λ – 0.5) and strength (ϵ) of individuals’ prior belief. λ<sub>controllability</sub> and λ<sub>elasticity</sub> range between 0 and 1, with values above 0.5 indicating a bias towards high controllability or elasticity, and values below 0.5 indicating a bias towards low controllability or elasticity. 𝜖<sub>controllability</sub> and 𝜖<sub>elasticity</sub> are positively valued parameters capturing confidence in the bias. Parameter recovery analyses confirmed both good recoverability (see S2 Table) and low confusion between bias direction and strength (𝜖<sub>controllability</sub> → λ<sub>controllability</sub> = −.07, λ<sub>controllability</sub> → 𝜖<sub>controllability</sub> = .16, 𝜖<sub>elasticity</sub> → λ<sub>elasticity</sub> = .15, λ<sub>elasticity</sub> → 𝜖<sub>elasticity</sub> = .04), ensuring that observed biases and their relation to psychopathology do not merely reflect slower learning (Supplementary Figure 4), which can result from changes in bias strength but not direction.”
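
      To make the distinction between bias direction and bias strength concrete, the sketch below initializes a beta prior from a mean and a concentration. It shows that concentration changes how fast beliefs move, not where they start. The parameterization (a = direction × strength, b = (1 − direction) × strength) and all function names are assumptions made for illustration; the manuscript's Methods define the actual model.

```python
def beta_prior(direction, strength):
    """Return beta pseudo-counts (a, b) with mean `direction` and total `strength`.

    Assumed parameterization, for illustration only.
    """
    if not 0.0 < direction < 1.0:
        raise ValueError("direction must lie strictly between 0 and 1")
    if strength <= 0.0:
        raise ValueError("strength must be positive")
    return direction * strength, (1.0 - direction) * strength


def posterior_mean(a, b, successes, failures):
    """Mean of the beta belief after adding observed outcome counts."""
    return (a + successes) / (a + b + successes + failures)


def signed_bias(direction, strength):
    """Composite bias: direction relative to 0.5, scaled by strength."""
    return (direction - 0.5) * strength


# Same prior mean (0.8), different concentration.
weak_prior = beta_prior(0.8, 2.0)
strong_prior = beta_prior(0.8, 20.0)

# After five failures, the weakly held prior has moved much further from 0.8.
weak_after = posterior_mean(*weak_prior, successes=0, failures=5)
strong_after = posterior_mean(*strong_prior, successes=0, failures=5)
```

      Under these assumptions, a stronger prior slows learning without changing the direction of the bias, which is why the two parameters can in principle be dissociated.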

      We also more precisely articulate the impact of providing participants with three free tickets at their initial visits to each planet.

      Showing that a model parameter correlates with the data it was fit to does not provide any new information, and cannot support claims like "a prior assumption that control is likely available was reflected in a futile investment of resources in uncontrollable environments." To make that claim, one must collect independent measures of the assumption and the investment.

      We apologize if this and related statements seemed to be describing independent findings. They were meant to describe the relationship between model parameters and model-independent measures of task performance. It is inaccurate, though, to say that they provide no new information, since results could have been otherwise. For instance, whether a higher controllability bias maps onto resource misallocation in uncontrollable environments (as we observed) depends on the range of this parameter in our population sample. Had the range been more negative, a higher controllability bias could have instead manifested as optimal allocation in controllable environments. Additionally, these analyses serve two other purposes: as a validity check, confirming that our computational model effectively captured observed individual differences, and as a help for readers to understand what each parameter in our model represents in terms of observable behavior. We now better clarify the descriptive purposes of these regressions (lines 214-220, 231-235): 

      “To clarify how fitted model parameters related to observable behavior, we regressed participants’ opt-in rates and extra ticket purchases on the parameters (Figure 6A) ...”

      “... In sum, the model parameters captured meaningful individual differences in how participants allocated their resources across environments, with the controllability parameter primarily explaining variance in resource allocation in uncontrollable environments, and the elasticity parameter primarily explaining variance in resource allocation in environments where control was inelastic.”

      Did participants always make two attempts when purchasing tickets? This seems to violate the intuitive model, in which you would sometimes succeed on the first jump. If so, why was this choice made? Relatedly, it is not clear to me after a close reading how the outcome of each trial was actually determined.

      We thank the Reviewer for highlighting the need to clarify these aspects of the task in the revised manuscript. 

      When participants purchased two extra tickets, they attempted both jumps, and were never informed about whether either of them succeeded. Instead, after choosing a vehicle and attempting both jumps, participants were notified of where they had arrived. This outcome was determined based on the cumulative probability of either of the two jumps succeeding. Success meant that participants arrived at where their chosen vehicle goes, whereas failure meant they walked to the nearest location (as determined by where they started from).
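
      The phrase "cumulative probability of either of the two jumps succeeding" can be illustrated as follows. The sketch assumes that attempts are independent and that each has its own success probability; the actual per-attempt probabilities used in the task are not given here, and the function name is hypothetical.

```python
def prob_any_success(per_attempt_probs):
    """Probability that at least one of several independent attempts succeeds.

    Computed as 1 minus the probability that every attempt fails.
    """
    p_all_fail = 1.0
    for p in per_attempt_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
        p_all_fail *= 1.0 - p
    return 1.0 - p_all_fail


# Two attempts that would each succeed half of the time (illustrative values).
two_jumps = prob_any_success([0.5, 0.5])
```

      Because only the combined outcome is revealed, a participant in this setting observes a single success or failure drawn with the combined probability, never the result of an individual attempt.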

      Though it is unintuitive to attempt a second jump before seeing whether the first succeeded, this design choice served two key objectives. First, it ensured that participants would consistently need to invest not only more money but also more effort and time in planets with high elastic controllability. Second, it allowed the task to potentially generalize to the many real-world situations where the amount of invested effort has to be determined prior to seeing any outcome, for instance, preparing for an exam or a job interview. We now explicitly state these details when describing the experimental task (lines 393-395):

      “When participants purchased multiple tickets, they made all boarding attempts in sequence without intermediate feedback, only learning whether they successfully boarded upon reaching their final destination. This served two purposes. First, to ensure that participants would consistently need to invest not only more money but also more effort and time in planets with high elastic controllability. Second, to ensure that results could potentially generalize to the many real-world situations where the amount of invested effort has to be determined prior to seeing any outcome (e.g., preparing for an exam or a job interview).”

      It should be noted that the model is heuristically defined and does not reflect Bayesian updating. In particular, it overestimates control by not using losses with less than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). I wonder if the forced three-ticket trials in the task might be historically related to this modeling choice.

      We apologize for not making this clear, but in fact losing with fewer than 3 tickets does reduce the model’s estimate of available control. It does so by increasing the elasticity estimates (a<sub>elastic≥1</sub>, a<sub>elastic2</sub> parameters), signifying that more tickets are needed to obtain the maximum available level of control, thereby reducing the average controllability estimate across ticket investment options. We note this now in the presentation of the computational model (caption Figure 4):

      “A failure to board does not change estimated maximum controllability, but rather suggests that 1 ticket might not suffice to obtain control (a<sub>elastic≥1</sub> + 1; light green diminished). As a result, the model’s estimate of average controllability across ticket options is reduced.”
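
      One possible reading of the update described in the caption is sketched below. It is not the authors' implementation: the way the three beliefs combine into expected control per ticket option, the uniform starting counts, and every name other than the `a`/`b` pseudo-count convention are assumptions introduced for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Beliefs:
    # Beta pseudo-counts; all start at 1 here purely for illustration.
    a_control: float = 1.0      # evidence that boarding is possible at all
    b_control: float = 1.0
    a_elastic_ge1: float = 1.0  # evidence that at least one extra ticket is needed
    b_elastic_ge1: float = 1.0
    a_elastic_2: float = 1.0    # evidence that two extra tickets are needed
    b_elastic_2: float = 1.0


def beta_mean(a, b):
    return a / (a + b)


def expected_control(beliefs):
    """Expected control for 1, 2 and 3 tickets (assumed combination rule)."""
    max_control = beta_mean(beliefs.a_control, beliefs.b_control)
    need_extra = beta_mean(beliefs.a_elastic_ge1, beliefs.b_elastic_ge1)
    need_two = beta_mean(beliefs.a_elastic_2, beliefs.b_elastic_2)
    return {
        1: max_control * (1.0 - need_extra),
        2: max_control * (1.0 - need_extra * need_two),
        3: max_control,
    }


def update_after_one_ticket_failure(beliefs):
    """Failing with 1 ticket raises a_elastic_ge1 and leaves control counts alone."""
    return replace(beliefs, a_elastic_ge1=beliefs.a_elastic_ge1 + 1.0)


before = expected_control(Beliefs())
after = expected_control(update_after_one_ticket_failure(Beliefs()))
mean_before = sum(before.values()) / len(before)
mean_after = sum(after.values()) / len(after)
```

      In this sketch the estimate for 3 tickets (maximum controllability) is unchanged by the failure, while the estimates for 1 and 2 tickets drop, so the average across ticket options is reduced.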

      It would be interesting to further develop the model such that losing with less than 3 tickets would also impact inferences concerning the maximum available control, depending on present beliefs concerning elasticity, but the forced three-ticket purchases already expose participants to the maximum available control, and thus, the present data may not be best suited to test such a model. These trials were implemented to minimize individual differences concerning inferences of maximum available control, thereby focusing differences on elasticity inferences. We now explicitly address these considerations in the revised discussion (lines 326-333) with the following: 

      “Future research could explore alternative models for implementing elasticity inference that extend beyond our current paradigm. First, further investigation is warranted concerning how uncertainty about controllability and its elasticity interact. In the present study, we minimized individual differences in the estimation of maximum available control by providing participants with three free tickets at their initial visits to each planet. We made this design choice to isolate differences in the estimation of elasticity, as opposed to maximum controllability. To study how these two types of estimations interact, future work could benefit from modifying this aspect of our experimental design.”

      Furthermore, we have now tested a Bayesian model suggested by Reviewer 1, but we found that this model fitted participants’ choices worse (see details in the response to Reviewer 1’s comments). 

      Recommendations for the authors:

      Reviewer 1 (Recommendations for the authors):

      In the introduction, the definition of controllability and elasticity, and the scope of "resources" investigated in the current study were unclear. If I understand correctly, controllability is defined as "the degree to which actions influence the probability of obtaining a reward", and elasticity is defined as the change in controllability based on invested resources. This would define the controllability of the environment and the elasticity of controllability of the environment. However, phrases such as "elastic environment" seem to imply that elasticity can directly attach to an environment, instead of attaching to the controllability of the environment.

      We thank the Reviewer for highlighting the need to clarify our conceptualization of elasticity and controllability. We now provide formal definitions of both, with controllability defined as the fraction of controllably achievable reward[1], and elasticity as the reduction in uncertainty about controllability due to knowing the amount of resources the agent is willing and able to invest (see further details in the response to Reviewer 3’s public comments). In the revised manuscript, we now use more precise language to clarify that elasticity is a property of controllability, not of environments themselves. In addition, we now clarify that the current study manipulated monetary, attentional effort, and time costs together (see further details in the response to Reviewer 1’s public comments).   

      (2) Some of the real-world examples were confusing. For example, the authors mention that investing additional effort due to the belief that this leads to better outcomes in OCD patients is overestimated elasticity, but exercising due to the belief that this can make one taller is overestimated controllability. What's the distinction between the examples? The example of the chess expert practicing to win against a novice, because the amount of effort they invest would not change their level of control over the outcome is also unclear. If the control over the outcome depends on their skill set, wouldn't practicing influence the control over the outcome? In the case of the meeting time example, wouldn't the bus routes differ in their time investments even though they are the same price? In addition to focusing the introductory examples around monetary resources, I would also generally recommend tightening the link between those examples and the experimental task.

      We thank the Reviewer for highlighting the need to clarify the examples used to illustrate elasticity and controllability. We have now revised these examples to more clearly distinguish between the concepts and to strengthen their connection to the experimental task.

      Regarding the OCD example, the possibility that OCD patients overestimate elasticity comes from research suggesting they experience low perceived control but nevertheless engage in excessive resource investment[2], reflecting a belief that only through repeated and intense effort can they achieve sufficient control over outcomes. As an example, consider an OCD patient investing unnecessary effort in repeatedly locking their door. This behavior cannot result from an overestimation of controllability because controllability truly is close to maximal. It also cannot result from an underestimation of the maximum attainable control, since in that case investing more effort is futile. Such behavior, however, can result from an overestimation of the degree to which controllability requires effort (i.e., overestimation of elasticity).

      Similarly, with regard to the chess expert, we intended to illustrate a situation where, given their current level, the chess expert is already virtually guaranteed to win, such that additional practice time does not improve their chances. Conversely, the height example illustrates overestimated controllability because the outcome (becoming taller through exercise) is in fact not amenable to control through any amount of resource investment.

      Finally, the meeting time example was meant to illustrate that if the desired outcome is reaching a meeting in time, then different bus routes that cost the same provide equal control over this outcome to anyone who can afford the basic fare. This demonstrates inelastic controllability with respect to money, as spending more on transportation doesn't increase the probability of reaching the meeting on time. The Reviewer correctly notes that time investment may differ between routes. However, investing more time does not improve the expected outcome. This illustrates that inelastic controllability does not preclude agents from investing more resources, but such investment does not increase the fraction of controllably achievable reward (i.e., the probability of reaching the meeting in time).

      In the revised manuscript, we’ve refined each of the above examples to better clarify the specific resources being considered, the outcomes they influence, and their precise relationship to both elasticity and controllability: 

      OCD (lines 40-43): Conversely, the repetitive and unusual amount of effort invested by people with obsessive-compulsive disorder in attempts to exert control[23,24] could indicate an overestimation of elasticity, that is, a belief that adequate control can only be achieved through excessive and repeated resource investment[25].  

      Chess expert (54-57): Alternatively, they may do so because they overestimate the elasticity of control – for example, a chess expert practicing unnecessarily hard to win against a novice, when their existing skill level already ensures control over the match's outcome.

      Height (lines 53-54): A given individual, for instance, may tend to overinvest resources because they overestimate controllability – for example, exercising due to a misguided belief that this can make one taller, when in fact height cannot be controlled.

      Meeting time (lines 26-28): Choosing between bus routes affords equal control over commute time to anyone who can afford the basic fare (Figure 1).

      Methods

      (1) In the elastic controllability model definition, controllability is defined as "the belief that boarding is possible" (with any number of tickets). The definition again is different from in the task description where controllability is defined as "the probability of the chosen vehicle stopping at the platform if purchasing a single ticket."

      We clarify that "the probability of the chosen vehicle stopping at the platform if purchasing a single ticket" is our definition for inelastic controllability, as opposed to overall/maximum controllability, as stated here (lines 101-103):

      "We defined inelastic controllability as the probability that even one ticket would lead to successfully boarding the vehicle, and elastic controllability as the degree to which two extra tickets would increase that probability."

      Overall controllability is the summation of the two. This summation is referred to in the elastic controllability model definition as the "the belief that boarding is possible". We now clarify this in the caption to figure 4:

      Elastic Controllability model: Represents beliefs about maximum controllability (black outline) and the degree to which one or two extra tickets are necessary to obtain it. These beliefs are used to calculate the expected control when purchasing 1 ticket (inelastic controllability) and the additional control afforded by 2 and 3 tickets (elastic controllability).    

      We also clarify this in the methods when describing the parameterization of the model (lines 529-531): 

      The expected value of one beta distribution (defined by a<sub>control</sub>, b<sub>control</sub>) represents the belief that boarding is possible (controllability) with any number of tickets.

      (2) The free parameter K is confusing. What is the psychological meaning of this parameter? Is it there just to account for the fact that failure with 3 tickets made participants favor 3 tickets or is there meaning attached to including this parameter?

      This parameter captures how participants update their beliefs about resource requirements after failing to board with maximum resource investment. Our psychological interpretation is that participants who experience failure despite maximum investment (3 tickets) prioritize resolving uncertainty about whether control is fundamentally possible (before exploring whether control is elastic), which can only be determined by continuing to invest maximum resources. 

      We now clarify this in the methods (lines 555-559):

      To account for our finding that failure with 3 tickets made participants favor 3, over 1 and 2, tickets, we introduced a modified elastic controllability* model, wherein purchasing extra tickets is also favored upon receiving evidence of low controllability (loss with 3 tickets). This effect was modulated by a free parameter 𝜅 which reflects a tendency to prioritize resolving uncertainty about whether control is at all possible by investing maximum resources.

      This interpretation is supported by our analysis of 3-ticket choice trajectories (Supplementary Figure 2 presented in response to Reviewer 2). As shown in the figure, participants who win less than 50% of their 3-ticket attempts persistently purchase 3 tickets over the first 10 trials, despite frequent failures. This persistence gradually declines as participants accumulate evidence about their limited control, corresponding with an increase in opt-out rates.

      (3) Some additional details about the task design would be helpful. It seems that participants first completed 90 practice trials and were informed of the planet type every 15 trials (6 times during practice). What message is given to the participants about the planets? Did the authors analyze the last 15 trials of each condition in the regression analysis, and all 30 trials in the modeling analysis? How does the computational model (especially the prior beliefs parameters) reset when the planet changes? How do points accumulate over the session and/or are participants motivated to budget the points? Is it possible for participants to accumulate many points and then switch to a heuristic of purchasing 3 tickets on each trial?

      We apologize for not previously clarifying these details of the experimental design.

      During practice blocks, participants received explicit feedback about each planet's controllability characteristics, to help them understand when additional resources would or would not improve their boarding success. For high inelastic controllability planets, the message read: "Your ride actually would stop for you with 1 ticket! So purchasing extra tickets, since they do cost money, is a WASTE." For low controllability planets: "Doesn't seem like the vehicle stops for you nor does purchasing extra tickets help." Lastly, for high elastic controllability planets: "Hopefully by now it's clear that only by purchasing 3 tickets (LOADING AREA) are you consistently successful in catching your ride." We now include these messages in the methods section describing the task (lines 453-458).

      We indeed analyzed the last 15 trials of each condition in the regression analysis, and all 30 trials in the modeling analysis. Whereas the modeling attempted to explain participants’ learning process, the regression focused on explaining the resultant behavior, which, in our pilot data (N=19), manifested fairly stably in the last 15 trials (ticket choices SD = .33 compared to .63 in the first 15 trials). The former is already stated in the text (lines 409-415), and we now also clarify the latter when discussing the model fitting procedure (line 695):

      Reinforcement-learning models were fitted to all choices made by participants via an expectation maximization approach used in previous work.

      The computational model was initialized with the same prior parameters for all planets. When a participant moved to a new planet, the model's beliefs were reset to these prior values, capturing how participants would approach each new environment with their characteristic expectations about controllability and elasticity. We now clarify this in the methods (line 628): 

      For each new planet participants encountered, these parameters were used to initialize the beta distributions representing participants’ beliefs

      Points accumulated across all planets throughout the session, with participants explicitly motivated to maximize their total points as this directly determined their monetary bonus payment. To address the Reviewer's question about changes in ticket purchasing behavior, we conducted a mixed probit regression examining whether accumulated points influenced participants’ decisions to purchase extra tickets. We did not find such an effect (𝛽<sub>coins accumulated</sub> = .01, 𝑝 = .87), indicating that participants did not switch to simple heuristic strategies after accumulating enough coins. We now report this analysis in the methods (lines 421-427):

      Points accumulated across all planets throughout the session, with participants explicitly motivated to maximize their total points as this directly determined their monetary bonus payment. To ensure that accumulated gains did not lead participants to adopt a simple heuristic strategy of always purchasing 3 tickets, we conducted a mixed probit regression examining whether the number of accumulated coins influenced participants' decisions to purchase extra tickets. We did not find such an effect (𝛽<sub>coins accumulated</sub> = .01, 𝑝 = .87), ruling out the potential strategy shift.

      Following the modeling section, it may be helpful to have a table of the fitted models, the parameters of each model, and the meaning/interpretation of each parameter.

      We thank the Reviewer for this suggestion. We have now added a table (Supplementary Table 3) that summarizes all fitted models, their parameters, and the meaning/interpretation of each parameter.

      (1) The conclusions from regressing the task choices (opt-in rates and ticket purchases) on the fitted parameters seem confusing given that the model parameters were fitted on the task behavior, and the relationship between these variables seems circular. For example, the authors found that preferences for purchasing 2 or 3 tickets (a2 and a3; computational parameters) were associated with purchasing more tickets (task behavior). But wouldn't this type of task behavior be what the parameters are explaining? It's not clear whether these correlation analyses are about how individuals allocate their resources or about the validity check of the parameters. Perhaps analyses on individual deviation from the optimal strategy and parameter associations with such deviation are better suited for the questions about whether individual biases lead to resource misallocation.

      We thank the Reviewer for highlighting this seeming confusion. These regressions were meant to describe the relationship between model parameters and model-independent measures of task performance. This serves three purposes. First, a validity check, confirming that our computational model effectively captured observed individual differences. Second, to help readers understand what each parameter in our model represents in terms of observable behavior. Third, to examine in greater detail how parameter values specifically mapped onto observable behavior. For instance, whether a higher controllability bias maps onto resource misallocation in uncontrollable environments (as we observed) depends on the range of this parameter in our population sample. Had the range been more negative, a higher controllability bias could have instead manifested as optimal allocation in controllable environments. We now better clarify the descriptive purposes of these regressions (lines 214-220, 231-235): 

      To clarify how fitted model parameters related to observable behavior, we regressed participants’ opt-in rates and extra ticket purchases on the parameters (Figure 6A) ... 

      ... In sum, the model parameters captured meaningful individual differences in how participants allocated their resources across environments, with the controllability parameter primarily explaining variance in resource allocation in uncontrollable environments, and the elasticity parameter primarily explaining variance in resource allocation in environments where control was inelastic.  

      Regarding the suggestion to analyze deviation from optimal strategy, this corresponds with our present approach in that opting in is always optimal in high controllability environments and always non-optimal in low controllability environments, and similarly, purchasing extra tickets is always optimal in elastic controllability environments and always non-optimal elsewhere. Thus, positive or negative coefficients can be directly translated into closer or farther from optimal, depending on the planet type, as indicated in the figure by color. We now clarify this mapping in the figure legend:

      (2) Minor: The legend of Figure 6A is difficult to read. It might be helpful to label the colors as their planet types (low controllability, high elastic controllability, high inelastic controllability).

      We thank the Reviewer for this helpful suggestion. We have revised the figure accordingly.

      Reviewer 2 (Recommendations for the authors):

      As noted above, I'm not sure I agree with (or perhaps don't fully understand) the claims the authors make about the distinctions between their "elastic" and "inelastic" experimental conditions. Let's take the travel example from Figure 1 - is this not just an example of “hierarchical” controllability calculations? In other words, in the elastic example, my choice is between going one speed or another (i.e., exerting more or less effort), and in the inelastic example, my choice is first, which route to take (also a consideration of speed, but with lower effort costs than the elastic scenario), and second, an estimate of the time cost (not within my direct control, but could be estimated). In the elastic scenarios, additional value considerations vary between options, and in others (inelastic), they don't, with control over the first choice point (which bus route to choose, or which lunch option to take), but not over the price. I wonder if the paper would be better framed (or emphasized) as exploring the influences of effort and related "costs" of control. There isn't really such a thing as controllability that does not have any costs associated with it (whether that be action costs, effort, money, or simply scenario complexity).

      We thank the Reviewer for highlighting the need to clarify our distinction between elastic and inelastic controllability as it manifests in our examples. We first clarify that elasticity concerns how controllability varies with resources, not costs. Though resource investment and costs are often tightly linked, that is not always the case, especially not when comparing between agents. For example, it may be equally difficult (i.e., costly) for a professional biker to pedal at a high speed as it is for a novice to pedal at a medium speed, simply because the biker’s muscles are better trained. This resource advantage increases the biker’s control over his commute time without incurring additional costs as compared to the novice. We now clarify this distinction in the text by revising our example to (lines 9-11): 

      “For example, the control a biker has over their commute time depends on the power they are willing and able to invest in pedaling. In this respect, a highly trained biker would typically have more control than a novice.”

      Second, whereas in our examples additional value considerations indeed vary in elastic environments, that does not have to be the case, and indeed, that is not the case in our experiment. In our experimental task, participants are given the option to purchase as many tickets as they wish regardless of whether they are in an elastic or an inelastic environment.  

      We agree that elastic environments often raise considerations regarding the cost of control (for instance, whether it is worth it to pedal harder to get to the destination in time). To consider this cost against potential payoffs, however, the agent must first determine what the potential payoffs are – that is, it must determine the degree to which controllability is elastic to invested resources. It is this antecedent inference that our experiment studies. We uniquely study this inference using environments where control may not only be low or high, but also, where high control may or may not require additional resource investments. We now clarify this point in Figure 1’s caption:

      “In all situations, agents must infer the degree to which controllability is elastic to be able to determine whether the potential gains in control outweigh the costs of investing additional resources (e.g., physical exertion, money spent, time invested).”

      For a formal definition of the elasticity of control, see our response to Reviewer 3’s public comments. 

      Relatedly, another issue I have with the distinctions between inelastic/elastic is that a high/elastic condition has inherently ‘more’ controllability than a high/inelastic condition, no matter what. For example, in the lunch option scenario, I always have more control in the elastic situation because I have two opportunities to exert choice (food option ‘and’ cost). Is there really a significant difference, then, between calling these distinctions "elastic/inelastic" vs. "higher/lower controllability?" Not that it's uninteresting to test behavioral differences between these two types of scenarios, just that it seems unnecessary to refer to these as conceptually distinct.

      As noted in the response above, control over costs may be higher in elastic environments, but it does not have to be so, as exemplified by the elastic environments in our experimental task. For a fuller explanation of why higher elasticity does not imply higher controllability, see our response to Reviewer 2’s public comments. 

      I also wonder whether it's actually the case that people purchased more tickets in the high control elastic condition simply because this is the optimal solution to achieve the desired outcome, not due to a preference for elastic control. To test this, you would need to include a condition in which people opted to spend more money/effort to have high elastic control in an instance where it was not beneficial to do so.

      We appreciate the Reviewer's question about potential preferences for elastic control. We first clarify that participants did not choose which environment type they encountered, so if control was low or inelastic, investing extra resources did not give them more control. Furthermore, our results show that the average participant did not prefer a priori to purchase more tickets. This is evidenced by participants’ successful adaptation to inelastic environments wherein they purchased significantly fewer tickets (see Figure 2B and 2C), and by participants’ parameter fits, which reveal an a priori bias to assume that controllability is inelastic (𝜆<sub>elasticity</sub> = .16 ± .19), as well as a fixed preference against purchasing the full number of tickets (𝛼<sub>3</sub> = −.74 ± .37).

      We now clarify these findings by including a table of all parameter fits in the revised manuscript (see response to Reviewer 1). 

      It was interesting that the authors found that failure with 3 tickets made people more likely to continue to try 3 tickets, however, there is another possible interpretation. Could it be that this is simply evidence of a general controllability bias, where people just think that it is expected that you should be able to exert more money/effort/time to gain control, and if this initially fails, it is an unusual outcome, and they should try again? Did you look at this trajectory over time? i.e., whether repeated tries with 3 tickets immediately followed a failure with 3 tickets? Relatedly, does the perseveration parameter from the model also correlate with psychopathology?

      We thank the Reviewer for this suggestion. Our model accounts for a general controllability bias through the 𝜆<sub>controllability</sub> parameter, which represents a prior belief that planets are controllable. It also accounts, through the 𝜆<sub>elasticity</sub> parameter, for the prior belief that you should be able to exert more money/effort/time to gain control. Now, our addition of 𝜅 to the model captures the observation that failures with 3 tickets made participants more likely to purchase 3 tickets when they opted in. If this observation was due to participants not accepting that the planet is not controllable, then we would expect the increase in 3-ticket purchases when opting in to be coupled with a diminished reduction in opting in. To determine whether this was the case, we tested a variant of our model where 𝜅 not only increases the elasticity estimate but also reduces the controllability update (using 𝛽<sub>control</sub>+(1- 𝜅) instead of 𝛽<sub>control</sub>+1) after failures with 3 tickets. However, implementing this coupling diminished the model's fit to the data, as compared to allowing both effects to occur independently, indicating that the increase in 3 ticket purchases upon failing with 3 tickets did not result from participants not accepting that controllability is in fact low. Thus, we maintain our original interpretation that failure with 3 tickets increases uncertainty about whether control is possible at all, leading participants who continue to opt in to invest maximum resources to resolve this uncertainty. We now report these results in the revised text (lines 662-674). 

      The trajectory over time is consistent with this interpretation (new Supplementary Figure 2 shown below). Specifically, we see that under low controllability (0-50%, orange line), over the first 10 trials participants show higher persistence with 3 tickets after failing, despite experiencing frequent failures, but also a higher opt-out probability. As these participants accumulate evidence about their limited control, we observe a gradual decrease in 3-ticket selections that corresponds directly with a further increase in opting out (right panel, orange line). This pattern qualitatively corresponds with the behavior of our computational model (empty circles). We present the results of the new analysis in lines 180-190:

      “In fact, failure with 3 tickets even made participants favor 3, over 1 and 2, tickets. This favoring  of 3 tickets continued until participants accumulated sufficient evidence about their limited control to opt out (Supplementary Figure 2). Presumably, the initial failures with 3 tickets resulted in an increased uncertainty about whether it is at all possible to control one’s destination. Consequently, participants who nevertheless opted in invested maximum resources to resolve this uncertainty before exploring whether control is elastic.”

      Regarding correlations between the perseveration parameter and psychopathology, we have now conducted a comprehensive exploratory analysis of all two-way relationships between parameters and psychopathology scores (new Supplementary Figure 3). Whereas we observed modest correlations with social anxiety (LSAS, r=-0.13), cyclothymic temperament (r=0.13), and alcohol use (AUDIT, r=-0.13), none reached statistical significance after FDR correction for multiple comparisons.
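
      The FDR correction mentioned above can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure, which the response does not name explicitly; the function name and the p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test: True where the null is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[index] = True
    return rejected


# Hypothetical p-values for five parameter-by-symptom correlations.
p_values = [0.6, 0.001, 0.041, 0.008, 0.039]
significant = benjamini_hochberg(p_values, alpha=0.05)
```

      In this toy example four p-values fall below .05 before correction, but only the two smallest survive it, which is the kind of attrition the response describes.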

      Regarding the modeling, I also wondered whether a better alternative model than the controllability model would be a simple associative learning model, where a number of tickets are mapped to outcomes, regardless of elasticity.

      We thank the Reviewer for suggesting this alternative model. Following this suggestion, we implemented a simple associative learning model that directly maps each option to its expected value, without a latent representation of elasticity or controllability. Unlike our controllability model which learns the probability of reaching the goal state for each ticket quantity, this associative learning model simply updates option values based on reward prediction errors.

      We found that this simple Q-learning model performed worse than even the controllability model at explaining participant data (log Bayes Factor ≥ 1854 on the combined datasets), further supporting our hypothesis that participants are learning latent estimates of control rather than simply associating options with outcomes. We present the results of this analysis in lines 662-664:

      We implemented a simple Q-learning model that directly maps ticket quantities to expected values based on reward prediction errors, without representing latent controllability. This associative model performed substantially worse than even our simple controllability model (log Bayes Factor ≥ 1854 on the combined datasets).
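
      As an illustration of the associative model described above, here is a minimal sketch of a delta-rule (Q-learning) update over ticket options. The option coding (0 = opt out, 1-3 = tickets), the learning rate, the softmax choice rule, and all function names are assumptions made for this sketch, not the authors' implementation.

```python
import math


def update_q(q_values, option, reward, learning_rate):
    """Return a copy of q_values with one option moved toward the obtained reward."""
    prediction_error = reward - q_values[option]  # reward prediction error
    updated = dict(q_values)
    updated[option] = q_values[option] + learning_rate * prediction_error
    return updated


def softmax_policy(q_values, inverse_temperature):
    """Map option values to choice probabilities (an assumed choice rule)."""
    weights = {o: math.exp(inverse_temperature * q) for o, q in q_values.items()}
    total = sum(weights.values())
    return {o: w / total for o, w in weights.items()}


# Hypothetical option coding: 0 = opt out, 1-3 = number of tickets purchased.
q = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0}
q = update_q(q, option=3, reward=1.0, learning_rate=0.5)  # q[3] becomes 0.5
q = update_q(q, option=3, reward=1.0, learning_rate=0.5)  # q[3] becomes 0.75
choice_probabilities = softmax_policy(q, inverse_temperature=2.0)
```

      Because each option's value is updated only from its own outcomes, a model of this kind has no latent variable through which experience with one ticket quantity could inform beliefs about another.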

      Reviewer 3 (Recommendations for the authors):

      Please make all materials available, including code (analysis and experiment) and data. Please also provide a link to the task or a video of a few trials of the main task.

      We thank the reviewer for this important suggestion. All requested materials are now available at https://github.com/lsolomyak/human_inference_of_elastic_control. This includes all experiment code, analysis code, processed data, and a video showing multiple sample trials of the main task.

      References

      (1)  Huys, Q. J. M., & Dayan, P. (2009). A Bayesian formulation of behavioral control. Cognition, 113(3), 314–328.

      (2)  Ligneul, R. (2021). Prediction or causation? Towards a redefinition of task controllability. Trends in Cognitive Sciences, 25(6), 431–433.

      (3)  Mistry, P., & Liljeholm, M. (2016). Instrumental divergence and the value of control. Scientific Reports, 6, 36295.

      (4)  Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145–151.

      (5)  Cohen RM, Weingartner H, Smallberg SA, Pickar D, Murphy DL. Effort and cognition in depression. Arch Gen Psychiatry. 1982 May;39(5):593-7. doi: 10.1001/archpsyc.1982.04290050061012. PMID: 7092490.

      (6)  Bi R, Dong W, Zheng Z, Li S, Zhang D. Altered motivation of effortful decision-making for self and others in subthreshold depression. Depress Anxiety. 2022 Aug;39(8-9):633-645. doi: 10.1002/da.23267. Epub 2022 Jun 3. PMID: 35657301; PMCID: PMC9543190.

      (7)  Tapal, A., Oren, E., Dar, R., & Eitam, B. (2017). The Sense of Agency Scale: A measure of consciously perceived control over one's mind, body, and the immediate environment. Frontiers in Psychology, 8, 1552.

    1. eLife Assessment

      This study establishes bathy phytochromes, a unique class of bacterial photoreceptors that respond to near-infrared light (NIR), as important tools for bacterial optogenetics. NIR light is a key control signal in optogenetics due to its deep tissue penetration and the ability to combine with existing red- and blue-light sensitive systems, but thus far, NIR-activated proteins have been poorly characterized. The strength of the evidence is solid overall, with comprehensive in vitro characterization, modular design strategies, and validation across different hosts. There are some questions that remain such as the rationale for linker choices, characterization of growth and performance relative to controls, and the physiological significance of color blind effects at alkaline pH but overall, this study should advance the fields of optogenetics and photobiology and inspire future work.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting study characterizing and engineering so-called bathy phytochromes, i.e., those that respond to near infrared (NIR) light in the ground state, for optogenetic control of bacterial gene expression. Previously, the authors have developed a structure-guided approach to functionally link several light-responsive protein domains to the signaling domain of the histidine kinase FixL, which ultimately controls gene expression. Here, the authors use the same strategy to link bathy phytochrome light-responsive domains to FixL, resulting in sensors of NIR light. Interestingly, they also link these bathy phytochrome light-sensing domains to signaling domains from the tetrathionate-sensing SHK TtrS and the toluene-sensing SHK TodS, demonstrating the generality of their protein engineering approach more broadly across bacterial two-component systems.

      This is an exciting result that should inspire future bacterial sensor design. They go on to leverage this result to develop what is, to my knowledge, the first system for orthogonally controlling the expression of two separate genes in the same cell with NIR and Red light, a valuable contribution to the field.

      Finally, the authors reveal new details of the pH-dependent photocycle of bathy phytochromes and demonstrate that their sensors work in the gut- and plant-relevant strains E. coli Nissle 1917 and A. tumefaciens.

      Strengths:

      (1) The experiments are well-founded, well-executed, and rigorous.

      (2) The manuscript is clearly written.

      (3) The sensors developed exhibit large responses to light, making them valuable tools for optogenetic applications.

      (4) This study is a valuable contribution to photobiology and optogenetics.

      Weaknesses:

      (1) As the authors note, the sensors are relatively insensitive to NIR light due to the rapid dark reversion process in bathy phytochromes. Though NIR light is generally non-phototoxic, one would expect this characteristic to be a limitation in some downstream applications where light intensities are not high (e.g., in vivo).

      (2) Though they can be multiplexed with Red light sensors, these bathy phytochrome NIR sensors are more difficult to multiplex with other commonly used light sensors (e.g., blue) due to the broad light responsivity of the Pfr state. This challenge may be overcome by careful dosing of blue light, as the authors discuss, but other bacterial NIR sensing systems with less cross-talk may be preferred in some applications.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Meier et al. engineer a new class of light-regulated two-component systems. These systems are built using bathy-bacteriophytochromes that respond to near-infrared (NIR) light. Through a combination of genetic engineering and systematic linker optimization, the authors generate bacterial strains capable of selective and tunable gene expression in response to NIR stimulation. Overall, these results are an interesting expansion of the optogenetic toolkit into the NIR range. The cross-species functionality of the system, modularity, and orthogonality have the potential to make these tools useful for a range of applications.

      Strengths:

      (1) The authors introduce a novel class of near-infrared light-responsive two-component systems in bacteria, expanding the optogenetic toolbox into this spectral range.

      (2) Through engineering and linker optimization, the authors achieve specific and tunable gene expression, with minimal cross-activation from red light in some cases.

      (3) The authors show that the engineered systems function robustly in multiple bacterial strains, including laboratory E. coli, the probiotic E. coli Nissle 1917, and Agrobacterium tumefaciens.

      (4) The combination of orthogonal two-component systems can allow for simultaneous and independent control of multiple gene expression pathways using different wavelengths of light.

      (5) The authors explore the photophysical properties of the photosensors, investigating how environmental factors such as pH influence light sensitivity.

      Weaknesses:

      (1) The expression of multi-gene operons and fluorescent reporters could impose a metabolic burden. The authors should present data comparing optical density for growth curves of engineered strains versus the corresponding empty-vector control to provide insight into the burden and overall impact of the system on host viability and growth.

      (2) The manuscript consistently presents normalized fluorescence values, but the method of normalization is not clear (Figure 2 caption describes normalizing to the maximal fluorescence, but the maximum fluorescence of what?). The authors should provide a more detailed explanation of how the raw fluorescence data were processed. In addition, or potentially in exchange for the current presentation, the authors should include the raw fluorescence values in supplementary materials to help readers assess the actual magnitude of the reported responses.

      (3) Related to the prior point, it would be useful to have a positive control for fluorescence that could be used to compare results across different figure panels.

      (4) Real-time gene expression data are not presented in the current manuscript, but it would be helpful to include a time-course for some of the key designs to help readers assess the speed of response to NIR light.

    4. Reviewer #3 (Public review):

      Summary:

      This paper by Meier et al introduces a new optogenetic module for the regulation of bacterial gene expression based on "bathy-BphP" proteins. Their paper begins with a careful characterization of kinetics and pH dependence of a few family members, followed by extensive engineering to produce infrared-regulated transcriptional systems based on the authors' previous design of the pDusk and pDERusk systems, and closing with characterization of the systems in bacterial species relevant for biotechnology.

      Strengths:

      The paper is important from the perspective of fundamental protein characterization, since bathy-BphPs are relatively poorly characterized compared to their phytochrome and cyanobacteriochrome cousins. It is also important from a technology development perspective: the optogenetic toolbox currently lacks infrared-stimulated transcriptional systems. Infrared light offers two major advantages: it can be multiplexed with additional tools, and it can penetrate into deep tissues with ease relative to the more widely used blue light-activated systems. The experiments are performed carefully, and the manuscript is well written.

      Weaknesses:

      My major criticism is that some information is difficult to obtain, and some data is presented with limited interpretation, making it difficult to obtain intuition for why certain responses are observed. For example, the changes in red/infrared responses across different figures and cellular contexts are reported but not rationalized. Extensive experiments with variable linker sequences were performed, but the rationale for linker choices was not clearly explained. These are minor weaknesses in an overall very strong paper.

    1. eLife Assessment

      This work models reinforcement-learning experiments using a recurrent neural network. It examines if the detailed credit assignment necessary for back-propagation through time can be replaced with random feedback. In this important study the authors show that it yields a satisfactory approximation and the evidence to support that it holds within relatively simple tasks is solid.

    2. Reviewer #1 (Public review):

      Summary:

      Can a plastic RNN serve as a basis function for learning to estimate value? In previous work this was shown to be the case, with a similar architecture to that proposed here. The learning rule in previous work was back-prop with an objective function that was the TD error function (delta) squared. Such a learning rule is non-local, as the changes in weights within the RNN, and from inputs to the RNN, depend on the weights from the RNN to the output, which estimates value. In addition, these weights themselves change over learning. The main idea in this paper is to examine whether replacing the values of these non-local changing weights, used for credit assignment, with random fixed weights can still produce results similar to those obtained with complete back-prop. This random feedback approach is motivated by a similar approach used for deep feed-forward neural networks.

      This work shows that random feedback in credit assignment performs well, but not as well as the precise gradient-based approach. When more constraints due to biological plausibility are imposed, performance degrades. These results are consistent with previous results on random feedback.

      Strengths:

      • The authors show that random feedback can approximate well a model trained with detailed credit assignment.

      • The authors simulate several experiments including some with probabilistic reward schedules and show results similar to those obtained with detailed credit assignments as well as in experiments.

      • The paper examines the impact of more biologically realistic learning rules and the results are still quite similar to the detailed back-prop model.

      Weaknesses:

      • The impact of the article is limited by using a network with discrete time-steps, and only a small number of time steps from stimulus to reward. They assume that each time step is on the order of hundreds of ms. They justify this by pointing to some slow intrinsic mechanisms, but they do not implement these slow mechanisms in a network with short time steps; instead they assume without demonstration that these could work as suggested. This is a reasonable first approximation, but its validity should be explicitly tested.

      • As the delay between cue and reward increases the performance decreases. This is not surprising given the proposed mechanism, but is still a limitation, especially given that we do not really know what a reasonable value for a single time step is.

    3. Reviewer #2 (Public review):

      Summary:

      Tsurumi et al. show that recurrent neural networks can learn state and value representations in simple reinforcement learning tasks when trained with random feedback weights. The traditional method of learning for recurrent networks in such tasks (backpropagation through time) requires feedback weights which are a transposed copy of the feed-forward weights, a biologically implausible assumption. This manuscript builds on previous work regarding "random feedback alignment" and "value-RNNs", and extends them to a reinforcement learning context. The authors also demonstrate that certain non-negative constraints can enforce a "loose alignment" of feedback weights. The authors' results suggest that random feedback may be a powerful tool of learning in biological networks, even in reinforcement learning tasks.

      Strengths:

      The authors describe well the issues regarding biologically plausible learning in recurrent networks and in reinforcement learning tasks. They take care to propose networks which might be implemented in biological systems and compare their proposed learning rules to those already existing in literature. Further, they use small networks on relatively simple tasks, which allows for easier intuition into the learning dynamics.

      Weaknesses:

      The principles discovered by the authors in these smaller networks are not applied to larger networks or more complicated tasks with long temporal delays (>100 timesteps), so it remains unclear to what degree these methods can scale or can be used more generally.

      Comments on revisions: I would still want to see how well the network learns tasks with longer time delays (on the order of 100 or even 1000 timesteps). Previous work has shown that random feedback struggles to encode longer timescales (see Murray 2019, Figure 2), so I would be interested to see how that translates to the RL context in your model.

    4. Reviewer #3 (Public review):

      Summary:

      The paper studies learning rules in a simple sigmoidal recurrent neural network setting. The recurrent network has a single layer of 10 to 40 units. It is first confirmed that feedback alignment (FA) can learn a value function in this setting. Then so-called bio-plausible constraints are added: (1) the value weights (readout) are non-negative, (2) the activity is non-negative (normal sigmoid rather than downscaled between -0.5 and 0.5), (3) the feedback weights are non-negative, (4) the learning rule is revised to be monotonic: the weights are not downregulated. In the simple task considered, none of the four biological features appears to completely impair learning.

      Strengths:

      (1) The learning rules are implemented in a low-level fashion of the form: (pre-synaptic-activity) x (post-synaptic-activity) x feedback x RPE, which is therefore interpretable in terms of quantities measurable in the wet lab.

      (2) I find that non-negative FA (FA with non negative c and w) is the most valuable theoretical insight of this paper: I understand why the alignment between w and c is automatically better at initialization.

      (3) The task choice is relevant, since it connects with experimental settings of reward conditioning with possible plasticity measurements.
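
      The rule form in point (1) and the alignment argument in point (2) can be sketched as follows. This is an illustration under assumptions, not the authors' code: the post-synaptic factor is taken to be a generic gain term, the vectors are drawn uniformly at random, and all names are hypothetical.

```python
import math
import random


def local_weight_update(pre, post_gain, feedback, rpe, learning_rate):
    """dW[i][j] = learning_rate * RPE * feedback[i] * post_gain[i] * pre[j]."""
    return [
        [learning_rate * rpe * feedback[i] * post_gain[i] * pre[j]
         for j in range(len(pre))]
        for i in range(len(feedback))
    ]


def cosine(u, v):
    """Cosine of the angle between two vectors (1 = perfectly aligned)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


rng = random.Random(0)
n_units = 20
# Non-negative value weights w and feedback c: every term of the dot product
# is >= 0, so the two vectors start out positively aligned.
w_nonneg = [rng.random() for _ in range(n_units)]
c_nonneg = [rng.random() for _ in range(n_units)]
alignment_nonneg = cosine(w_nonneg, c_nonneg)
# Signed vectors: positive and negative terms can cancel, so the initial
# alignment can take either sign.
w_signed = [rng.uniform(-1, 1) for _ in range(n_units)]
c_signed = [rng.uniform(-1, 1) for _ in range(n_units)]
alignment_signed = cosine(w_signed, c_signed)
```

      Every factor in the update is available at the synapse or broadcast globally (the RPE), and the non-negativity constraint alone guarantees a positive initial cosine between w and c.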

      Weaknesses:

      (4) The task is rather easy, so it's not clear that it really captures the computational gap that exists between FA (gradient-like learning) and simpler learning rules like a delta rule: RPE x (pre-synaptic) x (post-synaptic). To check whether the task is too trivial, I suggest adding a control where the vector c is constant, c_i=1.

      (5) Related to point 3), the main strength of this paper is to draw potential connection with experimental data. It would be good to highlight more concretely the prediction of the theory for experimental findings. (Ideally, what should be observed with non-negative FA that is not expected with FA or a delta rule (constant global feedback) ?).

      (6a) Random feedback with RNNs in RL has been studied in the past, so it is maybe worth giving some insight into how the results and the analyses compare to this previous line of work (for instance in this paper [1]). For instance, I am not very surprised that FA also works for value prediction with TD error. It is also expected from the literature that the RL + RNN + FA setting would scale to tasks that are more complex than the conditioning problem proposed here, so is there a more specific take-home message about non-negative FA? Or benefits from this simpler toy task?

      (6b) Related to task complexity, it is not clear to me if non-negative value and feedback weights would generally scale to harder tasks. If the task is so simple that a global RPE signal is sufficient to learn (see 4 and 5), then it could be good to extend the task to find a substantial gap between: global RPE, non-negative FA, FA, BP. For a well-chosen task, I expect to see a performance gap between any pair of these four learning rules. In the context of the present paper, it would be particularly interesting to study the failure mode of non-negative FA and the cases where it does perform as well as FA.

      (7) I find that the writing could be improved; it mostly feels more technical and difficult than it should. Here are some recommendations:

      7a) For instance, the technical description of the task (CSC) is not fully described and requires background knowledge from other papers, which is not desirable.

      7b) Also the rationale for the added difficulty with the stochastic reward and new state is not well explained.

      7c) In the technical description of the results I find that the text dives into descriptive comments of the figures but high-level take home messages would be helpful to guide the reader. I got a bit lost, although I feel that there is probably a lot of depth in these paragraphs.

      (8) Related to the writing issue and 5), I wished that "bio-plausibility" was not the only reason to study positive feedback and value weights. Is it possible to develop a bit more specifically what is interesting about this positivity, and why? Is there an expected finding with non-negative FA in terms of model capability? Or perhaps a simpler, crisper take-home message that communicates the experimental predictions to the community would be useful.

      [1] https://www.nature.com/articles/s41467-020-17236-y

      Comments on revisions:

      Thank you for addressing all my comments in your reply.

    5. Author response:

      The following is the authors’ response to the original reviews

      Summary of our revisions

      (1) We have explained why the untrained RNN with readout (value-weight) learning alone could not learn the simple task well: we trained the models continuously across trials with random inter-trial intervals, rather than separately for each episodic trial, so it was not trivial for the models to recognize that cue presentation in different trials constitutes one and the same state, since the activity of the untrained RNN upon cue presentation should differ from trial to trial (Line 177-185).

      (2) We have shown that dimensionality was higher in the value-RNNs than in the untrained RNN (Fig. 2K,6H).

      (3) We have shown that even when a distractor cue was introduced, the value-RNNs could learn the task (Fig. 10).

      (4) We have shown that extended value-RNNs incorporating excitatory and inhibitory units and conforming to Dale's law could still learn the tasks (Fig. 9,10-right column).

      (5) In the original manuscript, the non-negatively constrained value-RNN showed loose alignment of value-weight and random feedback from the beginning but did not show further alignment over trials. We have clarified the reason for this and found a way, introducing a slight decay (forgetting), to make further alignment occur (Fig. 8E,F).

      (6) We have shown that the value-RNNs could learn the tasks with longer cue-reward delay (Fig. 2M,6J) or action selection (Fig. 11), and found cases where random feedback performed worse than symmetric feedback.

      (7) We compared our value-RNNs with e-prop (Bellec et al., 2020, Nat Commun). While e-prop incorporates the effects of changes in RNN weights across distant times through an "eligibility trace", our value-RNNs do not. We consider that our models can still learn the tasks with cue-reward delay because they use TD error, and TD learning itself, even TD(0) without eligibility trace, is a solution for temporal credit assignment. In fact, TD error-based e-prop was also examined in that work, but results were shown only with symmetric feedback, not with random feedback (their Fig. 4,5), whereas for another setup, reward-based e-prop without TD error, results with random feedback were shown (their SuppFig. 5). We have noted these points in Line 695-711 (and also partly in Line 96-99).

      (8) In the original manuscript, we emphasized only the spatial locality (random rather than symmetric feedback) of our learning rule. But we have now also emphasized the temporal locality (online learning) as it is also crucial for bio-plausibility and critically different from the original value-RNN with BPTT. We also changed the title.

      (9) We have realized that our estimation of true state values was invalid (as detailed in page 34 of this document). Effects of this error on performance comparisons were small, but we apologize for this error.
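
      Point (7) rests on the claim that TD(0) alone, without eligibility traces, already solves temporal credit assignment across a cue-reward delay. A minimal tabular sketch of that claim follows. It is deliberately simplified (episodic, one state per time step, deterministic reward) and is not the value-RNN itself; the state layout, learning rate, and discount factor are assumptions.

```python
def run_td0(n_states, n_episodes, learning_rate, discount):
    """Tabular TD(0) on a chain: state 0 is the cue, reward follows the last state."""
    values = [0.0] * n_states
    for _ in range(n_episodes):
        for state in range(n_states):
            is_last = state == n_states - 1
            reward = 1.0 if is_last else 0.0
            # The episode ends after the last state, so its successor value is 0.
            next_value = 0.0 if is_last else values[state + 1]
            td_error = reward + discount * next_value - values[state]
            values[state] += learning_rate * td_error  # one-step update only
    return values


values_early = run_td0(n_states=4, n_episodes=1, learning_rate=0.5, discount=0.9)
values_late = run_td0(n_states=4, n_episodes=200, learning_rate=0.5, discount=0.9)
```

      After one episode only the state adjacent to reward has a nonzero value. Over repeated episodes the value moves back one state at a time, until the cue state approaches discount ** 3, although no update ever looks more than one step ahead.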

      Reviewer #1 (Public review):

      Summary:

      Can a plastic RNN serve as a basis function for learning to estimate value? In previous work this was shown to be the case, with a similar architecture to that proposed here. The learning rule in previous work was back-prop with an objective function that was the TD error function (delta) squared. Such a learning rule is non-local, as the changes in weights within the RNN, and from inputs to the RNN, depend on the weights from the RNN to the output, which estimates value. In addition, these weights themselves change over learning. The main idea in this paper is to examine whether replacing the values of these non-local changing weights, used for credit assignment, with random fixed weights can still produce results similar to those obtained with complete back-prop. This random feedback approach is motivated by a similar approach used for deep feed-forward neural networks.

      This work shows that random feedback in credit assignment performs well, but not as well as the precise gradient-based approach. When more constraints due to biological plausibility are imposed, performance degrades. These results are not surprising given previous results on random feedback. This work is incomplete because the delay times used were only a few time steps, and it is not clear how well random feedback would operate with longer delays. Additionally, the examples simulated with a single cue and a single reward are overly simplistic and the field should move beyond these exceptionally simple examples.

      Strengths:

      • The authors show that random feedback can approximate well a model trained with detailed credit assignment.

      • The authors simulate several experiments including some with probabilistic reward schedules and show results similar to those obtained with detailed credit assignments as well as in experiments.

      • The paper examines the impact of more biologically realistic learning rules and the results are still quite similar to the detailed back-prop model.

      Weaknesses:

      *please note that we numbered your public review comments and recommendations for the authors as Pub1 and Rec1 etc so that we can refer to them in our replies to other comments.

      Pub1. The authors also show that an untrained RNN does not perform as well as the trained RNN. However, they never explain what they mean by an untrained RNN. It should be clearly explained.

      These results are actually surprising. An untrained RNN with enough units and sufficiently large variance of recurrent weights can have a high dimensionality and generate a complete or nearly complete basis, though not orthonormal (e.g., Rajan & Abbott 2006). It should be possible to use such a basis to learn this simple classical conditioning paradigm. It would be useful to measure the dimensionality of network dynamics, in both trained and untrained RNNs.

      We have added an explanation of untrained RNN in Line 144-147:

      “As a negative control, we also conducted simulations in which these connections were not updated from initial values, referred to as the case with "untrained (fixed) RNN". Notably, the value weights w (i.e., connection weights from the RNN to the striatal value unit) were still trained in the models with untrained RNN.”

      We have also analyzed the dimensionality of network dynamics by calculating the contribution ratios of each principal component of the trajectory of RNN activities. This revealed that the contribution ratios of later principal components were smaller in the cases with untrained RNN than in the cases with trained value-RNN. We have added these results in Fig. 2K and Line 210-220 (for our original models without non-negative constraint):

      “In order to examine the dimensionality of RNN dynamics, we conducted principal component analysis (PCA) of the time series (for 1000 trials) of RNN activities and calculated the contribution ratios of PCs in the cases of oVRNNbp, oVRNNrf, and untrained RNN with 20 RNN units. Figure 2K shows a log of contribution ratios of 20 PCs in each case. Compared with the case of untrained RNN, in oVRNNbp and oVRNNrf, initial component(s) had smaller contributions (PC1 (t-test p = 0.00018 in oVRNNbp; p = 0.0058 in oVRNNrf) and PC2 (p = 0.080 in oVRNNbp; p = 0.0026 in oVRNNrf)) while later components had larger contributions (PC3~10,15~20 p < 0.041 in oVRNNbp; PC5~20 p < 0.0017 in oVRNNrf) on average, and this is considered to underlie their superior learning performance. We noticed that late components had larger contributions in oVRNNrf than in oVRNNbp, although these two models with 20 RNN units were comparable in terms of cue~reward state values (Fig. 2J-left).”

      and Fig. 6H and Line 412-416 (for our extended models with non-negative constraint):

      “Figure 6H shows contribution ratios of PCs of the time series of RNN activities in each model with 20 RNN units. Compared with the cases with naive/shuffled untrained RNN, in oVRNNbp-rev and oVRNNrf-bio, later components had relatively high contributions (PC5~20 p < 1.4×10<sup>−6</sup> (t-test vs naive) or < 0.014 (vs shuffled) in oVRNNbp-rev; PC6~20 p < 2.0×10<sup>−7</sup> (vs naive) or PC7~20 p < 5.9×10<sup>−14</sup> (vs shuffled) in oVRNNrf-bio), explaining their superior value-learning performance.”
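      The contribution-ratio analysis described above can be illustrated with a minimal sketch. It assumes that "contribution ratio" means the fraction of total variance explained by each principal component, and it uses synthetic data in place of recorded RNN activity; the function name `pc_contribution_ratios`, the random stand-in data, and the natural-log base are all assumptions for illustration, not the authors' code.

```python
import numpy as np


def pc_contribution_ratios(activity):
    """Fraction of total variance explained by each principal component.

    activity: array of shape (time_steps, n_units), one row per time step.
    """
    # Center each unit's time series before computing the components.
    centered = activity - activity.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional to PC variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Hypothetical stand-in for RNN activity: 20 units driven by 3 latent signals plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 3))
mixing = rng.normal(size=(3, 20))
activity = latent @ mixing + 0.01 * rng.normal(size=(1000, 20))

ratios = pc_contribution_ratios(activity)
log_ratios = np.log(ratios)  # the log base is an assumption
```

      In such a low-dimensional example, nearly all variance falls on the first few components; larger ratios for later components would indicate a higher-dimensional trajectory.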

      Regarding the poor performance of the model with untrained RNN, we would like to add a note. It is true that an untrained RNN with sufficient dimensions should be able to represent fewer than 10 different states well, and that state values should be learnable through TD learning whatever representation is used. However, a difficulty (nontriviality) arises because we modeled the tasks in a continuous way rather than in an episodic way, so the activity of the untrained RNN upon cue presentation should generally differ from trial to trial. Therefore, it was not trivial for the RNN to know that cue presentations in different trials, even after inter-trial intervals of random length, should constitute one and the same state. We have added this note in Line 177-185:

      “This inferiority of untrained RNN may sound odd because there were only four states from cue to reward while random RNN with enough units is expected to be able to represent many different states (c.f., [49]) and the effectiveness of training of only the readout weights has been shown in reservoir computing studies [50-53]. However, there was a difficulty stemming from the continuous training across trials (rather than episodic training of separate trials): the activity of untrained RNN upon cue presentation generally differed from trial to trial, and so it is non-trivial that cue presentation in different trials should be regarded as the same single state, even if it could eventually be dealt with at the readout level if the number of units increases.”

      The original value RNN study (Hennig et al., 2023, PLoS Comput Biol) also modeled tasks in a continuous way (though using backprop-through-time (BPTT) for training) and their model with untrained RNN also showed considerably larger RPE error than the value RNN even when the number of RNN units was 100 (the maximum number plotted in their Fig. 6A).

      Pub2. The impact of the article is limited by using a network with discrete time-steps, and only a small number of time steps from stimulus to reward. What is the length of each time step? If it's on the order of the membrane time constant, then a few time steps are only tens of ms. In the classical conditioning experiments typical delays are of the order of hundreds of milliseconds to seconds. Authors should test if random feedback weights work as well for larger time spans. This can be done by simply using a much larger number of time steps.

      In the revised manuscript, we examined the cases in which the cue-reward delay (originally 3 time steps) was extended to 4, 5, or 6 time-steps. Our online value RNN models with random feedback could still achieve better performance (smaller squared value error) than the models with untrained RNN, although the performance degraded as the cue-reward delay increased. We have added these results in Fig. 2M and Line 223-228 (for our original models without non-negative constraint):

      “We further examined the cases with longer cue-reward delays. As shown in Fig. 2M, as the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp and oVRNNrf over the model with untrained RNN remained to hold, except for cases with small number of RNN units (5) and long delay (5 or 6) (p < 0.0025 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units for each delay).”

      and Fig. 6J and Line 422-429 (for our extended models with non-negative constraint):

      “Figure 6J shows the cases with longer cue-reward delays, with default or halved learning rates. As the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp-rev and oVRNNrf-bio over the models with untrained RNN remained to hold, except for a few cases with 5 RNN units (5 delay oVRNNrf-bio vs shuffled with default learning rate, 6 delay oVRNNrf-bio vs naive or shuffled with halved learning rate) (p < 0.047 in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units for each delay).”

      Also, we have added the note about our assumption and consideration on the time-step that we described in our provisional reply in Line 136-142:

      “We assumed that a single RNN unit corresponds to a small population of neurons that intrinsically share inputs and outputs, for genetic or developmental reasons, and the activity of each unit represents the (relative) firing rate of the population. Cortical population activity is suggested to be sustained not only by fast synaptic transmission and spiking but also, even predominantly, by slower synaptic neurochemical dynamics [46] such as short-term facilitation, whose time constant can be around 500 milliseconds [47]. Therefore, we assumed that single time-step of our rate-based (rather than spike-based) model corresponds to 500 milliseconds.”

      Pub3. In the section with more biologically constrained learning rules, while the output weights are restricted to only be positive (as well as the random feedback weights), the recurrent weights and weights from input to RNN are still bi-polar and can change signs during learning. Why is the constraint imposed only on the output weights? It seems reasonable that the whole setup will fail if the recurrent weights were only positive as in such a case most neurons will have very similar dynamics, and the network dimensionality would be very low. However, it is possible that only negative weights might work. It is unclear to me how to justify that bipolar weights that change sign are appropriate for the recurrent connections and inappropriate for the output connections. On the other hand, an RNN with excitatory and inhibitory neurons in which weight signs do not change could possibly work.

      We examined extended models that incorporated inhibitory and excitatory units and followed Dale's law with certain assumptions, and found that these models could still learn the tasks. We have added these results in Fig. 9 and subsection “4.1 Models with excitatory and inhibitory units” and described the details of the extended models in Line 844-862.

      Pub4. Like most papers in the field this work assumes a world composed of a single cue. In the real world there are many more cues than rewards, some cues are not associated with any rewards, and some are associated with other rewards or even punishments. In the simplest case, it would be useful to show that this network could actually work if there are additional distractor cues that appear at random either before the CS, or between the CS and US. There are good reasons to believe such distractor cues will be fatal for an untrained RNN, but might work with a trained RNN, either using BPTT or random feedback. Although this assumption is a common flaw in most work in the field, we should no longer ignore these slightly more realistic scenarios.

      We examined the performance of the models in a task in which a distractor cue appeared at random. As a result, our model with random feedback, as well as the model with backprop, could still learn the state values much better than the models with untrained RNN. We have added these results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      Reviewer #1 (Recommendations for the authors):

      Detailed comments to authors

      Rec1. Are the untrained RNNs discussed in methods? It seems quite good in estimating value but has a strong dopamine response at time of reward. Is nothing trained in the untrained RNN or are the W values trained. Untrained RNN are not bad at estimating value, but not as good as the two other options. It would seem reasonable that an untrained RNN (if I understand what it is) will be sufficient for such simple Pavlovian conditioning paradigms. This is provided that the RNN generates a complete, or nearly complete basis. Random RNN's provided that the random weights are chosen properly can indeed generate a nearly complete basis. Once there is a nearly complete temporal basis, it seems that a powerful enough learning rule will be able to learn the very simple Pavlovian conditioning. Since there are only 3 time-steps from cue to reward, an RNN dimensionality of 3 would be sufficient. A failure to get a good approximation can also arise from the failure of the learning algorithm for the output weights (W).

      As we mentioned in our reply to your public comment Pub1 (page 3-5), we have added an explanation of "untrained RNN" (in which the value weights were still learnt) (Line 144-147). We also analyzed the dimensionality of network dynamics by calculating the contribution ratios of principal components of the trajectory of RNN activities, showing that the contribution ratios of later principal components were smaller in the cases with untrained RNN than in the cases with trained value RNN (Fig. 2K/Line 210-220, Fig.6H/Line 412-416). Moreover, also as we mentioned in our reply to your public comment Pub1, we have added a note that even learning of a small number of states was not trivially easy because we considered continuous learning across trials rather than episodic learning of separate trials and thus it was not trivial for the model to know that cue presentation in different trials after random lengths of inter-trial interval should still be regarded as a same single state (Line 177-185).

      Rec2. For all cases, it will be useful to estimate the dimensionality of the RNN. Is the dimensionality of the untrained RNN smaller than in the trained cases? If this is the case, this might depend on the choice of the initial random (I assume) recurrent connectivity matrix.

      As mentioned above, we have analyzed the dimensionality of the network dynamics, and as you said, the dimensionality of the model with untrained RNN (which was indeed the initial random matrix as you said, as we mentioned above) was on average smaller than the trained value RNN models (Fig. 2K/Line 210-220, Fig.6H/Line 412-416).

      Rec3. It is surprising that the error starts increasing for more RNN units above ~15. See discussion. This might indicate a failure to adjust the learning parameters of the network rather than a true and interesting finding.

      Thank you very much for this insightful comment. In the original manuscript, we set the learning rate to a fixed value (0.1), without normalization by the squared norm of the feature vector (as we mentioned in Line 656-7 of the original manuscript), because we thought such a normalization could not be locally (biologically) implemented. However, we have realized that the lack of normalization resulted in an excessively large learning rate when the number of RNN units was large, which could cause instability and an increase in error, as you suggested. Therefore, in the revised manuscript, we have implemented a normalization of the learning rate (of value weights) that does not require non-local computations, specifically, division by the number of RNN units. As a result, the error now decreased monotonically as the number of RNN units increased in the non-negatively constrained models (Fig. 6E-left), and also largely in the unconstrained model with random feedback, although still not in the unconstrained model with backprop or untrained RNN (Fig. 2J-left).
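      The effect of dividing the learning rate by the number of RNN units can be sketched as follows. This is a minimal illustration assuming a standard linear TD update of the value weights (value = w·x); the function name `td_value_weight_update`, the discount factor `gamma`, and the all-ones activity vectors are hypothetical choices, not taken from the manuscript.

```python
import numpy as np


def td_value_weight_update(w, x_prev, x_curr, reward, base_rate=0.1, gamma=0.8):
    """One TD update of value weights with the rate divided by the unit count.

    w: value weights, shape (n_units,)
    x_prev, x_curr: RNN activity at the previous and current time steps
    gamma: discount factor (assumed value, for illustration only)
    """
    n_units = w.shape[0]
    # TD reward prediction error from linear value estimates.
    delta = reward + gamma * (w @ x_curr) - (w @ x_prev)
    # Normalization uses only the number of units, not the norm of x.
    rate = base_rate / n_units
    return w + rate * delta * x_prev, delta


# With comparable per-unit activity, the change in the predicted value
# no longer grows with the number of units.
for n in (5, 50):
    w_new, delta = td_value_weight_update(np.zeros(n), np.ones(n), np.zeros(n), 1.0)
    value_change = w_new @ np.ones(n)
    print(n, delta, value_change)
```

      Without the division, the same update would change the predicted value in proportion to the number of units, which is one way an overly large effective step size can arise.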

      Rec4. Not numbering equations is a problem. For example, the explanations of feedback alignment (lines 194-206) rely on equations in the methods section which are not numbered. This makes it hard to read these explanations. Indeed, it will also be better to include a detailed derivation of the explanation in these lines in a mathematical appendix. Key equations should be numbered.

      We have added numbers to key equations in the Methods, and references to the numbers of corresponding equations in the main text. Detailed derivations are included in the Methods.

      Rec5. What is shown in Figure 3C? - an equation will help.

      We have added an explanation using equations in the main text (Line 256-259).

      Rec6. The explanation of why alignment occurs is not satisfactory, but neither is it in previous work on feedforward networks. The least that should be done though

      Regarding why alignment occurs, what remained mysterious (to us) was the following. In the non-negatively constrained model, the angle between the value weight vector (w) and the random feedback vector (c) was relatively small (loosely aligned) from the beginning, but it appeared (as mentioned in the manuscript) that there was no further alignment over trials, even though the same mechanism for feedback alignment that we derived for the model without non-negative constraint was expected to operate also under the non-negative constraint. We have now clarified the reason for this, and found a way, namely introducing a slight decay (forgetting) of value weights, by which feedback alignment came to occur in the non-negatively constrained model. We have added these in the revised manuscript (Line 463-477):

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we guessed that one or a few element(s) of w grew prominently during learning, and so w became close to an edge or boundary of the non-negative quadrant and thereby angle between w and other vector became generally large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to smallest ones after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (c.f., [59-61]) was assumed, such a prominent growth of a few elements of w may be mitigated and alignment of w to c, beyond the initial loose alignment because of the non-negative constraint, may occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when the value-weight decay was assumed (Fig. 8G), however, presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”
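      The geometric argument above can be sketched numerically. The snippet below is only an illustration under stated assumptions: the subtractive form of the decay, the clipping at zero used for the non-negative constraint, and the example vectors `c`, `w_peaked`, and `w_even` are all hypothetical and are not the authors' implementation or data.

```python
import numpy as np


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


def update_with_decay(w, x, delta, rate, decay):
    """TD-style update of value weights with a slight decay (assumed form)."""
    w_new = w + rate * delta * x - decay * w
    # Non-negative constraint, implemented here by clipping at zero (assumption).
    return np.maximum(w_new, 0.0)


# Hypothetical non-negative feedback vector and two value-weight vectors.
c = np.array([0.6, 0.9, 0.4, 0.7])
w_peaked = np.array([5.0, 0.1, 0.1, 0.1])  # one element grew prominently
w_even = np.array([1.2, 1.0, 0.9, 1.1])    # growth of single elements mitigated

angle_peaked = angle_deg(w_peaked, c)
angle_even = angle_deg(w_even, c)
```

      In this example the peaked vector lies near an edge of the non-negative region and makes a larger angle with c than the more even vector, although both angles stay below 90° because all elements are non-negative.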

      Rec7. I don't understand the qualitative difference between 4G and 4H. The difference seems to be smaller but there is still an apparent difference. Can this be quantified?

      We have added pointers indicating which were compared and statistical significance on Fig. 4D-H, and also Fig. 7 and Fig. 9C.

      Rec8. More biologically realistic constraints.

      Are the weights allowed to become negative? - No.

      Figure 6C - untrained RNN with non-negative x_i. Again - it was not explained what untrained RNN is. However, given my previous assumption, this is probably because the units developed in an untrained RNN is much further from representing a complete basis function. This cannot be done with only positive values. It would be useful to see network dynamics of units for untrained RNN. It might also be useful in all cases to estimate the dimensionality of the RNN. For 3 time-steps, it needs to be at least 3, and for more time steps as in Figure 4, larger.

      As we mentioned in our reply to your public comment Pub3 (page 6-8), in the revised manuscript we examined models that incorporated inhibitory and excitatory units and followed Dale's law, which could still learn the tasks (Fig. 9, Line 479-520). We have also analyzed the dimensionality of network dynamics as we mentioned in our replies to your public comment Pub1 and recommendations Rec1 and Rec2.

      Rec9. A new type of untrained RNN is introduced (Fig 6D) this is the first time an explanation of the untrained RNN is given. Indeed, the dimensionality of the second type of untrained RNN should be similar to the bioVRNNrf. The results are still not good.

      In the model with the new type of untrained RNN whose elements were shuffled from trained bioVRNNrf, contribution ratios of later principal components of the trajectory of RNN activities (Fig. 6H gray dotted line) were indeed larger than those in the model with naive untrained RNN (gray solid line) but still much smaller than those in the trained value RNN models with backprop (red line) or random feedback (blue line). We consider that in the value RNN, the RNN connections were trained to realize a high-dimensional trajectory, and shuffling did not generally preserve such an ability.

      Rec10. The discussion is too long and verbose. This is not a review paper.

      We have made the original discussion much more compact (from 1686 words to 940 words). We have added new discussion in response to the review comments, but the total length remains shorter than before (1589 words).

      Reviewer #2 (Public review):

      Summary:

      Tsurumi et al. show that recurrent neural networks can learn state and value representations in simple reinforcement learning tasks when trained with random feedback weights. The traditional method of learning for recurrent network in such tasks (backpropagation through time) requires feedback weights which are a transposed copy of the feed-forward weights, a biologically implausible assumption. This manuscript builds on previous work regarding "random feedback alignment" and "value-RNNs", and extends them to a reinforcement learning context. The authors also demonstrate that certain nonnegative constraints can enforce a "loose alignment" of feedback weights. The author's results suggest that random feedback may be a powerful tool of learning in biological networks, even in reinforcement learning tasks.

      Strengths:

      The authors describe well the issues regarding biologically plausible learning in recurrent networks and in reinforcement learning tasks. They take care to propose networks which might be implemented in biological systems and compare their proposed learning rules to those already existing in literature. Further, they use small networks on relatively simple tasks, which allows for easier intuition into the learning dynamics.

      Weaknesses:

      The principles discovered by the authors in these smaller networks are not applied to deeper networks or more complicated tasks, so it remains unclear to what degree these methods can scale up, or can be used more generally.

      We have examined extended models that incorporated inhibitory and excitatory units and followed Dale's law with certain assumptions, and found that these models could still learn the tasks. We have added these results in Fig. 9 and subsection “4.1 Models with excitatory and inhibitory units”.

      We have also examined the performance of the models in a task in which a distractor cue appeared at random, finding that our models could still learn the state values much better than the models with untrained RNN. We have added these results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      Regarding the depth, we continue to think about it but have not yet come up with concrete ideas.

      Reviewer #2 (Recommendations for the authors):

      (1) I think the work would greatly benefit from more proofreading. There are language errors/oddities throughout the paper, I will list just a few examples from the introduction:

      Thank you for pointing this out. We have made revisions throughout the paper.

      line 63: "simultaneously learnt in the downstream of RNN". Simultaneously learnt in networks downstream of the RNN? Simultaneously learnt in a downstream RNN? The meaning is not clear in the original sentence.

      We have revised it to "simultaneously learnt in connections downstream of the RNN" (Line 67-68).

      starting in line 65: " A major problem, among others.... value-encoding unit" is a run-on sentence and would be more readable if split into multiple sentences.

      We have extensively revised this part, which now consists of short sentences (Line 70-75).

      line 77: "in supervised learning of feed-forward network" should be either "in supervised learning of a feed-forward network" or "in supervised learning of feed-forward networks".

      We have changed "feed-forward network" to "feed-forward networks" (Line 83).

      (2) Under what conditions can you use an online learning rule which only considers the influence of the previous timestep? It's not clear to me how your networks solve the temporal credit assignment problem when the cue-reward delay in your tasks is 3-5ish time steps. How far can you stretch this delay before your networks stop learning correctly because of this one-step assumption? Further, how much does feedback alignment constrain your ability to learn long timescales, such as in Murray, J.M. (2019)?

      Our models can solve the temporal credit assignment problem, at least to a certain extent, presumably because the temporal-difference (TD) learning that we adopted can itself resolve temporal credit assignment, as exemplified by the fact that TD(0) algorithms without eligibility traces can still learn the value of distant rewards. We have added a discussion on this in Line 702-705:

      “…our models do not have "eligibility trace" (nor memorable/gated unit, different from the original value-RNN [26]), but could still solve temporal credit assignment to a certain extent because TD learning is by itself a solution for it (notably, recent work showed that combination of TD(0) and model-based RL well explained rat's choice and DA patterns [132]).”
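      The point that TD(0) without eligibility traces can still learn the value of a distant reward can be illustrated with a minimal tabular sketch. It uses a hypothetical four-state chain trained episodically with a table of state values, which differs from the continuous, RNN-based training of our models; the function name `td0_chain` and all parameter values are assumptions for illustration.

```python
import numpy as np


def td0_chain(n_states=4, gamma=0.8, rate=0.1, n_episodes=2000):
    """Tabular TD(0) on a chain; reward 1 is given on leaving the last state."""
    # The extra final entry is the terminal state, whose value stays at 0.
    values = np.zeros(n_states + 1)
    for _ in range(n_episodes):
        for s in range(n_states):
            reward = 1.0 if s == n_states - 1 else 0.0
            # One-step TD error: only the immediately following state is used.
            delta = reward + gamma * values[s + 1] - values[s]
            values[s] += rate * delta
    return values[:n_states]


learned = td0_chain()
# Discounted values expected for a reward 1, 2, 3, and 4 steps ahead.
expected = 0.8 ** np.arange(3, -1, -1)
```

      Each update looks only one step ahead, yet value propagates backward across repeated episodes, so the earliest state acquires the discounted value of the reward several steps later.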

      We have also examined the cases in which the cue-reward delay (originally 3 time steps) was extended to 4, 5, or 6 time-steps, and our models with random feedback could still achieve better performance than the models with untrained RNN, although the performance degraded as the cue-reward delay increased. We have added these results in Fig. 2M and Line 223-228 (for our original models without non-negative constraint):

      “We further examined the cases with longer cue-reward delays. As shown in Fig. 2M, as the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp and oVRNNrf over the model with untrained RNN remained to hold, except for cases with small number of RNN units (5) and long delay (5 or 6) (p < 0.0025 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units for each delay).”

      and Fig. 6J and Line 422-429 (for our extended models with non-negative constraint):

      “Figure 6J shows the cases with longer cue-reward delays, with default or halved learning rates. As the delay increased, the mean squared error of state values (at 3000-th trial) increased, but the relative superiority of oVRNNbp-rev and oVRNNrf-bio over the models with untrained RNN remained to hold, except for a few cases with 5 RNN units (5 delay oVRNNrf-bio vs shuffled with default learning rate, 6 delay oVRNNrf-bio vs naive or shuffled with halved learning rate) (p < 0.047 in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units for each delay).”

      As for the difficulty due to random feedback compared to backprop, there appeared to be little difference in the models without non-negative constraint (Fig. 2M), whereas in the models with nonnegative constraint, when the cue-reward delay was elongated to 6 time-steps, the model with random feedback performed worse than the model with backprop (Fig. 6J bottom-left panel).

      (3) Line 150: Were the RNN methods trained with continuation between trials?

      Yes, we have added

      “The oVRNN models, and the model with untrained RNN, were continuously trained across trials in each task, because we considered that it was ecologically more plausible than episodic training of separate trials.” in Line 147-150. This is considered to make learning of even the simple cue-reward association task nontrivial, as we describe in our reply to your comment 9 below.

      (4) Figure 2I, J: indicate the statistical significance of the difference between the three methods for each of these measures.

      We have added statistical information for Fig. 2J (Line 198-203):

      “As shown in the left panel of Fig. 2J, on average across simulations, oVRNNbp and oVRNNrf exhibited largely comparable performance and always outperformed the untrained RNN (p < 0.00022 in Wilcoxon rank sum test for oVRNNbp or oVRNNrf vs untrained for each number of RNN units), although oVRNNbp somewhat outperformed or underperformed oVRNNrf when the number of RNN units was small (≤10 (p < 0.049)) or large (≥25 (p < 0.045)), respectively.”

      and also Fig. 6E (for non-negative models) (Line 385-390):

      “As shown in the left panel of Fig. 6E, oVRNNbp-rev and oVRNNrf-bio exhibited largely comparable performance and always outperformed the models with untrained RNN (p < 2.5×10<sup>−12</sup> in Wilcoxon rank sum test for oVRNNbp-rev or oVRNNrf-bio vs naive or shuffled untrained for each number of RNN units), although oVRNNbp-rev somewhat outperformed or underperformed oVRNNrf-bio when the number of RNN units was small (≤10 (p < 0.00029)) or large (≥25 (p < 3.7×10<sup>−6</sup>)), respectively…”

      Fig. 2I shows distributions, whose means are plotted in Fig. 2J, and we did not add statistics to Fig. 2I itself.

      (5) Line 178: Has learning reached a steady state after 1000 trials for each of these networks? Can you show a plot of error vs. trial number?

      We have added a plot of error vs trial number for original models (Fig. 2L, Line 221-223):

      “We examined how learning proceeded across trials in the models with 20 RNN units. As shown in Fig. 2L, learning became largely converged by 1000-th trial, although slight improvement continued afterward.”

      and non-negatively constrained models (Fig. 6I, Line 417-422):

      “Figure 6I shows how learning proceeded across trials in the models with 20 RNN units. While oVRNNbp-rev and oVRNNrf-bio eventually reached a comparable level of errors, oVRNNrf-bio outperformed oVRNNbp-rev in early trials (at 200, 300, 400, or 500 trials; p < 0.049 in Wilcoxon rank sum test for each). This is presumably because the value weights did not develop well in early trials and so the backprop-type feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked finely from the beginning.”

      As shown in these figures, learning became largely steady at 1000 trials, but still slightly continued, and we have added simulations with 3000 trials (Fig. 2M and Fig. 6J).

      (6) Line 191: Put these regression values in the figure caption, as well as on the plot in Figure 3B.

      We have added the regression values in Fig. 3B and its caption.

      (7) Line 199: This idea of being in the same quadrant is interesting, but I think the term "relatively close angle" is too vague. Is there another more quantitative way to describe what you mean by this?

      We have revised this (Line 252-254) to “a vector that is in a relatively close angle with c, or more specifically, is in the same quadrant as (and thus within at maximum 90° from) c (for example, [c<sub>1</sub> c<sub>2</sub> c<sub>3</sub>]<sup>T</sup> and [0.5c<sub>1</sub> 1.2c<sub>2</sub> 0.8c<sub>3</sub>]<sup>T</sup>)”
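      The "same quadrant" criterion can be checked numerically with a short sketch. Two vectors whose elements share signs have a positive inner product, so the angle between them is below 90°. The helper names `same_orthant` and `angle_deg`, and the randomly drawn vector `c`, are hypothetical and serve only as an illustration.

```python
import numpy as np


def angle_deg(u, v):
    """Angle between two vectors in degrees."""
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))


def same_orthant(u, v):
    """True if every pair of corresponding elements has the same sign."""
    return bool(np.all(np.sign(u) == np.sign(v)))


# Hypothetical feedback vector c and the rescaled example from the text.
rng = np.random.default_rng(1)
c = rng.uniform(-1.0, 1.0, size=3)
v = np.array([0.5, 1.2, 0.8]) * c

in_same_orthant = same_orthant(c, v)
angle = angle_deg(c, v)
```

      Positive element-wise rescaling keeps the vector in the same quadrant as c, which bounds the angle by 90° but does not by itself make the angle small.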

      (8) Line 275: I'd like to see this measure directly in a plot, along with the statistical significance.

      We have added pointers indicating which were compared and statistical significance on Fig. 4D-H, and also Fig. 7 and Fig. 9C.

      (9) Line 280: Surely the untrained RNN should be able to solve the task if the reservoir is big enough, no? Maybe much bigger than 50 units, but still.

      We are not certain that this is the case. A difficulty arises because we modeled the tasks in a continuous way rather than in an episodic way (as we mentioned in our reply to your comment 3), so the activity of the untrained RNN upon cue presentation should generally differ from trial to trial. Therefore, it was not trivial for the RNN to know that cue presentations in different trials, even after inter-trial intervals of random length, should constitute one and the same state. We have added this note in Line 177-185:

      “This inferiority of untrained RNN may sound odd because there were only four states from cue to reward while random RNN with enough units is expected to be able to represent many different states (c.f., [49]) and the effectiveness of training of only the readout weights has been shown in reservoir computing studies [50-53]. However, there was a difficulty stemming from the continuous training across trials (rather than episodic training of separate trials): the activity of untrained RNN upon cue presentation generally differed from trial to trial, and so it is non-trivial that cue presentation in different trials should be regarded as the same single state, even if it could eventually be dealt with at the readout level if the number of units increases.”

      The original value RNN study (Hennig et al., 2023, PLoS Comput Biol) also modeled tasks in a continuous way (though using BPTT for training) and their model with untrained RNN also showed considerably larger RPE error than the value RNN even when the number of RNN units was 100 (the maximum number plotted in their Fig. 6A).

      (10) It's a bit confusing to compare Figure 4C to Figure 4D-H because there are also many features of D-H which do not match those of C (response to cue, response to late reward in task 1). It would make sense to address this in some way. Is there another way to calculate the true values of the states (e.g., maybe you only start from the time of the cue) which better approximates what the networks are doing?

      As we mentioned in our replies to your comments 3 and 9, our models with RNN were trained continuously across trials rather than separately for each episodic trial, and whether the models could still learn the state representation is a key issue. Therefore, starting learning from the time of cue would not be an appropriate way to compare the models, and instead we have made statistical comparison regarding key features, specifically, TD-RPEs at early and late rewards, as indicated in Fig. 4D-H.

      (11) Line 309: Can you explain why this non-monotonic feature exists? Why do you believe it would be more biologically plausible to assume monotonic dependence? It doesn't seem so straightforward to me, I can imagine that competing LTP/LTD mechanisms may produce plasticity which would have a non-monotonic dependence on post-synaptic activity.

      Thank you for this insightful comment. As you suggested, non-monotonic dependence on the postsynaptic activity (BCM rule) has been proposed for unsupervised learning (cortical self-organization) (Bienenstock et al., 1982 J Neurosci), and there were suggestions that triplet-based STDP could be reduced to a BCM-like rule and additional components (Gjorgjieva et al., 2011 PNAS; Shouval, 2011 PNAS). However, the non-monotonicity that appeared in our model, derived from the backprop rule, is maximized at the middle and is thus opposite to the BCM rule, which is minimized at the middle (i.e., it initially decreases and thereafter increases). Therefore we consider that such an increase-then-decrease-type non-monotonicity would be less plausible than a monotonic increase, which could approximate an extreme case (with a minimum dip) of the BCM rule. We have added a note on this point in Line 355-358:

      “…the dependence on the post-synaptic activity was non-monotonic, maximized at the middle of the range of activity. It would be more biologically plausible to assume a monotonic increase (while an opposite shape of non-monotonicity, once decrease and thereafter increase, called the BCM (Bienenstock-Cooper-Munro) rule has actually been suggested [56-58]).”
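
      To make the shape of this dependence concrete, the following minimal sketch assumes a standard logistic sigmoid, whose derivative written in terms of its output a is a(1 − a). The function names and the monotonic alternative are hypothetical illustrations, not the manuscript's exact rule.

```python
def backprop_post_factor(activity):
    """Sigmoid derivative expressed via the unit's output a in [0, 1]: a * (1 - a).

    Assumption: a standard logistic sigmoid; the manuscript's exact scaling may differ.
    """
    return activity * (1.0 - activity)


def monotonic_post_factor(activity):
    """Hypothetical monotonic alternative: the factor simply grows with activity."""
    return activity


# Evaluate both factors on an evenly spaced grid of post-synaptic activity levels.
levels = [i / 10 for i in range(11)]
bp_profile = [backprop_post_factor(a) for a in levels]
mono_profile = [monotonic_post_factor(a) for a in levels]

for a, bp, mono in zip(levels, bp_profile, mono_profile):
    print(f"activity={a:.1f}  backprop-type={bp:.2f}  monotonic={mono:.2f}")
```

      Under this assumption the backprop-derived factor peaks at the middle of the activity range and vanishes at both ends, whereas the illustrative monotonic factor keeps increasing.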

      (12) Line 363: This is the most exciting part of the paper (for me). I want to learn way more about this! Don't hide this in a few sentences. I want to know all about loose vs. feedback alignment. Show visualizations in 3D space of the idea of loose alignment (starting in the same quadrant), and compare it to how feedback alignment develops (ending in the same quadrant). Does this "loose" alignment idea give us an idea why the random feedback seems to settle at a 45 degree angle? It just needs to get the signs right (same quadrant) for each element?

      In reply to this encouraging comment, we have made further analyses of the loose alignment. By the term "loose alignment", we meant that the value weight vector w and the feedback vector c are in the same (non-negative) quadrant, as you said. But what remained mysterious (to us) was that, while the angle between w and c was relatively small (loosely aligned) from the beginning, it appeared (as mentioned in the manuscript) that there was no further alignment over trials (and the angle actually settled at somewhat larger than 45°), even though the same mechanism for feedback alignment that we derived for the model without non-negative constraint was expected to operate also under the non-negative constraint. We have now clarified the reason for this, and found a way, namely the introduction of slight decay (forgetting) of value weights, by which feedback alignment came to occur in the non-negatively constrained model. We have added this in Line 463-477:

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we guessed that one or a few element(s) of w grew prominently during learning, and so w became close to an edge or boundary of the non-negative quadrant and thereby angle between w and other vector became generally large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to smallest ones after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (c.f., [59-61]) was assumed, such a prominent growth of a few elements of w may be mitigated and alignment of w to c, beyond the initial loose alignment because of the non-negative constraint, may occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when the value-weight decay was assumed (Fig. 8G), however, presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”

      As for visualization, because the model's dimension was high (e.g., 12), we could not come up with better ways of visualization than the trial versus angle plot (Fig. 3A, 8A,F). Nevertheless, we expect that the abovementioned additional analyses of loose alignment (with graphs) are useful for understanding what is going on.
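
      The idea of loose alignment can also be checked numerically rather than visually. The sketch below is only an illustration under stated assumptions: vector elements are drawn uniformly (from [0, 1) for the non-negative case and from [-1, 1] for the signed case), the dimension 12 follows the example above, and the helper names `angle_deg` and `mean_angle` are hypothetical. It does not reproduce the learning dynamics of the models.

```python
import math
import random


def angle_deg(u, v):
    """Angle in degrees between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def mean_angle(sample, dim=12, pairs=2000, seed=0):
    """Average angle between independently drawn random vector pairs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        u = [sample(rng) for _ in range(dim)]
        v = [sample(rng) for _ in range(dim)]
        total += angle_deg(u, v)
    return total / pairs


# Non-negative elements: both vectors lie in the same (non-negative) quadrant.
nonneg_mean = mean_angle(lambda rng: rng.random())
# Signed elements: no constraint on the quadrant.
signed_mean = mean_angle(lambda rng: rng.uniform(-1.0, 1.0))
# A vector dominated by one element sits near an edge of the quadrant.
edge_angle = angle_deg([1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])

print(f"mean angle, non-negative vectors: {nonneg_mean:.1f} deg")
print(f"mean angle, signed vectors:       {signed_mean:.1f} deg")
print(f"edge vector vs uniform direction: {edge_angle:.1f} deg")
```

      In this toy setting, random non-negative vectors are on average closer to each other than random signed vectors, and a vector dominated by a single element, such as (1, 0, 0, 0), is 60° away from the uniform direction (1, 1, 1, 1), which illustrates why prominent growth of a few elements of w can keep the angle large.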

      (13) Line 426: How does this compare to some of the reward-modulated Hebbian rules proposed in other RNNs? See Hoerzer, G. M., Legenstein, R., & Maass, W. (2014). Put another way, you arrived at this from a top-down approach (gradient descent->BP->approximated by RF->non-negativity constraint->leads to DA dependent modulation of Hebbian plasticity). How might this compare to a bottom-up approach (i.e. starting from the principle of Hebbian learning, and adding in reward modulation)?

      The study of Hoerzer et al. 2014 used a stochastic perturbation, which we did not assume but can potentially be integrated. On the other hand, Hoerzer et al. trained the readout of untrained RNN, whereas we trained both RNN and its readout. We have added discussion to compare our model with Hoerzer et al. and other works that also used perturbation methods, as well as other top-down approximation methods, in Line 685-711 (reference 128 is Hoerzer et al. 2014 Cereb Cortex):

      “As an alternative to backprop in hierarchical network, aside from feedback alignment [36], Associative Reward-Penalty (A<sub>R-P</sub>) algorithm has been proposed [124-126]. In A<sub>R-P</sub>, the hidden units behave stochastically, allowing the gradient to be estimated via stochastic sampling. Recent work [127] has proposed Phaseless Alignment Learning (PAL), in which high-frequency noise-induced learning of feedback projections proceeds simultaneously with learning of forward projections using the feedback in a lower frequency. Noise-induced learning of the weights on readout neurons from untrained RNN by reward-modulated Hebbian plasticity has also been demonstrated [128]. Such noise- or perturbation-based [40] mechanisms are biologically plausible because neurons and neural networks can exhibit noisy or chaotic behavior [129-131], and might improve the performance of value-RNN if implemented.

      Regarding learning of RNN, "e-prop" [35] was proposed as a locally learnable online approximation of BPTT [27], which was used in the original value RNN [26]. In e-prop, neuron-specific learning signal is combined with weight-specific locally-updatable "eligibility trace". Reward-based e-prop was also shown to work [35], both in a setup not introducing TD-RPE with symmetric or random feedback (their Supplementary Figure 5) and in another setup introducing TD-RPE with symmetric feedback (their Figure 4 and 5). Compared to these, our models differ in multiple ways.

      First, we have shown that alignment to random feedback occurs in the models driven by TD-RPE. Second, our models do not have "eligibility trace" (nor memorable/gated unit, different from the original value-RNN [26]), but could still solve temporal credit assignment to a certain extent because TD learning is by itself a solution for it (notably, recent work showed that combination of TD(0) and model-based RL well explained rat's choice and DA patterns [132]). However, as mentioned before, single time-step in our models was assumed to correspond to hundreds of milliseconds, incorporating slow synaptic dynamics, whereas e-prop is an algorithm for spiking neuron models with a much finer time scale. From this aspect, our models could be seen as a coarse-time-scale approximation of e-prop. On top of these, our results point to a potential computational benefit of biological non-negative constraint, which could effectively limit the parameter space and promote learning.”

      Related to your latter point (and also replying to other reviewer's comment), we also examined the cases where the random feedback in our model was replaced with uniform feedback, which corresponds to a simple bottom-up reward-modulated triplet plasticity rule. As a result, the model with uniform feedback showed largely comparable, but somewhat worse, performance than the model with random feedback. We have added the results in Fig. 2J-right and Line 206-209 (for our original models without non-negative constraint):

      “The green line in Fig. 2J-right shows the performance of a special case where the random feedback in oVRNNrf was fixed to the direction of (1, 1, ..., 1)<sup>T</sup> (i.e., uniform feedback) with a random coefficient, which was largely comparable to, but somewhat worse than, that for the general oVRNNrf (blue line).”

      and Fig. 6E-right and Line 402-407 (for our extended models with non-negative constraint):

      “The green and light blue lines in the right panels of Figure 6E and Figure 6F show the results for special cases where the random feedback in oVRNNrf-bio was fixed to the direction of (1, 1, ..., 1) <sup>T</sup> (i.e., uniform feedback) with a random non-negative magnitude (green line) or a fixed magnitude of 0.5 (light blue line). The performance of these special cases, especially the former (with random magnitude) was somewhat worse than that of oVRNNrf-bio, but still better than that of the models with untrained RNN.”

      and also added a biological implication of the results in Line 644-652:

      “We have shown that oVRNNrf and oVRNNrf-bio could work even when the random feedback was uniform, i.e., fixed to the direction of (1, 1, ..., 1) <sup>T</sup>, although the performance was somewhat worse. This is reasonable because uniform feedback can still encode scalar TD-RPE that drives our models, in contrast to a previous study [45], which considered DA's encoding of vector error and thus regarded uniform feedback as a negative control. If oVRNNrf/oVRNNrf-bio-like mechanism indeed operates in the brain and the feedback is near uniform, alignment of the value weights w to near (1, 1, ..., 1) is expected to occur. This means that states are (learned to be) represented in such a way that simple summation of cortical neuronal activity approximates value, thereby potentially explaining why value is often correlated with regional activation (fMRI BOLD signal) of cortical regions [113].”

      Reviewer #3 (Public review):

      Summary:

      The paper studies learning rules in a simple sigmoidal recurrent neural network setting. The recurrent network has a single layer of 10 to 40 units. It is first confirmed that feedback alignment (FA) can learn a value function in this setting. Then so-called bio-plausible constraints are added: (1) when value weights (readout) are non-negative, (2) when the activity is non-negative (normal sigmoid rather than downscaled between -0.5 and 0.5), (3) when the feedback weights are non-negative, (4) when the learning rule is revised to be monotonic: the weights are not downregulated. In the simple task considered, none of the four biological features appears to totally impair learning.

      Strengths:

      (1) The learning rules are implemented in a low-level fashion of the form: (pre-synaptic-activity) x (post-synaptic-activity) x feedback x RPE. They are therefore interpretable in terms of quantities measurable in the wet lab.

      (2) I find that non-negative FA (FA with non negative c and w) is the most valuable theoretical insight of this paper: I understand why the alignment between w and c is automatically better at initialization.

      (3) The task choice is relevant since it connects with experimental settings of reward conditioning with possible plasticity measurements.

      Weaknesses:

      (4) The task is rather easy, so it's not clear that it really captures the computational gap that exists between FA (gradient-like learning) and a simpler learning rule like a delta rule: RPE x (pre-synaptic) x (post-synaptic). To check that the task is not too trivial, I suggest adding a control where the vector c is constant c_i=1.

      We have examined the cases where the feedback was uniform, i.e., in the direction of (1, 1, ..., 1) in both models without and with non-negative constraint. In both models, the models with uniform feedback performed somewhat worse than the original models with random feedback, but still better than the models with untrained RNN. We have added the results in Fig. 2J-right and Line 206-209 (for our original models without non-negative constraint):

      “The green line in Fig. 2J-right shows the performance of a special case where the random feedback in oVRNNrf was fixed to the direction of (1, 1, ..., 1) <sup>T</sup> (i.e., uniform feedback) with a random coefficient, which was largely comparable to, but somewhat worse than, that for the general oVRNNrf (blue line).”

      and Fig. 6E-right and Line 402-407 (for our extended models with non-negative constraint):

      “The green and light blue lines in the right panels of Figure 6E and Figure 6F show the results for special cases where the random feedback in oVRNNrf-bio was fixed to the direction of (1, 1, ..., 1) <sup>T</sup> (i.e., uniform feedback) with a random non-negative magnitude (green line) or a fixed magnitude of 0.5 (light blue line). The performance of these special cases, especially the former (with random magnitude) was somewhat worse than that of oVRNNrf-bio, but still better than that of the models with untrained RNN.”

      We have also added a discussion on the biological implication of the model with uniform feedback mentioned in our provisional reply in Line 644-652:

      “We have shown that oVRNNrf and oVRNNrf-bio could work even when the random feedback was uniform, i.e., fixed to the direction of (1, 1, ..., 1) <sup>T</sup>, although the performance was somewhat worse. This is reasonable because uniform feedback can still encode scalar TD-RPE that drives our models, in contrast to a previous study [45], which considered DA's encoding of vector error and thus regarded uniform feedback as a negative control. If oVRNNrf/oVRNNrf-bio-like mechanism indeed operates in the brain and the feedback is near uniform, alignment of the value weights w to near (1, 1, ..., 1) is expected to occur. This means that states are (learned to be) represented in such a way that simple summation of cortical neuronal activity approximates value, thereby potentially explaining why value is often correlated with regional activation (fMRI BOLD signal) of cortical regions [113].”

      In addition, while preparing the revised manuscript, we found a recent simulation study, which showed that uniform feedback coupled with positive forward weights was effective in supervised learning of one-dimensional output in feed-forward network (Konishi et al., 2023, Front Neurosci).

      We have briefly discussed this work in Line 653-655:

      “Notably, uniform feedback coupled with positive forward weights was shown to be effective also in supervised learning of one-dimensional output in feed-forward network [114], and we guess that loose alignment may underlie it.”

      (5) Related to point 3), the main strength of this paper is to draw potential connection with experimental data. It would be good to highlight more concretely the prediction of the theory for experimental findings. (Ideally, what should be observed with non-negative FA that is not expected with FA or a delta rule (constant global feedback) ?).

      We have added a discussion on the prediction of our models, mentioned in our provisional reply, in Line 627-638:

      “oVRNNrf predicts that the feedback vector c and the value-weight vector w become gradually aligned, while oVRNNrf-bio predicts that c and w are loosely aligned from the beginning. Element of c could be measured as the magnitude of pyramidal cell's response to DA stimulation. Element of w corresponding to a given pyramidal cell could be measured, if striatal neuron that receives input from that pyramidal cell can be identified (although technically demanding), as the magnitude of response of the striatal neuron to activation of the pyramidal cell. Then, the abovementioned predictions could be tested by (i) identify cortical, striatal, and VTA regions that are connected, (ii) identify pairs of cortical pyramidal cells and striatal neurons that are connected, (iii) measure the responses of identified pyramidal cells to DA stimulation, as well as the responses of identified striatal neurons to activation of the connected pyramidal cells, and (iv) test whether DA→pyramidal responses and pyramidal→striatal responses are associated across pyramidal cells, and whether such associations develop through learning.”

      Moreover, we have considered another (technically more doable) prediction of our model, and described it in Line 639-643:

      “Testing this prediction, however, would be technically quite demanding, as mentioned above. An alternative way of testing our model is to manipulate the cortical DA feedback and see if it will cause (re-)alignment of value weights (i.e., cortical striatal strengths). Specifically, our model predicts that if DA projection to a particular cortical locus is silenced, effect of the activity of that locus on the value-encoding striatal activity will become diminished.”

      (6a) Random feedback with RNN in RL has been studied in the past, so it is maybe worth giving some insights into how the results and the analyses compare to this previous line of work (for instance in this paper [1]). For instance, I am not very surprised that FA also works for value prediction with TD error. It is also expected from the literature that the RL + RNN + FA setting would scale to tasks that are more complex than the conditioning problem proposed here, so is there a more specific take-home message about non-negative FA? Or benefits from this simpler toy task? [1] https://www.nature.com/articles/s41467-020-17236-y

      As for a specific feature of non-negative models, we did not describe (actually did not well recognize) an intriguing result that the non-negative random feedback model performed generally better than the models without non-negative constraint with either backprop or random feedback (Fig. 2J-left versus Fig. 6E-left (please mind the difference in the vertical scales)). This suggests that the non-negative constraint effectively limited the parameter space and thereby learning became efficient. We have added this result in Line 392-395:

      “Remarkably, oVRNNrf-bio generally achieved better performance than both oVRNNbp and oVRNNrf, which did not have the non-negative constraint (Wilcoxon rank sum test, vs oVRNNbp : p < 7.8×10<sup>−6</sup> for 5 or ≥25 RNN units; vs oVRNNrf: p < 0.021 for ≤10 or ≥20 RNN units).”

      Also, in the models with non-negative constraint, the model with random feedback learned more rapidly than the model with backprop although they eventually reached a comparable level of errors, at least in the case with 20 RNN units. This is presumably because the value weights did not develop well in early trials and so the backprop-based feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked finely from the beginning. We have added this result in Fig. 6I and Line 417-422:

      “Figure 6I shows how learning proceeded across trials in the models with 20 RNN units. While oVRNNbp-rev and oVRNNrf-bio eventually reached a comparable level of errors, oVRNNrf-bio outperformed oVRNNbp-rev in early trials (at 200, 300, 400, or 500 trials; p < 0.049 in Wilcoxon rank sum test for each). This is presumably because the value weights did not develop well in early trials and so the backprop-type feedback, which was the same as the value weights, did not work well, while the non-negative fixed random feedback worked finely from the beginning.”

      We have also added a discussion on how our model can be positioned in relation to other models, including the study you mentioned (e-prop by Bellec, ..., Maass, 2020), in subsection “Comparison to other algorithms” of the Discussion.

      Regarding the slightly better performance of the non-negative model with random feedback than that of the non-negative model with backprop when the number of RNN units was large (mentioned in our provisional reply), state values in the backprop model appeared less developed than those in the random feedback model. Slightly better performance of random feedback than backprop held also in our extended model incorporating excitatory and inhibitory units (Fig. 9B).

      (6b) Related to task complexity, it is not clear to me if non-negative value and feedback weights would generally scale to harder tasks. If the task is so simple that a global RPE signal is sufficient to learn (see 4 and 5), then it could be good to extend the task to find a substantial gap between: global RPE, non-negative FA, FA, BP. For a well chosen task, I expect to see a performance gap between any pair of these four learning rules. In the context of the present paper, this would be particularly interesting to study the failure mode of non-negative FA and the cases where it does perform as well as FA.

      In the cue-reward association task with 3 time-steps delay, the non-negative model with random feedback performed largely comparably to the non-negative model with backprop, and this continued to hold in a task where a distractor cue, which was not associated with reward, appeared at random timings. We have added the results in Fig. 10 and subsection “4.2 Task with distractor cue”.

      We have also examined the cases where the cue-reward delay was elongated. In the case of longer cue-reward delay (6 time-steps), in the models without non-negative constraint, the model with random feedback performed comparably to (and slightly better than when the number of RNN units was large) the model with backprop (Fig. 2M). In contrast, in the models with non-negative constraint, the model with random feedback underperformed the model with backprop (Fig. 6J, left-bottom). This indicates a difference between the effect of non-negative random feedback and the effect of positive+negative random feedback.

      We have further examined the performance of the models in terms of action selection, by extending the models to incorporate an actor-critic algorithm. In a task with inter-temporal choice (i.e., immediate small reward vs delayed large reward), the non-negative model with random feedback performed worse than the non-negative model with backprop when the number of RNN units was small. When the number of RNN units increased, these models performed more comparably. These results are described in Fig. 11 and subsection “4.3 Incorporation of action selection”.

      (7) I find that the writing could be improved, it mostly feels more technical and difficult than it should. Here are some recommendations:

      7a) For instance, the technical description of the task (CSC) is not fully described and requires background knowledge from other papers, which is not desirable.

      7b) Also the rationale for the added difficulty with the stochastic reward and new state is not well explained.

      7c) In the technical description of the results I find that the text dives into descriptive comments of the figures but high-level take home messages would be helpful to guide the reader. I got a bit lost, although I feel that there is probably a lot of depth in these paragraphs.

      As for 7a), 'CSC (complete serial compound)' was actually not the name of the task but the name of the 'punctate' state representation, in which each state (timing from cue) is represented in a punctate manner, i.e., by a one-hot vector such as (1, 0, ..., 0), (0, 1, ..., 0), ..., and (0, 0, ..., 1). As you pointed out, using the name of 'CSC' would make the text appear more technical than it actually is, and so we have moved the reference to the name of 'CSC' to the Methods (Line 903-907):

      “For the agents with punctate state representation, which is also referred to as the complete serial compound (CSC) representation [1, 48, 133], each timing from a cue in the tasks was represented by a 10-dimensional one-hot vector, starting from (1 0 0 ... 0)<sup>T</sup> for the cue state, with the next state (0 1 0 ... 0) <sup>T</sup> and so on.”

      and in the Results we have instead added a clearer explanation (Line 163-165):

      “First, for comparison, we examined traditional TD-RL agent with punctate state representation (without using the RNN), in which each state (time-step from a cue) was represented in a punctate manner, i.e., by a one-hot vector such as (1, 0, ..., 0), (0, 1, ..., 0), and so on.”
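
      As an illustration of the punctate representation and of how a TD-RL agent can use it, here is a minimal sketch. The 10-dimensional one-hot coding follows the Methods text quoted above; the linear value readout, the learning rate `alpha`, the discount factor `gamma`, and the function names are assumptions made for this example rather than the manuscript's settings.

```python
def one_hot(index, size=10):
    """Punctate (CSC-like) state vector: 1 at the given time-step, 0 elsewhere."""
    if not 0 <= index < size:
        raise ValueError("index must lie in [0, size)")
    return [1.0 if i == index else 0.0 for i in range(size)]


def td_update(weights, state, next_state, reward, alpha=0.1, gamma=0.8):
    """One TD(0) step with a linear value readout V(s) = w . x(s).

    alpha and gamma are illustrative values, not taken from the manuscript.
    Returns the updated weights and the TD reward prediction error (TD-RPE).
    """
    value = sum(w * x for w, x in zip(weights, state))
    next_value = sum(w * x for w, x in zip(weights, next_state))
    rpe = reward + gamma * next_value - value
    new_weights = [w + alpha * rpe * x for w, x in zip(weights, state)]
    return new_weights, rpe


# Example: reward delivered on the transition out of state 2; weights start at zero.
weights = [0.0] * 10
weights, rpe = td_update(weights, one_hot(2), one_hot(3), reward=1.0)
print(rpe, weights[2])
```

      With one-hot states, each update changes only the weight of the currently active state, so the weight vector is simply a table of state values.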

      As for 7b), we have added the rationale for our examination of the tasks with probabilistic structures (Line 282-294):

      “Previous work [54] examined the response of DA neurons in cue-reward association tasks in which reward timing was probabilistically determined (early in some trials but late in other trials). There were two tasks, which were largely similar but there was a key difference that reward was given in all the trials in one task whereas reward was omitted in some randomly determined trials in another task. Starkweather et al. [54] found that the DA response to later reward was smaller than the response to earlier reward in the former task, presumably reflecting the animal's belief that delayed reward will surely come, but the opposite was the case in the latter task, presumably because the animal suspected that reward was omitted in that trial. Starkweather et al.[54] then showed that such response patterns could be explained if DA encoded TD-RPE under particular state representations that incorporated the probabilistic structures of the task (called the 'belief state'). In that study, such state representations were 'handcrafted' by the authors, but the subsequent work [26] showed that the original value-RNN with backprop (BPTT) could develop similar representations and reproduce the experimentally observed DA patterns.”

      As for 7c), we have extensively revised the text of the results, adding high-level explanations while trying to reduce the lengthy low-level descriptions (e.g., Line 172-177 for Fig. 2E-G).

      (8) Related to the writing issue and 5), I wished that "bio-plausibility" was not the only reason to study positive feedback and value weights. Is it possible to develop a bit more specifically what and why this positivity is interesting? Is there an expected finding with non-negative FA both in the model capability? or maybe there is a simpler and crisp take-home message to communicate the experimental predictions to the community would be useful?

      There is actually an unexpected finding with the non-negative model: the non-negative random feedback model performed generally better than the models without non-negative constraint with either backprop or random feedback (Fig. 2J-left versus Fig. 6E-left), presumably because the non-negative constraint effectively limited the parameter space and thereby learning became efficient, as we mentioned in our reply to your point 6a above (we did not well recognize this at the time of original submission).

      Another potential merit of our present work is the simplicity of the model and the task. This simplicity enabled us to derive an intuitive explanation of why feedback alignment could occur. Such an intuitive explanation was lacking in previous studies, while more precise mathematical explanations did exist. Related to the mechanism of feedback alignment, one thing remained mysterious to us at the time of original submission. Specifically, in the non-negatively constrained random feedback model, while the angle between the value weight (w) and the random feedback (c) was relatively small (loosely aligned) from the beginning, it appeared (as mentioned in the manuscript) that there was no further alignment over trials (and the angle actually settled at somewhat larger than 45°), even though the same mechanism for feedback alignment that we derived for the model without non-negative constraint was expected to operate also under the non-negative constraint. We have now clarified the reason for this, and found a way, namely the introduction of slight decay (forgetting) of value weights, by which feedback alignment came to occur in the non-negatively constrained model. We have added this in Line 463-477:

      “As mentioned above, while the angle between w and c was on average smaller than 90° from the beginning, there was no further alignment over trials. This seemed mysterious because the mechanism for feedback alignment that we derived for the models without non-negative constraint was expected to work also for the models with non-negative constraint. As a possible reason for the non-occurrence of feedback alignment, we guessed that one or a few element(s) of w grew prominently during learning, and so w became close to an edge or boundary of the non-negative quadrant and thereby angle between w and other vector became generally large (as illustrated in Fig. 8D). Figure 8Ea shows the mean±SEM of the elements of w ordered from the largest to smallest ones after 1500 trials. As conjectured above, a few elements indeed grew prominently.

      We considered that if a slight decay (forgetting) of value weights (c.f., [59-61]) was assumed, such a prominent growth of a few elements of w may be mitigated and alignment of w to c, beyond the initial loose alignment because of the non-negative constraint, may occur. These conjectures were indeed confirmed by simulations (Fig. 8Eb,c and Fig. 8F). The mean squared value error slightly increased when the value-weight decay was assumed (Fig. 8G), however, presumably reflecting a decrease in developed values and a deterioration of learning because of the decay.”

      Correction of an error in the original manuscript

      In addition to revising the manuscript according to your comments, we have made a correction on the way of estimating the true state values. Specifically, in the original manuscript, we defined states by relative time-steps from a reward and estimated their values by calculating the sums of discounted future rewards starting from them through simulations. However, we assumed variable inter-trial intervals (ITIs) (4, 5, 6, or 7 time-steps with equal probabilities), and so until receiving cue information, the agent should not know when the next reward will come. Therefore, states for the timings up to the cue timing cannot be defined by the upcoming reward, but previously we did so (e.g., state of "one time-step before cue") without taking into account the ITI variability.

      We have now corrected this issue, having defined the states of timings with respect to the previous (rather than upcoming) reward. For example, when ITI was 4 time-steps and agent existed in its last time-step, agent will in fact receive a cue at the next time-step, but agent should not know it until actually receiving the cue information and instead should assume that s/he was at the last time-step of ITI (if ITI was 4), last − 1 (if ITI was 5), last − 2 (if ITI was 6), or last − 3 (if ITI was 7) with equal probabilities (in a similar fashion to what we considered when thinking about state definition for the probabilistic tasks). We estimated the true values of states defined in this way through simulations. As a result, the corrected true value of the cue-timing has become slightly smaller than the value described in the original manuscript (reflecting the uncertainty about ITI length), and consequently small positive TD-RPE has now appeared at the cue timing.
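
      The equal-probability reasoning above can be sketched as follows, assuming only what is stated here: ITIs of 4, 5, 6, or 7 time-steps with equal probabilities. The function names `iti_posterior` and `cue_probability_next_step` are hypothetical, and this is not the simulation code used for estimating the true values.

```python
from fractions import Fraction

# ITI lengths (in time-steps), assumed equally likely a priori.
ITI_LENGTHS = (4, 5, 6, 7)


def iti_posterior(elapsed, lengths=ITI_LENGTHS):
    """Belief over the current ITI length after `elapsed` ITI time-steps without a cue.

    Every length >= elapsed is equally consistent with the observations so far,
    so a uniform prior gives a uniform posterior over those lengths.
    """
    consistent = [n for n in lengths if n >= elapsed]
    if not consistent:
        raise ValueError("elapsed exceeds the longest possible ITI")
    return {n: Fraction(1, len(consistent)) for n in consistent}


def cue_probability_next_step(elapsed, lengths=ITI_LENGTHS):
    """Probability that the current time-step is the last one of the ITI."""
    return iti_posterior(elapsed, lengths).get(elapsed, Fraction(0))


for step in range(1, 8):
    print(step, cue_probability_next_step(step))
```

      At the fourth ITI time-step, the four possible ITI lengths are equally likely, which corresponds to the agent being at the last, last − 1, last − 2, or last − 3 time-step with equal probabilities.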

      Because we measured the performance of the models by squared errors in state values, this correction affected the results reporting the performance. Fortunately, the effects were relatively minor and did not largely alter the results of performance comparisons. However, we sincerely apologize for this error. In the revised manuscript, we have used the corrected true values throughout the manuscript, and we have described the ways of estimating these values in Lines 919-976.

    1. eLife Assessment

      This important manuscript presents a thorough analysis of trans-specific polymorphism (TSP) in Major Histocompatibility Complex gene families across primates. The analysis makes the most of currently available genomic data and methods to substantially increase the amount and evolutionary time that TSPs can be observed. Both false negative TSPs due to missing genes at the assembly and/or annotation level, as well as false positives due to read mismapping with missing paralogs, are well assessed and discussed. Overall the evidence provided is compelling, and the manuscript clearly delineates the path for future progress on the topic.

    2. Reviewer #2 (Public review):

      Summary:

      In this study, the authors characterized population genetic variation in the MHC locus across primates and looked for signals of long-term balancing selection (specifically trans-species polymorphism, TSP) in this highly polymorphic region. To carry out these tasks, they used Bayesian methods for phylogenetic inference (i.e. BEAST2) and applied a new Bayesian test to quantify evidence supporting monophyly vs. trans-species polymorphism for each exon across different species pairs. Their results, although mostly confirmatory, represent the most comprehensive analyses of primate MHC evolution to date and novel findings or possible discrepancies are clearly pointed out. However, as the authors discuss, the available data are insufficient to fully capture primates' MHC evolution.

      Strengths of the paper include: using appropriate methods and statistically rigorous analyses; very clear figures and detailed description of the results and methods that make it easy to follow despite the complexity of the region and approach; a clever test for TSP that is then complemented by positive selection tests and the protein structures for a quite comprehensive study.

      That said, weaknesses include: lack of information about how many sequences are included and whether uneven sampling across taxa might result in some comparisons without evidence for TSP; frequent reference to the companion paper instead of summarizing (at least some of) the critical relevant information (e.g., how was orthology inferred?); no mention of the quality of sequences in the database and whether there are still potential effects of mismapping or copy number variation affecting the sequence comparison.

      Comments on revisions:

      The authors have sufficiently addressed the reviewers' comments or provided additional details justifying their work. In particular, expansion of the discussion section on limitations of the analysis and clearer reference to how this relates to their companion paper represent improvements. Remaining suggestions are to still make clearer how much sparsity of sequences in the database may impact the conclusions (e.g., is this more of a problem for some genes or taxa than others? Is it a small problem or a large problem?). The data summary tables are a bit hard to read and seem to contain some information not used in the article. Maybe the presentation of these could be improved, or a shorter summary table could be given in the main paper with the full details only in the supplement.

    3. Reviewer #3 (Public review):

      Summary:

      The study uses publicly available sequences of classical and non-classical genes from a number of primate species to assess the extent and depth of TSP across the primate phylogeny. The analyses were carried out in a coherent and, in my opinion, robust inferential framework and provide evidence for ancient (even > 30 million years) TSP at several classical class I and class II genes. The authors also characterise evolutionary rates at individual codons, map these rates onto MHC protein structures, and find that the fastest evolving codons are extremely enriched for autoimmune and infectious disease associations.

      Strengths:

      The study is comprehensive, relying on a large data set, state-of-the-art phylogenetic analyses and elegant tests of TSP. The results are not entirely novel, but a synthesis and re-analysis of previous findings is extremely valuable and timely.

      Weaknesses:

      Following the revision by the Authors I see mostly one weakness: older literature on the subject is duly cited, but the discussion of the findings in the context of this literature is limited.

      Comments on revisions:

      Lines 441-452 - In this section, you discuss an apparent paradox between long-lived balancing selection and strong directional selection, referencing elevated substitution rates. However, this issue is more nuanced and may not be best framed in terms of substitution rates. That terminology is common in phylogenetic analyses, where differences between sequences, or changes along phylogenetic branches, are often interpreted as true substitutions in the population genetic sense. In the case of MHC trees and the rates you're discussing here, the focus is more accurately on the rate at which new mutations become established within particular allelic lineages. So while this still concerns evolutionary rates at specific codons, equating them directly with substitution rates may be misleading. A more precise term or framing might be warranted in this context.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      MHC (Major Histocompatibility Complex) genes have long been mentioned as cases of trans-species polymorphism (TSP), where alleles might have their most recent common ancestor with alleles in a different species, rather than other alleles in the same species (e.g., a human MHC allele might coalesce with a chimp MHC allele, more recently than the two coalesce with other alleles in either species). This paper provides a more complete estimate of the extent and ages of TSP in primate MHC loci. The data clearly support deep TSP linking alleles in humans to (in some cases) old world monkeys, but the amount of TSP varies between loci.

      Strengths:

      The authors use publicly available datasets to build phylogenetic trees of MHC alleles and loci. From these trees they are able to estimate whether there is compelling support for Trans-species polymorphisms (TSPs) using Bayes Factor tests comparing different alternative hypotheses for tree shape. The phylogenetic methods are state-of-the-art and appropriate to the task.

      The authors supplement their analyses of TSP with estimates of selection (e.g., dN/dS ratios) on motifs within the MHC protein. They confirm what one would suspect: classical MHC genes exhibit stronger selection at amino acid residues that are part of the peptide binding region, and non-classical MHC exhibit less evidence of selection. The selected sites are associated with various diseases in GWAS studies.

      Weaknesses:

      An implication drawn from this paper (and previous literature) is that MHC has atypically high rates of TSP. However, rates of TSP are not estimated for other genes or gene families, so readers have no basis of comparison. There is no framework for judging whether the depth and frequency of TSP are unusual for MHC family genes, relative to other random genes in the genome, or immune genes in particular. I expect (from previous work on the topic) that MHC is indeed exceptional in this regard, but some direct comparison would provide greater confidence in this conclusion.

      We agree that context is important! Although we expected to get the most interesting results from studying the classical genes, we did include the non-classical genes specifically for comparison. They are located in the same genomic region, have multiple sequences catalogued in different species (although they are less diverse), and perform critical immune functions. We think this is a more appropriate set to compare with the classical MHC genes than, say, a random set of genes. Interestingly, we did not detect TSP in these non-classical genes. This likely means that the classical MHC genes are truly exceptional, but it could also mean that not enough sequences are available for the non-classical genes to detect TSP. 

      It would be very interesting to repeat this analysis for another gene family to see whether such deep TSP also occurs in other immune or non-immune gene families. We are lucky that decades of past work and a dedicated database exists for cataloging MHC sequences. When this level of sequence collection is achieved for other highly polymorphic gene families, it will be possible to do a comparable analysis.  

      Given the companion paper's evidence of genic gain/loss, it seems like there is a real risk that the present study under-estimates TSP, if cases of TSP have been obscured by the loss of the TSP-carrying gene paralog from some lineages needed to detect the TSP. Are the present analyses simply calculating rates of TSP of observed alleles, or are you able to infer TSP rates conditional on rates of gene gain/loss?

      We were not able to infer TSP rates conditional on rates of gene gain/loss. We agree that some cases of TSP were likely lost due to the loss of a gene paralog from certain species. Furthermore, the dearth of MHC whole-region and allele sequences available for most primates makes it difficult to detect TSP, even if the gene paralog is still present. Long-read sequencing of more primate genomes should help with this. We agree that it would also be very interesting to study TSPs that were maintained for millions of years but were lost recently.

      Figure 5 (and 6) provide regression model fits (red lines in panel C) relating evolutionary rates (y axis not labeled) to site distance from the peptide binding groove, on the protein product. This is a nice result. I wonder, however, whether a linear model (as opposed to non-linear) is the most biologically reasonable choice, and whether non-linear functions have been evaluated. The authors might consider generalized additive models (GAMs) as an alternative that relaxes linearity assumptions.

      We agree that a linear model is likely not the most biologically reasonable choice, as protein interactions are complex. However, we made the choice to implement the simplest model because the evolutionary rates we inferred were relative, making parameters relatively meaningless. We were mainly concerned with positive or negative slopes and we leave the rest to the protein interaction experts.
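The point that only the sign of the slope is interpretable when rates are relative can be sketched with an ordinary least-squares fit. The data values below are made up purely for illustration and are not taken from the paper; `ols_slope` is a hypothetical helper, not the authors' code.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical toy data: distance from the peptide (arbitrary units)
# and relative evolutionary rate per site.
distance = [2.0, 4.0, 6.0, 8.0, 10.0]
relative_rate = [3.1, 2.4, 1.9, 1.2, 0.8]

slope = ols_slope(distance, relative_rate)

# Rescaling relative rates by any positive constant changes the slope's
# magnitude but not its sign.
rescaled_slope = ols_slope(distance, [3.0 * r for r in relative_rate])
```

Because relative rates are defined only up to a scale factor, the magnitude of the fitted slope carries little meaning on its own, while its direction is unchanged by rescaling.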

      The connection between rapidly evolving sites, and disease associations (lines 382-3) is very interesting. However, this is not being presented as a statistical test of association. The authors note that fast-evolving amino acids all have at least one association: but is this really more disease-association than a random amino acid in the MHC? Or, a randomly chosen polymorphic amino acid in MHC? A statistical test confirming an excess of disease associations would strengthen this claim.

      To strengthen this claim, we added Figure 6 - Figure Supplement 7 (NOTE: this needs to be renamed as Table 1 - Figure Supplement 1, which the eLife template does not allow). Here, we plot the number of associations for each amino acid against evolutionary rate, revealing a significant positive slope in Class I. We also added explanatory text for this figure in lines 400-404.

      Reviewer #2 (Public review):

      Summary

      In this study, the authors characterized population genetic variation in the MHC locus across primates and looked for signals of long-term balancing selection (specifically trans-species polymorphism, TSP) in this highly polymorphic region. To carry out these tasks, they used Bayesian methods for phylogenetic inference (i.e. BEAST2) and applied a new Bayesian test to quantify evidence supporting monophyly vs. trans-species polymorphism for each exon across different species pairs. Their results, although mostly confirmatory, represent the most comprehensive analyses of primate MHC evolution to date and novel findings or possible discrepancies are clearly pointed out. However, as the authors discuss, the available data are insufficient to fully capture primates' MHC evolution.

      Strengths of the paper include: using appropriate methods and statistically rigorous analyses; very clear figures and detailed description of the results and methods that make it easy to follow despite the complexity of the region and approach; a clever test for TSP that is then complemented by positive selection tests and the protein structures for a quite comprehensive study.

      That said, weaknesses include: lack of information about how many sequences are included and whether uneven sampling across taxa might result in some comparisons without evidence for TSP; frequent reference to the companion paper instead of summarizing (at least some of) the critical relevant information (e.g., how was orthology inferred?); no mention of the quality of sequences in the database and whether there are still potential effects of mismapping or copy number variation affecting the sequence comparison.

      To address these comments, we added Tables 2-4 to allow readers to more readily understand the data we included in each group. We refer to these tables in the introduction (line 95), in the “Data” section of the results (lines 128-129), and the “Data” section of the methods (lines 532-534).  We also added text (lines 216-219 and 250-252) to more explicitly point out that our method is conservative when few sequences are available.

      We also added a paragraph to the discussion which addresses data quality and mismapping issues (lines 473-499).

      We clarified the role of our companion paper (line 49-50) by changing “In our companion paper, we explored the relationships between the different classical and non-classical genes” to “In our companion paper, we built large multi-gene trees to explore the relationships between the different classical and non-classical genes.” We also changed the text in lines 97-99 from “In our companion paper, we compared genes across dozens of species and learned more about the orthologous relationships among them” to “In our companion paper, we built trees to compare genes across dozens of species. When paired with previous literature, these trees helped us infer orthology and assign sequences to genes in some cases.”

      Reviewer #3 (Public review):

      Summary

      The study uses publicly available sequences of classical and non-classical genes from a number of primate species to assess the extent and depth of TSP across the primate phylogeny. The analyses were carried out in a coherent and, in my opinion, robust inferential framework and provided evidence for ancient (even > 30 million years) TSP at several classical class I and class II genes. The authors also characterise evolutionary rates at individual codons, map these rates onto MHC protein structures, and find that the fastest evolving codons are extremely enriched for autoimmune and infectious disease associations.

      Strengths

      The study is comprehensive, relying on a large data set, state-of-the-art phylogenetic analyses and elegant tests of TSP. The results are not entirely novel, but a synthesis and re-analysis of previous findings is extremely valuable and timely.

      Weaknesses

      I've identified weaknesses in several areas (details follow in the next section):

      -  Inadequate description and presentation of the data used

      -  Large parts of the results read like extended figure captions, which breaks the flow.

      -  Older literature on the subject is duly cited, but the authors don't really discuss their findings in the context of this literature.

      -  The potential impact of mechanisms other than long-term maintenance of allelic lineages by balancing selection, such as interspecific introgression and incorrect orthology assessment, needs to be discussed.

      We address these comments in the more detailed section below.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      The abstract could benefit from being sharpened. A personal pet peeve is a common habit of saying we don't know everything about a topic (line 16 - "lack a full picture of primate MHC evolution"); We never know everything on a topic, so this is hardly a strong rationale to do more work on it. This is followed by "to start addressing this gap" - which is vague because you haven't explicitly stated any gap, you simply said we are not yet omniscient on the topic. Please clearly identify a gap in our knowledge, a question that you will be able to answer with this paper.

      That makes sense! We added another sentence to the abstract to make the specific gap clearer. Inserted “In particular, we do not know to what extent genes and alleles are retained across speciation events” in lines 16-17.

      Reviewer #2 (Recommendations for the authors):

      - Some discussion of alternative explanations when certain comparisons were not found to have TSP - is this consistent with genetic drift sometimes leading to lineage loss, or does it suggest that the proposed tradeoff between autoimmunity and pathogen recognition might differ depending on primates' life history and/or exposure to similar pathogens? Could the trade-off of pathogen to self-recognition not be as costly in some species?

      This is consistent with genetic drift, as no lineages are expected to be maintained across these distantly-diverged primates under neutral selection. These ideas are certainly possible, but our Bayes Factor test only reveals evidence (or lack thereof) for deviations from the species tree and cannot provide reasons why or why not.

      - It would be interesting to put these results on very long-term balancing selection in the context of what has been reported at the region for shorter term balancing selection. The discussion compares findings of previous genes in the literature but not regarding the time scale.

      Indeed, there is some evidence for the idea of “divergent allele advantage”, in which MHC-heterozygous individuals have a greater repertoire of peptides that they can present, leading to greater resistance against pathogens and greater fitness. This heterozygote advantage thus leads to balancing selection (Pierini and Lenz, 2018; Chowell et al., 2019). Our discussion mentions other time scales of balancing selection across the primates at the MHC and other loci, but we choose to focus more on long-term than short-term balancing selection.

      - Lines 223-226 - how is the difference in BF across exons in MHC-A to be interpreted? The paragraph is about MHC-A, but then the explanation in the last sentence is for when similar BF are observed which is not the case for MHC-A. Is this interpreted as lack of evidence for TSP? Or something about recombination or gene conversion? Or that one exon may be under balancing selection but not the other?

      Thank you for pointing out the confusing logic in this paragraph. 

      Previous: “For MHC-A, Bayes factors vary considerably depending on exon and species pair. Many sequences had to be excluded from MHC-A comparisons because they were identified as gene-converted in the \textit{GENECONV} analysis or were previously identified as recombinants \citep{Hans2017,Gleimer2011,Adams2001}. Importantly, for MHC-A we do not see concordance in Bayes factors across the different exons, whereas we do for the other gene groups. Similar Bayes factors across all exons for a given comparison is thus evidence in favor of TSP being the primary driver of the observed deep coalescence structure (rather than recombination or gene conversion).”

      Current (lines 228-238):

      “For MHC-A, Bayes factors vary considerably depending on exon and species pair. Past work suggests that this gene has had a long history of gene conversion affecting different exons, resulting in different evolutionary histories for different parts of the gene \citep{Hans2017,Gleimer2011,Adams2001}. Indeed, we excluded many MHC-A sequences from our Bayes factor calculations because they were identified as gene-converted in our \textit{GENECONV} analysis or were previously suggested to be recombinants. As shown in \FIG{bayes_factors_classI}, the lack of concordance in Bayes factors across the different exons for MHC-A is evidence for gene conversion, rather than balancing selection, being the most important factor in this gene's evolution. In contrast, the other gene groups generally show concordance in Bayes factors across exons. We interpret this as evidence in favor of TSP being the primary driver of the observed deep coalescence structure for MHC-B and -C (rather than recombination or gene conversion).”

      - In Figures 5C and 6C, the points sometimes show a kind of smile pattern of possibly higher rates further from the peptide. Did authors explore other fits like a polynomial? Or, whether distance only matters in close proximity to the peptide? Out of curiosity, is it possible to map substitution time/branch into the distance to the peptide binding region for each substitution? Is there any pattern with distance to interacting proteins in non-peptide binding MHC proteins like MHC-DOA? Although they don't have a PBR they do interact with other proteins.

      Thank you for these ideas! We did not explore other fits, such as a polynomial, because we wanted to implement the simplest model. Our evolutionary rates are relative, making parameters relatively meaningless. We were mainly concerned with positive or negative slopes and we leave the rest to the protein interaction experts.

      There is most likely a relationship between evolutionary rate and the distance to interacting proteins in the non-peptide-binding molecules MHC-DM and -DO. However, there are few currently available models and it is difficult to determine which residues in these models are actually interacting. However, researchers with more experience in protein interactions would be able to undertake such an analysis. 

      - How biased is the database towards human alleles? Could this affect some of the analyses, including the coincidence of rapidly evolving sites with associations? Are there more associations than expected under some null model?

      While the database is indeed biased toward human alleles, we included only a small subset of these in order to create a more balanced data set spanning the primates. This is unlikely to affect the coincidence of rapidly-evolving sites with associations; however, we note that there are no such association studies meeting our criteria in other species, meaning the associations are only coming from studies on humans.

      - To this reader, it is unnecessary and distracting to describe the figures within the text; there are frequent sentences in the text that belongs in the figure legend instead (e.g., lines 139-143, 208-211, 214-215, 328-330, etc). It would be better to focus on the results from the figures and then cite the figure, where the colors and exactly what is plotted can be in the figure legend.

      We appreciate these comments on overall flow. We removed lines 139-143 and lengthened the Figure 2 caption (and associated supplementary figure captions) to contain all necessary detail. We removed lines 208-211 and 214-215 and lengthened the captions for Figure 3, Figure 4, and associated supplementary figures. We removed a sentence from lines 303-304.  

      - I'm still concerned that the poor mappability of short-read data is contributing in some ways. Were the sequences in the database mostly from long-reads? Was nucleotide diversity calculated directly from the sequences in the database or from another human dataset? Is missing data at some sites accounted for in the denominator?

      The sequences in the database are mostly from short reads and come from a wide array of labs. We have added a paragraph to the discussion to explain the limitations of this (lines 473-499). However, the nucleotide diversity calculations shown in Figure 1 do not rely on the MHC database; rather, they are calculated from the human genomes in the 1000 Genomes project. Nucleotide diversity would be calculable for other species, but we did not do so for exactly the reason you mention: too much missing data.
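The reviewer's question about missing data in the denominator can be made concrete with a small sketch of per-site nucleotide diversity computed from aligned sequences. This is a generic textbook-style estimator, not the pipeline used for Figure 1; the function name, the set of missing-data symbols, and the toy sequences in the usage note are assumptions.

```python
from collections import Counter

MISSING = {"N", "-", "?"}  # assumed codes for missing or ambiguous bases


def nucleotide_diversity(seqs):
    """Mean per-site pairwise diversity over aligned, equal-length sequences.

    At each site only non-missing bases are compared, and sites with fewer
    than two called bases are dropped from the denominator.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    total, usable_sites = 0.0, 0
    for i in range(length):
        counts = Counter(
            s[i].upper() for s in seqs if s[i].upper() not in MISSING
        )
        n = sum(counts.values())
        if n < 2:
            continue  # no pair can be compared at this site
        homozygosity = sum(c * c for c in counts.values()) / (n * n)
        total += (n / (n - 1)) * (1.0 - homozygosity)
        usable_sites += 1
    return total / usable_sites if usable_sites else float("nan")
```

For example, `nucleotide_diversity(["AAT", "ACT", "ANT"])` compares only the two called bases at the middle site, so the missing `N` neither counts as a difference nor inflates the number of comparisons.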

      - The Figure 2 and Figure 3 supplements took me a little bit to understand - is it really worth pointing out the top 5 Bayes-factor comparisons when there is no evidence for TSP? A lot of the colored squares are not actually supporting TSP but in the grids you can't see which are and which aren't without looking at the Bayes Factor. I wonder if it would help if only those with BF > 100 were shown? Or if these were marked some other way so that it was easy to see where TSPs are supported.

      Thank you for your perspective on these figures! We initially limited them to only show >100 Bayes factors for each gene group and region, but some gene groups have no high Bayes factors. Additionally, the “summary” tree pictured in these figures is necessarily a simplification of the full space of posterior trees. We felt that showing low Bayes factor comparisons could help readers understand this relationship. For example, allele sets that look non-monophyletic on the summary tree may still have a low Bayes factor, showing that they are generally monophyletic throughout the larger (un-visualizable) space of trees.
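The relationship between a single summary tree and a Bayes factor computed over the whole posterior sample can be sketched generically. The snippet below assumes the Bayes factor is formed as posterior odds divided by prior odds for a topological hypothesis (for instance, non-monophyly of a set of alleles); the authors' actual test is defined in their Methods and may differ. The function name and arguments are hypothetical.

```python
import math


def bayes_factor(n_supporting, n_total, prior_prob):
    """Bayes factor for a hypothesis from a posterior sample of trees.

    n_supporting: number of sampled trees consistent with the hypothesis.
    n_total:      total number of sampled trees.
    prior_prob:   prior probability of the hypothesis (assumed known here).
    """
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must be strictly between 0 and 1")
    if n_total <= 0 or not 0 <= n_supporting <= n_total:
        raise ValueError("invalid tree counts")

    posterior_prob = n_supporting / n_total
    if posterior_prob == 1.0:
        return math.inf  # every sampled tree supports the hypothesis
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds
```

Under this formulation, a hypothesis supported by the summary tree but by only a modest share of sampled trees yields a low Bayes factor, consistent with the explanation above.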

      Reviewer #3 (Recommendations for the authors):

      Specific comments

      Abstract

      I think the abstract would benefit from some editing. For example, one might get the impression that you equate allele sharing, which would normally be understood as sharing identical sequences, with sharing ancestral allelic lineages. This distinction is important because you can have many TSPs without sharing identical allele sequences. In l. 20 you write about "deep TSP", which requires either definition or reformulation. In l. 21-23 you seem to suggest that long-term retention of allelic lineages is surprising in the light of rapid sequence evolution - it may be, depending on the evolutionary scenarios one is willing to accept, but perhaps it's not necessary to float such a suggestion in the abstract where it cannot be properly explained due to space constraints? The last sentence needs a qualifier like "in some cases".

      Thank you for catching these! For clarity, we changed several words:

      ● “alleles” to “allelic lineages” in line 13

      ● “deep” to “ancient” in line 21

      ● “Despite” to “in addition to” in line 22

      ● Added “in some cases” to line 28

      Results - Overall, parts of the results read like extended figure captions. I understand that the authors want to make the complex figures accessible to the reader. However, including so much information in the text disrupts the flow and makes it difficult to follow what the main findings and conclusions are.

      We appreciate these comments on overall flow. We removed lines 139-143 and lengthened the Figure 2 caption (and associated supplementary figure captions) to contain all necessary detail. We removed lines 208-211 and 214-215 and lengthened the captions for Figure 3, Figure 4, and associated supplementary figures. We removed a sentence from lines 303-304.  

      l. 37-39 such a short sentence on non-classical MHC is necessarily an oversimplification, I suggest it be expanded or deleted.

      There is certainly a lot to say about each of these genes! While we do not have space in this paper’s introduction to get into these genes’ myriad functions, we added a reference to our companion paper in lines 40-41:

      “See the appendices of our companion paper \citep{Fortier2024a} for more detail.”

      These appendices are extensive, and readers can find details and references for literature on each specific gene there. In addition, several genes are mentioned in analyses further on in the results, and their specific functions are discussed in more detail when they arise.

      l. 47 -49 It would be helpful to briefly outline your criteria for selecting these 17 genes, even if this is repeated later.

      Thank you! For greater clarity, we changed the text (lines 50-52) from “Here, we look within 17 specific genes to characterize trans-species polymorphism, a phenomenon characteristic of long-term balancing selection.” to “Here, we look within 17 specific genes---representing classical, non-classical, Class I, and Class II---to characterize trans-species polymorphism, a phenomenon characteristic of long-term balancing selection.”

      l.85-87 I may be completely wrong, but couldn't problems with establishing orthology in some cases lead to false inferences of TSP, even in primates? Or do you think the data are of sufficient quality to ignore such a possibility? (you touch on this in pp. 261-264)

      Yes, problems with establishing orthology can lead to false inferences of TSP, and it has happened before. For example, older studies that used only exon 2 (binding-site-encoding) of the MHC-DRB genes inferred trees that grouped NWM sequences with ape and OWM sequences. Thus, they named these NWM genes MHC-DRB3 and -DRB5 to suggest orthology with ape/OWM MHC-DRB3 and -DRB5, and they also suggested possible TSP between the groups. However, later studies that used non-binding-site-encoding exons or introns noticed that these NWM sequences did not group with ape/OWM sequences (which now shared the same name), providing evidence against orthology. This illustrates that establishing orthology is critical before assessing TSP (as is comparing across regions). This is part of the reason we published a companion paper (https://doi.org/10.7554/eLife.103545.1), which clears up questions of orthology and supports the analyses we did in this paper. In cases where orthology was ambiguous, this also helped us to be conservative in our conclusions here. The problems with ambiguous gene assignment are also discussed in lines 488-499.

      l. 88-93 is the first place (others are pp. 109-118 and 460-484) where a fuller description of the data used would be welcome. It's clear that the amount of data from different species varies enormously, not only in the number of alleles per locus, but also in the loci for which polymorphism data are available. In such a synthesis study, one would expect at least a tabulation of the data used in the appendices and perhaps a summary table in the main article.

      l. 109-118 Again, a more quantitative summary of the data used, with reference to a table, would be useful.

      Thank you! To address these comments, we added Tables 2-4 to allow readers to more readily understand the data we included in each group. We refer to these tables in the introduction (line 95), in the “Data” section of the results (lines 128-129), and the “Data” section of the methods (lines 532-534). Supplementary Files listing the exact alleles and sequences used in each group are also included in the resubmission.

      l. 123-124 here you say that the definition of the "16 gene groups" is in the methods (probably pp. 471-484), but it would be useful to present an informative summary of your rationale in the introduction or here

      Thank you! We agree that it is helpful to outline these groups earlier. We have changed the paragraph in lines 123-135 from: 

      “We considered 16 gene groups and two or three different genic regions for each group: exon 2 alone, exon 3 alone, and/or exon 4 alone. Exons 2 and 3 encode the peptide-binding region (PBR) for the Class I proteins, and exon 2 alone encodes the PBR for the Class II proteins. For the Class I genes, we also considered exon 4 alone because it is comparable in size to exons 2 and 3 and provides a good contrast to the PBR-encoding exons. See the Methods for more detail on how gene groups were defined. Because few intron sequences were available for non-human species, we did not include them in our analyses.” To: 

      “We considered 16 gene groups spanning MHC classes and functions. These include the classical Class I genes (MHC-A-related, MHC-B-related, MHC-C-related), non-classical Class I genes (MHC-E-related, MHC-F-related, MHC-G-related), classical Class IIA genes (MHC-DRA-related, MHC-DQA-related, MHC-DPA-related), classical Class IIB genes (MHC-DRB-related, MHC-DQB-related, MHC-DPB-related), non-classical Class IIA genes (MHC-DMA-related, MHC-DOA-related), and non-classical Class IIB genes (MHC-DMB-related, MHC-DOB-related). We studied two or three different genic regions for each group: exon 2 alone, exon 3 alone, and (for Class I) exon 4 alone. Exons 2 and 3 encode the peptide-binding region (PBR) for the Class I proteins, and exon 2 alone encodes the PBR for the Class II proteins. For the Class I genes, we also considered exon 4 alone because it is comparable in size to exons 2 and 3 and provides a good contrast to the PBR-encoding exons. Because few intron sequences were available for non-human species, we did not include them in our analyses.”

      l. 100 "alleles" -> "allelic lineages"

      Thank you for catching this. We have changed this language in line 104.

      l. 227-238 it's important to discuss the possible effect of the number of sequences available on the detectability of TSP - this is particularly important as the properties of MHC genealogies may differ considerably from those expected for neutral genealogies.

      This is a good point that may not be obvious to readers. We have added several sentences to clarify this:

      Line 193-194: “In a neutral genealogy, monophyly of each species' sequences is expected.”

      Line 213-219: “Note that the number of sequences available for comparison also affects the detectability of TSP. For example, if the only sequences available are from the same allelic lineage, they will coalesce more recently in the past than they would with alleles from a different lineage and would not show evidence for TSP. This means our method is well-suited to detect TSP when a diverse set of allele sequences are available, but it is conservative when there are few alleles to test. There were few available alleles for some non-classical genes, such as MHC-F, and some species, such as gibbon.”

      Line 244-246: “However, since there are fewer alleles available for the non-classical genes, we note that our method is likely to be conservative here.”
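
      The link between monophyly and TSP in the added sentences can be illustrated with a toy check. This is only a minimal sketch of the topological idea, not the authors' BEAST-based test: the nested-tuple tree encoding, the function names, and the sequence labels are all hypothetical. A species whose sequences do not form a single clade in a rooted tree shows the pattern expected under TSP, and when only one allelic lineage is sampled the same species looks monophyletic.

```python
# Rooted trees are nested tuples; leaves are strings (hypothetical labels).

def leaves(tree):
    """Return the set of leaf labels under a node."""
    if isinstance(tree, str):
        return {tree}
    out = set()
    for child in tree:
        out |= leaves(child)
    return out

def is_monophyletic(tree, targets):
    """True if some clade in the tree contains exactly the target leaves."""
    targets = set(targets)
    if leaves(tree) == targets:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, targets) for child in tree)

# Two allelic lineages sampled per species: human alleles group by lineage,
# not by species, so the human sequences are not monophyletic.
tsp_like = ((("human_A1", "chimp_A1"), ("human_A2", "chimp_A2")), "outgroup")

# Sequences group by species, as expected for a neutral genealogy.
neutral_like = ((("human_A1", "human_A2"), ("chimp_A1", "chimp_A2")), "outgroup")

human = ["human_A1", "human_A2"]
print(is_monophyletic(tsp_like, human))
print(is_monophyletic(neutral_like, human))
```

      Dropping `human_A2` and `chimp_A2` from the first tree leaves a single allelic lineage, and each species then trivially forms its own clade, which is why sparse sampling makes the approach conservative.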

      l. 301 and 624-41 it's been difficult for me to understand the rationale behind using rates at mostly gap positions as the baseline and I'd be grateful for a more extensive explanation

      Normalizing the rates posed a difficult problem. We couldn’t include every single sequence in the same alignment because BEAST’s computational needs scale with the number of sequences. Therefore, we had to run BEAST separately on smaller alignments, each focused on a single group of genes. We still wanted to be able to compare evolutionary rates across genes, but because of the way SubstBMA is implemented, evolutionary rates are relative, not absolute. Recall that to help us compare the trees, we included a common set of “backbone” sequences in all of the 16 alignments. This set included some highly-diverged genes. Initially, we planned to use 4-fold degenerate sites as the baseline sites for normalization, but there simply weren’t enough of them once we included the “backbone” set on top of the already highly diverse set of sequences in each alignment. This diversity presented an opportunity. In BEAST, gaps are treated as missing and do not contribute any probability to the relevant branch or site (https://groups.google.com/g/beast-users/c/ixrGUA1p4OM/m/P4R2fCDWMUoJ?pli=1). So, we reasoned that sites that were “mostly gap” (a gap in all the human backbone sequences but with an insertion in some sequence) were mostly not contributing to the inference of the phylogeny or evolutionary rates. Because the “backbone” sequences are common to all alignments, the “mostly gap” sites are somewhat comparable across sets while not affecting inferred rates, so we considered them a reasonable choice for the normalization (for lack of a better option).

      We added text to lines 680 and 691-693 to clarify this rationale.
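
      The “mostly gap” baseline described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' pipeline: the dictionary-based alignment, the function names, and the toy sequences and rates are hypothetical, and the real per-site rates would come from the BEAST/SubstBMA output.

```python
from statistics import mean

GAP = "-"

def mostly_gap_columns(alignment, backbone_ids):
    """Columns that are a gap in every backbone sequence but hold a
    residue in at least one non-backbone sequence."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = []
    for i in range(length):
        backbone_all_gap = all(alignment[n][i] == GAP for n in backbone_ids)
        other_has_residue = any(
            alignment[n][i] != GAP for n in names if n not in backbone_ids
        )
        if backbone_all_gap and other_has_residue:
            cols.append(i)
    return cols

def normalize_rates(site_rates, baseline_cols):
    """Divide every per-site rate by the mean rate over the baseline columns."""
    baseline = mean(site_rates[i] for i in baseline_cols)
    return [rate / baseline for rate in site_rates]

# Toy alignment (hypothetical): bb1 and bb2 play the role of backbone sequences.
alignment = {
    "bb1": "AC-GT",
    "bb2": "AC-GA",
    "x1":  "ACTGT",
    "x2":  "AC-GT",
}
backbone = {"bb1", "bb2"}
site_rates = [1.0, 2.0, 0.5, 4.0, 1.5]  # made-up relative rates, one per column

baseline_cols = mostly_gap_columns(alignment, backbone)
print(baseline_cols)
print(normalize_rates(site_rates, baseline_cols))
```

      Because the backbone set is shared by every alignment, the baseline columns are defined the same way in each run, which is what makes the normalized rates roughly comparable across gene groups.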

      l. 380-84 this overview seems rather superficial. Would it be possible to provide a more quantitative summary?

      To make this more quantitative, we plotted the number of associations for each amino acid against evolutionary rate, shown in Figure 6 - Figure Supplement 7 (NOTE: this needs to be renamed as Table 1 - Figure Supplement 1, which the template does not allow). This reveals a significant positive slope for the Class I genes, but not for Class II. We also added explanatory text for this figure in lines 400-404.

      Discussion - your approach to detecting TSP is elegant but deserves discussion of its limitations and, in particular, a clear explanation of why detecting TSP rather than quantifying its extent is more important in the context of this work. Another important point for discussion is alternative explanations for the patterns of TSP or, more broadly, gene tree - species tree discordance. Although long-term maintenance of allelic lineages due to long-term balancing selection is probably the most convincing explanation for the observed TSP, interspecific introgression and incorrect orthology assessment may also have contributed, and it would be good to see what the authors think about the potential contribution of these two factors.

      Overall, our goal was to use modern statistical methods and data to more confidently assess how ancient the TSP is at each gene. We have added several lines of text (as noted elsewhere in this document) to more clearly illustrate the limitations of our approach. We also agree that interspecific introgression and incorrect orthology assessment can cause similar patterns to arise. We attempted to minimize the effect of incorrect orthology assessment by creating multi-gene trees and exploring reference primate genomes, as described in our companion paper (https://doi.org/10.7554/eLife.103545.1), but cannot eliminate it completely. We have added a paragraph to the discussion to address this (lines 488-499). Interspecific introgression could also cause gene tree-species tree discordance, but we are not sure about how systematic this would have to be to cause the overall patterns we observe, nor about how likely it would have been for various clades of primates across the world.

      l. 421 -424 A more nuanced discussion distinguishing between positive selection, which facilitates the establishment of a mutation, and directional selection, which leads to its fixation, would be useful here.

      We added clarification to this sentence (line 443-445), from “Indeed, within the phylogeny we find that the most rapidly-evolving codons are substituted at around 2–4-fold the baseline rate.” to “Indeed, within the phylogeny we find that the most rapidly-evolving codons are substituted at around 2–4-fold the baseline rate, generating ample mutations upon which selection may act.”

      l. 432-434 You write here about the shaping of TCR repertoires, but I couldn't find any such information in the paper, including Table 1.

      We did not include a separate column for these, so they can be hard to spot. They take the form of “TCR 𝛽 Interaction Probability >50%”, “TCR Expression (TRAV38-1)”, or “TCR 𝛼 Interaction Probability >50%” and can be found in Table 1.

      l. 436-442 Here a more detailed discussion in the context of divergent allelic advantage and even the evolution of new S-type specificities in plants would be valuable.

      We added an additional citation to a review article to this sentence (lines 438-439).  

      l. 443 The use of the word "training" here is confusing, suggesting some kind of "education" during the lifetime of the animal.

      We agree that “train” is not an entirely appropriate term, and have changed it to “evolve” (line 465).

      489-491 What data were used for these calculations?

      Apologies for missing this citation! We used the 1000 genomes project data, and the citation has been updated (line 541-542).

    1. eLife Assessment

      This study reports valuable findings on the role of Layilin in the motility and suppressive capacity of clonal expanded regulatory T cells (Tregs) in the skin. Although the strength of the study is utilizing conditional knock-out mice and human skin samples, the analysis of the molecular mechanism by which Layilin affects Treg function is incomplete. The study will be of interest to medical scientists working on skin immunology.

    2. Reviewer #1 (Public review):

      Summary and Strengths:

      This work shows that the gene encoding Layilin is expressed preferentially in human skin Tregs, and that the fraction of Tregs expressing Layilin may overexpress genes related to T cell activation and adhesion. Expression of Layilin on Tregs would have no impact on activation markers or in vitro suppressive function. However, activation of Layilin either with a cross-linking antibody or collagen IV, its natural ligand, would promote cell adhesion via LFA1 activation. The in vivo functional role of Layilin in Tregs is studied in a conditional KO mouse model in a model of skin inflammation. Deletion of Layilin in Tregs led to an attenuation of the disease score and a reduction in the cutaneous lymphocyte infiltrate. This work is clearly innovative, but a number of major points limit its interest.

      Weakness and major points:

      (1) The number of panels and figures suggests that this story is quite complete but several data presented in the main figures do not provide essential information for a proper understanding of Layilin's role in Tregs.

      Figures 1I, 1J, and the whole of Figure 2 could be placed as supplementary figures. Also, for Figure 3E, it would be preferable to show the percentage of cells expressing cytokines rather than their absolute numbers. In fact, the drop in the numbers of cytokine-producing cells is probably due solely to the drop in total cell numbers and not to a decrease in the proportion of cells expressing cytokines. If this is the case, these data should be shown in supplementary figures. Finally, Figures 4 and 5 could be merged.

      (2) Some important data are not shown or not mentioned.

      (a) It would be important to show the proportion of Treg, Tconv, and CD8 expressing Layilin in healthy skin and in patients developing psoriasis, as well as in the blood of healthy subjects.

      (b) We lack information to be convinced that there is enrichment for migration and adhesion genes in Layilin+ Tregs in the GSEA data. The authors should indicate what geneset libraries they used. Indeed, it is tempting to show only the genesets that give results in line with the message you want to get across. If these genesets come from public banks, the bank used should be indicated, and the results of all gene sets shown in an unbiased way. In addition, it should be indicated whether the analyses were performed on untransformed or pseudobulk scRNAseq data analyses. Finally, it would be preferable to confirm the GSEA data with z-score analyses, as Ingenuity does, for example. Indeed, in GSEA-type analyses, there are genes that have activating but also inhibiting effects on a pathway in a given gene set.

      (c) For all FACS data, the raw data should be shown as histograms or dot plots for representative samples.

      (d) For Figure 5B, the number of samples analyzed is insufficient to draw clear conclusions.

      (3) For Figs. 4 and 5, the design of the experiment poses a problem. Indeed, the comparison between Layn+ and Layn- cells may, in part, not be directly linked to the expression or absence of expression of this protein. Indeed, Layn+ and Layn- Tregs may constitute populations with different biological properties, beyond the expression of Layn. However, in the experiment design used here, a significant fraction of the sorted Layn- Tregs will be cells belonging to the population that has never expressed this protein. It would have been preferable to sort first the Layn+ Tregs, then knock down this protein and re-sort the Layn- Tregs and Layn+ Tregs. If this experiment is too cumbersome to perform, I agree that the authors should not do it. However, it would be important to mention the point I have just made in the text.

    3. Reviewer #2 (Public review):

      Summary:

      In their manuscript, Gouirand et al. report on the role of Layilin expression for the motility and suppressive capacity of regulatory T cells (Tregs). In previous studies, the authors had already demonstrated that Layilin is expressed on Tregs, that it acts as a negative regulator of their suppressive capacity, that it functions to anchor Tregs in non-lymphoid tissues, and that it enhances the adhesive properties of Layilin-expressing cells by co-localization with the integrin αLβ2 (LFA-1). Building on these published data, the authors now show that Layilin is highly expressed on a subset of clonally expanded effector Tregs in both healthy and psoriatic skin and that deletion of Layilin in Tregs in vivo resulted in significantly attenuated skin inflammation. Furthermore, the authors addressed the molecular mechanism by which Layilin affects the suppressive capacity of Tregs and showed that Layilin increased Treg adhesion via modulation of LFA-1, resulting in distinct cytoskeletal changes.

      Strengths:

      Certainly, the strength of this study lies in the combination of data from mouse and human models.

      Weaknesses:

      Some of the conclusions drawn by the authors must be treated with caution, as the experimental conditions were not always appropriate, leading to a risk of misinterpretation.

    4. Reviewer #3 (Public review):

      Summary:

      Gouirand et al explore the function of Layilin on Treg in the context of psoriasis using both patient samples and a conditional mutant mouse model. They perform functional analysis in the patient samples using Cas9-mediated deletion. The authors suggest that Layilin works in concert with integrins to bind collagen IV to attenuate cell movement.

      The work is well done and built on solid human data. The report is a modest advance from the authors' previous report in 2021 that focused on tumor responses, with this report focusing on psoriasis. There are some experimental concerns that should be considered.

      Strengths:

      (1) Good complementation of patient and animal model data.

      (2) Solid experimentation using state-of-the-art approaches.

      (3) There is clearly a biological effect of LAYN deficiency in the mouse model.

      (4) The report adds some new information to what was already known from the previous reports.

      Weaknesses:

      (1) It is not clear that the assays used for functional analysis of the patient samples were optimal.

      (2) Several conclusions are not fully substantiated.

      (3) The report is lacking some experimental details.

    5. Author response:

      Reviewer 1:

      Concern 1: Figures 1I, 1J, and the whole of Figure 2 could be placed as supplementary figures. Also, for Figure 3E, it would be preferable to show the percentage of cells expressing cytokines rather than their absolute numbers. In fact, the drop in the numbers of cytokine-producing cells is probably due solely to the drop in total cell numbers and not to a decrease in the proportion of cells expressing cytokines. If this is the case, these data should be shown in supplementary figures. Finally, Figures 4 and 5 could be merged.

      We thank you for your recommendations. As rearranging figures is not critical to convey the data, we have decided to keep the figures and supplemental figures as they are currently presented.

      Concern 2a: It would be important to show the proportion of Treg, Tconv, and CD8 expressing Layilin in healthy skin and in patients developing psoriasis, as well as in the blood of healthy subjects.

      These data are published in a previous manuscript from our group. Please see Figure 1 in “Layilin Anchors Regulatory T Cells in Skin” (PMID: 34470859).

      Concern 2b: We lack information to be convinced that there is enrichment for migration and adhesion genes in Layilin+ Tregs in the GSEA data. The authors should indicate what geneset libraries they used. Indeed, it is tempting to show only the genesets that give results in line with the message you want to get across. If these genesets come from public banks, the bank used should be indicated, and the results of all gene sets shown in an unbiased way. In addition, it should be indicated whether the analyses were performed on untransformed or pseudobulk scRNAseq data analyses. Finally, it would be preferable to confirm the GSEA data with z-score analyses, as Ingenuity does, for example. Indeed, in GSEA-type analyses, there are genes that have activating but also inhibiting effects on a pathway in a given gene set.

      Given that we have already shown that layilin plays a major role in Treg and CD8+ T cell adhesion in tissues, we used a candidate approach for our GSEA. We tested the hypothesis that adhesion and motility pathways are enriched in Layilin-expressing Tregs. There was a statistically significant enrichment for these genes in Layilin+ Tregs compared to Layilin- Tregs, which we feel adequately tests our hypothesis.

      Concern 2c: For all FACS data, the raw data should be shown as histograms or dot plots for representative samples.

      We respect this concern. We omitted these plots owing to space constraints.

      Concern 2d: For Figure 5B, the number of samples analyzed is insufficient to draw clear conclusions.

      We respectfully disagree. Three donors were used in a paired fashion (internally controlled), achieving statistical significance.

      Concern 3: For Figs. 4 and 5, the design of the experiment poses a problem. Indeed, the comparison between Layn+ and Layn- cells may, in part, not be directly linked to the expression or absence of expression of this protein. Indeed, Layn+ and Layn- Tregs may constitute populations with different biological properties, beyond the expression of Layn. However, in the experiment design used here, a significant fraction of the sorted Layn- Tregs will be cells belonging to the population that has never expressed this protein. It would have been preferable to sort first the Layn+ Tregs, then knock down this protein and re-sort the Layn- Tregs and Layn+ Tregs. If this experiment is too cumbersome to perform, I agree that the authors should not do it. However, it would be important to mention the point I have just made in the text.

      We agree. However, as the reviewer points out, these experiments are not logistically and practically feasible at this point. We do perform several experiments in this manuscript in which layilin is reduced via gene editing with results supporting our hypotheses.

      Reviewer 2:

      Some of the conclusions drawn by the authors must be treated with caution, as the experimental conditions were not always appropriate, leading to a risk of misinterpretation.

      We have been transparent with all our methods and data. We will leave it to the reader to determine the level of rigor and the robustness of the data.

      Reviewer 3:

      Weaknesses:

      (1) It is not clear that the assays used for functional analysis of the patient samples were optimal. (2) Several conclusions are not fully substantiated. (3) The report is lacking some experimental details.

      We have tried to be as comprehensive and thorough as possible. We feel that the data supports our conclusions. We will leave this to the reader to interpret and conclude.

    1. eLife Assessment

      This revised study describes an important new model for in vivo manipulation of microglia, exploring how mutations in the Adar1 gene within microglia contribute to Aicardi-Goutières Syndrome. The methodology is validated with exceptional data, supporting the authors' conclusions. The paper underscores both the advantages and limitations of using transplanted cells as a surrogate for microglia, making it a resource that is of value for biologists studying macrophages and microglia.

    2. Reviewer #1 (Public review):

      Summary:

      Aicardi-Goutières Syndrome (AGS) is a genetic disorder that primarily affects the brain and immune system through excessive interferon production. The authors sought to investigate the role of microglia in AGS by first developing bone-marrow-derived progenitors in vitro that carry the estrogen-regulated (ER) Hoxb8 cassette, allowing them to expand indefinitely in the presence of estrogen and differentiate into macrophages when estrogen is removed. When injected into the brains of Csf1r-/- mice, which lack microglia, these cells engraft and resemble wild-type (WT) microglia in transcriptional and morphological characteristics, although they lack Sall1 expression. The authors then generated CRISPR-Cas9 Adar1 knockout (KO) ER-Hoxb8 macrophages, which exhibited increased production of inflammatory cytokines and upregulation of interferon-related genes. This phenotype could be rescued using a Jak-Stat inhibitor or by concurrently mutating Ifih1 (Mda5). However, these Adar1-KO macrophages fail to successfully engraft in the brain of both Csf1r-/- and Cx3cr1-creERT2:Csf1rfl/fl mice. To overcome this, the authors used a mouse model with a patient-specific Adar1 mutation (Adar1 D1113H) to derive ER-Hoxb8 bone marrow progenitors and macrophages. They discovered that Adar1 D1113H ER-Hoxb8 macrophages successfully engraft the brain, although at lower levels than WT-derived ER-Hoxb8 macrophages, leading to increased production of Isg15 by neighboring cells. These findings shed new light on the role of microglia in AGS pathology.

      Strengths:

      The authors convincingly demonstrate that ER-Hoxb8 differentiated macrophages are transcriptionally and morphologically similar to bone marrow-derived macrophages. They also show evidence that when engrafted in vivo, ER-Hoxb8 microglia are transcriptomically similar to WT microglia. Furthermore, ER-Hoxb8 macrophages engraft the Csf1r-/- brain with high efficiency and rapidly (2 weeks), showing a homogenous distribution. The authors also effectively use CRISPR-Cas9 to knock out TLR4 in these cells with little to no effect on their engraftment in vivo, confirming their potential as a model for genetic manipulation and in vivo microglia replacement.

      Overall, this paper demonstrates an innovative approach to manipulating microglia using ER-Hoxb8 cells as surrogates. The authors present convincing evidence of the model's efficacy and potential for broader application in microglial research, given its ease of production and rapid brain engraftment potential in microglia-deficient mice. Using mouse-derived cells for transplantation reduces complications that can come with the use of human cell lines, highlighting the utility of this system for research in mouse models.

    3. Reviewer #2 (Public review):

      Summary:

      Microglia have been implicated in brain development, homeostasis, and diseases. "Microglia replacement" has gained traction in recent years, using primary microglia, bone marrow or blood-derived myeloid cells, or human iPSC-induced microglia. Here, the authors extended their previous work in the area and provided evidence to support: (1) Estrogen-regulated (ER) homeobox B8 (Hoxb8) conditionally immortalized macrophages from bone marrow can serve as stable, genetically manipulated cell lines. These cells are highly comparable to primary bone marrow-derived (BMD) macrophages in vitro, and, when transplanted into a microglia-free brain, engraft the parenchyma and differentiate into microglia-like cells (MLCs). Taking advantage of this model system, the authors created stable, Adar1-mutated ER-Hoxb8 lines using CRISPR-Cas9 to study the intrinsic contribution of macrophages to the Aicardi-Goutières Syndrome (AGS) disease mechanism.

      Strengths:

      The studies are carefully designed and well-conducted. The imaging data and gene expression analysis are carried out at a high level of technical competence and the studies provide strong evidence that ER-Hoxb8 immortalized macrophages from bone marrow are a reasonable source for "microglia replacement" exercise. The findings are clearly presented, and the main message will be of general interest to the neuroscience and microglia communities.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Aicardi-Goutières Syndrome (AGS) is a genetic disorder that primarily affects the brain and immune system through excessive interferon production. The authors sought to investigate the role of microglia in AGS by first developing bone-marrow-derived progenitors in vitro that carry the estrogen-regulated (ER) Hoxb8 cassette, allowing them to expand indefinitely in the presence of estrogen and differentiate into macrophages when estrogen is removed. When injected into the brains of Csf1r-/- mice, which lack microglia, these cells engraft and resemble wild-type (WT) microglia in transcriptional and morphological characteristics, although they lack Sall1 expression. The authors then generated CRISPR-Cas9 Adar1 knockout (KO) ER-Hoxb8 macrophages, which exhibited increased production of inflammatory cytokines and upregulation of interferon-related genes. This phenotype could be rescued using a Jak-Stat inhibitor or by concurrently mutating Ifih1 (Mda5). However, these Adar1-KO macrophages fail to successfully engraft in the brain of both Csf1r-/- and Cx3cr1-creERT2:Csf1rfl/fl mice. To overcome this, the authors used a mouse model with a patient-specific Adar1 mutation (Adar1 D1113H) to derive ER-Hoxb8 bone marrow progenitors and macrophages. They discovered that Adar1 D1113H ER-Hoxb8 macrophages successfully engraft the brain, although at lower levels than WT-derived ER-Hoxb8 macrophages, leading to increased production of Isg15 by neighboring cells. These findings shed new light on the role of microglia in AGS pathology.

      Strengths:

      The authors convincingly demonstrate that ER-Hoxb8 differentiated macrophages are transcriptionally and morphologically similar to bone marrow-derived macrophages. They also show evidence that when engrafted in vivo, ER-Hoxb8 microglia are transcriptomically similar to WT microglia. Furthermore, ER-Hoxb8 macrophages engraft the Csf1r-/- brain with high efficiency and rapidly (2 weeks), showing a homogenous distribution. The authors also effectively use CRISPR-Cas9 to knock out TLR4 in these cells with little to no effect on their engraftment in vivo, confirming their potential as a model for genetic manipulation and in vivo microglia replacement.

      Weaknesses:

      The robust data showing the quality of this model at the transcriptomic level can be strengthened with confirmation at protein and functional levels. The authors were unable to investigate the effects of Adar1-KO using ER-Hoxb8 cells and instead had to rely on a mouse model with a patient-specific Adar1 mutation (Adar1 D1113H). Additionally, ER-Hoxb8-derived microglia do not express Sall1, a key marker of microglia, which limits their fidelity as a full microglial replacement, as has been rightfully pointed out in the discussion.

      Overall, this paper demonstrates an innovative approach to manipulating microglia using ER-Hoxb8 cells as surrogates. The authors present convincing evidence of the model's efficacy and potential for broader application in microglial research, given its ease of production and rapid brain engraftment potential in microglia-deficient mice. While Adar1-KO macrophages do not engraft well, the success of TLR4-KO line highlights the model's potential for investigating other genes. Using mouse-derived cells for transplantation reduces complications that can come with the use of human cell lines, highlighting the utility of this system for research in mouse models.

      Thank you for this thoughtful and balanced assessment. The major suggestion from Reviewer 1 was that confirmation of RNAseq data with protein or functional studies would add strength. We provided protein staining by IHC for IBA1 in vivo, as well as protein staining by FACS for CD11B, CD45, and TMEM119 in vitro and in vivo. For TLR4, we showed successful protein KO and blunted response to LPS (a TLR4 ligand) challenge, which we believe provides some protein and functional data to support the approach. To bolster these data, we added staining for P2RY12 on brain-engrafted ER-Hoxb8s.

      Regarding the non-engraftment of Adar1 KO cells: because ADAR1 KO mice are embryonic lethal due to hematopoietic failure, we see the health impacts of Adar1 KO on ER-Hoxb8s as a strength of the transplantation model, enabling the assessment of ADAR1 global function in macrophages and microglia-like cells without generation of a transgenic mouse line. In addition, it was a surprise that the health impact occurs at the macrophage and not the progenitor stage, perhaps providing insight for future studies of ADAR1’s role in hematopoiesis. We were able to show a significant impact of complete loss of Adar1 on survival and engraftment, suggesting an important biological function of ADAR1. The macrophage-specific D1113H mutation, which affects part of the deaminase domain, shows that disrupting the RNA deamination (but not the RNA binding) function of ADAR1 produces brain-wide interferonopathy. This is very exciting to our group and hopefully the community, as astrocytes are thought to be a major driver of brain interferonopathy in patients with ADAR1 mutations. Our result suggests that disruption of brain macrophages is also a major contributor.

      Reviewer #2 (Public review):

      Summary:

      Microglia have been implicated in brain development, homeostasis, and diseases. "Microglia replacement" has gained traction in recent years, using primary microglia, bone marrow or blood-derived myeloid cells, or human iPSC-induced microglia. Here, the authors extended their previous work in the area and provided evidence to support: (1) Estrogen-regulated (ER) homeobox B8 (Hoxb8) conditionally immortalized macrophages from bone marrow can serve as stable, genetically manipulated cell lines. These cells are highly comparable to primary bone marrow-derived (BMD) macrophages in vitro, and, when transplanted into a microglia-free brain, engraft the parenchyma and differentiate into microglia-like cells (MLCs). Taking advantage of this model system, the authors created stable, Adar1-mutated ER-Hoxb8 lines using CRISPR-Cas9 to study the intrinsic contribution of macrophages to the Aicardi-Goutières Syndrome (AGS) disease mechanism.

      Strengths:

      The studies are carefully designed and well-conducted. The imaging data and gene expression analysis are carried out at a high level of technical competence and the studies provide strong evidence that ER-Hoxb8 immortalized macrophages from bone marrow are a reasonable source for "microglia replacement" exercise. The findings are clearly presented, and the main message will be of general interest to the neuroscience and microglia communities.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This is an elegant study, demonstrating both the utility and limitations of ER-Hoxb8 technology as a surrogate model for microglia in vivo. The manuscript is well-designed and clearly written, but authors should consider the following suggestions:

      (1) Validation of RNA hits at the protein level: To strengthen the comparison between ER-Hoxb8 macrophages and WT bone marrow-derived macrophages, validating several RNA hits at the protein level would be beneficial. As many of these hits are surface markers, flow cytometry could be employed for confirmation (e.g., Figure 1D, Figure 3E).

      In vitro, we show protein levels by flow cytometry for CD11B (ITGAM) and CD45 (PTPRC; Figure 1C), as well as TMEM119 (Supplemental Figure 2A) and TLR4 (Supplemental Figure 3C/D). In vivo, we show TMEM119 protein levels by flow cytometry (Figure 3A), as well as their CD11B/CD45 pregates (Supplemental Figure 2C), plus immunostaining for IBA1 (AIF1; Figure 2D). We now provide additional data showing P2RY12 immunostaining in brain-engrafted cells (Supplemental Figure 2B). 

      (2) The authors should consider testing the phagocytic capacity of ER-Hoxb8-derived macrophages to further validate their functionality.

      Thank you for the suggestion. We measured ER-Hoxb8 macrophage ability to engulf phosphatidylserine-coated beads that mimic apoptotic cells, compared with phosphatidylcholine-coated beads, now as new Supplemental Figure 1C/D. This agrees with existing literature showing efficient engulfment/phagocytosis by ER-Hoxb8-derived cells (Elhag et al., 2021).

      (3) For Figure 3E, incorporating a wild-type (WT) microglia reference would be beneficial to establish a baseline for comparison (e.g. including WT microglia data in the graph or performing a ratio analysis against WT expression levels).

      We agree - we now include bars representing our sequenced primary microglia data in Figure 3E as a comparison.  

      (4) Some statistical analyses may require refinement. Specifically, for Figure 4J, where the effects of Adar1 KO and Adar1 KO with Bari are compared, it would be more appropriate to use a two-way ANOVA.

      Thank you for noting it. We have now done more appropriate two-way ANOVA and included the updated results in Figure 4J and the corresponding Supplemental Figure 4G. Errors in figure legend texts have also been corrected to reflect the statistical tests used.

      (5) Cx3cr1-creERT2 pups injected with tamoxifen: The authors could clarify the depletion ratio in these experiments before the engraftment and assess whether the depletion is global or regional. In comparison to Csf1r-/-, where TLR4-KO ER-Hoxb8 engraft globally, in Cx3cr1-creERT2, the engraftment seems more regional (Figure 5A vs Supplementary Figure 5B); is this due to the differences in depletion efficiency?

      This is an excellent question and observation, and one that we are very interested in, though that finding does not change the conclusions of this particular study.  We find some region-specific differences in depletion early after tamoxifen injection, but that all brain regions are >95% depleted by P7. For instance, in a recently published manuscript (Bastos et al., 2025) we find some differences in the depletion kinetics in the genetic model. By P3, we find 90% depletion in cortex with 50-60% in thalamus and hippocampus. In other studies, we typically deliver primary monocytes, and this is the first study where we report engraftment of ER-Hoxb8 cells in the inducible model.  In this sense, it is possible that depletion kinetics may regionally affect engraftment, but future studies are required to more finely assess this point with ER-Hoxb8s, as it may change how these models are used in the future.

      Bastos et al., Monocytes can efficiently replace all brain macrophages and fetal liver monocytes can generate bonafide SALL1+ microglia, Immunity (2025), https://doi.org/10.1016/j.immuni.2025.04.006

      (6) It would be helpful for the authors to clarify whether Adar1 is predominantly expressed by microglia, especially since the study aims to show its role in dampening the interferon response.

      That’s a wonderful point. Adar1 is expressed by all brain cells, with highest transcript level in some neurons, astrocytes, and oligodendrocytes. It is an interferon-stimulated gene, and mutation itself leads to interferonopathy, we believe, due to poor RNA editing and detection of endogenous RNA as non-self by MDA5. We hope it can dampen the interferon response, but in the case of mutation, Adar1 is probably causal of interferonopathy.  It is induced in microglia upon systemic inflammatory challenge (LPS). We have edited the text to highlight its expression pattern.  See BrainRNAseq.org (Zhang*, Chen*, Sloan*, et al., 2014 and Bennett et al., 2016)

      Reviewer #2 (Recommendations for the authors):

      (1) There appears to be a morphological difference between wt and Adar1/Ifih1 double KO (dKO) cells in the engrafted brains (Figure 5). It would be good if the authors could systematically compare the morphology (e.g., soma size, number, and length of branches) of the engrafted MLCs between the wt and mutant cells.

      We agree. While cells did not differ in branch number or length, engrafted dKO cells had significantly larger somas compared with controls, which we now present in Figure S5A.

      (2) To fully appreciate the extent to which the engrafted ER-Hoxb8 immortalized macrophages resemble primary, engrafted yolk sac-myeloid cells, vs engrafted iPSC-induced microglia, it would be informative to provide a comparison of the RNAseq data derived from the engrafted ER-Hoxb8 immortalized macrophages with published transcriptomic data sets (e.g. Bennett et al. Neuron 2018; Chadarevian et al. Neuron 2024; Schafer et al. Cell 2023).

      Thank you for this suggestion. To address this, we provide our full dataset for additional experiments. To compare with a similar non-immortalized model, we compared top up- and down-regulated genes from our data to those of ICT yolk sac progenitor cells from our previous work (Bennett et al., 2018). We find overlap between brain-engrafted ER-Hoxb8-, bone marrow-, and yolk sac-derived cells (Supplemental Figure 2F, Supplemental Table 3).  

      Minor comments:

      Figure 6C: the red arrows marking the zoomed-in regions do not match. It might be beneficial to provide bigger images with each channel for C and D as a Supplemental Figure.

      We fixed this in Figure 6C to show areas of interest in the cortex for both conditions. Figure S7A shows intermediate power images to aid in interpretation.

    1. eLife Assessment

      This valuable work proposes a novel, rapid S. aureus entry mechanism via Ca²⁺-dependent lysosomal exocytosis and acid sphingomyelinase release, which influences bacterial sub-cellular fate. However, reliance on chemical inhibitors and the absence of a knockout phenotype weakens the overall impact, making the study incomplete.

    2. Reviewer #2 (Public review):

      In the manuscript, Rühling et al propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca2+ and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry. Overall, this manuscript argues for an important mechanism of a 'rapid' cellular entry pathway of S. aureus that is dependent on lysosomal exocytosis and acid sphingomyelinase and links the intracellular fate of the bacterium, including phagosomal dynamics, cytosolic replication and host cell death, to different modes of uptake.

      Key strength is the nature of the idea proposed, while continued reliance on inhibitor treatment combined with lack of phenotype for genetic knock out is a major weakness. While the authors argue a role for undetectable nano-scale Cer platforms on the cell surface caused by ASM activity, results do not rule out a SM independent role in the cellular uptake phenotype of ASM inhibitors.

      The authors have attempted to address many of the points raised in the previous revision. While the new data presented provide partial evidence, the reliance on chemical inhibitors and lack of clear results directly documenting release of lysosomal Ca2+, or single bacterial tracking, or clear distinction between ASM dependent and independent processes dampen the enthusiasm.

      I acknowledge the authors' argument of different ASM inhibitors showing similar phenotypes across different assays as pointing to a role for ASM, but the lack of phenotype in ASM KO cells is concerning. The authors' argument that altered lipid composition in ASM KO cells could be overcoming the ASM-mediated infection effects by other ASM-independent mechanisms is speculative, as they acknowledge, and moderates the importance of the ASM-dependent pathway. The SM accumulation in ASM KO cells does not distinguish between localized alterations within the cells. If this pathway can be compensated, how central is it likely to be?

      The authors allude to a lower phagosomal escape rate in ASM KO cells compared to inhibitor treatment, which appears to contradict the notion of uptake and intracellular trafficking phenotype being tightly linked. As they point out, these results might be hard to interpret. Could an inducible KD system recapitulate (some of) the phenotype of inhibitor treatment? If S. aureus does not escape the phagosome in macrophages, could it provide a system to potentially decouple the uptake and intracellular trafficking effects by ASM (or its inhibitor treatment)?

      The role of ASM on cell surface remains unclear. The hypothesis proposed by the authors that the localized generation of Cer on the surface by released ASM leads to generation of Cer-enriched platforms could be plausible, but is not backed by data, technical challenges to visualize these platforms notwithstanding. These results do not rule out possible SM independent effects of ASM on the cell surface, if indeed the role of ASM is confirmed by controlled genetic depletion studies.

      The reviewer acknowledges technical challenges in directly visualizing lysosomal Ca2+ using the methods outlined. Genetically encoded lysosomal Ca2+ sensor such as Gcamp3-ML1 might provide better ways to directly visualize this during inhibitor treatment, or S. aureus infection.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Rühling et al analyzes the mode of entry of S. aureus into mammalian cells in culture. The authors propose a novel mechanism of rapid entry that involves the release of calcium from lysosomes via NAADP-stimulated activation of TPC1, which in turn causes lysosomal exocytosis; exocytic release of lysosomal acid sphingomyelinase (ASM) is then envisaged to convert exofacial sphingomyelin to ceramide. These events not only induce the rapid entry of the bacteria into the host cells but are also described to alter the fate of the intracellular S. aureus, facilitating escape from the endocytic vacuole to the cytosol.

      Strengths:

      The proposed mechanism is novel and could have important biological consequences.

      Weaknesses:

      Unfortunately, the evidence provided is unconvincing and insufficient to document the multiple, complex steps suggested. In fact, there appear to be numerous internal inconsistencies that detract from the validity of the conclusions, which were reached mostly based on the use of pharmacological agents of imperfect specificity.

      We thank the reviewer for the detailed evaluation of our manuscript. We will address the criticism below.

      We agree with the reviewer that many of the experiments presented in our study rely on the usage of inhibitors. However, we want to emphasize that the main conclusion (invasion pathway affects the intracellular fate/phagosomal escape) was demonstrated without the use of inhibitors or genetic ablation in two key experiments (Figure 5D/E). These experiments were in line with the results we obtained with inhibitors (amitriptyline [Figure 4D], ARC39, PCK310, [Figure 4C] and Vacuolin-1 [Figure 4E]). Importantly, the hypothesis was also supported by another key experiment, in which we showed the intracellular fate of bacteria is affected by removal of SM from the plasma membrane before invasion, but not by removal of SM from phagosomal membranes after bacteria internalization (Figure 5A-C). Taken together, we thus believe that the main hypothesis is strongly supported by our data.

      Moreover, we either used different inhibitors for the same molecule (ASM was inhibited by ARC39, amitriptyline and PCK310 with similar outcome) or supported our hypothesis with gene-ablated cell pools (TPC1, Syt7, SARM1), as we will point out in more detail below.

      Firstly, the release of calcium from lysosomes is not demonstrated. Localized changes in the immediate vicinity of lysosomes need to be measured to ascertain that these organelles are the source of cytosolic calcium changes. In fact, 9-phenantrol, which the authors find to be the most potent inhibitor of invasion and hence of the putative calcium changes, is not a blocker of lysosomal calcium release but instead blocks plasmalemmal TRPM4 channels. On the other hand, invasion is seemingly independent of external calcium. These findings are inconsistent with each other and point to non-specific effects of 9-phenantrol. The fact that ionomycin decreases invasion efficiency is taken as additional evidence of the importance of lysosomal calcium release. It is not clear how these observations support involvement of lysosomal calcium release and exocytosis; in fact treatment with the ionophore should itself have induced lysosomal exocytosis and stimulated, rather than inhibited invasion. Yet, manipulations that increase and others that decrease cytosolic calcium both inhibited invasion.

      With respect to lysosomal Ca²⁺ release, we agree with the reviewer that direct visual demonstration of lysosomal Ca²⁺ release upon infection will improve the manuscript. We therefore performed live cell imaging to visualize lysosomal Ca²⁺ release by a previously published method.[1] The approach is based on two dextran-coupled fluorophores that were incubated with host cells. The dyes are endocytosed and eventually stain the lysosomes. One of the dyes, Rhod-2, is Ca²⁺-sensitive and can be used to estimate the lysosomal Ca²⁺ content. The second dye, AF647, is Ca²⁺-insensitive and is used to visualize the lysosomes. If the ratio Rhod-2/AF647 within the lysosomes is decreasing, lysosomal Ca²⁺ release is indicated. We monitored lysosomal Ca²⁺ content during S. aureus infection with this method (Author response image 1 and Author response video 1). However, the lysosomes are very dynamic, and it is challenging to monitor the fluorescence intensities over time. Thus, quantitative measurements are not possible with our methodology, and we decided to not include these data in the main manuscript. However, one could speculate that lysosomal Ca²⁺ content in the selected ROI (Author response image 1 and Author response video 1) is decreased upon attachment of S. aureus to the host cells as indicated by a decrease in Rhod-2/AF647 ratio.

      Author response image 1.

      Lysosomal Ca²⁺ imaging during S. aureus infection. The lysosomes of HuLEC were stained with two dextran-coupled fluorescent dyes: the Ca²⁺-sensitive dye Rhod-2 and the Ca²⁺-insensitive dye AF647. Cells were infected with fluorescent S. aureus JE2 and monitored by live cell imaging (see Author response video 1). The intensity of Rhod-2/AF647 was measured close to a S. aureus-host contact site. The ratio of Rhod-2 vs. AF647 fluorescence intensity was calculated.
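
      The ratiometric readout described above can be illustrated with a minimal sketch. This is not the analysis code used for Author response image 1; the function name and the intensity values are hypothetical, and the only assumption taken from the text is that a falling Rhod-2/AF647 ratio in a region of interest indicates lysosomal Ca²⁺ release.

```python
def ratio_trace(rhod2, af647):
    """Per-frame Rhod-2/AF647 ratio, normalized to the first frame.

    rhod2, af647: mean ROI intensities per frame (same length).
    A value below 1.0 means the ratio dropped relative to the start.
    """
    if len(rhod2) != len(af647):
        raise ValueError("both channels need the same number of frames")
    ratios = [r / a for r, a in zip(rhod2, af647)]
    baseline = ratios[0]
    return [x / baseline for x in ratios]


# Hypothetical ROI intensities (arbitrary units), one value per frame.
rhod2 = [200.0, 190.0, 150.0, 120.0]   # Ca2+-sensitive channel
af647 = [100.0, 100.0, 100.0, 100.0]   # Ca2+-insensitive reference channel
trace = ratio_trace(rhod2, af647)
print(trace)
```

      Dividing by the Ca²⁺-insensitive channel is what makes the readout tolerant of changes in dye loading or focus, although it does not solve the tracking problem caused by lysosome movement that is mentioned above.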

      As to the TRPM4 involvement in S. aureus host cell internalization, it has been reported that TRPM4 is activated by cytosolic Ca²⁺. However, the channel conducts monovalent cations such as K⁺ or Na⁺ but is impermeable for Ca²⁺ [2, 3]. The following observations support this:

      i) S. aureus invasion is dependent on intracellular Ca²⁺, but is independent from extracellular Ca²⁺ (Figure 1A).

      ii) 9-phenantrol treatment reduces S. aureus internalization by host cells, illustrating the dependence of this process on TRPM4 (data removed from the manuscript). We therefore hypothesize that TRPM4 is activated by Ca²⁺ released from lysosomes (see above).

      TRPM4 is localized to focal adhesions and is connected to actin cytoskeleton[4, 5] – a requisite of host cell entry of S. aureus.[6, 7] This speaks for an important function of TRPM4 in uptake of S. aureus in general, but does not necessarily have to be involved exclusively in the rapid uptake pathway.

      TRPM4 itself is not permeable for Ca²⁺ but is activated by the cation. Thus, it is unlikely to cause lysosomal exocytosis. The stronger bacterial uptake reduction by treatment with 9-phenantrol when compared to Ned19 thus may be caused by the involvement of TRPM4 in additional pathways of S. aureus host cell entry involving the association of TRPM4 with focal adhesions or, as pointed out by the reviewer, by unspecific side effects of 9-phenantrol that we currently cannot exclude. However, we think that experiments with 9-phenantrol distract from the main story (lysosomal Ca²⁺ and exocytosis) and might be confusing for the reader. We thus removed all data and discussion concerning 9-phenantrol from the revised manuscript.

      Regarding the reduced S. aureus invasion after ionomycin treatment, we agree with the reviewer that ionomycin is known to lead to lysosomal exocytosis, as was previously shown by others[8] as well as our laboratory[9].

      We hypothesized that pretreatment with ionomycin would trigger lysosomal exocytosis and thus would reduce the pool of lysosomes that can undergo exocytosis before host cells are contacted by S. aureus. As a result, we should observe a marked reduction of S. aureus internalization in such “lysosome-depleted cells”, if the lysosomal exocytosis is coupled to bacterial uptake. Our observation of reduced bacterial internalization after ionomycin treatment supports this hypothesis.

      However, ionomycin treatment and S. aureus infection of host cells are distinct processes.  

      While ionomycin results in strong global and non-directional lysosomal exocytosis of all “releasable” lysosomes (~5-10 % of all lysosomes according to previous observations)[8], we hypothesize that lysosomal exocytosis upon contact with S. aureus only involves a small proportion of lysosomes at host-bacteria contact sites. This is supported by experiments that demonstrate that ~30% of the lysosomes that are released by ionomycin treatment are exocytosed during S. aureus infection (see below and Figure 2, A-C). We added these new data as well as a corresponding section to the discussion (line 563 ff). Moreover, we moved the data obtained with ionomycin to Figure 2E and described our idea behind this experiment more precisely (line 166 ff).

      The proposed role of NAADP is based on the effects of "knocking out" TPC1 and on the pharmacological effects of Ned-19. It is noteworthy that TPC2, rather than TPC1, is generally believed to be the primary TPC isoform of lysosomes. Moreover, the gene ablation accomplished in the TPC1 "knockouts" is only partial and rather unsatisfactory. Definitive conclusions about the role of TPC1 can only be reached with proper, full knockouts. Even the pharmacological approach is unconvincing because the high doses of Ned-19 used should have blocked both TPC isoforms and presumably precluded invasion. Instead, invasion is reduced by only ≈50%. A much greater inhibition was reported using 9-phenantrol, the blocker of plasmalemmal calcium channels. How is the selective involvement of lysosomal TPC1 channels justified?

      As to partial gene ablation of TPC1: To avoid clonal variances, we usually perform pool sorting to obtain a cell population that predominantly contains cells deficient in the target gene (here, TPC1), but also a small proportion of wildtype cells, as seen by the residual TPC1 protein on the Western blot. We observe a significant reduction in bacterial uptake in this cell pool, suggesting that the uptake reduction in a pure K.O. population may be even more pronounced.
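
      The reasoning that a pure K.O. population may show a stronger effect than the pool can be made explicit with a small sketch. It assumes, purely for illustration, that the invasion efficiency measured in a pool is the cell-number-weighted average of the wildtype and K.O. values; the function name and all numbers are hypothetical and are not taken from our data.

```python
def pure_ko_value(pool_value, wt_value, wt_fraction):
    """Back-calculate the value of a pure K.O. population from a mixed pool.

    Assumes a linear mixture:
        pool_value = wt_fraction * wt_value + (1 - wt_fraction) * ko_value
    """
    if not 0.0 <= wt_fraction < 1.0:
        raise ValueError("wt_fraction must be in [0, 1)")
    return (pool_value - wt_fraction * wt_value) / (1.0 - wt_fraction)


# Hypothetical example: the pool invades at 60 % of WT (WT set to 100 %)
# and 20 % of the pooled cells are still wildtype.
ko = pure_ko_value(pool_value=60.0, wt_value=100.0, wt_fraction=0.2)
print(round(ko, 1))  # about 50 % of WT under these assumptions
```

      Under this assumption any residual wildtype fraction pulls the pool value towards the wildtype level, so the pool measurement is a conservative estimate of the K.O. effect.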

      As to the inhibition by Ned19: 

      The scale of invasion reduction upon Ned19 treatment (50%, Figure 1B) is comparable with the reduction caused by other compounds that influence the ASM-dependent pathway (such as amitriptyline, ARC39 [Figure 2G], BAPTA-AM [Figure 1A], Vacuolin-1 [Figure 2D], β-toxin [Figure 2L] and ionomycin [Figure 2E]). Further, the partial reduction of invasion is most likely due to the concurrent activity of multiple internalization pathways which are not all targeted by the used compounds and which we briefly discuss in the manuscript.

      We agree with the reviewer that Ned19 inhibits TPC1 and TPC2. Since ablation of TPC1 reduced invasion of S. aureus, we concluded that TPC1 is important for S. aureus host cell invasion. We thus agree with the reviewer that a role for TPC2 cannot be excluded. We clarified this in the revised manuscript (line 552). It needs to be noted, however, that deficiency in either TPC1 or TPC2 alone was sufficient to prevent Ebola virus infection[10], which is in line with our observations.

      In order to address the role of TPC2 for this review process, we were kindly gifted TPCN1/TPCN2 double knock-out HeLa cells by Norbert Klugbauer (Freiburg, Germany), which we tested for S. aureus internalization. We found that invasion was reduced in these cell lines, supporting a role of lysosomal Ca²⁺ release in S. aureus host cell entry and a role for both TPC channels (Author response image 2, see end of the document). Since we did not have a single TPCN2 knock-out available, we decided to exclude these data from the main manuscript.

      Author response image 2.

      Invasion efficiency is reduced in TPC1/TPC2 double K.O. HeLa cells. Invasion efficiency of S. aureus JE2 was determined in TPC1/TPC2 double K.O. cells after 10 and 30 min. Results were normalized to the parental HeLa WT cell line (set to 100 %).  

      Invoking an elevation of NAADP as the mediator of calcium release requires measurements of the changes in NAADP concentration in response to the bacteria. This was not performed. Instead, the authors analyzed the possible contribution of putative NAADP-generating systems and reported that the most active of these, CD38, was without effect, while the elimination of SARM1, another potential source of NAADP, had a very modest (≈20%) inhibitory effect that may have been due to clonal variation, which was not ruled out. In view of these data, the conclusion that NAADP is involved in the invasion process seems unwarranted.

      Our results from two independent experimental set-ups (Ned19 [Figure 1B] and TPC1 K.O. [Figure 1C & Figure 2N]) indicate the involvement of NAADP in the process. Together with the metabolomics unit at the Biocenter Würzburg, we attempted to measure cellular NAADP levels, however, this proved to be non-trivial and requires further optimization. However, we can rule out clonal variation in the SARM1 mutant since experiments were conducted with a cell pool as described above in order to avoid clonal variation of single clones.

      The mechanism behind biosynthesis of NAADP is still debated. CD38 was the first enzyme discovered to possess the ability of producing NAADP. However, it requires acidic pH to produce NAADP[11], which does not match the characteristics of a cytosolic NAADP producer. HeLa cells do not express CD38 and hence, it is not surprising that inhibition of CD38 had no effect on S. aureus invasion in HeLa cells. However, NAADP production by HeLa cells was observed in absence of CD38[12]. Thus, CD38-independent NAADP generation is likely. SARM1 can produce NAADP at neutral pH[13] and is expressed in HeLa, thus providing a more promising candidate.

      We agree with the reviewer that the reduction of S. aureus internalization after ablation of SARM1 is less pronounced than in other experiments of ours. This may be explained by NAADP originating from other enzymes, such as the recently discovered DUOX1, DUOX2, NOX1 and NOX2[14], which, with the exception of DUOX2, possess a low expression even in HeLa cells. We added this to the discussion in the revised manuscript (line 579).

      We can, however, rule out clonal variation for the inhibitory effect. As stated above we generated K.O. cell pools specifically to avoid inherent problems of clonality. Thus, we also detect some residual wildtype cells within our cell pools.  

      The involvement of lysosomal secretion is, again, predicated largely on the basis of pharmacological evidence. No direct evidence is provided for the insertion of lysosomal components into the plasma membrane, or for the release of lysosomal contents to the medium. Instead, inhibition of lysosomal exocytosis by vacuolin-1 is the sole source of evidence. However, vacuolin-1 is by no means a specific inhibitor of lysosomal secretion: it is now known to act primarily as a PIKfyve inhibitor and to cause massive distortion of the endocytic compartment, including gross swelling of endolysosomes. The modest (20-25%) inhibition observed when using synaptotagmin 7 knockout cells is similarly not convincing proof of the requirement for lysosomal secretion.

      We agree with the reviewer that the manuscript will benefit from a functional analysis of lysosomal exocytosis and therefore conducted assays to investigate exocytosis in the revised manuscript. We previously i) showed by addition of specific antisera that LAMP1 is transiently exposed on the plasma membrane during ionomycin and pore-forming toxin challenge and ii) demonstrated the release of ASM activity into the culture medium under these conditions.[9] However, both measurements are not compatible with S. aureus infection, since LAMP1 antibodies are also bound non-specifically by protein A and other IgG-binding proteins on the S. aureus surface, which would bias the results. Since protein A also may serve as an adhesin in the investigated pathway, we cannot simply delete the ORF without changing other aspects of staphylococcal virulence. Further, FBS contains an ASM background activity that impedes activity measurements of cell culture medium. We previously removed this background activity by a specific heat-inactivation protocol.[9] However, S. aureus invasion is strongly reduced in culture medium containing this heat-inactivated FBS.

      We therefore developed a luminescence assay based on split NanoLuc luciferase that enables detection of LAMP1 exposed on the plasma membrane without usage of antibodies (Figure 2, A-C). We added a section on the assay in the revised manuscript. Briefly, we generated reporter cells by fusing a short peptide fragment of NanoLuc called HiBiT between the signal peptide and the mature luminal domain of LAMP1 and stably expressed the resulting protein in HeLa cells by lentiviral transduction. The LgBiT protein domain of NanoLuc luciferase (Promega) as well as the substrate Furimazine are added to the culture medium. HiBiT can reconstitute a functional NanoLuc with LgBiT and process Furimazine when lysosomes are exocytosed thereby generating luminescence measurable in a suitable plate reader. 

      With this assay we detected that about 30% of lysosomes that were “releasable” by treatment with ionomycin are exocytosed during S. aureus infection. Lysosomal exocytosis was strongly reduced (even below the levels of untreated controls) if we treated cells with Vacuolin-1 or Ned19.
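
      One plausible way to express such a luminescence readout relative to the ionomycin-releasable pool is sketched below. The exact calculation is described in the manuscript, not here; the formula, the function name and the luminescence values are assumptions chosen only to illustrate the idea, with untreated cells as 0 % and ionomycin-treated cells as 100 %.

```python
def percent_of_releasable(sample, untreated, ionomycin):
    """Express a luminescence signal as % of the ionomycin-releasable pool.

    0 % corresponds to untreated cells, 100 % to ionomycin-treated cells.
    Values below 0 % mean the signal fell below the untreated control.
    """
    span = ionomycin - untreated
    if span <= 0:
        raise ValueError("ionomycin signal must exceed the untreated signal")
    return 100.0 * (sample - untreated) / span


# Hypothetical luminescence values (relative light units).
untreated, ionomycin = 1000.0, 6000.0
print(percent_of_releasable(2500.0, untreated, ionomycin))  # infected cells
print(percent_of_releasable(800.0, untreated, ionomycin))   # inhibitor-treated cells
```

      With this scaling, a signal below the untreated control, as we observed after Vacuolin-1 or Ned19 treatment, simply appears as a negative percentage.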

      We agree with the reviewer that Vacuolin-1 to some extent has unspecific side effects as has been shown by others and which we addressed in the revised version of the manuscript (line 541 ff). However, our new results with the HiBiT reporter cell line clearly demonstrate a reduction of lysosomal exocytosis after Vacuolin-1 treatment. Supported by this and our other results we hypothesize that Vacuolin-1 decreases S. aureus internalization due to the inhibition of lysosomal exocytosis.

      As to the involvement of synaptotagmin 7: The effect of Syt7 K.O. on invasion was moderate in initial experiments, likely due to a high culture passage and presumably overgrowth of WT cells. However, reduction of invasion in Syt7 K.O.s was more pronounced in experiments with β-toxin complementation (Figure 2, N) and hence, we combined the two data sets (Figure 2, F). This demonstrates the reduction of bacterial invasion by ~40% in Syt7 K.O. cell pools. Moreover, Syt7 is not the only protein possibly involved in Ca²⁺-dependent exocytosis. For instance, Syt1 has been shown to possess an overlapping function.[15] This may explain the differences between our Vacuolin-1 and Syt7 ablation experiments. We added this information to the discussion.

      ASM is proposed to play a central role in the rapid invasion process. As above, most of the evidence offered in this regard is pharmacological and often inconsistent between inhibitors or among cell types. Some drugs affect some of the cells, but not others. It is difficult to reach general conclusions regarding the role of ASM. The argument is made even more complex by the authors' use of exogenous sphingomyelinase (beta-toxin). Pretreatment with the toxin decreased invasion efficiency, a seemingly paradoxical result. Incidentally, the effectiveness of the added toxin is never quantified/validated by directly measuring the generation of ceramide or the disappearance of SM.

      Although pharmacological inhibitors can have unspecific side effects, we want to emphasize that the inhibitors used in our study act on the enzyme ASM by completely different mechanisms. Amitriptyline is a so-called functional inhibitor of ASM (FIASMA), which induces the detachment of ASM from lysosomal membranes, resulting in degradation of the enzyme.[16] By contrast, ARC39 is a competitive inhibitor.[17, 18]

      There are no inconsistencies in our data obtained with ASM inhibitors. Amitriptyline and ARC39 both reduce the invasion of S. aureus in HuLEC, HuVEC and HeLa cells (Figure 2G). ARC39 needs a longer pre-incubation, since its uptake by host cells is slower (to be published elsewhere). We observe a different outcome in 16HBE14o- and Ea.Hy 926 cells, with 16HBE14o- even demonstrating a slightly increased invasion of S. aureus upon ARC39 treatment. Amitriptyline had no effect (Figure 2G). 

      Thus, the ASM-dependent S. aureus internalization is cell type/line specific, which we state in the manuscript. The molecular origin of these differences is unclear and will require further investigation, e.g. in testing cell lines for potential differences in surface receptors. In a separate study we have already developed a biotinylation-based approach to identify potential novel host cell surface interaction partners during S. aureus infection.[19]

      Moreover, both inhibitors affected the invasion dynamics (Figure 3D), phagosomal escape (Figure 4C and Figure 4D) and Rab7 recruitment (Figure 4A and Supp. Figure 4A-C) in a similar fashion. Proper inhibition of ASM by both compounds in all cell lines used was validated by enzyme assays (Supp. Figure 2H), which again suggests that the ASM-dependent pathway does only exist in specific cell lines and also supports  that we do not observe unspecific side effects of the compounds. We clarified this in the revised manuscript.

      ASM is a key player for SM degradation and recycling. In a clinical context, deficiency in ASM results in the so-called Niemann-Pick disease type A/B. The lipid profile of ASM-deficient cells is massively altered[20], which will result in severe side effects. Short-term inhibition by small molecules therefore poses a clear benefit when compared to the usage of ASM K.O. cells. In order to satisfy the query of the reviewer, we generated two ASM K.O. cell pools (generated with two different sgRNAs) and tested these for S. aureus invasion efficiency (Figure 2, I). We did not observe bacterial invasion differences between WT and K.O. cells. However, when we treated the cells additionally with ASM inhibitor, we observed a strongly reduced invasion in WT cells, while invasion efficiency in ASM K.O. was only slightly affected (Figure 2, J). We concluded that the reduced invasion observed in inhibitor-treated WT cells is predominantly due to the absence of ASM, while the small reduction observed in ARC39-treated ASM K.O.s is likely due to unspecific side effects.

      We performed lipidomics on these cells and demonstrated a strongly altered sphingolipid profile in ASM K.O. cells compared to untreated and inhibitor-treated WT cells (Figure 2, K). We speculate that other ASM-independent bacterial invasion pathways are upregulated in ASM K.O.s., thereby obscuring the effect contributed by absence of ASM. We discussed this in the revised manuscript (line 518 ff).

      Moreover, we introduced the RFP-CWT escape marker into the ASM K.O. cells and measured phagosomal escape of S. aureus JE2 and Cowan I. The latter strain is non-cytotoxic and serves as negative control, since it is known to possess a very low escape rate due to its inability to produce toxin. Again, we compared early invaders (infection for 10 min) with early+late invaders (infection for 30 min). As observed for JE2, “early invaders” possess lower escape rates than “early+late invaders”.

      We did not observe differences between WT and ASM K.O. cells if we infected for only 10 min. By contrast, we observed a lower escape rate in ASM K.O. compared to WT cells when we infected for 30 min (Author response image 3, see end of the document).

      However, we usually observe an increased phagosomal escape when we treat host cells with ASM inhibitors (Figure 4C and D). Reduced phagosomal escape of intracellular S. aureus in ASM K.O. cells may be caused by the altered sphingolipid profile (e.g., by interference with binding of bacterial toxins to phagosomal membranes or altered vesicular acidification). We hence think that these data are difficult to interpret, and clarification would require intense additional experimentation. Thus, we did not include these data in the manuscript.

      Author response image 3.

      Phagosomal escape rates were established in either HeLa wild-type or ASM K.O. cells expressing the phagosomal escape reporter RFP-CWT. Host cells were infected with the cytotoxic S. aureus strain JE2 or the non-cytotoxic strain Cowan I for 10 or 30 minutes, and escape rates were determined by microscopy 3 h p.i.

      As to the treatment with a bacterial sphingomyelinase:

      Treatment with the bacterial SMase (bSMase, here: β-toxin) was performed in two different ways:

      i) Pretreatment of host cells with β-toxin to remove SM from the host cell surface before infection. This removes the substrate of ASM from the cell surface prior to addition of the bacteria (Figure 2L, Figure 4A-C). Since SM is not present on the extracellular plasma membrane leaflet after treatment, a release of ASM cannot cause localized ceramide formation at the sites of lysosomal exocytosis. Similar observations were made by others.[21] 

      ii) Addition of bSMase to host cells together with the bacteria to complement for the absence of ASM (Figure 2N).  

      Removal of the ASM substrate before infection (i) prevents localized ASM-mediated conversion of SM to Cer during infection and resulted in a decreased invasion, while addition of the SMase during infection (ii) resulted in an increased invasion in TPC1 and Syt7 ablated cells. Thus, both experiments are consistent with each other and in line with our other observations.

      Removal of SM from the plasma membrane by β-toxin was indirectly demonstrated by the absence of Lysenin recruitment to phagosomes/escaped bacteria when host cells were pretreated with the toxin before infection (Figure 5C). We also added another data set that demonstrates degradation of a fluorescent SM derivative upon β-toxin treatment of host cells (Supp Figure 2, M). In another publication, we recently quantified the effectiveness of β-toxin treatment, albeit with slightly longer treatment times (75 min vs. 3 h).[22]

      To clarify our experimental approaches to the readership, we added an explanatory section to the revised manuscript (line 287 ff) and we also added a scheme in Figure 2M describing the experimental settings.

      As to the general conclusions regarding the role of ASM: ASM and lysosomal exocytosis have been shown to be involved in uptake of a variety of pathogens[21, 23-27], supporting its role in the process.

      The use of fluorescent analogs of sphingomyelin and ceramide is not well justified and it is unclear what conclusions can be derived from these observations. Despite the low resolution of the images provided, it appears as if the labeled lipids are largely in endomembrane compartments, where they would presumably be inaccessible to the secreted ASM. Moreover, considering the location of the BODIPY probe, the authors would be unable to distinguish intact sphingomyelin from its breakdown product, ceramide. What can be concluded from these experiments? Incidentally, the authors report only 10% of BODIPY-positive events after 10 min. What are the implications of this finding? That 90% of the invasion events are unrelated to sphingomyelin, ASM, and ceramide?

      During the experiments with fluorescent SM analogues (Figure 3a,b), S. aureus was added to the samples immediately before the start of video recording. Hence, bacteria slowly trickle onto the host cells, and we can thus image the initial contact between host cells and bacteria. For instance, the bacteria depicted in Figure 3A contact the host cell about 9 min before becoming BODIPY-FL-positive (see Supp. Video 1, 55 min). Hence, in these cases we see the formation of phagosomes around bacteria rather than bacteria in endomembrane compartments. Since generation of phagosomes happens at the plasma membrane, SM is accessible to secreted ASM.

      The “trickling” approach for infection is an experimental difference to our invasion measurements, in which we synchronized the infection by centrifugation. This ensures that all bacteria have contact to host cells and are not just floating in the culture medium. However, live cell imaging of initial bacterial-host contact and synchronization of infection are hard to combine technically.

      In our invasion measurements (with synchronization), we typically see internalization of ~20% of all added bacteria after 30 min. Hence, most bacteria that are visible in our videos are likely still extracellular and only a small proportion was internalized. This explains why only 10% of total bacteria are positive for BODIPY-FL-SM after 10 min. The proportion of internalized bacteria that are positive for BODIPY-FL-SM should be considerably higher but cannot be determined with this method.

      We agree with the reviewer that we cannot observe conversion of BODIPY-FL-SM by ASM. In order to do that, we attempted to visualize the conversion of a visible-range SM FRET probe (Supp. Figure 3), but the structure of the probe is not compatible with measurement of conversion on the plasma membrane, since the FITC fluorophore is released into the culture medium by ASM activity and is thereby lost for imaging. In general, the visualization of SM conversion with subcellular resolution is challenging, and even with novel tools developed in our lab[28] visualization of SM on the plasma membrane is difficult.

      The conclusions we draw from these experiments are that i.) S. aureus invasion is associated with SM and ii.) SM-associated invasion can be very fast, since bacteria are rapidly engulfed by BODIPY-FL-SM containing membranes.

      It is also unclear how the authors can distinguish lysenin entry into ruptured vacuoles from the entry of RFP-CWT, used as a criterion of bacterial escape. Surely the molecular weights of the probes are not sufficiently different to prevent the latter one from traversing the permeabilized membrane until such time that the bacteria escape from the vacuole.

      We here want to clarify that both Lysenin as well as the CWT reporter have access to ruptured vacuoles (Figure 4B). We used the Lysenin reporter in these experiments for estimation of SM content of phagosomal membranes. If a vacuole is ruptured, both the bacteria and the luminal leaflet of the phagosomal membrane remnants get in contact with the cytosol and hence with the cytosolically expressed reporters YFP-Lysenin as well as RFP-CWT resulting in “Lysenin-positive escape” when phagosomes contained SM (see Figure 5C). By contrast, either β-toxin expression by S. aureus or pretreatment with the bSMase resulted in absence of Lysenin recruitment suggesting that the phagosomal SM levels were decreased/undetectable (Figure 5C, Supp Figure 6F, G, I, J).

      Although this approach does not enable a quantitative measurement of phagosomal SM, this method is sufficient to show that β-toxin expression and pretreatment result in markedly decreased phagosomal SM levels in the host cells.

      The approach we used here to analyze “Lysenin-positive escape” can clearly be distinguished from Lysenin-based methods that were used by others.[29] There, Lysenin was used to show trans-bilayer movement of SM before rupture of bacteria-containing phagosomes.

      To clarify the function of Lysenin in our approach we added  additional figures (Figure 4F, Supp. Figure 5) and a movie (Supp. Video 4) to the revised manuscript.

      Both SMase inhibitors (Figure 4C) and SMase pretreatment increased bacterial escape from the vacuole. The former should prevent SM hydrolysis and formation of ceramide, while the latter treatment should have the exact opposite effects, yet the end result is the same. What can one conclude regarding the need and role of the SMase products in the escape process?

      As pointed out above, pretreatment of host cells with SMase removes SM from the plasma membrane and hence, ASM does not have access to its substrate. Thus, both treatment with ASM inhibitors and pretreatment with bacterial SMase prevent ASM from being active on the plasma membrane and thereby block the ASM-dependent uptake (Figure 2 G, L). Although overall fewer bacteria were internalized by host cells under these conditions, the bacteria that invaded host cells did so in an ASM-independent manner.

      Since blockage of the ASM-dependent internalization pathway (with ASM inhibitor [Figure 4C, D], SMase pretreatment [Figure 5B] and Vacuolin-1 [Figure 4E]) always resulted in enhanced phagosomal escape, we conclude that bacteria that were internalized in an ASM-independent fashion cause enhanced escape. Vice versa, bacteria that enter host cells in an ASM-dependent manner demonstrate lower escape rates.

      This is supported by comparing the escape rates of “early” and “late” invaders [Figure 5D, E], which in our opinion is a key experiment that supports this hypothesis. The “early” invaders are predominantly ASM-dependent (see e.g. Figure 3E) and thus, bacteria that entered host cells in the first 10 min of infection should have been internalized predominantly in an ASM-dependent fashion, while slower entry pathways are active later during infection. The early ASM-dependent invaders possessed lower escape rates, which is in line with the data obtained with inhibitors (e.g. Figure 4C, D).

      We hypothesize that the activity of ASM on the plasma membrane during invasion mediates the recruitment of a specific subset of receptors, which then influences downstream phagosomal maturation and escape. This hypothesis is supported by the fact that the subset of receptors interacting with S. aureus is altered upon inhibition of the ASM-dependent uptake pathway. We describe this in another study that is currently under evaluation elsewhere.  

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Ruhling et al propose a rapid uptake pathway that is dependent on lysosomal exocytosis, lysosomal Ca<sup>2+</sup> and acid sphingomyelinase, and further suggest that the intracellular trafficking and fate of the pathogen is dictated by the mode of entry.

      The evidence provided is solid, methods used are appropriate and results largely support their conclusions, but can be substantiated further as detailed below. The weakness is a reliance on chemical inhibitors that can be non-specific to delineate critical steps.

      Specific comments:

      A large number of experiments rely on treatment with chemical inhibitors. While this approach is reasonable, many of the inhibitors employed such as amitriptyline and vacuolin1 have other or nondefined cellular targets and pleiotropic effects cannot be ruled out. Given the centrality of ASM for the manuscript, it will be important to replicate some key results with ASM KO cells.

      We thank the reviewer for the critical evaluation of our manuscript and plenty of constructive comments. 

      We agree with the reviewer that ASM inhibitors such as functional inhibitors of ASM (FIASMA) like amitriptyline used in our study have unspecific side effects given their mode of action. FIASMAs induce the detachment of ASM from lysosomal membranes, resulting in degradation of the enzyme.[16] However, we want to emphasize that we also used the competitive inhibitor ARC39 in our study[17, 18], which acts on the enzyme by a completely different mechanism. All phenotypes (reduced invasion [Figure 2G], effect on invasion dynamics [Figure 3D], enhanced escape [Figure 4C, D] and differential recruitment of Rab7 [Supp. Figure 4A-C]) were observed with both inhibitors, thereby supporting the role of ASM in the process.

      We further agree that experiments with genetic evidence usually support and improve scientific findings. However, ASM is a cellular key player for SM degradation and recycling. In a clinical context, deficiency in ASM results in the so-called Niemann-Pick disease type A/B. The lipid profile of ASM-deficient cells is massively altered[20], which in itself will result in severe side effects. Thus, the usage of inhibitors provides a clear benefit when compared to ASM K.O. cells, since ASM activity can be targeted in a short-term fashion, thereby preventing larger alterations in cellular lipid composition.

      We nevertheless generated two ASM K.O. cell pools (generated with two different sgRNAs) and tested for invasion efficiency (Figure 2, I). Here, we did not observe differences between WT and mutants. However, if we treated the cells additionally with ASM inhibitor, we observed a strongly reduced invasion in WT cells, while invasion efficiency in ASM K.O. was only slightly affected (Figure 2, J). We concluded that the reduced invasion observed in WT cells upon inhibitor treatment predominantly is due to inhibition of ASM, whereas the small reduction observed in ARC39-treated ASM K.O.s is likely due to unspecific side effects. We also demonstrated a strongly altered sphingolipid profile in ASM K.O. cells when compared to untreated and inhibitor-treated WT cells (new Figure 2, K). We speculate that other ASM-independent invasion pathways are upregulated in ASM K.O.s., thereby making up for the absence of ASM. We discuss this in the revised manuscript (line 518 ff).

      We introduced the RFP-CWT escape marker into the ASM K.O. cells and measured phagosomal escape of S. aureus JE2 and Cowan I (Author response image 3). The latter serves as negative control, since it is known to possess a very low escape rate due to its inability to produce toxin. Again, we compared early invaders (infection for 10 min) with early+late invaders (infection for 30 min). As seen before for JE2, early invaders possess lower escape rates than early+late invaders. We did not observe differences between WT and K.O. cells if we infected for 10 min. By contrast, we observed a lower escape rate in ASM K.O. compared to WT cells when we infected for 30 min. However, we usually observe an increased phagosomal escape when we treat host cells with ASM inhibitors (Figure 4C and D). We think that the reduced phagosomal escape in ASM K.O. is caused by the altered sphingolipid profile, which could have versatile effects (e.g., interference with binding of bacterial toxins to phagosomal membranes or changes in acidification). We hence think that these data are difficult to interpret, and clarification would require intense additional experimentation. Thus, we did not include these data in the manuscript.

      Most experiments are done in HeLa cells. Given the pathway is projected as generic, it will be important to further characterize cell type specificity for the process. Some evidence for a similar mechanism in other cell types S. aureus infects, perhaps phagocytic cell type, might be good. 

      Whenever possible, we performed the experiments not only in HeLa but also in HuLECs. For example, we refer to experiments concerning the role of Ca<sup>2+</sup> (Figure 1A/Supp. Figure 1A), lysosomal Ca<sup>2+</sup>/Ned19 (Figure 1B/Supp. Figure 1C), lysosomal exocytosis/Vacuolin-1 (Figure 2D/Supp. Figure 2D), ASM/ARC39 and amitriptyline (Figure 2G), surface SM/β-toxin (Figure 2L/Supp. Figure 2L), analysis of invasion dynamics (complete Figure 3) and measurement of cell death during infection (Figure 6C+E, Supp. Figure 8A+B).

      HuLECs, however, are not readily genetically amenable; hence, we were not able to generate gene deletions in these cells, and upon introduction of the fluorescence escape reporter the cells do not grow readily.

      As to ASM involvement in phagocytic cells: a role for ASM during the uptake of S. aureus by macrophages was previously reported by others.[25] However, in professional phagocytes S. aureus does not escape from the phagosome and replicates within the phagosome.[30]

      I'm a little confused about the role of ASM on the surface. Presumably, it converts SM to ceramide, as the final model suggests. Overexpression of b-toxin results in the near complete absence of SM on phagosomes (having representative images will help appreciate this), but why is phagosomal SM detected at high levels in untreated conditions? If bacteria are engulfed by SM-containing membrane compartments, what role does ASM play on the surface? If surface SM is necessary for phagosomal escape within the cell, do the authors imply that ASM is tuning the surface SM levels to a certain optimal range? Alternatively, can there be additional roles for ASM on the cell surface? Can surface SM levels be visualized (for example, in Figure 4 E, F)?

      We initially hypothesized that we would detect higher phagosomal SM levels upon inhibition of ASM, since our model suggests SM cleavage by ASM on the host cell surface during bacterial cell entry. However, we did not detect any changes in our experiments (Supp. Figure 4F). We currently favor the following explanation: SM is the most abundant sphingolipid in human cells.[31] If peripheral lysosomes are exocytosed and thereby release ASM, only a localized and relatively small proportion of SM may get converted to Cer, which most likely is below our detection limit. In addition, the detection of cytosolically exposed phagosomal SM by YFP-Lysenin is not quantitative and provides a “Yes or No” measurement. Hence, we think that the rather limited SM to Cer conversion in combination with the high abundance of SM in cellular membranes does not visibly affect the recruitment of the Lysenin reporter.

      In our experiments that employ BODIPY-FL-SM (Figure 3a+b), we cannot distinguish between native SM and downstream metabolites such as Cer. Hence, again we cannot make any assumptions on the extent to which SM is converted on the surface during bacterial internalization. Although our laboratory recently used trifunctional sphingolipid analogs to analyze the SM to Cer conversion[22], the visualization of this process on the plasma membrane is currently still challenging.

      Overall, we hypothesize that the localized generation of Cer on the surface by released ASM leads to generation of Cer-enriched platforms. Subsequently, a certain subset of receptors may be recruited to these platforms and influence the uptake process. These platforms are supposed to be very small, which also would explain that we did not detect changes in Lysenin recruitment.

      Related to that, why is ASM activity on the cell surface important? Its role in non-infectious or other contexts can be discussed.

      ASM release by lysosomal exocytosis is implicated in plasma membrane repair upon injury. We added a short description of the role of extracellular ASM in the introduction (line 35).

      If SM removal is so crucial for uptake, can exocytosis of lysosomes alone provide sufficient ASM for SM removal? How much or to what extent is lysosomal exocytosis enhanced by initial signaling events? Do the authors envisage the early events in their model happening in localized confines of the PM, this can be discussed.

      Ionomycin treatment led to a release of ~10% of all lysosomes and also increased extracellular ASM activity.[8, 9] In the revised manuscript, we developed an assay to determine lysosomal exocytosis during S. aureus infection (Figure 2, A-C). During infection, we detected lysosomal exocytosis of ~30% when compared to ionomycin treatment. Since this is only a fraction of the “releasable lysosomes”, we assume that the effects (lysosomal Ca<sup>2+</sup> liberation, lysosomal exocytosis and ASM activity) are very localized and take place only at host-pathogen contact sites (see also above). We discuss this in the revised manuscript (line 563 ff). To our knowledge, it is currently unclear to which extent the released ASM affects surface SM levels. We attempted to visualize the local ASM activity on the cell surface by using a visible-range FRET probe (Supp. Fig. 3). Cleavage of the probe by ASM on the surface leads to release of FITC into the cell culture medium, which does not contribute a measurable signal at the surface.
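
      As a back-of-envelope illustration of why this points to a very localized effect, the two approximate figures quoted above can be combined. This is a sketch only: it assumes that the infection signal scales linearly with the ionomycin reference, which the response does not state, and the function name is hypothetical.

```python
# Back-of-envelope estimate using the approximate values quoted in the response.
# Assumption (not from the manuscript): the exocytosis readout scales linearly,
# so a relative signal can be multiplied onto the ionomycin reference fraction.

ionomycin_release_fraction = 0.10       # ~10% of all lysosomes released by ionomycin
infection_relative_to_ionomycin = 0.30  # infection signal is ~30% of the ionomycin signal


def estimated_release_fraction(reference_fraction: float, relative_signal: float) -> float:
    """Scale a reference release fraction by a relative signal (hypothetical helper)."""
    return reference_fraction * relative_signal


infection_release_fraction = estimated_release_fraction(
    ionomycin_release_fraction, infection_relative_to_ionomycin
)
print(f"Estimated share of lysosomes exocytosed during infection: "
      f"{infection_release_fraction:.1%}")
```

      Under these assumptions only a few percent of all lysosomes would be exocytosed during infection, which is in line with the interpretation that the events are confined to host-pathogen contact sites.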

      How are inhibitor doses determined? How efficient is the removal of extracellular bacteria at 10 min? It will be good to substantiate the cfu experiments for infectivity with imaging-based methods. Are the roles of TPC1 and TPC2 redundant? If so, why does silencing TPC1 alone result in a decrease in infectivity? For these and other assays, it would be better to show raw values for infectivity. Please show alterations in lysosomal Ca<sup>2+</sup> at the doses of inhibitors indicated. Is lysosomal Ca<sup>2+</sup> released upon S. aureus binding to the cell surface? Will be good to directly visualize this.

      Concerning the inhibitor concentrations, we either used values established in published studies or recommendations of the suppliers (e.g. 2-APB, Ned19, Vacuolin-1). For ASM inhibitors, we determined proper inhibition of ASM by activity assays. Concentrations of ionomycin resulting in Ca<sup>2+</sup> influx and lysosomal exocytosis were determined in earlier studies of our lab.[9, 32]

      As to the removal of bacteria at 10 min p.i.: Lysostaphin is very efficient for removal of extracellular S. aureus and sterilizes the tissue culture supernatant. It significantly lyses bacteria within a few minutes, as determined by turbidity assays.[33]

      As to imaging-based infectivity assays: We performed imaging-based invasion assays to show reduced invasion efficiency with two ASM inhibitors in the revised manuscript with similar results as obtained by CFU counts (Supp. Figure 2, J).

      Regarding the roles of TPC1 and TPC2: from our data we cannot conclude whether the roles of TPC1 and TPC2 are redundant. One could speculate that, since blockage of TPC1 alone is sufficient to reduce internalization of bacteria, both channels may have distinct roles. On the other hand, there might be a Ca<sup>2+</sup> threshold in order to initiate lysosomal exocytosis that can only be attained if TPC1 and TPC2 are activated in parallel. Thus, our observations are in line with another study that shows reduced Ebola virus infection in absence of either TPC1 or TPC2.[34] In order to address the role of TPC2 for this review process, we were kindly gifted TPCN1/TPCN2 double knock-out HeLa cells by Norbert Klugbauer (Freiburg, Germany), which we tested for S. aureus internalization. We found that invasion was reduced in these double KO cell lines even further supporting a role of lysosomal Ca<sup>2+</sup> release in S. aureus host cell entry (Author response image 2, see end of the document). Since we did not have a single TPCN2 knockout available, we decided to exclude these data from the main manuscript.

      As to raw CFU counts: whereas the observed effects upon blocking the invasion of S. aureus are stable, the number of internalized bacteria varies between individual biological replicates, for instance, by differences in host cell fitness or growth differences in bacterial cultures, which are prepared freshly for each experiment.

      With respect to visualization of lysosomal Ca<sup>2+</sup> release: we agree with the reviewer that direct visual demonstration of lysosomal Ca<sup>2+</sup> release upon infection would improve the manuscript. We therefore performed live cell imaging to visualize lysosomal Ca<sup>2+</sup> release by a previously published method.[1] The approach is based on two dextran-coupled fluorophores that were incubated with host cells. The dyes are endocytosed and eventually stain the lysosomes. One of the dyes, Rhod-2, is Ca<sup>2+</sup>-sensitive and can be used to estimate the lysosomal Ca<sup>2+</sup> content. The second dye, AF647, is Ca<sup>2+</sup>-insensitive and is used to visualize the lysosomes. If the ratio Rhod-2/AF647 within the lysosomes is decreasing, lysosomal Ca<sup>2+</sup> release is indicated. We monitored lysosomal Ca<sup>2+</sup> content during S. aureus infection with this method (Author response image 1 and Author response video 1). However, the lysosomes are very dynamic, and it is challenging to monitor the fluorescence intensities over time. Thus, quantitative measurements are not possible with our methodology, and we decided to not include these data in the final manuscript. However, one could speculate that lysosomal Ca<sup>2+</sup> content in the selected ROI (Author response image 1 and Author response video 1) is decreased upon attachment of S. aureus to the host cells as indicated by a decrease in Rhod-2/AF647 ratio.
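
      The ratiometric readout described above can be sketched as follows. This is a minimal illustration rather than the analysis pipeline used for the author response images: the intensity values are invented, the function names are hypothetical, and background subtraction and ROI tracking (the step that is difficult for motile lysosomes) are omitted.

```python
# Minimal sketch of a ratiometric Ca2+ readout for a single lysosomal ROI.
# All intensity values below are hypothetical and only illustrate the calculation.


def ratio_trace(rhod2, af647):
    """Return the per-time-point Rhod-2/AF647 ratio (Ca2+-sensitive / Ca2+-insensitive)."""
    if len(rhod2) != len(af647):
        raise ValueError("intensity traces must have equal length")
    return [r / a for r, a in zip(rhod2, af647)]


def normalize_to_baseline(ratios, n_baseline=2):
    """Express each ratio relative to the mean of the first n_baseline time points."""
    baseline = sum(ratios[:n_baseline]) / n_baseline
    return [value / baseline for value in ratios]


# Hypothetical mean ROI intensities before (first two points) and after bacterial contact.
rhod2_intensity = [200.0, 198.0, 150.0, 120.0]
af647_intensity = [100.0, 99.0, 100.0, 100.0]

ratios = ratio_trace(rhod2_intensity, af647_intensity)
normalized = normalize_to_baseline(ratios)

# A normalized ratio that falls below 1 would be read as lysosomal Ca2+ release.
for time_point, value in enumerate(normalized):
    print(f"t{time_point}: normalized Rhod-2/AF647 = {value:.2f}")
```

      Dividing by the Ca<sup>2+</sup>-insensitive channel corrects for changes in dye amount or focus within the ROI, so a drop in the ratio is attributed to Ca<sup>2+</sup> rather than to loss of dye.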

      The precise identification of cytosolic vs phagosomal bacteria is not very easy to appreciate. The methods section indicates how this distinction is made, but how do the authors deal with partial overlaps and ambiguities generally associated with such analyses? Please show respective images.

      The number of events (individual bacteria) for the live cell imaging data should be clearly mentioned.

      We apologize for not having sufficiently explained the technology to detect escaped S. aureus. The cytosolic location of S. aureus is indicated by recruitment of RFP-CWT.[35] CWT is the cell wall targeting domain of lysostaphin, which efficiently binds to the pentaglycine cross bridge in the peptidoglycan of S. aureus. This reporter is exclusively and homogeneously expressed in the host cytosol. Only upon rupture of phagoendosomal membranes can the reporter be recruited to the cell wall of now cytosolically located bacteria. S. aureus mutants, for instance in the agr quorum sensing system, cannot break down the phagosomal membrane in non-professional phagocytes and thus stay unlabeled by the CWT reporter.[35] We include several images (Figure 4, F, Supp. Figure 5) and movies (Supp. Video 4) of escape events in the revised manuscript. The bacteria numbers for live cell experiments are now shown in Supp. Figure 7.

      In the phagosome maturation experiments, what is the proportion of bacteria in Rab5 or Rab7 compartments at each time point? Will the decreased Rab7 association be accompanied by increased Rab5? Showing raw values and images will help appreciate such differences. Given the expertise and tools available in live cell imaging, can the authors trace Rab5 and Rab7 positive compartment times for the same bacteria?

      We included the proportion of Rab7-associated bacteria in the revised manuscript (Supp. Figure 4A and C) and also shortly mention these proportions in the text (line 353). Usually, we observe that Rab5 is only transiently (for a few minutes) present on phagosomes and only afterwards the phagosomes become positive for Rab7. We do not think that a decrease in Rab7-positive phagosomes would increase the proportion of Rab5-positive phagosomes. However, we cannot exclude this hypothesis with our data.

      We can trace individual bacteria for recruitment of Rab5/Rab7 only manually, which impedes a quantitative evaluation. However, we included a video (Supp. Video 3) that illustrates the consecutive recruitment of the GTPases.

      The results with longer-term infection are interesting. Live cell imaging suggests that ASM-inhibited cells show accelerated phagosomal escape that reduces by 6 hpi. Where are the bacteria at this time point ? Presumably, they should have reached lysosomes. The relationship between cytosolic escape, replication, and host cell death is interesting, but the evidence, as presented is correlative for the populations. Given the use of live cell imaging, can the authors show these events in the same cell?

      We think that most bacteria-containing phagoendosomes should have fused with lysosomes 6 h p.i. as we have previously shown by acidification to pH of 5 and LAMP1 decoration.[36]

      The correlation between phagosomal escape and replication in the cytosol of non-professional phagocytes has been observed by us and others. In the revised manuscript we also provide images (Supp. Figure 5)/videos (Supp. Video 4) to show this correlation in our experiments.

      Given the inherent heterogeneity in uptake processes and the use of inhibitors in most experiments, the distinction between ASM-dependent and independent pathways might not be as clear-cut as the authors suggest. Some caution here will be good. Can the authors estimate what fraction of intracellular bacteria are taken up ASM-dependent?

      We agree with the reviewer that an overlap between internalization pathways is likely. A clear distinction is therefore certainly non-trivial. Alternative to ASM-dependent and ASM-independent pathways, the ASM activity may also accelerate one or several internalization pathways. We address this limitation in the discussion of the revised manuscript (line 596 ff).

      Early in infection (~10 min after contact with the cells), the proportion of bacteria that enter host cells ASM-dependently is relatively high, amounting to roughly 75-80% in HuLEC. After 30 min, this proportion decreases to about 50%. We included a paragraph in the discussion of the revised manuscript (line 593 ff).
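
      One plausible way to arrive at such a proportion, shown here purely as an illustration and not necessarily the calculation used for the manuscript, is to compare invasion with and without ASM inhibition. The sketch assumes that the inhibitor blocks the ASM-dependent route completely and leaves other routes untouched; the CFU counts and the function name are hypothetical.

```python
# Illustrative estimate of the ASM-dependent share of invasion from inhibitor data.
# Assumptions (not from the manuscript): the inhibitor fully blocks the ASM-dependent
# route and does not affect ASM-independent uptake. CFU counts are made up.


def asm_dependent_fraction(cfu_control: float, cfu_inhibited: float) -> float:
    """Fraction of internalized bacteria attributed to the ASM-dependent route."""
    if cfu_control <= 0:
        raise ValueError("control CFU count must be positive")
    return 1.0 - cfu_inhibited / cfu_control


# Hypothetical recovered intracellular CFU after a 10 min infection.
cfu_untreated = 1000.0
cfu_asm_inhibited = 220.0

fraction = asm_dependent_fraction(cfu_untreated, cfu_asm_inhibited)
print(f"Estimated ASM-dependent share of invasion: {fraction:.0%}")
```

      Because internalization pathways may overlap, as acknowledged above, a value obtained this way is best read as an upper-level estimate of the inhibitor-sensitive share rather than a strict partition between two pathways.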

      Reviewer #2 (Recommendations for the authors):

      (1) The experiment in Figure 4H is interesting. Details on what proportion of the cell is double positive, and if only this fraction was used for analysis will be good.

      We used all bacteria found in the images, regardless of whether host cells were infected with only one or both strains. Unfortunately, we cannot properly determine the proportion of cells that are double infected, since i) we recorded the samples with CLSM and hence cannot exclude that there are intracellular bacteria in higher or lower optical sections, and ii) we visualized cells by staining nuclei and did not stain the cell borders, so we cannot precisely tell to which host cell the bacteria localize.

      (2) Data is sparse for steps 5 and 6 of the model (line 330).

      We apologize for the inconvenience. There is a related study published elsewhere[19], in which we identified NRCAM and PTK7 as putative receptors involved in this invasion pathway. We included a section in the discussion with the corresponding citation (line 569).

      (3) Data for the reduced number of intracellular bacteria upon blocking ASM-dependent uptake (line 235) is not clear. Do they mean decreased invasion efficiency? These two need not be the same.

      We changed “reduced number of intracellular bacteria” to “invasion efficiency”.

      (4) b-toxin added to the surface can get endocytosed. Can its surface effect be delineated from endo/phagosomal effect?

      We attempted to delineate effects contributed by the toxin activity on the surface vs. within phagosomes (Figure 5 A-C). We saw increased phagosomal escape when we pretreated host cells with β-toxin (removal of SM from the surface) and infected either in the presence of β-toxin (toxin will be taken up together with the bacteria into the phagosome) or in its absence (toxin was washed away shortly before infection). By contrast, overexpression of β-toxin by S. aureus did not affect phagosomal escape rates. The proper activity of β-toxin was confirmed by the absence of Lysenin recruitment during phagosomal escape in all three conditions. We concluded that the activity on the surface, and not the activity in the phagosome, is important.

      (5) The potential role(s) of bacterial factors in the uptake and subsequent intracellular stages can be discussed.

      There are multiple bacterial adhesins known in S. aureus. These include factors covalently attached to the bacterial cell wall, such as the sortase-dependently anchored Fibronectin-binding Proteins A and B, but also secreted and “cell wall binding” proteins, as well as non-proteinaceous factors such as wall teichoic acids. A discussion of these factors would thus be beyond the scope of this manuscript, and we suggest referring to specialized reviews on that topic.

      (6) The manuscript is not very easy to read. The abstract could be rephrased for better clarity and succinctness, with a clearly stated problem statement. The introduction is somewhat haphazard, I feel it can be better structured.

      We apologize for the inconvenience. We stated the problem/research question in the abstract and tried to improve the introduction without adding too much unnecessary detail. In general, we tried to improve the readability of the manuscript and hope that our results and conclusions can be more easily understood by the reader in the revised version.

      (7) Typo in Figure 5F. Step 6 should read "accessory receptors"

      The typo was corrected.

      References

      (1) Lloyd-Evans, E. et al. Niemann-Pick disease type C1 is a sphingosine storage disease that causes deregulation of lysosomal calcium. Nature Medicine 14, 1247-1255 (2008).

      (2) Launay, P. et al. TRPM4 Is a Ca<sup>2+</sup>-Activated Nonselective Cation Channel Mediating Cell Membrane Depolarization. Cell 109, 397-407 (2002).

      (3) Nilius, B. et al. The Ca<sup>2+</sup>‐activated cation channel TRPM4 is regulated by phosphatidylinositol 4,5‐biphosphate. The EMBO Journal 25, 467-478 (2006).

      (4) Cáceres, M. et al. TRPM4 Is a Novel Component of the Adhesome Required for Focal Adhesion Disassembly, Migration and Contractility. PLoS One 10, e0130540 (2015).

      (5) Silva, I., Brunett, M., Cáceres, M. & Cerda, O. TRPM4 modulates focal adhesion-associated calcium signals and dynamics. Biophysical Journal 123, 390a (2024).

      (6) Schlesier, T., Siegmund, A., Rescher, U. & Heilmann, C. Characterization of the Atl-mediated staphylococcal internalization mechanism. International Journal of Medical Microbiology 310, 151463 (2020).

      (7) Jevon, M. et al. Mechanisms of Internalization of Staphylococcus aureus by Cultured Human Osteoblasts. Infection and Immunity 67, 2677-2681 (1999).

      (8) Rodriguez, A., Webster, P., Ortego, J. & Andrews, N.W. Lysosomes behave as Ca<sup>2+</sup>-regulated exocytic vesicles in fibroblasts and epithelial cells. J Cell Biol 137, 93-104 (1997).

      (9) Krones & Rühling et al. Staphylococcus aureus alpha-Toxin Induces Acid Sphingomyelinase Release From a Human Endothelial Cell Line. Front Microbiol 12, 694489 (2021).

      (10) Sakurai, Y. et al. Two-pore channels control Ebola virus host cell entry and are drug targets for disease treatment. Science 347, 995-998 (2015).

      (11) Aarhus, R., Graeff, R.M., Dickey, D.M., Walseth, T.F. & Lee, H.C. ADP-ribosyl cyclase and CD38 catalyze the synthesis of a calcium-mobilizing metabolite from NADP. J Biol Chem 270, 30327-30333 (1995).

      (12) Schmid, F., Fliegert, R., Westphal, T., Bauche, A. & Guse, A.H. Nicotinic acid adenine dinucleotide phosphate (NAADP) degradation by alkaline phosphatase. J Biol Chem 287, 32525-32534 (2012).

      (13) Angeletti, C. et al. SARM1 is a multi-functional NAD(P)ase with prominent base exchange activity, all regulated by multiple physiologically relevant NAD metabolites. iScience 25, 103812 (2022).

      (14) Gu, F. et al. Dual NADPH oxidases DUOX1 and DUOX2 synthesize NAADP and are necessary for Ca(2<sup>+</sup>) signaling during T cell activation. Sci Signal 14, eabe3800 (2021).

      (15) Schonn, J.-S., Maximov, A., Lao, Y., Südhof, T.C. & Sørensen, J.B. Synaptotagmin-1 and -7 are functionally overlapping Ca<sup>2+</sup> sensors for exocytosis in adrenal chromaffin cells. Proceedings of the National Academy of Sciences 105, 3998-4003 (2008).

      (16) Kornhuber, J. et al. Functional Inhibitors of Acid Sphingomyelinase (FIASMAs): a novel pharmacological group of drugs with broad clinical applications. Cell Physiol Biochem 26, 9-20 (2010).

      (17) Naser, E. et al. Characterization of the small molecule ARC39, a direct and specific inhibitor of acid sphingomyelinase in vitro. J Lipid Res 61, 896-910 (2020).

      (18) Roth, A.G. et al. Potent and selective inhibition of acid sphingomyelinase by bisphosphonates. Angew Chem Int Ed Engl 48, 7560-7563 (2009).

      (19) Rühling, M., Schmelz, F., Kempf, A., Paprotka, K. & Fraunholz, M.J. Identification of the Staphylococcus aureus endothelial cell surface interactome by proximity labeling. mBio 0, e03654-03624 (2025).

      (20) Schuchman, E.H. & Desnick, R.J. Types A and B Niemann-Pick disease. Mol Genet Metab 120, 27-33 (2017).

      (21) Miller, M.E., Adhikary, S., Kolokoltsov, A.A. & Davey, R.A. Ebolavirus Requires Acid Sphingomyelinase Activity and Plasma Membrane Sphingomyelin for Infection. Journal of Virology 86, 7473-7483 (2012).

      (22) M. Rühling, L.K., F. Wagner, F. Schumacher, D. Wigger, D. A. Helmerich, T. Pfeuffer, R. Elflein, C. Kappe, M. Sauer, C. Arenz, B. Kleuser, T. Rudel, M. Fraunholz, J. Seibel Trifunctional sphingomyelin derivatives enable nanoscale resolution of sphingomyelin turnover in physiological and infection processes via expansion microscopy. Nat Commun accepted in principle (2024).

      (23) Peters, S. et al. Neisseria meningitidis Type IV Pili Trigger Ca(2<sup>+</sup>)-Dependent Lysosomal Trafficking of the Acid Sphingomyelinase To Enhance Surface Ceramide Levels. Infect Immun 87 (2019).

      (24) Grassmé, H. et al. Acidic sphingomyelinase mediates entry of N. gonorrhoeae into nonphagocytic cells. Cell 91, 605-615 (1997).

      (25) Li, C. et al. Regulation of Staphylococcus aureus Infection of Macrophages by CD44, Reactive Oxygen Species, and Acid Sphingomyelinase. Antioxid Redox Signal 28, 916-934 (2018).

      (26) Fernandes, M.C. et al. Trypanosoma cruzi subverts the sphingomyelinase-mediated plasma membrane repair pathway for cell invasion. J Exp Med 208, 909-921 (2011).

      (27) Luisoni, S. et al. Co-option of Membrane Wounding Enables Virus Penetration into Cells. Cell Host & Microbe 18, 75-85 (2015).

      (28) Rühling, M. et al. Trifunctional sphingomyelin derivatives enable nanoscale resolution of sphingomyelin turnover in physiological and infection processes via expansion microscopy. Nature Communications 15, 7456 (2024).

      (29) Ellison, C.J., Kukulski, W., Boyle, K.B., Munro, S. & Randow, F. Transbilayer Movement of Sphingomyelin Precedes Catastrophic Breakage of Enterobacteria-Containing Vacuoles. Curr Biol 30, 2974-2983 e2976 (2020).

      (30) Moldovan, A. & Fraunholz, M.J. In or out: Phagosomal escape of Staphylococcus aureus. Cell Microbiol 21, e12997 (2019).

      (31) Slotte, J.P. Biological functions of sphingomyelins. Progress in Lipid Research 52, 424-437 (2013).

      (32) Stelzner, K. et al. Intracellular Staphylococcus aureus Perturbs the Host Cell Ca(2<sup>+</sup>) Homeostasis To Promote Cell Death. mBio 11 (2020).

      (33) Kunz, T.C. et al. The Expandables: Cracking the Staphylococcal Cell Wall for Expansion Microscopy. Front Cell Infect Microbiol 11, 644750 (2021).

      (34) Sakurai, Y. et al. Ebola virus. Two-pore channels control Ebola virus host cell entry and are drug targets for disease treatment. Science 347, 995-998 (2015).

      (35) Grosz, M. et al. Cytoplasmic replication of Staphylococcus aureus upon phagosomal escape triggered by phenol-soluble modulin alpha. Cell Microbiol 16, 451-465 (2014).

      (36) Giese, B. et al. Staphylococcal alpha-toxin is not sufficient to mediate escape from phagolysosomes in upper-airway epithelial cells. Infect Immun 77, 3611-3625 (2009).

    1. eLife Assessment

      The authors modified a common method to induce epilepsy in mice to provide an improved approach to screen new drugs for epilepsy. This is important because of the need to develop new drugs for patients who are refractory to current medications. The authors' method evokes seizures to circumvent a low rate of spontaneous seizures and the approach was validated using two common anti-seizure medications. The strength of evidence was solid, making the study invaluable, but there were some limitations to the approach and methods.

    2. Reviewer #1 (Public review):

      Summary:

      This important study by Chen et al. describes a novel approach for optogenetically evoking seizures in an etiologically relevant mouse model of epilepsy. The authors developed a model that can trigger seizures "on demand" using optogenetic stimulation of CA1 principal cells in mice rendered epileptic by an intra-hippocampal kainate (IHK) injection into CA3. The authors discuss their model in the context of the limitations of current animal models used in epilepsy drug development. In particular, their model addresses concerns regarding existing models where testing typically involves inducing acute seizures in healthy animals or waiting on infrequent, spontaneous seizures in epileptic animals.

      Strengths:

      A strength of this manuscript is that this approach may facilitate the evaluation of novel therapeutics since these evoked seizures, despite having some features that were significantly different from spontaneous seizures, are suggested to be sufficiently similar to spontaneous seizures, which are more laborious to analyze. The data demonstrating the commonality of pharmacology and EEG features between evoked seizures and spontaneous seizures in epileptic mice, while also being different from evoked seizures in naïve mice, are convincing. The structural, functional, and behavioral differences between a seizure-naïve and epileptic mouse, which emerge due to the enduring changes occurring during epileptogenesis, are complex and important. Accordingly, this study highlights the importance of using mice that have undergone epileptogenesis as model organisms for testing novel therapeutics. Furthermore, this study positively impacts the wider epilepsy research community by investigating seizure semiology in these populations.

      Weaknesses:

      This study convincingly demonstrates that the feature space measurements for stimulus-evoked seizures in epileptic mice were significantly different from those in naïve mice; this result allows the authors to conclude that "seizures induced in chronically epileptic animals differed from those in naïve animals". However, the authors also conclude that "induced seizures resembled naturally occurring spontaneous seizures in epileptic animals" despite their own data demonstrating similar, albeit fewer, significant differences in feature space measurements. It is unclear if and what the threshold is whereby significant differences in these feature space measurements lead to the conclusion that the differences are meaningful, as in the comparison of epileptic and naïve mice, or not meaningful, as in the comparison of evoked and spontaneous seizures.

    3. Reviewer #2 (Public review):

      The authors aimed to develop an animal model of temporal lobe epilepsy (TLE) that will generate "on-demand" seizures and an improved platform to advance our ability to find new anti-seizure drugs (ASDs) for drug-resistant epilepsy (DRE). Unlike some of the work in this field, the authors are studying actual seizures, and hopefully events that are similar to actual epileptic seizures. To develop an optimized screening tool, however, one also needs high-throughput systems with actual seizures as a quantitative, rigorous, and reproducible outcome measure. The authors aim to provide such a model; however, this approach may be over-stated here and seems unlikely to address the critical issue of drug resistance, which is their most important claim.

      Strengths:

      - The authors have generated an animal model of "on demand" seizures, which could be used to screen new ASDs and potentially other therapies. The authors and their model make a good-faith effort to emulate the epileptic condition and to use seizure susceptibility or probability as a quantitative output measure.

      - The events considered to be seizures appear to be actual seizures, with some evidence that the seizures are different from seizures in the naïve brain. Their effort to determine how different ASDs raise seizure probability or threshold to an optogenetic stimulus to the CA1 area of the rodent hippocampus is focused on an important problem, as many if not most ASD screening uses surrogate measures that may not be as well linked to actual epileptic seizures.

      - Another concern is their stimulation of dorsal hippocampus, while ventral hippocampus would seem more appropriate.

      - Use of optogenetic techniques allows specific stimulation of the targeted CA1 pyramidal cells, and it appears that this approach is reproducible and reliable with quantitative rigor.

      - The authors have taken on a critically important problem, and have made a good-faith effort to address many of the technical concerns raised in the reviews, but the underlying problem of DRE remains.

      Weaknesses:

      - Although the model has potential advantages, it also has disadvantages. As stated by the authors, the pre-test work-load to prepare the model may not be worth the apparent advantages. And most important, the paper frequently mentions DRE but does not directly address it, and yet drug resistance is the critical issue in this field.

      - Although the paper shows examples of actual seizures, there remains some concern that some of the events might not be seizures - or a homogeneous population of seizures. More quantitative assessment of the electrical properties (e.g., duration) of the seizures and their probability is likely to be more useful than the proposed future quantification of the behavioral seizure stages, because the former could be both more objective and automated, while the behavioral analysis of the seizures will likely be more subjective, less reliable, and fraught with analytical problems. Nonetheless, the authors point out that the presence of "Racine 3 or above" behavioral seizures (in addition to their electrical data) is a good argument that many (if not all) of the "seizures" are actual epileptic seizures.

      - Optogenetic stimulation of CA1 provides cell-specificity for the stimulation, but it is not clear that this method would actually be better than electrical stimulation of a kindled rodent with superimposed hippocampal injury. The reader is unfortunately left with the concern of whether this model would be easier and more efficacious than kindling.

      - Although the authors have taken on a critically important problem, and have combined a variety of technologies, this approach may facilitate more rapid screening of ASDs against actual seizures (beneficial), but it does not really address the fundamentally critical yet difficult problem of DRE. A critical issue for DRE that is not well-addressed relates to adverse effects, which is often why many ASDs are not well tolerated by many patients (e.g., LEV). Thus, we are left with: how does this address anti-seizure DRE?

      - The focus of this paper seems to be more on seizures than on epilepsy. In the absence of seizure spontaneity, the work seems to primarily address the issues of seizure spread and duration. Although this is useful, it does not seem to be addressing the question of what trips the system to generate a seizure.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      - The authors seem to have developed a new and useful model; however, it is not clear how this will address that core problem of DRE, which was their stated aim.

      - A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community.

      - As stated before in the original review, the potential impact would primarily be aimed at the ETSP or a drug-testing CRO; however, much more work will be required to convince the epilepsy community that this approach will actually identify new ASDs for DRE. The approach is potentially time-consuming with a steep and potentially difficult optimization curve, and thus may not be readily adaptable to the typical epilepsy-models neuroscience laboratory.

      Any additional context you think would help readers interpret or understand the significance of the work:

      - The problem of DRE is much more complicated than described by the authors here; however, the paper could end up being more useful than is currently apparent. Although this work could be seen as technically - and maybe conceptually - elegant and a technical tour de force, will it "deliver on the promise"? Is it better than kindling for DRE? In attempting to improve the discovery process, how will the model move us to another level? Will this model really be any better than others, such as kindling?

    4. Reviewer #3 (Public review):

      This revised paper develops and characterizes a new approach for screening drugs for epilepsy. The idea is to increase the ability to study seizures in animals with epilepsy because most animal models have rare seizures. Thus, the authors use the existing intrahippocampal kainic acid (IHKA) mouse model, which can have very unpredictable seizures with long periods of time between seizures. This approach is of clear utility to researchers who may need to observe many seizure events per mouse during screening of antiseizure medications. A key strength is also that more utility can be derived from each individual mouse. The authors modified the IHKA model to inject KA into CA3 instead of CA1 in order to preserve the CA1 pyramidal cells that they will later stimulate. To express the excitatory opsin channelrhodopsin (ChR2) in area CA1, they use a virus that expresses ChR2 in cells that express the Thy-1 promoter. The authors demonstrate that CA3 delivery of KA can induce a very similar chronic epilepsy phenotype to the injection of KA in CA1 and show that optical excitation of CA1 can reliably induce seizures. The authors evaluate the impact of repeated stimulation on the reliability of seizure induction and show that seizures can be reliably induced by CA1 stimulation, at least for the short term (up to 16 days). These are strengths of the study.

      However, there are several limitations: the seizures are evoked, not spontaneous. It is not clear how induced seizures can be used to investigate if antiseizure medication can reduce spontaneous seizures. Although seizure inducibility and severity can be assessed, the lack of spontaneous seizures is a limitation. To their credit, the authors show that electrophysiological signatures of induced vs spontaneous seizures are similar in many ways, but the authors also show several differences. Notably, the induced seizures are robustly inhibited by the antiseizure medication levetiracetam and variably but significantly inhibited by diazepam, similar to many mouse models with chronic recurrent seizure activity. One also wonders if using a mouse model with numerous seizures (such as the pilocarpine model) might be more efficient than using a modified IHKA protocol.

      In this revised manuscript, the authors address some previous concerns related to definitions of seizures and events that are trains of spikes, sex as a biological variable, and present new images of ChR2 expression (but these images could be improved to see the cells more clearly). A few key concerns remain unaddressed, however. For example, it is still not clear that evoked seizures triggered by stimulating CA1 are similar to spontaneous seizures, regardless of the idea that CA1 plays a role in seizure disorders. It also remains unclear whether repeated activation of the hippocampal circuit will result in additional alterations to this circuit that affect the seizure phenotype over prolonged intervals (after 16 days). Furthermore, the use of SVM with the number of seizures being used as replicates (instead of number of mice) is inappropriate. Another theoretical concern is whether the authors are correct in suggesting that one will be able to re-use the mice for screening multiple drugs in a row.

      Strengths:

      - The authors show that the IHKA model of chronic epilepsy can be modified to preserve CA1 pyramidal cells, allowing optogenetic stimulation of CA1 to trigger seizures.

      - The authors show that repeated optogenetic stimulation of CA1 in untreated mice can promote kindling and induce seizures, indeed generating two mouse models in total.

      - Many electrophysiological signatures are similar between the induced and spontaneous seizures, and induced seizures reliably respond to treatment with antiseizure medications.

      - Given that more seizures can be observed per mouse using on-demand optogenetics, this model enhances the utility of each individual mouse.

      - Mice of each sex were used.

      Weaknesses:

      - Evaluation of seizure similarity using the SVM modeling and clustering is not sufficiently justified when using number of seizures as the statistical replicate (vs mice).

      - Related to the first concern, the utility of increasing number of seizures for enhancing statistical power is limited because standard practice is for sample size to be numbers of mice.

      - The term "seizure burden" usually refers to the number of spontaneous seizures per day, not the severity of the seizures themselves. Because the authors are evoking the seizures being studied, this study design precludes assessment of seizure burden.

      - It seems likely that repeatedly inducing seizures will have a long-term effect, especially in light of the downward slope at day 13-16 for induced seizures seen in Figure 4C. A duration of evaluation that is longer than 16 days is warranted.

      - Human epilepsy is extensively heterogeneous in both etiology and individual phenotype, and it may be hard to generalize the approach.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review):

      Weaknesses:

      While the data generally supports the authors' conclusions, a weakness of this manuscript lies in their analytical approach where EEG feature-space comparisons used the number of spontaneous or evoked seizures as their replicates as opposed to the number of IHK mice; these large data sets tend to identify relatively small effects of uncertain biological significance as being highly statistically significant. Furthermore, the clinical relevance of similarly small differences in EEG feature space measurements between seizure-naïve and epileptic mice is also uncertain.

      In this work, we used a linear mixed-effects model to address two levels of variability: between animals and within animals. The interactive linear mixed-effects model shows that most (~90%) of the variability in our data comes from within animals (Residual), rather than from between animals, which the model accounts for as a random effect. Since variability between animals is low, the model identifies common changes in seizure propagation across animals, while accounting for the variability in seizures within each animal. Therefore, the results we find reflect changes that happen across animals, not changes in individual seizures. We made text edits to clarify the use of the linear mixed-effects model (page 6, second paragraph, and page 11, first paragraph).
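
      To make the between-animal versus within-animal distinction concrete, the following is a minimal sketch and not the authors' analysis. It swaps in a simpler technique, the balanced one-way random-effects ANOVA estimator of variance components, in place of a fitted linear mixed-effects model. The mouse labels and feature values are made up for illustration, and the function name is hypothetical.

```python
from statistics import mean


def variance_components(values_by_animal):
    """Estimate between- and within-animal variance for a balanced design.

    values_by_animal maps an animal ID to a list of per-seizure feature
    values; every animal must contribute the same number of seizures.
    Returns (var_between, var_within).
    """
    k = len(values_by_animal)                      # number of animals
    n = len(next(iter(values_by_animal.values())))  # seizures per animal
    if any(len(v) != n for v in values_by_animal.values()):
        raise ValueError("this simple estimator expects a balanced design")

    grand_mean = mean(x for v in values_by_animal.values() for x in v)
    animal_means = {a: mean(v) for a, v in values_by_animal.items()}

    # Sums of squares between animals and within animals.
    ss_between = n * sum((m - grand_mean) ** 2 for m in animal_means.values())
    ss_within = sum(
        (x - animal_means[a]) ** 2
        for a, v in values_by_animal.items()
        for x in v
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))

    var_within = ms_within
    # Method-of-moments estimate; truncated at zero if it comes out negative.
    var_between = max(0.0, (ms_between - ms_within) / n)
    return var_between, var_within


# Hypothetical feature values (arbitrary units) for three mice, four seizures each.
example = {
    "mouse_A": [1.0, 3.0, 5.0, 7.0],
    "mouse_B": [2.5, 4.5, 6.5, 8.5],
    "mouse_C": [4.0, 6.0, 8.0, 10.0],
}
var_between, var_within = variance_components(example)
within_fraction = var_within / (var_between + var_within)
print(f"between={var_between:.3f} within={var_within:.3f} "
      f"within fraction={within_fraction:.2f}")
```

      When the within-animal fraction is large, as in this made-up example, seizures from one animal differ from each other far more than animal averages differ from one another. That is the situation the response describes, although it does not by itself settle which unit should serve as the statistical replicate.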

      Finally, the multiple surgeries and long timetable to generate these mice may limit the value compared to existing models in drug-testing paradigms.

      Thank you for the suggestion. We added a discussion in the ‘Comparison to other seizure models…’ section on pages 15 and 16. In an existing model investigating spontaneous tonic-clonic seizures (such as the intra-amygdala kainate injection model), the time investment is back-loaded, requiring two to three weeks per condition while counting spontaneous seizures, which may occur only once a day. In contrast, our model requires a front-loaded time investment. Once the animals are set up, we can test multiple drugs within a few weeks, providing significant time savings. Additionally, we did not pre-screen animals in our study. Existing models often pre-select mice with high rates of spontaneous seizures, whereas in our model, seizures can be induced even in animals with few spontaneous seizures. We believe that bypassing the need for pre-screening is also a key advantage of our induced seizure model.

      Reviewer 1 (Recommendations for the authors):

      (1) Address why the EEG data comparisons were performed between seizures and not between animals (as explicitly described in the public review). Further, a discussion of the biological significance (or lack thereof) of the effect size differences observed is warranted. This is especially concerning when the authors make the claim that spontaneous and induced seizures are essentially the same while their analysis shows all evaluated feature space parameters were significantly different in the initial 1/3 of the EEG waveforms.

      We made text edits to clarify the use of the linear mixed-effects model (page 6, second paragraph, and page 11, first paragraph).

      (2) The authors place great emphasis on the use of clinically/etiologically relevant epilepsy models in drug discovery research. There is discussion criticizing the time points required to enact kindling and the artificial nature of acute seizure induction methods. However, the combination IHK-opto seizure induction model also requires a lengthy timeline. A more tempered discussion of this novel model's strengths may benefit readers.

      Thank you for the suggestion. We added a discussion in the ‘Comparison to other seizure models…’ section on pages 15 and 16.

      (3) The authors should further emphasize the benefit of having an inducible seizure model of focal epilepsy since other mouse models (e.g., genetic or TBI models) may have superior etiological relevance (construct and face validity) but may not be amenable to their optogenetic stimulation approach.

      Thank you for the suggestion. We revised the manuscript to better emphasize the potential significance of our approach. We added a discussion in the 'Application of Models...' section on page 15, second paragraph. The on-demand seizure model can be applied to address biologically and clinically relevant questions beyond its utility in drug screening. For example, crossing the Thy1-ChR2 mouse line with genetic epilepsy models, such as Scn1a mutants, could reveal how optogenetic stimulation differentially induces seizures in mutant versus non-mutant mice, providing insights into seizure generation and propagation in Dravet syndrome. Due to the cellular specificity of optogenetics, we also envision this approach being used to study circuit-specific mechanisms of seizure generation and propagation.

      (4) Suggestion: Provide immunolabeled imagery demonstrating ChR2 presence in Thy1 cells.

      Thank you for the suggestion. We added a fluorescence image showing ChR2 expression in Fig. 2A.

      (5) It might be prudent to mention any potential effects of laser heat on hippocampal cell damage, although the 10 Hz, ~10 mW, and 6 s stim is unlikely to cause any substantial burns. Without knowing the diameter and material of the optic fiber, this is left up to some interpretation.

      Thank you for the comments. In the Methods section, we listed the optical fiber diameter as 400 microns (page 17, EEG and Fiber Implantation section). Using 5–18 mW laser power with a relatively large fiber diameter of 400 microns, the power density falls within the range of commonly employed channelrhodopsin activation conditions in vivo. That said, we would like to investigate potential heat effects or cell damage in a follow-up study.
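
      As a rough check on the numbers quoted above, the sketch below computes the nominal irradiance at the fiber tip for the stated power range and fiber diameter. It assumes that the quoted power is the power leaving the fiber tip and that light is spread uniformly over the 400-micron core; neither assumption is confirmed in the response. The function name is illustrative, and the calculation says nothing about light spread or heating in tissue.

```python
import math


def tip_irradiance_mw_per_mm2(power_mw, core_diameter_um):
    """Nominal irradiance at the fiber tip, in mW/mm^2.

    Assumes all of power_mw exits the tip and is distributed uniformly
    over a circular core of the given diameter (in micrometers).
    """
    radius_mm = core_diameter_um / 1000.0 / 2.0
    area_mm2 = math.pi * radius_mm ** 2
    return power_mw / area_mm2


# Power range and fiber diameter as quoted in the response.
for power_mw in (5.0, 18.0):
    irradiance = tip_irradiance_mw_per_mm2(power_mw, 400.0)
    print(f"{power_mw:.0f} mW through a 400 um core: {irradiance:.1f} mW/mm^2")
```

      Under these assumptions the tip irradiance scales linearly with power and inversely with the square of the core diameter, so halving the diameter at fixed power would quadruple the value.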

      (6) There are instances in the manuscript where the authors describe experimental and analytical parameters vaguely (e.g. "Seizures were induced several times a day", "stimulation was performed every 1 - 3 hours over many days"). These descriptions can and should be more precise.

      Thank you for the comments. To enhance clarity, we added the stimulation protocol in a flowchart format in Fig. S2A, describing how we determined the threshold and proceeded to the drug test. Following this protocol, there was variability in the number of stimulations per day.

      (7) In the second to last paragraph of the discussion, the authors state "However, HPDs are not generalizable across species - they are specific to the mouse model (55)." This statement is inaccurate. The paper cited comes from Dr. Corrine Roucard's lab at Synapcell. In fact, Dr. Rouchard argues the opposite (See Neurochem Res (2017) 42:1919-1925).

      Thank you for pointing out the mistake. On page 16, in the first paragraph, reference 55 (now 58 in the revised version) was intended to refer to 'quickly produce dose-response curves with high confidence.' In the revision, we cited another paper reporting that hippocampal spikes were not reproduced in the rat IHK model: R. Klee, C. Brandt, K. Töllner, W. Löscher, Various modifications of the intrahippocampal kainate model of mesial temporal lobe epilepsy in rats fail to resolve the marked rat-to-mouse differences in type and frequency of spontaneous seizures in this model. Epilepsy Behav. 68, 129–140 (2017).

      (8) In the discussion, Levetiracetam is highlighted as an ASM that would not be detected in acute induced seizure models; the authors point out its lack of effect in MES and PTZ. However, LEV is effective in the 6Hz test (also an acute-induced seizure model). This should be stated.

      Thank you for the comments. We highlighted the discussion on LEV in the 'Application of Model to Testing Multiple Classes of ASMs...' section on page 14.

      (9) The results text indicates that 9 epileptic mice were used to test LEV and DZP. However, the individual data points illustrated in Figure 5B show N=8 mice. Please correct.

      Thank you for the comments. A total of nine epileptic mice were used to assess two drugs, with the animals being re-used as indicated in the schematic. A total of eight assessments were conducted for DZP with six mice and eight assessments for LEV with five mice. Each assessment included hourly ChR2 activations without an ASM and hourly ChR2 activations after ASM injection.

      (10) Figure 4D: Naïve mice are labeled as solid blue circles in the legend while the data points are solid blue triangles. Please correct.

      Thank you. We corrected the marker in Fig.4D.

      Reviewer 2 (Public Review):

      Weaknesses:

      (1) Although the figures provide excellent examples of individual electrographic seizures and compare induced seizures in epileptic and naïve animals, it is unclear which criteria were used to identify an actual seizure induced by the optogenetic stimulus, versus a hippocampal paroxysmal discharge (HPD), an "afterdischarge", an "electrophysiological epileptiform event" (EEE, Ref #36, D'Ambrosio et al., 2010 Epilepsy Currents), or a so-called "spike-wave-discharge" (SWD). Were HPDs or these other non-seizure events ever induced using stimulation in animals with IH-KA? A critical issue is that these other electrical events are not actual seizures, and it is unclear whether they were included in the column showing data on "electrographic afterdischarges" in Figure 5 for the studies on ASDs. This seems to be a problem in other areas of the paper, also.

      Thank you for pointing out the unclear definition of the seizures analyzed. We added sentences at the beginning of the Results section (page 3) to clarify the terminology we used. We analyzed animal behavior during evoked events, and a high percentage of induced electrographic events were accompanied by behavioral seizures with a Racine scale of three or above. We added Supplemental Figure S9, which shows behavioral seizure severity scores observed before and during ASM testing. We hope these changes address the reviewer’s concern and improve the clarity of the manuscript.

      (2) The differences between the optogenetically evoked seizures in IH-KA vs naïve mice are interpreted to be due to the "epileptogenesis" that had occurred, but the lesion from the KA-induced injury would be expected to cause differences in the electrically and behaviorally recorded seizures - even if epileptogenesis had not occurred. This is not adequately addressed.

      Thank you for the comments. IHK-injected mice had spontaneous tonic-clonic seizures before the start of optical stimulation, as shown in Figure S1.

      (3) The authors offer little mention of other research using animal models of TLE to screen ASDs, of which there are many published studies - many of them with other strengths and/or weaknesses. For example, although Grabenstatter and Dudek (2019, Epilepsia) used a version of the systemic KA model to obtain dose-response data on the effects of carbamazepine on spontaneous seizures, that work required use of KA-treated rats selected to have very high rates of spontaneous seizures, which requires careful and tedious selection of animals. The ETSP has published studies with an intra-amygdala kainic acid (IA-KA) model (West et al., 2022, Exp Neurol), where the authors claim that they can use spontaneous seizures to identify ASDs for DRE; however, their lack of a drug effect of carbamazepine may have been a false negative secondary to low seizure rates. The approach described in this paper may help with confounds caused by low or variable seizure rates. These types of issues should be discussed, along with others.

      We appreciate the reviewer’s insights. We added a discussion comparing our model with other existing models in the Discussion section (pages 15 and 16, 'Comparison to Other Seizure Models Used in Pharmacologic Screening' section). In an existing model investigating spontaneous tonic-clonic seizures (such as the intra-amygdala kainate injection model), the time investment is back-loaded, requiring two to three weeks per condition while counting spontaneous seizures, which may occur only once a day. In contrast, our model requires a front-loaded time investment. Once the animals are set up, we can test multiple drugs within a few weeks, providing significant time savings. Additionally, we did not pre-screen animals in our study. Existing models often pre-select mice with high rates of spontaneous seizures, whereas in our model, seizures can be induced even in animals with few spontaneous seizures. We believe that bypassing the need for pre-screening is a key advantage of our induced seizure model.

      (4) The outcome measure for testing LEV and DZP on seizures was essentially the fraction of unsuccessful or successful activations of seizures, where high ASD efficacy is based on showing that the optogenetic stimulation causes fewer seizures when the drug is present. The final outcome measure is thus a percentage, which would still lead to a large number of tests to be assured of adequate statistical power. Thus, there is a concern about whether this proposed approach will have high enough resolution to be more useful than conventional screening methods so that one can obtain actual dose-response data on ASDs.

      Thank you for the comments. In this revision, we added Supplemental Figure S9, showing the severity of behavioral seizures observed before and during ASM testing for each animal. We observed a reduction in behavioral seizure severity for each subject. We would like to explore using behavioral severity as an outcome measure in a follow-up study.

      (5) The authors state that this approach should be used to test for and discover new ASDs for DRE, and also used for various open/closed loop protocols with deep-brain stimulation; however, the paper does not actually discuss rigorously or critically the background literature on other published studies in these areas or how this approach will improve future research for a broader audience than the ETSP and CROs. Thus, it is not clear whether the utility will apply more widely and how extensive a readership will be attracted to this work.

      We appreciate the reviewer’s insights. We revised the manuscript to better emphasize the potential significance of our approach (page 15, second paragraph). The on-demand seizure model can be applied to address biologically and clinically relevant questions beyond its utility in drug screening. For example, crossing the Thy1-ChR2 mouse line with genetic epilepsy models, such as Scn1a mutants, could reveal how optogenetic stimulation differentially induces seizures in mutant versus non-mutant mice, providing insights into seizure generation and propagation in Dravet syndrome. Due to the cellular specificity of optogenetics, we also envision this approach being used to study circuit-specific mechanisms of seizure generation and propagation. Regarding drug-resistant epilepsy (DRE) and anti-seizure drug (ASD) screening, we agree with the reviewer that probing new classes of ASDs for DRE represents a critical goal. However, we believe that a full exploration of additional ASD classes and/or modeling DRE lies outside the scope of this manuscript, and we would like to explore it in a follow-up study.

      Reviewer 2 (Recommendations for the authors):

      (1) The authors should explain why 10 Hz was chosen as the stimulation frequency.

      Thank you for the comment. A frequency of 10 Hz was determined based on previous work using anesthetized animals prepared in an acute in vivo setting. To simplify the paper and avoid confusion, we did not include a discussion on how we determined the frequency. Instead, we added a detailed description of how we optimized the power in a flowchart format in Supplemental Figure S2. We hope this improves reproducibility.

      (2) After micro-injection of KA, morphological changes were observed in the hippocampus, but no comparison of Chr2 expression was made in naïve animals vs KA-injected animals. Presumably, the Thy1-Chr2 mouse expresses GFP in cells that express Chr2. Thus, it may be useful to show the expression of Chr2 in animals with hippocampal sclerosis. This may explain the lack of dramatic difference between stimulation parameters in naïve vs epileptic animals, as shown in supplemental Figure S2.

      Thank you for the suggestion. We added a fluorescence image of ChR2 expression in CA1, ipsilateral to the KA-injected site, in Fig. 2A.

      (3) The authors state that "During epileptogenesis, neural networks in the brain undergo various changes ranging from modification of membrane receptors to the formation of new synapses" and that these changes are critical for successful "on-demand" seizure induction. However, it is not clear or well-discussed whether changes in neuronal cell densities that occur during sclerosis are important for "on-demand" seizure induction as well. Also, the authors showed that naïve animals exhibit a kindling-like effect, but it was unclear whether a similar effect was present in epileptic animals (i.e. do stimulation thresholds to seizure induction change as the animal gets more induction stimulations)? If present, would the secondary kindling affect drug-testing studies (e.g., would the drug effect be different on induced seizure #2 vs induced seizure #20)?

      Thank you for the suggestion. Since this is an important aspect of the model, we would like to address the kindling effect, the secondary kindling effect, and histopathology in a longer-term setting (several weeks) in a follow-up study.

      (4) The authors show that in their model, LEV and DZP were both efficacious. The authors do not seem to mention that, over 25 years ago, LEV was originally missed in the standard ETSP screens; and, it was only discovered outside of the ETSP with the kindling model. The kindling model is now used to screen ASDs. The authors should consider adding this point to the Discussion. It remains unclear, however, if the author's screening strategy shows advantages over kindling and other such approaches in the field.

      Thank you for the suggestion. We added a discussion on LEV in the 'Application of Model to Testing Multiple Classes of ASMs...' section on page 14.

      (5) P8 paragraph 2. The authors state values for naïve animals, but they should also provide values for epileptic animals since they state that the groups were not significantly different (p>0.05). It would be useful to show values for both and state the actual p-value from the test. This issue of stating mean/median values with SD and sample size should be addressed for all data throughout the paper. Additionally, Figure S2 should be added to the manuscript and discussed, as it has data that may be valuable for the reproducibility of the paper.

      Thank you for the suggestion. Figure S2 shows the threshold power required to induce electrographic activity for n = 10 epileptic animals (9.14 ± 4.75 mW) and n = 6 naïve animals (6.17 ± 1.58 mW) (Wilcoxon rank-sum test, p = 0.137). The threshold duration was comparable between the same epileptic animals (6.30 ± 1.64 s) and naïve animals (5.67 ± 1.03 s) (Wilcoxon rank-sum test, p = 0.7133). 

      (6) In addition to the other stated references on synaptic reorganization in the CA1 area, the authors should mention similar studies from Esclapez et al. (1999, J Comp Neurol).

      Thank you. We have included the reference in the revision.

      (7) All of the raw EEG data on the seizures should be accessible to the readers.

      Thank you for the suggestion. We will consider depositing EEG data in a publicly accessible site.

      Reviewer 3 (Public review):

      Weaknesses:

      (1) Evaluation of seizure similarity using the SVM modeling and clustering is not sufficiently explained to show if there are meaningful differences between induced and spontaneous seizures. SVM modeling did not include analysis to assess the overfitting of each classifier since mice were modeled individually for classification.

      Thank you for the comment. We made text edits to clarify the purpose of the SVM analysis. It was not intended to identify meaningful differences between induced and spontaneous seizures. Rather, it was used to classify EEG epochs as 'seizures' based on spontaneous seizures as the training set, demonstrating the gross similarity between induced and spontaneous seizures.

      (2) The difference between seizures and epileptiform discharges or trains of spikes (which are not seizures) is not made clear.

      Thank you for pointing out the unclear definition of the seizures analyzed. We added sentences at the beginning of the Results section (page 3) to clarify the terminology we used. We analyzed animal behavior during evoked events, and a high percentage of induced electrographic events were accompanied by behavioral seizures with a Racine scale of three or above. We added Supplemental Figure S9 to show the types of seizures observed before and during ASM testing. We hope these changes address the reviewer’s concern and improve the clarity of the manuscript.

      (3) The utility of increasing the number of seizures for enhancing statistical power is limited unless the sample size under evaluation is the number of seizures. However, the standard practice is for the sample size to be the number of mice.

      In this work, we used a linear mixed-effects model to address two levels of variability—between animals and within animals. The interactive linear mixed-effects model shows that most (~90%) of the variability in our data comes from within animals (residual), the random effect that the model accounts for, rather than between animals. Since variability between animals is low, the model identifies common changes in seizure propagation across animals while accounting for the variability in seizures within each animal. Therefore, the results we find reflect changes that occur across animals, not individual seizures. We made text edits to clarify the use of the linear mixed-effects model.
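
      To make the between-animal versus within-animal split concrete, here is a minimal sketch of a variance decomposition. It is not the authors' linear mixed-effects model: it uses a simpler method-of-moments estimate for a balanced one-way random-effects design, and the data are synthetic values generated only for illustration.

```python
import random
from statistics import mean

def variance_components(groups):
    """Method-of-moments estimates for a balanced one-way random-effects design.

    groups: list of equal-length lists, one list of measurements per animal.
    Returns (between_animal_variance, within_animal_variance).
    """
    k = len(groups)          # number of animals
    n = len(groups[0])       # measurements per animal
    if any(len(g) != n for g in groups):
        raise ValueError("balanced design required")
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    ms_between = n * sum((m - grand) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))
    # The between-animal estimate can come out negative by chance; clip at zero.
    var_between = max(0.0, (ms_between - ms_within) / n)
    return var_between, ms_within

# Synthetic example (hypothetical numbers): small animal-to-animal offsets,
# larger seizure-to-seizure scatter within each animal.
rng = random.Random(0)
animals = []
for _ in range(6):
    offset = rng.gauss(0.0, 0.3)
    animals.append([10.0 + offset + rng.gauss(0.0, 1.0) for _ in range(20)])

vb, vw = variance_components(animals)
print(f"within-animal share of variance: {vw / (vb + vw):.2f}")
```

      A within-animal share close to 1 corresponds to the situation described in the response, where seizure-to-seizure scatter dominates and differences between animals contribute little.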

      (4) Seizure burden is not easily tested.

      Thank you for the comment. We added Supplemental Figure S9 to summarize the severity of behavioral seizures before and during ASM testing. This addresses the reviewer’s comment on seizure burden. In a follow-up study, we would like to explore this type of outcome measure for drug screening.

      Reviewer 3 (Recommendations for the authors):

      (1) Provide a stronger rationale to use area CA1. For example, the authors mention that CA1 is active during seizure activity, but can seizures originate from CA1? That would make the approach logical and also explain why induced and spontaneous seizures are similar.

      Thank you for the comment. We discussed it in the Discussion section (page 14, first and second paragraphs).

      (2) Explain the use of SVM classifiers so it is more convincing that induced and spontaneous seizures are similar. Or, if they are not similar, explain that this is a limitation.

      We made text edits to clarify the purpose of the SVM analysis. It was not intended to identify meaningful differences between induced and spontaneous seizures. Rather, it was used to classify EEG epochs as 'seizures' based on spontaneous seizures as the training set, demonstrating the gross similarity between induced and spontaneous seizures.

      (3) If feasible, extend the duration over which seizure induction reliability is assessed so that the long-term utility of the model can be demonstrated.

      Thank you for the suggestion. We would like to assess long-term utility in a follow-up study.

      (4) The GitHub link is not yet active. The authors will be required to supply their relevant code for peer evaluation as well as publication.

      Thank you. The GitHub repository is now active.

      (5) State and assess the impacts of sex as a biological variable.

      Thank you for pointing this out. Both female and male animals were included in this study: Epileptic cohort: 7 males, 3 females; Naïve cohort: 3 males, 4 females.

    1. eLife Assessment

      This useful study characterises motor and somatosensory cortex neural activity during naturalistic eating and drinking tongue movement in nonhuman primates. The data, which include both electrophysiology and nerve block manipulations, will be of value to neuroscientists and neural engineers interested in tongue use. Although the current analyses provide a solid description of single neuron activity in these areas, both the population level analyses and the characterisation of activity changes following nerve block could be improved.

    2. Reviewer #1 (Public review):

      Summary:

      Hosack and Arce-McShane investigate how the 3D movement direction of the tongue is represented in the orofacial part of the sensory-motor cortex and how this representation changes with the loss of oral sensation. They examine the firing patterns of neurons in the orofacial parts of the primary motor cortex (MIo) and somatosensory cortex (SIo) in non-human primates (NHPs) during drinking and feeding tasks. While recording neural activity, they also tracked the kinematics of tongue movement using biplanar video-radiography of markers implanted in the tongue. Their findings indicate that many units in both MIo and SIo are directionally tuned during the drinking task. However, during the feeding task, directional tuning was more frequent in MIo units and less prominent in SIo units. Additionally, in some recording sessions, they blocked sensory feedback using bilateral nerve block injections, which seemed to result in fewer directionally tuned units and changes in the overall distribution of the preferred direction of the units.

      Strengths:

      The most significant strength of this paper lies in its unique combination of experimental tools. The authors utilized a video-radiography method to capture 3D kinematics of the tongue movement during two behavioral tasks while simultaneously recording activity from two brain areas. This specific dataset and experimental setup hold great potential for future research on the understudied orofacial segment of the sensory-motor area.

      Weaknesses:

      A substantial portion of the paper is dedicated to establishing directional tuning in individual neurons, followed by an analysis of how this tuning changes when sensory feedback is blocked. While such characterizations are valuable, particularly in less-studied motor cortical areas and behaviors, the discrepancies in tuning changes across the two NHPs, coupled with the overall exploratory nature of the study, render the interpretation of these subtle differences somewhat speculative. At the population level, both decoding analyses and state space trajectories from factor analysis indicate that movement direction (or spout location) is robustly represented. However, as with the single-cell findings, the nuanced differences in neural trajectories across reach directions and between baseline and sensory-block conditions remain largely descriptive. To move beyond this, model-based or hypothesis-driven approaches are needed to uncover mechanistic links between neural state space dynamics and behavior.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Hosack and Arce-McShane examines the directional tuning of neurons in macaque primary motor (MIo) and somatosensory (SIo) cortex. The neural basis of tongue control is far less studied than, for example, forelimb movements, partly because the tongue's kinematics and kinetics are difficult to measure. A major technical advantage of this study is using biplanar video-radiography, processed with modern motion tracking analysis software, to track the movement of the tongue inside the oral cavity. Compared to prior work, the behaviors are more naturalistic (feeding and licking water from one of three spouts), although the animals were still head-fixed.

      The study's main findings are that:

      • A majority of neurons in MIo and a (somewhat smaller) percentage of neurons in SIo modulated their firing rates during tongue movements, with different modulation depending on the direction of movement (i.e., exhibited directional tuning). Examining the statistics of tuning across neurons, there was anisotropy (e.g., more neurons preferring anterior movement) and a lateral bias in which tongue direction neurons preferred that was consistent with the innervation patterns of tongue control muscles (although with some inconsistency between monkeys).
      • Consistent with this encoding, tongue position could be decoded with moderate accuracy even from small ensembles of ~28 neurons.
      • There were differences observed in the proportion and extent of directional tuning between the feeding and licking behaviors, with stronger tuning overall during licking. This potentially suggests behavioral context-dependent encoding.
      • The authors then went one step further and used a bilateral nerve block to the sensory inputs (trigeminal nerve) from the tongue. This impaired the precision of tongue movements and resulted in an apparent reduction and change in neural tuning in MIo and SIo.

      Strengths:

      The data are difficult to obtain and appear to have been rigorously measured, and provide a valuable contribution to this under-explored subfield of sensorimotor neuroscience. The analyses adopt well-established methods especially from the arm motor control literature, and represent a natural starting point for characterizing tongue 3D direction tuning.

      Weaknesses:

      There are alternative explanations for some of the interpretations, but those interpretations are described in a way that clearly distinguishes results from interpretations, and readers can make their own assessments. Some of these limitations are described in more detail below.

      One weakness of the current study is that there is substantial variability in results between monkeys.

      This study focuses on describing directional tuning using the preferred direction (PD) / cosine tuning model popularized by Georgopoulos and colleagues for understanding neural control of arm reaching in the 1980s. This is a reasonable starting point and a decent first order description of neural tuning. However, the arm motor control field has moved far past that viewpoint, and in some ways an over-fixation on static representational encoding models and PDs held that field back for many years. The manuscript would benefit from drawing the readers' attention (perhaps in the Discussion) to the fact that PDs are a very simple starting point for characterizing how cortical activity relates to kinematics, but that there is likely much richer population-level dynamical structure and that a more mechanistic, control-focused analytical framework may be fruitful. A good review of this evolution in the arm field can be found in Vyas S, Golub MD, Sussillo D, Shenoy K. 2020. Computation Through Neural Population Dynamics. Annual Review of Neuroscience. 43(1):249-75. A revised version of the manuscript incorporates more population-level analyses, but with inconsistent use of quantifications/statistics and without sufficient contextualization of what the reader is to make of these results.
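
      For readers unfamiliar with the PD / cosine tuning model mentioned here, its standard textbook form is sketched below. The symbols are chosen for illustration only, and the exact tuning test used in the manuscript is not specified in this review.

```latex
% r(d): mean firing rate of a neuron for unit movement direction d (a 3D unit vector)
% b_0: baseline rate; b: fitted regression coefficients (a 3D vector)
% theta: angle between d and the neuron's preferred direction (PD)
r(\mathbf{d}) = b_0 + \mathbf{b}\cdot\mathbf{d}
              = b_0 + \lVert\mathbf{b}\rVert \cos\theta,
\qquad
\mathrm{PD} = \frac{\mathbf{b}}{\lVert\mathbf{b}\rVert}
```

      In this form the firing rate is highest when the movement direction coincides with the PD and falls off with the cosine of the angle away from it, which is why the model is a static, first-order description rather than an account of population dynamics.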

      The described changes in tuning after nerve block could also be explained by changes in kinematics between these conditions, which temper the interpretation of these interesting results.

      I am not convinced of the claim that tongue directional encoding fundamentally changes between drinking and feeding given the dramatically different kinematics and the involvement of other body parts like the jaw (e.g., the reference to Laurence-Chasen et al. 2023 just shows that there is tongue information independent of jaw kinematics, not that jaw movements don't affect these neurons' activities). I also find the nerve block results inconsistent (more tuning in one monkey, less in the other?) and difficult to really learn something fundamental from, besides that neural activity and behavior both change - in various ways - after nerve block (not at all surprising but still good to see measurements of).

      The manuscript states that "Our results suggest that the somatosensory cortex may be less involved than the motor areas during feeding, possibly because it is a more ingrained and stereotyped behavior as opposed to tongue protrusion or drinking tasks". An alternative explanation could be more statistical/technical in nature: that during feeding, there will be more variability in exactly what somatosensory afferent signals are being received from trial to trial (because slight differences in kinematics can have large differences in exactly where the tongue is and the where/when/how of what parts of it are touching other parts of the oral cavity). This variability could "smear out" the apparent tuning using these types of trial-averaged analyses. Given how important proprioception and somatosensation are for not biting the tongue or choking, the speculation that somatosensory cortical activity is suppressed during feeding is very counter-intuitive to this reviewer. In the revised manuscript the authors note these potential confounds and other limitations in the Discussion.

    4. Reviewer #3 (Public review):

      Summary

      In this study, the authors aim to uncover how 3D tongue direction is represented in the Motor (M1o) and Somatosensory (S1o) cortex. In non-human primates implanted with chronic electrode arrays, they use X-ray based imaging to track the kinematics of the tongue and jaw as the animal is either chewing food or licking from a spout. They then correlate the tongue kinematics with the recorded neural activity. They perform both single-unit and population level analyses during feeding and licking. Then, they recharacterize the tuning properties after bilateral lidocaine injections in the two sensory branches of the trigeminal nerve. They report that their nerve block causes a reorganization of the tuning properties and population trajectories. Overall, this paper concludes that M1o and S1o both contain representations of the tongue direction, but their numbers, their tuning properties and susceptibility to perturbed sensory input are different.

      Strengths

      The major strengths of this paper are in the state-of-the-art experimental methods employed to collect the electrophysiological and kinematic data. In the revision, the single-unit analyses of tuning direction are robustly characterized. The differences in neural correlations across behaviors, regions and perturbations are robust. In addition to the substantial amount of largely descriptive analyses, this paper makes two convincing arguments 1) The single-neuron correlates for feeding and licking in OSMCx are different - and can't be simply explained by different kinematics and 2) Blocking sensory input alters the neural processing during orofacial behaviors. The evidence for these claims is solid.

      Weaknesses

      The main weakness of this paper is in providing an account for these differences to get some insight into neural mechanisms. For example, while the authors show changes in neural tuning and different 'neural trajectory' shapes during feeding and drinking - their analyses of these differences are descriptive and provide limited insight for the underlying neural computations.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their paper, Hosack and Arce-McShane investigate how the 3D movement direction of the tongue is represented in the orofacial part of the sensory-motor cortex and how this representation changes with the loss of oral sensation. They examine the firing patterns of neurons in the orofacial parts of the primary motor cortex (MIo) and somatosensory cortex (SIo) in non-human primates (NHPs) during drinking and feeding tasks. While recording neural activity, they also tracked the kinematics of tongue movement using biplanar videoradiography of markers implanted in the tongue. Their findings indicate that most units in both MIo and SIo are directionally tuned during the drinking task. However, during the feeding task, directional tuning was more frequent in MIo units and less prominent in SIo units. Additionally, in some recording sessions, they blocked sensory feedback using bilateral nerve block injections, which resulted in fewer directionally tuned units and changes in the overall distribution of the preferred direction of the units.

      Strengths:

      The most significant strength of this paper lies in its unique combination of experimental tools. The author utilized a video-radiography method to capture 3D kinematics of the tongue movement during two behavioral tasks while simultaneously recording activity from two brain areas. Moreover, they employed a nerve-blocking procedure to halt sensory feedback. This specific dataset and experimental setup hold great potential for future research on the understudied orofacial segment of the sensory-motor area.

      Weaknesses:

      Aside from the last part of the Results section, the majority of the analyses in this paper are focused on single units. I understand the need to characterize the number of single units that directly code for external variables like movement direction, especially for less-studied areas like the orofacial part of the sensory-motor cortex. However, as a field, our decade-long experience in the arm region of sensory-motor cortices suggests that many of the idiosyncratic behaviors of single units can be better understood when the neural activity is studied at the level of the state space of the population. By doing so, for the arm region, we were able to explain why units have "mixed selectivity" for external variables, why the tuning of units changes in the planning and execution phase of the movement, why activity in the planning phase does not lead to undesired muscle activity, etc. See (Gallego et al. 2017; Vyas et al. 2020; Churchland and Shenoy 2024) for a review. Therefore, I believe investigating the dynamics of the population activity in orofacial regions can similarly help the reader go beyond the peculiarities of single units and, in a broader view, inform us if the same principles found in the arm region can be generalized to other segments of sensory-motor cortex.

      We thank and agree with the reviewer on the value of information gained from studying population activity. We also appreciate that population analyses have led to the understanding that individual neurons have “mixed selectivity”. We have shown previously that OSMCx neurons exhibit mixed selectivity in their population activity and clear separation between latent factors associated with gape and bite force levels (Arce-McShane FI, Sessle BJ, Ram Y, Ross CF, Hatsopoulos NG (2023) Multiple regions of primate orofacial sensorimotor cortex encode bite force and gape. Front Systems Neurosci. doi: 10.3389/fnsys.2023.1213279. PMID: 37808467 PMCID: 10556252), and chew-side and food types (Li Z & Arce-McShane FI (2023). Cortical representation of mastication in the primate orofacial sensorimotor cortex. Program No. NANO06.05. 2023 Neuroscience Meeting Planner. Washington, D.C.: Society for Neuroscience, 2023. Online.). 

      The primary goal of this paper was to characterize single units in the orofacial region, with a follow-up paper planned on population activity. In the revised manuscript, we have now incorporated the results of population-level analyses. The combined results of the single unit and population analyses provide a deeper understanding of the cortical representation of 3D direction of tongue movements during natural feeding and drinking behaviors.

      Further, for the nerve-blocking experiments, the authors demonstrate that the lack of sensory feedback severely alters how the movement is executed at the level of behavior and neural activity. However, I had a hard time interpreting these results since any change in neural activity after blocking the orofacial nerves could be due to either the lack of the sensory signal or, as the authors suggest, due to the NHPs executing a different movement to compensate for the lack of sensory information or the combination of both of these factors. Hence, it would be helpful to know if the authors have any hint in the data that can tease apart these factors. For example, analyzing a subset of nerve-blocked trials that have similar kinematics to the control.

      Thank you for bringing up this important point. We agree with the reviewer that any change in the neural activity may be attributed to the lack of sensory signal, to compensatory changes, or to a combination of these factors. To tease apart these factors, we sampled an equal number of trials with similar kinematics for both control and nerve block feeding sessions. We added a clarifying description of this approach in the Results section of the revised manuscript: “To confirm this effect was not merely due to altered kinematics, we conducted parallel analyses using carefully subsampled trials with matched kinematic profiles from both control and nerve-blocked conditions.”

      Furthermore, we ran an additional analysis for the drinking datasets by subsampling a similar distribution of drinking movements from each condition. We compared the neural data and the directional tuning across an equal number of trials with a similar left-right angle of movement in the last 100 ms of the tongue trajectory, nearest the spout. These analyses, which control for similar kinematics, showed that there was still a decrease in the proportion of directionally modulated neurons with nerve block compared to the control. This indicates that the results may be attributed to the lack of tactile information. These are now integrated in the revised paper under Methods section: Directional tuning of single neurons, as well as Results section: Effects of nerve block: Decreased directional tuning of MIo and SIo neurons and Figure 10 – figure supplement 1.
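
      The kinematic matching described above can be illustrated with a minimal sketch. It assumes each trial is summarized by a single left-right angle (in degrees) and that matching is done by taking equal numbers of trials per angle bin; the function name, the 10-degree bin width, and the seed are hypothetical choices for illustration, not the authors' actual procedure.

```python
import random
from collections import defaultdict


def match_trials_by_angle(control_angles, block_angles, bin_width=10.0, seed=0):
    """Return (control_idx, block_idx): trial indices with equal counts per angle bin.

    bin_width (degrees) and seed are illustrative assumptions.
    """
    rng = random.Random(seed)

    def bin_trials(angles):
        # Group trial indices by the angle bin they fall into.
        bins = defaultdict(list)
        for i, angle in enumerate(angles):
            bins[int(angle // bin_width)].append(i)
        return bins

    control_bins = bin_trials(control_angles)
    block_bins = bin_trials(block_angles)

    control_idx, block_idx = [], []
    # Only bins populated in both conditions can be matched.
    for b in sorted(set(control_bins) & set(block_bins)):
        n = min(len(control_bins[b]), len(block_bins[b]))
        control_idx += rng.sample(control_bins[b], n)
        block_idx += rng.sample(block_bins[b], n)
    return sorted(control_idx), sorted(block_idx)
```

      Tuning analyses would then be run only on the returned trial indices, so that both conditions contribute the same number of trials from each part of the angle distribution.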

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Hosack and Arce-McShane examines the directional tuning of neurons in macaque primary motor (MIo) and somatosensory (SIo) cortex. The neural basis of tongue control is far less studied than, for example, forelimb movements, partly because the tongue's kinematics and kinetics are difficult to measure. A major technical advantage of this study is using biplanar video-radiography, processed with modern motion tracking analysis software, to track the movement of the tongue inside the oral cavity. Compared to prior work, the behaviors are more naturalistic (feeding and licking water from one of three spouts), although the animals were still head-fixed.

      The study's main findings are that:

      • A majority of neurons in MIo and a (somewhat smaller) percentage of SIo modulated their firing rates during tongue movements, with different modulations depending on the direction of movement (i.e., exhibited directional tuning). Examining the statistics of tuning across neurons, there was anisotropy (e.g., more neurons preferring anterior movement) and a lateral bias in which tongue direction neurons preferred that was consistent with the innervation patterns of tongue control muscles (although with some inconsistency between monkeys).

      • Consistent with this encoding, tongue position could be decoded with moderate accuracy even from small ensembles of ~28 neurons.

      • There were differences observed in the proportion and extent of directional tuning between the feeding and licking behaviors, with stronger tuning overall during licking. This potentially suggests behavioral context-dependent encoding.

      • The authors then went one step further and used a bilateral nerve block to the sensory inputs (trigeminal nerve) from the tongue. This impaired the precision of tongue movements and resulted in an apparent reduction and change in neural tuning in MIo and SIo.

      Strengths:

      The data are difficult to obtain and appear to have been rigorously measured, and provide a valuable contribution to this under-explored subfield of sensorimotor neuroscience. The analyses adopt well-established methods, especially from the arm motor control literature, and represent a natural starting point for characterizing tongue 3D direction tuning.

      Weaknesses:

      There are alternative explanations for some of the interpretations, but those interpretations are described in a way that clearly distinguishes results from interpretations, and readers can make their own assessments. Some of these limitations are described in more detail below.

      One weakness of the current study is that there is substantial variability in results between monkeys, and that only one session of data per monkey/condition is analyzed (8 sessions total). This raises the concern that the results could be idiosyncratic. The Methods mention that other datasets were collected, but not analyzed because the imaging pre-processing is very labor-intensive. While I recognize that time is precious, I do think in this case the manuscript would be substantially strengthened by showing that the results are similar on other sessions.

      We acknowledge the reviewer’s concern about inter-subject variability. Animal feeding and drinking behaviors are quite stable across sessions; thus, we do not think that additional sessions will address the concern that the results could be idiosyncratic. Each of the eight datasets analyzed here has sufficient neural and kinematic data to capture neural and behavioral patterns. Nevertheless, we performed some of the analyses on a second feeding dataset from Monkey R. The results from analyses on a subset of this data were consistent across datasets; for example, (1) similar proportions of directionally tuned neurons, (2) similar distances between population trajectories (t-test p > 0.9), and (3) a consistently smaller distance between Anterior-Posterior pairs than others in MIo (t-test p < 0.05) but not SIo (p > 0.1).

      This study focuses on describing directional tuning using the preferred direction (PD) / cosine tuning model popularized by Georgopoulos and colleagues for understanding neural control of arm reaching in the 1980s. This is a reasonable starting point and a decent first-order description of neural tuning. However, the arm motor control field has moved far past that viewpoint, and in some ways, an over-fixation on static representational encoding models and PDs held that field back for many years. The manuscript benefits from drawing the readers' attention (perhaps in their Discussion) that PDs are a very simple starting point for characterizing how cortical activity relates to kinematics, but that there is likely much richer population-level dynamical structure and that a more mechanistic, control-focused analytical framework may be fruitful. A good review of this evolution in the arm field can be found in Vyas S, Golub MD, Sussillo D, Shenoy K. 2020. Computation Through Neural Population Dynamics. Annual Review of Neuroscience. 43(1):249-75

      Thank you for highlighting this important point. Research on orofacial movements hasn't progressed at the same pace as limb movement studies. Our manuscript focused specifically on characterizing the 3D directional tuning properties of individual neurons in the orofacial area—an analysis that has not been conducted previously for orofacial sensorimotor control. While we initially prioritized this individual neuron analysis, we recognize the value of broader population-level insights.

      Based on your helpful feedback, we have incorporated additional population analyses to provide a more comprehensive picture of orofacial sensorimotor control and expanded our discussion section. We appreciate your expertise in pushing our work to be more thorough and aligned with current neuroscience approaches.

      Can the authors explain (or at least speculate) why there was such a large difference in behavioral effect due to nerve block between the two monkeys (Figure 7)?

      We acknowledge this as a variable inherent to this type of experimentation. Previous studies have found large kinematic variation in the effect of oral nerve block as well as in the following compensatory strategies between subjects. Each animal’s biology and response to perturbation vary naturally. Indeed, our subjects exhibited different feeding behavior even in the absence of nerve block perturbation (see Figure 2 in Laurence-Chasen et al., 2022). This is why each individual serves as its own control.

      Do the analyses showing a decrease in tuning after nerve block take into account the changes (and sometimes reduction in variability) of the kinematics between these conditions? In other words, if you subsampled trials to have similar distributions of kinematics between Control and Block conditions, does the effect hold true? The extreme scenario to illustrate my concern is that if Block conditions resulted in all identical movements (which of course they don't), the tuning analysis would find no tuned neurons. The lack of change in decoding accuracy is another yellow flag that there may be a methodological explanation for the decreased tuning result.

      Thank you for bringing up this point. We accounted for the changes in the variability of the kinematics between the control and nerve block conditions in the feeding dataset, where we sampled an equal number of trials with similar kinematics for both control and nerve block. However, we did not control for similar kinematics in the drinking task. In the revised manuscript, we have clarified this and performed a similar analysis for the drinking task. We sampled a similar distribution of drinking movements from each condition. We compared the neural data from an equal number of trials with a similar left-right angle of movement in the last 100 ms of the tongue trajectory, nearest the spout. There was a decrease in the percentage of neurons that were directionally modulated (between 30 and 80%) with nerve block compared to the control. These results have been included in the revised paper under Methods section: Directional tuning of single neurons, as well as Results section: Effects of nerve block: Decreased directionality of MIo and SIo neurons.

      While the results from decoding using KNN did not show significant differences between decoding accuracies in control vs. nerve block conditions, the results from the additional factor analysis and decoding using LSTM were consistent with the decrease in directional tuning at the level of individual neurons.  

      The manuscript states that "Our results suggest that the somatosensory cortex may be less involved than the motor areas during feeding, possibly because it is a more ingrained and stereotyped behavior as opposed to tongue protrusion or drinking tasks". Could an alternative explanation be more statistical/technical in nature: that during feeding, there will be more variability in exactly what somatosensory afferent signals are being received from trial to trial (because slight differences in kinematics can have large differences in exactly where the tongue is and the where/when/how of what parts of it are touching other parts of the oral cavity)? This variability could "smear out" the apparent tuning using these types of trial-averaged analyses. Given how important proprioception and somatosensation are for not biting the tongue or choking, the speculation that somatosensory cortical activity is suppressed during feedback is very counter-intuitive to this reviewer.

      Thank you for bringing up this point. We have now incorporated this in our revised Discussion (see Comparison between MIo and SIo). We agree with the reviewer that trial-by-trial variability in the afferent signals may account for the lower directional signal in SIo during feeding than in drinking. Indeed, SIo’s mean-matched Fano factor in feeding was significantly higher than that in drinking (Author response image 1). Moreover, the results of the additional population and decoding analyses also support this.

      Author response image 1.

      Comparison of mean-matched Fano Factor between SIo neurons during feeding and drinking control tasks across both subjects (Wilcoxon rank sum test, p < 0.001).

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors aim to uncover how 3D tongue direction is represented in the Motor (M1o) and Somatosensory (S1o) cortex. In non-human primates implanted with chronic electrode arrays, they use X-ray-based imaging to track the kinematics of the tongue and jaw as the animal is either chewing food or licking from a spout. They then correlate the tongue kinematics with the recorded neural activity. Using linear regressions, they characterize the tuning properties and distributions of the recorded population during feeding and licking. Then, they recharacterize the tuning properties after bilateral lidocaine injections in the two sensory branches of the trigeminal nerve. They report that their nerve block causes a reorganization of the tuning properties. Overall, this paper concludes that M1o and S1o both contain representations of the tongue direction, but their numbers, their tuning properties, and susceptibility to perturbed sensory input are different.

      Strengths:

      The major strengths of this paper are in the state-of-the-art experimental methods employed to collect the electrophysiological and kinematic data.

      Weaknesses:

      However, this paper has a number of weaknesses in the analysis of this data.

      It is unclear how reliable the neural responses are to the stimuli. The trial-by-trial variability of the neural firing rates is not reported. Thus, it is unclear if the methods used for establishing that a neuron is modulated and tuned to a direction are susceptible to spurious correlations. The authors do not use shuffling or bootstrapping tests to determine the robustness of their fits or determining the 'preferred direction' of the neurons. This weakness colors the rest of the paper.

      Thank you for raising these points. We have performed the following additional analyses: (1) We have added analyses to ensure that the results could not be explained by neural variability. To show the trial-by-trial variability of the neural firing rates, we have calculated the Fano factor (mean overall = 1.34747; control = 1.46471; nerve block = 1.23023). The distribution was similar across directions, suggesting that responses of MIo and SIo neurons to varying 3D directions were reliable. (2) We have used a bootstrap procedure to ensure that directional tuning cannot be explained by mere chance. (3) To test the robustness of our PDs we also performed a bootstrap test, which yielded the same results for >90% of neurons, and a multiple linear regression test for fit to a cosine-tuning function. In the revised manuscript, the Methods and Results sections have been updated to include these analyses.  

      Author response image 2.

      Comparison of Fano Factor across directions for MIo and SIo Feeding Control (Kruskal-Wallis, p > 0.7).
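
      For readers unfamiliar with these checks, the following is a minimal sketch of a per-neuron Fano factor and a chance-level test for directional modulation. It makes several assumptions that are not taken from the manuscript: it computes the plain (not mean-matched) Fano factor, it uses a label-shuffle permutation test as a stand-in for the authors' bootstrap procedure, and the test statistic (the range of mean spike counts across directions) and all function names are hypothetical.

```python
import random
from statistics import mean, variance


def fano_factor(spike_counts):
    """Variance-to-mean ratio of spike counts across trials (plain, not mean-matched)."""
    m = mean(spike_counts)
    return variance(spike_counts) / m if m > 0 else float("nan")


def between_direction_spread(counts, labels):
    """Illustrative tuning statistic: range of the mean spike count across directions."""
    groups = {}
    for count, label in zip(counts, labels):
        groups.setdefault(label, []).append(count)
    means = [mean(g) for g in groups.values()]
    return max(means) - min(means)


def shuffle_p_value(counts, labels, n_shuffles=1000, seed=0):
    """Fraction of label shuffles whose spread is at least the observed spread."""
    rng = random.Random(seed)
    observed = between_direction_spread(counts, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break the pairing between direction and firing
        if between_direction_spread(counts, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)
```

      A small p-value from the shuffle test indicates that the observed difference across directions is unlikely to arise from trial-to-trial variability alone.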

      The authors compare the tuning properties during feeding to those during licking but only focus on the tongue-tip. However, the two behaviors are different also in their engagement of the jaw muscles. Thus many of the differences observed between the two 'tasks' might have very little to do with an alteration in the properties of the neural code - and more to do with the differences in the movements involved.

      Using the tongue tip for the kinematic analysis of tongue directional movements was a deliberate choice as the anterior region of the tongue is highly mobile and sensitive due to a higher density of mechanoreceptors. The tongue tip is the first region that touches the spout in the drinking task and moves the food into the oral cavity for chewing and subsequent swallowing. 

      We agree with the reviewer that the jaw muscles are engaged differently in feeding vs. drinking (see Fig. 2). For example, a wider variety of jaw movements along the three axes are observed in feeding compared to the smaller amplitude and mostly vertical jaw movements in drinking. Also, the tongue movements are very different between the two behaviors. In feeding, the tongue moves in varied directions to position the food between left-right tooth rows during chewing, whereas in the drinking task, the tongue moves to discrete locations to receive the juice reward. Moreover, the tongue-jaw coordination differs between tasks; maximum tongue protrusion coincides with maximum gape in drinking but with minimum gape in the feeding behavior. Thus, the different tongue and jaw movements required in each behavior may account for some of the differences observed in the directional tuning properties of individual neurons and population activity. These points have been included in the revised Discussion.

      Author response image 3.

      Tongue tip position (mm) and jaw pitch (degrees) during feeding (left) and drinking (right) behaviors. Most protruded tongue position coincides with minimum gape (jaw pitch at 0°) during feeding but with maximum gape during drinking.

      Many of the neurons are likely correlated with both jaw movements and tongue movements - this complicates the interpretations and raises the possibility that the differences in tuning properties across tasks are trivial.

      We thank the reviewer for raising this important point. In fact, we verified in a previous study whether the correlation between the tongue and jaw kinematics might explain differences in the encoding of tongue kinematics and shape in MIo (see Supplementary Fig. 4 in Laurence-Chasen et al., 2023): “Through iterative sampling of sub-regions of the test trials, we found that correlation of tongue kinematic variables with mandibular motion does not account for decoding accuracy. Even at times where tongue motion was completely un-correlated with the jaw, decoding accuracy could be quite high.”

      The results obtained from population analyses showing distinct properties of population trajectories in feeding vs. drinking behaviors provide strong support to the interpretation that directional information varies between these behaviors.

      The population analyses for decoding are rudimentary and provide very coarse estimates (left, center, or right), it is also unclear what the major takeaways from the population decoding analyses are. The reduced classification accuracy could very well be a consequence of linear models being unable to account for the complexity of feeding movements, while the licking movements are 'simpler' and thus are better accounted for.

      We thank the reviewer for raising this point. The population decoding analyses provide additional insight on the directional information in population activity, as well as a point of comparison with the results of numerous decoding studies on the arm region of the sensorimotor cortex. In the revised version, we have included the results from decoding tongue direction using a long short-term memory (LSTM) network for sequence-to-sequence decoding. These results differed from the KNN results, indicating that a linear model such as KNN was better for drinking and that a non-linear and continuous decoder was better suited for feeding. These results have been included in the revised manuscript.

      The nature of the nerve block and what sensory pathways are being affected is unclear - the trigeminal nerve contains many different sensory afferents - is there a characterization of how effectively the nerve impulses are being blocked? Have the authors confirmed or characterized the strength of their inactivation or block? I was unable to find any electrophysiological evidence characterizing the perturbation.

      The strength of the nerve block is characterized by a decrease in the baseline firing rate of SIo neurons, as shown in Supplementary Figure 6 of “Loss of oral sensation impairs feeding performance and consistency of tongue–jaw coordination” (Laurence-Chasen et al., 2022).

      Overall, while this paper provides a descriptive account of the observed neural correlations and their alteration by perturbation, a synthesis of the observed changes and some insight into neural processing of tongue kinematics would strengthen this paper.

      We thank the reviewer for this suggestion. We have revised the Discussion to provide a synthesis of the results and insights into the neural processing of tongue kinematics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The procedure for anesthesia explained in the method section was not clear to me. The following information was missing: what drug/dose was used? How long the animal was under anesthesia? How long after the recovery the experiments were done?

      The animals were fully sedated with ketamine (100 mg/ml, 10 mg/kg) for less than 30 minutes, and all of the data was collected within 90 minutes after the nerve block was administered.

      (2) In Figure 10, panels A and B are very close together, it was not at first clear whether the text "Monkey R, Monkey Y" belongs to panel A or B.

      We have separated the two panels further in the revised figure.

      (3) I found Figure 11 very busy and hard to interpret. Separating monkeys, fitting the line for each condition, or using a bar plot can help with the readability of the figure.

      Thank you for the suggestion. We agree with you and have reworked this figure. To simplify it we have shown the mean accuracy across iterations.

      (4) I found the laterality discussions like "This signifies that there are more neurons in the left hemisphere contributes toward one direction of tongue movement, suggesting that there is some laterality in the PDs of OSMCx neurons that varies between individuals" a bit of an over-interpretation of the data, given the low n value and the dissimilarity in how strongly the nerve blocking altered the monkeys' behavior.

      Thank you for sharing this viewpoint. We do think that laterality is a good point of comparison with studies on M1 neurons in the arm/hand region. In our study, we found that the peak of the PD distribution coincides with leftward tongue movements in feeding. The distribution of PDs provides insight into how tongue muscles are coordinated during movement. Intrinsic and extrinsic tongue muscles are involved in shaping the tongue (e.g., elongation, broadening) and positioning the tongue (e.g., protrusion/retraction, elevation/depression), respectively. These muscles receive bilateral motor innervation except for genioglossus. Straight tongue protrusion requires the balanced action of the right and left genioglossi while the lateral protrusion involves primarily the contralateral genioglossus. Given this unilateral innervation pattern, we hypothesized that left MIo/SIo neurons would preferentially respond to leftward tongue movements, corresponding to right genioglossus activation. 

      Reviewer #2 (Recommendations for the authors):

      Is the observation that tuning peaks occur most frequently toward the anterior and superior directions consistent with the statistics of the movements the tongue typically makes? This could be analogous to anisotropies previously reported in the arm literature, e.g., Lillicrap TP, Scott SH. 2013. Preference Distributions of Primary Motor Cortex Neurons Reflect Control Solutions Optimized for Limb Biomechanics. Neuron. 77(1):168-79

      Thank you for bringing our attention to analogous findings by Lillicrap & Scott, 2013. Indeed, we do observe the highest number of movements in the Anterior Superior directions, followed by the Posterior Inferior. This does align with the distribution of tuning peaks that we observed. Author response image 4 shows the proportions of observed movements in each group of directions across all feeding datasets. We have incorporated this data in the Results section: Neuronal modulation patterns differ between MIo and SIo, as well as added this point in the Discussion.

      Author response image 4.

      Proportion of feeding trials in each group of directions. Error bars represent ±1 standard deviation across datasets (n = 4).

      "The Euclidean distance was used to identify nearest neighbors, and the number of nearest neighbors used was K = 7. This K value was determined after testing different Ks which yielded comparable results." In general, it's a decoding best practice to tune hyperparameters (like K) on fully held-out data from the data used for evaluation. Otherwise, this tends to slightly inflate performance because one picks the hyperparameter that happened to give the best result. It sounds like that held-out validation set wasn't used here. I don't think that's going to change the results much at all (especially given the "comparable results" comment), but providing this suggestion for the future. If the authors replicate results on other datasets, I suggest they keep K = 7 to lock in the method.

      K = 7 was chosen based on the size of our smallest training dataset (n = 55). The purpose of testing different K values was not to select which value gave the best result, but to demonstrate that similar K values did not affect the results significantly. We tested the different K values on a subset of the feeding data, but that data was not fully held-out from the training set. We will keep your suggestion in mind for future analysis.
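
      The reviewer's suggestion about held-out hyperparameter selection can be sketched as follows. This is a generic K-nearest-neighbors classifier with Euclidean distance, written with the Python standard library only; the function names, the candidate K values, and the three-way train/validation/test split are assumptions for illustration and do not reproduce the authors' decoder.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=7):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbors = sorted(
        ((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)),
        key=lambda pair: pair[0],  # sort by Euclidean distance only
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, eval_X, eval_y, k):
    """Fraction of evaluation points whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(eval_X, eval_y)
    )
    return correct / len(eval_y)


def choose_k(train_X, train_y, val_X, val_y, candidates=(3, 5, 7, 9)):
    """Pick k on a validation set that is disjoint from both training and test data."""
    return max(candidates, key=lambda k: accuracy(train_X, train_y, val_X, val_y, k))
```

      Under this scheme, K is chosen once with `choose_k` on the validation split and the reported accuracy is computed on a separate test split that played no role in the choice.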

      The smoothing applied to Figure 2 PSTHs appears perhaps excessive (i.e., it may be obscuring interesting finer-grained details of these fast movements). Can the authors reduce the 50 ms Gaussian smoothing (I assume this is the s.d.)? ~25 ms is often used in studying arm kinematics. It also looks like the movement-related modulation may not be finished in these 200 ms / 500 ms windows. I suggest extending the shown time window. It would also be helpful to show some trial-averaged behavior (e.g. speed or % displacement from start) under or behind the PSTHs, to give a sense of what phase of the movement the neural activity corresponds to.

      Thank you for the suggestion. We have taken your suggestions into consideration and modified Figure 2 accordingly. We decreased the Gaussian kernel to 25 ms and extended the time window shown. The trial-averaged anterior/posterior displacement was also added to the drinking PSTHs.
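
      As a rough illustration of the smoothing change, the sketch below convolves a binned firing-rate trace with a normalized Gaussian kernel whose width is given as a standard deviation in milliseconds. It assumes that the 25 ms value refers to the kernel's standard deviation, that bins are 1 ms wide, and that the kernel is truncated at three standard deviations; the function names and edge handling are hypothetical and may differ from what was used for Figure 2.

```python
import math


def gaussian_kernel(sd_ms, bin_ms=1.0, n_sd=3):
    """Normalized Gaussian kernel truncated at n_sd standard deviations."""
    half = int(round(n_sd * sd_ms / bin_ms))
    weights = [
        math.exp(-0.5 * ((i * bin_ms) / sd_ms) ** 2) for i in range(-half, half + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_psth(rates, sd_ms=25.0, bin_ms=1.0):
    """Smooth a binned firing-rate trace, renormalizing the kernel at the edges."""
    kernel = gaussian_kernel(sd_ms, bin_ms)
    half = len(kernel) // 2
    smoothed = []
    for t in range(len(rates)):
        acc, norm = 0.0, 0.0
        for j, w in enumerate(kernel):
            idx = t + j - half
            if 0 <= idx < len(rates):  # skip kernel taps that fall outside the trace
                acc += w * rates[idx]
                norm += w
        smoothed.append(acc / norm)
    return smoothed
```

      A narrower kernel preserves faster modulations at the cost of a noisier trace, which is the trade-off the reviewer raises.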

      Reviewer #3 (Recommendations for the authors):

      The major consideration here is that the data reported for feeding appears to be very similar to that reported in a previous study:

      "Robust cortical encoding of 3D tongue shape during feeding in macaques"

      Are the neurons reported here the same as the ones used in this previous paper? It is deeply concerning that this is not reported anywhere in the methods section.

      These are the same neurons as in our previous paper, though here we include several additional datasets of the nerve block and drinking sessions. We have now included this in the methods section.

      Second, I strongly recommend that the authors consider a thorough rewrite of this manuscript and improve the presentation of the figures. As written, it was not easy to follow the paper, the logic of the experiments, or the specific data being presented in the figures.

      Thank you for this suggestion. We have done an extensive rewrite of the manuscript and revision of the figures.

      A few recommendations:

      (1) Please structure your results sections and use descriptive topic sentences to focus the reader. In the current version, it is unclear what the major point being conveyed for each analysis is.

      Thank you for this suggestion. We have added topic sentences to begin each section of the results.

      (2) Please show raster plots for at least a few example neurons so that the readers have a sense of what the neural responses look like across trials. Is all of Figure 2 one example neuron or are they different neurons? Error bars for PETH would be useful to show the reliability and robustness of the tuning.

      Figure 2 shows different neurons, one from MIo and one from SIo for each task. There is shading showing ±1 standard error around the line for each direction, however this was a bit difficult to see. In addition to the other changes we have made to these figures, we made the lines smaller and darkened the error bar shading to accentuate this. We also added raster plots corresponding to the same neurons represented in Figure 2 as a supplement.

      (3) Since there are only two data points, I am not sure I understand why the authors have bar graphs and error bars for graphs such as Figure 3B, Figure 5B, etc. How can one have an error bar and means with just 2 data points?

      Those bars represent the standard error of the proportion. We have changed the y-axis label on these figures to make this clearer.
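
      For reference, a standard error can be attached to a single proportion without multiple data points. The sketch below assumes the usual binomial formula, sqrt(p(1 - p)/n), where n is the number of neurons in the sample; the response does not state which formula was used, and the function name is hypothetical.

```python
import math


def proportion_se(n_tuned, n_total):
    """Return (p, se): a proportion and its binomial standard error sqrt(p(1-p)/n)."""
    p = n_tuned / n_total
    return p, math.sqrt(p * (1 - p) / n_total)
```

      For example, 50 tuned neurons out of 100 recorded gives p = 0.5 with a standard error of 0.05.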

      (4) Results in Figure 6 could be due to differential placement of the electrodes across the animals. How is this being accounted for?

      Yes, this is a possibility which we have mentioned in the discussion. Even with careful placement there is no guarantee to capture a set of neurons with the exact same function in two subjects, as every individual is different. Rather we focus on analyses of data within the same animal. The purpose of Figure 6 is to show the difference between MIo and SIo, and between the two tasks, within the same subject. The more salient result from calculating the preferred direction is that there is a change in the distribution between control and nerve block within the same exact population. Discussions relating to the comparison between individuals are speculative and cannot be confirmed without the inclusion of many more subjects.

      (5) For Figure 7, I would recommend showing the results of the Sham injection in the same figure instead of a supplement.

      Thank you for the suggestion, we have added these results to the figure.

      (6) I think the effects of the sensory block on the tongue kinematics are underexplored in Figure 7 and Figure 8. The authors could explore the deficits in tongue shape, and the temporal components of the trajectory.

      Some of these effects on feeding have been explored in a previous paper, Laurence-Chasen et al., 2022. We performed some additional analyses on changes to kinematics during drinking, including the number of licks per 10 second trial and the length of individual licks. The results of these are included below. We also calculated the difference in the speed of tongue movement during drinking, which generally decreased and exhibited an increase in variance with nerve block (f-test, p < 0.001). However, we have not included these figures in the main paper as they do not inform us about directionality.

      Author response image 5.

      Left halves of hemi-violins (black) are control and right halves (red) are nerve block for an individual. Horizontal black lines represent the mean and horizontal red lines the median. Results of two-tailed t-test and f-test are indicated by asterisks and crosses, respectively: *,† p < 0.05; **,†† p < 0.01; ***,††† p < 0.001.

      (9) In Figures 9 and 10. Are the same neurons being recorded before and after the nerve block? It is unclear if the overall "population" properties are different, or if the properties of individual neurons are changing due to the nerve block.

      Yes, the same neurons are being recorded before and after nerve block. Specifically, Figure 9B shows that the properties of many individual neurons do change due to the nerve block. Differences in the overall population response may be attributed to some of the units having reduced/no activity during the nerve block session.

      Additionally, I recommend that the authors improve their introduction and provide more context to their discussion. Please elaborate on what you think are the main conceptual advances in your study, and place them in the context of the existing literature. By my count, there are 26 citations in this paper, 4 of which are self-citations - clearly, this can be improved upon.

      Thank you for this suggestion. We have done an extensive rewrite of the Introduction and Discussion. We discussed the main conceptual advances in our study and place them in the context of the existing literature.

    1. eLife Assessment

      This useful manuscript reports on a new mouse model for LAMA2-MD, a rare but very severe congenital muscular dystrophy. The knockout mice were generated by removing exon3 in the Lama2 gene, which results in a frameshift in exon4 and a premature stop codon. These animals lack any laminin-alpha2 protein and confirm results from previous Lama2 knockout models. Additionally, this study includes weak transcriptomics data that might be a good resource for the field. However, experimental evidence, methods, and data analyses supporting the main claims of the manuscript are incomplete.

    2. Reviewer #1 (Public review):

      Strengths:

      This work adds another mouse model for LAMA2-MD that re-iterates the phenotype of previously published models, such as dy3K/dy3K, dy/dy, and dyW/dyW mice. The phenotype is fully consistent with the data from others.

      One of the major weaknesses of the manuscript initially submitted was the overinterpretation and the overstatements. The revised version is clearly improved as the authors toned-down their interpretation and now also cite the relevant literature of previous work.

      Comments on revisions:

      This is the second revision of a paper focusing on the generation of a CRISPR/Cas9-engineered mouse model for LAMA2-MD. I have reviewed the initial submission, the first revision, and now this second revision. While there have been improvements, several issues still need to be addressed by the authors. I will outline these points without dividing them into major and minor categories:

      Introduction:

      The statement regarding existing mouse models requires correction: The claim, "They were established in the pre-gene therapy era, leaving trace of engineering, such as bacterial elements in the Lama2 gene locus, thus unsuitable for testing various gene therapy strategies," is inaccurate. Current mouse models can indeed be used for testing gene therapy strategies, regardless of whether they contain elements in the Lama2 locus. The primary consideration is whether or not they express laminin-alpha2. Please revise this statement.

      Results Section:

      scRNA-seq:

      The authors note that they analyzed "a total of 8,111 cells from the dyH/dyH mouse brain and 8,127 cells from the WT mouse brain were captured using the 10X Genomics platform (Figure supplement 4A, B)." This is too few cells to support firm conclusions. Furthermore, there is a discrepancy in the referred figure S4, which indicates that 10,094 cells were analyzed for dyH/dyH mice and 10,496 for wild-type mice. Please correct this inconsistency.

      Figure 5C displays differences in cell populations between wild-type and dyH/dyH mice. Given the low number of cells analyzed and the lack of replicates, these differences cannot be considered reliable. More samples should be analyzed to support these findings.

      The data suggest a defect in the BBB for dyH/dyH mice, but this conclusion is based on minimal cell counts and remains purely correlative. If BBB issues exist, experimental validation is necessary, such as injecting dyes into the bloodstream to detect any leakage. I have previously highlighted this in my comments on earlier manuscript versions.

      Bulk RNA-seq:

      The number of samples analyzed here is substantial, making the data potentially more robust. These data could serve as a valuable resource for other researchers. However, it is important to note that all data are correlative and do not provide functional insights.

      Overall:

      The manuscript still lacks significant insights, partly because existing mouse models for LAMA2-MD have been extensively analyzed. While the bulk RNA-seq data offer some value as a resource, I recommend that the authors re-assess their writing and further temper their interpretations of the findings.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      This work adds another mouse model for LAMA2-MD that re-iterates the phenotype of previously published models, such as dy3K/dy3K, dy/dy, and dyW/dyW mice. The phenotype is fully consistent with the data from others.

      Thank you for the valuable comments and good suggestions you have proposed, and we have added information and analysis of another mouse model for LAMA2-MD in the updated version 2 of this manuscript.

      One of the major weaknesses of the manuscript initially submitted was the overinterpretation and the overstatements. The revised version is clearly improved as the authors toned-down their interpretation and now also cite the relevant literature of previous work.

      Thank you for the good comments you have proposed, and we have carefully corrected the overinterpretation and overstatements in the previous updated version.

      Unfortunately, the data on RNA-seq and scRNA-seq are still rather weak. scRNA-seq was conducted with only one mouse resulting in only 8000 nuclei. I am not convinced that the data allow us to interpret them to the extent of the authors. Similar to the first version, the authors infer function by examining expression. Although they are a bit more cautious, they still argue that the BBB is not functional in dy<sup>H</sup>/dy<sup>H</sup> mice without showing leakiness. Such experiments can be done using dyes, such as Evans-blue or Cadaverin. Hence, I would suggest that they formulate the text still more carefully.

      Thank you for the valuable suggestions. We agree that functional experiments, such as Evans blue or cadaverine injections, should be performed to confirm the impaired BBB. However, these experiments have not been done because the first author has been working in the clinic. Instead, we have added a "Limitations" section, which states: "Even though RNA-seq and scRNA-seq have been performed, the data of scRNA-seq are still insufficient due to the limited number of mouse brains. This study has provided potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD, however, some related functional experiments have not been further performed".

      A similar lack of evidence is true for the suggested cobblestone-like lissencephaly of the mice. There is no strong evidence that this is indeed occurring in the mice (might also be a problem because mice die early). Hence, the conclusions need to be formulated in such a way that readers understand that these are interpretations and not facts.

      Thank you for the valuable suggestions. We agree with this comment, and have added a statement to the Limitations: "This study has provided potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD, however, some related functional experiments have not been further performed". Also, regarding the cobblestone-like lissencephaly, which has been shown in LAMA2-CMD patients but was not found in the mouse model, we have added the following discussion: "Though the cortical malformations were not found in the dy<sup>H</sup>/dy<sup>H</sup> brains by MRI analysis probably due to the small volume in within 1 month old, Thus, the changes in transcriptomes and protein levels provided potentially useful data for the hypothesis of the impaired gliovascular basal lamina of the BBB, which might be associated with occipital pachygyria in LAMA2-CMD patients."

      Finally, I am surprised that the only improvement in the main figures is the Western blot for laminin-alpha2. The histology of skeletal muscle still looks rather poor. I do not know what the problems are but suggest that the authors try to make sections from fresh-frozen tissue. I anticipate that the mice were eventually perfused with PFA before muscles were isolated. This often results in the big gaps in the sections.

      Thank you for the valuable suggestions. We agree with this comment and that sections should be made from fresh-frozen tissue. We have therefore added a statement to the Limitations: "Moreover, due to making sections with PFA before muscles isolated, and not from fresh-frozen tissue, there have been big gaps in the sections which do affect the histology of skeletal muscle to some extent."

      Overall, the work is improved but still would need additional experiments to make it really an important addition to the literature in the LAMA-MD field.

      Thank you for all your good comments and the valuable suggestions.

      Reviewer #2 (Public Review):

      This revised manuscript describes the production of a mouse model for LAMA2-Related Muscular Dystrophy. The authors investigate changes in transcripts within the brain and blood barrier. The authors also investigate changes in the transcriptome associated with the muscle cytoskeleton. Strengths: (1) The authors produced a mouse model of LAMA2-CMD using CRISPR-Cas9. (2) The authors identify cellular changes that disrupted the blood-brain barrier.

      Thank you for your good comments.

      Weaknesses:

      The authors throughout the manuscript overstate "discoveries" which have been previously described, published and not appropriately cited.

      Thank you for your great suggestion. We have toned-down the interpretations and overstatements throughout the manuscript, and added words such as "potentially", "possible", "some potential clues", "was speculated to probably", and so on.

      Alterations in the blood-brain barrier and in the muscle cell cytoskeleton in LAMA2-CMD have been extensively studied and published in the literature and are not cited appropriately.

      Thank you for your great suggestion. We agree that alterations in the muscle cell cytoskeleton in LAMA2-CMD have been extensively studied and published, and the related literature has been cited in the updated version 2.0. However, alterations in the blood-brain barrier in LAMA2-CMD have not been extensively studied; only a few papers (such as PMID: 25392494, PMID: 32792907) have investigated or discussed this issue.

      The authors have increased the animal number to N=6, but based on power analysis this is still insufficient, which may result in statistical errors and incorrect conclusions.

      Thank you for your great suggestion. We do agree that the animal number should be increased for Power analysis, and we have added statements in the Limitations with "Finally, due to the limited number of animal samples for the Power analysis, the statistical errors and conclusions might be affected."

      The use of "novel mouse model" in the manuscript overstates the impact of the study.

      Thank you for your great suggestion. We have changed the statement "novel mouse model" throughout the manuscript except the title.

      All studies presented are descriptive and do not add more to the field except for producing yet another mouse model of LAMA2-CMD that is the same as all the others produced.

      Thank you for your comment. We do agree that further functional experiments have not been performed to reveal and confirm the pathogenesis. However, the analysis of phenotype was systematic and comprehensive, including survival time, motor function, serum CK, muscle MRI, muscle histopathology in different stages, and brain histopathology. Moreover, RNA-seq and scRNA-seq in LAMA2-CMD have been seldom performed before, and the data in this study could provide potentially important information for the molecular pathogenetic mechanisms of muscular dystrophy and brain dysfunction for LAMA2-CMD.

      Grip strength measurements are considered error prone and do not give an accurate measurement of muscle strength, which is better achieved using ex vivo or in vivo muscle contractility studies.

      Thank you for your great suggestion. We agree that grip strength measurements are considered error prone and do not give an accurate measurement of muscle strength. We have added a related statement to the Limitations: "Grip strength measurements used in this study are considered error prone and do not give an accurate measurement of muscle strength, which would be better achieved using ex vivo or in vivo muscle contractility studies."

      A lack of blinded studies, as pointed out by the authors, is a concern for the scientific rigor of the study.

      Thank you for your great suggestion. Those scoring the outcome measures were not blinded to the groups. In practice, it was very easy to distinguish the dy<sup>H</sup>/dy<sup>H</sup> mice from the WT/Het mice, because the dy<sup>H</sup>/dy<sup>H</sup> mice were much smaller than the other groups from as early as P7.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      There are multiple grammatical errors throughout the manuscript which should be corrected.

      Thank you for your recommendation. We have carefully corrected the grammatical errors within the manuscript.

      The authors mention no changes in intestinal muscles, but it is unclear if they are referring to skeletal or smooth muscle.

      Thank you for your good comment. The intestinal muscles that showed no changes in this study are smooth muscle, and we have changed the description to "intestinal smooth muscles".

    1. eLife Assessment

      The authors present useful findings on the use of a single-fly behavioral paradigm for assessing different Drosophila genetic models of neurodegeneration. The experimental design and analyses are solid and can be used for quick behavioral assessment in fly models of various neurodegenerative diseases, especially those having an impact on locomotion. The work will be of interest to Drosophila biologists using behavior as a readout for their studies.

    2. Reviewer #1 (Public review):

      Translating discoveries from model organisms to humans is often challenging, especially in neuropsychiatric diseases, due to the vast gaps in the circuit complexities and cognitive capabilities. Kajtor et al. propose to bridge this gap in the fly models of Parkinson's disease (PD) by developing a new behavioural assay where flies respond to a moving shadow by modifying their locomotor activities. The authors believe the flies' response to the shadow approximates their escape response to an approaching predator. To validate this argument, they tested several PD-relevant transgenic fly lines and showed that some of them indeed have altered responses in their assay.

      Strengths:

      This single-fly-based assay is easy and inexpensive to set up, scalable and provides sensitive, quantitative estimates to probe flies' optomotor acuity. The behavioural data is detailed, and the analysis parameters are well-explained.

      Weaknesses:

      The authors have yet to link cellular physiology to behaviour. It will be interesting to see how future use of this assay helps uncover connections between cellular pathology and behavioural changes.

    3. Reviewer #2 (Public review):

      The manifestation and progression of neurodegenerative disorders are poorly understood. Many of the neuronal disorders start by presenting subtle changes in neuronal circuits, and quantification and measurement of these subtle behavior responses could help one delineate the mechanisms involved. The present study very nicely uses the flies' behavioral response to predator-mimicking passing shadows to measure subtle changes in their behavior. The data from various fly genetic models of Parkinson's disease support their claim. This single trial method is useful to capture the individual animal's response to the threatening stimuli but stops short of capturing the fine ambulatory responses which could provide further information on an individual's behavioral response. By capturing the fine features, the authors could get detailed observations, such as posture, gait or wing positioning, for a better understanding of the behavioral response to the passing shadow.

    4. Author response:

      The following is the authors’ response to the original reviews

      We thank the Reviewers for their constructive comments and the Editor for the possibility to address the Reviewers’ points in this rebuttal. We 

      (1) Conducted new experiments with NP6510-Gal4 and TH-Gal4 lines to address potential behavioral differences due to targeting dopaminergic vs. both dopaminergic and serotonergic neurons

      (2) Conducted novel data analyses to emphasize the strength of sampling distributions of behavioral parameters across trials and individual flies

      (3) Provided Supplementary Movies

      (4) Calculated additional statistics

      (5) Edited and added text to address all points of the Reviewers.

      Please see our point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Translating discoveries from model organisms to humans is often challenging, especially in neuropsychiatric diseases, due to the vast gaps in the circuit complexities and cognitive capabilities. Kajtor et al. propose to bridge this gap in the fly models of Parkinson's disease (PD) by developing a new behavioral assay where flies respond to a moving shadow by modifying their locomotor activities. The authors believe the flies' response to the shadow approximates their escape response to an approaching predator. To validate this argument, they tested several PD-relevant transgenic fly lines and showed that some of them indeed have altered responses in their assay.

      Strengths:

      This single-fly-based assay is easy and inexpensive to set up, scalable, and provides sensitive, quantitative estimates to probe flies' optomotor acuity. The behavioral data is detailed, and the analysis parameters are well-explained.

      We thank the Reviewer for the positive assessment of our study.

      Weaknesses:

      While the abstract promises to give us an assay to accelerate fly-to-human translation, the authors need to provide evidence to show that this is indeed the case. They have used PD lines extensively characterized by other groups, often with cheaper and easier-to-setup assays like negative geotaxis, and do not offer any new insights into them. The conceptual leap from a low-level behavioral phenotype, e.g. changes in walking speed, to recapitulating human PD progression is enormous, and the paper does not make any attempt to bridge it. It needs to be clarified how this assay provides a new understanding of the fly PD models, as the authors do not explore the cellular/circuit basis of the phenotypes. Similarly, they have assumed that the behavior they are looking at is an escape-from-predator response modulated by the central complex- is there any evidence to support these assumptions? Because of their rather superficial approach, the paper does not go beyond providing us with a collection of interesting but preliminary observations.

      We thank the Reviewer for pointing out some limitations of our study. We would like to emphasize that what we perceive as the main advantage of performing single-fly and single-trial analyses is the access to rich data distributions that provide more fine-scale information compared to bulk assays. We think that this is exactly going one step closer to ‘bridging the enormous conceptual leap from a low-level behavioral phenotype, e.g. changes in walking speed, to recapitulating human PD progression’, and we showcase this in our study by comparing the distributions over the entire repertoire of behavioral responses across fly mutants. Nevertheless, we agree with the Reviewer that many more steps in this direction are needed to improve translatability. Therefore, we toned down the corresponding statements in the Abstract and in the Introduction. Moreover, to further emphasize the strength of sampling distributions of behavioral parameters across trials and individual flies, we complemented our comparisons of central tendencies with testing for potential differences in data dispersion, demonstrated in the novel Supplementary Figure S4.

      Looming stimuli have been used to characterize flies’ escape behaviors. These studies uncovered a surprisingly rich behavioral repertoire (Zacarias et al., 2018), which was modulated by both sensory and motor context, e.g. walking speed at time of stimulus presentation (Card and Dickinson, 2008; Oram and Card, 2022; Zacarias et al., 2018). The neural basis of these behaviors was also investigated, revealing loom-sensitive neurons in the optic lobe and the giant fiber escape pathway (Ache et al., 2019; de Vries and Clandinin, 2012). Although less frequently, passing shadows were also employed as threat-inducing stimuli in flies (Gibson et al., 2015). We opted for this variant of the stimulus so that we could ensure that the shadow reached the same coordinates in all linear tracks concurrently, aiding data analysis and scalability. Similar to the cited study, we found the same behavioral repertoire as in studies with looming stimuli, with an equivalent dependence on walking speed, confirming that looming stimuli and passing shadows can both be considered as threat-inducing visual stimuli. We added a discussion on this topic to the main text.

      Reviewer #2 (Public Review):

      In this study, Kajtor et al. investigated the use of a single-animal trial-based behavioral assay for the assessment of subtle changes in the locomotor behavior of different genetic models of Parkinson's disease of Drosophila. Different genotypes used in this study were Ddc-GAL4>UAS-Parkin-275W and UAS-α-Syn-A53T. The authors measured Drosophila's response to predator-mimicking passing shadow as a threatening stimulus. Along with these, various dopamine (DA) receptor mutants, Dop1R1, Dop1R2 and DopEcR were also tested.

      The behavior was measured in a custom-designed apparatus that allows simultaneous testing of 13 individual flies in a plexiglass arena. The inter-trial intervals were randomized for 40 trials within 40 minutes duration and fly responses were defined into freezing, slowing down, and running by hierarchical clustering. Most of the mutant flies showed decreased reactivity to threatening stimuli, but the speed-response behavior was genotype invariant.

      These data nicely show that measuring responses to the predator-mimicking passing shadows could be used to assess the subtle differences in the locomotion parameters in various genetic models of Drosophila.

      The understanding of the manifestation of various neuronal disorders is a topic of active research. Many of the neuronal disorders start by presenting subtle changes in neuronal circuits and quantification and measurement of these subtle behavior responses could help one delineate the mechanisms involved. The data from the present study nicely uses the behavioral response to predator-mimicking passing shadows to measure subtle changes in behavior. However, there are a few important points that would help establish the robustness of this study.

      We thank the Reviewer for the constructive comments and the positive assessment of our study.

      (1) The visual threat stimulus for measuring response behavior in Drosophila is previously established for both single and multiple flies in an arena. A comparative analysis of data and the pros and cons of the previously established techniques (for example, Gibson et al., 2015) with the technique presented in this study would be important to establish the current assay as an important advancement.

      We thank the Reviewer for this suggestion. We included the following discussion on measuring response behavior to visual threat stimuli in the revised manuscript.

      Many earlier studies used a looming stimulus, that is, a concentrically expanding shadow, mimicking the approach of a predator from above, to study escape responses in flies (Ache et al., 2019; Card and Dickinson, 2008; de Vries and Clandinin, 2012; Oram and Card, 2022; Zacarias et al., 2018) as well as rodents (Braine and Georges, 2023; Heinemans and Moita, 2024; Lecca et al., 2017). These assays have the advantage of closely resembling naturalistic, ecologically relevant threat-inducing stimuli, and allow a relatively complete characterization of the fly escape behavior repertoire. As a flip side of their large degree of freedom, they do not lend themselves easily to provide a fully standardized, scalable behavioral assay. Therefore, Gibson et al. suggested a novel threat-inducing assay operating with moving overhead translational stimuli, that is, passing shadows, and demonstrated that they induce escape behaviors in flies akin to looming discs (Gibson et al., 2015). This assay, coined ReVSA (repetitive visual stimulus-induced arousal) by the authors, had the advantage of scalability, while constraining flies to a walking arena that somewhat restricted the remarkably rich escape types flies otherwise exhibit. Here we carried this idea one step further by using a screen to present the shadows instead of a physically moving paddle and placing individual flies in linear corridors instead of the common circular fly arena. This ensured that the shadow reached the same coordinates in all linear tracks concurrently and made it easy to accurately determine when individual flies encountered the stimulus, aiding data analysis and scalability.
We found the same escape behavioral repertoire as in studies with looming stimuli and ReVSA (Gibson et al., 2015; Zacarias et al., 2018), with a similar dependence on walking speed (Oram and Card, 2022; Zacarias et al., 2018), confirming that looming stimuli and passing shadows can both be considered as threat-inducing visual stimuli.  

      (2) Parkinson's disease mutants should be validated with other GAL4 drivers along with Ddc-GAL4, such as NP6510-Gal4 (Riemensperger et al., 2013). This would be important to delineate the behavioral differences due to dopaminergic neurons and serotonergic neurons and establish the Parkinson's disease phenotype robustly.

      We thank the Reviewer for pointing out this limitation. To address this, we repeated our key experiments in Fig. 3 with both TH-Gal4 and NP6510-Gal4 lines, and their respective controls. These yielded largely similar results to the Ddc-Gal4 lines reported in Fig. 3, reproducing the decreased speed and decreased overall reactivity of PD-model flies. Nevertheless, TH-Gal4 and NP6510-Gal4 mutants showed an increased propensity to stop. Stop duration showed a significant increase not only in α-Syn but also in Parkin fruit flies. These novel results have been added to the text and are demonstrated in Supplementary Figure S3.

      (3) The DopEcR mutant genotype used for behavior analysis is w1118; PBac{PB}DopEcRc02142TM6B, Tb1. Balancer chromosomes, such as TM6B,Tb can have undesirable and uncharacterised behavioral effects. This could be addressed by removing the balancer and testing the DopEcR mutant in homozygous (if viable) or heterozygous conditions.

      We appreciate the Reviewer's comment and acknowledge the potential for the DopEcR balancer chromosome to produce unintended behavioral effects. However, given that this mutant was not essential to our main conclusions, we opted not to repeat the experiment. Nevertheless, we now discuss the possible confounds associated with using the PBac{PB}DopEcRc02142 mutant allele over the balancer chromosome. “We recognize a limitation in using PBac{PB}DopEcRc02142 over the  TM6B, Tb<sup>1</sup> balancer chromosome, as the balancer itself may induce behavioral deficits in flies. We consider this unlikely, as the PBac{PB}DopEcRc02142 mutation demonstrates behavioral effects even in heterozygotes (Ishimoto et al., 2013). Additionally, to our knowledge, no studies have reported behavioral deficits in flies carrying the TM6B, Tb<sup>1</sup> balancer chromosome over a wild-type chromosome.”

      (4) The height of the arena is restricted to 1mm. However, for the wild-type flies (Canton-S) and many other mutants, the height is usually more than 1mm. Also, a 1 mm height could restrict the fly movement. For example, it might not allow the flies to flip upside down in the arena easily. This could introduce some unwanted behavioral changes. A simple experiment with an arena of height at least 2.5mm could be used to verify the effect of 1mm height.

      We thank the Reviewer for this comment, which prompted us to reassess the dimensions of the apparatus. The height of the arena was 1.5 mm, which we corrected now in the text. We observed that the arena did not restrict the flies walking and that flies could flip in the arena. We now include two Supplementary Movies to demonstrate this.

      (5) The detailed model for Monte Carlo simulation for speed-response simulation is not described. The simulation model and its hyperparameters need to be described in more depth and with proper justification.

      We thank the Reviewer for pointing out a lack of details with respect to Monte Carlo simulations. We used a nested model built from actual data distributions, without any assumptions. Accordingly, the simulation did not have hyperparameters typical in machine learning applications, the only external parameter being the number of resamplings (3000 for each draw). We made these modeling choices clearer and expanded this part as follows.

      “The effect of movement speed on the distribution of behavioral response types was tested using a nested Monte Carlo simulation framework (Fig. S5). This simulation aimed to model how different movement speeds impact the probability distribution of response types, comparing these simulated outcomes to empirical data. This approach allowed us to determine whether observed differences in response distributions are solely due to speed variations across genotypes or if additional behavioral factors contribute to the differences. First, we calculated the probability of each response type at different specific speed values (outer model). These probabilities were derived from the grand average of all trials across each genotype, capturing the overall tendency at various speeds. Second, we simulated the behavior of virtual flies (n = 3000 per genotype, which falls within the same order of magnitude as the number of experimentally recorded trials from different genotypes) by drawing random velocity values from the empirical velocity distribution specific to the given genotype and then randomly selecting a reaction based on the reaction probabilities associated with the drawn velocity (inner model). Finally, we calculated reaction probabilities for the virtual flies and compared them with real data from animals of the same genotype.

      Differences were statistically tested by Chi-squared test.”
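
      The nested procedure quoted above can be sketched in a few lines of Python. This is only an illustrative reading of the description, not the authors' code: the function names, the speed binning (`bin_width`), the pooling of all trials into one outer model, and the toy data layout of `(speed, response)` pairs are all assumptions made for the example.

```python
import random
from collections import Counter


def response_probs_by_speed(trials, bin_width=1.0):
    """Outer model: P(response | speed bin), estimated from pooled trials.

    `trials` is a list of (speed, response) pairs. Binning speeds with a
    fixed `bin_width` is an assumption of this sketch.
    """
    counts = {}
    for speed, response in trials:
        speed_bin = int(speed // bin_width)
        counts.setdefault(speed_bin, Counter())[response] += 1
    return {
        speed_bin: {r: n / sum(c.values()) for r, n in c.items()}
        for speed_bin, c in counts.items()
    }


def simulate_genotype(speeds, probs, n_virtual=3000, bin_width=1.0, rng=None):
    """Inner model: simulate `n_virtual` virtual flies for one genotype.

    Each virtual fly gets a speed drawn from the genotype's empirical speed
    values, then a response drawn from the outer-model probabilities for
    that speed bin. Returns response counts.
    """
    rng = rng or random.Random()
    simulated = Counter()
    for _ in range(n_virtual):
        speed = rng.choice(speeds)
        table = probs[int(speed // bin_width)]
        responses = list(table)
        weights = [table[r] for r in responses]
        simulated[rng.choices(responses, weights=weights)[0]] += 1
    return simulated


def chi_squared_statistic(observed, expected_counts):
    """Pearson chi-squared statistic of observed counts against expected
    counts, after rescaling the expected counts to the observed total."""
    total_obs = sum(observed.values())
    total_exp = sum(expected_counts.values())
    stat = 0.0
    for response, exp in expected_counts.items():
        scaled = exp * total_obs / total_exp
        stat += (observed.get(response, 0) - scaled) ** 2 / scaled
    return stat
```

      In this sketch, the simulated counts for a genotype would serve as the expected distribution and the recorded response counts as the observed one; converting the statistic into a p value requires the chi-squared distribution from a statistics library, which is left out here.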

      (6) The statistical analysis in different experiments needs revisiting. It wasn't clear to me if the authors checked if the data is normally distributed. A simple remedy to this would be to check the normality of data using the Shapiro-Wilk test or Kolmogorov-Smirnov test. Based on the normality check, data should be further analyzed using either parametric or non-parametric statistical tests. Further, the statistical test for the age-dependent behavior response needs revisiting as well. Using two-way ANOVA is not justified given the complexity of the experimental design. Again, after checking for the normality of data, a more rigorous statistical test, such as split-plot ANOVA or a generalized linear model could be used.

      We thank the Reviewer for this comment. We performed the Kolmogorov-Smirnov test for normality on the data distributions underlying Figure 3, and normality was rejected for all data distributions at p = 0.05, which justifies the use of the non-parametric Mann-Whitney U-test. Regarding ANOVA, we would like to point out that the ANOVA hypothesis test design is robust to deviations from normality (Knief and Forstmeier, 2021; Mooi et al., 2018). While the Kruskal-Wallis test is considered a reasonable non-parametric alternative of one-way ANOVA, there is no clear consensus for a non-parametric alternative of two-way ANOVA. Therefore, we left the two-way ANOVA for Figure 5 in place; however, to increase the statistical confidence in our conclusions, we performed Kruskal-Wallis tests for the main effect of age and found significant effects in all genotypes in accordance with the ANOVA, confirming the results (Stop frequency, DopEcR, p = 0.0007; Dop1R1, p = 0.004; Dop1R2, p = 9.94 × 10<sup>-5</sup>; w<sup>1118</sup>, p = 9.89 × 10<sup>-13</sup>; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 2.54 × 10<sup>-5</sup>; Slowing down frequency, DopEcR, p = 0.0421; Dop1R1, p = 5.77 × 10<sup>-6</sup>; Dop1R2, p = 0.011; w<sup>1118</sup>, p = 2.62 × 10<sup>-5</sup>; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 0.0382; Speeding up frequency, DopEcR, p = 0.0003; Dop1R1, p = 2.06 × 10<sup>-7</sup>; Dop1R2, p = 2.19 × 10<sup>-6</sup>; w<sup>1118</sup>, p = 0.0044; y<sup>1</sup> w<sup>67</sup>c<sup>23</sup>, p = 1.36 × 10<sup>-5</sup>). We also changed the post hoc Tukey-tests to post hoc Mann-Whitney tests in the text to be consistent with the statistical analyses for Figure 3. These yielded results very similar to those of the Tukey-tests. Of note, unlike with Tukey’s ‘honest significance’ approach, there isn’t a straightforward way of correcting for multiple comparisons in this case; we thus report uncorrected p values and suggest considering them at p = 0.01, which reduces type I errors. These notes have been added to the ‘Data analysis and statistics’ Methods section.
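
      For readers unfamiliar with the rank-based test mentioned above, the following Python sketch computes the Kruskal-Wallis H statistic for a main effect such as age, with one sample per age group. It is an illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, no tie correction is applied to H, and the p value is obtained by label permutation rather than by the chi-squared approximation that standard statistics packages typically use.

```python
import random


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def permutation_p_value(groups, n_perm=2000, rng=None):
    """Permutation p value: share of label shuffles with H >= observed H."""
    rng = rng or random.Random()
    observed = kruskal_wallis_h(groups)
    pooled = [x for group in groups for x in group]
    sizes = [len(group) for group in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        # Small tolerance guards against floating-point rounding.
        if kruskal_wallis_h(shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

      Calling `permutation_p_value` once per genotype, with that genotype's response frequencies split by age group, would mirror the per-genotype structure of the tests reported above; the permutation step is a stand-in chosen to keep the sketch free of external dependencies.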

      (7) The dopamine receptor mutants used in this study are well characterized for learning and memory deficits. In the Parkinson's disease model of Drosophila, there is a loss of DA neurons in specific pockets in the central brain. Hence, it would be apt to use whole animal DA receptor mutants as general DA mutants rather than the Parkinson's disease model. The authors may want to rework the title to reflect the same.

      We thank the Reviewer for this comment, which suggests that we were not sufficiently clear on the Drosophila lines with DA receptor mutations. We used Mi{MIC} random insertion lines for dopamine receptor mutants, namely y<sup>1</sup> w<sup>*1</sup>; Mi{MIC}Dop1R1<sup>MI04437</sup> (BDSC 43773), y<sup>1</sup> w<sup>*1</sup>; Mi{MIC}Dop1R2<sup>MI08664</sup> (BDSC 51098) (Harbison et al., 2019; Pimentel et al., 2016), and w<sup>1118</sup>; PBac{PB}DopEcR<sup>c02142</sup>/TM6B, Tb<sup>1</sup> (BDSC 10847) (Ishimoto et al., 2013; Petruccelli et al., 2020, 2016). These lines carried reported mutations in dopamine receptors, most likely generating partial knock down of the respective receptors. We made this clearer by including the full names at the first occurrence of the lines in Results (beyond those in Methods) and adding references to each of the lines.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please think about focusing the manuscript either on the escape response or the PD pathology and provide additional evidence to demonstrate that you indeed have a novel system to address open questions in the field.

      As detailed above, we now emphasize more that the main advantage of our single-trial-based approach lies in the appropriate statistical comparison of rich distributions of behavioral data. Please see our response to the ‘Weaknesses’ section for more details.

      (2) Please explain the rationale for choosing the genetic lines and provide appropriate genetic controls in the experiments, e.g. trans-heterozygotes. Why use Ddc-Gal4 instead of TH or other specific Split-Gal4 lines?

      We thank the Reviewer for this suggestion. We repeated our key experiments with TH-Gal4 and NP6510-Gal4 lines. Please see our response to Point #2 of Reviewer #2 for details.

      (3) Please proofread the manuscript for omissions, e.g. there's no legend for Fig 4b.

      We respectfully point out that the legend is there, and it reads “b, Proportion of a given response type as a function of average fly speed before the shadow presentation. Top, Parkin and α-Syn flies. Bottom, Dop1R1, Dop1R2 and DopEcR mutant flies.”

      Reviewer #2 (Recommendations For The Authors):

      (1) In figure 2(c), representing the average walking speed data for different mutants would be useful to visually correlate the walking differences.

      We thank the Reviewer for this suggestion. The average walking speed was added in a scatter plot format, as suggested in the next point of the Reviewer. 

      (2) The data could be represented more clearly using scatter plots. Also, the color scheme could be more color-blindness friendly.

      We thank the Reviewer for this suggestion. We added scatter plots to Fig.2c that indeed represent the distribution of behavioral responses better. We also changed the color scheme and removed red/green labeling.

      (3) The manuscript should be checked for typos such as in line 252, 449, 484.

      Thank you. We fixed the typos.

      References

      Ache JM, Polsky J, Alghailani S, Parekh R, Breads P, Peek MY, Bock DD, von Reyn CR, Card GM. 2019. Neural Basis for Looming Size and Velocity Encoding in the Drosophila Giant Fiber Escape Pathway. Curr Biol 29:1073-1081.e4. doi:10.1016/j.cub.2019.01.079

      Braine A, Georges F. 2023. Emotion in action: When emotions meet motor circuits. Neurosci Biobehav Rev 155:105475. doi:10.1016/j.neubiorev.2023.105475

      Card G, Dickinson MH. 2008. Visually Mediated Motor Planning in the Escape Response of Drosophila. Curr Biol 18:1300–1307. doi:10.1016/j.cub.2008.07.094

      de Vries SEJ, Clandinin TR. 2012. Loom-Sensitive Neurons Link Computation to Action in the Drosophila Visual System. Curr Biol 22:353–362. doi:10.1016/j.cub.2012.01.007

      Gibson WT, Gonzalez CR, Fernandez C, Ramasamy L, Tabachnik T, Du RR, Felsen PD, Maire MR, Perona P, Anderson DJ. 2015. Behavioral Responses to a Repetitive Visual Threat Stimulus Express a Persistent State of Defensive Arousal in Drosophila. Curr Biol 25:1401–1415. doi:10.1016/j.cub.2015.03.058

      Harbison ST, Kumar S, Huang W, McCoy LJ, Smith KR, Mackay TFC. 2019. Genome-Wide Association Study of Circadian Behavior in Drosophila melanogaster. Behav Genet 49:60–82. doi:10.1007/s10519-018-9932-0

      Heinemans M, Moita MA. 2024. Looming stimuli reliably drive innate defensive responses in male rats, but not learned defensive responses. Sci Rep 14:21578. doi:10.1038/s41598-024-70256-2

      Ishimoto H, Wang Z, Rao Y, Wu C, Kitamoto T. 2013. A Novel Role for Ecdysone in Drosophila Conditioned Behavior: Linking GPCR-Mediated Non-canonical Steroid Action to cAMP Signaling in the Adult Brain. PLoS Genet 9:e1003843. doi:10.1371/journal.pgen.1003843

      Knief U, Forstmeier W. 2021. Violating the normality assumption may be the lesser of two evils. Behav Res Methods 53:2576–2590. doi:10.3758/s13428-021-01587-5

      Lecca S, Meye FJ, Trusel M, Tchenio A, Harris J, Schwarz MK, Burdakov D, Georges F, Mameli M. 2017. Aversive stimuli drive hypothalamus-to-habenula excitation to promote escape behavior. Elife 6:1–16. doi:10.7554/eLife.30697

      Mooi E, Sarstedt M, Mooi-Reci I. 2018. Market Research, Springer Texts in Business and Economics. Singapore: Springer Singapore. doi:10.1007/978-981-10-5218-7

      Oram TB, Card GM. 2022. Context-dependent control of behavior in Drosophila. Curr Opin Neurobiol 73:102523. doi:10.1016/j.conb.2022.02.003

      Petruccelli E, Lark A, Mrkvicka JA, Kitamoto T. 2020. Significance of DopEcR, a G-protein coupled dopamine/ecdysteroid receptor, in physiological and behavioral response to stressors. J Neurogenet 34:55–68. doi:10.1080/01677063.2019.1710144

      Petruccelli E, Li Q, Rao Y, Kitamoto T. 2016. The Unique Dopamine/Ecdysteroid Receptor Modulates Ethanol-Induced Sedation in Drosophila. J Neurosci 36:4647–4657. doi:10.1523/JNEUROSCI.3774-15.2016

      Pimentel D, Donlea JM, Talbot CB, Song SM, Thurston AJF, Miesenböck G. 2016. Operation of a homeostatic sleep switch. Nature 536:333–337. doi:10.1038/nature19055

      Zacarias R, Namiki S, Card GM, Vasconcelos ML, Moita MA. 2018. Speed dependent descending control of freezing behavior in Drosophila melanogaster. Nat Commun 9:1–11. doi:10.1038/s41467-018-05875-1

    1. eLife Assessment

      This is an important study that combines replications of findings and novel detailed MRI investigations to assess the impact of environmental enrichment and maternal behavior on mice brain structure at different stages of development. The results and evidence supporting the conclusions are convincing, but in detail, the interpretation is challenging, in particular due to inter-individual and inter-litter variability. The extent to which maternal care mediates the impact of enrichment on brain development during the perinatal period also remains unclear because behavior was observed only during short periods, and the performed analyses are still incomplete. This study will nevertheless be of significant interest to neuroscientists and researchers interested in neurodevelopment in relation to environmental factors because of its in-depth use of MRI to study brain plasticity in mice.

    1. eLife Assessment

      This manuscript aims to identify the pacemaker cells in the lymphatic collecting vessels - the cells that initiate the autonomous action potentials and contractions needed to drive lymphatic pumping. Through the exemplary use of existing approaches (genetic deletions and cytosolic calcium detection in multiple cell types), the authors convincingly determine that lymphatic muscle cells are the origin of the action potential that triggers lymphatic contraction. The inclusion of scRNAseq and membrane potential data enhances a tremendous study. This fundamental discovery establishes a new standard for the field of lymphatic physiology.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript explores the multiple cell types present in the wall of murine collecting lymphatic vessels with the goal of identifying cells that initiate the autonomous action potentials and contractions needed to drive lymphatic pumping. Through the use of genetic models to delete individual genes or detect cytosolic calcium in specific cell types, the authors convincingly determine that lymphatic muscle cells are the origin of the action potential that triggers lymphatic contraction.

      Strengths:

      The experiments are rigorously performed, the data justify the conclusions and the limitations of the study are appropriately discussed.

      There is a need to identify therapeutic targets to improve lymphatic contraction and this work helps identify lymphatic muscle cells as potential cellular targets for intervention.

      Comments on revisions: The authors have addressed all of the reviewer comments. They should be congratulated on their precise and comprehensive study.

    3. Reviewer #2 (Public review):

      Summary:

      This is a well written manuscript describing studies directed at identifying the cell type responsible for pacemaking in murine collecting lymphatics. Using state of the art approaches, the authors identified a number of different cell types in the wall of these lymphatics and then using targeted expression of Channel Rhodopsin and GCaMP, the authors convincingly demonstrate that only activation of lymphatic muscle cells produces coordinated lymphatic contraction and that only lymphatic muscle cells display pressure-dependent Ca2+ transients as would be expected of a pacemaker in these lymphatics.

      Strengths:

      The use of targeted expression of channel rhodopsin and GCaMP to test the hypothesis that lymphatic muscle cells serve as the pacemakers in murine lymphatic collecting vessels.

      Weaknesses:

      The only significant weakness was the lack of quantitative analysis of most of the imaging data shown in Figures 1-11. In particular, the colocalization analysis should be extended to show cells not expected to demonstrate colocalization as a negative control for the colocalization analysis that the authors present. These weaknesses have been resolved by revision and the addition of novel RNAseq data, additional colocalization data and membrane potential measurements.

      Comments on revisions: No additional concerns.

    4. Reviewer #3 (Public review):

      Summary:

      Zawieja et al. aimed to identify the pacemaker cells in the lymphatic collecting vessels. The authors have used various Cre-based expression systems and optogenetic tools to identify these cells. Their findings suggest these cells are lymphatic muscle cells that drive the pacemaker activity in the lymphatic collecting vessels.

      Strengths:

      The authors have used multiple approaches to test their hypothesis. Some findings are presented as qualitative images, while some quantitative measurements are provided.

      Weaknesses:

      - More quantitative measurements.
      - Possible mechanisms associated with the pacemaker activity.
      - Membrane potential measurements.

      Comments on revisions: I do not have any additional comments.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations for the authors):

      The authors have done an impressive job in responding to the previous critique and even gone beyond what was asked. I have only very minor comments on this excellent manuscript. The manuscript also needs some light editing for grammar and readability.

      We have worked to improve the grammar and readability of the manuscript.

      Comments:

      Lines 227-234: At what age was tamoxifen administered to the various CreERTM mice?

      We have updated the ages of the mice used in this study in the methods sections.

      UMAP in Figure 5A is missing label for cluster 19.

      The UMAP in Figure 5A has the label for cluster 19 at the center-bottom of the image.

      Supplement Figure 6: Cluster 10 seems to be separate from the other AdvC clusters, and it includes some expression of Myh11 and Notch3. Further, there is low expression of Pdgfra in this cluster, which can be seen in panel B and panels D-I. Are the Pdgfra negative cells in the pie charts from cluster 10? Could the cells in this cluster be more LMC-like than AdvC-like?

      We agree with the reviewer that subcluster 10 of the fibroblast cells is intriguing, if only a minor population. Of the 77 cells in this population (out of 2261 total), 40 were Pdgfra+; of the 37 remaining Pdgfra- cells, 11 were still CD34+. Thus, at least half of these cells could be expected to have the PdgfraCreERTM. Only 8 of the 37 were Pdgfra-Notch3+ while 12 cells were Pdgfra+Notch3+, and only 3 were Pdgfra-Myh11+ while 3 were Pdgfra+Myh11+. 26 of 77 cells were Pdgfra+Pdgfrb+ double positive, while 12 of 37 Pdgfra- cells were still Pdgfrb+. Additionally, within the 77 cells of subcluster 10, 17 were positive for Scn3a (Nav1.3), 21 were positive for Kcnj8 (Kir6.1), and 33 were positive for Cacna1c (Cav1.2). These are typically LMC markers and would support the reviewer's thinking that this group contains a fibroblast-LMC transitional cell type. Only 2 of 77 cells were positive for the BK subunit (Kcnma1), which is a classic smooth muscle marker. Another possibility is that this population represents the Pdgfra+Pdgfrb+ valve interstitial cells we identified in our IF staining and in our reporter mice. Of note, almost all cells in this cluster were Col3a1+ and Vim+. Even though we performed QC analysis to remove doublets, it is also possible that some of these cells represent doublets or contaminants; however, the low % of Myh11 expression, a very highly expressed gene in LMCs especially compared to ion channels, would suggest this is less likely. Assessing the presence of this particular cell cluster in future RNAseq or with spatial transcriptomics will be enlightening.

      Line 360. Proofread section title.

      We have simplified this title to read “Optogenetic Stimulation of iCre-driven Channel Rhodopsin 2”

      Lines 370-371. Are the length units supposed to be microns or millimeters?

      We have corrected this to microns as was intended. Thank you for catching this error.

      The resolution for each UMAP analysis should be stated, particularly for the identification of subclusters. How was the resolution chosen?

      To select the optimal cluster resolution, we used Clustree with various resolutions. We examined the resulting tree to identify a resolution where the clusters were well-separated and biologically meaningful, ensuring minimal merging or splitting at higher resolutions. Our goal was to find a resolution that captures relevant cell subpopulations while maintaining distinct clusters without excessive fragmentation. We have now stated the resolution for the subclustering of the LECs, LMCs, and fibroblasts. We have also added greater detail regarding the total number of cells, QC analysis, and the marker identification criteria used to the methods sections. We used a resolution of 0.5 for sub-clustering LMCs, 0.87 for LECs, and 1.0 for fibroblasts. These details are now added to the manuscript.

    1. eLife Assessment

      This important work advances our understanding of the impact of malnutrition on hematopoiesis and subsequently infection susceptibility. Support for the overall claims is convincing in some respects and incomplete in terms of identifying mechanism as highlighted by reviewers. This work will be of general interest to those in the fields of hematopoiesis, malnutrition, and dietary influence on immunity.

    2. Reviewer #2 (Public review):

      Summary:

      Sukhina et al. uses a chronic murine dietary restriction model to investigate the cellular mechanisms underlying nutritionally acquired immunodeficiency as well as the consequences of a refeeding intervention. The authors report a substantial impact of undernutrition to the myeloid compartment, which is not rescued by refeeding despite rescue of other phenotypes including lymphocyte levels, and which is associated with maintained partial susceptibility to bacterial infection.

      Strengths:

      Overall, this is a nicely executed study with an appropriate number of mice, robust phenotypes, and interesting conclusions, and the text is very well written. The authors' conclusions are generally well-supported by their data.

      Weaknesses:

      There is little evaluation of known critical drivers of myelopoiesis (e.g. PMID 20535209, 26072330, 29218601) over the course of the 40% diet, which would be of interest with regard to comparing this chronic model to other more short-term models of undernutrition.

      Further, the microbiota, well-established to be regulated by undernutrition (e.g. PMID 22674549, 27339978, etc.), and also well-established to be a critical regulator of hematopoiesis/myelopoiesis (e.g. PMID 27879260, 27799160, etc.), should be studied in any future explorations using this model.

      The authors have recognized these limitations to the study in their discussion.

    3. Reviewer #3 (Public review):

      This communication from Sukhina et al argues that a period of malnutrition (modeled by caloric restriction) causes lasting immune deficiencies (myelopoiesis) not rescued by re-feeding. This is a potentially important paper exploring the effects of malnutrition on immunity, which is a clinically important topic. The revised study adds some details with respect to kinetics of immune compartment and body weight changes, but most aspects raised by the referees were deferred experimentally. Several textual changes have been made to avoid over-interpreting their data. My overall assessment of this revised study is similar to my impression before, which is that while the observations are interesting, there is both a lack of mechanistic understanding of the phenomena and a lack of resolution/detail about the phenomena themselves.

    4. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This important work advances our understanding of the impact of malnutrition on hematopoiesis and subsequently infection susceptibility. Support for the overall claims is convincing in some respects and incomplete in others as highlighted by reviewers. This work will be of general interest to those in the fields of hematopoiesis, malnutrition, and dietary influence on immunity.

      We would like to thank the editors for agreeing to review our work at eLife. We greatly appreciate them assessing this study as important and of general interest to multiple fields, as well as the opportunity to respond to reviewer comments. Please find our responses to each reviewer below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors used a chronic murine dietary restriction model to study the effects of chronic malnutrition on controls of bacterial infection and overall immunity, including cellularity and functions of different immune cell types. They further attempted to determine whether refeeding can revert the infection susceptibility and immunodeficiency. Although refeeding here improves anthropometric deficits, the authors of this study show that this is insufficient to recover the impairments across the immune cell compartments.

      Strengths:

      The manuscript is well-written and conceived around a valid scientific question. The data supports the idea that malnutrition contributes to infection susceptibility and causes some immunological changes. The malnourished mouse model also displayed growth and development delays. The work's significance is well justified. Immunological studies in the malnourished cohort (human and mice) are scarce, so this could add valuable information.

      Weaknesses:

      The assays on myeloid cells are limited, and the study is descriptive and overstated. The authors claim that "this work identifies a novel cellular link between prior nutritional state and immunocompetency, highlighting dysregulated myelopoiesis as a major." However, after reviewing the entire manuscript, I found no cellular mechanism defining the link between nutritional state and immunocompetency.

      We thank the reviewer for deeming our work significant and noting the importance of the study. We appreciate the referee’s point regarding the lack of specific cellular functional data for innate immune cells and have modified the conclusions stated in text to more accurately reflect the results presented.

      Reviewer #2 (Public review):

      Summary:

      Sukhina et al. use a chronic murine dietary restriction model to investigate the cellular mechanisms underlying nutritionally acquired immunodeficiency as well as the consequences of a refeeding intervention. The authors report a substantial impact of undernutrition on the myeloid compartment, which is not rescued by refeeding despite rescue of other phenotypes including lymphocyte levels, and which is associated with maintained partial susceptibility to bacterial infection.

      Strengths:

      Overall, this is a nicely executed study with appropriate numbers of mice, robust phenotypes, and interesting conclusions, and the text is very well-written. The authors' conclusions are generally well-supported by their data.

      Weaknesses:

      There is little evaluation of known critical drivers of myelopoiesis (e.g. PMID 20535209, 26072330, 29218601) over the course of the 40% diet, which would be of interest with regard to comparing this chronic model to other more short-term models of undernutrition.

      Further, the microbiota, which is well-established to be regulated by undernutrition (e.g. PMID 22674549, 27339978, etc.), and also well-established to be a critical regulator of hematopoiesis/myelopoiesis (e.g. PMID 27879260, 27799160, etc.), is completely ignored here.

      We thank the reviewer for agreeing that the data presented support the stated conclusions and noting the experimental rigor. The referee highlights two important areas for future mechanistic investigation that we agree are of great importance and relevant to the submitted study. We have included further discussion of the potential roles that cytokines and the microbiota might play in our model.

      Reviewer #3 (Public review):

      Summary:

      Sukhina et al are trying to understand the impacts of malnutrition on immunity. They model malnutrition with a diet switch from ad libitum to 40% caloric restriction (CR) in post-weaned mice. They test impacts on immune function with listeriosis. They then test whether re-feeding corrects these defects and find aspects of emergency myelopoiesis that remain defective after a precedent period of 40% CR. Overall, this is a very interesting observational study on the impacts of sudden prolonged exposure to less caloric intake.

      Strengths:

      The study is rigorously done. The observation of lasting defects after a bout of 40% CR is quite interesting. Overall, I think the topic and findings are of interest.

      Weaknesses:

      While the observations are interesting, in this reviewer's opinion, there is both a lack of mechanistic understanding of the phenomena and also some lack of resolution/detail about the phenomena themselves. Addressing the following major issues would be helpful towards aspects of both:

      (1) Is it calories, per se, or macro/micronutrients that drive these phenotypes observed with 40% CR. At the least, I would want to see isocaloric diets (primarily protein, fat, or carbs) and then some of the same readouts after 40% CR. Ie does low energy with relatively more eg protein prevent immunosuppression (as is commonly suggested)? Micronutrients would be harder to test experimentally and may be out of the scope of this study. However, it is worth noting that many of the malnutrition-associated diseases are micronutrient deficiencies.

      (2) Is immunosuppression a function of a certain weight loss threshold? Or something else? Some idea of either the tempo of immunosuppression (happens at 1, in which weight loss is detected; vs 2-3, when body length and condition appear to diverge; or 5 weeks), or grade of CR (40% vs 60% vs 80%) would be helpful since the mechanism of immunosuppression overall is unclear (but nailing it may be beyond the scope of this communication).

      (3) Does an obese mouse that gets 40% CR also become immunodeficient? As it stands, this ad libitum --> 40% CR model perhaps best models problems in the industrial world (as opposed to always being 40% CR from weaning, as might be more common in the developing world), and so modeling an obese person losing a lot of weight from CR (like would be achieved with GLP-1 drugs now) would be valuable to understanding generalizability.

      (4) Generalizing this phenomenon as "bacterial" with listeriosis, which is more like a virus in many ways (intracellular phase, requires type I IFN, etc.) and cannot be given by the natural route of infection in mice, may not be most accurate. I would want to see an experiment with E. coli, or some other bacteria, to test the statement of generalizability (ie is it bacteria, or type I IFN-pathway dominant infections, like viruses). If this is unique to listeriosis, it doesn't undermine the story as it is at all, but it would just require some word-smithing.

      (5) Previous reports (which the authors cite) implicate Leptin, the levels of which scale with fat mass, as "permissive" of a larger immune compartment (immune compartment as "luxury function" idea). Is their phenotype also leptin-mediated (ie leptin AAV)?

      (6) The inability of re-feeding to "rescue" the myeloid compartment is really interesting. Can the authors do a bone marrow transplantation (CR-->ad libitum) to test if this effect is intrinsic to the CR-experienced bone marrow?

      (7) Is the defect in emergency myelopoiesis a defect in G-CSF? Ie if the authors injected G-CSF in CR animals, do they equivalently mobilize neutrophils? Does G-CSF supplementation (as one does in humans) rescue host defense against Listeria in the CR or re-feeding paradigms?

      We thank the reviewer for considering our work of interest and noting the rigor with which it was conducted. The referee raises several excellent mechanistic hypotheses and follow-up studies to perform. We agree that defining the specific dietary deficiency driving the phenotypes is of great interest. The relative contribution of calories versus macro- and micronutrients is an area we are interested in exploring in future studies, especially given the literature on the role of micronutrients in malnutrition-driven wasting as the referee notes. We also agree that whether non-hematopoietic cells contribute, as well as the role of soluble factors such as G-CSF and Leptin in mediating the immunodeficiency, warrants further study. Likewise, it will be important to evaluate how malnutrition impacts other models of infection to determine how generalizable these phenomena are. We have added these points to the discussion section as limitations of this study.

      Regarding how the phenotypes correspond to the timing of the immunosuppression relative to weight loss, we have performed new kinetics studies to provide some insight into this area. We now find that neutropenia in peripheral blood can be detected after as little as one week of dietary restriction, with neutropenia continuing to worsen after prolonged restriction. These findings indicate that the impact on myeloid cell production is indeed rapid and precedes maximum weight loss, though the severity of these phenotypes does increase as malnutrition persists. We wholeheartedly agree with the reviewer that it will be interesting to explore whether starting weight impacts these phenotypes and whether similar findings can be made in obese animals as they are treated for weight loss.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In this study, the authors used a chronic murine dietary restriction model to study the effects of chronic malnutrition on controls of bacterial infection and overall immunity, including cellularity and functions of different immune cell types. They further attempted to determine whether refeeding can revert the infection susceptibility and immunodeficiency. Although refeeding here improves anthropometric deficits, the authors of this study show that this is insufficient to recover the impairments across the immune cell compartments. The authors claim that "this work identifies a novel cellular link between prior nutritional state and immunocompetency, highlighting dysregulated myelopoiesis as a major." However, after reviewing the entire manuscript, I could not find any cellular mechanism defining the link between nutritional state and immunocompetency. The assays on myeloid cells are limited, and the study is descriptive and overstated.

      Major concerns:

      (1) Malnutrition has entirely different effects on adults and children. In this study, 6-8-week-old C57BL/6 mice were used, which mimic adult malnutrition. I do not understand, then, why the refeeding strategy for inpatient treatment of severely malnourished children was utilized here.

      (2) Figure 1g shows BM cellularity is reduced, but the authors claim otherwise in the text.

      (3) What is the basis of the body condition score in Figure 1d? It will be good to have it in the supplement.

      (4) Listeria monocytogenes cause systemic infection, so bioload was not determined in tissues beyond the liver.

      (5) Figure 3; T cell functional assays were limited to CD8 T cells and lymphocytes isolated from the spleen.

      (6) Why was peripheral cell count not considered? Discrepancies exist with the absolute cell number and relative abundance data, except for the neutrophil and monocyte data, which makes the data difficult to interpret. For example, for B cells, CD4 and CD8 cells.

      (7) Also, if mice exhibit thymic atrophy, why does % abundance data show otherwise? Overall, the data is confusing to interpret.

      (8) No functional tests for neutrophil or monocyte function exist to explain the higher bacterial burden in the liver or to connect the numbers with the overall pathogen load.

      The rationale for examining both innate and adaptive immunity is not clear; it is even more unclear since the exact timelines for examining both innate and adaptive immunity (D0 and D5) were used.

      (9) Figure 2e doesn't make sense - why is spleen cellularity measured when bacterial load is measured in the liver?

      (10) Although it is claimed that emergency myelopoiesis is affected, no specific marker for emergency myelopoiesis other than cell numbers was studied.

      (11) I suggest including neutrophil effector functions and looking for real markers of granulopoiesis, such as Cebp-b. Since the authors attempted to examine the entirety of immune responses, it is better to measure cell abundance, types, and functions beyond the spleen. Consider the systemic spread of m while measuring bioload.

      (12) Minor grammatical errors - please re-read the entire text and correct grammatical errors to improve the flow of the text.

      (13) Sample size details missing

      (14) Be clear on which markers were used to identify monocytes. Using just CD11b and Ly6G is insufficient for neutrophil quantification.

      (15) Also, instead of saying "undernourished patients," say "patients with undernutrition" - change throughout the text. I would recommend numbering citations (as is done for Nature citations) to ease in following the text, as there are areas when there are more than ten citations with author names.

      (16) No line numbers are provided

      (17) Abstract

      -  What does accelerated contraction mean?

      -  "In" is repeated in a sentence

      -  Be clear that the study is done in a mouse model - saying just "animals" is not sufficient

      -  Indicate how malnutrition is induced in these mice

      (18) Introduction

      -  "restriction," "immune organs," - what is this referring to?

      -  You mention lymphoid tissue and innate and adaptive immunity, which doesn't make sense. Please correct this.

      -  You mention a lot of lymphoid tissues, i.e. lymphoid mass gain, but how about the bone marrow and spleen, which are responsible for most innate immune compartments?

      (19) Results

      a) Figure 1

      -  Why 40% reduced diet?

      -  It would be interesting to report if the organs are smaller relative to body weight. It makes sense that the organ weight is lower in the 40RD mice, especially since they are smaller, so the novelty of this data is not apparent (Figure 1f).

      -  You say, "We observed a corresponding reduction in the cellularity of the spleen and thymus, while the cellularity of the bone marrow was unaffected (Fig. 1g)." however, your BM data is significant, so this statement doesn't reflect the data you present, please correct.

      b) Figure 2

      - Figure 2d - what tissue is this from, mentioned in the figure? And measure cellularity there. The rationale for why you look only at the spleen here is weak. Also, we would benefit from including the groups without infection here for comparison purposes.

      c) Figure 3

      - The rationale for why you further looked at T cells is weak, mainly because of the following sentence. "Despite this overall loss in lymphocyte number, the relative frequency of each population was either unchanged or elevated, indicating that while malnutrition leads to a global reduction in immune cell numbers, lymphocytes are less impacted than other immune cell populations (Supplemental 1)." Please explain in the main text.

      d) Figure 4

      -  You say the peak of the adaptive immune response, but you never looked at the peak of adaptive immune - when is this? If you have the data, please show it. You also only show d0 and d5 post-infection data for adaptive immunity, so I am unsure where this statement comes from.

      -  How did you identify neutrophils and monocytes through flow cytometry? Indicate the markers used. Also, your text does not match your data; please correct it. i.e. monocyte numbers reduced, and relative abundance increased, but your text doesn't say this.

      -  Show the flow plot first, followed by the quantification.

      -  The study would benefit from examining markers of emergency myelopoiesis such as Cebpb through qPCR.

      -  Although the number of neutrophils is lower in the BM and spleen, how does this relate to increased bacterial load in the liver? This is especially true since you did not quantify neutrophil numbers in the liver.

      e) Figure 6

      -  Some figures are incorrectly labelled.

      -  For the refeeding data, also include the data from the 40RD group to compare the level of recovery in the outcome measures.

      (20) Discussion

      -  You claim that monocytes are reduced to the same extent as neutrophils, but this is not true. Please correct.

      -  Indicate some limitations of your work.

      We thank the reviewer for offering these recommendations and the constructive comments. 

      Several comments raised concerns over the rationale or reasoning behind aspects of the experimental design or the data presented, which we would like to clarify:

      • Regarding the refeeding protocol, we apologize for the confusion regarding the rationale. We based our methodology on the general guidelines for refeeding protocols for malnourished people. We elected to increase food intake 10% daily to avoid risk of refeeding syndrome or other complications. Our method by no means replicates the administration of specific vitamins, minerals, electrolytes, or precise caloric content as would be given to a human patient. The citation provided offers information from the WHO regarding the complications that can arise during refeeding syndrome; while it is from a document on pediatric care, we did not mean to imply that our method modeled refeeding intervention for children. We have modified the text to avoid this confusion.

      • The reviewer requested more clarity on why we studied both the innate and adaptive immune system as well as why we chose the time points studied. As referenced in the manuscript, prior work has observed that caloric restriction, fasting, and malnutrition all can impact the adaptive immune system. Given these previous findings, we felt it important to evaluate how malnutrition affected adaptive immune cell populations in our model. To this end, we provide data tracking the course of T-cell responses from the start of infection through day 14 at the time that the response undergoes contraction. However, since we find that bacterial burden is not properly controlled at earlier time points (day 5), when it is understood the innate immune system is more critical for mediating pathogen clearance, we elected to better characterize the effect malnutrition had on innate immune populations, something less well described in the literature. As phenotypes both in bacterial burden and within innate immune populations were observable as early as day 5, we chose to focus on that time point rather than later time points when readouts could be further confounded by secondary or compounding effects by the lack of early control of infection. We have tried to make this rationale clear in the text and have made changes to further emphasize this reasoning.

      • The reviewer also requested an explanation of why bacterial burden was measured in the liver and the immune response was measured in the spleen. While the reviewer is correct that our model is a systemic infection, it is well appreciated that bacteria rapidly disseminate to the liver and spleen and that these organs serve as major sites of infection. Given the central role the spleen plays in organizing both the innate and adaptive immune response in this model, it is common practice in the field to phenotype immune cell populations in the spleen, while using the liver to quantify bacterial burden (see PMID: 37773751 as one example of many). We acknowledge this does not provide the full scope of bacterial infection or the immune response in every potentially affected tissue, but nonetheless believe the interpretation that malnourished and previously malnourished animals do not properly control infection and their immune responses are blunted compared to controls still stands.

      The reviewer raised several points about differences in the results for cell frequency and absolute number and why these may deviate in some circumstances. For example, the reviewer notes that we observe thymic atrophy yet the frequency of peripheral T-cells does not decline. It should be noted that absolute number can change when frequency does not and vice versa, due to changes in other cell types within the studied population of cells. As in the case of peripheral lymphocytes in our study, the frequency can stay the same or even increase when the absolute number declines (Supplemental 1). This can occur if other populations of cells decrease further, which is indeed the case as the loss of myeloid cells is greater than that of lymphocytes. Hence, we find that the frequency of T and B cells is unchanged or elevated, despite the loss in absolute number of peripheral cells, which is our stated interpretation. We believe this is consistent with our overall observations and is why it is important to report both frequency and absolute number, as we have done.
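
      The arithmetic behind this point can be illustrated with a minimal sketch. The cell counts below are hypothetical values chosen only for illustration (they are not data from the study), and the function name `frequencies` is an assumption introduced here. The sketch shows a population whose absolute number falls while its frequency rises, because another population falls further.

```python
# Hypothetical cell counts (arbitrary units), chosen only for illustration;
# they are not data from the study.
control = {"myeloid": 400, "lymphocyte": 600}
restricted = {"myeloid": 100, "lymphocyte": 450}


def frequencies(counts):
    """Return each population's share of the total, as a percentage."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


control_freq = frequencies(control)
restricted_freq = frequencies(restricted)

# Print absolute number and frequency side by side for each population.
for name in control:
    print(
        f"{name}: absolute {control[name]} -> {restricted[name]}, "
        f"frequency {control_freq[name]:.1f}% -> {restricted_freq[name]:.1f}%"
    )
```

      In this illustration both populations lose cells, but the lymphocyte share of the total increases because the myeloid loss is proportionally larger, which is why frequency and absolute number need to be read together.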

      We have made the requested changes to the text to address the reviewer's concerns as noted to improve clarity and accuracy for the description of experiments, results, and overall conclusions drawn in the manuscript. We have also included a discussion of the limitations of our work as well as additional areas for future investigation that remain open.

      Reviewer #2 (Recommendations for the authors):

      Regarding the known drivers of myelopoiesis, can the authors quantify circulating levels of relevant immune cytokines (e.g. type I and II IFNs, GM-CSF, etc.)?

      Regarding the microbiota (point #2), how dramatically does this undernutrition modulate the microbiota both in terms of absolute load and community composition, and how effectively/quickly is this rescued by refeeding?

      We thank the reviewer for raising these recommendations. We agree that the role of circulating factors like cytokines and growth factors in contributing to the defects in myelopoiesis is of interest and is the focus of future work. Similarly, the impact of malnutrition on the microbiota is of great interest and has been evaluated by other groups in separate studies. How the known impact of malnutrition on the microbiota affects the phenotypes we observe in myelopoiesis is unclear and warrants future investigation. We have added these points to the discussion section as limitations of this study.

    1. Author Response:

      In the Weaknesses, Reviewer 3 suggests that in the Discussion, we comment upon whether WRN ATPase/3’-5’ helicase and WRNIP1 ATPase work on Y-family Pols additively or synergistically to raise fidelity. However, in the Discussion on page 20, we do comment on the role of WRN and WRNIP1 ATPase activities in conferring an additive increase in the fidelity of TLS by Y-family Pols.

    1. Author Response:

      We thank the reviewers for their thoughtful feedback and appreciate their recognition of the value of our findings. In response, we are refining the manuscript to clarify key terminology, more clearly describe our image analysis workflows, and temper the interpretation of our results where appropriate. We are planning to perform additional experiments to further investigate the specificity of mRNA co-localization between BK and CaV1.3 channels. We acknowledge the importance of understanding ensemble trafficking dynamics and the functional role of pre-assembly at the plasma membrane, and we plan to explore these questions in future work. We look forward to submitting a revised manuscript that addresses the reviewers’ comments in detail.

    1. eLife Assessment

      This important study explores the role of SIRT2 in regulating Japanese encephalitis virus replication and disease progression in rodent models. The findings presented are novel as sirtuins are known for their roles in aging, metabolism, and cell survival, but have not been studied in the context of viral infections until recently. The evidence supporting the claims is solid, although additional experiments to further characterize the clinical outcomes and directly test the link between acetylated NF-kB and SIRT2 expression would have strengthened the study. The work will be of interest to biologists studying viruses, sirtuins, and inflammation.

    2. Reviewer #1 (Public review):

      Summary:

      Desingu et al. show that JEV infection reduces SIRT2 expression. Upon JEV infection, 10-day-old SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival. Conversely, SIRT2 overexpression reduced viral titer, clinical outcomes, and improved survival. Transcriptional profiling shows dysregulation of NF-KB and expression of inflammatory cytokines. Pharmacological NF-KB inhibition reduced viral titer. The authors conclude that SIRT2 is a regulator of JEV infection.

      Strengths:

      This paper is novel because sirtuins have been primarily studied for aging, metabolism, stem cells/regeneration. Their role in infection has not been explored until recently. Indeed, Barthez et al. showed that SIRT2 protects aged mice from SARS-CoV-2 infection (Barthez, Cell Reports 2025). Therefore, this is a timely and novel research topic. Mechanistically, the authors showed that SIRT2 suppresses the NF-KB pathway. Interestingly, SIRT2 has also been shown recently to suppress other major inflammatory pathways, such as cGAS-STING (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Together, these findings support the emerging concept that SIRT2 is a master regulator of inflammation.

      Weaknesses:

      (1) Figures 2 and 3. Although SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival upon JEV infection, the difference is modest because even WT mice exhibited very severe disease at this viral dose. The authors should perform the experiment using a sub-lethal viral dose for WT mice, to allow the assessment of increased clinical outcomes and reduced survival in KO mice.

      (2) Figure 5K-N, the authors examined the expression of inflammatory cytokines in WT and SIRT2 KO cells upon JEV infection, in line with the dysregulation of NF-kB. It has been shown recently that SIRT2 also regulates the cGAS-STING pathway (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Do you also observe increased IFNb, IL1b, and IL18 in SIRT2 KO cells upon JEV infection? This may indicate that SIRT2 regulates systemic inflammatory responses and represents a potent protection upon viral infection. This is particularly important because in Figure 7F, the authors showed that SIRT2 overexpression reduced viral load even when NF-KB is inhibited, suggesting that NF-KB is not the only mediator of SIRT2 to suppress viral infection.

    3. Reviewer #2 (Public review):

      The manuscript by Desingu et al., explores the role of SIRT2 in regulating Japanese Encephalitis Virus (JEV) replication and disease progression in rodent models. Using both an in vitro and an in vivo approach, the authors demonstrate that JEV infection leads to decreased SIRT2 expression, which they hypothesize is exploited by JEV for viral replication. To test this hypothesis, the authors utilize SIRT2 inhibition (via AGK2 or genetic knockout) and demonstrate that it leads to increased viral load and worsens clinical outcomes in JEV-infected mice. Conversely, SIRT2 overexpression via an AAV delivery system reduces viral replication and improves survival among infected mice. The study proposes a mechanism in which SIRT2 suppresses JEV-induced autophagy and inflammation by deacetylating NF-κB, thereby reducing Beclin-1 expression (an NF-κB-dependent gene) and autophagy, which the authors consider a pathway that JEV exploits for replication. Transcriptomic analysis further supports that SIRT2 deficiency leads to NF-κB-driven cytokine hyperactivation. Additionally, pharmacological inhibition of NF-κB using Bay 11 (an IKK inhibitor) results in reduced viral load and improved clinical pathology in WT and SIRT2 KO mice. Overall, the findings from Desingu et al. are generally supported by the data and suggest that targeting SIRT2 may serve as a promising therapeutic approach for JEV infection and potentially other RNA viruses that SIRT2 helps control. However, the paper does fall short in some areas. Please see below for our comments to help improve the paper.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Desingu et al. show that JEV infection reduces SIRT2 expression. Upon JEV infection, 10-day-old SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival. Conversely, SIRT2 overexpression reduced viral titer, clinical outcomes, and improved survival. Transcriptional profiling shows dysregulation of NF-KB and expression of inflammatory cytokines. Pharmacological NF-KB inhibition reduced viral titer. The authors conclude that SIRT2 is a regulator of JEV infection.

      This paper is novel because sirtuins have been primarily studied for aging, metabolism, stem cells/regeneration. Their role in infection has not been explored until recently. Indeed, Barthez et al. showed that SIRT2 protects aged mice from SARS-CoV-2 infection (Barthez, Cell Reports 2025). Therefore, this is a timely and novel research topic. Mechanistically, the authors showed that SIRT2 suppresses the NF-KB pathway. Interestingly, SIRT2 has also been shown recently to suppress other major inflammatory pathways, such as cGAS-STING (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Together, these findings support the emerging concept that SIRT2 is a master regulator of inflammation.

      Weaknesses:

      (1) Figures 2 and 3. Although SIRT2 KO mice showed increased viral titer, more severe clinical outcomes, and reduced survival upon JEV infection, the difference is modest because even WT mice exhibited very severe disease at this viral dose. The authors should perform the experiment using a sub-lethal viral dose for WT mice, to allow the assessment of increased clinical outcomes and reduced survival in KO mice.

      (2) Figure 5K-N, the authors examined the expression of inflammatory cytokines in WT and SIRT2 KO cells upon JEV infection, in line with the dysregulation of NF-kB. It has been shown recently that SIRT2 also regulates the cGAS-STING pathway (Barthez, Cell Reports 2025) and the NLRP3 inflammasome (He, Cell Metabolism 2020; Luo, Cell Reports 2019). Do you also observe increased IFNb, IL1b, and IL18 in SIRT2 KO cells upon JEV infection? This may indicate that SIRT2 regulates systemic inflammatory responses and represents a potent protection upon viral infection. This is particularly important because in Figure 7F, the authors showed that SIRT2 overexpression reduced viral load even when NF-KB is inhibited, suggesting that NF-KB is not the only mediator of SIRT2 to suppress viral infection.

      We thank the reviewer for the valuable recommendation. We are willing to conduct an experiment using a sub-lethal viral dose in wild-type (WT) mice to assess increased clinical outcomes and reduced survival in knockout (KO) mice, as recommended.

      Furthermore, we acknowledge reviewers' comments that SIRT2 regulates systemic inflammatory responses and provides potent protection against viral infection. Additionally, NF-κB is not the only mediator of SIRT2's suppression of viral infection; other possible molecular mechanisms are also involved in this process.

      Reviewer #2 (Public review):

      The manuscript by Desingu et al., explores the role of SIRT2 in regulating Japanese Encephalitis Virus (JEV) replication and disease progression in rodent models. Using both an in vitro and an in vivo approach, the authors demonstrate that JEV infection leads to decreased SIRT2 expression, which they hypothesize is exploited by JEV for viral replication. To test this hypothesis, the authors utilize SIRT2 inhibition (via AGK2 or genetic knockout) and demonstrate that it leads to increased viral load and worsens clinical outcomes in JEV-infected mice. Conversely, SIRT2 overexpression via an AAV delivery system reduces viral replication and improves survival among infected mice. The study proposes a mechanism in which SIRT2 suppresses JEV-induced autophagy and inflammation by deacetylating NF-κB, thereby reducing Beclin-1 expression (an NF-κB-dependent gene) and autophagy, which the authors consider a pathway that JEV exploits for replication. Transcriptomic analysis further supports that SIRT2 deficiency leads to NF-κB-driven cytokine hyperactivation. Additionally, pharmacological inhibition of NF-κB using Bay 11 (an IKK inhibitor) results in reduced viral load and improved clinical pathology in WT and SIRT2 KO mice. Overall, the findings from Desingu et al. are generally supported by the data and suggest that targeting SIRT2 may serve as a promising therapeutic approach for JEV infection and potentially other RNA viruses that SIRT2 helps control. However, the paper does fall short in some areas. Please see below for our comments to help improve the paper.

      We thank the reviewer for the valuable recommendation. We are willing to measure NF-kB acetylation in AdSIRT2 JEV-infected cells compared to WT-infected cells, to verify that the acetylation of NF-kB is truly linked to SIRT2 expression levels as per the reviewers' suggestion.

      We are willing to conduct an experiment using a sub-lethal viral dose in wild-type (WT) mice to assess increased clinical outcomes and reduced survival in knockout (KO) mice, as recommended.

      We accept the reviewer's point that AGK2 can also inhibit other Sirtuins. Thus, to test the contribution of other Sirtuins, the experiment could be repeated using wild-type and Sirt2 KO mice. We are willing to conduct the AGK2 experiment using JEV-infected wild-type and Sirt2 knockout mice.

    1. eLife Assessment

      This valuable study tested whether several months of dolutegravir intensification alters the size of the HIV reservoir as well as immune activation in individuals already on suppressive ART. While the general study approach is appropriate and the paper is well written, the evidence supporting the claims of the authors is incomplete. The title of the paper is only partially supported by the data, based on specific issues with the study design and analysis plan highlighted by Reviewer 1. Specifically, the primary study outcomes were not clearly described a priori, the plausibility of a biologic effect is uncertain based on lack of a consistent effect across participants, and sample size is small. Given a possible observed partial effect and relevant hypothesis, this approach warrants study in a larger trial.

    2. Reviewer #1 (Public review):

      Fombellida-Lopez and colleagues describe the results of an ART intensification trial in people with HIV infection (PWH) on suppressive ART to determine the effect of increasing the dose of one ART drug, dolutegravir, on viral reservoirs, immune activation, exhaustion, and circulating inflammatory markers. The authors hypothesize that ART intensification will provide clues about the degree to which low-level viral replication is occurring in circulation and in tissues despite ongoing ART, which could be identified if reservoirs decrease and/or if immune biomarkers change. The trial design is straightforward and well-described, and the intervention appears to have been well tolerated. The investigators observed an increase in dolutegravir concentrations in circulation, and to a lesser degree in tissues, in the intervention group, indicating that the intervention has functioned as expected (ART has been intensified in vivo). Several outcome measures changed during the trial period in the intervention group, leading the investigators to conclude that their results provide strong evidence of ongoing replication on standard ART. The results of this small trial are intriguing, and a few observations in particular are hypothesis-generating and potentially justify further clinical trials to explore them in depth. However, I am concerned about over-interpretation of results that do not fully justify the authors' conclusions.

      (1) Trial objectives: What was the primary objective of the trial? This is not clearly stated. The authors describe changes in some reservoir parameters and no changes in others. Which of these was the primary outcome? No a priori hypothesis / primary objective is stated, nor is there explicit justification (power calculations, prior in vivo evidence) for the small n, unblinded design, and lack of placebo control. In the abstract (line 36, "significant decreases in total HIV DNA") and conclusion (lines 244-246), the authors state that total proviral DNA decreased as a result of ART intensification. However, in Figures 2A and 2E (and in line 251), the authors indicate that total proviral DNA did not change. These statements are confusing and appear to be contradictory. Regarding the decrease in total proviral DNA, I believe the authors may mean that they observed transient decrease in total proviral DNA during the intensification period (day 28 in particular, Figure 2A), however this level increases at Day 56 and then returns to baseline at Day 84, which is the source of the negative observation. Stating that total proviral DNA decreased as a result of the intervention when it ultimately did not is misleading, unless the investigators intended the day 28 timepoint as a primary endpoint for reservoir reduction - if so, this is never stated, and it is unclear why the intervention would then be continued until day 84? If, instead, reservoir reduction at the end of the intervention was the primary endpoint (again, unstated by the authors), then it is not appropriate to state that the total proviral reservoir decreased significantly when it did not.

      (2) Intervention safety and tolerability: The results section lacks a specific heading for participant safety and tolerability of the intervention. I was wondering about clinically detectable viremia in the study. Were there any viral blips? Was the increased DTG well tolerated? This drug is known to cause myositis, headache, CPK elevation, and hepatotoxicity. Were any of these observed? What is the authors' interpretation of the CD4:8 ratio change (line 198)? Is this a significant safety concern for a longer duration of intensification? Was there also a change in CD4% or only in absolute counts? Was there relative CD4 depletion observed in the rectal biopsy samples between days 0 and 84? Interestingly, T cells dropped at the same timepoints that reservoirs declined... how do the authors rule out that reservoir decline reflects transient T cell decline that is non-specific (not due to additional blockade of replication)?

      (3) The investigators describe a decrease in intact proviral DNA after 84 days of ART intensification in circulating cells (Figure 2D), but no changes to total proviral DNA in blood or tissue (Figures 2A and 2E; IPDA does not appear to have been done on tissue samples). It is not clear why ART intensification would result in a selective decrease in intact proviruses and not in total proviruses if the source of these reservoir cells is due to ongoing replication. These reservoir results have multiple interpretations, including (but not limited to) the investigators' contention that this provides strong evidence of ongoing replication. However, ongoing replication results in the production of both intact and mutated/defective proviruses that both contribute to reservoir size (with defective proviruses vastly outnumbering intact proviruses). The small sample size and well-described heterogeneity of the HIV reservoir (with regard to overall size and composition) raise the possibility that the study was underpowered to detect differences over the 84-day intervention period. No power calculations or prior studies were described to justify the trial size or the duration of the intervention. Readers would benefit from a more nuanced discussion of reservoir changes observed here.

      (4) While a few statistically significant changes occurred in immune activation markers, it is not clear that these are biologically significant. Lines 175-186 and Figure 3: The change in CD4 cells + for TIGIT looks as though it declined by only 1-2%, and at day 84, the confidence interval appears to widen significantly at this timepoint, spanning an interquartile range of 4%. The only other immune activation/exhaustion marker change that reached statistical significance appears to be CD8 cells + for CD38 and HLA-DR, however, the decline appears to be a fraction of a percent, with the control group trending in the same direction. Despite marginal statistical significance, it is not clear there is any biological significance to these findings; Figure S6 supports the contention that there is no significant change in these parameters over time or between groups. With most markers showing no change and these two showing very small changes (and the latter moving in the same direction as the control group), these results do not justify the statement that intensifying DTG decreases immune activation and exhaustion (lines 38-40 in the abstract and elsewhere).

      (5) There are several limitations of the study design that deserve consideration beyond those discussed at line 327. The study was open-label and not placebo-controlled, which may have led to some medication adherence changes that confound results (authors describe one observation that may be evidence of this; lines 146-148). Randomized/blinded / cross-over design would be more robust and help determine signal from noise, given relatively small changes observed in the intervention arm. There does not seem to be a measurement of key outcome variables after treatment intensification ceased - evidence of an effect on replication through ART intensification would be enhanced by observing changes once intensification was stopped. Why was intensification maintained for 84 days? More information about the study duration would be helpful. Table 1 indicates that participants were 95% male. Sex is known to be a biological variable, particularly with regard to HIV reservoir size and chronic immune activation in PWH. Worldwide, 50% of PWH are women. Research into improving management/understanding of disease should reflect this, and equal participation should be sought in trials. Table 1 shows differing baseline reservoir sizes between the control and intervention groups. This may have important implications, particularly for outcomes where reservoir size is used as the denominator.

      (6) Figure 1: the increase in DTG levels is interesting - it is not uniform across participants. Several participants had lower levels of DTG at the end of the intervention. Though unlikely to be statistically significant, it would be interesting to evaluate if there is a correlation between change in DTG concentrations and virologic / reservoir / inflammatory parameters. A positive relationship between increasing DTG concentration and decreased cell-associated RNA, for example, would help support the hypothesis that ongoing replication is occurring.

      (7) Figure 2: IPDA in tissue- was this done? scRNA in blood (single copy assay) - would this be expected to correlate with usCaRNA? The most unambiguous result is the decrease in cell-associated RNA - accompanying results using single-copy assay in plasma would be helpful to bolster this result. The use of the US RNA / Total DNA ratio is not helpful/difficult to interpret since the control and intervention arms were unmatched for total DNA reservoir size at study entry.

    3. Reviewer #2 (Public review):

      Summary:

      An intensification study with a double dose of a 2nd-generation integrase inhibitor, on a background of nucleoside analog inhibitors of the HIV reverse transcriptase, in 20 individuals randomized with controls, with an impact on the levels of viral reservoirs and inflammation markers. Viral reservoirs in HIV are the main impediment to an HIV cure, and inflammation is associated with co-morbidities.

      Strengths:

      The intervention that leads to a decrease of viral reservoirs and inflammation is quite straightforward, as a doubling of the INSTI is used in some individuals with INSTI resistance, with good tolerability.

      This is a very well documented study, both in blood and tissues, which is a great achievement due to the difficulty of body sampling in well-controlled individuals on antiretroviral therapy. The laboratory assays are performed by specialists in the field with state-of-the-art quantification assays. Both the introduction and the discussion are remarkably well presented and documented.

      The findings also have a potential impact on the management of chronic HIV infection.

      Weaknesses:

      I do not think that the size of the study can be considered a weakness, nor the fact that it is open-label either.

    4. Reviewer #3 (Public review):

      The introduction does a very good job of discussing the issue around whether there is ongoing replication in people with HIV on antiretroviral therapy. Sporadic, non-sustained replication likely occurs in many PWH on ART, related to adherence, drug-drug interactions, and possibly penetration of antivirals into sanctuary areas of replication. As the authors point out, proving it does not occur is likely not possible, and proving it does occur is likely very dependent on the population studied and the design of the intervention. Whether the consequences of this replication, in the absence of evolution toward resistance, have clinical significance is a challenging question to address.

      It is important to note that INSTI-based therapy may have a different impact on HIV replication events, resulting in differences in virus release for specific cell types (those responsible for "second phase" decay), by blocking integration in cells that have completed reverse transcription prior to ART initiation but have yet to be fully activated. In a PI or NNRTI-based regimen, those cells will release virus, whereas with an INSTI-based regimen, they will not.

      Given the very small sample size, there is a substantial risk of imbalance between the groups in important baseline measures. Unfortunately, with the small sample size, a non-significant P value is not helpful when comparing baseline measures between groups. One suggestion would be to provide the full range as opposed to the inter-quartile range (essentially only 5 or 6 values). The authors could also report the proportion of participants with baseline HIV RNA target not detected in the two groups.

      A suggestion that there is a critical imbalance between groups is that the control group has significantly lower total HIV DNA in PBMC, despite the small sample size. The control group also has numerically longer time of continuous suppression, lower unspliced RNA, and lower intact proviral DNA. These differences may have biased the ability to see changes in DNA and US RNA in the control group. Notably, there was no significant difference in the change in US RNA/DNA between groups (Figure 2C). The fact that the median relative change appears very similar in Figure 2C, yet there is a substantial difference in P values, is also a comment on the limits of the current sample size. The text should report the median change in US RNA and US RNA/DNA when describing Figures 2A-2C. This statistical comparison of changes in IPDA results between groups should be reported. The presentation of the absolute values of all the comparisons in the supplemental figures is a strength of the manuscript.

      In the assessment of ART intensification on immune activation and exhaustion, the fact that none of the comparisons between randomized groups were significant should be noted and discussed.

      The changes in CD4:CD8 ratio and sCD14 levels appear counterintuitive to the hypothesis and are commented on in the discussion.

      Overall, the discussion highlights the significant changes in the intensified group, which are suggestive. There is limited discussion of the comparisons between groups, where the results are less convincing.

      The limitations of the study should be more clearly discussed. The small sample size raises the possibility of imbalance at baseline. The supplemental figures (S3-S5) are helpful in showing the differences between groups at baseline, and the variability of measurements is more apparent. The lack of blinding is also a weakness, though the PK assessments do help (note that 3TC levels rise substantially in both groups for most of the time on study; Figure S2).

      The many assays and comparisons are listed as a strength. The many comparisons raise the possibility of finding significance by chance. In addition, if there is an imbalance at baseline, outcomes measuring related parameters will move in the same direction.
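
      The concern about chance findings across many comparisons can be made concrete with a small calculation. The sketch below is illustrative only: it assumes independent tests at a nominal alpha of 0.05, and the numbers of comparisons are hypothetical, not taken from the manuscript. The function names are introduced here for illustration.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among n_tests
    independent tests, each run at significance level alpha."""
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold that keeps the family-wise
    error rate at or below alpha (Bonferroni correction)."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical numbers of comparisons, chosen only for illustration.
    for m in (1, 5, 10, 20):
        fwer = familywise_error_rate(m)
        thr = bonferroni_threshold(m)
        print(f"{m:2d} tests: FWER = {fwer:.3f}, Bonferroni threshold = {thr:.4f}")
```

      Under these assumptions, the chance of at least one nominally significant result grows quickly with the number of comparisons. Correlated outcomes, as expected for related reservoir markers, would change the exact figures but not the direction of the concern.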

      The limited impact on activation and inflammation should be addressed in the discussion, as they are highlighted as a potentially important consequence of intermittent, not sustained replication in the introduction.

      The study is provocative and well executed, with the limitations listed above. Pharmacokinetic analyses help mitigate the lack of blinding. The major impact of this work is if it leads to a much larger randomized, controlled, blinded study of a longer duration, as the authors point out.

    5. Author response:

      Reviewer #1 (Public Review):

      Fombellida-Lopez and colleagues describe the results of an ART intensification trial in people with HIV infection (PWH) on suppressive ART to determine the effect of increasing the dose of one ART drug, dolutegravir, on viral reservoirs, immune activation, exhaustion, and circulating inflammatory markers. The authors hypothesize that ART intensification will provide clues about the degree to which low-level viral replication is occurring in circulation and in tissues despite ongoing ART, which could be identified if reservoirs decrease and/or if immune biomarkers change. The trial design is straightforward and well-described, and the intervention appears to have been well tolerated. The investigators observed an increase in dolutegravir concentrations in circulation, and to a lesser degree in tissues, in the intervention group, indicating that the intervention has functioned as expected (ART has been intensified in vivo). Several outcome measures changed during the trial period in the intervention group, leading the investigators to conclude that their results provide strong evidence of ongoing replication on standard ART. The results of this small trial are intriguing, and a few observations in particular are hypothesis-generating and potentially justify further clinical trials to explore them in depth. However, I am concerned about over-interpretation of results that do not fully justify the authors' conclusions.

      We thank Reviewer #1 for their thoughtful and constructive comments, which will help us clarify and improve the manuscript. Below, we address each of the reviewer’s points and describe the changes that we intend to implement in the revised version. We acknowledge the reviewer’s concern regarding potential over-interpretation of certain findings, and we will take particular care to ensure that all conclusions are supported by the data and framed within the exploratory nature of the study.

      (1) Trial objectives: What was the primary objective of the trial? This is not clearly stated. The authors describe changes in some reservoir parameters and no changes in others. Which of these was the primary outcome? No a priori hypothesis / primary objective is stated, nor is there explicit justification (power calculations, prior in vivo evidence) for the small n, unblinded design, and lack of placebo control. In the abstract (line 36, "significant decreases in total HIV DNA") and conclusion (lines 244-246), the authors state that total proviral DNA decreased as a result of ART intensification. However, in Figures 2A and 2E (and in line 251), the authors indicate that total proviral DNA did not change. These statements are confusing and appear to be contradictory. Regarding the decrease in total proviral DNA, I believe the authors may mean that they observed a transient decrease in total proviral DNA during the intensification period (day 28 in particular, Figure 2A); however, this level increases at Day 56 and then returns to baseline at Day 84, which is the source of the negative observation. Stating that total proviral DNA decreased as a result of the intervention when it ultimately did not is misleading, unless the investigators intended the day 28 timepoint as a primary endpoint for reservoir reduction - if so, this is never stated, and it is unclear why the intervention would then be continued until day 84. If, instead, reservoir reduction at the end of the intervention was the primary endpoint (again, unstated by the authors), then it is not appropriate to state that the total proviral reservoir decreased significantly when it did not.

      We agree with the reviewer that the primary objective of the study was not explicitly stated in the submitted manuscript. We will clarify this in the revised manuscript. As registered on ClinicalTrials.gov (NCT05351684), the primary outcome was defined as “To evaluate the impact of treatment intensification at the level of total and replication-competent reservoir (RCR) in blood and in tissues”, with a time frame of 3 months. Accordingly, our aim was to explore whether any measurable reduction in the HIV reservoir (total or replication-competent) occurred during the intensification period, including at day 28, 56, or 84. The protocol did not prespecify a single time point for this effect to occur, and the exploratory design allowed for detection of transient or sustained changes within the intensification window.

      We recognize that this scope was not clearly articulated in the original text and may have led to confusion in interpreting the transient drop in total HIV DNA observed at day 28. While total DNA ultimately returned to baseline by the end of intensification, the presence of a transient reduction during this 3-month window still fits within the framework of the study’s registered objective. Moreover, although the change in total HIV DNA was transient, it aligns with the consistent direction of changes observed across the multiple independent measures, including CA HIV RNA, RNA/DNA ratio and intact HIV DNA, collectively supporting a biological effect of intensification.

      We would also like to stress that this is the first clinical trial in which ART intensification is performed not by adding an extra drug but by increasing the dosage of an existing drug. Therefore, we were more interested in the overall, cumulative effect of intensification throughout the entire trial period than in differences between groups at individual time points. We will clarify in the manuscript that this was a proof-of-concept phase 2 study, designed to generate biological signals rather than confirm efficacy in a powered comparison. The absence of a pre-specified statistical endpoint or sample size calculation reflects the exploratory nature of the trial.

      (2) Intervention safety and tolerability: The results section lacks a specific heading for participant safety and tolerability of the intervention. I was wondering about clinically detectable viremia in the study. Were there any viral blips? Was the increased DTG well tolerated? This drug is known to cause myositis, headache, CPK elevation, and hepatotoxicity. Were any of these observed? What is the authors' interpretation of the CD4:8 ratio change (line 198)? Is this a significant safety concern for a longer duration of intensification? Was there also a change in CD4% or only in absolute counts? Was there relative CD4 depletion observed in the rectal biopsy samples between days 0 and 84? Interestingly, T cells dropped at the same timepoints that reservoirs declined... how do the authors rule out that reservoir decline reflects transient T cell decline that is non-specific (not due to additional blockade of replication)?

      We will improve the Methods section to clarify how safety and tolerability were assessed during the study. Safety evaluations were conducted on day 28 and day 84 and included a clinical examination and routine laboratory testing (liver function tests, kidney function, and complete blood count). Medication adherence was also monitored through pill counts performed by the study nurses.

      No virological blips above 50 copies/mL were observed and no adverse events were reported by participants during the 3-month intensification period. Although CPK levels were not included in the routine biological monitoring, no participant reported muscle pain or other symptoms suggestive of muscle toxicity.

      The CD4:CD8 ratio decrease noted during intensification was not associated with significant changes in absolute CD4 or CD8 counts, as shown in Figure 5. We interpret this ratio change as a transient redistribution rather than an immunological risk; therefore, we do not consider it to represent a safety concern.

      We would like to clarify that CD4+ T-cell counts did not significantly decrease in any of the treatment groups, as shown in Figure 5. The apparent decline observed concerns the CD4/CD8 ratio, which transiently dropped, but not the absolute number of CD4+ T cells.

      (3) The investigators describe a decrease in intact proviral DNA after 84 days of ART intensification in circulating cells (Figure 2D), but no changes to total proviral DNA in blood or tissue (Figures 2A and 2E; IPDA does not appear to have been done on tissue samples). It is not clear why ART intensification would result in a selective decrease in intact proviruses and not in total proviruses if the source of these reservoir cells is due to ongoing replication. These reservoir results have multiple interpretations, including (but not limited to) the investigators' contention that this provides strong evidence of ongoing replication. However, ongoing replication results in the production of both intact and mutated/defective proviruses that both contribute to reservoir size (with defective proviruses vastly outnumbering intact proviruses). The small sample size and well-described heterogeneity of the HIV reservoir (with regard to overall size and composition) raise the possibility that the study was underpowered to detect differences over the 84-day intervention period. No power calculations or prior studies were described to justify the trial size or the duration of the intervention. Readers would benefit from a more nuanced discussion of reservoir changes observed here.

      We sincerely thank the reviewer for this insightful comment. We fully agree that the reservoir dynamics observed in our study raise several possible interpretations, and that its complexity, resulting from continuous cycles of expansion and contraction, reflects the heterogeneity of the latent reservoir.

      Total HIV DNA in PBMCs showed a transient decline during intensification (notably at day 28), ultimately returning to baseline by day 84. This biphasic pattern may reflect the combined effects of suppression of ongoing low-level replication by an increased DTG dosage, followed by the expansion of infected cell clones (mostly harboring defective proviruses). In other words, the transient decrease in total (intact + defective) DNA at day 28 may be due to an initial decrease in newly infected cells upon ART intensification; at the subsequent time points, however, this effect was masked by proliferation (clonal expansion) of infected cells with defective proviruses. This would explain why the intact proviruses decreased, but the total proviruses did not change, between days 0 and 84.

      Importantly, we observed a significant decrease in intact proviral DNA between day 0 and day 84 in the intensification group (Figure 2D). We will highlight this result more clearly in the revised manuscript, as it directly addresses the study’s primary objective: assessing the impact of intensification on the replication-competent reservoir. In comparison, as the reviewer rightly points out, total HIV DNA includes over 90% defective genomes, which limits its interpretability as a biomarker of biologically relevant reservoir changes.

      In addition, other reservoir markers, such as cell-associated unspliced RNA and RNA/DNA ratios, also showed consistent trends supporting a modest but biologically relevant effect of intensification. Even in the absence of sustained changes in total HIV DNA, the coherence across these independent measures suggests a signal indicative of ongoing replication in at least some individuals, and at specific timepoints.

      Regarding tissue reservoirs, the lack of substantial change in total HIV DNA between days 0 and 84 is also in line with the predominance of defective sequences in these compartments. Moreover, the limited increase in rectal tissue dolutegravir levels during intensification (from 16.7% to 20% of plasma concentrations) may have limited the efficacy of the intervention in this site.

      As for the IPDA on rectal biopsies, we attempted the assay using two independent DNA extraction methods (Promega Reliaprep and Qiagen Puregene), but both yielded high DNA Shearing Index values, and intact proviral detection was successful in only 3 of 40 samples. Given the poor DNA integrity and weak signals, these results were not interpretable.

      That said, we fully acknowledge the limitations of our study, especially the small sample size, and we agree with the reviewer that caution is needed when interpreting these findings. In the revised manuscript, we will adopt a more measured tone in the discussion, clearly stating that these observations are exploratory and hypothesis-generating, and require confirmation in larger, better-powered studies. Nonetheless, we believe that the convergence of multiple reservoir markers pointing in the same direction constitutes a potentially meaningful biological signal that deserves further investigation.

      (4) While a few statistically significant changes occurred in immune activation markers, it is not clear that these are biologically significant. Lines 175-186 and Figure 3: The change in CD4 cells + for TIGIT looks as though it declined by only 1-2%, and at day 84, the confidence interval appears to widen significantly at this timepoint, spanning an interquartile range of 4%. The only other immune activation/exhaustion marker change that reached statistical significance appears to be CD8 cells + for CD38 and HLA-DR, however, the decline appears to be a fraction of a percent, with the control group trending in the same direction. Despite marginal statistical significance, it is not clear there is any biological significance to these findings; Figure S6 supports the contention that there is no significant change in these parameters over time or between groups. With most markers showing no change and these two showing very small changes (and the latter moving in the same direction as the control group), these results do not justify the statement that intensifying DTG decreases immune activation and exhaustion (lines 38-40 in the abstract and elsewhere).

      We agree with the reviewer that the observed changes in immune activation and exhaustion markers were modest. We will revise the manuscript to reflect this more accurately. We will also note that these differences, while statistically significant (e.g., in TIGIT+ CD4+ T cells and CD38+HLA-DR+ CD8+ T cells), were limited in magnitude. We will explicitly acknowledge these limitations and interpret the findings with appropriate caution.

      (5) There are several limitations of the study design that deserve consideration beyond those discussed at line 327. The study was open-label and not placebo-controlled, which may have led to some medication adherence changes that confound results (authors describe one observation that may be evidence of this; lines 146-148). Randomized/blinded / cross-over design would be more robust and help determine signal from noise, given relatively small changes observed in the intervention arm. There does not seem to be a measurement of key outcome variables after treatment intensification ceased - evidence of an effect on replication through ART intensification would be enhanced by observing changes once intensification was stopped. Why was intensification maintained for 84 days? More information about the study duration would be helpful. Table 1 indicates that participants were 95% male. Sex is known to be a biological variable, particularly with regard to HIV reservoir size and chronic immune activation in PWH. Worldwide, 50% of PWH are women. Research into improving management/understanding of disease should reflect this, and equal participation should be sought in trials. Table 1 shows differing baseline reservoir sizes between the control and intervention groups. This may have important implications, particularly for outcomes where reservoir size is used as the denominator.

      We will expand the limitations section to address several key aspects raised by the reviewer: the absence of blinding and placebo control, the predominantly male study population, and the lack of post-intervention follow-up. While we acknowledge that open-label designs can introduce behavioral biases, including potential changes in adherence, we will now explicitly state that placebo-controlled, blinded trials would provide a more robust assessment and are warranted in future research.

      The 84-day duration of intensification was chosen based on previous studies and provided sufficient time for observing potential changes in viral transcription and reservoir dynamics. However, we agree that including post-intervention follow-up would have strengthened the conclusions, and we will highlight this limitation and future direction in the revised manuscript.

      The sex imbalance is now clearly acknowledged as a limitation in the revised manuscript, and we fully support ongoing efforts to promote equitable recruitment in HIV research. We would like to add that, in our study, rectal biopsies were coupled with anal cancer screening through HPV testing. This screening is specifically recommended for younger men who have sex with men (MSM), as outlined in the current EACS guidelines (see: https://eacs.sanfordguide.com/eacs-part2/cancer/cancer-screening-methods). As a result, MSM participants had both a clinical incentive and medical interest to undergo this procedure, which likely contributed to the higher proportion of male participants in the study.

      Lastly, although baseline total HIV DNA was higher in the intensified group, our statistical approach is based on a within-subject (repeated-measures) design, in which the longitudinal change of a parameter within the same participant during the study was the main outcome. In other words, we are not comparing absolute values of any marker between the groups; we are looking at changes of parameters from baseline within participants, and these are not expected to be affected by baseline imbalances.

      (6) Figure 1: the increase in DTG levels is interesting - it is not uniform across participants. Several participants had lower levels of DTG at the end of the intervention. Though unlikely to be statistically significant, it would be interesting to evaluate if there is a correlation between change in DTG concentrations and virologic / reservoir / inflammatory parameters. A positive relationship between increasing DTG concentration and decreased cell-associated RNA, for example, would help support the hypothesis that ongoing replication is occurring.

      We agree with the reviewer that assessing correlations between DTG concentrations and virological, immunological, or inflammatory markers would be highly informative. In fact, we initially explored this question in a preliminary way by examining whether individuals who showed a marked increase in DTG levels after intensification also demonstrated stronger changes in the viral reservoir. While this exploratory analysis did not reveal any clear associations, we would like to emphasize that correlating biological effects with DTG concentrations measured at a single timepoint may have limited interpretability. A more comprehensive understanding of the relationship between drug exposure and reservoir dynamics would ideally require multiple pharmacokinetic measurements over time, including pre-intensification baselines. This is particularly important given that DTG concentrations vary across individuals and over time, depending on adherence, metabolism, and other individual factors. We will clarify these points in the revised manuscript.

      (7) Figure 2: IPDA in tissue- was this done? scRNA in blood (single copy assay) - would this be expected to correlate with usCaRNA? The most unambiguous result is the decrease in cell-associated RNA - accompanying results using single-copy assay in plasma would be helpful to bolster this result.

      As mentioned in our response to point 3, we attempted IPDA on tissue samples, but technical limitations prevented reliable detection of intact proviruses. Regarding residual viremia, we did perform ultra-sensitive plasma HIV RNA quantification, but a technical issue (an inadvertent PBMC contamination during plasma separation) affected the reliability of the results, so we did not feel comfortable including these data in the manuscript.

      The use of the US RNA / Total DNA ratio is not helpful/difficult to interpret since the control and intervention arms were unmatched for total DNA reservoir size at study entry.

      We respectfully disagree with this comment. The US RNA / Total DNA ratio is commonly used to assess the relative transcriptional activity of the viral reservoir, rather than its absolute size. While we acknowledge that the total HIV-1 DNA levels differed at baseline between the two groups, the US RNA / Total DNA ratio specifically reflects the relationship between transcriptional activity and reservoir size within each individual, and is therefore not directly confounded by baseline differences in total DNA alone.

      Moreover, our analyses focus on within-subject longitudinal changes from baseline, not on direct between-group comparisons of absolute marker values. As such, the observed changes in the US RNA / Total DNA ratio over time are interpreted relative to each participant's baseline, mitigating concerns related to baseline imbalances between groups.
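
      The baseline normalization described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are introduced here, and all values are hypothetical placeholders rather than trial data. It shows only the arithmetic, namely that each participant's fold change is computed against their own baseline, and that the fold change of the US RNA / Total DNA ratio equals the US RNA fold change divided by the total DNA fold change.

```python
from statistics import median


def fold_changes(baseline, followup):
    """Per-participant fold change: follow-up value divided by that
    participant's own baseline value."""
    if len(baseline) != len(followup):
        raise ValueError("baseline and followup must have the same length")
    return [f / b for b, f in zip(baseline, followup)]


def ratio_fold_changes(rna_base, rna_follow, dna_base, dna_follow):
    """Fold change of the RNA/DNA ratio for each participant.

    (rna_follow / dna_follow) / (rna_base / dna_base) simplifies to
    (RNA fold change) / (DNA fold change).
    """
    rna_fc = fold_changes(rna_base, rna_follow)
    dna_fc = fold_changes(dna_base, dna_follow)
    return [r / d for r, d in zip(rna_fc, dna_fc)]


# Hypothetical values for three participants (arbitrary units).
rna_base = [100.0, 400.0, 50.0]
rna_follow = [50.0, 200.0, 40.0]
dna_base = [1000.0, 2000.0, 500.0]
dna_follow = [800.0, 2000.0, 500.0]

rna_fc = fold_changes(rna_base, rna_follow)
ratio_fc = ratio_fold_changes(rna_base, rna_follow, dna_base, dna_follow)

if __name__ == "__main__":
    print("US RNA fold changes:", rna_fc)
    print("RNA/DNA ratio fold changes:", ratio_fc)
    print("Median RNA fold change:", median(rna_fc))
    print("Median ratio fold change:", median(ratio_fc))
```

      In this toy example, the first participant's RNA halves while their DNA also falls, so the ratio falls by less than the RNA alone. The sketch makes no claim about the trial's actual data or about the statistical test used in the manuscript.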

      Reviewer #2 (Public Review):

      Summary:

      An intensification study with a double dose of a 2nd-generation integrase inhibitor, on a background of nucleoside analog inhibitors of the HIV reverse transcriptase, in 20 individuals randomized with controls, with an impact on the levels of viral reservoirs and inflammation markers. Viral reservoirs in HIV are the main impediment to an HIV cure, and inflammation is associated with co-morbidities.

      Strengths:

      The intervention that leads to a decrease of viral reservoirs and inflammation is quite straightforward, as a doubling of the INSTI is used in some individuals with INSTI resistance, with good tolerability.

      This is a very well documented study, both in blood and tissues, which is a great achievement due to the difficulty of body sampling in well-controlled individuals on antiretroviral therapy. The laboratory assays are performed by specialists in the field with state-of-the-art quantification assays. Both the introduction and the discussion are remarkably well presented and documented.

      The findings also have a potential impact on the management of chronic HIV infection.

      Weaknesses:

      I do not think that the size of the study can be considered a weakness, nor the fact that it is open-label either.

      We thank Reviewer #2 for their constructive and supportive comments. We appreciate their positive assessment of the study design, the translational relevance of the intervention, and the technical quality of the assays. We also take note of their perspective regarding sample size and study design, which supports our positioning of this trial as an exploratory, hypothesis-generating phase 2 study.

      Reviewer #3 (Public Review):

      The introduction does a very good job of discussing the issue around whether there is ongoing replication in people with HIV on antiretroviral therapy. Sporadic, non-sustained replication likely occurs in many PWH on ART, related to adherence, drug interactions, and possibly penetration of antivirals into sanctuary areas of replication. As the authors point out, proving it does not occur is likely not possible, and proving it does occur is likely very dependent on the population studied and the design of the intervention. Whether the consequences of this replication, in the absence of evolution toward resistance, have clinical significance is a challenging question to address.

      It is important to note that INSTI-based therapy may have a different impact on HIV replication events, resulting in differences in virus release for specific cell types (those responsible for "second phase" decay), by blocking integration in cells that have completed reverse transcription prior to ART initiation but have yet to be fully activated. In a PI or NNRTI-based regimen, those cells will release virus, whereas with an INSTI-based regimen, they will not.

      Given the very small sample size, there is a substantial risk of imbalance between the groups in important baseline measures. Unfortunately, with the small sample size, a non-significant P value is not helpful when comparing baseline measures between groups. One suggestion would be to provide the full range as opposed to the inter-quartile range (essentially only 5 or 6 values). The authors could also report the proportion of participants with baseline HIV RNA target not detected in the two groups.

      We thank Reviewer #3 for their thoughtful and balanced review. We are grateful for the recognition of the strength of the Introduction, the complexity of evaluating residual replication, and the technical execution of the assays. We also appreciate the insightful suggestions for improving the clarity and transparency of our results and discussion.

      We will revise the manuscript to address several of the reviewer’s key concerns. We agree that the small sample size increases the risk of baseline imbalances. We will acknowledge these limitations in the revised manuscript. We will provide both the full range and the IQR in Table 1 in the revised manuscript.

      A suggestion that there is a critical imbalance between groups is that the control group has significantly lower total HIV DNA in PBMC, despite the small sample size. The control group also has numerically longer time of continuous suppression, lower unspliced RNA, and lower intact proviral DNA. These differences may have biased the ability to see changes in DNA and US RNA in the control group.

      We acknowledge the significant baseline difference in total HIV DNA between groups, which we have clearly reported. However, the other variables mentioned, duration of continuous viral suppression, unspliced RNA levels, and intact proviral DNA, did not differ significantly between groups at baseline, despite differences in the median values. These numerical differences do not necessarily indicate a critical imbalance.

      Notably, there was no significant difference in the change in US RNA/DNA between groups (Figure 2C).

      The nonsignificant difference in the change in US RNA/DNA between groups is not unexpected, given the significant between-group differences for both US RNA and total DNA changes. Since the ratio combines both markers, it is likely to show attenuated between-group differences compared to the individual components. However, while the difference did not reach statistical significance (p = 0.09), we still observed a trend towards a greater reduction in the US RNA/Total DNA ratio in the intervention group.

      The fact that the median relative change appears very similar in Figure 2C, yet there is a substantial difference in P values, is also a comment on the limits of the current sample size.

      Although we agree that, in general, the limited sample size reduces statistical power, we would like to point out that in Figure 2C, while the medians may appear similar, the ranges differ between groups. At days 56 and 84, the median fold changes from baseline are indeed close, but the full interquartile range in the DTG group stays below 1, while in the control group the interquartile range is wider and extends approximately equal distances above and below 1. This explains the difference in p values between the groups.
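
      The point that similar medians can yield very different p values can be illustrated with a minimal sketch. It assumes a paired, nonparametric within-group test such as the Wilcoxon signed-rank test (an assumption; the test used for Figure 2C is not named in this exchange), and the log2 fold changes below are invented for illustration, not taken from the study.

```python
from itertools import product
from statistics import median


def exact_signed_rank_p(diffs):
    """Two-sided exact Wilcoxon signed-rank p value by full enumeration.

    Suitable only for small samples, since it visits all 2**n sign patterns.
    """
    values = [d for d in diffs if d != 0]  # zeros carry no sign information
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):  # assign average ranks to tied absolute values
        j = i
        while j + 1 < len(order) and abs(values[order[j + 1]]) == abs(values[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    expected = sum(ranks) / 2  # mean of W+ under the null hypothesis
    observed = sum(r for r, v in zip(ranks, values) if v > 0)
    deviation = abs(observed - expected)
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w_plus = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w_plus - expected) >= deviation - 1e-12:
            extreme += 1
    return extreme / 2 ** len(ranks)


# Hypothetical log2 fold changes from baseline (negative = decrease).
tight_group = [-0.9, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2]  # all below zero
wide_group = [-1.8, -1.1, -0.6, -0.5, 0.4, 0.9, 1.5]      # straddles zero

p_tight = exact_signed_rank_p(tight_group)
p_wide = exact_signed_rank_p(wide_group)
print(median(tight_group), p_tight)
print(median(wide_group), p_wide)
```

      Both hypothetical groups share the same median, yet only the group whose values all fall on one side of zero produces a small p value, which is the pattern described for the interquartile ranges above.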

      The text should report the median change in US RNA and US RNA/DNA when describing Figures 2A-2C.

      These data are already reported in the Results section (lines 164–166): "By day 84, US RNA and US RNA/total DNA ratio had decreased from day 0 by medians (IQRs) of 5.1 (3.3–6.4) and 4.6 (3.1–5.3) fold, respectively (p = 0.016 for both markers)."

      The statistical comparison of changes in IPDA results between groups should be reported. The presentation of the absolute values of all the comparisons in the supplemental figures is a strength of the manuscript.

      In the assessment of ART intensification on immune activation and exhaustion, the fact that none of the comparisons between randomized groups were significant should be noted and discussed.

      We would like to point out that a statistically significant difference between the randomized groups was observed for the frequency of CD4<sup>+</sup> T cells expressing TIGIT, as shown in Figure 3A and reported in the Results section (p = 0.048).

      The changes in CD4:CD8 ratio and sCD14 levels appear counterintuitive to the hypothesis and are commented on in the discussion.

      Overall, the discussion highlights the significant changes in the intensified group, which are suggestive. There is limited discussion of the comparisons between groups where the results are less convincing.

      We will temper the language accordingly and add commentary on the limited and modest nature of these changes. Similarly, we will expand our discussion of counterintuitive findings such as the CD4:CD8 ratio and sCD14 changes.

      The limitations of the study should be more clearly discussed. The small sample size raises the possibility of imbalance at baseline. The supplemental figures (S3-S5) are helpful in showing the differences between groups at baseline, and the variability of measurements is more apparent. The lack of blinding is also a weakness, though the PK assessments do help (note that 3TC levels rise substantially in both groups for most of the time on study; Figure S2).

      The many assays and comparisons are listed as a strength. However, the many comparisons raise the possibility of finding significance by chance. In addition, if there is an imbalance at baseline, outcomes that measure related parameters will move in the same direction.

      We agree that the multiple comparisons raise the possibility of chance findings but would like to stress that in an exploratory study like this it is very important to avoid a type II error. In addition, the consistent directionality of the most relevant outcomes (US RNA and intact DNA) lends biological plausibility to the observed effects.
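
      The trade-off discussed here between chance findings and type II error can be made concrete with a minimal sketch. It assumes independent tests, which is a simplification because related assays are correlated, and the number of comparisons used is hypothetical rather than a count taken from the manuscript.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


alpha = 0.05
for n_tests in (1, 5, 20):  # hypothetical numbers of comparisons
    fwer = familywise_error_rate(alpha, n_tests)
    cutoff = bonferroni_threshold(alpha, n_tests)
    print(f"{n_tests:>2} tests: FWER = {fwer:.3f}, Bonferroni cutoff = {cutoff:.4f}")
```

      A stricter per-test cutoff lowers the chance of a false positive but also lowers power, which is why exploratory studies often report unadjusted p values alongside an explicit caveat.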

      The limited impact on activation and inflammation should be addressed in the discussion, as they are highlighted as a potentially important consequence of intermittent, not sustained replication in the introduction.

      The study is provocative and well executed, with the limitations listed above. Pharmacokinetic analyses help mitigate the lack of blinding. The major impact of this work is if it leads to a much larger randomized, controlled, blinded study of a longer duration, as the authors point out.

      Finally, we fully endorse the reviewer’s suggestion that the primary contribution of this study lies in its value as a proof-of-concept and foundation for future randomized, blinded trials of greater scale and duration. We will highlight this more clearly in the revised Discussion.

    1. eLife Assessment

      Tropical single-island endemic bird populations are particularly vulnerable to climate change. The authors investigate genetic evidence of how such species dealt with climate changes in the past as a possible predictor for how they will respond to change in the future, which could provide an important example for the fields of conservation genetics and island biogeography. The authors' integration of genomics and habitat modeling is commendable, but we find that the support for their conclusions is incomplete: at times, the results presented appear to contradict each other, the authors do not fully account for key variables, and the limited taxonomic scope may cause problematic biases for the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors combine PSMC and habitat modeling to try to connect habitat change during the Last Glacial Period to changes in Ne.

      Strengths:

      Observing how tropical single-island endemic bird species responded to habitat change in the past may help inform conservation interventions for these particularly vulnerable species. The combination of genomics and habitat modeling is a good idea - this sort of interdisciplinary thinking is what is needed to tackle these complex questions. Additionally, the use of PSMC makes it possible to perform this analysis on poorly-studied species with only a single genome available.

      Room for Improvement:

      Why coalescent Ne is a better predictor of extinction risk than current genomic diversity, or current Ne, isn't explicitly explained. PSMC in particular has many caveats, and some are not acknowledged or adequately addressed by the authors. For example, the authors note that population structure is a confounding factor with PSMC, but that it is not a problem in this instance. They do not provide compelling evidence for why this would be the case; they simply state that the species studied are all single-island endemics. However, single-island endemic species are not necessarily panmictic; this is even less likely to be true for species studied here that inhabit a large geographic area (i.e., Australian species). Differing PSMC parameters may also impact results: the differences between passerines and non-passerines were one of their main results, but they do not provide any analysis to show that this difference was not driven by the different mutation rates used for the two groups.
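
      The concern about mutation rates can be illustrated with a minimal sketch of how PSMC output is usually rescaled. It assumes the conventional scaling, in which N0 = theta0 / (4 * mu * bin_size), time in generations is 2 * N0 * t_k, and Ne is lambda_k * N0. The function name and every numeric value below are hypothetical and are not the parameters used in the manuscript.

```python
def scale_psmc(theta0, t_k, lambda_k, mu, gen_time, bin_size=100):
    """Convert scaled PSMC output to years and effective population size.

    theta0, t_k, and lambda_k are the scaled quantities PSMC estimates;
    mu is the per-site, per-generation mutation rate supplied by the user.
    """
    n0 = theta0 / (4.0 * mu * bin_size)
    years = [2.0 * n0 * t * gen_time for t in t_k]
    ne = [n0 * lam for lam in lambda_k]
    return years, ne


# Hypothetical scaled output shared by both runs below.
theta0 = 0.02
t_k = [0.1, 0.5, 1.0]
lambda_k = [1.0, 2.0, 0.5]

# Same genome-derived estimates, two assumed mutation rates.
years_slow, ne_slow = scale_psmc(theta0, t_k, lambda_k, mu=2e-9, gen_time=2.0)
years_fast, ne_fast = scale_psmc(theta0, t_k, lambda_k, mu=4e-9, gen_time=2.0)

for y_s, n_s, y_f, n_f in zip(years_slow, ne_slow, years_fast, ne_fast):
    print(f"mu=2e-9: {y_s:,.0f} yr, Ne={n_s:,.0f} | mu=4e-9: {y_f:,.0f} yr, Ne={n_f:,.0f}")
```

      Under this scaling, doubling the assumed mutation rate halves both the inferred dates and the inferred Ne, so a fixed window such as the LIG to LGM interval lands on a different part of each curve when groups are scaled with different rates.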

      Parameters for many steps are not described, and choices that are described (such as the PSMC parameters) are not always fully explained. It is unclear why all data was mapped to the autosomes rather than removing reads that map to the sex chromosomes first. Using all the data, the reads belonging to the sex chromosomes could potentially map to other areas of the genome. It does not seem like a mapping quality filter was used, so these potential spurious alignments would not have been removed prior to analysis.

      There are points where the results are described in ways that appear to potentially differ from the supplementary figures. The authors state that even for species where PSMC results differed between models, "trends of Ne increase or decrease from the LIG to LGM were robust across all three PSMC models considered." The figures in the supplement for Pachycephala philippinensis, Rhynochetos jubatus, and Zosterops hypoxanthus appear to potentially contradict this statement, but it is difficult to tell, as the time period observed is not clearly marked on the graphs. How this robustness of trends was determined is not explained, leaving the precision of the analysis unclear.

      Table 1 also includes some information that contradicts what is in the Supplementary Tables, leading to a lack of clarity. Centropus unirufus, Chaetorhynchus papuensis, and Cnemophilus loriae are not included in Supplementary Table 4. Table 1 says Eulacestoma nigropectus, Paradisaea rubra, and Parotia lawesii did not undergo PSMC analysis, but Supplementary Table 4 says PSMC and modeling trends matched for these species. Table 1 says Rhagologus leucostigma underwent both PSMC and climate modeling, but Supplementary Table 4 says "NA" as if it was missing one of these analyses.

      Additionally, some of the results appear to contradict each other. For example, they show that there is no impact of habitat change in larger-bodied species, but also that larger-bodied species saw a decrease in Ne during the LGP. In another example, they state that when a species saw an increase in habitat during the LGP, they also had an increase in Ne. However, they also state that this was not the case for non-passerines.

      Ecosystems are highly complex; there may also be other variables influencing past demographic change other than those explored here. Results should be interpreted with caution.

    3. Reviewer #2 (Public review):

      Summary and strengths:

      In this manuscript, Karjee and colleagues used coalescent-based effective population size reconstruction (PSMC) from single genomes to understand past population trends in island birds and related this to life history traits and glacial patterns. This concept is fairly new, as there are still relatively few multiple PSMC synthesis studies. I also thought that the focus on island endemics was unique and adds value to this paper. I enjoyed seeing a paper focused on South East Asia and think that this could help contribute to our knowledge of the important biodiversity within this region.

      Major weaknesses:

      My biggest concern with this paper is that the analyses are limited to 20-30 species, and significant taxonomic bias is present (there are multiple species of passerine but only 1-2 representatives of other groups). While this is not an issue alone, many of the life history traits or geographical traits are conflated with phylogenetic diversity (e.g., there are no large-bodied passerines). Thus, it is my opinion that the impact of these drivers of past population size is conflated and cannot be disentangled with the current data. The authors themselves state that the core hypothesis surrounding Ne and habitat availability is not supported by their entire dataset (only seen in Passerines). This was not clear enough in the abstract, and conclusions cannot be drawn here as the impact of taxonomy cannot be separated from data richness, traits, etc. The PSMC analysis was done according to the most recent recommendations, and this part of the manuscript is fairly robust. However, in several places, it is incorrectly stated that the PSMC measures or can infer genetic diversity; PSMC only infers past effective population size. It cannot measure genetic diversity in the past. I cannot review the habitat reconstruction modelling as I am a conservation genomics specialist.

      Appraisal:

      I am not convinced about the findings within the paper. I do not think that the results are sufficiently supported at this time, largely due to the conflation of taxonomy with other variables. As this type of comparison is new, I do think that there is a chance for reasonable impact on the field of genomics and island biogeography if the manuscript's constraints are addressed. I do not see scope for impact on conservation at this time and find the conclusions in the abstract regarding conservation relevance to be unfounded.

    4. Author response:

      We thank the editors and the reviewers for their positive comments regarding our manuscript and the methodological approach we have taken to understand the historical demographic response of endemic island birds to climate change. We acknowledge the issues of uneven sample sizes and plan to include additional species of island endemic birds for which genomic data is now available. As requested by reviewer 1, we will also address the issues related to the PSMC analysis in the revised version of the manuscript.

    1. eLife Assessment

      This study presents important findings that enhance our understanding of immune cell interactions in the context of chronic HIV-1 infection. The evidence supporting the conclusions is convincing. The authors have employed appropriate and validated methodologies, including detailed data reprocessing and batch correction to account for inter-donor variability. The inclusion of supplementary figures and analyses, such as cell communication inference, further substantiates the robustness of the findings. Overall, this work contributes to our understanding of HIV-1 immune evasion and highlights potential therapeutic targets for reservoir eradication.

    1. eLife Assessment

      This study presents valuable findings on the relationship between nutrient availability and NAD/NADH levels, which in turn regulate biomass production in cancer cells. The authors provide solid evidence to support their claims, offering insight into why it is difficult to predict which nutrients limit cancer cell growth: both cell type and nutrient availability together determine the oxidative capacity that constrains the synthesis of various metabolic intermediates. The manuscript will be of interest to researchers working in cancer and cell metabolism.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates how cellular NAD/NADH ratios are controlled in cancer cell lines in vitro. The authors build on previous work, which shows that serine synthesis is sensitive to NAD/NADH ratios and PHGDH expression. Here, the authors demonstrate that serine synthesis is variable across a panel of cell lines, even when controlling for expression of serine synthesis enzymes such as PHGDH. The authors show that cellular NAD/NADH ratios correlate with the ability to synthesize serine and grow in serine-deprived environments when PHGDH levels remain constant. Investigating this variability in NAD/NADH ratios, the authors find that the cells that can positively respond to serine deprivation are able to increase oxygen consumption and cellular NAD/NADH ratios. Cells that do not increase oxygen consumption in response to serine deprivation do not increase NAD/NADH ratios and cannot grow well without serine. The authors go on to show that in cells with the ability to increase oxygen consumption upon serine deprivation, PHGDH expression alone is sufficient to fully restore growth without serine; in cells that cannot increase oxygen consumption, both PHGDH expression and interventions to increase NAD/NADH ratios are required to increase growth. Thus, cells need both PHGDH and NAD/NADH increases to maximize serine synthesis in response to serine deprivation. The authors previously showed that lipid synthesis likewise requires NAD regeneration. Interestingly, one cell line that does not increase oxygen consumption in response to serine limitation tends to increase oxygen consumption in response to lipid deprivation; accordingly, depriving this cell line of lipids increases the synthesis of serine.
Together, these findings show that how cells respond to nutrient deprivation is highly variable and that the response to nutrient deprivation (for example, whether or not oxygen consumption is increased) will determine how well cells tolerate depletion of nutrients with related biosynthetic constraints. This work sheds light on the complexity of cancer cell metabolism and helps to explain why it is difficult to predict which nutrients will be limiting to any cancer cell type or environment.

      Strengths:

      (1) The authors use multiple interventions to manipulate NAD/NADH ratios in cells.

      (2) Experiments are well controlled and appropriately interpreted.

      Weaknesses:

      Overall, the data support the conclusions of the manuscript. I have only two minor comments and suggestions:

      (1) Figure 2B/C: data are presented as relative to +serine, which shows how some cells respond to -serine, but may also be of interest to see how absolute (not relative) NAD/NADH levels correlate with serine synthesis and serine-independent proliferation. In other words, is it the dynamic increase in the ratio that is most important, or the absolute level of the ratio?

      (2) Line 177-178: the authors write, "We hypothesized that the elevated NAD+/NADH ratio represented a cellular response to make the NAD+/NADH ratio more oxidized to enable serine synthesis". I recommend modest edits to avoid anthropomorphizing. It is possible that the ratio responds for reasons yet to be determined and not necessarily because the cell is deliberately trying to enable serine synthesis.

    3. Reviewer #2 (Public review):

      In the manuscript "Cancer cells differentially modulate mitochondrial respiration to alter redox state and enable biomass synthesis in nutrient-limited environments", Chang et al investigate how cancer cells respond to the limitation of certain environmental nutrients by regulating the cellular NAD+/NADH ratio. They focus on serine and lipid metabolism, pathways known to be controlled by the NAD+/NADH ratio, and propose that changes in mitochondrial respiration in response to deprivation of these nutrients can influence the NAD+/NADH ratio, thereby impacting biomass synthesis.

      While the study is descriptive in nature and does not investigate specific molecular mechanisms that explain the crosstalk between nutrient availability and mitochondrial redox changes, the experimental component is robust, and the conclusions are well supported by the results. Some suggestions could further refine the conclusions and enhance the quality of the manuscript.

      Main critiques:

      (1) Throughout the manuscript, the authors utilise the number of cell doublings per day as an endpoint readout of cell proliferation. It would be advisable to include a quantification of the cell number and to display the proliferation rate over time. This would provide valuable insights into the timeline of cellular responses and avoid potential confounding effects associated with the use of Sulforhodamine B dye, an indirect measure of cell proliferation based on protein content, which may be influenced by some of the interventions. Furthermore, it will help determine whether specific treatments reduce cellular doublings resulting from cell death. This concern is particularly evident in treatments with rotenone, e.g., Fig. 1G, where the increase in doublings could be attributed to cell death.

      (2) The authors propose a model in which the deprivation of extracellular nutrients impacts mitochondrial respiration, which in turn increases the NAD+/NADH ratio and ultimately affects metabolic biosynthetic pathways that occur in the cytosol, such as serine biosynthesis. The mechanism by which nutrient availability is sensed and transmitted across different cellular compartments to regulate mitochondrial redox status remains unclear. This concern is particularly relevant for serine metabolism, as its synthesis occurs in the cytosol, but the authors connect it to mitochondrial respiration. Compartment-specific measurements of NAD+/NADH ratio would help to understand to what extent the redox state is affected by nutrients in the mitochondria and in the cytoplasm (see also minor critiques point 2). Moreover, the use of the genetic tool LbNox could be employed to manipulate the NAD+/NADH ratio in a compartment-specific manner, while also avoiding the toxicity of certain compounds, such as rotenone. This set of experiments would add depth to the investigation, which might otherwise appear too descriptive.
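
      The request in critique (1) for proliferation over time, rather than a single endpoint, can be illustrated with a minimal sketch. It assumes direct cell counts at several time points; the function names and the counts are hypothetical and do not come from the manuscript.

```python
import math


def doublings_per_day(count_start, count_end, days):
    """Endpoint readout: average population doublings per day."""
    return math.log2(count_end / count_start) / days


def interval_rates(days, counts):
    """Doublings per day within each consecutive pair of time points."""
    return [
        doublings_per_day(counts[i], counts[i + 1], days[i + 1] - days[i])
        for i in range(len(counts) - 1)
    ]


# Hypothetical counts: growth for two days, then a plateau.
days = [0, 1, 2, 3]
counts = [10_000, 20_000, 40_000, 40_000]

endpoint = doublings_per_day(counts[0], counts[-1], days[-1] - days[0])
per_interval = interval_rates(days, counts)
print(f"endpoint rate: {endpoint:.2f} doublings/day")
print("per-interval rates:", per_interval)
```

      The endpoint value averages over the whole experiment, so an arrest or loss of cells late in the time course appears only as a lower overall rate, whereas the per-interval rates show when it occurred.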

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript by Chang and colleagues provides new insights into how cancer cells adapt their metabolism under nutrient-deprived conditions. They find cells respond differentially to serine and lipid deprivation via oxidising the cell redox state, which enables biomass synthesis and cell proliferation. They identified mitochondrial respiration as the major mechanism that dictates the endogenous NAD+/NADH ratio. By incorporating a dual stress paradigm, serine and lipid deprivation, the study further suggests that the NAD+/NADH ratio can serve as a link to orchestrate the complex interplay between multiple nutrient changes in the tumour microenvironment.

      Strengths:

      A novel aspect of this study is the idea that cancer cells are not uniformly passive victims of nutrient limitation; some can actively invoke endogenous NAD+ regeneration to combat nutrient stress. The conclusion is well-supported by comparing multiple cell lines from different tissues and genetic backgrounds, which improves generalizability. While most of the smaller conclusions align with common reasoning and expectations, the step-by-step deduction that leads to a novel 'big picture' is commendable. Another notable strength is the integration of dual stress (lipid and serine deprivation), which better mimics the complex tumor microenvironment with multiple nutrient fluctuations, raising the translational potential of these findings. The observation that lipid-deprived cells can stimulate serine synthesis and support proliferation in a subset of cancer cell lines offers a novel perspective on metabolic plasticity under starvation conditions.

      Weaknesses:

      Although the authors derive a novel and valuable overarching concept, the presentation of this "big picture" is not clearly articulated, making it less accessible to readers outside the immediate field. It would greatly enhance the manuscript to include a clearer summary of the overarching model and its implications. Additionally, discussing the potential clinical significance and applications of the findings would increase the relevance and broader impact of the work. Finally, the manuscript's clarity and credibility are undermined by inconsistent figure labeling and the lack of statistical analysis, particularly for the Western blot data.

      While this study identifies changes in serine synthesis, mitochondrial respiration, PHGDH protein levels, and NAD+/NADH ratio in different cell lines, some of these relationships appear correlative rather than causally established (Figure 2; Figure 5; Figure 6). Some claims are thus overinterpreted. For example, the co-occurrence of increased NAD+/NADH ratio and citrate levels under lipid deprivation in A549 cells does not establish causality (Figure 5). Direct perturbation experiments that manipulate NAD+/NADH and assess downstream effects on citrate synthesis would substantially strengthen the conclusions.

      The study focuses predominantly on mitochondrial respiration as a source of NAD+ regeneration. However, it will also be interesting to check other significant pathways, such as NAD+ salvage, which have been implicated in supporting serine biosynthesis. In addition, the subcellular distribution of NAD+ may distinguish whether some cells are truly redox-unresponsive. Mitochondrial NAD+ regeneration might counteract the cytosolic NAD+ consumption, rendering a relatively stable intracellular NAD+/NADH ratio. The malate-aspartate shuttle can be an interesting aspect.

      The authors should acknowledge the limitations of short-term isotope tracing in their experimental design. Differences in metabolic rates across cell lines can affect the kinetics of metabolite labeling, limiting the direct comparability of metabolic fluxes between them. As a result, observed changes may reflect transient adaptations rather than stable metabolic reprogramming. It is important to clarify that the study primarily captures short-term responses, and the conclusions may not extrapolate to longer-term adaptations or protein-level changes under sustained nutrient stress.

    1. eLife Assessment

      Weiss et al. provide important new insights and convincing evidence to further our mechanistic understanding of how antigen presentation shapes skin persistence of CD8+ TRM. Using a mouse model for inducible genetic ablation of transforming growth factor beta receptor 3 (TGFBR3) in CD8+ T cells, they demonstrate TGFBR3's role in regulating CD8+ TRM persistence in skin. Furthermore, they show that the strength of T cell receptor (TCR) engagement upon initial CD8+ TRM skin seeding has a positive influence on subsequent TRM expansion following a secondary antigen-reencounter. Together, these mechanisms add to our understanding of how the skin CD8+ T cell repertoire is dynamically responsive to topical antigen.

    2. Reviewer #1 (Public review):

      Summary:

      Weiss et al. seek to delineate the mechanisms by which antigen-specific CD8+ T cells outcompete bystanders in the epidermis when active TGF-b is limiting, resulting in selective retention of these cells and more complete differentiation into the TRM phenotype.

      Strengths:

      They begin by demonstrating that at tissue sites where cognate antigen was expressed, CD8+ T cells adopt a more mature TRM transcriptome than cells at tissue sites where cognate antigen was never expressed. By integrating their scRNA-Seq data on TRM with the much more comprehensive ImmGenT atlas, the authors provide a very useful resource for future studies in the field. Furthermore, they conclusively show that these "local antigen-experienced" TRM have increased proliferative capacity and that TCR avidity during TRM formation positively correlates with their future fitness. Finally, using an elegant experimental strategy, they establish that TCR signaling in CD8+ T cells in epidermis induces TGFBRIII expression, which likely contributes to endowing them with a competitive advantage over antigen-inexperienced TRM.

      Weaknesses:

      The main weakness in this paper lies in the authors' reliance on a single model to derive conclusions on the role of local antigen during the acute phase of the response by comparing T cells in model antigen-vaccinia virus (VV-OVA) exposed skin to T cells in contralateral skin exposed to DNFB 5 days after the VV-OVA exposure. In this setting, antigen-independent factors may contribute to the difference in CD8+ T cell number and phenotype at the two sites. For example, it was recently shown that very early memory precursors (formed 2 days after exposure) are more efficient at seeding the epithelial TRM compartment than those recruited to skin at later times (Silva et al, Sci Immunol, 2023). DNFB-treated skin may therefore recruit precursors with reduced TRM potential. In addition, TRM-skewed circulating memory precursors have been identified (Kok et al, JEM, 2020), and perhaps VV-OVA exposed skin more readily recruits this subset compared to DNFB-exposed skin. Therefore, when the DNFB challenge is performed 5 days after vaccinia virus, the DNFB site may already be at a disadvantage in the recruitment of CD8+ T cells that can efficiently form TRM. In addition, CD8+ T cell-extrinsic mechanisms may be at play, such as differences in myeloid cell recruitment and differentiation or local cytokine and chemokine levels in VV-infected and DNFB-treated skin that could account for differences seen in TRM phenotype and function between these two sites. Although the authors do show that providing exogenous peptide antigen at the DNFB-site rescues their phenotype in relation to the VV-OVA site, the potential antigen-independent factors distinguishing these two sites remain unaddressed. 
In addition, there is a possibility that peptide treatment of DNFB-treated skin initiates a second phase of priming of new circulatory effectors in the local draining lymph nodes that are then recruited to form TRM at the DNFB site, and that the effect does not solely rely on TRM precursors at the DNFB-treated skin site at the time of peptide treatment.

      Secondly, although the authors conclusively demonstrate that TGFBRIII is induced by TCR signals and required for conferring increased fitness to local-antigen-experienced CD8+ TRM compared to local antigen-inexperienced cells, this is done in only one experiment, albeit repeated 3 times. The data suggest that antigen encounter during TRM formation induces sustained TGFBRIII expression that persists during the antigen-independent memory phase. It remains unclear why only the antigen encounter in skin, but not already in the draining lymph nodes, induces sustained TGFBRIII expression. Further characterizing the dynamics of TGFBRIII expression on CD8+ T cells during priming in draining lymph nodes and over the course of TRM formation and persistence may shed more light on this question. Probing the role of this mechanism at other sites of TRM formation would also further strengthen their conclusions and enhance the significance of this finding.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to dissect the mechanistic basis of their previously published finding that encountering cutaneous antigen augments the persistence of CD8+ memory T cells that enter skin (TRM) (Hirai et al., 2021, Immunity). Here they use the same murine model to study the fate of CD8+ T cells after antigen-priming in the lymph nodes, (1) those that re-encounter antigen in the skin via vaccinia virus (VV) versus (2) those that do not encounter antigen in skin but rather are recruited via topical dinitrofluorobenzene (DNFB) (so-called "bystander TRM"). The authors' previous publication establishes that this first group of CD8+ TRM has a persistence advantage over bystander TRM under TGFb-limiting conditions. The current paper advances this finding by elucidating the role of TGFBR3 in regulating CD8+ TRM skin persistence upon topical antigen exposure. A key novelty of the work lies in the generation and use of the CD8+ T cell-specific TGFBR3 knockout model, which allows them to demonstrate the role of TGFBR3 in fine-tuning the degree of CD8+ T cell skin persistence and that TGFBR3 expression is promoted by CD8+ TRM encountering their cognate antigen upon initial skin entry. Future work directly measuring active TGFb in the skin under different conditions would help identify physiologic scenarios that yield active TGFb-limiting conditions, thus establishing physiologic relevance.

      Strengths:

      Technical strengths of the paper include (1) complementary imaging and flow cytometry analyses, (2) integration of their scRNA-seq data with the existing CD8+ TRM literature via pathway analysis, and (3) use of orthogonal models where possible. Using a vaccinia virus (VV) model, with and without ovalbumin (OVA), the authors investigate how topical antigen exposure and TCR strength regulate CD8+ TRM skin recruitment and retention. The authors use both FTY720 and a Thy1.1-depleting antibody to demonstrate that skin CD8+ TRM expand locally following both a primary and secondary recall response to topical OVA application.

      A conceptual strength of the paper is the authors' observation that TCR signal strength upon initial TRM tissue entry helps regulate the extent of their local re-expansion on subsequent antigen re-exposure. They achieved this by applying peptides of varying affinity for the OT-I TCR on the DNFB-exposed flank in tandem with initial VV-OVA + DNFB treatment. They then measured TRM expansion after OVA peptide rechallenge, revealing that encountering a higher-affinity peptide upon skin entry leads to greater subsequent re-expansion. Additionally, by generating an OT-I Thy1.1+ E8i-creERT2 huNGFR Tgfbr3fl/fl (Tgfbr3∆CD8) mouse, the authors were able to elucidate a unique role for TGFBR3 in CD8+ TRM persistence when active TGFb in skin is limited.

      Weaknesses:

      Overall, the authors' conclusions are well supported, although there are some instances where additional controls, experiments, or clarifications would add rigor. The conclusions regarding skin-localized TCR signaling leading to increased skin CD8+ TRM proliferation in situ and increased TGFBR3 expression would be strengthened by assessing skin CD8+ TRM proliferation and TGFBR3 expression in models of high- versus low-avidity topical OVA-peptide exposure. The authors could further increase the novelty of the paper by exploring whether TGFBR3 is regulated at the RNA or protein level. To this end, they could analyze their single-cell RNA sequencing data (Figure 1), comparing Tgfbr3 mRNA in DNFB- versus VV-treated skin.

      For clarity, when discussing antigen exposure throughout the paper, it would be helpful for the authors to be more precise that they are referring to the antigen in the skin rather than in the draining lymph node. A more explicit summary of some of the lab's previous work focused on CD8+ TRM and the role of TGFb would also help readers better contextualize this work within the existing literature on which it builds.

      For rigor, it would be helpful where possible to pair flow cytometry quantification with the existing imaging data. Additional controls, namely enumerating TRM in the opposite, untreated flank skin of VV-only-treated mice and the treated flank skin of DNFB-only-treated mice, would help contextualize the results seen in dually treated mice in Figure 1. In figure legends, we suggest clearly reporting unpaired t-tests comparing relevant metrics within VV- or DNFB-treated groups (for example, VV-OVA PBS vs VV-OVA FTY720 in Figure 3F). Finally, quantifying right and left skin-draining lymph node CD8+ T cell numbers would clarify the skin specificity and cell trafficking dynamics of the authors' model.

    1. eLife Assessment

      This study presents a useful framework for extracting an individuality index to predict subjects' behavior in target tasks. However, the evidence supporting the framework is somewhat incomplete, and the manuscript would benefit from clearer overall framing and greater clarity about its approaches. Overall, this study would be of interest to cognitive and AI researchers who work on cognitive models in general.

    2. Reviewer #1 (Public review):

      Summary

      The manuscript presents EIDT, a framework that extracts an "individuality index" from a source task to predict a participant's behaviour in a related target task under different conditions. However, the evidence that it truly enables cross-task individuality transfer is not convincing.

      Strengths

      The EIDT framework is clearly explained, and the experimental design and results are generally well-described. The performance of the proposed method is tested on two distinct paradigms: a Markov Decision Process (MDP) task (comparing 2-step and 3-step versions) and a handwritten digit recognition (MNIST) task under various conditions of difficulty and speed pressure. The results indicate that the EIDT framework generally achieved lower prediction error compared to baseline models.

      Furthermore, the individuality index appeared to form distinct clusters for different individuals, and the framework was better at predicting a specific individual's behaviour when using their own derived index compared to using indices from other individuals.

      Weaknesses

      (1) Because the "source" and "target" tasks are merely parameter variations of the same paradigm, it is unclear whether EIDT achieves true cross-task transfer. The manuscript provides no measure of how consistent each participant's behaviour is across these variants (e.g., two- vs three-step MDP; easy vs difficult MNIST). Without this measure, the transfer results are hard to interpret. In fact, Figure 5 shows a notable drop in accuracy when transferring between the easy and difficult MNIST conditions, compared to transfers between accuracy-focused and speed-focused conditions. Does this discrepancy simply reflect larger within-participant behavioural differences between the easy and difficult settings? A direct analysis of intra-individual similarity for each task pair - and how that similarity is related to EIDT's transfer performance - is needed.

      (2) Related to the previous comment, the individuality index is central to the framework, yet remains hard to interpret. It shows much greater within-participant variability in the MNIST experiment (Figure S1) than in the MDP experiment (Figure 3). Is such a difference meaningful? It is hard to know whether it reflects noisier data, greater behavioural flexibility, or limitations of the model.

      (3) The authors suggest that the model's ability to generalize to new participants "likely relies on the fact that individuality indices form clusters and individuals similar to new participants exist in the training participant pool". It would be helpful to directly test this hypothesis by quantifying the similarity (or distance) of each test participant's individuality index to the individuals or identified clusters within the training set, and assessing whether greater similarity (or closer proximity) to the clusters in the training set is associated with higher prediction accuracy for those individuals in the test set.
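
      As a minimal sketch of the analysis suggested in point (3), and not the authors' method, the Python snippet below computes each held-out participant's distance to the nearest training individuality index and rank-correlates that distance with prediction error. The array shapes, the function names `nearest_training_distance` and `spearman`, and the random placeholder data are all assumptions made for illustration.

```python
import numpy as np

def nearest_training_distance(test_idx, train_idx):
    """Euclidean distance from each test index vector to its closest training vector."""
    # Pairwise differences with shape (n_test, n_train, dim)
    diff = test_idx[:, None, :] - train_idx[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    return dists.min(axis=1)

def rank(values):
    """Ranks starting at 0 (no tie handling; adequate for continuous values)."""
    return np.argsort(np.argsort(values))

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    return np.corrcoef(rank(x), rank(y))[0, 1]

# Placeholder data only: these are NOT results from the manuscript.
rng = np.random.default_rng(0)
train_idx = rng.normal(size=(50, 2))    # hypothetical training individuality indices
test_idx = rng.normal(size=(10, 2))     # hypothetical held-out individuality indices
test_error = rng.uniform(0.3, 0.9, 10)  # hypothetical per-participant prediction error

d = nearest_training_distance(test_idx, train_idx)
rho = spearman(d, test_error)
print(f"Spearman rho between nearest-neighbour distance and error: {rho:.2f}")
```

      If the clustering hypothesis holds, a clearly positive correlation would be expected on real data (larger distance, larger error); the placeholder data here carry no such relationship by construction.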

    3. Reviewer #2 (Public review):

      This paper introduces a framework for modeling individual differences in decision-making by learning a low-dimensional representation (the "individuality index") from one task and using it to predict behaviour in a different task. The approach is evaluated on two types of tasks: a sequential value-based decision-making task and a perceptual decision task (MNIST). The model shows improved prediction accuracy when incorporating this learned representation compared to baseline models.

      The motivation is solid, and the modelling approach is interesting, especially the use of individual embeddings to enable cross-task generalization. That said, several aspects of the evaluation and analysis could be strengthened.

      (1) The MNIST SX baseline appears weak. RTNet isn't directly comparable in structure or training. A stronger baseline would involve training the GRU directly on the task without using the individuality index, e.g., by fixing the decoder head. This would provide a clearer picture of what the index contributes.

      (2) Although the focus is on prediction, the framework could offer more insight into how behaviour in one task generalizes to another. For example, simulating predicted behaviours while varying the individuality index might help reveal what behavioural traits it encodes.

      (3) It's not clear whether the model can reproduce human behaviour when acting on-policy. Simulating behaviour using the trained task solver and comparing it with actual participant data would help assess how well the model captures individual decision tendencies.

      (4) Figures 3 and S1 aim to show that individuality indices from the same participant are closer together than those from different participants. However, this isn't fully convincing from the visualizations alone. Including a quantitative presentation would help support the claim.

      (5) The transfer scenarios are often between very similar task conditions (e.g., different versions of MNIST or two-step vs three-step MDP). This limits the strength of the generalization claims. In particular, the effects in the MNIST experiment appear relatively modest, and the transfer is between experimental conditions within the same perceptual task. To better support the idea of generalizing behavioural traits across tasks, it would be valuable to include transfers across more structurally distinct tasks.

      (6) For both experiments, it would help to show basic summaries of participants' behavioural performance. For example, in the MDP task, first-stage choice proportions based on transition types are commonly reported. These kinds of benchmarks provide useful context.

      (7) For the MDP task, consider reporting the number or proportion of correct choices in addition to negative log-likelihood. This would make the results more interpretable.

      (8) In Figure 5, what is the difference between "% correct" and "% match to behaviour"? If the two differ, it would help to clarify the distinction in the text or figure captions.

      (9) For the cognitive model, it would be useful to report the fitted parameters (e.g., learning rate, inverse temperature) per individual. This can offer insight into what kinds of behavioural variability the individuality index might be capturing.

      (10) A few of the terms and labels in the paper could be made more intuitive. For example, the name "individuality index" might give the impression of a scalar value rather than a latent vector, and the labels "SX" and "SY" are somewhat arbitrary. You might consider whether clearer or more descriptive alternatives would help readers follow the paper more easily.

      (11) Please consider including training and validation curves for your models. These would help readers assess convergence, overfitting, and general training stability, especially given the complexity of the encoder-decoder architecture.
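
      The quantitative presentation requested in point (4) could take roughly the following form. This is only a sketch under stated assumptions: the individuality indices are taken to be rows of a NumPy array with one participant label per row, Euclidean distance is assumed as the metric, and the function name `within_between_distances` and the toy values are hypothetical.

```python
import numpy as np

def within_between_distances(indices, participant_ids):
    """Mean pairwise Euclidean distance within and between participants."""
    diff = indices[:, None, :] - indices[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    same = participant_ids[:, None] == participant_ids[None, :]
    # Upper triangle counts each pair once and excludes self-pairs.
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)
    within = dists[upper & same].mean()
    between = dists[upper & ~same].mean()
    return within, between

# Toy example: two participants, two index estimates each (hypothetical values).
indices = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
participant_ids = np.array([0, 0, 1, 1])

within, between = within_between_distances(indices, participant_ids)
print(f"within = {within:.3f}, between = {between:.3f}")
```

      Reporting the two means (or their ratio) alongside a label-permutation test would make the clustering claim checkable without relying on the scatter plots alone.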

    4. Reviewer #3 (Public review):

      Summary:

      This work presents a novel neural network-based framework for parameterizing individual differences in human behavior. Using two distinct decision-making experiments, the authors demonstrate the approach's potential and claim that it can predict individual behavior (1) within the same task, (2) across different tasks, and (3) across individuals. While the goal of capturing individual variability is compelling and the potential applications are promising, the claims are weakly supported, and I find that the underlying problem is conceptually ill-defined.

      Strengths:

      The idea of using neural networks for parameterizing individual differences in human behavior is novel, and the potential applications can be impactful.

      Weaknesses:

      (1) To demonstrate the effectiveness of the approach, the authors compare a Q-learning cognitive model (for the MDP task) and RTNet (for the MNIST task) against the proposed framework. However, as I understand it, neither the cognitive model nor RTNet is designed to fit or account for individual variability. If that is the case, it is unclear why these models serve as appropriate baselines. Isn't it expected that a model explicitly fitted to individual data would outperform models that are not? If so, does the observed superiority of the proposed framework simply reflect the unsurprising benefit of fitting individual variability? I think the authors should either clarify why these models constitute a fair control or validate the proposed approach against stronger and more appropriate baselines.

      (2) It's not very clear in the results section what is meant by a within-individual distance being shorter than the between-individual distance. Related to the comment above, was any control analysis performed for this? Also, this analysis appears to have nothing to do with predicting individual behavior. Is this evidence of successfully parameterizing individual differences? Could this be task-dependent, especially since the transfer is evaluated on exceedingly similar tasks in both experiments? I think a bit more discussion of the motivation and implications of these results will help the reader make sense of this analysis.

      (3) The authors have to better define what exactly they mean by transferring across different "tasks" and testing the framework in "more distinctive tasks". All presented evidence, taken at face value, demonstrates transfer across different "conditions" of the same task within the same experiment. It is unclear to me how generalizable the framework will be when applied to different tasks.

      (4) Conceptually, it is also unclear to me how plausible it is that the framework could generalize across tasks spanning multiple cognitive domains (if that's what is meant by more distinctive). For instance, how can an individual's task performance on a Posner task predict task performance on the Cambridge face memory test? Which part of the framework could have enabled such a cross-domain prediction of task performance? I think these have to be at least discussed to some extent, since without it the future direction is meaningless.

      (5) How is the negative log-likelihood, which seems to be the main metric for comparison, computed? Is it based on trial-by-trial response prediction or on the probability of responses, as is usually done in cognitive modelling?

      (6) None of the presented evidence is cross-validated. The authors should consider performing K-fold cross-validation on the train, test, and evaluation split of subjects to ensure robustness of the findings.

      (7) The authors excluded 25 subjects (20% of the data) for different reasons. This is a substantial proportion, especially by the standards of what is typically observed in behavioral experiments. The authors should provide a clear justification for these exclusion criteria and, if possible, cite relevant studies that support the use of such stringent thresholds.

      (8) The authors should do a better job of creating the figures and writing the figure captions. It is unclear which specific claim the authors are addressing with each figure. For example, what is the key message of Figure 2C regarding transfer within and across participants? Why is the presentation of statistics different between the Cognitive model and the EIDT framework plots? In Figure 3, it's unclear what the dots and clusters represent and how they support the authors' claim that the same individual forms clusters. Also, doesn't this experiment have 98 subjects after exclusion? The plot has far fewer than 98 dots, as far as I can tell. Furthermore, I find Figure 5 particularly confusing, as the underlying claim it is meant to illustrate is unclear. Clearer figures and more informative captions are needed to guide the reader effectively.

      (9) I also find the writing somewhat difficult to follow. The subheadings are confusing, and it's often unclear which specific claim the authors are addressing. The presentation of results feels disorganized, making it hard to trace the evidence supporting each claim. Also, the excessive use of acronyms (e.g., SX, SY, CG, EA, ES, DA, DS) makes the text harder to parse. I recommend restructuring the results section to be clearer and significantly reducing the use of unnecessary acronyms.
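
      To make points (5) and (6) concrete, here is a hedged Python sketch of one common convention: a trial-by-trial negative log-likelihood computed from predicted choice probabilities, combined with a K-fold split made at the level of subjects. Whether the manuscript computes its metric this way is exactly what point (5) asks; the function names `mean_nll` and `subject_kfold`, the choice of five folds, and the toy probabilities are assumptions for illustration.

```python
import numpy as np

def mean_nll(pred_probs, choices):
    """Mean negative log-likelihood of the observed choices.

    pred_probs: (n_trials, n_actions) predicted choice probabilities.
    choices:    (n_trials,) integer index of the action actually taken.
    """
    p_chosen = pred_probs[np.arange(len(choices)), choices]
    # Clip to avoid log(0) when a model assigns zero probability to a choice.
    return -np.mean(np.log(np.clip(p_chosen, 1e-12, 1.0)))

def subject_kfold(n_subjects, k, seed=0):
    """Yield (train, test) subject-index arrays; each subject is tested exactly once."""
    order = np.random.default_rng(seed).permutation(n_subjects)
    folds = np.array_split(order, k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]

# Toy predictions for two trials with two actions (hypothetical values).
probs = np.array([[0.5, 0.5], [0.9, 0.1]])
choices = np.array([0, 0])
print(f"mean NLL: {mean_nll(probs, choices):.4f}")

# 98 subjects as mentioned in point (8); k = 5 is an arbitrary choice.
splits = list(subject_kfold(n_subjects=98, k=5))
print([len(test) for _, test in splits])
```

      Splitting by subject rather than by trial keeps all of a participant's data on one side of the split, which is what a claim of generalization to new individuals requires.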

    1. eLife Assessment

      This manuscript makes important contributions to the methodology commonly used to assess representational structures in human and animal brain activity recorded using various techniques (especially fMRI). The evidence in the form of mathematical analysis and simulations is solid. The impact of this contribution could be improved by extending the simulations to assess the effects of violations of explicit and implicit assumptions.

    2. Reviewer #1 (Public review):

      Summary:

      This work presents a formalism for the relationship between neural signals and pooled signals (e.g., voxel estimates in fMRI) and explores why correlation-based and mean-removed Euclidean RDMs perform well in practice. The key assumption is that the pooled estimates are weighted averages, with i.i.d. non-negative weights. Two sets of simulations are used to support the theoretical findings: one based on fully simulated neural data and another that reverse-engineers neural data from an RDM estimated from real macaque data. The authors also discuss limitations of their simulations, particularly concerning the i.i.d. assumption of the weights.
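
      For readers unfamiliar with the setup, the pooling assumption and the two dissimilarity measures can be sketched in Python as follows. This is an illustration under assumptions, not the manuscript's simulation code: the dimensions, the uniform weight distribution, the use of squared Euclidean distance, the removal of each condition's mean across channels, and the function names are all choices made here for the example.

```python
import numpy as np

def correlation_rdm(patterns):
    """1 - Pearson correlation between condition patterns (rows)."""
    return 1.0 - np.corrcoef(patterns)

def mean_removed_euclidean_rdm(patterns):
    """Squared Euclidean distance after subtracting each condition's mean across channels."""
    centered = patterns - patterns.mean(axis=1, keepdims=True)
    diff = centered[:, None, :] - centered[None, :, :]
    return (diff ** 2).sum(axis=-1)

rng = np.random.default_rng(0)
n_conditions, n_neurons, n_channels = 6, 500, 40

neural = rng.normal(size=(n_conditions, n_neurons))  # hypothetical neural responses
weights = rng.uniform(size=(n_neurons, n_channels))  # i.i.d. non-negative weights
weights /= weights.sum(axis=0, keepdims=True)        # each channel is a weighted average
pooled = neural @ weights                            # channel (e.g. voxel) patterns

rdm_corr = correlation_rdm(pooled)
rdm_euc = mean_removed_euclidean_rdm(pooled)
print(rdm_corr.shape, rdm_euc.shape)
```

      Both measures discard each condition's mean across channels, so adding a constant offset to a pooled pattern leaves the resulting RDM unchanged.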

      Strengths:

      The strengths of this work include its mathematical rigor and the clear connection that is drawn between the derivations and empirical observations. The simulations were well-designed and easy to follow. One small suggestion: a brief explanation of what is meant by "sparse" in Figure 3 would help orient the reader without requiring them to jump ahead to the methods. Overall, I found the work engaging and insightful.

      Weaknesses:

      Although I appreciate the effort to explore *why* certain dissimilarity measures perform well, it wasn't clear how these findings would inform the practical choices of researchers conducting RDM-based analyses. Many researchers likely already use correlation-based or mean-removed Euclidean distance measures, given their popularity. In that case, how do these results provide additional value or guidance beyond current practice?

      Another aspect that could benefit from further clarification is the core assumption underlying the work - that channel-based activity reflects a non-negative weighted average of neural activity. Is this widely accepted as the most plausible model, or are there alternative relationships that researchers should consider? While this may seem intuitive, it's not something I would expect all readers to be familiar with, and only a single reference was provided to support it (which I unfortunately didn't have time to read). That said, I did appreciate the discussion of the i.i.d. assumption in the discussion section. Can more be said to educate researchers as to when the i.i.d. assumption might be violated?

      I didn't find that the "Simulations based on neural data" section added much, and it risks being misinterpreted. The main difference here is that neural data were reverse-engineered from a macaque RDM and then used in simulations similar to those in the previous section. What is the added value of using a real RDM to generate simulated data? Were the earlier simulations lacking in some way? There's also a risk of readers mistakenly inferring that human dissimilarities have been reconstructed from macaque data, an inference that goes beyond the paper's core message, which focuses on linking neural and channel-based signals from the *same* source. If this section is retained, the motivation should be clarified, and the implied parallel in Figure 6, between the human data and simulated data, should be reconsidered.