  1. Dec 2024
    1. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates how neural cell development is affected in Lowe syndrome. Using neural cultures differentiated from human iPSCs carrying either an LS mutation or a genetically engineered mutation in OCRL, the authors show a depletion of mitochondrial DNA and a decrease in mitochondrial activities that correlate with an increased formation of astrocytes at the expense of neurons. Similar effects on mitochondria and on astrocyte development were observed in an LS mouse model. Moreover, these mutant brain cells are less likely to be ciliated and show a reduction in Sonic Hedgehog signalling.

      Strengths/Weaknesses:

      The study derives strength from the analyses of two different models of Lowe syndrome, both reaching similar conclusions. However, the observed changes in mitochondrial defects, neuronal/astrocytic development, and primary cilia are only correlated, with no attempt to investigate a causal relationship. Moreover, the mouse model is only analysed at the adult stage, providing no insights into the development of the defects. Different brain regions are analysed with immunostainings and qRT-PCR, making it challenging to draw clear correlations between these findings. The quality of the corresponding figures is often poor and the selection of markers is frequently inappropriate. Taken together, these limitations complicate the interpretation of the data and significantly limit the conclusions that can be drawn from the study.

    1. eLife Assessment

      This valuable study proposes a theoretical model of clathrin coat formation based on membrane elasticity that seeks to determine whether this process occurs by increasing the area of a protein-coated patch with constant curvature, or by increasing the curvature of a protein-coated patch that forms in an initially flat conformation (the so-called constant curvature and constant area models, respectively). Identifying energetically favorable pathways and comparing the obtained shapes with experiments provides solid support for the constant-area pathway. This work will be of interest to biologists and biophysicists interested in membrane remodelling and endocytosis. It provides an innovative approach to tackle the question of constant curvature vs. constant area coat protein formation, although some of the model's assumptions are only partially supported by experimental evidence.

    2. Reviewer #1 (Public review):

      Summary:

      The authors develop a set of biophysical models to investigate whether a constant area hypothesis or a constant curvature hypothesis explains the mechanics of membrane vesiculation during clathrin-mediated endocytosis.

      Strengths:

      The models that the authors choose are fairly well-described in the field and the manuscript is well-written.

      Weaknesses:

      It is unclear what is new in this work. If the main finding is that the differences arise in the early stages of endocytosis, then one wonders whether that should be tested experimentally. Also, the roles of clathrin assembly and adhesion are treated as mechanical equilibria, but perhaps the process should be described as time-dependent rather than as an equilibrium. Ultimately, so many models address this question that, without direct experimental comparison, it is hard to place value on the model prediction.

      While an attempt is made to do so with previously published EM images, there is excessive uncertainty both in the data itself, as is usually the case, and in the methods used to symmetrize the data. This reviewer wonders about any goodness of fit when such uncertainty is taken into account.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors employ theoretical analysis of an elastic membrane model to explore membrane vesiculation pathways in clathrin-mediated endocytosis. A complete understanding of clathrin-mediated endocytosis requires detailed insight into the process of membrane remodeling, as the underlying mechanisms of membrane shape transformation remain controversial, particularly regarding membrane curvature generation. The authors compare constant area and constant membrane curvature as key scenarios by which clathrins induce membrane wrapping around the cargo to accomplish endocytosis. First, they characterize the geometrical aspects of the two scenarios and highlight their differences by imposing coating area and membrane spontaneous curvature. They then examine the energetics of the process to understand the driving mechanisms behind membrane shape transformations in each model. In the latter part, they introduce two energy terms: clathrin assembly or binding energy, and curvature generation energy, with two distinct approaches for the latter. Finally, they identify the energetically favorable pathway in the combined scenario and compare their results with experiments, showing that the constant-area pathway better fits the experimental data.

      Strengths:

      The manuscript is well-written, well-organized, and presents the details of the theoretical analysis with sufficient clarity.

      The calculations are valid, and the elastic membrane model is an appropriate choice for addressing the differences between the constant curvature and constant area models.

      The authors' approach of distinguishing two distinct free energy terms (clathrin assembly and curvature generation) and then combining them to identify the favorable pathway is both innovative and effective in addressing the problem.

      Notably, their identification of the energetically favorable pathways, and how these pathways either lead to full endocytosis or fail to proceed due to insufficient energetic drives, is particularly insightful.

      Weaknesses:

      Membrane remodeling in cellular processes is typically studied in either a constant area or constant tension ensemble. While total membrane area is preserved in the constant area ensemble, membrane area varies in the constant tension ensemble. In this manuscript, the authors use the constant tension ensemble with a fixed membrane tension, σ_e. However, they also use a constant area scenario, where 'area' refers to the surface area of the clathrin-coated membrane segment. This distinction between the constant membrane area ensemble and the constant area of the coated membrane segment may cause confusion.

      As mentioned earlier, the theoretical analysis is performed in the constant membrane tension ensemble at a fixed membrane tension. The total free energy E_tot of the system consists of membrane bending energy E_b and tensile energy E_t, which depends on membrane tension, σ_e. Although the authors mention the importance of both E_b and E_t, they do not present their individual contributions to the total energy changes. Comparing these contributions would enable readers to cross-check the results with existing literature, which primarily focuses on the role of membrane bending rigidity and membrane tension.

      The authors introduce two different models, (1,1) and (1,2), for generating membrane curvature. Model 1 assumes a constant curvature growth rate, corresponding to linear curvature growth, while Model 2 relates curvature growth to its current value, resembling exponential curvature growth. Although both models make physical sense in general, I am concerned that Model 2 may lead to artificial membrane bending at high curvatures. Normally, for intermediate bending, ψ > 90°, the bending process is energetically downhill and thus proceeds rapidly. However, Model 2's assumption would accelerate curvature growth even further. This is reflected in the endocytic pathways represented by the green curves in the two rightmost panels of Fig. 4a, where the energy steeply increases at large ψ. I believe a more realistic version of Model 2 would require a saturation mechanism to limit curvature growth at high curvatures.
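
      The distinction drawn above between linear growth, exponential growth, and a saturating alternative can be illustrated with a minimal sketch. It is not taken from the manuscript: the function names, the rate constant, the curvature units, and the choice of a logistic law as the saturation mechanism are all assumptions made here purely for illustration.

```python
import math


def linear_growth(h0, rate, t):
    """Model-1-like growth (assumed form): dH/dt = rate, so H rises linearly."""
    return h0 + rate * t


def exponential_growth(h0, rate, t):
    """Model-2-like growth (assumed form): dH/dt = rate * H, so H rises exponentially."""
    return h0 * math.exp(rate * t)


def saturating_growth(h0, rate, h_max, t):
    """Hypothetical saturating variant: logistic law dH/dt = rate * H * (1 - H / h_max).

    Closed-form solution of the logistic equation; H approaches h_max from below
    when 0 < h0 < h_max.
    """
    return h_max / (1.0 + (h_max / h0 - 1.0) * math.exp(-rate * t))


if __name__ == "__main__":
    # Arbitrary illustrative parameters, not fitted to any data.
    h0, rate, h_max = 0.1, 0.5, 1.0
    for t in (0.0, 5.0, 10.0, 20.0):
        print(
            t,
            linear_growth(h0, rate, t),
            exponential_growth(h0, rate, t),
            saturating_growth(h0, rate, h_max, t),
        )
```

      Under these assumptions the exponential law grows without bound, whereas the logistic variant tracks it at small curvature and then levels off below the ceiling `h_max`, which is the qualitative behaviour a saturation mechanism would be expected to provide.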

    1. eLife Assessment

      This study provides valuable quantitative data and analysis that reveal variations in 'Dorsal' nuclear dynamics along the dorso-ventral axis in the early Drosophila embryo. The evidence that these variations arise from Dorsal/Cactus interactions in dorsal nuclei is convincing, albeit incomplete for understanding the biological implications of these findings for developmental patterning.

    2. Reviewer #1 (Public review):

      Summary:

      Al Asafen and colleagues apply a set of scanning fluorescence correlation spectroscopic approaches (Raster Image Correlation Spectroscopy (RICS), cross-correlation RICS, and pair-correlation function spectroscopy) to address the nuclear-cytoplasmic kinetics of the Dorsal (Dl) transcription factor in early Drosophila embryos. The Toll/Dl system has long been appreciated to establish dorsal-ventral polarity of the embryo through Toll-dependent control of Dl nuclear localization, and provides an example of a morphogen gradient produced with high enough precision to yield robust biophysical measurements of general transcription factor activity and function. By measuring GFP-tagged Dl protein, either in wild-type embryos or in mutant embryos with low/medium/high levels of Toll signaling, the authors report diffusivity of Dl in nuclear and cytoplasmic compartments of the embryo, as well as the fraction of mobile and immobile Dl, which can be correlated with DNA binding through cross-correlation RICS. A model is presented where Cactus/IkB is implicated in preventing Dl from binding to DNA.

      Strengths:

      The experiments on wild-type GFP-tagged Dorsal are performed well, are mostly reported well, and are interpreted fairly.

      Weaknesses:

      The discrepancy between experiment and theory as pertains to Michaelis-Menten kinetics is not fully motivated in the text, and could benefit from a clearer presentation. The experiments performed to distinguish between the contribution of Toll-dependent phosphorylation and Cactus interaction models for limiting Dorsal DNA binding are possibly confounded by the presence of wild-type, GFP-tagged Dorsal protein.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Al Asafen, Clark et al., use fluorescence correlation spectroscopy (FCS) to quantitatively analyze the mobility of Dl along the DV axis of the early Drosophila embryo. Dl is essential for dorsal-ventral (DV) patterning and its gradient initiates the activation of several genes and thereby orchestrates the formation of the Drosophila body plan. While the mechanisms underlying the formation of the Dl gradient have been extensively studied by this group and others, there are some observations for which there is not yet a mechanistic explanation. For example, the peak of the Dl gradient grows continuously during nuclear cycles 10-14. This is likely due to Cact-dependent Dl diffusion and Dl binding to DNA. However, the biophysical parameters governing Dl nuclear dynamics that would support these claims have not been previously measured. In this work, the authors provide evidence that GFP-tagged Dl may be separated into a mobile pool and an immobile pool. Interestingly, the fraction of immobile Dl is position-dependent along the DV axis, revealing more binding to DNA in the ventral than in the dorsal nuclei. This is either due to higher binding affinity in ventral locations (due to Toll-dependent Dl phosphorylation) or to higher Dl-Cact binding in dorsal nuclei that would prevent Dl from binding to DNA. Using dl-mutant alleles, the authors support the latter hypothesis.

      Strengths:

      The manuscript is well written and their conclusions are convincingly supported by their methodology and analysis. As a quantitative study, the biophysical analysis seems rigorous, in general.

      Although this is not the first study that employs FCS to investigate the dynamics of a morphogen, it further exemplifies how these quantitative tools can be used to uncover mechanistic aspects of morphogen dynamics during development. In particular, the manuscript reports novel biophysical parameters of Dl dynamics that will be helpful in future hypothesis-driven modeling studies.

      Weaknesses:

      In my opinion, the main weakness of the manuscript is that the main biological implication of the study, namely that the asymmetry in the fraction of immobile Dl is a result of nuclear Dl-Cact binding which prevents Dl from binding DNA (Figure 5), occurs in a region of the embryo where there is very little Dl anyway (Figure 1A, 5A). While it is interesting that the fraction of immobile Dl increases (just a little, but significantly) in dorsal nuclei in mutants expressing a form of Dl with reduced Cact binding, it is unclear what the biological impact of this effect is in a location where Dl is nearly absent. As can be seen in Figure 3F, the immobile fraction is unaffected in Dl-mutant forms with reduced DNA binding, because it is already very low. It is unlikely that Dl binding to Cact in dorsal nuclei would affect shuttling either, since the fraction is very low anyway.

      While the authors have a very clear understanding of the biology of the Dl gradient, I feel that the manuscript is written more as a 'tools' paper (i.e., to exemplify how FCS methods and analysis can be used for biological discovery). This is ok, but I think that the authors should discuss further what the biological implications of these findings are, other than the contribution to uncovering the biophysical parameters. For example, I think that the implications of the rejected hypothesis (i.e., that Toll-dependent Dl phosphorylation does not seem to have an impact on Dl binding affinities to DNA) are important and should be further discussed (even if no additional experiments are performed). What is then the role of Dl phosphorylation? Perhaps it could have an impact on patterning robustness in lateral regions. The authors should also report in Figure 5 what happens to the fraction of Dl bound to DNA in lateral regions in the reduced Cact binding and reduced Toll phosphorylation mutants.

      The way that position along the DV axis is reported using the nuclear-cytoplasmic-ratio (NCR) in Figures 1-3 is not incorrect, but I wonder if it is the best way of doing it. The reason is that it spreads out a relatively small region of the embryo (the ventral-most locations) and shrinks a relatively large region of the embryo (lateral and dorsal regions), see Figure 1A. Perhaps reporting the NCR in log_2 units would be more appropriate.
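
      The suggested rescaling can be sketched as follows. This is a minimal illustration, assuming the NCR is a strictly positive ratio; the function name and the input values are placeholders chosen for the example and are not measurements from the manuscript.

```python
import math


def log2_ncr(ncr_values):
    """Convert nuclear-cytoplasmic ratios (NCR) to log_2 units.

    A log scale turns equal fold-changes in NCR into equal distances, so the
    high-NCR (ventral-most) range is compressed and the low-NCR (lateral and
    dorsal) range is expanded relative to a linear axis.
    """
    converted = []
    for value in ncr_values:
        if value <= 0:
            # A ratio must be strictly positive for the logarithm to be defined.
            raise ValueError("NCR values must be strictly positive")
        converted.append(math.log2(value))
    return converted


if __name__ == "__main__":
    # Placeholder ratios spanning an 8-fold range (not experimental data).
    example_ncr = [0.5, 1.0, 2.0, 4.0]
    print(log2_ncr(example_ncr))
```

      On this scale an NCR of 1 (equal nuclear and cytoplasmic signal) maps to 0, with nuclear enrichment positive and nuclear depletion negative.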

    1. eLife Assessment

      With compelling electrophysiological and behavioural evidence, this work establishes that the activity of insulin-producing cells (IPCs) depends on the nutritional state in Drosophila and that, as in mammals, there is also an incretin-like effect, with IPCs responding to glucose feeding but not to glucose perfusion. Moreover, the authors demonstrate that DH44 neurons respond to glucose perfusion and, together with IPCs, modulate locomotor activity. This important study on the neuronal regulation of metabolic homeostasis will be of interest to both neuroscience and medical research in diabetes.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents useful insights into the in vivo dynamics of insulin-producing cells (IPCs), key cells regulating energy homeostasis across the animal kingdom. The authors further provide compelling evidence using adult Drosophila melanogaster that IPCs, unlike neighboring DH44 cells, do not respond to glucose directly, but that glucose can indirectly regulate IPC activity after ingestion supporting an incretin-like mechanism in flies similar to mammals. The authors link decreased activity of IPCs to hyperactivity observed in starved flies, a locomotive behavior aimed to increase food search. Furthermore, the authors provide evidence that IPCs receive inhibitory inputs from Dh44 neurons, which are linked to increased locomotor activity.

      This paper is of outstanding interest to scientists aiming to understand metabolic control of circuit dynamics, in particular for internal state-linked behaviors competing with the feeding state.

      Strengths:

      (1) By using whole cell patch clamp recording, the authors convincingly showed the activity pattern and regulation of IPCs and neighboring DH44 neurons under different feeding states and in various refeeding paradigms.

      (2) The paper provides compelling evidence that IPCs are not directly and acutely activated by glucose, but rather through a post-ingestive incretin-like mechanism. In addition, the authors show that Dh44 neurons located adjacent to the IPCs respond to bath application of nutritive sugars contrary to the IPCs.

      (3) The paper also provides useful data on the regulation of IPC activity by Dh44 neurons, which is useful to understand their regulation in vivo.

      No major weaknesses remain in the revised version of this work.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Bisen et al. characterized the state-dependency of insulin-producing cells in the brain of Drosophila melanogaster. They successfully established that IPC activity is modulated by the nutritional state and age of the animal. Interestingly, they demonstrate that IPCs respond to the ingestion of glucose, rather than to perfusion with it, an observation reminiscent of the incretin effect in mammals. The study is well conducted and presented and the experimental data convincingly support the claims made.

      Strengths:

      The study makes great use of the tools available in *Drosophila* research, demonstrating the effect that starvation and subsequent refeeding have on the physiological activity of IPCs as well as on the behavior of flies to then establish causal links by making use of optogenetic tools.

      It is particularly nice to see how the authors put their findings in context to published research and use for example TDC2 neuron activation or DH44 activity to establish baselines to relate their data to.

    4. Reviewer #3 (Public review):

      Although insulin release is essential in the control of metabolism, adjusted to nutritional state, and plays major roles in normal brain function as well as in aging and disease, our knowledge about the activity of insulin-producing (and releasing) cells (IPCs) in vivo is limited.

      In this technically demanding study, IPC activity is studied in the Drosophila model system by fine in vivo patch clamp recordings with parallel behavioral analyses and various optogenetic as well as feeding manipulations.

      The data provide compelling evidence that IPC activity is increased with a slow time course after feeding a high glucose diet. By contrast, IPC activity is not directly affected by rising blood glucose levels. This is reminiscent of the incretin effect known from vertebrates and points to a conserved mechanism in insulin production and release upon sugar feeding.

      Moreover, the data confirm earlier studies that nutritional state strongly affects locomotion. Surprisingly, strong evidence shows that IPC activity makes only a negligible contribution to this. Instead, other modulatory neurons that are directly sensitive to blood glucose levels strongly affect locomotion. Together, these data reveal a network of multiple parallel and interacting neuronal layers to orchestrate the physiological, metabolic, and behavioral responses to the nutritional state. Together with the data from a previous study, this work sets the stage to dissect the architecture and function of this network.

      Strengths:

      State-of-the-art current clamp in situ patch clamp recordings in behaving animals are a demanding but powerful method to provide novel insight into the interplay of nutritional state, IPC activity, and locomotion. The patch clamp recordings and the parallel behavioral analyses are of high quality, as are the optogenetic manipulations. The data showing that starvation silences IPC activity in young flies (younger than 1 week) are excellent. The evidence for the claim that locomotor activity is not increased upon IPC activity but upon the activity of other blood glucose sensitive modulatory neurons (Dh44) is compelling, too. The study provides a great system to experimentally dissect the interplay of insulin production and release with metabolism, physiology, nutritional state, and behavior. Demonstrating the incretin effect in Drosophila provides novel experimental routes to further study it. During the revision process, compelling evidence has been added to underscore the incretin effect, the finding that IPCs themselves do not sense sugars, and that feeding a high sugar diet does not cause unspecific stress responses.

      I found no more weaknesses: The authors have carefully addressed all of my previous critiques by adding compelling new data and carefully revising the text. This paper provides a prime example of how responsible authors can utilize this constructive (but relatively new) reviewing procedure to make a very good manuscript even better.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This study presents useful insights into the in vivo dynamics of insulin-producing cells (IPCs), key cells regulating energy homeostasis across the animal kingdom. The authors provide compelling evidence using adult Drosophila melanogaster that IPCs, unlike neighboring DH44 cells, do not respond to glucose directly, but that glucose can indirectly regulate IPC activity after ingestion supporting an incretin-like mechanism in flies, similar to mammals. The authors link the decreased activity of IPCs to hyperactivity observed in starved flies, a locomotive behavior aimed at increasing food search. 

      Furthermore, there is supporting evidence in the paper that IPCs receive inhibitory inputs from Dh44 neurons, which are linked to increased locomotor activity. However, although the electrophysiological data underlying the dynamics of IPCs in vivo is compelling, the link between IPCs and other potential elements of the circuitry (e.g. octopaminergic neurons) regulating locomotive behaviors is not clear and would benefit from more rigorous approaches. 

      This paper is of interest to cell biologists and electrophysiologists, and in particular to scientists aiming to understand circuit dynamics pertaining to internal state-linked behaviors competing with the feeding state, shown here to be primarily controlled by the IPCs. 

      Strengths: 

      (1) By using whole-cell patch clamp recording, the authors convincingly showed the activity pattern of IPCs and neighboring DH44 neurons under different feeding states. 

      (2) The paper provides compelling evidence that IPCs are not directly and acutely activated by glucose, but rather through a post-ingestive incretin-like mechanism. In addition, the authors show that Dh44 neurons located adjacent to the IPCs respond to bath application of glucose contrary to the IPCs. 

      (3) The paper provides useful data on the firing pattern of 2 key cell populations regulating food-related brain function and behavior, IPCs and Dh44 neurons, results which are useful to understand their in vivo function. 

      Weaknesses: 

      (1) The term nutritional state generally refers to the nutrients which are beneficial to the animal. In Figure 1, the authors showed that IPCs respond to glucose but not proteins. To validate the term nutritional state the authors could test the effect of a non-nutritive sugar (e.g. D-arabinose or L-Glucose) on the post-ingestive physiological responses of the IPCs.

      We thank the referee for this insightful comment. Following their suggestion, we included two new experimental data sets, which we added to Figure 1: We show that IPCs do not respond to the non-nutritive sugar D-arabinose (Figure 1H). In order to further expand this data set and our conclusions, we additionally show that IPCs do respond to fructose, a second nutritive sugar in addition to glucose (Figure 1H). Together, these data sets permit the conclusion that IPCs are sensitive to the ingestion of nutritive sugars, and do not respond to ingestion of non-nutritive sugars or high protein diets. Thus, we validate the term nutritional state.

      (2) It is difficult to grasp the main message from the figures in the result section as some figures have several results subsections referring to different points the authors want to make. The key results of a figure will be easier to understand if they are summarized in one section of the results. Alternatively, a figure can be split into 2 figures if there are several key messages in those figures, e.g. Figures 2 and 3.  

      We appreciate this suggestion and have made several changes to our manuscript to add more clarity. Among other things, we have changed the order of data presentation in Figure 2, as suggested by the referee below, where we now start with the IPC activation data rather than the OAN activation. We also swapped the order of data presentation and split Figure S1 into Figures S1 & S2. Moreover, we re-arranged the panel order in supplementary figure S4. This significantly improved the flow of the results section. Since the figures the referee refers to contain comparative data, for example between diets (Figure 1) or neuron types (Figure 2), we prefer to keep these data sets together. However, we have carefully revised the results section to more clearly relate our statements to individual figure panels.

      (3) The prime investigation of the paper is about the physiological response and locomotive behavioral readout linked to IPCs. The authors do not show a link between OANs and IPCs in terms of functional or behavioral readouts. In Figure 2 the authors first start with stating a link between OAN neurons and locomotion changes resulting from internal feeding states. The flow of the paper would be better if the authors focused on the effect of optogenetic activation of IPCs under different feeding states and their impact on fly locomotion. If the experiments done on optogenetic activation of OANs were to validate the experimental approach the data on OAN neurons is better suited for the supplement without the need of a subsection in the result section on the OANs.  

      We agree with the reviewer’s suggestion and switched the order of the figure panels and text to aid the flow of the manuscript. We now show and discuss the IPC activation data first (Figure 2C-H) and OAN activation afterwards (Figure 2I-K). We did keep the OAN data in the main document, though, since that facilitates comparisons between the small effects of IPC activation and the large, well-established effects of OAN activation.

      (4) Figure 2F shows that optogenetic activation of IPCs in fed flies does not influence their locomotor output. In the text, the conclusion linked to Figure 2F-H states that IPC activation reduces starvation-induced hyperactivity which is a statement more suited to Figure 2I-K. 

      We edited the text accordingly.

      (5) The authors show activation of Dh44 neurons leads to hyperpolarisation of the IPCs. What is the functional link between non-PI Dh44 neurons and the IPCs? Do IPCs express DH44R or is DH44 required for this effect on IPCs? Investigating a potential synaptic or peptidergic link between DH44 neurons and IPCs and its effect on behavior would benefit the paper, as it is so far not well connected. 

      Although we have not performed any experiments dedicated to investigating the functional link between DH44Ns outside the PI and the IPCs in this study, there are two lines of evidence supporting that this connection is relatively direct. First, IPCs do express DH44R1 & R2, as we show in a parallel study in eLife (Held M, et al. ‘Aminergic and peptidergic modulation of Insulin-Producing Cells in Drosophila’. eLife. 2024;13. doi:10.7554/ELIFE.99548.1). Second, we performed functional connectivity experiments using a Leucokinin (LK) driver line in that paper. This driver line labels two pairs of non-PI DH44Ns in the VNC, which are DH44 and LK positive (Zandawala et al 2018). Activating that line leads to inhibition of IPCs, similar to the effect we observed here for DH44N activation. These two lines of evidence suggest that there could be a direct peptidergic connection between DH44+ neurons and IPCs. We have added a paragraph mentioning these experiments to our discussion:

      ‘Notably, the DH44<sup>PI</sup>Ns express the DH44 peptide, as confirmed by anti-DH44 stainings(100). This also applies to a large fraction of neurons labelled in the broad DH44 driver line(100). However, a subset of neurons labelled in the broad line did not exhibit DH44 immunoreactivity(100), and might therefore not actually express the DH44 peptide. Hence, the inhibition of IPCs could be driven by neurons in the DH44 driver line that do not express DH44. A strong candidate for the inhibition are LK and DH44-positive neurons, which are labelled by the broad line(76). In a parallel study, we showed that LK-expressing neurons strongly inhibit IPCs(30), similar to the broad DH44 line used here. Furthermore, evidence from single-nucleus transcriptomic analysis shows that IPCs express DH44-R1 and DH44-R2 receptors(30). Therefore, it is possible that DH44Ns communicate with IPCs through a direct peptidergic connection. Notably, the inhibitory effect of non-PI DH44Ns on IPCs was very strong and fast, suggesting that a connection via classical synapses is more likely. Regardless, our results show that the glucose sensing DH44<sup>PI</sup>Ns and IPCs act independently of each other.’

      Reviewer #2 (Public Review): 

      Summary: 

      In this study, Bisen et al. characterized the state-dependency of insulin-producing cells in the brain of *Drosophila melanogaster*. They successfully established that IPC activity is modulated by the nutritional state and age of the animal. Interestingly, they demonstrate that IPCs respond to the ingestion of glucose, rather than to perfusion with it, an observation reminiscent of the incretin effect in mammals. The study is well conducted and presented and the experimental data convincingly support the claims made. 

      Strengths: 

      The study makes great use of the tools available in *Drosophila* research, demonstrating the effect that starvation and subsequent refeeding have on the physiological activity of IPCs as well as on the behavior of flies to then establish causal links by making use of optogenetic tools. 

      It is particularly nice to see how the authors put their findings in context to published research and use for example TDC2 neuron activation or DH44 activity to establish baselines to relate their data to. 

      Weaknesses: 

      I find the inability of SD to rescue the IPC starvation effect in Figure 1G&H surprising, given that the fully fed flies were raised and kept on that exact diet. Did the authors try to refeed flies with SD for longer than 24 hours? I understand that at some point the age effect would also kick in and counteract potential IPC activity rescue. I think the manuscript would benefit if the authors could indicate the exact age of the SD refed flies and expand a bit on the discussion of that point.  

      We have expanded the first paragraph of our discussion to tackle these questions, in particular the potential effect of aging, as suggested by the referee. We now also indicate the exact age of the flies. Moreover, we have conducted additional experiments in which we added either glucose or arabinose to our standard diet (Figure 1H). As we would have expected based on our hypothesis that the glucose concentration in our standard diet was too low to cause an increase in IPC activity after starvation, we find that feeding standard diet plus glucose increases IPC activity to the same level as glucose only, and that adding arabinose to the standard diet does not lead to increased IPC activity after starvation (Figure 1H).

      The incretin-like effect is exciting and it will be interesting in the future to find out what might be the signal mediating this effect. It is interesting that IPCs in explants seem to be responsive to glucose. I think it would help if the authors could briefly discuss possible sources for the different findings between these in fact very different preparations. Could the absence of the inhibitory DH44 feedback in the *ex-vivo* recordings, for example, play a role? 

      We thank the referee for this interesting point and have expanded our discussion accordingly. We now note that, in particular in brain explants without a VNC, the inhibitory connection we describe might be absent, as the referee suggested: ‘Previous ex vivo studies suggested that IPCs, like pancreatic beta cells, sense glucose cell-autonomously(23,24). Consistent with this, we observed an increase in IPC activity after the ingestion of glucose (Figure 2B). However, IPC activity did not increase during the perfusion of glucose directly over the brain. Importantly, the fly preparations were kept alive for several hours allowing the glucose-rich saline to enter circulation and reach all body parts. Several factors may explain the difference between ex vivo and in vivo preparations. First, in ex vivo studies, certain regulatory feedback mechanisms present in vivo could be absent. For example, the strong inhibitory input IPCs receive from DH44Ns we found would likely be absent in brain explants without a VNC. A lack of inhibitory feedback might allow for more direct glucose sensing by IPCs ex vivo, whereas in vivo, the IPC response could be suppressed by more complex systemic feedback. Second, we attempted to use the intracellular saline formulation employed in a previous ex vivo study(44). However, we observed that IPCs depolarized quickly using this saline, leading to unstable recordings that did not meet our quality standards for in vivo experiments. Another possible explanation for the lack of an effect of glucose might have been that the dominant circulating sugar in flies is trehalose(70,71) which is derived from glucose. When we extended our experiments, we found that trehalose perfusion did not affect IPC activity either, strengthening the idea that IPCs do not directly sense changes in hemolymph sugar levels. Therefore, our findings suggest that, similar to mammals, IPC activity and hence, insulin release, is not simply modulated by hemolymph sugar concentration in Drosophila.’ 

      The incretin-like effect the authors observed seems to start only after 5h which seems longer than in mammals where, as far as I know, insulin peaks around 1h. Do the authors have ideas on how this timescale relates to ingestion and glucose dynamics in flies? 

      We have now included the following section in the discussion to explicitly address the question of different activity dynamics in flies and mammals, but also the limitations of our electrophysiological approach in this regard: ‘We observed that IPC activity increased over a timescale of hours, which is longer compared to the fast insulin response in mammals, where insulin typically peaks within an hour of feeding(97). In flies, insulin levels rise within minutes of refeeding, followed by a drop after 30 min(20). Our experimental techniques limit our ability to capture these fast initial dynamics, since the preparation for intracellular recordings requires tens of minutes, so that we typically recorded IPC activity at least 20 min after the last food ingestion. Notably, studies in fasted mammals have shown that insulin peaks within minutes of refeeding, followed by a rapid decline, with levels stabilizing as feeding continues(98,99). We speculate a similar dynamic could be present in flies, but with our approach, we capture the steady-state reached tens of minutes after food ingestion rather than a potential initial peak.’ 

      The authors mention "a decrease in the FV of IPC-activated starved flies even before the first optogenetic stimulation (Figure 2I),". Could this be addressed by running an experiment in darkness, only using the IR illumination of their behavioral assay? 

      We thank the referee for pointing out this unexpected result. We discuss this in more detail in the new version of our manuscript and expand on the reasons for not performing these optogenetic activation experiments in the dark: First, the red LED required to activate CsChrimson triggers strong startle responses in dark-adapted flies, which mask other behavioral effects, in particular subtle ones such as those observed for IPCs. The startle response is much reduced when performing experiments under low background light conditions. Second, flies, at least in our hands, do not exhibit robust foraging behavior or starvation-induced hyperactivity in the dark, which is critical for our behavioral experiments. However, we also explain in our discussion that we believe the effect of background illumination is relatively small, since flies expressing CsChrimson in OANs or DH44Ns show comparable activity levels to controls. Hence, a part of this effect is likely attributable to leak currents induced by CsChrimson expression. We would like to point out though that we are careful in our description of the IPC effect on behavior, and focus on the fact that it is considerably smaller than the effects of other modulatory neurons (DH44Ns and OANs).

      The authors show an inhibitory effect of DH44 neuron activation on IPC activity. They further demonstrate that DH44PI neurons are not the ones driving this and thus conclude that "...IPCs are inhibited by DH44Ns outside the PI.". As the authors mentioned the broad expression of the DH44-Gal4 line, can they be sure that the cells labeled outside the PI are actually DH44+? If so they should state this more clearly, if not they should adapt the discussion accordingly.   

      We have substantially added to our discussion of this point, according to the referee’s great suggestion. In short, the broad line includes neurons that are DH44 positive and neurons that are not: ‘Notably, the DH44<sup>PI</sup>Ns express the DH44 peptide, as confirmed by anti-DH44 stainings(100). This also applies to a large fraction of neurons labelled in the broad DH44 driver line(100). However, a subset of neurons labelled in the broad line did not exhibit DH44 immunoreactivity(100), and might therefore not actually express the DH44 peptide. Hence, the inhibition of IPCs could be driven by neurons in the DH44 driver line that do not express DH44.’

      Reviewer #3 (Public Review): 

      Although insulin release is essential in the control of metabolism, adjusted to nutritional state, and plays major roles in normal brain function as well as in aging and disease, our knowledge about the activity of insulin-producing (and releasing) cells (IPCs) in vivo is limited. 

      In this technically demanding study, IPC activity is studied in the Drosophila model system by fine in vivo patch clamp recordings with parallel behavioral analyses and optogenetic manipulation. 

      The data indicate that IPC activity is increased with a slow time course after feeding a high-glucose diet. By contrast, IPC activity is not directly affected by increasing blood glucose levels. This is reminiscent of the incretin effect known from vertebrates and points to a conserved mechanism in insulin production and release upon sugar feeding. 

      Moreover, the data confirm earlier studies that nutritional state strongly affects locomotion. Surprisingly, IPC activity makes only a negligible contribution to this. Instead, other modulatory neurons that are directly sensitive to blood glucose levels strongly affect modulation. Together, these data indicate a network of multiple parallel and interacting neuronal layers to orchestrate the physiological, metabolic, and behavioral responses to nutritional state. Together with the data from a previous study, this work sets the stage to dissect the architecture and function of this network. 

      Strengths: 

      State-of-the-art current clamp in situ patch clamp recordings in behaving animals are a demanding but powerful method to provide novel insight into the interplay of nutritional state, IPC activity, and locomotion. The patch clamp recordings and the parallel behavioral analyses are of high quality, as are the optogenetic manipulations. The data showing that starvation silences IPC activity in young flies (younger than 1 week) are compelling. The evidence for the claim that locomotor activity is not increased upon IPC activity but upon the activity of other blood glucose-sensitive modulatory neurons (Dh44) is strong. The study provides a great system to experimentally dissect the interplay of insulin production and release with metabolism, physiology, and behavior. 

      Weaknesses: 

      Neither the mechanisms underlying the incretin effect, nor the network to orchestrate physiological, metabolic, and behavioral responses to nutritional state have been fully uncovered. Without additional controls, some of the conclusions would need to be toned down significantly. Controls are required to exclude the possibility that IPCs sense other blood sugars than glucose. The claim that IPC activity is controlled by the nutritional state would require that starvation-induced IPC silencing in young animals can be recovered by feeding a normal diet. At present, firing in starvation-silenced IPCs can only be induced by feeding a high-glucose diet that lacks other important ingredients and reduces vitality. Therefore, feasible controls are needed to exclude that diet-induced increases in IPC firing rate are caused by stress rather than nutritional changes in normal ranges. The finding that refeeding starved flies with a standard diet had no effect on IPC activity but a strong effect on the locomotor activity of starved flies contradicts the statement that locomotor activity is affected by the same dietary manipulations that affect IPC activity. The compelling finding that starvation induces IPC firing would benefit from determining the time course of the effect. The finding that IPCs are not active in fed animals older than 1 week is surprising and should be further validated. 

      We thank the referee for the thoughtful and constructive criticism of our experiments and conclusions. Below, we lay out how we tackled the individual points raised by the referee.

      (1) ‘Controls are required to exclude the possibility that IPCs sense other blood sugars than glucose.’  

      To address this point, we conducted experiments in which we perfused trehalose (Figure 3B), the main circulating hemolymph sugar in Drosophila and other insects. Our results clearly show that trehalose does not affect IPC activity upon perfusion, confirming our statements that IPCs do not sense key blood sugars directly.

      (2) ‘Feasible controls are needed to exclude that diet-induced increases in IPC firing rate are caused by stress rather than nutritional changes in normal ranges’. 

      We agree with the referee that this point was not completely fleshed out in our first submission. We have now performed additional experiments in which we added glucose (and fructose) to our standard diet (Figure 1H). Flies feeding on this diet received all necessary nutrients but still experienced high concentrations of sugars. The effects of high glucose in a standard diet background were indistinguishable from those of high glucose in agarose, confirming that the IPCs respond to sugar rather than stress. Another important observation in this context is that IPCs in flies kept on a high protein diet exhibited much lower spike rates than those in flies kept on the high glucose diet, even though the former had a much shorter lifespan and therefore, presumably, experienced much higher stress levels (Figure 1H, Figure S1). These observations underline that stress is certainly not the primary factor here.

      (3) ‘The finding that refeeding starved flies with a standard diet had no effect on IPC activity but a strong effect on the locomotor activity of starved flies contradicts the statement that locomotor activity is affected by the same dietary manipulations that affect IPC activity.’

      We have revised the respective section of the results and discussion accordingly and are more careful and clearer in our interpretation of this behavioral dataset: ‘These results show that the locomotor activity was affected by the same dietary manipulations that had strong effects on IPC activity. However, IPC activity changes alone cannot explain the modulation of starvation-induced hyperactivity. On the one hand, high-glucose diets which drove the highest activity in IPCs were not sufficient to reduce locomotor activity back to baseline levels. On the other hand, refeeding flies with SD did not revert the effects of starvation on IPC activity (Figure 1H), but it was sufficient to reduce the locomotor activity below baseline levels (Figure 2B). This suggests that the modulation of starvation-induced hyperactivity is achieved by multiple modulatory systems acting in parallel.’

      (4) ‘The compelling finding that starvation induces IPC firing would benefit from determining the time course of the effect.’

      We followed the referee’s excellent suggestion and determined the time course of the starvation effect in three timesteps, similar to the experiments we did for refeeding (Figure 1G). In addition, we now also quantify the number of active IPCs (i.e., IPCs that fired at least one action potential during our five-minute analysis window), which further illustrates the dynamics of the starvation and refeeding effects. We find that the starvation effect is graded, and that IPC activity decreases with increasing starvation duration.
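      The definition of an "active" IPC given above (at least one action potential within the five-minute analysis window) can be sketched in a few lines. This is only an illustrative sketch: the function names and the toy spike times are hypothetical and are not taken from the authors' analysis code or data.

```python
# Illustrative sketch only: names and spike times below are hypothetical.
ANALYSIS_WINDOW_S = 5 * 60  # five-minute analysis window, in seconds


def is_active(spike_times, window_start=0.0, window_s=ANALYSIS_WINDOW_S):
    """Return True if at least one spike falls inside the analysis window."""
    window_end = window_start + window_s
    return any(window_start <= t < window_end for t in spike_times)


def fraction_active(recordings):
    """Fraction of recorded cells that fired at least once in the window."""
    if not recordings:
        raise ValueError("need at least one recording")
    return sum(is_active(r) for r in recordings) / len(recordings)


# Four toy recordings (spike times in seconds from the window start):
# two cells spike inside the window, one is silent, one spikes only afterwards.
toy_recordings = [[1.0, 20.5], [], [310.0], [299.9]]
print(fraction_active(toy_recordings))
```

      Counting cells this way complements the mean spike rate, because a population average alone cannot distinguish a few highly active cells from many weakly active ones.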

      (5) ‘The finding that IPCs are not active in fed animals older than 1 week is surprising and should be further validated.’

      To address the referee’s comment, we have added 14 new IPC recordings from flies in the 6–26-day range, such that we now have recordings from 9-14 IPCs for each age range (Figure S2B). They confirmed our previous analysis and strengthened the finding that IPC activity dramatically decreases after 8 days (on our standard diet). The total number of IPCs in this supplementary dataset was thus increased from 34 to 48.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) Do IPCs respond to glucose specifically after ingestion or generally to any other nutritive sugars? To tackle this question the IPC responses in starved flies can be recorded after refeeding flies with other nutritive sugars (fructose, sucrose). 

      To address this important question, we have performed additional experiments in which we refed starved flies with fructose, as a nutritive sugar, and arabinose, as a non-nutritive sugar. As expected, IPCs responded to fructose but not to arabinose, and hence respond to nutritive sugars in general. We describe and discuss these key results in the new version of our manuscript.

      (2) In Figure 2, the x and y axes are not annotated on all subfigures, which might help improve clarity. 

      We have annotated the subfigures as requested.

      (3) In the discussion on page 9 ("...we observed an increase in IPC activity after the ingestion of glucose (Figure 2B)."), the authors refer to Figure 2B instead of 3C.

      We have fixed this oversight.

      Reviewer #2 (Recommendations For The Authors): 

      Introduction 

      I think it could be helpful for the reader if you would briefly state the number of IPCs and whether you are targeting all of them with Dilp2-Gal4. 

      We included the numbers according to the suggestion. 14 IPCs are labeled in the driver line, and this is the number of IPCs commonly assumed to be present in the PI.

      Figures 

      In some Figures (for example 1D & E) the authors state the number of IPCs recorded (N) but not the number of animals used (n). This should be stated as the data from within an animal are dependent and might give insights about IPC heterogeneity. 

      We have compiled tables for the supplementary material (Tables S5 & S6) in which we state the number of IPCs and DH44<sup>PI</sup>Ns recorded and the number of different flies for each figure panel. We have recorded an average of 1.4 IPCs per fly (217 IPCs from 160 flies). We therefore expect the bias introduced by individual flies to be rather small. However, in our parallel study, we specifically investigate the heterogeneity of IPCs by maximizing the number of IPCs recorded per fly (Held M, et al. ‘Aminergic and peptidergic modulation of Insulin-Producing Cells in Drosophila’. eLife. 2024;13. doi:10.7554/ELIFE.99548.1). In the case of DH44<sup>PI</sup>Ns, we recorded 24 neurons in 21 flies – 1.1 neurons per fly.
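      The per-fly averages quoted above follow directly from the stated counts. A minimal check, using only the numbers given in the response (the helper name is hypothetical):

```python
def cells_per_fly(n_cells, n_flies):
    """Average number of recorded cells per fly."""
    if n_flies <= 0:
        raise ValueError("n_flies must be positive")
    return n_cells / n_flies


# Counts as stated in the response above.
ipc_ratio = cells_per_fly(217, 160)   # IPCs
dh44_ratio = cells_per_fly(24, 21)    # DH44 PI neurons

print(f"IPCs per fly: {ipc_ratio:.1f}")
print(f"DH44 PI neurons per fly: {dh44_ratio:.1f}")
```

      Both ratios are close to one cell per fly, which is the basis for the statement that within-animal dependence should contribute little to these datasets.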

      - Figure 3D: There is some white visible among the cell bodies in the overlay. I assume this comes from projecting across layers rather than indicating DH44 - IPC overlap? It would help to explicitly state that. 

      We have added a statement to the results section, in which we explain that most of the white is due to overlap in the z-projection rather than overlap in the driver lines. However, there are a few cases (typically one to two cells per brain) in which neurons labeled by the DH44 line also stain positive for Dilp2, indicating they express both neuropeptides. We have added this information to the manuscript:  

      Results: ‘DH44<sup>PI</sup>Ns are anatomically similar to IPCs, and their cell bodies are located directly adjacent to those of IPCs in the PI, making them an ideal positive control for our experiments (Figure 3D). A small subset of DH44<sup>PI</sup>Ns also expresses Dilp2(75), and our immunostainings confirmed colocalization of Dilp2 and DH44 in a single neuron (Figure 3D, white arrow).’

      In figure caption: ‘UAS-myr-GFP was expressed under a DH44-GAL4 driver to label DH44 neurons. GFP was enhanced with anti-GFP (green), brain neuropils were stained with anti-nc82 (cyan), and IPCs were labelled using a Dilp2 antibody (magenta). White arrow indicates Dilp2 and DH44-GAL4 positive neuron. The other white regions in the image result from an overlap in z-projections between the two channels, rather than from antibody colocalization.’

      - Figure 4I: One might get the impression that the fast onset peak of activity precedes the stimulation onset, using a thinner line width might help avoid that. 

      This effect is due to a combination of using relatively heavy lines for clear visibility of the data and a gentle smoothing step (a 2 s median filter, which corresponds to less than 1% of the 300 s stimulation window) in our analysis of the behavioral data. However, inspection of the raw data clearly shows increases in velocity after the onset of the optogenetic activation. We clarified this in the figure caption: ‘Average FV across all DH44N activation trials based on two independent replications of the experiment in I. Note that the peak in average FV lies within the first frame of the stimulation window.’
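      To make the scale of this smoothing step concrete, the following is a minimal sketch of a centered sliding median filter. The 2 s filter length and the 300 s stimulation window come from the response above; the frame rate, the function name, and the toy trace are assumptions made purely for illustration and do not reflect the authors' actual pipeline.

```python
from statistics import median


def median_filter(values, window_frames):
    """Centered sliding median; the window is truncated at the trace edges."""
    if window_frames < 1 or window_frames % 2 == 0:
        raise ValueError("window_frames must be a positive odd integer")
    half = window_frames // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


FRAME_RATE_HZ = 20   # assumed frame rate, not stated in the source
FILTER_S = 2         # filter length from the response
STIM_S = 300         # stimulation window from the response

# The filter spans well under 1% of the stimulation window.
window_fraction = FILTER_S / STIM_S

# An odd number of frames keeps the window centered on each sample.
window_frames = FILTER_S * FRAME_RATE_HZ + 1

# Toy velocity trace: flat baseline with a single-frame outlier.
trace = [1.0] * 50
trace[25] = 9.0
smoothed = median_filter(trace, window_frames)

print(f"window fraction: {window_fraction:.4f}")
print(f"max before: {max(trace)}, max after: {max(smoothed)}")
```

      In this toy trace the isolated outlier is removed while the baseline is untouched, which is the kind of gentle smoothing a short median filter provides.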

      - S3 panel letters do not match references in the text.

      We fixed this oversight.

      Formatting 

      - Page 10: The paragraphs on the bottom of the page got switched around.

      This has been fixed.

      - Page 14: The first paragraph after the header "Free-walking assay" seems to be coming from elsewhere. 

      We apologize for this slightly embarrassing mistake. We used our related bioRxiv preprint (Held et al.) as a template for formatting this paper, and accidentally left this part of the methods section in the manuscript. We have fixed this error in our resubmission.

      Reviewer #3 (Recommendations For The Authors): 

      Major suggestions: 

      (1) The data show convincingly that IPC activity is decreased by starvation during the first week of adult life (Figures 1C and D). However, the conclusion that IPC activity is controlled by the nutritional state requires additional care. First, refeeding starved adult animals with a normal diet does not bring back normal IPC firing rates (Figure 1H). Therefore, IPC activity does not strictly follow changes in nutritional state, but IPCs are silenced by starvation. Second, from the second week of adult life on, IPCs are silent anyway, and thus unlikely responsive to changes in the nutritional state anymore (which might be different on a different standard diet?) The only effect of feeding on IPC activity is observed upon feeding starved, young animals with high glucose for 12-24 hrs (Figure 1G). However, it is not clear whether increased IPC firing is caused by the effects of high glucose on the nutritional state in a normal range, or because of diet-induced stress (the diet also severely shortens lifespan, Figure 1S). Does high glucose also increase IPC firing rate in young, fed animals? These would have strongly increased glucose concentrations but not suffer the stress of not getting any other nutrients. Such experiments would be required to make the statement that glucose feeding increases IPC firing rate. 

      We have performed several experiments to address this criticism. First, we performed a time course analysis of the starvation effect. We show that the IPC activity reduction is graded, and that IPC activity declines already after two hours of starvation, a timepoint at which stress levels should still be relatively small (Figure 1G). Second, we refed flies with high glucose concentrations added to the standard diet (Figure 1H). This minimized any potential stress responses due to a lack of nutrients. Third, we now show that IPCs specifically respond to nutritive (glucose and fructose), but not to non-nutritive sugars (arabinose, Figure 1H). We believe that these data sets, in addition to the graded refeeding effect, make a strong case for the nutritional state dependent modulation of IPCs. 

      (2) The testing of locomotor activity is well done, nicely recapitulates starvation-induced increases in locomotion, and adds interesting novel findings on refeeding with high glucose versus high protein diet. However, the statement that locomotor activity was affected by the same dietary manipulations that had strong effects on IPC activity does not reflect the data presented. Refeeding starved flies with a standard diet had no effect on IPC activity (Figure 1H) but a strong effect on locomotor activity of starved flies (a strong reduction, even stronger than high glucose diet, Figure 2B). 

      We have revised the respective section of the results and discussion accordingly and are more careful and clearer in our interpretation of this behavioral dataset: ‘These results show that the locomotor activity was affected by the same dietary manipulations that had strong effects on IPC activity. However, IPC activity changes alone cannot explain the modulation of starvation-induced hyperactivity. On the one hand, high-glucose diets which drove the highest activity in IPCs were not sufficient to reduce locomotor activity back to baseline levels. On the other hand, refeeding flies with SD did not revert the effects of starvation on IPC activity (Figure 1H), but it was sufficient to reduce the locomotor activity below baseline levels (Figure 2B). This suggests that the modulation of starvation-induced hyperactivity is achieved by multiple modulatory systems acting in parallel.’

      Related to points 1 and 2, a key statement that the results establish that IPC activity is controlled by the nutritional state requires care. What the data convincingly show is that IPC activity is near zero upon starvation. 

      As described above, we have added several extensive data sets (fructose feeding, arabinose feeding, trehalose perfusion, starvation time course) to show that we indeed observe a nutritional state dependent modulation of IPCs and describe these new results in the results and discussion.

      (3) The time course of nutritional state-dependent changes of IPC activity is claimed to be slow, several hours to days. Unless I have missed a figure, the underlying data are not presented (only for high glucose diet). It would be great if this could also be shown for a standard diet with higher glucose concentrations than the one used so that it rescues starvation-induced IPC silencing without shortening lifespan (if this is feasible?). The data showing starvation-induced IPC silencing are convincing, but, unless I have missed it, the time course has not been determined. It would be very nice to actually show this. Have different starvation times been tested in relation to IPC firing rate, and if yes, with what time resolution? Does IPC activity change already after 0.5 or 1 or a few hours of starvation? If starvation can silence IPCs faster than assumed, the near-zero IPC activity in animals older than a week could very well be caused by longer time intervals between meals. 

      We have performed experiments to address both important points raised by the referee here. 1) We have added high glucose concentrations to our standard diet, and show that it has the same effect – a significant increase in IPC activity – as the high glucose diet (Figure 1H). 2) We have analyzed the time course of IPC activity reduction in response to starvation (Figure 1G). Indeed, we find that a few hours of starvation start reducing IPC activity. We discuss the possibility that reduced IPC activity in older flies could be due to reduced food intake: ‘One of our experiments demonstrated that IPC activity was heavily diminished in flies older than 10 days (Figure S2B). A possible explanation could be that flies feed less as they age. However, this only holds true for flies older than 14 days(86). Therefore, reduced IPC activity in 10-11 day old flies is unlikely to result from reduced food intake and likely involves inhibition of insulin signaling.’

      (4) The data on the proposed incretin effect are of high importance in potentially highlighting a highly conserved link between glucose ingestion and insulin release. An important control would be to test different sugars, such as trehalose, an important blood sugar of flies. If glucose is converted into trehalose and this is what IPCs sense, then perfusion of glucose has no effect. The fact that the fantastic experiments show that the DH44 neurons are sensitive to glucose perfusion does not rule out that IPCs sense a different sugar. This would be very different from the incretin effect that requires additional hormones. In addition, as mentioned above, controls are required to show that high glucose affects IPCs as a nutrient and not as a stressor (see point 1), for example refeeding with a standard diet that contains a higher glucose concentration but does not reduce lifespan. Another great control to solidify the exciting claim on the incretin effect would be to knock out candidate Drosophila incretin hormones and test whether a high glucose diet stops increasing the IPC firing rate (although simpler controls might also do the job). 

      We have performed the two key experiments suggested by the referee. 1) We perfused trehalose as the primary blood sugar of flies and showed that IPCs do not respond to trehalose perfusion (Figure 3B & C). This further strengthens the finding that IPC activity in flies shows an incretin-like effect. 2) We have added high concentrations of glucose to our standard diet to provide flies with a full diet that contains high glucose concentrations. IPC activity in these flies was indistinguishable from the activity in flies which consumed pure glucose diets. In contrast, IPC activity in flies kept on a high protein diet, which dramatically reduced lifespan, was very low. These results clearly show that higher IPC activity is not due to increased stress levels, but a function of nutritive sugar ingestion. We further validated this hypothesis by refeeding flies with fructose as a nutritive sugar, which increased IPC activity, and arabinose as a non-nutritive sugar, which did not affect IPC activity (Figure 1H).

      Another point that might be relevant to this discussion is that IPC activity is almost entirely shut down during flight in Drosophila (which we showed in Liessem et al. 2023, Current Biology 33 (3), 449-463. e5). Several ‘stress hormones’ are released during flight, including octopamine. The fact that IPC activity is low in flying flies, starved flies, and flies kept on a pure protein diet (which all experience high stress levels), to us, very clearly suggests that stress is not the predominant factor here. We would also like to point out that, while the lifespan was reduced in flies kept on pure glucose diets, survival rates were at 100% until day 14, and we carried out our experiments on day 2 after starvation. Hence, these flies might not (yet) experience particularly high stress levels.

      (5) The discussion relates the absence of IPC firing in animals older than 1 week to aging. However, given that the flies fed on a normal diet show the typical lifespan for Drosophila, a 10-day-old fly is still in its youth. Maybe flies at 10 days simply eat less and thus IPC spiking goes down as in starved flies, especially because the standard diet used contains low glucose. Do IPCs also become silent after a week if the animals are fed with a standard diet that contains a higher glucose concentration? Without additional controls, this part of the discussion is pretty speculative and should be revised. 

      We agree with the reviewer that it is not clear whether reduced IPC activity is a direct result of physiological changes that occur with aging, or an indirect effect of reduced food intake, which occurs during aging. In both cases, in our view, it would be an age-related effect. Since this is a minor point of our manuscript, we decided not to perform additional experiments, other than significantly increasing the sample size for the aging data set already presented to shore up our findings (Figure S2B). We have, however, revisited the discussion of this point according to the referee’s suggestion: ‘One of our experiments demonstrated that IPC activity was heavily diminished in flies older than 10 days (Figure S2B). A possible explanation could be that flies feed less as they age. However, this only holds true for flies older than 14 days(85). Therefore, reduced IPC activity in 10-11 day old flies is unlikely to result from reduced food intake and likely involves inhibition of insulin signaling.’

      Other suggestions: 

      (6) For the mixed effects of octopamine and tyramine on larval locomotion that are referred to, it might be interesting to also look at Schützler et al 2019, PNAS because it shows that starvation activates TBH so that the octopamine to tyramine ratio is increased. 

      We refer to Schützler et al. in the following paragraph of our discussion: ‘This intermittent locomotor arrest has been previously described in adult flies and is thought to be mediated by ventral unpaired median OANs, which have been suggested to suppress long-distance foraging behavior(69). Since these are not the only neurons we activate in the TDC2 line, we speculate that the stopping phenotype could also result from concerted effects of octopamine and tyramine modulating muscle contractions(65-67) and motor neuron excitability(68), as previously described in Drosophila larvae, or from OANs interfering with pattern generating networks in the ventral nerve cord (VNC) during longer activation(69).’  

      (7) The reference list requires care. For example, reference 43 is identical to 67, and reference 66 gives no information on incretin-like hormones in Drosophila as stated in the text. 

      We carefully double-checked our reference list and corrected the mistakes mentioned.

    1. eLife Assessment

      The manuscript presents valuable findings of bone remodeling under chronic unpredictable mild stress (CUMS). This is an interesting work on mental stress on bone health and osteoporosis, and the authors offer solid evidence of decreased bone mass mediated by miR-335-3p/Fos signaling in osteoclasts that are involved in the induction of bone loss caused by CUMS. This revised version provided new data that improved the quality of the manuscript and addressed the reviewers' concerns.

    2. Reviewer #1 (Public review):

      I have reviewed the manuscript "Psychological stress disturbs bone metabolism via miR-335-3p/Fos signaling in osteoclast" with interest. The described findings are relevant and useful for daily practice in periodontology. The paper is concise, professionally written, and easy to read. In this study, Jiayao et al. revealed the role of miR-335-3p in psychological stress-induced osteoporosis. CUMS mice were constructed to observe the femur phenotype, osteoclasts were identified as the main research object, and miRNA-seq was used to find the key miRNAs linking the brain and peripheral tissues. This study showed that miR-335-3p expression was simultaneously reduced in murine NAC, serum, and bone under psychological stress. The miR-335-3p/Fos/NFATC1 signaling pathway was validated in osteoclasts to reveal the potential mechanism of enhanced osteoclast activity under psychological stress. This study, from a new perspective of miRNAs, indicates a possible cause of disturbed bone metabolism due to psychological stress and may suggest a new approach to treating osteoporosis.

    3. Reviewer #2 (Public review):

      Zhang et al. established chronic unpredictable mild stress (CUMS) mouse model, which displayed osteoporosis phenotype, suggesting a potential correlation between psychological stress and bone metabolism. They found that miRNA candidate miR-335-3p is downregulated in the long bone of CUMS mice through microRNA sequencing experiments and qRT-PCR. They further demonstrated that miR-335-3p attenuates osteoclast activity via inhibiting Fos signaling, which can induce NFATC1 expression and regulate osteoclast activity.

      My concerns have been addressed, and the quality of the manuscript has improved significantly.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      I have reviewed, with interest, the manuscript "Psychological stress disturbs bone metabolism via miR-335-3p/Fos signaling in osteoclast". The described findings are relevant and useful for daily practice in periodontology. The paper is concise, professionally written, and easy to read. In this study, Jiayao et al. revealed the role of miR-335-3p in psychological stress-induced osteoporosis. CUMS mice were constructed to observe the femur phenotype, osteoclasts were identified as the primary research object, and miRNA-seq was used to find the key miRNAs linking the brain and peripheral tissues. This study showed that the expression of miR-335-3p was simultaneously reduced in mice's NAC, serum, and bone under psychological stress. The miR-335-3p/Fos/NFATC1 signaling pathway was validated in osteoclasts to reveal the potential mechanism of enhanced osteoclast activity under psychological stress. From a new perspective of miRNAs, this study indicates a possible cause of disturbed bone metabolism due to psychological stress and may suggest a new approach to treating osteoporosis.

      We thank this reviewer for the instructive suggestions and encouragement.

      Reviewer #2 (Public Review):

      Zhang et al. established chronic unpredictable mild stress (CUMS) mouse model, which displayed osteoporosis phenotype, suggesting a potential correlation between psychological stress and bone metabolism. They found that miRNA candidate miR-335-3p is downregulated in the long bone of CUMS mice through microRNA sequencing and qRT-PCR experiments. They further demonstrated that miR-335-3p attenuates osteoclast activity via inhibiting Fos signaling, which can induce NFATC1 expression and regulate osteoclast activity.

      Strengths:

      The authors established CUMS mouse model and confirmed the osteoporosis phenotype through careful characterization of bone and analysis of osteoclast activity. They performed microRNA sequencing to identify the miRNA candidate regulating the bone loss in the CUMS mouse model. They also validated the expression of miR-335-3p and interfered with the function of miR-335-3p through an in vitro assay. Overall, the findings from this study provide important hints for the correlation between psychological stress and bone metabolism.

      We thank this reviewer for the comprehensive summary and positive comment on our work.

      Weakness:

      The data provided by the authors are preliminary, especially the mechanistic insight, which needs to be enhanced. The authors have shown that miR-335-3p expression was altered in the CUMS mouse model and the change of its expression regulated osteoclast activity. The validation should be conducted in vivo, and the mechanism behind this should be investigated further.

      We thank the reviewer for the important insight on the need for further in vivo validation of the role of miR-335-3p. We therefore designed and produced Antagomir-335-3p (antagonist) and Agomir-335-3p (agonist), injected them through the tail vein for about 2 months, and observed the bone phenotype in each group of mice. The results suggested that a decrease of miR-335-3p in vivo could lead to bone loss, which was consistent with our in vitro validation results (Figure 5H-I).

      Reviewing Editor:

      Method

      (1) Bone histomorphometric analysis following ASBMR's guidelines Bone histomorphometric analysis of bone formation and bone resorption: The authors should follow ASBMR's guidelines for bone histomorphometry (PMCID: PMC3672237 and PMID: 3455637) to perform standard analyses of histomorphometry, rather than selected areas. They should also clearly describe a software used and define the areas analyzed.

      We carefully re-analyzed bone histomorphometry according to the ASBMR guidelines, combined with our own understanding. At the same time, we improved the description of micro-CT and histological analysis in the Methods. If any lack of standardization remains, we would be grateful for constructive suggestions to improve it.

      (2) Osteoclast cultures require nuclear staining to demonstrate multinucleated Trap positive cells.

      We used RAW264.7, a mouse macrophage-like cell line, for in vitro culture and induced its differentiation towards osteoclasts. Successfully induced osteoclasts showed enlarged cytoplasm and multinucleated fusion. Tartrate-resistant acid phosphatase (Trap) is the signature enzyme of osteoclasts. It can bind to the chromogen to exhibit a mauve color, based on the principle of azo-coupled immunohistochemistry. At the same time, the small, rounded, fused nuclei show a lighter color (Author response image 1, yellow arrows). We attempted to stain the nuclei with hematoxylin on this basis. However, hematoxylin could not clearly distinguish the contours of the nuclei because its color is similar to that of the Trap-positive signal. Moreover, many other groups have assessed osteoclast activity in in vitro experiments based solely on the results of Trap staining (area and number) (Cheng et al., 2022; Li et al., 2019; Ma et al., 2021; Zhong et al., 2023). Nevertheless, in the immunofluorescence staining of osteoclasts, the nuclei were labeled with Hoechst dye to reflect the multinucleated fusion of osteoclasts (Figure 5G).

      (3) Osteoclast pit assays should be carried out to necessarily demonstrate the change of osteoclast resorption ability caused by miR-335-3p.

      We added osteoclast pit assays to validate the role of miR-335-3p on osteoclast resorptive capacity (Figure 5D-E).

      (4) Serum ELISA assay should be done to examine the global change of bone remodeling in the CUMS mice to assess bone formation and bone resorption that will support their claim.

      We performed additional tests on serum concentrations of bone γ-carboxyglutamic acid protein (BGP), TRAP, Cathepsin K (CTSK), parathyroid hormone (PTH), calcium (Ca), and phosphate (P) in control and CUMS mice, which better reflect the global change of bone remodeling in the CUMS mice (Figure 3—figure supplement 1).

      (5) miRNA-seq: A labeled volcano plot should be used to replace the present one to show significant changes in differential gene expression.

      We appreciate this great suggestion. We replaced the volcano plot that showed significant changes in differential gene expression (Figure 4B). We also uploaded the raw data to the GEO database (GSE253504), making the results clearer and more accessible.

      Discussion

      The authors should discuss previous works on the influences of hormones from the brain on chronic stress-induced bone loss and an association of these influences with their findings.

      The discussion on the relationship between the bone metabolism regulation of both hormones and miR-335-3p in psychological stress was added in the second and fifth paragraphs of the discussion. To conclude, on the one hand, brain-derived and blood-transported miR-335-3p regulate bone metabolism synergistically. On the other hand, it exerted a more direct influence on bone under psychological stress.

      Language

      The language of the MS should be improved.

      The manuscript has been carefully edited by a professional proofreader.

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 1F: The exact meaning of the Waveform Graph shown at left needs to be clarified for the not-so-experienced reader.

      We added the more detailed meaning of the Waveform Graph in figure legends (Figure legend 1F).

      (2) Is the concomitant increase in osteogenic and osteoblastic activity in this study consistent with that seen in similar disease studies? This could be added to the discussion.

      In the fifth paragraph of the discussion section, we present the alterations of osteogenic and osteoblastic activity observed in other studies that are similar to ours. We also had a detailed discussion based on these observations.

      (3) Figure 6A: Please highlight the key information to visualize the potential linkage among miR-335-3p, Fos, and osteoclast.

      We highlighted the crucial linkage among miR-335-3p, Fos, and osteoclast with red arrows (Figure 6A).

      (4) Figure 6E: The specific area of the selected comparison needs to be clarified. Please add white dotted lines and lettering T (trabecular bone) and GP (growth plate) for the not-so-experienced reader. This will provide some orientation.

      We used white dotted lines as well as letters to label the tissue in immunofluorescence staining images (Figure 6E).

      (5) Line 350: "NAC derived and blood-trans, Ported miR-335-3p". There is a grammatical error. Please conduct general proofreading of the text and writing style.

      Thank you for pointing this out. We have corrected this grammatical error, and we also checked the full text to correct similar errors.

      Reviewer #2 (Recommendations For The Authors):

      (1) miR-335-3p was downregulated in the femur in the CUMS mice. The possible mechanism for this outcome should be further discussed. In Figure 4B, the Volcano plot showed that only a few miRNA were differentially expressed between the control and CUMS mice. How do the authors explain this?

      The chronic unpredictable mild stress (CUMS) model was constructed using normal mice. As the name of the model suggests, the stimulus is mild and does not cause developmental damage or teratogenic effects in mice. Instead, CUMS has the potential to result in chronic pathological conditions. In addition, in miRNA sequencing results from other tissues in models similar to ours, the number of differentially expressed miRNAs is also only a few dozen (Ma et al., 2019).

      (2) The authors have demonstrated that miR-335-3p inhibits osteoclast differentiation based on an in vitro assay in Figure 5; however, an in vivo experiment is required to provide more solid evidence.

      We strongly agree that in vivo experimental validation would bring more convincing results to this study. Therefore, we designed and produced Antagomir-335-3p (antagonist) and Agomir-335-3p (agonist), which were injected into mice via the tail vein every five days. Samples were collected at one and two months following the injection. We found that sustained two-month injections of antagomir could significantly lead to bone loss in mice (Figure 5H-I), which is consistent with our in vitro validation results.

      However, the Agomir-miR-335-3p group did not exhibit a notable enhancement of bone mass. This may be attributed to the fact that the 11-week-old normal mice selected for this study were in their prime and did not have strong osteoclastic activity in vivo. Therefore, the osteoclastic inhibition of Agomir-335-3p could not be demonstrated.

      In addition, no significant difference was seen one month after the injection. The main reason may be that this period is too short. On the one hand, the drug we injected was an RNA preparation, which lacks stability and therefore has poor delivery efficiency, so it takes some time to take effect. On the other hand, bone remodeling is also a time-consuming process.

      (3) FOS and NFATC1 should be expressed in the nuclei of the cells, therefore, the quality of the images needs to be improved.

      NFATC1 is a T-cell-activating nuclear factor that is activated in the nucleus to regulate the transcription of a variety of osteoclast-related genes, including ACP5, MMP9, etc. FOS can bind and interact with NFATC1, resulting in nuclear translocation and transcriptional activation. This can promote the differentiation and maturation of osteoclasts. Both proteins are synthesized and processed in the cytoplasm and eventually enter the nucleus to perform their functions. Therefore, they are expressed in both the nucleus and the cytoplasm (Deng et al., 2022; Hounoki et al., 2008; Li et al., 2022).

      In Figure 5G, we labeled cell nuclei with Hoechst dye (blue fluorescence), and more co-localized signals of nuclei (blue), FOS (red), and NFATC1 (green) were seen in the Inhibitor-miR-335-3p group, whereas the opposite result was observed in the Mimic-miR-335-3p group. These results indicate that inhibiting miR-335-3p can promote osteoclast differentiation in vitro.

      (4) The expression of FOS was elevated in CUMS group in Figure 6E; however, its mRNA level was unchanged, as shown in Figure 6 supplement; what is the explanation for this? How do the authors claim FOS is the downstream target if its mRNA expression is not impacted by CUMS?

      The results demonstrated that the targeted binding of miR-335-3p to Fos mRNA did not result in mRNA degradation. Instead, this binding interferes with protein translation, which ultimately leads to a reduction of FOS protein.

      (5) What would be the bone phenotype if a FOS inhibitor was injected into the control and CUMS mice? It is important to examine FOS function through an in vivo context.

      The regulatory role of FOS in osteoclasts has been validated in numerous articles, both in vivo and in vitro (Aikawa et al., 2008; Cao et al., 2023; Cheng et al., 2022). For example, Aikawa et al. designed a small-molecule inhibitor of c-Fos/activator protein-1 (AP-1) using three-dimensional (3D) pharmacophore modeling, which helped verify the effect of FOS on osteoclasts in vivo (Aikawa et al., 2008).

      We also strongly agree that in vivo injection of inhibitors of FOS, especially in CUMS mice, could further substantiate the role of miR-335-3p in osteoclasts under psychological stress. However, the study was constrained by the unavailability of commercially viable, efficacious small molecule inhibitors of FOS. In the future, we plan to design more precise therapeutic targets for psychological stress induced osteoporosis based on existing research ideas.

      References

      Aikawa, Y., Morimoto, K., Yamamoto, T., Chaki, H., Hashiramoto, A., Narita, H., Hirono, S., & Shiozawa, S. (2008). Treatment of arthritis with a selective inhibitor of c-Fos/activator protein-1. Nature Biotechnology, 26(7), 817-823. https://doi.org/10.1038/nbt1412

      Cao, Z., Niu, X. B., Wang, M. H., Yu, S. W., Wang, M. K., Mu, S. L., Liu, C., & Wang, Y. X. (2023). Anemoside B4 attenuates RANKL-induced osteoclastogenesis by upregulating Nrf2 and dampens ovariectomy-induced bone loss. Biomedicine & Pharmacotherapy, 167, 115454. https://doi.org/10.1016/j.biopha.2023.115454

      Cheng, X., Yin, C., Deng, Y., & Li, Z. (2022). Exogenous adenosine activates A2A adenosine receptor to inhibit RANKL-induced osteoclastogenesis via AP-1 pathway to facilitate bone repair. Molecular Biology Reports, 49(3), 2003-2014. https://doi.org/10.1007/s11033-021-07017-1

      Deng, W., Ding, Z., Wang, Y., Zou, B., Zheng, J., Tan, Y., Yang, Q., Ke, M., Chen, Y., Wang, S., & Li, X. (2022). Dendrobine attenuates osteoclast differentiation through modulating ROS/NFATc1/ MMP9 pathway and prevents inflammatory bone destruction. Phytomedicine : International Journal of Phytotherapy and Phytopharmacology, 96, 153838. https://doi.org/10.1016/j.phymed.2021.153838

      Hounoki, H., Sugiyama, E., Mohamed, S. G.-K., Shinoda, K., Taki, H., Abdel-Aziz, H. O., Maruyama, M., Kobayashi, M., & Miyahara, T. (2008). Activation of peroxisome proliferator-activated receptor gamma inhibits TNF-alpha-mediated osteoclast differentiation in human peripheral monocytes in part via suppression of monocyte chemoattractant protein-1 expression. Bone, 42(4), 765-774. https://doi.org/10.1016/j.bone.2007.11.016

      Li, Y., Yang, C., Jia, K., Wang, J., Wang, J., Ming, R., Xu, T., Su, X., Jing, Y., Miao, Y., Liu, C., & Lin, N. (2022). Fengshi Qutong capsule ameliorates bone destruction of experimental rheumatoid arthritis by inhibiting osteoclastogenesis. Journal of Ethnopharmacology, 282, 114602. https://doi.org/10.1016/j.jep.2021.114602

      Li, Z., Huang, J., Wang, F., Li, W., Wu, X., Zhao, C., Zhao, J., Wei, H., Wu, Z., Qian, M., Sun, P., He, L., Jin, Y., Tang, J., Qiu, W., Siwko, S., Liu, M., Luo, J., & Xiao, J. (2019). Dual Targeting of Bile Acid Receptor-1 (TGR5) and Farnesoid X Receptor (FXR) Prevents Estrogen-Dependent Bone Loss in Mice. Journal of Bone and Mineral Research : the Official Journal of the American Society For Bone and Mineral Research, 34(4), 765-776. https://doi.org/10.1002/jbmr.3652

      Ma, K., Zhang, H., Wei, G., Dong, Z., Zhao, H., Han, X., Song, X., Zhang, H., Zong, X., Baloch, Z., & Wang, S. (2019). Identification of key genes, pathways, and miRNA/mRNA regulatory networks of CUMS-induced depression in nucleus accumbens by integrated bioinformatics analysis. Neuropsychiatric Disease and Treatment, 15, 685-700. https://doi.org/10.2147/NDT.S200264

      Ma, Q., Liang, M., Wu, Y., Luo, F., Ma, Z., Dong, S., Xu, J., & Dou, C. (2021). Osteoclast-derived apoptotic bodies couple bone resorption and formation in bone remodeling. Bone Research, 9(1), 5. https://doi.org/10.1038/s41413-020-00121-1

      Zhong, L., Lu, J., Fang, J., Yao, L., Yu, W., Gui, T., Duffy, M., Holdreith, N., Bautista, C. A., Huang, X., Bandyopadhyay, S., Tan, K., Chen, C., Choi, Y., Jiang, J. X., Yang, S., Tong, W., Dyment, N., & Qin, L. (2023). Csf1 from marrow adipogenic precursors is required for osteoclast formation and hematopoiesis in bone. eLife, 12. https://doi.org/10.7554/eLife.82112

    1. eLife Assessment

      This study presents an advance in efforts to use histone post-translational modification (PTM) data to model gene expression and predict epigenetic editing activity. Such models are broadly useful to the research community, especially ones that can model epigenetic editing activity, which is novel; additionally, the authors have nicely integrated datasets across cell types into their model. The work is mostly solid, but it would be strengthened by performing rigorous comparisons to existing methods that predict gene expression from PTM data and from additional model validation beyond dCas9-p300 based perturbations.

    2. Reviewer #1 (Public review):

      Batra, Cabrera and Spence et al. present a model which integrates histone posttranslational modification (PTM) data across cell models to predict gene expression with the goal of using this model to better understand epigenetic editing. This gene expression prediction model approach is useful if a) it predicts gene expression in specific cell lines, b) it predicts expression values rather than a rank or bin, c) it helps us to better understand the biology of gene expression, or d) it helps us to understand epigenome editing activity. Problematically for points a) and b), it is easier to directly measure gene expression than to measure multiple PTMs, and so the real usefulness of this approach mostly relates to c) and d).

      Other approaches have been published that use histone PTM to predict expression (e.g. PMID 27587684, 36588793). Is this model better in some way? No comparisons are made although a claim is made that direct comparisons are difficult. I appreciate that the authors have not used the histone PTM data to predict gene expression levels of an "average cell" but rather that they are predicting expression within specific cell types or for unseen cell types. Approaches that predict expression levels are much more useful whereas some previous approaches have only predicted expressed or not expressed or a rank order or bin-based ranking. The paper does not seem to have substantial novel insights into understanding the biology of gene expression.

      The approach of using this model to predict epigenetic editor activity on transcription is interesting and to my knowledge novel, although only examined in the context of a p300 editor. As the authors point out, the interpretation of the epigenetic editing data is convoluted by things like sgRNA activity scoring, and to fully understand the results would likely require histone PTM profiling and maybe dCas9 ChIP-seq for each sgRNA, which would be a substantial amount of work.

      Furthermore, from the model evaluation of H3K9me3 it seems the model is performing modestly for other forms of epigenetic or transcriptional editing, e.g. we know for the best studied transcriptional editor, which is CRISPRi (dCas9-KRAB), that recruitment to a locus is associated with robust gene repression across the genome and is associated with H3K9me3 deposition by recruitment of KAP1/HP1/SETDB1 (PMID: 35688146, 31980609, 27980086, 26501517).

      One concern overall with this approach is that dCas9-p300 has been observed to induce sgRNA independent off target H3K27Ac (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8349887/ see Figure S5D) which could convolute interpretation of this type of experiment for the model.

    3. Reviewer #2 (Public review):

      Summary:

      The authors build a gene expression model based on histone post-translational modifications, and find that H3K27ac is correlated with gene expression. They proceed to perturb H3K27ac at 13 gene promoters in two cell types, and measure gene expression changes to test their model.

      Strengths:

      The combination of multiple methods to model expression, along with utilizing 6 histone datasets in 13 cell types allowed the authors to build a model that correlates between 0.7-0.79 with gene expression. They use dCas9-p300 fusions to perturb H3K27ac and monitor gene expression to test their model. Ranked correlations of the HEK293 data showed some support for the predictions after perturbation of H3K27ac.

      Weaknesses:

      The perturbation of 5 genes in K562 with perturb-seq data shows a modest correlation of ~0.5 and isn't included in the main figures. The authors are then left to speculate about the reasons why the outcome of epigenome editing doesn't fit their predictions, which highlights the limited value of the current version of this method.

      As mentioned before, testing only genes that were not expressed, which are the ones most activated by dCas9-p300, weakens the correlations relative to testing genes across the broad range of expression levels on which the original model was trained.

      If the authors want this method to be used to predict outcomes of epigenome editing, expanding to dCas9-KRAB and other CRISPRa methods (SAM and VPR) would be useful. Those datasets are published and could be analyzed for this manuscript.

      The authors don't compare their method to other prediction methods.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Batra, Cabrera, Spence et al. present a model which integrates histone posttranslational modification (PTM) data across cell models to predict gene expression with the goal of using this model to better understand epigenetic editing. This gene expression prediction model approach is useful if a) it predicts gene expression in specific cell lines b) it predicts expression values rather than a rank or bin, c) it helps us to better understand the biology of gene expression, or d) it helps us to understand epigenome editing activity. Problematically for points a) and b) it is easier to directly measure gene expression than to measure multiple PTMs and so the real usefulness of this approach mostly relates to c) and d).

      We thank the reviewer for their comment, and we agree that directly measuring gene expression (e.g., by performing RNA-seq) is easier than profiling multiple PTMs in a new cell line. We designed our approach keeping in mind that the primary use case is to understand how epigenome editing would affect gene expression.

      Other approaches have been published that use histone PTM to predict expression (e.g. 27587684, 36588793). Is this model better in some way? No comparisons are made. The paper does not seem to have substantial novel insights into understanding the biology of gene expression. The approach of using this model to predict epigenetic editor activity on transcription is interesting and to my knowledge novel but I doubt given the variability of the predictions (Figures 6 and S7&8) that many people will be interested in using this in a practical sense. As the authors point out, the interpretation of the epigenetic editing data is convoluted by things like sgRNA activity scoring and to fully understand the results likely would require histone PTM profiling and maybe dCas9 ChIP-seq for each sgRNA which would be a substantial amount of work.

      We thank the reviewer for this insightful comment. We have included citations for a series of papers (PMIDs: 27587684, 30147283, 36588793) that performed gene expression prediction using histone PTM data. However, each of these methods classifies gene expression rather than predicting the actual gene expression value via regression. Additionally, the referenced studies all work with Roadmap Epigenomics read depth data as opposed to p-values obtained from the ENCODE pipelines, making it difficult to make direct comparisons.

      We outline in the Discussion section that creating a comprehensive dataset of epigenome editing outcomes, which includes quantification of histone PTMs before and after in situ perturbations, will improve our understanding of the effects of dCas9-p300 on gene expression and assist in the design of gRNAs for achieving fine-tuned control over gene expression levels.

      Furthermore from the model evaluation of H3K9me3 it seems the model is not performing well for epigenetic or transcriptional editing- e.g. we know for the best studied transcriptional editor which is CRISPRi (dCas9-KRAB) that recruitment to a locus is associated with robust gene repression across the genome and is associated with H3K9me3 deposition by recruitment of KAP1/HP1/SETDB1 (PMID: 35688146, 31980609, 27980086, 26501517). However, it seems from Figures 2&4 that the model wouldn't be able to evaluate or predict this.

      We thank the reviewer for their comment. We have included a supplementary figure, Figure 4 – figure supplement 1, that quantifies how sensitive the trained gene expression model is to perturbations in H3K9me3. Indeed, our data suggest that the model predictions are sensitive to perturbations in H3K9me3. For instance, there is a clear decrease and then a gradual increase as the position where the perturbation is performed moves from upstream to downstream of the TSS. Additionally, the magnitude of the predicted fold-change is a function of how much the H3K9me3 signal is perturbed, and hence the magnitude of change would be even higher if the perturbation magnitude were increased. However, this precise magnitude is hard to estimate in the absence of experimental perturbation data for H3K9me3.
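
      The dependence of predicted fold-change on perturbation position and magnitude can be illustrated with a minimal sketch. This is not the authors' model: it assumes a toy linear predictor of log2 expression over three hypothetical bins (upstream, TSS, downstream), and the weights and signal values are invented purely for illustration.

```python
def predict_log2_expression(signal, weights, bias=0.0):
    """Toy linear predictor: weighted sum of per-bin histone signal.

    Stands in for a trained model; the real model is assumed to be
    more complex than this.
    """
    return bias + sum(w * s for w, s in zip(weights, signal))


def predicted_fold_change(signal, weights, bin_index, scale):
    """Predicted fold change after scaling the signal in one bin."""
    perturbed = list(signal)
    perturbed[bin_index] *= scale
    delta = (predict_log2_expression(perturbed, weights)
             - predict_log2_expression(signal, weights))
    return 2 ** delta  # output is treated as log2 expression


# Hypothetical weights for a repressive mark: strongest effect at the TSS bin.
weights = [-0.1, -0.5, -0.2]   # upstream, TSS, downstream (made up)
signal = [1.0, 2.0, 1.0]       # made-up baseline signal per bin

fc_tss_2x = predicted_fold_change(signal, weights, bin_index=1, scale=2.0)
fc_tss_3x = predicted_fold_change(signal, weights, bin_index=1, scale=3.0)
fc_upstream_2x = predicted_fold_change(signal, weights, bin_index=0, scale=2.0)
```

      In this toy setting, a larger perturbation at the same bin gives a stronger predicted repression, and the same perturbation gives a weaker effect at a bin with a smaller weight, which mirrors the qualitative behavior described in the response.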

      The model seems to predict gene expression for endogenous genes quite well although the authors sometimes use expression and sometimes use rank (e.g. Figure 6) - being clearer with how the model predicts expression rather than using rank or fold change would be very useful.

      We thank the reviewer for this important suggestion. We have added text in the revised manuscript to clarify that the model predicts gene expression values, which can be interpreted as rank or fold change, depending on the use case.

      One concern overall with this approach is that dCas9-p300 has been observed to induce sgRNA-independent off-target H3K27Ac (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8349887/ see Figure S5D) which could convolute interpretation of this type of experiment for the model.

      This is an excellent point and indeed, we and others have observed that dCas9-p300 can result in off-target H3K27ac levels (both increased and suppressed) across the genome. However, p300 is one of the few known proteins that can catalyze H3K27ac in the human genome, and H3K27ac remains a proxy for active genomic regulatory elements. Nevertheless, dCas9-p300 off target activity could certainly convolute our approach. We have included language to address this caveat in our discussion. Interestingly, even though dCas9-p300 (and other epigenome editing enzymes) can lead to off-target chromatin modifications, these effects often occur without coincident disruptions to the transcriptome. This suggests that many chromatin modifications, while “supportive” or “instructive” of/for transcription, may be insufficient (either alone or in the context of dCas9-based fusions) for transcriptional effects.

      Figure 2

      It seems this figure presents known rather than novel findings from the authors' description. Please comment on whether there are any new findings in this figure. Please comment on differences in patterns of repressive and activating histone PTMs between cell lines (e.g. H1-Esc H3K27me3 green 25-50% is more enriched than red 0-25%).

      Thank you for pointing out this issue. We have revised the text in both the Results and Discussion sections to better articulate that the goal of this figure is to validate the hypothesis that there are consistent patterns of histone PTMs with respect to gene expression across different human cell types.

      In Figure 2, which illustrates the raw histone marks data, the non-monotonic behavior of H3K27me3 in H1-hESC cells is indicative of a real biological phenomenon. This interpretation is supported by the relatively low Pearson correlation for the H3K27me3 mark observed in these cells, as documented in Figure 1b of another study: https://www.biorxiv.org/content/10.1101/2024.03.29.587323v1.

      Figure 3&4

      There are a number of approaches including DeepChrome and TransferChrome that predict endogenous gene expression from histone PTMs. I appreciate that the authors have not used the histone PTM data to predict gene expression levels of an "average cell" but rather that they are predicting expression within specific cell types or for unseen cell types. But from what is presented it isn't clear that the author's model is better or enabling beyond other approaches. The authors should show their model is better than other approaches or make clear why this is a significant advance that will be enabling for the field. For example is it that in this approach they are actually predicting expression levels whereas previous approaches have only predicted expressed or not expressed or a rank order or bin-based ranking?

      We thank the reviewer for this comment. We have added text to clarify the difference between our approach and existing approaches. There are two key differences between our model and other approaches. First, the gene expression model that we have trained here predicts continuous gene expression values rather than classifying expression as either high or low. Second, we have trained our models on ENCODE p-value data instead of read depths obtained from the Roadmap Epigenomics Consortium.

      Figure 5

      From the methods, it seems gene activation is measured by qPCR in HEK293 cells transfected with individual sgRNAs and dCas9-p300. The cells aren't selected or sorted before qPCR, so how are we sure that some of the variability isn't due to variable DNA quality or variable transfection efficiency?

      This is a good question. All DNA preps were generated using high-quality reagents and consistent protocols. In addition, the only variable that changed with respect to transfection efficiency was the gRNA-encoding vector used in qPCR assays. We have added new data which demonstrate that transfection efficiency is consistent across experiments (Figure 5 – figure supplement 1). We have also added to the manuscript additional experimental data and a computational analysis of a new dCas9-p300 based Perturb-seq dataset (Figure 6 – figure supplement 1). These experiments use lentiviral transduction with RNA-seq as the readout and are thus buffered against the variances mentioned by the Reviewer.

      Figure 6

      The use of rank in 6D and 6E is confusing. In 6D a higher rank is associated with higher expression while in 6E a higher rank seems to mean a lower fold change e.g. CYP17A1 has a low predicted fold-change rank and qPCR fold-change rank but in Figure 5 a very high qPCR fold change. Labeling this more clearly or explaining it in the text further would be useful.

      We thank the reviewer for their suggestion. We have made relevant changes to the caption of Figure 6 to clarify this.

      Reviewer #2 (Public Review):

      Summary:

      The authors build a gene expression model based on histone post-translational modifications and find that H3K27ac is correlated with gene expression. They proceed to perturb H3K27ac at 8 gene promoters, and measure gene expression changes to test their model.

      Strengths:

      The combination of multiple methods to model expression, along with utilizing 6 histone datasets in 13 cell types allowed the authors to build a model that correlates between 0.7-0.79 with gene expression. This group also utilized a tool they are experts in, dCas9-p300 fusions to perturb H3K27ac and monitor gene expression to test their model. Ranked correlations showed some support for the predictions after the perturbation of H3K27ac.

      Weaknesses:

      The perturbation of only 8 genes, and the only readout being qPCR-based gene expression, as opposed to including H3K27ac, weakened their validation of the computational model. Likewise, the use of six genes that were not expressed being most activated by dCas9-p300 might weaken the correlations vs. looking at a broad range of different gene expressions as the original model was trained on.

      We thank the reviewer for their comments. We have added to the manuscript additional experimental data and a computational analysis of a new dCas9-p300 based Perturb-seq dataset. We observe that the models we have developed are able to predict the fold-change rank across genes reasonably well (Figure 6 – figure supplement 1), similar to what we observe in Figure 6E.

      Reviewer #1 (Recommendations For The Authors):

      The authors should comment on how their model is different from or better than other models that use histone PTM data to predict gene expression.

      We thank the reviewer for this insightful suggestion. We have included citations for a series of papers (PMIDs: 27587684, 30147283, 36588793) that performed gene expression prediction using histone PTM data. However, each of these methods performs classification of gene expression as opposed to predicting the actual gene expression value via regression. Additionally, the referenced studies all work with Roadmap Epigenomics read depth data as opposed to p-values obtained from the ENCODE pipelines, making it difficult to make direct comparisons.

      The authors need to make clear whether their model will apply to other common epigenetic or transcriptional editors such as CRISPRi/H3K9me3 which is widely used.

      In this study, we focus on the histone changes induced by p300. However, future studies may use the framework described in our manuscript and apply it to other transcriptional editors as well.

      The authors need to be clearer about where they are predicting expression and where they are using rank. Ideally, show both.

      We thank the reviewer for this important suggestion. We have added text in the revised manuscript to clarify that the model predicts gene expression values, which can be interpreted as rank or fold change, depending on the use case.
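
      The distinction between predicted values, fold changes, and ranks can be sketched as follows. This is only an illustration under stated assumptions: the numbers are hypothetical, fold change is assumed to be the ratio of predicted expression after and before perturbation, and the helper names (`ranks`, `pearson`, `spearman`) are not taken from the manuscript.

```python
def ranks(values):
    """Return 1-based ranks (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical predicted expression values for four genes (illustration only).
baseline = [2.0, 0.5, 8.0, 1.0]
perturbed = [4.0, 5.0, 8.8, 1.5]

# Assumed definition: fold change = predicted after / predicted before.
predicted_fc = [p / b for p, b in zip(perturbed, baseline)]

# Hypothetical measured fold changes (e.g., from qPCR), illustration only.
measured_fc = [3.1, 40.0, 1.0, 1.2]

rho = spearman(predicted_fc, measured_fc)
```

      In this toy case the predicted and measured fold changes differ in magnitude but agree in order, so the rank correlation is high even though the values themselves do not match.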

      The authors should ideally show a case where they use the model to make a prediction of genes that can and can not be activated by dCas9-p300 or other epigenetic editors and then prove this with experiments.

      Thank you for the excellent suggestion. While it is indeed relevant, exploring this would extend beyond the scope of our current study. We consider it a valuable topic for future research.

      Reviewer #2 (Recommendations For The Authors):

      The y-axis in 5C needs to be labeled. The authors state it is "relative mRNA" but these numbers correlated with fold changes shown in Table S2.

      We have clarified the definition of the Y-axis in the caption for Figure 5C.

    1. eLife Assessment

      This valuable study presents a meta-analysis of the literature, confirming the relationship between the coupling of slow oscillations and fast spindles in memory formation, although the reported effects are weak and should be more clearly justified. Furthermore, while the evidence is convincing overall, the manuscript provides an incomplete description of the methods, which may challenge comprehension for readers unfamiliar with advanced statistical techniques. This study will be of interest to neuroscientists focusing on sleep and memory.

    2. Reviewer #1 (Public review):

      In this meta-analysis, Ng and colleagues review the association between slow-oscillation spindle coupling during sleep and overnight memory consolidation. The coupling of these oscillations (and also hippocampal sharp-wave ripples) has been central to theories and mechanistic models of active systems consolidation, which posit that the coupling between ripples, spindles, and slow oscillations (SOs) coordinates and drives the reactivation of memories in hippocampus and cortex, facilitating cross-regional information transfer and ultimately memory strengthening and stabilisation.

      Given the importance that these coupling mechanisms have been given in theory, this is a timely and important contribution to the literature in terms of determining whether these theoretical assumptions hold true in human data. The results show that the timing of sleep spindles relative to the SO phase, and the consistency of that timing, predicted overnight memory consolidation in meta-analytic models. The overall amount of coupling events did not show as strong a relationship. The coupling phase in particular was moderated by a number of variables including spindle type (fast, slow), channel location (frontal, central, posterior), age, and memory type. The main takeaway is that fast spindles that consistently couple close to the peak of the SO in frontal channel locations are optimal for memory consolidation, in line with theoretical predictions.

      I did not follow the logic behind including spindle amplitude in the meta-analysis. This is not a measure of SO-spindle coupling (which is the focus of the review), unless the authors were restricting their analysis of the amplitude of coupled spindles only. It doesn't sound like this is the case though. The effect of spindle amplitude on memory consolidation has been reviewed in another recent meta-analysis (Kumral et al, 2023, Neuropsychologia). As this isn't a measure of coupling, it wasn't clear why this measure was included in the present meta-analysis. You could easily make the argument that other spindle measures (e.g., density, oscillatory frequency) could also have been included, but that seems to take away from the overall goal of the paper which was to assess coupling.

      At the end of the first paragraph of section 3.1 (page 13), the authors suggest their results "... further emphasise the role of coupling compared to isolated oscillation events in memory consolidation". This had me wondering how many studies actually test this. For example, in a hierarchical regression model, would coupled spindles explain significantly more variance than uncoupled spindles? We already know that spindle activity, independent of whether they are coupled or not, predicts memory consolidation (e.g., Kumral meta-analysis). Is the variance in overnight memory consolidation fully explained by just the coupled events? If both overall spindle density and coupling measures show an equal association with consolidation, then we couldn't conclude that coupling compared to isolated events is more important.

      It was very interesting to see that the relationship between the fast spindle coupling phase and overnight consolidation was strongest in the frontal electrodes. Given this, I wonder why memory promoting fast spindles shows a centro-parietal topography? Surely it would be more adaptive for fast spindles to be maximally expressed in frontal sites. Would a participant who shows a more frontal topography of fast spindles have better overnight consolidation than someone with a more canonical centro-parietal topography? Similarly, slow spindles would then be perfectly suited for memory consolidation given their frontal distribution, yet they seem less important for memory.

      The authors rightly note the issues with multiple comparisons in sleep physiology and memory studies. Multiple comparison issues arise in two ways in this literature. First are comparisons across multiple electrodes (many studies now use high-density systems with 64+ channels). Second are multiple comparisons across different outcome variables (at least 3 ways to quantify coupling (phase, consistency, occurrence) x 2 spindle types (fast, slow). Can the authors make some recommendations here in terms of how to move the field forward, as this issue has been raised numerous times before (e.g., Mantua 2018, Sleep; Cox & Fell 2020, Sleep Medicine Reviews for just a couple of examples). Should researchers just be focusing on the coupling phase? Or should researchers always report all three metrics of coupling, and correct for multiple comparisons? I think the use of pre-registration would be beneficial here, and perhaps could be noted by the authors in the final paragraph of section 3.5, where they discuss open research practices.

    3. Reviewer #2 (Public review):

      Summary:

      This article reviews the studies on the relationship between slow oscillation (SO)-spindle (SP) coupling and memory consolidation. It innovatively employs non-normal circular linear correlations through a Bayesian meta-analysis. A systematic analysis of the retrieved studies highlighted that co-coupling of SO and the fast SP's phase and amplitude at the frontal part better predicts memory consolidation performance. I only have a few comments that I recommend are addressed.

      Major Comments:

      Regarding the Moderator of Age: Although the authors discuss the limited studies on the analysis of children and elders regarding age as a moderator, the figure shows a significant gap between the ages of 40 and 60. Furthermore, there are only a few studies involving participants over the age of 60. Given the wide distribution of effect sizes from studies with participants younger than 40, did the authors test whether removing studies involving participants over 60 would still reveal a moderator effect?

    4. Reviewer #3 (Public review):

      This manuscript presents a meta-analysis of 23 studies, which report 297 effect sizes, on the effect of SO-spindle coupling on memory performance. The analysis has been done with great care, and the results are described in great detail. In particular, there are separate analyses for coupling phase, spindle amplitude, coupling strength (e.g., measured by vector length or modulation index), and coupling percentage (i.e., the percentage of SPs coupled with SOs). The authors conclude that the precision and strength of coupling showed significant correlations with memory retention.

      There are two main points where I do not agree with the authors.

      First, the authors conclude that "SO-SP coupling should be considered as a general physiological mechanism for memory consolidation". However, the reported effect sizes are smaller than what is typically considered a "small effect" (0.10).

      Second, the study implements state-of-the-art Bayesian statistics. While some might see this as a strength, I would argue that it is the greatest weakness of the manuscript. A classical meta-analysis is relatively easy to understand, even for readers with only a limited background in statistics. A Bayesian analysis, on the other hand, introduces a number of subjective choices that render it much less transparent. This becomes obvious in the forest plots. It is not immediately apparent to the reader how the distributions for each study represent the reported effect sizes (gray dots). Presumably, they depend on the Bayesian priors used for the analysis. The use of these priors makes the analyses unnecessarily opaque, eventually leading the reader to question how much of the findings depend on subjective analysis choices (which might be answered by an additional analysis in the supplementary information). However, most of the methods are not described in sufficient detail for the reader to understand the proceedings. It might be evident for an expert in Bayesian statistics what a "prior sensitivity test" and a "posterior predictive check" are, but I suppose most readers would wish for a more detailed description. However, using a "Markov chain Monte Carlo (MCMC) method with the no-U-turn Hamiltonian Monte Carlo (HMC) sampler" and checking its convergence "through graphical posterior predictive checks, trace plots, and the Gelman and Rubin Diagnostic", which should then result in something resembling "a uniformly undulating wave with high overlap between chains" is surely something only rocket scientists understand. Whether this was done correctly in the present study cannot be ascertained because it is only mentioned in the methods and no corresponding results are provided. This kind of analysis seems not to be made to be intelligible to the average reader. It follows a recent trend of using more and more opaque methods. Where we had to trust published results a decade ago because the data were not openly available, today we must trust the results because the methods can no longer be understood with reasonable effort.
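
      For readers unfamiliar with the Gelman and Rubin diagnostic mentioned above, a minimal sketch of the basic statistic may help. It compares the variance between chains with the variance within chains; values close to 1 suggest the chains have mixed. The function name `gelman_rubin` and the toy chains are assumptions for illustration, and this is the basic textbook form rather than whatever variant the authors' software computes.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains.

    chains: list of lists, one list of posterior draws per MCMC chain.
    """
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = mean([variance(chain) for chain in chains])
    # B: n times the sample variance of the chain means.
    between = n * variance(chain_means)
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Toy chains (illustration only): two chains exploring the same region,
# and two chains stuck in different regions.
mixed = [[0, 1] * 50, [1, 0] * 50]
stuck = [[0, 1] * 50, [10, 11] * 50]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

      Chains that overlap give a value near 1, whereas chains that sit in different regions give a value far above 1, which is the numerical counterpart of trace plots that fail to overlap.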

      In one point the method might not be sufficiently justified. The method used to transform circular-linear r (actually, all references cited by the authors for circular statistics use r² because there can be no negative values) into "Z_r", seems partially plausible and might be correct under the H0. However, Figure 12.3 seems to show that under the alternative Hypothesis H1, the assumptions are not accurate (peak Z_r=~0.70 for r=0.65). I am therefore, based on the presented evidence, unsure whether this transformation is valid. Also, saying that Z_r=-1 represents the null hypothesis and Z_r=1 the alternative hypothesis can be misinterpreted, since Z_r=0 also represents the null hypothesis and is not half way between H0 and H1.
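
      The remark that a circular-linear correlation cannot be negative can be illustrated with the commonly used estimator built from the correlations of the linear variable with the cosine and sine of the angle. This is a sketch assuming that standard estimator; the function names and the toy data are hypothetical, and it does not reproduce the manuscript's transformation to Z_r.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def circ_linear_r(angles, values):
    """Circular-linear correlation between angles (radians) and a linear variable."""
    cosines = [math.cos(a) for a in angles]
    sines = [math.sin(a) for a in angles]
    r_xc = pearson(values, cosines)
    r_xs = pearson(values, sines)
    r_cs = pearson(cosines, sines)
    r_squared = (r_xc ** 2 + r_xs ** 2 - 2 * r_xc * r_xs * r_cs) / (1 - r_cs ** 2)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(r_squared, 0.0))


# Toy data (illustration only): phases evenly spaced around the circle.
phases = [2 * math.pi * k / 36 for k in range(36)]
peaks_at_zero = [math.cos(a) for a in phases]
peaks_at_pi = [-math.cos(a) for a in phases]

r_zero = circ_linear_r(phases, peaks_at_zero)
r_pi = circ_linear_r(phases, peaks_at_pi)
```

      Both toy datasets give the same correlation, although the linear variable peaks at opposite phases. The statistic measures how strongly the variable depends on phase, not the direction of the dependence, which is why it has no sign.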

    5. Author response:

      Reviewer #1 (Public review):

      I did not follow the logic behind including spindle amplitude in the meta-analysis. This is not a measure of SO-spindle coupling (which is the focus of the review), unless the authors were restricting their analysis of the amplitude of coupled spindles only. It doesn't sound like this is the case though. The effect of spindle amplitude on memory consolidation has been reviewed in another recent meta-analysis (Kumral et al, 2023, Neuropsychologia). As this isn't a measure of coupling, it wasn't clear why this measure was included in the present meta-analysis. You could easily make the argument that other spindle measures (e.g., density, oscillatory frequency) could also have been included, but that seems to take away from the overall goal of the paper which was to assess coupling.

      Indeed, spindle amplitude refers to all spindle events rather than only coupled spindles. This choice was made because we recognized the challenge of obtaining relevant data from each study—only 4 out of the 23 included studies performed their analyses after separating coupled and uncoupled spindles. This inconsistency strengthens the urgency and importance of this meta-analysis to standardize the methods and measures used for future analysis on SO-SP coupling and beyond. We agree that focusing on the amplitude of coupled spindles would better reveal their relations with coupling, and we will discuss this limitation in the manuscript.

      Nevertheless, we believe including spindle amplitude in our study remains valuable, as it served several purposes. First, SO-SP coupling involves the modulation between spindle amplitude and slow oscillation phase. Different studies have reported conflicting conclusions regarding how spindle amplitude was related to coupling: some found significant correlations (e.g., Baena et al., 2023), while others did not (e.g., Roebber et al., 2022). This discrepancy highlights an indirect but potentially crucial insight into the role of spindle amplitude in coupling dynamics. Second, in studies related to SO-SP coupling, spindle amplitude is one of the most frequently reported measures along with other coupling measures that significantly correlated with overnight memory improvements (e.g., Kurz et al., 2023; Ladenbauer et al., 2021; Niknazar et al., 2015), so we believe that including this measure provides a more comprehensive review of the existing literature on SO-SP coupling. Third, incorporating spindle amplitude allows for a direct comparison between the measurement of coupling and individual events alone in their contribution to memory consolidation, a question that has been extensively explored in recent research (e.g., Hahn et al., 2020; Helfrich et al., 2019; Niethard et al., 2018; Weiner et al., 2023). Finally, spindle amplitude was identified as a key moderator for memory consolidation in Kumral et al.'s (2023) meta-analysis. By including it in our analysis, we sought to replicate their findings within a broader framework and introduce conceptual overlaps with existing reviews. Therefore, although we were not able to selectively include coupled spindles, there is still a unique relation between spindle amplitude and SO-SP coupling that other spindle measures do not have.

      Originally, we also intended to include coupling density or counts in the analysis, which seems more relevant to the coupling metrics. However, the lack of uniformity in methods used to measure coupling density posed a significant limitation. We hope that our study will encourage consistent reporting of all relevant parameters in future research, enabling future meta-analyses to incorporate these measures comprehensively. We will add this discussion to the manuscript in the revised version to further clarify these points.

      References:

      Roebber, J. K., Lewis, P. A., Crunelli, V., Navarrete, M. & Hamandi, K. Effects of anti-seizure medication on sleep spindles and slow waves in drug-resistant epilepsy. Brain Sci. 12, 1288 (2022). https://doi.org/10.3390/brainsci12101288

      All other citations were referenced in the manuscript.

      At the end of the first paragraph of section 3.1 (page 13), the authors suggest their results "... further emphasise the role of coupling compared to isolated oscillation events in memory consolidation". This had me wondering how many studies actually test this. For example, in a hierarchical regression model, would coupled spindles explain significantly more variance than uncoupled spindles? We already know that spindle activity, independent of whether they are coupled or not, predicts memory consolidation (e.g., Kumral meta-analysis). Is the variance in overnight memory consolidation fully explained by just the coupled events? If both overall spindle density and coupling measures show an equal association with consolidation, then we couldn't conclude that coupling compared to isolated events is more important.

      While primary coupling measurements, including coupling phase and strength, showed strong evidence for their associations with memory consolidation, measures of spindles, including spindle amplitude, only exhibited limited evidence (or “non-significant” effect) for their association with consolidation. These results are consistent with multiple empirical studies using different techniques (e.g., Hahn et al., 2020; Helfrich et al., 2019; Niethard et al., 2018; Weiner et al., 2023), which reported that coupling metrics are more robust predictors of consolidation and synaptic plasticity than spindle or slow oscillation metrics alone. However, we agree with the reviewer that we did not directly separate the effect between coupled and uncoupled spindles, and a more precise comparison would involve contrasting the “coupling of oscillation events” with “individual oscillation events” rather than coupling versus isolated events.

      We recognized that Kumral and colleagues’ meta-analysis reported a moderate association between spindle measures and memory consolidation (e.g., for spindle amplitude-memory association they reported an effect size of approximately r = 0.30). However, one of the advantages of our study is that we actively cooperated with the authors to obtain a large number of unreported and insignificant data relevant to our analysis, as well as separated data that were originally reported under mixed conditions. This approach decreases the risk of false positives and selective reporting of results, making the effect size more likely to approach the true value. In contrast, we found only a weak effect size of r = 0.07 with minimal evidence for spindle amplitude-memory relation. However, we agree with the reviewer that using a more conservative term in this context would be a better choice since we did not measure all relevant spindle metrics including the density.

      To improve clarity in our manuscript, we will revise the statement to: “Together with other studies included in the review, our results suggest a crucial role of coupling but did not support the role of spindle events alone in memory consolidation,” and provide relevant references. We believe this can more accurately reflect our findings and the existing literature to address the reviewer’s concern.

      It was very interesting to see that the relationship between the fast spindle coupling phase and overnight consolidation was strongest in the frontal electrodes. Given this, I wonder why memory promoting fast spindles shows a centro-parietal topography? Surely it would be more adaptive for fast spindles to be maximally expressed in frontal sites. Would a participant who shows a more frontal topography of fast spindles have better overnight consolidation than someone with a more canonical centro-parietal topography? Similarly, slow spindles would then be perfectly suited for memory consolidation given their frontal distribution, yet they seem less important for memory.

      Regarding the topography of fast spindles and their relationship to memory consolidation, we agree this is an intriguing issue, and we have already made significant progress on this topic in our ongoing work. We share a few relevant observations: First, there are significant discrepancies in the definition of “slow spindle” in the field. Some studies defined slow spindles as 9-12 Hz (e.g., Mölle et al., 2011; Kurz et al., 2021), while others performed the event detection within a range of 11-13/14 Hz (e.g., Barakat et al., 2011; D'Atri et al., 2018). Compounding this issue, individual differences in spindle frequency are often overlooked, leading to challenges in reliably distinguishing between slow and fast spindles. Some studies have reported difficulty in clearly separating the two types of spindles altogether (e.g., Hahn et al., 2020). Moreover, a critical factor often ignored in past research is the traveling nature of both slow oscillations and spindles across the cortex, where spindles are coupled with significantly different phases of slow oscillations (see Figure 5). We believe a better understanding of coupling in the context of the movement of these waves will help us better understand the observed frontal relationship with consolidation. We will address this in our revised manuscript.

      The authors rightly note the issues with multiple comparisons in sleep physiology and memory studies. Multiple comparison issues arise in two ways in this literature. First are comparisons across multiple electrodes (many studies now use high-density systems with 64+ channels). Second are multiple comparisons across different outcome variables (at least 3 ways to quantify coupling (phase, consistency, occurrence) x 2 spindle types (fast, slow). Can the authors make some recommendations here in terms of how to move the field forward, as this issue has been raised numerous times before (e.g., Mantua 2018, Sleep; Cox & Fell 2020, Sleep Medicine Reviews for just a couple of examples). Should researchers just be focusing on the coupling phase? Or should researchers always report all three metrics of coupling, and correct for multiple comparisons? I think the use of pre-registration would be beneficial here, and perhaps could be noted by the authors in the final paragraph of section 3.5, where they discuss open research practices.

      There are indeed multiple methods that we can discuss, including cluster-based and non-parametric methods, etc., to correct for multiple comparisons in EEG data with spatiotemporal structures. In addition, encouraging the reporting of all tested but insignificant results, at least in supplementary materials, is an important practice that helps readers understand the findings with reduced bias. We agree with the reviewer’s suggestions and will add more information in section 3.5 to advocate for a standardized “template” used to analyze and report effect size in future research.

      We advocate for the standardization of reporting all three coupling metrics– phase, consistency, and occurrence. Each coupling metric captures distinct properties of the coupling process and may interact with one another (Weiner et al., 2023). Therefore, we believe it is essential to report all three metrics to comprehensively explore their different roles in the “how, what, and where” of long-distance communication and consolidation of memory. As we advance toward a deeper understanding of the relationship between memory and sleep, we hope this work establishes a standard for the standardization, transparency, and replication of relevant studies.
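
      As a simple illustration of correcting across the six outcome variables raised by the reviewer (3 coupling metrics x 2 spindle types), the sketch below applies the Holm step-down procedure. Note that this is a stand-in chosen for brevity, not the cluster-based or permutation methods mentioned above, and the p-values are hypothetical.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down correction; returns one reject/keep flag per input p-value."""
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, index in enumerate(order):
        # The threshold relaxes as hypotheses are rejected: alpha/m, alpha/(m-1), ...
        if p_values[index] <= alpha / (m - step):
            reject[index] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values, illustration only, ordered as:
# fast phase, fast consistency, fast occurrence,
# slow phase, slow consistency, slow occurrence.
p_values = [0.001, 0.04, 0.03, 0.5, 0.008, 0.2]
decisions = holm(p_values)
```

      With these toy values, two of the three tests that fall below an uncorrected 0.05 threshold no longer survive, which shows how reporting all metrics with correction changes the picture compared with reporting only the nominally significant ones.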

      Reviewer #2 (Public review):

      Regarding the Moderator of Age: Although the authors discuss the limited studies on the analysis of children and elders regarding age as a moderator, the figure shows a significant gap between the ages of 40 and 60. Furthermore, there are only a few studies involving participants over the age of 60. Given the wide distribution of effect sizes from studies with participants younger than 40, did the authors test whether removing studies involving participants over 60 would still reveal a moderator effect?

      We agree that there is an age gap between younger and older adults, as current studies often focus on contrasting newly matured and fully aged populations to amplify the effect, while neglecting the gradual changes in memory consolidation mechanisms across the aging spectrum. We suggest that a non-linear analysis of age effects would be highly valuable, particularly when additional child and older adult data become available.

      In response to the reviewer’s suggestion, we re-tested the moderation effect of age after excluding effect sizes from older adults. The results revealed a decrease in the strength of evidence for phase-memory association due to increased variability, but were consistent for all other coupling parameters. The mean estimations also remained consistent (coupling phase-memory relation: -0.005 [-0.013, 0.004], BF10 = 5.51, the strength of evidence reduced from strong to moderate; coupling strength-memory relation: -0.005 [-0.015, 0.008], BF10 = 4.05, the strength of evidence remained moderate). These findings align with prior research, which typically observed a weak coupling-memory relationship in older adults during aging (Ladenbauer et al, 2021; Weiner et al., 2023) but not during development (Hahn et al., 2020; Kurz et al., 2021; Kurz et al., 2023). Therefore, this result is not surprising to us, and there are still observable moderate patterns in the data. We will report these additional results in the revised manuscript, and interpret “the moderator effect of age becomes less pronounced during development after excluding the older adult data”. We believe the original findings including the older adult group remain meaningful after cautious interpretation, given that the older adult data were derived from multiple studies and different groups.

      Reviewer #3 (Public review):

      First, the authors conclude that "SO-SP coupling should be considered as a general physiological mechanism for memory consolidation". However, the reported effect sizes are smaller than what is typically considered a "small effect" (0.10)

      While we acknowledge the concern about the small effect sizes reported in our study, it is important to contextualize these findings within the field of neuroscience, particularly memory research. Even in individual studies, small effect sizes are not uncommon due to the inherent complexity of the mechanisms involved and the multitude of confounding variables. This is an important factor to be considered in meta-analyses where we synthesize data from diverse populations and experimental conditions. For example, the relationship between SO-slow SP coupling and memory consolidation in older adults is expected to be insignificant.

      As Funder and Ozer (2019) concluded in their highly cited paper, an effect size of r = 0.3 in psychological and related fields should be considered large, with r = 0.4 or greater likely representing an overestimation and rarely found in a large sample or in a replication. Therefore, we believe r = 0.1 should not be considered as a lower bound of the small effect. Bakker et al. (2019) also advocate for a contextual interpretation of the effect size. This is particularly important in meta-analyses, where the results are less prone to overestimation compared to individual studies, and we cooperated with all authors to include a large number of unreported and insignificant results. In this context, small correlations may contain substantial meaningful information to interpret. Although we agree that effect sizes reported in our study are indeed small at the overall level, they reflect a rigorous analysis that incorporates robust evidence across different levels of moderators. Our moderator analyses underscore the dynamic nature of coupling-memory relationships, with certain subgroups demonstrating much stronger and more meaningful effects, especially after excluding slow spindles and older adults. For example, both the coupling phase and strength of frontal fast spindles with slow oscillations exhibited "moderate-to-large" correlations with the consolidation of different types of memory, especially in young adults, with r values ranging from 0.18 to 0.32. (see Table S9.1-9.4). We will add more discussion about the influence of moderators on the dynamics of coupling-memory associations. In addition, we will update the conclusion to be “SO-fast SP coupling should be considered as a general physiological mechanism for memory consolidation”.

      Reference:

      Funder, D. C. & Ozer, D. J. Evaluating effect size in psychological research: sense and nonsense. Adv. Methods Pract. Psychol. Sci. 2, 156–168 (2019). https://doi.org/10.1177/2515245919847202.

      Bakker, A. et al. Beyond small, medium, or large: Points of consideration when interpreting effect sizes. Educ. Stud. Math. 102, 1–8 (2019). https://doi.org/10.1007/s10649-019-09908-4

      Second, the study implements state-of-the-art Bayesian statistics. While some might see this as a strength, I would argue that it is the greatest weakness of the manuscript. A classical meta-analysis is relatively easy to understand, even for readers with only a limited background in statistics. A Bayesian analysis, on the other hand, introduces a number of subjective choices that render it much less transparent.

      This kind of analysis seems not to be made to be intelligible to the average reader. It follows a recent trend of using more and more opaque methods. Where we had to trust published results a decade ago because the data were not openly available, today we must trust the results because the methods can no longer be understood with reasonable effort.

      This becomes obvious in the forest plots. It is not immediately apparent to the reader how the distributions for each study represent the reported effect sizes (gray dots). Presumably, they depend on the Bayesian priors used for the analysis. The use of these priors makes the analyses unnecessarily opaque, eventually leading the reader to question how much of the findings depend on subjective analysis choices (which might be answered by an additional analysis in the supplementary information).

      We appreciate the reviewer for sharing this viewpoint and we value the opportunity to clarify some key points. To address the concern about clarity, we will include a sub-section in the methods section explaining how to interpret Bayesian statistics including priors, posteriors, and Bayes factors, making our results more accessible to those less familiar with this approach.

      On the use of Bayesian models, we believe there may have been a misunderstanding. Bayesian methods, far from being "opaque" or overly complex, are increasingly valued for their ability to provide nuanced, accurate, and transparent inferences (Sutton & Abrams, 2001; Hackenberger, 2020; van de Schoot et al., 2021; Smith et al., 1995; Kruschke & Liddell, 2018). They had been applied in more than 1,200 meta-analyses as of 2020 (Hackenberger, 2020). In our study, we used priors that assume no effect (mean set to 0, which aligns with the null) while allowing for a wide range of variation to account for large uncertainties. This approach reduces the risk of overestimation or false positives and demonstrates much-improved performance over traditional methods in handling variability (Williams et al., 2018; Kruschke & Liddell, 2018). Sensitivity analyses reported in the supplemental material (Table S9.1-9.4) confirmed the robustness of our choice of priors: our results did not change when different priors were set.

      As Kruschke and Liddell (2018) described, “shrinkage (pulling extreme estimates closer to group averages) helps prevent false alarms caused by random conspiracies of rogue outlying data,” a well-known advantage of Bayesian over traditional approaches. This explains the observed differences between the distributions and grey dots in the forest plots. Unlike p-values, which can be overestimated with a large sample size and underestimated with a small sample size, Bayesian methods make assumptions explicit, enabling others to challenge or refine them, an approach aligned with open science principles (van de Schoot et al., 2021). For example, a credible interval in a Bayesian model can be interpreted as “there is a 95% probability that the parameter lies within the interval,” while a confidence interval in a frequentist model means “in repeated experiments, 95% of the confidence intervals will contain the true value.” We believe the former is much more straightforward and convincing for readers to interpret. We will ensure our justification for using Bayesian models is more clearly presented in the manuscript.
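
      The shrinkage described above can be illustrated with a minimal sketch. It assumes a simple normal-normal model in which the pooled mean `mu` and the between-study standard deviation `tau` are fixed by hand; in a full Bayesian meta-analysis these would be estimated from the data under priors, typically by MCMC. The study labels, effect sizes, and standard errors are hypothetical and are not taken from the manuscript.

```python
def shrink(y, se, mu, tau):
    """Posterior mean and sd of one study's true effect.

    Normal-normal model: y ~ N(theta, se^2), theta ~ N(mu, tau^2).
    mu and tau are treated as known here (an assumption of this sketch).
    """
    w_data = 1.0 / se ** 2      # precision of the observed effect
    w_prior = 1.0 / tau ** 2    # precision of the group-level distribution
    post_var = 1.0 / (w_data + w_prior)
    post_mean = post_var * (w_data * y + w_prior * mu)
    return post_mean, post_var ** 0.5


# Hypothetical studies: (label, observed effect, standard error)
studies = [("A", 0.60, 0.30), ("B", 0.20, 0.05), ("C", -0.40, 0.25)]
mu, tau = 0.10, 0.10            # assumed pooled mean and heterogeneity

shrunk = {name: shrink(y, se, mu, tau) for name, y, se in studies}
for name, y, se in studies:
    mean, sd = shrunk[name]
    print(f"{name}: observed {y:+.2f} (se {se:.2f}) -> shrunk {mean:+.3f} (sd {sd:.3f})")
```

      In this sketch the imprecise, extreme studies (A and C) are pulled strongly toward the pooled mean, while the precise study (B) barely moves, which is the pattern that separates a study's posterior distribution from its raw effect size in a forest plot.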

      We acknowledge that even with these justifications, researchers may still differ in their preferences for Bayesian or frequentist models. To make our reporting more transparent, we have also reported the traditional frequentist meta-analysis results in Supplemental Material 10 to demonstrate the robustness of our analysis; these suggested non-significant differences between Bayesian and frequentist models. We will include clearer references in the next version of the manuscript to direct readers to the figures that report the statistics provided by traditional models.

      References:

      Hackenberger, B.K. Bayesian meta-analysis now—let's do it. Croat. Med. J. 61, 564–568 (2020). https://doi.org/10.3325/cmj.2020.61.564

      Sutton, A.J. & Abrams, K.R. Bayesian methods in meta-analysis and evidence synthesis. Stat. Methods Med. Res. 10, 277–303 (2001). https://doi.org/10.1177/096228020101000404

      Williams, D.R., Rast, P. & Bürkner, P.C. Bayesian meta-analysis with weakly informative prior distributions. PsyArXiv (2018). https://doi.org/10.31234/osf.io/9n4zp

      van de Schoot, R., Depaoli, S., King, R. et al. Bayesian statistics and modelling. Nat Rev Methods Primers 1, 1 (2021). https://doi.org/10.1038/s43586-020-00001-2

      Smith, T.C., Spiegelhalter, D.J. & Thomas, A. Bayesian approaches to random-effects meta-analysis: a comparative study. Stat. Med. 14, 2685–2699 (1995). https://doi.org/10.1002/sim.4780142408

      Kruschke, J.K. & Liddell, T.M. The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychon. Bull. Rev. 25, 178–206 (2018). https://doi.org/10.3758/s13423-016-1221-4

      However, most of the methods are not described in sufficient detail for the reader to understand the proceedings. It might be evident for an expert in Bayesian statistics what a "prior sensitivity test" and a "posterior predictive check" are, but I suppose most readers would wish for a more detailed description. However, using a "Markov chain Monte Carlo (MCMC) method with the no-U-turn Hamiltonian Monte Carlo (HMC) sampler" and checking its convergence "through graphical posterior predictive checks, trace plots, and the Gelman and Rubin Diagnostic", which should then result in something resembling "a uniformly undulating wave with high overlap between chains" is surely something only rocket scientists understand. Whether this was done correctly in the present study cannot be ascertained because it is only mentioned in the methods and no corresponding results are provided. 

      We appreciate the reviewer’s concerns about accessibility and potential complexity in our descriptions of Bayesian methods. Our decision to provide a detailed account serves to enhance transparency and guide readers interested in replicating our study. We acknowledge that some terms may initially seem overwhelming. These steps, such as checking the MCMC chain convergence and robustness checks, are standard practices in Bayesian research and are analogous to “linearity”, “normality” and “equal variance” checks in frequentist analysis. We have provided exemplary plots in the supplemental material and will add more details to explain the interpretation of these convergence checks. We hope this will help address any concerns about methodological rigor.

      In one point the method might not be sufficiently justified. The method used to transform circular-linear r (actually, all references cited by the authors for circular statistics use r² because there can be no negative values) into "Z_r", seems partially plausible and might be correct under the H0. However, Figure 12.3 seems to show that under the alternative Hypothesis H1, the assumptions are not accurate (peak Z_r=~0.70 for r=0.65). I am therefore, based on the presented evidence, unsure whether this transformation is valid. Also, saying that Z_r=-1 represents the null hypothesis and Z_r=1 the alternative hypothesis can be misinterpreted, since Z_r=0 also represents the null hypothesis and is not half way between H0 and H1.

      First, we realized that in the titles of Figures 12.2 and 12.3, “true r = 0.35” and “true r = 0.65” should be corrected to “true Z_r”. The method we used here is to first generate an underlying population that has null (0), moderate (0.35), or large (0.65) Z_r correlations, then test whether the sampling distribution drawn from these populations followed a normal distribution across varying sample sizes. Nevertheless, the reviewer correctly noticed discrepancies between the reported true Z_r and its sampling distribution peak. This discrepancy arises because, when generating large population data, achieving exact values close to a strong correlation like Z_r = 0.65 is unlikely. We loop through simulations to generate population data and ensure their Z_r values fall within a threshold. For moderate effect sizes (e.g., Z_r = 0.35), this is straightforward using a narrow range (0.345 < Z_r < 0.355). However, for larger effect sizes like Z_r = 0.65, a wider range (0.6 < Z_r < 0.7) is required. Therefore, the population from which we draw the sample sometimes has a Z_r that deviates slightly from 0.65. This remains reasonable since the main point of this analysis is to ensure that large Z_r still has a normal sampling distribution, rather than to achieve exactly Z_r = 0.65.

      We acknowledge that this variability of the range used was not clearly explained and it is not accurate to report “true Z_r = 0.65”. In the revised version, we will address this issue by adding vertical lines to each subplot to indicate the Z_r of the population we used to draw samples, making it easier to check if it aligns with the sampling peak. In addition, we will revise the title to “Sampling distributions of Z_r drawn from strong correlations (Z_r = 0.6-0.7)”. We confirmed that population Z_r and the peak of their sampling distribution remain consistent under both H0 and H1 in all sample sizes with n > 25, and we hope this explanation can fully resolve your concern.

      We agree with the reviewer that claiming Z_r = -1 represents the null hypothesis is not accurate. The circlin Z_r = 0 is a better analogue of Pearson’s r = 0, since both represent the mean drawn from a population under the null hypothesis. In contrast, the mean effect size under the null is positive for the raw circlin r, which is one of the important reasons for the transformation. To provide a more accurate interpretation, we will update Table 6 to describe the following strength levels of evidence: no effect (r < 0), null (r = 0), small (r = 0.1), moderate (r = 0.3), and large (r = 0.5).

    1. eLife Assessment

      This manuscript describes a fundamental investigation of the functioning of Cas9, and in particular of how the xCas9 variant expands DNA targeting ability through an increased-flexibility mechanism. The authors provide compelling evidence to support their mechanistic models and the relevance of flexibility and entropy in recognition. This work can be of interest to a broad community of structural biophysicists, computational biologists, chemists, and biochemists.

    2. Joint Public Review:

      Summary:

      Hossain and coworkers investigate the mechanisms of recognition of xCas9, a variant of Cas9 with expanded targeting capability for DNA. They do so by using molecular simulations and combining different flavors of simulation techniques, ranging from long classical MD simulations, to enhanced sampling, to free energy calculations of affinity differences. Through this, the authors are able to develop a consistent model of expanded recognition based on the enhanced flexibility of the protein receptor.

      Strengths:

      The paper is solidly based on the ability of the authors to master molecular simulations of highly complex systems. In my opinion, this paper shows no major weaknesses. The simulations are carried out in a technically sound way. Comparative analyses of different systems provide valuable insights, even within the well-known limitations of MD. Plus, the authors further investigate why xCas9 exhibits improved recognition of the TGG PAM sequence compared to SpCas9 via well-tempered metadynamics simulations focusing on the binding of R1335 to the G3 nucleobase and the DNA backbone in both SpCas9 and xCas9. In this context, the authors provide a free-energy profiling that helps support their final model.

      The implementation of FEP calculations to mimic directed evolution improvement of DNA binding is also interesting, original and well-conducted.

      Overall, my assessment of this paper is that it represents a strong manuscript, competently designed and conducted, and highly valuable from a technical point of view.

      Weaknesses:

      To make their impact even more general, the authors may consider expanding their discussion on entropic binding to other cases recently presented in the literature (e.g., the identification of small molecules for Abeta peptides, or the identification of "fuzzy" mechanisms of binding to the protein HMGB1). The point on flexibility helping adaptability and expansion of functional properties is important, and should probably be given more evidence and more direct links with a wider picture.

    1. eLife Assessment

      This study reports valuable findings that highlight the importance of data quality and data representation for ligand-based virtual screening experiments. The authors' claims are supported by solid evidence, although the conclusions have been inferred from only two datasets. The work would gain much impact if additional datasets were used. The main findings will be of interest to cheminformaticians and medicinal chemists working in QSAR modeling, and possibly in other areas related to machine learning.

    2. Reviewer #1 (Public review):

      Summary:

      The work provides more evidence of the importance of data quality and representation for ligand-based virtual screening approaches. The authors have applied different machine learning (ML) algorithms and data representations using a new dataset of BRAF ligands. First, the authors evaluate the ML algorithms and demonstrate that, independently of the ML algorithm, predictive and robust models can be obtained with this BRAF dataset. Second, the authors investigate how the molecular representations can modify the prediction of the ML algorithm. They found that in this highly curated dataset the different molecular representations are adequate for the ML algorithms, since almost all of them obtain high accuracy values, with Estate fingerprints yielding the worst-performing predictive models and ECFP6 fingerprints producing the best classification models. Third, the authors evaluate the performance of the models on subsets of the BRAF dataset that differ in composition and size. They found that, given a finite number of active compounds, increasing the number of inactive compounds worsens the recall and accuracy. Finally, the authors analyze whether the use of "less active" molecules affects the model's predictive performance, using "less active" molecules taken from the ChEMBL database or decoys from DUD-E. As a result, they found that the accuracy of the model falls as the number of "less active" examples in the training dataset increases, while the inclusion of decoys in the training set generates results as good as the original models or even better in some cases. However, the use of decoys in the training set worsens the predictive power on test sets that contain active and inactive molecules.

      Strengths:

      This is a highly relevant topic in medicinal chemistry and drug discovery. The manuscript is well-written, with a clear structure that facilitates easy reading, and it includes up-to-date references. The hypotheses are clearly presented and appropriately explored. The study provides valuable insights into the importance of deriving models from high-quality data, demonstrating that, when this condition is met, complex computational methods are not always necessary to achieve predictive models. Furthermore, the generated BRAF dataset offers a valuable resource for medicinal chemists working in ligand-based virtual screening.

      Weaknesses:

      While the work highlights the importance of using high-quality datasets to achieve better and more generalizable results, it does not present significant novelty, as the analysis of training data has been extensively studied in chemoinformatics and medicinal chemistry. Additionally, the inclusion of "AI" in the context of data-centric AI is somewhat unclear, given that the dataset curation is conducted manually, selecting active compounds based on IC50 values from ChEMBL and inactive compounds according to the authors' criteria.

      Moreover, the conclusions are based on the analysis of only two high-quality datasets. To generalize these findings, it would be beneficial to extend the analysis to additional high-quality datasets (at least 10 datasets for a robust benchmarking exercise).

      A key aspect that could be improved is the definition of an "inactive" compound, which remains unclear. In the manuscript, it is stated:

      • "The inactives were carefully selected based on the fact that they have no known pharmacological activity against BRAF."
      Does the lack of BRAF activity data necessarily imply that these compounds are inactive?
      • "We define a compound as 'inactive' if there are no known pharmacological assays for the said compound on our target, BRAF."
      However, in the authors' response, they mention:
      • "We selected certain compounds that we felt could not possibly be active against BRAF, such as ligands for neurotransmitter receptors, as inactives."

      Given that the definition of "inactive" is one of the most critical concepts in the study, I believe it should be clearly and consistently explained.

      Lastly, while statistical comparison is not always common in machine learning, it would greatly enhance the value of this work, especially when comparing models with small differences in accuracy.
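
      As one hedged illustration of such a comparison, the sketch below applies an exact McNemar test to two classifiers evaluated on the same test set. The choice of test and the counts are assumptions made for illustration; they are not taken from the manuscript. Only the discordant predictions enter the test: `b` counts compounds that model A classifies correctly and model B misclassifies, and `c` counts the reverse.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Under the null hypothesis that both models have the same error rate,
    each discordant case is equally likely to favour either model, so
    min(b, c) follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant predictions: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts: A right / B wrong on 8 compounds, the reverse on 2.
p_value = mcnemar_exact(8, 2)
print(f"exact McNemar p = {p_value:.4f}")
```

      With these hypothetical counts the difference between the two models would not be significant at the 0.05 level, even though one model makes fewer errors, which is why small accuracy differences benefit from a formal test.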

    3. Reviewer #2 (Public review):

      Summary:

      The authors explored the importance of data quality and representation for ligand-based virtual screening approaches. I believe the results could be of potential benefit to the drug discovery community, especially to those scientists working in the field of machine learning applied to drug research. The in silico design is comprehensive and adequate for the proposed comparisons.

      This manuscript by Chong A. et al describes that it is not necessary to resort to sophisticated deep learning algorithms for virtual screening since, based on their results, conventional ML may perform exceptionally well if fed the right data and molecular representations.

      The article is interesting and well-written. The overview of the field and the warning about dataset composition are very well thought-out and should be of interest to a broad segment of the AI in drug discovery readership. This article further highlights some of the considerations that need to be taken into account when implementing data-centric AI for computer-aided drug design methods.

      Strengths:

      This study contributes significantly to the field of machine learning and data curation in drug discovery. The paper is, in general, well-written and structured. However, in my opinion, there are some suggestions regarding certain aspects of the data analyses.

      Weaknesses:

      The conclusions drawn in the study are based on the analysis of only two datasets. The authors chose BRAF as an example in this study, and expanded it with a BACE-1 dataset; however, a benchmark with several targets would be suitable to evaluate the reproducibility or transferability of the method. One concern could be the applicability of the method to other targets.

    4. Reviewer #3 (Public review):

      Summary:

      The authors presented a data-centric ML approach for virtual ligand screening. They used BRAF as an example to demonstrate the predictive power of their approach.

      Strengths:

      The performance of predictive models in this study is superior (nearly perfect) with respect to existing methods.

      Comments on revisions:

      In the revised manuscript, the presented approach has been robustly tested and can be very useful for ligand prediction.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We thank the Editors and reviewers for their candid evaluation of our work. It was suggested that we demonstrate the validity of our approach with perhaps 10 different datasets, but we felt that this would place an undue burden on our resources. Generally, it takes about 4 to 6 months for us to build a dataset, and this does not include the time taken to train and test our AI models. This would mean that it would take us another 3 to 5 years to complete this research project if we chose to provide 10 different datasets. Publishing research on one dataset is definitely not unheard of: for example, Subramanian et al. (2016) published their widely-cited benchmark dataset for just BACE1 inhibitors. However, we hoped that the additional work, in which we showed that we were able to improve the benchmark dataset for BACE1 inhibitors and achieve the same high level of predictive performance for this dataset, would convince the readers (and reviewers) of the reproducibility of our approach. Furthermore, we also showed that our approach is robust and does not rely on a large volume of data to achieve this near-perfect accuracy. As can be seen in the Supplemental section, even our AI models trained on ONLY 250 BRAF actives and 250 inactives could achieve 96.3% accuracy! Logically, if the model is robust then we would expect the model to be reproducible. As such, we do not feel it is necessary for us to test our approach on 10 different datasets.

      It was also suggested that we expand this study to other types of molecular representations to give a better idea of generalizability. We would like to point out that we tested, in total, 55 single fingerprints and paired combinations. Our goal was to create an approach that could give superior performance for virtual screening and we believe that we have achieved this. Based on the results of our study, we are of the opinion that molecular representations do not, in general, have an oversized effect on AI virtual screening. It is important to be aware that certain molecular representations may give SLIGHTLY better performance, but we can see that, with the exception of the 79-bit E-State fingerprint (which could still achieve an impressive 85% accuracy for the SVM model), nearly all molecular fingerprints and paired combinations that we used were able to achieve an accuracy of above 97%. Therefore, we do not share the reviewers' concern that our approach may not be useful when applied with other types of molecular representations.

      It is true that our work involved manual curation of the datasets, but the goal of this paper is to lay down some ground rules for the future development of a data-centric AI approach. Although manual curation is a routine practice in AI/ML, it should be recognised that there is good manual curation and bad manual curation, and rules need to be established to ensure we have good manual curation. Without these rules, we would also not be able to establish and train a data-centric AI. All manual curation involves a level of subjectiveness, but that subjectiveness comes from one's experience and domain knowledge of the field in which the AI is being applied. For example, in the case of this study, we relied on our knowledge and understanding of pharmacology to determine whether a compound is pharmacologically inactive or active. This may seem somewhat arbitrary to the uninitiated, but it is anything but arbitrary. It is through careful thought and assessment of the chemical compounds that we choose these compounds for training the AI. Unfortunately, this sort of subjective assessment cannot be easily or completely explained, but we do show where current practices have failed when building a dataset for training an AI for virtual screening.

    1. eLife Assessment

      This important study used an automated system to collect eggs laid over the course of multiple days by individual female Drosophila to successfully reveal a robust yet noisy circadian rhythm of egg-laying. Their results show that the neural control of this rhythm is entirely different from the one that controls locomotor activity rhythmicity. Preliminary connectome-based analyses provide evidence for connections between the relevant clock neurons and neurons involved in oviposition. The evidence provided is solid, although using an independent tool for targeted knockdown of clock genes and including the time series of representative individuals for all genotypes tested would help interpret the results.

    2. Joint Public Review:

      Riva et al uncovered the neural substrate underlying the oviposition rhythm in Drosophila melanogaster using a novel device that automates egg collection from individual mated females over the course of multiple days. By systematically knocking down the clock gene period in specific clock neurons the authors show that three cryptochrome (cry) positive dorso-lateral neurons (LNds) present in each hemisphere of the fly brain are critical to generating a female, sex-specific rhythm in oviposition. Interestingly, these neurons are not essential for freerunning locomotor activity. By contrast, the LNvs (lateral ventral neurons), which are essential for freerunning locomotor activity rhythmicity, were not involved in controlling the circadian rhythmicity of oviposition. Thus, this work has identified the first truly sex-specific circadian circuit in Drosophila. Using available Drosophila hemibrain connectome data they identify bidirectional connections between cry-expressing LNd and oviposition-related neurons.

      Strengths:

      This paper established a new semi-automatic device to register egg-laying activity in Drosophila and found a specific role for a subset of clock neurons in the control of a female-specific circadian behavior. The authors also lay the groundwork for understanding how these neurons are connected to the neurons that control egg laying.

      Weaknesses:

      (1) Controls for the genetic background are incomplete, leaving open the possibility that the observed oviposition timing defects may be due not to targeted knockdown of the period (per) gene but to the GAL4, Gal80, and UAS transgenes themselves. To resolve this issue the authors should determine the egg-laying rhythms of the relevant controls (GAL4/+, UAS-RNAi/+, etc); this only needs to be done for those genotypes that produced an arrhythmic egg-laying rhythm.

      (2) Reliance on a single genetic tool to generate targeted disruption of clock function leaves the study vulnerable to associated false positive and false negative effects: a) The per RNAi transgene used may only cause partial knockdown of gene function, as suggested by the persistent rhythmicity observed when per RNAi was targeted to all clock neurons. This could indicate that the results in Fig 2C-H underestimate the phenotypes of targeted disruption of clock function. b) Use of a single per RNAi transgene makes it difficult to rule out that off-target effects contributed significantly to the observed phenotypes. We suggest that the authors repeat the critical experiments using a separate UAS-RNAi line (for period or for a different clock gene), or, better yet, use the dominant negative UAS-cycle transgene produced by the Hardin lab (https://doi.org/10.1038/22566).

      (3) The egg-laying profiles obtained show clear damping/decaying trends which necessitates careful trend removal from the data to make any sense of the rhythm. Further, the detrending approach used by the authors is not tested for artefacts introduced by the 24h moving average used.

      (4) According to the authors the oviposition device cannot sample at a resolution finer than 4 hours, which will compel any experimenter to record egg laying for longer durations to have a suitably long time series which could be useful for circadian analyses.

      (5) Despite reducing the interference caused by manually measuring egg-laying, the rhythm does not improve the signal quality such that enough individual rhythmic flies could be included in the analysis methods used. The authors devise a workaround by combining both strongly and weakly rhythmic (LSpower > 0.2 but less than LSpower at p < 0.05) data series into an averaged time series, which is then tested for the presence of a 16-32h "circadian" rhythm. This approach loses valuable information about the phase and period present in the individual mated females, and instead assumes that all flies have a similar period and phase in their "signal" component while the distribution of the "noise" component varies amongst them. This assumption has not yet been tested rigorously and the evidence suggests a lot more variability in the inter-fly period for the egg-laying rhythm.

      (6) This variability could also depend on the genotype being tested, as the authors themselves observe between their Canton-S and YW wild-type controls for which their egg-laying profiles show clearly different dynamics. Interestingly, the averaged records for these genotypes are not distinguishable but are reflected in the different proportions of rhythmic flies observed. Unfortunately, the authors also do not provide further data on these averaged profiles, as they did for the wild-type controls in Figure 1, when they discuss their clock circuit manipulations using perRNAi. These profiles could have been included in Supplementary figures, where they would have helped the reader decide for themselves what might have been the reason for the loss of power in the LS periodogram for some of these experimental lines.

      (7) By selecting 'the best egg layers' for inclusion in the oviposition analyses an inadvertent bias may be introduced and the results of the assays may not be representative of the whole population.

      (8) An approach that measures rhythmicity for groups of individual records rather than separate individual records is vulnerable to outliers in the data, such as the inclusion of a single anomalous individual record. Additionally, the number of individual records that are included in a group may become a somewhat arbitrary determinant for the observed level of rhythmicity. Therefore, the experimental data used to map the clock neurons responsible for oviposition rhythms would be more convincing if presented alongside individual fly statistics, in the same format as used for Figure 1.

      (9) The features in the experimental periodogram data in Figures 3B and D are consistent with weakened complex rhythmicity rather than arrhythmicity. The inclusion of more individual records in the groups might have provided the added statistical power to demonstrate this. Graphs similar to those in 1G and 1I, might have better illustrated qualitative and quantitative aspects of the oviposition rhythms upon per knockdown via MB122B and Mai179; Pdf-Gal80.
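
      The artefact check raised in point (3) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes 4 h bins and a centered 24 h moving average (seven points, with half weight on the two end points), whereas the manuscript's average may be trailing or computed differently. The synthetic series, an exponentially decaying trend with or without a 24 h cosine, are invented for the test.

```python
import math

BIN_H = 4                    # sampling interval in hours (from the text)
PER_DAY = 24 // BIN_H        # 6 bins per 24 h


def centered_ma_24h(x):
    """Centered 24 h moving average for 4 h bins.

    Uses 7 points with half weight on the two end points so that the
    window spans exactly 24 h; returns None where the window does not fit.
    """
    half = PER_DAY // 2
    out = [None] * len(x)
    for i in range(half, len(x) - half):
        w = x[i - half:i + half + 1]
        out[i] = (0.5 * w[0] + sum(w[1:-1]) + 0.5 * w[-1]) / PER_DAY
    return out


def detrend(x):
    """Subtract the moving average; edge bins without a full window are dropped."""
    return [xi - ti for xi, ti in zip(x, centered_ma_24h(x)) if ti is not None]


hours = range(0, 7 * 24, BIN_H)                           # 7 days of 4 h bins
trend = [20.0 * math.exp(-t / 96.0) for t in hours]       # hypothetical damping
rhythm = [3.0 * math.cos(2 * math.pi * t / 24.0) for t in hours]

residual_trend_only = detrend(trend)
residual_with_rhythm = detrend([a + b for a, b in zip(trend, rhythm)])

print("largest spurious residual, trend only:", max(abs(r) for r in residual_trend_only))
print("recovered rhythm peak (true amplitude 3):", max(residual_with_rhythm))
```

      If detrending a rhythm-free series leaves residuals that are small relative to the rhythm amplitude, and the known rhythm survives detrending with little distortion, the moving average is unlikely to be creating the periodicity; real egg counts are noisy, so the same check would need to be repeated with realistic noise added.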

      Wider context:

      The study of the neural basis of oviposition rhythms in Drosophila melanogaster can serve as a model for the analogous mechanisms in other animals. In particular, research in this area can have wider implications for the management of insects with societal impact such as pests, disease vectors, and pollinators. One key aspect of D. melanogaster oviposition that is not addressed here is its strong social modulation (see Bailly et al., Curr Biol 33:2865-2877.e4. doi:10.1016/j.cub.2023.05.074). It is plausible that most natural oviposition events do not involve isolated individuals, but rather groups of flies. As oviposition is encouraged by aggregation pheromones (e.g., Dumenil et al., J Chem Ecol 2016 https://link.springer.com/article/10.1007/s10886-016-0681-3) its propensity changes upon the pre-conditioning of the oviposition substrates, which is a complication in assays of oviposition rhythms that periodically move the flies to fresh substrate.

    3. Author response:

      (1) Controls for the genetic background are incomplete, leaving open the possibility that the observed oviposition timing defects may be due not to targeted knockdown of the period (per) gene but to the GAL4, Gal80, and UAS transgenes themselves. To resolve this issue the authors should determine the egg-laying rhythms of the relevant controls (GAL4/+, UAS-RNAi/+, etc); this only needs to be done for those genotypes that produced an arrhythmic egg-laying rhythm.

      We agree with this objection, and in the corrected version we plan to provide the assessment of the egg laying rhythms for the missing GAL4 controls as recommended only for Figure 3.

      (2) Reliance on a single genetic tool to generate targeted disruption of clock function leaves the study vulnerable to associated false positive and false negative effects: a) The per RNAi transgene used may only cause partial knockdown of gene function, as suggested by the persistent rhythmicity observed when per RNAi was targeted to all clock neurons. This could indicate that the results in Fig 2C-H underestimate the phenotypes of targeted disruption of clock function. b) Use of a single per RNAi transgene makes it difficult to rule out that off-target effects contributed significantly to the observed phenotypes. We suggest that the authors repeat the critical experiments using a separate UAS-RNAi line (for period or for a different clock gene), or, better yet, use the dominant negative UAS-cycle transgene produced by the Hardin lab (https://doi.org/10.1038/22566).

      We have recently acquired mutant flies with a dominant-negative cycle transgene (UAS-cycDN, Tanoue et al. 2004), and we plan to repeat our experiments with these mutants in order to confirm our results.

      (3) The egg-laying profiles obtained show clear damping/decaying trends which necessitates careful trend removal from the data to make any sense of the rhythm. Further, the detrending approach used by the authors is not tested for artefacts introduced by the 24h moving average used.

      In the revised version we will show that the detrending approach used does not introduce any artefacts. The analysis of numerical simulations with an aperiodic stochastic signal superposed on a decaying signal shows that the detrending method used does not result in a spurious periodic signal. Furthermore, we can show that when the underlying signal is rhythmic, the correct period is obtained even when the moving-average window is a few hours longer or shorter than 24 h.
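
      One ingredient of the check described above can be sketched in a few lines. This is an illustration only, not the authors' code: the function names, the linear (rather than exponential) decay, the amplitude, and the 4 h sampling interval are all assumptions chosen to keep the example exact. It subtracts a centered 24 h moving average (six 4 h samples, with half weight on the two end points so the window is symmetric) and confirms that a 24 h rhythm riding on a decaying trend is recovered without distortion.

```python
import math

def detrend_moving_average(series, samples_per_window):
    """Subtract a centered moving average spanning `samples_per_window` samples.

    `samples_per_window` must be even (e.g. 6 samples = 24 h at 4 h sampling).
    The two end points of each window get half weight, so the window is
    symmetric around the current sample. Only interior points are returned.
    """
    if samples_per_window % 2 != 0:
        raise ValueError("samples_per_window must be even")
    half = samples_per_window // 2
    detrended = []
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        trend = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / samples_per_window
        detrended.append(series[i] - trend)
    return detrended

def synthetic_record(n_samples, dt_hours=4.0, period_hours=24.0, slope=-0.5, amplitude=3.0):
    """Hypothetical record: constant level, linear decay, and a cosine rhythm."""
    return [
        40.0 + slope * i + amplitude * math.cos(2 * math.pi * i * dt_hours / period_hours)
        for i in range(n_samples)
    ]

record = synthetic_record(30)              # 30 samples at 4 h = 5 days
residual = detrend_moving_average(record, 6)
```

      In this idealized case the residual equals the cosine component exactly, because a symmetric window returns the local value of a linear trend and averages a full cycle of the rhythm to zero. A fuller test along the lines the authors describe would replace the cosine with an aperiodic stochastic signal and check that no periodogram peak appears.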

      (4) According to the authors the oviposition device cannot sample at a resolution finer than 4 hours, which will compel any experimenter to record egg laying for longer durations to have a suitably long time series which could be useful for circadian analyses.

      We apologize for not being clear enough. The device can in principle sample at any desired resolution. Notice, however, that the variable we are analyzing (number of eggs laid by a single female) has only a few possible values, which is one of the features that render the assessment of rhythmicity a particularly difficult task. If egg laying is sampled more often (say, at 2 h intervals) more time points will be available, but each time point will take far fewer possible values. We will show an example where we compare both sampling intervals (2 h and 4 h). Even though the 2 h sampling reveals the rhythmicity of the time series, the significance of the peaks obtained is lower than when sampling at 4 h intervals. We have found that a 4 h sampling seems to provide the best compromise between frequency of the sampling and discreteness of the variable.

      On the other hand, it is important to stress that the sampling frequency and the required record duration are largely unrelated (see e.g. Cohen et al. Journal of Theoretical Biology 314, pp 182 [2012]). It has been shown that the best way to make accurate predictions of the period of a rhythmic signal is to have a series spanning many cycles, irrespective of the sampling frequency. In other words, it is not true that with a 2 h sampling it would be possible to analyze shorter series than with 4 h sampling. Unfortunately, egg laying records are usually less than 5 cycles long, which is one of the reasons for the difficulties in the assessment of their rhythmicity.
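
      The trade-off discussed in this answer can be made concrete with a small sketch. It is illustrative only: the function names and the egg counts are hypothetical and not taken from the study. Merging 2 h bins into 4 h bins halves the number of time points and raises the counts per bin, while the number of circadian cycles covered by the record stays the same.

```python
def rebin_counts(counts, factor):
    """Sum consecutive groups of `factor` bins; an incomplete trailing group is dropped."""
    usable = len(counts) - len(counts) % factor
    return [sum(counts[i:i + factor]) for i in range(0, usable, factor)]

def cycles_covered(n_bins, bin_hours, period_hours=24.0):
    """Number of cycles of length `period_hours` spanned by the record."""
    return n_bins * bin_hours / period_hours

# Hypothetical egg counts for one female over one day at 2 h resolution.
two_hour_counts = [0, 1, 2, 0, 3, 1, 0, 0, 1, 0, 2, 1]
four_hour_counts = rebin_counts(two_hour_counts, 2)
```

      Here `four_hour_counts` is `[1, 2, 4, 0, 1, 3]`: six time points instead of twelve, each with a wider range of values, and both versions of the record span exactly one cycle.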

      (5) Despite reducing the interference caused by manually measuring egg-laying, the rhythm does not improve the signal quality such that enough individual rhythmic flies could be included in the analysis methods used. The authors devise a workaround by combining both strongly and weakly rhythmic (LSpower > 0.2 but less than LSpower at p < 0.05) data series into an averaged time series, which is then tested for the presence of a 16-32h "circadian" rhythm. This approach loses valuable information about the phase and period present in the individual mated females, and instead assumes that all flies have a similar period and phase in their "signal" component while the distribution of the "noise" component varies amongst them. This assumption has not yet been tested rigorously and the evidence suggests a lot more variability in the inter-fly period for the egg-laying rhythm.

      The assumption is difficult to test rigorously, since for individual flies the records seem to be so noisy that no information can be extracted. As shown in the paper, it is even very difficult to assess the presence of rhythmicity at the individual level. We consider that the appearance of a rhythm after averaging several records shows the presence of this rhythm at the individual level. But it could be argued that the presence of rhythmicity in the average record could be due to only a few (or even a single) rhythmic individuals. In order to show that this is probably not the case, in the revised version we will show that, when the individuals that are rhythmic are left out, the average of the remaining flies still shows a rhythm (albeit a weaker one, as was to be expected).

      Regarding our assumption that all flies have the “same” period, the results in Fig. 1F cannot really rule out this possibility, because with so few cycles the determination of the period is not very accurate (see e.g. Cohen et al. Journal of Theoretical Biology 314, pp 182 [2012]). In our case, the error for the period is related to the width of the corresponding peak in the periodogram, which is typically 4 h. In any case, in the revised version we will try to show, by using numerical simulations, that when the individual periods are not the same, but are distributed approximately as in Fig 1F, the average series is still rhythmic with the correct period.
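
      A noise-free toy version of the simulation proposed here might look as follows. Several assumptions are made for illustration and are not taken from the manuscript: the individual periods (22 to 26 h), the common starting phase, the 5-day record at 4 h sampling, and all function names. The classical (Schuster) periodogram is used as a simple stand-in for the Lomb-Scargle periodogram used in the study, which is reasonable only because the samples are evenly spaced.

```python
import math

def averaged_record(periods_hours, n_samples, dt_hours=4.0):
    """Average of unit-amplitude cosines, one per fly, all starting in phase."""
    return [
        sum(math.cos(2 * math.pi * i * dt_hours / p) for p in periods_hours) / len(periods_hours)
        for i in range(n_samples)
    ]

def periodogram_power(series, dt_hours, period_hours):
    """Classical periodogram power of the mean-subtracted series at one trial period."""
    mean = sum(series) / len(series)
    omega = 2 * math.pi / period_hours
    c = sum((x - mean) * math.cos(omega * i * dt_hours) for i, x in enumerate(series))
    s = sum((x - mean) * math.sin(omega * i * dt_hours) for i, x in enumerate(series))
    return (c * c + s * s) / len(series)

def dominant_period(series, dt_hours=4.0, lo=16.0, hi=32.0, step=0.25):
    """Trial period in [lo, hi] (the 16-32 h circadian band) with the highest power."""
    best_period, best_power = lo, -1.0
    trial = lo
    while trial <= hi + 1e-9:
        power = periodogram_power(series, dt_hours, trial)
        if power > best_power:
            best_period, best_power = trial, power
        trial += step
    return best_period

mixed = averaged_record([22.0, 23.0, 24.0, 25.0, 26.0], n_samples=30)
estimate = dominant_period(mixed)
```

      With only five cycles the individual spectral peaks are too broad to be resolved, so the averaged record shows a single peak close to the centre of the period distribution. Adding count noise and random phases would be needed before drawing any conclusion about real records.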

      (6) This variability could also depend on the genotype being tested, as the authors themselves observe between their Canton-S and YW wild-type controls for which their egg-laying profiles show clearly different dynamics. Interestingly, the averaged records for these genotypes are not distinguishable but are reflected in the different proportions of rhythmic flies observed. Unfortunately, the authors also do not provide further data on these averaged profiles, as they did for the wild-type controls in Figure 1, when they discuss their clock circuit manipulations using perRNAi. These profiles could have been included in Supplementary figures, where they would have helped the reader decide for themselves what might have been the reason for the loss of power in the LS periodogram for some of these experimental lines.

      Even though we think that the individual records are in general too noisy to be really informative, we will provide all the individual egg profiles in the Supplementary Material of the revised version, in order to let the reader check this for herself/himself.

      (7) By selecting 'the best egg layers' for inclusion in the oviposition analyses an inadvertent bias may be introduced and the results of the assays may not be representative of the whole population.

      We agree that this may introduce some bias in the results. But in our opinion this bias is very difficult to avoid, since for females that lay very few eggs, rhythmicity can even be difficult to define (some females can spend a whole day without laying a single egg). On the other hand, even when the results may not be representative of the whole population, they would be representative of the flies that lay most of the eggs in a population, which seems to be very relevant in ecological terms.

      (8) An approach that measures rhythmicity for groups of individual records rather than separate individual records is vulnerable to outliers in the data, such as the inclusion of a single anomalous individual record. Additionally, the number of individual records that are included in a group may become a somewhat arbitrary determinant for the observed level of rhythmicity. Therefore, the experimental data used to map the clock neurons responsible for oviposition rhythms would be more convincing if presented alongside individual fly statistics, in the same format as used for Figure 1.

      The question of possible rhythmic outliers has been addressed above, in question 5, where we discuss why we think that such outliers are not “determinant for the observed level of rhythmicity”. As also mentioned above, even though we think that they are too noisy to be informative, we plan to include all individual profiles in the Supplementary Material.

      (9) The features in the experimental periodogram data in Figures 3B and D are consistent with weakened complex rhythmicity rather than arrhythmicity. The inclusion of more individual records in the groups might have provided the added statistical power to demonstrate this. Graphs similar to those in 1G and 1I might have better illustrated qualitative and quantitative aspects of the oviposition rhythms upon per knockdown via MB122B and Mai179; Pdf-Gal80.

      We assume that the features mentioned refer to the appearance in the periodograms of two small peaks under the significance lines. We are aware that in the studies of the rhythmicity of locomotor activity such features are usually interpreted as “complex rhythms”, i.e. as evidence of the existence of two different mechanisms producing two different rhythms in the same individual. In our case, however, at least two other possibilities should be taken into account. Since the periodograms we show assess the rhythmicity of the average time series of several individuals, the two small peaks could correspond to the periods of two different subpopulations. Another possibility could be that such peaks are simply an artifact of the method in the analysis of time series that consist of very few cycles (as explained above) and also few points per cycle. A cursory examination of the individual profiles, which will be provided in the new version, does not seem to support either of the first two possibilities mentioned. On the other hand, we will show evidence that the analysis of series that are perfectly random sometimes results in periodograms with some small peaks.

    1. eLife Assessment

      This important study demonstrates the ability for high-throughput recording and categorization of unconstrained and stimulus-based behaviors across a very large population of marmosets (n = 120 animals across 36 family units). The authors implement an analytical approach to identify "outlier" behavior that could be key in the development of next-generation precision psychiatry. While the strength of evidence appears solid overall, many key methodological details are incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      The authors demonstrate a fully unsupervised, high throughput (meaning very low human interaction required) approach to quantifying marmoset behavior in unconstrained environments.

      Strengths:

      The authors provide an approach that is scalable, easy to implement at face value, and highly robust. Currently, most behavioral quantification approaches do not work well on marmosets, or the published examples that do look promising do not scale towards high throughput as demonstrated by the authors.

      While marmosets can certainly be a useful translational research model devoid of free behavior quantification, the authors make a compelling point about how this approach can be useful in the study of treatments of emerging marmoset disease models.

      Overall this is a very exhaustive manuscript that overcomes significant shortcomings in previous work and speaks highly to the use of marmosets for unconstrained behavioral and neural assessment.

      Weaknesses:

      Recording marmoset behavior with a 60 Hz frame rate is a significant limitation of the approach, which is hopefully easily alleviated in the future through better cameras/reconstruction pipelines. Marmosets (in the reviewers' experience) have a lot of motion energy above the 30 Hz Nyquist limit imposed by this system and are agile to a degree requiring higher frame rates.
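
      The arithmetic behind the Nyquist remark can be written out as a short sketch. The function names are hypothetical and the 40 Hz and 70 Hz movement components are made-up examples, not measurements from the study. Any component above half the frame rate is not lost but folded back to a lower apparent frequency.

```python
def nyquist_hz(frame_rate_hz):
    """Highest frequency that can be represented without aliasing."""
    return frame_rate_hz / 2.0

def aliased_frequency_hz(signal_hz, frame_rate_hz):
    """Apparent frequency of a sinusoid after sampling at `frame_rate_hz`."""
    nearest_multiple = round(signal_hz / frame_rate_hz) * frame_rate_hz
    return abs(signal_hz - nearest_multiple)

limit = nyquist_hz(60)                       # 30.0 Hz for a 60 Hz camera
apparent_40 = aliased_frequency_hz(40, 60)   # a 40 Hz component appears at 20 Hz
apparent_70 = aliased_frequency_hz(70, 60)   # a 70 Hz component appears at 10 Hz
```

      Components at or below 30 Hz are returned unchanged, for example `aliased_frequency_hz(25, 60)` gives 25.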

      The manuscript neglects recent approaches to non-human primate behavioral quantification from other groups that should be included. Simians are simians after all.

      As a minor weakness, this reviewer would have liked to see code shared for the reviewers to evaluate, especially pertaining to the high throughput and robustness of the approach.

    3. Reviewer #2 (Public review):

      In this manuscript, Menegas et al. classify the "control" behavior of captive marmosets. They combine behavioral screening from video recordings with audio and neural recordings (from the striatum) to better define what can be considered a typical behavioral repertoire for captive marmoset monkeys. A range of analyses is presented, investigating various aspects of behavior, such as social interactions and the detection of atypical individuals.

      The manuscript is compelling in many respects, especially due to the richness of the dataset and the breadth of analyses presented. However, a significant issue with the manuscript lies in its writing: the results are conveyed in an overly succinct and superficial manner, and the "Methods" section is nearly absent. Key concepts are often undefined, and the mathematical details underlying the figures are not explained, leaving readers to guess the authors' approach.

      Another issue is the vague use of the term "natural behavior." All data presented here appear to have been collected in small cages with limited climbing opportunities and enrichment. Thus, the authors should refrain from using "natural" to describe these conditions.

      Below, we elaborate further on the lack of methodological detail. Based on these issues, we believe the manuscript, in its current form, does not meet the scientific standards necessary for proper review. We strongly encourage the authors to undertake an extensive revision.

      Major Revision Points:

      The methods and results require significantly more detail. A scientific publication should provide readers with enough information to reproduce the study. Here, the detail level is far too low to fully understand, or reproduce, the study, and in many instances, readers are left to guess how the figure panels were produced. Below is a non-exhaustive list of examples illustrating these issues:

      (1) "we temporarily placed horizontal cage dividers to reduce the total cage size during data collection": What were the resulting (and initial) cage dimensions?

      (2) "After training the network, we hierarchically clustered the latent space": What is the latent space? Based on Figure 2a, it appears related to the network's recurrent layer, but this is not clarified in the text.

      (3) Alpha and perplexity parameters: Please define these terms. Since these concepts appear fundamental, readers should not have to consult external references.

      (4) "We then traced cluster identities across hierarchical levels": What are hierarchical levels?

      (5) "To understand how the input time series data was weighed in the bottleneck layer of the model": What is the bottleneck layer?

      (6) "we measured the average attention allocation to previous time points": The authors should define "attention allocation."

      (7) "we compared each neuron's firing rate distribution to shuffled data based on the overall frequency of each behavior during the session": This description is insufficient to understand the analysis.

      (8) "we hierarchically clustered neurons according to their firing rate enrichment maps": No mathematical explanation is provided for neuron clustering, nor is the concept of a "firing rate enrichment map" clarified.

      (9) "Cluster 4 showed higher activity when neurons were 'alone' or 'active'": This is vague and uses unclear jargon (e.g., "neurons alone"). Additionally, no mathematical explanation is provided for assigning neuronal activity to behavioral states.

      (10) Figure 3f, right-side panels: The analysis seems to involve cage mate positioning, yet no description is provided.

      (11) "we used motion watches to measure activity across all hours": Are these motion-sensitive watches physically attached to the animals? The methodology should be described, including data analysis details.
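
      To illustrate the kind of detail requested in point (7), one common way to compare a firing rate to shuffled data is a label-permutation test. The sketch below is an assumption about what such an analysis could look like, not a description of the authors' method, and all names and numbers in it are hypothetical. Shuffling the behavior labels across time bins keeps the overall frequency of each behavior fixed while breaking any link to the neuron's activity.

```python
import random

def mean_rate_in_state(rates, labels, state):
    """Mean firing rate over the time bins labeled with `state`."""
    selected = [r for r, label in zip(rates, labels) if label == state]
    return sum(selected) / len(selected)

def shuffle_p_value(rates, labels, state, n_shuffles=1000, seed=0):
    """One-sided permutation p-value for elevated firing during `state`.

    Labels are shuffled across time bins, which preserves how often each
    behavior occurs in the session.
    """
    rng = random.Random(seed)
    observed = mean_rate_in_state(rates, labels, state)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if mean_rate_in_state(rates, shuffled, state) >= observed:
            at_least_as_large += 1
    # Add-one correction so the p-value is never exactly zero.
    return (at_least_as_large + 1) / (n_shuffles + 1)

# Hypothetical neuron: 10 spikes/s in 20 'active' bins, 1 spike/s in 80 'rest' bins.
rates = [10.0] * 20 + [1.0] * 80
labels = ["active"] * 20 + ["rest"] * 80
p_active = shuffle_p_value(rates, labels, "active")
```

      A plain label shuffle ignores the temporal autocorrelation of both behavior and firing, so a real analysis would likely need a shuffle that preserves bout structure, such as circular shifts.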

      This list could continue, but we trust the authors understand the point. There is a wealth of analyses and information in this study, but the descriptions are too superficial. We understand that fully describing each analysis may require significant rewriting, including supplementary figures, and will likely make the manuscript longer. This is entirely acceptable, as the ideas presented here are worth the added rigor.

      "Natural behavior": Typically, the term "natural" suggests that the dataset reflects the range of behaviors exhibited by animals in the wild. Here, however, recordings were made in a small cage with limited climbing opportunities and enrichment. Under these conditions, it's hard to justify describing the behavior as "natural". In a project aimed at classifying the behavioral repertoire of marmoset monkeys and making this dataset accessible to other laboratories, it would be helpful to include more detailed information about the animals' housing conditions. This might include cage sizes, temperature, humidity, and details on food quantities, quality, and feeding times.

      Correlation versus causation: In the section titled "Large-scale data collection reveals variability across days and correlation between cagemates," the authors conclude: "Overall, these results indicate that measurements of animals' behavioral traits depend heavily on their social environment." This interpretation seems incorrect. We know that animal behavior varies throughout the day, with activity peaks typically occurring in the morning and afternoon. Such factors, or other external influences, could induce correlations between animals that are not caused by social interactions.

      Figure 4g: What are we intended to conclude from this analysis?

      Figure 5: Please specify the type of calls analyzed. For example, did you analyze only long-distance calls (aka 'loud phees' or 'shrills')? In "We split the audio data into 5-minute (non-continuous) segments and found that the average call rate in these segments varied from 0 calls per minute to 60 calls per minute (Fig. 5d-e)," does the call rate refer to individual animals or the entire cage?

      "This implies that a high rate of calls in a room can interrupt animals during social resting states and cause them to preferentially exhibit more active/attentive states." Does it? This could simply indicate that more active animals produce more calls.

      "We recorded neural activity in the striatum because it is known to contain diverse signals related to movement and social interactions." While I understand that the authors intend to publish neural data separately, a brief discussion of the striatum's role here would be helpful.

    4. Author response:

      We would like to thank the editors and reviewers for taking the time to help improve our manuscript. We appreciate the feedback and will definitely increase the level of methodological detail in a revised submission.

      Here is a brief summary of our plan to address the points raised by the reviewers. We will respond to the comments in a point-by-point manner when we resubmit a revised manuscript.

      Reviewer 1

      This reviewer raised a question about the 60 Hz frame rate for recording. We agree that increasing the number of cameras and frame rate would improve the tracking quality, but this would come at the cost of scalability. In the current study (and other concurrent studies in the lab), we recorded from 10-20 families simultaneously to try to sample the distribution of behavioral responses to stimuli observed in animals in our colony. This was only possible logistically because of the lightweight equipment design allowing us to record data from animals without large disruptions to their home-cage environment.

      One strategy for acquiring higher-resolution data is to build a small number of enclosures that are fully surrounded by cameras, and to cycle animals through these enclosures (1). However, this strategy limits throughput by reducing the number of animals per day that can be studied. If the size and cost of cameras and computers decreases in the future, then this recording strategy will be scalable to the whole-colony level. For our current study and analysis, we are limited by the resolution of our dataset. We do believe that our data (although not a perfect 3D reconstruction or an extremely high frame rate) is sufficient to label behavioral states with high accuracy. We will add a figure to more clearly show that behavioral state data can be accurately inferred from this imperfect data, which has also been recently highlighted by other groups (2).

      Additionally, with recent progress in the application of deep learning to animal pose tracking, new models can infer 3D pose dynamics from 2D data (3) and leverage spatiotemporal structure to clean up noisy data (4). We believe that other groups will be able to use these types of approaches to extract much more value from this dataset. So, in summary, we do understand the concern related to reconstruction quality and will 1) more clearly define the usefulness of our current models, 2) release our data and code so that others can build upon it or repurpose it, and 3) plan future experiments with higher camera count and frame rate as permitted by logistical constraints.

      Reviewer 2

      This reviewer asked for an increased level of methodological detail. We will try to address this in a few ways:

      (1) Code and data sharing. We believe that many of the questions related to the methodology will be best answered by sharing the data and code directly. Because there is a large amount of code associated with this manuscript, it is impractical to list every step and every parameter in the paper. Along with our revised manuscript, we will make our data and code publicly available. That said, we will improve our description of key parameters in the paper as the reviewer suggested.

      (2) More detailed Methods section. The reviewer asked us to provide more methodological detail. We understand that this is currently a weakness of our manuscript, and we will focus on addressing it. For instance, the reviewer rightly points out that we did not describe the motion watches used to generate the data in Figure S7. We will address this.

      (3) Simplify the manuscript. The paper currently has 22 figures, and further analysis could be done based on the results shown in any of them. For instance, this reviewer asked us to add a comparison across females and males (similar to our comparison of juveniles and adults). While we plan to add that analysis, we recognize that there are several figures/panels that are not closely related to our intended goal of describing the patterns we found in our large dataset. We will simplify the manuscript by removing some excess figures/panels and focus on describing the parts of the analysis that are crucial to our conclusions in greater detail.

      (4) More careful language. This reviewer pointed out that there were some inaccuracies with our descriptive language. For instance, we used the term "natural" behavior to describe the behavior of animals in captivity, which may more accurately be described as their home-cage behavior. We will be more careful to align our language to the standard for the field. For instance, several studies refer to unrestrained behavior in a laboratory setting as "spontaneous" behavior rather than "natural" behavior (5). In our case, the data consists of both spontaneously occurring behavior and responses to a set of stimuli. We will make sure that the descriptions are more precise in the revised manuscript.

      (1) Bala, P. C. et al. Automated markerless pose estimation in freely moving macaques with OpenMonkeyStudio. Nat Commun 11, (2020).

      (2) Weinreb, C. et al. Keypoint-MoSeq: parsing behavior by linking point tracking to pose dynamics. bioRxiv (2023) doi:10.1101/2023.03.16.532307.

      (3) Gosztolai, A. et al. LiftPose3D, a deep learning-based approach for transforming two-dimensional to three-dimensional poses in laboratory animals. Nat Methods 18, 975–981 (2021).

      (4) Wu, A. et al. Deep Graph Pose: a semi-supervised deep graphical model for improved animal pose tracking. Adv Neural Inf Process Syst 33, 6040–6052 (2020).

      (5) Levy, D. R. et al. Mouse spontaneous behavior reflects individual variation rather than estrous state. Curr Biol 33, 1358-1364.e4 (2023).

    1. eLife Assessment

      This useful work identifies a key role for Tachykinin-1 parasubthalamic neurons in avoidance learning. At present, the evidence for the conclusions regarding fiber photometry, viral transfection, reporting of behavioral outcomes, and pathway-specificity is incomplete. This work will be of interest to neuroscientists studying neural mechanisms for avoidance and aversion.

    2. Reviewer #1 (Public review):

      This study is focused on a population of neurons in the mouse parasubthalamic nucleus (pSTN) that express Tachykinin1 (Tac1). This gene has been used before to target pSTN for functional circuit studies because it is fairly selective for pSTN in this region, though it targets only a subset of pSTN neurons. Prior work has shown that activity in these neurons can impact motivated behaviors, including feeding and drinking behaviors, and that their activity is associated with aversion or avoidance behaviors. While not breaking much new ground, this study adds to that work by making use of a 2-way active avoidance assay, where a CS predicts a US (footshock), that the mice can escape. Using fiber photometry the authors show convincing evidence that Tac1 neurons in pSTN increase their activity in response to a US footshock, and that after some pairings the neurons will start responding to the CS too, though to a lesser extent than the US. Their most important data shows that either ablation or optogenetic inhibition of these cells can hugely block the active avoidance (escape) behavior, suggesting these neurons are key for the performance of this task, which they interpret as key for learning the task (but see more below). They show that optogenetic stimulation is aversive in a real-time place assay, and when paired with footshock can enhance active avoidance behavior. Finally, they show that Tac1 pSTN axons in PVT recapitulate these effects while showing that axons in CeA or PBN may only recapitulate some of these effects (more below). Overall I think the data is solid and shows that the activity of Tac1 pSTN neurons in the 2-way active avoidance task is causally related to avoidance behavior in the direction that would be predicted by recent literature. However, I think the authors overstate the conclusions in the title, abstract, and text.

      I do not think the data make a strong case for a role for these cells in learning, at least in any classical sense, as used in the title and abstract and elsewhere. Also the statement in the abstract that the pSTN mediates its effects 'differentially' through its downstream targets is not convincingly supported by data.

      Major concerns:

      (1) The authors infer that the activity in the Tac1 pSTN neurons is necessary for aversive or avoidance 'learning'. But this is not well defined, what exactly does that mean and what types of evidence would support or falsify such a hypothesis? Moreover, the authors show convincingly, and in line with prior reports, that these cells are activated by aversive stimuli (here footshock), and that activation of these cells is sufficient to induce avoidance behavior. Because manipulation of these cells can serve as a primary negative reinforcer, it becomes even more challenging and important to explain how experiments that manipulate these cells while measuring behavior/performance can discriminate between changes in: (1) primary aversion, (2) motivation to avoid, (3) associative learning, or (4) memory/retrieval. The authors seem to favor #3, but they don't make a clear case for this point of view or else what they mean by 'avoidance learning'. In my opinion, the data do not well discriminate between possibilities 1 through 3. The authors should clarify their logic and temper their conclusions throughout.

      (2) Abstract line 37 is not well supported. The authors focus mostly on pSTN projections to PVT and show that the measurements or manipulation of these axons recapitulates the effects seen with pSTN cell bodies. The authors do fewer studies of axons in CeA and PBN, but do find that they can recapitulate the effects with opsin inhibition, but detect no effects with opsin stimulation. However, the lack of effect with opsin stimulation in Figure S7a-e proves very little on its own. It could be technical, due to inadequate expression or functional efficacy. It is not supported by histological and functional evidence that the manipulation was effective. Overall I can only conclude that the projections to these regions might be very similar (based on the inhibition data), or might be a little different. The data are thus inadequate to support the authors' claim that the pSTN mediates learning differentially through its downstream targets.

      Other concerns:

      (3) Line 93 is not adequately supported by data in Figure 1b. Additional data is needed that shows expression across cases, including any spread that may be visible when zooming out from pSTN. Additional methods are needed to indicate what exclusion criteria were applied and how many mice were excluded. These data could help support the statement on line 93 that expression was largely restricted within pSTN.

      (4) From the results and methods it is not clear where the GFP signal would come from in the mice expressing Casp3 for the ablation studies. It is therefore not clear if the absence of GFP should be taken as evidence of cell loss. For example, it is not clear if multiple vectors were used, if volumes and titers were carefully matched between control groups, or if competition/occlusion between AAVs could be ruled out. It is also not clear how this was quantified, that is how many sections/subjects and how counting was done. It is not clear how long was waited between the AAV infusion, behavior, and euthanasia, perhaps especially important for the ablation done after avoidance learning occurred.

      (5) The authors should consider showing individual measurements and not just mean/sem wherever feasible, for example, to support the statement on line 141 that 'all ablated mice showed...'.

      (6) S3 is an important control for interpreting data in Figure 2d-i. Something similar is needed to support the inferences made in 2j-u. The very strong effect showing a lack of active avoidance in response to CS or the US when pSTN Tac1 neurons are inhibited during CS or during US suggests that something gross may be going on, such as a gross motor or sensory response that supersedes the effect of footshock. The authors do not comment on whether there are any gross behavioral responses to the inhibition, but an experiment as in S3 is needed, for example, to show that behavior is intact during pSTN inhibition if delivered after the mice already learned to associate CS with US.

      (7) The authors use 100 shocks of 0.8 mA for 7 days. I think this is quite strong and in the pSTN inhibition experiments it seems to be functionally 'inescapable' and could thus produce behaviors similar to 'learned helplessness'. Can the authors consider whether this might contribute to the striking findings they observed in their opsin inhibition assays?

      (8) The description of the experiment in S5 is inadequate. What are the adjacent areas? Where do the authors see spread? The use of the word 'case' in figure S5 implies an individual case, but the legend says 5 mice were used for 'case 1' and 3 mice were used for 'case 2'. The use of the word 'off-target' in the figure implies that the expression was off the intended target. But the text of results and methods implies it was intentional targeting of unnamed and unshown adjacent regions. This should be clarified.

      (9) The authors suggest the CPA study is divergent from Serra et al 2023. Though I think this could be due to how the conditioning was done, it would be helpful for the authors to include less processed data. This would aid in possible interpretations for any divergences across studies. Can the authors include raw data (in seconds of time spent) in each compartment for each group across baseline and test days?

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Hu et al. presents a clearly designed examination of the role of tachykinin1-expressing neurons in the parasubthalamic nucleus of the lateral posterior hypothalamus (PTSN) in active avoidance learning. These glutamatergic neurons have previously been implicated in responding to negative stimuli. This manuscript expands the current understanding of PTSNTac1 neurons in learned responses to threats by showing their role in encoding and mediating the active avoidance response. The authors first use bulk fiber photometry imaging to show the encoding of the active avoidance procedure, followed by cell-type specific manipulations of PTSNTac1 neurons during active avoidance. Finally, they show that encoding and mediation of active avoidance in a downstream target of PTSNTac1 neurons, the PVT/intermediodorsal nuclei of the dorsal thalamus (IMD), has the same effect as what was discovered in the cell body. This contrasts other output regions of the PTSN, such as the PBN and CeA, which were not found to promote active avoidance learning. The experiments presented were well-designed to support the conclusions of the authors, however, the manuscript is missing several key control experiments and supplemental information to support their main findings.

      Strengths:

      The manuscript provides information on a brain region and downstream target that mediates active avoidance learning. The manuscript provides valuable information via necessity and sufficiency experiments to show the role of the population of interest (PTSNTac1 neurons) in active avoidance learning. The authors also performed most behavior experiments in male and female mice, with adequate power to address potential sex differences in the control of active avoidance by PTSNTac1 neurons. Finally, the manuscript provides valuable information about the specificity of the PTSNTac1 downstream target in regulating active avoidance learning, identifying the PVT/intermediodorsal nuclei of the dorsal thalamus as the key target and ruling out the PBN and CeA.

      Weaknesses:

      However, several main conclusions of the paper must be interpreted carefully due to missing or inadequate control experiments and histological verification.

      (1) Inadequate presentation of viral localization. The authors state that expression was "largely restricted within PSTN" however there is no quantification of the amount of viral expression beyond the target region. Given that Tac1 is expressed in neighboring regions, it is critical to show the viral expression and fiber implant location data for all animals included in the figures. Furthermore, criteria for inclusion and exclusion based on mistargeting should be delineated. This should also be clearly outlined for the experiments in Figure S5, where "behavioral effects of activation of sparsely Tac1-expressing neurons in two adjacent areas of PSTN" was tested but the location of viral expression in those cases is unclear.

      (2) Lack of motion artifact correction with isosbestic signal for GCamp recordings. It is appreciated that the authors included a separate EGFP-expressing group to compare to the GCamp-expressing group, however, additional explanation is required for the methods used to analyze the raw fluorescent signal. Namely, were fluorescent signals isosbestic-corrected prior to calculating ΔF/F? If no isosbestic signal was used to correct motion artifacts within a recording session, additional explanation is needed to explain how this was addressed. The lack of motion artifacts in the EGFP signal in a separate cohort is inadequate to answer this caveat as motion artifacts are within-animal.

      (3) Missing control experiment demonstrating intact locomotor performance in caspase ablation experiments. The authors use caspase ablation of PTSNTac1 neurons prior to active avoidance learning to appraise the necessity of this cell population. However, a control experiment showing intact locomotor ability in ablated mice was not performed.

      (4) Missing control experiment demonstrating [lack of] valence with PTSN silencing manipulations. The authors performed real-time and conditioned place preference experiments for ChR2-expressing mice (Fig 3M) and found stimulation to be negatively-valenced and generate an aversive memory, respectively. Absent this control experiment with silencing, an alternative conclusion remains possible that optogenetic silencing via GtACR2 created nonspecific location preferences in the active avoidance apparatus, confounding the interpretation of those results.

      (5) Incomplete analysis of sex differences. Data in female mice is conspicuously missing from inhibition experiments. The rationale for exclusion from this dataset would be useful for the interpretation of the other noted sex differences.
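
      As an illustration of the correction raised in point (2), a minimal sketch of one common approach is given here: fit the isosbestic control channel to the calcium-dependent channel by least squares, then compute ΔF/F against the fitted trace. The function names, the synthetic traces, and the wavelengths mentioned in the comments are assumptions for illustration and do not describe the authors' pipeline.

```python
import math
import random


def linear_fit(x, y):
    """Ordinary least-squares fit y ~= slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def isosbestic_dff(signal, control):
    """Motion-corrected dF/F for one session.

    signal  : calcium-dependent channel (assumed ~470 nm excitation)
    control : isosbestic channel (assumed ~405 nm excitation), same length
    """
    slope, intercept = linear_fit(control, signal)
    fitted = [slope * c + intercept for c in control]
    return [(s - f) / f for s, f in zip(signal, fitted)]


# Synthetic session (hypothetical): shared motion artifact plus one transient at 30 s.
random.seed(0)
t = [i * 0.05 for i in range(1200)]
motion = [0.05 * math.sin(2 * math.pi * 0.2 * ti) for ti in t]
transient = [0.2 * math.exp(-((ti - 30.0) ** 2) / 2.0) for ti in t]
control = [1.0 + m + random.gauss(0.0, 0.002) for m in motion]
signal = [
    2.0 * (1.0 + m) * (1.0 + c) + random.gauss(0.0, 0.002)
    for m, c in zip(motion, transient)
]

dff = isosbestic_dff(signal, control)
peak_time = t[max(range(len(dff)), key=dff.__getitem__)]
```

      Because the fit is made within each recording, the correction is within-animal, which is the property a separate EGFP cohort cannot provide.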

    4. Reviewer #3 (Public review):

      Summary:

      This study by Hu et al. examined the role of tachykinin1 (Tac1)-expressing neurons in the parasubthalamic nucleus (PSTH) in active avoidance of electric shocks. Bulk recording of PSTH Tac1 neurons or axons of these neurons in PVT showed activation by a shock-predicting tone and by the shock itself. Ablation of these neurons or optogenetic manipulation of these neurons or their projection to PVT suggests the causality of this pathway with the learning of active avoidance.

      Strengths:

      This work found an understudied pathway potentially important for active avoidance of electric shocks. Experiments were thoroughly done and the presentation is clear. The amount of discussion and references are appropriate.

      Weaknesses:

      Critical control experiments are missing for most experiments, and statistical tests are not clear or not appropriate in most parts. Details are shown below.

      (1) There are some control experiments missing. Notably, optogenetic manipulation is not verified in any experiments. It is important to verify whether neural activation with optogenetic activation is at the physiological level or supra-physiological level, and whether optogenetic inhibition does not cause unwanted activity patterns such as rebound activation at the critical time window.

      (2) Neural ablation with caspase was confirmed by GFP expression. However, from the present description, a different virus to express EITHER caspase or GFP was injected, and then the numbers of GFP-expressing neurons were compared. It is not clear how this can detect ablation.

      (3) In many places, statistical approaches are not clear from the present figures, figure legends, and Methods. It seems that most statistics were performed by pooling trials, but it is not described, or multiple "n" are described. For example, it is explicitly mentioned in Figure 4H, "n = 3 mice, n = 213 avoidance trials and n = 87 failure trials". The authors should not pool trials, but should perform across-animal tests in this and other figures, and "n" for statistical tests should be clearly described in each plot.

      (4) It is also unclear how the test types were selected. For example, in Figure 1K and O with similar datasets, one is examined by a paired test and the other is by an unpaired test. Since each animal has both early vs late trials, and avoidance vs failure trials, paired tests across animals should be performed for both.

      (5) It is also strange to show violin plots for only 6 animals. They should instead show each dot for each animal, connected with a line to show consistent increases of activity in late vs early trials and avoidance vs failure trials.

      (6) To tell specificity in avoidance learning, it is better to show escape in the current trials with optogenetic manipulation.

      (7) For place aversion, % time decrease across days was tested. It is better to show the original number before normalization, as well.

      (8) For anatomical results in Figure S6, it is important to show images with lower magnification, too.

      (9) Inactivation of either pathway from PSTH to PBN or to CeA also inhibits active avoidance, but the authors conclude that these effects are "partial" compared to the inactivation of PSTH to PVT. It is not clear how the effects were compared since the effects of PSTH-CeA inactivation are quite strong, comparable to PSTH-PVT inactivation by eye. They should quantify the effects to conclude the difference.

      (10) Supplementary table 1: as mentioned above, n for statistical tests should be clearer.
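
      The across-animal analysis requested in points (3) and (4) could be sketched as follows: average trials within each animal first, then run a paired test on the per-animal differences. An exact sign-flip permutation test is used here in place of a paired t-test so that the sketch needs only the standard library; all values and names are hypothetical.

```python
from itertools import product
from statistics import mean


def per_animal_means(trials):
    """Collapse (animal, condition, value) trials to one mean per animal and condition."""
    grouped = {}
    for animal, condition, value in trials:
        grouped.setdefault(animal, {}).setdefault(condition, []).append(value)
    return {a: {c: mean(v) for c, v in conds.items()} for a, conds in grouped.items()}


def paired_sign_flip_test(diffs):
    """Exact two-sided sign-flip permutation test on per-animal paired differences."""
    observed = abs(mean(diffs))
    flips = list(product((1, -1), repeat=len(diffs)))
    extreme = sum(
        abs(mean([s * d for s, d in zip(signs, diffs)])) >= observed - 1e-12
        for signs in flips
    )
    return extreme / len(flips)


# Hypothetical CS-evoked responses; trial counts differ between animals on purpose.
example = {
    "m1": {"early": [0.8, 1.0], "late": [2.1, 1.9, 2.0]},
    "m2": {"early": [1.2], "late": [1.9, 2.3]},
    "m3": {"early": [0.5, 0.7, 0.6], "late": [1.4]},
    "m4": {"early": [1.0, 1.1], "late": [1.8, 2.0]},
    "m5": {"early": [0.9], "late": [1.5, 1.7]},
    "m6": {"early": [0.7, 0.9], "late": [2.2]},
}
trials = [
    (animal, cond, value)
    for animal, conds in example.items()
    for cond, values in conds.items()
    for value in values
]

means = per_animal_means(trials)
diffs = [means[a]["late"] - means[a]["early"] for a in sorted(means)]
p_value = paired_sign_flip_test(diffs)  # n is the number of animals, not trials
```

      With six animals, the smallest two-sided p value this test can return is 2/64, which makes explicit how little a pooled trial count contributes to the effective sample size.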

    5. Author response:

      Reviewer #1 (Public review):

      This study is focused on a population of neurons in the mouse parasubthalamic nucleus (pSTN) that express Tachykinin1 (Tac1). This gene has been used before to target pSTN for functional circuit studies because it is fairly selective for pSTN in this region, though it targets only a subset of pSTN neurons. Prior work has shown that activity in these neurons can impact motivated behaviors, including feeding and drinking behaviors, and that their activity is associated with aversion or avoidance behaviors. While not breaking much new ground, this study adds to that work by making use of a 2-way active avoidance assay, where a CS predicts a US (footshock), that the mice can escape. Using fiber photometry, the authors show convincing evidence that Tac1 neurons in pSTN increase their activity in response to a US footshock, and that after some pairings the neurons will start responding to the CS too, though to a lesser extent than the US. Their most important data shows that either ablation or optogenetic inhibition of these cells can hugely block the active avoidance (escape) behavior, suggesting these neurons are key for the performance of this task, which they interpret as key for learning the task (but see more below). They show that optogenetic stimulation is aversive in a real-time place assay, and when paired with footshock can enhance active avoidance behavior. Finally, they show that Tac1 pSTN axons in PVT recapitulate these effects while showing that axons in CEA or PBN may only recapitulate some of these effects (more below). Overall I think the data is solid and shows that the activity of Tac1 pSTN neurons in the 2 way active avoidance task is causally related to avoidance behavior in the direction that would be predicted by recent literature. However, I think the authors overstate the conclusions in the title, abstract, and text. I do not think the data make a strong case for a role for these cells in learning, at least in any classical sense, as used in the title and abstract and elsewhere. Also, the statement in the abstract that the pSTN mediates its effects 'differentially' through its downstream targets is not convincingly supported by data.

      We are very pleased that Reviewer 1 thought our data is solid.

      Major concerns:

      (1) The authors infer that the activity in the Tac1 pSTN neurons is necessary for aversive or avoidance 'learning'. But this is not well defined, what exactly does that mean and what types of evidence would support or falsify such a hypothesis? Moreover, the authors show convincingly, and in line with prior reports, that these cells are activated by aversive stimuli (here footshock), and that activation of these cells is sufficient to induce avoidance behavior. Because manipulation of these cells can serve as a primary negative reinforcer, it becomes even more challenging and important to explain how experiments that manipulate these cells while measuring behavior/performance can discriminate between changes in: (1) primary aversion, (2) motivation to avoid, (3) associative learning, or (4) memory/retrieval. The authors seem to favor #3, but they don't make a clear case for this point of view or else what they mean by 'avoidance learning'. In my opinion, the data do not well discriminate between possibilities 1 through 3. The authors should clarify their logic and temper their conclusions throughout.

      Thank you, Reviewer 1, for these insightful suggestions. Our fiber photometry data show a significant increase in CS-evoked calcium signals in PSTN Tac1+ neurons in late trials relative to early trials (Figure 1H-K). Together with our optogenetic inhibition experiments during the CS (Figure 2N-Q), these results illustrate that the activity of PSTN Tac1+ neurons is modulated by learning and is required for active avoidance learning. Moreover, PSTN Tac1+ neurons are activated by footshock, and activation of these cells is sufficient to induce avoidance behavior. These findings demonstrate that PSTN Tac1+ neurons encode aversive information. Together, our current data support the view that PSTN Tac1+ neurons encode both the aversive event and its predicting cue. We will clarify our conclusions in the revised manuscript.

      (2) Abstract line 37 is not well supported. The authors focus mostly on pSTN projections to PVT and show that the measurements or manipulation of these axons recapitulates the effects seen with pSTN cell bodies. The authors do fewer studies of axons in CeA and PBN, but do find that they can recapitulate the effects with opsin inhibition, but detect no effects with opsin stimulation. However, the lack of effect with opsin stimulation in Figure S7a-e proves very little on its own. It could be technical, due to inadequate expression or functional efficacy. It is not supported by histological and functional evidence that the manipulation was effective. Overall, I can only conclude that the projections to these regions might be very similar (based on the inhibition data), or might be a little different. The data are thus inadequate to support the authors' claim that the pSTN mediates learning differentially through its downstream targets.

      In the revised version of the manuscript, we will provide more histological and functional evidence for the PSTN-to-CeA and PSTN-to-PBN circuits to support our conclusion on the functional roles of these downstream targets. Consistent with our anterograde tracing experiment showing that the PSTN densely projects to the CeA and PBN (Figure S6), the optogenetic activation and inhibition experiments showed dense PSTN axonal terminals in the CeA and PBN; these data will be included in the revised manuscript. In addition, we will further examine these circuits by investigating the functional roles of CeA-projecting or PBN-projecting PSTN neurons during the 2-way active avoidance task.

      Other concerns:

      (3) Line 93 is not adequately supported by data in Figure 1b. Additional data is needed that shows expression across cases, including any spread that may be visible when zooming out from pSTN. Additional methods are needed to indicate what exclusion criteria were applied and how many mice were excluded. These data could help support the statement on line 93 that expression was largely restricted within pSTN.

      In the revised version of the manuscript, we will provide larger example images containing the pSTN and its adjacent areas to demonstrate that viral expression is well restricted to this brain area. Moreover, we will provide detailed information on the exclusion criteria and the number of mice excluded in the Methods section.

      (4) From the results and methods it is not clear where the GFP signal would come from in the mice expressing Casp3 for the ablation studies. It is therefore not clear if the absence of GFP should be taken as evidence of cell loss. For example, it is not clear if multiple vectors were used, if volumes and titers were carefully matched between control groups, or if competition/occlusion between AAVs could be ruled out. It is also not clear how this was quantified, that is how many sections/subjects and how counting was done. It is not clear how long was waited between the AAV infusion, behavior, and euthanasia, perhaps especially important for the ablation done after avoidance learning occurred.

      I totally agree with Reviewer 1’s concerns. We will perform immunohistochemistry or in situ hybridization for Tachykinin-1 itself and then measure the colocalization of GFP with Tachykinin-1 inside and outside of the PSTN, as well as the degree of Tachykinin-1 loss in Casp mice. In addition, we will provide more detailed experimental information in the revised manuscript.

      (5) The authors should consider showing individual measurements and not just mean/sem wherever feasible, for example, to support the statement on line 141 that 'all ablated mice showed...'.

      Thank you Reviewer 1 for this suggestion. We will re-plot the data as individual measurements in the revised manuscript.

      (6) S3 is an important control for interpreting data in Figure 2d-i. Something similar is needed to support the inferences made in 2j-u. The very strong effect showing a lack of active avoidance in response to CS or the US when pSTN Tac1 neurons are inhibited during CS or during US suggests that something gross may be going on, such as a gross motor or sensory response that supersedes the effect of footshock. The authors do not comment on whether there are any gross behavioral responses to the inhibition, but an experiment as in S3 is needed, for example, to show that behavior is intact during pSTN inhibition if delivered after the mice already learned to associate CS with US.

      Thank you, Reviewer 1, for this insightful suggestion. During the review process, we performed an experiment of this kind, as in Figure S3. We measured the behavioral responses during pSTN optogenetic inhibition after the mice had already learned to associate the CS with the US and found that most GtACR-expressing mice showed unaffected avoidance learning. These data will be included in the revised manuscript.

      (7) The authors use 100 shocks of 0.8 mA for 7 days. I think this is quite strong and in the pSTN inhibition experiments it seems to be functionally 'inescapable' and could thus produce behaviors similar to 'learned helplessness'. Can the authors consider whether this might contribute to the striking findings they observed in their opsin inhibition assays?

      I agree with Reviewer 1’s comment on the striking findings in the optogenetic inhibition results. Indeed, based on the results on days 1 and 2, optogenetic inhibition of PSTN Tac1+ neurons significantly blocked the behavioral performance of GtACR-expressing animals during the 2-way active avoidance task. To examine whether the effect of optogenetic inhibition of these neurons could decline with prolonged training, we conducted an additional 5 days of training. We will discuss this point in the revised manuscript.

      (8) The description of the experiment in S5 is inadequate. What are the adjacent areas? Where do the authors see spread? The use of the word 'case' in figure S5 implies an individual case, but the legend says 5 mice were used for 'case 1' and 3 mice were used for 'case 2'. The use of the word 'off-target' in the figure implies that the expression was off the intended target. But the text of results and methods implies it was intentional targeting of unnamed and unshown adjacent regions. This should be clarified.

      We will add histological images and clarify these points in the revised manuscript. The purpose of this experiment is to illustrate that even a slight spread of ChR2 virus into Tac1+ neurons in areas adjacent to the PSTN did not result in behavioral changes, which indirectly supports attributing the main behavioral function to PSTN Tac1+ neurons rather than to neighboring areas. Because Tac1+ neurons outside the PSTN are sparsely expressed, it is quite difficult to completely restrict viral expression to the PSTN from anterior to posterior. Thus, we will provide detailed information on the exclusion criteria and the number of mice excluded in the Methods section.

      (9) The authors suggest the CPA study is divergent from Serra et al 2023. Though I think this could be due to how the conditioning was done, it would be helpful for the authors to include less processed data. This would aid in possible interpretations for any divergences across studies. Can the authors include raw data (in seconds of time spent) in each compartment for each group across baseline and test days?

      We will follow Reviewer 1’s suggestion to include raw data (in seconds of time spent) in each compartment for each group across baseline and test days in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Hu et al. presents a clearly designed examination of the role of tachykinin1-expressing neurons in the parasubthalamic nucleus of the lateral posterior hypothalamus (PTSN) in active avoidance learning. These glutamatergic neurons have previously been implicated in responding to negative stimuli. This manuscript expands the current understanding of PTSNTac1 neurons in learned responses to threats by showing their role in encoding and mediating the active avoidance response. The authors first use bulk fiber photometry imaging to show the encoding of the active avoidance procedure, followed by cell-type specific manipulations of PTSNTac1 neurons during active avoidance. Finally, they show that encoding and mediation of active avoidance in a downstream target of PTSNTac1 neurons, the PVT/intermediodorsal nuclei of the dorsal thalamus (IMD), has the same effect as what was discovered in the cell body. This contrasts other output regions of the PTSN, such as the PBN and CeA, which were not found to promote active avoidance learning. The experiments presented were well-designed to support the conclusions of the authors, however, the manuscript is missing several key control experiments and supplemental information to support their main findings.

      Strengths:

      The manuscript provides information on a brain region and downstream target that mediates active avoidance learning. The manuscript provides valuable information via necessity and sufficiency experiments to show the role of the population of interest (PTSNTac1 neurons) in active avoidance learning. The authors also performed most behavior experiments in male and female mice, with adequate power to address potential sex differences in the control of active avoidance by PTSNTac1 neurons. Finally, the manuscript provides valuable information about the specificity of the PTSNTac1 downstream target in regulating active avoidance learning, identifying the PVT/intermediodorsal nuclei of the dorsal thalamus as the key target and ruling out the PBN and CeA.

      We greatly appreciate that Reviewer 2 found our experiments well designed to support the conclusions and found that they provide valuable information in several respects.

      Weaknesses:

      However, several main conclusions of the paper must be interpreted carefully due to missing or inadequate control experiments and histological verification.

      (1) Inadequate presentation of viral localization. The authors state that expression was "largely restricted within PSTN" however there is no quantification of the amount of viral expression beyond the target region. Given that Tac1 is expressed in neighboring regions, it is critical to show the viral expression and fiber implant location data for all animals included in the figures. Furthermore, criteria for inclusion and exclusion based on mistargeting should be delineated. This should also be clearly outlined for the experiments in Figure S5, where "behavioral effects of activation of sparsely Tac1-expressing neurons in two adjacent areas of PSTN" was tested but the location of viral expression in those cases is unclear.

      As in our responses to questions 3 and 8 of Reviewer 1, we will provide the viral expression and fiber implant location data for all animals included in the figures, together with histological images for Figure S5, in the revised manuscript. Moreover, we will provide detailed information on the exclusion criteria and the number of mice excluded in the Methods section.

      (2) Lack of motion artifact correction with isosbestic signal for GCamp recordings. It is appreciated that the authors included a separate EGFP-expressing group to compare to the GCamp-expressing group, however, additional explanation is required for the methods used to analyze the raw fluorescent signal. Namely, were fluorescent signals isosbestic-corrected prior to calculating ΔF/F? If no isosbestic signal was used to correct motion artifacts within a recording session, additional explanation is needed to explain how this was addressed. The lack of motion artifacts in the EGFP signal in a separate cohort is inadequate to answer this caveat as motion artifacts are within-animal.

      We will follow Reviewer 2’s suggestion and apply isosbestic correction to the fluorescent signals prior to calculating ΔF/F. We will re-plot the related figures and add this information to the revised manuscript.

      (3) Missing control experiment demonstrating intact locomotor performance in caspase ablation experiments. The authors use caspase ablation of PTSNTac1 neurons prior to active avoidance learning to appraise the necessity of this cell population. However, a control experiment showing intact locomotor ability in ablated mice was not performed.

      We will follow Reviewer 2’s suggestion to perform a control experiment showing intact locomotor ability in caspase 3-ablated mice and will include this data in the revised manuscript.

      (4) Missing control experiment demonstrating [lack of] valence with PTSN silencing manipulations. The authors performed real-time and conditioned place preference experiments for ChR2-expressing mice (Fig 3M) and found stimulation to be negatively-valenced and generate an aversive memory, respectively. Absent this control experiment with silencing, an alternative conclusion remains possible that optogenetic silencing via GtACR2 created nonspecific location preferences in the active avoidance apparatus, confounding the interpretation of those results.

      Thank you, Reviewer 2, for this useful suggestion. We will examine the valence of PSTN silencing manipulations using an RTPP test and add these data to the revised manuscript.

      (5) Incomplete analysis of sex differences. Data in female mice is conspicuously missing from inhibition experiments. The rationale for exclusion from this dataset would be useful for the interpretation of the other noted sex differences.

      Thank you Reviewer 2 for this useful suggestion. During the review process, we have performed ablation and inhibition experiments in females, demonstrating similar behavioral effects as those in males. We will add these data in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      This study by Hu et al. examined the role of tachykinin1 (Tac1)-expressing neurons in the parasubthalamic nucleus (PSTH) in active avoidance of electric shocks. Bulk recording of PSTH Tac1 neurons or axons of these neurons in PVT showed activation by a shock-predicting tone and by the shock itself. Ablation of these neurons or optogenetic manipulation of these neurons or their projection to PVT suggests the causality of this pathway with the learning of active avoidance.

      Strengths:

      This work found an understudied pathway potentially important for active avoidance of electric shocks. Experiments were thoroughly done and the presentation is clear. The amount of discussion and references are appropriate.

      We are very pleased to have Reviewer 3’s positive comments on the manuscript.

      Weaknesses:

      Critical control experiments are missing for most experiments, and statistical tests are not clear or not appropriate in most parts. Details are shown below.

      (1) There are some control experiments missing. Notably, optogenetic manipulation is not verified in any experiments. It is important to verify whether neural activation with optogenetic activation is at the physiological level or supra-physiological level, and whether optogenetic inhibition does not cause unwanted activity patterns such as rebound activation at the critical time window.

      Thank you Reviewer 3 for this useful suggestion. We will perform in vitro slice recording experiments to verify optogenetic manipulations and add this line of evidence in the revised manuscript.

      (2) Neural ablation with caspase was confirmed by GFP expression. However, from the present description, a different virus to express EITHER caspase or GFP was injected, and then the numbers of GFP-expressing neurons were compared. It is not clear how this can detect ablation.

      As in our response to question 4 of Reviewer 1, we will perform immunohistochemistry or in situ hybridization for Tachykinin-1 itself and then measure the colocalization of GFP with Tachykinin-1 inside and outside of the PSTN, as well as the degree of Tachykinin-1 loss in Casp-ablated mice. In addition, we will provide more detailed experimental information in the revised manuscript.

      (3) In many places, statistical approaches are not clear from the present figures, figure legends, and Methods. It seems that most statistics were performed by pooling trials, but it is not described, or multiple "n" are described. For example, it is explicitly mentioned in Figure 4H, "n = 3 mice, n = 213 avoidance trials and n = 87 failure trials". The authors should not pool trials, but should perform across-animal tests in this and other figures, and "n" for statistical tests should be clearly described in each plot.

      We have provided all statistical information in Supplementary Table 1. In the revised manuscript, we will perform across-animal tests, re-plot the figures, and provide clear statistical information.

      (4) It is also unclear how the test types were selected. For example, in Figure 1K and O with similar datasets, one is examined by a paired test and the other is by an unpaired test. Since each animal has both early vs late trials, and avoidance vs failure trials, paired tests across animals should be performed for both.

      Following Reviewer 3’s suggestion, we will perform across-animal tests. In the first version of our manuscript, for the fiber photometry experiments, we pooled the trial data of each animal and performed statistical tests across trials. Because avoidance and failure trials were different trials, we selected an unpaired test for this kind of dataset.

      (5) It is also strange to show violin plots for only 6 animals. They should instead show each dot for each animal, connected with a line to show consistent increases of activity in late vs early trials and avoidance vs failure trials.

      As with question 4 of Reviewer 3, we pooled the trial data of each animal and performed statistical tests across trials. We will perform across-animal tests and re-plot the figures, connecting each animal's data points with a line to show consistent increases of activity in late vs early trials and avoidance vs failure trials.

      (6) To tell specificity in avoidance learning, it is better to show escape in the current trials with optogenetic manipulation.

      Thank you Reviewer 3 for this useful suggestion. We will follow this suggestion and add this analysis in the revised manuscript.

      (7) For place aversion, % time decrease across days was tested. It is better to show the original number before normalization, as well.

      As with question 9 of Reviewer 1, we will show the original number before normalization in the revised manuscript.

      (8) For anatomical results in Figure S6, it is important to show images with lower magnification, too.

      We will follow this suggestion and provide histological images with lower magnification in the revised manuscript.

      (9) Inactivation of either pathway from PSTH to PBN or to CeA also inhibits active avoidance, but the authors conclude that these effects are "partial" compared to the inactivation of PSTH to PVT. It is not clear how the effects were compared since the effects of PSTH-CeA inactivation are quite strong, comparable to PSTH-PVT inactivation by eye. They should quantify the effects to conclude the difference.

      We will quantify the effects of different downstream targets of the PSTN to make a precise conclusion.

      (10) Supplementary table 1: as mentioned above, n for statistical tests should be clearer.

      As mentioned above, we will perform across-animal tests and provide clear statistical information in the figure legends and supplementary table 1.

    1. eLife Assessment

      This study uses a large dataset from both recent isolates and genomes in databases to provide an important analysis of the population structure of the pathogen Salmonella gallinarum. The authors present convincing results regarding the regional adaptation and the evolutionary trajectory of the resistome and mobilome, even though some issues regarding the genomic analysis could be improved. This work will interest microbiologists and researchers working on genomics, evolution, and antimicrobial resistance.

    2. Reviewer #1 (Public review):

      Summary:

      The investigators in this study analyzed the dataset assembly from 540 Salmonella isolates, and those from 45 recent isolates from Zhejiang University of China. The analysis and comparison of the resistome and mobilome of these isolates identified a significantly higher rate of cross-region dissemination compared to localized propagation. This study highlights the key role of the resistome in driving the transition and evolutionary history of S. Gallinarum.

      Strengths:

      The isolates included in this study were from 16 countries in the past century (1920 to 2023). While the study uses S. Gallinarum as the prototype, the conclusion from this work will likely apply to other Salmonella serotypes and other pathogens.

      Weaknesses:

      While the isolates came from 16 countries, most strains in this study were originally from China.

      Comments on revisions:

      This reviewer is happy with the detailed responses from the authors regarding revising this manuscript. I do not have further comments.

    3. Reviewer #2 (Public review):

      Summary:

      The authors sequence 45 new samples of S. Gallinarum, a commensal Salmonella found in chickens, which can sometimes cause disease. They combine these sequences with around 500 from public databases, determine the population structure of the pathogen, and coarse relationships of lineages with geography. The authors further investigate known anti-microbial genes found in these genomes, how they associate with each other, whether they have been horizontally transferred, and date the emergence of clades.

      Strengths:

      - It doesn't seem that much is known about this serovar, so publicly available new sequences from a high burden region are a valuable addition to the literature.

      - Combining these sequences with publicly available sequences is a good way to better contextualise any findings.

      - The genomic analyses have been greatly improved since the first version of the manuscript, and appropriately analyse the population and date emergence of clades.

      - The SNP thresholds are contextualised in terms of evolutionary time.

      - The importance and context of the findings are fairly well described.

      Weaknesses:

      - There are still a few issues with the genomic analyses, although they no longer undermine the main conclusions:

      (1) Although the SNP distance is now considered in terms of time, the 5 SNP distance presented still represents ~7yrs evolution, so it is unlikely to be a transmission event, as described. It would be better to use a much lower threshold or describe the interpretation of these clusters more clearly. Bringing in epidemiological evidence or external references on the likely time interval between transmissions would be helpful.

      (2) The HGT definition has not fundamentally been changed and therefore still has some issues, mainly that vertical evolution is still not systematically controlled for. Using a 5kb window is not sufficient, as LD may extend across the entire genome. As the authors have now run gubbins correctly, they could use the results from this existing analysis to find recent HGT. To define mobilisation, perhaps a standard pipeline (e.g. https://github.com/EBI-Metagenomics/mobilome-annotation-pipeline) would be more convincing.

      (3) The invasiveness index is better described, but the authors still did not provide convincing evidence that the small difference is actually biologically meaningful (there was no statistical difference between the two strains provided in response Figure 6). What do other Salmonella papers using this approach find, and can their links be brought in? If there is still no good evidence, a better description of this difference would help make the conclusions better supported.

      In summary, the analysis is broadly well described and feels appropriate. Some of the conclusions are still not fully supported, although the main points and context of the paper now appear sound.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The investigators in this study analyzed the dataset assembly from 540 Salmonella isolates, and those from 45 recent isolates from Zhejiang University of China. The analysis and comparison of the resistome and mobilome of these isolates identified a significantly higher rate of cross-region dissemination compared to localized propagation. This study highlights the key role of the resistome in driving the transition and evolutionary history of S. Gallinarum.

      Thank you for summarizing our work. We carefully considered your comments, responded to them, and made corresponding revisions to the text. Additionally, to fully contextualize the background knowledge and clarify the major points in this study, we have added some references.

      Upon further review of our initial manuscript, we realized that the original submission did not strictly follow the lineage order proposed by Zhou et al. (Natl Sci Rev. 2023 Sep 2;10(10):nwad228). To avoid confusion and keep the typing system consistent, we have adjusted the lineage nomenclature throughout the revised manuscript to reflect the corrected order, as follows:

      Author response table 1.

      To ensure consistency with previous studies, we have revised the nomenclature for the different lineages of bvSP.

      Strengths: 

      The isolates included in this study were from 16 countries in the past century (1920 to 2023). While the study uses S. Gallinarum as the prototype, the conclusion from this work will likely apply to other Salmonella serotypes and other pathogens. 

      Thanks for the constructive comments and the positive reception of the manuscript.

      Weaknesses: 

      While the isolates came from 16 countries, most strains in this study were originally from China. 

      We appreciate the reviewer's observation regarding the sampling distribution of isolates in this study. We acknowledge that while the isolates were collected from 15 different countries, a significant proportion originated from China (Author response image 1). This focus is due to several reasons:

      Author response image 1.

      Geographic distribution of 580 S. Gallinarum. Different colors indicate the countries of origin for the 580 S. Gallinarum strains in the dataset. Darker shades represent higher numbers of strains.

      (1) Once a globally prevalent pathogen throughout the 20th century, S. Gallinarum was listed by the World Organization for Animal Health (WOAH) due to its economic importance. After 30 years of implementation of the National Poultry Improvement Plan in the US, it was almost eradicated in high-income countries; interestingly, it became an endemic pathogen with sporadic outbreaks in most low- or middle-income countries like China and Brazil. Given the vast expanse of China's land area and the country's economic factors, implementing the same measures remains challenging.

      (2) S. Gallinarum is an avian-specific pathogen, particularly affecting chickens, and its distribution is closely linked to chicken meat production in different countries. There are more frequent reports of fowl typhoid in some high chicken-producing developing countries. Data from the United States Department of Agriculture (USDA) on annual chicken meat production for 2023/2024 show that the global distribution of S. Gallinarum aligns closely with the overall chicken meat production of these countries (https://fas.usda.gov/data/production/commodity/0115000).

      Author response image 2.

      The United States Department of Agriculture (USDA) data on annual chicken meat production for 2023/2024 across different countries globally.

      (3) Our primary objective was to investigate the localized resistome adaptation of S. Gallinarum in regions. Being a region with significant disease burden, China has reported numerous outbreaks (Sci Data. 2022 Aug 13;9(1):495; Sci Data. 2024 Feb 27;11(1):244) and a high AMR prevalence of this serovar (Natl Sci Rev. 2023 Sep 2;10(10):nwad228; mSystems. 2023 Dec 21;8(6):e0088323), making it an excellent example for understanding localized resistance mechanisms.

      (4) As China is the primary country of origin for the strains in this study, it is necessary to ensure that the strains from China are consistent with the local geographic characteristics of the country. Therefore, we conducted a correlation analysis between the number of strains from different provinces in China and the total GDP/population size of those provinces (Author response image 3). The results show that most points fall within the 95% confidence interval of the regression line. Although some points show a relative imbalance in the number of S. Gallinarum strains, most of these regions have a small sample size (n < 15). Overall, we found that the prevalence of S. Gallinarum in different regions of China is consistent with the overall nationwide trend.

      Author response image 3.

      Correlation analysis between the number of S. Gallinarum collected from different provinces in China and the total GDP/population size. The figure depicts a series of points representing individual provinces. The x-axis indicates the number of S. Gallinarum included in the dataset, while the y-axis displays the values for total GDP and total population size, respectively.

      Nevertheless, a search of nearly a decade of literature on PubMed and a summary of the S. Gallinarum genomes available in public databases indicate that the dataset used is the most complete. Furthermore, focusing on a specific region within China allowed us to conduct a detailed and thorough analysis. However, we fully agree that expanding the study to include more isolates from other countries would enhance the generalizability of our findings, and we are actively collecting additional S. Gallinarum genome data. In the revised manuscript, we have further emphasized the limitations as follows:

      Lines 427-429: “However, the current study has some limitations. Firstly, despite assembling the most comprehensive WGS database for S. Gallinarum from public and laboratory sources, there are still biases in the examined collection. The majority (438/580) of S. Gallinarum samples were collected from China, possibly since the WGS is a technology that only became widely available in the 21st century. This makes it impractical to sequence it on a large scale in the 20th century, when S. Gallinarum caused a global pandemic. So, we suspect that human intervention in the development of this epidemic is the main driving force behind the fact that most of the strains in the data set originated in China. In our future work, we aim to actively gather more data to minimize potential biases within our dataset, thereby improving the robustness and generalizability of our findings.”

      Reviewer #2 (Public review): 

      Summary: 

      The authors sequence 45 new samples of S. Gallinarum, a commensal Salmonella found in chickens, which can sometimes cause disease. They combine these sequences with around 500 from public databases, determine the population structure of the pathogen, and coarse relationships of lineages with geography. The authors further investigate known anti-microbial genes found in these genomes, how they associate with each other, whether they have been horizontally transferred, and date the emergence of clades. 

      Thank you for your constructive suggestions, which are valuable and highly beneficial for improving our paper. We carefully considered your comments, responded to them, and made corresponding revisions to the text. Furthermore, to fully contextualize the background knowledge and clarify the major points in this study, we have added some references to support our findings and policy implications.

      Upon further review of our initial manuscript, we realized that the original submission did not strictly follow the lineage order proposed by Zhou et al. (Natl Sci Rev. 2023 Sep 2;10(10):nwad228). To avoid confusion in the typing system, we have adjusted the lineage nomenclature in the revised manuscript to reflect the corrected order (see Author response table 1).

      Strengths: 

      (1) It doesn't seem that much is known about this serovar, so publicly available new sequences from a high-burden region are a valuable addition to the literature. 

      (2) Combining these sequences with publicly available sequences is a good way to better contextualise any findings. 

      Thank you so much for your thorough review and constructive comments on the manuscript.

      Weaknesses: 

      There are many issues with the genomic analysis that undermine the conclusions, the major ones I identified being: 

      (1) Recombination removal using gubbins was not presented fully anywhere. In this diversity of species, it is usually impossible to remove recombination in this way. A phylogeny with genetic scale and the gubbins results is needed. Critically, results on timing the emergence (fig2) depend on this, and cannot be trusted given the data presented. 

      We sincerely thank you for pointing out this issue. In the original manuscript, we aimed to present different lineages of S. Gallinarum within a single phylogenetic tree constructed using BEAST. However, in the revised manuscript, we have addressed this issue by applying the approach recommended by Gubbins to remove recombination events for each lineage defined by FastBAPs. Additionally, to better illustrate the removal of recombination regions in the genome, we have included a figure generated by Gubbins (New Supplementary Figure 12). 

      Our results indicate that recombination events are relatively infrequent in Lineage 1, followed by Lineage 3, but occur more frequently in Lineage 2. In the revised manuscript, we have included additional descriptions in the Methods section to clarify this analysis. We hope these modifications adequately address the reviewer’s concerns and enhance the trustworthiness of our findings.

      (2) The use of BEAST was also only briefly presented, but is the basis of a major conclusion of the paper. Plot S3 (root-to-tip regression) is unconvincing as a basis of this data fitting a molecular clock model. We would need more information on this analysis, including convergence and credible intervals. 

      Thank you very much for raising this issue. We decided to rerun BEAST separately for each lineage, so that the evolutionary scale is presented accurately on the basis of the improvements described above. The BEAST analysis for each lineage was carried out in the following steps:

      (1) Using R51 as the reference, a reference-mapped multiple core-genome SNP sequence alignment was created, and recombination regions were detected and removed as described above.

      (2) TreeTime was used to assess the temporal structure by performing a regression analysis of the root-to-tip branch distances within the maximum likelihood tree, considering the sampling date as a variable (New Supplementary Figure 6). However, the root-to-tip regression analysis presented in New Supplementary Figure 6 was not intended as a basis for selecting the best molecular clock model; its purpose was to clean the dataset with appropriate measurements.

      (3) To determine the optimal model for running BEAST, we tested a total of six combinations in the initial phase of our study. These combinations included the strict clock, relaxed lognormal clock, and three population models (Bayesian SkyGrid, Bayesian Skyline, and Constant Size). Before conducting the complete BEAST analysis, we evaluated each combination using a Markov Chain Monte Carlo (MCMC) analysis with a total chain length of 100 million and sampling every 10,000 iterations. We then summarized the results using NSLogAnalyser and determined the optimal model based on the marginal likelihood value for each combination. The results indicated that the model incorporating the Bayesian Skyline and the relaxed lognormal clock yielded the highest marginal likelihood value in our sample. Then, we proceeded to perform a time-calibrated Bayesian phylogenetic inference analysis for each lineage. The following settings were configured: the "GTR" substitution model, “4 gamma categories”, the "Relaxed Clock Log Normal" model, the "Coalescent Bayesian Skyline" tree prior, and an MCMC chain length of 100 million, with sampling every 10,000 iterations.

      (4) Convergence was assessed using Tracer, with all parameter effective sampling sizes (ESS) exceeding 200. Maximum clade credibility trees were generated using TreeAnnotator. Finally, key divergence time points (with 95% credible intervals) were estimated, and the tree was visualized using FigTree. 
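
      The temporal-signal check in step (2) can be illustrated with a short, self-contained Python sketch of a root-to-tip regression: an ordinary least-squares fit of root-to-tip distance against sampling date. This is a simplified stand-in under stated assumptions, not TreeTime's implementation, which also handles rerooting and the tree structure. The function name and the four data points are hypothetical, and the points were constructed to be exactly linear so that the expected values are easy to check.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance on sampling date.

    Returns (slope, intercept, r_squared, root_date). The slope is the
    estimated rate (substitutions per site per year) and root_date is the
    x-intercept, i.e. the date at which the fitted distance equals zero.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    root_date = -intercept / slope
    return slope, intercept, r_squared, root_date


# Hypothetical tips: sampling year and root-to-tip distance
# (substitutions per site). Illustrative values only.
sampling_years = [1950.0, 1980.0, 2000.0, 2020.0]
tip_distances = [0.0010, 0.0016, 0.0020, 0.0024]

slope, intercept, r_squared, root_date = root_to_tip_regression(
    sampling_years, tip_distances
)
print(f"rate = {slope:.2e} per site per year")
print(f"R^2 = {r_squared:.3f}, inferred root date = {root_date:.0f}")
```

      A positive slope with a reasonable R^2 indicates temporal signal, and tips that fall far from the fitted line are candidates for exclusion before the Bayesian analysis. The choice of clock model is a separate step, made here by comparing marginal likelihoods.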

      For the key lineages, L2b and L3b (carrying the resistome, posing antimicrobial resistance (AMR) risks, and exhibiting intercontinental transmission events), we have redrawn Figure 2 based on the updated BEAST analysis results (New Figure 2). For L1, L2a, and L3c, we have added supplementary figures to provide a more detailed visualization of their respective BEAST analysis outcomes (New Supplementary Figures 3-5). The revised BEAST analysis indicates that the origin of L3b in China can be traced back to as early as 1683 (95% CI: 1608 to 1839). In contrast, the earliest possible origin of L2b in China dates back to 1880 (95% CI: 1838 to 1902). This indicates that the previous manuscript's assumption that L2b is an older lineage compared to L3b may be inaccurate. 

      Furthermore, in the revised manuscript, we specifically estimated the time points for the first intercontinental transmission events for the two major lineages, L2b and L3b. Our results indicate that L2b likely underwent two major intercontinental transmission events. The first occurred around 1893 (95% CI: 1870 to 1918), with transmission from China to South America. The second major transmission event occurred in 1923 (95% CI: 1907 to 1940), involving the spread from South America to Europe. In contrast, the transmission pattern of L3b appears relatively more straightforward. Our findings show that L3b, an S. Gallinarum lineage originating in China, only underwent one intercontinental transmission event from China to Europe, likely occurring around 1790 (95% CI: 1661 to 1890) (New Supplementary Figure 7). Based on the more critical BEAST analysis for each lineage, we have revised the corresponding conclusions in the manuscript. We believe that the updated BEAST analysis, performed using a more accurate recombination removal approach, significantly enhances the rigor and credibility of our findings.

      (3) Using a distance of 100 SNPs for a transmission is completely arbitrary. This would at least need to be justified in terms of the evolutionary rate and serial interval. 

      Using single nucleotide polymorphism (SNP) distance to trace pathogen transmission is a common approach (J Infect Dis. 2015 Apr 1;211(7):1154-63), which we have also used in our previous studies (hLife 2024; 2(5):246-256. mLife 2024; 3(1):156-160.). When the SNP distance within a cluster falls below a set threshold, the strains in that cluster are considered to have a potential direct transmission link. It is generally accepted that the lower the threshold, the more stringent the screening process becomes. However, there is little agreement in the literature regarding what such a threshold should be, and the appropriate SNP cut-off for inferring transmission likely depends critically on the context (Mol Biol Evol. 2019 Mar 1;36(3):587-603).

      In this study, we compared various thresholds (SNPs = 5, 10, 20, 25, 30, 35, 40, 50, 100) to ensure clustering in an appropriate manner. First, we summarized the tracing results under each threshold (Author response image 4), which demonstrated that, regardless of the threshold used, all strains associated with transmission events originated from the same location (New Figure 3a).

      Author response image 4.

      Clustering results of 45 newly isolated S. Gallinarum strains using different SNP thresholds of 5, 10, 15, 20, 25, 28, 30, 50, and 100 SNPs. The nine subplots represent the clustering results under each threshold. Each point corresponds to an individual strain, and lines connect strains with potential transmission relationships.

      In response to your comments regarding the evolutionary rate, we estimated the overall evolutionary rate of S. Gallinarum using BEAST. We applied the methodology described by Arthur W. Pightling et al. (Front Microbiol. 2022 Jun 16; 13:797997). The numbers of SNPs per year were determined by multiplying the evolutionary rates estimated with BEAST by the number of core SNP sites identified in the alignments. We hypothesize that a slower evolutionary rate in bacteria typically requires a lower SNP threshold when tracing transmission events using SNP distance analysis. Pightling et al.'s previous research found an average evolutionary rate of 1.97 SNPs per year (95% HPD, 0.48 to 4.61) across 22 different Salmonella serotypes. Our updated BEAST estimation for the evolutionary rate of S. Gallinarum suggests it is approximately 0.74 SNPs per year (95% HPD, 0.42 to 1.06). Based on these findings, and our previous experience with similar studies (mBio. 2023 Oct 31;14(5):e0133323.), we set a threshold of 5 SNPs in the revised manuscript.
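
      The arithmetic behind this paragraph can be written out as a small Python sketch. It assumes the simplest reading, in which a distance of d SNPs at r SNPs per year corresponds to d / r years of evolution; the function names are hypothetical, and only the rate of 0.74 SNPs per year and its 95% HPD bounds are taken from the text above.

```python
def snps_per_year(rate_per_site_per_year, n_core_snp_sites):
    """SNPs per year = per-site rate multiplied by the number of core SNP sites."""
    return rate_per_site_per_year * n_core_snp_sites


def years_for_snp_distance(snp_distance, snps_per_year_value):
    """Approximate years of evolution represented by a given SNP distance."""
    return snp_distance / snps_per_year_value


# Rate reported above: 0.74 SNPs per year (95% HPD, 0.42 to 1.06).
THRESHOLD_SNPS = 5
for label, rate in [("median", 0.74), ("HPD lower", 0.42), ("HPD upper", 1.06)]:
    years = years_for_snp_distance(THRESHOLD_SNPS, rate)
    print(f"{label}: {THRESHOLD_SNPS} SNPs at {rate} SNPs/year = {years:.1f} years")
```

      Under this reading, a 5-SNP threshold corresponds to roughly 6.8 years at the median rate, and to between about 4.7 and 11.9 years across the HPD interval.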

      Then, we adopted the newly established SNP distance threshold (5 SNPs) to update Figure 3a and New Supplementary Figure 8. The heatmap on the far right of New Figure 3a illustrates the SNP distances among 45 newly isolated S. Gallinarum strains from two locations in Zhejiang Province (Taishun and Yueqing). New Supplementary Figure 8 simulates potential transmission events between the bvSP strains isolated from Zhejiang Province (n=95) and those from China with available provincial information (n=435). These analyses collectively demonstrate the localized transmission pattern of bvSP within China. Our analysis using the newly established SNP threshold indicates that the 45 strains isolated from Taishun and Yueqing exhibit a highly localized transmission pattern, with pairs of strains exhibiting potential transmission events below the set threshold occurring exclusively within a single location. Subsequently, we conducted the SNP distance-based tracing analysis for the 95 strains from Zhejiang Province and those from China with available provincial information (n=435) (New Supplementary Figure 8, New Supplementary Table S8). Under the SNP distance threshold (5 SNPs), we identified a total of 91 potential transmission events, all of which occurred exclusively within Zhejiang Province. No inter-provincial transmission events were detected. Based on these findings, we revised the methods and conclusions in the manuscript accordingly. We believe that the updated version addresses your concerns well.
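
      For readers unfamiliar with threshold-based tracing, the following Python sketch shows one common way to implement it: compute pairwise SNP distances from a core-genome alignment, link any two strains whose distance does not exceed the threshold, and report the connected groups (single-linkage clustering). The strain names, sequences, and function names are hypothetical, and the use of "less than or equal to" for the cut-off is an assumption; the authors' actual tooling may differ.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count differing positions, ignoring gaps and ambiguous bases."""
    valid = set("ACGT")
    return sum(
        1 for a, b in zip(seq_a, seq_b) if a != b and a in valid and b in valid
    )


def cluster_by_threshold(alignment, threshold):
    """Single-linkage clustering: link strains whose distance <= threshold.

    Returns (clusters, links), where links lists (strain_a, strain_b, distance)
    for every pair treated as a potential transmission link.
    """
    parent = {name: name for name in alignment}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    links = []
    for a, b in combinations(sorted(alignment), 2):
        d = snp_distance(alignment[a], alignment[b])
        if d <= threshold:
            links.append((a, b, d))
            parent[find(a)] = find(b)

    clusters = {}
    for name in alignment:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(c) for c in clusters.values()), links


# Hypothetical toy alignment (illustrative names and sequences only).
alignment = {
    "TS01": "ACGTACGTAC",
    "TS02": "ACGTACGTAT",
    "YQ01": "TCGAACGGAC",
    "YQ02": "TCGAACGGAT",
}

clusters, links = cluster_by_threshold(alignment, threshold=1)
print(clusters)
print(links)
```

      With a threshold of 1 the toy data splits into two clusters, one per sampling site; raising the threshold to 3 merges all four strains, which illustrates how strongly the inferred transmission links depend on the chosen cut-off.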

      Nevertheless, the final revised and updated results do not change the conclusions presented in our original manuscript. Instead, applying a more stringent SNP distance threshold allows us to provide solid evidence supporting the localized transmission pattern of S. Gallinarum in China. 

      (4) The HGT definition is non-standard, and phylogeny (vertical inheritance) is not controlled for.  

      The cited method: 

      'In this study, potentially recently transferred ARGs were defined as those with perfect identity (more than 99% nucleotide identity and 100% coverage) in distinct plasmids in distinct host bacteria using BLASTn (E-value {less than or equal to}10−5)' 

      This clearly does not apply here, as the application of distinct hosts and plasmids cannot be used. Subsequent analysis using this method is likely invalid, and some of it (e.g. Figure 6c) is statistically very poor. 

      Thank you for raising this important question. In our study, Horizontal Gene Transfer (HGT) is defined as the transfer of genetic information between different organisms, a process that facilitates the spread of antibiotic resistance genes (ARGs) among bacteria. This definition of HGT is consistent with that used in previous studies (Evol Med Public Health. 2015; 2015(1):193–194; ISME J. 2024 Jan 8;18(1):wrad032). In Salmonella, the transfer of antimicrobial resistance genes via HGT is not solely dependent on plasmids; other mobile genetic elements (MGEs), such as transposons, integrons, and prophages, also play significant roles. This has also been documented in our previous work (mSystems. 2023 Dec 21;8(6):e0088323). Given the involvement of various MGEs in the horizontal transfer of ARGs, we propose that the criteria for evaluating horizontal transfer via plasmids can also be applied to ARGs mediated by other MGEs.

      In this study, we adopted stricter criteria than those used by Xiaolong Wang et al. Specifically, we defined two ARGs as identical only if they exhibited 100% nucleotide identity and 100% coverage. To address concerns regarding the potential influence of vertical inheritance in our analysis, we have made the following improvements. In the revised manuscript, we provide a more detailed table that includes the co-localization analysis of each ARG with mobile genetic elements (New Supplementary Table 9). For prophages and plasmids, we required that ARGs be located directly within these elements. In contrast, for transposons and integrons, we considered ARGs to be associated if they were located within a 5 kb region upstream or downstream of these elements (Nucleic Acids Res. 2022 Jul 5;50(W1):W768-W773). 
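
      The co-localization rule described above can be sketched in Python as follows. The `Feature` class, the `FLANK_BP` table, and all coordinates are hypothetical, and two details are assumptions made for illustration: a plasmid is represented as an interval on its contig, and "within 5 kb" is interpreted as any overlap with the element extended by 5 kb on each side.

```python
from dataclasses import dataclass

# Allowed flanking distance (bp) per element type, following the rule above:
# ARGs must lie inside plasmids and prophages, or within 5 kb of
# transposons and integrons.
FLANK_BP = {"plasmid": 0, "prophage": 0, "transposon": 5000, "integron": 5000}


@dataclass(frozen=True)
class Feature:
    contig: str
    start: int
    end: int
    kind: str = "ARG"


def is_mge_associated(arg, mges):
    """Return True if the ARG satisfies the co-localization rule for any MGE."""
    for mge in mges:
        if mge.contig != arg.contig:
            continue
        flank = FLANK_BP[mge.kind]
        if flank == 0:
            # The ARG must lie entirely inside the element.
            if mge.start <= arg.start and arg.end <= mge.end:
                return True
        elif arg.start <= mge.end + flank and arg.end >= mge.start - flank:
            # The ARG overlaps the element extended by the flanking window.
            return True
    return False


# Hypothetical annotations (illustrative coordinates only).
mges = [
    Feature("contig1", 10_000, 12_000, "integron"),
    Feature("contig2", 1, 40_000, "prophage"),
]
args = {
    "near_integron": Feature("contig1", 15_000, 16_000),
    "far_from_integron": Feature("contig1", 20_000, 21_000),
    "inside_prophage": Feature("contig2", 5_000, 6_000),
    "spanning_prophage_edge": Feature("contig2", 39_500, 40_500),
}

for name, arg in args.items():
    print(name, is_mge_associated(arg, mges))
```

      A window-based rule of this kind records proximity only; as Reviewer #2 notes, it does not by itself separate horizontal acquisition from vertical inheritance of an element that is already in place.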

      In the revised manuscript, we first categorized a total of 621 ARGs carried by 436 bvSP isolates collected in China according to the aforementioned criteria and found that 415 ARGs were located on MGEs. After excluding the ARGs not associated with MGEs, we recalculated the overall HGT frequency of 10 types of ARGs in China, the horizontal ARGs transfer frequency in three key regions, and the horizontal ARGs transfer frequency within a single region (New Supplementary Table 7). Based on the results, we updated relevant sections of the manuscript and remade Figure 6. The updated manuscript describes the results of this section as follows:

      “Horizontal transfer of resistome occurs widely in localized bvSP

      Horizontal transfer of the resistome facilitates the acquisition of AMR among bacteria, which may record the distinct acquisition event in the bacterial genome. To compare these events in a geographic manner, we further investigated the HGT frequency of each ARG carried by bvSP isolated from China and explored the HGT frequency of resistome between three defined regions. Potentially horizontally transferred ARGs were defined as those with perfect identity (100% identity and 100% coverage) and were located on MGEs across different strains (Fig. 6a). We first categorized a total of 621 ARGs carried by 436 bvSP isolates collected in China and found that 415 ARGs were located on MGEs. After excluding the ARGs not associated with MGEs, our findings reveal that horizontal gene transfer of ARGs is widespread among Chinese bvSP isolates, with an overall transfer rate of 92%. Specifically, 50% of the ARGs exhibited an HGT frequency of 100%, indicating that these ARGs might underwent extensive frequent horizontal transfer events (Fig. 6b). It is noteworthy that certain resistance genes, such as tet(A), aph(3'')-Ib, and aph(6)-Id, appear to be less susceptible to horizontal transfer.

      However, different regions generally exhibited a considerable difference in resistome HGT frequency. Overall, bvSP from the southern areas in China showed the highest HGT frequency (HGT frequency=95%). The HGT frequencies for bvSP within the eastern and northern regions of China are lower, at 92% and 91%, respectively (Fig. 6c). For specifical ARG type, we found tet(A) is more prone to horizontal transfer in the southern region, and this proportion was considerably lower in the eastern region. Interestingly, certain ARGs such as aph(6)-Id, undergo horizontal transfer only within the eastern and northern regions of China (Fig. 6d). Notably, as a localized transmission pathogen, resistome carried by bvSP exhibited a dynamic potential among inter-regional and local demographic transmission, especially from northern region to southern region (HGT frequency=93%) (Fig. 6e, Supplementary Table 7).”

      We also modified the current version of the pipeline used to calculate the HGT frequency of resistance genes. In the revised pipeline, users are required to provide a file specifying the locations of the mobilome on the genome before formally calculating the HGT frequency of the target ARGs. The specific code and data used in the calculation have been uploaded to https://github.com/tjiaa/Cal_HGT_Frequency.

      However, we also acknowledge that the current in silico method has some limitations. This approach relies heavily on prior information in existing resistome/mobilome databases. Additionally, the characteristics of second-generation sequencing data make it challenging to locate gene positions precisely. Using complete genome assemblies might be a crucial approach to address this issue effectively. In the revised manuscript, we have also provided a more detailed explanation of the implications of the current pipeline.

      Regarding your second concern, "some of it (e.g., Figure 6c) is statistically very poor," the horizontal ARG transfer frequency calculation for each region was based on the proportion of horizontal transfer events of ARGs in that region to the total possible transfer events. As a result, we are unable to calculate the statistical significance between the two regions. Our aim with this approach is to provide a rough estimate of the extent of horizontal ARG transfer within the S. Gallinarum population in each region. In future studies, we will refine our conclusions by developing a broader range of evaluation methods to ensure more comprehensive assessment and validation.

      (5) Associations between lineages, resistome, mobilome, etc do not control for the effect of genetic background/phylogeny. So e.g. the claim 'the resistome also demonstrated a lineage-preferential distribution' is not well-supported. 

      Thank you for your comments. We acknowledge that the associations between lineages and the mobilome/resistome may be influenced by the genetic background or phylogeny of the strains. For instance, our conclusion regarding the lineage-preferential distribution of the resistome was primarily based on New Figure 4a, where L3 is clearly shown to carry the most ARGs. Furthermore, we observed that L3b tends to harbor bla<sub>TEM-1B</sub>, sul2, and tet(A) more frequently than other lineages. However, we recognize that this evidence is insufficient to support a definitive conclusion of “demonstrated a lineage-preferential distribution”. Therefore, we have re-examined the current manuscript and described these findings as a potential association between the mobilome/resistome and lineages.

      (6) The invasiveness index is not well described, and the difference in means is not biologically convincing as although it appears significant, it is very small. 

      Thank you for pointing this out. For the invasiveness index mentioned in the manuscript, we used the method described in previous studies (PLoS Genet. 2018 May 8;14(5), Nat Microbiol. 2021 Mar;6(3):327-338). Specifically, Salmonella’s ability to cause intestinal or extraintestinal infections in hosts is related to the degree of genome degradation. We evaluated the potential for extraintestinal infection by 45 newly isolated S. Gallinarum strains (L2b and L3b) using a model that quantitatively assesses genome degradation. We analyzed samples using the 196 top predictor genes, employing a machine-learning approach that utilizes a random forest classifier and delta-bitscore functional variant-calling. This method evaluated the invasiveness of S. Gallinarum towards the host, and the distribution of invasiveness index values for each region was statistically tested using an unpaired t-test. The code used for calculating the invasiveness index is available at https://github.com/Gardner-BinfLab/invasive_salmonella. In the revised manuscript, we added a more detailed description of the invasiveness index calculation in the Methods section as follows:

      Lines 592-603: “Specifically, Salmonella’s ability to cause intestinal or extraintestinal infections in hosts is related to the degree of genome degradation. We evaluated the potential for extraintestinal infection by 45 newly isolated S. Gallinarum strains (L2b and L3b) using a model that quantitatively assesses genome degradation. We analyzed each sample using the 196 top predictor genes for measuring the invasiveness of S. Gallinarum, employing a machine-learning approach that utilizes a random forest classifier and delta-bitscore functional variant-calling. This method evaluated the invasiveness of S. Gallinarum towards the host, and the distribution of invasiveness index values for each region was statistically tested using an unpaired t-test. The code used for calculating the invasiveness index is available at: https://github.com/Gardner-BinfLab/invasive_salmonella.”
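      The regional comparison described above can be sketched with the standard library alone. This is an illustration, not the authors' analysis: the response does not say which unpaired t-test was used, so the Welch form (unequal variances) is assumed here, and Cohen's d is added only because a standardized effect size makes it easier to judge how large a difference in means is. The function names and the index values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Unpaired t statistic without assuming equal variances (Welch form, an assumption)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_sd = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled_sd


# Hypothetical invasiveness index values for isolates from two regions.
region_a = [1.0, 2.0, 3.0]
region_b = [2.0, 3.0, 4.0]

t_stat = welch_t(region_a, region_b)
effect = cohens_d(region_a, region_b)
```

      A p-value would additionally require the t distribution, which a statistics library can supply; the sketch stops at the test statistic and effect size to stay dependency-free.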

      Regarding the second question, 'the difference in means is not biologically convincing as although it appears significant, it is very small,' we believe that this difference is biologically meaningful. In our previous work, we infected chicken embryos with different lineages of S. Gallinarum (Natl Sci Rev. 2023 Sep 2;10(10):nwad228). The virulence of thirteen strains of Salmonella Gallinarum, comprising five from lineage L2b and eight from lineage L3b, was evaluated in 16-day-old SPF chicken embryos through inoculation into the allantoic cavity. Controls included embryos inoculated with phosphate-buffered saline (PBS). The embryos were incubated in a thermostatic incubator maintained at 37.5°C with a relative humidity ranging from 50% to 60%. Prior to inoculation, the viability of the embryos was assessed by examining the integrity of their venous system and their movements; any dead embryos were excluded from the study. Overnight cultures resuspended in PBS at a concentration of 1000 CFU per 100 μL were administered to the embryos. Mortality was recorded daily for a period of five days, concluding upon the hatching of the chicks. 

      It is generally accepted that strains with higher invasive capabilities are more likely to cause chicken embryo mortality. Our experimental results showed that L2b, which exhibits higher invasiveness, was slightly more likely to cause chicken embryo death (Author response image 5). 

      Author response image 5.

      The survival curves of chicken embryos infected with bvSP isolates from S. Gallinarum L2b and S. Gallinarum L3b. Embryos inoculated with phosphate-buffered saline (PBS) served as controls. 
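      For readers who want to see how a survival curve of this kind is derived from daily mortality records, the following is a minimal Kaplan-Meier sketch. It is not the authors' analysis code and the counts are hypothetical; it assumes every embryo is followed for the full observation window, so embryos still alive on the last day are simply censored at that point.

```python
def kaplan_meier(death_days, n_total, follow_up_days):
    """Return a list of (day, survival probability) pairs for one group.

    death_days: day of death for each embryo that died (hypothetical input format).
    n_total: number of embryos inoculated in the group.
    follow_up_days: length of the observation window in days.
    """
    at_risk = n_total
    survival = 1.0
    curve = []
    for day in range(1, follow_up_days + 1):
        deaths = death_days.count(day)
        if at_risk > 0:
            # Product-limit step: multiply by the fraction surviving this day.
            survival *= 1 - deaths / at_risk
        at_risk -= deaths
        curve.append((day, survival))
    return curve


# Hypothetical group of 10 embryos: two deaths on day 2 and one on day 4.
curve = kaplan_meier(death_days=[2, 2, 4], n_total=10, follow_up_days=5)
```

      Computing one curve per lineage and one for the PBS controls would give the three traces of a plot like the one captioned above; a formal comparison between groups would need a log-rank test, which is outside this sketch.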

      (7) 'In more detail, both the resistome and mobilome exhibited a steady decline until the 1980s, followed by a consistent increase from the 1980s to the 2010s. However, after the 2010s, a subsequent decrease was identified.' 

      Where is the data/plot to support this? Is it a significant change? Is this due to sampling or phylogenetics? 

      Thank you for highlighting these critical points. The description in this statement is based on New Supplementary Figure 11. On the right side of New Supplementary Figure 11, we presented the average number of Antimicrobial Resistance Genes (ARGs) and Mobile Genetic Elements (MGEs) carried by S. Gallinarum isolates from different years, and we described the overall trend across these years. However, we realized that this statement might overinterpret the data. Given that this sentence does not impact our emphasis on the overall increasing trends observed in the resistome and mobilome, as well as their potential association, we decided to remove it in the revised manuscript.

      The revised paragraph would read as follows:

      Lines 261-268: “Variations in regional antimicrobial use may result in uneven pressure for selecting AMR. The mobilome is considered the primary reservoir for spreading resistome, and a consistent trend between the resistome and the mobilome has been observed across different lineages, from L1-L3c. We observed an overall gradual rise in the resistome quantity carried by bvSP across various lineages, correlating with the total mobilome content (S11 Fig). Furthermore, we investigated the interplay between particular mobile elements and resistome types in bvSP.”

      (8) It is not clear what the burden of disease this pathogen causes in the population, or how significant it is to agricultural policy. The article claims to 'provide valuable insights for targeted policy interventions.', but no such interventions are described. 

      Thank you for your constructive suggestions. Salmonella Gallinarum is an avian-specific pathogen that induces fowl typhoid, a severe systemic disease characterized by high mortality rates in chickens, thereby posing a significant threat to the poultry industry, particularly in developing countries (Rev Sci Tech. 2000 Aug;19(2):405-24). In our previous research, we conducted a comprehensive meta-analysis of 201 publications encompassing over 900 million samples to investigate the global impact of S. Gallinarum (Sci Data. 2022 Aug 13;9(1):495). Our findings estimated that the global prevalence of S. Gallinarum is 8.54% (with a 95% confidence interval of 8.43% to 8.65%), with notable regional variations in incidence rates.

      Our previous analysis focused on the prevalence of S. Gallinarum (including biovars SP and SG) across six continents. The results revealed that all continents, except Oceania, exhibited positive prevalences of S. Gallinarum. Asia had the highest prevalence at 17.31%, closely followed by Europe at 16.03%. In Asia, the prevalence of biovar SP was higher than that of biovar SG, whereas in Europe, biovar SG was observed to be approximately two hundred times more prevalent than biovar SP. In South America, the prevalence of S. Gallinarum was higher than that of biovar SP, at 10.06% and 13.20% respectively. Conversely, the prevalence of S. Gallinarum was relatively lower in North America (4.45%) compared to Africa (1.10%) (Author response image 6).

      Given the significant economic losses caused by S. Gallinarum to the poultry industry and the potential risk of escalating antimicrobial resistance, more targeted policy interventions are urgently needed. Further elaboration on this implication is provided in the revised “Discussion” section as follows:

      Lines 401-416: “In summary, the findings of this study highlight that S. Gallinarum remains a significant concern in developing countries, particularly in China. Compared to other regions, S. Gallinarum in China poses a notably higher risk of AMR, necessitating the development of additional therapies, i.e. vaccines, probiotics, and bacteriophage therapy, in response to the government's policy aimed at reducing antimicrobial use (J Infect Dev Ctries. 2014 Feb 13;8(2):129-36). Furthermore, given the dynamic nature of S. Gallinarum risks across different regions, it is crucial to prioritize continuous monitoring in key areas, particularly in China's southern regions where extensive poultry farming is located. Lastly, from a One-Health perspective, controlling AMR in S. Gallinarum should not solely focus on local farming environments, with improved overall welfare on poultry and farming style. The breeding pyramid of industrialized poultry production should be targeted on the top, with enhanced and accurate detection techniques (mSphere. 2024 Jul 30;9(7):e0036224). More importantly, comprehensive efforts should be made to reduce antimicrobial usage overall and mitigate potential AMR transmission from environmental sources or other hosts (Vaccines (Basel). 2024 Sep 18;12(9):1067; Vaccines (Basel). 2023 Apr 18;11(4):865; Front Immunol. 2022 Aug 11:13:973224).”

      Author response image 6.

      A comparison of the global prevalence of S. Gallinarum across continents.

      (9) The abstract mentions stepwise evolution as a main aim, but no results refer to this. 

      Thank you for raising this issue. In the revised manuscript, we have changed “stepwise evolution” to simply “evolution” to ensure a more accurate and precise description.

      (10) The authors attribute changes in population dynamics to normalisation in China-EU relations and hen fever. However, even if the date is correct, this is not a strongly supported causal claim, as many other reasons are also possible (for example other industrial processes which may have changed during this period). 

      Thank you for raising this critical issue. In the revised manuscript, we conducted a more stringent BEAST analysis for each lineage, as described earlier. This led to some changes in the inferred evolutionary timelines. Consequently, we have removed the corresponding statement from the “Results” section. Instead, we now only provide a discussion of historical events, supported by literature, that could have facilitated the intercontinental spread of L2b and L3b in the “Discussion” section. We believe these revisions have made the manuscript more rigorous and precise.

      Lines 332-342: “The biovar types of S. Gallinarum have been well-defined as bvSP, bvSG, and bvSD historically (J Vet Med B Infect Dis Vet Public Health. 2005 Jun;52(5):214-8). Among these, bvSP can be further subdivided into five lineages (L1, L2a, L2b, L3b, and L3c) using hierarchical Bayesian analysis. Different sublineages exhibited preferential geographic distribution, with L2b and L3b of bvSP being predominant global lineage types with a high risk of AMR. The historical geographical transmission was verified using a spatiotemporal Bayesian framework. The result shows that L3b was initially spread from China to Europe in the 18<sup>th</sup>-19<sup>th</sup> century, which may be associated with the European hen fever event in the mid-19th century (Burnham GP. 1855. The history of the hen fever: a humorous record). L2b, on the other hand, appears to have spread to Europe via South America, potentially contributing to the prevalence of bvSP in the United States.”  

      (11) No acknowledgment of potential undersampling outside of China is made, for example, 'Notably, all bvSP isolates from Asia were exclusively found in China, which can be manually divided into three distinct regions (southern, eastern, and northern).'.

      Perhaps we just haven't looked in other places?

      We appreciate the reviewer's observation regarding the sampling distribution of isolates in this study. We acknowledge that while the isolates were collected from 15 different countries, a significant proportion originated from China (Author response image 1). This focus is due to several reasons:

      (1) Once a globally prevalent pathogen throughout the 20th century, S. Gallinarum was listed by the World Organization for Animal Health (WOAH) due to its economic importance. After 30 years of implementing the National Poultry Improvement Plan in the US, it was almost eradicated in high-income countries; interestingly, it became an endemic pathogen with sporadic outbreaks in most low- or middle-income countries, such as China and Brazil. Given the vast expanse of China's land area and the country's economic factors, implementing the same measures remains a challenging endeavour. 

      (2) S. Gallinarum is an avian-specific pathogen, particularly affecting chickens, and its distribution is closely linked to chicken meat production in different countries. In some high chicken-producing developing countries, such as China and Brazil, there are more frequent reports of fowl typhoid. Data from the United States Department of Agriculture (USDA) on annual chicken meat production for 2023/2024 show that the global distribution of S. Gallinarum aligns closely with the overall chicken meat production of these countries (https://fas.usda.gov/data/production/commodity/0115000).  

      (3) Our primary objective was to investigate the localized resistome adaptation of S. Gallinarum in regions. Being a region with significant disease burden, China has reported numerous outbreaks (Sci Data. 2022 Aug 13;9(1):495; Sci Data. 2024 Feb 27;11(1):244) and a high AMR prevalence of this serovar (Natl Sci Rev. 2023 Sep 2;10(10):nwad228; mSystems. 2023 Dec 21;8(6):e0088323), making it an excellent example for understanding localized resistance mechanisms. 

      Nevertheless, a search of nearly a decade of literature on PubMed and a summary of the S. Gallinarum genomes available in public databases indicate that the dataset used is the most complete. Furthermore, focusing on a specific region within China allowed us to conduct a detailed and thorough analysis. However, we fully agree that expanding the study to include more isolates from other countries would enhance the generalizability of our findings, and we are actively collecting additional S. Gallinarum genome data. In the revised manuscript, we modified this sentence to indicate that this phenomenon is only observed in the current dataset, thereby avoiding an overly absolute statement:

      Lines 131-135: “For the bvSP strains from Asia included in our dataset, we found that all originated from China. To further investigate the distribution of bvSP across different regions in China, we categorized them into three distinct regions: southern, eastern, and northern (Supplementary Table 3)”.

      (12) Many of the conclusions are highly speculative and not supported by the data. 

      Thank you for your comment. We have carefully revised the manuscript to address your concerns. We hope that the changes made in the revised version meet your expectations and provide a clearer and more accurate interpretation of our findings.

      (13) The figures are not always the best presentation of the data: 

      a. Stacked bar plots in Figure 1 are hard to interpret, the total numbers need to be shown.

      Panel C conveys little information. 

      b. Figure 4B: stacked bars are hard to read and do not show totals. 

      c. Figure 5 has no obvious interpretation or significance. 

      Thank you for your comments. We have revised the figures to improve the clarity and presentation of the data.

      In summary, the quality of analysis is poor and likely flawed (although there is not always enough information on methods present to confidently assess this or provide recommendations for how it might be improved). So, the stated conclusions are not supported. 

      Thank you for your valuable feedback. We have carefully revised the manuscript to address your concerns. We hope that the updated figures and tables, and new data in the revised version meet your expectations and provide more appropriate interpretation of our findings.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      This reviewer enjoyed reading this well-written manuscript. The authors are encouraged to address the following comments and revise the manuscript accordingly. 

      (1) Title: The authors use avian-restrict Salmonella to refer to Salmonella Gallinarum. Please consider using Salmonella Gallinarum in the title. Also, your analysis relates to resistome and mobilome. Would it make sense to add mobilome in the manuscript? 

      Thank you for your guidance. In the revised manuscript, we have changed the title to “Avian-specific Salmonella enterica Serovar Gallinarum transition to endemicity is accompanied by localized resistome and mobilome interaction”. We believe that this revised title more accurately reflects the content of our study.

      (2) Abstract: This study uses 45 isolates from your labs. However, you failed to include these 45 isolates in the Abstract. Also, please clarify the sources of these isolates (from dead chickens, or dead chicken embryos? You wrote in two different ways in this manuscript). Also, I am not entirely convinced how the results from these 45 isolates will support the overall conclusion of this work. 

      Thank you for your thorough review and constructive comments on the manuscript. In the revised version, we have added a description of 45 newly isolated S. Gallinarum strains in the Abstract to provide readers with a clearer understanding of the dataset used in this study.

      Lines 36-41: “Using the most comprehensive whole-genome sequencing dataset of Salmonella enterica serovar Gallinarum (S. Gallinarum) collected from 16 countries, including 45 newly recovered samples from two related local regions, we established the relationship among avian-specific pathogen genetic profiles and localization patterns.”

      Furthermore, the newly isolated S. Gallinarum strains were obtained from dead chicken embryos. We think your second concern may arise from the following description in the manuscript: “All 734 samples of dead chicken embryos were collected from Taishun and Yueqing in Zhejiang Province, China. After the thorough autopsy, the liver, intestines, and spleen were extracted and added separately into 2 mL centrifuge tubes containing 1 mL PBS. The organs were then homogenized by grinding.” In fact, all the collected dead chicken embryos were aged 19 to 20 days. At this developmental stage, collecting the liver, intestines, and spleen for isolation and cultivation of S. Gallinarum is possible. To avoid any confusion, we have included a more detailed description of the dead chicken embryos in the revised manuscript as follows:

      Lines 447-451: “All 734 samples of dead chicken embryos aged 19 to 20 days were collected from Taishun and Yueqing in Zhejiang Province, China. After a thorough autopsy, the liver, intestines, and spleen were extracted and added separately into 2 mL centrifuge tubes containing 1 mL PBS. The organs were then homogenized by grinding.”

      Regarding your concern about the statement, “I am not entirely convinced how the results from these 45 isolates will support the overall conclusion of this work,” we would like to clarify the significance of these new isolates. Our research first identified distinct characteristics in the 45 newly isolated S. Gallinarum strains from Taishun and Yueqing, Zhejiang Province. Specifically, we found that most of the strains from Yueqing belonged to sequence type ST92, whereas the majority from Taishun were ST3717. Additionally, there were significant differences between these geographically close strains in terms of SNP distance and predicted invasion capabilities. These findings suggest that S. Gallinarum may exhibit localized transmission patterns, which forms the basis of the scientific question and hypothesis we originally aimed to address. Furthermore, in our previous work, we collected 325 S. Gallinarum strains. By incorporating the newly isolated 45 strains, we aim to provide a more comprehensive view of the population diversity, transmission pattern and potential risk of S. Gallinarum. We will continue to endeavour to understand the global genomic and population diversity in this field.
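      The SNP distance mentioned above can be illustrated with a short sketch. It is not the authors' workflow; it assumes the genomes have already been aligned to a common reference so that sequences have equal length, and it skips any position where either sequence has a non-ACGT character such as N or a gap. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ, ignoring ambiguous bases and gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in VALID_BASES and y in VALID_BASES and x != y
    )


def pairwise_snp_distances(alignment):
    """Return {(name_a, name_b): distance} for every pair of named sequences."""
    return {
        (a, b): snp_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


# Hypothetical aligned fragments from three isolates.
alignment = {
    "isolate_1": "ACGTACGT",
    "isolate_2": "ACGTACGA",
    "isolate_3": "ACCTANGA",
}

distances = pairwise_snp_distances(alignment)
```

      Grouping the resulting distances by sampling site (within-site pairs versus between-site pairs) is one simple way to look for the kind of localized clustering described in the response.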

      Finally, we revised the sentences that could potentially raise concerns for readers: 

      Lines 175-177: “To investigate the dissemination pattern of bvSP in China, we obtained forty-five newly isolated bvSP from 734 samples (6.1% overall isolation rate) collected from diseased chickens at two farms in Yueqing and Taishun, Zhejiang Province.”  >  “To investigate the dissemination pattern of bvSP, we obtained forty-five newly isolated bvSP from 734 samples (6.1% overall isolation rate) collected from diseased chickens at two farms in Yueqing and Taishun, Zhejiang Province.”

      (3) The manuscript uses nomenclature and classification into different sublineages. Did the authors establish the approaches for defining these sublineages in this group or did you follow the accepted standards? 

      Thank you very much for raising this important issue. The biovar types of Salmonella Gallinarum have historically been well-defined as S. Gallinarum biovar Pullorum (bvSP), S. Gallinarum biovar Gallinarum (bvSG), and S. Gallinarum biovar Duisburg (bvSD) (J Vet Med B Infect Dis Vet Public Health. 2005 Jun;52(5):214-8). However, there seems to be no widespread consensus on the population nomenclature for the key biovar bvSP. In a previous study, Zhou et al. classified bvSP into six lineages: L1, L2a, L2b, L3a, L3b, and L3c (Natl Sci Rev. 2023 Sep 2;10(10):nwad228). However, our more comprehensive analysis of S. Gallinarum using a larger dataset and hierarchical Bayesian clustering revealed that L3a, previously considered a distinct lineage, is actually a sublineage of L3c. Upon further review of our initial manuscript, we realized that the original submission did not strictly follow the lineage order proposed by Zhou et al. To avoid confusion in the typing system, we have adjusted the lineage nomenclature in the revised manuscript to reflect the corrected order (see Author response table 1).

      (4) This reviewer is convinced with the analysis approaches and conclusion of this work.

      In the meantime, the authors are encouraged to discuss the application of the conclusion of this study: a) can the data be somehow used in the prediction model? b) would the conclusion from S. Gallinarum have generalized application values for other pathogens. 

      Thank you for your constructive comments on the manuscript. 

      a) can the data be somehow used in the prediction model?

      We believe that genomic data can be effectively used for constructing prediction models; however, the success of such models largely depends on the specific traits being predicted. In this study, we utilized a random forest prediction model based on 196 top genes (PLoS Genet. 2018 May 8;14(5)) to predict the invasiveness of 45 newly isolated strains. In relation to the antimicrobial resistance (AMR) issue discussed in this paper, we also conducted relevant analyses. For instance, we explored the use of image-based models to predict whether a genome is resistant to specific antibiotics (Comput Struct Biotechnol J. 2023 Dec 29:23:559-565). We are confident that the incorporation of newly generated data will facilitate the development of future predictive models, and we plan to pursue further research in this area.

      b) would the conclusion from S. Gallinarum have generalized application values for other pathogens.

      This might be explained from two perspectives. First, the key role of the mobilome in facilitating the spread of the resistome, as emphasized in this study, has also been confirmed in research on other pathogens (mBio. 2024 Oct 16;15(10):e0242824). Thus, we believe that the pipeline we developed to assess the horizontal transfer frequency of different resistance genes across regions applies to various pathogens. On the other hand, due to distinct evolutionary histories, different pathogens exhibit varying levels of adaptation to their environments. In this study, we found that S. Gallinarum tends to spread in a highly localized manner; however, this conclusion may not necessarily hold for other pathogens.

      Reviewer #2 (Recommendations for the authors): 

      The authors would need to: 

      (1) Address my concerns about genomic analyses listed in the public review. 

      Thank you for your valuable feedback. We have carefully reviewed your concerns and made the necessary revisions to address the points raised about genomic analyses in the public review. We sincerely hope that these modifications meet your expectations and provide more robust analysis. We appreciate your thoughtful input and remain open to further suggestions to improve the manuscript.

      (2) Add more detail on the genomic methods and their outputs, as suggested above. 

      We have added further details to clarify the methodologies and outputs as mentioned above. Specifically, we expanded the description of the data processing, and the bioinformatic tools used for analysis. To ensure clarity, we also included an expanded discussion of the key outputs, highlighting their implications. We hope these revisions meet your expectations.

      (3) Critically rewrite their introduction to make it clear what problem they are trying to address. 

      Thank you for your guidance. In the revised manuscript, we have made the necessary modifications to the Introduction section to more clearly articulate the problem we aim to address.

      (4) Critically rewrite their conclusions so they are supported by the data they present, and make it clear when claims are more speculative. 

      Thank you for your guidance. In the revised manuscript, we have made the recommended modifications to the relevant sections of the conclusion as outlined above.

      More minor issues I identified: 

      (1) Typo in the title 'avian-restrict'. 

      Done.

      Line 1: “Avian-specific Salmonella enterica Serovar Gallinarum transition to endemicity is accompanied by localized resistome and mobilome interaction.”

      (2) 'By utilizing the pipeline we developed' -- a pipeline has not been introduced at this point. 

      In the revised manuscript, we have removed this section from the 'Abstract'.

      Lines 46-48: “Notably, the mobilome-resistome combination among distinct lineages exhibits a geographical-specific manner, further supporting a localized endemic mobilome-driven process.”

      (3) 'has more than 90% serovars' -- doesn't make sense. 

      Revised.

      Lines 82-83: “Salmonella, a pathogen with distinct geographical characteristics, has more than 90% of its serovars frequently categorized as geo-serotypes.”

      (4) 'horrific mortality rates that remain a disproportionate burden'. 

      Revised.

      Lines 83-87: “Among the thousands of geo-serotypes, Salmonella enterica Serovar Gallinarum (S. Gallinarum) is an avian-specific pathogen that causes severe mortality, with particularly detrimental effects on the poultry industry in low- and middle-income countries.”

      (5) What is the rate, what is a comparison, how is it disproportionate? 

      Thank you for your valuable feedback. It is challenging to accurately estimate the specific prevalence of S. Gallinarum, particularly due to the lack of comprehensive data in many countries. Numerous cases likely go unreported. However, S. Gallinarum is more commonly detected in low- and middle-income countries. Here, we provide three lines of evidence supporting this observation. First, in our previous research, we conducted a comprehensive meta-analysis of 201 studies, involving over 900 million samples, to evaluate the global impact of S. Gallinarum (Sci Data. 2022 Aug 13;9(1):495). The estimated prevalence in 17 countries showed that Bangladesh had the highest rate (25.75%) of S. Gallinarum infections. However, for biovar Pullorum (bvSP), Argentina (20.69%) and China (18.18%) reported the highest prevalence rates. Second, previous studies have also reported that S. Gallinarum predominantly occurs in low- and middle-income countries (Vet Microbiol. 2019 Jan:228:165-172; BMC Microbiol. 2024 Oct 18;24(1):414). Finally, S. Gallinarum was once a globally prevalent pathogen in the 20th century. Following the implementation of eradication programs in most high-income countries, it was listed by the World Organization for Animal Health and subsequently became an endemic pathogen with sporadic outbreaks. However, similar eradication efforts are challenging to implement in low- and middle-income countries, leading to a disproportionately higher incidence of S. Gallinarum in these regions.

      In the revised manuscript, we have rephrased this sentence to enhance its accuracy:

      Lines 83-87: “Among the thousands of geo-serotypes, Salmonella enterica serovar Gallinarum (S. Gallinarum) is an avian-specific pathogen that causes severe mortality, with particularly detrimental effects on the poultry industry in low- and middle-income countries.”

      (6) 'we collected the most comprehensive set of 580 S. Gallinarum isolates', -> 'we collected the most comprehensive set S. Gallinarum isolates, consisting of 580 genomes'. 

      Revised.

      Lines 97-100: “To fill the gaps in understanding the evolution of S. Gallinarum under regional-associated AMR pressures and its adaptation to endemicity, we collected the most comprehensive set of S. Gallinarum isolates, consisting of 580 genomes, spanning the period from 1920 to 2023.” 

      (7) Sequence reads are not available, and use a non-standard database. The eLife policy states: 'Sequence reads and assembly must be included for reference genomes, while novel short sequences, including epitopes, functional domains, genetic markers and haplotypes should be deposited, together with surrounding sequences, into Genbank, DNA Data Bank of Japan (DDBJ), or EMBL Nucleotide Sequence Database (ENA). DNA and RNA sequencing data should be deposited in NCBI Trace Archive or NCBI Sequence Read Archive (SRA).' So the sequences assemblies and reads should ideally be mirrored appropriately. 

      Thank you for your valuable suggestion regarding submitting the genome data for the 45 newly isolated S. Gallinarum strains. The genome data have been deposited in the NCBI Sequence Read Archive (SRA) under two BioProjects. The “SRA Accession number” for each strain has been added to New Supplementary Table 1. We believe this will ensure that the data are more readily accessible to a broader audience of researchers for download and analysis. We have revised the corresponding paragraph in the manuscript as follows:

      Lines 606-608: “For the newly isolated 45 strains of Salmonella Gallinarum, genome data have been deposited in NCBI Sequence Read Archive (SRA) database. The “SRA Accession” for each strain are listed in Supplementary Table 1.”

      (8) You should state at the start of the results which data is public, and how much is newly sequenced. 

      Revised.

      Lines 109-112: “To understand the global geographic distribution and genetic relationships of S. Gallinarum, we assembled the most comprehensive S. Gallinarum WGS dataset (n=580), comprising 535 publicly available genomes and 45 newly sequenced genomes.”

    1. eLife Assessment

      This valuable study tackles the well-established overflow metabolism issue by applying a coarse-grained metabolic flux model to predict how individual cells execute various energy strategies, such as respiration versus fermentation. The model's population average is convincing enough to align with experimental observations on overflow metabolism. However, the theoretical framework's reliance on single-cell growth rate variability must be questioned because of insufficient correlation with fluxes and the absence of regulatory mechanisms, highlighting the need for single-cell experimental validation to substantiate the proposed model.

    2. Reviewer #1 (Public review):

      Summary:

      Cell metabolism exhibits a well-known behavior in fast-growing cells, which employ seemingly wasteful fermentation to generate energy even in the presence of sufficient environmental oxygen. This phenomenon is known as Overflow Metabolism or the Warburg effect in cancer. It is present in a wide range of organisms, from bacteria and fungi to mammalian cells.

      In this work, starting with a metabolic network for Escherichia coli based on sets of carbon sources, and using a corresponding coarse-grained model, the author applies some well-based approximations from the literature and algebraic manipulations. These are used to successfully explain the origins of Overflow Metabolism, both qualitatively and quantitatively, by comparing the results with E. coli experimental data.

      By modeling the proteome energy efficiencies for respiration and fermentation, the study shows that these parameters are dependent on the carbon source quality constants K_i (p.115 and 116). It is demonstrated that as the environment becomes richer, the optimal solution for proteome energy efficiency shifts from respiration to fermentation. This shift occurs at a critical parameter value K_A(C).

      This counterintuitive result qualitatively explains Overflow Metabolism.

      Quantitative agreement is achieved through the analysis of the heterogeneity of the metabolic status within a cell population. By introducing heterogeneity, the critical growth rate is assumed to follow a Gaussian distribution over the cell population, resulting in accordance with experimental data for E. coli. Overflow metabolism is explained by considering optimal protein allocation and cell heterogeneity.
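The population-averaging step described above can be illustrated with a minimal sketch. It assumes that each cell ferments fully once the growth rate exceeds its own critical growth rate, and that this critical rate is Gaussian across the population. The function names and the parameters `mu_c`, `sigma_c`, and `phi` are hypothetical and chosen only for illustration; they are not taken from the manuscript.

```python
import math


def fermenting_fraction(lam, mu_c, sigma_c):
    """Fraction of cells whose critical growth rate lies below lam.

    Assumes the critical growth rate is Gaussian with mean mu_c and
    standard deviation sigma_c, so this is the Gaussian CDF at lam.
    """
    z = (lam - mu_c) / (math.sqrt(2.0) * sigma_c)
    return 0.5 * (1.0 + math.erf(z))


def mean_fermentation_flux(lam, mu_c, sigma_c, phi=1.0):
    """Population-averaged fermentation energy flux at growth rate lam.

    Assumes a fermenting cell carries an energy flux phi * lam and a
    respiring cell carries none (a binary choice per cell).
    """
    return phi * lam * fermenting_fraction(lam, mu_c, sigma_c)


if __name__ == "__main__":
    # Illustrative values only: the averaged flux rises smoothly
    # around mu_c instead of switching on abruptly.
    for lam in (0.4, 0.6, 0.76, 0.9, 1.1):
        print(lam, round(mean_fermentation_flux(lam, 0.76, 0.1), 4))
```

Under these assumptions, a binary choice in every cell still gives a smooth, threshold-like curve at the population level, which is the qualitative shape compared with the E. coli data.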

      The obtained model is extensively tested through perturbations: 1) Introduction of overexpression of useless proteins; 2) Studying energy dissipation; 3) Analysis of the impact of translation inhibition with different sub-lethal doses of chloramphenicol on Escherichia coli; 4) Alteration of nutrient categories of carbon sources using pyruvate. All model perturbation results are corroborated by E. coli experimental results.

      Strengths:

      In this work, the author effectively uses modeling techniques typical of Physics to address complex problems in Biology, demonstrating the potential of interdisciplinary approaches to yield novel insights. The use of Escherichia coli as a model organism ensures that the assumptions and approximations are well-supported in existing literature. The model is convincingly constructed and aligns well with experimental data, lending credibility to the findings. In this version, the extension of results from bacteria to yeast and cancer is substantiated by a literature base, suggesting that these findings may have broad implications for understanding diverse biological systems.

      Weaknesses:

      The author explores the generalization of their results from bacteria to cancer cells and yeast, adapting the metabolic network and coarse-grained model accordingly. In the previous version, this generalization was not completely supported by references and data from the literature. This drawback, however, has been addressed in the current version, where the author discusses the generalization in much more detail and gives references supporting it.

    3. Reviewer #2 (Public review):

      In this version of the manuscript, the author clarified many details and rewrote some sections. This substantially improved the readability of the paper. I also recognize that the author made a substantial effort in the Appendix to answer potential questions.

      Unfortunately, I am not currently convinced by the theory proposed in this paper. In the next section, I will first recap the logic of the author and explain why I am not convinced. Although the theory fits many experimental results, other theories on overflow metabolism are also supported by experiments. Hence, I do not think based on experimental data we could rule in or rule out different theories.

      Recap: To explain the origin of overflow metabolism, the author uses the following logic:

      (1) There is substantial variability in single-cell growth rate
      (2) The fluxes (J_r^E) and (J_f^E) are coupled with growth rate by Eq. 3
      (3) Since growth rate varies from cell to cell, the fluxes (J_r^E) and (J_f^E) also vary
      (4) The variability of the above fluxes creates a threshold-analog relation, and hence overflow metabolism.

      My opinion:

      Logic steps (2) and (3) have caveats. The variability of growth rate has large components of cellular noise and external noise. Therefore, the variability of growth rate is far from 100% correlated with the variability of the fluxes (J_r^E) and (J_f^E) at the single-cell level. Single-cell growth rate is a complex, multivariate function of (J_r^E) and (J_f^E) but also of many other variables. My feeling is that the correlation could be too low to support the logic here.

      One example: ribosomal concentration is known to be an important factor of growth rate in bulk culture. However, the "growth law" from bulk culture cannot be directly translated into a growth law at the single-cell level [Ref1,2]. This is likely because other factors (such as cell aging and multi-stability of cellular states) are involved.

      Therefore, I think using Eq. 3 to invert the distribution of growth rate into the distribution of (J_r^E) and (J_f^E) is inapplicable, due to the potentially low correlation at the single-cell level. It may show partial correlations, but these may not be strong enough to support the claim and create fermentation at the macroscopic scale.
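The concern about weak single-cell correlation can be made concrete with a toy simulation. It is only a sketch under a strong assumption: single-cell growth rate is modeled as the energy flux plus independent Gaussian noise standing in for all other sources of variability. The names `growth_flux_correlation`, `flux_sd`, and `noise_sd` are hypothetical, and the numbers are illustrative rather than measured.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def growth_flux_correlation(noise_sd, flux_sd=0.1, n_cells=5000, seed=0):
    """Correlation between flux and growth rate in a toy population.

    Toy assumption: growth = flux + independent noise, so the expected
    correlation is flux_sd / sqrt(flux_sd**2 + noise_sd**2).
    """
    rng = random.Random(seed)
    flux = [rng.gauss(1.0, flux_sd) for _ in range(n_cells)]
    growth = [f + rng.gauss(0.0, noise_sd) for f in flux]
    return pearson(flux, growth)


if __name__ == "__main__":
    # Larger unrelated noise in growth rate lowers the correlation,
    # making it harder to infer fluxes from growth rate alone.
    for noise_sd in (0.01, 0.1, 0.3):
        print(noise_sd, round(growth_flux_correlation(noise_sd), 3))
```

In this toy setting, the growth-rate distribution is a good proxy for the flux distribution only when the unrelated noise is small compared with the flux variability; how large that noise is in real cells is the open empirical question.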

      Overall, if we track the logic flow, this theory implies that overflow metabolism originates from cell-to-cell variability in the k_cat of catalytic enzymes. That is, the author proposes that overflow metabolism happens macroscopically as if it were some "aberrant activation of the fermentation pathway" at the single-cell level, due to some unknown partial correlation arising from growth rate variability.

      Compared with other theories, this theory does not involve any regulatory mechanism and can be regarded as a "neutral theory". I am looking forward to seeing single-cell experiments in the future that provide evidence about this theory.

      [Ref1] https://www.biorxiv.org/content/10.1101/2024.04.19.590370v2
      [Ref2] https://www.biorxiv.org/content/10.1101/2024.10.08.617237v2

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Cell metabolism exhibits a well-known behavior in fast-growing cells, which employ seemingly wasteful fermentation to generate energy even in the presence of sufficient environmental oxygen. This phenomenon is known as Overflow Metabolism or the Warburg effect in cancer. It is present in a wide range of organisms, from bacteria and fungi to mammalian cells.

      In this work, starting with a metabolic network for Escherichia coli based on sets of carbon sources, and using a corresponding coarse-grained model, the author applies some well-based approximations from the literature and algebraic manipulations. These are used to successfully explain the origins of Overflow Metabolism, both qualitatively and quantitatively, by comparing the results with E. coli experimental data.

      By modeling the proteome energy efficiencies for respiration and fermentation, the study shows that these parameters are dependent on the carbon source quality constants K_i (p.115 and 116). It is demonstrated that as the environment becomes richer, the optimal solution for proteome energy efficiency shifts from respiration to fermentation. This shift occurs at a critical parameter value K_A(C).

      This counterintuitive result qualitatively explains Overflow Metabolism.

      Quantitative agreement is achieved through the analysis of the heterogeneity of the metabolic status within a cell population. By introducing heterogeneity, the critical growth rate is assumed to follow a Gaussian distribution over the cell population, resulting in accordance with experimental data for E. coli. Overflow metabolism is explained by considering optimal protein allocation and cell heterogeneity.

      The obtained model is extensively tested through perturbations: 1) Introduction of overexpression of useless proteins; 2) Studying energy dissipation; 3) Analysis of the impact of translation inhibition with different sub-lethal doses of chloramphenicol on Escherichia coli; 4) Alteration of nutrient categories of carbon sources using pyruvate. All model perturbation results are corroborated by E. coli experimental results.

      We appreciate the reviewer's highly positive comments and the accurate summary of our manuscript.

      Strengths:

      In this work, the author employs modeling methods typical of Physics to address a problem in Biology, standing at the interface between these two scientific fields. This interdisciplinary approach proves to be highly fruitful and should be further explored in the literature. The use of Escherichia coli as an example ensures that all hypotheses and approximations in this study are well-founded in the literature. Examples include the approximation for the Michaelis-Menten equation (line 82), Eq. S1, proteome partition in Appendix 1.1 (lines 68-69), and a stable nutrient environment in Appendix 1.1 (lines 83-84). The section "Testing the model through perturbation" heavily relies on bacterial data. The construction of the model and its agreement with experimental data are convincingly presented.

      We appreciate the reviewer's highly positive comments. We have incorporated many of the reviewer's insightful suggestions and added citations in the appropriate contexts, which have significantly improved our manuscript.

      Weaknesses:

      In Section Appendix 6.4, the author explores the generalization of results from bacteria to cancer cells, adapting the metabolic network and coarse-grained model accordingly. It is argued that as a consequence, all subsequent steps become immediately valid. However, I remain unconvinced, considering the numerous approximations used to derive the equations, which the literature demonstrates to be valid primarily for bacteria. A more detailed discussion about this generalization is recommended. Additionally, it is crucial to note that the experimental validation of model perturbations heavily relies on E. coli data.

      We appreciate the reviewer's insightful suggestions. We apologize for not clearly illustrating the generalization of results from bacteria to cancer cells in the previous version of our manuscript. Indeed, in our earlier version, there was no experimental validation of model results related to cancer cells.

      Following the reviewer’s suggestions, we have now added Fig. 5 and Appendix-fig. 5, fully expanded the previous Appendix 6.4 into Appendix 9 in our current version, and added a new section entitled “Explanation of the Crabtree effect in yeast and the Warburg effect in cancer cells” in our main text to provide a detailed discussion of the generalization from bacteria to yeast and cancer cells. Through the derivations shown in Appendix 9 (Eqs. S180-S189), we arrived at Eq. 6 (or Eq. S190 in Appendix 9) to facilitate the comparison of our model results with experimental data in yeast and cancer cells. This comparison is presented in Fig. 5, where we demonstrate that our model can quantitatively explain the data for the Crabtree effect in yeast and the Warburg effect in cancer cells (related experimental data references: Shen et al., Nature Chemical Biology 20, 1123–1132 (2024); Bartman et al., Nature 614, 349-357 (2023)). These additions have significantly strengthened our manuscript.

      Reviewer #2 (Public Review):

      Summary

      This paper has three parts. The first part applied a coarse-grained model with proteome partition to calculate cell growth under respiration and fermentation modes. The second part considered single-cell variability and performed a population average to acquire an ensemble metabolic profile for acetate fermentation. The third part used the model and simulations to compare with experimental data in the literature and obtained substantial consistency.

      We thank the reviewer for the accurate summary and positive comments on our manuscript.

      Strengths and major contributions

      (i) The coarse-grained model considered specific metabolite groups and their interrelations and acquired an analytical solution for this scenario. The "resolution" of this model is in between the Flux Balanced Analysis/whole-cell simulation and proteome partition analysis.

      (ii) The author considered single-cell level metabolic heterogeneity and calculated the ensemble average with explicit calculation. The results are consistent with known fermentation and growth phenomena qualitatively and can be quantitatively compared to experimental results.

      We appreciate the reviewer’s highly positive comments.

      Weaknesses

      (i) If I am reading this paper correctly, the author's model predicts binary (or "digital") outcomes of single-cell metabolism, that is, after growth rate optimization, each cell will adopt either "respiration mode" or "fermentation mode" (as illustrated in Appendix - Figure 1C, D). Due to variability in the enzyme activity k_i^{cat} and the critical growth rate λ_C, each cell under the same nutrient condition could have either respiration or fermentation, but the choice is binary.

      The binary choice at the single-cell level is inconsistent with our current understanding of metabolism. If a cell only uses fermentation mode (as shown in Appendix - Figure 1C), it could generate enough energy but not be able to have enough metabolic fluxes to feed into the TCA cycle. That is, under pure fermentation mode, the cell cannot expand the pool of TCA cycle metabolites and hence cannot grow.

      This caveat also appears in the model in Appendix (S25) that assumes J_E = r_E*J_{BM} where r_E is a constant. From my understanding, r_E can be different between respiration and fermentation modes (at least for real cells) and hence it is inappropriate to conclude that cells using fermentation, which generates enough energy, can also generate a balanced biomass.

      We thank the reviewer for raising this question. Indeed, regarding energy biogenesis between respiration and fermentation, our model predicts binary outcomes at the single-cell level. However, this outcome does not hinder cell growth, as there are three independent possible fates for the carbon source (e.g., glucose) in metabolism: fermentation, respiration for energy biogenesis, and biomass generation. Each fate is associated with a distinct fraction of the proteome, with no overlap between them (see Appendix-figs. 1 and 5). Consequently, in a purely fermentative mode, a cell can still use the proteome dedicated to the biomass generation pathway to produce biomass precursors via the TCA cycle.

      The classification of the carbon source’s fates into three independent pathways was initially introduced by Chen and Nielsen (Chen and Nielsen, PNAS 116, 17592-17597 (2019)). We apologize for the oversight in not citing their paper in this context in the previous version of our manuscript (although it was cited elsewhere). We have now included the citation in all appropriate places.

      To illustrate this issue more clearly, we explicitly present the proteome allocation results for optimal growth in a fermentation mode below, where the proteome efficiency (i.e., the proteome energy efficiency in our previous version) in fermentation is higher than in respiration (i.e., ε<sub>f</sub> > ε<sub>r</sub>). We use the model shown in Fig. 1B as an example, with the relevant equations being Eqs. S26 and S28 in Appendix 2.1. By substituting Eq. S28 into Eq. S26, we arrive at Eq. 3 (or Eq. S29 in Appendix 2.1), which we restate here as Eq. R1:

      For a given nutrient condition, i.e., for a specific value of κ<sub>A</sub> at the single-cell level, the values of are determined (see Eqs. S20, S27, S31 and S32), while  ϕ and φ<sub>max</sub> are constants (see Eq. S33 and Appendix 1.1). Therefore, if , then , since all coefficients are positive (i.e., ) and takes non-negative values. Hence, the solution for optimal growth is (see Eqs. S35-S36 in Appendix 2.2):

      Here, the result signifies a pure fermentation mode with no respiration flux for energy biogenesis. Then, by combining Eq. R2 with Eqs. S28 and S30 from Appendix 2.1, we obtain the optimal proteome allocation results for this case:

      where , while κ<sub>A</sub> and take given values (see Eqs. S20 and S27). In Eq. R3, φ<sub>3</sub> corresponds to the fraction of the proteome devoted to carrying the carbon flux from Acetyl-CoA (the entry point of Pool b, see Fig. 1B and Appendix 1.2) to α-Ketoglutarate (the entry point of Pool c), with all of these being enzymes within the TCA cycle. The optimal growth solution is , which demonstrates that in a pure fermentation mode, the optimal growth condition includes the presence of enzymes within the TCA cycle capable of carrying the flux required for biomass generation.

      Regarding Eq. S25, J<sub>E</sub> represents the energy demand for cell proliferation, expressed as the stoichiometric energy flux in ATP. Although the influx of carbon sources (e.g., glucose) varies significantly between fermentation and respiration modes, J<sub>BM</sub> and J<sub>E</sub> are the biomass and energy fluxes used to build cells, respectively. In bacteria, whether in fermentation or respiration mode, the proportion of maintenance energy used for protein degradation is roughly negligible (see Locasale and Cantley, BMC Biol 8, 88 (2010)). Consequently, the energy demand represented by J<sub>E</sub> scales approximately linearly with the biomass production rate J<sub>BM</sub> (related experimental data reference: Ebenhöh et al., Life 14, 247 (2024)), regardless of the energy biogenesis mode. Therefore, r<sub>E</sub> can be regarded as roughly constant for bacteria. However, in eukaryotic cells such as yeast and mammalian cells, the proportion of maintenance energy is much more significant (see Locasale and Cantley, BMC Biol 8, 88 (2010)). Therefore, we have explicitly considered the contribution of maintenance energy in these cases and have extended the previous Appendix 6.4 into Appendix 9 in the current version.

      (ii) The minor weakness of this model is that it assumes a priori that each cell chooses its metabolic strategy based on energy efficiency. This is an interesting assumption but there is no known biochemical pathway that directly executes this mechanism. In evolution, growth rate is more frequently considered for metabolic optimization. In Flux Balanced Analysis, one could have multiple objective functions including biomass synthesis, energy generation, entropy production, etc. Therefore, the author would need to justify this assumption and propose a reasonable biochemical mechanism for cells to sense and regulate their energy efficiency.

      We thank the reviewer for raising this question and apologize for not explaining this point clearly enough in the previous version of our manuscript. Just as the reviewer mentioned, growth rate should be considered for metabolic optimization under the selection pressure of the evolutionary process. In fact, in our model, the sole optimization objective is exactly the cell growth rate. The determination of whether to use fermentation or respiration based on proteome efficiency (i.e., the proteome energy efficiency in our previous version) is not an a priori assumption in our model; rather, it is a natural consequence of growth rate optimization, as we detail below. 

      For a given nutrient condition with a determined value of κ<sub>A</sub>, as we have explained in the aforementioned responses, the constraint on the fluxes is summarized in Eq. 3 and is restated as Eq. R1. Mathematically, we can obtain the solution for the optimal growth strategy by combining Eq. R1 (i.e., Eq. 3) with the optimization on cell growth rate λ, and the solution can be obtained as follows: If the proteome efficiency in fermentation is larger than that in respiration, i.e., ε<sub>f</sub> > ε<sub>r</sub>, then from Eq. R1, we obtain , since the values of ε<sub>r</sub>, ε<sub>f</sub>, Ψ, ϕ and φ<sub>max</sub> are all fixed for a given κ<sub>A</sub>, with ε<sub>r</sub>, ε<sub>f</sub>, Ψ, ϕ, φ<sub>max</sub> > 0. Hence, (since ), and note that . Therefore is the solution for optimal growth, where the growth rate can take the maximum value of . Similarly, for the case where the proteome efficiency in respiration is larger than that in fermentation (i.e., ε<sub>r</sub> > ε<sub>f</sub>), is the solution for optimal growth. With this analysis, we have demonstrated that the choice between fermentation and respiration based on proteome efficiency is a natural consequence of growth rate optimization.
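The reasoning in this response can be checked numerically with a small sketch. It rests on an assumed form of the constraint, which should be read as an illustration rather than as the manuscript's Eq. R1: the energy fluxes satisfy J_r + J_f = ϕ·λ, and the proteome budget satisfies J_r/ε_r + J_f/ε_f + Ψ·λ = φ_max. The function names, the variable `x` (the fraction of energy flux carried by fermentation), and all numerical values are hypothetical.

```python
def growth_rate(x, eps_r, eps_f, psi, phi, phi_max):
    """Growth rate when a fraction x of the energy flux is fermentative.

    Under the assumed constraints, J_r = (1 - x) * phi * lam and
    J_f = x * phi * lam, so the proteome budget gives
    lam = phi_max / (psi + phi * ((1 - x) / eps_r + x / eps_f)).
    """
    proteome_cost_per_energy = (1.0 - x) / eps_r + x / eps_f
    return phi_max / (psi + phi * proteome_cost_per_energy)


def optimal_fermentation_fraction(eps_r, eps_f, psi=1.0, phi=1.0,
                                  phi_max=0.5, steps=101):
    """Scan x over [0, 1] and return the value that maximizes growth."""
    grid = [i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda x: growth_rate(x, eps_r, eps_f,
                                               psi, phi, phi_max))


if __name__ == "__main__":
    # Growth rate is monotonic in x, so the optimum sits at an endpoint.
    print(optimal_fermentation_fraction(eps_r=1.0, eps_f=2.0))
    print(optimal_fermentation_fraction(eps_r=2.0, eps_f=1.0))
```

Because the growth rate in this sketch is monotonic in `x`, maximizing it alone drives each cell to pure fermentation or pure respiration, depending only on which proteome efficiency is larger. No separate efficiency objective is needed.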

      We have now revised the related content in our manuscript to clarify this point.

      My feeling is that the mathematical structure of this model could be correct, but the single-cell interpretation for the ensemble averaging has issues. Each cell could potentially adopt partial respiration and partial fermentation at the same time and have temporal variability in its metabolic mode as well. With the modification of the optimization scheme, the author could have a revised model that avoids the caveat mentioned above.

      We thank the reviewer for raising this question. In fact, in the above two responses, we have addressed the issues raised here, clarifying that the binary mode between respiration and fermentation does not hinder cell growth and that the sole optimization objective is the cell growth rate, as the reviewer suggested. Regarding temporal variability, due to factors such as cell cycle stages and the intrinsic noise arising from stochastic processes, temporal variability in the fermentation or respiration mode is indeed likely. However, at any given moment at the single-cell level, a binary choice between fermentation and respiration is what our model predicts for the optimal growth strategy. 

      Discussion and impact for the field

      Proteome partition models and Flux Balanced Analysis are both commonly used mathematical models that emphasize different parts of cellular physiology. This paper has ingredients for both, and I expect after revision it will bridge our understanding of the whole cell.

      We appreciate the reviewer’s very positive comments. We have followed many of the good suggestions raised by the reviewer, and our revised manuscript is much improved as a result.

      Reviewer #3 (Public Review):

      Summary:

      In the manuscript "Overflow metabolism originates from growth optimization and cell heterogeneity" the author Xin Wang investigates the hypothesis that the transition into overflow metabolism at large growth rates actually results from an inhomogeneous cell population, in which every individual cell either performs respiration or fermentation.

      We thank the reviewer for carefully reading our manuscript and the accurate summary.

      Weaknesses:

      The paper has several major flaws. First, and most importantly, it repeatedly and wrongly claims that the origins of overflow metabolism are not known. The paper is written as if it is the first to study overflow metabolism and provide a sound explanation for the experimental observations. This is obviously not true and the author actually cites many papers in which explanations of overflow metabolism are suggested (see e.g. Basan et al. 2015, which even has the title "Overflow metabolism in E. coli results from efficient proteome allocation"). The paper should be rewritten in a more modest and scientific style, not attempting to make claims of novelty that are not supported. In fact, all hypotheses in this paper are old. Also the possibility that cell heterogeneity explains the observed 'smooth' transition into overflow metabolism has been extensively investigated previously (see de Groot et al. 2023, PNAS, "Effective bet-hedging through growth rate dependent stability") and the random drawing of kcat-values is an established technique (Beg et al., 2007, PNAS, "Intracellular crowding defines the mode and sequence of substrate uptake by Escherichia coli and constrains its metabolic activity"). Thus, in terms of novelty, this paper is very limited. It reinvents the wheel and it is written as if decades of literature debating overflow metabolism did not exist.

      We thank the reviewer for both the critical and constructive comments. Following the reviewer’s suggestion, we have revised our manuscript to adopt a more modest style. However, we respectfully disagree with the criticism regarding the novelty of our study, as detailed below.

      First, while many explanations for overflow metabolism have been proposed, we have cited these in both the previous and current versions of our manuscript. We apologize for not emphasizing the distinctions between these previous explanations and our study in the main text of our earlier version, though we did provide details in Appendix 6.3. In fact, most of these explanations (e.g., Basan et al., Nature 528, 99-104 (2015); Chen and Nielsen, PNAS 116, 17592-17597 (2019); Majewski and Domach, Biotechnol. Bioeng. 35, 732-738 (1990); Niebel et al., Nat. Metab. 1, 125-132 (2019); Shlomi et al., PLoS Comput. Biol. 7, e1002018 (2011); Varma and Palsson, Appl. Environ. Microbiol. 60, 3724-3731 (1994); Vazquez et al., BMC Syst. Biol. 4, 58 (2010); Vazquez and Oltvai, Sci. Rep. 6, 31007 (2016); Zhuang et al., Mol. Syst. Biol. 7, 500 (2011)) heavily rely on the assumption that cells optimize their growth rate for a given rate of carbon influx under each nutrient condition (or certain equivalents) to explain the growth rate dependence of fermentation flux. However, this assumption—that cell growth rate is optimized for a given rate of carbon influx—is questionable, as the given factors in a nutrient condition are the identity and concentration of the carbon source, rather than the carbon influx itself.

      Consequently, in our model, we purely optimize cell growth rate without imposing a special constraint on carbon influx. Our assumption that the given factors in a nutrient condition are the identity and concentration of the carbon source aligns with the studies by Molenaar et al. (Molenaar et al., Mol. Syst. Biol. 5, 323 (2009)), where they specified an identical assumption on page 5 of their Supplementary Information (SI); Scott et al. (Scott et al., Science 330, 1099-1102 (2010)), where the growth rate formula was derived for a culturing condition with a given nutrient quality; and Wang et al. (Wang et al., Nat. Comm. 10, 1279 (2019)), our previous study on microbial growth. Among these three studies, only Molenaar et al. addresses overflow metabolism. However, Molenaar et al. did not consider cell heterogeneity, resulting in their model predictions on the growth rate dependence of fermentation flux being a digital response, which is inconsistent with experimental data.

      Furthermore, prevalent explanations such as those by Basan et al. (Basan et al., Nature 528, 99-104 (2015)) and Chen and Nielsen (Chen and Nielsen, PNAS 116, 17592-17597 (2019)) suggest that overflow metabolism originates from the proteome efficiency in fermentation always being higher than in respiration. However, Shen et al. (Shen et al., Nature Chemical Biology 20, 1123–1132 (2024)) recently discovered that the proteome efficiency measured at the cell population level in respiration is higher than in fermentation for many yeast and cancer cells, despite the presence of fermentation fluxes through aerobic glycolysis. This finding clearly contradicts the studies by Basan et al. (2015) and Chen and Nielsen (2019). 

      Nevertheless, our model may resolve this puzzle by incorporating two important features. First, in our model, the proteome efficiency (i.e., the proteome energy efficiency in our previous version) in respiration is larger than that in fermentation when nutrient quality is low (Eqs. S174-S175 in Appendix 9). Second, and crucially, due to the incorporation of cell heterogeneity in our model, there could be a proportion of cells with higher proteome efficiency in fermentation than in respiration, even when the overall proteome efficiency at the cell population level is higher in respiration than in fermentation. As shown in the newly added Fig. 5A-B, our model results can quantitatively illustrate the experimental data from Shen et al., Nature Chemical Biology 20, 1123–1132 (2024).

      Finally, regarding the criticism of the novelty of our hypothesis: As specified in our main text, cell heterogeneity has been widely reported experimentally in both microbes (e.g., Ackermann, Nat. Rev. Microbiol. 13, 497-508 (2015); Bagamery et al., Curr. Biol. 30, 4563-4578 (2020); Balaban et al., Science 305, 1622-1625 (2004); Nikolic et al., BMC Microbiol. 13, 1-13 (2013); Solopova et al., PNAS 111, 7427-7432 (2014); Wallden et al., Cell 166, 729-739 (2016)) and tumor cells (e.g., Duraj et al., Cells 10, 202 (2021); Hanahan and Weinberg, Cell 144, 646-674 (2011); Hensley et al., Cell 164, 681-694 (2016)). However, to the best of our knowledge, cell heterogeneity has not yet been incorporated into theoretical models for explaining overflow metabolism or the Warburg effect. The reviewer mentioned the study by de Groot et al. (de Groot et al., PNAS 120, e2211091120 (2023)) as studying overflow metabolism similarly to our work. We have carefully read this paper, including the main text and SI, and found that it is not directly relevant to either overflow metabolism or the Warburg effect. Instead, their model extends the work of Kussell and Leibler (Kussell and Leibler, Science 309, 2075-2078 (2005)), focusing on bet-hedging strategies of microbes in changing environments.

      Regarding the criticism that random drawing of kcat-values is an established technique (Beg et al., PNAS 104, 12663-12668 (2007)), we need to stress that the distribution noise on kcat-values considered in our model is fundamentally different from that in Beg et al. In Beg et al., their model involved 876 reactions (see Dataset 1 in Beg et al.), of which only 109 had associated biochemical experimental data. Thus, their distribution of kcat-values pertains to different enzymes within the same cell. In contrast, we have the mean of the kcat-values from experimental data for each relevant enzyme, with the distribution of kcat-values representing the same enzyme in different cells.

      Moreover, the manuscript is not clearly written and is hard to understand. Variables are not properly introduced (the M-pools need to be discussed; fluxes (J_E), "energy coefficients" (eta_E), etc. need to be more explicitly explained). What is "flux balance at each intermediate node"? How is the "proteome efficiency" of a pathway defined? The paper continues to speak of energy production. This should be avoided. Energy is conserved (1st law of thermodynamics) and can never be produced. A scientific paper should strive for scientific correctness, including precise choice of words.

      We thank the reviewer for the constructive comments. Following these, we have provided more explicit information and revised our manuscript to enhance readability. In our initially submitted version, the phrase "energy production" was borrowed from Nelson et al. (Nelson et al., Lehninger principles of biochemistry, 2008) and Basan et al. (Basan et al., Nature 528, 99-104 (2015)), and we chose to follow this terminology. We appreciate the reviewer’s suggestion and have now revised the wording to use more appropriate expressions.

      The statement that the "energy production rate ... is proportional to the growth rate" is, apart from being incorrect (it should be 'ATP consumption rate' or similar, see above), a non-trivial claim. Why should this be the case? Such statements must be supported by references. The observation that the catabolic power indeed appears to increase linearly with growth rate was made, based on chemostat data for E. coli and yeast, in a recent preprint (Ebenhöh et al, 2023, bioRxiv, "Microbial pathway thermodynamics: structural models unveil anabolic and catabolic processes").

      We thank the reviewer for the insightful suggestions. Following these, we have revised our manuscript and cited the suggested reference (i.e., Ebenhöh et al., Life 14, 247 (2024)).

      All this criticism does not preclude the possibility that cell heterogeneity plays a role in overflow metabolism. However, according to Occam's razor, first the simpler explanations should be explored and refuted before coming up with a more complex solution. Here, it means that the authors first should argue why simpler explanations (e.g. the 'Membrane Real Estate Hypothesis', Szenk et al., 2017, Cell Systems; maximal Gibbs free energy dissipation, Niebel et al., 2019, Nature Metabolism; Saadat et al., 2020, Entropy) are not considered, resp. in what way they are in disagreement with observations, and then provide some evidence of the proposed cell heterogeneity (are there single-cell transcriptomic data supporting the claim?).

      We thank the reviewer for raising these questions and providing valuable insights. Regarding the shortcomings of simpler explanations, as explained above, most proposed explanations (including the references mentioned by the reviewer: Szenk et al., Cell Syst. 5, 95-104 (2017); Niebel et al., Nat. Metab. 1, 125-132 (2019); Saadat et al., Entropy 22, 277 (2020)) rely heavily on the assumption that cells optimize their growth rate for a given rate of carbon influx under each nutrient condition (or its equivalents). However, this assumption is questionable, as the given factors in a nutrient condition are the identities and concentrations of the carbon sources, rather than the carbon influx itself.

      Specifically, Szenk et al. is a perspective paper, and the original “membrane real estate hypothesis” was proposed by Zhuang et al. (Zhuang et al., Mol. Syst. Biol. 7, 500 (2011)). Zhuang et al. specified in Section 7 of their SI that their model’s explanation of the experimental results shown in Fig. 2C of their manuscript relies on the assumption of restrictions on carbon influx. In Niebel et al. (Niebel et al., Nat. Metab. 1, 125-132 (2019)), the Methods section specifies that the glucose uptake rate was considered a given factor for a growth condition. In Saadat et al. (Saadat et al., Entropy 22, 277 (2020)), Appendix A notes that their model results depend on minimizing carbon influx for a given growth rate, which is equivalent to the assumption mentioned above (see Appendix 6.3 in our manuscript for details). 

      Regarding the experimental evidence for our proposed cell heterogeneity, Bagamery et al. (Bagamery et al., Curr. Biol. 30, 4563-4578 (2020)) reported non-genetic heterogeneity in two subpopulations of Saccharomyces cerevisiae cells upon the withdrawal of glucose from exponentially growing cells. This strongly indicates the coexistence of fermentative and respiratory modes of heterogeneity in S. cerevisiae cultured in a glucose medium (refer to Fig. 1E in Bagamery et al.). Nikolic et al. (Nikolic et al., BMC Microbiol. 13, 1-13 (2013)) reported a bimodal distribution in the expression of the acs gene (the transporter for acetate) in an E. coli cell population growing on glucose as the sole carbon source within the region of overflow metabolism (see Fig. 5 in Nikolic et al.), indicating the cell heterogeneity we propose. For cancer cells, Duraj et al. (Duraj et al., Cells 10, 202 (2021)) reported a high level of intra-tumor heterogeneity in glioblastoma using optical microscopy images, where 48% to 75% of the cells use fermentation and the remainder use respiration (see Fig. 1C in Duraj et al.), which aligns with the cell heterogeneity picture of aerobic glycolysis predicted by our model.

      We have now added related content to the discussion section to strengthen our manuscript.

      Reviewer #1 (Recommendations For The Authors): 

      Some minor corrections:

      (1) Adjusted the reference: (García-Contreras et al., 2012)

      (2) Corrected line 255: Removed the duplicate "the genes"

      We thank the reviewer for the suggestions and have implemented each of them to revise our manuscript. The reference in the form of García-Contreras et al., 2012, although somewhat unusual, is actually correct, so we have kept it unchanged.

      General comment to the author:

      Considering that this work exists at the interface between Physics and Biology, where a significant portion of the audience may not be familiar with the mathematical manipulations performed, it would enhance the paper's readability to provide more explicit indications in the text. For example, in line 91, explicitly define phi_A as phi_R; or in line 115, explain the K_i parameter in the text for better readability.

      We thank the reviewer for the suggestion. Following this, we have now provided more explicit information for the definition of mathematical symbols to enhance readability.

      Reviewer #2 (Recommendations For The Authors):

      The current form of this manuscript is difficult to read for general readers. In addition, the model description in the Appendix can be improved for biophysics readers to keep track of the variables. Here are my suggestions:

      a) In the main text, the author should give the definition of "proteome energy efficiency" explicitly both in English and mathematical formula - since this is the central concept of the paper. The biological interpretation of formula (4) should also be stated.

      We thank the reviewer for the suggestion. Following this, we have now added definitions and biological interpretations to fix these issues.

      b) I feel the basic model of the reaction network in the Appendix could be stated in a more concise way, by emphasizing whether a variable is extensive (exponential growing) or intensive (scale-invariant under exponential growth).

      From my understanding, this work assumes balanced exponential growth and hence there is a balanced biomass vector Y* (a constant unit vector with all components sum to 1) for each cell. The steady-state fluxes {J} are extensive and all have growth rate λ. The proteome partition and relative metabolite fractions are ratios of different components of Y* and hence are intensive.

      The normalized fluxes {J^(n)} (with respect to biomass) are a function of Y* and are all kept as constant ratios with each other. They are also intensive.

      The biomass and energy production are linear combinations of {J} and hence are extensive and follow exponential growth. The biomass and energy efficiency are ratios between flux and proteome biomass, and hence are intensive.

      We thank the reviewer for the insightful suggestion. Following this, we have now added the intensive and extensive information for all relevant variables in the newly added Appendix-table 3.
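
      The extensive/intensive distinction raised in this comment can be illustrated with a minimal sketch. It assumes balanced exponential growth with a fixed composition vector Y*; the function name, the three-component composition, and the growth rate are hypothetical and are not taken from the manuscript.

```python
import numpy as np


def simulate_balanced_growth(y_star, growth_rate, times, m0=1.0):
    """Component masses over time under balanced exponential growth.

    y_star is normalised to sum to 1, so it plays the role of the constant
    biomass composition vector; total mass grows as m0 * exp(growth_rate * t).
    """
    y_star = np.asarray(y_star, dtype=float)
    y_star = y_star / y_star.sum()
    total = m0 * np.exp(growth_rate * np.asarray(times, dtype=float))
    return total[:, None] * y_star[None, :]


times = np.linspace(0.0, 3.0, 4)  # t = 0, 1, 2, 3 (arbitrary units)
masses = simulate_balanced_growth([2.0, 1.0, 1.0], growth_rate=0.7, times=times)

totals = masses.sum(axis=1)               # extensive: grows exponentially
fractions = masses / totals[:, None]      # intensive: constant in time
fluxes = 0.7 * masses                     # extensive: d(mass)/dt = lambda * mass
normalized_fluxes = fluxes / totals[:, None]  # intensive: flux per unit biomass

print(totals)
print(fractions)
print(normalized_fluxes)
```

      In this toy setting the totals and fluxes scale by the same factor exp(λ Δt) between time points, while the fractions and biomass-normalised fluxes stay fixed, which is the sense in which the former are extensive and the latter intensive.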

      c) In the Appendix, the author should have a table or list of important variables, with their definition, units, and physiological values under respiration and fermentation.

      We thank the reviewer for the very useful suggestion. Following this, we have now added Appendix-table 3 (pages 54-57 in the appendices) to illustrate the symbols used throughout our manuscript, as well as the model variables and parameter settings.   

      d) Regarding the single-cell variability, the author ignored recent experimental measurements on single-cell metabolism. This includes variability on ATP, NAD(P)H in E. coli, which will be useful background for the readers, see below.

      https://pubmed.ncbi.nlm.nih.gov/25283467/

      https://pubmed.ncbi.nlm.nih.gov/29391569/

      We thank the reviewer for the very useful suggestion. We have now cited these relevant studies in our manuscript.  

      e) The choice between 100% respiration and 100% fermentation is based on the optimization of proteome energy efficiency, while the intermediate strategies are not favored in this model. This is similar to a concept in control theory called the bang-bang principle. This can be added to the Discussion.

      We thank the reviewer for this suggestion. We have reviewed the concept and articles on the bang-bang principle. While the bang-bang principle is indeed relevant to binary choices, it is somewhat distant from the topic of metabolic strategies related to optimal growth. The elementary flux mode (see Müller et al., J. Theor. Biol. 347, 182-190 (2014); Wortel et al., FEBS J. 281, 1547-1555 (2014)) is more pertinent to this topic, as it may lead to diauxic microbial growth (another binary metabolic strategy) in microbes grown on a mixture of two carbon sources from Group A (see Wang et al., Nat. Comm. 10, 1279 (2019)). Therefore, we have cited and mentioned only the elementary flux mode (Müller et al., J. Theor. Biol. 347, 182-190 (2014); Wortel et al., FEBS J. 281, 1547-1555 (2014)) in the introduction and discussion sections of our manuscript.

    1. eLife Assessment

      This important study explores the association between mother-child interactions and the development of children's social brain networks, specifically the theory of mind and social pain networks. The findings provide solid evidence for enhanced stimulus-evoked neural synchronization between child-caregiver dyads, while the evidence for the other variables is incomplete and could be strengthened with further analyses. The study effectively bridges brain development with children's behavior and parenting practices and would be of interest to broad research communities in social neuroscience and developmental psychology.

    2. Reviewer #1 (Public review):

      The authors sought to examine the associations between child age, reports of parent-child relationship quality, and neural activity patterns while children (and also their parents) watched a movie clip. Major methodological strengths include the sample of 3-8 year-old children in China (rare in fMRI research for both age range and non-Western samples), use of a movie clip previously demonstrated to capture theory of mind constructs at the neural level, measurement of caregiver-child neural synchrony, and assessment of neural maturity. Results provide important new information about parent-child neural synchronization during this movie and associations with reports of parent-child relationship quality. The work is a notable advance in understanding the link between the caregiving context and the neural construction of theory of mind networks in the developing brain.

      There are several theoretical and methodological limitations of the manuscript in its current form:

      (1) We appreciate that the authors wanted to show support for a mediational mechanism. However, we suggest that the authors drop the structural equation modeling because the data are cross-sectional, so mediation is not appropriate. Other issues include the weak justification for including parent-child neural synchronization as part of parenting; it could just as easily be a mechanism of change or be driven by the child rather than a component of parenting behavior. The paper would be strengthened by looking at associations between selected variables of interest that are MOST relevant to the imaging task in a regression type of model. Furthermore, the authors need to be more explicit about corrections for multiple comparisons throughout the manuscript; some of the associations are fairly weak, so claims may need to be tempered if they don't survive correction.

      (2) Reverse correlation analysis is sensible given what prior developmental fMRI studies have done. But reverse correlation analysis may be more prone to overfitting and noise, and lacks sensitivity to multivariate patterns. Might inter-subject correlation be useful for *within* the child group? This would minimize noise and allow for non-linear patterns to emerge.

      (3) No learning effects or temporal lagged effects are tested in the current study, so the results do not support the authors' conclusions that the data speak to Bandura's social learning theory. The authors do mention theories of biobehavioral synchrony in the introduction but do not discuss this framework in the discussion (which is most directly relevant to the data). The data can also speak to other neurodevelopmental theories of development (e.g., neuroconstructivist approaches), but the authors do not discuss them. The manuscript would benefit from significantly revising the framework to focus more on biobehavioral synchrony data and other neurodevelopmental approaches given the prior work done in this area rather than a social psychology framework that is not directly evaluated.

      (4) The significance and impact of the findings would be clearer if the authors more clearly situated the findings in the context of (a) other movie and theory of mind fMRI task data during development; and (b) existing data on parent-child neural synchrony (often uses fNIRS or EEG). What principles of brain and social cognition development do these data speak to? What is new?

      (5) There is little discussion about the study limitations, considerations about the generalizability of the findings, and important next steps and future directions. What can the data tell us, and what can it NOT tell us?
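
      Points (1) and (2) above can be made concrete with a minimal sketch. It is offered only as an illustration, not as the authors' pipeline: the function names, the input layout (one ROI time course per child), and the choice of the Benjamini-Hochberg procedure as the multiple-comparison correction are assumptions.

```python
import numpy as np


def leave_one_out_isc(timecourses):
    """Leave-one-out inter-subject correlation for one ROI.

    timecourses: array of shape (n_subjects, n_timepoints).
    Returns, for each subject, the Pearson correlation between that
    subject's time course and the mean time course of all other subjects.
    """
    x = np.asarray(timecourses, dtype=float)
    n_subjects = x.shape[0]
    total = x.sum(axis=0)
    isc = np.empty(n_subjects)
    for i in range(n_subjects):
        others_mean = (total - x[i]) / (n_subjects - 1)
        isc[i] = np.corrcoef(x[i], others_mean)[0, 1]
    return isc


def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of tests rejected at false discovery rate `alpha`.

    Step-up rule: find the largest rank k with p_(k) <= alpha * k / m and
    reject the k smallest p-values.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])
        reject[order[: k + 1]] = True
    return reject


# Synthetic example: 8 hypothetical children sharing one stimulus-driven signal.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0.0, 6.0 * np.pi, 200))
data = signal + 0.1 * rng.standard_normal((8, 200))
print(leave_one_out_isc(data))

# Hypothetical p-values from several brain-behavior associations.
print(benjamini_hochberg([0.02, 0.03, 0.035, 0.9]))
```

      Reporting which associations survive a correction of this kind would make it clear how far the weaker effects can be interpreted.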

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the impact of mother-child neural synchronization and the quality of parent-child relationships on the development of Theory of Mind (ToM) and social cognition. Utilizing a naturalistic fMRI movie-viewing paradigm, the authors analyzed inter-subject neural synchronization in mother-child dyads and explored the connections between neural maturity, parental caregiving, and social cognitive outcomes. The findings indicate age-related maturation in ToM and social pain networks, emphasizing the importance of dyadic interactions in shaping ToM performance and social skills, thereby enhancing our understanding of the environmental and intrinsic influences on social cognition.

      Strengths:

      This research addresses a significant question in developmental neuroscience, by linking social brain development with children's behaviors and parenting. It also uses a robust methodology by incorporating neural synchrony measures, naturalistic stimuli, and a substantial sample of mother-child dyads to enhance its ecological validity. Furthermore, the SEM approach provides a nuanced understanding of the developmental pathways associated with Theory of Mind (ToM).

      Weaknesses:

      (1) Upon reviewing the introduction, I feel that the first goal - developmental changes of the social brain and its relation to age - seems somewhat distinct from the other two goals and the main research question of the manuscript. The authors might consider revising this section to enhance the overall coherence of the manuscript. Additionally, the introduction lacks a clear background and rationale for the importance of examining age-related changes in the social brain.

      (2) The manuscript uses both "mother-child" and "parent-child" terminology. Does this imply that only mothers participated in the fMRI scans while fathers completed the questionnaires? If so, have the authors considered the potential impact of parental roles (father vs. mother)?

      (3) There is inconsistent usage of the terms ISC and ISS in the text and figures, both of which appear to refer to synchronization derived from correlation analysis. It would be beneficial to maintain consistency throughout the manuscript.

      (4) Of the 50 dyads, 16 were excluded due to data quality issues, which constitutes a significant proportion. It would be helpful to know whether these excluded dyads exhibited any distinctive characteristics. Providing information on demographic or behavioral differences, such as Theory of Mind (ToM) performance and age range, between the excluded and included dyads would enhance the assessment of the findings' generalizability.

      (5) The article does not adhere to the standard practice of using a resting state as a baseline for subtracting from task synchronization. Is there a rationale for this approach? Not controlling for a baseline may lead to issues, such as whether resting state synchronization already differs between subjects with varying characteristics.

      (6) The title of the manuscript suggests a direct influence of mother-child interactions on children's social brain and theory of mind. However, the use of structural equation modeling (SEM) may not fully establish causal relationships. It is possible that the development of children's social brain and ToM also enhances mother-child neural synchronization. The authors should address this alternative hypothesis of the potential bidirectional relationship in the discussion and exercise caution regarding terms that imply causality in the title and throughout the manuscript.

      (7) I would appreciate more details about the 14 Theory of Mind (ToM) tasks, which could be included in supplemental materials. The authors score them on a scale from 0 to 14 (each task 1 point); however, the tasks likely vary in difficulty and should carry different weights in the total score (for example, the test and the control questions should have different weights). Many studies have utilized the seven tasks according to Wellman and Liu (2004), categorizing them into "basic ToM" and "advanced ToM." Different components of ToM could influence the findings of the current study, which should be further examined by a more in-depth analysis.

    4. Reviewer #3 (Public review):

      Summary:

      The article explores the role of mother-child interactions in the development of children's social cognition, focusing on Theory of Mind (ToM) and Social Pain Matrix (SPM) networks. Using a naturalistic fMRI paradigm involving movie viewing, the study examines relationships among children's neural development, mother-child neural synchronization, and interaction quality. The authors identified a developmental pattern in these networks, showing that they become more functionally distinct with age. Additionally, they found stronger neural synchronization between child-mother pairs compared to child-stranger pairs, with this synchronization and neural maturation of the networks associated with the mother-child relationship and parenting quality.

      Strengths:

      This is a well-written paper, and using dyadic fMRI and naturalistic stimuli enhances its ecological validity, providing valuable insights into the dynamic interplay between brain development and social interactions. However, I have some concerns regarding the analysis and interpretation of the findings. I have outlined these concerns below in the order they appear in the manuscript, which I hope will be helpful for the revision.

      Weaknesses:

      (1) Given the importance of social cognition in this study, please cite a foundational empirical or review paper on social cognition to support its definition. The current first citation is primarily related to ASD research, which may not fully capture the broader context of social cognition development.

      (2) It is standard practice to report the final sample size in the Abstract and Introduction, rather than the initial recruited sample, as high attrition rates are common in pediatric studies. For example, this study recruited 50 mother-child dyads, and only 34 remained after quality control. This information is crucial for interpreting the results and conclusions. I recommend reporting the final sample size in the abstract and introduction but specifying in the Methods that an additional 16 mother-child dyads were initially recruited or that 50 dyads were originally collected.

      (3) In the "Neural maturity reflects the development of the social brain" section, the authors report the across-network correlation for adults, finding a negative correlation between ToM and SPM. However, the cross-network correlations for the three child groups are not reported. The statement that "the two networks were already functionally distinct in the youngest group of children we tested" is based solely on within-network positive correlations, which does not fully demonstrate functional distinctness. Including cross-network correlations for the child groups would strengthen this conclusion.

      (4) The ROIs for the ToM and SPM networks are defined based on previous literature, applying the same ROIs across all age groups. While I understand this is a common approach, it's important to note that this assumption may not fully hold, as network architecture can evolve with age. The functional ROIs or components of a network might shift, with regions potentially joining or exiting a network or changing in size as children develop. For instance, Mark H. Johnson's interactive specialization theory suggests that network composition may adapt over developmental stages. Although the authors follow the approach of Richardson et al. (2018), it would be beneficial to discuss this limitation in the Discussion. An alternative approach would be to apply data-driven analysis to justify the selection of the ROIs for the two networks.

      (5) The current sample size (N = 34 dyads) is a limitation, particularly given the use of SEM, which generally requires larger samples for stable results. Although the model fit appears adequate, this does not guarantee reliability with the current sample size. I suggest discussing this limitation in more detail in the Discussion.

      (6) Based on the above comment, I believe that conclusions regarding the relationship between social network development, parenting, and support for Bandura's theory should be tempered. The current conclusions may be too strong given the study's limitations.

      (7) The SPM (pain) network is associated with empathic abilities, also an important aspect of social skills. It would be relevant to explore whether (or explain why) SPM development and child-mother synchronization are (or are not) related to parenting and the parent-child relationship.

    1. eLife Assessment

      NeuroSCAN is an accessible and interactive tool for streamlined observation of neuronal morphology, membrane contact, and synaptic connectivity across developmental stages in the nematode C. elegans. This important tool relies on solid electron microscopy datasets. This resource will be of high interest to C. elegans researchers interested in nervous system wiring and circuit function.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present NeuroSCAN, an accessible and interactive tool for visualizing and summarizing data from multiple previously annotated C. elegans connectomes. NeuroSCAN provides a useful entry point for streamlined observation of neuronal morphology, and the membrane contacts and synaptic connectivity between neurons across developmental stages and individual connectomes readily extracted from existing data.

      Strengths:

      Koonce et al. have generated a web-based visualization tool for exploring C. elegans neuronal morphology, contact area between neurons, and synaptic connectivity data. Here, the authors integrate volumetric segmentation of neurons and visualization of contact area patterns of individual neurons generated from Diffusion Condensation and C-PHATE embedding based on previous work from adult volumetric electron microscopy (vEM) data, extended to available vEM data for earlier developmental stages, which effectively summarizes modularity within the collated C. elegans contactomes to date. Overall, NeuroSCAN's relative ease of use for generating visualizations, its ability to quickly toggle between developmental stages, and its integration of a concise visualization of individual neurons' contact patterns strengthen its utility.

      Weaknesses:

      NeuroSCAN provides an accessible and convenient platform. However, many of the characteristics of NeuroSCAN overlap with those of an existing tool for visualizing connectomics data, Neuroglancer, which is a widely-used and shared platform with data from other organisms. The authors do not make clear their motivation for generating this new tool rather than building on a system that has already collated previous connectomics data. Although the field will benefit from any tool that collates connectomics data and makes it more accessible and user-friendly, such a tool is only useful if it is kept up-to-date, and if data formatting for submitting electron microscopy data to be added to the tool is made clear. It is unclear from this manuscript whether NeuroSCAN will be updated with recently published and future C. elegans connectomes, or how additional datasets can be submitted to be added in the future.

      The interface for visualizing contacts and synapses would be improved with better user access to the quantitative underlying data. When contact areas or synapses are added to the viewer, adding statistics on the magnitude of the contact area, the number of synapses, and the rank of these values among the neuron's top connections, would make the viewer more useful for hypothesis generation. Furthermore, synapses are currently listed individually, with names that are not very legible to the web user. Grouping them by pre- and postsynaptic neurons and linking these groups across developmental stages would also be an improvement.

      While the DC/C-PHATE visualizations are a useful tool for the user, it is difficult to understand when grouping or splitting of cell contact patterns is biologically significant. DC is a deterministic algorithm applied to a contactome from a single organism, and the authors do not provide quantitative metrics of distances between individual neurons or a number of DC iterations on the C-PHATE plot, nor is the selection process for the threshold for DC described in this manuscript. In the application of DC/C-PHATE to larval stage nerve ring strata organization shown by the authors, qualitative observations of C-PHATE plots colored based on adult data seem to be the only evidence shown for persistent strata during development (Figure 3) or changing architectural motifs across stages (Figure 4). Quantitation of differences in neuron position within the DC hierarchy, or differences in modularity across stages, is needed to support these conclusions. Furthermore, illustrating the quantitative differences in C-PHATE plots used to make these conclusions will provide a more instructive guide for users of NeuroSCAN in generating future hypotheses.

      While the case studies presented by the authors help to highlight the utility of the different visualizations offered by the NeuroSCAN platform, the authors need to be more careful with the claims they make from these correlative observations. For example, in Figure 4, the authors use C-PHATE clustering patterns to make conclusions about changes in clustering patterns of individual neurons across development based on single animal datasets. In this and many other cases presented in this study with the limited existing datasets, it is difficult to differentiate between developmental changes and individual variability between the neurite positions, contacts, and synapse differences within these data. This caveat needs to be clearly addressed.

    3. Reviewer #2 (Public review):

      Summary:

      The past five years have seen the publication of both new (Witvliet et al., 2021) and newly analyzed (Cook et al., 2019; Moyle et al., 2021; Brittin et al., 2021) data for the C. elegans connectome. The increase in data availability for a single species allows researchers to examine variability due to both stochastic events and changes over development. The quantity of these data is huge. To help the community make these data more accessible, the authors present a new online tool that allows the examination of 3D models for C. elegans neurons in the central neuropil across development. In addition to visualizing the overall structure of the neuronal processes and locations of synapses, the NeuroSCAN tool also allows users to probe into the C-PHATE visualization results, which this group previously pioneered to describe similarities in neuron adjacency (Moyle et al., 2021).

      Strengths:

      The ability to visualize the data from both a connectomics and contactomics perspective across developmental time has significant power. The original C. elegans connectome (White et al., 1986) presented their circuits as line drawings with chemical and electrical synapses indicated through arrows and bars. While these line drawings remain incredibly useful, they were also necessary simplifications for a 2D publication and they lack details of the complex architecture seen within each EM image. Koonce et al. take advantage of segmented image data of each neuronal process within the nerve ring to create a web interface where users can visualize 3D models for their neuron of choice. The C-PHATE visualization allows users to explore similarities among different neurons in terms of adjacency and then go directly to the 3D model for these neurons. The 3D models it generates are beautiful and will likely be showing up in many future presentations and publications. The tool doesn't require any additional downloading and is open source.

      Weaknesses:

      While it's impossible to create one tool that will satisfy all potential users, I found myself wanting to have numbers associated with the data. For example, knowing the number of connections or the total surface area of contacts between individual neurons wasn't possible through the viewer, which limits the utility of taking deep analytical dives. While connectivity data are readily accessible through other interfaces such as Nemanode and WormWiring, a more thorough integration may be helpful to some users.

      There were several issues with the user interface that made it a bit clunky to use. For example, as I added additional neurons to the filter search box, the loading time got longer and longer. I ran an experiment uploading all of the amphid neurons, one pair at a time. Each additional neuron pair added an additional 5-10 seconds to the loading. By the time I got to the last pair, it took over a minute to load. Issues like these, some of which may be unavoidable given the size of the data, could be conveyed through better documentation. I did not find the tutorial very helpful and the supplementary movies lacked any voiceover, so it wasn't always clear what they were trying to show.

    4. Reviewer #3 (Public review):

      Summary:

      This work provides graphical tools for reconstructing the detailed anatomy of a nervous system from a series of sections imaged by electron microscopy. Contact between neuronal processes can direct outgrowth and is necessary for connectivity and, thus, function. A bioinformatic approach is used to group neurons according to shared features (e.g., contact, synapses) in a hierarchy of "relatedness" that can be interrogated at each step. In this work, Koonce et al analyze vEM data sets for the C. elegans nerve ring (NR), a dense fascicle of processes from 181 neurons. In a bioinformatic approach, the clustering algorithm Diffusion Condensation (DC) groups neurons according to similar cell biological features in iterations that remove chunks of differences in feature data with each step, ultimately merging all NR neurons in one cluster. DC results are displayed with C-PHATE, a 3D visualization tool, to produce a trajectory that can be interrogated for cell identities and other features at each iterative step. In previous work by these authors, this approach was utilized to identify subgroups of neuronal processes or "strata" in the NR that can be grouped by physical contact and connectivity. Here they expand their analysis to include a series of available vEM data sets across C. elegans larval development. This approach suggests that strata initially established during embryonic development are largely preserved in the adult. Importantly, exceptions involving stage-specific reorganization of neuronal placement in specific strata were also detected. A case study featured in the paper demonstrates the utility of this approach for visualizing the integration of newly generated neurons into the existing NR anatomy. Visualization tools used in this work are publicly available at NeuroSCAN.

      Strengths:

      A web-based app, NeuroSCAN, that individual researchers can use to interrogate the structure and organization of the C. elegans nerve ring across development

      Weaknesses:

      In the opinion of this reviewer, only minor revisions are required.

    1. eLife Assessment

      This important study is of significant relevance to the fields of predictive processing, perception, and learning. The well-designed paradigm allows the authors to avoid several common confounds in investigating predictions, such as stimulus familiarity and adaptation. Using a state-of-the-art multivariate EEG approach, the authors test the opposing process theory and find evidence in support of it, but some elements - especially the interactions across blocks - have only incomplete support at present. This could be strengthened via further analyses and justification.

    2. Reviewer #1 (Public review):

      Summary:

      In this lovely paper, McDermott and colleagues tackle an enduring puzzle in the cognitive neuroscience of perceptual prediction. Though many scientists agree that top-down predictions shape perception, previous studies have yielded incompatible results - with studies showing 'sharpened' representations of expected signals, and others showing a 'dampening' of predictable signals to relatively enhance surprising prediction errors. To deepen the paradox further, it seems like there are good reasons that we would want to see both influences on perception in different contexts.

      Here, the authors aim to test one possible resolution to this 'paradox' - the opposing process theory (OPT). This theory makes distinct predictions about how the time course of 'sharpening' and 'dampening' effects should unfold. The researchers present a clever twist on a leading-trailing perceptual prediction paradigm, using AI to generate a large dataset of test and training stimuli so that it is possible to form expectations about certain categories without repeating any particular stimuli. This provides a powerful way of distinguishing expectation effects from repetition effects - a perennial problem in this line of work.

      Using EEG decoding, the researchers find evidence to support the OPT. Namely, they find that neural encoding of expected events is superior in earlier time ranges (sharpening-like) followed by a relative advantage for unexpected events in later time ranges (dampening-like). On top of this, the authors also show that these two separate influences may emerge differently in different phases of learning - with superior decoding of surprising prediction errors being found more in early phases of the task, and enhanced decoding of predicted events being found in the later phases of the experiment.

      Strengths:

      As noted above, a major strength of this work lies in important experimental design choices. Alongside removing any possible influence of repetition suppression mechanisms in this task, the experiment also allows us to see how effects emerge in 'real-time' as agents learn to make predictions. This contrasts with many other studies in this area - where researchers 'over-train' expectations into observers to create the strongest possible effects or rely on prior knowledge that was likely to be crystallised outside the lab.

      Weaknesses:

      This study reveals a great deal about how certain neural representations are altered by expectation and learning on shorter and longer timescales, so I am loath to describe certain limitations as 'weaknesses'. But one limitation inherent in this experimental design is that, by focusing on implicit, task-irrelevant predictions, there is not much opportunity to connect the predictive influences seen at the neural level to the perceptual performance itself (e.g., how participants make perceptual decisions about expected or unexpected events, or how these events are detected or appear).

      The behavioural data that is displayed (from a post-recording behavioural session) shows that these predictions do influence perceptual choice - leading to faster reaction times when expectations are valid. In broad strokes, we may think that such a result is broadly consistent with a 'sharpening' view of perceptual prediction, and the fact that sharpening effects are found in the study to be larger at the end of the task than at the beginning. But it strikes me that the strongest test of the relevance of these (very interesting) EEG findings would be some evidence that the neural effects relate to behavioural influences (e.g., are participants actually more behaviourally sensitive to invalid signals in earlier phases of the experiment, given that this is where the neural effects show the most 'dampening' a.k.a., prediction error advantage?)

    3. Reviewer #2 (Public review):

      Summary:

      There are two accounts in the literature that propose that expectations suppress the activity of neurons that are (a) not tuned to the expected stimulus to increase the signal-to-noise ratio for expected stimuli (sharpening model) or (b) tuned to the expected stimulus to highlight novel information (dampening model). One recent account, the opposing process theory, brings the two models together and suggests that both processes occur, but at different time points: initial sharpening is followed by later dampening of the neural activity of the expected stimulus. In this study, the authors aim to test the opposing process theory in a statistical learning task by applying multivariate EEG analyses and finding evidence for the opposing process theory based on the within-trial dynamics.

      Strengths:

      This study addresses a very timely research question about the underlying mechanisms of expectation suppression. The applied EEG decoding approach offers an elegant way to investigate the temporal characteristics of expectation effects. A major strength of the study lies in the experimental design that aims to control for repetition effects, one of the common confounds in prediction suppression studies. The reported results are novel in the field and have the potential to substantially improve our understanding of expectation suppression in visual perception.

      Weaknesses:

      The strength in controlling for repetition effects by introducing a neutral (50% expectation) condition also adds a weakness to the current version of the manuscript, as this neutral condition is not integrated into the behavioral (reaction times) and EEG (ERP and decoding) analyses. This procedure remained unclear to me. The reported results would be strengthened by showing differences between the neutral and expected (valid) conditions on the behavioral and neural levels. This would also provide a more rigorous check that participants had implicitly learned the associations between the picture category pairings.

      It is not entirely clear to me what is actually decoded in the prediction condition and why the authors did not perform decoding over trial bins in prediction decoding as potential differences across time could be hidden by averaging the data. The manuscript would generally benefit from a more detailed description of the analysis rationale and methods.

      Finally, the scope of this study should be limited to expectation suppression in visual perception, as the generalization of these results to other sensory modalities or to the action domain remains open for future research.

    4. Reviewer #3 (Public review):

      Summary:

      In their study, McDermott et al. investigate the neurocomputational mechanism underlying sensory prediction errors. They contrast two accounts: representational sharpening and dampening. Representational sharpening suggests that predictions increase the fidelity of the neural representations of expected inputs, while representational dampening suggests the opposite (decreased fidelity for expected stimuli). The authors performed decoding analyses on EEG data, showing that first expected stimuli could be better decoded (sharpening), followed by a reversal during later response windows where unexpected inputs could be better decoded (dampening). These results are interpreted in the context of opposing process theory (OPT), which suggests that such a reversal would support perception to be both veridical (i.e., initial sharpening to increase the accuracy of perception) and informative (i.e., later dampening to highlight surprising, but informative inputs).

      Strengths:

      The topic of the present study is of significant relevance to the field of predictive processing. The experimental paradigm used by McDermott et al. is well designed, allowing the authors to avoid several common confounds in investigating predictions, such as stimulus familiarity and adaptation. The introduction of the manuscript provides a well-written summary of the main arguments for the two accounts of interest (sharpening and dampening), as well as OPT. Overall, the manuscript serves as a good overview of the current state of the field.

      Weaknesses:

      In my opinion, several details of the methods, results, and manuscript raise doubts about the quality and reliability of the reported findings. Key concerns are:

      (1) The results in Figure 2C seem to show that the leading image itself can only be decoded with ~33% accuracy (25% chance; i.e. ~8% above chance decoding). In contrast, Figure 2E suggests the prediction (surprisingly, valid or invalid) during the leading image presentation can be decoded with ~62% accuracy (50% chance; i.e. ~12% above chance decoding). Unless I am misinterpreting the analyses, it seems implausible to me that a prediction, but not actually shown image, can be better decoded using EEG than an image that is presented on-screen.

      (2) The "prediction decoding" analysis is described by the authors as "decoding the predictable trailing images based on the leading images". How this was done is however unclear to me. For each leading image decoding the predictable trailing images should be equivalent to decoding validity (as there were only 2 possible trailing image categories: 1 valid, 1 invalid). How is it then possible that the analysis is performed separately for valid and invalid trials? If the authors simply decode which leading image category was shown, but combine L1+L2 and L4+L5 into one class respectively, the resulting decoder would in my opinion not decode prediction, but instead dissociate the representation of L1+L2 from L4+L5, which may also explain why the time-course of the prediction peaks during the leading image stimulus-response, which is rather different compared to previous studies decoding predictions (e.g. Kok et al. 2017). Instead for the prediction analysis to be informative about the prediction, the decoder ought to decode the representation of the trailing image during the leading image and inter-stimulus interval. Therefore I am at present not convinced that the utilized analysis approach is informative about predictions.

      (3) I may be misunderstanding the reported statistics or analyses, but it seems unlikely that >10 of the reported contrasts have the exact same statistic of Tmax= 2.76. Similarly, it seems implausible, based on visual inspection of Figure 2, that the Tmax for the invalid condition decoding (reported as Tmax = 14.903) is substantially larger than for the valid condition decoding (reported as Tmax = 2.76), even though the valid condition appears to have superior peak decoding performance. Combined these details may raise concerns about the reliability of the reported statistics.

      (4) The reported analyses and results do not seem to support the conclusion of early learning resulting in dampening and later stages in sharpening. Specifically, the authors appear to base this conclusion on the absence of a decoding effect in some time-bins, while in my opinion a contrast between time-bins, showing a difference in decoding accuracy, is required. Or better yet, a non-zero slope of decoding accuracy over time should be shown (not contingent on post-hoc and seemingly arbitrary binning).

      (5) The present results both within and across trials are difficult to reconcile with previous studies using MEG (Kok et al., 2017; Han et al., 2019), single-unit and multi-unit recordings (Kumar et al., 2017; Meyer & Olson 2011), as well as fMRI (Richter et al., 2018), which investigated similar questions but yielded different results; i.e., no reversal within or across trials, as well as dampening effects after more training. The authors do not provide a convincing explanation as to why their results should differ from previous studies, arguably further compounding doubts about the present results raised by the methods and results concerns noted above.

      Impact:

      At present, I find the potential impact of the study by McDermott et al. difficult to assess, given the concerns mentioned above. Should the authors convincingly answer these concerns, the study could provide meaningful insights into the mechanisms underlying perceptual prediction. However, at present, I am not entirely convinced by the quality and reliability of the results and manuscript. Moreover, the difficulty in reconciling some of the present results with previous studies highlights the need for more convincing explanations of these discrepancies and a stronger discussion of the present results in the context of the literature.

    1. eLife Assessment

      The findings are considered valuable and have theoretical implications for the interdisciplinary field of value-based social decision-making. Support for the main claims is incomplete and these should be supported by further analyses.

    2. Reviewer #1 (Public review):

      Summary:

      Using an effort-exertion task and an effort-based decision-making task while recording brain dynamics with EEG, the authors test the hypothesis that the brain processes reward outcomes for effort differently when they are earned for oneself versus for others.

      Strengths:

      The strengths of this experiment include what appears to be a novel finding of opposite signed effects of effort on the processing of reward outcomes when the recipient is self versus others. Also, the experiment is well-designed, the study seems sufficiently powered, and the data and code are publicly available.

      Weaknesses:

      Inferences rely heavily on the results of mixed effects models which may or may not be properly specified and are not supported by complementary analyses. Also, not all results hang together in a sensible way. For example, participants report feeling less subjective effort, but also more disliking of tasks when they were earning rewards for others versus self. Given that participants took longer to complete tasks when earning rewards for others, it is conceivable that participants might have been working less hard for others versus themselves, and this may complicate the interpretation of results.

    3. Reviewer #2 (Public review):

      Summary:

      Measurements of the reward positivity, an electrophysiological component elicited during reward evaluation, have previously been used to understand how self-benefitting effort expenditure influences the processing of rewards. The present study is the first to complement those measurements with electrophysiological reward after-effects of effort expenditure during prosocial acts. The results provide solid evidence that effort adds reward value when the recipient of the reward is the self but discounts reward value when the beneficiary is another individual.

      Strengths:

      An important strength of the study is that the amount of effort, the prospective reward, the recipient of the reward, and whether the reward was actually gained or not were parametrically and orthogonally varied. In addition, the researchers examined whether the pattern of results generalized to decisions about future efforts. The sample size (N=40) and mixed-effects regression models are also appropriate for addressing the key research questions. Those conclusions are plausible and adequately supported by statistical analyses.

      Weaknesses:

      Although the obtained results are highly plausible, I am concerned whether the reward positivity (RewP) and P3 were adequately measured. The RewP and P3 were defined as the average voltage values in the time intervals 300-400 ms and 300-440 ms after feedback onset, respectively. So they largely overlapped in time. Although the RewP measure was based on frontocentral electrodes (FC3, FCz, and FC4) and the P3 on posterior electrodes (P3, Pz, and P4), the scalp topographies in Figure 3 show that the RewP effects were larger at the posterior electrodes used for the P3 than at frontocentral electrodes. So there is a concern that the RewP and P3 were not independently measured. This type of problem can often be resolved using a spatiotemporal principal component analysis. My faith in the conclusions drawn would be further strengthened if the researchers extracted separate principal components for the RewP and P3 and performed their statistical analyses on the corresponding factor scores.

    4. Reviewer #3 (Public review):

      This study investigates how effort influences reward evaluation during prosocial behaviour using EEG and experimental tasks manipulating effort and rewards for self and others. Results reveal a dissociable effect: for self-benefitting effort, rewards are evaluated more positively as effort increases, while for other-benefitting effort, rewards are evaluated less positively with higher effort. This dissociation, driven by reward system activation and independent of performance, provides new insights into the neural mechanisms of effort and reward in prosocial contexts.

      This work makes a valuable contribution to the prosocial behaviour literature by addressing areas that previous research has largely overlooked. It highlights the paradoxical effect of effort on reward evaluation and opens new avenues for investigating the mechanisms underlying this phenomenon. The study employs well-established tasks with robust replication in the literature and innovatively incorporates ERPs to examine effort-based prosocial decision-making - an area insufficiently explored in prior work. Moreover, the analyses are rigorous and grounded in established methodologies, further enhancing the study's credibility. These elements collectively underscore the study's significance in advancing our understanding of effort-based decision-making.

      Despite these contributions, there are several gaps in the analysis that leave the conclusions incomplete and warrant further investigation. These issues can be summarized as follows:

      (1) Incomplete EEG Reporting: The methods indicate that EEG activity was recorded for both tasks; however, the manuscript reports EEG results only for the first task, omitting the decision-making task. If the authors claim a paradoxical effect of effort on self versus other rewards, as revealed by the RewP component, this should also be confirmed with results from the decision-making task. Omitting these findings weakens the overall argument.

      (2) Neural and Behavioural Integration: The neural results should be contrasted with behavioural data both within and between tasks. Specifically, the manuscript could examine whether neural responses predict performance within each task and whether neural and behavioural signals correlate across tasks. This integration would provide a more comprehensive understanding of the mechanisms at play.

      (3) Success Rate and Model Structure: The manuscript does not clearly report the success rate in the prosocial effort task. If success rates are low, risk aversion could confound the results. Additionally, it is unclear whether the models accounted for successful versus unsuccessful trials or whether success was included as a covariate. If this information is present, it needs to be explicitly clarified. The exclusion criteria for unsuccessful trials in both tasks should also be detailed. Moreover, the decision to exclude electrodes as independent variables in the models warrants an explanation.

      (4) Prosocial Decision Computational Modelling: The prosocial decision task largely replicates prior behavioural findings but misses the opportunity to directly test the hypotheses derived from neural data in the prosocial effort task. If the authors propose a paradoxical effect of effort on self-rewards and an inverse effect for prosocial effort, this could be formalised in a computational model. A model comparison could evaluate the proposed mechanism against alternative theories, incorporating the complex interplay of effort and reward for self and others. Furthermore, these parameters should be correlated with neural signals, adding a critical layer of evidence to the claims. As it is, the inclusion of the prosocial decision task seems irrelevant.

      (5) Contradiction Between Effort Perception and Neural Results: Participants reported effort as less effortful in the prosocial condition compared to the self condition, which seems contradictory to the neural findings and the authors' interpretation. If effort has a discounting effect on rewards for others, one might expect it to feel more effortful. How do the authors reconcile these results? Additionally, the relationship between behavioural data and neural responses should be examined to clarify these inconsistencies.

      Necessary Revisions to Manuscript: If the authors address the issues above, corresponding updates to the introduction and discussion sections could strengthen the narrative and align the manuscript with the additional analyses.

    1. Author response:

      (1) General Statements

      We thank all three reviewers for their constructive comments and suggestions. We also thank reviewers #2 and #3 for considering our work to be timely and of interest to the field, not only for basic researchers, but also for translational scientists and industry. We are now providing additional results to further support our hypothesis and hope that all reviewers will find that our manuscript is now ready for publication. 

      (2) Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): 

      The manuscript by Coquel et al. investigates the effects of BKC and IBC, two compounds found in Psoralea corylifolia, on DNA replication and the response to DNA damage, and explores their potential use in cancer treatment. These compounds have been previously shown to affect different cellular pathways and the authors use transformed cancer cells of different origins and a non-transformed cell line to question if their combination is toxic in cancer versus non-cancer cells. They propose that BKC inhibits DNA polymerases while IBC targets CHK2. Their results show that both compounds do affect DNA replication, inducing replication stress and affecting double strand break repair. They also show that their combined use increases their toxicity in a synergistic manner.

      However, there are some major conclusions that are still not very well supported by the data: first, the differential effect on cancer and non-transformed cells; second, the direct link of BKC to the inhibition of DNA polymerases; and third, it is unclear if CHK2 is the relevant target for IBC in this context. 

      Regarding these points the authors should address the following issues: 

      (1) Most of the experiments use BJ fibroblasts as a control cell line. In order to evaluate if these compounds are preferentially toxic for cancer cells, the use of more than one non-transformed cell line is necessary. In addition, BJ cells are fibroblasts while most of the cancer cell lines employed are of epithelial origin. The authors could use MCF10 and RPE cells (both of epithelial origin) as control cell lines to complement the results and better support this claim. 

      We have now monitored the effect of IBC and BKC on the proliferation of MCF-7, MCF-10A and RPE-1 cells using the WST-1 assay and obtained similar results as for BJ and MCF-7 cells. These results are now included in the revised manuscript as Fig. S1A and S1B.

      (2) To identify the targets of BKC and IBC, Cellular Thermal Shift Assays (CETSA) could be used, either by doing an unbiased mass spectrometry analysis of proteins stabilized by these compounds or by a direct analysis of candidate proteins by western blot (a similar approach has been used for IBC to show that it inhibits SIRT2 in Ren et al., 2024 Phytotherapy Res).

      We thank this Reviewer for suggesting the use of the CETSA assay. We have now performed CETSA on MCF-7 cells and found that IBC stabilizes CHK2 but not CHK1, to the same extent as the commercial CHK2 inhibitor BML-277 used here as a positive control. These results are now shown in new Fig. 4G and 4H.

      (3) For BKC in vitro polymerase assays could be carried out to show the direct inhibition of the DNA polymerase delta, for instance. 

      We have used high-speed Xenopus egg extracts to replicate ssDNA in vitro (Fig. S2C). This assay differs from the in vitro replication assay using low-speed Xenopus egg extracts (Fig. 2H) in that it only monitors elongation by replicative DNA polymerases (Pol δ and ε) and not earlier steps such as origin licensing and activation. The combined use of both low-speed and high-speed extracts strongly supports the view that BKC inhibits replicative DNA polymerases.

      To confirm this result, we have also used CETSA to monitor BKC binding to different subunits of DNA Pol δ and Pol ε in MCF-7 cells and in Xenopus egg extracts (Fig. 3C-D, Fig. S3). We found that BKC binds POLD1 and POLE, the catalytic subunits of Pol δ and ε respectively, but not the accessory subunit POLD3 nor PCNA. Together with our docking results and DNA fiber experiments, these data strongly support the view that BKC is a potent inhibitor of DNA Pol δ and Pol ε.

      (4) In addition, the authors could analyze the integrity of replication forks by PCNA immunofluorescence analysis. The colocalization of PCNA and POLD or POLE subunits could also support the role of DNA polymerases as targets of BKC. 

      Our molecular docking results also show that BKC occupies the catalytic sites of DNA Pol δ and ε, which may not affect their subcellular localization and/or PCNA binding. Since our DNA replication assays, CETSA and DNA fiber analyses strongly support the view that BKC inhibits replicative DNA polymerases, we have not performed this additional experiment.

      (5) In the case of IBC and the inhibition of CHK2, the authors should check the effect of IBC on the phosphorylation of BRCA1 on S988. The changes in CHK2 phosphorylation in Figure 3B are not convincing. The experiment should be repeated and the average of at least three experiments needs to be quantified. 

      We now provide evidence that IBC inhibits BRCA1 phosphorylation on S988. Western blots and quantification for three biological replicates are shown in Fig. 4C and Fig. S4H. Densitometric quantification of CHK2 phosphorylation on S516 from 3 biological replicates, along with statistical analysis, is now shown in Fig. S4G.

      (6) To prove that CHK2 is the relevant target for IBC the authors could test if ATM and CHK2 knockout cells are more resistant to this compound, since it would prevent the phosphorylation of CHK2. 

      We have performed siRNA transfection targeting CHK2. The transfected cells died after 72 hours in culture, so we have been unable to determine whether CHK2-KD cells have increased resistance to IBC.  

      In addition to these experiments, I would suggest some other major improvements in the manuscript: 

      (1) The concentration of both compounds should be provided in molar units throughout the paper.

      Thanks for pointing this out, we now use molar units throughout the paper.

      (2) The authors do not clearly indicate the concentration that is employed in the different experiments, making it difficult to assess the results. For instance, Figure 2 does not include the concentration in the legend or in the text. Time and concentration need to be clearly shown for each experiment. 

      The experimental conditions and inhibitor concentrations are now clearly indicated for each experiment.

      (3) Some experiments are only repeated once (fiber assays) or twice (cell cycle analysis by flow cytometry). These experiments need to be repeated 3 times and the proper statistical analysis performed (comparison of the medians). 

      Superplots with biological replicates for all DNA fiber assays are now displayed. The number of biological replicates is now indicated in the legends and appropriate statistical analyses are used.

      Other minor points or suggestions: 

      (1) Analyzing fork asymmetry would further support the direct effect of BKC on DNA polymerases. 

      The effect of BKC on fork asymmetry is now shown in Fig. 2F. 

      (2) A dose dependent analysis of BKC on the speed of DNA replication would also support this point. 

      Superplots of DNA fiber assays showing the effect of different concentrations of BKC on fork speed from three biological replicates are now included in Fig. 2E.

      (3) Page 7: BKC reduces fork speed ...two-fold. This sentence is not very clear, it would be better to say that speed is half of the control. 

      This sentence was changed to “BKC reduced fork speed by a factor of two relative to untreated cells”.

      (4) Figure 4G and S4D show contradictory results regarding the induction of Rad51 foci by IBC treatment. This needs to be clarified. 

      Figure 4G and S4D (now Fig. 5G and S5D) do not show contradictory results. In both cases, IBC treatment impaired the induction of RAD51 foci by IR or bleomycin.  

      (5) Page 12, Figure S5C is called for but it does not exist (probably meaning Figure S5B). 

      We apologize for this error, which has now been corrected.  

      Reviewer #1 (Significance): 

      The work by Coquel et al. aims at elucidating the use of BKC and IBC as a combined therapy to induce cell death in cancer cells by targeting DNA replication and CHK2. Both BKC and IBC have been previously shown to affect the proliferation of cancer cells. BKC has been shown to induce S phase arrest in an ATR dependent manner in MCF7 cells (Li et al., 2016 Front Pharm), while IBC induces cell death in MDA-MB-231 cells (Wu et al., 2022 Molecules). In this regard, the more interesting contribution of the manuscript is the potential identification of the targets of these compounds in cancer cells. The inhibition of CHK2 by IBC is quite compelling although it needs to be further proven. In contrast, the hypothesis that BKC inhibits DNA polymerases remains highly speculative. The results offer a limited advance in the knowledge of the mechanism of action of these two compounds. Focusing on the action of IBC on CHK2 would increase the impact of the results. In this sense a very recent report has been published showing that IBC inhibits SIRT2 (Ren et al., 2024 Phyto Res), showing that IBC can affect multiple enzymes and processes. This should be taken into account for a further analysis of its mechanism of action. 

      In addition to the identification of the targets of BKC and IBC, the authors also focus on their combination for cancer treatment. This is based on the idea that blocking the DSB repair and inducing replication stress at the same time is an efficient approach to induce cancer cell death. This is not a new concept, since the loss of ATM sensitizes cancer cells to the inhibition of the replication stress response and several combination therapies have been put forward with the idea of generating replication stress and preventing the subsequent repair of the double strand breaks induced in these cells. Thus, the novelty here is limited, especially considering that the effect of BKC on DNA replication has already been described. Further, since its mechanism of action is unclear, it is difficult to ascribe the observed synergy to the speculated hypothesis. A deeper analysis of IBC as a CHK2 inhibitor would be more interesting, as would its potential combination with other chemotherapy agents such as replication stress inhibitors, HU or DNA damaging agents. The lack of a good non-transformed cell control also reduces the relevance of the work.

      In its current state, the interest of the manuscript is limited. The mechanistic advance is not strong enough and is not completely supported by the data, and the use of these compounds as a combination therapy does not provide new insights in cancer treatment. In my opinion, focusing on the inhibition of CHK2 by IBC and its potential use would broaden the impact of the results beyond the mere analysis of the action of these compounds.

      We thank this reviewer for his/her constructive and insightful comments. We have followed his/her advice and focused our analysis on the action of IBC on CHK2. Using CETSA, we confirmed that IBC binds CHK2 to the same extent as BML-277 inhibitor, but does not bind CHK1. We also show that IBC inhibits BRCA1 phosphorylation on S988 and CHK2 phosphorylation on S516. Together with the results presented in the initial version of the manuscript, these data support the view that CHK2 is a key IBC target. We have also applied CETSA to DNA polymerases and confirmed that BKC directly targets DNA Polδ and ε. Although it is unlikely that IBC and BKC will ever be used in combination therapies, the synergistic effect that we measured on cancer cells in vivo and in vitro indicates that IBC sensitizes cancer cells to endogenous replication stress and to exogenous sources of DNA damage, which could be used to replace BKC in combination therapies. For instance, our data indicate that IBC can be used in combination with drugs such as etoposide, doxorubicin or cyclophosphamide to potentiate their effect on drug-resistant lymphoma cell lines (DLBCL). As requested by this Reviewer, we have modified the discussion section to put more emphasis on IBC and CHK2 inhibitors and we hope that he/she will now find this revised version suitable for publication.

      Reviewer #2 (Evidence, reproducibility and clarity): 

      In the manuscript by Coquel et al., the authors report their findings on the effect of 2 natural compounds from Psoralea corylifolia plant extracts on cancer cells. They show that these compounds, bakuchiol (BKC) and isobavachalcone (IBC), inhibit proliferation of cancer cells and tumor development in xenografted mice, particularly when used in combination. They further show that BKC inhibited DNA polymerases and induced replication stress, and show evidence that IBC inhibits Chk2 kinase activity and downstream double-strand break repair. Based on their findings, the authors conclude that Chk2 inhibition and DNA replication inhibition represent a potential synergistic strategy to selectively target cancer cells.

      Major: 

      (1) The data showing IBC is a Chk2 inhibitor is weak and more rigorous investigation is needed to establish this compound as a Chk2 inhibitor. 

      As indicated in our response to Reviewer #1, we have now analyzed the binding of IBC to CHK2 using the Cellular Thermal Shift Assay (CETSA) in MCF-7 cells. Our data clearly show that IBC binds to CHK2 but not CHK1. These results are now shown in Fig. 4G and 4H.

      For one, the authors mention they screened 43 cell cycle-related kinases in vitro, but only show data for 8 kinases in their kinase activity screens. Of these 8 kinases, Chk2 is the most strongly inhibited, but there are no data shown for the other 35 kinases. 

      Data for all the protein kinases tested in the in vitro assay are now presented in Fig. S4D and S4E.  

      Additionally, the purpose of the CHK2 mutants should be discussed in the text. 

      The CHK2(I157T) mutation is linked to an increased risk of breast and colorectal cancers. CHK2(R145W) is associated with Li-Fraumeni Syndrome. Neither mutation affects the basal kinase activity of CHK2. This information is now indicated in the legend of Fig. S4D.

      Secondly, the western blot in Fig 3B, appears to show a very modest effect of IBC on Chk2 autophosphorylation and not that different from the effect of IBC on Akt phosphorylation in Fig S3a. Yet, the authors claim that IBC inhibits Chk2 but not Akt. To strengthen these blots, a known Chk2 inhibitor, such as the one shown in Fig 4 (BML-277) should be included as a positive control for pChk2 similarly to what was shown for Akt with MK-2206. 

      We have now replaced the western blot in Fig. 3B (now Fig. 4B) with another biological replicate. Quantifications and statistical analyses of biological replicates are shown in Fig. S4G. Overall, we observed a 50% reduction of CHK2 auto-phosphorylation in MCF7 cells treated with IBC, and a 20% reduction in AKT phosphorylation (Fig. S4A). There was no additional reduction in AKT phosphorylation when cells were treated with IBC in combination with MK-2206, compared to cells treated with MK-2206 alone. We now include the CHK2 inhibitor BML-277 as a positive control alongside IBC to monitor CHK2 and CHK1 auto-phosphorylation in Fig. 4B, S4G, 4D and S4I, respectively.

      Western blots showing a loss of phosphorylation of additional Chk2 targets is also needed. The manuscript mentions Brca1 S988 as a Chk2 substrate important for DSB repair. Showing the effect of IBC on this phosphorylation site would strengthen the conclusions. 

      We now provide evidence that IBC inhibits BRCA1 phosphorylation at S988. Western blots and quantification for three biological replicates are shown in Fig. 4C and S4H. 

      (2) The authors claim that the combination of IBC and BKC inhibit cell growth in a synergistic manner and that the "effect is more pronounce on cancer cells than on non-cancer cells." However, only 1 non-malignant cell line was used, and it was a fibroblast line. To make this claim, the authors need to show the effect in additional non-malignant cells, preferably with epithelial cell types. 

      We have now monitored cell proliferation using the WST-1 assay in two additional non-malignant cell lines, namely MCF-10A and RPE-1 cells. Cells were treated with IBC/BKC and their growth was compared to that of MCF-7 cells. These experiments yielded similar results to those obtained with BJ fibroblasts. These new data are now included in the revised version as Fig. S1A and S1B. 

      Minor: 

      (1) Densitometry data for all western blots should be shown with mean+/- stdev of independent western blots. 

      Densitometry data for all western blots with biological replicates are now shown in supplementary figures.

      (2) In Figure 1B the statistical test used to analyze cell number was not stated. 

      The statistical test is now indicated in Fig. 1B.

      (3) In Figure 2A, the DAPI image for BKC is the merged image and should be replaced with just DAPI. 

      This error has now been corrected.

      (4) In Figure 2B, the y-axis label says "yH2AX foci (MFI)". MFI and foci are not the same thing, and for yH2AX, the signal is often not focal. MFI of yH2AX is an appropriate measurement for replication stress, it's just not appropriate to equate MFI to foci. 

      We apologize for this labeling error, which has now been corrected.

      (5) For the 53BP1 MFI and Rad51 MFI shown in Fig 4 and Fig S4, it is more appropriate to show the number of foci/cell as these are better indicators of breaks and repair sites. MFI is influenced by expression levels of the proteins and not necessarily the break/repair. 

      The numbers of 53BP1 and RAD51 foci are now shown.

      (6) The data in Figures 5B and 5C are very difficult to read. Perhaps color-code the lines/symbols.

      We have now colored the graph to increase its readability. 

      Reviewer #2 (Significance): 

      The findings reported in this manuscript are timely, of interest to the field, and are mostly well-supported by the experimental data. However, there are a few concerns that need to be addressed.

      We are grateful to Reviewer #2 for his/her positive assessment of our manuscript. We hope that we have adequately addressed all of his/her specific concerns and that he/she will agree with the need to put more emphasis on IBC and CHK2 inhibition as requested by Reviewer #1.

      Reviewer #3 (Evidence, reproducibility and clarity): 

      The manuscript: "Synergistic effect of inhibiting CHK2 and DNA replication on cancer cell growth" successfully demonstrates that the compounds BKC and IBC found in Psoralea corylifolia act synergistically to inhibit cancer cell proliferation, using a wide range of well-chosen methodologies. Moreover, the authors characterized the mechanisms of action of both drugs, which result in inhibition of cell proliferation. The use of multiple cell lines and mouse models makes the study robust and complete. The manuscript presents a well written study that offers new insights and contributions to the field.

      A few suggestions to improve the study: 

      (1) Given that both compounds BKC and IBC have already been previously described in the literature, it would be helpful for the reader to have them described better at the beginning of the study. 

      Thanks for pointing this out. We have now better described BKC and IBC at the beginning of the results section, as well as in the discussion. We agree that this could be helpful to readers.

      (2) Addition of western blot quantifications over the number of experimental repeats is important specifically for Fig. 2C and Fig. 3C where partial effect of treatment on a signal level is reported. 

      The densitometry analysis of data shown in Fig. 2C and biological replicates are now shown in Fig. S2B. Quantification for Fig. 3C (now Fig. 4D) is shown in Fig. S4I.

      (3) The quantification of mean intensity for 53BP1 and RAD51 foci should be exchanged with the quantification of number of foci per cell. While the quantification of gH2AX signal intensity is a correct representation of induction of this signal upon damage, foci formed by protein recruitment to DNA damage sites should be quantified by counting the number of foci, rather than signal in the whole cell/nucleus. These proteins exist before damage and are re-located in response to the damage. 

      Quantification of 53BP1 and RAD51 foci is now expressed as the number of foci per cell. 

      (4) Materials & Methods section is missing the methods for the experiment described in Fig. 1B. In summary, after addressing our few concerns, we believe the manuscript should be accepted for publication. 

      The WST-1 assay used for cell number quantification is included under “Reagents” in the Materials & Methods section.

      Reviewer #3 (Significance):

      The manuscript presents a well written study that offers new insights and contributions to the field. Although the inhibitors described have been known in science, the authors present convincingly their mode of action, which is either better characterized (for BKC) or inhibiting a different than previously suggested enzyme (for IBC). Authors also nicely pinpoint and explain the narrow window of concentrations when these two compounds act synergistically rather than additively. The analyses in multiple cell lines, mouse models and in combination with other cancer treatments, make this study of interest not only for fundamental researchers but also for translational scientists and industry.

      My field of expertise: DNA replication and replication stress across model systems. 

      We are grateful to Reviewer #3 for his/her very positive assessment of our work and we hope that he/she will find this revised version suitable for publication.

    2. eLife Assessment

      This study presents important findings on the activity of two compounds, BKC and IBC, isolated from Psoralea corylifolia, which act synergistically to inhibit cancer cell proliferation. Using a spectrum of methods, the authors characterized the mechanisms of action of both drugs, providing convincing evidence that BKC targets DNA polymerases and IBC selectively inhibits CHK2. The study opens the possibility of improving the effectiveness of the combination of BKC and other damaging agents with IBC in cancer treatment.

      [Editors' note: this paper was reviewed by Review Commons.]

    3. Reviewer #1 (Public review):

      The manuscript by Coquel et al. investigates the effects of BKC and IBC, two compounds found in Psoralea corylifolia, on DNA replication and the response to DNA damage, and explores their potential use in cancer treatment. These compounds have been previously shown to affect different cellular pathways and the authors use transformed cancer cells of different origins and a non-transformed cell line to question if their combination is toxic in cancer versus non-cancer cells. They propose that BKC inhibits DNA polymerases while IBC targets CHK2. Their results show that both compounds do affect DNA replication, inducing replication stress and affecting double strand break repair. They also show that their combined use increases their toxicity in a synergistic manner.

      Comments on current version:

      The authors have addressed the main questions raised in the original manuscript. The new data provide stronger evidence supporting the inhibition of DNA polymerases by BKC and the effect of IBC on CHK2. In addition, the new data provides information about the potential mechanism of action of IBC in cells and xenograft models. Together, the revised manuscript has notably increased the relevance and impact of the results with stronger conclusions and better controlled experiments.

    4. Reviewer #2 (Public review):

      Summary:

      The manuscript: "Synergistic effect of inhibiting CHK2 and DNA replication on cancer cell growth" successfully demonstrates that the compounds BKC and IBC found in Psoralea corylifolia act synergistically to inhibit cancer cell proliferation, using a wide range of well-chosen methodologies. Moreover, the authors characterized the mechanisms of action of both drugs, which result in inhibition of cell proliferation. The use of multiple cell lines and mouse models makes the study robust and complete.

      Significance:

      The manuscript presents a well written study that offers new insights and contributions to the field. Although the inhibitors described have been known in science, the authors present convincingly their mode of action, which is either better characterized (for BKC) or inhibiting a different than previously suggested enzyme (for IBC). Authors also nicely pinpoint and explain the narrow window of concentrations when these two compounds act synergistically rather than additively. The analyses in multiple cell lines, mouse models and in combination with other cancer treatments, make this study of interest not only for fundamental researchers but also for translational scientists and industry.

    1. eLife Assessment

      This valuable study uses AlphaFold2 to guide the structural modelling of different states of the human voltage-gated potassium channel KV11.1, a key pharmacological drug target. Follow-up molecular dynamics and drug-docking simulations, combined with experimental comparisons, offer convincing evidence supporting the models, showing that drugs bind more effectively to the inactivated state. The work shows potential for improving drug potency predictions in ion channel pharmacology, though its applicability to other systems remains uncertain.

    2. Reviewer #1 (Public review):

      Summary:

      Ngo et al. use several computational methods to determine and characterize structures defining the three major states sampled by the human voltage-gated potassium channel hERG: the open, closed, and inactivated state. Specifically, they use AlphaFold and Rosetta to generate conformations that likely represent key features of the open, closed, and inactivated states of this channel. Molecular dynamics simulations confirm ion conduction for structural models of the open but not the inactivated state. Moreover, in silico drug docking experiments show differential binding of drugs to the conformations of the three states, with the inactivated one being preferentially bound by many of them. Docking results are then combined with a Markov model to get state-weighted binding free energies that are compared with experimentally measured ones.

      Strengths:

      The study uses state-of-the art modeling methods to provide detailed insights into the structure-function relationship of an important human potassium channel. AlphaFold modeling, MD simulations, and Markov modeling are nicely combined to investigate the impact of structural changes in the hERG channel on potassium conduction and drug binding.

      Weaknesses:

      (1) The selection of inactivated conformations based on AlphaFold modeling seems a bit biased. The authors base their selection of the "most likely" inactivated conformation on the expected flipping of V625 and the constriction at G626 carbonyls. This follows a bit of the "Streetlight effect". It would be better to have selection criteria that are independent of what they expect to find for the inactivated state conformations. Using cues that favour sampling/modeling of the inactivated conformation, such as the deactivated conformation of the VSD used in the modeling of the closed state, would be more convincing. There may be other conformations that are more accurately representing the inactivated state. I see no objective criteria that justify the non-consideration of conformations from cluster 3 of the inactivated state modeling. I am not sure whether pLDDT is a good selection criterion. It reports on structural confidence, but that may not relate to functional relevance.

      (2) The comparison of predicted and experimentally measured binding affinities lacks an appropriate control. Using binding data from open-state conformations only is not the best control. A much better control is the use of alternative structures predicted by AlphaFold for each state (e.g. from the outlier clusters or not considered clusters) in the docking and energy calculations. Using these docking results in the calculations would reveal whether the initially selected conformations (e.g. from cluster 2 for the inactivated state) are truly doing a better job in predicting binding affinities. Such a control would strengthen the overall findings significantly.

      (3) Figures where multiple datapoints are compared across states generally lack assessment of the statistical significance of observed trends (e.g., Figure 3d).

      (4) Figure 3 and Figures S1-S4 compare structural differences between states. However, these differences are inferred from the initial models. The collection of conformations generated via the MD runs allow for much more robust comparisons of structural differences.

    3. Reviewer #2 (Public review):

      Summary:

      Ngo et al. use AlphaFold2 and Rosetta to model closed, open, and inactive states of the human ion channel hERG. Subsequent MD simulations and comparisons with experiments support the plausibility of their models.

      Strengths:

      This is thorough work studied from many different angles. It provides a self-consistent picture of how conformational changes in hERG may affect its function and binding to different targets.

      Weaknesses:

      Though this work claims the methodologies can be generalized to other systems, it is not obvious how. Many modeling choices seem arbitrary and also seem to have required extensive expert knowledge of the system. This limits the applicability of the modeling strategy.

    4. Reviewer #3 (Public review):

      Summary:

      The authors use AlphaFold2, Rosetta, and molecular dynamics to model structures of the hERG K channel in open, inactive, and closed states. Experimental cryo-EM data for open hERG (Wang and MacKinnon 2017), and closed EAG (Mandala and MacKinnon, 2002) were used as the main templates for channel models presented here. Given the importance of hERG as a safety pharmacology target, the identification of a robust simulation method to assess drug block is an important addition to the field.

      Strengths

      The key findings here are new inactivated and closed hERG channel conformations and hERG channel conformations with drugs docked in the inner vestibule below the selectivity filter. Amino acid pathways and interaction networks for different states are also presented.

      The inactive state and drug block models are carefully correlated with experimental data for the inactivated state of hERG (Lau et al, 2024) and with experimental free energy data for drug binding and have overall good agreement.

      It is remarkable that using cytoplasmic domain structures of hERG as a starting point revealed inactivation state structures in the hERG selectivity filter in Figures 2,3.

      Weaknesses

      Figure 6, if each data point is for a different drug, then perhaps identify each point.

      The PAS domain was not included in the models as stated in Methods page 14 but the PAS does appear in some of the templates used as starting points for models in Figure 1 a,b,c. Perhaps mentioning that the PAS was not included in some (all?) of the final models should be moved into the main text and discussed.

      The drug block of 1b channels (which do not contain PAS) has been reported to be slightly different than that for 1a channels (which contain PAS) and for 1a/1b channels (see London et al., 1997; https://doi.org/10.1161/01.RES.81.5.870 and Abi-Gerges et al., 2011; DOI: 10.1111/j.1476-5381.2011.01378.x) and this should be discussed since the models presented here appear to be performed in the absence of the PAS.

      It also appears that the N-linker region (between PAS and the S1) and distal C region of hERG (post CNBHD-COOH) are not included in models, please state this if correct, and discuss.

    5. Author response:

      Reviewer #1: 

      Summary:

      Ngo et al. use several computational methods to determine and characterize structures defining the three major states sampled by the human voltage-gated potassium channel hERG: the open, closed, and inactivated state. Specifically, they use AlphaFold and Rosetta to generate conformations that likely represent key features of the open, closed, and inactivated states of this channel. Molecular dynamics simulations confirm ion conduction for structural models of the open but not the inactivated state. Moreover, in silico drug docking experiments show differential binding of drugs to the conformations of the three states, with the inactivated one being preferentially bound by many of them. Docking results are then combined with a Markov model to get state-weighted binding free energies that are compared with experimentally measured ones.

      Strengths:

      The study uses state-of-the art modeling methods to provide detailed insights into the structure-function relationship of an important human potassium channel. AlphaFold modeling, MD simulations, and Markov modeling are nicely combined to investigate the impact of structural changes in the hERG channel on potassium conduction and drug binding.

      We appreciate the reviewer’s recognition of our integration of state-of-the-art computational methods, including AlphaFold2, Rosetta, MD simulations, and Markov modeling. We are pleased that the reviewer found our approach to investigating the structure-function relationship of the hERG channel insightful.

      Weaknesses:

      (1) The selection of inactivated conformations based on AlphaFold modeling seems a bit biased. The authors base their selection of the "most likely" inactivated conformation on the expected flipping of V625 and the constriction at G626 carbonyls. This follows a bit of the "Streetlight effect". It would be better to have selection criteria that are independent of what they expect to find for the inactivated state conformations. Using cues that favour sampling/modeling of the inactivated conformation, such as the deactivated conformation of the VSD used in the modeling of the closed state, would be more convincing. There may be other conformations that are more accurately representing the inactivated state. I see no objective criteria that justify the non-consideration of conformations from cluster 3 of the inactivated state modeling. I am not sure whether pLDDT is a good selection criterion. It reports on structural confidence, but that may not relate to functional relevance.

      We acknowledge the concern regarding the selection criteria for the inactivated state models. In the revised manuscript version, we plan to broaden our selection approach and explicitly include conformations from different clusters beyond those highlighted in the initial submission (e.g., from cluster 3). We will also incorporate structural metrics that do not solely depend on the known channel inactivation hallmarks or rely on the pLDDT scores to further justify our chosen representative inactivated state models.

      (2) The comparison of predicted and experimentally measured binding affinities lacks an appropriate control. Using binding data from open-state conformations only is not the best control. A much better control is the use of alternative structures predicted by AlphaFold for each state (e.g. from the outlier clusters or not considered clusters) in the docking and energy calculations. Using these docking results in the calculations would reveal whether the initially selected conformations (e.g. from cluster 2 for the inactivated state) are truly doing a better job in predicting binding affinities. Such a control would strengthen the overall findings significantly.

      We agree that a more rigorous control for our drug-binding predictions is desirable. To address this, we will include molecular docking simulations and associated drug binding affinity estimations for more hERG channel models, including alternate conformations from the initial clustering that were not chosen as the final models. This will allow us to test whether our inactivated state structure from cluster 2 indeed outperforms or differs significantly from other possible inactivated hERG channel conformations in reproducing experimental drug potencies.
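
      The kind of control discussed here can be sketched as follows. This is a minimal illustration and not the authors' Markov-model workflow: it assumes equilibrium between channel states, so that the apparent association constant is the occupancy-weighted sum of the per-state constants, and it scores each candidate set of models by its root-mean-square error against measured values. All state occupancies, docking energies, measured values, and names below are hypothetical.

```python
import math

RT_KCAL_PER_MOL = 0.593  # R*T at about 298 K, in kcal/mol

def state_weighted_dg(dg_by_state, prob_by_state):
    """Boltzmann-weight per-state binding free energies (kcal/mol) by state occupancy."""
    total = sum(prob_by_state.values())
    ka = sum((p / total) * math.exp(-dg_by_state[s] / RT_KCAL_PER_MOL)
             for s, p in prob_by_state.items())
    return -RT_KCAL_PER_MOL * math.log(ka)

def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured free energies."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(predicted))

# Hypothetical apo-state occupancies and measured free energies for two drugs.
occupancy = {"open": 0.2, "inactivated": 0.7, "closed": 0.1}
measured = {"drug_A": -9.0, "drug_B": -7.8}

# Hypothetical docking energies per candidate model set, per drug, per state.
docking = {
    "selected cluster": {
        "drug_A": {"open": -8.0, "inactivated": -9.3, "closed": -6.5},
        "drug_B": {"open": -7.2, "inactivated": -8.0, "closed": -6.0},
    },
    "alternative cluster": {
        "drug_A": {"open": -8.0, "inactivated": -7.4, "closed": -6.5},
        "drug_B": {"open": -7.2, "inactivated": -6.9, "closed": -6.0},
    },
}

for model_set, per_drug in docking.items():
    drugs = sorted(per_drug)
    predicted = [state_weighted_dg(per_drug[d], occupancy) for d in drugs]
    error = rmse(predicted, [measured[d] for d in drugs])
    print(model_set, "RMSE vs experiment (kcal/mol):", round(error, 2))
```

      A lower error for the selected models than for the alternatives would support the original selection; comparable errors would suggest that the docking results do not discriminate between the candidate conformations.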

      (3) Figures where multiple datapoints are compared across states generally lack assessment of the statistical significance of observed trends (e.g., Figure 3d).

      (4) Figure 3 and Figures S1-S4 compare structural differences between states. However, these differences are inferred from the initial models. The collection of conformations generated via the MD runs allow for much more robust comparisons of structural differences.

      We will incorporate statistical analyses and measures of uncertainty for key comparisons. Figures 3 and S1-S4 compare the consensus structural hERG channel models for the open, inactivated, and closed states, i.e., one representative model for each state. We believe this is a valid comparison, but a statistical analysis of the observed trends based on those models alone (e.g., in the bar plot of Figure 3d) might not be possible. We agree with the reviewer, however, that we should not rely solely on those initial static models, and we will also draw on the ensemble of conformations sampled during the MD simulations to quantify structural differences between the putative hERG channel states. Specifically, we will present ensemble-averaged measurements and highlight how these distributions differ significantly between states.

      Reviewer #2:

      Summary:

      Ngo et al. use AlphaFold2 and Rosetta to model closed, open, and inactive states of the human ion channel hERG. Subsequent MD simulations and comparisons with experiments support the plausibility of their models.

      Strengths:

      This is thorough work studied from many different angles. It provides a self-consistent picture of how conformational changes in hERG may affect its function and binding to different targets.

      We are grateful for the reviewer’s recognition of the thoroughness and multi-faceted nature of our study.

      Weaknesses:

      Though this work claims the methodologies can be generalized to other systems, it is not obvious how. Many modeling choices seem arbitrary and also seem to have required extensive expert knowledge of the system. This limits the applicability of the modeling strategy.

      We appreciate the reviewer’s comment on the generalizability of our approach. In the revision, we will more explicitly discuss the rationale behind the modeling choices and the extent to which they reflect system-specific knowledge. We will clarify how the strategies we developed (e.g., iterative refinement with AlphaFold2 and Rosetta, followed by MD simulation validation) can be adapted to other ion channels or related proteins. We will also outline a more generalizable workflow, specifying which steps require system-specific information and which steps are broadly applicable.

      Reviewer #3:

      Summary:

      The authors use AlphaFold2, Rosetta, and molecular dynamics to model structures of the hERG K channel in open, inactive, and closed states. Experimental cryo-EM data for open hERG (Wang and MacKinnon 2017), and closed EAG (Mandala and MacKinnon, 2002) were used as the main templates for channel models presented here. Given the importance of hERG as a safety pharmacology target, the identification of a robust simulation method to assess drug block is an important addition to the field.

      Strengths

      The key findings here are new inactivated and closed hERG channel conformations and hERG channel conformations with drugs docked in the inner vestibule below the selectivity filter. Amino acid pathways and interaction networks for different states are also presented.

      The inactive state and drug block models are carefully correlated with experimental data for the inactivated state of hERG (Lau et al, 2024) and with experimental free energy data for drug binding and have overall good agreement.

      It is remarkable that using cytoplasmic domain structures of hERG as a starting point revealed inactivation state structures in the hERG selectivity filter in Figures 2,3.

      We thank the reviewer for highlighting the novelty and importance of our work, particularly regarding the identification of new inactivated and closed hERG channel conformations and the modeling of drug block. We are also pleased that the reviewer found the correlation with experimental data to be strong and the structural insights to be valuable.

      Weaknesses

      Figure 6, if each data point is for a different drug, then perhaps identify each point.

      Thank you so much for this suggestion. Please note that Table 3 contains drug-specific data plotted in Figure 6 including drug names. We will provide a reference to Table 3 in the revised Figure 6 caption. We will also revise Figure 6 (and any similar figures) to clearly identify each data point with the corresponding drug and/or include a corresponding key in the Figure legend. This will make it easier to correlate each data point’s binding prediction with the experimental datasets.

      The PAS domain was not included in the models as stated in Methods page 14 but the PAS does appear in some of the templates used as starting points for models in Figure 1 a,b,c. Perhaps mentioning that the PAS was not included in some (all?) of the final models should be moved into the main text and discussed.

      The drug block of 1b channels (which do not contain PAS) has been reported to be slightly different from that for 1a channels (which contain PAS) and for 1a/1b channels (see London et al., 1997; https://doi.org/10.1161/01.RES.81.5.870 and Abi-Gerges et al., 2011; DOI: 10.1111/j.1476-5381.2011.01378.x) and this should be discussed since the models presented here appear to be performed in the absence of the PAS.

      It also appears that the N-linker region (between PAS and the S1) and distal C region of hERG (post CNBHD-COOH) are not included in models, please state this if correct, and discuss.

      We appreciate the reviewer’s insightful comment regarding the PAS domain and the potential influence of other regions, such as the N-linker and distal C-region, on hERG channel drug binding and state transitions.

      The PAS domain did appear in the starting templates used for initial structural modeling (as shown in Figure 1a, b, c), but it was not included in the final models used for subsequent analyses. Similarly, the N-linker and the distal C-region were also omitted from the final models. These omissions were primarily due to constraints of the hardware used for AlphaFold structural modeling, as including these additional protein regions would exceed the memory capacity of the graphics processing unit (GPU) cards on our available intramural, external, and cloud high-performance computing resources, leading to failures during the protein structure prediction step.

      The PAS domain of hERG 1a isoform, even if not serving as a direct drug-binding site, can influence the gating kinetics of hERG channels as the reviewer pointed out. By altering the probability and duration with which those ion channels occupy specific conformational states, it can indirectly affect how well drugs bind. For example, if the presence of the PAS domain shifts channel gating so that more channels enter (and remain in) the inactivated state, drugs with a higher affinity for that state would appear to bind more potently, as observed in electrophysiological experiments. It is also plausible that the PAS domain could exert allosteric effects that alter the conformational landscape of the ion channel during gating transitions, potentially impacting drug accessibility or binding stability. This is an intriguing hypothesis and an important avenue for future research.
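      The occupancy argument above can be illustrated with a minimal sketch. It assumes a simple rapid-equilibrium picture in which the channel exchanges between states faster than the drug binds, so that the apparent dissociation constant is the occupancy-weighted combination 1/K_app = sum_i p_i/K_i, where p_i is the drug-free occupancy of state i and K_i is the drug's dissociation constant for that state. The function name, the state labels, and all numerical values are hypothetical and are not taken from the manuscript.

```python
def apparent_kd(occupancy, kd):
    """Occupancy-weighted apparent dissociation constant.

    occupancy: dict mapping state name -> drug-free fraction of channels in
               that state (fractions must sum to 1)
    kd:        dict mapping state name -> dissociation constant for that state
               (any consistent concentration unit)

    Assumes rapid equilibrium between channel states (an illustrative
    assumption, not the authors' model).
    """
    total = sum(occupancy.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("state occupancies must sum to 1")
    # 1 / K_app = sum over states of p_i / K_i
    return 1.0 / sum(p / kd[state] for state, p in occupancy.items())


# Hypothetical drug that binds the inactivated state 10-fold more tightly.
kd_nM = {"open": 100.0, "inactivated": 10.0}

# Hypothetical occupancies before and after a gating shift toward inactivation.
baseline = apparent_kd({"open": 0.5, "inactivated": 0.5}, kd_nM)
shifted = apparent_kd({"open": 0.2, "inactivated": 0.8}, kd_nM)

print(f"baseline K_app = {baseline:.1f} nM, shifted K_app = {shifted:.1f} nM")
```

      In this toy case the drug appears more potent after the gating shift even though no state-specific affinity has changed, which is the indirect effect described above.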

      With access to more powerful computational resources, it would be valuable to explore the full-length hERG 1a channel, including the PAS domain and associated regions, to assess their potential contributions to drug binding and gating dynamics. We will incorporate a discussion of these points into the main text, acknowledging the limitations of our current models, citing the references provided by the reviewer, and highlighting the need for future studies to explore these protein regions in greater detail.

    1. eLife Assessment

      This is an important technical method paper that details the development and quality assessment of a 3D MERFISH method to enable spatial transcriptomics of thick tissues, representing a major step forward in the technical capacity of the MERFISH. The evidence presented is convincing.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Fang et al. reports a 3D MERFISH method that enables spatial transcriptomics for tissues up to 200um in thickness. MERFISH, as well as other spatial transcriptomics technologies, has been mainly used for thin (e.g., 10um) tissue slices, which limits the dimension of spatial transcriptomics technique. Therefore, expanding the capacity of MERFISH to thick tissues represents a major technical advance to enable 3D spatial transcriptomics. Here the authors provide detailed technical descriptions of the new method, troubleshooting, optimization, and application examples to demonstrate its technical capacity, accuracy, sensitivity, and utility. The method will likely have a major impact on future spatial transcriptomics studies to benefit diverse biomedical fields.

      Strengths:

      The study was well-designed, executed, and presented. Extensive protocol optimization and quality assessments were carried out and conclusions are well supported by the data. The methods were sufficiently detailed and the results are solid and compelling.

      Weaknesses:

      Thorough performance comparison with other existing technologies can be done in the future.

      Comments on revisions:

      The authors have sufficiently addressed the previous comments.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      The study by Fang et al. reports a 3D MERFISH method that enables spatial transcriptomics for tissues up to 200um in thickness. MERFISH, as well as other spatial transcriptomics technologies, has been mainly used for thin (e.g., 10um) tissue slices, which limits the dimension of spatial transcriptomics technique. Therefore, expanding the capacity of MERFISH to thick tissues represents a major technical advance to enable 3D spatial transcriptomics. Here the authors provide detailed technical descriptions of the new method, troubleshooting, optimization, and application examples to demonstrate its technical capacity, accuracy, sensitivity, and utility. The method will likely have a major impact on future spatial transcriptomics studies to benefit diverse biomedical fields.

      Strengths: 

      The study was well-designed, executed, and presented. Extensive protocol optimization and quality assessments were carried out and conclusions are well supported by the data. The methods were sufficiently detailed, and the results are solid and compelling. 

      Response: We thank the reviewer for the positive comments on our manuscript.  

      Weaknesses: 

      The biological application examples were limited to cell type/subtype classification in two brain regions. Additional examples of how the data could be used to address important biological questions will enhance the impact of the study. 

      We appreciate the reviewer's suggestion that demonstrating the broader applications of our thick-tissue 3D MERFISH method to address important biological questions would enhance the impact of our study. In line with the reviewer's feedback, we have included discussions on how this method could be applied to address various biological questions in the summary (last) paragraph of our manuscript. These discussions highlight the versatility and utility of our approach in studying diverse biological processes beyond cell type classification. 

      However, the goal of this work is to develop a method and establish its validity. While we are interested in applying it to addressing important biological questions in the future, we consider these applications beyond the scope of this work. 

      Reviewer #2 (Public Review): 

      Summary: 

      In their preprint, Fang et al present data on extending a spatial transcriptomics method, MERFISH, to 3D using a spinning disc confocal. MERFISH is a well-established method, first published by Zhuang's lab in 2015 with multiple follow-up papers. In the last few years, MERFISH has been used by multiple groups working on spatial transcriptomics, including approximately 12 million cell maps measured in the mouse brain atlas project. Variants of MERFISH were used to map epigenetic information complementary to gene expression and RNA abundance. However, MERFISH was always limited to thin ~10um sections to this date.

      The key contribution of this work by Fang et al. was to perform the optimization required to get MERFISH working in thick (100-200um) tissue sections. 

      Major strengths and weaknesses: 

      Overall the paper presents a technical milestone, the ability to perform highly multiplexed RNA measurements in 3D using MERFISH protocol. This is not the first spatial transcriptomics done in thick sections. Wang et al. 2018 - StarMAP used thick sections (150 um), and recently, Wang 2021 (EASI-FISH, not cited) performed serial HCR FISH on 300um sections. Data so far suggest that MERFISH has better sensitivity than in situ sequencing approaches (StarMAP) and has built-in multiplexing that EASI-FISH lacks. Therefore, while there is an innovation in the current work, i.e., it is a technically challenging task, the novelty, and overall contribution are modest compared to recently published work.  

      The authors could improve the writing and the manuscript text that places their work in the right context of other spatial transcriptomics work. Out of the 25 citations, 12 are for previous MERFISH work by Zhuang's lab, and only one manuscript used a spatial transcriptomics approach that is not MERFISH. Furthermore, even this paper (Wang et al, 2018) is only discussed in the context of neuroanatomy findings. The fact that Wang et al. were the first to measure thick sections is not mentioned in the manuscript. The work by Wang et al. 2021 (EASI-FISH) is not cited at all, as well as the many other multiplexed FISH papers published in recent years that are very relevant. For example, a key difference between seqFISH+ and MERFISH was the fact that only seqFISH+ used a confocal microscope, and MERFISH has always been relying on epi. As this is the first MERFISH publication to use confocal, I expect citations to previous work in seqFISH and better discussions about differences. 

      We thank the reviewer for recognizing our work as a technical milestone. Since the aim of this work is to build upon the strengths of MERFISH and address some of its limitations, we primarily cited previous MERFISH papers to clarify the specific improvements made in this work. Given the rapid growth of the spatial omics field, it has become impractical to comprehensively cite all method development papers. Instead, we cited a 2021 review article in the first sentence of the originally submitted manuscript and limited all discussions afterwards to MERFISH. In light of this reviewer’s suggestion to more broadly cite spatial transcriptomics work, we added two additional review articles on spatial omics. Spatial omics methods primarily include two categories: 1) imaging-based methods and 2) next-generation-sequencing based methods. The 2021 review article [Zhuang, Nat Methods 18, 18–22 (2021)] included in the originally submitted manuscript is focused on imaging-based methods. The additional 2021 review article [Larsson et al., Nat Methods 18, 15–18 (2021)] that we have now included in the revised manuscript is focused on next-generation-sequencing based methods. We also added a more recent review article published in 2023 [Bressan et al., Science 381:eabq4964 (2023)], which covers both categories of methods and includes more recent technology developments. All three review articles are now cited in parallel in the first introductory paragraph of the manuscript.

      Although we presented our work as an advance in MERFISH specifically, we do consider the reviewer’s suggestion of citing the 2018 STARmap paper [Wang et al., Science 361, eaat5961 (2018)] in the introduction part of our manuscript reasonable. This STARmap paper was already cited in the results part of our originally submitted manuscript, and we have now described this work in the introduction part of our revised manuscript (third paragraph), as this paper was the first to demonstrate 3D in situ sequencing in thick tissues. In addition, we thank the reviewer for bringing to our attention the EASI-FISH paper [Wang et al, Cell 184, 6361-6377 (2021)], which reported a method for thick-tissue FISH imaging and demonstrated imaging of 24 genes using multiple rounds of multi-color FISH imaging. We also recently became aware of a paper reporting 3D imaging of thick samples using PHYTOMap [Nobori et al, Nature Plants 9, 1026-1033 (2023)]. This paper, published a few days after we submitted our manuscript to eLife, demonstrated imaging of 28 genes in thick plant samples using multiple rounds of multicolor FISH and the probe targeting and amplification methods previously developed for in situ sequencing. We also included these two papers in the introduction section of our revised manuscript (third paragraph). In addition, we expanded the discussion paragraph (last paragraph) of the manuscript to discuss these thick-tissue imaging methods in more detail, and in the same paragraph, we also included discussions on two recent bioRxiv preprints in thick-tissue transcriptomic imaging [Gandin et al., bioRxiv, doi:10.1101/2024.05.17.594641 (2024); Sui et al., bioRxiv, doi:10.1101/2024.08.05.606553 (2024)].

      However, we do not consider our use of confocal imaging in this work an advance in MERFISH because confocal microscopy, like epi-fluorescence imaging, is a commonly used approach that could be applied to MERFISH of thin tissues directly without any alteration of the protocol. Confocal imaging has been broadly used for both DNA and RNA FISH before any genome-scale imaging was reported. Confocal and epi-imaging geometries have their distinct advantages, and which of these imaging geometries to use is the researcher’s choice depending on instrument availability and experimental needs. Thus, we do not find it necessary to cite specific papers just for using confocal imaging in spatial transcriptomic profiling. Our real advance related to confocal imaging is the use of machine learning to increase the imaging speed. Without this improvement, 3D imaging of thick tissue using confocal would take a long time and likely degrade image quality due to photobleaching of out-of-focus fluorophores before they are imaged. We thus cited several papers that used deep learning to improve imaging quality and/or speed [Laine et al., International Journal of Biochemistry & Cell Biology 140:106077 (2021); Ouyang et al., Nat Biotechnol 36:460–468 (2018); Weigert et al., Nat Methods 15:1090–1097 (2018)] in our original submission. Our unique contribution is the combination of machine learning with confocal imaging for 3D multiplexed FISH imaging of thick tissue samples, which had not been demonstrated previously.

      To get MERFISH working in 3D, the authors solved a few technical problems. To address reduced signal-to-noise due to thick samples, Fang et al. used non-linear filtering (i.e., deep learning) to enhance the spots before detection. To improve registrations, the authors identified an issue specific to their Z-Piezo that could be improved and replaced with a better model. Finally, the authors used water immersion objectives to mitigate optical aberrations. All these optimization steps are reasonable and make sense. In some cases, I can see the general appeal (another demonstration of deep learning to reduce exposure time). Still, in other cases, the issue is not necessarily general enough (i.e., a different model of Piezo Z stage) to be of interest to a broad readership. There were a few additional optimization steps, i.e., testing four concentrations of readout and encoder probes. So while the preprint describes a technical milestone, achieving this milestone was done with overall modest innovation.

      We appreciate the reviewer's recognition of the technical challenges we have overcome in developing this 3D thick-tissue MERFISH method. To achieve high-quality thick-tissue MERFISH imaging, we had to overcome multiple different challenges. We agree with the reviewer that the solutions to some of the above challenges are intellectually more impressive than the remaining ones that required relatively more mundane efforts. However, all of these are needed to achieve the overall goal, a goal that is considered a milestone by the reviewer. We believe that the impact of a method should be evaluated based on its capabilities, potential applications, and its adaptability for broader adoption. In this regard, we anticipate that our reported method will be a valuable and impactful contribution to the field of spatial biology.

      Data and code sharing - the only link in the preprint related to data sharing sends readers to a deleted Dropbox folder. Similarly, the GitHub link is a 404 error. Both are unacceptable. The author should do a better job sharing their raw and processed data. Furthermore, the software shared should not be just the MERlin package used to analyze but the specific code used in that package.  

      We shared the data through Dropbox as a temporary data-sharing approach for the review process, because of the potential needs to revise and/or add data during the paper revision process. We have now made all data publicly available at Dryad (https://doi.org/10.5061/dryad.w0vt4b922).

      The GitHub link that we provided for the MERlin package was valid and when we clicked on it, it took us to the correct GitHub site. However, to make the code a permanent record, we also deposited the code to Zenodo (https://zenodo.org/records/13356944). Moreover, following the suggestion by the reviewer, in addition to the MERlin v2.2.7 package itself, we have also shared the specific code to utilize this package for analyzing the data taken in this work at Dryad (https://doi.org/10.5061/dryad.w0vt4b922). 

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) It will be good to expand the application section to demonstrate the utility of 3D MERFISH to address diverse types of biological questions for the two brain regions examined. At present, it only examined the localization of various cell clusters in the tissues. Can it be used to examine both short and long-range interactions, for example? 

      We appreciate the reviewer's feedback and agree that demonstrating the broader applications of our 3D thick-tissue MERFISH imaging method in addressing diverse biological questions would enhance the impact of our study.  

      In line with the reviewer’s comments, one of the analyses we performed in the manuscript was examining short-range interactions based on soma contact between adjacent neurons in the two brain regions studied (see third-to-last and second-to-last paragraphs of the Main text). This analysis provided insights into the spatial organization of inhibitory neurons and potential interactions between the same type of interneurons in these brain regions. 

      Although long-range interactions, for example synaptic interactions between neurons, would be of great interest, our current 3D MERFISH measurements do not allow such interactions to be determined. Future research to enable measurements of synaptic interactions between molecularly defined neuronal subtypes would be interesting, but we consider this to be out of the scope of the current study.

      (2) For the nearest neighbor distance analysis in Figure 3, the method seems to be missing. Please add details about this analysis to allow better understanding. It is counterintuitive that the cell subtypes showed tight local distribution (Figure 3 - supplement 3), but the nearest neighbor distances with subtypes are not different from those between subtypes. Please explain. 

      We apologize for omitting the nearest-neighbor distance analysis from the Materials and Methods section. We have added a detailed description of this analysis to the Materials and Methods section of the revised manuscript (last subsection of Materials and Methods).

      Regarding the comment “It is counterintuitive that the cell subtypes showed tight local distribution (Figure 3 - supplement 3), but the nearest neighbor distances with subtypes are not different from those between subtypes”, this is not necessarily counter-intuitive given how we defined nearest-neighbor distances between the same subtype of neurons and nearest-neighbor distances between different subtypes of neurons. Here is how we performed this analysis for interneurons. First, we determined the nearest-neighbor neuron for each interneuron and classified it as either having another interneuron of the same type as the nearest neighbor or having a different type of interneuron or an excitatory neuron as the nearest neighbor. We then determined the distributions of the distances for these two types of nearest-neighbor pairs and compared these distributions. When a neuronal subtype forms a tight spatial cluster, such as the type-A cluster shown in the schematic below, the distances between nearest-neighbor A-A pairs are indeed small. However, the distance between a type-A neuron and a neuron of a different type (for example, type-B) is not necessarily bigger than those between two type-A neurons, if the nearest-neighbor cell for this type-A neuron is a type-B neuron. These nearest-neighbor A-B pairs are likely formed between type-A neurons at the edge of the cluster and type-B neurons near the edge of the type-A cluster. If the distance of an A-B pair is not comparable to those of nearest-neighbor A-A pairs, it is unlikely to be a nearest-neighbor pair by our definition as described above.

      Author response image 1.
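      The nearest-neighbor classification described in the response above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the tuple layout of each cell, the `query_types` parameter, and the toy coordinates are all hypothetical, and a brute-force search is used where a real analysis would likely use a spatial index.

```python
import math


def nearest_neighbor_distances(cells, query_types):
    """Split nearest-neighbor distances into same-type and different-type pairs.

    cells:       list of (x, y, z, cell_type) tuples, coordinates in one unit
    query_types: set of cell types to analyze (e.g. the interneuron subtypes);
                 neighbors are searched among ALL cells, including other types

    Returns (same, different): two lists of distances, one entry per query
    cell, placed according to the type of that cell's nearest neighbor.
    """
    same, different = [], []
    for i, (xi, yi, zi, type_i) in enumerate(cells):
        if type_i not in query_types:
            continue
        best_dist, best_type = math.inf, None
        for j, (xj, yj, zj, type_j) in enumerate(cells):
            if i == j:
                continue
            d = math.dist((xi, yi, zi), (xj, yj, zj))
            if d < best_dist:
                best_dist, best_type = d, type_j
        if best_type is None:
            continue  # only one cell in the data set
        (same if best_type == type_i else different).append(best_dist)
    return same, different


# Toy layout: a tight type-A cluster with a type-B cell just beyond its edge.
toy_cells = [
    (0, 0, 0, "A"),
    (10, 0, 0, "A"),
    (20, 0, 0, "A"),   # edge of the A cluster
    (29, 0, 0, "B"),   # close to the cluster edge
    (100, 0, 0, "B"),
]
same_d, diff_d = nearest_neighbor_distances(toy_cells, query_types={"A"})
print("A-A nearest-neighbor distances:", same_d)
print("A-other nearest-neighbor distances:", diff_d)
```

      In this toy layout the single A-B nearest-neighbor pair is formed at the cluster edge and has a distance comparable to the A-A pairs, which is the situation the response describes.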

      Reviewer #2 (Recommendations For The Authors): 

      (1) The scholarship in this work is lacking. All of the non-MERFISH parts of the field of spatial transcriptomics are ignored. The work needs to be discussed in the context of the literature. 

      We thank the reviewer for this suggestion and have included discussions of other spatial omics work and other thick-tissue multiplexed imaging work in the Introduction and Discussion sections of the manuscript. Please see details in our response to the Public Review portion of this reviewer’s comments.

      (2) The data/code sharing links are broken and need to be fixed. 

      Response: We shared the data through Dropbox as a temporary data-sharing approach for the review process, because of the potential need to revise and/or add data during the paper revision process. We have now made all data publicly available at Dryad (https://doi.org/10.5061/dryad.w0vt4b922).

      The GitHub link that we provided for the MERlin package was valid and when we clicked on it, it took us to the correct GitHub site. However, to make the code a permanent record, we also deposited the code to Zenodo (https://zenodo.org/records/13356944). Moreover, following the suggestion by the reviewer, in addition to MERlin (the MERFISH decoding package itself), we have also shared the specific code to utilize this package for analyzing the data taken in this work at Dryad (https://doi.org/10.5061/dryad.w0vt4b922) to ensure that the readers can fully reproduce the results presented in our manuscript.

    1. eLife Assessment

      This paper advances an important new concept in psoriasis pathogenesis and implicates Sema4a as a homeostatic regulator that is highly epithelial-specific. The findings are convincing and lend support for the biology described here as a mechanism with therapeutic implications.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Kume et al examined the role of the protein Semaphorin 4a in steady state skin homeostasis and how this relates to skin changes seen in human psoriasis and imiquimod-induced psoriasis-like disease in mice. The authors found that human psoriatic skin has reduced expression of Sema4a in the epidermis. While Sema4a has been shown to drive inflammatory activation in different immune populations, this finding suggested Sema4a might be important for negatively regulating Th17 inflammation in the skin. The authors go on to show that Sema4a knockout mice have skin changes in key keratinocyte genes, increased gdT cells, and increased IL-17 similar to differences seen in non-lesional psoriatic skin, and that bone marrow chimera mice with WT immune cells and Sema4a KO stromal cells develop worse IMQ-induced psoriasis-like disease, further linking expression of Sema4a in the skin to maintaining skin homeostasis. The authors next studied downstream pathways that might mediate the homeostatic effects of Sema4a, focusing on mTOR given its known role in keratinocyte function. Like for the immune phenotypes, Sema4a KO mice had increased mTOR activation in the epidermis in a similar pattern to mTOR activation noted in non-lesional psoriatic skin. The authors next targeted the mTOR pathway and showed rapamycin could reverse some of the psoriasis-like skin changes in Sema4a KO mice, confirming the role of increased mTOR in contributing to the observed skin phenotype.

      In the revised manuscript, the authors expand on the potential relevance to psoriasis by demonstrating similar findings in an IL-23-driven model of skin inflammation, which is an orthogonal model of psoriasis to their original IMQ model. They also show that in addition to reversing steady state differences in skin thickness between Sema4a KO mice and WT mice, rapamycin improves metrics of disease in the IMQ model of psoriasis. These additional studies further bolster their conclusions that Sema4a may play a protective role by preventing over-activation of mTOR in the skin in psoriasis.

      Strengths:

      The most interesting finding is the tissue-specific role for Sema4a: whereas it has previously been considered to play a mostly pro-inflammatory role in immune cells, this study shows that when expressed by keratinocytes, Sema4a plays a homeostatic role that when missing leads to development of psoriasis-like skin changes. This has important implications in terms of targeting Sema4a pharmacologically. It also may yield a novel mouse model to study mechanisms of psoriasis development in mice separate from the commonly used IMQ model. The included experiments are well-controlled and executed rigorously.

      The new experiments provide additional data to support the conclusions through an orthogonal model of psoriasis and demonstrating rapamycin-induced reversal of changes in the IMQ disease model.

      Weaknesses:

      While the main weakness of these studies, lack of tissue-specific Sema4a knockout mice (e.g. in keratinocytes only), remains, generating these mice and performing the necessary experiments is beyond the scope of completing these particular studies. Similarly, it is understandable that additional bone marrow chimeras would be costly and labor intensive without adding much more in the absence of tissue-specific knockouts.

    3. Reviewer #2 (Public review):

      Summary:

      Kume et al. found for the first time that Semaphorin 4A (Sema4A) was downregulated in both mRNA and protein levels in L and NL keratinocytes of psoriasis patients compared to control keratinocytes. In peripheral blood, they found that Sema4A is not only expressed in keratinocytes but is also upregulated in hematopoietic cells such as lymphocytes and monocytes in the blood of psoriasis patients. They investigated how the down-regulation of Sema4A expression in psoriatic epidermal cells affects the immunological inflammation of psoriasis by using a psoriasis mouse model in which Sema4A KO mice were treated with IMQ. Kume et al. hypothesized that down-regulation of Sema4A expression in keratinocytes might be responsible for the augmentation of psoriasis inflammation. Using bone marrow chimeric mice, Kume et al. showed that KO of Sema4A in non-hematopoietic cells was responsible for the enhanced inflammation in psoriasis. The expression of CCL20, TNF, IL-17, and mTOR was upregulated in the Sema4A KO epidermis compared to the WT epidermis, and the infiltration of IL-17-producing T cells was also enhanced.

      Strengths:

      Decreased Sema4A expression may be involved in psoriasis exacerbation through epidermal proliferation and enhanced infiltration of Th17 cells, which helps understand psoriasis immunopathogenesis.

      Weaknesses:

      The mechanism of decreased Sema4A expression in psoriasis is not clear, although this does not affect the strength of this research.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Kume et al examined the role of the protein Semaphorin 4a in steady-state skin homeostasis and how this relates to skin changes seen in human psoriasis and imiquimod-induced psoriasis-like disease in mice. The authors found that human psoriatic skin has reduced expression of Sema4a in the epidermis. While Sema4a has been shown to drive inflammatory activation in different immune populations, this finding suggested Sema4a might be important for negatively regulating Th17 inflammation in the skin. The authors go on to show that Sema4a knockout mice have skin changes in key keratinocyte genes, increased gdT cells, and increased IL-17 similar to differences seen in non-lesional psoriatic skin, and that bone marrow chimera mice with WT immune cells and Sema4a KO stromal cells develop worse IMQ-induced psoriasis-like disease, further linking expression of Sema4a in the skin to maintaining skin homeostasis. The authors next studied downstream pathways that might mediate the homeostatic effects of Sema4a, focusing on mTOR given its known role in keratinocyte function. As with the immune phenotypes, Sema4a KO mice had increased mTOR activation in the epidermis in a similar pattern to mTOR activation noted in non-lesional psoriatic skin. The authors next targeted the mTOR pathway and showed rapamycin could reverse some of the psoriasis-like skin changes in Sema4a KO mice, confirming the role of increased mTOR in contributing to the observed skin phenotype.

      Strengths:

      The most interesting finding is the tissue-specific role for Sema4a: whereas it has previously been considered to play a mostly pro-inflammatory role in immune cells, this study shows that when expressed by keratinocytes, Sema4a plays a homeostatic role that when missing leads to the development of psoriasis-like skin changes. This has important implications in terms of targeting Sema4a pharmacologically. It also may yield a novel mouse model to study mechanisms of psoriasis development in mice separate from the commonly used IMQ model. The included experiments are well-controlled and executed rigorously.

      Weaknesses:

      A weakness of the study is the lack of tissue-specific Sema4a knockout mice (e.g. in keratinocytes only). The authors did use bone marrow chimeras, but only in one experiment. This work implies that psoriasis may represent a Sema4a-deficient state in the epidermal cells, while the same might not be true for immune cells. Indeed, in their analysis of non-lesional psoriasis skin, Sema4a was not significantly decreased compared to control skin, possibly due to compensatory increased Sema4a from other cell types. Unbiased RNA-seq of Sema4a KO mouse skin for comparison to non-lesional skin might identify other similarities besides mTOR signaling. Indeed, targeting mTOR with rapamycin reverses some of the skin changes in Sema4a KO mice, but not skin thickness, so other pathways impacted by Sema4a may be better targets if they could be identified. Utilizing WT→KO chimeras in addition to global KO mice in the experiments in Figures 6-8 would more strongly implicate the separate role of Sema4a in skin vs immune cell populations and might more closely mimic non-lesional psoriasis skin.

      We sincerely appreciate your summary and for pointing out the strengths and weaknesses of our study. Although we were unfortunately unable to perform all these experiments due to limitations in our resources, we fully agree with the importance of studying tissue-specific Sema4A KO mice. As an alternative, we compared the IL-17A-producing potential of skin T cells between WT→KO mice and KO→KO mice following 4 consecutive days of IMQ treatment using flow cytometry. The results were comparable between the two groups. Additionally, we performed RNA-seq on the epidermis of WT and Sema4A KO mice. While we did not find similarities between Sema4A KO skin and non-lesional psoriasis except for S100a8 expression, we will continue to investigate, as a future project, the mechanisms by which Sema4A KO skin mimics non-lesional psoriasis skin.

      Although targeting mTOR with rapamycin did not reverse the epidermal thickness in Sema4A KO mice, rapamycin was effective in reducing epidermal thickness in a murine psoriasis model induced by IMQ in Sema4A KO mice. These results suggest potential clinical relevance for treating active, lesional psoriatic skin changes, which would be of interest to clinicians. Thank you once again for your valuable insights.

      Reviewer #2 (Public Review):

      Summary:

      Kume et al. found for the first time that Semaphorin 4A (Sema4A) was downregulated in both mRNA and protein levels in L and NL keratinocytes of psoriasis patients compared to control keratinocytes. In peripheral blood, they found that Sema4A is not only expressed in keratinocytes but is also upregulated in hematopoietic cells such as lymphocytes and monocytes in the blood of psoriasis patients. They investigated how the down-regulation of Sema4A expression in psoriatic epidermal cells affects the immunological inflammation of psoriasis by using a psoriasis mice model in which Sema4A KO mice were treated with IMQ. Kume et al. hypothesized that down-regulation of Sema4A expression in keratinocytes might be responsible for the augmentation of psoriasis inflammation. Using bone marrow chimeric mice, Kume et al. showed that KO of Sema4A in non-hematopoietic cells was responsible for the enhanced inflammation in psoriasis. The expression of CCL20, TNF, IL-17, and mTOR was upregulated in the Sema4AKO epidermis compared to the WT epidermis, and the infiltration of IL-17-producing T cells was also enhanced.

      Strengths:

      Decreased Sema4A expression may be involved in psoriasis exacerbation through epidermal proliferation and enhanced infiltration of Th17 cells, which helps understand psoriasis immunopathogenesis.

      Weaknesses:

      The mechanism by which decreased Sema4A expression may exacerbate psoriasis is unclear as yet.

      We greatly appreciate your summary and thoughtful feedback on the strengths and weaknesses of our study. In response, we have included the results of additional experiments on IL-23-mediated psoriasis-like dermatitis, which showed that epidermal thickness was significantly greater in KO mice compared to WT mice. When we analyzed the T cells infiltrating the ears using flow cytometry, the proportion of IL-17A producing Vγ2 and DNγδ T cells within the CD3 fraction of the epidermis was significantly higher in Sema4A KO mice, consistent with the results from IMQ-induced psoriasis-like dermatitis. Furthermore, we examined STAT3 expression in the epidermis of WT and Sema4A KO mice using Western blot analysis, and the results were comparable between the two groups. However, the mechanism by which decreased Sema4A expression may exacerbate psoriasis remains unclear. We have added some explanations and presumptions to the limitations section. Thank you once again for your valuable insights.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Figure 1C

      What statistics were used? The supplemental notes adjusted the P value, what correction for multiple comparisons was utilized? Could the authors instead show logFC for the DEGs between Ctl and L in each cluster? This might be best demonstrated with a volcano plot, highlighting SEMA4A, and other genes known to be DE in psoriasis.

      We apologize for not including the detailed analysis methods in the original manuscript submission. We analyzed the scRNA-seq data using Cellxgene VIP with Welch’s t-test. Multiple comparisons were performed using the Benjamini-Hochberg procedure, setting the false discovery rate (FDR) at 0.05. These details are now explained in the MATERIALS AND METHODS section of the resubmitted manuscript. We also added a log2FC-log10 p-value graph for the DEGs in keratinocytes between Ctl and L to Figure 1-figure supplement 1D. The log2FC values in keratinocytes, dendritic cells, and macrophages were -0.07, 0.00, and -0.05, respectively. Although the log2FC is low in keratinocytes, the adjusted p-value (padj) for Sema4A is 2.83×10⁻³⁹, indicating a statistically significant difference.

      Page 8 Line 111 in the resubmitted manuscript:

      “The adjusted p-value (padj) for SEMA4A in keratinocytes between Ctl and L was 2.83×10⁻³⁹, indicating a statistically significant difference despite not being visually prominent in the volcano plot, which shows comprehensive differential gene expression in keratinocytes (Figure 1C; Figure 1-figure supplement 1D).”

      Page 54: In the Figure legend of Figure 1-figure supplement 1D in the resubmitted manuscript:

      “(D) The volcano plot displays changes in gene expression in psoriatic L compared to Ctl.”

      Page 30 Line 481 in the resubmitted manuscript: In the “Data processing of single-cell RNA-sequencing and bulk RNA-sequencing” section.

      “The data was integrated into an h5ad file, which can be visualized in Cellxgene VIP (K. Li et al., 2022). We then performed differential analysis between two groups of cells to identify differential expressed genes using Welch’s t-test. Multiple comparisons were controlled using the Benjamini-Hochberg procedure, with the false discovery rate set at 0.05 and significance defined as padj < 0.05.”
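
      The multiple-comparison step described above can be illustrated with a minimal sketch of the Benjamini-Hochberg adjustment. This is not the Cellxgene VIP implementation; the function name and the example p-values are hypothetical, and the per-gene p-values are assumed to come from a separate Welch's t-test (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (padj) in the input order.

    Hypothetical helper for illustration: each p-value is scaled by n / rank,
    then a running minimum is taken from the largest p-value downward so that
    the adjusted values never decrease as the raw p-values increase.
    """
    n = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    padj = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value (rank n) down to the smallest (rank 1).
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        padj[i] = running_min
    return padj


def significant(pvalues, fdr=0.05):
    """Flag genes whose adjusted p-value is below the chosen FDR."""
    return [p < fdr for p in benjamini_hochberg(pvalues)]
```

      With very large numbers of cells per group, as is typical for scRNA-seq, even a log2FC close to zero can yield an extremely small padj, which is consistent with the authors' remark that the SEMA4A difference is statistically significant although not visually prominent.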

      Figure 2B

      The results narrative notes WT->WT is comparable to KO->WT. No statistics are given for this comparison. It appears the difference is less than the other comparisons, but still may be significant. Also, in the supplemental for Figure 2B, there appear to be missing columns for the 4 BM chimera groups (columns for WT and KO, but not 4 columns for each donor: recipient pair).

      We sincerely apologize for any confusion. We presented the results of the chimeric mice in Figure 3, and Figure 3-source data 1 shows the 4 BM chimera groups. In Figure 3B, the p-value for the comparison between WT->WT mice and KO->WT mice was 0.7988, as indicated in Figure 3-source data 1.

      Figure 3B

      While ear skin is not easily obtainable at day 0 for comparison, why not also include back skin at Wk 8? If the back skin epidermis is thicker like the ear skin, it supports the ear skin conclusion and adds a more consistent comparison. If the back skin epidermis is not thicker, what would be the author's explanation as to the why only ear skin epidermis is thicker in KO mice at 8 weeks?

      We appreciate and completely agree with the reviewer’s insightful comment. We have added images and dot plots of the back skin at Week 8 in Figure 4B. Since the back skin epidermis is thicker, similar to the ear skin, these results support the conclusion drawn from the ear skin data. Regarding Figure 4C, which shows the expression of Sema4a in the epidermis and dermis of 8-week-old WT mouse ear, we have modified the sentence in the manuscript to ‘the epidermis of WT ear at Week 8’ for clarification.

      Page 12 Line 180 in the resubmitted manuscript:

      “While epidermal thickness of back skin was comparable at birth (Figure 4B), on week 8, epidermis of Sema4AKO back and ear skin was notably thicker than that of WT mice (Figure 4B), suggesting that acanthosis in Sema4AKO mice is accentuated post-birth.”

      Page 47: In the Figure legend of Figure 4B in the resubmitted manuscript:

      “(B) Left: representative Hematoxylin and eosin staining of Day 0 back and Wk 8 back and ear. Scale bar = 50 μm. Right: Epi and Derm thickness in Day 0 back (n = 5) and Wk 8 back (n = 5) and ear (n = 8).”

      Figures 3C&D, Figures 4 D-F

      The figures might be easier to read if some of the data is moved to supplemental, especially in Figure 4, which has 36 panels just in D-F. Conversely, the dLN data is important in establishing the skin microenvironment as important in the accumulation of γδ cells and IL-17 production in the setting of Sema4a KO, so this might be more impactful if moved to the main figure.

      We appreciate and agree with your comments. As recommended, we have moved data from Figure 3C and 4D-F to the supplemental section. The dLN data have been moved to the main figure as Figure 4E. This has improved the readability of the figures.

      Figure 5 and Figure 6 might work better if combined. The differences in keratinocytes in psoriasis are well-known, so the novelty is how Sema4a KO skin appears to share similar differences. This would be easier to see if compared side-by-side in the same figure. Also, there is an opportunity to show this more rigorously by performing RNA-seq on WT vs Sema4a KO skin. Showing a larger set of DEGs that trend similarly between Ctl/NL psoriasis and WT/Sema4a KO skin in a heatmap would bolster the conclusion that Sema4a deficiency contributes to a psoriasis-like skin defect.

      We appreciate your valuable suggestion. Following your recommendation, we have combined Figures 5 and 6 to facilitate a side-by-side comparison. This highlights the similarities between Sema4AKO skin and psoriasis, making it easier to observe differences in keratinocytes. Additionally, we performed RNA-seq on WT and Sema4a KO epidermis (n = 3 per group). We analyzed the raw count data using iDEP 2.0 (Ge S.X., BMC Bioinformatics, 2018), setting the minimal counts per million to 0.5 in at least one library. Differential gene expression analysis was conducted using DESeq2, with an FDR cutoff of 0.1 and a minimum fold change of 2. As a result, we identified 46 upregulated and 70 downregulated genes in Sema4AKO mice compared to WT mice (see the volcano plot and heat map). However, except for S100a8, we did not observe significant expression changes in non-lesional psoriasis-related genes between WT and Sema4AKO mice. In the future, we aim to identify subtle stimuli that could cause gene expression changes between these groups and we would like to perform additional RNA-seq experiments.

      Author response image 1.

      Author response image 2.
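
      The filtering and thresholding criteria stated in this response (at least 0.5 counts per million in at least one library, an FDR cutoff of 0.1, and a minimum fold change of 2) can be sketched as follows. This only illustrates how such cutoffs combine; it is not the iDEP or DESeq2 code, the function names are hypothetical, and whether the boundaries are inclusive is an assumption.

```python
import math


def passes_cpm_filter(gene_counts, library_sizes, min_cpm=0.5, min_libraries=1):
    """Keep a gene if its CPM reaches min_cpm in at least min_libraries samples.

    gene_counts: raw counts of one gene, one value per library.
    library_sizes: total mapped reads of each library, in the same order.
    """
    n_pass = sum(
        1
        for count, size in zip(gene_counts, library_sizes)
        if count * 1e6 / size >= min_cpm
    )
    return n_pass >= min_libraries


def classify(log2_fold_change, padj, fdr=0.1, min_fold_change=2.0):
    """Label a gene 'up', 'down', or 'ns' (not significant).

    Assumption: a gene is called when padj < fdr and |log2FC| >= log2(min_fold_change).
    """
    if padj is None or padj >= fdr:
        return "ns"
    threshold = math.log2(min_fold_change)
    if log2_fold_change >= threshold:
        return "up"
    if log2_fold_change <= -threshold:
        return "down"
    return "ns"
```

      Because both a significance cutoff and an effect-size cutoff must be met, a gene with a small padj but a fold change below 2 is not counted among the 46 upregulated or 70 downregulated genes under these settings.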

      Page 48: The Figure title of Figure 5 in the resubmitted manuscript:

      “Figure 5: Sema4AKO skin shares the features of human psoriatic NL.”

      SEMA4A is not significantly DE between Ctl and NL in the psoriasis RNA-seq data. If a lower expression of SEMA4A in psoriasis skin is a driving part of the phenotype, why is this not observed in the RNA-seq data? Presumably, this could be explained by infiltration of immune cells with increased SEMA4A expression, like in the scRNA-seq data in Figure 1. If so, might it be useful to analyze WT->KO chimera mice similarly to global KO mice in Figures 6-8? This might more accurately reflect what is happening in psoriasis, if epidermal SEMA4A expression is low, but immune expression is not. The KO data on their own nicely show a skin phenotype, but these additional experiments might more closely mimic psoriatic disease and increase the rigor and impact of the study.

      We really appreciate your insightful comments. Due to the limitations of the animal experimentation facility, we regret that we are unable to create additional chimeric mice. Although our analysis is limited, we compared IL-17A production from T cells of WT→KO mice and KO→KO mice following 4 consecutive days of IMQ treatment using flow cytometry (see Author response image 3 below; n = 6 for WT→KO, n = 4 for KO→KO). This comparison revealed that IL-17A production from T cells was comparable, regardless of whether they were derived from WT or Sema4AKO mice, when the skin constituent cells were derived from Sema4AKO. We appreciate the value of your advice, and agree that investigating keratinocyte differentiation and mTOR signaling in the epidermis, using either WT→KO chimeric mice or keratinocyte-specific Sema4A-deficient mice, is a crucial next step in our research.

      Author response image 3.

      Figure 8

      Rapamycin was able to partially reverse the psoriasis-like skin phenotype in Sema4a KO mice. Would rapamycin also be effective in the more severe disease induced by IMQ in Sema4a KO mice? While partially reducing the effect of Sema4a KO on steady-state skin with rapamycin strengthens the link to mTOR dysregulation, it did not change skin thickness. It's unclear if this would be useful clinically for patients with well-controlled psoriasis (NL skin). Would it be useful to reverse active, lesional psoriatic skin changes? Testing this might yield results more relevant to clinicians and patients.

      We are grateful for your valuable feedback. Rapamycin showed effectiveness in reducing epidermal thickness in a murine psoriasis model induced by IMQ in Sema4AKO mice. Rapamycin treatment downregulated the expression of Krt10, Krt14, and Krt16. We have added these results as Figure 7-figure supplement 2. These results suggest potential clinical relevance for treating active, lesional psoriatic skin changes and may be of interest to clinicians and patients.

      Page 17 Line 269 in the resubmitted manuscript:

      “Next, we investigated whether intraperitoneal rapamycin treatment effectively downregulates inflammation in the IMQ-induced murine model of psoriasis in Sema4AKO mice (Figure 7-figure supplement 2A). Rapamycin significantly reduced epidermal thickness compared to vehicle treatment (Figure 7-figure supplement 2B). Additionally, rapamycin treatment downregulated the expression of Krt10, Krt14, and Krt16 (Figure 7-figure supplement 2C). While the upregulation of Il17a in the Sema4AKO epidermis in IMQ model was not clearly modified by rapamycin (Figure 7-figure supplement 2C), immunofluorescence revealed a decrease in the number of CD3 T cells in Sema4AKO epidermis by rapamycin (Figure 7-figure supplement 2D). In the naive states, mTORC1 primarily regulates keratinocyte proliferation, whereas mTORC2 mainly involved in the keratinocyte differentiation through Sema4A-related signaling pathways. Conversely, in the psoriatic dermatitis state, rapamycin downregulated both keratinocyte differentiation and proliferation markers. The observed similarities in Il17a expression following treatment with rapamycin and JR-AB2-011, regardless of additional IMQ treatment, suggest that Il17a production is not significantly dependent on Sema4A-related mTOR signaling.”

      Page 29 Line 461 in the resubmitted manuscript: In the “Inhibition of mTOR” section.

      “To analyze the preventive effectiveness of rapamycin in an IMQ-induced murine model of psoriatic dermatitis, Sema4AKO mice were administered either vehicle or rapamycin intraperitoneally from Day 0 to Day 17, and IMQ was topically applied to both ears for 4 days starting on Day 14. Then, on Day 18, ears were collected for further analysis.”

      Page 71: Figure 7-figure supplement 2 in the resubmitted manuscript:

      “Figure 7-figure supplement 2: Rapamycin treatment reduced the epidermal swelling observed in IMQ-treated Sema4AKO mice.

      (A) Experimental scheme. (B) The Epi thickness on Day 18. (n = 10 for Ctl, n = 12 for Rapamycin). (C) Relative expression of keratinocyte differentiation markers and Il17a in Sema4AKO Epi (n = 10 for Ctl, n = 12 for Rapamycin). (D) The number of T cells in the Epi (left) and Derm (right), under Ctl or rapamycin and IMQ treatments (n = 10 for Ctl, n = 12 for Rapamycin). Each dot represents the sum of numbers from 10 unit areas across 3 specimens. A-C: *p < 0.05, **p < 0.01. NS, not significant.”

      Reviewer #2 (Recommendations For The Authors):

      (1) To know whether the decrease of Sema4A in the epidermis of psoriasis patients is a result or a cause of psoriasis, it is necessary to show how the expression of Sema4A in epidermal cells is regulated. Shouldn't the degree of change in the expression of essential molecules (which is the cause of psoriasis) be more pronounced in L than in NL?

      We surveyed transcription factors of human Sema4A using GeneCards and found that NF-κB is the transcription factor most frequently associated with psoriasis. Wang et al. (Arthritis Res Ther. 2015) indicated NF-κB-dependent modulation of Sema4A expression in synovial fibroblasts of rheumatoid arthritis. However, since NF-κB expression is reportedly upregulated in psoriasis lesions, other transcription factors may function as key modulators of Sema4A expression in the epidermis.

      Although the molecules causing psoriasis remain to be elucidated, we investigated the correlation between the expression of psoriasis-related essential molecules in keratinocytes—such as S100A7A, S100A7, S100A8, S100A9, and S100A12—and SEMA4A expression in L and NL samples using qRT-PCR. We could not identify a correlation between these molecules and SEMA4A expression. We added a note to the limitations section to acknowledge that we were not able to reveal how Sema4A expression is regulated and that we could not determine the relationships between Sema4A expression and the essential molecules upregulated in psoriatic keratinocytes.

      Page 21 Line 328 in the resubmitted manuscript:

      “We were not able to reveal how Sema4A expression is regulated. Although we showed that downregulation of Sema4A is related to the abnormal cytokeratin expression observed in psoriasis, we could not determine the relationships between Sema4A expression and the essential molecules upregulated in psoriatic keratinocytes.”

      (2) Using bone marrow chimeric mice, it has already been reported that hematopoietic cells contain keratinocyte stem cells. Therefore, their interpretation is not supported by the results of their bone marrow chimeric mice experiment, and it is essential to generate keratinocyte-specific Sema4A knockout mice and perform similar experiments to support their interpretation.

      We value the reviewer’s insightful comment. We have assessed the expression of Sema4a in the epidermis of WT→KO chimeric mice using qRT-PCR. Our findings indicate that Sema4a expression levels in the epidermis of these mice are minimal (cycle threshold values of Sema4a ranged from 31.9 to not detected in WT→KO chimeric mice, whereas they ranged from 24.5 to 26.2 in WT→ WT mice). Consequently, we believe that the impact of keratinocyte stem cells derived from WT-hematopoietic cells is limited in this model. We appreciate this opportunity to clarify our results and will consider the generation of keratinocyte-specific Sema4A knockout mice for future experiments to further substantiate our interpretation.
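
      As a rough guide to what the reported cycle threshold (Ct) values imply, the gap between two Ct values can be converted into an approximate fold difference in template abundance. The sketch below assumes perfect amplification efficiency (a doubling per cycle) and equal input between samples, and it uses raw Ct values without reference-gene normalization, so it is a back-of-envelope estimate rather than the authors' quantification. The function name is hypothetical.

```python
def fold_difference(ct_high, ct_low, efficiency=2.0):
    """Approximate fold difference in template between two Ct values.

    Assumes the same input amount in both samples and a constant
    amplification factor per cycle (2.0 means perfect doubling).
    """
    return efficiency ** (ct_high - ct_low)


# Lowest detectable WT->KO Ct (31.9) against the WT->WT range (24.5 to 26.2).
conservative = fold_difference(31.9, 26.2)  # smallest gap between the groups
largest = fold_difference(31.9, 24.5)       # largest gap between the groups
```

      Under these assumptions, the smallest gap of 5.7 cycles corresponds to about a 52-fold lower Sema4a signal in WT→KO epidermis, and samples reported as not detected would lie below that.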

      Page 11 Line 159 in the resubmitted manuscript:

      “Since it has already been reported that bone marrow cells contain keratinocyte stem cells (Harris et al., 2004; Wu, Zhao, & Tredget, 2010), we confirmed that epidermis of mice deficient in non-hematopoietic Sema4A (WT→KO) showed no obvious detection of Sema4a, thereby ruling out the impact of donor-derived keratinocyte stem cells infiltrating the host epidermis (Figure 3-figure supplement 1A).”

      Page 60: In the Figure legend of Figure 3-figure supplement 1A in the resubmitted manuscript:

      “(A) Sema4a expression in the Epi of WT→ WT mice and WT→ KO mice (n = 8 for WT→ WT, n = 7 for WT→ KO).”

      (3) Since Sema4A KO mice already have immunological and epidermal cell characteristics similar to psoriasis, albeit weak, it is possible that the nonspecific stimulus of simply topical IMQ may have appeared to exacerbate psoriasis. It is advisable to confirm whether a more psoriasis-specific stimulus, IL-23 administration, would produce similar results.

      Thank you for your suggestion. Following your advice, we have analyzed IL-23-mediated psoriasis-like dermatitis. To induce the model, 20 μl of phosphate-buffered saline containing 500 ng of recombinant mouse IL-23 was injected intradermally into both ears for 4 consecutive days. Unlike with the application of IMQ, there was no significant difference in ear thickness. However, H&E staining revealed that the epidermal thickness was significantly greater in KO mice compared to WT mice. Although a longer period of IL-23 induction might result in more pronounced ear swelling, we conducted this experiment over the same duration as the IMQ application experiment to maintain consistency. When we analyzed the T cells infiltrating the ears using flow cytometry, the proportion of IL-17A producing Vγ2 and DNγδ T cells in CD3 fraction in the epidermis was significantly higher in Sema4A KO mice, consistent with the results from IMQ-induced psoriasis-like dermatitis.

      The lack of significant difference in ear thickness changes with IL-23 administration might be due to IL-23 administration not reflecting upstream events of IL-23 production.

      We consider that in psoriasis, the expression of Sema4A in keratinocytes is likely more important than in T cells. Therefore, it makes sense that the phenotype difference was more pronounced with IMQ, which likely has a greater effect on keratinocytes compared to IL-23.

      Page 9 Line 137 in the resubmitted manuscript:

      “Though the imiquimod model is well-established and valuable murine psoriatic model (van der Fits et al., 2009), the vehicle of imiquimod cream can activate skin inflammation that is independent of toll-like receptor 7, such as inflammasome activation, keratinocyte death and interleukin-1 production (Walter et al., 2013). This suggests that the imiquimod model involves complex pathway. Therefore, we subsequently induced IL-23-mediated psoriasis-like dermatitis (Figure2-figure supplement 2A), a much simpler murine psoriatic model, because IL-23 is thought to play a central role in psoriasis pathogenesis (Krueger et al., 2007; Lee et al., 2004). Although ear swelling on day 4 was comparable between WT mice and Sema4AKO mice (Figure2-figure supplement 2B), the epidermis, but not the dermis, was significantly thicker in Sema4AKO mice compared to WT mice (Figure2-figure supplement 2C). We found that the proportion of CD4 T cells among T cells was significantly higher in Sema4A KO mice compared to WT mice, while the proportion of Vγ2 and DNγδ T cells among T cells was comparable between them (Figure 2-figure supplement 2D). On the other hand, focusing on IL-17A-producing cells, the proportion of IL-17A-producing Vγ2 and DNγδ T cells in CD3 fraction in the epidermis was significantly higher in Sema4A KO mice, consistent with the results from imiquimod-induced psoriasis-like dermatitis. (Figure 2-figure supplement 2E).”

      Page 24 Line 363 in the resubmitted manuscript: In the “Mice” section.

      “To induce IL-23-mediated psoriasis-like dermatitis, 20 μl of phosphate-buffered saline containing 500 ng of recombinant mouse IL-23 (BioLegend, San Diego, CA) was injected intradermally into both ears of anesthetized mice using a 29-gauge needle for 4 consecutive days.”

      Page 58: In the Figure legend of Figure 2-figure supplement 2 in the resubmitted manuscript:

      “IL-23-mediated psoriasis-like dermatitis is augmented in Sema4AKO mice.

      (A) An experimental scheme involved intradermally injecting 20 μl of phosphate-buffered saline containing 500 ng of recombinant mouse IL-23 into both ears of WT mice and KO mice for 4 consecutive days. Samples for following analysis were collected on Day 4. (B and C) Ear thickness (B) and Epi and Derm thickness (C) of WT mice and KO mice on Day 4 (n = 12 per group). (D and E) The percentages of Vγ3, Vγ2, DNγδ, CD4, and CD8 T cells (D) and those with IL-17A production (E) in CD3 fraction in the Epi (top) and Derm (bottom) of WT and KO ears (n = 5 per group). Each dot represents the average of 4 ear specimens. B-E: *p < 0.05, **p < 0.01. NS, not significant.”

      (4) How is STAT3 expression in the epidermis crucial in the pathogenesis of psoriasis in Sema4AKO mice?

      We appreciate your insightful comment. In our study, given the established role of activated STAT3 in psoriasis, we investigated both total STAT3 and phosphorylated STAT3 (p-STAT3) levels in the naive epidermis of WT and Sema4AKO mice (See the figure below). Our findings indicate that STAT3 activation does not occur in the epidermis of Sema4AKO mice. Therefore, we speculated that the hyperkeratosis observed in Sema4AKO mice is due to aberrant mTOR signaling rather than STAT3 activation. STAT3 may be relevant to other pathways independent of Sema4A signaling, or it may function as a complex with other molecules in the Sema4A signaling.

      Author response image 4.

    1. eLife Assessment

      This paper presents the important discovery that lipid metabolic imbalance caused by Snail, an EMT-related transcription factor, contributes to the acquisition of chemoresistance in cancer cells. However, the incomplete support for the authors' claims is due to concerns about the causal relationship and lack of sufficient quantitative analysis. With strengthened evidence, this work would be of broad interest to researchers in the fields of cancer biology, lipid metabolism, and cell biology.

    2. Reviewer #1 (Public review):

      The authors focus on the molecular mechanisms by which EMT cells confer resistance to cancer cells. The authors use a wide range of methods to reveal that overexpression of Snail in EMT cells induces cholesterol/sphingomyelin imbalance via transcriptional repression of biosynthetic enzymes involved in sphingomyelin synthesis. The study also revealed that ABCA1 is important for cholesterol efflux and thus for counterbalancing the excess of intracellular free cholesterol in these snail-EMT cells. Inhibition of ACAT, an enzyme catalyzing cholesterol esterification, also seems essential to inhibit the growth of snail-expressing cancer cells.

      However, it seems important to analyze the localization of ABCA1, as it is possible that in the event of cholesterol/sphingomyelin imbalance, for example, the intracellular trafficking of the pump may be altered. The authors should also analyze ACAT levels and/or activity, which should be increased in snail-EMT cells. Overall, the provided data are important to better understand cancer biology.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors discovered that the chemoresistance in RCC cell lines correlates with the expression levels of the drug transporter ABCA1 and the EMT-related transcription factor Snail. They demonstrate that Snail induces ABCA1 expression and chemoresistance, and that ABCA1 inhibitors can counteract this resistance. The study also suggests that Snail disrupts the cholesterol-sphingomyelin (Chol/SM) balance by repressing the expression of enzymes involved in very long-chain fatty acid-sphingomyelin synthesis, leading to excess free cholesterol. This imbalance activates the cholesterol-LXR pathway, inducing ABCA1 expression. Moreover, inhibiting cholesterol esterification suppresses Snail-positive cancer cell growth, providing potential lipid-targeting strategies for invasive cancer therapy.

      Strengths:

      This research presents a novel mechanism by which the EMT-related transcription factor Snail confers drug resistance by altering the Chol/SM balance, introducing a previously unrecognized role of lipid metabolism in the chemoresistance of cancer cells. The focus on lipid balance, rather than individual lipid levels, is a particularly insightful approach. The potential for targeting cholesterol detoxification pathways in Snail-positive cancer cells is also a significant therapeutic implication.

      Weaknesses:

      The study's claim that Snail-induced ABCA1 is crucial for chemoresistance relies only on pharmacological inhibition of ABCA1, lacking additional validation. The causal relationship between the disrupted Chol/SM balance and ABCA1 expression or chemoresistance is not directly supported by data. Some data lack quantitative analysis.

    4. Author response:

      Response to Reviewer 1

      We will investigate the intracellular localization of ABCA1 in both EpH4 and EpH4-Snail cells. We will also examine the changes in ACAT expression levels within these cell lines.

      Response to Reviewer 2

      We will first investigate whether the chemoresistance exhibited by EpH4-Snail cells can be abolished not only through pharmacological inhibition of ABCA1 but also by knocking out the ABCA1 gene. Regarding causality, as demonstrated in Figure 2, we have already shown that reducing cholesterol levels in EpH4-Snail cells decreases ABCA1 expression. To further explore this relationship, we will assess whether increasing sphingomyelin levels by adding ceramide to the culture medium, thereby correcting the sphingomyelin-to-cholesterol ratio, would reduce ABCA1 expression. Furthermore, we will evaluate whether lowering cholesterol levels in EpH4-Snail cells via simvastatin treatment, along with normalization of the sphingomyelin-to-cholesterol ratio, attenuates their resistance to the anticancer drug nitidine chloride. Additionally, we will incorporate quantitative analyses for several experiments, as suggested in the reviewers’ comments, to enhance the robustness of our findings.

    1. eLife Assessment

      The authors use deep mutational scanning to assess the effect of all possible protein-coding variants in MC4R, a G protein-coupled receptor associated with obesity. They develop new, more precise approaches, enabling them to probe molecular phenotypes directly relevant to the development of drugs that target this receptor. In this important work, the authors provide convincing evidence that variants impact signaling through MC4R in different ways, that some defective variants are amenable to a corrector drug and that deep mutational scanning data could guide compound optimization.

    2. Reviewer #1 (Public review):

      Summary:

      Howard et al. performed deep mutational scanning on the MC4R gene, using a reporter assay to investigate two distinct downstream pathways across multiple experimental conditions. They validated their findings with ClinVar data and previous studies. Additionally, they provided insights into the application of DMS results for personalized drug therapy and differential ligand responses across variant types.

      Strengths:

      They captured over 99% of variants with robust signals and investigated subtle functionalities, such as pathway-specific activities and interactions with different ligands, by refining both the experimental design and analytical methods.

      Weaknesses:

      While the study generated informative results, it lacks a detailed explanation regarding the input library, replicate correlation, and sequencing depth for a given number of cells. Additionally, there are several questions that it would be helpful for authors to clarify.

      (1) It would be helpful to clarify the information regarding the quality of the input library and experimental replicates. Are variants evenly represented in the library? Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct? Finally, could the authors provide details on the correlation between experimental replicates under each condition?

      (2) Since the functional readout of variants is conducted through RNA sequencing, it seems crucial to sequence a sufficient number of cells with adequate sequencing saturation. Could the authors clarify the coverage depth used for each RNA-seq experiment and how this depth was determined? Additionally, how many cells were sequenced in each experiment?

      (3) It appears that the frequencies of individual RNA-seq barcode variants were used as a proxy for MC4R activity. Would it be important to also normalize for heterogeneity in RNA-seq coverage across different cells in the experiment? Variability in cell representation (i.e., the distribution of variants across cells) could lead to misinterpretation of variant effects. For example, suppose barcode_a1 represents variant A and barcode_b1 represents variant B. If the RNA-seq results show 6 reads for barcode_a1 and 7 reads for barcode_b1, it might initially appear that both variants have similar effect sizes. However, if these reads correspond to 6 separate cells each containing 1 copy of barcode_a1, and only 1 cell containing 7 copies of barcode_b1, the interpretation changes significantly. Additionally, if certain variants occupy a larger proportion of the cell population, they are more likely to be overrepresented in RNA sequencing.

      (4) Although the assay system appears to effectively represent MC4R functionality at the molecular level, we are curious about the potential disparity between the DMS score system and physiological relevance. How do variants reported in gnomAD distribute within the DMS scoring system?

      (5) To measure Gq signaling, the authors used the GAL4-VPR relay system. Is there additional experimental data to support that this relay system accurately represents Gq signaling?

      (6) Identifying the variants responsive to the corrector was impressive. However, we are curious about how the authors confirmed that the restoration of MC4R activity was due to the correction of the MC4R protein itself. Is there a possibility that the observed effect could be influenced by other factors affected by the corrector? When the corrector was applied to the cells, were any expected or unexpected differential gene expression changes observed?

      (7) As mentioned in the introduction, gain-of-function (GoF) variants are known to be protective against obesity. It would be interesting to see further studies on the observed GoF variants. Do the authors have any plans for additional research on these variants?

    3. Reviewer #2 (Public review):

      Overview

      In this manuscript, the authors use deep mutational scanning to assess the effect of ~6,600 protein-coding variants in MC4R, a G protein-coupled receptor associated with obesity. Reasoning that current deep mutational scanning approaches are insufficiently precise for some drug development applications, they focus on articulating new, more precise approaches. These approaches, which include a new statistical model and innovative reporter assay, enable them to probe molecular phenotypes directly relevant to the development of drugs that target this receptor with high precision and statistical rigor.

      They use the resulting data for a variety of purposes, including probing the relationship between MC4R's sequence and structure, analyzing the effect of clinically important variants, identifying variants that disrupt downstream MC4R signaling via one but not both pathways, identifying loss-of-function variants that are amenable to a corrector drug, and exploring how deep mutational scanning data could guide small molecule drug optimization.

      Strengths

      The analysis and statistical framework developed by the authors represent a significant advance. In particular, the study makes use of barcode-level internally replicated measurements to more accurately estimate measurement noise.

      The framework allows variant effects to be compared across experimental conditions, a task that is currently hard to do with rigor. Thus, this framework will be applicable to a large number of existing and future deep mutational scanning experiments.

      The authors refine their existing barcode transcription-based assay for GPCR signaling, and develop a clever new "relay" reporter system to boost signaling in a particular pathway. They show that these reporters can be used to measure both gain-of-function and loss-of-function effects, which many deep mutational scanning approaches cannot do.

      The use of systematic approaches to integrate and then interrogate high-dimensional deep mutational scanning data is a big strength. For example, the authors applied PCA to the variant effect results from reporters for two different MC4R signaling pathways and were able to discover variants that biased signaling through one or the other pathway. This approach paves the way for analyses of higher dimensional deep mutational scans.

      The authors use the deep mutational scanning data they collect to map how different variants affect the way small molecule agonists activate MC4R signaling. This is an exciting idea, because developing small-molecule protein-targeting therapeutics is difficult, and this manuscript suggests a new way to map small-molecule-protein interactions.

      Weaknesses

      The authors derive insights into the relationship between MC4R signaling through different pathways and its structure. While these make sense based on what is already known, the manuscript would be stronger if some of these insights were validated using methods other than deep mutational scanning.

      Likewise, the authors use their data to identify positions where variants disrupt MC4R activation by one small molecule agonist but not another. They hypothesize these effects point to positions that are more or less important for the binding of different small molecule agonists. The manuscript would be stronger if some of these insights were explored further.

      Impact

      In this manuscript, the authors present new methods, including a statistical framework for analyzing deep mutational scanning data that will have a broad impact. They also generate MC4R variant effect data that is of interest to the GPCR community.

    4. Author response:

      We thank the reviewers for their support of this work and insightful recommendations for how to improve it. We have provided specific responses to each reviewer comment below. To summarize how we intend to address the requested revisions:

      Many of the reviewers’ comments requested additional technical or quality details about the DMS libraries or assays (e.g., number of cells tested, number of sequencing reads, assay replication, assay sensitivity, library balance), and we provide additional information and analyses that we can incorporate into the relevant portions of the text, supplementary tables, and supplementary figures to address these questions.

      Some comments asked to clarify nomenclature/wording or provide additional labels to images, and we will make these changes as requested.

      A few questions would require additional experimental data to address. Where experiments have already been performed, we will incorporate those results or cite relevant work previously reported in the literature.

      Reviewer 1:

      Summary

      Howard et al. performed deep mutational scanning on the MC4R gene, using a reporter assay to investigate two distinct downstream pathways across multiple experimental conditions. They validated their findings with ClinVar data and previous studies. Additionally, they provided insights into the application of DMS results for personalized drug therapy and differential ligand responses across variant types.

      Strengths

      They captured over 99% of variants with robust signals and investigated subtle functionalities, such as pathway-specific activities and interactions with different ligands, by refining both the experimental design and analytical methods.

      Weaknesses

      While the study generated informative results, it lacks a detailed explanation regarding the input library, replicate correlation, and sequencing depth for a given number of cells.

      Additionally, there are several questions that it would be helpful for authors to clarify.

      (1) It would be helpful to clarify the information regarding the quality of the input library and experimental replicates. Are variants evenly represented in the library? Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct? Finally, could the authors provide details on the correlation between experimental replicates under each condition?

      Are variants evenly represented in the library?

      We strive to achieve as evenly balanced a library as possible at every stage of the DMS process (e.g., from initial cloning in E. coli through integration into human cells). Below is a representative plot showing the number of barcodes per amino acid variant at each position in a given ~60 amino acid subregion of MC4R, which highlights how evenly variants are represented at the E. coli cloning stage.

      Author response image 1.

      We also make similar measurements after the library is integrated into HEK293T cell lines, and see similarly even coverage across all variants, as shown in the plot below.

      Author response image 2.

      Additionally, have the authors considered using long-read sequencing to confirm the presence of a single intended variant per construct?

      We agree long-read sequencing would be an excellent way to confirm that our constructs contain a single intended variant. However, we elected for an alternate method (outlined in more detail in Jones et al. 2020) that leverages multiple layers of validation. First, the oligo chip-synthesized portions of the protein containing the variants are cloned into a sequence-verified plasmid backbone, which greatly decreases the chances of spuriously generating a mutation in a different portion of the protein. We then sequence both the oligo portion and random barcode using overlapping paired-end reads during barcode mapping to avoid sequencing errors and to help detect DNA synthesis errors. At this stage, we computationally reject any constructs that have more than one variant. Given this, the vast majority of remaining unintended variants would come from mutations introduced by the E. coli cloning or replication process, which should be low frequency. We have used our in-house full plasmid sequencing method, OCTOPUS, to sample and spot check this for several other DMS libraries we have generated using the same cloning methods. We have found variants in the plasmid backbone in only ~1% of plasmids in these libraries. Our statistical model also helps correct for this by accounting for barcode-specific variation. Finally, we believe this provides further motivation for having multiple barcodes per variant, which dilutes the effect of any unintended additional variants.

      Finally, could the authors provide details on the correlation between experimental replicates under each condition?

      Certainly! In general, the Gs reporter had higher correlation between replicates than the Gq system (r ~ 0.5 vs r ~ 0.4). The plots below show two representative correlations of barcode read counts at the RNA-seq stage between replicates of the low α-MSH conditions. One important advantage of our statistical model is that it is able to leverage information from barcodes regardless of the number of replicates they appear in.

      Author response image 3.
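
      For readers who want to reproduce this kind of replicate check, the following is a minimal sketch of a barcode-level Pearson correlation between two replicates. It is an illustration only, not the authors' pipeline: the barcode names and counts are hypothetical, restricting to barcodes seen in both replicates is an assumption, and so is the log2(count + 1) transform.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(rep_a, rep_b):
    """Correlate log2(count + 1) over barcodes present in both replicates.

    rep_a, rep_b: dicts mapping barcode -> read count.
    Returns (r, number of shared barcodes).
    """
    shared = sorted(set(rep_a) & set(rep_b))
    xs = [math.log2(rep_a[b] + 1) for b in shared]
    ys = [math.log2(rep_b[b] + 1) for b in shared]
    return pearson_r(xs, ys), len(shared)


# Hypothetical read counts; bc4 and bc5 each appear in only one replicate.
rep1 = {"bc1": 8, "bc2": 12, "bc3": 3, "bc4": 20}
rep2 = {"bc1": 10, "bc2": 9, "bc3": 4, "bc5": 7}
r, n_shared = replicate_correlation(rep1, rep2)
print(f"r = {r:.2f} over {n_shared} shared barcodes")
```

      Dropping barcodes that are missing from one replicate is the simplest choice for a pairwise plot; the response above notes that the authors' statistical model can instead use barcodes regardless of how many replicates they appear in.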

      Since the functional readout of variants is conducted through RNA sequencing, it seems crucial to sequence a sufficient number of cells with adequate sequencing saturation. Could the authors clarify the coverage depth used for each RNA-seq experiment and how this depth was determined? Additionally, how many cells were sequenced in each experiment?

      This will be addressed by incorporating the following details into the manuscript:

      We seeded 17 million cells per replicate at the start of each assay and, with ~1.5-fold growth over the course of the assay, harvested ~25.5 million cells per replicate for RNA extraction and sequencing. We found this sufficient to get at least ~30-60x cellular coverage per amino acid variant.

      Total mapped reads per replicate at RNA-seq stage

      - Gs/CRE: 9.1-18.2 million mapped reads, median=12.3

      - Gq/UAS: 8.6-24.1 million mapped reads, median=14.5

      - Gs/CRE+Chaperone: 6.4-9.5 million mapped reads, median=7.5

      Reads per barcode distribution

      - Median read counts of 8, 10, and 6 reads per sample per barcode for Gs/CRE, Gq/UAS, and Gs/CRE+Chaperone assays, respectively.

      Barcodes per variant distribution

      - As reported, the median number of barcodes per variant across samples (the “median of medians”) is 56 for Gs/CRE and 28 for Gq/UAS

      - Additionally, it is 44 for Gs/CRE+Chaperone
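
      The "median of medians" statistic quoted above can be sketched as follows: count the barcodes observed for each variant within a sample, take the median of those counts, then take the median across samples. This is a minimal illustration under stated assumptions. The barcode-to-variant map and the samples are hypothetical, and variants with no observed barcode in a sample are simply left out of that sample's median, which may differ from the authors' handling.

```python
from collections import Counter
from statistics import median


def barcodes_per_variant(observed_barcodes, barcode_to_variant):
    """Count distinct observed barcodes for each variant in one sample."""
    return Counter(
        barcode_to_variant[b]
        for b in set(observed_barcodes)
        if b in barcode_to_variant
    )


def median_of_medians(samples, barcode_to_variant):
    """Median across samples of the per-sample median barcodes per variant."""
    per_sample = []
    for observed in samples:
        counts = barcodes_per_variant(observed, barcode_to_variant)
        if counts:  # skip samples with no mapped barcodes
            per_sample.append(median(counts.values()))
    return median(per_sample)


# Hypothetical barcode map: variant A has 3 barcodes, B has 1, C has 2.
bc_map = {"bc1": "A", "bc2": "A", "bc3": "A", "bc4": "B", "bc5": "C", "bc6": "C"}
samples = [
    ["bc1", "bc2", "bc3", "bc4", "bc5", "bc6"],  # A:3, B:1, C:2 -> median 2
    ["bc1", "bc2", "bc4", "bc5"],                # A:2, B:1, C:1 -> median 1
    ["bc1", "bc2", "bc3", "bc5", "bc6"],         # A:3, C:2      -> median 2.5
]
result = median_of_medians(samples, bc_map)
print(result)
```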

      It appears that the frequencies of individual RNA-seq barcode variants were used as a proxy for MC4R activity. Would it be important to also normalize for heterogeneity in RNA-seq coverage across different cells in the experiment? Variability in cell representation (i.e., the distribution of variants across cells) could lead to misinterpretation of variant effects. For example, suppose barcode_a1 represents variant A and barcode_b1 represents variant B. If the RNA-seq results show 6 reads for barcode_a1 and 7 reads for barcode_b1, it might initially appear that both variants have similar effect sizes. However, if these reads correspond to 6 separate cells each containing 1 copy of barcode_a1, and only 1 cell containing 7 copies of barcode_b1, the interpretation changes significantly. Additionally, if certain variants occupy a larger proportion of the cell population, they are more likely to be overrepresented in RNA sequencing.

      We account for this heterogeneity in several ways. First, as shown above (Response to Reviewer 1, Question 1), we aim to have even representation of variants within our libraries. Second, we utilize compositional control conditions like forskolin or unstimulated conditions to obtain treatment-independent measurements of barcode abundance and, consequently, of mutant-vs-WT effects that are due to compositional rather than biological variability. We expect that variability observed under these controls is due to subtle effects of molecular cloning, gene expression, and stochasticity. Using these controls, we observe that mutant-vs-WT effects are generally close to zero in these normalization conditions (e.g., in untreated Gq, see Supplementary Figure 3) as compared to drug-treated conditions. For example, premature stops behave similarly to WT in normalization conditions. This indicates that mutant abundance is relatively homogeneous. Where there are barcode-dependent effects on abundance, we can use information from these conditions to normalize that effect. Finally, our mixed-effect model accounts for barcode-specific deviations from the expected mutant effect (e.g. a “high count” barcode consistently being high relative to the mean).
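
      The idea of using a control condition to separate compositional effects from biological ones can be sketched with a simple log-ratio calculation. This is a deliberately simplified stand-in, not the authors' mixed-effect model: the barcode names, counts, pseudocount, and the averaging of barcodes within a variant are all assumptions made for illustration.

```python
import math


def frequencies(counts, pseudocount=0.5):
    """Convert raw read counts to frequencies, with a small pseudocount."""
    total = sum(counts.values()) + pseudocount * len(counts)
    return {b: (c + pseudocount) / total for b, c in counts.items()}


def log2_enrichment(treated, control, pseudocount=0.5):
    """Per-barcode log2(treated frequency / control frequency)."""
    shared = set(treated) & set(control)
    f_treated = frequencies({b: treated[b] for b in shared}, pseudocount)
    f_control = frequencies({b: control[b] for b in shared}, pseudocount)
    return {b: math.log2(f_treated[b] / f_control[b]) for b in shared}


def variant_effect_vs_wt(enrichment, barcode_to_variant, wt_label="WT"):
    """Mean enrichment per variant, expressed relative to wild type."""
    by_variant = {}
    for barcode, score in enrichment.items():
        by_variant.setdefault(barcode_to_variant[barcode], []).append(score)
    means = {v: sum(s) / len(s) for v, s in by_variant.items()}
    wt_mean = means[wt_label]
    return {v: m - wt_mean for v, m in means.items() if v != wt_label}


# Hypothetical counts. Barcode b_1 is 7x more abundant than the others in the
# control condition (a compositional effect), so its high treated count does
# not by itself indicate higher activity.
bc_map = {"wt_1": "WT", "a_1": "A", "b_1": "B"}
control = {"wt_1": 100, "a_1": 100, "b_1": 700}
treated = {"wt_1": 200, "a_1": 50, "b_1": 1400}

effects = variant_effect_vs_wt(log2_enrichment(treated, control), bc_map)
for variant, effect in sorted(effects.items()):
    print(f"{variant}: {effect:+.2f} log2 units vs WT")
```

      In this toy example variant B has the largest raw treated count but an effect near zero once its abundance in the control is accounted for, whereas variant A shows a clear loss of signal relative to WT.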

      Although the assay system appears to effectively represent MC4R functionality at the molecular level, we are curious about the potential disparity between the DMS score system and physiological relevance. How do variants reported in gnomAD distribute within the DMS scoring system?

      Figure 2D shows DMS scores (variant effect on Gs signaling) relative to human population frequency for all MC4R variants reported in gnomAD as of January 8, 2024.

      To measure Gq signaling, the authors used the GAL4-VPR relay system. Is there additional experimental data to support that this relay system accurately represents Gq signaling?

      The full Gq reporter uses an NFAT response element from the IL-2 promoter to regulate the expression of the GAL4-VPR relay. In this system, the activation of Gq signaling results in the activation of the NFAT response element, and this signal is then amplified by the GAL4-VPR relay. The NFAT response element has been previously well-validated to respond to the activation of Gq signaling (e.g., PMID: 8631834). We will add this reference to the text to further support the use of the Gq assay.

      Identifying the variants responsive to the corrector was impressive. However, we are curious about how the authors confirmed that the restoration of MC4R activity was due to the correction of the MC4R protein itself. Is there a possibility that the observed effect could be influenced by other factors affected by the corrector? When the corrector was applied to the cells, were any expected or unexpected differential gene expression changes observed?

      While we do not directly measure whether Ipsen-17 has effects on other signaling processes, previous work has shown that Ipsen-17 treatment does not indirectly alter signaling kinetics such as receptor internalization (Wang et al., 2014). Furthermore, our analysis methods inherently account for this by normalizing variant effects to WT signaling levels. Any observed rescue of a given variant inherently means that the variant is specifically more responsive to Ipsen-17 than WT, and the fact that different variants exhibit different levels of rescue is reassuring that the mechanism is on target to MC4R. Lastly, Ipsen-17 is known to be an antagonist of alpha-MSH activity and is thought to bind directly to the same site on MC4R (Wang et al., 2014).

      As mentioned in the introduction, gain-of-function (GoF) variants are known to be protective against obesity. It would be interesting to see further studies on the observed GoF variants. Do the authors have any plans for additional research on these variants?

      We agree this would be an excellent line of inquiry, but due to changes in company priorities we unfortunately do not have any plans for additional research on these variants.

      Reviewer 2:

      Overview

      In this manuscript, the authors use deep mutational scanning to assess the effect of ~6,600 protein-coding variants in MC4R, a G protein-coupled receptor associated with obesity. Reasoning that current deep mutational scanning approaches are insufficiently precise for some drug development applications, they focus on articulating new, more precise approaches. These approaches, which include a new statistical model and innovative reporter assay, enable them to probe molecular phenotypes directly relevant to the development of drugs that target this receptor with high precision and statistical rigor.

      They use the resulting data for a variety of purposes, including probing the relationship between MC4R's sequence and structure, analyzing the effect of clinically important variants, identifying variants that disrupt downstream MC4R signaling via one but not both pathways, identifying loss-of-function variants that are amenable to a corrector drug, and exploring how deep mutational scanning data could guide small molecule drug optimization.

      Strengths

      The analysis and statistical framework developed by the authors represent a significant advance. In particular, the study makes use of barcode-level internally replicated measurements to more accurately estimate measurement noise.

      The framework allows variant effects to be compared across experimental conditions, a task that is currently hard to do with rigor. Thus, this framework will be applicable to a large number of existing and future deep mutational scanning experiments.

      The authors refine their existing barcode transcription-based assay for GPCR signaling, and develop a clever new "relay" reporter system to boost signaling in a particular pathway. They show that these reporters can be used to measure both gain-of-function and loss-of-function effects, which many deep mutational scanning approaches cannot do.

      The use of systematic approaches to integrate and then interrogate high-dimensional deep mutational scanning data is a big strength. For example, the authors applied PCA to the variant effect results from reporters for two different MC4R signaling pathways and were able to discover variants that biased signaling through one or the other pathway. This approach paves the way for analyses of higher dimensional deep mutational scans.

      The authors use the deep mutational scanning data they collect to map how different variants affect the way small molecule agonists activate MC4R signaling. This is an exciting idea, because developing small-molecule protein-targeting therapeutics is difficult, and this manuscript suggests a new way to map small-molecule-protein interactions.

      Weaknesses

      The authors derive insights into the relationship between MC4R signaling through different pathways and its structure. While these make sense based on what is already known, the manuscript would be stronger if some of these insights were validated using methods other than deep mutational scanning.

      Likewise, the authors use their data to identify positions where variants disrupt MC4R activation by one small molecule agonist but not another. They hypothesize these effects point to positions that are more or less important for the binding of different small molecule agonists. The manuscript would be stronger if some of these insights were explored further.

      Impact

      In this manuscript, the authors present new methods, including a statistical framework for analyzing deep mutational scanning data that will have a broad impact. They also generate MC4R variant effect data that is of interest to the GPCR community.

    1. eLife Assessment

      This important study demonstrates the therapeutic potential of recombinant human PDGF-AB/BB proteins in reducing the senescent signatures of primary human intervertebral disc cells. The study represents a significant advance in the treatment of intervertebral disc degeneration and can be applied broadly to other degenerative musculoskeletal tissues through the suppression of senescence. Solid evidence, primarily based on transcriptomic analysis and direct protein measurements from relatively homogeneous cell populations, supports the therapeutic potential of PDGF, but more experimental details would be needed to make the evidence stronger.

    2. Reviewer #1 (Public review):

      The authors, Zhang et al., demonstrate the beneficial effects of treating degenerate human primary intervertebral disc (IVD) cells with recombinant human PDGF-AB/BB on the senescence transcriptomic signatures. Utilizing a combination of degenerate cells from elderly humans and experimentally induced senescence in young, healthy IVD cells, the authors show the therapeutic effects on mRNA transcription as well as cellular processes through informatics approaches.

      One notable strength of this study is the use of human primary cells and recombinant forms of human PDGF-AB/BB proteins, which increases the translational potential of these in vitro studies. The manuscript is well-written, and the informatics analyses are thorough and clearly presented.

      However, in its current form, the study does not provide sufficient experimental details, and clarifications are needed. These are as follows:

      (1) The source of PDGF-AB/BB proteins is not detailed.

      (2) The irradiation parameters are not adequately reported - the authors should consider (PMCID: PMC5495460) for the parameters that should be reported.

      (3) The criteria for young and old patient donors are not explicitly described - though from the table, one presumes the cut-off for young is 27 years old.

      (4) What is the rationale for using different concentrations of PDGF-AB/BB in the degenerate cell and irradiation experiments?

      There are also a number of other issues the authors could consider. First, in the title and throughout the manuscript, the effects of PDGF-AB/BB are described as protective, yet in all the experiments, PDGF-AB/BB appears to be administered following either in vivo degeneration or in vitro irradiation, where protective effects (e.g., administration prior to insult) were not tested. Therefore, the effects of PDGF-AB/BB may be more accurately described as mitigating or therapeutic rather than protective.

      The authors state that the focus on NP (nucleus pulposus) cell studies is due to NP being the first site impacted during degeneration. However, this reviewer believes that this is because changes in the NP are more clinically evident (by imaging methods), despite degeneration often initiating from the AF (annulus fibrosus), e.g., through tears/microtears.

      A prior study has examined the effects of X-ray irradiation on NF-kB signaling in young and aged IVDs (PMCID: PMC5495460), and the authors may wish to consider this work.

    3. Reviewer #2 (Public review):

      Summary:

      This work highlights a novel role for platelet-derived growth factor (PDGF) in mitigating cellular senescence associated with age-related and painful intervertebral disc degeneration. Prior literature has demonstrated the importance of the accumulation of senescent cells in mediating many of the pathological effects associated with the degenerate disc joint such as inflammation and tissue breakdown. In this study, the authors treat clinically relevant human nucleus pulposus and annulus fibrosus cells from patients undergoing discectomy with recombinant PDGF-AB/BB for 5 days and then deep phenotyped the outcomes using bulk RNA sequencing. In addition, they irradiated healthy human disc cells which they subsequently treated with PDGF-AB/BB examining the expression of SASP-related markers and also PDGFRA receptor gene expression. Overall PDGF was able to down-regulate many senescent-associated pathways and the degenerate phenotype in IVD cells. Altered pathways were associated with neurogenesis, mechanical stimuli, metabolism, cell cycle, reactive oxygen species, and mitochondrial dysfunction. Overall the authors achieved their aims and the results by and large support their conclusions although improvements could be made to enhance the rigor of the study and findings.

      Strengths:

      A major strength of this study is the use of human cells from patients undergoing discectomy for disc herniation as well as access to healthy human cells. Investigating the role of PDGF regarding cellular senescence in the degenerate disc joint is a novel and underexplored area of research which is a significant contribution to the field of spine. This study highlights a potential target for addressing cellular senescence where most of the prior focus has been on senolytic drugs. Such studies have broad implications for other age-related diseases where senescence plays a major role. The use of transcriptomics and therefore an unbiased approach to investigating the role of PDGF is also considered a strength as are the follow-up studies involving irradiating healthy human disc cells and treating these cells with PDGF. The combined assessment of both nucleus pulposus and annulus fibrosus cells in the context of these studies adds to the impact.

      Weaknesses:

      A weakness of these studies lies in the lack of experimental details provided in the methodology, including the rationale for such methods/conditions. Many details such as the specific culture models utilized, substrates, cell density, and media components are missing, which impacts rigor. Such details would strengthen the manuscript and the ability to replicate and build on such work/findings. An additional weakness relates to the qualitative data presented, such as the β-galactosidase assay and immunofluorescence of senescence-associated proteins such as P21 and P16. Quantification of such data sets would greatly strengthen the studies and lend further support to the hypotheses. The study in its current form could be strengthened by the inclusion of mechanistic studies probing the downstream PDGF receptor-associated pathways, for example by specifically targeting or modulating the activity of the PDGF receptor PDGFRA, including validation of the gene data for PDGFRA with protein-level data to determine if the transcripts are being translated to protein. The claim that in annulus fibrosus cells PDGF does not mediate its effects via PDGFRA does not appear to be supported by the current data, as only gene expression for the receptor was assessed, with no functional or mechanistic studies being performed. Further discussion, interpretation, and direct comparison of the nucleus pulposus and annulus fibrosus data sets would be helpful for the readers. The magnitude of changes related to the effects of PDGF-BB on the S-phase in irradiated NP and AF cells between control and treated groups seems small, making interpretation of these findings challenging.

    1. eLife Assessment

      In this manuscript, Yao et al. present solid evidence to show that MmMYL3 may serve as a receptor for NNV via macropinocytosis. This manuscript is valuable for understanding the molecular mechanisms of NNV cell entry. However, the manuscript would benefit from a discussion of the broader implications of these findings for cell entry of other viruses.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors discovered MYL3 of marine medaka (Oryzias melastigma) as a novel NNV entry receptor, elucidating its facilitation of RGNNV entry into host cells through macropinocytosis, mediated by the IGF1R-Rac1/Cdc42 pathway.

      Strengths:

      In this manuscript, the authors have performed in vitro and in vivo experiments to show that MmMYL3 may serve as a receptor for NNV via the macropinocytosis pathway. These experiments use a range of methods, including Co-IP, RNAi, pulldown, SPR, flow cytometry, and immunofluorescence assays. In general, the results are clearly presented in the manuscript.

      Weaknesses:

      In the introduction and discussion sections, the authors (Yao et al.) focus mainly on viral pathogens and fish in aquaculture, so the significance and novelty of the results as presented are limited and not broad in biology. The authors should articulate the likely impact of their work on the viral infection field, and perhaps also on the evolutionary field using the fish model.

      (1) Myosin is a big family, why did authors choose MYL3 as a candidate receptor for NNV?

      (2) What is the relationship between MmMYL3 and MmHSP90ab1 and other known NNV receptors? Why does NNV have so many receptors? Which one is supposed to serve as the key entry receptor?

      (3) In vivo knockout of MYL3 using CRISPR-Cas9 should be conducted to verify whether the absence of MYL3 really inhibits NNV infection. Although it might be difficult to do it in marine medaka as stated by the authors, the introduction of zebrafish is highly recommended, since it has already been reported that zebrafish could serve as a vertebrate model to study NNV (doi: 10.3389/fimmu.2022.863096).

      (4) The results shown in Figure 6 are not enough to support the conclusion that "RGNNV triggers macropinocytosis mediated by MmMYL3". Additional electron microscopy of macropinosomes (sizes, morphological characteristics, etc.) will be more direct evidence.

      (5) MYL3 is "predominantly found in muscle tissues, particularly the heart and skeletal muscles". However, NNV is a virus that mainly causes necrosis of nervous tissues (brain and retina). If MYL3 really acts as a receptor for NNV, how does it balance this difference so that nervous tissues, rather than muscle tissues, have the highest viral titers?

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript offers an important contribution to the field of virology, especially concerning NNV entry mechanisms. The major strength of the study lies in the identification of MmMYL3 as a functional receptor for RGNNV and its role in macropinocytosis, mediated by the IGF1R-Rac1/Cdc42 signaling axis. This represents a significant advance in understanding NNV entry mechanisms beyond previously known receptors such as HSP90ab1 and HSC70. The data, supported by comprehensive in vitro and in vivo experiments, strongly justify the authors' claims about MYL3's role in NNV infection in marine medaka.

      Strengths:

      (1) The identification of MmMYL3 as a functional receptor for RGNNV is a significant contribution to the field. The study fills a crucial gap in understanding the molecular mechanisms governing NNV entry into host cells.

      (2) The work highlights the involvement of IGF1R in macropinocytosis-mediated NNV entry and downstream Rac1/Cdc42 activation, thus providing a thorough mechanistic understanding of the NNV internalization process. This could pave the way for further exploration of antiviral targets.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript presents a detailed study on the role of MmMYL3 in the viral entry of NNV, focusing on its function as a receptor that mediates viral internalization through the macropinocytosis pathway. The use of both in vitro assays (e.g., Co-IP, SPR, and GST pull-down) and in vivo experiments (such as infection assays in marine medaka) adds robustness to the evidence for MmMYL3 as a novel receptor for RGNNV. The findings have important implications for understanding NNV infection mechanisms, which could pave the way for new antiviral strategies in aquaculture.

      Strengths:

      The authors show that MmMYL3 directly binds the viral capsid protein, facilitates NNV entry via the IGF1R-Rac1/Cdc42 pathway, and can render otherwise resistant cells susceptible to infection. This multifaceted approach effectively demonstrates the central role of MmMYL3 in NNV entry.

    1. eLife Assessment

      This important work develops a new protocol to experimentally perturb target genes across a quantitative range of expression levels in cell lines. The evidence supporting their new perturbation approach is compelling, and the computational analyses to better understand dosage response relationships between genes are convincing. The study will be of broad interest to scientists in the fields of functional genomics and biotechnology. However, the evidence supporting the conclusions can be further improved.

    2. Reviewer #1 (Public review):

      In this manuscript, Domingo et al. present a novel perturbation-based approach to experimentally modulate the dosage of genes in cell lines. Their approach is capable of gradually increasing and decreasing gene expression. The authors then use their approach to perturb three key transcription factors and measure the downstream effects on gene expression. Their analysis of the dosage response curve of downstream genes reveals marked non-linearity.

      One of the strengths of this study is that many of the perturbations fall within the physiological range for each cis gene. This range is presumably between a single-copy state of heterozygous loss-of-function (log fold change of -1) and a three-copy state (log fold change of ~0.6). This is in contrast with CRISPRi or CRISPRa studies that attempt to maximize the effect of the perturbation, which may result in downstream effects that are not representative of physiological responses.
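
      The fold-change bounds quoted above follow from simple arithmetic if expression is assumed to scale with copy number relative to a two-copy baseline. The sketch below only illustrates that assumption; the function name is hypothetical and nothing here is taken from the authors' analysis.

```python
import math


def log2_fold_change(copies: int, baseline_copies: int = 2) -> float:
    """Return log2(copies / baseline_copies).

    Assumes expression is proportional to gene copy number, which is an
    idealisation used here only to reproduce the bounds quoted in the review.
    """
    if copies <= 0 or baseline_copies <= 0:
        raise ValueError("copy numbers must be positive")
    return math.log2(copies / baseline_copies)


# Heterozygous loss of function: one copy instead of two.
one_copy = log2_fold_change(1)
# Single-copy gain: three copies instead of two.
three_copy = log2_fold_change(3)

print(f"one copy:   {one_copy:.3f}")
print(f"three copy: {three_copy:.3f}")
```

      Under this assumption the one-copy state gives exactly -1 and the three-copy state gives roughly 0.585, consistent with the "~0.6" quoted by the reviewer.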

      Another strength of the study is that various points along the dosage-response curve were assayed for each perturbed gene. This allowed the authors to effectively characterize the degree of linearity and monotonicity of each dosage-response relationship. Ultimately, the study revealed that many of these relationships are non-linear, and that the response to activation can be dramatically different than the response to inhibition.

      To test their ability to gradually modulate dosage, the authors chose to measure three transcription factors and around 80 known downstream targets. As the authors themselves point out in their discussion about MYB, this biased sample of genes makes it unclear how this approach would generalize genome-wide. In addition, the data generated from this small sample of genes may not represent genome-wide patterns of dosage response. Nevertheless, this unique data set and approach represents a first step in understanding dosage-response relationships between genes.

      Another point of general concern in such screens is the use of the immortalized K562 cell line. It is unclear how the biology of these cell lines translates to the in vivo biology of primary cells. However, the authors do follow up with cell-type-specific analyses (Figures 4B, 4C, and 5A) to draw a correspondence between their perturbation results and the relevant biology in primary cells and complex diseases.

      The conclusions of the study are generally well supported with statistical analysis throughout the manuscript. As an example, the authors utilize well-known model selection methods to identify when there was evidence for non-linear dosage response relationships.
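
      The review does not name the model selection method, so the following is only a minimal sketch of one common option: comparing a linear and a quadratic least-squares fit by the Akaike information criterion (AIC). The dose and response values are invented for illustration, the quadratic stands in for whatever non-linear model the authors used, and all function names are hypothetical.

```python
import math


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ...] for c0 + c1*x + c2*x**2 + ...
    """
    n = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def aic(x, y, coeffs):
    """AIC for a Gaussian least-squares fit, up to an additive constant."""
    rss = sum(
        (yi - sum(c * xi ** k for k, c in enumerate(coeffs))) ** 2
        for xi, yi in zip(x, y)
    )
    if rss <= 0:
        raise ValueError("perfect fit; AIC is undefined for zero residuals")
    n_obs = len(x)
    n_params = len(coeffs) + 1  # polynomial coefficients plus noise variance
    return n_obs * math.log(rss / n_obs) + 2 * n_params


# Invented example: log2 fold change of a TF (dose) and of one target gene.
dose = [-1.0, -0.5, 0.0, 0.5, 1.0]
response = [-1.0, -0.3, 0.0, 0.1, 0.12]  # saturates on the activation side

aic_linear = aic(dose, response, fit_polynomial(dose, response, 1))
aic_quadratic = aic(dose, response, fit_polynomial(dose, response, 2))
print("linear AIC:", round(aic_linear, 2), "quadratic AIC:", round(aic_quadratic, 2))
```

      A lower AIC for the quadratic model would count as evidence for a non-linear dosage response in this toy setting. With only a handful of dose points, a small-sample correction (AICc) or a likelihood-ratio test may be more appropriate.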

      Gradual modulation of gene dosage is a useful approach to model physiological variation in dosage. Experimental perturbation screens that use CRISPR inhibition or activation often use guide RNAs targeting the transcription start site to maximize their effect on gene expression. Generating a physiological range of variation will allow others to better model physiological conditions.

      There is broad interest in the field to identify gene regulatory networks using experimental perturbation approaches. The data from this study provides a good resource for such analytical approaches, especially since both inhibition and activation were tested. In addition, these data provide a nuanced, continuous representation of the relationship between effectors and downstream targets, which may play a role in the development of more rigorous regulatory networks.

      Human geneticists often focus on loss-of-function variants, which represent natural knock-down experiments, to determine the role of a gene in the biology of a trait. This study demonstrates that dosage response relationships are often non-linear, meaning that the effect of a loss-of-function variant may not necessarily carry information about increases in gene dosage. For the field, this implies that others should continue to focus on both inhibition and activation to fully characterize the relationship between gene and trait.

    3. Reviewer #2 (Public review):

      Summary:

      This work investigates transcriptional responses to varying levels of transcription factors (TFs). The authors aim for gradual up- and down-regulation of three transcription factors, GFI1B, NFE2, and MYB, in K562 cells by using a CRISPRa and a CRISPRi line together with sgRNAs of varying potency. Targeted single-cell RNA sequencing is then used to measure gene expression of a set of 90 genes, which were previously shown to be downstream of GFI1B and NFE2 regulation. This is followed by an extensive computational analysis of the scRNA-seq dataset. By grouping cells with the same perturbations, the authors can obtain groups of cells with varying average TF expression levels. The achieved perturbations are generally subtle, not reaching half or double doses for most samples, and up-regulation is generally weak, below 1.5-fold in most cases. Even in this small range, many target genes exhibit a non-linear response. Since this is rather unexpected, it is crucial to rule out technical reasons for these observations.

      Strengths:

      The work showcases how a single dataset of CRISPRi/a perturbations with scRNA-seq readout and an extended computational analysis can be used to estimate transcriptome dose responses, a general approach that likely can be built upon in the future.

      Weaknesses:

      (1) The experiment was only performed in a single replicate. In the absence of an independent validation of the main findings, the robustness of the observations remains unclear.

      (2) The analysis is based on the calculation of log-fold changes between groups of single cells with non-targeting controls and those carrying a guide RNA driving a specific knockdown. How the fold changes were calculated exactly remains unclear, since it is only stated that the FindMarkers function from the Seurat package was used, which is likely not optimal for quantitative estimates. Furthermore, differential gene expression analysis of scRNA-seq data can suffer from data distortion and mis-estimations (Heumos et al. 2023 (https://doi.org/10.1038/s41576-023-00586-w), Nguyen et al. 2023 (https://doi.org/10.1038/s41467-023-37126-3)). In general, the pseudo-bulk approach used is suitable, but the correct treatment of drop-outs in the scRNA-seq analysis is essential.

      (3) Two different cell lines are used to construct dose-response curves, where a CRISPRi line allows gene down-regulation and the CRISPRa line allows gene upregulation. Although both lines are derived from the same parental line (K562) the expression analysis of Tet2, which is absent in the CRISPRi line, but expressed in the CRISPRa line (Figure S3A) suggests substantial clonal differences between the two lines. Similarly, the PCA in S4A suggests strong batch effects between the two lines. These might confound this analysis.

      (4) The study uses pseudo-bulk analysis to estimate the relationship between TF dose and target gene expression. This requires a system that allows quantitative changes in TF expression. The data provided does not convincingly show that this condition is met, which however is an essential prerequisite for the presented conclusions. Specifically, the data shown in Figure S3A shows that upon stronger knock-down, a subpopulation of cells appears, where the targeted TF is not detected anymore (drop-outs). Also Figure 3B (top) suggests that the knock-down is either subtle (similar to NTCs) or strong, but intermediate knock-down (log2-FC of 0.5-1) does not occur. Although the authors argue that this is a technical effect of the scRNA-seq protocol, it is also possible that this represents a binary behavior of the CRISPRi system. Previous work has shown that CRISPRi systems with the KRAB domain largely result in binary repression and not in gradual down-regulation as suggested in this study (Bintu et al. 2016 (https://doi.org/10.1126/science.aab2956), Noviello et al. 2023 (https://doi.org/10.1038/s41467-023-38909-4)).

      (5) One of the major conclusions of the study is that non-linear behavior is common. This is not surprising for gene up-regulation, since gene expression will reach a plateau at some point, but it is surprising to observe it for many genes upon TF down-regulation. Specifically, here the target gene responds to a small reduction of TF dose but shows the same response to a stronger knock-down. It would be essential to show that this observation does not arise from the technical concerns described in the previous point, and it would require independent experimental validation.

      (6) One of the conclusions of the study is that guide tiling is superior to other methods such as sgRNA mismatches. However, the comparison is unfair, since different numbers of guides are used in the different approaches. Relatedly, the authors point out that tiling sometimes surpassed the effects of TSS-targeting sgRNAs, however, this was the least fair comparison (2 TSS vs 10 tiling guides) and additionally depends on the accurate annotation of TSS in the relevant cell line.

      (7) Did the authors achieve their aims? Do the results support the conclusions?: Some of the most important conclusions are not well supported because they rely on accurately determining the quantitative responses of trans genes, which suffers from the previously mentioned concerns.

      (8) Discussion of the likely impact of the work on the field, and the utility of the methods and data to the community: Together with other recent publications, this work emphasizes the need to study transcription factor function with quantitative perturbations. Missing documentation of the computational code repository reduces the utility of the methods and data significantly.
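
      To make the concern in point (2) concrete, here is a minimal sketch of one way a pseudo-bulk log2 fold change could be computed while keeping drop-out cells in the aggregate. It is not the authors' pipeline and does not reproduce Seurat's FindMarkers; the function name, the pseudocount, the per-million scaling, and the toy counts are all assumptions made for illustration.

```python
import math


def pseudobulk_log2fc(perturbed_cells, control_cells, gene,
                      pseudocount=1.0, scale=1e6):
    """Log2 fold change of `gene` between two groups of cells.

    Each cell is a dict mapping gene name -> raw UMI count. Counts are summed
    over all cells in a group (cells with zero counts for `gene` still
    contribute their library size), scaled to counts per million, and
    compared after adding a pseudocount.
    """
    def counts_per_million(cells):
        gene_total = sum(cell.get(gene, 0) for cell in cells)
        library_total = sum(sum(cell.values()) for cell in cells)
        if library_total == 0:
            raise ValueError("group has no counts")
        return scale * gene_total / library_total

    treated = counts_per_million(perturbed_cells) + pseudocount
    control = counts_per_million(control_cells) + pseudocount
    return math.log2(treated / control)


# Toy data: "OTHER" is a placeholder for all remaining genes in the panel.
control = [
    {"GFI1B": 4, "OTHER": 96},
    {"GFI1B": 0, "OTHER": 100},  # drop-out cell, kept in the aggregate
    {"GFI1B": 4, "OTHER": 96},
]
knockdown = [
    {"GFI1B": 2, "OTHER": 98},
    {"GFI1B": 0, "OTHER": 100},
    {"GFI1B": 2, "OTHER": 98},
]

lfc = pseudobulk_log2fc(knockdown, control, "GFI1B")
print(f"pseudo-bulk log2FC: {lfc:.3f}")
```

      Because counts are summed before the ratio is taken, cells with zero counts lower the group aggregate instead of being dropped or log-transformed individually, which is the property the reviewer asks the authors to document.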

    1. eLife Assessment

      This study presents valuable data on the identification and function of a protein complex present at the Maurer's cleft organelles of Plasmodium falciparum-infected red blood cells. The evidence supporting the findings is solid, but would benefit from greater rigor in presentation and analysis.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, Blancke Soares and Stäcker et al serendipitously identify a domain of the Plasmodium falciparum protein MSRP6 that mediates both export from the parasite into the infected red blood cell and association with the Maurer's cleft organelles found in the infected cell. The authors use this domain to identify a putative complex of proteins at the Maurer's cleft via proximity biotinylation. Six members of the complex are confirmed to interact with MSRP6 by co-immunoprecipitation.

      The functions of select proteins of this complex are further investigated with regard to the formation of Maurer's clefts. Disruption of PeMP2, PIESP2, and Pf332 resulted in morphological changes to the Maurer's clefts and prevented the anchoring of the Maurer's clefts to the infected red blood cell plasma membrane that normally occurs in the trophozoite stage. Curiously, disruption of MSRP6, the central member of the complex, did not affect Maurer's cleft anchoring. Mechanistically, how this complex affects Maurer's cleft structure and anchoring remains unclear.

      Finally, the authors show that the loss of Maurer's cleft anchoring observed upon disruption of PIESP2 or Pf332 does not affect cytoadherence of infected red blood cells via PfEMP1, arguing against a prior assumption that cleft tethering is required for the presentation of parasite-exported proteins on the infected red blood cell surface.

      Strengths:

      Maurer's clefts are enigmatic organelles found in red blood cells infected by Plasmodium falciparum that are presumed to play a role in trafficking exported parasite proteins to the surface of the red blood cells, though little is known about their biogenesis and function. The authors here convincingly identify a protein complex present at the Maurer's clefts using multiple orthogonal tools, and carry out assays that indicate this protein complex has a role in shaping and anchoring the Maurer's clefts at their final location at the red blood cell membrane. The data indicating that Maurer's cleft anchoring is dispensable for trafficking of P. falciparum exported proteins to the infected red blood cell membrane has implications for understanding the function of this organelle.

      Weaknesses:

      In many instances, the data lack appropriate controls that would be desirable for the highest level of rigor. Many, if not most, fluorescence microscopy assays lack untagged/parental controls (prepared in parallel and captured with the same settings) that are necessary to determine the validity of the data - that the observed signal is specific to the protein of interest and not due to autofluorescence or bleed-through from other channels. In other cases, wild-type controls are missing where data from disruption mutants are presented. Additionally, while some phenotypes are quantified, others are only qualitatively described where a more thorough quantitative investigation would be valuable. Finally, where phenotypes have been quantified, in many instances it is not clear that the analyses have included biological replicates as would be expected.

    3. Reviewer #2 (Public review):

      Summary:

      Soares et al characterize several P. falciparum exported proteins that localize to Maurer's Clefts (MCs), membrane structures formed in the host erythrocyte cytosol. MCs are thought to act as sorting stations that mediate the trafficking of effector proteins to the erythrocyte membrane, such as the surface adhesin and major virulence factor PfEMP1. While initially mobile within the host cytosol, MCs become anchored at the erythrocyte periphery around the time PfEMP1 appears on the RBC surface. While MC immobilization is thought to be important for the delivery of PfEMP1 onto the erythrocyte surface, this hypothesis has remained untested due to the lack of mutants that prevent anchoring. The study begins by determining the sequence features able to mediate the export of PF3D7_0830300 and MSRP6, both PEXEL-Negative Exported Proteins (PNEPs) with signal peptides. The authors show that in both proteins, a region downstream of the signal peptide is sufficient to mediate export, indicating the mature N-terminus is also important for the translocation of this type of PNEP, similar to other classes of exported proteins. Surprisingly, an additional C-terminal region of MSRP6 is also sufficient to mediate export when placed downstream of the signal peptide in the absence of other MSRP6 features. This region also mediates recruitment to MCs and was used as BioID bait to identify proximal MC proteins, several of which form a complex with MSRP6. Strikingly, disruption of certain MSRP6 interacting proteins (PeMP2, PIESP2, and Pf332) abolishes MC anchoring and in some cases also results in major changes in MC morphology. Surprisingly, neither PfEMP1 surface display nor cytoadhesion of infected RBCs is impacted in these mutants. This study features an impressive array of genetically modified parasites and will be of broad interest in providing the first functional analysis of MC anchoring, challenging the prevailing model for PfEMP1 trafficking within the infected RBC.

      Strengths:

      (1) The first section of the paper presents an in-depth dissection of the features that enable the export of signal peptide-containing PNEPs, confirming the mature N-terminus is sufficient for export across all known types of exported proteins. While it remains unknown how these features enable export, the results reinforce the universal importance of the mature N-terminus, whether generated by signal peptidase or Plasmepsin 5.

      (2) The discovery that a C-terminal region of MSRP6 (MAD) is also sufficient for export is novel. The authors suggest this may be the result of piggybacking on another exported protein, although the discussion acknowledges there are challenges with this model since unfolding by PTEX would be expected to disrupt these interactions. An alternative might be considered: the related protein MSRP7 is also exported but consists essentially of a signal peptide and MSP7-like domain without the large N-terminal region found in MSRP6. Presumably, the mature N-terminus of MSRP7 mediates export. If MSRP6 is derived from an exported predecessor composed only of the MSP7-like domain (like MSRP7), the MAD domain might retain the ancestral export information near the beginning of the MSP7-like domain. If this were the case, then the MAD domain (3cd region) should only be sufficient to mediate export when positioned immediately after the signal peptide as in the experiment in Fig 3C (SP-3cd-GFP). It would be interesting to determine if an SP-GFP-3cd construct is exported.

      (3) Disruption of PeMP2, PIESP2 or Pf332 is found to prevent MC anchoring. This is the most exciting part of the study as it provides the first set of mutants that interfere with anchoring, enabling the surprising observation that MC immobilization is not important for PfEMP1 surface display or cytoadhesion. The MC movement assay is a nice way to visualize anchoring and would be strengthened by a quantitative measure of colocalization between the time-lapse images (ie, Pearson correlation coefficient) to enable a statistical test. The use of SLI to specifically activate a var gene of choice is an exciting new approach that will be of great use to the PfEMP1 field together with the semi-automated binding assay that helps to increase throughput and reduce bias.
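
      The colocalization measure suggested in point (3) can be sketched as a Pearson correlation between the pixel intensities of two frames from the same time-lapse series. The tiny 3x3 frames below are invented, and the function name is hypothetical; real images would need background subtraction and cell segmentation first.

```python
import math


def frame_pearson(frame_a, frame_b):
    """Pearson correlation coefficient between two equally sized 2D frames."""
    a = [pixel for row in frame_a for pixel in row]
    b = [pixel for row in frame_b for pixel in row]
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a frame with constant intensity has no correlation")
    return cov / math.sqrt(var_a * var_b)


# One bright punctum (a Maurer's cleft) in an otherwise dark cell.
frame_t0 = [[0, 9, 0],
            [0, 0, 0],
            [0, 0, 0]]
frame_anchored = [[0, 9, 0],   # cleft stays in place between frames
                  [0, 0, 0],
                  [0, 0, 0]]
frame_mobile = [[0, 0, 0],     # cleft has moved to another position
                [0, 0, 0],
                [0, 0, 9]]

r_anchored = frame_pearson(frame_t0, frame_anchored)
r_mobile = frame_pearson(frame_t0, frame_mobile)
print(f"anchored: {r_anchored:.3f}, mobile: {r_mobile:.3f}")
```

      Collecting one coefficient per cell for parent and mutant lines would give distributions that can be compared with a standard statistical test, as the reviewer proposes.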

      Weaknesses:

      (1) At least two of the MSRP6 complex members were found to depend on other complex members for MC trafficking: PeMP3 depends on MSRP6 and Pf332 depends on PIESP2 (previously shown by Zhang et al 2018 and confirmed in the present study). While the authors disrupted all seven MSRP6 complex members, the impact on the trafficking of the other complex members was not systematically investigated. It would be particularly interesting to know which (if any) complex members are required for MC recruitment of PeMP2 since this protein is also needed for MC anchoring.

      (2) Some images of exported puncta are interpreted as localization to the MCs without a co-marker. Since other compartments have been identified in the RBC cytosol in addition to MCs (ie, J dots), an MC co-marker would help to verify these actually correspond to MCs. For example, in Figure 5B, GEXP18 gives an exported punctate appearance but lack of co-localization with SBP1 in Fig S2B shows that this does not correspond to MCs.

      (3) The authors show MAHRP2 localization is not impacted in their PIESP2 and Pf332 mutants and this is interpreted to indicate the tether structures are not disrupted. However, this conclusion requires actual analysis of the tether structures by electron microscopy since MAHRP2 association to MCs may not require tether integrity and could persist even if the tethers are altered or disrupted. Otherwise, this statement should be adjusted. Additionally, since T2A skipping efficiency can vary between constructs, it would be a good idea to perform a western blot to ensure that the SBP1-GFP and MAHRP2-mScarlet signals in Figure 8D,F reflect separated proteins.

      (4) The trypsin assays to monitor PfEMP1 surface display would benefit from a more detailed explanation of how the results were interpreted. For instance, though perhaps less intense than in the PIESP2, Pf332, and MSRP6 mutants, a Var01-protected fragment is also seen in the SBP1 mutant. Additionally, a protected fragment is indicated for most of the SBP1N controls (asterisk). As per the authors' experimental design (lines 956-957), does this indicate that the RBC membrane was partially compromised during the experiment? In line 505, the trypsin assay data in the mutants are interpreted relative to the parent IT4var01-HA line, but no data are shown for the parent.

    4. Reviewer #3 (Public review):

      Summary:

      Malaria is caused by Plasmodium falciparum parasites that infect, grow, and reproduce inside red blood cells. The parasites extensively modify the blood cells they infect, by exporting hundreds of proteins into the red blood cell compartment. One of the most important modifications made by the parasite is to display adhesive proteins on the blood cell surface which attach the infected cells to walls of small blood vessels. This can lead to organ damage resulting in serious disease complications and there is great interest in blocking the adhesive process to reduce disease. This study investigates the function of an atypical, exported protein that along with other proteins maintains the integrity of membranous sacs formed by the parasite in the blood cell compartment. These sacs are widely believed to help organise the display of the adhesive proteins on the infected blood cell surface. This study challenges this dogma by showing that disruption of the sacs does not prevent the display of the adhesive proteins suggesting alternative pathways are likely involved in adhesive protein display.

      Strengths:

      The conclusions are supported by a beautiful series of live parasite images.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

    1. eLife Assessment

      Morphological characteristics and phenotypes of mutations in key developmental genes suggest that head, trunk, and tail development are regulated by discernible modules. Gdf11 signalling plays a crucial role in orchestrating the transition from trunk to tail tissues in vertebrate embryos. This important study presents convincing evidence that Tgfbr1 acts upstream of Isl1 (a pivotal effector of Gdf11 signalling) and regulates blood vessels, the lateral plate mesoderm, and the endoderm associated with the trunk-to-tail transition. Together with the previous studies, this work identifies a key signal that acts as the pivot of the trunk-to-tail transition.

    2. Joint Public Review:

      Previously, this group showed that Tgfbr1 regulates the reorganization of the epiblast and primitive streak into the chordo-neural hinge and tailbud during the trunk-to-tail transition. Gdf11 signaling plays a crucial role in orchestrating the transition from trunk to tail tissues in vertebrate embryos, including the reallocation of axial progenitors into the tailbud and Tgfbr1 plays a key role in mediating its signaling activity. Progenitors that contribute to the extension of the neural tube and paraxial mesoderm into the tail are located in this region. In this work, the authors show that Tgfbr1 also regulates the reorganization of the posterior primitive streak/base of allantois and the endoderm as well.

      By analyzing the morphological phenotypes and marker gene expression in Tgfbr1 mutant mouse embryos, they show that it regulates the merger of somatic and splanchnic layers of the lateral plate mesoderm, the posterior streak derivative. They also present evidence suggesting that Tgfbr1 acts upstream of Isl1 (key effector of Gdf11 signaling for controlling differentiation of lateral mesoderm progenitors) and regulates the remodelling of the major blood vessels, the lateral plate mesoderm and endoderm associated with the trunk-to-tail transition. Through a detailed phenotypic analysis, the authors observed that, similarly to Isl1 mutants, the lack of Tgfbr1 in mouse embryos hinders the activation of hindlimb and external genitalia marker genes and results in a failure of lateral plate mesoderm layers to converge during tail development. As a result, they interpret that ventral lateral mesoderm, which generates the pericloacal mesenchyme and genital tuberculum, fails to specify.

      They also show defects in the morphogenesis of the dorsal aorta at the trunk/tail juncture, resulting in an aberrant embryonic/extraembryonic vascular connection. Endoderm reorganization defects following abnormal morphogenesis of the gut tube in the Tgfbr1 mutants cause failure of tailgut formation and cloacal enlargement. Thus, Tgfbr1 activity regulates the morphogenesis of the trunk/tail junction and the morphogenetic switch in all germ layers required for continuing post-anal tail development. Taken together with the previous studies, this work places Gdf11/8 - Tgfbr1 signaling at the pivot of trunk-to-tail transition and the authors speculate that critical signaling through Tgfbr1 occurs in the posterior-most part of the caudal epiblast, close to the allantois.

      The data shown is solid with excellent embryology/developmental biology. This work demonstrates meticulous execution and is presented in a comprehensive and coherent manner. Although not completely novel, the results/conclusions add to the known function of Gdf11 signaling during the trunk-to-tail transition.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Previously, this group showed that Tgfbr1 regulates the reorganization of the epiblast and primitive streak into the chordo-neural hinge and tailbud during the trunk-to-tail transition. Gdf11 signaling plays a crucial role in orchestrating the transition from trunk to tail tissues in vertebrate embryos, including the reallocation of axial progenitors into the tailbud and Tgfbr1 plays a key role in mediating its signaling activity. Progenitors that contribute to the extension of the neural tube and paraxial mesoderm into the tail are located in this region. In this work, the authors show that Tgfbr1 also regulates the reorganization of the posterior primitive streak/base of allantois and the endoderm as well. 

      By analyzing the morphological phenotypes and marker gene expression in Tgfbr1 mutant mouse embryos, they show that it regulates the merger of somatic and splanchnic layers of the lateral plate mesoderm, the posterior streak derivative. They also present evidence suggesting that Tgfbr1 acts upstream of Isl1 (key effector of Gdf11 signaling for controlling differentiation of lateral mesoderm progenitors) and regulates the remodelling of the major blood vessels, the lateral plate mesoderm and endoderm associated with the trunk-to-tail transition. Through a detailed phenotypic analysis, the authors observed that, similarly to Isl1 mutants, the lack of Tgfbr1 in mouse embryos hinders the activation of hindlimb and external genitalia marker genes and results in a failure of lateral plate mesoderm layers to converge during tail development. As a result, they interpret that ventral lateral mesoderm, which generates the pericloacal mesenchyme and genital tuberculum, fails to specify.

      They also show defects in the morphogenesis of the dorsal aorta at the trunk/tail juncture, resulting in an aberrant embryonic/extraembryonic vascular connection. Endoderm reorganization defects following abnormal morphogenesis of the gut tube in the Tgfbr1 mutants cause failure of tailgut formation and cloacal enlargement. Thus, Tgfbr1 activity regulates the morphogenesis of the trunk/tail junction and the morphogenetic switch in all germ layers required for continuing post-anal tail development. Taken together with the previous studies, this work places Gdf11/8 - Tgfbr1 signaling at the pivot of trunk-to-tail transition and the authors speculate that critical signaling through Tgfbr1 occurs in the posterior-most part of the caudal epiblast, close to the allantois. 

      Strengths: 

      The data shown is solid with excellent embryology/developmental biology. This work demonstrates meticulous execution and is presented in a comprehensive and coherent manner. Although not completely novel, the results/conclusions add to the known function of Gdf11 signaling during the trunk-to-tail transition. 

      Weaknesses: 

      The authors rely on the expression of a small number of key regulatory genes to interpret the developmental defects. The alternative possibilities remain to be ruled out thoroughly. The manuscript is also quite descriptive and would benefit from more focused highlighting of the novelty regarding the absence of Tgfbr1 in the mouse embryo. They should also strengthen some of their conclusions with more details in the results.

      Although we used a limited number of key regulatory genes to interpret the phenotype, these genes were carefully chosen to focus on specific processes involving the lateral mesoderm, its derivatives, and the endoderm. In addition to these markers, we included references to other relevant markers that were previously analyzed and initially led us to examine the lateral plate mesoderm and tail gut in Tgfbr1 mutants. To strengthen our analysis, we have now incorporated additional data to clarify specific phenotypes. For instance, in situ hybridization (ISH) for Shh further confirms abnormalities at the caudal end of the endoderm in mutant embryos, while no endodermal defects are observed in the trunk region. We also included an analysis of the intermediate mesoderm, which shows abnormalities at the same level as those found in the lateral plate mesoderm and endoderm of Tgfbr1 mutants.

      It’s important to note that using additional markers to assess the epiblast/primitive streak of Tgfbr1 mutants at E7.5–E8.5, as suggested by a reviewer, is unlikely to yield new insights. At these early stages, Tgfbr1 mutant embryos do not display observable phenotypes in the main body axis. Data in this manuscript already demonstrate the absence of abnormalities at this stage, as shown in Figure 3 and Supplementary Figure 6. Additionally, certain genes whose expression becomes abnormal when the embryo would enter tail development remain unaffected in the trunk, indicating that trunk extension is not significantly impacted by Tgfbr1 deficiency. While transcriptomic analysis of these Tgfbr1 mutants could provide interesting insights, it would be more appropriate to focus on later developmental stages, which would be beyond the scope of the current study.

      The second major critique was that the manuscript is primarily descriptive. We disagree with this assessment. Several hypotheses were rigorously tested using genetic approaches, including Isl1 knockout experiments, cell tracing from the primitive streak with a newly generated Cre driver to activate a reporter from the ROSA26 locus, and assessment of extraembryonic endoderm fate in Tgfbr1 mutants by introducing the Afp-GFP transgene into the Tgfbr1 mutant background. Additionally, we conducted tracing analyses of tail bud cell contributions to the tail gut via DiI injection and embryo incubation. To address potential concerns regarding this experiment, we have included data showing the DiI position immediately after injection to confirm that it does not contact the tail gut. We also considered and accounted for potential DiI leakage into neuromesodermal progenitors to clarify the endodermal results.

      Our genetic and DiI experiments were specifically designed to differentiate between alternative hypotheses and to confirm hypotheses generated from other analyses. Additionally, improvements in some of the imaging data have helped address remaining concerns.

      Reviewer #1 (Recommendations For The Authors): 

      I have listed my suggestions as queries. The authors may perform experiments or clarify by editing the text to address them. 

      The authors state on Page 11 and elsewhere that the ventral lateral mesoderm is absent in the Tgfbr1 mutant. What is the basis for this conclusion? Are there specific markers for PCM or GT primordium? 

      The specific marker of the PCM and GT primordium is Isl1. The absence of this marker in the Tgfbr1 mutants is shown in Dias et al., 2020. The reference is introduced in the manuscript.

      A schematic illustrating the VLM and the expression patterns of Tgfbr1, Gdf11, etc., would be helpful. 

      Characterization of Gdf11 expression has been previously reported (e.g. McPherron et al 1999, cited in our manuscript). It is expressed in the region containing axial progenitors before the trunk to tail transition and is not expressed in the VLM. As for Tgfbr1, its expression is hard to detect, likely because it is ubiquitously expressed at a low level. We include in this document some pictures of an ISH, including a control using the Tgfbr1 mutants to illustrate that the staining resembling background actually represents Tgfbr1 expression. If the reviewers find it important, we can also incorporate these data into the manuscript. Under these circumstances, we feel that a schematic might not be very informative.

      Author response image 1.

      Image showing an example of an ISH procedure with a probe against Tgfbr1, showing widespread and low expression. The lower picture shows a ventral view of a stained wild type E10.5 embryo.

      Do Foxf1+ cells in the 'extended LPM' of Tgfbr1 mutants suggest fate transformation, or do they indicate the misexpression of a marker gene otherwise suppressed by Tgfbr1 activity? The authors suggest that Foxf1+ cells are VLM progenitors from posterior PS trapped in the extended LPM. Do they continue to express PS markers? 

      The observation that in both wild type and Tgfbr1 mutant embryos Foxf1 expression in the trunk is restricted to the splanchnic LPM indicates that the absence of this marker in the somatic LPM is not the result of a suppression of its expression by Tgfbr1. In wild type embryos Foxf1 is also expressed in the posterior PS, regulated independently of its expression in the LPM (i.e. Shh-independent), and later in the pericloacal mesoderm (our supplementary figure 2). Because Foxf1 expression in the posterior PS was not suppressed in the Tgfbr1 mutants, and because the pericloacal mesoderm is absent, we interpret the Foxf1-positive cells in the two layers around the extended coelomic cavity at the posterior end of the mutant embryos as being derived from the posterior PS, as a result of the absence of its normal progression through the embryonic tissues.

      We did not find expression of markers of the PS cells that give rise to paraxial mesoderm, like Tbxt, further suggesting that those cells could derive from the restricted set of cells within the posterior PS that contribute to the pericloacal mesoderm.

      For example, the misexpression of Apela is interpreted as mis-localized endoderm cells. They show scattered Keratin 8 misexpression to support the interpretation. It would be more convincing if the authors tested the expression of other endoderm markers. 

      As indicated in the manuscript, we suggest that these cells are endoderm progenitors (p. 13), like those present at the posterior end of the gut tube at E9.5 and E10.5, that are unable to incorporate into the gut tube. Apela is not a general endodermal marker: it is expressed in the foregut pocket and the nascent cells of the hindgut/tail gut, becoming downregulated as cells take on typical endodermal signatures. The presence of ectopic Apela expression in the extended LPM of the mutant embryos might indeed indicate the presence of progenitors that failed to downregulate Apela because of the lack of differentiation-associated downregulation. This would also imply the absence of definitive endodermal markers.

      The Nodal signaling pathway in the anterior PS drives endoderm development. It acts through Alk7. Does Tgfbr1 (Alk5) mutation impact endoderm development, in general? It isn't easy to assess this from the Foxa2 in situ RNA hybridization shown in Figures 6A and B. It would be helpful for the readers if the authors clarified this point. 

      In the pictures shown in Figure 7D-D’ it is already shown that the endoderm is mostly preserved until the region of the trunk to tail transition. The presence of a rather normal endoderm in the embryonic trunk can also be seen with Shh, a figure added as Supplementary Fig.5.

      Reviewer #2 (Recommendations For The Authors): 

      The authors mention two interesting novel points which they should develop in the discussion, and probably also in the results. 

      (1) The authors speculate about the possible involvement of the posterior PS as a mediator of Gdf11/Tgfbr1 signaling activity. However, as mentioned in the manuscript, their experiments do not allow regional sublocalization within the PS... Here it would be important to assess/discuss in more detail which progenitors respond to this signaling activity and when they do it. At the very least, the authors should provide high-resolution spatiotemporal data of the expression of Tgfbr1 in the PS. 

      Tgfbr1 expression at this embryonic stage does not give clear differential patterns. The data reported for this expression in Andersson et al 2006 are of very low quality and we have not been able to reproduce the reported pattern. On the contrary, all our efforts over the years provided a very general staining that could even be interpreted as background. When we now included Tgfbr1 mutants as controls, it became clear that the ubiquitous and low-level signal observed in wild type embryos indeed represents the Tgfbr1 expression pattern: low level and ubiquitous. We are attaching a figure to this document illustrating these observations. If required, this can also be included in the manuscript as a supplementary figure. 

      Also, the work of Wymeersch et al., 2019 regarding the lateral plate mesoderm progenitors (LPMPs) should be referred to and discussed here. 

      This was now added in the results (page 11) and in discussion (page 16). 

      For instance, are the LPMP transcriptomic differences detected between E7.5 and E8.5 caused by Tgfbr1 signaling activity? This question could be easily answered through a comparative bulk RNAseq analysis of the posterior-most region of the PS of mutant and WT embryos. The possible colocalization of Tgfb1 (Wymeersch et al., 2019) and Tgfbr1 in the LPMPs should also be addressed. 

      We agree with the suggestion that RNA-seq in the posterior PS of WT and mutant embryos might be informative. However, it is very likely that within the proposed timeframe (E7.5 to E8.5) there are no significant differences between the wild type and the Tgfbr1 mutant embryos, because there is no apparent axial phenotype in Tgfbr1 mutant embryos before the trunk to tail transition. Therefore, at this stage, we think that this experiment is out of the scope of the present manuscript. 

      (2) The activity of Tgfbr1 during the trunk-to-tail transition is critical for the development of tail endodermal tissues. Here the authors suggest again the involvement of the posterior PS/allantois region, but a similar phenotype can also be observed for instance in the absence of Snai1 in the caudal epiblast (Dias et al., 2020)... It would be important to assess/discuss the origin of those morphogenetic problems in the gut. Is it due to the reallocation of NMC cells into the CNH? The tailbud-EMT process? LPMPs specification?... Regional mutations or gain of functions of Snai1 or Tgfbr1 in the caudal epiblast would help answer the question.  

      The endodermal phenotype in the Snai1 mutants is different to that observed in the Tgfbr1 mutants. As can be observed in Figures 3, 4 and 5 of Dias et al., the tailbud is absent and is replaced by a structure that extends the epiblast. As a consequence, the endoderm finishes at the base of that structure, even expanding to make a structure resembling the cloaca, which is different to what is seen in the Tgfbr1 mutants. In this case, the lack of tail gut is likely to result either from the lack of formation of the progenitors of the gut endoderm or from the dissociation of what would be the tail bud from the LPM. Actually, hindlimb/pericloacal mesoderm markers, like Tbx4, are preserved in the Snai1 mutant. As for the Snai1 gain-of-function experiment, also already reported in Dias et al 2020, the destiny of these cells is not clear. The ISH for Foxa2 showed extra signals, but as it is not an exclusive marker for endoderm it is not possible to know whether any of these signals correspond to endodermal tissues.

      Regarding the development of tail endodermal tissues, the authors suggest that it occurs from a structure derived from the PS that is located posteriorly, in the tailbud, after the tip of the growing gut. This is an important and novel point as it suggests that the primordium of the endoderm is not wholly specified during gastrulation. So the observation should be well supported. How can Anastasiia et al. distinguish such "structure" from the actual developing gut? Does it have a distinct molecular signature or any morphological landmark that enables its separation from the actual gut? The data suggests that the region highlighted in Supplementary Figure 4Ab contains part of the actual gut tube (the same is suggested in Figure 5B). If the authors think otherwise, they must characterize that region of the tailbud by doing a thorough morphological and gene/protein expression analysis and assess its potency, via transplantation experiments. Also, the authors' claim mostly relies on the DiI experiments and those have three problems: #1 Anastasiia et al. assess "tail" endodermal growth at E9.5 when the correct stage to do it is after E10.5 (after tailbud formation). #2 Incongruencies, low number (only three embryos), and diversity in the results shown in Figure 8 and Supplementary Figure 4. For instance, despite similar staining at 0h, the extension and amount of DiI present in the gut tube after 20h varies significantly amongst the differently labeled embryos. A possible explanation lies in the abnormal leakiness of the DiI labelings and that is confirmed by the observations shown in Supplementary Figure 4M-O; the same for Supplementary Figure 4G, which shows a substantial amount of DiI in the neural tube. #3 The authors must provide high-quality data showing which tissues/regions were labelled at time 0h, including transversal and sagittal sections as they did for the 20h time-point. 
Additionally, it is important to re-orient the sagittal optical sections to a position that also shows the neural tube (like a mid-sagittal section) and include information concerning the AP/DV axis, as well as the location of the transversal optical sections in the sagittal image. 

      As described in the reply to reviewer 1, Apela is expressed in the nascent tail gut endoderm but not in more anterior areas except for a foregut pocket, and becomes downregulated as the tube acquires endodermal signatures. Therefore, the structure to which the reviewer refers might indeed represent a group of progenitors that extend the tail gut. The observation that this property is seen only in the tail gut as it grows already separates this region of the gut, which in the end does not contribute to mature organs, from more anterior areas of the endoderm (essentially anterior to the cloaca) that will become a relevant tissue of the intestinal organs. Our DiI labelling experiment was aimed at testing whether this pool of cells contributes to the gut, but it does not allow us to determine the nature of those cells, a question that will require further research (discussed on p. 17) and that we think is beyond the scope of the present manuscript.

      Regarding the labelling at E10.5, we agree that at E9.5 the tail bud is not completely formed in terms of NMCs; for example, the neuropore is not yet closed. However, we are more interested in the regression of the epiblast, which is complete by E9.5. Injecting at E9.5 also has technical advantages for us: first, in our hands earlier embryos grow better in culture, and second, it is easier to inject into the tailbud at E9.5 because it is slightly bigger than at E10.5. Therefore, injecting at E9.5 is less prone to technical artifacts due to injection inaccuracy and compromised growth in culture.

      We agree that the injected DiI could also leak into NMPs, which might be located in the same area. However, while this could result in labeling of the neural tube, it would not affect the interpretation of the finding of labeled cells in the tail gut. Indeed, the presence of this label in the gut epithelium indicates the presence of progenitors in the injected region of the tail gut. We added some considerations of this possible leakage to the results section of the manuscript (p. 15). We thank the reviewer for drawing our attention to this issue. 

      We also now provide high-quality data showing labelled tissue at 0h in Supplementary figure 8A-c’, higher magnification images in Fig. 8, and reoriented optical sections in Fig. 6 and in Supplementary Fig. 7, including axis and location of the sections as suggested by the reviewer.

      Minor concerns/comments: 

      (1) The abstract is quite long, though this might be fine for this journal. 

      (2) In relation to the comment on the abstract, the manuscript needs an initial Figure describing the events that are described in the introduction. Otherwise, the manuscript will only be accessible to mouse embryologists.

      We have a figure summarizing the results at the end of the manuscript, and we think that including a similar figure at the beginning might be redundant. What we could do, if required, is to include this type of schematic as a graphical abstract.

      (3) The authors need to clarify what they mean when they use the following expressions "PS fate" and "fate of the posterior PS".

      We do not think that we have used such expressions. Indeed, they did not come up when we ran a “find” in the Word document. However, they would mean the tissue that would derive from them at later developmental stages.

      (4) The assessment of Isl1 expression in Tgfbr1 mutant and transgenic mouse embryos would be better indicative of their molecular relationship than a comparative phenotypic analysis. 

      These data have been reported in Dias et al 2020 and Jurberg et al 2013, both cited in the manuscript.  

      (5) The authors should explain or discuss what the upregulation of Foxa2 in the posterior end of Tgfbr1 mutants means.

      While an upregulation is apparent in the figure, looking at other pictures we cannot be sure of this being a significantly quantifiable up-regulation. We therefore removed the statement from the text.

      (6) What happens to the intermediate mesoderm during the trunk-to-tail transition? Is Tgfbr1 involved in the regulation of its development?

      We have tested this using Pax2, added the relevant data in Supplementary Fig. 1, and described them in the results.

      (7) The term "potential" should not be used during the description of DiI labeling experiments as this technique only assesses cell fate.

      Corrected

      (8) Some figures lack AP/DV axis information (e.g. Figures 6, C, and D).

      Corrected

    1. eLife Assessment

      This study provides a methodological report on a modified adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), and its application to G protein-coupled receptors (GPCRs), which are the most abundant membrane proteins and key targets for drugs. The mwSuMD approach assists in sampling complex binding processes, leading to useful findings for GPCR activity, although results may be considered incomplete, given the high RMSD values reported and lack of validation using experimental data. The manuscript also needs corrected descriptions of high-resolution PDB structures and better relation to existing computational literature.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate ligand and protein-binding processes in GPCRs (including dimerization) by the multiple walker supervised molecular dynamics method. The paper is interesting and it is very well written.

      Strengths:

      The authors' method is a powerful tool to gain insight on the structural basis for the pharmacology of G protein-coupled receptors.

    3. Reviewer #2 (Public review):

      The study by Deganutti and co-workers is a methodological report on an adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), which represents an improved version of the previous SuMD.

      Case-studies concern complex conformational transitions in a number of G protein Coupled Receptors (GPCRs) involving long time-scale motions such as binding-unbinding and collective motions of domains or portions. GPCRs are specialized GEFs (guanine nucleotide exchange factors) of heterotrimeric Gα proteins of the Ras GTPase superfamily. They constitute the largest superfamily of membrane proteins and are of central biomedical relevance as privileged targets of currently marketed drugs.

      MwSuMD was exploited to address:

      a) binding and unbinding of the arginine-vasopressin (AVP) cyclic peptide agonist to the V2 vasopressin receptor (V2R);

      b) molecular recognition of the β2-adrenergic receptor (β2-AR) and heterotrimeric GDP-bound Gs protein;

      c) molecular recognition of the A1-adenosine receptor (A1R) and palmitoylated and geranylgeranylated membrane-anchored heterotrimeric GDP-bound Gi protein;

      d) the whole process of GDP release from membrane-anchored heterotrimeric Gs following interaction with the glucagon-like peptide 1 receptor (GLP1R), converted to the active state following interaction with the orthosteric non-peptide agonist danuglipron.

      The revised version has improved clarity and rigor compared to the original, also thanks to the reduction in the number of complex case studies treated superficially.

      The mwSuMD method is solid and valuable, has wide applicability and is compatible with the most widely used MD engines. It may be of interest to the computational structural biology community.

      The huge amount of high-resolution data on GPCRs makes those systems suitable, although challenging, for method validation and development.

      While the approach is less energy-biased than other enhanced sampling methods, knowledge, at atomic detail, of binding sites/interfaces and conformational states is needed to define the supervised metrics; the higher the resolution of such metrics, the more accurate the outcome is expected to be. Definition of the metrics is a user- and system-dependent process.

    4. Reviewer #3 (Public review):

      Summary:

      In the present work Deganutti et al. report a structural study on GPCR functional dynamics using a computational approach called supervised molecular dynamics.

      Strengths:

      The study has the potential to provide novel insight into GPCR functionality. An example is the interaction between D344 and R385 identified during the Gs coupling by GLP-1R. However, validation of the findings, even computationally through for instance in silico mutagenesis study, is advisable.

      Weaknesses:

      No significant advance of the existing structural data on GPCR and GPCR/G protein coupling is provided. Most of the results are reproductions of the previously reported structures.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors investigate ligand and protein-binding processes in GPCRs (including dimerization) by the multiple walker supervised molecular dynamics method. The paper is interesting and it is very well written.

      Strengths:

      The authors' method is a powerful tool to gain insight into the structural basis for the pharmacology of G protein-coupled receptors.

      Weaknesses:

      Cholesterol may play a fundamental role in GPCR dimerization (as cited by the authors, Prasanna et al, "Cholesterol-Dependent Conformational Plasticity in GPCR Dimers"). Yet they do not use cholesterol in their simulations of the dimerization.

      We thank Reviewer #1 for the positive comment on mwSuMD.

      In the revised version of the manuscript, the section about A2A/D2 receptor dimerization has been removed because it was largely speculative. We agree that the lack of cholesterol in those simulations added uncertainty to the presented results.

      Reviewer #2 (Public Review):

      The study by Deganutti and co-workers is a methodological report on an adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), which represents an improved version of the previous SuMD.

      Case-studies concern complex conformational transitions in a number of G protein Coupled Receptors (GPCRs) involving long time-scale motions such as binding-unbinding and collective motions of domains or portions. GPCRs are specialized GEFs (guanine nucleotide exchange factors) of heterotrimeric Gα proteins of the Ras GTPase superfamily. They constitute the largest superfamily of membrane proteins and are of central biomedical relevance as privileged targets of currently marketed drugs.

      MwSuMD was exploited to address:

      (1) Binding and unbinding of the arginine-vasopressin (AVP) cyclic peptide agonist to the V2 vasopressin receptor (V2R);

      (2) Molecular recognition of the β2-adrenergic receptor (β2-AR) and heterotrimeric GDP-bound Gs protein;

      (3) Molecular recognition of the A1-adenosine receptor (A1R) and palmitoylated and geranylgeranylated membrane-anchored heterotrimeric GDP-bound Gi protein;

      (4) The whole process of GDP release from membrane-anchored heterotrimeric Gs following interaction with the glucagon-like peptide 1 receptor (GLP1R), converted to the active state following interaction with the orthosteric non-peptide agonist danuglipron;

      (5) The heterodimerization of D2 dopamine and A2A adenosine receptors (D2R and A2AR, respectively) and binding to a bi-valent ligand.

      The mwSuMD method is solid and valuable, has wide applicability, and is compatible with the most widely used MD engines. It may be of interest to the computational structural biology community.

      The huge amount of high-resolution data on GPCRs makes those systems suitable, although challenging, for method validation and development.

      While the approach is less energy-biased than other enhanced sampling methods, knowledge, at atomic detail, of binding sites/interfaces and conformational states is needed to define the supervised metrics; the higher the resolution of such metrics, the more accurate the outcome is expected to be. The definition of the metrics is a user- and system-dependent process.

      The excessive number of ambitious case studies undermines the accuracy of the output and leaves less room for the important details needed in a methodological report. In some cases, the available CryoEM structures could have been exploited better.

      The most consistent example concerns AVP binding/unbinding to V2R. The consistency with CryoEM data decreases with an increase in the complexity of the simulated process and involved molecular systems (e.g. receptor recognition by membrane-anchored G protein and the process of nucleotide exchange starting from agonist recognition by an inactive-state receptor). The last example, GPCR hetero-dimerization, and binding to a bi-valent ligand, is the most speculative one as it does not rely on high-resolution structural data for metrics supervision.

      We thank Reviewer #2 for the detailed comments on the manuscript. In this revised version, the hetero-dimerization between A2AR and D2R has been removed. Also, results about GPCR case studies other than GLP-1R have been reduced and downgraded in importance to focus on the fundamental key points of the adaptive sampling method. We agree that the consistency with cryoEM data tends to decrease with an increase in the complexity of the simulated process and involved molecular systems. While it is possible to approximate cryoEM results, our unbiased adaptive sampling technique finds its most interesting application in mechanistically unknown out-of-equilibrium processes rather than in reproducing known experimental data perfectly. The simulated case studies we present showcase the versatility, speed and consistency of our adaptive method in exploring energetically unbiased transitions.

      Reviewer #3 (Public Review):

      Summary:

      In the present work, Deganutti et al. report a structural study on GPCR functional dynamics using a computational approach called supervised molecular dynamics.

      Strengths:

      The study has the potential to provide novel insight into GPCR functionality. An example is the interaction between loops of GPCR and G proteins, which are not resolved experimentally, or the interaction between D344 and R385 identified during the Gs coupling by GLP-1R. However, validation of the findings, even computationally through for instance in silico mutagenesis study, is advisable.

      Weaknesses:

      In its current form, the manuscript seems immature and in particular, the described results grasp only the surface of the complex molecular mechanisms underlying GPCR activation. No significant advance of the existing structural data on GPCR and GPCR/G protein coupling is provided. Most of the results are a reproduction of the previously reported structures.

      We thank Reviewer #3 for the positive comment on the work. The revised manuscript focuses more on the GLP-1R and Gs case studies. We believe it addresses the weaknesses raised by showing the behaviour of key structural motifs and providing new hypotheses about GDP release.  

      Reviewer #2 (Recommendations For The Authors):

      In this methodological report, Deganutti and co-workers propose an improved version of supervised molecular dynamics (SuMD), named multiple walker SuMD (mwSuMD). Such an adaptive sampling method was challenged in simulations of complex transitions involving GPCRs, which are out of reach by classical MD.

      Although less energy-biased than other enhanced sampling methods, mwSuMD requires knowledge of the atomic detail of the ligand-protein or protein-protein binding site/interfaces and the structural hallmarks of the states whose conversion the method is going to address. Such knowledge is, indeed, necessary to define the supervised metrics (e.g. distances, RMSD, etc), which is a user- and system-dependent process.

      We classify mwSuMD as an adaptive, rather than enhanced, sampling method as it does not use any energy bias. We agree with the Reviewer that some knowledge of the system is required to productively set up the simulations, but this is the case for almost any advanced MD method.
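
      The distinction between adaptive and energy-biased sampling can be illustrated with a toy sketch. The code below is not the mwSuMD implementation: the 1-D random walk stands in for a short MD run, and the function names, parameters and selection rule (keep the walker that ends closest to the target) are assumptions made only for illustration. Each walker is propagated without any added force; the supervision acts solely by choosing which walker seeds the next batch.

```python
import random


def run_walker(start, n_steps, rng, step_size=0.5):
    """Unbiased 1-D random walk standing in for one short, unbiased simulation."""
    x = start
    for _ in range(n_steps):
        x += rng.uniform(-step_size, step_size)  # no force toward the target
    return x


def supervised_sampling(start, target, n_walkers=5, n_batches=40,
                        n_steps=20, seed=0):
    """Toy multiple-walker supervision (hypothetical, for illustration only).

    Per batch: run several independent walkers from the same starting point,
    then keep the one whose supervised metric (distance to target) is best.
    """
    rng = random.Random(seed)
    position = start
    history = [abs(position - target)]
    for _ in range(n_batches):
        endpoints = [run_walker(position, n_steps, rng)
                     for _ in range(n_walkers)]
        # Supervision step: selection only, the dynamics are left untouched.
        position = min(endpoints, key=lambda x: abs(x - target))
        history.append(abs(position - target))
    return position, history


if __name__ == "__main__":
    final, history = supervised_sampling(start=10.0, target=0.0)
    print(f"initial metric: {history[0]:.2f}, final metric: {history[-1]:.2f}")
```

      In this sketch the metric improves because favourable fluctuations are kept, not because the underlying motion is altered. That is the sense in which such a scheme is called adaptive rather than biased.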

      The text requires improvement in the essential methodological details and cleaning of those parts that are not properly instrumental in method validation.

      While attempting to prove the widest possible applicability of the method, the authors exaggerated the number of examples, which, in spite of the increasing complexity were only summarily described. Please, limit the case studies to AVP binding/unbinding to V2R and the whole process of GDP release from membrane-anchored Gs following activation of GLP1R by danuglipron. The latter case, indeed, involves small ligand binding (danuglipron), small ligand dissociation (GDP), receptor activation, and activated receptor binding to membrane-anchored G protein and G protein conformational transition instrumental to nucleotide depletion, which is already too much. In this framework, the cases of Gs-β2AR and Gi-A2R recognition are redundant. Most importantly, the case of D2R-A2AR heterodimerization and binding to a bi-valent ligand must be eliminated. The reason is that the case is not entirely based on the mwSuMD and the biased protein-protein interface does not rely on high-resolution data (i.e. no structural model of D2R-A2AR dimer has been determined so far). Last but not least, the high intrinsic flexibility of the bi-valent ligand adds further indetermination to the computational experiment. Being too speculative, the case-study does not serve to model validation.

      We thank the Reviewer for the suggestion. In the current revised form, the manuscript focuses on AVP binding/unbinding to V2R and the GLP-1R activation, Gs recognition and GDP release.

      While eliminating the three case studies mentioned above, the remaining ones should be described more extensively and clearly, highlighting the most productive setup for each system. Incidentally, listing the performance parameters (e.g. distribution mode and minimum RMSD) of each simulation setting in Table S1 is worth doing.

      More accuracy in the methodological description is needed.

      As for the supervised metrics, the rationale behind the choice of a particular index and whether it is the outcome of a number of trials must be declared and the selected indices must be better defined. Here there are a few examples.

      AVP-V2R case. It is not clear why the AVP centroids were computed on residues C1-Q4 (I suppose the Cα-atoms) and not on the Cα-atoms of the whole cyclic part (C1-C6). Along the same line, the choice of the Cα-atoms of four amino acid residues to compute the receptor binding-site centroids requires justification.

      We have amended the text to clarify that all the heavy atoms of AVP residues C1-Q4, which are anticipated to bind deep into V<sub>2</sub>R, were considered alongside the V<sub>2</sub>R residues that form the peptide binding site (Cα atoms only). In our experience, including or excluding side chains in the definition of centroids usually does not affect the supervision output. It should only affect the output of mwSuMD simulations based on the RMSD, which considers the specific relative distance from the reference. However, a benchmark of the differences produced by divergent selections is beyond the scope of the present work.

      GLP1R case. The statement: "Since the opening of TM1-ECL1 was observed in two replicas out of four, we placed the ligand in a favorable position for crossing that region of GLP-1R" is rather weak as a strategy to manually (?) define the input position of the ligand.

      As stated in the manuscript, the placement of the agonist in that position was driven by 8 μs of preliminary classic MD simulations that pointed out a possible path for binding. We agree with the Reviewer that there is still some degree of arbitrariness in it and, for this reason, we have not presented structural details of the PF06882961 binding path.

      As for the supervised metrics, what does it mean "the distance between the ligand and GLP-1R TM7 residues L379<sup>7.34</sup>-F381<sup>7.36</sup>"? Was the distance computed between ligand and L379-F381 centroids? Also: "In the supervised stages, the distance between residues M386-L394<sup>Gαs</sup> of helix 5 (α5) and the GLP-1R intracellular residues R176<sup>2.46</sup>, R348<sup>6.37</sup>, S352<sup>6.41</sup>, and N405<sup>7.60</sup> was monitored" was it an inter-centroid distance? Furthermore, "supervising the distance between AHD residues G70-R199<sup>Gαs</sup> and K300-L394<sup>Gαs</sup>" was it the distance between the centroid of the AHD and the centroid of the C-terminal half of the Ras-like domain? In general, when more than two atoms are involved in distance calculation, please, specify if the distance is inter-centroid.

      Also: "During the third phase, the RMSD of PF06882961, as well as the RMSD of ECL3 (residues A368<sup>6.57</sup>-T378<sup>7.33</sup>, Cα atoms), were supervised" was the RMSD computed without superimposing the ligand to estimate its roto-translations?

      We have added details about the selections used for computing centroids throughout the methods section. For example, all the heavy atoms of PF06882961 and the Cα atoms of L379-F381 were considered. RMSD values during GLP-1R activation were computed after superimposition on the Cα atoms of residues 170-240 (TM2, ECL1, and TM3). This has now been specified in the text.
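      The inter-centroid distance discussed in this exchange can be sketched in a few lines. This is a minimal illustration and not the mwSuMD implementation: the function names `centroid` and `centroid_distance` and the toy coordinates are assumptions introduced here, and the centroid is taken as the unweighted mean of the selected atom positions.

```python
import math


def centroid(coords):
    """Unweighted geometric centre of a list of (x, y, z) positions."""
    n = len(coords)
    if n == 0:
        raise ValueError("empty atom selection")
    return tuple(sum(atom[i] for atom in coords) / n for i in range(3))


def centroid_distance(selection_a, selection_b):
    """Distance between the centroids of two atom selections."""
    return math.dist(centroid(selection_a), centroid(selection_b))


# Hypothetical coordinates in Å: two ligand heavy atoms and two pocket Cα atoms.
ligand_atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pocket_atoms = [(1.0, 3.0, 4.0), (1.0, 3.0, 4.0)]
print(centroid_distance(ligand_atoms, pocket_atoms))
```

      Stating which atoms enter each selection (heavy atoms versus Cα atoms only) is what makes such a metric reproducible, which is the point raised by the Reviewer.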

      The authors considered the 7LCJ GLP1R-danuglipron complex as a fully active reference state instead of considering the receptor from a ternary complex with Gs. The ternary complex (7LCI) was indeed considered as a reference only in simulations of receptor-G protein recognition. 

      7LCJ and 7LCI are both fully active states. The main difference is that in 7LCJ, Gs coordinates were not deposited. Indeed, their RMSD computed on the TMD Cα atoms and PF06882961 is 0.63 Å and 0.54 Å, respectively.

      Most importantly, the ternary complex chosen by the authors is not adequate as a reference for simulating the "opening" of the AHD because it bears a miniGs, hence, missing the AHD. In that framework, such an opening is rather vague and was not properly supervised by mwSuMD. The authors must repeat metrics supervisions by using, as a reference, the 6X1A ternary complex, which bears a displaced AHD. This would likely lead to a different path of GDP release.

      To the best of our knowledge, there is no evidence that a specific open conformation of the AHD is linked to GDP release. In support, we note that in GPCR ternary complexes, the AHD is usually not modelled because of its high flexibility. The only body of evidence we are aware of is that AHD must open up to allow GDP release. For this reason, we decided to supervise the distance between AHD and the Ras domain without using a reference.

      In the statement: "The AHD opening was simulated starting from the best GLP-1R:Gs binding mwSuMD replica" the definition "best binding" requires clarification.

      This has been amended, specifying that Replica 2 was considered the “best replica” because it showed the smallest deviation from the cryoEM structure.

      As for the case study on β2-AR-Gs recognition, I strongly suggest eliminating it. However, I'd like to make some comments. The sentence: "the adrenergic β2 receptor (β2 AR) in an intermediate active state was downloaded from GPCRdb (https://gpcrdb.org/)" is vague as it does not indicate what intermediate active state structure was used. Since the goal of the case study was to probe the method in simulating receptor-G protein binding, it would have been better to start with a fully active state of the receptor like the 4LDO structure, employed by the authors only to extract epinephrine.

      mwSuMD is designed to provide insights into structural transitions. We started from an intermediate active state of β2-AR in complex with adrenaline because it resembles the most populated state stabilised by a full agonist according to NMR studies (DOI:10.1016/j.cell.2015.08.045); the fully-active β2-AR conformation is stabilized only after Gs binding. However, following the Reviewer’s suggestion, we have reduced the presented results for the β2-AR-Gs recognition.

      Also in this case, it is not clear if the supervised receptor-G protein distance is between the centroid of the whole 7-helix bundle and the centroid of Gs α5. It is not clear why the TM6 RMSD concerned only the cytosolic end of the helix and did not include the kink region. With that selection, to estimate the outward displacement, RMSD should have been computed without superimposing the considered portion (once all remaining Cα-atoms of the receptors are superimposed).

      As the Reviewer pointed out above, some knowledge of the system is required to set up mwSuMD. Using more generic metrics, as we did in this case, such as the distance between the whole TMD and Gs α5, represents a general approach applicable to other GPCRs that should allow orthogonal metrics to evolve independently from the supervision.

      As now specified in the text, the superimposition for RMSD calculation was performed on the Cα atoms of residues 40 to 140, hence not considering TM6.

      As for the A1R-Gi recognition, as already stated, I strongly suggest eliminating it. However, I'd like to add some comments. I would discourage the employment of an AlphaFold model for simulations deputed to model validation in general and, in particular, when high-resolution structures are available. In this case, the authors would have used the 1GP2 structure of heterotrimeric Gi no matter if from the rat species.

      Following the Reviewer’s suggestion, we have dramatically reduced the results presented for the A1R-Gi recognition. We considered 1GP2 for the simulations, but H5 lacks the C-terminal six residues and therefore some extent of modelling was still necessary. However, we take the Reviewer’s comment on board and will consider it for future work.

      Also, the palmitoylation and geranylgeranylation process is quite tortuous and it is not clear why the NVT ensemble was employed in the second stage of equilibration. This is reflected also on the GLP1R case study.

      We have amended the text to clarify this passage. The second NVT stage is required for stabilizing the G protein and its orientation in the simulation box. The figure below shows that a plateau of the Cα RMSD during the NVT step was reached after 700 ns for both Gi (black) and Gs (orange).

      Author response image 1.

      Here, it is not clear if the RMSD of α5 of Gi was computed with or without superposition.

      The RMSD of α5 was computed after superimposing on the Cα atoms of A<sub>1</sub>R residues 40-140 (the less flexible region of the receptor). We have now amended the text to report this information.
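      Several answers above rest on the same distinction: the atoms used for superimposition can differ from the atoms on which the RMSD is measured, so that the RMSD reports roto-translations relative to the fitted region. The sketch below illustrates this with the Kabsch algorithm in NumPy. It is an assumption-laden example rather than the authors' analysis code: the function names and the synthetic coordinates are hypothetical, and the two atom sets are assumed to be listed in matching order.

```python
import numpy as np


def kabsch_transform(mobile_fit, ref_fit):
    """Rotation R and translation t that best superimpose mobile_fit onto ref_fit."""
    mobile_centre = mobile_fit.mean(axis=0)
    ref_centre = ref_fit.mean(axis=0)
    # Covariance between the centred coordinate sets (3 x 3).
    h = (mobile_fit - mobile_centre).T @ (ref_fit - ref_centre)
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = ref_centre - rotation @ mobile_centre
    return rotation, translation


def rmsd_after_fit(mobile_fit, ref_fit, mobile_sel, ref_sel):
    """Fit on one selection (e.g. receptor Cα atoms), measure RMSD on another
    (e.g. ligand heavy atoms) without refitting the measured selection."""
    rotation, translation = kabsch_transform(mobile_fit, ref_fit)
    moved = mobile_sel @ rotation.T + translation
    return float(np.sqrt(np.mean(np.sum((moved - ref_sel) ** 2, axis=1))))
```

      Because the measured selection is never fitted onto itself, a rigid displacement of a ligand or helix relative to the fitted region shows up fully in the reported RMSD.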

      Reviewer #3 (Recommendations For The Authors):  

      Points to address:

      (1) Root Mean Square Deviation (RMSD) data are often reported as minimum values. It would be useful to provide the average value along the stable part of the trajectories. From the plots in Figure 2ab, it seems that the minimum values reported in the paper are very far from the average ones and thus represent special cases that are seldom reached during simulation. The authors should clarify this point;

      For the revised manuscript, we moved Figure 2 to the supplementary material and added average RMSD values for the most notable replicas in Figures 4e and S8a,b. As a reference, in the text, we now report RMSDs from our previous classic MD simulations (https://doi.org/10.1038/s41467-021-27760-0) of the Gs:GLP-1R cryoEM structure (G<sub>α</sub> = 6.18 ± 2.40 Å; G<sub>β</sub> = 7.22 ± 3.12 Å; G<sub>γ</sub> = 9.30 ± 3.65 Å), which show how flexible G proteins bound to GPCRs are and give better context to the RMSD values we measured during mwSuMD simulations.
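      The difference between a minimum RMSD and an average over the stable part of a trajectory, which is the core of point (1), can be made concrete with a small sketch. Everything here is illustrative: the function name `summarize_rmsd`, the choice of treating the last half of the series as the "stable part", and the RMSD values themselves are assumptions, not data or criteria from the manuscript.

```python
import statistics


def summarize_rmsd(rmsd_series, skip_fraction=0.5):
    """Report the global minimum next to the mean and SD of the trailing part.

    skip_fraction is a hypothetical cut-off: frames before it are treated as
    equilibration and excluded from the average.
    """
    start = int(len(rmsd_series) * skip_fraction)
    stable = rmsd_series[start:]
    return {
        "min": min(rmsd_series),
        "mean_stable": statistics.fmean(stable),
        "sd_stable": statistics.pstdev(stable),
    }


# Made-up RMSD values in Å, for illustration only.
example = [8.0, 6.0, 4.0, 3.0, 5.0, 4.0, 3.0, 4.0]
print(summarize_rmsd(example))
```

      Reporting the two numbers side by side shows at a glance whether the minimum is representative of the sampled ensemble or a rarely visited frame.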

      (2) The RMSD values reported in the paper always refer to single molecules or proteins. It would be useful to also report the RMSD computed over the whole complexes (ligand/GPCR or GPCR/G protein). It would provide a better metric for understanding the general distance between the results and the reference experimental structures;

      We have now removed the results sections for A<sub>1</sub>R and β<sub>2</sub> AR to focus on GLP-1R, whose RMSD is analyzed in detail in Figures 2, 3 and 4.

      (3) A number of computational works investigated the GPCR/G protein interaction and these studies should be cited and discussed. Examples are the works from Mafi et al. 2023 (doi: 10.1038/s41557-023-01238-6), Fleetwood et al. 2020 (doi: 10.1021/acs.biochem.9b00842), Calderon et al. 2023 and 2024 (doi: 10.1021/acs.jcim.3c00805 and doi: 10.1021/acs.jcim.3c01574), Maria-Solano and Choi 2023 (doi: 10.7554/eLife.90773.1), Mitrovic et al. 2023 (doi: 10.1021/acs.jpcb.3c04897), and D'Amore et al. 2023 (doi: 10.1101/2023.09.14.557711). Many of these works focused on the activation of B2AR and the interaction with its G protein. In addition, Maria-Solano and Choi 2023 and D'Amore et al. 2023 also characterized the rotation of TM6 during the A1R and A2AR activation. Therefore, the claim "To the best of our knowledge, this is the first time an MD simulation captures the TM6 rotation upon receptor activation as results reported so far are largely limited to the TM6 opening and kinking55." is untimely;

      We thank the Reviewer for the suggested references. We have added them to the introduction as examples of energy-biased (Calderon et al. 2023 and 2024, Maria-Solano and Choi, Mitrovic et al., D'Amore et al.) or adaptive sampling (Fleetwood et al.) approaches to GPCRs. Since the above articles focus on β<sub>2</sub> AR and A<sub>1</sub>R, we do not discuss them in detail because the results sections for A<sub>1</sub>R and β<sub>2</sub> AR have been drastically reduced in the manuscript.

      We note that among the suggested references, only Mafi et al. report a simulated G protein (in a pre-formed complex) and none of the works sampled TM6 rotation without input of energy. However, we have removed the claim from the text.

      (4) In the discussion section, the authors claim that a distance-based approach can be employed when the structural data of the endpoints is limited. However, the results obtained from the distance-based protocol during the validation of the approach, which was done using V2R as a reference, are unsatisfying, as acknowledged by the authors themselves. For instance, the RMSD mode value reported for the AVP C alpha atoms with respect to 7DW9 is high, 0.7 nm, whereas the minimum value is 0.38 nm. In addition, some side chains are not oriented in the experimental conformation and might have a different interaction pattern with the receptor if compared with the experimental structure. Considering that in this case the endpoint is known, it is plausible that the performance of the method would degrade even further when data about the target structure is limited. In a real case scenario, the ligand binding mode is unknown and in such a case no RMSD matrix can be used. This represents the major concern of this study that is no prediction is provided, but only - rather inaccurate - reproduction of the known structural data;

      The goal of the first part of the work was to compare mwSuMD to SuMD to justify its application on ligand binding using a challenging case study like vasopressin. The general validation of the parent method SuMD as a predictive tool for ligand binding mode has been extensively reported over the years (a few examples: https://doi.org/10.1021/ci400766b ; https://doi.org/10.1021/acs.jcim.5b00702 ; https://doi.org/10.1038/s41598-020-77700-z) and fell beyond the scope of this work. 

      (5) In the discussion, the authors write "A complete characterization of the possible interfaces between GPCR monomers, which falls beyond the goal of the present work, should be achieved by preparing different initial unbound states characterized by divergent relative orientations between monomers to dynamically dock." It would be useful for the reader to refer to and cite here advanced computational approaches that allow a comprehensive sampling of GPCR dimerization independently from the starting conformation of the receptors. One example is coarse-grained metadynamics as shown in doi: 10.1038/s41467-023-42082-z;

      The A<sub>2A</sub>/D<sub>2</sub> receptor dimerization has been removed from the manuscript.

      (6) In many cases, it is not reported how residues missing from the experimental structures used to model the proteins were reconstructed. This information is important, considering that the authors comment on the results of their calculations on addressing these regions, such as in the case of B2AR. Furthermore, the authors did not report how their initial models were validated. The authors should also explain why they did not model the IC loops of A2AR and D2R;

      In the current version of the manuscript, for V2R ECL2 and GLP-1R, we specify that we produced 10 solutions with Modeller and considered the best one in terms of the DOPE score. 

      The only receptor model used, β<sub>2</sub> AR, is now presented as preliminary data focusing on Gs and avoiding any structural detail of the Gs recognition.

      As reported above the A2A-D2 dimerization has been removed from the manuscript.

      (7) In several cases, the authors state that residues never investigated before play an important role in the interaction between different proteins. An example is provided on page 6 for the B2AR/G protein association. Since this claim is quite significant, it would benefit from validation, at least for further calculations such as in silico mutagenesis studies. Another example is at the end of page 10 where the authors report a hidden interaction between D344 and R385 that is pivotal for Gs coupling by GLP-1R. Is there other evidence supporting this result (previously reported literature data, conservation rate of these residues, etc.)?;

      We have removed the supplementary table reporting B2AR/G protein interactions to reduce speculations and added a reference that reports GLP-1 EC50 reduction upon mutation of position 344 to Ala (https://doi.org/10.1021/acscentsci.3c00063).

      (8) The authors should provide a deeper discussion about the conformational rearrangement of GPCR and G protein during the coupling. In detail, the conformational changes of microswitch amino acids of GPCR (e.g., PIF, NPxxY, inactivating ionic lock) and alpha helix 5 of G proteins should be discussed in relation to the literature data and experimental structures;

      We have removed the A<sub>1</sub>R and β<sub>2</sub> AR results to focus on GLP-1R. Key structural motifs in the polar central network and the TM6 kink are analyzed in more detail in Figure 3.

      (9) The chronology of the conformational changes of GLP-1R is arbitrarily chosen. During the simulation, the RMSD values reported in Fig. 3 are high and do not demonstrate the full accomplishment of the simulation of the activation process of the receptor;

      We agree with the Reviewer that the GLP-1R inactive to active transition was not fully accomplished, compared to other work on class A GPCRs. Unlike class A, class B GPCRs represent a challenging system to work with in silico because inactive starting conformations (e.g. 6LN2) are extremely distant from the active one (e.g. 7LCJ, 7LCI or 6X18), as demonstrated in Figure S6 for GLP-1R. Here we report the first attempt to model a class B GPCR activation mechanism starting from the inactive state, and even if not fully achieved, we believe it represents state-of-the-art simulations for this class of receptors.

      (10) It would be helpful for the reader not familiar with the employed technique that the authors explain in one sentence in the main text the pros and cons of using multiple walkers instead of single walker SuMD;

      We thank the Reviewer for the excellent suggestion. In the Discussion, we have now commented that: “more extensive sampling obtainable by seeding multiple parallel short simulations instead of a single simulation for batch”, while in the Methods we explain that “mwSuMD is designed to increase the sampling from a specific configuration by seeding user-decided parallel replicas (walkers) rather than one short simulation as per SuMD. Since one replica for each batch of walkers is always considered productive, mwSuMD gives more control than SuMD on the total wall-clock time used for a simulation. On the flip side, mwSuMD requires multiple GPUs to be the most effective, although any multi-threaded GPU can run more walkers on the same hardware keeping the sampling variety.”.
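      For readers unfamiliar with the idea, the batch logic described in the quoted Methods text (seed several short walkers from the same configuration, keep one as productive, reseed from it) can be sketched on a toy system. This is not the mwSuMD code: a one-dimensional random walk stands in for a short MD run, the supervised metric is the distance from a target at zero, and the names `short_walk` and `multi_walker_supervision` are hypothetical.

```python
import random


def short_walk(start, steps, rng):
    """Stand-in for one short unbiased simulation started from `start`."""
    x = start
    for _ in range(steps):
        x += rng.gauss(0.0, 1.0)
    return x


def multi_walker_supervision(start, n_walkers, n_batches, steps=20, seed=0):
    """Seed n_walkers per batch and continue from the walker with the best metric."""
    rng = random.Random(seed)
    current = start
    history = [current]
    for _ in range(n_batches):
        endpoints = [short_walk(current, steps, rng) for _ in range(n_walkers)]
        # Supervised metric: |x|, i.e. distance from the target placed at 0.
        current = min(endpoints, key=abs)
        history.append(current)
    return history


history = multi_walker_supervision(start=50.0, n_walkers=8, n_batches=30)
print(history[0], history[-1])
```

      Setting `n_walkers=1` reduces the toy to a single short simulation per batch with no selection between alternatives, which is the contrast the quoted passage draws between one walker and many.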

      Minor points to address:

      (11) Page 3: the following sentence is duplicated (also found on page 2) "GPCRs preferentially couple to very few G proteins out of 23 possible counterparts";

      (12) Page 20: Figure S13 refers to the QM validation of PF06882961 torsional angle, not to the image of the receptor conformational changes, which is instead Figure S14 (please correct figure caption).

      We thank the Reviewer for the accurate reading of the manuscript. These typos have been corrected.

    1. eLife Assessment

      This study describes a novel mechanism for how collagen fibrils are formed. The authors present compelling evidence that collagen-I fibrillogenesis relies on a functional endocytic system for recycling collagen-I, with circadian-regulated VPS33b and integrin-α11 being critical for fibril assembly. This is an important study for the understanding of the pathophysiology of collagen fibrillogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      The authors describe that the endocytic pathway is crucial for ColI fibrillogenesis. ColI is endocytosed by fibroblasts, prior to exocytosis and formation of fibrils, which can include a mixture of endogenous/nascent ColI chains and exogenous ColI. ColI uptake and fibrillogenesis are regulated by circadian rhythm as described by the authors in 2020, thanks to the dependence of this pathway on circadian-clock-regulated protein VPS33B. Cells are capable of forming fibrils with recently endocytosed ColI when nascent chains are not available. Previously identified VPS33B is demonstrated not to have a role in endocytosis of ColI, but to play a role in fibril formation, which the authors demonstrate by showing the loss of fibril formation in VPS33B KO, and an excess of insoluble fibrils - along-side a decrease in soluble ColI secretion - in VPS33B overexpression conditions. A VPS33B binding protein VIPAS39 is also shown to be required for fibrillogenesis and to colocalise with ColI. The authors thus conclude that ColI is internalised into endosomal structures within the cell, and that ColI, VPS33B and VIPA39 are co-trafficked to the site of fibrillogenesis, where along with ITGA11, which by mass spectrometric analysis is shown to be regulated by VPS33B levels, ColI fibrils are formed. Interestingly, in involved human skin sections from idiopathic pulmonary fibrosis (IPF) patients, ITGA11 and VPS33B expression is increased compared to healthy tissue, while in patient-derived fibroblasts, uptake of fluorescently-labelled ColI is also increased. This suggests that there may be a significant contribution of endocytosis-dependent fibrillogenesis in the formation of fibrotic and chronic wound-healing diseases in humans.

      Strengths:

      This is an interesting paper that contributes an exciting novel understanding of the formation of fibrotic disease, which despite its high occurrence, still has no robust therapeutic options. The precise mechanisms of fibrillogenesis are also not well understood, so a study devoted to this complex and key mechanism is well appreciated. The dependence of fibrillogenesis on VPS33B and VIPA39 is convincing and robust, while the distinction between soluble ColI secretion and insoluble fibrillar ColI is interesting and informative.

      Weaknesses:

      There are a number of limitations to this study in its current state. Inhibition of ColI uptake is performed using Dyngo4a, which although proposed as an inhibitor of Clathrin-dependent endocytosis is known to be quite un-specific. This may not be a problem however, as the endocytic mechanism for ColI also does not seem to be well defined in the literature, in fact, the principal mechanism described in the papers referred to by the authors is that of phagocytosis. It would be interesting to explore this important part of the mechanism further, especially in relation to the intracellular destination of ColI. The circadian regulation does not appear as robust as the authors' last paper, however, there could be a larger lag between endocytosis of ColI and realisation of fibrils. The authors state that the endocytic pathway is the mechanism of trafficking and that they show ColI, VPS33B and VIPA39 are co-trafficked. However, the only link that is put forward to the endosomes is rather tenuously through VPS33B/VIPA39. There is no direct demonstration of ColI localisation to endosomes (i.e. immunofluorescence), and this is overstated throughout the text. Demonstrating the intracellular trafficking and localisation of ColI, and its actual relationship to VPS33B and VIPA39, followed by ITGA11, would broaden the relevance of this paper significantly to incorporate the field of protein trafficking. Finally, the "self-formation" of ColI fibrils is discussed in relation to the literature and the concentration of fluorescently-tagged ColI, however as the key message of the paper is the fibrillogenesis from exocytosed ColI, I do not feel like it is demonstrated to leave no doubt. Specific inhibition of intracellular trafficking steps, or following the progressive formation of ColI fibrils over time by immunofluorescence would demonstrate without any further doubt that ColI must be endocytosed first, to form fibrils as a secondary step, rather than externally-added ColI being incorporated directly to fibrils, independent of cellular uptake.

    3. Reviewer #3 (Public review):

      Summary:

      Chang et al. investigated the mechanisms governing collagen fibrillogenesis, firstly demonstrating that cells within tail tendons are able to take up exogenous collagen and use this to synthesize new collagen-1 fibrils. Using an endocytic inhibitor, the authors next showed that endocytosis was required for collagen fibrillogenesis and that this process occurs in a circadian rhythmic manner. Using knockdown and overexpression assays, it was then demonstrated that collagen fibril formation is controlled by vacuolar protein sorting 33b (VPS33b), and this VPS33b-dependent fibrillogenesis is mediated via Integrin alpha-11 (ITGA11). The authors also demonstrated increased expression of VPS33b and ITGA11 at the gene level in fibroblasts from patients with idiopathic pulmonary fibrosis (IPF), and greater expression of these proteins in both lung samples from IPF patients and in chronic skin wounds, indicating that endocytic recycling is disrupted in fibrotic diseases. Finally, the authors performed knockdown assays in patient derived IPF fibroblasts to confirm that silencing of VPS33b and ITGA11 results in a decrease in recycling of exogenous collagen-1.

      Strengths:

      The authors have performed a comprehensive functional analysis of the regulators of endocytic recycling of collagen, providing compelling evidence that VPS33b and ITGA11 are crucial regulators of this process, and that this endocytic recycling becomes disrupted in fibrotic diseases.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Overall authors’ response

      We would like to thank the 3 reviewers for a thorough critique of our manuscript, and for acknowledging the novelty and importance of our studies, in particular the relevance to collagen-related pathologies such as idiopathic pulmonary fibrosis and chronic skin wounds. We appreciate that there are shortcomings in these studies, as highlighted by reviewers; we have rewritten parts of our manuscript to clarify any misunderstandings, and conducted additional experiments to address concerns raised by reviewers (please see the red text below within each response), which have been incorporated into our revised manuscript (modified text highlighted in yellow in revised manuscript). We believe that the revision has made our manuscript stronger in support of our original conclusions.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors describe that the endocytic pathway is crucial for ColI fibrillogenesis. ColI is endocytosed by fibroblasts, prior to exocytosis and formation of fibrils, which can include a mixture of endogenous/nascent ColI chains and exogenous ColI. ColI uptake and fibrillogenesis are regulated by circadian rhythm as described by the authors in 2020, thanks to the dependence of this pathway on circadian-clock-regulated protein VPS33B. Cells are capable of forming fibrils with recently endocytosed ColI when nascent chains are not available. Previously identified VPS33B is demonstrated not to have a role in endocytosis of ColI, but to play a role in fibril formation, which the authors demonstrate by showing the loss of fibril formation in VPS33B KO, and an excess of insoluble fibrils - along-side a decrease in soluble ColI secretion - in VPS33B overexpression conditions. A VPS33B binding protein VIPAS39 is also shown to be required for fibrillogenesis and to colocalise with ColI. The authors thus conclude that ColI is internalised into endosomal structures within the cell, and that ColI, VPS33B, and VIPA39 are co-trafficked to the site of fibrillogenesis, where along with ITGA11, which by mass spectrometric analysis is shown to be regulated by VPS33B levels, ColI fibrils are formed. Interestingly, in involved human skin sections from idiopathic pulmonary fibrosis (IPF) patients, ITGA11 and VPS33B expression is increased compared to healthy tissue, while in patient-derived fibroblasts, uptake of fluorescently-labelled ColI is also increased. This suggests that there may be a significant contribution of endocytosis-dependent fibrillogenesis in the formation of fibrotic and chronic wound-healing diseases in humans. 

      Strengths: 

      This is an interesting paper that contributes an exciting novel understanding of the formation of fibrotic disease, which despite its high occurrence, still has no robust therapeutic options. The precise mechanisms of fibrillogenesis are also not well understood, so a study devoted to this complex and key mechanism is well appreciated. The dependence of fibrillogenesis on VPS33B and VIPA39 is convincing and robust, while the distinction between soluble ColI secretion and insoluble fibrillar ColI is interesting and informative. 

      Weaknesses: 

      There are a number of limitations to this study in its current state. Inhibition of ColI uptake is performed using Dyngo4a, which although proposed as an inhibitor of Clathrin-dependent endocytosis is known to be quite un-specific. This may not be a problem however, as the endocytic mechanism for ColI also does not seem to be well defined in the literature, in fact, the principal mechanism described in the papers referred to by the authors is that of phagocytosis.

      We thank the reviewer for pointing this out. Macropinocytosis or phagocytosis can be modelled using high molecular weight dextran. To investigate the involvement of these mechanisms in addition to endocytosis, we used fluorescently-labelled dextran to look for potential co-localisation with exogenous collagen, and found very little co-localisation (revised Figure S2B, lines 123-126). Further, we performed a competition experiment in which unlabelled collagen was added in excess at the same time as labelled collagen, and showed that excess unlabelled collagen led to a retention of labelled collagen at the cell periphery (revised Figure S2C, lines 126-129). This suggests that collagen-I uptake utilises a different pathway from dextran (i.e. fluid-phase endocytosis) and is a receptor-mediated process.

      It would be interesting to explore this important part of the mechanism further, especially in relation to the intracellular destination of ColI.

      We agree with the reviewer that the intracellular destination of ColI is very interesting, which is what the current Chang lab is investigating, although we believe the research findings fall out of scope for the revised manuscript here. However, we have included additional immunofluorescence data to support that collagen is indeed taken up into endosomal compartments using GFP-tagged Rab5 constructs (revised Figure 1D, Figure S6A).

      The circadian regulation does not appear as robust as the authors' last paper, however, there could be a larger lag between endocytosis of ColI and realisation of fibrils.

      The authors state that the endocytic pathway is the mechanism of trafficking and that they show ColI, VPS33B, and VIPA39 are co-trafficked. However, the only link that is put forward to the endosomes is rather tenuously through VPS33B/VIPA39.

      We would like to clarify that we meant the post-Golgi compartment. We did not intend VPS33b/VIPAS39 to serve as an endosome marker; however, because we see collagen entering the cell in intracellular compartments and then being recycled, we take it that, by convention, the endosome would be involved. This is further supported by the partial colocalisation we observe with the classic endosome marker Rab5.

      There is no direct demonstration of ColI localisation to endosomes (i.e. immunofluorescence), and this is overstated throughout the text.

      We appreciate the comment and have modified overstatements in the revised manuscript as appropriate. As stated above, we have included additional immunofluorescence data to support that collagen is indeed taken up into endosomal compartments.

      Demonstrating the intracellular trafficking and localisation of ColI, and its actual relationship to VPS33B and VIPA39, followed by ITGA11, would broaden the relevance of this paper significantly to incorporate the field of protein trafficking. Finally, the "self-formation" of ColI fibrils is discussed in relation to the literature and the concentration of fluorescently-tagged ColI, however as the key message of the paper is the fibrillogenesis from exocytosed ColI, I do not feel like it is demonstrated to leave no doubt. Specific inhibition of intracellular trafficking steps, or following the progressive formation of ColI fibrils over time by immunofluorescence would demonstrate without any further doubt that ColI must be endocytosed first, to form fibrils as a secondary step, rather than externally-added ColI being incorporated directly to fibrils, independent of cellular uptake.

      We appreciate the concern raised here. This is precisely why we trypsinised and replated cells as part of the workflow, so we can make sure that no residual, non-endocytosed exogenous collagen is being incorporated onto pre-existing fibrils. We have new data using flow imaging, which showed that cells that don’t endocytose exogenous collagen have an accumulation of said collagen at the periphery of the cells, which is greatly reduced after trypsinisation. This new data is in a more detailed methodology-based study which is under preparation, which will allow future studies to further dissect the collagen intracellular trafficking process, and thus is not included in the revised manuscript.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, the authors describe a mechanism, by which fluorescently-labelled Collagen type I is taken up by cells via endocytosis and then incorporated into newly synthesized fibers via an ITGA11 and VPS33B-dependent mechanism. The authors claim the existence of this collagen recycling mechanism and link it to fibrotic diseases such as IPF and chronic wounds.

      Strengths: 

      The manuscript is well-written, and experimentally contains a broad variation of assays to support their conclusions. Also, the authors added data of IPF patient-derived fibroblasts, patient-derived lung samples, and patient-derived samples of chronic wounds that highlight a potential in vivo disease correlation of their findings.

      The authors were also analyzing the membrane topology of VPS33B and could unravel a likely 'hairpin' like conformation in the ER membrane. 

      Weaknesses: 

      Experimental evidence is missing that supports the non-degradative endocytosis of the labeled collagen.

      We thank the reviewer for raising this. We would like to clarify that we do not think that all endocytosed collagen-I is recycled, but rather that it is sorted in the endosome, which determines the fate of endocytosed collagen. Interestingly, results from Kadler’s group have shown that blocking lysosome function (through chloroquine and bafilomycin) significantly reduced endogenous collagen fibril formation (https://www.biorxiv.org/content/10.1101/2024.05.09.593302v1), suggesting a non-degradative role for the lysosome in fibrillogenesis.

      The authors show and mention in the text that the endocytosis inhibitor Dyngo®4a shows an effect on collagen secretion. It is not clear to me how specific this readout is if the inhibitor affects more than endocytosis. This issue was unfortunately not further discussed.

      We thank the reviewer for this comment and have included a discussion of the specificity of Dyngo4a (revised manuscript lines 383-392). The Ponceau stain suggests that Dyngo4a treatment did not affect global secretion and thus the effects are specific to collagen-I (Fig 2B).

      The authors use commercial rat tail collagen, it is unclear to me which state the collagen is in when it's endocytosed. Is it fully assembled as collagen fiber or are those single heterotrimers or homotrimers?

      We apologise for the confusion and will clarify in our revision. These would be single helical trimers from acid-extracted rat tail collagen. We have performed additional light scattering and CD spectra to confirm the molecular weight and helicity, and confirm that adding fluorescent tags did not alter the readout. We have included this in the revised manuscript (revised Figure S1A-C, manuscript lines 82-86).    

      The Cy-labeled collagen is clearly incorporated into new fibers, but I'm not sure whether the collagen is needed to be endocytosed to be incorporated into the fibers or if that is happening in the extracellular space mediated by the cells.

      We appreciate the concern raised here, which was also raised by reviewer 1. As answered above, this is why we trypsinised and replated cells as part of the workflow, so we can make sure that there is no residual exogenous collagen being incorporated onto pre-existing fibrils. We also have new data using flow imaging, which shows that cells that don’t endocytose exogenous collagen have an accumulation of said collagen at the periphery of the cells, which is greatly reduced after trypsinisation. This new data is in a methodology-based manuscript which is under preparation, and thus will not be included in the revised manuscript.

      In general for the collagen blots, due to the lack of molecular weight markers, what chain/form of collagen type I are you showing here?

      Apologies for the lack of molecular weight markers; this was an oversight on our part, and they have been included in the revised figures.

      Besides the VPS33B siRNA transfected cells the authors also use CRISPR/Cas9-generated KO. The KO cells do not seem to be a clean system, as there is still a lot of mRNA produced. Were the clones sequenced to verify the KO on a genomic level?

      Yes, the clones were verified and used in our previous paper on circadian control of collagen homeostasis. There are instances where despite knockout at the protein level, mRNA is still persistent; however these transcripts are likely then directed to degradation through nonsense-mediated mRNA decay. To fully understand this mechanism is beyond the scope of this paper. 

      For the siRNA transfection, a control blot for efficiency would be great to estimate the effect size. To me it is not clear where the endocytosed collagen and VPS33B eventually meet in the cells and whether they interact. Or is ITGA11 required to mediate this process, in case VPS33B is not reaching the lumen?

      This is an interesting question. We have conducted experiments with Col1-GFP11 containing conditioned media incubated with VPS33b-barrel in the revised paper, which showed that they interact within the cell and not at the cell periphery (revised Figure 6G, lines 293-296), again highlighting that VPS33b is not involved in the endocytosis step but interacts with endocytosed collagen-I intracellularly. We have attempted colocalisation studies using the split GFP approach with VPS33B and ITGA11 to investigate where they interact, but as the ITGA11 construct we used did not localise to the cell surface as expected, we are not confident that this system is appropriate for investigating how/if VPS33B interacts with ITGA11, and there is simply no good antibody for VPS33B staining.

      The authors show an upregulation of ITGA11 and VPS33B in IPF patients-derived fibroblasts, which can be correlated to an increased level of ColI uptake, however, it is not clear whether this increased uptake in those cells is due to the elevated levels of VPS33B and/or ITGA11.

      We would like to clarify here that we do not think collagen-I uptake is due to VPS33B and/or ITGA11, as siRNA against ITGA11 and VPS33B in fibroblasts showed no consistent changes in uptake as determined by flow cytometry, which was included in the original manuscript (now revised Figure 6H, 7I). VPS33B and ITGA11 are involved in the ‘outward’ arm of recycled collagen-I, i.e. directing it to the fibrillogenesis route. We agree that the inclusion of additional functional studies using IPF patient-derived fibroblasts would add to the manuscript, and have performed siRNA against VPS33B and ITGA11 on IPF fibroblasts, and demonstrated a lack of endocytic recycling events (revised Figure 8D, S6B, lines 351-353).

      Reviewer #3 (Public Review): 

      Summary: 

      Chang et al. investigated the mechanisms governing collagen fibrillogenesis, firstly demonstrating that cells within tail tendons are able to uptake exogenous collagen and use this to synthesize new collagen-1 fibrils. Using an endocytic inhibitor, the authors next showed that endocytosis was required for collagen fibrillogenesis and that this process occurs in a circadian rhythmic manner. Using knockdown and overexpression assays, it was then demonstrated that collagen fibril formation is controlled by vacuolar protein sorting 33b (VPS33b), and this VPS33b-dependent fibrillogenesis is mediated via Integrin alpha-11 (ITGA11). Finally, the authors demonstrated increased expression of VPS33b and ITGA11 at the gene level in fibroblasts from patients with idiopathic pulmonary fibrosis (IPF), and greater expression of these proteins in both lung samples from IPF patients and in chronic skin wounds, indicating that endocytic recycling is disrupted in fibrotic diseases. 

      Strengths: 

      The authors have performed a comprehensive functional analysis of the regulators of endocytic recycling of collagen, providing compelling evidence that VPS33b and ITGA11 are crucial regulators of this process. 

      Weaknesses: 

      Throughout the study, several different cell types have been used (immortalised tail tendon fibroblasts, NIH3T3 cells, and HEK293T cells). In general, it is not clear which cells have been used for a particular experiment, and the rationale for using these different cell types is not explained. In addition, some experimental details are missing from the methods.

      We thank the reviewer for pointing out the lack of clarity, and have filled in missing information in the methods. HEK293T cells were used for virus production for the VPSoe system, and we have clarified the cell types used in figure legends (predominantly iTTF). We have also provided justification when NIH3T3 cells were used (revised lines 290-291).    

      There is also a lack of functional studies in patient-derived IPF fibroblasts which means the link between endocytic recycling of collagen and the role of VPS33b and ITGA11 cannot be fully established.

      We thank the reviewer for this comment, which was also raised by reviewer 2 above. We agree that the inclusion of additional functional studies using IPF patient-derived fibroblasts would add to the manuscript and have performed siRNA against VPS33B and ITGA11 on IPF fibroblasts, and demonstrated a lack of endocytic recycling events (revised Figure 8D, S6B, lines 351-353).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The authors inhibit Clathrin-dependent endocytosis with dyngo4a. It is well known that this inhibitor is not highly specific for this pathway. It is also not explained why the authors only inhibit the Clathrin uptake pathway, and not pinocytosis or Clathrin-independent endocytosis too. The authors refer to papers that describe pinocytosis for collagen endocytosis.

      We thank the reviewer for raising this question. Because inhibition of the clathrin-dependent pathway does not completely abrogate endocytosis of collagen-I, we anticipate that other pathways are involved in mediating collagen-I uptake, although additional data suggest that uptake is unlikely to occur through fluid-phase endocytosis and is instead receptor-mediated (revised Figure S2B, C).

      Where does the ColI go in the cell? Depending on the uptake pathway, it is likely to pass through endocytic carriers to endosomes, where it may be recycled to the PM or degraded. From the start, the authors describe the ColI as being in vesicular structures, however, the imaging data that this is based on is not co-labelled with anything to determine the potential structure/localisation. This is not done at any point in the paper, until IF is shown of ColI with VIPA39, however without the relevant controls, this IF is unconvincing, as the general pattern of ColI and VIPA39 as an endosomal marker are not classically recognisable. Additionally, VPS33B is described as a late endosome/lysosome marker, which would have different connotations on ColI trafficking or destination than other types of endosomes.

      We thank the reviewer for pointing out the weaknesses in our original IF. We have included new confocal images showing labelled collagen co-localisation with GFP-tagged Rab5 through transient transfection, which is a more traditional endosome marker (revised Figure 1D, Figure S6A).  

      We are currently characterising the compartments to which ColI is trafficked, which is being prepared as part of a methodology-based manuscript. We believe that this characterisation would be too detailed to be included in a revised version of this manuscript. The Kadler lab also have data suggesting that the lysosome is involved in collagen fibrillogenesis instead of its canonical degradation function, which is in another submitted manuscript (https://www.researchsquare.com/article/rs-1336021/v1). It was not included in this manuscript due to our focus (i.e. endocytic recycling).

      In Figure 5H, the pattern of Cy5-ColI staining looks like it could even be ER/Golgi in the VPSKO zoom panel, but in the absence of co-labelling, we cannot conclude anything. In order for the authors to conclude that ColI is within the endosomes, co-labelled IF should be performed to demonstrate ColI-endosomal colocalization. Likewise for the role of VPS33B in ColI fibrillogenesis: dependence of the process is demonstrated, but the relationship is not defined. This could be clarified using IF. This would also support the authors' statements of co-trafficking between ColI, VPS33B, and VIPA39, which as the paper stands, is not demonstrated.

      We would like to clarify that our hypothesis is that the endosome controls how collagen is being deposited outside the cell, i.e. whether it’s protomeric secretion or fibrillogenesis, and that the decision of whether an endocytosed collagen is recycled or degraded lies in this compartment. The reviewer is correct that it may not be just the endosome that endocytosed collagen-I ends up in, as we have new data suggesting involvement of other intracellular compartments, although the detailed mechanism is beyond the scope of this manuscript. Nonetheless, we have included new data showing co-localisation of endocytosed collagen with Rab5 in this revised manuscript (revised Figure 1D, Figure S6A).

      The basis of this paper is that endocytosis of ColI must occur before re-exocytosis as fibrillar ColI. The authors show this through pulse-chase experiments, with a trypsinisation step to remove any externally bound ColI. The authors also show nice time progression by flow cytometry, but it would truly demonstrate this point if they showed 0 timepoint, or low timepoint of IF to show progressive lengthening of ColI fibrils. This is used early on in Figure 1D, although the presentation here is not very clear. This is especially important as the authors address the self-seeding capabilities of Collagen in cell-free conditions in Figure 1F.

      We would like to thank the reviewer for this suggestion. From previous endogenously tagged collagen data, we know that the appearance of collagen fibrils is rather rapid, thus it may not be a gradual lengthening as expected, but rather a depletion of endocytosed collagen in the initial seeding/growth step (please see https://www.researchsquare.com/article/rs-1336021/v1). We have included an image of replated fibroblasts after 18 hours showing no appearance of extracellular collagen, endogenous or otherwise (revised Figure S2A, line 110).

      Finally, although the involvement of ITGA11 is interesting, it is not well described, and its role is not well demonstrated. This could likely be clarified by an additional introduction to ITGA11 and its role in collagen exocytosis/fibrillogenesis.

      We would like to thank the reviewer for pointing this out and have included additional sentences to specifically introduce ITGA11 and its role in fibrillogenesis (see lines 320, 321; 446-450).  

      Specific points: 

      Line 73: You haven't compared reuse vs production, so you can't say that reuse is central rather than production. They may be both as important or production still may be the most crucial, maybe it depends on cell/collagen type. Using the ColI KD or CHX to block nascent synthesis, you could directly compare the impact of both.

      We would like to clarify that we are not referring to reuse/recycling here. We meant that production of collagen (i.e. single hetero/homotrimer molecules within the cell) is not as crucial as the utilisation (i.e. are these being secreted as protomers, or assembled into fibrils) of these building blocks by the cells, which was supported by our finding that production (as suggested by mRNA levels) in IPF fibroblasts is similar to that in control fibroblasts (now revised Figure 8A). We conducted ColI siRNA to block nascent synthesis in the original manuscript and showed that fibroblasts can efficiently make new fibrils by recycling exogenous collagen (Figure 3B, C), although we appreciate that siRNA may not completely inhibit endogenous production. Thus, we have also included new data using collagen-I knockout cells to support our hypothesis that without endogenous production, fibroblasts can still effectively make collagen fibrils if they can reuse what is available in the extracellular space (revised Figure 4, Figure S3C, D; lines 178-199).

      Lines 83-87: The rationale for this experiment is not clear. Cy3-ColI is added, taken up into cells, and incorporated into fibrils coming from cells. 5FAM-ColI is added at a later stage, then at 2 days (when incorporation is demonstrated in Fig 1B), it is also incorporated into cells as expected. Why does this comment on ColI not being degraded any more than Cy3-ColI alone?

      We believe that the pulse chase experiment using the differently tagged collagen demonstrated a dimension of dynamics that is not demonstrated with Cy3-ColI alone. In this case, Cy3-ColI was initially added, and removed after 3 days; 5FAM-ColI is then added and incubated for 2 more days. Thus after 5 days since the initial pulse, the Cy3-ColI persisted and was not degraded. We would like to apologise for causing this confusion, and have clarified in the revised manuscript (lines 542-549; Figure S1D figure legend).  

      Figure 1A: I would like to see a negative control: either dark colI or no Cy3-Col, or timescale. Is B quantified from these images?

      We thank the reviewer for this comment. We have added the no-collagen control image in our revision (revised Figure S1D). 1B is not quantified from the ex vivo tendon experiments, but rather the in vitro cell culture experiments (i.e. those from 1D-1F, although they are all from independent experiments).

      Figure 1B: in iTTF cells (immortalised tendon cells) Corrected to max: What does that mean?

      As there are variations between individual experiments (e.g. changes in the amount of collagen added due to pipetting), we have normalised to the maximum value obtained in each individual experiment so that we can display all biological repeats within the same graph.
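
      As an illustration only (this is not the authors' analysis code), the per-experiment "corrected to max" normalisation described above can be sketched as follows. The function name and the readings are hypothetical and chosen purely to show the arithmetic: each repeat is divided by its own maximum, so repeats with different absolute signal end up on a common 0-1 scale.

```python
def normalise_to_max(values):
    """Scale one experiment's readings so that its largest value becomes 1.0."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("maximum must be positive to normalise")
    return [v / peak for v in values]


# Hypothetical readings (arbitrary units) from two independent repeats.
# The absolute values differ, e.g. because slightly different amounts of
# labelled collagen were added, but the relative time course is the same.
experiments = {
    "repeat_1": [10.0, 20.0, 40.0],
    "repeat_2": [3.0, 6.0, 12.0],
}

# Normalise each repeat to its own maximum before plotting them together.
normalised = {name: normalise_to_max(vals) for name, vals in experiments.items()}
```

      After this step the two hypothetical repeats overlay exactly, which is the reason for normalising within each experiment rather than across the pooled data.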

      Figure 1C: You can't say ColI is in vesicular structures from this, they are spots, yes, but that could also be in Golgi/ER (unlikely to be cytosolic but not impossible).

      We appreciate this comment and have changed the wording accordingly; we now call them intracellular/punctate structures.

      Figure 1D: Not the best presentation: The cell mask has structures: what are these? It's not clear if this is a single cell, would be better with a defined marker (endocytic marker, lysosome etc). Instead of a low-resolution 3D view, it would be clearer with normal confocal XY and zooms of "vesicular structures" using appropriate markers as 3D reconstructions I think it could be removed.

      This is a single cell and the cell mask is staining the plasma membrane. We didn’t use a defined marker as we wanted to visualise the whole intracellular cell compartment. We appreciate that further proof is needed to verify the location of the endocytosed collagen, and have included additional confocal imaging data to support the localisation of collagen into Rab5-positive intracellular compartments (revised Figure 1D, Figure S6B).

      Figure 1 E/F: Cy3 is only visible in extracellular structure, not also intracellular. Why? Would be useful to see the time points of incorporation at the end of the pulse, then at an early point into the chase, to demonstrate 1) Cy3-ColI uptake into cells and progressive incorporation rather than potential direct binding of ColI-Cy3 to ECM, or other non-specific factors. Showing the image at 0t would demonstrate an absence of external labelled colI and therefore its appearance later could be presumed that it had been internalised before.

      As the cells were trypsinized and replated after one hour of labelled-collagen feeding to ensure we are only tracking endocytosed collagen, t=0 in this case would be cells that are unattached. We have instead included images taken 18 hr after replating to show the baseline level of collagen (revised Figure S2A, line 110).

      Figure S1A: yellow box: doesn't show only Cy3-ColI, there is red and yellow in the central cell, and large yellow blobs in the cell above. These images do not support this claim, including the Fiber Zoom box. They should also be shown in single channels to demonstrate the authors' points better.

      Apologies for the confusion – this is to show that newly added 5FAM-ColI also co-localises with previously endocytosed Cy3-ColI, i.e. the Cy3-ColI is persisting rather than being degraded.

      Line 92: endocytosed into distinct structures: These images are very vague, but I don't think you can call them distinct structures, all you can say from this is that they are spots.

      We have changed the wording to ‘distinct puncta’.  

      It is not clear why the authors use Cy3, Cy5, and 5FAM labelled colI. A brief explanation would be useful.

      Apologies for the confusion; we initially included our justification (to show that the fluorescence labels do not change the way collagen is internalised) but removed it in the final manuscript due to length. We have added the justification back (revised lines 101-102).

      Figure 1F: It would be useful to see a quantification of the Cy3 channel here: I agree with the conclusions, and find the 0.5 ug/ml condition more convincing than 0.1 actually, although there is some faint Cy3 in cell-free samples there seems to be quite a big increase in the presence of cells, and this would look more convincing if quantified.

      We thank the reviewer for this suggestion and have included quantification in the revised manuscript (revised Figure 1G-I).  

      Figure 2B: Dyng is not an abbreviation of Dyng. Standardise Dyng/Dyngo/Dyngo4a. WB is soluble colI and represents little (if any) insoluble col. IF is more or less the other way round. How do they compare this?

      Thank you for pointing out the inconsistencies; we have corrected these in the revised manuscript. We took the conditioned media from the same experiment where cells are fixed for IF and carried out Western blot analyses. The IF showed some collagen still present, albeit significantly reduced. This is in agreement with the western blot results (i.e. Dyngo4a inhibits both soluble and insoluble forms of collagen deposition).

      Figure 2C: not an image series. Quant: no cells/independent exps and STATS?

      Apologies for the missing experimental details in figure legends; it should say ‘representative of N=3 experiments’. We are not sure what the reviewer meant by Figure 2C not being an image series, as we meant it to be an image series of the individual fluorescence channels. We have changed this terminology to avoid confusion, and have included statistical analyses in the methods section. The statistical analyses of the fibril quantification are shown next to the fluorescence images.

      Figures 2D/E: The authors show that internalised ColI peaks at 20h and decreases to 60h, Fibers peak at 40h. How is this measured? ECM removed? Why would there be less in the cells, degradation? What's the synchronisation?

      We apologise for omitting the synchronisation method in the methods section, and have included it in our revised manuscript (revised lines 542-544). This is through dexamethasone addition (and removal after 1hr incubation) as standard. The internalised Col-I is measured using Cy3-ColI, so the cells would have both nascent and external collagen. Total intracellular collagen at the different time points would likely be higher than represented as a result, but here we are demonstrating that internalisation is a rhythmic event using the external labelled collagen. Fibers are measured using standard IF and then fibril counting.

      Please note that we are only overlaying the two graphs to form our hypothesis that endocytosis may be used for accumulation of collagen protomers that then allows for efficient fibrillogenesis. They are not directly comparable as the quantifications are of different things (internalised Cy3-ColI, total collagen fibrils). We have clarified this in our discussion (revised lines 399-401).

      Discussion: Where does the ColI go? Solubilised? Degraded? Taken up by other cells? 

      The inverse correlation is not very tight. In fact, at 38h where fiber count peaks, Cy3-ColI also peaks (esp in normalised data, Figure S2D).

      We thank the reviewer for this comment and have reworded our main text to reflect this, and included additional discussion in our revised manuscript (revised lines 401-404).  

      Line 123: What is the turnover rate of Fibrils? Don't know for how long the transcription has been done, or when this would affect the fibril number. You have the quant for Fn1, where is the quant for ColI?

      We have included the quantification of collagen-I in original Figure 2A. We appreciate that Figure 2C might cause confusion (as we co-stained ColI and Fn1 in the same experiment), so we have removed the collagen-I panel from the revised Figure 2C. We know from previous results that the number of fibrils fluctuates over a 24-hour period, although the turnover of one specific fibril is unlikely to be 24 hours (https://www.biorxiv.org/content/10.1101/331496v2).

      Line 124: no accumulation of col in extracellular space, but you don't know how much endogenous colI (or other endogenous ECM proteins) they're taking up as it isn't measured here. If the author wants to comment on this, should use either exogenous col to monitor take up and resection or block transcription/translation to show fibril formation endo/exocytosis independent of endogenous synthesis.

      This experiment has been done in the original manuscript – siCol1a1 experiment was done with two rounds of siRNA, first round is normal transfection followed by reverse transfection onto fresh coverslips (this will ensure no prior ECM is being deposited, see Figure 3). However we appreciate that there may still be low levels of endogenous collagen-I, and thus have included new data using collagen-I knock-out fibroblasts to strengthen our findings (revised Figure 4).  

      Line 142: Why is fibronectin synthesis also decreased in Col KD? This is clear in the image but no explanation/reference is given.

      Due to the dynamic and complex nature of ECM, it would be unsurprising if there were a knock-on effect when knocking down one matrix protein. However, we have quantified the amount of fibronectin fibril deposited by scr and siCol1a1 fibroblasts, and showed that there was in fact no significant change between the two treatments (revised Figure 3A).

      Figure 3A: Need labels for which colour/protein is shown. Needs quantifying, especially as the Fn1 decrease is not so obvious here, it is consistent between Figure 3A and 2C?

      We have provided quantification in the revision (revised Figure 3A). Figure 3A and 2C are two separate experiments (one is Dyngo treatment and one is siCol1a1), and neither showed significant changes in fibronectin fibril areas.   

      Figure 3B: Line 151: the text states that "The observation of fibrillar Cy3 signals in siCol1a1 cells showed that the cells can repurpose collagen into fibrils without the requirement for intrinsic collagen-I production (red arrow Figure 3B)", however, there is clearly endogenous colI here too (along the fiber and also strongly at each end). Does the ColI antibody recognise the exogenous ColI?

      In our hands the ColI antibody does not recognise exogenous ColI, as the cell-free Cy3-ColI images were also stained with ColI antibody to ensure the two experimental conditions were treated exactly the same.

      This conclusion could only be made in the true absence of collagen: either in knock-out cells, or where collagen production/trafficking has been blocked (ie knockout of ColI chaperone or ERES block), or in a cell type that produces collagens but not ColI. Alternatively, if there are any fibrils seen that are completely negative, they should be shown in the figure and quantified (number of Cy3-ColI+-ColI+ vs Cy3-ColI+-ColI-).

      We thank the reviewer for this suggestion. We have included new data from collagen knock-out fibroblasts in this revision (revised Figure 4).  

      Figure S4A: the quality of this blot isn't very high, the result is not very clear and the high intensity (unspecific?) band below confounds the interpretation. In the author's previous paper (NCB 2020) the blots for VPS33B were much clearer, as is Fig S4D. It would be nice to include a clearer blot, maybe from the other repeats.

      This is the only blot that we used to select which knockout clones to use for our previous paper, which is why the quality is not as high. Knockout clones were all verified with additional western blots, and we do not think that endogenous VPS33b is expressed at high levels (also verified by MS analyses).  Fig S4D is overexpression of VPS33b, which is much easier to detect.  

      Figure S4D: This blot is much clearer, it would be useful to include a high gain to show the VPS33B band in CT to be able to understand the true increase.

      From the qPCR data one can see that the increase at the mRNA level is 20+ fold; we’ve always had problems trying to detect endogenous VPS33b using western blot or mass spectrometry analysis.

      Figure 4A: The fibrils here in the CT are not obvious, and the difference between CT and KOs is not appreciable. Would this be clearer shown at a lower magnification, with zooms where needed? Or immunogold labelling/CLEM to label the ColI?

      It is not trivial to carry out immunogold labelling/CLEM. These are cell-derived matrices in culture and thus lower magnification may not show as many collagen fibrils as one would expect. We are not confident that lower magnification will provide more information as the characteristic D-banded collagen pattern will be lost.  

      Line 167/Figure 4B: It looks like there is more internal ColI in KO, but the images are not good enough to tell. This could be better shown by flow cytometry.

      We have previously seen that VPSKO leads to accumulation of collagen-I in intracellular puncta (NCB2020), which is also seen here. Flow cytometry data for internalisation of external collagen is already included in original Figure 5G (revised Figure 6H).

      Again you mention intercellular vesicles, but based on these images, it is not possible to conclude this. These large spots could be aggregation elsewhere in the cell. Specific localisation should be shown by co-labelled IF/confocal, or it could be nicely shown by EM + fluorescent element (CLEM / Immunogold), or these statements removed from the text.

      We appreciate that the term ‘vesicles’ is very defined in the trafficking field, and have changed it to ‘intracellular compartments’.  

      Line 173-174 / Figure 4E: Why do you think the matrix mass is not increased in VPSoe by the approach shown in E when there is seemingly a huge increase by IF? E must also measure other ECM matrix proteins, which do you expect to be secreted by these cells? Could this confound the data if they too are affected by VPSoe?

      IF is showing specifically collagen-I. Hydroxyproline detects multiple collagens, and shows a trend of increase (although not significant due to one outlier). Matrix mass is a very generic measurement of total ECM deposited based on decellularized ECM weight. The reviewer is correct that VPSoe may also affect other ECM deposition; however, here we are focusing specifically on its effect on collagen-I. How VPSoe changes other types of ECM deposition would be something that could be addressed in future studies and is not within the scope of this manuscript.

      Are the results in E paired?

      Individual values between control and VPSoe in each separate experiment are paired.

      Figure 4F: Is quantification from IF shown in D? Specify which kind of microscopy it is based on.

      Quantification is based on fibril counting using standard fluorescence microscopy, as used in our previous paper. D is independent of F, as F is specifically looking at synchronised circadian effects, whereas in D (and elsewhere) we are looking at global collagen deposition effects, irrespective of what time of day the cells are in.

      Figure S5F: What do the yellow/red spots in the blots represent?

      We apologise for the initial unclear description of what the yellow/magenta circles depict in relation to the phosphoimages of the radiolabelled cell free translation products displayed in Supplementary Figure 5, panels F, G and I. These circles indicate non-glycosylated (yellow) and N-glycosylated (magenta) species respectively, as is now clearly described in the revised manuscript.

      Figure 5 title: You can't conclude this from these images, need confocal and PM or cytosolic marker.

      We have changed the title to ‘VPS33B co-trafficks with collagen-I’. There is no good commercial VPS33b antibody for immunofluorescence staining, which is why we used the split GFP approach in this paper, and the images were acquired using confocal imaging (Olympus SpinSR system).

      Figure 5E: The authors describe that ColI is in endosomes throughout most of the paper, and this is based on the involvement of VPS33B in the colI pathway. VPS33B is thought to be at the late endosome/lysosome. However, these images do not look like classic endosomes or lysosomes, or other normal organelle IF phenotypes. The fluorescent intensity looks saturated, and it is difficult to conclude anything from these images. It is unclear where in the cell the largest blob in the zoom would be localised and in which cell. I would suggest that this image is replaced and proper controls included (IgG controls and single channels) as well as using different markers for other potential intracellular structures.

      We appreciate the reviewer's comment with regards to the classification of VPS33b localisation in the endosome compartment. We did not mean to use VPS33b as an endosome marker, as the focus of our studies is the function of VPS33b in directing endogenous or exogenous collagen to fibrillogenesis. With live imaging we could see endocytosed collagen moving in intracellular compartments, and have conducted additional staining to show co-localisation with Rab5 (revised Figure 1), which we take to indicate, through convention, that it is occupying an endosome compartment. We have included single channel images in the revised manuscript (revised Figure 6E).

      Line 255/ Figure 5G: no consistent change in uptake. Why are the results so varied in the KO and oe, here and in Fig 4C/E? N=4, what does that mean? 4 cells? 4 independent exps?

      In all cases, “N” represents independent biological experiments in this manuscript. Thus “N=4” in this case is 4 independent biological experiments, with at least 10,000 cells analysed per experiment. 

      We do not know why there is variation in the response; however, that is also why we concluded that it is unlikely that VPS33B is directly involved with collagen uptake. We have changed 5G (now revised Figure 5H) to a paired line graph for better representation.

      Figure 5H shows the uptake of Cy5ColI. At this resolution, VP2ko looks like the col is ER, in one of the cells in the zoom, it looks like it is at Golgi. I think that the uptake route of ColI needs to be better defined, as there is no way to tell here where the colI goes. ColI being recycled/degraded would be most likely. But this figure looks like that might not be the case. It is also not clear where the zooms come from, they should be indicated with dashed boxes in the lower mag image

      We thank the reviewer for this comment, and agree that we need to define the uptake route of ColI. This is currently being assembled as a methodology manuscript, and how ColI is being recycled/degraded is one major research area of the Chang lab. 

      We have added dashed boxes in the lower mag images to indicate where the zooms are derived from. We would also like to thank the reviewer for pointing this out, as we realised we had accidentally cropped the image to a slightly different area for the VPSko image, and have now corrected this.

      Line 257: Based on this data, it could be trafficking through the cell as well as into the extracellular space.

      We think that VPS33B is involved in trafficking collagen through the cell to the plasma membrane but is not itself secreted: in our split-GFP experiment we never observed extracellular GFP signal, which suggests that VPS33b is not deposited extracellularly.

      Line 259: "highlighting the role in recycling col to fibril formation sites" is an overstatement based on the data shown here, there is no data on colI trafficking or its regulation

      We respectfully disagree that we have not shown data on col-I trafficking or regulation by VPS33b: split GFP highlighted cotrafficking to the plasma membrane, and we have shown a clear relationship between VPS33b and collagen-I fibril formation, with minimal changes to collagen-I mRNA levels. We acknowledge that we have not shown specifically the location of VPS33b at fibrillogenic sites and have modified this statement in the revised manuscript (revised line 302).

      Line 262: "Having identified VPS33B as specifically driving collagen-I fibril formation" is also an overstatement.

      We refer here to the data showing that VPS33b does not control collagen-I secretion (as demonstrated by the CM westerns) but specifically controls fibrillogenesis. We have clarified this in the revised text (revised line 304).

      Line 286: It would be useful to have a brief intro to PLOD3.

      We have included a brief intro to PLOD3 in the introduction, as well as the results highlighted by the reviewer, in our revised manuscript (revised lines 54-58).

      Line 289/290: There could be other explanations for disruption to exo-endocytosis when disrupting col trafficking. Is VPS33B controlling exocytosis in general? Why should it be specific to col? Likewise with siITGA11 KD? Hypothesis for ITGA11 and fibrillogenesis?

      The relationship between ITGA11 and collagen fibrillogenesis is currently in a manuscript by Donald Gullberg and Cedric Zeltz, under revision at Matrix Biology (see reference 63 in revised manuscript). We do not think that VPS33b is controlling exocytosis in general, which is supported by the minimal change in Ponceau stain of the western blots in the manuscript. Previously it has been shown that VPS33B co-trafficks with PLOD3, a collagen-I modifier.

      Figure 6I: Why only quant Scr + siITGA11, not in VPSoe? It looks like there is still an increase in intracellular or fibril formation in VPSoe + siITGA11, which would be a key result to discuss.

      We would like to clarify that 6I (now revised Figure 7I) is on the endocytosis of exogenous collagen-I, not quantification of Figure 6H.  

      Line 307: Discuss fibrillogenic sites, what are they?

      As we have not shown direct evidence of VPS33B delivering endocytosed collagen at the site of fibrillogenesis, we have decided to alter the text to avoid overstatement, as suggested from previous reviewers’ comments.  

      Figure 8: What does pentachrome label?

      Pentachrome staining allows for simultaneous staining of multiple species: collagen in red, sulphated mucopolysaccharides in violet, red blood cells in yellow, muscle in orange, nuclei in green.

      Line 326: "In this study we have identified the endosome as a major protagonist in..." This is an overstatement and cannot be drawn from this data.

      We have modified this statement to “In this study we have identified an endocytic recycling mechanism for type I collagen fibrillogenesis that is under circadian regulation”.

      Line 330/331: "Collagen-I co-traffics with VPS33B in a VIPAS-containing endosomal compartment that directs collagen-I to sites of fibril assembly," This is also an overstatement that cannot be drawn from this data.

      We have modified this statement to “Collagen-I co-traffics with VPS33B to the plasma membrane for fibrillogenesis”.  

      Line 340: again, the demonstration of the involvement of the endocytic pathway is very limited.

      We have provided new evidence in the revised manuscript that support the involvement of classical endosomal compartments.  

      Line 366: You cannot conclude this; you have not manipulated these proteins to show a functional effect or modulation of fibrillogenesis, so it could still be a secondary effect.

      We have provided new evidence in the revised manuscript that supports this conclusion. 

      Line 569: "Unless otherwise stated, incubation and washes were done at room temperature." Which incubations? Specify if this is just post-fixation during the EM prep or during cell culture.

      This is specific to the EM preparation and we have clarified in the revised manuscript (revised line 663).  

      Small text alterations:

      Overall we would like to thank the reviewer for highlighting these errors and mistakes in our manuscript, and have corrected them in our revised manuscript.  

      Figure 1E: Fluoro image series? This is only one image.

      We wrote this to mean single-channel images; we have corrected the terminology.

      Line 111: Ref for Dyngo4a?

      We have included this in the revised manuscript.

      Line 121: introduction/abbreviation definition for Fn1? Instead it is on Line 140.

      Thank you for highlighting this, we have corrected this in revised manuscript.  

      Figure S2C: Alignment of labels cleaves x-axis.

      We thank the reviewer for catching this and have corrected this with our revised manuscript.  

      Figure S4F and G should be inverted to mention sequentially in the text.

      We thank the reviewer for catching this and have corrected this in our revised manuscript.  

      Line 182: Figure 4J should be G.

      We thank the reviewer for catching this and have corrected this in our revised manuscript.

      Line 209: typo: N-glycosylated.

      We have corrected this typo in our revised manuscript.

      Fig 6E: Very big as a figure element compared to others.

      We have made this smaller in the revised manuscript to fit better with rest of the figure.  

      Line 313: Figure 7E not F.

      Thank you for spotting this, we have corrected it.  

      Line 555: Typo: Scraped.

      We have corrected this typo in our revised manuscript.

      Line 562: missing )

      We have corrected this typo in our revised manuscript.

      Standardise

      We thank the reviewer for spotting the mistakes below and have corrected them in our revised manuscript.

      Legends: Include numbers of repeats and STATs throughout. 

      Terminology: Dyng etc. 

      Scale bars: some included as editable lines, some with size on top, small/large etc.

      In certain cases we have positioned the scale bars in different regions of the figures to ensure no obscuring of the images.

      VPS33b v B. 

      Reviewer #2 (Recommendations For The Authors):  

      The authors can improve the experimental part of the manuscript by the following:

      -  For all the western blots please include molecular weight markers.

      We thank the reviewer for noticing this omission and have included molecular weight markers in the revised manuscript.  

      - Performing immunofluorescence and western blot analysis of endocytosed collagen -/+ inhibitors for lysosomal degradation (BafA1 or E64d+PepstatinA) in order to exclude endocytosis for degradation.

      We thank the reviewer for this comment, another paper from the lab has identified lysosome to be involved in collagen fibrillogenesis (https://www.biorxiv.org/content/10.1101/2024.05.09.593302v1), thus  

      - Figure out how Dyngo4a is affecting Col1 secretion in the first place. Does it interfere with the secretory pathway? Alternatively, use a different model to block endocytosis (e.g. siRNA Dynamin).

      We thank the reviewer for raising this. The Dyngo CM blot for total Ponceau stain (revised Figure 2B) showed minimal changes, which suggests that global secretion is not affected.

      - Further characterization of the VPS33B / collagen vesicles by immunofluorescence containing markers for early, late, and recycling endosomes. Block endocytic recycling by depletion of either Rabs or e.g. EHD1.

      There is no good VPS33b antibody for staining. We have included images of GFP-tagged Rab5 co-localisation with labelled collagen-I (revised Figure 1D, Figure S6B).

      - Further clarify the status of the VPS33B knockouts e.g. by sequencing. Also provide a readout of the siRNA KD, besides the mRNA levels, since there the difference is not striking.

      The knockout cell lines were characterised previously in our 2020 paper, which is referred to in our revised manuscript. We have always had issues detecting endogenous VPS33b due to reagent limitations, which is why we resorted to mRNA as the key readout.

      - Doing siRNA knockdowns and endocytosis inhibition in the IPF fibroblasts to further strengthen the link between elevated expression of VPS33B/ ITGA11 and increased collagen uptake.

      We thank the reviewer for suggesting these experiments. Due to limitations of the patient-derived fibroblasts (cell numbers and passage numbers) we had to prioritise experiments, and thus have performed siRNA against VPS33B and ITGA11 in the IPF fibroblasts. We showed that in both cases the amount of recycled labelled-collagen in collagen fibrils is significantly reduced (revised Figure 8D).  

      Reviewer #3 (Recommendations For The Authors): 

      Major points 

      (1) Choice of cells: Please provide a rationale for why each cell line was used, and make sure that it is clear throughout the manuscript which cell line was used for each particular experiment. The HEK293T cell line is also missing from the reagent table.

      We thank the reviewer for pointing out this omission, and have clarified in our revised manuscript which cell lines were used in each experiment. We used HEK293T to generate lentiviruses as described in the methods section.  

      (2) Missing information from methods. Experimental details are missing from the methods in several places, making it difficult for someone to replicate an experiment. For example, no details are given in the methods describing the explant culture of murine tail tendons (described in results lines 78-100), and there are no details on how the skin samples were obtained or stained. Further, no ethical approval details are provided for the use of human skin tissue.

      We apologise for leaving out the ethical approval details and skin sample collection; this was an oversight and will be included in the revised manuscript. We have also included the method by which murine tail tendons were cultured ex vivo (revised lines 527-531, 546-553).

      (3) Functional studies in patient-derived cells. To fully establish the role of VPS33b and ITGA11 in fibrotic diseases, functional studies including the knockdown/overexpression of these genes could be performed to establish if the same response is seen as in non-diseased cells.

      We agree that this will add much to the paper, and have performed siRNA against VPS33B and ITGA11 in the IPF fibroblasts. We showed that in both cases the amount of recycled labelled-collagen in collagen fibrils is significantly reduced (revised Figure 8D).

      Minor Points

      We thank the reviewer for pointing out these mistakes, and have corrected and included additional details in the revised manuscript.  

      (1) Lines 51-52. Wording of this sentence is unclear, please rephrase. 

      (2) Line 182. Should this be Fig 4G rather than J? 

      (3) Line 209. Correct spelling of glycosylated. 

      (4) Line 463. Incomplete brackets and details missing? 

      (5) Line 590. Correct tense - was rather than are. 

      (6) Line 593. Specify centrifugation speed. 

      (7) Line 619. Nuclei rather than nucleus. 

      (8) Ln 650. Statistical analysis - was normality tested? 

      (9) Figure 1e - Difficult to read labels for coll/DAPI.

    1. eLife Assessment

      This valuable study discusses a hot topic in post-endoscopic retrograde cholangiopancreatography pancreatitis. The new score for predicting post-ERCP pancreatitis offers an idea about the risk of pancreatitis before the procedure. Although most scores depend on intraprocedural manoeuvres, such as the number of attempts to cannulate the papilla, this is a solid retrospective single-center study in one country. To be validated in the future, this score will need to be done in many countries and on large numbers of patients.

    2. Joint Public Review:

      Summary:

      This work provides a new general tool for predicting post-ERCP pancreatitis before the procedure, based on pancreatic calcification, female sex, intraductal papillary mucinous neoplasm, a native papilla of Vater, or the use of pancreatic duct procedures. Even though it is difficult for the endoscopist to predict before the procedure which case might have post-ERCP pancreatitis, this new model score can help with planning the maneuver, and when the patient is at high risk of pancreatitis (which can sometimes be deadly), experienced endoscopists can do the procedure from the start. This paper provides a model for stratifying patients before the ERCP procedure into low, moderate, and high risk for pancreatitis. To be validated, this score should be tested in many countries and on large numbers of patients. Risk factors can also be identified and added to the score to increase rank.

      Strengths:

      (1) One of the severe complications of the endoscopic retrograde cholangiopancreatography procedure is pancreatitis, so investigators are continually trying to find a score that can predict which patients are likely to develop pancreatitis after the procedure. Most scores depend on the intraprocedural maneuver. Some studies discuss preprocedural scores that can predict pancreatitis before the procedure. This study discusses a new preprocedural score for post-ERCP pancreatitis.

      (2) Because this score identifies patients at low, moderate, and high risk of post-ERCP pancreatitis, experienced and well-trained endoscopists can do the procedure from the start, or patients can be referred to tertiary hospitals or to interventional radiology or endoscopic retrograde cholangiopancreatography.

      (3) The number of patients in this study is sufficient to analyze data correctly.

      Weaknesses:

      (1) It is a single-country, retrospective study.

      (2) Many cases were excluded, so the score cannot be applied to those patients.

      Comments on revised version:

      Relying on old references cannot help us know the current situation. What if there are better, more recent predictive tools? It would be better to test this score against a proven score, if one exists, to check its validity.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public review:

      Summary:

      This work provides a new general tool for predicting post-ERCP pancreatitis before the procedure, based on pancreatic calcification, female sex, intraductal papillary mucinous neoplasm, a native papilla of Vater, or the use of pancreatic duct procedures. Even though it is difficult for the endoscopist to predict before the procedure which case might have post-ERCP pancreatitis, this new model score can help with planning the maneuver, and when the patient is at high risk of pancreatitis (which can sometimes be deadly), experienced endoscopists can do the procedure from the start. This paper provides a model for stratifying patients before the ERCP procedure into low, moderate, and high risk for pancreatitis. To be validated, this score should be tested in many countries and on large numbers of patients. Risk factors can also be identified and added to the score to increase rank.

      Thank you for reviewing our manuscript. We hope that this score will be validated in other countries from now on.

      Strengths

      (1) One of the severe complications of the endoscopic retrograde cholangiopancreatography procedure is pancreatitis, so investigators are continually trying to find a score that can predict which patients are likely to develop pancreatitis after the procedure. Most scores depend on the intraprocedural maneuver. Some studies discuss preprocedural scores that can predict pancreatitis before the procedure. This study discusses a new preprocedural score for post-ERCP pancreatitis.

      Thank you for evaluating our manuscript and raising a strength of this manuscript.

      (2) Because this score identifies patients at low, moderate, and high risk of post-ERCP pancreatitis, experienced and well-trained endoscopists can do the procedure from the start, or patients can be referred to tertiary hospitals or to interventional radiology or endoscopic retrograde cholangiopancreatography.

      Thank you for evaluating our manuscript and raising a strength of this manuscript.

      (3) The number of patients in this study is sufficient to analyze data correctly.

      Thank you for evaluating our manuscript and raising a strength of this manuscript.

      Weaknesses:

      (1) It is a single-country, retrospective study.

      Thank you for this comment. It’s exactly as you said. This is a limitation (Lines 326-327).

      (2) Many cases were excluded, so the score cannot be applied to those patients.

      Thank you for this valuable comment. The predictive PEP score is not necessary for the excluded patients. The reasons were as follows. Biliary duct cannulation was not attempted in patients for whom it was difficult to identify the Vater papilla. The biliary tract was separated from the pancreas in patients with a past history of choledochojejunostomy, pancreatojejunostomy, or pancreatogastrostomy. PEP risk was thought to be low in these patients and patients who underwent bile duct cannulation via the choledochoduodenal fistula. PEP diagnosis is difficult in patients with acute pancreatitis, whose diagnosis is currently in progress. We added these explanations (Lines 98-106).

      (3) Many other studies, e.g., https://link.springer.com/article/10.1007/s00464-021-08491-1, https://pubmed.ncbi.nlm.nih.gov/36344369/, have been published before discussing the same issue, so what is new with this score?

      Thank you for raising the new reference written by Archibugi et al. in 2023. The novelty of our score is that it is calculated using the factors that are investigated before ERCP procedures. The study written by Archibugi et al. involved procedure time and cannulation attempts for PEP prediction. These two factors are unknown before ERCP procedures. Therefore, a preprocedural predictive risk model for PEP was not created before our study was performed. We added the content of the past study written by Archibugi and included the report as a reference (Lines 65-67, 73-74).

      (4) The discussion section needs reformulation to express the study's aim and results.

      Thank you for this valuable comment. I have rewritten the first paragraph of the discussion. In the paragraph, we showed that the study achieved the aim on the basis of the results (Lines 245-255).

      (5) Why did the authors select these items in their scoring system and did not add more variables?

      Thank you for this valuable comment. We selected the items listed in the Japanese guidelines for acute pancreatitis and post-ERCP pancreatitis. We added this description (Lines 123-126). The original references of the guidelines were cited in the first draft version.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comment1. Please revise these documents: copyright, disclaimer, ethics approval, consent to participate, consent for publication, data and material availability, competing interests, funding, authors' contributions, and acknowledgments.

      First, thank you for reviewing our manuscript. We have already described the required information in the “author information” section. The sentences containing this information were proofread in English.

      Reviewer #2 (Recommendations for the authors):

      Comment 1. It would be best if you did this study in a Prospective way for more validation.

      First, thank you for reviewing our manuscript. We have revised our manuscript according to your comments. It’s exactly as you said. These points are limitations (Lines 312-318, lines 326-327). We hope that future validation studies over wider geographic regions will prove our opinions.

      Comment 2. The model name should be an acronym (the first letter of the five items in the risk model).

      Thank you for this valuable comment. Sorry, we could not create a memorable model name using the first letter of the five items.

      Comment 3. You say that you include the pre-procedure criteria that predict PEP. You state one of the items, pancreatic duct procedure. Do you mean it is a history?

      Thank you for this valuable comment. This means that the main purpose is the pancreatic duct. Therefore, the pancreatic duct procedure is listed as “planned pancreatic duct procedures” in Figure 2 (Lines 40-41, 231-234). When an unintended pancreatic duct procedure is performed, we can calculate the risk score by adding two points for “planned pancreatic duct procedures” (Lines 48-49, 247-250).

      Comment 4. Regarding calcification, do you mean chronic pancreatitis? It needs more clarification regarding its degree.

      Thank you for this valuable comment. We regard pancreatic calcification as a finding of chronic pancreatitis. Pancreatic calcification was defined as the degree that was confirmed by imaging, such as CT, MRI, and EUS. These definitions have been written in the first draft version (Lines 134-137).

      Comment 5. Why don't you include young age in the model? Your result found that age less than 50 is significantly associated with PEP.

      Thank you for this valuable comment. We selected the PEP risk factors listed in the Japanese guidelines for acute pancreatitis and post-ERCP pancreatitis. Age less than 50 years was listed as a PEP risk factor in the Japanese guidelines for acute pancreatitis. We added this description (Lines 123-126).

      Comment 6. There are some very old references, some of them from 1994 and 1996.

      Sorry for the old references. These references were written by Cotton et al. 1991, Freeman et al. 1996, and Loperfido et al. 1998. These are still important today. The diagnostic criteria for PEP were determined in the report written by Cotton et al., which is Cotton’s criteria. The other two references are representative reports that described risk factors for PEP, and these two reports were cited in the Japanese guidelines for pancreatitis written by Takada et al. 2022 (Lines 123-126).

      Comment 7. In the introduction, you say that the first score includes one of the items for PEP pain during the procedure. It is a little bit strange.

      Thank you for this comment. The first PEP risk score did not involve PEP pain but involved pain during the procedure (Line 68).

      Comment 8. We know that once ERCP is indicated, you justify the importance of the risk model, stating that if one or more risks are found, we can do EUS or PTD. It is not reasonable to abort the procedure in case of frequent pancreatic duct cannulation or cancel ERCP if pt has one or more risk factors.

      Thank you for this valuable comment. If ERCP is performed for high-risk patients, prophylaxes for PEP, such as procedures by experts, pancreatic stent placement, and NSAID suppository insertion, should be performed as much as possible (Lines 281-287, 308-311).

      Comment 9. Regarding ERCP pancreatitis criteria, does it include amylase 3t or lipase?

      Thank you for this comment. We used Cotton’s criteria for diagnosing PEP. Cotton’s criteria include hyperamylasemia (more than three times the normal upper limit) at least 24 hours after ERCP (Lines 114-116).

      Comment 10. It is well known that pr with functional biliary disorder has a high incidence of PEP; it doesn't need a manometer for diagnosis. It needs to be included.

      Thank you for this comment. Moreover, functional biliary disorders are difficult to diagnose before ERCP procedures (Lines 259-262). The factor that is not apparent before ERCP could not be included in the predictive PEP scoring system.

      Comment 11: What are gabexare and nafamost?

      Thank you for this comment, and sorry for our insufficient explanation. These compounds are gabexate mesilate and nafamostat mesilate, which are protease inhibitors. In some institutions, protease inhibitors are used as prophylaxis for PEP. We added “protease inhibitors” (Lines 138-139, Tables 1 and 2).

      Reviewer #3 (Recommendations for the authors):

      Comment 1. The sample size needs clarification.

      First, thank you for reviewing our manuscript. The sample size has been included in the “Methods” section (Lines 157-165).

      Comment 2. They need to be mentioned because they depend on old references in discussion and background.

      Thank you for this comment. The previous references were written by Cotton et al. 1991, Freeman et al. 1996, and Loperfido et al. 1998. These are still important today. The diagnostic criteria for PEP were determined in the report written by Cotton et al., which is Cotton’s criteria. The other two references are representative reports that described risk factors for PEP, and these two reports were cited in the Japanese guidelines for pancreatitis written by Takada et al. 2022 (Lines 122-126). In the background and discussion, we added new recent references and information related to the references (Lines 65-67, 285-287, 291-295, 308-311).

      Comment 3. Case definition should be added to the methodology.

      Thank you for this comment. We added patient information. Please refer to the response to the eLife assessment, weakness (2).

      Comment 4. Do you include all who met the inclusion criteria, or was there any random sampling technique?

      No, we did not use random sampling techniques.

      Comment 5. What is the value of comparing the development and validation groups? I do not think it adds anything new as if you want to exclude confounders. Has the comparison revealed that a confounder does exist? What was your point of view concerning that?

      Thank you for this valuable comment, and sorry for the insufficient explanation. The differences between the development cohort and the validation cohort are important because the goodness of fit for the score could be confirmed in significantly different groups. We added this explanation (Lines 197-199, 251-253).

    1. eLife Assessment

      This important study provides one mechanism that can explain the rapid diversification of poison-antidote pairs in fission yeast: recombination between existing pairs. The evidence is largely solid, but the study can benefit from demonstrating that the novel poison-antidote constructed by the authors can serve as a meiotic driver. The work is of interest to colleagues studying genetic incompatibilities.

    2. Reviewer #1 (Public review):

      Summary

      The authors determine the phylogenetic relation of the roughly two dozen wtf elements of 21 S. pombe isolates and show that none of them in the original S. pombe are essential for robust mitotic growth. It would be interesting to test their meiotic function by simply crossing each deletion mutant with the parent and analyzing spores for non-Mendelian inheritance. If this has been reported already, that information should be added to the manuscript. If not, I suggest the authors do these simple experiments and add this information.

      Strengths:

      The most interesting data (Figure 4) show that one recombinant (wtfC4) between wtf18 and wtf23 produces in mitotic growth a poison counteracted by its own antidote but not by the parental antidotes. Again, it would be interesting to test this recombinant in a more natural setting - meiosis between it and each of the parents.

      Weaknesses:

      In the opinion of this reviewer, some minor rewriting is needed.

    3. Reviewer #2 (Public review):

      Summary:

      This important study provides a mechanism that can explain the rapid diversification of poison-antidote pairs (wtf genes) in fission yeast: recombination between existing genes.

      Strengths:

      The authors analyzed the diversity of wtf in S. pombe strains, and found pervasive copy number variations. They further detected signals of recurrent recombination in wtf genes. To address whether recombination can generate novel wtf genes, the authors performed artificial recombination between existing wtf genes, and showed that indeed a new wtf can be generated: the poison cannot be detoxified by the antidotes encoded by parental wtf genes but can be detoxified by its own antidote.

      Weaknesses:

      The study can benefit from demonstrating that the novel poison-antidote constructed by the authors can serve as a meiotic driver.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Wang and colleagues explore factors contributing to the diversification of wtf meiotic drivers. wtf genes are autonomous, single-gene poison-antidote meiotic drivers that encode both a spore-killing poison (short isoform) and an antidote to the poison (long isoform) through alternative transcriptional initiation. There are dozens of wtf drivers present in the genomes of various yeast species, yet the evolutionary forces driving their diversification remain largely unknown. This manuscript is written in a straightforward and effective manner, and the analyses and experiments are easy to follow and interpret. While I find the research question interesting and the experiments persuasive, they do not provide any deeper mechanistic understanding of this gene family.

      Strengths:

      (1) The authors present a comprehensive compendium and analysis of the evolutionary relationships among wtf genes across 21 strains of S. pombe.

      (2) The authors found that a synthetic chimeric wtf gene, combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves like a meiotic driver that could only be rescued by the chimeric antidote but neither of the parental antidotes. This is a very interesting observation that could account for their inception and diversification.

      Weaknesses:

      (1) Deletion strains

      The authors separately deleted all 25 wtf genes in the S. pombe reference strain. Next, the authors performed a spot assay to evaluate the effect of wtf gene knockout on yeast growth. They report no difference from the WT and conclude that the wtf genes might be largely neutral to the fitness of their carriers in the asexual life cycle, at least in normal growth conditions.

      The authors could have conducted additional quantitative growth assays in yeast, such as growth curves or competition assays, which would have allowed them to detect subtle fitness effects that cannot be quantified with a spot assay. Furthermore, the authors do not rule out simpler explanations, such as genetic redundancy. This could have been addressed by crossing mutants of closely related paralogs or editing multiple wtf genes in the same genetic background.

      Another concern is the lack of detailed information about the 25 knockout strains used in the study. There is no information provided on how these strains were generated or, more importantly, validated. Many of these wtf genes have close paralogs and are flanked by repetitive regions, which could complicate the generation of such deletion strains. As currently presented, these results would be difficult to replicate in other labs due to insufficient methodological details.

      (2) Lack of controls

      The authors found that a synthetic chimeric wtf gene, constructed by combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves as a meiotic driver that can be rescued only by its corresponding chimeric antidote, but not by either of the parental antidotes (Figure 4F). In contrast, three other chimeric wtf genes did not display this property (Figure 4C-E). No additional experiments were conducted to explain these differences, and basic control experiments, such as verifying the expression of the chimeric constructs, were not performed to rule out trivial explanations. This should be at the very least discussed. Also, it would have been better to test additional chimeras.

      (3) Statistical analyses

      In line 130 the authors state that: "Given complex phylogenetic mixing observed among wtf genes (Figure 1E), we tested whether recombination occurred. We detected signals of recombination in the 25 wtf genes of the S. pombe reference genome (p = 0) and in the wtf genes of the 21 S. pombe strains (p = 0) using pairwise homoplasy index (HPI) test. ". Reporting a p-value of 0 is not appropriate. Exact P-values should be reported.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors determine the phylogenetic relation of the roughly two dozen wtf elements of 21 S. pombe isolates and show that none of them in the original S. pombe are essential for robust mitotic growth. It would be interesting to test their meiotic function by simply crossing each deletion mutant with the parent and analyzing spores for non-Mendelian inheritance. If this has been reported already, that information should be added to the manuscript. If not, I suggest the authors do these simple experiments and add this information.

      Thanks for the great summary! Most of the wtf genes have been tested for meiotic drive phenotypes previously by Bravo Nunez et al. (2020; http://doi.org/10.1371/journal.pgen.1008350). The reference was cited in our original manuscript, and we added the details in the revised manuscript.

      Strengths:

      The most interesting data (Figure 4) show that one recombinant (wtfC4) between wtf18 and wtf23 produces in mitotic growth a poison counteracted by its own antidote but not by the parental antidotes. Again, it would be interesting to test this recombinant in a more natural setting - meiosis between it and each of the parents.

      We will test the meiotic driver phenotype of the wtfC4 we constructed in S. pombe as suggested.

      Weaknesses:

      In the opinion of this reviewer, some minor rewriting is needed.

      We did the rewriting as this reviewer suggested in the comments to authors.

      Reviewer #2 (Public review):

      Summary:

      This important study provides a mechanism that can explain the rapid diversification of poison-antidote pairs (wtf genes) in fission yeast: recombination between existing genes.

      Thanks!

      Strengths:

      The authors analyzed the diversity of wtf in S. pombe strains, and found pervasive copy number variations. They further detected signals of recurrent recombination in wtf genes. To address whether recombination can generate novel wtf genes, the authors performed artificial recombination between existing wtf genes, and showed that indeed a new wtf can be generated: the poison cannot be detoxified by the antidotes encoded by parental wtf genes but can be detoxified by its own antidote.

      Thanks for the great summary!

      Weaknesses:

      The study can benefit from demonstrating that the novel poison-antidote constructed by the authors can serve as a meiotic driver.

      We will test the meiotic driver phenotype of the wtfC4 we constructed in S. pombe as suggested.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Wang and colleagues explore factors contributing to the diversification of wtf meiotic drivers. wtf genes are autonomous, single-gene poison-antidote meiotic drivers that encode both a spore-killing poison (short isoform) and an antidote to the poison (long isoform) through alternative transcriptional initiation. There are dozens of wtf drivers present in the genomes of various yeast species, yet the evolutionary forces driving their diversification remain largely unknown. This manuscript is written in a straightforward and effective manner, and the analyses and experiments are easy to follow and interpret. While I find the research question interesting and the experiments persuasive, they do not provide any deeper mechanistic understanding of this gene family.

      Thanks! Please see the following for our point-to-point response.

      Strengths:

      (1) The authors present a comprehensive compendium and analysis of the evolutionary relationships among wtf genes across 21 strains of S. pombe.

      (2) The authors found that a synthetic chimeric wtf gene, combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves like a meiotic driver that could only be rescued by the chimeric antidote but neither of the parental antidotes. This is a very interesting observation that could account for their inception and diversification.

      Thanks for the great summary!

      Weaknesses:

      (1) Deletion strains

      The authors separately deleted all 25 wtf genes in the S. pombe reference strain. Next, the authors performed a spot assay to evaluate the effect of wtf gene knockout on yeast growth. They report no difference from the WT and conclude that the wtf genes might be largely neutral to the fitness of their carriers in the asexual life cycle, at least in normal growth conditions.

      The authors could have conducted additional quantitative growth assays in yeast, such as growth curves or competition assays, which would have allowed them to detect subtle fitness effects that cannot be quantified with a spot assay. Furthermore, the authors do not rule out simpler explanations, such as genetic redundancy. This could have been addressed by crossing mutants of closely related paralogs or editing multiple wtf genes in the same genetic background.

      Another concern is the lack of detailed information about the 25 knockout strains used in the study. There is no information provided on how these strains were generated or, more importantly, validated. Many of these wtf genes have close paralogs and are flanked by repetitive regions, which could complicate the generation of such deletion strains. As currently presented, these results would be difficult to replicate in other labs due to insufficient methodological details.

      We will generate growth curves for all 25 wtf deletion strains. We will also provide detailed methods for the wtf gene knockouts. However, with 25 wtf genes, there are too many combinations for editing two genes at a time, and it is technically challenging to knock out multiple wtf genes together. Nevertheless, our results suggest that a single wtf gene has little effect on host fitness under normal conditions.
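
      The reviewer's point about growth curves can be made concrete with a minimal sketch. The code below is not the authors' analysis; it assumes exponential-phase optical density (OD) readings and estimates the growth rate as the least-squares slope of ln(OD) against time. The function name `growth_rate`, the time points, and both OD series are hypothetical and generated only for illustration.

```python
import math

def growth_rate(times_h, od):
    """Least-squares slope of ln(OD) versus time, in units of 1/hour."""
    logs = [math.log(x) for x in od]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    return sxy / sxx

# Hypothetical, noise-free exponential-phase readings (not data from the study).
times = [0, 1, 2, 3, 4]                           # hours
od_wt = [0.1 * 2 ** (t / 2.5) for t in times]     # assumed doubling time: 2.5 h
od_mut = [0.1 * 2 ** (t / 2.6) for t in times]    # assumed doubling time: 2.6 h

rate_wt = growth_rate(times, od_wt)
rate_mut = growth_rate(times, od_mut)
doubling_wt = math.log(2) / rate_wt
doubling_mut = math.log(2) / rate_mut
relative_rate = rate_mut / rate_wt                # growth rate relative to WT

print(f"WT doubling time: {doubling_wt:.2f} h")
print(f"Mutant doubling time: {doubling_mut:.2f} h")
print(f"Relative growth rate: {relative_rate:.3f}")
```

      Under these assumed numbers the mutant grows about 4% more slowly than the WT, a difference that a quantitative fit resolves directly and that an end-point spot assay could plausibly miss.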

      (2) Lack of controls

      The authors found that a synthetic chimeric wtf gene, constructed by combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves as a meiotic driver that can be rescued only by its corresponding chimeric antidote, but not by either of the parental antidotes (Figure 4F). In contrast, three other chimeric wtf genes did not display this property (Figure 4C-E). No additional experiments were conducted to explain these differences, and basic control experiments, such as verifying the expression of the chimeric constructs, were not performed to rule out trivial explanations. This should be at the very least discussed. Also, it would have been better to test additional chimeras.

      We will verify the expression of the chimeric genes, and test the meiotic driver phenotype of wtfC4 in S. pombe.

      (3) Statistical analyses

      In line 130 the authors state that: "Given complex phylogenetic mixing observed among wtf genes (Figure 1E), we tested whether recombination occurred. We detected signals of recombination in the 25 wtf genes of the S. pombe reference genome (p = 0) and in the wtf genes of the 21 S. pombe strains (p = 0) using pairwise homoplasy index (HPI) test. ". Reporting a p-value of 0 is not appropriate. Exact P-values should be reported.

      We will report the exact p values in the revised manuscript.
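
      As an illustration of why p = 0 should not be reported, here is a minimal sketch that assumes the p-value is estimated by permutation. With n permutations, the common estimator (k + 1) / (n + 1), where k is the number of permuted statistics at least as extreme as the observed one, can never be smaller than 1 / (n + 1). The function name, the null distribution, and the observed value are hypothetical, and larger values are treated as more extreme purely for the example.

```python
import random

def permutation_p_value(observed, null_stats):
    """Upper-tail permutation p-value with the +1 correction, so it is never 0."""
    n_extreme = sum(1 for s in null_stats if s >= observed)
    return (n_extreme + 1) / (len(null_stats) + 1)

# Hypothetical null distribution from 999 permutations (not data from the study).
rng = random.Random(0)
null_stats = [rng.gauss(0.0, 1.0) for _ in range(999)]

# A hypothetical observed statistic far outside the null distribution.
p = permutation_p_value(10.0, null_stats)
p_floor = 1 / (len(null_stats) + 1)

print(f"p = {p:.3f}; smallest reportable value with 999 permutations: {p_floor:.3f}")
```

      When no permuted statistic reaches the observed value, the result is best stated as a bound tied to the number of permutations (here, p ≤ 0.001) rather than as zero.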

    1. eLife Assessment

      This important chronobiological study in mice suggests that the light-modulated action of Cdk5 on the PKA-CaMK-CREB signaling pathway provides missing molecular mechanistic details to understand light-induced circadian clock phase delays during the early night, but not phase advances in the morning. The authors provide convincing evidence bridging from behavioral to molecular/cellular experiments to neural activity imaging.

    2. Reviewer #1 (Public review):

      In the manuscript "Cyclin-dependent kinase 5 (Cdk5) activity is modulated by light and gates rapid phase shifts of the circadian clock", Brenna et al. aim to elucidate the role of cyclin-dependent kinase 5 (Cdk5) in modulating circadian rhythms, particularly in response to light cues. They hypothesized that Cdk5 acts as a gatekeeper, regulating the sensitivity of the circadian clock to light-induced phase shifts.

      Strengths:

      • Novelty: The study presents a novel mechanism by which Cdk5 influences circadian rhythms, particularly its role in modulating the light-induced phase-shifting response.

      • Experiments: The authors have employed a combination of molecular, cellular, and behavioural techniques, including genetic manipulations, biochemical assays, and electrophysiology, to investigate the role of Cdk5. The set of experiments performed in this work is non-trivial and done to a high standard, and the additional experiments, data, and textual alterations presented following the first round of review deserve to be lauded.

      • Data: The data are well presented in clear figures and appropriately described in the text.

      Weaknesses:

      • Although I found the data on Cdk5 gating light responses highly convincing, there could be additional mechanisms, which the authors have duly acknowledged and discussed in their text.

      In my assessment, the authors have convincingly demonstrated that Cdk5 plays a critical role in gating the light-induced phase-shifting response of the circadian clock. Their results strongly support their conclusions.

      This study provides valuable insights into the molecular mechanisms underlying circadian rhythm regulation and the impact of light on the circadian clock. The findings have the potential to influence future research in the field of chronobiology and may have implications for understanding and treating circadian rhythm disorders.

      The methods and data presented in this study are valuable to the field and can be used to further investigate the role of Cdk5 and other signalling pathways in circadian rhythm regulation.

      Broader context

      The circadian clock is a fundamental biological process that regulates various physiological functions, including sleep-wake cycles, hormone secretion, and metabolism. Disruptions to the circadian clock have been linked to a variety of health problems, such as sleep disorders, metabolic disorders, and cancer. Understanding the molecular mechanisms that underlie circadian rhythm regulation is essential for developing effective treatments for these disorders.

      All in all, I have no reservations regarding the manuscript titled "Cyclin-dependent kinase 5 (Cdk5) activity is modulated by light and gates rapid phase shifts of the circadian clock" by Brenna et al. After consideration of the authors' revisions, I believe the manuscript has been significantly improved. I commend the authors for their diligence in addressing the reviewers' comments and for the quality of their research.

    3. Reviewer #2 (Public review):

      Summary:

      Definition of the role of Cdk5 in circadian locomotor activity and light-induced neural activity in the mouse SCN in vivo, revealing its mode of action through the PKA-CaMK-CREB signaling pathway.

      Strengths:

      The experimental approaches are carried from the in vivo to the cellular and molecular level and provide first evidence for the specific involvement of Cdk5 in light-induced phase advance of the free-running rhythm.

      Weaknesses:

      The behavioral analyses are limited to some selected parameters.

      Downstream effects on circadian oscillation of gene expression and physiological functions in other brain regions and organs are missing.

      Comments on revisions:

      I am happy with the manuscript in its present form.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reply to reviewer comments:

      (1) Given the interpretations of this study hinge on the specificity of the antibodies used in immunofluorescence, the authors should provide full western-blot images of all their antibodies in supplementary information.

      The commercial antibodies have been validated by the provider. 

      Additionally, we did our own tests. Of note, proper validation of any antibody is only possible by using a knockout mouse for each protein analyzed (e.g., pPKA in wt vs. PKA ko mice). This is not possible, because we do not have all these knockout strains. However, specific proteins like pPKA, pCaMKII, and pCaMKIV are known to be increased by a light pulse. We show by western blot that pPKA (Fig. 2a, b) and pCaMKII (Fig. S2a, b) are increased in wt animals, mirroring what we observed in the immunofluorescence. These results suggest that the signal is specific to these antibodies. We provide a full panel of western blots, including the other proteins studied by immunofluorescence, such as pCaMKIV, pCREB, CaV3.1, and pDARPP32, and show that the antibodies detect a protein of the expected size. Full western blots mentioned in the manuscript are shown in Supplementary Figure 7. Below are additional validations of antibodies used in the immunofluorescence experiments.

      Author response image 1.

      Author response image 2.

      (2) The explanation in the results section surrounding Fig. 4 seems to be specific for the representative trace rather than the group. Specifically, does the following statement apply to all the replicates?  " A Ca2+ transient was observed right before the light was given at ZT14 (Fig. 4b), which showed the same magnitude as those observed during and after the light stimulus". 

      If not this should be corrected.  

      We have now replaced Fig. 4b with an average trace of all experiments. The individual traces can be seen in Supplementary Figure 4d.

      (3) Are lines 236 -244 and figure 5A/B demonstrating shCDK5 being similar to no-calcium or EGTA conditions at the level of CREB not contradicting Figure 3 which argues that the reason behind the increase in CAMK-phosphorylation and pCREB following shCDK5 is increased basal calcium? If this is the case then why does removing the external calcium phenocopy shCDK5 in these cells? The authors need to clarify this and give an explanation. 

      (4) The authors should explain why they see an equivalent level (or more) of CREB activation, 5 minutes following forskolin activation in Ca2+-free condition (apparent in the case of shCDK5 and EGTA) in the FRET assay. Does this not imply PKA is the most likely candidate mediating this reaction at this stage? Given this interaction has been demonstrated in multiple (other) experiments, including in vitro isolated enzyme experiments involving CREB and PKA (e.g., Fig. 6A in PMID: 2900470), an absence of p-PKA pulldown is not sufficient to justify the non-involvement of PKA (PMID: 22583753). This statement needs support in the form of positive data or an acknowledgement of the limitations in the text (conditions, single technique, etc.).

      (5) The authors should better explain the FRET pairs used in the experiments involving ICAP for the reader's benefit - a reduction in fluorescence as a function of CREB activation is non-intuitive.

      We answer all three questions (3-5) together since they belong to the same concept.

      (1) How FRET works.

      The Förster resonance energy transfer (FRET) technique is widely used to investigate molecular interactions between proteins, such as CREB:CBP, in living cells. We used a sensor called ICAP (an Indicator of CREB Activation due to Phosphorylation) published by Friedrich and colleagues in 2010

      (https://doi.org/10.1074/jbc.M110.124545). The sensor is composed of three different elements: 1) the KID domain of CREB containing Ser-133, which is phosphorylated upon forskolin induction in our experimental setup, 2) the KIX domain of CBP, which is responsible for the dimerization with phospho-CREB, and 3) a short linker that separates the KID from the KIX domain. KID is flanked by a cyan fluorescent protein (CFP), while KIX is flanked by a yellow fluorescent protein (YFP). When KID is not phosphorylated, the ICAP conformation allows CFP - stimulated by blue UV light - to transfer energy to YFP, producing FRET resulting in yellow light emission. Therefore, the ratiometric analysis FRET/CFP shows FRET > CFP. After a stimulus (forskolin), serine-133 in KID is phosphorylated and KID can bind to KIX. The dimerization separates CFP from YFP, resulting in decreased FRET and increased CFP-dependent blue light emission (see Author response image 3 below). Therefore, the ratiometric analysis FRET/CFP shows FRET < CFP over time (usually within 20 min after the forskolin stimulus).

      Author response image 3.

      FRET model. On the left is a schematic representation of how ICAP works. On the right, an example of the quantified FRET decrease associated with increased KID:KIX interaction.

      (2) The ‘apparent’ contradiction between Figure 5A and Fig 3.

      As mentioned before, the chosen FRET method is ratiometric, meaning that a relative FRET signal in fluorescence is measured compared to the baseline (absence of forskolin, assay buffer). The FRET experiment can only tell whether there is a change in the phosphorylation state of KID during the live imaging, comparing the baseline to the period after the forskolin treatment. The result produces a delta [(time after forskolin) - (baseline)]. The higher the delta, the more KID is phosphorylated after forskolin treatment. If KID phosphorylation is not increased compared to the baseline, the FRET signal tends to return to the baseline with a reduced delta [(time after forskolin) - (baseline)]. Therefore, the experiment does not tell at the quantitative level the amount of KID (CREB domain) phosphorylation before the stimulus. It only tells whether the phosphorylation is increased after the stimulus, producing a delta or not. This means that the lack of a delta can be caused by: A) high KID phosphorylation in the baseline which does not further increase after the forskolin stimulus; B) very low KID phosphorylation in the baseline which does not increase after the forskolin stimulus. In Fig. 5A, wt cells (orange trace, lines, and double arrow) show a higher delta compared to the ko cells (blue trace, lines, and double arrow). The result indicates that the phosphorylation of CREB (KID domain) is increased after the forskolin stimulus only in the wt. To that extent, the results are in line with the experiment that we show in Figure 3. Indeed, the increased delta in CREB phosphorylation is observed only in the scramble animals, whereas it is lost in the ko (the blue double arrow indicates the delta in the scramble).

      Author response image 4.
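
      The baseline-relative delta described above can be sketched in a few lines. This is an illustrative reading of the description, not the authors' analysis pipeline: it assumes the FRET/CFP ratio is normalized to each recording's own pre-stimulus baseline and that the delta is the mean post-stimulus value minus that baseline. The function name `fret_delta` and all intensity values are hypothetical.

```python
from statistics import mean

def fret_delta(fret, cfp, n_baseline):
    """Change in the baseline-normalized FRET/CFP ratio after stimulation.

    The first `n_baseline` frames are treated as the pre-stimulus baseline.
    A negative value means the ratio dropped, which for an ICAP-like sensor
    corresponds to increased KID phosphorylation.
    """
    ratio = [f / c for f, c in zip(fret, cfp)]
    baseline = mean(ratio[:n_baseline])
    normalized = [r / baseline for r in ratio]
    return mean(normalized[n_baseline:]) - 1.0

# Hypothetical intensities: 3 baseline frames followed by 3 post-stimulus frames.
responsive = fret_delta(
    fret=[2.0, 2.0, 2.0, 1.5, 1.5, 1.5],
    cfp=[1.0, 1.0, 1.0, 1.5, 1.5, 1.5],
    n_baseline=3,
)
# Sensor already in the low-FRET state at baseline, no further change.
saturated = fret_delta(fret=[1.0] * 6, cfp=[1.0] * 6, n_baseline=3)
# Sensor in the high-FRET state at baseline, no response to the stimulus.
unresponsive = fret_delta(fret=[2.0] * 6, cfp=[1.0] * 6, n_baseline=3)

print(responsive, saturated, unresponsive)
```

      In this toy example the last two cases both give a delta of zero even though their baseline states differ, which is the ambiguity between cases A and B: after normalization to the baseline, the measure reports only the change and not the starting level.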

      (3) The FRET signal within 3 minutes after forskolin stimulation

      The signal mentioned by the reviewers at 5’ is an artifact caused by light diffraction upon the addition of forskolin in DMSO, which propagates through the plate. The same effect is observed with the DMSO-only treatment (Fig. S5). Therefore, it need not be taken into account. The amplitude of this signal in this time window depends on many independent variables (buffer composition, cell shape, room temperature, pipetting); therefore, it is not possible to draw any conclusion from it. We never consider this time window when describing our results.

      Author response image 5.

      (4) Role of PKA and considerations about experiments performed in Fig. 5a and b

      To answer the question about the role of PKA, we believe it is a pivotal player. Our results indicate that PKA might activate CaV3.1, promoting the entry of calcium and, therefore, CaM kinase pathway activation leading to CREB phosphorylation (Fig. 5). However, if calcium is depleted, even a channel activation mediated by PKA cannot propagate the signal. For that reason, when we deplete calcium in wt cells, as in the experiment shown in Figure 5B, the activation of PKA alone cannot promote the CREB phosphorylation associated with a reduction of the FRET signal. As mentioned before, the FRET method gives a binary answer: either a higher or a lower delta when comparing the time after forskolin to the baseline. It cannot give stoichiometric information about the level of calcium and/or phosphorylation in the baseline. To that extent, the FRET experiment in Figure 5A cannot be connected to the experiment in Figure 5B. The method is the same, but the scientific questions are different. In Figure 5A we demonstrate that CDK5 plays a role in the PKA activation pathway. In Figure 5B we demonstrate that the general pathway needs calcium.

      We modified the text accordingly.

      (6) The presentation of the data in Figure 6 seems to be divergent from the rest of the data presentations. Please make it more consistent and also provide more explanations. Specifically, the authors suggest increased P-CREB nuclear localization (and an increase in phosphorylated PKA/CAMK) following shCDK5. Won't this lead to an increase in Per1, Dec1, cFos, and Sik1 basally (pre-light pulse)?

      We followed the reviewer's suggestion and present data in Figure 6 as done before in the manuscript. The reviewers should also consider our papers published before (Brenna et al., 2019; Brenna et al., 2021). In these papers, we demonstrate two important concepts that are in line with this manuscript. First, the lack of CDK5 promotes PER2 degradation and lack of nuclear translocation (Brenna et al., 2019). Second, PER2 plays a scaffold role in promoting the formation of the CREB transcriptional complex involved in the regulation of the expression of light-dependent genes (Brenna et al., 2021). Therefore, the take-home message here is that even if a lack of Cdk5 promotes a higher basal level of CREB phosphorylation, it also promotes PER2 degradation. Therefore, without PER2, the CREB-dependent gene expression is reduced. For this reason, we say that CDK5 gates phase shift (via PKA-CAM Kinases-CREB axis) of the circadian clock (via PER2).

      (7) The authors should discuss why calcium-sensitive phosphatases such as PP2A (PMID: 23752926) or calcineurin (PMID: 10217279) are not considered candidates for dephosphorylation of DARPP32 as these are described previously (CDK5) and conditions of increased calcium as seen here would favour these enzymes. The phospho-T75 data are supportive, but such additional discussion could be important given the past demonstrations.

      We thank the reviewers for the great insight. The pathway that promotes the T75 phosphorylation/dephosphorylation indeed includes many players, such as calcineurin and PP2A. We now mention this in the discussion as follows:

      However, phosphatases such as PP2A and calcineurin, which de-phosphorylate DARPP32 including the Cdk5 phosphorylation site, may be involved in this process as well (Girault and Nairn, 2021). Upon light treatment and increase of Ca2+ these phosphatases would dephosphorylate DARPP32 and thereby inactivate it, leading to PKA activation. This process may occur in parallel to the Cdk5 regulation of DARPP32 contributing to a sustained activation of the light signaling pathway via PKA activation.

      (8) Additional details on the knock-downs would be helpful:

      - the relative amount of reduction in gene expression upon shRNA treatment should be provided

      - How was the exact viral delivery and reduction in shRNA-induced knock-down confirmed for the individual animals?

      The validation of the Cdk5 knockdown was performed extensively in the previous paper (Brenna et al., 2019, Fig2-Fig supp1, and Fig3-Fig suppl2). We used the same mice. We also confirmed the efficiency of the silencing in Supplementary Figure 1A of the current paper.

      (9) The authors only focus on male mice. This is rather incomplete, as it leaves away an important half of biological reality. Testing relevant aspects of the work in female mice would close this significant gap and also increase the number of biological replicates, which can still be considered relatively low. 

      We thank the reviewers for the suggestion. We injected female mice, performed the Aschoff type II light pulse experiment at ZT14, and observed the same phenotype as for male mice. This is now stated in the paper and the data are shown in Supplementary Figure 1e-f.

      (10) Given the roles of Cdk5 in circadian clock period length regulation, but also light-induced phase delays, it would be interesting for a broader audience to discuss possible expectations of Cdk5's roles, e.g.

      (a) How will other circadian parameters, eg. activity bouts (numbers, length, activity onset/ offset) be affected? 

      (b) How does that relate to sleep, sleep phases? 

      (c) What is the expected impact on other physiological rhythms, eg food intake, cortisol levels? 

      (d) What are the expected effects on circadian oscillation of gene expression in other brain regions, organs? 

      We thank the reviewers for the observations. 

      a) The activity was discussed in the previous paper (Brenna et al. 2019). shCdk5 mice show a reduced activity in both DD and LD 12:12 compared to wt, mirroring the Per2 brdm phenotype (Figure- Suppl3), with the difference mostly observed at night time (Figure 2-suppl4).

      We also demonstrate in Suppl Fig1 b, c of the current paper that the light pulse does not affect the period length in either scramble or shCdk5 mice.

      b) We performed preliminary experiments with SCN shCdk5 knock-down animals and compared them to scr control mice using the Piezo sleep system. Total sleep was not different, however during the dark phase shCdk5 animals tended to sleep a bit more, similar to the neuronal Per2 KO animals (Wendrich et al., 2023 https://doi.org/10.3390/clockssleep5020017 ). After sleep-deprivation no differences were observed between shCdk5 and scr animals. This was comparable to the neuronal Per2 KO animals that also showed no phenotype after sleep deprivation.

      c) and d) We did not investigate food intake, cortisol, or other parameters involving peripheral clocks. We did not investigate gene expression in other brain regions because the SCN is the main brain region involved in the regulation of the circadian clock phase shift. However, future studies will address these questions.

    1. eLife Assessment

      This study examined the role of the dorsal medial prefrontal cortex of mice in anticipating reward-value switch points in a two-armed bandit task. They demonstrate the dorsal medial prefrontal cortex is involved in task performance and use model-based methods to uncover the algorithmic processes affected by prefrontal cortex perturbations. If the claims were supported, this would be a valuable finding. Unfortunately, the reviewers recognised significant issues with the task design and analyses, making the evidence to support the role of the prefrontal cortex in anticipation of switches inadequate.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors train mice on a two-armed bandit task, in which the reward value associated with the arms suddenly switches in a pseudorandom fashion. Their first finding is that the mice are able to anticipate the reward value switch points after long blocks, evident both prior to the switch point with higher rates of switching to the less-rewarded arm, and after the switch point with faster transition to the more-rewarded arm. They next find that unilateral ACAd/MO lesion / optogenetic silencing (surprisingly) causes greater anticipation of reward switch points, both prior to and after the switch point. They use behavioral modeling to argue that the unilateral ACAd/MO lesion effects are due to an increase in the contralateral hazard rate. Finally, they found that bilateral lesions did not have any effect on the hazard rate, suggesting that the unilateral lesion effect is due to balancing between hemispheres. This manuscript employed a clever behavioral design and analysis approach, though the effects were somewhat difficult to interpret and the authors' interpretation relies heavily on the accuracy of their underlying behavioral model.

      Strengths:

      This paper employs a well-designed task that allows the researchers to detect whether mice have noticed a change in reward value both before and after the change takes place. The use of unilateral and bilateral inactivation experiments allowed the authors to test the role of the ACAd/MO region in the change point estimation. They found that unilateral inactivation, but not bilateral inactivation, had a significant effect on behavior. They performed sophisticated behavioral analysis to determine how ACAd/MO perturbations affect decision-making variables. This topic is of interest to the field, and the results are presented clearly and generally convincing.

      Weaknesses:

      The observed effects of the lesions are somewhat counterintuitive, with lesions appearing to affect persistence within a block more than change point detection itself; the mice actually adjusted more quickly to changes in reward values. Moreover, they had no issue detecting change points after bilateral inactivation. As a result, I'm not sure if the main framing of the article (including the title) is supported by their findings. Finally, I was unsure how the differences between unilateral and bilateral inactivation could be explained by their behavioral model.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Murphy et al. titled "Change point estimation by the mouse medial prefrontal cortex during probabilistic reward learning" investigated the role of the mPFC in the exploitation of task structure. Previous work had shown that monkeys and humans exploit predictable task structures (e.g., switching rapidly when heavily trained on a reversal learning task), but whether this was also the case for mice was not known. To test this, Murphy et al. trained head-fixed mice on a two-armed bandit task in which the contingencies reversed when mice met a performance criterion (10 trials choosing the better option) plus an additional random number of trials (referred to as Lrandom). They found that as the length of Lrandom increased, mice began to exhibit pre-emptive switching in their choices as if they were expecting and/or anticipating the reversal to occur. They report that unilateral lesions of the mPFC (ACC + MO) led to earlier pre-emptive switching (although I found this part of the manuscript the most challenging to understand) and faster post-reversal switching that they argue reflects an impairment in the proper estimation of the reversal. They also report that this requires inter-hemispheric coordination because bilateral lesions did not further impair this estimation. Optogenetic inhibition just prior to the mouse making a choice recapitulated some of the behavioral metrics observed in the mPFC lesioned animals. Finally, the authors developed a novel hybrid belief-choice kernel model to provide a computational approach to quantifying these behavioral differences.

      Strengths:

      The paper is extremely well written and was an absolute pleasure to read. The results are novel and provide exciting (although not surprising) evidence that mice exploit task structures to earn rewards. Moreover, the experiments were well-designed and included appropriate controls and/or control conditions that support their findings.

      Weaknesses:

      Some of the results need to be clarified and/or language changed to ensure that readers will understand. Restricting analyses to expert mice that show the predicted effect is problematic.

    4. Reviewer #3 (Public review):

      Summary:

      The authors examine the role of the medial frontal cortex of mice in exploiting statistical structure in tasks. They claim that mice are "proactive": they predict upcoming changes, rather than responding in a "model-free" way to environmental changes. Further, they speculate that the estimation of future change (i.e., prediction of upcoming events, based on learning temporal regularities) might be "a main ... function of dorsal medial frontal cortex (dmFC)." Unfortunately, the current manuscript contains flaws such that the evidence supporting these claims is inadequate.

      Strengths:

      Understanding the neural mechanisms by which we learn about statistical structure in the world is an important goal. The authors developed an interesting task and used model-based techniques to try to understand the mechanisms by which perturbation of dmFC influenced behavior. They demonstrate that lesions and optogenetic silencing of dmFC influence behavior, showing that this region has a causal influence on the task.

      Weaknesses:

      I was concerned that the main behavioral effects shown in Figure 1F were a statistical artifact. By requiring the Geometric block length to be preceded by a performance-based block, the authors introduce a dependence that can generate the phenomena they describe as anticipation.

      To demonstrate this, I simulated their task with an agent that does not have any anticipation of the change point (Review image 1). The agent repeats the previous action with probability `p(repeat)` (similar to the choice kernel in the authors' models). If the agent doesn't repeat, then the next choice depends on the previous outcome. If the previous choice was rewarded, it stays with `P(WS)` and chooses randomly with `1-P(WS)`. If the previous choice was unrewarded, it switches with `P(LS)` and chooses randomly with `1-P(LS)`.

      Review image 1.

      An agent with `P(WS)=P(LS)=P(repeat)=0.85` shows the same phenomena as the mice: a difference in performance before the block switch and "earlier" crossing of the midpoint after the switch. https://imgdrop.io/image/aHn6y. The phenomena go away in the simulations when a fixed block length of 20 trials is followed by a Geometric block length.
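
      The agent and task described above can be sketched in a few lines of Python. This is only an illustrative reconstruction, not the reviewer's actual simulation code: the reward probabilities (`p_high`, `p_low`), the geometric parameter (`geom_p`), and the reading of the criterion as a cumulative count of better-arm choices within a block are all assumptions made here to obtain something runnable.

```python
import random


def agent_choice(prev_choice, prev_reward, p_repeat, p_ws, p_ls, rng):
    """Pick arm 0 or 1 using only the previous choice and outcome."""
    if prev_choice is None:                 # first trial: no history yet
        return rng.randrange(2)
    if rng.random() < p_repeat:             # perseveration, choice-kernel-like
        return prev_choice
    if prev_reward:                         # win: stay with P(WS), else random
        return prev_choice if rng.random() < p_ws else rng.randrange(2)
    # loss: switch with P(LS), else random
    return 1 - prev_choice if rng.random() < p_ls else rng.randrange(2)


def draw_geometric(p, rng):
    """Trials up to and including the first success (support 1, 2, ...)."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k


def simulate(n_trials, p_repeat=0.85, p_ws=0.85, p_ls=0.85,
             p_high=0.75, p_low=0.25, criterion=10, geom_p=0.1, seed=0):
    """Criterion-then-geometric block structure; task values are assumed."""
    rng = random.Random(seed)
    better, block = 0, 0
    n_better = 0          # better-arm choices so far in this block
    remaining = None      # trials left in the random segment, once criterion is met
    prev_choice, prev_reward = None, None
    trials = []
    for _ in range(n_trials):
        choice = agent_choice(prev_choice, prev_reward, p_repeat, p_ws, p_ls, rng)
        reward = rng.random() < (p_high if choice == better else p_low)
        trials.append({"block": block, "better": better,
                       "choice": choice, "reward": reward})
        prev_choice, prev_reward = choice, reward
        if remaining is None:
            n_better += choice == better
            if n_better >= criterion:
                remaining = draw_geometric(geom_p, rng)
        else:
            remaining -= 1
            if remaining == 0:              # random segment finished: reverse arms
                better, block = 1 - better, block + 1
                n_better, remaining = 0, None
    return trials
```

      Aligning the returned trials to the block boundaries and grouping them by the length of the random segment would give curves comparable in spirit to Figure 1F. Replacing the criterion phase with a fixed 20-trial segment is the control the reviewer describes.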

      The authors did not completely rely on the phenomena of Figure 1F for their conclusions. They did a model comparison to provide evidence that animals are anticipating the switch. Unfortunately, the authors did not use state-of-the-art methods in this section of the paper. In particular, they failed to show that under a range of generative parameters for each model class, the model selection process chooses the correct model class (i.e. a confusion matrix). A more minor point, they used BIC instead of a more robust cross-validated metric for model selection. Finally, instead of comparing their "best" anticipating model to their 2nd best model (without anticipation), they compared their best to their 4th best (Supp Fig 3.5). This seems misleading.
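
      To make the requested model-recovery check concrete, here is a minimal sketch of how a confusion matrix for model selection can be built. It uses two deliberately trivial stand-in models (an unbiased and a biased coin), not the authors' belief or choice-kernel models; all function names and parameter ranges are hypothetical. The recipe is: simulate from each model class over a range of generative parameters, fit every candidate model, select by BIC, and count how often the generating class is recovered.

```python
import math
import random


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


def bernoulli_loglik(data, p):
    p = min(max(p, 1e-9), 1.0 - 1e-9)       # guard against log(0)
    ones = sum(data)
    return ones * math.log(p) + (len(data) - ones) * math.log(1.0 - p)


# Each fitter returns (maximized log-likelihood, number of free parameters).
def fit_unbiased(data):
    return bernoulli_loglik(data, 0.5), 0


def fit_biased(data):
    return bernoulli_loglik(data, sum(data) / len(data)), 1


def simulate_unbiased(n, rng):
    return [int(rng.random() < 0.5) for _ in range(n)]


def simulate_biased(n, rng):
    p = rng.uniform(0.6, 0.9)               # generative parameter drawn from a range
    return [int(rng.random() < p) for _ in range(n)]


def model_recovery(n_sims=200, n_trials=300, seed=0):
    """Rows: generating model. Columns: model selected by BIC."""
    rng = random.Random(seed)
    simulators = {"unbiased": simulate_unbiased, "biased": simulate_biased}
    fitters = {"unbiased": fit_unbiased, "biased": fit_biased}
    confusion = {g: {f: 0 for f in fitters} for g in simulators}
    for gen_name, sim in simulators.items():
        for _ in range(n_sims):
            data = sim(n_trials, rng)
            scores = {name: bic(*fit(data), n_trials)
                      for name, fit in fitters.items()}
            confusion[gen_name][min(scores, key=scores.get)] += 1
    return confusion
```

      A strongly diagonal matrix indicates that the selection procedure can tell the model classes apart at the available trial counts. Swapping `bic` for a held-out log-likelihood would give the cross-validated variant mentioned above.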

      Given all of the above issues, it is hard to critically evaluate the model-based analysis of the effects of lesions/optogenetics.

    5. Author response:

      We appreciate the reviewers' thoughtful and constructive comments. In this provisional response, we aim to address what we see as the key critiques, with a detailed, point-by-point reply to be provided alongside the revised manuscript. Below, we outline how we intend to address these critiques in the revised manuscript.

      (1) We will revise sections of the manuscript to ensure that all results, particularly those concerning the effects of lesions, are described more clearly and with sufficient context. This includes providing additional visualizations and rewording any ambiguous statements.

      (2) In this study, we examined a subset of 7,396 blocks where animals quickly adapted after block switches (achieving LCriterion in 20 or fewer trials), thereby focusing on expert-level performance and avoiding periods that might be affected by low motivation. It is valid to question whether the same observations would hold if the full dataset were analyzed. To address this, we expanded our analysis to include a supplementary figure (Supplementary Figure 1.1) that illustrates the same relationships based on block length (BL) instead of LRandom, both with and without the restriction on LCriterion (n = 9,156 blocks in which the block length is under 100 trials, without any LCriterion restrictions), and based on LRandom both without any LCriterion restriction and with a less stringent LCriterion restriction (≤ 50 trials to reach the criterion). This method allowed us to include all trials in our dataset. We observed similar effects of block length on choice behavior around switches (Figure 3), confirming the consistency of our findings across different analytical conditions.

      (3) We agree that robust validation of model selection is crucial. To address this, we will generate a confusion matrix to assess whether our model selection process accurately identifies the correct model class across a range of generative parameters. We will also include additional model selection metrics, such as cross-validation, to complement the BIC analysis and provide a more robust comparison of models.

      (4) We acknowledge the concern regarding our comparison of the "best" and the "4th best" models. The "4th best" model was chosen because it is the most widely recognized in the literature. Our intention was to demonstrate the performance of the most commonly used model, but we understand how this may have been misleading. To address this, we will revise our comparison to focus on the "best" and the "2nd best" models, ensuring greater clarity in the manuscript. Additionally, we will include supplementary simulation results and figures to provide a more comprehensive analysis of the models.

    1. eLife Assessment

      This valuable study presents a thorough analysis of protein abundance changes caused by amino acid substitutions, using structural context to improve predictive accuracy. By deriving substitution response matrices based on solvent accessibility, the authors demonstrate that simple structural features can predict abundance effects with accuracy comparable to complex methods such as free energy calculations. The strength of the evidence is solid, supported by robust experimental design and comprehensive analyses. This work is expected to be of interest to broad audiences as it offers practical tools for analyzing mutational effects and insights into the structural basis of proteostasis.

    2. Reviewer #1 (Public review):

      Of course, there is always another layer of the onion: VAMP-seq measures contributions from isolated thermodynamic stability, stability conferred by binding partners (small molecule and protein), synthesis/degradation balance (especially important in "degron" motifs), etc. Here the authors' goal is to create simple models that can act as a baseline for two main reasons:

      (1) how to tell when adding more information would be helpful for a global model;

      (2) how to detect when a residue/mutation has an unusual profile indicative of an unbalanced contribution from one of the factors listed above.

      As such, the authors state that this manuscript is not intended to be a state-of-the-art method in variant effect prediction, but rather a direction towards considering static structural information for the VAMP-seq effects. At its core, the method is a fairly traditional asymmetric substitution matrix (I was surprised not to see a comparison to BLOSUM in the manuscript), and it shows that a subdivision by burial makes the model much more predictive. Despite only having 6 datasets, they show predictive power even when the matrices are based on a smaller number. Another success is rationalizing the VAMP-seq results on relevant oligomeric states.

      Specific Feedback:

      Major points:

      The authors spend a good amount of space discussing how the six datasets have different distributions in abundance scores. After the development of their model is there more to say about why? Is there something that can be leveraged here to design maximally informative experiments?

      They compare to one more "sophisticated model" - RosettaddG - which should be more correlated with thermodynamic stability than other factors measured by VAMP-seq. However, the direct head-to-head comparison between their matrices and ddG is underdeveloped. How can this be used to dissect cases where thermodynamics are not contributing to specific substitution patterns OR in specific residues/regions that are predicted by one method better than the other? This would naturally dovetail into whether there is orthogonal information between these two that could be leveraged to create better predictions.

      Perhaps beyond the scope of this baseline method, there is also ThermoMPNN and the work from Gabe Rocklin to consider as other approaches that should be more correlated only with thermodynamics.

      I find myself drawn to the hints of a larger idea that outliers to this model can be helpful in identifying specific aspects of proteostasis. The discussion of S109 is great in this respect, but I can't help but feel there is more to be mined from Figure S9 or from other analyses of outliers with higher-than-predicted abundance along linear or tertiary motifs.

    3. Reviewer #2 (Public review):

      Summary:

      This study analyzes protein abundance data from six VAMP-seq experiments, comprising over 31,000 single amino acid substitutions, to understand how different amino acids contribute to maintaining cellular protein levels. The authors develop substitution matrices that capture the average effect of amino acid changes on protein abundance in different structural contexts (buried vs. exposed residues). Their key finding is that these simple structure-based matrices can predict mutational effects on abundance with accuracy comparable to more complex physics-based stability calculations (ΔΔG).
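
      As a rough illustration of the approach summarized above, the sketch below averages abundance scores per (wild-type, mutant) pair separately for buried and exposed positions, then predicts a held-out protein from the remaining ones. It is not the authors' code: the relative solvent accessibility cutoff of 0.2, the record fields, and the toy variants are all hypothetical placeholders.

```python
from collections import defaultdict


def context(rsa, cutoff=0.2):
    """Classify a position by relative solvent accessibility (assumed cutoff)."""
    return "buried" if rsa < cutoff else "exposed"


def build_matrices(variants, cutoff=0.2):
    """Mean score for each (context, wild-type, mutant) combination."""
    totals = defaultdict(lambda: [0.0, 0])
    for v in variants:
        key = (context(v["rsa"], cutoff), v["wt"], v["mut"])
        totals[key][0] += v["score"]
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


def predict_left_out(variants, held_out, cutoff=0.2):
    """Leave-one-protein-out: build matrices without `held_out`, then predict it."""
    train = [v for v in variants if v["protein"] != held_out]
    matrices = build_matrices(train, cutoff)
    preds = []
    for v in variants:
        if v["protein"] == held_out:
            key = (context(v["rsa"], cutoff), v["wt"], v["mut"])
            preds.append((v, matrices.get(key)))   # None if the pair was never seen
    return preds


# Toy records with made-up scores, for illustration only.
toy = [
    {"protein": "A", "wt": "L", "mut": "D", "rsa": 0.05, "score": 0.1},
    {"protein": "B", "wt": "L", "mut": "D", "rsa": 0.10, "score": 0.3},
    {"protein": "A", "wt": "L", "mut": "D", "rsa": 0.60, "score": 0.9},
    {"protein": "C", "wt": "L", "mut": "D", "rsa": 0.02, "score": 0.5},
]
```

      Comparing each held-out prediction with the measured score, for example with a rank correlation per protein, would mirror the kind of evaluation described in the reviews. Residues whose measured profile matches the other context better than their own are the candidates for the outlier analysis discussed below.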

      Major strengths:

      (1) The analysis focuses on a single molecular phenotype (abundance) measured using the same experimental approach (VAMP-seq), avoiding confounding factors present when combining data from different phenotypes (e.g., mixing stability, activity, and fitness data) or different experimental methods.

      (2) The demonstration that simple structural features (particularly solvent accessibility) can capture a significant portion of mutational effects on abundance.

      (3) The practical utility of the matrices for analyzing protein interfaces and identifying functionally important surface residues.

      Major weaknesses:

      (1) The statistical rigor of the analysis could be improved. For example, when comparing exposed vs. buried classification of interface residues, or when assessing whether differences between prediction methods are significant.

      (2) The mechanistic connection between stability and abundance is assumed rather than explained or investigated. For instance, destabilizing mutations might decrease abundance through protein quality control, but other mechanisms like degron exposure could also be at play.

      (3) The similar performance of simple matrix-based and complex physics-based predictions calls for deeper analysis. A systematic comparison of where these approaches agree or differ could illuminate the relationship between stability and abundance. For instance, buried sites showing exposed-like behavior might indicate regions of structural plasticity, while the link between destabilization and degradation might involve partial unfolding exposing typically buried residues. The authors have all the necessary data for such analysis but don't fully exploit this opportunity.

      (4) The pooling of data across proteins to construct the matrices needs better justification, given the observed differences in score distributions between proteins (for example, PTEN's distribution is shifted towards high abundance scores while ASPA and PRKN show more binary distributions).

      (5) Some key methodological choices require better justification. For example, combining "to" and "from" mutation profiles for PCA despite their different behaviors, or using arbitrary thresholds (like 0.05) for residue classification.

      The authors largely achieve their primary aim of showing that simple structural features can predict abundance changes. However, their secondary goal of using the matrices to identify functionally important residues would benefit from more rigorous statistical validation. While the matrices provide a useful baseline for abundance prediction, the paper could offer deeper biological insights by investigating cases where simple structure-based predictions differ from physics-based stability calculations.

      This work provides a valuable resource for the protein science community in the form of easily applicable substitution matrices. The finding that such simple features can match more complex calculations is significant for the field. However, the work's impact would be enhanced by a deeper investigation of the mechanistic implications of the observed patterns, particularly in cases where abundance changes appear decoupled from stability effects.

    4. Reviewer #3 (Public review):

      "Effects of residue substitutions on the cellular abundance of proteins" by Schulze and Lindorff-Larsen revisits the classical concept of structure-aware protein substitution matrices through the scope of modern protein structure modelling approaches and comprehensive phenotypic readouts from multiplex assays of variant effects (MAVEs). The authors explore 6 unique protein MAVE datasets based on protein abundance (and thus stability) by utilizing structural information, specifically residue solvent accessibility and secondary structure type, to derive combinations of context-specific substitution matrices predicting variant abundance. They are clear to outline that the aim of the study is not to produce a new best abundance predictor but to showcase the degree of prediction afforded simply by utilizing information on residue accessibility. The performance of their matrices is robustly evaluated using a leave-one-out approach, where the abundance effects for a single protein are predicted using the remaining datasets. Using a simple classification of buried and solvent-exposed residues, and substitution matrices derived respectively for each residue group, the authors convincingly demonstrate that taking structural solvent accessibility contexts into account leads to more accurate performance than either a structure-unaware matrix, secondary structure-based matrix, or matrices combining both solvent accessibility or secondary structure. Interestingly, it is shown that the performance of the simple buried and exposed residue substitution matrices for predicting protein abundance is on par with Rosetta, an established and specialized protein variant stability predictor. 
      More importantly, the authors finish off the paper by demonstrating the utility of the two matrices to identify surface residues that have buried-like substitution profiles, that are shown to correspond to protein interface residues, post-translational modification sites, functional residues, or putative degrons.

      Strengths:

      The paper makes a strong and well-supported main point, demonstrating the utility of the authors' approach through performance comparisons with alternative substitution matrices and specialized methods alike. The matrices are rigorously evaluated without introducing bias, exploring various combinations of protein datasets. Supplemental analyses are extremely comprehensive and detailed. The applicability of the substitution matrices is explored beyond abundance prediction and could have important implications in the future for identifying functionally relevant sites.

      Comments:

      (1) A wider discussion of the possible reasons why matrices for certain proteins seem to correlate better than others would be extremely interesting, touching upon possible points like differences or similarities in local environments, degradation pathways, post-translational modifications, and regulation. While the initial data structure differences provide a possible explanation, Figure S17A, B correlations show a more complicated picture.

      (2) The performance analysis in Figure 2D seems to show that for particular proteins "less is more" when it comes to which datasets are best to derive the matrix from (CYP2C9, ASPA, PRKN). Are there any features (direct or proxy) that would allow proteins to be grouped to maximize accuracy? Do the authors think that, on top of the buried vs exposed paradigm, another grouping dimension at the protein/domain level could improve performance?

      (3) While the matrices and Rosetta seem to show similar degrees of correlation, do the methods both fail and succeed on the same variants? Or do they show a degree of orthogonality and could potentially be synergistic?

      Overall, this work presents a valuable contribution by creatively utilizing a simple concept through cutting-edge datasets, which could be useful in various.

    1. eLife Assessment

      This manuscript describes work that is fundamental to our understanding of the mechanism of how stress regulates the noradrenergic system and the CRH system. Using a combination of ex vivo physiology and in vitro methods, the study provides compelling evidence as to the signaling mechanisms of how glucocorticoids impact adrenergic GPCRs in CRH cells via receptor trafficking. While the ex vivo work is specific to the hypothalamus, the mechanisms here could be extrapolated and investigated in other brain regions that may have similar signaling regulation.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate the cellular mechanism underlying suppression of adrenergic effects on excitatory transmission onto hypothalamic CRH neurons by stress. Experiments in ex-vivo slices show that this is a long-lasting effect that occurs through endocytosis of receptors. The authors then move into an immortalized hypothalamic cell line to enable investigation of the mechanism of changes in receptor trafficking. They use a series of immunohistochemistry, FRET, and biochemical experiments to show that application of corticosterone increases targeting of alpha1 adrenergic receptors to the late endosome and lysosome rather than the recycling endosome. Perhaps most interesting, they find that alpha1 receptors and glucocorticoid receptors form a complex that is ultimately transferred to the nucleus.

      Strengths:

      Overall, the studies in this manuscript are rigorous and well-conducted. The data support their conclusions, and they've shown convincingly that glucocorticoid signaling affects trafficking of alpha1 receptors in the culture system they are using. These findings are important for the field of stress research, both in understanding how two components of the stress system (norepinephrine and HPA axis) interact with each other and in the neuromodulation of hypothalamic CRH neurons. Their finding that alpha1 receptors and glucocorticoid receptors form a complex is particularly interesting and may be impactful outside of the immediate application in the hypothalamus.

      Weaknesses:

      The study has two primary weaknesses. First, the majority of the experiments were conducted in an immortalized hypothalamic cell line. This was necessary to conduct the type of experiments needed to test the authors' hypothesis, but it remains unclear how closely these cells resemble CRH neurons, or how the same mechanism may be preserved or altered in an intact circuit. Further discussion of these points would strengthen the manuscript.

      Second, while experiments are generally well-designed, the authors do not show that the effects of corticosterone can be blocked with a glucocorticoid receptor antagonist. This is fairly standard pharmacology and would strengthen confidence in the findings presented in the study.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors report novel and exciting findings delineating a non-transcriptional mechanism whereby glucocorticoids desensitize CRH neurons to NE in response to somatic stress. The authors find that this desensitization induced by CORT (1) persists more than 18h, (2) reduces surface expression of AR1bR (NE receptors) by redirecting trafficking from rapid recycling to late endosomal pools and lysosomes, (3) is dependent on NE binding to the AR1bR, (4) involves cellular nitrosylation, (5) involves ubiquitination of beta-arrestin, and (6) involves interactions between glucocorticoid receptors and AR1bs, glucocorticoid receptors and ubiquitinated beta-arrestin, and AR1b and ubiquitinated beta-arrestin. While the authors do not directly provide evidence for a trimeric complex composed of these three proteins, their data that CORT causes translocation of these dimeric complexes to the cell nucleus suggests it is likely. Overall, these results are highly informative for understanding novel mechanisms mediating glucocorticoid regulation of GPCRs.

      Strengths:

      - Good rationale for each experiment, which describes many parts of the CORT-NE desensitization mechanism
      - Great discussion of limitations of the approaches and the parts of the mechanism we do not fully understand yet
      - Appropriate approaches for questions being answered
      - Describes a highly novel CORT mechanism that non-transcriptionally switches GPCR trafficking dynamics, something that could have far reaching implications for other GPCRs involved in stress responses

      Weaknesses:

      - Unclear how this mechanism would generalize to other stressor modalities. Restraint stress is a somatic stressor, but can also be considered a psychological stressor (model of depression-like behavior). A purely somatic stressor might increase the robustness of this phenomenon.
      - Remains unknown how nitrosylation plays into the mechanism in terms of specific proteins affected by CORT (GRK2, endophilin, clathrin possibilities)

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Weiss et al describe a mechanism through which glucocorticoids desensitize CRH neurons in the PVN to norepinephrine. This follows on from previous work from this lab showing rapid glucocorticoid suppression of adrenergic signaling in CRH neurons specific to somatic stress activation, and modality-selective glucocorticoid negative feedback.

      Specifically, their previous work shows that:

      (1) NE increases glutamate drive to CRH neurons

      (2) CORT blunts the effects of NE through a dynamin-dependent mechanism

      (3) This contributes to loss of NE signalling after stress (specifically when the second stressor is a physiological one)

      Here they extend this line of interrogation by showing that CORT redistributes Ara1b receptors from rapid recycling endosomes to late endosomes and lysosomes. They show a time window of CORT actions and provide additional mechanistic details implicating nitric oxide-dependent nitrosylation in receptor trafficking.

      Strengths:

      Builds on existing work to provide additional mechanistic details.

      The experiments are well done and data are compelling.

      The link to nitrosylation is novel (but see below).

      Weaknesses:

      (1) The link to nitrosylation is interesting, but a bit confusing. If I understand correctly, inhibiting the production of NO or using NEM increases receptor internalization, suggesting that NO-dependent nitrosylation prevents ligand-dependent internalization. What is unclear to me is how CORT is linked to this step. I note the authors show a decrease in nitrosylation with CORT. So, does CORT decrease the activity of NOS and, thus, the production of NO? If so, then exogenously activating this system in the presence of CORT should result in a recovery of NE-dependent increase in glutamate release. Or is the GCR directly decreasing nitrosylation? Linking these elements is critical in terms of furthering our mechanistic understanding of this process.

      (2) It's not clear why/how blockade of Ara1 after CORT-induced cytosolic accumulation results in a reversal of effect. Unless I misunderstood something, this requires further explanation.

    5. Author response:

      We appreciate the expression of enthusiasm for our paper by the editors and the three reviewers and the suggestions on how to improve the study. Here we outline how we will address the reviewers’ concerns and suggestions in a planned revision of our manuscript.

      Reviewer #1 listed two primary weaknesses:

      (1) the need for discussion of the extent to which the cell line we used resembles CRH neurons and

      (2) that we did not test for the effect of blockade of the glucocorticoid receptor.

      (1) As the reviewer acknowledges, our experiments called for the use of a cell line to dissect intracellular trafficking of the α1 adrenoreceptor. We selected the N42 cell line for this purpose because it is an immortalized hypothalamic cell line (developed by Belsham and colleagues, Belsham et al., 2004) that expresses CRH. We have used this cell line successfully in the past to study transcriptional and rapid non-genomic actions of glucocorticoids, which indicated that, in addition to expressing CRH, these cells also express both the nuclear glucocorticoid receptor and a membrane-associated receptor that binds glucocorticoids (Rainville et al., 2019; Weiss et al., 2019). We believe that this hypothalamic cell line is the most closely related to native PVN CRH neurons of any cell line available. As requested, we will add to the Discussion of the manuscript to further justify our choice of cells.

      (2) We agree that this experiment should be performed. We will test the classical GR (and progesterone) antagonist RU486 (mifepristone) for its effect on the cort regulation of α1 adrenoreceptor trafficking. Our ex vivo electrophysiology studies have indicated that the rapid glucocorticoid effect in native hypothalamic CRH neurons is not blocked by RU486 and is not, therefore, dependent on activation of the classical nuclear GR (Di et al., 2003; Di et al., 2016).

      Reviewer #2 also listed two main weaknesses of the study:

      (1) that we did not test whether the adrenoreceptor desensitization by restraint stress generalizes to other stress modalities and might be more robust with a pure somatic stressor, and

      (2) the lack of identification of a target protein as a mechanism for the role of nitrosylation.

      (1) We used restraint stress as a means to elicit corticosterone release, which desensitized the HPA response to a NE-dependent somatic stressor (lipopolysaccharide injection) but not to a NE-independent psychological stressor (predator odor) (Jiang et al., 2022). We observed a near-complete loss of the sensitivity of CRH neurons to NE with restraint (i.e., a near-ceiling effect), such that a different stressor, including a more purely somatic stressor, should not increase the Cort-induced desensitization further. For that reason, we would argue that testing other stressors would not add value to the current study. That said, we plan and have received new funding to test in the future whether the Cort desensitization of the HPA response to LPS stress generalizes to other somatic stressors. We also have future plans to test for the Cort desensitization of other Gq-coupled receptors.

      (2) We agree that finding the molecular target of nitrosylation as the mechanism for Cort desensitization of α1 adrenoreceptors would significantly improve the study, but this is a potentially enormous undertaking, as it will require the screening and validation of multiple proteins involved in protein trafficking to find the one(s) targeted for nitrosylation by Cort. We tested β-arrestin as a possible target in the paper, but did not find Cort to regulate β-arrestin nitrosylation. We plan to undertake a general nitrosylation screen of proteins to identify multiple possible targets, but prefer to defer this and the validation of possible targets to a future, more thorough analysis.

      Reviewer #3 also pointed out two main weaknesses of our study:

      (1) that the glucocorticoid-nitrosylation link was confusing, and

      (2) that it was unclear how blocking α1 adrenoreceptors reversed the Cort-induced cytosolic accumulation of the receptor.

      We appreciate the reviewer pointing out these deficiencies in our interpretation and explanation of our findings. We plan to address them directly in the revised version of the paper. 

      References

      Belsham DD, Cai F, Cui H, Smukler SR, Salapatek AMF, Shkreta L (2004) Generation of a phenotypic array of hypothalamic neuronal cell models to study complex neuroendocrine disorders. Endocrinology 145:393–400.

      Weiss GL, Rainville JR, Zhao Q, Tasker JG (2019) Purity and stability of the membrane-limited glucocorticoid receptor agonist dexamethasone-BSA. Steroids 142:2-5. 

      Rainville JR, Weiss GL, Evanson N, Herman JP, Vasudevan N, Tasker JG (2019) Membrane-initiated nuclear trafficking of the glucocorticoid receptor in hypothalamic neurons. Steroids 142:55-64.

      Di S, Malcher-Lopes R, Halmos KC, Tasker JG (2003) Non-genomic glucocorticoid inhibition via endocannabinoid release in the hypothalamus: a fast feedback mechanism. Journal of Neuroscience 23:4850-4857.

      Di S, Itoga CA, Fisher MO, Solomonow J, Roltsch EA, Gilpin NW, Tasker JG (2016) Acute stress suppresses inhibition and increases anxiety via endocannabinoid release in the basolateral amygdala. Journal of Neuroscience 36:8461-8470.

      Jiang Z, Chen C, Weiss GL, Fu X, Stelly CE, Sweeten BLW, Tirrell PS, Pursell I, Stevens CR, Fisher MO, Begley JC, Harrison LM, Tasker JG (2022) Stress-induced glucocorticoid desensitizes adrenoreceptors to gate the neuroendocrine response to somatic stress in male mice. Cell Reports 41(3):111509.

    1. eLife assessment

      This study reports an analysis of the formation of electrosensory ampullary organs in a non-model organism, the sterlet sturgeon. By using a combination of targeted gene knock-out and inhibition, the study provides overall convincing evidence for differential roles of BMP signaling in lateral-line development, with a few aspects that could be improved. The study is particularly valuable for understanding the development of a still-mysterious sensory system, and for its evolutionary implications.

    2. Reviewer #1 (Public Review):

      The authors were curious about the formation of the electrosensory lateral line, which is found in non-traditional model organisms. This issue has traditionally hampered studies because those organisms are not amenable to controlled experimental work.

      The authors skillfully use CRISPR-based technologies to overcome this limitation. Together with exceptionally good whole-mount in situ hybridisation, they produced a well-supported conclusion that Bmp signalling has different roles in the development of electrosensory ampullary organs.

      I would not entirely agree that Bmp signalling has "opposing" roles because the authors do not show evidence of opposition via gain-of-function experiments at different developmental times. Instead, they are simply different at different periods of organogenesis.

      The study is important for understanding the development of a still-mysterious sensory system, and for its implications in evolutionary biology more generally.

    3. Reviewer #2 (Public Review):

      Campbell et al. have described the dynamic pattern of two Bmps (Bmp5, Bmp4), one of their receptors (Acvr2a), putative joint inhibitors of the Bmp and Wnt pathways (sostdc1, apcdd1), and an effector of Bmp signaling (phospho-Smad) in the experimentally tractable sterlet sturgeon to better understand the role of Bmp signaling in electroreceptor development. The role of Bmp signaling is poorly understood in the lateral line system. Furthermore, the development of electroreceptors in ampullary organs remains poorly understood, as most recent analysis of lateral line development has focused on the model organisms Xenopus and zebrafish, in which electroreceptors have been lost. They show that expression of these players is consistent with a role for Bmp signaling in electroreceptor development. Furthermore, they show that Bmp5 crispants have fewer ampullary organs. However, inhibition of Bmp signaling with the small molecule inhibitor DMH1 for 20 hours starting from stage 36, after hatching and before ampullary organ development, results in supernumerary ampullary organ development. These strikingly different results lead the authors to conclude that Bmp signaling has opposing roles in ampullary organ development.

      These observations are interesting, the conclusions are supported by the data presented, and the study makes important contributions to our understanding of the role of Bmp signaling in electroreceptor development within the lateral line. However, the study opens and leaves unresolved a number of questions. While a definitive answer to these questions may be outside the scope of this paper, some additional experiments may help strengthen the paper.

    1. eLife Assessment

      This valuable study investigates the neural noise hypothesis of developmental dyslexia using electroencephalography (EEG) and 7T magnetic resonance spectroscopy (MRS). Solid results were reported that indicate no evidence of an imbalance between excitatory and inhibitory (E/I) brain activity in adolescents and young adults with dyslexia compared to controls, thereby challenging the neural noise hypothesis. This research advances our understanding of the neural mechanisms underlying dyslexia and offers broader insights into the neural processes involved in reading development.

    2. Reviewer #1 (Public review):

      Summary:

      "Neural noise", here operationalized as an imbalance between excitatory and inhibitory neural activity, has been posited as a core cause of developmental dyslexia, a prevalent learning disability that impacts reading accuracy and fluency. This study is the first to systematically evaluate the neural noise hypothesis of dyslexia. Neural noise was measured using neurophysiological (electroencephalography [EEG]) and neurochemical (magnetic resonance spectroscopy [MRS]) in adolescents and young adults with and without dyslexia. The authors did not find evidence of elevated neural noise in the dyslexia group from EEG or MRS measures, and Bayes factors generally informed against including the grouping factor in the models. Although the comparisons between groups with and without dyslexia did not support the neural noise hypothesis, a mediation model that quantified phonological processing and reading abilities continuously revealed that EEG beta power in the left superior temporal sulcus was positively associated with reading ability via phonological awareness. This finding lends support for analysis of associations between neural excitatory/inhibitory factors and reading ability along a continuum, rather than as with a case/control approach, and indicates the relevance of phonological awareness as an intermediate trait that may provide a more proximal link between neurobiology and reading ability. Further research is needed across developmental stages and over a broader set of brain regions to more comprehensively assess the neural noise hypothesis of dyslexia, and alternative neurobiological mechanisms of this disorder should be explored.

      Strengths:

      The inclusion of multiple methods of assessing neural noise (neurophysiological and neurochemical) is a major advantage of this paper. MRS at 7T confers an advantage of more accurately distinguishing and quantifying glutamate, which is a primary target of this study. In addition, the subject-specific functional localization of the MRS acquisition is an innovative approach. MRS acquisition and processing details are noted in the supplementary materials according to the experts' consensus-recommended checklist (https://doi.org/10.1002/nbm.4484). Commenting on the rigor of the EEG methods is beyond my expertise as a reviewer.

      Participants recruited for this study included those with a clinical diagnosis of dyslexia, which strengthens confidence in the accuracy of the diagnosis. The assessment of reading and language abilities during the study further confirms the persistently poorer performance of the dyslexia group compared to the control group.

      The correlational analysis and mediation analysis provide complementary information to the main case-control analyses, and the examination of associations between EEG and MRS measures of neural noise is novel and interesting.

      The authors follow good practice for open science, including data and code sharing. They also apply statistical rigor, using Bayes Factors to support conclusions of null evidence rather than relying only on non-significant findings. In the discussion, they acknowledge the limitations and generalizability of the evidence and provide directions for future research on this topic.

      Weaknesses:

      Though the methods employed in the paper are generally strong, the MRS acquisition was not optimized to quantify GABA, so the findings (or lack thereof) should be interpreted with caution. Specifically, while 7T MRS affords the benefit of quantifying metabolites, such as GABA, without spectral editing, this quantification is best achieved with echo times (TE) of 68 or 80 ms in order to minimize the spectral overlap between glutamate and GABA and reduce contamination from the macromolecular signal (Finkelman et al., 2022, https://doi.org/10.1016/j.neuroimage.2021.118810). The data in the present study were acquired at TE=28 ms, and are therefore likely affected by overlapping Glu and GABA peaks at 2.3 ppm that are much more difficult to resolve at this short TE, which could directly affect the measures that are meant to characterize the Glu/GABA+ ratio/imbalance. In future research, MRS acquisition schemes should be optimized for the acquisition of Glutamate, GABA, and their relative balance.

      As the authors note in the discussion, additional factors such as MRS voxel location, participant age, and participant sex could influence associations between neural noise and reading abilities and should be considered in future studies.

      Appraisal:

      The authors present a thorough evaluation of the neural noise hypothesis of developmental dyslexia in a sample of adolescents and young adults using multiple methods of measuring excitatory/inhibitory imbalances as an indicator of neural noise. The authors concluded that there was no support for the neural noise hypothesis of dyslexia in their study based on null significance and Bayes factors. This conclusion is justified, and further research is called for to more broadly evaluate the neural noise hypothesis in developmental dyslexia.

      Impact:

      This study provides an exemplar foundation for the evaluation of the neural noise hypothesis of dyslexia. Other researchers may adopt the model applied in this paper to examine neural noise in various populations with/without dyslexia, or across a continuum of reading abilities, to more thoroughly examine evidence (or lack thereof) for this hypothesis. Notably, the lack of evidence here does not rule out the possibility for a role of neural noise in dyslexia, and the authors point out that presentation with co-occurring conditions, such as ADHD, may contribute to neural noise in dyslexia. Dyslexia remains a multi-faceted and heterogeneous neurodevelopmental condition, and many genetic, neurobiological and environmental factors play a role. This study demonstrates one step toward evaluating neurobiological mechanisms that may contribute to reading difficulties.

    3. Reviewer #2 (Public review):

      Summary:

      This study utilized two complementary techniques (EEG and 7T MRI/MRS) to directly test a theory of dyslexia: the neural noise hypothesis. The authors report finding no evidence to support an excitatory/inhibitory imbalance, as quantified by beta in EEG and Glutamate/GABA ratio in MRS. This is important work and speaks to one potential mechanism by which increased neural noise may occur in dyslexia.

      Strengths:

      This is a well-conceived study with in-depth analyses and publicly available data for independent review. The authors provide transparency with their statistics and display the raw data points along with the averages in figures for review and interpretation. The data suggest that an E/I balance issue may not underlie deficits in dyslexia, and the study is a meaningful and needed test of a possible mechanism for increased neural noise.

      Weaknesses:

      The researchers did not include a visual print task in the EEG task, which limits analysis of reading-specific regions such as the visual word form area, which is commonly hypoactivated in dyslexia. Although this region is frequently of interest in dyslexia, the researchers measured the I/E balance in only one region of interest, specific to the language network.

    4. Reviewer #3 (Public review):

      Summary:

      This study by Glica and colleagues utilized EEG (i.e., Beta power, Gamma power, and aperiodic activity) and 7T MRS (i.e., MRS IE ratio, IE balance) to reevaluate the neural noise hypothesis in dyslexia. Supported by Bayesian statistics, their results show convincing evidence of no differences in EI balance between groups, challenging the neural noise hypothesis.

      Strengths:

      Combining EEG and 7T MRS, this study utilized both indirect (i.e., Beta power, Gamma power, and aperiodic activity) and direct (i.e., MRS IE ratio, IE balance) measures to reevaluate the neural noise hypothesis in dyslexia.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      "Neural noise", here operationalized as an imbalance between excitatory and inhibitory neural activity, has been posited as a core cause of developmental dyslexia, a prevalent learning disability that impacts reading accuracy and fluency. This study is the first to systematically evaluate the neural noise hypothesis of dyslexia. Neural noise was measured using neurophysiological (electroencephalography [EEG]) and neurochemical (magnetic resonance spectroscopy [MRS]) in adolescents and young adults with and without dyslexia. The authors did not find evidence of elevated neural noise in the dyslexia group from EEG or MRS measures, and Bayes factors generally informed against including the grouping factor in the models. Although the comparisons between groups with and without dyslexia did not support the neural noise hypothesis, a mediation model that quantified phonological processing and reading abilities continuously revealed that EEG beta power in the left superior temporal sulcus was positively associated with reading ability via phonological awareness. This finding lends support for analysis of associations between neural excitatory/inhibitory factors and reading ability along a continuum, rather than as with a case/control approach, and indicates the relevance of phonological awareness as an intermediate trait that may provide a more proximal link between neurobiology and reading ability. Further research is needed across developmental stages and over a broader set of brain regions to more comprehensively assess the neural noise hypothesis of dyslexia, and alternative neurobiological mechanisms of this disorder should be explored.

      Strengths:

      The inclusion of multiple methods of assessing neural noise (neurophysiological and neurochemical) is a major advantage of this paper. MRS at 7T confers an advantage of more accurately distinguishing and quantifying glutamate, which is a primary target of this study. In addition, the subject-specific functional localization of the MRS acquisition is an innovative approach. MRS acquisition and processing details are noted in the supplementary materials according to the experts' consensus-recommended checklist (https://doi.org/10.1002/nbm.4484). Commenting on the rigor of the EEG methods is beyond my expertise as a reviewer.

      Participants recruited for this study included those with a clinical diagnosis of dyslexia, which strengthens confidence in the accuracy of the diagnosis. The assessment of reading and language abilities during the study further confirms the persistently poorer performance of the dyslexia group compared to the control group.

      The correlational analysis and mediation analysis provide complementary information to the main case-control analyses, and the examination of associations between EEG and MRS measures of neural noise is novel and interesting.

      The authors follow good practice for open science, including data and code sharing. They also apply statistical rigor, using Bayes Factors to support conclusions of null evidence rather than relying only on non-significant findings. In the discussion, they acknowledge the limitations and generalizability of the evidence and provide directions for future research on this topic.

      Weaknesses:

      Though the methods employed in the paper are generally strong, there are certain aspects that are not clearly described in the Materials & Methods section, such as a description of the statistical analyses used for hypothesis testing.

      Thank you for pointing this out. A description of the statistical models used in the analyses of EEG biomarkers has been added to the Materials and Methods:

      “First, exponent and offset values were averaged across all electrodes and analyzed using a 2x2 repeated measures ANOVA with group (dyslexic, control) as a between-subjects factor and condition (resting state, language task) as a within-subjects factor. Age was included in the analyses as a covariate due to the correlation between variables. Next, exponent and offset values were averaged across electrodes corresponding to the left (F7, FT7, FC5) and right inferior frontal gyrus (F8, FT8, FC6), and to the left (T7, TP7, TP9) and right superior temporal sulcus (T8, TP8, TP10). The electrodes were selected based on the analyses outlined by Giacometti and colleagues (2014) and Scrivener and Reader (2022). For these analyses, a 2x2x2x2 repeated measures ANOVA with age as a covariate was conducted with group (dyslexic, control) as a between-subjects factor and condition (resting state, language task), hemisphere (left, right), and region (frontal, temporal) as within-subjects factors. Results for the alpha and beta bands were calculated for the same clusters of frontal and temporal electrodes and analyzed with a similar 2x2x2x2 repeated measures ANOVA; however, for these analyses, age was not included as a covariate due to a lack of significant correlations.”
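
      The cluster-averaging step described in the quoted Methods text can be sketched as follows. This is a minimal illustration, not the authors' code: the electrode groupings are taken from the quoted passage, while the function name `cluster_means`, the dictionary layout, and the example exponent values are hypothetical and chosen only for demonstration.

```python
from statistics import mean

# Electrode clusters as listed in the quoted Methods text.
# Keys are (hemisphere, region) pairs; this layout is an assumption.
CLUSTERS = {
    ("left", "frontal"): ["F7", "FT7", "FC5"],
    ("right", "frontal"): ["F8", "FT8", "FC6"],
    ("left", "temporal"): ["T7", "TP7", "TP9"],
    ("right", "temporal"): ["T8", "TP8", "TP10"],
}


def cluster_means(values_by_electrode):
    """Average a per-electrode measure (e.g. the aperiodic exponent)
    within each hemisphere-by-region cluster."""
    return {
        cluster: mean(values_by_electrode[name] for name in electrodes)
        for cluster, electrodes in CLUSTERS.items()
    }


# Hypothetical exponent values for one participant in one condition.
example_exponents = {
    "F7": 1.0, "FT7": 1.2, "FC5": 1.1,
    "F8": 1.3, "FT8": 1.3, "FC6": 1.3,
    "T7": 0.9, "TP7": 1.0, "TP9": 1.1,
    "T8": 1.2, "TP8": 1.4, "TP10": 1.0,
}

for cluster, value in cluster_means(example_exponents).items():
    print(cluster, round(value, 3))
```

      Running this once per participant and condition would yield the four cluster values that feed the hemisphere and region factors of the repeated measures ANOVA; the ANOVA itself is not shown here.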

      We also expanded the description of the statistical models used in the analyses of MRS biomarkers:

      “To analyze the metabolite results, separate univariate ANCOVAs were conducted for Glu, GABA+, Glu/GABA+ ratio and Glu/GABA+ imbalance measures with group (control, dyslexic) as a between-subjects factor and voxel gray matter volume (GMV) as a covariate. Additionally, for the Glu analysis, age was included as a covariate due to a correlation between variables. Both frequentist and Bayesian statistics were calculated. Glu/GABA+ imbalance measure was calculated as the square root of the absolute residual value of a linear relationship between Glu and GABA+ (McKeon et al., 2024).”
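
      The Glu/GABA+ imbalance measure defined in the quoted text (the square root of the absolute residual of a linear relationship between Glu and GABA+) can be illustrated with a short sketch. Several details are assumptions made here for illustration only: the quoted passage does not state which metabolite is regressed on which, so this sketch regresses Glu on GABA+ across participants with ordinary least squares; the function names and the input values are hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def glu_gaba_imbalance(glu, gaba):
    """Per-participant imbalance: sqrt(|residual|) from the group-level
    linear fit of Glu on GABA+ (direction of regression is an assumption)."""
    slope, intercept = fit_line(gaba, glu)
    return [
        sqrt(abs(g - (slope * b + intercept)))
        for g, b in zip(glu, gaba)
    ]


# Hypothetical Glu/tCr and GABA+/tCr values for three participants.
gaba_values = [1.0, 2.0, 3.0]
glu_values = [1.0, 3.0, 2.0]
print(glu_gaba_imbalance(glu_values, gaba_values))
```

      Under this definition, a participant whose Glu and GABA+ values fall exactly on the group-level line has an imbalance of zero, and larger values indicate a greater departure from the typical Glu to GABA+ relationship in either direction.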

      With regard to metabolite quantification, it is unclear why the authors chose to analyze and report metabolite values in terms of creatine ratios rather than quantification based on a water reference given that the MRS acquisition appears to support using a water reference.

      We have decided to use the ratio of Glu and GABA to total creatine (tCr), as this is still a common practice in MRS studies at 7T (e.g., Nandi et al., 2022; Smith et al., 2021). This approach normalizes the signal, reducing the impact of intensity variations across different regions and tissue compositions. Additionally, total creatine concentration is considered relatively stable across different brain regions, which is particularly important in our study, where a functional localizer was used to establish the left STS region individually. Our decision was further influenced by previous studies on dyslexia (Del Tufo et al., 2018; Pugh et al., 2014) which have reported creatine ratios and included GM volume as a covariate in their models, thus providing comparability. It is now indicated in the Results:

      “For comparability with previous studies in dyslexia (Del Tufo et al., 2018; Pugh et al., 2014) we report Glu and GABA as a ratio to total creatine (tCr).”

      and in the Method sections:

      “Glu and GABA+ concentrations were expressed as a ratio to total-creatine (tCr; Creatine + Phosphocreatine) following previous MRS studies in dyslexia (Del Tufo et al., 2018; Pugh et al., 2014).

      We did not estimate absolute concentrations using water signals as a reference, as this would require accounting for water relaxation times, which may vary across our age range. Nevertheless, our dataset has been made publicly available for future researchers to calculate and compare absolute values.

      Del Tufo, S. N., Frost, S. J., Hoeft, F., Cutting, L. E., Molfese, P. J., Mason, G. F., Rothman, D. L., Fulbright, R. K., & Pugh, K. R. (2018). Neurochemistry Predicts Convergence of Written and Spoken Language: A Proton Magnetic Resonance Spectroscopy Study of Cross-Modal Language Integration. Frontiers in Psychology, 9, 1507. https://doi.org/10.3389/fpsyg.2018.01507

      Nandi, T., Puonti, O., Clarke, W. T., Nettekoven, C., Barron, H. C., Kolasinski, J., Hanayik, T., Hinson, E. L., Berrington, A., Bachtiar, V., Johnstone, A., Winkler, A. M., Thielscher, A., Johansen-Berg, H., & Stagg, C. J. (2022). tDCS induced GABA change is associated with the simulated electric field in M1, an effect mediated by grey matter volume in the MRS voxel. Brain Stimulation, 15(5), 1153–1162. https://doi.org/10.1016/j.brs.2022.07.049

      Pugh, K. R., Frost, S. J., Rothman, D. L., Hoeft, F., Del Tufo, S. N., Mason, G. F., Molfese, P. J., Mencl, W. E., Grigorenko, E. L., Landi, N., Preston, J. L., Jacobsen, L., Seidenberg, M. S., & Fulbright, R. K. (2014). Glutamate and choline levels predict individual differences in reading ability in emergent readers. Journal of Neuroscience, 34(11), 4082–4089. https://doi.org/10.1523/JNEUROSCI.3907-13.2014

      Smith, G. S., Oeltzschner, G., Gould, N. F., Leoutsakos, J. S., Nassery, N., Joo, J. H., Kraut, M. A., Edden, R. A. E., Barker, P. B., Wijtenburg, S. A., Rowland, L. M., & Workman, C. I. (2021). Neurotransmitters and Neurometabolites in Late-Life Depression: A Preliminary Magnetic Resonance Spectroscopy Study at 7T. Journal of Affective Disorders, 279, 417–425. https://doi.org/10.1016/j.jad.2020.10.011

      GABA is typically quantified using J-editing sequences at lower field strengths (~3T), and there is some evidence that the GABA signal can be reliably measured at 7T without editing. However, the authors should discuss potential limitations, such as the reliability of Glu and GABA measurements with short-TE semi-LASER at 7T.

      In addition, MRS measurements of GABA are known to be influenced by macromolecules, and GABA is often denoted as GABA+ to indicate that other compounds contribute to the measured signal, especially at a short TE and in the absence of symmetric spectral editing.

      A general discussion of the strengths and limitations of unedited Glu and GABA quantification at 7T is warranted given the interest of this work to researchers who may not be experts in MRS.

      While we agree with the Reviewer that at 3T, it is recommended to use J-edited MRS to measure GABA (Mullins et al., 2014), the better spectral resolution at 7T allows for more reliable results for both metabolites using moderate echo-time, non-edited MRS (Finkelman et al., 2022). In this study, we used a short echo time (TE), which is optimal for Glu but not ideal for GABA, as it interferes with other signals. We are grateful to the Reviewer for suggesting the addition of a short paragraph to the Discussion, describing the practicalities of 3T and 7T MRS and changing the abbreviation to GABA+ to inform readers of possible macromolecule contamination:

      “We chose ultra-high-field MRS to improve data quality (Özütemiz et al., 2023), as the increased sensitivity and spectral resolution at 7T allows for better separation of overlapping metabolites compared to lower field strengths. Additionally, 7T provides a higher signal-to-noise ratio (SNR), improving the reliability of metabolite measurements and enabling the detection of small changes in Glu and GABA concentrations. Despite these theoretical advantages, several practical obstacles should be considered, such as susceptibility artifacts and inhomogeneities at higher field strengths that can impact data quality. Interestingly, actual methodological comparisons (Pradhan et al., 2015; Terpstra et al., 2016) show only a slight practical advantage of 7T single-voxel MRS compared to optimized 3T acquisition. For example, fitting quality yielded reduced estimates of variance in concentration of Glu in 7T (CRLB) and slightly improved reproducibility levels for Glu and GABA (at both fields below 5%). Choosing the appropriate MRS sequence involves a trade-off between the accuracy of Glu and GABA measurements, as different sequences are recommended for each metabolite. J-edited MRS is recommended for measuring GABA, particularly with 3T scanners (Mullins et al., 2014). However, at 7T, more reliable results can be obtained using moderate echo-time, non-edited MRS (Finkelman et al., 2022). We have opted for a short-echo-time sequence, which is optimal for measuring Glu. However, this approach results in macromolecule contamination of the GABA signal (referred to as GABA+).”

      Finkelman, T., Furman-Haran, E., Paz, R., & Tal, A. (2022). Quantifying the excitatory-inhibitory balance: A comparison of SemiLASER and MEGA-SemiLASER for simultaneously measuring GABA and glutamate at 7T. NeuroImage, 247, 118810. https://doi.org/10.1016/j.neuroimage.2021.118810

      Mullins, P. G., McGonigle, D. J., O'Gorman, R. L., Puts, N. A., Vidyasagar, R., Evans, C. J., Cardiff Symposium on MRS of GABA, & Edden, R. A. (2014). Current practice in the use of MEGA-PRESS spectroscopy for the detection of GABA. NeuroImage, 86, 43–52. https://doi.org/10.1016/j.neuroimage.2012.12.004

      Özütemiz, C., White, M., Elvendahl, W., Eryaman, Y., Marjańska, M., Metzger, G. J., Patriat, R., Kulesa, J., Harel, N., Watanabe, Y., Grant, A., Genovese, G., & Cayci, Z. (2023). Use of a Commercial 7-T MRI Scanner for Clinical Brain Imaging: Indications, Protocols, Challenges, and Solutions-A Single-Center Experience. AJR. American Journal of Roentgenology, 221(6), 788–804. https://doi.org/10.2214/AJR.23.29342

      Pradhan, S., Bonekamp, S., Gillen, J. S., Rowland, L. M., Wijtenburg, S. A., Edden, R. A., & Barker, P. B. (2015). Comparison of single voxel brain MRS AT 3T and 7T using 32-channel head coils. Magnetic Resonance Imaging, 33(8), 1013–1018. https://doi.org/10.1016/j.mri.2015.06.003

      Terpstra, M., Cheong, I., Lyu, T., Deelchand, D. K., Emir, U. E., Bednařík, P., Eberly, L. E., & Öz, G. (2016). Test-retest reproducibility of neurochemical profiles with short-echo, single-voxel MR spectroscopy at 3T and 7T. Magnetic Resonance in Medicine, 76(4), 1083–1091. https://doi.org/10.1002/mrm.26022

      Further, the single MRS voxel location is a limitation of the study as neurochemistry can vary regionally within individuals, and the putative excitatory/inhibitory imbalance in dyslexia may appear in regions outside the left temporal cortex (e.g., network-wide or in frontal regions involved in top-down executive processes). While the functional localization of the MRS voxel is a novelty and a potential advantage, it is unclear whether voxel placement based on left-lateralized reading-related neural activity may bias the experiment to be more sensitive to small, activity-related fluctuations in neurotransmitters in the CON group vs. the DYS group who may have developed an altered, compensatory reading strategy.

      We agree that including only one region of interest for the MRS measurements is a potential limitation of our study, and we have now added this information to the Discussion:

      “Moreover, since the MRS data was collected only from the left STS, it is plausible that other areas might be associated with differences in Glu or GABA concentrations in dyslexia.”

      However, differences in Glu and GABA concentrations in this region were directly predicted by the neural noise hypothesis of dyslexia. We acknowledge that this information was missing in the previous version of the manuscript. It is now included in the Results:

      “Moreover, the neural noise hypothesis of dyslexia identifies perisylvian areas as being affected by increased glutamatergic signaling, and directly predicts associations between Glu and GABA levels in the superior temporal regions and phonological skills (Hancock et al., 2017).”

      as well as in the Discussion:

      “Nevertheless, the neural noise hypothesis predicted increased glutamatergic signaling in perisylvian regions, specifically in the left superior temporal cortex (Hancock et al., 2017).”

      Figure 1 contains a lot of information, and it may be helpful to split it into 2 figures (EEG vs. MRS) so that the plots could be made larger and the reader could more easily digest the information.

      (a) I would also recommend displaying separate metabolite fit plots for each group, since the current presentation in panel F makes it appear that the MRS data is examined by testing differences between groups across the full spectrum (where the lines diverge), which really isn't the case.

      (b) The GABA peak is not visible in the spectrum, and Glutamate and GABA both have multiple peaks that should be shown on the spectrum. This may be best achieved by displaying the individual metabolite sub-spectra below the full spectrum

      Thank you for these suggestions. We have split the information into two Figures following the Reviewer’s recommendations.

      It is not clear why the 3T structural images were used for segmentation and calculation of tissue fraction if 7T structural images were also acquired (which would presumably have higher resolution).

      Generally, T1-weighted images from the 7T scanner exhibit more artifacts than those from the 3T scanner due to higher magnetic field inhomogeneity. These artifacts are especially pronounced in regions near air-tissue interfaces, such as the temporal lobes. Therefore, we chose the 3T structural images for segmentation and tissue fraction calculations and clarified this in the Method section:

      “Voxel segmentation was performed on structural images from a 3T scanner, coregistered to 7T structural images in SPM12, as the latter exhibited excessive artifacts and intensity bias in the temporal regions”.

      The basis set includes a large number of metabolites (27), including many low-concentration metabolites/compounds (e.g., bHG, bHB, Citrate, Threonine, ethanol) that are typically only included in studies targeting specific metabolites in disease/pathology. Please justify the inclusion of this maximal set of metabolites in the basis set, given that the inclusion of overlapping low-concentration metabolites may influence metabolite measurements of interest (https://doi.org/10.1002/mrm.10246).

      There is still no consensus in the MR community on which metabolites should be included in the model of human cerebral 1H-MR spectra. Typically, only major contributors such as NAA, Cr, Cho, Lac, mI, and possibly Glx are evaluated. Some studies also include additional metabolites like Ace, Ala, Asp, GABA, Glc, Gly, sI, NAAG, and Tau. In this study, as in a few others, further metabolites such as PCh, GPC, PCr, GSH, PE, and Thr were introduced and this approach seems suitable for high-field spectra (Hofmann et al., 2002).

      Hofmann, L., Slotboom, J., Jung, B., Maloca, P., Boesch, C., & Kreis, R. (2002). Quantitative 1H-magnetic resonance spectroscopy of human brain: Influence of composition and parameterization of the basis set in linear combination model-fitting. Magnetic Resonance in Medicine, 48(3), 440–453. https://doi.org/10.1002/mrm.10246

      Please provide a figure indicating the localization of the MRS voxel for a sample subject.

      A figure indicating the localization of the MRS voxel for a sample subject was added to the MRS checklist.

      It would be helpful to include Table S1 in the main article.

      Table S1 from the Supplementary Material has now been added to the main manuscript as Table 1 in the Results section.

      Please report descriptive statistics for EEG and MRS measures in Table S1.

      We have added a new Table S1 in the Supplementary Material, providing descriptive statistics for EEG and MRS E/I balance measures, presented separately for the dyslexic and control groups.

      I recommend avoiding using the terms "direct" and "indirect" to contrast MRS and EEG measures of E/I balance. Both of these measures are imperfect and it is misleading to say that MRS is a "direct" measure of neurotransmitters. There is also ambiguity in what is meant by "direct": in contrast to EEG, MRS does not measure neural activity and does not provide high-resolution temporal information, so in a sense, it is less direct.

      Thank you for this suggestion. We have replaced the terms 'direct' and 'indirect' biomarkers with 'MRS' and 'EEG' biomarkers throughout the text.

      There are many cases throughout the results in which Bayes and frequentist stats seem to contradict each other in terms of significance and what should be included in the models, especially with regard to the interaction effects (the Bayes factors appear to favor non-significant interactions). I think this is worth considering and describing to offer more clarity for the readers.

      We agree that a discussion of the divergent results between Bayesian and frequentist models was missing in the previous version of the manuscript. To provide greater clarity for the readers, we have conducted follow-up Bayesian t-tests in every case where the results indicated the inclusion of non-significant interactions with the effect of group in the model. These additional analyses have been performed for the exponent, offset, as well as for beta bandwidth in the Supplementary Material. We have also added a paragraph addressing these discrepancies in the Discussion:

      “Remarkably, in some models, results from Bayesian and frequentist statistics yielded divergent conclusions regarding the inclusion of non-significant effects. This was observed in more complex ANOVA models, whereas no such discrepancies appeared in t-tests or correlations. Given reports of high variability in Bayesian ANOVA estimates across repeated runs of the same analysis (Pfister, 2021), these results should be interpreted with caution. Therefore, following the recommendation to simplify complex models into Bayesian t-tests for more reliable estimates (Pfister, 2021), we conducted follow-up Bayesian t-tests in every case that favored the inclusion of non-significant interactions with the group factor. These analyses provided further evidence for the lack of differences between the dyslexic and control groups. Another source of discrepancy between the two methods may stem from the inclusion of interactions between covariates and within-subject effects in frequentist ANOVA, which were not included in Bayesian ANOVA to adhere to the recommendation for simpler Bayesian models (Pfister, 2021).”

      Pfister, R. (2021). Variability of Bayes factor estimates in Bayesian analysis of variance. The Quantitative Methods for Psychology, 17(1), 40-45. doi:10.20982/tqmp.17.1.p040

      It would be helpful to indicate whether participants in the DYS group had a history of reading intervention/remediation. In addition to showing that the DYS group performed lower than the CON group on reading assessments as a whole and given their age, was the performance on the reading assessments at an individual level considered for inclusion in the study? (i.e., were participants' persistent poor reading abilities confirmed with the research assessments?)

      We were unable to assess individual reading skills due to the lack of standardized diagnostic norms for adult dyslexia in Poland. Therefore, participants in the dyslexic group were recruited based on a previous clinical diagnosis of dyslexia, and reading and reading-related tasks were used for group-level comparisons only. This information has been added to the Methods section:

      “Since there are no standardized diagnostic norms for dyslexia in adults in Poland, individuals were assigned to the dyslexic group based on a past diagnosis of dyslexia.”

      Unfortunately, we did not collect information about participants' history of reading intervention or remediation. In this context, we acknowledge that including a sample of adult participants is a potential limitation of our study; however, this was already mentioned in the Discussion.

      Regarding the fMRI task, please indicate whether the participants whose threshold and/or contrast was changed for localization were from the DYS or CON group.

      This information is now added to the Method section:

      “For 6 participants (DYS n = 2, CON n = 4), the threshold was lowered to p < .05 uncorrected, while for another 6 participants (DYS n = 3, CON n = 3) the contrast from the auditory run was changed to auditory words versus fixation cross due to a lack of activation for other contrasts.”

      Reviewer #2 (Public Review):

      Summary:

      This study utilized two complementary techniques (EEG and 7T MRI/MRS) to directly test a theory of dyslexia: the neural noise hypothesis. The authors report finding no evidence to support an excitatory/inhibitory balance, as quantified by beta in EEG and Glutamate/GABA ratio in MRS. This is important work and speaks to one potential mechanism by which increased neural noise may occur in dyslexia.

      Strengths:

      This is a well-conceived study with in-depth analyses and publicly available data for independent review. The authors provide transparency with their statistics and display the raw data points along with the averages in figures for review and interpretation. The data suggest that an E/I balance issue may not underlie deficits in dyslexia and is a meaningful and needed test of a possible mechanism for increased neural noise.

      Weaknesses:

      The researchers did not include a visual print task in the EEG task, which limits analysis of reading-specific regions such as the visual word form area, which is a commonly hypoactivated region in dyslexia. This region is a common one of interest in dyslexia, yet the researchers measured the I/E balance in only one region of interest, specific to the language network.

      We agree with the Reviewer that including different tasks for the EEG biomarkers assessment would be valuable. However, this limitation was already addressed in the Discussion:

      “Importantly, our study focused on adolescents and young adults, and the EEG recordings were conducted during rest and a spoken language task. These factors may limit the generalizability of our results. Future research should include younger populations and incorporate a broader array of tasks, such as reading and phonological processing, to provide a more comprehensive evaluation of the E/I balance hypothesis.”

      Further, this work does not consider prior studies reporting neural inconsistency; a potential consequence of increased neural noise, which has been reported in several studies and linked with candidate-dyslexia gene variants (e.g., Centanni et al., 2018, 2022; Hornickel & Kraus, 2013; Neef et al., 2017). While E/I imbalance may not be a cause of increased neural noise, other potential mechanisms remain and should be discussed.

      Thank you for referring us to other works reporting neural variability in dyslexia. We agree that a broader context regarding sources of reduced neural synchronization, beyond E/I imbalance, was missing in the previous version of the manuscript. We have now included these references in the Discussion:

      “Furthermore, although our results do not support the idea of E/I balance alterations as a source of neural noise in dyslexia, they do not preclude other mechanisms leading to less synchronous neural firing posited by the hypothesis. In this context, there is evidence showing increased trial-to-trial inconsistency of neural responses in individuals with dyslexia (Centanni et al., 2022) or poor readers (Hornickel and Kraus, 2013) and its associations with specific dyslexia risk genes (Centanni et al., 2018; Neef et al., 2017). At the same time, the observed trial-to-trial inconsistency was either present only in a subset of participants (Centanni et al., 2018), limited to some experimental conditions (Centanni et al., 2022), or specific brain regions – e.g., brainstem in Hornickel and Kraus (2013), left auditory cortex in Centanni et al. (2018), or left supramarginal gyrus in Centanni et al. (2022).”

      A better description of the exponent and offset components is needed at the beginning of the results, given that the methods are presented in detail at the end. I also do not see a clear description of these components in the methods.

      A description of the aperiodic components is now included in the Results:

      “In the initial step of the analysis, we analyzed the aperiodic (exponent and offset) components of the EEG spectrum. The exponent reflects the steepness of the EEG power spectrum, with a higher exponent indicating a steeper signal; while the offset represents a uniform shift in power across frequencies, with a higher offset indicating greater power across the entire EEG spectrum (Donoghue et al., 2020).”

      as well as in the Materials and Methods:

      “Two broadband aperiodic parameters were extracted: the exponent, which quantifies the steepness of the EEG power spectrum, and the offset, which indicates signal’s power across the entire frequency spectrum.”
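      To make the two aperiodic parameters concrete, the sketch below fits the no-knee aperiodic model, log10(power) = offset - exponent * log10(frequency), to a synthetic spectrum by ordinary least squares. It is a minimal illustration only, not the authors' pipeline: the function name `fit_aperiodic`, the 1-43 Hz range, and the synthetic values are assumptions, and a full spectral parameterization would also model the periodic peaks.

```python
import math


def fit_aperiodic(freqs, powers):
    """Least-squares fit of log10(power) = offset - exponent * log10(freq).

    Returns (exponent, offset). A larger exponent means a steeper spectrum;
    a larger offset means more power across all frequencies.
    """
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in powers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    exponent = -slope                   # the spectrum falls with frequency
    offset = mean_y - slope * mean_x    # log10 power at 1 Hz
    return exponent, offset


# Synthetic, purely aperiodic spectrum (hypothetical values for illustration).
freqs = [float(f) for f in range(1, 44)]
true_exponent, true_offset = 1.5, 2.0
powers = [10 ** true_offset / f ** true_exponent for f in freqs]

exponent, offset = fit_aperiodic(freqs, powers)
print(f"exponent={exponent:.3f}, offset={offset:.3f}")
```

      On real EEG data the fit would be run on the measured power spectrum, and oscillatory peaks would bias a plain regression like this one, which is why dedicated tools fit both components together.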

      Reviewer #3 (Public Review):

      Summary:

      This study by Glica and colleagues utilized EEG (i.e., Beta power, Gamma power, and aperiodic activity) and 7T MRS (i.e., MRS IE ratio, IE balance) to reevaluate the neural noise hypothesis in Dyslexia. Supported by Bayesian statistics, their results show solid 'no evidence' of EI balance differences between groups, challenging the neural noise hypothesis. The work will be of broad interest to neuroscientists, and educational and clinical psychologists.

      Strengths:

      Combining EEG and 7T MRS, this study utilized both the indirect (i.e., Beta power, Gamma power, and aperiodic activity) and direct (i.e., MRS IE ratio, IE balance) measures to reevaluate the neural noise hypothesis in Dyslexia.

      Weaknesses:

      The authors may need to provide more data to assess the quality of the MRS data.

      We have addressed the following specific recommendations of the Reviewer providing more data about the quality of the MRS data.

      The authors may need to explain how the number of subjects is determined in the MRS section.

      We have clarified the MRS sample description in the Results section:

      “Due to financial and logistical constraints, 59 out of the 120 recruited subjects, selected progressively as the study unfolded, were examined with MRS. Subjects were matched by age and sex between the dyslexic and control groups. Due to technical issues and to prevent delays and discomfort for the participants, we collected 54 complete sessions. Additionally, four datasets were excluded based on our quality control criteria, and three GABA+ estimates exceeded the selected CRLB threshold. Ultimately, we report 50 estimates for Glu (21 participants with dyslexia) and 47 for GABA+ and Glu/GABA+ ratios (20 participants with dyslexia).”

      Is there a reason why theta and gamma peaks were not observed in the majority of participants? What are the possible reasons that likely caused the discrepancy between this study and previously reported relevant studies?

      We have now added a discussion about the absence of oscillatory peaks in the theta and gamma bands to the Discussion section:

      “We could not perform analyses for the gamma oscillations since in the majority of participants the gamma peak was not detected above the aperiodic component. Due to the 1/f properties of the EEG spectrum, both aperiodic and periodic components should be disentangled to analyze ‘true’ gamma oscillations; however, this approach is not typically recognized in electrophysiology research (Hudson and Jones, 2022). Indeed, previous studies that analyzed gamma activity in dyslexia (Babiloni et al., 2012; Lasnick et al., 2023; Rufener and Zaehle, 2021) did not separate the background aperiodic activity. For the same reason, we could not analyze results for the theta band, which often does not meet the criteria for an oscillatory component manifested as a peak in the power spectrum (Klimesch, 1999). Moreover, results from a study investigating developmental changes in both periodic and aperiodic components suggest that theta oscillations in older participants are mostly observed in frontal midline electrodes (Cellier et al., 2021), which were not analyzed in the current study.”

      Hudson, M. R., & Jones, N. C. (2022). Deciphering the code: Identifying true gamma neural oscillations. Experimental Neurology, 357, 114205. https://doi.org/10.1016/j.expneurol.2022.114205

      Klimesch, W. (1999). EEG alpha and theta oscillations reflect cognitive and memory performance: A review and analysis. Brain Research Reviews, 29(2-3), 169-195. https://doi.org/10.1016/S0165-0173(98)00056-3

      Based on Figure 1F, the quality of the MRS data may be contaminated by the lipid signal, especially for the DYS group. To better evaluate the MRS data, especially the GABA measurements, the authors need to show:

      (a) the placement of the MRS voxel on the anatomical images;

      Averaged MRS voxel placement was already presented in Figure 1 (now Figure 2) in the manuscript. Now, we have also added exemplary single-subject images to the MRS checklist in the Supplement.

      (b) Glu and GABA model functions

      We have now provided more meaningful Glu and GABA indications in Figure 2.

      (c) CRLB for GABA

      We have added respective estimates to the Supplement:

      %CRLB of Glu: mean 2.96, SD = 0.79

      %CRLB of GABA: mean 10.59, SD = 2.76

      %CRLB of NAA: mean 1.76, SD = 0.46

      Further, the authors added voxel's gray matter volume as a covariate when performing separate ANCOVAs. The authors may need to use alpha correction or 1-fCSF correction to corroborate these results.

      We chose to use the ratio of Glu and GABA to total creatine (tCr), as this remains a common practice in MRS studies at 7T (e.g., Nandi et al., 2022; Smith et al., 2021). This decision was also influenced by previous dyslexia studies (Del Tufo et al., 2018; Pugh et al., 2014) and is now clarified in the Results and Methods sections.

      Regarding alpha correction, a recent paper (García-Pérez et al., 2023) recommends: 'In general, avoid corrections for multiple testing if statistical claims are to be made for each individual test, in the absence of an omnibus null hypothesis.' Since we report null findings, further alpha correction would not significantly impact the results.

      García-Pérez, M. A. (2023). Use and misuse of corrections for multiple testing. Methods in Psychology, 8, 100120. https://doi.org/10.1016/j.metip.2023.100120

    1. eLife Assessment

      This is a useful analysis of STORM data that characterizes the clustering of active zones in retinogeniculate terminals across ages and in the absence of retinal waves. The design makes it possible to relate fixed time point structural data to a known outcome of activity-dependent remodeling. However, the evidence is incomplete, weakening the claims the authors make regarding how activity influences the clustering of these synapses.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript addresses the question of whether spontaneous activity contributes to the clustering of retinogeniculate synapses before eye opening. The authors re-analyze a previously published dataset to answer the question. The authors conclude that synaptic clustering is eye-specific and activity-dependent during the first postnatal week. While there is useful information in this manuscript, I don't see how the data meaningfully supports the claims made about clustering.

      In adult retinogeniculate connections, functional specificity is supported by select pairings of retinal ganglion cells and thalamocortical cells forming dozens of synaptic connections in subcellular microcircuits called glomeruli. In this manuscript, the authors measure whether the frequency of nearby synapses is higher in the observed data than in a model where synapses are randomly distributed throughout the volume. Any real anatomical data will deviate from such a model. The interesting biological question is not whether a developmental state deviates from random. The interesting question is how much of the adult clustering occurs before eye opening. In trying to decode the analysis in this manuscript, I can't tell if the answer is 99% or 0.001%.

      Strengths:

      The source dataset is high resolution data showing the colocalization of multiple synaptic proteins across development. Added to this data is labeling that distinguishes axons from the right eye from axons from the left eye. The first order analysis of this data showing changes in synapse density and in the occurrence of multi-active zone synapses is useful information about the development of an important model system.

      Weaknesses:

      I don't think the analysis of clustering within this dataset improves our understanding of how the system works. It is possible that the result is clear to the authors based on looking at the images. As a reader trying to interpret the analysis, I ran into the following problems:

      • It is not possible to estimate biologically meaningful effect sizes from the data provided. Spontaneous activity in the postnatal week could be responsible for 99% or 0.001% of RGC synapse clustering.

      • There is no clear biological interpretation of the core measure of the publication, the normalized clustering index. The normalized clustering index starts with counting the fraction of single active zone synapses within various distances to the edge of synapses. This frequency is compared to a randomization model in which the positions of synapses are randomized throughout a volume. The authors found the biggest deviation between the observed and randomized proximity frequency using a distance threshold of 1.5 µm. They consider the deviation from the random model to be a sign of clustering. However, two RGC synapses 1.5 µm apart have a good chance of coming from the same RGC axon. At this scale, real observations will, therefore, always look more clustered than a model where synapses are randomly placed in a volume. If you randomly place synapses on an axon, they will be much closer together than if you randomly place synapses within a volume. The authors normalize their clustering measure by dividing by the frequency of clustering in the normalized model. That makes the measure of clustering an ambiguous mix of synapse clustering, axon morphology, and synaptic density.

      • Other measures are also very derived. For instance, one argument is based on determining that the cumulative distribution of the distance of dominant-eye multi-active zone synapses with nearby single-active zone synapses from dominant-eye multi-active zone synapses is statistically different from the cumulative distribution of the distance of dominant-eye multi-active zones without nearby single-active zone synapses from dominant-eye multi-active zones. Multiple permutations of this measure are compared.

      • The sample size is too small for the kinds of comparisons being made. The authors point out that many STORM studies use an n of 1 while the authors have n = 3 for each of their six experimental groups. However, the critical bit is what kinds of questions you are trying to answer with a given sample size. This study depends on determining whether the differences between groups are due to age, genotype, or individual variation. This study also makes multiple comparisons of many different noisy parameters that test the same or similar hypothesis. In this context, it is unlikely that n = 3 sufficiently controls for individual variation.

      • There are major biological differences between groups that are difficult to control for. Between P2, P4, and P8, there are changes in cell morphology and synaptic density. There are also large differences in synapse density between wild type and KO mice. It is difficult to be confident that these differences are not responsible for the relatively subtle changes in clustering indices.

      • Many claims are based on complicated comparisons between groups rather than the predominating effects within the data. It is noted that: "In KO mice, dominant eye projections showed increased clustering around mAZ synapses compared to sAC synapses suggesting partial maintenance of synaptic clustering despite retinal wave defects". In contrast, I did not notice any discussion of the fact that the most striking trend in those measures is that the clustering index decreases from P2 to P8.

      • Statistics are improperly applied. In my first review, I tried to push the authors to calculate confidence intervals for two reasons. First, I believed the reader should be able to answer questions such as whether 99% or 0.01% of RGC synaptic clustering occurred in the first postnatal week. Second, I wanted the authors to deal with the fact that n = 3 is underpowered for many of the questions they were asking. While many confidence intervals can now be found leading up to a claim, it is difficult to find claims that are directly supported by the correct confidence interval. Many claims are still incorrectly based on which combinations of comparisons produced statistically significant differences and which combinations did not.
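      The concern about the randomization baseline can be illustrated with a toy simulation. The sketch below is not the authors' analysis and does not reproduce their normalized clustering index. It only compares the fraction of points that have a neighbor within 1.5 µm when the same number of points is scattered uniformly in a volume versus placed at random along a few straight "axons". The cube size, point counts, straight-line axon geometry, and all function names are assumptions chosen for illustration.

```python
import math
import random


def fraction_with_neighbor(points, radius):
    """Fraction of points that have at least one other point within `radius`."""
    hits = 0
    for i, p in enumerate(points):
        if any(i != j and math.dist(p, q) <= radius for j, q in enumerate(points)):
            hits += 1
    return hits / len(points)


def random_in_volume(n, side, rng):
    """Place n points uniformly at random inside a cube of edge length `side`."""
    return [
        (rng.uniform(0, side), rng.uniform(0, side), rng.uniform(0, side))
        for _ in range(n)
    ]


def random_on_axons(n_axons, per_axon, side, rng):
    """Place points at random positions along straight lines parallel to z.

    Each line is a hypothetical stand-in for one axon crossing the cube.
    """
    points = []
    for _ in range(n_axons):
        x, y = rng.uniform(0, side), rng.uniform(0, side)
        points.extend((x, y, rng.uniform(0, side)) for _ in range(per_axon))
    return points


rng = random.Random(0)   # fixed seed so the run is repeatable
SIDE_UM = 20.0           # hypothetical cube edge length, in micrometers
RADIUS_UM = 1.5          # distance threshold discussed in the review

volume_points = random_in_volume(100, SIDE_UM, rng)
axon_points = random_on_axons(5, 20, SIDE_UM, rng)

frac_volume = fraction_with_neighbor(volume_points, RADIUS_UM)
frac_axon = fraction_with_neighbor(axon_points, RADIUS_UM)
print(f"volume-random: {frac_volume:.2f}, axon-constrained: {frac_axon:.2f}")
```

      In this toy setting the axon-constrained points show a higher neighbor fraction even though nothing is actively clustering them, which is the confound the reviewer describes. A baseline that respects axon geometry would be needed to separate the two effects.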

    3. Reviewer #2 (Public review):

      Summary:

      This study provides a valuable data set showing changes in the spatial organization of synaptic proteins at the retinogeniculate connection during a developmental period of active axonal and synaptic remodeling. The data collected by STORM microscopy is state-of-the-art in terms of the high-resolution view of the presynaptic components of a plastic synapse. The revision has addressed many, but not all, of the initial concerns about the authors' interpretation of their data. However, with the revisions, the manuscript has become very dense and difficult to follow.

      Strengths:

      The data presented is of good quality and provides an unprecedented view at high resolution of the presynaptic components of the retinogeniculate synapse during active developmental remodeling. This approach offers an advance to the previous mouse EM studies of this synapse because the CTB label allows identification of the eye from which the presynaptic terminal arises.

      Weaknesses:

      From these data, the authors conclude that an eye-specific increase in mAZ synapse density occurs over retinogeniculate refinement, that sAZ synapses cluster close to mAZ synapses over age, and that this process depends on spontaneous activity and proximity to eye-specific mAZ synapses. While the interpretation of this data set is much more grounded in this revised submission, some of the authors' conclusions/statements still lack convincing supporting evidence. This includes:

      (1) The conclusion that multi-active zone synapses are loci for synaptic clustering. This statement, and similar ones (e.g., line 407), suggest that mAZ synapses actively or through some indirect way influence the clustering of sAZ synapses. There is no evidence for this. Clustering of retinal synapses is in part due to the fact that retinal inputs synapse on the proximal dendrites. With increased synaptogenesis, there will be increased density of retinal terminals that are closely localized. And with development, perhaps sAZ synapses mature into mAZ synapses. This scenario could also explain a large part of this data set.

      (2) The conclusion that "clustering depends on spontaneous retinal activity" could be misleading to the reader given that the authors acknowledge (in the rebuttal) that their data is most consistent with a failure of synaptogenesis in the mutant mice. Additionally, clustering does occur in CTB+ projections around mAZ synapses.

      (3) Line 403: "Since mAZ synapses are expected to have a higher release probability, they likely play an important role in driving plasticity mechanisms reliant on neurotransmission." What evidence do the authors have that mAZ synapses are expected to have a higher release probability?

    4. Reviewer #3 (Public review):

      This study is a follow-up to a recent study of synaptic development based on a powerful data set that combines anterograde labeling, immunofluorescence labeling of synaptic proteins, and STORM imaging (Cell Reports, 2023). Specifically, they use anti-Vglut2 label to determine the size of the presynaptic structure (which they describe as the vesicle pool size), anti-Bassoon to label active zones with the resolution to count them, and anti-Homer to identify postsynaptic densities. Their previous study compared the detailed synaptic structure across the development of synapses made with contra-projecting vs. ipsi-projecting RGCs and compared this developmental profile with a mouse model with reduced retinal waves. In this study, they produce a new detailed analysis on the same data set in which they classify synapses into "multi-active zone" vs. "single-active zone" synapses and assess the number and spacing of these synapses. The authors use measurements to make conclusions about the role of retinal waves in the generation of same-eye synaptic clusters, providing key insight into how neural activity drives synapse maturation.

      Strengths:

      This is a fantastic data set for describing the structural details of synapse development in a part of the brain undergoing activity-dependent synaptic rearrangements. The fact that they can differentiate eye of origin is what makes this data set unique over previous structural work. The addition of example images from an EM data set provides confidence in their categorization scheme.

      Weaknesses:

      Though the descriptions of synaptic clusters are important and represent a significant advance, the authors' conclusions regarding the biological processes driving these clusters are not testable by such a small sample. This limitation is expected given the massive effort that goes into generating this data set. Of course, the authors are free to speculate, but many of the conclusions of the paper are not statistically supported.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      This publication applies 3D super-resolution STORM imaging to understanding the role of developmental neural activity in the clustering of retinal inputs to the mouse dorsal lateral geniculate nucleus (dLGN). The authors argue that retinal ganglion cell (RGC) synaptic boutons start forming clusters early in postnatal development (P2). They then argue that these clusters contribute to eye-specific segregation of retinal inputs by activity-dependent stabilization of nearby boutons from the same eye. The data provided is N=3 animals for each condition of P2, P4, and P8 animals in wild-type mice and in mice where early patterns of structured retinal activity are blocked.

      Strengths:

      The 3D STORM imaging of pre- and postsynaptic elements provides convincing high-resolution localization of synapses.

      The experimental design of comparing ipsilateral and contralateral RGC axon boutons in a region of the dLGN that is known to become contralateral is elegant. The design makes it possible to relate fixed time point structural data to a known outcome of activity-dependent remodeling.

      Weaknesses:

      Based on previous literature, it is known that synapse density, synapse clustering, and synaptic specificity increase during postnatal development. Previous work has also shown that both the changes in synaptic clustering and synaptic specificity are affected by retinal activity. The data and analysis provided by the authors add little unambiguous evidence that advances this understanding.

      We agree with the reviewer that previous literature shows that synapse density, synapse clustering, and synaptic specificity increase during postnatal development and that these processes are affected by retinal activity. The majority of studies on synaptic refinement have been performed after eye-opening, when eye-specific segregation is already complete. In contrast, most studies of eye-specific segregation focus on axonal refinement phenotypes. To our knowledge, only a small number of experiments have examined retinogeniculate synaptic properties at the nanoscale during eye-specific segregation (1-4). Our broad goal is to understand the mechanisms of synaptogenesis and competition at the earliest stages of eye-specific refinement, when spontaneous retinal activity is a major driver of activity-dependent remodeling. We hope that readers will appreciate that there is still much to discover in this fascinating model system of synaptic competition.

      General problem 1: Most of the statistical analysis is limited to ANOVA comparison of axons from the contralateral and ipsilateral retina in the contralateral dLGN. The hypothesis that ipsilateral and contralateral axons would be statistically identical in the contralateral dLGN is not a plausible hypothesis so rejecting the hypothesis with P < X does not advance the authors' arguments beyond what was already known.

      General problem 2: Most of the interpretation of data is qualitative. While error bars are provided, these error bars are not used to draw conclusions. Given the small sample size (N=3), there is a large degree of uncertainty regarding the magnitude of changes (synapse size, number, specificity). The authors base their conclusions on the averages of these values when the likely degree of uncertainty could allow for the opposite interpretation.

      We appreciate the reviewer’s concerns regarding the use of ANOVA for statistical testing in the original submission. We have generated new figures that show confidence intervals for each analysis in the manuscript and these are included in the response to reviewers document below. To address the underlying concern that our N=3 sample size limits the interpretation of our results, we have revised the manuscript to be cautious in our interpretations and to discuss additional possibilities that are consistent with the anatomical data.

      General problem 3: Two of the four results sections depend on using the frequency of single active zone vGlut2 clusters near multiple active zone vGlut2 clusters as a proxy for synaptic stabilization of the single active zone vGlut2 clusters by the multiple active zone vGlut2 clusters. The authors argue that the increased frequency of same-eye single active zone clusters relative to opposite-eye single active zone clusters means that multiple active zone vGlut2 clusters are selectively stabilizing single active zone clusters. There are other plausible explanations for this observation that are not eliminated. An increased frequency of nearby single active zone clusters would also occur if RGC axons form more than one synapse in the dLGN. Eye-specific segregation is, by definition, a relative increase in the frequency of nearby boutons from the same eye. The authors were, therefore, guaranteed to observe a non-random relationship between boutons from the same eye. The authors do compare their measures to a random model, but I could not find a description of the model. I would expect that the model would need to account for RGC arbor size, arbor structure, bouton number, and segregation independent of multi-active-zone vGlut2 clusters. The most common randomization for the type of analysis described here, a shift in the positions of single-active zone boutons, would not be adequate.

      In discussing the claimed cluster-induced stabilization of nearby boutons, the authors state that the specificity increases with age due to activity-dependent refinement. Their quantification does not support an increase in specificity with age. In fact, the high degree of clustering "specificity" they observe at P2 argues for the trivial same axon explanation.

      We agree with the reviewer that individual RGC axons form multiple synapses and that, over time, eye-specific segregation must increase the frequency of like-eye synapses relative to opposite-eye synapses. Indeed, our previous study of eye-specific refinement showed that at P8, the density of eye-specific inputs had increased for the dominant-eye and decreased for the non-dominant-eye (1). However, at postnatal day 4, contralateral and ipsilateral input densities were the same in the future contralateral-eye territory. One of our goals in this study was to determine if the process of synaptic clustering begins at these earliest stages of synaptic competition and, if so, whether it is influenced by retinal wave activity. It is plausible that the RGC axons from the same eye could initially form synapses randomly and, at some later stage, synapses may be selectively added to produce mature glomeruli. Consistent with this possibility, previous analysis of JAM-B RGC axon refinement showed the progressive clustering of axonal boutons at later stages of development after eye-specific segregation (5).

      Regarding the randomization that we employed, we performed a repositioning of synapse centroids within the volume of the neuropil after accounting for neuronal soma volumes and edge effects. We agree that this type of randomization cannot account for the fine scale structure of axons and dendrites, which we did not have access to in this four-color volumetric super-resolution data set. To address this, we have performed additional clustering analyses surrounding both single-active zone and multi-active zone synapses. This new analysis showed that there is a modest clustering effect around single-active zone synapses compared to complete randomization described above. We now present this information using a normalized clustering index for direct comparison of clustering between multi-active zone and single-active zone synapses. We have measured effect sizes and confidence intervals, which we present in point-by-point responses below. We have restructured the manuscript figures and discussion to provide a balanced interpretation of our results and the limitations of our study.

      Analysis of specific claims:

      Result Section 1

      Most of the figures show mean, error bars, and asterisks, but not the three data points from which these statistics are derived. Large changes in variance from condition to condition suggest that displaying the data points would provide more useful information.

      We thank the reviewer for their suggestion. We have updated all figures to display the means of all biological replicates as individual data points.

      Claim 1: Contralateral density increases more than ipsilateral in the contralateral region over the course of development. This claim is supported by the qualitative comparison of means and error bars in Figure 2D. The argument could be made quantitative by providing a confidence interval for synapse density increase for dominant and non-dominant synapse density. A confidence interval could then be generated for the difference in this change between the two groups. Currently, the most striking effect is a big difference in variance between P4 and P8 for dominant eye complex synapses. Given that N=3, I assume there is one extreme outlier here.

      We appreciate the comment and believe the reviewer was referring to the data presented in the original Figure 1D, rather than Figure 2D.

      We agree with the reviewer that our comment on the change in synapse density across ages was not quantitatively supported by the figure as we did not perform a proper age-wise statistical comparison. We have removed this claim in the revised manuscript.

      We also appreciate the suggestions to clarify the presentation of our statistical analyses and to utilize confidence interval measurements wherever possible. We present Author response image 1 below, showing the density of multi-AZ synapses in the contralateral-eye territory over time (P2-P8), for both CTB(+) contralateral (black) and CTB(-) ipsilateral inputs (red) featuring 5/95% confidence intervals:

      Author response image 1.

      More broadly, the reviewer has raised the concern that the low number of biological replicates (N=3) presents challenges in the use of ANOVA for statistical testing. We agree with the concern and have revised the manuscript to be cautious in our statistical tests and resulting claims. We have chosen to use paired T-tests to compare measurements of eye-specific synapse properties because these measurements were always made within each individual biological replicate (paired measurements). Below, we discuss our logic for this change and the effects on the results we present in the revised manuscript.

      Considering the above image:

      (1) ANOVA: In our initial submission, we used an ANOVA test which showed P<0.05 for the CTB(+) P4 vs. P8 comparison above, leading to our statement about an age-dependent increase in multi-AZ density. However, the figure above shows that P8 data has higher variance. Thus, the homogeneity of variance assumption of ANOVA may lead to false positives in this comparison.

      (2) Confidence interval for N=3: We calculated confidence intervals for P4 and P8 data (5/95% CI shown above). Overlap between the two groups indicates the true mean values of the two groups could be identical. However, the P8 confidence intervals (as well as other confidence intervals across other comparisons in the manuscript) also include the value of 0. This indicates there actually might be no multi-active zone synapses in the mouse dLGN. The failure arises because the low number of biological replicates (N=3 data points) precludes a reliable confidence interval measurement. CI measurements require sufficient sample sizes to determine the true population variance.

      (3) Difficulty in achieving sufficient sample sizes for CI analysis in ultrastructural studies of the brain: volumetric STORM experiments are technically complex and make use of sample preparation and analysis methods that are similar to volumetric electron microscopy (physical ultrathin sectioning and computational 3D stack alignment). For these technical reasons, it is difficult to collect imaging data from >10 mice for each group of data (e.g. age and tissue location) in one single project. Because of the technical challenges, most ultrastructural studies published to date present results from single biological replicates. In our STORM dataset, we collected imaging data of N=3 biological replicates for each age and genotype. We agree that in the future the collection of additional replicates will be important for improving the reliability of statistical comparisons in super-resolution and electron-microscopy studies. Continued advances in the throughput of imaging/analysis should help to make this easier over time. 

      (4) The use of paired T-tests: In this study, we have eye-specific CTB(+) and CTB(-) synapse imaging data from the same STORM fields within single biological replicates. When there is only one measurement from each replicate (e.g. synapse density, ratio of total synapses), using paired tests to compare these groups increases statistical power and does not assume similar variance. However, this limits our analysis to comparisons within each age, and not between ages. Accordingly, we have revised our discussion of the results and interpretations throughout the manuscript. When there are thousands of measurements of synapses from each replicate (e.g. Figure 2A-B on synapse volumes), we use a mixed linear model to analyze the variance. In the revised figures we present the results using standard error of the mean and link measurements from within the same individual replicates to show the paired data structure. In cases where specific comparisons are made across ages, we present 5/95% confidence interval measurements.
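      Points (2) and (4) above can be illustrated with a minimal sketch. It is not the authors' analysis code: the replicate values are hypothetical, the function names are invented for illustration, and it assumes that "5/95%" refers to the 5th and 95th percentiles of a Student's t interval around the mean. With N=3 there are 2 degrees of freedom, for which the t distribution has a closed-form CDF and quantile, so no external library is needed.

```python
import math
import statistics


def t_cdf_df2(t):
    """CDF of Student's t with 2 degrees of freedom (closed form)."""
    return 0.5 + t / (2.0 * math.sqrt(2.0 + t * t))


def t_quantile_df2(p):
    """Quantile of Student's t with 2 degrees of freedom (closed form)."""
    return (2.0 * p - 1.0) / math.sqrt(2.0 * p * (1.0 - p))


def ci_three_replicates(values, lower=0.05, upper=0.95):
    """t-based interval for the mean of exactly three replicates.

    Assumption: the interval limits are the `lower` and `upper`
    percentiles of the t distribution scaled by the standard error.
    """
    if len(values) != 3:
        raise ValueError("the df=2 closed forms require exactly 3 values")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(3)
    return (mean + t_quantile_df2(lower) * sem,
            mean + t_quantile_df2(upper) * sem)


def paired_t_three_replicates(a, b):
    """Two-sided paired t-test on three matched replicates.

    Each replicate contributes one difference a[i] - b[i], so no
    equal-variance assumption between the two groups is needed.
    """
    if len(a) != 3 or len(b) != 3:
        raise ValueError("the df=2 closed forms require exactly 3 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sem = statistics.stdev(diffs) / math.sqrt(3)
    if sem == 0:
        raise ValueError("all paired differences are identical")
    t_stat = statistics.mean(diffs) / sem
    p_value = 2.0 * (1.0 - t_cdf_df2(abs(t_stat)))
    return t_stat, p_value


# Hypothetical per-animal densities (arbitrary units), NOT data from the study.
ctb_pos = [0.002, 0.004, 0.012]
ctb_neg = [0.001, 0.002, 0.004]

ci_low, ci_high = ci_three_replicates(ctb_pos)
t_stat, p_value = paired_t_three_replicates(ctb_pos, ctb_neg)
```

      With one high-variance replicate, as in the hypothetical `ctb_pos` values, the lower limit of the interval falls below zero even though every measured density is positive, which is the failure mode described in point (2). The paired test uses only the within-animal differences, which is why it is restricted to comparisons within an age.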

      Claim 2: The fraction of multiple-active zone vGlut2 clusters increases with age. This claim is weakly supported by a qualitative reading of panel 1E. The error bars overlap so it is difficult to know what the range of possible increases could be. In the text, the authors report mean differences without confidence intervals (or any other statistics). The reported results should, therefore, be interpreted as a description of their three mice and not as evidence about mice in general.

      We appreciate the reviewer’s concern that statistical accuracy of our synapse density comparisons over age is limited by the small sample size as discussed above. We have removed all strong claims about age-dependent changes in the density of multi-active zone and single-active zone synapses. Instead, we focus our analyses on comparisons between CTB(+) and CTB(-) synapse measurements, which are paired within each biological replicate. To specifically address the reviewer’s concern about figure panel 1E, we present Author response image 2 with confidence intervals below.

      Author response image 2.

      Figure S1. Panel A makes the point that the study could not be done without STORM by comparing the STORM images to "Conventional" images. The images are over-saturated low-resolution images. A reasonable comparison would be to a high-quality confocal image acquired with a high NA objective (~1.4) and low laser power (PSF ~ 0.2 x 0.2 x 0.6 um) that was acquired over the same amount of time it takes to acquire a STORM volume.

      We agree with the reviewer that the presentation of low-resolution conventional images is not necessary. We have deleted the panel and modified the text accordingly.

      Result section 2.

      Claim 1: The ipsi/contra (in contra LGN) difference in VGluT2 cluster volume increases with development. While there are many p-values listed, the main point is not directly quantified. A reasonable way to quantify the relative increase in volume could be in the form: the non-dominant volumes were 75%-95%(?) of the dominant volume at P2 and 60%-80% (?) at P8. The difference in change was -5 to 15%(?).

      We thank the reviewer for their helpful suggestion to improve the clarity of the results presented in this analysis of eye-specific synapse volumes. In our original report, we found differences in eye-specific VGluT2 volume at each time point (P2/P4/P8) in control mice (1). The original measurements used the entire synapse population. Here, we aimed to determine whether eye-specific differences in VGluT2 volumes were present for both multi-AZ synapses and single-AZ synapses, and whether one population may have a greater contribution to the previous population measurement that we reported. We found that at P4 (a time when the overall eye-specific synapse density is equivalent for both eyes in the dLGN), WT multi-AZ synapses showed a greater difference (372%) in eye-specific VGluT2 volume compared with single-AZ synapses (135%). In β2KO mice multi-AZ synapses showed a greater difference (110%) in eye-specific VGluT2 volume compared with single-AZ synapses (41%). In our initial manuscript submission, we included statistical comparisons of eye-specific volume differences across ages, but we did not highlight these differences in our discussion of the results. For clarity, we have removed all statistical comparisons across ages in the revised manuscript. We have modified the text to focus on eye-specific VGluT2 volume differences at P4 described above. To specifically address the reviewer’s question, we provide the percentage differences between multi- and single-AZ eye-specific synapses for each age/genotype below:

      Author response table 1.

      Claim 2: Complex synapses (vGlut2 clusters with multiple active zones) represent clusters of simple synapses and not single large boutons with multiple active zones. The authors argue that because vGlut2 cluster volume scales roughly linearly with active zone number, the vGlut2 clusters are composed of multiple boutons each containing a single active zone. Their analysis does not rule out the (known to be true) possibility that RGC bouton sizes are much larger in boutons with multiple active zones. The correlation of volume and active zone number, by itself, does not resolve the issue. A good argument for multiple boutons might be that the variance is smallest in clusters with 4 active zones (looks like it in the plot) since they would be the average of four active zones to vesicle pool ratios. It is very likely that the multi-active zone vGlut2 clusters represent some clustering and some multi-synaptic boutons. The reference cited by the authors as evidence for the presence of single active zone boutons in young tissue does not rule out the existence of multiple active zone boutons.

      We agree with the reviewer’s comments on the challenges of classifying multi-active zone synapses in STORM images as single terminals versus aggregates of terminals. To help address this, we have performed electron microscopy imaging of genetically labeled RGC axons and identified the existence of single retinogeniculate terminals with multiple active zones. Our EM imaging was limited to 2D sections and does not rule out the clustering of small, single-active zone synapses within 3D volumes. Future volumetric EM reconstructions will be informative for this question. We have significantly updated the figures and text to discuss the new results and provide a careful interpretation of the nature of multi-AZ synapses in STORM imaging data.

      Several arguments are made that depend on the interpretation of "not statistically significant" (n.s.) meaning that "two groups are the same" instead of "we don't know if they are different". This interpretation is incorrect and materially impacts the conclusions.

      Several arguments are made that interpret statistical significance for one group and a lack of statistical significance for another group meaning that the effect was bigger in the first group. This interpretation is incorrect and materially impacts the conclusions.

      We thank the reviewer for raising these concerns. We have extensively revised the manuscript text to report the data in a more precise way without overinterpreting the results. All references to “N.S.” and associated conclusions have been either removed or substantiated with 5/95% confidence interval testing.

      Result Section 3.

      Claim 1: Complex synapses stabilize simple synapses. There are alternative explanations (mentioned above) for the observed clustering that negate the conclusions. 1) Boutons from the same axon tend to be found near one another. 2) Any form of eye-specific segregation would produce non-random associations in the analysis as performed. The authors compare each observation to a random model, but I cannot determine from the text if the model adequately accounts for alternative explanations.

      We thank the reviewer for their suggestion to consider alternative explanations for our results. We agree that our study does not provide direct molecular mechanistic data demonstrating synaptic stabilization effects. We have significantly revised the manuscript to be more cautious in our interpretations and specifically address alternative biological mechanisms that are consistent with the non-random arrangement of retinogeniculate synapses in our data.

      We agree with the reviewer that individual RGC axons form multiple synapses, however, nascent synapses might not always form close together. If synapses are initially added randomly within RGC axons, eye-specific segregation may conclude with a still-random pattern of dominant-eye inputs. At some later stage, synapses may be selectively refined to produce mature glomeruli. Consistent with this, individual RGCs undergo progressive clustering of axonal boutons at later stages of development after eye-specific segregation (5). One of our goals in this work was to determine if the process of synaptic clustering begins at the earliest stages of synapse formation and, if so, whether it is influenced by retinal wave activity.

      To measure synaptic clustering in our STORM data, we used a randomization of single-AZ synapse centroids within the volume of the neuropil after accounting for neuronal soma volumes and edge effects. Multi-AZ centroid positions were held fixed. Comparing the randomized result to the original distribution, we found a higher fraction of single-AZ synapses associated with multi-AZ synapses, arguing for a non-random clustering effect. However, we agree with the reviewer’s concern that this type of randomization cannot account for the fine scale structure of axons, which we did not have access to in this four-color volumetric super-resolution data set. Thus, there could still be errors in a purely volumetric randomization (e.g. the assignment of synapses to regions in the volume that would not be synaptic locations in the original neuropil), which would effectively decrease the measured degree of clustering after the randomization. To address this, we have revised our analysis to measure the degree of synapse clustering nearby both multi-AZ and single-AZ synapses after an equivalent randomization of single-AZ synapse positions in the volume.

      We now present the revised results as a “clustering index” for both multi-AZ and single-AZ synapses. This measurement was performed in three steps: 1) randomization of single-AZ positions within the imaging volume while holding multi-AZ centroid positions fixed, 2) independent measurements of the fraction of single-AZ synapses within the local shell (1.5 μm search radius) around multi-AZ and single-AZ synapses within the random distribution, and 3) comparison of the result from (2) with the actual fractional measurements in the raw STORM data to compute a “clustering index” value. Because the randomization is equivalent for both multi-AZ and single-AZ synapse measurements, any measured differences in the degree of clustering reflect the synapse type.
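      The randomization procedure described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the function names are hypothetical, the imaging volume is treated as a simple box (no soma exclusion or edge correction), and the index is assumed to be the ratio of the observed fraction to the mean fraction over randomizations, since the response does not give the exact formula.

```python
import math
import random


def fraction_near(points, anchors, radius, exclude_self=False):
    """Fraction of `points` lying within `radius` of at least one anchor.

    With exclude_self=True, points[i] is not compared against anchors[i],
    which is needed when a population is measured against itself.
    """
    if not points:
        return 0.0
    hits = 0
    for i, p in enumerate(points):
        for j, a in enumerate(anchors):
            if exclude_self and i == j:
                continue
            if math.dist(p, a) <= radius:
                hits += 1
                break
    return hits / len(points)


def random_centroids(n, bounds, rng):
    """Draw n centroids uniformly inside a box with the given side lengths."""
    return [tuple(rng.uniform(0.0, b) for b in bounds) for _ in range(n)]


def _ratio(observed, expected):
    """Observed/expected, guarding against an empty null distribution."""
    if expected > 0:
        return observed / expected
    return math.inf if observed > 0 else math.nan


def clustering_index(single, multi, bounds, radius=1.5, n_shuffles=200, seed=0):
    """Return (index around multi-AZ, index around single-AZ).

    Single-AZ centroids are re-drawn at random; multi-AZ centroids stay fixed.
    The same randomized positions are used for both measurements.
    """
    rng = random.Random(seed)
    obs_multi = fraction_near(single, multi, radius)
    obs_single = fraction_near(single, single, radius, exclude_self=True)

    null_multi, null_single = [], []
    for _ in range(n_shuffles):
        shuffled = random_centroids(len(single), bounds, rng)
        null_multi.append(fraction_near(shuffled, multi, radius))
        null_single.append(
            fraction_near(shuffled, shuffled, radius, exclude_self=True))

    exp_multi = sum(null_multi) / n_shuffles
    exp_single = sum(null_single) / n_shuffles
    return _ratio(obs_multi, exp_multi), _ratio(obs_single, exp_single)
```

      An index above 1 means that more single-AZ synapses fall inside the 1.5 μm shell than expected from random placement. Because both indices share one randomization, a difference between them cannot come from the randomization itself; it can still reflect unmodeled structure such as axon geometry, as the reviewer notes.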

      We have updated Figure 3 in the revised manuscript to present the relative clustering index described above. We have updated the results, discussion, and methods sections accordingly.

      The authors claim that specificity increases over time. Figure 3b (middle) shows that the number of synapses near complex synapses might increase with time (needs confidence interval for effect size), but does not show that specificity (original relative to randomized) increases with time. The fact that nearby simple synapse density is always (P2) very different from random suggests a primarily non-activity-dependent explanation. The simplest explanation is that same-side boutons could be from the same axon whereas different-side axons could not be.

      We have significantly revised the analysis and presentation of results in Figure 3 to include a comparative measurement of synaptic clustering between multi-AZ and single-AZ synapses (discussed above). The data presented in the original Figure 3B have been moved to Supplemental Figure 4. Statistical comparisons in Figure S4 between the original and randomized synapse distributions are limited to within-age measurements. Cross-age comparisons were not performed or presented. To address the reviewer’s question concerning CI analysis in the original Figure 3B, we provide Author response image 3 below showing 5/95% confidence intervals for WT mice:

      Author response image 3.

      Claim 2: vGlut2 clusters more than 1.5 um away from multi-active zone vGlut2 clusters are not statistically significantly different in size than vGlut2 clusters within 1.5 um of multi-active zone vGlut2 clusters. Therefore "activity-dependent synapse stabilization mechanisms do not impact simple synapse vesicle pool size". The specific measure of 1.5 um from multi-active zone vGlut2 clusters does not represent all possible synapse stabilization mechanisms.

      We agree with the reviewer that this specific measure does not capture all possible synapse stabilization mechanisms. We have modified the text in the revised manuscript throughout to be more cautious in our data interpretation and have included additional discussion of alternative mechanisms consistent with our results.

      Result Section 4.

      Claim: The proximity of complex synapses with nearby simple synapses to other complex synapses with nearby simple synapses from the same eye is used to argue that activity is responsible for all this clustering.

      It is difficult to derive anything from the quantification besides 'not-random'. That is a problem because we already know that axons from the left and right eye segregate during the period being studied. All the measures in Section 4 are influenced by eye-specific segregation. Given this known bias, demonstrating a non-random relationship (P<X) doesn't mean anything. The test will reveal any non-random spatial relationship between same-eye and opposite-eye synapses.

      The results can be stated as: If you are a contralateral complex synapse, contralateral complex synapses that are also close to contralateral simple synapses will, on average, be slightly closer to you than contralateral complex synapses that are not close to contralateral simple synapses. That would be true if there is any eye-specific segregation (which there is).

      We appreciate the reviewer’s comments that our anatomical data are consistent with several possible mechanisms, suggesting the need for alternative interpretations of the results. In the original writing, we interpreted our results in the context of activity-dependent mechanisms of like-eye stabilization and opposite-eye competition. However, our results are also consistent with other mechanisms, including non-random molecular specification of eye-specific inputs onto subregions of postsynaptic target cells (e.g. distinct relay neuron dendrites). We have rewritten the manuscript to be more cautious in our interpretations and to provide a balanced discussion of alternative possibilities.

      Regarding the concern that the data in section four are influenced by eye-specific segregation, we previously found synapse density from both eyes is equivalent in the contralateral region at the P4 time point presented (1), which is consistent with binocular axonal overlap at this age. Within our imaging volumes, ipsilateral and contralateral inputs were broadly intermingled throughout the volume, and we did not find evidence for regional segregation within the imaging fields. By these metrics, retraction of ipsilateral inputs from the contralateral territory has not yet occurred.

      It is an overinterpretation of the data to claim that the lack of a clear correlation between vGlut2 cluster volume and distance to vGlut2 clusters with multiple active zones provides support for the claim that "presynaptic protein organization is not influenced by mechanisms governing synaptic clustering".

      We agree with the reviewer that our original language was imprecise in referring to presynaptic protein organization broadly. We have revised this text to present a more accurate description of the results.

      Reviewer #2 (Public Review):

      In this manuscript, Zhang and Speer examine changes in the spatial organization of synaptic proteins during eye-specific segregation, a developmental period when axons from the two eyes initially mingle and gradually segregate into eye-specific regions of the dorsal lateral geniculate. The authors use STORM microscopy and immunostain presynaptic (VGluT2, Bassoon) and postsynaptic (Homer) proteins to identify synaptic release sites. Activity-dependent changes in this spatial organization are identified by comparing the β2KO mice to WT mice. They describe two types of presynaptic organization based on Bassoon clustering, the complex and the simple synapse. By analyzing the relative densities and distances between these proteins over age, the authors conclude that the complex synapses promote the clustering of simple synapses nearby to form the future mature glomerular synaptic structure.

      Strengths:

      The data presented is of good quality and provides an unprecedented high-resolution view of the presynaptic components of the retinogeniculate synapse during active developmental remodeling. This approach offers an advance over the previous mouse EM studies of this synapse because the CTB label allows identification of the eye from which the presynaptic terminal arises. Using this approach, the authors find that simple synapses cluster close to complex synapses over age and that complex synapse density increases with age.

      Weaknesses:

      From these data, the authors conclude that the complex synapse serves to "promote clustering of like-eye synapses and prohibit synapse clustering from the opposite eye". However, the authors show no causal data to support these ideas. There are a number of issues that the authors should consider:

      (1) Clustering of retinal synapses is in part due to the fact that retinal inputs synapse on the proximal dendrites. With increased synaptogenesis, there will be increased density of retinal terminals that are closely localized. And with development, perhaps simple synapses mature into complex synapses. Simple synapses may also represent ones that are in the process of being eliminated as previously described by Campbell and Shatz, JNeurosci 1992 (consider citing). Can the authors distinguish these scenarios from the ones that they conclude?

      We thank the reviewer for their thoughtful commentary and suggestions to improve our manuscript. We agree with the reviewer that our original interpretation of synaptic clustering by activity-dependent stabilization and punishment mechanisms is not directly supported by causal data. We have extensively revised the manuscript to take a more cautious view of the results and to discuss alternative mechanisms that are consistent with our data.

      During eye-specific circuit development, there is indeed increased synaptogenesis and, ultimately, RGC terminals are closely clustered within synaptic glomeruli. This process involves the selective addition and elimination of synapses. Bouton clustering has been shown to occur within individual RGC axons after eye-opening in the mouse (5). The convergence of other RGC types into clustered boutons has been shown at eye-opening by light and electron microscopy (3). There is also qualitative evidence that synaptic clusters may form earlier during eye-specific segregation in the cat (4). Our data provide additional evidence that synaptic clustering begins prior to eye-opening in the mouse (P2-P8). Although synapse numbers also increase during this period, the distribution of synapse addition is non-random. 

      Single-active zone synapses (we previously called these “simple”) may indeed mature into multi-active zone synapses (we previously called these “complex”). At the same time, single-active zone synapses may be eliminated. We believe that each of these events occurs as part of the synaptic refinement process. Our STORM images are static snapshots of eye-specific refinement, and we cannot infer the dynamic developmental trajectory of an individual synapse in our data. Future live imaging experiments in vivo/in situ will be needed to track the maturation and pruning of individual connections. We have expanded our discussion of these limitations and future directions in the manuscript.

      (2) The argument that "complex" synapses are the aggregate of "simple" synapses (Fig 2, S2) is not convincing.

      We agree with the reviewer’s concern about the ambiguous identity of complex synapses. To clarify the nature of multi-active zone synapses, we have performed RGC-specific dAPEX2 labeling to visualize retinogeniculate terminals by electron microscopy (EM). These experiments revealed the presence of synaptic terminals with multiple active zones. We have added images and text to the results section describing these findings. Our 2D EM images do not rule out the possibility that some multi-active zone synapses observed in STORM images are in fact clusters of individual RGC terminals. We have revised the text to provide a more accurate discussion of the nature of multi-active zone synapses.  

      (3) The authors use the β2KO mice to assess changes in the organization of synaptic proteins in retinal terminals that have disrupted retinal waves. However, β2-nAChRs are also expressed in the dLGN and other areas of the brain and glutamatergic synapse development has been reported in the CNS independent of the disruption in retinal waves. This issue should be considered when interpreting the total reduced retinal synapse density in the dLGN of the mutant.

      We thank the reviewer for their suggestion to consider non-retinal effects of the germline deletion of the beta 2 subunit of the nicotinic acetylcholine receptor. Previously, Xu and colleagues reported the development of a conditional transgenic mouse model lacking β2-nAChR expression specifically in the retina (6). These retina-specific β2-nAChR mutant mice (Rx-β2cKO) have disrupted retinal wave properties and defects in eye-specific axonal segregation in binocular anterograde tracing experiments. This work suggests that the defects seen in germline β2-nAChR KO mice arise from defects in retinal wave activity rather than the loss of nicotinic receptors elsewhere in the brain. Additionally, the development of brainstem cholinergic inputs to the dLGN is delayed until the closure of the eye-specific segregation period (7), further suggesting a limited role for cholinergic transmission in the retinogeniculate refinement process.

      (4) Outside of a total synapse density difference between WT and β2KO mice, the changes in the spatial organization of synaptic proteins over development do not seem that different. In fact % simple synapses near complex synapses from the non-dominant eye in the mutant is not that different from WT at P8 (Fig 3C), an age when eye-specific segregation is very different between the genotypes. Can the authors explain this discrepancy?

      We thank the reviewer for their question concerning differences between synapse organization in WT versus β2KO mice. In the original presentation of Figure 3C, the percentage of non-dominant eye single-AZ synapses near multi-AZ synapses increased at P4 in WT mice, but this did not occur in β2KO mice. This is consistent with our previous results showing that there is an increase in non-dominant eye synaptic density at this age, which does not occur in β2KO mice (1). At P8, this clustering effect is lost in WT as eye-specific segregation has taken place and non-dominant eye inputs have been eliminated. However, in β2KO mice, the overall synapse density is still low at this age. We interpret this result as a failure of synaptogenesis in the β2KO line, which leads to increased growth of individual RGC axons (8) and eye-specific overlap at P8 (9, 10). Evidence in support of this interpretation comes from live dynamic imaging studies of RGC axon branching in Xenopus and Zebrafish, showing that synapse formation stabilizes local axon branching and that disruptions of synapse formation or neurotransmission lead to enlarged axons (11-13).

      Our anatomical results do not provide a specific biological mechanism for the remaining clustering observed in the β2KO mice. We have revised our discussion of the fact that individual RGC axons may form multiple synaptic connections leading to clustering, which may be independent of changes in retinal wave properties in the β2KO mouse. We have also extensively revised the analysis and presentation of results in Figure 3 to directly compare synaptic clustering around both multi-AZ synapses and single-AZ synapses within the same imaging volumes.

      (5) The authors use nomenclature that has been previously used and associated with other aspects of retinogeniculate properties. For example, the phrases "simple" and "complex" synapses have been used to describe single boutons or aggregates of boutons from numerous retinal axons, whereas in this manuscript the phrases are used to describe vesicle clusters/release sites with no knowledge of whether they are from single or multiple boutons. Likewise, the word "glomerulus" has been used in the context of the retinogeniculate synapse to refer to a specific pattern of bouton aggregates that involves inhibitory and neuromodulatory inputs. It is not clear how the release sites described by the authors fit in this picture. Finally, the word "punishment" is associated with a body of literature regarding the immune system and retinogeniculate refinement, which is not addressed in this study. This double use of the phrases can lead to confusion in the field and should be clarified by clear definitions of how they are used in the current study.

      We appreciate the reviewer’s concern that the terminology we used in the initial submission may cause confusion. We have revised the text throughout for clarity. “Simple” synapses are now referred to as “single-active zone synapses”. “Complex” synapses are now referred to as “multi-active zone synapses”. We have removed all text that previously referred to synaptic clusters in STORM images as glomeruli. We agree that we have not provided causal evidence for synaptic stabilization and punishment mechanisms, which would require additional molecular genetic studies. We have restructured the manuscript to remove these references and discuss our anatomical results impartially.  

      Reviewer #3 (Public Review):

      This manuscript is a follow-up to a recent study of synaptic development based on a powerful data set that combines anterograde labeling, immunofluorescence labeling of synaptic proteins, and STORM imaging (Cell Reports 2023). Specifically, they use anti-Vglut2 label to determine the size of the presynaptic structure (which they describe as the vesicle pool size), anti-Bassoon to label a number of active zones, and anti-Homer to identify postsynaptic densities. In their previous study, they compared the detailed synaptic structure across the development of synapses made with contra-projecting vs ipsi-projecting RGCs and compared this developmental profile with a mouse model with reduced retinal waves. In this study, they produce a new analysis on the same data set in which they classify synapses into "complex" vs. "simple" and assess the number and spacing of these synapses. From these measurements, they make conclusions regarding the processes that lead to synapse competition/stabilization.

      Strengths:

      This is a fantastic data set for describing the structural details of synapse development in a part of the brain undergoing activity-dependent synaptic rearrangements. The fact that they can differentiate eye of origin is also a plus.

      Weaknesses:

      The lack of details provided for the classification scheme as well as the interpretation of small effect sizes limit the interpretations that can be made based on these findings.

      We thank the reviewer for their reading of the manuscript and helpful comments to improve the work. We provide details on how single-active zone and multi-active zone synapses are classified in the methods section. We agree with the suggestion to be more careful in interpreting the results. We have extensively revised the manuscript to 1) include additional electron microscopy data demonstrating the presence of multi-active zone retinogeniculate synapses, 2) extend the synaptic clustering analysis to both single-active zone and multi-active zone synapses for comparison, and 3) improve the clarity and accuracy of the discussion throughout the manuscript.

      (1) The criteria used to classify synapses as simple vs. complex are critical for all of the analysis in this study. Therefore, these criteria should be much more explicit and tested for robustness. As stated in the methods, classification is based on the number of active zones, which are designated by the number of Bassoon clusters associated with a Vglut2 cluster (line 697). A second criterion is the size of the presynaptic terminal as assayed by "greater Vglut2 signal" (line 116). So how are these thresholds determined? For Bassoon clusters, is one voxel sufficient? Two? If it's one, how often do they see a Bassoon-positive voxel with no Vglut2 cluster, which may therefore represent "noise"? No distribution of Bassoon volumes is provided that might be the basis for selecting this number of sites. Unfortunately, the images are not helpful. For example, does P8 WT in Figure 1B have 7 or 2? According to Figure 2C, it appears the numbers are closer to 2-4.

      The Vglut volume measurements also do not seem to provide a clear criterion. Figure 2 shows that the distributions of Vglut2 cluster volumes for complex and for simple synapses are significantly overlapping.

      The authors need to clarify the quantitative approach used for this classification strategy and test how sensitive the results of the study are to this strategy.

      We thank the reviewer for their question concerning the STORM data analysis. Here we provide a brief overview of the complete analysis details, which are provided in the methods section.

      Our raw STORM data sets consisted of spectrally separate volumetric imaging channels of VGluT2, Bassoon, and Homer1 signals. Each channel was processed as follows: 1) the corresponding low-resolution conventional image of each physical section was applied to the STORM data to filter out artifacts in the STORM image that do not appear in the conventional image; 2) STORM images were thresholded using a 2-factor Otsu threshold that removes low-intensity background noise while preserving all single-molecule localizations that correspond to genuine antibody labeling as well as non-specific antibody labeling in the tissue; 3) the MATLAB function “conncomp” was applied to identify connected-component voxels in 3D across the image stack, and clusters were kept for further analysis only if they were connected across at least 2 continuous physical sections (140 nm Z depth); 4) for every connected component (clusters corresponding to genuine antibody labeling and background labeling), we measured the volume and signal density (intensity/volume); 5) a threshold was applied to retain clusters that have a higher volume and lower signal density. We exclude signals that have low volume and high density, which correspond to single antibody labels. This analysis retains larger clusters that correspond to synaptic objects and excludes non-specific antibody background.
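      Step 3 above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function names are hypothetical, the 26-neighbor connectivity is an assumption (the connectivity used in the original analysis is not stated here), and the input is assumed to be a list of integer voxel coordinates (x, y, z) that survived thresholding, with z indexing the physical section.

```python
from collections import deque
from itertools import product


def connected_components_3d(voxels):
    """Group voxels (x, y, z) into clusters using 26-connectivity (assumed)."""
    remaining = set(voxels)
    # All 26 neighbor offsets in 3D, excluding the voxel itself.
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    clusters = []
    while remaining:
        seed = remaining.pop()
        queue = deque([seed])
        cluster = [seed]
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in offsets:
                neighbor = (x + dx, y + dy, z + dz)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    queue.append(neighbor)
                    cluster.append(neighbor)
        clusters.append(cluster)
    return clusters


def spans_min_sections(cluster, min_sections=2):
    """True if a connected cluster occupies at least `min_sections` z planes."""
    return len({z for _, _, z in cluster}) >= min_sections


# Hypothetical example: one cluster crossing two sections, one isolated voxel.
voxels = [(0, 0, 0), (1, 0, 0), (1, 1, 1), (10, 10, 5)]
kept = [c for c in connected_components_3d(voxels) if spans_min_sections(c)]
```

      Because each cluster is connected, the z planes it occupies are contiguous, so counting distinct z values is enough to test the "at least 2 continuous physical sections" rule.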

      The size of WT synaptic Bassoon clusters ranges from 55 to 3532 voxels (0.00092 to 0.059 μm³), with a median size of 460 voxels (0.0077 μm³).

      The size of WT synaptic VGluT2 clusters ranges from 50 to 73752 voxels (0.00084 to 1.2 μm³), with a median size of 980 voxels (0.016 μm³).

      The size of WT synaptic Homer1 clusters ranges from 63 to 7118 voxels (0.0010 to 0.12 μm³), with a median size of 654 voxels (0.011 μm³).

      In practice, any Bassoon/VGluT2/Homer1 clusters with <10 voxels are immediately filtered at the Otsu thresholding step (2) above.
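      The voxel counts and physical volumes quoted above are related by a fixed voxel size. The sketch below shows the conversion under an assumed voxel of 15.5 nm × 15.5 nm × 70 nm. The 70 nm depth follows from the statement that 2 sections span 140 nm; the 15.5 nm lateral size is an assumption chosen because it approximately reproduces the reported values, and it is not stated in this response.

```python
# Assumed voxel dimensions in nm (x, y, z). Only the 70 nm section
# thickness is implied by the text; the lateral size is an assumption.
VOXEL_NM = (15.5, 15.5, 70.0)


def voxels_to_um3(n_voxels, voxel_nm=VOXEL_NM):
    """Convert a voxel count to a volume in cubic micrometers."""
    x, y, z = voxel_nm
    return n_voxels * x * y * z * 1e-9  # 1 nm^3 = 1e-9 um^3


# Median cluster sizes quoted above (in voxels).
median_bassoon_um3 = voxels_to_um3(460)
median_vglut2_um3 = voxels_to_um3(980)
median_homer1_um3 = voxels_to_um3(654)
```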

      The reviewer is correct that we often see Bassoon(+) clusters that are not associated with VGluT2, and these may reflect synapses of non-retinal origin or retinogeniculate synapses that lack VGluT2 expression. To identify retinogeniculate synapses containing VGluT2, we performed a synapse pairing analysis that measured the association between VGluT2 and Bassoon clusters after the synapse cluster filtering described above. We first measured the centroid-centroid distance from each VGluT2 cluster to the closest cluster in the Bassoon channel. We next quantified the signal intensity of the Bassoon channel within a 140 nm shell surrounding each VGluT2 cluster. A 2D histogram was plotted based on the measured centroid-centroid distances and opposing channel signal densities of each cluster. Paired clusters with closely positioned centroids and high intensities of apposed channel signal were identified using the OPTICS algorithm (14).
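      The first pairing feature described above, the centroid-centroid distance from each VGluT2 cluster to its closest Bassoon cluster, can be sketched as follows. This is an illustration only: the function names and coordinates are hypothetical, and the shell-intensity measurement and the OPTICS clustering step are deliberately omitted.

```python
import math


def centroid(voxels):
    """Mean position of a cluster given as a list of (x, y, z) coordinates."""
    n = len(voxels)
    return tuple(sum(v[i] for v in voxels) / n for i in range(3))


def nearest_centroid_distance(source_centroids, target_centroids):
    """For each source centroid, distance to the closest target centroid."""
    return [
        min(math.dist(s, t) for t in target_centroids)
        for s in source_centroids
    ]


# Hypothetical centroids, in arbitrary length units.
vglut2_centroids = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
bassoon_centroids = [(3.0, 4.0, 0.0), (10.0, 0.0, 1.0)]
distances = nearest_centroid_distance(vglut2_centroids, bassoon_centroids)
```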

      In the original Figure 1B, the multi-active zone synapse in WT at P8 had two Bassoon clusters. To clarify this, we have revised the images in Figure 1 to include arrowheads that point to individual active zones. We have also revised Supplemental Figure 1 to show volumetric renderings of individual example synapses that help illustrate the 3D structure of these multi-active zone inputs. All details about synapse analysis and synapse pairing are provided in the methods section.

      (2) Effect sizes are quite small and all comparisons are made on medians of distributions. This leads to an n=3 biological replicates for all comparisons. Hence this small n may lead to significant results based on ANOVAS/t-tests, but the statistical power of these effects is quite weak. To accurately represent the variance in their data, the authors should show all three data points for each category (with a SD error bar when possible). They should also include the number of synapses in each category (e.g. the numerators in Figure 1D and the denominators for Figure 1E). For other figures, there are additional statistical questions described below.

      We thank the reviewer for their suggestion to improve the presentation of our results. We have added all three data points (individual biological replicates) to each figure plot when applicable. We have also included a supplemental table (Table S1) listing total eye-specific synapse numbers of each type (mAZ and sAZ) and AZ number for each biological replicate in both genotypes.

      (3) The authors need to add a caveat regarding their classification of synapses as "complex" vs. "simple" since this is a terminology that already exists in the field and it is not clear that these STORM images are measuring the same thing. For example, in EM studies, "complex" refers to multiple RGCs converging on the same single postsynaptic site. The authors here acknowledge that they cannot assign different AZs to different RGCs so this comparison is an assumption. In Figure 2 they argue this is a good assumption based on the finding that the Vglut column/active zone is constant and therefore each represents a single RGC. However, the authors should acknowledge that they are actually seeing quite different percentages than those in EM studies. For example, in Monavarfeshani et al, eLife 2018, there were no complex synapses found at P8. (Note this study also found many more complex vs. simple synapses in the adult - 70% vs. the 20% found in the current study - but this difference could be a developmental effect). In the future, the authors may want to take another data set in the adult dLGN to make a direct comparison based on numbers and see if their classification method for complex/simple maps onto the one that currently exists in the literature.

      We appreciate the reviewer’s comment that the use of the terms “complex” and “simple” may cause confusion. We have significantly revised the manuscript for clarity: 1) We now refer to “complex” synapses as “multi-active zone synapses” and “simple” synapses as “single-active zone synapses”. 2) We have performed electron microscopy analysis of dAPEX2-labeled retinogeniculate projections to confirm the existence of large synaptic terminals with multiple active zones. 3) We have expanded our discussion of previous electron microscopy results describing a lack of axonal convergence at P8 (3). 4) We have added a discussion on how individual RGCs may form multiple synapses in close proximity within their axonal arbor, which would create a clustering effect.

      We agree that it will be informative to collect a STORM data set in the adult mouse dLGN and we look forward to working on this project to compare with EM results in the future.  

      (4) Figure 3 assays the relative distribution of simple vs. complex synapses. They found that a larger percentage of simple synapses were within 1.5 microns of complex synapses than you would expect by chance for both ipsi and contra projecting RGCs, and hence conclude that complex synapses are sites of synaptic clustering. In contrast, there was no clustering of ipsi-simple to contra-complex synapses and vice versa. The authors also argue that this clustering decreases between P4 and P8 for ipsi projecting RGCs.

      This analysis needs much more rigor before any conclusions can be drawn. First, the authors need to justify the 1.5-micron criteria for clustering and how robust their results are to variations in this distance. Second, these age effects need to be tested for statistical significance with an ANOVA (all the stats presented are pairwise comparisons to means expected by random distributions at each age). Finally, the authors should consider what n's to use here - is it still grouped by biological replicate? Why not use individual synapses across mice? If they do biological replicates, then they should again show error bars for each data point in their biological replicates. And they should include the number of synapses that went into these measurements in the caption.

      We appreciate the suggestion to improve the rigor of our analysis of synaptic clustering presented in Figure 3. We have revised our analysis to measure the degree of synapse clustering nearby both multi-AZ and single-AZ synapses after an equivalent randomization of single-AZ synapse positions in the volume. 

      We now present the revised results as a “clustering index” for both multi-AZ synapses and single-AZ synapses. This measurement was performed in several steps: 1) randomization of single-AZ positions within the imaging volume while holding multi-AZ centroid positions fixed; 2) independent measurements of the fraction of single-AZ synapses within the local shell (1.5 μm search radius) around multi-AZ and single-AZ synapses within the random distribution; 3) comparison of the result from (2) with the actual fractional measurements in the raw STORM data to compute a “clustering index” value. Because the randomization is equivalent for both multi-AZ and single-AZ synapse measurements, the measured differences in the degree of clustering reflect a synapse type-specific effect.

      We have also updated Supplemental Figure 3 showing the results of varying the search radius from 1-4 μm for both contralateral- and ipsilateral-eye synapses. The results showed that a search radius of 1.5 μm resulted in the largest difference between the original synapse distribution and a randomized synapse distribution (shuffling of single-active zone synapse position while holding multi-active zone synapse position fixed).
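      The randomization procedure described above can be sketched in Python under stated assumptions. The exact formula for the “clustering index” is not given in this response, so the ratio of the observed fraction to the mean shuffled fraction used here is an assumption, as are the uniform placement of shuffled positions, the function names, and the example coordinates.

```python
import math
import random


def fraction_near(single_az, anchors, radius_um=1.5):
    """Fraction of single-AZ centroids within radius_um of any anchor.

    Zero distances are skipped so a synapse is never counted as its own
    neighbor when the anchors are single-AZ synapses themselves.
    """
    if not single_az:
        return 0.0
    near = sum(
        1 for s in single_az
        if any(0 < math.dist(s, a) <= radius_um for a in anchors)
    )
    return near / len(single_az)


def randomize_positions(n, bounds_um, rng):
    """Draw n positions uniformly inside a box with the given side lengths."""
    return [tuple(rng.uniform(0, b) for b in bounds_um) for _ in range(n)]


def clustering_index(single_az, multi_az, bounds_um, radius_um=1.5,
                     n_shuffles=100, seed=0):
    """Observed fraction divided by the mean shuffled fraction (assumed form)."""
    rng = random.Random(seed)
    observed = fraction_near(single_az, multi_az, radius_um)
    shuffled = [
        fraction_near(randomize_positions(len(single_az), bounds_um, rng),
                      multi_az, radius_um)
        for _ in range(n_shuffles)
    ]
    expected = sum(shuffled) / n_shuffles
    return observed / expected if expected > 0 else float("inf")


# Hypothetical coordinates in micrometers; multi-AZ positions stay fixed.
multi_az = [(5.0, 5.0, 3.5)]
single_az = [(5.5, 5.0, 3.5), (5.0, 4.2, 3.5), (4.6, 5.3, 3.0), (9.0, 1.0, 1.0)]
index = clustering_index(single_az, multi_az, bounds_um=(10.0, 10.0, 7.0))
```

      Passing different values of `radius_um` (for example 1, 1.5, 2, 3, and 4) to the same function gives a simple way to check how sensitive the index is to the search radius.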

      Finally, we have removed all statistical comparisons of single measurements (means or ratios) across ages from the manuscript. We focus our statistical analysis on paired data comparisons within individual biological replicates.

      For the analysis of synapse clustering, we grouped the data by biological replicates (N=3) to look for a global effect on synapse clustering. In the revised manuscript, we added data points for each replicate in the figure and included the number of synapses in Supplementary Table 1.

      (5) Line 211-212 - the authors conclude that the absence of clustered ipsi-simple synapses indicates a failure to stabilize (Figure 3). Yet, the link between this measurement and synapse stabilization is not clear. In particular, the conclusion that "isolated" synapses are the ones that will be eliminated seems to be countered by their finding in Figure 3D/E which shows that there is no difference in vesicle pool volume between near and far synapses. If isolated synapses are indeed the ones that fail to stabilize by P8, wouldn't you expect them to be weaker/have fewer vesicles? Also, it's hard to tell if there is an age-dependent effect since the data presented in Figures 3D/E are merged across ages.

      We thank the reviewer for their suggestion to clarify the results in Figure 3. Based on the measured eye-specific differences in vesicle pool size and organization, we also expected that synapses outside of clusters would show a reduced vesicle population. However, across all ages, we found no differences in the vesicle pool size of single-active zone synapses based on their proximity to multi-active zone synapses. Below, we show cumulative distributions of these results across all ages (P2/P4/P8) for WT mice CTB(+) data. Kolmogorov-Smirnov tests show no significant differences (P = 0.880, 0.767, and 0.494 at P2, P4, and P8, respectively). Separate 5/95% confidence interval calculations showed overlap between far and near populations at each age.

      Author response image 4.

      To clarify the presentation of the results, we have changed the text to state that the “vesicle pool size of sAZ synapses is independent of their distance to mAZ synapses”. We have removed references to stabilization and punishment from the results section of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Because none of the phenomena being measured can be expected to behave randomly (given what is already known about the system) and the sample size is small, I believe quantification of the data requires confidence intervals for effect sizes. Resolving the multi-bouton vs multi-active zone bouton with EM would also help.

      We thank the reviewer for their thorough reading of the manuscript and many helpful suggestions. We provide analysis with confidence intervals in a point-by-point response below. In the manuscript we revised our results and focused our statistical analyses on comparisons within the same biological replicate (paired effects). In addition, we have performed electron microscopy of RGC inputs to the dLGN at postnatal day 8 to demonstrate the presence of retinogeniculate synapses with multiple active zones.

      Figure 1:

      Please show data points in scatter bar plots and not just error bars.

      We have updated all plots to show data points for independent biological replicates.

      Please describe the image processing in more detail and provide an image in which the degree of off-target labeling can be evaluated.

      We have updated the description of the image processing in the methods sections. We have made all the code used in this analysis freely available on GitHub (https://github.com/SpeerLab). We have uploaded the raw STORM images of the full data set to the open-access Brain Imaging Library (16). These images can be accessed here: https://api.brainimagelibrary.org/web/view?bildid=ace-dud-lid (WTP2A data for example). All 18 datasets are currently searchable on the BIL by keyword “dLGN” or PI last name “Speer” and a DOI for the grouped dataset is pending.

      How does panel 1D get very small error bars with N = 3? Please provide scatter plots.

      We have updated panel 1D to show the means for each independent biological replicate.

      Line 129: over what volume is density measured? What are the n's? What is the magnitude (with confidence intervals) of increase?

      The volume we collected from each replicate was ~80 μm × 80 μm × 7 μm (total volume ~44,800 μm³). N=3 biological replicates for each age, genotype, and tissue location. Because of concerns with the use of ANOVA for low sample numbers, we have removed a majority of the age-wise comparisons from the manuscript and instead focus on within-replicate paired data comparisons. Author response image 5 shows 5/95% confidence intervals for WT data (left panel) and β2KO data (right panel):

      Author response image 5.

      The 5/95% CI range for the increase in synapse density from P2 to P8 for CTB(+) synapses is approximately -0.001 to 0.037 synapses/μm³.

      Line 131: You say that non-dominant increases and then decreases. It appears that the error bars argue that you do not have enough information to reliably determine how much or little density changes.

      Line 140: No confidence intervals. It appears the error bars allow both for the claimed effect of increased fraction and the opposite effect of decreased density.

      Because of concerns with the use of ANOVA for low sample numbers, we have removed age-wise comparisons of single-measurements (means and ratios) from the manuscript and instead focus on within-replicate paired data comparisons.

      Line 144: Confidence intervals would be a reasonable way to argue that fraction is not changed in KO: normal fraction XX%-XX%. KO fraction XX%-XX%.

      Author response image 6 shows panels for WT (left) and β2KO mice (right) with 5/95% CIs.

      Author response image 6.

      In the revised manuscript, we have updated the text to report the measurements, but we do not draw conclusions about changes over development.

      I find it hard to estimate magnitudes on a log scale.

      We appreciate the reviewer’s concern with the presentation of results on a log scale. Because the measured synapse properties are log-normally distributed, we have elected to present the data on a log scale so that the distribution(s) can be seen clearly. The lognormal distributions also enable us to use a mixed linear model for statistical analysis.

      Line 156: Needs confidence interval for difference.

      Line 158: Needs confidence interval for difference of differences.

      Line 160: Needs confidence interval for difference of differences.

      Why only compare at P4 where there is the biggest difference? The activity hypothesis would predict an even bigger effect at P8.

      Below is a table listing the mean volume (log10 μm³) and [5/95%] confidence intervals for comparisons of VGluT2 signal between CTB(+) and CTB(-) synapses from Figure 2A and 2B:

      Author response table 2.

      Based on the values given above, the mean difference of differences and [5/95%] confidence intervals are listed below:

      Author response table 3.

      We added these values to the manuscript. We have also reported the difference in median values on a linear scale (as below) so that the readers can have a straightforward understanding of the magnitude.

      Author response table 4.

      We elected to highlight the results at P4 based on our previous finding that the synapse density from each eye-of-origin is similar at this time point (1).

      At P8, there is a decrease in the magnitude of the difference between CTB(+)/CTB(-) synapses compared to P4. This may be due to an increase in VGluT2 volume within non-dominant eye synapses that survive competition between P4-P8.

      At P8 in the mutant, there is an increase in the magnitude of the difference between CTB(+)/CTB(-) synapses compared to P4. This may be due to delayed synaptic maturation in β2KO mice.

      Line 171: The correct statistical comparison was not performed for the claim. Lack of * at P2 does not mean they are the same. Why do you get the same result for KO?

      We have revised the statistical analysis, figure presentation, and text to remove discussion of changes in the number of active zones per synapse over development based on ANOVA. We now report eye-specific differences at each time point using paired t-test analysis, which is mathematically equivalent to examining the 5/95% confidence interval of the paired difference.
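      The relationship between a paired t-test and the 5/95% confidence interval of the paired difference can be sketched for N=3 replicates. The measurement values below are hypothetical, and the critical value 2.920 (the 95th percentile of Student's t with 2 degrees of freedom) is only valid for three pairs.

```python
import math
import statistics

# 95th percentile of Student's t with 2 degrees of freedom (n = 3 pairs).
T_CRIT_95_DF2 = 2.920


def paired_difference_ci(a, b, t_crit=T_CRIT_95_DF2):
    """Mean paired difference, its 5/95% CI, and the paired t statistic."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean
    t_stat = mean / sem
    return mean, (mean - t_crit * sem, mean + t_crit * sem), t_stat


# Hypothetical per-replicate values for two synapse groups (not real data).
group_a = [1.2, 1.5, 1.4]
group_b = [1.0, 1.2, 1.3]
mean_diff, (ci_low, ci_high), t_stat = paired_difference_ci(group_a, group_b)

# The interval excludes zero exactly when |t| exceeds the critical value.
ci_excludes_zero = ci_low > 0 or ci_high < 0
t_exceeds_critical = abs(t_stat) > T_CRIT_95_DF2
```

      With only three pairs the interval is wide, which is why the effect magnitude and its interval are more informative than the test result alone.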

      Line 175: Qualitative claim. Correlation coefficients and magnitudes of correlation coefficients are not reported.

      Linear-fit slope and R² values are given below:

      Author response table 5.

      The values are added to the manuscript to support the conclusions.

      Line 177: n.s. does not mean that you have demonstrated the values are the same. An argument for similarity could be made by calculating a confidence interval for a potential range of differences. Example: Complex were 60%-170% of Simple.

      Author response image 7, with 5/95% CIs, is shown below (WT and β2KO):

      Author response image 7.

      Comparing the difference between multi-AZ synapse and single-AZ synapse revealed that the difference in average VGluT2 cluster volume per AZ is:

      Author response table 6.

      The values are added to the manuscript for discussion.

      Line 178: There is no reason to think that the vesicle pool for a single bouton does not scale with active zone number within the range of uncertainty presented here.

      We have collected EM images of multi-AZ synapses and modified our discussion and conclusions in the revised text.

      Line 196: "non-random clustering increased progressively" is misleading. The density of the boutons increases for both the Original and Randomized. Given the increase in variance at P8, it is unlikely that the data supports the claim that the non-randomness increased. Would be easy to quantify with confidence intervals for a measure of specificity (O/R).

      We have revised the manuscript to remove analysis and discussion of changes in clustering over development. We have modified this section of the manuscript and figures to present a normalized clustering index that describes the non-random clustering effect present at each time point.

      Line 209: Evidence is for correlation, not causation and there is a trivial potential explanation for correlation.

      We appreciate the reviewer’s concern with overinterpretation of the results. We have changed the text to more accurately reflect the data.

      Line 238-239: Authors failed to show effect is activity-dependent. Near/Far distinction is not necessarily a criterion for the effect of activity. The claim is likely false in other systems.

      We agree with the reviewer that the original text overinterpreted the results. We have changed the text to more accurately reflect the data. 

      Line 265-266: Assumes previous result is correct and measure of vGlut2 provides information about all presynaptic protein organization.

      We thank the reviewer for pointing out the incorrect reference to all presynaptic protein organization. We have corrected the text to reference only the VGluT2 and Bassoon signals that were measured.

      Line 276: There are many other interpretations that include trivial causes. It is unclear what the measure indicates about the biology and there is no interpretable magnitude of effect.

      We agree with the reviewer that the original text overinterpreted the results. We have changed the text to remove references to mechanisms of synaptic stabilization.

      Line 289: Differences cannot be demonstrated by comparing P-values. Try comparing confidence intervals for effect size or generate a confidence interval for the difference between the two groups.

      5/95% confidence intervals are given below for Figure 4C/D:

      Author response table 7.

      We have added these values to the manuscript to support our conclusion.

      Line 305: "This suggests that complex synapses from the non-dominant-eye do not exert a punishment effect on synapses from the dominant-eye" Even if all the other assumptions in this claim were true, "n.s." just means you don't know something. It cannot be compared with an asterisk to claim a lack of effect.

      We thank the reviewer for raising this concern. We have modified the text to remove references to synaptic punishment mechanisms in the results section.

      Below are the 5/95% confidence intervals for the results in Figure 4F:

      Author response table 8.

      We have added these values to the manuscript to support our conclusion.

      Line 308: "mechanisms that act locally". 6 microns is introduced based on differences in curves above(?). I don't see any analysis that would argue that longer-distance effects were not present.

      The original reference referred to the differences in the cumulative distribution measurements between multi-active zone synapses versus single-active zone synapses in their distance to the nearest neighboring multi-active zone synapse. For clarity, we have deleted the reference to the 6 micron distance in the revised text.

      Reviewer #2 (Recommendations For The Authors):

      (1) This data set would be valuable to the community. However, unless the authors can show experiments that manipulate the presence of complex synapses to test their concluding claims, the manuscript should be rewritten with a reassessment of the conclusions that is more grounded in the data.

      We thank the reviewer for their careful reading of the manuscript and we agree the original interpretations were not causally supported by the experimental results. We have made substantial changes to the text throughout the introduction, results, and discussion sections so that the conclusions accurately reflect the data.

      (2) To convincingly address the claim that "complex synapses" are aggregates of simple synapses, the authors should perform experiments at the EM level showing what the bouton correlates of these synapses are.

      We thank the reviewer for their suggestion to perform EM to gain a better understanding of retinogeniculate terminal structure. We generated an RGC-specific transgenic line expressing the EM reporter dAPEX2 localized to mitochondria. We have collected EM images of retinogeniculate terminals that demonstrate the presence of multiple active zones within individual synapses. These results are now presented in Figure 1. The text has been updated to reflect the new results.

      (3) Experiments using the conditional β2KO mice would help address questions of the contribution of β2-nAChRs in dLGN to the synaptic phenotype.

      We appreciate the reviewer’s concern that the germline β2KO model may show effects that are not retina-specific. To address this, Xu and colleagues generated a retina-specific conditional β2KO transgenic and characterized wave properties and defective eye-specific segregation at the level of bulk axonal tracing (6). The results from the conditional mutant study suggest that the main effects on eye-specific axon refinement in the germline β2KO model are likely of retinal origin through impacts on retinal wave activity. Additionally, anatomical data shows that brainstem cholinergic axons innervate the dLGN toward the second half of eye-specific segregation and are not fully mature at P8 when eye-specific refinement is largely complete (7). We agree with the reviewer that future synaptic studies of previously published wave mutants, including the conditional reporter line, would be needed to conclusively assess a contribution of non-retinal nAChRs. These experiments will take significant time and resources and we respectfully suggest this is beyond the scope of the current manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors need to be more transparent that they are using the same data set from the previous publication (right now it does not appear until line 471) and clarify what was found in that study vs what is being tested here.

      We thank the reviewer for their thoughtful reading of the manuscript and helpful recommendations to improve the clarity of the work. We have edited the text to make it clear that this study is a reanalysis of an existing data set. We have revised the text to discuss the results from our previous study and more clearly define how the current analysis builds upon that initial work. 

      (2) The authors restricted their competition argument in Figure 4 to complex synapses, but why not include the simple ones? This seems like a straightforward analysis to do.

      We appreciate the reviewer’s suggestion to measure spatial relationships between “clustered” and “isolated” single-AZ synapses as we have done for multi-AZ synapses in Figure 4. However, we are not able to perform a direct and interpretable comparison with the results shown for multi-AZ synapses. First, we would need to classify “clustered” and “isolated” single-AZ synapses. This classification convolves two effects: 1) a distance threshold to define clustering and 2) subsequent distance measurements between clustered synapses.

      If we apply an equivalent 1.5 μm distance threshold (or any other threshold) to define clustered synapses, the distance from each “clustered” single-AZ synapse to the nearest other single-AZ synapse will always be smaller than the defined threshold (1.5 μm). Alternatively, if all of the single-AZ synapses within each local 1.5 μm shell are excluded from the subsequent intersynaptic distance measurements, this will set a hard lower boundary on the distance between synaptic clusters (1.5 μm minimum). The two effects discussed above were separated in our original analysis of multi-AZ synapses defined as “clustered” and “isolated” based on their relationship to single-AZ synapses, but these effects cannot be separated when analyzing single-AZ distributions alone.
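      The circularity described above can be seen in a small simulation. This is a minimal sketch under stated assumptions: the points are random coordinates in a hypothetical 10 μm cube, not synapse positions from the data set, and `nearest_neighbor_distances` is an illustrative helper. It shows that once "clustered" is defined by a distance threshold, the nearest-neighbor distances of the clustered group are bounded by that threshold by construction.

```python
import math
import random

THRESHOLD_UM = 1.5  # same threshold value as discussed in the text


def nearest_neighbor_distances(points):
    """Distance from each point to its closest other point (brute force)."""
    distances = []
    for i, p in enumerate(points):
        distances.append(
            min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        )
    return distances


# Hypothetical random 3D positions in micrometers (NOT experimental data).
rng = random.Random(1)
points = [
    (rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 10))
    for _ in range(200)
]

nn = nearest_neighbor_distances(points)

# Classification step: "clustered" means a neighbor lies within the threshold.
clustered = [d for d in nn if d <= THRESHOLD_UM]
isolated = [d for d in nn if d > THRESHOLD_UM]

# Measurement step: the clustered distances can never exceed the threshold,
# so comparing them with the isolated group only restates the definition.
largest_clustered = max(clustered, default=0.0)
```

      Because the classification and the measurement use the same quantity, any difference between the two groups is guaranteed regardless of the underlying biology.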

      (3) The Discussion seems much too long and speculative from the current data that is represented - particularly without verification of complex synapses actually being inputs from different RGCs. Along the same lines, figure captions are misleading. For example, for Figure 4 - the title indicates that the complex synapses are driving the rearrangements. But of course, these are static images. The authors should use titles that are more reflective of their findings rather than this interpretation.

      We thank the reviewer for these helpful suggestions. We have changed each of the figure captions to more accurately reflect the results. We have deleted all of the speculative discussion and revised the remaining text to improve the accuracy of the presentation.

      (4) In the future, the authors may want to consider analyzing whether ipsi and contra projections contribute to the same synapses.

      We agree with the reviewer that it is of interest to investigate the contribution of binocular inputs to retinogeniculate synaptic clusters during development. At maturity, some weak binocular input remains in the dominant-eye territory (15). To look for evidence of binocular synaptic interactions, we measured the percentage of the total small single-active zone synapses that were within 1.5 micrometers of larger multi-active zone synapses of the opposite eye. On average, ~10% or less of the single-active zone synapses were near multi-active zone synapses of the opposite eye. This analysis is presented in Supplemental Figure S3C/D.

      It is possible that some large mAZ synapses might reflect the convergence of two or more smaller inputs from the two eyes. Our current analyses do not rule this out. However, previous EM studies have found limited evidence for convergence of multiple RGCs (3) at P8 and our own EM images show that larger terminals with multiple active zones are formed by a single RGC bouton. Future volumetric EM reconstructions with eye-specific labels will be informative to address this question.
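      The proximity measurement described in this response (the percentage of single-active-zone synapses lying within 1.5 micrometers of an opposite-eye multi-active-zone synapse) can be sketched as follows. The function name `percent_near` and the toy coordinates are assumptions for illustration; the authors' actual pipeline is not described here.

```python
import math


def percent_near(single_az, multi_az_opposite, radius_um=1.5):
    """Percent of single-AZ synapses within radius_um of any opposite-eye multi-AZ synapse.

    Inputs are sequences of (x, y, z) centroid coordinates in micrometers.
    Hypothetical helper, not the authors' code.
    """
    if not single_az:
        return 0.0
    near = sum(
        1
        for s in single_az
        if any(math.dist(s, m) <= radius_um for m in multi_az_opposite)
    )
    return 100.0 * near / len(single_az)


# Toy coordinates (NOT experimental data): two of the four single-AZ
# synapses fall within 1.5 micrometers of the one multi-AZ synapse.
toy_single = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0), (10.0, 0.0, 0.0)]
toy_multi = [(0.0, 1.0, 0.0)]
toy_percent = percent_near(toy_single, toy_multi)
```

      A brute-force search like this scales with the product of the two synapse counts; for large volumes a spatial index would be the usual replacement, but the quantity computed is the same.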

      References

      (1) Zhang C, Yadav S, Speer CM. The synaptic basis of activity-dependent eye-specific competition. Cell Rep. 2023;42(2):112085.

      (2) Bickford ME, Slusarczyk A, Dilger EK, Krahe TE, Kucuk C, Guido W. Synaptic development of the mouse dorsal lateral geniculate nucleus. J Comp Neurol. 2010;518(5):622-35.

      (3) Monavarfeshani A, Stanton G, Van Name J, Su K, Mills WA, 3rd, Swilling K, et al. LRRTM1 underlies synaptic convergence in visual thalamus. Elife. 2018;7.

      (4) Campbell G, Shatz CJ. Synapses formed by identified retinogeniculate axons during the segregation of eye input. J Neurosci. 1992;12(5):1847-58.

      (5) Hong YK, Park S, Litvina EY, Morales J, Sanes JR, Chen C. Refinement of the retinogeniculate synapse by bouton clustering. Neuron. 2014;84(2):332-9.

      (6) Xu HP, Burbridge TJ, Chen MG, Ge X, Zhang Y, Zhou ZJ, et al. Spatial pattern of spontaneous retinal waves instructs retinotopic map refinement more than activity frequency. Dev Neurobiol. 2015;75(6):621-40.

      (7) Sokhadze G, Seabrook TA, Guido W. The absence of retinal input disrupts the development of cholinergic brainstem projections in the mouse dorsal lateral geniculate nucleus. Neural Dev. 2018;13(1):27.

      (8) Dhande OS, Hua EW, Guh E, Yeh J, Bhatt S, Zhang Y, et al. Development of single retinofugal axon arbors in normal and beta2 knock-out mice. J Neurosci. 2011;31(9):3384-99.

      (9) Rossi FM, Pizzorusso T, Porciatti V, Marubio LM, Maffei L, Changeux JP. Requirement of the nicotinic acetylcholine receptor beta 2 subunit for the anatomical and functional development of the visual system. Proc Natl Acad Sci U S A. 2001;98(11):6453-8.

      (10) Muir-Robinson G, Hwang BJ, Feller MB. Retinogeniculate axons undergo eye-specific segregation in the absence of eye-specific layers. J Neurosci. 2002;22(13):5259-64.

      (11) Fredj NB, Hammond S, Otsuna H, Chien C-B, Burrone J, Meyer MP. Synaptic Activity and Activity-Dependent Competition Regulates Axon Arbor Maturation, Growth Arrest, and Territory in the Retinotectal Projection. J Neurosci. 2010;30(32):10939.

      (12) Hua JY, Smear MC, Baier H, Smith SJ. Regulation of axon growth in vivo by activity-based competition. Nature. 2005;434(7036):1022-6.

      (13) Rahman TN, Munz M, Kutsarova E, Bilash OM, Ruthazer ES. Stentian structural plasticity in the developing visual system. Proc Natl Acad Sci U S A. 2020;117(20):10636-8.

      (14) Ankerst M, Breunig MM, Kriegel H-P, Sander J. OPTICS: ordering points to identify the clustering structure. SIGMOD Rec. 1999;28(2):49–60.

      (15) Bauer J, Weiler S, Fernholz MHP, Laubender D, Scheuss V, Hübener M, et al. Limited functional convergence of eye-specific inputs in the retinogeniculate pathway of the mouse. Neuron. 2021;109(15):2457-68.e12.

      (16) Benninger K, Hood G, Simmel D, Tuite L, Wetzel A, Ropelewski A, et al. Cyberinfrastructure of a Multi-Petabyte Microscopy Resource for Neuroscience Research. Practice and Experience in Advanced Research Computing; Portland, OR, USA: Association for Computing Machinery; 2020. p. 1–7.

    1. eLife Assessment

      This is a fundamental study that addresses the key question of how the tetraspanin Tspan12 functions biochemically as a co-receptor for Norrin to initiate β-catenin signaling. The strength of the work lies in the rigorous and compelling binding analyses involving various purified receptors, co-receptors, and ligands, as well as molecular modeling by AlphaFold that was subsequently validated by an extensive series of mutagenesis experiments. The study advances the field by providing a novel mechanism of co-receptor function and shedding new light on how signaling specificity is achieved in the complex Wnt/Norrin signaling system.

    2. Joint Public Review:

      Though the Norrin protein is structurally unrelated to the Wnt ligands, it can activate the Wnt/β-catenin pathway by binding to the canonical Wnt receptors Fzd4 and Lrp5/6, as well as the tetraspanin Tspan12 co-receptor. Understanding the biochemical mechanisms by which Norrin engages Tspan12 to initiate signaling is important, as this pathway plays an important role in regulating retinal angiogenesis and maintaining the blood-retina-barrier. Numerous mutations in this signaling pathway have also been found in human patients with ocular diseases. The overarching goal of the study is to define the biochemical mechanisms by which Tspan12 mediates Norrin signaling. Using purified Tspan12 reconstituted in lipid nanodiscs, the authors conducted detailed binding experiments to document the direct, high-affinity interactions between purified Tspan12 and Norrin. To further model this binding event, they used AlphaFold to dock Norrin and Tspan12 and identified four putative binding sites. They went on to validate these sites through mutagenesis experiments. Using the information obtained from the AlphaFold modeling and through additional binding competition experiments, it was further demonstrated that Tspan12 and Fzd4 can bind Norrin simultaneously, but Tspan12 binding to Norrin is competitive with other known co-receptors, such as HSPGs and Lrp5/6. Collectively, the authors proposed that the main function of Tspan12 is to capture low concentrations of Norrin at the early stage of signaling, and then "hand over" Norrin to Fzd4 and Lrp5/6 for further signal propagation. Overall, the study is comprehensive and compelling, and the conclusions are well supported by the experimental and modeling data.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Though the Norrin protein is structurally unrelated to the Wnt ligands, it can activate the Wnt/β-catenin pathway by binding to the canonical Wnt receptors Fzd4 and Lrp5/6, as well as the tetraspanin Tspan12 co-receptor. Understanding the biochemical mechanisms by which Norrin engages Tspan12 to initiate signaling is important, as this pathway plays an important role in regulating retinal angiogenesis and maintaining the blood-retina-barrier. Numerous mutations in this signaling pathway have also been found in human patients with ocular diseases. The overarching goal of the study is to define the biochemical mechanisms by which Tspan12 mediates Norrin signaling. Using purified Tspan12 reconstituted in lipid nanodiscs, the authors conducted detailed binding experiments to document the direct, high-affinity interactions between purified Tspan12 and Norrin. To further model this binding event, they used AlphaFold to dock Norrin and Tspan12 and identified four putative binding sites. They went on to validate these sites through mutagenesis experiments. Using the information obtained from the AlphaFold modeling and through additional binding competition experiments, it was further demonstrated that Tspan12 and Fzd4 can bind Norrin simultaneously, but Tspan12 binding to Norrin is competitive with other known co-receptors, such as HSPGs and Lrp5/6. Collectively, the authors proposed that the main function of Tspan12 is to capture low concentrations of Norrin at the early stage of signaling, and then "hand over" Norrin to Fzd4 and Lrp5/6 for further signal propagation. Overall, the study is comprehensive and compelling, and the conclusions are well supported by the experimental and modeling data.

      Strengths: 

      • Biochemical reconstitution of Tspan12 and Fzd4 in lipid nanodiscs is an elegant approach for testing the direct binding interaction between Norrin and its co-receptors. The proteins used for the study seem to be of high purity and quality. 

      • The various binding experiments presented throughout the study were carried out rigorously. In particular, BLI allows accurate measurement of equilibrium binding constants as well as on and off rates. 

      • It is nice to see that the authors followed up on their AlphaFold modeling with an extensive series of mutagenesis studies to experimentally validate the potential binding sites. This adds credence to the AlphaFold models. 

      • Table S1 is a further testament to the rigor of the study. 

      • Overall, the study is comprehensive and compelling, and the conclusions are well supported by the experimental and modeling data. 

      Suggestions for improvement: 

      • It would be helpful to show Coomassie-stained gels of the key mutant Norrin and Tspan12 proteins presented in Figures 2E and 2F. 

      We have included Stain-Free SDS-PAGE gels from the purification of the Norrin and Tspan12 mutants in a new Figure S4.

      • Many Norrin and Tspan12 mutations have been identified in human patients with FEVR. It would be interesting to comment on whether any of the mutations might affect the Norrin-Tspan12 binding sites described in this study.

      Thank you for this suggestion. We have inspected human mutation databases gnomAD, ClinVar, and HGMD for known mutations in the predicted Tspan12-Norrin binding interface and their occurrence in human patients with FEVR or Norrie disease.

      While a number of Tspan12 residues that we predict to interact with Norrin are impacted by rare mutations in humans (e.g., L169M, E170V, E173K, D175N, E196G, S199C, as found in the gnomAD database), these alleles are of unknown clinical significance (as found in ClinVar or HGMD databases). It is possible that mutations that slightly weaken the Norrin-Tspan12 interface may not produce a strong phenotype, especially given the avidity we expect from this system. By our examination, the missense variants of clinical significance that have been found in the Tspan12 LEL would be expected to destabilize the protein (i.e., mutations to or from cysteine or proline, or mutations to residues involved in packing interactions within the LEL fold), and therefore these mutations may produce a disease phenotype by impacting Tspan12 protein expression levels.  

      Several Norrin mutations that are associated with Norrie disease, FEVR, or other diseases of the retinal vasculature have been found in the predicted Tspan12 binding site. For example, Norrin mutations at positions L103 (L103Q, L103V), K104 (K104N, K104Q), and A105 (A105T, A105P, A105E, A105S, A105T, A105V) have been found in patients, all of which may disrupt binding to Tspan12. However, the deleterious effect of K104 mutations on Norrin-stimulated signaling could also be explained by a weakened Norrin-Fzd4 binding interface. Norrin mutations at R115 (R115L and R115Q), as well as R121 (R121L, R121G, R121Q, and R121W) have also been found in patients with various diseases of the retinal vasculature. Additionally, the Norrin mutation T119P has been found in patients with Norrie disease, but we would expect this mutation to destabilize Norrin in addition to disrupting the Tspan12 binding site. 

      While we commented briefly on mutations R115L and R121W in the original draft (page 5, paragraphs 4 and 1, respectively), we have updated the manuscript with more comments on disease-associated mutations to the predicted Tspan12 binding site on Norrin (page 5, first partial paragraph; page 9, first partial paragraph). 

      • Some of the negative conclusions (e.g. the lack of involvement of Tspan12 in the formation of the Norrin-Lrp5/6-Fzd4-Dvl signaling complex) can be difficult to interpret. There are many possible reasons as to why certain biological effects are not recapitulated in a reconstitution experiment. For instance, the recombinant proteins used in the experiment may not be presented in the correct configurations, and certain biochemical modifications, such as phosphorylation, may also be missing. 

      We agree that different Tspan12 and Fzd4 stoichiometries, lipid compositions, and posttranslational modifications could impact the results of our study, and that it is important to mention these possibilities. We have added these caveats to the discussion section (page 10, last paragraph).  

      Reviewer #2 (Public Review): 

      This is an interesting study of high quality with important and novel findings. Bruguera et al. report a biochemical and structural analysis of the Tspan12 co-receptor for norrin. Major findings are that Norrin directly binds Tspan12 with high affinity (this is consistent with a report on BioRxiv: Antibody Display of cell surface receptor Tetraspanin12 and SARS-CoV-2 spike protein) and a predicted structure of Tspan12 alone or in complex with Norrin. The Norrin/Tspan12 binding interface is largely verified by mutational analysis. An interaction of the Tspan12 large extracellular loop (LEL) with Fzd4 cannot be detected and interactions of full-length Tspan12 and Fzd4 cannot be tested using nano-disc based BLI, however, Fzd4/Tspan12 heterodimers can be purified and inserted into nanodiscs when aided by split GFP tags. An analysis of a potential composite binding site of a Fzd4/Tspan12 complex is somewhat inconclusive, as no major increase in affinity is detected for the complex compared to the individual components. A caveat to this data is that affinity measurements were performed for complexes with approximately 1 molecule Tspan12 and FZD4 per nanodisc, while the composite binding site could potentially be formed only in higher order complexes, e.g., 2:2 Fzd4/Tspan12 complexes. Interestingly, the authors find that the Norrin/Tspan12 binding site and the Norrin/Lrp6 binding site partially overlap and that the Lrp6 ectodomain competes with Tspan12 for Norrin binding. This result leads the authors to propose a model according to which Tspan12 captures Norrin and then has to "hand it off" to allow for Fzd4/Lrp6 formation. By increasing the local concentration of Norrin, Tspan12 would enhance the formation of the Fzd4/Lrp5 or Fzd4/Lrp6 complex.

      Thank you for pointing out the BioRxiv report showing Norrin-Tspan12 LEL binding. We have cited this in the introduction of our revised manuscript (page 2, paragraph 3).

      The experiments based on membrane proteins inserted into nano-discs and the structure prediction using AlphaFold yield important new insights into a protein complex that has critical roles in normal CNS vascular biology, retinal vascular disease, and is a target for therapeutic intervention. However, it remains unclear how Norrin would be "handed off" from Tspan12 or Tspan12/Fzd4 complexes to Fzd4/Lrp6 complexes, as the relatively high affinity of Norrin to Fzd4/Tspan12 dimers likely does not favor the "handing off" to Fzd4/Lrp6 complexes. 

      While the Fzd4-Tspan12 interaction is strong, our data suggest that Fzd4 and Tspan12 bind Norrin with negative cooperativity, suggesting that Fzd4 binding may enhance Norrin-Tspan12 dissociation to facilitate handoff. This model is based on 1) the dissociation of Norrin from bead-bound Tspan12 in the presence of saturating Fzd4 CRD (Figure 3D), and 2) a weaker measured affinity of Norrin-Tspan12LEL in the presence of saturating Fzd4 CRD (Figure 3F). We have now added wording to emphasize this in the discussion section (page 9, end of first full paragraph).

      However, as you note, the Norrin-Tspan12 affinity that we measured in the presence of Fzd CRD (tens of nM) is still much stronger than the known Norrin-LRP6 affinity (0.5-1µM), which predicts that the efficiency of this handoff may be low. We have now commented on this in the discussion section and mentioned an alternative model in which Tspan12 presents the second Norrin protomer to LRP5/6 for signaling, instead of dissociating (page 9, paragraph 2). However, the handoff efficiency could also be impacted by other factors such as the relative abundance and surface distribution of Tspan12, Fzd4, LRP6 and HSPGs.  
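      To make the affinity argument above concrete, the following sketch applies the textbook equilibrium expression for two binders competing for a single site. It is a deliberate simplification: it ignores Norrin dimerization, avidity, membrane confinement, and receptor abundance, all of which the response notes could matter. The dissociation constants are assumed values chosen from within the ranges quoted above ("tens of nM" and 0.5-1 µM), and the equal 100 nM free concentrations are arbitrary.

```python
def competitive_occupancy(conc_a_nm, kd_a_nm, conc_b_nm, kd_b_nm):
    """Fraction of a single binding site occupied by competitors A and B.

    Standard single-site competition at equilibrium:
        frac_A = ([A]/Kd_A) / (1 + [A]/Kd_A + [B]/Kd_B)
    Concentrations are free concentrations in nM.
    """
    term_a = conc_a_nm / kd_a_nm
    term_b = conc_b_nm / kd_b_nm
    denom = 1.0 + term_a + term_b
    return term_a / denom, term_b / denom


# Assumed values for illustration only (not measurements from the study):
KD_TSPAN12_NM = 30.0   # an example within "tens of nM"
KD_LRP6_NM = 750.0     # midpoint of the quoted 0.5-1 µM range

frac_tspan12, frac_lrp6 = competitive_occupancy(
    100.0, KD_TSPAN12_NM, 100.0, KD_LRP6_NM
)

# At equal free concentrations the occupancy ratio reduces to Kd_LRP6 / Kd_Tspan12.
occupancy_ratio = frac_tspan12 / frac_lrp6
```

      Under these assumptions the occupancy ratio equals the ratio of the two dissociation constants, which is why a handoff driven by simple equilibrium competition would be expected to be inefficient unless local concentrations or other factors shift the balance.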

      Areas that would benefit from further experiments, or a discussion, include: 

      -  The authors test a potential composite binding site of Fzd4/Tspan12 heterodimers for norrin using nanodiscs that contain on average about 1 molecule Fzd4 and 1 molecule Tspan12. The Fzd4/Tspan12 heterodimer is co-inserted into the nanodiscs supported by split-GFP tags on Fzd4 and Tspan12. The authors find no major increase in affinity, although they find changes to the Hill slope, reflecting better binding of norrin at low norrin concentrations. In 293F cells overexpressing Fzd4 and Tspan12 (which may result in a different stoichiometry) they find more pronounced effects of norrin binding to Fzd4/Tspan12. This raises the possibility that the formation of a composite binding requires Fzd4/Tspan12 complexes of higher order, for example, 2:2 Fzd4/Tspan12 complexes, where the composite binding site may involve residues of each Fzd4 and Tspan12 molecule in the complex. This could be tested in nanodiscs in which Fzd4 and Tspan12 are inserted at higher concentrations or using Fzd4 and Tspan12 that contain additional tags for oligomerization. 

      It is quite possible that Tspan12 and Fzd4 cluster into complexes with a stoichiometry greater than 1:1 in cells (this is supported by e.g., BRET experiments in (Ke et al., 2013)), and we mention in the discussion that receptor clustering may be an additional mechanism by which Tspan12 exerts its function (page 10, paragraph 4). We would be quite interested to know the stoichiometry of Fzd4 and Tspan12 complexes in cells at endogenous expression levels, both in the presence and absence of Norrin, and to biochemically characterize these putative larger complexes in the future. We have amended the discussion to mention the caveat that our reconstitution experiments do not test higher-stoichiometry Fzd4/Tspan12 complexes (page 10, last paragraph).

      - While Tspan12 LEL does not bind to Fzd4, the successful reconstitution of GFP from Tspan12 and Fzd4 tagged with split GFP components provides evidence for Fzd4/Tspan12 complex formation. As a negative control, e.g., Fzd5, or Tspan11 with split GFP tags (Fzd5/Tspan12 or Fzd4/Tspan11) would clarify if FZD4/Tspan12 heterodimers are an artefact of the split GFP system. 

      The split-GFP system allows us to co-purify receptors that do not normally co-localize (for example, as we have shown with Fzd4 and LRP6 in the absence of ligand (Bruguera et al., 2022)) so we do not mean to claim that it provides evidence for Fzd4/Tspan12 complex formation. In fact, we were unable to co-purify co-expressed Fzd4 and Tspan12 unless they were tethered with the split GFP system, and separately-purified Fzd4 and Tspan12 did not incorporate into nanodiscs together unless they were tethered by split GFP. Based on these experiments, we expect that the purported Fzd4-Tspan12 interaction that others have found by co-IP or co-localization is easily disrupted by detergent, may require a specific lipid, and/or may not be direct.

      To clarify this point, we have noted in the results section that without the split GFP tags, Tspan12 and Fzd4 did not co-purify or co-reconstitute into nanodiscs, and that co-reconstitution was enabled by the split GFP system (page 6, first full paragraph).   

      - Fzd4/Tspan12 heterodimers stabilized by split GFP may be locked into an unfavorable orientation that does not allow for the formation of a composite binding site of FZD4 and Tspan12; this is another caveat for the interpretation that Fzd4/Tspan12 do not form a composite binding site. This is not discussed.

      While the split GFP does enforce a Fzd4/Tspan12 dimer, the split GFP is removed by protease cleavage during the final step of the purification process, after the dimer is contained in a nanodisc. This should allow Fzd4 and Tspan12 to freely adopt any pose and to diffuse within the confines of the nanodisc lipid bilayer. However, it has been shown that the phospholipid bilayer in small nanodiscs is not as fluid as the physiological plasma membrane, and although we used the slightly larger belt protein (MSP1E3D1, 13 nm diameter nanodiscs), perhaps the receptors are indeed locked in some unfavorable state for this reason. Additionally, the nanodiscs are planar, so if the formation of a composite binding site requires membrane curvature, this would not be recapitulated in our system. We have cited these caveats in the discussion section (page 10, last paragraph).  

      - Mutations that affect the affinity of norrin/fzd4 are not used to further test if Fzd4 and Tspan12 form a composite binding site. Norrin R41E or Fzd4 M105V were previously reported to reduce norrin/frizzled4 interactions and signaling, and both interaction and signaling were restored by Tspan12 (Lai et al. 2017). Whether a Fzd4/Tspan12 heterodimer has increased affinity for Norrin R41E was not tested. Similarly, affinity of FZD4 M105V vs a Fzd4 M105V/Tspan12 heterodimer were not tested. 

      Since the high affinity of Norrin for both Fzd4 and Tspan12 may have obscured any enhancement of Norrin affinity for Fzd4/Tspan12 compared to either receptor alone, we did consider weakening Fzd-Norrin affinity to sensitize this experiment, inspired by the experiments you mention in (Lai et al., 2017). However, we suspected that the slight increase in Norrin affinity for the Fzd4/Tspan12 dimer compared to Fzd4 alone was driven mainly by increased avidity that enhanced binding of low Norrin concentrations, and this avidity effect would likely confound the interpretation of any experiment monitoring 2:2 complex formation. Additionally, on the basis that soluble Fzd4 extracellular domain and Tspan12 bind Norrin with negative cooperativity (Figures 3D and 3F), we concluded that this composite binding site was unlikely.

      - An important conclusion of the study is that Tspan12 or Lrp6 binding to Norrin is mutually exclusive. This could be corroborated by an experiment in which LRP5/6 is inserted into nanodiscs for BLI binding tests with Norrin, or Tspan12 LEL, or a combination of both. Soluble LRP6 may remove norrin from equilibrium binding/unbinding to Tspan12, therefore presenting LRP6 in a non-soluble form may yield different results. 

      We agree that testing this conclusion in an orthogonal experiment would be a valuable addition to this study. We have now performed a similar experiment to the one you described, but with Norrin immobilized on biosensors, and with LRP6 in detergent competing with Tspan12 LEL for Norrin binding (Figure S12, discussed on page 8, first full paragraph). The results of this experiment show that biosensor-immobilized Norrin will bind LRP6, and that soluble Tspan12 inhibits LRP6 binding in a concentration-dependent manner. The LRP6 construct we use (residues 20-1439) includes the transmembrane domain but has a truncated C terminus, since LRP6 constructs containing the full C terminus tend to aggregate during purification. We chose to immobilize Norrin to make the experiment as interpretable as possible, since immobilizing LRP6 and competing Norrin off with the LEL could result in an increase in signal (from the LEL binding the second available Norrin protomer) as well as a decrease (from Norrin being competed off of the immobilized LRP6). We conducted the experiment in detergent (DDM) instead of nanodiscs to be able to test higher concentrations of LRP6.

      - The authors use LRP6 instead of LRP5 for their experiments. Tspan12 is less effective in increasing the Norrin/Fzd4/Lrp6 signaling amplitude compared to Norrin/Fzd4/Lrp5 signaling, and human genetic evidence (FEVR) implicates LRP5, not LRP6, in Norrin/Frizzled4 signaling. The authors find that Norrin binding to LRP6 and Tspan12 is mutually exclusive, however this may not be the case for Lrp5. 

      This is an important point which we have now addressed in the text (page 8, end of first full paragraph). LRP5 is indeed the receptor implicated in FEVR and expressed in the relevant tissues for Tspan12/Norrin signaling. Unfortunately, LRP5 expresses poorly and we are unable to purify sufficient quantities to perform these experiments. However, LRP5 and LRP6 both transduce Tspan12-enhanced Norrin signaling in TOPFLASH assays (as you mention and as shown by (Zhou and Nathans, 2014)), bind Norrin, and are highly similar (they share 71% sequence identity overall and 73% sequence identity in the extracellular domain), so we expect their Norrin-binding sites to be conserved.

      - The biochemical data are largely not correlated with functional data. The authors suggest that the Norrin R115L FEVR mutation could be due to reduced norrin binding to tspan12, but do not test if Tspan12-mediated enhancement of the norrin signaling amplitude is reduced by the R115L mutation. Similarly, the impressive restoration of binding by charge reversal mutations in site 3 is not corroborated in signaling assays. 

      We agree that testing the impact of Norrin mutations in cell-based signaling assays would be an informative way to further test our model. However, the Norrin mutants we tested generated poor TopFlash signals in all conditions tested. This may be due to general protein instability, weakened affinity for LRP, or weaker interactions with HSPGs. Whatever the cause, the low signal made it challenging to conclusively say whether the Norrin mutations affected Tspan12-mediated signaling enhancement.

      When expressed for purification, Tspan12 mutants generally expressed poorly compared to WT Tspan12, so we were concerned that differences in protein stability or trafficking would lead to lower cell-surface levels of mutant Tspan12 relative to WT in TopFlash signaling assays, which would confound interpretation of mutant Tspan12’s ability to enhance Norrin signaling.

      Because of these challenges, follow-up experiments to investigate the signaling capabilities of Norrin and Tspan12 mutants were not informative and we have not included them in the revised manuscript.

      Reviewer #3 (Public Review): 

      Brugeuera et al present an impressive series of biochemical experiments that address the question of how Tspan12 acts to promote signaling by Norrin, a highly divergent TGF-beta family member that serves as a ligand for Fzd4 and Lrp5/6 to promote canonical Wnt signaling during CNS (and especially retinal) vascular development. The present study is distinguished from those of the past 15 years by its quantitative precision and its high-quality analyses of concentration dependencies, its use of well-characterized nano-disc-incorporated membrane proteins and various soluble binding partners, and its use of structure prediction (by AlphaFold) to guide experiments. The authors start by measuring the binding affinity of Norrin to Tspan12 in nanodiscs (~10 nM), and they then model this interaction with AlphaFold and test the predicted interface with various charge and size swap mutations. The test suggests that the prediction is approximately correct, but in one region (site 1) the experimental data do not support the model. [As noted by the authors, a failure of swap mutations to support a docking model is open to various interpretations. As AlphaFold docking predictions come increasingly into common use, the compendium of mutational tests and their interpretations will become an important object of study.] Next, the authors show that Tspan12 and Fzd4 can simultaneously bind Norrin, with modest negative cooperativity, and that together they enhance Norrin capture by cells expressing both Tspan12 and Fzd4 compared to Fzd4 alone, an effect that is most pronounced at low Norrin concentration. Similarly, at low Norrin concentration (~1 nM), signaling is substantially enhanced by Tspan12. By contrast, the authors show that LRP6 competes with Tspan12 for Norrin binding, implying a hand-off of Norrin from a Tspan12+Fzd4+Norrin complex to a LRP5/6+Fzd4+Norrin complex.
Thanks to the authors' careful dose-response analyses, they observed that Norrin-induced signaling and Tspan12 enhancement of signaling both have bell-shaped dose-response curves, with strong inhibition at higher levels of Norrin or Tspan12. The implication is that the signaling system has been built for optimal detection of low concentrations of Norrin (most likely the situation in vivo), and that excess Tspan12 can titrate Norrin at the expense of LRP5/6 binding (i.e., reduction in the formation of the LRP5/6+Fzd4+Norrin signaling complex). In the view of this reviewer, the present work represents a foundational advance in understanding Norrin signaling and the role of Tspan12. It will also serve as an important point of comparison for thinking about signaling complexes in other ligand-receptor systems. 

      Recommendations for the authors: 

      Reviewer #2 (Recommendations For The Authors):   

      - In Figure 5F high concentrations of transfected Tspan12 plasmid inhibit signaling, which the authors interpret to support the model that Tspan12/Norrin binding prevents Norrin/LRP6/FZD4 complex formation. Alternatively, the cells do not tolerate the expression of the tetraspanin at high levels, for example, due to misfolding and aggregate formation. To distinguish these possibilities: Do high levels of Tspan12 overexpression also inhibit signaling induced by Wnt3a and appropriate Frizzled receptors, even though Tspan12 has no influence on Wnt/LRP6 binding? 

      We thank the reviewer for suggesting this important control experiment. We have added the Wnt-stimulated TOPFLASH values to the figure in 5F for all conditions. In repeating this experiment, we noticed that high levels of transfected Tspan12 may decrease cell viability and therefore have adjusted the range of transfected Tspan12 in the new Figure 5F (discussed on page 8, second full paragraph). Under this new protocol, both Norrin- and Wnt-stimulated signaling were inhibited by the highest amount of transfected Tspan12. However, Norrin-stimulated signaling is inhibited by lower amounts of transfected Tspan12 than Wnt-stimulated signaling, and to a greater extent, supporting our proposed model that Tspan12 competes with LRP for Norrin binding.

      - Is Tspan12 with c-terminal rho-tag (the form incorporated into nanodiscs) also used for functional luciferase assays, or was untagged Tspan12 used for the luciferase assays in Fig 4D and 5F? Does the c-terminal tag interfere with Tspan12-mediated enhancement of Norrin signaling? 

      For the luciferase assays included in this manuscript, wildtype, full-length, untagged Tspan12 is used. We have clarified this in our methods section. When we tested the wildtype vs C-terminally rho1D4-tagged version of Tspan12 in TOPFLASH assays, we saw that the enhancement of Norrin signaling by Tspan12-1D4 was weaker than enhancement by untagged Tspan12. This is consistent with the finding reported in Cell Reports (Lai et al., 2017) that a chimeric Tspan12 receptor with its C-terminus replaced with that of Tspan11 was still capable of enhancing Norrin signaling, though to a lesser extent than WT Tspan12. The deficiency of signaling by our rho1D4-tagged Tspan12 could be due to a difference in receptor expression level or trafficking, but in the absence of a reliable antibody against Tspan12, we were unable to assess the expression levels or localization of the untagged Tspan12 to compare it to the rho1D4-tagged version. (For binding experiments, we reasoned that the C-terminal tag should not affect Tspan12’s ability to bind Norrin extracellularly, especially as we found that purified full-length Tspan12 and Tspan12∆C (residues 1-252) bound Norrin equally well; we have added this comparison to table S1.)

      Reviewer #3 (Recommendations For The Authors): 

      Minor comments. 

      Based on the Fzd4-Dvl binding experiment, the authors might state explicitly the possibility that Tspan12's relevance is entirely accounted for by extracellular ligand capture. 

      We have stated this possibility explicitly in the discussion section (page 9, last paragraph). 

      Page 4, 3rd paragraph. I suggest "To experimentally test this structural prediction..." rather than "validate". 

      Thank you for this suggestion; we have replaced this wording. 

      This next item is optional, but I hope that the authors will consider it. This manuscript provides an opportunity for the authors to be more expansive in their thinking, and to put their work into the larger context of ligand+receptor+accessory protein interactions. The authors describe the Wnt7a/7b-Gpr124-RECK system and the role of HSPs in Norrin and Wnt signaling, but perhaps they can also comment on non-Wnt ligand-receptor systems where accessory proteins are found. They might add a figure (or supplemental figure) with a schematic showing the roles of HSP and Gpr124-RECK, and some non-Wnt ligand-receptor systems. This would help to make the present work more widely influential.

      Thank you for this suggestion. We have added a figure (Figure 6, discussed on page 10, paragraphs 2 and 3) and expanded our discussion to include other co-receptor systems. We have specifically focused on co-receptors that both capture ligands and interact with their primary receptor(s), thus delivering ligands to their receptors, as we have proposed for Tspan12. Within Wnt signaling, other co-receptor systems with this mechanism are RECK/Gpr124 (for Wnt7a/b) and Glypican-3. We found it interesting that this mechanism is also shared by several growth factor pathways with cystine knot ligands (like Norrin), so we have illustrated and mentioned three of these examples.

    1. eLife Assessment

      This important study provides insights and strategies for assessing laminar structure in vivo in the visual cortex of the macaque monkey with high-density linear electrode arrays. The paper provides convincing evidence demonstrating that signals in higher frequency bands, related to the discharge of action potentials, are of substantially better use for achieving well-resolved cortical layer identification than are signals in lower frequency bands typically associated with local field potentials and standard-practice Current Source Density (CSD) analyses. These findings are of interest to a wide range of neuroscientists making comparisons between cortical layers or recording with array electrodes.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Zhang et al. present an electrophysiological method to identify the layers of macaque visual cortex using high-density Neuropixels 1.0 electrodes. They found several electrophysiological signal profiles for high-resolution laminar discrimination and described a set of signal metrics for fine cortical layer identification.

      Strengths:

      There are two major strengths. One is the use of high-density electrodes. The Neuropixels 1.0 probe has electrodes at 20 µm spacing, which can provide high resolution for cortical laminar identification. The second strength is the analysis. They found multiple electrophysiological signal profiles that can be used for laminar discrimination. Using this new method, they could identify the thinnest layer in macaque V1. The data support their conclusion.

      Weaknesses:

      While this electrophysiological strategy is much easier to perform, even in awake animals, than histological staining methods, it provides only an indirect estimate of cortical layers. A parallel histological study could provide a direct match between the electrode signal features and cortical laminar locations. However, there are technical challenges; for example, distortions in both electrode penetration and tissue preparation may prevent a precise match between electrode locations and cortical layers. In this case, additional microwire electrodes bound to the Neuropixels probe could be used to inject current and mark the locations of different depths in cortical tissue after recording.

    3. Reviewer #2 (Public review):

      Summary:

      This paper documents a compelling attempt to accurately determine the locations and boundaries of the anatomically and functionally defined layers in macaque primary visual cortex using voltage signals recorded from a high-density electrode array that spans the full depth of cortex with contacts at 20 um spacing. First, the authors attempt to use current source density (CSD) analysis to determine layer locations, but they report a striking failure because the results vary greatly from one electrode penetration to the next and because the spatial resolution of the underlying local field potential (LFP) signal is coarse compared to the electrical contact spacing. The authors therefore turn to examining higher frequency signals related to action potentials and provide evidence that these signals reflect changes in neuronal size and packing density, response latency and visual selectivity, which taken together can advance the state-of-the-art accuracy in making layer assignments from in vivo recordings.

      Strengths:

      There is a lot of nice data to look at in this paper that show interesting quantities as a function of depth in V1. Bringing all of these together offers the reader a rich data set: CSD, action potential shape, response power and coherence spectrum, and post-stimulus time response traces. Furthermore, data are displayed as a function of eye (dominant or non-dominant) and for achromatic and cone-isolating stimuli.

      This paper takes a strong stand in pointing out weaknesses in the ability of CSD analysis to make consistent determinations about cortical layering in V1. Many researchers have found CSD to be problematic, and the observations here may be important to motivate other researchers to carry out rigorous comparisons and publish their results, even if they reflect negatively on the value of CSD analysis.

      The paper provides a thoughtful, practical and comprehensive recipe for assigning traditional cortical layers based on easily-computed metrics from electrophysiological recordings in V1, and this is likely to be useful for electrophysiologists who are now more frequently using high-density electrode arrays.

      Weaknesses:

      Considerable space is taken in pointing out features that are well known, for example, the latency difference associated with different retinogeniculate pathways, the activity level differences associated with input layers, and the action potential shape differences associated with white vs. gray matter. These have been used for decades as indicators of depth and location of recordings in visual cortex as electrodes were carefully advanced. High density electrodes allow this type of data to now be collected in parallel, but at discrete, regular sampling points. Perhaps more emphasis could be placed on developing a rigorous analysis of how variable vs. reproducible are quantitative metrics of these features across penetrations, as a function of distance or functional domain, and from animal to animal, but this paper certainly makes a substantial push in this direction.

      Another important piece of information for assessing the ability to determine layers from spiking activity is to carry out post-mortem histological processing so that the layer determination made in vivo can be compared to anatomical layering. However, histological methods also suffer from distortion and noise, thus it remains to be seen how much can ultimately be gained by integrating histology with the physiological methods explored here.

      Overall

      Overall, this paper makes a compelling argument in favor of using action potentials and stimulus driven responses, instead of CSD measurements, to assign cortical layers to electrode contacts in V1. The rich presentation of data, combined with the authors' highly educated interpretation and speculation about how useful such measurements will be for layer assignment make this an important paper for many labs using high-density electrodes. It is easy to agree with much of what is postulated here and to hope that we will soon have reliable, quantitative methods to make layer assignments that will be meaningful in terms of the differentiated roles of single neurons in cortical computation. How much this will end up corresponding to the canonical layer numbering that has been used for many decades will be interesting to see.

      Comments on revisions:

      I found that the authors addressed my main concerns to the degree they were able. They improved the consistency of language and figures, and they added some useful quantification.

    4. Reviewer #3 (Public review):

      Summary:

      Zhang et al. explored strategies for aligning electrophysiological recordings from high-density laminar electrode arrays (Neuropixels) with the pattern of lamination across cortical depth in macaque primary visual cortex (V1), with the goal of improving the spatial resolution of layer identification based on electrophysiological signals alone. The authors compare the current commonly used standard in the field - current source density (CSD) analysis - with a new set of measures largely derived from action potential (AP) frequency band signals. Individual AP band measures provide distinct cues about different landmarks or potential laminar boundaries, and together they are used to subdivide the spatial extent of array recordings into discrete layers, including the very thin layer 4A, at a level of resolution unavailable when relying on CSD analysis alone for laminar identification. The authors compare the widths of the resulting subdivisions with previously reported anatomical measurements as evidence that layers have been accurately identified. This is a bit circular, given that they also use these anatomical measurements as guidelines limiting the boundary assignments; however, the strategy is overall sensible and the electrophysiological signatures used to identify layers are generally convincing. Furthermore, by varying the pattern of visual stimulation to target chromatically sensitive inputs known to be partially segregated by layer in V1, they show localized response patterns that lend confidence to their identification of particular sublayers.

      The authors compellingly demonstrate the insufficiency of CSD analysis for precisely identifying fine laminar structure, and in some cases its limited accuracy at identifying coarse structure. CSD analysis produced inconsistent results across array penetrations and across visual stimulus conditions and was not improved in spatial resolution by sampling at high density with Neuropixels probes. Instead, in order to generate a typical, informative pattern of current sources and sinks across layers, the LFP signals from the Neuropixels arrays required spatial smoothing or subsampling to approximately match the coarser (50-100 µm) spacing of other laminar arrays. Even with smoothing, the resulting CSDs in some cases predicted laminar boundaries that were inconsistent with boundaries estimated using other measures and/or unlikely given the typical sizes of individual layers in macaque V1. This point alone provides an important insight for others seeking to link their own laminar array recordings to cortical layers.
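      As a rough illustration of the smoothing/subsampling point above, the sketch below estimates a CSD as the negative second spatial difference of the LFP and optionally subsamples channels to a coarser spacing first. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `csd_second_difference`, the `step` parameter, and the omission of tissue conductivity (so the output is in arbitrary units) are all choices made here for illustration.

```python
import numpy as np


def csd_second_difference(lfp, spacing_um, step=1):
    """Estimate CSD as the negative second spatial difference of the LFP.

    lfp        : array of shape (n_channels, n_samples), channels ordered by depth.
    spacing_um : contact spacing in micrometres (assumed uniform).
    step       : keep every `step`-th channel before differentiating
                 (hypothetical parameter; step=5 turns 20 um into 100 um spacing).

    Conductivity is omitted, so the result is in arbitrary units.
    """
    sub = np.asarray(lfp, dtype=float)[::step]
    h = spacing_um * step  # effective spacing after subsampling
    # -(V[z+h] - 2*V[z] + V[z-h]) / h^2 for every interior channel
    return -(sub[2:] - 2.0 * sub[1:-1] + sub[:-2]) / h**2


# Spatially uncorrelated noise is amplified far more at fine spacing,
# because the second difference divides by h^2.
rng = np.random.default_rng(0)
noise = rng.normal(size=(101, 2000))
fine = csd_second_difference(noise, 20.0, step=1)
coarse = csd_second_difference(noise, 20.0, step=5)
print(fine.std() / coarse.std())
```

      Under these assumptions, channel-independent noise contributes much more to the estimate at 20 µm spacing than at 100 µm spacing, which is one plausible reason a densely sampled LFP needs smoothing or subsampling before a CSD becomes interpretable.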

      They next offer a set of measures based on analysis of AP band signals. These measures include analyses of the density, average signal spread, and spike waveforms of units identified through spike sorting, as well as analyses of AP band power spectra and local coherence profiles across recording depth. The power spectrum measures in particular yield compact peaks at particular depths, albeit with some variation across penetrations, whereas the waveform measures most convincingly identified the layer 6-white matter transition. In general, some of the new measures yield inconsistent patterns across penetrations, and some of the authors' explanations of these analyses draw intriguing but rather speculative connections to properties of anatomy and/or responsivity. However, taken as a group, the set of AP band analyses appear sufficient to determine the layer 6-white matter transition with precision and to delineate intermediate transition points likely to correspond to actual layer boundaries, and the strategy serves as a substantial advancement over consideration of CSD signals alone to match electrophysiological recordings with cortical layers.
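      One of the measures mentioned above, a depth profile of AP band power, can be sketched as follows. This is only an illustrative sketch: the function name `band_power_by_channel`, the 300-3000 Hz band limits, and the 30 kHz sampling rate are assumptions made here and are not taken from the manuscript.

```python
import numpy as np


def band_power_by_channel(data, fs, f_lo, f_hi):
    """Mean spectral power between f_lo and f_hi (Hz) for each channel.

    data : array of shape (n_channels, n_samples), channels ordered by depth.
    fs   : sampling rate in Hz.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    power = np.abs(np.fft.rfft(data, axis=1)) ** 2 / n
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    return power[:, in_band].mean(axis=1)


# Synthetic example: three channels of weak noise, with a 1 kHz
# oscillation added to the middle channel only.
fs = 30000.0  # assumed sampling rate
t = np.arange(3000) / fs
rng = np.random.default_rng(1)
data = 0.01 * rng.normal(size=(3, t.size))
data[1] += np.sin(2 * np.pi * 1000.0 * t)
profile = band_power_by_channel(data, fs, 300.0, 3000.0)
print(int(np.argmax(profile)))
```

      Plotting such a profile against contact depth is the kind of display in which a compact peak at a particular depth would stand out.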

      Strengths:

      The authors convincingly demonstrate the potential to resolve putative laminar boundaries using only electrophysiological recordings from Neuropixels arrays. This is particularly useful given that histological information is often unavailable for chronic recordings. They make a clear case that CSD analysis is insufficient to resolve the lamination pattern with the desired precision and offer a thoughtful set of alternative analyses, along with an order in which to consider multiple cues in order to facilitate others' adoption of the strategy. The suggested analyses can be used to reliably identify certain landmarks (the positions of layer 4c and the layer 6-white matter transition), which provide very useful constraints for specifying the remaining laminar boundaries, and consideration of average anatomical patterns makes it unlikely that the remaining laminar boundaries will be far from their true locations. Overall, the widths of the resulting layers bear a sensible resemblance to the expected widths identified by prior anatomical measurements, and at least in some cases there are satisfying signatures of chromatic visual sensitivity and latency differences across layers that are predicted by the known connectivity of the corresponding layers. Thus, the proposed analytical toolkit appears to work well for macaque V1 and has strong potential to generalize to use in other cortical regions, though area-targeted selection of stimuli may be required.

      Weaknesses:

      The waveform measures, in particular the unit density distribution, are likely to be sensitive to the methods and criteria used for spike sorting, which differ among experimenters/groups, and this may limit the usefulness of this particular measure for others in the community.

      More generally, although the sizes of identified layers comport with typical sizes identified anatomically, a more powerful confirmation would be a direct comparison with histologically identified boundaries along each penetration's trajectory. Ultimately, the absence of this type of independent confirmation limits the strength of the claim that veridical laminar boundaries can be precisely identified from electrophysiological signals alone.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Zhang et al. present an electrophysiological method to identify the layers of macaque visual cortex using high-density Neuropixels 1.0 electrodes. They found several electrophysiological signal profiles for high-resolution laminar discrimination and described a set of signal metrics for fine cortical layer identification.

      Strengths:

      There are two major strengths. One is the use of high-density electrodes. The Neuropixels 1.0 probe has electrodes at 20 µm spacing, which can provide high resolution for cortical laminar identification. The second strength is the analysis. They found multiple electrophysiological signal profiles that can be used for laminar discrimination. Using this new method, they could identify the thinnest layer in macaque V1. The data support their conclusion.

      Weaknesses:

      While this electrophysiological strategy is much easier to perform, even in awake animals, than histological staining methods, it provides only an indirect estimate of cortical layers. A parallel histological study could provide a direct match between the electrode signal features and cortical laminar locations. However, there are technical challenges; for example, distortions in both electrode penetration and tissue preparation may prevent a precise match between electrode locations and cortical layers. In this case, additional microwire electrodes bound to the Neuropixels probe could be used to inject current and mark the locations of different depths in cortical tissue after recording.

      While we agree that it would be helpful to adopt a more direct method for linking laminar changes observed with electrophysiology to anatomical layers observed in postmortem histology, we do not believe that the approach suggested by the reviewer would be particularly helpful. The approach suggested involves making lesions, which are known to be quite variable in size, asymmetric in shape, and do not have a predictable geometry relative to the location of the electrode tip. In contrast, our electrophysiology measures have identified clear boundaries which precisely match the known widths and relative positions of all the layers of V1, including layer 4A, which is only 50 microns thick, much smaller than the resolution of lesion methods.

      Reviewer #2 (Public Review):

      Summary:

      This paper documents an attempt to accurately determine the locations and boundaries of the anatomically and functionally defined layers in macaque primary visual cortex using voltage signals recorded from a high-density electrode array that spans the full depth of cortex with contacts at 20 um spacing. First, the authors attempt to use current source density (CSD) analysis to determine layer locations, but they report a striking failure because the results vary greatly from one electrode penetration to the next and because the spatial resolution of the underlying local field potential (LFP) signal is coarse compared to the electrical contact spacing. The authors thus turn to examining higher frequency signals related to action potentials and provide evidence that these signals reflect changes in neuronal size and packing density, response latency and visual selectivity.

      Strengths:

      There is a lot of nice data to look at in this paper that shows interesting quantities as a function of depth in V1. Bringing all of these together offers the reader a rich data set: CSD, action potential shape, response power and coherence spectrum, and post-stimulus time response traces. Furthermore, data are displayed as a function of eye (dominant or non-dominant) and for achromatic and cone-isolating stimuli.

      This paper takes a strong stand in pointing out weaknesses in the ability of CSD analysis to make consistent determinations about cortical layering in V1. Many researchers have found CSD to be problematic, and the observations here may be important to motivate other researchers to carry out rigorous comparisons and publish their results, even if they reflect negatively on the value of CSD analysis.

      The paper provides a thoughtful, practical and comprehensive recipe for assigning traditional cortical layers based on easily-computed metrics from electrophysiological recordings in V1, and this is likely to be useful for electrophysiologists who are now more frequently using high-density electrode arrays.

      Weaknesses:

      Much effort is spent pointing out features that are well known, for example, the latency difference associated with different retinogeniculate pathways, the activity level differences associated with input layers, and the action potential shape differences associated with white vs. gray matter. These have been used for decades as indicators of depth and location of recordings in visual cortex as electrodes were carefully advanced. High density electrodes allow this type of data to now be collected in parallel, but at discrete, regular sampling points. Rather than showing examples of what is already accepted, the emphasis should be placed on developing a rigorous analysis of how variable vs. reproducible are quantitative metrics of these features across penetrations, as a function of distance or functional domain, and from animal to animal. Ultimately, a more quantitative approach to the question of consistency is needed to assess the value of the methods proposed here.

      We thank the reviewer for suggesting the addition of quantitative metrics to allow more substantive comparisons between various measures within and between penetrations. We have added quantification and describe this in the context of more specific comments made by this reviewer. We have retained descriptions of metrics that are well established because they provide an important validation of our approaches and laminar assignments.

      Another important piece of information for assessing the ability to determine layers from spiking activity is to carry out post-mortem histological processing so that the layer determination made in this paper could be compared to anatomical layering.

      We are not aware of any approach that would provide such information at sufficient resolution. For example, it is well known that electrolytic lesions often do not match to the locations expected from electrophysiological changes observed with single electrodes. As noted above, our observation that the laminar changes in electrophysiology precisely match the known widths and relative positions of all the layers of V1, including layer 4A, provides confidence in our laminar assignments.

      On line 162, the text states that there is a clear lack of consistency across penetrations, but why should there be consistency: how far apart in the cortex were the penetrations? How long were the electrodes allowed to settle before recording, how much damage was done to tissue during insertion? Do you have data taken over time - how consistent is the pattern across several hours, and how long was the time between the collection of the penetrations shown here?

      Answers to most of these questions can be found within the manuscript text. We have added text describing distance between electrode penetrations (at least 1 mm, typically far more) and added a figure which shows a map of the penetration locations. The Methods section describes electrode penetration methods to minimize damage and settling times of penetrations. Data are provided regarding changes in recordings over time (see Methods, Drift Correction). The stimuli used to generate the data described are presented within a total of 30 minutes or less, minimizing any changes that might occur due to electrode drift. There is a minimum of 3 hours between different penetrations from the same animal.

      The impact of the paper is lessened because it emphasizes consistency but not in a consistent manner. Some demonstrations of consistency are shown for CSDs, but not quantified. Figure 4A is used to make a point about consistency in cell density, but across animals, whereas the previous text was pointing out inconsistency across penetrations. What if you took a 40 or 60 um column of tissue and computed cell density, then you would be comparing consistency across potentially similar scales. Overall, it is not clear how all of these different metrics compare quantitatively to each other in terms of consistency.

      As noted above, we have now added quantitative comparisons of consistency between different metrics. It is unclear why the reviewer felt that we use Figure 4A to describe consistency. That figure was a photograph from a previous publication simply showing the known differences in neuron density that are used to define layers in anatomical studies. This was intended to introduce the reader to known laminar differences. At any rate, we have been unable to contact the previous publishers of that work to obtain permission to use the figure. So we have removed that figure as it is unnecessary to illustrate the known differences in cell density that are used to define layers. We have kept the citation so that interested readers can refer to the publication.

      In many places, the text makes assertions that A is a consistent indicator of B, but then there appear to be clear counterexamples in the data shown in the figures. There is some sense that the reasoning is relying too much on examples, and not enough on statistical quantities.

      Without reference to specific examples we are not able to address this point.

      Overall

      Overall, this paper makes a solid argument in favor of using action potentials and stimulus driven responses, instead of CSD measurements, to assign cortical layers to electrode contacts in V1. It is nice to look at the data in this paper and to read the authors' highly educated interpretation and speculation about how useful such measurements will be in general to make layer assignments. It is easy to agree with much of what they say, and to hope that in the future there will be reliable, quantitative methods to make meaningful segmentations of neurons in terms of their differentiated roles in cortical computation. How much this will end up corresponding to the canonical layer numbering that has been used for many decades now remains unclear.

      Reviewer #3 (Public Review):

      Summary:

      Zhang et al. explored strategies for aligning electrophysiological recordings from high-density laminar electrode arrays (Neuropixels) with the pattern of lamination across cortical depth in macaque primary visual cortex (V1), with the goal of improving the spatial resolution of layer identification based on electrophysiological signals alone. The authors compare the current commonly used standard in the field - current source density (CSD) analysis - with a new set of measures largely derived from action potential (AP) frequency band signals. Individual AP band measures provide distinct cues about different landmarks or potential laminar boundaries, and together they are used to subdivide the spatial extent of array recordings into discrete layers, including the very thin layer 4A, a level of resolution unavailable when relying on CSD analysis alone for laminar identification. The authors compare the widths of the resulting subdivisions with previously reported anatomical measurements as evidence that layers have been accurately identified. This is a bit circular, given that they also use these anatomical measurements as guidelines limiting the boundary assignments; however, the strategy is overall sensible and the electrophysiological signatures used to identify layers are generally convincing. Furthermore, by varying the pattern of visual stimulation to target chromatically sensitive inputs known to be partially segregated by layer in V1, they show localized response patterns that lend confidence to their identification of particular sublayers.

      The authors compellingly demonstrate the insufficiency of CSD analysis for precisely identifying fine laminar structure, and in some cases its limited accuracy at identifying coarse structure. CSD analysis produced inconsistent results across array penetrations and across visual stimulus conditions and was not improved in spatial resolution by sampling at high density with Neuropixels probes. Instead, in order to generate a typical, informative pattern of current sources and sinks across layers, the LFP signals from the Neuropixels arrays required spatial smoothing or subsampling to approximately match the coarser (50-100 µm) spacing of other laminar arrays. Even with smoothing, the resulting CSDs in some cases predicted laminar boundaries that were inconsistent with boundaries estimated using other measures and/or unlikely given the typical sizes of individual layers in macaque V1. This point alone provides an important insight for others seeking to link their own laminar array recordings to cortical layers.
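
For readers less familiar with the computation, the standard one-dimensional CSD estimate is the negative second spatial difference of the LFP across depth, scaled by the squared contact spacing. The sketch below is only an illustration under stated assumptions: unit conductivity, evenly spaced contacts, and a made-up depth profile with small alternating contact-to-contact jitter. The function names are hypothetical and this is not the authors' pipeline. It shows one plausible reason why dense sampling does not automatically sharpen a CSD: dividing by a small squared spacing amplifies contact-level noise.

```python
def csd_second_difference(lfp, spacing_um, conductivity=1.0):
    """Estimate CSD at interior contacts as -sigma * d2(phi)/dz2 (second difference)."""
    h2 = spacing_um ** 2
    return [
        -conductivity * (lfp[i - 1] - 2.0 * lfp[i] + lfp[i + 1]) / h2
        for i in range(1, len(lfp) - 1)
    ]


def subsample(lfp, step):
    """Keep every `step`-th contact, e.g. 20 um -> 100 um spacing with step=5."""
    return lfp[::step]


# Made-up depth profile: a smooth quadratic plus small alternating jitter.
depths_um = [20 * i for i in range(41)]                  # 0..800 um at 20 um spacing
smooth = [-((z - 400.0) / 400.0) ** 2 for z in depths_um]
jitter = [0.01 * (-1) ** i for i in range(41)]           # hypothetical contact noise
lfp = [s + n for s, n in zip(smooth, jitter)]

csd_dense = csd_second_difference(lfp, spacing_um=20)
csd_coarse = csd_second_difference(subsample(lfp, 5), spacing_um=100)
```

In this toy case the dense estimate flips sign from contact to contact, whereas the estimate at 100 µm spacing keeps the sign of the underlying smooth profile.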

      They next offer a set of measures based on analysis of AP band signals. These measures include analyses of the density, average signal spread, and spike waveforms of single- and multi-units identified through spike sorting, as well as analyses of AP band power spectra and local coherence profiles across recording depth. The power spectrum measures in particular yield compact peaks at particular depths, albeit with some variation across penetrations, whereas the waveform measures most convincingly identified the layer 6-white matter transition. In general, some of the new measures yield inconsistent patterns across penetrations, and some of the authors' explanations of these analyses draw intriguing but rather speculative connections to properties of anatomy and/or responsivity. However, taken as a group, the set of AP band analyses appear sufficient to determine the layer 6-white matter transition with precision and to delineate intermediate transition points likely to correspond to actual layer boundaries.

      Strengths:

      The authors convincingly demonstrate the potential to resolve putative laminar boundaries using only electrophysiological recordings from Neuropixels arrays. This is particularly useful given that histological information is often unavailable for chronic recordings. They make a clear case that CSD analysis is insufficient to resolve the lamination pattern with the desired precision and offer a thoughtful set of alternative analyses, along with an order in which to consider multiple cues in order to facilitate others' adoption of the strategy. The widths of the resulting layers bear a sensible resemblance to the expected widths identified by prior anatomical measurements, and at least in some cases there are satisfying signatures of chromatic visual sensitivity and latency differences across layers that are predicted by the known connectivity of the corresponding layers. Thus, the proposed analytical toolkit appears to work well for macaque V1 and has strong potential to generalize to use in other cortical regions, though area-targeted selection of stimuli may be required.

      Weaknesses:

      The waveform measures, and in particular the unit density distribution, are likely to be sensitive to the criteria used for spike sorting, which differ widely among experimenters/groups, and this may limit the usefulness of this particular measure for others in the community. The analysis of detected unit density yields fluctuations across cortical depth which the authors attribute to variations in neural density across layers; however, these patterns seemed particularly variable across penetrations and did not consistently yield peaks at depths that should have high neuronal density, such as layer 2. Therefore, this measure has limited interpretability.

      While we agree that our electrophysiological measure of unit density does not strictly reflect anatomical neuronal density, we would like to remind the reader that we use this measure only to roughly estimate the correspondence between changes in density and likely layer assignments. We rely on other measures (e.g. AP power, AP power changes in response to visual stimuli) that have sharp borders and more clear transitions to assign laminar boundaries. Further, as noted in the reviewer’s list of strengths, the laminar assignments made with these measures are cross validated by differences in response latencies and sensitivity to different types of stimuli that are observed at different electrode depths.

      More generally, although the sizes of identified layers comport with typical sizes identified anatomically, a more powerful confirmation would be a direct per-penetration comparison with histologically identified boundaries. Ultimately, the absence of this type of independent confirmation limits the strength of their claim that veridical laminar boundaries can be identified from electrophysiological signals alone.

      As we have noted in response to similar comments from other reviewers, we are not aware of a method that would make this possible with sufficient resolution.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The reviewers have indicated that their assessment would potentially be stronger if their advice for quantitative, statistically validated comparisons was followed, for example, to demonstrate variability or consistency of certain measures that are currently only asserted. Also, if available, some histological confirmation would be beneficial. It was requested that the use and modification of the layering from Balaram & Kaas is addressed, as well as dealing with inconsistencies in the scale bars on those figures. There are two figure permission issues that need to be resolved prior to publication: Balaram & Kaas 2014 in Fig 1A, Kelly & Hawken 2017 in Fig. 4A.

      Please see detailed responses to reviewer comments below. We have added new supplemental figures to quantitatively compare variability among metrics. As noted above, the suggested addition of data linking the electrophysiology directly to anatomical observations of laminar borders from the same electrode penetration is not feasible. The figure reused in Figure 1A is from an open-access (CC BY) publication (Balaram & Kaas 2014). After reexamining the figure in the original study, we found that the inferred scale bar would give an obviously inaccurate result, so we decided to remove the scale bar from Figure 1A. We haven’t received any reply from Springer Nature regarding permission for Figure 4A, so we decided to remove the reused figure (Kelly & Hawken 2017) from our article.

      Reviewer #1 (Recommendations For The Authors):

      Figure 4A has a different scale to Figure 4B-4F. It is better to add dashed lines to indicate the relationship between the cortical layers or overall range from Figure 4A to the corresponding layers in 4B to 4F.

      The reused figure in Figure 4A has been removed due to a permission issue. See also comments above.

      Reviewer #2 (Recommendations For The Authors):

      General comments

      This paper demonstrates that voltage signals in frequency bands higher than those used for LFP/CSD analysis can be used from high-density electrical contact recording to generate a map of cortical layering in macaque V1 at a higher spatial resolution than previously attained.

      My main concern is that much of this paper seems to show that properties of voltage signals recorded by electrodes change with depth in V1. This of course is well known and has been mapped by many who have advanced a single electrode micron-by-micron through the cortex, listening and recording as they go. Figure 4 shows that spike shapes can give a clear indication of GM to WM borders, and this is certainly true and well known. Figures 5 and 6 show that activity level on electrodes can indicate layers related to LGN input, and this is known. Figure 7 shows that latencies vary with layer, and this is certainly true as we know. A main point seems to be that CSD is highly inconsistent. This is important to know if CSD is simply never going to be a good measure for layering in V1, but it would require quantification and statistics to make a fair comparison.

      We are glad to see that the reviewer understands that changes in electrical signals across layers are well known and are expected to have particular traits that change across layers. We do not claim that we have discovered anything that is unexpected or unknown. Instead, we introduce quantitative measures that are sensitive to these known differences (historically, often just heard with an audio monitor, e.g. “LGN axon hash”). While the primary aim of this paper is not to show that Neuropixels probes can record voltage signal properties that could not previously be recorded with a single electrode, we would like to point out that multi-electrode arrays have a very different sampling bias and also allow comparisons of simultaneous recordings across contacts with known fixed distances between them. For example, our measure of “unit spread” could not be estimated with a single electrode.

      We’ve added Figure S3 to show a quantitative comparison of variation between CSD and AP metrics. These figures add support to our prior, more anecdotal descriptions showing that CSDs are inconsistent and lack the resolution needed to identify thin layers.

      Some things are not explained very clearly. Like achromatic regions, and eye dominance - these are not quantified, and we don't know if they are mutually consistent - are achromatic/chromatic the same when tested through separate eyes? How consistent are these basic definitions? How definitive are they?

      The quantitative definitions of achromatic region/COFD and eye dominance column can be found in our previous paper (Li et al., 2022) cited in this article. The main aim of this study is to develop a strategy for accurately identifying layers; the more detailed functional analysis will be described in future publications.

      Specific comments

      The abstract refers to CSD analysis and CSD signals. Can you be more precise - do you aim to say that LFP signals in certain frequency bands are already known to lack spatial localization, or are you claiming to be showing that LFP signals lack spatial resolution? A major point of the results appears to be lack of consistency of CSD, but I do not see that in the Abstract. The first sentence in the abstract appears to be questionable based on the results shown here for V1.

      We have updated the Abstract to minimize confusion and misunderstanding.

      Scale bar on Fig 1A implies that layers 2-5 are nearly 3 mm thick. Can you explain this thickness? Other figures here suggest layers 1-6 is less than 2 mm thick. Note, in a paper by the same authors (Balaram et al) the scale bar (100 um, Figure 4) on similar macaque tissue suggests that the cortex is much thinner than this. Perhaps neither is correct, but you should attempt to determine an approximately accurate scale. The text defines granular as Layer 4, but the scale bar in A implies layer 4 is 1 mm thick, but this does not match the ~0.5 mm thickness consistent with Figure 1E, F. The text states that L4A is less than 100 um thick, but the markings and scale bar in Figure 1A suggests that it could be more than 100 um thick.

      We thank the reviewer for pointing out that there are clearly errors in the scale bars used in these previously published figures from another group. In the original Figure 1 (Balaram & Kaas 2014), histological slices were all scaled to one of the samples (Chimpanzee) without a scale bar. After reexamining the scale bar we derived based on Figure 2 of the original study, we found the same problem. Since relative widths of layers are more important than absolute widths in our study, we decided to remove the scale bar that we had derived and added to Figure 1A.

      Line 157. Fix "The most commonly visual stimulus"

      Text has been changed

      Line 161. Fix "through dominate eye"

      Text has been changed

      Line 166. Please specify if the methods established and validated below are histological, or tell something about their nature here.

      The Abstract and Introduction already described the nature of our methods

      Line 184. Text is mixing 'dominant' and 'dominate', the former is better.

      Text has been changed accordingly

      Line 188. Can you clarify "beyond the time before a new stimulus transition". Are you generally referring to the fact that neuronal responses outlast the time between changes in the stimulus?

      That is correct. We are referring to the fact that neuronal responses outlast the time between changes in the stimulus. We have edited the text for clarity.

      Line 196. Fix "dominate eye" in two places.

      Text has been changed

      Line 196. The text seems to imply it is striking to find different response patterns for the two eyes, but given the OD columns, why should this be surprising?

      Since we did not find a systematic comparison of CSD depth profiles for dominant/non-dominant eyes, or for black/white stimuli, in past studies, we simply describe what we saw in our data. The rationale for testing each eye is that LGN projections from the two eyes are known to remain separated in the direct input layers of V1, so comparing CSDs from the two eyes could potentially help identify input layers, such as L4C. Here we provide evidence showing that CSD profiles from the two eyes deviate from naive expectations. For example, CSDs from the black stimulus show less variation between the two eyes, whereas CSDs from the white stimulus range from similar profiles to drastically different ones across eyes.

      Line 198. Text like, "The most consistent..." is stating overall conclusions drawn by the authors before pointing the reader specifically to the evidence or the quantification that supports the statement.

      We’ve adjusted the text to point to Figure S2, where depth profiles of all penetrations are visualized, and to a newly added Figure S3, where the coefficients of variation for several metric profiles are shown.
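
As a rough sketch of the kind of comparison a coefficient of variation supports, the snippet below computes CV (standard deviation divided by the absolute mean) across penetrations at each depth of a metric profile. It assumes all profiles have already been resampled onto a common normalized depth axis. The function name and all numbers are made up for illustration, and the exact procedure behind Figure S3 may differ. Note that CV is poorly behaved when the mean is near zero, as can happen for signed quantities.

```python
from statistics import mean, pstdev


def cv_across_penetrations(profiles):
    """profiles: one depth profile (list of floats) per penetration, equal lengths.

    Returns the coefficient of variation, std / |mean|, at each depth.
    """
    cvs = []
    for values_at_depth in zip(*profiles):
        m = mean(values_at_depth)
        cvs.append(pstdev(values_at_depth) / abs(m) if m != 0 else float("nan"))
    return cvs


# Hypothetical profiles for three penetrations sampled at four depths.
consistent_metric = [[1.0, 2.0, 4.0, 2.0], [1.1, 2.1, 3.9, 2.0], [0.9, 1.9, 4.1, 2.0]]
variable_metric = [[1.0, 2.0, 4.0, 2.0], [3.0, 0.5, 1.0, 4.0], [0.5, 4.0, 2.0, 1.0]]

cv_consistent = cv_across_penetrations(consistent_metric)
cv_variable = cv_across_penetrations(variable_metric)
```

A metric whose depth profile is reproducible across penetrations yields uniformly low CV values, while a metric whose peaks move between penetrations yields high ones.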

      Line 200. "white stimulus is more variable" - the text does not tell us where/how this is supported with quantitative analysis/statistics.

      We’ve adjusted the text to point to Figures S2 and S3.

      The metric in 4B is not explained, the text mentions the plot but the reader is unable to make any judgement without knowledge of the method, nor any estimate of error bars.

      The figure is first mentioned in the Unit Density section, and the text in that section already describes the definitions of neuron density and unit density. We’ve also modified the text to point to the Methods section for details.

      Line 236. The text states the peak corresponds to L4C, but does not explain how the layer lines were determined.

      As described earlier in the CSD section, all layer boundaries are determined following the guide, which lays out the strategy for drawing borders by combining all metrics.

      At Line 296 the spike metrics section ends without providing a clear quantification of how useful the metrics will be. It is clear that the GM to WM boundary can be identified, but that can be found with single electrodes as well, as neurophysiologists get to see/hear the change in waveform as the electrode is advanced in even finer spatial increments than the 20 um spacing of the contacts here.

      The aim of this study is to develop an approach for accurately delineating layers simultaneously. The metrics we explored are estimates of well-known properties, so they can provide support for the correctness we hope to achieve. Here we first demonstrate their usefulness and later show the average across penetrations (Figure 9C-F). We are less concerned with quantifying how different factors affect the precision and consistency of these metrics, or how useful a single metric is, than with whether, as described in the guide section, we can delineate all layers given all metrics.

      Line 302-306. Why this statement is made here is unclear, it interrupts the flow for a reason that perhaps will be explained later.

      This statement notes the insensitivity of this measure to temporal differences, introducing the value of incorporating a measure of how AP power changes over time in the next section of the manuscript.

      Line 311. What is the reason to speculate about no canceling because of temporal overlap? Are you assuming a very sparse multi unit firing rate such that collisions do not happen?

      Here we describe a simple theoretical model in which spike waveforms only add without cancelling, so that the power would be proportional to the number of spikes. In reality, spike waveforms sometimes cancel, causing the theoretical relationship to deteriorate to some degree.

      Lines 327-346. There is a considerable amount of speculation and arguing based on particular examples and there is a lack of quantification. Neuron density is mentioned, but not firing rate. Would responses from fewer neurons with higher firing rate not be similar to more neurons with lower firing rates?

      According to the theoretical model we described, power is proportional to the number of spikes, which in turn depends on both neuron density and firing rate. So fewer neurons with a higher firing rate would generate similar power to more neurons with a lower firing rate. We’ve expanded the explanation of the model and added Figure S4 showing the depth profile of firing rate. The text has also been adjusted to point to Figures S2 and S3 for quantitative comparisons of variability.
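
The additive model described here can be illustrated with a toy calculation. The sketch below assumes identical spike waveforms that never overlap, which is the idealization in which nothing cancels. The waveform shape, sampling rate, and function name are hypothetical choices for illustration only. Under these assumptions the mean power on a channel depends only on the total spike count, that is, on the product of neuron number and firing rate.

```python
def signal_power(n_neurons, rate_hz, duration_s=1.0, fs=30000,
                 waveform=(0.0, -1.0, 0.5, 0.0)):
    """Mean squared amplitude of a trace built from non-overlapping, identical spikes."""
    n_samples = int(duration_s * fs)
    n_spikes = int(n_neurons * rate_hz * duration_s)
    if n_spikes == 0:
        return 0.0
    stride = n_samples // n_spikes          # spread spikes evenly so none overlap
    if stride < len(waveform):
        raise ValueError("too many spikes to place without overlap")
    trace = [0.0] * n_samples
    for k in range(n_spikes):
        start = k * stride
        for j, w in enumerate(waveform):
            trace[start + j] += w           # purely additive, no cancellation
    return sum(v * v for v in trace) / n_samples


p_few_fast = signal_power(n_neurons=10, rate_hz=20)    # fewer neurons, higher rate
p_many_slow = signal_power(n_neurons=20, rate_hz=10)   # more neurons, lower rate
p_double = signal_power(n_neurons=20, rate_hz=20)      # twice the spike count
```

Here `p_few_fast` and `p_many_slow` match, and `p_double` is twice as large. Once spikes overlap in time, partial cancellation breaks this proportionality to some degree.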

      Line 348 states there is a precise link between properties and cortical layers, but the manuscript has not, up to this point, shown how that link was determined or quantified it.

      Through our generative model of power and the similarity between the depth profile of firing rate and the depth profile of neuron density (Figure S4), the depth profile of power can be used to approximate the depth profile of neuron density, which is known to be closely correlated with cortical layering.

      Line 350. What is meant by "stochastic variability"?

      The text essentially says that the distances from an electrode contact to nearby cell bodies are random, so closer cells have higher spike amplitudes and in turn result in higher power on a channel.

      The figures showing the two metrics, Pf and Cf, should be shown for the same data sets. The markings indicate that Fig 5 and Fig 6 show results from non-overlapping data sets. This does not build confidence about the results in the paper.

      Here we use typical profiles to demonstrate the characteristics of the power spectrum/coherence spectrum because of the variation across penetrations. We show later, in the guide section, all metrics for one penetration (another two cases in supplemental figures) and how to combine all metrics to derive layer delineations.

      Line 375 the statement is somewhat vague, "there are nevertheless sometimes cases where they can resolve uncertainties," can you please provide some quantitative support?

      We provided 3 examples in Figure 6, and more examples are shown in Figure 8, Figure S5, S6.

      Line 379. I believe the change you want to describe here is a change associated with a transition in the visual stimulus. It would be good to clarify this in the first several sentences here. Baseline can mean different things. I got the impression that your stimuli flip between states at a rate fast enough that signals do not really have time to return to a baseline.

      We rephrased the sentence to describe the metric more precisely. A pair of uniform colors flipping in 1.5 second intervals is usually long enough for spiking activities to decay to a saturated level.

      This section (379 - 398) continues a qualitative show-and-tell feel. There appears to be a lot of variability across the examples in Figure 7. How could you try to quantify this variability versus the variability in LFP? And, in this section overall, the text and figure legend don't really describe what the baseline is.

      Text adjustments are made to briefly describe the baseline window and point to the Method section where definitions are described in detail. We’ve added Figure S3 together with Figure S2 to address the variability across penetrations, stimuli, and metrics.

      Line 405 - 415. The discussion here does not consider that layers may not have well defined boundaries, the text gives the impression that there is some ultimate ground truth to which the metrics are being compared, but that may not be accurate.

      Except for a few layers/sublayers, such as L2, L3A, L3B, most layer boundaries of neocortex are well defined (Figure 1A) and histological staining of neurons/density and correlated changes in chemical content show very sharp transitions. The best of these staining methods is cytochrome oxidase, which shows sharp borders at the top and bottom of layer 4A, top and bottom of layer 4C, and the layer 5/6 border. There is also a sharp transition in neuronal cell body size and density at the top and bottom of layer 4Cb. The definition and delineation of all possible layers are constantly being refined, especially by accumulated knowledge of genetic markers of different cell types and connection patterns. In our study, we develop metrics to estimate well known anatomical and functional properties of different layers. We have also discussed layer boundaries that have been ambiguous to date and explained the reason and criteria to resolve them.

      Line 423. The text references Figure 1A in stating that relative thickness and position is crucial, but Figure 1A does not provide that information and does not explain how it might be determined, or how much of a consensus there is. Also, the text does not consider that the electrode may go through the cortex at oblique angles, and not the same angle in each layer, and the relative thickness may not be a dependable reference.

      There are numerous studies that describe criteria to delineate cortical layers; the referenced article (Balaram & Kaas 2014) is used here as an example. We are not aware of any publication that has systematically compared the relative thickness of layers across the V1 surface of a given animal or across animals. Nevertheless, it is clear from the literature that there is considerable similarity across animals. Accordingly, we cannot know what the source of variability in overall cortical thickness in our samples is, but we do see considerable consistency in the relative thickness of the layers we infer from our measures. We illustrate the differences that we see across penetrations and consider likely causes, such as the extent to which the coverslip pressing down on the cortex might differentially compress the cortex at different locations within the chamber.

      Deviation of the probe angle from the surface normal will not change the relative thickness of layers, and the rigid linear probe is unlikely to bend in the cortex.

      Line 433. The term "Coherence" is used, clarify if this is your Cf from Figure 6. The text states, "marked decrease at the bottom of layer 6". Please clarify this, I do not see that in Figure 6.
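
The geometric point can be checked with a short calculation. The sketch assumes flat, parallel layers and a straight probe, so that a layer of thickness t is traversed over a path of length t / cos(θ) when the probe is tilted by θ from the surface normal. The layer thicknesses and function names are made up for illustration. Local curvature of the cortex, which would make the angle differ between layers, is not modeled.

```python
import math


def path_lengths(thicknesses_um, angle_deg):
    """Length of probe track inside each layer for a tilt of angle_deg from normal."""
    c = math.cos(math.radians(angle_deg))
    return [t / c for t in thicknesses_um]


def relative(lengths):
    """Fraction of the total track occupied by each layer."""
    total = sum(lengths)
    return [x / total for x in lengths]


layers_um = [100.0, 400.0, 500.0, 300.0]   # hypothetical layer thicknesses
normal = relative(path_lengths(layers_um, 0.0))
oblique = relative(path_lengths(layers_um, 30.0))
```

Every layer is stretched by the same factor, so `normal` and `oblique` agree. Only the apparent total cortical thickness changes with the approach angle.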

      Text has been adjusted.

      In Figure 6, the locations of the lines between L1 and 2 do not seem to be consistent with respect to the subtle changes in light blue shading, across all three examples, yet the text on line 436 states that there is a clear transition.

      We feel that the language used accurately reflects what is shown in the figure. While the transition is not sharp, it is clear that there is a transition. This transition is not used to define this laminar border. We have edited the text to clarify that the L1/2 border is better defined based on the change in AP power which shows a sharp transition (Figure 7). 

      The text states that the boundary is also "always clear" from metrics... and cites Figure 5, but I do not see that this boundary is clear for all three examples in Figure 5.

      Text has been adjusted.

      Line 438. The text states that "it is not unusual for unit density to fall to zero below the L1/2 border (Figure 8E)", but surprisingly, the line in Figure 8 E does not even cover the indicated boundary between L1 and L2.

      At this point, the number of statements in the text that do not clearly and precisely correlate to the data in the figures is worrisome, and I think you could lose the confidence of readers at this point.

      We do not see any inconsistency between what is stated in our text and what is noted by the reviewer. The termination of the blue line corresponds to the location where no units are detected. This is the location where “unit density falls to zero”. In this example, no units were resolved through spike sorting until ~100 µm beneath the L1/L2 boundary, which is exactly zero unit density (Figure 8E). That there are electrical signals in this region is clear from the AP power change (Figure 8C), which also shows the location of the L1/L2 border.

      Line 448. Text states that the 6A/B border is defined by a sharp boundary in AP power, but Figure 8A "AP power spectrum" does not show a sharp change at the A/B line. There is a peak in this metric in the middle to upper middle of 6A, but nothing so sharp to define a boundary between distinct layers, at least for penetration A2.

      Text has been adjusted.

      In Figure 8, the layer labels are not clear, whereas they are reasonably clear in the other figures.

      This is a technical problem: the vector graphics were not properly converted during PDF generation. We will upload high-quality vector graphics for each figure when we finalize the version of record.

      The text emphasizes differences in L4B and L4C with respect to average power and coherence, but the transition seems a bit gradual from layer 3B to 4C in some examples in Figure 6. And in Figure 5, A3, there doesn't appear to be any particular transition along the line between 4B and 4C.

      In this guide section, we pointed out early that some metrics are good for some boundaries and that variation exists between penetrations. We’ve expanded the text to emphasize the importance of timing differences in ΔP/P for differentiating sublayers in L4. Lastly, where several boundaries remain unresolvable given all the metrics, prior knowledge of relative thickness should be used.

      Line 466 provides prescriptions in absolute linear distances, but this is unwise given that cortex may be crossed at oblique angles by electrodes, particularly for parts of V1 that are not on the surface of the brain. Other parts of the text have emphasized relative measurements.

      Text has been changed to use relative measurements.

      Line 507. The text says 9C and 4A are a good match, but the match does not look that good (4A has substantial dips at 0.5 and 0.75, and substantial peaks), and there is no quantification of fit. The error bars on 9C do not help show the variability across penetrations, they appear to be SEM, which shows that error bars get smaller as you average more data. It would seem more important to understand what is the variance in the density from one penetration to the next compared to the variance in density across layers.

      We have replaced “good match” with “roughly corresponds”. We note that we do not use unit density as a metric for identification of laminar borders and instead show that the expected locations of layers with higher neuronal density correspond to the locations where there are similar changes in unit density. It should be noted that Figure 9C is an average across many penetrations, so it should not be expected to show transitions that are as sharp as those in individual penetrations. Because of the figure permission issue, we have removed Figure 4A and changed the text accordingly.

      Figure 9C-F show a lot of variability in the individual curves (dim gray lines) compared to the overall average. Does this show that these metrics are not reliable indicators at the level of single penetration, but show some trends across larger averages?

      At the beginning of the guide, we emphasized that all metrics should be combined for each individual penetration, because some metrics are only reliable for delineating certain layer boundaries and the quality of data for the various measures varies between penetrations. The penetration average serves the purpose explained in our response to the previous question: it is an indicator that our layer delineation was not far off.

      The discussion mentions improvements in layer identification made here. Did this work check the assignments for these penetration against assignments made based on some form of ground truth? Previous methods would advance electrodes steadily, and make lesions, and carry out histology. Is there any way to tell how this method would compare to that?

      Even electrolytic lesions do not necessarily reveal ground truth and can be quite misleading, and their resolution is limited by lesion size. Lesions are typically variable in size, asymmetric, and have variable shape and position relative to the location of the electrode tip, likely affected by the quality and location of electrical grounding and by variations in current flow due to the locations of blood vessels. A review of the published literature with electrode lesions shows that electrophysiological transitions are likely a far more accurate indicator of recording locations than post-mortem histology from electrolytic lesions. It is extremely rare for the locations of lesions to be precisely aligned to expected laminar transitions. See for example Chatterjee et al (Nature 2004). Also see several manuscripts from the Shapley lab. The lone rare exception of which we are aware is Blasdel and Fitzpatrick 1984, in which consistently small and round lesions were produced, and even these would be too large (~100 microns) to accurately identify layers if it were not for the fact that the electrode penetrations were very long and tangential to the cortical layers.

      Reviewer #3 (Recommendations For The Authors):

      - The authors say (lines 360-362) that "Assuming spikes of a neuron spread to at least two adjacent recording channels, then the coherence between the two channels would be directly proportional to number of spikes, independent of spike amplitude." Has this been demonstrated? Very large amplitude spikes should show up on more channels than small amplitude spikes. Do waveform amplitudes and unit densities from the spike waveform analyses show consistent relationships to the power and/or coherence distributions over depth across penetrations?

      This part of the manuscript provides a theoretical rationale for what might be expected to affect the measures that we have derived. That is why we begin by stating that we are making an assumption. The answers to the reviewer’s questions are not known and have not been demonstrated. By beginning with this theoretical preface, we can point to cases where the data match these expectations as well as other cases where the data differ from the theoretical expectations.

      Coherence, by definition, is a normalized metric that is insensitive to amplitude. Spike amplitude mainly depends on how close the signal source is to the electrode, and spike spread mainly depends on cell body size and shape for a given distance to the electrode. Therefore, a very large spike amplitude could stem from a small cell very close to the electrode, but this would result in a small spike spread, especially for axonal spikes (Figure 4B, red spike). Spike amplitudes are on average higher in L4C, which matches the expectation that higher cell density would result, on average, in cell bodies closer to the electrode (Figure S4A). Nonetheless, the high-density small cell bodies in L4C result in a small spike spread (Figure 9D).
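
The amplitude insensitivity of coherence follows directly from its normalization. As a short worked step, assuming the standard magnitude-squared coherence between channels x and y (the manuscript's exact estimator may differ), with S_xy the cross-spectral density and S_xx, S_yy the auto-spectral densities, rescaling either channel by a nonzero real gain leaves the value unchanged:

```latex
C_{xy}(f) = \frac{\lvert S_{xy}(f) \rvert^{2}}{S_{xx}(f)\, S_{yy}(f)},
\qquad
x \to a\,x,\; y \to b\,y
\;\Longrightarrow\;
\frac{\lvert a b\, S_{xy}(f) \rvert^{2}}{a^{2} S_{xx}(f)\; b^{2} S_{yy}(f)}
= C_{xy}(f)
```

The gains a and b are illustrative symbols standing in for per-channel differences in spike amplitude.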

      - I suggest clarifying what is defined as the baseline window for the ΔP/P measure - is it the entire 10-150 ms response window used for the power spectrum analysis?

      Text adjustments are made in the Methods where the time windows are defined at the beginning of the CSD section. Only temporal change metrics (ΔCSD and ΔP/P) use the baseline window ([-40, 10]ms). The other two spectrum metrics (Power and Coherence) use the response window ([10, 150]ms).
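
      As a minimal sketch of how the two windows enter a relative-change metric, the example below assumes that ΔP/P is the change in mean response power relative to mean baseline power; the manuscript's exact definition may differ, and the function names and toy signal are hypothetical.

```python
def mean_power(samples, times_ms, window):
    """Mean squared amplitude of the samples whose time falls in [lo, hi)."""
    lo, hi = window
    vals = [s * s for s, t in zip(samples, times_ms) if lo <= t < hi]
    return sum(vals) / len(vals)


def relative_power_change(samples, times_ms, baseline=(-40, 10), response=(10, 150)):
    """(P_response - P_baseline) / P_baseline, with windows given in ms."""
    p_base = mean_power(samples, times_ms, baseline)
    p_resp = mean_power(samples, times_ms, response)
    return (p_resp - p_base) / p_base


# Toy signal sampled every 1 ms: amplitude 1 during baseline, 2 during response.
times_ms = list(range(-40, 150))
samples = [1.0 if t < 10 else 2.0 for t in times_ms]
dp_over_p = relative_power_change(samples, times_ms)
```

      For this toy signal the baseline power is 1 and the response power is 4, so the relative change is 3.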

      - Firing rate differs by cell type and, on average, differs by layer in V1. Many layer 2/3 neurons, for example, have low maximum firing rates when driven with optimized achromatic grating stimuli. To the extent that the generative models explaining the sources of power and coherence signals rely on the assumption that firing rates are matched across cortical depth, these models may be inaccurate. This assumption is declared only subtly, and late in the paper, but it is relevant to earlier claims.

      Text adjustments are made to explicitly describe the possibility that an uneven depth profile of firing rate could counteract the depth profile of neuron density, resulting in a distorted or even flat depth profile of power/coherence that deviates far from the depth profile of neuron density. In a newly added Figure S4, we first show the average firing rate profile during a set of stimuli (uniform color, static/drifting, achromatic/chromatic gratings), then specifically the PSTHs of the same stimuli shown in this study. It can be seen that layers receiving direct LGN inputs tend to fire at a higher rate (L4C, L6A). Firing rates in the PSTHs either roughly match across layers or are much higher in the densely packed layers. Therefore, the depth profile of firing rate reinforces, rather than counteracts, that of neuron density, enhancing the utility of the power/coherence profile for identification of correct layer boundaries.

      - Given the acute preparation used for recordings, I wonder whether tissue is available for histological evaluation. Although the layers identified are generally appropriate in relative size, it would be particularly compelling if the authors could demonstrate that the fraction of the cortical thickness occupied by each layer corresponded to the proportion occupied by that layer along the probe trajectory in histological sections. This would lend strength to the claim that these analyses can be used to identify layers in the absence of histology. Furthermore, variations in apparent cortical thickness could arise from different degrees of deviation from surface normal approach angles, which might be apparent by evaluation of histological material. I would add that variation in thickness on the scale shown in Fig. S4 is more likely to have an explanation of this kind.

      To serve other purposes unrelated to this study (identification of CO blobs), we cut the postmortem tissue in horizontal slices, so the suggested histological comparison cannot be made. The cortical thickness measured in this study is affected not only by the angular deviation from the surface normal but also by the swelling and compression of cortex. Nevertheless, evaluating the absolute thickness of cortex is not the main purpose of this study.

      Text and figure suggestions:

      - Fig 1A has been modified from Balaram & Kaas (2014) to revert to the Brodmann nomenclature scheme they argue against using in that paper; I wonder if they would object to this modification without explanation. Related, in the main text the authors initially refer to layers using Brodmann's labels with a secondary scheme (Hassler's) in parentheses and later drop the parenthetical labels; these conventions are not described or explained. Readers less familiar with the multiple nomenclature schemes for monkey V1 layers might be confused by the multiple labels without context, and could benefit from a brief description of the convention the authors have adopted.

      Throughout our article, we used only Brodmann’s naming convention because it has historically been adopted for Old World monkeys, which we use in our study, whereas Hassler’s naming convention is more commonly used for New World monkeys. Different naming conventions do not change our results, and it is beyond the scope of our study to discuss which nomenclature is more appropriate.

      - References to "dominate eye" throughout the text and figure legends should be replaced with "dominant eye."

      It has been changed throughout the article.

      - It is a bit odd to duplicate the same example in Fig. 2C and 2E. Perhaps a unique example would be a better use of the space.

      Here we first demonstrate the filtering effect, then compare profiles across different penetrations. The same example bridges the transition allowing side-by-side comparison.

      - The legend for Fig. 3 might be clearer if it simply listed the stimulus transitions for each column left to right, i.e. "black to white (non-dominant eye), white to black (non-dominant eye), black to white (dominant eye), ..."

      We feel that the icons are helpful. Here we want to show the stimulus colors directly to readers.

      - The misalignment between Fig. 4A vs. 4B-F, combined with the very small font size of the layer labels in Fig. 4B-F, make the visual comparison difficult. In Figs. 7 and 8, layer labels (and most labels in general) are much too small and/or low resolution to read easily. Overall, I would recommend increasing font size of labels in figures throughout the paper.

      The reused figure in Figure 4A has been removed due to a permission issue. Font sizes have been adjusted.

      - Line 591 "using of high-density probes" should be "using high-density probes"

      Text has been changed accordingly.

    1. eLife Assessment

      This important study provides solid evidence for a non-genomic action of progesterone in Xenopus oocyte activation. The findings demonstrate that two non-genomic progesterone receptors, ABHD2 and mPRb, function as a novel progesterone-stimulated phospholipase A2. The findings will be of broad interest to reproductive endocrinologists and physiologists.

    2. Reviewer #1 (Public review):

      Summary:

      Numerous pathways have been proposed to elucidate the nongenomic actions of progesterone within both male and female reproductive tissues. The authors employed the Xenopus oocyte system to investigate the PLA2 activity of ABHD2 and the downstream lipid mediators in conjunction with mPRb and P4, on their significance in meiosis. The research has been conducted extensively and is presented clearly.

      Strengths:

      While the interaction between membranous PR and ABHD2 is not a novel concept, this present study exhibits several strengths:

      (1) mPRbeta, a member of the PAQR family, has been elusive in terms of detailed signal transduction. Through mutation studies involving the Zn binding domain, the authors discovered that the hydrolase activity of mPRbeta is not essential for meiosis and oocyte maturation. Instead, they suggest that ABHD2, acting as a coreceptor of mPRbeta, demonstrates phospholipase activity, indicating that downstream lipid mediators may play a dominant role when stimulated by progesterone.

      (2) Extensive exploration of downstream signaling pathways and the identification of several potential meiotic activity-related lipid mediators make this aspect of the study novel and potentially significant.

      Weaknesses:

      However, there are some weaknesses and areas that need further clarification:

      (1) The mechanism governing the molecular assembly of mPRbeta and ABHD2 remains unclear. Are they constitutively associated or is their association ligand-dependent? Does P4 bind not only to mPRbeta but also to ABHD2, as indicated in Figure 6J? In the latter case, the reviewer suggests that the authors conduct a binding experiment using labeled P4 with ABHD2 to confirm this interaction and assess any potential positive or negative cooperativity with a partner receptor.

      (2) The authors have diligently determined the metabolite profile using numerous egg cells. However, the interpretation of the results appears incomplete, and inconsistencies were noted between Figure 2F and Supplementary Figure 2C. Furthermore, PGE2 and D2 serve distinct roles and have different elution patterns by LC-MS/MS, thus requiring separate measurements. In addition, the extremely short half-life of PGI2 necessitates the measurement of its stable metabolite, 6-keto-PGF1a, instead. The authors also need to clarify why they measured PGF1a but not PGF2a. Unfortunately, even in the revision, the authors did not adequately address the last issue (differential measurements of PGD2 and PGE2; 6-keto-PGF1alpha should be determined instead of PGI2).

      (3) Although they propose PGs, LPA and S1P are important downstream mediators, the exact roles of the identified lipid mediators have not been clearly demonstrated, as receptor expression and activation were not demonstrated. While the authors showed S1PR3 expression and its importance by genetic manipulation, there was no observed change in S1P levels following P4 treatment (Supplementary Figure 2D). It is essential to identify which receptors (subtypes) are expressed and how downstream signaling pathways (PKA, Ca, MAPK, etc.) relate to oocyte phenotypes.

      These clarifications and further experiments would enhance the overall impact and comprehensiveness of the study.

      Comments on revisions:

      Need correction and addition for differential analyses of PGD2 and PGE2, and measurement of 6-keto-PGF1alpha instead of PGI2 (Figure 2F). PGI2 is extremely unstable (T1/2, 1 min in neutral buffer) and rapidly converted nonenzymically to 6-keto-PGF1a.

    3. Reviewer #2 (Public review):

      Summary:

      This interesting paper examines the earliest steps in progesterone-induced frog oocyte maturation, an example of non-genomic steroid hormone signaling that has been studied for decades but is still very incompletely understood. In fish and frog oocytes it seems clear that mPR proteins are involved, but exactly how they relay signals is less clear. In human sperm, the lipid hydrolase ABHD2 has been identified as a receptor for progesterone, and so the authors here examine whether ABHD2 might contribute to progesterone-induced oocyte maturation as well. The main results are:

      (1) Knocking down ABHD2 makes oocytes less responsive to progesterone, and ectopically expressing ABHD2.S (but not the shorter ABHD2.L gene product) partially rescues responsiveness. The rescue depends upon the presence of critical residues in the protein's conserved lipid hydrolase domain, but not upon the presence of critical residues in its acyltransferase domain.

      (2) Treatment of oocytes with progesterone causes a decrease in sphingolipid and glycerophospholipid content within 5 min. This is accompanied by an increase in LPA content and arachidonic acid metabolites. These species may contribute to signaling through GPCRs. Perhaps surprisingly, there was no detectable increase in sphingosine-1-phosphate, which might have been expected given the apparent substantial hydrolysis of sphingolipids. The authors speculate that S1P is formed and contributes to signaling but diffuses away.

      (3) Pharmacological inhibitors of lipid-metabolizing enzymes support, for the most part, the inferences from the lipidomics studies, although there are some puzzling findings. The puzzling findings may be due to uncertainty about whether the inhibitors are working as advertised.

      (4) Pharmacological inhibitors of G-protein signaling support a role for G-proteins and GPCRs in this signaling, although again there are some puzzling findings.

      (5) Reticulocyte expression supports the idea that mPRβ and ABHD2 function together to generate a progesterone-regulated PLA2 activity.

      (6) Knocking down or inhibiting ABHD2 inhibited progesterone-induced mPRβ internalization, and knocking down ABHD2 inhibited SNAP25∆20-induced maturation.

      Strengths:

      All in all, this could be a very interesting paper and a nice contribution. The data add a lot to our understanding of the process, and, given how ubiquitous mPR and AdipoQ receptor signaling appear to be, something like this may be happening in many other physiological contexts.

      Weaknesses:

      I have several suggestions for how to make the main points more convincing.

      Main criticisms:

      (1) The ABHD2 knockdown and rescue, presented in Fig 1, is one of the most important findings. It can and should be presented in more detail to allow the reader to understand the experiments better. E.g.: the antisense oligos hybridize to both ABHD2.S and ABHD2.L, and they knock down both (ectopically expressed) proteins. Do they hybridize to either or both of the rescue constructs? If so, wouldn't you expect that both rescue constructs would rescue the phenotype, since they both should sequester the AS oligo? Maybe I'm missing something here.

      In addition, it is critical to know whether the partial rescue (Fig 1E, I, and K) is accomplished by expressing reasonable levels of the ABHD2 protein, or only by greatly overexpressing the protein. The authors' antibodies do not appear to be sensitive enough to detect the endogenous levels of ABHD2.S or .L, but they do detect the overexpressed proteins (Fig 1D). The authors could thus start by microinjecting enough of the rescue mRNAs to get detectable protein levels, and then titer down, assessing how low one can go and still get rescue. And/or compare the mRNA levels achieved with the rescue construct to the endogenous mRNAs.

      Finally, please make it clear what is meant by n = 7 or n = 3 for these experiments. Does n = 7 mean 7 independently lysed oocytes from the same frog? Or 7 groups of, say, 10 oocytes from the same frog? Or different frogs on different days? I could not tell from the figure legends, the methods, or the supplementary methods. Ideally one wants to be sure that the knockdown and rescue can be demonstrated in different batches of oocytes, and that the experimental variability is substantially smaller than the effect size.

      (2) The lipidomics results should be presented more clearly. First, please drop the heat map presentations (Fig 2A-C) and instead show individual time course results, like those shown in Fig 2E, which make it easy to see the magnitude of the change and the experiment-to-experiment variability. As it stands, the lipidomics data really cannot be critically assessed.

      [Even as heat map data go, panels A-C are hard to understand. The labels are too small, especially on the heat map on the right side of panel B. And the 25 rows in panel C are not defined (the legend makes me think the panel is data from 10 individual oocytes, so are the 25 rows 25 metabolites? If so, are the individual oocyte data being collapsed into an average? Doesn't that defeat the purpose of assessing individual oocytes?) And those readers with red-green colorblindness (8% of men) will not be able to tell an increase from a decrease. But please don't bother improving the heat maps; they should just be replaced with more-informative bar graphs or scatter plots.]

      (3) The reticulocyte lysate co-expression data are quite important, and are both intriguing and puzzling. My impression had been that to express functional membrane proteins, one needed to add some membrane source, like microsomes, to the standard kits. Yet it seems like co-expression of mPR and ABHD2 proteins in a standard kit is sufficient to yield progesterone-regulated PLA2 activity. I could be wrong here-I'm not a protein expression expert-but I was surprised by this result, and I think it is critical that the authors make absolutely certain that it is correct. Do you get much greater activities if microsomes are added? Are the specific activities of the putative mPR-ABHD2 complexes reasonable?

      Comments on revisions:

      The authors have satisfied my concerns with their response letter and revisions.

    4. Reviewer #3 (Public review):

      Summary:

      The authors report two P4 receptors, ABHD2 and mPRβ, that function as co-receptors to induce PLA2 activity and thus drive meiosis. In their experimental studies, the authors knocked down ABHD2 and demonstrated inhibition of oocyte maturation and inactivation of Plk1, MAPK, and MPF, which indicated that ABHD2 is required for P4-induced oocyte maturation. Next, they showed that three residues (S207, D345, H376) in the lipase domain are crucial for ABHD2 P4-mediated oocyte maturation in functional assays. They performed global lipidomics analysis on mPRβ or ABHD2 knockdown oocytes, in which downregulation of GPL and sphingolipid species was observed, and enrichment in LPA was also detected using their metabolomics method. Furthermore, they investigated pharmacological profiles of enzymes predicted to be important for maturation based on their metabolomic analyses and ascertained the central role for PLA2 in inducing oocyte maturation downstream of P4. They showed modulation of oocyte maturation by the S1P/S1PR3 pathway and a potential role for Gαs signaling, and potentially Gβγ, downstream of P4.

      Strengths:

      The authors make a very interesting finding that ABHD2 has PLA2 catalytic activity but only in the presence of mPRβ and P4. Finally, they provided supporting data for a relationship between ABHD2/PLA2 activity and mPRβ endocytosis and further downstream signaling. Collectively, this research report defines early steps in nongenomic P4 signaling, which is of broad physiological implications.

      Weaknesses:

      There were concerns with the pharmacological studies presented. Many of these inhibitors are used at high (double digit micromolar) concentrations that could result in non-specific pharmacological effects and the authors have provided very little data in support of target engagement and selectivity under the multiple experimental paradigms. In addition, the use of an available ABHD2 small molecule inhibitor was lacking in these studies.

      Comments on revisions:

      In the revised manuscript, the authors have addressed my major concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment:

      “…However, the findings are reliant on high concentrations of inhibitor drugs, and mechanistic details about the molecular interaction and respective functions of ABHD2 and mPRb are incomplete.”

      As discussed below in the response to Reviewers, the drug concentrations used span the full dose response of the active range of each drug. In cases where the drug concentrations required to block oocyte maturation were significantly higher than those reported in the literature, we considered those drugs ineffective. In terms of the molecular details of the mechanistic interaction between mPRb and ABHD2, we now provide additional data confirming their molecular interaction to produce PLA2 activity where each protein alone is insufficient. Although these new studies provide more mechanistic insights, there remain details of the ABHD2-mPR interactions that would need to be addressed in future studies, which are beyond the scope of the current, already extensive study.

      Public Reviews:

      Reviewer 1

      (1) The mechanism governing the molecular assembly of mPRbeta and ABHD2 remains unclear. Are they constitutively associated or is their association ligand-dependent? Does P4 bind not only to mPRbeta but also to ABHD2, as indicated in Figure 6J? In the latter case, the reviewer suggests that the authors conduct a binding experiment using labeled P4 with ABHD2 to confirm this interaction and assess any potential positive or negative cooperativity with a partner receptor.

      The co-IP experiments presented in Figure 5E argue that the two receptors are constitutively associated at rest before exposure to P4, but at low levels, since addition of P4 increases the association between mPRβ and ABHD2 by ~2-fold. Importantly, we know from previous work (Nader et al., 2020) and from imaging experiments in this study that mPR recycles in immature oocytes between the PM and the endosomal compartment. It is not clear at this point within which subcellular compartment the basal association of mPR and ABHD2 occurs. We have tried to elucidate this point but have not been able to generate a functional tagged ABHD2. We generated GFP-tagged ABHD2 at both the N- and C-terminus, but these constructs were not functional in terms of their ability to rescue ABHD2 knockdown. This prevented us from testing the association dynamics between ABHD2 and mPR.

      Regarding whether ABHD2 in the oocyte directly binds P4, we had no data directly supporting this in the initial submission; rather, we based the cartoon in Fig. 6J on the findings of Miller et al. (Science 2016), who showed that ABHD2 in sperm binds biotinylated P4. With the use of a new expression system to produce ABHD2 in vitro (please see below), we were able to try the experiment suggested by the Reviewer. In vitro expressed ABHD2 was incubated with biotinylated P4, and binding was tested on a streptavidin column. Under these conditions we could not detect any specific binding of P4 to ABHD2; however, these experiments remain somewhat preliminary and would require validation using additional approaches to conclusively test whether Xenopus ABHD2 binds P4. The discrepancy with the Miller et al. findings could be species specific, as they tested mammalian ABHD2.

      (2) The authors have diligently determined the metabolite profile using numerous egg cells. However, the interpretation of the results appears incomplete, and inconsistencies were noted between Figure 2B and Supplementary Figure 2C. Furthermore, PGE2 and D2 serve distinct roles and have different elution patterns by LC-MS/MS, thus requiring separate measurements. In addition, the extremely short half-life of PGI2 necessitates the measurement of its stable metabolite, 6-keto-PGF1a, instead. The authors also need to clarify why they measured PGF1a but not PGF2a.

      We believe the Reviewer meant to indicate discrepancies between Fig. 2E (not 2B) and Supp. Fig. 2C. Indeed, the Reviewer is correct, and this is because Fig. 2E shows pooled normalized data on a per PG species and frog basis, whereas Supp. Fig. 2C shows an example of absolute raw levels from a single frog to illustrate the relative basal abundance of the different PG species. We had failed to clarify this in the Supp. Fig. 2C figure legend, which we have now done in the revised manuscript. So, the discrepancies are due to variation between different donor animals, which is highlighted in Supp. Fig. 2A. Furthermore, to minimize confusion, in the revised manuscript we revised Supp. Fig. 2C to show only PG levels at rest, to illustrate basal levels of the different PG species relative to each other, which is the goal of this supplemental figure.

      (3) Although they propose PGs, LPA, and S1P are important downstream mediators, the exact roles of the identified lipid mediators have not been clearly demonstrated, as receptor expression and activation were not demonstrated. While the authors showed S1PR3 expression and its importance by genetic manipulation, there was no observed change in S1P levels following P4 treatment (Supplementary Figure 2D). It is essential to identify which receptors (subtypes) are expressed and how downstream signaling pathways (PKA, Ca, MAPK, etc.) relate to oocyte phenotypes.

      We agree conceptually with the Reviewer that identifying the details of the signaling of the different GPCRs involved in oocyte maturation would be interesting. However, our lipidomic data argue that the activation of a PLA2 early in the maturation process in response to P4 leads to the production of multiple lipid messengers that would activate GPCRs and branch out the signaling pathway to activate various pathways required for the proper and timely progression of oocyte maturation. Preparing the egg for fertilization is complex; so, it is not surprising that a variety of pathways are activated simultaneously to properly initiate both cytoplasmic and nuclear maturation to transition the egg from its meiotic arrest state to be ready to support the rapid growth during early embryogenesis. We focus on the S1P signaling pathway specifically because, as pointed out by the Reviewer, we could not detect an increase in S1P even though our metabolomic data collectively argued for an increase. Our results on the S1P pathway -as well as a plethora of other studies historically in the literature that we allude to in the manuscript- argue that these different GPCRs support and regulate oocyte maturation, but they are not essential for the early maturation signaling pathway. For example, for S1P, as shown in Figure 4, the delay/inhibition of oocyte maturation due to S1PR3 knockdown can be reversed at high levels of P4, which presumably leads to higher levels of other lipid mediators that would bypass the need for signaling through S1PR3. This is reminiscent of the kinase cascade driving oocyte maturation where there is significant redundancy and feedback regulation. Therefore, analyzing each receptor subtype that may regulate the different PG species, LPA, and S1P would be a tedious and time-consuming undertaking that goes beyond the scope of the current manuscript. 
      More importantly, based on the above arguments, we suggest that findings from such an analysis, similar to the conclusions from the S1PR3 studies (Fig. 4), would show a modulatory role in oocyte maturation rather than a core requirement for the maturation process as observed with mPR and ABHD2. Thus, they would provide relatively little insight into the core signaling pathway driving P4-mediated oocyte maturation.

      Reviewer 2:

      (1) The ABHD2 knockdown and rescue, presented in Fig 1, is one of the most important findings. It can and should be presented in more detail to allow the reader to understand the experiments better. E.g.: the antisense oligos hybridize to both ABHD2.S and ABHD2.L, and they knock down both (ectopically expressed) proteins. Do they hybridize to either or both of the rescue constructs? If so, wouldn't you expect that both rescue constructs would rescue the phenotype since they both should sequester the AS oligo? Maybe I'm missing something here.

      For the ABHD2 rescue experiment, the ABHD2 constructs (S or L) were expressed 48 hrs before the antisense was injected. The experiment was conducted in this way to avoid the potential confounding issue of both constructs sequestering the antisense. The assumption is that the injected RNA after protein expression would be degraded thus allowing the injected antisense to target endogenous ABHD2. The idea is to confirm that ABHD2.S expression alone is sufficient to rescue the antisense knockdown as confirmed experimentally.

      However, to further confirm the rescue, we performed the experiment in a different chronological order, where we started with injecting the antisense to knock down endogenous ABHD2 and this was followed 24 hrs later by expressing wild type ABHD2.S. As shown in Author response image 1 this also rescues the knockdown.

      Author response image 1.

      ABHD2 knockdown and rescue. Oocytes were injected with control antisense (Ctrl AS) or specific ABHD2 antisense (AS) oligonucleotides and incubated at 18 °C for 24 hours. Oocytes were then injected with mRNA to overexpress ABHD2.S for 48 hours and then treated with P4 overnight. The histogram shows % GVBD in naïve oocytes and in oocytes injected with control or ABHD2 antisense, with or without mRNA to overexpress ABHD2.S.

      In addition, it is critical to know whether the partial rescue (Fig 1E, I, and K) is accomplished by expressing reasonable levels of the ABHD2 protein, or only by greatly overexpressing the protein. The authors' antibodies do not appear to be sensitive enough to detect the endogenous levels of ABHD2.S or .L, but they do detect the overexpressed proteins (Fig 1D). The authors could thus start by microinjecting enough of the rescue mRNAs to get detectable protein levels, and then titer down, assessing how low one can go and still get rescue. And/or compare the mRNA levels achieved with the rescue construct to the endogenous mRNAs.

      The dose response of ABHD2 protein expression in correlation with rescue of the ABHD2 knockdown is shown indirectly in Figure 1I and 1J. In these experiments, ABHD2 knockdown was rescued using either the WT protein or two mutants (H120A and N125A). All three constructs rescued ABHD2 KD with equal efficiency (Fig. 1I), even though their expression levels varied (Fig. 1J). The WT protein was expressed at significantly higher levels than both mutants, and N125A was expressed at higher levels than H120A (Fig. 1J; note the similar tubulin loading control). Crude estimation from the WBs suggests that WT protein expression is ~3x that of H120A and ~2x that of N125A, yet all three rescue the ABHD2 knockdown similarly (Fig. 1I). This argues that low levels of ABHD2 expression are sufficient to rescue the knockdown, consistent with the catalytic enzymatic nature of the ABHD2 PLA2 activity.

      Finally, please make it clear what is meant by n = 7 or n = 3 for these experiments. Does n = 7 mean 7 independently lysed oocytes from the same frog? Or 7 groups of, say, 10 oocytes from the same frog? Or different frogs on different days? I could not tell from the figure legends, the methods, or the supplementary methods. Ideally one wants to be sure that the knockdown and rescue can be demonstrated in different batches of oocytes, and that the experimental variability is substantially smaller than the effect size.

      The n reflects the number of independent female frogs. We have added this information to the figure legends. For each donor frog at each time point 10-30 oocytes were used.

      (2) The lipidomics results should be presented more clearly. First, please drop the heat map presentations (Fig 2A-C) and instead show individual time course results, like those shown in Fig 2E, which make it easy to see the magnitude of the change and the experiment-to-experiment variability. As it stands, the lipidomics data really cannot be critically assessed.

      [Even as heat map data go, panels A-C are hard to understand. The labels are too small, especially on the heat map on the right side of panel B. The 25 rows in panel C are not defined (the legend makes me think the panel is data from 10 individual oocytes, so are the 25 rows 25 metabolites? If so, are the individual oocyte data being collapsed into an average? Doesn't that defeat the purpose of assessing individual oocytes?) And those readers with red-green colorblindness (8% of men) will not be able to tell an increase from a decrease. But please don't bother improving the heat maps; they should just be replaced with more informative bar graphs or scatter plots.]

      We have revised the lipidomics data as requested by the Reviewer. The Reviewer asked that we show the data as a time course with each individual frog as in Fig. 2E. This turns out to be confusing and not a good way to present the data (please see Author response image 2).

      Author response image 2.

      Metabolite levels from 5 replicates of 10 oocytes each at each time point were measured and averaged per frog and per time point. Fold change was measured as the ratio at the 5- and 30-min time points relative to untreated oocytes (T0). FCs that are not statistically significant are shown as faded. Oocytes with mPR knockdown (KD) are boxed in green and ABHD2-KD in purple.
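
      The averaging and fold-change calculation described in this legend can be sketched as follows. This is a minimal illustration under the stated description only; the function name, time-point labels, and replicate values are hypothetical and are not data from the study.

```python
from statistics import mean


def fold_changes(replicates_by_time, baseline="T0"):
    """Average replicates per time point, then divide each mean by the baseline mean."""
    means = {t: mean(values) for t, values in replicates_by_time.items()}
    base = means[baseline]
    return {t: m / base for t, m in means.items() if t != baseline}


# Hypothetical metabolite levels for one frog: 5 replicates per time point.
one_frog = {
    "T0": [2.0, 2.0, 2.0, 2.0, 2.0],
    "T5": [1.0, 1.0, 1.0, 1.0, 1.0],
    "T30": [4.0, 4.0, 4.0, 4.0, 4.0],
}
fc = fold_changes(one_frog)
```

      A fold change below 1 indicates a decrease relative to untreated oocytes and a value above 1 an increase; statistical significance would have to be assessed separately.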

      We therefore revised the metabolomics data as follows to improve clarity. The changes in the glycerophospholipids and sphingolipids determined on the Metabolon CLP platform (specific for lipids) are now shown as single metabolites clustered at the levels of species and pathways and arranged for the 5- and 30-min time points sequentially on the same heatmap as requested (Fig. 2B). This allows for a quick visual overview of the data that clearly shows the decrease in the lipid species following P4 treatment in the control oocytes and not in the mPR-KD or ABHD2-KD cells (Fig. 2B). The individual species are listed in Supplemental Tables 1 and 2. We also revised the Supplemental Tables to include the values for the non-significant changes, which were omitted from the previous submission.

      We revised the metabolomics data from the HD4 platform in a similar fashion, but because the lipid data were complementary to and less extensive than those from the CLP platform, we moved that heatmap to Supplemental Fig. 2B.

      For the single-oocyte metabolomics, we now show the data as the correlation between FC and p value, which clearly shows the upregulated (including LPA) and downregulated metabolites at T30 relative to T0 (Fig. 2C). The raw data are now shown in a new Supplemental Table 7.

      (3) The reticulocyte lysate co-expression data are quite important and are both intriguing and puzzling. My impression had been that to express functional membrane proteins, one needed to add some membrane source, like microsomes, to the standard kits. Yet it seems like co-expression of mPR and ABHD2 proteins in a standard kit is sufficient to yield progesterone-regulated PLA2 activity. I could be wrong here - I'm not a protein expression expert - but I was surprised by this result, and I think it is critical that the authors make absolutely certain that it is correct. Do you get much greater activities if microsomes are added? Are the specific activities of the putative mPR-ABHD2 complexes reasonable?

      We thank the Reviewer for this insightful comment. We agree that this is a critical result that would benefit from cross validation, especially given the low level of PLA2 activity detected in the reticulocyte lysate expression system. We have therefore expanded these studies using another in vitro expression system with microsomal membranes based on tobacco extracts (ALiCE® Cell-Free Protein Synthesis System, Sigma Aldrich) to enhance production and stability of the expressed receptors as suggested by the Reviewer. We further prepared virus-like particles (VLPs) from cells expressing each receptor individually or both receptors together. However, we could not detect any PLA2 activity from the VLPs. We thus focused on the coupled in vitro transcription/translation tobacco extracts that allow the expression of difficult-to-produce membrane proteins in microsomes. This kit targets membrane proteins directly to microsomes using a microsome-targeting melittin signal peptide. This system took significant time and effort to troubleshoot and adapt to mPR and ABHD2 expression. However, we were ultimately able to produce significantly higher amounts of both ABHD2 and mPRβ, which were readily detected by WBs (Supplemental Fig. 4I). In contrast, we could not reliably detect mPR or ABHD2 using WBs from reticulocyte lysates given the limited amounts produced.

      Similarly to our previous findings with proteins produced in reticulocyte lysates, expression of ABHD2 or mPRβ alone was not associated with an increase in PLA2 activity over a two-hour incubation period (Fig. 5C). It is worth noting here that the tobacco lysates had high endogenous PLA2 activity. However, co-expression of both mPRβ and ABHD2 produced robust PLA2 activity that was significantly higher than that detected in the reticulocyte lysate system (Fig. 5C). Surprisingly, however, this PLA2 activity was P4 independent, as it was observed when both receptors were co-expressed in the absence of P4.

      These results validate our earlier conclusion that PLA2 activity requires both mPR and ABHD2, so their interaction is needed for enzymatic activity. It is interesting, however, that in the tobacco expression system this mPR-ABHD2 PLA2 activity becomes for the most part P4 independent. As the tobacco expression system forces both ABHD2 and mPR into microsomes using a signal sequence, the two receptors are enriched in the same vesicular compartment. As they can interact independently of P4 as shown in the co-IP experiments in immature oocytes (Fig. 5D), their forced co-expression in the same microsomal compartment could lead to their association and thus PLA2 activity. This is an attractive possibility that fits the current data, but would need independent validation.

      Reviewer 3:

      There were concerns with the pharmacological studies presented. Many of these inhibitors are used at high (double-digit micromolar) concentrations that could result in non-specific pharmacological effects and the authors have provided very little data in support of target engagement and selectivity under the multiple experimental paradigms. In addition, the use of an available ABHD2 small molecule inhibitor was lacking in these studies.

      For the inhibitors used, we performed a full dose response to define the active concentrations, so inhibitors were not used at a single high dose. We then compared the EC50 for each active inhibitor to the reported EC50 in the literature (Table 1). The inhibitors were deemed effective only if they inhibited oocyte maturation within the range reported in the literature. This is despite the fact that frog oocytes are notorious for requiring higher concentrations of drug given their high lipophilic yolk content, which acts as a sponge for drugs. So our criteria for an effective inhibitor are rather stringent.

      Based on these criteria, only 3 inhibitors were ‘effective’ in inhibiting oocyte maturation: Ibuprofen, ACA and MP-A08, with IC50s relative to those reported in the literature of 0.7, 1.1, and 1.6, respectively. Ibuprofen targets Cox enzymes, which produce prostaglandins. We independently confirmed an increase in PGs in response to P4 in oocytes, thus validating the drug's inhibitory effect. ACA blocks PLA2 and inhibits maturation, a role supported by the metabolomics analyses that show a decrease in the PE/PE/LPE/LPC species, and by the ABHD2-mPR PLA2 activity following in vitro expression. Finally, MP-A08 blocks sphingosine kinase activity, a role supported by the metabolomics showing a decrease in sphingosine levels in response to P4, and by our functional studies validating a role for the S1P receptor 3 in oocyte maturation.
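      To make the criterion concrete, the relative IC50 can be sketched as the ratio of the IC50 measured for oocyte maturation to the literature value. This is only an illustration under stated assumptions: the function names and the cutoffs `effective_max` and `intermediate_max` are hypothetical and are not taken from the manuscript, which states only that effective inhibitors act within the reported range.

```python
def relative_ic50(observed_ic50, literature_ic50):
    """Ratio of the IC50 measured for oocyte maturation to the literature IC50."""
    if literature_ic50 <= 0:
        raise ValueError("literature IC50 must be positive")
    return observed_ic50 / literature_ic50


def classify(ratio, effective_max=2.0, intermediate_max=10.0):
    """Bin an inhibitor by its relative IC50.

    The two cutoffs are hypothetical placeholders chosen for illustration only.
    """
    if ratio <= effective_max:
        return "effective"
    if ratio <= intermediate_max:
        return "intermediate"
    return "not effective"


# Relative IC50 values quoted in the response for the three effective inhibitors.
reported_ratios = {"Ibuprofen": 0.7, "ACA": 1.1, "MP-A08": 1.6}
labels = {name: classify(ratio) for name, ratio in reported_ratios.items()}
```

      Under these assumed cutoffs, the three ratios quoted above all fall in the "effective" bin, whereas an inhibitor needing concentrations orders of magnitude above its literature IC50 would not.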

      As pointed out by the Reviewer, other inhibitors did block maturation at very high concentrations, but we do not consider these effective and have not implicated the blocked enzymes in the early steps of oocyte maturation. To clarify this point, we edited the summary panel (now Fig. 2D) to simplify it and highlight the inhibitors with an effect in the reported range in red and those that do not inhibit based on the above criteria in grey. Those with intermediate effects are shown in pink. We hope these edits clarify the inhibitor studies.

      Recommendations For the Authors

      Reviewer 2:

      (1) Introduction, para 1. Please change "mPRs mediated" to "mPR-mediated".

      Done

      (2) Introduction, para 2. Please change "cyclin b" to "cyclin B".

      Done

      (3) Introduction, para 2. Please change "that serves" to "which serves".

      Done

      (4) Introduction, para 4. I know that the authors have published evidence that "a global decrease in cAMP levels is not detectable" (2016), but old work from Maller and Krebs (JBC 1979) did see an early, transient decrease after P4 treatment, and subsequent work from Maller said that there was both a decrease in adenylyl cyclase activity and an increase in cAMP activity. Perhaps it would be better to say something like "early work showed a transitory drop in cAMP activity within 1 min of P4 treatment (Maller), although later studies failed to detect this drop and showed that P4-dependent maturation proceeds even when cAMP is high (25)".

      We agree and thank the Reviewer for this recommendation. The text was revised accordingly.

      (5) Results, para 1. Based on the results in Fig 1B, one should probably not assert that ABHD2 is expressed "at levels similar to those of mPRβ in the oocyte"-with different mRNAs and different PCR primers, it's hard to say whether they are similar or not. The RNAseq data from Xenbase in Supp Fig 1 supports the idea that the ABHD2 and mPRβ mRNAs are expressed at similar levels at the message level, although of course mRNA levels and protein levels do not correlate well when different gene products are compared (Wuhr's 2014 Curr Biol paper reported correlation coefficients of about 0.3).

      We agree and have changed the text as follows to refer specifically to RNA: “we confirmed that ABHD2 RNA is expressed in the oocyte at levels similar to those of mPRβ RNA (Fig. 1B).”

      (6) Results, para 2. It would be worth pointing out that since an 18 h incubation with microinjected antisense oligos was sufficient to substantially knock down both the ABHD2 mRNAs (Fig 1C) and the ectopically-expressed proteins (Fig 1D), the mRNA and protein half-lives must be fairly short, on the order of a few hours or less.

      Done

      (7) Figure 1. Please make the western blots (especially Fig 1D) and their labeling larger. These are key results and as it stands the labeling is virtually unreadable on printed copies of the figures. I'm not sure about eLife's policy, but many journals want the text in figures to be no smaller than 5-7 points at 100% size.

      Likewise for many of the western blots in subsequent figures.

      As requested by the Reviewer we have increased the font and size of all Western blots in the Figures.

      (8) Figure 1E, G. I am not sure one should compare the effectiveness of the ABHD2 rescue (Fig 1E) and the mPRβ rescue (Fig 1G). Even if these were oocytes from the same frog, we do not know how the levels of the overexpressed ABHD2 and mPRβ proteins compare. E.g. maybe ABHD2 was highly overexpressed and mPRβ was overexpressed by a tiny amount.

      Although this is a possibility, the expression levels of the proteins here are not of much concern because we previously showed that mPRβ expression effectively rescues the mPRβ antisense knockdown, which inhibits maturation (please see Nader et al., 2020). This argues that at the levels of mRNA injected, mPR is functional to support maturation, yet it does not rescue ABHD2 knockdown to the same levels (Fig. 1G). It is therefore fair to argue that mPRβ is not as effective at rescuing maturation after ABHD2 KD.

      (9) Inhibitor studies: There are two likely problems in comparing the observed potencies with legacy data - in vitro vs in vivo data and frog vs. mammalian data. Please make it clear what is being compared to what when you are comparing legacy data.

      The legacy data are from the literature based on the early studies that defined the IC50 for inhibition primarily using in vivo models (cell line mostly) but not oocytes. Typically, frog oocytes require significantly higher concentrations of inhibitors to mediate their effect because of the high lipophilic yolk content which acts as a sponge for some drugs. So, the fact that the drugs that are effective in inhibiting oocyte maturation (ACA, MP-A08, and Ibuprofen) work in a similar or lower concentration range to the published IC<sub>50</sub> gives us confidence as to the specificity of their effect. We have revised Table 1 to include the reference for each IC<sub>50</sub> value from the literature to allow the reader to judge the exact model and context used.

      (10) Isn't it surprising that Gαs seems to promote maturation, given the Maller data (and data from others) that cAMP and PKA oppose maturation (see also the authors' own Fig 1A) and the authors' previous data sees no positive effect (minor point 7 above)?

      We show that a specific Gαs inhibitor, NF-449, inhibits maturation (although at relatively high concentrations), which is consistent with a positive role for Gαs in oocyte maturation. We argue based on the lipidomics data and the inhibitor data that GPCRs play a modulatory role and not a central early signaling role in terms of releasing oocyte meiotic arrest. They are likely to have effects on the full maturation of the egg in preparation for embryonic development. The multiple lipid messengers generated downstream of mPRβ activation are likely to act through GPCRs and could signal through Gαs or other Gα or even through Gβγ. Minor point 7 refers to the size of Western blots.

      (11) Page 9, bottom: "...one would predict activation of sphingosine kinases...." Couldn't it just be the activity of some constitutively active sphingosine kinase? Maybe replace "activation" with "activity".

      A constitutively active sphingosine kinase would not make sense, as the activity needs to be triggered by P4.

      (12) Sometimes the authors refer to concentrations in molar units plus a power of 10 (e.g. 10<sup>-5</sup> M) and sometimes in µM or nM, sometimes even within the same paragraph. This makes it unnecessarily difficult to compare. Please keep consistent.

      We converted all concentrations throughout the text to molar units (M) in scientific notation for consistency, as requested by the Reviewer.

      (13) Fig 3I: "Sphingosine kinase" is misspelled.

      This has been corrected. We thank the Reviewer for catching it.

      (14) Legend to Fig. 5: Please change "after P4 treatment in reticulocytes" to "after P4 treatment in reticulocyte lysates".

      Done

      (15) Fig 6J. Doesn't the MAPK cascade inhibit MYT1? I.e. shouldn't the arrow be -| rather than ->?

      Yes the Reviewer is correct. This has been changed. We thank the Reviewer for noticing this error.

      (16) Materials and Methods, second paragraph. Please change "inhibitor's studies" to "inhibitor studies".

      Corrected thanks.

      (17) Table 1: Please be consistent in how you write Cox-2.

      Done.

      Reviewer #3:

      The findings are of potential broad interest, but I have some concerns with the pharmacological studies presented. Many of these inhibitors are used at high (double-digit micromolar) concentrations that could result in non-specific pharmacological effects and the authors have provided very little data in support of target engagement and selectivity under the multiple experimental paradigms. Importantly, several claims regarding lipid metabolism signaling in the context of oocyte maturation are made without critical validation that the intended target is inactivated with reasonable selectivity across the proteome. Several of the inhibitors used for pharmacology and metabolomics are known covalent inhibitors (JZL184 and MJN110) that can readily bind additional lipases depending on the treatment time and concentration.

      I did not find any data using the reported ABHD2 inhibitor (compound 183; PMID: 31525885). Is there a reason not to include this compound to complement the knockdown studies? I believe this is an important control given that not all lipid effects were reversed with ABHD2 knockdown. The proper target engagement and selectivity studies should be performed with this ABHD2 inhibitor.

      We obtained aliquots of the reported ABHD2 inhibitor compound 183 from Dr. Van Der Stelt and tested its effect on oocyte maturation at 10<sup>-4</sup> M using either low (10<sup>-7</sup> M) or high (10<sup>-5</sup> M) P4 concentration. Compound 183 partially inhibited P4-mediated oocyte maturation. The new data were added to the manuscript as Supplemental Figure 3D.

      Additional comments:

      (1) Pristimerin was tested at low P4 concentration for effects on oocyte maturation. Authors should also test JZL184 and MJN110 under this experimental paradigm.

      We have tested the effect of a high concentration (2 × 10<sup>-5</sup> M) of JZL184 or MJN110 on oocyte maturation at low P4 concentration (Author response image 3). MJN110 did not have a prominent effect on oocyte maturation at low P4, whereas JZL184 inhibited maturation by 50%. However, this inhibition of maturation required concentrations of JZL184 that are 10 times higher than those reported in rat and human cells (Cui et al., 2016; Smith et al., 2015), arguing against an important role for a monoacylglycerol enzymatic activity in inducing oocyte maturation.

      Author response image 3.

      The effect of MJN110 and JZL184 compounds on oocyte maturation at low P4 concentration. Oocytes were pre-treated for 2 hours with the vehicle or with the highest concentration (2 × 10<sup>-5</sup> M) of either JZL184 or MJN110, followed by overnight treatment with P4 at 10<sup>-7</sup> M. Oocyte maturation was measured as % GVBD normalized to control oocytes (treated with vehicle) (mean + SEM; n = 2 independent female frogs for each compound).

      (2) Figure 4A showed different Ct values of ODC between oocytes and spleen; please explain them in the text. There is no description of the spleen information in Figure 4A; please make it clear in the text.

      We thank the Reviewer for this recommendation. The text was revised accordingly.

      (3) For Figures 3A, E, and I, there are different concentration settings for comparing the activity, is it possible to get the curves based on the same set of concentrations? The concentration gradient didn't include higher concentration points in these figures, thus the related values are incorrect. Please set more concentration points to improve the figures. And for the error bar, there are different display formats like Figure 4c and 4d, etc. Please uniform the format for all the figures. Additionally, for the ctrl. or veh., please add an error bar for all figures.

      Some of the drugs tested were toxic to oocytes at high concentrations so the dose response was adjusted accordingly. The graphs were plotted to encompass the entire tested dose response. We could have plotted the data on the same x-axis range but that would make the figures uneven and awkward.

      We are not clear what the Reviewer means by “The concentration gradient didn't include higher concentration points in these figures, thus the related values are incorrect.”

      The error bars for all dose responses are consistent throughout all the Figures. They are different from those on bar graphs to improve clarity. If the Reviewer wishes to have the error bars on the bar graphs and dose response the same, we are happy to do so. 

      For the inhibitor studies the data were normalized on a per frog basis to control for variability in the maturation rate in response to P4, which varies from frog to frog. It is thus not possible to add error bars for the controls.
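      The reason per-frog normalization leaves no spread in the controls can be shown with a short sketch. The data layout, the function name `normalize_to_control`, and the GVBD percentages below are hypothetical and serve only to illustrate the arithmetic; they are not values from the study.

```python
from statistics import pstdev


def normalize_to_control(gvbd_by_frog):
    """Express each condition as a percentage of the same frog's control.

    gvbd_by_frog maps a frog label to {condition: raw % GVBD}; every frog
    must have a "control" entry (hypothetical data layout).
    """
    normalized = {}
    for frog, values in gvbd_by_frog.items():
        control = values["control"]
        if control <= 0:
            raise ValueError(f"control maturation for {frog} must be positive")
        normalized[frog] = {
            condition: 100.0 * value / control
            for condition, value in values.items()
        }
    return normalized


# Hypothetical raw maturation rates that differ between frogs.
raw = {
    "frog_A": {"control": 80.0, "inhibitor": 40.0},
    "frog_B": {"control": 50.0, "inhibitor": 20.0},
}
norm = normalize_to_control(raw)
control_values = [norm[frog]["control"] for frog in norm]
control_spread = pstdev(control_values)
```

      Because each control is divided by itself, every normalized control equals 100% and its spread across frogs is zero by construction, while the treated conditions retain frog-to-frog variability.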

      (4) Please check the sentence "However, the concentration of HA130...... higher that......'; Change "IC50" to "IC<sub>50</sub>" in the text and tables. Table 1 lists IC50 values in the literature, but the references are not cited. Please include the references properly. For the IC50 value obtained in the research, please include the standard deviation in the table. For reference parts, Ref 1, 27, 32, 46, doublecheck the title format.

      We edited the sentence as follows for clarity: “However, this inhibition of maturation required high concentrations of HA130 - at least 3 orders of magnitude higher than the reported HA130 IC<sub>50</sub> -…”

      We changed IC50 to subscript in Table 1.

      We added the relevant references in Table 1 to provide context for the cited IC50 values for the different inhibitors used.

      We added SEM to the IC<sub>50</sub> for inhibition of oocyte maturation values in Table 1.

      We checked the titles on the mentioned references and cannot identify any problems.

      References

      Cui, Y., Prokin, I., Xu, H., Delord, B., Genet, S., Venance, L., and Berry, H. (2016). Endocannabinoid dynamics gate spike-timing dependent depression and potentiation. eLife 5, e13185.

      Nader, N., Dib, M., Hodeify, R., Courjaret, R., Elmi, A., Hammad, A.S., Dey, R., Huang, X.Y., and Machaca, K. (2020). Membrane progesterone receptor induces meiosis in Xenopus oocytes through endocytosis into signaling endosomes and interaction with APPL1 and Akt2. PLoS Biol 18, e3000901.

      Smith, M., Wilson, R., O'Brien, S., Tufarelli, C., Anderson, S.I., and O'Sullivan, S.E. (2015). The Effects of the Endocannabinoids Anandamide and 2-Arachidonoylglycerol on Human Osteoblast Proliferation and Differentiation. PLoS One 10, e0136546.

    1. eLife Assessment

      This work is of fundamental significance to the field of nervous system development as it advances our mechanistic understanding of axon guidance. The rigorous biochemical and genetic approaches are compelling, experiments are well-controlled, and the major claims are supported by convincing data. The study should be of general interest to the developmental neurobiology community.

    2. Reviewer #1 (Public review):

      Summary:

      This study is focused on an important aspect of axon guidance at the central nervous system (CNS) midline: how neurons extend axons that either do or do not cross the CNS midline. The authors here address contradictory work in the field relating to how cell surface expression of the slit receptor Robo1 is regulated so as to generate crossed and non-crossed axon trajectories during Drosophila neural development. They use fly genetics, cell lines, and biochemical assessments to define a complex consisting of the commissureless, Nedd4 and Robo1 proteins necessary for regulating Robo1 protein expression. This work resolves certain remaining questions in the field regarding midline axon guidance, with strengths outweighing weaknesses; however, addressing some of these weaknesses would strengthen this study.

      Strengths:

      Strengths include:
      - The use of well-controlled genetic gain-of-function (overexpression) approaches in vivo in Drosophila to show that phosphorylation sites (there are 2, and this study allows for assessment of the contributions made by each) in the commissureless (Comm) protein are indeed required for Comm function with respect to regulating axon midline guidance via their role in directing Comm-mediated Robo1 ubiquitination and degradation in the lysosome.
      - The demonstration that in vitro, and in a sensitized genetic background in vivo, the Nedd4 ubiquitin ligase regulates Robo1 protein cell surface distribution and also midline axon crossing in vivo.
      - Important evidence here that serves to resolve many questions raised by previous studies (not from these authors) regarding how Robo1 is regulated by Comm and Nedd4 family ubiquitin ligases. Further, these results are likely to have implications for thinking about the regulation of midline guidance in more complex nervous systems.

      Weaknesses:

      - A weakness beyond the purview of revision but important to mention is that the authors chose not to complement their GOF experiments with gene editing approaches to generate endogenous PY mutant alleles of Comm that might have been useful in genetic interaction experiments directed toward revealing roles for endogenous Comm in the regulation of Robo1.

      Comments on revised version:

      In this revised manuscript the authors provide new experiments and also reasonable explanations to address concerns raised in the initial review. I am satisfied that these efforts address satisfactorily the points raised in the initial review and that this study has been strengthened. This is an interesting body of work that adds to our understanding of CNS midline guidance molecular mechanisms.

    3. Reviewer #2 (Public review):

      Summary:

      Sullivan and Bashaw delve into the mechanisms that drive neural circuit assembly, and specifically, into the regulation of cell surface proteins that mediate axon pathfinding. During nervous system development, axons must traverse a molecularly and physically complex extracellular milieu to reach their synaptic targets. A fundamental, conserved repulsive signaling pathway is initiated by the Slit-Robo ligand-receptor pair. Robo, expressed on axon growth cones, binds Slit, secreted by midline cells, to prevent "pre-crossing" and "re-crossing" of axons at the midline. To control this repulsion, Robo surface levels are tightly regulated. In Drosophila, Commissureless (Comm) downregulates Robo surface levels and is required for axon crossing at the midline. Several studies suggest that PY motifs in Comm are required to localize Robo to endosomes. PY motifs have been shown to bind WW-domain containing proteins including the ubiquitin ligase Nedd4 family, so the authors propose that Comm may regulate Robo through Nedd4 interactions. Previous studies have hinted at a role for Nedd4-mediated ubiquitination of Comm in regulation of Robo localization, but there have also been conflicting data. For example, Comm mutants that are unable to be ubiquitinated mimic wild-type Comm, suggesting that ubiquitination of Comm is not required for regulation of Robo. The current study utilizes a suite of genetic analyses in Drosophila to resolve discrepancies pertaining to the mode of Comm-dependent regulation of Robo1 and proposes that Comm acts as an adapter for the Nedd4 ubiquitin ligase to recognize Robo1 as a substrate. The authors also demonstrate that Nedd4 is indeed required for midline crossing.

      Strengths:

      While this work is more incremental than field-shifting, it is nonetheless an excellent example of a rigorous, thorough analysis that culminates in enriching our mechanistic understanding of how neurons regulate cell-surface receptors in a spatiotemporal manner to control fundamental steps of circuit wiring. The experimental approach is thorough, and the manuscript is extremely well-written.

      Weaknesses:

      Some key experiments (e.g., complex formation) were performed in cell culture in an overexpression background. However, updated experiments demonstrated complex formation using immunoprecipitation in tissues overexpressing the corresponding components. Also, there was a missed opportunity to bolster the proposed model by using Comm PY mutants in several experiments.

      Comments on revised version:

      The revised manuscript bolsters the authors' conclusions and now provides evidence for interactions in tissue. No additional experiments are needed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Response to Editor and Reviewer Comments:

      Many thanks to the editor and reviewers for the thoughtful assessment of our manuscript “Commissureless acts as a substrate adapter in a conserved Nedd4 E3 ubiquitin ligase pathway to promote axon growth across the midline.” Thank you also for the positive comments about the quality of our writing, and for deeming our study rigorous and thorough. We are very pleased that, overall, you believe our combination of genetic and biochemical approaches offers useful insight into the mechanism of Robo regulation at the Drosophila embryonic midline and effectively reconciles the contradictory findings of previous studies done in this field.

      Response to the previous Public Reviews:

      We appreciate the concerns expressed by the reviewers and the suggestions of areas in which the study and manuscript could be improved. The reviewer suggestions were very helpful as we revised our manuscript in order to strengthen our mechanistic understanding of Robo downregulation and better characterize the role Nedd4 plays in this process. We strongly agree with Reviewer 1 that our insight into the mechanism of Robo downregulation via Comm would be much stronger had we not solely relied on overexpression experiments to investigate the effects of PY motif mutations on Comm function. While it is outside the scope of this particular paper, we appreciate your suggestion to use gene editing to investigate the role of PY motif mutation on endogenous comm function and believe this would be a useful question to address in future papers. In addition to this concern, both reviewers identified additional opportunities to strengthen the paper. We have done our best to incorporate reviewer suggestions and will outline how we addressed the following four areas that were identified by both reviewers as areas where additional data could strengthen our conclusions:

      (1) Additional experiments to examine Comm and Robo1 localization in vivo: Characterizing Robo localization in vivo when co-expressed with PY-mutant Comm variants.

      (2) Testing biochemical interactions in embryonic protein extracts: Examining the biochemical interaction between Robo, Comm, and Nedd4 in a more biologically relevant context than cell culture.

      (3) Additional genetic interaction experiments: A) Investigating whether Nedd4 overexpression enhances the Comm G.O.F phenotype of enhanced ectopic crossing. B) Testing for additional genetic interactions with comm.

      (4) Editing the text of the manuscript for clarity.

      (1) Characterizing Robo localization in vivo when co-expressed with Comm variants.

      In the first version of our manuscript, we characterized the localization of wild-type and PY mutant Comm variants expressed in apterous neurons (Figure 5C), but did not examine how these variants of Comm affected localization of their cargo Robo1. To address this gap, we co-expressed 10X UAS Comm-myc (WT, 1PY, 2PY) with 10X UAS Robo-HA under the ap gal4 driver, visualized Comm and Robo by immunostaining for Myc and HA, and measured colocalization between Comm and Robo. We found that Robo colocalizes equally with all comm variants and that its expression pattern mimics that of the Comm variant with which it is expressed. We observe that Robo is restricted to cell bodies when overexpressed with WT Comm but “leaks out” into axons when co-expressed with Comm 1PY or 2PY. This finding suggests that PY motifs are not only required for effective Comm localization to the appropriate cellular areas, but also for proper routing of its cargo, Robo1. These new data are presented in a new supplemental figure: Figure S3.

      (2) Examining the biochemical interaction between Robo, Comm, and Nedd4 in vivo.

      To examine biochemical interaction between Comm, Robo, and Nedd4 in a more biologically relevant context, we performed immunoprecipitations in fly embryonic lysate prepared from the following categories: WT, elav gal4: 5X UAS Comm-myc WT, and elav gal4: 5X UAS Comm-myc WT + 10X UAS Nedd4-HA. We performed immunoprecipitation for myc (Comm), and blotted for endogenous Robo, Myc (Comm), and HA (Nedd4). Corroborating our results in cell culture (Figure 7 A-C), we were able to pull down a three-protein complex consisting of Comm, Nedd4 and Robo in embryonic fly tissue. These new data are presented in a new supplemental figure: Figure S8.

      (3) Investigating additional genetic interactions between Comm and Nedd4.

      A) In our submitted manuscript, we demonstrated that overexpression of Nedd4 enhances Comm-induced downregulation of Robo levels (Figure 7 D-G). To determine whether Nedd4 also increases ectopic crossing, which is a morphological output of Comm activity/Robo downregulation, we analyzed nerve cord phenotypes in embryos from the following categories: WT, embryos expressing WT Comm under the elav gal4, and embryos co-expressing WT Comm and Nedd4 under the elav gal4 driver. We measured nerve cord widths and sorted them into three different “bins” of phenotypic severity, with more severe phenotypes being characterized by thinner nerve cords. We find that the distribution of phenotypes in embryos expressing Comm alone differs significantly from embryos expressing Comm + Nedd4, with the latter shifted toward more severe/thinner phenotypic classes. In addition to examining nerve cord width, we investigated whether Nedd4 can enhance collapse of the nerve cord segments (defined by loss of negative space within the segment) induced by Comm overexpression. We determined percentage of collapsed nerve cord segments and divided these values into three phenotypic classes: no collapse, partial collapse, and total collapse. The distribution of phenotypes in embryos co-expressing Nedd4 and Comm differs significantly from those expressing Comm alone. In the Comm expressing population, we only observe nerve cords with no or partial collapse, but in flies co-expressing Comm and Nedd4 we observe the more severe complete collapse phenotype. These findings suggest that addition of Nedd4 enhances the Comm gain of function phenotype both by further reducing nerve cord width and increasing the occurrence of defects related to ectopic crossing. These new data are presented in a new supplemental figure: Figure S9.
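      The binning of collapse phenotypes described above can be illustrated with a short sketch. The class boundaries (0% for no collapse, 100% for total collapse, anything in between as partial), the function names, and the per-embryo percentages are assumptions made for illustration; the response does not state the exact cutoffs, and the statistical comparison of the distributions is not covered here.

```python
from collections import Counter

CLASSES = ("no collapse", "partial collapse", "total collapse")


def classify_collapse(percent_collapsed):
    """Assign an embryo to a phenotypic class from its % collapsed segments.

    Cutoffs are hypothetical: 0% -> none, 100% -> total, otherwise partial.
    """
    if not 0 <= percent_collapsed <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_collapsed == 0:
        return "no collapse"
    if percent_collapsed == 100:
        return "total collapse"
    return "partial collapse"


def phenotype_distribution(percentages):
    """Count embryos per class, reporting zero for classes that do not occur."""
    counts = Counter(classify_collapse(p) for p in percentages)
    return {name: counts.get(name, 0) for name in CLASSES}


# Hypothetical per-embryo percentages of collapsed nerve cord segments.
comm_only = [0, 0, 25, 50, 0, 10]
comm_plus_nedd4 = [0, 50, 100, 100, 75, 100]

dist_comm = phenotype_distribution(comm_only)
dist_comm_nedd4 = phenotype_distribution(comm_plus_nedd4)
```

      In this made-up example the Comm-only group contains no embryos in the total-collapse class, whereas the Comm + Nedd4 group does, mirroring the qualitative shift toward more severe classes described above.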

      B) The reviewers also suggested additional genetic interaction experiments between Nedd4 and Comm. It was suggested that we include experiments to look at Nedd4 manipulations in Comm null mutant backgrounds. However, given the complete penetrance and expressivity of the Comm null mutation, in which no axons cross the midline, these experiments would not be informative. As an alternative, we attempted to use the described hypomorphic Comm allele, but here too, the baseline commissural axon guidance defects are too strong to allow meaningful detection of enhanced phenotypes. Finally, we tested whether removing one copy of comm could reveal phenotypes in the nedd4 zygotic mutants, but we did not detect defects. This is perhaps unsurprising given that comm heterozygotes have no detectable midline crossing defects.

      (4) Text edits.

      We have made a variety of changes to decrease ambiguity in the text and create a more user-friendly experience for the reader. In the text, as opposed to just the figures, we now explicitly state whether we use 5X or 10X UAS constructs for each of our overexpression constructs. We also edited all mentions of the truncated frazzled construct (FraDc) so that they are uniform. We have also edited all mentions of MiMIC so that they are uniform. In addition, we answer a few questions the reviewers posed. First, we clarify that S2R+ cells express endogenous Comm at very low levels. Second, we clarify how we know that expression levels are similar across the three Comm variants by explaining that the transgenes were incorporated into the fly genome by targeted insertion into the same location on the third chromosome.

      We hope that these changes adequately address reviewer concerns, strengthen our study, and enhance readability of the paper. We appreciate the time you took to evaluate our manuscript and the thoughtful commentary and suggestions that you provided.

    1. eLife Assessment

      The paper presents a new behavioral assay for Drosophila aggression and demonstrates that social experience influences fighting strategies, with group-housed males favoring high-intensity but low-frequency tussling over aggressive lunging observed in isolated males. This paper is valuable for researchers studying Drosophila social behaviors, while the characterization of behavioral and neuroanatomical data is incomplete.

    2. Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating: prior social isolation is known to increase aggression in males by increased lunging, which is suppressed by group housing (GH). However, it is also known that single-housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al. developed a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, over a virgin female which is immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low-frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA-sensing Or67d neurons in promoting high-frequency lunging, similar to earlier studies, whereas Or47b neurons promote low-frequency but higher intensity tussling. Using optogenetic activation, they found that three pairs of pC1 neurons (pC1SS2) increase tussling. While P1a neurons, previously implicated in promoting aggression and courtship, did not increase tussling in optogenetic activation (in the dark), they could promote aggressive tussling in thermogenetic activation carried out in the presence of visible light. It was further suggested, using a further modified aggression assay, that GH males use increased tussling and are able to maintain territorial control, providing them a mating advantage over SH males, and this may partially overcome the effect of aging in GH males.

      Strengths:

      Using a series of clever neurogenetic and behavioral approaches, subsets of ORNs and pC1 neurons were implicated in promoting tussling behaviors. The authors devised a new paradigm to assay for territory control which appears better than earlier paradigms that used a food cup (Chen et al, 2002), as this new assay is relatively clutter-free, and can be eventually automated using computer vision approaches. The manuscript is generally well-written, and the claims made are largely supported by the data.

      Weaknesses:

      I have a few concerns regarding some of the evidence presented and claims made as well as a description of the methodology, which needs to be clarified and extended further.

      (1) Typical paradigms for assaying aggression in Drosophila males last for 20-30 minutes in the presence of nutritious food/yeast paste/females or all of these (Chen et al. 2002, Nilsen et al., 2004, Dierick et al. 2007, Dankert et al., 2009, Certel & Kravitz 2012). The paradigm described in Figure 1 A, while important and more amenable for video recording and computational analysis, seems a modification of the assay from Kravitz lab (Chen et al., 2002), which involved using a female over which males fight on a food cup. The modifications include a flat surface with a central food patch and a female with its head buried in the food, (fixed female) and much longer adaptation and recording times respectively (30 minutes, 2 hours), so in that sense, this is not a 'new' paradigm but a modification of an existing paradigm and its description as new should be appropriately toned down. It would also be important to cite these earlier studies appropriately while describing the assay.

      (2) Lunging is described as a 'low intensity' aggression (line 111 and associated text), however, it is considered a mid to high-intensity aggressive behavior, as compared to other lower-intensity behaviors such as wing flicks, chase, and fencing. Lunging therefore is lower in intensity 'relative' to higher intensity tussling but not in absolute terms and it should be mentioned clearly.

      (3) It is often difficult to distinguish faithfully between boxing and tussling and therefore, these behaviors are often clubbed together as 'box, tussle', as by Nilsen et al., 2004 in their Markov chain analysis as well as in a more detailed recent study of male aggression (Simon & Heberlein, 2020). Therefore, authors can either reconsider the description of behavior as 'box, tussle' or consider providing a video representation/computational classifier to distinguish between box and tussle behaviors.

      (4) Simon & Heberlein, 2020 showed that increased boxing & tussling precede the formation of a dominance hierarchy in males, and lunges are used subsequently to maintain this dominant status. This study should be cited and discussed appropriately while introducing the paradigm.

      (5) It would be helpful to provide more methodological details about the assay, for instance, a video can be helpful showing how the males are introduced in the assay chamber, are they simply dropped to the floor when the film is removed after 30 minutes (Figures 1-2)?

      (6) The strain of Canton-S (CS) flies used should be mentioned as different strains of CS can have varying levels of aggression, for instance, CS from Martin Heisenberg lab shows very high levels of aggressive lunges. Are the CS lines used in this study isogenized? Are various genetic lines outcrossed into this CS background? In the methods, it is not clear how the white gene levels were controlled for various aggression experiments as it is known to affect aggression (Hoyer et al. 2008).

      (7) How important is it to use a fixed female for the assay to induce tussling? Do these females remain active throughout the assay period of 2.5 hours? Is it possible to use decapitated virgin females for the assay? How will that affect male behaviors?

      (8) Raster plots in Figure 2 suggest a complete lack of tussling in SH males in the first 60 minutes of the encounter, which is surprising given the longer duration of the assay as compared to earlier studies (Nilsen et al. 2004, Simon & Heberlein, 2020 and others), which are able to pick up tussling in a shorter duration of recording time. Also, the duration for tussling is much longer in this study as compared to shorter tussles shown by earlier studies. Is this due to differences in the paradigm used, strain of flies, or some other factor? While the bar plots in Figure 2D show some tussling in SH males, maybe an analysis of raster plots of various videos can be provided in the main text and included as a supplementary figure to address this.

      (9) Neuronal activation experiments suggesting the involvement of pC1SS2 neurons are quite interesting. Further, P1a neurons were shown to increase tussling under thermogenetic activation in the presence of light (Figure 4, Supplement 1), which is quite important, as the role of vision in optogenetic activation experiments, which are required to be carried out in the dark, is often not mentioned. However, in the discussion (lines 309-310) it is mentioned that pC1SS2 neurons are 'necessary and sufficient' for inducing tussling. Given that P1a neurons were shown to be involved in promoting tussling, this statement should be toned down.

      (10) Are Or47b neurons connected to pC1SS2 or P1a neurons?

      (11) The paradigm for territory control is quite interesting and subsequent mating advantage experiments are an important addition to the eventual outcome of the aggressive strategy deployed by the males as per their prior housing conditions. It would be important to comment on the 'fitness outcome' of these encounters. For instance, is there any fitness advantage of using tussling by GH males as compared to lunging by SH males? The authors may consider analyzing the number of eggs laid and eclosed progenies from these encounters to address this.

    3. Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated the change of aggression strategies by the social experience and its biological significance by using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, high-frequency but weak mode, and tussling, low-frequency but more vigorous mode. Previous studies have mainly focused on the lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling, and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons pC1[SS2] that induce the tussling specifically. In order to further explore the ecological significance of the aggression mode change in group rearing, a new behavioral experiment was performed to examine territorial control and mating competition. Finally, the authors found that differences in the social experience (group vs. solitary rearing) are important in these biologically significant competitions. These results add a new perspective to the study of aggressive behavior in Drosophila. Furthermore, this study proposes an interesting general model in which the social experience-modified behavioral changes play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011, etc), the fact that the behavioral mode itself changes significantly has rarely been addressed and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of the neurobiology in this study is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes.

      Weaknesses:

      The experimental systems examining the territory control and the reproductive competition in Figure 5 are novel and have advantages in exploring their biological significance. However, at this stage, the authors' claim is weak since they only show the effects of age and social experience on territorial and mating behaviors, but do not experimentally demonstrate the influence of aggression mode change itself. In the Abstract, the authors state that these findings reveal how social experience shapes fighting strategies to optimize reproductive success. This is the most important perspective of the present study, and it would be necessary to show directly that the change of aggression mode by social experience contributes to reproductive success.

      In addition, a detailed description of the tussling is lacking. For example, the authors state that the tussling is less frequent but more vigorous than lunging, but while experimental data are presented on the frequency, the intensity seems to be subjective. The intensity is certainly clear from the supplementary video, but it would be necessary to evaluate the intensity itself using some index. Another problem is that there is no clear explanation of how to determine the tussling. A detailed method is required for the reproducibility of the experiment.

    4. Reviewer #3 (Public review):

      In this manuscript, Gao et al. presented a series of intriguing data that collectively suggest that tussling, a form of high-intensity fighting among male fruit flies (Drosophila melanogaster) has a unique function and is controlled by a dedicated neural circuit. Based on the results of behavioral assays, they argue that increased tussling among socially experienced males promotes access to resources. They also concluded that tussling is controlled by a class of olfactory sensory neurons and sexually dimorphic central neurons that are distinct from pathways known to control lunges, a common male-type attack behavior.

      A major strength of this work is that it is the first attempt to characterize the behavioral function and neural circuit associated with Drosophila tussling. Many animal species use both low-intensity and high-intensity tactics to resolve conflicts. High-intensity tactics are mostly reserved for escalated fights, which are relatively rare. Because of this, tussling in the flies, like high-intensity fights in other animal species, has not been systematically investigated. Previous studies on fly aggressive behavior have often used socially isolated, relatively young flies within a short observation duration. Their discovery that 1) older (14-days-old) flies tend to tussle more often than younger (2-days-old) flies, 2) group-reared flies tend to tussle more often than socially isolated flies, and 3) flies tend to tussle at a later stage (mostly ~15 minutes after the onset of fighting), are the result of their creativity to look outside of conventional experimental settings. These new findings are keys for quantitatively characterizing this interesting yet under-studied behavior.

      Precisely because their initial approach was creative, it is regrettable that the authors missed the opportunity to effectively integrate preceding studies in their rationale or conclusions, which sometimes led to premature claims. Also, while each experiment contains an intriguing finding, these are poorly related to each other. This obscures the central conclusion of this work. The perceived weaknesses are discussed in detail below.

      Most importantly, the authors' definition of "tussling" is unclear because they did not explain how they quantified lunges and tussling, even though the central focus of the manuscript is behavior. Supplemental movies S1 and S2 appear to include "tussling" bouts in which 2 flies lunge at each other in rapid succession, and supplemental movie S3 appears to include bouts of "holding", in which one fly holds the opponent's wings and shakes vigorously. These cases raise a concern that their behavior classification is arbitrary. Specifically, lunges and tussling should be objectively distinguished because one of their conclusions is that these two actions are controlled by separate neural circuits. It is impossible to evaluate the credibility of their behavioral data without clearly describing a criterion of each behavior.

      It is also confusing that the authors completely skipped the characterization of the tussling-controlling neurons they claimed to have identified. These neurons (a subset of so-called pC1 neurons labeled by previously described split-GAL4 line pC1SS2) are central to this manuscript, but the only information the authors have provided is its gross morphology in a low-resolution image (Figure 4D, E) and a statement that "only 3 pairs of pC1SS2 neurons whose function is both necessary and sufficient for inducing tussling in males" (lines 310-311). The evidence that supports this claim isn't provided. The expression pattern of pC1SS2 neurons in males has been only briefly described in reference 46. It is possible that these neurons overlap with previously characterized dsx+ and/or fru+ neurons that are important for male aggressions (measured by lunges), such as in Koganezawa et al., Curr. Biol. 2016 and Chiu et al., Cell 2020. This adds to the concern that lunge and tussling are not as clearly separated as the authors claim.

      While their characterizations of tussling behaviors in wild-type males (Figures 1 and 2) are intriguing, the remaining data have little link with each other, making it difficult to understand what their main conclusion is. Figure 3 suggests that one class of olfactory sensory neurons (OSN) that express Or47b is necessary for tussling behavior. While the authors acknowledged that Or47b-expressing OSNs promote male courtship toward females presumably by detecting cuticular compounds, they provided little discussion on how a class of OSN can promote two different types of innate behavior. No evidence of a functional or circuitry relationship between the Or47b pathway and the pC1SS2 neurons was provided. It is unclear how these two components are relevant to each other. Lastly, the rationale of the experiment in Figure 5 and the interpretation of the results is confusing. The authors attributed a higher mating success rate of older, socially experienced males over younger, socially isolated males to their tendency to tussle, but tussling cannot happen when one of the two flies is not engaged. If, for instance, a socially isolated 14-day-old male does not engage in tussling as indicated in Figure 2, how can they tussle with a group-housed 14-day-old male? Because aggressive interactions in Figure 5 were not quantified, it is impossible to conclude that tussling plays a role in copulation advantage among pairs as authors argue (lines 282-288).

      Despite these weaknesses, it is important to acknowledge the authors' courage to initiate an investigation into a less characterized, high-intensity fighting behavior. Tussling requires the simultaneous engagement of two flies. Even if there is confusion over the distinction between lunges and tussling, the authors' conclusion that socially experienced flies and socially isolated flies employ distinct fighting strategies is convincing. Questions that require more rigorous studies are 1) whether such differences are encoded by separate circuits, and 2) whether the different fighting strategies are causally responsible for gaining ethologically relevant resources among socially experienced flies. Enhanced transparency of behavioral data will help readers understand the impact of this study. Lastly, the manuscript often mentions previous works and results without citing relevant references. For readers to grasp the context of this work, it is important to provide information about methods, reagents, and other key resources.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating: prior social isolation is known to increase aggression in males by increased lunging, which is suppressed by group housing (GH). However, it is also known that single-housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al. developed a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, over a virgin female which is immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low-frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA-sensing Or67d neurons in promoting high-frequency lunging, similar to earlier studies, whereas Or47b neurons promote low-frequency but higher intensity tussling. Using optogenetic activation, they found that three pairs of pC1 neurons (pC1SS2) increase tussling. While P1a neurons, previously implicated in promoting aggression and courtship, did not increase tussling in optogenetic activation (in the dark), they could promote aggressive tussling in thermogenetic activation carried out in the presence of visible light. It was further suggested, using a further modified aggression assay, that GH males use increased tussling and are able to maintain territorial control, providing them a mating advantage over SH males, and this may partially overcome the effect of aging in GH males.

      Strengths:

      Using a series of clever neurogenetic and behavioral approaches, subsets of ORNs and pC1 neurons were implicated in promoting tussling behaviors. The authors devised a new paradigm to assay for territory control which appears better than earlier paradigms that used a food cup (Chen et al, 2002), as this new assay is relatively clutter-free, and can be eventually automated using computer vision approaches. The manuscript is generally well-written, and the claims made are largely supported by the data.

      Thank you for your precise summary of our study and for your positive assessment of its novelty and significance.

      Weaknesses:

      I have a few concerns regarding some of the evidence presented and claims made as well as a description of the methodology, which needs to be clarified and extended further.

      (1) Typical paradigms for assaying aggression in Drosophila males last for 20-30 minutes in the presence of nutritious food/yeast paste/females or all of these (Chen et al. 2002, Nilsen et al., 2004, Dierick et al. 2007, Dankert et al., 2009, Certel & Kravitz 2012). The paradigm described in Figure 1 A, while important and more amenable for video recording and computational analysis, seems a modification of the assay from Kravitz lab (Chen et al., 2002), which involved using a female over which males fight on a food cup. The modifications include a flat surface with a central food patch and a female with its head buried in the food, (fixed female) and much longer adaptation and recording times respectively (30 minutes, 2 hours), so in that sense, this is not a 'new' paradigm but a modification of an existing paradigm and its description as new should be appropriately toned down. It would also be important to cite these earlier studies appropriately while describing the assay.

      We will tone down the description and cite related references.

      (2) Lunging is described as a 'low intensity' aggression (line 111 and associated text), however, it is considered a mid to high-intensity aggressive behavior, as compared to other lower-intensity behaviors such as wing flicks, chase, and fencing. Lunging therefore is lower in intensity 'relative' to higher intensity tussling but not in absolute terms and it should be mentioned clearly.

      We will textually address this issue.

      (3) It is often difficult to distinguish faithfully between boxing and tussling and therefore, these behaviors are often clubbed together as 'box, tussle', as by Nilsen et al., 2004 in their Markov chain analysis as well as in a more detailed recent study of male aggression (Simon & Heberlein, 2020). Therefore, authors can either reconsider the description of behavior as 'box, tussle' or consider providing a video representation/computational classifier to distinguish between box and tussle behaviors.

      We will textually address this issue.

      (4) Simon & Heberlein, 2020 showed that increased boxing & tussling precede the formation of a dominance hierarchy in males, and lunges are used subsequently to maintain this dominant status. This study should be cited and discussed appropriately while introducing the paradigm.

      We will cite this paper and discuss this issue.

      (5) It would be helpful to provide more methodological details about the assay, for instance, a video can be helpful showing how the males are introduced in the assay chamber, are they simply dropped to the floor when the film is removed after 30 minutes (Figures 1-2)?

      We will provide more methodological details.

      (6) The strain of Canton-S (CS) flies used should be mentioned as different strains of CS can have varying levels of aggression, for instance, CS from Martin Heisenberg lab shows very high levels of aggressive lunges. Are the CS lines used in this study isogenized? Are various genetic lines outcrossed into this CS background? In the methods, it is not clear how the white gene levels were controlled for various aggression experiments as it is known to affect aggression (Hoyer et al. 2008).

      We will textually address this issue.

      (7) How important is it to use a fixed female for the assay to induce tussling? Do these females remain active throughout the assay period of 2.5 hours? Is it possible to use decapitated virgin females for the assay? How will that affect male behaviors?

      We will textually address this issue and provide additional videos.

      (8) Raster plots in Figure 2 suggest a complete lack of tussling in SH males in the first 60 minutes of the encounter, which is surprising given the longer duration of the assay as compared to earlier studies (Nilsen et al. 2004, Simon & Heberlein, 2020 and others), which are able to pick up tussling in a shorter duration of recording time. Also, the duration for tussling is much longer in this study as compared to shorter tussles shown by earlier studies. Is this due to differences in the paradigm used, strain of flies, or some other factor? While the bar plots in Figure 2D show some tussling in SH males, maybe an analysis of raster plots of various videos can be provided in the main text and included as a supplementary figure to address this.

      We will textually address the first question and provide more detailed analysis for the second question.

      (9) Neuronal activation experiments suggesting the involvement of pC1SS2 neurons are quite interesting. Further, P1a neurons were shown to increase tussling under thermogenetic activation in the presence of light (Figure 4, Supplement 1), which is quite important, as the role of vision in optogenetic activation experiments, which are required to be carried out in the dark, is often not mentioned. However, in the discussion (lines 309-310) it is mentioned that pC1SS2 neurons are 'necessary and sufficient' for inducing tussling. Given that P1a neurons were shown to be involved in promoting tussling, this statement should be toned down.

      We will tone down this statement.

      (10) Are Or47b neurons connected to pC1SS2 or P1a neurons?

      We conducted pathway analysis in the FlyWire electron microscopy database to investigate the connection between Or47b neurons and pC1 neurons. The results indicate that at least three intermediate neurons are required to establish a connection from Or47b neurons to pC1 neurons. Although the FlyWire database currently only contains neuronal data from female brains, it provides a reference for circuit connectivity in males. Using the currently available upstream and downstream tracing tools (e.g., retro-/trans-Tango), it is not possible to establish a direct connection between the two. Identifying the intermediate neurons involved in this connection is beyond the scope of this study. We will discuss this concern in our revised manuscript.

      (11) The paradigm for territory control is quite interesting and subsequent mating advantage experiments are an important addition to the eventual outcome of the aggressive strategy deployed by the males as per their prior housing conditions. It would be important to comment on the 'fitness outcome' of these encounters. For instance, is there any fitness advantage of using tussling by GH males as compared to lunging by SH males? The authors may consider analyzing the number of eggs laid and eclosed progenies from these encounters to address this.

      We will discuss this concern.

      Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated the change of aggression strategies by the social experience and its biological significance by using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, high-frequency but weak mode, and tussling, low-frequency but more vigorous mode. Previous studies have mainly focused on the lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling, and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons pC1[SS2] that induce the tussling specifically. In order to further explore the ecological significance of the aggression mode change in group rearing, a new behavioral experiment was performed to examine territorial control and mating competition. Finally, the authors found that differences in the social experience (group vs. solitary rearing) are important in these biologically significant competitions. These results add a new perspective to the study of aggressive behavior in Drosophila. Furthermore, this study proposes an interesting general model in which the social experience-modified behavioral changes play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011, etc.), the fact that the behavioral mode itself changes significantly has rarely been addressed and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of the neurobiology in this study is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes.

      Thank you for the acknowledgment of the novelty and significance of the study, and your suggestions for improving the manuscript.

      Weaknesses:

      The experimental systems examining the territory control and the reproductive competition in Figure 5 are novel and have advantages in exploring their biological significance. However, at this stage, the authors' claim is weak since they only show the effects of age and social experience on territorial and mating behaviors, but do not experimentally demonstrate the influence of aggression mode change itself. In the Abstract, the authors state that these findings reveal how social experience shapes fighting strategies to optimize reproductive success. This is the most important perspective of the present study, and it would be necessary to show directly that the change of aggression mode by social experience contributes to reproductive success.

      We will either tone down this statement or provide additional analysis.

      In addition, a detailed description of the tussling is lacking. For example, the authors state that the tussling is less frequent but more vigorous than lunging, but while experimental data are presented on the frequency, the intensity seems to be subjective. The intensity is certainly clear from the supplementary video, but it would be necessary to evaluate the intensity itself using some index. Another problem is that there is no clear explanation of how to determine the tussling. A detailed method is required for the reproducibility of the experiment.

      We will provide more detailed methods and data analysis regarding tussling behavior.

      Reviewer #3 (Public review):

      In this manuscript, Gao et al. presented a series of intriguing data that collectively suggest that tussling, a form of high-intensity fighting among male fruit flies (Drosophila melanogaster) has a unique function and is controlled by a dedicated neural circuit. Based on the results of behavioral assays, they argue that increased tussling among socially experienced males promotes access to resources. They also concluded that tussling is controlled by a class of olfactory sensory neurons and sexually dimorphic central neurons that are distinct from pathways known to control lunges, a common male-type attack behavior.

      A major strength of this work is that it is the first attempt to characterize the behavioral function and neural circuit associated with Drosophila tussling. Many animal species use both low-intensity and high-intensity tactics to resolve conflicts. High-intensity tactics are mostly reserved for escalated fights, which are relatively rare. Because of this, tussling in the flies, like high-intensity fights in other animal species, has not been systematically investigated. Previous studies on fly aggressive behavior have often used socially isolated, relatively young flies within a short observation duration. Their discovery that 1) older (14-days-old) flies tend to tussle more often than younger (2-days-old) flies, 2) group-reared flies tend to tussle more often than socially isolated flies, and 3) flies tend to tussle at a later stage (mostly ~15 minutes after the onset of fighting), are the result of their creativity to look outside of conventional experimental settings. These new findings are key for quantitatively characterizing this interesting yet under-studied behavior.

      Precisely because their initial approach was creative, it is regrettable that the authors missed the opportunity to effectively integrate preceding studies in their rationale or conclusions, which sometimes led to premature claims. Also, while each experiment contains an intriguing finding, these are poorly related to each other. This obscures the central conclusion of this work. The perceived weaknesses are discussed in detail below.

      Thank you for the precise summary of the key findings and novelty of the study, and your insightful suggestions.

      Most importantly, the authors' definition of "tussling" is unclear because they did not explain how they quantified lunges and tussling, even though the central focus of the manuscript is behavior. Supplemental movies S1 and S2 appear to include "tussling" bouts in which 2 flies lunge at each other in rapid succession, and supplemental movie S3 appears to include bouts of "holding", in which one fly holds the opponent's wings and shakes vigorously. These cases raise a concern that their behavior classification is arbitrary. Specifically, lunges and tussling should be objectively distinguished because one of their conclusions is that these two actions are controlled by separate neural circuits. It is impossible to evaluate the credibility of their behavioral data without clearly describing a criterion of each behavior.

      We will add more details in methods.

      It is also confusing that the authors completely skipped the characterization of the tussling-controlling neurons they claimed to have identified. These neurons (a subset of so-called pC1 neurons labeled by previously described split-GAL4 line pC1SS2) are central to this manuscript, but the only information the authors have provided is its gross morphology in a low-resolution image (Figure 4D, E) and a statement that "only 3 pairs of pC1SS2 neurons whose function is both necessary and sufficient for inducing tussling in males" (lines 310-311). The evidence that supports this claim isn't provided. The expression pattern of pC1SS2 neurons in males has been only briefly described in reference 46. It is possible that these neurons overlap with previously characterized dsx+ and/or fru+ neurons that are important for male aggressions (measured by lunges), such as in Koganezawa et al., Curr. Biol. 2016 and Chiu et al., Cell 2020. This adds to the concern that lunge and tussling are not as clearly separated as the authors claim.

      Reply: we will perform additional morphological and functional experiments on pC1<sup>SS2</sup> neurons, e.g., whether they are fru or dsx positive and comparing them with P1a neurons.

      While their characterizations of tussling behaviors in wild-type males (Figures 1 and 2) are intriguing, the remaining data have little link with each other, making it difficult to understand what their main conclusion is. Figure 3 suggests that one class of olfactory sensory neurons (OSN) that express Or47b is necessary for tussling behavior. While the authors acknowledged that Or47b-expressing OSNs promote male courtship toward females presumably by detecting cuticular compounds, they provided little discussion on how a class of OSN can promote two different types of innate behavior. No evidence of a functional or circuitry relationship between the Or47b pathway and the pC1SS2 neurons was provided. It is unclear how these two components are relevant to each other. Lastly, the rationale of the experiment in Figure 5 and the interpretation of the results is confusing. The authors attributed a higher mating success rate of older, socially experienced males over younger, socially isolated males to their tendency to tussle, but tussling cannot happen when one of the two flies is not engaged. If, for instance, a socially isolated 14-day-old male does not engage in tussling as indicated in Figure 2, how can they tussle with a group-housed 14-day-old male? Because aggressive interactions in Figure 5 were not quantified, it is impossible to conclude that tussling plays a role in copulation advantage among pairs as authors argue (lines 282-288).

      Regarding why Or47b-expressing OSNs regulate two types of innate behaviors, we will add a discussion in the revised manuscript to explore the possible mechanisms underlying this phenomenon.

      Regarding the relationship between Or47b-expressing OSNs and pC1<sup>SS2</sup> neurons, we conducted pathway connection analyses using the FlyWire database. Although the FlyWire database currently only contains neuronal data from female brains, these findings provide a certain degree of reference. The results indicate that at least three intermediate neurons are required to establish the connection between these two neuronal types. We hope the editor and reviewers will agree with us that identifying the intermediate neurons involved in this connection is beyond the scope of this study.

      Regarding the rationale and conclusions from the experiments in Figure 5, we acknowledge the difficulty in quantifying tussling and lunging behaviors in these experiments. In the revised manuscript, we will tone down the statements about the relationship between fighting strategies and reproductive success. Additionally, we will provide further behavioral experiments to support the association between these two factors.

      Despite these weaknesses, it is important to acknowledge the authors' courage to initiate an investigation into a less characterized, high-intensity fighting behavior. Tussling requires the simultaneous engagement of two flies. Even if there is confusion over the distinction between lunges and tussling, the authors' conclusion that socially experienced flies and socially isolated flies employ distinct fighting strategies is convincing. Questions that require more rigorous studies are 1) whether such differences are encoded by separate circuits, and 2) whether the different fighting strategies are causally responsible for gaining ethologically relevant resources among socially experienced flies. Enhanced transparency of behavioral data will help readers understand the impact of this study. Lastly, the manuscript often mentions previous works and results without citing relevant references. For readers to grasp the context of this work, it is important to provide information about methods, reagents, and other key resources.

      We will add more details in methods and cite additional references. We will also perform additional experiments on pC1<sup>SS2</sup> function.

    1. eLife Assessment

      This paper proposes a new source reconstruction method for electroencephalography (EEG) data and claims that it can provide far superior spatial resolution than existing approaches and even superior spatial resolution to fMRI. This primarily stems from abandoning the established quasi-static approximation to Maxwell's equations. If verified, the potential impact of the proposed method is very high indeed, but it is currently impossible to verify because the clarity of presentation and the evidence for the claims in the current version is inadequate.

    2. Reviewer #1 (Public Review):

      The paper proposes a new source reconstruction method for electroencephalography (EEG) data and claims that it can provide far superior spatial resolution than existing approaches and also superior spatial resolution to fMRI. This primarily stems from abandoning the established quasi-static approximation to Maxwell's equations.

      The proposed method brings together some very interesting ideas, and the potential impact is high. However, the work does not provide the evaluations expected when validating a new source reconstruction approach. I cannot judge the success or impact of the approach based on the current set of results. This is very important to rectify, especially given that the work is challenging some long-standing and fundamental assumptions made in the field.

      I also find the description of the methods, and how they link to what is shown in the main results, hard to follow.

      I am insufficiently familiar with the intricacies of Maxwell's equations to assess the validity of the assumptions and the equations being used by WETCOW. The work therefore needs assessing by someone more versed in that area. That said, how do we know that the new terms in Maxwell's equations, i.e. the time-dependent terms that are normally missing from established quasi-static-based approaches, are large enough to need to be considered? Where is the evidence for this?

      I have not come across EFD, and I am not sure many in the EEG field will have. To require the reader to appreciate the contributions of WETCOW only through the lens of the unfamiliar (and far from trivial) approach of EFD is frustrating. In particular, what impact do the assumptions of WETCOW make compared to the assumptions of EFD on the overall performance of SPECTRE?

      The paper needs to provide results showing the improvements obtained when WETCOW or EFD are combined with more established and familiar approaches. For example, EFD can be replaced by a first-order vector autoregressive (VAR) model, i.e. y_t = A y_{t-1} + e_t (where y_t is [num_gridpoints x 1] and A is [num_gridpoints x num_gridpoints] of autoregressive parameters).
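
      The first-order VAR model mentioned above can be sketched as follows. This is a minimal illustration only, not part of SPECTRE or of the manuscript under review: the function names `simulate_var1` and `fit_var1`, the 3-gridpoint matrix `A_true`, and the Gaussian noise model are all assumptions chosen for the example. It simulates y_t = A y_{t-1} + e_t and recovers A by ordinary least squares.

```python
import numpy as np

def simulate_var1(A, n_steps, noise_std=1.0, seed=0):
    """Simulate y_t = A y_{t-1} + e_t with Gaussian noise e_t (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    y = np.zeros((n_steps, n))
    for t in range(1, n_steps):
        y[t] = A @ y[t - 1] + noise_std * rng.standard_normal(n)
    return y

def fit_var1(y):
    """Estimate A by least squares from a [n_steps x num_gridpoints] time series."""
    past, present = y[:-1], y[1:]
    # Solve present ≈ past @ A.T in the least-squares sense.
    A_transposed, *_ = np.linalg.lstsq(past, present, rcond=None)
    return A_transposed.T

# Illustrative, stable autoregressive matrix for 3 gridpoints (an assumption).
A_true = np.array([[0.5, 0.1, 0.0],
                   [0.0, 0.4, 0.2],
                   [0.1, 0.0, 0.3]])
y = simulate_var1(A_true, n_steps=20000)
A_hat = fit_var1(y)
```

      In a real source-space setting, A would be [num_gridpoints x num_gridpoints] and would likely need regularisation or a sparsity constraint, since the number of parameters grows quadratically with the grid size.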

      The authors' decision not to include any comparisons with established source reconstruction approaches does not make sense to me. They attempt to justify this by saying that the spatial resolution of LORETA would need to be very low compared to the resolution being used in SPECTRE, to avoid compute problems. But how does this stop them from using a spatial resolution typically used by the field that has no compute problems, and comparing with that? This would be very informative. There are also more computationally efficient methods than LORETA that are very popular, such as beamforming or minimum norm.

      In short, something like the following methods needs to be compared:

      (1) Full SPECTRE (EFD plus WETCOW)
      (2) WETCOW + VAR or standard ("simple regression") techniques
      (3) Beamformer/min norm plus EFD
      (4) Beamformer/min norm plus VAR or standard ("simple regression") techniques

      This would also allow for more illuminating and quantitative comparisons of the real data. For example, a metric of similarity between EEG maps and fMRI can be computed to compare the performance of these methods. At the moment, the fMRI-EEG analysis amounts to just showing fairly similar maps.
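
      As one possible form such a similarity metric could take, the sketch below computes a spatial Pearson correlation and a Dice overlap of the most active locations. It assumes both maps have already been resampled onto the same grid; the function names and the `top_fraction` threshold are hypothetical choices for illustration, not something taken from the manuscript.

```python
import numpy as np

def spatial_correlation(map_a, map_b):
    """Pearson correlation between two maps defined on the same grid."""
    a = np.ravel(map_a).astype(float)
    b = np.ravel(map_b).astype(float)
    return float(np.corrcoef(a, b)[0, 1])

def dice_overlap(map_a, map_b, top_fraction=0.1):
    """Dice coefficient between the top `top_fraction` of values in each map."""
    a = np.ravel(map_a).astype(float)
    b = np.ravel(map_b).astype(float)
    # Threshold each map at its own quantile so that scale differences
    # between modalities do not affect the comparison.
    mask_a = a >= np.quantile(a, 1 - top_fraction)
    mask_b = b >= np.quantile(b, 1 - top_fraction)
    intersection = np.logical_and(mask_a, mask_b).sum()
    return float(2 * intersection / (mask_a.sum() + mask_b.sum()))
```

      Reporting such numbers for each of the methods listed above, rather than showing maps side by side, would make the EEG-fMRI comparison quantitative.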

      There are no results provided on simulated data. Simulations are needed to provide quantitative comparisons of the different methods, to show face validity, and to demonstrate unequivocally the new information that SPECTRE can _potentially_ provide on real data compared to established methods. The paper ideally needs at least 3 types of simulations, where one thing is changed at a time, e.g.:

      (1) Data simulated using WETCOW plus EFD assumptions
      (2) Data simulated using WETCOW plus e.g. VAR assumptions
      (3) Data simulated using standard lead fields (based on the quasi-static Maxwell solutions) plus e.g. VAR assumptions

      These should be assessed with the multiple methods specified earlier. Crucially the assessment should be quantitative showing the ability to recover the ground truth over multiple realisations of realistic noise. This type of assessment of a new source reconstruction method is the expected standard.

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript claims to present a novel method for direct imaging of electric field networks from EEG data with higher spatiotemporal resolution than even fMRI. Validation of the EEG reconstructions with EEG/fMRI, EEG, and iEEG datasets is presented. Subsequently, reconstructions from a large EEG dataset of subjects performing a gambling task are presented.

      Strengths:

      If true and convincing, the proposed theoretical framework and reconstruction algorithm can revolutionize the use of EEG source reconstructions.

      Weaknesses:

      There is very little actual information in the paper about either the forward model or the novel method of reconstruction. Only prior work by the authors is cited, with absolutely no benchmark comparisons, making the manuscript difficult to read and interpret in isolation from their prior body of work.

    1. eLife Assessment

      This important study reveals that disrupting fatty acid metabolism in macrophages significantly restricts the growth of Mycobacterium tuberculosis, showing that impaired lipid processing triggers various antimicrobial responses. Whilst the approach is robust, utilizing CRISPR-Cas9 knockout of multiple genes involved in lipid metabolism which yielded some convincing data, there are aspects that require improvement such as the autophagy assay and redox measurements. This work highlights how host lipid metabolism affects the ability of tubercle bacilli to thrive intracellularly, pointing to potential new therapeutic targets.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates the role of macrophage lipid metabolism in the intracellular growth of Mycobacterium tuberculosis. By using a CRISPR-Cas9 gene-editing approach, the authors knocked out key genes involved in fatty acid import, lipid droplet formation, and fatty acid oxidation in macrophages. Their results show that disrupting various stages of fatty acid metabolism significantly impairs the ability of Mtb to replicate inside macrophages. The mechanisms of growth restriction included increased glycolysis, oxidative stress, pro-inflammatory cytokine production, enhanced autophagy, and nutrient limitation. The study demonstrates that targeting fatty acid homeostasis at different stages of the lipid metabolic process could offer new strategies for host-directed therapies against tuberculosis.

      The work is convincing and methodologically strong, combining genetic, metabolic, and transcriptomic analyses to provide deep insights into how host lipid metabolism affects bacterial survival.

      Strengths:

      The study uses a multifaceted approach, including CRISPR-Cas9 gene knockouts, metabolic assays, and dual RNA sequencing, to assess how various stages of macrophage lipid metabolism affect Mtb growth. The use of CRISPR-Cas9 to selectively knock out key genes involved in fatty acid metabolism enables precise investigation of how each step (lipid import, lipid droplet formation, and fatty acid oxidation) affects Mtb survival. The study offers mechanistic insights into how different impairments in lipid metabolism lead to diverse antimicrobial responses, including glycolysis, oxidative stress, and autophagy. This deepens the understanding of macrophage function in immune defense.

      The use of functional assays to validate findings (e.g., metabolic flux analyses, lipid droplet formation assays, and rescue experiments with fatty acid supplementation) strengthens the reliability and applicability of the results.

      By highlighting potential targets for HDT that exploit macrophage lipid metabolism to restrict Mtb growth, the work has significant implications for developing new tuberculosis treatments.

      Weaknesses:

      The experiments were primarily conducted in vitro using CRISPR-modified macrophages. While these provide valuable insights, they may not fully replicate the complexity of the in vivo environment where multiple cell types and factors influence Mtb infection and immune responses.

    3. Reviewer #2 (Public review):

      Summary:

      Host-derived lipids are an important factor during Mtb infection. In this study, using CRISPR knockouts of genes involved in fatty acid uptake and metabolism, the authors claim that a compromised uptake, storage, or metabolism of fatty acid restricts Mtb growth upon infection. Further, the authors claim that the mechanism involves increased glycolysis, autophagy, oxidative stress, pro-inflammatory cytokines, and nutrient limitation. The authors also claim that impaired lipid droplet formation restricts Mtb growth. However, promoting lipid droplet biogenesis does not reverse/promote Mtb growth.

      Strengths:

      The strength of the study is the use of clean HOXB8-derived primary mouse macrophage lines for generating CRISPR knockouts.

      Weaknesses:

      There are many weaknesses in this study; they are clubbed into five categories below.

      (1) Evidence and interpretations: The results shown in this study at several places do not support the interpretations made or are internally contradictory or inconsistent. There are several important observations, but none were taken forward for in-depth analysis.
      a) The phenotypes of PLIN2-/-, FATP1-/-, and CPT-/- are comparable in terms of bacterial growth restriction; however, their phenotype in terms of lipid body formation, IL1B expression, etc., are not consistent. These are interesting observations and suggest additional mechanisms specific to specific target genes; however, clubbing them all as altered fatty acid uptake or catabolism-dependent phenotypes takes away this important point.
      b) Finding the FATP1 transcript in the HOXB8-derived FATP1-/- CRISPR KO line is a bit confusing. There is less than a two-fold decrease in relative transcript abundance in the KO line compared to the WT line, leaving concerns regarding the robustness of other experiments as well using FATP1-/- cells.
      c) No gene showing differential regulation in FATP-/- macrophages, which is very surprising.
      d) ROS measurements should be done using flow cytometry and not by microscopy to nail the actual pattern.

      (2) Experimental design: For a few assays, the experimental design is inappropriate
      a) For autophagy flux assay, immunoblot of LC3II alone is not sufficient to make any interpretation regarding the state of autophagy. This assay must be done with BafA1 or CQ controls to assess the true state of autophagy.
      b) Similarly, qPCR analyses of autophagy-related gene expression do not reflect anything on the state of autophagy flux.

      (3) Using correlative observations as evidence:
      a) Observations based on RNAseq analyses are presented as functional readouts, which is incorrect.
      b) Claiming that the inability to generate lipid droplets in PLIN2-/- cells led to the upregulation of several pathways in the cells is purely correlative, and the causal relationship does not exist in the data presented.

      (4) Novelty: A few main observations described in this study were previously reported. That includes Mtb growth restriction in PLIN2 and FATP1 deficient cells. Similarly, the impact of Metformin and TMZ on intracellular Mtb growth is well-reported. While that validates these observations in this study, it takes away any novelty from the study.

      (5) Manuscript organisation: It will be very helpful to rearrange figures and supplementary figures.

    4. Reviewer #3 (Public review):

      Summary:

      This study provides significant insights into how host metabolism, specifically lipids, influences the pathogenesis of Mycobacterium tuberculosis (Mtb). It builds on existing knowledge about Mtb's reliance on host lipids and emphasizes the potential of targeting fatty acid metabolism for therapeutic intervention.

      Strengths:

      To generate the data, the authors use CRISPR technology to precisely disrupt the genes involved in lipid import (CD36, FATP1), lipid droplet formation (PLIN2), and fatty acid oxidation (CPT1A, CPT2) in mouse primary macrophages. The Mtb Erdman strain is used to infect the macrophage mutants. The study reveals specific roles of different lipid-related genes. Importantly, the results challenge previous assumptions about lipid droplet formation and show that macrophage responses to lipid metabolism impairments are complex and multifaceted. The experiments are well-controlled and the data are convincing.

      Overall, this well-written paper makes a meaningful contribution to the field of tuberculosis research, particularly in the context of host-directed therapies (HDTs). It suggests that manipulating macrophage metabolism could be an effective strategy to limit Mtb growth.

      Weaknesses:

      None noted. The manuscript provides important new knowledge that will lead to novel host-directed therapies to control Mtb infections.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates the role of macrophage lipid metabolism in the intracellular growth of Mycobacterium tuberculosis. By using a CRISPR-Cas9 gene-editing approach, the authors knocked out key genes involved in fatty acid import, lipid droplet formation, and fatty acid oxidation in macrophages. Their results show that disrupting various stages of fatty acid metabolism significantly impairs the ability of Mtb to replicate inside macrophages. The mechanisms of growth restriction included increased glycolysis, oxidative stress, pro-inflammatory cytokine production, enhanced autophagy, and nutrient limitation. The study demonstrates that targeting fatty acid homeostasis at different stages of the lipid metabolic process could offer new strategies for host-directed therapies against tuberculosis.

      The work is convincing and methodologically strong, combining genetic, metabolic, and transcriptomic analyses to provide deep insights into how host lipid metabolism affects bacterial survival.

      Strengths:

      The study uses a multifaceted approach, including CRISPR-Cas9 gene knockouts, metabolic assays, and dual RNA sequencing, to assess how various stages of macrophage lipid metabolism affect Mtb growth. The use of CRISPR-Cas9 to selectively knock out key genes involved in fatty acid metabolism enables precise investigation of how each step (lipid import, lipid droplet formation, and fatty acid oxidation) affects Mtb survival. The study offers mechanistic insights into how different impairments in lipid metabolism lead to diverse antimicrobial responses, including glycolysis, oxidative stress, and autophagy. This deepens the understanding of macrophage function in immune defense.

      The use of functional assays to validate findings (e.g., metabolic flux analyses, lipid droplet formation assays, and rescue experiments with fatty acid supplementation) strengthens the reliability and applicability of the results.

      By highlighting potential targets for HDT that exploit macrophage lipid metabolism to restrict Mtb growth, the work has significant implications for developing new tuberculosis treatments.

      Weaknesses:

      The experiments were primarily conducted in vitro using CRISPR-modified macrophages. While these provide valuable insights, they may not fully replicate the complexity of the in vivo environment where multiple cell types and factors influence Mtb infection and immune responses.

      We thank the reviewer for pointing this out. We acknowledge that our in vitro system may indeed not fully replicate the complex in vivo environment in light of the heterogenous responses of macrophages to Mtb infection in whole animal models. We do believe, however, that the Hoxb8 in vitro model provides a powerful genetic tool to interrogate host-Mtb interactions using primary macrophages that represent the bone marrow-derived macrophage lineage. Reviewer #1 also made several helpful suggestions in their recommendations to authors relating to the reorganization of the data in our Figures in both the manuscript and the supplemental data.  We will incorporate these suggestions into the revised version of the manuscript upon resubmission.

      Reviewer #2 (Public review):

      Summary:

      Host-derived lipids are an important factor during Mtb infection. In this study, using CRISPR knockouts of genes involved in fatty acid uptake and metabolism, the authors claim that a compromised uptake, storage, or metabolism of fatty acid restricts Mtb growth upon infection. Further, the authors claim that the mechanism involves increased glycolysis, autophagy, oxidative stress, pro-inflammatory cytokines, and nutrient limitation. The authors also claim that impaired lipid droplet formation restricts Mtb growth. However, promoting lipid droplet biogenesis does not reverse/promote Mtb growth.

      Strengths:

      The strength of the study is the use of clean HOXB8-derived primary mouse macrophage lines for generating CRISPR knockouts.

      Weaknesses:

      There are many weaknesses in this study; they are clubbed into five categories below.

      (1) Evidence and interpretations: The results shown in this study at several places do not support the interpretations made or are internally contradictory or inconsistent. There are several important observations, but none were taken forward for in-depth analysis.

      a) The phenotypes of PLIN2-/-, FATP1-/-, and CPT-/- are comparable in terms of bacterial growth restriction; however, their phenotype in terms of lipid body formation, IL1B expression, etc., are not consistent. These are interesting observations and suggest additional mechanisms specific to specific target genes; however, clubbing them all as altered fatty acid uptake or catabolism-dependent phenotypes takes away this important point.

      We thank the reviewer for highlighting this. Our main focus was on assessing the impact of manipulating lipid homeostasis in macrophages and the consequences this has on the intracellular growth of Mtb.  It was never our intention to imply these mutants generated equivalent phenotypes, and we will modify the revised manuscript to reflect this point.  We will stress that interfering with lipid processing at different stages in macrophages results in both shared and divergent anti-microbial conditions against Mtb.

      b) Finding the FATP1 transcript in the HOXB8-derived FATP1-/- CRISPR KO line is a bit confusing. There is less than a two-fold decrease in relative transcript abundance in the KO line compared to the WT line, leaving concerns regarding the robustness of other experiments as well using FATP1<sup>-/-</sup> cells.

      CRISPR-Cas9 targeting of genes with single sgRNAs, as is the case with our mutants, generates insertions and deletions (INDELs) at the CRISPR cut site. These INDELs do not totally block mRNA transcription, which is widely reported and accepted in the field. In these cases, RT-PCR or RNA-seq methods are not used to verify CRISPR knockouts as they are not sensitive enough to identify INDELs. We provide knockout efficiencies by ICE analysis in supplemental information file 1 for all the mutants used in the study. We also demonstrate protein depletion by western blot and flow cytometry for all the mutants (Figure 1 - figure supplement 1). Only mutants with greater than 90% protein depletion were used for subsequent characterization.

      c) No gene showing differential regulation in FATP1<sup>-/-</sup> macrophages, which is very surprising.

      We assume the reviewer is referring to the Mtb transcriptome response in FATP1<sup>-/-</sup> macrophages, which we agree was unexpected.  However, we saw a significant compensatory response in the host cell (at transcriptional level) in FATP1-/- macrophages as evidenced by an upregulation of other fatty acid transporters (Figure 5 - figure supplement 1). We postulate that these compensatory responses could, in part, alleviate the stresses the bacteria experience within the cell, and these were discussed in the manuscript.

      d) ROS measurements should be done using flow cytometry and not by microscopy to nail the actual pattern.

      We thank the reviewer for the suggestion. However, confocal imaging is also widely used to measure ROS with similar quantitative power and individual cell resolution (PMID: 32636249, 35737799).

      (2) Experimental design: For a few assays, the experimental design is inappropriate

      a) For autophagy flux assay, immunoblot of LC3II alone is not sufficient to make any interpretation regarding the state of autophagy. This assay must be done with BafA1 or CQ controls to assess the true state of autophagy.

      We would like to point out that monitoring LC3I to LC3II conversion by western blot, confocal imaging of LC3 puncta, and qPCR analysis of autophagy-related genes are all validated assays for monitoring autophagic flux in a wide variety of cells. We refer the reviewer to the latest extensive guidelines on the subject (PMID: 33634751). Furthermore, Bafilomycin A and chloroquine are not specific inhibitors of autophagy and therefore are of limited value as controls. BafA is an inhibitor of the proton-ATPase apparatus as well as impacting autophagy through activity on the Ca-P60A/SERCA pathway. Chloroquine impacts vacuole acidification and autophagosome/lysosome fusion, and slows phagosome maturation. So, while BafA and chloroquine will reduce autophagy, their effects are pleiotropic and their impact on Mtb is unknown.

      b) Similarly, qPCR analyses of autophagy-related gene expression do not reflect anything on the state of autophagy flux.

      See our response above.

      (3) Using correlative observations as evidence:

      a) Observations based on RNAseq analyses are presented as functional readouts, which is incorrect.

      We are not entirely sure where we used our RNA-seq data sets as functional readouts. We used our transcriptome data to provide a preliminary identification of anti-microbial responses in the mutant macrophages infected with Mtb. Where applicable, we followed up and confirmed the more compelling RNA-seq data by metabolic flux analyses, qPCR, ROS measurements, or quantitative imaging.

      b) Claiming that the inability to generate lipid droplets in PLIN2-/- cells led to the upregulation of several pathways in the cells is purely correlative, and the causal relationship does not exist in the data presented.

      Again, it was not our intention to infer causality. Throughout the manuscript, we endeavor to present our data with a specific focus on describing the consequences of interfering with fatty acid import, lipid droplet biogenesis, or fatty acid oxidation on macrophage responses to Mtb. We will revisit the revised manuscript to remove any sections that imply causality.

      (4) Novelty: A few main observations described in this study were previously reported. That includes Mtb growth restriction in PLIN2 and FATP1 deficient cells. Similarly, the impact of Metformin and TMZ on intracellular Mtb growth is well-reported. While that validates these observations in this study, it takes away any novelty from the study.

      To the best of our knowledge, Mtb growth restrictions in PLIN2 and FATP1 deficient macrophages have not been reported elsewhere. On the contrary, PLIN2 knockout macrophages obtained from PLIN2 deficient mice have been reported to robustly support Mtb replication (PMID: 29370315), quite the opposite of our data. We extensively discuss these discrepancies in the manuscript. We also discuss and cite appropriate references where Mtb growth restriction for similar macrophage mutants has been reported (CD36<sup>-/-</sup> and CPT2<sup>-/-</sup>). Our aim was to carry out a systematic, myeloid-specific genetic interference of fatty acid import, storage and catabolism to assess the effect on Mtb growth at all stages of lipid handling instead of focusing on one target. In the chemical approach, we used TMZ and Metformin deliberately because they had already been reported as being active against intracellular Mtb and we wished to place our data in the context of existing literature. These studies were referenced extensively in the text.

      (5) Manuscript organisation: It will be very helpful to rearrange figures and supplementary figures.

      We will re-organize the figures in the manuscript revision as per the reviewer’s recommendation, and the recommendations of reviewer #1.

      We will address the other concerns raised by reviewer #2 in the recommendations to authors during revision of the manuscript. 

      Reviewer #3 (Public review):

      Summary:

      This study provides significant insights into how host metabolism, specifically lipids, influences the pathogenesis of Mycobacterium tuberculosis (Mtb). It builds on existing knowledge about Mtb's reliance on host lipids and emphasizes the potential of targeting fatty acid metabolism for therapeutic intervention.

      Strengths:

      To generate the data, the authors use CRISPR technology to precisely disrupt the genes involved in lipid import (CD36, FATP1), lipid droplet formation (PLIN2), and fatty acid oxidation (CPT1A, CPT2) in mouse primary macrophages. The Mtb Erdman strain is used to infect the macrophage mutants. The study reveals specific roles of different lipid-related genes. Importantly, the results challenge previous assumptions about lipid droplet formation and show that macrophage responses to lipid metabolism impairments are complex and multifaceted. The experiments are well-controlled and the data are convincing.

      Overall, this well-written paper makes a meaningful contribution to the field of tuberculosis research, particularly in the context of host-directed therapies (HDTs). It suggests that manipulating macrophage metabolism could be an effective strategy to limit Mtb growth.

      Weaknesses:

      None noted. The manuscript provides important new knowledge that will lead to novel host-directed therapies to control Mtb infections.

    1. eLife Assessment

      This study uses electrophysiological recordings, causal manipulations of activity, and modeling to investigate how the maintenance of a spatial location in working memory affects the representation of visual information in area V4 of monkeys. The work is important not just for understanding how visual information is encoded, but also for determining precisely how prefrontal inputs to the sensory cortex sculpt the corresponding visual responses during working memory. The data provide solid evidence of direct communication between prefrontal circuits that store spatial information and V4, which, under the current experimental conditions, manifests mainly as changes in temporal activity patterns (beta oscillations and phase coding).

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates what happens to the stimulus-driven responses of V4 neurons when an item is held in working memory. Monkeys are trained to perform memory-guided saccades: they must remember the location of a visual cue and then, after a delay, make an eye movement to the remembered location. In addition, a background stimulus (a grating) is presented that varies in contrast and orientation across trials. This stimulus serves to probe the V4 responses, is present throughout the trial, and is task-irrelevant. Using this design, the authors report memory-driven changes in the LFP power spectrum, changes in synchronization between the V4 spikes and the ongoing LFP, and no significant changes in firing rate.

      Strengths:

      (1) The logic of the experiment is nicely laid out.

      (2) The presentation is clear and concise.

      (3) The analyses are thorough, careful, and yield unambiguous results.

      (4) Together, the recording and inactivation data demonstrate quite convincingly that the signal stored in FEF is communicated to V4 and that, under the current experimental conditions, the impact from FEF manifests as variations in the timing of the stimulus-evoked V4 spikes and not in the intensity of the evoked activity (i.e., firing rate).

      Weaknesses:

      I think there are two limitations of the study that are important for evaluating the potential functional implications of the data. If these were acknowledged and discussed, it would be easier to situate these results in the broader context of the topic, and their importance would be conveyed more fairly and transparently.

      (1) While it may be true that no firing rate modulations were observed in this case, this may have been because the probe stimuli in the task were behaviorally irrelevant; if anything, they might have served as distracters to the monkey's actual task (the MGS). From this perspective, the lack of rate modulation could simply mean that the monkeys were successful in attending the relevant cue and shielding their performance from the potentially distracting effect of the background gratings. Had the visual probes been in some way behaviorally relevant and/or spatially localized (instead of full field), the data might have looked very different. With this in mind, it would be prudent to dial down the tone of the conclusions, which stretch well beyond the current experimental conditions (see recommendations).

      (2) Another point worth discussing is that although the FEF delay-period activity corresponds to a remembered location, it can also be interpreted as an attended location, or as a motor plan for the upcoming eye movement. These are overlapping constructs that are difficult to disentangle, but it would be important to mention them given prior studies of attentional or saccade-related modulation in V4. The firing rate modulations reported in some of those cases provide a stark contrast with the findings here, and I again suspect that the differences may be due at least in part to the differing experimental conditions, rather than a drastically different encoding mode or functional linkage between FEF and V4.

    3. Reviewer #2 (Public review):

      Summary:

      It is generally believed that higher-order areas in the prefrontal cortex guide selection during working memory and attention through signals that selectively recruit neuronal populations in sensory areas that encode the relevant feature. In this work, Parto-Dezfouli and colleagues tested how these prefrontal signals influence activity in visual area V4 using a spatial working memory task. They recorded neuronal activity from visual area V4 and found that information about visual features at the behaviorally relevant part of space during the memory period is carried in a spatially selective manner in the timing of spikes relative to a beta oscillation (phase coding) rather than in the average firing rate (rate code). The authors further tested whether there is a causal link between prefrontal input and the phase encoding of visual information during the memory period. They found that indeed inactivation of the frontal eye fields, a prefrontal area known to send spatial signals to V4, decreased beta oscillatory activity in V4 and information about the visual features. The authors went one step further to develop a neural model that replicated the experimental findings and suggested that changes in the average firing rate of individual neurons might be a result of small changes in the exact beta oscillation frequency within V4. These data provide important new insights into the possible mechanisms through which top-down signals can influence activity in hierarchically lower sensory areas and can therefore have a significant impact on the Systems, Cognitive, and Computational Neuroscience fields.

      Strengths:

      This is a well-written paper with a well-thought-out experimental design. The authors used a smart variation of the memory-guided saccade task to assess how information about the visual features of stimuli is encoded during the memory period. By using a grating of various contrasts and orientations as the background the authors ensured that bottom-up visual input would drive responses in visual area V4 in the delay period, something that is not commonly done in experimental settings in the same task. Moreover, one of the major strengths of the study is the use of different approaches including analysis of electrophysiological data using advanced computational methods of analysis, manipulation of activity through inactivation of the prefrontal cortex to establish causality of top-down signals on local activity signatures (beta oscillations, spike locking and information carried) as well as computational neuronal modeling. This has helped extend an observation into a possible mechanism well supported by the results.

      Weaknesses:

      Although the authors provide support for their conclusions from different approaches, I found that the selection of some of the analyses and statistical assessments made it harder for the reader to follow the comparison between a rate code and a phase code. Specifically, the authors wish to assess whether stimulus information is carried selectively for the relevant position through a firing rate or a phase code. Results for the rate code are shown in Figures 1B-G and for the phase code are shown in Figure 2. Whereas an F-statistic is shown over time in Figure 1F (and Figure S1) no such analysis is shown for LFP power. Similarly, following FEF inactivation there is no data on how that influences V4 firing rates and information carried by firing rates in the two conditions (for positions inside and outside the V4 RF). In the same vein, no data are shown on how the inactivation affects beta phase coding in the OUT condition.

      Moreover, some of the statistical assessments could be carried out differently including all conditions to provide more insight into mechanisms. For example, a two-way ANOVA followed by post hoc tests could be employed to include comparisons across both spatial (IN, OUT) and visual feature conditions (see results in Figures 2D, S4, etc.). Figure 2D suggests that the absence of selectivity in the OUT condition (no significant difference between high and low contrast stimuli) is mainly due to an increase in slope in the OUT condition for the low contrast stimulus compared to that for the same stimulus in the IN condition. If this turns out to be true it would provide important information that the authors should address.
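
      As a rough illustration of the two-way design suggested above (not the authors' analysis), the sketch below computes the F statistics for both main effects and their interaction in a balanced layout. The function name `two_way_anova_f`, the nested-list data layout, and the example values are all hypothetical; p-values and post hoc tests are deliberately left out and would normally come from a statistics package.

```python
from statistics import mean


def two_way_anova_f(cells):
    """F statistics for a balanced two-way ANOVA.

    cells[i][j] holds the replicate values for level i of factor A
    (e.g. location: IN, OUT) and level j of factor B (e.g. contrast:
    low, high). Every cell must hold the same number n >= 2 of replicates.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    if any(len(row) != b for row in cells) or any(
        len(cell) != n for row in cells for cell in row
    ):
        raise ValueError("design must be balanced")

    grand = mean(x for row in cells for cell in row for x in cell)
    cell_mean = [[mean(cell) for cell in row] for row in cells]
    a_mean = [mean(cell_mean[i]) for i in range(a)]
    b_mean = [mean(cell_mean[i][j] for i in range(a)) for j in range(b)]

    # Sums of squares for the main effects, the interaction, and the error.
    ss_a = n * b * sum((m - grand) ** 2 for m in a_mean)
    ss_b = n * a * sum((m - grand) ** 2 for m in b_mean)
    ss_cells = n * sum((m - grand) ** 2 for row in cell_mean for m in row)
    ss_ab = ss_cells - ss_a - ss_b
    ss_err = sum(
        (x - cell_mean[i][j]) ** 2
        for i in range(a)
        for j in range(b)
        for x in cells[i][j]
    )
    ms_err = ss_err / (a * b * (n - 1))

    return {
        "A": ss_a / (a - 1) / ms_err,
        "B": ss_b / (b - 1) / ms_err,
        "AxB": ss_ab / ((a - 1) * (b - 1)) / ms_err,
    }


# Made-up values: rows are IN/OUT, columns are low/high contrast.
example = [[[1, 2, 3], [5, 6, 7]],
           [[1, 2, 3], [1, 2, 3]]]
f_stats = two_way_anova_f(example)
```

      In this toy data set only the IN/high-contrast cell differs from the others, which is the kind of pattern an interaction term is meant to expose; whether the real data behave this way is exactly the open question raised above.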

      There are also a few conceptual gaps that leave the reader wondering whether the results and conclusion are general enough. Specifically,

      (1) the authors used microstimulation in the FEF to determine RFs. It is thus possible that the FEF sites that were inactivated were largely more motor-related. Given that beta oscillations and motor preparatory activity have been found to be correlated and motor sites show increased beta oscillatory activity in the delay period, it is possible that the effect of FEF inactivation on V4 beta oscillations is due to inactivation of the main source of beta activity. Had the authors inactivated sites with a preponderance of visual neurons in the FEF would the results be different?

      (2) Somewhat related to this point and given the prominence of low-frequency activity in deeper layers of the visual cortex according to some previous studies, it is not clear where the authors' V4 recordings were located. The authors report that they do have data from linear arrays, so it should be possible to address this.

      (3) The authors suggest that a change in the exact frequency of oscillation underlies the increase in firing rate for different stimulus features. However, the shift in frequency is prominent for contrast but not for orientation, something that raises questions about the general applicability of this observation for different visual features.

      (4) One of the major points of the study is the primacy of the phase code over the rate code during the delay period. Specifically, here it is shown that information about the visual features of a stimulus carried by the rate code is similar for relevant and irrelevant locations during the delay period. This contrasts with what several studies have shown for attention in which case information carried in firing rates about stimuli in the attended location is enhanced relative to that for stimuli in the unattended location. If we are to understand how top-down signals work in cognitive functions it is inevitable to compare working memory with attention. The possible source of this difference is not clear and is not discussed. The reader is left wondering whether perhaps a different measure or analysis (e.g. a percent explained variance analysis) might reveal differences during the delay period for different visual features across the two spatial conditions.

      The use of the memory-guided saccade task has certain disadvantages in the context of this study. Although delay activity is interpreted as memory activity by the authors, it is in principle possible that it reflects preparation for the upcoming saccade, spatial attention (particularly since there is a stimulus in the RF), etc. This could potentially change the conclusion and perspective.

      For the position outside the V4 RF, there is a decrease in both beta oscillations and the clustering of spikes at a specific phase. It is therefore possible that the decrease in information about the stimuli features is a byproduct of the decrease in beta power and phase locking. Decreased oscillatory activity and phase locking can result in less reliable estimates of phase, which could decrease the mutual information estimates.
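
      The concern above can be made concrete with a toy simulation, offered only as a sketch under stated assumptions and not as a reanalysis of the recorded data. Spike phases are drawn from a von Mises distribution whose mean shifts by a fixed, hypothetical amount with a binary stimulus, and the concentration `kappa` stands in for the strength of phase locking. The function name `phase_information`, the bin count, and all parameter values are illustrative choices.

```python
import math
import random
from collections import Counter


def phase_information(kappa, n_spikes=20000, n_bins=18, seed=0):
    """Plug-in mutual information (bits) between a binary stimulus and
    binned spike phase, for von Mises phases with concentration kappa."""
    rng = random.Random(seed)
    shift = math.pi / 4  # hypothetical stimulus-dependent phase offset
    joint = Counter()
    for _ in range(n_spikes):
        stim = rng.randrange(2)
        mu = math.pi + (shift if stim else -shift)
        phase = rng.vonmisesvariate(mu, kappa)  # radians in [0, 2*pi)
        # Guard against a rounding case that would index one past the end.
        b = min(int(phase / (2 * math.pi) * n_bins), n_bins - 1)
        joint[(stim, b)] += 1

    # Marginal counts for the stimulus and for the phase bin.
    stim_count, bin_count = Counter(), Counter()
    for (s, b), c in joint.items():
        stim_count[s] += c
        bin_count[b] += c

    # I = sum p(s,b) * log2( p(s,b) / (p(s) p(b)) ), written with counts.
    return sum(
        (c / n_spikes) * math.log2(c * n_spikes / (stim_count[s] * bin_count[b]))
        for (s, b), c in joint.items()
    )


weak, medium, strong = (phase_information(k) for k in (0.2, 1.0, 4.0))
```

      With the stimulus-dependent phase offset held fixed, the estimate shrinks as `kappa` falls, so a drop in measured information can follow from weaker locking alone. The simulation does not model noise in the LFP phase estimate itself, which is a separate route to the same effect.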

      The authors propose that coherent oscillations could be the mechanism through which the prefrontal cortex influences beta activity in V4. I assume they mean coherent oscillations between the prefrontal cortex and V4. Given that they do have simultaneous recordings from the two areas they could test this hypothesis on their own data, however, they do not provide any results on that.

      The authors make a strong point about the relevance of changes in the oscillation frequency and how this may result in an increase in firing rate although it could also be the reverse - an increase in firing rate leading to an increase in the frequency peak. It is not clear at all how these changes in frequency could come about. A more nuanced discussion based on both experimental and modeling data is necessary to appreciate the source and role (if any) of this observation.

    4. Reviewer #3 (Public review):

      Summary:

      In this report, the authors test the necessity of prefrontal cortex (specifically, FEF) activity in driving changes in oscillatory power, spike rate, and spike timing of extrastriate visual cortex neurons during a visual-spatial working memory (WM) task. The authors recorded LFP and spikes in V4 while macaques remembered a single spatial location over a delay period during which task-irrelevant background gratings were displayed on the screen with varying orientation and contrast. V4 oscillations (in the beta range) scaled with WM maintenance, and the information encoded by spike timing relative to beta band LFP about the task-irrelevant background orientation depended on remembered location. They also compared recorded signals in V4 with and without muscimol inactivation of FEF, demonstrating the importance of FEF input for WM-induced changes in oscillatory amplitude, phase coding, and information encoded about background orientations. Finally, they built a network model that can account for some of these results. Together, these results show that FEF provides meaningful input to the visual cortex that is used to alter neural activity and that these signals can impact information coding of task-irrelevant information during a WM delay.

      Strengths:

      (1) Elegant and robust experiment that allows for clear tests for the necessity of FEF activity in WM-induced changes in V4 activity.

      (2) Comprehensive and broad analyses of interactions between LFP and spike timing provide compelling evidence for FEF-modulated phase coding of task-irrelevant stimuli at remembered location.

      (3) Convincing modeling efforts.

      Weaknesses:

      (1) 0% contrast background data (standard memory-guided saccade task) are not reported in the manuscript. While these data cannot be used to consider information content of spike rate/time about task-irrelevant background stimuli, this condition is still informative as a 'baseline' (and a more typical example of a WM task).

      (2) Throughout the manuscript, the primary measurements of neural coding pertain to task-irrelevant stimuli (the orientation/contrast of the background, which is unrelated to the animal's task to remember a spatial location). The remembered location impacts the coding of these stimulus variables, but it's unclear how this relates to WM representations themselves.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      This study investigates what happens to the stimulus-driven responses of V4 neurons when an item is held in working memory. Monkeys are trained to perform memory-guided saccades: they must remember the location of a visual cue and then, after a delay, make an eye movement to the remembered location. In addition, a background stimulus (a grating) is presented that varies in contrast and orientation across trials. This stimulus serves to probe the V4 responses, is present throughout the trial, and is task-irrelevant. Using this design, the authors report memory-driven changes in the LFP power spectrum, changes in synchronization between the V4 spikes and the ongoing LFP, and no significant changes in firing rate.

      Strengths:

      (1) The logic of the experiment is nicely laid out.

      (2) The presentation is clear and concise.

      (3) The analyses are thorough, careful, and yield unambiguous results.

      (4) Together, the recording and inactivation data demonstrate quite convincingly that the signal stored in FEF is communicated to V4 and that, under the current experimental conditions, the impact from FEF manifests as variations in the timing of the stimulus-evoked V4 spikes and not in the intensity of the evoked activity (i.e., firing rate).

      Weaknesses:

      I think there are two limitations of the study that are important for evaluating the potential functional implications of the data. If these were acknowledged and discussed, it would be easier to situate these results in the broader context of the topic, and their importance would be conveyed more fairly and transparently.

      (1) While it may be true that no firing rate modulations were observed in this case, this may have been because the probe stimuli in the task were behaviorally irrelevant; if anything, they might have served as distracters to the monkey's actual task (the MGS). From this perspective, the lack of rate modulation could simply mean that the monkeys were successful in attending the relevant cue and shielding their performance from the potentially distracting effect of the background gratings. Had the visual probes been in some way behaviorally relevant and/or spatially localized (instead of full field), the data might have looked very different.

      Any task design involves tradeoffs; if the visual stimulus was behaviorally relevant, then any observed neurophysiological changes would be more confounded by possible attentional effects. We cannot exclude the possibility that a different task or different stimuli would produce different results; we ourselves have reported firing rate enhancements for other types of visual probes during an MGS task (Merrikhi et al. 2017). We have added an acknowledgement of these limitations in the discussion section (lines 311-319). At minimum, our results show a dissociation between the top-down modulation of phase coding, which is enhanced during WM even for these task-irrelevant stimuli, and rate coding. Establishing whether and how this phase coding is related to perception and behavior will be an important direction for future work.

      With this in mind, it would be prudent to dial down the tone of the conclusions, which stretch well beyond the current experimental conditions (see recommendations).

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract line 27, introduction lines 58-60, results line 215, conclusion lines 294-295).

      (2) Another point worth discussing is that although the FEF delay-period activity corresponds to a remembered location, it can also be interpreted as an attended location, or as a motor plan for the upcoming eye movement. These are overlapping constructs that are difficult to disentangle, but it would be important to mention them given prior studies of attentional or saccade-related modulation in V4. The firing rate modulations reported in some of those cases provide a stark contrast with the findings here, and I again suspect that the differences may be due at least in part to the differing experimental conditions, rather than a drastically different encoding mode or functional linkage between FEF and V4.

      We have added a paragraph to the discussion section addressing links to attention and motor planning (lines 301-322), and specifically acknowledging the inherent difficulties of fully dissociating these effects when interpreting our results (lines 311-319).

      Reviewer #2 (Public review):

      Summary:

      It is generally believed that higher-order areas in the prefrontal cortex guide selection during working memory and attention through signals that selectively recruit neuronal populations in sensory areas that encode the relevant feature. In this work, Parto-Dezfouli and colleagues tested how these prefrontal signals influence activity in visual area V4 using a spatial working memory task. They recorded neuronal activity from visual area V4 and found that information about visual features at the behaviorally relevant part of space during the memory period is carried in a spatially selective manner in the timing of spikes relative to a beta oscillation (phase coding) rather than in the average firing rate (rate code). The authors further tested whether there is a causal link between prefrontal input and the phase encoding of visual information during the memory period. They found that indeed inactivation of the frontal eye fields, a prefrontal area known to send spatial signals to V4, decreased beta oscillatory activity in V4 and information about the visual features. The authors went one step further to develop a neural model that replicated the experimental findings and suggested that changes in the average firing rate of individual neurons might be a result of small changes in the exact beta oscillation frequency within V4. These data provide important new insights into the possible mechanisms through which top-down signals can influence activity in hierarchically lower sensory areas and can therefore have a significant impact on the Systems, Cognitive, and Computational Neuroscience fields.

      Strengths:

      This is a well-written paper with a well-thought-out experimental design. The authors used a smart variation of the memory-guided saccade task to assess how information about the visual features of stimuli is encoded during the memory period. By using a grating of various contrasts and orientations as the background the authors ensured that bottom-up visual input would drive responses in visual area V4 in the delay period, something that is not commonly done in experimental settings in the same task. Moreover, one of the major strengths of the study is the use of different approaches including analysis of electrophysiological data using advanced computational methods of analysis, manipulation of activity through inactivation of the prefrontal cortex to establish causality of top-down signals on local activity signatures (beta oscillations, spike locking and information carried) as well as computational neuronal modeling. This has helped extend an observation into a possible mechanism well supported by the results.

      Weaknesses:

      Although the authors provide support for their conclusions from different approaches, I found that the selection of some of the analyses and statistical assessments made it harder for the reader to follow the comparison between a rate code and a phase code. Specifically, the authors wish to assess whether stimulus information is carried selectively for the relevant position through a firing rate or a phase code. Results for the rate code are shown in Figures 1B-G and for the phase code are shown in Figure 2. Whereas an F-statistic is shown over time in Figure 1F (and Figure S1) no such analysis is shown for LFP power. Similarly, following FEF inactivation there is no data on how that influences V4 firing rates and information carried by firing rates in the two conditions (for positions inside and outside the V4 RF). In the same vein, no data are shown on how the inactivation affects beta phase coding in the OUT condition.

      We plan to incorporate statistical analysis of this point in the revised version.

      Moreover, some of the statistical assessments could be carried out differently including all conditions to provide more insight into mechanisms. For example, a two-way ANOVA followed by post hoc tests could be employed to include comparisons across both spatial (IN, OUT) and visual feature conditions (see results in Figures 2D, S4, etc.). Figure 2D suggests that the absence of selectivity in the OUT condition (no significant difference between high and low contrast stimuli) is mainly due to an increase in slope in the OUT condition for the low contrast stimulus compared to that for the same stimulus in the IN condition. If this turns out to be true it would provide important information that the authors should address.

      We plan to incorporate statistical analysis of this point in the revised version.

      There are also a few conceptual gaps that leave the reader wondering whether the results and conclusion are general enough. Specifically,

      (1) the authors used microstimulation in the FEF to determine RFs. It is thus possible that the FEF sites that were inactivated were largely more motor-related. Given that beta oscillations and motor preparatory activity have been found to be correlated and motor sites show increased beta oscillatory activity in the delay period, it is possible that the effect of FEF inactivation on V4 beta oscillations is due to inactivation of the main source of beta activity. Had the authors inactivated sites with a preponderance of visual neurons in the FEF would the results be different?

      We do not believe this to be likely based on what is known anatomically and functionally about this circuitry. Anatomically, the projections from FEF to V4 arise primarily from the supragranular layers, not layers which contain the highest proportion of motor activity (Barone et al. 2000, Pouget et al. 2009, Markov et al. 2013). Functionally, based on electrical identification of V4-projecting FEF neurons, we know that FEF to V4 projections are predominantly characterized by delay rather than motor activity (Merrikhi et al. 2017). We have now tried to emphasize these points when we introduce the inactivation experiments (lines 180-182).

      Experimentally, the spread of the pharmacological effect with our infusion system is quite large relative to any clustering of visual vs. motor neurons within the FEF, with behavioral consequences of inactivation spreading to cover a substantial portion of the visual hemifield (e.g., Noudoost et al. 2014, Clark et al. 2014), and so our manipulation lacks the spatial resolution to selectively target motor vs. other FEF neurons.

      (2) Somewhat related to this point and given the prominence of low-frequency activity in deeper layers of the visual cortex according to some previous studies, it is not clear where the authors' V4 recordings were located. The authors report that they do have data from linear arrays, so it should be possible to address this.

      Unfortunately, our chamber placement for V4 has produced linear array penetration angles which do not reliably allow identification of cortical layers. We are aware of previous results showing layer-specific effects of attention in V4 (e.g., Pettine et al. 2019, Buffalo et al. 2011), and it would indeed be interesting to determine whether our observed WM-driven changes follow similar patterns. We may be able to analyze a subset of the data with current source density analysis to look for layer-specific effects in the future, but are not able to provide any information at this time.

      (3) The authors suggest that a change in the exact frequency of oscillation underlies the increase in firing rate for different stimulus features. However, the shift in frequency is prominent for contrast but not for orientation, something that raises questions about the general applicability of this observation for different visual features.

      We plan to incorporate statistical analysis of this point in the revised version.

      (4) One of the major points of the study is the primacy of the phase code over the rate code during the delay period. Specifically, here it is shown that information about the visual features of a stimulus carried by the rate code is similar for relevant and irrelevant locations during the delay period. This contrasts with what several studies have shown for attention in which case information carried in firing rates about stimuli in the attended location is enhanced relative to that for stimuli in the unattended location. If we are to understand how top-down signals work in cognitive functions it is inevitable to compare working memory with attention. The possible source of this difference is not clear and is not discussed. The reader is left wondering whether perhaps a different measure or analysis (e.g. a percent explained variance analysis) might reveal differences during the delay period for different visual features across the two spatial conditions.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 301-322).
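
      The percent explained variance (PEV) analysis that the reviewer suggests can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the function name `percent_explained_variance`, the per-trial firing-rate array, and the condition labels are hypothetical, and the bias-corrected variant uses the standard omega-squared statistic.

```python
import numpy as np

def percent_explained_variance(rates, labels, bias_corrected=True):
    """Percent of firing-rate variance explained by a stimulus feature.

    rates  : 1-D array of per-trial firing rates (hypothetical input)
    labels : 1-D array giving the stimulus condition of each trial
    """
    rates = np.asarray(rates, dtype=float)
    labels = np.asarray(labels)
    groups = [rates[labels == g] for g in np.unique(labels)]

    grand_mean = rates.mean()
    ss_total = np.sum((rates - grand_mean) ** 2)
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)

    if not bias_corrected:
        # Eta-squared: between-condition share of the total sum of squares.
        return 100.0 * ss_between / ss_total

    # Omega-squared: subtracts the variance expected by chance, so it can
    # be slightly negative when conditions do not differ.
    df_between = len(groups) - 1
    ms_within = (ss_total - ss_between) / (rates.size - len(groups))
    return 100.0 * (ss_between - df_between * ms_within) / (ss_total + ms_within)
```

      Computing this separately for the relevant and irrelevant memory locations during the delay would give a rate-based measure to set beside the mutual information results.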

      The use of the memory-guided saccade task has certain disadvantages in the context of this study. Although delay activity is interpreted as memory activity by the authors, it is in principle possible that it reflects preparation for the upcoming saccade, spatial attention (particularly since there is a stimulus in the RF), etc. This could potentially change the conclusion and perspective.

      We have added a new discussion paragraph addressing the relationship to attention and motor planning (lines 301-322). We have also moderated the language used to describe our conclusions throughout the manuscript in light of this ambiguity.

      For the position outside the V4 RF, there is a decrease in both beta oscillations and the clustering of spikes at a specific phase. It is therefore possible that the decrease in information about the stimulus features is a byproduct of the decrease in beta power and phase locking. Decreased oscillatory activity and phase locking can result in less reliable estimates of phase, which could decrease the mutual information estimates.

      We plan to incorporate statistical analysis of this point in the revised version.
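
      The concern above can be illustrated with a small simulation. Everything in it is an assumption made for illustration and none of it is taken from the manuscript: spike phases are drawn from von Mises distributions, two stimulus conditions differ in preferred phase by a fixed π/2, phase is discretized into 8 bins, and mutual information is computed with a simple plug-in estimator. Only the concentration parameter `kappa` (the strength of phase locking) is varied.

```python
import numpy as np

def phase_locking_value(phases):
    """Mean resultant length of spike phases (0 = uniform, 1 = perfectly locked)."""
    return float(np.abs(np.mean(np.exp(1j * np.asarray(phases)))))

def phase_stimulus_mi(phases, stimuli, n_phase_bins=8):
    """Plug-in mutual information (bits) between binned spike phase and stimulus."""
    phases = np.mod(np.asarray(phases), 2 * np.pi)
    stimuli = np.asarray(stimuli)
    phase_bin = (phases / (2 * np.pi) * n_phase_bins).astype(int)
    phase_bin = np.minimum(phase_bin, n_phase_bins - 1)  # guard the 2*pi edge
    labels, stim_idx = np.unique(stimuli, return_inverse=True)

    # Joint histogram of (phase bin, stimulus), normalized to probabilities.
    joint = np.zeros((n_phase_bins, labels.size))
    np.add.at(joint, (phase_bin, stim_idx), 1)
    joint /= joint.sum()

    p_phase = joint.sum(axis=1, keepdims=True)
    p_stim = joint.sum(axis=0, keepdims=True)
    independent = p_phase @ p_stim
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / independent[nz])))

def simulate_spike_phases(kappa, n_spikes=4000, seed=0):
    """Hypothetical generator: the preferred-phase difference between the two
    stimuli is fixed; only the locking strength kappa changes."""
    rng = np.random.default_rng(seed)
    stimuli = rng.integers(0, 2, size=n_spikes)
    preferred = np.where(stimuli == 0, 0.0, np.pi / 2)
    phases = rng.vonmises(preferred, kappa)
    return phases, stimuli

if __name__ == "__main__":
    for kappa in (4.0, 1.0, 0.2):
        phases, stimuli = simulate_spike_phases(kappa)
        print(kappa, phase_locking_value(phases), phase_stimulus_mi(phases, stimuli))
```

      In this toy setting, the stimulus-dependent phase shift is identical across runs, yet the estimated information falls as locking weakens. Under these assumptions, a drop in phase-coded information cannot by itself be separated from a drop in phase locking without an additional control, such as matching locking strength or spike counts across conditions.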

      The authors propose that coherent oscillations could be the mechanism through which the prefrontal cortex influences beta activity in V4. I assume they mean coherent oscillations between the prefrontal cortex and V4. Given that they do have simultaneous recordings from the two areas they could test this hypothesis on their own data, however, they do not provide any results on that.

      This paper only includes inactivation data. We are working on analyzing the simultaneous recording data for a future publication.

      The authors make a strong point about the relevance of changes in the oscillation frequency and how this may result in an increase in firing rate although it could also be the reverse - an increase in firing rate leading to an increase in the frequency peak. It is not clear at all how these changes in frequency could come about. A more nuanced discussion based on both experimental and modeling data is necessary to appreciate the source and role (if any) of this observation.

      As the reviewer notes, it is difficult to determine whether the frequency changes drive the rate changes, vice versa, or whether both are generated in parallel by a common source. We have adjusted our language to reflect this (lines 277-278). Future modeling work may be able to shed more light on the causal relationships between various neural signatures.
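
      For readers unfamiliar with how a shift in oscillation frequency is typically quantified, the sketch below estimates the peak frequency of an LFP segment within a beta band. It is a generic illustration rather than the authors' method: the function name `beta_peak_frequency`, the 13-30 Hz band limits, and the single Hann-windowed periodogram are all assumptions.

```python
import numpy as np

def beta_peak_frequency(lfp, fs, band=(13.0, 30.0)):
    """Frequency (Hz) of maximum spectral power within `band`.

    lfp  : 1-D array holding one LFP segment (hypothetical input)
    fs   : sampling rate in Hz
    band : assumed beta-band limits; adjust to the band used in a given study
    """
    lfp = np.asarray(lfp, dtype=float)
    lfp = lfp - lfp.mean()                      # remove the DC offset
    window = np.hanning(lfp.size)               # reduce spectral leakage
    power = np.abs(np.fft.rfft(lfp * window)) ** 2
    freqs = np.fft.rfftfreq(lfp.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(freqs[in_band][np.argmax(power[in_band])])
```

      Comparing this estimate across contrast or orientation conditions would show whether the peak moves, but, as noted in the response, it would not establish whether frequency changes drive rate changes or the reverse.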

      Reviewer #3 (Public review):

      Summary:

      In this report, the authors test the necessity of prefrontal cortex (specifically, FEF) activity in driving changes in oscillatory power, spike rate, and spike timing of extrastriate visual cortex neurons during a visual-spatial working memory (WM) task. The authors recorded LFP and spikes in V4 while macaques remembered a single spatial location over a delay period during which task-irrelevant background gratings were displayed on the screen with varying orientation and contrast. V4 oscillations (in the beta range) scaled with WM maintenance, and the information encoded by spike timing relative to beta band LFP about the task-irrelevant background orientation depended on remembered location. They also compared recorded signals in V4 with and without muscimol inactivation of FEF, demonstrating the importance of FEF input for WM-induced changes in oscillatory amplitude, phase coding, and information encoded about background orientations. Finally, they built a network model that can account for some of these results. Together, these results show that FEF provides meaningful input to the visual cortex that is used to alter neural activity and that these signals can impact information coding of task-irrelevant information during a WM delay.

      Strengths:

      (1) Elegant and robust experiment that allows for clear tests for the necessity of FEF activity in WM-induced changes in V4 activity.

      (2) Comprehensive and broad analyses of interactions between LFP and spike timing provide compelling evidence for FEF-modulated phase coding of task-irrelevant stimuli at the remembered location.

      (3) Convincing modeling efforts.

      Weaknesses:

      (1) 0% contrast background data (standard memory-guided saccade task) are not reported in the manuscript. While these data cannot be used to consider information content of spike rate/time about task-irrelevant background stimuli, this condition is still informative as a 'baseline' (and a more typical example of a WM task).

      We plan to incorporate statistical analysis of this point in the revised version.

      (2) Throughout the manuscript, the primary measurements of neural coding pertain to task-irrelevant stimuli (the orientation/contrast of the background, which is unrelated to the animal's task to remember a spatial location). The remembered location impacts the coding of these stimulus variables, but it's unclear how this relates to WM representations themselves.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

    1. eLife Assessment

      Masroor Ahmad Paddar and colleagues reveal noncanonical roles of ATG5 and membrane ATG8ylation in regulating retromer assembly and function. They identify ATG5's unique non-autophagic role and show that CASM partially contributes to these phenotypes. Although the mechanism by which ATG8ylation regulates the retromer remains unclear, the findings provide important insights with solid supporting evidence.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Masroor Ahmad Paddar and colleagues explore the noncanonical roles of ATG5 and membrane atg8ylation in regulating retromer assembly and function. They begin by examining the interactomes of ATG5 and expand the scope of these effects to include homeostatic responses to membrane stress and damage.

      Strengths:

      This study provides novel insights into the noncanonical function of ATG8ylation in the endosomal cargo sorting process.

      Weaknesses:

      The direct mechanism by which ATG8ylation regulates the retromer remains unsolved.

      Comments on revisions:

      After revision, though the major weakness remains unsolved, other questions have been addressed experimentally or further interpreted.

    3. Reviewer #2 (Public review):

      Summary:

      Paddar et al. demonstrate that ATG5 mediates lysosomal repair via the recruitment of the retromer components during LLOMe-induced lysosomal damage and that mAtg8-ylation contributes to retromer-dependent cargo sorting of GLUT1. Although previous studies have suggested that during glucose withdrawal, classical autophagy contributes to retromer-dependent GLUT1 surface trafficking via interactions between LC3A and TBC1D5, the experiments here demonstrate that during basal conditions or lysosomal damage, ATGs that are not involved in mATG8ylation, such as FIP200, are not functionally required for retromer-dependent sorting of GLUT1. Overall, these studies suggest a unique role for ATG5 in the control of retromer function, and that conjugation of ATG8 to single membranes (CASM) is a partial contributor to these phenotypes.

      Strengths:

      (1) Overall, these studies suggest a unique non-autophagic role for ATG5 in the control of retromer function. They also demonstrate that conjugation of ATG8 to single membranes (CASM) is a partial contributor to these phenotypes. Overall, these data point to a new role for ATG5 and CASM-dependent mATG8ylation in lysosomal membrane repair and trafficking.

      (2) Although the studies are overall supportive of the proposed model that the retromer is controlled by CASM-dependent mATG8-ylation, it is noteworthy that previous studies of GLUT1 trafficking during glucose withdrawal (Roy et al. Mol Cell, PMID: 28602638) were predominantly conducted in cells lacking ATG5 or ATG7, which would not be able to discriminate between a CASM-dependent vs. canonical autophagy-dependent pathway in the control of GLUT1 sorting. Is the lack of GLUT1 mis-sorting to lysosomes observed in FIP200 and ATG13KO cells also observed during glucose withdrawal? Notably, deficiencies in glycolysis and glucose-dependent growth have been reported in FIP200 deficient fibroblasts (Wei et al. G&D, PMID: 21764854) so there may be differences in regulation dependent on the stress imposed on a cell.

      Comments on revisions:

      My previous comments have been addressed.

    4. Reviewer #3 (Public review):

      In this manuscript, Paddar et al. used APEX2 proximity labeling to find an interaction between ATG5 and the core components of the Retromer complex, VPS26, VPS29, and VPS35. Further studies revealed that ATG5 KO inhibited the trafficking of GLUT1 to the plasma membrane. They also found that other autophagy genes involved in membrane atg8ylation affected GLUT1 sorting. However, knocking out other essential autophagy genes such as ATG13 and FIP200 did not affect GLUT1 sorting. These findings suggest that ATG5 participates in the function of the Retromer in a noncanonical autophagy manner. Overall, the methods and techniques employed by the authors largely support their conclusions. These findings are intriguing and significant, enriching our understanding of the non-autophagic functions of autophagy proteins and the sorting of GLUT1.

      Comments on revisions:

      The concerns I raised have all been addressed.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews

      Reviewer #1 (Public Review): 

      Summary: 

      In this study, Masroor Ahmad Paddar and colleagues explore the noncanonical roles of ATG5 and membrane atg8ylation in regulating retromer assembly and function. They begin by examining the interactomes of ATG5 and expand the scope of these effects to include homeostatic responses to membrane stress and damage.

      Strengths: 

      This study provides novel insights into the noncanonical function of ATG8ylation in the endosomal cargo sorting process.

      Weaknesses: 

      The direct mechanism by which ATG8ylation regulates the retromer remains unsolved. 

      We agree with the reviewer.  We do however show how at least one aspect of atg8ylation contributes to the proper retromer function, which occurs via lysosomal membrane maintenance and repair. Understanding the more direct effects on retromer will require a separate study. We now emphasize this in the revised manuscript (p. 18) and point out the limitations of the present work (p. 18): “One of the limitations of our study is that beyond effects of membrane atg8ylation on quality of lysosomal membrane and its homeostasis there could be more direct effects of membrane modification with mATG8s that still need to be understood”.

      Reviewer #2 (Public Review): 

      Summary:

      Paddar et al. demonstrate that ATG5 mediates lysosomal repair via the recruitment of the retromer components during LLOMe-induced lysosomal damage and that mAtg8-ylation contributes to retromer-dependent cargo sorting of GLUT1. Although previous studies have suggested that during glucose withdrawal, classical autophagy contributes to retromer-dependent GLUT1 surface trafficking via interactions between LC3A and TBC1D5, the experiments here demonstrate that during basal conditions or lysosomal damage, ATGs that are not involved in mATG8ylation, such as FIP200, are not functionally required for retromer-dependent sorting of GLUT1. Overall, these studies suggest a unique role for ATG5 in the control of retromer function, and that conjugation of ATG8 to single membranes (CASM) is a partial contributor to these phenotypes.

      Strengths: 

      (1) Overall, these studies suggest a unique non-autophagic role for ATG5 in the control of retromer function. They also demonstrate that conjugation of ATG8 to single membranes (CASM) is a partial contributor to these phenotypes. Overall, these data point to a new role for ATG5 and CASM-dependent mATG8ylation in lysosomal membrane repair and trafficking. 

      (2) Although the studies are overall supportive of the proposed model that the retromer is controlled by CASM-dependent mATG8-ylation, it is noteworthy that previous studies of GLUT1 trafficking during glucose withdrawal (Roy et al. Mol Cell, PMID: 28602638) were predominantly conducted in cells lacking ATG5 or ATG7, which would not be able to discriminate between a CASM-dependent vs. canonical autophagy-dependent pathway in the control of GLUT1 sorting. Is the lack of GLUT1 mis-sorting to lysosomes observed in FIP200 and ATG13KO cells also observed during glucose withdrawal? Notably, deficiencies in glycolysis and glucose-dependent growth have been reported in FIP200 deficient fibroblasts (Wei et al. G&D, PMID: 21764854) so there may be differences in regulation dependent on the stress imposed on a cell.

      We thank the reviewer for the overall assessment of the strengths of the study.  We have discussed in the manuscript the elegant study by Roy et al., PMID 28602683. To accommodate reviewer’s comment, we have additionally emphasized in the text that our study is focused on basal conditions and conditions that perturb endolysosomal compartments. We agree with the reviewer that under metabolic stress conditions (such as glucose limitation) more complex pathways may be engaged and have acknowledged that in the discussion. We have now included this in the limitations of the study (p. 18): “Another limitation of our study is that we have focused on basal conditions or conditions causing lysosomal damage, whereas metabolic stress including glucose excess or limitation with its multitude of metabolic effects have not been addressed”.

      Weaknesses: 

      (1) Additional controls are needed to clarify the role of CASM in the control of retromer function. Because the manuscript proposes both CASM-dependent and independent pathways in the ATG5 mediated regulation of the retromer, it is important to provide robust evidence that CASM is required for retromer-dependent GLUT1 sorting to the plasma membrane vs. lysosome. The experiments with monensin in Fig. 7C-E are consistent with but not unequivocally corroborative of a role for CASM. 

      We fully agree with the reviewer. In fact, our data with bafilomycin A1 treatment causing GLUT1 mis-sorting show that it is the perturbation of lysosomes and not CASM per se that leads to mis-sorting of GLUT1 (Fig. 7D,E). Note that it has been shown (PMIDs: 28296541, 25484071 and 37796195) that although bafilomycin A1 deacidifies lysosomes, it does not induce but instead inhibits CASM. This is because bafilomycin A1 causes dissociation of the V1 and V0 sectors of V-ATPase, unlike other CASM-inducing agents, which promote V1-V0 association. Complementing this, our data with ATG2AB DKO and ESCRT VPS37A KO (Fig. 8A-F) indicate that the repair of lysosomes is important to keep the retromer machinery functional (as illustrated in Fig. 8G). This may be one of the effector mechanisms downstream of membrane atg8ylation in general and hence also downstream of CASM. We have revised Fig. 7 title to read “Lysosomal perturbations cause GLUT1 mis-sorting” and have explained these relationships in the text (p. 12-13): “Since bafilomycin A1 does not induce CASM but disturbs luminal pH, we conclude that it is the less acidic luminal pH of the endolysosomal organelles, and not CASM, that is sufficient to interfere with the proper sorting of GLUT1.”

      Based on the results shown with ATG16KO in Fig 4A-D, rescue experiments of these 16KO cells with WT vs. C-terminal WD40 mutant versions of ATG16 will specifically assess the requirement for CASM and potentially provide more rigorous support for the conclusions drawn. 

      We have carried out complementation with ATG16L1 WT and its E230 mutant (devoid of WD40 repeats but still capable of canonical autophagy) and placed these data in Fig. 7 (panels I and J) as recommended by the reviewer. This is now described on p. 13: “To additionally test this notion, we compared ATG16L1 full length (ATG16L1FL) and ATG16L1E230 (Rai et al., PMID 30403914) for complementation of the GLUT1 sorting defect in ATG16L1 KO cells (Fig. 7I,J). ATG16L1E230 (Rai et al., 2019; PMID 30403914) lacks the key domain to carry out CASM via binding to V-ATPase (29-33) but retains capacity to carry out atg8ylation. Both ATG16L1FL and ATG16L1E230 complemented mis-sorting of GLUT1 (Fig. 7I,J). Collectively, these data indicate that it is not absence of CASM/VAIL but absence of membrane atg8ylation in general that promotes GLUT1 mis-sorting.”

      (2) Also, the role of TBC1D5 should be further clarified. In Fig S7, are there any changes in the interactions between TBC1D5 and VPS35 in response to LLOMe or other agents utilized to induce CASM? 

      We thank the reviewer for pointing this out. We do have data with VPS35 in co-IPs shown in Fig. S7.  There is no change in the amounts of VPS35 or TBC1D5 in GFP-LC3A co-IPs. We now include in Fig. S7 (new panel D) a graph with quantification in the revised manuscript and emphasize this point (p. 12): “However, under CASM-inducing conditions, no changes were detected (Fig. S7B-D) in interactions between TBC1D5 and LC3A or in levels of VPS35 in LC3A co-IP, a proxy for LC3A-TBC1D5-VPS29/retromer association. This suggests that CASM-inducing treatments and additionally bafilomycin A1 do not affect the status of the TBC1D5-Rab7 system”.        

      Does TBC1D5 loss-of-function modulate the numbers of GLUT1 and Gal3 puncta observed in ATG5 deficient cells in response to LLOMe? 

      We agree that TBC1D5 is an interesting aspect. However, because TBC1D5 does not change its interactions in the experiments in our study, we consider this topic (i.e. whether TBC1D5 phenocopies VPS35 and ATG5 KOs in its effects on Gal3) to be beyond the scope of the present work. We underscore that LLOMe (lysosomal damage) mis-sorts GLUT1 even without any genetic intervention (e.g., in WT cells in the absence of ATG5 KO; Fig. 7). Thus, in our opinion the effects of TBC1D5 inactivation may be a moot point.  

      (3) Finally, the studies here are motivated by experiments in Fig. S1 (as well as other studies from the Deretic and Stallings labs) suggesting unique autophagy-independent functions for ATG5 in myeloid cells and neutrophils in susceptibility to Mycobacterium tuberculosis infection. However, it is curious that no attempt is made to relate the mechanistic data regarding the retromer or GLUT1 receptor mis-sorting back to the infectious models. Do myeloid cells or neutrophils lacking ATG5 have deficiencies in glucose uptake or GLUT1 cell surface levels? 

      The reviewer’s point is well taken. Glucose uptake, its metabolism, and diabetes underlie the resurgence of TB in certain populations and are important factors in a range of other diseases. This was alluded to in our discussion (lines 461-469). However, these are complex topics for future studies. We have now expanded this section of the discussion (p. 18): “In the context of tuberculosis, diabetes, which includes glucose dysregulation, is associated with increased incidence of active disease and adverse outcomes” (Dheda et al., PMID: 26377143; Dooley et al., PMID: 19926034).

      Reviewer #3 (Public Review): 

      In this manuscript, Paddar et al. used APEX2 proximity labeling to find an interaction between ATG5 and the core components of the Retromer complex, VPS26, VPS29, and VPS35. Further studies revealed that ATG5 KO inhibited the trafficking of GLUT1 to the plasma membrane. They also found that other autophagy genes involved in membrane atg8ylation affected GLUT1 sorting. However, knocking out other essential autophagy genes such as ATG13 and FIP200 did not affect GLUT1 sorting. These findings suggest that ATG5 participates in the function of the Retromer in a noncanonical autophagy manner. Overall, the methods and techniques employed by the authors largely support their conclusions. These findings are intriguing and significant, enriching our understanding of the non-autophagic functions of autophagy proteins and the sorting of GLUT1.

      Nevertheless, there are several issues that the authors need to address to further clarify their conclusions. 

      (1) The authors confirmed the interaction between Atg5 and the Retromer complex through Co-IP experiments. Is the interaction between Atg5 and the Retromer direct? If it is direct, which Retromer complex protein regulates the interaction with Atg5? Additionally, does ATG5 K130R mutant enhance its interaction with the Retromer? 

      AlphaFold modeling in the initial submission of our study to eLife (absent from the current version) suggested the possibility of a direct interaction between ATG5 and VPS35 with ATG12—ATG5 complex facing outwards, in which case K130R would not matter. However, mutational experiments in putative contact residues did not alter association in co-IPs. So either ATG5 interacts with other retromer subunits or more likely is in a larger protein complex containing retromer. It will take a separate study to dissect associations and find direct interaction partners. 

      (2) To more directly elucidate how ATG5 regulates Retromer function by interacting with the Retromer and participates in the trafficking of GLUT1 to the plasma membrane, the authors should identify which region or crucial amino acid residues of ATG5 regulate its interaction with the Retromer. Additionally, they should test whether mutations in ATG5 that disrupt its interaction with the Retromer affect Retromer function (such as participating in the trafficking of GLUT1 to the plasma membrane) and whether they affect Atg8ylation. They also need to assess whether these mutations influence canonical autophagy and lysosomal sensitivity to damage. 

      Please see the response to point 1.

      Recommendations for the authors.

      Reviewer #1 (Recommendations For The Authors): 

      While most data are solid and convincing, the following questions need to be addressed before publication: 

      Major Concerns: 

      (1) Examining only one cargo (GLUT1) is insufficient to reflect the retromer's function comprehensively. At least two additional cargoes should be analyzed to observe the phenotypes more accurately. 

      We agree that having another retromer cargo (in addition to GLUT1) would be of interest. We point out that our data also show mis-sorting of SNX27 to lysosomes (Fig. 3H, quantifications in Fig. 3I). SNX27 in turn sorts nearly 80 ion channels, signaling receptors, and other nutrient transporters. Which of the 80 cargos to prioritize and check (the expectation is that all 80 might be missorted given that they need SNX27)? We have instead tested MPR, a SNX27-independent cargo. We now include data on effects of ATG5 knockout on CI-MPR (Fig. S9A-F). This is described in the text (p. 14; “Effect of ATG5 knockout on MPR sorting

      We tested whether ATG5 affects cation-independent mannose 6-phosphate receptor (CI-MPR). For this, we employed the previously developed methods (Fig. S9A) of monitoring retrograde trafficking of CI-MPR from the plasma membrane to the TGN (70,118-121). In the majority of such studies, CI-MPR antibody is allowed to bind to the extracellular domain of CI-MPR at the plasma membrane and its localization dynamics following endocytosis serves as a proxy for trafficking of CI-MPR. We used ATG5 KOs in HeLa and Huh7 cells and quantified by HCM retrograde trafficking to TGN of antibody-labeled CI-MPR at the cell surface, after being taken up by endocytosis and allowed to undergo intracellular sorting, followed by fixation and staining with TGN46 antibody. There was a minor but statistically significant reduction in CI-MPR overlap with TGN46 in HeLaATG5-KO that was comparable to the reduction in HeLa cells when VPS35 was depleted by CRISPR (HeLaVPS35-KO) (Fig. S9B,C). Morphologically, endocytosed Ab-CI-MPR appeared dispersed in both HeLaATG5-KO and HeLaVPS35-KO cells relative to HeLaWT cells (Fig. S9D). Similar HCM results were obtained with Huh7 cells (WT vs. ATG5KO; Fig. S9E,F). We interpret these data as evidence of indirect action of ATG5 KO on CI-MPR sorting via membrane homeostasis, although we cannot exclude a direct sorting role via retromer. We favor the former interpretation based on the strength of the effect and the controversial nature of retromer engagement in sorting of CI-MPR (57,70,75,98,120).”)

      (2) The evidence from Alphafold predictions is weak. The direct interaction of ATG5 with retromer subunits should be tested. 

      Please see the above response to Reviewer 3.

      In addition, does retromer also interact with ATG16L1 similarly to the phenomenon in VAIL? 

      We fully agree with the reviewer that finding the direct interacting partners between retromer and membrane atg8ylation machinery is an important direction as in our opinion it would expand the repertoire of E3 ligases and its adaptors. However, given the complexity and variety of possibilities, we believe that this is a topic for a future study.  

      (3) In Line 166, Figures 2C and 2D, the Gal3 phenotype does not seem to be well complemented by VPS35. 

      We have adjusted the text to acknowledge incomplete complementation (p.7). 

      (4) In Figures 3 and 4, the authors show that KO of membrane atg8ylation machineries and ATG8-Hexa KO affects the localization of retromer cargo GLUT1 and SNX27. However, the mechanism by which membrane ATG8ylation affects retromer remains unresolved.

      Additionally, are the locations of other retromer subunits also affected? If so, how are they impacted? At least a speculative explanation should be provided. 

      Following the reviewer’s request, we now state on p. 19 that “one of the limitations of our study is that beyond effects of membrane atg8ylation on quality of lysosomal membrane and its homeostasis there could be more direct effects of membrane modification with mATG8s on retromer that still need to be understood”.

      (5) In Figure 3, endogenous IP results are required to examine the interaction of ATG5 with retromer if suitable retromer antibodies for IP are available. 

      Endogenous IPs are given in Fig. 1. We have modified text on p. 8 to clarify this.

      (6) In Figure 4, ATG8 Hexa KO, and triple KO of LC3s or GABARAPs all increase the localization of GLUT1 on lysosomes. It seems redundant for ATG8 family proteins here.

      Can any individual member of the ATG8 family rescue this phenotype? 

      If the intent of such complementation analysis is to identify a specific mATG8 responsible for the observed effects, this is already pre-empted by the fact that TKOs also have a similar effect as HEXA mutants (i.e. loss of at least two of mATG8s is enough to cause the phenotype). We now discuss this in the text (p. 10): “Thus, at least two mATG8s, each one from two different mATG8 subclasses (LC3s and GABARAPs) or the entire membrane atg8ylation machinery was engaged in and required for proper GLUT-1 sorting”.  

      (7) In Figure 5, knockdown of ATG5 in FIP200 KO cells inhibited GLUT1 sorting from endosomes, leading to its trafficking to lysosomes. However, it is known that very little remnant ATG5 in ATG5 KD cells is enough to support ATG8 lipidation. Therefore, it is essential to repeat this experiment using ATG5/FIP200 double KO or ATG5 KO combined with an autophagy inhibitor. 

      We point out this limitation in the text (p. 11): “….we knocked down ATG5 in FIP200 KO cells (Fig. S5D) and found that GLUT1 puncta and GLUT1+LAMP2+ profiles increased even in the FIP200 KO background with the effects nearing those of VPS35 knockout (Figs. 5D-F and S5C), with the difference between VPS35 KO and ATG5 KD attributable to any residual ATG5 levels in cells subjected to siRNA knockdowns”.

      (8) In Figure 7, the authors show that the induction of CASM inhibited GLUT1 sorting from endosomes. However, ATG5 KO, which abolishes membrane ATG8ylation, also inhibits GLUT1 sorting. This seems paradoxical and requires a reasonable explanation or discussion. 

      We understand the reviewer’s comment. The answer to this paradox is that it is actually the lysosomal damage that causes GLUT1 mis-sorting and not CASM. Membrane atg8ylation, such as CASM and probably other processes given the involvement of both ATG2 and ESCRTs (Fig. 8), counteracts the damage and works in the direction of restoring/maintaining proper retromer-dependent sorting. This is now explained better in the text, and we have revised the title of Fig. 7 to read “Lysosomal damage causes GLUT1 mis-sorting”. Our data with bafilomycin A1 show that it is the perturbation of lysosomes (not CASM per se) that leads to mis-sorting of GLUT1 (Fig. 7D,E), and our data with ATG2AB DKO and ESCRT (VPS37A) KO (Fig. 8A-F) indicate that repair of lysosomes is important to keep the retromer machinery functional (as illustrated in Fig. 8G), which may be one of the effector mechanisms downstream of membrane atg8ylation in general (and hence also of CASM).

      (9) The immuno-staining results for Figures 7F and 7G are lacking. 

      We now provide the requested images.

      (10) In Figure 8D, the quality of the image for VPS37 KO cells treated with LLOMe is not sufficient to show increased colocalization between GLUT1 and LAMP2. 

      We now provide a different example image. We note that these are epifluorescent HCM images.

      Minor Concerns: 

      (1) It would be better to distinguish the function of the membrane ATG8ylation machinery (i.e., ATG5) from the function of membrane ATG8ylation in the description. No ATG8ylation-deficient mutants were used in this study. 

      We have used atg8ylation mutants (e.g. KOs in ATG3, ATG5, ATG7, and ATG16L1). We now emphasize this better in the text (p. 10). 

      (2) In Figure 2D, a green box appears by accident. 

      This has been fixed.

      (3) In Figure 3A, the conjugate for ATG5-ATG12 is absent in the gel for IB: ATG5.

      The ATG5 antibody used in Fig. 3A recognizes primarily the conjugated form of ATG5. This is now clarified in the figure legend. 

      (4) Figure 5G is missing in the manuscript. 

      Fig. 5G is now mentioned in the text. Thank you.

      (5) The gRNA sequence information for FIP200 KO is missing in the Methods section. 

      Reference(s) to the already published gRNA sequence are in the manuscript. 

      (6) Suggest moving the last paragraph in the Results section to the Discussion section.

      We kept this single-paragraph section in Results as it contains actual data.

      Reviewer #2 (Recommendations For The Authors): 

      (1) It is unclear why the rescue of VPS35KO cells in Fig 1C-D is so modest. 

      Complementation data depend on transfection efficiency and some variability is to be expected.

      Reviewer #3 (Recommendations For The Authors): 

      (1) Figures 2A, 2C, 2E, and 2G lack scale bars. Figure 2D has a small square above the y axis. 

      Relative scale bars are now included. 

      (2) Figures S3B, S3D, and S3F lack scale bars. 

      Relative scale bars are now included.

    1. eLife Assessment

      This important study uses reinforcement learning to study how turbulent odor stimuli should be processed to yield successful navigation. The authors find that there is an optimal memory length over which an agent should ignore blanks in the odor to discriminate whether the agent is still inside the plume or outside of it, complementing recent studies using RNNs and finite state controllers to identify optimal strategies for navigating a turbulent plume. While the overall strength of evidence is convincing, better justification for using Brownian motion as a recovery strategy and the addition of accompanying code for reproducibility would add to this strength.
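
      The role of memory described in this assessment can be illustrated with a minimal Python sketch. It assumes a binary detection sequence and a hypothetical function `label_states` with a `memory` parameter measured in time steps; neither is taken from the study, whose actual state definition may differ. The agent is treated as inside the plume while the most recent detection is no more than `memory` steps old, and as outside (in a "void" state) otherwise.

```python
from typing import List, Optional, Sequence


def label_states(detections: Sequence[bool], memory: int) -> List[str]:
    """Label each time step as 'plume' or 'void'.

    Hypothetical illustration (not the authors' code): a step counts as
    'plume' if odor was detected at that step or within the previous
    `memory` steps; otherwise it counts as 'void'.
    """
    labels: List[str] = []
    # None means no odor has been detected so far.
    steps_since_detection: Optional[int] = None
    for detected in detections:
        if detected:
            steps_since_detection = 0
        elif steps_since_detection is not None:
            steps_since_detection += 1
        in_plume = (
            steps_since_detection is not None
            and steps_since_detection <= memory
        )
        labels.append("plume" if in_plume else "void")
    return labels
```

      Intuitively, with a very short memory every blank is read as leaving the plume, while with a very long memory the agent is slow to register that it has actually left; an optimal memory length would balance these two errors.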

    2. Reviewer #1 (Public review):

      Overall, I found the approach taken by the authors to be clear and convincing. It is striking that the conclusions are similar to those obtained in a recent study using a different computational approach (finite state controllers), which lends confidence to the conclusions about the existence of an optimal memory duration. There are a few points or questions that could be addressed in greater detail in a revision:

      (1) Discussion of spatial encoding

      The manuscript contrasts the approach taken here (reinforcement learning in a grid world) with strategies that involve a "spatial map" such as infotaxis. The authors note that their algorithm contains "no spatial information." However, I wonder if further degrees of spatial encoding might be delineated to better facilitate comparisons with biological navigation algorithms. For example, the gridworld navigation algorithm seems to have an implicit allocentric representation, since movement can be in one of four allocentric directions (up, down, left, right). I assume this is how the agent learns to move upwind in the absence of an explicit wind direction signal. However, not all biological organisms likely have this allocentric representation. Can the agent learn the strategy without wind direction if it can only go left/right/forward/back/turn (in egocentric coordinates)? In discussing possible algorithms, and the features of this one, it might be helpful to distinguish

      (1) those that rely only on egocentric computations (run and tumble),

      (2) those that rely on a single direction cue such as wind direction,

      (3) those that rely on allocentric representations of direction, and

      (4) those that rely on a full spatial map of the environment.
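
      The distinction between allocentric and egocentric action sets raised in this point can be made concrete with a short Python sketch. All names (`allocentric_step`, `egocentric_step`, the heading encoding) are hypothetical and not taken from the manuscript, and the convention that the agent ends up facing the direction it just moved is an assumption made for brevity. In the allocentric case an action directly names a world-frame direction; in the egocentric case the same action must be composed with the agent's current heading.

```python
from typing import Tuple

Position = Tuple[int, int]

# World-frame unit moves in clockwise order (y increases "up").
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # up, right, down, left

# Allocentric actions name a world-frame direction directly.
ALLOCENTRIC = {"up": 0, "right": 1, "down": 2, "left": 3}

# Egocentric actions are clockwise quarter-turns relative to the heading.
EGOCENTRIC = {"forward": 0, "right": 1, "back": 2, "left": 3}


def allocentric_step(pos: Position, action: str) -> Position:
    """Move one cell in the world-frame direction named by `action`."""
    dx, dy = HEADINGS[ALLOCENTRIC[action]]
    return (pos[0] + dx, pos[1] + dy)


def egocentric_step(
    pos: Position, heading: int, action: str
) -> Tuple[Position, int]:
    """Move one cell relative to `heading` (an index into HEADINGS).

    Assumption: after moving, the agent faces the direction it moved in.
    Returns the new position and the new heading index.
    """
    new_heading = (heading + EGOCENTRIC[action]) % 4
    dx, dy = HEADINGS[new_heading]
    return (pos[0] + dx, pos[1] + dy), new_heading
```

      In this sketch the egocentric action "forward" produces a different world-frame move for each heading, so an agent restricted to egocentric actions could not encode "upwind" in its action labels alone and would need some additional directional cue or internal heading estimate.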

      (2) Recovery strategy on losing the plume

      While the approach to encoding odor dynamics seems highly principled and reaches appealingly intuitive conclusions, the approach to modeling the recovery strategy seems to be more ad hoc. Early in the paper, the recovery strategy is defined to be path integration back to the point at which odor was lost, while later in the paper, the authors explore Brownian motion and a learned recovery based on multiple "void" states. Since the learned strategy works best, why not first consider learned strategies, and explore how lack of odor must be encoded or whether there is an optimal division of void states that leads to the best recovery strategies? Also, although the authors state that the learned recovery strategies resemble casting, only minimal data are shown to support this. A deeper statistical analysis of the learned recovery strategies would facilitate comparison to those observed in biology.

      (3) Is there a minimal representation of odor for efficient navigation?

      The authors suggest (line 280) that the number of olfactory states could potentially be decreased to lower computational cost. This raises the question of whether there is a maximally efficient representation of odors and blanks sufficient for effective navigation. The authors choose to represent odor by 15 states that allow the agent to discriminate different spatial regimes of the stimulus, and later introduce additional void states that allow the agent to learn a recovery strategy. Can the number of states be reduced, or does this lead to a loss of performance? Does the optimal number of odor and void states depend on the spatial structure of the turbulence as explored in Figure 5?