  1. Feb 2025
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to quantify feral pig interactions in eastern Australia to inform disease transmission networks. They used GPS tracking data from 146 feral pigs across multiple locations to construct proximity-based social networks and analyse contact rates within and between pig social units.

      Strengths:

      (1) Addresses a critical knowledge gap in feral pig social dynamics in Australia.

      (2) Uses robust methodology combining GPS tracking and network analysis.

      (3) Provides valuable insights into sex-based and seasonal variations in contact rates.

      (4) Effectively contextualizes findings for disease transmission modeling and management.

      (5) Includes comprehensive ethical approval for animal research.

      (6) Utilizes data from multiple locations across eastern Australia, enhancing generalizability.

      Weaknesses:

      (1) Limited discussion of potential biases from varying sample sizes across populations

      This is a really good comment, and we will address this in the discussion as one of the limitations of the study.

      (2) Some key figures are in supplementary materials rather than the main text.

      We will move some of our supplementary material to the main text as suggested.

      (3) Economic impact figures are from the US rather than Australia-specific data.

      We included the impact figures that are available for Australia (for FMD), and we will include the estimated impact of ASF in Australia in the introduction.

      (4) Rationale for spatial and temporal thresholds for defining contacts could be clearer.

      We will improve the explanation of why we chose the spatial and temporal thresholds based on literature, the size of animals and GPS errors.

      (5) Limited discussion of ethical considerations beyond basic animal ethics approval.

      This research was conducted under an ethics committee's approval for collaring the feral pigs. This research is part of an ongoing pest management activity, and all the ethics approvals have been highlighted in the main manuscript.

      The authors largely achieved their aims, with the results supporting their conclusions about the importance of sex and seasonality in feral pig contact networks. This work is likely to have a significant impact on feral pig management and disease control strategies in Australia, providing crucial data for refining disease transmission models.

      Reviewer #2 (Public review):

      Summary:

      The paper attempts to elucidate how feral (wild) pigs cause distortion of the environment in over 54 countries of the world, particularly Australia.

      The paper displays proof that over $120 billion worth of facilities were destroyed annually in the United States of America.

      The authors have tried to infer that the findings of their work were important and possess a convincing strength of evidence.

      Strengths:

      (1) Clearly stating feral (wild) pigs as a problem in the environment.

      (2) Stating how 54 countries were affected by the feral pigs.

      (3) Mentioning how $120 billion was lost in the US, annually, as a result of the activities of the feral pigs.

      (4) Amplifying the fact that 14 species of animals were being driven into extinction by the feral pigs.

      (5) Feral pigs possessing zoonotic abilities.

      (6) Feral pigs acting as reservoirs for endemic diseases like brucellosis and leptospirosis.

      (7) Understanding disease patterns by the social dynamics of feral pig interactions.

      (8) The use of 146 GPS-monitored feral pigs to establish their social interaction among themselves.

      Weaknesses:

      (1) Unclear explanation of the association of either the female or male feral pigs with each other, seasonally.

      This will be better explained in the methods.

      (2) The "abstract paragraph" was not justified.

      We have justified the abstract paragraph as requested by the reviewer.

      (3) Typographical errors in the abstract.

      Typographical errors have been corrected in the Abstract.

      Reviewer #3 (Public review):

      Summary:

      The authors sought to understand social interactions both within and between groups of feral pigs, with the intent of applying their findings to models of disease transmission. The authors analyzed GPS tracking data from across various populations to determine patterns of contact that could support the transmission of a range of zoonotic and livestock diseases. The analysis then focused on the effects of sex, group dynamics, and seasonal changes on contact rates that could be used to base targeted disease control strategies that would prioritize the removal of adult males for reducing intergroup disease transmission.

      Strengths:

      It utilized GPS tracking data from 146 feral pigs over several years, effectively capturing seasonal and spatial variation in the social behaviors of interest. Using proximity-based social network analysis, this work provides a highly resolved snapshot of contact rates and interactions both within and between groups, substantially improving research in wildlife disease transmission. Results were highly useful and provided practical guidance for disease management, showing that control targeted at adult males could reduce intergroup disease transmission, hence providing an approach for the control of zoonotic and livestock diseases.

      Weaknesses:

      Despite their reliability, populations can be skewed by small sample sizes and limited generalizability due to specific environmental and demographic characteristics. Further validation is needed to account for additional environmental factors influencing social dynamics and contact rates.

      This is a really good point, and we thank the reviewer for pointing out this issue. We will discuss the potential biases due to sample size in our discussion. We agree that environmental factors need to be incorporated and tested for their influence on social dynamics; this will be added to the discussion, as we plan to expand this research and analyse whether environmental factors influence social dynamics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Consider moving some key figures from supplementary materials to the main text to strengthen the presentation of results.

      We included a new figure to strengthen the presentation of results (Figure 3a-b), which shows the node level measures by sex and for direct and indirect networks.

      (2) Expand discussion of limitations, particularly addressing potential biases from varying sample sizes across populations.

      We added more detail and clarity about this potential bias into the limitation section within the discussion: “Different populations in our study had varying numbers of collared individuals, with some populations having only two individuals at certain times. This variability in sample size across populations is a limitation when interpreting the results. Small populations are often the result of a few individuals being trapped and collared, and this does not necessarily reflect the actual number of individuals in those groups.” Moreover, while reviewing the effect of the potential bias, we found that a General Linear Mixed Effect Model (Table 1) was not optimal for analysing the effect of sex on the network measures, and therefore this analysis has been done again using a non-parametric test (Wilcoxon rank-sum test) for direct and indirect networks based on a 5-metre threshold (Table 1).
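
      As an illustration of the non-parametric comparison mentioned above, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using a normal approximation with tie correction. It is not the authors' R code; the function name and the node-degree values by sex are hypothetical and serve only to show the mechanics.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Uses the normal approximation with tie correction and no continuity
    correction. Returns (U statistic for x, z score, p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1  # size of this group of tied values
        tie_term += t ** 3 - t
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u, 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the normal tail
    return u, z, p

# Hypothetical node degrees (number of contacts) for collared animals.
male_degree = [4, 6, 7, 9, 5]
female_degree = [2, 3, 3, 4, 1, 2]
u, z, p = rank_sum_test(male_degree, female_degree)
print(f"U = {u}, z = {z:.2f}, p = {p:.3f}")
```

      For samples as small as some of the populations described above, an exact test would normally be preferred over the normal approximation used in this sketch.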

      (3) If available, include Australia-specific economic impact data in the introduction.

      We included the impact figures that are available for Australia (for FMD) in the introduction.

      (4) Clarify the rationale for chosen spatial and temporal thresholds for defining contacts.

      This has been added in the methodology: “Direct contact was defined when two individuals interacted either at 2, 5, or 350-metre buffers within a five-minute interval [36]. A previous study used 350 metres as a spatial threshold [16], while others use the approximate average body length of an individual [36]”
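
      To make the contact definition concrete, a minimal sketch of how proximity-based contacts could be extracted from GPS fixes is given here. It assumes that "within a five-minute interval" means two fixes at most 300 seconds apart, uses the haversine distance on latitude/longitude, and works on hypothetical fixes; the function and field names are illustrative and are not taken from the authors' R code.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def direct_contacts(fixes, max_dist_m=5.0, max_dt_s=300):
    """Return (id_a, id_b, t_a, t_b) for every pair of fixes from different
    animals that are within max_dist_m and at most max_dt_s apart in time."""
    contacts = []
    for a, b in combinations(fixes, 2):  # all pairs: fine for a toy data set
        if a["id"] == b["id"]:
            continue
        if abs(a["t"] - b["t"]) > max_dt_s:
            continue
        if haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_dist_m:
            contacts.append((a["id"], b["id"], a["t"], b["t"]))
    return contacts

# Hypothetical fixes; t is in seconds since an arbitrary start.
fixes = [
    {"id": "pigA", "t": 0,   "lat": -27.00000, "lon": 151.00000},
    {"id": "pigB", "t": 120, "lat": -27.00002, "lon": 151.00000},  # ~2.2 m from pigA
    {"id": "pigC", "t": 100, "lat": -27.00200, "lon": 151.00000},  # ~222 m from pigA
    {"id": "pigB", "t": 900, "lat": -27.00000, "lon": 151.00000},  # 15 min later
]

for threshold in (2.0, 5.0, 350.0):
    print(threshold, len(direct_contacts(fixes, max_dist_m=threshold)))
```

      Running the same fixes through the three buffers shows how strongly the resulting network depends on the spatial threshold, which is why the choice needs to be justified against body length and GPS error.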

      (5) Consider adding a brief discussion of ethical considerations beyond basic animal ethics approval, addressing aspects like animal welfare during collaring and potential environmental impacts.

      Feral pigs are an invasive species in Australia, and managing their population is crucial to protecting native ecosystems. The trapping and collaring of these animals have been conducted following the stringent animal welfare requirements necessary to obtain animal ethics approval in Australia. However, it is important to consider the broader ethical implications. Animal welfare during collaring is a critical aspect and involves minimising stress and physical harm to the animals. The collars used are lightweight and properly fitted, and only adults are collared because of welfare concerns about collaring juveniles.

      (6) Add a statement about data availability/accessibility.

      The GPS data cannot be shared; however, the R code will be deposited in GitHub (https://github.com/Tatianaproboste/Feral-Pig-Interactions) and the link has been added in the final version.

      (7) Expand on the implications of seasonal variation in contact rates for disease management strategies in the discussion.

      We have added this information in the discussion: “For example, controlling an outbreak during summer would potentially require more resources than an outbreak in other seasons due to the higher number of contacts between individuals during summer.”

      Reviewer #2 (Recommendations for the authors):

      The typographical errors in the abstract to be corrected are:

      (1) Line 22: Remove the "are" before "threaten".

      This has been corrected.

      (2) Line 24: Replace the "to" before "extinction" with "into".

      This has been corrected.

      (3) Line 28: Rephrase the sentence.

      ‘Yet social dynamics are known to vary enormously from place to place, so knowledge generated for example in USA and Europe might not easily transfer to locations such as Australia.’

      (3) Line 29: Insert a "comma" after "Here".

      This has been corrected.

      (4) Lines 33 -34: Explain, clearly, the contact rates; is it between females to females or females to males?

      We have improved this phrase and now it reads: “…. with females demonstrating higher group cohesion (female-female) and males acting as crucial connectors between independent groups.”

      (5) Line 36: Make yourselves clear about what you mean by "targeting adult male".

      We believe “targeting adult males” is correct in this context.

      Reviewer #3 (Recommendations for the authors):

      (1) Lines 22 and 44: in "are threaten", I think "are" should be removed for better clarity.

      This has been corrected.

      (2) Line 71, the source and not "force" of infection.

      The force of infection is correct here.

      (3) Line 72, population "of".

      This has been corrected.

      (4) Under statistical analysis, the software version should be included.

      R has been updated through multiple versions since we started this analysis.

      (5) Terminological consistency: as far as possible try to be consistent with the terms used in the text, such as using "contact rate" instead of "interaction rate" in order not to puzzle the readers.

      We have changed most of the “interactions” to “contact” instead as suggested.

      (6) Correct Typos: Identify typos and grammatical inconsistencies of any kind, especially in those complex sentences that may be hard to follow.

      The typos have been checked.

      (7) Under the methodology, briefly describe why specific thresholds were chosen and any limitations.

      We added the following into the method: “Direct contact was defined when two individuals interacted either at 2, 5, or 350-metre buffers within a five-minute interval [36]. A previous study used 350 metres as a spatial threshold [16], while others use the approximate average body length of an individual [36]”

      (8) The discussion should be strengthened by drawing clear links between the findings and actionable management strategies.

      We have strengthened the discussion by adding more specific actionable management strategies. For example, controlling an outbreak during summer would potentially require more resources than an outbreak in other seasons due to the higher number of contacts between individuals during summer.

      (9) Did you consider additional environmental factors, such as rainfall, food availability, or habitat features, to better understand how these influence seasonal variations in pig interactions and contact rates?

      This is something that we have in mind and will explore in future research. This has been partially explored, but only in terms of how environmental factors and seasons affect home range (Wilson et al. 2023).

      (10) Figure Legends: Add more detailed descriptions in figure legends, especially for those figures showing network metrics or contact rates.

      More information has been added to the figure legends.

      (11) The paper includes too many figures, and thus, it is recommended to simplify or merge some figures where appropriate. In particular, this is recommended for those figures that plot more network measures across thresholds. Adding clear, summarized captions with interpretation on threshold and measure significance would be a great help in interpreting complicated visualizations.

      The figure that shows the comparison between global network measures, including average local transitivity, edge density, global transitivity, mean distance and number of edges for direct and indirect networks has been moved to supplementary material (Figure S3). We also included direct and indirect model-level measures by sex as in Figure 3 and improved the captions of the figures presented in the main document.
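
      For readers unfamiliar with the global network measures named above, the sketch below computes three of them (number of edges, edge density and global transitivity) for an undirected contact network. The node labels and edge list are hypothetical, and the function is an illustration rather than the authors' implementation.

```python
from itertools import combinations

def global_measures(nodes, edges):
    """Number of edges, edge density and global transitivity of an
    undirected, unweighted network given as a node list and an edge list."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)
    n = len(nodes)
    m = sum(len(nb) for nb in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    triples = 0  # connected triples (two edges sharing a centre node)
    closed = 0   # triples whose two end nodes are also connected
    for nb in adj.values():
        for u, v in combinations(nb, 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    transitivity = closed / triples if triples else 0.0
    return {"edges": m, "edge_density": density, "global_transitivity": transitivity}

# Hypothetical network: three animals in one group plus one connector contact.
nodes = ["F1", "F2", "F3", "M1"]
edges = [("F1", "F2"), ("F2", "F3"), ("F1", "F3"), ("F3", "M1")]
print(global_measures(nodes, edges))
```

      Global transitivity is computed here as the share of connected triples that are closed, which counts each triangle three times, once per centre node.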

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this valuable study, the authors found that the macrolide drug rapamycin, which is an important pharmacological tool in the clinic and the research lab, is less specific than previously thought. They provide solid functional evidence that rapamycin activates TRPM8 and develop an NMR method to measure the specific binding of a ligand to a membrane protein.

      Strengths:

      The authors use a variety of complementary experimental techniques in several different systems, and their results support the conclusions drawn.

      Weaknesses:

      Controls are not shown in all cases, and a lack of unity across the figures makes the flow of the paper disjointed. The proposed location of the rapamycin binding pocket within the membrane means that molecular docking approaches designed for soluble proteins alone do not provide solid evidence for a rapamycin binding pocket location in TRPM8, but the authors are appropriately careful in stating that the model is consistent with their functional experiments.

      Impact:

      This work provides still more evidence for the polymodality of TRP channels, reminding both TRP channel researchers and those who use rapamycin in other contexts that the adjective "specific" is only meaningful in the context of what else has been explicitly tested.

      Reviewer #2 (Public Review):

      Summary:

      Tóth and Bazeli et al. find rapamycin activates heterologously-expressed TRPM8 and dissociated sensory neurons in a TRPM8-dependent way with Ca2+-imaging. With electrophysiology and STTD-NMR, they confirmed the activation is through direct interaction with TRPM8. Using mutants and computational modeling, the authors localized the binding site to the groove between S4 and S5, different than the binding pocket of cooling agents such as menthol. The hydroxyl group on carbon 40 within the cyclohexane ring in rapamycin is indispensable for activation, while other rapalogs with its replacement, such as everolimus, still bind but cannot activate TRPM8. Overall, the findings provide new insights into TRPM8 functions and may indicate previously unknown physiological effects or therapeutic mechanisms of rapamycin.

      Strengths:

      The authors spent extensive effort on demonstrating that the interaction between TRPM8 and rapamycin is direct. The evidence is solid. In probing the binding site and the structural-function relationship, the authors combined computational simulation and functional experiments. It is very impressive to see that "within" a rapamycin molecule, the portion shared with everolimus is for "binding", while the hydroxyl group in the cyclohexane ring is for activation. Such detailed dissection represents a successful trial in the computational biology-facilitated, functional experiment-validated study of TRP channel structural-activity relationship. The research draws the attention of scientists, including those outside the TRP channel field, to previously neglected effects of rapamycin, and therefore the manuscript deserves broad readership.

      Weaknesses:

      The significance of the research could be improved by showing or discussing whether a similar binding pocket is present in other TRP channels, and hence rapalogs might bind to or activate these TRP channels. Additionally, while the finding on TRPM8 is novel, it is worthwhile to perform more comprehensive pharmacological characterization, including single-channel recording and a few more mutant studies to offer further insight into the mechanism of rapamycin binding to S4~S5 pocket driving channel opening. It is also necessary to know if rapalogs have independent or synergistic effects on top of other activators, including cooling agents and lower temperature, and their dependence on regulators such as PIP2.

      Additional discussion that might be helpful:

      The authors did confirm that rapamycin does not activate TRPV1, TRPA1 and TRPM3. But other TRP channels, particularly other structurally similar TRPM channels, should be discussed or tested. Alignment of the amino acid sequences or structures at the predicted binding pocket might predict some possible outcomes. In particular, rapamycin is known to activate TRPML1 in a PI(3,5)P2-dependent manner, which should be highlighted in comparison among TRP channels (PMID: 35131932, 31112550).

      Reviewer #3 (Public Review):

      Summary:

      Rapamycin is a macrolide of immunologic therapeutic importance, proposed as a ligand of mTOR. It is also employed in assays to probe protein-protein interactions.

      The authors serendipitously found that the drug rapamycin and some related compounds potently activate the cationic channel TRPM8, which is the main mediator of cold sensation in mammals. The authors show that rapamycin might bind to a novel binding site that is different from the binding site for menthol, the prototypical activator of TRPM8. These solid results are important to a wide audience since rapamycin is a widely used drug and is also employed in assays to probe protein-protein interactions, which could be affected by potential specific interactions of rapamycin with other membrane proteins, as illustrated herein.

      Strengths:

      The authors employ several experimental approaches to convincingly show that rapamycin activates directly the TRPM8 cation channel and not an accessory protein or the surrounding membrane. In general, the electrophysiological, mutational and fluorescence imaging experiments are adequately carried out and cautiously interpreted, presenting a clear picture of the direct interaction with TRPM8. In particular, the authors convincingly show that the interactions of rapamycin with TRPM8 are distinct from interactions of menthol with the same ion channel.

      Weaknesses:

      The main weakness of the manuscript is the NMR method employed to show that rapamycin binds to TRPM8. The authors developed and deployed a novel signal processing approach based on subtraction of several independent NMR spectra to show that rapamycin binds to the TRPM8 protein and not to the surrounding membrane or other proteins. While interesting and potentially useful, the method is not well developed (several positive controls are missing) and is not presented in a clear manner, such that the quality of data can be assessed and the reliability and pertinence of the subtraction procedure evaluated.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points

      (1) Given the novelty of the STTD NMR approach, please provide more details and data for the reader.

      • I would like to see all of the collected spectra so that readers can see and judge the effect sizes for themselves, perhaps as an additional supplementary figure.

      We agree with the reviewer that the data transparency of the NMR measurements should be improved. We changed panel C of Figure 2 in the main text and provided all the STD and the computed STDD and STTD spectra recorded on one set of experiments. We carried out additional experimental replicas on new samples and addressed the variability of cell samples by rescaling the STD effects based on reference ¹H measurements. We provided supplementary spectra of the reference experiments without saturation (Figure S5) and the obtained STTD spectra from the three parallel NMR sessions (Figure S6).

      • I appreciate the labels for STDD-1, STDD-2, and STTD on the lower two spectra of Figure 2C. Is the top spectrum from STD-1 or is it prior to saturation? In Figure 2C, what do the x1 and x2 notations on the right-hand side of the spectra indicate?

      We showed the top spectrum as an overview and a demonstration of the spectral complexity of the samples. ¹H experiments were run before the STD measurements to assess the sample quality and stability. The demonstrated spectrum on sample 1 (TRPM8 with rapamycin in HEK cells) was recorded with more transients than the corresponding STDs, thus it is only visually comparable with the difference spectra after scaling (2x). Figure 2 was changed and all the spectra were replaced as mentioned before. All the recorded ¹H experiments without saturation, including the one removed, are now available in the supplementary information (Figure S5).

      • The STTD NMR results with WT TRPM8 are consistent with rapamycin binding directly to the channel. Testing whether rapamycin binding observed with STTD NMR is disrupted by one of the most compelling mutations (D796A, D802A, G805A, or Q861A) would be a further test of this direct interaction.

      We thank the reviewer for the suggestion and agree that testing the most compelling mutants would be a promising next step. These mutations were generated in plasmid vectors and only transiently transfected into HEK cells. For NMR analysis we would need a high amount of cells stably overexpressing the mutant channels which were not available for experimentation.

      • Given that this is not a methods paper, it is probably outside the scope to further validate the STTD NMR measurements by performing parallel ITC, SPR, MST, or radiolabeled ligand experiments. Nevertheless, I would be excited to see such a comparison since STTD NMR appears to have promise as an experimental technique for assessing ligand binding to membrane proteins that does not require large amounts of purified protein or radioactive isotopes.

      We agree with the reviewer that additional independent biophysical measurements on the interactions are necessary to further validate the STTD methodology. This paper is a preliminary demonstration of the STTD concept and our group is currently working on the challenges of on-cell NMR (e.g., sample and spectral complexity) and the standardization of the proposed workflow.     

      (2) Please clarify the methods used to model rapamycin binding. Docking can be imprecise in TRP channels, even with a sophisticated docking scheme (Hughes et al., 2019, doi: https://doi.org/10.7554/eLife.49572.001).

      Thank you for mentioning this point and providing the reference. We have further clarified our methods and included the reference in our discussion, indicating the limitations of our approach.

      • As a positive control, does the docking strategy accurately predict binding of known compounds (menthol, icilin, etc.) to TRPM8 consistent with cryo-EM structures?  

      Yes, the binding site for menthol, based on a similar docking strategy as for rapamycin, is also presented, and matches with predictions from other publications. This is now clarified in the revised manuscript.

      • Why was homology modeling to the human sequence used with the mouse structure but not the avian structure?  

      At the onset of the project, only the avian structure was available, and it was used in the primary docking. Later, to obtain more precise docking relevant to human TRPM8 pharmacology, we switched to the then-available structure of the mouse ortholog.

      • How many rapamycin structural clusters were built, and how many structures were there in each cluster? How many were used? "most populated" is unspecific.  

      Thank you for your comment. We have added the following highlighted information to the methods section to address your comment:

      “Representative conformations of rapamycin were identified by clustering of the 1000-membered pools, having the macrocycle backbone atoms compared with 1.0 Å RMSD cut-off. Middle structures of the ten most populated clusters, accounting for more than 90% of the total conformational ensemble generated by simulated annealing, were used for further docking studies. To refine initial docking results and to identify plausible binding sites, the above selected rapamycin structures were docked again, following the same protocol as above, except for the grid spacing which was set to 0.375 Å in the second pass. The resultant rapamycin-TRPM8 complexes were, again, clustered and ranked according to the corresponding binding free energies. Selected binding poses were subjected to further refinement. The three most populated and plausible binding poses were further refined by a third pass of docking, where amino acid side chains of TRPM8, identified in the previous pass to be in close contact with rapamycin (< 4 Å), were kept flexible. Grid volumes were reduced to these putative binding sites including all flexible amino acid side chains (21.0-26.2 Å x 26.2-31.5 Å x 24.8-29.2 Å).”

      However, it is important to clarify that the clusters are not built and their number is not specified by the user. The number of clusters found depends on how similar the structures are in the structural ensemble analyzed by clustering. A high number of clusters indicates a diverse, whereas a low number suggests a uniform structural ensemble. Furthermore, it is arbitrarily controlled by the similarity cutoff specified by the user. If the cutoff is selected well, then the number of structures is different in each cluster. There are some highly populated clusters and a few which only have one structure. The selection of how many cluster representatives are used is usually based on the decision of whether or not the sum of the population of selected clusters sufficiently covers the mapped conformational space.
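
      The dependence of the cluster count on the similarity cutoff can be illustrated with a small sketch. The clustering algorithm used in the study is not specified here, so the code below assumes a simple greedy "leader" scheme, computes RMSD on pre-superimposed coordinates, and uses toy two-atom conformations; all names and data are hypothetical.

```python
import math

def rmsd(a, b):
    """RMSD between two conformations given as equal-length lists of (x, y, z).
    Assumes the two structures have already been superimposed."""
    sq = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(sq / len(a))

def leader_cluster(confs, cutoff=1.0):
    """Greedy clustering: a conformation joins the first cluster whose leader
    is within `cutoff`; otherwise it starts a new cluster.
    Returns clusters (lists of indices) sorted from most to least populated."""
    clusters = []
    for i, conf in enumerate(confs):
        for members in clusters:
            if rmsd(confs[members[0]], conf) <= cutoff:
                members.append(i)
                break
        else:  # no existing leader was close enough
            clusters.append([i])
    return sorted(clusters, key=len, reverse=True)

def clusters_covering(clusters, fraction=0.9):
    """Shortest prefix of size-ranked clusters covering `fraction` of the pool."""
    total = sum(len(c) for c in clusters)
    chosen, covered = [], 0
    for c in clusters:
        if covered >= fraction * total:
            break
        chosen.append(c)
        covered += len(c)
    return chosen

# Toy pool of six two-atom "conformations" (coordinates in angstroms).
confs = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0)],
    [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
    [(3.2, 0.0, 0.0), (4.2, 0.0, 0.0)],
    [(0.0, 5.0, 0.0), (1.0, 5.0, 0.0)],
]

for cutoff in (0.15, 1.0, 10.0):
    print(cutoff, [len(c) for c in leader_cluster(confs, cutoff)])
```

      A tighter cutoff splits the same pool into more, smaller clusters, while a looser one merges everything, which is the sense in which the number of clusters is controlled by the cutoff rather than set directly by the user.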

      • Additionally, the rapamycin poses were generated using a continuum solvent model that is unlikely to replicate the conditions existing in the lipid bilayer or in a lipid-exposed binding pocket as is predicted here. It is therefore possible that the rapamycin poses chosen for docking do not represent the physiological rapamycin binding pose, hampering the ability of the docking algorithm to find an appropriate docking pocket.  

      • Furthermore, accurately docking ligands that may bind to membrane-exposed pockets is a challenging problem, particularly because many scoring algorithms, including those employed by Autodock, do not distinguish between solvent-exposed and membrane-exposed faces of the protein. This affects the predicted binding energies.

      We appreciate the reviewer's insightful comments. We have added a note to the discussion mentioning these important limitations.

      • In Figure 4, it appears that the proposed rapamycin binding pocket is located at the interface between two subunits, but only one is shown. Is there any contact with residues in the neighboring subunit? Based on Figure S4, I assume not, but am unsure.

      Based on the estimated distances, we do not think that there are any relevant interactions with residues from neighboring subunits. This is now indicated in the results section.

      • Consider uploading the rapamycin-docked model to a public repository such as Zenodo for readers to examine and manipulate themselves  

      As suggested, the model will be uploaded in a public repository. A link to the file on Zenodo is now included.

      (3) Please discuss the spatial location of the proposed rapamycin binding pocket relative to the vanilloid binding pocket in TRPV1.

      • The mutagenesis indicates that D745, D802, G805, and Q861 are most important for rapamycin sensitivity in TRPM8. Interestingly, the proposed rapamycin binding pocket appears to overlap spatially with the vanilloid binding pocket in TRPV1. Consistent with this, Q861 aligns with E570 in TRPV1, which is a critical residue for resiniferatoxin sensitivity. Indeed, similar to Q861's modeled proximity to the cyclohexyl ring, the hydroxyl group of the vanillyl moiety of capsaicin (4DY in 7LR0, for example) is in proximity to E570 in TRPV1. Additionally, searching PubChem by structural similarity suggests that the vanillyl head groups of the TRP channel modulators capsaicin and eugenol are structurally similar to the trans-2-methoxycyclohexan-1-ol ring. Without overlaying the two structures myself, it is difficult to say more than that, but I encourage the authors to comment on any similarities and differences they observe.

      • If the proposed rapamycin pocket is indeed similar to the location of the vanilloid binding site, the authors may wish to discuss other TRPM channel structures that show ligands and lipids bound to this pocket because this provides evidence that this pocket influences TRPM channel function. For example, how does the proposed rapamycin binding pocket compare to TRPM8 bound to agonist AITC (PDBID 8e4l), TRPM5 bound to inhibitor NDNA (7mbv), and TRPM2 bound to phosphatidylcholine (6co7)?

      • Other TRP channel structures with ligands or lipids modeled in this region include TRPV1 bound to resiniferatoxin, capsaicin, or phosphatidylinositol (7l2j, 7l24, 7l2s, 7l2t, 7l2u, 7lp9, 7lpc, 7lqy, 7mz6, 7mz9, 7mza); TRPV3 bound to phosphatidylcholine (7mij, 7mik, 7mim, 7min, 7ugg); TRPV5 bound to econazole (6b5v) or ZINC9155 (6pbf); TRPV6 bound to piperazine (7d2k, 7k4b, 7k4c, 7k4d, 7k4e, 7k4f) or cholesterol hemisuccinate (7s8c); TRPC6 bound to BTDM (7dxf) or phosphatidylcholine (6uza); and TRP1 bound to PIP2 (6pw5).

      We thank the reviewer for these valuable insights. We have included some additional discussion highlighting the similarities between the proposed rapamycin binding site and some of the other ligand-channel interactions in the TRP superfamily, in particular the well-known vanilloid binding site in TRPV1. However, to keep the discussion focused, we have not fully discussed all the indicated interactions, to best serve the clarity and scope of the manuscript.

      (4) I would like to see negative control calcium imaging and electrophysiology data with untransfected HEK cells to confirm that the observed activation is mediated by TRPM8 to parallel the TRPM8 KO sensory neuron experiments.  

      This important information is now included in the revised manuscript (Figure S2).

      (5) The DM-nitrophen Ca uncaging experiments are an interesting method to test Ca sensitivity of rapamycin, but the results make these experiments more complex to interpret. Ca has been shown to be an obligate cofactor for icilin sensitivity in TRPM8 under conditions where both the internal and external Ca concentrations are tightly controlled (Kuhn et al., 2009, doi: https://doi.org/10.1074/jbc.M806651200), which is necessary because TRPM8 allows Ca permeation through the pore when open. The large icilin-evoked currents in Figure 5A and 5B indicate that the effective intracellular calcium concentration is not zero prior to calcium uncaging, which may be high enough to mask any Ca-dependence of rapamycin that occurs at low Ca concentrations. Given this ambiguity, the inside-out patch clamp configuration would provide more control over the internal and external Ca concentration than is achieved in the Ca uncaging experiments. Because the authors have already demonstrated their ability to perform such experiments (Figure 2 panel B), it would be nice to see tests of Ca dependence using inside-out patch clamp.

      As already shown in Figure 2, rapamycin activates TRPM8 in inside-out patches, and these experiments were performed using calcium-free cytosolic and extracellular solutions. Note that earlier studies have already shown that icilin activates outward TRPM8 currents in the complete absence of calcium (see e.g. Janssens et al., eLife, 2016; Chuang et al., 2004). In the case of icilin, increased calcium further potentiates the current, an effect that is more prominent for the inward current.

      In the Ca uncaging experiments, considering the Kd of DM-nitrophen of 5 nM, we expect that the intracellular calcium concentration before the UV flash would be approximately 15 nM. Taken together, both the inside-out experiments and the flash uncaging experiments confirm that rapamycin responses are not directly regulated by intracellular calcium, contrary to icilin.
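
      The ~15 nM estimate is consistent with the standard buffer equilibrium, [Ca]free = Kd × [Ca-bound buffer] / [free buffer]. The following is a minimal sketch, assuming a hypothetical 75% calcium loading of DM-nitrophen; the pipette composition is not given in this response, so the function name and the loading value are illustrative only.

```python
def free_calcium_nM(kd_nM, fraction_loaded):
    """Free Ca2+ (nM) set by a buffer with dissociation constant kd_nM.

    fraction_loaded is the fraction of buffer molecules with Ca2+ bound,
    so bound/free buffer = fraction_loaded / (1 - fraction_loaded).
    """
    if not 0.0 <= fraction_loaded < 1.0:
        raise ValueError("fraction_loaded must be in [0, 1)")
    return kd_nM * fraction_loaded / (1.0 - fraction_loaded)


# Kd = 5 nM is the value quoted in the response; 75% loading is an assumption.
print(free_calcium_nM(5.0, 0.75))
```

      With these assumed numbers the bound-to-free buffer ratio is 3, which gives a resting free calcium of three times the Kd.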

      (6) Sequence conservation within TRPM channels could be used in combination with the binding pocket model and mutagenesis to predict rapamycin selectivity for TRPM8 over other TRPMs. For example, some important residues, specifically G805 and Q861, are not conserved in TRPM3, which agrees with the lack of rapamycin sensitivity observed in TRPM3 (Figure S1). Further sequence comparison would provide testable hypotheses for future exploration of rapamycin sensitivity in other TRPMs that could validate the proposed binding pocket.

      Thank you for the suggestion. We now indicate in the discussion that only some of the key residues are conserved and make suggestions for future studies.  

      (7) Please unify the color scheme across the figures to improve clarity.

      • The authors frequently use the colors blue, red, and green to represent menthol and rapamycin in the figures, but they are inconsistent in which one represents menthol and which represents rapamycin. It would be clearer for the audience if, for example, rapamycin is always represented with red and menthol is always represented with blue.  

      Thank you for pointing this out. We have made the coloring schemes more uniform.

      • In Figure 1, panel E, the coloring for Menthol and Pregnenolone Sulfate changes between the TRPM8+/+ and TRPM8-/- panels.  

      Thank you for pointing this out. We have updated the coloring schemes to ensure consistency between the TRPM8+/+ and TRPM8-/- panels.

      • Figure 3 B and E, perhaps color the plot background as a 3-color gradient (blue to white to red) rather than yellow and aqua. Center the white at the WT ratio, keeping the dashed line, with diverging gradients to, for example, blue for mutations that selectively affect menthol sensitivity and red for rapamycin.

      Thank you for the suggestion – we have changed the figure accordingly.  

      • Figure 4 panels A and B use the same color (green) to show two different things (menthol molecule and mutated residues that affect rapamycin sensitivity). It would be clearer for readers to change these colors to agree with a unified color scheme such that, for example, the menthol molecule is colored blue and the rapamycin-neighboring residues are colored red.

      Thank you for the suggestion. We have updated the figure to use a unified color scheme, with the menthol molecule now colored green and the rapamycin-neighboring residues colored cyan, to enhance clarity for readers.

      • I recommend adding a figure or panel that shows side chains for all mutations, colored by menthol/rapamycin selectivity, as indicated by the functional data in Figure 3B and 3E. This will highlight spatial patterns of the selective residues that are discussed in the text.

      Thank you for your suggestion. We now show the side chains of all mutated residues in Figure S10.

      Minor points

      (1) It would be nice to have one more concentration data point in the middle of the dose response curve shown in Figure 1 panel B. The response is not saturating at the top or foot of the curve in Figure 1 panel D, precluding a confident fit to a two-state Boltzmann function.

      Instead of adding a single data point to this figure, we performed independent measurements on a plate reader system, comparing concentration responses at room temperature and 37 °C. These data are now included as Figure S1.

      (2) The cartoon in Figure 2 panel B should be made more accurate. For example, only the transmembrane helices should be depicted embedded in the membrane, not the whole protein including the intracellular domain. Because the experiment was performed with cells, change the orientation of TRPM8 in the cartoon to show the intracellular domain of the protein facing away from the extracellular side of the membrane where the rapamycin is applied.

      Thank you for this comment. We have corrected the cartoon accordingly.

      (3) Perhaps put the yellow circles under or around the carbon atoms to which the identified hydrogen atoms belong in Figure 2 panel E and Figure 4 panel C. I found it difficult to visualize and compare the STTD NMR results with the predicted binding pocket.

      Thank you for the feedback. We have added yellow circles around the carbon atoms corresponding to the identified hydrogen atoms in Figure S9.  

      (4) Regarding the sentence on p. 12 beginning "In agreement with this notion..."

      • Include icilin, Cooling Agent-10, and WS-3 as other cooling agents whose sensitivity has been modulated by mutation of Y745

      • Cryosim-3 responses were not tested in either of the two papers cited; please add citation to Yin et al., 2022, doi: https://doi.org/10.1126/science.add1268 .

      • Other relevant papers include:

      – Malkia et al., 2009, doi: https://doi.org/10.1186/1744-8069-5-62 which includes molecular docking showing the hydroxyl group of menthol interacting with Y745

      – Beccari et al., 2017, doi: https://doi.org/10.1038/s41598-017-11194-0 Figure 5 shows disruption of icilin and Cooling Agent-10 sensitivity by Y745A

      – Palchevskyi et al., 2023, doi: https://doi.org/10.1038/s42003-023-05425-6 Figure 3 shows disruption of icilin, cooling agent-10, WS-3, and menthol sensitivity by Y745A

      – Plaza-Cayon et al., 2022, https://doi.org/10.1002%2Fmed.21920 Review of TRPM8 mutations

      • typo: Y754H should be Y745H

      Thank you for these suggestions. We have added the above references to the text and corrected the typo.

      (5) The authors use the competitive action of everolimus on rapamycin activation as evidence that the different macrolides are binding to the same binding pocket. In addition, prior work showed that Y745H and N799A mutations (which render TRPM8 insensitive to menthol and icilin, respectively) do not affect TRPM8 sensitivity to the structurally-related compound tacrolimus (Arcas et al., 2019). This is consistent with the docking and mutagenesis results presented here.

      Thank you for this valuable suggestion. We discuss these data in the revised version.

      (6) Rapamycin sensitivity has also been observed in TRPML1 (Zhang et al. 2019, doi: https://doi.org/10.1371/journal.pbio.3000252).

      We added a short reference to this interesting finding in the discussion.

      (7) The whole-cell currents are very large in several of the electrophysiology experiments (for example Figure 3 panel D and Figure S1), which could lead to artifacts of voltage errors as well as ion accumulation/depletion. However, because this paper is not relying on reversal potential measurements or trying to quantify V1/2, these errors are unlikely to affect the qualitative conclusions drawn.

      This is a fair point, but indeed unlikely to affect our main conclusions. Note that we compensated between 70 and 90% of the series resistance, so we don’t expect voltage errors exceeding ~10 mV.
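
      The residual voltage error can be estimated as I × Rs × (1 − compensated fraction). The following is a rough sketch, assuming hypothetical values of 6 nA peak current and 5 MΩ series resistance; neither value is reported in this response, and the function name is illustrative.

```python
def residual_voltage_error_mV(current_nA, series_resistance_MOhm, compensation):
    """Voltage error remaining after series-resistance compensation.

    nA x MOhm gives mV directly; `compensation` is the compensated
    fraction of the series resistance (0 to 1).
    """
    if not 0.0 <= compensation <= 1.0:
        raise ValueError("compensation must be between 0 and 1")
    return current_nA * series_resistance_MOhm * (1.0 - compensation)


# Hypothetical example values, not taken from the manuscript.
for comp in (0.7, 0.9):
    print(comp, residual_voltage_error_mV(6.0, 5.0, comp))
```

      Under these assumptions the error stays below 10 mV across the 70 to 90% compensation range; larger currents or higher series resistance would scale the error proportionally.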

      (8) Ligand sensitivity is frequently species-dependent in TRP channels, so it is interesting that multiple species were used here and that both human and mouse isoforms exhibit rapamycin sensitivity. It should be emphasized that human TRPM8 was used in the calcium imaging and electrophysiology experiments, as well as some docking models, while the mouse isoform was used in the sensory neuron experiments and a mutated avian isoform was used for some docking models.

      This information is available in the Methods and we believe it is clear for the readers.

      (9) Perhaps discuss the unclear mechanism of G805A action in icilin (but not menthol, cold, or praziquantel) sensitivity because it is not in direct contact with the ligand. For example, Yin et al., 2019 propose flexibility allowing Ca binding site and larger binding site for icilin.

      Yin et al. (2019) suggest that the G805A mutation impacts icilin sensitivity by influencing the flexibility of the binding site and possibly affecting calcium binding. In our study, we found that G805A significantly reduces rapamycin sensitivity, likely due to its direct role in the rapamycin binding pocket rather than an effect on calcium binding. This is now briefly mentioned in the results section.

      (10) The Figure S1 legend indicates that n=5 for all panels, so please show normalized population IV curves rather than individual examples. Additionally, it would be interesting to see what happens when each agonist is co-applied with rapamycin. Does rapamycin potentiate or inhibit agonist activation in these channels and/or TRPM8?

      We believe that normalized population IVs are not ideal for representing whole-cell currents, considering the substantial variation in current densities. We therefore prefer to show example traces in Figure S3 of the revised version but include mean values of current densities for all tested cells in the text.

      While the effects of co-application of rapamycin with activating ligands could be of interest, we consider this somewhat outside the scope of the present manuscript. The combination of HEK293 cell experiments, along with results obtained in WT and TRPM8-deficient mice does, in our opinion, sufficiently describe the selectivity of rapamycin towards TRPM8 compared to other sensory TRP channels.

      (11) Figure S1 panel A does not contain units for Rapamycin or AITC concentrations.

      Thank you for pointing this out. The units were added to the figure.  

      (12) It would be nice if the authors characterized the different mutations as predicted to contribute to site 1 (D796, H845, Q861, based on Figure S4), site 2 (D796, M801, F847, and R851), and/or site 3 (F847, V849, and R851).

      The indicated mutants were all tested, as shown in Figure 3.

      (13) The numbering scheme in Figure S4 does not appear to match the residue numbers in the rest of the paper for certain residues (HIS-844 rather than H845, PHE-846 rather than F847, VAL-848 rather than V849, ARG-850 rather than R851, and GLN-860 rather than Q861), and labels are often overlapping and difficult to see. I also find the transparent spheres very difficult to distinguish from the transparent background, which makes it difficult to appreciate the STTD NMR data overlay.

      We apologize for the confusing numbering scheme. The lower numbers refer to the initial docking that was done using the avian TRPM8 ortholog. We have made a newer, clearer version of Figure S4 and inserted it as Figure S9.

      (14) Please superpose the Ligplots in Figure S5 panels E and F as described in the LigPlus manual (https://www.ebi.ac.uk/thornton-srv/software/LigPlus/manual/manual.html) to facilitate easier comparison.

      Thank you for the suggestion. We followed the suggestion to superpose the Ligplots as described but found that the result was visually cluttered and difficult to interpret. To avoid confusion, we instead decided to remove panels E and F from Figure S5, as we believe that the visualization in panels A-D is clear and informative.

      (15) Some n values are missing in figure legends.

      We checked all legends and added n numbers where they were missing.

      (16) There is an inconsistent specification of error bars as SEM in the figure legends, though it is specified in methods.

      A question for my own edification: Here, you have looked at ligand interactions with the protein by saturating the protein resonances and observing transfer to the ligand. Would it be possible to instead saturate lipid or solute resonances and observe transfer to a ligand? I am curious whether this would be one way to measure equilibrium partitioning of ligand into a membrane and/or determine the effective concentration of a ligand in the membrane. Additionally, could one determine whether the compound is fully partitioned into the center of the membrane or just sitting on the surface?

      The reviewer highlights an interesting aspect. The widely used WaterLOGSY NMR experiment (doi: 10.1023/a:1013302231549) saturates water molecules, after which the magnetization is transferred to the ligand of interest. Characteristic changes in ligand resonances are observed in the case of a binding event with proteins. On the other hand, the selective saturation of lipids is, while theoretically possible, technically challenging, mainly because of the inherently low signal dispersion of lipids and peak overlap with ligand resonances. Additionally, lipid systems are more dynamic than proteins, and ligand-lipid interactions could be weaker and less specific, significantly affecting the sensitivity of STD experiments.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      • Is it feasible to test rapamycin on TRPM8 with single-channel recording? This will allow us to better probe the mechanism of rapamycin activation and compare it with menthol, with parameters of single-channel conductance and maximal open probability.

      In our experience, it is very difficult to obtain single-channel recordings from TRPM8. The channel expresses at high densities, typically leading to patches that contain multiple channels, which makes a proper analysis of mean open and closed times very difficult. Therefore, we have decided not to include such measurements in the manuscript.

      • The authors classified rapamycin as a type I agonist, the type that stabilizes the open conformation, same as menthol but more prominent. Does that indicate that rapamycin works synergistically (rather than independently) with menthol, because co-application would allow their effects on stabilizing the open conformation to add up? I wonder if the authors agree that this could be tested with experiments as in Figure S3, by showing a much more prolonged deactivation with co-application of menthol and rapamycin than with either applied alone.

      Thank you for the insightful suggestion. We conducted co-application experiments, and our results show that the deactivation time is indeed significantly prolonged when both compounds are applied together compared to each alone. In fact, very little deactivation is seen when both compounds are co-applied, which made it virtually impossible to perform reliable fits to the deactivation time course for the Menthol+Rapamycin condition. Instead, we have now included summary results showing the percentage of deactivation after 100 ms. We included these findings in Figure S8.
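
      For a tail current that decays monoexponentially to zero with time constant τ, the fraction deactivated after time t is 1 − exp(−t/τ). The following is a minimal sketch of how a percentage of deactivation at 100 ms might be computed. The exact definition used for the summary figure is not spelled out in this response, so both functions and all example values are assumptions.

```python
import math


def percent_deactivation_from_trace(i_start, i_at_t):
    """Percentage of the initial tail current that has decayed at time t.

    Assumes the current relaxes towards a zero baseline.
    """
    if i_start == 0:
        raise ValueError("initial tail current must be non-zero")
    return 100.0 * (1.0 - i_at_t / i_start)


def percent_deactivation_monoexp(tau_ms, t_ms=100.0):
    """Expected percentage decayed at t_ms for a single-exponential decay."""
    if tau_ms <= 0:
        raise ValueError("tau_ms must be positive")
    return 100.0 * (1.0 - math.exp(-t_ms / tau_ms))


# Hypothetical tail currents (nA) at the start of the step and 100 ms later.
print(percent_deactivation_from_trace(-2.0, -0.5))
# A slow hypothetical time constant leaves most of the current after 100 ms.
print(percent_deactivation_monoexp(tau_ms=1000.0))
```

      A measure of this kind remains well defined when deactivation is too slow or incomplete for a reliable exponential fit.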

      • It could be tested whether rapamycin activation of TRPM8 requires or overrides the requirement of PIP2 with inside-out patch by briefly exposing the patch to poly-lysine to sequester PIP2.

      This is certainly a good suggestion for further follow-up studies. However, we considered that examination of the (potential) interaction between ligands and PIP2 was outside the scope of the current manuscript.

      • Figure 1C suggests that the authors test rapamycin when there is a relatively high baseline TRPM8 activation (prior to rapamycin). This raises the possibility that rapamycin is more a potentiator than an activator. I wonder if the following two experiments could address it: (1) perfuse rapamycin while holding at different membrane potentials, wash off rapamycin in the solution and quickly (in a few seconds) test the activated current magnitude (before rapamycin dissociation), to compare whether a more depolarized membrane potential (high baseline open probability) allows rapamycin to potentiate more. (2) Perform the experiment at a higher temperature (low baseline open probability) and test whether rapamycin EC50 shifts to the right.

      Thank you for the thoughtful suggestion. Overall, we are not really in favor of making a distinction between a potentiator and an activator, since it is not really feasible to create a situation where TRPM8 activity is zero. As suggested, we performed the dose-response experiment at a higher temperature (37 °C) and observed that rapamycin’s EC<sub>50</sub> shifts to the right (Figure S2). This is similar to what has been observed for menthol on TRPM8 and for many other ligands on other temperature-sensitive TRP channels.

      Minor:

      (1) The authors should report the Hill coefficient together with the EC50 when showing dose-responses.

      We have added Hill coefficients for all the fits.
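
      For reference, the Hill equation relates the response to concentration through the EC50 and the Hill coefficient n: response = Emax / (1 + (EC50 / c)^n). The following is a minimal sketch with hypothetical parameter values; it is not the fitting procedure used in the manuscript.

```python
def hill_response(conc, ec50, hill_n, e_max=1.0):
    """Response predicted by the Hill equation at concentration `conc`.

    `ec50` sets the midpoint of the curve and `hill_n` its steepness.
    """
    if conc < 0 or ec50 <= 0:
        raise ValueError("conc must be >= 0 and ec50 must be > 0")
    if conc == 0:
        return 0.0
    return e_max / (1.0 + (ec50 / conc) ** hill_n)


# Hypothetical parameters: the response is half-maximal at conc == ec50
# regardless of the Hill coefficient, which only changes the steepness.
for n in (1.0, 2.0):
    print(n, hill_response(10.0, ec50=10.0, hill_n=n))
```

      Reporting n alongside the EC50 matters because two curves with the same EC50 can differ substantially in steepness.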

      (2) In Figure 1 (E, F), it might be clearer to use a Venn diagram to show whether there is overlap among rapamycin-, menthol-, and cinnamaldehyde-responsive neurons. According to the authors' explanation, we can predict that rapamycin-insensitive, menthol-sensitive neurons should predominantly be cinnamaldehyde-responsive.

      Thank you for your suggestion. In these experiments, we applied several agonists, and combining them would result in a visually crowded Venn diagram that is difficult to interpret. However, we agree with the reviewer’s suggestion and now discuss the percentage of cinnamaldehyde+ neurons in the rapa- menthol+ population in Trpm8<sup>-/-</sup> neurons.

      (3) In Figure 3(C), since F847 does not respond to either menthol or rapamycin, it should be excluded from (B). Otherwise it is misleading.

      Thank you for pointing this out. To clarify, we have included a calcium imaging trace for the F847 mutant, demonstrating a clear response to rapamycin, in Figure S9. These additional data show that F847 does respond to rapamycin, albeit with a more modest response amplitude. This is now also clarified in the results section.

      (4) The word "potency" in pharmacology usually refers to a smaller EC50 number in dose-dependent experiments. In the "Effect of rapamycin analogs on TRPM8" section, the authors use "potency" to refer to the response to a single dose of different compounds. This experiment does not measure potency.

      Thank you for pointing out this mistake. We have corrected the text and replaced “potency” with “efficacy”.

      (5)  "2-methoxyl-" is misspelled in the text body.

      We have corrected the typo.

      (6) It would be nice to include "vehicle" in Figure 6B, or alternatively to normalize all individual traces to vehicle. In Figure 6C and D, everolimus has almost no effect compared to vehicle, and should not be shown as if it had ~8% in Figure 6B.

      We have added the vehicle values to Figure 6B from the same experiments.

      Reviewer #3 (Recommendations For The Authors):

      (1) The NMR method presented here as novel and employed to identify a proposed molecule bound to a membrane protein (TRPM8 in this case) is not well explained and presented. Since several spectra need to be subtracted, the authors should present the raw data and the results of the subtractions step by step. Also, it seems that the height of the peaks in each spectrum will be highly variable, and thus a reliable criterion should be employed to scale spectra before subtraction. None of these problems are discussed or described.

      The reviewer is right that data transparency should be improved and that, owing to the high molecular complexity of the samples, the STD effects should be carefully scaled. We carried out additional experimental replicates on new samples and addressed the inherent sample/peak height variability by rescaling the STD effects based on reference <sup>1</sup>H measurements. We provided supplementary spectra of the reference experiments without saturation (Figure S5) and the computed STTD spectra from three parallel NMR sessions (Figure S6). We changed panel C of Figure 2 in the main text and now provide all the STD spectra and the computed STDD and STTD spectra recorded in one set of NMR experiments. We added the following paragraph to the main text: “To address the effect of the inherent variability of cellular samples on peak heights, STD effects were normalized based on the comparison of independent <sup>1</sup>H experiments (Figure S5). Three STTD replicates were computed, unambiguously confirming direct binding to TRPM8 in two datasets (Figure S6 A,B)”.

      Importantly since this signal subtraction method is proposed as a new development, control experiments employing well-established pairs of ligand and membrane protein receptor should be performed to demonstrate the reliability of the method.

      We agree with the reviewer that the STTD experiment, as a new development, needs further validation. However, this paper is a preliminary demonstration of a new strategy building on the well-established STD and STDD NMR methodologies. Our group is actively engaged in studying additional biological samples to enhance our understanding of the applicability of STTD NMR. These efforts also aim to address challenges such as sample and spectral complexity by refining and standardizing the proposed workflow.

      (2) The tail currents shown in supplementary figure 3 are clearly not monoexponential. The fit to a single exponential can be seen to be inadequate and thus the comparison of kinetics of control, rapamycin and menthol is incorrect. At least two exponentials should be fitted and their values compared.

      We agree that the decay in the (combined) presence of agonists deviates from a simple monoexponential behavior. While we agree that fitting with two (or more) exponentials would provide a better fit, this also comes with greater variations/uncertainties in the fit parameters. This is particularly the case when inactivation is very slow and incomplete, or when the difference between slow and fast exponential time constants is <5, as seen with rapamycin and rapamycin +menthol. Therefore, we decided to provide monoexponential time constants as a proxy to describe the clear slowing down of activation and deactivation time courses in the presence of Type I agonists.   

      Also related to this aspect, recordings of TRPM8 currents cannot be leak subtracted with a p/n protocol, thus a large fraction of the initial tail current must be the capacitive transient. There is no indication in the methods of how this was dealt with for the fitting of tail currents.

      As explained in the methods, capacitive transients and series resistance were maximally compensated. Therefore, we do not agree that a large fraction of the initial tail current must be capacitive. This can also be clearly seen in experiments such as that in Figure 1C, where the inward tail current is fully abolished in the presence of a TRPM8 antagonist. Likewise, very small and rapidly inactivating tail currents can be seen during voltage steps under control conditions (e.g. Figures S7 and S8 in the revised version).

      (3) The docking procedure employed, as the authors show, is not appropriate for membrane proteins since it does not include a lipid membrane. It is not clear in the methods section if the MD minimization described applies only to the rapamycin molecule or to rapamycin bound to TRPM8.  

      It is also not clear if the important residue Q861 (and other residues that are identified as interacting with rapamycin) were identified from dockings or proposed based on other evidence.

      (4) Identifying amino acid residues that diminish the response to a ligand does not uniquely imply that they form a binding site or even interact with said ligand. It is entirely possible that they are part of the allosteric networks involved in the activating conformational change. This caveat should be clearly posited by the authors when discussing their results.

      In our study, we identified several residues that significantly reduce the response to rapamycin when mutated, while retaining robust responses to menthol, which indicates that these mutations do not affect crucial conformational changes leading to channel gating. While our cumulative data suggest that these residues may be involved in direct interaction with rapamycin, we recognize the alternative possibility that they allosterically affect rapamycin-induced channel gating. This is now clearly stated in the first paragraph of the discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      • While the title is fair with respect to the data shown, in the summary and the rest of the paper, the comparison between anesthetized and awake conditions is systematically stated, while more caution should be used.

      First, isoflurane is one of the (many) anesthetics commonly used in pre-clinical research, and its effect on the brain vasculature cannot be generalized to all the anesthetics. Indeed, other anesthesia approaches do not produce evident vasodilation; see ketamine + medetomidine mixtures. Second, the imaged awake state is head-fixed and body-constrained in mice. A condition that can generate substantial stress in the animals. In this study, there is no evaluation of the stress level of the mice. In addition, the awake imaging sessions were performed a few minutes after the mouse woke up from isoflurane induction, which is necessary to inject the MB bolus. It is known that the vasodilator effects of isoflurane last a long time after its withdrawal. This aspect would have influenced the results, eventually underestimating the difference with respect to the awake state.

      These limitations should be clearly described in the Discussion.

      Looking at Figure 2e, it takes more than 5' to reach the 5 million MB count useful for good imaging. However, the MB count per pixel drops to a few % at that time. This information tells me that (i) repeated measurements are feasible but with limited brain coverage, since a single 'wake up' is needed to acquire a single brain section, and (ii) this approach cannot fit the requirements of functional ULM, which requires merging the responses to multiple stimuli to get a complete functional image. Of course, a chronic i.v. catheter would fix the issue, but this configuration is not trivial to test in the experimental setup proposed by the authors, hindering the extension of the approach to fULM.

      Thank you for highlighting these limitations, as they address aspects that were not fully considered during the experimental design and manuscript writing. In response, we have added the following paragraphs to the discussion section, addressing these limitations of our study:

      (Line 310) “Although isoflurane is widely used in ultrasound imaging because it provides long-lasting and stable anesthetic effects, it is important to note that the vasodilation observed with isoflurane is not representative of all anesthetics. Some anesthesia protocols, such as ketamine combined with medetomidine, do not produce significant vasodilation and are therefore preferred in experiments where vascular stability is essential, such as functional ultrasound imaging(47). Therefore, in future studies, it would be valuable to design more rigorous control experiments with larger sample sizes to systematically compare the effects of isoflurane anesthesia, awake states, and other anesthetics that do not induce vasodilation on cerebral blood flow.

      Our proposed method enabled repeatable longitudinal brain imaging over a three-week period, addressing a key limitation of conventional ULM imaging and offering potential for various preclinical applications. However, there are still some limitations in this study. 

      One of the limitations is the lack of objective measures to assess the effectiveness of head-fix habituation in reducing anxiety. This may introduce variability in stress levels among mice. Recent studies suggest that tracking physiological parameters such as heart rate, respiratory rate, and corticosterone levels during habituation can confirm that mice reach a low stress state prior to imaging(48). This approach would be highly beneficial for future awake imaging studies. Furthermore, alternative head-fixation setups, such as air-floated balls or treadmills, which allow the free movement of limbs, have been shown to reduce anxiety and facilitate natural behaviors during imaging(30). Adopting these approaches in future studies could enhance the reliability of awake imaging data by minimizing stress-related confounds.

      Another limitation of this study is the potential residual vasodilatory effect of isoflurane anesthesia on awake imaging sessions. The awake imaging sessions were conducted shortly after the mice had emerged from isoflurane anesthesia, which was required for the MB bolus injections. The lasting vasodilatory effects of isoflurane may have influenced vascular responses, potentially contributing to an underestimation of differences in vascular dynamics between the anesthetized and awake states. For future applications of awake ULM in functional imaging, an indwelling jugular vein catheter presents a promising alternative to enable more accurate functional imaging in awake animals, addressing current limitations associated with anesthesia-induced vascular effects.”

      • Statistics are often poor or not properly described. 

      The legend and the text referring to Figure 2 do not report any indication of the number of animals analyzed. I assume it is only one, which makes the findings strongly dependent on the imaging quality of THAT mouse in THAT experiment. Three mice have been displayed in Figure 3, as reported in the text, but it is not clear whether it is a mouse for each shown brain section. Figure 5 reports quantitative data on blood vessels in awake VS isoflurane states but: no indication about the number of tested mice is provided, nor the number of measured blood vessels per type and if statistics have been done on mice or with a multivariate method.

      Also, a T-test is inappropriate when the goal is to compare different brain regions and blood vessel types.

      Similar issues partially apply to Figure 6, too.

      Thank you for bringing this to our attention. 

      We acknowledge that the statistical analyses were not clearly explained in the original version. In the revised manuscript, we have ensured that the statistical methods are clearly described. 

      (Fig.4 caption) “b,c, Comparisons of vessel diameter (b) and flow velocity (c) for the selected arterial and venous segments. Statistical analysis was conducted using t-test at each measurement point along the segments.”

      (Fig.6 caption) “b,c, Comparisons of vessel diameter (b) and flow velocity (c) for the selected arterial and venous segments. Statistical analysis was conducted using the two one-sided test (TOST) procedure, which evaluates the null hypothesis that the difference between the two weeks is larger than three times the standard deviation of one week.”
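
      For readers unfamiliar with TOST, the logic described in the Fig. 6 caption can be sketched as below. This is an illustration only, not the analysis code used in the study. The function name `tost_equivalence`, the independent-samples standard error, and the normal (z) approximation are all assumptions made to keep the sketch dependency-free; for small samples a t-distribution would be the more appropriate reference distribution. The equivalence margin is three times the standard deviation of the first week, as stated in the caption.

```python
import math
from statistics import NormalDist, fmean, stdev


def tost_equivalence(week_a, week_b, margin_sd_multiplier=3.0, alpha=0.05):
    """Two one-sided tests for equivalence of two sets of measurements.

    Assumptions (not taken from the manuscript): samples are treated as
    independent, and a normal approximation replaces the t-distribution.
    Returns (p_value, is_equivalent).
    """
    mean_diff = fmean(week_b) - fmean(week_a)
    # Equivalence margin: a multiple of the standard deviation of one week.
    margin = margin_sd_multiplier * stdev(week_a)
    # Standard error of the difference between the two means.
    se = math.sqrt(stdev(week_a) ** 2 / len(week_a)
                   + stdev(week_b) ** 2 / len(week_b))
    if se == 0:
        raise ValueError("standard error is zero; cannot run the test")

    norm = NormalDist()
    # Test 1, null hypothesis: difference <= -margin.
    p_lower = 1.0 - norm.cdf((mean_diff + margin) / se)
    # Test 2, null hypothesis: difference >= +margin.
    p_upper = norm.cdf((mean_diff - margin) / se)

    # Equivalence is claimed only if BOTH one-sided nulls are rejected.
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha
```

      Rejecting both one-sided null hypotheses supports the claim that the week-to-week difference lies inside the margin, which is the opposite burden of proof from an ordinary t-test.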

      Additionally, we corrected an error in the previous comparison of the violin plots on flow velocities, where a t-test was incorrectly applied; this has now been removed.

      We acknowledge that the original version did not clearly indicate the number of animals used in the statistical analysis. In the revised manuscript, we have added Supplementary Figure 1 to specify the mice used, and we have labeled each mouse accordingly in the figures or captions. In the revised Figures 4 and 6, we have ensured that each quantitative analysis figure or its caption clearly indicates the specific mice.

      The original Figures 1 and 2 are presented as case studies to illustrate the methodology. Because the anesthesia time required for tail vein injection varies slightly from animal to animal, the time each mouse takes to recover from anesthesia cannot be held constant across all mice. For instance, in Figure 1, the mouse took nearly 500 seconds to recover from anesthesia, but this duration is not consistent across all animals, which is a limitation of the bolus injection technique. We have noted this point in the discussion (discussion on the limitation of bolus injection), and we have also clarified in the results section and figure captions that these figures represent a case study of a single mouse rather than a standardized recovery time for all animals.

      We further clarified this point at the end of the Figure 2 caption:

      (Fig.2 caption) “This figure presents a case study based on the same mouse shown in Fig 1. The x-axis for d-f begins at 500 seconds because, at this point, the mouse’s pupil size stabilized, indicating it had recovered to an awake state. Consequently, ULM images were accumulated starting from this time. It is important to note that not every mouse requires 500 seconds to fully awaken; the time to reach a stable awake state varies across individual mice.”

      We added the following statement before introducing Figure 1e:

      (Line 93) “Due to differences in tail vein injection timing and anesthesia depth, the time required for each mouse to fully awaken varied. Although it was not feasible to get pupil size stabilized just after 500 seconds for each animal, ULM reconstruction only used the data that acquired after the animal reached full pupillary dilation, to ensure that ULM accurately captures the cerebrovascular characteristics in the awake state.”

      We added the following statement before introducing Figure 2d:

      (Line 139) “To further verify that the proposed MB bolus injection method can help to achieve ULM image saturation shortly after mice awaken from anesthesia, an analysis on the change in MB concentration over time was conducted once pupil size had stabilized (T = 500s).”

      For Figures 3, 4, and 5 (in the revised version, Figures 4 and 5 have been combined into a single Figure 4), the data represents results from three individual mice, with each coronal plane corresponding to a different mouse. In the revised version, we have added labels to indicate the specific mouse in each image to improve clarity. We also recognize that some analyses in the original submission (original Figure 5) may have lacked sufficient statistical power due to the small sample size. Therefore, in the revised version, we have focused only on findings that were consistently observed across the three mice to ensure robust conclusions.

      Reviewer 1 (Recommendations For the Authors):

      • If the study's main goal is to compare awake vs anesthetized ULM, the authors should test at least another anesthetic with no evident vasodilator effect.

      Thank you for this valuable suggestion. We would like to clarify that the primary aim of our study is not to comprehensively compare the effects of anesthesia versus the awake state, as a rigorous comparison would indeed require a more controlled experimental design, including additional anesthetics, a larger cohort of mice, and broader controls to ensure sufficient statistical power. We have also added the following statement in the Discussion to clarify this point:

      (Line 314) “Therefore, in future studies, it would be valuable to design more rigorous control experiments with larger sample sizes to systematically compare the effects of isoflurane anesthesia, awake states, and other anesthetics that do not induce vasodilation on cerebral blood flow.”

      We acknowledge that the initial organization of Figures 3–5 placed excessive emphasis on comparisons between the awake and anesthetized states, but without yielding consistently significant findings. Meanwhile, our longitudinal observations in original Figure 6 were underrepresented, despite their potential importance.

      In the revised version, we shifted our focus toward the main goal of awake longitudinal imaging. By consolidating the previous Figures 4 and 5 into the new Figure 4, we emphasize conclusions that are both more consistent and broadly applicable, avoiding areas that may lack sufficient rigor or consensus. Additionally, we expanded the quantitative analysis related to longitudinal imaging, highlighting its role as the ultimate objective of this study. The awake vs. anesthetized ULM comparison was intended to demonstrate the value of awake imaging and introduce the importance of awake longitudinal imaging. In the revised text, we have reframed this comparison to emphasize the specific response to isoflurane rather than a general response to anesthesia. For example, in Figures 3 and 4, we have replaced the original term "Anesthetized" with "Isoflurane". We have also added a discussion noting that isoflurane may induce more vasodilation than other anesthetic agents.

      (Line 310) “Although isoflurane is widely used in ultrasound imaging because it provides long-lasting and stable anesthetic effects, it is important to note that the vasodilation observed with isoflurane is not representative of all anesthetics. Some anesthesia protocols, such as ketamine combined with medetomidine, do not produce significant vasodilation and are therefore preferred in experiments where vascular stability is essential, such as functional ultrasound imaging(47).”

      • The claims made about the proposed experimental protocol to be suitable for the "long-term" (line 255) are not supported by the data and should be modified according to the presented evidence.

      Thank you for your valuable feedback. We agree that our current three-week experimental results do not yet fulfill the requirements for extended longitudinal imaging that may span several months. We have revised the relevant text accordingly. For instance, the phrase “Our proposed method enabled long-term, repeatable longitudinal brain imaging” has been modified to “Our proposed method enabled repeatable longitudinal brain imaging over a three-week period.” (Similar changes were also made at Line 67, Line 318, and Line 337.) Additionally, we have added the following paragraph in the discussion section to indicate that extending the monitoring period to several months is a meaningful direction for future exploration:

      (Line 337) “In our longitudinal study, consistent imaging results were obtained over a three-week period, demonstrating the feasibility of awake ULM imaging for this duration. However, for certain research applications, a monitoring period of several months would be valuable. Extending the duration of longitudinal awake ULM imaging to enable such long-term studies is a potential direction for future development.”

      Recommendations for improving the writing and presentation:

      • Reporting the number of mice and blood vessels and statistics for each quantitative figure.

      Thank you for highlighting this issue. We acknowledge that the quantitative figures in the previous version lacked clarity in specifying the number of mice, vessels, and associated statistics. In the revised version, we have ensured that each quantitative figure or its caption clearly indicates the specific mice, vessels, and statistical methods used. To further minimize any potential confusion, we have also added Supplementary Figure 1 to clearly label and reference each individual mouse included in the study.

      Minor corrections to the text and figures.

      • Line 22: "vascularity reduction from anesthesia" is not clear, nor it is a codified property of brain vasculature. Explain or rephrase.

      Thank you for your comment. We apologize for any confusion caused by the phrase “vascularity reduction from anesthesia” in the abstract. We agree that this phrasing was unclear without context. To improve clarity, we have revised this statement in the abstract to make it more straightforward and easier to understand. 

      (Line 24) “Vasodilation induced by isoflurane was observed by ULM. Upon recovery to the awake state, reductions in vessel density and flow velocity were observed across different brain regions.” 

      Additionally, we have added a section in the Methods titled Quantitative Analysis of ULM Images to provide a clear definition of vascularity. This section outlines how vascularity is quantified in our study, ensuring that our terminology is well-defined. 

      The following sentence shows the definition of vascularity:

      (Line 547) “Vascularity was defined as the proportion of the pixel count occupied by blood vessels within each ROI, obtained by binarizing the ULM vessel density maps and calculating the percentage of the pixels with MB signal.”
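
      The definition quoted above amounts to a binarize-and-count operation, which can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's analysis code: the function name `vascularity_percent`, the list-of-lists image representation, and the default threshold of zero (any pixel with at least some MB signal counts as vessel) are all hypothetical choices.

```python
def vascularity_percent(density_map, roi_mask, threshold=0.0):
    """Percentage of ROI pixels whose MB density exceeds `threshold`.

    density_map: 2D list of MB density values (one value per pixel).
    roi_mask:    2D list of truthy/falsy flags marking the ROI.
    threshold:   binarization threshold (assumed; 0 means "any MB signal").
    """
    roi_pixels = 0
    vessel_pixels = 0
    for density_row, roi_row in zip(density_map, roi_mask):
        for density, in_roi in zip(density_row, roi_row):
            if not in_roi:
                continue  # pixel lies outside the region of interest
            roi_pixels += 1
            if density > threshold:  # binarization step
                vessel_pixels += 1
    if roi_pixels == 0:
        raise ValueError("ROI mask selects no pixels")
    return 100.0 * vessel_pixels / roi_pixels
```

      Because the result is a fraction of the ROI area, it is comparable across ROIs of different sizes, although it still depends on the chosen binarization threshold.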

      We have also added a brief definition where the term is first used in the Results section:

      (Line 161) “When comparing vessel density maps, ULM images that are acquired in the awake state demonstrate a global reduction of vascularity, which refers to percentage of pixels that occupied by blood vessels.”

      • Line 76: putting the mice in a tube is also intended "To further reduce animal anxiety and minimize tissue motion" I agree with tissue motion, not with animal anxiety, which, indeed, I expect to be higher than if it could, for example, run on a ball or a treadmill.

      Thank you for pointing this out. We acknowledge the limitations of our setup regarding reducing animal anxiety. We have replaced the original phrase “to further reduce animal anxiety and minimize tissue motion” with “to further minimize tissue motion.” (Line 78) Additionally, we have added the following paragraph in the Discussion section to address the limitations of our setup in reducing anxiety.

      (Line 321) “One of the limitations is the lack of objective measures to assess the effectiveness of head-fix habituation in reducing anxiety. This may introduce variability in stress levels among mice. Recent studies suggest that tracking physiological parameters such as heart rate, respiratory rate, and corticosterone levels during habituation can confirm that mice reach a low stress state prior to imaging(48). This approach would be highly beneficial for future awake imaging studies. Furthermore, alternative head-fixation setups, such as air-floated balls or treadmills, which allow the free movement of limbs, have been shown to reduce anxiety and facilitate natural behaviors during imaging(30). Adopting these approaches in future studies could enhance the reliability of awake imaging data by minimizing stress-related confounds.”

      • Line 79: PMP has been used by Sieu et al., Nat Methods, 2015; it should be acknowledged.

      Thank you for highlighting this. We have now included the reference to Sieu et al. Nat Methods, 2015 to appropriately acknowledge their use of PMP. (Line 81)

      • Figure: is there a reason why the plots start at 500 sec? What happened before that time?

      Thank you for your question regarding the starting time in the plots. Figures 1 and 2 are case studies using a single mouse to demonstrate the feasibility of our method. The “zero” timepoint was defined as the moment when anesthesia was stopped, and the microbubble injection began. However, the mouse does not fully recover immediately after anesthesia is stopped. As shown in Figure 1e, there is a period of approximately 500 seconds during which the pupil gradually dilates, indicating recovery. Only after this period does the mouse reach a relatively stable physiological state suitable for ULM imaging, which is why the plots in Figure 2 begin at T = 500 seconds.

      We recognize that this was not sufficiently explained in the main text and figure captions. In the revised manuscript, we have clarified this timing rationale in both the results section and the figure captions. We added the following sentence to the Results section to introduce Fig.2d:

      (Line 139) “To further verify that the proposed MB bolus injection method can help to achieve ULM image saturation shortly after mice awaken from anesthesia, an analysis on the change in MB concentration over time was conducted once pupil size had stabilized (T = 500s).”

      We also added the following statement to note that this recovery time varies across individual mice:

      (Line 154, Fig.2 caption) “This figure presents a case study based on the same mouse shown in Fig 1. The x-axis for d-f begins at 500 seconds because, at this point, the mouse’s pupil size stabilized, indicating it had recovered to an awake state. Consequently, ULM images were accumulated starting from this time. It is important to note that not every mouse requires 500 seconds to fully awaken; the time to reach a stable awake state varies across individual mice.”

      Reviewer 2 (Public Review):

      • The only major comment (calling for further work) I would like to make is the relative weakness of the manuscript regarding longitudinal imaging (mostly Figure 6), compared to the exhaustive review of the effect of isoflurane on the vasculature (3 rats, 3 imaging planes, quantification on a large number of vessels, in 9 different brain regions). The 6 cortical vessels evaluated in Figure 6 feel really disappointing. As longitudinal imaging is supposed to be the salient element of this manuscript (first word appearing in the title), it should be as good and trustworthy as the first part of the paper. Figure 6c. is of major importance, and should be supported by a more extensive vessel analysis, including various brain areas, and validated on several animals to validate the robustness of longitudinal positioning with several instances of the surgical procedure. Figure 6d estimates the reliability of flow measurements on 3 vessels only. Therefore I recommend showing something similar to what is done in Figures 4 and 5: 3 animals, and more extensive quantification in different brain regions.

      We thank the reviewer for pointing out this issue. We acknowledge that the first version of the manuscript lacked in-depth quantitative analysis in the section on the longitudinal study, which should have been a focal point. It also did not provide a sufficient number of animals to demonstrate the reproducibility of the technique. In this revised version, we have included results from more animals and conducted a more comprehensive quantitative analysis, with the corresponding text updated accordingly. Specifically, we combined the previous Figures 4 and 5 into the current Figure 4 (corresponding revised text from Line 169 to Line 207). The revised Figures 5 and 6 compare the results of the longitudinal study, presenting data from three mice (corresponding revised text from Line 224 to Line 258). Detailed information about the mice used has been added to Supplementary Figure 1, and Supplementary Figure 4 further provides a detailed display of the results for the three mice in the longitudinal study. We hope that these adjustments will provide a more thorough validation of the longitudinal imaging.

      Reviewer 2 (Recommendations For The Authors):

      Minor comments:

      • The statistical analyses are not always explained: could they be stated briefly in the legends of each figure, or gathered in a statistical methods section with details for each figure? Be sure to use the appropriate test (e.g. student t-test is used in Fig 5 k whereas normality of distribution is not guaranteed.)

      Thank you for pointing this out. We acknowledge that the statistical analyses were not clearly explained in the original version. In the revised manuscript, we have ensured that the statistical methods are clearly described. 

      (Fig.4 caption) “b,c, Comparisons of vessel diameter (b) and flow velocity (c) for the selected arterial and venous segments. Statistical analysis was conducted using t-test at each measurement point along the segments.”

      (Fig.6 caption) “b,c, Comparisons of vessel diameter (b) and flow velocity (c) for the selected arterial and venous segments. Statistical analysis was conducted using the two one-sided test (TOST) procedure, which evaluates the null hypothesis that the difference between the two weeks is larger than three times the standard deviation of one week.”

      Additionally, we corrected an error in the previous comparison of the violin plots on flow velocities, where a t-test was incorrectly applied; this has now been removed.

      • The authors use early in the manuscript the term vascularity, e.g. in "vascularity reduction", it is not exactly clear what they mean by vascularity, and would require a proper definition at that moment. If I am correct, a quantification of that "vascularity reduction" (page 5 line 132), is then done in Figures 5 d e f and j.

      Thank you for highlighting this issue. We acknowledge that our initial use of the term “vascularity” may have been unclear and potentially confusing. In the revised manuscript, we have included a clear definition of “vascularity” in the Methods section under Quantitative Analysis of ULM Images (Line 534). 

      The following sentence shows the definition of vascularity:

      (Line 547) “Vascularity was defined as the proportion of the pixel count occupied by blood vessels within each ROI, obtained by binarizing the ULM vessel density maps and calculating the percentage of the pixels with MB signal.”

      We have also added a brief definition where the term is first used in the Results section:

      (Line 161) “When comparing vessel density maps, ULM images that are acquired in the awake state demonstrate a global reduction of vascularity, which refers to percentage of pixels that occupied by blood vessels.”

      • There is very little motion in the images presented, except for the awake "Bregma -4.2 mm" (Figure 3, directional maps), especially in the area including colliculi and mesencephalon, while the cortical vessels do not move. Can you comment on that?

      Thank you for highlighting this important aspect of motion in awake animal imaging. Motion correction is indeed a critical factor in such studies. In the original version of our discussion, we briefly addressed this issue (from Line 342 to Line 346), but we agree that a more detailed discussion is needed.

      To minimize motion artifacts, we conducted habituation to acclimate the animals to the head-fixation setup, which helps reduce anxiety during imaging. With thorough head-fixed habituation, the imaging quality is generally well-preserved. We also applied correlation-based motion correction techniques based on ULM images, which can partially correct for overall brain motion, as stated in the previous version. However, this ULM-images-based correction is limited to addressing only rigid motion.

      In the revised discussion, we have expanded on the limitations of our current motion correction approach and referenced recent work about more advanced motion correction methods:

      (Line 346) “While rigid motion correction is often effective in anesthetized animals, awake animal imaging presents greater challenges due to the more prominent non-rigid motion, particularly in deeper brain regions. This is evidenced in Supplementary Fig. 1 (Mouse 7), where cortical vessels remain relatively stable, but regions around the colliculi and mesencephalon exhibit more noticeable motion artifacts, indicating that displacement is more pronounced in deeper areas. To address these deeper, non-rigid motions, recent studies suggest estimating nonrigid transformations from unfiltered tissue signals before applying corrections to ULM vascular images(16,50). Such advanced motion correction strategies may be more effective for awake ULM imaging, which experiences higher motion variability. The development of more robust and effective motion correction techniques will be crucial to reduce motion artifacts in future awake ULM applications.”
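
      To make the distinction concrete, correlation-based rigid correction of the kind mentioned above can be sketched as a search for the single translation that best re-aligns a frame with a reference image. The sketch below is illustrative only and is not the implementation used in the study: the function names, the brute-force search over integer pixel shifts, and the omission of rotation and sub-pixel refinement are all assumptions. A single global shift of this kind cannot compensate for depth-dependent, non-rigid displacement.

```python
import math


def shift_image(image, dy, dx, fill=0.0):
    """Return `image` translated by (dy, dx) pixels, padding with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                shifted[ny][nx] = image[y][x]
    return shifted


def correlation(a, b):
    """Pearson correlation between two equally sized 2D images."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(flat_a, flat_b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in flat_a)
                    * sum((q - mean_b) ** 2 for q in flat_b))
    return num / den if den > 0 else 0.0


def estimate_rigid_shift(reference, frame, max_shift=3):
    """Integer (dy, dx) that, applied to `frame`, best matches `reference`."""
    best_shift, best_score = (0, 0), -math.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = correlation(reference, shift_image(frame, dy, dx))
            if score > best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift
```

      Applying the returned shift to the frame with `shift_image` yields the corrected image; non-rigid approaches instead estimate a spatially varying displacement field.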

      • Figure 1f maybe flip the color bar to have an upward up and downward down.

      Thank you for your suggestion. This display method indeed makes the images more intuitive. In the revised manuscript, all directional flow color bars have been flipped to ensure that upward flow is displayed as ‘up’ and downward flow as ‘down.’

      • Figure 2b the figure is a bit confusing in what is displayed between dashed lines, solid lines, dots... maybe it would be easier to read with

      - bigger dots and dashed lines in color for each of the 4 series

      - and so in the legend, thin solid lines in the corresponding color for the fit, but no solid line in the legend (to distinguish data/fit)

      - no lines for FWHM as they are not very visible, and the FWHM values are not mentioned for these examples.

      Thank you for your detailed suggestions. We agree that the original Fig. 2b appeared messy and confusing. Based on this feedback and other comments, we decided to replace the FWHM-based vessel diameter measurement with a more stable binarization-based approach. In the revised version, we selected a specific segment of each vessel and measured the diameter by calculating the distance from the vessel’s centerline to both sides after binarization. Each point on the centerline of this segment provides a diameter measurement, which can be further used to calculate the mean and standard error. This updated method is more stable and reproducible, providing reliable measurements even for vessels that are not fully saturated. It also facilitates comparison across more vessels, helping to further demonstrate the generalizability of our saturation standard. We believe these adjustments make the revised Fig. 2b clearer and more readable.
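
      The binarization-based measurement described above can be illustrated with the following sketch. It is a simplified example under stated assumptions rather than the code used in the study: the function names are hypothetical, the centerline is supplied by the caller, and the vessel is assumed to run roughly vertically so that the perpendicular direction is horizontal (as for a penetrating cortical vessel). For obliquely oriented vessels, the distance would need to be measured along the local normal to the centerline.

```python
import math
import statistics


def diameters_along_centerline(mask, centerline, pixel_size_um):
    """Vessel diameter at each centerline point of a binarized image.

    mask:          2D list of truthy/falsy values (vessel vs background).
    centerline:    list of (row, col) points on the vessel centerline.
    pixel_size_um: physical pixel size in micrometres (assumed isotropic).
    """
    diameters = []
    for row, col in centerline:
        if not mask[row][col]:
            raise ValueError(f"centerline point {(row, col)} is not in the vessel")
        # Distance from the centerline to the left edge, in pixels.
        left = 0
        while col - left - 1 >= 0 and mask[row][col - left - 1]:
            left += 1
        # Distance from the centerline to the right edge, in pixels.
        right = 0
        while col + right + 1 < len(mask[row]) and mask[row][col + right + 1]:
            right += 1
        # Both half-widths plus the centerline pixel itself.
        diameters.append((left + right + 1) * pixel_size_um)
    return diameters


def mean_and_sem(values):
    """Mean and standard error of the mean for a list of measurements."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem
```

      Because every centerline point contributes one measurement, a single segment yields a distribution of diameters rather than one value taken from a hand-picked cross-section.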

      • Page 7, lines 144-147. This passage is not really clear when linking going up or down and going from the stem to the branches that it is specific to Figure 4a (and therefore to this particular location).

      Thank you for your insightful comments on our vessel classification method. We recognize the limitations of the previous approach and, in order to enhance the rigor of the study, we have opted not to continue using this method in the revised manuscript. We have removed all content related to vessel classification based on branch-in and branch-out criteria. This includes the original Classification of Cerebral Vessels section in the Methods, the relevant descriptions in the Results section under “ULM reveals detailed cerebral vascular changes from anesthetized to awake for the full depth of the brain”, the limitations of this classification method in the Discussion section, as well as related content in the original Figures 4 and 5.

      In the revised analysis, for the comparison between arteries and veins, we focus solely on penetrating vessels in the cortex. For these vessels, it is generally accepted that downward-flowing vessels are arterioles, while upward-flowing vessels are venules. Accordingly, in the revised Figures 4 and 6, we analyze arterioles and venules exclusively in the cortex, without relying on the previous classification method that could be considered controversial.

      • Page 11 line 222 "higher vascular density" seems unprecise.

      Thank you for pointing this out. We have revised the sentence to more precisely convey our observations regarding changes in vascular diameter and vascularity within the ROI. We present these findings as evidence of the vasodilation effect under isoflurane, in alignment with existing research. The revised statement is as follows:

      (Line 275) “Statistical analysis from Fig. 4 shows that certain vessels exhibit a larger diameter under isoflurane anesthesia, and the vascularity, calculated as the percentage of vascular area within selected brain region ROIs, is also higher in the anesthetized state. These findings suggest a vasodilation effect induced by isoflurane, consistent with existing research(20,40,41,43,44).”

      • Discussion: page 12, lines 257-267: it is not exactly clear how 3D imaging will help for the differentiation of veins/arteries. However, some methods have already been proposed to discriminate between arteries and veins using pulsatility (Bourquin et al., 2022) or 3D positioning when vessels are overlapped (Renaudin et al., 2023). The latter can also help estimate the out-of-plane positioning during longitudinal imaging.

      Bourquin, C., Poree, J., Lesage, F., Provost, J., 2022. In Vivo Pulsatility Measurement of Cerebral Microcirculation in Rodents Using Dynamic Ultrasound Localization Microscopy. IEEE Trans. Med. Imaging 41, 782-792. https://doi.org/10.1109/TMI.2021.3123912

      Renaudin, N., Pezet, S., Ialy-Radio, N., Demene, C., Tanter, M., 2023. Backscattering amplitude in ultrasound localization microscopy. Sci. Rep. 13, 11477. https://doi.org/10.1038/s41598-023-38531-w

      Thank you for pointing this out. We have revised the relevant paragraph in the discussion to clarify the potential advantages of advances in ULM imaging methods, such as those based on pulsatility (as described by Bourquin et al., 2022) or backscattering amplitude (as demonstrated by Renaudin et al., 2023). These established methods could be helpful for longitudinal imaging. Below is the revised text in the discussion section:

      (Line 370) “Advances in ULM imaging methods can benefit longitudinal awake imaging. For instance, dynamic ULM can differentiate between arteries and veins by leveraging pulsatility features(51). 3D ULM, with volumetric imaging array(52,53), enables the reconstruction of whole-brain vascular network, providing a more comprehensive understanding of vessel branching patterns. Meanwhile, 3D ULM also helps to mitigate the challenge of aligning the identical coronal plane for longitudinal imaging, a process that requires precise manual alignment in 2D ULM to ensure consistency. Additionally, this alignment issue can also be alleviated in 2D imaging using backscattering amplitude method, which may assist in estimating out-of-plane positioning during longitudinal imaging(54).”

      Reviewer 3 (Public Review):

      • It is unclear whether multiple animals were used in the statistical analysis.

      Thank you for bringing this to our attention. We acknowledge that the original version did not clearly indicate the use of animals in the statistical analysis. In the revised manuscript, we have added Supplementary Figure 1 to specify the mice used, and we have labeled each mouse accordingly in the figures or captions. In the revised Figures 4 and 6, we have ensured that each quantitative analysis figure or its caption clearly indicates the specific mice.

      • Generalizations are sometimes drawn from what seems to be the analysis of a single vessel.

      Thank you for pointing this out. To enhance the generalizability of our conclusions, we have expanded our analysis beyond single vessels in several parts of the study. For instance, in Figure 2, we analyzed three vessels at different depths within the same brain region of a single mouse, and we have included additional results in Supplementary Figure 2 to further support these findings. Additionally, we have revised the language in the manuscript to ensure that conclusions are appropriately qualified and avoid overgeneralization.

      In Figures 4 and 6, we extended the analysis from single vessels to larger region-of-interest (ROI) analyses across entire brain regions. Unlike single-vessel measurements, which are susceptible to bias based on specific measurement locations, ROI-based analyses are less influenced by the operator and provide more objective, generalizable insights.

      • The description of the statistical analysis is mostly qualitative.

      We recognize that some aspects of the original statistical analysis (Figures 4 and 5 in the previous version) lacked rigor and that the description was largely qualitative. The revised statistical analysis (Figure 4 and Figure 6) presents our findings at multiple scales, ranging from individual vessels to individual cortical ROIs of arteries and veins, and ultimately to broader brain regions. For instance, as illustrated in the revised Figure 4f, the average cortical arterial flow speed decreases by approximately 20% from anesthesia to wakefulness, while venous flow speed decreases by an average of 40%, with the reduction in venous flow speed being significantly greater than that of arterial flow. We believe that this kind of description makes the analysis more quantitative.

      For more examples, please refer to the Results section where Figure 4 (Line 169 to Line 207) and Figure 6 (Line 224 to Line 258) are described. These sections have been extensively rewritten to emphasize quantitative interpretation of the data. Each part of the analysis now focuses more heavily on quantitative analyses that consistently show similar trends across all animals.

      • Some terms used are insufficiently defined.

      • Additional limitations should be included in the discussion.

      • Some technical details are lacking. 

      Thank you for highlighting these issues. We have made several improvements in the revised manuscript to address them. We have clarified terms such as “vascularity” (Line 547) and “saturation point” (Line 112) to ensure precision and prevent ambiguity. We have expanded the discussion (Line 310 to Line 377) to include limitations such as motion correction challenges and advances in ULM imaging methods, including dynamic ULM and backscattering amplitude techniques. We have added further details on interleaved sampling (Line 494 to Line 497), ULM tracking (Line 517 to Line 529), and quantitative analysis (Line 535 to Line 551) in the Methods section to provide a clearer understanding of our approach.

      Please refer to our other responses for more specific adjustments.

      • Without information about whether the results obtained come from multiple animals, it is difficult to conclude that the authors generally achieved their aim. They do achieve it in a single animal. The results that are shown are interesting and could have an impact on the ULM community and beyond. In particular, the experimental setup they used along with the high reproducibility they report could become very important for the use of ULM in larger animal cohorts.

      We thank the reviewer for recognizing the impact of our work. We also acknowledge that there were some issues; specifically, we did not provide sufficient proof of reproducibility. In the revised version, we have included additional animal experiment results to ensure that the conclusions were not drawn from a single animal but are generally representative of our aim. (See Supplementary Figure 1 for details of how the animals were used.)

      Reviewer 3 (Recommendations For The Authors):

      • The manuscript would be more convincing by removing some of the superlatives used in the text. For instance, shouldn't "super-resolution ultrasound localization microscopy" simply be "ultrasound localization microscopy"? Expressions such as "first study", "essential", and "invaluable", etc could be replaced by more factual terms. The word "significant" is also used sometimes with statistics to back it up and sometimes without.

      Thank you for highlighting this issue. We have removed the superlatives throughout the manuscript to make the language more precise. For instance, we have simplified “super-resolution ultrasound localization microscopy” to “ultrasound localization microscopy” throughout the main text and removed expressions such as “first study” and “invaluable”. We also reviewed all uses of “essential” and “significant,” replacing “essential” with more modest alternatives where it does not indicate a strict requirement. Similarly, where “significant” does not refer to statistical significance, we have used other terms to avoid any ambiguity.

      • The section "Microbubble count serves as a quantitative metric for awake ULM image reconstruction" had several issues that I think should be addressed. Mainly, the authors make the case that after detecting 5 million microbubbles, there is no clear gain in detecting more. The argument is not very convincing as we know many vessels will not have had a microbubble circulate in them within that timeframe, which will be especially true in smaller vessels. While the analysis in Figure 2 shows nicely that the diameter estimate for vessels in the 20-30 um range is stable at 5 million microbubbles, it is not necessarily the case for smaller vessels. A better approach here might be to select, e.g., a total of 5 million detected microbubbles for practical reasons and then to determine which vessel parameters estimation (e.g., diameter, flow velocity) remain stable. In addition:

      a. Terms such as 'complete ULM reconstruction', 'no obvious change', 'ULM image saturation' are not well defined within the manuscript.

      Thank you for pointing out these issues and for offering a more rigorous approach. We completely agree with your suggestion. While our analysis demonstrated stable diameter estimates for vessels with diameter around 20 µm at 5 million microbubbles, this does not necessarily ensure stability for smaller vessels. Therefore, the choice of 5 million microbubbles was primarily for practical reasons. In the revised version, we have provided a more objective description and clarification of this limitation. We also recognize that terms such as “complete ULM reconstruction,” “no obvious change,” and “ULM image saturation” were not well defined and may have caused confusion, reducing the rigor of this manuscript. Based on your feedback, we have clearly defined “ULM image saturation” within the context of our study, removed absolute and ambiguous terms like “complete ULM reconstruction” and “no obvious change”. We revised the entire section accordingly:

      (Line 109) “To facilitate equitable comparison of brain perfusion at different states, a practical saturation point enabling stable quantification of most vessels needs to be established. Our observations indicated that when the cumulative MB count reached 5 million, ULM images achieved a relatively stable state. Accordingly, in this study, the saturation point was defined as a cumulative MB count of 5 million. There are also possible alternatives for ULM image normalization. For example, different ULM images can be normalized to have the same saturation rate. However, the proposed method of using the same number of cumulative MB count for normalization enables the analysis of blood flow distribution across different brain regions from a probabilistic perspective. The following analysis substantiates this criterion.

      Fig. 2a compares ULM directional vessel density maps and flow speed maps generated with 1, 3, 5, and 6 million MBs, using the same animal as shown in Fig. 1. To quantitatively confirm saturation, multiple vessel segments were selected for further analysis. Fig. 2b presents the measured vessel diameter for a specific segment at various MB counts. After binarizing the ULM map, the vessel diameter was measured by calculating the distance from the vessel centerline to the edge. Each point along the centerline of the segment provided a diameter measurement, enabling calculation of the mean and standard error. At low MB counts, vessels appeared incompletely filled, leading to inaccurate estimation of vessel diameter due to incomplete profiles. For example, at 1–2 million MBs, the binarized ULM map displayed a width of only one or two pixels along the segment. As a result, the measurements always yielded the same diameter values (two pixels, ~10 μm) with a consistently low standard error of the mean across the entire segment. With increased MB counts, the measured vessel diameter gradually rose, ultimately reaching saturation. The plots in Fig. 2b show that vessel diameter stabilized at 5 million MB count. Additionally, Fig. 2c illustrates the changes in flow velocity measured at different cumulative MB counts. The violin plots display the distribution of flow speed estimates for all valid centerline pixels within the selected segment. At low MB counts (1–3 million), flow velocity estimates fluctuated, but they stabilized as the MB count increased (4–6 million MBs). At 5 million MBs, flow velocity estimates were nearly identical to those at 6 million MBs, corroborating previous findings that vessel velocity measurements stabilize as MB count grows(39). To assess the generalizability of the 5 million MB saturation condition, vessel segments from three different mice across various brain regions were examined. The results, shown in Supplementary Fig. 2, confirm that this saturation criterion applies broadly. Although the 5 million MB threshold may not ensure absolute saturation for all vessels, it is generally effective for vessels larger than 15 μm. This MB count threshold was therefore adopted as a practical criterion.”

      b. The choice of 10 consecutive tracking frames is arbitrary and should be described as such unless a quantitative optimization study was conducted. Was there a gap-filling parameter? What was the maximum linking distance and what is its impact on velocity estimation?

      Thank you for your comment. We acknowledge that the choice of 10 consecutive tracking frames was based on our common practice rather than a specific quantitative optimization. Additionally, with the uTrack algorithm, we set both the gap-filling parameter and maximum linking distance to 10 pixels. Setting these parameters too high could potentially overestimate velocity. These details have now been added to the Methods section for clarity:

      (Line 517) “The choice of 10 consecutive frames (10 ms) was based on established practice but can be adjusted as needed. For the uTrack algorithm, two additional key parameters were specified: the maximum linking distance and the gap-filling distance, both set to 10 pixels (~50 microns). This configuration means that only bubble centroids within 10 pixels of each other across consecutive frames are considered part of the same bubble trajectory. Additionally, when the start and end points of two tracks fall within this threshold, the gap-filling parameter merges them into a single, continuous track. It is important to select these parameters carefully, as overly large values could lead to an overestimation of flow velocity. By setting the maximum linking distance to 10 pixels, we effectively limited the measurable velocity to 50 mm/s, under the assumption that no bubble would exceed a 50-micron displacement within the 1 ms interval between frames. After determining bubble tracks with the specified parameters for uTrack algorithm, accumulating the MB tracks resulted in the flow intensity map. Considering the velocity distribution across the mouse brain, this 50 mm/s limit ensures that the vast majority of blood flow is captured accurately.”
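
      The 50 mm/s ceiling quoted above follows from dividing the maximum linking distance by the interval between frames. The short sketch below reproduces that arithmetic. The 5-micron pixel size is an assumption inferred from the statement that 10 pixels correspond to roughly 50 microns, and the function name is illustrative rather than part of uTrack.

```python
# Values taken from the quoted Methods text. The pixel size is an
# assumption inferred from "10 pixels (~50 microns)".
PIXEL_SIZE_UM = 5.0
MAX_LINK_DISTANCE_PX = 10
FRAME_RATE_HZ = 1000.0  # 1 ms between consecutive frames


def max_trackable_speed_mm_per_s(link_px, pixel_um, frame_rate_hz):
    """Largest speed a tracker can follow when a bubble may move at most
    link_px pixels between two consecutive frames (hypothetical helper)."""
    displacement_mm = link_px * pixel_um * 1e-3  # microns -> millimetres
    frame_interval_s = 1.0 / frame_rate_hz
    return displacement_mm / frame_interval_s


limit = max_trackable_speed_mm_per_s(
    MAX_LINK_DISTANCE_PX, PIXEL_SIZE_UM, FRAME_RATE_HZ
)
print(f"velocity ceiling: {limit:.1f} mm/s")  # velocity ceiling: 50.0 mm/s
```

      Raising the linking distance raises this ceiling proportionally, which is why overly large values risk linking unrelated bubbles and overestimating velocity.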

      c. 'The plots (Figure 2b) clearly indicate that the vessel diameter stabilized beyond 5 million MB count.' This is true for one vessel. To generalize that claim, the analysis should be performed quantitatively on a larger sample of vessels in various areas of the brain, across multiple animals.

      Thank you for pointing out this limitation. We agree that conclusions drawn from a single vessel cannot be generalized across all regions. Following your suggestion, we have added Supplementary Figure 2, where we analyzed multiple vessels from different brain regions across three mice. This expanded analysis further confirms that a 5 million MB count is sufficient to stabilize vessel diameter measurements across various samples.

      (Line 133) “To assess the generalizability of the 5 million MB saturation condition, vessel segments from three different mice across various brain regions were examined. The results, shown in Supplementary Fig. 2, confirm that this saturation criterion applies broadly. Although the 5 million MB threshold may not ensure absolute saturation for all vessels, it is generally effective for vessels larger than 15 μm. This MB count threshold was therefore adopted as a practical criterion.” 

      • "Statistical analysis validates the increase in blood flow induced by anesthesia" is a very interesting section but even though a quantitative analysis was conducted in Figure 5, the language used remains mostly qualitative. I think this section should include quantitative conclusions from the statistical analysis to increase the impact of this work.

      Thank you for your valuable feedback. We recognize that some aspects of the original quantitative analysis (Figures 4 and 5 in the previous version) lacked rigor, such as the classification of arteries, veins, and capillaries, and that the data presented in each row of Figure 5 represented only one mouse per coronal section, limiting the generalizability of statistical conclusions.

      In response to the reviewers’ feedback, the revised version incorporates a new approach by merging the previous Figure 4 and Figure 5 into a single, consolidated figure (now Figure 4). This updated figure aims to present our findings from multiple dimensions, ranging from individual vessels to individual cortical ROI of arteries and veins, and ultimately to broader brain regions. We have focused on quantitative analyses that consistently show similar trends across all animals. For instance, as illustrated in the revised Figure 4f, the average cortical arterial flow speed decreases by approximately 20% from anesthesia to wakefulness, while venous flow speed decreases by an average of 40%, with the reduction in venous flow speed being significantly greater than that of arterial flow. We believe that this approach offers more insightful analysis and enhances the overall impact of the study.

      For more examples, please refer to the revised Results section where Figure 4 is described (from Line 169 to Line 212). This section has been extensively rewritten to emphasize quantitative interpretation of the data. Each part of the analysis now focuses more heavily on quantitative analyses that consistently show similar trends across all animals.

      • In the methods, it is claimed that 6 healthy female C57 mice were used in the study, but it is hard to tell whether more than one animal is shown in the figures. It is also unclear whether the statistics were performed within or across animals. Since one of the major strengths of the manuscript is that it shows the feasibility of performing reproducible measurements using ULM, most figures should be repeated for each individual animal and provided in supplementary data and statistics should be performed across animals.

      Thank you for bringing this to our attention. We acknowledge that the original version did not clearly indicate the use of individual animals. In the revised manuscript, we have added Supplementary Figure 1 to specify the mice used, and we have labeled each mouse accordingly in the figures or captions. Additionally, we included statistics across animals in the revised Figures 4 and 6, and detailed data for each individual mouse are now provided in Supplementary Figures 3 and 4.

      • The effect of aliasing should be discussed given that 1) a high-frequency probe is used along with a correspondingly relatively low frame rate (1000 fps) and 2) Doppler filtering is used to separate upward from downward-moving microbubbles. There will be microbubbles that circulate faster than the Nyquist limit, which will thus appear as moving in the opposite direction in the Doppler spectrum. It would be important to double-check that the effect is not too important and to report this as a limitation in the discussion.

      Thank you for highlighting this important point. Aliasing is indeed a relevant issue to consider, especially for higher flow velocities in large vessels. We have added a discussion on this limitation in the revised manuscript:

      (Line 359) “Based on the maximum linking distance and gap closing parameters outlined in the Methods section, blood flow with velocities below 50 mm/s can be detected. However, the use of a directional filter to estimate flow direction may introduce aliasing. MBs moving at higher velocities may be subject to incorrect flow direction estimation due to aliasing effects. Given that the compounded frame rate is 1000 Hz, with an ultrasound center frequency of 20 MHz and a sound speed of 1540 m/s, the relationship between Doppler frequency and the axial blood flow velocity(12) indicates that aliasing will not occur for axial flow velocities below 19.25 mm/s. In all flow velocity maps presented in this study, the range is limited to a maximum of 15 mm/s, remaining below the critical threshold for aliasing. Additionally, all vessels analyzed in the violin plots for arteriovenous flow comparisons fall within this range. While cortical arterioles and venules generally exhibit moderate flow speeds, aliasing remains a factor to consider when combining directional filtering with velocity analysis.”
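
      The 19.25 mm/s figure can be reproduced under the assumption that the cited relationship is the standard pulsed-Doppler equation, f_d = 2·v·f0/c, with aliasing avoided while |f_d| stays below half the frame rate. The sketch below uses only the three values stated in the quoted passage; the function name is illustrative.

```python
def aliasing_free_axial_speed_mm_per_s(frame_rate_hz, center_freq_hz,
                                       sound_speed_m_per_s):
    """Axial speed below which the Doppler shift stays under the Nyquist
    limit, assuming f_d = 2 * v * f0 / c and |f_d| < frame_rate / 2."""
    v_m_per_s = sound_speed_m_per_s * frame_rate_hz / (4.0 * center_freq_hz)
    return v_m_per_s * 1e3  # m/s -> mm/s


v_limit = aliasing_free_axial_speed_mm_per_s(
    frame_rate_hz=1000.0,        # compounded frame rate
    center_freq_hz=20e6,         # ultrasound center frequency
    sound_speed_m_per_s=1540.0,  # speed of sound
)
print(f"aliasing-free axial speed: {v_limit:.2f} mm/s")  # 19.25 mm/s
```

      Because the limit scales with the frame rate and inversely with the center frequency, a high-frequency probe at a fixed frame rate lowers the velocity at which direction estimates become unreliable.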

      • The method used to classify vessels may be incorrect and may not be needed. I would recommend the authors not use it and describe the vessels as vessels that branch in or out, etc. Applying an arbitrary threshold of 2 to detect capillaries is also not very convincing. I understand that the authors might decide to maintain this nomenclature, in which case I would recommend clearly explaining it at the beginning of the manuscript along with some of the caveats that are already reported in the discussion.

      Thank you for your comments on our vessel classification method. We recognize the limitations of the previous approach and, in order to enhance the rigor of the study, we have opted not to continue using this method in the revised manuscript.

      In the revised analysis regarding artery and vein, we focus solely on penetrating vessels in the cortex. For these vessels, it is generally accepted that downward-flowing vessels are arterioles, while upward-flowing vessels are venules. Accordingly, in the revised Figures 4 and 6, we analyze arterioles and venules exclusively in the cortex, without relying on the previous classification method that could be considered controversial.

      Additionally, we agree that classifying vessels with values below 2 as capillaries was not a robust approach. Thus, we have removed all related analyses from the revised manuscript.

      Minor comments:

      • Line 16: "resolves capillary-scale ..."; it is not clear that the resolution that is achieved in this work is at the capillary scale.

      Thank you for your valuable feedback. We understand that “capillary-scale” may overstate the achieved resolution in our work. To clarify, we have revised the sentence as follows:

      (Line 18) “Ultrasound localization microscopy (ULM) is an emerging imaging modality that resolves microvasculature in deep tissues with high spatial resolution.” 

      This adjustment more accurately reflects the resolution capabilities of ULM as used in our study.

      • Line 22: 'vascularity' is not well defined in the manuscript. Consider defining or using another term.

      Thank you for pointing out the need for clarification on vascularity. We acknowledge that our initial use of the term “vascularity” may have been unclear and potentially confusing. In the revised manuscript, we have included a clear definition of “vascularity” in the Methods section under Quantitative Analysis of ULM Images (Line 534). 

      The following sentence shows the definition of vascularity:

      (Line 547) “Vascularity was defined as the proportion of the pixel count occupied by blood vessels within each ROI, obtained by binarizing the ULM vessel density maps and calculating the percentage of the pixels with MB signal.”
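
      As an illustration of this definition, the sketch below computes vascularity on a toy density map. It is not the authors' implementation: the threshold of zero used for binarization, the nested-list data layout, and the function name are all assumptions made for the example.

```python
def vascularity(density_map, roi_mask, threshold=0):
    """Fraction of ROI pixels whose ULM density exceeds `threshold`.

    density_map and roi_mask are equally sized 2-D lists. The threshold
    used for binarization is an assumption for this example.
    """
    roi_pixels = 0
    vessel_pixels = 0
    for density_row, mask_row in zip(density_map, roi_mask):
        for density, in_roi in zip(density_row, mask_row):
            if in_roi:
                roi_pixels += 1
                if density > threshold:  # pixel carries MB signal
                    vessel_pixels += 1
    if roi_pixels == 0:
        raise ValueError("ROI is empty")
    return vessel_pixels / roi_pixels


# Toy 3 x 4 density map; the ROI covers the first three columns.
toy_density = [
    [0, 3, 0, 0],
    [0, 5, 1, 0],
    [0, 0, 2, 0],
]
toy_roi = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
]
toy_vascularity = vascularity(toy_density, toy_roi)
print(f"vascularity: {100 * toy_vascularity:.1f}%")  # vascularity: 44.4%
```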

      We have also added a brief definition where the term is first used in the Results section:

      (Line 161) “When comparing vessel density maps, ULM images that are acquired in the awake state demonstrate a global reduction of vascularity, which refers to percentage of pixels that occupied by blood vessels.”

      • Line 30: I'm not convinced the first two sentences are useful.

      Thank you for pointing out this issue. The opening sentence of the article lacked focus and was too broad. We have rewritten the sentence as follows:

      (Line 34) “Sensitive imaging of correlates of activity in the awake brain is fundamental for advancing our understanding of neural function and neurological diseases.”

      • Line 37: 'micron-scale capillaries': this expression is unclear. Capillaries are typically micron-scaled, so it gives the impression that ULM can image at the one-micron scale, which is not the case.

      Thank you for your helpful comment. We agree that “micron-scale capillaries” could be misleading, as it might imply a resolution at the single-micron level. To clarify, we have revised the sentence as follows:

      (Line 40) “ULM is uniquely capable of imaging microvasculature situated in deep tissue (e.g., at a depth of several centimeters).”

      This revised wording more accurately describes ULM’s capability without implying single-micron level resolution.

      • Line 74: I don't think motion-free imaging is possible in the context of awake animals. Consider 'limiting motion' instead.

      Thank you for pointing out the potential issue with the term “motion-free”. We agree that achieving entirely motion-free imaging is challenging, especially in the context of awake animals. In response to your suggestion, we have revised the sentence to better reflect this limitation:

      (Line 76) “To achieve consistent ULM brain imaging while allowing limited movement in awake animals, a headfixed imaging platform with a chronic cranial window was used in this study.”

      This revised wording more accurately conveys our approach to minimizing motion without implying that motion is completely eliminated.

      • Line 134:'clearly reveals decreased vessel diameter' How was that demonstrated?

      • Line 153: 'significant' according to which statistical test?

      • Line 167: 'slight increase', by how much, is it significant?

      • Line 183: 'smaller vessels' the center of the distribution is not at 10mm/s, and velocity is not necessarily correlated with diameter.

      • Line 184: 'more large vessels', see above. What is a large vessel, and how was this measured?

      • Line 205: 'significantly lower', according to which statistical test?

      We acknowledge that the original version did not use statistical terminology correctly. In the revised manuscript, we have deleted the related statements and rewritten the statistical analysis to ensure that the terms are used correctly. Please refer to the revised section “ULM reveals an increase in blood flow induced by isoflurane anesthesia” (from Line 169 to Line 209). In the revised Figures 4 and 6, we have also ensured that each quantitative analysis figure or its caption is clearly explained.

      • Line 398: the interleaved sampling scheme should be described in more detail.

      Thank you for pointing out this issue. The previous version did not clearly explain the details of interleaved sampling. We have now added the following paragraph to the Ultrasound imaging sequence section in Methods:

      (Line 494) “Interleaved sampling is employed to capture high-frequency echoes more effectively. With the system’s sampling rate limited to 62.5 MHz, the upper limit of the center frequency of the transducer passband is 15.625 MHz. To mitigate aliasing, two transmissions are sent per angle, staggered in time. This approach effectively doubles the sampling rate, ensuring more accurate image reconstruction.”
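
      The numbers in this passage are consistent with a rule of four samples per wavelength (62.5 MHz / 4 = 15.625 MHz). The sketch below works through that arithmetic under this assumption, which is inferred from the quoted figures rather than stated in the text, and shows how doubling the effective sampling rate raises the limit.

```python
# Assumption inferred from the quoted values: 62.5 MHz / 15.625 MHz = 4
# samples per wavelength at the center frequency.
SAMPLES_PER_WAVELENGTH = 4


def max_center_frequency_mhz(sampling_rate_mhz, interleave_factor=1):
    """Highest center frequency supported for a given hardware sampling
    rate, with an optional interleaving factor (hypothetical helper)."""
    effective_rate_mhz = sampling_rate_mhz * interleave_factor
    return effective_rate_mhz / SAMPLES_PER_WAVELENGTH


single_limit = max_center_frequency_mhz(62.5)
interleaved_limit = max_center_frequency_mhz(62.5, interleave_factor=2)
print(single_limit)       # 15.625
print(interleaved_limit)  # 31.25
```

      Under this assumption, two staggered transmissions per angle bring the limit to 31.25 MHz, above the 20 MHz center frequency mentioned elsewhere in the response.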

      • Figure 1: Which mouse is it? Are these results consistent across all animals?

      • Figure 2: Which mouse is it? Are these results consistent across all animals?

      • Figure 3: Which mouse is it? Are these results consistent across all animals?

      • Figure 4: Which mouse is it? Are these results consistent across all animals?

      • Figure 5: Is it a single mouse or multiple mice? Are these results consistent across all animals?

      We acknowledge that the original version did not clearly indicate the number of animals used in the statistical analysis. In the revised manuscript, we have added Supplementary Figure 1 to specify the mice used, and we have labeled each mouse accordingly in the figures or captions. In the revised Figures 4 and 6, we have ensured that each quantitative analysis figure or its caption clearly indicates the specific mice.

      The original Figures 1 and 2 are presented as case studies to illustrate the methodology. Because the anesthesia time required for tail vein injection varies slightly between animals, it is difficult to obtain a consistent recovery time across all mice. For instance, in Figure 1, the mouse took nearly 500 seconds to recover from anesthesia, but this duration is not consistent across all animals, which is a limitation of the bolus injection technique. We have noted this point in the discussion (discussion on the limitation of bolus injection), and we have also clarified in the results section and figure captions that these figures represent a case study of a single mouse rather than a standardized recovery time for all animals.

      We further clarified this point at the end of the Figure 2 caption:

      (Fig. 2 caption) “This figure presents a case study based on the same mouse shown in Fig 1. The x-axis for d-f begins at 500 seconds because, at this point, the mouse’s pupil size stabilized, indicating it had recovered to an awake state. Consequently, ULM images were accumulated starting from this time. It is important to note that not every mouse requires 500 seconds to fully awaken; the time to reach a stable awake state varies across individual mice.”

      We added the following statement before introducing Figure 1e:

      (Line 93) “Due to differences in tail vein injection timing and anesthesia depth, the time required for each mouse to fully awaken varied. Although it was not feasible to get pupil size stabilized just after 500 seconds for each animal, ULM reconstruction only used the data that acquired after the animal reached full pupillary dilation, to ensure that ULM accurately captures the cerebrovascular characteristics in the awake state.”

      We added the following statement before introducing Figure 2d:

      (Line 139) “To further verify that the proposed MB bolus injection method can help to achieve ULM image saturation shortly after mice awaken from anesthesia, an analysis on the change in MB concentration over time was conducted once pupil size had stabilized (T = 500s).”

      For Figures 3, 4, and 5 (in the revised version, Figures 4 and 5 have been combined into a single Figure 4), the data represents results from three individual mice, with each coronal plane corresponding to a different mouse. In the revised version, we have added labels to indicate the specific mouse in each image to improve clarity. We also recognize that some analyses in the original submission (original Figure 5) may have lacked sufficient statistical power due to the small sample size. Therefore, in the revised version, we have focused only on findings that were consistently observed across the three mice to ensure robust conclusions.

      Minor corrections and typos from all reviewers:

      We would like to sincerely thank the reviewers for their careful reading of our manuscript. We appreciate the time and effort taken to point out the minor typographical errors. We have carefully addressed and corrected all the identified typos, as listed below:

      From Reviewer #1:

      • Line 316: "insensate": correct, please.

      (Line 409) “After confirming that the mouse was anesthetized, the head of the animal was fixed in the stereotaxic frame.”

      From Reviewer #3:

      • Line 15: Super-resolution ultrasound localization microscopy -- consider removing super-resolution as it gives the impression that it is different from standard ULM.

      (Line 18) “Ultrasound localization microscopy (ULM) is an emerging imaging modality that resolves microvasculature in deep tissues with high spatial resolution.”

      • Line 39: typo: activities should be activity.

      (Line 41) “ULM can also be combined with the principles of functional ultrasound (fUS) to image whole-brain neural activity at a microscopic scale.”

      • Line 47: typo: over under.

      (Line 50) “Therefore, in neuroscience research, brain imaging in the awake state is often preferred over imaging under anesthesia.”

      Once again, we are grateful for the reviewers’ thorough review and valuable input, which have helped us improve the clarity and precision of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study reports that spatial frequency representation can predict category coding in the inferior temporal cortex.

      Thank you for taking the time to review our manuscript. We greatly appreciate your valuable feedback and constructive comments, which have been instrumental in improving the quality and clarity of our work.

      The original conclusion was based on likely problematic stimulus timing (33 ms which was too brief). Now the authors claim that they also have a different set of data on the basis of longer stimulus duration (200 ms).

      One big issue in the original report was that the experiments used a stimulus duration that was too brief and could have weakened the effects of high spatial frequencies and confounded the conclusions. Now the authors provided a new set of data on the basis of a longer stimulus duration and made the claim that the conclusions are unchanged. These new data and the data in the original report were collected at the same time as the authors report.

      The authors may provide an explanation why they performed the same experiments using two stimulus durations and only reported one data set with the brief duration. They may also explain why they opted not to mention in the original report the existence of another data set with a different stimulus duration, which would otherwise have certainly strengthened their main conclusions.

      Thank you for your comments regarding the stimulus duration used in our experiments. We appreciate the opportunity to clarify and provide further details on our methodology and decisions.

      In our original report, we focused on the early phase of the neuronal response, which is less affected by the duration of the stimulus. Observations from our data showed that certain neurons exhibited high firing rates even with the brief 33 ms stimulus duration, and the results we obtained were consistent across different durations. To avoid redundancy, we initially chose not to include the results from the 200 ms stimulus duration, as they reiterated the findings of the 33 ms duration.

      However, we acknowledge that the brief stimulus duration could raise concerns regarding the robustness of our conclusions, particularly concerning the effects of high spatial frequencies. Upon reflecting on the reviewer’s comments during the first revision, we recognized the importance of addressing these potential concerns directly. Therefore, we have included the data from the 200 ms stimulus duration in our revised manuscript.

      Furthermore, our team is actively investigating the differences between fast (33 ms) and slow (200 ms) presentations in terms of SF processing. Our preliminary observations suggest similar processing of HSF in the early phase of the response for both fast and slow presentations, but different processing of HSF in the late phase. This was another reason we initially opted to publish the results from the brief stimulus duration separately, as we intended to explore the different aspects of SF processing in fast and slow presentations in subsequent studies.

      I suggest the authors upload both data sets and analyzing codes, so that the claim could be easily examined by interested readers.

      Thank you for your suggestion to make both data sets and the analyzing codes available for examination by interested readers.

      We have created a repository that includes a sample of the dataset along with the necessary codes to output the main results. While we cannot provide the entire dataset at this time due to ongoing investigations by our team, we are committed to ensuring transparency and reproducibility. The data and code samples we have provided should enable interested readers to verify our claims and understand our analysis process.

      Repository: https://github.com/ramintoosi/spatial-frequency-selectivity

      Reviewer #2 (Public Review):

      Summary:

      This paper aimed to examine the spatial frequency selectivity of macaque inferotemporal (IT) neurons and its relation to category selectivity. The authors suggest in the present study that some IT neurons show a sensitivity for the spatial frequency of scrambled images. Their report suggests a shift in preferred spatial frequency during the response, from low to high spatial frequencies. This agrees with a coarse-to-fine processing strategy, which is in line with multiple studies in the early visual cortex. In addition, they report that the selectivity for faces and objects, relative to scrambled stimuli, depends on the spatial frequency tuning of the neurons.

      Strengths:

      Previous studies using human fMRI and psychophysics studied the contribution of different spatial frequency bands to object recognition, but as pointed out by the authors little is known about the spatial frequency selectivity of single IT neurons. This study addresses this gap and shows spatial frequency selectivity in IT for scrambled stimuli that drive the neurons poorly. They related this weak spatial frequency selectivity to category selectivity, but these findings are premature given the low number of stimuli they employed to assess category selectivity.

      Thank you for your thorough review and insightful feedback on our manuscript. We greatly appreciate your time and effort in providing valuable comments and suggestions, which have significantly contributed to enhancing the quality of our work.

      The authors revised their manuscript and provided some clarifications regarding their experimental design and data analysis. They responded to most of my comments but I find that some issues were not fully or poorly addressed. The new data they provided confirmed my concern about low responses to their scrambled stimuli. Thus, this paper shows spatial frequency selectivity in IT for scrambled stimuli that drive the neurons poorly (see main comments below). They related this (weak) spatial frequency selectivity to category selectivity, but these findings are premature given the low number of stimuli to assess category selectivity.

      While we acknowledge that the number of instances per condition is relatively low, the overall dataset is substantial. Specifically, our study includes a total of 180 stimuli (6 spatial frequencies × 2 scrambled/non-scrambled conditions × 15 instances, including 9 fixed and 6 non-fixed) and 5400 trials (180 stimuli × 2 durations × 15 repetitions). Conducting these trials requires approximately one hour of experimental time per session.
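
The counts above can be checked with a few lines of arithmetic. The following is a minimal sketch; the variable names are illustrative and are not taken from the authors' repository.

```python
# Stimulus and trial counts as described in the response (names are illustrative).
n_spatial_frequencies = 6   # spatial frequency conditions
n_scramble_conditions = 2   # scrambled / non-scrambled
n_instances = 15            # 9 fixed + 6 non-fixed

n_stimuli = n_spatial_frequencies * n_scramble_conditions * n_instances

n_durations = 2
n_repetitions = 15
n_trials = n_stimuli * n_durations * n_repetitions

print(n_stimuli, n_trials)
```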

      Extending the number of stimuli, while potentially addressing this limitation, would significantly compromise the quality of the experiment by increasing the duration and introducing potential fatigue effects in the subjects. Despite this limitation, our findings lay important groundwork by offering novel insights into object recognition through the lens of spatial frequency. We believe this work can serve as a foundation for future experiments designed to further explore and validate these theories with expanded stimulus sets.

      Main points.

      (1) They have provided now the responses of their neurons in spikes/s and present a distribution of the raw responses in a new Figure. These data suggest that their scrambled stimuli were driving the neurons rather poorly and thus it is unclear how well their findings will generalize to more effective stimuli. Indeed, the mean net firing rate to their scrambled stimuli was very low: about 3 spikes/s. How much can one conclude when the stimuli are driving the recorded neurons that poorly? Also, the new Figure 2- Appendix 1 shows that the mean modulation by spatial frequency is about 2 spikes/s, which is a rather small modulation. Thus, the spatial frequency selectivity the authors describe in this paper is rather small compared to the stimulus selectivity one typically observes in IT (stimulus-driven modulations can be at least 20 spikes/s).

      To address the concerns regarding the firing rates and the modulation of neuronal responses by spatial frequency (SF), we emphasize several key points:

      (1) Significance of Firing Rate Differences: While it is true that the mean net firing rate to our scrambled stimuli was relatively low, the firing rate differences observed were statistically significant, with p-values approximately at 1e-5. This indicates that despite the low firing rates, the observed differences are reliable and unlikely to have occurred by chance.

      (2) Classification Rate and Modulation by SF: Our analysis showed that the differences among SF responses led to a classification rate of 44.68%, which is 24.68 percentage points higher than the chance level. This substantial increase above the chance level demonstrates that SF significantly modulates IT responses, even if the overall firing rates are modest.

      (3) Effect Size and SF Modulation: While the effect size in terms of firing rate differences may be small, it is significant. The significant modulation of IT responses by SF, as evidenced by our statistical analyses and classification rate, supports our conclusions regarding the role of SF in driving IT responses.

      (4) Expectations for Noise-like Pure SF Stimuli: We acknowledge that IT responses are typically higher for various object stimuli. Given the nature of our pure SF stimuli, which resemble noise-like patterns, we did not anticipate high responses in terms of spikes per second. The low firing rates are consistent with the expectation for such stimuli and do not undermine the significance of the observed modulation by SF.

      We believe that these points collectively support the validity of our findings and the significance of SF modulation in IT responses, despite the low firing rates. We appreciate your insights and hope this clarifies our stance on the data and its implications.

      We added the following description to the Appendix 1 - “Strength of SF selectivity” section:

      “While the firing rates and net responses to scrambled stimuli were modest (e.g., 2.9 Hz in T1), the differences across spatial frequency (SF) bands were statistically significant (p ≈ 1e-5) and led to a classification accuracy 24.68% above chance. This demonstrates the robustness of SF modulation in IT neurons despite low firing rates. The modest responses align with expectations for noise-like stimuli, which are less effective in driving IT neurons, yet the observed SF selectivity highlights a fundamental property of IT encoding.”

      (2) Their new Figure 2-Appendix 1 does not show net firing rates (baseline-subtracted; as I requested) and thus is not very informative. Please provide distributions of net responses so that the readers can evaluate the responses to the stimuli of the recorded neurons.

      We understand the reviewer’s concern about the presentation of net firing rates. In T2 (the late time interval), the average response rate falls below the baseline, resulting in negative net firing rates, which might confuse readers. To address this, we have added the net responses to the text for clarity. Additionally, we have included the average baseline response in the figure to provide a more comprehensive view of the data.

      “To check the SF response strength, the histogram of IT neuron responses to scrambled, face, and non-face stimuli is illustrated in this figure. A Gamma distribution is also fitted to each histogram. To calculate the histogram, the neuron response to each unique stimulus is calculated for each neuron in spikes/second (Hz). In the early phase, T1, the average firing rate to scrambled stimuli is 26.3 Hz, which is significantly higher than the response in −50 to 50 ms, which is 23.4 Hz. In comparison, the mean response to intact face stimuli is 30.5 Hz, while non-face stimuli elicit an average response of 28.8 Hz. The average net responses to the scrambled, face, and non-face stimuli are 2.9 Hz, 7.1 Hz, and 5.4 Hz, respectively. Moving to the late phase, T2, the responses to scrambled, face, and object stimuli are 19.5 Hz, 19.4 Hz, and 22.4 Hz, respectively. The corresponding average net responses are 3.9 Hz, 4.0 Hz, and 1.0 Hz below the baseline response.”

      (3) The poor responses might be due to the short stimulus duration. The authors report now new data using a 200 ms duration which supported their classification and latency data obtained with their brief duration. It would be very informative if the authors could also provide the mean net responses for the 200 ms durations to their stimuli. Were these responses as low as those for the brief duration? If so, the concern of generalization to effective stimuli that drive IT neurons well remains.

      The firing rates for the 200 ms stimulus duration are as follows: 27.7 Hz, 30.7 Hz, and 30.4 Hz for scrambled, face, and object stimuli in T1, respectively; and 26.2 Hz, 29.1 Hz, and 33.9 Hz in T2. The average baseline firing rate (−50 to 50 ms) is 23.4 Hz. Therefore, the net responses are 4.3 Hz, 7.3 Hz, and 7.0 Hz for T1; and 2.8 Hz, 5.7 Hz, and 10.5 Hz for T2 for scrambled, face, and object stimuli, respectively.
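
The baseline subtraction behind these net responses can be reproduced directly from the reported values. This is a minimal sketch; the dictionary layout and the function name `net_responses` are assumptions made for illustration, not the authors' code.

```python
BASELINE_HZ = 23.4  # mean rate in the -50 to 50 ms window, as reported

# Mean firing rates (Hz) for the 200 ms duration, as reported in the response.
rates_hz = {
    "T1": {"scrambled": 27.7, "face": 30.7, "object": 30.4},
    "T2": {"scrambled": 26.2, "face": 29.1, "object": 33.9},
}


def net_responses(rates, baseline):
    """Subtract the baseline from every rate and round to 0.1 Hz."""
    return {
        interval: {stim: round(rate - baseline, 1) for stim, rate in by_stim.items()}
        for interval, by_stim in rates.items()
    }


net_hz = net_responses(rates_hz, BASELINE_HZ)
print(net_hz)
```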

      Notably, the impact of stimulus duration is more pronounced in T2, which is consistent with the time interval of the T2 compared to T1. However, the firing rates in T1 do not show substantial changes with the longer duration. As we discussed in our response to the first comment, it is important to note that high net responses are not typically expected for scrambled or noise-like stimuli in IT neurons. Instead, the key findings of this study lie in the statistical significance of these responses and their meaningful relationship to category selectivity. These results highlight the broader implications for understanding the role of spatial frequency in object recognition.

      We added the firing rates to the Appendix 1 “Extended stimulus duration supports LSF-preferred tuning” section as follows.

      “For the 200 ms stimulus duration, the firing rates were 27.7 Hz, 30.7 Hz, and 30.4 Hz for scrambled, face, and object stimuli in T1, respectively, and 26.2 Hz, 29.1 Hz, and 33.9 Hz in T2. The corresponding net responses were 4.3 Hz, 7.3 Hz, and 7.0 Hz in T1, and 2.8 Hz, 5.7 Hz, and 10.5 Hz in T2. While the longer stimulus duration did not substantially increase firing rates in T1, its impact was more pronounced in T2.”

      (4) I still do not understand why the analyses of Figures 3 and 4 provide different outcomes on the relationship between spatial frequency and category selectivity. I believe they refer to this finding in the Discussion: "Our results show a direct relationship between the population's category coding capability and the SF coding capability of individual neurons. While we observed a relation between SF and category coding, we have found uncorrelated representations. Unlike category coding, SF relies more on sparse, individual neuron representations.". I believe more clarification is necessary regarding the analyses of Figures 3 and 4, and why they can show different outcomes.

      Figure 3 explores the relationship between SF coding and category coding at both the single-neuron and population levels.

      ● Figures 3(a) and 3(b) examine the relationship between a single neuron’s response pattern and object decoding in the population.

      ● Figure 3(c) investigates the relationship between a single neuron’s SF decoding capabilities and object decoding in the population.

      ● Figure 3(d) assesses the relationship between a single neuron’s object decoding capabilities and SF decoding in the population.

      In summary, Figure 3 demonstrates a relation between SF coding/response pattern at the single-neuron level and category coding at the population level.

      Figure 4, on the other hand, addresses the uncorrelated nature of SF and category coding.

      ● Figure 4(a) shows the uncorrelated relation between a single neuron’s SF decoding capability and its object decoding capability. This suggests that a neuron's ability to decode SF does not predict its ability to decode object categories.

      ● Figure 4(b) illustrates that the contribution of a neuron to the population decoding of SF is uncorrelated with its contribution to the population decoding of object categories. This further supports the idea that the mechanisms behind SF coding and object coding are uncorrelated.

      In summary, Figure 4 suggests that while there is a relation between SF coding and category coding as illustrated in Figure 3, the mechanisms underlying SF coding and object coding operate independently (in terms of correlation), highlighting the distinct nature of these processes.

      We hope this explanation clarifies why the analyses in Figures 3 and 4 present different outcomes. Figure 3 provides insight into the relationship between SF and category coding, while Figure 4 emphasizes the uncorrelated nature of these processes.

      Based on your comment, to clarify the presentation of the work, we added the following description to the “Uncorrelated mechanisms for SF and category coding” section:

      “Figures 3 and 4 examine different aspects of the relationship between SF and category coding. Figure 3 highlights a relationship between SF coding at the single-neuron level and category coding at the population level. Conversely, Figure 4 demonstrates the uncorrelated mechanisms underlying SF and category coding, showing that a neuron’s ability to decode SF is not predictive of its ability to decode object categories. This distinction underscores that while SF and category coding are related at broader levels, their underlying mechanisms are independent, emphasizing the distinct processes driving each form of coding.”

      (5) The authors found a higher separability for faces (versus scrambled patterns) for neurons preferring high spatial frequencies. This is consistent for the two monkeys but we are dealing here with a small amount of neurons. Only 6% of their neurons (16 neurons) belonged to this high spatial frequency group when pooling the two monkeys. Thus, although both monkeys show this effect I wonder how robust it is given the small number of neurons per monkey that belong to this spatial frequency profile. Furthermore, the higher separability for faces for the low-frequency profiles is not consistent across monkeys which should be pointed out.

      We appreciate the reviewer’s concern regarding the relatively small number of neurons in the high spatial frequency group (16 neurons, 6% of the total sample across the two monkeys) and the consistency of the results. While we acknowledge this limitation, it is important to note that findings involving sparse subsets of neurons can still be meaningful. For example, Dalgleish et al. (2020) demonstrated that perception can arise from the activity of as few as ~14 neurons in the mouse cortex, supporting the sparse coding hypothesis. This underscores the potential robustness of results derived from small neuronal populations when the activity is statistically significant and functionally relevant.

      Regarding the higher separability for faces among neurons preferring high spatial frequencies, the consistency of this finding across both monkeys suggests that this effect is robust within this subgroup. For neurons preferring low spatial frequencies, we agree that the lack of consistency across monkeys should be explicitly noted. These differences may reflect individual variability or differences in sampling across subjects and merit further investigation in future studies.

      To address this concern, we have updated the text to explicitly discuss the small size of the high spatial frequency group, its implications, and the observed inconsistency in the low spatial frequency profiles between monkeys. We have added the following description to the discussion.

      “Next, according to Figure 3(a), 6% of the neurons are HSF-preferred and their firing rate in HSF is comparable to the LSF firing rate in the LSF-preferred group. This analysis is carried out in the early phase of the response (70-170ms). While most of the neurons prefer LSF, this observation shows that there is an HSF input that excites a small group of neurons. Importantly, findings involving small neuronal populations can still be meaningful, as studies like Dalgleish et al. (2020) have demonstrated that perception can arise from the activity of as few as ~14 neurons in the mouse cortex, emphasizing the robustness of sparse coding.”

      Regarding the separability of faces for the low-frequency profiles, we added the following to the appendix section,

      “For neurons preferring LSF, LP profile, it is important to note the lack of consistency in responses across monkeys. This variability may reflect individual differences in neural processing or variations in sampling between subjects.”

      And in the discussion:

      “Our results are based on grouping the neurons of the two monkeys; however, the results remain consistent when looking at the data from individual monkeys as illustrated in Appendix 2. However, for neurons preferring LSF, we observed inconsistency across monkeys, which may reflect individual differences or sampling variability. These findings highlight the complexity of SF processing in the IT cortex and suggest the need for further research to explore these variations.”

      * Henry WP Dalgleish, Lloyd E Russell, Adam M Packer, Arnd Roth, Oliver M Gauld, Francesca Greenstreet, Emmett J Thompson, Michael Häusser (2020) How many neurons are sufficient for perception of cortical activity? eLife 9:e58889.

      (6) I agree that CNNs are useful models for ventral stream processing but that is not relevant to the point I was making before regarding the comparison of the classification scores between neurons and the model. Because the number of features and trial-to-trial variability differs between neural nets and neurons, the classification scores are difficult to compare. One can compare the trends but not the raw classification scores between CNN and neurons without equating these variables.

      We appreciate the reviewer’s follow-up comment and agree that differences in the number of features and trial-to-trial variability between IT neurons and CNN units make direct comparisons of raw classification scores challenging. As the reviewer suggests, it is more appropriate to focus on comparing trends rather than absolute scores when analyzing the similarities and differences between these systems. In light of this, we have revised the text to clarify that our intention was not to equate raw classification scores but to highlight the qualitative patterns and trends observed in spatial frequency encoding between IT and CNN units.

      “SF representation in the artificial neural networks

      We conducted a thorough analysis to compare our findings with CNNs. To assess the SF coding capabilities and trends of CNNs, we utilized popular architectures, including ResNet18, ResNet34, VGG11, VGG16, InceptionV3, EfficientNetb0, CORNet-S, CORNet-RT, and CORNet-Z, with both ImageNet-pre-trained and randomly initialized weights. Employing feature maps from the last four layers of each CNN, we trained an LDA model to classify the SF content of input images. Figure 5(a) shows the SF decoding accuracy of the CNNs on our dataset (SF decoding accuracy with random (R) and pre-trained (P) weights, ResNet18: P=0.96±0.01 / R=0.94±0.01, ResNet34: P=0.95±0.01 / R=0.86±0.01, VGG11: P=0.94±0.01 / R=0.93±0.01, VGG16: P=0.92±0.02 / R=0.90±0.02, InceptionV3: P=0.89±0.01 / R=0.67±0.03, EfficientNetb0: P=0.94±0.01 / R=0.30±0.01, CORNet-S: P=0.77±0.02 / R=0.36±0.02, CORNet-RT: P=0.31±0.02 / R=0.33±0.02, and CORNet-Z: P=0.94±0.01 / R=0.97±0.01). Except for CORNet-Z, object recognition training increases the network's capacity for SF coding, with an improvement as large as 64% in EfficientNetb0. Furthermore, except for the CORNet family, LSF content exhibits higher recall values than HSF content, as observed in the IT cortex (p-value with random (R) and pre-trained (P) weights, ResNet18: P=0.39 / R=0.06, ResNet34: P=0.01 / R=0.01, VGG11: P=0.13 / R=0.07, VGG16: P=0.03 / R=0.05, InceptionV3: P<0.001 / R=0.05, EfficientNetb0: P=0.07 / R=0.01). The recall values of CORNet-Z and ResNet18 are illustrated in Figure 5(b). However, while the CNNs exhibited some similarities in SF representation with the IT cortex, they did not replicate the SF-based profiles that predict neuron category selectivity. As depicted in Figure 5(c), although neurons formed similar profiles, these profiles were not associated with the category decoding performances of the neurons sharing the same profile.”
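
The effect of pre-training on SF decoding can be read off the listed accuracies by subtracting the random-weight value from the pre-trained one. The sketch below uses only the mean accuracies quoted above and ignores the reported ± terms; the variable names are illustrative.

```python
# SF decoding accuracy as (pre-trained, random), copied from the quoted passage.
accuracy = {
    "ResNet18": (0.96, 0.94),
    "ResNet34": (0.95, 0.86),
    "VGG11": (0.94, 0.93),
    "VGG16": (0.92, 0.90),
    "InceptionV3": (0.89, 0.67),
    "EfficientNetb0": (0.94, 0.30),
    "CORNet-S": (0.77, 0.36),
    "CORNet-RT": (0.31, 0.33),
    "CORNet-Z": (0.94, 0.97),
}

# Gain from object-recognition training, in accuracy units.
gain = {name: round(p - r, 2) for name, (p, r) in accuracy.items()}
best = max(gain, key=gain.get)

for name, value in gain.items():
    print(f"{name}: {value:+.2f}")
print("largest gain:", best, gain[best])
```

With these numbers the largest gain is for EfficientNetb0 (0.64) and the gain for CORNet-Z is negative; the CORNet-RT difference (−0.02) lies within the reported ±0.02.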

      Discussion:

      “Finally, we compared SF's representation trends and findings within the IT cortex and the current state-of-the-art networks in deep neural networks.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The mean baseline firing rate of their neurons (23.4 Hz) was rather high for single IT neurons (typically around 10 spikes/s or lower). Were these well-isolated units or mainly multiunit activity?

      We confirm that the recordings in our study were from both well-isolated single units and multi-unit activity (the activity remaining after single-neuron isolation), sorted using our spike-sorting toolbox. The higher baseline firing rate is likely due to the experimental design, particularly the inclusion of the responsive neurons from the selectivity phase. We added the following statement to the methods section.

      “In our analysis, we utilized both well-isolated single units and multi-unit activities (which represent neural activities that could not be further sorted into single units), ensuring a comprehensive representation of neural responses across the recorded population.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful comments.

      Based on their suggestions we will:

      (1) Use more accurate language to describe the hypothalamus regions under investigation in this study. While we aimed to primarily investigate the medial preoptic area (MPOA), our dissections and sequencing data in fact capture several regions of the anterior hypothalamus, including the anteroventral periventricular (AVPV), paraventricular (PVN), supraoptic (SON), and suprachiasmatic (SCN) nuclei, and more. We will revise the language in our manuscript to reflect that our study investigates the cellular evolution of the anterior hypothalamus across behaviorally divergent deer mice.

      (2) Revise our language to clarify that while our study provides a rich dataset for generating hypotheses about which cell types may contribute to behavioral differences, it does not provide any evidence of causal relationships. We hope to investigate this further in future work.

      (3) Clarify specific methodological choices for which reviewers had questions, especially about the hypothalamic regions for which we did histology to validate cell abundance differences and methodological choices related to mapping our cell clusters to Mus cell types.

      Our responses to each reviewer’s specific comments are below.

      Reviewer #1:

      The major limitation of the study is the absence of causal experiments linking the observed changes in MPOA cell types to species-specific social behaviors. While the study provides valuable correlational data, it lacks functional experiments that would demonstrate a direct relationship between the neuronal differences and behavior. For instance, manipulating these cell types or gene expressions in vivo and observing their effects on behavior would have strengthened the conclusions, although I certainly appreciate the difficulty in this, especially in non-musculus mice. Without such experiments, the study remains speculative about how these neuronal differences contribute to the evolution of social behaviors.

      Yes, we agree the study lacks functional experiments. We hope that the dataset is of value for generating hypotheses about how hypothalamic neuronal cell types may govern species-specific social behaviors, and for these hypotheses to be functionally tested by us and others in future work.

      Reviewer #2:

      Some methodology could be further explained, like the decision of a 15% cutoff value for cell type assignment per cluster, or the necessity of a multi-step analysis pipeline for gene enrichment studies.

      A 15% cutoff value for cell type assignment was chosen to include all known homology correspondences between our dataset and the Mus atlas. For example, i14:Avp/Cck cells from the Mus atlas represent Avp cells from the suprachiasmatic nuclei (SCN). Though only 17.3% of cluster 15 maps to i14:Avp/Cck, we know these two clusters correspond based on the expression of Avp and additional SCN marker genes in cluster 15 (Supp Fig 6). We will further explain this cutoff in the revised manuscript.
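
The cutoff rule can be illustrated with a small sketch. It assumes the rule is "keep every Mus cell type to which at least 15% of a cluster maps"; whether the threshold is inclusive, and all names and values other than the 17.3% figure, are assumptions for illustration rather than the authors' implementation.

```python
CUTOFF = 0.15  # minimum fraction of a cluster mapping to a Mus cell type


def assign_cell_types(mapping_fractions, cutoff=CUTOFF):
    """Return, per cluster, the Mus cell types whose mapped fraction meets the cutoff."""
    return {
        cluster: sorted(t for t, frac in fractions.items() if frac >= cutoff)
        for cluster, fractions in mapping_fractions.items()
    }


# Only the 17.3% value comes from the response; the other entries are made up.
mapping_fractions = {
    "cluster_15": {"i14:Avp/Cck": 0.173, "hypothetical_type_A": 0.08},
    "cluster_X": {"hypothetical_type_B": 0.12},
}

assigned = assign_cell_types(mapping_fractions)
print(assigned)
```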

      Our gene enrichment study includes a multi-step analysis pipeline because we wanted to control for confounders that may be introduced because of gene expression level. Genes that are more highly expressed are more accurately quantified and thus more likely to be identified as differentially expressed. Therefore, we wanted to test for gene enrichments in our set of DE genes against a background of genes with similar expression levels. We will clarify this motivation in the revised manuscript.
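
One common way to build a background of genes with similar expression levels is to bin genes by expression and sample non-DE genes bin by bin. The sketch below shows that idea only; it is not the authors' pipeline, and the gene names, expression values, bin edges, and function names are all invented for illustration.

```python
import random
from collections import Counter


def expression_bin(expr, edges):
    """Index of the expression bin that `expr` falls into."""
    return sum(expr >= e for e in edges)


def matched_background(expression, de_genes, edges, seed=0):
    """Sample non-DE genes so each expression bin holds as many genes as the DE set."""
    rng = random.Random(seed)
    de_set = set(de_genes)
    needed = Counter(expression_bin(expression[g], edges) for g in de_genes)

    # Group candidate (non-DE) genes by expression bin.
    pool = {}
    for gene, expr in sorted(expression.items()):
        if gene not in de_set:
            pool.setdefault(expression_bin(expr, edges), []).append(gene)

    background = []
    for b, n in sorted(needed.items()):
        candidates = pool.get(b, [])
        background.extend(rng.sample(candidates, min(n, len(candidates))))
    return background


# Toy data: gene names and expression values are invented.
expression = {f"gene{i}": float(i) for i in range(40)}
de_genes = ["gene5", "gene25", "gene35"]
edges = [10.0, 20.0, 30.0]  # hypothetical bin boundaries

background = matched_background(expression, de_genes, edges)
print(background)
```

Enrichment of a gene category among the DE genes would then be tested against this matched background rather than against all expressed genes.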

      The authors should exercise strong caution in making inferences about these differences being the basis of parental behavior. It is possible, given connections to relevant research, but without direct intervention, direct claims should be avoided. There should be clear distinctions of what to conclude and what to propose as possibilities for future research.

      Yes, we agree that we are unable to make direct claims about neuronal differences being the basis of parental behavior. We will revise our language to be clearer about which relationships we are hypothesizing and what we propose as possibilities for future research.

      Histology is not performed on all regions included in the sequencing analysis.

      We apologize that our language describing the hypothalamic regions included in the sequencing analysis and those included in the histology is unclear. We aimed to dissect the medial preoptic region for the sequencing analysis, but additionally captured parts of the anterior hypothalamus, including the paraventricular (PVN), supraoptic (SON), and suprachiasmatic (SCN) nuclei, and more. Our histology was performed across the entire hypothalamus and covers all regions included in the sequencing data. We will revise the manuscript to more accurately describe the hypothalamic regions that we investigated.

      Reviewer #3:

      My primary concern is that the dataset is limited: 52,121 neuronal nuclei across 24 samples, which does not provide many cells per cluster to analyze comparatively across sex and species, particularly given the heterogeneity of the region dissected. The Supplementary table reports lower UMIs/genes per cell than is typically seen as well. Perhaps additional information could be obtained from the data by not restricting the analyses to cells that can be assigned to Mus types. A direct comparison of the two Peromyscus species could be valuable as would a more complete Peromyscus POA atlas.

      Our dataset reports ~1,500 genes and ~1,000 UMIs per nucleus, which is indeed lower than is typically reported in other single-nucleus datasets. Some of this discrepancy is due to a lower quality genome and annotated transcriptome available for Peromyscus compared to Mus musculus, which results in a lower mapping rate than is typically reported in Mus studies. However, our dataset was sufficient to identify known peptidergic cell types (Supp Fig 6) and to map homology to Mus cell types for 34 (64%) of our 53 clusters. Additionally, although some of our clusters contain small numbers of cells, our differential abundance analysis accounts for the variance in cell numbers observed across samples and should be robust against any increase in variance due to small numbers. In fact, even differential abundance of very small cell clusters such as oxytocin neurons (cell type 40) was validated by histology.

      We would like to clarify that all analyses were performed on all cell clusters, regardless of whether or not they could be assigned homology to a Mus cell type. All the cell types that we identified as differentially abundant or as containing significant sex differences happened to be cell types for which homology to a Mus cell type could be defined. This may arise for a relatively uninteresting reason: cell types that have more distinct transcriptional signatures will be more accurately clustered, leading to more accurate identification of homology as well as more accurate measurements of differential abundance / expression. We will revise our language to make this clearer in the manuscript.

      In Supplement 7, it appears that most neurons can be assigned as excitatory or inhibitory, but then so many of these cells remain in the unassigned "gray blob" seen in panel 1E. Clustering of excitatory and inhibitory neurons separately, as in prior cited work in Mus POA (refs 31 and 57) may boost statistical power to detect sex and species differences in cell types. Perhaps the cells that cannot be assigned to Mus contain too few reads to be useful, in which case they should be filtered out in the QC. The technical challenges of a comparative single-cell approach are considerable, so it benefits the scientific community to provide transparency about them.

      We are not certain why we are unable to cluster and assign homology to many of our cells (i.e. cells in the unassigned “gray blob”). However, we note that even in the Mus atlas, many cells did not belong to obvious clusters by UMAP visualization and that several clusters lacked notable marker genes and were designated simply as “Gaba” and “Glut” clusters. Therefore, it is unsurprising that our own dataset also contains cells that lack the transcriptional signatures needed to be clustered and/or mapped to Mus cell types. We do know, however, that the median number of reads per nucleus is uniform across cell clusters and does not explain why some clusters could not be assigned to Mus. We will add this information to our revised manuscript.

      We do not think that a two-stage clustering (i.e. clustering first by excitatory vs. inhibitory neurons) is expected to gain power to resolve cell types in this case. Excitatory vs. inhibitory neurons are clearly separable on our UMAP (Supp Fig 7) so that information is already being used by our clustering procedure. However, we will explore this further in our revised manuscript to see if doing so will boost statistical power.

      The Calb1 dimorphism as observed by immunostaining, appears much more extensive in P. maniculatus compared to P. polionotus (Figures 3 E and F). This finding is not reflected in the counts of the i20:Gal/Moxd1 cluster. The use of Calb1 staining as a proxy for the Gal/Moxd1 cluster would be strengthened if the number of POA Calb1+ neurons that are found in each cluster was apparent. There may be additional Calb+ neurons in the cells that are not annotated to a Mus cluster. This clarification would add support to the overall conclusion that there is reduced sexual dimorphism in P. polionotus.

      From the Mus MPOA atlas (which includes both single-cell sequencing data and imaging-based spatial information), it is known that the i20:Gal/Moxd1 cluster comprises sexually dimorphic cells that make up both the BNST and the SDN-POA. These sexually dimorphic cells are well-studied and known to be marked by Calb1, which we used in immunostaining as a proxy for i20:Gal/Moxd1.

      However, we would like to clarify that in our study, the immunostaining of Calb1+ neurons and the sequencing counts of the i20:Gal/Moxd1 cluster are not completely reflective of each other because our sequencing dataset only captured the ventral portion of the BNST. Therefore, our i20:Gal/Moxd1 counts contain a combination of some Calb1+ BNST cells and likely all Calb1+ SDN-POA cells, and are difficult to interpret on their own. Our histology, however, covers the entire hypothalamus and is more reliable for identifying sex and species differences in each region. We will clarify this in the revised manuscript.

      The relationship between the sex steroid receptor expression and the sex bias in gene expression would be improved if the sex bias in sex steroid receptor expression was included in Supplementary Figure 10.

      We will include this in the revised manuscript.

      There is no explanation for the finding that there is a female bias in gene expression across all cell types in P. polionotus.

      We also find this observation interesting but don’t have a good explanation for why at this point. We plan to follow this up in future work.

    1. Author Response:

      We appreciate the reviewers' detailed feedback, which has highlighted several areas where our study could be strengthened. Although we acknowledge the relatively limited scope of our CRISPR-based gene-deletion screen, we successfully demonstrated the immunogenic role of Pccb in our syngeneic pancreatic cancer mouse model. Specifically, loss of PCCB in our mutant KRAS/p53 PIK3CA-null (αKO) cells blocked host T cell killing of tumor cells.

      Furthermore, blocking the PD1/PD-L1 interaction reverses this anti-tumor immunogenic effect. We agree with the reviewers regarding the limitations of our study, such as the sample size in our scTCR sequencing and the lack of direct cytotoxicity assays to confirm tumor-specific T cell clones. However, our results are consistent across multiple experimental approaches that strongly suggest meaningful differences in host T cell response to the three implanted tumor types, KPC, αKO and p-αKO. We agree that future mechanistic studies will be important to determine how PCCB is involved in this immunogenic response. We also agree with the reviewers that future additional studies with other KPC cell lines will strengthen our conclusion regarding PCCB. Finally, we acknowledge the inherent limitations of IHC techniques to assess the involvement of other T cell checkpoints that might also be involved in this anti-tumor immunogenic effect. In summary, despite these limitations, our findings provide novel insight into the role of PCCB in pancreatic tumor immunogenicity and contribute to the ongoing discussion of how to improve therapeutic strategies for this deadly cancer.

      Reviewer 1:

      Weaknesses:

      (1) Clonal expansion of cytotoxic T cells infiltrating the pancreatic αKO tumors

      a. Only two tumor-bearing hosts were evaluated by single-cell TCR sequencing, thus limiting conclusions that may be drawn regarding repertoire diversity and expansion.

      We agree with the reviewer that possible repertoire diversity and expansion could be observed by sequencing more tumor-bearing hosts. However, our current data reveal a marked consistency in the transcriptional expression within the two tumors analyzed per group. Importantly, these features are significantly divergent between the αKO and p-αKO groups. While recognizing the limited sample size, the observed within-group consistency and the clear distinction between groups strongly support the validity of the reported trends.

      b. High abundance clones in the TME do not necessarily have tumor specificity, nor are they necessarily clonally expanded. They may be clones which are tissue-resident or highly chemokine-responsive and accumulate in larger numbers independent of clonal expansion. Please consider softening language to clonal enrichment or refer to clone size as clonal abundance throughout the paper.

      We agree with the reviewer that it’s possible that the high abundance clones are not necessarily tumor specific. Our previous work (N. Sivaram 2019) demonstrated the critical role of increased pancreatic CD8+ T cells in αKO tumor regression within B6 mice. Therefore, antigen specific CD8+ T cell clonal expansion within the pancreas is an anticipated observation. However, as the reviewer pointed out, a portion of this expansion may be attributable to factors independent of tumor antigens. While the low T cell infiltration observed in KPC-implanted mice argues against a purely tissue-resident explanation, further investigation is required to definitively establish the tumor specificity of individual clones. We have revised the manuscript to reflect this nuance, replacing "clonal expansion" with "clonal enrichment".

      c. The whole story would be greatly strengthened by cytotoxicity assays of abundant TCR clones to show tumor antigen specificity.

      As mentioned above, we agree with the reviewer that future studies are needed to investigate each of the specific clones. Due to the extended timeframe required, it’s beyond the scope of the present study.

      (2) A genome-wide CRISPR gene-deletion screen to identify molecules contributing to Pik3ca-mediated pancreatic tumor immune evasion

      a. CRISPR mutagenesis yielded outgrowth of only 2/8 tumors. A more complete screen with an increased total number of tumors would yield much stronger gene candidates with better statistical power. It is unsurprising that candidates were observed in only one of the two tumors. Nevertheless, the authors moved forward successfully with Pccb.

      We agree that by including more mice in the CRISPR screen, it’s possible that we could have identified more candidates. Regardless, we have successfully demonstrated PCCB’s role in pancreatic tumorigenicity with our mouse model.

      (3) T cells infiltrate p-αKO tumors with increased expression of immune checkpoint

      a. In Figure 4D, cell counts are not normalized to total CD8+ T cell counts, making it difficult to directly compare aKO to p-aKO tumors. Based on quantifications from Figure 4D, I suspect normalization will strengthen the conclusion that CD8+ infiltrate is more exhausted in p-aKO tumors.

      Due to the use of distinct tumor sections for quantifying CD8+ cells and T cell checkpoint inhibitory receptor expression, direct normalization of these counts is challenging. However, we observed comparable CD8+ cell numbers between αKO and p-αKO tumors, with p-αKO tumors exhibiting nearly double the expression of immune checkpoint receptors. Therefore, even accounting for potential normalization discrepancies, we anticipate that p-αKO tumors would still demonstrate a significantly higher percentage of immune checkpoint receptor-positive cells compared to αKO tumors.

      b. Flow cytometric analysis to further characterize the myeloid compartment is incomplete (single replicate) and does not strengthen the argument that p-aKO TME is more immunosuppressive. It could, however, strengthen the argument that TIL has less anti-tumor potential if effector molecule expression in CD8+ infiltrating cells were quantified.

      We agree that including more tumor samples will strengthen the argument that p-αKO TME is more immunosuppressive. Future studies need to be done to characterize CD8+ T cells.

      (4) Inhibition of PD1/PD-L1 checkpoint leads to elimination of most p-αKO tumors

      a. It is reasonable to conclude that p-aKO tumors are responsive to immune checkpoint blockade. However, there is no data presented to support the statement that checkpoint blockade reactivates an existing anti-tumor CD8+ T cell response and does not induce a de novo response.

      We agree that future studies exploring the clonotypes of T cells infiltrating tumors in PD-1-treated mice are necessary to determine whether the observed T cell response represents reactivation of existing clones, a de novo response, or a combination of both.

      b. The discussion of these data implies that anti-PD-1 would not improve aKO tumor control, but these data are not included. As such, it is difficult to compare the therapeutic response in aKO versus p-aKO. Further, these data are at best an indirect comparison of the T cell responsiveness against tumor, as the only direct comparison is infiltrating cell count in Figure 4 and there are no public TCR clones with confirmed anti-tumor specificity to follow in the aKO versus p-aKO response.

      Since αKO tumors completely regress with 100% animal survival, we deemed anti-PD1 treatment in this group unnecessary. While we did assess anti-PD1 treatment in KPC-implanted mice, no survival benefit was observed (data not shown). The p-αKO tumor model was the only one in which anti-PD1 treatment improved survival. The complexity of the in vivo tumor microenvironment likely contributes to the lack of shared TCR clones between αKO and p-αKO tumors, even within the same tumor group. Future studies aimed at identifying tumor-specific clones may involve transferring in vivo models to in vitro assays or the generation of novel mouse strains expressing identified TCRs. However, these approaches require substantial time and resources and are beyond the scope of the present study.

      Reviewer 2:

      Weaknesses:

      (1) A major issue is that it seems these data are based on the use of a single tumor cell clone with PIK3CA deleted. Therefore, there could be other changes in this clone in addition to the deletion of PIK3CA that could contribute to the phenotype.

      We have previously tested a different KPC cell line (DT10022) with genetically downregulated PIK3CA and found that mice implanted with these αKO cells also showed tumor regression. However, we have not tested whether deletion of Pccb in the DT10022-αKO cell line will have the same effect.

      (2) The conclusion that the change in the PCCB-deficient tumor cell line is unrelated to mitochondrial metabolic changes may be incorrect based on the data provided. While it is true that in the experiments performed, there was no statistically significant change in the oxygen consumption rate or metabolite levels, this could be due to experimental error. There is a trend in the OCR being higher in the PCCB-deficient cells, although due to a high standard deviation, the change is not statistically significant. There is also a trend for there being more aKG in this cell line, but because there were only 3 samples per cell line, there is no statistically significant difference.

      Although PCCB is known to cause metabolic changes, in the context of this study, we are comparing PCCB-deficient to PCCB & PIK3CA double-deficient cells. We did not address if PCCB loss alone would cause metabolic alteration. We suspect that is the case.

      (3) More data are required to support the authors' conclusion that there are myeloid changes in the PCCB-deficient tumor cells. Flow data are shown from only one tumor of each type.

      We agree that including more tumor samples will strengthen the argument that p-αKO TME is more immunosuppressive.

      (4) The previous published study demonstrated increased MHC and CD80 expression in the PIK3CA-deficient tumors and these differences were suggested to be the reason the tumors were rejected. However, no data concerning the levels of these proteins were provided in the current manuscript.

      Our previous hypothesis for altered MHC and CD80 levels is based on the observation that there is a dramatic increase in the number of infiltrating T cells upon Pik3ca deletion. In this study, similar levels of infiltrating T cells were observed when Pccb was deleted in αKO cells; therefore, we do not expect any changes in MHC and CD80 levels, since these tumors appear to still be recognized by the T cells. Indeed, we are able to detect clonal enrichment in p-αKO tumors.

      Reviewer 3:

      Weaknesses:

      The IHC technique that was used to stain and characterize the exhaustion status of the tumor-infiltrating T cells.

      We agree with the reviewer that incorporating multi-color IHC or flow cytometry to characterize the exhaustion status of specific T cell subtypes would provide more comprehensive information. Unfortunately, we do not have the resources to perform these studies currently.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Guo and Uusisaari describes a series of experiments that employ a novel approach to address long-standing questions on the inferior olive in general and the role of the nucleoolivary projection specifically. For the first time, they optimized the ventral approach to the inferior olive to facilitate imaging in this area that is notoriously difficult to reach. Using this approach, they are able to compare activity in two olivary regions, the PO and DAO, during different types of stimulation. They demonstrate the difference between the two regions, linked to Aldoc-identities of downstream Purkinje cells, and that there is co-activation resulting in larger events when they are clustered. Periocular stimulation also drives larger events, related to co-activation. Using optogenetic stimulation they activate the nucleoolivary (N-O) tract and observe a wide range of responses, from excitation to inhibition. Zooming in on inhibition they test the assumption that N-O activation can be responsible for suppression of sensory-evoked events. Instead, they suggest that the N-O input can function to suppress background activity while preserving the sensory-driven responses.

      Strengths:

      This is an important study, tackling the long-standing issue of the impossibility to do imaging in the inferior olive and using that novel method to address the most relevant questions. The experiments are technically very challenging, the results are presented clearly and the analysis is quite rigorous. There is quite a lot of room for interpretation, see weaknesses, but the authors make an effort to cover many options.

      Weaknesses:

      The heavy anesthesia that is required during the experiment could severely impact the findings. Because of the anesthesia, the firing rate of IO neurons is found to be 0.1 Hz, significantly lower than the 1 Hz found in non-anesthetized mice. This is mentioned and discussed, but the potential consequences cannot be overstated and should be addressed more. Although the methods and results are described in sufficient detail, there are a few points that, when addressed, would improve the manuscript.

      We sincerely thank the reviewer for their encouraging comments and recognition of our study’s significance. We fully acknowledge the confounding effects of the deep anesthesia used in our experiments, which was necessary to ensure the animals’ welfare while establishing this technically demanding methodology. We elaborate on these effects below and will further clarify them in the revised manuscript.

      Ultimately, the full resolution of this issue will require recordings in awake animals, as we consider our approach an advancement from acute slice preparations but not yet a complete representation of in vivo IO function. However, key findings from our study—such as amplitude modulation with co-activation and the potential role of IO refractoriness in complex spike generation—could be further explored in existing cerebellar cortical recordings from awake, behaving animals. We hope our work will motivate re-examination of such datasets to assess whether these mechanisms contribute to overall cerebellar function.

      Reviewer #1 (Recommendations for the authors):

      On page 10 the authors indicate that 2084 events were included for DAO and 1176 for PO. Is that the total number of events? What was the average and the range per neuron and the average recording duration?

      Thank you for pointing out the lack of clarity. The sentence should say "in total, 2084 and 1176 detected events from DAO and PO were included in the study". We will add the averages and ranges of events detected per neuron in different categories, as well as the durations of the recordings (ranging from 120 s to 270 s), to the tables.

      On page 10 it is also stated that: "events in PO reached larger values than those in DAO even though the average values did not differ". Please clarify that statement. Which parameter + p-value in the table indicates this difference?

      Apologies for the omission. Currently, the observation is only visible as the longer right-hand tail of the PO data in Figure 2B2. We will add the range of values (3.0-75.2 vs 3.1-39.6 for PO and DAO amplitudes, respectively) to the text and the tables in the revision.

      Abbreviating airpuff to AP is confusing, I would suggest not abbreviating it.

      Understood. We will change AP to airpuff in the text. In figure labels, at least in some panels, the abbreviation will be necessary due to space constraints.

      What type of pulse was used to drive ChrimsonR? Could it be that the pulse caused a rebound-like phenomenon with the pulse duration that drove the excitation?

      As described on line 229 and in the Methods, we used 5-second trains of 5-ms LED light pulses. Importantly, these stimulation parameters were informed by our extensive in vitro examination of various stimulation patterns (Lefler et al., 2014), which consistently produced stable postsynaptic responses without inducing depolarization or rebound effects. Additionally, Loyola et al. (2024) reported no evidence of rebound activity in IO cells following optogenetic activation of N-O axons in the absence of direct neuronal depolarization. We will incorporate these considerations into the discussion, while also acknowledging that unequivocal confirmation of “direct” rebound excitation would require intracellular recordings, such as patch clamp experiments.

      The authors indicate that the excitatory activity was indistinguishable in shape from other calcium activity, but can anything be said about the timing (the scale bar in Figure 4A2 has no value, is it the same 2s pulse)?

      Apologies for the oversight in labeling the scale bar in Figure 4A2 (it is 2 s). While we deliberately refrain from making strong claims regarding the origin of the N-O-evoked spikes, their timing can be examined in more detail in Figure 4 - Supplement 1, panels C and D. We will make sure this is clearly stated in the revised text.

      Did the authors check for accidental sparse transfection with ChrimsonR of olivary neurons in the post-mortem analysis?

      Good point! However, we have never seen this AAV9-based viral construct drive trans-synaptic expression in the IO, nor is this version of AAV known to have the capacity for trans-synaptic expression in general.

      No sign of retrograde labeling (via the CF collaterals in the cerebellar nuclei) was seen either. Notably, the hSyn promoter used to drive ChrimsonR expression is extremely ineffective in the IO. Thus, we doubt that such accidental labeling could underlie the excitatory events seen upon N-O stimulation. We will add these mentions with relevant references to the discussion of the revised manuscript.

      On page 18 the authors state that: "The lower SS rate was attributed to intrinsic factors of PNs, while the reduced frequency of CSs was speculated to result from increased inhibition of the IO via the nucleo-olivary (N-O) pathway targeting the same microzone." I think I understand what you mean to say, but this is a bit confusing.

      Agreed. We will rephrase this sentence to clarify that a lower SS rate in a given microzone may lead to increased activation of inhibitory N-O axons that target the region of IO that sends CF to the same microzone.

      Is airpuff stimulation not more likely to activate PO than DAO because of the related modalities (more face vs. more trunk/limbs?), and thereby also more likely to drive event co-activation (as it is stated in the abstract)?

      We agree that the specific innervation patterns of different IO regions likely explain the discrepancy between previous reports of airpuff-evoked complex spikes in cerebellar cortical regions targeted by DAO and the absence of airpuff responses in the particular region of DAO accessible via our surgical approach. As in the present dataset virtually no airpuff-evoked events were seen in DAO regions, we are unable to directly compare airpuff-evoked event co-activation between PO and DAO. The higher co-activation for PO was observed for "spontaneous" activity.

      The Discussion addresses the question of why N-O pathway activation does not remove the airpuff response.

      Given the potentially profound effect, I would propose to expand the discussion on the role of anesthesia, including longer refractory periods but also potential disruption of normal network interactions (even though individually the stimulations work). Briefly indicating what is known about alpha-chloralose would help interpret the results as well.

      We fully agree that the anesthetic state introduces confounding factors that must be considered when interpreting our results. We will expand the discussion to address how anesthesia, particularly alpha-chloralose, as well as tissue cooling, may contribute to prolonged refractory periods and potential disruptions in normal network interactions. However, we recognize that certain aspects cannot be fully resolved without recordings in awake animals. For this reason, we characterize our preparation as an "upgraded" in vitro approach rather than a fully representative in vivo model.

      Please clearly indicate that the age range of P35-45 is for the moment of virus injection and specify the age range for the imaging experiment.

      Apologies for the oversight. We will indicate these age ranges in the results (as they are currently only specified in Methods). The P35-45 range refers to the moment of virus injection.

      The methods indicate that a low-pass filter of 1 Hz was used. I am sure this helps with smoothing, but does it not remove a lot of potentially interesting information? How would a higher low-pass filter affect the analysis and results?

      We acknowledge that applying a 1 Hz low-pass filter inevitably removes high-frequency components, including potential IO oscillations and fine details such as spike "doublets." However, given the temporal resolution constraints of our recording approach, we prioritized capturing robust, interpretable events over attempting to extract finer features that might be obscured by both the indicator kinetics and imaging speed.

      While a higher cut-off frequency could, in principle, allow more precise measurement of rise times and peak timings, it would also amplify high-frequency noise, complicating automated event detection and reducing confidence in distinguishing genuine neural signals from artifacts. Given these trade-offs, we opted for a conservative filtering approach to ensure stable event detection. Future work, particularly with faster imaging rates and improved sensors (e.g., GCaMP8s), will explore the finer temporal structure of IO activity. We will deliberate on these matters more extensively in the revised discussion.
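      The filtering trade-off discussed in this response can be illustrated with a minimal sketch. It assumes a Gaussian smoothing kernel as a stand-in low-pass filter (the filter type used in the study is not stated here), idealized impulse events without indicator kinetics or noise, and a 20 fps sampling rate. The function names `gaussian_lowpass` and `count_peaks` are hypothetical and serve illustration only.

```python
import math


def gaussian_lowpass(signal, cutoff_hz, fs_hz):
    """Smooth `signal` with a Gaussian kernel whose -3 dB point is `cutoff_hz`.

    Assumption: a Gaussian kernel stands in for the (unspecified) low-pass filter.
    """
    # Kernel width in frames: sigma_t = sqrt(ln 2) / (2 * pi * f_c), times fs.
    sigma = math.sqrt(math.log(2)) / (2 * math.pi * cutoff_hz) * fs_hz
    half = max(1, int(math.ceil(4 * sigma)))
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]
    norm = sum(kernel)
    kernel = [w / norm for w in kernel]
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            i = n + j - half
            if 0 <= i < len(signal):
                acc += w * signal[i]
        out.append(acc)
    return out


def count_peaks(trace, min_height):
    """Count local maxima of `trace` that exceed `min_height`."""
    return sum(
        1
        for i in range(1, len(trace) - 1)
        if trace[i] > min_height and trace[i - 1] < trace[i] >= trace[i + 1]
    )


fs = 20.0                      # frames per second (assumed sampling rate)
raw = [0.0] * 60
raw[20] = 1.0                  # first event of a toy "doublet"
raw[22] = 1.0                  # second event, 100 ms (2 frames) later

for cutoff in (1.0, 8.0):
    n_peaks = count_peaks(gaussian_lowpass(raw, cutoff, fs), min_height=0.1)
    print(f"cut-off {cutoff} Hz: {n_peaks} detectable peak(s)")
```

      In this idealized case the 1 Hz cut-off merges two events 100 ms apart into a single peak, whereas the 8 Hz cut-off keeps them separate. Real traces would additionally be blurred by indicator kinetics and contaminated by noise, which is the cost of the higher cut-off noted above.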

      Reviewer #2 (Public review):

      The authors developed a strategy to image inferior olive somata via viral GCaMP6s expression, an implanted GRIN lens, and a one-photon head-mounted microscope, providing the first in vivo somatic recordings from these neurons. The main new findings relate to the activation of the nucleoolivary pathway, specifically that: this manipulation does not produce a spiking rebound in the IO; it exerts a larger effect on spontaneous IO spiking than stimulus (airpuff)-evoked spiking. In addition, several findings previously demonstrated in vivo in Purkinje cell complex spikes or inferior olivary axons are confirmed here in olivary somata: differences in event sizes from single cells versus co-activated cells; reduced coactivation when activating the NO pathway; more coactivation within a single zebrin compartment.

      The study presents some interesting findings, and for the most part, the analyses are appropriate. My two principal critiques are that the study does not acknowledge major technical limitations and their impact on the claims; and the study does not accurately represent prior work with respect to the current findings.

      We thank the reviewer for recognising the value of the findings in our "reduced" in vivo preparation, and apologize for omissions in the work that led to critique. We will elaborate on these matters below and prepare a revised manuscript.

      The authors use GCaMP6s, which has a tau1/2 of >1 s for a normal spike, and probably closer to 2 s (10.1038/nature12354) for the unique and long type of olivary spikes that give rise to axonal bursts (10.1016/j.neuron.2009.03.023). Indeed, the authors demonstrate as much (Fig. 2B1). This affects at least several claims:

      a. The authors report spontaneous spike rates of 0.1 Hz. They attribute this to anesthesia, yet other studies under anesthesia recording Purkinje complex spikes via either imaging or electrophysiology report spike rates as high as 1.5 Hz (10.1523/JNEUROSCI.2525-10.2011). This discrepancy is not acknowledged and a plausible explanation is not given. Citations are not provided that demonstrate such low anesthetized spike rates, nor are citations provided for the claim that spike rates drop increasingly with increasing levels of anesthesia when compared to awake resting conditions.

      We fully acknowledge that anesthesia is a major confounding factor in our study. Given the unusually invasive nature of our surgical preparation, we prioritized deep anesthesia to ensure the animals’ welfare. This, along with potential cooling effects from tissue removal and GRIN lens contact, likely contributed to the observed suppression of IO activity.

      We recognize that reported complex spike rates under anesthesia vary considerably across studies, and we will expand our discussion to provide a more comprehensive comparison with prior literature. Notably, different anesthetic protocols, levels of anesthesia, and recording methodologies can lead to widely different estimates of firing rates. While we cannot resolve this issue without recordings in awake animals, we will clarify that our observed rates likely reflect both the effects of anesthesia and specific methodological constraints. We will also incorporate additional references to studies examining cerebellar activity under different anesthetic conditions.

      More likely, this discrepancy reflects spikes that are missed due to a combination of the indicator kinetics and low imaging sensitivity (see (2)), neither of which are presented as possible plausible alternative explanations.

      We acknowledge that the combination of slow indicator kinetics and limited optical power in our miniature microscope setup constrains the temporal resolution of our recordings. However, we are confident that we can reliably detect events occurring at intervals of 1 second or longer. This confidence is based on data from another preparation using the same viral vector and optical system, where we observed spike rates an order of magnitude higher.

      That said, we do not make claims regarding the presence or absence of somatic events occurring at very short intervals (e.g., 100-ms "doublets," as described by Titley et al., 2019), as these would likely fall below our temporal resolution. We will clarify this limitation in the revised manuscript to ensure that the constraints of our approach are fully acknowledged.

      While GCaMP6s is not as sensitive as more recent variants (Zhang et al., 2023, PMID 36922596), our previous work (Dorgans et al., 2022) demonstrated that its dynamic range and sensitivity are sufficient to detect both spikes and subthreshold activity in vitro. Although the experimental conditions differ in the current miniscope experiments, we took measures to optimize signal quality, including excluding recordings with a low signal-to-noise ratio (see Methods). This need for high signal fidelity also informed our decision to limit the sampling rate to 20 fps. In future work, we plan to adopt newer GCaMP variants that were not available at the start of this project, which should further improve sensitivity and temporal resolution.

      Many claims are made throughout about co-activation ("clustering"), but with the GCaMP6s rise time to peak (0.5 s), there is little technical possibility to resolve co-activation. This limitation is not acknowledged as a caveat and the implications for the claims are not engaged with in the text.

      As noted in the manuscript (L492-), "interpreting fluorescence signals relative to underlying voltage changes is challenging, particularly in IO neurons with unusual calcium dynamics." We acknowledge that the slow rise time of GCaMP6s (~0.5 s) limits our ability to precisely resolve the timing of co-activation at very short intervals. However, given the relatively slow timescales of IO event clustering and the inherent synchrony in olivary network dynamics, we believe that the observed co-activation patterns remain meaningful, even if finer temporal details cannot be fully resolved.

      To ensure clarity, we will expand this section to explicitly acknowledge the temporal resolution limitations of our approach and discuss their implications for interpreting co-activation. While the precise timing of individual spikes within a cluster may not be resolvable, the observed increase in event magnitude with coarse co-activation suggests that clustering effects remain functionally relevant even when exact spike synchrony is not detectable at millisecond resolution.

      This finding is consistent with the idea that co-activation enhances calcium influx, leading to larger amplitude events — a relationship that does not require perfect temporal resolution to be observed. The fact that this effect persists across a broad range of clustering windows (as shown in Figure 2 Supplement 2) further supports its robustness. While we cannot make strong claims about precise spike timing within these clusters nor about the mechanism underlying enhanced calcium signal, our results demonstrate that co-activation may influence IO activity in a quantifiable way. We will clarify these points in the revised manuscript to ensure that our findings are appropriately framed given the temporal constraints of our imaging approach.
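      The relationship between co-activation and event amplitude across clustering windows can be sketched as follows. This is a toy illustration, not the analysis pipeline used in the study: the event times and amplitudes are invented, and the function names `coactivation_counts` and `mean_amplitude_by_count` are hypothetical.

```python
def coactivation_counts(events_by_cell, window_s):
    """For each event, count how many *other* cells have an event within `window_s`."""
    counts = {}
    for cell, times in events_by_cell.items():
        for t in times:
            counts[(cell, t)] = sum(
                1
                for other, other_times in events_by_cell.items()
                if other != cell and any(abs(t - u) <= window_s for u in other_times)
            )
    return counts


def mean_amplitude_by_count(counts, amplitudes):
    """Average event amplitude, grouped by the number of co-active cells."""
    groups = {}
    for key, n in counts.items():
        groups.setdefault(n, []).append(amplitudes[key])
    return {n: sum(vals) / len(vals) for n, vals in groups.items()}


# Toy data (invented): event times in seconds for three cells.
events = {"a": [1.0, 10.0], "b": [1.2, 20.0], "c": [1.4, 10.3]}
# Toy amplitudes (invented), keyed by (cell, event time).
amps = {
    ("a", 1.0): 30.0, ("b", 1.2): 28.0, ("c", 1.4): 32.0,
    ("a", 10.0): 18.0, ("c", 10.3): 22.0, ("b", 20.0): 10.0,
}

# Repeating the grouping over several window widths shows whether the
# amplitude trend depends on the exact choice of clustering window.
for window in (0.25, 0.5, 1.0):
    counts = coactivation_counts(events, window)
    print(window, mean_amplitude_by_count(counts, amps))
```

      A trend that holds across window widths does not require the individual events to be resolved at millisecond precision, which is the argument made in the response above.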

      The study reports an ultralong "refractory period" (L422-etc) in the IO, but this again must be tempered by the possibility that spikes are simply being missed due to very slow indicator kinetics and limited sensitivity. Indeed, the headline numeric estimate of 1.5 s (L445) is suspiciously close to the underlying indicator kinetic limitation of 1-2 s.

      Our findings suggest a potential refractory period limiting the frequency of events in the inferior olive under our recording conditions. This interpretation is supported by the observed inter-event interval distribution, the inability of N-O stimulation to suppress airpuff-evoked events, and lower bounds reported in earlier literature on complex spike intervals recorded in awake animals under various behavioral contexts. Taking into account the likely cooling of tissue, a refractory period of 1.5 s is not unreasonable. Of course, we recognize that the slow decay kinetics of GCaMP6s may cause overlapping fluorescence signals, potentially obscuring closely spaced events. This is in line with data presented in the Chen et al. (2013) manuscript describing GCaMP6s (PMID: 36922596; Figure 3b showing events detected with intervals less than 500 ms).

      The consideration of refractoriness only arose late in the project while we were investigating the explanations for lack of inhibition of airpuff-evoked spikes. Future experiments, particularly in awake animals, will be instrumental in validating this interpretation. To ensure that the refractory period is understood as one possible mechanism rather than a definitive explanation, we will rephrase the discussion to clarify that while our data are compatible with a refractory period, they do not establish it conclusively.
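      The inter-event interval argument can be made concrete with a short sketch. It assumes event times have already been detected per cell; the example values, the 1 s detection limit, and the function names `inter_event_intervals` and `fraction_below` are hypothetical.

```python
def inter_event_intervals(event_times_s):
    """Return the intervals between consecutive events of one cell, in seconds."""
    t = sorted(event_times_s)
    return [later - earlier for earlier, later in zip(t, t[1:])]


def fraction_below(intervals, limit_s):
    """Fraction of intervals shorter than `limit_s` (0.0 if there are no intervals)."""
    if not intervals:
        return 0.0
    return sum(1 for d in intervals if d < limit_s) / len(intervals)


# Toy event times (invented), in seconds, for two cells.
events = {"a": [0.0, 2.0, 5.5], "b": [1.0, 2.5]}
pooled = [d for times in events.values() for d in inter_event_intervals(times)]

detection_limit_s = 1.0  # assumed shortest interval the imaging can resolve
print("shortest interval (s):", min(pooled))
print("fraction below detection limit:", fraction_below(pooled, detection_limit_s))
```

      If the shortest observed interval sits close to the assumed detection limit, an apparent interval floor cannot by itself separate physiological refractoriness from missed events, which is why the refractory period is framed here as one possible mechanism.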

      The study uses endoscopic one-photon miniaturized microscope imaging. Realistically, this is expected to permit an axial point spread function (z-PSF) on the order of 40 µm, which must substantially reduce resolution and sensitivity. This means that if there *is* local coactivation, the data in this study will very likely have individual ROIs that integrate signals from multiple neighboring cells. The study reports relationships between event magnitude and clustering, etc; but a fluorescence signal that contains photons contributed by multiple neighboring neurons will be larger than a single neuron, regardless of the underlying physiology - the text does not acknowledge this possibility or limitation.

      We acknowledge that the use of one-photon endoscopic imaging imposes limitations on axial resolution, potentially leading to signal contributions from neighboring neurons. To mitigate this, we applied CNMFe processing, which allows for the deconvolution of overlapping signals and the differentiation of multiple neuronal sources within shared pixels. However, as the reviewer points out, if two neurons are perfectly overlapping in space, they may be treated as a single unit.

      To clarify this limitation, we will expand the discussion to explicitly acknowledge the impact of one-photon imaging on signal separation and to emphasize that, while CNMFe helps resolve some overlaps, perfect separation is not always possible. As already noted in the manuscript (L495-), "the absence of optical sectioning in the whole-field imaging method can lead to confounding artifacts in densely labeled structures such as the IO’s tortuous neuropil." We will further elaborate on how this factor was considered in our analysis and interpretation.

      Second, the text makes several claims for the first multicellular in vivo olivary recordings. (L11; L324, etc).

      I am aware of at least two studies that have recorded populations of single olivary axons using two-photon Ca2+ imaging up to 6 years ago (10.1016/j.neuron.2019.03.010; 10.7554/eLife.61593). This technique is not acknowledged or discussed, and one of these studies is not cited. No argument is presented for why axonal imaging should not "count" as multicellular in vivo olivary recording: axonal Ca2+ reflects somatic spiking.

      We appreciate the reviewer’s point and acknowledge the important prior work using two-photon imaging to record olivary axonal activity in the cerebellar cortex. However, while axonal calcium signals do reflect somatic spiking, these recordings inherently lack information about the local network interactions within the inferior olive itself.

      A key motivation for our study was to observe neuronal activity within the IO at the level of its gap-junction-coupled local circuits, rather than at the level of its divergent axonal outputs. The fan-like spread of climbing fibers across rostrocaudal microzones in the cerebellar cortex makes them relatively easy to record in vivo, but it also means that individual imaging fields contain axons from neurons that may be distributed across different IO microdomains. As a result, while previous work has provided valuable insight into olivary output patterns, it has not allowed for the examination of coordinated somatic activity within localized IO neuron clusters.

      With apologies, we recognize that this distinction was not sufficiently emphasized in our introduction. We will clarify this key point and ensure that the important climbing fiber imaging studies are properly cited and contextualized in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      The authors state: "we found no reports that examined coactivation levels between Z+ and Z- microzones in cerebellar complex spike recordings" (L359). Multiple papers (that are not cited) using AldolaseC-tdTomato mice with two-photon Purkinje dendritic calcium imaging showed synchronization (at similar levels) within but not across Z+/Z- bands. (2015 10.1523/JNEUROSCI.2170-14.2015, 2023 https://doi.org/10.7554/eLife.86340).

      We apologize for the misleading phrasing. We will rephrase this statement to: "While complex spike coactivation within individual zebrin zones has been extensively studied (references), we found no reports directly comparing the levels of intra-zone co-activation between Z+ and Z- microzones."

      Additionally, we will ensure that the relevant studies demonstrating synchronization within zebrin zones, as well as (lack of) interactions between neighboring zones, are properly cited and discussed in the revised manuscript.

      The figures could use more proofreading, and several decisions should be reconsidered:

      Normalizing the amplitude to maximum is not a good strategy, as it can overemphasize noise or extremely small-magnitude signals, and should instead follow standard convention and present in fixed units (3A2, 4B2, and even 2C).

      As noted earlier, we have excluded recordings and cells with high noise or a low signal-to-noise ratio for event amplitudes, ensuring that such data do not influence the color-coded panels. Importantly, all quantitative analyses and traces presented in the manuscript are normalized to baseline noise level, not to maximal amplitude, ensuring that noise or low-magnitude signals do not skew the analysis.

      The decision to use max-amplitude normalization in color-coded panels was made specifically to aid visualization of temporal structure across recordings. This approach allows for clearer comparisons without the distraction of inter-cell variability in absolute signal strength. However, we recognize the potential for confusion and will revise the Results text to explicitly clarify that the color-coded visualizations use a different scaling method than the quantitative analyses.
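
      For illustration, the difference between the two scaling methods mentioned above can be sketched as follows. This is a minimal sketch on synthetic traces, not the analysis code used in the study; the function names, the event-free baseline window, and the event amplitudes are all assumptions made for the example.

```python
import numpy as np

def normalize_to_baseline_noise(trace, baseline_mask):
    """Express a trace in units of its own baseline noise (standard deviation)."""
    baseline = trace[baseline_mask]
    return (trace - baseline.mean()) / baseline.std()

def normalize_to_max(trace):
    """Scale a trace so that its largest value is 1 (useful for display only)."""
    return trace / trace.max()

rng = np.random.default_rng(0)
n = 1000

# Hypothetical event-free window used to estimate the baseline noise.
baseline_mask = np.zeros(n, dtype=bool)
baseline_mask[:200] = True

# Two synthetic cells with the same noise level but different event sizes.
event = np.zeros(n)
event[500:520] = 1.0
cell_large = rng.normal(0.0, 1.0, n) + 20.0 * event
cell_small = rng.normal(0.0, 1.0, n) + 5.0 * event

# Noise-normalized traces keep the difference in event size between cells.
z_large = normalize_to_baseline_noise(cell_large, baseline_mask)
z_small = normalize_to_baseline_noise(cell_small, baseline_mask)

# Max-normalized traces both peak at 1, which hides that difference but
# makes the timing of events easier to compare across cells.
m_large = normalize_to_max(cell_large)
m_small = normalize_to_max(cell_small)
```

      In this sketch the max-normalized traces are suited to comparing temporal structure across cells, while the noise-normalized traces retain the amplitude information needed for quantitative comparisons.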

      x axes with no units: Figures 2B2, 2E1, 3B2, 3C2, 5B2, 5C2, 5D2.

      No colorbar units: 5A3 (and should be shown in real not normalized units).

      No y axis units: 5D1.

      No x axis label or units: 5E1.

      5E3 says "stim/baseline" for the y-axis units and then the first-panel title says "absolute frequencies" meaning it’s *not* normalized and needs a separate (accurate) y-axis with units.

      Illegibly tiny fonts: 2E1, 3E1, etc.

      We will correct all of these in the revised manuscript. Thank you for the careful reading.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This study provides a thorough analysis of Nup107's role in Drosophila metamorphosis, demonstrating that its depletion leads to developmental arrest at the third larval instar stage due to disruptions in ecdysone biosynthesis and EcR signaling. Importantly, the authors establish a novel connection between Nup107 and Torso receptor expression, linking it to the hormonal cascade regulating pupariation.

      However, some contradictory results weaken the conclusions of the study. The authors claim that Nup107 is involved in the translocation of EcR from the cytoplasm to the nucleus. However, the evidence provided in the paper suggests it more likely regulates EcR expression positively, as EcR is undetectable in Nup107-depleted animals, even below background levels.

      We appreciate the concern raised in this public review. However, we must clarify that we do not claim that Nup107 regulates the translocation of EcR from the cytoplasm. We only posited this as a hypothesis, asking whether Nup107 regulates EcR nuclear translocation (9<sup>th</sup> line of 2<sup>nd</sup> paragraph on page 6). We have spelled this out more clearly in the title of the 3<sup>rd</sup> sub-section of the Results section, and in the discussion (8<sup>th</sup> line of 2<sup>nd</sup> paragraph on page 11). Overall, we have expressed surprise that Nup107 is not directly involved in the nuclear translocation of EcR.

      The ecdysone hormone acts through EcR to induce the transcription of EcR itself, creating a positive autoregulatory loop that enhances EcR levels through ecdysone signaling (1). Since Nup107 depletion leads to a reduction in ecdysone levels, it disrupts this autoregulatory loop of EcR transcription. This can contribute to the reduced EcR levels seen in Nup107-depleted animals.

      Additionally, the link between Nup107 and Torso is not fully substantiated. While overexpression of Torso appears to rescue the lack of 20E production in the prothoracic gland, the distinct phenotypes of Torso and Nup107 depletion (developmental delay in the former versus complete larval arrest in the latter) complicate understanding of Nup107's precise role.

      We understand that Torso depletion and Nup107 depletion differ in the developmental delay they produce. However, the two molecules being compared here are very different, and the extent of Torso depletion is not evident in other studies (2). Even if the extent of depletion of Torso and Nup107 is similar, we believe that Nup107, being a more widely expressed protein, induces stronger defects owing to its importance in cellular physiology. We think that RNAi-mediated depletion of Nup107 causes a defect in 20E biosynthesis through the Halloween genes, inducing a developmental arrest.

      To clarify these discrepancies, further investigation into whether Nup107 interacts with other critical signaling pathways related to the regulation of ecdysone biosynthesis, such as EGFR or TGF-β, would be beneficial and could strengthen the findings.

      In summary, although the study presents some intriguing observations, several conclusions are not well-supported by the experimental data.

      We agree with the reviewer’s suggestion. As noted in the literature, five RTKs (torso, InR, EGFR, Alk, and Pvr) stimulate the PI3K/Akt pathway, which plays a crucial role in PG function and in controlling pupariation and body size (3). We have checked torso and EGFR signaling. We rescued the Nup107 defects with torso overexpression; however, constitutively active EGFR (BL-59843) did not rescue the phenotype (data not shown). Nonetheless, we plan to examine EGFR pathway activation by measuring pERK levels in Nup107-depleted PGs.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Kawadkar et al investigates the role of Nup107 in developmental progression via the regulation of ecdysone signaling. The authors identify an interesting phenotype of Nup107 whole-body RNAi depletion in Drosophila development - developmental arrest at the late larval stage. Nup107-depleted larvae exhibit mis-localization of the Ecdysone receptor (EcR) from the nucleus to the cytoplasm and reduced expression of EcR target genes in salivary glands, indicative of compromised ecdysone signaling. This mis-localization of EcR in salivary glands was phenocopied when Nup107 was depleted only in the prothoracic gland (PG), suggesting that it is not nuclear transport of EcR but the presence of ecdysone (normally secreted from PG) that is affected. Consistently, whole-body levels of ecdysone were shown to be reduced in Nup107 KD, particularly at the late third instar stage when a spike in ecdysone normally occurs. Importantly, the authors could rescue the developmental arrest and EcR mislocalization phenotypes of Nup107 KD by adding exogenous ecdysone, supporting the notion that Nup107 depletion disrupts biosynthesis of ecdysone, which arrests normal development. Additionally, they found that rescue of the Nup107 KD phenotype can also be achieved by over-expression of the receptor tyrosine kinase torso, which is thought to be the upstream regulator of ecdysone synthesis in the PG. Transcript levels of the torso are also shown to be downregulated in the Nup107KD, as are transcript levels of multiple ecdysone biosynthesis genes. Together, these experiments reveal a new role of Nup107 or nuclear pore levels in hormone-driven developmental progression, likely via regulation of levels of torso and torso-stimulated ecdysone biosynthesis.

      Strengths:

      The developmental phenotypes of an NPC component presented in the manuscript are striking and novel, and the data appears to be of high quality. The rescue experiments are particularly significant, providing strong evidence that Nup107 functions upstream of torso and ecdysone levels in the regulation of developmental timing and progression.

      Weaknesses:

      The underlying mechanism is however not clear, and any insight into how Nup107 may regulate these pathways would greatly strengthen the manuscript. Some suggestions to address this are detailed below.

      Major questions:

      (1) Determining how specific this phenotype is to Nup107 vs. to reduced NPC levels overall would give some mechanistic insight. Does knocking down other components of the Nup107 subcomplex (the Y-complex) lead to similar phenotypes? Given the published gene regulatory function of Nup107, do other gene regulatory Nups such as Nup98 or Nup153 produce these phenotypes?

      We thank the reviewer for raising this concern. When working with a Nup complex such as the Nup107 complex, this concern is anticipated but difficult to address, as many Nups function beyond their complex identity. Our observations with all other members of the Nup107 complex, including dELYS, suggest that, except for Nup107, none of the other Nup107-complex members could induce larval developmental arrest.

      In this study, we primarily focused on the Nup107 complex (outer ring complex) of the NPC. We have not examined other nucleoporins outside of this complex, such as Nup98 and Nup153. However, previous studies have reported that Nup98 and Nup153 interact with chromatin, with these investigations conducted in Drosophila S2 cells (4, 5, 6). In the future, we may check whether Nup98 and Nup153 depletion can produce the arrest phenotype.

      (2) In a related issue, does this level of Nup107 KD produce lower NPC levels? It is expected to, but actual quantification of nuclear pores in Nup107-depleted tissues should be added. These and the above experiments would help address a key mechanistic question - is this phenotype the result of lower numbers of nuclear pores or specifically of Nup107?

      We agree with the concern raised here, and we plan to assess nucleoporin intensity using the mAb414 antibody (which exclusively recognizes FG-repeat Nups) in the Nup107 depletion background. Our past observations suggest that Nup107 depletion does not affect overall nuclear pore complex assembly in Drosophila salivary glands (data not shown).

      (3) Additional experiments on how Nup107 regulates the torso would provide further insight. Does Nup107 regulate transcription of the torso or perhaps its mRNA export? Looking at nascent levels of the torso transcript and the localization of its mRNA can help answer this question. Or alternatively, does Nup107 physically bind the torso?

      While the concern regarding torso transcript levels is genuine, we have already reported in the manuscript that Nup107 levels directly regulate torso expression. When Nup107 is depleted, torso levels go down, which in turn controls ecdysone production and subsequent EcR signaling (Figure 6B of the manuscript). However, the exact nature of Nup107 regulation of torso expression is still unclear. Since Nup107 is known to interact with chromatin (7), it may affect torso transcription. A physiologically relevant interaction between Nup107 and the torso in a cellular context is unlikely due to their distinct sub-cellular localizations. Investigating this further would require a significant amount of time to obtain reagents and perform experiments, and it is currently beyond the scope of this manuscript.

      (4) The depletion level of Nup107 RNAi specifically in the salivary gland vs. the prothoracic gland should be compared by RT-qPCR or western blotting.

      Although we know that the Nup107 protein signal is reduced in SG upon knockdown (Figure 3B), we have not compared the Nup107 transcript level in these two tissues (SG and PG). As suggested here, we will knock down Nup107 using SG and PG-specific drivers and quantify the Nup107 depletion level by RT-qPCR.

      (5) The UAS-torso rescue experiment should also include the control of an additional UAS construct - so Nup107; UAS-control vs Nup107; UAS-torso should be compared in the context of rescue to make sure the Gal4 driver is functioning at similar levels in the rescue experiment.

      This is a very valid point, and we took this into account while planning the experiment. To maintain the GAL4 function, we used the Nup107<sup>KK</sup>;UAS-GFP as control alongside the Nup107<sup>KK</sup>;UAS-torso. This approach ensures that GAL4 dilution does not affect observations made in the experiments. As can be seen in Figure S7, the presence of GFP signal in the prothoracic glands and their reduced size indicate that genes downstream of both UAS sequences are transcribed and that GAL4 dilution does not play a role here.

      Minor:

      (6) Figures and figure legends can stand to be more explicit and detailed, respectively.

      We will revisit all figures and their corresponding legends to ensure appropriate and explicit details are provided.

      Reviewer #3 (Public review):

      Summary:

      In this study by Kawadkar et al, the authors investigate the developmental role of Nup107, a nucleoporin, in regulating the larval-to-pupal transition in Drosophila through RNAi knockdown and CRISPR-Cas9-mediated gene editing. They demonstrate that Nup107, an essential component of the nuclear pore complex (NPC), is crucial for regulating ecdysone signaling during developmental transitions. The authors show that the depletion of Nup107 disrupts these processes, offering valuable insights into its role in development.

      Specifically, they find that:

      (1) Nup107 depletion impairs pupariation during the larval-to-pupal transition.

      (2) RNAi knockdown of Nup107 results in defects in EcR nuclear translocation, a key regulator of ecdysone signaling.

      (3) Exogenous 20-hydroxyecdysone (20E) rescues pupariation blocks, but rescued pupae fail to close.

      (4) Nup107 RNAi-induced defects can be rescued by activation of the MAP kinase pathway.

      Strengths:

      The manuscript provides strong evidence that Nup107, a component of the nuclear pore complex (NPC), plays a crucial role in regulating the larval-to-pupal transition in Drosophila, particularly in ecdysone signaling.

      The authors employ a combination of RNAi knockdown, CRISPR-Cas9 gene editing, and rescue experiments, offering a comprehensive approach to studying Nup107's developmental function.

      The study effectively connects Nup107 to ecdysone signaling, a key regulator of developmental transitions, offering novel insights into the molecular mechanisms controlling metamorphosis.

      The use of exogenous 20-hydroxyecdysone (20E) and activation of the MAP kinase pathway provides a strong mechanistic perspective, suggesting that Nup107 may influence EcR signaling and ecdysone biosynthesis.

      Weaknesses:

      The authors do not sufficiently address the potential off-target effects of RNAi, which could impact the validity of their findings. Alternative approaches, such as heterozygous or clonal studies, could help confirm the specificity of the observed phenotypes.

      This is a very valid point, and we are aware of the consequences of the off-target effects of RNAi. To confirm that the observed effects reflect specific knockdown and to reduce off-target effects, we have used two RNAi lines (Nup107<sup>GD</sup> and Nup107<sup>KK</sup>) against Nup107. Both RNAi lines induced comparable levels of Nup107 reduction, and using these lines, ubiquitous and PG-specific knockdown produced similar phenotypes. Although the Nup107<sup>GD</sup> line exhibited a relatively stronger knockdown compared to the Nup107<sup>KK</sup> line, we preferentially used the Nup107<sup>KK</sup> line because the Nup107<sup>GD</sup> line is based on a P-element insertion, and the exact landing site is unknown. Furthermore, there is an off-target predicted for the Nup107<sup>GD</sup> line, where a 19 bp sequence aligns with the bifocal (bif) sequence. The bif-encoded protein is involved in axon guidance and regulation of axon extension. However, the Nup107<sup>KK</sup> line does not have a predicted off-target molecule, and we know its precise landing site on the second chromosome. Thus, the Nup107<sup>KK</sup> line was ultimately used in experimentation for its clearer and more reliable genetic background.

      We are also investigating Nup107 knockdown in the prothoracic gland, which exhibits polyteny. Additionally, the number of cells in the prothoracic gland is quite limited, approximately 50-60 cells (8). Given this, there is a possibility that a clonal study may not yield the phenotype. However, we will also consider moving forward with this approach.

      NPC Complex Specificity: While the authors focus on Nup107, it remains unclear whether the observed defects are specific to this nucleoporin or if other NPC components also contribute to similar defects. Demonstrating similar results with other NPC components would strengthen their claims.

      We thank the reviewer for raising this concern. When working with a Nup complex such as the Nup107 complex, this concern is anticipated but difficult to address, as many Nups function beyond their complex identity. Our observations with all other members of the Nup107 complex, including dELYS, suggest that, except for Nup107, none of the other Nup107-complex members could induce larval developmental arrest. Since the study is primarily focused on the Nup107 complex (outer ring complex) of the NPC, we have not examined other nucleoporins outside of this complex.

      Although the authors show that Nup107 depletion disrupts EcR signaling, the precise molecular mechanism by which Nup107 influences this process is not fully explored. Further investigation into how Nup107 regulates EcR nuclear translocation or ecdysone biosynthesis would improve the clarity of the findings.

      We appreciate the concern raised. Based on our observations, we have proposed an upstream effect of Nup107 on the PTTH-torso-20E-EcR axis regulating developmental transitions. We know that Nup107 regulates torso levels, but we do not know if Nup107 directly interacts with torso. We would also like to address whether Nup107 exerts control on PTTH levels.

      We must emphasize that Nup107 does not directly regulate the translocation of EcR. On the contrary, we have demonstrated that EcR translocation is 20E dependent and Nup107 independent. Through our observations, we have argued that Nup107 regulates the expression of Halloween genes required for ecdysone biosynthesis. We are interested in identifying whether Nup107 associates with chromatin, directly or through another protein, to bring about the changes in gene expression required for normal development.

      There are some typographical errors and overly strong phrases, such as "unequivocally demonstrate," which could be softened. Additionally, the presentation of redundant data in different tissues could be streamlined to enhance clarity and flow.

      We thank the reviewer for this observation. We will correct all typographical errors and temper our statements so that they are supported by our conclusions.

      References:

      (1) Varghese, Jishy, and Stephen M Cohen. “microRNA miR-14 acts to modulate a positive autoregulatory loop controlling steroid hormone signaling in Drosophila.” Genes & development vol. 21,18 (2007): 2277-82. doi:10.1101/gad.439807

      (2) Rewitz, Kim F et al. “The insect neuropeptide PTTH activates receptor tyrosine kinase torso to initiate metamorphosis.” Science (New York, N.Y.) vol. 326,5958 (2009): 1403-5. doi:10.1126/science.1176450

      (3) Pan, Xueyang, and Michael B O'Connor. “Coordination among multiple receptor tyrosine kinase signals controls Drosophila developmental timing and body size.” Cell reports vol. 36,9 (2021): 109644. doi:10.1016/j.celrep.2021.109644

      (4) Pascual-Garcia, Pau et al. “Metazoan Nuclear Pores Provide a Scaffold for Poised Genes and Mediate Induced Enhancer-Promoter Contacts.” Molecular cell vol. 66,1 (2017): 63-76.e6. doi:10.1016/j.molcel.2017.02.020

      (5) Pascual-Garcia, Pau et al. “Nup98-dependent transcriptional memory is established independently of transcription.” eLife vol. 11 e63404. 15 Mar. 2022, doi:10.7554/eLife.63404

      (6) Kadota, Shinichi et al. “Nucleoporin 153 links nuclear pore complex to chromatin architecture by mediating CTCF and cohesin binding.” Nature communications vol. 11,1 2606. 25 May. 2020, doi:10.1038/s41467-020-16394-3

      (7) Gozalo, Alejandro et al. “Core Components of the Nuclear Pore Bind Distinct States of Chromatin and Contribute to Polycomb Repression.” Molecular cell vol. 77,1 (2020): 67-81.e7. doi:10.1016/j.molcel.2019.10.017

      (8) Shimell, MaryJane, and Michael B O'Connor. “Endoreplication in the Drosophila melanogaster prothoracic gland is dispensable for the critical weight checkpoint.” microPublication biology vol. 2023. 21 Feb. 2023, doi:10.17912/micropub.biology.000741

    1. Author response:

      Reviewer #1:

      Summary:

      In this study, the authors propose a "unifying method to evaluate inter-areal interactions in different types of neuronal recordings, timescales, and species". The method consists of computing the variance explained by a linear decoder that attempts to predict individual neural responses (firing rates) in one area based on neural responses in another area.

      The authors apply the method to previously published calcium imaging data from layer 4 and layers 2/3 of 4 mice over 7 days, and simultaneously recorded Utah array spiking data from areas V1 and V4 of 1 monkey over 5 days of recording. They report distributions over "variance explained" numbers for several combinations: from mouse V1 L4 to mouse V1 L2/3, from L2/3 to L4, from monkey V1 to monkey V4, and from V4 to V1. For their monkey data, they also report the corresponding results for different temporal shifts. Overall, they find the expected results: responses in each of the two neural populations are predictive of responses in the other, more so when the stimulus is not controlled than when it is, and with sometimes different results for different stimulus classes (e.g., gratings vs. natural images).

      Strengths:

      (1) Use of existing data.

      (2) Addresses an interesting question.

      Unfortunately, the method falls short of the state of the art: both generalized linear models (GLMs), which have been used in similar contexts for at least 20 years (see the many papers, both theoretical and applied to neural population data, by e.g. Simoncelli, Paninski, Pillow, Schwartz, and many colleagues dating back to 2004), and the extension of Granger causality to point processes (e.g. Kim et al. PLoS CB 2011). Both approaches are substantially superior to what is proposed in the manuscript, since they enforce non-negativity for spike rates (the importance of which can be seen in Figure 2AB), and do not require unnecessary coarse-graining of the data by binning spikes (the 200 ms time bins are very long compared to the time scale on which communication between closely connected neuronal populations within an area, or between related areas, takes place).

      We thank the reviewer for this suggestion. Our goal was to use a simple and unified linear ridge regression framework that can be applied to both calcium imaging (mouse) and MUAe (monkey) data.

      We will perform a GLM-based analysis enforcing non-negativity as suggested, including in the GLM any additional available variables that may contribute to the neuronal responses.
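
      For illustration, the linear ridge regression framework mentioned above can be sketched as follows: responses in a target population are predicted from responses in a source population, and the fraction of variance explained is computed on held-out samples. This is a minimal sketch on synthetic data, not the analysis code used in the study; the function names, the regularization strength `alpha`, the population sizes, and the train/test split are all assumptions made for the example.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    """Closed-form ridge weights mapping source activity X to target activity Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def explained_variance(Y_true, Y_pred):
    """Fraction of variance explained, computed separately for each target unit."""
    residual = ((Y_true - Y_pred) ** 2).sum(axis=0)
    total = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - residual / total

rng = np.random.default_rng(0)
n_samples, n_source, n_target = 600, 30, 10

# Synthetic stand-in for two simultaneously recorded populations:
# target activity is a noisy linear readout of source activity.
X = rng.normal(size=(n_samples, n_source))
W_true = rng.normal(size=(n_source, n_target))
Y = X @ W_true + rng.normal(scale=2.0, size=(n_samples, n_target))

# Fit on the first 400 samples and evaluate on the remaining 200.
train, test = slice(0, 400), slice(400, None)
x_mean, y_mean = X[train].mean(axis=0), Y[train].mean(axis=0)

W = ridge_fit(X[train] - x_mean, Y[train] - y_mean, alpha=10.0)
Y_pred = (X[test] - x_mean) @ W + y_mean

# One explained-variance value per target unit.
ev = explained_variance(Y[test], Y_pred)
```

      Because the prediction is an unconstrained linear combination, nothing in this sketch prevents negative predicted responses, which is the limitation that a GLM with a non-negative output would address.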

      We also would like to note that:

      ● Macaque data: Our MUAe data are binned at 25 ms, not 200 ms. We used the envelope of multi-unit activity as reported in the original study [1]. We did not perform spike sorting on these data and therefore, strictly speaking, this is not a point process and methods developed for point processes are not directly applicable.

      ● Mouse data: The Stringer et al. dataset [2,3] uses two-photon calcium imaging sampled at 2.5 or 3 Hz. Additionally, responses were computed by averaging two frames per stimulus (yielding an effective bin size of 800 ms or 666 ms, respectively), dictated by acquisition constraints. We will emphasize the low temporal resolution of these signals as a limitation in the discussion section, but we cannot improve the temporal resolution with our analyses. These signals are not point processes either (although there is a correlation between two-photon calcium signals and spike rates).

      Regardless of these considerations, the reviewer’s points are well taken, and we will conduct additional analyses as described above.

      In terms of analysis results, the work in the manuscript presents some expected and some less expected results. However, because the monkey data are based on only one monkey (misleadingly, the manuscript consistently uses the plural ‘monkeys’), none of the results specific to that monkey, nor the comparison of that one monkey to mice, are supported by robust data.

      We will add data from at least two more monkeys, as suggested by the reviewer:

      ● First, we will include a second monkey from the same dataset [1]. This monkey was not included in the original submission because its dataset contains much less data than that of the first monkey. For example, for the lights-off condition, the number of V4 channels with a signal-to-noise ratio greater than 2 (the electrodes recommended for use by the dataset authors) is 9-12 in this second monkey, compared to 68-74 in the first monkey [1]. However, we will still add results for this second monkey.

      ● Additionally, we will include data from a new monkey by collaborating with the Ponce lab who will collect new data for this study.

      One of the main results for mice (bimodality of explained variance values, mentioned in the abstract) does not appear to be quantified or supported by a statistical test.

      We appreciate this point. We will conduct statistical tests to quantify the degree of bimodality and clarify these findings in the results.
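
      For illustration, one simple way to quantify bimodality is Sarle's bimodality coefficient, sketched below. The response above does not specify which test will be used, so this choice is an assumption; the sketch uses the plain moment-based form without a finite-sample correction, runs on synthetic values rather than the study's data, and uses a hypothetical function name. Values above 5/9 (the value for a uniform distribution) are conventionally taken as a hint of bimodality, and a formal test such as Hartigan's dip test would be needed for a p-value.

```python
import numpy as np

def bimodality_coefficient(values):
    """Sarle's bimodality coefficient: (skewness^2 + 1) / kurtosis."""
    values = np.asarray(values, dtype=float)
    centered = values - values.mean()
    m2 = np.mean(centered ** 2)
    skewness = np.mean(centered ** 3) / m2 ** 1.5
    kurtosis = np.mean(centered ** 4) / m2 ** 2  # equals 3 for a Gaussian
    return (skewness ** 2 + 1.0) / kurtosis

rng = np.random.default_rng(0)

# Synthetic "explained variance" values: one unimodal and one bimodal sample.
unimodal = rng.normal(0.3, 0.05, 2000)
bimodal = np.concatenate([
    rng.normal(0.1, 0.03, 1000),
    rng.normal(0.5, 0.03, 1000),
])

bc_uni = bimodality_coefficient(unimodal)
bc_bi = bimodality_coefficient(bimodal)

# Conventional reference value: the coefficient of a uniform distribution.
threshold = 5.0 / 9.0
```

      The coefficient is a heuristic: heavily skewed unimodal distributions can also exceed the reference value, so it is best read alongside a histogram of the values.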

      Moreover, the two data sets differ in too many aspects to allow for any conclusions about whether the comparisons reflect differences in species (mouse vs. monkey), anatomy (L2/3-L4 vs. V1-V4), or recording technique (calcium imaging vs. extracellular spiking).

      We agree that the methodological and anatomical differences between the mouse and monkey datasets make any direct cross-species comparisons hard to interpret. We explicitly discuss this point in the Discussion section. We will add a section within the Discussion entitled “Limitations of this study”. We will further emphasize that our goal is not to attempt a direct quantitative comparison across species, and that the two experiments differ in terms of: (i) recording modality (calcium imaging vs. electrophysiology) and the associated differences in temporal resolution, neuronal types, and SNR, (ii) cortical targets (layers vs. areas), (iii) sample size, (iv) stimuli, and (v) task conditions. In the revised manuscript, we will further highlight that our primary aim is to investigate inter-areal interactions within each species rather than to draw comparisons across species.

      Reviewer #2:

      Summary:

      In this work, the authors investigated the extent of shared variability in cortical population activity in the visual cortex in mice and macaques under conditions of spontaneous activity and visual stimulation. They argue that by studying the average response to repeated presentations of sensory stimuli, investigators are discounting the contribution of variable population responses that can have a significant impact at the single trial level. They hypothesized that, because these fluctuations are to some degree shared across cortical populations depending on the sources of these fluctuations and the relative connectivity between cortical populations within a network, one should be able to predict the response in one cortical population given the response of another cortical population on a single trial, and the degree of predictability should vary with factors such as retinotopic overlap, visual stimulation, and the directionality of canonical cortical circuits.

      To test this, the authors analyzed previously collected and publicly available datasets. These include calcium imaging of the primary visual cortex in mice and electrophysiology recordings in V1 and V4 of macaques under different conditions of visual stimulation. The strength of this data is that it includes simultaneous recordings of hundreds of neurons across cortical layers or areas. However, both calcium dynamics (which have lower temporal resolution and miss some non-linear dynamics in cortical activity) and multi-unit envelope activity (which reflects fluctuations in population activity rather than the variance in individual unit spike trains) underestimate the variability of individual neurons. The authors deploy a regression model that is appropriate for addressing their hypothesis, and their analytic approach appears rigorous and well-controlled.

      We agree that both calcium imaging and multi-unit envelope recordings have inherent limitations in capturing the variability of individual neuron spiking. Among other factors, the slower temporal resolution of calcium signals can blur fast spiking events, and multi-unit envelopes can mask single-unit heterogeneity. In the Discussion, we will explicitly mention these modality-specific caveats and note that our approach is meant to capture shared variability at the population level rather than the fine temporal structure of individual neurons and individual spikes.

      From their analysis, they found that there was significant predictability of activity between layer II/III and layer IV responses in mice and V1 and V4 activity in macaques, although the specific degree of predictability varied somewhat with the condition of the comparison with some minor differences between the datasets. The authors deployed a variety of analytic controls and explored a variety of comparisons that are both appropriate and convincing that there is a significant degree of predictability in population responses at the single trial level consistent with their hypothesis. This demonstrates that a significant fraction of cortical responses to stimuli is not due solely to the feedforward response to sensory input, and if we are to understand the computations that take place in the cortex, we must also understand how sensory responses interact with other sources of activity in cortical networks. However, the source of these predictive signals and their impact on function is only explored in a limited fashion, largely due to limitations in the datasets. Overall, this work highlights that, beyond the traditionally studied average evoked responses considered in systems neuroscience, there is a significant contribution of shared variability in cortical populations that may contextualize sensory representations depending on a host of factors that may be independent of the sensory signals being studied.

      We will include a section within the Discussion to emphasize the limitations in the datasets used in this study. We also agree and appreciate the reviewer’s description and will borrow some of the reviewer’s terminology to provide context in the Discussion section.

      The different recording modalities and comparisons (within vs. across cortical areas) limit the interpretability of the inter-species comparisons.

      We agree that the methodological and anatomical differences between the mouse and monkey datasets make any direct cross-species comparisons hard to interpret. We explicitly discuss this point in the Discussion section, and we will add a section within the Discussion entitled “Limitations of this study”. There, we will emphasize that our goal is not to attempt a direct quantitative comparison across species, and that the two experiments differ in terms of: (i) recording modalities (calcium imaging vs. electrophysiology) and the associated differences in temporal resolution, neuronal types, and SNR, (ii) cortical targets (layers vs. areas), (iii) sample size, (iv) stimuli, and (v) task conditions. In the revised manuscript, we will further highlight that our primary aim is to investigate inter-areal interactions within each species rather than to draw comparisons across species.

      Strengths:

      This work considers a variety of conditions that may influence the relative predictability between cortical populations, including receptive field overlap, latency that may reflect feed-forward or feedback delays, and stimulus type and sensory condition. Their analytic approach is well-designed and statistically rigorous. They acknowledge the limitations of the data and do not over-interpret their findings.

      Weaknesses:

      The different recording modalities and comparisons (within vs. across cortical areas) limit the interpretability of the inter-species comparisons. The mechanistic contributions of known sources or correlates of shared variability (eye movements, pupil fluctuations, locomotion, whisking behaviors) were not considered, and these could be driving or a reflection of much of the predictability observed and could explain differences in spontaneous and visual activity predictions.

      We also appreciate this important point. We agree that multiple behavioral factors may significantly contribute to shared variability. In our analyses of the mouse data, we addressed non-visual influences by projecting out “non-visual ongoing neuronal activity” (as shown in Figure 6C, following the approach in Stringer et al. 2019). Additionally, we will further evaluate the contribution of the behavioral measures available in the open dataset (such as running speed, whisking, pupil area, and “eigenface” components) to the predictivity of neuronal responses.

      For the macaque data, the head-fixed and eye-fixation conditions help minimize some of these other potential behavioral contributions. Moreover, we have performed comparisons of eyes-open versus eyes-closed conditions (see Figure 5D). We will also analyze pupil size specifically for the lights-off condition. We do not have access to any other behavioral data from monkeys.

      Previous work has explored correlations in activity between areas on various timescales, but this work only considered a narrow scope of timescales.

      We appreciate this suggestion. We will perform additional analyses to evaluate predictivity at different temporal scales, as suggested.

      The observation that there is some degree of predictability is not surprising, and it is unclear whether changes in observed predictability with analysis conditions are informative of a particular mechanism or just due to differences in the variance of activity under those conditions. Some of these issues could be addressed with further analysis, but some may be due to limitations in the experimental scope of the datasets and would require new experiments to resolve.

      Our initial analyses in Fig. 6A examined the relationship between variance in activity and predictability in mice. As the reviewer intuited, there is a correlation between variance and predictability, at least when presenting a stimulus. Importantly, however, this is not the case when predicting activity in the absence of any stimulus. In the macaque, we cannot compute the variance across stimuli in the checkerboard case (single stimulus), but we will compute it for the conditions with the 4 moving bars. In addition, inspired by the reviewer’s question, we will perform an analysis where we further normalize the variance in activity.

      We would like to note that our key contribution is not to merely show that some degree of predictability is possible (which we agree is not surprising) but rather: (i) to use a simple approach to quantify this predictability, (ii) to assess directional differences in predictability, (iii) to evaluate how this predictability depends on neuronal properties and receptive field overlap, (iv) how it depends on the stimuli, and, importantly, (v) to compare predictability during visual stimulation versus absence of visual input.

      We agree with the limitations in the datasets. We will include a section within the Discussion to emphasize these limitations.

      Reviewer #3:

      Neural activity in the visual cortex has primarily been studied in terms of responses to external visual stimuli. While the noisiness of inputs to a visual area is known to also influence visual responses, the contribution of this noisy component to overall visual responses has not been well characterized.

      In this study, the authors reanalyze two previously published datasets - a Ca++ imaging study from mouse V1 and a large-scale electrophysiological study from monkey V1-V4. Using regression models, they examine how neural activity in one layer (in mice) or one cortical area (in monkeys) predicts activity in another layer or area. Their main finding is that significant predictions are possible even in the absence of visual input, highlighting the influence of non-stimulus-related downstream activity on neural responses. These findings can inform future modeling work of neural responses in the visual cortex to account for such non-visual influences.
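
      The regression-based prediction summarized above can be illustrated with a minimal sketch. It uses synthetic data and a closed-form ridge regression; the ridge penalty, the latent-variable simulation, the population sizes, and all function names are assumptions for illustration and are not taken from the study's code.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge weights: W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def explained_variance(Y_true, Y_pred):
    """Fraction of variance explained, computed separately per target neuron."""
    resid = ((Y_true - Y_pred) ** 2).sum(axis=0)
    total = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - resid / total

# Synthetic single-trial activity (hypothetical sizes): both populations
# receive the same low-dimensional shared fluctuations plus private noise.
rng = np.random.default_rng(0)
n_trials, n_src, n_tgt, n_latent = 400, 30, 20, 5
shared = rng.normal(size=(n_trials, n_latent))
source = shared @ rng.normal(size=(n_latent, n_src)) + 0.5 * rng.normal(size=(n_trials, n_src))
target = shared @ rng.normal(size=(n_latent, n_tgt)) + 0.5 * rng.normal(size=(n_trials, n_tgt))

# Fit on the first 300 trials, evaluate on the remaining 100 held-out trials.
train, test = slice(0, 300), slice(300, 400)
mu_x, mu_y = source[train].mean(axis=0), target[train].mean(axis=0)
W = ridge_fit(source[train] - mu_x, target[train] - mu_y, alpha=10.0)
pred = (source[test] - mu_x) @ W + mu_y

ev = explained_variance(target[test], pred)
mean_ev = float(ev.mean())
print(f"mean held-out explained variance: {mean_ev:.2f}")
```

      In this toy setting, predictability arises only from the shared latent fluctuations, which is the kind of stimulus-independent shared variability the summary refers to.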

      A major weakness of the study is that the analysis includes data from only a single monkey. This makes it hard to interpret the data as the results could be due to experimental conditions specific to this monkey, such as the relative placement of electrode arrays in V1 and V4.

      We will add data from at least two more monkeys, as suggested by the reviewer:

      ● First, we will include a second monkey from the same dataset [1]. This monkey was not included in the original submission because its dataset contains much less data than that of the first monkey. For example, for the lights-off condition, the number of V4 channels with a signal-to-noise ratio greater than 2 (the criterion recommended by the dataset authors for selecting electrodes) is 9-12 in this second monkey, compared to 68-74 in the first monkey [1]. However, we will still add results for this second monkey.

      ● Additionally, we will include data from a new monkey by collaborating with the Ponce lab, which will collect new data for this study.

      The authors perform a thorough analysis comparing regression-based predictions for a wide variety of combinations of stimulus conditions and directions of influence. However, the comparison of stimulus types (Figure 4) raises a potential concern. It is not clear if the differences reported reflect an actual change in predictive influence across the two conditions or if they stem from fundamental differences in the responses of the predictor population, which could in turn affect the ability to measure predictive relationships. The authors do control for some potential confounds such as the number of neurons and self-consistency of the predictor population. However, the predictability seems to closely track the responsiveness of neurons to a particular stimulus. For instance, in the monkey data, the V1 neuronal population will likely be more responsive to checkerboards than to single bars. Moreover, neurons that don't have the bars in their RFs may remain largely silent. Could the difference in predictability be just due to this? Controlling for overall neuronal responsiveness across the two conditions would make this comparison more interpretable.

      This is also a valid concern. As the reviewer noted, we controlled for the number of neurons and degree of self-consistency (Fig. 3A, 3C), and this was always done within their respective stimulus type.

      As the reviewer intuits, in Fig. 6A in mice, we show that predictability correlates with neuronal responsiveness. This observation only held during the stimulus condition and not during the gray screen condition. We also showed correlations with self-consistency metrics as a proxy for responsiveness in Fig. 6A and 6C. However, we will directly assess the impact of responsiveness in two ways: (i) by correlating predictability directly with neuronal responsiveness and (ii) by following the same subsampling approach in Fig. 3 to normalize the degree of responsiveness and recompute the predictability metrics.

      REFERENCES

      (1) Chen, X., Morales-Gregorio, A., Sprenger, J., Kleinjohann, A., Sridhar, S., van Albada, S.J., Grün, S., and Roelfsema, P.R. (2022). 1024-channel electrophysiological recordings in macaque V1 and V4 during resting state. Sci Data 9, 77. https://doi.org/10.1038/s41597-022-01180-1.

      (2) Stringer, C., Pachitariu, M., Steinmetz, N., Carandini, M., and Harris, K.D. (2019). High-dimensional geometry of population responses in visual cortex. Nature 571, 361–365. https://doi.org/10.1038/s41586-019-1346-5.

      (3) Stringer, C., Pachitariu, M., Carandini, M., and Harris, K. (2018). Recordings of 10,000 neurons in visual cortex in response to 2,800 natural images. (Janelia Research Campus). https://doi.org/10.25378/janelia.6845348.v4.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Fuchs describes a novel method of enzymatic protein-protein conjugation using the enzyme Connectase. The author is able to make this process irreversible by screening different Connectase recognition sites to find an alternative sequence that is also accepted by the enzyme. They are then able to selectively render the byproduct of the reaction inactive, preventing the reverse reaction, and add the desired conjugate with the alternative recognition sequence to achieve near-complete conversion. I agree with the authors that this novel enzymatic protein fusion method has several applications in the field of bioconjugation, ranging from biophysical assay conduction to therapeutic development. Previously the author has published on the discovery of the Connectase enzymes and has shown its utility in tagging proteins and detecting them by in-gel fluorescence. They now extend their work to include the application of Connectase in creating protein-protein fusions, antibody-protein conjugates, and cyclic/polymerized proteins. As mentioned by the author, enzymatic protein conjugation methods can provide several benefits over other non-specific and click chemistry labeling methods. Connectase specifically can provide some benefits over the more widely used Sortase, depending on the nature of the species that is desired to be conjugated. However, due to a similar lengthy sequence between conjugation partners, the method described in this paper does not provide clear benefits over the existing SpyTag-SpyCatcher conjugation system.  Additionally, specific disadvantages of the method described are not thoroughly investigated, such as difficulty in purifying and separating the desired product from the multiple proteins used. Overall, this method provides a novel, reproducible way to enzymatically create protein-protein conjugates.

      The manuscript is well-written and will be of interest to those who are specifically working on chemical protein modifications and bioconjugation.

      I'd like to comment on two points.

      (1) The benefits over the SpyTag-SpyCatcher system. Here, the conjugation partners are fused via the 12.3 kDa SpyCatcher protein, which is considerably larger than the Connectase fusion sequence (19 aa). This is mentioned in the introduction (p. 1 ln 24-26). Furthermore, SpyTag-SpyCatcher fusions are truly irreversible, while Connectase/BcPAP fusions may be reversed (p. 8, ln 265-273). For example, target proteins (e.g., AGAFDADPLVVEI-Protein) may be covalently fused to functionalized magnetic beads (e.g., Bead-ELASKDPGAFDADPLVVEI) in order to perform a pulldown assay. After the assay, the target protein and any bound interactors could be released from the beads by the addition of a Connectase / peptide (AGAFDAPLVVEI) mixture.

      In a related technology, the SpyTag-SpyCatcher system was split into three components, SpyLigase, SpyTag and KTag  (Fierer et al., PNAS 2014). The resulting method introduces a sequence between the fusion partners (SpyTag (13aa) + KTag (10aa)), which is similar in length to the Connectase fusion sequence (p. 8, ln 297 - 298). Compared to the original method, however, this approach seems to require longer incubation times, while yielding less fusion product (Fierer et al., Figure 2).

      (2) Purification of the fusion product. The method is actually advantageous in this respect, as described in the discussion (p. 8, ln 258-264). Examples are now provided in Figure 6.

      Reviewer #2 (Public review):

      Summary:

      Unlike previous traditional protein fusion protocols, the author claims their proposed new method is fast, simple, specific, reversible, and results in a complete 1:1 fusion. A multi-disciplinary approach from cloning and purification, biochemical analyses, and proteomic mass spec confirmation revealed fusion products were achieved.

      Strengths:

      The author provides convincing evidence that an alternative to traditional protein fusion synthesis is more efficient, with 100% yields using Connectase. The author optimized the protocol's efficiency with assays replacing a single amino acid and with the identification of a proline aminopeptidase from Bacillus coagulans (BcPAP) as a suitable enzyme for the fusion reaction. Multiple examples including Ubiquitin, GST, and antibody fusion/conjugations reveal how this method can be applied to a diverse range of biological processes.

      Weaknesses:

      Though the ~100% ligation efficiency is an advancement, the long recognition linker may be the biggest drawback. For large native proteins that are challenging/cannot be synthesized and require multiple connectase ligation reactions to yield a complete continuous product, the multiple interruptions with long linkers will likely interfere with protein folding, resulting in non-native protein structures. This method will be a good alternative to traditional approaches as the author mentioned but limited to generating epitope/peptide/protein tagged proteins, and not for synthetic protein biology aimed at examining native/endogenous protein function in vitro.

      The assessment is fair, and I have no further comments to add.

      Reviewer #1 (Recommendations for the authors):

      Major/Experimental Suggestions:

      (1) Throughout the paper only one reaction shown via gels had 100% conversion to desired product (Figure 3C). It is misleading to title a paper with absolutes such as "100% product yield", when the majority of reactions show >95% product yield, without any purification. Please change the title of the manuscript to something along the lines of "Novel Irreversible Enzymatic Protein Fusions with Near-Complete Product Yield".

      The conjugation reaction is thermodynamically favored. It is driven by the hydrolysis of a peptide bond (P|GADFDADPLVVEI), which typically releases 8 - 16 kJ/mol of energy. This should result in a >99.99% complete reaction (ΔG° = -RT ln (Product/Educt)). In line with this, 99% - 100% of the less abundant educts (LysS, Figure 3A; MBP, Figure 3B; Ub-Strep, Figure 3C) are converted in the time courses (Figure 3D-F show different reaction conditions, which slow down conjugate formation). 100% conversion is also shown in Figure 5, Figure 6, and Figure S4. Likewise, an LC-MS analysis (Figure S2) showed 99.6% relative fusion product signal intensity after 4 h reaction time (0.13% and 0.25% educts). In this experiment, the proline had been removed from 99.8% of the peptide byproducts (P|GADFDADPLVVEI). It is clear that this reaction is still ongoing and that >99.99% of the prolines will be removed from the peptides in time. These findings suggest that the conjugation reaction gradually slows down as less educt remains available, but eventually reaches completion.

      For some experiments, lower product yields (e.g. 97% in Figure 3B) are reported in the paper. These were calculated with Yield = 100% x Product / (Educt 1 + Educt 2 + Product). With this formula, 100% conjugation can only be achieved with exactly equimolar educt quantities, because both educt 1 and educt 2 need to be converted entirely. If educt 1 is available in excess, for example because of protein concentration measurement inaccuracies or pipetting errors, some of it will be left without a fusion partner. In the case of Figure 3B, about 3% more GST seems to have been in the mixture. Such deviations reflect methodological inaccuracies rather than incomplete conjugation.
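
      The effect of unequal educt amounts on this formula can be checked with a short calculation. The sketch below assumes molar quantities in arbitrary units and complete conversion of the limiting educt; the function names and the 3% excess are illustrative only.

```python
def fusion_yield(educt1, educt2, product):
    """Yield (%) = 100 * Product / (Educt 1 + Educt 2 + Product)."""
    return 100.0 * product / (educt1 + educt2 + product)

def yield_at_full_conversion(educt1_start, educt2_start):
    """Yield when the limiting educt is consumed completely (1:1 fusion)."""
    product = min(educt1_start, educt2_start)
    return fusion_yield(educt1_start - product, educt2_start - product, product)

# Exactly equimolar educts: nothing is left over.
balanced = yield_at_full_conversion(1.00, 1.00)

# Educt 1 in 3% excess (assumed value): the surplus has no fusion partner.
excess = yield_at_full_conversion(1.03, 1.00)

print(f"equimolar: {balanced:.1f}%, 3% excess: {excess:.1f}%")
```

      Under these assumptions, a 3% surplus of one educt caps the calculated yield at roughly 97% even though the limiting educt is fully converted.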

      (2) Please provide at least one example of a purified desired product, and mention the difficulties involved as a disadvantage to this particular method. Separating BcPAP, Connectase, and the desired protein-protein conjugate may prove to be quite difficult, especially when Connectase cleaves off affinity tags.

      Examples are now provided in Figure 6. As described in the discussion (p. 8, ln 258-264), the simple product purification is one of the advantages of the method.

      (3) For the antibody conjugate, please provide an example of conjugating an educt that would prove to be more useful in the context of antibodies. For example, as you mention in the introduction, conjugation of fluorophores, immobilization tags such as biotin, and small molecule linker/drugs are useful bioconjugates to antibodies.

      Antibody-biotinylation is now shown in Figure S6; Antibody-fluorophore conjugates are part of Figures S5 and S7.

      (4) Please assess the stability of these protein-protein conjugates under various conditions (temperature, pH, time) to ensure that the ligation via Connectase is stable over a broad array of conditions. In particular, a relevant antibody-conjugate stability assay should be done over the period of 1-week in both buffer and plasma to show applicability for potential therapeutics.

      The stability of an antibody-biotin conjugate in blood plasma over 7 days at different temperatures is now shown in Figure S7.

      Generally, Connectase introduces a regular peptide bond (Asp-Ala) with a high chemical and physical stability (e.g. 10 min incubation at 95°C in SDS-PAGE loading buffer; H2O-formic acid / acetonitrile gradients for LC-MS). The sequence may be susceptible to proteases, although this is not the case in HEK293 cells (antibody expression), E. coli, or blood plasma (Figure S7).

      (5) Please conduct functional assays with the antibody-protein/peptide conjugates to show that the antibody retains binding capabilities to the HER-2 antigen and the modification was site-selective, not interfering with the binding paratope or binding ability of the antibody in any way. This can be done through bio-layer interferometry, surface plasmon resonance, ELISA, etc.

      We plan the immobilization of the HER2 antibody on microplates and its use in an ELISA. However, this experiment requires significant testing and optimizations. It will be part of a future paper on the use of Connectase for protein immobilization.

      For now, the mass spectrometry data provide clear evidence of a single site-selective conjugation, as the C-terminal ELASKDPGAFDADPLVVEI-Strep sequence is replaced by ELASKDAGAFDADPLVVEI(-Ub). Given that the conjugation sites at the C-termini are far from the antigen binding sites, and have already been used in a number of other approaches (e.g., SpyTag, SnapTag, Sortase), it appears unlikely that these conjugations interfere with antigen binding.

      (6) Please include gels of all proteins used in ligation reactions after purification steps in the SI to show that each species was pure.

      The pure proteins are now shown in Figure S9.

      (7) Please provide the figures (not just tables) of LC/MS deconvoluted mass spectra graphs for all conjugates, either in the main text or the SI.

      Please specify which spectra you are missing. I believe all relevant spectra are shown in Figures 4, 5, and S3. The primary data can be found in Dataset S2.

      (8) Please provide more information in the methods section on exactly how the densitometry quantification of gel bands was performed with ImageJ.

      Details on the quantification with Image Studio Lite 5.2 were added in the method section (p. 17, ln 461-463).

      Minor Suggestions:

      (1) Page 1, line 19: can include one sentence on what assays these particular bioconjugations are useful for (e.g. internalization cell studies, binding assays, etc.)

      I prefer not to provide additional details here to keep the text concise and focused.

      (2) Page 1, line 22: "three to ten equivalents" instead of 3x-10x.

      Done.

      (3) Page 1, line 23: While NHS labeling is widely considered non-specific, maleimide conjugation to free cysteines is generally considered specific for engineered free cysteine residues, since native proteins often do not have free cysteine residues available for conjugation. If you are referring to the potential of maleimides to label lysines as well, that should be specifically stated.

      I modified the sentence, which now states that these methods "can be" unspecific.

      As pointed out, it is possible to achieve specificity by eliminating all other free cysteines and/or engineering a cysteine in an appropriate position. In many other cases, however (e.g., natural antibodies), several cysteines are available, or the sample contains other proteins/peptides. I did not want to go into more detail here and refer to the cited review.

      (4) Page 1, line 31: "and an oligoglycine G(1-5)-B"

      Done.

      (5) Page 1, line 34: It is not clear where in the source these specific Km values are coming from, considering these are variable based on specific conditions/substrates and tend to be reaction-specific.

      I cited another review, which lists the same values, along with a few other measurements (Jacobitz et al., Adv Protein Chem Struct Biol 2017, Table 2). It is clear that each of these measurements differs somewhat, but they are generally comparable (K<sub>M</sub>(LPETG) = 5500 - 8760 µM; K<sub>M</sub>(GGGGG) = 140 - 196 µM). I chose the cited study (Frankel et al., Biochemistry 2005), because it also investigated hydrolysis rates. In this study, the measurements are derived from the plots in Figure 2.

      (6) Page 1, line 47: the comparison to western blots feels a little like apples to oranges, even though this comparison was made in previous literature. Engineering an expressed protein to have this tag and then using the tag to detect and quantify it, feels more akin to a tagging/pull down assay than a western blot in which unmodified proteins are easily detected.

      It is akin to a frequently used type of western blot with tag-specific antibodies, e.g. Anti-His<sub>6</sub>, -Streptavidin, -HA, -cMyc, -Flag. I modified the sentence to clarify this.

      (7) Page 2, line 51: "Connectase cleaves between the first D and P amino acids in the recognition sequence, resulting in an N-terminal A-ELASKD-Connectase intermediate and a C-terminal PGAFDADPLVVEI peptide."

      I prefer the current sentence, because we assume that a bond between the aspartate and Connectase is formed before PGAFDADPLVVEI is cleaved off.

      (8) Page 3, line 94: "Exact determination is not possible due to reversibility of the reaction", the way it is stated now sounds like it is a flaw in the methods. Also, update Figure 2 to read "Estimated relative ligation rate".

      Done.

      (9) Page 3, lines 101-107: This is worded in a confusing way. It can either be X<sub>1</sub> or X<sub>2</sub> that is inactivated depending on if the altered amino acid is on the original protein sequence or on the desired educt to conjugate. You first give examples of how to render other amino acids inactive, but then ultimately state that proline is made inactive, so separate the two distinct possibilities a bit more clearly.

      The reaction requires the inactivation of X<sub>1</sub>, without affecting X<sub>2</sub> (ln 100 - 102). This is true, no matter whether it is X<sub>1</sub> = A, C, S, or P that is inactivated. I added a sentence to clarify this (ln 102 – 103).

      (10) Page 4, line 118: Give a one-sentence justification for why these proteins were chosen to work with (easy to express, stable, etc).

      Done.

      (11) Page 5, line 167: "payload molecules".

      Done.

      (12) Page 5, lines 170-173: Word this more clearly- "full conversion with many of these methods is difficult on antibodies due to each heavy and light chain being modified separately, resulting in only a total yield of 66% DAR4 even when 90% of each chain is conjugated."

      I rephrased the section.
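
      The 66% figure quoted in the comment above follows from simple probability if each of the four chains (two heavy, two light) is conjugated independently with the same efficiency. The sketch below assumes independence and exactly four conjugation sites; the function name is hypothetical.

```python
def fully_conjugated_fraction(per_chain_efficiency, n_chains=4):
    """Fraction of antibodies with all chains conjugated (e.g. DAR4),
    assuming every chain is modified independently."""
    return per_chain_efficiency ** n_chains

# 90% conjugation per chain, four chains per antibody.
dar4_fraction = fully_conjugated_fraction(0.90)
print(f"DAR4 fraction: {dar4_fraction:.0%}")  # about 0.656, i.e. roughly 66%
```

      The same calculation shows why per-chain efficiencies close to 100% are needed for a homogeneous product: at 99% per chain, about 96% of antibodies would carry all four modifications.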

      (13) Page 8, line 290: Discuss other disadvantages of this method including difficulties purifying and in incorporating such a long sequence into proteins of interest.

      Product purification is shown in the new Figure 6. As stated above, I consider the simple purification process an advantage of the method. The genetic incorporation of the sequence into proteins is a routine process and should not pose any difficulties. The disadvantages of long linker sequences between fusion partners are now discussed (p. 8 – 9, ln 300-302).

      (14) Page 10, line 341: "The experiment is described and discussed in detail in a previously published paper.31"

      Done.

      Reviewer #2 (Recommendations for the authors):

      Minor Points:

      (1) It's unclear how the author derived 100% ligation rate with X = Proline in Figure 2 when there is still residual unligated UB-Strep at 96h. Please provide an expanded explanation for those not familiar with the protocol. Is the assumption made that there will be no UB-Strep if the assay was carried out beyond 96h?

      I clarified the figure legend. The assay shows the formation of an equilibrium between educts and products. Therefore, only ~50% Ub-Strep is used with X = Proline (see p. 2, ln 79 - 81). The "relative ligation rate" refers to the relative speed with which this equilibrium is established. The highest rate is seen with X = Proline, and it is set to 100%. The other rates are given relative to the product formation with X = Proline.

      (2) Though the qualitative depiction of the data in Figure 3 is appreciated, an accompanying graphical representation of the data in the same figure will greatly enhance reception and better comprehension of several of the author's conclusions.

      Graphs are now shown in Figure S1.

      (3) Figure 3 panel E is misaligned. Please align it with panel B above it.

      Done, thank you.

      (4) The author refers to 'The resulting circular assemblies (37% UB2...)' in the text but identifies it as UB-C2 in Figure 5B. Is this a mistake or does UB2 refer to another assembly not mentioned in the Figures? Please check for inconsistencies.

      All circular assemblies are now labeled Ub-C<sub>1-6</sub>.

      (5) Finishing with a graphical schematic that depicts the entire protocol in a simple image would be much appreciated and well-received by readers. Including the scheme with A and B proteins, the recognition linkers, the addition of connectase and BcPAP, etc. to the final resulting protein with connected linker.

      A graphical summary of the reaction is now included in Figure 6.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Fuchsberger et al. demonstrate a set of experiments that ultimately identifies the de novo synthesis of GluA1-, but not GluA2-containing Ca2+ permeable AMPA receptors as a key driver of dopamine-dependent LTP (DA-LTP) during conventional post-before-pre spike-timing dependent (t-LTD) induction. The authors further identify adenylate cyclase 1/8, cAMP, and PKA as the crucial mitigators of these actions. While some comments have been identified below, the experiments presented are thorough and address the aims of the manuscript, figures are presented clearly (with minor comments), and experimental sample sizes and statistical analyses are suitable. Suitable controls have been utilized to confirm the role of Ca2+ permeable AMPAR. This work provides a valuable step forward built on convincing data toward understanding the underlying mechanisms of spike-timing-dependent plasticity and dopamine.

      Strengths:

      Appropriate controls were used.

      The flow of data presented is logical and easy to follow.

      The quality of the data, except for a few minor issues, is solid.

      Weaknesses:

      The drug treatment duration of anisomycin is longer than the standard 30-45 minute duration (as is the 500uM vs 40uM concentration) typically used in the field. Given the toxicity of these kinds of drugs long term it's unclear why the authors used such a long and intense drug treatment.

      In an initial set of control experiments (Figure S1C-D) we wanted to ensure that protein synthesis was definitely blocked and therefore used a relatively high concentration of anisomycin and a relatively long pre-incubation period. We agree with the Reviewer that we cannot exclude the possibility that this treatment could compromise cell health in addition to the protein synthesis block. Therefore, we carried out an additional experiment with an alternative protein synthesis inhibitor, cycloheximide, at a lower, standard concentration (10 µM), which confirmed a significant reduction in the puromycin signal (Figure S1A-B). Together these results support the conclusion that the puromycin signal is specific to protein synthesis in our labelling assay.

      Furthermore, in the electrophysiology experiments, we used 500 μM anisomycin in the patch pipette solution. Under these conditions, we recorded a stable EPSP baseline for 60 minutes, indicating that the treatment did not cause toxic effects to the cell (Figure S1F). This high concentration would ensure an effective block of local translation at dendritic sites. Nevertheless, we also carried out this experiment with cycloheximide at a lower standard concentration (10 µM) and observed a similar result with both protein synthesis inhibitors (Figure 1F).

      With some of the normalizations (such as those in S1) there are dramatic differences in the baseline "untreated" puromycin intensities - raising some questions about the overall health of slices used in the experiments.

      We agree with the Reviewer that there is a large variability in the normalised puromycin signal which might be due to variability in the health of slices. However, we assume that the same variability would be present in the treated slices, which showed, despite the variability, a significant inhibition of protein synthesis. To avoid any bias by excluding slices with low puromycin signal in the control condition, we present the full dataset.

      The large set of electrophysiology experiments carried out in our study (all recorded cells were evaluated for healthy resting membrane potential, action potential firing, and synaptic responses) confirmed that, generally, the vast majority of our slices were indeed healthy. 

      Reviewer #2 (Public Review):

      Summary:

      The aim was to identify the mechanisms that underlie a form of long-term potentiation (LTP) that requires the activation of dopamine (DA).

      Strengths:

      The authors have provided multiple lines of evidence that support their conclusions; namely that this pathway involves the activation of a cAMP / PKA pathway that leads to the insertion of calcium-permeable AMPA receptors.

      Weaknesses:

      Some of the experiments could have been conducted in a more convincing manner.

      We carried out additional control experiments and analyses to address the specific points that were raised.

      Reviewer #3 (Public Review):

      The manuscript of Fuchsberger et al. investigates the cellular mechanisms underlying dopamine-dependent long-term potentiation (DA-LTP) in mouse hippocampal CA1 neurons. The authors conducted a series of experiments to measure the effect of dopamine on the protein synthesis rate in hippocampal neurons and its role in enabling DA-LTP. The key results indicate that protein synthesis is increased in response to dopamine and neuronal activity in the pyramidal neurons of the CA1 hippocampal area, mediated via the activation of adenylate cyclases subtypes 1 and 8 (AC1/8) and the cAMP-dependent protein kinase (PKA) pathway. Additionally, the authors show that postsynaptic DA-induced increases in protein synthesis are required to express DA-LTP, while not required for conventional t-LTP.

      The increased expression of the newly synthesized GluA1 receptor subunit in response to DA supports the formation of homomeric calcium-permeable AMPA receptors (CP-AMPARs). This evidence aligns well with data showing that DA-LTP expression requires the GluA1 AMPA subunit and CP-AMPARs, as DA-LTP is absent in the hippocampus of a GluA1 genetic knock-out mouse model. Overall, the study is solid, and the evidence provided is compelling. The authors clearly and concisely explain the research objectives, methodologies, and findings. The study is scientifically robust, and the writing is engaging. The authors' conclusions and interpretation of the results are insightful and align well with the literature. The discussion effectively places the findings in a meaningful context, highlighting a possible mechanism for dopamine's role in the modulation of protein-synthesis-dependent hippocampal synaptic plasticity and its implications for the field. Although the study expands on previous works from the same laboratory, the findings are novel and provide valuable insights into the dynamics governing hippocampal synaptic plasticity.

      The claim that GluA1 homomeric CP-AMPA receptors mediate the expression of DA-LTP is fascinating, and although the electrophysiology data on GluA1 knock-out mice are convincing, more evidence is needed to support this hypothesis. Western blotting provides useful information on the expression level of GluA1, which is not necessarily associated with cell surface expression of GluA1 and therefore CP-AMPARs. Validating this hypothesis by localizing the protein using immunofluorescence and confocal microscopy detection could strengthen the claim. The authors should briefly discuss the limitations of the study.

      Although it would be possible to quantify the surface expression of GluA1 using immunofluorescence, it would not be possible to distinguish  between GluA1 homomers and GluA1-containing heteromers. It would therefore not be informative as to whether these are indeed CP-AMPARs. This is an interesting problem, which we have briefly discussed in the Discussion section.

      Additional comments to address:

      (1) In Figure 2A, the representative image with PMY alone shows a very weak PMY signal. Consequently, the image with TTX alone seems to potentiate the PMY signal, suggesting a counterintuitive increase in protein synthesis.

      We agree with the Reviewer that the original image was not representative and have replaced it with a more representative image.

      (2) In Figures 3A-B, the Western blotting representative images have poor quality, especially regarding GluA1 and α-actin in Figure 3A. The quantification graph (Figure 3B) raises some concerns about a potential outlier in both the DA alone and DA+CHX groups. The authors should consider running a statistical test to detect outlier data. Full blot images, including ladder lines, should be added to the supplementary data.

      We have replaced the western blot image in Figure 3A and have also presented full blot images including ladder lines in supplementary Figure S3.

      Using the ROUT method (Q=1%) we identified one outlier in the DA+CHX group of the western blot quantification. The quantification for this blot was then removed from the dataset and the experiment was repeated to ensure a sufficient number of repeats.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Given how the authors perform these experiments with puromycin, these are puromycylation experiments, not SuNSET. The SuNSET protocol (surface sensing of translation) specifically refers to the detection of newly synthesized proteins externally at the plasma membrane. I'd advise updating the terminology used.

      We thank the Reviewer for pointing this out. We have updated this to ‘puromycin-based labelling assay’.

      (2) The legend presented in Figure 2F suggests WT is green and ACKO is orange, however, in Figure 2G the WT LTP trace is orange, consider changing this to green for consistency.

      We thank the Reviewer for this suggestion and agree that a matching colour scheme makes the Figure clearer. This has been updated.

      (3) In the results section, it is recommended to include units for the values presented at the first instance and only again when the units change thereafter.

      The units of the electrophysiology data were [%], this is included in the Results section. Results of western blots and IHC images were presented as [a.u.]. While we included this in the Figures, we have not specifically added this to the text of individual results. 

      (4) Two hours pre-treatment with anisomycin vs 30 minutes pretreatment with cycloheximide seems hard to directly compare - as the pharmacokinetics of translational inhibition should be similar for both drugs. What was the rationale for the extremely long anisomycin pretreatment? What controls were taken to assess slice health either prior to or following fixation? This is relevant to the below point (5).

      In an initial set of control experiments (Figure S1C-D) we wanted to ensure that protein synthesis was definitely blocked and therefore used a relatively high concentration of anisomycin and a relatively long pre-incubation period. We agree with the Reviewer that we cannot exclude the possibility that this treatment could compromise cell health in addition to the protein synthesis block. Therefore, we carried out an additional experiment with an alternative protein synthesis inhibitor, cycloheximide, at a lower, standard concentration (10 µM), which confirmed a significant reduction in the puromycin signal (Figure S1A-B). Together these results support the conclusion that the puromycin signal is specific to protein synthesis in our labelling assay.

      IHC slices were visually assessed for health. The large set of electrophysiology experiments carried out in our study (all recorded cells were evaluated for healthy resting membrane potential, action potential firing, and synaptic responses) also confirmed that, generally, the vast majority of our slices were indeed healthy. 

      (5) In Supplementary Figure 1, there is a dramatic difference in the a.u. intensities across CHX (B) and AM (D), please explain the reason for this. It is understood these are normalised values to nuclear staining, please clarify if this is a nuclear area.

      We agree with the Reviewer that there is a large variability in normalised puromycin signal which may be due to variability in the health of the slices. However, we assume that the same variability would be present in the treated slices, which showed, despite the variability, a significant effect of protein synthesis inhibition. To prevent introducing bias by excluding slices with low puromycin signal in the control condition, we present the full dataset.

      The CA1 region of the hippocampus contains a dense layer of neuronal somata (pyramidal cell layer). We normalized against the nuclear area as it provides a reliable estimate of the number of neurons present in the image. This approach minimizes bias by accounting for variation in the number of neurons within the visual field, ensuring consistency and accuracy in our analysis.

      (6) Please clarify the decision to average both the last 5 minutes of baseline recordings and the last 5 minutes of the recording for the normalisation of EPSP slopes.

      The baseline usually stabilises after a few minutes of recording, thus the last 5 minutes were used for baseline measurement, which are the most relevant datapoints to compare synaptic weight change to. After induction of STDP, potentiation or depression of synaptic weights develops gradually. Based on previous results, evaluating the EPSP slopes at 30-40 minutes after the induction protocol gives a reliable estimate of the amount of plasticity.
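
      The normalisation described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function name `percent_change`, the one-datapoint-per-minute sampling, and the example slope values are all assumptions made for the example.

```python
from statistics import mean


def percent_change(baseline_slopes, post_slopes, window=5):
    """Return the final EPSP slope as a percentage of baseline.

    Both inputs are lists of EPSP slopes, assumed here to hold one
    datapoint per minute. Only the last `window` points of each list
    are averaged.
    """
    if len(baseline_slopes) < window or len(post_slopes) < window:
        raise ValueError("need at least `window` datapoints in each period")
    baseline = mean(baseline_slopes[-window:])
    final = mean(post_slopes[-window:])
    return 100.0 * final / baseline


# Hypothetical slopes (mV/ms): the baseline settles after the first minutes.
baseline = [0.80, 0.90, 1.00, 1.00, 1.00, 1.00, 1.00]
post = [1.00] * 30 + [1.50] * 5
print(percent_change(baseline, post))
```

      Averaging only the final window of each period keeps early, unsettled baseline points and the gradual onset of plasticity out of the estimate.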

      Reviewer #2 (Recommendations For The Authors):

      The concentration of anisomycin used (0.5 mM) is very high.

      As described above, in an initial set of control experiments (Figure S1C-D) we wanted to ensure that protein synthesis was definitely blocked and therefore used a relatively high concentration of anisomycin and a relatively long pre-incubation period. We agree with the Reviewer that this is higher than the standard concentration used for this drug and we cannot exclude the possibility that this treatment could compromise cell health in addition to the protein synthesis block. Therefore, we carried out an additional experiment with an alternative protein synthesis inhibitor, cycloheximide, at a lower, standard concentration (10 µM), which confirmed a significant reduction in the puromycin signal (Figure S1A-B). Together these results support the conclusion that the puromycin signal is specific to protein synthesis in our labelling assay.

      Furthermore, in the electrophysiology experiments, we also used 500 µM anisomycin in the patch pipette solution. Under these conditions, we recorded a stable EPSP baseline for 60 minutes, indicating that the treatment did not cause toxic effects to the cell (Figure S1F). This high concentration would ensure an effective block of local translation at dendritic sites. Nevertheless, we also carried out this experiment with cycloheximide at a lower standard concentration (10 µM) and observed a similar result with both protein synthesis inhibitors (Figure 1F).

      The authors conclude that the effect of DA is mediated via D1/5 receptors, which based on previous work seems likely. But they cannot conclude this from their current study which used a combination of a D1/D5 and a D2 antagonist.

      We thank the Reviewer for pointing this out. We agree and have updated this in the Discussion section to ‘dopamine receptors’, without specifying subtypes.

      There is no mention that I can see that the KO experiments were conducted in a blinded manner (which I believe should be standard practice). Did they verify the KOs using Westerns?

      Only a subset of the experiments was conducted in a blinded manner. However, the results were collected by two independent experimenters, who both observed significant effects in KO mice compared to WTs (TF and ZB).

      We received the DKO mice from a former collaborator, who verified expression levels of the KO mice (Wang et al., 2003). We verified DKO upon arrival in our facility using genotyping.

      Maybe I'm misunderstanding but it appears to me that in Figure 1F there is LTP prior to the addition of DA. (The first point after pairing is already elevated). I think the control of pairing without DA should be added.

      We thank the Reviewer for pointing this out. Based on previous results (Brzosko et al., 2015) we would expect potentiation to develop over time once DA is added after pairing; however, in the Figure it does indeed appear as if there was an immediate increase in synaptic weights after pairing. It should be noted, however, that when comparing the first 5 minutes after pairing to the baseline, this increase was not significant (t(9) = 1.810, p = 0.1037). Nevertheless, we rechecked our data and noticed that this initial potentiation was biased by one cell with an increasing baseline, which had both the test and control pathway strongly elevated. We had mistakenly included this cell in the dataset, despite the unstable conditions (as stated in the Methods section, the unpaired control pathway served as a stability control). We apologise for the error, which has now been corrected (Figure 1F). In addition, we present the control pathway in Figure S1G and I.
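
      As a rough consistency check on the reported statistic, the p value can be recomputed from the t value and degrees of freedom. The sketch below assumes a two-tailed test (the response does not state this explicitly) and integrates the Student's t density numerically using only the standard library; the function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p value via Simpson integration of the density on [0, |t|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # probability mass between -|t| and +|t|
    return 1 - central_area


print(round(two_tailed_p(1.810, 9), 4))
```

      Under the two-tailed assumption, the result should land close to the reported p = 0.1037.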

      We have also now included the control for post-before-pre pairing (Δt = -20 ms) without dopamine in a supplemental figure (Figure S1E and F).

      The Westerns (Figure 3A) are fairly messy. Also, it is better to quantify with total protein. Surface biotinylation of GluA1 and GluA2 would be more informative.

      We carried out more repeats of Western blots and have exchanged blots in Figure 3A.

      We observed that DA increases protein synthesis, we therefore cannot exclude the possibility that application of DA could also affect total protein levels. Thus quantifying with total protein may not be the best choice here. Quantification with actin is standard practice.

      While we agree with the Reviewer that surface biotinylation of GluA1 and GluA2 could in principle be more informative, we do not think it would work well in our experimental setup using acute slice preparation, as it strictly requires intact cells. Slicing generates damaged cells, which would take up the surface biotin reagents. This would cause unspecific biotinylation of the damaged cells, leading to a strong background signal in the assay.

      In Figure 4 panels D and E the baselines are increasing substantially prior to induction. I appreciate that long stable baselines with timing-dependent plasticity may not be possible but it's hard to conclude what happened tens of minutes later when the baseline only appears stable for a minute or two. Panels A and B show that relatively stable baselines are achievable.

      We agree with the Reviewer that the baselines are increasing, however, when looking at the baseline for 5 minutes prior to induction (5 last datapoints of the baseline), which is what we used for quantification, the baselines appeared stable. Unfortunately, longer baselines are not suitable for timing-dependent plasticity. In addition, all experiments were carried out with a control pathway which showed stable conditions throughout the recording.

      In general, the discussion could be better integrated with the current literature. Their experiments are in line with a substantial body of literature that has identified two forms of LTP, based on these signalling cascades, using more conventional induction patterns.

      We thank the Reviewer for this suggestion and have added more discussion of the two forms of LTP in the Discussion section.

      It would be helpful to include the drug concentrations when first described in the results.

      Drug concentrations have now been included in the Results section.

      It is now more common to include absolute t values (not just <0.05 etc).

      While we indicate significance in Figures using asterisks when p values are below the indicated significance levels, we report absolute values of p and t values in the Results section.

      Similarly full blots should be added to an appendix / made available.

      We have now included full blot images in Supplementary Figure S3.

      A 30% tolerance for series resistance seems generous to me. (10-20% would be more typical).

      We thank the Reviewer for their suggestion, and will keep this in mind for future studies. However, the error introduced by the higher tolerance level is likely to be small and would not influence any of the qualitative conclusions of the manuscript.

      Whereas series resistance is of course extremely important in voltage-clamp experiments, changes in series resistance would be less of a concern in current-clamp recordings of synaptic events. We use the amplifier as a voltage follower, and there are two problems with changes in the electrode, or access, resistance. First, there is the voltage drop across the electrode resistance. Clearly this error is zero if no current is injected and is also negligible for the currents we use in our experiments to maintain the membrane voltage at -70 mV. For example, the voltage drop would be 0.2 mV for 20 pA current through a typical 10 MOhm electrode resistance, and a change in resistance of 30% would give less than 0.1 mV voltage change even if the resistance were not compensated. The second problem is distortion of the EPSP shape due to the low-pass filtering properties of the electrode set up by the pipette capacitance and series resistance (RC). This can be a significant problem for fast events, such as action potentials, but less of a problem for the relatively slow EPSPs recorded in pyramidal cells. Nevertheless, we take on board the advice provided by the Reviewer and will use the conventional tolerance of 20% in future experiments.
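
      The arithmetic in this argument can be checked with a short sketch. The 20 pA current and 10 MOhm resistance come from the text above; the 5 pF pipette capacitance is a purely illustrative assumption, as are the function names.

```python
def voltage_drop_mv(current_pa, resistance_mohm):
    """Ohm's law: pA x MOhm = 1e-12 A x 1e6 Ohm = 1e-6 V, i.e. microvolts."""
    microvolts = current_pa * resistance_mohm
    return microvolts / 1000.0


def rc_time_constant_us(resistance_mohm, capacitance_pf):
    """RC time constant: MOhm x pF = 1e6 x 1e-12 s = 1e-6 s, i.e. microseconds."""
    return resistance_mohm * capacitance_pf


holding_current_pa = 20.0   # from the text
series_resistance = 10.0    # MOhm, from the text
pipette_capacitance = 5.0   # pF, assumed for illustration only

drop = voltage_drop_mv(holding_current_pa, series_resistance)
drop_after_change = voltage_drop_mv(holding_current_pa, series_resistance * 1.3)
print(drop, drop_after_change - drop)
print(rc_time_constant_us(series_resistance, pipette_capacitance))
```

      With these values the steady-state drop is 0.2 mV and a 30% rise in resistance adds about 0.06 mV. Under the assumed capacitance, the electrode time constant is in the tens of microseconds.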

      Reviewer #3 (Recommendations For The Authors):

      In the references, the entry for Burnashev N et al. has a different font size. Please ensure that all references are formatted consistently.

      We thank the Reviewer for spotting this and have updated the font size of this reference.

    1. Author response:

      eLife Assessment

      Birdsong production depends on precise neural sequences in a vocal motor nucleus HVC. In this useful biophysical model, Daou and colleagues identify specific biophysical parameters that result in sparse neural sequences observed in vivo. While the model is presently incomplete because it is overfit to produce sequences and therefore not robust to real biological variation, the model has the potential to address some outstanding issues in HVC function.

      We are grateful for the extensive supportive comments from the reviewers, including broad, strong appreciation of the novel aspects of our manuscript. We believe these will be only strengthened in the next submission.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper presents a model for sequence generation in the zebra finch HVC, which adheres to cellular properties measured experimentally. However, the model is fine-tuned and exhibits limited robustness to noise inherent in the inhibitory interneurons within the HVC, as well as to fluctuations in connectivity between neurons. Although the proposed microcircuits are introduced as units for sub-syllabic segments (SSS), the backbone of the network remains a feedforward chain of HVC_RA neurons, similar to previous models.

      Strengths:

      The model incorporates all three of the major types of HVC neurons. The ion channels used and their kinetics are based on experimental measurements. The connection patterns of the neurons are also constrained by the experiments.

      Weaknesses:

      The model is described as consisting of micro-circuits corresponding to SSS. This presentation gives the impression that the model's structure is distinct from previous models, which connected HVC_RA neurons in feedforward chain networks (Jin et al 2007, Li & Greenside, 2006; Long et al 2010; Egger et al 2020). However, the authors implement single HVC_RA neurons into chain networks within each micro-circuit and then connect the end of the chain to the start of the chain in the subsequent micro-circuit. Thus, the HVC_RA neuron in their model forms a single-neuron chain. This structure is essentially a simplified version of earlier models.

      In the model of the paper, the chain network drives the HVC_I and HVC_X neurons. The role of the micro-circuits is more significant in organizing the connections: specifically, from HVC_RA neurons to HVC_I neurons, and from HVC_I neurons to both HVC_X and HVC_RA neurons.

      We thank Reviewer 1 for their thoughtful comments.

      While the reviewer is correct that the propagation of sequential activity in this model is primarily carried by HVC<sub>RA</sub> neurons in a feed-forward manner, we need to emphasize that this is true only if there is no intrinsic or synaptic perturbation to the HVC network. For example, we showed in Figures 10 and 12 how altering the intrinsic properties of HVC<sub>X</sub> neurons or of interneurons disrupts sequence propagation. In other words, while HVC<sub>RA</sub> neurons are the key forces carrying the chain forward, the interplay between excitation and inhibition in our network, as well as the intrinsic parameters of all classes of HVC neurons, are equally important forces in carrying the chain of activity forward. Thus, the stability of activity propagation necessary for song production depends on a finely balanced network of HVC neurons, with all classes contributing to the overall dynamics. Moreover, all existing models that describe premotor sequence generation in HVC either assume a distributed model (Elmaleh et al., 2021), in which local HVC circuitry is not sufficient to advance the sequence and instead depends upon moment-to-moment feedback through Uva (Hamaguchi et al., 2016), or rely on intrinsic connections within HVC to propagate sequential activity. In the latter case, some models assume that HVC is composed of multiple discrete subnetworks that encode individual song elements (Glaze & Troyer, 2013; Long & Fee, 2008; Wang et al., 2008) but lack the local connectivity to link the subnetworks, while other models assume that HVC may have sufficient information in its intrinsic connections to form a single continuous network sequence (Long et al. 2010). The HVC model we present extends the concept of a feedforward network by incorporating additional neuronal classes that influence the propagation of activity (interneurons and HVC<sub>X</sub> neurons). We have shown that any disturbance of the intrinsic or synaptic conductances of these latter neurons will disrupt activity in the circuit even when the properties of HVC<sub>RA</sub> neurons are maintained.

      In regard to the similarities between our model and earlier models, several aspects of our model distinguish it from prior work. In short, while several models of how sequence is generated within HVC have been proposed (Cannon et al., 2015; Drew & Abbott, 2003; Egger et al., 2020; Elmaleh et al., 2021; Galvis et al., 2018; Gibb et al., 2009a, 2009b; Hamaguchi et al., 2016; Jin, 2009; Long & Fee, 2008; Markowitz et al., 2015), all of them rely on intrinsic HVC circuitry to propagate sequential activity, rely on extrinsic feedback to advance the sequence, or rely on both. These models do not capture the complex details of spike morphology, do not include the right ionic currents, do not incorporate all classes of HVC neurons, or do not generate realistic firing patterns as seen in vivo. Our model is the first biophysically realistic model that incorporates all classes of HVC neurons and their intrinsic properties. We tuned the intrinsic and synaptic properties based on the traces collected by Daou et al. (2013) and Mooney and Prather (2005), as shown in Figure 3. The three classes of model neurons incorporated into our network, as well as the synaptic currents that connect them, are based on Hodgkin-Huxley formalisms that contain ion channels and synaptic currents that have been pharmacologically identified. This is an advancement over prior models that primarily focused on the role of synaptic interactions or external inputs. The model is based on a feedforward chain of microcircuits that encode the different sub-syllabic segments and that interact with each other through structured feedback inhibition, defining an ordered sequence of cell firing. Moreover, while several models highlight the critical role of inhibitory interneurons in shaping the timing and propagation of bursts of activity in HVC<sub>RA</sub> neurons, our work offers an intricate and comprehensive model that helps to explain the critical role played by inhibition in shaping song dynamics and ensuring sequence propagation.

      How useful is this concept of micro-circuits? HVC neurons fire continuously even during the silent gaps. There are no SSS during these silent gaps.

      Regarding the concern about the usefulness of the 'microcircuit' concept in our study, we appreciate the comment and we are glad to clarify its relevance in our network. While we acknowledge that HVC<sub>RA</sub> neurons interconnect microcircuits, our model's dynamics are still best described within the framework of microcircuitry particularly due to the firing behavior of HVC<sub>X</sub> neurons and interneurons. Here, we are referring to microcircuits in a more functional sense, rather than rigid, isolated spatial divisions (Cannon et al. 2015). A microcircuit in our model reflects the local rules that govern the interaction between all HVC neuron classes within the broader network, and that are essential for proper activity propagation. For example, HVC<sub>INT</sub> neurons belonging to any microcircuit burst densely and at times other than the moments when the corresponding encoded SSS is being “sung”. What makes a particular interneuron belong to this microcircuit or the other is merely the fact that it cannot inhibit HVC<sub>RA</sub> neurons that are housed in the microcircuit it belongs to. In particular, if HVC<sub>INT</sub> inhibits HVC<sub>RA</sub> in the same microcircuit, some of the HVC<sub>RA</sub> bursts in the microcircuit might be silenced by the dense and strong HVC<sub>INT</sub> inhibition breaking the chain of activity again. Similarly, HVC<sub>X</sub> neurons were selected to be housed within microcircuits due to the following reason: if an HVC<sub>X</sub> neuron belonging to microcircuit i sends excitatory input to an HVC<sub>INT</sub> neuron in microcircuit j, and that interneuron happens to select an HVC<sub>RA</sub> neuron from microcircuit i, then the propagation of sequential activity will halt, and we’ll be in a scenario similar to what was described earlier for HVC<sub>INT</sub> neurons inhibiting HVC<sub>RA</sub> neurons in the same microcircuit.
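
      The membership rule for interneurons described above (an interneuron never inhibits HVC<sub>RA</sub> neurons housed in its own microcircuit) can be illustrated with a toy wiring routine. This is only a sketch of that single rule, not the authors' network construction: the function name, the neuron counts, and the random choice of targets are all hypothetical.

```python
import random


def wire_interneurons(n_microcircuits, ra_per_mc, int_per_mc, targets_per_int, seed=0):
    """Pick HVC_RA targets for each interneuron, excluding its own microcircuit.

    Returns a dict mapping (mc, int_index) -> list of (mc, ra_index) targets.
    """
    rng = random.Random(seed)
    wiring = {}
    for mc in range(n_microcircuits):
        # Candidate targets are HVC_RA neurons housed in every *other* microcircuit.
        candidates = [
            (other, ra)
            for other in range(n_microcircuits)
            if other != mc
            for ra in range(ra_per_mc)
        ]
        for k in range(int_per_mc):
            wiring[(mc, k)] = rng.sample(candidates, targets_per_int)
    return wiring


wiring = wire_interneurons(n_microcircuits=4, ra_per_mc=3, int_per_mc=2, targets_per_int=5)
violations = [
    (src, dst) for src, dsts in wiring.items() for dst in dsts if src[0] == dst[0]
]
print(len(wiring), len(violations))
```

      Because own-microcircuit neurons are removed from the candidate pool before sampling, no within-microcircuit inhibition can be produced, whatever targets are drawn.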

      We agree that no sub-syllabic segments are described during the silent gaps, and we thank the reviewer for pointing this out. Although silent gaps are integral to the overall process of song production, we have not elaborated on them in this model due to the lack of a clear, biophysically grounded representation for the gaps themselves at the level of HVC. Our primary focus has been on modeling the active, syllable-producing phases of the song, where the HVC network’s sequential dynamics are critical for song. However, one can think of silent gaps as being encoded by mechanisms similar to those that encode SSSs, where each gap is encoded by similar microcircuits comprising the three classes of HVC neurons (let us call them GAPs rather than SSSs) that are active only during the silent gaps. In this case, the propagation of sequential activity is carried through the GAPs from the last SSS of the previous syllable to the first SSS of the subsequent syllable. We will emphasize this mechanism more in the revised version of the manuscript.

      A significant issue of the current model is that the HVC_RA to HVC_RA connections require fine-tuning, with the network functioning only within a narrow range of g_AMPA (Figure 2B). Similarly, the connections from HVC_I neurons to HVC_RA neurons also require fine-tuning. This sensitivity arises because the somatic properties of HVC_RA neurons are insufficient to produce the stereotypical bursts of spikes observed in recordings from singing birds, as demonstrated in previous studies (Jin et al 2007; Long et al 2010). In these previous works, to address this limitation, a dendritic spike mechanism was introduced to generate an intrinsic bursting capability, which is absent in the somatic compartment of HVC_RA neurons. This dendritic mechanism significantly enhances the robustness of the chain network, eliminating the need to fine-tune any synaptic conductances, including those from HVC_I neurons (Long et al 2010).

      Why is it important that the model should NOT be sensitive to the connection strengths?

      We thank the reviewer for the comment. While mathematical models of highly complex nonlinear biological processes can only approximate biological realism, the current network is the first sufficiently biologically realistic network model of HVC that explains sequence propagation. We do not include dendritic processes in our network, although doing so would make the dynamics more realistic, for several reasons. 1) The ion channels we integrated into the somatic compartment are known pharmacologically (Daou et al. 2013), but we do not know the intrinsic properties of the dendritic compartments of HVC neurons or the cocktail of ion channels expressed there. 2) We are able to generate realistic bursting in HVC<sub>RA</sub> neurons despite the single compartment, and the main emphasis in this network is on the interactions between excitation and inhibition, the effects of ion channels in modulating sequence propagation, etc. 3) The network model already incorporates thousands of ODEs that govern the dynamics of each of the HVC neurons, so we did not want to add more complexity to the network, especially as we do not know the biophysical properties of the dendritic compartments.

      Therefore, our present focus is on somatic dynamics and the interaction between HVC<sub>RA</sub> and HVC<sub>INT</sub> neurons, but we acknowledge the importance of dendritic processes in enhancing network resiliency. Although we agree that adding dendritic processes improves robustness, we still think that somatic processes alone can offer insightful information on the sequential dynamics of the HVC network. While the network should be robust across a wide range of parameters, it is also essential that certain parameters are designed to filter out weaker signals, ensuring that only reliable, precise patterns of activity propagate. Hence, we specifically chose to make the HVC<sub>RA</sub>-to-HVC<sub>RA</sub> excitatory connections more sensitive (narrow range of values), so that only strong, precise and meaningful stimuli can propagate through the network, reflecting the high stereotypy and precision seen in song production.

      First, the firing of HVC_I neurons is highly noisy and unreliable. HVC_I neurons fire spontaneous, random spikes under baseline conditions. During singing, their spike timing is imprecise and can vary significantly from trial to trial, with spikes appearing or disappearing across different trials. As a result, their inputs to HVC_RA neurons are inherently noisy. If the model relies on precisely tuned inputs from HVC_I neurons, the natural fluctuations in HVC_I firing would render the model non-functional. The authors should incorporate noisy HVC_I neurons into their model to evaluate whether this noise would render the model non-functional.

      We acknowledge that under baseline and singing conditions, interneurons fire in an extremely noisy and imprecise manner, although they exhibit time-locked episodes in their activity (Hahnloser et al 2002, Kozhevnikov and Fee 2007). To mimic the biological variability of these neurons, our model does, in fact, include a stochastic current that reflects the intrinsic noise and random variations in interneuron firing seen in vivo (and we highlight this in the Methods). If necessary, and to make sure the network is resilient to this randomness in interneuron firing, we will investigate different approaches to enhance the noise representation even further and check its effect on sequence propagation.
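
      As a purely illustrative sketch of what adding a stochastic current to a model neuron can look like, the Python code below integrates a leaky integrate-and-fire unit with additive white noise using the Euler-Maruyama scheme. This is not the conductance-based interneuron model of the manuscript: the function name `simulate_noisy_lif`, the membrane parameters, and the noise amplitude `sigma` are all assumptions chosen only to show how noise turns a silent, subthreshold neuron into one that fires at irregular, trial-dependent times.

```python
import numpy as np

def simulate_noisy_lif(sigma, t_max=1000.0, dt=0.1, seed=0):
    """Leaky integrate-and-fire neuron with additive white noise.

    Hypothetical stand-in for a noisy interneuron; every parameter value
    here is an illustrative assumption, not a value from the manuscript.
    Times are in ms and voltages in mV.
    """
    rng = np.random.default_rng(seed)
    tau_m, v_rest, v_thresh, v_reset = 20.0, -65.0, -50.0, -65.0
    drive = 14.0  # constant drive; steady state is -51 mV, just below threshold
    v = v_rest
    spike_times = []
    for step in range(int(round(t_max / dt))):
        # Euler-Maruyama step: deterministic drift plus a noise increment
        # whose standard deviation scales with sqrt(dt).
        noise = sigma * np.sqrt(dt / tau_m) * rng.standard_normal()
        v += dt / tau_m * (v_rest - v + drive) + noise
        if v >= v_thresh:
            spike_times.append(step * dt)
            v = v_reset
    return np.array(spike_times)

# Without noise the neuron stays subthreshold; with noise it fires irregularly,
# and different seeds (trials) give different spike times.
silent = simulate_noisy_lif(sigma=0.0)
trial_a = simulate_noisy_lif(sigma=5.0, seed=0)
trial_b = simulate_noisy_lif(sigma=5.0, seed=1)
```

      In a sketch like this, repeating the simulation across seeds and noise amplitudes is one simple way to ask whether a downstream mechanism tolerates trial-to-trial variability in inhibitory input.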

      Second, Kosche et al. (2015) demonstrated that reducing inhibition by suppressing HVC_I neuron activity makes HVC_RA firing less sparse but does not compromise the temporal precision of the bursts. In this experiment, the local application of gabazine should have severely disrupted HVC_I activity. However, it did not affect the timing precision of HVC_RA neuron firing, emphasizing the robustness of the HVC timing circuit. This robustness is inconsistent with the predictions of the current model, which depends on finely tuned inputs and should, therefore, be vulnerable to such disruptions.

      We thank the reviewer for the comment. The differences between the Kosche et al. (2015) findings and the predictions of our model arise from differences in the aspect of HVC function we are modeling. Our model is more sensitive to inhibition, which is a designed mechanism for achieving precise song patterning. This is a modeling simplification we adopted to capture specific characteristics of HVC function. Hence, the findings of Kosche et al. (2015) do not invalidate the approach of our model, but highlight that HVC likely operates with several redundant mechanisms that together ensure temporal precision. Nevertheless, we will investigate further the effects of the degree of inhibition on song patterning.

      Third, the reliance on fine-tuning of HVC_RA connections becomes problematic if the model is scaled up to include groups of HVC_RA neurons forming a chain network, rather than the single HVC_RA neurons used in the current work. With groups of HVC_RA neurons, the summation of presynaptic inputs to each HVC_RA neuron would need to be precisely maintained for the model to function. However, experimental evidence shows that the HVC circuit remains functional despite perturbations, such as a few degrees of cooling, micro-lesions, or turnover of HVC_RA neurons. Such robustness cannot be accounted for by a model that depends on finely tuned connections, as seen in the current implementation.

      As stated previously, our model of individual HVC<sub>RA</sub> neurons is a reductive model that focuses on understanding the mechanisms that govern sequential neural activity. We agree that scaling the model to include many HVC<sub>RA</sub> neurons poses challenges, specifically concerning the summation of presynaptic inputs. However, our model can still be adapted to a larger network without requiring the level of fine-tuning currently needed. In fact, the current fine-tuning of synaptic connections in the model is a reflection of fundamental network mechanisms rather than a limitation when scaling to a larger network. Moreover, one important feature of this neural network is redundancy. Even if some neurons or synaptic connections are impaired, other neurons or pathways can compensate for these changes, allowing the activity propagation to remain intact.

      The authors examined how altering the channel properties of neurons affects the activity in their model. While this approach is valid, many of the observed effects may stem from the delicate balancing required in their model for proper function.

      In the current model, HVC_X neurons burst as a result of rebound activity driven by the I_H current. Rebound bursts mediated by the I_H current typically require a highly hyperpolarized membrane potential. However, this mechanism would fail if the reversal potential of inhibition is higher than the required level of hyperpolarization. Furthermore, Mooney (2000) demonstrated that depolarizing the membrane potential of HVC_X neurons did not prevent bursts of these neurons during forward playback of the bird's own song, suggesting that these bursts (at least under anesthesia, which may be a different state altogether) are not necessarily caused by rebound activity. This discrepancy should be addressed or considered in the model.

      In our HVC network model, one goal with HVC<sub>X</sub> neurons is to generate bursts in their underlying neuron population. Since HVC<sub>X</sub> neurons in our model receive only inhibitory inputs from interneurons, we rely on inhibition followed by rebound bursts orchestrated by the I<sub>H</sub> and the I<sub>CaT</sub> currents to achieve this goal. The interplay between the T-type Ca<sup>++</sup> current and the H current in our model is fundamental to generating these bursts, as the two currents are sufficient to produce the desired behavior in the network. Because of this interplay, we do not need strong inhibition to generate rebound bursts: the conductance of the T-type Ca<sup>++</sup> current can be made stronger, leading to robust rebound bursting even when the inhibition is not very strong. We will highlight this with more clarity in the revised version.

      Some figures contain direct copies of figures from published papers. It is perhaps a better practice to replace them with schematics if possible.

      We will replace the relevant figures with schematic representations where possible.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors use numerical simulations to try to understand better a major experimental discovery in songbird neuroscience from 2002 by Richard Hahnloser and collaborators. The 2002 paper found that a certain class of projection neurons in the premotor nucleus HVC of adult male zebra finch songbirds, the neurons that project to another premotor nucleus RA, fired sparsely (once per song motif) and precisely (to about 1 ms accuracy) during singing.

      The experimental discovery is important to understand since it initially suggested that the sparsely firing RA-projecting neurons acted as a simple clock that was localized to HVC and that controlled all details of the temporal hierarchy of singing: notes, syllables, gaps, and motifs. Later experiments suggested that the initial interpretation might be incomplete: that the temporal structure of adult male zebra finch songs instead emerged in a more complicated and distributed way, still not well understood, from the interaction of HVC with multiple other nuclei, including auditory and brainstem areas. So at least two major questions remain unanswered more than two decades after the 2002 experiment: What is the neurobiological mechanism that produces the sparse precise bursting: is it a local circuit in HVC or is it some combination of external input to HVC and local circuitry?

      And how is the sparse precise bursting in HVC related to a songbird's vocalizations?

      The authors only investigate part of the first question, whether the mechanism for sparse precise bursts is local to HVC. They do so indirectly, by using conductance-based Hodgkin-Huxley-like equations to simulate the spiking dynamics of a simplified network that includes three known major classes of HVC neurons and such that all neurons within a class are assumed to be identical. A strength of the calculations is that the authors include known biophysically deduced details of the different conductances of the three major classes of HVC neurons, and they take into account what is known, based on sparse paired recordings in slices, about how the three classes connect to one another. One weakness of the paper is that the authors make arbitrary and not well-motivated assumptions about the network geometry, and they do not use the flexibility of their simulations to study how their results depend on their network assumptions. A second weakness is that they ignore many known experimental details such as projections into HVC from other nuclei, dendritic computations (the somas and dendrites are treated by the authors as point-like isopotential objects), the role of neuromodulators, and known heterogeneity of the interneurons. These weaknesses make it difficult for readers to know the relevance of the simulations for experiments and for advancing theoretical understanding.

      Strengths:

      The authors use conductance-based Hodgkin-Huxley-like equations to simulate spiking activity in a network of neurons intended to model more accurately songbird nucleus HVC of adult male zebra finches. Spiking models are much closer to experiments than models based on firing rates or on 2-state neurons.

      The authors include information deduced from modeling experimental current-clamp data such as the types and properties of conductances. They also take into account how neurons in one class connect to neurons in other classes via excitatory or inhibitory synapses, based on sparse paired recordings in slices by other researchers.

      The authors obtain some new results of modest interest such as how changes in the maximum conductances of four key channels (e.g., A-type K<sup>+</sup> currents or Ca-dependent K<sup>+</sup> currents) influence the structure and propagation of bursts, while simultaneously being able to mimic accurately current-clamp voltage measurements.

      Weaknesses:

      One weakness of this paper is the lack of a clearly stated, interesting, and relevant scientific question to try to answer. In the introduction, the authors do not discuss adequately which questions recent experimental and theoretical work have failed to explain adequately, concerning HVC neural dynamics and its role in producing vocalizations. The authors do not discuss adequately why they chose the approach of their paper and how their results address some of these questions.

      For example, the authors need to explain in more detail how their calculations relate to the works of Daou et al, J. Neurophys. 2013 (which already fitted spiking models to neuronal data and identified certain conductances), to Jin et al J. Comput. Neurosci. 2007 (which already discussed how to get bursts using some experimental details), and to the rather similar paper by E. Armstrong and H. Abarbanel, J. Neurophys 2016, which already postulated and studied sequences of microcircuits in HVC. This last paper is not even cited by the authors.

      We thank the reviewer for this valuable comment, and we agree that we did not clarify enough throughout the paper the utility of our model or how it advanced our understanding of the HVC dynamics and circuitry. To that end, we will revise several places of the manuscript and make sure to cite and highlight the relevance and relatedness of the mentioned papers.

      In short, and as mentioned to Reviewer 1, while several models of how sequence is generated within HVC have been proposed (Cannon et al., 2015; Drew & Abbott, 2003; Egger et al., 2020; Elmaleh et al., 2021; Galvis et al., 2018; Gibb et al., 2009a, 2009b; Hamaguchi et al., 2016; Jin, 2009; Long & Fee, 2008; Markowitz et al., 2015; Jin et al., 2007), all the models proposed either rely on intrinsic HVC circuitry to propagate sequential activity, rely on extrinsic feedback to advance the sequence or rely on both. These models do not capture the complex details of spike morphology, do not include the right ionic currents, do not incorporate all classes of HVC neurons, or do not generate realistic firing patterns as seen in vivo. Our model is the first biophysically realistic model that incorporates all classes of HVC neurons and their intrinsic properties.

      Our model does not challenge any existing hypothesis; rather, it is a distillation of the various models that have been proposed for the HVC network. We go over this in detail in the Discussion. We believe that the network model we developed provides a step forward in describing the biophysics of HVC circuitry, and may throw new light on certain dynamics in the mammalian brain, particularly in the motor cortex and the hippocampus, where precisely timed sequential activity is crucial. We suggest that temporally precise sequential activity may be a manifestation of neural networks composed of a chain of microcircuits, each containing pools of excitatory and inhibitory neurons, with local interplay among neurons of the same microcircuit and global interplay across the various microcircuits, and with structured inhibition as well as intrinsic properties synchronizing the neuronal pools and stabilizing timing within a firing sequence.

      The authors' main achievement is to show that simulations of a certain simplified and idealized network of spiking neurons, which includes some experimental details but ignores many others, match some experimental results like current-clamp-derived voltage time series for the three classes of HVC neurons (although this was already reported in earlier work by Daou and collaborators in 2013), and simultaneously the robust propagation of bursts with properties similar to those observed in experiments. The authors also present results about how certain neuronal details and burst propagation change when certain key maximum conductances are varied.

      However, these are weak conclusions for two reasons. First, the authors did not do enough calculations to allow the reader to understand how many parameters were needed to obtain these fits and whether simpler circuits, say with fewer parameters and simpler network topology, could do just as well. Second, many previous researchers have demonstrated robust burst propagation in a variety of feed-forward models. So what is new and important about the authors' results compared to the previous computational papers?

      A major novelty of our work is the integration of experimental data into a detailed network model. While earlier works have established robust burst propagation, our model uses realistic ion channel kinetics and feedback inhibition not only to reproduce experimental neural activity patterns but also to suggest prospective mechanisms for song sequence production in the most biophysical way possible. This aspect distinguishes our work from other feed-forward models. We go over this in detail in the Discussion. However, the reviewer is right regarding the details of the calculations conducted for the fits, and we will describe them in more detail in the Methods and throughout the manuscript.

      We believe that the network model we developed provides a step forward in describing the biophysics of HVC circuitry, and may throw new light on certain dynamics in the mammalian brain, particularly in the motor cortex and the hippocampus, where precisely timed sequential activity is crucial. We suggest that temporally precise sequential activity may be a manifestation of neural networks composed of a chain of microcircuits, each containing pools of excitatory and inhibitory neurons, with local interplay among neurons of the same microcircuit and global interplay across the various microcircuits, and with structured inhibition as well as intrinsic properties synchronizing the neuronal pools and stabilizing timing within a firing sequence.

      Also missing is a discussion, or at least an acknowledgment, of the fact that not all of the fine experimental details of undershoots, latencies, spike structure, spike accommodation, etc may be relevant for understanding vocalization. While it is nice to know that some models can match these experimental details and produce realistic bursts, that does not mean that all of these details are relevant for the function of producing precise vocalizations. Scientific insights in biology often require exploring which of the many observed details can be ignored and especially identifying the few that are essential for answering some questions. As one example, if HVC-X neurons are completely removed from the authors' model, does one still get robust and reasonable burst propagation of HVC-RA neurons? While part of the nucleus HVC acts as a premotor circuit that drives the nucleus RA, part of HVC is also related to learning. It is not clear that HVC-X neurons, which carry out some unknown calculation and transmit information to area X in a learning pathway, are relevant for burst production and propagation of HVC<sub>RA</sub> neurons, and so relevant for vocalization. Simulations provide a convenient and direct way to explore questions of this kind.

      One key question to answer is whether the bursting of HVC-RA projection neurons is based on a mechanism local to HVC or is some combination of external driving (say from auditory nuclei) and local circuitry. The authors do not contribute to answering this question because they ignore external driving and assume that the mechanism is some kind of intrinsic feed-forward circuit, which they put in by hand in a rather arbitrary and poorly justified way, by assuming the existence of small microcircuits consisting of a few HVC-RA, HVC-X, and HVC-I neurons that somehow correspond to "sub-syllabic segments". To my knowledge, experiments do not suggest the existence of such microcircuits nor does theory suggest the need for such microcircuits.

      Recent results showed a tight correlation between the intrinsic properties of neurons and features of song (Daou and Margoliash 2020, Medina and Margoliash 2024), where adult birds that exhibit similar songs tend to have similar intrinsic properties. While this is relevant, we acknowledge that not all details may be necessary for every aspect of vocalization, and future models could simply concentrate on core dynamics and exclude certain features while still providing insights into the primary mechanisms.

      Regarding the question of whether HVC<sub>X</sub> neurons are relevant for burst propagation, our model includes these neurons as part of the network for completeness. The reviewer is correct that the propagation of sequential activity in this model is primarily carried by HVC<sub>RA</sub> neurons in a feed-forward manner, but only if there is no perturbation to the HVC network. For example, we have shown how altering the intrinsic properties of HVC<sub>X</sub> neurons or of interneurons disrupts sequence propagation. In other words, while HVC<sub>RA</sub> neurons are the key forces carrying the chain forward, the interplay between excitation and inhibition in our network, as well as the intrinsic parameters of all classes of HVC neurons, are equally important in carrying the chain of activity forward. Thus, the stability of activity propagation necessary for song production depends on a finely balanced network of HVC neurons, with all classes contributing to the overall dynamics.

      We agree with the reviewer, however, that a potential drawback of our model is that its sole focus is on local excitatory connectivity within the HVC (Kornfeld et al., 2017; Long et al., 2010), while HVC neurons receive afferent excitatory connections (Akutagawa & Konishi, 2010; Nottebohm et al., 1982) that play significant roles in their local dynamics. For example, the excitatory inputs that HVC neurons receive from Uvaeformis may be crucial in initiating (Andalman et al., 2011; Danish et al., 2017; Galvis et al., 2018) or sustaining (Hamaguchi et al., 2016) the sequential activity. While we acknowledge this limitation, our main contribution in this work is the biophysical insight into how the patterning activity in HVC is largely shaped by the intrinsic properties of the individual neurons as well as the synaptic properties, where excitation and inhibition play a major role in enabling neurons to generate their characteristic bursts during singing. This holds irrespective of whether an external drive is injected into the microcircuits or not. We will, however, elaborate on and investigate this further in the next submission.

      Another weakness of this paper is an unsatisfactory discussion of how the model was obtained, validated, and simulated. The authors should state as clearly as possible, in one location such as an appendix, what is the total number of independent parameters for the entire network and how parameter values were deduced from data or assigned by hand. With enough parameters and variables, many details can be fit arbitrarily accurately so researchers have to be careful to avoid overfitting. If parameter values were obtained by fitting to data, the authors should state clearly what the fitting algorithm was (some iterative nonlinear method, whose results can depend on the initial choice of parameters), what the error function used for fitting (sum of least squares?) was, and what data were used for the fitting.

      The authors should also state clearly the dynamical state of the network, the vector of quantities that evolve over time. (What is the dimension of that vector, which is also the number of ordinary differential equations that have to be integrated?) The authors do not mention what initial state was used to start the numerical integrations, whether transient dynamics were observed and what were their properties, or how the results depended on the choice of the initial state. The authors do not discuss how they determined that their model was programmed correctly (it is difficult to avoid typing errors when writing several pages or more of a code in any language) or how they determined the accuracy of the numerical integration method beyond fitting to experimental data, say by varying the time step size over some range or by comparing two different integration algorithms.

      We thank the reviewer again. The fitting process in our model occurred only at the first stage, where the synaptic parameters were fit to the Mooney and Prather results as well as the Kosche results. No data were shared with us; we merely examined the figures in those papers, checked the amplitude of the elicited currents, the magnitudes of DC-evoked excitations, etc., and replicated them in our model. While this is suboptimal, we preferred to start this way rather than take equations for synaptic currents from the literature on other types of neurons (not HVC neurons, and not even from songbirds) and integrate them into our network model. However, we will certainly highlight the details of this fitting process in the new submission. We will also provide more technical details in the Methods regarding the exact number of ODEs, the initial conditions used to run them, etc.
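
      The reviewer's suggestion of checking numerical accuracy by varying the time step can be illustrated with a minimal Python sketch. It is not the network model or the integrator of the manuscript: it assumes a single passive membrane equation, dV/dt = (V_inf - V)/tau, integrated with forward Euler, and the names `euler_membrane` and `exact_membrane` and all parameter values are hypothetical. Because this toy equation has a closed-form solution, the error can be measured directly. For a full network, where no exact solution exists, the analogous check would compare runs at successively halved time steps or against a second integration algorithm.

```python
import numpy as np

def euler_membrane(dt, t_end=50.0, tau=10.0, v0=-65.0, v_inf=-50.0):
    """Forward-Euler integration of dV/dt = (v_inf - V) / tau up to t_end (ms)."""
    v = v0
    for _ in range(int(round(t_end / dt))):
        v += dt * (v_inf - v) / tau
    return v

def exact_membrane(t_end=50.0, tau=10.0, v0=-65.0, v_inf=-50.0):
    """Closed-form solution of the same equation at time t_end."""
    return v_inf + (v0 - v_inf) * np.exp(-t_end / tau)

# Halve the time step twice and record the absolute error at t_end.
time_steps = (0.1, 0.05, 0.025)
errors = [abs(euler_membrane(dt) - exact_membrane()) for dt in time_steps]

# For a first-order method, each halving of dt should roughly halve the error.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
```

      If the error ratios depart strongly from the expected order of the method, that usually points to a time step that is too coarse or to a programming error, which is the kind of validation the reviewer asks the authors to report.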

      Also disappointing is that the authors do not make any predictions to test, except rather weak ones such as that varying a maximum conductance sufficiently (which might be possible by using dynamic clamps) might cause burst propagation to stop or change its properties. Based on their results, the authors do not make suggestions for further experiments or calculations, but they should.

      We agree that making experimentally testable predictions is crucial for the advancement of the model. Our predictions include testing whether eliminating a class of neurons, such as HVC<sub>X</sub> neurons, disrupts activity propagation, which can be done through targeted neuron elimination. This can also be tested by preventing rebound bursting in HVC<sub>X</sub> neurons through pharmacological blockade of the I<sub>h</sub> channels. Other predictions include downregulating certain ion channels (pharmacologically, using channel blockers) and testing which current is fundamental for song production (our results suggest plenty of such tests, involving the SK current, the T-type Ca<sup>++</sup> current, the A-type K<sup>+</sup> current, etc.). We will incorporate these into the revised manuscript to better demonstrate the model's applicability and to guide future research directions.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Structural colors (SC) are based on nanostructures reflecting and scattering light and producing optical wave interference. All kinds of living organisms exhibit SC. However, understanding the molecular mechanisms and genes involved may be complicated due to the complexity of these organisms. Hence, bacteria that exhibit SC in colonies, such as Flavobacterium IR1, can be good models.

      Based on previous genomic mining and co-occurrence with SC in flavobacterial strains, this article focuses on the role of a specific gene, moeA, in SC of Flavobacterium IR1 strain colonies on an agar plate. moeA is involved in the synthesis of the molybdenum cofactor, which is necessary for the activity of key metabolic enzymes in diverse pathways.

      The authors clearly showed that the absence of moeA shifts SC properties in a way that depends on the nutritional conditions. They further bring evidence that this effect was related to several properties of the colony, all impacted by the moeA mutant: cell-cell organization, cell motility and colony spreading, and metabolism of complex carbohydrates. Hence, by linking SC to a single gene in appearance, this work points to cellular organization (as a result of cell-cell arrangement and motility) and metabolism of polysaccharides as key factors for SC in a gliding bacterium. This may prove useful for designing molecular strategies to control SC in bacterial-based biomaterials.

      Strengths:

      The topic is very interesting from a fundamental viewpoint and has great potential in the field of biomaterials.

      Thank you for your comments.

      The article is easy to read. It builds on previous studies with already established tools to characterize SC at the level of the flavobacterial colony. Experiments are well described and well executed. In addition, the SIBR-Cas method for chromosome engineering in Flavobacteria is the most recent and is a leap forward for future studies in this model, even beyond SC.

      We appreciate these comments.

      Weaknesses:

      The paper appears a bit too descriptive and could be better organized. Some of the results, in particular the proteomic comparison, are not well exploited (not explored experimentally). In my opinion, the problem originates from the difficulty in explaining the link between the absence of moeA and the alterations observed at the level of colony spreading and polysaccharide utilization, and the variation in proteomic content.

      We will look at the organisation of the manuscript carefully in the coming, detailed revision, as suggested. In terms of the proteomics, there are clearly a large number of proteins affected by the moeA deletion. In terms of experimental exploration, we chose spreading, structural colour formation and starch degradation to test phenotypically, as the most relevant. For example, in L615-617, we discuss the downregulation of GldL (which is known to be involved in Flavobacterial gliding motility [Shrivastava et al., 2013]) in the moeA KO as a possible explanation for the reduced colony spreading of the moeA mutant. Changes in polysaccharide (starch) utilization were seen on solid medium, as well as in the proteomic profile, where we observed the upregulation of carbohydrate metabolism proteins linked to PUL (polysaccharide utilisation locus) operons (Terrapon et al., 2015), such as PAM95095-90 (Figure 8), and other carbohydrate metabolism-related proteins, including a pectate lyase (Table S7) which is involved in starch degradation (Aspeborg et al., 2012). As noted in L555-566 and Figure 9, starch metabolism was tested experimentally.

      First, the effect of moeA deletion on molybdenum cofactor synthesis should be addressed.

      MoeA is the last enzyme in the MoCo synthesis pathway; thus, if only MoeA is absent, the cell would accumulate MPT-AMP (molybdopterin-adenosine monophosphate) (Iobbi-Nivol & Leimkühler, 2013), and the expressed molybdoenzymes would not be functional. In L582-585, we commented on how the lack of molybdenum cofactor may affect the synthesis of molybdoenzymes. However, if you meant analysing the presence of the small molecules, the cofactors, involved in these pathways, that was an assay we were not able to perform. Moreover, in L585-587, we addressed how the deletion of moeA affected the proteins encoded by the rest of the genes in the operon.

      Second, as I was reading the entire manuscript, I kept asking myself if moeA (and by extension molybdenum cofactor) was really involved in SC or if it was an indirect effect. For example, what if the absence of moeA alters the cell envelope because the synthesis of its building blocks is perturbed, and then subsequently perturbs all related processes, including gliding motility and protein secretion? It would help to know if the effects on colony spreading and polysaccharide metabolism can be uncoupled. I don't think the authors discussed that clearly.

      The message of the paper is that the moeA gene, as predicted from a previous genomics analysis, is important in SC. This is based on the representation of the moeA gene in genomes of bacteria that display SC. This analysis does not predict the mechanism. When the gene was knocked out, a significant change in structural colour occurred, supporting this hypothesis. Whether this effect is direct or indirect is difficult to assess, as this referee rightly suggests. To follow up this central result, we performed proteomics (both intra- and extracellular). As we observed, the deletion of a single gene generated many changes in the proteomic profile, and thus in the biological processes. Based on the known functions of molybdenum cofactor, we could only hypothesize that pterin metabolism is important for SC, not exactly how.

      We intend to discuss the links between gliding/spreading and polysaccharide metabolism more clearly, with reference to the literature, as quite a bit is known here including possible links to SC.

      Reviewer #2 (Public review):

      Summary:

      The authors constructed an in-frame deletion of moeA gene, which is involved in molybdopterin cofactor (MoCo) biosynthesis, and investigated its role in structural colors in Flavobacterium IR1. The deletion of moeA shifted colony color from green to blue, reduced colony spreading, and increased starch degradation, which was attributed to the upregulation of various proteins in polysaccharide utilization loci. This study lays the ground for developing new colorants by modifying genes involved in structural colors.

      Major strengths and weaknesses:

      The authors conducted well-designed experiments with appropriate controls and the results in the paper are presented in a logical manner, which supports their conclusions.

      We appreciate your comment.

      Using statistical tests to compare the differences between the wild type and moeA mutant, and adding a significance bar in Figure 4B, would strengthen their claims regarding differences in cell motility.

      Thank you. Figure 4B contains the significance bars that represent the standard deviation of the mean value of the three replicates, but we will modify it to make them clearer.

      Additionally, in the result section (Figure 6), the authors suggest that the shift in blue color is "caused by cells which are still highly ordered but narrower", which to my knowledge is not backed up by any experimental evidence.

      Thanks. We mentioned that the mutant cells are narrower than the wild type based on the estimated periodicity resulting from the goniometry analysis (L427-430). We will now say “likely to be narrower based on the estimated periodicity from the optical analysis” rather than just “narrower” in the revision.

      Overall, this is a well-written paper in which the authors effectively address their research questions through proper experimentation. This work will help us understand the genetic basis of structural colors in Flavobacterium and open new avenues to study the roles of additional genes and proteins in structural colors.

      Much appreciated.

      REFERENCES

      Aspeborg, Henrik, Pedro M. Coutinho, Yang Wang, Harry Brumer, and Bernard Henrissat. "Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5)." BMC evolutionary biology 12 (2012): 1-16.

      Iobbi-Nivol, Chantal, and Silke Leimkühler. "Molybdenum enzymes, their maturation and molybdenum cofactor biosynthesis in Escherichia coli." Biochimica et Biophysica Acta (BBA)-Bioenergetics 1827, no. 8-9 (2013): 1086-1101.

      Shrivastava, Abhishek, Joseph J. Johnston, Jessica M. Van Baaren, and Mark J. McBride. "Flavobacterium johnsoniae GldK, GldL, GldM, and SprA are required for secretion of the cell surface gliding motility adhesins SprB and RemA." Journal of bacteriology 195, no. 14 (2013): 3201-3212.

      Terrapon, Nicolas, Vincent Lombard, Harry J. Gilbert, and Bernard Henrissat. "Automatic prediction of polysaccharide utilization loci in Bacteroidetes species." Bioinformatics 31, no. 5 (2015): 647-655.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study explored a molecular comparison of smooth muscle and neighboring fibroblast cells found in lung blood vessels afflicted by a disease called pulmonary arterial hypertension. In doing so, the authors described distinct disease-associated states of each of these cell types with further insights into the cellular communication and crosstalk between them. The strength of evidence was convincing through the use of complementary and sophisticated tools, accompanied by rare isolation of human diseased lung blood vessel cells that were source-matched to the same donor for direct comparison.

      We thank the editors and reviewers for their highly positive and encouraging assessment of our manuscript detailing the cell state changes of arterial smooth muscle cells and fibroblasts in the pulmonary bed. We addressed the reviewers’ major comments in the revised manuscript by providing validation of key in vitro findings, such as preserved marker localization and increased GAG deposition in IPAH pulmonary arteries. We additionally provide a comparison of transcriptomic profiles spanning fresh, very early and late passage cells. Finally, we present expanded experimental data in support of cellular crosstalk, including testing of additional PAAF ligands on donor PASMC and the influence of PTX3/HGF on IPAH PASMC.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors isolated and cultured pulmonary artery smooth muscle cells (PASMC) and pulmonary artery adventitial fibroblasts (PAAF) of the lung samples derived from the patients with idiopathic pulmonary arterial hypertension (PAH) and the healthy volunteers. They performed RNA-seq and proteomics analyses to detail the cellular communication between PASMC and PAAF, which are the main target cells of pulmonary vascular remodeling during the pathogenesis of PAH. The authors revealed that PASMC and PAAF retained their original cellular identity and acquired different states associated with the pathogenesis of PAH, respectively.

      Strengths:

      Although previous studies have shown that PASMC and PAAF cells each have an important role in the pathogenesis of PAH, there have been scarce reports focusing on the interactions between PASMC and PAAF. These findings may provide valuable information for elucidating the pathogenesis of pulmonary arterial hypertension.

      We appreciate the reviewer’s positive view of our study.

      Weaknesses:

      The results of proteome analysis using primary culture cells in this paper seem a bit insufficient to draw conclusions. In particular, the authors described "We elucidated the involvement of cellular crosstalk in regulating cell state dynamics and identified pentraxin-3 and hepatocyte growth factor as modulators of PASMC phenotypic transition orchestrated by PAAF." However, the presented data are considered limited and insufficient.

      We thank the reviewer for drawing our attention to this point and have accordingly modified the conclusion section to read: “We investigated the involvement of cellular crosstalk….” Moreover, we provide further experimental evidence demonstrating the effect of both PTX3 and HGF on cell state marker expression in IPAH-PASMC cells (Figure 7H). In addition, we clarify the selection strategy applied to investigate particular PAAF-secreted ligands and test three additional ligands on donor PASMC (Figure S8), supporting the original focus on PTX3 and HGF.

      Reviewer #2 (Public Review):

      Summary:

      Utilizing a combination of transcriptomic and proteomic profiling as well as cellular phenotyping from source-matched PASMC and PAAFs in IPAH, this study sought to explore a molecular comparison of these cells in order to track distinct cell fate trajectories and acquisition of their IPAH-associated cellular states. The authors also aimed to identify cell-cell communication axes in order to infer mechanisms by which these two cells interact and depend upon external cues. This study will be of interest to the scientific and clinical communities of those interested in pulmonary vascular biology and disease. It also will appeal to those interested in lung and vascular development as well as multi-omic analytic procedures.

      We thank the reviewer for the overall highly positive assessment of our study.

      Strengths:

      (1) This is one of the first studies using orthogonal sequencing and phenotyping for the characterization of source-matched neighboring mesenchymal PASMC and PAAF cells in healthy and diseased IPAH patients. This is a major strength that allows for direct comparison of neighboring cell types and the ability to address an unanswered question regarding the nature of these mesenchymal "mural" cells at a precise molecular level.

      We value the reviewer’s kind and objective summary of our study.

      (2) Unlike a number of multi-omic sequencing papers that read more as an atlas of findings without structure, the inherent comparative organization of the study and presentation of the data were valuable in aiding the reader in understanding how to discern the distinct IPAH-associated cell states. As a result, the reader not only gleans greater insight into these two interacting cell types in disease but also now can leverage these datasets more easily for future research questions in this space.

      We thank the reviewer for this highly positive comment.

      (3) There are interesting and surprising findings in the cellular characterizations, including the low proliferative state of IPAH-PASMCs as compared to the hyperproliferative state in IPAH-PAAFs. Furthermore, the cell-cell communication axes involving ECM components and soluble ligands provided by PAAFs that direct cell state dynamics of PASMCs offer some of the first and foundational descriptions of what are likely complex cellular interactions that await discovery.

      We agree with the reviewer’s assessment that some of the novel data in our study help to formulate testable hypotheses that can be pursued in more focused follow-up research.

      (4) Technical rigor is quite high in the -omics methodology and in vitro phenotyping tools used.

      We are grateful for the reviewer’s assessment of our work and positive recognition.

      Weaknesses:

      There are some weaknesses in the methodology that should temper the conclusions:

      (1) The number of donors sampled for PAAF/PASMCs was small for both healthy controls and IPAH patients. Thus, while the level of detail of -omics profiling was quite deep, the generalizability of their findings to all IPAH patients or Group 1 PAH patients is limited.

      We appreciate the reviewer’s concerns regarding the generalizability of the findings and have acknowledged this as a study limitation in the discussion: “A low case number and end-stage disease samples used for omics characterization represents a study limitation that has to be taken into account before assuming similar findings would be evident in the entire PAH patient population over the course of the disease development and progression”. We have addressed this issue by validating key in vitro findings using fresh cells or FFPE lung material from additional independent samples in the revised manuscript (Figures 2D, 3D, 3H, 4H). For transparency, we provide the number of biological samples in the Results section of the modified manuscript.

      (2) While the study utilized early passage cells, these cells nonetheless were still cultured outside the in vivo milieu prior to analysis. Thus, while there is an assumption that these cells do not change fundamental behavior outside the body, that is not entirely proven for all transcriptional and proteomic signatures. As such, the major alterations that are noted would be more compelling if validated from tissue or cells derived directly from in vivo sources. Without such validation, the major limitation of the impact and conclusions of the paper is that the full extent of the relevance of these findings to human disease is not known.

      We thank the reviewer for this constructive and excellent suggestion. The comparison of fresh and cultured cells revealed a strong and early divergence of differentially regulated pathways for PAAF, and a more gradual transition for PASMC. The results of this analysis are included in the new Figures 2D, 3D, 3H, and 4H. Implications are discussed in the revised manuscript: “However, the same mechanism renders cells susceptible to phenotypic change induced simply by extended in vitro culturing, testified by broad expression profile differences between fresh and cultured cells. This is a common caveat in cell biology research and represents a technical and practical tradeoff that requires cross validation of key findings. Using a combination of archived lung tissue and an available single cell RNA sequencing dataset of human pulmonary arteries, we show that some of the key defining phenotypic features of diseased cells, such as altered proliferation rate and ECM production, are preserved and gradually lost upon prolonged culturing”.

      (3) While the presentation of most of the manuscript was quite clear and convincing, the terminology and conclusions regarding "cell fate trajectories" throughout the manuscript did not seem to be fully justified. That is, all of the analyses were derived from cells originating from end-stage IPAH, and otherwise, the authors were not lineage tracing across disease initiation or development (which would be impossible currently in humans). So, while the description of distinct "IPAH-associated states" makes sense, any true cell fate trajectory was not clearly defined.

      In accordance with the reviewer’s comment, we have decided to modify the wording to exclude the “cell fate trajectory” phrase and replace it with “acquisition of disease cell state”.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) In Figure 1, PASMC and PAAF were collected from the lungs of healthy donors and analyzed for transcriptomics and proteomics; in Figure 1A, it can be taken as if both cells from IPAH patients were also analyzed, but this is not reflected in the results. In Figure 1D, immunostaining of normal lungs confirms the localization of PASMC and PAAF markers found by transcriptomics. The authors describe a strong, but not perfect, correlation between the transcriptomics and proteomics data from Figure S1, but the gene names of each cellular marker they found should also be listed. In addition, the authors have observed the expression of markers characteristic of PASMC and PAAF in pulmonary vessels of healthy subjects by IH, but is there any novelty in these markers? Furthermore, are the expression sites of these markers altered in IPAH patients?

      In the revised manuscript we have adjusted the schematic to reflect the fact that only donor cells are compared in Figure 1. We additionally provide a correlation of cell type markers between proteomic and transcriptomic data sets for those molecules that are detected in both datasets (Figure S1B).

      We provide clarification on the novelty aspect in the result section: “Some of the molecules were previously associated with predominant SMC, such as RGS5 and CSPR1 (Crnkovic et al., 2022; Snider et al., 2008), or adventitial fibroblast, such as SCARA5, CFD and MGST1 (Crnkovic et al., 2022; Sikkema et al., 2023) expression”. Except for RGS5, expression and localization of other markers in IPAH was previously unknown.

      The conservation of expression sites for reported markers was validated in IPAH in the revised manuscript (Figure 2D), with IGFBP5 showing dual localization in both cell types. Moreover, results in Figure 1D, 1E and 2D support the validity of omics findings and preservation of key markers during passaging.

      (2) In Figure 2, the authors compare PASMC and PAAF derived from IPAH patients and donors. The results show that transcriptomics and proteomics changes are clearly differentiated by cell type and not by pathological state. In the pathological state, transcriptional changes are more pronounced. The GO analysis of the factors that showed significant changes in each cell type is shown in Figure 2E, but the differences between the GO analysis of the transcriptomics and proteomics results are not clearly shown. The reviewer believes that the advantages of a combined analysis of both should be indicated. Also, in Figure 2G, the GAG content in PA appears to be elevated in only 3 cases, while the other 5 cases appear to be at the same level as the donor; is there a characteristic change in these 3 cases? Figure 2I shows that the phenotype of PAAF changes with cell passages. Since this phenomenon would be interesting and useful to the reader, additional discussion regarding the mechanism would be desired.

      We have integrated both data sets in order to achieve a stronger and more meaningful analysis, because the correlation between the transcriptomic and proteomic datasets is incomplete, as indicated in the results section: “Comparative analysis of transcriptomic and proteomic data sets revealed a strong, but not complete level of linear correlation between the gene and protein expression profiles (Figure S1B, C). We therefore decided to use an integrative dataset and analyzed all significantly enriched genes and proteins (-log10(P)>1.3) between both cell types to achieve stronger and more robust analysis”. In general, the proteomic profile showed fewer significant differences and a smaller extent of change than the transcriptomic profile, likely due to technical limitations and the sensitivity of the method, as testified by the complete absence of the top transcriptomic molecules (RGS5, ADH1C, IGFBP5, CFD, SCARA5) from the protein dataset.
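      As a reading aid for the cut-off quoted above, the short sketch below shows the arithmetic behind -log10(P)>1.3, which corresponds to roughly P<0.05. The feature names and P values are hypothetical placeholders chosen for illustration; they are not results from the study, and the helper names are not taken from the authors' analysis code.

```python
import math

# Cut-off quoted in the response: -log10(P) > 1.3, i.e. roughly P < 0.05.
THRESHOLD = 1.3


def neg_log10(p_value):
    """Return -log10(P) for a P value in (0, 1]."""
    return -math.log10(p_value)


def is_significant(p_value, threshold=THRESHOLD):
    """True when the feature passes the -log10(P) cut-off."""
    return neg_log10(p_value) > threshold


# Hypothetical placeholder values, for illustration only.
example_p_values = {"feature_a": 0.001, "feature_b": 0.20, "feature_c": 0.049}

# Keep only the features that pass the cut-off, with their -log10(P) score.
kept = [
    (name, round(neg_log10(p), 2))
    for name, p in example_p_values.items()
    if is_significant(p)
]
print(kept)
```

      Working on the -log10 scale is a presentation convention: it turns small P values into large positive scores, so the same threshold can be drawn as a horizontal line on a volcano-style plot.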

      To strengthen the findings of increased GAG in IPAH pulmonary arteries, we have performed compartment-specific, quantitative image analysis of Alcian blue staining on additional donor and patient samples (n=10 for each condition). The new analysis totaling around 40 PA confirmed significantly increased deposition of GAG in IPAH pulmonary arteries.

      We have addressed the issue of phenotypic change with prolonged cell culture in the revised manuscript by systematically comparing enrichment for biological processes between fresh (Crnkovic et al., 2022: GSE210248), very early (this study: GSE255669) and later passage cells (Chelladurai et al., 2022: GSE144932; Gorr et al., 2020: GSE144274). We observed cell type differences in the rate of change of phenotypic features, with PAAF showing faster shift early on during culturing that could for some of the features be due to isolation from immunomodulatory environment or presence of hydrocortisone supplement in the PAAF cell media. These points have been described in the revised results section and mentioned in the discussion.

      (3) The authors claim that one feature of this paper is the use of "very early passage (p1)" of pulmonary artery smooth muscle cells (PASMC). Since there are other existing (previously reported) data that are publicly available, such as RNA-seq data using cells with 2-4 cell passages, it may be possible to show that fewer passages are better in primary culture by comparing the data presented in this paper.

      Following the reviewers’ comments, we have performed a systematic comparison of fresh (Crnkovic et al., 2022: GSE210248), very early (this study: GSE255669) and later passage cells (Chelladurai et al., 2022: GSE144932; Gorr et al., 2020: GSE144274) in the revised manuscript in order to comprehensively address the issue and define changes occurring as a result of prolonged in vitro conditions (Figure 3H). The results showed that the expression profile of early passage cells retains some of the key phenotypic features displayed by cells in their native environment, with PASMC displaying a more gradual loss of phenotypic characteristics compared to PAAF. Interestingly, we observed a striking inverse enrichment for inflammatory/NF-kB signaling between fresh and cultured PAAF, which could potentially be caused by the hydrocortisone supplement in the PAAF cell media or by the isolation from their highly immunomodulatory environment. These points have been described in the revised results section and mentioned in the discussion.

      (4) The authors describe a study characterized by decreased expression of "cytoskeletal contractile elements" in pulmonary artery smooth muscle cells (PASMC) derived from patients with IPAH. What are the implications of this result, and does it arise from the use of smooth muscle in patients resistant to pulmonary artery smooth muscle dilating agents? A discussion on this issue needs to be made in a way that is easy for the reader to understand.

      The reviewer raises an interesting point regarding the loss of contractile markers and the response to vasodilating therapy. We would speculate that an isolated decrease in contractile machinery, without concomitant change in ECM and other PASMC features, would dampen both the contraction and relaxation properties of the single PASMC, affecting not only its response to dilating agents, but also to vasoconstrictors. Clinical consequences and responsiveness to dilating agents are more difficult to predict, since the vasoactive response would additionally depend on mechanical properties of the pulmonary artery defined by cellular and ECM composition. Nevertheless, we believe that decreased expression of contractile machinery reflects an intrinsic, “programmed” response of SMC to remodeling, rather than vasodilator therapy-induced selection pressure, since similar phenotypic change is observed in SMC from systemic circulation and in various animal models without exposure to PAH medication. These considerations have been included in the revised discussion section.

      (5) There are a lot of secreted proteins that increase or decrease in Figure 6G, but there is scant reason to focus on PTX3 and HGF among them. The authors need to elaborate on the above issue.

      We regret the lack of clarity and provide an improved explanation of the ligand selection strategy in the revised manuscript. In order to prioritize the potential hits, we first used hierarchical clustering to group co-regulated ligands into a smaller number of groups. We then prioritized ligands for which information with respect to IPAH was lacking or limited. Based on these results, we analyzed the effect of three additional ligands on PASMC cell state marker expression (Figure S8). These additional data supported the initial focus on PTX3 and HGF.

      Minor comments:

      (1) Regarding the number of specimens used in the Result, it would be more helpful to the reader if the number of samples were also mentioned in the text.

      We have included the number of samples used in the manuscript text.

      (2) There is no explanation of what R2Y represents in Figure 2B. This reviewer is not able to understand the statistical analysis of Figure 2H. The detailed results should be explained.

      We apologize for the oversight in labeling of Figure 2B and have modified the figure legend: “Orthogonal projection to latent structures-discriminant analysis (OPLS-DA) T score plots separating predictive variability (x-axis), attributed to biological grouping, and non-predictive variability (technical/inter-individual, y-axis). Monofactorial OPLS-DA model for separation according to cell type or disease. C) Bifactorial OPLS-DA model considering cell type and disease simultaneously. Ellipse depicting the 95% confidence region, Q2 denoting model’s predictive power (significance: Q2>50%) and R2Y representing proportion of variance in the response variable explained by the model (higher values indicating better fit)”.
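      For readers unfamiliar with the two summary statistics named in this legend, their conventional textbook definitions can be sketched as follows. The notation is introduced here for illustration and is not taken from the manuscript: y_i is the observed response for sample i, \hat{y}_i the fitted value, \hat{y}_{(i)} the prediction for sample i from a model fitted with that sample held out during cross-validation, and \bar{y} the mean response. The exact cross-validation scheme used by the authors' software is not stated here.

```latex
% R2Y: proportion of response variance explained by the fitted model
R^2Y = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}

% Q2: cross-validated analogue, based on held-out predictions
Q^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_{(i)}\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
```

      Because Q2 is computed from held-out predictions, it is usually lower than R2Y, and a large gap between the two is commonly read as a sign of overfitting.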

      We also modified figure legend wording for the analysis in Figure 2H (new Figure 3E) to clarify the independent factors whose interaction was investigated using 3-way ANOVA: “Interaction effects of stimulation, cell type, and disease state on cellular proliferation were analyzed by 3-way ANOVA. Significant interaction effects are indicated as follows: * for stimulation × cell type interactions and # for cell type × disease state interactions (both *, # p<0.05)”.

      (3) In Figure 3, the authors examined whether there were molecular abnormalities common to IPAH-PASMC and IPAH-PAAF and found that the number of commonly regulated genes and proteins was limited to 47. Further analysis of these regulators by STRING analysis revealed that factors related to the regulation of apoptosis are commonly altered in both cells. On the other hand, the authors focused on mitochondria, as SOD2 is downregulated, and found an increase in ROS production specific to PASMC, indicating that mitochondrial dysfunction is common to PASMC and PAAF in IPAH, but downstream phenomena are different between cell types. Factors associated with apoptosis regulation have been found to be both upward and downward regulated, but the actual occurrence of apoptosis in both cell types has not been addressed.

      We have performed TUNEL staining on FFPE lung tissue from donors and IPAH patients, which revealed apoptosis as a rare event in both conditions in PASMC and PAAF. Therefore, no meaningful quantification could be conducted. An example of a pulmonary artery in which a rare positive signal in either PAAF or PASMC could be found is provided in Figure 4H.

      Unfortunately, association of a particular gene with a pathway is by default arbitrary and potentially ambiguous. In particular, factors identified as associated with apoptosis are also involved in regulation of inflammatory signaling (BIRC3, DDIT3) and amino acid metabolism (SHMT1). Nevertheless, mitochondria represent a crucial cellular hub for apoptosis regulation and, as shown in the current study, display significant functional alterations in IPAH in both cell types, aligning with reduced mitochondrial superoxide dismutase (SOD2) expression.

      (4) The meaning of the gray circle in Figure 3C should be clarified. Similarly, the meaning of the color in Fig. 3D should be clearly explained. In Figure 3E-G, each cell is significantly different from 18-61 cells, and the number of each cell and the reason should be described.

      We regret the confusion and provide a better explanation in the figure legend: “gray nodes representing their putative upstream regulators”, “with color coding reflecting the IPAH dependent regulation”. In the revised Figure panels 4E-G (old 3E-G) we provide the exact number of cells measured in each condition. Although we tried to have comparable cell confluency at the time of measurement, different proliferation rates between cells of different cell types and conditions led to different numbers of measured cells per donor/patient.

      (5) In Figure 4, the authors focus on factors that vary in different directions between cells, revealing fingerprints of molecular changes that differ between cell types, particularly IPAH-PASMC, which acquires a synthetic phenotype with enhanced regulation of chemotaxis elements, whereas IPAH-PAAF acquires fast-cycling cell characteristics. Next, focusing on the ECM components that were specifically altered in IPAH-PASMC, Nichenet analysis in Figure 5 suggested that ligands from PAAF may act on PASMC, and the authors focused on integrin signaling to examine ECM contact and changes in cell function. The results indicate that adhesion to laminin is poor in PASMC. Although no difference was observed between donor and IPAH PASMCs, a discussion of the reasons for this would be desired and helpful to the readers.

      Both donor and IPAH PASMCs respond similarly to laminin. However, our key finding is the downregulation of laminin in IPAH PAAF, which likely leads to a skewed laminin-to-collagen ratio and altered ECM composition in remodeled arteries. This shift in the ECM class results in altered PASMC behavior, affecting both donor and IPAH cells similarly. In the revised manuscript, we demonstrate that PASMC largely retain the expression pattern of integrin subunits that serve as high-affinity collagen and laminin receptors, with higher levels compared to PAAF (Figure 6F, G). Furthermore, we speculate that the distinct cellular phenotypic responses to collagen versus laminin coatings may arise from different downstream signaling pathways activated by the various integrin subunits (Nguyen et al., 2000). These considerations have been included in the revised discussion: “The comparable responses of donor and IPAH PASMC likely result from their shared integrin receptor expression profiles. Meanwhile, ECM class switching engages different high-affinity integrin receptors, which activate alternative downstream signaling pathways (Nguyen et al., 2000) and lead to differential responses to collagen and laminin matrices. We thus propose a model in which laminins and collagens act as PAAF-secreted ligands, regulating PASMC behavior through their ECM-sensing integrin receptors.”

      (6) Since Figure 3B and Figure 4A seem to show the same results, why not combine them into one?

      Indeed, these figure panels show the same results, but the focus of the investigations in each Figure is different. We therefore opted to keep the panels separate for better clarity and a logical link to other panels in the same figure.

      (7) In Figure 6, the interaction analysis of scRNAseq data with respect to signaling between PASMC and PAAF was performed using Nichenet and CellChat, showing that signaling from PAAF to PASMC is biased toward secreted ligands and that a functionally relevant set of soluble ligands is impaired in the IPAH state. From there, they proceeded with co-culture experiments and showed that co-culture healthy PASMC with PAAF of IPAH patients abolished PASMC markers in the healthy state. Furthermore, the authors attempted to identify ligands that induce functional changes in PASMCs produced from IPAH PAAFs and found that HGF is a factor that downregulates the expression of contractile markers in PASMCs. Further insights may be gained by co-culturing IPAH-derived cells in co-culture experiments. Also, no beneficial effect of pentraxin3 was found in Figure 6H. The authors should examine the effect of pentraxin3 on PASMC cells derived from IPAH patients, rather than healthy donors.

      We tested the influence of IPAH-PASMC on donor-PAAF and found no effect on the expression of the selected markers. We thank the reviewer for the suggestion to conduct the experiments on IPAH-PASMC. The new data show that both PTX3 and HGF have a significant effect on IPAH-PASMC, but one that differs from their effect on donor-PASMC. Whereas PTX3 lacks an effect on donor PASMC, it leads to downregulation of some of the contractile markers in IPAH PASMC, while HGF upregulates the synthetic marker VCAN in IPAH PASMC. These results are now included in Figure 7H.

      Reviewer #2 (Recommendations For The Authors):

      The authors should double-check for grammar and typos in the manuscript. I caught a few such as "therefor" and others, but there could be more.

      We thank the reviewer for the effort and time in reading and evaluating the manuscript. To the best of our knowledge, we have corrected the grammatical errors in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1:

      (1) To improve the clarity of the work, I suggest a final note to the authors to say more explicitly that objective accuracy has a finer resolution *due to the number of "special circles" per trial* in their task. This task detail got lost in my read of the manuscript, and confused me with respect to the resolution of each accuracy measure.

      We agree with the reviewer that this would be a useful clarification and have therefore added the following statement to the Methods section on p. 20:

      “It should be noted that the OIP has a slightly finer resolution due to the number of special circles per trial.”

      (2) Similarly for clarification, they could point out that their exclusion criteria remove subjects that have a lower OIP than their AIP analysis allows (which is good for comparison between OIP and AIP). Thus, they remove the possibility that very poorly performing subjects (OIP) are forced to have a higher than actual AIP due to the range.

      We agree this would be a useful statement to add and have included the following sentence in the Supplement on p. 8:

      “Such a restriction of the threshold parameter was intended to increase the comparability between AIP and OIP, and hence improved the calculation of the reminder bias.”


      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      (1) Upon reading their response to the question I had regarding AIP and OIP, a few more questions came up regarding OIP, AIP, how their calculations differ, and how the latter was computed in R. I hope these help readers to clarify how to interpret these key measures, and the hypotheses that rely upon them.

      Regarding fitting, and in relation to power, are 16 queries adequate to estimate an AIP using R's quickpsy? That is, assuming some noise in the choice process, how recoverable is a true indifference point from 16 trials? A parameter recovery analysis (i.e., generating choices via the fitted parameters, which will have built-in stochasticity, and seeing how well you recover the parameter of interest) would be helpful. It may help to characterize why the present study might differ from prior studies (maybe a power issue here).

      The reviewer is absolutely correct that we should have provided more detail when describing our fitting procedure for the psychometric curves. We have now addressed this by adding the following statements to the Methods section and Supplement:

      Page 20 in the main manuscript: “Fitting was done using the quickpsy package in R and more detail is given in the Supplement.”

      Pages 8 and 9 in the Supplement: 

      “Psychometric curve fitting

      We used the quickpsy package in R to fit psychometric curves to each participant’s choice data to derive their actual indifference point (AIP), which was operationalised as the threshold parameter when predicting reminder choices from target values. We restricted the possible parameter ranges from 2 to 9 for the threshold parameter and from 1 to 500 for the slope parameter, based on the task’s properties and pilot data. Apart from those parameter ranges, we used only default settings of the quickpsy() function.

      Each participant has only 16 trials (2 for each target value) contribute to the curve fitting. To understand the robustness of the AIP based on such limited data, we conducted a parameter recovery analysis. We simulated 16 trials based on each psychometric function and re-ran the curve fitting based on those simulated choices. There was close correspondence between the actual and recovered threshold parameters (or AIPs) with a correlation of r = 0.97, p < 0.001 (see also Figure S1). In contrast, the slope parameter—which was not central to any of our analyses—exhibited greater variability during the initial fitting. This increased uncertainty likely contributed to its poor recovery in the simulation, as evidenced by a near-zero correlation (r = −0.01, p = 0.82).”
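
      The logic of such a parameter recovery analysis can be sketched in a few lines. The example below is an illustrative sketch only, not the authors' pipeline: it is written in Python rather than R, it does not call quickpsy, and it assumes a logistic choice curve that falls with target value, a fixed generating slope, and a simple grid-search maximum-likelihood fit. All function names (p_reminder, simulate_choices, fit_threshold, recovery_correlation) and the grid values are hypothetical. Only the 16-trial design (2 trials for each target value from 2 to 9) and the 2-to-9 restriction on the threshold are taken from the text above.

```python
import math
import random

TARGET_VALUES = [2, 3, 4, 5, 6, 7, 8, 9]  # 2 trials per value gives 16 trials


def p_reminder(value, threshold, scale):
    """Assumed logistic curve: probability of choosing the reminder,
    decreasing as the target value rises above the threshold."""
    return 1.0 / (1.0 + math.exp((value - threshold) / scale))


def simulate_choices(threshold, scale, rng, reps=2):
    """Draw binary choices (1 = reminder) for `reps` trials per target value."""
    return [(v, 1 if rng.random() < p_reminder(v, threshold, scale) else 0)
            for v in TARGET_VALUES for _ in range(reps)]


def fit_threshold(trials, scale_grid=(0.25, 0.5, 1.0, 2.0)):
    """Grid-search maximum likelihood, threshold restricted to [2, 9]."""
    best_ll, best_th, best_scale = float("-inf"), None, None
    for i in range(141):  # candidate thresholds from 2 to 9 in steps of 0.05
        th = 2.0 + 0.05 * i
        for s in scale_grid:
            ll = 0.0
            for v, choice in trials:
                # Clip probabilities so log() never receives 0.
                p = min(max(p_reminder(v, th, s), 1e-9), 1.0 - 1e-9)
                ll += math.log(p) if choice else math.log(1.0 - p)
            if ll > best_ll:
                best_ll, best_th, best_scale = ll, th, s
    return best_th, best_scale


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def recovery_correlation(n_participants=60, seed=1):
    """Simulate participants with known thresholds, refit each one, and
    correlate the generating thresholds with the recovered ones."""
    rng = random.Random(seed)
    true_th = [rng.uniform(3.0, 8.0) for _ in range(n_participants)]
    recovered = [fit_threshold(simulate_choices(th, 0.5, rng))[0]
                 for th in true_th]
    return pearson_r(true_th, recovered)


if __name__ == "__main__":
    print(round(recovery_correlation(), 2))
```

      A high correlation between generating and recovered thresholds in a sketch like this would indicate that 16 trials can constrain the threshold under the stated assumptions. It says nothing about the values reported above, which come from the authors' own fits.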

      (2) Along these lines, it would be helpful for the reader to actually see the individual psychometric curves and to know how quickpsy was used (e.g., did you fit left and right asymptotes?), to understand how that fitting procedure works and how the assumptions of the fitting procedure compare to what can be gleaned from seeing the choice curves plotted.

      As stated above, we used default settings of the quickpsy() function and hence assumed symmetric asymptotes at 0 and 1. However, the reviewer mentions “left and right asymptotes”, so maybe this question is about restricting the possible parameter range for the threshold, which we restricted to values from 2 to 9, as described above.

      Regarding the individual curves, we have now included the following statement on page 9 in the Supplement: “Figures S2 to S31 show the individual psychometric curves that were estimated for each participant.” Please refer to the Supplement for the added figures.

      (3) A fuller explanation of quickpsy, its parameters, and how choice curves look might also generate interesting further questions to think about with respect to biases and compulsivity. Two individuals might have similar indifference points, but an asymptote might reflect a bias, for example a tendency to take the reminders with some probability even at the lowest offer available to them.

      We agree that this is an interesting focus which we will keep in mind for future studies.

      (4) Regarding comparing OIP to AIP: 

      For OIP, as far as I can understand, the resolution is lower than for AIP. Accuracies for OIP can only be 0/4, 1/4, 2/4, 3/4, or 4/4. Yet the resolution for AIP is the full range of offers (2 to 9) with respect to the parameter of interest (the indifference point). Could this bias the estimation of OIP (for instance, someone who scored 25% might actually be much closer to either 50% or 0%, but we can't tell due to the resolution)?

      As mentioned in response to comment (1), we restricted the parameter range for the thresholds to 2 to 9 to increase comparability. The reviewer is right to point out that the OIP still has lower resolution than the AIP, which is one of the downsides of having a shortened paradigm (cf. the longer version in Gilbert et al., 2020), which is optimised for online testing, especially if used in combination with additional questionnaires. We have no reason to believe, though, that this could have led to any bias, especially none that would contribute to the individual differences which are the main focus of our study.

      Gilbert, S. J., Bird, A., Carpenter, J. M., Fleming, S. M., Sachdeva, C., & Tsai, P.-C. (2020). Optimal use of reminders: Metacognition, effort, and cognitive offloading. Journal of Experimental Psychology: General, 149(3), 501–517. https://doi.org/10.1037/xge0000652

      (5) Additionally, it seems like the upper and lower bounds of OIP (0 and 10) differ from those of AIP (2 and 9). Could this also introduce bias (for example, if someone has terrible performance, the mean would artificially be higher under AIP than OIP because the smallest indifference point is 2 under AIP, but could be 0 under OIP)?

      See our response to comment (1): we fixed the range to 2 to 9 (which was the range of target values used in our study).

      (6) Finally seeing how CIT actually corresponds to accuracy overall (not a relative measure like AIP compared to OIP) I think would also be helpful as this is related to most points noted above.

      We included the suggested test as an exploratory analysis on pages 42-43 in the Supplement: “Third, we were interested in how the transdiagnostic phenotypes would correspond to performance. We therefore fitted a model which predicted internal accuracy (that is, unaided task performance on trials where no reminders could be used) from AD, CIT, and the other covariates (age, education and gender). We found that neither AD, β = -0.02, SE = 0.05, t = 0.44, p = 0.658, nor CIT, β = -0.03, SE = 0.05, t = -0.66, p = 0.510, predicted internal accuracy.

      The full results can be found in Table S13 as well as in Figure S32.”

    1. Author response:

      We genuinely appreciate the reviewers' interest and recognition of our work. The comments and suggestions on the results presentation and interpretation are well taken. We plan to revise the manuscript based on the reviewers' recommendations in the following aspects.

      (1) We fully agree with the reviewer that the aged environment would indeed affect the myeloid and megakaryocyte differentiation behaviors of HSCs. As a result, the clonal behaviors of HSCs presented in the current manuscript could differ from how HSCs differentiate in young mice. This point will be discussed in the revised manuscript.

      (2) We agree with the reviewer that the manuscript was not as easy to follow as many other papers in experimental hematology, primarily because the analyses presented in the current manuscript were not frequently used in previous studies. To address this, we will try to revise the manuscript using plain language to describe the results and conclusions. We will also provide graphical summary schematics where appropriate to present the findings better. We will further discuss our results in the context of previous findings to better illustrate the novelty of the current work.

      (3) We will provide more technical details of our analysis in the revised manuscript for readers to better understand how results are obtained and data analyses are performed in the current manuscript.

    1. Author response:

      We thank the reviewers for their thoughtful and constructive assessment of our manuscript. We agree that additional clarity on some key points in the manuscript will be valuable additions to this work. Both reviewers expressed a related concern regarding the basis for design and interpretation of our pyrazinamide ROS synergy experiments. 

      Reviewer 1:

      The in vitro experiments performed in this manuscript mainly report that PZA pre-treatment increases H2O2-mediated killing or inhibition. There is no direct evidence that clearly shows that oxidative stress drives the potent bactericidal activity of PZA. In these settings the oxidative stress is always applied after PZA pre-treatment and is therefore likely displaying the major lethal effect.

      Reviewer 2:

      The manuscript would benefit from a clear statement of the rationale for the protocols used to examine the synergy of PZA with ROS, the possible models their protocols could be testing, and then how their data supports or disproves the models being tested. The manuscript appears to propose, as stated in the title, that "Oxidative stress drives potent bactericidal activity of pyrazinamide...". However, their experimental design more likely tests the effect of PZA on ROS sensitivity. Indeed, by the last figure, the authors begin to present their data as PZA sensitizing the bacteria to ROS. More clarity on these possible models and the different interpretations of the data should be considered.

      We agree that the data presented in the current version of the manuscript is incomplete in supporting our assertion that oxidative stress drives bactericidal activity of pyrazinamide. As both reviewers note, pretreatment of bacilli with pyrazinamide followed by challenge with ROS indicates that pyrazinamide enhances susceptibility to oxidative stress but does not address whether oxidative stress enhances susceptibility to pyrazinamide. Further, we neglected to provide information regarding why we chose to pretreat bacilli with pyrazinamide before ROS exposure. Over the course of our work, we had found that pyrazinoic acid, the active form of pyrazinamide, showed potent synergy with hydrogen peroxide. In contrast to the time-dependent synergy that we observed between pyrazinamide and peroxide, synergy between pyrazinoic acid and peroxide did not require pretreatment. We will revise our manuscript to include results that address these key issues and we will carefully consider revising our interpretations accordingly.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The question of how central nervous system (CNS) lamination defects affect functional integrity is an interesting topic, though it remains a subject of debate. The authors focused on the retina, which is a relatively simple yet well-laminated tissue, to investigate the impact of afadin, a key component of adherens junctions, on retinal structure and function. Their findings show that the loss of afadin leads to significant disruptions in outer retinal lamination, affecting the morphology and localization of photoreceptors and their synapses, as illustrated by high-quality images. Despite these severe changes, the study found that some functions of the retinal circuits, such as the ability to process light stimuli, could still be partially preserved. This research offers new insights into the relationship between retinal lamination and neural circuit function, suggesting that altered retinal morphology does not completely eliminate the capacity for visual information processing.

      Strengths:

      The retina serves as an excellent model for investigating lamination defects and functional integrity due to its relatively simple yet well-organized structure, along with the ease of analyzing visual function. The images depicting outer retinal lamination, as well as the morphology and localization of photoreceptors and their synapses, are clear and well-described. The paper is logically organized, progressing from structural defects to functional analysis. Additionally, the manuscript includes a comprehensive discussion of the findings and their implications.

      Weaknesses:

      While this work presents a wealth of descriptive data, it lacks quantification, which would help readers fully understand the findings and compare results with those from other studies. Furthermore, the molecular mechanisms underlying the defects caused by afadin deletion were not explored, leaving the role of afadin and its intracellular signaling pathways in retinal cells unclear. Finally, the study relied solely on electrophysiological recordings to demonstrate RGC function, which may not be robust enough to support the conclusions. Incorporating additional experiments, such as visual behavior tests, would strengthen the overall conclusions.

      Thank you very much for taking the time to provide thoughtful and valuable comments. Following your suggestions, we will quantify some of the histological data and explore the mechanisms underlying the defects of lamination and cell fate determination observed in the afadin cKO retina. We will also try to examine the vision of afadin cKO mice using visual behavior tests.

      Reviewer #2 (Public review):

      Summary:

      Ueno et al. described substantial changes in the afadin knockout retina. These changes include decreased numbers of rods and cones, an increased number of bipolar cells, and disrupted somatic and synaptic organization of the outer limiting membrane, outer nuclear layer, and outer plexiform layer. In contrast, the number and organization of amacrine cells and retinal ganglion cells remain relatively intact. They also observed changes in ERG responses and RGC receptive fields and functions using MEA recordings.

      Strengths:

      The morphological characterization of retinal cell types and laminations is detailed and relatively comprehensive.

      Weaknesses:

      (1) The major weakness of this study, perhaps, is that its findings are predominantly descriptive and lack any mechanistic explanation. As afadin is a key component of adherens junctions, its role in mediating retinal lamination has been reported previously (see PMCID: PMC6284407). Thus, a more detailed dissection of afadin's role in processes such as progenitor generation, cell migration, or the formation of retinal lamination would provide greater insight into the defects caused by knocking out afadin.

      Thank you for taking the time to provide valuable comments. Following your suggestions, we will perform experiments to evaluate the mechanisms of the retinal lamination and cell fate determination defects observed in the afadin cKO retina. However, we would like to note that the paper cited in the comment (PMCID: PMC6284407) analyzed the function of afadin in the formation of dendrites of direction-selective RGCs in the IPL, and that the word "lamination" there refers to the layering of RGC dendrites in the IPL. Here, we analyzed the function of afadin in the laminar construction of the retina.

      (2) The authors observed striking changes in the numbers of rods, cones, and BCs, but not in ACs or RGCs. The causes of these distinct changes in specific cell classes remain unclear. Detailed characterizations, such as the expression of afadin in the early developing retina, tracing cell numbers across various early developmental time points, and staining of apoptotic markers in developing retinal cells, could help to distinguish between defects in cell generation and survival, providing a better understanding of the underlying causes of these phenotypes.

      Following your suggestion, we will perform the experiments to characterize the causes of distinct changes in the afadin cKO retina.

      (3) Although the total number of ACs or RGCs remains unchanged, their localizations are somewhat altered (Figures 2E and 4E). Again, the cause of the altered somatic localization in ACs and RGCs is unclear.

      To address the reviewer’s point, we will analyze the positions of progenitors and of these cells at developmental stages of the afadin cKO retina.

      (4) One conclusion that the authors emphasise is that the function of RGCs remains detectable despite a severely disrupted outer plexiform layer. However, the organization of the inner plexiform layer remains largely intact, and the axonal innervation of BCs remains unchanged. This could explain the functional integrity of RGCs. In addition, the resolution of detecting RGCs by MEA is low, as they only detected 5 clusters in heterozygous animals. This represents an incomplete clustering of RGC functional types and does not provide a full picture of how functional RGC types are altered in the afadin knockout.

      We appreciate the reviewer’s insightful comments. Although our clustering of RGC subtypes in afadin cHet retinas resulted in only five clusters, the key finding of our study is the preservation of RGC receptive fields in afadin cKO retinas, despite severe photoreceptor loss (reduced to about one-third of normal) and disruption of photoreceptor-bipolar cell synapses in the OPL. This suggests that even with crucial damage to the OPL, the primary photoreceptor-bipolar-RGC pathway can still function as long as the IPL remains intact. Moreover, the presence of rod-driven responses in RGCs indicates that the AII amacrine cell-mediated rod pathway may also continue to function. We agree that our functional clustering in afadin cHet retinas was incomplete. However, we speculate that the absence of RGCs with fast temporal responses in afadin cKO retinas may not simply be due to the loss of specific RGC subtypes but to disrupted synaptic connections between photoreceptors and fast-responding bipolar cells. Furthermore, the structural abnormalities in retinal lamination in afadin cKO retinas may alter RGC response properties, making strict functional classification less meaningful. We would like to emphasize the finding that disruption of the retinal lamination in afadin cKO retinas leads to the absence of RGCs with fast temporal response properties, rather than focusing solely on the classification of RGC subtypes.

      Minor Comments:

      (1) Line 56-67: "Overall, these findings provide the first evidence that retinal circuit function can be partially preserved even when there are significant disruptions in retinal lamination and photoreceptor synapses" There is existing evidence showing substantial adaptation in retinal function when retinal lamination or photoreceptor synapses are disrupted, such as PMCID: PMC10133175.

      Thank you for your comment. The paper you mentioned is crucial for discussing and considering the results of our study. We will cite the paper and discuss it in the Discussion.

      (2) Line 114-115: "we focused on afadin, which is a scaffolding protein for nectin and has no ortholog in mice." The term "Ortholog" is misused here, as the mouse has an afadin gene. Should the intended meaning be that afadin has no other isoforms in mouse?

      Thank you for pointing this out. We wrote "ortholog" where we meant "paralog", and we will revise it.

    1. Author response:

      eLife Assessment

      This study presents a valuable theoretical exploration on the electrophysiological mechanisms of ionic currents via gap junctions in hippocampal CA1 pyramidal-cell models, and their potential contribution to local field potentials (LFPs) that is different from the contribution of chemical synapses. The biophysical argument regarding electric dipoles appears solid, but the evidence can be more convincing if their predictions are tested against experiments. A shortage of model validation and strictly comparable parameters used in the comparisons between chemical vs. junctional inputs makes the modeling approach incomplete; once strengthened, the finding can be of broad interest to electrophysiologists, who often make recordings from regions of neurons interconnected with gap junctions.

      We gratefully thank the editors and the reviewers for the time and effort in rigorously assessing our manuscript, for the constructive review process, for their enthusiastic responses to our study, and for the encouraging and thoughtful comments. We especially thank you for deeming our study to be a valuable exploration on the differential contributions of active dendritic gap junctions vs. chemical synapses to local field potentials. We thank you for your appreciation of the quantitative biophysical demonstration on the differences in electric dipoles that appear in extracellular potentials with gap junctions vs. chemical synapses.

      However, we are surprised by aspects of the assessment that resulted in deeming the approach incomplete, especially given the following with specific reference to the points raised:

      (1) Testing against experiments: With specific reference to gap junctions, quantitative experimental verification becomes extremely difficult because of the well-established non-specificities associated with gap junctional modulators (Behrens et al., 2011; Rouach et al., 2003). The non-specific actions of these modulators are tabulated in Table 2 of Szarka et al. (2021), reproduced below. In addition, genetic knockouts of gap junctional proteins are either lethal or involve functional compensation (Bedner et al., 2012; Lo, 1999), together making causal links to specific gap junctional contributions with currently available techniques infeasible.

      In addition, the complex interactions between co-existing chemical synaptic, gap junctional, and active dendritic contributions from several cell-types make the delineation of the contributions of specific components infeasible with experimental approaches. A computational approach is the only quantitative route to specifically delineate the contributions of individual components to extracellular potentials, as seen from studies that have addressed the question of active dendritic contributions to field potentials (Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Sinha & Narayanan, 2015, 2022) or spiking contributions to local field potentials (Buzsaki et al., 2012; Gold et al., 2006; Schomburg et al., 2012). The biophysically and morphologically realistic computational modeling route is therefore invaluable in assessing the impact of individual components to extracellular field potentials (Einevoll et al., 2019; Halnes et al., 2024).

      Together, we emphasize that the computational modeling route is currently the only quantitative methodology to delineate the contributions of gap junctions vs. chemical synapses to extracellular potentials.

      (2) Model validation: The model used in this study was adopted from a physiologically validated model from our laboratory (Roy & Narayanan, 2021). Please note that the original model was validated against several physiological measurements along the somatodendritic axis. We sincerely regret our oversight in not mentioning clearly that we have used an existing, thoroughly physiologically-validated model from our laboratory in this study.

      (3) Comparisons between chemical vs. junctional inputs: We had taken elaborate precautions in our experimental design to match the intracellular electrophysiological signatures with reference to synchronous as well as oscillatory inputs, irrespective of whether inputs arrived through gap junctions or chemical synapses.

      In a revised manuscript, we will address all the concerns raised by the reviewers in detail. We have provided point-by-point responses to reviewers’ helpful and constructive comments below. We thank the editors and the reviewers for this constructive review process, which we believe will help us in improving our manuscript with specific reference to emphasizing the novelty of our approach and conclusions.

      Reviewer #1 (Public review):

      This manuscript makes a significant contribution to the field by exploring the dichotomy between chemical synaptic and gap junctional contributions to extracellular potentials. While the study is comprehensive in its computational approach, adding experimental validation, network-level simulations, and expanded discussion on implications would elevate its impact further.

      We gratefully thank you for your time and effort in rigorously assessing our manuscript, for the enthusiastic response, and the encouraging and thoughtful comments on our study. In what follows, we have provided point-by-point responses to the specific comments.

      Strengths

      Novelty and Scope

      The manuscript provides a detailed investigation into the contrasting extracellular field potential (EFP) signatures arising from chemical synapses and gap junctions, an underexplored area in neuroscience. It highlights the critical role of active dendritic processes in shaping EFPs, pushing forward our understanding of how electrical and chemical synapses contribute differently to extracellular signals.

      We thank you for the positive comments on the novelty of our approach and how our study addresses an underexplored area in neuroscience. The assumptions about the passive nature of dendritic structures had indeed resulted in an underestimation of the contributions of gap junctions to extracellular potentials. Once the realities of active structures are accounted for, the contributions of gap junctions increase by several orders of magnitude compared to passive structures (Fig. 1D).

      Methodological Rigor

      The use of morphologically and biophysically realistic computational models for CA1 pyramidal neurons ensures that the findings are grounded in physiological relevance. Systematic analysis of various factors, including the presence of sodium, leak, and HCN channels, offers a clear dissection of how transmembrane currents shape EFPs.

      We thank you for your encouraging comments on the experimental design and methodological rigor of our approach.

      Biological Relevance

      The findings emphasize the importance of incorporating gap junctional inputs in analyses of extracellular signals, which have traditionally focused on chemical synapses. The observed polarity differences and spectral characteristics provide novel insights into how neural computations may differ based on the mode of synaptic input.

      We thank you for your positive comments on the biological relevance of our approach. We also gratefully thank you for emphasizing the two striking novelties unveiling the dichotomy between gap junctions and chemical synapses in their contributions to field potentials: polarity differences and spectral characteristics.

      Clarity and Depth

      The manuscript is well-structured, with a logical progression from synchronous input analyses to asynchronous and rhythmic inputs, ensuring comprehensive coverage of the topic.

      We sincerely thank you for the positive comments on the structure and comprehensive coverage of our manuscript encompassing different types of inputs that neurons typically receive.

      Weaknesses and Areas for Improvement

      Generality and Validation

      The study focuses exclusively on CA1 pyramidal neurons. Expanding the analysis to other cell types, such as interneurons or glial cells, would enhance the generalizability of the findings. Experimental validation of the computational predictions is entirely absent. Empirical data correlating the modeled EFPs with actual recordings would strengthen the claims.

      We thank you for raising this important point. The prime novelty and the principal conclusion of this study is that gap junctional contributions to extracellular field potentials are orders of magnitude higher when the active nature of cellular compartments is accounted for. The lacuna in the literature has been consequent to the assumption that cellular compartments are passive, resulting in the dogma that gap junctional contributions to field potentials are negligible. Despite knowledge about active dendritic structures for decades now, this assumption has kept studies from understanding or even exploring the contributions of gap junctions to field potentials. The rationale behind the choice of a computational approach to address the lacuna was as follows:

      (1) The complex interactions between co-existing chemical synaptic, gap junctional, and active dendritic contributions from several cell-types make the delineation of the contributions of specific components infeasible with experimental approaches. A computational approach is the only quantitative route to specifically delineate the contributions of individual components to extracellular potentials, as seen from studies that have addressed the question of active dendritic contributions to field potentials (Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Sinha & Narayanan, 2015, 2022) or spiking contributions to local field potentials (Buzsaki et al., 2012; Gold et al., 2006; Schomburg et al., 2012). The biophysically and morphologically realistic computational modeling route is therefore invaluable in assessing the impact of individual components to extracellular field potentials (Einevoll et al., 2019; Halnes et al., 2024).

      (2) With specific reference to gap junctions, quantitative experimental verification becomes extremely difficult because of the well-established non-specificities associated with gap junctional modulators (Behrens et al., 2011; Rouach et al., 2003). The non-specific actions of these modulators are tabulated in Table 2 of Szarka et al. (2021). In addition, genetic knockouts of gap junctional proteins are either lethal or involve functional compensation (Bedner et al., 2012; Lo, 1999), together making causal links to specific gap junctional contributions with currently available techniques infeasible.

      We highlight the novelty of our approach and of the conclusions about differences in extracellular signatures associated with active-dendritic chemical synapses and gap junctions, against these experimental difficulties. We emphasize that the computational modeling route is currently the only quantitative methodology to delineate the contributions of gap junctions vs. chemical synapses to extracellular potentials. Our analyses clearly demonstrate that gap junctions do contribute to extracellular potentials if the active nature of the cellular compartments is explicitly accounted for (Fig. 1D). We also show theoretically well-grounded and mechanistically elucidated differences in polarity (Figs. 1–3) as well as in spectral signatures (Figs. 5–8) of extracellular potentials associated with gap junctional vs. chemical synaptic inputs. Together, our fundamental demonstration in this study is the critical need to account for the active nature of cellular compartments in studying gap junctional contributions to extracellular potentials, with CA1 pyramidal neuronal dendrites used as an exemplar.

      In a revised version of the manuscript, we will emphasize the motivations for the approach we took, highlighting the specific novelties both in methodological and conceptual aspects, finally emphasizing the need to account for other cell types and gap junctional contributions therein. Importantly, we will emphasize the non-specificities associated with gap-junctional blockers as the reason why experimental delineation of gap junctional vs. chemical synaptic contributions to LFP becomes tedious. We hope that these points will underscore the need for the computational approach that we took to address this important question, apart from the novelties of the manuscript.

      Role of Active Dendritic Currents

      The paper emphasizes active dendritic currents, particularly the role of HCN channels in generating outward currents under certain conditions. However, further discussion of how this mechanism integrates into broader network dynamics is warranted.

      We thank you for this constructive suggestion. We agree that it is important to consider the implications for broader network dynamics of the outward HCN currents that are observed with synchronous inputs. In a revised manuscript, we will elaborate on the implications of the outward HCN current to network dynamics in detail.

      Analysis of Plasticity

      While the manuscript mentions plasticity in the discussion, there are no simulations that account for activity-dependent changes in synaptic or gap junctional properties. Including such analyses could significantly enhance the relevance of the findings.

      We thank you for this constructive suggestion. Please note that we have presented consistent results for both fewer and more gap junctions in our analyses (Figure 1 with 217 gap junctions and Supplementary Figure 1 with 99 gap junctions). Thus, our fundamentally novel result that gap junctions onto active dendrites differentially shape LFPs holds true irrespective of the relative density of gap junctions onto the neuron. Thus, these results demonstrate that the conclusions about their contributions to LFP are invariant to plasticity in their gap junctional numerosity.

      We had only briefly mentioned plasticity in the Introduction to highlight the different modes of synaptic transmission and to emphasize that plasticity has been studied in both chemical synapses and gap junctions, playing a role in learning and adaptation. However, if this wording inadvertently suggests that our study includes plasticity simulations, we will remove it from the Introduction in the updated manuscript to ensure clarity.

      In the ‘Limitations of analyses and future studies’ section in Discussion, we suggested investigating the impact of plasticity mechanisms—specifically, activity-dependent plasticity of ion channels—on synaptic receptors vs. gap junctions and their effects on extracellular field potentials under various input conditions and plasticity combinations across different structures. We fully agree with the reviewer that such studies would offer valuable insights and further enhance the broader relevance of our findings. However, while our study implies this direction, it was not the primary focus of our investigation.

      In the revised manuscript, we will expand on intrinsic/synaptic plasticity and how they could contribute to LFPs (Sinha & Narayanan, 2015, 2022), while also pointing to simulations with different numbers of gap junctions in this context.

      Frequency-Dependent Effects

      The study demonstrates that gap junctional inputs suppress high-frequency EFP power due to membrane filtering. However, it could delve deeper into the implications of this for different brain rhythms, such as gamma or ripple oscillations.

      We sincerely thank you for these insightful comments, with which we fully agree. As it so happens, this manuscript forms the first part of a broader study where we explore the implications of gap junctions for ripple-frequency oscillations. The ripple oscillations part of the work was presented as a poster at the Society for Neuroscience (SfN) annual meeting 2024 (Sirmaur & Narayanan, 2024). There, we simulate a neuropil made of hundreds of morphologically realistic neurons to assess the role of different synaptic inputs (excitatory, inhibitory, and gap junctional) and active dendrites in ripple-frequency oscillations. We demonstrate there that the conclusions from single-neuron simulations in this current manuscript extend to a neuropil with several neurons, each receiving excitatory, inhibitory and gap-junctional inputs, especially with reference to high-frequency oscillations. Our network-based analyses unveiled a dominant mediatory role of patterned inhibition in ripple generation, with recurrent excitations through chemical synapses and gap junctions in conjunction with return-current contributions from active dendrites playing regulatory roles in determining ripple characteristics (Sirmaur & Narayanan, 2024).

      Our principal goal in this study, therefore, was to lay the single-neuron foundation for network analyses of the impact of gap junctions on LFPs. We are preparing the network part of the study, with a strong focus on ripple-frequency oscillations, for submission for peer review separately.

      In a revised manuscript, we will mention the results from our SfN abstract with reference to network simulations and high-frequency oscillations, while also presenting discussions from other studies on the role of gap junctions in synchrony and LFP oscillations.
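
      The membrane-filtering argument above can be illustrated with a deliberately simple sketch. It assumes a single passive RC compartment with a hypothetical time constant of 20 ms; the manuscript's model is active and multi-compartmental, so this shows only the qualitative low-pass trend, not the reported spectra. The function name `rc_gain` and all values are illustrative assumptions.

```python
import math


def rc_gain(frequency_hz, tau_ms):
    """Voltage gain of a single-pole passive membrane, relative to DC.

    |H(f)| = 1 / sqrt(1 + (2 * pi * f * tau)^2), with tau converted to seconds.
    """
    omega_tau = 2.0 * math.pi * frequency_hz * (tau_ms / 1000.0)
    return 1.0 / math.sqrt(1.0 + omega_tau ** 2)


# Hypothetical membrane time constant (assumption, not taken from the manuscript).
tau = 20.0  # ms

# Gains at theta-, gamma- and ripple-range frequencies (illustrative choices).
gains = {f: rc_gain(f, tau) for f in (8.0, 40.0, 150.0)}
for freq, gain in gains.items():
    print(f"{freq:6.1f} Hz -> gain {gain:.3f}")
```

      In this passive picture, an intracellularly injected input is attenuated more strongly as its frequency rises, which is the direction of the effect described for gap junctional inputs.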

      Visualization

      Figures are dense and could benefit from more intuitive labeling and focused presentations. For example, isolating key differences between chemical and gap junctional inputs in distinct panels would improve clarity.

      We thank you for this constructive suggestion. In the revised manuscript, we will enhance the visualization of the figures to ensure a clearer and more intuitive distinction between chemical synapses and gap junctions.

      Contextual Relevance

      The manuscript touches on how these findings relate to known physiological roles of gap junctions (e.g., in gamma rhythms) but does not explore this in depth. Stronger integration of the results into known neural network dynamics would enhance its impact.

      We sincerely appreciate your valuable suggestion and acknowledge the importance of integrating our results into established neural network dynamics, particularly their implications for gamma rhythms. We will address this aspect more comprehensively in the revised version of our manuscript.

      Reviewer #2 (Public review):

      This computational work examines whether the inputs that neurons receive through electrical synapses (gap junctions) have different signatures in the extracellular local field potential (LFP) compared to inputs via chemical synapses. The authors present the results of a series of model simulations where either electric or chemical synapses targeting a single hippocampal pyramidal neuron are activated in various spatio-temporal patterns, and the resulting LFP in the vicinity of the cell is calculated and analyzed. The authors find several notable qualitative differences between the LFP patterns evoked by gap junctions vs. chemical synapses. For some of these findings, the authors demonstrate convincingly that the observed differences are explained by the electric vs. chemical nature of the input, and these results likely generalize to other cell types. However, in other cases, it remains plausible (or even likely) that the differences are caused, at least partly, by other factors (such as different intracellular voltage responses due to, e.g., the unequal strengths of the inputs). Furthermore, it was not immediately clear to me how the results could be applied to analyze more realistic situations where neurons receive partially synchronized excitatory and inhibitory inputs via chemical and electric synapses.

      We gratefully thank you for your time and effort in rigorously assessing our manuscript, for the enthusiastic response, and the encouraging and thoughtful comments on our study. In what follows, we have provided point-by-point responses to the specific comments.

      Strengths

      The main strength of the paper is that it draws attention to the fact that inputs to a neuron via gap junctions are expected to give rise to a different extracellular electric field compared to inputs via chemical synapses, even if the intracellular effects of the two types of input are similar. This is because, unlike chemical synaptic inputs, inputs via gap junctions are not directly associated with transmembrane currents. This is a general result that holds independent of many details such as the cell types or neurotransmitters involved.

      We gratefully thank you for the positive comments and the encouraging words about the novel contributions of our study. We are particularly thankful to you for your comment on the generality of our conclusions that hold for different cell types and neurotransmitters involved.

      Another strength of the article is that the authors attempt to provide intuitive, non-technical explanations of most of their findings, which should make the paper readable also for non-expert audiences (including experimentalists).

      We sincerely thank you for the positive comments about the readability of the paper.

      Weaknesses

      The most problematic aspect of the paper relates to the methodology for comparing the effects of electric vs. chemical synaptic inputs on the LFP. The authors seem to suggest that the primary cause of all the differences seen in the various simulation experiments is the different nature of the input, and particularly the difference between the transmembrane current evoked by chemical synapses and the gap junctional current that does not involve the extracellular space. However, this is clearly an oversimplification: since no real attempt is made to quantitatively match the two conditions that are compared (e.g., regarding the strength and temporal profile of the inputs), the differences seen can be due to factors other than the electric vs. chemical nature of synapses. In fact, if inputs were identical in all parameters other than the transmembrane vs. directly injected nature of the current, the intracellular voltage responses and, consequently, the currents through voltage-gated and leak currents would also be the same, and the LFPs would differ exactly by the contribution of the transmembrane current evoked by the chemical synapse. This is evidently not the case for any of the simulated comparisons presented, and the differences in the membrane potential response are rather striking in several cases (e.g., in the case of random inputs, there is only one action potential with gap junctions, but multiple action potentials with chemical synapses). Consequently, it remains unclear which observed differences are fundamental in the sense that they are directly related to the electric vs. chemical nature of the input, and which differences can be attributed to other factors such as differences in the strength and pattern of the inputs (and the resulting difference in the neuronal electric response).

      We thank you for raising this important point. We would like to emphasize that our experimental design and analyses quantitatively account for the spatial distribution and temporal pattern of specific kinds of inputs that arrive through gap junctions and chemical synapses. We submit that our analyses quantitatively demonstrate that the fundamental difference between the gap junctional and chemical synaptic contributions to extracellular potentials is the absence of the direct transmembrane component from gap junctional inputs. We elucidate these points below:

      (1) Spatial distribution: The inputs were distributed randomly across the basal dendrites, irrespective of whether they were through gap junctions or chemical synapses. For both chemical synapses and gap junctions, the inputs were of the same nature: excitatory.

      (2) Different numbers of inputs: We have presented consistent results for both fewer and more gap junctions or chemical synapses in our analyses (see Figure 1 with 217 gap junctions or 245 chemical synapses and Supplementary Figure 2 with 99 gap junctions or 30 chemical synapses). Our fundamentally novel result that gap junctions onto active dendrites shape LFPs holds true irrespective of the relative density of gap junctions onto the neuron.

      (3) Synchronous inputs (Figs. 1–3): For chemical synapses, the waveforms are in the shape of postsynaptic potentials. For gap junctional inputs, the waveforms are in the shape of postsynaptic potentials or dendritic spikes (to respect the active nature of inputs from the other cell). Here, the electrical response of the postsynaptic cell is identical irrespective of whether inputs arrive through gap junctions or chemical synapses: an action potential. We quantitatively matched the strengths such that the model generated a single action potential in response to synchronous inputs, irrespective of whether they arrived through chemical synapses or gap junctions. We mechanistically analyze the contributions of different cellular components and show that the direct transmembrane current in chemical synapses is the distinguishing factor that determines the dichotomy between the contributions of gap junctions vs. chemical synapses to extracellular potentials (Figs. 2–3). In a revised manuscript, we will show the intracellular responses to demonstrate that they are electrically matched.

      (4) Random inputs (Fig. 4): For random inputs, we did not account for the number of action potentials that arrived, as the only observation we made here was with reference to the biphasic nature of the extracellular potentials with gap junctional inputs in the “No Sodium” scenario. We note that in the “No Sodium” scenario, the time-domain amplitudes were comparable for the field potentials (Fig. 4B, Fig. 4D).

      (5) Rhythmic inputs (Figs. 5–8): For rhythmic inputs, please note that the intracellular and extracellular waveforms for every frequency are provided in supplementary figures S5–S11. It may be noted that the intracellular responses are comparable. In simulations for assessing spike-LFP comparison, we tuned the strengths to produce a single spike per cycle, ensuring fair comparison of LFPs with gap junctions vs. chemical synapses.

      Taken together, we demonstrate through explicit sets of simulations and analyses that the differences in LFPs were not driven by the strength or patterns of the inputs but rather by the differences in direct transmembrane currents, which are subsequently reflected in the LFPs. In a revised manuscript, we will add a section to emphasize these points apart from providing intracellular traces for cases where they are not provided.

      Some of the explanations offered for the effects of cellular manipulations on the LFP appear to be incomplete. More specifically, the authors observed that blocking leak channels significantly changed the shape of the LFP response to synchronous synaptic inputs - but only when electric inputs were used, and when sodium channels were intact. The authors seemed to attribute this phenomenon to a direct effect of leak currents on the extracellular potential - however, this appears unlikely both because it does not explain why blocking the leak conductance had no effect in the other cases, and because the leak current is several orders of magnitude smaller than the spike-generating currents that make the largest contributions to the LFP. An indirect effect mediated by interactions of the leak current with some voltage-gated currents appears to be the most likely explanation, but identifying the exact mechanism would require further simulation experiments and/or a detailed analysis of intracellular currents and the membrane potential in time and space.

      We thank you for raising this important question. Leak channels were among the several contributors to the positive deflection observed in LFPs associated with gap junctions. This effect was present not only in gap junctional models with intact sodium conductance but also in the no-sodium model, where the amplitude of the positive deflection was reduced across other models as well (Fig. 2F, I). Furthermore, even in the absence of leak conductance, a small positive deflection was still observed (Fig. 2F), leading us to further investigate other transmembrane currents over time and across spatial locations, from the proximal to the distal dendritic ends relative to the soma (Fig. 3D). We had observed that the dominant contributor in the case of chemical synapses was the inward synaptic current (Fig. 3A), whereas for gap junctions, the primary contributors were leak conductance along with other outward currents, such as potassium and HCN currents (Fig. 3D). Together, the direct transmembrane component of chemical synapses provides a dominant contribution to extracellular potentials. This dominance translates to differences in the relative contributions of indirect currents (including leak currents) to extracellular potentials associated with chemical synaptic vs. gap junctional inputs. Our analyses of the exact ionic mechanisms (Fig. 3) demonstrate the involvement of several ion channels contributing to the indirect component in either scenario.

      In every simulation experiment in this study, inputs through electric synapses are modeled as intracellular current injections of pre-determined amplitude and time course based on the sampled dendritic voltage of potential synaptic partners. This is a major simplification that may have a significant impact on the results. First, the current through gap junctions depends on the voltage difference between the two connected cellular compartments and is thus sensitive to the membrane potential of the cell that is treated as the neuron "receiving" the input in this study (although, strictly speaking, there is no pre- or postsynaptic neuron in interactions mediated by gap junctions). This dependence on the membrane potential of the target neuron is completely missing here. A related second point is that gap junctions also change the apparent membrane resistance of the neurons they connect, effectively acting as additional shunting (or leak) conductance in the relevant compartments. This effect is completely missed by treating gap junctions as pure current sources.

      We thank you for raising this important point. We agree with the analyses presented by the reviewer on the importance of network simulations and bidirectional gap junctions that respect the voltages in both neurons. However, the complexities of LFP modeling preclude modeling of networks of morphologically realistic models with patterns of stimulations occurring across the dendritic tree. LFP modeling studies predominantly use “post-synaptic” currents to analyze the impact of different patterns of inputs arriving onto a neuron, even when chemical synaptic inputs are considered. Explicitly, individual neurons are separately simulated with different patterns of synaptic inputs, the transmembrane current at different locations is recorded, and the extracellular potential is then computed using the line source approximation (Buzsaki et al., 2012; Gold et al., 2006; Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Schomburg et al., 2012; Sinha & Narayanan, 2015, 2022). Even in scenarios where a network is analyzed, a hybrid approach involving the outputs of a point-neuron-based network being coupled to an independent morphologically realistic neuronal model is employed (Hagen et al., 2016; Martinez-Canada et al., 2021; Mazzoni et al., 2015). Given the complexities associated with the computation of electrode potentials arising as a distance-weighted summation of several transmembrane currents, these simplifications become essential.
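
      The distance-weighted summation of transmembrane currents described above can be sketched as follows. This is a minimal illustration only: it swaps in the simpler point-source approximation in place of the line source approximation used in the cited studies, assumes a homogeneous extracellular medium with a conductivity of 0.3 S/m, and uses hypothetical compartment positions and currents. Negative currents denote inward (sink) currents and positive currents denote outward (source) currents.

```python
import math


def extracellular_potential(currents_nA, source_positions_um, electrode_um,
                            sigma_S_per_m=0.3):
    """Point-source estimate of the potential (in mV) at one electrode.

    Each compartment is treated as a point current source, so that
    phi = sum_k I_k / (4 * pi * sigma * r_k).
    With I in nA, r in um and sigma in S/m, the result is in mV.
    """
    phi = 0.0
    for current, position in zip(currents_nA, source_positions_um):
        r = math.dist(position, electrode_um)
        if r == 0.0:
            raise ValueError("electrode coincides with a current source")
        phi += current / (4.0 * math.pi * sigma_S_per_m * r)
    return phi


# Hypothetical three-compartment example. The currents sum to zero, as the
# transmembrane currents of a whole neuron must at every instant.
currents = [-1.0, 0.4, 0.6]  # nA; the first compartment carries an inward current
positions = [(0.0, 0.0, 0.0), (0.0, 100.0, 0.0), (0.0, 200.0, 0.0)]  # um
electrode = (50.0, 0.0, 0.0)  # um

phi_mV = extracellular_potential(currents, positions, electrode)
print(f"potential at electrode: {phi_mV:.5f} mV")
```

      Because the electrode in this example sits closest to the compartment carrying the inward current, the sink dominates and the computed potential is negative. Removing a direct inward transmembrane current, as happens with an intracellularly delivered input, leaves only the remaining currents to shape the sign of the potential.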

      Our approach models gap junctional currents in a manner similar to how other models incorporate synaptic currents in LFP modeling (Buzsaki et al., 2012; Gold et al., 2006; Halnes et al., 2024; Ness et al., 2018; Reimann et al., 2013; Schomburg et al., 2012; Sinha & Narayanan, 2015, 2022). As gap junctions are typically implemented as resistors from the other neuronal compartment, we accounted for gap-junctional variability in our model by randomizing the scaling factors and the exact waveforms that arrive through individual gap junctions at specific locations. Thus, the inputs were not pre-determined by “pre” neurons. Instead, the recorded voltages from potential synaptic partner neurons were randomized across locations and scaled using factors at the dendrites before being injected into the target neuron (Supplementary Fig. S1). While incorporating a network of interconnected neurons is indeed important, we utilized a biophysically and morphologically realistic CA1 neuron model with different sets of input patterns to model LFPs, which were derived from the total transmembrane currents across all compartments of the multi-compartmental neuron model. Given the complexity of this approach, adding further network-level interactions or pre-post connections would have been computationally demanding.

      In a revised manuscript, we will introduce the general methodology used in LFP modeling studies to introduce synaptic currents. We will emphasize that our study extends this approach to modeling gap junctional inputs, while also highlighting randomization of locations and the scaling process in assigning gap junctional synaptic strengths.
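
      The distinction the reviewer draws between a voltage-dependent gap junction and a pure current source can be made concrete with a small sketch. It assumes an ohmic junction, I = g (V_partner − V_target), with a hypothetical conductance of 1 nS and hypothetical voltages; the function name and all values are illustrative and are not taken from the manuscript.

```python
def gap_junction_current(v_partner_mV, v_target_mV, g_gj_nS):
    """Ohmic gap junction current into the target compartment, in pA.

    nS * mV = pA. Positive values depolarize the target compartment.
    """
    return g_gj_nS * (v_partner_mV - v_target_mV)


# Hypothetical values (assumptions for illustration only).
g_gj = 1.0         # nS
v_partner = -40.0  # mV, depolarized partner dendrite

# With the target at rest, the partner drives current into the target.
i_rest = gap_junction_current(v_partner, -65.0, g_gj)

# As the target depolarizes, the driving force shrinks...
i_depolarized = gap_junction_current(v_partner, -50.0, g_gj)

# ...and reverses once the target is more depolarized than the partner,
# for example near the peak of an action potential.
i_spike_peak = gap_junction_current(v_partner, 20.0, g_gj)

print(i_rest, i_depolarized, i_spike_peak)
```

      A pre-determined current injection corresponds to holding the current fixed regardless of the target voltage, which is the simplification under discussion; the ohmic form additionally lets current flow back out of the target, which is the shunting effect the reviewer mentions.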

      One prominent claim of the article that is emphasized even in the abstract is that HCN channels mediate an outward current in certain cases. Although this statement is technically correct, there are two reasons why I do not consider this a major finding of the paper. First, as the authors acknowledge, this is a trivial consequence of the relatively slow kinetics of HCN channels: when at least some of the channels are open, any input that is sufficiently fast and strong to take the membrane potential across the reversal potential of the channel will lead to the reversal of the polarity of the current. This effect is quite generic and well-known and is by no means specific to gap junctional inputs or even HCN channels. Second, and perhaps more importantly, the functional consequence of this reversed current through HCN channels is likely to be negligible. As clearly shown in Supplementary Figure S3, the HCN current becomes outward only for an extremely short time period during the action potential, which is also a period when several other currents are also active and likely dominant due to their much higher conductances. I also note that several of these relevant facts remain hidden in Figure 3, both because of its focus on peak values, and because of the radically different units on the vertical axes of the current plots.

      We thank you for raising this point and agree with you on every point. Please note that we do not assert that the outward HCN currents are exclusively associated with gap junctional inputs. Rather, our results show that synchronous inputs generate outward HCN currents in both chemical synapses (Fig. 3B; positive/outward HCN currents, except in the no sodium or leak model) and gap junctions (Fig. 3D; positive/outward HCN currents). We emphasized this in the case of gap junctions because, in the absence of inward synaptic currents, HCN (acting as outward currents with synchronous inputs) contributed to the positive deflection observed in the LFPs. While HCN would also contribute in the case of chemical synapses, its effect was negligible due to the presence of large inward synaptic currents. Since LFPs reflect the collective total transmembrane currents, the dominant contributors differ between these two scenarios, which we aimed to highlight. Since HCN exhibited outward currents in our synchronous input simulations, we have elaborated on this mechanism in the supplementary figure (Fig. S3). Our intention was not to emphasize this effect for only one synaptic mode but rather to highlight HCN's contribution to the positive deflection as one of the contributing factors.

      We agree that HCN currents are relatively small in magnitude; therefore, our conclusions were based on HCN being one of the several contributing factors. Leak conductance and other outward conductances, including HCN currents (Fig. 3D), collectively contribute to the positive deflections observed in the case of gap junctional synchronous inputs.

      We will ensure that all these points are appropriately addressed in a revised manuscript.
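
      The reversal of the HCN current discussed above follows from the ohmic form of the current together with slow gating. The sketch below is illustrative only: it assumes a reversal potential of −30 mV, a unit maximal conductance, and an open fraction that stays frozen during a fast event. All names and values are hypothetical and not taken from the manuscript.

```python
def hcn_current(v_mV, open_fraction, g_max_nS=1.0, e_h_mV=-30.0):
    """Ohmic HCN current in pA: negative is inward, positive is outward.

    I_h = g_max * open_fraction * (V - E_h). The reversal potential E_h and
    the conductance are assumed values used only for illustration.
    """
    return g_max_nS * open_fraction * (v_mV - e_h_mV)


# HCN gating is slow, so the open fraction is treated as unchanged over the
# brief duration of a spike (a simplifying assumption).
open_fraction = 0.2

i_subthreshold = hcn_current(-65.0, open_fraction)  # V below E_h
i_spike_peak = hcn_current(20.0, open_fraction)     # V above E_h

print(f"subthreshold: {i_subthreshold:.1f} pA, spike peak: {i_spike_peak:.1f} pA")
```

      The sign flips whenever the membrane potential crosses the reversal potential while some channels remain open, irrespective of whether the depolarization arrived through chemical synapses or gap junctions.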

      Finally, I missed an appropriate validation of the neuronal model used, and also the characterization of the effects of the in silico manipulations used on the basic behavior of the model. As far as I understand, the model in its current form has not been used in other studies. If this is the case, it would be important to demonstrate convincingly through (preferably quantitative) comparisons with experimental data using different protocols that the model captures the physiological behavior of at least the relevant compartments (in this case, the dendrites and the soma) of hippocampal pyramidal neurons sufficiently well that the results of the modeling study are relevant to the real biological system. In addition, the correct interpretation of various manipulations of the model would be strongly facilitated by investigating and discussing how the physiological properties of the model neuron are affected by these alterations.

      We thank you for raising this important point. The CA1 pyramidal neuronal model used in this study is built with ion-channel models derived from biophysical and electrophysiological recordings from these cells. As mentioned in the Methods section “Dynamics and distribution of active channels” and Supplementary Table S1, models for individual channels, their gating kinetics, and channel distributions across the somatodendritic arbor (wherever known) are all derived from their physiological equivalents. Importantly, these values were derived from previously validated models from the laboratory, which contain these very ion channel models and the exact same morphology (Roy & Narayanan, 2021). Please compare Supplementary Table S1 with the Table 1 from (Roy & Narayanan, 2021). Please note that this model was validated against several physiological measurements along the somatodendritic axis (Fig. 1 of (Roy & Narayanan, 2021)).

      In a revised manuscript, we will explicitly mention this while also mentioning the different physiological properties that were used for the validation process from (Roy & Narayanan, 2021). We sincerely regret not mentioning these details in the current version of our manuscript.

      We will fix these in a revised version of the manuscript.

      References

      Bedner, P., Steinhauser, C., & Theis, M. (2012). Functional redundancy and compensation among members of gap junction protein families? Biochim Biophys Acta, 1818(8), 1971-1984. https://doi.org/10.1016/j.bbamem.2011.10.016

      Behrens, C. J., Ul Haq, R., Liotta, A., Anderson, M. L., & Heinemann, U. (2011). Nonspecific effects of the gap junction blocker mefloquine on fast hippocampal network oscillations in the adult rat in vitro. Neuroscience, 192, 11-19. https://doi.org/10.1016/j.neuroscience.2011.07.015

      Buzsaki, G., Anastassiou, C. A., & Koch, C. (2012). The origin of extracellular fields and currents--EEG, ECoG, LFP and spikes. Nat Rev Neurosci, 13(6), 407-420. https://doi.org/10.1038/nrn3241

      Einevoll, G. T., Destexhe, A., Diesmann, M., Grun, S., Jirsa, V., de Kamps, M., Migliore, M., Ness, T. V., Plesser, H. E., & Schurmann, F. (2019). The Scientific Case for Brain Simulations. Neuron, 102(4), 735-744. https://doi.org/10.1016/j.neuron.2019.03.027

      Gold, C., Henze, D. A., Koch, C., & Buzsaki, G. (2006). On the origin of the extracellular action potential waveform: A modeling study. J Neurophysiol, 95(5), 3113-3128. https://doi.org/10.1152/jn.00979.2005

      Hagen, E., Dahmen, D., Stavrinou, M. L., Linden, H., Tetzlaff, T., van Albada, S. J., Grun, S., Diesmann, M., & Einevoll, G. T. (2016). Hybrid Scheme for Modeling Local Field Potentials from Point-Neuron Networks. Cereb Cortex, 26(12), 4461-4496. https://doi.org/10.1093/cercor/bhw237

      Halnes, G., Ness, T. V., Næss, S., Hagen, E., Pettersen, K. H., & Einevoll, G. T. (2024). Electric Brain Signals: Foundations and Applications of Biophysical Modeling. Cambridge University Press. https://doi.org/10.1017/9781009039826

      Lo, C. W. (1999). Genes, gene knockouts, and mutations in the analysis of gap junctions. Dev Genet, 24(1-2), 1-4. https://doi.org/10.1002/(SICI)1520-6408(1999)24:1/2<1::AID-DVG1>3.0.CO;2-U

      Martinez-Canada, P., Ness, T. V., Einevoll, G. T., Fellin, T., & Panzeri, S. (2021). Computation of the electroencephalogram (EEG) from network models of point neurons. PLoS Comput Biol, 17(4), e1008893. https://doi.org/10.1371/journal.pcbi.1008893

      Mazzoni, A., Linden, H., Cuntz, H., Lansner, A., Panzeri, S., & Einevoll, G. T. (2015). Computing the Local Field Potential (LFP) from Integrate-and-Fire Network Models. PLoS Comput Biol, 11(12), e1004584. https://doi.org/10.1371/journal.pcbi.1004584

      Ness, T. V., Remme, M. W. H., & Einevoll, G. T. (2018). h-Type Membrane Current Shapes the Local Field Potential from Populations of Pyramidal Neurons. J Neurosci, 38(26), 6011-6024. https://doi.org/10.1523/jneurosci.3278-17.2018

      Reimann, M. W., Anastassiou, C. A., Perin, R., Hill, S. L., Markram, H., & Koch, C. (2013). A biophysically detailed model of neocortical local field potentials predicts the critical role of active membrane currents. Neuron, 79(2), 375-390. https://doi.org/10.1016/j.neuron.2013.05.023

      Rouach, N., Segal, M., Koulakoff, A., Giaume, C., & Avignone, E. (2003). Carbenoxolone blockade of neuronal network activity in culture is not mediated by an action on gap junctions. Journal of Physiology, 553(Pt 3), 729-745. https://doi.org/10.1113/jphysiol.2003.053439

      Roy, A., & Narayanan, R. (2021). Spatial information transfer in hippocampal place cells depends on trial-to-trial variability, symmetry of place-field firing, and biophysical heterogeneities. Neural Netw, 142, 636-660. https://doi.org/10.1016/j.neunet.2021.07.026

      Schomburg, E. W., Anastassiou, C. A., Buzsaki, G., & Koch, C. (2012). The spiking component of oscillatory extracellular potentials in the rat hippocampus. J Neurosci, 32(34), 11798-11811. https://doi.org/10.1523/JNEUROSCI.0656-12.2012

      Sinha, M., & Narayanan, R. (2015). HCN channels enhance spike phase coherence and regulate the phase of spikes and LFPs in the theta-frequency range. Proc Natl Acad Sci U S A, 112(17), E2207-2216. https://doi.org/10.1073/pnas.1419017112

      Sinha, M., & Narayanan, R. (2022). Active Dendrites and Local Field Potentials: Biophysical Mechanisms and Computational Explorations. Neuroscience, 489, 111-142. https://doi.org/10.1016/j.neuroscience.2021.08.035

      Sirmaur, R., & Narayanan, R. (2024). Distinct extracellular signatures of chemical and electrical synapses impinging on active dendrites differentially contribute to ripple-frequency oscillations. Society for Neuroscience annual meeting (https://www.abstractsonline.com/pp8/?_gl=1*1bxo7m*_gcl_au*MTc5MTQ0NjE0NC4xNzI3MDcwOTMw*_ga*MTMxMTE5OTcyMy4xNzI3MDcwOTMx*_ga_T09K3Q2WDN*MTcyNzA3MDkzMS4xLjEuMTcyNzA3MDkzNy41NC4wLjA.#!/20433/presentation/13949), Chicago, USA.

      Szarka, G., Balogh, M., Tengolics, A. J., Ganczer, A., Volgyi, B., & Kovacs-Oller, T. (2021). The role of gap junctions in cell death and neuromodulation in the retina. Neural Regen Res, 16(10), 1911-1920. https://doi.org/10.4103/1673-5374.308069

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2:

      The authors indicated that they had added coefficients of variation for within-lineage heterogeneity (line 93), but I can't seem to find this.

      The coefficients of variation were indeed included as suggested, and can be found in lines 94-96 of the current revised version of the manuscript. The sentence states: “Nevertheless, substantial intra-lineage heterogeneity could be observed, particularly within L1 and L2 (coefficients of variation 84.4% [L1] and 66.0% [L2] vs. 32.6% [L3], 34.6% [L4] and 31.9% [L5]).”
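
      For readers unfamiliar with the metric, the coefficient of variation is the standard deviation expressed as a percentage of the mean. The sketch below uses the sample standard deviation, which is an assumption since the response does not state which estimator was used, and the input values are hypothetical rather than data from the manuscript.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical per-strain growth measurements for two lineages.
homogeneous_lineage = [90.0, 100.0, 110.0]
heterogeneous_lineage = [20.0, 100.0, 180.0]

cv_homogeneous = coefficient_of_variation(homogeneous_lineage)
cv_heterogeneous = coefficient_of_variation(heterogeneous_lineage)
print(f"{cv_homogeneous:.1f}% vs {cv_heterogeneous:.1f}%")
```

      Both hypothetical lineages share the same mean, so the difference in the coefficient of variation reflects only the spread among strains within each lineage.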

      They were unable to address my question on the impact of T-cell depletion from PBMC on bacterial growth? Their discussion should include that this experimental limitation means that they are unable to test cause and effect for the relationship between T cell proliferation and bacterial growth.

      As recommended, this experimental limitation is now included in the discussion in lines 344-346.

      Reviewer #3:

      EM:

      Based on the authors' lack of resources, I don't believe that electron microscopy experiments should be required for this publication. However, it should be noted that EM is performed on fixed samples such that implementation of those protocols as it relates to bio-safety is no more demanding than the preparation of samples for other common assays performed outside of the BSL3.

      We appreciate your understanding regarding our lack of resources to carry out the EM experiments, although we recognize the possibility of them being performed on BSL3 samples.

      Granuloma score:

      From the author comments and the manuscript's text, it appears that the "granuloma score" is an attempt at quantitation of PBMC organization, where every component of the metric [(mean area / mean aspect ratio) / mean n] is a visual facet of the relative integration of PBMCs into a more organized aggregate. The area and number (n) of aggregates both address regional coalescence of the total number of PBMCs added into the matrix, whereas the aspect ratio component is an indicator of uniformity of the PBMCs that have been assigned to an individual aggregate. Perhaps another roundness estimation would have been more precise, but aspect ratio seems fine for their assay. Considering these factors and the authors' contention that the aggregates making up (n) are granulomas, the name "granuloma score" is inaccurate and a more appropriate title would be "aggregate organization score" or "aggregate organization index".

      Thank you for the suggested alternative terminology. The term “granuloma score” has been substituted with “aggregate formation score” throughout the manuscript.
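
      The metric discussed above, [(mean area / mean aspect ratio) / mean n], can be written out as a short calculation. This is a minimal sketch under stated assumptions: the function name, the units, and the choice to average areas and aspect ratios over aggregates and counts over imaged fields are hypothetical, and all input values are illustrative rather than taken from the manuscript.

```python
from statistics import mean


def aggregate_formation_score(areas, aspect_ratios, counts):
    """Compute (mean area / mean aspect ratio) / mean number of aggregates.

    areas         -- area of each aggregate (units are an assumption)
    aspect_ratios -- aspect ratio of each aggregate (1.0 = perfectly round)
    counts        -- number of aggregates per imaged field (assumed averaging unit)
    """
    return (mean(areas) / mean(aspect_ratios)) / mean(counts)


# Hypothetical measurements (illustrative only).
score = aggregate_formation_score(
    areas=[100, 300],
    aspect_ratios=[1.0, 3.0],
    counts=[4, 6],
)
print(score)
```

      Under this formula, larger and rounder aggregates raise the score, while many small, scattered aggregates lower it.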

      Dormancy:

      In the manuscript, the authors should explicitly reference the validation studies which demonstrate induction of the DosR regulon in the model, lest their previously generated and conducted studies go unappreciated by a broader audience. In the title of that previous work (PMID: 32069329) this group used the designation "dormant-like" to describe the state observed in bacteria within their in vitro granuloma model system, as they also do in LINE 124. This term or a variation of it should be exchanged for dormant/dormancy throughout the manuscript when referring to observations in the model bacteria. It is a more precise description. Further, "dormant-like" allows the latitude to refer to actively growing bacteria in the context of dormancy without running the risk of putting forth confusing or potentially erroneous assertions.

      As recommended, the suffix “-like” has been added to the designation “dormant” when referring to the bacterial phenotype induced in the model. In addition, the induction of the DosR regulon in the model is now mentioned in line 116, and the reference to Kapoor’s work that originally demonstrated it by qPCR is included.

      PBMC aggregation:

      I would like to make the authors aware that in well vetted models, cell aggregation as a function of infection does not typically occur in PBMCs on tissue culture plates until day 6 post infection (PMID: 25691598, Fig 2). Further, this group's own published protocol for the model under consideration in this manuscript (PMID: 33659472, Fig1) explicitly states that "Formation of granuloma like structures can be observed after 7-8 days", the implication being that prior to 7 days granuloma like structures cannot be observed reliably. Regardless, it seems evident that the authors will not be conducting additional experiments for this publication, which I find acceptable. However a proper negative control would certainly strengthen evidence for the association of strain specific bacterial and host responses with the granulomatous response in this model.

      We had interpreted the reviewer’s previous comment regarding PBMC aggregation as referring to a different experimental model rather than a matter of timing. Since many other studies have previously assessed the impact of strain/lineage variability on macrophage responses, in this work we decided to focus on later time points, and we did include an uninfected condition as a negative control. Nonetheless, we agree that it would be very interesting to additionally evaluate early monocyte/macrophage responses, and we will take this into account for future studies.

      Use of antiquated terminology:

      I can appreciate the desire to establish continuity between publications by using the same abbreviation for TNF but it will come at a cost. Using outdated terms in general makes people more dismissive of the work. Perhaps something to consider.

      Since this seems an important issue to the reviewer, we have replaced the term TNF-a with TNF throughout the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) Adding microscopy of the untreated group to compare Figure 2A with would further strengthen the findings here.

      First of all, we would like to thank Reviewer #1 for their comments and efforts on our manuscript. We have carefully revised it. We used a time-lapse method to capture images at 0 minutes, before any drugs were added. We will change '0 min' to 'untreated,' which will further strengthen the findings.

      (2) Quantification of immune infiltration and histological scoring of kidney, liver, and spleen in the various treatment groups would increase the impact of Figure 4.

      Thank you very much to Reviewer #1 for their comments and efforts on our manuscript. We have revised it carefully. We conducted quantitative analysis of immune infiltration in the kidney, liver, and spleen across different treatment groups. However, due to the extremely low number of abnormal cells in the negative control, treatment, and prophylactic groups, neither the instrument nor manual methods could reliably gate the cells. Consequently, quantification of immune infiltration and histological scoring were not performed.

      (3) The data in Figure 6 I is not sufficiently convincing as being significant.

      We thank Reviewer #1 for their comments and efforts on our manuscript. We have revised it carefully. Previous research has shown that antibiotics and other drugs can cause alterations in gut microbiota. Therefore, we plan to study the effects of antibiotics on gut microbiota. To conduct this research, we need to isolate these microbes from the gut. Although this process is challenging, we still aim to explore the gut microbiota. If possible, we will continue to investigate how antibiotics affect gut microbiota in future studies.

      (4) Comparisons of the global transcriptomic analysis of the untreated group to the PC, LP, and LT groups would strengthen the author's claims about the immunological and transcriptomic changes caused by linalool and provide a true baseline.

      Thanks so much for Reviewer #1 comments and efforts for our manuscript. We have revised it carefully. Due to the initial research design and data analysis strategy, we have focused on comparisons among the PC, LP, and LT groups to more directly explore the differences under various treatment conditions. Specifically, while the transcriptomic data from the untreated group could provide a basic reference, it has shown limited relevance to the core hypotheses of our study. Our research has aimed to investigate the immunological and transcriptomic changes among the treatment groups rather than comparing treated and untreated states. We believe that the current experimental design and data analysis have effectively revealed the mechanisms of linalool and that the additional comparisons among the treatment groups have further supported our conclusions. We hope the reviewer understands the rationale behind our experimental design. If there are additional suggestions, we are more than willing to further optimize the content of our manuscript.

      Reviewer #2 (Public review):

      (1) The authors have taken for granted that the readers already know the experiments/assays used in the manuscript. There was not enough explanation for the figures as well as figure legends.

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. We will provide more detailed explanations of the experiments and assays used in the manuscript, as well as enhance the descriptions in the figure legends, to ensure that readers have a clear understanding of the figures and their context.

      (2) The authors missed adding the serial numbers to the references.

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. We will add serial numbers to the references to ensure proper citation and improve the clarity of our manuscript.

      (3) The introduction section does not provide adequate rationale for their work, rather it is focused more on the assays done.

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. We will add a section to the introduction that provides a rationale for our work, specifically focusing on the impact of plant extract on immunoregulation.

      (4) Full forms are missing in many places (both in the text and figure legends), also the resolution of the figures is not good. In some figures, the font size is too small.

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We will ensure that all abbreviations are expanded where necessary, both in the text and figure legends. Additionally, we will improve the resolution of the figures and increase the font size where needed to enhance clarity.

      (5) There is much mislabeling of the figure panels in the main text. A detailed explanation of why and how they did the experiments and how the results were interpreted is missing.

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. We will improve the labeling of the figure panels, provide detailed explanations of the experimental methods, including their rationale and interpretation, and clarify the connections between the methods.

      (6) There is not enough experimental data to support their hypothesis on the mechanism of action of linalool. Most of the data comes from pathway analysis, and experimental validation is missing.

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. In our manuscript, the transcriptomic data do not stand alone: we carried out many experiments to substantiate the changes inferred from the transcriptomic data, such as SEM, TEM, CLSM, molecular docking, RT-qPCR, and histopathological examinations. The detailed information is listed below.

      As shown in Figure 2, we combined the transcriptomic data related to the membrane and organelles with SEM, TEM, and CLSM images. Analyzing these data and observations together, we showed that the cell membrane may be a potential target for linalool.

      As shown in Figure 3, we carried out molecular docking to explore the specific binding of linalool to ribosome-related proteins, which were screened as potential targets of linalool from the transcriptomic data.

      As shown in Figure 5, the transcriptomic data indicated that linalool enhanced the host complement and coagulation system. To substantiate these changes, we carried out RT-qPCR to measure the expression of important immune-related genes, and found that the RT-qPCR results were consistent with the expression trends of the genes in the transcriptome analysis.

      As shown in Figures 4 and 5, the transcriptomic data revealed that linalool promoted wound healing, tissue repair, and phagocytosis (Figure 5E). To confirm this, we carried out histopathological examinations, and found that linalool alleviated tissue damage caused by S. parasitica infection on the dorsal surface of grass carp and enhanced the healing capacity (Figure 4G).

      Overall, we will conduct additional experiments to verify the mechanism of action of linalool in the future.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1 Panel G is not referenced in the legend, this should be fixed

      We thank Reviewer #1 for their comments and efforts on our manuscript. We have revised it carefully. Please check Figure 1. The order of Panels F and G in Figure 1 was wrong. We have corrected the panel order in Figure 1.

      (2) Statistical comparisons between groups in Figure 4 Panels C-F is lacking and should be added.

      Thanks so much for Reviewer #1 comments and efforts for our manuscript. We have revised it carefully. Please check the Figure 4 C-F. We have added statistical comparisons between groups in Figure 4 Panels C-F.

      (3) Capitalize Kidney label in Figure 4G.

      We thank Reviewer #1 for their comments and efforts on our manuscript. We have revised it carefully. Please check Figure 4G. We have capitalized the K of Kidney.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors missed adding the serial numbers to the references. I could not go through the references to cross-check if they cited the right ones because it's extremely difficult to figure out which one corresponds to which reference number.

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. Please check the references. We have added the serial numbers to the references.

      (2) In the last paragraph of the introduction section, most of the techniques in the paper were summarized which does not go with the flow of the paper. The introduction should not be focused on the different techniques used the focus should be more on the rationale of the work. It would be nice if the last paragraph could be rewritten.

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. Please check it in Line 85-94. We have added a section to the introduction that provides a rationale for our work, specifically focusing on the impact of plant extract on immunoregulation.

      (3) The resolution of the figures is not good.

      Thank you for your suggestion. We have revised it carefully. Please check all the figures. We have increased the resolution and size of all the figures.

      (4) Mostly, the figure legends sound like results, with not enough explanation. Full forms are missing in many places which would make the readers go back to the text/other figures each time.

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. Please check it throughout the manuscript and all the figure legends. We have added full names and abbreviations to both the manuscript and all the figure legends so that we don't make the readers go back to the text/other figures each time.

      (5) Figure 1:

      Figure 1A: there is not enough explanation for this panel. It's not clear from the text which other EOs than Linalool are referred to here. Which EOs were extracted from daidai flowers?

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. Please check it in the Figure 1A. Figure 1A is divided into “Essential oils (EOs)” and “The main compounds of EOs” to make it easier to distinguish.

      Figure 1B: do the three different wells of each set represent three replicates? If so, are they biological/technical replicates? Also, I'm not sure how the MFC was determined from this figure (line 116) because clearly this panel only corresponds to the determination of MICs, not MFCs.

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. Please check it in Lines 126-130. The three different wells of each set represent three biological replicates. After 5 μL of resazurin dye was added and the color of the wells had turned pink, the linalool concentration in the first non-pink well was taken as the MIC. The culture liquid from the wells where no mycelium growth was seen was marked onto the plate and incubated at 25°C for 7 days. The well with the lowest linalool concentration and no mycelium growth was identified as the MFC.
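
      The readout rule described above can be sketched as a small function. This is an illustrative sketch only: the function name and the readout values are hypothetical, and it assumes that the "first non-pink well" means the lowest concentration at or above which no well shows growth. The same rule is applied to the subculture plates for the MFC.

```python
def lowest_concentration_without_growth(growth_by_concentration):
    """Return the lowest concentration at or above which no growth is seen.

    growth_by_concentration maps linalool concentration (%) to True when
    growth was observed (pink well for the MIC readout, visible mycelium on
    the subculture plate for the MFC readout). Returns None when growth is
    seen even at the highest concentration tested.
    """
    result = None
    # Walk from the highest concentration downwards until growth appears.
    for concentration in sorted(growth_by_concentration, reverse=True):
        if growth_by_concentration[concentration]:
            break
        result = concentration
    return result


# Hypothetical readouts (illustrative only, not data from the manuscript).
resazurin_pink = {0.4: False, 0.2: False, 0.1: False, 0.05: False,
                  0.025: True, 0.0125: True, 0.00625: True}
subculture_growth = {0.4: False, 0.2: False, 0.1: False, 0.05: True}

mic = lowest_concentration_without_growth(resazurin_pink)
mfc = lowest_concentration_without_growth(subculture_growth)
print(mic, mfc)
```

      By construction the MFC can never be lower than the MIC, because only wells without visible growth are subcultured.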

      Figure 1C: the figure legend says that the effect of linalool on mycelium growth inhibition was done over a 6hr timepoint but according to the figure the timepoint was 60hr. I am also confused about the concentrations of linalool used. Although a range of concentration from 0 to 0.4% is mentioned, I only see the time vs diameter curves for 7 concentrations.

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. Please check it in Line 983 and Figure 1C. We have changed 6 h to 60 h in the figure legend. Only 7 time vs diameter curves are visible in Figure 1C because 0.4%, 0.2%, and 0.1% linalool inhibited mycelial growth to the same extent, so their curves coincide. We now show the time vs diameter curves for the 0.4%, 0.2%, and 0.1% concentrations as three dotted lines of different colors and sizes in Figure 1C.

      Figure 1D: mislabeled as 1G in the figure panel.

      Figures 1E and 1G: Figure 1E is missing and I do not see any figure legend for Figure 1G.

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. Please check Figure 1. The order of Panels F and G in Figure 1 was wrong. We have relabeled the panels of Figure 1 as A-F, so there is no longer a Panel G.

      Overall, Figure 1 is very confusing and needs rewriting. Also, there is a need to add more explanation of the figure panels in the results section.

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. Please check Figure 1. We have corrected all the problems in Figure 1. We have also added more explanation of the figure panels in the results section and strengthened the connections between methods, in order to show the logic of the experiments and how the results were interpreted. Please check them in Lines 126-130, 144-147, 174-179, 213-217, 343-345, 677-682.

      (6) Figure 2:

      The authors could justify the reason for doing the experiments before moving into the results they got.

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. Please check the methods and results in the manuscript, in Lines 126-130, 144-147, 174-179, 213-217, 343-345, 677-682. We have added more explanation of the figure panels in the results section and strengthened the connections between methods, in order to show the logic of the experiments and how the results were interpreted.

      What concentration of linalool was used?

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. Please check it in Lines 992-996. The mycelium treated with 6×MIC (0.3%) linalool was observed by confocal laser scanning microscopy (CLSM), and the mycelium treated with 1×MIC (0.05%) linalool was observed by scanning electron microscopy (SEM) and transmission electron microscopy (TEM).

      The full form of DEGs has been mentioned later, but it should be mentioned in the figure legend of Figure 2 as this is the first time the term was used. Also, what is the full form of DEPs?

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. Please check it in Line 168, 175, 182, 631, 998, 1001. The word DEPs in Figure 2I was incorrect, and we have changed DEPs to DEGs.

      Is there a particular reason for looking into the cellular component rather than molecular function and biological processes in the GO analysis? (what I see is that Figure 2H indicates the prevalence of catalytic activity, binding, cellular, and metabolic processes as well). Also, there is not enough explanation of the observation from Figure 2I (both in the results section and figure legend).

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. Please check it in Lines 174-179, 998-1002 (Figure 2I). We looked at cellular components rather than molecular functions and biological processes in the GO analysis because we focused on the effects on cell membranes and cell walls. These results are closely related to, and support, our scanning electron microscopy (SEM) and transmission electron microscopy (TEM) results. Further explanation has been added to the results section and figure legend to explain the observations from Figure 2I.

      (7) Figure 3:

      Figures 3A and 3B: The adjusted p value is already indicated in the figures, so there is no need to add statistical significance (asterisks) to each bar. The resolution for these panels is not good and the font is too small.

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. Please check Figures 3A and 3B. We have removed the statistical significance markers (asterisks) from Figures 3A and 3B. If possible, we will upload the clearest figures when the manuscript is published.

      Figure 3C: the figure legend is missing (wrongly added as KEGG analysis, which should be network analysis). The numbering for the figure legends is wrong. What do the node sizes (5, 22, 40, 58) mentioned in the figure represent? Also, I wonder why ribosome biogenesis in eukaryotes has been indicated as the most enriched pathway despite its weaker connection to the other nodes.

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. Please check the Figure 3C. Figure 3C is KEGG analysis generated by software, not network analysis. For the convenience of readers, we have made a new Figure of KEGG analysis.

      Figure 3D: KEGG enrichment and GO analysis: global/local search? Which database was used as a reference?

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. Please check Lines 633-635. Functional enrichment analysis was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. KEGG pathway analysis was conducted using Goatools.

      Figure 3E: why were the RNA pol structures compared? The authors did not mention anything about this panel in their results.

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. Please check Line 207. We found that many DEGs related to ribosome biogenesis (Figure 3D) and RNA polymerase (Figure 3E) are downregulated. Because RNA polymerase is closely related to ribosome biogenesis, the downregulation of RNA polymerase directly affects the synthesis of ribosome-related RNAs, including rRNA, mRNA, and tRNA, thereby inhibiting ribosome production. This relationship is particularly significant in cell growth, division, and the response to external environmental changes.

      Figures 3F and 3G: please mention which model is illustrated (ribbon/sphere model).

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. Please check the line 1010-1015. The tertiary structure of NOP1 was displayed using a cartoon representation. Molecular docking of linalool with NOP1 was performed by enlarging the regions binding to the NOP1 activation pocket to showcase the detailed amino acid structures, which were presented using a surface model, while the small molecule was displayed with a ball-and-stick representation.

      Figure 3H: this panel needs more explanation. Why were some of the ABC transporters upregulated while some were downregulated?

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. It is a common phenomenon that microorganisms adjust the expression of genes related to substance transport in response to different environmental stimuli to optimize their survival strategies. The expression of ATP-binding cassette (ABC) transporters can be upregulated or downregulated due to various factors, such as environmental stimuli, metabolic demands, energy consumption, species specificity, and signaling molecules. This explains why some ABC transporters are upregulated while others are downregulated.

      (8) Figure 4:

      There was no statistical significance shown in the figures (D-F) which makes me wonder how they worked out that there was any significant increase/decrease, as mentioned in the text. What are the p values? What is the number of replicates? What concentration of linalool was used?

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. Please check Figure 4D-F. In this study, 4 groups were established: (1) Positive control (PC) group (10 fish infected with S. parasitica). (2) Linalool therapeutic (LT) group (10 fish infected with S. parasitica, soaked in 0.00039% linalool in a 20 L tank for 7 days). (3) Linalool prophylactic (LP) group (10 uninfected fish soaked in 0.00039% linalool in a 20 L tank for 2 days, followed by the addition of 1×10^6 spores/mL secondary zoospores). (4) Negative control (NC) group (10 uninfected fish without linalool treatment). Each group had 3 replicate tanks. In each group, 8 fish were used for immunological assays; on day 7, blood samples were collected from the tail veins using heparinized syringes and left to coagulate overnight at 4°C. Kits from Nanjing Jiancheng Institute (Nanjing, China) were used to measure lysozyme (LZY) activity, superoxide dismutase (SOD) activity, and alkaline phosphatase (AKP) activity.

      (9) Figure 5:

      Again, the resolution and font size are off. Please mention the full forms of the terms used in the figure legend. The interpretation of the in vivo protective mechanism of linalool is completely based on GO enrichment and KEGG pathway analysis (also some transcriptional analysis). The only wet lab validation done was by checking the mRNA level of some cytokines but that does not necessarily validate what the authors claim.

      Thank you for your suggestion. We have revised it carefully. Please check all the figures and figure legends. We have increased the resolution and size of all the figures and used the full forms of the terms in the figure legends. If possible, we will upload the clearest figures when the manuscript is published. Currently, in the field of aquaculture research, validation beyond mRNA quantification faces numerous challenges compared to model organisms like mice and zebrafish, primarily due to the lack of available antibodies. For instance, antibodies for grass carp have not yet been commercialized, making protein-level studies and validations significantly more difficult. This lack of antibodies limits the progress of protein verification. However, we hope to design more experiments and validation tests in the future to gradually overcome these technical bottlenecks and provide stronger support for this research.

      (10) Figure 6:

      There is not enough explanation on why and how the experiments were done. It seems like the authors already presumed that the readers know the experiments. The interpretation of the PCA plot is not clear. Why are the quadrant sizes different? How was the heat map plotted? Also, the claim of linalool regulating the gut microbiota is only dependent on the correlation analysis and there is no wet lab validation for this. The data represented in this figure is not enough to prove their hypothesis and needs further investigation.

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. Please check the Figure 6. We will improve the labeling of the figure panels, provide detailed explanations of the experimental methods, including their rationale and interpretation, and clarify the connections between the methods.

      The goal of PCoA is to preserve the distance relationships between samples as much as possible through the principal coordinates, thereby revealing the differences or patterns in microbial composition among different groups. For example, in our study, PCoA analysis demonstrated that the microbial compositions of the positive control (PC), linalool prophylactic (LP), and linalool therapeutic (LT) groups showed significant differences in the reduced dimensional space, possibly indicating that these treatments had a notable impact on the microbial community.
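
      To make the idea of "preserving distance relationships" concrete, the sketch below computes the first principal coordinate from a pairwise distance matrix using classical PCoA (Gower double-centering followed by an eigen-decomposition, here approximated by power iteration). It is a minimal illustration under stated assumptions, not the procedure used by the analysis platform: the function name and toy distances are hypothetical, the distance metric is not specified in the response, and the code assumes a symmetric matrix whose dominant eigenvalue is positive.

```python
import math


def pcoa_first_axis(dist, iterations=200):
    """Return sample coordinates on the first principal coordinate.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    """
    n = len(dist)
    squared = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in squared]
    grand_mean = sum(row_mean) / n
    # Gower double-centering: B = -1/2 * J * D^2 * J (symmetry assumed,
    # so row means double as column means).
    b = [[-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the dominant eigenvector; the fixed start vector
    # is an arbitrary choice and could fail if orthogonal to that vector.
    v = [float(i + 1) for i in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    # Coordinates are the unit eigenvector scaled by sqrt(eigenvalue).
    return [x * math.sqrt(eigenvalue) for x in v]


# Toy example: three hypothetical samples lying on a line at 0, 1 and 3.
toy_distances = [[0, 1, 3],
                 [1, 0, 2],
                 [3, 2, 0]]
coords = pcoa_first_axis(toy_distances)
print(coords)
```

      In this toy case the spacing between the returned coordinates reproduces the input distances, which is the property that lets group separation in a PCoA plot be read as a difference in community composition.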

      In our study, the heatmap was generated using the Majorbio Cloud Platform. This platform visualized the preprocessed microbial community data, providing an intuitive representation of the differences in microbial composition and relative abundance among samples. The platform automatically performed steps such as data normalization, color mapping, and clustering analysis, offering convenience for data analysis and interpretation.

      Previous research has shown that antibiotics and other drugs can cause alterations in gut microbiota. Therefore, we plan to study the effects of antibiotics on gut microbiota. To conduct this research, we need to isolate these microbes from the gut. Although this process is challenging, we still aim to explore the gut microbiota. If possible, we will continue to investigate how antibiotics affect gut microbiota in future studies.

      (11) Figure 7:

      This figure does not clarify how they did the interpretation. The in vivo study does not phenocopy their in vitro studies.

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. We have carefully reviewed and confirmed the current experimental design and data analysis. Although we have not made any changes to Figure 7, we have further clarified the interpretation of the results in the revised manuscript, especially concerning the discrepancies between the in vivo and in vitro studies. We have added more experimental background information to help explain the possible reasons for these differences. We hope the reviewer will understand our explanation, and we look forward to further feedback.

      (12) Minor comments:

      Line 61: what's meant by "et al"?

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. Please check it in Line 61. We have removed "et al".

      Line 87-88: please add a citation referring to the earlier studies.

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. Please check it in Line 109.

      Line 151-152: the term "related to" has been used a couple of times. Mentioning it once in the beginning and avoiding repeating the same word might be better.

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. Please check it in Lines 168-171. We have rewritten this paragraph to avoid repeating the phrase “related to”.

      How did they reconstitute the EO compounds?

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. The EO compounds we used in our experiments were partially extracted from essential oils in the laboratory and partially purchased from ThermoFisher (USA).

      Line 544: needs explanation of how there was a 2-fold dilution in the concentrations shown in the figure compared to the concentrations mentioned here.

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. We set the concentrations for the mycelium MIC assay to 0.8%, 0.4%, 0.2%, 0.1%, 0.05%, 0.025%, 0.0125%, and 0.00625%, and the concentrations for the spore MIC assay to 0.4%, 0.2%, 0.1%, 0.05%, 0.025%, 0.0125%, and 0.00625%. Figure 1B shows the MIC determination of linalool on spores, while the MIC determination on mycelium is not shown.
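
      Both concentration lists above are two-fold serial dilutions that differ only in their starting concentration, which accounts for the factor of two the reviewer noticed. A minimal sketch (the function name is hypothetical) that regenerates the two series:

```python
def twofold_series(start, steps):
    """Return `steps` concentrations, halving from `start` at each step."""
    return [start / 2 ** i for i in range(steps)]


mycelium_series = twofold_series(0.8, 8)  # starts at 0.8%, 8 concentrations
spore_series = twofold_series(0.4, 7)     # starts at 0.4%, 7 concentrations
print(mycelium_series)
print(spore_series)
```

      The spore series is the mycelium series with its highest concentration dropped, so the two assays share every concentration from 0.4% down to 0.00625%.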

      Line 546: remove "were".

      Thanks so much for Reviewer #2 comments and efforts for our manuscript. We have revised it carefully. Please check it in Line 573. We have removed "were".

      Line 555: what concentration of malachite green and tween 20 was used?

      We thank Reviewer #2 for their comments and efforts on our manuscript. We have revised it carefully. Please check it in Lines 579-580. We used 2.5 mg/mL malachite green and 1% Tween 20.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      (1) Some conclusions are not completely supported by the present data, and at times the manuscript is disjoint and hard to follow. While the work has some interesting observations, additional experiments and controls are warranted to support the claims of the manuscript.

      Thank you for the comments. We revised some of the claims and conclusions to be more objective and result-supportive.

      (2) While the authors present compelling data that is relevant to the development of anti-bacterial vaccinations, the data does not completely match their assertions and there are places where some further investigation would further the impact of their interesting study.

      We do not fully agree with the reviewer's comments. We have demonstrated that changes in CPS levels during infection are associated with pathogenesis, which will guide future studies on the underlying mechanisms. A significant amount of effort is required for studying mechanisms, which is beyond the scope of this research. We concur with the reviewer that assertions should be made cautiously until further studies are conducted. We have revised these assertions to align with the data and to avoid extrapolating the results (pages 7, lines 126, 133-136; page 11, lines 216-218; page 13, line 264; and page 18, lines 378-383).

      (3) The difference in the pathogenesis of a log phase vs. stationary phase intranasal infection would be interesting. Especially because the bacteria is a part of the natural microbial community of swine tonsils, it is curious if the change in growth phase and therefore CPS levels may be a causative reason for pathogenic invasion in some pigs.

      S. suis is a part of the natural microbial community of swine tonsils but not mouse NALT. It would be interesting to know whether CPS levels are low in pig tonsils, since CPS is hydrophilic and not conducive to bacterial adhesion. In the study, mice were infected i.n. with a high dose of the bacteria, which could increase opportunities for dissemination (acetic acid may not be a contributor, since results with or without it were similar). S. suis getting into other body compartments from pig tonsils might be triggered by other conditions, such as viral coinfection, nasal cavity inflammation, cold weather, and decreased immunity.

      Experiments with pig blood and phagocytes have shown that genes involved in the synthesis of CPS are upregulated in pig blood. In contrast, these genes are downregulated [1]. In addition, the absence of CPS correlated with increased hydrophobicity and phagocytosis, suggesting that S. suis undergoes CPS phase variation, which could play a role in the different steps of S. suis infection [2]. We showed direct evidence of encapsulation modulation associated with S. suis pathogenesis in mice. A pig infection model is required to confirm these findings.

      (4) The authors should consider taking the bacteria from NALT/CSF and blood and compare the lag times bacteria from different organs take to enter a log growth phase to show whether the difference in CPS is because S. suis in each location is in a different growth phase. If log phase bacteria were intranasally delivered, would it adapt a stationary phase life strategy? How long would that take? 

      What causes CPS regulation in vivo is not known. CPS changes across culture stages, indicating that stress, such as nutrient levels, is one of the signals triggering CPS regulation. The microenvironment in the body compartments is far more complex than in vitro conditions: host cells, immune factors, and other factors may affect CPS regulation, individually or collectively. The reviewer's question is important, but the suggested experiment is impracticable because few bacteria can be recovered from organs, and culturing the bacteria in vitro would obliterate their in vivo status.

      (5) Authors should be cautious about claims about S. suis downregulating CPS in the NALT for increased invasion and upregulating CPS to survive phagocytosis in blood. While it is true that the data shows that there are different levels of CPS in these locations, the regulation and mechanism of the recorded and observed cell wall difference are not investigated past the correlation to the growth phase.

      We have toned down the claim, which now reads: “suggest a correlation between lower CPS in the NALT and a greater capacity for cellular association, whereas elevated CPS levels in the blood are linked to improved resistance against bactericidal activity. However, the mechanisms behind these associations remain unknown.” (page 7, lines 133-136).

      (6) The mouse model used in this manuscript is useful but cannot reproduce the nasal environment of the natural pig host. It is not clear if the NALTs of pigs and mice have similar microbial communities and how this may affect the pathogenesis of S. suis in the mouse. Because the authors show a higher infection rate in the mouse with acetic acid, they may want to consider investigating what the mouse NALT microenvironment is naturally doing to exclude more bacterial invasion. Is it simply a host mismatch or is there something about the microbiome or steady-state immune system in the nose of mice that is different from pigs?

      This is a very interesting comment. The mice used were specific-pathogen-free (SPF). The microenvironment in SPF mouse NALT should be significantly different from that in conventional pig tonsils. Although NALT in mice resembles pig tonsils in function, many factors may contribute to the sensitivity to S. suis colonization in the pig nasal cavity, such as the microbiome and the local steady-state immune system. A more complex microbiota in tonsils could be one of these factors. Comparing S. suis colonization of the tonsils in SPF and conventional pigs would be an ideal experiment to answer the question.

      (7) Have some concerns regarding the images shown for neuroinvasion because I think the authors mistake several compartments of the mouse nasal cavity as well as the olfactory bulb. These issues are critical because neuroinvasion is one of the major conclusions of this work.

      Thank you for your comments. The olfactory epithelium (OE) is located directly underneath the olfactory bulb in the olfactory mucosa area and lines approximately half of the nasal cavity. The remaining surface of the nasal cavity is lined by respiratory epithelium, which lacks neurons. The olfactory receptor neurons in the OE are stained green in the images by β-tubulin III, a neuron-specific marker. The respiratory epithelium is colorless due to the absence of nerve cells. Similarly, the green color stained by β-tubulin III identifies the olfactory bulb. The accuracy of the anatomic compartments of the mouse nasal cavity has been checked and confirmed by referring to related literature [3, 4].

      References

      (1) Wu Z, Wu C, Shao J, Zhu Z, Wang W, Zhang W, Tang M, Pei N, Fan H, Li J, Yao H, Gu H, Xu X, Lu C. The Streptococcus suis transcriptional landscape reveals adaptation mechanisms in pig blood and cerebrospinal fluid. RNA. 2014 Jun;20(6):882-98.

      (2) Charland N, Harel J, Kobisch M, Lacasse S, Gottschalk M. Streptococcus suis serotype 2 mutants deficient in capsular expression. Microbiology (Reading). 1998 Feb;144 ( Pt 2):325-332.

      (3) Pägelow D, Chhatbar C, Beineke A, Liu X, Nerlich A, van Vorst K, Rohde M, Kalinke U, Förster R, Halle S, Valentin-Weigand P, Hornef MW, Fulde M. The olfactory epithelium as a port of entry in neonatal neurolisteriosis. Nat Commun. 2018;9(1):4269.

      (4) Sjölinder H, Jonsson AB. Olfactory nerve--a novel invasion route of Neisseria meningitidis to reach the meninges. PLoS One. 2010 Nov 18;5(11):e14034.

      Reviewer 2:

      (1) However, there are serious concerns about data collection and interpretation that require further data to provide an accurate conclusion. Some of these concerns are highlighted below:

      Both reviewers were concerned about some of the interpretations of the results. We modified the interpretations in related lines throughout the manuscript (Please see the related responses to Reviewer 1).

      (2) In figure 2, the authors conclude that high levels of CPS confer resistance to phagocytic killing in blood exposed S. suis. However, it seems equally likely that this is resistance against complement mediated killing. It would be important to compare S. suis killing in animals depleted of complement components (C3 and C5-9).

      We thank the reviewer for the comment. The experiment should be described as a bactericidal assay rather than anti-phagocytosis killing. CPS is a main inhibitor of C3b deposition [1]. It interferes with complement-mediated and receptor-mediated phagocytosis, as well as direct killing. Data in Figure 2C are now expressed as “% of bacterial survival in whole blood” for clarity (page 8, Fig. 2C and page 23, lines 489-490).

      (3) Intranasal administration non-CPS antisera provides a nice contrast to intravenous administration, especially in light of the recently identified "blood-olfactory barrier". Can the authors provide any insight into how long and where this antibody would be located after intranasal administration? Would this be antibody mediated cellular resistance, or something akin to simple antibody "neutralization"

      Anti-V5 may not stay long locally following intranasal administration. Efficient reduction of S. suis colonization in NALT supports that anti-V5 could recognize and neutralize the bacteria in NALT quickly, thereby reducing further dissemination in the body. Antibody-mediated phagocytosis may not play a major role because neutrophils are mainly present in the blood but not in the tissues.  

      (4) The micrographs in Figure 7 depict anatomy from the respiratory mucosa. While there is no histochemical identification of neurons, the tissues labeled OE are almost certainly not olfactory and in fact respiratory. However, more troubling is that in figures 7A,a,b,e, and f, the lateral nasal organ has been labeled as the olfactory bulb. This undermines the conclusion of CNS invasion, and also draws into question other experiments in which the brain and CSF are measured.

      We understand the significance of your concerns and appreciate your careful review of Figure 7. The olfactory epithelium (OE) is situated directly beneath the olfactory bulb in the olfactory mucosa area and covers about half of the nasal cavity. This positioning allows information transduction between the olfactory bulb and the olfactory epithelium. The remaining surface of the nasal cavity is lined with respiratory epithelium, which does not contain neurons and primarily serves as a protective barrier. In contrast, the olfactory epithelium consists of basal cells, sustentacular cells, and olfactory receptor neurons. The olfactory receptor neurons are specifically stained green in the images using β-tubulin III, a marker that is unique to neurons. The respiratory epithelium appears colorless due to the lack of nerve cells. Similarly, the green staining with β-tubulin III also highlights the olfactory bulb. The anatomical structures indicated in the images are consistent with those described in the literature [2, 3], confirming that the anatomy of the nasal cavity has been accurately identified.

      (5) Micrographs of brain tissue in 7B are taken from distal parts of the brain, whereas if olfactory neuroinvasion were occurring, the bacteria would be expected to arrive in the olfactory bulb. It's also difficult to understand how an inflammatory process would be developed to this point in the brain -even if we were looking at the appropriate region of the brain -within an hour of inoculation (is there a control for acetic acid induced brain inflammation?). Some explanations about the speed of the immune responses recorded are warranted.

      Thank you for highlighting this issue. Cerebrospinal fluid (CSF) flows into the subarachnoid space surrounding the spinal cord and the brain. There are direct connections from this subarachnoid space to lymphatic vessels that wrap around the olfactory nerves as they cross the cribriform plate towards the nasal submucosa. This connection allows for the drainage of CSF into the nasal submucosal lymphatics in mice [4, 5]. Bacteria may utilize this CSF outflow channel in the opposite direction, which explains the development of brain inflammation in the distal areas of brain tissue adjacent to the subarachnoid space. We have included additional relevant information in the revised manuscript (page 16, lines 323-325).

      (6) The detected presence of S. suis in the CSF 0.5hr following intranasal inoculation is difficult to understand from an anatomical perspective. This is especially true when the amount of S. suis is nearly the same as that found within the NALT. Even motile pathogens would need far longer than 0.5hr to get into the brain, so it's exceedingly difficult to understand how this could occur so extensively in under an hour. The authors are quantifying CSF as anything that comes out of the brain after mincing. Firstly, this should more accurately be referred to as "brain", not CSF. Secondly, is it possible that the lateral nasal organ -which is mistakenly identified as olfactory bulb in figure 7- is being included in the CNS processing? This would explain the equivalent amounts of S. suis in NALT and "CSF".

      The high dose of inoculation used in the experiment may explain the rapid presence of S. suis in the CSF. Mice exhibit low sensitivity to S. suis infection, and the range for the effective intranasal infectious dose is quite narrow. Higher doses lead to the quick death of the mice, while lower doses do not initiate an infection at all. The dose used in this study is empirical and is intended to facilitate the observation of the progression of S. suis infection in mice.

      The NALT tissue and CSF samples were collected separately. After obtaining the NALT tissue, the nasal portion was carefully separated from the rest of the head along the line of the eyeballs. The brain tissue was then extracted from the remaining part of the head to collect the CSF, and it was lacerated to expose the subarachnoid space without being minced. This procedure aims to preserve the integrity of the brain tissue as much as possible. Further details about the CSF collection process can be found in the Materials and Methods section (page 24, lines 508-512).

      (7) To support their conclusions about neuroinvasion along the olfactory route and /CSF titer the authors should provide more compelling images to support this conclusion: sections stained for neurons and S. suis, images of the actual olfactory bulb (neurons, glomerular structure etc).

      Thank you. We respectfully disagree with the reviewer. We stained neurons using a neuron-specific marker to identify the anatomical structures of the olfactory bulb and olfactory epithelium (in green). We used an S. suis-specific antibody to highlight the bacteria present in these areas (in orange and red). The images, along with the bacteria found in the cerebrospinal fluid (CSF) and the brain inflammation observed early in the infection, strongly support our conclusion regarding brain invasion through the olfactory pathway. Please see the response to question 4 for further clarification.

      References

      (1) Seitz M, Beineke A, Singpiel A, Willenborg J, Dutow P, Goethe R, Valentin-Weigand P, Klos A, Baums CG. Role of capsule and suilysin in mucosal infection of complement-deficient mice with Streptococcus suis. Infect Immun. 2014 Jun;82(6):2460-71.

      (2) Sjölinder H, Jonsson AB. Olfactory nerve--a novel invasion route of Neisseria meningitidis to reach the meninges. PLoS One. 2010 Nov 18;5(11):e14034.

      (3) Pägelow D, Chhatbar C, Beineke A, Liu X, Nerlich A, van Vorst K, Rohde M, Kalinke U, Förster R, Halle S, Valentin-Weigand P, Hornef MW, Fulde M. The olfactory epithelium as a port of entry in neonatal neurolisteriosis. Nat Commun. 2018;9(1):4269.

      (4) Yoon JH, Jin H, Kim HJ, Hong SP, Yang MJ, Ahn JH, Kim YC, Seo J, Lee Y, McDonald DM, Davis MJ, Koh GY. Nasopharyngeal lymphatic plexus is a hub for cerebrospinal fluid drainage. Nature. 2024 Jan;625(7996):768-777.

      (5) Spera I, Cousin N, Ries M, Kedracka A, Castillo A, Aleandri S, Vladymyrov M, Mapunda JA, Engelhardt B, Luciani P, Detmar M, Proulx ST. Open pathways for cerebrospinal fluid outflow at the cribriform plate along the olfactory nerves. EBioMedicine. 2023 May;91:104558.

      Response to Recommendations for the authors:

      Reviewer 1:

      Minor concerns for the manuscript:

      (1) In the introduction, please consider giving a little more background about the bacteria itself and how it causes pathogenesis.

      We appreciate your suggestion. We have included additional background on the virulent factors and the pathogenesis of the bacteria in the introduction to enhance understanding of the results (page 4, lines 63-69).

      (2) Figure 2C would be more correct to say percent survival as the CFUs before and after are what are being compared and not if the bacteria is being phagocytosed or not. Flow cytometry of the leukocytes and a fluorescent S. suis would show phagocytosis. Unless that experiment is performed, the authors cannot claim that there is a resistance to phagocytosis.

      Thank you for your feedback. We agree with the reviewer that the experiment should be described as a bactericidal assay rather than anti-phagocytosis killing. CPS interferes with complement-mediated and receptor-mediated phagocytosis, as well as direct killing. To enhance clarity, the data in Fig. 2C are now presented as “% of bacterial survival in whole blood” (page 8).

      (3) There are two different legends present for Figure 1. Please resolve.

      We apologize for the oversight. The redundant figure legend has been removed (page 6).

      (4) There are places such as in lines 194-195, that there are assertions and interpretations about the data that are not directly drawn from the data. These hypotheses are valuable, but please move them to the discussion.

      Thank you for your suggestion. The hypothesis has been moved to the Discussion section (page 19, lines 402 - 405).

      (5) In Figure 4B, higher resolution images would strengthen the ability of non-microbiologists to see the differences in CPS levels in the cell wall.

      We achieved the highest resolution possible for clearer distinctions in CPS levels. To enhance the visualization of the different CPS levels in the images, we revised the description of the CPS changes in Figure 4B within the results section (page 11, lines 208-213).

      (6) In Figure 5 there is no D. Further, the schematics throughout would be easier to parse with the text if the challenge occurred at time 0. Consider revising them for clarity.

      Thank you for highlighting the error. We have removed "i.v + i.n (Fig. 5)" from Figure 5A and made adjustments to the schematic illustrations in Figures 5 and 6 as recommended by the reviewer (page 14).

      (7) What is the control for the serum? The findings for figures 5 and 6 would be much stronger if a non-S. suis isotype control serum was also infused.

      We used a naive serum as a control to avoid interference from a non-S. suis isotype control that targets other surface molecules of S. suis serotypes.

      (8) Figure 6 legend does not include the anti-CPS treatment.

      Thank you. We have added anti-CPS serum in the legend (page 15, line 249).

      (9) Figure 7 legend does not include the time point for panel 7A.

      Thank you. The time point is shown on Fig.7A (page 17).

      (10) Figure 7 should show OB micrographs or entire brain including the OB.

      The neuron-specific marker, β-tubulin III, identifies the neuronal cells in the olfactory bulb (OB) as shown in Fig. 7A. Unfortunately, we were unable to provide an image of the entire brain that includes the OB due to limitations in our section preparation. We apologize for the mislabeled structure in Fig. 7A, which may have caused confusion. We have corrected the labeling for consistency (see page 15, lines 257-260). Additionally, we included a drawing of the sagittal plane of the rodent's nose, depicting the compartments of the OB, olfactory epithelium (OE), nasal cavity (NC), and brain. This illustration, presented in Fig. 7B on page 17, aims to clarify the structural and functional connections between the nasopharynx and the CNS.

      (11) Some conclusions may be better drawn if figures were to be consolidated. As noted above, the data at times feels disjointed and the importance is more difficult for readers to follow because data are presented further apart. Particularly figures 5 and 6 which are similar with different time points and controls of antisera administrative routes; placing these figures together would be an example of increasing continuity throughout the paper.

      Thank you for the valuable suggestion. Figures 5 and 6, along with their related descriptions in the results section, have been combined for better cohesiveness (pages 14-15).

      Reviewer #2:

      To support their conclusions about neuroinvasion along the olfactory route and /CSF titer the authors should provide more compelling images to support this conclusion: sections stained for neurons and S. suis, images of the actual olfactory bulb (neurons, glomerular structure etc).

      Please refer to our responses to Reviewer 1's Question 7, Reviewer 2's Questions 4 and 7 in the public reviews, and Reviewer 1's Question 10 in the authors' recommendations.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      The authors have strengthened their conclusions by providing additional information about the specificity of their antibodies, but at the same time the authors have revealed concerning information about the source of their antibodies.

      It appears that many of the antibodies used in this study have been discontinued because the supplier company was involved in a scandal of animal cruelty and all their goats and rabbits Ab products were sacrificed. The authors acknowledge that this is unfortunate but they also claim that the issue is out of their hands.

      The authors' statement is false; the authors ought to not use these antibodies, just as the providing company chose to discontinue them, as those antibodies are tied to animal cruelty. The issue that the authors feel OK with using them is of concern. In short, please remove any results from unethical antibodies.

      Removal of such results also best serves science. That is, any of their results using the discontinued antibodies means that the authors' results are non-reproducible and we should be striving to publish good, reproducible science.

      For the antibodies that do not have unethical origins the authors claim that their antibodies have been appropriately validated, by "testing in positive control tissue and/or Western blot or in situ hybridization". This is good but needs to be expanded upon. It is a strong selling point that the Abs are validated and I want to see additional information in their Supplementary Table 2 stating for each Ab specifically:

      (1) What +ve control tissue was used in the validation of each Ab and which species that +ve control came from. Likewise, if competition assays to confirm validity was used, please also specify.

      (2) Which assay was the Ab validated for (WB, IHC, ELISA, all etc)

      (3) For Antibodies that were validated for, or using WBs please let the reader know if there were additional bands showing.

      (4) Include references to the literature that supports these validations. That is, please make it easy for the reader to appreciate the hard work that went into the validation of the Antibodies.

      Finally, for the Abs, when the authors write that "All antibodies used have been validated by testing in positive control tissue and/or Western blot or in situ hybridization" I fail to understand what in situ hybridisation means in this context. I am under the impression that in situ hybridisation is some nucleic acid -hybridising-to-organ or tissue slice. Not polypeptide binding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Remove results that have been obtained by unethically-sourced antibody reagents.

      Strengthen the readers' confidence about the appropriateness & validity of your antibodies.

      First, we want to stress that reviewer 1 has raised his critique related to the use of antibodies from Santa Cruz Biotechnology not only through the journal. The head of our department and two others were contacted by reviewer 1 directly without going through the journal or informing/approaching the corresponding or first author. It is our opinion that this debate and critique should be handled through the journal and editorial office and not with people without actual involvement in the project.

      It is correct that we have purchased mouse, rabbit, and goat antibodies from Santa Cruz Biotechnology, as stated in the correspondence with the reviewer.

      As stated in our previous rebuttal – the goat antibodies from Santa Cruz were discontinued due to inadequate treatment of goats after settling with the authorities in 2016.

      https://www.nature.com/articles/nature.2016.19411

      https://www.science.org/content/blog-post/trouble-santa-cruz-biotechnology

      We have used 11 mouse, rabbit or goat antibodies from Santa Cruz Biotechnology in the manuscript, as listed in supplementary table 2. All of them have been carefully validated in other control tissues, supported by ISH and/or WB, and many of them have already been used in several publications by our group (https://pubmed.ncbi.nlm.nih.gov/34612843/, https://pubmed.ncbi.nlm.nih.gov/33893301/, https://pubmed.ncbi.nlm.nih.gov/32931047/, https://pubmed.ncbi.nlm.nih.gov/32729975/, https://pubmed.ncbi.nlm.nih.gov/30965119/, https://pubmed.ncbi.nlm.nih.gov/29029242/, https://pubmed.ncbi.nlm.nih.gov/23850520/, https://pubmed.ncbi.nlm.nih.gov/23097629/, https://pubmed.ncbi.nlm.nih.gov/22404291/, https://pubmed.ncbi.nlm.nih.gov/20362668/, https://pubmed.ncbi.nlm.nih.gov/20172873/) and by other research groups. All antibodies used in this manuscript were purchased before the world became aware of the mistreatment of goats, which became evident several years later.

      We do not support animal cruelty in any way, but the antibodies from Santa Cruz Biotechnology were purchased long before the mistreatment was reported. Moreover, antibodies from Santa Cruz Biotechnology are used in thousands of publications annually. The company has been punished for its misconduct and has subsequently been granted permission by the relevant authorities to produce antibodies again.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Despite the study being a collation of important results likely to have an overall positive effect on the field, methodological weaknesses and suboptimal use of statistics make it difficult to give confidence to the study's message.

      Strengths:

      Relevant human and mouse models approached with in vivo and in vitro techniques.

      Weaknesses:

      The methodology, statistics, reagents, analyses, and manuscripts' language all lack rigour.

      (1) The authors used statistics to generate P-values and Rsquare values to evaluate the strength of their findings.

      However, it is unclear how stats were used and/or whether stats were used correctly. For instance, the authors write: "Gaussian distribution of all numerical variables was evaluated by QQ plots". But why? For statistical tests that fall under the umbrella of General Linear Models (like ANOVA, t-tests, and correlations (Pearson's)), there are several assumptions that ought to be checked, including typically:

      (a) Gaussian distribution of residuals.

      (b) Homoskedasticity of the residuals.

      (c) Independence of Y, but that's assumed to be valid due to experimental design.

      So what is the point of evaluating the Gaussian distribution of the data themselves? It is not necessary. In this reviewer's opinion, it is irrelevant, not a good use of statistics, and we ought to be leading by example here.

      Additionally, it is not clear whether the homoscedasticity of the residuals was checked. Many of the data appear to have particularly heteroskedastic residuals. In many respects, homoscedasticity matters more than the normal distribution of the residuals. In GraphPad analyses, if ANOVA is used and equal variances are assumed when the variances among groups are in fact unequal, then the standard deviations assigned to each group will be wrong and thus incorrect P values are calculated.

      Based on the incomplete and/or wrong statistical analyses it is difficult to evaluate the study in greater depth.

      We agree with the reviewer that we should lead by example and improve clarity on the use of the different statistical tests and their application. In response to the reviewer’s suggestion, we have extended the statistical section, focusing on the analyses used, and have specified the statistical test used in the legend of each figure. We did check for Gaussian distribution and homoskedasticity of residuals before conducting a general linear model test, and this has now been specified in the revised manuscript. Where the assumptions were not met, we have specified which non-parametric test was used.
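      The distinction discussed above, between checking the raw data and checking the residuals, can be sketched briefly. This is a minimal illustration under stated assumptions: the group names and values are hypothetical, the helper names `group_residuals` and `variance_ratio` are not from the manuscript, and the variance ratio is only a rough screen that would normally be followed by a formal test (for example Levene's or Brown-Forsythe).

```python
from statistics import mean, variance


def group_residuals(groups):
    """Residuals for a one-way design: each observation minus its own group mean."""
    return {name: [y - mean(ys) for y in ys] for name, ys in groups.items()}


def variance_ratio(groups):
    """Largest sample variance divided by the smallest (rough homoscedasticity screen)."""
    variances = [variance(ys) for ys in groups.values()]
    return max(variances) / min(variances)


# Hypothetical measurements, not data from the manuscript.
groups = {
    "control": [1.0, 1.2, 0.9, 1.1],
    "treated": [2.0, 3.5, 1.0, 4.5],
}

residuals = group_residuals(groups)
ratio = variance_ratio(groups)
print(residuals)
print(ratio)
```

      In this sketch the residuals, not the pooled raw values, would be the input to a QQ plot, and a large variance ratio would point towards a test that does not assume equal variances.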

      While on the subject of stats, it is worth mentioning this misuse of statistics in Figure 3D, where the authors added the Slc34a1 transcript levels from controls in the correlation analyses, thereby driving the intercept down. Without the Control data there does not appear to be a correlation between the Slc34a1 levels and tumor size.

      We agree with the reviewer that a correlation analysis is inappropriate here and have removed this part of the figure.
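      The reviewer's concern, that a separate cluster of control samples can produce a correlation that is absent within the tumor samples alone, can be illustrated with a small sketch. All numbers below are hypothetical and chosen only for illustration; they are not the Slc34a1 or tumor-size data from the manuscript, and `pearson_r` is an illustrative helper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical tumor-bearing samples: transcript level (x) vs tumor size (y).
tumor_x = [5.0, 6.0, 7.0, 8.0]
tumor_y = [3.0, 1.0, 4.0, 2.0]

# Hypothetical controls: distinct transcript levels and no tumor.
control_x = [20.0, 21.0, 22.0]
control_y = [0.0, 0.0, 0.0]

r_tumor_only = pearson_r(tumor_x, tumor_y)
r_with_controls = pearson_r(tumor_x + control_x, tumor_y + control_y)
print(r_tumor_only, r_with_controls)
```

      With these made-up values the tumor samples alone show no linear association, whereas pooling them with the controls yields a strong apparent correlation that reflects the group difference rather than a within-group trend.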

      There is more. The authors make statements (e.g. in the figure legends) such as: "Correlations indicated by R2." What does that mean? In a simple correlation, the P value is used to evaluate the strength of the slope being different from zero. The authors also give R2 values for the correlations but they do not provide R2 values for the other stats (like ANOVAs). Why not?

      We agree with the reviewer and have replaced the R2 values with the Pearson correlation coefficient in combination with the P value.

      (2) The authors used antibodies for immunos and WBs. I checked those antibodies online and it was concerning:

      (a) Many are discontinued.

      Many of the antibodies we have used were from the major antibody provider Santa Cruz Biotechnology (SCBT). SCBT was involved in a scandal of animal cruelty and all their goats and rabbits were sacrificed, which explains why several antibodies were discontinued, while the mouse antibodies were allowed to continue. This is unfortunate but out of our hands.

      (b) Many are not validated.

      We agree with the reviewer that antibody validation is essential. All antibodies used in this manuscript have been validated. The minimal validation has been to evaluate cellular expression in positive control tissue for instance bone, kidney, or mamma. Moreover, many of the antibodies have been used and validated in previous publications (doi: 10.1593/neo.121164, doi:10.1096/fj.202000061RR, doi: 10.1093/cvr/cvv187) including knockout models. Moreover, many antibodies but not all have been validated by western blot or in situ hybridization. We have included the following in the Materials and Methods section: “All antibodies used have been validated by testing in positive control tissue and/or Western blot or in situ hybridization”.

      (c) Many performed poorly in the Immunos, e.g. FGF23, FGFR1, and Klotho are not really convincing. PO5F1 (gene: OCT4) is the one that looks convincing as it is expressed in the correct cell types.

      We fail to understand the criticism raised by the reviewer regarding the specificity of these specific antibodies. We believe the FGF23 and Klotho antibodies are performing exceptionally well, and FGFR1 is abundantly expressed in many cell types in the testis. As illustrated in Figure 2E, the expression of Klotho, FGF23, and FGFR1 is very clear, specific, and convincing. FGF23 is not expressed in normal testis – which is in accordance with no RNA present there either. However, it is abundantly expressed in GCNIS where RNA is present. On the other hand, Klotho is abundantly expressed in germ cells from normal testis but not expressed in GCNIS.

      (d) Others like NPT2A (product of gene SLC34A1) are equally unconvincing. Shouldn't the immuno show them to be in the plasma membrane?

      If there is some brown staining, this does not mean the antibodies are working. If your antibodies are not validated then you ought to omit the immunos from the manuscript.

      We acknowledge your concerns regarding the NPT2A, NPT2B, and NPT2C staining. While the NPT2A antibody is performing well, we understand your reservations about the other antibodies. It's worth noting that NPT2A is not expressed in normal testis (no RNA either) but is expressed in GCNIS where the RNA is also present. Although it is typically present in the plasma membrane, cytoplasmic expression can be acceptable as membrane availability is crucial for regulating NPT2A function, particularly in the kidney where FGF23 controls membrane availability. We are currently involved in a comprehensive study exploring these phosphate transporters in the organs lining the male reproductive tract. In functional animal models, we have observed very specific staining with this NPT2A antibody following exposure to high phosphate or FGF23. Additionally, we are conducting Western Blot analyses with this antibody, which reinforces our belief that the antibody has a specific binding.

      Reviewer #2 (Public Review):

      Summary:

      This study set out to examine microlithiasis associated with an increased risk of testicular germ cell tumors (TGCT). This reviewer considers this to be an excellent study. It raises questions regarding exactly how aberrant Sertoli cell function could induce osteogenic-like differentiation of germ cells but then all research should raise more questions than it answers.

      Strengths:

      Data showing the link between a disruption in testicular mineral (phosphate) homeostasis, FGF23 expression, and Sertoli cell dysfunction are compelling.

      Weaknesses:

      Not sure I see any weaknesses here, as this study advances this area of inquiry and ends with a hypothesis for future testing.

      We thank the reviewer for the acknowledgment and for highlighting that this is an important message that addresses several ways to develop testicular microlithiasis, which indicates that it is not only due to malignant disease but also frequent in benign conditions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I applaud the authors' approach to nomenclature for rodent and human genes and proteins (italicised for genes, all caps for humans, capitalised only for rodents, etc), but the authors frequently got it wrong when referring to genes or proteins. A couple of examples include:

      (1) SLC34A1 (italics) refers to the gene (correct use by the authors) but then again the authors use e.g. SLC34A1 (not italics) to refer to the protein product of the SLC34A1 (italics) gene. In fact, the protein product of the SLC34A1 (italics) gene is called NPT2A (non-italics).

      (2) OCT4 (italics) refers to the gene (correct use by the authors) but then again the authors use e.g. OCT4 (not italics) to refer to the protein product of the OCT4 (italics) gene. In fact, the protein product of the OCT4 (italics) gene is called PO5F1 (non-italics).

      The problem with their incorrect and inconsistent nomenclature is widespread in the manuscript making further evaluation difficult.

      Please consult a reliable protein-based database like Uniprot to derive the correct protein names for the genes. You got NANOG correct though.

      We thank the reviewer for addressing this important point. We have corrected the nomenclature throughout the manuscript as suggested.

      (3) The authors use the word "may" too many times. Also often in conjunction with words like "indicates", and "suggests". Examples of phrases that reflect that the authors lack confidence in their own results, conclusions, and understanding of the literature are:

      "...which could indicate that the bone-specific RUNX2 isoform may also be expressed... "

      "...which indicates that the mature bone may have been..."

      Are we shielding ourselves from being wrong in the future because "may" also means "may not"? It is far more engaging to read statements that have a bit more tooth to them, and some assertion too. How about turning the above statements around, to :

      "...which shows that the bone-specific RUNX2 isoform is also expressed... "

      "...which reveals that the mature bone was..."

      ...then revisit ambiguous language ("may", "might", "possibly", "could", "indicate" etc.) throughout the manuscript?

      It's OK to make a statement and be found wrong in the future. Being wrong is integral to Science.

      Thank you for addressing this. We agree with the reviewer that it is fair to be more direct and have revised many of these vague phrases throughout the manuscript.

      (4) The authors use the word "transporter" which in itself is confusing. For instance, is SLC34A1 an importer or an exporter of phosphate? Or both? Do SLC34As move phosphate in or out of the cells or cellular compartments? "Transporter" sounds too vague a word.

      We understand that it might be easier for the reader with the term "importer". However, we should use the specific nomenclature or "wording" that applies to these transporters. The exact terminology is a co-transporter or sodium-dependent phosphate cotransporter as reported here (doi: 10.1152/physrev.00008.2019). Thus, we will use the terms “co-transporter” and “transporter” throughout the revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We would like to remind the editors and reviewers that the present project is a pilot study that does not claim to produce definitive results. Pilot studies are exploratory preliminary studies to test the validity of hypotheses, the feasibility of a study as well as the research methods and the study design. From our point of view, our hypotheses and the feasibility of the pilot study have been confirmed to such an extent that the implementation of a larger study is justified. At the same time, it became clear during the pilot that the methods and design need to be adapted in some areas in order to increase the reliability of the results - a finding that pilot studies are usually conducted to obtain. We discussed these limitations in detail in order to explain the planned changes in the follow-up study. What the reviewers and editors interpret as incompleteness is therefore due to the nature of a pilot study.  We consider it necessary that appropriate standards are taken into account in the evaluation of the present work.

      In addition, we would like to make a counterstatement as to what our main claims, which should be used to assess the strength of evidence, are - and what they are not:

      In the introduction, we describe the background that led to the formation of our hypotheses: Previous animal and human studies show that food, along with light, serves as the main Zeitgeber for circadian clocks. It has also been shown that chrononutrition can lead to weight loss and improved well-being. Based on this, we hypothesized that individualized meal timing can enhance these positive effects. This hypothesis has been validated on the basis of the available results. Contrary to what the editors and reviewers stated, the assumption that the observed beneficial effects are indeed related to an alteration or resetting of endogenous circadian rhythms was not intended to be investigated in this study and is not one of our main claims. This has already been sufficiently demonstrated and, in our view, need not and should not be repeated in every study on chrononutrition. Accordingly, this assumption was not formulated as a working hypothesis or main claim. It is described in the paper as a potential mechanism, the assumption of which is justified on the basis of previous studies. The lack of a corresponding examination and the erroneous insinuation that corresponding results were nevertheless listed by us in the paper as a main claim should therefore not be used as a criterion for downgrading the assessment of the strength of evidence.

      The main criticism of our study is the collection of data using self-reported food and food quantities. This form of data collection is indeed prone to error, as there is little control over the accuracy of the reported data. However, we believe that this problem is limited in scope.

      (1) Contrary to what the editors and reviewers claim, at no point do we write that we are convinced that food intake has not changed. On the contrary, in Figure 2 we explicitly show that there was a change in what some participants reported to us regarding their food intake. We make it clear throughout the text that we could not find any correlation between weight change and the changes in the reports of food quantities/meals. These statements are correct and only what are actual and formulated main claims should be included in the evaluation of the study.

      (2) As previously stated, we conducted analyses that suggest that an unreported reduction in food intake is unlikely to be the cause of weight loss. For the most part, participants did not change their reporting behavior during the exploration and intervention phases. That is, participants who underreported food intake reported similar amounts in both phases of the study, but lost weight only in the intervention phase. To explain their weight loss with imprecise reporting, it would have to be assumed that these participants began to eat less in the intervention phase and at the same time report more in order to achieve similar calorie counts and food composition in the evaluation. We consider such behavior to be very unlikely, especially since it would apply to numerous participants.

      (3) The editors and reviewers reduce the results to the absence of a correlation between weight loss and reported food quantity and composition. In their assessment of the significance of the findings, however, they ignore the fact that we did find a significant correlation in our analyses, namely between weight loss and an increase in the regularity of food intake. There is no correlation between an increase in regularity and a reduction in reported calories (R<sup>2</sup> = 0.01472). This is credible in our view, as it is unlikely that the more regularly participants ate, the more pronounced the error in their reports was (while in reality they ate less than before).

      (4) We also had the requirement for the study design that the participants could carry out the intervention in their normal everyday life and environment in order to test and ensure implementation in real life. We consider it unrealistic to be able to monitor food intake continuously and without interruption over a period of several weeks under these conditions. We therefore see no alternative to self-reporting. As the reviewers and editors did not suggest any alternative methods of data collection that would fulfil the requirements of our study, we assume that, despite criticism and reservations, they generally agree with our assessment and take this into account in their evaluation.

      It is still criticized that some confounding factors are present. The reviewer makes no reference to the fact that we either eliminated these in the last version submitted (age range), identified them as unproblematic (unmatched cohorts, menstrual cycle, shift work) or even deliberately used them in order to be able to test our hypothesis more validly (inclusion of individuals with normal weight, overweight, and obesity).

      Besides, the use of actimeters to determine circadian rhythms as proposed by the editors and reviewers is not valid for this study and the requirement to use them to determine a circadian reset in the eLife assessment is misleading and inappropriate. This instrument only measures physical activity, but not the physiological parameters that are relevant for an investigation in this field of research.

      For the assessment of chronotype alone, the MCTQ questionnaire is a valid instrument that has been validated several times against actimetry (e.g., DOIs: 10.1080/07420528.2022.2025821, 10.1080/07420528.2023.2202246, 10.1016/j.ijpsycho.2016.07.433, 10.1155/2018/5646848). The reviewer's statement that the MCTQ questionnaire is unreliable for determining chronotype is unsupported and incorrect.

      Equally unproven is the statement that any form of imposed diet appears to lead to weight loss over a period of several months.

      Nevertheless, in order to prevent further misunderstandings, we have revised our text in a number of places and clarified that our statements are not irrefutable assertions, but potential interpretations of the results obtained in the pilot study, which are to be analyzed in more detail with regard to the planned more comprehensive study.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The authors found that IL-1b signaling is pivotal for hypoxemia development and can modulate NET formation in the LPS+HVV ALI model.

      Strengths: 

      They used IL1R1 KO mice and proved that IL1R1 is involved in the ALI model, showing that IL1b signalling leads towards ARDS. In addition, hypothermia reduces this effect, suggesting a therapeutic option.

      We thank the Reviewer for recognizing the strengths of our study and their positive feedback.

      Weaknesses: 

      (1) IL1R1 binds IL1a and IL1b. What would be the role of IL1a in this scenario? 

      Thank you for asking this question. We addressed this in our previous paper (Nosaka et al. Front Immunol 2020;11; 207), where we used anti-IL-1a treatment and IL-1a KO mice in our model and found that neither anti-IL-1a-treated mice nor IL-1a KO mice were protected. Thus, IL-1b, but not IL-1a, plays a role in inducing hypoxemia during LPS+HVV. We will now add this point to the discussion of our revised manuscript.

      (2) The authors depleted neutrophils using anti-Ly6G. What about MDSCs? Are these latter cells involved in ARDS and VILI?

      Anti-Ly6G neutrophil depletion may potentially affect G-MDSCs as well (Blood Adv 2022 Jul 29;7(1):73–86); however, we have not looked directly at G-MDSCs. If these cells were depleted, we would have expected to see an increase in inflammation, which we did not.

      Instead, anti-Ly6G-treated mice were protected. Thus, we cannot comment on any presumed role of G-MDSCs in the LPS+HVV-induced severe ALI model that we used.

      (3) The authors found that TH inhibited IL-1β release from macrophages, which led to less NET formation and albumin leakage in the alveolar space in their lung injury model. A graphical abstract could be included suggesting a cellular mechanism.

      Thanks for summarizing our findings and for the suggestion. Unfortunately, eLife does not publish graphical abstracts. We have tried to describe this mechanism in the discussion.

      (4) If macrophages are responsible for IL1b release that via IL1R1 induces NETosis, what happens if you deplete macrophages? What is the role of epithelial cells?

      Previous studies have found that macrophage depletion is protective in several models of ALI (Eyal. Intensive Care Med. 2007;33:1212–1218., Lindauer.  J Immunol. 2009;183:1419–1426.), and other researchers have found that airway epithelial cells did not contribute to IL-1β secretion (Tang. PLoS ONE. 2012;7:e37689.). We have previously reported that epithelial cells produce IL-18 without an LPS priming signal during LPS+HVV (Nosaka et al. Front Immunol 2020;11; 207). Because Saline+HVV-treated mice do not develop hypoxemia (Nosaka et al. Front Immunol 2020;11; 207), IL-18 alone is not sufficient to induce hypoxemia. We will now add this point to the revised discussion of the manuscript.

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript by Nosaka et al is a comprehensive study exploring the involvement of IL1beta signaling in a 2-hit model of lung injury + ventilation, with a focus on modulation by hypothermia. 

      Strengths: 

      The authors demonstrate quite convincingly that interleukin 1 beta plays a role in the development of ventilator-induced lung injury in this model, and that this role includes the regulation of neutrophil extracellular trap formation. The authors use a variety of in vivo animal-based and in vitro cell culture work, and interventions including global gene knockout, cell-targeted knockout and pharmacological inhibition, which greatly strengthen the ability to make clear biological interpretations. 

      We thank the Reviewer for their positive feedback 

      Weaknesses: 

      A primary point for open discussion is the translatability of the findings to patients. The main model used, one of intratracheal LPS plus mechanical ventilation is well accepted for research exploring the pathogenesis and potential treatments for acute respiratory distress syndrome (ARDS). However, the interpretation may still be open to question - in the model here, animals were exposed to LPS to induce inflammation for only 2 hours, and seemingly displayed no signs of sickness, before the start of ventilation. This would not be typical for the majority of ARDS patients, and whether hypothermia could be effective once substantial injury is already present remains an open question. The interaction between LPS/infection and temperature is also complicated - in humans, LPS (or infection) induces a febrile, hyperthermic response, whereas in mice LPS induces hypothermia (eg. Ganeshan K, Chawla A. Nat Rev Endocrinol. 2017;13:458-465). Given this difference in physiological response, it is therefore unclear whether hypothermia in mice and hypothermia in humans are easily comparable. Finally, the use of only young, male animals such as in the current study has been typical but may be criticised as limiting translatability to people. 

      Therefore while the conclusions of the paper are well supported by the data, and the biological pathways have been impressively explored, questions still remain regarding the ultimate interpretations.  

      We agree with the reviewer that at two hours post LPS there is only minimal pulmonary inflammation (Dagvadorj et al Immunity 42, 640–653). This is a limitation of the experimental model we used in our study. Additionally, as the reviewer pointed out, LPS induces hyperthermia in humans, but it is also well established that physiological hypothermia occurs in humans with severe infections and sepsis (Baisse. Am J Emerg Med. 2023 Sep: 71: 134-138., Werner.  Am J Emerg Med. 2025 Feb;88:64-78.). Therefore, the difference between human and mouse responses to sepsis or infections may be more nuanced. Furthermore, it is important to distinguish between physiological hypothermia (just <36°C) and therapeutic hypothermia (typically 32-34°C). We will add to the discussion whether hypothermia serves as a protective response and whether the transition from normothermia to hyperthermia could have detrimental effects. We only used young male mice in our study, as the Reviewer points out; we will also add this point to the revised discussion as a limitation of our study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      DiPeso et al. develop two tools to (i) classify micronucleated (MN) cells, which they call VCS MN, and (ii) segment micronuclei and nuclei with MNFinder. They then use these tools to identify transcriptional changes in MN cells.

      The strengths of this study are:

      (1) Developing highly specialized tools to speed up the analysis of specific cellular phenomena such as MN formation and rupture is likely valuable to the community and neglected by developers of more generalist methods.

      (2) A lot of work and ideas have gone into this manuscript. It is clearly a valuable contribution.

      (3) Combining automated analysis, single-cell labeling, and cell sorting is an exciting approach to enrich phenotypes of interest, which the authors demonstrate here.

      Weaknesses:

      (1) Images and ground truth labels are not shared for others to develop potentially better analysis methods.

      We regret this omission and thank the reviewer for pointing it out. Both the images and ground truth labels for VCS MN and MNFinder are now available on the lab’s github page and described in the README.txt files. VCS MN: https://github.com/hatch-lab/fast-mn. MNFinder: https://github.com/hatch-lab/mnfinder.

      (2) Evaluations of the methods are often not fully explained in the text.

      The text has been extensively updated to include a full description of the methods and choices made to develop the VCS MN and MNFinder image segmentation modules.

      (3) To my mind, the various metrics used to evaluate VCS MN reveal it not to be terribly reliable. Recall and PPV hover in the 70-80% range except for the PPV for MN+. It is what it is - but do the authors think one has to spend time manually correcting the output or do they suggest one uses it as is?

      VCS MN attempts to balance precision and recall with speed to reduce the fraction of MN changing state from intact to ruptured during a single cell cycle during a live-cell isolation experiment. In addition, we chose to prioritize inclusion of small MN adjacent to the nucleus in our positive calls. This meant that there were more false positives (lower PPV) than obtained by other methods but allowed us to include this highly biologically relevant class of MN in our MN+ population. Thus, for a comprehensive understanding of the consequences of MN formation and rupture, we recommend using the finder as is. However, for other visual cell sorting applications where a small number of highly pure MN positive and negative cells is preferred, such as clonal outgrowth or metastasis assays, we would recommend using the slower, but more precise, MNFinder to get a higher precision at a cost of temporal resolution. In addition, MNFinder, with its higher flexibility and object coverage, is recommended for all fixed cell analyses.
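      For readers less familiar with the two metrics discussed above, the following minimal sketch shows how recall and positive predictive value (PPV, also called precision) are computed from classification counts. The counts are hypothetical and are not taken from the manuscript.

```python
def recall(true_pos, false_neg):
    """Fraction of truly MN+ cells that the classifier calls MN+."""
    total = true_pos + false_neg
    return true_pos / total if total else float("nan")

def ppv(true_pos, false_pos):
    """Fraction of MN+ calls that are truly MN+ (precision)."""
    total = true_pos + false_pos
    return true_pos / total if total else float("nan")

# Hypothetical counts (assumption) for one field of cells.
tp, fp, fn = 75, 25, 20

example_recall = recall(tp, fn)
example_ppv = ppv(tp, fp)

print(f"recall = {example_recall:.2f}, PPV = {example_ppv:.2f}")
```

      Admitting more borderline calls, such as small MN adjacent to the nucleus, tends to raise the false-positive count and therefore lower PPV, which is the trade-off described in the response.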

      Reviewer #2 (Public review):

      Summary:

      Micronuclei are aberrant nuclear structures frequently seen following the missegregation of chromosomes. The authors present two image analysis methods, one robust and another rapid, to identify micronuclei (MN) bearing cells. The authors induce chromosome missegregation using an MPS1 inhibitor to check their software outcomes. In missegregation-induced cells, the authors do not distinguish cells that have MN from those that have MN with additional segregation defects. The authors use RNAseq to assess the outcomes of their MN-identifying methods: they do not observe a transcriptomic signature specific to MN but find changes that correlate with aneuploidy status. Overall, this work offers new tools to identify MN-presenting cells, and it sets the stage with clear benchmarks for further software development.

      Strengths:

      Currently, there are no robust MN classifiers with a clear quantification of their efficiency across cell lines (mIoU score). The software presented here tries to address this gap. GitHub material (tools, protocols, etc) provided is a great asset to naive and experienced computational biologists. The method has been tested in more than one cell line. This method can help integrate cell biology and 'omics' studies.

      Weaknesses:

      Although the classifier outperforms available tools for MN segmentation by providing mIoU, it's not yet at a point where it can be reliably applied to functional genomics assays where we expect a range of phenotypic penetrance.

      We agree that the MNFinder module has limitations with regard to the degree of nuclear atypia and cell density that can be tolerated. Based on the recall and PPV values and their consistency across the majority of conditions analyzed, we believe that MNFinder can provide reliable results for MN frequency, integrity, shape, and label characteristics in a functional genomics assay in many commonly used adherent cell lines. We also added a discussion of caveats for these analyses, including the facts that highly lobulated nuclei will have higher false positive rates and that high cell confluency may require additional markers to ensure highly accurate assignment of MN to nuclei.
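      The mIoU score mentioned by the reviewer can be sketched as follows, assuming it is the mean of per-object intersection-over-union values between predicted and ground-truth masks. The exact averaging used for MNFinder is not stated here, so this is an illustration of the general metric only; masks are represented as sets of hypothetical pixel coordinates.

```python
def iou(pred_pixels, truth_pixels):
    """Intersection over union of two masks given as sets of (row, col)."""
    union = pred_pixels | truth_pixels
    if not union:
        return 1.0  # two empty masks agree perfectly
    return len(pred_pixels & truth_pixels) / len(union)

def mean_iou(mask_pairs):
    """Mean IoU over (predicted, ground-truth) mask pairs."""
    scores = [iou(pred, truth) for pred, truth in mask_pairs]
    return sum(scores) / len(scores)

# Hypothetical masks (assumption): one shifted prediction, one exact match.
truth_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
pred_a = {(0, 1), (1, 1), (0, 2), (1, 2)}   # overlaps truth_a in 2 of 6 pixels
truth_b = {(5, 5)}
pred_b = {(5, 5)}

score = mean_iou([(pred_a, truth_a), (pred_b, truth_b)])
print(f"mIoU = {score:.3f}")
```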

      Spindle checkpoint loss (e.g., MPS1 inhibition) is expected to cause a variety of nuclear atypia: misshapen, multinucleated, and micronucleated cells. It may be difficult to obtain a pure MN population following MPS1 inhibitor treatment, as many cells are likely to present MN among multinucleated or misshapen nuclear compartments. Given this situation, the transcriptomic impact of MN is unlikely to be retrieved using this experimental design, but this does not negate the significance of the work. The discussion will have to consider the nature, origin, and proportion of MN/rupture-only states - for example, lagging chromatids and unaligned chromosomes can result in different states of micronuclei and also distinct cell fates.

      We appreciate the reviewer’s comments and now quantify the frequency of other nuclear atypias and MN chromosome content in RPE1 cells after 24 h Mps1 inhibition (Fig. S1). In summary, we find only small increases in nuclear atypia, including multinucleate cells, misshapen nuclei, and chromatin bridges, compared to the large increase in MN formation. This contrasts with what is observed when mitosis is delayed using nocodazole or CENPE inhibitors where nuclear atypia is much more frequent. Importantly, after Mps1 inhibition, RPE1 cells with MN were only slightly more likely to have a misshapen nucleus compared to cells without MN (Fig. S1C).

      Interestingly, this analysis showed that the VCS MN pipeline, which uses the Deep Retina segmenter to identify nuclei, has a strong bias against lobulated nuclei and frequently fails to find them (Fig. S2B). Therefore, the cell populations analyzed by RNAseq were largely depleted of highly misshapen nuclei and differences in nuclear atypia frequency between MN+ and MN- cells in the starting population were lost (Fig. S9A, compare to Fig. S1C). This strongly suggests that the transcript changes we observed reflect differences in MN frequency and aneuploidy rather than differences in nuclear morphology.

      We agree with the reviewer that MN rupture frequency and formation, and downstream effects on cell proliferation and DNA damage, are sensitive to the source of the missegregated chromatin. In the revised manuscript we make clear that we chose Mps1 inhibition because it is strongly biased towards whole chromosome MN (Fig. S1E), limiting signal from DNA damage products, including chromosome fragments and chromatin bridges. This provides a baseline to disambiguate the consequences of micronucleation and DNA damage in more complex chromosome missegregation processes, such as DNA replication disruption and irradiation.

      Reviewer #3 (Public review):

      Summary:

      The authors develop a method to visually analyze micronuclei using automated methods. The authors then use these methods to isolate MN post-photoactivation and analyze transcriptional changes in cells with and without micronuclei of RPE-1 cells. The authors observe in RPE-1 cells that MN-containing cells show similar transcriptomic changes as aneuploidy, and that MN rupture does not lead to vast changes in the transcriptome.

      Strengths:

      The authors develop a method that allows for automating measurements and analysis of micronuclei. This has been something that the field has been missing for a long time. Using such a method has the potential to advance micronuclei biology. The authors also develop a method to identify cells with micronuclei in real time and mark them using photoconversion and then isolate them via FACS. The authors use this method to study the transcriptome. This method is very powerful as it allows for the sorting of a heterogenous population and subsequent analysis with a much higher sample number than could be previously done.

      Weaknesses:

      The major weakness of this paper is that the results from the RNA-seq analysis are difficult to interpret as very few changes are found to begin with between cells with MN and cells without. The authors have to use a 1.5-fold cut-off to detect any changes in general. This is most likely due to the sequencing read depth used by the authors. Moreover, there are large variances between replicates in experiments looking at cells with ruptured versus intact micronuclei. This limits our ability to assess if the lack of changes is due to truly not having changes between these populations or experimental limitations. Moreover, the authors use RPE-1 cells which lack cGAS, which may contribute to the lack of changes observed. Thus, it is possible that these results are not consistent with what would occur in primary tissues or just in general in cells with a proficient cGAS/STING pathway.

      We agree with the reviewer’s assessment of the limitations of our RNA-Seq analysis. After additional analysis, we propose an alternative explanation for the lower expression changes we observe in the MN+ and Mps1 inhibitor RNA-Seq experiments. In summary, we find that VCS MN has a strong bias against highly lobulated nuclei that depletes this class of cells from both the bulk analysis and the micronucleated cell populations (Fig. S9A). Based on this result, we propose that our analysis reduces the contribution of nuclear atypia to these transcriptional changes and that nuclear morphology changes are likely a signaling trigger associated with aneuploidy.

      We believe that this finding strengthens our overall conclusion that MN formation and rupture do not cause transcriptional changes, as suppressing the signaling associated with nuclear atypia should increase sensitivity to changes from the MN. However, we cannot completely rule out that MN formation or rupture cause a broad low-level change in transcription that is obscured by other signals in the dataset.

      As to cGAS signaling, several follow-up papers and even the initial studies from the Greenberg lab show that MN rupture does not activate cGAS and does not cause cGAS/STING-dependent signaling in the first cell cycle (see citations and discussion in text). Therefore, we expect the absence of cGAS in RPE1 cells will have no effect in the first cell cycle, but could alter the transcriptional profile after mitosis. Although analysis of RPE1 cGAS+ cells or primary cells in these experiments will be required to definitively address this point, we believe that our interpretation of our RNAseq results is sufficiently backed up by the literature to warrant our conclusion that MN formation and rupture do not induce a transcriptional response in the first cell cycle.

      Reviewer #1 (Recommendations for the authors):

      I do not recommend additional experimental or computational work. Instead, I just recommend adapting the claims of the manuscript to what has been done. I am just asking for further clarification and minor rewriting.

      (1) The manuscript is written like a molecular biology paper with sparse explanations of the authors' reasoning, especially in the development of their algorithms. I was often lost as to why they did things in one way or another.

      The revised manuscript has thorough explanations and additional data and graphics defining how and why the VCS MN and MNFinder modules were developed. We hope that this clears up many of the questions the reviewer had and appreciate their guidance on making it more readable for scientists from different backgrounds.

      (2) Evaluations of their method are often not fully explained, for example:

      "On average, 75% of nuclei per field were correctly segmented and cropped."

      "MN segments were then assigned to 'parent' nuclei by proximity, which correctly associated 97% of MN."

      Were there ground truth images and labels created? How many? For example, I don't know how the authors could even establish a ground-truth for associating MNs to nuclei if MNs happened to be almost equidistant between two nuclei in their images.

      I suggest a separate subsection early in the Results section where the underlying imaging data + labels are presented.

      We added new sections to the text and figures at the beginning of the VCS MN and MNFinder subsections (Fig. S2 and Fig. S5) with specific information about how ground truth images and labels were generated for both modules and how these were broken up for training, validation, and testing.

      We also added information and images to explain how ground truth MN/nucleus associations were derived. In summary, we took advantage of the fact that 2xDendra-NLS is present at low levels in the cytoplasm to identify cell boundaries. This, combined with a subconfluent cell population, allowed us to unambiguously group MN and nuclei for an estimated 98% of MN. These identifications were used to generate ground truth labels and analyze how well proximity defines MN/nuclei groups (Figs. S1 and S2).

      (3) Overall, I find the sections long and more subtitles would help me better navigate the manuscript.

      Where possible, we have added subtitles.

      (4) Everything following "To train the model, H2B channel images were passed to a Deep Retina neural net ..." is fully automated, it seems to me. Thus, there seems to be no human intervention to correct the output before it is used to train the neural network. Therefore, I do not understand why a neural network was trained at all if the pipeline for creating ground truth labels worked fully automatically. At least, the explanations are insufficient.

      We apologize for the initial lack of clarity in the text and included additional details in the revision. We used the Deep Retina segmenter to crop the raw images to areas around individual nuclei to accelerate ground truth labeling of MN. A trained user went through each nucleus crop and manually labeled pixels belonging to MN to generate the ground truth dataset for training, validation, and imaging in VCS MN (Fig. S2A).

      (5) To my mind, the various metrics used to evaluate VCS MN reveal it not to be terribly reliable. Recall and PPV hover in the 70-80% range except for the PPV for MN+. It is what it is - but do the authors think one has to spend time manually correcting the output or do they suggest one uses it as is? I understand that for bulk transcriptomics, enrichment may be sufficient but for many other questions, where the wrong cell type could contaminate the population, it is not.

      Remarks in the Results section on what the various accuracies mean for different applications would be good (so one does not need to wait for the Discussion section).

      One of the strengths of the visual cell sorting system is that any image analysis pipeline can be used with it. We used VCS MN for the transcriptomics experiment, but for other applications a user could run visual cell sorting in conjunction with MNFinder for increased purity while maintaining a reasonable recall or use a pre-existing MN segmentation program that gives 100% purity but captures only a specific subgroup of micronucleated cells (e.g. PIQUE). 

      To maintain readability, especially with the expansion of the results sections, we kept the discussion of how we envision using visual cell sorting for other MN-based applications in the discussion section.

      (6) I am confused about what "cell" is referring to in much of the manuscript. Is it the nucleus + MNs only? Is it the whole cell, which one would ordinarily think it is? If so, are there additional widefield images, where one can discern cell boundaries? I found the section "MNFinder accurately ..." very hard to read and digest for this reason and other ambiguous wording. I suggest the authors take a fresh look at their manuscript and see whether the text can be improved for clarity. I did not find it an easy read overall, especially the computational part.

      After re-examining how “cell” was used, we updated the text to limit its use to the MNFinder arm tasked with identifying MN-nucleus associations where the convex hull defined by these objects is used to determine the “cell” boundary. In all other cases we have replaced cell with “nucleus” because, as the reviewer points out, that is what is being analyzed and converted. We hope this is clearer.

      (7) Post-FACS PPVs are not that great (Figure 3c). It depends on the question one wants to answer whether ~70% PPV is good enough. Again, would be good to comment on.

      We added discussion of this result to the revision. In summary, a likely reason for the reduced PPV is that, although we maintain the cells in buffer with a Cdk1 inhibitor, we know that some proportion of the cells go through mitosis post-sorting. Since MN are frequently reincorporated into the nucleus after mitosis (Hatch et al., 2013; Zhang et al., 2015), we expect this to reduce the MN+ population. Thus, we expect that the PPV in the RNAseq population is higher than what we can measure by analyzing post-sorted cells that have been plated and analyzed later.

      (8) I am thoroughly confused as to why the authors claim that their system works in the "absence of genetic perturbations" and why they emphasize the fact that their cells are non-transformed: They still needed a fluorescent label and they induce MNs with a chemical Mps1 inhibitor. (The latter is not a genetic manipulation, of course, but they still need to enrich MNs somehow. That is, their method has not been tested on a cell population in which MNs occur naturally, presumably at a very low rate, unless I missed something.) A more careful description of the benefits of their method would be good.

      We apologize for the confusion on these points and hope this is clarified in the revision. We were comparing our system, which can be made using transient transfection, if desired, to current tools that disambiguate aneuploidy and MN formation by deleting parts of chromosomes or engineering double strand breaks with CRISPR to generate single chromosome-specific missegregation events. Most of these systems require transformed cancer cells to obtain high levels of recombination. In contrast, visual cell sorting can isolate micronucleated cells from any cell line that can exogenously express a protein, including primary cells and non-transformed cells like RPE1s.

      Other minor points:

      (1) The authors should not refer to "H2B channels" but to "H2B-emiRFP703 channels". It may seem obvious to the authors but for someone reading the manuscript for the very first time, it was not. I was not sure whether there were additional imaging modalities used for H2B/nucleus/chromatin detection before I went back and read that only fluorescence images of H2B-emiRFP703 were used. To put it another way, the authors are detecting fluorescence, not histones -- unless I misunderstood something.

      To address this point, we altered the text to read “H2B-emiRFP703” when discussing images of this construct. For MNFinder some images were of cells expressing H2B-GFP, which has also been clarified.

      (2) If the level of zoom on my screen is such that I can comfortably read the text, I cannot see much in the figure panels. The features that I should be able to see are the size of a title. The image panels should be magnified.

      In the revision, the images are appended to the end at full resolution to overcome this difficulty. Thank you for your forbearance.

      Reviewer #2 (Recommendations for the authors):

      The methods are adequately explained. The Results text narrating experiments and data analysis is clear. Interpretation of a few results could be clarified and strengthened as explained below.

      (1) RNAseq experiments are a good proof of principle. To strengthen their interpretation in Figures 4 and 6, I would recommend the authors cite published work on checkpoint/MPS1 loss-induced chromosome missegregation (PMID: 18545697, PMID: 33837239, PMC9559752) and consider in their discussion the 'origin' and 'proportion' of micronucleated cells and irregularly shaped nuclei expected in RPE1 lines. This will help interpret Figure 6 findings on aneuploidy signature accurately. Not being able to see an MN-specific signature could be due to the way the biological specimen is presented with a mixture of cells with 'MN only' or 'rupture' or 'MN along with misshapen nuclei'. These features may all link to aneuploidy rather than 'MN' specifically.

      We appreciate the reviewer’s suggestion and added a new analysis of nuclear atypia after Mps1 inhibition in RPE1 cells to Fig. S1. Overall, we found that Mps1 inhibition significantly, but modestly, increased the proportion of misshapen nuclei and chromatin bridges. Multinucleate cells were so rare that instead of giving them their own category we included them in “misshapen nuclei.” These results are consistent with images of Mps1i-treated RPE1 cells from He et al. 2019 and Santaguida et al. 2017 and distinct from the stronger changes in nuclear morphology observed after delaying mitosis by nocodazole or CENPE inhibition.

      We also found that the Deep Retina segmenter used to identify nuclei in VCS MN had a significant bias against highly lobulated nuclei (Fig. S2B) that led to misshapen nuclei being largely excluded from the RNAseq analyses. As a result we found no enrichment of misshapen nuclei, chromatin bridges, or dead/mitotic nuclear morphologies in MN+ compared to MN- nuclei in our RNASeq experiments (Fig. S9A).

      (2) As the authors clarify in the response letter, one round of ML is unlikely to result in fully robust software; additional rounds of ML with other markers will make the work robust. It will be useful to indicate other ML image analysis tools that have improved through such reiterations. They could use reviews on challenges and opportunities using ML approaches to support their statement. Also in the introduction, I would recommend labelling as 'rapid' instead of 'rapid and precise' method.

      We updated the text to reference review articles that discuss the benefit of additional training for increasing ML accuracy and changed the text to “rapid.”

      (3) The lack of live-cell studies does not allow the authors to distinguish the origin of MN (lagging chromatids or unaligned chromosomes). As explained in 1, considering these aspects in discussion would strengthen their interpretation. Live-cell studies can help reduce the dependencies on proximity maps (Figure S2).

      The revised text includes new references and data (Fig. S1E) demonstrating that Mps1 inhibition strongly biases towards whole chromosome missegregation and that MN are most likely to contain a single centromere positive chromosome rather than chromatin fragments or multiple chromosomes.

      (4) Mean Intersection over Union (mIOU) is a good measure to compare outcomes against ground truth. However, the mIOU is relatively low (Figure 2D) for HeLa-based functional genomics applications. It will help to discuss mIOU for other classifiers (non-MN classifiers) so that they can be used as a benchmark (this is important since the authors state in their response that they are the first to benchmark an MN classifier). There are publications for mitochondria, cell cortex, spindle, nuclei, etc. where IOU has been discussed.

      We added references to classifiers for other small cellular structures. We also evaluated major sources of error in MNFinder and found that false negatives are enriched in very small MN (3 to 9 pixels, or about 0.4 µm<sup>2</sup> – 3 µm<sup>2</sup>, Fig. S6B). A similar result was obtained for VCS MN (Fig. S3B). Because small changes in the number of pixels identified in small objects can have outsized effects on mIoU scores, we suspect that this is exerting downward pressure on the mIoU value. Based on the PPV and recall values we identified, we believe that MNFinder is robust enough to use for functional genomics and screening applications with reasonable sample sizes.
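
      The size dependence of IoU can be illustrated with a toy calculation. The sketch below is only an illustration, not the evaluation code from the MNFinder module; the `iou` and `square` helpers and the pixel sizes are hypothetical. It applies the same one-pixel shift to a 9-pixel object and a 900-pixel object and compares the resulting scores.

```python
def iou(truth, pred):
    """Intersection over union of two sets of (row, col) pixel coordinates."""
    truth, pred = set(truth), set(pred)
    union = truth | pred
    if not union:
        return 1.0  # both masks empty: treat as perfect agreement
    return len(truth & pred) / len(union)


def square(side, row0=0, col0=0):
    """Hypothetical helper: pixel set of a filled square mask."""
    return {(r, c)
            for r in range(row0, row0 + side)
            for c in range(col0, col0 + side)}


# The same 1-pixel shift, applied to a small and a large object.
small_iou = iou(square(3), square(3, col0=1))    # 9-pixel object
large_iou = iou(square(30), square(30, col0=1))  # 900-pixel object

print(f"small object IoU: {small_iou:.3f}")
print(f"large object IoU: {large_iou:.3f}")
```

      In this toy case the small object loses half of its score while the large object loses under a tenth, which is the kind of effect that could pull a mean IoU down when many objects are only a few pixels in area.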

      (5) Figure 5 figure legend title is an overinterpretation. MN and rupture-initiated transcriptional changes could not be isolated with this technique where several other missegregation phenotypes are buried (see point 1 above).

      We decided to keep the figure legend title based on our analysis of known missegregation phenotypes in Fig. S1 and S9 showing that there is no difference in major classes of nuclear atypia between MN+ and MN- populations in this analysis. Although we cannot rule out that other correlated changes exist, we believe that the title represents the most parsimonious interpretation.

      Minor comments

      (1) The sentence in the introduction needs clarification and reference. "However, these interventions cause diverse "off-target" nuclear and cellular changes, including chromatin bridges, aneuploidy, and DNA damage." Off-target may not be the correct description since inhibiting MPS1 is expected to cause a variety of problems based on its role as a master kinase in multiple steps of the chromosome segregation process. Consider one of the references in point 1 for a detailed live-cell view of MPS1 inhibitor outcomes.

      We have changed “off-target” to “additional” for clarity.

      (2) In Figure 3 or S3, did the authors notice any association between the cell cycle phase and MN or rupture presence? Is this possible to consider based on FACS outcomes or nuclear shapes?

      Previous work by our lab and others has shown that MN rupture frequency increases during the cell cycle (Hatch et al., 2013; Joo et al., 2023). Whether this is stochastic or regulated by the cell cycle may depend on what chromosome is in the MN (Mammel et al., 2021) and likely the cell line. Unfortunately, the H2B-emiRFP703 fluorescence in our population is too variable to identify cell cycle stage from FACS or nuclear fluorescence analysis.

      (3) Figure 5 - Please explain "MA plot".

      An MA plot, or log fold-change (M) versus average (A) gene expression, is a way to visualize differentially expressed genes between two conditions in an RNASeq experiment and is used as an alternative to volcano plots. We chose MA plots for our paper because most of the expression changes we observed were small and of similar significance. The MA plot spreads out the data compared to a volcano plot and allowed a better visualization of trends across the population.
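
      For readers unfamiliar with the two axes, the classical definition can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis pipeline used in the paper: the `ma_values` function, the pseudocount, and the toy counts are hypothetical, and plotting tools differ in how they normalize counts and shrink fold changes.

```python
import math


def ma_values(counts_a, counts_b, pseudocount=1.0):
    """Return one (A, M) pair per gene.

    A is the mean of the two log2 expression values (x-axis) and
    M is the log2 fold change of condition b over condition a (y-axis).
    The pseudocount (an assumption here) avoids taking log2 of zero.
    """
    points = []
    for a, b in zip(counts_a, counts_b):
        log_a = math.log2(a + pseudocount)
        log_b = math.log2(b + pseudocount)
        points.append(((log_a + log_b) / 2, log_b - log_a))
    return points


# Toy counts for four genes in a control and a treated sample.
control = [7, 15, 63, 255]
treated = [15, 15, 31, 1023]
points = ma_values(control, treated)

for a_val, m_val in points:
    print(f"A = {a_val:.1f}, M = {m_val:+.1f}")
```

      Genes with no change sit on the horizontal line M = 0 regardless of expression level, so many small shifts remain visible as a spread around that line.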

      (4) Page 7: "our results strongly suggest that protein expression changes in MN+ and rupture+ cells are driven mainly by increased aneuploidy rather than cellular sensing of MN formation and rupture.". This is an overstatement considering the mIOU limits of the software tool and the non-exclusive nature of MN in their samples.

      We agree that we cannot rule out that an unknown masking effect is inhibiting our ability to observe small broad changes in transcription after MN formation or rupture. However, we believe we have minimized the most likely sources of masking effects, including nuclear atypia and large scale aneuploidy differences, and thus our interpretation is the most likely one.

      Reviewer #3 (Recommendations for the authors):

      Overall, the authors need to explain their methods better, define some technical terms used, and more thoroughly explain the parameters and rationale used when implementing these two protocols for identifying micronuclei; primarily as this is geared toward a more general audience that does not necessarily work with machine learning algorithms.

      (1) A clearer description in the methods as to how accuracy was calculated. Were micronuclei counted by hand or another method to assess accuracy?

      We significantly expanded the section on how the machine learning models were trained and tested, including how sensitivity and specificity metrics were calculated, in both the results and the methods sections. The code used to compare ground truth labels to computed masks is also now included in the MNFinder module available on the lab github page. 

      (2) Define positive predictive value.

      The text now says “the positive predictive value (PPV, the proportion of true positives, i.e. specificity) and recall (the proportion of MN found by the classifier, i.e. sensitivity)…”.
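
      As a worked illustration of these two metrics, the sketch below computes them from object-level counts. It is not the comparison code distributed with MNFinder; the function name and the example counts are hypothetical. Note that PPV is more commonly called precision, whereas specificity in the strict sense, TN / (TN + FP), also requires a count of true negatives.

```python
def ppv_and_recall(true_pos, false_pos, false_neg):
    """Return (PPV, recall) from object-level counts.

    PPV    = TP / (TP + FP): fraction of called MN that are real MN.
    Recall = TP / (TP + FN): fraction of real MN that were called.
    Returns NaN for a metric whose denominator is zero.
    """
    called = true_pos + false_pos
    present = true_pos + false_neg
    ppv = true_pos / called if called else float("nan")
    recall = true_pos / present if present else float("nan")
    return ppv, recall


# Hypothetical example: 70 correct calls, 30 spurious calls, 10 missed MN.
ppv, recall = ppv_and_recall(true_pos=70, false_pos=30, false_neg=10)
print(f"PPV = {ppv:.3f}, recall = {recall:.3f}")
```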

      (3) Why is it a problem to use the VCS MN at higher magnifications where undersegmentation occurs? What do the authors mean by diminished performance (what metrics are they using for this?).

      We have included a representative image and calculated mIoU and recall for 40x magnification images analyzed by MNFinder after rescaling in Fig. 2A. In summary, VCS MN only correctly labeled a few pixels in the MN, which was sufficient to call the adjacent nucleus “MN+” but not sufficient for other applications, such as quantifying MN area. In addition, VCS MN did much worse at identifying all the MN in 40x images with a recall, or sensitivity, metric of 0.36. We are not sure why. Developing MNFinder provided a module that was well suited to quantify MN characteristics in fixed cell images, an important use case in MN biology.

      (4) The authors should compare MN that are analyzed and not analyzed using these methods and define parameters. Is there a size limitation? Closeness to the main nucleus?

      We added two new figures defining what contributes to module error for both VCS MN (Fig. S3) and MNFinder (Fig. S6). For VCS MN, false negatives are enriched in very large or very small MN and tend to be dimmer and farther from the nucleus than true positives. False positives are largely misclassification of small dim objects in the image as MN. For MNFinder, the most missed class of MN are very small ones (3-9 px in area) and the majority of false positives are misclassifications of elongated nuclear blebs as MN.

      (5) Are there parameters in how confluent an image must be to correctly define that the micronucleus belongs to the correct cell? The authors discussed that this was calculated based on predicted distance. However, many factors might affect proper calling on MN. And the authors should test this by staining for a cytosolic marker and calculating accuracy.

      We updated the text with more information about how the cytoplasm was defined using leaky 2x-Dendra2-NLS signal to analyze the accuracy of MN/nucleus associations (Fig. S2G-H). In addition, we quantified cell confluency and distance to the first and second nearest neighbor for each MN in our training and testing image datasets. We found that, as anticipated, cells were imaged at subconfluent concentrations with most fields having a confluency around 30% cell coverage (Fig. S2E) and that the distance from an MN to the second-closest nucleus was on average 3.3-fold greater than the distance to the closest nucleus (Fig. S2F). We edited the discussion section to state that the ability of MN/nuclear proximity to predict associations at high cell confluencies would have to be experimentally validated.
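
      The proximity rule and the nearest-neighbor comparison described here can be sketched as follows. This is a simplified illustration and not the authors' implementation: it assumes centroid-to-centroid Euclidean distances (the actual modules may measure distance differently), and the function name and coordinates are hypothetical.

```python
import math


def assign_by_proximity(mn_xy, nuclei_xy):
    """Assign an MN to its nearest nucleus.

    Returns (index of the nearest nucleus, ratio of the second-nearest
    distance to the nearest distance). A ratio near 1 means the MN is
    almost equidistant from two nuclei, so the assignment is ambiguous.
    Requires at least two nuclei in the field.
    """
    if len(nuclei_xy) < 2:
        raise ValueError("need at least two nuclei to compute a ratio")
    ranked = sorted((math.dist(mn_xy, xy), i) for i, xy in enumerate(nuclei_xy))
    (d1, nearest), (d2, _) = ranked[0], ranked[1]
    ratio = d2 / d1 if d1 > 0 else math.inf
    return nearest, ratio


# Hypothetical centroids in pixels: two nuclei and one MN.
nuclei = [(0.0, 0.0), (30.0, 40.0)]
mn = (6.0, 8.0)
nearest, ratio = assign_by_proximity(mn, nuclei)
print(f"assigned to nucleus {nearest}, distance ratio {ratio:.1f}")
```

      A per-MN ratio like this one could be used to flag ambiguous assignments in denser fields, where proximity alone is expected to be less reliable.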

      (6) The authors measure the ratio of Dendra2(Red) v. Dendra2 (Green) in Figure 3B to demonstrate that photoconversion is stable. This measurement, to me, is confusing, as in the end, the authors need to show that they have a robust conversion signal and are able to isolate these data. The authors should directly demonstrate that the Red signal remains by analyzing the percent of the Red signal compared to time point 0 for individual cells.

      We found a bulk analysis to be more powerful than trying to reidentify individual cells due to how much RPE1 cells move during the 4 and 8 hours between image acquisitions. In addition, we sort on the ratio between red and green fluorescence per cell, rather than the absolute fluorescence, to compensate for variation in 2xDendra-NLS protein expression between cells. Therefore, demonstrating that distinct ratios remained present throughout the time course is the most relevant to the downstream analysis.

      To address the reviewer’s concern, we replotted the data in Fig. 3B to highlight changes over time in the raw levels of red and green Dendra fluorescence (Fig. S7D). As expected, we see an overall decrease in red fluorescence intensity, and complementary increase in green fluorescence intensity, over 8 hours, likely due to protein turnover. We also observe an increase in the number of nuclei lacking red fluorescence. This is expected since the well was only partially converted and we expect significant numbers of unconverted cells to move into the field between the first image and the 8 hour image.

      (7) The authors isolate and subsequently use RNA-sequencing to identify changes between Mps1i and DMSO-treated cells. One concern is that even with the less stringent cut-off of 1.5 fold there is a very small change between DMSO and MPS1i treated cells, with only 63 genes changing, none of which were affected above a 2-fold change. The authors should carefully address this, including why their dataset sees changes in many more pathways than in the He et al. and Santaguida et al. studies. Is this due to just having a decreased cut-off?

      The reviewer correctly points out that we observed an overall reduction in the strength of gene expression changes between our dataset of DMSO versus Mps1i treated RPE1 cells compared to similar studies. We suggest a couple reasons for this. One is that the log<sub>2</sub> fold changes observed in the other studies are not huge and vary between 2.5 and -3.8 for He et al., 3.3 and -2.3 for Santaguida et al., and -0.8 and 1.6 for our study. This variability is within a reasonable range for different experimental conditions and library prep protocols. A second is that our protocol minimizes a potential source of transcriptional change – nuclear lobulation – that is present in the other datasets.

      For the pathway analysis we did not use a fold-change cut-off for any data set, instead opting to include all the genes found to be significantly different between control and Mps1i treated cells for all three studies. Our read-depth was higher than that of the two published experiments, which could contribute to an increased DEG number. However, we hypothesize that our identification of a broader number of altered pathways most likely arises from increased sensitivity due to the loss of covering signal from transcriptional changes associated with increased nuclear atypia. Additional visual cell sorting experiments sorting on misshapen nuclei instead of MN would allow us to determine the accuracy of this hypothesis.

      (8) Moreover, clustering (in Figure 5E) of the replicates is a bit worrisome as the variances are large and therefore it is unclear if, with such large variance and low screening depth, one can really make such a strong conclusion that there are no changes. The authors should prove that their conclusion that rupture does not lead to large transcriptional changes, is not due to the limitations of their experimental design.

      We agree with the reviewers that additional rounds of RNAseq would improve the accuracy of our transcriptomic analysis and could uncover additional DEGs. However, we believe the overall conclusion to be correct based on the results of our attempt to validate changes in gene expression by immunofluorescence. We analyzed two of the most highly upregulated genes in the ruptured MN dataset, ATF3 and EGR1. Although we saw a statistically significant increase in ATF3 intensity between cells without MN and those with ruptured MN, the fold change was so small compared to our positive control (100x less) that we believe it is more consistent with a small increase in the probability of aneuploidy rather than a specific signature of MN rupture.

      (9) The authors also need to address the fact that they are using RPE-1 cells more clearly and that the lack of effect in transcriptional changes may be simply due to the loss of cGAS-STING pathway (Mackenzie et al., 2017; Harding et al., 2017; etc.).

      As we discuss above in the public comments section, the literature is clear that MN do not activate cGAS in the first cell cycle after their formation, even upon rupture. Therefore, we do not expect any changes in our results when applied to cGAS-competent cells. However, this expectation needs to be experimentally validated, which we plan to address in upcoming work.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      The paper begins with phenotyping the DGRP for post-diapause fecundity, which is used to map genes and variants associated with fecundity. There are overlaps with genes mapped in other studies and also functional enrichment of pathways including most surprisingly neuronal pathways. This somewhat explains the strong overlap with traits such as olfactory behaviors and circadian rhythm. The authors then go on to test genes by knocking them down effectively at 10 degrees. Two genes, Dip-gamma and sbb, are identified as significantly associated with post-diapause fecundity, and they also find the effects to be specific to neurons. They further show that the neurons in the antenna but not the arista are required for the effects of Dip-gamma and sbb. They show that removing the antenna has a diapause-specific lifespan-extending effect, which is quite interesting. Finally, ionotropic receptor neurons are shown to be required for the diapause-associated effects. 

      Strengths and Weaknesses: 

      Overall I find the experiments rigorously done and interpretations sound. I have no further suggestions except an ANOVA to estimate the heritability of the post-diapause fecundity trait, which is routinely done in the DGRP and offers a global parameter regarding how reliable phenotyping is. 

      We added to the Methods: “We performed a one-way ANOVA to get the mean squares for between-group and within-group variances and calculated broad-sense heritability using the formula: H<sup>2</sup> = (MS<sub>G</sub> - MS<sub>E</sub>) / (MS<sub>G</sub> + (k-1) MS<sub>E</sub>), where MS<sub>G</sub> is the mean square between groups, MS<sub>E</sub> is the mean square within groups, and k is the number of individuals per group. Using this formula, the broad-sense heritability for normalized post-diapause fecundity was found to be 0.51.”

      We added to the Results: “The broad-sense heritability for normalized post-diapause fecundity was found to be 0.51 (see Methods).”
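
      The heritability calculation can be sketched as below. This is a minimal illustration, not the authors' analysis script. It assumes a balanced design (the same number of individuals, k, in every line) and reads the formula with the grouping (MS<sub>G</sub> - MS<sub>E</sub>) / (MS<sub>G</sub> + (k-1) MS<sub>E</sub>), where MS<sub>E</sub> is the within-group mean square. The function name and the toy data are hypothetical.

```python
from statistics import mean


def broad_sense_heritability(groups):
    """Estimate H^2 from a balanced one-way ANOVA.

    `groups` holds one list of phenotype values per line (genotype);
    every list must have the same length k >= 2.
    """
    k = len(groups[0])
    if k < 2 or len(groups) < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 groups of equal size k >= 2")
    n_groups = len(groups)

    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Sums of squares between and within groups.
    ss_between = k * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    # Mean squares: divide by the corresponding degrees of freedom.
    ms_g = ss_between / (n_groups - 1)
    ms_e = ss_within / (n_groups * (k - 1))

    return (ms_g - ms_e) / (ms_g + (k - 1) * ms_e)


# Toy data: three lines, two individuals each.
h2 = broad_sense_heritability([[1, 3], [5, 7], [9, 11]])
print(f"H^2 = {h2:.3f}")
```

      For the toy data, MS<sub>G</sub> = 32 and MS<sub>E</sub> = 2, so the estimate is 30/34, or about 0.88.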

      A minor point is I cannot find how many DGRP lines are used. 

      Response: We screened 193 lines and have added that to the Results. 

      Reviewer #2 (Public Review):

      Summary

      In this study, Easwaran and Montell investigated the molecular, cellular, and genetic basis of adult reproductive diapause in Drosophila using the Drosophila Genetic Reference Panel (DGRP). Their GWAS revealed genes associated with variation in post-diapause fecundity across the DGRP and performed RNAi screens on these candidate genes. They also analyzed the functional implications of these genes, highlighting the role of genes involved in neural and germline development. In addition, in conjunction with other GWAS results, they noted the importance of the olfactory system within the nervous system, which was supported by genetic experiments. Overall, their solid research uncovered new aspects of adult diapause regulation and provided a useful reference for future studies in this field.

      Strengths:

      The authors used whole-genome sequenced DGRP to identify genes and regulatory mechanisms involved in adult diapause. The first Drosophila GWAS of diapause successfully uncovered many QTL underlying post-diapause fecundity variations across DGRP lines. Gene network analysis and comparative GWAS led them to reveal a key role for the olfactory system in diapause lifespan extension and post-diapause fecundity.

      Comments on revised version:

      While the authors have addressed many of the minor concerns raised by the reviewers, they have not fully resolved some of the key criticisms. Notably, two reviewers highlighted significant concerns regarding the phenotype and assay of post-diapause fecundity, which are critical to the study. The authors acknowledged that this assay could be confounded by the 'cold temperature endurance phenotype,' potentially altering the interpretation of their results.

      However, they responded by stating that it is not obvious how to separate these effects experimentally. This leaves the analysis in this research ambiguous, as also noted by Reviewer #3.

      We should have clarified earlier that we actually chose to measure post-diapause fecundity in order to minimize any impact of ‘cold temperature endurance.’ In fact, we chose post-diapause fecundity as the appropriate measure of successful diapause for both technical and conceptual reasons. Conceptually, the benefit of diapause is to perpetuate the species. It seems obvious to us that post-diapause fecundity is more relevant to species propagation than other measures of diapause such as how many egg chambers contain yolk or how many eggs are laid. Technically, we chose 5-week diapause and recovery based on pilot studies that showed that nearly all DGRP lines showed excellent survival at 5 weeks in diapause conditions. Therefore, our experimental design minimized as much as possible any effect of cold temperature endurance - in the sense of the ability to survive at 10°C - on our phenotype.

      We apologize for not clarifying that point earlier and have added this text to the Results: “We chose 5 weeks based on pilot studies that showed that nearly all DGRP lines showed excellent survival at 5 weeks in diapause conditions while exhibiting sufficient variation in post-diapause fecundity to carry out GWAS. Beyond 5 weeks, fecundity was low and there was insufficient variation to conduct a GWAS.”

      Additionally, I raised concerns about the validity of prioritizing genes with multiple associated variants. Although the authors agreed with this point, they did not revise the manuscript accordingly. The statement that 'Genes with multiple SNPs are good candidates for influencing diapause traits' is not a valid argument within the context of population and quantitative genetics.

      We apologize for neglecting to revise the manuscript accordingly. We have revised Supplemental Table S4 and ranked the genes by p-value.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] Strengths: This study utilized multiple in vitro approaches, such as proteomics, siRNA, and overexpression, to demonstrate that PCBP2 is an intrinsic factor of BMSC aging.

      Weaknesses:

      This study did not perform in vivo experiments.

      Response: We will continue to conduct animal experiments in subsequent studies.

      Reviewer #2 (Public Review):

      [...] Weaknesses: It is unclear if PCBP2 can also function as an intrinsic factor for BMSC cells in female individuals. More work may be needed to further dissect the mechanism of how PCBP2 impacts FGF2 expression. Could PCBP2 impact the FGF2 expression independent of ROS?

      Response: Thank you very much for your valuable comments. These questions are also the focus of our follow-up work. We will sort out the data and publish the relevant research results as soon as possible.

      Additional context that would help readers interpret or understand the significance of the work: In the current work, the authors studied the aging process of BMSC cells, which are related to osteoporosis. Aging processes also impact many other cell types and their function, such as in muscle, skin, and the brain.

      Response: Thank you very much for your valuable comments. We will continue to improve the writing logic of the article to make it more understandable.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript performs a comprehensive biochemical, structural, and bioinformatic analysis of TseP, a type 6 secretion system effector from Aeromonas dhakensis that includes the identification of a domain required for secretion and residues conferring target organism specificity. Through targeted mutations, they have expanded the target range of a T6SS effector to include a gram-positive species, which is not typically susceptible to T6SS attack.

      Strengths:

      All of the experiments presented in the study are well-motivated and the conclusions are generally sound.

      Thank you.

      Weaknesses:

      There are some issues with the clarity of figures. For example, the microscopy figures could have been more clearly presented as cell counts/quantification rather than representative images. Similarly, loading controls for the secreted proteins for the westerns probably should be shown.

      Also, some of the minor/secondary conclusions reached regarding the "independence" of the N and C term domains of the TseP are a bit overreaching.

      We thank the reviewer for pointing out the issues and have carefully revised the manuscript accordingly. We acknowledge the reviewer’s concern regarding the independence of the N- and C-terminal domains, and have toned down the relevant claims.

      Reviewer #2 (Public review):

      Summary:

      Wang et al. investigate the role of TseP, a Type VI secretion system (T6SS) effector molecule, revealing its dual enzymatic activities as both an amidase and a lysozyme. This discovery significantly enhances the understanding of T6SS effectors, which are known for their roles in interbacterial competition and survival in polymicrobial environments. TseP's dual function is proposed to play a crucial role in bacterial survival strategies, particularly in hostile environments where competition between bacterial species is prevalent.

      Strengths:

      (1) The dual enzymatic function of TseP is a significant contribution, expanding the understanding of T6SS effectors.

      (2) The study provides important insights into bacterial survival strategies, particularly in interbacterial competition.

      (3) The findings have implications for antimicrobial research and understanding bacterial interactions in complex environments.

      Thank you.

      Weaknesses:

      (1) The manuscript assumes familiarity with previous work, making it difficult to follow. Mutants and strains need clearer definitions and references.

      Thank you for raising the issue. We have revised the manuscript to improve clarity by including more detailed descriptions of the mutants and strains, along with references to prior work where relevant.

      (2) Figures lack proper controls, quantification, and clarity in some areas, notably in Figures 1A and 1C.

      We have now added the controls as requested by reviewers.

      (3) The Materials and Methods section is poorly organized, hindering reproducibility. Biophysical validation of Zn²⁺ interaction and structural integrity of proteins need to be addressed.

      We have now included more details in the Materials and Methods section. While we recognize the importance of biophysical validation of the Zn²⁺ interaction, this analysis lies beyond the primary scope of the current study. We plan to investigate the role of Zn²⁺ interaction and the EF-hand domain in greater depth as part of our follow-up studies. Thank you for this suggestion.

      (4) Discrepancies in protein degradation patterns and activities across different figures raise concerns about data reliability.

      We acknowledge the concern about discrepancies in protein degradation patterns. TseP exhibits inherent instability, which might explain the observed variations. We have added an explanation in the detailed response letter and the manuscript.

      Reviewer #3 (Public review):

      Summary:

      Type VI secretion systems (T6SS) are employed by bacteria to inject competitor cells with numerous effector proteins. These effectors can kill injected cells via an array of enzymatic activities. A common class of T6SS effector are peptidoglycan (PG) lysing enzymes. In this manuscript, the authors characterize a PG-lysing effector-TseP-from the pathogen Aeromonas dhakensis. While the C-terminal domain of TseP was known to have lysozyme activity, the N-terminal domain was uncharacterized. Here, the authors functionally characterize TsePN as a zinc-dependent amidase. This discovery is somewhat novel because it is rare for PG-lysing effectors to have amidase and lysozyme activity.

      In the second half of the manuscript, the authors utilize a crystal structure of the lysozyme TsePC domain to inform the engineering of this domain to lyse gram-positive peptidoglycan.

      Strengths:

      The two halves of the manuscript considered together provide a nice characterization of a unique T6SS effector and reveal potentially general principles for lysozyme engineering.

      Thank you.

      Weaknesses:

      The advantage of fusing amidase and lysozyme domains in a single effector is not discussed but would appear to be a pertinent question. Labeling of the figures could be improved to help readers understand the data.

      Thank you for the suggestions. We have revised the manuscript and figures to improve clarity.

      The advantage of having dual-domain functions relative to having just one of the two functions is likely for increasing competitive fitness. Although such dual functional cell-wall targeting effectors have not been characterized prior to this study, there are some examples that dual functions are encoded by the same secretion module, for example the VgrG1-TseL pair in Vibrio cholerae. The C-terminal of VgrG1 not only catalyzes actin crosslinking but also recognizes and delivers the downstream encoded lipase effector TseL through direct interaction. In this context, the VgrG1-TseL pair also represent a dual-functional module. Therefore, it is likely that fusing effector domains and coupling effector functions are parallel strategies for the evolution of T6SS effectors.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The paper explored cross-species variance in albumin glycation and blood glucose levels in the function of various life-history traits. Their results show that

      (1) blood glucose levels predict albumin glycation rates

      (2) larger species have lower blood glucose levels

      (3) lifespan positively correlates with blood glucose levels and

      (4) diet predicts albumin glycation rates.

      The data presented is interesting, especially due to the relevance of glycation to the ageing process and the interesting life-history and physiological traits of birds. Most importantly, the results suggest that some mechanisms might exist that limit the level of glycation in species with the highest blood glucose levels.

      While the questions raised are interesting and the amount of data the authors collected is impressive, I have some major concerns about this study:

      (1) The authors combine many databases and samples of various sources. This is understandable when access to data is limited, but I expected more caution when combining these. E.g. glucose is measured in all samples without any description of how handling stress was controlled for. E.g. glucose levels can easily double in a few minutes in birds, potentially introducing variation in the data generated. The authors report no caution of this effect, or any statistical approaches aiming to check whether handling stress had an effect here, either on glucose or on glycation levels.

      (2) The database with the predictors is similarly problematic. There is information pulled from captivity and wild (e.g. on lifespan) without any confirmation that the different databases are comparable or not (and here I'm not just referring to the correlation between the databases, but also to a potential systematic bias, e.g. captivity-based sources likely consistently report longer lifespans). This is even more surprising, given that the authors raise the possibility of captivity effects in the discussion, and exploring this question would be extremely easy in their statistical models (a simple covariate in the MCMCglmms).

      (3) The authors state that the measurement of one of the primary response variables (glycation) was measured without any replicability test or reference to the replicability of the measurement technique.

      (4) The methods and results are very poorly presented. For instance, new model types and variables are popping up throughout the manuscript, already reporting results, before explaining what these are e.g. results are presented on "species average models" and "model with individuals", but it's not described what these are and why we need to see both. Variables, like "centered log body mass", or "mass-adjusted lifespan" are not explained. The results section is extremely long, describing general patterns that have little relevance to the questions raised in the introduction and would be much more efficiently communicated visually or in a table.

      Reviewer #2 (Public review):

      Summary

      In this extensive comparative study, Moreno-Borrallo and colleagues examine the relationships between plasma glucose levels, albumin glycation levels, diet, and life-history traits across birds. Their results confirmed the expected positive relationship between plasma blood glucose level and albumin glycation rate but also provided findings that are somewhat surprising or contradicting findings of some previous studies (relationships with lifespan, clutch mass, or diet). This is the first extensive comparative analysis of glycation rates and their relationships to plasma glucose levels and life history traits in birds that are based on data collected in a single study and measured using unified analytical methods.

      Strengths

      This is an emerging topic gaining momentum in evolutionary physiology, which makes this study a timely, novel, and very important contribution. The study is based on a novel data set collected by the authors from 88 bird species (67 in captivity, 21 in the wild) of 22 orders, which itself greatly contributes to the pool of available data on avian glycemia, as previous comparative studies either extracted data from various studies or a database of veterinary records of zoo animals (therefore potentially containing much more noise due to different methodologies or other unstandardised factors), or only collected data from a single order, namely Passeriformes. The data further represents the first comparative avian data set on albumin glycation obtained using a unified methodology. The authors used LC-MS to determine glycation levels, which does not have problems with specificity and sensitivity that may occur with assays used in previous studies. The data analysis is thorough, and the conclusions are mostly well-supported (but see my comments below). Overall, this is a very important study representing a substantial contribution to the emerging field of evolutionary physiology focused on the ecology and evolution of blood/plasma glucose levels and resistance to glycation.

      Weaknesses

      My main concern is about the interpretation of the coefficient of the relationship between glycation rate and plasma glucose, which reads as follows: "Given that plasma glucose is logarithm transformed and the estimated slope of their relationship is lower than one, this implies that birds with higher glucose levels have relatively lower albumin glycation rates for their glucose, fact that we would be referring as higher glycation resistance" (lines 318-321) and "the logarithmic nature of the relationship, suggests that species with higher plasma glucose levels exhibit relatively greater resistance to glycation" (lines 386-388). First, only plasma glucose (predictor) but not glycation level (response) is logarithm transformed, and this semi-logarithmic relationship assumed by the model means that an increase in glycation always slows down when blood glucose goes up, irrespective of the coefficient. The coefficient thus does not carry information that could be interpreted as higher (when <1) or lower (when >1) resistance to glycation (this only can be done in a log-log model, see below) because the semi-log relationship means that glycation increases by a constant amount (expressed by the coefficient of plasma glucose) for every tenfold increase in plasma glucose (for example, with glucose values 10 and 100, the model would predict glycation values 2 and 4 if the coefficient is 2, or 0.5 and 1 if the coefficient is 0.5). Second, the semi-logarithmic relationship could indeed be interpreted such that glycation rates are relatively lower in species with high plasma glucose levels. However, the semi-log relationship is assumed here a priori and forced to the model by log-transforming only glucose level, while not being tested against alternative models, such as: (i) a model with a simple linear relationship (glycation ~ glucose); or (ii) a log-log model (log(glycation) ~ log(glucose)) assuming power function relationship (glycation = a * glucose^b). The latter model would allow for the interpretation of the coefficient (b) as higher (when <1) or lower (when >1) resistance to glycation in species with high glucose levels as suggested by the authors.
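
      The distinction drawn above between the semi-log and log-log forms can be checked numerically. The following is only a minimal sketch: the function names and the intercept of zero are assumptions made for illustration, and the glucose values 10 and 100 are those used in the reviewer's example, not data from the study.

```python
import math


def semilog_glycation(glucose, slope, intercept=0.0):
    """Semi-log model: glycation = intercept + slope * log10(glucose)."""
    return intercept + slope * math.log10(glucose)


def power_glycation(glucose, a, b):
    """Log-log (power-function) model: glycation = a * glucose**b."""
    return a * glucose ** b


# Semi-log: a tenfold rise in glucose adds exactly `slope`, whatever its value,
# so the slope alone says nothing about "resistance".
for slope in (2.0, 0.5):
    low = semilog_glycation(10, slope)
    high = semilog_glycation(100, slope)
    print(f"semi-log, slope {slope}: glycation at 10 = {low}, at 100 = {high}")

# Log-log: glycation per unit glucose falls with glucose only when b < 1.
for b in (0.5, 1.0, 1.5):
    per_unit_low = power_glycation(10, 1.0, b) / 10
    per_unit_high = power_glycation(100, 1.0, b) / 100
    print(f"log-log, b = {b}: glycation/glucose at 10 = {per_unit_low:.3f}, "
          f"at 100 = {per_unit_high:.3f}")
```

      Under these assumptions, the exponent b of the power function is the quantity that can be compared with one, which is the reviewer's point.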

      Besides, a clear explanation of why glucose is log-transformed when included as a predictor, but not when included as a response variable, is missing.

      We apologize for missing an answer to this part before. Indeed, glucose is always log transformed and this is explained in the text.

      The models in the study do not control for the sampling time (i.e., time latency between capture and blood sampling), which may be an important source of noise because blood glucose increases because of stress following the capture. Although the authors claim that "this change in glucose levels with stress is mostly driven by an increase in variation instead of an increase in average values" (ESM6, line 46), their analysis of Tomasek et al.'s (2022) data set in ESM1 using Kruskal-Wallis rank sum test shows that, compared to baseline glucose levels, stress-induced glucose levels have higher median values, not only higher variation.

      Although the authors calculated the variance inflation factor (VIF) for each model, it is not clear how these were interpreted and considered. In some models, GVIF^(1/(2*Df)) is higher than 1.6, which indicates potentially important collinearity (see for example https://www.bookdown.org/rwnahhas/RMPH/mlr-collinearity.html). This is often the case for body mass or clutch mass (e.g. models of glucose or glycation based on individual measurements).
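
      For readers unfamiliar with the scaled statistic mentioned here, the sketch below shows how GVIF^(1/(2*Df)) relates to the ordinary VIF in the simplest case of two continuous predictors, where VIF = 1 / (1 - r^2) and Df = 1. The predictor values are invented for illustration and the helper names are assumptions; this is not the authors' computation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF of either predictor in a model with exactly two predictors."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)


def scaled_gvif(gvif, df):
    """GVIF^(1/(2*Df)); for a 1-df term this is the square root of the VIF."""
    return gvif ** (1.0 / (2 * df))


# Hypothetical log body mass and log clutch mass values, for illustration only.
log_body_mass = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
log_clutch_mass = [0.2, 0.6, 0.7, 1.2, 1.3, 1.9]

vif = vif_two_predictors(log_body_mass, log_clutch_mass)
print(f"VIF = {vif:.2f}, GVIF^(1/(2*Df)) = {scaled_gvif(vif, 1):.2f}")
```

      On this scale, the threshold of 1.6 quoted above corresponds to an ordinary VIF of about 2.56 for a single-degree-of-freedom term.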

      It seems that the differences between diet groups other than omnivores (the reference category in the models) were not tested and only inferred using the credible intervals from the models. However, these credible intervals relate to the comparison of each group with the reference group (Omnivore) and cannot be used for pairwise comparisons between other groups. Statistics for these contrasts should be provided instead. Based on the plot in Figure 4B, it seems possible that terrestrial carnivores differed in glycation level not only from omnivores but also from herbivores and frugivores/nectarivores.
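
      One way to obtain the contrasts requested here is to difference the posterior draws of two group coefficients iteration by iteration and summarise that difference. The sketch below uses simulated draws in place of real model output; the group means, the spread, and the function name are assumptions, and the interval is a simple equal-tailed one rather than the highest-posterior-density interval a given package may report.

```python
import random
import statistics


def contrast_interval(draws_a, draws_b):
    """Equal-tailed 95% interval for (a - b), using paired posterior draws."""
    if len(draws_a) != len(draws_b):
        raise ValueError("draws must come from the same MCMC iterations")
    diffs = [a - b for a, b in zip(draws_a, draws_b)]
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(diffs, n=40)
    return cuts[0], cuts[-1]


# Simulated stand-ins for posterior draws of two diet coefficients, each
# expressed relative to the reference category (hypothetical values).
random.seed(1)
carnivore = [random.gauss(1.5, 0.6) for _ in range(4000)]
herbivore = [random.gauss(-0.2, 0.6) for _ in range(4000)]

low, high = contrast_interval(carnivore, herbivore)
print(f"carnivore - herbivore: 95% interval [{low:.2f}, {high:.2f}]")
```

      Because both coefficients are measured against the same reference group, their difference is the direct contrast between the two non-reference groups, which the individual credible intervals do not provide.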

      Given that blood glucose is related to maximum lifespan, it would be interesting to also see the results of the model from Table 2 while excluding blood glucose from the predictors. This would allow for assessing if the maximum lifespan is completely independent of glycation levels. Alternatively, there might be a positive correlation mediated by blood glucose levels (based on its positive correlations with both lifespan and glycation), which would be a very interesting finding suggesting that high glycation levels do not preclude the evolution of long lifespans.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 84: "glycation scavengers" such as polyamines - can you specify what these polyamines do exactly?

      A clarification of what we mean with "glycation scavengers" is added.

      (2) Line 87-89: specify that the work of Wein et al. and this sentence is about birds.

      This is now clarified.

      (3) Line 95: "88 species" add "OF BIRDS". Also, I think it would be nice if you specified here that you are relying on primary data.

      This is now clarified (line 96).

      (4) Line 90-119: I find this paragraph very long and complex, with too many details on the methodology. For instance, I agree with listing your hypothesis, e.g. that with POL, but then what variables you use to measure the pace of life can go in the materials and methods section (so all lines between 112-119).

      This is explained here as a previous reviewer considered this presentation was indeed needed in the introduction.

      (5) Line 122-124: The first sentence should state that you collected blood samples from various sources, and list some examples: zoos? collaborators? designated wild captures? Stating the sample size before saying what you did to get them is a bit weird. Besides, you skipped a very important detail about how these samples were collected, when, where, and using what protocols. We know very well, that glucose levels can increase quickly with handling stress. Was this considered during the captures? Moreover, you state that you had 484 individuals, but how many samples in total? One per individual or more?

      We kindly ask the reviewer to read the multiple supplementary materials provided, in which the questions of source of the samples, potential stress effects and sample sizes for each model are addressed. All individuals contributed with one sample. More details about the general sources employed are given now in lines 125-127.

      (6) Line 135-36: numbers below 10 should be spelled out.

      Ok. Now that is changed.

      (7) Line 136: the first time I saw that you had both wild and captive samples. This should be among the first things to be described in the methods, as mentioned above.

      As stated above, details on this are included in the supplementary materials, but further clarifications have now been included in the main text (question 5).

      (8) Line 137-138: not clear. So you had 46 samples and 9 species. But what does the 3-3-3 sample mean? or for each species you chose 9 samples (no, cause that would be 81 samples in total)?

      This has now been clarified (lines 139-140).

      (9) Line 139-141: what methodological constraints? Too high glucose levels? Too little plasma?

      There were cases in which the device (glucometer) produced an unspecific error. This did not correspond to too high nor too low glucose levels, as these are differently signalled errors. Neither the manual nor the client service provided useful information to discern the cause. This may perhaps be related to the composition of the plasma of certain species, interfering with the measurement. Some clarifications have been added (lines 143-146).

      (10) Line 143: should be ZIMS.

      Corrected.

      (11) Line 120-148: you generally talk about individuals here, but I feel it would be more precise to use 'samples'.

      The two terms are fully interchangeable here, as we never measured more than one sample for a given individual within this study. Besides, in some cases, saying “sample” could be less informative.

      (12) Line 150: missing the final number of measurements for glucose and glycation.

      Please, read the ESM6 (Table ESM6.1), where this information is given.

      (13) Line 154-155: so you took multiple samples from the same individual? It's the first time the text indicates so. Or do you mean technical replicates were not performed on the same samples?

      As previously indicated, each individual contributed only one sample. Replicates were done only for some individuals to validate the technique, as it would be unfeasible to perform replicates of all of them. This part of the text is referring to the fact that not all samples were analysed at the same time, as it takes a considerable amount of time, and the mass spectrometry devices are shared with other teams and projects. Clarifications in this sense are now added (lines 160-163).

      (14) Line 171-172: "After realizing that diet classifications from AVONET were not always suitable for our purpose" - too informal. Try rephrasing, like "After determining that AVONET diet classifications did not align with our research needs...", but you still need to specify what was wrong with it and what was changed, based on what argument?

      The new formulation suggested by the reviewer has now been applied (lines 181-183). The details are given in the ESM6, as indicated in the text. 

      (15) Line 174-176: You start a new paragraph, talking about missing values, but you do not specify what variable are you talking about. you talk about calculating means, but the last variable you mentioned was diet, so it's even more strange.

      We refer to life history traits. It has now been clarified in the text (line 185).

      (16) Line 177: what longevity records? Coming from where? How did you measure longevity? Maximum lifespan ever recorded? 80-90% longevity, life expectancy???

      We refer to maximum lifespan, as indicated in the introduction and in every other case throughout the manuscript. Clarifications have now been introduced (188-190).

      (17) Line 180-183: using ZIMS can be problematic, especially for maximum longevity. There are often individuals who had a wrong date of birth entered or individuals that were failed to be registered as dead. The extremes in this database are often way off. If you want to combine though, you can check the correlation of lifespans obtained from different sources for the overlapping species. If it's a strong correlation it can be ok, but intuitively this is problematic.

      The species for which we used ZIMS were those for which no other databases reported any values. We could try correlations for other species, but this issue is not necessarily restricted to ZIMS, as the primary origin of the data from other databases is often difficult to trace. Also, ZIMS is potentially more up to date than some of the other databases, mainly the Amniote database, on which we rely the most, as it includes the highest number of species in the most easily accessible format.

      (18) Line 181-186: in ZIMS you calculate the average of the competing records, otherwise you choose the max. Why use different preferences for the same data?

      This constitutes a misunderstanding, for which we include clarifications now (line 196). We were referring here to the fact that for maximum lifespan the maximum is always chosen, while for other variables an average is calculated. 

      (19) Line 198: Burn-in and thinning interval is quite low compared to your number of iterations. How were model convergences checked?

      Please, check ESM1.

      (20) Line 201-203: What's the argument using these priors? Why not use noninformative ones? Do you have some a priori expectations? If so, it should be explained.

      Models have now been rerun with no expectations on the variance partitions so the priors are less informative, given the lack of firm expectations, and results are similar. Smaller nu values were also tried.

      (21) Line 217: "carried" OUT.

      Corrected (now in line 229).

      (22) Line 233-234: "species average model" - what is this? it was not described in the methods.

      Please, read the ESM6.

      (23) Line 232-246: (a) all this would be better described by a table or plot. You can highlight some interesting patterns, but describing it all in the text is not very useful I think, (b) statistically comparing orders represented by a single species is a bit odd.

      (a) Figure 1 shows this graphically, but this part was found to be quite short without descriptions by previous reviewers. (b) We recognise this limitation, but this part is not presented as one of the main results of the article, and just constitutes an attempt to illustrate very general patterns, in order to guide future research, as in most groups glycation has never been measured, so this still constitutes the best illustration of such patterns in the literature.

      (24) Line 281: the first time I saw "mass-adjusted maximum lifespan" - what is this, and how was it calculated? It should be described in the methods. But in any case, neither ratios, nor residuals should be used, but preferably the two variables should be entered side by side in the model.

      Please, see ESM6 for the explanations and justifications for all of this.

      (25) Line 281: there was also no mention of quadratic terms so far. How were polynomial effects tested/introduced in the models? Orthogonal polynomials? or x+ x^2?

      Please, read ESM6.

      (26) Table 1. What is 'Centred Log10Body mass', should be added in the methods.

      Please, read ESM6.

      (27) Table 1: what's the argument behind separating terrestrial and aquatic carnivores?

      This was mostly based on the a priori separation made in AVONET, but it is also used in a similar way by Szarka and Lendvai 2024 (comparative study on glucose in birds), where differences in glucose levels between piscivorous and carnivorous are reported. We had some reasons to think that certain differences in dietary nutrient composition, as discussed later, can make this difference relevant.

      (28) Table 1: The variable "Maximum lifespan" is discussed and plotted as 'mass-adjusted maximum lifespan' and 'residual maximum lifespan'. First, this is confusing, the same name should be used throughout and it should be defined in the methods section. Second, it seems that non-linear effects were tested by using x + x^2. This is problematic statistically, orthogonal polynomials should be used instead (check the poly function in R). Also, how did you decide to test for non-linear effects in the case of lifespan but not the other continuous predictors? Should be described in the methods again.

      Please, read ESM6. Data exploration was performed before fitting these models. Orthogonal polynomials were considered to make the estimates, and therefore the patterns predicted by the models, harder to interpret, so raw polynomials were used. Clarifications have now been included in line 297.
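
      The trade-off discussed in this exchange can be illustrated with a small example. Raw terms x and x^2 are typically strongly correlated, whereas orthogonalised terms are not, at the cost of coefficients that no longer map directly onto the original scale. The sketch below builds the orthogonal quadratic term by Gram-Schmidt, which matches what R's poly(x, 2) produces up to column scaling; the helper names and the x values are assumptions for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def correlation(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    cu = [a - sum(u) / n for a in u]
    cv = [b - sum(v) / n for b in v]
    return dot(cu, cv) / math.sqrt(dot(cu, cu) * dot(cv, cv))


def orthogonal_quadratic(x):
    """Return centred linear and quadratic terms that are mutually orthogonal."""
    n = len(x)
    mean_x = sum(x) / n
    linear = [v - mean_x for v in x]
    squares = [v * v for v in x]
    mean_sq = sum(squares) / n
    centred_sq = [s - mean_sq for s in squares]
    # Remove from x^2 the part already explained by the linear term.
    proj = dot(centred_sq, linear) / dot(linear, linear)
    quadratic = [s - proj * l for s, l in zip(centred_sq, linear)]
    return linear, quadratic


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical predictor values
raw_r = correlation(x, [v * v for v in x])
linear, quadratic = orthogonal_quadratic(x)
print(f"correlation of raw x and x^2: {raw_r:.3f}")
print(f"dot product of orthogonal terms: {dot(linear, quadratic):.3f}")
```

      Both parameterisations span the same quadratic curve, so fitted values agree; what differs is the collinearity between the terms and the interpretability of each coefficient.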

      (29) Figure 2. From the figure label, now I see that relative lifespan is in fact residual. This is problematic, see Freckleton, R. P. (2009). The seven deadly sins of comparative analysis. Journal of evolutionary biology, 22(7), 1367-1375. Using body mass and lifespan side by side is preferred. This would also avoid forcing more emphasis on body mass over lifespan meaning that you subjectively introduce body mass as a key predictor, but lifespan and body size are highly correlated, so by this, you remove a large portion of variance that might in fact be better explained by lifespan.

      Please, read ESM6 for justifications on the use of residuals.

      Reviewer #2 (Recommendations for the authors):

      (1) If the semi-logarithmic relationship (glycation ~ log10(glucose)) is to be used to support the hypothesis about higher glycation resistance in species with high blood glucose (lines 318-321 and 386-388), it should be tested whether it is significantly better than the model assuming a simple linear relationship (i.e., glycation ~ glucose). Alternatively, if the coefficient is to be used to determine whether glycation rate slows down or accelerates with increasing glucose levels, log-log model (log10(glycation) ~ log10(glucose)) assuming power function relationship (glycation = a * glucose^b) should be used (as is for example in the literature about relationships between metabolic rates and body size). Probably the best approach would be to compare all three models (linear, semi-logarithmic, and log-log) and test if one performs significantly better. If none of them, then the linear model should be selected as the most parsimonious.

      Different options (linear, both semi-logarithmic combinations and log-log) have now been tested, with similar results. All of the models confirm the pattern of a significant positive relationship between glucose and glycation. Moreover, when standardizing the variables (both glucose and glycation, either log transformed or not), the estimate of the slope is almost equal for all the models. It is also lower than one, which in the case of both the linear and log-log confirms the stated prediction. The log-log model, showing a much lower DIC than the linear version, is now shown as the final model.

      (2) ESM6, line 46: Please note that Kruskal-Wallis rank sum test in ESM1 shows that, compared to baseline glucose levels, stress-induced glucose levels have higher median values (not only higher variation). With this in mind, what is the argument here about increased variation being the main driver of stress-induced change in glucose levels based on? It seems that both the median values and variation differ between baseline and stress-induced levels, and this should be acknowledged here.

      As discussed in the public answers, the Kruskal-Wallis test does not allow differences in means to be determined; it only indicates that the groups are “different” (implicitly, in their rank sums, which does not necessarily mean in their means), while the Levene test performed signals heteroskedasticity. This makes this feature of the data analytically more grounded. Of course, when looking at the data, a higher mean can be perceived, but nothing can be said about its statistical significance. Still, some subtle changes have been introduced in the corresponding section of the ESM6.

      (3) Have you recorded the sampling times? If yes, why not control them in the models? It is at least highly advisable to include the sampling times in the data (ESM5).

      As indicated in ESM6 lines 42-43, we do not have sampling times for most of the individuals (only zebra finches and swifts), so this cannot be accounted for in the models.

      (4) If sampling times will remain uncontrolled statistically, I recommend mentioning this fact and its potential consequences (i.e., rather conservative results) in the Methods section of the main text, not only in ESM6.

      A brief description of this has now been included in the main text (lines 129-132), referencing the more detailed discussion in the supplementary materials. Some subtle changes have also been made to the “Possible effects of stress” section of ESM6.

      (5) ESM6, lines 52-53: The lower repeatability in Tomasek et al.'s study compared to your study is irrelevant to the argument about the conservative nature of your results (the difference in repeatability between both studies is most probably due to the broader taxonomic coverage of the current study). The important result in this context is that repeatability is lower when sampling time is not considered within Tomasek et al.'s data set (ESM1). Therefore, I suggest rewording "showing a lower species repeatability than that from our data" to "showing lower species repeatability when sampling time is not considered" to avoid confusion. Please also note that you refer here to species repeatability but, in ESM1, you calculate individual repeatability. Nevertheless, both individual and species repeatabilities are lower when not controlling for sampling time because the main driver, in that case, is an increased residual variance.

      We recognize that the explanation as previously written was confusing, and have substantially reworded the section. However, we would like to point out that ESM1 shows both species and individual repeatability (for the Tomasek et al. 2022 data; for ours, only species repeatability, as we do not have repeated individual values). Changes have now been made to make this more evident.

      (6) I recommend providing brief guidelines for the interpretation of VIFs to the readers, as well as a brief discussion of the obtained values and their potential importance.

      Thank you for the recommendation. We have included a brief description in lines 230-231, as well as in the results section (lines 389-393).

      (7) Line 264: Please note that the variance explained by phylogeny obtained from the models with other (fixed) predictors does not relate to the traits (glucose or glycation) per se but to model residuals.

      We appreciate the indication, and this has been rephrased accordingly (lines 280-286).

      (8) Change the term "confidence intervals" to "credible intervals" throughout the paper, since confidence interval is a frequentist term and its interpretations are different from Bayesian credible interval.

      Thank you for the remark, this has now been changed.

      (9) Besides lifespan, have you also considered quadratic terms for body mass? The plot in Figure 2A suggests there might be a non-linear relationship too.

      A quadratic term for body mass did not show any significant effect on glucose in an alternative model. In addition, a model with linear instead of log-transformed glucose (as used in other studies) did not perform better according to the DICs, although both models showed a significant relationship between glucose and body mass. Therefore, the model presented in the manuscript remains the best option considered.

      (10) ESM6, lines 115-116: It is usually recommended that only factors with at least 6 or 8 levels are included as random effects because a lower number of levels is insufficient for a good estimation of variance.

      In a Bayesian approach this does not apply, as random and fixed factors are estimated similarly. 

      (11) Typos and other minor issues:

      a) Line 66: Delete "related".

      b) Figure 2: "B" label is missing in the plot.

      c) Reference 9: Delete "Author".

      d) References 15 and 83 are duplicated. Keep only ref. 83, which has the correct citation details.

      e) ESM6, line 49: Change "GLLM" to "GLMM".

      Thank you for pointing these out. They have now been corrected.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Response to Reviewer 2’s comments:

      I am concerned that the results in Figure 8D may not be correct, or that the authors may be mis-interpreting them. From my reading of the paper they cite (Lammers & Flamholz 2023), the equilibrium sharpness limit for the network they consider in Figure 8 should be 0.25. But both solutions shown in Figure 8D fall below this limit, which means that they have sharpness levels that could have been achieved with no energy expenditure. If this is the case, then it would imply that while both systems do dissipate energy, they are not doing so productively; meaning that the same results could be achieved while holding Phi=0.

      I acknowledge that this could be due to a difference in how they measure sharpness, but wanted to raise it here in case it is, in fact, a genuine issue with the analysis. There should be an easy fix for this: just set the sharper "desired response" curve in 8b to be such that it demands non-equilibrium sharpness levels (0.25<S<0.5).

      Thank you for raising this point regarding the interpretation of our results in Figure 8D. We agree that if the equilibrium sharpness limit for this particular network is around 0.25 (as shown by Lammers & Flamholz 2023), then achieving a sharpness below this threshold could, in principle, be accomplished without any energy expenditure. However, in our current design approach, the loss function is solely designed to enforce agreement with a target mean mRNA level at different input concentrations; it does not explicitly constrain energy dissipation, noise, or other metrics. Consequently, the DGA has no built-in incentive to minimize or optimize energy consumption, which means the resulting solutions may dissipate energy without exceeding the equilibrium sharpness limit.

      In other words, the same input–output relationship could theoretically be achieved with \Phi = 0 if an explicit constraint or regularization term penalizing energy usage had been included. As noted, adding such a term (e.g., penalizing \Phi^2) is conceptually straightforward but falls outside the scope of this study. Our primary goal is to demonstrate the flexibility of the DGA in designing a desired response, rather than to delve into energy–sharpness trade-offs or other biological considerations.
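
      A minimal sketch of such a regularized loss is given below, assuming a mean-squared-error term on the mean mRNA response plus a quadratic penalty on the dissipation. The function name `regularized_loss`, the `penalty_weight` value, and the numbers are hypothetical and are not taken from the manuscript.

```python
def regularized_loss(predicted, target, phi, penalty_weight=0.1):
    """Mean squared error on the response plus a quadratic energy penalty."""
    mse = sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)
    return mse + penalty_weight * phi ** 2

target_mrna = [1.0, 4.0, 9.0]      # hypothetical target mean mRNA levels
predicted_mrna = [1.0, 4.0, 9.0]   # a design that matches the target exactly

# Two designs with identical responses: only the dissipative one is penalized.
loss_equilibrium = regularized_loss(predicted_mrna, target_mrna, phi=0.0)
loss_dissipative = regularized_loss(predicted_mrna, target_mrna, phi=2.0)
print(loss_equilibrium, loss_dissipative)
```

      With such a term, an optimizer choosing between two designs that fit the target equally well would be steered toward the one with lower dissipation.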

      While we appreciate the suggestion to set a higher target sharpness that exceeds the equilibrium limit, we believe the current example effectively demonstrates the DGA’s ability to design circuits with desired input-output relationships, which is the primary focus of this study. Researchers interested in optimizing energy efficiency, burst size, burst frequency, noise, response time, mutual information, or other system properties can easily extend our approach by incorporating additional terms into the loss function to target these specific objectives.

      We hope this explanation addresses your concern and clarifies that the manuscript provides sufficient context for readers to interpret the results in Figure 8D correctly.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We thank Reviewer #1 for their thoughtful feedback and appreciation of the manuscript's clarity. Our primary goal is to introduce the DGA as a foundational tool for integrating stochastic simulations with gradient-based optimization. While we recognize the value of providing detailed comparisons with existing methods and a deeper analysis of the DGA’s limitations (such as rare event handling), these topics are beyond the scope of this initial work. Our focus is on presenting the core concept and demonstrating its potential, leaving more extensive evaluations for future research.

      Reviewer #2 (Public review):

      We thank Reviewer #2 for their detailed and constructive feedback. We appreciate the recognition of the DGA as a significant conceptual advancement for stochastic biochemical network analysis and design.

      Weaknesses:

      (1) Validation of DGA robustness in complex systems:

      Our primary goal is to introduce the DGA framework and demonstrate its feasibility. While validation on high-dimensional and non-steady-state systems is important, it is beyond the scope of this initial work. Future studies may improve scalability by employing techniques such as dynamically adjusting the smoothness of the DGA's approximations during simulation or using surrogate models that remain differentiable but more accurately capture discrete behaviors in critical regions, thus preserving gradient computation while improving accuracy.

      (2) Inference accuracy and optimization:

      We acknowledge that the non-convex loss landscape in the DGA can hinder parameter inference and convergence to global minima, as seen in Figure 5A. While techniques like multi-start optimization or second-order methods (e.g., L-BFGS) could improve performance, our focus here is on establishing the DGA framework. We plan to explore better optimization methods in future work to improve the accuracy of parameter inference in complex systems.
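
      The multi-start idea mentioned above can be sketched as follows. The one-dimensional loss is a hypothetical non-convex function with one local and one global minimum, standing in for a DGA loss landscape; plain gradient descent is used instead of ADAM or L-BFGS to keep the example self-contained, and all names and settings are illustrative assumptions.

```python
import random

def loss(x):
    """Hypothetical non-convex loss with a local and a global minimum."""
    return x ** 4 - 3 * x ** 2 + x

def grad(x):
    """Analytical derivative of the loss above."""
    return 4 * x ** 3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    """Plain gradient descent from a single starting point."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def multi_start(n_starts=25, seed=0):
    """Run gradient descent from random starts and keep the lowest-loss result."""
    rng = random.Random(seed)
    candidates = [gradient_descent(rng.uniform(-2.0, 2.0)) for _ in range(n_starts)]
    return min(candidates, key=loss)

best_x = multi_start()
single_x = gradient_descent(1.5)   # a single run that settles in the local minimum
print(best_x, loss(best_x), single_x, loss(single_x))
```

      The cost grows linearly with the number of starts, which is the computational burden referred to in the discussion of inference runs.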

      (3) Use of simple models for demonstration:

      We selected well-understood systems to clearly illustrate the capabilities of the DGA. These examples were intended to demonstrate how the DGA can be applied, rather than to solve problems better addressed by analytical methods. Applying DGA to more complex, analytically intractable systems is an exciting avenue for future work, but introducing the method was our main objective in this study.

      Reviewer #3 (Public review):

      We thank the reviewer for their detailed and insightful feedback. We appreciate the recognition of the DGA as a significant advancement for enabling gradient-based optimization in stochastic systems.

      Weaknesses:

      (1) Application beyond steady-state analysis

      We acknowledge the limitation of focusing solely on steady-state properties. To extend the DGA for analyzing transient dynamics, time-dependent loss functions can be incorporated to capture system evolution over time. This could involve aligning simulated trajectories with experimental time-series data or using moment-matching across multiple time points. 
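
      As a rough illustration of moment-matching across multiple time points, the sketch below sums squared mismatches in the mean and variance over a set of shared time points. The function name `trajectory_loss`, the data layout, and the numbers are assumptions made for this example and do not correspond to an implementation in the manuscript.

```python
def trajectory_loss(simulated, observed):
    """Sum of squared mean and variance mismatches over shared time points.

    Both arguments map a time point to a (mean, variance) pair.
    """
    total = 0.0
    for t, (obs_mean, obs_var) in observed.items():
        sim_mean, sim_var = simulated[t]
        total += (sim_mean - obs_mean) ** 2 + (sim_var - obs_var) ** 2
    return total

# Hypothetical moments at three time points (illustration only).
observed = {1.0: (2.0, 2.1), 5.0: (6.5, 6.0), 10.0: (9.0, 9.2)}
simulated = {1.0: (2.5, 2.1), 5.0: (6.5, 7.0), 10.0: (9.0, 9.2)}

value = trajectory_loss(simulated, observed)
print(value)
```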

      (2) Numerical instability in gradient computation

      The reviewer correctly highlights that large sharpness parameters (a and b) in the sigmoid and Gaussian approximations can induce numerical instability due to vanishing or exploding gradients. To address this, adaptive tuning of a and b during optimization could balance smoothness and accuracy. Additionally, alternative smoothing functions (e.g., softmax-based reaction selection) and gradient regularization techniques (such as gradient clipping and trust-region methods) could improve stability and convergence.
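
      The instability described above can be seen in a small sketch, assuming the smoothed step takes the form sigmoid(a * x). The derivative at the threshold grows linearly with a, while away from the threshold it vanishes; a simple scalar clipping function is included as one possible remedy. The function names and the clipping bound are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def smooth_step_grad(x, a):
    """Derivative with respect to x of sigmoid(a * x)."""
    s = sigmoid(a * x)
    return a * s * (1.0 - s)

def clip(g, max_abs):
    """Clip a scalar gradient to the interval [-max_abs, max_abs]."""
    return max(-max_abs, min(max_abs, g))

for a in (1.0, 10.0, 100.0):
    at_threshold = smooth_step_grad(0.0, a)   # equals a / 4, grows with a
    away = smooth_step_grad(1.0, a)           # shrinks rapidly as a grows
    print(a, at_threshold, away, clip(at_threshold, 5.0))
```

      Clipping bounds the size of an update but does not restore the vanished gradient away from the threshold, which is why adaptive tuning of the sharpness is listed as a complementary option.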

      Reviewer #1 (recommendations):

      We thank the reviewer for their thoughtful and constructive feedback on our manuscript. Below, we address each of the comments and suggestions raised.

      Main points:

      (1) It would have been useful to have a brief discussion, based on a concrete example, of what can be achieved with the DGA and is totally beyond the reach of the Gillespie algorithm and the numerous existing stochastic simulation methods.

      Thank you for your comment. We would like to clarify that the primary aim of this work is to introduce the DGA and demonstrate its feasibility for tasks such as parameter estimation and network design. Unlike traditional stochastic simulation methods, the DGA’s differentiable nature enables gradient-based optimization, which is not possible with the classical Gillespie algorithm or its variants.

      (2) As is often the case with machine learning techniques, there is a sense of a black box, with a lack of mathematical detail about the proposed method: as opposed to the exact Gillespie algorithm, whose foundations rest on solid mathematical results (exponentially distributed waiting times of continuous-time Markov processes), the DGA involves uncontrolled approximations that are only briefly mentioned in the paper. For instance, it is currently simply noted that "the approximations introduced by the DGA may be pronounced in more complex settings such as the calculation of rare events", without specifying how limiting these errors are. It would be useful to include a clearer and more comprehensive discussion of the limitations of the DGA: When does it work accurately? What are the approximations/errors and can they be controlled? When is it worth paying the price for those approximations/errors, and when is it better to stick to the Gillespie algorithm? Is this notably the case for problems involving rare events? Clearly, these are difficult questions, and the answers are problem specific. However, it would be important to draw the readers' attention to these issues, especially if the DGA is presented as a potentially significant tool in computational and synthetic biology.

      We acknowledge the importance of discussing the limitations of the DGA in more detail. While we have noted that the approximations introduced by the DGA may impact its accuracy in certain scenarios, such as rare-event problems, a deeper exploration of these trade-offs is outside the scope of this work. Instead, we provide sufficient context in the manuscript to guide readers on when the DGA is appropriate.

      (3) The DGA is here introduced and discussed in the context of non-spatial problems (simple gene regulatory networks). However, numerous problems in the life sciences and computational/synthetic biology, involve stochasticity and spatial degrees of freedom (e.g. for problems involving diffusion, migration, etc). It is notoriously challenging to use the Gillespie algorithm to efficiently simulate stochastic spatial systems, especially in the context of rare events (e.g., extinction or fixation problems). It would be useful to comment on whether, and possibly how, the DGA can be used to efficiently simulate stochastic spatial systems, and if it would be better suited than the Gillespie algorithm for this purpose.

      Thank you for pointing this out. Although our current work centers on non-spatial systems, we agree that many biological contexts incorporate both stochasticity and spatial degrees of freedom. Extending the DGA to efficiently simulate such systems would indeed require substantial modifications—for instance, coupling it with reaction-diffusion frameworks or spatial master equations. We believe this is an exciting direction for future research and mention it briefly in the discussion as a potential extension.

      Minor suggestions:

      (1) After Eq.(10): it would be useful to explain and motivate the choice of the ratio JSD/H.

      Done.

      (2) On page 6, just below the caption of Fig.4: it would be useful to clarify what is actually meant by "... convergence towards the steady-state distribution of the exact Gillespie simulation, which is obtained at a simulation time of 10^4".

      Done.

      (3) At the end of Section B on page 7: please clarify what is meant here by "soft directions".

      Done.

      Reviewer #2 (recommendations):

      We thank the reviewer for their thoughtful comments and constructive feedback. Below, we address each of the comments/suggestions.

      Main points:

      (1) Enumerate the conditions under which DGA assumptions hold (and when they do not). There is currently not enough information for the interested reader to know whether DGA would work for their system of interest. Without this information, it is difficult to assess what the true scope of DGA's impact will be. One simple idea would be to test DGA performance along two axes: (i) increasing number of model states and (ii) presence/absence of non-steady state dynamics. I acknowledge that these are very open-ended directions, but looking at even a single instance of each would greatly strengthen this work. Alternatively, if this is not feasible, then the authors should provide more discussion of the attendant difficulties in the main text.

      We agree that a detailed exploration of the conditions under which the DGA assumptions hold would be a valuable addition to the field. However, this paper primarily aims to introduce the DGA methodology and demonstrate its proof-of-concept applications. A comprehensive analysis along axes such as increasing model states or non-steady-state dynamics, while important, would require significant additional simulations and is beyond the scope of this work. In Appendix A, we have discussed the trade-off between accuracy and numerical stability. Additionally, we encourage future users to tune the hyperparameters a and b for their specific systems.

      (2) Demonstrate DGA performance in a more complex biochemical system. Clearly the authors were aware that analytic solutions exist for the 2-state system in Figure 7, but this is actually also the case (I think) for the mean mRNA production rate of the non-equilibrium system in Figure 8. To really demonstrate that DGA is practically viable, I encourage the authors to seek out an interesting application that is not analytically tractable.

      We appreciate the suggestion to validate DGA on a more complex biochemical system. However, the goal of this study is not to provide an exhaustive demonstration of all possible applications but to introduce the DGA and validate it in systems where ground-truth comparisons are available. While the non-equilibrium system in Figure 8 might be analytically tractable, its complexity already provides a meaningful demonstration of DGA’s ability to optimize parameters and design systems. Extending this work to analytically intractable systems is an exciting direction for future studies, and we hope this paper will inspire others to explore these applications.

      (3) Take steps to improve the robustness of parameter optimization and error bar calculations. (3a) When the loss landscape is degenerate, shallow, or otherwise "difficult," a common solution is to perform multiple (e.g. 25-100) inference runs starting from different random positions in parameter space. Doing this, and then taking the parameter set that minimizes the loss should, in theory, lead to a more robust recovery of the optimal parameter set.

      (3b) It seems clear that the Hessian approximation is underestimating the true error in your inference results. One alternative is to use a "brute force" approach like bootstrap resampling to get a better estimate for the statistical dispersion in parameter estimates. But I recognize that this is only viable if the inference is relatively fast. Simply recovering the true minimum will, of course, also help.

      (3a) We acknowledge the challenge posed by degenerate or shallow loss landscapes during parameter optimization. While performing multiple inference runs from different initializations is a common strategy, this approach is computationally intensive. Instead, we rely on standard optimization techniques (e.g., ADAM) to find a robust local minimum. 

      (3b) Thank you for your comment. We agree that Hessian-based error bars can underestimate uncertainty, particularly in degenerate or poorly conditioned loss landscapes. While methods like bootstrap and Monte Carlo can provide more robust estimates, they can be computationally prohibitive for larger-scale simulations. A simpler reason for not using them is the high resource demand from repeated simulations, which quickly becomes infeasible for complex or high-dimensional models. We note these trade-offs between robust estimation and practicality as an important area for further exploration.
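
      For reference, the bootstrap idea discussed in (3b) can be sketched as follows. The estimator here is a plain sample mean so that the example runs instantly; in the setting discussed above, each resample would instead require re-running the full inference, which is the source of the computational cost. The data values and function names are hypothetical.

```python
import random
import statistics

def sample_mean(xs):
    """Simple stand-in for a (much more expensive) parameter-inference routine."""
    return sum(xs) / len(xs)

def bootstrap_standard_error(data, estimator, n_resamples=1000, seed=0):
    """Standard deviation of an estimator over bootstrap resamples of data."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Resample with replacement, keeping the original sample size.
        resample = [rng.choice(data) for _ in data]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)

# Hypothetical mRNA copy-number measurements (illustration only).
counts = [3, 5, 4, 6, 2, 7, 5, 4, 6, 3, 5, 4]

se = bootstrap_standard_error(counts, sample_mean)
print(se)
```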

      Moderate comments:

      (1) Figure 7: is it possible to also show the inferred kon values? Specifically, it would be of interest to see how kon varies with repressor concentration.

      Thank you for the suggestion. We have updated Figure 7 to include the inferred kon values, showing their variation with the mean mRNA copy number. However, we could not plot them against repressor concentration due to the lack of available data.

      (2) Figure 8B & D: the authors claim that the sharper system dissipates more energy, but doesn't 8D show the opposite of this? More importantly, it does not look like either network drives sharpness levels that exceed the upper equilibrium limit cited in [36]. So it is not clear that it is appropriate to look at energy dissipation here. In fact, it is likely that equilibrium networks could produce the curves in 8B, and might be worth checking.

      Thank you for pointing this out. We realized that the plotted values in Figure 8D were incorrect, as we had mistakenly plotted noise instead of energy dissipation. The plot has now been corrected. 

      (3) Figure 8: I really like this idea of using DGA to "design" networks with desired input-output properties, but I wonder if you could explore a more biologically compelling use case. Specifically, what about some kind of switch-like logic where, as the activator concentration increases, you have first 0 genes on, then 1 promoter on, then 2 promoters on. This would achieve interesting regulatory logic, and having DGA try to produce step functions would ensure that you force the networks to be maximally sharp (i.e. about double what you're currently achieving).

      Thank you for this intriguing suggestion. While the proposed switch-like logic use case is indeed compelling, implementing such a system would require significant work. This goes beyond the scope of the current study, which focuses on demonstrating the feasibility of DGA for network design with simple input-output properties.

      Minor comments:

      (1) Figure 4B & C: the bar plots do not do a good job conveying the points made by the authors. Consider alternatives, such as scatter plots or box plots that could convey inference uncertainty.

      Done.

      (2) Figure 4B: consider using a log y-axis.

      The y-axis in Figure 4B is already plotted on a log scale.

      (3) Figure 4D is mentioned prior to 4C in the text. Consider reordering.

      Done. 

      (4) Figure 5B: it is difficult to assess from this plot whether or not the landscape is truly "flat," as the authors claim. Flat relative to what? Consider alternative ways to convey your point.

      Thank you for highlighting this ambiguity. By describing the loss landscape as “flat,” we intend to convey its relative insensitivity to parameter variations in certain regions, rather than implying a completely level surface. While we believe Figure 5B still provides a useful qualitative depiction of this behavior, we acknowledge that it does not quantitatively establish “flatness.” In future work, we plan to incorporate more rigorous measures—such as gradient magnitudes or Hessian eigenvalues—to more accurately characterize and communicate the geometry of the loss landscape.

      Reviewer #3 (recommendations):

      We sincerely thank the reviewer for their thoughtful feedback and constructive suggestions, which have helped us improve the clarity and rigor of our manuscript. Below, we address each of the comments.

      (1) Precision is lacking in the introduction section. Do the authors mean the Direct SSA, sorted SSA, which is usually faster, and how about rejection sampling methods?

      Thank you for pointing this out. We have updated the introduction to explicitly mention the Direct SSA.

      (2) When mentioning PyTorch and Jax, would be good to also talk about Julia, as they have fast stochastic simulators.

      We have now mentioned Julia alongside PyTorch and Jax.

      (3) Mentioned references 22-27. Reference 26 is an odd choice; a better reference, from the same author, is Automatic Differentiation of Programs with Discrete Randomness, G Arya, M Schauer, F Schäfer, C Rackauckas, Advances in Neural Information Processing Systems, NeurIPS 2022.

      We have now cited the suggested reference.

      (4) Page 1, Section: 'To circumnavigate these difficulties, the DGA modifies....' Have you thought about how you would deal with the bias that will be introduced by doing this?

      Thank you for your insightful comment. We acknowledge the potential for bias due to the differentiable approximations in the DGA; however, our analysis has not revealed any systematic bias compared to the exact Gillespie algorithm. Instead, we observe irregular deviations from the exact results as the smoothness of the approximations increases.

      (5) Page 2, first sentence '... traditional Gillespie...' be more precise here - the direct algorithm.

      Thank you for your comment. We believe that the context of the paper, particularly the schematic in Figure 1, makes it clear that we are focusing on the Direct SSA. 

      (6) Page 2, second paragraph: 'In order to simulate such a system...' This doesn't fit here as this section is about tau-leaping. As this approach approximates discrete operations, it is unclear if it would work for large models or snapshot data of larger scale, and if it would be possible to extend it to time-lapse data.

      Thank you for your comment. We respectfully disagree that this paragraph is misplaced. The purpose of this paragraph is to explain why the standard Gillespie algorithm does not use fixed time intervals for simulating stochastic processes. By highlighting the inefficiency of discretizing time into small intervals where reactions rarely occur, the paragraph provides necessary context for the Gillespie algorithm’s event-driven approach, which avoids this inefficiency.

      Regarding the applicability of the DGA to larger models, snapshot data, or time-lapse data, we acknowledge these are important directions and have noted them as potential extensions in the discussion section.

      (7) Page 2 Section B: 'In order to make use of modern deep-learning techniques...' It doesn't appear from the paper that any modern deep learning is used.

      Thank you for your comment. Although the DGA does not utilize deep learning architectures such as neural networks, it employs automatic differentiation techniques provided by frameworks like PyTorch and Jax. These tools allow efficient gradient computations, making the DGA compatible with modern optimization workflows.

      (8) Page 3, Fig 1(a). S matrix last row, B and C should swap places: B should be 1 and C is -1.

      Corrected the typo.

      (9) Fig1 needs a more detailed caption.

      Expanded the caption slightly for clarity.

      (10) Page 3 last paragraph: 'The hyperparameter b...' The consequences of this are relevant; for example, can we now go below zero? Also, we lose more efficient algorithms here. It would be good to discuss in more detail that this is an approximate algorithm that is good for our case study, but that more tests are needed for others to use it.

      Thank you for the comment. Appendix A discusses the trade-offs related to a and b, but we agree that more detailed analysis is needed. The hyperparameters are tailored to our case study and must be tuned for specific systems.

      (11) Page 4, Section C, first paragraph, 'The goal of making...' This is snapshot data. Would the framework also translate to time-lapse data? Also, it would be better to make it clearer earlier which type of data are the target of this study.

      Thank you for your suggestion. While the current study focuses on snapshot data and steady-state properties, we believe the DGA could be extended to handle time-lapse data by incorporating multiple recorded time points into its inference objective. Specifically, one could modify the loss function to penalize discrepancies across observed transitions between these time points, effectively capturing dynamic trajectories. We consider this an exciting area for future development, but it lies beyond our present scope.

      (12) Page 4 Section C, sentence '...experimentally measured moments'. Should later be mentioned as error, as moments are imperfect

      Thank you for your comment. We agree that experimentally measured moments are inherently noisy and may not perfectly represent the true system. However, within the context of the DGA, these moments serve as target quantities, and the discrepancy between simulated and measured moments is already accounted for in the loss function. 

      (13) Page 4 Section C, last sentence '...second-order...such as ADAM'. Another formulation would be better as second order can be confusing, especially in the context of parameter estimation

      We have revised the language to avoid confusion regarding “second-order” methods.

      (14) Fig 4(a) a density plot would fit better here

      Fig. 4(a) has been updated to a scatter density plot as suggested.

      (15) Fig 4(c) It would be interesting to see a closer analysis of the trade-off between gradient and accuracy when changing the a and b parameters.

      Thank you for this suggestion. We acknowledge that an in-depth exploration of these trade-offs could provide deeper insights into the method’s performance. However, for now, we believe the current analysis suffices to highlight the utility of the DGA in the contexts examined.

      (16) Page 6 Section III, first sentence: This fits more to intro. Further the reference list is severely lacking here, with no comparison to other methods for actually fitting stochastic models.

      Thank you for the suggestion. We have added a few references there.

      (17) Page 6, Section A, sentence: '....experimental measured mean...' Why is it a good measure here (moment matching is not perfect), also do you have distribution data, would that not be better? How about accounting for measurement error?

      Thank you for the comment. While we do not have full distribution data, we acknowledge that incorporating experimental measurement error could enhance the framework. A weighted loss function could model uncertainty explicitly, but this is beyond the scope of the current study. 
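
      One way such a weighted loss could look is sketched below, assuming independent measurement errors with known standard deviations, so that each squared residual is scaled by the inverse variance. The function name `weighted_loss` and all numbers are hypothetical.

```python
def weighted_loss(predicted, measured, sigma):
    """Inverse-variance weighted squared error (a chi-square-like loss)."""
    return sum(((p - m) / s) ** 2 for p, m, s in zip(predicted, measured, sigma))

# Hypothetical mean mRNA levels with per-point standard errors.
measured = [2.0, 5.0, 9.0]
sigma = [0.5, 1.0, 2.0]
predicted = [2.5, 5.0, 7.0]

value = weighted_loss(predicted, measured, sigma)
print(value)
```

      Points with larger measurement uncertainty contribute less to the loss, so a large residual at a noisy point is not over-weighted relative to a small residual at a precise one.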

      (18) Page 7, section B, first paragraph: 'Motivated by this, we defined the...' Why use Fisher information when profile likelihood has proven to be better, especially for systems with few parameters like this?

      Thank you for the suggestion. While profile-likelihood is indeed a powerful tool for parameter uncertainty analysis, we chose Fisher Information due to its computational efficiency and compatibility with the differentiable nature of the DGA framework.

      (19) Page 7, section C, sentence '...set kR/off=1..'. In this case, we cannot infer this parameter.

      Thank you for the comment. You are correct that setting kR/off = 1 effectively normalizes the rates, making this parameter unidentifiable. In steady-state analyses, not all parameters can be independently inferred because observable quantities depend on relative—rather than absolute—rate values (as evident when setting the time derivative to zero in the master equation). To infer all parameters, one would need additional information, such as time-series data or moments at finite time.
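
      The dependence of steady-state observables on rate ratios can be illustrated with a simple birth-death process, used here as a stand-in rather than the promoter model of the manuscript. For constant production and first-order degradation the stationary distribution is Poisson with mean equal to the ratio of the two rates, so rescaling both rates by a common factor leaves it unchanged. Function and parameter names are hypothetical.

```python
import math

def steady_state_distribution(k_prod, k_deg, n_max=20):
    """Steady-state copy-number distribution of a birth-death process.

    For constant production (k_prod) and first-order degradation (k_deg),
    the stationary distribution is Poisson with mean k_prod / k_deg.
    """
    mean = k_prod / k_deg
    return [math.exp(-mean) * mean ** n / math.factorial(n) for n in range(n_max + 1)]

p_slow = steady_state_distribution(k_prod=4.0, k_deg=1.0)
p_fast = steady_state_distribution(k_prod=40.0, k_deg=10.0)  # both rates x10

# The two parameter sets are indistinguishable from steady-state data alone.
max_difference = max(abs(a - b) for a, b in zip(p_slow, p_fast))
print(max_difference)
```

      Fixing one rate to 1 therefore sets the time unit; distinguishing the two parameter sets would require data that resolve the relaxation dynamics.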

      (20) Page 7 Section 2. Estimating parameters .... Sentence: '....as can be seen, there is very good agreement..' How many times does the true value fall within the CI (because corr 0.68 is not great)?

      Thank you for your comment. While a correlation coefficient of 0.68 indicates moderate agreement, the primary goal was to demonstrate the feasibility of parameter estimation using the DGA rather than achieving perfect accuracy. The coverage of the CI was not explicitly calculated, as the focus was on the overall trends and relative agreement.

      (21) Page 7 Section 2. Estimating parameters .... Sentence: 'Fig5(c) shows....' Is this when using exact simulator?

      Thank you for your question. Yes, the exact values on the x-axis of Fig. 5(c) are obtained using the exact Gillespie simulation.

      (22) Page 7 Section 3 Estimating parameters for the... Sentence: 'Fig6(a) shows...' Why are CIs not shown?

      Thank you for your comment. CIs are not shown in Fig. 6(a) because this particular case is degenerate, making the calculation and meaningful representation of CIs challenging. 

      (23) Page 10, Sentence: 'As can be seen in Fig 7(b)...' Can you show uncertainty in measured value? It would be good to see something of a comparison against an exact method, at least on simulated synthetic data

      Thank you for the comment. Fig. 7(a) already includes error bars for the experimental data, which account for measurement uncertainty. However, in Fig. 7(b), we do not include error bars for the experimental values due to limitations in the available data.

      (24) Page 12, Section B Loss function '...n=600...' This is on a lower range. Have you tested with n=1000?

      Yes, we have tested with n=1000 and observed no significant difference in the results. This indicates that n=600 is sufficient for the purposes of this study. 

      (25) Fig 8(c): why are no CIs shown?

      Thank you for your comment. CIs were not included in Fig. 8(c) due to degeneracy, which makes meaningful confidence intervals difficult to compute.

      (26) Page 12 Conclusion, sentence: '..gradients via backpropagation...' Actually, by making the function continuous, both forward and reverse mode might be used. And in this case, forward-mode would actually be the fastest by quite a margin

      Thank you for your insightful comment. You are correct that by making the function continuous, both forward-mode and reverse-mode automatic differentiation can be used. We have now mentioned this point in the discussion.
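
      As an illustration of forward-mode differentiation, the sketch below uses dual numbers, which carry a value and its derivative through the computation in a single pass per input parameter. It is a toy example and not the DGA code: the `Dual` class, the `sigmoid` helper, and the test function are assumptions made for illustration.

```python
import math


class Dual:
    """A value together with its derivative with respect to one input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule.
        return Dual(
            self.value * other.value,
            self.deriv * other.value + self.value * other.deriv,
        )

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sigmoid(x):
    """Smooth replacement for a step function, with its chain-rule derivative."""
    s = 1.0 / (1.0 + math.exp(-x.value))
    return Dual(s, s * (1.0 - s) * x.deriv)


def smooth_rate(k):
    """Toy differentiable function of a single parameter k."""
    return k * sigmoid(3 * k)


# Seed the input with derivative 1 to obtain d(smooth_rate)/dk in one pass.
result = smooth_rate(Dual(0.5, 1.0))
print(result.value, result.deriv)
```

      Forward mode needs one such pass per input parameter, whereas reverse mode needs one backward pass per output, so forward mode tends to be attractive when the number of parameters is small.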

      (27) Overall comment for the Conclusion section: It would be good to discuss how this framework compares to other model-fitting frameworks for models with stochastic dynamics. The authors mention dynamic data and more discussion on this would be very welcomed. Why use ADAM and not something established like BFGS for model fitting? It would be interesting to discuss how this can fit with other SSA algorithms (e.g. in practice sorting SSA is used when models get larger). Also, inference comparison against exact approaches would be very nice. As it is now, the authors truly only check the accuracy of the SSA on 1 model -it would be interesting to see for other models.

      Thank you for your detailed comments. While this study focuses on introducing the DGA and demonstrating its feasibility, we agree that comparisons with other model-fitting frameworks, testing on additional models, and integrating with other SSA variants like sorted SSA are important directions for future work. Similarly, extending the DGA to handle transient dynamics and exploring alternatives to ADAM, such as BFGS, are promising areas to investigate further.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We are grateful for the positive evaluation of the work and the critical points raised by the reviewers. We thank all reviewers for their excellent comments. We believe that these revisions have significantly improved the quality of our study.

      In response to the 2nd reviewer, we apologise for the missing data: we failed to provide a P-value for the RM ANOVA post-hoc test, and we are very grateful that this was brought to our attention. We have revised the RM ANOVA by using the Tukey HSD post-hoc test, which is generally recommended for pairwise comparisons as it is more robust to unequal sample sizes. The controversial statistical analysis of the overall comparison of speed differences was deleted, as were three supplementary figures (Fig. S4, Fig. S9 and Fig. S10), which were less informative in support of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study is useful as it provides further analysis of previously published data to address which specific genes are part of the masculinizing actions of E2 on female zebra finches, and where these key genes are expressed in the brain. However the data supporting the conclusion of masculinizing the song system are incomplete as the current manuscript is a re-analysis of differential gene expression modulated by E2 treatment between male/female zebra finches without manipulation of gene expression. The conclusions (and title) regarding song learning are also incompletely supported with no gene manipulation or song analysis. Importantly, the use of WGCNA for a question of sex-chromosome expression in species without dosage compensation is considered inadequate. As the experimental design did not include groups to directly test for song learning, and there was also no analysis of song performance, these data were also considered inadequate in that regard.

      We are sorry the editor felt the manuscript so incomplete and inadequate. Though the tone of this assessment seems more severe than the below reviewer comments, we are also happy to see that the editor has considered our paper further for a revised publication, based on the reviewer’s comments. We address the editor’s comments as follows:

      While we agree that manipulation of some of the genes we discovered, whose expression levels are E2-sensitive in the song system, would take the study further in validating some proposed hypothesis in the discussion of the paper, we don’t think the outcome of gene manipulations would change the major conclusions from the results of the paper. In this study we performed estrogen hormone manipulations, with causal consequences on gene expression in song nuclei and associated song behavior. In a way this is analogous to gene manipulations, but manipulating directly the action of estrogen. The categories of genes impacted, and the differences among the sex chromosomes wouldn’t change.

      For the comment on WGCNA being inadequate for addressing questions on sex chromosome expression in species without dosage compensation, we think the evidence in our data does not bear that out. One main result of this paper is the separation of Z chromosome transcripts whose expression is most strongly regulated by chromosomal dosage (WGCNA module E) across regions from those subject to additional sources of regulation in song nuclei (other modules). It seems to us that rather than being confounded by the lack of dosage compensation, WGCNA allowed us to better resolve the effects of dosage on different genes within the sex chromosomes. We have added a new figure more directly examining sex chromosome transcript abundance within different modules. Briefly, we found that Z chromosome genes assigned to module E exhibited almost exactly the male-biased expression ratio expected from no dosage compensation, while the Z chromosome genes in song nuclei assigned to other modules were expressed below the dosage-predicted value, consistent with module E containing those genes whose expression is most strongly regulated by dose across all brain regions sampled.

      At its core, WGCNA finds sets of correlated genes. The biological reality of the zebra finch transcriptome is that Z chromosome expression is largely anti-correlated with W chromosome due to dosage. However, this dosage effect is not felt equally by all genes and WGCNA provides an unbiased computational framework which can be used to separate dose from other potential sources of gene regulation. This is why roughly ⅓ of Z chromosome genes are not assigned to module E; for example the growth hormone receptor is assigned to module G based on its correlation with genes upregulated within HVC.

      “As the experimental design did not include groups to directly test for song learning, and there was also no analysis of song performance, these data were also considered inadequate in that regard.”

      Concerning the comment on the absence of song performance analysis in the paper, all such analyses were conducted in our previous study on the same animals (Choe et al. 2021, Hormones & Behavior). The birds considered here were sacrificed at PHD30, prior to the onset of learned song behavior. However, females treated with E2 in the same way at the same time and allowed to mature into adulthood went on to develop rudimentary song. Further, induction of rudimentary song learning in females following E2 treatment has been well established since the early ‘80s. We have added the following text toward the end of the intro to make this clearer:

      “While the birds for this study were sacrificed prior to the developmental presentation of song behavior, we have previously shown that female finches treated in exactly the same way with E2 go on to produce rudimentary imitative songs as adults (Choe et al 2021), consistent with the known induction of vocal learning in females by E2 (REF).”

      Reviewer #1 (Recommendations For The Authors):

      Overall, this is a wonderfully designed and executed study that takes full advantage of new resources, such as the most complete zebra finch genome assembly yet, as well as the latest methods. I have very few suggestions as to the improvement of the manuscript. They are as follows:

      Results Section:

      In the paragraph "Identification of gene expression modules in song nuclei":

      "The E2-treated females in this study had similarly sized song system nuclei as males, indicating that E2 treatment prevented atrophy."

      Clarify if this comparison is to treated and/or untreated males.

      We thank the reviewer for their comment. The relative differences in the song nuclei sizes between the E2-treated females and the other groups are more complex than our original sentence implied. We have revised the main text as follows:

      “In our previous study, we found that estradiol treatment in PHD30 females caused HVC to enlarge and Area X to appear when it normally does not develop in females, but both at sizes less than in untreated or treated males. The sizes of PHD30 female LMAN and RA were already the sizes seen in males, as the latter has not atrophied yet at this age (25).”

      In the paragraph "Sex- and micro-chromosome gene expression across the telencephalon": "These animal and chromosome specific shifts in the transcriptomes could represent the systemic effects of allelic chromosomal structural variation..."

      The authors should clarify the meaning of "allelic chromosomal structural variation" in this context, as it is an unusual phrase. Major chromosomal structural variation seems unlikely to produce these effects. Is it also possible that animal-specific modules with brain-wide higher expression could also result from laboratory contamination between all samples from one animal? This is not too likely but perhaps should be acknowledged or ruled out.

      We have removed the word allelic, which was unnecessary. We can’t envision how laboratory contamination could occur such that all of one animal’s samples would be affected to produce the observed result, which is module and chromosome specific. An animal-wide effect could emerge during sacrifice, but we can think of no reason that would affect these modules and not others. Rather, the most likely explanation is natural biological differences between animals. We have added this consideration of alternative explanations.

      In the section "Candidate gene drivers of HVC specialization in E2-treated females":

      When discussing GHR's role in cell growth and proliferation, the authors' argument could be expanded by including the documented role of GH signaling in anti-apoptotic protection of neurons from rounds of neural pruning during development as documented in the chicken, e.g. • Harvey S, Baudet M-L, Sanders EJ. 2009. Growth Hormone-induced Neuroprotection in the Neural Retina during Chick Embryogenesis. Annals of the New York Academy of Sciences, 1163: 414-416. https://doi.org/10.1111/j.1749-6632.2008.03641.x

      We thank the reviewer for sharing this publication with us. We have added the following sentence to our discussion with the above citation. “Further, our results are consistent with growth hormone’s known role in avian anti-apoptotic protection, with elevated signaling associated with the survival of chicken neurons during rounds of pruning in the developing retina.”

      The authors' argument of the relevance of the passerine GH duplication would be strengthened by citing:

      • Rasband SA, Bolton PE, Fang Q, Johnson PLF, Braun MJ. 2023. Evolution of the Growth Hormone Gene Duplication in Passerine Birds, Genome Biol Evol, 15(3) https://doi.org/10.1093/gbe/evad033. Greatly expands on the Yuri et al. paper cited by characterizing the molecular evolution of these genes across hundreds of avian species, supporting positive selection on multiple amino acid sites identified in both ancestral and duplicate (passerine) growth hormone.

      • Xie F, London SE, Southey BR et al. 2010. The zebra finch neuropeptidome: prediction, detection and expression. BMC Biol 8, 28. https://doi.org/10.1186/1741-7007-8-28 The authors report significantly different expression of the ancestral GH gene in the adult male zebra finch auditory forebrain after different song exposure experiences.

      We have amended the results section sentence and added all suggested citations. The sentence now reads: “The gene which encodes growth hormone receptor’s ligand, growth hormone, is interestingly duplicated and undergoing accelerated evolution in the genomes of songbirds (Rasband et al 2023); the GH ligand has been found to be upregulated in the zebra finch auditory forebrain following the presentation of familiar song (Xie et al 2010).”

      Figures:

      - Figure 1B. "Duration of sex typing" being a shorter bar compared to the others is not fully explained in the experimental design. Presumably at the end of this time period, the sex is non-invasively, phenotypically evident. I suggest an arrow pointing to the PHD/PHD range when sex is apparent in plumage/anatomy.

      - Figure 4. Caption appears to be truncated; "across all... genes"?

      Fixed

      - Figure 5. For 5E, 5F, 5G, 5H, consider enlarging the plots so overlapping gene symbols are readable. Alternately, smaller numbers or symbols could be used with a key in areas where overlapping symbols are hard to prevent.

      We agree that these are not the easiest to read; we originally offset the symbols in R to minimize overlaps, but it can only do so much for the more crammed panels. We have now added a supplemental .xlsx file with the underlying data from each of the 4 tests for readers that want to examine the data in more detail.

      Reviewer #2 (Recommendations For The Authors):

      Since WGCNA methods will inherently draw together sex-chromosome genes into the same module in systems without dosage compensation, I suggest the authors rerun the WGCNA using only female samples and only male samples. Then identify the composition of modules that differ between E2 and vehicle-treated females and compare these genes to males. Then from male WGCNA identify the composition of modules that differ between E2 and vehicle-treated males and compare to female modules.

      We thank the reviewer for their suggestions. However, we believe it is not as strong as the approach we used, which is grouping data from both sexes in the WGCNA analyses in a study that is looking for sex differences. The reviewer's proposed approach amounts to computing modules twice (once per sex), determining song system specialized modules and E2 responsive modules in both settings, then intersecting the two sets to find corresponding modules, all done to prevent the non-dose compensated sex chromosome genes from being drawn into the same module.

      While WGCNA does group the majority of sex chromosome genes into module E, it does not categorize them all this way (Fig 3). The module classification instead differentiates those sex chromosome genes whose expression is most explained by chromosome dosage / sex across regions (modE) from those whose expression is controlled by other sources of regulation; for an example of the latter, the growth hormone receptor (GHR) is one of several Z chromosome genes classified into modG, as its expression better correlates with the genes specialized to HVC than it does with the majority of dosage-dependent Z chromosome genes found in modE. Further, to remove biological sex as a variable in a WGCNA analysis that is focused on sex differences seems counterintuitive.

      Instead, to quantitatively address the reviewer’s concern, we conducted additional analyses that led to a new figure, associated text, and tables, which better describe sex/chromosome dosage effects on the abundance (FPKM) and expression ratios of sex chromosome transcripts by module irrespective of brain region (Fig. 5). We find that the Z chromosome genes in modE were expressed at the expected chromosome dosage in the non-vocal surrounding regions (65.06% observed vs 66.6% expected), while in other modules, other Z chromosome genes were expressed at intermediate levels between equal expression and the expected chromosomal dosage. For example, the Z chromosome content of modules D and H exhibited near equal expression between sexes. Within the song system, Z chromosome gene content of modG was highly expressed in males beyond what is expected from chromosome dosage, consistent with modG’s male-specific upregulation in song nuclei relative to surrounds in the absence of E2. These results better demonstrate that in our WGCNA on the combined dataset we are able to separate those Z chromosome genes whose expression is predominantly dosage controlled from those subject to additional regulation such as song system specialization.
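
      The expected value quoted above can be reproduced with simple arithmetic. The sketch below assumes that the percentage is the male share of combined male and female expression, with two Z copies in males (ZZ) and one in females (ZW) and no dosage compensation. The function names and the FPKM values are hypothetical and are not data from the study.

```python
def expected_male_fraction(male_z_copies=2, female_z_copies=1):
    """Male share of Z-linked expression if expression scales with copy number."""
    return male_z_copies / (male_z_copies + female_z_copies)


def observed_male_fraction(male_fpkm, female_fpkm):
    """Male share of total expression computed from mean FPKM per sex."""
    total = male_fpkm + female_fpkm
    if total <= 0:
        raise ValueError("total expression must be positive")
    return male_fpkm / total


expected = expected_male_fraction()           # 2 / 3, i.e. about 66.7%
observed = observed_male_fraction(13.0, 7.0)  # hypothetical FPKM values
print(f"expected {expected:.1%}, observed {observed:.1%}")
```

      Under this definition, a value of 50% corresponds to equal expression between the sexes, and values between 50% and the expected share correspond to the intermediate levels described for the other modules.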

      Fig. S3 Legend: 'Black arrow' -> 'Red arrow'

      Change made.

      Fig. S5 - What part of the figure shows the 'human convergent signature'? Also, simply listing the number of genes mapped to a chromosome is misleading to readers unfamiliar with the zebra finch genome, you should either provide the number of genes on each chromosome or present as corrected by that number.

      Fig. S5 was the same type of analysis as in Fig. 3 but with an older zebra finch genome assembly, where we had not included panel a for enrichments with genes convergent in expression between songbird song regions and human speech brain regions. However, we see that Fig. S5 was not adding any important new information to the paper, so we removed it.

      For the chromosome analyses in Fig. 3b, we provide both the total raw number of module assigned genes broken down by chromosome (The black bar plots on the right) as well as a statistical fold-enrichment value of modules per chromosome. Given the number of genes per chromosome and genes per module in our data, we computed the fold-enrichment for each intersection (observed intersection size / expected intersection size). To test for the significance of these enrichments, we bootstrapped FDR corrected p values for the enrichment of each chromosome-module pairing by randomizing the mapping of genes to modules to construct a null distribution of fold enrichments for each intersection. Our intent was not to describe the size of the chromosomes themselves, information readily available elsewhere, but to show the disproportionate chromosomal origins of the gene sets considered by this study. Performing this enrichment test using all annotated genes per chromosome would artificially increase enrichment values and make the analysis less conservative by confounding the results with the inherent enrichment for “brain function” in the assigned genes relative to all genes.
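
      The enrichment procedure described above can be sketched as follows. This is a minimal illustration and not the analysis code: the gene-to-chromosome and gene-to-module dictionaries, the function names, and the toy data are assumptions, and the FDR correction across chromosome-module pairs is omitted for brevity.

```python
import random


def fold_enrichment(gene_chrom, gene_module, chrom, module):
    """Observed / expected number of genes shared by a chromosome and a module."""
    n_genes = len(gene_chrom)
    n_chrom = sum(1 for g in gene_chrom if gene_chrom[g] == chrom)
    n_module = sum(1 for g in gene_chrom if gene_module[g] == module)
    observed = sum(
        1 for g in gene_chrom
        if gene_chrom[g] == chrom and gene_module[g] == module
    )
    expected = n_chrom * n_module / n_genes
    if expected == 0:
        raise ValueError("chromosome or module has no assigned genes")
    return observed / expected


def permutation_p_value(gene_chrom, gene_module, chrom, module,
                        n_perm=2000, seed=0):
    """One-sided p value from randomising the gene-to-module mapping."""
    rng = random.Random(seed)
    genes = list(gene_chrom)
    labels = [gene_module[g] for g in genes]
    observed = fold_enrichment(gene_chrom, gene_module, chrom, module)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps module sizes, breaks the gene-module link
        shuffled = dict(zip(genes, labels))
        if fold_enrichment(gene_chrom, shuffled, chrom, module) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy data: 12 module-assigned genes, 4 on chromosome Z, all of them in module E.
gene_chrom = {f"g{i}": ("Z" if i < 4 else "1") for i in range(12)}
gene_module = {f"g{i}": ("E" if i < 4 else "G") for i in range(12)}

fe = fold_enrichment(gene_chrom, gene_module, "Z", "E")
p_value = permutation_p_value(gene_chrom, gene_module, "Z", "E")
print(fe, p_value)
```

      Because the background is restricted to module-assigned genes, the expected intersection size reflects only the gene sets considered in the study rather than all annotated genes on each chromosome.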

      At several places you say "we correlated expression of each sex chromosome transcript with sexual dimorphism within each region, such that expressed W genes would be positively correlated and depleted Z chromosome genes would be anticorrelated." What was the sexual dimorphism that was being correlated with? Is this the eigengene?

      We thank you for this comment. Our language was less clear than it could be. We tested for correlations of both the eigengene and the individual gene expression profiles with the biological sex of the animals. We have changed the text to:

      “To do this, we tested for a correlation between the expression of each sex chromosome transcript to the animals’ sex within each brain region. We found that female-enriched transcripts were positively correlated with sex and male-enriched transcripts were anticorrelated (Fig. 4f,g).”

      Fig. 4A: The 'true/false' boxes and animal A-L is confusing and unnecessary. I'd suggest just using M and F (or sex symbols) with a horizontal line below each set of 3 for respective E2 and Veh.

      Change made.

      Reviewer #3 (Recommendations For The Authors):

      General comments:

      After the initial characterization of the datasets and module identification, it is quite hard to follow the logic of the data presentation in the various other Results sections or to clearly understand how they relate to the main stated goal to identify factors related to sex differences in vocal learning. The most relevant findings relate to the presumed actions of hormone treatment and sex chromosome gene dosage in song nuclei, whereas analyses of other brain areas, other chromosomes, or speech-related genes serve more as controls and/or appear as distractions from the main theme. A suggestion to increase the clarity of the presentation and potential impact of the study is to change the order of the presentation, focusing first on the specific analyses and comparisons that most directly speak to the main goals of the study, and then secondarily and more briefly presenting the controls or less related comparisons.

      The reviewer’s suggestion for the results section organization is exactly what we had tried to do. We opened with the first paragraph on identification of modules, then presented the song nuclei specific modules, followed by E2-induced changes to those modules, and then by other specific results for the remainder of the paper, including module enrichments to specific chromosomes. The reviewer mentioned that our analyses of “other brain areas” (which we assume to mean the non-vocal surround regions), other chromosomes (which we assume means autosomes) and speech-related genes as controls were a distraction in the paper; but within our analysis, these other brain regions are essential controls needed to assess the song-system specificity of any sex differences observed from the very first paragraphs of the results; the autosomes were not controls for sex chromosome results, but primary results in and of themselves; the overlap with speech-related genes was also not a control, but a novel discovery. We have revised these points in the paper to make them clearer, and revised some of the section titles and transitions between sections to help increase clarity of the main storyline of the paper.

      A related comment is that many of the inferences drawn from the WGCNA analysis were quite complex, thus independent verification of some predictions would be quite valuable. For example, consider the passage: "In non-vocal learning juvenile females, interestingly LMAN was specialized relative to the AN by the same gene modules as in males (B, F, and I) as well as an additional module G (Fig. 2b); RA was specialized by module A as in males, but not module L and by additional modules A and G. In contrast, neither juvenile female HVC nor Area X exhibited significant gene module expression specializations relative to their surrounds." Providing in situ hybridization verification of these regional gene expression predictions with a few representative genes seems quite feasible given the group's expertise and would considerably strengthen confidence in the module-based inferences.

      We performed in-situ independent validation of 36 candidate genes in our first study with this dataset (Choe et al 2021). We now mention this validation in the revised paper. The reviewer’s selection of one of our sentences though made us realize that our grammar used to explain the results was not as clear as it needs to be. We thus cleaned up the grammar of our module descriptions so that it should be communicated with less complexity, the main issue noted by the reviewer.

      Because this is a re-analysis of a previously published dataset, the authors should more explicitly describe somewhere in the Discussion how the present analysis advances the understanding of sex differences in songbird neuroanatomy and behavior beyond the previous analysis.

      We have added an additional sentence into the discussion more clearly separating the results of the current study from our previous work.

      Specific comments:

      Abstract:

      There is evidence (from Frank Johnson's lab) that RA does not completely atrophy in female zebra finches, but is still present with more preserved connectivity than previously thought, possibly related to non-singing function(s). A term like 'marked reduction' of female RA may more accurately reflect the current state of knowledge.

      We have changed the text to “partial atrophy”.

      The term "driver" is undefined and unclear at this point of the paper; a clear definition for "driver" is also lacking in the Intro.

      We now define “driver” or “genetic driver” as understood to mean “a genetic locus whose expression and/or inheritance strongly regulates the trait of interest”.

      When citing the literature on studies that identified "specific genes with specialized up- or down-regulated expression in song and speech circuits relative to the surrounding motor control circuits", the authors should also cite studies from other labs (e.g. Li et al., PNAS, 2007; Lovell et al, Plos One 2008; Lovell et al, BMC Genomics 2018; Nevue et al, Sci Rep. 2020), to be accurate and fair.

      Citations added

      For clarity, the authors should explicitly formulate the hypothesis they are proposing at the end of the Summary.

      We thank the reviewer for this comment. We have replaced the final sentence of the summary with: “We present a hypothesis where reduced dosage and expression of these Z chromosome genes changes the developmental trajectory of female HVC, partially preventable by estrogen treatment, contributing to the loss of song learning behavior.”

      Introduction:

      Vocal learning is arguably the ability to imitate 'vocal' sounds, this could be clarified here.

      We have amended the sentence to “Vocal learning is the ability to imitate heard sounds using a vocal organ…”

      Given they are currently considered sister taxa, can the author briefly explain what is the basis for assuming that songbirds and parrots independently evolved vocal learning?

      Although songbirds and parrots belong to a monophyletic clade, they are not sister taxa. There are two clades separating them that are vocal non-learners. We have cited the reference that demonstrated this (e.g. Jarvis et al 2014 Science).

      Why use Taeniopygia castanotis rather than the more broadly used Taeniopygia guttata?

      Zebra finches were recently reclassified and T. castanotis is now more accurate. The Indonesian Timor zebra finch retained T. guttata while the Australian finch, used here, was classified as T. castanotis.

      The authors state: "...vocal learning is strongly sexually dimorphic in zebra finches and many other vocal learning species" and cite Nottebohm and Arnold, Science, 1978. That landmark paper only shows dimorphism in song nuclei (not learning) in two songbird species. The authors should provide citations for other species and behavior, or modify the statement.

      We have added an additional citation (Odom et al.) to this sentence which covers the phylogeny more broadly.

      The authors refer to the nucleus RA as being located in the lateral intermediate arcopallium (LAI). Other labs have described this domain as the dorsal part of the intermediate arcopallium, thus AId or AID (Mello et al., JCN, 2019; Yuan and Bottjer, J Neurophys 2019; Yuan and Bottjer, eNeuro, 2020; Nevue et al., BCM Genomics, 2020). The authors should acknowledge this discrepancy in nomenclature so that data and conclusions can be more readily compared across studies.

      We thank the reviewer and agree that this is helpful. We have added a note at the first mention of LAI.

      The authors state that data from the gynandromorph bird described by Agate et al implicates "sex chromosome gene expression within the song system" as involved in the song system sexual dimorphism. That study, however, only rules out circulating gonadal steroids, and while suggesting a cell-autonomous mechanism like sex chromosome genes, it does not necessarily exclude other brain-autonomous factors like sex differences in local production of sex steroids.

      We say that this study “implicated” sex chromosome gene expression, which is accurate per the results and discussion of that study. We are unsure what “brain autonomous factors like sex differences in local production of sex steroids” means. “Brain autonomous” and “local production” in the brain seem contradictory in this context.

      Results:

      The authors state that "the E2-treated females in this study had similarly sized song system nuclei as males, indicating that E2 treatment prevented atrophy". Can they clarify whether the VEH-treated females actually had smaller RAs than E2-treated females or VEH-treated males at this age? This is still quite early in development and it is unclear to what extent RA's marked sexual dimorphism in adults or later developmental ages has already taken place in untreated (or VEH-treated) birds. A related comment is that the authors state later on: "We interpret these findings to indicate that: LMAN and RA atrophy later in juvenile female development..." Does this mean these nuclei actually did not show the marked decreases predicted earlier in the text? Clarifying this point would be helpful.

      We thank the reviewer for pointing out this discrepancy, for which reviewer #1 asked for clarification as well. RA size at this age is similar in males and females. However, HVC and Area X are smaller and absent, respectively, in females, and E2 treatment partially prevents this atrophy. The text now reads:

      “In our previous study, we found that estradiol treatment in PHD30 females caused HVC to enlarge and Area X to appear when it normally does not develop in females, but both at sizes less than in untreated or treated males. The sizes of PHD30 female LMAN and RA were already the sizes seen in males, as the latter has not atrophied yet at this age (25).”

      The authors acknowledge that area X is absent in untreated and VEH-treated females. Could they please clarify how area X and the surrounding striatal tissue that excludes area X were identified for laser capture dissections in juvenile females?

      We have added the following statement to the main text portion discussing the dissections.

      “In the case of vehicle-treated females which lack Area X, a piece of striatum from the same location where Area X is found in males was taken.”

      Some passages in Results discussing the authors' interpretation of the modules seem quite speculative and possibly belong instead in the Discussion. For example: "... that module A and G genes could be associated with the start of this atrophy; HVC and Area X are likely the first to atrophy or not develop; and lack of any gene module specialization in them at this age could mean that they would be more sensitive to estrogen prevention of vocal learning loss."

      As suggested, we have removed this text from the results; these ideas were already presented in the Discussion. We have merged the resulting small paragraph with the preceding paragraph.

      The authors state: "To assess the effects of chronic exogenous estrogen on the developing song system, we first performed a control analysis of modules in the E2-treated juvenile males." How can an assessment of estrogen effects be a "control" analysis? Does this refer to a contrast with females? Please clarify the language here.

      The reviewer is correct that E2 treatment in males should not be considered a control experiment. We removed the word “control”.

      When discussing the GO-enriched terms for module G, it is unclear how the authors reached the conclusion about "proliferative", as the enriched terms do not refer to processes more directly indicative of proliferation like "cell division" or "cell cycle regulation". Rather, these terms seem more related to differentiation and growth, which do not necessarily imply proliferation. The authors also refer to "HVC proliferation" later on in the Discussion. However, there is conclusive evidence from several labs that proliferative events associated with postnatal neuronal addition and/or replacement in song nuclei occur in the subventricular zone, not in song nuclei like HVC itself, and that the growth of song nuclei largely reflects cell survival, as well as growth in size and complexity under the regulation of sex steroids.

      We agree that “proliferative” may have been a poor word choice here. We did not mean to indicate that cell division was occurring in HVC itself. Instead, we meant to indicate that HVC is able to accommodate the newborn neurons from the SVZ. We have replaced the word “proliferative” throughout. In the instance the reviewer mentions specifically, we replaced it with “...potentially act to integrate and differentiate late born neurons.”

      With regard to module E, referring to a telencephalon-wide sexually dimorphic gene expression program seems quite a stretch, given that only a few regions were sampled and compared between sexes. These related statements should be toned down.

      We have replaced “telencephalon-wide” with “more distributed across the finch telencephalon” and other similar language in each instance.

      The following passage is very speculative and should be shortened and/or moved to the Discussion: "Based on the findings in these gene sets, we hypothesize that without excess estrogen in females, HVC expansion is prevented by not specializing the growth and neuronal migration promoting genes in module G to the HVC lineage by late development. This is potentially enacted by depleting necessary gene products from the Z sex chromosome, such as GHR, which are already present in only one copy."

      We have deleted this portion of the text, as the idea is already present in the discussion.

      Figure 5: To this reviewer, the comparisons of sex differences and of female response to E2 are the most relevant and informative ones, whereas the regional differences between song nuclei and surrounds refer to different cell populations and cell types where other processes may be occurring, independently of what occurs in song nuclei. It thus seems like the intersection analysis in panel 5i may be subtracting out important "core genes" in terms of E2 effects and/or sex differences in the most relevant cell populations, i.e. in this case within song nucleus HVC.

      Song learning is a specialized behavior, and the associated vocal learning nuclei have a set of hundreds of specialized genes compared to the surrounds. Our previous findings show that E2 drives the appearance of these specializations in female zebra finches. Thus, we considered this the most interesting question to focus on, which we have further highlighted. Nevertheless, in response to the reviewer’s suggestion, we have added a .xlsx supplemental file containing the results from each of the individual tests so readers may examine any single comparison, or set of comparisons, in more detail.

      Discussion:

      It is unclear what the term "critical period" refers to in: "during the critical period of atrophy for the female vocal circuit"; please clarify.

      We agree that our language was nebulous. We have replaced it with “as several male song control nuclei begin to expand and female nuclei partially atrophy”

      In: "HVC appeared unspecialized at the level of gene module expression in control females", does "unspecialized" refer to a lack of difference in gene expression when compared to surroundings? Please clarify. The same comment applies to other uses of "unspecialized" in this paragraph.

      Yes, unspecialized means lack of difference in gene expression in the song nucleus. To clarify this point, we have reworked that and the following sentence as follows:

      “HVC appeared unspecialized compared to the surrounding nidopallium at the level of gene module expression in control females, with no significantly differentially expressed MEGs. However, in E2-treated females, HVC exhibited a subset of the observed male HVC gene expression specializations. Similarly, the vehicle-treated female striatum located where Area X would be also lacked any specialized gene module expression, but the E2-treated female Area X exhibited a subset of the male Area X specializations, consistent with the known absence of Area X in vehicle-treated females and presence in E2-treated females.”

      The authors state: "...we surprisingly found that the most specialized genes were disproportionately from the Z chromosome", when discussing module G in HVC. Why is this so surprising? In a sense, this could be taken as consistent with the findings of Friedrich et al, 2022, where sex differences in the RA transcriptome were predominantly Z related on 20 dph. Arguably 20 dph is still quite close to 30 dph in the present study, when compared to 50 dph in Friedrich et al, when autosomes predominate.

      Our bioRxiv preprint was originally posted in July 2021, prior to the publication of Friedrich et al., 2022; however, we had previously added to our discussion that several of our results are consistent with the observations of Friedrich et al.

      We have a different interpretation of the Z chromosome gene results in Friedrich et al. While the percentage of specialized genes from the Z chromosome decreased, the absolute number of specialized Z chromosome genes actually increased over this interval. In Fig. 3a from Friedrich et al. it appears that ~28% of Z chromosome genes were sexually dimorphic in their expression in RA at PHD20 but that ~39% of Z chromosome genes were similarly dimorphic at PHD50. We interpret this result as the Z chromosome genes being among the earliest genes differentially expressed between the sexes, not that their differential expression or role ever subsequently decreased. We have reworked this portion of the discussion to make our point clearer:

      “This model of sex chromosome influenced song system development is consistent with recent observations comparing male and female zebra finch transcriptomes from RA at young juvenile (PHD20) and young adult (PHD50) ages in un-manipulated birds (Friedrich et al. 2022)57. While that study proposes that the role of the sex chromosome in maintaining transcriptomic sex differences diminishes across development, as the proportion of specialized genes that originate on the sex chromosomes diminishes, this effect was driven by large increases in differentially expressed autosomal genes rather than by any reduction in sex chromosome dimorphism; the percentage of differentially expressed Z chromosome genes increased from PHD20 (28%) to PHD50 (39%) (Friedrich et al.). This leads us to conclude that sexually dimorphic Z chromosome expression at juvenile ages precedes the sexually dimorphic expression of the autosomes seen in adults. This is consistent with our hypothesis that sufficient expression of select Z chromosome gene products (GHR, etc.) is necessary for subsequent autosomal song system specializations (modG).”

      Further, when we write “When examining the module G HVC specialization induced by E2-treatment in female HVC, we surprisingly found that the most specialized genes were disproportionately from the Z chromosome”, we are referring to the upregulation of module G by E2 in female HVC, not the sex difference described in RA by Friedrich et al., which only utilized untreated RA samples and thus is more likely related to our observations of module E.

      The term "sexual dimorphism" has been more traditionally used for sex differences that are very marked, like features that are highly regressed or absent in one sex, most often in females. Quantitative differences in gene expression, including dosage differences like those related to module E, are more appropriately described as sex differences rather than dimorphisms. That usage would be more consistent with most of the literature, and thus preferable.

      We did a Google search for common definitions and found rather the opposite: sexual dimorphism is more often used for differences of degree (with the zebra finch example as one of the top hits), while sex differences is often used for more absolute differences (like presence vs absence of the Y chromosome). Further, as in the reviewer’s first sentence, the definition of sexual dimorphism is a sex difference. That is, the two phrases can be interchangeable. Thus, we prefer to keep sexual dimorphism.

      Several references are incomplete or seem truncated, like 9 and 10.

      Fixed

      Table S2: Please examine and take into account the W gene curation presented in Table S3 of Friedrich et al., 2022.

      We have added additional supplementals (supplemetal_w_chrom_express.csv and supplemetal_z_chrom_express.csv) of the data provided in new Fig 5 incorporating the curation information from Table S3 from Friedrich et al.

      Data availability:

      Genes for all the main modules identified should be presented in a Supplemental Table, or through a link to a stable data repository.

      We have added an additional Supplemental Table supplemental_gene_module_assignment.csv with this information.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      The authors have created a system for designing and running experimental pipelines to control and coordinate different programs and devices during an experiment, called Heron. Heron is based around a graphical tool for creating a Knowledge Graph made up of nodes connected by edges, with each node representing a separate Python script, and each edge being a communication pathway connecting a specific output from one node to an input on another. Each node also has parameters that can be set by the user during setup and runtime, and all of this behavior is concisely specified in the code that defines each node. This tool tries to marry the ease of use, clarity, and self-documentation of a purely graphical system like Bonsai with the flexibility and power of a purely code-based system like Robot Operating System (ROS).

      Strengths:

      The underlying idea behind Heron, of combining a graphical design and execution tool with nodes that are made as straightforward Python scripts, seems like a great way to get the relative strengths of each approach. The graphical design side is clear, self-explanatory, and self-documenting, as described in the paper. The underlying code for each node tends to also be relatively simple and straightforward, with a lot of the complex communication architecture successfully abstracted away from the user. This makes it easy to develop new nodes, without needing to understand the underlying communications between them. The authors also provide useful and well-documented templates for each type of node to further facilitate this process. Overall this seems like it could be a great tool for designing and running a wide variety of experiments, without requiring too much advanced technical knowledge from the users.

      The system was relatively easy to download and get running, following the directions and already has a significant amount of documentation available to explain how to use it and expand its capabilities. Heron has also been built from the ground up to easily incorporate nodes stored in separate Git repositories and to thus become a large community-driven platform, with different nodes written and shared by different groups. This gives Heron a wide scope for future utility and usefulness, as more groups use it, write new nodes, and share them with the community. With any system of this sort, the overall strength of the system is thus somewhat dependent on how widely it is used and contributed to, but the authors did a good job of making this easy and accessible for people who are interested. I could certainly see Heron growing into a versatile and popular system for designing and running many types of experiments.

      Weaknesses:

      (1) The number one thing that was missing from the paper was any kind of quantification of the performance of Heron in different circumstances. Several useful and illustrative examples were discussed in depth to show the strengths and flexibility of Heron, but there was no discussion or quantification of performance, timing, or latency for any of these examples. These seem like very important metrics to measure and discuss when creating a new experimental system.

      Heron is practically a thin layer of obfuscation of signal passing across processes. Given its design approach it is up to the code of each Node to deal with issues of timing, synching and latency and thus up to each user to make sure the Nodes they author fulfil their experimental requirements. Having said that, Heron provides a large number of tools to allow users to optimise the generated Knowledge Graphs for their use cases. To showcase these tools, we have expanded on the third experimental example in the paper with three extra sections, two of which relate to Heron’s performance and synching capabilities. One is focusing on Heron’s CPU load requirements (and existing Heron tools to keep those at acceptable limits) and another focusing on post experiment synchronisation of all the different data sets a multi Node experiment generates.   

      (2) After downloading and running Heron with some basic test Nodes, I noticed that many of the nodes were each using a full CPU core on their own. Given that this basic test experiment was just waiting for a keypress, triggering a random number generator, and displaying the result, I was quite surprised to see over 50% of my 8-core CPU fully utilized. I don’t think that Heron needs to be perfectly efficient to accomplish its intended purpose, but I do think that some level of efficiency is required. Some optimization of the codebase should be done so that basic tests like this can run with minimal CPU utilization. This would then inspire confidence that Heron could deal with a real experiment that was significantly more complex without running out of CPU power and thus slowing down.

      The original Heron allowed the OS to choose how to manage resources over the required processes. We were aware that this could lead to significant use of CPU time, as well as an occasionally significant drop of packets (which was dependent on the OS and its configuration). This drop happened mainly when the Node was running a secondary process (like the Unity game process in the 3rd example). To mitigate these problems, we have now implemented a feature allowing the user to choose the CPU core that each Node’s worker function runs on, as well as any extra processes the worker process initialises. This is accessible from the Saving secondary window of the Node. This stops the OS from swapping processes between CPU cores and eliminates the dropping of packets due to the OS behaviour. It also significantly reduces the utilised CPU time. To showcase this, we first ran the simple example mentioned by the reviewer. The computer running only background services was using 8% of CPU (8 cores). With the Heron GUI running but with no active Graph, the CPU usage went to 15%. With the Graph running and Heron’s processes running on OS-attributed CPU cores, the total CPU usage was at 65% (so very close to the reviewer’s 50%). By choosing a different CPU core for each of the three worker processes the CPU usage went down to 47%, and finally when all processes were forced to run on the same CPU core the CPU load dropped to 30%. So, Heron in its current implementation, running its GUI and 3 Nodes, takes 22% of CPU load. This is still not ideal but is a consequence of the overhead of running multiple processes vs multiple threads. We believe that, given Heron’s latest optimisation, offering more control of system management to the user, the benefits of multi-process applications outweigh this hit in system resources.
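
      The idea of fixing a process to a chosen CPU core can be illustrated with a minimal Python sketch. This is not Heron's implementation; the function name `pin_current_process` is hypothetical, and the sketch assumes a platform where Python exposes `os.sched_setaffinity` (Linux). Elsewhere it simply returns `None`.

```python
import os


def pin_current_process(core: int):
    """Pin the calling process to a single CPU core.

    Returns the resulting affinity set, or None on platforms where
    Python does not expose os.sched_setaffinity (e.g. macOS, Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)  # cores this process may currently use
    if core not in allowed:
        raise ValueError(f"core {core} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core})    # pid 0 means the calling process
    return os.sched_getaffinity(0)
```

      A worker process would call such a function once at start-up, so that the OS scheduler no longer migrates it between cores.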

      We have also increased the scope of the third example we provide in the paper, and there we describe in detail how a full-scale experiment with 15 Nodes (which is the upper limit of the number of Nodes usually required in most experiments) impacts CPU load.

      Finally, we have added to Heron’s roadmap extra tasks focusing only on optimisation (profiling and using Numba for the time-critical parts of the Heron code).

      (3) I was also surprised to see that, despite being meant specifically to run on and connect diverse types of computer operating systems and being written purely in Python, the Heron Editor and GUI must be run on Windows. This seems like an unfortunate and unnecessary restriction, and it would be great to see the codebase adjusted to make it fully cross-platform compatible.

      This point was also mentioned by reviewer 2. This was a mistake on our part and has now been corrected in the paper. Heron (GUI and underlying communication functionality) can run on any machine that the underlying Python libraries run on, which means Windows, Linux (both x86 and Arm architectures) and macOS. We have tested it on Windows (10 and 11, both x64), Linux PC (Ubuntu 20.04.6, x64) and Raspberry Pi 4 (Debian GNU/Linux 12 (bookworm), aarch64). The Windows and Linux versions of Heron have undergone extensive debugging and all of the available Nodes (that are not OS specific) run on those two systems. We are in the process of debugging the Nodes’ functionality for RasPi. The macOS version, although functional, requires further work to make sure all of the basic Nodes are functional (which is not the case at the moment). We have also updated our manuscript (Multiple machines, operating systems and environments) to include the above information.

      (4) Lastly, when I was running test experiments, sometimes one of the nodes, or part of the Heron editor itself would throw an exception or otherwise crash. Sometimes this left the Heron editor in a zombie state where some aspects of the GUI were responsive and others were not. It would be good to see a more graceful full shutdown of the program when part of it crashes or throws an exception, especially as this is likely to be common as people learn to use it. More problematically, in some of these cases, after closing or force quitting Heron, the TCP ports were not properly relinquished, and thus restarting Heron would run into an "address in use" error. Finding and killing the processes that were still using the ports is not something that is obvious, especially to a beginner, and it would be great to see Heron deal with this better. Ideally, code would be introduced to carefully avoid leaving ports occupied during a hard shutdown, and furthermore, when the address in use error comes up, it would be great to give the user some idea of what to do about it.

      A lot of effort has been put into Heron to achieve graceful shutdown of processes, especially when these run on different machines that do not know when the GUI process has closed. The code that is being suggested to avoid leaving ports open has been implemented, and this works properly when processes do not crash (Heron is terminated by the user) and almost always when there is a bug in a process that forces it to crash. In the version of Heron available during the reviewing process there were bugs that caused the above behaviour (Node code hanging and leaving zombie processes) on macOS systems. These have now been fixed. There are rare instances, though, especially during Node development, in which crashing processes will hang and need to be terminated manually. We have taken on board the reviewer’s comments that users should be made more aware of these issues and have also described this situation in the Debugging part of Heron’s documentation. There we explain the logging and other tools Heron provides to help users debug their own Nodes and how to deal with hanging processes.
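
      The "address in use" situation raised by the reviewer can be detected before start-up with a short, generic Python check. The sketch below is only an illustration using the standard `socket` module; it is not code from Heron, and the names `port_is_free` and `explain_port_conflict` are hypothetical.

```python
import errno
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False   # another process still holds the port
            raise              # any other error is a different problem
    return True


def explain_port_conflict(port: int) -> str:
    """Build a beginner-friendly hint for an 'address in use' error."""
    return (
        f"Port {port} is still in use, possibly by a process left over from "
        "a previous run. Find the process that holds the port and terminate "
        "it, then start again."
    )
```

      Running such a check before opening the communication sockets lets a program print the hint instead of an unexplained traceback.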

      Heron is still in alpha (usable but with bugs) and the best way to debug it and iron out all the bugs in all use cases is through usage from multiple users and error reporting (we would be grateful if the errors the reviewer mentions could be reported in Heron’s GitHub Issues page). We are always addressing and closing any reported errors, since this is the only way for Heron to transition from alpha to beta and eventually to production code quality.

      Overall I think that, with these improvements, this could be the beginning of a powerful and versatile new system that would enable flexible experiment design with a relatively low technical barrier to entry. I could see this system being useful to many different labs and fields. 

      We thank the reviewer for the positive and supportive words and for the constructive feedback. We believe we have now addressed all the raised concerns.

      Reviewer #2 (Public Review):

      Summary:

      The authors provide an open-source graphic user interface (GUI) called Heron, implemented in Python, that is designed to help experimentalists to

      (1) design experimental pipelines and implement them in a way that is closely aligned with their mental schemata of the experiments,

      (2) execute and control the experimental pipelines with numerous interconnected hardware and software on a network.

      The former is achieved by representing an experimental pipeline using a Knowledge Graph and visually representing this graph in the GUI. The latter is accomplished by using an actor model to govern the interaction among interconnected nodes through messaging, implemented using ZeroMQ. The nodes themselves execute user-supplied code in, but not limited to, Python.

      Using three showcases of behavioral experiments on rats, the authors highlighted three benefits of their software design:

      (1) the knowledge graph serves as a self-documentation of the logic of the experiment, enhancing the readability and reproducibility of the experiment,

      (2) the experiment can be executed in a distributed fashion across multiple machines that each has a different operating system or computing environment, such that the experiment can take advantage of hardware that sometimes can only work on a specific computer/OS, a commonly seen issue nowadays,

      (3) the users supply their own Python code for node execution that is supposed to be more friendly to those who do not have a strong programming background.

      Strengths:

      (1) The software is lightweight and open-source, and provides a clean and easy-to-use GUI.

      (2) The software answers the need of experimentalists, particularly in the field of behavioral science, to deal with the diversity of hardware that becomes restricted to run on dedicated systems.

      (3) The software has a solid design that seems to be functionally reliable and useful under many conditions, demonstrated by a number of sophisticated experimental setups.

      (4) The software is well documented. The authors pay special attention to documenting the usage of the software and setting up experiments using this software.

      Weaknesses:

      (1) While the software implementation is solid and has proven effective in designing the experiment showcased in the paper, the novelty of the design is not made clear in the manuscript. Conceptually, both the use of graphs and visual experimental flow design have been key features in many widely used software packages, as suggested in the background section of the manuscript. In particular, contrary to the authors’ claim that only pre-defined elements can be used in Simulink or LabView, Simulink introduced MATLAB Function Block back in 2011, and Python code can be used in LabView since 2018. Such customization of nodes is akin to what the authors presented.

      In the Heron manuscript we have provided an extensive literature review of existing systems from which Heron has borrowed ideas. We never wished to say that graphs and visual code are what set Heron apart, since these are technologies predating Heron by many years and implemented by a large number of software packages. We also do not believe that we have mentioned that LabView or Simulink can utilise only predefined nodes. What we have said is that in such systems (like LabView, Simulink and Bonsai) the focus of the architecture is on prespecified low-level elements, while the ability for users to author their own is there but only as an afterthought. The difference with Heron is that in the latter the focus is on the users developing their own elements. One could think of LabView-style software as node-based languages (with low-level visual elements like loops and variables) that also allow extra scripting, while Heron is a graphical wrapper around Python where nodes are graphical representations of whole processes. To our knowledge there is no other software that allows the very fast generation of graphical elements representing whole processes whose communication can also be defined graphically. Apart from this distinction, Heron also allows a graphical approach to writing code for processes that span different machines, which again to our knowledge is a novelty of our approach and one of its strongest points towards ease of experimental pipeline creation (without sacrificing expressivity).

      (2) The authors claim that the knowledge graph can be considered as a self-documentation of an experiment. I found it to be true to some extent. Conceptually it’s a welcome feature and the fact that the same visualization of the knowledge graph can be used to run and control experiments is highly desirable (but see point 1 about novelty). However, I found it largely inadequate for a person to understand an experiment from the knowledge graph as visualized in the GUI alone. While the information flow is clear, and it seems easier to navigate a codebase for an experiment using this method, the design of the GUI does not make it a one-stop place to understand the experiment. Take the Knowledge Graph in Supplementary Figure 2B as an example, it is associated with the first showcase in the result section highlighting this self-documentation capability. I can see what the basic flow is through the disjoint graph where 1) one needs to press a key to start a trial, and 2) camera frames are saved into an avi file presumably using FFMPEG. Unfortunately, it is not clear what the parameters are and what each block is trying to accomplish without the explanation from the authors in the main text. Neither is it clear about what the experiment protocol is without the help of Supplementary Figure 2A.

      In my opinion, text/figures are still key to documenting an experiment, including its goals and protocols, but the authors could take advantage of the fact that they are designing a GUI where this information, with properly designed API, could be easily displayed, perhaps through user interaction. For example, in Local Network -> Edit IPs/ports in the GUI configuration, there is a good tooltip displaying additional information for the "password" entry. The GUI for the knowledge graph nodes can very well utilize these tooltips to show additional information about the meaning of the parameters, what a node does, etc, if the API also enforces users to provide this information in the form of, e.g., Python docstrings in their node template. Similarly, this can be applied to edges to make it clear what messages/data are communicated between the nodes. This could greatly enhance the representation of the experiment from the Knowledge graph.

      In the first showcase example in the paper, “Probabilistic reversal learning. Implementation as self-documentation”, we go through the steps that one would follow in order to understand the functionality of an experiment through Heron’s Knowledge Graph. The Graph is not just the visual representation of the Nodes in the GUI but also their corresponding code bases. We mention that the way Heron’s API limits the way a Node’s code is constructed (through an Actor-based paradigm) allows experimenters to easily go to the code base of a specific Node and understand its 2 functions (initialisation and worker) without getting bogged down in the code base of the whole Graph (since these two functions never call code from any other Nodes). Newer versions of Heron facilitate this easy access to the appropriate code by also allowing users to attach to Heron their favourite IDE and open in it any Node’s two scripts (worker and com) when they double click on the Node in Heron’s GUI. On top of this, Heron now (in the versions developed in response to the reviewers’ comments) allows Node creators to add extensive comments on a Node but also separate comments on the Node’s parameters and input and output ports. Those can be seen as tooltips when one hovers over the Node (a feature that can be turned off or on by the Info button on every Node).

      As Heron stands at the moment we have not made the claim that the Heron GUI is the full picture in the self-documentation of a Graph. We take note, though, of the reviewer’s desire to have the GUI be the only tool a user would need to use to understand an experimental implementation. The solution to this is the same as the one described by the reviewer: using the GUI to show the user the parts of the code relevant to a specific Node without the user having to go to a separate IDE or code editor. The reason this has not been implemented yet is the lack of a text editor widget in the underlying GUI library (DearPyGUI). This is in their roadmap for their next large release, and when this exists we will use it to implement exactly the idea the reviewer is suggesting, but also with the capability to not only read comments and code but also directly edit a Node’s code (see Heron’s roadmap). Heron’s API at the moment is ideal for providing such a text editor straight from the GUI.

      (3) The design of Heron was primarily with behavioral experiments in mind, in which highly accurate timing is not a strong requirement. Experiments in some other areas that this software is also hoping to expand to, for example, electrophysiology, may need very strong synchronization between apparatus, for example, the record timing and stimulus delivery should be synced. The communication mechanism implemented in Heron is asynchronous, as I understand it, and the code for each node is executed once upon receiving an event at one or more of its inputs. The paper, however, does not include a discussion, or example, about how Heron could be used to address issues that could arise in this type of communication. There is also a lack of information about, for example, how nodes handle inputs when their ability to execute their work function cannot keep up with the frequency of input events. Does the publication/subscription handle the queue intrinsically? Will it create problems in real-time experiments that make multiple nodes run out of sync? The reader could benefit from a discussion about this if they already exist, and if not, the software could benefit from implementing additional mechanisms such that it can meet the requirements from more types of experiments.

      In order to address the above lack of explanation (which the first reviewer also pointed out) we expanded the third experimental example in the paper with three more sections. One focuses solely on explaining how in this example (which acquires and saves large amounts of data from separate Nodes running on different machines) one would be able to time align the different data packets generated in different Nodes to each other. The techniques described there are directly implementable in experiments where the requirements of synching are more stringent than in the behavioural experiment we showcase (like in ephys experiments).
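
      One generic way to time align two independently saved data streams after an experiment is nearest-timestamp matching. The sketch below illustrates that general idea only; it is not necessarily the technique described in the manuscript, and the function name `align_nearest`, the example timestamps and the tolerance value are all hypothetical.

```python
from bisect import bisect_left


def align_nearest(reference_ts, other_ts, tolerance):
    """Match each reference timestamp to the nearest one in other_ts.

    other_ts must be sorted in ascending order. Returns, for every
    reference timestamp, the index of its match in other_ts, or None
    when no timestamp lies within the given tolerance.
    """
    matches = []
    for t in reference_ts:
        pos = bisect_left(other_ts, t)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(other_ts)]
        best = min(candidates, key=lambda i: abs(other_ts[i] - t), default=None)
        if best is not None and abs(other_ts[best] - t) <= tolerance:
            matches.append(best)
        else:
            matches.append(None)
    return matches


# Hypothetical timestamps in milliseconds from two Nodes.
camera_ms = [0, 33, 66, 100]
other_ms = [1, 34, 90, 101]
matches = align_nearest(camera_ms, other_ms, tolerance=5)
```

      Entries that come back as `None` mark reference samples with no counterpart in the other stream, which is also where dropped packets would show up.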

      Regarding what happens to packets when the worker function of a Node is too slow to handle its traffic, this is mentioned in the paper (Code architecture paragraph): “Heron is designed to have no message buffering, thus automatically dropping any messages that come into a Node’s inputs while the Node’s worker function is still running.” This is also explained in more detail in Heron’s documentation. The reasoning for a no-buffer system (as described in the documentation) is that, for the use cases Heron is designed to handle, we believe there is no situation where a Node would receive large amounts of data in bursts and very little data during the rest of the time (in which case a buffer would make sense). Nodes in most experiments will either be data intensive but with a constant or near constant data receiving speed (e.g. input from a camera or ephys system) or will have variable data load reception but always with small data loads (e.g. buttons). The second case is not an issue and the first case cannot be solved with a buffer but only with appropriate code design, since buffering data coming into a Node too slow for its input will just postpone the inevitable crash. Heron’s architecture principle in this case is to allow these ‘mistakes’ (i.e. packet dropping) to happen so that the pipeline continues to run, and to transfer the responsibility of making Nodes fast enough to the author of each Node. At the same time Heron provides tools (see the Debugging section of the documentation and the time alignment paragraph of the “Rats playing computer games” example in the manuscript) that make it easy to detect packet drops and either correct them or allow them, and that also allow time alignment between incoming and outgoing packets. In the very rare case where a buffer is required, Heron’s do-it-yourself logic makes it easy for a Node developer to implement their own Node-specific buffer.
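
      The reasoning above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions: the function simulate and its parameters are hypothetical and are not part of Heron’s API. It models a single worker that takes a fixed time per message, and compares the no-buffering rule with a small Node-specific buffer.

```python
from collections import deque


def simulate(arrival_times, work_duration, buffer_size=0):
    """Return (processed, dropped) message indices for a single worker.

    Hypothetical model, not Heron code. buffer_size=0 mimics the
    no-buffering rule: a message that arrives while the worker is
    still running is dropped.
    """
    processed, dropped = [], []
    pending = deque()  # messages waiting in the (optional) buffer
    free_at = 0.0      # time at which the worker finishes its current message
    for index, arrival in enumerate(arrival_times):
        # Drain buffered messages the worker could start before this arrival.
        while pending and free_at <= arrival:
            processed.append(pending.popleft())
            free_at += work_duration
        if free_at <= arrival and not pending:
            # Worker is idle: handle the message immediately.
            processed.append(index)
            free_at = arrival + work_duration
        elif len(pending) < buffer_size:
            pending.append(index)
        else:
            dropped.append(index)
    processed.extend(pending)  # buffered messages are finished eventually
    return processed, dropped


# Constant-rate input that is too fast for the worker: messages are dropped
# with or without a buffer; the buffer only delays the first drop.
sustained = [0, 1, 2, 3, 4, 5]
no_buffer = simulate(sustained, work_duration=2.5)
with_buffer = simulate(sustained, work_duration=2.5, buffer_size=2)

# A short burst followed by a long pause: here a small buffer does help.
burst = simulate([0, 0.1, 0.2, 10], work_duration=1.0, buffer_size=2)
```

      In this toy model the buffer removes drops only for the bursty input, which is the one case the response describes as rare for Heron’s intended use.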

      (4) The authors mentioned in "Heron GUI’s multiple uses" that the GUI can be used as an experimental control panel where the user can update the parameters of the different Nodes on the fly. This is a very useful feature, but it was not demonstrated in the three showcases. A demonstration could greatly help to support this claim.

      As the reviewer mentions, we have found the double role of Heron’s GUI as an experimental on-line controller to be a very useful capability during our experiments. We have expanded the last experimental example to also showcase this by showing how, in the “Rats playing computer games” experiment, we used the parameters of two Nodes to change the arena’s behaviour while the experiment was running, depending on how the subject was behaving at the time (thus exploring a much larger set of parameter combinations, faster, during the exploratory periods of constructing our shaping protocols).

      (5) The API for node scripts can benefit from having a better structure as well as having additional utilities to help users navigate the requirements, and provide more guidance to users in creating new nodes. A more standard practice in the field is to create three abstract Python classes, Source, Sink, and Transform that dictate the requirements for initialisation, work_function, and on_end_of_life, and provide additional utility methods to help users connect between their code and the communication mechanism. They can be properly docstringed, along with templates. In this way, the com and worker scripts can be merged into a single unified API. A simple example that can cause confusion in the worker script is the "worker_object", which is passed into the initialise function. It is unclear what this object this variable should be, and what attributes are available without looking into the source code. As the software is also targeting those who are less experienced in programming, setting up more guidance in the API can be really helpful. In addition, the self-documentation aspect of the GUI can also benefit from a better structured API as discussed in point 2 above.

      The reviewer is right that using abstract classes to expose to users the required API would be a more standard practice. The reason we did not choose to do this was to keep Heron easily accessible to entry level Python programmers who do not have familiarity yet with object oriented programming ideas. So instead of providing abstract classes we expose only the implementation of three functions which are part of the worker classes but the classes themselves are not seen by the users of the API. The point about the users’ accessibility to more information regarding a few objects used in the API (the worker object for example) has been taken on board and we have now addressed this by type hinting all these objects both in the templates and more importantly in the automatically generated code that Heron now creates when a user chooses to create a Node graphically (a feature of Heron not present in the version available in the initial submission of this manuscript).  

      (6) The authors should provide more pre-defined elements. Even though the ability for users to run arbitrary code is the main feature, the initial adoption of a codebase by a community, in which many members are not so experienced with programming, is the ability for them to use off-the-shelf components as much as possible. I believe the software could benefit from a suite of commonly used Nodes.

      There are currently 12 Node repositories in the Heron-repositories project on GitHub with more than 30 Nodes, 20 of which are general use (not implementing a specific experiment’s logic). This list will continue to grow, but we fully appreciate the truth of the reviewer’s comment that adoption will depend on the existence of a large number of commonly used Nodes (for example NumPy and OpenCV Nodes) and are working towards this goal.

      (7) It is not clear to me if there is any capability or utilities for testing individual nodes without invoking a full system execution. This would be critical when designing new experiments and testing out each component.

      There is no capability to run the code of an individual Node outside Heron’s GUI. A user could potentially design and test parts of the code before they get added into a Node, but we have found this to be a highly inefficient way of developing new Nodes. In our hands the best approach for Node development was to quickly generate test inputs and/or outputs using the “User Defined Function 1I 1O” Node, where one can quickly write a function and make it accessible from a Node. Those test outputs can then be pushed into the Node under development, or its outputs can be pushed into the test function, to allow for incremental development without having to connect it to the Nodes it would be connected to in an actual pipeline. For example, one can easily create a small function that, if a user presses a key, will generate the same output (if run from a “User Defined Function 1I 1O” Node) as an Arduino Node reading some buttons. This output can then be passed into an experiment logic Node under development that needs to do something with this input. In this way, during Node development, Heron allows the generation of simulated hardware inputs and outputs without running the actual hardware. We have also added this way of developing Nodes to our manuscript (Creating a new Node).
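
      As a rough illustration of this development pattern, the sketch below pairs a stand-in for the hardware with a piece of experiment logic under development. Everything in it is an assumption made for illustration: the payload format (one integer per button), the key mapping and both function names are hypothetical, and the code does not use Heron’s API or the actual “User Defined Function 1I 1O” Node.

```python
# Hypothetical payload: one integer per button, 1 = pressed, 0 = released.
KEY_TO_BUTTONS = {"a": [1, 0], "d": [0, 1]}


def simulated_buttons(key):
    """Return the payload a hypothetical two-button Arduino Node would emit."""
    return list(KEY_TO_BUTTONS.get(key, [0, 0]))


def choose_side(buttons):
    """Toy experiment logic under development: decide which side was pressed."""
    left, right = buttons
    if left and not right:
        return "left"
    if right and not left:
        return "right"
    return "none"


# Feed simulated key presses into the logic instead of reading real hardware.
results = [choose_side(simulated_buttons(key)) for key in ("a", "d", "x")]
```

      The point of the pattern is that the logic function only ever sees the payload, so it cannot tell whether the payload came from a key press or from real buttons.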

      Reviewer #3 (Public Review):

      Summary:

      The authors present a Python tool, Heron, that provides a framework for defining and running experiments in a lab setting (e.g. in behavioural neuroscience). It consists of a graphical editor for defining the pipeline (interconnected nodes with parameters that can pass data between them), an API for defining the nodes of these pipelines, and a framework based on ZeroMQ, responsible for the overall control and data exchange between nodes. Since nodes run independently and only communicate via network messages, an experiment can make use of nodes running on several machines and in separate environments, including on different operating systems.

      Strengths:

      As the authors correctly identify, lab experiments often require a hodgepodge of separate hardware and software tools working together. A single, unified interface for defining these connections and running/supervising the experiment, together with flexibility in defining the individual subtasks (nodes) is therefore a very welcome approach. The GUI editor seems fairly intuitive, and Python as an accessible programming environment is a very sensible choice. By basing the communication on the widely used ZeroMQ framework, they have a solid base for the required non-trivial coordination and communication. Potential users reading the paper will have a good idea of how to use the software and whether it would be helpful for their own work. The presented experiments convincingly demonstrate the usefulness of the tool for realistic scientific applications.

      Weaknesses:

      (1) In my opinion, the authors somewhat oversell the reproducibility and "self-documentation" aspect of their solution. While it is certainly true that the graph representation gives a useful high-level overview of an experiment, it can also suffer from the same shortcomings as a "pure code" description of a model - if a user gives their nodes and parameters generic/unhelpful names, reading the graph will not help much.

      This is a problem that, to our understanding, no software solution can possibly address. Yet we argue that having a visual representation of how different inputs and outputs connect to each other is a substantial benefit compared to “pure code”, especially when the developer of the experiment has used poorly chosen variable names.

      (2) Making the link between the nodes and the actual code is also not straightforward, since the code for the nodes is spread out over several directories (or potentially even machines), and not directly accessible from within the GUI. 

      This is not accurate. The obligatory code of a Node always exists within a single folder and Heron’s API makes it rather cumbersome to spread scripts relating to a Node across separate folders. The Node folder structure can potentially be copied over different machines but this is why Heron is tightly integrated with git practices (and even politely asks the user with popup windows to create git repositories of any Nodes they create whilst using Heron’s automatic Node generator system). Heron’s documentation is also very clear on the folder structure of a Node which keeps the required code always in the same place across machines and more importantly across experiments and labs. Regarding the direct accessibility of the code from the GUI, we took on board the reviewers’ comments and have taken the first step towards correcting this. Now one can attach to Heron their favourite IDE and then they can double click on any Node to open its two main scripts (com and worker) in that IDE embedded in whatever code project they choose (also set in Heron’s settings windows). On top of this, Heron now allows the addition of notes both for a Node and for all its parameters, inputs and outputs which can be viewed by hovering the mouse over them on the Nodes’ GUIs. The final step towards GUI-code integration will be to have a Heron GUI code editor but this is something that has to wait for further development from Heron’s underlying GUI library DearPyGUI.

      (3) The authors state that "[Heron’s approach] confers obvious benefits to the exchange and reproducibility of experiments", but the paper does not discuss how one would actually exchange an experiment and its parameters, given that the graph (and its json representation) contains user-specific absolute filenames, machine IP addresses, etc, and the parameter values that were used are stored in general data frames, potentially separate from the results. Neither does it address how a user could keep track of which versions of files were used (including Heron itself).

      Heron’s Graphs, like any experimental implementation, must contain machine specific strings. These are accessible either from Heron’s GUI when a Graph json file is opened or from the json file itself. Heron in this regard does not do anything different to any other software, other than saving the graphs into human readable json files that users can easily manipulate directly.

      Heron provides a method for users to save every change of the Node parameters that might happen during an experiment so that it can be fully reproduced. The generated dataframes are saved in the folders specified by the user in each of the Nodes (and all those paths are saved in the json file of the Graph). We understand that Heron offers a certain degree of freedom to the user (Heron’s main reason to exist is exactly this versatility) to generate data files wherever they want, but it makes sure every file path gets recorded for subsequent reproduction. So, Heron behaves pretty much exactly like any other open source software. What we wanted to focus on as the benefit of Heron for exchange and reproducibility was the ability of experimenters to take a Graph from another lab (with its machine specific file paths and IP addresses) and, by examining its graphical interface, quickly tweak it to make it run on their own systems. That is achievable because a Heron experiment will be constructed from a small number of Nodes (5 to 15 usually) whose file paths can be trivially changed in the GUI or directly in the json file, while the LAN setup of the machines used can be easily reconstructed from the information saved in the secondary GUIs.
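
      Editing the paths directly in the json file can also be scripted. The sketch below is a minimal example under stated assumptions: the structure of the example graph (the keys nodes, name, save_path and ip) is invented for illustration and is not Heron’s actual Graph schema. The helper simply walks any parsed json value and rewrites strings that start with a given path prefix.

```python
import json


def retarget(value, old_prefix, new_prefix):
    """Recursively replace old_prefix with new_prefix in every string value."""
    if isinstance(value, dict):
        return {k: retarget(v, old_prefix, new_prefix) for k, v in value.items()}
    if isinstance(value, list):
        return [retarget(v, old_prefix, new_prefix) for v in value]
    if isinstance(value, str) and value.startswith(old_prefix):
        return new_prefix + value[len(old_prefix):]
    return value  # numbers, booleans, None and unrelated strings are untouched


# Hypothetical graph content; a real Heron Graph file will look different.
graph_text = """
{"nodes": [{"name": "Camera",
            "save_path": "C:/labA/data/cam.avi",
            "ip": "192.168.1.10"}]}
"""
graph = json.loads(graph_text)
local_graph = retarget(graph, "C:/labA/", "D:/labB/")
local_text = json.dumps(local_graph, indent=2)
```

      IP addresses and other machine specific strings would still need to be reviewed by hand, since only values beginning with the old prefix are rewritten.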

      Where Heron needs to improve (and this is a major point in Heron’s roadmap) is in better integrating the different saved experiments with the git versions of Heron and of the Nodes that were used for that specific save. This, we appreciate, is very important for full reproducibility of the experiment and it is a feature we will soon implement. More specifically, users will save together with a graph the versions of all the used repositories, and during load the code base utilised will come from the recorded versions and not from the current head of the different repositories. This is a feature that we are currently working on and, as our roadmap suggests, it will be implemented by the release of Heron 1.0.

      (4) Another limitation that in my opinion is not sufficiently addressed is the communication between the nodes, and the effect of passing all communications via the host machine and SSH. What does this mean for the resulting throughput and latency - in particular in comparison to software such as Bonsai or Autopilot? The paper also states that "Heron is designed to have no message buffering, thus automatically dropping any messages that come into a Node’s inputs while the Node’s worker function is still running."- it seems to be up to the user to debug and handle this manually?

      There are a few points raised here that require addressing. The first is Heron’s requirement to pass all communication through the main (GUI) machine. We understand (and also state in the manuscript) that this is a limitation that needs to be addressed. We plan to do this by adding to Heron the feature of running headless (see our roadmap). This will allow us to run whole Heron pipelines on a second machine, which will communicate with the main pipeline (run on the GUI machine) through special Nodes. That will allow experimenters to define whole pipelines on secondary machines where the data between their Nodes stay on the machine running the pipeline. This is an important feature for Heron and it will be one of the first features to be implemented next (after the integration of the saving system with git).

      The second point concerns Heron’s throughput and latency. In our original manuscript we did not have any description of Heron’s capabilities in this respect and both other reviewers mentioned this as a limitation. As mentioned above, we have now addressed this by adding a section to our third experimental example that fully describes how much CPU is required to run a full experimental pipeline running on two machines and also utilising non-Python executables (a Unity game). This gives an overview of how heavy pipelines can run on normal computers given adequate optimisation and utilising Heron’s feature of forcing some Nodes to run their Worker processes on a specific core. At the same time, Heron’s use of the 0MQ protocol makes sure there are no other delays or speed limitations to message passing. So, message passing within the same machine is just an exchange of memory pointers, while messages passing between different machines face the standard speed limitations of the Local Area Network’s Ethernet card speeds.

      Finally, regarding the message dropping feature of Heron, as mentioned above this is an architectural decision given the use cases of message passing we expect Heron to come in contact with. For a full explanation of the logic here please see our answer to the 3rd comment by Reviewer 2.

      (5) As a final comment, I have to admit that I was a bit confused by the use of the term "Knowledge Graph" in the title and elsewhere. In my opinion, the Heron software describes "pipelines" or "data workflows", not knowledge graphs - I’d understand a knowledge graph to be about entities and their relationships. As the authors state, it is usually meant to make it possible to "test propositions against the knowledge and also create novel propositions" - how would this apply here?

      We have described Heron as a Knowledge Graph instead of a pipeline, data workflow or computation graph in order to emphasise Heron’s distinct operation in contrast to what one would consider a standard pipeline and data workflow generated by other visual based software (like LabView and Bonsai). This difference lies in what a user should think of as the base element of a graph, i.e. the Node. In all other visual programming paradigms, the Node is defined as a low-level computation, usually a language keyword, language flow control or some simple function. The logic in this case is generated by composing together the visual elements (Nodes). In Heron the Node is to be thought of as a process which can be of arbitrary complexity, and the logic of the graph is composed by the user both within each Node and by the way the Nodes are combined together. This is an important distinction in Heron’s basic operation logic and it is, we argue, the main way Heron allows flexibility in what can be achieved while retaining ease of graph composition (by users defining their own level of complexity and functionality encompassed within each Node). We have found that calling this approach a computation graph (which it is) or a pipeline or data workflow would not accentuate this difference. The term Knowledge Graph was the most appropriate as it captures the essence of variable information complexity (even in terms of length of shortest string required) defined by a Node.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      -  No buffering implies dropped messages when a node is busy. It seems like this could be very problematic for some use cases... 

      This is a design principle of Heron. We have now provided a detailed explanation of the reasoning behind it in our answer to Reviewer 2 (Paragraph 3) as well as in the manuscript. 

      -  How are ssh passwords stored, and is it secure in some way or just in plain text?  

      For now they are plain text in an unencrypted file that is not part of the repo (if one gets Heron from the repo). Eventually, we would like to go to private/public key pairs but this is not a priority due to the local nature of Heron’s use cases (all machines in an experiment are expected to connect in a LAN).  

      Minor notes / copyedits:

      -  Figure 2A: right and left seem to be reversed in the caption. 

      They were. This is now fixed. 

      -  Figure 2B: the text says that proof of life messages are sent to each worker process but in the figure, it looks like they are published by the workers? Also true in the online documentation.  

      The Figure caption was wrong. This is now fixed.

      -  psutil package is not included in the requirements for GitHub

      We have now included psutil in the requirements.

      -  GitHub readme says Python >=3.7 but Heron will not run as written without python >= 3.9 (which is alluded to in the paper)

      The new Heron updates require Python 3.11. We have now updated GitHub and the documentation to reflect this.

      -  The paper mentions that the Heron editor must be run on Windows, but this is not mentioned in the Github readme.  

      This was an error in the manuscript that we have now corrected.

      -  It’s unclear from the readme/manual how to remove a node from the editor once it’s been added.  

      We have now added an X button on each Node to complement the Del button on the keyboard (for MacOS users, who most of the time do not have this button).

      -  The first example experiment is called the Probabilistic Reversal Learning experiment in text, but the uncertainty experiment in the supplemental and on GitHub.  

      We have now used the correct name (Probabilistic Reversal Learning) in both the supplemental material and on GitHub

      -  Since Python >=3.9 is required, consider using fstrings instead of str.format for clarity in the codebase  

      Thank you for the suggestion. The latest Heron development has been using f-strings and we will do a refactoring in the near future.
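
      For readers unfamiliar with the two styles, the short comparison below shows the kind of change such a refactoring involves. The variable names and the message are hypothetical and are not taken from Heron’s codebase.

```python
# Hypothetical values, used only to compare the two formatting styles.
node_name = "Camera"
port = 5556

# str.format: placeholders are filled by position.
old_style = "Node {} listening on port {}".format(node_name, port)

# f-string: the expressions appear inline, next to the text they fill.
new_style = f"Node {node_name} listening on port {port}"
```

      Both lines build the same string; the f-string keeps each value next to the text it belongs to, which is the clarity benefit the reviewer refers to.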

      -  Grasshopper cameras can run on linux as well through the spinnaker SDK, not just Windows.  

      Fixed in the manuscript. 

      -  Figure 4: Square and star indicators are unclear.

      Increased the size of the indicators to make them clear.

      -  End of page 9: "an of the self" presumably a typo for "off the shelf"?  

      Corrected.

      -  Page 10 first paragraph. "second root" should be "second route"

      Corrected.

      -  When running Heron, the terminal constantly spams Blowfish encryption deprecation warnings, making it difficult to see the useful messages.  

      The solution to this problem is to either update paramiko or install Heron through pip. This possible issue is mentioned in the documentation.

      -  Node input /output hitboxes in the GUI are pretty small. If they could be bigger it would make it easier to connect nodes reliably without mis-clicks.

      We have redone the Node GUI, also increasing the size of the In/Out points.

      Reviewer #2 (Recommendations For The Authors):

      (1) There are quite a few typos in the manuscript, for example: "one can accessess the code", "an of the self", etc.  

      Thanks for the comment. We have now screened the manuscript for possible typos.

      (2) Heron’s GUI can only run on Windows! This seems to be the opposite of the key argument about the portability of the experimental setup.  

      As explained in the answers to Reviewer 1, Heron can run on most machines that the underlying python libraries run, i.e. Windows and Linux (both for x86 and Arm architectures). We have tested it on Windows (10 and 11, both x64), Linux PC (Ubuntu 20.04.6, x64) and Raspberry Pi 4 (Debian GNU/Linux 12 (bookworm), aarch64). We have now revised the manuscript and the GitHub repo to reflect this.

      (3) Currently, the output is displayed along the left edge of the node, but the yellow dot connector is on the right. It would make more sense to have the text displayed next to the connectors.  

      We have redesigned the Node GUI and have now placed the Out connectors on the right side of the Node.

      (4) The edges are often occluded by the nodes in the GUI. Sometimes it leads to some confusion, particularly when the number of nodes is large, e.g., Fig 4.

      This is something that is dependent on the capabilities of the DearPyGUI module. At the moment there is no way to control the way the edges are drawn.

      Reviewer #3 (Recommendations For The Authors):

      A few comments on the software and the documentation itself:

      - From a software engineering point of view, the implementation seems to be rather immature. While I get the general appeal of "no installation necessary", I do not think that installing dependencies by hand and cloning a GitHub repository is easier than installing a standard package.

      We have now added a pip install capability which also creates a Heron command line command to start Heron with. 

      -  The generous use of global variables to store state (minor point, given that all nodes run in different processes), boilerplate code that each node needs to repeat, and the absence of any kind of automatic testing do not give the impression of a very mature software (case in point: I had to delete a line from editor.py to be able to start it on a non-Windows system).

      As mentioned, the use of global variables in the worker scripts is fine, partly due to the multi-process nature of the development, and we have found it to be a friendly approach for Matlab users who are just starting with Python (a serious consideration for Heron). Also, the parts of the code that would require a singleton (the Editor for example) are treated as scripts with global variables, while the parts that require the construction of objects are fully embedded in classes (the Node for example). A future refactoring might also make all the parts of the code not seen by the user fully object oriented, but this is a decision whose pros and cons need to be weighed first.

      Absence of testing is an important issue we recognise, but Heron is a GUI app and nontrivial unit tests would require some keystroke/mouse movement emulator (like QTest or pytest-qt for Qt based GUIs). This will be dealt with in the near future (using more general solutions like PyAutoGUI) but it is something that needs a serious amount of effort (quite a bit more than writing unit tests for non GUI based software) and, more importantly, it is nowhere near as robust as standard unit tests (due to the variable nature of the GUI through development), making automatic test authoring almost as laborious a process as the one it is supposed to automate.

      -  From looking at the examples, I did not quite see why it is necessary to write the ..._com.py scripts as Python files, since they only seem to consist of boilerplate code and variable definitions. Wouldn’t it be more convenient to represent this information in configuration files (e.g. yaml or toml)?  

      The com is not a configuration file, it is a script that launches the communication process of the Node. We could remove the variable definitions to a separate toml file (which then the com script would have to read). The pros and cons of such a set up should be considered in a future refactoring.

      Minor comments for the paper:

      -  p.7 (top left): "through its return statement" - the worker loop is an infinite loop that forwards data with a return statement?  

      This is now corrected. The worker loop is an infinite loop and does not return anything, but at each iteration it pushes data to the Node’s output.

      -  p.9 (bottom right): "of the self" → "off-the-shelf"  

      Corrected.

      -  p.10 (bottom left): "second root" → "second route"  

      Corrected.

      -  Supplementary Figure 3: Green star and square seem to be swapped (the green star on top is a camera image and the green star on the bottom is value visualization - inversely for the green square).

      The star and square have been swapped around.

      -  Caption Supplementary Figure 4 (end): "rashes to receive" → "rushes to receive"  

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      (1) This is a valuable manuscript that successfully integrates several data sets to determine genomic interactions with nuclear bodies.

      In this paper we both challenge and/or revise multiple long-standing “textbook” models of nuclear genome organization while also revealing new features of nuclear genome organization. Therefore, we argue that the contributions of this paper extend well beyond “valuable”. Specifically, these contributions include:

      a. We challenge a several-decade focus on the correlation of gene positioning relative to the nuclear lamina. Instead, through comparison of cell lines, we show a strong correlation of differences in gene activity with differences in relative distance to nuclear speckles in contrast to a very weak correlation with differences in relative distance to the nuclear lamina. This inference of little correlation of gene expression with nuclear lamina association was supported by direct experimental manipulation of genome positioning relative to the nuclear lamina. Despite pronounced changes in relative distances to the nuclear lamina there was little change relative to nuclear speckles and little change in gene expression.

      b. We similarly challenge the long-standing proposed functional correlation between the radial positioning of genes and gene expression. Here, and in a now published companion paper (doi.org/10.1038/s42003-024-06838-7), we demonstrate how nuclear speckle positioning relative to nucleoli and the nuclear lamina varies among cell types, as does the inverse relationship between genome positioning relative to nuclear speckles and the nuclear lamina. Again, this is consistent with the primary correlation of gene activity being the positioning of genes relative to nuclear speckles and also explains previous observations showing a strong relationship between radial position and gene expression only in some cell types.

      c. We identified a new partially repressed, middle to late DNA replicating type of chromosome domain, “p-w-v fiLADs”, by their weak interaction with the nuclear lamina. Based on our LMNA/LBR KO experimental results, these domains compete with LADs for nuclear lamina association. Moreover, we show that when fLADs convert to iLADs, most conversions are to this p-w-v fiLAD state, although about one third are to a normal, active, early replicating iLAD state. Thus, fLADs can convert between repressed, partially repressed, and active states, challenging the prevailing assumption of the division of the genome into two states: active, early replicating A compartment/iLAD regions versus inactive, late replicating B compartment/LAD regions.

      d. We identified nuclear speckle associated domains as DNA replication initiation zones, with the domains showing strongest nuclear speckle attachment initiating DNA replication earliest in S-phase.

      e. We describe for the first time an overall polarization of nuclear genome organization in adherent cells with the most active, earliest replicating genomic regions located towards the equatorial plane and less expressed genomic regions towards the nuclear top or bottom surfaces. This includes polarization of some LAD regions to the nuclear lamina at the equatorial plane and other LAD regions to the top or bottom nuclear surfaces.

      We have now rewritten the text to make the significance of these new findings clearer.

      (2) Strength of evidence: The evidence supporting the central claims is varied in its strength ranging from solid to incomplete. Orthogonal evidence validating the novel methodologies with alternative approaches would better support the central claims.

      We argue that our work exploited methods, data, and analyses equal to or more rigorous than the current state-of-the-art. This indeed includes orthogonal evidence using alternative methods which both supported our novel methodologies as well as demonstrating their robustness relative to more conventional approaches. This explains how we were able to challenge/revise long-standing models and discover new features of nuclear genome organization. More specifically:

      a. Unlike most previous analyses, we have integrated both genomic and imaging approaches to examine the nuclear genome organization relative to not one, but several different nuclear locales and we have done this across several cell types. To our knowledge, this is the first such integrated approach and has been key to our success in appreciating new features of nuclear genome organization.

      b. The 16-fraction DNA replication Repli-seq data we developed and applied to this project represents the highest temporal resolution mapping of DNA replication timing to date.

      c. The TSA-seq approach that we used remains the most accurate sequence-based method for estimating microscopic distance of chromosome regions to different nuclear locales. As implemented, this method is unusually robust and direct as it exploits the exponential micron-scale gradient established by the diffusion of the free-radicals generated by peroxidase labeling to measure relative distances of chromosome regions to labeled nuclear locales. We had previously demonstrated that TSA-seq was able to estimate the average distances of genomic regions to nuclear speckles with an accuracy of ~50 nm, as validated by light microscopy. The TSA-seq 2.0 protocol we developed and applied to this project maintained the original resolution of TSA-seq to estimate to an accuracy of ~50 nm the average distances of genomic regions from nuclear speckles, as validated by light microscopy, while achieving more than a 10-fold reduction in the required number of cells.

      We have rewritten the text to address the reviewer concerns that led them to their initial characterization of the TSA-seq as novel and not yet validated.

      First, we have added a discussion of how the use of nuclear speckle TSA-seq as a “cytological ruler” was based on an extensive initial characterization of TSA-seq as described in previous published literature. In that previous literature we showed how the conventional molecular proximity method, ChIP-seq, instead showed local accumulation of the same marker proteins over short DNA regions unrelated to speckle distances. Second, we reference our companion paper, now published, and describe how the extension of TSA-seq to measure relative distances to nucleoli was further validated and shown to be robust by comparison to NAD-seq and extensive multiplexed immuno-FISH data. We further discuss how in the same companion paper we show how nucleolar DamID instead was inconsistent with both the NAD-seq and multiplexed immuno-FISH data as well as the nucleolar TSA-seq.

      Third, we have added scatterplots showing exactly how highly the estimated microscopic distances to all three nuclear locales, measured in IMR90 fibroblasts, correlate with the TSA-seq measurements in HFF fibroblasts. This addresses the concern that we were not using the exact same fibroblast cell line for the TSA-seq versus microscopic measurements. The strong correlation already observed would only be expected to become even stronger with use of the exact same fibroblast cell lines for both measurements.

      Fourth, we have addressed the reviewer concern that the nuclear lamin TSA-seq was not properly validated because it did not match nuclear lamin DamID. We have now added to the text a more complete explanation of how microscopic proximity assays such as TSA-seq measure something different from molecular proximity assays such as DamID or NAD-seq. We have added further explanation of how TSA-seq complements molecular proximity assays such as DamID and NAD-seq, allowing us to extract further information than either measurement alone. We also briefly discuss why TSA-seq succeeds for certain nuclear locales using multiple independent markers whereas molecular proximity assays may fail against the same nuclear locales using the same markers. This includes brief discussion from our own experience attempting unsuccessfully to use DamID against nucleoli and nuclear speckles.

      Reviewer #1 (Public Review):

      (1) The weakness of this study lies in the fact that many of the genomic datasets originated from novel methods that were not validated with orthogonal approaches, such as DNA-FISH. Therefore, the detailed correlations described in this work are based on methodologies whose efficacy is not clearly established. Specifically, the authors utilized two modified protocols of TSA-seq for the detection of NADs (MKI67IP TSA-seq) and LADs (LMNB1-TSA-seq).

      We disagree with the statement that the TSA-seq approach and data has not been validated by orthogonal approaches. We have now addressed this point in the revised manuscript text:

      a) We added text to describe how previously FISH was used to validate speckle TSA-seq by demonstrating a residual of ~50 nm between the TSA-seq predicted distance to speckles and the distance measured by light microscopy using FISH:

      "In contrast, TSA-seq measures relative distances to targets on a microscopic scale corresponding to 100s of nm to ~ 1 micron based on the measured diffusion radius of tyramide-biotin free-radicals (Chen et al., 2018). Exploiting the measured exponential decay of the tyramide-biotin free-radical concentration, we showed how the mean distance of chromosomes to nuclear speckles could be estimated from the TSA-seq data to an accuracy of ~50 nm, as validated by FISH (Chen et al., 2018)."

      b) We note that we also previously have validated lamina (Chen et al, JCB 2018) and nucleolar (Kumar et al, 2024) TSA-seq and further validated speckle TSA-seq (Zhang et al, Genome Research 2021) by traditional immuno-FISH and/or immunostaining. The overall high correlation between lamina TSA-seq and the orthogonal lamina DamID method was also extensively discussed in the first TSA-seq paper (Chen et al, JCB 2018). Included in this discussion was description of how the differences between lamina TSA-seq and DamID were expected, given that DamID produces a signal more proportional to contact frequency, and independent of distance from the nuclear lamina, whereas TSA-seq produces a signal that is a function of microscopic distance from the lamina, as validated by traditional FISH.

      c) We added text to describe how the nucleolar TSA-seq previously was validated by two orthogonal methods- NAD-seq and multiplexed DNA immuno-FISH:

      "We successfully developed nucleolar TSA-seq, which we extensively validated using comparisons with two different orthogonal genome-wide approaches (Kumar et al., 2024)- NAD-seq, based on the biochemical isolation of nucleoli, and previously published direct microscopic measurements using highly multiplexed immuno-FISH (Su et al., 2020)."

      d) We have now added panels A&B to Fig. 7 and a new Supplementary Fig. 7 demonstrating further validation of TSA-seq based on showing the high correlation between the microscopically measured distances of many hundreds of genomic sites across the genome from different nuclear locales and TSA-seq scores. As discussed in response #2 below, we have used comparison of distances measured in IMR90 fibroblasts with TSA-seq scores measured in HFF fibroblasts. We would argue therefore that these correlations are a lower estimate and therefore the correlation between microscopic distances and TSA-seq scores would likely have been still higher if we had performed both assays in the exact same cell line.

      (2) Although these methods have been described in a bioRxiv manuscript by Kumar et al., they have not yet been published. Moreover, and surprisingly, Kumar et al., work is not cited in the current manuscript, despite its use of all TSA-seq data for NADs and LADs across the four cell lines.

      The Kumar et al, Communications Biology, 2024 paper is now published and is cited properly in our revision. We apologize for this oversight and for any confusion that our initial omission of this citation may have created. We had been writing this manuscript and the Kumar et al manuscript in parallel and had intended to co-submit. We planned to cross-reference the two at the time we co-submitted, adding the Kumar et al reference to the first version of this manuscript once we obtained a doi from bioRxiv. However, we then submitted the Kumar et al manuscript several months earlier and forgot that we had not added the reference to our first manuscript version.

      (3) Moreover, Kumar et al. did not provide any DNA-FISH validation for their methods.

      As we described in our response to Reviewer 1's comment #1, we had previously provided traditional FISH validation of lamina TSA-seq in our first TSA-seq paper as well as validation by comparison with lamina DamID (Chen et al, 2018).

      We also described how the nucleolar TSA-seq was extensively cross-validated in the Kumar et al, 2024 paper by both NAD-seq and the highly multiplexed immuno-FISH data (Su et al, 2020).

      We note additionally that in the Kumar et al, 2024 paper the nucleolar TSA-seq was further validated by correlating the variations in centromeric association with nucleoli across the four cell lines predicted by nucleolar TSA-seq with the variations observed by traditional immunofluorescence microscopy.

      (4) Therefore, the interesting correlations described in this work are not based on robust technologies.

      This comment was made in reference to the Kumar et al paper not having been published, and, as noted in responses to points #2 and #3, the paper is now published.

      We wanted to note specifically, however, that in our experience TSA-seq has proven remarkably robust in comparison to molecular proximity assays. We've described in our responses to the previous points how TSA-seq has been cross-validated by both microscopy and by comparison with lamina DamID and nucleolar NAD-seq. We note also that in every application of TSA-seq to date, all antibodies that produced good immunostaining showed good TSA-seq results. Moreover, we obtained nearly identical results in every case in which we performed TSA-seq with different antibodies against the same target. Thus anti-SON and anti-SC35 staining produced very similar speckle TSA-seq data (Chen et al, 2018), anti-lamin A and anti-lamin B staining produced very similar lamina TSA-seq data (Chen et al, 2018), anti-nucleolin and anti-POLR1E staining produced very similar DFC/FC nucleolar TSA-seq data (Kumar et al, 2024), and anti-MKI67IP and anti-DDX18 staining produced very similar GC nucleolar TSA-seq data (Kumar et al, 2024).

      This independence of results with TSA-seq to the particular antibody chosen to label a target differs from experience with methods such as ChIP, DamID, and Cut and Run/Tag in which results can differ or be skewed based on variable distance and therefore reactivity of target proteins from the DNA or due to other factors such as non-specific binding during pulldown (ChIP) or differential extraction by salt washes (Cut and Tag).

      Our experience in every case to date is that antibodies that produce similar immunofluorescence staining produce similar TSA-seq results. We attribute this robustness to the fact that TSA-seq is based only on the original immunostaining specificity provided by the primary and secondary antibodies plus the diffusion properties of the tyramide-free radical.

      We've now added the following text to our revised manuscript:

      "As previously demonstrated for both SON and lamin TSA-seq (Chen et al., 2018), nucleolar TSA-seq was also robust in the sense that multiple target proteins showing similar nucleolar staining showed similar TSA-seq results (Kumar et al., 2024); this robustness is intrinsic to TSA-seq being a microscopic rather than molecular proximity assay, and therefore not sensitive to the exact molecular binding partners and molecular distance of the target proteins to the DNA."

      (5) An attempt to validate the data was made for SON-TSA-seq of human foreskin fibroblasts (HFF) using multiplexed FISH data from IMR90 fibroblasts (from the lung) by the Zhuang lab (Su et al., 2020). However, the comparability of these datasets is questionable. It might have been more reasonable for the authors to conduct their analyses in IMR90 cells, thereby allowing them to utilize MERFISH data for validating the TSA-seq method and also for mapping NADs and LADs.

      We disagree with the reviewer's overall assessment that the use of the IMR90 data to further validate the TSA-seq is questionable because the TSA-seq data from HFF fibroblasts is not necessarily comparable with multiplexed immuno-FISH microscopic distances measured in IMR90 fibroblasts.

      In response we have now added panels to Fig. 7 and Supplementary Fig. 7, showing:

      a) There is very little difference in correlation between speckle TSA-seq and measured distances from speckles in IMR90 cells whether we use IMR90 or HFF SON TSA-seq data (R<sup>2</sup> = 0.81 versus 0.76) (new Fig. 7A).

      b) There is also a high correlation between lamina (R<sup>2</sup> = 0.62) and nucleolar (R<sup>2</sup> = 0.73) HFF TSA-seq and measured distances in IMR90 cells. Thus, we conclude that this high correlation shows that the multiplexed data from ~1000 genomic locations does validate the TSA-seq. These correlations should be considered lower bounds on what we would have measured using IMR90 TSA-seq data. Thus, the true correlation between distances of loci from nuclear locales and TSA-seq would be expected to be either comparable or even stronger than what we are seeing with the IMR90 versus HFF fibroblast comparisons.

      c) This correlation is cell-type specific (Fig. 7B, new SFig. 7). Thus, even for speckle TSA-seq, highly conserved between cell types, the highest correlation of IMR90 distances with speckle TSA-seq is with IMR90 and HFF fibroblast data. For lamina and nucleolar TSA-seq, which show much lower conservation between cell types, the correlation of IMR90 distances is high for HFF data but much lower for data from the other cell types. This further justifies the use of IMR90 fibroblast distance measurements as a proxy for HFF fibroblast measurements.

      Thus, we have added the following text to the revised manuscript:

      "We reasoned that the nuclear genome organization in the two human fibroblast cell lines would be sufficiently similar to justify using IMR90 multiplexed FISH data [43] as a proxy for our analysis of HFF TSA-seq data. Indeed, the high inverse correlation (R= -0.86) of distances to speckles measured by MERFISH in IMR90 cells with HFF SON TSA-seq scores is nearly identical to the inverse correlation (R= -0.89) measured instead using IMR90 SON TSA-seq scores (Fig. 7A). Similarly, distances to the nuclear lamina and nucleoli show high inverse correlations with lamina and nucleolar TSA-seq, respectively (Fig. 7A). These correlations were cell type specific, particularly for the lamina and nucleolar distance correlations, as these correlations were reduced if we used TSA-seq data from other cell types (SFig. 7A). Therefore, the high correlation between IMR90 microscopic distances and HFF TSA-seq scores can be considered a lower bound on the likely true correlation, justifying the use of IMR90 as a proxy for HFF for testing our predictions."
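      The correlation values quoted above (R and R<sup>2</sup> between microscopically measured distances and TSA-seq scores) can be illustrated with a minimal sketch. The function names and the toy numbers below are hypothetical and are not taken from the manuscript; the sketch only shows how a Pearson correlation coefficient and its square would be computed for paired per-locus values, with a negative R expected when scores fall as distance grows.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Coefficient of determination for a simple linear relation (R squared)."""
    return pearson_r(xs, ys) ** 2


if __name__ == "__main__":
    # Hypothetical toy values: distance to a nuclear body (microns) per locus
    # and a TSA-seq score for the same loci. Not real data.
    toy_distances = [0.1, 0.3, 0.5, 0.8, 1.2]
    toy_scores = [2.1, 1.4, 0.6, -0.3, -1.5]
    print("R   =", pearson_r(toy_distances, toy_scores))
    print("R^2 =", r_squared(toy_distances, toy_scores))
```

      In this sketch the sign of R carries the direction of the relationship, while R<sup>2</sup> discards it, which is why the same comparison can be reported either as a strong inverse correlation or as a high R<sup>2</sup>.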

      Reviewer #2 (Public Review):

      Weaknesses:

      (1) The experiments are largely descriptive, and it is difficult to draw many cause-and-effect relationships...The study would benefit from a clear and specific hypothesis.

      This study was hypothesis-generating rather than hypothesis-testing in its goal. Our research was funded through the NIH 4D-Nucleome Consortium, which had as its initial goal the development, benchmarking, and validation of new genomic technologies. Our Center focused on the mapping of the genome relative to different nuclear locales and the correlation of this intranuclear positioning of the genome with functions, specifically gene expression and DNA replication timing. By its very nature, this project took a discovery-driven versus hypothesis-driven scientific approach. Our question fundamentally was whether we could gain new insights into nuclear genome organization through the integration of genomic and microscopic measurements of chromosome positioning relative to multiple different nuclear compartments/bodies and their correlation with functional assays such as RNA-seq and Repli-seq.

      Indeed, this study resulted in multiple new insights into nuclear genome organization as summarized in our last main figure. We believe our work and conclusions will be of general interest to scientists working in the fields of 3D genome organization and nuclear cell biology. We anticipate that each of these new insights will prompt future hypothesis-driven science focused on specific questions and the testing of cause-and-effect relationships.

      However, we do want to point out that our comparison of wild-type K562 cells with the LMNA/LBR double knockout was designed to test the long-standing model that nuclear lamina association of genomic loci contributes to gene silencing. This experiment was motivated by our surprising result that gene expression differences between cell lines correlated strongly with differences in positioning relative to nuclear speckles rather than the nuclear lamina. Despite documenting in these double knockout cells a decreased nuclear lamina association of most LADs, and an increased nuclear lamina association of the “p-w-v” fiLADs identified in this manuscript, we saw no significant change in gene expression in any of these regions as compared to wild-type K562 cells. Meanwhile, distances to nuclear speckles as measured by TSA-seq remained nearly constant.

      We would argue that this represents a specific example in which new insights generated by our genomics comparison of cell lines led to a clear and specific hypothesis and the experimental testing of this hypothesis.

      (2) Similarly, the paper would be very much strengthened if the authors provided additional summary statements and interpretation of their results (especially for those not as familiar with 3D genome organization).

      We appreciate this feedback and agree with the reviewer that this would be useful, especially for those not familiar with previous work in the field of 3D genome organization. In an earlier draft, we had included additional summary and interpretation statements in both the Introduction and Results sections. At the start of each Results section, we had also previously included brief discussion of what was known before and the context for the subsequent analysis contained in that section. However, we had thought we might be submitting to a journal with specific word limits and had significantly cut out that text.

      We have now restored this text and, in certain cases, added additional explanations and context.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Figures 1C and D. Please add the units at the values of each y-axis.

      We have done that.

      The representation of Figure 2C lacks clarity and is difficult to understand. The x-axis labeling regarding the gene fraction number needs clarification.

      We've modified the text of the Fig. 2C legend: "Fraction of genes showing significant difference in relative positioning to nuclear speckles (gene fraction, x-axis) versus log2 (HFF FPKM / H1 FPKM) (y-axis);"

      "We next used live-cell imaging to corroborate that chromosome regions close to nuclear speckles, primarily Type I peaks, would show the earliest DNA replication timing." This sentence requires modification as Supplementary Figure 3F does not demonstrate that Type I peaks exhibit the earliest DNA replication timing; it only indicates that the first PCNA foci in S-phase are in proximity to nuclear speckles.

      We've modified the text to: "We next used live-cell imaging to show that chromosome regions close to nuclear speckles show the earliest DNA replication timing; this is consistent with the earliest firing DNA replication IZs, as determined by Repli-seq, aligning with Type 1 peaks that are closely associated with nuclear speckles."

      In Figure 5, the authors employed LaminB1-DamID to quantify LADs in LBR-KO and LMNA/LBR-DKO K562 cells. These are interesting results. However, for these experiments, it is crucial to assess LMNB1 signal at the nuclear periphery via immunofluorescence (IF) to confirm the absence of changes, ensuring that the DamID signal solely reflects contacts with the nuclear lamina. Furthermore, in this instance, their findings should be validated through DNA-FISH.

      Immunostaining of LMNB1 was performed and showed a normal staining pattern as a ring adjacent to the nuclear periphery. Images of this staining were included in the metadata tied to the sequencing data sets deposited on the 4D Nucleome Data portal. We thank the reviewer for bringing up this point, and have added a sentence mentioning this result in the Results Section:

      "Immunostaining against LMNB1 revealed the normal ring of staining around the nuclear periphery seen in wt cells (images deposited as metadata in the deposited sequencing data sets)."

      Because both TSA-seq and DamID have been extensively validated by FISH, as detailed in our previous responses to the public reviewer comments, we feel it is unnecessary to validate these findings by FISH.

      p-w-v-fiLADs should be labelled in Figure 5B.

      We've added labeling as suggested.

      "The consistent trend of slightly later DNA replication timing for regions (primarily p-w-v fiLADs) moving closer to the lamina" is not visible in the representation of the data of Figure 5G.

      We did not make a change as we believed this trend was apparent in the Figure.

      To reduce the descriptive nature of the data, it would be pertinent to conduct H3K9me3 and H3K27me3 ChIP-seq analyses in both the parental and DKO mutant cells. This would elucidate whether p-w-v-fiLADs and NADs anchoring to the nuclear lamina undergo changes in their histone modification profile.

      We believe further analysis of the reasons underlying these shifts in positioning, including such ChIP-seq or equivalent analysis, is of interest but beyond the scope of this publication. We see such measurements as the beginning of a new story but insufficient alone to determine mechanism. Therefore we believe such experiments should be part of that future study.

      The description of Figure 7 lacks clarity. Additionally, it appears that TSA-seq for NADs and LADs may not be universally applicable across all cell types, particularly in flat cells, whereas DamID scores demonstrate less variation across cell lines, as also stated by the authors.

      TSA-seq is a complement to rather than a replacement for either DamID or NAD-seq. TSA-seq reports on microscopic distances whereas both DamID and NAD-seq instead are more proportional to contact frequency with the nuclear lamina or nucleoli, respectively, and insensitive to distances of loci away from the lamina or nucleoli. Thus, TSA-seq provides additional information based on the intrinsic differences in what TSA-seq measures relative to molecular proximity methods such as DamID or NAD-seq. The entire point is that the convolution of the exponential point-spread-function of the TSA-seq with the shape of the nuclear periphery allows us to distinguish genomic regions in the equatorial plane versus the top and bottom of the nuclei. The TSA-seq is therefore highly "applicable" when properly interpreted in discerning new features of genome organization. As we stated in the revised manuscript, the lamina DamID and TSA-seq are complementary and provide more information together than either method alone. The same is true for the NAD-seq and nucleolar TSA-seq comparison, as described in more detail in the Kumar, et al, 2024 paper.

      Introduction:

      The list of methodologies for mapping genomic contacts with nucleoli (NADs) should also include recent technologies, such as Nucleolar-DamID (Bersaglieri et al., PMID: 35304483), which has been validated through DNA-FISH.

      We did not include nucleolar DamID in the mention in the Introduction of methods for identifying differential lamina versus nucleolar interactions of heterochromatin, either from our own collaborative group or from the cited reference, because we did not have confidence in the accuracy of this method in identifying NADs. In the case of the nucleolar DamID from our collaborative group, published in Wang et al, 2021, we later discovered that despite apparent agreement of the nucleolar DamID with a small number of published FISH localizations, the overall correlation of the nucleolar DamID with nucleolar localization was poor. As described in detail in the Kumar et al, 2024 publication, this poor correlation of the nucleolar DamID was established using three orthogonal methods: nucleolar TSA-seq, NAD-seq, and multiplexed immuno-FISH measurements from ~1000 genomic locations. Instead, we found that this nucleolar DamID showed high correlation with lamina DamID. We note that many strong NADs are also LADs, which we think is why validation with only several FISH probes is inadequate to demonstrate overall validation of the approach.

      We could not compare our nucleolar-DamID data in human cells with the alternative nucleolar-DamID results cited by the reviewer, which were obtained in mouse cells. We note that in this paper the nucleolar DamID FISH validation only included several putative NAD chromosome regions and, we believe, one LAD region. However, our initial comparison of the nucleolar DamID cited by the reviewer with unpublished TSA-seq data from mouse ESCs produced by the Belmont laboratory and with NAD-seq data from the Kaufman laboratory shows a similar lack of correlation of the nucleolar DamID signal with nucleolar TSA-seq and NAD-seq, as well as multiplexed immuno-FISH data from the Long Cai laboratory, as we saw in our analysis of our own nucleolar DamID data in human cells.

      We have added explanation concerning the lack of correlation of our nucleolar DamID with orthogonal measurements of nucleolar proximity in the added text (below) to our revised manuscript:

      "Nucleolar DamID instead showed broad positive peaks over large chromatin domains, largely overlapping with LADs mapped by LMNB1 DamID (Wang et al., 2021). However, this nucleolar DamID signal, while strongly correlated with lamin DamID, showed poor correlation with either NAD-seq or nucleolar distances mapped by multiplexed immuno-FISH (Kumar et al., 2024). We suspect the problem is that with molecular proximity assays the output signals are disproportionally dominated by the small fraction of target proteins juxtaposed in sufficient proximity to the DNA to produce a signal rather than the amount of protein concentrated in the target nuclear body."

      Our mention of nucleolar TSA-seq was in the context of why we focused on nucleolar TSA-seq and excluded our own nucleolar DamID. We chose not to discuss the second nucleolar DamID method cited above 1) because it was not appropriate to our discussion of our own experimental approach and 2) also because we cannot yet make a definitive statement of its accuracy for nucleolar mapping.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors start the manuscript by describing the 'radial genome organization' model and contrast it with the 'binary model' of genome organization. It would be helpful for the authors to contextualize their results a bit more with regard to these two different models in the discussion.

      We have added several sentences in the first paragraph of the Discussion to accomplish this contextualization. The new paragraph reads:

      "Here we integrated imaging with both spatial (DamID, TSA-seq) and functional (Repli-seq, RNA-seq) genomic readouts across four human cell lines. Overall, our results significantly extend previous nuclear genome organization models, while also demonstrating a cell-type dependent complexity of nuclear genome organization. Briefly, in contrast to the previous radial model of genome organization, we reveal a primary correlation of gene expression with relative distances to nuclear speckles rather than radial position. Additionally, beyond a correlation of nuclear genome organization with radial position, in cells with flat nuclei we show a pronounced correlation of nuclear genome organization with distance from the equatorial plane. In contrast to previous binary models of genome organization, we describe how both iLAD / A compartment and LAD / B compartment contain within them smaller chromosome regions with distinct biochemical and/or functional properties that segregate differentially with respect to relative distances to nuclear locales and geometry."

      (2) Data should be provided demonstrating KO of LBR and LMNA - immunoblotting for both proteins would be one approach. In addition, it would be helpful to provide additional nuclear morphology measurements of the DKO cells (volume, surface area, volume of speckles/nucleoli, number of speckles/nucleoli).

      We've added a description of the generation and validation of the KO lines:

      "To create LMNA and LBR knockout (KO) lines and the LMNA/LBR double knockout (DKO) line, we started with a parental "wt" K562 cell line, clone #17, expressing an inducible form of Cas9 (Brinkman et al., 2018). The single KO and DKO were generated by CRISPR-mediated frameshift mutation according to the procedure described previously (Schep et al., 2021). The "wt" K562 clone #17 was used for comparison with the KO clones.

      The LBR KO clone, K562 LBR-KO #19, was generated, using the LBR2 oligonucleotide GCCGATGGTGAAGTGGTAAG to produce the gRNA, and validated previously, using TIDE (Brinkman et al., 2014) to check for frameshifts in all alleles as described elsewhere (Schep et al., 2021). The LMNA/LBR DKO, K562 LBR-LMNA DKO #14, was made similarly, starting with the LBR KO line and using the combination of two oligonucleotides to produce gRNAs:

      LMNA-KO1: ACTGAGAGCAGTGCTCAGTG, LMNA-KO2: TCTCAGTGAGAAGCGCACGC.

      Additionally, the LMNA KO line, K562 LMNA-KO #14, was made the same way but starting with the "wt" K562 cell line. Validation was as described above; additionally, for the new LMNA KO and LMNA/LBR DKO lines, immunostaining showed the absence of anti-LMNA antibody signal under confocal imaging conditions used to visualize the wt LMNA staining while the RNA-seq from these clones revealed an ~20-fold reduction in LMNA RNA reads relative to the wt K562 clone."

      As suggested, we also added morphological data for the DKO line in a modified SFig.5.

      (3) The rationale for using LMNB1 TSA-seq and LMNB1 DAMID is not immediately clear. The LMNB1 TSA-seq is more variable across cell types and replicates than the DAMID. Could the authors please compare the datasets a bit more to understand the differences? For example, the authors demonstrate that "40-70% of the genome shows statistically significant differences in Lamina TSA-seq over regions 100 kb or larger, with most of these regions showing little or no differences in speckle TSA-seq scores." If the LMNB1 DAMID data is used for this analysis or Figure 2D, is the same conclusion reached? Also, in Figure 6, the authors conclude that C1 and C3 LAD regions are enriched for constitutive LADs, while C2 and C4 LAD regions are fLADs. This is a bit surprising because the authors and others have previously shown that constitutive LADs have higher LMNB1 contact frequency than facultative LADs (Kind, et al Cell 2015, Figure 3C).

      Indeed, in the first TSA-seq paper (Chen et al, 2018) we did observe that cLADs had the highest LMNB TSA-seq scores; this was for K562 cells with round nuclei in which there is therefore no difference in lamina TSA-seq scores produced by nuclear shape over the entire nucleus.

      However, there are differences between TSA-seq and DamID in terms of what they measure, and we refer the reviewer to the first TSA-seq paper (Chen et al, 2018), which explains these differences in greater depth. This first paper explains how DamID is indeed related to contact frequency but how the TSA-seq instead estimates mean distances from the target, in this case the nuclear lamina. This is because the diffusion of tyramide free radicals from the site of their constant HRP production produces an exponential decay gradient of tyramide free radical concentration at steady state.
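
      The relationship between an exponential decay gradient and an estimated distance can be sketched numerically. This is a minimal illustration only, assuming a single-exponential model; the function names, the amplitude, and the decay length are hypothetical placeholders and are not the calibration reported in Chen et al., 2018.

```python
import math


def tsa_signal(distance_um, amplitude=1.0, decay_length_um=0.5):
    """Relative free-radical concentration at a given distance from the target.

    Assumes a single-exponential steady-state gradient. The amplitude and
    decay length are hypothetical values chosen only for illustration.
    """
    return amplitude * math.exp(-distance_um / decay_length_um)


def estimated_distance(signal, amplitude=1.0, decay_length_um=0.5):
    """Invert the exponential model to recover a distance from a signal."""
    if signal <= 0 or signal > amplitude:
        raise ValueError("signal must lie in (0, amplitude]")
    return decay_length_um * math.log(amplitude / signal)


# Round trip: distance -> modeled signal -> recovered distance.
for d in (0.0, 0.25, 0.5, 1.0):
    s = tsa_signal(d)
    print(f"distance {d:.2f} um -> signal {s:.3f} -> recovered {estimated_distance(s):.2f} um")
```

      Under this assumed model the signal falls monotonically with distance, which is why a score of this kind can be read as a distance estimate rather than as a contact frequency.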

      We have summarized these differences in text we have added to introduce both DamID and TSA-seq in the second Results section:

      "DamID is a well-established molecular proximity assay; DamID applied to the nuclear lamina divides the genome into lamina-associated domains (LADs) versus nonassociated “inter-LADs” or “iLADs” (Guelen et al., 2008; van Steensel and Belmont, 2017). In contrast, TSA-seq measures relative distances to targets on a microscopic scale corresponding to 100s of nm to ~ 1 micron based on the measured diffusion radius of tyramide-biotin free-radicals (Chen et al., 2018)... While LMNB1 DamID segments LADs most accurately, lamin TSA-seq provides distance information not provided by DamID, for example, variations in relative distances to the nuclear lamina of different iLADs and iLAD regions. These differences between the LMNB1 DamID and LMNB TSA-seq signals are also crucial to a computational approach, SPIN, that segments the genome into multiple states based on their varying nuclear localization, including biochemically and functionally distinct lamina-associated versus near-lamina states (Consortium et al., 2024; Wang et al., 2021).

      Thus, lamin DamID and TSA-seq complement each other, providing more information together than either one separately."

      We note that these differences in lamina DamID and TSA-seq are crucial to being able to gain additional information by comparing variations in the lamina TSA-seq for LADs in Figs. 6&7. See our response to point (4) below, for further explanation.

      (4) In 7B/C, the authors show that the highest LMNB1 regions in HFF are equator of IMR90s. However, in Figure 7G, their cLAD score indicates that constitutive LADs are not at the equator. This is a bit surprising given the point above and raises the possibility that SON signals (as opposed to LMNB1 signals) might be more responsible for correlation to localization relative to the equator. Hence, it might be helpful if the authors repeat the analyses in Figures 7B/C in regions with differing LMNB1 signals but similar SON signals (and vice versa).

      Again, this is based on the apparent assumption by the reviewer that DamID and TSA-seq work the same way and measure the same thing. But as explained above in the previous point, this is not true.

      In our first TSA-seq paper (Chen et al, 2018) we showed how we could use the exponential decay point-spread-function produced by TSA, measured directly by light microscopy, to convert sequencing reads from the TSA-seq into a predicted mean distance from nuclear speckles, approximated as point sources. These mean distances predicted from the SON TSA-seq data agreed with measured FISH distances to nuclear speckles to within ~50 nm for a set of DNA probes from different chromosome regions. Moreover, varying TSA staining conditions changed the decay constants of this exponential decay, thus producing differences in the SON TSA-seq signals. By using these different exponential decay functions to convert the TSA-seq scores from these independent data sets to estimated distances from nuclear speckles, we again observed a distance residual of ~50 nm; in this case though this distance residual of ~50 nm represented the mean residual observed genome-wide. This gives us great confidence that the TSA-seq is working as we have modeled it.

      As we mentioned in our response to point 3 above, we did see the highest LMNB TSA-seq signal for cLADs in K562 cells with round nuclei (Chen et al, 2018).

      But as we now show in our simulation performed in this paper for Fig. 7, the observed tyramide free radical exponential decay gradient convolved with the flat nuclear lamina shape produces a higher equatorial LMNB1 TSA-seq signal for LADs at the equatorial plane. We confirmed that LADs with this higher TSA-seq signal were enriched at the equatorial plane by mining the multiplexed IMR90 imaging data. Similar mining of the multiplexed FISH IMR90 data showed localization of cLADs away from the equatorial plane.

      We are not clear about the rationale for what the reviewer is suggesting about SON signals "being more responsible for correlation to localization to the equator". We have provided an explanation for the higher lamina TSA-seq scores for LADs near the equator based on the measured spreading of the tyramide free radicals convolved with the effect of the nuclear shape. This makes a prediction that the observed variation in lamina TSA-seq scores for LADs with similar DamID scores is related to their positioning relative to the equatorial plane as we then validated through our mining of the IMR90 multiplexed FISH data.

      (5) FISH of individual LADs, v-fiLADs, and p-w-v-fiLADs relative to the lamina and speckle would be helpful to understand their relative positioning in control and LBR/LMNA double KO cells. This would significantly bolster the claim that "histone mark enrichments..more precisely revealed the differential spatial distribution of LAD regions...".

      Adequately testing these predictions made from the lamina/SON TSA-seq scatterplots by direct FISH measurements would require measurements from large numbers of different chromosome regions through a highly multiplexed immuno-FISH approach. We are not equipped currently in any of our laboratories to do such measurements and we leave this therefore for future studies.

      Rather our statement is based on our use of TSA-seq analyzed through these 2D scatterplots and should be valid to the degree that our TSA-seq measurements do indeed correlate with microscopy derived distances.

      However, we do now include a demonstration of a high correlation of speckle, lamina, and nucleolar TSA-seq with highly multiplexed immuno-FISH measurement of distances to these locales in a revised Fig. 7. The high correlation shown between the TSA-seq scores and measured distances does therefore add additional support to our claim that the reviewer is discussing, even without our own multiplexed FISH validation.

      (6) "In contrast, genes within genomic regions which in pair-wise comparisons of cell lines show a statistically significant difference in lamina TSA-seq show no obvious trend in their expression differences (Figure 2C).". This appears to be an overstatement based on the left panel of 2D.

      We do not follow the reviewer's point. In Fig. 2C we show little bias in the differences in gene expression between the two cell types for regions that showed differences in lamina TSA-seq. The reviewer is suggesting something otherwise based on their impression, not explicitly stated, of the left panel of Fig. 2D. But we see similar shades of blue extending vertically at low SON values and similar shades of red extending vertically at high SON values, suggesting a correlation of gene expression only with the SON TSA-seq score but not with the LMNB1 TSA-seq score displayed on the y-axis. This is also consistent with the very small and/or insignificant correlation coefficients measured in our linear model relating differences in LMNB1 TSA-seq to differences in expression but the large correlation coefficient observed for SON TSA-seq (Fig. 2E). Thus, we see Fig. 2C-E as self-consistent.
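
      The kind of comparison described here, one coefficient near zero and one large, can be illustrated with a two-predictor least-squares fit. This is only a sketch under stated assumptions: the function name is hypothetical, the numbers are invented toy values (not data from the manuscript), and the fitting procedure is ordinary least squares rather than the specific model used for Fig. 2E.

```python
def fit_two_predictor_ols(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 using centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * t for a, t in zip(c1, cy))
    s2y = sum(b * t for b, t in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Invented toy values: expression change depends only on the SON score change.
son_diff = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
lmnb1_diff = [0.3, -0.8, 0.6, -0.2, 0.9, -0.4]
expr_diff = [2.0 * s for s in son_diff]

b0, b_son, b_lmnb1 = fit_two_predictor_ols(son_diff, lmnb1_diff, expr_diff)
print(f"SON coefficient: {b_son:.3f}, LMNB1 coefficient: {b_lmnb1:.3f}")
```

      In this toy case the fit assigns essentially all of the expression change to the SON predictor, which is the pattern the response describes qualitatively.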

      (7) In the section on "Polarity of Nuclear Genome Organization" - "....Using the IMR90 multiplexed FISH data set [43]...." - The references are not numbered.

      We thank the reviewer for this correction.

      (8) I believe there is an error in the Figure 7B legend. The descriptions of Cluster 1 and 2 do not match those indicated in the figure.

      We again thank the reviewer for this correction.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      The entire study is based on only 2 adult animals, that were used for both the single cell dataset and the HCR. Additionally, the animals were caught from the ocean preventing information about their age or their life history. This makes the n extremely small and reduces the confidence of the conclusions. 

      This statement is incorrect.  While the scRNAseq was indeed performed in two animals (n=2), the HCR-FISH was performed in 3-5 animals (depending on the probe used).  These were different animals from those used for the scRNAseq.  The number of animals used has now been included in the manuscript.

      All the fluorescent pictures present in this manuscript present red nuclei and green signals being not color-blind friendly. Additionally, many of the images lack sufficient quality to determine if the signal is real. Additional images of a control animal (not eviscerated) and of a negative control would help data interpretation. Finally, in many occasions a zoomed out image would help the reader to provide context and have a better understanding of where the signal is localized. 

      Fluorescent photos have been changed to color-blind friendly colors.  Diagrams, arrows and new photos have been included to guide readers to the signal or labeling in cells. Controls for HCR-FISH and labeling in normal intestines have been included.

      Reviewer #2:

      The spatial context of the RNA localization images is not well represented, making it difficult to understand how the schematic model was generated from the data. In addition, multiple strong statements in the conclusion should be better justified and connected to the data provided.

      As explained above we have made an effort to provide a better understanding of the cellular/tissue localization of the labeled cells. Similarly, we have revised the conclusions so that the statements made are well justified.

      Reviewer #3:

      Possible theoretical advances regarding lineage trajectories of cells during sea cucumber gut regeneration, but the claims that can be made with this data alone are still predictive.

      We are conscious that the results from these lineage trajectories are still predictive and have emphasized this in the text. Nonetheless, they are an important part of our analyses that provide the theoretical basis for future experiments.

      Better microscopy is needed for many figures to be convincing. Some minor additions to the figures will help readers understand the data more clearly.

      As explained above we have made an effort to provide a better understanding of the cellular/tissue localization of the labeled cells.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      -  Page 4, line 70-81: if the reader is not familiar with holothurian anatomy and regeneration process, this section can be complicated to fully understand. An illustration, together with clear definitions of mesothelium, coelomic epithelium, celothelium and luminal cells would help the reader. 

      A figure (now Figure 1) detailing the holothurian anatomy of normal and regenerating animals has been added. A figure detailing the intestinal regeneration process has also been included (S1).

      -  Page 5 line 92-104: this paragraph could be shortened. It would be more important to explain what the main question is the Authors would like to answer and why single cell would be the best technique to answer it, than listing previous studies that used scRNA-Seq. 

      The paragraph has been shortened and the focus has been shifted to the question of cellular components of regenerative tissues in holothurians.

      -  Page 6, line 125-127 and line 129-132: this belongs to the method section. 

      This information is now provided in the Materials and Methods section.

      -  Page 11, line 210-217: this belongs to the discussion. 

      This section has now been included in the Discussion.

      -  How many mesenteries are present in one animal? 

      This has now been included as part of Figure S1.

      -  In the methods there are no information about the quality of the dataset and the sequencing and the difference between the 2 samples coming from the 2 animals. How many cells from each sample and which is the coverage? The Authors provided this info only between mesentery and anlage but not between animals. 

      We have added additional information about the sequencing statistics in S4 Fig and S15 Table. Description has also been added in the methods in lines 922-926 under Single Cell RNA Sequencing and Data Analysis section.

      -  The result section "An in-depth analysis of the various cluster..." is particularly long and very repetitive. I would encourage to Authors to remove a lot of the details (list of genes and GO terms) that can be found in the figures and stressed only the most important elements that they will need to support their conclusions. Having full and abbreviated gene names and the long list of references makes the text difficult to read and it is challenging to identify the main point that the Authors are trying to highlight. 

      This section has been abbreviated.

      -  Figure 1: I would suggest adding a graph of holothurian anatomy before and after the evisceration to provide more context of the process we are looking at and remove 1C. 

      Information on the holothurian anatomy has been included in a new Fig 1 and in supplementary figure S1

      -  Figure 2: I would suggest removing this figure that is redundant with Figure 3 and several genes are not cluster specific. Figure 3 is doing a better job in showing similar concepts. 

      Figure 2 was removed and placed in the Supplement section. 

      - In figure 3 how were the 3 cell types defined? Was this done manually or through a bioinformatic analysis? 

      The cell definition was done following the analysis of the highly expressed transcripts and comparisons to what has been shown in the scientific literature.

      -  Figure 2O shows that one of the supra-cluster is made of C2, C7, C6 and C10. This contradicts the text page 9, line 195. 

      The transcript chosen for this figure gives the wrong idea that these 4 clusters are similar. We have now addressed this in the manuscript.

      -  Figure 4A and 4C: if these are representing a subset of Figure 3, they should be removed in one or the other. The same comment is valid also for Figures 5, 6 and 7. In general the manuscript is very redundant both in terms of Figures and text. 

      These are indeed subsets of Fig 3 that were added with the purpose of clarifying the findings, however, in view of the reviewer’s comment we have deleted the redundant information from all figures.

      -  Figure 9: since the panels are not in order, it is difficult to follow the flow of the figure.

      -  All UMAP should have the number of the cluster on the UMAP itself instead of counting only on the color code in order to be color-blind friendly.

      The figure has been modified and clusters are now identified in the UMAP by their number.

      -  Figure S1F seems acquired in very different conditions compared to the other images in the same figure. 

      Fig S1F (now S2 Fig) is an overlay of fluorescent immunohistochemistry (UV light detected) with “classical” toluidine blue labeling (visible light detected).  This has now been explained in the figure legend.

      -  Table S7 is lacking some product numbers. 

      The toluidine blue product number has now been added to the table.  The antibodies that lack a product number are antibodies generated in our lab and described in the references provided.

      -  The discussion is pretty long and partially redundant with the result section. I would encourage the Authors to shorten the text and shorten paragraphs that have repeating information.

      -  It might be out of the scope of the Authors but the readers would benefit from having a manuscript that focuses more on the novel aspects discovered with the single-cell RNA-Seq and then have a review that will bring together all the literature published on this topic and integrating the single-cell data with everything that is known so far.

      We have tried to shorten the discussion by eliminating redundant text.

      Reviewer #2 (Recommendations For The Authors): 

      -  An intriguing finding is the lack of significant difference in the cell clusters between the anlage and mesentery during regeneration. This discovery raises important questions about the regenerative process. The authors should provide a more detailed explanation of the implications of this finding. For example, does it suggest that both organs contribute equally to the regenerated tissues? 

      The lack of significant differences in the cell clusters between the anlage and the mesentery is somewhat surprising but can be explained by two different facts. First, we have previously shown that many of the cellular processes that take place in the anlage, including cell proliferation, apoptosis, dedifferentiation and ECM remodeling occur in a gradient that begins at the tip of the mesentery where the anlage forms and extends to various degrees into the mesentery.  Similarly, migrating cells move along the connective tissue of the mesentery to the anlage.  Thus, there is no clear partition of the two regions that would account for distinct cell populations associated with the regenerative stage.  Second, the two cell populations that would have been found in the mesentery but not in the regenerating anlage, mature muscle and neurons, were not dissociated by our experimental protocol in a way that allowed their sequencing.  Our current experiments are being done using single nuclei RNA sequencing to overcome this hurdle. This has now been included in the discussion.

      -  Proliferating cells are obviously important to the study of regeneration as it is assumed these form the regenerating tissue. The authors describe cluster 8 as the proliferative cells. Is there evidence of proliferation in other cell types or are these truly the only dividing cells? Is c8 of multiple cell types but the clustering algorithm picks up on the markers of cell division i.e. what happens if you mask cell division markers - does this cluster collapse into other cluster types? This is important as if there is only one truly proliferating cell type then this may be the origin of the regenerative tissues and is important for this study to know this. 

      As the reviewer highlights, we also believe this to be an important aspect to discuss. We have addressed this in the manuscript discussion with the following: “Our data suggest that there appears to be a specific population of only proliferative cells (C8) characterized by a large number of cell proliferation genes, which can be visualized by the top genes shown in Fig 3. These cell proliferation genes are specific to C8, with minimum representation in other populations. Interestingly, as mentioned before C8 expresses at lower levels many of the genes of other coelomic epithelium populations. Nevertheless, even if we mask the top 38 proliferation genes (not shown), this cluster is maintained as an independent cluster, suggesting that its identity is conferred by a complex transcriptomic profile rather than only a few proliferation-related genes. Therefore, the identity and potential role of C8 could be further described by two distinct alternatives: (1) cells of C8 could be an intermediate state between the anlage precursor cells (discussed below) and the specialized cell populations or (2) cells of C8 are the source of the anlage precursor populations from which all other populations arise. The pseudotime data is certainly complex and challenging to interpret with our current dataset, yet the RNA velocity analysis shown in Fig 11B would suggest that cells of C8 transition into the anlage precursor populations, rather than being an intermediate state. This is also supported by the Slingshot pseudotime analysis that incorporates C8 (S13 Fig).

      Nevertheless, additional experiments are needed to confirm this hypothesis.”
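
      The logic of the masking check quoted above can be sketched with a toy example. This is an illustration under loud assumptions: the gene names and expression values are invented placeholders, the helper functions are hypothetical, and a real analysis would re-run the full clustering pipeline after masking rather than compare two mean profiles.

```python
import math


def mask_genes(profile, genes_to_mask):
    """Return a copy of a gene -> expression mapping without the masked genes."""
    return {g: v for g, v in profile.items() if g not in genes_to_mask}


def euclidean(a, b):
    """Euclidean distance between two profiles over the union of their genes."""
    genes = set(a) | set(b)
    return math.sqrt(sum((a.get(g, 0.0) - b.get(g, 0.0)) ** 2 for g in genes))


# Invented mean expression profiles for two clusters (placeholder names and values).
c8 = {"prolif_1": 9.0, "prolif_2": 8.0, "other_1": 4.0, "other_2": 1.0}
c5 = {"prolif_1": 0.5, "prolif_2": 0.2, "other_1": 1.0, "other_2": 5.0}
proliferation_genes = {"prolif_1", "prolif_2"}

before = euclidean(c8, c5)
after = euclidean(mask_genes(c8, proliferation_genes),
                  mask_genes(c5, proliferation_genes))
print(f"distance before masking: {before:.2f}, after masking: {after:.2f}")
```

      If the two profiles remain well separated once the proliferation genes are removed, the separation cannot be attributed to those genes alone, which is the reasoning behind the statement that the cluster persists after masking.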

      -  The schematic model presented in Fig 10 is essential for clarifying the paper's findings and will provide a crucial baseline model for future research. However, the comparison of the data shown in the HCR figures with the schematic is challenging due to the lack of spatial context in the HCR figures. The authors should find a way to provide better context in the figures, such as providing two-color in situ images to compare spatial relationships of cell types and/or including lower resolution and side-by-side fluorescent and bright field images if possible. 

      The figure has been modified to explain the spatial arrangement of the tissues.

      The authors make several strong statements in the discussion that weren't well connected to the findings in the data. Specifically: 

      “Regardless of which cell population is responsible for giving rise to the cells of the regenerating intestine, our study reveals that the coelomic epithelium, as a tissue layer, is pluripotent.” 

      This has now been expanded to better explain the statement.

      738 “…we postulate that cells from C1 stand as the precursor cell population from which the rest of the cells in the coelomic epithelium arise”. 

      This has now been expanded to better explain the statement.

      748 “differentiation: muscle, neuroepithelium, and coelomic epithelium cells. We also propose the presence of undifferentiated and proliferating cell populations in the coelomic epithelia, which give rise to the cells in this layer…”

      This has now been expanded to better explain the statement.

      777 “amphibians, the cells of the holothurian anlage coelomic epithelium are proliferative undifferentiated cells and originated via a dedifferentiation process…”

      This has now been expanded to better explain the statement.

      Reviewer #3 (Recommendations For The Authors): 

      Specific questions: 

      - Is there any way to systematically compare these cells to evolutionarily-diverged cells in distant relatives to sea cucumbers? Or even on a case-by-case basis? For example, is there evidence for any of these transitory cell types to have correlate(s) in vertebrate gut regeneration? 

      This is a most interesting question but one that is perhaps a bit premature to answer for multiple reasons.  First, most of the studies in vertebrates focus on the regeneration of the luminal epithelium, a layer that we are not studying in our system since it appears later in the regeneration process.  Second, there is still too little data from adult echinoderms to fully comprehend which cells are orthologous to vertebrate cell types. Third, we are only analyzing one regenerative stage.  It is our hope that this is just the start of a full description of what cell types/stages are found and how they function in regeneration and that this will lead us to identify the cellular orthologues among animal species.

      Major revisions: 

      - If lineage tracing is within the scope of this paper, it would provide more definitive evidence to the conclusions made about the precursor populations of the regenerating anlage. 

      Response:  This is certainly one of the next steps, however at present, it is not possible due to technical limitations.

      Minor revisions: 

      - Line 47: "for decades" even longer! Could the authors also cite some other amphibians, such as other salamanders (newts) and larval frogs? 

      References have been added.

      - Line 85: "specially"-could authors potentially change to "specifically" 

      Corrected

      - Line 122: Authors should add the full words of what these abbreviations stand for in the caption for Figure 1 or in Figure 1A itself. 

      Corrected

      - Lines 153: What conclusions are the authors trying to make from one type of tubulin presence compared to the others? It's unclear from the text. 

      The authors are not trying to reach any particular conclusion.  They are just stating what was found using several markers, and the possibility that what might be viewed at first glance as a single cell population might be more heterogeneous.  Although the tubulin-type information might not be relevant for the conclusions in the present manuscript, it might be important for future work on the cell types involved in the regeneration process.

      - Line 226: Could the authors clarify if "WNT9" is "WNT9a". Figure 3 lists WNT9a but authors refer to WNT9 in the text. 

      The gene names in Fig 3 are based on the human identifiers. H. glaberrima only has one sequence of Wnt9 (Auger et al. 2023) and this sequence shares the highest similarity to human Wnt9a, thus the name in the list. We have now identified the gene as Wnt9 to avoid confusion.

      - Lines 236-237: Can authors rule out that some immune cells might infiltrate the mesenchymal population? 

      No, this cannot be ruled out.  In fact, we believe that most of the immune cells found in our scRNA-seq are indeed cells that have infiltrated the anlage and are part of the mesenchyma.  This has been reported by us previously (see Garcia-Arraras et al. 2006). We have now included this in the text.

      - Line 452-453: The over-representation of ribosomal genes not shown. Would it be possible to show this information in the supplementary figures? 

      The sentence has been modified; the data is being prepared as part of a separate publication that focuses on the ribosomal genes.

      - Line 480: Could authors clarify if it's WNT9a or just WNT9?

      It is indeed Wnt9. See previous response above.

      - Line 500: In future experiments, it would be interesting to compare to populations at different timepoints in order see how the populations are changing or if certain precursors are activated at different times. 

      We fully agree with the reviewer. These are ongoing experiments or are part of new grant proposals.

      - Line 567-568: Choosing 9-dpe allowed for 13 clusters, but do authors expect a different number of clusters at different timepoints as things become more terminally differentiated? 

      Definitely, we believe that clusters related to the different regenerative stages of cells can be found by looking at earlier or later regeneration stages of the organ.  A clear example is that if the experiment is done at 14-dpe, when the lumen is forming, cells related to luminal epithelium populations will appear. It is also possible that different immune cells will be associated with the different regeneration stages.

      - Line 653: References Figure 10D (not in this manuscript). Are authors referring to only 1D or 9D or an old draft figure number? 

      As the reviewer correctly points out, this was a mistake where the reference is to a previous draft. It has now been corrected.

      - Line 701: "our study reveals that the coelomic epithelium, as a tissue layer, is pluripotent." Phrasing may be better as referring to the cell population making up the tissue layer as pluripotent/multipotent or that the cells it contains would likely be pluripotent or multipotent. Additionally, lineage tracing may be needed to definitively demonstrate this. 

      This has been modified.

      - Line 808: The authors may make a more accurate conclusion by saying that the characteristics are similar to blastemas or behave like a blastema rather than it is blastema. There is ambiguity about the meaning of this term in the field, but most researchers seem to currently have in mind that the "blastema" definition includes a discrete spatial organization of cells, and here these cells are much more spread out. This could be a good opportunity for the authors to engage in this dialogue, perhaps parsing out the nuances of what a "blastema" is, what the term has traditionally referred to, and how we might consider updating this term or at least re-framing the terminology to be inclusive of functions that "blastemas" have traditionally had in the literature and how they may be dispersed over geographical space in an organism more so than the more rigid, geographically-restricted definition many researchers have in mind. However, if the authors choose to elaborate on these issues, those elaborations do belong in the discussion, and the more provisional terminology we mention here could be used throughout the paper until that element of the revised discussion is presented. We would welcome the authors to do this as a way to point the field in this direction as this is also how we view the matter. For example, some of the genes whose expression has been observed to be enriched following removal of brain tissue in axolotls (such as kazald2, Lust et al.), are also upregulated in traditional blastemas, for instance, in the limb, but we appreciate that the expression domain may not be as localized as in a limb blastema. 
      Additionally, since there is now evidence that some aspects of progenitor cell activation even in limb regeneration extend far beyond the local site of amputation injury (Johnson et al., Payzin-Dogru et al.), there is an opportunity to connect the dots and make the claim that there could be more dispersion of "blastema function" than previously appreciated in the field. Diving a bit more into these nuances may also enable better conceptual framework of how blastema function may evolve across vast evolutionary time and between different injury contexts in super-regenerative organisms.

      We have followed the reviewer’s suggestion and stated that the holothurian anlage behaves as a blastema. Though we would love to elaborate on the blastema topic, as suggested by the reviewer, we believe that it would extend the discussion too much and that the topic might be better served in a different publication.

      - In the discussion, it would be important not to leave the reader with the impression that all amphibian blastema cells originate via dedifferentiation. This is not the case. For example, in axolotls (Sandoval-Guzman et al.) and in larval/juvenile newts, muscle progenitors within the blastema structure have been shown to originate from muscle satellite cells, a kind of stem cell, in stump tissues (while adult newts use dedifferentiation of myofibers to generate muscle progenitors in the blastema). Most cell lineages simply have not been evaluated in the level of detail that would be required to definitively conclude one way or the other, and the door is open for a more substantial contribution from stem cell populations than previously appreciated especially because new tools exist to detect and study them. Providing the reader with a more nuanced view of this situation will not negatively impact the findings in this paper, but it will show that there is biological complexity still waiting to be discovered and that we don't have all the answers at this point. 

      This has now been corrected. 

      Figures: Overall, the figures need minor work. 

      - Figure 1A: Can the authors draw a smaller, full-body cartoon and feature the current high-mag cartoon as an inset to that? Can they label the axes and make it clear how the geometry works here?

      Fig 1 has been re-done and now is split into Fig 1 and Fig 2.

      - Figure 1B: Can the authors label the UMAP with cluster identities on the map itself? This will make it easier to identify each cluster (especially to make sure cluster 11 is easier to find). 

      This has been corrected.

      - Figure 2: Could the authors put boxes/clearly distinguish panel labels around each cluster (AO), so that there are clear boundaries? 

      Fig 2 has been moved to the Supplement, following another reviewer’s recommendation.

      - "Gene identifiers starting with "g" correspond to uncharacterized gene models of H. glaberrima." - The sentence is from another figure caption but this figure would benefit from having this sentence in the figure caption as well. 

      This has been added to other figures as suggested.

      - Figure 3A: Can the authors potentially bold, highlight, or underline genes you discuss in text, so it's easier for the reader to reference? 

      This has been added as suggested.

      - Figure 3C: Can the authors please label the cell types directly on the UMAP here as well? 

      The changes were made following the reviewer’s recommendation.

      - Figure 4D-E: There's not much context here to determine if this HCR-FISH validation can tell us anything about these cells besides some of them appear to be there. Do authors expect the coelomocyte morphology to look different in regenerating/injured tissue versus normal animals? Can the authors provide some double in situs, as well as some lower-magnification views showing where the higher-magnification insets are located? Is there any spatial pattern to where these cells are found? Counter stains would be helpful. 

      - Figure 6C: If clusters C5, C8, C9 are part of the coelomic epithelium, then authors could show a smaller diagram above with blue and grey to show types and then show clusters separately to help get their point across better. 

      - Figure 6G: This image appears to have high background- would it be possible for authors to repeat phalloidin stain or reimage with a lower exposure/gain. Additionally, imaging with Zstacks would help to obtain maximum intensity projections. It would greatly aid the reader if each image was labeled with HCR probes/antibodies that have been applied to the sample. 

      - Figure 7E: The cells appear to be out of focus and have high background. Additionally, they are lacking the speckled appearance expected to be seen with HCR-FISH. Would it be possible for authors to collect another image utilizing z-stacks? 

      HCR-FISH figures identifying the gene expression characteristic of cell clusters have been modified following the reviewer’s concerns. The changes include:

      (1) Additional clusters have been verified with probes to gene identifiers. These include clusters 8, 9 and 12.

      (2) Redundant information has been removed.

      (3) Colors have been changed to make figures friendlier to color-impaired readers.

      (4) Spatial context has been added or identified.

      (5) In some cases, improved photos have been added.

      (6) Better labels have been included.

      (7) When necessary, individual photos used for the overlay have been included.

      - Figure 9A: Could authors add cluster labels onto UMAP directly? 

      This change was made to Fig 2A. The UMAP in Fig 9A is the same and is used only as a reference for the subset.

      - Figure 10: It could be useful if authors put a small map of the sea cucumber like in other images so that readers know where in the anlage this zoomed in model represents. 

      Added as suggested by the reviewer.

      - Supplementary figure 1F: Could authors add an arrow to the dark cell that's being pointed out? 

      Change made as suggested by the reviewer.

      - Supplementary figure 1: Could authors label clearly what color is labeled with what marker? 

      Change made as suggested by the reviewer.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The authors present valuable findings on trends in hind limb morphology throughout the evolution of titanosaurian sauropod dinosaurs, the land animals that reached the most remarkable gigantic sizes. The solid results include the use of 3D geometric morphometrics to examine the femur, tibia, and fibula to provide new information on the evolution of this clade and understand the evolutionary trends between morphology and allometry. Further justification of the ontogenetic stages of the sampled individuals would help strengthen the manuscript's conclusions, and the inclusion of additional large-body mass taxa could provide expanded insights into the proposed trends.

      Most of the analyzed specimens, especially those of the smaller taxa, are adult or subadult individuals. None exhibit features that may indicate juvenile status. However, we lack paleohistological information, which would be a stronger indicator of the ontogenetic status of each individual, and some of the operational taxonomic units used in the study are based on the mean shape of all the sampled specimens.

      Current information on morphological differences between adult and subadult or juvenile specimens indicates that even early juvenile specimens may share the same morphological features and overall morphology as adults (e.g., see Curry-Rogers et al., 2016; Appendix S3). In Appendix S3, we included a comprehensive analysis of the impact of juvenile specimens as one aspect of the intraspecific variability that may alter our results.

      Public Reviews:

      Reviewer #1:

      Weaknesses:

      Several sentences throughout the manuscript could benefit from citations. For example, the discussion of using hind limb centroid size as a proxy for body mass has no citations attributed. This should be cited or described as a new method for estimating body mass with data from extant taxa presented in support of this relationship. This particular instance is a very important point to include supporting documentation because the authors' conclusions about evolutionary trends in body size are predicated on this relationship.

      We address this issue in the text (Lines 32 and 64). Centroid size seems a good indicator because it reflects the overall size of the entire hind limb, and the lengths of the femur and tibia are each well correlated with body size and mass. Also, because we use few landmarks, all of them strictly type I or II, with curves of semilandmarks bounded by them, centroid size is not sensitive to differences in landmark number across our sample (centroid size depends on the number of landmarks used in a study as well as on the physical dimensions of the specimens).
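
      As an illustration of the point above, the following is a minimal sketch of how centroid size is conventionally computed (the square root of the summed squared distances of all landmarks from their centroid). It is not the code from Appendix S5; the function name and the toy coordinates are hypothetical. It shows that centroid size scales with physical size but also changes when the number of points changes, which is why a fixed landmark configuration matters.

```python
import math


def centroid_size(landmarks):
    """Square root of the summed squared distances of landmarks to their centroid."""
    n = len(landmarks)
    dims = len(landmarks[0])
    centroid = [sum(p[d] for p in landmarks) / n for d in range(dims)]
    return math.sqrt(
        sum((p[d] - centroid[d]) ** 2 for p in landmarks for d in range(dims))
    )


# Hypothetical toy configuration (corners of a unit square), not specimen data.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]

# Same shape at twice the physical size: centroid size doubles.
scaled = [tuple(2.0 * c for c in p) for p in square]

# Same shape and size, but every point counted twice: centroid size still changes.
resampled = square + square

print(centroid_size(square))
print(centroid_size(scaled))
print(centroid_size(resampled))
```

      In this toy case the resampled configuration has a larger centroid size than the original even though the object itself is unchanged, so comparisons are only meaningful when every specimen carries the same landmark and semilandmark scheme.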

      We have repeated all the analyses using other proxies, namely femoral length and body mass estimated with the methods of Campione & Evans (2020) and Mazzetta et al. (2004). The comprehensive description of the method is in Appendix S2, the alternative analyses can be accessed in Appendices S3 and S4, and the code for the alternative analyses can be accessed in the modified Appendix S5. All offer results similar to those obtained in our analyses with body size proxied by the centroid size of the hind limb landmark configuration.

      An additional area of concern is the lack of any discussion of taphonomic deformation in Section 3.3 Caveats of This Study, the results, or the methods. The authors provide a long and detailed discussion of taphonomic loss and how this study does a good job of addressing it; however, taphonomic deformation to specimens and its potential effects on the ensuing results were not addressed at all. Hedrick and Dodson (2013) highlight that, with fossils, a PCA typically includes the effects of taphonomic deformation in addition to differences in morphology, which results in morphometric graphs representing taphomorphospaces. For example, in this study, the extreme negative positioning of Dreadnoughtus on PC 2 (which the authors highlight as "remarkable") is almost certainly the result of taphonomic deformation to the distal end of the holotype femur, as noted by Ullmann and Lacovara (2016).

      We included a brief commentary in the Caveats of This Study (Line 467) and greatly expanded on this issue in Appendix S3. We followed the methodology proposed by Lefebvre et al. (2020) to discuss the effects of taphonomic deformation in the shape analyses.

      Our shape variables (PCs obtained from the shape PCA) should be viewed as taphomorphospaces, as Hedrick and Dodson, as well as the reviewer, point out.

      The analysis of the effects of taphonomy or errors induced by the landmark estimation method indicates that Dreadnoughtus schrani is one of the few sampled taxa that may have a noticeable impact on our analyses due to lithostatic deformation. Other taxa, like Mendozasaurus neguyelap or Ampelosaurus atacis, may also induce some alterations to the PCs. In general, the PCs slightly altered by taphonomy (D. schrani being the only sauropod that may alter an entire PC, namely PC2) did not exhibit phylogenetic signal and account for a small proportion of the sample variance.

      The authors investigated 17 taxa and divided them into 9 clades, with only Titanosauria and Lithostrotia including more than two taxa (and four clades are only represented by one taxon). While some of these clades represent the average of multiple individuals, the small number of plotted taxa can only weakly support trends within Titanosauria. If similar general trends could be found when the taxa are parsed into fewer, more inclusive clades, it would support and strengthen their claims. Of course, the authors can only study what is preserved in the fossil record, and titanosaurian remains are often highly fragmentary; these deficiencies should therefore not be held against the authors. They clearly put effort and thought into their choices of taxa to include in this study, but there are limitations arising from this low sample size that inherently limit the confidence that can be placed on their conclusions, and this caveat should be more clearly discussed. Specifically, the authors note that their dataset contains many lithostrotians, but they do not discuss unevenness in body size sampling. As neither their size-category boundaries nor the taxa which fall into each of them are clearly stated, the reader must parse the discussion to glean which taxa are in each size category. It should be noted that the authors include both Jainosaurus and Dreadnoughtus as 'large' taxa even though the latter is estimated to have been roughly five times the body mass of the former, making Dreadnoughtus the only taxon included in this extreme size category. The effects that this may have on body size trends are not discussed. Additionally, few taxa between the body masses of Jainosaurus and Dreadnoughtus have been included even though the hind limbs of several such macronarians have been digitized in prior studies (such as Diamantinasaurus and Giraffatitan; Klinkhamer et al. 2018).
Also, several members of Colossosauria are more similar in general body size to Dreadnoughtus than Jainosaurus, but unfortunately, they do not preserve a known femur, tibia, and fibula, so the authors could not include them in this study. Exclusion of these taxa may bias inferences about body size evolution, and this is a sampling caveat that could have been discussed more clearly. Future studies including these and other taxa will be important for further evaluating the hypotheses about macronarian evolution advanced by Páramo et al. in this study.

      Sadly, we could not include some larger-sized titanosaurian sauropods. As the reviewer points out, the lack of larger sauropods among the sampled taxa may hinder our results, as the “large-bodied” category is filled with some mid-sized taxa and the aforementioned Dreadnoughtus schrani, which is five times larger than some of them. We tried to include Elaltitan lilloi, which was digitized for this study and included in preliminary analyses, but its fragmentary status greatly increased the error of the estimation method, as only the proximal third or half of the femur is preserved for this taxon. Therefore, we opted to exclude it from our database.

      Other taxa suggested by the reviewer were not readily available to the authors at the time this study was conducted, and including them now might increase the possible bias of our study. Giraffatitan brancai is a Late Jurassic brachiosaurid, which would again increase the number of early-branching titanosauriforms with large body masses, while most of the smaller taxa sampled are recovered as deeply-branching macronarians (including Diamantinasaurus matildae, had we also included it). Future analyses may include a wider sample of mid- to large-bodied titanosaurians, especially lithostrotians, as well as some colossosaurs like Patagotitan mayorum.

      Reviewer #1 (Recommendations For The Authors):

      These are all minor comments that would improve the manuscript.

      - There are a few typos throughout the manuscript such as: line 70 should be 2016 and line 242 should be forelimb.

      Corrected.

      - To me, the most interesting aspect of your study is the diversity and trends recovered in titanosaurian subclades and I would highlight this, not gigantism, in the title if you choose to revise the title.

      It has been addressed. The specificity of some of the tests and the implications for the acquisition of the spread limb posture and gigantism in early-branching taxa are important nonetheless, so we think that gigantism may remain in the title.

      - The abstract should provide more details on the results such as none of the listed trends were statistically significant.

      Many of the trends exhibit phylogenetic signal, but not the allometric components. We have briefly addressed them.

      - Several sentences in the manuscript need citations such as: line 48 the reference to other megaherbivores, line 66 the discussion of poor understanding of the relationship of wide gauge posture and gigantism, and the use of centroid size as an estimate of body mass (see Public Review).

      We changed line 66 to improve the focus on the current state of the art regarding the hypothesized relationship between arched limbs and the increase in body size. We included a section on centroid size as a proxy (due to the good correlation between femur and tibia length and body mass) and the caveats of using it. We also expanded in Appendix S2 on the use of centroid size and the alternative models.

      - With titanosaur evolution, you mention that they are adapting to new niches and topography (line 64). What support is there for this versus they are adapting to be more successful in their current environment?

      Noted. We have changed the phrase to refer to improved efficiency in exploiting inland environments, as they could be either opening new inland niches or adapting better to existing inland niches that were already exploited by less deeply-branching sauropods. However, testing this is beyond the scope of the current work.

      - Line 384-385: the discussion of Rapetosaurus should mention that it is a juvenile and some studies have suggested that titanosaur limbs grow allometrically.

      We have included a short sentence. Whether or not Rapetosaurus krausei exhibits allometric growth may not greatly change the discussion, except perhaps to exclude it as morphologically convergent with Lirainosaurus and Muyelensaurus. But if so, it would be further proof that small-sized titanosaurs exhibit the robust skeleton expected in the giant titanosaurs.

      - I would consider addressing the question of if we are certain enough in our understanding of titanosaurian phylogeny to rule out homology, especially when you discuss the uncertainty of the placement of specific taxa. Also, Diamantinasaurus is not the only titanosaur that has been proposed as a member of both basal and more derived subclades (e.g., Dreadnoughtus).

      We tried to take a more conservative approach. We cannot fully rule out that some of the features observed in the sampled deeply-branching lithostrotians, especially saltasauroids, are present in the entire somphospondylan lineage. However, none of the less deeply-branching or early-branching titanosaurs exhibit this kind of morphology. Recent studies propose the possibility that entire groups included in this study, like the Colossosauria, change their position in the phylogeny. Nevertheless, even considering the debated phylogenetic position of Diamantinasaurus or Dreadnoughtus, the inclusion of Colossosauria within the saltasauroids, and the inclusion of the Ibero-Armorican lithostrotians as putative saltasaurids (Mocho et al. 2024), we did not notice any relevant differences in our conclusions about hind limb arched morphology or about size. Distal hind limb overall robustness should indeed be addressed in the light of shifts in phylogenetic position, and future work should include some interesting sauropods like Diamantinasaurus or expand the sample of large-sized Colossosauria or early-branching somphospondyls, as it may have profound implications for the morphofunctional adaptations to specific feeding niches, e.g., see current hypotheses about rearing as mentioned in Bates et al. (2016), Ullmann et al. (2017) or Vidal et al. (2020). We did not have enough information to conclude the presence of any plesiomorphic condition or analogous feature with our current sample and the debated titanosaurian phylogeny.

      - I understand this is not standard in the field, but your study provides the opportunity to conduct sensitivity testing of the effects of cartilage thickness and user articulation of the bones on PCA results. This would be an insightful addition to the field of GMM.

      We are currently developing such a comprehensive analysis, along with several other implications for our past results. However, we feel that it is beyond the scope of the current study. We appreciate the suggestion nonetheless, as it would be a sensitivity test of the impact of several of our assumptions on the final results, which is often not considered.

      - In Figure 1, if all the limbs were arranged the same way it would be easier to interpret. Consider flipping panels B and D to match A and C.

      Accepted.

      - In Figures 2-4, the views in C should be labeled in the figure or caption. Oceanotitan is also in the PCA plot but not included in the figure caption. Also, consider changing the names to represent the paraphyletic groupings you are using instead of formal clade names. For example, change 'Titanosauria' to 'Basal Titanosaurs' to reflect that it is not including all titanosaurs in the sample.

      Changes accepted for the shape PCA results. The informal (i.e., paraphyletic) terms such as “Basal Titanosaurs” were only used in the shape analyses; in the RMA, Titanosauria (and other more inclusive groups) were used as natural groups. Each partial RMA model is based on a sample of all the taxa that are included within that particular clade (e.g., Titanosauria includes both Dreadnoughtus and Saltasaurus; Lithostrotia excludes the former).
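
      For readers unfamiliar with the fitting step, the sketch below assumes that "RMA" denotes reduced major axis (standardized major axis) regression; the response does not spell out the acronym, and this is not the code from Appendix S5. The function name and the toy numbers are hypothetical. Each partial model would be fitted in the same way on the subset of taxa belonging to a given clade.

```python
import statistics


def rma_fit(x, y):
    """Reduced major axis fit: slope = sign(covariance) * sd(y) / sd(x)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sd_x, sd_y = statistics.stdev(x), statistics.stdev(y)
    # Sample covariance, used here only to give the slope its sign.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    sign = 1.0 if cov >= 0 else -1.0
    slope = sign * sd_y / sd_x
    # The fitted line passes through the bivariate mean.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy values (e.g., log size against a shape score), not study data.
log_size = [1.0, 2.0, 3.0, 4.0]
shape_score = [2.0, 4.0, 6.0, 8.0]

slope, intercept = rma_fit(log_size, shape_score)
print(slope, intercept)
```

      Unlike ordinary least squares, this fit treats both variables as subject to error, which is a common reason for preferring it in allometric comparisons.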

      - I am concerned that centroid size does not scale evenly across the wide-ranging body mass of titanosaurs. I do not know if this affects your size trends or their significance, but as I mentioned above Dreadnoughtus is much bigger than most of the taxa included and that isn't as drastically apparent in centroid size (in Figure 5) as it is when taxa are plotted by body mass.

      The main problem with the centroid size of the hind limb is the shift in the body plan of deeply-branching titanosaurs, as the center of mass is displaced toward the anterior portion of the body, which has been attributed to a large development of the forelimb region (e.g., Bates et al. 2016). However, this would only increase the effects of the phyletic body size reduction, as smaller taxa tend to have a 1:1 forelimb to hind limb ratio, e.g., from our past analyses as in Páramo et al. (2019), and the sacrum is not as beveled as in earlier somphospondyls, e.g., Vidal et al. (2020). The role of the low-browsing feeding habits of deeply-branching lithostrotians should be explored elsewhere, as it may be the main driving force of this effect. Our point is that the proxy used may have a slight offset due to some high-browsing giant early-branching titanosaurs, which have a greater development of the cranial region that increases their body size and mass beyond our bare-minimum estimation based on the hind limb region. But, overall, this offset is assumed to be low. We repeated the analyses with femoral length as a proxy of body size and with a mass estimation, including the quadratic equation based on both humeral and femoral lengths, and the results remain similar. Another problem that arises with the use of centroid size is the way it is calculated, but as we used the same number of landmarks and curve semilandmarks in all specimens, all of them bounded by anatomical features, it remains comparable at least within our sample (but cannot be extrapolated to other geometric morphometric studies that do not use the same configurations).

      We appreciate the reviewer’s concerns nonetheless, as this was one of our own concerns when designing this study. In the future we will try to expand the analyses, or advise anyone expanding on this study to do so, using total body size/volume estimations following Bates et al. (2016), which also include tests of the effects of the different whole-body estimation models.

      Cites:

      Bates KT, Mannion PD, Falkingham PL, Brusatte SL, Hutchinson JR, Otero A, Sellers WI, Sullivan C, Stevens KA, Allen V. 2016. Temporal and phylogenetic evolution of the sauropod dinosaur body plan. Royal Society Open Science 3:150636. doi:10.1098/rsos.150636

      Mocho P, Escaso F, Marcos-Fernández F, Páramo A, Sanz JL, Vidal D, Ortega F. 2024. A Spanish saltasauroid titanosaur reveals Europe as a melting pot of endemic and immigrant sauropods in the Late Cretaceous. Commun Biol 7:1016. doi:10.1038/s42003-024-06653-0

      Páramo A, Ortega F, Sanz JL. 2019. A Niche Partitioning Scenario for the Titanosaurs of Lo Hueco (Upper Cretaceous, Spain). International Congress of Vertebrate Morphology (ICVM) - Abstract Volume, Journal of Morphology. Prague. p. S197.

      Ullmann PV, Bonnan MF, Lacovara KJ. 2017. Characterizing the Evolution of Wide-Gauge Features in Stylopodial Limb Elements of Titanosauriform Sauropods via Geometric Morphometrics. The Anatomical Record 300:1618–1635. doi:10.1002/ar.23607

      Vidal D, Mocho P, Aberasturi A, Sanz JL, Ortega F. 2020. High browsing skeletal adaptations in Spinophorosaurus reveal an evolutionary innovation in sauropod dinosaurs. Sci Rep 10:6638. doi:10.1038/s41598-020-63439-0

      Reviewer #2:

      The authors report a quantitative comparative study regarding hind limb evolution among titanosaurs. I find the conclusions and findings of the manuscript interesting and relevant. The strength of the paper would be increased if the authors were to improve their reporting of taxon sampling and their discussion of age estimation and the potential implications that uncertainty in these estimates would have for their conclusions regarding gigantism (vs. ontogenetic patterns).

      Considering the observations made by reviewer #1, we included data on the impact of ontogenetic patterns and other intraspecific variability in Appendix S3. We considered increasing the sample, but this was not possible at the time this study was carried out.

      Reviewer #2 (Recommendations For The Authors):

      I have a few concerns/requests for the authors, that I hope can be easily resolved.

      Comments:

      - What drove taxon sampling?

      Taxon sampling was a random sampling of somphospondylan sauropods, focused on the Lithostrotia clade, for the thesis project of one of the authors, APB. Logistics were also a source of bias in our sample. Based on the available titanosaurian material, we left out several macronarians that have already been sampled but would further reinforce an early-branching large sauropod versus deeply-branching small sauropod pattern that may alter our results.

      - Which phylogenies were used to create the supertree applied to the analyses? What references were used to time-calibrate the tips and deeper nodes? I couldn't find any reference to this. Additionally, more information regarding the R packages and analytical pipeline would be appreciated: e.g. were measurements used in the analyses log-transformed?

      A comprehensive description of the methodology is provided in Appendix S2.

      - Age estimate: can the author confirm the skeletal maturity of the sampled individuals? If this is not the case, how can the author be sure that the patterns towards gigantism are not reflecting different ontogenetic stages? I believe this should be part of both methods and discussion.

      As commented before, we excluded small, probable juvenile specimens from our sample. We have no paleohistological samples backing the claims about the ontogenetic status of some of the specimens that were included or excluded when calculating the mean shape for the operational taxonomic units. However, we followed a set of criteria to identify the relative ontogenetic status, and these have been included in Appendix S3.

      - The authors used the centroid size for regressions in Figure 6. Although I believe that this is a good variable, would the author be willing to use body mass and log-transformed femur length in addition to what was done? These would be very useful considering that these variables are (relatively) independent from shape/morphology.

      Accepted. We tested our hypotheses with three alternative models based on femoral length and on body mass estimations (including one from combined femoral and humeral lengths). The methodology can be found in Appendix S2, the results in Appendix S4, and the code for the alternative methods in Appendix S5.

      - Data access: will .stl files of the limb elements be shared and freely available? If so, where will the files be deposited?

      At the time of the current study, some of the sampled specimens cannot be made available (material under study), but the mean shapes can be generated from the landmarks and semilandmark curves and the “atlas” mesh.

      - Additionally, outstanding references regarding limb evolution, GMM, role of ontogeny, and evolution of columnar gait are missing. The authors should reinforce the literature review with the following (alphabetical order):

      Bonnan, M. F. (2003). The evolution of manus shape in sauropod dinosaurs: implications for functional morphology, forelimb orientation, and phylogeny. Journal of Vertebrate Paleontology, 23(3), 595-613.

      Botha, J., Choiniere, J. N., & Benson, R. B. (2022). Rapid growth preceded gigantism in sauropodomorph evolution. Current Biology, 32(20), 4501-4507.

      Curry Rogers, K., Whitney, M., D'Emic, M., & Bagley, B. (2016). Precocity in a tiny titanosaur from the Cretaceous of Madagascar. Science, 352(6284), 450-453.

      Day, J. J., Upchurch, P., Norman, D. B., Gale, A. S., & Powell, H. P. (2002). Sauropod trackways, evolution, and behavior. Science, 296(5573), 1659-1659.

      Fabbri, M., Navalón, G., Benson, R. B., Pol, D., O'Connor, J., Bhullar, B. A. S., ... & Ibrahim, N. (2022). Subaqueous foraging among carnivorous dinosaurs. Nature, 603(7903), 852-857.

      Fabbri, M., Navalón, G., Mongiardino Koch, N., Hanson, M., Petermann, H., & Bhullar, B. A. (2021). A shift in ontogenetic timing produced the unique sauropod skull. Evolution, 75(4), 819-831.

      González Riga, B. J., Lamanna, M. C., Ortiz David, L. D., Calvo, J. O., & Coria, J. P. (2016). A gigantic new dinosaur from Argentina and the evolution of the sauropod hind foot. Scientific Reports, 6(1), 19165.

      Lefebvre, R., Allain, R., & Houssaye, A. (2023). What's inside a sauropod limb? First three‐dimensional investigation of the limb long bone microanatomy of a sauropod dinosaur, Nigersaurus taqueti (Neosauropoda, Rebbachisauridae), and implications for the weight‐bearing function. Palaeontology, 66(4), e12670.

      McPhee, B. W., Benson, R. B., Botha-Brink, J., Bordy, E. M., & Choiniere, J. N. (2018). A giant dinosaur from the earliest Jurassic of South Africa and the transition to quadrupedality in early sauropodomorphs. Current Biology, 28(19), 3143-3151.

      Martin Sander, P., Mateus, O., Laven, T., & Knötschke, N. (2006). Bone histology indicates insular dwarfism in a new Late Jurassic sauropod dinosaur. Nature, 441(7094), 739-741.

      Remes, K. (2008). Evolution of the pectoral girdle and forelimb in Sauropodomorpha (Dinosauria, Saurischia): osteology, myology and function (Doctoral dissertation, München, Univ., Diss., 2008).

      Sander, P. M., & Clauss, M. (2008). Sauropod gigantism. Science, 322(5899), 200-201.

      Yates, A. M., & Kitching, J. W. (2003). The earliest known sauropod dinosaur and the first steps towards sauropod locomotion. Proceedings of the Royal Society of London. Series B: Biological Sciences, 270(1525), 1753-1758.

      We appreciate this suggestion. We already used some of these articles in our study, but the selection of citations was also constrained by the manuscript space allowed by the editorial guidelines. We would have liked to include several of these works, but we opted to include some works that summarize them, while excluding others.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive criticism. It is rare and gratifying to receive such thoughtful feedback, and the result is a much stronger paper. We made significant changes to our statistical analyses and figures to better differentiate the effects of sex and dominance rank on food-cleaning behaviors. These revisions uphold our original conclusion––that rank-related variation overwhelms any sex difference in cleaning behavior. We hope that these edits, together with the rest of our responses, provide a convincing demonstration of the tradeoffs of eliminating quartz from food surfaces.

      Reviewer #1 (Public Review):

      Summary

      We have no objections to Reviewer 1’s summary of our manuscript.

      Strengths

      Reviewer 1 is extremely gracious, and we are grateful for the kind words.

      Weaknesses

      Reviewer 1 identified several weaknesses, enumerating three types: (1) statistics, (2) insufficient links to foraging theory, and (3) interpretation and validity of the model. The present response is organized around these same categories.

      (1) Statistics

      We put all of our data and code into the Zenodo repository prior to submission. This content should have been accessible to Reviewer 1 from the outset. But in any event, we are very sorry for the mixup. To ensure access to our data and code during the present stage of review, we included the URL in the main manuscript and here: https://doi.org/10.5281/zenodo.14002737

      (a) AIC and outcome distributions

      Reviewer 1 criticized our use of AIC for determining model selection. We agree and this aspect of our manuscript is now removed. In lieu of AIC, we produced two data sets consisting of whole number counts (seconds) with means <5. The data were right-skewed due to high concentrations of biologically-meaningful zeros (i.e., bouts of food handling without any cleaning effort). Following the recommendations of Bolker et al. (2008) and others (Brooks et al. 2017, 2019), we chose an outcome distribution (zero-inflated Poisson, see response below) that best matched this data distribution. In addition, we evaluated the post-hoc performance of each of our models using the standardized residual diagnostic tools for hierarchical regression models available in the DHARMa package (Hartig, 2022). To further evaluate our choice of outcome distribution, we generated QQ-plots and residual vs. predicted plots for each model and included them in our revision as Figures S3-S5.

      (b) zeros

      Reviewer 1 expressed concern over our treatment of biologically-meaningful zeros, and recommended use of a zero-inflated GLMM with either a Poisson or negative binomial outcome distribution. We agree that such models are best for our two data sets. Accordingly, we fit a series of zero-inflated generalized linear mixed models (ZIGLMM) using the glmmTMB package in R, each with a logit-link function, a single zero-inflation parameter applying to all observations, and a Poisson error distribution. For the food-brushing model, we fit a zero-inflated Poisson (ZIP), which produced favorable standardized residual diagnostic plots with no major patterns of deviation (Figure S3) and minor but non-significant underdispersion (DHARMa dispersion statistic = 0.99, p = 0.80). For our two food-washing models, we used zero-inflated models with Conway-Maxwell Poisson (ZICMP) distributions, an error distribution chosen for its ability to handle data that are more underdispersed (DHARMa dispersion statistic = 8.2E-09, p = 0.74) than the standard zero-inflated Poisson (Brooks et al. 2019). Using this error distribution improved residual diagnostic plots over a standard ZIP model and we view any deviations in the standardized residuals as minor and attributable to the smaller sample size of our food-washing data set (see Figures S4 and S5) (Hartig, 2022). We reported the summarized fixed effects tests for each GLMM in Tables S1-S3 as Analysis of Deviance Tables (Type II Wald chi square tests, one-sided) along with 𝜒² values, degrees of freedom, and p-values (one-sided tests). Full model summaries with standard errors and confidence intervals are also included in Tables S4-S6. For all statistical analyses, we set 𝛼 = 0.05.
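
      For readers unfamiliar with the zero-inflated Poisson outcome distribution named above, the following is a minimal sketch in Python of how such counts arise and how the two parameters can be recovered. It is an illustration only, not the glmmTMB models described in this response: it has no predictors, no random effects, and no Conway-Maxwell variant. The function names (`sample_zip`, `fit_zip`) and all parameter values are hypothetical.

```python
import math
import random


def sample_zip(n, pi_zero, lam, rng):
    """Draw n counts from a zero-inflated Poisson (ZIP) distribution.

    pi_zero is the probability of a structural zero (e.g. a handling bout
    with no cleaning at all); lam is the Poisson mean otherwise.
    """
    counts = []
    for _ in range(n):
        if rng.random() < pi_zero:
            counts.append(0)
            continue
        # Knuth's multiplication method; adequate for small Poisson means.
        limit, k, prod = math.exp(-lam), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        counts.append(k)
    return counts


def fit_zip(counts, n_iter=200):
    """Intercept-only ZIP fit by expectation-maximization (EM)."""
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi_zero, lam = 0.5 * n_zero / n, max(total / n, 1e-6)
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        w = pi_zero / (pi_zero + (1.0 - pi_zero) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean.
        pi_zero = w * n_zero / n
        lam = total / (n - w * n_zero)
    return pi_zero, lam
```

      Calling `fit_zip(sample_zip(5000, 0.6, 3.0, random.Random(1)))` should return estimates close to the simulated values of 0.6 and 3.0, which shows why a plain Poisson (which would absorb the excess zeros into a lower mean) is a poor match for data of this shape.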

      (2) Absence of Links to Foraging Theory

      This critique has three components. The first revisits the absence of code for the optimal cleaning time model. This omission was an unfortunate error at the moment of submission, but our code is available now as a Mathematica notebook in Zenodo (https://doi.org/10.5281/zenodo.14002737). The second pivots around our scholarship, admonishing us for failing to acknowledge the marginal value theorem of Charnov (1976). It is a fair point and we have corrected the oversight with a citation to this classic paper. The third criticism is also rooted in scholarship, with Reviewer 1 asking for greater connection to the existing literature on optimal foraging theory, a point echoed in the summary assessment of the editors at eLife. This comment and the weight given to it by eLife’s editors put us in a difficult spot, as our paper is focused on the optimization of delayed gratification, not food acquisition per se. So, we are in the awkward position of gently resisting this recommendation while simultaneously agreeing with Reviewer 1 that we need to better situate our findings in the landscape of existing literature. To thread this needle, we produced Box 2 with a photograph and 410 words. This display box puts our findings into direct conversation with recent research focused on the sunk cost fallacy.

      (3) Interpretation and validity of model relative to data

      This critique is focused on the simulated brushing and washing results reported in Figure S1, along with its captioning, which was inadequate. We edited the caption to identify the author (JER) who simulated the brushing and washing behaviors of the monkeys. In addition, we clarified the number of brushing replicates (3) and washing replicates (3) for each of three treatments, for a total of 18 simulations.

      We followed Reviewer 1’s suggestion, incorporating the experimental uncertainty of grit removal into our optimal cleaning time model. We drew % grit removed values from a distribution, discounting the rare event when values ≥ 100% were drawn. As the % grit removed is used to estimate the cleaning inefficiency parameter 𝑐 for brushing and washing, the included uncertainty now allows us to evaluate these parameters as distributions; and, in turn, obtain a distribution for our predicted brushing and washing optimal cleaning times. As we now describe in the main text, the optimal cleaning times for brushing and washing are 𝑡* = 0.98 ± 0.19 s and 𝑡* = 2.40 ± 0.74 s, respectively. We are grateful for Reviewer 1’s suggestion, for it added valuable context to our model predictions. Notably, the inclusion of experimental uncertainty did not change the qualitative nature of our results, or the interpretations of our model predictions compared to observed cleaning behaviors.
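
      The propagation of uncertainty described above can be sketched as a small Monte Carlo simulation. The sketch below is not the Mathematica notebook on Zenodo, and the functional forms are assumptions made purely for illustration: an exponential grit-removal curve g(t) = 1 − exp(−t/c), a mapping from % grit removed to c under that curve, a normal distribution for the draws, and a marginal-value-theorem optimum that maximizes g(t)/(h + t). All function names and numerical values are hypothetical.

```python
import math
import random


def cleaning_inefficiency(fraction_removed, duration):
    """Assumed mapping: if cleaning for `duration` seconds removes a fraction
    p of grit and removal is exponential, then p = 1 - exp(-duration / c)."""
    return -duration / math.log(1.0 - fraction_removed)


def optimal_cleaning_time(c, h, tol=1e-9):
    """Marginal value theorem optimum for the assumed gain g(t) = 1 - exp(-t/c).

    Solves g'(t) * (h + t) = g(t) by bisection; h is the handling time.
    """
    def f(t):
        e = math.exp(-t / c)
        return (e / c) * (h + t) - (1.0 - e)

    lo, hi = 0.0, 1.0
    while f(hi) > 0:  # expand the bracket until the root is enclosed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def draw_fraction(mean, sd, rng):
    """Rejection sampling: discard draws at or above 100% (and, as an extra
    assumption of this sketch, draws at or below 0%)."""
    while True:
        p = rng.gauss(mean, sd)
        if 0.0 < p < 1.0:
            return p


def simulate_optima(n, mean, sd, duration, h, seed=0):
    """Return n optimal cleaning times, one per simulated grit-removal draw."""
    rng = random.Random(seed)
    optima = []
    for _ in range(n):
        c = cleaning_inefficiency(draw_fraction(mean, sd, rng), duration)
        optima.append(optimal_cleaning_time(c, h))
    return optima
```

      The mean and standard deviation of the list returned by `simulate_optima` play the role of a reported optimum with its uncertainty. Discarding draws at or above 100% is needed in this sketch because the logarithm in `cleaning_inefficiency` is undefined there.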

      We chose to exclude variability in handling time h to generate predicted cleaning time optima, at least in the main text. Our reasoning stems from the observation that handling time variability is long-tailed, with the longer handling times associated with behaviors that we do not account for in our analysis. For example, individuals carrying multiple cucumber slices to the ocean were apt to drop them, struggling at times to re-grasp so many at once. Such moments increased handling times substantially. Still, we acted on Reviewer 1’s suggestion, accounting for the tandem effects of handling time variability and uncertainty in % grit removed (see Figure S6). Drawing handling time estimates from a log-normal distribution fitted to the handling time data, we found that these dual sources of uncertainty did not qualitatively change our results. They added further uncertainty to the predicted washing time, but the mean remains roughly equivalent. (We note that brushing is assumed to have a constant handling time––composed of only assessment time and no travel––such that the results for brushing do not change.) Both analyses are included in the Mathematica notebook (https://doi.org/10.5281/zenodo.14002737).
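
      As a rough illustration of the log-normal step described above, the following Python sketch fits a log-normal distribution to a set of handling times and draws new values from it. It is not the authors' Mathematica code; the function names are hypothetical, and any input data used with it would be placeholders rather than the observed handling times.

```python
import math
import random


def fit_lognormal(samples):
    """Maximum-likelihood fit: the mean and SD of the log-transformed data."""
    logs = [math.log(x) for x in samples]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


def draw_handling_times(mu, sigma, n, seed=0):
    """Draw n handling times (seconds) from the fitted log-normal."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]
```

      Because a log-normal distribution is right-skewed, the mean of the drawn handling times exceeds their median, which mirrors the long tail of occasional slow bouts described in the paragraph above.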

      Reviewer #2 (Public Review):

      Summary

      We have no objections to Reviewer 2’s summary of our manuscript.

      Strengths

      Reviewer 2 is extremely gracious, and we are grateful for the kind words.

      Weaknesses

      Reviewer 2 noted that our manuscript failed to provide “sufficient background on [our study] population of animals and their prior demonstrations of food-cleaning behavior or other object-handling behaviors (e.g., stone handling).” To address this comment, we edited the introduction (lines 56-58) to alert readers to the onset of regular food-cleaning behaviors sometime after December 26, 2004. In addition, we edited our methods text (lines 155-160) to highlight the onset and limited scope of prior research with this study population:

      “The animals are well habituated to human observers due to regular tourism and sustained study since 2013 (Tan et al., 2018). Most of this research has revolved around stone tool-mediated foraging on mollusks, the only activity known to elicit stone handling (Malaivijitnond et al., 2007; Gumert and Malaivijitnond, 2012, 2013; Tan et al., 2015), although infants and juveniles will sometimes use stones during object play (Tan, 2017). There has been no prior examination of food-cleaning behaviors.”

      Reviewer #3 (Public Review):

      Reviewer 3 identified three weaknesses, which we address in three paragraphs.

      Reviewer 3 questioned our methods for determining rank-dependent differences in cleaning behavior, arguing that our conclusions were unsupported. It is a fair point, and it compelled us to combine males and females into a single standardized ordinal rank of 24 individuals. This unified ranking is now reflected in the x-axes of Figure 2 and Figure S2. Plotting the data this way––see Figure S2––underscores Reviewer 3’s concern that sex and dominance rank are confounding variables. To address this problem, our GLMM included rank and sex as predictor variables, which controls for the effect of sex when assessing the relationship between rank and cleaning time across the three treatments. Reported in Tables S1-S3, these findings show that the effect of sex on either brushing or washing time was not significant. This result bolsters our original contention that rank-related variation in cleaning time overwhelms any sex differences.

      Relatedly, Reviewer 3 questioned our conclusions on the effects of rank because our study was focused on a single social group. In other words, it is plausible that our results were heavily influenced by the idiosyncrasies of select individuals, not dominance rank per se. It is a fair point, and it compelled us to include individual ID as a random effect in each of our GLMMs. Including individual ID as a random intercept allowed us to control for inter-individual variation in cleaning duration while assessing the effects of rank. An analysis based on additional social groups or longitudinal data is certainly desirable, but also well beyond the scope of a Short Report for eLife.

      Finally, Reviewer 3 objected to fragments of sentences in our abstract, introduction, and discussion, combining them into a criticism of claims that we did not and do not make. It probably wasn’t intentional, but it puts us in the awkward position of deconstructing a strawman:

      ● Reviewer 3 begins, “there is no evidence presented on the actual fitness-related costs of tooth wear or the benefits of slightly faster food consumption”. This statement is true while insinuating that collecting such evidence was our intent. To be clear, our experiment was never designed to measure tooth wear or reproductive fitness, nor do we make any claims of having done so.

      ● Reviewer 3 adds, “Support for these arguments is provided based on other papers, some of which come from highly resource-limited populations (and different species). But this is a population that is supplemented by tourists with melons, cucumbers, and pineapples!” We were puzzled over these sentences. The first fails to mention that the citations exist in our discussion. Citing relevant work in a discussion is a basic convention of scientific writing. But it seems the underlying intent of these words is to denigrate the value of our study population because two dozen tourists visit Koram Island once a day. Exclamations to the contrary, the amount of tourist-provisioned food in the diet of any one monkey is negligible.

      ● Last, Reviewer 3 commented on matters of style, objecting to “overly strong claims.” We puzzled over this criticism because the claims in question are broader points of introduction or discussion, not results. The root problem appears to be the final sentence of our abstract:

      “Dominant monkeys abstained from washing, balancing the long-term benefits of mitigating tooth wear against immediate energetic requirements, an essential predictor of reproductive fitness.”

      This sentence has three clauses. The first is a statement of results, whereas the second and third are meant to mirror our discussion on the importance of our findings. We combined the concepts into a single concluding sentence for the sake of concision, but we can appreciate how a reader could feel deceived, expecting to see data on tooth wear and fitness. So, our impression is that we are dealing with a simple misunderstanding of our own making, and that this single sentence explains Reviewer 3’s criticism and tone––it cast a long shadow over the substance of our paper. To resolve this problem, we edited the sentence:

      “Dominant monkeys abstained from washing, a choice consistent with the impulses of dominant monkeys elsewhere: to prioritize rapid food intake and greater reproductive fitness over the long-term benefits of prolonging tooth function.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In their manuscript, Gomez-Frittelli and colleagues characterize the expression of cadherin6 (and -8) in colonic IPANs of mice. Moreover, they found that these cdh6-expressing IPANs are capable of initiating colonic motor complexes in the distal colon, but not proximal and midcolon. They support their claim by morphological, electrophysiological, optogenetic, and pharmacological experiments.

      Strengths:

      The work is very impressive and involves several genetic models and state-of-the-art physiological setups including respective controls. It is a very well-written manuscript that truly contributes to our understanding of GI-motility and its anatomical and physiological basis. The authors were able to convincingly answer their research questions with a wide range of methods without overselling their results.

      We greatly appreciate the reviewer’s time, careful reading and support of our study.

      Weaknesses:

      The authors put quite some emphasis on stating that cdh6 is a synaptic protein (in the title and throughout the text), which interacts in a homophilic fashion. They deduce that cdh6 might be involved in IPAN-IPAN synapses (line 247ff.). However, Cdh6 does not only interact in synapses and is expressed by non-neuronal cells as well (see e.g., expression in the proximal tubuli of the kidney). Moreover, cdh6 does not only build homodimers, but also heterodimers with Cdh9 as well as Cdh7, -10, and -14 (see e.g., Shimoyama et al. 2000, DOI: 10.1042/02646021:3490159). It would therefore be interesting to assess the expression pattern of cdh6 proteins using immunostainings in combination with synaptic markers to substantiate the authors' claim or at least add the possibility of cell-cell-interactions other than synapses to the discussion. Additionally, an immunostaining of cdh6 would confirm if the expression of tdTomato in smooth muscle cells of the cdh6-creERT model is valid or a leaky expression (false positive).

      We agree with the reviewer that Cdh6 could be mediating some other cell-cell interaction besides synapses between IPANs, and we noted it in the discussion. Cdh6 primarily forms homodimers but, as the reviewer points out, has been known to also form heterodimers with some other cadherins. We performed RNAscope in the colonic myenteric plexus with Cdh7 and found no expression (data not shown). Cdh10 is suggested to have very low expression (Drokhlyansky et al., 2020), possibly in putative secretomotor vasodilator neurons, and Cdh14 has not been assayed in any RNAseq screens. We attempted to visualize Cdh6 protein via antibody staining (Duan et al., 2018) but our efforts did not result in sufficient signal or resolution to identify synapses in the ENS, which remain broadly challenging to assay. Similarly, neither immunostaining with the Cdh6 antibody nor RNAscope was able to confirm Cdh6 expression in tdT-expressing muscle cells. We have addressed these caveats in the discussion section.

      (1) E. Drokhlyansky, C. S. Smillie, N. V. Wittenberghe, M. Ericsson, G. K. Griffin, G. Eraslan, D. Dionne, M. S. Cuoco, M. N. Goder-Reiser, T. Sharova, O. Kuksenko, A. J. Aguirre, G. M. Boland, D. Graham, O. Rozenblatt-Rosen, R. J. Xavier, A. Regev, The Human and Mouse Enteric Nervous System at Single-Cell Resolution. Cell 182, 1606-1622.e23 (2020).

      (2) X. Duan, A. Krishnaswamy, M. A. Laboulaye, J. Liu, Y.-R. Peng, M. Yamagata, K. Toma, J. R. Sanes, Cadherin Combinations Recruit Dendrites of Distinct Retinal Neurons to a Shared Interneuronal Scaffold. Neuron 99, 1145-1154.e6 (2018).

      Reviewer #2 (Public review):

      Summary:

      Intrinsic primary afferent neurons are an interesting population of enteric neurons that transduce stimuli from the mucosa, initiate reflexive neurocircuitry involved in motor and secretory functions, and modulate gut immune responses. The morphology, neurochemical coding, and electrophysiological properties of these cells have been relatively well described in a long literature dating back to the late 1800's but questions remain regarding their roles in enteric neurocircuitry, potential subsets with unique functions, and contributions to disease. Here, the authors provide RNAscope, immunolabeling, electrophysiological, and organ function data characterizing IPANs in mice and suggest that Cdh6 is an additional marker of these cells.

      Strengths:

      This paper would likely be of interest to a focused enteric neuroscience audience and increase information regarding the properties of IPANs in mice. These data are useful and suggest that prior data from studies of IPANs in other species are likely translatable to mice.

      We appreciate the reviewer’s support of our study and insightful critiques for its improvement.

      Weaknesses:

      The advance presented here beyond what is already known is minimal. Some of the core conclusions are overstated and there are multiple other major issues that limit enthusiasm. Key control experiments are lacking and data do not specifically address the properties of the proposed Cdh6+ population.

      Major weaknesses:

      (1) The novelty of this study is relatively low. The main point of novelty suggests an additional marker of IPANs (Cdh6) that would add to the known list of markers for these cells. How useful this would be is unclear. Other main findings basically confirm that IPANs in mice display the same classical characteristics that have been known for many years from studies in guinea pigs, rats, mice and humans.

      We appreciate the already existing markers for IPANs in the ENS and the existing literature characterizing these neurons. The primary intent of this study was to use these well-established characteristics of IPANs in both mice and other species to characterize Cdh6-expressing neurons in the mouse myenteric plexus and confirm their classification as IPANs.

      (2) Some of the main conclusions of this study are overstated and claims of priority are made that are not true. For example, the authors state in lines 27-28 of the abstract that their findings provide the "first demonstration of selective activation of a single neurochemical and functional class of enteric neurons". This is certainly not true since Gould et al (AJP-GIL 2019) expressed ChR2 in nitrergic enteric neurons and showed that activating those cells disrupted CMC activity. In fact, prior work by the authors themselves (Hibberd et al., Gastro 2018) showed that activating calretinin neurons with ChR2 evoked motor responses. Work by other groups has used chemogenetics and optogenetics to show the effects of activating multiple other classes of neurons in the gut.

      We thank the reviewer for bringing up this important point and apologize if our wording was not clear. Whilst single neurochemical classes of enteric neurons have been manipulated to alter gut functions, none of these instances to date represents manipulation of a single functional class of enteric neurons. In the given examples, multiple functional classes are activated utilizing the same neurotransmitter, as NOS and calretinin are each expressed to varying degrees across putative motor neurons, interneurons and IPANs. In contrast, Cdh6 is restricted to IPANs and therefore this study is the first optogenetic investigation of enteric neurons from a single putative functional class. Our abstract and discussion emphasize this point and differentiate this study from previous ones.

      (3) Critical controls are needed to support the optogenetic experiments. Control experiments are needed to show that ChR2 expression a) does not change the baseline properties of the neurons, b) that stimulation with the chosen intensity of light elicits physiologically relevant responses in those neurons, and c) that stimulation via ChR2 elicits comparable responses in IPANs in the different gut regions focused on here.

      We completely agree that controls are essential. However, our paper is not the first to express ChR2 in enteric neurons. Authors of our paper have shown in Hibberd et al. 2018 that expression of ChR2 in a heterogeneous population of myenteric neurons did not change network properties of the myenteric plexus. This was demonstrated by the lack of change in control CMC characteristics in mice expressing ChR2 under basal conditions (without blue light exposure). Regarding question (b), that stimulation with the chosen intensity of light should be shown to elicit physiologically relevant responses in those neurons: we show the restricted expression of ChR2 in IPANs and that motor responses (to blue light) are blocked by selective nerve conduction blockade.

      Regarding question (c), that our study should demonstrate that stimulation via ChR2 elicits comparable responses in IPANs in the different gut regions: we would not expect each region of the gut to behave comparably. The different gut regions (i.e. proximal, mid, distal) are very different anatomically, as is the anatomy of the myenteric plexus and myenteric ganglia between regions, including the density of IPANs within each ganglion, in addition to the presence of different patterns of electrical and mechanical activity [Spencer et al., 2020]. Hence, it is difficult to expect that stimulation of ChR2 should induce similar physiological responses between regions. The motor output we record in our study (CMCs) is a unified motor program that involves the temporal coordination of hundreds of thousands of enteric neurons and a complex neural circuit that we have previously characterized [Spencer et al., 2018]. However, no study until now has been able to selectively stimulate a single functional class of enteric neurons (with light) to avoid indiscriminate activation of other classes of neurons.

      (1) T. J. Hibberd, J. Feng, J. Luo, P. Yang, V. K. Samineni, R. W. Gereau, N. Kelley, H. Hu, N. J. Spencer, Optogenetic Induction of Colonic Motility in Mice. Gastroenterology 155, 514-528.e6 (2018).

      (2) N. J. Spencer, L. Travis, L. Wiklendt, T. J. Hibberd, M. Costa, P. Dinning, H. Hu, Diversity of neurogenic smooth muscle electrical rhythmicity in mouse proximal colon. American Journal of Physiology-Gastrointestinal and Liver Physiology 318, G244–G253 (2020).

      (3) N. J. Spencer, T. J. Hibberd, L. Travis, L. Wiklendt, M. Costa, H. Hu, S. J. Brookes, D. A. Wattchow, P. G. Dinning, D. J. Keating, J. Sorensen, Identification of a Rhythmic Firing Pattern in the Enteric Nervous System That Generates Rhythmic Electrical Activity in Smooth Muscle. The Journal of Neuroscience 38, 5507–5522 (2018).

      (4) The electrophysiological characterization of mouse IPANs is useful but this is a basic characterization of any IPAN and really says nothing specifically about Cdh6+ neurons. The electrophysiological characterization was also only done in a small fraction of colonic IPANs, and it is not clear if these represent cell properties in the distal colon or proximal colon, and whether these properties might be extrapolated to IPANs in the different regions. Similarly, blocking IH with ZD7288 affects all IPANs and does not add specific information regarding the role of the proposed Cdh6+ subtype.

      Our electrophysiological characterization was guided to be within a subset of Cdh6+ neurons by Hb9:GFP expression. As in the prior comment (1) above, we used these experiments to confirm classification of Cdh6+ (Hb9:GFP+) neurons in the distal colon as IPANs. We have clarified in the results and methods that these experiments were performed in the distal colon and agree that we cannot extrapolate that these properties are also representative of IPANs in the proximal colon. We apologize that this was confusing. Finally, we agree with the reviewer that ZD7288 affects all IPANs in the ENS and have clarified this in the text.

      (5) Why SMP IPANs were not included in the analysis of Cdh6 expression is a little puzzling. IPANs are present in the SMP of the small intestine and colon, and it would be useful to know if this proposed marker is also present in these cells.

      We agree with the reviewer. In addition to characterizing Cdh6 in the myenteric plexus, it would be interesting to query if sensory neurons located within the SMP also express Cdh6. Our preliminary data (n=2) show ~6-12% tdT/Hu neurons in Cdh6-tdT ileum and colon (data not shown). We have added a sentence to the discussion.

      (6) The emphasis on IH being a rhythmicity indicator seems a bit premature. There is no evidence to suggest that IH and IT are rhythm-generating currents in the ENS.

      Regarding the statement that there is no evidence to suggest that IH and IT are rhythm-generating currents in the ENS, we agree with the reviewer that rhythm generation by IH and IT in the ENS has not been explicitly confirmed. We are confident the reviewer agrees that an absence of evidence is not evidence of absence, and the presence of IH has been well described in enteric neurons. We have modified the text in the results to indicate more clearly that IH and IT are known to participate in rhythm generation in thalamocortical circuits, though their roles in the ENS remain unknown. Our treatment of the potential role of IH or IT in rhythm generation or oscillatory firing of the ENS is constrained to speculation in the discussion section of the text.

      (7) As the authors point out in the introduction and discuss later on, Type II Cadherins such as Cdh6 bind homophilically to the same cadherin at both pre- and post-synapse. The apparent enrichment of Cdh6 in IPANs would suggest extensive expression in synaptic terminals that would also suggest extensive IPAN-IPAN connections unless other subtypes of neurons express this protein. Such synaptic connections are not typical of IPANs and raise the question of whether or not IPANs actually express the functional protein and if so, what might be its role. Not having this information limits the usefulness of this as a proposed marker.

      We agree with the reviewer that the proposed IPAN-IPAN connection is novel although it has been proposed before (Kunze et al., 1993). As detailed in our response to Reviewer #1, we attempted to confirm Cdh6 protein expression, but were unsuccessful, due to insufficient signal and resolution. We therefore discuss potential IPAN interconnectivity in the discussion, in the context of contrasting literature.

      (1) W. A. A. Kunze, J. B. Furness, J. C. Bornstein, Simultaneous intracellular recordings from enteric neurons reveal that myenteric ah neurons transmit via slow excitatory postsynaptic potentials. Neuroscience 55, 685–694 (1993).

      (8) Experiments shown in Figures 6J and K use a tethered pellet to drive motor responses. By definition, these are not CMCs as stated by the authors.

      The reviewer makes a valid criticism as to the terminology, since tethered pellet experiments do not record propagation. We believe the periodic bouts of propulsive force on the pellet are triggered by the same activity underlying the CMC. In our experience, these activities have similar periodicity and force, and identical pharmacological properties. Consistent with this, we also tested full colons (n = 2) set up for typical CMC recordings by multiple force transducers, finding that CMCs were abolished by ZD7288, similar to fixed pellet recordings (data not shown).

      (9) The data from the optogenetic experiments are difficult to understand. How would stimulating IPANs in the distal colon generate retrograde CMCs and stimulating IPANs in the proximal colon do nothing? Additional characterization of the Cdh6+ population of cells is needed to understand the mechanisms underlying these effects.

      We agree that the different optogenetic responses in the proximal and distal colon are challenging to interpret, but perhaps not surprising in the wider context. It is possible that the different optogenetic responses in this study reflect not only regional differences in the Cdh6+ neuronal populations, but also differences in neural circuits within these gut regions. A study some time ago by the authors showed that electrical stimulation of the proximal mouse colon was unable to evoke a retrograde (aborally) propagating CMC (Spencer and Bywater, 2002), but stimulation of the distal colon was readily able to. We concluded that at the oral lesion site there is a preferential bias of descending inhibitory nerve projections, since the ascending excitatory pathways have been cut off. In contrast, stimulation of the distal colon was readily able to activate an ascending excitatory neural pathway, and hence induce the complex CMC circuits required to generate an orally propagating CMC. Indeed, other recent studies have added to a growing body of evidence for significant differences in the behaviors and neural circuits of the two regions (Li et al., 2019, Costa et al., 2021a, Costa et al., 2021b, Nestor-Kalinoski et al., 2022). We have expanded this discussion.

      (1) N. J. Spencer, R. A. Bywater, Enteric nerve stimulation evokes a premature colonic migrating motor complex in mouse. Neurogastroenterology & Motility 14, 657–665 (2002).

      (2) Li Z, Hao MM, Van den Haute C, Baekelandt V, Boesmans W, Vanden Berghe P, Regional complexity in enteric neuron wiring reflects diversity of motility patterns in the mouse large intestine. Elife 8:e42914 (2019).

      (3) Costa M, Keightley LJ, Hibberd TJ, Wiklendt L, Dinning PG, Brookes SJ, Spencer NJ, Motor patterns in the proximal and distal mouse colon which underlie formation and propulsion of feces. Neurogastroenterology & Motility e14098 (2021a).

      (4) Costa M, Keightley LJ, Hibberd TJ, Wiklendt L, Smolilo DJ, Dinning PG, Brookes SJ, Spencer NJ, Characterization of alternating neurogenic motor patterns in mouse colon. Neurogastroenterology & Motility 33:e14047 (2021b).

      (5) Nestor-Kalinoski A, Smith-Edwards KM, Meerschaert K, Margiotta JF, Rajwa B, Davis BM, Howard MJ, Unique Neural Circuit Connectivity of Mouse Proximal, Middle, and Distal Colon Defines Regional Colonic Motor Patterns. Cellular and Molecular Gastroenterology and Hepatology 13:309-337.e303 (2022).

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, immunolocalization of cdh6 would be helpful to substantiate the claims regarding IPAN-IPAN synapses.

      As mentioned in our response to both reviewers’ public reviews, we attempted to visualize Cdh6 protein via antibody staining (Duan et al., 2018), but our efforts did not result in sufficient signal or resolution to identify Cdh6+ synapses.

      (1) X. Duan, A. Krishnaswamy, M. A. Laboulaye, J. Liu, Y.-R. Peng, M. Yamagata, K. Toma, J. R. Sanes, Cadherin Combinations Recruit Dendrites of Distinct Retinal Neurons to a Shared Interneuronal Scaffold. Neuron 99, 1145-1154.e6 (2018).

      Reviewer #2 (Recommendations for the authors):

      (1) The authors repeatedly refer to IPANs as "sensory" neurons (e.g. in title, abstract, and introduction) but there is some debate regarding whether these cells are truly "sensory" because the information they convey never reaches sensory perception. This is why they have classically been referred to as intrinsic primary afferent (IPAN) neurons. It would be more appropriate to stick with this terminology unless the authors have compelling data showing that information detected by IPANs reaches the sensory cortex.

      We thank the reviewer for their comment, but respectfully disagree. The term “sensory neuron” is well established in the ENS. The first definitive proof that “sensory neurons” exist in the ENS was published in Kunze et al., 1995. We note that this paper did not use the word “IPAN” but used the term “sensory neuron”. Furthermore, mechanosensory neurons were published in Spencer and Smith (2004).

      Regarding the reviewer’s comment that the authors would need compelling data showing that information detected by IPANs reaches the sensory cortex before the term “sensory neuron” should be valid, it is important to note that many sensory neurons do not provide direct information to the cortex.

      (1) W. A. A. Kunze, J. C. Bornstein, J. B. Furness, Identification of sensory nerve cells in a peripheral organ (the intestine) of a mammal. Neuroscience 66, 1–4 (1995).

      (2) N. J. Spencer, T. K. Smith, Mechanosensory S-neurons rather than AH-neurons appear to generate a rhythmic motor pattern in guinea-pig distal colon. The Journal of Physiology 558, 577–596 (2004).

      (2) Important information regarding the gut region shown and other details are absent from many figure legends.

      We apologize for this omission. We have updated the figure legends to include information on gut regions.

    1. Author response:

      Thank you for the constructive feedback from the reviewers. We are grateful for their insights and are committed to addressing the key concerns raised in the public reviews through the following revisions:

      (1) Validating Axoneme Stability Claims

      We have procured new antibodies for DRC11, as well as marker proteins for ODA, IDA, and RS. We will conduct quantitative immunofluorescence staining to validate our claims regarding axoneme stability.

      (2) Investigating ANKRD5 Expression in Other Ciliated Cells

      We plan to examine the expression of ANKRD5 in mouse respiratory cilia to determine whether it is also expressed in these cells.

      (3) Supplementing Key Citations for N-DRC Components

      We will add references to published studies on N-DRC components (e.g., DRC1, DRC2, DRC3, DRC5) associated with male infertility in the Introduction to strengthen the background context.

      (4) Further Analysis and Validation of ANKRD5 Interactome

      We will conduct additional analyses and validation of the interactome of ANKRD5 detected by LC-MS.

      (5) Elucidating the Function of ANKRD5 in Mitochondria

      We will further investigate the role of ANKRD5 in mitochondrial function.

      (6) Investigating Mitochondrial Function and Energy Metabolism

      We will further explore the role of ANKRD5 in mitochondrial function and energy metabolism.

      (7) Improving Cryo-ET Data Quality and Interpretation

      We will attempt to further improve the quality of the STA results and try to calculate the DMT structure with a period of 96 nm. We will also use the WT density map with the same period to generate a difference map.

      (8) Expanding Discussion and Correcting Terminology

      The Discussion section will be revised to elaborate on the implications of ANKRD5 for male contraceptive research, particularly in targeting sperm motility. We will also correct terminology inaccuracies (e.g., changing "9+2 microtubule doublet" to "9+2 structure") and address formatting issues (e.g., capitalizing "Control").

      Response to Reviewer #2 Comment 4:

      We appreciate the reviewer's careful consideration of our proteomic data. However, our Gene Set Enrichment Analysis (GSEA) of glycolysis/gluconeogenesis pathways showed no significant enrichment (p-value=0.089, NES=0.708; Fig.6D), which does not meet the statistical thresholds for biological significance (|NES|>1, pvalue<0.05). This observation is further corroborated by our direct ATP measurements showing no difference between genotypes (Fig.6E). We agree that further studies on metabolic regulation could be valuable, but current evidence does not support glycolysis disruption as a primary mechanism for the motility defects observed in Ankrd5-null sperm. This misinterpretation likely arose from the reviewer's overinterpretation of non-significant proteomic trends. We request that this specific claim be excluded from the assessment to avoid misleading readers.
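      The threshold logic stated above can be written out explicitly. The following is a minimal sketch, not the analysis code used for the manuscript: the function name `meets_gsea_thresholds` and its parameter names are hypothetical, and only the criteria (|NES| > 1, p < 0.05) and the reported values (NES = 0.708, p = 0.089) are taken from this response.

```python
def meets_gsea_thresholds(nes: float, p_value: float,
                          nes_cutoff: float = 1.0,
                          p_cutoff: float = 0.05) -> bool:
    """Return True only if both criteria quoted in the response hold.

    The cutoffs default to the values stated in the text
    (|NES| > 1 and p-value < 0.05); the function itself is illustrative.
    """
    return abs(nes) > nes_cutoff and p_value < p_cutoff


# Values reported for the glycolysis/gluconeogenesis gene set (Fig. 6D).
glycolysis_enriched = meets_gsea_thresholds(nes=0.708, p_value=0.089)
print(glycolysis_enriched)
```

      With the reported values, neither criterion is satisfied, so the check evaluates to false for this gene set.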

      We will provide a comprehensive point-by-point response, along with detailed experimental data and revised figures, in the resubmitted manuscript. Thank you once again for the opportunity to address the reviewers' concerns. We are confident that these revisions will strengthen our manuscript and contribute to the scientific community.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors examined the role of Afadin, a key adaptor protein associated with cell-adhesion molecules, in retinal development. Using a conditional knockout mouse line (Six3-Cre; AfadinF/F), the authors successfully characterized a disorganized pattern of various neuron types in the mutant retinae. Despite these altered distributions, the retinal neurons maintained normal cell numbers and seemingly preserved some synaptic connections. Notably, tracing results indicated mistargeting of retinal ganglion cell (RGC) axon projections to the superior colliculus, and electroretinography (ERG) analyses suggested deficits in visual functions.

      Thank you for the summary and highlights of our study. We appreciate the input from Reviewer 1 and the Editor on this study, with focus on laminar choices, synaptic choices and axonal projections.

      Strengths:

      This compelling study provides solid evidence addressing the important question of how cell-adhesion molecules influence neuronal development. Compared to previous research conducted in other parts of the central nervous system (CNS), the clearly defined lamination of cell types in the retina serves as a unique model for studying the aberrant neuronal localizations caused by Afadin knockout. The data suggest that cell-cell interactions are critical for retinal cellular organization and proper axon pathfinding, while aspects of cell fate determination and synaptogenesis remain less understood. This work has broad implications not only for retinal studies but also for developmental biology and regenerative medicine.

      Weaknesses:

      While the phenotypes observed in the Afadin knockout (cKO) mice are intriguing, I would expect to see evidence confirming that Afadin is indeed knocked out in the retina through immunostaining. Specifically, is Afadin knocked out only in certain retinal regions and not others, as suggested by Figures 4A-B? Are Afadin levels different among distinct neuron types, which could mean that its knockout may have a more pronounced impact on certain cell types, such as rods compared to others?

      The authors suggest that synapses may form between canonical synaptic partners, based on the proximity of their processes (Figure 2). However, more solid evidence is needed to verify these synapses through the use of synaptic marker staining or transsynaptic labeling before drawing further conclusions.

      Although the Afadin cKO mice displayed dramatic phenotypes, additional experiments are necessary to clarify the details of this process. By manipulating Afadin levels in specific cell types or at different developmental time points, we could gain a better understanding of how Afadin regulates accurate retinal lamination and axonal projection.

      Regarding antibody confirmation of the knockout, we tested the commercially available antibody from Sigma but were unable to confirm its specificity. A homemade antibody exists in a Japan-based laboratory, but it was not available for sharing when the study was conducted. Nonetheless, the original allele was generated for hippocampal and cortical studies by Louis Reichardt’s lab (UCSF), and the efficacy of the KO allele has been verified.

      Regarding phenotypic penetrance, this likely arises from mosaicism of the clones and symmetric cell division, leading to rosette-like structures. At present, we reason that Afadin KO does NOT lead to direct neuronal loss and that the selective rod loss may derive from other causes, but we lack direct evidence to validate this point.

      Regarding the specific neuronal types and synaptic pairs, we acknowledge the limitations of the current Figure 2 in linking the mutant phenotypes to circuit changes. However, the current genetic reagent (Six3Cre) is not compatible with neuron-type-specific labeling or synaptic labeling, for which a cell-type-specific Cre and additional Cre-dependent AAV tools would be needed. To do so, we would need to initiate cell-type-specific breeding of transgenic markers such as Hb9GFP for ooDSGCs, or Chat-Cre and VGlut3-Cre for starburst amacrine cells and vG3 amacrine cells, respectively, followed by retinal physiology. These experiments require multi-allelic genetic crosses with a very low breeding yield (1/16 or 1/32 Mendelian ratio). Such extensive genetic tests are beyond the scope of the current manuscript.

      Reviewer #2 (Public review):

      Summary:

      This study by Lum and colleagues reports on the role of Afadin, a cytosolic adapter protein that organizes multiple cell adhesion molecule families, in the generation and maintenance of complex cellular layers in the mouse retina. They used a conditional deletion approach, removing Afadin in retinal progenitors, and allowing them to analyze broad effects on retinal neuron development.

      The study presents high-quality and extensive characterization of the cellular phenotypes, supporting the main conclusions of the paper. They show that Afadin loss results in significant disorganization of the retinal cellular layers and the neuropil, producing rosettes and displacement of cells away from their resident layers. The major classes of neurons in the inner retina are affected, and some neurons are, remarkably, displaced to the other side of the inner plexiform layer. Nevertheless, they mostly target their synaptic partners, including the RGCs to distant retinorecipient targets in the brain. The main conclusions are as follows. Afadin is necessary for establishing and maintaining the retinal architecture. It is not necessary for the generation of the correct numbers/densities of retinal neuron subtypes. Moreover, Afadin loss preserves associations between known synaptic partners and preserves axonal targeting to retinorecipient layers. The consequences on photoreceptor viability and visual processing are also interesting, underscoring the essential function for maintaining retinal structure and function. Overall, the main conclusions describing the consequences are supported by the results.

      Strengths:

      The study provides new knowledge on the requirement of Afadin in retinal development. The introduction and discussion effectively set up the rationale for this work, and place it in the context of previous studies of Afadin in other regions of the CNS.

      The study presents high-quality and extensive characterizations of the cellular phenotypes resulting from Afadin loss. By analyzing various aspects of retinal organization - from cellular densities to axon targeting to brain - the study narrows down the role of a structure for promoting the establishment of the layers, or maintenance. The data are straightforward and convincing, and the interpretations are bounded by the data shown (though minor weakness re. survival). Another important finding is that the targeting of retinal neuron processes to synaptic partners, including retinorecipient targets in the brain, is intact.

      The study is important as it establishes a focused requirement for Afadin to set up and preserve the overall cellular organizations within the retinal tissue. The demonstration that Afadin is needed for photoreceptor viability and overall visual function enhances impact by establishing its functional importance.

      The manuscript is well written and presented. The images are attractive and compelling, and the figures are well organized.

      Thank you for your high praise on the logic, data presentation, and significance of the current manuscript. We appreciate your comments on the novelty and impact of our study using retinal circuits as a model.

      Weaknesses:

      (1) Expanding on the developmental mechanism is beyond the scope of the study, and would not add to the main conclusions. However, the manuscript would be improved by providing more clarity on the developmental emergence of the defects. The study left me questioning whether the rosettes and cell displacements occur during earlier stages of retina development, or are progressive. For instance, do the RGCs migrate and establish within the GCL correctly at first, and then are displaced with the progressive disorganization? Or are they disorganized and delaminate en route? Images of RGC staining at P0, or earlier during their migration, would be informative. Data in Figure 1 is limited to DAPI staining at P7. Figure 4 shows an image of rod photoreceptors at P7, with their displacement in the GCL layer (and not contained within a rosette). Are the progenitors mislocalized due to delamination? A few additional thoughts on how these defects compare to other mutants with rosettes might give us more context for understanding the results.

      We chose P7 as our focus because of the lamination seen in controls at this stage. In the revised manuscript, we plan to include earlier time points, as suggested by the reviewer. The data in Figure 1 at P7 use well-established cell type markers (RBPMS, Chx10, Ap2α) and are not limited to DAPI. Additionally, we will revise the discussion section to place our mutant analyses in the context of other retinal mutants with rosettes (beta-catenin, etc.). Finally, we will address the comment on progenitor lamination by exploring earlier developmental time points.

      (2) The manuscript reports that the densities of major inner retinal classes are unaffected. There are a few details missing for this point. How were the cell densities quantified (in terms of ROI size), and normalized? This information is lacking in the methods. There is a striking thickening of the GCL in the DAPI-labeled images shown in Figure 1. What are these cells?

      We will revise the manuscript, particularly the methods section, to address these comments, including a description of the ROI size and normalization. The cells in the thickened GCL were identified as displaced amacrine cells and bipolar cells.

    1. Author response:

      Reviewer #1:

      Summary:

      The authors address the role of the centromere histone core in force transduction by the kinetochore.

      Strengths:

      They use a hybrid DNA sequence that combines CDEII and CDEIII as well as Widom 601 so they can make stable histones for biophysical studies (provided by the Widom sequence) and maintain features of the centromere (CDE II and III).

      Weaknesses:

      The main results are shown in one figure (Figure 2). Indeed the Centromere core of Widom and CDE II and III contribute to strengthening the binding force for the OA-beads. The data are very nicely done and convincingly demonstrate the point. The weakness is that this is the entire paper. It is certainly of interest to investigators in kinetochore biology, but beyond that, the impact is fairly limited in scope.

      This reviewer might have missed that this is a Research Advance, not an article. Research Advances are limited in scope by definition and provide a new development that builds on research reported in a prior paper. They can be of any length. Our Research Advance builds on our prior work, Hamilton et al., 2020, and provides the new result that native centromere sequences strengthen the attachment of the kinetochore to the nucleosome.

      Reviewer #2:

      Summary:

      This paper provides a valuable addendum to the findings described in Hamilton et al. 2020 (https://doi.org/10.7554/eLife.56582). In the earlier paper, the authors reconstituted the budding yeast centromeric nucleosome together with parts of the budding yeast kinetochore and tested which elements are required and sufficient for force transmission from microtubules to the nucleosome. Although budding yeast centromeres are defined by specific DNA sequences, this earlier paper did not use centromeric DNA but instead the generic Widom 601 DNA. The reason is that it has so far been impossible to stably reconstitute a budding yeast centromeric nucleosome using centromeric DNA.

      In this new study, the authors now report that they were able to replace part of the Widom 601 DNA with centromeric DNA from chromosome 3. This makes the assay more closely resemble the in vivo situation. Interestingly, the presence of the centromeric DNA fragment makes one type of minimal kinetochore assembly, but not the other, withstand stronger forces.

      We thank the reviewer for their careful and positive assessment of our work.

      Which kinetochore assembly turned out to be affected was somewhat unexpected, and can currently not be reconciled with structural knowledge of the budding yeast centromere/kinetochore. This highlights that, despite recent advances (e.g. Guan et al., 2021; Dendooven et al., 2023), aspects of budding yeast kinetochore architecture and function remain to be understood and that it will be important to dissect the contributions of the centromeric DNA sequence.

      We couldn’t agree more.

      Given the unexpected result, the study would become yet more informative if the authors were able to pinpoint which interactions contribute to the enhanced force resistance in the presence of centromeric DNA.

      Strength:

      The paper demonstrates that centromeric DNA can increase the attachment strength between budding yeast microtubules and centromeric nucleosomes.

      Weakness:

      How centromeric DNA exerts this effect remains unclear.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      In this work, the authors use a Drosophila adult ventral nerve cord injury model extending and confirming previous observations; this important study reveals key aspects of adult neural plasticity. Taking advantage of several genetic reporter and fate tracing tools, the authors provide solid evidence for different forms of glial plasticity, that are increased upon injury. The data on detected plasticity under physiologic conditions and especially the extent of cell divisions and cell fate changes upon injury would benefit from validation by additional markers. The experimental part would improve if strengthened and accompanied by a more comprehensive integration of results regarding glial reactivity in the adult CNS.

      Thank you very much for your thoughtful comments and constructive feedback regarding our manuscript. We appreciate all the positive remarks on the significance of our findings on neural plasticity in this Drosophila adult ventral nerve cord injury model.

      In response to your suggestion, we fully agree that the continuation of this project should address cell fate changes in detail with additional markers, if available, or with an “omic” approach such as scRNAseq. Unfortunately, these further experiments are beyond the scope of this paper, which aims to describe the in vivo phenomenon of cell reprogramming and the cellular events that lead glial cells to convert into neurons or neuronal precursors.

      Additionally, we agree that the experimental part can be further improved by providing a more comprehensive integration of our results with current knowledge on glial reactivity in the adult CNS. We will revise the manuscript accordingly to include a deeper discussion of the broader implications of our findings and their alignment with existing literature.

      Thank you again for your valuable input, which will undoubtedly enhance the quality of our work. We look forward to submitting the revised manuscript for your consideration.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Casas-Tinto et al. present convincing data that injury of the adult Drosophila CNS triggers transdifferentiation of glial cells and even the generation of neurons from glial cells. This observation opens up the possibility to get a handle on the molecular basis of neuronal and glial generation in the vertebrate CNS after traumatic injury caused by stroke or crush injury. The authors use an array of sophisticated tools to follow the development of glial cells at the injury site in very young and mature adults. The results in mature adults reveal a remarkable plasticity in the fly CNS and dispel the notion that repair after injury may be only possible in nerve cords which are still developing. The observation of so-called VC cells which do not express the glial marker repo could point to the generation of neurons by former glial cells.

      Conclusion:

      The authors present an interesting story which is technically sound and could form the basis for an in-depth analysis of the molecular mechanism driving repair after brain injury in Drosophila and vertebrates.

      Strengths:

      The evidence for transdifferentiation of glial cells is convincing. In addition, the injury to the adult CNS shows an inherent plasticity of the mature ventral nerve cord which is unexpected.

      Weaknesses:

      Traumatic brain injury in Drosophila has been previously reported to trigger mitosis of glial cells and generation of neural stem cells in the larval CNS and the adult brain hemispheres. Therefore, this report adds to but does not significantly change our current understanding. The origin and identity of VC cells is still unclear. The authors show that VC cells are not GABAergic or glutamatergic. Yet, there are many other neurotransmitters and neuropeptides. It would have been nice to see a staining with another general neuronal marker such as anti-Syt1 to confirm the neuronal identity of VC cells.

      We thank the reviewer for the constructive comments and positive feedback. We concur that previous studies have demonstrated glial cell proliferation in response to CNS injury. Our study instead focuses on glial transdifferentiation, which emerges as a novel phenomenon, particularly in response to injury. We found that neuropile glia lose their glial identity and express the pan-neuronal marker Elav. To investigate the functional identity of these newly observed Elav-positive cells, we used anti-ChAT, anti-GABA and anti-GluRIIA antibodies. We also stained them for other neuronal markers such as Enabled, Gigas or Dac (not shown); however, these attempts yielded limited success. We have therefore included a discussion section exploring the potential identity of these cells, considering the possibility that they may represent immature neurons.

      Reviewer #2 (Public review):

      Summary:

      Casas-Tinto et al. provide new insight into glial plasticity using a crush injury paradigm in the ventral nerve cord (VNC) of adult Drosophila. The authors find that both astrocyte-like glia (ALG) and ensheathing glia (EG) divide under homeostatic conditions in the adult VNC and identify ALG as the glial population that specifically ramps up proliferation in response to injury, whereas the number of EGs decreases following the insult. Using lineage-tracing tools, the authors interestingly observe interconversion of glial subtypes, especially of EGs into ALGs, which occurs independent of injury and is dependent on the availability of the transcription factor Prospero in EGs, adding to the plasticity observed in the system. Finally, when tracing the progeny of glia, Casas-Tinto and colleagues detect cells of neuronal identity and provide evidence that such glia-derived neurogenesis is specifically favoured following ventral nerve cord injury, which puts forward a remarkable way in which glia can respond to neuronal damage.

      Strengths:

      This study highlights a new facet of adult nervous system plasticity at the level of the ventral nerve cord, supporting the view that proliferative capacity is maintained in the mature CNS and stimulated upon injury.

      The injury paradigm is well chosen, as the organization of the neuromeres allows specific targeting of one segment for comparison with the remaining intact ones, with the potential to later link observed plasticity to behaviour such as locomotion.

      Numerous experiments have been carried out in 7-day old flies, showing that the observed plasticity is not due to residual developmental remodelling or a still immature VNC.

      By elegantly combining different methods, the authors show glial divisions including with mitotic-dependent tracing and find that the number of generated glia is refined by apoptosis later on.

      The work identifies prospero in glia as an important coordinator of glial cell fate, from development to the adult context, which draws further attention to the upstream regulatory mechanisms.

      We would like to thank the reviewer for his/her comments and the positive analysis of this work.

      Weaknesses:

      The authors observe consistent inter-conversion of EG to ALG glial subtypes that is further stimulated upon injury. The authors conclude that these findings have important consequences for CNS regeneration and potentially for memory and learning. However, it remains somewhat unclear how glial transformation could contribute to regeneration and functional recovery.

      This is an ongoing question in the laboratory and in the field. We know that glial cells contribute to the regenerative program in the nervous system, and that molecular signalling in glial cells is a determinant of functional recovery (Losada-Perez et al., 2021). Therefore, we include this concept in the discussion, as the evidence indicates that glial cells participate in these programs. However, further investigation is required to clarify the mechanisms underlying this glial contribution. To determine whether glia-to-neuron transformation contributes to functional recovery, we would need to compare the recovery of animals with new VC cells to that of animals without them. However, the molecular mechanism that produces this change of identity is still unknown, and therefore we are not able to generate injured flies with no new VC cells.

      The signal of the Fucci cell cycle reporter seems more complex to interpret based on the panels provided compared to the other methods employed by the authors to assess cell divisions.

      We agree that Fly Fucci is a genetic reporter that might be more complex to interpret than EdU staining or other markers. However, glial cell proliferation is a central finding of this manuscript, and we used the different tools available to confirm our results. We have revised this specific section to ensure that the text is clear and straightforward.

      Elav+ cells originating from glia do not express markers for mature neurons at the analysed time-point. If they will eventually differentiate or what type of structure is formed by them will have to be followed up in future studies.

      We fully agree with the reviewer, and we will analyze later time points to study neuronal fate and contribution to VNC function.

      Context/Discussion

      The observed forms of glial plasticity in the VNC are not sufficiently connected to, or compared with, the plasticity described in the fly brain.

      Highlighting some differences in the reactiveness of glia in the VNC compared to the brain could point to relevant differences in repair capacity in different areas of the CNS.

      Based on the assays employed, the study points to a significant amount of glial "identity" changes or interconversions under homeostatic conditions. The potential significance of this rather unexpected "baseline" plasticity in adult tissues is not explicitly pointed out and could improve the understanding of the findings.

      Some speculation on whether "interconversion" of glia is driven by the needs of the tissue could enrich the discussion.

      We would like to thank the reviewer for these suggestions. We have changed the discussion to introduce these concepts.

      Reviewer #3 (Public review):

      In this manuscript, Casas-Tintó et al. explore the role of glial cells in the response to a neurodegenerative injury in the adult brain. They used Drosophila melanogaster as a model organism, and found that glial cells are able to generate new neurons through the mechanism of transdifferentiation in response to injury. This paper provides a new mechanism in regeneration, and gives an understanding of the role of glial cells in the process.

      Comments on revisions:

      In the previous version of the manuscript, I had suggested several recommendations for the authors. Unfortunately, none of these were addressed in the authors' revision.

      We apologize for this error; we never received these comments. We have now found them and have addressed them in the new version of the manuscript.

      (1) Have you tried screening for other markers for the EdU+ Repo+ Pros- cells?

      We have identified these cells as glial cells (Repo+) that are not astrocyte-like glia (pros-), but we have not further characterized their identity. Our aim was to identify these proliferating glial cells as neuropile glia (NPG), either astrocyte-like glia (ALG), as previous work in larvae suggests (Kato et al., 2020; Losada-Perez et al., 2016), or ensheathing glia (EG). To rule out ALG identity, we used prospero as the best available marker. The results indicate that there are ALG among the proliferating population, but we also found pros- glial cells that were EdU-positive. These cells are located at the interface between cortex and neuropile, where neuropile glia are described to reside. The anti-pros staining indicated that they were not ALG, which suggests that they are EG.

      There is no specific nuclear marker for EG cells; therefore, we used FLY_FUCCI under the control of an EG-specific promoter (R56F03-Gal4) to determine whether the other dividing cells were EG. These results indicate that EG divide, although their proliferation does not increase upon injury.

      The R56F03 Gal4 construct is described as ensheathing glia specific by previous publications, including:

      (1) Kremer M. C., Jung C., Batelli S., Rubin G. M. and Gaul U. (2017). The glia of the adult Drosophila nervous system. Glia 65, 606-638. 10.1002/glia.23115

      (2) Ren Q., Awasaki T., Wang Y.-C., Huang Y.-F. and Lee T. (2018). Lineage-guided Notch-dependent gliogenesis by Drosophila multi-potent progenitors. Development 145, dev160127. 10.1242/dev.160127

      To summarize, our results suggest that these proliferating glial cells include both ALG and EG. We cannot rule out that a residual fraction of these proliferating cells is neither ALG nor EG.

      (2) You mentioned that ALG are heterogenous in size and shape, does that mean that you may have different subpopulations of ALG? Would that also mean that only a portion of them responds to injury?

      Yes. As with astrocytes in vertebrates, this population is highly heterogeneous. Currently there are no molecular tools to specifically identify these subpopulations and characterize their distinct roles. However, emerging research suggests that differences in size, shape, and potentially molecular markers could correlate with functional diversity. This implies that certain subpopulations of ALG may be more specialized or primed to respond to injury, while others may play roles in homeostasis or other processes. Understanding this heterogeneity will require advanced techniques such as single-cell RNA sequencing, spatial transcriptomics, or live imaging to unravel how these subpopulations contribute to injury responses and overall tissue dynamics.

      (3) You mentioned that NP-like cells have similar nuclear shape and size to ALG and EG, while Ventral cortex cells have larger nuclei. Can you please show a quantification of the NP-like cells and Ventral cortex cells size, and show a direct comparison with ALG and EG cells to support those claims (images, quantification and analysis)?

      We added a new supplementary figure with a graph showing nuclei size differences between VC and NP-like cells, and a diagram showing VC cell localization. Images in figure 2A-A’ and 2B-B’ show both types of cells with the same scale, additionally, NPG cells are shown in red (current expression of the specific Gal4 line). A direct comparison between EG and NP-like glia can be observed in Figure 3 as well.

      Besides size and localization, we conclude that VC and NP-like cells present different molecular markers, as VC cells are elav-positive and repo-negative whereas NP-like cells are repo-positive and elav-negative.

      (4) In Figure 2B, the repo expression is not very clear. I suggest using a different example to support the claim that NP cells are Repo+.

      We have changed the color of the anti-elav staining to facilitate visualisation.

      (5) Again, in Figure 2C, you need quantification and analysis to support the claim that you used nuclear shape and size to identify VC vs. NP like cells.

      Quantification in point 3, criteria in Figure S1

      (6) What is the identity of the newly formed neurons? Other than Elav, have you tried using other markers of neurons that are typically found in this area?

      This question is of great interest and relevance. We have made great efforts to resolve this open question and, so far, our data suggest that these neurons might be in an immature state. In this latest version of the manuscript, we have included the results (Figure S1) obtained with several different markers.

      The molecular identity of these cell populations, glia and neurons, is currently under investigation.

      Minor comments:

      (1) In the abstract, EG and ALG abbreviations are not introduced properly.

      Thank you very much for noticing this missing information; we have now included it in the abstract.

      (2) Please include a representation of the NPG somata location in Figure 1A.

      We have included this information in the figure.

      (3) A schematic showing the differences between ALG and EG cells would be helpful as well.

      In the introduction, we have included references and reviews in which other authors describe the differences in detail.

      (4) In Figure 1 E, G, H, please indicate the genotype of the fly used in the panel as well as the cell type studied.

      The complete genotype is included in the corresponding figure legend. We have added a simplified genotype in the figure for clarity.

      (5) Please show the genotype used for images in Figure 2: ALG or EG specific drivers.

      This information is included in the corresponding figure legend. We believe that it is better to keep the figure clean so we decided to keep the complete genotype, which is considerably long, only in the figure legend.

    1. Author response:

      We appreciate the constructive feedback provided by the reviewers and the editorial board. We are delighted by the positive reception of our work and the thoughtful insights shared.

      Regarding the validation of our predicted interactions, we are currently conducting yeast two-hybrid (Y2H) assays using a commercially available Arabidopsis thaliana cDNA library to screen for interacting partners of the ANK putative effector PBTT_00818 from Plasmodiophora brassicae. Following this initial screening, we will validate positive interactions through targeted 1-to-1 Y2H assays. In particular, we aim to confirm the AlphaFold Multimer-predicted interaction between PBTT_00818 and MPK3, a key immunity-related kinase in Arabidopsis.

      We are grateful for the reviewers’ thoughtful suggestions regarding clustering visualization, sequence vs. structure-based motif alignments, and structural confidence assessments. We will carefully incorporate these improvements in our planned revisions.

      Once again, we thank the editors and reviewers for their rigorous and constructive assessment. We look forward to implementing these refinements and submitting an updated version that further enhances the impact of our study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study identifies the "H-state" as a potential conformational marker distinguishing amyloidogenic from non-amyloidogenic light chains, addressing a critical problem in protein misfolding and amyloidosis. By combining advanced techniques such as small-angle X-ray scattering, molecular dynamics simulations, and H-D exchange mass spectrometry, the authors provide convincing evidence for their novel findings. However, incomplete experimental descriptions, limitations in SAXS data interpretation, and the way HDX MS data is presented affect the strength and generalizability of the conclusions. Strengthening these aspects would enhance the impact of this work for researchers in amyloidosis and protein misfolding.

      We thank eLife editors and reviewers for their constructive feedback. The manuscript has been improved to provide a more complete description of the experiments and to strengthen the interpretation and presentation of all data. Updated Figures (Figure 2 and Figure 5) and a new Table (Table 2) in the main text provide a more complete and clearer comparison of the SAXS data with MD simulations as well as a clearer representation of the HDX MS data. Additional figures have been added in SI. The text has been extended accordingly and complete materials and methods are now included in the main text. Abstract, introduction and discussion have been revised to improve the overall readability of the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      The study investigates light chains (LCs) using three distinct approaches, with a focus on identifying a conformational fingerprint to differentiate amyloidogenic light chains from multiple myeloma light chains. The study's major contribution is identifying a low-populated "H state," which the authors propose as a unique marker for AL-LCs. While this finding is promising, the review highlights several strengths and weaknesses. Strengths include the valuable contribution of identifying the H state and using multiple approaches, which provide a comprehensive understanding of LC structural dynamics. However, the study suffers from weaknesses, particularly in interpreting SAXS data, lack of clarity in presentation, and methodological inconsistencies. Critical concerns include high error margins between SAXS profiles and MD fits, unclear validation of oligomeric species in SAXS measurements, and insufficient quantitative cross-validation between experimental (HDX) and computational data (MD). This reviewer calls for major revisions including clearer definitions, improved methodology, and additional validation, to strengthen the conclusions.

      We thank the reviewer for the supportive comments. In the revised version of the manuscript we have focused on improving the clarity and completeness of our work. For example, we are sorry not to have made it clear enough that the comparison of SAXS with the MD simulations was not the one shown in the main text in Figure 1 and Table 1 (which is the comparison with single structures) but the one reported in the SI (previously Figure S1 and Table S2, showing very good fits). These data have been moved into the main text as the reworked Figure 2 and the new Table 2. We have also improved the presentation of the HDX MS data in Figure 5 and in the text, and added further analysis in the SI. Materials and methods have now been moved entirely into the main text. We have also revised the manuscript throughout for clarity.

      Reviewer #2 (Public review):

      Summary:

      This well-written manuscript addresses an important but recalcitrant problem - the molecular mechanism of protein misfolding in Ig light chain (LC) amyloidosis (AL), a major life-threatening form of systemic human amyloidosis. The authors use expertly recorded and analyzed small-angle X-ray scattering (SAXS) data as a restraint for molecular dynamics simulations (called M&M) and to explore six patient-based LC proteins. The authors report that a highly populated "H-state" determined computationally, wherein the two domains in an LC molecule acquire a straight rather than bent conformation, is what distinguishes AL from non-AL LCs. They then use H-D exchange mass spectrometry to verify this conclusion. If confirmed, this is a novel and interesting finding with potentially important translational implications.

      We thank the reviewer for the supportive comments.

      Strengths:

      Expertly recorded and analyzed SAXS data combined with clever M&M simulations lead to a novel and interesting conclusion. Regardless of whether or not the CL-CL domain interface is destabilized in AL LCs explored in this (Figure 6) and other studies, stabilization of this interface is an excellent idea that may help protect at least a subset of AL LCs from misfolding in amyloid. This idea increases the potential impact of this interesting study.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      The HDX analysis could be strengthened.

      We have extended the analysis and improved the presentation of the HDX data. Figure 5 has been reworked, the text has been improved accordingly, and additional analyses have been reported in the SI.

      Reviewer #3 (Public review):

      Summary:

      This study identifies conformational fingerprints of amyloidogenic light chains, that set them apart from the non-amyloidogenic ones.

      We thank the reviewer for the supportive comments.

      Strengths:

      The research employs a comprehensive combination of structural and dynamic analysis techniques, providing evidence that conformational dynamics at the VL-CL interface and structural expansion are distinguished features of amyloidogenic LCs.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      The sample size is limited, which may affect the generalizability of the findings. Additionally, the study could benefit from deeper analysis of specific mutations driving this unique conformation to further strengthen therapeutic relevance.

      We agree; we tried to maximise the sample size, and this was the best we could do. With respect to the analysis of the mutations, we tried to discuss some of them, also in view of previous works. However, because our set covers multiple germlines instead of focusing on a single one, our ability to discuss single point mutations systematically is limited. At the same time, single point mutations have been the focus of many recent works, while our approach provides a different point of view.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This study provides an investigation of light chains (LCs) using three distinct approaches, focusing primarily on identifying a conformational fingerprint to distinguish amyloidogenic light chains (AL-LCs) from multiple myeloma light chains (MM-LCs). The authors propose that the presence of a low-populated "H state," characterized by an extended quaternary structure and a perturbed CL-CL interface, is unique to AL-LCs. This finding is validated through hydrogen-deuterium exchange mass spectrometry (HDX-MS). The study makes a valuable contribution to understanding the structural dynamics of light chains, particularly with the identification of the H state in AL-LCs. However, significant concerns regarding the interpretation of the SAXS data, clarity in presentation, and methodological rigor must be addressed. I recommend major revisions and resubmission of the work.

      Major concerns:

      (1) A critical concern is how the authors ensure that the SAXS profiles represent only dimeric species, given the high propensity of LCs to aggregate. If higher-order aggregates or monomers were present, this would significantly impact the SAXS data and SAXS-MD integration. Some measurements are bulk SAXS, while others are SEC-SAXS, making the study questionable. The authors need to clarify how only dimeric species were measured for the SEC-SAXS analysis, and all assessments of the dimeric state should be shown in the SI. Additionally, complementary techniques such as DLS or SEC-MALS should be used to verify the oligomeric state of the samples. Without this validation, the SAXS profiles may not be reliable.

      We added SEC-MALS and SEC-SAXS data to the SI (Figures S20 and S21), as well as the SAXS curves shown as log-log plots (Figure S1), which display a flat trend at low q that excludes aggregation. SAXS is very sensitive to oligomers and aggregates, and our data do not indicate the presence of those species. When we had an indication of possible aggregation in a sample, we used SEC-SAXS.

      (2) A major problem with the paper is that the claim of the "H state," which is the novelty of the study and serves as a marker of aggregation, is derived from samples where the error between the SAXS profiles and MD fits is extremely high. This casts doubt on whether the structure is indeed resolved by MD. The main conclusion of the paper is derived from weak consistency between experiment and simulation. In AL55, the error between experiment and simulation is greater than 5; for H7, it is higher than 2.8. The residuals show significant error at mid-q values, suggesting that long-range distance correlations (20-10 Å, CL, VL positioning) are not consistent between simulation and experiment. Furthermore, the FES plots of two independent replicas show deviation in the existence of the H state. One shows a minimum in that region, while the other does not. So, how robust is this conclusion? What is the chi-squared value if each replica is used independently? A separate experimental cross-validation is necessary to claim the existence of the H state.

      We apologise for the misunderstanding underlying this reviewer comment. The poor agreement mentioned is not between the SAXS data and the MD simulations, but between the SAXS data and the individual structures, and this disagreement led us to perform MD simulations that are in much better agreement with the data (previously Fig. S1 and Table S2). To avoid this misunderstanding, which would indeed weaken our work, we have now moved both the figure and the table into the main text as the updated Figure 2 and the new Table 2.

      Regarding the robustness of the sampling, we believe that Table 3 (previously Table 2) clearly shows the statistical convergence of the data; differences in the presentation of the free energy are purely interpolation issues. The chi-squares of each replicate are reported in Table 2 (previously Table S2).

      (3) There is insufficient discussion about SAXS computations from MD trajectories. The accuracy of these calculations is crucial to deriving the existing conclusions, and the study's reliance on the PLUMED plugin, which is known to give inaccurate results for SAXS computations, raises concerns. How the solvent is treated in the SAXS computations needs to be explained. Alternative methods like WAXSiS or Crysol should be explored to check whether the SAXS profiles derived from the MD trajectory are consistent across other SAXS computation methods for the major conformers of the proteins.

      We have now clarified that, while the SAXS calculations used to perform the Metainference MD simulations were done with PLUMED (which, to our knowledge, is as accurate as crysol), the SAXS curves used for analysis were calculated with crysol.

      (4) The HDX and MD results do not seem to correlate well, and there is a disconnect between Figure 2 (SAXS profiles) and Figure 5 (HDX structural interpretation). The authors should quantitatively assess residue-level dynamics by comparing HDX signals with MD-derived HDX signals for each protein. This would provide a cross-validation between the experimental and computational data.

      In our opinion, our SAXS, MD and HDX MS data provide a consistent picture. Our HDX-MS data are not resolved per residue, making a quantitative comparison out of scope. RMSF data do not necessarily need to correlate with the deuterium uptake.

      (5) MD simulations are only used to refine the structure of AlphaFold predictions, but the trajectories could help explain why these structures differ, what stabilizes the dimer, or what leads to the conformational transition of the H state. A lack of analysis regarding the physical mechanism behind these structural changes is a weakness of the study. The authors should dedicate more effort to analyzing their data and provide physical insights into why these changes are observed.

      Our aim was to identify a property that could discriminate between AL and MM LCs. We used MD simulations, not to refine structures, but to explore the conformational dynamics of LCs (starting from either X-ray structures, homology or AlphaFold models), because SAXS data suggested that conformational dynamics could discriminate between AL- and MM-LCs. Simulations allowed us to propose a hypothesis, which we tested by HDX MS. While more insight is always welcome, we believe that we have achieved our goal for now. In the discussion, we present additional analysis of the simulations to connect with previous literature. We agree that more analysis can be done and, also for this reason, all our data are publicly available.

      Minor concerns

      (6) The abstract leans heavily on describing the problem and methods but lacks a clear presentation of key results. Providing a concise summary of the main findings (e.g., the identification of the H state) would better balance the abstract.

      We agree with the reviewer and we rewrote the abstract.

      (7) In the abstract, the term "experimental structure" is used ambiguously. Since SAXS also provides an experimental structure, it is unclear what the authors are referring to. This should be clarified.

      We agree with the reviewer and we rewrote the abstract.

      (8) Abbreviations such as VL (variable domain) and CL (constant domain) are not defined, making it harder for readers unfamiliar with the field to follow. Abbreviations should be defined when first mentioned.

      We agree with the reviewer and we rewrote the abstract.

      (9) The introduction provides a good general context but fails to explicitly define the knowledge gap. Specifically, the structural and dynamic determinants of LC amyloidogenicity are not well established, and this study could be framed as addressing that gap.

      We thank the reviewer and agree that this could be better framed; we have improved the introduction accordingly.

      (10) The introduction does not present the novel discovery of the H state early enough. The unique contribution of identifying this state as a marker for AL-LCs should be mentioned upfront to guide the reader through the significance of the study.

      We thank the reviewer and we have now made more explicit what we found.

      (11) The therapeutic implications of this research should be highlighted more clearly in the discussion. Examples of how these findings could be utilized in drug design or therapeutic approaches would enhance the study's impact.

      We thank the reviewer. While we think that the H-state could be targeted for drug design, we do not yet have data, so we do not want to stress this point more than we already do.

      (12) There is an overwhelming use of abbreviations such as H3, H7, H18, M7, and M10 without proper introduction. This makes it difficult for readers to follow the results, and the average reader may become lost in the details. An introductory figure summarizing the sequences under study, along with a schematic of the dimeric structure defining VL and CL domains, would significantly aid comprehension.

      We agree and we tried to better introduce the systems and simplify the language without adding a figure that we think would be redundant.

      (13) In Figure 1, add labels to each SAXS curve to indicate which protein they correspond to. Also, what does online SEC-SAXS mean?

      Done

      (14) The caption of Figure 3 is unclear, particularly with abbreviations like Lb, Ls, G, and H, which are not mentioned in the captions. The authors should define these terms for clarity.

      Done

      (15) The study claims that the dominant structure of the dimer changes between different LCs. However, Figure 5 shows identical structures for all proteins, raising questions about the consistency between the SAXS and HDX data. This inconsistency is a general problem between the MD and HDX sections, where cross-communication and comparisons are not properly addressed.

      We do not claim that the dominant structure of the dimer changes between different LCs; this would also contradict the current literature. We claim a difference in a low-populated state. From this point of view, always using the same structure is consistent and should simplify the representation of the results. We agree that the manuscript may not always be easy to follow, and we thank the reviewer for helping us improve it.

      (16) The authors show I(q) vs q and residuals for each protein. The Kratky plots are not sufficient to compare the SAXS computations with the measured profile.

      Showing Kratky and residuals is a standard and complementary way to present and compare SAXS data to structures. Chi-square values are also reported. Log-log plots have been added to SI in response to previous comments.
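
      For readers less familiar with these quantities, the sketch below illustrates how a reduced chi-square and the normalised residuals between a measured and a computed SAXS profile can be obtained. It is a simplified illustration, not the procedure implemented in crysol or PLUMED: it assumes that only a single scale factor is fitted, and the function names are hypothetical.

```python
def fit_scale(i_exp, sigma, i_calc):
    """Weighted least-squares scale factor c that minimises
    sum(((I_exp - c * I_calc) / sigma)^2)."""
    num = sum(e * c / s**2 for e, s, c in zip(i_exp, sigma, i_calc))
    den = sum(c * c / s**2 for s, c in zip(sigma, i_calc))
    return num / den


def saxs_chi_square(i_exp, sigma, i_calc, n_fitted=1):
    """Return (reduced chi-square, normalised residuals) for one profile.

    i_exp, sigma: measured intensities and their experimental errors.
    i_calc: intensities computed from a structure or an ensemble,
            sampled at the same q values as i_exp.
    n_fitted: number of fitted parameters (assumed here: the scale only).
    """
    scale = fit_scale(i_exp, sigma, i_calc)
    residuals = [(e - scale * c) / s for e, s, c in zip(i_exp, sigma, i_calc)]
    dof = len(i_exp) - n_fitted
    chi2 = sum(r * r for r in residuals) / dof
    return chi2, residuals
```

      Systematic trends in the residuals along q indicate where the model deviates from the data, which the single chi-square value cannot show on its own.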

      (17) The authors need to explain how they estimate the Rg values (from simulation or SAXS profiles). If they are using simulations, they should compute the Rg values from the simulations for comparison.

      Rg values reported in Table 1 are derived from SAXS. Rg from simulations have been added in Table 2.
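
      As an illustration of how an Rg can be obtained from simulation frames, the following minimal sketch computes the radius of gyration of one set of coordinates and an ensemble value over several frames. It is not the code used in this work: the function names are hypothetical, the hydration layer that contributes to a SAXS-derived Rg is ignored, and the ensemble value is taken as the weighted root-mean-square of the per-frame Rg, which assumes all conformers have the same mass.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Radius of gyration of one conformation.

    coords: list of (x, y, z) tuples; masses: optional per-atom weights
    (equal weights are assumed when omitted).
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    center = [sum(m * xyz[k] for m, xyz in zip(masses, coords)) / total
              for k in range(3)]
    sq = sum(m * sum((xyz[k] - center[k]) ** 2 for k in range(3))
             for m, xyz in zip(masses, coords))
    return math.sqrt(sq / total)


def ensemble_rg(frames, weights=None):
    """Weighted root-mean-square Rg over a list of frames."""
    if weights is None:
        weights = [1.0] * len(frames)
    norm = sum(weights)
    mean_sq = sum(w * radius_of_gyration(f) ** 2
                  for w, f in zip(weights, frames)) / norm
    return math.sqrt(mean_sq)
```

      The per-frame weights would come from the ensemble reweighting; uniform weights are used here only as a default.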

      (18) The evolution of the sampling is unclear. The authors need to show the initial starting conformation in each case and the most likely conformation after M&M in the SI, to demonstrate that their approach indeed caused changes in the initial predictions.

      Our approach is not structure refinement and, as such, the proposed analysis would be misleading. Metainference is meant to generate a statistical ensemble representing the equilibrium conformations that, as a whole, reproduce the data. Differences (or not) between initial and selected configurations will not be particularly informative in this context.

      (19) The authors should also provide a running average of chi-squared values over time to demonstrate that the conformational ensemble converged toward the SAXS profile.

      Our simulations are not driven to improve the agreement with SAXS over time; this is not structure refinement. Metainference is meant to generate a statistical ensemble representing the equilibrium conformations that, as a whole, reproduce the data. The suggested analysis would be a misinterpretation of our simulations. The comparison with SAXS is provided in Figure 2 and Table 2, as mentioned above.

      (20) The aggregate simulation time of 120 microseconds is misleading, as each replica was only run for 2-3 microseconds. This should be clarified.

      The number reported in the text is accurate and represents the aggregated sampling. The number of replicas for each metainference simulation and their length are reported in Table 2, now moved for clarity from the SI to the main text.

      (21) It is not clear how the replicas were weighted to compute the SAXS profiles and FES. There are two independent runs in each case, and each run has about 30 replicas. How these replicas are weighted needs to be discussed in the SI.

      Done

      (22) The methods section is unevenly distributed, with detailed explanations of LC production and purification, while other key methodologies like SAXS+MD integration and HDX are not even mentioned in the main text (they are in the Supporting Information). The authors should provide a brief overview of all methodologies in the main text or move everything to the SI for consistency.

      We agree with the reviewer, all methods are now in main text. 

      Reviewer #2 (Recommendations for the authors):

      (1) Computational M&M evidence is strong (Figure 3) and is supported by SAXS (used as restraints). However, Kratky plots reported in the main MS Figure 1 show significant differences between the data and the structural model only for one protein, AL-55. It is hard for the general reader to see how these SAXS data support a clear difference between AL and non-AL proteins. If possible, please strengthen the evidence; if not, soften the conclusions.

      We thank the reviewer for the comments. The chi-square (Table 1) and the residuals (Figure 1) are a strong indication of the difference. To strengthen the evidence, also following the comment from Reviewer 3, we calculated the p-value (< 10⁻⁵) for the significance of the radius of gyration in discriminating AL and MM LCs. We agree that SAXS alone was not enough, and this is indeed what prompted us to perform MD simulations.

      (2) HDX MS results are cursory and not very convincing as presented. The butterfly plots in Figure 5 are too small to read and are unlabeled so it is unclear which protein is which.  

      Figure 5 has been reworked for readability. More data have been added in SI. 

      (3) What labeling time was selected to construct these plots and why?

      The deuterium uptake at 30 min of HDX time showed the most pronounced differences between the proteins, and this time point was therefore chosen to illustrate the key structural features in the main figure panel (Figure 5).

      How different are the results at other labeling times? Showing uptake curves (with errors) for more than just two peptides in the supplement Figure S12 might be helpful.

      We found a continuous increase in deuterium uptake as we increased the exchange time from 0.5 to 240 min, which reached saturation at 120 min. Therefore, the exchange follows the same pattern at all time points. Butterfly plots at different HDX times from 0.5 to 240 min are shown in a gradient from light blue to dark blue, which clearly shows the pattern of deuterium uptake at increasing incubation times (Figure 5). The HDX uptake kinetics of selected peptides with corresponding error bars are shown in Figure S12.

      How redundant are the data, i.e. how good is the peptide coverage/resolution in key regions at the domain-domain interface that the authors deem important? Mapping the maximal deuterium uptake on the structures in Figure 5 is not very helpful. Perhaps mapping the whole range of uptake using a gradient color scheme would be more informative.

      Overall coverage and redundancy for all four proteins are > 90% and > 4.0, respectively, and the average error margin in fractional uptake among all peptides is 0.04-0.05 Da, which suggests that our data are reliable (Table S3). We modified the main panel figures to show the gradient of deuterium uptake in blue-white-red, for 0 to 30% deuterium uptake, on chain A of the dimeric LCs.
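
      For clarity on what these two numbers measure, the sketch below computes sequence coverage and redundancy from a list of peptide boundaries. It is only an illustration with a hypothetical function name and made-up peptide positions in the usage example; it assumes that redundancy is defined as the average number of peptides covering each covered residue, which is one common convention and may differ from the definition used by a given HDX software package.

```python
def coverage_and_redundancy(peptides, sequence_length):
    """Sequence coverage and redundancy of an HDX peptide map.

    peptides: list of (start, end) residue numbers, 1-based and inclusive.
    sequence_length: number of residues in the protein.
    Returns (coverage as a fraction, redundancy).
    """
    counts = [0] * sequence_length
    for start, end in peptides:
        for pos in range(start, end + 1):
            counts[pos - 1] += 1  # one more peptide covers this residue
    covered = sum(1 for c in counts if c > 0)
    coverage = covered / sequence_length
    # Assumed definition: total peptide residues per covered residue.
    redundancy = sum(counts) / covered if covered else 0.0
    return coverage, redundancy
```

      For example, three overlapping peptides (1-5, 3-8 and 8-10) on a 10-residue sequence give full coverage with a redundancy of 1.4 under this definition.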

      (3) Is the conformational heterogeneity depicted in M&M simulations consistent with HDX results? The authors may want to address this by looking at the EX1/EX2 exchange kinetics for AL vs. non-AL proteins. Do AL proteins show more EX1?

      No, we don’t see any EX1 exchange kinetics in our analysis. This is compatible with the prediction of the H-state that is a native like state and not an unfolded/partially folded state. 

      (4) Perhaps the main conclusion could be softened given the small number of proteins (six), esp. since only four (3 AL and 1 non-AL) could be explored by HDX. Are other HDX MS data of AL LCs from the same Lambda6 family (e.g. PMID: 34678302) consistent with the conclusions that a particular domain-domain interface is weakened in AL vs. non-AL LCs?

      We thank the reviewer for this suggestion. A difference in HDX MS data is indeed visible between AL and MM proteins for peptide 33-47 in the suggested paper (Figures 4, S5 and S8). The difference is reduced by the mutation identified in the paper as driving the aggregation in that specific case. We now mention this in the discussion.

      (5) Please clarify if the H* state is the same for a covalent vs. non-covalent LC dimer.

      We do not know, because our data are only for covalent dimers. Interestingly, however, the state is very similar to what was observed for a model kappa light chain in Weber et al.; we have better highlighted this point in the discussion.

      (6) Please try and better explain why a smaller distance between CL domains in H7 protein and a larger distance in other AL proteins both promote protein misfolding.

      We do not have elements to discuss this point in more detail.

      (7) Please comment on the Kratky plots data vs. model agreement (see comments above).

      Done.

      (8) Please find a better way to display, describe, and interpret the HD exchange MS data.

      We have generated a new main-text figure (new Figure 5) and SI figures that we think allow the reader to better appreciate our observations. The corresponding results sections have also been improved.

      Minor points:

      (9) Is the population of the H-state with perturbed CL-CL domain interface, which was obtained in M&M simulations, sufficient to be observable by HDX MS?

      While populations alone are not enough to determine what is observable by HDX MS, a 10% population corresponds roughly to 6 kJ/mol of ΔG and is compatible with EX2 kinetics. Previous works suggested that HDX-MS data should be sensitive to subpopulations of the order of 10% (https://doi.org/10.1016/j.bpj.2020.02.005, https://doi.org/10.1021/jacs.2c06148).
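
      The conversion from population to free energy can be checked with the short sketch below. It assumes a simple two-state model (minor state versus everything else) and a temperature of 298.15 K; both are assumptions made for this illustration, and the function name is hypothetical.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def minor_state_free_energy(p_minor, temperature_k=298.15):
    """Free energy (kJ/mol) of a minor state relative to the major state.

    Two-state assumption: dG = -R * T * ln(p_minor / (1 - p_minor)).
    """
    if not 0.0 < p_minor < 1.0:
        raise ValueError("p_minor must be strictly between 0 and 1")
    ratio = p_minor / (1.0 - p_minor)
    return -R_KJ_PER_MOL_K * temperature_k * math.log(ratio)


if __name__ == "__main__":
    print(f"{minor_state_free_energy(0.10):.1f} kJ/mol")
```

      With these assumptions a 10% population gives a value between 5 and 6 kJ/mol; taking the free energy relative to the whole ensemble instead, -RT ln(p), gives a slightly larger number, so both conventions are consistent with the rough figure quoted above.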

      (10) Typically, an excited intermediate in protein unfolding is a monomer, while here it is an LC dimer. Is this unusual?

      This is a good point, we think that intermediates have mostly been studied on monomeric proteins because these are more commonly used as model systems, but we do not feel like discussing this point.

      (11) Low deuterium uptake is consistent with a rigid structure but may also reflect buried structure and/or structure that moves on a time scale greater than the labeling time.

      We agree.

      Reviewer #3 (Recommendations for the authors):

      (1) The p-value (statistical significance) of Rg diHerence should be computed.

      We thank the reviewer for the suggestion; we calculated the p-value, which turned out to be highly significant.

      (2) The significance of mutations (SHM?) at the interface, such as A40G should be compared with previous observations. (Garrofalo et al., 2021).

      We thank the reviewer for the suggestion, a sentence has been added in the discussion.

    2. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study identifies the "H-state" as a potential conformational marker distinguishing amyloidogenic from non-amyloidogenic light chains, addressing a critical problem in protein misfolding and amyloidosis. By combining advanced techniques such as small-angle X-ray scattering, molecular dynamics simulations, and H-D exchange mass spectrometry, the authors provide convincing evidence for their novel findings. However, incomplete experimental descriptions, limitations in SAXS data interpretation, and the way HDX MS data is presented affect the strength and generalizability of the conclusions. Strengthening these aspects would enhance the impact of this work for researchers in amyloidosis and protein misfolding.

      We thank eLife editors and reviewers for their constructive feedback. The manuscript has been improved to provide a more complete description of the experiments and to strengthen the interpretation and presentation of all data. Updated Figures (Figure 2 and Figure 5) and a new Table (Table 2) in the main text provide a more complete and clearer comparison of the SAXS data with MD simulations as well as a clearer representation of the HDX MS data. Additional figures have been added in SI. The text has been extended accordingly and complete materials and methods are now included in the main text. Abstract, introduction and discussion have been revised to improve the overall readability of the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      The study investigates light chains (LCs) using three distinct approaches, with a focus on identifying a conformational fingerprint to differentiate amyloidogenic light chains from multiple myeloma light chains. The study's major contribution is identifying a low-populated "H state," which the authors propose as a unique marker for AL-LCs. While this finding is promising, the review highlights several strengths and weaknesses. Strengths include the valuable contribution of identifying the H state and using multiple approaches, which provide a comprehensive understanding of LC structural dynamics. However, the study suffers from weaknesses, particularly in interpreting SAXS data, lack of clarity in presentation, and methodological inconsistencies. Critical concerns include high error margins between SAXS profiles and MD fits, unclear validation of oligomeric species in SAXS measurements, and insufficient quantitative cross-validation between experimental (HDX) and computational data (MD). This reviewer calls for major revisions including clearer definitions, improved methodology, and additional validation, to strengthen the conclusions.

      We thank the reviewer for the supportive comments. In the revised manuscript we have focused on improving the clarity and completeness of our work. For example, we apologise for not having made sufficiently clear that the comparison of SAXS with MD simulations was not the one shown in the main text in Figure 1 and Table 1 (which compares the data with single structures) but the one reported in the SI (previously Figure S1 and Table S2, showing very good fits). These data have been moved to the main text, in the reworked Figure 2 and the new Table 2. We have also improved the presentation of the HDX MS data in Figure 5 and in the text, and added further analysis to the SI. Materials and methods are now entirely in the main text. We have also revised the manuscript throughout for clarity.

      Reviewer #2 (Public review):

      Summary:

      This well-written manuscript addresses an important but recalcitrant problem - the molecular mechanism of protein misfolding in Ig light chain (LC) amyloidosis (AL), a major life-threatening form of systemic human amyloidosis. The authors use expertly recorded and analyzed small-angle X-ray scattering (SAXS) data as a restraint for molecular dynamics simulations (called M&M) and to explore six patient-based LC proteins. The authors report that a highly populated "H-state" determined computationally, wherein the two domains in an LC molecule acquire a straight rather than bent conformation, is what distinguishes AL from non-AL LCs. They then use H-D exchange mass spectrometry to verify this conclusion. If confirmed, this is a novel and interesting finding with potentially important translational implications.

      We thank the reviewer for the supportive comments.

      Strengths:

      Expertly recorded and analyzed SAXS data combined with clever M&M simulations lead to a novel and interesting conclusion. Regardless of whether or not the CL-CL domain interface is destabilized in AL LCs explored in this (Figure 6) and other studies, stabilization of this interface is an excellent idea that may help protect at least a subset of AL LCs from misfolding in amyloid. This idea increases the potential impact of this interesting study.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      The HDX analysis could be strengthened.

      We have extended the analysis and improved the presentation of the HDX data. Figure 5 has been reworked, the text has been improved accordingly, and additional analyses are reported in the SI.

      Reviewer #3 (Public review):

      Summary:

      This study identifies conformational fingerprints of amyloidogenic light chains, that set them apart from the non-amyloidogenic ones.

      We thank the reviewer for the supportive comments.

      Strengths:

      The research employs a comprehensive combination of structural and dynamic analysis techniques, providing evidence that conformational dynamics at the VL-CL interface and structural expansion are distinguished features of amyloidogenic LCs.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      The sample size is limited, which may affect the generalizability of the findings. Additionally, the study could benefit from deeper analysis of specific mutations driving this unique conformation to further strengthen therapeutic relevance.

      We agree; we tried to maximise the sample size, and this was the best we could do. Regarding the analysis of mutations, we tried to discuss some of them in light of previous work, but because our set covers multiple germlines rather than focusing on a single one, our ability to discuss single point mutations systematically is limited. At the same time, single point mutations have been the focus of many recent works, while our approach provides a different point of view.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This study provides an investigation of light chains (LCs) using three distinct approaches, focusing primarily on identifying a conformational fingerprint to distinguish amyloidogenic light chains (AL-LCs) from multiple myeloma light chains (MM-LCs). The authors propose that the presence of a low-populated "H state," characterized by an extended quaternary structure and a perturbed CL-CL interface, is unique to AL-LCs. This finding is validated through hydrogen-deuterium exchange mass spectrometry (HDX-MS). The study makes a valuable contribution to understanding the structural dynamics of light chains, particularly with the identification of the H state in AL-LCs. However, significant concerns regarding the interpretation of the SAXS data, clarity in presentation, and methodological rigor must be addressed. I recommend major revisions and resubmission of the work.

      Major concerns:

      (1) A critical concern is how the authors ensure that the SAXS profiles represent only dimeric species, given the high propensity of LCs to aggregate. If higher-order aggregates or monomers were present, this would significantly impact the SAXS data and SAXS-MD integration. Some measurements are bulk SAXS, while others are SEC-SAXS, making the study questionable. The authors need to clarify how only dimeric species were measured for the SEC-SAXS analysis, and all assessments of the dimeric state should be shown in the SI. Additionally, complementary techniques such as DLS or SEC-MALS should be used to verify the oligomeric state of the samples. Without this validation, the SAXS profiles may not be reliable.

      We added SEC-MALS and SEC-SAXS data to the SI (Figures S20 and S21), as well as the SAXS curves shown as log-log plots (Figure S1), which display a flat trend at low q that excludes aggregation. SAXS is very sensitive to oligomers and aggregates, and our data do not indicate the presence of those species. When we had indications of possible aggregation in a sample, we used SEC-SAXS.

      (2) A major problem with the paper is that the claim of the "H state," which is the novelty of the study and serves as a marker of aggregation, is derived from samples where the error between the SAXS profiles and MD fits is extremely high. This casts doubt on whether the structure is indeed resolved by MD. The main conclusion of the paper is derived from weak consistency between experiment and simulation. In AL55, the error between experiment and simulation is greater than 5; for H7, it is higher than 2.8. The residuals show significant error at mid-q values, suggesting that long-range distance correlations (20-10 Å, CL, VL positioning) are not consistent between simulation and experiment. Furthermore, the FES plots of two independent replicas show deviation in the existence of the H state. One shows a minimum in that region, while the other does not. So, how robust is this conclusion? What is the chi-squared value if each replica is used independently? A separate experimental cross-validation is necessary to claim the existence of the H state.

      We apologise for the misunderstanding underlying this comment. The poor agreement mentioned is not between the SAXS data and the MD simulations, but between the SAXS data and the individual structures. It was this disagreement that led us to perform MD simulations, which are in much better agreement with the data (previously Fig. S1 and Table S2). To avoid this misunderstanding, which would indeed weaken our work, we have now moved both the figure and the table to the main text, as the updated Figure 2 and the new Table 2.

      Regarding the robustness of the sampling, we believe that Table 3 (previously Table 2) clearly shows the statistical convergence of the data; differences in the presentation of the free energy are purely interpolation issues. The chi-squares of each replicate are reported in Table 2 (previously Table S2).

      (3) There is insufficient discussion about SAXS computations from MD trajectories. The accuracy of these calculations is crucial to deriving the existing conclusions, and the study's reliance on the PLUMED plugin, which is known to give inaccurate results for SAXS computations, raises concerns. How the solvent is treated in the SAXS computations needs to be explained. Alternative methods like WAXSiS or Crysol should be explored to check whether the SAXS profiles derived from the MD trajectory are consistent across other SAXS computation methods for the major conformers of the proteins.

      We have now clarified that, while the SAXS calculations used to perform the Metainference MD simulations were done with PLUMED (which, to our knowledge, is as accurate as crysol), the SAXS curves used for analysis were calculated with crysol.

      (4) The HDX and MD results do not seem to correlate well, and there is a disconnect between Figure 2 (SAXS profiles) and Figure 5 (HDX structural interpretation). The authors should quantitatively assess residue-level dynamics by comparing HDX signals with MD-derived HDX signals for each protein. This would provide a cross-validation between the experimental and computational data.

      In our opinion, our SAXS, MD and HDX MS data provide a consistent picture. Our HDX-MS data do not provide per-residue information, making a quantitative comparison out of scope. RMSF data do not necessarily need to correlate with deuterium uptake.

      (5) MD simulations are only used to refine the structure of AlphaFold predictions, but the trajectories could help explain why these structures differ, what stabilizes the dimer, or what leads to the conformational transition of the H state. A lack of analysis regarding the physical mechanism behind these structural changes is a weakness of the study. The authors should dedicate more effort to analyzing their data and provide physical insights into why these changes are observed.

      Our aim was to identify a property that could discriminate between AL and MM LCs. We used MD simulations, not to refine structures, but to explore the conformational dynamics of LCs (starting from either X-ray structures, homology or AlphaFold models), because SAXS data suggested that conformational dynamics could discriminate between AL- and MM-LCs. Simulations allowed us to propose a hypothesis, which we tested by HDX MS. While more insight is always welcome, we believe that we have achieved our goal for now. In the discussion, we present additional analysis of the simulations to connect with previous literature. We agree that more analysis can be done and, also for this reason, all our data are publicly available.

      Minor concerns

      (6) The abstract leans heavily on describing the problem and methods but lacks a clear presentation of key results. Providing a concise summary of the main findings (e.g., the identification of the H state) would better balance the abstract.

      We agree with the reviewer and we rewrote the abstract.

      (7) In the abstract, the term "experimental structure" is used ambiguously. Since SAXS also provides an experimental structure, it is unclear what the authors are referring to. This should be clarified.

      We agree with the reviewer and we rewrote the abstract.

      (8) Abbreviations such as VL (variable domain) and CL (constant domain) are not defined, making it harder for readers unfamiliar with the field to follow. Abbreviations should be defined when first mentioned.

      We agree with the reviewer and we rewrote the abstract.

      (9) The introduction provides a good general context but fails to explicitly define the knowledge gap. Specifically, the structural and dynamic determinants of LC amyloidogenicity are not well established, and this study could be framed as addressing that gap.

      We thank the reviewer and we agree this could be better framed, we improved the introduction accordingly.

      (10) The introduction does not present the novel discovery of the H state early enough. The unique contribution of identifying this state as a marker for AL-LCs should be mentioned upfront to guide the reader through the significance of the study.

      We thank the reviewer and we have now made more explicit what we found.

      (11) The therapeutic implications of this research should be highlighted more clearly in the discussion. Examples of how these findings could be utilized in drug design or therapeutic approaches would enhance the study's impact.

      We thank the reviewer, but while we think that the H-state could be targeted for drug design, since we do not have data yet we do not want to stress this point more than what we are already doing.

      (12) There is an overwhelming use of abbreviations such as H3, H7, H18, M7, and M10 without proper introduction. This makes it difficult for readers to follow the results, and the average reader may become lost in the details. An introductory figure summarizing the sequences under study, along with a schematic of the dimeric structure defining VL and CL domains, would significantly aid comprehension.

      We agree and we tried to better introduce the systems and simplify the language without adding a figure that we think would be redundant.

      (13) In Figure 1, add labels to each SAXS curve to indicate which protein they correspond to. Also, what does online SEC-SAXS mean?

      Done

      (14) The caption of Figure 3 is unclear, particularly with abbreviations like Lb, Ls, G, and H, which are not mentioned in the captions. The authors should define these terms for clarity.

      Done

      (15) The study claims that the dominant structure of the dimer changes between different LCs. However, Figure 5 shows identical structures for all proteins, raising questions about the consistency between the SAXS and HDX data. This inconsistency is a general problem between the MD and HDX sections, where cross-communication and comparisons are not properly addressed.

      We do not claim that the dominant structure of the dimer changes between different LCs; this would also contradict the current literature. We claim a difference in a low-populated state. From this point of view, always using the same structure is consistent and should simplify the representation of the results. We agree that the manuscript may not always be easy to follow, and we thank the reviewer for helping us improve it.

      (16) The authors show I(q) vs q and residuals for each protein. The Kratky plots are not sufficient to compare the SAXS computations with the measured profile.

      Showing Kratky and residuals is a standard and complementary way to present and compare SAXS data to structures. Chi-square values are also reported. Log-log plots have been added to SI in response to previous comments.

      (17) The authors need to explain how they estimate the Rg values (from simulation or SAXS profiles). If they are using simulations, they should compute the Rg values from the simulations for comparison.

      Rg values reported in Table 1 are derived from SAXS. Rg from simulations have been added in Table 2.

      (18) The evolution of the sampling is unclear. The authors need to show the initial starting conformation in each case and the most likely conformation after M&M in the SI, to demonstrate that their approach indeed caused changes in the initial predictions.

      Our approach is not structure refinement, and as such the proposed analysis would be misleading. Metainference is meant to generate a statistical ensemble representing the equilibrium conformations that, as a whole, reproduce the data. Differences (or the lack of them) between initial and selected configurations would not be particularly informative in this context.

      (19) The authors should also provide a running average of chi-squared values over time to demonstrate that the conformational ensemble converged toward the SAXS profile.

      Our simulations are not driven to improve the agreement with SAXS over time; this is not structure refinement. Metainference is meant to generate a statistical ensemble representing the equilibrium conformations that, as a whole, reproduce the data. The suggested analysis would be a misinterpretation of our simulations. The comparison with SAXS is provided in Figure 2 and Table 2, as mentioned above.

      (20) The aggregate simulation time of 120 microseconds is misleading, as each replica was only run for 2-3 microseconds. This should be clarified.

      The number reported in the text is accurate and represents the aggregated sampling. The number of replicas for each metainference simulation and their length are reported in Table 2, now moved from the SI to the main text for clarity.

      (21) It is not clear how the replicas were weighted to compute the SAXS profiles and FES. There are two independent runs in each case, and each run has about 30 replicas. How these replicas are weighted needs to be discussed in the SI.

      Done

      (22) The methods section is unevenly distributed, with detailed explanations of LC production and purification, while other key methodologies like SAXS+MD integration and HDX are not even mentioned in the main text (they are in the Supporting Information). The authors should provide a brief overview of all methodologies in the main text or move everything to the SI for consistency.

      We agree with the reviewer, all methods are now in main text.

      Reviewer #2 (Recommendations for the authors):

      (1) Computational M&M evidence is strong (Figure 3) and is supported by SAXS (used as restraints). However, Kratky plots reported in the main MS Figure 1 show significant differences between the data and the structural model only for one protein, AL-55. It is hard for the general reader to see how these SAXS data support a clear difference between AL and non-AL proteins. If possible, please strengthen the evidence; if not, soften the conclusions.

      We thank the reviewer for the comments. The chi-square (Table 1) and the residuals (Figure 1) are a strong indication of the difference. To strengthen the evidence, and following the comment from reviewer 3, we calculated the p-value (< 10⁻⁵) for the significance of the radius of gyration in discriminating AL and MM LCs. We agree that SAXS alone was not enough, and this is indeed what prompted us to perform MD simulations.

      (2) HDX MS results are cursory and not very convincing as presented. The butterfly plots in Figure 5 are too small to read and are unlabeled so it is unclear which protein is which.

      Figure 5 has been reworked for readability. More data have been added in SI.

      (3) What labeling time was selected to construct these plots and why?

      The deuterium uptake at 30 min of HDX time showed the most pronounced differences between the proteins, so this time point was chosen to illustrate the key structural features in the main figure panel (Figure 5).

      How different are the results at other labeling times? Showing uptake curves (with errors) for more than just two peptides in the supplement Figure S12 might be helpful.

      We found a continuous increase in deuterium uptake as we increased the exchange time from 0.5 to 240 min, reaching saturation at 120 min. The exchange therefore follows the same pattern at all time points. Butterfly plots at HDX times from 0.5 to 240 min are shown in a gradient from light blue to dark blue, which clearly shows the pattern of deuterium uptake at increasing incubation times (Figure 5). The HDX uptake kinetics of selected peptides with corresponding error bars are shown in Figure S12.

      How redundant are the data, i.e. how good is the peptide coverage/resolution in key regions at the domain-domain interface that the authors deem important? Mapping the maximal deuterium uptake on the structures in Figure 5 is not very helpful. Perhaps mapping the whole range of uptake using a gradient color scheme would be more informative.

      Overall coverage and redundancy for all four proteins are > 90% and > 4.0, respectively, and the average error margin in fractional uptake among all peptides is 0.04-0.05 Da, which suggests that our data are reliable (Table S3). We modified the main panel figures to show the gradient of deuterium uptake, in blue-white-red for 0 to 30% deuterium uptake, on chain A of the dimeric LCs.

      (3) Is the conformational heterogeneity depicted in M&M simulations consistent with HDX results? The authors may want to address this by looking at the EX1/EX2 exchange kinetics for AL vs. non-AL proteins. Do AL proteins show more EX1?

      No, we do not see any EX1 exchange kinetics in our analysis. This is compatible with the prediction that the H-state is a native-like state and not an unfolded or partially folded state.

      (4) Perhaps the main conclusion could be softened given the small number of proteins (six), esp. since only four (3 AL and 1 non-AL) could be explored by HDX. Are other HDX MS data of AL LCs from the same Lambda6 family (e.g. PMID: 34678302) consistent with the conclusions that a particular domain-domain interface is weakened in AL vs. non-AL LCs?

      We thank the reviewer for this suggestion. A difference in HDX MS data is indeed visible between AL and MM proteins for peptide 33-47 in the suggested paper (Figures 4, S5 and S8). The difference is reduced by the mutation identified in the paper as driving the aggregation in that specific case. We now mention this in the discussion.

      (5) Please clarify if the H* state is the same for a covalent vs. non-covalent LC dimer.

      We do not know, because our data are only for covalent dimers. Interestingly, however, the state is very similar to what was observed for a model kappa light-chain in Weber, et al.; we have better highlighted this point in the discussion.

      (6) Please try and better explain why a smaller distance between CL domains in H7 protein and a larger distance in other AL proteins both promote protein misfolding.

      We do not have elements to discuss this point in more detail.

      (7) Please comment on the Kratky plots data vs. model agreement (see comments above).

      Done.

      (8) Please find a better way to display, describe, and interpret the HD exchange MS data.

      We have generated new main text (new Figure 5) and SI figures that we think allow the reader to better appreciate our observations. The corresponding results sections have also been improved.

      Minor points:

      (9) Is the population of the H-state with perturbed CL-CL domain interface, which was obtained in M&M simulations, sufficient to be observable by HDX MS?

      While populations alone are not enough to determine what is observable by HDX MS, a 10% population corresponds roughly to 6 kJ/mol of ΔG and is compatible with EX2 kinetics. Previous works suggested that HDX-MS data should be sensitive to subpopulations of the order of 10% (https://doi.org/10.1016/j.bpj.2020.02.005, https://doi.org/10.1021/jacs.2c06148).
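
      As a rough check of the free-energy figure quoted above, the estimate can be sketched under two assumptions made here purely for illustration, since the response states neither the model nor the temperature: a simple two-state equilibrium between the H-state and all other conformations, and T ≈ 298 K (so RT ≈ 2.48 kJ/mol). The symbol p_H for the H-state population is likewise introduced only for this sketch.

      ```latex
      % Two-state estimate, assuming p_H = 0.10 and T = 298 K (illustrative assumptions)
      \[
      \Delta G \;=\; -RT \ln\frac{p_H}{1-p_H}
               \;=\; RT \ln 9
               \;\approx\; 2.48\ \mathrm{kJ\,mol^{-1}} \times 2.20
               \;\approx\; 5.4\ \mathrm{kJ\,mol^{-1}}
      \]
      % Using the population alone rather than the population ratio:
      \[
      -RT \ln p_H \;=\; RT \ln 10 \;\approx\; 2.48\ \mathrm{kJ\,mol^{-1}} \times 2.30 \;\approx\; 5.7\ \mathrm{kJ\,mol^{-1}}
      \]
      ```

      Either convention gives a value between 5 and 6 kJ/mol, of the same order as the rough figure quoted in the response.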

      (10) Typically, an excited intermediate in protein unfolding is a monomer, while here it is an LC dimer. Is this unusual?

      This is a good point. We think that intermediates have mostly been studied in monomeric proteins because these are more commonly used as model systems, but we prefer not to discuss this point in the manuscript.

      (11) Low deuterium uptake is consistent with a rigid structure but may also reflect buried structure and/or structure that moves on a time scale greater than the labeling time.

      We agree.

      Reviewer #3 (Recommendations for the authors):

      (1) The p-value (statistical significance) of Rg difference should be computed.

      We thank the reviewer for the suggestion. We calculated the p-value, which proved highly significant.

      (2) The significance of mutations (SHM?) at the interface, such as A40G, should be compared with previous observations (Garrofalo et al., 2021).

      We thank the reviewer for the suggestion; a sentence has been added to the discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this article, the authors describe mouse models of Becker muscular dystrophy. They created three transgenic models carrying three representative exon deletions: ex45-48 del., ex45-47 del., and ex45-49 del. This article is well written but needs improvement in some points.

      Strengths:

      This article is well written. The evidence supporting the authors' claims is robust, though further implementation is necessary. The experiments conducted align with the current state-of-the-art methodologies.

      Weaknesses:

      This article does not analyze atrophy in the various mouse models. Implementing this point would improve the impact of the work.

      We thank the reviewer for their constructive suggestions and comments on this work. Muscle hypertrophy is shown with growth in dystrophin-deficient skeletal muscle in mdx mice; thus, we did not pay attention to the factors associated with muscle atrophy in BMD mice. As the reviewer suggested, the examination of the association between type IIa fiber reduction and muscle atrophy is important, and the result is considered to be helpful in resolving the cause of type IIa fiber reduction in BMD mice.

      In response, we reviewed the following.

      (1) The cross-sectional areas (CSAs) of muscles. We confirmed that the CSAs in BMD and mdx mice were rather high at 3 months, in accordance with muscle hypertrophy, compared with those of WT mice. The data is presented in Fig. 4–figure supplement 1B.

      (2) The mRNA expression levels of Murf1 and atrogin-1. We confirmed that these muscle atrophy inducing factors did not differ among WT, BMD, and mdx mice. The data is presented in Fig. 4–figure supplements 1C and 1D.

      Reviewer #2 (Public review):

      Summary:

      Miyazaki et al. established three distinct BMD mouse models by deleting different exon regions of the dystrophin gene, observed in human BMD. The authors demonstrated that these models exhibit pathophysiological changes, including variations in body weight, muscle force, muscle degeneration, and levels of fibrosis, alongside underlying molecular alterations such as changes in dystrophin and nNOS levels. Notably, these molecular and pathological changes progress at different rates depending on the specific exon deletions in the dystrophin gene. Additionally, the authors conducted extensive fiber typing, revealing a site-specific decline in type IIa fibers in BMD mice, which they suggest may be due to muscle degeneration and reduced capillary formation around these fibers.

      Strengths:

      The manuscript introduces three novel BMD mouse models with different dystrophin exon deletions, each demonstrating varying rates of disease progression similar to the human BMD phenotype. The authors also conducted extensive fiber typing across different muscles and regions within the muscles, effectively highlighting a site-specific decline in type IIa muscle fibers in BMD mice.

      Weaknesses:

      The authors have inadequate experiments to support their hypothesis that the decay of type IIa muscle fibers is likely due to muscle degeneration and reduced capillary formation. Further investigation into capillary density and histopathological changes across different muscle fibers is needed, which could clarify the mechanisms behind these observations.

      We thank the reviewer for these positive comments and the very important suggestion about type IIa fiber reduction and capillary change around muscle fibers in BMD mice. From the results of the cardiotoxin-induced muscle degeneration and regeneration model, type IIa and IIx fibers showed delayed recovery compared with that of type-IIb fibers. However, this delayed recovery of type IIa and IIx could not explain the cause of the selective muscle fiber reduction limited to type IIa fibers in BMD mice. Therefore, we considered vascular dysfunction as the reason for the selective type IIa fiber reduction, and we found morphological capillary changes from a “ring pattern” to a “dot pattern” around type IIa fibers in BMD mice. However, the association between selective type IIa fiber reduction and the capillary change around muscle fibers in BMD mice remains unclear due to the lack of information about capillaries around type IIx and IIb fibers. The reviewer pointed out this insufficient evaluation of capillaries around other muscle fibers (except for type IIa fibers), and this suggestion is very helpful for explaining the association between selective type IIa fiber reduction and vascular dysfunction in BMD mice.

      In response, we reviewed the following.

      (1) The capillary formation around type IIx, IIb, and I fibers, in addition to that around type IIa fibers. We found that, in WT mice, capillary contact around type IIx, IIb, and I fibers was poorer than that around type IIa fibers, with ‘incomplete ring-patterns’ around type IIx fibers and ‘dot-patterns’ around type IIb and I fibers. Morphological capillary changes around muscle fibers from WT to d45-49 and mdx mice were ‘incomplete ring-pattern’ to ‘dot-pattern’ around type IIx fibers, and ‘dot-pattern’ to ‘dot-pattern’ around type IIb and I fibers. This was in contrast to the change around type IIa fibers: a remarkable ‘ring-pattern’ to ‘dot-pattern’. These data are presented in Fig. 6B.

      (2) The endothelial area in contact with type IIx, IIb, and I fibers, and additionally that in contact with type IIa fibers. The endothelial area in contact with both type IIa and IIx fibers was less in d45-49 and mdx mice than in WT mice, but the reduction was larger around type IIa fibers than around type IIx fibers, reflecting the difference between the ‘ring-pattern’ around the former and the ‘incomplete ring-pattern’ around the latter in WT mice. These data are presented in Fig. 6C.

      (3) Transversely interconnected branches and capillary loops, using longitudinal muscle sections. We confirmed that there were fewer interconnected capillaries in BMD and mdx mice than in WT mice. These data are presented in Fig. 6E.

      (4) The mRNA expression levels of neuronal nitric oxide synthase (nNOS). We confirmed that nNOS protein expression levels were decreased in BMD and mdx mice in spite of adequate levels of nNOS mRNA expression. The data on nNOS mRNA expression levels is presented in Fig. 3–figure supplement 1C.

      (5) We added a sentence in the Abstract about the potential utility of BMD mice in developing vascular targeted therapies.

      Recommendation for the authors:

      Reviewer #1 (Recommendation for the authors):

      Abstract:

      Abstract: more emphasis should be placed on the pathological implications of Becker muscular dystrophy (BMD). Furthermore, the findings made in this article and the conclusions should be emphasized. Abbreviations such as DMD and MDX should be written in full at first use and only then replaced by the acronym.

      We appreciate the reviewers’ comments, and we apologize for the confusion over abbreviations. DMD is the gene name encoding dystrophin, and mdx is the strain name of mouse lacking dystrophin.

      In the Abstract and the Figure legends we changed:

      (1) DMD to DMD (italicized, as a gene name);

      (2) mdx mice to mdx mice (with mdx italicized, as a strain name).

      Results:

      Line 95: in this line, authors evaluated serum creatine kinase (CK) levels at 1, 3, 6 and 12 months in WT mice and mdx mice. Why did you decide to study it? This part should be described in more detail. Serum CK is one of the main markers of muscle necrosis; therefore, I would report this data alongside the description of the muscle histology and necrotic fibers.

      We thank the reviewers for the important remarks. In this study, serum creatine kinase (CK) levels were two-fold to four-fold higher in BMD mice than in WT mice, but its rate of increase was less than that of mdx mice. We consider that the lesser changes in serum CK levels in BMD mice may be due to the smaller area of muscle degeneration because of focal and uneven muscle degeneration compared with that in mdx mice, which showed diffuse muscle degeneration.

      In response, we have moved the description of serum CK levels in the Results, from the section about the establishment of BMD mice to the section about site-specific muscle degeneration in BMD mice.

      In addition, we added a description in the Discussion about the possible association between the lesser changes in serum CK levels in BMD mice and its uneven distribution of muscle degeneration.

      Line 192-202: In these lines, authors observed a decrease in type IIa fibers after 3 months in BMD mice. I suggest also evaluating atrophy by measuring cross-sectional areas (CSA) and the expression of Murf1 and Atrogin1.

      We thank the reviewer for the point about the association between type IIa fiber reduction and muscle atrophy. We evaluated the CSAs and the mRNA expression levels of Murf1 and atrogin-1. We confirmed that the CSAs in BMD and mdx mice were rather high at 3 months, in accordance with muscle hypertrophy, compared with those of WT mice, and that Murf1 and atrogin-1 mRNA expression levels did not differ among WT, BMD, and mdx mice. These data are presented in Fig. 4–figure supplements 1B, 1C, and 1D. We added a sentence about the changes in CSA and muscle atrophy inducing factors in the Discussion.

      Methods and material

      Line 342-348: authors have described animals, but not specified sex and number of mice in each group. This part should be improved.

      We apologize for our insufficient information about the sex and number of mice in the Materials and methods.

      We added a sentence specifying the sex, number, and evaluation period of each mouse group in the section on the generation of BMD mice.

      Line 426-433: authors described qPCR. It is necessary that the authors also describe primer sequences.

      We apologize for any lack of information about the primer sequences used in qPCR analysis. Supplemental Table 1 lists the primer sequences.

      We also added a sentence about the information in the primer list in the section on RNA isolation and RT-PCR in the Materials and methods.

      Reviewer #2 (Recommendation for the authors):

      Miyazaki et al. established three distinct BMD mouse models by removing different exon regions of the dystrophin gene. The authors demonstrated that the pathophysiological and molecular changes in these models progress at varying rates. Additionally, they observed a site-specific decline in type IIa fibers in BMD mice, while the proportions of other fiber types, such as type I and type IIx, remained consistent with those in wild-type mice. They proposed that the selective decay of type IIa fibers in BMD mice could be due to two primary factors: 1) muscle degeneration and regeneration, supported by their findings in cardiotoxin-treated mouse models, and 2) reduced capillary formation around type IIa fibers. However, the authors also presented evidence that type IIx fibers exhibited delayed recovery, similar to type IIa fibers, as demonstrated in cardiotoxin-induced regeneration models. Additionally, dot-patterned capillary formations were observed around both type IIa and type IIx fibers. Despite these findings, BMD mice did not show any changes in the proportion of type IIx fibers in inner BMD muscles. The authors should consider adding further analysis to strengthen their hypothesis and to disclose any possible mechanisms that led to these discrepancies.

      If the authors hypothesize that reduced capillary density around type IIa fibers contribute to their site-specific decay in BMD mice, they should consider measuring and statistically analyzing the endothelial area around all fiber types. By plotting and comparing these measurements across different fiber types between wild-type, BMD, and mdx mice, the authors could provide more robust evidence to support their hypothesis. This approach would help clarify whether reduced capillary density is a contributing factor to the site-specific decay of type IIa fibers in BMD mice and the more diffuse, non-specific muscle changes observed in mdx mice.

      The authors reported in the first part of the manuscript that histopathological changes, including muscle degeneration in BMD mice, are predominantly restricted to the inner part of the muscles. In the second part, they noted a decline in type IIa fibers specifically in the inner muscle region. To strengthen the hypothesis that the decay of type IIa fibers in the inner muscle is linked to muscle degeneration, the authors should consider performing histopathological measurements across different fiber types within the inner muscle. Reporting the correlations between these measurements would provide more compelling evidence to support their hypothesis.

      We thank the reviewer for these important suggestions about the association between type IIa fiber reduction and capillary change around muscle fibers in BMD mice. We prepared an additional evaluation about the capillary formation (in Fig. 6B) and endothelial area (in Fig. 6C) around type IIx, IIb, and I fibers. We found that capillaries contacting around type IIx, IIb, and I fibers were poor in WT mice compared with those around type IIa fibers, and showed an ‘incomplete ring-pattern’ around type IIx fibers and a ‘dot-pattern’ around type IIb and I fibers in WT mice, in contrast with type IIa fibers, which showed remarkable ‘ring-pattern’ capillaries. Reflecting this, the changes in endothelial area around type IIx, IIb, and I fibers between WT and BMD mice were less than those around type IIa fibers. These results suggest that type IIa fibers may require numerous capillaries and maintained blood flow compared with type IIx, IIb, and I fibers, and this high requirement for blood flow might be associated with the type IIa fiber-specific decay in BMD mice.

      We added the following.

      (1) Sentences in the Results about the capillary changes around type IIx, IIb, and I fibers in WT, d45-49, and mdx mice.

      (2) Sentences in the Results about the changes in endothelial area around type IIx, IIb, and I fibers in WT, d45-49, and mdx mice.

      (3) Sentences in the Discussion about the association between the type IIa fiber-specific decay in BMD mice and the differences in capillary changes of each muscle fiber from WT to BMD mice.

      We changed a sentence in the Discussion about the delayed recovery of type IIa and IIx fibers after CTX injection, to make it clear that the recovery of type IIx fibers was slower than that of type IIa fibers after CTX injection, and that therefore the type IIa fiber-specific decay in BMD mice might not be explained by this vulnerability and delayed recovery during muscle degeneration and regeneration.

      Minor Issues:

      Line 103: The word "mice" is duplicated and should be corrected.

      We apologize that “mice” was duplicated. We have corrected it.

      Line 120: Revise for clarity: "The proportion of opaque fibers is significantly different between d45-48 mice and WT at 3 months, with an increased tendency observed only in 1-month-old mice."

      We apologize for the confusion about the proportion of opaque fibers. We revised this sentence as follows.

      “Opaque fibers, which are thought to be precursors of necrotic fibers, increased at an earlier age of 1 month in d45–49 mice compared with WT mice; in contrast, the proportion of opaque fibers differs significantly between d45–47 and WT mice at 3 months, with an increased tendency only in 1-month-old mice (Fig. 2C).”

      Line 152: Clarify the statement regarding utrophin levels, as it currently contradicts the Western blot data. The sentence reads: "The increased levels of utrophin are 8-fold higher at 1 month and 30-fold higher at 3 months." This should be verified against the data, as the band densities in the Western blots suggest otherwise.

      We apologize for the confusion about utrophin expression levels. We revised this sentence as follows.

      “By western blot analysis, the utrophin expression levels showed only an increased tendency in all BMD mice at 3 months, whereas there was a significant increase in mdx mice (8-fold at 1 month, and 30-fold at 3 months) compared to WT mice (Figs. 3C and F).”

      Line 235: Correct the sentence to accurately reflect the findings: "BMD mice showed reduced muscle weakness."

      We apologize for our incorrect wording. We have removed the word “reduced” in this sentence.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript by Dr. Shinkai and colleagues is about the posttranslational modification of a highly important protein, MT3, also known as the growth inhibitory factor. Authors postulate that MT3, or generally all MT isoforms, are sulfane sulfur binding proteins. The presence of sulfane sulfur at each Cys residue has, according to the authors, a critical impact on redox protein properties and almost does not affect zinc binding. They show a model in which 20 Cys residues with sulfane sulfur atoms can still bind seven zinc ions in the same clusters as unmodified protein. They also show that recombinant MT3 (but also MT1 and MT2) protein can react with HPE-IAM, an efficient trapping reagent of persulfides/polysulfides. This reaction performed in a new approach (high temperature and high reagent concentration) resulted in the formation of bis-S-HPE-AM product, which was quantitatively analyzed using LC-MS/MS. This analysis indicated that all Cys residues of MT proteins are modified by sulfane sulfur atoms. The authors performed a series of experiments showing that such protein can bind zinc, which dissociates in the reaction with hydrogen peroxide or SNAP. They also show that oxidized MT3 is reduced by thioredoxin. It gives a story about a new redox-dependent switching mechanism of zinc/persulfide cluster involving the formation of cystine tetrasulfide bridge.

      The whole story is hard to follow due to the lack of many essential explanations or full discussion. What needs to be clarified is the conclusion (or its lack) about MT3 modification proven by mass spectrometry. Figure 1B shows the FT-ICR-MALDI-TOF/MS spectrum of recombinant MT3. It clearly shows the presence of unmodified MT3 protein without zinc ions. Ions dissociate in acidic conditions used for MALDI sample preparation. If the protein contained all Cys residues modified, its molecular weight would be significantly higher. Then, they show the MS spectrum (low quality) of oxidized protein (Fig. 1C), in which new signals (besides reduced apo-MT3) are observed. They conclude that new signals come from protein oxidation and modification with one or two sulfur atoms. If the conclusion on Cys residue oxidation is reasonable, how this protein contains sulfur is unclear. What is the origin of the sulfur if apo-MT does not contain it? Oxidized protein was obtained by acidification of the protein, leading to zinc dissociation and subsequent neutralization and air oxidation. Authors should perform a detailed isotope analysis of the isotopic envelope to prove that sulfur is bound to the protein. They say that the +32 mass increase is not due to the appearance of two oxygen donors. They do not provide evidence. This protein is not a sulfane sulfur binding protein, or its minority is modified. Moreover, it is unacceptable to write that during MT3 oxidation are "released nine molecules of H2". How is hydrogen molecule produced? Moreover, zinc is not "released", it dissociates from protein in a chemical process.

      Thank you for your comment. According to your suggestion, we have rewritten the corresponding sentences below, together with addition of new Fig.1D.

      First, the sentence “which corresponded to the mass of zinc-free apo-GIF/MT3 and indicated that zinc was removed during MS analysis.” was changed to “which corresponded to the mass of zinc-free apo-GIF/MT3 and indicated that zinc dissociates from protein in acidic conditions used for MALDI sample preparation.” in the introduction section. Second, we have added the following sentence “However, FT-ICR-MALDI-TOF/MS analysis failed to detect sulfur modifications in GIF/MT-3 (Fig. 1B), suggesting that sulfur modifications in the protein were dissociated during laser desorption/ionization. Therefore, we postulate that the small amount of sulfur detected in oxidized apo-GIF/MT-3 is derived from the effect of laser desorption/ionization rather than any actual modification of the minority component.” in the discussion section. Third, we have added new Fig. 1D and the corresponding citation in the introduction. Fourth, the sentence “An increase in mass of 32 Da can also result from addition of two oxygen atoms, but we attributed it to one sulfur atom for reasons described later.” was changed to “Note that an increase in mass of 32 Da can also result from addition of two oxygen atoms.”.
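      The ambiguity acknowledged in the revised sentence can be made concrete with a short calculation. The sketch below is an illustration only and is not part of the authors' analysis: it compares the mass added by one sulfur atom with the mass added by two oxygen atoms, using standard monoisotopic masses for 32S and 16O. The function names and the example ion mass of 7000 Da are hypothetical choices made for this example.

```python
# Illustrative sketch: why a nominal +32 Da shift is ambiguous between
# one added sulfur atom and two added oxygen atoms.
# Standard monoisotopic masses (Da) of the most abundant isotopes.
MASS_S32 = 31.97207117
MASS_O16 = 15.99491462


def mass_shift_one_sulfur() -> float:
    """Monoisotopic mass added by a single 32S atom."""
    return MASS_S32


def mass_shift_two_oxygens() -> float:
    """Monoisotopic mass added by two 16O atoms."""
    return 2 * MASS_O16


def resolving_power_needed(ion_mass: float) -> float:
    """Approximate m / delta_m required to separate the two adducts.

    ion_mass is a hypothetical ion mass in Da, supplied by the caller.
    """
    delta = abs(mass_shift_two_oxygens() - mass_shift_one_sulfur())
    return ion_mass / delta


if __name__ == "__main__":
    delta = mass_shift_two_oxygens() - mass_shift_one_sulfur()
    print(f"+S  : {mass_shift_one_sulfur():.5f} Da")
    print(f"+2 O: {mass_shift_two_oxygens():.5f} Da")
    print(f"difference: {delta:.5f} Da")
    # 7000 Da is an assumed, round example mass, not a measured value.
    print(f"resolving power at 7000 Da: {resolving_power_needed(7000.0):.0f}")
```

      Because the two shifts differ by less than 0.02 Da, distinguishing them by exact mass alone requires high resolving power, which is consistent with the reviewer's request for an analysis of the isotopic envelope.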

      Another important point is a new approach to the HPE-IAM application. Zinc-binding MT3 was incubated with 5 mM reagent at 60°C for 36 h. Authors claim that high concentration was required because apoMT3 has stable conformation. Figure 2B shows that product concentration increases with higher temperature, but it is unclear why such a high temperature was used. Figure 1D shows that at 37°C, there is almost no reaction at 5 mM reagent. Changing parameters sounds reasonable only when the reaction is monitored by mass spectrometry. In conclusion, about 20 sulfane sulfur atoms present in MT3 would be clearly visible. Such evidence was not provided. Increased temperature and reagent concentration could cause modification of cysteinyl thiols/thiolates as well, not only persulfides/polysulfides. Therefore, it is highly possible that non-modified MT3 protein could react with HPE-IAM, giving false results. Besides mass spectrometry, which would clearly prove modification of 20 Cys, authors should use a very important control: a chemically synthesized beta- or alpha-domain of MT3 reconstituted with zinc (many protocols are present in the literature). Such models are commonly used to test any kind of chemistry of MTs. If a non-modified, chemically obtained domain underwent a reaction with HPE-IAM under such rigorous conditions, then my expectation would be right.

      Thank you for your comments. Although we have already confirmed that no false-positive results were observed using this method in Fig. 5 (previously Fig. 4), we have conducted additional experiments by preparing chemically synthesized α- and β-domains of GIF/MT-3, as well as recombinant α- and β-domains of GIF/MT-3. As shown in the new Fig. S2A, the chemically synthesized α- and β-domains of GIF/MT-3 detected almost no sulfane sulfur (less than 1 molecule per protein), whereas the recombinant α- and β-domains detected several molecules of sulfane sulfur (more than 5 molecules per protein) (Fig. S2A). Therefore, I would like to emphasize here that the cysteine residue itself cannot be the source of the bis-S-HPE-AM product (sulfane sulfur derivative).

      Accordingly, we have added the following sentence in the results section: “Because this assay was performed at relatively high temperatures (60°C), we also examined the sulfane sulfur levels of several mutant proteins using chemically synthesized α- and β-domains of GIF/MT-3 to eliminate false-positive results. As shown in Fig. S2A, sulfane sulfur (less than 1 molecule per protein) was undetectable in chemically synthesized α- and β-domains of GIF/MT-3, whereas several molecules of sulfane sulfur per protein were detected in recombinant α- and β-domains (Fig. S2B, left panel). These findings indicated that the sulfane sulfur detected in our assay was derived from biological processes executed during the production of GIF/MT-3 protein. We further analyzed mutant proteins with β-Cys-to-Ala and α-Cys-to-Ala substitutions and found that their sulfane sulfur levels were comparable with those of the α- and β-domains of GIF/MT-3, respectively (Fig. S2B, left panel). Additionally, Ser-to-Ala mutation did not affect the sulfane sulfur levels of GIF/MT-3. The zinc content of each mutant protein was also determined under these conditions (Fig. S2B, right panel).”

      - The remaining experiments provided in the manuscript can also be applied for non-modified protein (without sulfane sulfur modification) and do not provide worthwhile evidence. For instance, hydrogen peroxide or SNAP may interact with non-modified MTs. Zinc ions dissociate due to cysteine residue modification, and TCEP may reduce oxidized residue to rescue zinc binding. Again, mass spectrometry would provide nice evidence.

      Thank you for your comment. We understand that such experiments can also be applied to non-modified proteins (without sulfane sulfur modification). However, the experiments shown in Fig. 4 and Fig. 6 were conducted to investigate the role of sulfane sulfur under oxidative stress conditions, rather than to examine sulfur modification in the protein itself. As mentioned previously, it is difficult to detect sulfur modifications directly in the protein using MALDI-TOF/MS (Fig. 1), as sulfur modifications appear to dissociate during the laser desorption/ionization process.

      - The same applies to thioredoxin (Fig. 7) and its reaction with oxidized MT3. Non-modified, oxidized MT3 would react as well.

      Thank you for your comment. We understand that such experiments can also be applied to non-modified MT-3 protein. However, to the best of our knowledge, this is the first report demonstrating that apo-MT-3 can serve as a good substrate for the Trx system. In fact, this experiment is not intended to prove that MT-3 is sulfane sulfur-binding protein. Rather, it demonstrates the novel finding that apo-MT3 serves as an excellent substrate for Trx and that the sulfane sulfur (persulfide structure) remains intact throughout the reduction process.

      - If HPE-IAM reacts with Cys residues with unmodified MT3, which is more likely the case under used conditions, the protein product of such reaction will not bind zinc. It could be an explanation of the cyanolysis experiment (Fig. 6).

      Thank you for your comment. As you pointed out, HPE-IAM reacts with cysteine residues in unmodified MT-3, thereby preventing zinc from binding to the protein. However, we did not use HPE-IAM prior to measuring zinc binding. Instead, HPE-IAM was used solely for determining the sulfane sulfur content in the protein, and thus it cannot explain the results of the cyanolysis experiment.

      - Figure 4 shows the reactivity of (poly)sulfides with TCEP and HPE-IAM. What are the redox potentials? Do they correlate with the obtained results?

      Thank you for your comment. However, we must apologize, as we do not fully understand the rationale for determining redox potentials in this experiment. We believe that the data themselves are clear and convincing.

      - Raman spectroscopy experiments would illustrate the presence of sulfane sulfur in MT3 only if all Cys were modified.

      Yes, that is correct. Since approximately 20 sulfane sulfur atoms are detected in the protein with 20 cysteine residues, we believe that nearly all cysteine residues are modified by sulfane sulfur. Therefore, Raman spectroscopy is considered applicable to our current study.

      - The modeling presented in this study is very interesting and confirms the flexibility of metallothioneins. MT domains are known to bind various metal ions of different diameters. In this way, they adapt to ions of larger size. The same mechanism could be present from the protein side. The presence of 9 or 11 sulfur atoms in the beta or alpha domain would increase the size of the domains without changing the cluster structure.

      We truly appreciate your positive evaluation of this work.

      - Comment to authors. Apo-MT is not present in the cell. It exists as a partially metallated species. The term "apo-MT" was introduced to explain that MTs are not fully saturated by metals and function as a metal buffer system. Apo-MT comes from old ages when MT was considered to be present only in two forms: apo-form and fully saturated forms.

      Thank you for your insightful comments. We find it reasonable to understand that apo-MT exists as a partially metallated species within the cell.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors reveal that GIF/MT-3 regulates zinc homeostasis depending on the cellular redox status. The manuscript is technically sound, and their data concretely suggest that the recombinant MTs, not only GIF/MT-3 but also canonical MTs such as MT-1 and MT-2, contain sulfane sulfur atoms for Zn binding. The scenario proposed by the authors seems reasonable for explaining how Zn homeostasis is governed by the cellular redox balance.

      Strengths:

      The data presented in the manuscript solidly reveal that recombinant GIF/MT-3 contains sulfane sulfur.

      Weaknesses:

      It is still unclear whether native MTs, in particular, induced MTs in vivo contain sulfane sulfur or not.

      Thank you for pointing out the strengths and weaknesses of this manuscript. Based on your suggestions, we have determined the sulfane sulfur content in the native GIF/MT-3 protein, as explained in our response to "Recommendations for the Authors #2."

      Reviewer #3 (Public Review):

      Summary:

      The authors were trying to show that a novel neuronal metallothionein of poorly defined function, GIF/MT3, is actually heavily persulfidated in both the Zn-bound and apo (metal-free) forms of the molecule as purified from a heterologous or native host. Evidence in support of this conclusion is compelling, with both spectroscopic and mass spectrometry evidence strongly consistent with this general conclusion. The authors would appear to have achieved their aims.

      Strengths:

      The analytical data in support of the authors' primary conclusions are compelling. The authors also provide some modeling evidence that strongly supports the contention that MT3 (and other MTs) can readily accommodate sulfane sulfur on each of the 20 cysteines in the Zn-bound structure, with little perturbation of the structure. This is not the case with Cys trisulfides, which suggests that the persulfide-metallated state is clearly positioned at lower energy relative to the immediately adjacent thiolate- or trisulfidated metal coordination complexes.

      Weaknesses:

      The biological significance of the findings is not entirely clear. On the one hand, the analytical data are clearly solid (albeit using a protein derived from a bacterial over-expression experiment), and yes, it's true that sulfane S can protect Cys from overoxidation, but everything shown in the summary figure (Fig. 8D) can be done with Zn release from a thiol by ROS, and subsequent reduction by the Trx/TR system. In addition, it's long been known that Zn itself can protect Cys from oxidation. I view this as a minor weakness that will motivate follow-up studies. Fig. 1 was incomplete in its discussion and only suggests that a few S atoms may be covalently bound to MT3 as isolated. This is in contrast to the sulfane S "release" experiment, which I find quite compelling.

      Impact:

      The impact will be high since the finding is potentially disruptive to the metals-in-biology field in general and the MT field for sure. The sulfane sulfur counting experiment (the HPE-IAM electrophile trapping experiment) may well be widely adopted by the field. Those of us in the metals field always knew that this was a possibility, and it will be interesting to see the extent to which metal-binding thiolates broadly incorporate sulfane sulfur into their first coordination shells.

      Thank you for pointing out the strengths and weaknesses of this manuscript. As you noted, the explanations and discussions regarding Fig. 1 were missing. To address this, we have added the following sentences to the discussion section: “However, FT-ICR-MALDI-TOF/MS analysis failed to detect sulfur modifications in GIF/MT-3 (Fig. 1B), suggesting that sulfur modifications in the protein were dissociated during laser desorption/ionization. Therefore, we postulate that the small amount of sulfur detected in oxidized apo-GIF/MT-3 is derived from the effect of laser desorption/ionization rather than any actual modification of the minority component.”

      Reviewer #1 (Recommendations For The Authors):

      Overall, the topic of the study is interesting, but the provided evidence is insufficient to claim that MT3 is a sulfane sulfur-binding protein. Indeed, some recent studies showed that natural and recombinant MT proteins can be modified, but only one or a few cysteine residues were modified. Authors should follow my suggestion and apply mass spectrometry to all performed reactions and, first of all, to freshly obtained protein. I strongly suggest using chemically synthesized and reconstituted domains to test whether the home-developed approach is appropriate. Moreover, native MS and ICP-MS analysis of MT3 would support their claims.

      Thank you for your insightful comments. Following your suggestions, we have prepared chemically synthesized proteins of the α- and β-domains of GIF/MT-3 and conducted additional experiments, as explained in response comments to “Public Review #1”. Regarding the MS analysis, we have also added a discussion on the difficulty of detecting sulfur modifications in the protein.

      Reviewer #2 (Recommendations For The Authors):

      I have some minor points which should be considered by the authors.

      (1) Table 1: In the simulation by MOE, the authors speculated 7 atoms of metal bound to GIF/MT-3. Although a total of 7 atoms of Zn or Cd are actually bound to MTs as a divalent ion, the number of Cu and Hg bound to MTs as a monovalent ion is scientifically controversial. Several ideas have been proposed in the literature, however, "7 atoms of Cu or Hg" could be inappropriate as far as I know. The authors should simulate again using a more appropriate number of Cu or Hg in MTs.

      Thank you for providing this valuable information. We reviewed several papers by the Stillman group and found that the relative binding constants of Cu4-MT, Cu6-MT, and Cu10-MT were determined after the addition of Cu(I) to apo MT-1A, MT-2, and MT-3 (Melenbacher and Stillman, Metallomics, 2024). However, incorporating these copper numbers into our GIF/MT-3 simulation model proved challenging. Therefore, we decided to omit the score value for copper in Table 1.

      On the other hand, some researchers have reported that mercury binds to MT as a divalent ion, and the formation of Hg<sub>7</sub>MT is possible (not just other forms). Therefore, we decided to continue using the score value for mercury shown in Table 1.

      (2) If possible, native MT samples isolated from an experimental animal should be evaluated for the sulfane sulfur content. Canonical MTs, MT-1 and MT-2, are highly inducible by not only heavy metals but also oxidative stress. Under the oxidative stress condition such as the exposure of hydrogen peroxide, it is questionable whether the induced Zn-MTs contain sulfane sulfur or not.

      According to your suggestion, we evaluated the sulfane sulfur content in native GIF/MT-3 samples isolated from mouse brain cytosol (Fig. 10). The measured amount was 3.3 per protein. This suggests that sulfane sulfur in GIF/MT-3 could be consumed under oxidative conditions, as you anticipated. Another possible explanation for the discrepancy between the native form and the recombinant protein relates to metal binding in the protein. It is generally understood that both zinc and copper bind to GIF/MT-3 in approximately equal proportions in vivo. When we prepared recombinant copper-binding GIF/MT-3 protein, the sulfane sulfur content in the protein was significantly different (approximately 4.0 per protein) compared to the Zn<sub>7</sub>GIF/MT-3 form. Further studies are needed to clarify the relationship between sulfane sulfur binding and the types of metals bound.

      (3) The biological significance of sulfane sulfur in MTs is still unclear to me.

      Thank you for your comments. To address this question, we have added the following sentence to the discussion section: “The biological significance of sulfane sulfur in MTs lies in its ability to 1) contribute to metal binding affinity, 2) provide a sensing mechanism against oxidative stress, and 3) aid in the regeneration of the protein.”

      (4) According to the widely accepted nomenclature of MT, "MT3" should be amended to "MT-3".

      According to your suggestion, we have amended MT3 to MT-3 throughout the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Most of my comments are editorial in nature, largely focused on what I perceive as overinterpretation or unnecessary speculation.

      The authors state in the abstract that the intersection of sulfane sulfur and Zn enzymes "has been overlooked." This is not actually true - please tone down to "under investigated" or something like this.

      Based on your suggestion, we have replaced the term “has been overlooked” with “has been under investigated” in the abstract.

      Line 228: The discussion of Fig. 6C involved too much speculation. I cannot see a quantitative experiment that supports this.

      Based on your suggestion, we have removed Fig. 6C (currently referred to as Fig. 7C). Additionally, we have revised the sentence from “implying that the sulfane sulfur is an essential zinc ligand in apo-GIF/MT3 and that an asymmetric SSH or SH ligand is insufficient for native zinc binding (Fig. 6C)” to “implying the contribution of sulfane sulfur to zinc binding in GIF/MT-3”.

      Line 247 "persulfide in apo-GIF/MT3 seems.." I think the authors mean that the Zn form of the protein is resistant to Trx or TCEP.

      Thank you for pointing this out. We realized that the term “persulfide in apo-GIF/MT3” might be confusing. Therefore, we have replaced it with “persulfide formation derived from apo-GIF/MT3” in the corresponding sentence.

      Molecular modeling: We need more details- were these structures energy-minimized in any way? Can the authors comment on the plethora of S-S dihedral angles in these structures, and whether they are consistent with expectations of covalent geometry? Please add text to explain or even a table that compiles these data.

      Thank you for your comment. Yes, energy minimization calculations for structural optimization were conducted during homology modeling in MOE. In fact, we have already stated in the Methods section that “Refinement of the model with the lowest generalized Born/volume integral (GBVI) score was achieved through energy minimization of outlier residues in Ramachandran plots generated within MOE.” In this model, covalent geometry, including the S-S dihedral angles, is also taken into consideration.

      What is a thermostability score? Perhaps a bit more discussion here and what relationship this has to an apparent (or macroscopic) metal affinity constant.

      The thermostability score is used to compare the thermal stability of the wild-type and mutant proteins. As shown in Equation (1) in the Methods section, it is calculated by subtracting the energy of the hypothetical unfolded state from the energy of the folded state. Since obtaining the structure of the unfolded state requires extensive computational effort, MOE employs an empirical formula based on two-dimensional structural features to estimate it. The $\Delta\Delta G$ values represent the difference between $\Delta G_f(\mathrm{WT})$ and $\Delta G_f(\mathrm{Mut})$. However, because it is difficult to determine $\Delta G_f(\mathrm{Mut})$ and $\Delta G_f(\mathrm{WT})$ directly, MOE calculates $\Delta\Delta G$ using the thermodynamic cycle equivalence $\Delta\Delta G_s = \Delta G_{sf}(\mathrm{WT}\to\mathrm{Mut}) - \Delta G_{su}(\mathrm{WT}\to\mathrm{Mut})$, as expressed in Equation (1).
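      The thermodynamic-cycle identity quoted above can be written out step by step. What follows is a sketch, not MOE's documented derivation. It assumes that the subscripts "sf" and "su" denote the free-energy change of the WT→Mut substitution in the folded and unfolded states, respectively, and that the stability change is defined as mutant minus wild type. Neither assumption is stated explicitly in the response.

```latex
% Assumed reading: G_f = folding free energy (folded minus unfolded),
% G_sf / G_su = substitution free energy in the folded / unfolded state.
% Both routes from unfolded WT to folded Mut must give the same total:
\Delta G_f(\mathrm{WT}) + \Delta G_{sf}(\mathrm{WT}\to\mathrm{Mut})
  = \Delta G_{su}(\mathrm{WT}\to\mathrm{Mut}) + \Delta G_f(\mathrm{Mut})

% Rearranging gives the identity used for the thermostability score
% (sign convention Mut minus WT is an assumption):
\Delta\Delta G_s \equiv \Delta G_f(\mathrm{Mut}) - \Delta G_f(\mathrm{WT})
  = \Delta G_{sf}(\mathrm{WT}\to\mathrm{Mut}) - \Delta G_{su}(\mathrm{WT}\to\mathrm{Mut})
```

      Under this assumed convention, a negative value would indicate that the mutant fold is more stable than the wild type.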

      On the other hand, the affinity score represents the interaction energy between the target ligand and the protein. In this study, we calculated the affinity score by selecting metal atoms as the ligands. The interaction energy ($E_{\text{int}}$) is defined as:

      $E_{\text{int}} = E_{\text{complex}} - E_{\text{receptor}} - E_{\text{ligand}}$

      where each term is as follows:

      $E_{\text{complex}}$: Potential energy of the complex.

      $E_{\text{receptor}}$: Potential energy of the receptor alone.

      $E_{\text{ligand}}$: Potential energy of the ligand alone.

      Each potential energy term includes contributions from bonded interactions such as bond lengths and bond angles. However, since the structures of the receptor and the ligand are the same in the complex as in isolation, the bonded energy components cancel out. Consequently, $E_{\text{int}}$ is determined as:

      $E_{\text{int}} = \Delta E_{\text{ele}} + \Delta E_{\text{vdW}} + \Delta E_{\text{sol}}$

      Here, a negative $E_{\text{int}}$ indicates that the complex is more stable, while a positive $E_{\text{int}}$ implies that the receptor and ligand are more stable in their dissociated states.
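      The definition above can be illustrated with a minimal sketch. It is not MOE's implementation: the class and function names are hypothetical, the energy values are made up for illustration (they are not results from this study), and the units are assumed to be kcal/mol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Energies:
    """Potential energies of the three states (assumed units: kcal/mol)."""
    complex_: float   # metal-protein complex
    receptor: float   # protein alone
    ligand: float     # metal ion alone


def interaction_energy(e: Energies) -> float:
    """E_int = E_complex - E_receptor - E_ligand."""
    return e.complex_ - e.receptor - e.ligand


def favors_complex(e_int: float) -> bool:
    """A negative E_int means the bound state is the more stable one."""
    return e_int < 0.0


# Illustrative numbers only, chosen so the arithmetic is easy to follow.
example = Energies(complex_=-150.0, receptor=-100.0, ligand=-20.0)
e_int = interaction_energy(example)
print(e_int, favors_complex(e_int))
```

      Keeping the sign test separate from the subtraction makes the convention explicit: only the sign of the result decides whether the complex or the dissociated state is considered more stable.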

      We have revised the sentence “The affinity score was also calculated using MOE software as the difference between the ΔΔGs values of the protein, free zinc, and metal–protein complex” to “The affinity score was also calculated using MOE software as the difference between the potential energy values of the protein, free zinc, and metal–protein complex” to correct the misdescription.

      Lines 278-280: The authors state that they observe a "marked enhancement of metal binding affinity, and rearrangement of zinc ions." I don't see support for this rather provocative conclusion. This is the expectation of course. I would love to see actual experimental data on this point, direct binding titrations with metals performed before and after the release of the sulfane sulfur atoms.

      Thank you for your comments. Although this statement is based on the 3D modeling simulation, we have also experimentally observed that the diminishment of sulfane sulfur in GIF/MT-3 resulted in a decrease in zinc binding levels, as shown in Fig. 7. However, conducting direct binding titration experiments was difficult for us due to the difficulty in preparing pure GIF/MT-3 protein with or without sulfane sulfur. Therefore, we have revised the sentence "marked enhancement of metal binding affinity, and rearrangement of zinc ions" to simply "enhancement of metal binding affinity" to avoid over-speculation.

      Table I- quantitatively lower stability for the Cu complex- the stoichiometry is clearly wrong in this simulation- please redo this simulation with the right stoichiometry of Cu to MT3- consult a Stillman paper.

      Thank you for providing this valuable information. We reviewed several papers by the Stillman group and found that the relative binding constants of Cu4-MT, Cu6-MT, and Cu10-MT were determined after the addition of Cu(I) to apo MT-1A, MT-2, and MT-3 (Melenbacher and Stillman, Metallomics, 2024). However, incorporating these copper numbers into our GIF/MT-3 simulation model proved challenging. Therefore, we decided to omit the score value for copper in Table 1.

      I like the model for reversible metal release mediated by the thioredoxin system (Fig. 8D)- but you can also do this with thiols- nothing really novel here. Has it been generally established that tetrasulfides are better substrates for the Trx/TR system? The data shown in Fig. 7B seem to suggest this, but is this broadly true, from the literature?

      There are reports that persulfides and polysulfides are reduced by the thioredoxin system. However, it is not well established that tetrasulfides are better substrates for the Trx/TR system. To the best of our knowledge, this is the first report demonstrating that apo-MT-3 can serve as a good substrate for the Trx/TR system. Further research is required to compare the catalytic efficiency between proteins containing disulfide moieties and those containing tetrasulfide moieties.

      Line 380: Many groups have reported that many proteins are per- or polysulfidated in a whole host of cells using mass spectrometry workflows, and that terminal persulfides can be readily reduced by general or specific Trx/TR systems. This work could be better acknowledged in the context of the authors' demonstration of the reduction of the tetrasulfides, which itself would appear to be novel (and exciting!).

      We truly appreciate your positive evaluation of this work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript uses the eye lens as a model to investigate basic mechanisms in the Fgf signaling pathway. Understanding Fgf signaling is of broad importance to biologists as it is involved in the regulation of various developmental processes in different tissues/organs and is often misregulated in disease states. The Fgf pathway has been studied in embryonic lens development, namely with regards to its involvement in controlling events such as tissue invagination, vesicle formation, epithelium proliferation, and cellular differentiation, thus making the lens a good system to uncover the mechanistic basis of how the modulation of this pathway drives specific outcomes. Previous work has suggested that proteins, other than the ones currently known (e.g., the adaptor protein Frs2), are likely involved in Fgfr signaling. The present study focuses on the role of Shp2 and Shc1 proteins in the recruitment of Grb2 in the events downstream of Fgfr activation.

      Strengths:

      The findings reveal that the juxtamembrane region of the Fgf receptor is necessary for proper control of downstream events such as facilitating key changes in transcription and cytoskeleton during tissue morphogenesis. The authors conditionally deleted all four Fgfrs in the mouse lens that resulted in molecular and morphological lens defects, most importantly, preventing the upregulation of the lens induction markers Sox2 and Foxe3 and the apical localization of F-actin, thus demonstrating the importance of Fgfrs in early lens development, i.e. during lens induction. They also examined the impact of deleting Fgfr1 and 2, on the following stage, i.e. lens vesicle development, which could be rescued by expressing constitutively active KrasG12D. By using specific mutations (e.g. Fgfr1ΔFrs lacking the Frs2 binding domain and Fgfr2LR harboring mutations that prevent binding of Frs2), it is demonstrated that the Frs2 binding site on Fgfr is necessary for specific events such as morphogenesis of lens vesicle. Further, by studying Shp2 mutations and deletions, the authors present a case for Shp2 protein to function in a context-specific manner in the role of an adaptor protein and a phosphatase enzyme. Finally, the key surprising finding from this study is that downstream of Fgfr signaling, Shc1 is an important alternative pathway - in addition to Shp2 - involved in the recruitment of Grb2 and in the subsequent activation of Ras. The methodologies, namely, mouse genetics and state-of-the-art cell/molecular/biochemical assays are appropriately used to collect the data, which are soundly interpreted to reach these important conclusions. Overall, these findings reveal the flexibility of the Fgf signaling pathway and its downstream mediators in regulating cellular events. This work is expected to be of broad interest to molecular and developmental biologists.

      Weaknesses:

      A weakness that needs to be discussed is that Le-Cre depends on Pax6 activation, and hence its use in specific gene deletion will not allow evaluation of the requirement of Fgfrs in the expression of Pax6 itself. But since this is the earliest Cre available for deletion in the lens, mentioning this in the discussion would make the readers aware of this issue. Referring to Jag1 among "lens-specific markers" (page 5) is debatable, suggesting changing to the lines of "the expected upregulation of Jag1 in lens vesicle". The Abstract could be modified to clearly convey the existing knowledge gap and the key findings of the present study. As it stands now, it is a bit all over the place. Some typos in the manuscript need to be fixed, e.g. "...yet its molecular mechanism remains largely resolved" - unresolved? "...in the development lens" - in the developing lens? In Figure 4 legend, "(B) Grb2 mutants Grb2 mutants displayed...", etc.

      We thank the reviewer for the thoughtful and constructive feedback. We have added the caveat regarding the Le-Cre dependency on Pax6 expression to the discussion, removed the reference to Jag1 as a “lens-specific marker” and corrected the typographical errors noted by the reviewer.

      Reviewer #2 (Public review):

      Summary:

      I have reviewed a manuscript submitted by Wang et al., which is entitled "Shc1 cooperates with Frs2 and Shp2 to recruit Grb2 in FGF-induced lens development". In this paper, the authors first examined lens phenotypes in mice with Le-Cre-mediated knockdown (KD) of all four FGFR (FGFR1-4), and found that pERK signals, Jag1, and foxe3 expression are absent or drastically reduced, indicating that FGF signaling is essential for lens induction. Next, the authors examined lens phenotypes of FGFR1/2-KD mice and found that lens fiber differentiation is compromised and that proliferative activity and cell survival are also compromised in lens epithelium. Interestingly, Kras activation rescues defects in lens growth and lens fiber differentiation in FGFR1/2-KD mice, indicating that Ras activation is a key step for lens development. Next, the authors examined the role of Frs2, Shp2, and Grb2 in FGF signaling for lens development. They confirmed that lens fiber differentiation is compromised in FGFR1/3-KD mice combined with Frs2-dysfunctional FGFR2 mutants, which is similar to lens phenotypes of Grb2-KD mice. However, lens defects are milder in mice with Shp2YF/YF and Shp2CS mutant alleles, indicating that the involvement of Shp2 is limited for the Grb2 recruitment for lens fiber differentiation. Lastly, the authors showed new evidence on the possibility that another adapter protein, Shc1, promotes Grb2 recruitment independent of Frs2/Shp2-mediated Grb2 recruitment.

      Strengths:

      Overall, the manuscript provides valuable data on how FGFR activation leads to Ras activation through the adapter platform of Frs2/Shp2/Grb2, which advances our understanding of complex modification of the FGF signaling pathway. The authors applied a genetic approach using mice, whose methods and results are valid to support the conclusion. The discussion also well summarizes the significance of their findings.

      Weaknesses:

      The authors eventually found that the new adaptor protein Shc1 is involved in Grb2 recruitment in response to FGF receptor activation. However, the main data for Shc1 are histological sections and statistical evaluation of lens size. So, my major concern is that the authors need to provide more detailed data to support the involvement of Shc1 in Grb2 recruitment of FGF signaling for lens development.

      We thank the reviewer for the positive comments and valuable suggestions. We have addressed the concerns in detail in the response to the recommendation outlined below.

      Reviewer #3 (Public review):

      Summary:

      The manuscript entitled "Shc1 cooperates with Frs2 and Shp2 to recruit Grb2 in FGF-induced lens development" by Wang et al., investigates the molecular mechanism used by FGFR signaling to support lens development. The lens has long been known to depend on FGFR signaling for proper development. Previous investigations have demonstrated that FGFR signaling is required for embryonic lens cell survival and for lens fiber cell differentiation. The requirement of FGFR signaling for lens induction has remained more controversial as deletion of both Fgfr1 and Fgfr2 during lens placode formation does not prevent the induction of definitive lens markers such as FOXE3 or αA-crystallin. Here the authors have used the Le-Cre driver to delete all four FGFR genes from the developing lens placode demonstrating a definitive failure of lens induction in the absence of FGFR signaling. The authors focused on FGFR1 and FGFR2, the two primary FGFRs present during early lens development, and demonstrated that lens development could be significantly rescued in lenses lacking both FGFR1 and FGFR2 by expressing a constitutively active allele of KRAS. They also showed that the removal of pro-apoptotic genes Bax and Bak could also lead to a substantial rescue of lens development in lenses lacking both FGFR1 and FGFR2. In both cases, the lens rescue included both increased lens size and the expression of genes characteristic of lens cells.

      Significantly the authors concentrated on the juxtamembrane domain, a portion of the FGFRs associated with FRS2. Previous investigations have demonstrated the importance of FRS2 activation for mediating a sustained level of ERK activation. FRS2 is known to associate both with GRB2 and SHP2 to activate RAS. The authors utilized a mutant allele of Fgfr1, lacking the entire juxtamembrane domain (Fgfr1ΔFrs), and an allele of Fgfr2 containing two point mutations essential for Frs2 binding (Fgfr2LR). When combining three floxed alleles and leaving only one functional allele (Fgfr1ΔFrs or Fgfr2LR) the authors got strikingly different phenotypes. When only the Fgfr1ΔFrs allele was retained, the lens phenotype matched that of deleting both Fgfr1 and Fgfr2. However, when only the Fgfr2LR allele was retained the phenotype was significantly milder, primarily affecting lens fiber cell differentiation, suggesting that something other than FRS2 might be interacting with the juxtamembrane domain to support FGFR signaling in the lens. The authors also deleted Grb2 in the lens and showed that the phenotype was similar to that of the lenses only retaining the Fgfr2LR allele, resulting in a failure of lens fiber cell differentiation and decreased lens cell survival. However, mutating the major tyrosine phosphorylation site of GRB2 did not affect lens development. The authors additionally investigated the role of SHP2 in lens development by either deleting SHP2 or making mutations in the SHP2 catalytic domain. The deletion of the SHP2 phosphatase activity did not affect lens development as severely as the total loss of SHP2 protein, suggesting a function for SHP2 outside of its catalytic activity. Although the loss of Shc1 alone has only a slight effect on lens size and pERK activation in the lens, the authors showed that the loss of Shc1 exacerbated the lens phenotype in lenses lacking both Frs2 and Shp2. The authors suggest that SHC1 binds to the FGFR juxtamembrane domain allowing for the recruitment of GRB2 independently of FRS2.

      Strengths:

      (1) The authors used a variety of genetic tools to carefully dissect the essential signals downstream of FGFR signaling during lens development.

      (2) The authors made a convincing case that something other than FRS2 binding mediates FGFR signaling in the juxtamembrane domain.

      (3) The authors demonstrated that although both the adaptor function and phosphatase activity of SHP2 are required for embryonic survival, neither of these activities is absolutely required for lens development.

      (4) The authors provide more information as to why FGFR loss has a phenotype much more severe than the loss of FRS2 alone during lens development.

      (5) The authors followed up their work analyzing various signaling molecules in the context of lens development with biochemical analyses of FGF-induced phosphorylation in murine embryonic fibroblasts (MEFs).

      (6) In general, this manuscript represents a Herculean effort to dissect FGFR signaling in vivo with biochemical backing with cell culture experiments in vitro.

      We thank the reviewer for the thorough review of our paper and positive comments.

      Weaknesses:

      (1) The authors demonstrate that the loss of FGFR1 and FGFR2 can be compensated by a constitutive active KRAS allele in the lens and suggest that FGFRs largely support lens development only by driving ERK activation. However, the authors also saw that lens development was substantially rescued by preventing apoptosis through the deletion of BAK and BAX. To my knowledge, the deletion of BAK and BAX should not independently activate ERK. The authors do not show whether ERK activation is restored in the BAK/BAX deficient lenses. Do the authors suggest the FGFR3 and/or FGFR4 provide sufficient RAS and ERK activation for lens development when apoptosis is suppressed? Alternatively, is it the survival function of FGFR-signaling as much as a direct effect on lens differentiation?

      Our interpretation is that at the lens induction stage, where FGFR1 and FGFR2 are crucial, their primary function operates through Ras signaling to promote cell survival. Thus, either constitutively active KRAS or the direct suppression of apoptosis by deleting Bak and Bax is sufficient to rescue lens induction. This rescue enables the subsequent differentiation of lens progenitor cells, a process that FGFR3 and FGFR4 are sufficient to support.

      (2) The authors make the argument that deleting all four FGFRs prevented lens induction but that the deletion of only FGFR1 and FGFR2 did not. Part of this argument is the retention of FOXE3 expression, αA-crystallin expression, and PROX1 expression in the FGFR1/2 double mutants. However, in Figure 1E, and Figure 1F, the staining of the double mutant lens tissue with FOXE3, αA-crystallin, and PROX1 is unconvincing. However, the retention of FOXE3 expression in the FGFR1/FGFR2 double mutants was previously demonstrated in Garcia et al 2011. Also, there needs to be an enlargement or inset to demonstrate the retention of pSMAD in the quadruple FGFR mutants in Figure 1D.

      We have updated Figure 1E with a clearer image of FOXE3 staining to better illustrate FOXE3 expression in the FGFR1/2 double mutants. It seems there may have been a misunderstanding regarding our claims about αA-crystallin and PROX1. To clarify, our observation is that both αA-crystallin and PROX1 are lost in the FGFR1/2 double mutants, which we believe is clearly demonstrated in Figure 1F. Additionally, we have added insets to Figure 1D to highlight the retention of pSMAD.

      (3) Do the authors suggest that GRB2 is required for RAS activation and ultimately ERK activation? If so, do the authors suggest that ERK activation is not required for FGFR-signaling to mediate lens induction? This would follow considering that the GRB2 deficient lenses lack a problem with lens induction.

      We do believe that GRB2 is required for RAS-ERK signaling activation; however, ERK activation is not absolutely required for lens induction. This conclusion is consistent with our previous study, which showed that deletion of ERK1/2 did not prevent lens induction (Garg et al. eLife 2020;9:e51915), as well as with our current findings demonstrating that the GRB2-deficient mutant is still capable of supporting lens induction.

      (4) The increase in p-Shc is only slightly higher in the Cre FGFR1f/f FGFR2r/LR than in the FGFR1f/Δfrs FGFR2f/f. Can the authors provide quantification?

      pShc quantification is now provided in Fig. 7B.

      (5) The authors have not shown directly that Shc1 binds to the juxtamembrane region of either Fgfr1 or Fgfr2.

      It is not yet clear whether Shc1 directly binds to the juxtamembrane region of FGFR1 or FGFR2, as it may also be recruited indirectly. We acknowledge this as an important question that warrants further investigation in future studies.

      (6) The authors have used the Le-Cre strain for all of their lens deletion experiments. Previous work has documented that the Le-Cre transgene can cause lens defects independent of any floxed alleles in both homozygous and hemizygous states on some genetic backgrounds (Dora et al., 2014 PLoS One 9:e109193 and Lam et al., Human Genomics 2019 13(1):10). Are the controls used in these experiments Le-Cre hemizygotes?

      As stated in the Methods section, Le-Cre only or Le-Cre and heterozygous flox mice were used as controls.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Weaknesses

      There are only a few minor weaknesses that need to be addressed.

      (1) The point could be made in the Discussion that since Le-Cre depends on Pax6 placodal expression, it is challenging to evaluate the impact of deletion of the four Fgfrs on the expression of Pax6 (since Pax6 needs to be activated prior to achieving Fgfr deletion). A different Cre line (e.g. a Cre which is expressed in the surface ectoderm prior to lens placode formation) could help partially address this question, although it may not be able to comment on the requirement of the Fgfrs specifically in the lens ectoderm. Thus, it will be prudent to mention this in the discussion.

      We have added the caveat regarding the Le-Cre dependency on Pax6 expression to the discussion.

      (2) Referring to Jag1 among "lens-specific markers" (page 5) is debatable, I suggest changing it along the lines of "the expected upregulation of Jag1 in lens vesicle".

      The wording has been changed as suggested.  

      (3) The Abstract could be modified to clearly convey the existing knowledge gap and the key findings of the present study. As it stands now, it is a bit all over the place.

      The abstract has been revised.  

      (4) Some typos in the manuscript need to be fixed.

      e.g. "...yet its molecular mechanism remains largely resolved" - unresolved?, "...in the development lens" - in the developing lens?, In Fig. 4 legend, "(B) Grb2 mutants Grb2 mutants displayed...", etc.

      These typos have been corrected.

      Reviewer #2 (Recommendations for the authors):

      My specific suggestions are shown below.

      (1) The authors need to describe the role of Shc1 in FGF signaling and vertebrate lens development, by citing previous publications in the introduction.

      We have detailed previous studies on the role of Shc in FGF signaling in the Introduction and discussed its function in the vertebrate lens in the Discussion section.

      (2) Figure 1B bottom panels: Inset images seem to be missing, although frames and arrowheads are there. Please check them.

      The inset images were correctly placed.

      (3) Results (page 5, line 13): The authors mentioned "Sox2 expression remained at basal levels". Since Figure 1B indicates that Sox2 expression fails to be upregulated in FGFR1/2 mutant lens placode in contrast to Pax6, it is better to clearly mention the failure in upregulation of Sox2 expression in the FGFR1/2 mutants.

      This sentence has been rewritten as suggested.  

      (4) Results (page 6, line 8): The authors mentioned "we observed .... expression of Foxe3 in ...mutant lens cells (Figure 1E, arrows)." However, Foxe3-expressing lens cells are a very small population in Figure 1E. It is important to state the decreased number of Foxe3-expressing lens cells in FGFR1/2 mutants. In addition, I would like to request the authors to show histograms indicating sample size and statistical analysis for marker expression: Foxe3 (Figure 1E), Prox1 and aA-crystallin (Fig. 1F), cyclin D1 and TUNEL (Fig. 1G) and pmTOR and pS6 (Supplementary figure 1B).

      We added a statement indicating that the number of Foxe3-expressing cells is reduced in FGFR1/2 mutants, which is now quantified in Fig. 1H. Quantifications for Cyclin D1 and TUNEL are now shown in Fig. 1I and J, respectively. However, we chose not to quantify Prox1, αA-crystallin, pmTOR, and pS6, as the FGFR1/2 mutants showed no staining for these markers.

      (5) Results (page 6, line 19- page 7, line 6): The authors showed that inducible expression of constitutive active Kras, KrasG12D, using Le-Cre, recovered lens size to the half level of wild-type control. However, in the lens of mice with Le-Cre; FGFR1/2f/f; LSL-KrasG12D, pERK was detected in the most posterior edge of the lens fiber core, whereas pERK was detected in the broader area of the lens in control. Furthermore, pMEK was detected in the whole lens of mice with Le-Cre; FGFR1/2f/f; and LSL-KrasG12D, whereas pMEK was detected only in the lens epithelial cells at the equator. So, the spatial profile of pERK and pMEK expression was different from those of wild-type, although the authors observed that Prox1 and Crystallin expression are normally induced in the lens of mice with Le-Cre; FGFR1/2f/f; LSL-KrasG12D. I wonder whether the lens normally develops in mice with Le-Cre; LSL-KrasG12D? Is the lens growth enhanced in mice with Le-Cre; LSL-KrasG12D? Please add the panels of mice with Le-Cre; LSL-KrasG12D in Figure 2B and 2C. In addition, I wonder whether apoptosis is suppressed in the lens of mice with Le-Cre; FGFR1/2f/f; LSL-KrasG12D?

      As we previously reported (Developmental Biology 355, 2011, 12–20), Le-Cre; LSL-KrasG12D did not lead to enhanced lens growth. While we agree that including images of Le-Cre; LSL-KrasG12D as controls in Fig. 2B and C and evaluating apoptosis in Le-Cre; FGFR1/2f/f; LSL-KrasG12D mutants would be appropriate, we regretfully no longer have these animals available to conduct these experiments.

      (6) Results (page 11, line 15): the PCR genotyping image of Fig. 6C seems to be missing.

      The PCR genotyping image was correctly placed below Fig. 6B. 

      (7) Results (page 11, lines 15-20): there is no citation of Figure 6D in the results section.

      The citation for Fig. 6D is added in the results section.

      (8) Figures 5H, 6H, and 7A: Western blotting of some of the pERK, ERK lanes is missing.

      These western blots all have pERK/ERK overlay images.

      (9) Figure 7A, western blotting data on pShc levels are important to suggest the involvement of Shc1 in Frs2-independent Grb2 activation by FGF stimulation. Please provide the histogram for statistical analysis.

      pShc quantification is now provided in Fig. 7B.

      (10) There is no citation of Figure 7D, E, and F in the results section. Please add them.

      These citations have been added.

      (11) Figures 7E, and 7F: The authors showed that lens morphology and lens size evaluation in genetic combinations: control, Frs2/Shc1 KD, Frs2/Shp2 KD, and Frs2/Shp2/Shc1 KD. However, I would like to request the authors to show more detailed data in these genetic combinations, for example, pERK, foxe3, Maf, Prox1, Jag1, p57, cyclin D3, g-crystallin, and TUNEL.

      Unfortunately, we no longer have these mutant mice to perform this detailed staining.

      Reviewer #3 (Recommendations for the authors):

      (1) The figure legend for Figure 2 lists (G) twice. The second (G) should be (H). Also, in Figures 2G and H there is no indication as to what stage lenses were used for the TUNEL and size analyses. I assume that it was E13.5, but it should be explicitly stated.

      The figure labeling has been corrected and the stage added to the figure legend.

      (2) In Figure 4 A the label should be gamma-crystallin rather than r-crystallin.

      The figure labeling has been corrected.

      (3) In Figure 6 D, I believe that the immunolabeling for Maf and Foxe3 are reversed. The Maf should be red as it is in the fibers and the Foxe3 should be green as it is epithelial.

      The figure labeling has been corrected.

      (4) In Figure 6C I believe that the labels for the WT and YF alleles on the western blot are reversed.

      The YF PCR band was designed to be larger than WT, so the labeling was correct as is.

      (5) In Figure 6F I believe that the labels for WT and CS on the western blot are reversed.

      The figure labeling has been corrected.

      (6) In Supplemental figure 2 there are no genotype labels for the TUNEL bar graph.

      The figure labeling has been added.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates a mechanism between the histone reader protein YEATS2 and the metabolic enzyme GCDH, particularly in regulating epithelial-to-mesenchymal transition (EMT) in head and neck cancer (HNC).

      Strengths:

      Great detailing of the mechanistic aspect of the above axis is the primary strength of the manuscript.

      Weaknesses:

      Several critical points require clarification, including the rationale behind EMT marker selection, the inclusion of metastasis data, the role of key metabolic enzymes like ECHS1, and the molecular mechanisms governing p300 and YEATS2 interactions.

      We would like to sincerely thank the reviewer for the detailed, in-depth, and positive response. We are committed to implementing constructive revisions to the manuscript to address the reviewer’s concerns effectively.

      Major Comments:

      (1) The title, "Interplay of YEATS2 and GCDH mediates histone crotonylation and drives EMT in head and neck cancer," appears somewhat misleading, as it implies that YEATS2 directly drives histone crotonylation. However, YEATS2 functions as a reader of histone crotonylation rather than a writer or mediator of this modification. It cannot itself mediate the addition of crotonyl groups onto histones. Instead, the enzyme GCDH is the one responsible for generating crotonyl-CoA, which enables histone crotonylation. Therefore, while YEATS2 plays a role in recognizing crotonylation marks and may regulate gene expression through this mechanism, it does not directly catalyse or promote the crotonylation process.

      We thank the reviewer for raising this concern. As stated by the reviewer, YEATS2 functions as a reader protein, capable of recognizing histone crotonylation marks and assisting in the addition of this mark to nearby histone residues, possibly by assisting the recruitment of the writer protein for crotonylation. Our data indicates the involvement of YEATS2 in the recruitment of writer protein p300 on the promoter of the SPARC gene, making YEATS2 a regulatory factor responsible for the addition of crotonyl marks in an indirect manner. Thus, we have decided to make changes in the title by replacing the word “mediates” with “regulates”. Therefore, the updated title can be read as: “Interplay of YEATS2 and GCDH regulates histone crotonylation and drives EMT in head and neck cancer”.

      (2) The study suggests a link between YEATS2 and metastasis due to its role in EMT, but the lack of clinical or pre-clinical evidence of metastasis is concerning. Only primary tumor (PT) data is shown, but if the hypothesis is that YEATS2 promotes metastasis via EMT, then evidence from metastatic samples or in vivo models should be included to solidify this claim.

      We appreciate the reviewer’s suggestion. Here, we would like to state that the primary aim of this study was to delineate the molecular mechanisms behind the role of YEATS2 in maintaining histone crotonylation at the promoter of genes that favour EMT in head and neck cancer. We have dissected the importance of histone crotonylation in the regulation of gene expression in head and neck cancer in great detail, having investigated the upstream and downstream molecular players involved in this process that promote EMT. Moreover, with the help of multiple phenotypic assays, such as Matrigel invasion, wound healing, and 3D invasion assays, we have shown the functional importance of YEATS2 in promoting EMT in head and neck cancer cells. Since EMT is known to be a prerequisite process for cancer cells undergoing metastasis(1), the evidence of YEATS2 being associated with EMT demonstrates a potential correlation of YEATS2 with metastasis. However, as part of the revision, we will use publicly available patient data to investigate the direct association of YEATS2 with metastasis by checking the expression of YEATS2 between different grades of head and neck cancer, as an increase in tumor grade is often correlated with the incidence of metastasis(2).

      (3) There seems to be some discrepancy in the invasion data with BICR10 control cells (Figure 2C). BICR10 control cells with mock plasmids, specifically shControl and pEGFP-C3 show an unclear distinction between invasion capacities. Normally, we would expect the control cells to invade somewhat similarly, in terms of area covered, within the same time interval (24 hours here). But we clearly see more control cells invading when the invasion is done with KD and fewer control cells invading when the invasion is done with OE. Are these just plasmid-specific significant effects on normal cell invasion? This needs to be addressed.

      We thank the reviewer for the thorough evaluation of the manuscript. The figure panels in question, Figure 2B and 2C, represent two independently performed experiments: the invasion assay after knockdown of YEATS2 and the invasion assay after overexpression of YEATS2, respectively. We would like to clarify that the two panels show distinct, independent results and that the methods used to knock down and to overexpress YEATS2 also differ. As stated in the Materials and Methods section, the knockdown is performed by lentivirus-mediated transduction of cells, whereas the overexpression is performed by a standard transfection method in which the transfection reagent and the respective plasmids are mixed directly before the mix is added to the cells. The difference in experimental conditions between these two experiments might have contributed to the differences seen in the controls, as observed previously(3). Hence, we would like to state that the results of figure panels Figure 2B and Figure 2C should be evaluated independently of each other.

      (4) In Figure 3G, the Western blot shows an unclear band for YEATS2 in shSP1 cells with YEATS2 overexpression condition. The authors need to clearly identify which band corresponds to YEATS2 in this case.

      The two bands seen in the shSP1+pEGFP-C3-YEATS2 condition correspond to the endogenous YEATS2 band (lower band, indicated by * in the shControl lane) and YEATS2-GFP band (upper band, corresponding to overexpressed YEATS2-GFP fusion protein, which has a higher molecular weight). To avoid confusion, the endogenous band will be highlighted (marked by *) in the lane representing the shSP1+pEGFP-C3-YEATS2 condition in the revised version of the manuscript.

      (5) In ChIP assays with SP1, YEATS2 and p300 which promoter regions were selected for the respective genes? Please provide data for all the different promoter regions that must have been analysed, highlighting the region where enrichment/depletion was observed. Including data from negative control regions would improve the validity of the results.

      Throughout our study, we have performed ChIP-qPCR assays to check the binding of SP1 on YEATS2 and GCDH promoter, and to check YEATS2 and p300 binding on SPARC promoter. Using transcription factor binding prediction tools and luciferase assays, we selected multiple sites on the YEATS2 and GCDH promoter to check for SP1 binding. The results corresponding to the site that showed significant enrichment were provided in the manuscript. The region of SPARC promoter in YEATS2 and p300 ChIP assay was selected on the basis of YEATS2 enrichment found in the YEATS2 ChIP-seq data. We will provide data for all the promoter regions investigated (including negative controls) in the revised version of the manuscript.

      (6) The authors establish a link between H3K27Cr marks and GCDH expression, and this is an already well-known pathway. A critical missing piece is the level of ECHS1 in patient samples. This will clearly delineate if the balance shifted towards crotonylation.

      We thank the reviewer for their valuable suggestion. To support our claim, we had checked the expression of GCDH and ECHS1 in TCGA HNC RNA-seq data (provided in Figure 4—figure supplement 1A and B) and found that GCDH expression was increased while ECHS1 expression was decreased in tumor samples compared to normal samples. We hypothesized that higher GCDH expression and decreased ECHS1 expression might lead to an increase in the levels of crotonylation in HNC. To further substantiate our claim, we will check the abundance of ECHS1 in HNC patient samples as part of the revision.

      (7) The p300 ChIP data on the SPARC promoter is confusing. The authors report reduced p300 occupancy in YEATS2-silenced cells, on SPARC promoter. However, this is paradoxical, as p300 is a writer, a histone acetyltransferase (HAT). The absence of a reader (YEATS2) shouldn't affect the writer (p300) unless a complex relationship between p300 and YEATS2 is present. The role of p300 should be further clarified in this case. Additionally, transcriptional regulation of SPARC expression in YEATS2 silenced cells could be analysed via downstream events, like Pol-II recruitment. Assays such as Pol-II ChIP-qPCR could help explain this.

      Using RNA-seq and ChIP-seq analyses, we have shown that YEATS2 affects the expression of several genes by regulating the level of histone crotonylation at gene promoters globally. The histone writer p300 is a promiscuous acyltransferase protein that has been shown to be involved in the addition of several non-acetyl marks on histone residues, including crotonylation(4). Our data provides evidence for the dependency of the writer p300 on YEATS2 in mediating histone crotonylation, as YEATS2 downregulation led to decreased occupancy of p300 on the SPARC promoter (Figure 5F). However, the exact mechanism of cooperativity between YEATS2 and p300 in maintaining histone crotonylation remains to be investigated. To address the reviewer’s concern, we will perform various experiments to delineate the molecular mechanism pertaining to the association of YEATS2 with p300 in regulating histone crotonylation. Following are the experiments that will be performed:

      (a) Co-immunoprecipitation experiments to check the physical interaction between YEATS2 and p300.

      (b) We will check H3K27cr levels on the SPARC promoter and SPARC expression in p300-depleted HNC cells.

      (c) Rescue experiments to check if the decrease in p300 occupancy on the SPARC promoter can be compensated by overexpressing YEATS2.

      (d) As suggested by the reviewer, Pol-II ChIP-qPCR at the promoter of SPARC will be performed in YEATS2-silenced cells to explain the mode of transcriptional regulation of SPARC expression by YEATS2.

      (8) The role of GCDH in producing crotonyl-CoA is already well-established in the literature. The authors' hypothesis that GCDH is essential for crotonyl-CoA production has been proven, and it's unclear why this is presented as a novel finding. It has been shown that YEATS2 KD leads to reduced H3K27cr, however, it remains unclear how the reader is affecting crotonylation levels. Are GCDH levels also reduced in the YEATS2 KD condition? Are YEATS2 levels regulating GCDH expression? One possible mechanism is YEATS2 occupancy on GCDH promoter and therefore reduced GCDH levels upon YEATS2 KD. This aspect is crucial to the study's proposed mechanism but is not addressed thoroughly.

      The source for histone crotonylation, crotonyl-CoA, can be produced by several enzymes in the cell, such as ACSS2, GCDH, ACOX3, etc(5). Since metabolic intermediates produced during several cellular pathways can act as substrates for epigenetic factors, we wanted to investigate whether such an epigenetic-metabolism crosstalk exists in the context of YEATS2. As described in the manuscript, we performed GSEA using publicly available TCGA RNA-seq data and found that patients with higher YEATS2 expression also showed a high correlation with expression levels of genes involved in the lysine degradation pathway, including GCDH. Since the preferential binding of YEATS2 to H3K27cr and the role of GCDH in producing crotonyl-CoA were known(6,7), we hypothesized that higher H3K27cr in HNC could be a result of both YEATS2 and GCDH. We found that the presence of GCDH in the nucleus of HNC cells is correlated with higher H3K27cr abundance, which could be a result of excess levels of crotonyl-CoA produced via GCDH. We also found a correlation between H3K27cr levels and YEATS2 expression, which could arise due to YEATS2-mediated preferential maintenance of crotonylation. This indicates that, despite being a reader protein, YEATS2 affects promoter H3K27cr levels, possibly by helping in the recruitment of p300 (as shown in Figure 5F). Thus, YEATS2 and GCDH are both responsible for the regulation of histone crotonylation-mediated gene expression in HNC.

      We did not find any evidence of YEATS2 regulating the expression of GCDH in HNC cells. However, we found that YEATS2 downregulation reduced the nuclear pool of GCDH in head and neck cancer cells (Figure 7F). This suggests that YEATS2 not only regulates histone crotonylation by affecting promoter H3K27cr levels (with p300), but also by affecting the nuclear localization of crotonyl-CoA producing GCDH. Also, we observed that the expression of YEATS2 and GCDH are regulated by the same transcription factor SP1 in HNC. We found that the transcription factor SP1 binds to the promoter of both genes, and its downregulation led to a decrease in their expression (Figure 3 and Figure 7).

      We would like to state that the relationship between YEATS2 and the nuclear localization of GCDH, as well as the underlying molecular mechanism, remains unexplored and presents an open question for future investigation.

      (9) The authors should provide IHC analysis of YEATS2, SPARC alongside H3K27cr and GCDH staining in normal vs. tumor tissues from HNC patients.

      We thank the reviewer for their suggestion. We are consulting our clinical collaborators to assess the feasibility of including this IHC analysis in our revision and will make every effort to incorporate it.

      Reviewer #2 (Public review):

      Summary:

      The manuscript emphasises the increased invasive potential of histone reader YEATS2 in an SP1-dependent manner. They report that YEATS2 maintains high H3K27cr levels at the promoter of EMT-promoting gene SPARC. These findings assigned a novel functional implication of histone acylation, crotonylation.

      We thank the reviewer for the constructive comments. We are committed to making beneficial changes to the manuscript in order to alleviate the reviewer’s concerns.

      Concerns:

      (1) The patient cohort is very small with just 10 patients. To establish a significant result the cohort size should be increased.

      We thank the reviewer for this suggestion. We will increase the number of patient samples to assess the levels of YEATS2 and H3K27cr in normal vs. tumor samples.

      (2) Figure 4D compares H3K27Cr levels in tumor and normal tissue samples. Figure 1G shows overexpression of YEATS2 in a tumor as compared to normal samples. The loading control is missing in both. Loading control is essential to eliminate any disparity in protein concentration that is loaded.

      In Figures 1G and 4D, we have used Ponceau S staining as a control for equal loading. Ponceau S staining is frequently used as an alternative to housekeeping proteins such as GAPDH as a control for protein loading(8). It avoids the potential for variability in housekeeping gene expression. However, it may be less quantitative than using housekeeping proteins. To address the reviewer’s concern, we will probe with an antibody against a housekeeping protein as a loading control in the revised figures, provided its expression remains stable across the conditions tested.

      (3) Figure 4D only mentions 5 patient samples checked for the increased levels of crotonylation and hence forms the basis of their hypothesis (increased crotonylation in a tumor as compared to normal). The sample size should be more and patient details should be mentioned.

      A total of 9 samples were checked for H3K27cr levels (5 of them are included in Figure 4D and the rest are included in Figure 4—figure supplement 1D). However, as a part of the revision, we will check the H3K27cr levels in more patient samples.

      (4) YEATS2 maintains H3K27Cr levels at the SPARC promoter. The p300 is reported to be hyper-activated (hyperautoacetylated) in oral cancer. Probably, the activated p300 causes hyper-crotonylation, and other protein factors cause the functional translation of this modification. The authors need to clarify this with a suitable experiment.

      In our study, we have shown that p300 is dependent on YEATS2 for its recruitment on the SPARC promoter. As a part of the revision, we propose the following experiments to further substantiate the role of p300 in YEATS2-mediated gene regulation:

      (a) Co-immunoprecipitation experiments to check the physical interaction between YEATS2 and p300.

      (b) We will check H3K27cr levels on the SPARC promoter and SPARC expression in p300-depleted HNC cells.

      (c) Rescue experiments to check if the decrease in p300 occupancy on the SPARC promoter can be compensated by overexpressing YEATS2.

      (d) Pol-II ChIP-qPCR at the promoter of SPARC will be performed in YEATS2-silenced cells to explain the mode of transcriptional regulation of SPARC expression by YEATS2.

      (5) I do not entirely agree with using GAPDH as a control in the western blot experiment since GAPDH has been reported to be overexpressed in oral cancer.

      We would like to clarify that GAPDH was not used as a loading control for protein expression comparisons between normal and tumor samples. GAPDH was used as a loading control only in experiments using head and neck cancer cell lines where shRNA-mediated knockdown or overexpression was employed. These manipulations specifically target the genes of interest and are not expected to alter GAPDH expression, making it a suitable loading control in these instances.

      (6) The expression of EMT markers has been checked in shControl and shYEATS2 transfected cell lines (Figure 2A). However, their expression should first be checked directly in the patients' normal vs. tumor samples.

      We thank the reviewer for the suggestion. To address this, we will check the expression of EMT markers alongside YEATS2 expression in normal vs. tumor samples.

      (7) In Figure 3G, knockdown of SP1 led to the reduced expression of YEATS2 controlled gene Twist1. Ectopic expression of YEATS2 was able to rescue Twist1 partially. In order to establish that SP1 directly regulates YEATS2, SP1 should also be re-introduced upon the knockdown background along with YEATS2 for complete rescue of Twist1 expression.

      To address the reviewer’s concern regarding the partial rescue of Twist1 in SP1-depleted, YEATS2-overexpressing cells, we will perform the experiment as suggested by the reviewer. In brief, we will overexpress both SP1 and YEATS2 in SP1-depleted cells and then assess the expression of Twist1.

      (8) In Figure 7G, the expression of EMT genes should also be checked upon rescue of SPARC expression.

      We thank the reviewer for the suggestion. We will check the expression of EMT markers upon YEATS2/GCDH rescue and update Figure 7G in the revised version of the manuscript.

      References

      (1) T. Brabletz, R. Kalluri, M. A. Nieto and R. A. Weinberg, Nat Rev Cancer, 2018, 18, 128–134.

      (2) P. Pisani, M. Airoldi, A. Allais, P. Aluffi Valletti, M. Battista, M. Benazzo, R. Briatore, S. Cacciola, S. Cocuzza, A. Colombo, B. Conti, A. Costanzo, L. Della Vecchia, N. Denaro, C. Fantozzi, D. Galizia, M. Garzaro, I. Genta, G. A. Iasi, M. Krengli, V. Landolfo, G. V. Lanza, M. Magnano, M. Mancuso, R. Maroldi, L. Masini, M. C. Merlano, M. Piemonte, S. Pisani, A. Prina-Mello, L. Prioglio, M. G. Rugiu, F. Scasso, A. Serra, G. Valente, M. Zannetti and A. Zigliani, Acta Otorhinolaryngol Ital, 2020, 40, S1–S86.

      (3) J. Lin, P. Zhang, W. Liu, G. Liu, J. Zhang, M. Yan, Y. Duan and N. Yang, Elife, 2023, 12, RP87510.

      (4) X. Liu, W. Wei, Y. Liu, X. Yang, J. Wu, Y. Zhang, Q. Zhang, T. Shi, J. X. Du, Y. Zhao, M. Lei, J.-Q. Zhou, J. Li and J. Wong, Cell Discov, 2017, 3, 17016.

      (5) G. Jiang, C. Li, M. Lu, K. Lu and H. Li, Cell Death Dis, 2021, 12, 703.

      (6) D. Zhao, H. Guan, S. Zhao, W. Mi, H. Wen, Y. Li, Y. Zhao, C. D. Allis, X. Shi and H. Li, Cell Res, 2016, 26, 629–632.

      (7) H. Yuan, X. Wu, Q. Wu, A. Chatoff, E. Megill, J. Gao, T. Huang, T. Duan, K. Yang, C. Jin, F. Yuan, S. Wang, L. Zhao, P. O. Zinn, K. G. Abdullah, Y. Zhao, N. W. Snyder and J. N. Rich, Nature, 2023, 617, 818–826.

      (8) I. Romero-Calvo, B. Ocón, P. Martínez-Moya, M. D. Suárez, A. Zarzuelo, O. Martínez-Augustin and F. S. de Medina, Anal Biochem, 2010, 401, 318–320.

    1. Author response:

      We thank the reviewers for their careful evaluation of our manuscript and appreciate the suggestions for improvement. We will outline our planned revisions in response to these reviews.

      Reviewer 2:

      “The one exception is the claim that "maintenance of respiration is the only cellular target of chalkophore mediated copper acquisition." While under the in vitro conditions tested this does appear to be the case; however, it can't be ruled out that the chalkophore is important in other situations. In particular, for maintenance of the periplasmic superoxide dismutase, SodC, which is the other M. tuberculosis enzyme known to require copper.”

      And

      Reviewer 3:

      “Because the phenotype of M. tuberculosis lacking chalkophores is similar, if not identical, to using Q203, an inhibitor of cytochrome bcc:aa3, the authors propose that the copper-containing cytochrome bcc:aa3 is the only recipient of copper-uptake by chalkophores. A minor weakness of the work is that this latter conclusion is not verified under infection conditions and other copper-enzymes might still be functionally required during one or more stages of infection.”

      Both comments concern the question of whether the bcc:aa3 respiratory oxidase supercomplex is the only target of chalkophore-delivered copper. In culture, our experiments suggest that bcc:aa3 is the only target. The evidence for this claim is in Figure 2E and F. In 2E, we show that M. tuberculosis ΔctaD (lacking a subunit of bcc:aa3) is growth impaired, that copper chelation with TTM does not exacerbate that growth defect, and that a ΔctaDΔnrp double mutant is no more sensitive to TTM than ΔctaD. These data indicate that the role of the chalkophore in protecting against copper deprivation is absent when the bcc:aa3 oxidase is missing. Similar results were obtained with Q203 (Figure 2F). Q203 or TTM arrests growth of M. tuberculosis Δnrp, but the combination has no additional effect, indicating that when Q203 is inhibiting the bcc:aa3 oxidase, the chalkophore has no additional role. However, we agree with the reviewers that we cannot exclude the possibility that during infection, there is an additional target of chalkophore-mediated Cu acquisition. We will add this caveat to the revised version of this manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In previous work, the authors described necrosis-induced apoptosis (NiA) as a consequence of induced necrosis. Specifically, experimentally induced necrosis in the distal pouch of larval wing imaginal discs triggers NiA in the lateral pouch. In this manuscript, the authors confirmed this observation and found that while necrosis can kill all areas of the disc, NiA is limited to the pouch and to some extent to the notum, but is excluded from the hinge region. Interestingly and unexpectedly, signaling by the Jak/Stat and Wg pathways inhibits NiA. Further characterization of NiA by the authors reveals that NiA also triggers regenerative proliferation which can last up to 64 hours following necrosis induction. This regenerative response to necrosis is significantly stronger compared to discs ablated by apoptosis. Furthermore, the regenerative proliferation induced by necrosis is dependent on the apoptotic pathway because RNAi targeting the RHG genes is sufficient to block proliferation. However, NiA does not promote proliferation through the previously described apoptosis-induced proliferation (AiP) pathway, although cells at the wound edge undergo AiP. Further examination of the caspase levels in NiA cells allowed the authors to group these cells into two clusters: some cells (NiA) undergo apoptosis and are removed, while others referred to as Necrosis-induced Caspase Positive (NiCP) cells survive despite caspase activity. It is the NiCP cells that repair cellular damage including DNA damage and that promote regenerative proliferation. Caspase sensors demonstrate that both groups of cells have initiator caspase activity, while only the NiA cells contain effector caspase activity. Under certain conditions, the authors were also able to visualize effector caspase activity in NiCP cells, but the level was low, likely below the threshold for apoptosis. Finally, the authors found that loss of the initiator caspase Dronc blocks regenerative proliferation, while inhibiting effector caspases by expression of p35 does not, suggesting that Dronc can induce regenerative proliferation following necrosis in a non-apoptotic manner. This last finding is very interesting as it implies that Dronc can induce proliferation in at least two ways in addition to its requirement in AiP.

      Strengths:

      This is a very interesting manuscript. The authors demonstrate that epithelial tissue that contains a significant number of necrotic cells is able to regenerate. This regenerative response is dependent on the apoptotic pathway which is induced at a distance from the necrotic cells. Although regenerative proliferation following necrosis requires the initiator caspase Dronc, Dronc does not induce a classical AiP response for this type of regenerative response. In future work, it will be very interesting to dissect this regenerative response pathway genetically.

      Weaknesses:

      No weaknesses were identified.

      We thank the reviewer for their positive evaluation and kind words.

      Reviewer #2 (Public Review):

      Summary / Strengths:

      In this manuscript, Klemm et al. build on past published findings (Klemm et al., 2021) to characterize caspase activation in distal cells following necrotic tissue damage within the Drosophila wing imaginal disc. Previously, in Klemm et al., 2021, the authors described necrosis-induced apoptosis (NiA) following the development of a genetic system to study necrosis that is caused by the expression of a constitutively active GluR1 (Glutamate/Ca2+ channel), and they discovered that the appearance of NiA cells was important for promoting regeneration.

      In this manuscript, the authors aim to investigate how tissues regenerate following necrotic cell death. They find that: the cells of the wing pouch are more likely to have non-autonomous caspase activation than other regions within the wing imaginal disc (hinge and notum); two signaling pathways that are known to be upregulated during regeneration, Wnt (wingless) and JAK/Stat signaling, act to prevent additional NiA in pouch cells and may explain the region specificity; the presence of NiA cells promotes regenerative proliferation in late stages of regeneration; not all caspase-positive cells are cleared from the epithelium (these cells are then referred to as Necrosis-induced Caspase Positive (NiCP) cells); these NiCP cells continue to live and promote proliferation in adjacent cells; and the caspase Dronc is important for creating NiA/NiCP cells and for these cells to promote proliferation. Animals heterozygous for a Dronc null allele show a decrease in regeneration following necrotic tissue damage.

      The study has the potential to be broadly interesting due to the insights into how tissues differentially respond to necrosis as compared to apoptosis to promote regeneration.

      Weaknesses:

      However, here are some of my current concerns for the manuscript in its current version:

      The presence of cells with activated caspase that don't die (NiCP cells) is an interesting biological phenomenon but is not described until Figure 5. How does the existence of NiCP cells impact the earlier findings presented? Is late proliferation due to NiA, NiCP, or both? Does Wg and JAK/STAT signaling act to prevent the formation of both NiA and NiCP cells or only NiA cells? Moreover, the authors are able to specifically manipulate the wound edge (WE) and lateral pouch cells (LP), but don't show how these manipulations within these distinct populations impact regeneration. The authors provide evidence that driving UAS-mir(RHG) throughout the pouch, in the LP or the WE all decrease the amount of NiA/NiCP in Figure 3G-O, but no data on final regenerative outcomes for these manipulations is presented (such as those presented for Dronc-/+ in Fig 7M). The manuscript would be greatly enhanced by quantification of more of the findings, especially in describing if the specific manipulations that impacted NiA /NiCP cells disrupt end-point regeneration phenotypes.

      We have added a line to the results to clarify that we believe the finding that some NiA likely persist as NiCP does not affect our conclusions up to this point.

      We have added a statement emphasizing the results from our first paper, which demonstrate that LP>miRHG expression reduces the overall capacity to regenerate.

      Quantification of the change in posterior NiA number has been added to Figure 2L to strengthen the evidence. Likewise, we have included quantification of the E2F time course presented in Figure 3A (Figure 3 – Figure supplement 1C), and quantification of the change in GC3Ai signal over time has been added (Figure 5 – Figure supplement 1D) to emphasize the perdurance of GC3Ai-positive NiA/NiCP.

      How long does apoptosis take within the wing disc epithelium? How many of the caspase(+) cells are present for the whole 48 hours of regeneration? Are new cells also induced to activate caspase during this time window? The authors presented a number of interesting experiments characterizing the NiCP cells. For the caspase sensor GC3Ai experiments in Figure 5, is there a way to differentiate between cells that have maintained fluorescent GC3Ai from cells that have newly activated caspase? What is the timeline for when NiA and NiCP are specified? In addition, what fraction of NiCP cells contribute to the regenerated epithelium? Additional information about the temporal dynamics of NiA and NiCP specification/commitment would be greatly appreciated.

      We have included more information concerning the kinetics of apoptotic cell removal, and how this compares to the observations we have made with NiA/NiCP in our GC3Ai experiments. Additionally, we have included a quantification of the percent of the whole wing pouch with GC3Ai signal over time (Figure 5F) as well as the distal wing pouch with GC3Ai signal over time (Figure 5 – Figure supplement 1D) to further support the idea that NiCP persist over time.

      We acknowledge that our GC3Ai time course unfortunately cannot confirm whether the increase in GC3Ai signal over time is due to cells with new caspase activity or proliferating NiCP and have included this point in the discussion.

      We attempted to track the lineage of NiA/NiCP into the pupal and adult wings with CasExpress and DBS, however the results of these experiments were inconsistent, and therefore we did not feel confident to include these data or draw conclusions in either direction. We are currently designing variations of these lineage trace tools in order to better track the lineage of these cells that we hope to include in a future paper.

      The notum also does not express developmental JAK/STAT, yet little NiA was observed within the notum. Do the authors have any additional insights into the differential response between the pouch and notum? What makes the pouch unique? Are NiA/NiCP cells created within other imaginal discs and other tissues? Are they similarly important for regenerative responses in other contexts?

      We have added a brief mention of these points to the appropriate results section to avoid further increasing the length of the discussion.

      Data on the necrosis of other imaginal discs through FLP/FRT clone formation in haltere and leg discs have been added to Figure 1 – Figure supplement 1J and described in the text.

      Reviewer #3 (Public Review):

      The manuscript "Regeneration following tissue necrosis is mediated by non-apoptotic caspase activity" by Klemm et al. is an exploration of what happens to a group of cells that experience caspase activation after necrosis occurs some distance away from the cells of interest. These experiments have been conducted in the Drosophila wing imaginal disc, which has been used extensively to study the response of a developing epithelium to damage and stress. The authors revise and refine their earlier discovery of apoptosis initiated by necrosis, here showing that many of those presumed apoptotic cells do not complete apoptosis. Thus, the most interesting aspect of the paper is the characterization of a group of cells that experience mild caspase activation in response to an unknown signal, followed by some effector caspase activation and DNA damage, but that then recover from the DNA damage, avoid apoptosis, and proliferate instead. Many questions remain unanswered, including the signal that stimulates the mild caspase activation, and the mechanism through which this activation stimulates enhanced proliferation.

      The authors should consider answering additional questions, clarifying some points, and making some minor corrections:

      Major concerns affecting the interpretation of experimental results:

      Expression of STAT92E RNAi had no apparent effect on the ability of hinge cells to undergo NiA, leading the authors to conclude that other protective signals must exist. However, the authors have not shown that this STAT92E RNAi is capable of eliminating JAK/STAT signaling in the hinge under these experimental conditions. Using a reporter for JAK/STAT signaling, such as the STAT-GFP, as a readout would confirm the reduction or elimination of signaling. This confirmation would be necessary to support the negative result as presented.

      We have included data demonstrating our ability to knock down JAK/STAT activity in the hinge with UAS-Stat92E<sup>RNAi</sup> (Figure 2 – Figure supplement 1E and F). Additionally, we have included a quantification of posterior NiA/NiCP with the Stat92E<sup>RNAi</sup> (as well as wg<sup>RNAi</sup> and Zfh-2<sup>RNAi</sup>, Figure 2L) to strengthen our conclusion that JAK/STAT and WNT signaling act to regulate NiA formation within the pouch.

      Similarly, the authors should confirm that the Zfh2 RNAi is reducing or eliminating Zfh2 levels in the hinge under these experimental conditions, before concluding that Zfh2 does not play a role in stopping hinge cells from undergoing NiA.

      We have repeated this experiment with a longer knockdown using a GAL4 driver that expresses from early larval stages until our evaluation at L3, but were unable to demonstrate a loss of Zfh-2 with IF labeling. Additionally, we have quantified posterior NiA/NiCP with a Zfh-2<sup>RNAi</sup> (Figure 2L) and do find a slight increase in NiA/NiCP number; however, this change is not significant. We have altered our conclusions to reflect these new data.

      EdU incorporation was quantified by measuring the fluorescence intensity of the pouch and normalizing it to the fluorescence intensity of the whole disc. However, the images show that EdU fluorescence intensity of other regions of the disc, especially the notum, varied substantially when comparing the different genetic backgrounds (for example, note the substantially reduced EdU in the notum of Figure 3 B' and B'). Indeed, it has been shown that tissue damage can lead to suppression of proliferation in the notum and elsewhere in the disc, unless the signaling that induces the suppression is altered. Therefore, the normalization may be skewing the results because the notum EdU is not consistent across samples, possibly because the damage-induced suppression of proliferation in the notum is different across the different genetic backgrounds.

      To more accurately reflect the observations that we have made with the EdU assay, we have changed our terminology to indicate that the EdU signal is more localized to the damaged tissue in ablated discs, thus taking into account the relative changes across the disc, rather than referring to it as an increase in the pouch. To further strengthen our observation that damage results in a localized proliferation, we have included a quantification of the E2F time course presented in Figure 3A (Figure 3 – Figure supplement 1C), which underscores the trend observed in our EdU experiments.

      The authors expressed p35 to attempt to generate "undead cells". They take an absence of mitogen secretion or increased proliferation as evidence that undead cells were not generated. However, there could be undead cells that do not stimulate proliferation non-autonomously, which could be detected by the persistence of caspase activity in cells that do not complete apoptosis. Indeed, expressing p35 and observing sustained effector caspase activation could help answer the later question of what percentage of this cell population would otherwise complete apoptosis (NiA, rescued by p35) vs reverse course and proliferate (NiCP, unaffected by p35).

      In our previous work, we showed that P35 expression impairs our ability to detect effector caspases with IF-based tools. This can also be seen in Figure 4 of this work (Figure 4C and F). Given that P35 expression precludes our ability to label and assay effector caspase activity visually, and thus address the concerns outlined above, we relied on other tools such as reporters of AiP mitogens (wg-lacZ & dpp-lacZ) to assay whether NiA participate in AiP. As a functional readout, we also paired P35 expression with the EdU assay to test whether proliferation was altered by the presence of undead cells. The results discussed in Figure 4 lead us to conclude that NiA likely do not participate in the canonical AiP feedforward loop, although it is possible that these experiments generate another type of undead cell – one that utilizes a different mechanism to promote proliferation.

      It is unclear if the authors' model is that the NiCP cells lead to autonomous or non-autonomous cell proliferation, or both. Could the lineage-tracing experiments and/or the experiments marking mitosis relative to caspase activity answer this question?

      We have added further details to the discussion on the potential for NiA/NiCP to induce cell autonomous/non-autonomous proliferation.

      Many of the conclusions rely on single images. Quantification of many samples should be included wherever possible.

      We have added quantification to strengthen the results of Figures 2, 3 and 5.

      Why does the reduction of Dronc appear to affect regenerative growth in females but not males?

      We have repeated these regeneration scoring experiments and increased the N for control versus droncI29 mutant males; however, the results of the analysis for male wing size remain non-significant, although the general trend is that droncI29 wings are slightly smaller. While there could be sex-specific differences in the capacity to regenerate that contribute to this observation, it is unclear what the underlying mechanism could be.

      Reviewer #1 (Recommendations for the authors):

      The work in this paper is already very complete and very well worked out. The conclusions are well supported by the data in this manuscript. I do not have any experimental requests, only a few minor and formal requests/questions.

      (1) Why does Diap1 overexpression not affect regenerative proliferation, whereas mir(RHG) and dronc[I29] do, given that Diap1 acts between RHG and Dronc?

      We speculate on this point in the discussion section but have adjusted some of the phrasing for clarity.

      (2) I assume that the authors used the cleaved Dcp-1 antibody from Cell Signaling Technologies. I recommend that the authors refer to this antibody as cDcp-1 in text and figures as this antibody specifically detects the cleaved, and thus activated form of Dcp-1, and not the uncleaved, inactive form of Dcp-1 which has a uniform expression in the discs.

      Changed to cDcp-1.

      (3) Line 299: Hay et al. 1994 did not show that p35 inhibits Drice and Dcp-1 (in fact, both genes were not even cloned yet). This was shown by Meier et al. 2000 and Hawkins et al. 2000. Please correct references.

      Corrected.

      (4) Line 574/575. Meier et al. 2000 did not show that Dronc is mono-ubiquitylated. This was shown by Kamber-Kaya et al., 2017. Please correct.

      Corrected.

      Reviewer #2 (Recommendations for the authors):

      (1) Does domeless knockdown cause apoptosis without tissue ablation (Figures 2C-E)? Currently, the non-ablation control is not shown.

      Domeless knockdown does not cause apoptosis in the absence of ablation (Added Figure 2 – Figure supplement 1A).

      (2) The supplemental experiment with zfh2-RNAi is hard to interpret because there is no evidence of RNAi knockdown based on the staining with the anti-Zfh2 antibody.

      As noted above, a longer zfh-2 knockdown does not appear to alter Zfh-2 protein levels. A quantification of posterior NiA/NiCP following knockdown shows a slight (non-significant) increase in posterior NiA/NiCP. Considering these new results, we have altered our interpretation within the appropriate results and discussion sections.

      (3) The authors should consider adding a diagram showing where mir(RHG) and DIAP1 are in the apoptotic/caspase activation pathway (Figure 7N).

      Completed, Figure 7N and 7O.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 2 I -The purported increase in NiA should be quantitated relative to the NiA in G across many discs.

      Completed (Figure 2L)

      (2) Figure 2 M - contrary to the conclusion drawn, the posterior Dcp1 does not appear different from that in the control (K). This conclusion that the NiA does not occur in the margin could be better supported with more images/quantification.

      We have exchanged the image for a representative one that more clearly shows the lack of margin NiA and highlighted with an arrowhead (Figure 2K)

      (3) Figure 2 supp 1 E - the "slight increase" in NiA in the pouch is relative to which control? Can this conclusion be supported by quantification?

      Figure 2L now quantifies this change.

      (4) Figure 2 Supp 1 D, E - these discs supposedly have Zfh2 RNAi expressed, but there appears to be no reduction in Zfh2.

      We were unable to demonstrate a reduction of Zfh2, even with a longer knockdown. Considering these new data, we have altered our conclusions from the Zfh2 experiments.

      (5) Figure 2 Supp 1 I - please quantitate the Dcp-1 across many discs to support the conclusion.

      This is the UAS-wg experiment, which we decided to remove from the quantification given the non-specific increase in cDcp-1 throughout the disc (likely as a result of ectopic Wg expression).

      (6) Figure 4 legend M - The authors conclude that the experiment indicates that "NiA promote proliferation independent of AiP". It would be more precise to say that NiA cells do not secrete AiP mitogens and do not increase the proliferation of surrounding cells when prevented from completing apoptosis. To say that the NiA-induced proliferation does not require AiP would require eliminating AiP, perhaps through reaper hid grim knockdown or mitogen knockdown.

      Corrected.

      Minor concerns and clarification needed:

      (7) Line 61 - consider the distinction between a feed-forward loop and a positive feedback loop.

      Corrected.

      (8) Line 338 - it would be helpful to have a brief explanation of what the GC3Ai consists of and how it reports caspase activity.

      Corrected.

      (9) Line 343 - the authors should clarify what they mean when they state GC3Ai-positive cells are "associated with" mitotic cells. Are the GC3Ai cells undergoing mitosis? Or is the increase in mitosis non-autonomous?

      Adjusted. “associated with adjacent proliferative cells”.

      (10) Lines 392-394 - the authors should add brief descriptions of how the Drice-Based sensor and the CasExpress function, so the readers can better understand the distinctions between these sensors and the previously mentioned sensors (anti-Dcp1 and GC3Ai). In addition, please clarify how the Gal80ts modulates the sensitivity of the CasExpress.

      Descriptions of DBS and CasExpress and additional clarification provided.

      (11) Line 413: How does Gal80ts suppress the background developmental caspase signal, and how does this suppression lead to NiCP cells expressing GFP?

      This section has been reworded to clarify.

      (12) Line 417 - which GFP label is referred to here?

      This section has been reworded to clarify.

      (13) Line 445 is the first mention of the CARD domain - it could be introduced more fully and explained why the DroncDN's lack of effect on proliferation excludes the CARD domain as being important.

      Clarified. See also the discussion for the significance of the CARD domain as dispensable for regenerative proliferation following necrosis.

      (14) Line 452 - "As mentioned" - the manuscript has not previously mentioned DIAP1 modification of the CARD domain and what that modification does. Perhaps the previous explanatory text was inadvertently removed?

      Corrected.

      (15) The Discussion is a lengthy list of experiments that the authors did not do or observations they were unable to make. This section could benefit from a more in-depth discussion of necrosis and the possibility that NiCP cells contribute to repair after injury across contexts and species.

      We have made several changes to the discussion that elaborate on some of the points listed in the public reviews.

      (16) All figures: Consider making single-channel panels grayscale to aid visualization. Also consider using color combinations that can be distinguished by color-blind readers.

      We appreciate these suggestions and will consider them for future manuscripts.

      (17) All figure legends - are error bars SD or SEM?

      Standard deviation. Added to appropriate legends.

      (18) Figure 1A,C - it would be helpful in the diagrams to note when the necrosis occurs/completes.

      The endpoint of necrosis is not well defined, given the simultaneous changes that occur with regeneration. Thus, we opted to not include an indicator of when necrotic ablation ends.

      (19) Figure 1B - it would be helpful to name the GAL4 drivers whose expression domain is depicted to correlate with the terms used in the text.

      Completed.

      (20) Figure 1 legend- what do the different colors of the arrowheads denote? The dotted lines are in R' and S', not N' and O'.

      Completed.

      (21) Figure 2G - the yellow dashed line is not in the same place in the two images.

      Corrected.

      (22) Figure 2I - what is the open arrowhead?

      Completed (Figure 2I legend).

      (23) Figure 3 legend - please describe what the time course is observing (EdU).

      Completed.

      (24) Figure 4 - please include the yellow boxes in the Dcp-1 channels.

      Completed.

      (25) Figure 5 F' - add the arrowheads to all the panels. The yellow arrowhead appears to be pointing to nothing.

      Completed.

      (27) Figure 5 legend - what is a "cytoplasmic undisturbed cell"? What is the arrowhead in G? J and J' should show the same view at different time points or different views at the same time point.

      Figure legend has been corrected.

      (28) Figure 5 Supp 1 would be especially helped by having more single-channel panels in grayscale.

      For clarity and consistency, we chose to maintain the different color channels.

      (29) Figure 5 Supp 1 D and E - It would be helpful to have higher magnification and arrows pointing to the cells of interest. Why are there TUNEL+ cells that do not have caspase activation (green)?

      We have added arrowheads as suggested. We believe the disparity between the TUNEL and GC3Ai signals is a result of the different sensitivities of the IF staining and the TUNEL assay.

      (30) Figure 5 Supp 1 F - perhaps the arrowheads should be in all panels - they point to empty spaces with no H2Av staining in the final panel. Perhaps a higher magnification image would make the "strong overlap" of the two signals more apparent?

      We have added arrowheads where appropriate.

      (31) Figure 6 D-E - does the widespread GFP lineage tracing signal suggest that most cells in the repaired tissue originated from cells that once had caspases activity?

      Possibly; however, given that CasExpress leads to significant developmental labeling, we were unable to determine to what extent the signal in this experiment comes from NiA/NiCP activity versus developmental labeling. Note that tubGAL80ts is not present in this experiment.

      (32) Writing corrections:

      Line 343 "positive" is misspelled.

      Completed

      Line 429 - a word may be missing.

      Completed

      Line 639 - the word "day" may be missing.

      Completed

      Line 658 - what temperature was the recovery?

      Completed

      Lines 706-708 - were the discs incubated in 55 mL and 65 mL of liquid, or a smaller volume?

      Completed

    1. Author response:

      Reviewer #1:

      Overall I find the evidence very well presented and the study compelling. It offers an important new perspective on the key properties of neoblasts. I do have some comments to clarify the presentation and significance of the work.

      We thank the reviewer for the positive feedback and plan to improve the presentation of the work.

      Reviewer #2:

      However, the absence of a cell-cell feedback mechanism during colony growth and the likelihood of the difference needs to be clarified. Is there any difference in interpreting the results if this mechanism is considered?

      We will improve the description of the model assumptions and the interpretation of the data on the basis of these assumptions.

      Although hnf-4 and foxF have been silenced together to validate the model, a deeper understanding of the tgs-1+ cell type and the non-significant reduction of tgs-1+ neoblasts in zfp-1 RNAi colonies is necessary, considering a high neural lineage frequency.

      We will improve the analysis of this result in light of the experimentally determined frequency of the tgs-1+ neoblast population.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Cheng et al explore the utility of analyte ratios instead of relative abundance alone for biological interpretation of tissue in a MALDI MSI workflow. Utilizing the ratio of metabolites and lipids that have complementary value in metabolic pathways, they show the ratio as a heat map which enhances the understanding of how multiple analytes relate to each other spatially. Normally, this is done by projecting each analyte as a unique color but using a ratio can help clarify visualization and add to biological interpretability. However, existing tools to perform this task are available in open-source repositories, and fundamental limitations inherent to MALDI MSI need to be made clear to the reader. The study lacks rigor and controls, i.e. without quantitative data from a variety of standards (internal isotopic or tissue mimetic models for example), the potential delta in ionization efficiencies of different species subtracts from the utility of pathway analysis using metabolite ratios.

      We thank the reviewer for the comments on the availability of four other commercial and open-source tools for performing ratio imaging: ENVI® Geospatial Analysis Software, MATLAB image processing toolbox, Spectral Python (SPy) and QGIS. We now highlight these in the introduction (page 3, lines 80-86). However, in contrast to these targeted ratio imaging methods, our approach uniquely enables the untargeted discovery of correlated (or anti-correlated) ratios of molecular features, whether the species are structurally known or unknown.

      ENVI® Geospatial Analysis Software and the MATLAB image processing toolbox for hyperspectral imaging are both paid programs, which limits free access and software evaluation for the potential application of untargeted ratiometric imaging. We were able to evaluate MATLAB RatioImage because Weill Cornell Medicine has an institutional subscription to MathWorks MATLAB. Notably, MATLAB RatioImage computes and displays an individual intensity-modulated ratiometric image from a chosen numerator and denominator image. This software tool only images the ratios of selected metabolites from an input list of multiple species and does not allow untargeted ratiometric imaging of all metabolite pairs.

      While Spectral Python (SPy) and QGIS are both freely-available software packages, and both can perform individual metabolite ratio images, neither allows for untargeted ratiometric imaging of all pairs from a multiple metabolite input list. Table S1 (below) provides a comparison of the ratio imaging tool that we offer in comparison with other previously available tools.

      We appreciate the reviewer’s insightful comments on differential ionization efficiency among metabolites and the importance of using stable isotope internal standards to achieve absolute quantification.

      A fundamental advantage of our ratiometric imaging tool is that it provides better image contrast for tissue regions with differential ionization efficiency, with the potential to discover new “metabolic” regions that are revealed by metabolite ratios. Note that comparison of ratio image abundance is limited to equivalent regions across tissue groups, which are expected to have similar ionization efficiency for given metabolites. Furthermore, the power of our strategy is to provide untargeted (and targeted) ratio imaging as a hypothesis generation tool, and this use does not require absolute quantification. If cost were not an issue, an extensive group of stable isotope standards could theoretically be used for absolute quantification of target metabolites with known identity.

      Using the tissue mimetic model, we generated calibration curves for stable isotope standards spiked into carboxymethylcellulose (CMC)-embedded brain homogenate cryosections and quantified brain glucose, lactate and ascorbate concentrations. Ratio images obtained from abundance data were similar to those obtained from quantified concentration data (Fig S3). While stable isotope standards are often used to obtain quantitative concentrations of metabolites/lipids of interest, they are not applicable to untargeted metabolite ratios that include an assessment of structurally undefined species. Nevertheless, our data indicate that absolute quantification is not necessary for the targeted and untargeted ratio imaging described here (Page 6, lines 196-205).
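
      The calibration step described above can be illustrated with a minimal sketch. It assumes a simple linear response between spiked standard concentration and measured intensity, which the response does not state; the function names and all numbers are hypothetical and are not taken from the authors' code or data.

```python
def fit_line(concentrations, intensities):
    """Ordinary least-squares fit of intensity = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(intensities) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, intensities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_intensity(intensity, slope, intercept):
    """Invert the calibration line to estimate a concentration from an intensity."""
    return (intensity - intercept) / slope


# Hypothetical spiked standard levels and measured intensities (illustrative only).
spiked = [0.0, 1.0, 2.0, 4.0]
measured = [5.0, 105.0, 205.0, 405.0]

slope, intercept = fit_line(spiked, measured)
estimate = concentration_from_intensity(255.0, slope, intercept)
print(slope, intercept, estimate)
```

      Applying the same inversion pixel by pixel would turn an abundance image into a concentration image, which is the kind of comparison the response reports in Fig S3.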

      Reviewer #2 (Public Review):

      Summary:

      In the article, "Untargeted Pixel-by-Pixel Imaging of Metabolite Ratio Pairs as a Novel Tool for Biomedical Discovery in Mass Spectrometry Imaging" the authors describe their software package in R for visualizing metabolite ratio pairs. I think the novelty of this manuscript is overstated and there are several notable issues with the figures that prevent detailed assessment but the work would be of interest to the mass spectrometry community.

      Strengths:

      The authors describe a software that would be of use to those performing MALDI MSI. This software would certainly add to the understanding of metabolomics data and enhance the identification of critical metabolites.

      Weaknesses:

      The authors are missing several references and discussion points, particularly about SIMS MSI, where ratio imaging has been previously performed.

      There are several misleading sentences about the novelty of the approach and the limitations of metabolite imaging.

      Several sentences lack rigor and are not quantitative enough.

      The figures are difficult to interpret/ analyze in their current state and lack some critical components, including labels and scale bars.

      We thank the reviewer for the very helpful comments. The tone of the manuscript has been adjusted to highlight that the real novelty of this method lies in the ease of computing and its application to MS-specific projects (abstract, lines 26-30). All figures have been updated to include labels and scale bars, with improved resolution. References on the use of ratio imaging in SIMS MSI have been added to the introduction (Page 3, lines 80-89).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major Comments:

      In the Abstract it is stated that: "the research community lacks a discovery tool that images all metabolite abundance ratio pairs." However, the following tools exist that perform this fundamental task.

      A "pixel by pixel" data frame in .csv form has a very similar data structure to the output of many instruments, such as satellite imaging or other hyperspectral tools. It is true this does not exist in the MALDI-specific context, but it would not be difficult to perform this task with the following programs. Highlight that the novelty here is not ratios but the ease of computing them and the application in the specific project. Also, describe the available tools and the shortcomings of the others that this package addresses. A supplemental table of MSI data analysis tools and the function of each would be a good addition.

      List of tools to perform band ratio computation with minimal modification:

      (1) ENVI IDL: geospatial imaging tool that allows ratio computation between spectral bands.

      (2) MATLAB image processing toolbox for hyperspectral imaging.

      (3) Spectral Python package (SPy).

      (4) QGIS with plugins can be used for hyperspectral image analysis with a ratio between bands.

      We revised the abstract and introduction to include novelty and comparison to other existing methods listed in Table S1.

      "untargeted R package workflow" - If there are functions used outside the SCiLS Lab API client then write it up and include a GitHub link for open access to fit the mission of eLife.

      As shown in Scheme I, we developed two types of code for untargeted ratio imaging. The first uses the SCiLS Lab API client to extend the functions of targeted and untargeted ratio imaging and all related spatial image analysis; it is suitable for SCiLS Lab users. The second does not require the SCiLS Lab API: it extracts pixel data from an imzML file and then proceeds with targeted and untargeted imaging and analysis. Both are now deposited on GitHub with public access (https://github.com/qic2005/Untargeted-massspectrometry-ratio-imaging.git).

      "across cells and tissue subregions" The value in reporting cell type and tissue type-specific differences in any metric is powerful, but not done in this paper. Only whole samples are compared such as "KO vs WT" and the annotations in Figure 3 are not leveraged for increased biological relevance. This paper treats each image as a homogenization experiment in a practical sense beyond just visually inspecting each image. Remove this claim or do the calculations on region/tissue/cell-type specific differences with the appropriate tools to show the data beyond simple heat map images.

      We have deleted the sentence containing "across cells and tissue subregions" from the abstract.

      "enhances spatial image resolution" Clarify. The resolution in MALDI is set by the raster size of the pixels which is an instrument parameter and cannot be changed post-acquisition. Image-specific methods to increase resolution exist, but dividing the value in one peak column by another does not change functional resolution in the context of the instruments here.

      We thank the reviewer for pointing out this typo. We have changed it to "enhances spatial image contrast" in the abstract (line 34).

      "pixel-by-pixel imaging of the ratio of an enzyme's substrate to its derived product offers an opportunity to view the distribution of functional activity for a given metabolic pathway across tissue" - Appropriately calibrate the impact of this work and correct this statement to better reflect the capabilities of this approach. Do not oversell the exploration of pathway activity since the raw quantity reported as relative abundance does not provide biologically interpretable pathway information. This is due to unaccounted differences in ionization efficiencies between analytes in a pathway and lack of determination of rate. Without a calibration curve and more techniques on the analytical chemistry side of the project, it is possible a relative abundance of one analyte (like the product of a pathway) could be higher than the relative abundance of another analyte (a precursor), but due to structural differences, the actual quantity of the higher relative abundance species could be significantly different or even lower than its counterpart. Secondly, "functional activity" cannot be assessed in this manner without isotopic labeling or additional techniques. This does not subtract from the overall validity and impact of the work, but highlighting these shortcomings and slight alterations to the claim are important for a multidisciplinary audience.

      Although we show that the abundance ratio yields an image similar to that of the concentration ratio for brain metabolites such as lactate, glucose and ascorbate, we agree with the reviewer that the abundance ratio differs in numerical value from the absolute concentration ratio because of differences in ionization efficiency. We have deleted the sentence “pixel-by-pixel imaging of the ratio of an enzyme's substrate to its derived product offers an opportunity to view the distribution of functional activity for a given metabolic pathway across tissue" from the abstract. We apologize for not describing this application more clearly. We meant to compare pathway activity among equivalent and similar pixels/regions of tissues from different biological groups, on the assumption that ionization efficiency is identical for equivalent pixels from different tissue sections (i.e., same cell type and microenvironment), especially for metabolites with similar functional structure in the same pathway. For example, fatty acids with different chain lengths and phospholipids with the same head group are expected to have similar ionization efficiency in the same tissue pixel/region. We have therefore rewritten this section (Page 7, lines 239-247).
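
      The argument above can be written out under a simplifying assumption that the response does not state explicitly: that the measured abundance I of a metabolite in a pixel is proportional to its concentration c, with a factor ε (the ionization efficiency) that depends on the metabolite and the local tissue environment. The symbols are introduced here for illustration only.

```latex
% Abundance ratio of metabolites A and B within one pixel
I_{A} = \varepsilon_{A}\, c_{A}, \qquad I_{B} = \varepsilon_{B}\, c_{B}
\quad\Longrightarrow\quad
R = \frac{I_{A}}{I_{B}} = \frac{\varepsilon_{A}}{\varepsilon_{B}} \cdot \frac{c_{A}}{c_{B}}

% Comparing equivalent regions of two groups (e.g. KO vs WT):
% if \varepsilon_{A}/\varepsilon_{B} is the same in both regions, it cancels
\frac{R^{\mathrm{KO}}}{R^{\mathrm{WT}}}
= \frac{c_{A}^{\mathrm{KO}} / c_{B}^{\mathrm{KO}}}{c_{A}^{\mathrm{WT}} / c_{B}^{\mathrm{WT}}}
```

      Under this assumption, a single abundance ratio R is not a concentration ratio, which is the reviewer's point, but the relative change in R between equivalent regions of two groups does reflect the change in the concentration ratio.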

      "We further show that ratio imaging minimizes systematic variations in MSI data by sample handling and instrument drift, improves image resolution, enables anatomical mapping of metabotype heterogeneity, facilitates biomarker discovery, and reveals new spatially resolved tissue regions of interest (ROIs) that are metabolically distinct but otherwise unrecognized."

      Instrument drift is not accounted for by ratios as it impacts the process before ratio computation. "metabotype" - spelling?

      Instrument drift here refers to changes in individual ion abundance during long data acquisition. A ratio may offer a better read-out than individual metabolite abundance alone. However, once the acquired data have undergone total ion normalization, ratio data would not differ from non-ratio data. Therefore, we have deleted "instrument drift" from the sentence (Page 2, line 33, and Page 3, line 99).

      Metabotype is a term widely used in the metabolomics field. Metabotypes are defined by similar metabolic profiles, which are based on combinations of specific metabolites. https://nutritionandmetabolism.biomedcentral.com/articles/10.1186/s12986-020-00499-z

      Results 3: Justify the claim that the ratio reduces artifacts. A ratio is the value from one m/z area over another and would seem that the quality of the ratio would be always lower than the individually higher quality pixel signal of the two analytes that compose a ratio.

      Ratio images are indeed heatmaps of pixel-by-pixel ratio data, set by the scale of all ratio values. For very abundant ion pairs, the individual images may not be better than the ratio image, depending on how abundance changes among pixels within tissue sections. Similarly, the quality of a ratio image may not be higher than that of the individual images if the distribution of ratios does not change much among pixels in tissue sections. For example, the metabolites and lipids in Figures 2 and 5 are abundant, but the non-ratio images do not have better quality than the ratio images. Furthermore, a ratio image provides additional information on how the ratio of a metabolite pair changes pixel by pixel across all tissue sections; such additional information could be useful for data interpretation.

      Results 4: The metabolite pairs are biologically sensible but should be clearly stated that they do not account for differences in ionization efficiency between metabolites and cannot provide quantitative pathway analysis with a high degree of biological confidence.

      We apologize for not describing this application more clearly. We meant to compare pathway activity among equivalent and similar pixels/regions of tissues from different biological groups, on the assumption that ionization efficiency is identical for equivalent pixels from different tissue sections (i.e., same cell type and microenvironment), especially for metabolites with similar functional structure in the same pathway. For example, fatty acids with different chain lengths and phospholipids with the same head group are expected to have similar ionization efficiency in the same tissue pixel/region. We have therefore rewritten this section (Page 7, lines 239-247 and 254-255).

      Results 4: "cell-type specific metabolic activity at cellular (10 µm) spatial resolution" Prove the cell type differences with IHC coregistration or MALDI IHC if you want to make claims about them. Just visually determining a tissue type of a scan of a slide is inadequate to support this claim.

      We agree with the reviewer's comments. We meant to provide additional information on cellular-level metabolic activity, such as adenosine nucleotide phosphorylation status (ATP/AMP ratio), at 10 µm resolution. Hippocampal neurons provide a good example of this utility. We have rewritten the claim to highlight the role of ratio imaging in providing additional metabolic information (Page 8, line 288-290).

      Minor Comments:

      Table 2 "Aspartiate" spelling

      We have corrected it.

      Describe the process and mathematical background for ratio computation in the Methods section. As this paper introduces a package, describing its underlying functions has value.

      We have added R-script comments to illustrate the untargeted ratio calculation, which uses R's combination and division functions to compute the ratio between every pair of metabolites in a data matrix (Page 4, line 139-141).
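
      The "combination and division" step described above can be sketched as follows. This is an illustrative Python version, not the authors' R script; the function name `pairwise_ratios`, the toy matrix, and the feature labels are assumptions made for the example, and it presumes missing values have already been replaced by a positive number so that no division by zero occurs.

```python
from itertools import combinations


def pairwise_ratios(matrix, feature_names):
    """Return {(a, b): [ratio per pixel]} for every unordered feature pair.

    matrix: list of rows, one row per pixel, one column per m/z feature.
    Assumes every denominator is positive (missing values already filled).
    """
    ratios = {}
    # combinations() enumerates each pair of column indices exactly once
    for i, j in combinations(range(len(feature_names)), 2):
        key = (feature_names[i], feature_names[j])
        ratios[key] = [row[i] / row[j] for row in matrix]
    return ratios


# Toy data: 3 pixels x 3 hypothetical m/z features
toy = [
    [2.0, 4.0, 1.0],
    [6.0, 3.0, 2.0],
    [1.0, 1.0, 4.0],
]
names = ["mz_A", "mz_B", "mz_C"]
ratios = pairwise_ratios(toy, names)
```

      For n features this yields n(n-1)/2 ratio vectors, each of which could be rendered as one ratio image.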

      "we annotate missing values with 1/5 the minimum value quantified in all pixels in which it was detected" This is explicit (ie only values with exactly 1/5 the value are annotated" - make it clear this is a threshold.

      We apologize for the misunderstanding. Missing values either have no value or have an abundance of exactly zero. We first calculate the minimum abundance of a particular m/z among all pixels with detectable abundance (i.e. excluding missing values), then use 1/5 of this minimum value as a threshold to annotate missing values (Page 4, 133-139).
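
      As a minimal sketch of this rule for a single m/z feature, assuming missing entries are encoded as `None` or `0` (the function name `impute_missing` and the example values are hypothetical, and the authors' R implementation may differ in detail):

```python
def impute_missing(values, divisor=5):
    """Fill missing entries with (smallest detected value) / divisor.

    values: per-pixel abundances for one m/z feature; None or 0 = missing.
    """
    detected = [v for v in values if v is not None and v > 0]
    if not detected:
        # Feature never detected: nothing to base a fill value on
        return list(values)
    fill = min(detected) / divisor
    return [v if (v is not None and v > 0) else fill for v in values]


pixels = [10.0, 0.0, None, 5.0, 20.0]
imputed = impute_missing(pixels)
```

      Here the smallest detected abundance is 5.0, so both missing pixels receive 1.0, which keeps every later ratio denominator positive.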

      Figure 1: legend scils is branded SCiLS and EXCEL does not need caps lock (Excel).

      Figure 1 legend has been corrected.

      Conflicts of interest "None" - there are Bruker employees on a paper about MALDI method development in a field they dominate.

      We added Joshua Fischer as a Bruker employee.

      Figure 3: The legend does not describe the purple arrow in J.

      Purple arrow description is added to figure legend.

      Figure 5: Fix orientation inconsistencies in G, H, I, and J. Especially in J - they are opposite directions. This is arbitrary and determined in SCiLS lab with simple rotation.

      Orientation has been made consistent in G, H, I, and J.

      Figure S8: Provide exact number of biological and technical replicates used to generate this figure.

      Figure S8, now Figure S9, was generated from 4 biological replicates of KO and 4 biological replicates of WT brain section in the ROI7 region. This information has been added to the figure legend.

      Figure S9: Make consistent orientation of all brains

      We have made brain orientations consistent.

      In addition to ionization efficiencies impacting the value of the numeric relative abundance where ratio computation originates from, it should be mentioned how different classes of metabolites are differentially impacted by the euthanasia and collection methods used for various tissue types. For example, it is well established the ATP/AMP ratio can change drastically from tissue collection.

      We have added this to page 8, line 315-319.

      Perform standards to adjust for ionization efficiency between different m/z features.

      Untargeted ratio imaging serves as an add-on MSI data analysis tool whose primary use is comparing ratios among equivalent regions/pixels with similar ionization efficiencies. It is a hypothesis-generation tool. Using standards to adjust for ionization efficiency would give a more accurate assessment of ratio values. Because of the cost and limited availability of stable isotope standards for different m/z features, we chose glucose, lactate and ascorbate as example brain metabolites to show that abundance ratios and concentration ratios produce similar images (page 6, 196-205).

      Add more controls to support the claims.

      We have 4 biological replicates for each genotype of brain. We have added the number of controls in all figure legends.

      Significantly tone down the claims, it is unclear how knowledgeable the authors are about the current literature of SW regarding MALDI.

      The claims have been significantly toned down throughout the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Abstract:

      "relative abundance of structurally identified and yet-undefined metabolites across tissue cryosections" is misleading, since tandem MS can be performed in an imaging context and is often also compatible with the same instrument.

      We have deleted this sentence in the abstract.

      Intro:

      Paragraph 1: The authors mention MALDI and DESI, but I would argue that SIMS is more abundantly used than DESI within single-cell applications.

      We have added SIMS to the introduction (Page 3, line 67).

      Paragraph 2: While it may not be all detected pairs, there are many examples of ratio imaging in the MALDI MSI and SIMS communities, particularly for bacterial signaling. These would be important examples to reference.

      We have added the application of SIMS ratio imaging to the introduction, page 3, line 74-75.

      Materials :

      Paragraph 1: More specificity on sample size is required. 3 or 4 per group is not specific. Which has four and which has three? Why are they different?

      We have corrected sample numbers for specific genotype in the text and figure legends. The number of sections per group is different due to the availability of fresh-frozen tissues (Page 4, line 115-117).

      Results:

      Paragraph 1: Am I correct in reading that an .imzml can't be used directly? Why not?

      Imaging Mass Spectrometry Markup Language (imzML) is a common data format for mass spectrometry imaging. It was developed to allow the flexible and efficient exchange of large MS imaging data between different instruments and data analysis software (Schramm et al, 2012). It comprises two files: the mass spectral data, stored in a binary file (.ibd) for efficient storage, and the XML metadata (.imzML), which stores instrumental parameters and sample details. Therefore, the .imzML file can't be used directly on its own. We have added this to Results 1 (Page 5, line 160-169).

      Paragraph 4: "Additionally, nonlipid small molecule metabolites suffer from smearing and/or diffusion during cryosection processing, including over the course of matrix deposition for MALDI-MSI." This is misleading. There are several examples of MALDI MSI of small metabolites that are nonlipids, where smearing or diffusion have not occurred. It would be beneficial to have a more accurate discussion of this instead. The authors should also provide some evidence of this, since they continue to focus on it for the full paragraph and don't provide references.

      We initially meant that the poor image quality of small-molecule metabolites is due to their interaction with the aqueous phase of the spraying solution, their rapid degradation and matrix interference. We have deleted this sentence in the revised version.

      Section 5 Paragraph 2; "However, ratio imaging revealed a much greater aspartate to glutamate ratio in an unusual "moon arc" region across the amygdala and hypothalamus relative to the rest of the coronal brain." Much greater isn't scientifically accurate or descript. Use real numbers and be quantitative.

      We used pixel data from all 8 sections to obtain quantitative changes in the ratio-generated “moon arc” region compared to the rest of the coronal brain (page 8, line 331-337). Ratio imaging revealed an average 1.59-fold increase in the aspartate to glutamate ratio in an unusual “moon arc” region across the amygdala and hypothalamus (mean abundance 0.563 in 6345 pixels) relative to the rest of the coronal brain (mean abundance 0.353 in 45742 pixels, Figure 5D). Similar but distinct arc-like structures are encompassed within the ventral thalamus and hypothalamus, wherein the glutamate to glutamine ratio shows a 1.63-fold increase in intensity compared to the rest of the brain (mean abundance of 0.695 in 7108 pixels vs 0.428 in 44979 pixels, Figure 5E).
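
      The fold changes quoted above are ratios of the two regional means, which can be re-derived from the figures in the response. The snippet below is only an arithmetic sketch using those rounded means; the helper name `fold_change` is hypothetical, and the assumption that any small difference from the quoted values comes from rounding of the reported means is ours, not the authors'.

```python
def fold_change(mean_region, mean_rest):
    """Fold change of a region's mean ratio over the rest of the section."""
    return mean_region / mean_rest


# Rounded means quoted in the response
asp_glu = fold_change(0.563, 0.353)   # aspartate/glutamate, "moon arc" vs rest
glu_gln = fold_change(0.695, 0.428)   # glutamate/glutamine, arc vs rest

# Both comparisons should partition the same total number of pixels
total_asp_glu = 6345 + 45742
total_glu_gln = 7108 + 44979

print(f"{asp_glu:.3f} {glu_gln:.3f} {total_asp_glu} {total_glu_gln}")
```

      With the rounded inputs the first value is about 1.595 and the second about 1.624, and both pixel totals come to 52087.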

      Section 8 Paragraph 2: "UMAPing" is not scientifically written.

      We have replaced UMAPing with UMAP.

      Figure 2 is difficult to interpret, given the small sizes of the images. Align the images, reduce the white space, clearly label the different tissues, add scale bars, increase size, etc. This applies to all figures, except for 3. This will make it possible to review.

      All figures have been resized by removing extra space between sections.

      Figure 3. There seems to be a change in tissue after section I, so a different diagram would be helpful. SCD has a high abundance in an area that seems to be off of the tissue. Can the authors explain this? Some of the images also appear to be low signal-to-noise. Example spectra in the SI would be helpful, so I can more accurately judge the quality of the data.

      We apologize for the discrepancy. All images are from the same sample. We initially cropped each individual image from a multi-page PDF plot, then inserted it into Figure 3. Inconsistent resizing and cropping may have led to the small differences in image size. In the revised version, we plot all images on one page, which eliminates the inconsistency.

      Figure 3 example pixel data, ratio pixel data, mass spectra and ratio images can be downloaded below:

      https://wcm.box.com/s/2d5jch45ar8upjzytljnylt6doewcsqc

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      In this revised manuscript, the authors aim to elucidate the cytological mechanisms by which conjugated linoleic acids (CLAs) influence intramuscular fat deposition and muscle fiber transformation in pig models. They have utilized single-nucleus RNA sequencing (snRNA-seq) to explore the effects of CLA supplementation on cell populations, muscle fiber types, and adipocyte differentiation pathways in pig skeletal muscles. Notably, the authors have made significant efforts in addressing the previous concerns raised by the reviewers, clarifying key aspects of their methodology and data analysis.

      Strengths:

      (1) Thorough validation of key findings: The authors have addressed the need for further validation by including qPCR, immunofluorescence staining, and western blotting to verify changes in muscle fiber types and adipocyte populations, which strengthens their conclusions.

      (2) Improved figure presentation: The authors have enhanced figure quality, particularly for the Oil Red O and Nile Red staining images, which now better depict the organization of lipid droplets (Figure 7A). Statistical significance markers have also been clarified (Figure 7I and 7K).

      Thanks!

      Weaknesses:

      (1) Cross-species analysis and generalizability of the results: Although the authors could not perform a comparative analysis across species due to data limitations, they acknowledged this gap and focused on analyzing regulatory mechanisms specific to pigs. Their explanation is reasonable given the current availability of snRNA-seq datasets on muscle fat deposition in other species such as human and mouse.

      Thanks for your suggestion!

      (2) Mechanistic depth in JNK signaling pathway: While the inclusion of additional experiments is a positive step, the exploration of the JNK signaling pathway could still benefit from deeper analysis of downstream transcriptional regulators. The current discussion acknowledges this limitation, but future studies should aim to address this gap fully.

      Thanks! As noted in the Discussion, further studies should focus on the downstream transcriptional regulators through which the JNK signaling pathway affects IMF deposition.

      (3) Limited exploration of other muscle groups: The authors did not expand their analysis to additional muscle groups, leaving some uncertainty regarding whether other muscle groups might respond differently to CLA supplementation. Further studies in this direction could enhance the understanding of muscle fiber dynamics across the organism.

      Thanks for your suggestion! In this study, we mainly focused on the adipocytes, muscles and FAPs subpopulations, which play important roles in lipid deposition. As you suggested, our further study will focus on other subpopulations such as endothelial cells and immune cells.

      Reviewer #2 (Public review):

      Summary:

      This study comprehensively presents data from single nuclei sequencing of Heigai pig skeletal muscle in response to conjugated linoleic acid supplementation. The authors identify changes in myofiber type and adipocyte subpopulations induced by linoleic acid at depth previously unobserved. The authors show that linoleic acid supplementation decreased the total myofiber count, specifically reducing type II muscle fiber types (IIB), myotendinous junctions, and neuromuscular junctions, whereas type I muscle fibers are increased. Moreover, the authors identify changes in adipocyte pools, specifically in a population marked by SCD1/DGAT2. To validate the skeletal muscle remodeling in response to linoleic acid supplementation, the authors compare transcriptomics data from Laiwu pigs, a model of high intramuscular fat, to Heigai pigs. The results verify changes in adipocyte subpopulations when pigs have higher intramuscular fat, either genetically or diet-induced. Targeted examination using cell-cell communication network analysis revealed associations with high intramuscular fat with fibro-adipogenic progenitors (FAPs). The authors then conclude that conjugated linoleic acid induces FAPs towards adipogenic commitment. Specifically, they show that linoleic acid stimulates FAPs to become SCD1/DGAT2+ adipocytes via JNK signaling. The authors conclude that their findings demonstrate the effects of conjugated linoleic acid on skeletal muscle fat formation in pigs, which could serve as a model for studying human skeletal muscle diseases.

      Strengths:

      The comprehensive data analysis provides information on conjugated linoleic acid effects on pig skeletal muscle and organ function. The notion that linoleic acid induces skeletal muscle composition and fat accumulation is considered a strength and demonstrates the effect of dietary interactions on organ remodeling. This could have implications for the pig farming industry to promote muscle marbling. Additionally, these data may inform the remodeling of human skeletal muscle under dietary behaviors, such as elimination and supplementation diets and chronic overnutrition of nutrient-poor diets. However, the biggest strength resides in thorough data collection at the single nuclei level, which was extrapolated to other types of Chinese pigs.

      Weaknesses:

      Although the authors compiled a substantial and comprehensive dataset, the scope of cellular and molecular-level validation still needs to be expanded. For instance, the single nuclei data suggest changes in myofiber type after linoleic acid supplementation, but these findings need more thorough validation. Further histological and physiological assessments are necessary to address fiber types and oxidative potential. Similarly, the authors propose that linoleic acid alters adipocyte populations, FAPs, and preadipocytes; however, there are limited cellular and molecular analyses to confirm these findings. The identified JNK signaling pathways require additional follow-ups on the molecular mechanism or transcriptional regulation. However, these issues are discussed as potential areas for future exploration. While various individual studies have been conducted on mouse/human skeletal muscle and adipose tissues, these have only been briefly discussed, and further investigation is warranted. Additionally, the authors incorporate two pig models into their results, but they only examine one muscle group. Exploring whether other muscle groups respond similarly or differently to linoleic acid supplementation would be valuable. Furthermore, the authors should discuss how their results translate to human and pig nutrition, such as the desirability and cost-effectiveness for pig farmers and human diets high in linoleic acid. Notably, while the single nuclei data is comprehensive, there needs to be a statement on data deposition and code availability, allowing others access to these datasets.

      Thanks for your suggestion!

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors have discussed and provided some experimental evidence to address the related issues to help justify their conclusions. The reviewer believes that authors should deposit their single-cell sequencing data and code for the broader research community.

      Thank you! We have uploaded our raw dataset to the Genome Sequence Archive (Genomics, Proteomics & Bioinformatics 2021) in the National Genomics Data Center (Nucleic Acids Res 2022), China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences, and the data availability section has been updated (line 575-579).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates the role of macrophage lipid metabolism in the intracellular growth of Mycobacterium tuberculosis. By using a CRISPR-Cas9 gene-editing approach, the authors knocked out key genes involved in fatty acid import, lipid droplet formation, and fatty acid oxidation in macrophages. Their results show that disrupting various stages of fatty acid metabolism significantly impairs the ability of Mtb to replicate inside macrophages. The mechanisms of growth restriction included increased glycolysis, oxidative stress, pro-inflammatory cytokine production, enhanced autophagy, and nutrient limitation. The study demonstrates that targeting fatty acid homeostasis at different stages of the lipid metabolic process could offer new strategies for host-directed therapies against tuberculosis.

      The work is convincing and methodologically strong, combining genetic, metabolic, and transcriptomic analyses to provide deep insights into how host lipid metabolism affects bacterial survival.

      Strengths:

      The study uses a multifaceted approach, including CRISPR-Cas9 gene knockouts, metabolic assays, and dual RNA sequencing, to assess how various stages of macrophage lipid metabolism affect Mtb growth. The use of CRISPR-Cas9 to selectively knock out key genes involved in fatty acid metabolism enables precise investigation of how each step (lipid import, lipid droplet formation, and fatty acid oxidation) affects Mtb survival. The study offers mechanistic insights into how different impairments in lipid metabolism lead to diverse antimicrobial responses, including glycolysis, oxidative stress, and autophagy. This deepens the understanding of macrophage function in immune defense.

      The use of functional assays to validate findings (e.g., metabolic flux analyses, lipid droplet formation assays, and rescue experiments with fatty acid supplementation) strengthens the reliability and applicability of the results.

      By highlighting potential targets for HDT that exploit macrophage lipid metabolism to restrict Mtb growth, the work has significant implications for developing new tuberculosis treatments.

      Weaknesses:

      The experiments were primarily conducted in vitro using CRISPR-modified macrophages. While these provide valuable insights, they may not fully replicate the complexity of the in vivo environment where multiple cell types and factors influence Mtb infection and immune responses.

      We thank the reviewer for pointing this out. We acknowledge that our in vitro system may not fully replicate the complex in vivo environment, given what is coming to light about heterogeneous macrophage responses to Mtb infection in whole-animal models. We do believe, however, that the Hoxb8 in vitro model provides a powerful genetic tool to interrogate host-Mtb interactions using primary macrophages that represent the bone marrow-derived macrophage lineage.

      Reviewer #2 (Public review):

      Summary:

      Host-derived lipids are an important factor during Mtb infection. In this study, using CRISPR knockouts of genes involved in fatty acid uptake and metabolism, the authors claim that a compromised uptake, storage, or metabolism of fatty acid restricts Mtb growth upon infection. Further, the authors claim that the mechanism involves increased glycolysis, autophagy, oxidative stress, pro-inflammatory cytokines, and nutrient limitation. The authors also claim that impaired lipid droplet formation restricts Mtb growth. However, promoting lipid droplet biogenesis does not reverse/promote Mtb growth.

      Strengths:

      The strength of the study is the use of clean HOXB8-derived primary mouse macrophage lines for generating CRISPR knockouts.

      Weaknesses:

      There are many weaknesses of this study, they are clubbed into four categories below

      (1) Evidence and interpretations: The results shown in this study at several places do not support the interpretations made or are internally contradictory or inconsistent. There are several important observations, but none were taken forward for in-depth analysis.

      a) The phenotypes of PLIN2<sup>-/-</sup>, FATP1<sup>-/-</sup>, and CPT-/- are comparable in terms of bacterial growth restriction; however, their phenotype in terms of lipid body formation, IL1B expression, etc., are not consistent. These are interesting observations and suggest additional mechanisms specific to specific target genes; however, clubbing them all as altered fatty acid uptake or catabolism-dependent phenotypes takes away this important point.

      We thank the reviewer for highlighting this. Our focus was on assessing the impact of manipulating lipid homeostasis in macrophages at several stages and the consequences this has on the intracellular growth of Mtb. Throughout the manuscript (abstract, results and discussion), we have continuously emphasized that interfering with lipid handling at several stages in macrophages results in both conserved and divergent antimicrobial responses against intracellular Mtb.

      b) Finding the FATP1 transcript in the HOXB8-derived FATP1<sup>-/-</sup> CRISPR KO line is a bit confusing. There is less than a two-fold decrease in relative transcript abundance in the KO line compared to the WT line, leaving concerns regarding the robustness of other experiments as well using FATP1<sup>-/-</sup> cells.

      CRISPR-Cas9 targeting of genes with single sgRNAs, as is the case with our mutants, generates insertions and deletions (INDELs) at the CRISPR cut site. These INDELs do not completely block mRNA transcription, as is widely reported in the field. Because of this, quantitative RT-PCR or RNA-seq methods are not routinely used to verify CRISPR knockouts, as they are not sensitive enough to identify INDELs. We provide INDEL quantification and knockout efficiencies by ICE analysis in supplemental file 1 for all the mutants used in the study. We also demonstrate protein depletion by western blot and flow cytometry for all the mutants (Figure 1 - figure supplement 1). Only mutants with greater than 90% protein depletion were used for subsequent characterization.

      c) No gene showing differential regulation in FATP<sup>-/-</sup> macrophages, which is very surprising.

      We assume the reviewer is referring to the Mtb transcriptome response in FATP1<sup>-/-</sup> macrophages, which we agree was unexpected.  However, we saw a significant compensatory response in the host cell (at transcriptional level) in FATP1<sup>-/-</sup> macrophages as evidenced by an upregulation of other fatty acid transporters (Figure 5 - figure supplement 1, now Figure 6 - figure supplement 1). We believe that these compensatory responses could, in part, alleviate the stresses the bacteria experience within the cell. We discuss this point in the manuscript.

      d) ROS measurements should be done using flow cytometry and not by microscopy to nail the actual pattern.

      We thank the reviewer for the suggestion. However, confocal imaging is also widely used to measure ROS with similar quantitative power and individual cell resolution (PMID: 32636249, 35737799).

      (2) Experimental design: For a few assays, the experimental design is inappropriate

      a) For autophagy flux assay, immunoblot of LC3II alone is not sufficient to make any interpretation regarding the state of autophagy. This assay must be done with BafA1 or CQ controls to assess the true state of autophagy.

      We would like to point out that monitoring LC3I to LC3II conversion by western blot, confocal imaging of LC3 puncta and qPCR analysis of autophagy-related genes are all validated assays for monitoring autophagic flux in a wide variety of cells. We refer the reviewer to the latest extensive guidelines on the subject (PMID: 33634751). Furthermore, Bafilomycin A and chloroquine are not specific inhibitors of autophagy and are therefore of limited value as controls. BafA is an inhibitor of the proton-ATPase apparatus and can indirectly impact autophagy through activity on the Ca-P60A/SERCA pathway. Chloroquine impacts vacuole acidification and autophagosome/lysosome fusion, and slows phagosome maturation. So, while BafA and chloroquine will reduce autophagy, their effects are pleiotropic and their impact on Mtb is unknown.

      b) Similarly, qPCR analyses of autophagy-related gene expression do not reflect anything on the state of autophagy flux.

      See our response above.

      (3) Using correlative observations as evidence:

      a) Observations based on RNAseq analyses are presented as functional readouts, which is incorrect.

      We are not entirely sure where we used our RNA-seq data sets as functional readouts. We used our transcriptome data to provide a preliminary identification of anti-microbial responses in the mutant macrophages infected with Mtb, and we mention this at the beginning of the RNA-seq results sections. Where applicable, we followed up and confirmed the more compelling RNA-seq data by metabolic flux analyses, qPCR, ROS measurements, or quantitative imaging.

      b) Claiming that the inability to generate lipid droplets in PLIN2<sup>-/-</sup> cells led to the upregulation of several pathways in the cells is purely correlative, and the causal relationship does not exist in the data presented.

      It was not our intention to infer causality. We have re-written the beginning of the sentence, and it now starts with “Meanwhile, Mtb infection of PLIN2<sup>-/-</sup> macrophages led to upregulation”, which hopefully removes any implication of causality.

      (4) Novelty: A few main observations described in this study were previously reported. That includes Mtb growth restriction in PLIN2 and FATP1 deficient cells. Similarly, the impact of Metformin and TMZ on intracellular Mtb growth is well-reported. While that validates these observations in this study, it takes away any novelty from the study.

      To the best of our knowledge, Mtb growth restriction in PLIN2- and FATP1-deficient macrophages has not been reported elsewhere. On the contrary, PLIN2 knockout macrophages obtained from PLIN2-deficient mice have been reported to robustly support Mtb replication (PMID: 29370315). We extensively discuss these discrepancies in the manuscript. We also discuss and cite appropriate references where Mtb growth restriction has been reported for similar macrophage mutants (CD36<sup>-/-</sup> and CPT2<sup>-/-</sup>). Our aim was to carry out a systematic, myeloid-specific genetic interference of fatty acid import, storage and catabolism to assess the effect on Mtb growth at all stages of lipid handling, instead of focusing on one target. In the chemical approach, we used TMZ and Metformin deliberately because they had already been reported as being active against intracellular Mtb and we wished to place our data in the context of existing literature. These studies have been referenced extensively in the text.

      (5) Manuscript organisation: It will be very helpful to rearrange figures and supplementary figures.

      New figures have been added, and existing ones have been re-arranged where necessary. See our responses to recommendations for authors.

      Reviewer #3 (Public review):

      Summary:

      This study provides significant insights into how host metabolism, specifically lipids, influences the pathogenesis of Mycobacterium tuberculosis (Mtb). It builds on existing knowledge about Mtb's reliance on host lipids and emphasizes the potential of targeting fatty acid metabolism for therapeutic intervention.

      Strengths:

      To generate the data, the authors use CRISPR technology to precisely disrupt the genes involved in lipid import (CD36, FATP1), lipid droplet formation (PLIN2), and fatty acid oxidation (CPT1A, CPT2) in mouse primary macrophages. The Mtb Erdman strain is used to infect the macrophage mutants. The study reveals specific roles of different lipid-related genes. Importantly, the results challenge previous assumptions about lipid droplet formation and show that macrophage responses to lipid metabolism impairments are complex and multifaceted. The experiments are well-controlled and the data are convincing.

      Overall, this well-written paper makes a meaningful contribution to the field of tuberculosis research, particularly in the context of host-directed therapies (HDTs). It suggests that manipulating macrophage metabolism could be an effective strategy to limit Mtb growth.

      Weaknesses:

      None noted. The manuscript provides important new knowledge that will lead to novel host-directed therapies to control Mtb infections.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study presents compelling and well-supported conclusions based on a solid body of evidence. However, the clarity of several figures could be improved for better understanding.

      (1) In Figure 1, panels B and C are referenced incorrectly in the text.

      We thank the reviewer for identifying the error. This has now been corrected.

      (2) Figures 2 and S2 would benefit from being combined or reorganized to display the data related to infected and uninfected cells together, making it easier for the reader to interpret.

      We thank the reviewer for the suggestion. However, we believe that combining the two figures would further complicate the merged figure making it even more difficult to interpret. We decided to highlight the mutant macrophage’s responses upon Mtb infection in Figure 2 and put the uninfected data sets in supplementary information given that the OCR and ECAR trends were similar and as expected in both infected and uninfected states.

      (3) Figure 3 is mislabeled, with four panels shown in the figure, but only panels A and B are mentioned in both the text and the figure legend.

      We thank the reviewer for the observation. Figure 3 has been extensively revised. We have included new blots, statistical comparisons and a corresponding new supplementary figure (Figure 3 - figure supplement 1). We have verified that the figure panels are labelled correctly and appropriately referenced in the manuscript text.

      (4) Figure 5 is overly complex and difficult to interpret. Simplifying the figure, possibly by reducing the amount of data or breaking it into more digestible parts, would enhance its readability.

      We thank the reviewer for the suggestion. We have separated the figure into two parts which are now Figure 5 for the PCA and Venn diagrams and Figure 6 for the pathway enrichment figure panels. We have increased the resolution of both figures in the revised manuscript to improve readability.

      (5) Panel 6A is not particularly informative and could either be omitted with a more detailed explanation provided in the text, or replaced with a clearer visual representation, such as Venn diagrams, to improve data visualization.

      We thank the reviewer for the suggestion. We have removed Figure 6A given that detailed explanation of the panel is already available in the manuscript text.

      (6) Additionally, on line 309, the word "to" is missing before "generate".

      We thank the reviewer for identifying this. This sentence has now been re-written to address some unintended inferences of causation in line with recommendations from reviewer 2.

      Reviewer #2 (Recommendations for the authors):

      (1) Manuscript Organisations: The manuscript is very poorly organised. Supplemental figures are labelled very unconventionally, and that creates much confusion in following the manuscript. Some of the results in the supplementary figures could be easily kept in the main figures, as it is difficult to compare plots between the main figures and the supple figures. The results of RNAseq experiments are impossible to follow with very small fonts. Overall, the figures are very casually organised and can certainly be improved.

      We would like to clarify that the supplemental figures are labelled and organized in line with eLife's formatting of supplemental figures. We deliberately put some redundant figures, such as Figure 2 - figure supplement 1, in supplementary information (see our response to reviewer 1 recommendations on the same). We have split the RNA-seq Figure 5 into two separate figures (now Figures 5 and 6) and increased their resolution to improve readability.

      (2) Figure 3: Among the KO lines, only PLIN2<sup>-/-</sup> had a higher HIF1a level before infection. Infection surely leads to higher levels across the three cases.

      We have generated replicate western blots and provide statistical quantification for HIF1a, AMPK and pAMPK. Figure 3 has now been revised extensively, and replicate blots are in Figure 3 - figure supplement 1. We have updated the text to reflect the reviewer's observation, which was also consistent with our statistical quantification.

      (3) pAMPK blots are of very poor quality. Without quantification, the trend mentioned in the text is not clearly visible.

      We have provided two more replicate blots for AMPK/pAMPK and provide statistical quantification as described above.

      (4) Line 230: Regarding autophagy flux, neither the data suggest what is interpreted nor is this experiment correctly done. LC3 WB and autophagy gene qPCR: Unfortunately, LC3 WB, the way it was done, does not tell anything about the state of autophagy in these cells. A very mild LC3II increase is noted in CPT2<sup>-/-</sup> cells upon infection; the rest of the others do not show any change. This assay is not done correctly. To interpret LC3II WB, one needs to include the Bafilomycin A1 control, usually +Baf and -Baf run in the adjacent wells in the gel. Similarly, qPCR results are not indicative of any increase in autophagy. Regulation of ATG7, MAP1LC3B, and ULK1 is more at the post-translational level than the transcriptional level.

      We have provided an additional replicate blot together with statistical quantification of LC3II/LC3I ratios in the revised Figure 3 - figure supplement 2. Our quantifications remain consistent with our prior assertions in the manuscript text. See our response in the public review section concerning autophagy assays and the use of Baf or chloroquine as controls.

      (5) Exogenous oleate fails to rescue the Mtb icl1-deficient mutant in FATP1<sup>-/-</sup>, PLIN2<sup>-/-</sup> and CPT2<sup>-/-</sup> macrophages: this result is confusing. Lipid uptake and metabolism have been the central players so far; however, here, the phenotypes of FATP1 and CPT2 in terms of lipid body accumulation are very distinct. Therefore, the assessment that Mtb growth inhibition is due to factors other than limited access to fatty acid is not consistent with the theme of the study.

      Nutrient limitation is a distinct transcriptional signature of Mtb, at least in PLIN2<sup>-/-</sup> macrophages (Figure 7). We used the oleate supplementation assay with the Mtb Δicl1 mutant to assess whether nutrient restriction was the sole anti-microbial pathway against Mtb in the knockout macrophages. This would have been the case (to a certain extent) if the growth of the Mtb Δicl1 mutant was rescuable upon addition of exogenous oleate in the knockout macrophages. Our data clearly show that this is not the case and that, in addition to nutrient limitation, interference with lipid processing results in several other macrophage anti-microbial responses against the bacteria. We extensively discuss these points in the abstract, results and discussion sections of the manuscript.

      (6) Line 309: "Meanwhile, inability generate lipid droplets in Mtb infected PLIN2<sup>-/-</sup> macrophages led to upregulation in pathways involved in ribosomal biology, MHC class 1 antigen presentation, canonical glycolysis, ATP metabolic processes and type 1 interferon responses (Figure 5C, Supplementary file 3)." This is just a correlative observation. However, it is mentioned here as a causal mechanism.

      We have revised this sentence to remove any unintended inference of causation.

      (7) IL-1b is upregulated in FATP-/- macrophages, no effect in CPT2<sup>-/-</sup> macrophages, but downregulated in PLIN2<sup>-/-</sup> macrophages. Moreover, this effect is very transient, and by 24 hours, all these differences are lost. This suggests the mechanism of action, as their pro-bacterial function shown in Figure 1, is very distinct for different proteins, and FA metabolism is probably not the common denominator across these phenotypes.

      We agree with the reviewer, and we extensively discuss this in the manuscript text (results and discussion). Clearly, there are shared anti-microbial responses across the mutants, but there are also points of divergence. We would like to further clarify that pro-inflammatory responses (IL-1b or IFN-B) in Mtb infected macrophages show a biphasic pattern: early upregulation (up to 8 hours of infection) followed by a rapid resolution phase (24-48 hours post infection). This is well reported in the literature (PMID: 30914513). It is common for pro-inflammatory gene expression differences to be temporarily lost during the resolution phase (PMID: 30914513, 39472457). IL-1b expression profiles return to the 4-hour equivalent profile in Mtb infected FATP1<sup>-/-</sup> and PLIN2<sup>-/-</sup> macrophages 4 days post infection (Figure 6A, Figure 6 - figure supplement 2B, Supplementary file 2).

      (8) It is very surprising that FATP-/- macrophages do not show any change in Mtb gene expression. The robustness of this experiment and analysis appears doubtful, given that the phenotype in terms of bacterial growth was clean.

      See our response to this comment in the public reviews section.

      (9) Figure 5, Supplementary Figure 1: Among the FA transporters, authors also show data for FATP1. I am surprised to see FATP1 expression levels in the FATP1<sup>-/-</sup> cells. This puts into doubt every dataset using FATP-/- cells in this study.

      See our response to this comment in the public reviews section.

      (10) Unfortunately, with the kind of evidence presented, it is far-fetched to claim that PLIN2<sup>-/-</sup> macrophages restrict Mtb growth by increasing ROS production. There is no evidence for this statement. The MFI units in Figure 6, Supplementary 1 are too small to extract meaningful interpretations. Moreover, the data appears to be arrived at by combining multiple technical replicates. Usually, flow cytometry data are more reliable for CellROX assays. Microscopy is not the technique of choice for this assay.

      We would like to point out that MFIs are arbitrary units set to predetermined reference points. In our case, the reference was background fluorescence in CellROX unstained cells and cells stained with CellROX equivalent fluorophore conjugated isotype antibodies. We are not entirely sure what the reviewer means by “small” in these contexts. Moreover, the data are not entirely from technical replicates. Reported MFIs are from three independent repeats with MFI reads of at least 30 cells per replicate. We have added this clarification in the legend of Figure 6 - figure supplement 1, now Figure 7 - figure supplement 1. See our response in the public reviews section on the use of confocal microscopy to image and quantify ROS. Furthermore, the Mtb transcriptional response in PLIN2<sup>-/-</sup> and CPT2<sup>-/-</sup> macrophages is clearly indicative of increased oxidative stress (Figure 7).

      (11) The CFU results with Metformin and TMZ are on the expected lines, as published earlier by others. FATP1 In data is good and aligned with the knockout phenotype.

      We thank the reviewer for the note.

      (12) Western blots, when interpreted for quantitative differences, must be quantified, and data should be represented as plots with statistical analysis.

      Replicate blots have been provided and statistical quantifications performed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews

      Reviewer #1 (Public review):

      Overall I find the evidence very well presented and the study compelling. It offers an important new perspective on the key properties of neoblasts. I do have some comments to clarify the presentation and significance of the work.

      We thank the reviewer for the positive feedback and plan to improve the presentation of the work.

      Reviewer #2 (Public review):

      However, the absence of a cell-cell feedback mechanism during colony growth and the likelihood of the difference needs to be clarified. Is there any difference in interpreting the results if this mechanism is considered?

      We will improve the description of the model assumptions and the interpretation of the data on the basis of these assumptions.

      Although hnf-4 and foxF have been silenced together to validate the model, a deeper understanding of the tgs-1+ cell type and the non-significant reduction of tgs-1+ neoblasts in zfp-1 RNAi colonies is necessary, considering a high neural lineage frequency.

      We will improve the analysis of this result in light of the experimentally determined frequency of the tgs-1+ neoblast population.

      Recommendations for the authors

      Reviewing Editor Comments:

      After consultation, we have compiled a list of the key changes to be made to the manuscript, along with reviewer-specific recommendations to follow.

      (1) Include a section that explicitly describes the assumptions and limitations of the study, particularly with respect to the following assumptions:

      We thank the reviewers for the comment. We added a description of the model assumptions in the methods section “Assumptions underlying neoblast colony growth model”.

      a) All known types of specialized neoblasts cycle at the same rate (see points from Reviewer 1).

      We thank the reviewers for the comment. The current data used to estimate τ (Lei et al., Dev Cell, 2016) does not allow the direct estimation of individual cycling behaviors. Consequently, we assume that all specialized neoblasts cycle at the same average rate, a simplification supported by the model's accurate prediction of colony growth.

      b) The assumption that any FSTF-like gene would behave like zfp1 or foxF and hnfA genes. The manuscript does not mention that there may be fundamental differences among these different FSTFs that could be uncovered by future work. A strong addition to the paper would be to test other epithelial genes (e.g. p53, chd4, egr5) to show reproducible behavior within a single lineage.

      We thank the reviewers for the comment. Colony size reduction following inhibition of Smed-p53 and failure to produce epidermal progenitors is strongly supported by previous analysis (Wagner et al., Cell Stem Cell, 2012). We refer to this observation in the paper in the section titled: “Inhibition of zfp-1 does not induce overexpression of other lineages in homeostasis”. We added the following sentence to the discussion (Line 460-462): Interestingly, suppression of Smed-p53, a TF expressed in neoblasts and required for epidermal cell production, has resulted in a similar reduction in colony size (Wagner et al., Cell Stem Cell, 2012).

      Of note, Chd4 expression is not limited to specialized neoblasts or to a specific lineage (Scimone et al., Development, 2010), and therefore its inhibition likely has a more complex outcome than an effect on a single lineage. Furthermore, egr-5 is not expressed in neoblasts (Tu et al., eLife, 2015), making this experimental condition more challenging to examine in the context of neoblast colonies at the time points assessed in this study.

      c) The fact that the data used to feed the model relies on radiated animals which are likely to have altered cell cycle rates compared to unirradiated animals (see comment by Reviewer 1). Of note, the model predicts a steady increase in colony size, but colony size does not change between 9dpi and 12dpi.

      We thank the reviewers for the comment. The colony size in control animals increased between 9 and 12 dpi (Fig 3B), as predicted by the model. In zfp-1 (RNAi) animals, the median colony size has also increased over this period, at a slower rate, which we attribute to the increase in q. We attribute the unchanged average colony size to an increase in the frequency of cells failing to proliferate, because of selection of a fate they cannot fully differentiate into.

      d) In light of both reviewers' comments about colony expansion vs. feedback, the authors should discuss how predicted changes to division frequencies might change as homeostasis is reached, or explain how their model accounts for the predicted rate differences under homeostatic conditions in which overall neoblast numbers do not change. Can the model estimate when this transition might occur?

      We thank the reviewers for the comment. Our colony assays are constrained by the animals' survival following sub-total irradiation (16 to 20 days). In this timeframe, the neoblast population is far smaller than in non-irradiated animals. Therefore, the animals do not reach homeostasis during the experiment, and the model does not allow us to estimate the time the system would need to return to homeostasis.

      (2) In Figure 2D, the assumption is that these adjacent smedwi-1+ cells are sisters. Previous data analyzing this relied on EdU or H3P staining to show a shared division history. When these images were collected is therefore extremely critical to include (the methods suggest 7, 9, or 12 days). The authors should justify why they believe that these adjacent cells are derived from a single neoblast that has divided only once.

      We thank the reviewers for the comment. The images were collected at 7 dpi. We modified the figure legend and the associated methods to include this information. At this early time point, smedwi-1+ cell dyads are spatially separated from other neighboring cells, suggesting that they are the product of a single cell division. Importantly, our data is in complete agreement with previous estimates of symmetric renewal division rate (Raz et al., Cell Stem Cell, 2021; Lei et al, Developmental Cell, 2016).

      (3) Clarify the wording 'pre-selected' in the abstract as described by Reviewer 1.

      We thank the reviewers for the comment, and for clarity we replaced the wording “pre-select” with “select”. 

      (4) Experimental details that are important to the interpretation should be added. For example, how is belonging to a colony defined? This is important because some of the data (e.g. Figure S1A: similar numbers of smedwi-1+ cells are observed at 2dpi and 4dpi, but 4dpi is considered a colony whereas 2dpi is not). The timing of quantification should be included in each figure (it is missing in Figure S2, and Figure 3C and 3D). How the authors distinguish biological vs technical replicates is not mentioned.

      We thank the reviewers for the comment. Subtotal irradiation may result in formation of a spatially-isolated cluster of neoblasts that is not distributed throughout the animal (Wagner et al., Science, 2011). This localized cluster of neoblasts is defined as a neoblast colony (Wagner et al., Science, 2011; Wagner et al., Cell Stem Cell, 2012). The small number of high smedwi-1+ cells observed at 4 dpi in our experiments aligns with this definition (Fig S1A). By contrast, the low smedwi-1 expression detected across the animal 2 dpi does not fit this definition and likely reflects remnants of dying neoblasts resulting from irradiation. The following text was added to the figure legend: “isolated cells expressing low levels of smedwi-1+ were scattered in the planarian parenchyma, likely reflecting remnants of dying neoblasts”.

      (5) Figure 5F appears to use SMEDWI-1 antibody (based on capital letters and increased signal in the brain). Is this the case? The methods do not mention the use of a SMEDWI-1 antibody, and the text indicates that these are progenitors, but SMEDWI-1 protein is well known to not mark neoblasts. If the antibody was used, the authors should not claim that these are neoblasts.

      We thank the reviewers for the comment. The SMEDWI-1 antibody used in the experiments described in Figure 5F indeed labels neoblasts and their progeny (Guo et al., Developmental Cell, 2006). The methods section “Immunofluorescence combined with FISH” details the labeling procedure, which combines FISH and IF using this antibody.

      All microscopy images are difficult to see. Perhaps this is because they are formatted as CMYK images. They should be converted to RGB format to make them appear less dull.

      We thank the reviewer for the comment. Improved versions of the figures have now been uploaded.

      The terminology used in Figure 5 to describe upregulation should not be "overexpression".

      We thank the reviewers for the comment. We changed the terminology to “upregulated”.

      Reviewer #1 (Recommendations for the authors):

      I think the authors should include a section that explicitly lays out the assumptions and limitations of the study. For example, I believe that determining tau requires assuming that all different types of specialized neoblasts cycle at the same rates. Also there is the assumption that any FSTF-like gene would behave like zfp1 or foxF and hnfA genes. It seems to remain possible that a future study could find that a subset of FSTFs might indeed exert "either/or" decisions in fating, just not the particular genes under investigation here.

      We thank the reviewer for the comment. We added a description of the model assumptions in the methods section.

      In the abstract, the wording "pre-selected" is somewhat puzzling to me. I would interpret a preselection as a process that defines the next specified state prior to its manifestation. Instead, and as I understand the authors argue this as well, the study provides good evidence that the determination mechanism is random in that subsequent neoblast choices do not likely depend on prior states. So I would suggest changing that wording.

      We thank the reviewer for the comment. We replaced “pre-select” with “select”.

      Is it possible to determine the uncertainty in measuring tau the cell cycle time and would this have an impact on subsequent modeling?

      We thank the reviewers for the comment. The current data that was used to estimate tau (Lei et al., Dev Cell, 2016) does not allow us to directly estimate the uncertainty in measuring τ.

      For lines 154-164 I would suggest doing a little more to explicitly write out the logic of determining the growth constants within the main text and not just in methods, for ease of reading.

      We thank the reviewer for the comment, and added explanations for how we determined the growth constant in the text. The text now reads (lines 160-166): “Considering an average cell cycle length of 29.7 hours, we calculated the value of q using the following approach: the probabilities of all cell division outcomes must sum to 1. Our experimental data showed that symmetric renewal (p) and asymmetric division (a) occur at equal rates (i.e., p = a). By fitting these parameters to the experimental data, we determined that the difference between the probabilities of symmetric renewal and symmetric differentiation (i.e., p - q) was = 0.345 (Fig 2E, S1D-E). Therefore, with these criteria, we estimated the probabilities of cell division outcomes in the colony as p = 0.45, a = 0.45, and q = 0.1 (Fig 2G; Methods).”
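      The arithmetic in the quoted passage can be checked with a short sketch. It only solves the three stated constraints (p + a + q = 1, p = a, p - q = 0.345); the function name `division_probabilities` is hypothetical and this is not the authors' fitting code.

```python
from fractions import Fraction


def division_probabilities(net_renewal):
    """Solve p + a + q = 1 with p = a and p - q = net_renewal.

    p: symmetric renewal, a: asymmetric division, q: symmetric differentiation.
    Substituting a = p and q = p - net_renewal gives 3p = 1 + net_renewal.
    """
    d = Fraction(str(net_renewal))  # exact arithmetic avoids float drift
    p = (1 + d) / 3
    a = p
    q = p - d
    return p, a, q


# Value of (p - q) reported in the response; everything else follows from it.
p, a, q = division_probabilities(0.345)
print(f"p = {float(p):.3f}, a = {float(a):.3f}, q = {float(q):.3f}")
```

      Rounded to two decimals, the solution matches the values quoted in the response (p = 0.45, a = 0.45, q = 0.1).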

      Line 192 why does post-mitotic progeny number linearly relate to neoblast number? In clones, a change in q has an exponential effect. I feel like I am missing something.

      We thank the reviewer for the comment. In colonies, 50% of cell divisions result in the production of post-mitotic progeny (asymmetric division). Therefore, the number of produced progenitors in a given cell cycle is linearly correlated with the number of neoblasts. This statement is in line with previous analysis of planarian colony size (Wagner et al., Cell Stem Cell, 2012).

      Line 103: it also seems possible, although less likely, that the specified state is not fixed within a given cell cycle and could be that cells that try to switch into zeta-neoblasts mid-cell cycle arrest in proliferation etc just for that time.

      We thank the reviewer for the comment and agree that this is a possibility. However, our observations suggest that incorporating this factor into the model is unnecessary for accurately predicting colony size.

      In terms of the feedback mechanism proposed to operate in homeostasis, I think in the case of zfp-1 it is quite likely that loss of epidermal differentiation results in wound responses (this phenomenon has been documented in egr-5 RNAi in Tu et al 2015 I believe). This could play out differently in the clone assay because the effects of sublethal irradiation on this process would predominate in both control versus zfp1(RNAi) conditions.

      We thank the reviewer for the comment. Our RNA-seq analysis following zfp-1 inhibition did not show overexpression of injury-induced genes at an early time point (6 days; Fig. 5B-C). However, an increase in cycling cells was detected much earlier via EdU labeling (3 days; Fig. 5D). In the case of egr-5 suppression, Tu et al. analyzed injury-induced gene expression at a later stage (21 days of RNAi), where they found significant epidermal defects (see Fig. 5C in Tu et al.). We agree that sublethal irradiation effects likely predominate in colony analysis for both control and zfp-1 (RNAi) animals. In homeostasis, additional factors likely influence cell proliferation and differentiation.

      It seems likely that some of the differences noted between homeostasis versus clone growth could ultimately arise from the different growth parameters under each setting. Could the rate parameters be estimated from prior data in homeostasis as well? It seems to me that with the framework the authors use, homeostasis must involve a net zero change to neoblast abundance (also shown by Wagner 2011 by the sigmoidal curve of neoblast abundance at the endpoint of clone expansion). Therefore, in these conditions p=q by definition. Experimental evidence from Lei 2016 (Figure S7M) suggests asymmetric divisions and symmetric renewing divisions are about equally abundant (5/12 41% sym renewing vs 7/12 58% asymmetric renewing). Therefore, under homeostasis, there would be an estimated p=q=0.3 and a=0.4. Compared to clone growth conditions then, in homeostasis, it seems that roughly the rate of symmetric renewal decreases and the rate of symmetric differentiation also increases. I wonder, could this kind of difference potentially account for the differences between homeostasis versus clone expansion settings? It is also worth noting that the clone expansion context has been used as a sensitized genetic background for identifying effects of gene inhibition on neoblast self-renewal, so perhaps the reason this works is that the rates of self-renewal are relatively less in homeostasis so that clone expansion represents a case where there is greater demand for self-renewal.

      We thank the reviewer for the comment. We agree that under homeostatic conditions, where the population size remains stable, the average probability of symmetric renewal matches the average probability of symmetric differentiation or elimination. By contrast, during colony expansion, the probability of symmetric renewal exceeds that of symmetric differentiation or elimination. The differences in response to a lineage block between homeostasis and colony expansion can have multiple interpretations. However, data from homeostatic animals does not permit the analysis of individual neoblasts or their specific responses to a lineage block. Consequently, we cannot determine whether the proliferative response following the lineage block during homeostasis is a direct response to the lineage block or an indirect effect resulting from changes in other neoblasts. We discuss these possibilities further in lines 472 - 484.
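      The contrast between the two regimes can be illustrated with a minimal sketch. It assumes a simple expectation-level reading of the division outcomes (symmetric renewal adds one neoblast, asymmetric division adds none, symmetric differentiation or elimination removes one), so the expected population is multiplied by 1 + p - q per cell cycle. The function name and this closed form are illustrative assumptions, not the authors' published model code; the 29.7-hour cycle length and the probabilities are the values quoted in this exchange.

```python
def expected_neoblasts(n0, p, q, hours, cycle_hours=29.7):
    """Expected neoblast count after `hours`, starting from n0 cells.

    Assumes every neoblast divides once per cycle and that the expected
    change per division is (+1)*p + 0*a + (-1)*q, i.e. a growth factor
    of 1 + p - q per cycle.
    """
    cycles = hours / cycle_hours
    return n0 * (1 + p - q) ** cycles


# Colony expansion: p exceeds q, so the expected population grows.
print(expected_neoblasts(1, p=0.45, q=0.10, hours=7 * 24))

# Homeostasis as framed by the reviewer: p equals q, so no net change.
print(expected_neoblasts(1, p=0.30, q=0.30, hours=7 * 24))
```

      Under these assumptions, any p = q leaves the expected population unchanged regardless of the asymmetric division rate, which is the sense in which homeostasis fixes p = q.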

      In terms of the memory effect, I recall some arguments presented in the Raz 2021 study that were consistent with a slight memory for neoblast specification being retained. I believe this was a minor point from detecting a slightly higher likelihood of identifying 2-cell clones that both took on prog1+ identity compared to the population average. If this is the case, it may be worth the authors commenting on reconciling those observations with their model.

      We thank the reviewer for their comment. Raz et al. (Cell Stem Cell, 2021) reported that in the asymmetric division of a zeta-neoblast, which generates a prog-2+ cell and a neoblast, there was a slightly higher observed frequency of zfp-1 expression in the neoblast compared to the expected rate (Expected: 32%, Observed: 44%). This small increase may reflect a mild memory effect, experimental variability, or both. However, statistical analysis using Fisher's exact test yielded a non-significant p-value (p = 0.1), suggesting that this difference could be attributed to experimental variability. Other data from Raz et al., such as lineage representation in early colonies, also did not show significant memory effects, indicating that any such effects, if present, are minimal and difficult to detect. Therefore, while we do not, and cannot, rule out the presence of minor memory effects, we expect that effects of this magnitude will have minimal impact on our model.

      Reviewer #2 (Recommendations for the authors):

      Figure 2C and 2D:

      Please provide the specific time points for the data presented.

      We thank the reviewer for the comment. The information was added to the figure legend.

      Colony growth and homeostasis:

      It would be beneficial to estimate a time point at which colony growth transitions to a model with a cell-cell feedback mechanism, similar to that observed in homeostasis. This would help in understanding the dynamics and timing of these processes.

      We thank the reviewers for the comment. Our colony assays were constrained by the animals' survival following sub-total irradiation (16 to 20 days). Neoblast numbers are substantially reduced compared to unirradiated animals, preventing us from determining the time point at which homeostasis is achieved.

      Methods:

      μl should be μL  

      The text was changed accordingly.

      Line 526: H2O should be H₂O

      The text was changed accordingly.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes the role of PRDM16 in modulating BMP response during choroid plexus (ChP) development. The authors combine PRDM16 knockout mice and cultured PRDM16 KO primary neural stem cells (NSCs) to determine the interactions between BMP signaling and PRDM16 in ChP differentiation.

      They show PRDM16 KO affects ChP development in vivo and BMP4 response in vitro. They determine genes regulated by BMP and PRDM16 by ChIP-seq or CUT&TAG for PRDM16, pSMAD1/5/8, and SMAD4. They then measure gene activity in primary NSCs through H3K4me3 and find more genes are co-repressed than co-activated by BMP signaling and PRDM16. They focus on the 31 genes found to be co-repressed by BMP and PRDM16. Wnt7b is in this set and the authors then provide evidence that PRDM16 and BMP signaling together repress Wnt activity in the developing choroid plexus.

      Strengths:

      Understanding context-dependent responses to cell signals during development is an important problem. The authors use a powerful combination of in vivo and in vitro systems to dissect how PRDM16 may modulate BMP response in early brain development.

      Main weaknesses of the experimental setup:

      (1) Because the authors state that primary NSCs cultured in vitro lose endogenous Prdm16 expression, they drive expression by a constitutive promoter. However, this means the expression levels are very different from endogenous levels (as explicitly shown in Supplementary Figure 2B) and the effect of many transcription factors is strongly dose-dependent, likely creating differences between the PRDM16-dependent transcriptional response in the in vitro system and in vivo.

      We acknowledge that our in vitro experiments may not ideally replicate the in vivo situation, a common limitation of such experiments. Our primary aim was to explore the molecular relationship between PRDM16 and BMP signaling in gene regulation. Such molecular investigations are challenging to conduct using in vivo tissues. In vitro NSCs treated with BMP4 have been used as a model to investigate NSC proliferation and quiescence in previous studies (e.g., Helena Mira, 2010; Marlen Knobloch, 2017). Crucially, to ensure the relevance of our in vitro findings to the in vivo context, we confirmed that cultured cells could indeed be induced into quiescence by BMP4, and this induction necessitated the presence of PRDM16. Furthermore, upon identifying target genes co-regulated by PRDM16 and SMADs, we validated PRDM16's regulatory role on a subset of these genes in the developing Choroid Plexus (ChP) (Fig. 7 and Suppl. Fig. 7-8). Only by combining evidence from both in vitro and in vivo experiments could we confidently conclude that PRDM16 serves as an essential co-factor for BMP signaling in restricting NSC proliferation.

      (2) It seems that the authors compare Prdm16_KO cells to Prdm16 WT cells overexpressing flag_Prdm16. Aside from the possible expression of endogenous Prdm16, other cell differences may have arisen between these cell lines. A properly controlled experiment would compare Prdm16_KO ctrl (possibly infected with a control vector without Prdm16) to Prdm16_KO_E (i.e. the Prdm16_KO cells with and without Prdm16 overexpression.)

      We agree that Prdm16 KO cells carrying the Prdm16-expressing vector would be a good comparison with those with KO_vector. However, despite more than 10 attempts with various optimization conditions, we were unable to establish a viable cell line after infecting Prdm16 KO cells with the Prdm16-expressing vector. The overall survival rate for primary NSCs after viral infection is low, and we observed that KO cells were particularly sensitive to infection treatment when the viral vector was large (the Prdm16 ORF is more than 3kb).

      As an alternative to assess vector effects, we instead included two other control cell lines, wt and KO cells infected with the 3xNLS_Flag-tag viral vector, and presented the results in supplementary Fig 2. When we compared the responses of the four lines (wt, KO, wt infected with the Flag vector, and KO infected with the Flag vector) to the addition and removal of BMP4, we confirmed that the viral infection itself has no significant impact on the responses of these cells to these treatments in terms of cell proliferation and Ttr induction.

      Given that wt cells and KO cells, with or without viral backbone infection, behave quite similarly in terms of cell proliferation, we speculate that even if we had succeeded in obtaining a KO cell line carrying the Prdm16-expressing vector, it might not exhibit substantial differences compared to wt cells infected with the Prdm16-expressing vector.

      Other experimental weaknesses that make the evidence less convincing:

      (1) The authors show in Figure 2E that Ttr is not upregulated by BMP4 in PRDM16_KO NSCs. Does this appear inconsistent with the presence of Ttr expression in the PRDM16_KO brain in Figure1C?

      The reviewer’s point is that there was no significant increase in Ttr expression in Prdm16_KO cells after BMP4 treatment (Fig. 2E), but residual Ttr mRNA signals remained in the Prdm16 mutant ChP (Fig. 1C). We think the difference lies in the measurable level of Ttr expression between that induced by BMP4 in NSC culture and that in the ChP. This is based on our immunostaining experiment in which we tried to detect Ttr using a Ttr antibody. This antibody could not detect the Ttr protein in BMP4-treated Prdm16_expressing NSCs but clearly showed Ttr signal in the wt ChP. This means that although Ttr expression can be significantly increased by BMP4 in vitro to a level measurable by RT-qPCR, its absolute quantity even in the Prdm16_expressing condition is much lower than that in vivo. Our results in Fig 1C and Fig 2E, as well as Fig 7B, all consistently showed that Prdm16 depletion significantly reduced Ttr expression both in vitro and in vivo.

      (2) Figure 3: The authors use H3K4me3 to measure gene activity. This is however, very indirect, with bulk RNA-seq providing the most direct readout and polymerase binding (ChIP-seq) another more direct readout. Transcription can be regulated without expected changes in histone methylation, see e.g. papers from Josh Brickman. They verify their H3K4me3 predictions with qPCR for a select number of genes, all related to the kinetochore, but it is not clear why these genes were picked, and one could worry whether these are representative.

      H3K4me3 has been widely used as an indicator of active transcription and is a mark for cell identity genes. It has also been demonstrated that H3K4me3 has a direct function in regulating transcription at the step of RNA Pol II pause release. As stated in the text, there are advantages and disadvantages of using H3K4me3 compared to using RNA-seq. RNA-seq profiles all gene products, which are affected by transcription and by RNA stability and turnover. In contrast, H3K4me3 levels at gene promoters reflect transcriptional activity. In our case, we aimed to identify differential gene expression between proliferation and quiescence states. The transition between these two states is fast and dynamic. RNA-seq may not be able to identify functionally relevant genes and is more likely to produce false positive and negative results. Therefore, we chose H3K4me3 profiling.

      We agree that transcription may change without histone methylation changes. This may cause an under-estimation of the number of changed genes between the conditions. 

      We validated 7 out of 31 genes (Wnt7b, Id3, Mybl2, Spc24, Spc25, Ndc80 and Nuf2). We chose these genes based on two criteria: 1) their function is implicated in cell proliferation and cell-cycle regulation based on gene ontology analysis; 2) their gene products are detectable in the developing ChP based on the scRNA-seq data. Three of these genes (Wnt7b, Id3, Mybl2) are not related to the kinetochore. We now clarify this description in the revised text.

      (3) Line 256: The overlap of 31 genes between 184 BMP-repressed genes and 240 PRDM16-repressed genes seems quite small.

      This indicates that, in addition to co-repressing cell-cycle genes, BMP and PRDM16 have independent functions. For example, it was reported that BMP regulates neuronal and astrocyte differentiation (Katada, S. 2021), while our previous work demonstrated that Prdm16 controls temporal identity of NSCs (He, L. 2021).
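
      Whether an overlap of 31 genes between a 184-gene set and a 240-gene set is small or large relative to chance depends on the size of the gene universe, which the response does not state. The sketch below illustrates one common way to frame that question, a one-sided hypergeometric test. The background size `universe` is a purely hypothetical placeholder and the function name is ours, so the printed numbers are illustrative only and are not results from the study.

```python
from math import comb


def hypergeom_sf_at_least(k: int, universe: int, marked: int, drawn: int) -> float:
    """P(overlap >= k) when `drawn` genes are sampled without replacement
    from `universe` genes, of which `marked` belong to the other set."""
    upper = min(marked, drawn)
    total = comb(universe, drawn)
    tail = sum(
        comb(marked, i) * comb(universe - marked, drawn - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Counts quoted in the review: 184 BMP-repressed, 240 PRDM16-repressed, 31 shared.
bmp_repressed, prdm16_repressed, shared = 184, 240, 31

# HYPOTHETICAL: number of genes that could have been called in both analyses.
universe = 20_000

expected_overlap = bmp_repressed * prdm16_repressed / universe
p_value = hypergeom_sf_at_least(shared, universe, prdm16_repressed, bmp_repressed)
print(f"expected overlap by chance = {expected_overlap:.2f}, one-sided p = {p_value:.3g}")
```

      A smaller, more realistic universe (for example, only genes with a detectable H3K4me3 promoter peak) would raise the expected overlap, so the choice of background should be reported alongside any such p-value.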

      (4) The Wnt7b H3K4me3 track in Fig. 3G is not discussed in the text but it shows H3K4me3 high in _KO and low in _E regardless of BMP4. This seems to contradict the heatmap of H3K4me3 in Figure 3E which shows H3K4me3 high in _E no BMP4 and low in _E BMP4 while omitting _KO no BMP4. Meanwhile CDKN1A, the other gene shown in 3G, is missing from 3E.

      The track in Fig 3G shows the absolute signal of H3K4me3 after mapping the sequencing reads to the genome and normalizing them to library size. Comparing the signal in Prdm16_E with BMP4 to that in Prdm16_E without BMP4, the one with BMP4 has a lower peak. The same trend can be seen for the pair of Prdm16_KO cells with or without BMP4. The heatmap in Fig. 3E shows the relative level of H3K4me3 in three conditions. The Prdm16_E cells with BMP4 have the lowest level, while the other two conditions (Prdm16_KO with BMP4 and Prdm16_E without BMP4) display a higher level. These two graphs show a consistent trend of H3K4me3 changes at the Wnt7b promoter across these conditions.
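
      To make "normalizing to library size" concrete, the following minimal sketch scales raw per-bin read counts to counts per million mapped reads. The per-million convention, the function name, and all numbers are assumptions for illustration; the response does not state which scaling factor or software was used for Fig 3G.

```python
def normalize_to_library_size(bin_counts, total_mapped_reads, scale=1_000_000):
    """Scale raw per-bin read counts to counts per `scale` mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    factor = scale / total_mapped_reads
    return [count * factor for count in bin_counts]


# HYPOTHETICAL example: identical raw counts over one promoter in two libraries
# sequenced to different depths.
raw_counts = [10, 40, 80, 40, 10]
shallow_library_reads = 20_000_000
deep_library_reads = 40_000_000

shallow_track = normalize_to_library_size(raw_counts, shallow_library_reads)
deep_track = normalize_to_library_size(raw_counts, deep_library_reads)

# The deeper library yields half the normalized peak height for the same raw counts.
print(max(shallow_track), max(deep_track))
```

      Because each track is scaled by its own sequencing depth, peak heights in a browser view can be compared across conditions, whereas a heatmap of relative levels only preserves the ordering among the conditions it includes.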

      (5) The authors use PRDM16 CUT&TAG on dissected dorsal midline tissues to determine if their 31 identified PRDM16-BMP4 co-repressed genes are regulated directly by PRDM16 in vivo. By manual inspection, they find that "most" of these show a PRDM16 peak. How many is most? If using the same parameters for determining peaks, how many genes in an appropriately chosen negative control set of genes would show peaks? Can the authors rigorously establish the statistical significance of this observation? And why wasn't the same experiment performed on the NSCs in which the other experiments are done so one can directly compare the results? Instead, as far as I could tell, there is only ChIP-qPCR for two genes in NSCs in Supplementary Figure 4D.

      In our text, we indicated the genes containing PRDM16 binding peaks in the figures and described them as “Text in black in Fig. 6A and Supplementary Fig. 5A”. We will add the precise number “25 of these genes” in the main text to clarify it. To define a negative control set of genes, we will use the BMP-only repressed genes, 184 - 31 = 153 genes (excluding the PRDM16-BMP4 co-repressed genes), and determine how many of these 153 genes have PRDM16 peaks in the E12.5 ChP data, say X. Then we will use a one-sided binomial test to calculate the p-value: binom_test(25, 31, X/153, alternative="greater").
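
      The proposed test can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' analysis: the value assigned to X is a hypothetical placeholder because the count is not yet reported, and the helper name is ours. In recent SciPy releases the equivalent call is `scipy.stats.binomtest(25, 31, p, alternative="greater")`.

```python
from math import comb


def binom_sf_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the one-sided 'greater' p-value."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Counts stated in the response: 25 of the 31 co-repressed genes carry a PRDM16 peak.
genes_with_peak, co_repressed = 25, 31

# Control set: BMP-only repressed genes, 184 - 31 = 153.
control_total = 184 - 31

# HYPOTHETICAL placeholder for X, the number of control genes with a PRDM16 peak.
x_control_with_peak = 60

null_rate = x_control_with_peak / control_total
p_value = binom_sf_at_least(genes_with_peak, co_repressed, null_rate)
print(f"null peak rate = {null_rate:.3f}, one-sided p = {p_value:.3g}")
```

      The null rate X/153 is itself an estimate from a finite control set, so a two-sample comparison such as Fisher's exact test on the 2x2 table (25 of 31 versus X of 153) would be a reasonable alternative that accounts for that uncertainty.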

      We are unsure about the second part of the comment: “And why wasn't the same experiment performed on the NSCs in which the other experiments are done so one can directly compare the results? Instead, as far as I could tell, there is only ChIP-qPCR for two genes in NSCs in Supplementary Figure 4D.” If the reviewer meant why we did not sequence the material from sequential ChIP or validate more target genes, the reason is the limited amount of material. Sequential ChIP requires a large quantity of antibody and, after the second round of IP, yields barely enough material for a few qPCR reactions. This yield was far below the minimum required for library construction. The PRDM16 antibody was a gift, and the quantity we have was very limited. We made considerable efforts to optimize all available commercial antibodies for ChIP and Cut&Tag, but none of them worked.

      (6) In comparing RNA in situ between WT and PRDM16 KO in Figure 7, the authors state they use the Wnt2b signal to identify the border between CH and neocortex. However, the Wnt2b signal is shown in grey and it is impossible for this reviewer to see clear Wnt2b expression or where the boundaries are in Figure 7A. The authors also do not show where they placed the boundaries in their analysis. Furthermore, Figure 7B only shows insets for one of the regions being compared making it difficult to see differences from the other region. Finally, the authors do not show an example of their spot segmentation to judge whether their spot counting is reliable. Overall, this makes it difficult to judge whether the quantification in Figure 7C can be trusted.

      To address these questions, in the revised manuscript we will include an individual channel for Wnt2b and mark the boundaries. Because of space limitations in the main figures, we will provide full-view images and examples of spot segmentation in supplementary figures.

      (7) The correlation between mKi67 and Axin2 in Figure 7 is interesting but does not convincingly show that Wnt downstream of PRDM16 and BMP is responsible for the increased proliferation in PRDM16 mutants.

      We agree that this result (the correlation between mKi67 and Axin2) alone only suggests that Wnt signaling is related to the proliferation defect in the Prdm16 mutant, and does not necessarily mean that Wnt is downstream of PRDM16 and BMP. Our conclusion is backed up by two additional lines of evidence: the Cut&Tag data, in which PRDM16 binds to regulatory regions of Wnt7b and Wnt3a, and the finding that BMP and PRDM16 co-repress Wnt7b in vitro.

      An ideal result would be that down-regulating Wnt signaling in the Prdm16 mutant rescues the Prdm16 mutant phenotype. Such an experiment is technically challenging. Wnt plays diverse and essential roles in NSC regulation, and one would need a cell-type- and stage-specific tool to down-regulate Wnt in the background of the Prdm16 mutation. Moreover, Wnt genes are not the only targets regulated by PRDM16 in these cells, and downregulating Wnt may not be sufficient to rescue the phenotype.

      Weaknesses of the presentation:

      Overall, the manuscript is not easy to read. This can cause confusion.

      We will revise the text to improve the clarity.

      Reviewer #2 (Public review):

      Summary:

      This article investigates the role of PRDM16 in regulating cell proliferation and differentiation during choroid plexus (ChP) development in mice. The study finds that PRDM16 acts as a corepressor in the BMP signaling pathway, which is crucial for ChP formation.

      The key findings of the study are:

      (1) PRDM16 promotes cell cycle exit in neural epithelial cells at the ChP primordium.

      (2) PRDM16 and BMP signaling work together to induce neural stem cell (NSC) quiescence in vitro.

      (3) BMP signaling and PRDM16 cooperatively repress proliferation genes.

      (4) PRDM16 assists genomic binding of SMAD4 and pSMAD1/5/8.

      (5) Genes co-regulated by SMADs and PRDM16 in NSCs are repressed in the developing ChP.

      (6) PRDM16 represses Wnt7b and Wnt activity in the developing ChP.

      (7) Levels of Wnt activity correlate with cell proliferation in the developing ChP and CH.

      In summary, this study identifies PRDM16 as a key regulator of the balance between BMP and Wnt signaling during ChP development. PRDM16 facilitates the repressive function of BMP signaling on cell proliferation while simultaneously suppressing Wnt signaling. This interplay between signaling pathways and PRDM16 is essential for the proper specification and differentiation of ChP epithelial cells. This study provides new insights into the molecular mechanisms governing ChP development and may have implications for understanding the pathogenesis of ChP tumors and other related diseases.

      Strengths:

      (1) Combining in vitro and in vivo experiments to provide a comprehensive understanding of PRDM16 function in ChP development.

      (2) Use of a variety of techniques, including immunostaining, RNA in situ hybridization, RT-qPCR, CUT&Tag, ChIP-seq, and SCRINSHOT.

      (3) Identifying a novel role for PRDM16 in regulating the balance between BMP and Wnt signaling.

      (4) Providing a mechanistic explanation for how PRDM16 enhances the repressive function of BMP signaling. The identification of SMAD palindromic motifs as preferred binding sites for the SMAD/PRDM16 complex suggests a specific mechanism for PRDM16-mediated gene repression.

      (5) Highlighting the potential clinical relevance of PRDM16 in the context of ChP tumors and other related diseases. By demonstrating the crucial role of PRDM16 in controlling ChP development, the study suggests that dysregulation of PRDM16 may contribute to the pathogenesis of these conditions.

      Weaknesses:

      (1) Limited investigation of the mechanism controlling PRDM16 protein stability and nuclear localization in vivo. The study observed that PRDM16 protein became nearly undetectable in NSCs cultured in vitro, despite high mRNA levels. While the authors speculate that post-translational modifications might regulate PRDM16 in NSCs similar to brown adipocytes, further investigation is needed to confirm this and understand the precise mechanism controlling PRDM16 protein levels in vivo.

      While mechanisms controlling PRDM16 protein stability and nuclear localization in the developing brain are interesting, the scope of this paper is to reveal the function of PRDM16 in the choroid plexus and its interaction with BMP signaling. We will be happy to pursue this direction in our next study.

      (2) Reliance on overexpression of PRDM16 in NSC cultures. To study PRDM16 function in vitro, the authors used a lentiviral construct to constitutively express PRDM16 in NSCs. While this approach allowed them to overcome the issue of low PRDM16 protein levels in vitro, it is important to consider that overexpressing PRDM16 may not fully recapitulate its physiological role in regulating gene expression and cell behavior.

      As stated above, we acknowledge that findings from cultured NSCs may not directly apply to ChP cells in vivo. We are cautious with our statements. The cell culture work was aimed at identifying potential mechanisms by which PRDM16 and SMADs interact to regulate gene expression, as well as target genes co-regulated by these factors. We expect that not all targets from cell culture are regulated by PRDM16 and SMADs in the ChP, so we validated expression changes of several target genes in the developing ChP and have now included the new data in Fig. 7 and Supplementary Fig. 7. Out of the 31 genes identified from cultured cells, four cell cycle regulators including Wnt7b, Id3, Spc24/25/Nuf2 and Mybl2, showed de-repression in Prdm16 mutant ChP. These genes can be relevant downstream genes in the ChP, and other target genes may be cortical NSC-specific or less dependent on Prdm16 in vivo.

      (3) Lack of direct evidence for AP1 as the co-factor responsible for SMAD relocation in the absence of PRDM16. While the study identified the AP1 motif as enriched in SMAD binding sites in Prdm16 knockout cells, they only provided ChIP-qPCR validation for c-FOS binding at two specific loci (Wnt7b and Id3). Further investigation is needed to confirm the direct interaction between AP1 and SMAD proteins in the absence of PRDM16 and to rule out other potential co-factors.

      We agree that the finding of the AP1 motif enriched at the PRDM16 and SMAD co-binding regions in Prdm16 KO cells can only indirectly suggest AP1 as a co-factor for SMAD relocation. That is why we used ChIP-qPCR to examine the presence of c-FOS at these sites. Although we only validated two targets, the result confirms that c-FOS binds to the sites only in the Prdm16 KO cells and not in Prdm16_expressing cells, suggesting that AP1 is a co-factor. Our results cannot rule out the presence of other co-factors.

      Reviewer #3 (Public review):

      Summary:

      Bone morphogenetic protein (BMP) signaling instructs multiple processes during development including cell proliferation and differentiation. The authors set out to understand the role of PRDM16 in these various functions of BMP signaling. They find that PRDM16 and BMP co-operate to repress stem cell proliferation by regulating the genomic distribution of BMP pathway transcription factors. They additionally show that PRDM16 impacts choroid plexus epithelial cell specification. The authors provide evidence for a regulatory circuit (constituting of BMP, PRDM16, and Wnt) that influences stem cell proliferation/differentiation.

      Strengths:

      I find the topics studied by the authors in this study of general interest to the field, the experiments well-controlled and the analysis in the paper sound.

      Weaknesses:

      I have no major scientific concerns. I have some minor recommendations that will help improve the paper (regarding the discussion).

      We will revise the discussion according to the suggestions.

    1. Author response:

      eLife Assessment 

      The authors utilize a valuable computational approach to exploring the mechanisms of memory-dependent klinotaxis, with a hypothesis that is both plausible and testable. Although they provide a solid hypothesis of circuit function based on an established model, the model's lack of integration of newer experimental findings, its reliance on predefined synaptic states, and oversimplified sensory dynamics, make the investigation incomplete for both memory and internal-state modulation of taxis.

      We would like to express our gratitude to the editor for the assessment of our work. However, we respectfully disagree with the assessment that our investigation is incomplete, if the negative assessment is primarily due to the impact of AIY interneuron ablation on the chemotaxis index (CI) which was reported in Reference [1]. It is crucial to acknowledge that the CI determined through experimental means incorporates contributions from both klinokinesis and klinotaxis [1]. It is plausible that the impact of AIY ablation was not adequately reflected in the CI value. Consequently, the experimental observation does not necessarily diminish the role of AIY in klinotaxis. Anatomical evidence provided by the database (http://ims.dse.ibaraki.ac.jp/ccep-tool/) substantiates that ASE sensory neurons and AIZ interneurons, which have been demonstrated to play a crucial role in klinotaxis [Matsumoto et al., PNAS 121 (5) e2310735121], have the highest number of synaptic connections with AIY interneurons. These findings provide substantial evidence supporting the validity of the presented minimal neural network responsible for salt klinotaxis.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This research focuses on C. elegans klinotaxis, a chemotactic behavior characterized by gradual turning, aiming to uncover the neural circuit mechanism responsible for the context-dependent reversal of salt concentration preference. The phenomenon observed is that the preferred salt concentration depends on the difference between the pre-assay cultivation conditions and the current environmental salt levels. 

      We would like to express our gratitude for the time and consideration you have dedicated to reviewing our manuscript.

      The authors propose that a synaptic-reversal plasticity mechanism at the primary sensory neuron, ASER, is critical for this memory- and context-dependent switching of preference. They build on prior findings regarding synaptic reversal between ASER and AIB, as well as the receptor composition of AIY neurons, to hypothesize that similar "plasticity" between ASER and AIY underpins salt preference behavior in klinotaxis. This plasticity differs conceptually from the classical one as it does not rely on any structural changes but rather synaptic transmission is modulated by the basal level of glutamate, and can switch from inhibitory to excitatory. 

      To test this hypothesis, the study employs a previously established neuroanatomically grounded model [4] and demonstrates that reversing the ASER-AIY synapse sign in the model agent reproduces the observed reversal in salt preference. The model is parameterized using a computational search technique (evolutionary algorithm) to optimize unknown electrophysiological parameters for chemotaxis performance. Experimental validity is ensured by incorporating constraints derived from published findings, confirming the plausibility of the proposed mechanism. 

      Finally, the circuit mechanism allowing C. elegans to switch behaviour to an exploration run when starved is also investigated. This extension highlights how internal states, such as hunger, can dynamically reshape sensory-motor programs to drive context-appropriate behaviors.

      We would like to thank the reviewer for the appropriate summary of our work. 

      Strengths and weaknesses: 

      The authors' approach of integrating prior knowledge of receptor composition and synaptic reversal with the repurposing of a published neuroanatomical model [4] is a significant strength.

      This methodology not only ensures biological plausibility but also leverages a solid, reproducible modeling foundation to explore and test novel hypotheses effectively.

      The evidence produced that the original model has been successfully reproduced is convincing.

      The writing of the manuscript needs revision as it makes comprehension difficult.  

      We would like to thank the reviewer for recognizing the usefulness of our approach. In the revised version, we will improve the explanation.  

      One major weakness is that the model does not incorporate key findings that have emerged since the original model's publication in 2013, limiting the support for the proposed mechanism. In particular, ablation studies indicate that AIY is not critical for chemotaxis, and other interneurons may play partially overlapping roles in positive versus negative chemotaxis. These findings challenge the centrality of AIY and suggest the model oversimplifies the circuit involved in klinotaxis.

      We would like to express our gratitude for the constructive feedback we have received. We concur with some of your assertions. In fact, our model is the minimal network for salt klinotaxis, which includes solely the interneurons that are connected to each other via the highest number of synaptic connections. It is important to note that our model does not consider redundant interneurons that exhibit overlapping roles. Consequently, the model is not applicable to the study of the impact of interneuron ablation. In the reference [1], the influence of interneuron ablations on the chemotaxis index (CI) has been investigated. The experimentally determined CI value incorporates the contributions from both klinokinesis and klinotaxis. Consequently, it is plausible that the impact of AIY ablation was not significantly reflected in the CI value. The experimental observation does not necessarily diminish the role of AIY in klinotaxis. 

      Reference [1] also shows that ASER neurons exhibit complex, memory- and context-dependent responses, which are not accounted for in the model and may have a significant impact on chemotactic model behaviour. 

      As pointed out by the reviewer, our model does not incorporate the context-dependent response of the ASER. Instead, the present study considers the salt concentration-dependent glutamate release from the ASER [S. Hiroki et al. Nat Commun 13, 2928 (2022)], which results from the ASER responses.

      The hypothesis of synaptic reversal between ASER and AIY is not explicitly modeled in terms of receptor-specific dynamics or glutamate basal levels. Instead, the ASER-to-AIY connection is predefined as inhibitory or excitatory in separate models. This approach limits the model's ability to test the full range of mechanisms hypothesized to drive behavioral switching.  

      We would like to thank the reviewer for the helpful comments. In the revised version, we will mention the limitation.

      While the main results - such as response dependence on step inputs at different phases of the oscillator - are consistent with those observed in chemotaxis models with explicit neural dynamics (e.g., Reference [2]), the lack of richer neural dynamics could overlook critical effects. For example, the authors highlight the influence of gap junctions on turning sensitivity but do not sufficiently analyze the underlying mechanisms driving these effects. The role of gap junctions in the model may be oversimplified because, as in the original model [4], the oscillator dynamics are not intrinsically generated by an oscillator circuit but are instead externally imposed via $z_\text{osc}$. This simplification should be carefully considered when interpreting the contributions of specific connections to network dynamics. Lastly, the complex and context-dependent responses of ASER [1] might interact with circuit dynamics in ways that are not captured by the current simplified implementation. These simplifications could limit the model's ability to account for the interplay between sensory encoding and motor responses in C. elegans chemotaxis.

      We may not have fully understood the substance of this comment. We do acknowledge that the oscillator dynamics were not generated by an oscillator neural circuit in our model. The present study instead focuses on how the sensory input and the resulting interneuron dynamics regulate the oscillatory activity of SMB motor neurons to generate klinotaxis.

      Appraisal: 

      The authors show that their model can reproduce memory-dependent reversal of preference in klinotaxis, demonstrating that the ASER-to-AIY synapse plays a key role in switching chemotactic preferences. By switching the ASER-AIY connection from excitatory to inhibitory they indeed show that salt preference reverses. They also show that the curving/turn rate underlying the preference change is gradual and depends on the weight between ASER-AIY. They further support their claim by showing that curving rates also depend on cultivated (set-point).  

      We would like to thank the reviewer for assessing our work.

      Thus within the constraints of the hypothesis and the framework, the model operates as expected and aligns with some experimental findings. However, significant omissions of key experimental evidence raise questions on whether the proposed neural mechanisms are sufficient for reversal in salt-preference chemotaxis.  

      We agree with your opinion. The present hypothesis should be verified by experiments.

      Previous work [1] has shown that individually ablating the AIZ or AIY interneurons has essentially no effect on the Chemotactic Index (CI) toward the set point ([1] Figure 6). Furthermore, in [1] the authors report that different postsynaptic neurons are required for movement above or below the set point. The manuscript should address how this evidence fits with their model by attempting similar ablations. It is possible that the CI is rescued by klinokinesis but this needs to be tested on an extension of this model to provide a more compelling argument.  

      We would like to express our gratitude for the constructive feedback we have received. Reference [1] investigated the influence of interneuron ablations on the chemotaxis index (CI). It is important to acknowledge that the experimentally determined CI value encompasses the contributions of both klinokinesis and klinotaxis. It is plausible that the impact of AIY ablation was not reflected in the CI value. Consequently, these experimental observations do not necessarily diminish the role of AIY in klinotaxis. The neural circuit model employed in the present study constitutes a minimal network for salt klinotaxis, encompassing solely interneurons that are connected to each other via the highest number of synaptic connections. Anatomical evidence provided by the database (http://ims.dse.ibaraki.ac.jp/ccep-tool/) substantiates that ASE sensory neurons and AIZ interneurons, which have been demonstrated to play a crucial role in klinotaxis [Matsumoto et al., PNAS 121 (5) e2310735121], have the highest number of synaptic connections with AIY interneurons. Our model does not take into account redundant interneurons with overlapping roles, and is therefore not applicable to the study of the effects of interneuron ablation.

      The investigation of dispersal behaviour in starved individuals is rather limited to testing by imposing inhibition of the SMB neurons. Although a circuit is proposed for how hunger states modulate taxis in the absence of food, this circuit hypothesis is not explicitly modelled to test the theory or provide novel insights.  

      As pointed out by the reviewer, the neural circuit that inhibits the SMB motor neurons was not explicitly incorporated in our model. We then examined whether our minimal network model could reproduce dispersal behavior under starvation conditions solely due to the experimentally identified inhibitory effect of SMB motor neurons.

      Impact:

      This research underscores the value of an embodied approach to understanding chemotaxis, addressing an important memory mechanism that enables adaptive behavior in the sensorimotor circuits supporting C. elegans chemotaxis. The principle of operation - the dependence of motor responses to sensory inputs on the phase of oscillation - appears to be a convergent solution to taxis. Similar mechanisms have been proposed in Drosophila larvae chemotaxis [2], zebrafish phototaxis [3], and other systems. Consequently, the proposed mechanism has broader implications for understanding how adaptive behaviors are embedded within sensorimotor systems and how experience shapes these circuits across species.

      We would like to express our gratitude for the useful suggestion. We will add the argument that the reviewer mentioned in the revised version.

      Although the reported reversal of synaptic connection from excitatory to inhibitory is an exciting phenomenon of broad interest, it is not entirely new, as the authors acknowledge similar reversals have been reported in ASER-to-AIB signaling for klinokinesis (Hiroki et al., 2022). The proposed reversal of the ASER-to-AIY synaptic connection from inhibitory to excitatory is a novel contribution in the specific context of klinotaxis. While the ASER's role in gradient sensing and memory encoding has been previously identified, the current paper mechanistically models these processes, introducing a hypothesis for synaptic plasticity as the basis for bidirectional salt preference in klinotaxis.

      The research also highlights how internal states, such as hunger, can dynamically reshape sensory-motor programs to drive context-appropriate behaviors.  

      The methodology of parameter search on a neural model of a connectome used here yielded the valuable insight that connectome information alone does not provide enough constraints to reproduce the neural circuits for behaviour. It demonstrates that additional neurophysiological constraints are required.  

      We would like to acknowledge the appropriate recognition of our work.

      Additional Context 

      Oscillators with stimulus-driven perturbations appear to be a convergent solution for taxis and navigation across species. Similar mechanisms have been studied in zebrafish phototaxis [3], Drosophila larvae chemotaxis [2], and have even been proposed to underlie search runs in ants.

      The modulation of taxis by context and memory is a ubiquitous requirement, with parallels across species. For example, Drosophila larvae modulate taxis based on current food availability and predicted rewards associated with odors, though the underlying mechanism remains elusive. The synaptic reversal mechanism highlighted in this study offers a compelling framework for understanding how taxis circuits integrate context-related memory retrieval more broadly.  

      We would like to express our gratitude for the insightful commentary. In the revised version, we will add to the discussion that a similar oscillator mechanism with stimulus-driven perturbations has been observed in zebrafish phototaxis [3] and Drosophila larvae chemotaxis [2].

      As a side note, an interesting difference emerges when comparing C. elegans and Drosophila larvae chemotaxis. In Drosophila larvae, oscillatory mechanisms are hypothesized to underlie all chemotactic reorientations, ranging from large turns to smaller directional biases (weathervaning). By contrast, in C. elegans, weathervaning and pirouettes are treated as distinct strategies, often attributed to separate neural mechanisms. This raises the possibility that their motor execution could share a common oscillator-based framework. Re-examining their overlap might reveal deeper insights into the neural principles underlying these maneuvers. 

      We would like to acknowledge your thoughtfully articulated comment. As pointed out by the reviewer, and based on the anatomical database (http://ims.dse.ibaraki.ac.jp/ccep-tool/), we found that the neural circuits underlying weathervaning and pirouettes in C. elegans are predominantly distinct but exhibit partial overlap. When we restrict our search to the neurons that are connected to each other with the highest number of synaptic connections, we identify projections from the weathervaning circuit to the pirouette circuit; however, we observed no projections in the reverse direction. This finding suggests that the weathervaning circuit, namely our minimal neural network, is unlikely to be affected by the pirouette circuit, which consists of AIB interneurons and their downstream interneurons and motor neurons.

      (1) Luo, L., Wen, Q., Ren, J., Hendricks, M., Gershow, M., Qin, Y., Greenwood, J., Soucy, E.R., Klein, M., Smith-Parker, H.K., & Calvo, A.C. (2014). Dynamic encoding of perception, memory, and movement in a C. elegans chemotaxis circuit. Neuron, 82(5), 1115-1128. 

      (2) Antoine Wystrach, Konstantinos Lagogiannis, Barbara Webb (2016) Continuous lateral oscillations as a core mechanism for taxis in Drosophila larvae eLife 5:e15504. 

      (3) Wolf, S., Dubreuil, A.M., Bertoni, T. et al. Sensorimotor computation underlying phototaxis in zebrafish. Nat Commun 8, 651 (2017). 

      (4) Izquierdo, E.J. and Beer, R.D., 2013. Connecting a connectome to behavior: an ensemble of neuroanatomical models of C. elegans klinotaxis. PLoS computational biology, 9(2), p.e1002890. 

      Reviewer #2 (Public review): 

      Summary: 

      This study explores how a simple sensorimotor circuit in the nematode C. elegans enables it to navigate salt gradients based on past experiences. Using computational simulations and previously described neural connections, the study demonstrates how a single neuron, ASER, can change its signaling behavior in response to different salt conditions, with which the worm is able to "remember" prior environments and adjust its navigation toward "preferred" salinity accordingly.  

      We would like to express our gratitude for the time and consideration the reviewer has dedicated to reviewing our manuscript.

      Strengths: 

      The key novelty and strength of this paper is the explicit demonstration of computational neurobehavioral modeling and evolutionary algorithms to elucidate the synaptic plasticity in a minimal neural circuit that is sufficient to replicate memory-based chemotaxis. In particular, with changes in ASER's glutamate release and sensitivity of downstream neurons, the ASER neuron adjusts its output to be either excitatory or inhibitory depending on ambient salt concentration, enabling the worm to navigate toward or away from salt gradients based on prior exposure to salt concentration.

      We would like to thank the reviewer for appreciating our research. 

      Weaknesses: 

      While the model successfully replicates some behaviors observed in previous experiments, many key assumptions lack direct biological validation. As to the model output readouts, the model considers only endpoint behaviors (chemotaxis index) rather than the full dynamics of navigation, which limits its predictive power. Moreover, some results presented in the paper lack interpretation, and many descriptions in the main text are overly technical and require clearer definitions.  

      We would like to thank the reviewer for the constructive feedback. As the reviewer noted, the fundamental assumptions posited in the study have yet to be substantiated by biological validation. Consequently, these assumptions must be directly assessed by biological experimentation. The model performance for salt klinotaxis is evaluated by multiple measures, including not only a chemotaxis index but also the curving rate vs. bearing (Fig. 4a; the bearing is defined in Fig. A3) and the curving rate vs. normal gradient (Fig. 4c). The latter two measures characterize the trajectory during salt klinotaxis. In the revised version, we will carefully revise the manuscript according to the reviewer's suggestions. We would like to express our sincere gratitude for your insightful review of our work.

    1. Author response:

      We thank all the reviewers for their detailed comments. In response, we will address the comments with further analysis, experiments and an expanded discussion.

      In terms of each specific reviewer's comments:

      Reviewer 1 was positive overall but had several suggestions and requested more rigorous controls. These are highly constructive technical concerns and will be addressed through additional experimentation and methods for quantification.

      Reviewer 2 summarised the strengths of the study as being largely confirmatory. They have perhaps not fully appreciated that this is the first published functional assessment of cerebral vascular permeability in a pericyte deficient zebrafish model.

      The reviewer has made a number of very helpful suggestions to improve technical aspects of the analysis. Many align with the suggestions of Reviewer 1. Additional experiments that include more rigorous controls and further methods to quantify vessel permeability will address these concerns in revision.

      We also note that the reviewer calls for a more nuanced and careful discussion section. We take the reviewer's point and do appreciate their concerns. We were limited by word count in the initial submission in short report format, but in response will expand and provide a more thorough discussion.

      Reviewer 3 was positive overall but has suggested additional controls and experiments to further strengthen the findings and support our conclusions. Some align with the suggestions of Reviewers 1 and 2. We agree and aim to address them through additional work in revision.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank Reviewers for highlighting the strengths of our work along with suggestions for future directions.

      We agree with the Reviewers that RPS26 depletion may impact not only RAN translation initiation and codon selection (as shown in the experiments in Figure 4G), but also other mechanisms, such as the speed of PIC scanning, as we stated in the discussion. However, we did provide data showing that the mRNA level of exogenous FMR1-GFP does not change upon RPS26 depletion (Figure 3B&C); hence, the observed effect most likely stems from translational regulation. In addition, an experiment with ASO-ACG treatment (Figure 4G) suggests that near-cognate start codon selection or the speed of PIC scanning may be part of the regulation of RAN translation that is sensitive to RPS26 depletion. Furthermore, our latest unpublished results (Niewiadomska D. et al., in revision) indicate that FMRpolyG fused with GFP is fairly stable, particularly when derived from long repeats (>90xCGG), suggesting that protein stability is not at play in RPS26-dependent regulation.

      We would like to stress that, in order to avoid bias in result interpretation and to mimic the natural situation, the majority of experiments concerning levels of FMRpolyG were performed in cell models with stable expression of ACG-initiated FMRpolyG. Currently, we do not possess a cell model with stable expression of AUG-initiated FMRpolyG, and experiments based on a transient transfection system would not necessarily be comparable to the results obtained in the stable expression system. However, we believe that the experiment presented in Figure 2B serves as a good control for the overall translation level upon RPS26 depletion, indicating that RPS26 insufficiency does not affect global translation and that the observed regulation is specific to some mRNAs, including the one encoding the FMRpolyG frame. We also show that the level of ca. 80% of identified canonical proteins, including FMRP, did not change upon RPS26 silencing (SILAC-MS, Figure 4A). Indeed, we did not explore the ribosome composition upon RPS26 and TSR2 depletion, although most likely the pool of functional ribosomes in the cell is sufficient to support the basal translation level (SUnSET assays, Figure 2B & 5C). However, we cannot exclude the possibility that for some mRNAs, including the one encoding FMRpolyG, the observed effect is partially caused by a lower number of fully active ribosomes, especially in transient transfection experiments, where transgene expression is hundreds of times higher than for an average native mRNA.

      Finally, we agree with the Reviewer that an in vitro translation assay would provide evidence of a direct effect of RPS26 on the FMRpolyG level; however, we did not manage to overcome technical difficulties in obtaining cellular lysate devoid of RPS26 from vendor companies.


      The following is the authors’ response to the original reviews.

      General Comments

      We thank the Reviewers for the critical comments and experimental suggestions. We considered most of the advice in the revised version of the manuscript, which allowed for a more balanced interpretation of the results presented and further supported the major statement of the manuscript: that insufficiency of RPS26 and RPS25 plays a role in modulating the efficiency of noncanonical RAN translation from FMR1 mRNA, which results in the production of toxic polyglycine protein (FMRpolyG). Firstly, by performing new experiments, we showed that silencing of RPS26 and its chaperone protein TSR2, which regulates loading/exchange of RPS26 in the maturing small ribosomal subunit, did not elicit global translation inhibition. Secondly, we demonstrated that, in contrast to RPS26 and RPS25 depletion, silencing the RPS6 protein, a core component of the 40S subunit, did not affect FMRpolyG production, further supporting the specific effect of RPS26 and RPS25 on RAN translation regulation of mutant FMR1 mRNA. We also observed that depletion of RPS26, RPS25 and RPS6 had a significant negative effect on cell proliferation, which is in line with previously published results indicating that insufficiencies of ribosomal proteins negatively affect cell growth. Moreover, we showed that FMRpolyG production is significantly affected by RPS26 depletion when initiated at ACG, but not at other near-cognate start codons. Importantly, translation of FMRP initiated at the canonical AUG codon of the same mRNA upstream of the CGGexp was not affected by RPS26 silencing, similarly to the vast majority of the human proteome. This implies that RAN translation of FMR1 mRNA mediated by RPS26 insufficiency is likely to be dependent on start codon selection/fidelity. In essence, we provide several lines of evidence indicating that the cellular amount of the 40S ribosomal proteins RPS26 and RPS25 is an important factor in CGG-related RAN translation regulation. Finally, we also decided to tone down our claims. Now, we state that RPS26/25/TSR2 insufficiency or depletion affects RAN translation, rather than that the composition of the 40S ribosomal subunit per se influences RAN translation. We have addressed all specific concerns below and made changes to the new version of the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Tutak et al use a combination of pulldowns, analyzed by mass spectrometry, reporter assays, and fluorescence experiments to decipher the mechanism of protein translation in fragile X-related diseases. The topic is interesting and important.

      Although a role for Rps26-deficient ribosomes in toxic protein translation is plausible based on already available data, the authors' data are not carefully controlled and thus do not support the conclusions of the paper.

      We sincerely appreciate your rigorous, insightful, and constructive feedback throughout the revision process. We believe your guidance has been instrumental in significantly enhancing the quality of our research. Below, we have addressed your comments point-by-point.

      Strengths:

      The topic is interesting and important.

      Weaknesses:

      In particular, there is very little data to support the notion that Rps26-deficient ribosomes are even produced under the circumstances. And no data that indicate that they are involved in the RAN translation. Essential controls (for ribosome numbers) are lacking, no information is presented on the viability of the cells (Rps26 is an essential protein), and the differences in protein levels could well arise from block in protein synthesis, and cell division coupled to differential stability of the proteins.

      We agree that the data presented in the first version of the manuscript did not directly address the following processes: ribosome content, global translation rate and cell viability upon RPS26 depletion. We therefore addressed some of these issues in the revised version of the manuscript. In particular, we showed that RPS26 and TSR2 knockdown did not inhibit global translation (new Figure 2B & 4C); hence, we concluded that the changes in FMRpolyG level did not arise from a general translational shutdown. On the other hand, RPS26, RPS25 and RPS6 depletion negatively affected cell proliferation (new Figure 2A,5D,6C), which is in line with a number of previously published studies (e.g. Cheng et al, 2019; Havkin-Solomon et al, 2023). However, the extent of the proliferation abnormalities is limited. We agree that the observed effects on RAN translation from mutant FMR1 mRNA may stem from a combination of altered protein synthesis and the condition of the cells, but also from cis-acting factors of mRNA sequence/structure. In new experiments, we showed that single-nucleotide substitution of ACG by other near-cognate start codons changes the sensitivity of RAN translation to RPS26 insufficiency (new Figure 4F). In addition, the inhibitory effect of an antisense oligonucleotide binding to the region of the 5’UTR containing the ACG initiation codon (ASO_ACG) differs between cells differing in the amount of RPS26 (new Figure 4G).

      We also agree that our data only partially support the role of RPS26-deficient ribosomes in RAN translation. Therefore, we have toned down our claims. Now, we state that the RPS26/25/TSR2 insufficiency or depletion affects RAN translation. We also changed the title of the manuscript to: “Insufficiency of 40S ribosomal proteins, RPS26 and RPS25, negatively affects biosynthesis of polyglycine-containing proteins in fragile-X associated conditions” (previously: “Ribosomal composition affects the noncanonical translation and toxicity of polyglycine-containing proteins in fragile X-associated conditions”).

      Specific points:

      (1) Analysis of the mass spec data in Supplemental Table S3 indicates that for many of the proteins that are differentially enriched in one sample, a single peptide is identified. So the difference is between 1 peptide and 0. I don't understand how one can do a statistical analysis on that, or how it would give out anything of significance. I certainly do not think it is significant. This is exacerbated by the fact that the contaminants in the assay (keratins) are many, many-fold more abundant, and so are proteins that are known to be mitochondrial or nuclear, and therefore likely not actual targets (e.g. MCCC1, PC, NPM1; this includes many proteins "of significance" in Table S1, including Rrp1B, NAF1, Top1, TCEPB, DHX16, etc...).

      The data in Table S6/Figure 3A suffer from the same problem.

      I am not convinced that the mass spec data is reliable.

      We thank the Reviewer for the comment concerning the MS data; however, we believe that it may stem from a misunderstanding of the data presented in Table S3 and S6. Both tables represent the output from MaxQuant analysis (the so-called ProteinGroup table) of MS .raw files, without any filtering. As stated in the Material&Methods, we applied the default parameters suggested by the MaxQuant developers to analyze the MS data; these include identification of proteins based on at least 1 unique peptide, and thus some of the proteins with only 1 unique peptide are shown in Tables S1 and S3. The Reviewer is also right that common contaminants, such as keratins, are included in this output table. However, these identifications are denoted as “CON_” and are filtered out during statistical analysis in the Perseus software. During the statistical analysis, we first filtered out irrelevant protein group identifications, such as contaminants or those only identified by site modifications.
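
      The filtering step described above can be sketched roughly as follows. This is only an illustration in Python, not the Perseus workflow that was actually used: the column names ("Protein IDs", "Unique peptides", "Potential contaminant", "Only identified by site") are assumed to follow the usual MaxQuant protein-groups layout, the example rows are made up, and the function names are hypothetical.

```python
# Illustrative sketch only: the filtering in the manuscript was done in Perseus.
# Column names are assumed to follow the usual MaxQuant protein-groups layout;
# all rows below are made-up placeholders, not real data.

def is_contaminant(row):
    """True if the group is flagged as a contaminant or carries a CON_ prefix."""
    ids = row["Protein IDs"].split(";")
    flagged = row.get("Potential contaminant", "") == "+"
    return flagged or any(pid.startswith("CON_") for pid in ids)


def filter_protein_groups(rows, min_unique_peptides=1):
    """Drop contaminants and site-only identifications, then apply a
    minimum unique-peptide threshold (1 mirrors the default mentioned above)."""
    kept = []
    for row in rows:
        if is_contaminant(row):
            continue
        if row.get("Only identified by site", "") == "+":
            continue
        if int(row["Unique peptides"]) < min_unique_peptides:
            continue
        kept.append(row)
    return kept


# Hypothetical example rows
example_rows = [
    {"Protein IDs": "EXAMPLE_A", "Unique peptides": "5",
     "Potential contaminant": "", "Only identified by site": ""},
    {"Protein IDs": "CON__EXAMPLE_KERATIN", "Unique peptides": "12",
     "Potential contaminant": "+", "Only identified by site": ""},
    {"Protein IDs": "EXAMPLE_B", "Unique peptides": "1",
     "Potential contaminant": "", "Only identified by site": ""},
    {"Protein IDs": "EXAMPLE_C", "Unique peptides": "3",
     "Potential contaminant": "", "Only identified by site": "+"},
]

if __name__ == "__main__":
    for row in filter_protein_groups(example_rows):
        print(row["Protein IDs"], row["Unique peptides"])
```

      Raising `min_unique_peptides` to 2 in such a sketch would additionally remove single-peptide identifications, which is the stricter criterion the Reviewer's comment implies.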

      We have changed the names of the Supplementary Table files, giving more detailed descriptions. We hope this will help to avoid misunderstanding among a broader audience. Secondly, when comparing the data presented in Table S3 and the volcano plot presented in Figure 1B, one can notice that indeed the majority of identified proteins are not statistically significant (grey points) and thus were not selected for further stratification. The lack of significance of these proteins may be partially due to poor MS identification; however, they are not included in the following parts of the manuscript. Further, we selected only eight proteins (out of over 150) for stratification by orthogonal techniques; thus, we argue that this step validates the biological relevance of the chosen candidate RAN-translation modifiers. One should also keep in mind that pull-down samples analyzed by MS often yield lower intensity and identification rates compared to whole-cell analysis, as a result of lower protein input or the stringent washes used during sample preparation.

      Regarding the data presented in Table S6 (SILAC data), we argue that these data are of very good quality. More than 2,000 proteins were identified in a 125min gradient, with over 80% of proteins that were identified with at least 2 unique peptides. Each of three biological replicates was analyzed three times (technical replicates), giving total of 9 high resolution MS runs. Together, we strongly believe that this data is of high confidence.

      (2) The mass-spec data however claims to identify Rps26 as a factor binding the toxic RNA specifically. The rest of the paper seeks to develop a story of how Rps26-deficient ribosomes play a role in the translation of this RNA. I do not consider that this makes sense.

      Indeed, we identified RPS26 as a protein that co-precipitated with FMR1 containing expanded CGG repeats (Supplementary Figure 1G) and found that depletion of RPS26 hindered RAN translation of FMRpolyG, suggesting that RPS26 positively affects RAN translation. However, we did not state that RPS26 directly interacts with toxic RNA. In order to confirm the specificity of RAN translation regulation by RPS26 insufficiency, we tested whether depletion of another 40S ribosomal protein, RPS6, affects FMRpolyG synthesis. Our experiments showed no significant effect on RAN translation efficiency after RPS6 silencing (new Figure 5C). Importantly, we showed that RPS26 depletion did not inhibit global translation (new Figure 2B). In addition, mutagenesis of the near-cognate start codon (new Figure 4F) and ASO_ACG treatment (new Figure 4G) provided evidence that modulation of FMRpolyG biosynthesis by the RPS26 level may depend on start codon selection. In essence, our data suggest that RPS26 depletion specifically affects synthesis of FMRpolyG, but not of FMRP derived from the same FMR1 mRNA with CGGexp. However, we do not claim that the observed effect is the consequence of a direct interaction between RPS26 and the 5’UTR of FMR1 mRNA. Downregulation of FMRpolyG biosynthesis could be an outcome of altered ribosomal assembly, decreased efficiency and fidelity of PIC scanning/initiation, impeded elongation, or a combination of all these processes. In the manuscript we presented the results of experiments which tested many of these possibilities.

      (3) Rps26 is an essential gene, I am sure the same is true for DHX15. What happens to cell viability? Protein synthesis? The yeast experiments were carefully carried out under experiments where Rps26 was reduced, not fully depleted to give small growth defects.

      We agree with the Reviewer that RPS26 and DHX15 are essential proteins, similarly to all RNA binding proteins, and caution should be taken during experimental design. To address this, we titrated different concentrations of siRPS26, and found that administration of 5 nM siRPS26, which just partially silenced RPS26, decreased FMRpolyG by around 50% (new Figure 1D). This impact was even greater with 15 nM siRPS26, as we observed around 80% decrease of FMRpolyG.

      Havkin-Solomon et al. (2023) showed that the proliferation rate is decreased in cells with a mutated C-terminus of RPS26, which is required for contacting mRNA. In accordance with this study, we showed that cells with knocked-down RPS26 proliferate less efficiently (new Figure 2A), but depletion of RPS26 did not impact global translation (new Figure 2B). In addition, our SILAC-MS data indicate that ~80% of proteins with determined expression level were not affected by RPS26 insufficiency, and ~20% of the proteins turned out to be sensitive to RPS26 decrease, although these data do not take protein stability into account.

      (4) Knockdown efficiency for all tested genes must be shown to evaluate knockdown efficiency.

      The current version of the manuscript contains representative western blots with validation of knock-down efficiency (for example in Figure 3B, C, E, Figure 6A) and we included knock-down validations where applicable (Figures 1D, 2B, 4G and 5C).

      (5) The data in Figure 1E have just one mock control, but two cell types (control si and Rps26 depletion).

      Mock control corresponds to the cells treated with lipofectamine reagent and was included in the study to determine the “background” signal from cells treated with delivery agent and reagents used to measure the apoptosis process. These cells were neither expressing FMRpolyG, nor siRNAs. Luminescence signals were normalized to the values obtained from mock control. We added more details describing this assay in the Figure 1 legend.

      (6) The authors' data indicate that the effects are not specific to Rps26 but indeed also observed upon Rps25 knockdown. This suggests strongly that the effects are from reduced ribosome content or blocked protein synthesis. Additional controls should deplete a core RP to ascertain this conclusion.

      We agree that the observed effects may stem from reduced ribosome content; however, we argue that this is not the only possible explanation. Previously, it was shown that RPS25 regulates G4C2-related RAN translation, but that knockout of RPS25 does not affect global translation (Yamada S, 2019, Nat. Neuroscience). Similarly, we showed that KD of RPS26 or TSR2 did not significantly reduce the global translation rate (SUnSET assay; new Figure 2B and 5C, respectively).

      Moreover, in the new version of the manuscript we included a control experiment in which we silenced a core ribosomal protein (RPS6) and found that RPS6 depletion did not affect RAN translation from mutant FMR1 mRNA (new Figure 5C), thus strengthening our conclusion about specific RAN translation regulation by the level of RPS26 and RPS25.

      Finally, our observation aligns well with current knowledge about how deficiency of different ribosomal proteins alters translation of some classes of mRNAs (Luan Y, 2022, Nucleic Acids Res; Cheng Z, 2019, Mol Cell). It was shown that depletion of RPS26 affects translation rate of different mRNAs compared to depletion of other proteins of small ribosomal subunit.

      (7) Supplemental Figure S3 demonstrates that the depletion of S26 does not affect the selection of the start codon context. Any other claim must be deleted. All the 5'-UTR logos are essentially identical, indicating that "picking" happens by abundance (background).

      Supplementary Figure 3D shows that the mutation at the -4 position (from G to A) did not affect RAN translation regardless of RPS26 presence or depletion. However, this result does not imply that RPS26 has no effect on start codon selection in other sequence or RNA structure contexts. We verified this particular -4 position because it was previously suggested to be an important RPS26-sensitive site in yeast (Ferretti M, 2017, Nat Struct Mol Biol). We agree with the Reviewer that none of the 5’UTR logos presented in our paper showed statistical significance at any tested position for human mRNAs. On the other hand, we observed that regulation sensitive to the RPS26 level depends on the start codon used for RAN translation, in particular ACG initiation (new Figure 4F&G). RPS26 depletion affected ACG-initiated but not GTG- or CTG-initiated RAN translation.

      In the previous version of the manuscript, we wrote that we did not identify any specific motifs or enrichment within the analyzed transcripts in comparison to the background. On the other hand, we found that the GC-content among the analyzed transcripts is higher within 5’UTRs and in close proximity to ATG in coding sequences (Figure 4D), which suggests the importance of stable RNA structures in this region. In addition, we showed that mRNAs encoding proteins responding to RPS26 depletion have shorter than average 5’UTRs (new Figure 4E).

      (8) Mechanism is lacking entirely. There are many ways in which ribosomes could have mRNA-specific effects. The authors tried to find an effect from the Kozak sequence, unsuccessfully (however, they also did not do the experiment correctly, as they failed to recognize that the Kozak sequence differs between yeast, where it is A-rich, and mammalian cells, where it is GGCGCC). Collisions could be another mechanism.

      Indeed, collisions as well as other mechanisms, such as skewed start codon fidelity, may affect the efficiency of FMRpolyG biosynthesis. In the current version of the manuscript, we show that the regulation sensitive to the amount of RPS26 seems to be start codon selection-dependent (new Figure 4F&G).

      Reviewer #2 (Public Review):

      Summary:

      Translation of CGG repeats leads to the accumulation of poly G, which is associated with neurological disorders. This is a valuable paper in which the authors sought out proteins that modulate RAN translation. They determined which proteins in Hela cells bound to CGG repeats and affected levels of polyG encoded in the 5'UTR of the FMR1 mRNA. They then showed that siRNA depletion of ribosomal protein RPS26 results in less production of FMR1polyG than in control. There are data supporting the claim that RPS26 depletion modulates RAN translation in this RNA, although for some results, the Western results are not strong. The data to support increased aggregation by polyG expression upon S26 KD are incomplete.

      We thank the Reviewer for critical comments and suggestions. We sincerely appreciate your rigorous, insightful, and constructive feedback throughout the revision process.

      Below each specific point, we addressed the mentioned issues.

      Strengths:

      The authors have proteomics data that show the enrichment of a set of proteins on FMR1 RNA but not a related RNA.

      We thank the Reviewer for appreciating the MS-screening results, which identified proteins enriched on FMR1 RNA with expanded CGG repeats.

      Weaknesses:

      - It is insinuated that RPS26 binds the RNA to enhance CGG-containing protein expression. However, RPS26 reduction was also shown previously to affect ribosome levels, and reduced ribosome levels can result in ribosomes translating very different RNA pools.

      In the previous version of the manuscript, we did not state that RPS26 binds directly to RNA with expanded CGG repeats, and we did not show any experiment indicating a direct interaction between the studied RNA and RPS26. What we showed is that RPS26 was enriched in the FMR1 RNA MS samples; however, we did not verify whether this is a direct or indirect interaction. We also tested the hypothesis that the lack of RPS26 in the PIC may affect the efficiency of RAN translation initiation via a specific Kozak context previously described in yeast (Ferretti M, 2017, Nat Struct Mol Biol). As we described, this hypothesis was not supported. However, we showed that other features of 5’UTR sequences (e.g. higher GC-content or a shorter leader sequence) are potentially important for translation efficiency in cells with depleted RPS26.

      Indeed, RPS26 is involved in 40S maturation steps (Plassart L, 2021, eLife), and its insufficiency, mutation, or blocked inclusion into the 40S ribosome may result in incomplete 40S maturation, which subsequently might negatively affect translation per se. However, we did not observe global translation inhibition after depletion of RPS26 or of TSR2, the chaperone involved in incorporation/exchange of RPS26 into the small ribosomal subunit (new Figure 2B and 5C). In addition, our SILAC-MS data indicate that the majority of studied proteins (including FMRP, the main product of the FMR1 gene) were not affected by RPS26 depletion, which can be cautiously extrapolated to global translation. In the revised manuscript, we also showed that relatively weak silencing of RPS26 also decreased FMRpolyG production in model cells (new Figure 1D).

      We agree that reduced ribosome levels can result in different translation efficiencies for different RNA pools. We emphasize this statement in the revised manuscript. However, we also showed that introducing different near-cognate start codons specific to RAN translation into the same mRNA (single- or two-nucleotide substitutions), or targeting this codon with antisense oligonucleotides, altered the sensitivity of FMR1 mRNA translation to RPS26 depletion (new Figure 4F).

      - A significant claim is that RPS26 KD alleviates the effects of FMRpolyG expression, but those data aren't presented well.

      We thank the Reviewer for this comment. In the new version of the manuscript, we have added new microscopic images and improved the explanation of Figure 1E. We have also completed the interpretation of Figure 1F in the main text, the figure image and the figure legend, and we hope that these changes will improve understanding of our data.

      Recommendations For The Authors:

      - A significant claim is that RPS26 KD alleviates the effects of FMR polyG expression, but those data aren't presented well:

      Figure 1D (supporting data in S2) and 2D - the authors need to show representative images of a control that has aggregation and indicate aggregates being counted on an image. The legend states that there are no aggregates, but the quantification of aggregates/nucleus is ~1, suggesting there are at least 1 per cell. It is preferred to show at least a representative of what is quantified in the main figure instead of a bar graph.

      The representative images of control and siRPS26-treated cells are now shown in revised version of Figure 1E. Additionally, we completed the Figure legend concerning this part, as well as extended description of the experiment in Materials&Methods section.

      Figure 1E - it is unclear what luminescence signal is being measured. Is this a dye for an apoptotic marker? More information is needed in the legend.

      This information was added to the legend of modified Figure 1F (previously 1E) as suggested.

      - Some of the Western blots are not very convincing. Better evidence for the changes in bar graphs would improve how convincing the data are:

      Fig 2B. The western for FMR95G in the first model is not very convincing. The difference by eye for the second siRNA seems to give a larger effect than the first for 95G construct but they appear almost the same on the graph. More supporting information for the quantification is needed.

      We provided a better explanation of WB quantification in the M&M section of the manuscript. Also, we provided an additional blot demonstrating an independent biological replicate of the mentioned experiment in the supplementary materials (Supplementary Figure S2E).

      Figure 4A, the blots for RPS26 and FMR95G are not convincing. They are quite smeary compared to all of the others shown for these proteins in other figures. Could a different replicate be shown?

      We provided additional blot demonstrating the effect on transiently expressed FMRpolyG affected by depletion of TSR2 in COS7 cell line (Supplementary Figure S4A).

      Figure 5A and 5B blots are not ideal. Could a different replicate be shown? Or show multiple replicates in the supplemental figure?

      We provided additional blots from the same experiment, although data is not statistically significant, most likely due to low quality of normalization factor, which is Vinculin (Supplementary Figure S5A). Nevertheless, the level of FMRpolyG is decreased by ~70% after RPS25 silencing in SH-SY5Y cells.

      Figure 2C. Please use the same y axes for all four Westerns in B and C. One would like to compare 95 and 15 repeats, but it is difficult when the y axes are different.

      Thank you for this comment. The y axis was adjusted as suggested by the Reviewer.

      Figure 3D-The text suggests a significant difference between positive and negative responders that is not clear in the figure.

      In the main body of the manuscript we state that: “We did not observe any significant differences in the frequency of individual nucleotide positions in the 20-nucleotide vicinity of the start codon relative to the expected distribution in the BG”, which is in line with the graph shown in Figure 4D (previously 3D).

      Reviewer #3 (Public Review):

      Tutak et al provide interesting data showing that RPS26 and relevant proteins such as TSR2 and RPS25 affect RAN translation from CGG repeat RNA in fragile X-associated conditions. They identified RPS26 as a potential regulator of RAN translation by RNAtagging system and mass spectrometry-based screening for proteins binding to CGG repeat RNA and confirmed its regulatory effects on RAN translation by siRNA-based knockdown experiments in multiple cellular disease models and patient-derived fibroblasts. Quantitative mass spectrometry analysis found that the expressions of some ribosomal proteins are sensitive to RPS26 depletion while approximately 80% of proteins including FMRP were not influenced. Since the roles of ribosomal proteins in RAN translation regulation have not been fully examined, this study provides novel insights into this research field. However, some data presented in this manuscript are limited and preliminary, and their conclusions are not fully supported.

      (1) While the authors emphasized the importance of ribosomal composition for RAN translation regulation in the title and the article body, the association between RAN translation and ribosomal composition is apparently not evaluated in this work. They found that specific ribosomal proteins (RPS26 and RPS25) can have regulatory effects on RAN translation (Figures 1C, 2B, 2C, 2E, 4A, 5A, and 5B), and that the expression levels of some ribosomal proteins can be changed by RPS26 knockdown (Figure 3B, however, the change of the ribosome compositions involved in the actual translation has not been elucidated). Therefore, their conclusive statement, that is, "ribosome composition affects RAN translation" is not fully supported by the presented data and is misleading.

      We thank the Reviewer for critical comments and suggestions. We agree that the initial title and some statements in the text were misleading and the presented data did not fully support the aforementioned statement regarding ribosomal composition affecting FMRpolyG synthesis. Therefore, in the revised version of the manuscript we included a control experiment indicating that depletion of another core 40S ribosomal protein (RPS6) did not impact the FMRpolyG synthesis (new Figure 5C), which supports our hypothesis that RPS26 and RPS25 are specific CGG-related RAN translation modifiers. To precisely deliver a main message of our work, we changed the title that will indicate the specific effect of RPS26 and RPS25 insufficiency on RAN translation of FMRpolyG. Proposed title: “Insufficiency of 40S ribosomal proteins, RPS26 and RPS25 negatively affects biosynthesis of polyglycine-containing proteins in fragile-X associated conditions”. We also changed all statements regarding “ribosomal composition” in main text of the new version of manuscript.

      (2) The study provides insufficient data on the mechanisms of how RPS26 regulates RAN translation. Although authors speculate that RPS26 may affect initiation codon fidelity and regulate RAN translation in a CGG repeat sequence-independent manner (Page 9 and Page 11), what they really have shown is just identification of this protein by the screening for proteins binding to CGG repeat RNA (Figure 1A, 1B), and effects of this protein on CGG repeat-RAN translation. It is essential to clarify whether the regulatory effect of RPS26 on RAN translation is dependent on CGG repeat sequence or near-cognate initiation codons like ACG and GUG in the 5' upstream sequence of the repeat. It would be better to validate the effects of RPS26 on translation from control constructs, such as one composed of the 5' upstream sequence of FMR1 with no CGG repeat, and one with an ATG substitution in the 5' upstream sequence of FMR1 instead of near-cognate initiation codons.

      We agree that the data presented in the manuscript imply that insufficiency of RPS26 plays a pivotal role in the regulation of CGG-related RAN translation, and in the revised version of the manuscript we included a series of experiments indicating that ACG codon selection seems to be an important part of RPS26 level-dependent regulation of polyglycine production (new Figure 4F&G; see point 3 below for more details). Importantly, in the luciferase assay shown in Figure 4F we used the AUG-initiated firefly luciferase reporter as a normalization control.

      Moreover, to verify if FMRpolyG response to RPS26 deficiency depends on the type of reporter used, we repeated many experiments using FMRpolyG fused with different tags. The luciferase-based assays were in line with experiments conducted on constructs with GFP tag (new Figure 1D), thus strengthening our previous data. Moreover, in the series of experiments, we show that FMRP synthesis which is initiated from ATG codon located in FMR1 exon 1, was not affected by RPS26 depletion (Figure 3E & 4C), even though its translation occurs on the same mRNA as FMRpolyG. This indicates a specific RPS26 regulation of polyglycine frame initiated from ACG near cognate codon.

      (3) The regulatory effects of RPS26 and other molecules on RAN translation have all been investigated as effects on the expression levels of FMRpolyG-GFP proteins in cellular models expressing CGG repeat sequences Figures 1C, 2B, 2C, 2E, 4A, 5A, and 5B). In these cellular experiments, there are multiple confounding factors affecting the expression levels of FMRpolyG-GFP proteins other than RAN translation, including template RNA expression, template RNA distribution, and FMRpolyG-GFP protein degradation. Although authors evaluated the effect on the expression levels of template CGG repeat RNA, it would be better to confirm the effect of these regulators on RAN translation by other experiments such as in vitro translation assay that can directly evaluate RAN translation.

      We agree that multiple factors affect the final levels of FMRpolyG-GFP proteins, including the aforementioned processes. We evaluated the level of FMR1 mRNA, which was not decreased upon RPS26 depletion (Figure 3B&C); we therefore assumed that what we observed was regulation at the level of translation, especially since RPS26 is a ribosomal protein that contacts mRNA in the E-site. We believe that direct assays such as in vitro translation may be beneficial; however, depletion of RPS26 from the cellular lysate provided by the vendor seems technically challenging, if not impossible. Instead, we focused on sequence/structure-specific regulation of RAN translation, with emphasis on start codon selection. This generated valuable results pointing to a role of RPS26 in start codon fidelity (Figure 4F&G). These new results showed that translation from mRNAs differing by only a single- or two-nucleotide substitution in the near-cognate start codon (ACG to GUG or ACG to CUG), although yielding exactly the same protein, differs in its sensitivity to RPS26 silencing (new Figure 4F). Similar differences were observed for translation efficiency from the same mRNA targeted or not with an antisense oligonucleotide complementary to the region of the RAN translation initiation codon (new Figure 4G). These results also suggest that the stability of FMRpolyG is not affected in cells with a decreased level of RPS26.

      (4) While the authors state that RPS26 modulated the FMRpolyG-mediated toxicity, they presented limited data on apoptotic markers, not cellular viability (Figure 1E), not fully supporting this conclusion. Since previous work showed that FMRpolyG protein reduces cellular viability (Hoem G, 2019,Front Genet), additional evaluations for cellular viability would strengthen this conclusion.

      We thank the Reviewer for this suggestion. We addressed the apoptotic process in order to determine the effect of RPS26 depletion on RAN translation-related toxicity (Figure 1F). In the revised version of the manuscript, we also added an evaluation of how cell proliferation was affected by RPS26, RPS25, RPS6 and TSR2 depletion. Our data indicate that TSR2 silencing slightly impacted cellular fitness (new Figure 5D), whereas insufficiencies of RPS26, RPS25 and RPS6 had a much stronger negative effect on proliferation (new Figure 2A, 5D, 6C), which is in line with previous data (Cheng Z 2019, Mol Cell; Luan Y, 2022, Nucleic Acids Res). The difference in proliferation rate after treatment with siRPS26 makes proper interpretation of a cellular viability assessment very difficult.

      Recommendations For The Authors:

      (1) It would be nice to validate the effects of overexpression of RPS26 and other regulators on RAN translation, not limited to knockdown experiments, to support the conclusion.

      We did not perform such experiments because we believed that RPS26 overexpression may have no or only a marginal effect on translation or RAN translation. It is likely impossible to efficiently incorporate overexpressed RPS26 into 40S subunits, because the concentration of all ribosomal proteins in the cells is very high.

      (2) It would be better to explain how authors selected 8 proteins for siRNA-based validation (Figure 1C, 1D, S1D) from 32 proteins enriched in CGG repeat RNA in the first screening.

      We selected those candidates based on their functions connected to translation, structured RNA unwinding or mRNA processing. For example, we tested a few RNA helicases because of their known function in RAN translation regulation described by other researchers. This explanation was added to the revised version of the manuscript.

      (3) Original image data showing nuclear FMRpolyG-GFP aggregates should be presented in Figure 1D.

      The representative images of control and siRPS26-treated cells are now shown in modified version of Figure 1E and described with more details in the legend.

      (4) Image data in Figure 2A and 2D have poor signal/noise ratio and the resolution should be improved. In addition, aggregates should be clearly indicated in Figure 2D in an appropriate manner.

      The stable S-FMR95xG cellular model is characterized by very low expression of RAN-translated FMR95xG; therefore, it is challenging to obtain microscopic images of better quality with a higher GFP signal. In the L-99xCGG model, expression of the transgene is higher. Therefore, we provided a new image in the new version of Figure 3D (former 2D). Moreover, we showed aggregates on the image obtained using confocal microscopy (new Supplementary Figure 2D).

      (5) The detailed information on patient-derived fibroblast (age and sex of the patient, the number of CGG repeats, etc.) in Figure 2F needed to be presented.

      This information was added to the figure legend (Figure 3F; previously 2F) and in the Material and Methods section as suggested.

      (6) It would be better to normalize RNA expression levels of FMR1 and FMR1-GFP by the housekeeping gene in Figure S2C, like other RT-qPCR experimental data such as Figure 2B.

      Normalization of FMR1-GFP to GAPDH is now shown in modified version of Figure S2C (right graph) as requested by the Reviewer.

      (7) It would be better to add information on molecular weight on all Western blotting data.

      Marks corresponding to the molecular weight ladder were added to all images.

      Full blots, including protein ladders were deposited in Zenodo repository, under doi: 10.5281/zenodo.13860370

      References

      Cheng Z, Mugler CF, Keskin A, Hodapp S, Chan LYL, Weis K, Mertins P, Regev A, Jovanovic M & Brar GA (2019) Small and Large Ribosomal Subunit Deficiencies Lead to Distinct Gene Expression Signatures that Reflect Cellular Growth Rate. Mol Cell 73: 36-47.e10

      Havkin-Solomon T, Fraticelli D, Bahat A, Hayat D, Reuven N, Shaul Y & Dikstein R (2023) Translation regulation of specific mRNAs by RPS26 C-terminal RNA-binding tail integrates energy metabolism and AMPK-mTOR signaling. Nucleic Acids Res 51: 4415–4428

      Hoem G, Larsen KB, Øvervatn A, Brech A, Lamark T, Sjøttem E & Johansen T (2019) The FMRpolyGlycine protein mediates aggregate formation and toxicity independent of the CGG mRNA hairpin in a cellular model for FXTAS. Front Genet 10: 1–18

      Luan Y, Tang N, Yang J, Liu S, Cheng C, Wang Y, Chen C, Guo YN, Wang H, Zhao W, et al (2022) Deficiency of ribosomal proteins reshapes the transcriptional and translational landscape in human cells. Nucleic Acids Res 50: 6601–6617

      Plassart L, Shayan R, Montellese C, Rinaldi D, Larburu N, Pichereaux C, Froment C, Lebaron S, O’Donohue MF, Kutay U, et al (2021) The final step of 40S ribosomal subunit maturation is controlled by a dual key lock. eLife 10

    1. Author response:

      The following is the authors’ response to the original reviews.

      It would be great if the authors could add clarification about the NMDS analyses and the associated results (Fig. 1, Table 1 and Tables S2-4). The overall aim of these analyses was to see how plot characteristics (e.g. canopy cover) and composition of one taxonomic group were related to the composition of another taxonomic group. The authors quantified species composition by two axes from NMDS. (1) This analysis may yield an interpretation problem: if we find that only one of the axes, but not the other, was significantly related to one variable, it would be difficult to determine whether that specific variable is important to the species composition, because the composition is co-determined by two axes. (2) It is unclear how the authors did the correlation analyses for Tables S2-4. If correlation coefficients were presented in these tables, then these coefficients should be the same or very similar if we switch the positions of y vs. x. That is, the correlation between host vs. parasite phylogenetic composition should be very close to the correlation between parasite vs. host phylogenetic composition, yet the authors found that these two relationships were quite different, leading to the interpretation of bottom-up or top-down processes. It is also unclear which correlation coefficient was significant or not, because only one P value was provided per row in these tables. (3) In addition to the issue of multiple axes (point 1), NMDS axes simply define the relative positions of the objects in multi-dimensional space, but not the actual dissimilarities. Other methods, such as generalized dissimilarity modeling, redundancy analysis and MANOVA, can be better alternatives.

      Thank you for the thorough and constructive review. We have taken the concerns and questions raised by the editors and reviewers into account and provided clarification about the NMDS analyses as well as additional analyses to confirm our results. First, we have now added a brief explanation in the manuscript regarding the interpretation of the two NMDS axes and how they relate to species composition. Specifically, we clarified that while NMDS defines the relative positions of objects in multi-dimensional space, the two axes together provide a more comprehensive representation of the community composition, which is not solely determined by either axis independently. Second, we acknowledge that alternative approaches could help further strengthen our conclusions. To address this, we incorporated Mantel tests and PERMANOVA (with ‘adonis2’) as additional validation methods. These analyses allowed us to summarize compositional patterns while testing our hypotheses within the framework of the plot characteristics and taxonomic relationships. We have added these analyses and their results in the manuscript to reinforce our findings.

      In methods: L478-481 “To strengthen the robustness of our findings based on NMDS, we further validated the results using Mantel test and PERMANOVA (with ‘adonis2’) for correlation between communities and relationships between communities and environmental variables.”

      L469-475 “NMDS was used to summarize the variation in species composition across plots. The two axes extracted from the NMDS represent gradients in community composition, where each axis reflects a subset of the compositional variation. These axes should not be interpreted in isolation, as the overall species composition is co-determined by their combined variation. For clarity, results were interpreted based on the relationships of variables with the compositional gradients captured by both axes together."

      In results: L172-177 “The PERMANOVA analysis also highlighted the important role of canopy cover for host and parasitoid community (Table S6-9). The Mantel test revealed a consistent pattern with the NMDS analysis, highlighting a pronounced relationship between the species composition of hosts and parasitoids (Table S10). However, the correlation between the phylogenetic composition of hosts and parasitoids was not significant.”

      In discussion: L257-261 “However, this significant pattern was observed only in the NMDS analysis and not in the Mantel test, suggesting that the non-random interactions between hosts and parasitoids could not be simply predicted by their community similarity and associations between the phylogenetic composition of hosts and parasitoids are more complex and require further investigation in the future.”
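
      As an illustration of the Mantel test mentioned above, the following is a minimal standard-library sketch, not the authors' analysis. It correlates the upper triangles of two dissimilarity matrices and obtains a p-value by jointly permuting the rows and columns of one matrix. The function names and the two 4-plot matrices are hypothetical and serve only to show the mechanics.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def upper_triangle(dist, order):
    """Flatten the upper triangle of a square matrix, reading plots in `order`."""
    n = len(order)
    return [dist[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, p): Mantel statistic and one-sided permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(dist_a)))
    flat_b = upper_triangle(dist_b, identity)
    r_obs = pearson(upper_triangle(dist_a, identity), flat_b)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel plots in matrix A only
        if pearson(upper_triangle(dist_a, order), flat_b) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical dissimilarities among four plots (illustrative values only).
host_dist = [[0.0, 0.2, 0.6, 0.8],
             [0.2, 0.0, 0.5, 0.7],
             [0.6, 0.5, 0.0, 0.3],
             [0.8, 0.7, 0.3, 0.0]]
parasitoid_dist = [[0.0, 0.3, 0.7, 0.9],
                   [0.3, 0.0, 0.4, 0.8],
                   [0.7, 0.4, 0.0, 0.2],
                   [0.9, 0.8, 0.2, 0.0]]

r_hp, p_hp = mantel(host_dist, parasitoid_dist)
r_ph, p_ph = mantel(parasitoid_dist, host_dist)
print(r_hp, p_hp)
```

      Because the statistic is a correlation between two sets of pairwise dissimilarities, it is the same whichever community is treated as x or y, so on its own it says nothing about the direction of an effect.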

      -- One additional minor point: "site" would be better set as a fixed rather than random term in the linear mixed-effects models, because the site number (2) is too small to make a proper estimate of random component.

      We now treat “site” as a fixed factor in our models, interacting with tree species richness/tree MPD and tree functional diversity, to reflect the variation in spatial and tree composition between the two sites. We found that the main results did not change, as both sites showed consistent patterns for the effects of tree richness/MPD on network metrics, although these were more pronounced in one site.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors analyzed how biotic and abiotic factors impact antagonistic host-parasitoid interaction systems in a large BEF experiment. They found the linkage between the tree community and host-parasitoid community from the perspective of the multi-dimensionality of biodiversity. Their results revealed that the structure of the tree community (habitat) and canopy cover influence host-parasitoid compositions and their interaction pattern. This interaction pattern is also determined by phylogenetic associations among species. This paper provides a nice framework for detecting the determinants of network topological structures.

      Strengths:

      This study was conducted using a five-year sampling in a well-designed BEF experiment. The effects of the multi-dimensional diversity of tree communities have been well explained in a forest ecosystem with an antagonistic host-parasitoid interaction.

      The network analysis has been well conducted. The combination of phylogenetic analysis and network analysis is uncommon among similar studies, especially for studies of trophic cascades. Still, this study has discussed the effect of phylogenetic features on interacting networks in depth.

      Weaknesses:

      (1) The authors should examine species and interaction completeness in this study to confirm that their sampling efforts are sufficient.

      (2) The authors only used Rao's Q to assess the functional diversity of tree communities. However, multiple metrics of functional diversity exist (e.g., functional evenness, functional dispersion, and functional divergence). It is better to check the results from other metrics and confirm whether these results further support the authors' results.

      (3) The authors did not elaborate on which extinction sequence was used in robustness analysis. The authors should consider interaction abundance in calculating robustness. In this case, the author may use another null model for binary networks to get random distributions.

      (4) The causal relationship between host and parasitoid communities is unclear. Normally, it is easy to understand that host community composition (low trophic level) could influence parasitoid community composition (high trophic level). I suggest using the 'correlation' between host and parasitoid communities unless there is strong evidence of causation.

      Thank you very much for your thoughtful and constructive review of our manuscript. We have carefully addressed your comments and made several revisions to improve the clarity and robustness of our work. 1) We appreciate your suggestion regarding species and interaction completeness. To confirm that our sampling efforts were sufficient, we have now included a figure (Fig. S1) showing the species accumulation curve and the coverage of interactions in our study. This ensures that the data collected provide a comprehensive representation of the system. 2) Regarding the use of only Rao’s Q to assess functional diversity, we acknowledge that multiple metrics of functional diversity exist. However, due to the large number of predictors in our analysis, we decided to streamline our approach and focus on Rao’s Q, as it provides a robust measure for our research objectives. We have discussed this decision in the revised manuscript and clarified that, while additional metrics could be informative, we believe Rao’s Q sufficiently captures the key aspects of functional diversity in our study. 3) We have elaborated on the robustness analysis and the null model used in our study. Specifically, we have now clarified which extinction sequence (random extinction) was used in our manuscript, and explained that interaction abundance was incorporated into the robustness calculations (networklevel function, weighted=TRUE; see L506). 4) We have revised the text to clarify the relationship between host and parasitoid communities. As you correctly pointed out, while it is intuitive that host community composition influences parasitoid community composition, we have reframed our analysis to emphasize the correlation between the two communities rather than implying causation without strong evidence. We have revised the manuscript to reflect this distinction.
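
      To make the idea of robustness under a random extinction sequence concrete, the sketch below removes hosts in random order, records the fraction of parasitoids that still have at least one host, and averages the area under that curve over many random sequences. It is a simplified illustration under stated assumptions (integer interaction counts, hosts as rows, parasitoids as columns, trapezoidal area) and is not the implementation in the R function cited above. The function names and toy webs are hypothetical.

```python
import random


def extinction_curve(web, rng):
    """Fraction of parasitoids surviving as hosts are removed in random order.

    `web[h][p]` is the interaction count between host h and parasitoid p.
    A parasitoid is secondarily extinct once all of its hosts are removed.
    """
    n_hosts, n_paras = len(web), len(web[0])
    remaining = [sum(row[p] for row in web) for p in range(n_paras)]
    order = list(range(n_hosts))
    rng.shuffle(order)
    curve = [sum(1 for r in remaining if r > 0) / n_paras]
    for h in order:
        for p in range(n_paras):
            remaining[p] -= web[h][p]
        curve.append(sum(1 for r in remaining if r > 0) / n_paras)
    return curve


def robustness(web, replicates=200, seed=0):
    """Mean area under the extinction curve (trapezoidal rule, x scaled to 0..1)."""
    rng = random.Random(seed)
    n_hosts = len(web)
    total = 0.0
    for _ in range(replicates):
        curve = extinction_curve(web, rng)
        total += sum((curve[i] + curve[i + 1]) / 2 for i in range(n_hosts)) / n_hosts
    return total / replicates


# Hypothetical toy webs: one fully specialised, one fully generalised.
specialised = [[5, 0, 0], [0, 2, 0], [0, 0, 7]]
generalised = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

print(robustness(specialised), robustness(generalised))
```

      In the specialised web every host removal eliminates exactly one parasitoid, giving an area of 0.5. In the generalised web all parasitoids persist until the last host is lost, giving a larger area (about 0.83).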

      Reviewer #2 (Public Review):

      Summary:

      In their manuscript, Multi-dimensionality of tree communities structure host-parasitoid networks and their phylogenetic composition, Wang et al. examine the effects of tree diversity and environmental variables on communities of reed-nesting insects and their parasitoids. Additionally, they look for correlations in community composition and network properties of the two interacting insect guilds. They use a data set collected in a subtropical tree biodiversity experiment over five years of sampling. The authors find that tree species, functional, and phylogenetic diversity, as well as some of the environmental factors, have varying impacts on both host and parasitoid communities. Additionally, the communities of the host and parasitoid showed correlations in their structures. Also, the network metrics of the host-parasitoid network showed patterns against environmental variables.

      Strengths:

      The main strength of the manuscript lies in the massive long-term data set collected on host-parasitoid interactions. The data provides interesting opportunities to advance our knowledge on the effects of environmental diversity (tree diversity) on the network and community structure of insect hosts and their parasitoids in a relatively poorly known system.

      Weaknesses:

      To me, there are no major issues regarding the manuscript, though sometimes I disagree with the interpretation of the results and some of the conclusions might be too far-fetched given the analyses and the results (namely the top-down control in the system). Additionally, the methods section (especially statistics) was lacking some details, but I would not consider it too concerning. Sometimes, the logic of the text could be improved to better support the studied hypotheses throughout the text. Also, the results section cannot be understood as a stand-alone without reading the methods first. The study design and the rationale of the analyses should be described somewhere in the intro or presented with the results.

      Thank you very much for your valuable comments and suggestions on our manuscript! We appreciate your feedback and have made revisions accordingly. Specifically, we have rephrased the interpretation of the results and conclusions to better align with the analyses and avoid overstatements, particularly concerning the top-down control in the system. In addition, we have expanded the methods section by providing more details, especially regarding the statistical approaches, to address the points you raised. To enhance the clarity of the manuscript, we have also ensured that the logic of the text better supports the hypotheses throughout. Please see our point-by-point responses below for additional clarifications.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 120: "... and large ecosystems susceptible to global change (add citation here)": Citation(s)?

      We have now provided the missing citations.

      Line 141: Add sampling completeness information.

      Now we provide a new figure about sampling completeness (Fig. S1) in the supplementary materials, showing the adequate sampling effort for our study.

      Line 151: use more metrics in the evaluation of functional diversity

      We used Rao’s Q for tree functional diversity, which is an integrated and widely used metric representing the functional dissimilarity of trees. As our study focuses on multiple diversity indices of trees, we prefer not to give more weight to one type of diversity. Thank you for your suggestion!

      Line 164: host vulnerability. Although generality and vulnerability are commonly used in network analysis, it is better to link these metrics with the trophic level, like the 'host vulnerability' you used. Thus, you can use 'parasitoid generality' instead of 'generality'.

      Thanks for your suggestion. Now the metrics were labeled with the trophic levels in the full text.

      Line 169: two'.'

      Corrected.

      Line 173: 'parasitoid robustness' Or 'robustness of parasitoids'?

      Now changed it to ‘robustness of parasitoid’.

      Lines 173, 468: For the robustness estimations, maybe use null model for binary networks to get random distributions?

      Thanks for the suggestion. We have in fact used Patefield null models to compare randomized and observed robustness, which helps to assess whether the robustness of the observed network differs significantly from that expected by chance. All robustness indices across plots were significantly different from a random distribution; see results section L197-201.
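
      The comparison step described here can be sketched as follows. This is only an illustration of how an observed index is set against a null distribution (z-score and two-sided empirical p-value); it does not generate Patefield randomizations, and the observed value and null values are made-up numbers, not results from the study.

```python
import statistics


def compare_to_null(observed, null_values):
    """Return (z, p) for an observed index against null-model values.

    z is the standardized effect size; p is a two-sided empirical p-value
    based on distance from the null mean, with the usual +1 correction.
    """
    mean_null = statistics.mean(null_values)
    sd_null = statistics.stdev(null_values)
    z = (observed - mean_null) / sd_null
    deviation = abs(observed - mean_null)
    extreme = sum(1 for v in null_values if abs(v - mean_null) >= deviation)
    p = (extreme + 1) / (len(null_values) + 1)
    return z, p


# Hypothetical numbers for illustration only.
observed_robustness = 0.72
null_robustness = [0.55, 0.58, 0.60, 0.57, 0.61, 0.59, 0.56, 0.62, 0.58, 0.60]

z_score, p_value = compare_to_null(observed_robustness, null_robustness)
print(z_score, p_value)
```

      With only ten null values the smallest attainable p-value is 1/11, which is why null-model tests normally use many more randomizations.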

      Line 184: modulating interacting communities of hosts and parasitoids.

      Changed accordingly.

      Line 186: determined host-parasitoid interaction patterns

      Changed accordingly.

      Line 191: Biodiversity loss in this study refers to low trophic levels.

      Now we clarified this point.

      Line 190: understand

      Changed accordingly.

      Lines 215-216: Reorganize these sentences

      Line 227: indirectly influenced by...

      Changed accordingly.

      Line 238: Be more specific. Which type of further study?

      Rephrased to be more specific.

      Lines 297-299: rewrite this sentence to make it more transparent.

      Now we rewrote the sentence accordingly.

      Line 302: Certain

      Changed accordingly.

      Line 453: effective

      Changed accordingly.

      Finally, the authors should check the text carefully to avoid grammatical errors.

      Thanks, now we have checked the full text to avoid grammatical errors.

      Reviewer #2 (Recommendations For The Authors):

      I feel that the authors have very interesting data and have a solid set of analyses. I do not have major issues regarding the manuscript, though sometimes I disagree with the interpretation of the results and some of the conclusions might be too far-fetched given the analyses and the results. Additionally, the methods section (especially statistics) was lacking some details, but I would not consider it too concerning at this point.

      I feel that the largest caveat of the manuscript remains in the representation of the rationale of the study. I felt the introduction could be more concise and better focused to back up the study questions and hypotheses. Many times, the sentences were too vague and unspecific, and thus it was difficult to understand what was meant. The authors could say more about how the community composition of hosts and parasitoids is expected to change with the studied experimental design regarding the metrics mentioned in the introduction (stronger hypotheses). The results section cannot be understood as a stand-alone without reading the methods first. The study design and the rationale of the analyses must be described somewhere in the intro or results, if the journal/authors want to keep the methods-last structure. Also, the results and discussion could be more focused around the hypotheses. Naturally, these things can be easily fixed.

      I also disagree with the interpretation of results finding top-down control in the system (it might well be there, but I do not think that the current methods and tests are suitable for finding it). First, the methodology used cannot detect parasitoids if the hosts are not there, and the probability of detecting a parasitoid likely depends on the abundance of the host. Thus, top-down regulation is difficult to prove (is it the parasitoids that have driven the host population down?). Secondly, I would be hesitant to say anything about top-down and bottom-up control in the system, as the data in the manuscript are pooled across five years while top-down/bottom-up regulation in insect systems usually spans only one season/generation in time (much shorter than five years). Consequently, the analyses are comparing communities of species some of which most likely do not co-exist (they were found in the same space but not during the same time). Luckily, the top-down/bottom-up effects could potentially be explored by using separately the time steps of the now pooled community data: e.g., does the population of the host decrease in t if the parasitoids are abundant in t-1? There are also other statistical tests to explore these patterns.

      In the manuscript, "phylogenetic composition" refers to mean pairwise distance. I would use "phylogenetic diversity" instead throughout the text. Also, to my understanding, both "phylogenetic composition" and "phylogenetic diversity" are used for trees even though, based on their descriptions, they are the same.

      Detailed comments:

      Punctuation needs to be checked and edited at some point (I think copy-pasting has left things in the wrong places). Please check that "–" (en dash) instead of "-" (hyphen) is used in host–parasitoid.

      1-2 The title does not match the content well. "Multi-dimensionality" is not mentioned in the text. "phylogenetic composition" -> "phylogenetic diversity"

      We did not find a role of tree functional diversity in host-parasitoid interactions, but tree richness and phylogenetic diversity remain relevant. We also disagree with replacing "phylogenetic composition" with "phylogenetic diversity", because diversity highlights higher or lower phylogenetic distance among communities, while the latter highlights the phylogenetic dissimilarity across communities.

      53-57 This sentence is quite vague and because of it, difficult to follow. Consider rephrasing and avoiding unspecified terms such as "tree identity", "genetic diversity", and "overall community composition of higher trophic levels" (at least, I was not sure what taxa/level you meant with them).

      Rephrased.

      L58-61 “Especially, we lack a comprehensive understanding of the ways that biotic factors, including plant richness, overall community phylogenetic and functional composition of consumers, and abiotic factors such as microclimate, determining host–parasitoid network structure and host–parasitoid community dynamics.”

      56 I would remove "interact" as no interactions were tested.

      Removed accordingly.

      59-60 This needs rephrasing. I feel "taxonomic and phylogenetic composition" should be just "species composition". To better match what was done: "taxonomic, phylogenetic, and network composition of both host and parasitoid communities" -> "species and phylogenetic diversity of both host and parasitoid communities and the composition of their interaction networks"

      Changed accordingly.

      62 Remove "tree composition".

      Done.

      62 Replace "taxonomic" with "species". Throughout the text.

      Done.

      63-64 "Generally, top-down control was stronger than bottom-up control via phylogenetic association between hosts and parasitoids" I disagree, see my comments elsewhere.

      We have now rephrased the sentence.

      L68-70 “Generally, phylogenetic associations between hosts and parasitoids reflect non-randomly structured interactions between phylogenetic trees of hosts and parasitoids.”

      68 "habitat structure and heterogeneity" This is too strong and general of a statement based on the results. You did not really measure habitat structure or heterogeneity.

      We have now rephrased the statement to avoid an overly strong and general description.

      L71-73 “Our study indicates that the composition of higher trophic levels and corresponding interaction networks are determined by plant diversity and canopy cover especially via trophic phylogenetic links in species-rich ecosystems.”

      69 Specify "phylogenetic links". Trophic links?

      Specified to “trophic phylogenetic links”.

      75-77 The sentence is a bit difficult to follow. Consider rephrasing.

      We have now rephrased it.

      L79-82 “Changes in network structure of higher trophic levels usually coincide with variations in their diversity and community, which could be in turn affected by the changes in producers via trophic cascades”

      76 Be more specific about what you mean by "community of trophic levels".

      Specified to “community composition”.

      79 Remove "basal changes of", it only makes the sentence heavier.

      Done.

      81 What is "species codependence"?

      We aimed to describe species co-occurrence that depends on their close relationships. For clarity, we have now changed this to “species coexistence”.

      82 What do you mean by "complex dynamics"?

      Rephrased to “mechanisms on dynamics of networks”.

      83 onward: I would not focus so much on top-down/bottom-up as I feel that your current analyses cannot really say anything too strong about these causalities but are rather correlative.

      Thanks, we have now removed the relevant content from the discussion. However, we kept one sentence in the Introduction, because it should be highlighted to make reviewers aware of this (the other text on this was removed).

      89 Remove "environmental".

      Done.

      90 Specify what you mean by "these forces".

      Done.

      98-99 I have difficulties following the logic here "potential specialization of their hosts may cascade up to impact the parasitoids' presence or absence". Consider rephrasing.

      We have now rephrased it.

      L101-102 “…and their host fluctuations may cascade up to impact the parasitoids’ presence or absence.”

      100 Be more specific with "habitat-level changes".

      Specified to “community-level changes”

      100 I do not see why host-parasitoid systems would be ideal to study "species interactions". There are much simpler and easier systems available.

      Changed to “… one of ideal…”

      101-103 "influence of" on what?

      We have now rephrased the sentence.

      L104-105 “Previous studies mainly focused on the influence of abiotic factors on host-parasitoid interactions”

      104 Be more specific in "the role of multiple components of plant diversity".

      Now we specified "the role of multiple components of plant diversity".

      L107-108 “…the role of multiple components of plant diversity (i.e. taxonomic, functional and phylogenetic diversity)…”

      106 "diversity associations" of what?

      “diversity associations between host and parasitoids”.

      108 Specify the "direct and indirect effects".

      Now we specified it to “…direct and indirect effects (i.e. one pathway and more pathways via other variables)…”

      110-113 A bit heavy sentence to follow. Consider rephrasing.

      We have now rephrased the sentence to make it more readable.

      114 Give an example of "phylogenetic dependences".

      Done. Phylogenetic dependences (e.g. phylogenetic diversity)

      117 Move the "e.g. taxonomic, phylogenetic, functional" within brackets in 117 after "dimensions of biodiversity".

      Done.

      120 "(add citation here)" Yes please!

      Done.

      120-121 Specify "such relationships".

      Done. Specified to “multiple dimensions of biodiversity”

      128-130 This is difficult to follow. Please rephrase.

      We have now rephrased the sentence.

      L135-137 “We aimed to discern the primary components of the diversity and composition of tree communities that affect higher trophic level interactions via quantifying the strength and complexity of associations between hosts and parasitoid.”

      131-132 Remove "phylogenetic and". It is redundant to phylogenetic diversity.

      Done.

      128 Tested robustness does not really capture "stability of associations".

      Yes, we agree. We have now rephrased the sentence and removed the “stability” description.

      133 Specify "phylogenetic processes".

      Now we specified “phylogenetic processes”.

      L140-141 “…especially via phylogenetic processes (e.g. lineages of trophic levels diverge and evolve over time)…”

      141 I would like to have more details on the data set somewhere in the results. How many individuals and species were found in each plot (on average)? Was there a lot of temporal variation (e.g. between the seasons)? On how many sites were the insect species found?

      Thanks for your suggestion. We now provide more details on the data set in the results (L153-156), including the mean abundance and species richness per plot. However, temporal variation would be better studied as a relatively independent topic, as our study focuses on the general pattern of interactions between hosts and parasitoids. We therefore did not add more information on temporal changes, so that readers do not get lost in the text.

      153-156 “Among them, we found 56 host species (12 bees and 44 wasps, mean abundance and richness are 400.05 and 45.14, respectively, for each plot) and 50 parasitoid species (38 Hymenoptera and 12 Diptera, mean abundance and richness are 14.07 and 9.05, respectively, for each plot).”

      149 tree -> trees

      Done.

      149 Should this read something other than "NMDS scores"?

      Thanks! Now we provided more details about “NMDS scores”.

      L161-162 “(NMDS axis scores; i.e. preserving the rank order of pairwise dissimilarities between samples)”

      149 You could mention the amount of variation explained by the first two axes of the NMDSs. Now it is difficult to estimate how much the models actually explain.

      Thanks for your comments! We cannot directly provide the explanatory power of the two axes, because NMDS is based on rank-order distances rather than on linear relationships as in PCA. Instead, the goodness of fit of an NMDS solution is typically evaluated using the stress value, which we provide in the figure caption.

      150 "tree MPD" is mentioned for the first time. Spell it out.

      Done.

      150 Explain "eastness".

      Done.

      L163-164 “…eastness (sine-transformed radian values of aspect)”
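
      The transformation quoted above can be sketched as follows. This is a minimal illustration that assumes aspect is recorded in degrees clockwise from north; the function name is hypothetical and the manuscript does not state how the conversion was implemented.

```python
import math


def eastness(aspect_degrees: float) -> float:
    """Sine of the aspect after converting degrees to radians.

    Assumes aspect is measured clockwise from north, so that
    east-facing slopes (90 degrees) give +1 and west-facing
    slopes (270 degrees) give -1.
    """
    return math.sin(math.radians(aspect_degrees))


for aspect in (0, 90, 180, 270):
    print(aspect, round(eastness(aspect), 3))
```

      Under this convention, north- and south-facing plots both have an eastness close to zero, so the variable separates only the east-west component of aspect.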

      151 How was "tree functional diversity" quantified?

      Please see methods. L437-L438.

      160 Specify that you talk about phylogenetic compositions of the host and parasitoid communities here.

      We would prefer to keep the wording concise here, consistent with the wording used for species composition. Phylogenetic composition simply represents the dissimilarities of phylogenetic lineages within a community.

      161 Describe "parafit" test here when first mentioned.

      Done, see methods L485-487.

      182 Keep on referring to tables and figures in the discussion! Also, more clearly discuss your hypotheses. There are lots of discussions on top-down/bottom-up control. It could be good to form a hypothesis on them and predict what kind of patterns would suggest either one and what would you expect to find regarding them.

      We now refer to figures and tables in the discussion. As the content on top-down and bottom-up control did not fit our study very well (as also suggested by the reviewers), we rephrased the discussion and now discuss our hypotheses more clearly. See L218, L226, and L237 etc.

      186 "partly determined host-parasitoid networks" Be more specific.

      Done.

      L206-207 “…partly determined host-parasitoid network indices, including vulnerability, linkage density, and interaction evenness.”

      195 Tell what you mean by "other biotic factors".

      Specified it: “…other biotic factors such as elevation and slope…”

      197-198 "It seems likely that these results are based on bee linkages to pollen resources" I would be hesitant to conclude this as the bees most likely forage way beyond the borders of the 30m by 30m study plots.

      Thanks for your concern about this problem. While it is true that bees can forage beyond 30 x 30m, the study focuses on their nesting behavior and activity within this defined area, rather than their entire foraging range. Existing literature shows bees often forage locally when resources are available (e.g. Ebeling et al., 2012 Oecologia; Guo et al., year, Basic and Applied Ecology). Therefore, we are confident that this pattern could be associated with the resources around the trap nests.

      223 "This could be further tested by collecting the food directly used by the wasps (caterpillars)" A bit unnecessary addition.

      Thanks for your suggestion. This is definitely a good point. We currently do not have enough data on caterpillars, but we will follow up on this in the future.

      232-238 I disagree with the authors on the interpretation of the causality of the results here. I think that the community of parasitoids simply indicates which host species are available, while the host community does not have an as strong effect on parasitoid community as parasitoids do not utilise the whole species pool of the hosts. (Presence of parasitoid tells that the host is around while the presence of the host does not necessarily tell about the presence of the parasitoid.) To me, this would rather indicate a bottom-up than top-down regulation. Similar patterns are also visible in species communities of hosts and parasites.

      Thank you for your suggestion. We agree that parasitoids are more dependent on hosts, as hosts are not always attacked by parasitoids. We have now rephrased our explanation to follow this argument.

      L254-256 “Such pattern could be further confirmed by the significant association between host phylogenetic composition and parasitoid phylogenetic composition (Fig. 1c), which suggested that their interactions are phylogenetically structured to some extent.”

      247-266 The logic in this section is difficult to follow. Try rephrasing.

      We have now rephrased the section to make the logic clearer.

      L270-287 “Tree community species richness did not significantly influence the diversity of hosts targeted by parasitoids (parasitoid generality), but caused a significant increase in the diversity of parasitoids per host species (host vulnerability) (Fig. 3a; Table 2). This is likely because niche differentiation often influences network specialization via potential higher resource diversity in plots with higher tree diversity (Lopez-Carretero et al. 2014). Such positive relationship between host vulnerability and tree species richness suggested that host-parasitoid interactions could be driven through bottom-up effects via benefit from tree diversity. For example, parasitoid species increases more than host diversity with increasing tree species richness (Guo et al. 2021), resulting increasing of host vulnerability at community level. According to the enemies hypothesis (Root 1973), which posits a positive effects of plant richness on natural enemies, the higher trophic levels in our study (e.g. predators and parasitoids) would benefit from tree diversity and regulate herbivores thereby (Staab and Schuldt 2020). Indeed, previous studies at the same site found that bee parasitoid richness and abundance were positively related to tree species richness, but not their bee hosts (Fornoff et al. 2021, Guo et al. 2021). Because our dataset considered all hosts and reflects an overall pattern of host-parasitoid interactions, the effects of tree species richness on parasitoid generality might be more complex and difficult to predict, as we found that neither tree species richness nor tree MPD were related to parasitoid generality.”

      249 "This is likely because niche differentiation often influences network specialization via potential higher resource diversity in plots with higher tree diversity" This is a bit contradicting your vulnerability results as niche differentiation should increase specialization and diversity and specialization should decrease vulnerability (less host per parasitoid).

      Thanks! We understand that the concepts of “generality” and “vulnerability” can be a bit confusing. To clarify, “fewer hosts per parasitoid” actually corresponds to lower generality at the community level.
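
      To make the two terms concrete, the sketch below computes unweighted versions of both indices from a small host-by-parasitoid interaction matrix. This is a simplifying assumption for illustration: the networks in the manuscript are quantitative, so the indices reported there account for interaction frequencies, and the matrix values and function name here are hypothetical.

```python
def generality_and_vulnerability(web):
    """Unweighted generality and vulnerability of a bipartite web.

    web[i][j] is the number of interactions between host i (row)
    and parasitoid j (column).
    Generality    = mean number of host species per parasitoid species.
    Vulnerability = mean number of parasitoid species per host species.
    """
    n_hosts = len(web)
    n_parasitoids = len(web[0])
    hosts_per_parasitoid = [
        sum(1 for i in range(n_hosts) if web[i][j] > 0)
        for j in range(n_parasitoids)
    ]
    parasitoids_per_host = [
        sum(1 for j in range(n_parasitoids) if web[i][j] > 0)
        for i in range(n_hosts)
    ]
    generality = sum(hosts_per_parasitoid) / n_parasitoids
    vulnerability = sum(parasitoids_per_host) / n_hosts
    return generality, vulnerability


# Hypothetical web: three host species (rows), two parasitoid species (columns).
web = [
    [3, 0],
    [1, 0],
    [0, 2],
]
print(generality_and_vulnerability(web))  # (1.5, 1.0)
```

      In this toy web each host is attacked by exactly one parasitoid (vulnerability 1.0), while the parasitoids use 1.5 hosts on average (generality 1.5), which shows that the two indices describe the same links from opposite trophic levels.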

      332-337 How did you select the species growing on your plots? Or was only species number considered? What was the pool of tree species growing on the selected plots? Was the selection similar at both sites?

      We now provide more information on the experimental design.

      L354-356 “The species pools of the two plots are nonoverlapping (16 species for each site). The composition of tree species within the study plots is based on a “broken-stick” design (see Bruelheide et al. 2014).”

      342 Remove "centrally per plot"?

      Done.

      346-347 Was the selection of different reed diameters similar in all the plots?

      Diameters and the relative distribution of diameters were similar in all trap nests.

      399 & 432 Are "phylogenetic diversity of the tree communities" and "phylogenetic composition of trees" the same? They are both described as mean pairwise distance.

      These two are actually different: we use them to distinguish phylogenetic diversity within communities from the rank order of dissimilarities between tree communities. Here, the phylogenetic diversity of a tree community is the mean pairwise phylogenetic distance among the species in that community. Tree phylogenetic composition is the rank order of pairwise dissimilarities between tree communities based on NMDS.

      400 Do you think that MPD makes any sense with the monocultures (value is always 0)? Does this have a potential to bias your analyses and result?

      We agree with your point. However, we do not think that this is a major problem in the analyses. We followed the experimental design and considered low phylogenetic relatedness of tree species in a plot (likewise, in monocultures the tree species richness is always 1).
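
      The monoculture case can be seen directly from the definition of mean pairwise distance. The following is a minimal sketch with hypothetical species names and phylogenetic distances; returning 0 for a single-species community follows the value assumed in the reviewer's comment and is an explicit assumption here, not a statement about how any particular software handles that case.

```python
from itertools import combinations


def mean_pairwise_distance(species, dist):
    """Mean phylogenetic distance over all pairs of species in a community.

    dist maps frozenset({a, b}) to the phylogenetic distance between a and b.
    """
    pairs = list(combinations(sorted(species), 2))
    if not pairs:
        # Assumption: a monoculture has no species pairs and is assigned 0.
        return 0.0
    return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)


# Hypothetical distances: A and B are close relatives, C is distant from both.
dist = {
    frozenset({"A", "B"}): 2.0,
    frozenset({"A", "C"}): 6.0,
    frozenset({"B", "C"}): 6.0,
}

print(mean_pairwise_distance({"A", "B"}, dist))       # 2.0
print(mean_pairwise_distance({"A", "B", "C"}, dist))  # (2 + 6 + 6) / 3
print(mean_pairwise_distance({"A"}, dist))            # 0.0 by assumption
```

      Because every monoculture takes the same value regardless of species identity, these plots carry no phylogenetic information and contribute only to the low end of the gradient.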

      402-405 MNTD is not mentioned before or after this. Consider removing this section.

      We tested the potential effects of MNTD in our models. We now mention this in our results.

      L194-195 “Tree mean nearest taxon distance (MNTD) was unrelated to any network indices.”

      405 "Phylogenetic metrics of trees" Which ones?

      Both tree MPD and MNTD. Now we have noted it in the manuscript. (L432)

      410 Further details on "Rao's Q" and how the functional diversity of the communities was calculated are needed.

      More details are now provided.

      L435-438 “Specifically, seven leaf traits were used for calculation of tree functional diversity, which was calculated as the mean pairwise distance in trait values among tree species, weighted by tree wood volume, and expressed as Rao's Q”
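
      As an illustration of the calculation described in the quoted text, the sketch below computes Rao's Q as the abundance-weighted sum of pairwise trait distances, with relative wood volume as the weight. The Euclidean trait distance, the convention of summing over all ordered species pairs (some formulations sum over unordered pairs, which halves the value), and all names and numbers are assumptions made for this example.

```python
import math


def rao_q(traits, volumes):
    """Rao's quadratic entropy: sum over i, j of p_i * p_j * d_ij.

    traits  maps species -> tuple of (standardized) trait values.
    volumes maps species -> wood volume, used as the abundance weight.
    d_ij is the Euclidean distance between trait vectors (assumption).
    """
    total = sum(volumes.values())
    share = {sp: vol / total for sp, vol in volumes.items()}
    q = 0.0
    for a in share:
        for b in share:
            q += share[a] * share[b] * math.dist(traits[a], traits[b])
    return q


# Hypothetical two-species plot with one trait and equal wood volume.
traits = {"A": (0.0,), "B": (1.0,)}
volumes = {"A": 1.0, "B": 1.0}
print(rao_q(traits, volumes))  # 0.5

# A monoculture has no between-species trait distance, so Q is 0.
print(rao_q({"A": (0.0,)}, {"A": 3.0}))  # 0.0
```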

      413 Specify "higher trophic levels".

      Now we specified the trophic levels.

      L440-441 “…higher trophic levels in our study area, such as herbivores and predators”

      417-424 What about the position of the plots within study sites? Is there potential for edge effects (e.g. bees finding easier the trap nest close to the edge of the experimental forest)? Were there any differences between the two sites? What is the elevation range of the plots?

      Thank you for your attention to the details of our study. First, all the plots were randomly distributed within the study sites (see Fig. S2). Admittedly, several plots are located at the edges of the sites, and we did not consider potential edge effects in our analysis. This will be a good point to address in future studies. Moreover, the biggest difference between the two sites is the non-overlapping tree species pool; the two study sites are 5 km apart in the same town. Finally, there is no pronounced elevation gradient across the plots (112 m - 260 m).

      432-434 "The species and phylogenetic composition of trees, hosts, and parasitoids were quantified at each plot with nonmetric multidimensional scaling (NMDS) analysis based on Morisita-Horn distances" This section needs to be more specific and detailed. Did you do the NMDS separately for each plot as suggested in the text?

      We have provided more details in this section.

      L462-465 “The minimum number of required dimensions in the NMDS based on the reduction in stress value was determined in the analysis (k = 2 in our case). We centred the results to acquire maximum variance on the first dimension, and used the principal components rotation in the analysis.”
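
      For readers unfamiliar with the Morisita-Horn index on which the NMDS is based, the following minimal sketch computes the dissimilarity between two plots from their species abundance vectors. The function name and abundance values are hypothetical, and the sketch is not meant to reproduce the exact software routine used in the manuscript.

```python
def morisita_horn_dissimilarity(x, y):
    """Morisita-Horn dissimilarity between two abundance vectors.

    x[i] and y[i] are the abundances of species i in plots x and y.
    Returns 0 for identical relative composition and 1 for no shared species.
    """
    total_x, total_y = sum(x), sum(y)
    lambda_x = sum(v * v for v in x) / (total_x * total_x)
    lambda_y = sum(v * v for v in y) / (total_y * total_y)
    cross = sum(a * b for a, b in zip(x, y))
    similarity = 2.0 * cross / ((lambda_x + lambda_y) * total_x * total_y)
    return 1.0 - similarity


# Hypothetical plots: same relative composition, then no shared species.
print(morisita_horn_dissimilarity([1, 2, 0], [2, 4, 0]))
print(morisita_horn_dissimilarity([5, 0, 0], [0, 3, 2]))
```

      The index depends on relative rather than absolute abundances, which makes it comparatively robust to differences in sample size between plots.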

      435 Specify how picante was used (function and arguments)!

      Now we specified the function.

      L465-467 “The phylogenetic composition was calculated by mean pairwise distance among the host or parasitoid communities per plot with the R package “picante” with ‘mpd’ function.”

      436 "standardized values" Of what? How was the standardisation done?

      We now cite a supplementary table (Table S2) to specify this (see L469). For the standardization, we used the ‘scale’ function in R, which standardizes data by centering and scaling. Specifically, it subtracts the mean and divides by the standard deviation for each variable.
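
      The standardization described above corresponds to the following calculation, shown here as a minimal Python sketch rather than the R call itself. It uses the sample standard deviation (n - 1 denominator), as R's 'scale' does by default; the function name and input values are hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Center on the mean and divide by the sample standard deviation."""
    centre = mean(values)
    spread = stdev(values)  # sample SD, n - 1 denominator
    return [(value - centre) / spread for value in values]


print(standardize([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

      After this transformation each variable has mean 0 and standard deviation 1, so regression coefficients can be compared across predictors measured in different units.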

      443 Provide more details on parafit.

      Actually, we have provided the reason why we use the parafit test and the usage.

      L483-486 “We used a parafit test (9,999 permutations) with the R package “ape” to test whether the associations were non-random between hosts and parasitoids. This is widely used to assess host-parasite co-phylogeny by analyzing the congruence between host and parasite phylogenies using a distance-based matrix approach.”

      449-451 Rephrase the sentence.

      Rephrased.

      L490-491 “We constructed quantitative host-parasitoid networks at community level with the R package “bipartite” for each plot of the two sites.”

      451 "six" Should this be five?

      Yes, should be five, thanks.

      470-481 What package and function were used for the LMMs?

      As we now use linear models, we no longer use an R package for LMMs.

      470 "mix" -> mixed

      Changed to linear models.

      472 "six" Should this be five?

      Again, we changed it to five.

      479-481 How did you treat the variables from the two different sites when testing for the correlations to avoid two geographic clusters of data points?

      We now include study site as a fixed factor in our linear models. Moreover, tree-based variables were additionally included as interaction terms with study site.

      501 "mix" -> mixed

      Changed to linear models.

      The panel selection for figures 3 and 4 seems random. Justify it!

      Thank you. To avoid including too many figures in the main text, which could potentially confuse readers, we have selected the key results that are of primary interest. The remaining figures are provided in the appendix for reference.

      533 "Note that axes are on a log scale for tree species richness." Why the log-scale if the analyses were performed with linear fit? Also, the drawn regression lines do not match the model description (non-linear, while a linear model is described in the text). The models should probably be described in more detail.

      We log-transformed tree species richness to improve the normality of the data. The drawn regression lines are straight lines, which match our models.

      539 "Values were adjusted for covariates of the final regression model." How?

      We used residual plots, which directly visualize the relationship between the predictor and the response variable together with the fitted regression line, making it easier to assess the model's fit.

      Fig. S4 text does not match the figure.

      Thanks! We have now deleted the unmatched text in the figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      In this work, Noorman and colleagues test the predictions of the "four-stage model" of consciousness by combining psychophysics and scalp EEG in humans. The study relies on an elegant experimental design to investigate the respective impact of attentional and perceptual blindness on visual processing. 

      The study is very well summarised, the text is clear and the methods seem sound. Overall, a very solid piece of work. I haven't identified any major weaknesses. Below I raise a few questions of interpretation that may possibly be the subject of a revision of the text. 

      We thank the reviewer for their positive assessment of our work and for their extremely helpful and constructive comments that helped to significantly improve the quality of our manuscript.

      (1) The perceptual performance on Fig1D appears to show huge variation across participants, with some participants at chance levels and others with performance > 90% in the attentional blink and/or masked conditions. This seems to reveal that the procedure to match performance across participants was not very successful. Could this impact the results? The authors highlight the fact that they did not resort to postselection or exclusion of participants, but at the same time do not discuss this equally important point. 

      Performance was indeed highly variable between observers, as is commonly found in attentional-blink (AB) and masking studies. For some observers, the AB pushes performance almost to chance level, whereas for others it has almost no effect. A similar effect can be seen in masking. We did our best to match accuracy over participants, while also matching accuracy within participants as well as possible, adjusting mask contrast manually during the experimental session. Naturally, those that are strongly affected by masking need not be the same participants as those that are strongly affected by the AB, given the fact that they rely on different mechanisms (which is also one of the main points of the manuscript). To answer the research question, what mattered most was that at the group level, performance was well matched between the two key conditions. As all our statistical inferences, both for behavior and EEG decoding, rest on this group level, we do not think that variability at the individual-subject level detracts from this general approach.

      In the Results, we added that our goal was to match performance across participants:

      “Importantly, mask contrast in the masked condition was adjusted using a staircasing procedure to match performance in the AB condition, ensuring comparable perceptual performance in the masked and the AB condition across participants (see Methods for more details).”

      In the Methods, we added:

      “Second, during the experimental session, after every 32 masked trials, mask contrast could be manually updated in accordance with our goal to match accuracy over participants, while also matching accuracy within participants as well as possible.”

      (2) In the analysis on collinearity and illusion-specific processing, the authors conclude that the absence of a significant effect of training set demonstrates collinearity-only processing. I don't think that this conclusion is warranted: as the illusory and nonillusory share the same shape, so more elaborate object processing could also be occurring. Please discuss. 

      We agree with this qualification of our interpretation, and included the reviewer’s account as an alternative explanation in the Discussion section:  

      “It should be noted that not all neurophysiological evidence unequivocally links processing of collinearity and of the Kanizsa illusion to lateral and feedback processing, respectively (Angelucci et al., 2002; Bair et al., 2003; Chen et al., 2014), so that overlap in decoding the illusory and non-illusory triangle may reflect other mechanisms, for example feedback processes representing the triangular shapes as well.”

      (3) Discussion, lines 426-429: It is stated that the results align with the notion that processes of perceptual segmentation and organization represent the mechanism of conscious experience. My interpretation of the results is that they show the contrary: for the same visibility level in the attentional blink or masking conditions, these processes can be implicated or not, which suggests a role during unconscious processing instead. 

      We agree with the reviewer that the interpretation of this result depends on the definition of consciousness that one adheres to. If one takes report as the leading metric for consciousness (=conscious access), one can indeed conclude that perceptual segmentation/organization can also occur unconsciously. However, if the processing that results in the qualitative nature of an image (rather than whether it is reported) is taken as leading – such as the processing that results in the formation of an illusory percept – (=phenomenal) the conclusion can be quite different. This speaks to the still ongoing debate regarding the existence of phenomenal vs access consciousness, and the literature on no-report paradigms amongst others (see last paragraph of the discussion). Because the current data do not speak directly to this debate, we decided to remove  the sentence about “conscious experience”, and edited this part of the manuscript (also addressing a comment about preserved unconscious processing during masking by Reviewer 2) by limiting the interpretation of unconscious processing to those aspects that are uncontroversial:

      “Such deep feedforward processing can be sufficient for unconscious high-level processing, as indicated by a rich literature demonstrating high-level (e.g., semantic) processing during masking (Kouider & Dehaene, 2007; Van den Bussche et al., 2009; van Gaal & Lamme, 2012). Thus, rather than enabling deep unconscious processing, preserved local recurrency during inattention may afford other processing advantages linked to its proposed role in perceptual integration (Lamme, 2020), such as integration of stimulus elements over space or time.”

      (4) The two paradigms developed here could be used jointly to highlight nonidiosyncratic NCCs, i.e. EEG markers of visibility or confidence that generalise regardless of the method used. Have the authors attempted to train the classifier on one method and apply it to another (e.g. AB to masking and vice versa)? What perceptual level is assumed to transfer? 

      To avoid issues with post-hoc selection of (visible vs. invisible) trials (discussed in the Introduction), we did not divide our trials into conscious and unconscious trials, and thus did not attempt to reveal NCCs, or NCCs generalizing across the two paradigms. Note also that this approach alone would not resolve the debate regarding the ‘true’ NCC as it hinges on the operational definition of consciousness one adheres to; also see our response to the previous point the reviewer raised. Our main analysis revealed that the illusory triangle could be decoded with above-chance accuracy during both masking and the AB over extended periods of time with similar topographies (Fig. 2B), so that significant cross-decoding would be expected over roughly the same extended period of time (except for the heightened 200-250 ms peak). However, as our focus was on differences between the two manipulations and because we did not use post-hoc sorting of trials, we did not add these analyses.

      (5) How can the results be integrated with the attentional literature showing that attentional filters can be applied early in the processing hierarchy? 

      Compared to certain manipulations of spatial attention, the AB phenomenon is generally considered to represent an instance of “late” attentional filtering. In the Discussion section we included a paragraph on classic load theory, where early and late filtering depend on perceptual and attentional load. Just preceding this paragraph, we added this:

      “Clearly, these findings do not imply that unconscious high-level (e.g., semantic) processing can only occur during inattention, nor do they necessarily generalize to other forms of inattention. Indeed, while the AB represents a prime example of late attentional filtering, other ways of inducing inattention or distraction (e.g., by manipulating spatial attention) may filter information earlier in the processing hierarchy (e.g., Luck & Hillyard, 1994 vs. Vogel et al., 1998).”

      Reviewer #2 (Public Review): 

      Summary: 

      This is a very elegant and important EEG study that unifies within a single set of behaviorally equated experimental conditions conscious access (and therefore also conscious access failures) during visual masking and attentional blink (AB) paradigms in humans. By a systematic and clever use of multivariate pattern classifiers across conditions, they could dissect, confirm, and extend a key distinction (initially framed within the GNWT framework) between 'subliminal' and 'pre-conscious' unconscious levels of processing. In particular, the authors could provide strong evidence to distinguish here within the same paradigm these two levels of unconscious processing that precede conscious access : (i) an early (< 80ms) bottom-up and local (in brain) stage of perceptual processing ('local contrast processing') that was preserved in both unconscious conditions, (ii) a later stage and more integrated processing (200-250ms) that was impaired by masking but preserved during AB. On the basis of preexisting studies and theoretical arguments, they suggest that this later stage could correspond to lateral and local recurrent feedback processes. Then, the late conscious access stage appeared as a P3b-like event. 

      Strengths: 

      The methodology and analyses are strong and valid. This work adds an important piece in the current scientific debate about levels of unconscious processing and specificities of conscious access in relation to feed-forward, lateral, and late brain-scale top-down recurrent processing. 

      Weaknesses: 

      - The authors could improve clarity of the rich set of decoding analyses across conditions. 

      - They could also enrich their Introduction and Discussion sections by taking into account the importance of conscious influences on some unconscious cognitive processes (revision of traditional concept of 'automaticity'), that may introduce some complexity in Results interpretation 

      - They should discuss the rich literature reporting high-level unconscious processing in masking paradigms (culminating in semantic processing of digits, words or even small group of words, and pictures) in the light of their proposal (deeper unconscious processing during AB than during masking). 

      We thank the reviewer for their positive assessment of our study and for their insightful comments and helpful suggestions that helped to significantly strengthen our paper. We provide a more detailed point-by-point response in the “recommendations for the authors” section below. In brief, we followed the reviewer’s suggestions and revised the Results/Discussion to include references to influences on unconscious processes and expanded our discussion of unconscious effects during masking vs. AB.  

      Reviewer #3 (Public Review): 

      Summary: 

      This work aims to investigate how perceptual and attentional processes affect conscious access in humans. By using multivariate decoding analysis of electroencephalography (EEG) data, the authors explored the neural temporal dynamics of visual processing across different levels of complexity (local contrast, collinearity, and illusory perception). This is achieved by comparing the decodability of an illusory percept in matched conditions of perceptual (i.e., degrading the strength of sensory input using visual masking) and attentional impairment (i.e., impairing top-down attention using attentional blink, AB). The decoding results reveal three distinct temporal responses associated with the three levels of visual processing. Interestingly, the early stage of local contrast processing remains unaffected by both masking and AB. However, the later stages of collinearity and illusory percept processing are impaired by the perceptual manipulation but remain unaffected by the attentional manipulation. These findings contribute to the understanding of the unique neural dynamics of perceptual and attentional functions and how they interact with the different stages of conscious access. 

      Strengths: 

      The study investigates perceptual and attentional impairments across multiple levels of visual processing in a single experiment. Local contrast, collinearity, and illusory perception were manipulated using different configurations of the same visual stimuli. This clever design allows for the investigation of different levels of visual processing under similar low-level conditions. 

      Moreover, behavioural performance was matched between perceptual and attentional manipulations. One of the main problems when comparing perceptual and attentional manipulations on conscious access is that they tend to impact performance at different levels, with perceptual manipulations like masking producing larger effects. The study utilizes a staircasing procedure to find the optimal contrast of the mask stimuli to produce a performance impairment to the illusory perception comparable to the attentional condition, both in terms of perceptual performance (i.e., indicating whether the target contained the Kanizsa illusion) and metacognition (i.e., confidence in the response). 

      The results show a clear dissociation between the three levels of visual processing in terms of temporal dynamics. Local contrast was represented at an early stage (~80 ms), while collinearity and illusory perception were associated with later stages (~200-250 ms). Furthermore, the results provide clear evidence in support of a dissociation between the effects of perceptual and attentional processes on conscious access: while the former affected both neuronal correlates of collinearity and illusory perception, the latter did not have any effect on the processing of the more complex visual features involved in the illusion perception. 

      Weaknesses: 

      The design of the study and the results presented are very similar to those in Fahrenfort et al. (2017), reducing its novelty. Similar to the current study, Fahrenfort et al. (2017) tested the idea that if both masking and AB impact perceptual integration, they should affect the neural markers of perceptual integration in a similar way. They found that behavioural performance (hit/false alarm rate) was affected by both masking and AB, even though only the latter was significant in the unmasked condition. An early classification peak was instead only affected by masking. However, a late classification peak showed a pattern similar to the behavioural results, with classification affected by both masking and AB. 

      The interpretation of the results mainly centres on the theoretical framework of the recurrent processing theory of consciousness (Lamme, 2020), which led to the assumption that local contrast, collinearity, and the illusory perception reflect feedforward, local recurrent, and global recurrent connections, respectively. It should be mentioned, however, that this theoretical prediction is not directly tested in the study. Moreover, the evidence for the dissociation between illusion and collinearity in terms of lateral and feedback connections seems at least limited. For instance, Kok et al. (2016) found that, whereas bottom-up stimulation activated all cortical layers, feedback activity induced by illusory figures led to a selective activation of the deep layers. Lee & Nguyen (2001), instead, found that V1 neurons respond to illusory contours of the Kanizsa figures, particularly in the superficial layers. They all mention feedback connections, but none seem to point to lateral connections. 

      Moreover, the evidence in favour of primarily lateral connections driving collinearity seems mixed as well. On one hand, Liang et al. (2017) showed that feedback and lateral connections closely interact to mediate image grouping and segmentation. On the other hand, Stettler et al. (2002) showed that, whereas the intrinsic connections link similarly oriented domains in V1, V2 to V1 feedback displays no such specificity. Furthermore, the other studies mentioned in the manuscript did not investigate feedback connections but only lateral ones, making it difficult to draw any clear conclusions. 

      We thank the reviewer for their careful review and positive assessment of our study, as well as for their constructive criticism and helpful suggestions. We provide a more detailed point-by-point response in the “recommendations for the authors” section below. In brief, we addressed the reviewer’s comments and suggestions by better relating our study to Fahrenfort et al.’s (2017) paper and by highlighting the limitations inherent in linking our findings to distinct neural mechanisms (in particular, to lateral vs. feedback connections).

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors): 

      -  Methods: it states that "The distance between the three Pac-Man stimuli as well as between the three aligned two-legged white circles was 2.8 degrees of visual angle". It is unclear what this distance refers to. Is it the shortest distance between the edges of the objects? 

      It is indeed the shortest distance between the edges of the objects. This is now included in the Methods.

      -  Methods: It's unclear to me if the mask updating procedure during the experimental session was based on detection rate or on the perceptual performance index reported on Fig1D. Please clarify. 

      It was based on accuracy calculated over 32 trials. We have included this information in the Methods.

      -  Methods and Results: I did not understand why the described procedure used to ensure that confidence ratings are not contaminated by differences in perceptual performance was necessary. To me, it just seems to make the "no manipulations" and "both manipulations" less comparable to the other 2 conditions. 

      To calculate accurate estimates of metacognitive sensitivity for the two matched conditions, we wanted participants to make use of the full confidence scale (asking them to distribute their responses evenly over all ratings within a block). By mixing all conditions in the same block, we would have run the risk of participants anchoring their confidence ratings to the unmatched very easy and very difficult conditions (no and both manipulations condition). We made this point explicit in the Results section and in the Methods section:

      “To ensure that the distribution of confidence ratings in the performance-matched masked and AB condition was not influenced by participants anchoring their confidence ratings to the unmatched very easy and very difficult conditions (no and both manipulations condition, respectively), the masked and AB condition were presented in the same experimental block, while the other block type included the no and both manipulations condition.”

      “To ensure that confidence ratings for these matched conditions (masked, long lag and unmasked, short lag) were not influenced by participants anchoring their confidence ratings to the very easy and very difficult unmatched conditions (no and both manipulations, respectively), one type of block only contained the matched conditions, while the other block type contained the two remaining, unmatched conditions (masked, short lag and unmasked, long lag).”

      - Methods: what priors were used for Bayesian analyses? 

      Bayesian statistics were calculated in JASP (JASP Team, 2024) with default prior scales (Cauchy distribution, scale 0.707). This is now added to the Methods.
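      To give a rough sense of what a default Cauchy prior with scale 0.707 implies, the sketch below approximates the JZS Bayes factor for a one-sample or paired t-test by numerical integration, following the well-known formulation of Rouder et al. (2009). This is an illustration only, not JASP's implementation, and the analyses in the manuscript (e.g., ANOVAs) rely on other models. The function name `jzs_bf10`, the midpoint-rule integration, and the number of integration steps are assumptions made for this example.

```python
import math


def jzs_bf10(t, n, r=0.707, steps=20000):
    """Approximate the JZS Bayes factor BF10 for a one-sample (or paired) t-test.

    t: observed t statistic; n: number of participants (or pairs);
    r: scale of the Cauchy prior on the standardized effect size.
    """
    nu = n - 1  # degrees of freedom

    # Marginal likelihood under H0 (up to a constant shared with H1).
    null_like = (1 + t * t / nu) ** (-(nu + 1) / 2)

    # Under H1, the Cauchy prior on effect size is written as a normal prior
    # with variance g, where g follows an inverse-gamma(1/2, r^2/2) density.
    # Substituting g = u / (1 - u) maps the range (0, inf) onto (0, 1).
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps  # midpoint rule
        g = u / (1 - u)
        prior = r / math.sqrt(2 * math.pi) * g ** -1.5 * math.exp(-r * r / (2 * g))
        like = (1 + n * g) ** -0.5 * (1 + t * t / ((1 + n * g) * nu)) ** (-(nu + 1) / 2)
        total += like * prior / (1 - u) ** 2  # Jacobian dg/du
    alt_like = total / steps

    return alt_like / null_like
```

      With t = 0 the function returns a value below 1 (evidence favouring the null), and the Bayes factor grows as |t| increases for a fixed sample size.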

      - Results, line 162: It states that classifiers were applied on "raw EEG activity" but the Methods specify preprocessing steps. "Preprocessed EEG activity" seems more appropriate. 

      We changed the term to “preprocessed EEG activity” in the Methods and to “(minimally) preprocessed EEG activity (see Methods)” in the Results, respectively.

      - Results, line 173: The effect of masking on local contrast decoding is reported as "marginal". If the alpha is set at 0.05, it seems that this effect is significant and should not be reported as marginal. 

      We changed the wording from “marginal” to “small but significant.”  

      - Fig1: The fixation cross is not displayed. 

      Because adding the fixation cross would have made the figure of the trial design look crowded and less clear, we decided to exclude it from this schematic trial representation. We now also state this in the legend of Figure 1.

      - Fig 3A: In the upper left panel, isn't there a missing significant effect of the "local contrast training and testing" condition in the first window? If not, this condition seems oddly underpowered compared to the other two conditions. 

      Thanks for the catch! The highlighting in bold and the significance bar were indeed lacking for this condition in the upper left panel (blue line). We corrected the figure in our revision.

      - Supplementary text and Fig S6: It is unclear to me why the two control analyses (the black lines vs. the green and purple lines) are pooled together in the same figure. They seem to test for different, non-comparable contrasts (they share neither training nor testing sets), and I find it confusing to find them on the same figure. 

      We agree that this may be confusing, and deleted the results from one control analysis from the figure (black line, i.e., training on contrast, testing on illusion), as the reviewer correctly pointed out that it displayed a non-comparable analysis. Given that this control analysis did not reveal any significant decoding, we now report its results only in the Supplementary text.  

      - Fig S6: I think the title of the legend should say testing on the non-illusory triangle instead of testing on the illusory triangle to match the supplementary text. 

      This was a typo – thank you! Corrected.  

      Reviewer #2 (Recommendations For The Authors): 

      Issue #1: One key asymmetry between the three levels of T2 attributes (i.e., local contrast; non-illusory triangle; illusory Kanizsa triangle) is related to the top-down conscious posture driven by the task, which focused exclusively on the last attribute (illusory Kanizsa triangle). Therefore, any difference in EEG decoding performance across these three levels could also depend on this asymmetry. For instance, if participants were engaged to report local contrast or the non-illusory triangle, one could wonder whether decoding performance would differ from that observed here. This potential confound was addressed by the authors by using decoders trained on different datasets in which the main task was to report one of the two other attributes. They could then test how classifiers trained on the task-related attribute behave on the main dataset. However, this part of the study is crucial but not 100% clear, and the links with the results of these control experiments are not fully explicit. Could the authors better clarify this important point (see also Issue #1 and #3)? 

      The reviewer raises an important point, alluding to potential differences between decoded features regarding task relevance. There are two separate sets of analyses where task relevance may have been a factor, our main analyses comparing illusion to contrast decoding, and our comparison of collinearity vs. illusion-specific processing.  

      In our main analysis, we are indeed reporting decoding of a task-relevant feature (illusion) and of a task-irrelevant feature (local contrast, i.e., rotation of the Pac-Man inducers). Note, however, that the Pac-Man inducers were always task-relevant, as they needed to be processed to perceive illusory triangles, so that local contrast decoding was based on task-relevant stimulus elements, even though participants did not respond to local contrast differences in the main experiment. However, we also ran control analyses testing the effect of task-relevance on local contrast decoding in our independent training data set and in another (independent) study, where local contrast was, in separate experimental blocks, task-relevant or task-irrelevant. The results are reported in the Supplementary Text and in Figure S5. In brief, task-relevance did not improve early (70–95 ms) decoding of local contrast. We are thus confident that the comparison of local contrast to illusion decoding in our main analysis was not substantially affected by differences in task relevance. In our previous manuscript version, we referred to these control analyses only in the collinearity-vs-illusion section of the Results. In our revision, we added the following in the Results section comparing illusion to contrast decoding:

      “In the light of evidence showing that unconscious processing is susceptible to conscious top-down influences (Kentridge et al., 2004; Kiefer & Brendel, 2006; Naccache et al., 2002), we ran control analyses showing that early local contrast decoding was not improved by rendering contrast task-relevant (see Supplementary Information and Fig. S5), indicating that these differences between illusion and contrast decoding did not reflect differences in task-relevance.”

      In addition to our main analysis, there is the concern that our comparison of collinearity vs. illusion-specific processing may have been affected by differences in task-relevance between the stimuli inducing the non-illusory triangle (the “two-legged white circles”, collinearity-only) and the stimuli inducing the Kanizsa illusion (the Pac-Man inducers, collinearity-plus-illusion). We would like to emphasize that in our main analysis classifiers were always used to decode T2 illusion presence vs. absence (collinearity-plus-illusion), and never to decode T2 collinearity-only. To distinguish collinearity-only from collinearity-plus-illusion processing, we only varied the training data (training classifiers on collinearity-only or collinearity-plus-illusion), using the independent training data set, where collinearity-only and collinearity-plus-illusion (and rotation) were task-relevant (in separate blocks). As discussed in the Supplementary Information, for this analysis approach to be valid, collinearity-only processing should be similar for the illusory and the non-illusory triangle, and this is what control analyses demonstrated (Fig. S7). In any case, general task-relevance was equated for the collinearity-only and the collinearity-plus-illusion classifiers.  

      Finally, in supplementary Figure 6 we also show that our main results reported in Figure 2 (discussed at the top of this response) were very similar when the classifiers were trained on the independent localizer dataset in which each stimulus feature could be task-relevant.  

      Together, for the reasons described above, we believe that differences in EEG decoding performance across these three stimulus levels are unlikely to depend on a “task-relevance” asymmetry.

      Issue #2: Following on from my previous point, the authors should better mention the concept of conscious influences on unconscious processing that led to a full revision of the notion of automaticity in cognitive science [1, 2, 3, 4]. For instance, the discovery that conscious endogenous temporal and spatial attention modulate unconscious subliminal processing paved the way to this revision. This concept raises the importance of Issue #1: equating performance on the main task across AB and masking is not enough to guarantee that differences of neural processing of the unattended attributes of T2 (i.e., task-unrelated attributes) are not, in part, due to this asymmetry rather than to a systematic difference of unconscious processing strength [5, 6-8]. Obviously, the reported differences for real-triangle decoding between AB and masking cannot be totally explained by such a factor (because this is a task-unrelated attribute for both AB and masking conditions), but still this issue should be better introduced, addressed, clarified (Issue #1 and #3) and discussed. 

      We would like to refer to our response to the previous point: Control analyses for local contrast decoding showed that task relevance had no influence on our marker for feedforward processing. Most importantly, as outlined above, we did not perform real-triangle decoding – all our decoding analyses focused on comparing collinearity-only vs. collinearity-plus-illusion were run on the task-relevant T2 illusion (decoding its presence vs. absence). The key difference was solely the training set, where the collinearity-only classifier was trained on the (task-relevant) real triangle and the collinearity-plus-illusion classifier was trained on the (task-relevant) Kanizsa triangle. Thus, overall task relevance was controlled in these analyses.  
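      The train-on-one-set, test-on-another logic described above can be sketched with a toy example. The code below uses synthetic data and a simple nearest-class-mean classifier as a stand-in; it is not the classifier or pipeline used in the study, and all names (`cross_decode`, `simulate`, the number of channels and time points) are hypothetical. The point it illustrates is that cross-decoding succeeds only at time points where the training and testing sets share a discriminative pattern.

```python
import random


def class_means(trials, labels):
    """Average the feature vectors of each class (label 0 or 1)."""
    means = {}
    for lab in (0, 1):
        rows = [x for x, y in zip(trials, labels) if y == lab]
        means[lab] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def predict(means, x):
    """Assign x to the class whose mean is closest (squared Euclidean distance)."""
    dist = {lab: sum((a - b) ** 2 for a, b in zip(x, m)) for lab, m in means.items()}
    return min(dist, key=dist.get)


def cross_decode(train_x, train_y, test_x, test_y):
    """Train per time point on one data set and test on another.

    train_x / test_x are nested lists indexed as [time][trial][channel].
    Returns the decoding accuracy for each time point.
    """
    acc = []
    for t_train, t_test in zip(train_x, test_x):
        means = class_means(t_train, train_y)
        hits = sum(predict(means, x) == y for x, y in zip(t_test, test_y))
        acc.append(hits / len(test_y))
    return acc


def simulate(n_trials, signal_times, rng, n_times=5, n_chan=8):
    """Toy 'EEG': class 1 adds a fixed channel pattern at the given time points."""
    labels = [i % 2 for i in range(n_trials)]
    pattern = [1.5 if c % 2 == 0 else -1.5 for c in range(n_chan)]
    data = []
    for t in range(n_times):
        trials = []
        for lab in labels:
            x = [rng.gauss(0, 1) for _ in range(n_chan)]
            if lab == 1 and t in signal_times:
                x = [v + p for v, p in zip(x, pattern)]
            trials.append(x)
        data.append(trials)
    return data, labels


rng = random.Random(0)
# Hypothetical independent training set: pattern present at time points 2 and 3.
train_x, train_y = simulate(200, {2, 3}, rng)
# Hypothetical test set (standing in for T2): pattern present at time point 2 only.
test_x, test_y = simulate(200, {2}, rng)
accuracy = cross_decode(train_x, train_y, test_x, test_y)
```

      In this toy setup, accuracy is high only at the time point where both sets carry the pattern and stays near chance elsewhere, including the time point where only the training set carries it.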

      In our revision, we are now also citing the studies proposed by the reviewer, when discussing the control analyses testing for an effect of task-relevance on local contrast decoding:

      “In the light of evidence showing that unconscious processing is susceptible to conscious top-down influences (Kentridge et al., 2004; Kiefer & Brendel, 2006; Naccache et al., 2002), we ran control analyses showing that early local contrast decoding was not improved by rendering contrast task-relevant (see Supplementary Information and Fig. S5), indicating that these differences between illusion and contrast decoding did not reflect differences in task-relevance.”

      Issue #3: In terms of clarity, I would suggest the authors to add a synthetic figure providing an overall view of all pairs of intra and cross-conditions decoding analyses and mentioning main task for training and testing sets for each analysis (see my previous and related points). Indeed, at one point, the reader can get lost and this would not only strengthen accessibility to the detailed picture of results, but also pinpoint the limits of the work (see previous point). 

      We understand the point the reviewer is raising and acknowledge that some of our analyses, in particular those using different training and testing sets, may be difficult to grasp. But given the variety of different analyses using different training and testing sets, different temporal windows, as well as different stimulus features, it was not possible to design an intuitive synthetic figure summarizing the key results. We hope that the added text in the Results and Discussion section will be sufficient to guide the reader through our set of analyses.  

      In our revision, we are now more clearly highlighting that, in addition to presenting the key results in our main text that were based on training classifiers on the T1 data, “we replicated all key findings when training the classifiers on an independent training set where individual stimuli were presented in isolation (Fig. 3A, results in the Supplementary Information and Fig. S6).” For this, we added a schematic showing the procedure of the independent training set to Figure 3, more clearly pointing the reader to the use of a separate training data set.  

      Issue #4: In the light of these findings the authors should discuss more thoroughly the question of unconscious high-level representations in masking versus AB: in particular, a longstanding issue relates to unconscious semantic processing of words, numbers or pictures. According to their findings, they tend to suggest that semantic processing should be more enabled in AB than in masking. However, a rich literature provided a substantial number of results (including results from the last author, Simon van Gaal) that tend to support the notion of unconscious semantic processing in subliminal processing (see in particular: [9, 10, 11, 12, 13]). So, and as mentioned by the authors, while there is evidence for semantic processing during AB, they should better discuss how they would explain unconscious semantic subliminal processing. While a possibility could be to question the unconscious attribute of several subliminal results, the same argument also holds for AB studies. Another possible track of discussion would be to differentiate AB and subliminal perception in terms of strength and durability of the corresponding unconscious representations, but not necessarily in terms of cognitive richness. Indeed, one may discuss that semantic processing of stimuli that do not need complex spatial integration (e.g., words or digits as compared to the illusory Kanizsa triangle tested here) can still be observed under subliminal conditions. 

      We thank the reviewer for pointing us to this shortcoming of our previous Discussion. Note that our data does not directly speak to the question of high-level unconscious representations in masking vs AB, because such conclusions would hinge on the operational definition of consciousness one adheres to (also see response to Reviewer 1). Nevertheless, we do follow the reviewer’s suggestions and added the following in the Discussion (also addressing a point about other forms of attention raised by Reviewer 1):

      “Clearly, these findings do not imply that unconscious high-level (e.g., semantic) processing can only occur during inattention, nor do they necessarily generalize to other forms of inattention. Indeed, while the AB represents a prime example of late attentional filtering, other ways of inducing inattention or distraction (e.g., by manipulating spatial attention) may filter information earlier in the processing hierarchy (e.g., Luck & Hillyard, 1994 vs. Vogel et al., 1998).”

      And, in a following paragraph in the Discussion:

      “Such deep feedforward processing can be sufficient for unconscious high-level processing, as indicated by a rich literature demonstrating high-level (e.g., semantic) processing during masking (Kouider & Dehaene, 2007; Van den Bussche et al., 2009; van Gaal & Lamme, 2012). Thus, rather than enabling high-level unconscious processing, preserved local recurrency during inattention may afford other processing advantages linked to its proposed role in perceptual integration (Lamme, 2020), such as integration of stimulus elements over space or time.”

      Reviewer #3 (Recommendations For The Authors): 

      (1) The objective of Fahrenfort et al., 2017 seems very similar to that of the current study. What are the main differences between the two studies? Moreover, Fahrenfort et al., 2017 conducted similar decoding analyses to those performed in the current study. Which results were replicated in the current study, and which ones are novel? Highlighting these differences in the manuscript would be beneficial. 

      We now provide a more comprehensive coverage of the study by Fahrenfort et al., 2017. In the Introduction, we added a brief summary of the key findings, highlighting that this study’s findings could have reflected differences in task performance rather than differences between masking and AB:

      “For example, Fahrenfort and colleagues (2017) found that illusory surfaces could be decoded from electroencephalogram (EEG) data during the AB but not during masking. This was taken as evidence that local recurrent interactions, supporting perceptual integration, were preserved during inattention but fully abolished by masking. However, masking had a much stronger behavioral effect than the AB, effectively reducing task performance to chance level. Indeed, a control experiment using weaker masking, which resulted in behavioral performance well above chance similar to the main experiment’s AB condition, revealed some evidence for preserved local recurrent interactions also during masking. However, these conditions were tested in separate experiments with small samples, precluding a direct comparison of perceptual vs. attentional blindness at matched levels of behavioral performance. To test …”

      In the Results, we are now also highlighting this key advancement by directly referencing the previous study:

      “Thus, whereas in previous studies task performance was considerably higher during the AB than during masking (e.g., Fahrenfort et al., 2017), in the present study the masked and the AB condition were matched in both measures of conscious access.”

      When reporting the EEG decoding results in the Results section, we repeatedly cite the Fahrenfort et al. (2017) study to highlight similarities between the two studies’ findings. We also added a few sentences explicitly relating the key findings of the two studies:

      “This suggests that the AB allowed for greater local recurrent processing than masking, replicating the key finding by Fahrenfort and colleagues (2017). Importantly, the present result demonstrates that this effect reflects the difference between the perceptual vs. attentional manipulation rather than differences in behavior, as the masked and the AB condition were matched for perceptual performance and metacognition.”

      “This similarity between behavior and EEG decoding replicates the findings of Fahrenfort and colleagues (2017) who also found a striking similarity between late Kanizsa decoding (at 406 ms) and behavioral Kanizsa detection. These results indicate that global recurrent processing at these later points in time reflected conscious access to the Kanizsa illusion.”

      We also more clearly highlighted where our study goes beyond Fahrenfort et al.’s (2017), e.g., in the Results:

      “The addition of this element of collinearity to our stimuli was a key difference to the study by Fahrenfort and colleagues (2017), allowing us to compare non-illusory triangle decoding to illusory triangle decoding in order to distinguish between collinearity and illusion-specific processing.”

      And in the Discussion:

      “Furthermore, the addition of line segments forming a non-illusory triangle to the stimulus employed in the present study allowed us to distinguish between collinearity and illusion-specific processing.”

      Also, in the Discussion, we added a paragraph “summarizing which results were replicated in the current study, and which ones are novel”, as suggested by the reviewer:

      “This pattern of results is consistent with a previous study that used EEG to decode Kanizsa-like illusory surfaces during masking and the AB (Fahrenfort et al., 2017). However, the present study also revealed some effects where Fahrenfort and colleagues (2017) failed to obtain statistical significance, likely reflecting the present study’s considerably larger sample size and greater statistical power. For example, in the present study the marker for feedforward processing was weakly but significantly impaired by masking, and the marker for local recurrency was significantly impaired not only by masking but also by the AB, although to a lesser extent. Most importantly, however, we replicated the key findings that local recurrent processing was more strongly impaired by masking than by the AB, and that global recurrent processing was similarly impaired by masking and the AB and closely linked to task performance, reflecting conscious access. Crucially, having matched the key conditions behaviorally, the present finding of greater local recurrency during the AB can now unequivocally be attributed to the attentional vs. perceptual manipulation of consciousness.”

      Finally, we changed the title to “Distinct neural mechanisms underlying perceptual and attentional impairments of conscious access despite equal task performance” to highlight one of the crucial differences between the Fahrenfort et al. study and this study, namely the fact that we equalized task performance between the two critical conditions (AB and masking).

      (2) It is not clear from the text the link between the current study and the literature on the role of lateral and feedback connections in consciousness (Lamme, 2020). A better explanation is needed. 

      To our knowledge, consciousness theories such as Lamme’s recurrent processing theory currently make no distinction between the roles of lateral and feedback connections for consciousness. The principled distinction lies between unconscious feedforward processing and phenomenally conscious or “preconscious” local recurrent processing, where local recurrency refers to both lateral (or horizontal) and feedback connections. We added a sentence in the Discussion:

      “As current theories do not distinguish between the roles of lateral vs. feedback connections for consciousness, the present findings may enrich empirical and theoretical work on perceptual vs. attentional mechanisms of consciousness …”

      (3) When training on T1 and testing on T2, EEG data showed an early peak in local contrast classification at 75-95 ms over posterior electrodes. The authors stated that this modulation was only marginally affected by masking (and not at all by AB); however, the main effect of masking is significant. Why was this effect interpreted as nonrelevant? 

      Following this and Reviewer 1’s comment, we changed the wording from “marginal” to “weak but significant.” We considered this effect “weak” and of lesser relevance, because its Bayes factor indicated that the alternative hypothesis was only 1.31 times more likely than the null hypothesis of no effect, representing only “anecdotal” evidence, which is in sharp contrast to the robust effects of the consciousness manipulations on illusion decoding reported later. Furthermore, later ANOVAs comparing the effect of masking on contrast vs. illusion decoding revealed much stronger effects on illusion decoding than on contrast decoding (BFs > 3.59 × 10⁴).
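      For readers less familiar with verbal Bayes factor labels, the sketch below maps a Bayes factor onto one widely used set of categories (anecdotal, moderate, strong, very strong, extreme), as commonly reported alongside JASP output. The exact cut-offs applied in the manuscript are an assumption here, and the function name `evidence_label` is hypothetical.

```python
def evidence_label(bf10):
    """Map a Bayes factor BF10 (> 0) to a (label, favoured hypothesis) pair."""
    # Evidence against H1 (BF10 < 1) is expressed as evidence for H0.
    favoured = "H1" if bf10 >= 1 else "H0"
    strength = bf10 if bf10 >= 1 else 1 / bf10

    # Conventional cut-offs; the boundaries themselves are arbitrary conventions.
    if strength < 3:
        label = "anecdotal"
    elif strength < 10:
        label = "moderate"
    elif strength < 30:
        label = "strong"
    elif strength < 100:
        label = "very strong"
    else:
        label = "extreme"
    return label, favoured


weak_effect = evidence_label(1.31)       # masking effect on local contrast decoding
robust_effect = evidence_label(3.59e4)   # lower bound reported for the later ANOVAs
```

      Under these cut-offs, a Bayes factor of 1.31 falls in the “anecdotal” range, whereas values above 3.59 × 10⁴ fall in the “extreme” range.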

      (4) The decoding analysis on the illusory percept yielded two separate peaks of decoding, one from 200 to 250 ms and another from 275 to 475 ms. The early component was localized occipitally and interpreted as local sensory processing, while the late peak was described as a marker for global recurrent processing. This latter peak was localized in the parietal cortex and associated with the P300. Can the authors show the topography of the P300 evoked response obtained from the current study as a comparison? Moreover, source reconstruction analysis would probably provide a better understanding of the cortical localization of the two peaks. 

      Figure S4 now shows the P300 from electrode Pz, demonstrating a stronger positivity between 375 and 475 ms when the illusory triangle was present than when it was absent. We did not run a source reconstruction analysis.  

      (5) The authors mention that the behavioural results closely resembled the pattern of the second decoding peak results. However, they did not show any evidence for this relationship. For instance, is there a correlation between the two measures across or within participants? Does this relationship differ between the illusion report and the confidence rating? 

      This relationship became evident from simply eyeballing the results figures: Both in behavior and EEG decoding performance dropped from the both-manipulations condition to the AB and masked conditions, while these conditions did not differ significantly. Following a similar observation of a close similarity between behavior and the second/late illusion decoding peak in the study by Fahrenfort et al. (2017), we adopted their analysis approach and ran two additional ANOVAs, adding “measure” (behavior vs. EEG) as a factor. For this analysis, we dropped the both-manipulations condition due to scale restrictions (as noted in footnote 1: “We excluded the both-manipulations condition from this analysis due to scale restrictions: in this condition, EEG decoding at the second peak was at chance, while behavioral performance was above chance, leaving more room for behavior to drop from the masked and AB condition.”). The analysis revealed that there were no interactions with condition:

      “The pattern of behavioral results, both for perceptual performance and metacognitive sensitivity, closely resembled the second decoding peak: sensitivity in all three metrics dropped from the no-manipulations condition to the masked and AB conditions, while sensitivity did not differ significantly between these performance-matched conditions (Fig. 2C). Two additional rm ANOVAs with the factors measure (behavior, second EEG decoding peak) and condition (no-manipulations, masked, AB)<sup>1</sup> for perceptual performance and metacognitive sensitivity revealed no significant interaction (performance: F<sub>2,58</sub>=0.27, P=0.762, BF<sub>01</sub>=8.47; metacognition: F<sub>2,58</sub>=0.54, P=0.586, BF<sub>01</sub>=6.04). This similarity between behavior and EEG decoding replicates the findings of Fahrenfort and colleagues (2017) who also found a striking similarity between late Kanizsa decoding (at 406 ms) and behavioral Kanizsa detection. These results indicate that global recurrent processing at these later points in time reflected conscious access to the Kanizsa illusion.”

      (6) The marker for illusion-specific processing emerged later (200-250 ms), with the no-manipulation decoding performing better after training on the illusion than the non-illusory triangle. This difference emerged only in the AB condition, and it was fully abolished by masking. The authors confirmed that the illusion-specific processing was not affected by the AB manipulations by running a rm ANOVA which did not result in a significant interaction between condition and training set. However, unlike the other non-significant results, a Bayes Factor is missing here. 

      We added Bayes factors to all (significant and non-significant) rm ANOVAs.

      (7) The same analysis yielded a second illusion decoding peak at 375-475 ms. This effect was impaired by both masking and AB, with no significant differences between the two conditions. The authors stated that this result was directly linked to behavioural performance. However, it is not clear to me what they mean (see point 5). 

      We added analyses comparing behavior and EEG decoding directly (see our response to point 5).

      (8) The introduction starts by stating that perceptual and attentional processes differently affect consciousness access. This differentiation has been studied thoroughly in the consciousness literature, with a focus on how attention differs from consciousness (e.g., Koch & Tsuchiya, TiCS, 2007; Pitts, Lutsyshyna & Hillyard, Phil. Trans. Roy. Soc. B Biol. Sci., 2018). The authors stated that "these findings confirm and enrich empirical and theoretical work on perceptual vs. attentional mechanisms of consciousness clearly distinguishing and specifying the neural profiles of each processing stage of the influential four-stage model of conscious experience". I found it surprising that this aspect was not discussed further. What was the state of the art before this study was conducted? What are the mentioned neural profiles? How did the current results enrich the literature on this topic? 

      We would like to point out that our study is not primarily concerned with the conceptual distinction between consciousness and attention, which has been the central focus of e.g., Koch and Tsuchiya (2007). While this literature was concerned with ways to dissociate consciousness and attention, we tacitly assumed that attention and consciousness are now generally considered different constructs. Our study is thus not dealing with dissociations between attention and consciousness, nor with the distinction between phenomenal consciousness and conscious access, but is concerned with different ways of impairing conscious access (defined as the ability to report about a stimulus), either via perceptual or via attentional manipulations. For the state of the art before the study was conducted, we would like to refer to the motivation of our study in the Introduction, e.g., previous studies’ difficulties in unequivocally linking greater local recurrency during attentional than perceptual blindness to the consciousness manipulation, given performance confounds (we expanded this Introduction section). We also expanded a paragraph in the discussion to remind the reader of the neural profiles of the 4-stage model and to highlight the novelty of our findings related to the distinction between lateral and feedback processes:

      “As current theories do not distinguish between the roles of lateral vs. feedback connections for consciousness, the present findings may enrich empirical and theoretical work on perceptual vs. attentional mechanisms of consciousness (Block, 2005; Dehaene et al., 2006; Hatamimajoumerd et al., 2022; Lamme, 2010; Pitts et al., 2018; Sergent & Dehaene, 2004), clearly distinguishing the neural profiles of each processing stage of the influential four-stage model of conscious experience (Fig. 1A). Along with the distinct temporal and spatial EEG decoding patterns associated with lateral and feedback processing, our findings suggest a processing sequence from feedforward processing to local recurrent interactions encompassing lateral-to-feedback connections, ultimately leading to global recurrency and conscious report.”

      (9) When stating that this is the first study in which behavioural measures of conscious perception were matched between the attentional blink and masking, it would be beneficial to highlight the main differences between the current study and the one from Fahrenfort et al., 2017, with which the current study shares many similarities in the experimental design (see point 1). 

      We would like to refer the reviewer to our response to point 1), where we detail how we expanded the discussion of similarities and differences between our present study and Fahrenfort et al. (2017).

      (10) The discussion emphasizes how the current study "suggests a processing sequence from feedforward processing to local recurrent interactions encompassing lateral-to-feedback connections, ultimately leading to global recurrency and conscious report". For transparency, it is though important to highlight that one limit of the current study is that it does not provide direct evidence for the specified types of connections (see point 6). 

      We added a qualification in the Discussion section:

      “Although the present EEG decoding measures cannot provide direct evidence for feedback vs. lateral processes, based on neurophysiological evidence, …”

      Furthermore, we added this qualification in the Discussion section:

      “It should be noted that not all neurophysiological evidence unequivocally links processing of collinearity and of the Kanizsa illusion to lateral and feedback processing, respectively (Angelucci et al., 2002; Bair et al., 2003; Chen et al., 2014), so that overlap in decoding the illusory and non-illusory triangle may reflect other mechanisms, for example feedback processing as well.”

      References

      Angelucci, A., Levitt, J. B., Walton, E. J. S., Hupe, J.-M., Bullier, J., & Lund, J. S. (2002). Circuits for local and global signal integration in primary visual cortex. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 22(19), 8633–8646.

      Bair, W., Cavanaugh, J. R., & Movshon, J. A. (2003). Time course and time-distance relationships for surround suppression in macaque V1 neurons. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 23(20), 7690–7701.

      Block, N. (2005). Two neural correlates of consciousness. Trends in Cognitive Sciences, 9(2), 46–52.

      Chen, M., Yan, Y., Gong, X., Gilbert, C. D., Liang, H., & Li, W. (2014). Incremental integration of global contours through interplay between visual cortical areas. Neuron, 82(3), 682–694.

      Dehaene, S., Changeux, J.-P., Naccache, L., Sackur, J., & Sergent, C. (2006). Conscious, preconscious, and subliminal processing: a testable taxonomy. Trends in Cognitive Sciences, 10(5), 204–211.

      Hatamimajoumerd, E., Ratan Murty, N. A., Pitts, M., & Cohen, M. A. (2022). Decoding perceptual awareness across the brain with a no-report fMRI masking paradigm. Current Biology: CB. https://doi.org/10.1016/j.cub.2022.07.068

      JASP Team. (2024). JASP (Version 0.19.0) [Computer software]. https://jasp-stats.org/

      Kentridge, R. W., Heywood, C. A., & Weiskrantz, L. (2004). Spatial attention speeds discrimination without awareness in blindsight. Neuropsychologia, 42(6), 831–835.

      Kiefer, M., & Brendel, D. (2006). Attentional Modulation of Unconscious “Automatic” Processes: Evidence from Event-related Potentials in a Masked Priming Paradigm. Journal of Cognitive Neuroscience, 18(2), 184–198.

      Kouider, S., & Dehaene, S. (2007). Levels of processing during non-conscious perception: a critical review of visual masking. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1481), 857–875.

      Lamme, V. A. F. (2010). How neuroscience will change our view on consciousness. Cognitive Neuroscience, 1(3), 204–220.

      Luck, S. J., & Hillyard, S. A. (1994). Electrophysiological correlates of feature analysis during visual search. Psychophysiology, 31(3), 291–308.

      Naccache, L., Blandin, E., & Dehaene, S. (2002). Unconscious masked priming depends on temporal attention. Psychological Science, 13(5), 416–424.

      Pitts, M. A., Lutsyshyna, L. A., & Hillyard, S. A. (2018). The relationship between attention and consciousness: an expanded taxonomy and implications for ‘no-report’ paradigms. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 373(1755), 20170348.

      Sergent, C., & Dehaene, S. (2004). Is consciousness a gradual phenomenon? Evidence for an all-or-none bifurcation during the attentional blink. Psychological Science, 15(11), 720–728.

      Van den Bussche, E., Van den Noortgate, W., & Reynvoet, B. (2009). Mechanisms of masked priming: a meta-analysis. Psychological Bulletin, 135(3), 452–477.

      van Gaal, S., & Lamme, V. A. F. (2012). Unconscious high-level information processing: implication for neurobiological theories of consciousness. The Neuroscientist: A Review Journal Bringing Neurobiology, Neurology and Psychiatry, 18(3), 287–301.

      Vogel, E. K., Luck, S. J., & Shapiro, K. L. (1998). Electrophysiological evidence for a postperceptual locus of suppression during the attentional blink. Journal of Experimental Psychology. Human Perception and Performance, 24(6), 1656–1674.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      (1) While the overall results are interesting, I am somewhat left confused about how to interpret the difference in the scores derived from different conditions. For example, the authors stated "Comparing the weights for in-group and out-group distractors, the effect of proximity was larger than that of aggression and grooming" on p. 8. Does this mean that the proximity is indeed the type of behavior most affected in the out-group condition compared to the in-group condition? The out-group effects are difficult to examine with actual behavioral data, but some in-group effects such as those involving OT can be tested, which possibly provides good insights into interpreting the differences of the weights observed across the experimental conditions.

      Thank you for your thoughtful comments and for highlighting an important aspect of our findings. The statement on page 8 refers to the relative impact of different social behaviors (proximity, aggression, and grooming) on the derived weights for in-group and out-group distractors. Specifically, the data suggest that proximity exerts a stronger influence than aggression or grooming in differentiating the effects of out-group versus in-group distractors. Regarding the out-group condition, we acknowledge that it presents challenges for direct behavioral observation, as interactions involving out-group members are often more difficult to quantify in naturalistic settings. However, we agree with your suggestion to test certain in-group effects, particularly those influenced by oxytocin (OT), as they offer a more controlled framework to validate and interpret the observed differences in weights across experimental conditions. In line with this, we examined specific in-group behaviors under OT administration to disentangle their contributions to attentional dynamics (Fig. 4 and Fig. 5e–h). By integrating controlled experimental manipulations, we think these results could provide deeper insights into how social relationships shape the observed patterns of attention.

      (2) I think it is important to provide how variable spontaneous social interactions were across sessions and how impactful the variability of the interactions is on the SEI and IEI, as it helps to understand how meaningful the differences of weights are across the conditions, but such data are missing. In line with this point, although the conclusions still hold as those data were obtained during the same experimental periods, shouldn't the weights in Fig. 3f and Figs. 4g and 4h (saline) be expected to be similar, if not the same?

      Thank you for your insightful comments. As highlighted, we utilized the entire experimental period as the dataset to evaluate the monkeys' social interactions. The experiments presented in Figures 3 and 4 were designed to examine how social relationships correlate with patterns of social attention under two distinct conditions: without manipulation (Fig. 3) and with nebulized exposure to oxytocin and saline (Fig. 4). Theoretically, the weights observed in the unmanipulated condition and the nebulized saline condition should be similar. However, our results indicate that distractor biases shifted significantly following nebulized saline exposure (Fig. 4) compared to the unmanipulated condition (Fig. 3) (MK: p = 9.3×10<sup>-3</sup>, ML: p = 9.77×10<sup>-4</sup>, MC: p = 9.77×10<sup>-4</sup>, MA: p = 0.09; n<sub>1</sub> = n<sub>2</sub> = 12 experimental days; Two-sided Wilcoxon signed-rank test). This suggests that the nebulization process itself, despite acclimating the monkeys to saline exposure for approximately two weeks prior to the experiments, still influenced their attentional behaviors.

      While the primary goal of nebulization was to assess the effects of oxytocin on social attention, our main conclusions remain robust, even considering the impact of nebulization on distractor biases. We acknowledge that variability in spontaneous social interactions across days or experimental sessions could be an important factor influencing the SEI and IEI. The dynamic nature of social interactions within the colony is likely affected by numerous variables. Future research will aim to integrate these factors into a more comprehensive and dynamic framework to better interpret their influence on social attention metrics.
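      A note on the p-values reported above for the comparison between the unmanipulated and saline conditions. With n = 12 paired experimental days, the exact null distribution of the Wilcoxon signed-rank statistic rests on only 2<sup>12</sup> = 4,096 equally likely sign patterns, so attainable p-values are coarse. The sketch below is not the authors' analysis code. It assumes an exact two-sided test with no ties and no zero differences (the response does not state which variant or software was used), and `wilcoxon_exact_two_sided_p` is a hypothetical helper name. Under those assumptions, p = 9.77×10<sup>-4</sup> equals 4/4096, the value obtained when the smaller rank sum is W = 1, and p = 9.3×10<sup>-3</sup> matches W = 7. The W values are inferred here, not reported in the response.

```python
def wilcoxon_exact_two_sided_p(w, n):
    """Exact two-sided p-value for the Wilcoxon signed-rank test.

    w: the smaller of the positive and negative rank sums.
    n: number of paired observations (assumes no ties, no zero differences).
    """
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of the ranks {1, ..., n} that sum to s
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Iterate downwards so each rank is used at most once per subset.
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    lower_tail = sum(counts[: w + 1])
    return min(1.0, 2 * lower_tail / 2 ** n)


for w in (0, 1, 7):
    print(w, wilcoxon_exact_two_sided_p(w, 12))
# 0 0.00048828125
# 1 0.0009765625
# 7 0.00927734375
```

      One consequence of this granularity is that, for 12 paired days, no two-sided exact p-value can fall below about 4.9×10<sup>-4</sup>.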

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The study's conclusions are based on observations of only four monkeys, which limits the generalizability of the findings. Larger sample sizes could strengthen the validity of the results.

      Thank you for your valuable comment. We acknowledge that the relatively small sample size could influence the generalizability of the findings. However, despite this limitation, our work systematically examined multifaceted social relationships among monkeys and their attentional strategies within a well-controlled experimental setup. We reported results across sessions and conditions (e.g., in-group vs. out-group; saline vs. oxytocin), which strengthens the reliability of the observed effects of social networks within this context. We agree that increasing the sample size would improve the generalizability of the results. Future studies with a larger cohort will be critical for confirming the robustness of our findings and expanding their broader applicability. We have acknowledged this limitation in the revised manuscript and highlighted the potential for further research with larger sample sizes to validate and extend our conclusions.

      (2) The limited set of stimulus images (in-group and out-group faces) may introduce unintended biases. This could be addressed by increasing the diversity of stimuli or incorporating a broader range of out-group members.

      Thank you for your thoughtful comment. We acknowledge that the use of a limited set of six monkey faces as stimuli for in-group and out-group conditions could potentially introduce biases. To address this concern, we conducted an additional analysis to minimize the potential impact of individual images on our findings using the current dataset. Specifically, we randomly excluded one in-group and one out-group image and reanalyzed distractor biases using the remaining two images per group (Supplementary Fig. 3a). For each subject, this approach generated three sets of two distractors per group, resulting in 81 (3<sup>4</sup>) combinations per group across four monkey subjects, and a total of 81 × 81 = 6,561 subject-distractor pairings. We statistically compared distractor biases between in-group and out-group faces for each combination (Supplementary Fig. 3b). As shown in Supplementary Fig. 3c, 99.30% of the 6,561 combinations demonstrated significantly lower distractor biases towards in-group faces compared to out-group faces (two-sided Wilcoxon signed-rank test, p < 0.05). These results suggest that the observed differences in social attention between in-group and out-group monkeys are unlikely to be driven by specific images within the stimulus set. That said, we agree that increasing the diversity of stimulus images or incorporating a broader range of out-group members would improve the generalizability of the results. We have acknowledged this limitation in the revised manuscript and highlighted the potential for further research to incorporate a more diverse stimulus set to validate and extend our findings.

      “However, these conclusions may be constrained by the relatively small sample size and the homogeneity of stimulus set in the study. Future research focusing on larger, more diverse cohorts and incorporating a broader range of stimuli will enhance the generalizability and applicability of the findings.”
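      The combination count in the leave-one-image-out analysis described above can be reproduced with a short enumeration. This is a sketch of the counting only, not the authors' pipeline. It assumes three in-group and three out-group images per subject (six faces in total, as stated in the response) and uses the subject labels MK, ML, MC and MA that appear in the response. The image indices and variable names are hypothetical.

```python
from itertools import combinations, product

SUBJECTS = ["MK", "ML", "MC", "MA"]  # subject labels as used in the response
IMAGES_PER_GROUP = 3                 # assumed: 3 in-group + 3 out-group faces

# Dropping one of three images leaves three possible two-image sets.
two_image_sets = list(combinations(range(IMAGES_PER_GROUP), IMAGES_PER_GROUP - 1))

# One two-image set per subject: 3^4 assignments for each group.
in_group_choices = list(product(two_image_sets, repeat=len(SUBJECTS)))
out_group_choices = list(product(two_image_sets, repeat=len(SUBJECTS)))

# Pair every in-group assignment with every out-group assignment.
pairings = list(product(in_group_choices, out_group_choices))

print(len(two_image_sets))    # 3
print(len(in_group_choices))  # 81
print(len(pairings))          # 6561
```

      Each element of `pairings` specifies, for every subject, which two in-group and which two out-group images are retained. The in-group versus out-group comparison would then be rerun once per element.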

      Reviewer #1 (Recommendations for the authors):

      It is difficult to distinguish "Getting fighted" and "Fighting partner" in Fig. 1b (esp. when printed). I thought Actor showed "Fighting partner" several times in Session 2, but it seems to be "Getting fighted" judging from Figs. 1c and 1d. Is this correct? If so, I would suggest to change the color to improve visibility.

      Thank you for your valuable comment. We apologize for the confusion in the previous version. To improve clarity, we have changed both terms to “begin fighting” and “being fought”. As shown in Figure 1b, we now explicitly define the identities of the two monkeys as the actor (K) and the partner (L), with all behaviors described from the perspective of the actor. For example, when the actor (K) initiates the fight, it is marked as “begin fighting”, whereas when the partner (L) initiates the fight, the actor (K) is the recipient and labeled as “being fought”. Additionally, we have implemented your suggestion by changing the colors to enhance visibility, especially for the terms “begin fighting” and “being fought”.

      Reviewer #2 (Recommendations for the authors): 

      I have some minor concerns:

      (1) Figure 1B, caption for x axis is missing, 4 means 4 days?

      Thank you so much for the comment. We have clarified the x-axis in Figure 1B, where the label "4" corresponds to 4 hours of video typing on each experimental day. The revised figure now includes the appropriate label for better clarity. We appreciate your careful attention to this detail.

      (2) I am slightly concerned about animal safety. How do the experimenters ensure the animals' safety and well-being in cases of aggressive interactions or attacks?

      Thank you for your comment. We share your concern regarding animal safety and take the well-being of the monkeys in the study seriously. All experimental procedures were reviewed and approved by the Institutional Animal Care and Use Committee at the Institute of Biophysics, Chinese Academy of Sciences (IBP-NHP-002(22)). The monkeys were housed together in the same colony room for over four years, in interconnected cages that allowed for direct physical interaction. Animal behaviors in cages were closely monitored via a live video system to ensure their safety. To prevent potential injuries, a sliding partition system was in place, enabling the isolation of individual animals when necessary, minimizing risks to their well-being.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study shows a new mechanism of GS regulation in the archaean Methanosarcina mazei and clarifies the direct activation of GS activity by 2-oxoglutarate, thus featuring another way in which 2-oxoglutarate acts as a central status reporter of C/N sensing.

      Mass photometry and single particle cryoEM structure analysis convincingly show the direct regulation of GS activity by 2-OG promoted formation of the dodecameric structure of GS. The previously recognized small proteins GlnK1 and Sp26 seem to play a subordinate role in GS regulation, which is in good agreement with previous data. Although these data are quite clear now, there remains one major open question: how does 2-OG further increase GS activity once the full dodecameric state is achieved (at 5 mM)? This point needs to be reconsidered.

      Weaknesses:

      It is not entirely clear, how very high 2-OG concentrations activate GS beyond dodecamer formation.

      The data presented in this work are in stark contrast to the previously reported structure of M. mazei GS by the Schumacher lab. This is very confusing for the scientific community and requires clarification. The discussion should consider possible reasons for the contradictory results.

      Importantly, it is puzzling how Schumacher could achieve an apo-structure of dodecameric GS. If 2-OG is necessary for dodecameric formation, this should be discussed. If GlnK1 doesn't form a complex with the dodecameric GS, how could such a complex be resolved there?

      In addition, the text is in principle clear but could be improved by professional editing. Most obviously there is insufficient comma placement.

      We thank Reviewer #1 for the professional evaluation and raising important points. We will address those comments in the updated manuscript and especially improve the discussion in respect to the two points of concern.

      (1) How can GlnA1 activity be stimulated further by increasing 2-OG after the dodecamer is already fully assembled at 5 mM 2-OG?

      We assume a two-step requirement for 2-OG: the dodecameric assembly and the priming of the active sites. The assembly step is based on cooperative effects of 2-OG and does not require the presence of 2-OG in all 2-OG-binding pockets: 2-OG-binding to one binding pocket also causes a domino effect of conformational changes in the adjacent 2-OG-unbound subunit, as also described for Methanothermococcus thermolithotrophicus GS in Müller et al. 2023. Due to the introduction of these conformational changes, the dodecameric form becomes more favourable even without all 2-OG binding sites being occupied. With higher 2-OG concentrations present (> 5 mM), the activity increased further until finally all 2-OG-binding pockets were occupied, resulting in the priming of all active sites (all subunits) and thereby reaching the maximal activity.

      (2) The contradictory results with previously published data on the structure of M. mazei by Schumacher et al. 2023.

      We certainly agree that it is confusing that Schumacher et al. 2023 obtained a dodecameric structure without the addition of 2-OG, which we claim to be essential for the dodecameric form. 2-OG is a cellular metabolite that is naturally present in E. coli, the heterologous expression host both groups used. Since our main question focused on analysing the 2-OG effect on GS, we have performed thorough dialysis of the purified protein to remove all 2-OG before performing MP experiments. In the absence of 2-OG we never observed significant enzyme activity and always detected a fast disassembly after incubation on ice. We thus assume that a dodecamer without 2-OG in Schumacher et al. 2023 is an inactive oligomer of a once 2-OG-bound form, stabilized e.g. by the presence of 5 mM MgCl2.

      The GlnA1-GlnK1-structure (crystallography) by Schumacher et al. 2023 is in stark contrast to our findings that GlnK1 and GlnA1 do not interact as shown by mass photometry with purified proteins. A possible reason for this discrepancy might be that at the high protein concentrations used in the crystallization assay, complexes are formed based on hydrophobic or ionic protein interactions, which would not form under physiological concentrations.

      Reviewer #2 (Public Review):

      Summary:

      Herdering et al. introduced research on an archaeal glutamine synthetase (GS) from Methanosarcina mazei, which exhibits sensitivity to the environmental presence of 2-oxoglutarate (2-OG). While previous studies have indicated 2-OG's ability to enhance GS activity, the precise underlying mechanism remains unclear. Initially, the authors utilized biophysical characterization, primarily employing a nanomolar-scale detection method called mass photometry, to explore the molecular assembly of Methanosarcina mazei GS (M. mazei GS) in the absence or presence of 2-OG. Similar to other GS enzymes, the target M. mazei GS forms a stable dodecamer, with two hexameric rings stacked in tail-to-tail interactions. Despite approximately 40% of M. mazei GS existing as monomeric or dimeric entities in the detectable solution, the majority spontaneously assemble into a dodecameric state. Upon mixing 2-OG with M. mazei GS, the population of the dodecameric form increases proportionally with the concentration of 2-OG, indicating that 2-OG either promotes or stabilizes the assembly process. The cryo-electron microscopy (cryo-EM) structure reveals that 2-OG is positioned near the interface of two hexameric rings. At a resolution of 2.39 Å, the cryo-EM map vividly illustrates 2-OG forming hydrogen bonds with two individual GS subunits as well as with solvent water molecules. Moreover, local side-chain reorientation and conformational changes of loops in response to 2-OG further delineate the 2-OG-stabilized assembly of M. mazei GS.

      Strengths & Weaknesses:

      The investigation studies the impact of 2-oxoglutarate (2-OG) on the assembly of Methanosarcina mazei glutamine synthetase (M. mazei GS). Utilizing cutting-edge mass photometry, the authors scrutinized the population dynamics of GS assembly in response to varying concentrations of 2-OG. Notably, the findings demonstrate a promising and straightforward correlation, revealing that dodecamer formation can be stimulated by 2-OG concentrations of up to 10 mM, although GS assembly never reaches 100% dodecamerization in this study. Furthermore, catalytic activities showed a remarkable enhancement, escalating from 0.0 U/mg to 7.8 U/mg with increasing concentrations of 2-OG, peaking at 12.5 mM. However, an intriguing gap arises between the incomplete dodecameric formation observed at 10 mM 2-OG, as revealed by mass photometry, and the continued increase in activity from 5 mM to 10 mM 2-OG for M. mazei GS. This prompts questions regarding the inability of M. mazei GS to achieve complete dodecamer formation and the underlying factors that further enhance GS activity within this concentration range of 2-OG.

      Moreover, the cryo-electron microscopy (cryo-EM) analysis provides additional support for the biophysical and biochemical characterization, elucidating the precise localization of 2-OG at the interface of two GS subunits within two hexameric rings. The observed correlation between GS assembly facilitated by 2-OG and its catalytic activity is substantiated by structural reorientations at the GS-GS interface, confirming the previously reported phenomenon of "funnel activation" in GS. However, the authors did not present the cryo-EM structure of M. mazei GS in complex with ATP and glutamate in the presence of 2-OG, which could have shed light on the differences in glutamine biosynthesis between previously reported GS enzymes and the 2-OG-bound M. mazei GS.

      Furthermore, besides revealing the cryo-EM structure of 2-OG-bound GS, the study also observed the filamentous form of GS, suggesting that filament formation may be a universal stacking mechanism across archaeal and bacterial species. However, efforts to enhance resolution to investigate whether the stacked polymer is induced by 2-OG or other factors such as ions or metabolites were not undertaken by the authors, leaving room for further exploration into the mechanisms underlying filament formation in GS.

      We thank Reviewer #2 for the detailed assessment and valuable input. We will address those comments in the updated manuscript and clarify the message.

      (1) The discrepancy between dodecamer formation (max. at 5 mM 2-OG) and enzyme activity (max. at 12.5 mM 2-OG). We assume that there are two effects caused by 2-OG: 1. cooperativity of binding (less 2-OG needed to facilitate dodecamer formation) and 2. priming of each active site (see also our response to Reviewer #1, point (1)). We assume this is the reason why the activity of dodecameric GlnA1 can be further enhanced by increased 2-OG concentration until all catalytic sites are primed.

      (2) The lack of the structure of a 2-OG and ATP-bound GlnA1. Although we strongly agree that this would be a highly interesting structure, it seems out of the scope of a typical revision to request new cryo-EM structures. We evaluate the findings of our present study concerning the 2-OG effects as important insights into the strongly discussed field of glutamine synthetase regulation, even without the requested additional structures.

      (3) The observed GlnA1-filaments are an interesting finding. We certainly agree with the referee on that point, that the stacked polymers are potentially induced by 2-OG or ions. However, it is out of the main focus of this manuscript to further explore those filaments. Nevertheless, this observation could serve as an interesting starting point for future experiments.

      Reviewer #3 (Public Review):

      Summary:

      The current manuscript investigates the effect of 2-oxoglutarate and the GlnK1 protein as modulators of the enzymatic reactivity of glutamine synthetase. To do this, the authors rely on mass photometry, specific activity measurements, and single-particle cryo-EM data.

      From the results obtained, the authors convey that glutamine synthetase from Methanosarcina mazei exists in a non-active monomeric/dimeric form under low concentrations of 2-oxoglutarate, and its oligomerization into a dodecameric complex is triggered by higher concentration of 2-oxoglutarate, also resulting in the enhancement of the enzyme activity.

      Strengths:

      Glutamine synthetase is a crucial enzyme in all domains of life. The dodecameric fold of GS is recurrent amongst prokaryotic and archaeal organisms, while the enzyme activity can be regulated in distinct ways. This is a very interesting work combining protein biochemistry with structural biology.

      The role of 2-OG is here highlighted as a crucial effector for enzyme oligomerization and full reactivity.

      Weaknesses:

      Various opportunities to enhance the current state-of-the-art were missed. In particular, omissions of the ligand-bound state of GlnK1 leave unexplained the lack of its interaction with GS (in contradiction with previous results from the authors). A finer dissection of the effect and role of 2-oxoglutarate is missing, and important questions remain unanswered (e.g. are dimers relevant during early stages of the interaction, or why do previous GS dodecameric structures not show 2-oxoglutarate).

      We thank Reviewer #3 for the expert evaluation and inspiring criticism.

      (1) Encouragement to examine ligand-bound states of GlnK1. We agree and plan to perform the suggested experiments exploring the conditions under which GlnA1 and GlnK1 might interact. We will perform the MP experiments in the presence of ATP. However, in the GlnA1 activity assays evaluating the effect of GlnK1, ATP was always present at high concentrations, and we still did not observe a significant effect of GlnK1 on GlnA1 activity.

      (2) The exact role of 2-OG could have been dissected much better. We agree on that point and will improve the clarity of the manuscript. See also Reviewer #1 R.1.

      (3) The lack of studies on dimers. This is an interesting point, which we did not consider while writing the manuscript. Having now re-analysed all our MP data in this respect, we find that the smallest GlnA1 species is likely a dimer. Consequently, we will add more supplementary data supporting this observation and change the text accordingly.

      (4) Previous studies and structures did not show the 2-OG. We assume that for other structures, no additional 2-OG was added, and the groups did not specifically analyse for this metabolite either. All methanoarchaea perform methanogenesis and contain the oxidative part of the TCA cycle exclusively for the generation of glutamate (anabolism), but not a closed TCA cycle, which enables them to use the internal 2-OG concentration as an internal signal for nitrogen availability. In bacterial GS from organisms with a closed TCA cycle used for energy metabolism (oxidation of acetyl-CoA), such as E. coli, the formation of an active dodecameric GS form follows another mechanism that is independent of 2-OG. In the case of the recent M. mazei GS structures published by Schumacher et al. 2023, the dodecameric structure is probably a result of the heterologous expression in and purification from E. coli (see also Reviewer #1, R.2). One example of a methanoarchaeal glutamine synthetase structure that does in fact contain 2-OG is Müller et al. 2023.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific issues:

      L 141: 2-OG levels increase due to slowing GOGAT reaction (due to Gln limitation as a consequence of N-starvation).... (2-OG also increases in bacteria that lack GDH...)

      As the GS-GOGAT cycle is the major route of ammonium assimilation, consumption of 2-OG by GDH is probably only relevant under high ammonium concentrations.

      In methanoarchaea, GS is strictly regulated and its expression is strongly repressed under nitrogen sufficiency. Glutamate for anabolism is therefore mainly generated by GDH under N sufficiency, consuming 2-OG delivered by the oxidative part of the TCA cycle (methanogenesis is the energy metabolism in methanoarchaea; a closed TCA cycle is not present). Consequently, 2-OG increases under nitrogen limitation, when no NH3 is available for GDH.

      L148: it is not clear what is meant by: "and due to the indirect GS activity assay"

      We apologize for not being clear here. The GS activity assay used is the classical assay by Shapiro & Stadtman 1970, a coupled optical assay that couples the ATP consumption of GS to the oxidation of NADH by lactate dehydrogenase. Because the assay is coupled, measurements of low activities show a high deviation. We have now added this information to the revised manuscript.

      L: 177: arguing about 2-OG affinities: more precisely, the 0.75 mM 2-OG is the EC50 concentration of 2-OG for triggering dodecameric formation; it might not directly reflect the total 2-OG affinity, since the affinity may be modulated by (anti)cooperative effects, or by additional sites... as there may be different 2-OG binding sites involved... (same in line 201)

      Thank you for the valuable input. We changed KD to EC50 within the entire manuscript. Concerning possible additional 2-OG binding sites: we did not see any other 2-OG in the cryo-EM structure aside from the described one and we therefore assume that the one described in the manuscript is the main and only one. Considering the high amounts of 2-OG (12.5 mM) used in the structure, it is quite unlikely that additional 2-OG sites exist since they would have unphysiologically low affinities.

      In this respect, instead of the rather poor assay shown in Figure 1D, a more detailed determination of catalytic activation by different 2-OG concentrations should be done (similar to 1A)... This would allow a direct comparison between dodecamerization and enzymatic activation.

      We agree and performed the respective experiments, which are now presented in revised Fig. 1D.

      Discussion: the role of 2-OG as a direct activator, comparison with other prokaryotic GS: in other cases, 2-OG affects GS indirectly by being sensed by PII proteins or other 2-OG sensing mechanisms (like 2OG-NtcA-mediated repression of IF factors in cyanobacteria)

      We agree and have added that information in the discussion as suggested.

      290. Unclear: As a second step of activation, the allosteric binding of 2-OG causes a series of conformational.... where is this site located? According to the catalytic effects (compare 1A and 1D) this site should have a lower affinity …

      Thank you very much for pointing this out. Binding of 2-OG occurs at only one specific allosteric binding site. Binding, however, has two effects on GlnA1: dodecamer assembly and priming of the active site (with two specific EC50 values, which are now shown in Fig. 1A and D).

      See also public comment #1 (1).

      Reviewer #2 (Recommendations For The Authors):

      The primary concern for me is that mass photometry might lead to incorrect conclusions. The differences in the forms of GS seen in SEC and MP suggest that GS can indeed form a stable dodecamer when the concentration of GS is high enough, as shown in Figure S1B. I strongly suggest using an additional biophysical method to explore the connection between GS and 2-OG in terms of both assembly and activity, to truly understand 2-OG's role in the process of assembly and catalysis.

      We apologize if we did not present this clearly enough. The MP analysis of GlnA1 in the absence of 2-OG always showed (monomers/)dimers; dodecamers were present only in the presence of 2-OG. The SEC analysis in Fig. S1B was performed in the presence of 12.5 mM 2-OG. We realized that this information was missing from the figure legend and have now added it in the revised version. In addition, the 2-OG is visible in the cryo-EM structure. We therefore do not consider additional biophysical methods necessary.

      As for the other experimental findings, they appear satisfactory to me, and I have no reservations regarding the cryoEM data.

      (1) Mass photometry is a fancy technique that uses only a tiny amount of protein to study how they come together. However, the concentration of the protein used in the experiment might be lower than what's needed for them to stick together properly. So, the authors saw a lot of single proteins or pairs instead of bigger groups. They showed in Figure S1B that the M. mazei GS came out earlier than a 440-kDa reference protein, indicating it's actually a dodecamer. But when they looked at the dodecamer fraction using mass photometry, they found smaller bits, suggesting the GS was breaking apart because the concentration used was too low. To fix this, they could try using a technique called analytic ultracentrifuge (AUC) with different amounts of 2-OG to see if they can spot single proteins or pairs when they use a bit more GS. They could also try another technique called SEC-MALS to do similar tests. If they do this, they could replace Figure 1A with new data showing fully formed GS dodecamers when they use the right amount of 2-OG.

      Thank you for this input. In MP we looked at dodecamer formation after removing the 2-OG entirely and re-adding it in the respective concentration. We think that GlnA1 is much more unstable in its monomeric/dimeric fraction and that the complete and harsh removal of 2-OG results in some dysfunctional protein which does not recover the dodecameric conformation after dialysis and re-addition of 2-OG. Looking at the dodecamer-peak right after SEC however, we exclusively see dodecamers, which is now included as an additional supplementary figure (suppl. Fig. 1C). Consequently, we did not perform additional experiments.

      (2) Building on the last point, the estimated binding strength (Kd) between 2-OG and GS might be lower than it really is, because the GS often breaks apart from its dodecameric form in this experiment, even though 2-OG helps keep the pairs together, as seen with cryoEM. What if they used 5-10 times more GS in the mass photometry experiment? Would the estimated bond strength stay the same? Could they use AUC or other techniques like ITC to find out the real, not just estimated, strength of the bond?

      We agree that the term KD is not suitable. We have changed the term KD to EC50 as suggested by reviewer #1, which describes the effective concentration required for 50 % dodecamer assembly. However, we disagree that the dodecamer breaks apart at concentrations as low as those used in MP experiments. The actual reason for the breakage is rather the harsh dialysis used to remove all 2-OG before the MP experiments. Right after SEC, we exclusively see dodecamers in MP (suppl. Fig. S1C). See also #2 (1).
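
      To make the EC50 definition above concrete, the following is a minimal sketch of a Hill-type dose-response curve and a brute-force EC50 estimate. The Hill form, the Hill coefficient, the grid search, and all function names are illustrative assumptions and not the authors' fitting procedure; the data points are synthetic and not taken from the study. Only the 0.75 mM value is borrowed from the review text as a plausible EC50.

```python
def hill_fraction(conc_mm, ec50_mm, hill_n=1.0, top=1.0):
    """Response at a given effector concentration (assumed Hill model).

    top is the plateau, e.g. the maximal dodecamer fraction reached.
    """
    if conc_mm <= 0:
        return 0.0
    return top * conc_mm**hill_n / (ec50_mm**hill_n + conc_mm**hill_n)


def fit_ec50(concs_mm, responses, hill_n=1.0, top=1.0):
    """Least-squares EC50 estimate by scanning a log-spaced grid."""
    best_ec50, best_sse = None, float("inf")
    for i in range(401):
        candidate = 10 ** (-2 + i * 0.01)  # 0.01 mM to 100 mM
        sse = sum(
            (hill_fraction(c, candidate, hill_n, top) - r) ** 2
            for c, r in zip(concs_mm, responses)
        )
        if sse < best_sse:
            best_ec50, best_sse = candidate, sse
    return best_ec50


# Synthetic, noise-free illustration (NOT data from the study).
concs = [0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 12.5]
synthetic = [hill_fraction(c, ec50_mm=0.75, hill_n=2.0) for c in concs]
estimate = fit_ec50(concs, synthetic, hill_n=2.0)
```

      By construction, the response at the EC50 is half of the plateau, whatever the plateau is. This is why an EC50 can be reported even when assembly does not reach 100 % dodecamers, whereas a KD would require a defined binding model.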

      (3) The fact that the GS hardly works without 2-OG is interesting. I tried to understand the experiment setup, but it wasn't clear as the protocol mentioned in the author's 2021 FEBS paper referred to an old paper from 1970. The "coupled optical test assay" they talked about wasn't explained well. I found other papers that used phosphometry assays to see how much ATP was used up. I suggest the authors give a better, more detailed explanation of their experiments in the methods section. Also, it's unclear why the GS activity keeps going up from 5 to 12.5 mM 2-OG, even though they said it's saturated. They suggested there might be another change happening from 5 to 12.5 mM 2-OG. If that's the case, they should try to get a cryo-EM picture of the GS with lots of 2-OG, both with and without ATP/glutamate (or the Met-Sox-P-ADP inhibitor), to see what's happening at a structural level during this change caused by 2-OG.

      We agree with the reviewer that the GS assay was not explained in detail (since it has been published and known for several years). We have now added a more detailed description of the assay to the revised manuscript. The assay also measures the ATP used up by GS, but couples the generation of ADP to an optical readout: pyruvate kinase present in the assay uses the generated ADP to produce pyruvate from PEP. This pyruvate is finally reduced to lactate by the lactate dehydrogenase present in the assay, consuming NADH, the oxidation of which is monitored at 340 nm.
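
      The readout described above can be converted into a specific activity as in the following minimal sketch. It assumes a 1:1:1 stoichiometry between ATP consumed, pyruvate formed, and NADH oxidized, the standard NADH extinction coefficient of 6.22 mM^-1 cm^-1 at 340 nm, and the usual definition of 1 U as 1 µmol substrate converted per minute. The function name and the example numbers are hypothetical and not taken from the manuscript.

```python
NADH_EXTINCTION_MM_CM = 6.22  # mM^-1 cm^-1 at 340 nm (standard value)


def specific_activity_u_per_mg(delta_a340_per_min, assay_volume_ml,
                               protein_mg, path_length_cm=1.0):
    """Specific activity (U/mg) from the rate of A340 decrease.

    Assumes one NADH is oxidized per ATP hydrolysed by GS.
    """
    if protein_mg <= 0 or path_length_cm <= 0:
        raise ValueError("protein_mg and path_length_cm must be positive")
    # Beer-Lambert: concentration change of NADH in mM per minute.
    nadh_mm_per_min = abs(delta_a340_per_min) / (
        NADH_EXTINCTION_MM_CM * path_length_cm
    )
    # mM multiplied by mL gives micromoles.
    umol_per_min = nadh_mm_per_min * assay_volume_ml
    return umol_per_min / protein_mg


# Hypothetical example: A340 falls by 0.622 per minute in a 1 mL assay
# containing 0.01 mg of enzyme.
example = specific_activity_u_per_mg(-0.622, 1.0, 0.01)
```

      Because the signal is a small difference in absorbance per unit time, low activities translate into slopes close to the baseline drift, which is consistent with the high deviation reported for low-activity measurements.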

      The continued increase in GS activity (max. at 12.5 mM 2-OG) after dodecamer formation is complete (max. at 5 mM 2-OG): see also the public reviews. We assume that 2-OG has two effects: (i) cooperativity of binding (less 2-OG is needed to facilitate dodecamer formation) and (ii) priming of each active site.

      The suggested additional experiments with and without ATP/Glutamate: Although we strongly agree that this would be a highly interesting structure, it seems out of the scope of a typical revision to request new cryo-EM structures. We evaluate the findings of our present study concerning the 2-OG effects as important insights into the strongly discussed field of glutamine synthetase regulation, even without the requested additional structures.

      (4) Please remake Figure S2, the panels are too small to read the words. At least I have difficulty doing so.

      We assume the reviewer is referring to Suppl. Fig. S3; we have now changed this figure accordingly.

      Line 153, the reference Schumacher et al. 23, should be 2023?

      Yes, thank you. We corrected that.

      Line 497. I believe it's UCSF ChimeraX, not Chimera.

      We apologize and have corrected this accordingly.

      Reviewer #3 (Recommendations For The Authors):

      Recent studies on the Methanothermococcus thermolithotrophicus glutamine synthetase, published by Müller et al., 2024, have identified the binding site for 2-oxoglutarate as well as the conformational changes that were induced in the protein by its presence. In the present study, the authors confirm these observations and additionally establish a link between the presence of 2-oxoglutarate and the dodecameric fold and full activation of GS.

      Curiously, here, the authors could not confirm their own findings that the dodecameric GS can directly interact with the PII-like GlnK1 protein and the small peptide sP26. However, the lack of mention of the GlnK-bound state in these studies is very alarming since it certainly is highly relevant here.

      We agree with the reviewer that we did not observe the interaction with GlnK1 and sP26 in the present study. Consequently, we speculate that as yet unknown cellular factor(s) might be required for an interaction of GlnA1 with GlnK1 and sP26. Such factors were not present in the in vitro experiments using purified proteins, but they were present in the previous pull-down approaches (Ehlers et al. 2005, Gutt et al. 2021). Another reason might be that post-translational modifications occur in M. mazei that are important for the interaction and are likewise absent from purified proteins expressed in E. coli.

      The interest of the manuscript could have been substantially increased if the authors had done finer biochemical and enzymatic analyses on the oligomerization process of GS, used GlnK1 bound to known effectors in their assays, and made more effort to extrapolate their findings (even if a small niche) to related glutamine synthetases.

      We thank the reviewer for their valuable encouragement to explore ligand-bound-states of GlnK1. However, in this manuscript we mainly focused on 2-OG as activator of GlnA1 and decided to dedicate future experiments to the exploration of conditions that possibly favor GlnK1-binding.

      In principle, we have explored the ATP bound GlnK1 effects on GlnA1 activity in the activity assays (Fig. 2E) since ATP (3.6 mM) is present. GlnK1 however showed no effects on GlnA1 activity.

      In general, the manuscript is poorly written, with grammatically incorrect sentences that at times stand in the way of conveying the message of the manuscript.

      Particular points:

      (1) It is mentioned that 2-OG induces the active oligomeric (dodecamer, 12-mer) state of GlnA1 without detectable intermediates. However, only 62 % of the starting inactive enzyme yields active 12-mers. Note that this is contradicted in line 212.

      Thanks for pointing out this discrepancy. After removing all 2-OG as we did before MP-experiments, GlnA1 doesn’t reach full dodecamers anymore when 2-OG is re-added. This is not because the 2-OG amount is not enough to trigger full assembly, but because the protein is much more unstable in the absence of 2-OG, so we predict that some GlnA1 breaks during dialysis. See also answer reviewer #2 (1) and supplementary figure S1C.

      Is there any protein precipitation upon the addition of 2-OG? Is all protein being detected in the assay, meaning, is monomer/dimer + dodecamer yields close to 100% of the total enzyme in the assay?

      There is no protein precipitation upon the addition of 2-OG; indeed, GlnA1 is much more stable in the presence of 2-OG. In the mass photometry experiments, all particles are measured, and precipitated protein would be visible as large entities in the MP.

      Please add to Figure 1 the amount of monomer/dimer during titration. Some debate why there is no full conversion should be tentatively provided.

      We agree with the reviewer and have included the amount of monomer/dimer in the figure, as well as some discussion of why the protein is not fully converted again. GlnA1 is unstable without 2-OG, and it was dialysed against buffer without 2-OG before the MP measurements. This sample treatment prevented full re-assembly after re-adding 2-OG, although the sample consisted entirely of dodecamers before dialysis (suppl. Fig. S1C).

      (2) Figure 1B reflects an exemplary result. Here, the addition of 0.1 mM 2-OG seems to promote monomer to dimer transition. Why was this not studied in further detail? It seems highly relevant to know from which species the dodecamer is assembled.

      We thank the reviewer for their comment. However, we would like to point out that, although not shown in the figure, the smallest GlnA1 entity is always mainly a dimer. As suggested earlier, we have added the amount of monomers/dimers to Figure 1A, which shows low monomer counts at all 2-OG concentrations (Fig. 1A). Although this is not depicted in the graph, which starts at 0.01 mM 2-OG, we also see mainly dimers at 0 mM 2-OG.

      How does the y-axis compare to the number and percentage of counts assigned to the peaks? In line 713, it is written that the percentage of dodecamer considers the total number of counts, and this was plotted against the 2-OG concentration.

      We thank the reviewer for addressing this lack of clarity. Line 713 corresponds to Figure 1A, where we indeed plotted the percentage of dodecamer against the 2-OG concentration. The percentage of dodecamer corresponds to the percentage calculated from the Gaussian fit of the MP dodecamer peak. In Figure 1B, however, the y-axis displays the relative number of counts per mass; multiple similar masses then add up to the percentage of the respective peak (Gaussian fit over similar masses).

      (3) Lines 714 and 721 (and elsewhere): Why only partial data is used for statistical purposes?

      In general, we show only one exemplary biological replicate, since the quality of the respective GlnA1 purification sometimes varied (maximum activity ranging from 5 to 10 U/mg). Therefore, we compared activities only within the same protein purification. For the EC50 calculations of all measurements, we refer to the supplement.

      (4) Lines 192-193: It is claimed that GlnK1 was previously shown to both regulate the activity of GlnA1 and form a complex with GlnA1. Please mention the ratio between GlnK1 and GlnA1 in this complex.

      We have now included the requested information (GlnA1:GlnK1 1:1, Ehlers et al. 2005; His6-GlnA1 (0.95 μM) and His6-GlnK1 (0.65 μM), i.e. 2:1.4, Gutt et al. 2021).

      It is also known that PII proteins such as GlnK1 can bind ADP, ATP, and 2-OG. Interestingly, however, for various described PII proteins, 2-OG can only bind after the binding of ATP.

      So, the crucial question here is what is the binding state of GlnK1? 

      Were these assays performed in the absence of ATP? This is key to fully understand and connect the results to the previous observations. For example, if the GlnK1 used was bound to ADP but not to ATP, then the added 2-OG might indeed only be able to affect GlnA1 (leading to its activation/oligomerization). If this were true and according to the data reported, ADP would prevent GlnK1 from interacting with any oligomeric form of GlnA1. However, if GlnK1 bound to ATP is the form that interacts with GlnA1 (potentially validating previous results?) then, 2-OG would first bind to GlnK1 (assuming a higher affinity of 2-OG to GlnK1), eventually causing its release from GlnA1 followed by binding and activation of GlnA1.

      These experiments need to be done as they are essential to further understand the process. Given the ability of the authors to produce the protein and run such assays, it is unclear why they were not done here. As written in line 203, in this case, "under the conditions tested" is not a good enough statement, considering what is known in the field and how many more conclusions could easily be taken from such a setup.

      Thanks for the encouragement to investigate the ligand-bound states of GlnK1. We agree and plan to perform the suggested mass photometry experiments exploring the conditions under which GlnA1 and GlnK1 might interact in future work. However, in the GlnA1 activity assays evaluating the effect of GlnK1, ATP was always present at high concentrations, and we still did not observe a significant effect of GlnK1 on GlnA1 activity.

      (5) Figure 2D legend claims that the graphic shows the percentage of dodecameric GlnA1 as a function of the concentration of 2-OG. This is not what the figure shows; Figure 2D shows the dodecamer/dimer (although legend claims monomer was used, in line 732) ratio as a function of 2-OG (stated in line 736!). If this is true, a ratio of 1 means 50 % of dodecamers and dimers co-exist. This appears to be the case when GlnK1 was added, while in the absence of GlnK1 higher ratios are shown for higher 2-OG concentration implying that about 3 times more dodecamers were formed than dimers. However, wouldn't a 50 % ratio be physiologically significant?

      We apologize for the partially incorrect and misleading figure legend and have corrected it. Indeed, the ratio of dodecamers to dimers is shown. Furthermore, we did not use monomeric GlnA1 (the smallest entity is mainly a dimer, see Fig. 1A); however, the molarity was calculated based on the monomer mass. Concerning the significance of the difference in the maximum ratio with and without GlnK1: the ratio does appear higher in the absence of GlnK1, but this is mostly because adding large quantities of GlnK1 broadens all peaks at low molecular weight. This happens because the GlnK1 signal starts overlapping with the signal from GlnA1, leading to inflated GlnA1 dimer counts. We therefore do not think that this is biologically significant, especially as the activities do not differ under these conditions.
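
      The relationship between a dodecamer/dimer ratio and a percentage can be sketched as follows. The sketch assumes that the plotted ratio is a ratio of particle counts, which is an assumption about the figure and not something the text confirms, and the function names are hypothetical. It shows that the same ratio corresponds to different percentages depending on whether particles or subunits are counted.

```python
def particle_fraction_dodecamer(ratio):
    """Fraction of particles that are dodecamers, given dodecamer/dimer counts."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    return ratio / (1.0 + ratio)


def subunit_fraction_dodecamer(ratio, subunits_large=12, subunits_small=2):
    """Fraction of GlnA1 subunits residing in dodecamers for the same ratio."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    large = subunits_large * ratio
    return large / (large + subunits_small)


# A count ratio of 1 means half of the particles are dodecamers,
# but 12 of every 14 subunits sit in a dodecamer.
by_particles = particle_fraction_dodecamer(1.0)
by_subunits = subunit_fraction_dodecamer(1.0)
```

      Under this assumption, a ratio of 1 already corresponds to most of the protein mass being assembled, so the difference between a ratio of 1 and a ratio of 3 is smaller in terms of subunits than in terms of particle counts.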

      (6) Is it possible that the uncleaved GlnA1 tag is preventing interaction with GlnK1? This should be discussed.

      This is of course a very important point. We however realized that Schumacher et al. also used an N-terminal His-tag, so we assume that the N-terminal tag is not hampering the interaction.

      (7) Line 228: Please detail the reported discrepancies in rmsd between the current protein and the gram-negative enzymes.

      The differences in rmsd between our M. mazei GlnA1 structure and the structures of gram-negative enzymes are caused by a) sequence similarity: e.g. M. mazei GlnA1 and B. subtilis GlnA have a sequence identity of 58.47 %; b) ligands in the structure: the B. subtilis structure contains L-methionine-S-sulfoximine phosphate, a transition state inhibitor, while the M. mazei structure contains 2-OG; c) methodology: the structure determination methods also contribute to these differences. B. subtilis GlnA was determined using X-ray crystallography, while the M. mazei GlnA1 structure was resolved using cryo-EM, where the protein behaves differently in ice compared to a crystal.

      (8) Line 747: The figure title claims "dimeric interface" although the manuscript body only refers to "hexameric interface" or "inter-hexamer interface" (line 224). Moreover, the figure 4 legend uses terms such as vertical and horizontal dimers and this too should be uniformized within the manuscript.

      Thank you for your valuable feedback. We have updated the figure title, the figure legend, and the main text to ensure consistency in the description.

      (9) Line 752: The description of the color scheme used here is somehow unclear.

      Thanks for pointing this out. We changed the description to make it more comprehensive.

      (10) Please label H14/15 and H14'/H15' in Fig 4C zoom.

      We agree that this has not been very clear. We added helix labels.

      (11) In Figure 4D legend, make sure to note that the binding sites for the substrate are based on homologies with another enzyme poised with these molecules.

      The same should be clear in the text: sites are not known, they are assumed to be, based on homologies (paragraph starting at line 239).

      Concerning this comment we want to point out that we studied the exact same enzyme as the Schumacher group, except that we used 2-OG in our experiments, which they did not.

      (12) Figure 3 appears redundant in light of Figure 4. 

      (13) Line 235: When mentioning F24, please refer to Figure 5.

      Thank you, we changed that accordingly.

      (14) Please provide the distances for the bonds depicted in Figure 4B.

      Thanks for pointing this out. We added distance labels to Figure 4B; for clarity, only three H-bonds are labeled.

      (15) Line 241: D57 is likely serving to abstract a proton from ammonium, what is residue Glu307 potentially doing? The information seems missing in light of how the sentence is built.

      Thanks for pointing this out. According to previous studies, both residues are likely involved in proton abstraction, first from ammonium and then from the formed gamma-ammonium group. Additionally, they contribute to shielding the active site from bulk solvent to prevent hydrolysis of the formed phospho-glutamate.

      (16) Why do the authors assume that increased concentrations of 2-OG are a signal for N starvation only in M. mazei and not in all prokaryotic equivalent systems (line 288)?

      In line 288, we did not claim that this is a unique signal for M. mazei. It is also the central N-starvation signal in Cyanobacteria but not directly perceived by the cyanobacterial GS through binding directly to GS.

      The authors should look into the residues that bind 2-OG and check if they are conserved in other GS. The results of this sequence analysis should be discussed in line with the variable prokaryotic glutamine synthetase types of activity modulation that were exposed in the introduction and Figure 7.

      Please refer to supplementary figure S5, where we already aligned the mentioned glutamine synthetase sequences. Since this was also already discussed in Müller et al. 2024, we did not want to repeat their observations and refer to our supplementary figure in too much detail.

      (17) Figure 5 title: Replace TS by transition state structures of homology enzymes, or alike.

      Thank you for this suggestion. We did not change the title however, since it is not a homologue but the exact same glutamine synthetase from Methanosarcina mazei.

      (18) Line 249: D170 is not shown in Figure 5A or elsewhere in Figure 5.

      Thank you for pointing this out. We added D170 to figure 5A.

      (19) Representative density for the residues binding 2-OG should be provided, maybe in a supplemental figure.

      Thank you for the suggestion. We added the densities of the 2-OG-binding residues to Figure 4B.

      (20) Line 260: Please add a reference when describing the phosphoryl transfer.

      We thank the reviewer for this important point and added that accordingly.

      (21) Line 296: The binding of 2-OG indeed appears to be cooperative, such that at concentrations above its binding affinity to the protein, only dodecamers are seen (under experimental conditions). However, claiming that the oligomerization is fast is not correct when the experimental setup includes 10 minutes of incubation before measurements are done. Please correct this within the entire manuscript.

      A (fast) continuous kinetic assay could have confirmed this point and revealed the oligomerization steps and the intermediaries in the process (maybe monomer/dimers, then dimers/hexamers, and then hexamers/dodecamers). Such assays would have been highly valuable to this study.

      We thank the reviewer for this suggestion, but disagree. It is indeed a rather fast regulation (activity assays without pre-incubation take only 1 min longer to reach full activity, see the newly included suppl. Fig. S6). Compared with other regulatory mechanisms such as transcriptional or translational regulation, an activation that takes only 60 s is actually quite quick.

      (22) Line 305 (and elsewhere in the manuscript): the authors state that 2-OG primes the active site for a transition state. This appears incorrect. The transition state is the highest energy state in an enzymatic reaction progressing from substrate to product. Meaning, the transition state is a state that has a more or less modified form of the original substrate bound to the active site. This is not the case.

      In line 366 an "active open state" appears much more adequate to use. 

      We agree and changed accordingly throughout the manuscript.

      (23) Line 330: Please delete "found". Eventually replace it with "confirmed": As the authors write, others have described this residue as a ligand to glutamine.

      Thanks, we changed that accordingly, although previous descriptions were just based on homologies without the experimental validation.

      (24) The discussion at various points summarizes the results again. It should be trimmed and improved.

      (25) Line 381: replace "two fast" with "fast"?

      We thank the reviewer for this suggestion, but disagree on this point. We specifically wanted to highlight that two central nitrogen metabolites are involved in the direct regulation of GlnA1, meaning that there are TWO fast direct processes, mediated by 2-OG and glutamine.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This manuscript describes an important study of the giant virus Jyvaskylavirus. The characterisation presented is solid, although, in the current form, it is not clear to what extent these findings change our perception of how giant viruses, especially those isolated from a cold environment, function. The work will be of interest to virologists working on giant viruses as well as those working with other members of the PRD1/Adenoviridae lineage.

      Thank you for the revision and positive comments. We decided to submit a revised version of the manuscript with changes made in light of the comments from the editorial team and the reviewers. We hope that the manuscript is now in better shape and addresses all comments received. The major changes were:

      - We changed the author order in response to reviewer 2's comments (point 11). Note that no author was added or removed; we only rearranged the order of authorship.

      - We included a new supplementary table with the Jyvaskylavirus genome annotation. This is now supplementary table 2.

      - We included a supplementary figure 9 to support our changes based on reviewer 2 comments (point 6).

      - Figures 2, 5, 6 and 7 and Supplementary Figure 2 were updated to accommodate our answers to different reviewer comments.

      - Three new references were added to support some of our changes.

      Below you will find our responses to each specific point raised by the reviewers.

      Public Reviews:

      Reviewer #1 (Public review):

      This study presents Jyvaskylavirus, a new member of the Marseilleviridae family, infecting Acanthamoeba castellanii. The study provides a detailed and comprehensive genomic and structural analysis of Jyvaskylavirus. The authors identified ORF142 as the capsid penton protein and additional structural proteins that comprise the virion. Using a combination of imaging techniques the authors provide new insights into the giant virus architecture and lifecycle. The study could be improved by providing atomic coordinates and refinement statistics, comparisons with available giant virus structures could be expanded, and the novelty in terms of the first isolated example of a giant virus from Finland could be expounded upon.

      The study contributes new structural and genomic diversity to the Marseilleviridae family, hinting at a broader distribution and ecological significance of giant viruses than previously thought.

      Thank you for your constructive comments. We have addressed each point raised in our rebuttal letter and revised the manuscript accordingly. By following your specific comments, we improved the manuscript regarding atomic coordinates, refinement statistics and novelty of finding a Finnish marseillevirus. Details are provided in the specific answers to your points.

      Reviewer #2 (Public review):

      Summary:

      This paper describes the molecular characterisation of a new isolate of the giant virus Jyvaskylavirus, a member of the Marseilleviridae family infecting Acanthamoeba castellanii. The isolate comes from a boreal environment in Finland, showcasing that giant viruses can thrive in this ecological niche. The authors came up with a non-trivial isolation procedure that can be applied to characterise other members of the family and will be beneficial for the virology field. The genome shows typical Marseilleviridae features and phylogenetically belongs to their clade B. The structural characterisation was performed on the level of isolated virion morphology by negative stain EM, virions associated with cells either during the attachment or release by helium microscopy, the visualisation of the virus assembly inside cells using stained thin sections, and lastly on the protein secondary structure level by reconstructing ~6 A icosahedral map of the massive virion using cryoEM. The cryoEM density combined with gene product structure prediction enabled the identification and functional assessment of various virion proteins.

      Strengths:

      The detailed description of the virus isolation protocol is the largest strength of the paper and this reviewer believes it can be modified for isolating various viruses infecting small eukaryotes. The cryoEM map allows us to understand how exceptionally large virions of these viruses are stabilised by minor capsid proteins and nicely demonstrates the integration of medium-resolution cryoEM with protein structure prediction in deciphering virion protein function. The visualisation of ongoing virus assembly inside virus factories brings interesting hypotheses about the process, which, however, need to be verified in future studies.

      Weaknesses:

      The conclusions from helium microscopy images are overinterpreted, as the native membrane structure cannot be preserved in a fixed and dehydrated sample. In the image, there are many other parts of the curved membrane and a lot of virions; to me, it seems the specific position of the highlighted virion could have arisen by chance. The claim that the cells were imaged in the near-original state by this method should therefore be omitted. Also, no mass spectrometry data are presented that would supplement and confirm the identity of the virion proteins whose predicted models were fitted into the cryoEM density. For a general virology reader outside of the giant virus field, the results presented in the current state might not have enough influence and the section should be rewritten to better showcase the novelty of findings.

      Thank you for your constructive comments. We thank reviewer #2 for highlighting these weaknesses, giving us the opportunity to improve our study. We have removed the claim that the cells were imaged in a near-original state. Additionally, we agree that the positions of the virions on the cell surface could result from a random distribution. However, the specific virion in panel 3C is situated halfway into a crevice, and it cannot be ruled out that this particular one could be in the process of being taken up by endocytosis. This is why we used the term "probably" when referring to this finding. Regarding the mass spectrometry data, while we understand that MS data would provide an additional layer of evidence to validate the specific proteins present in the virion, they would not confirm the precise location or role of these proteins within the virion.

      We have addressed each point raised in our rebuttal letter and revised the manuscript accordingly.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have only minor comments which should be relatively simple to address:

      (1) Atomic coordinates should be deposited in the PDB, and refinement statistics for the models provided, for example by expanding Table S2.

      We thank reviewer #1 for the suggestion. In the ‘Data availability’ statement of the original submission we stated that ‘Predicted Jyvaskylavirus PDB models using ModelAngelo and Alphafold have been deposited at BioStudies under the accession number S-BSST1654’. The atomic coordinates of all predicted models are therefore publicly available at BioStudies (https://www.ebi.ac.uk/biostudies/); for additional clarity, we also added the link to the ‘Data availability’ statement in the revised version.

      Our reason for not depositing them in the Protein Data Bank in association with our EMD-51613 entry is that they remain predicted models rigid-body fitted into the Jyvaskylavirus density map of 6.3 Å resolution. However, we have added to our BioStudies deposition (S-BSST1654) the whole Jyvaskylavirus pentameric assembly model (including all identified and predicted major and minor capsid proteins) rigid-body fitted into the Jyvaskylavirus map, and it can be easily downloaded.

      We did not perform the real-space ‘minimization_global’ refinement of the predicted models corresponding to the ORFs of Melbournevirus (or Jyvaskylavirus) into the corresponding available Melbournevirus densities with entries EMD-37188, 37189, 37190 at ~ 3.5 Å resolution (by block-based reconstruction methods), as these maps were generated and deposited by other authors. Instead, we performed the rigid-body fit-into-map procedure of the individual predicted Jyvaskylavirus models into the previously deposited Melbournevirus maps using ChimeraX, demonstrating a fold-map alignment and assignment (see for example the individual stereo views in Supplementary Figure 6).

      In the revised version, we now provide the refinement statistics for the complete Jyvaskylavirus pentameric assembly (inclusive of peripentonal major capsid and minor capsid proteins) rigid-body fitted as a whole into the Melbournevirus 5-block reconstruction map using PHENIX, resulting in a CC<sub>mask</sub> of 57.3% (this is also stated in Supplementary Figure 7). The same pentameric assembly model was then placed into our lower-resolution 6.3 Å Jyvaskylavirus 3D density map in ChimeraX and rigid-body refined as a whole in PHENIX, yielding a predictably lower CC<sub>mask</sub> of 33%. This pentameric assembly model has now also been included in the BioStudies entry.

      The procedure for this rigid body fitting and refinement has been clarified and added to the 'Materials and Methods' section as follows:

      “Then, the corresponding full 3D models were predicted using AlphaFold3 and fitted into the Melbournevirus and Jyvaskylavirus cryoEM density using the fit-into-map routine in ChimeraX together with the peripentonal capsomers (Meng et al 2023). To assess the metric of this fitting (Supplementary Figure 7), the 3.5 Å five-fold Melbournevirus block 3D density (EMDB-37190) was boxed around the pentameric assembly model and refined as a whole using rigid-body refinement in PHENIX, yielding a CC<sub>mask</sub> of 57.3%. The same pentameric model was subsequently fitted into the 6.3 Å Jyvaskylavirus 3D cryo-EM density (previously boxed around the model), resulting in a lower CC<sub>mask</sub> of 33%, consistent with the limited resolution of the capsid map and below regions.”

      (2) The results section 'Jyvaskylavirus three-dimensional architecture' could be expanded to compare and contrast with other giant virus structures, in terms of T-number, diameter, and features on and inside the capsid. This is not essential but would help focus claims of novelty with regard to structure.

      We have added a few lines, as indicated by reviewer #1, to place Jyvaskylavirus in morphological context with other NCLDVs, as follows:

      “Both the capsid organization and virion size are similar to those of other Marseilleviruses, such as Melbournevirus and Tokyovirus. Pacmanvirus, considered to be at the crossroads between Asfarviridae and Faustoviruses, also possesses the same T number (309) and a comparable diameter to Jyvaskylavirus. In contrast, other giant viruses, such as African swine fever virus (ASFV), representative of the Asfarviridae family, have a T number of 277 and a diameter of approximately 2,100 Å, while PBCV-1, a member of the Phycodnaviridae family, has a T number of 169 and an average diameter of 1,900 Å. All of the above-mentioned viruses have been shown to possess a major capsid protein with a vertical double jelly-roll fold that composes the capsid shell, along with an internal membrane bilayer. Minor capsid proteins have been identified and structurally modelled for the smaller virions ASFV and PBCV-1 (Wang et al. 2019; Shao et al. 2022).”
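
      The T numbers quoted above follow the standard Caspar-Klug relation T = h² + hk + k² for non-negative integers h and k. As a minimal sketch (not part of the authors' analysis; the function names are hypothetical and chosen only for illustration), the lattice indices behind a given T number, and the corresponding capsomer count 10T + 2 for an ideal icosahedral lattice, can be checked as follows:

```python
def triangulation_indices(t_number):
    """Return all (h, k) with 0 <= h <= k and h*h + h*k + k*k == t_number."""
    pairs = []
    h = 0
    # With h <= k, the smallest possible value for a given h is 3*h*h.
    while 3 * h * h <= t_number:
        k = h
        while h * h + h * k + k * k <= t_number:
            if h * h + h * k + k * k == t_number:
                pairs.append((h, k))
            k += 1
        h += 1
    return pairs


def capsomer_count(t_number):
    """Capsomers in an ideal icosahedral lattice: 12 pentamers + 10*(T - 1) hexamers."""
    return 10 * t_number + 2


if __name__ == "__main__":
    # T numbers mentioned in the text: Jyvaskylavirus/Pacmanvirus, ASFV, PBCV-1.
    for t in (309, 277, 169):
        print(t, triangulation_indices(t), capsomer_count(t))
```

      For example, T = 309 corresponds to h = 7, k = 13, since 49 + 91 + 169 = 309. The capsomer count is the idealised lattice value and says nothing about which positions are occupied by major or minor capsid proteins.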

      (3) The authors highlight one of the main novelties of the virus as being the first to be isolated from Finland. The first isolation of a giant virus from the region is indeed a success but reported isolation experiments for giant viruses are still relatively few. To help shed light on the likely distribution of Jyvaskylavirus-like viruses in the region, and further afield, the genome of Jyvaskylavirus could be searched against relevant available metagenomes.

      In the last decade, interest in finding giant viruses by metagenomics has increased. However, the focus has been on marine environments, where these viruses are shown to be prevalent. Besides the few isolates from the Northern hemisphere mentioned in the manuscript, northern giant viruses were detected in metagenome datasets from glacier samples, epishelf lakes, the permafrost, the Nordic seas and a deep-sea hydrothermal vent. Most of the genomic hits are for mimivirus-like or phycodnavirus-like sequences. A few marseilleviruses were found in the Loki’s Castle deep-sea vent, and we have already included these sequences in the analysis shown in Supplementary Figure 3. In this case, the deep-sea vent viruses cluster outside the conventional clades of the Marseilleviridae family, evidencing their uniqueness.

      In response to the suggestion of exploring the distribution of Jyvaskylavirus, we used the MGnify database to search for DNA polymerase (DNApol) and major capsid protein (MCP) sequences. Our findings revealed multiple hits with very low E-values (< 1e-80), where both DNApol and MCP were detected in the same studies, indicating the presence of similar virus-like particles (VLPs) globally. Of particular interest was the detection of similar sequences in metagenomes and transcriptomes obtained from drinking water distribution systems of ground and surface waterworks in central and eastern Finland (https://www.ebi.ac.uk/metagenomics/studies/MGYS00005650#overview). We have acknowledged this in the manuscript and cited the appropriate references, as follows:

      Results: “Searching the Jyvaskylavirus major capsid protein and DNA polymerase sequences in the MGnify-database (Richardson et al 2023) yields multiple hits with significantly low E-values (< 1e-80), as expected from the apparent ubiquity of marseilleviruses. Of note was the detection of similar sequences in metagenomes and transcriptomes obtained from drinking water distribution systems of ground and surface waterworks in central and eastern Finland, evidencing that marseilleviruses are prevalent but still unexplored in this region (Tiwari et al 2022)”.

      Discussion: “Marseillevirus DNA polymerase sequences are present in metagenomes from Finnish drinking water distribution systems (Tiwari et al 2022), hinting to a wide distribution of these viruses and still unknown ecological role in Central and Eastern Finland.”

      Reviewer #2 (Recommendations for the authors):

      Apart from the major comments in the weaknesses section, I have these additional minor comments to the authors:

      (1) I do not understand why the authors emphasized the uniqueness of isolating a giant virus from Finland. I think the manuscript would benefit if they rather emphasize that the virus comes from a boreal environment.

      The first giant virus, APMV, was described in 2003. In the following years, the apparent ubiquity of these viruses became evident on two fronts. Metagenomics made clear that giant viruses are found almost everywhere, with a bias towards the oceans. Isolation efforts have brought new virus groups to light but have so far been biased towards samples from central Europe and South America. The closest isolated giant viruses to Jyvaskylavirus would be either an uncharacterized Swedish cedratvirus or a few microalgae-infecting mimivirus-like and phycodnavirus-like isolates from Norway. Among marseilleviruses, Jyvaskylavirus is the northernmost isolate so far. Other marseilleviruses from the northern hemisphere were found in France, India, Japan and Algeria only.

      We still believe that finding a giant virus in Finland is relevant, considering that no other is known to date, either as an isolate or detected by genomics. We have made these observations clearer in the manuscript, giving emphasis to the boreal environment as well.

      (2) All discussed AlphaFold models should be added as Supplementary PDB data.

      We thank reviewer #2 for the suggestion. In the ‘Data availability’ statement of the original submission we stated that ‘Predicted Jyvaskylavirus PDB models using ModelAngelo and Alphafold have been deposited at BioStudies under the accession number S-BSST1654’. The atomic coordinates of all predicted models are therefore publicly available at BioStudies (https://www.ebi.ac.uk/biostudies/); for additional clarity, we also added the link to the ‘Data availability’ statement in the revised version.

      Our reason for not depositing them in the Protein Data Bank in association with our EMD-51613 entry is that they remain predicted models rigid-body fitted into the Jyvaskylavirus density map of 6.3 Å resolution. However, we have added to our BioStudies deposition (S-BSST1654) the whole Jyvaskylavirus pentameric assembly model (including all identified and predicted major and minor capsid proteins) rigid-body fitted into the Jyvaskylavirus map, and it can be easily downloaded.

      We did not perform the real-space ‘minimization_global’ refinement of the predicted models corresponding to the ORFs of Melbournevirus (or Jyvaskylavirus) into the corresponding available Melbournevirus densities with entries EMD-37188, 37189, 37190 at ~ 3.5 Å resolution (by block-based reconstruction methods), as these maps were generated and deposited by other authors. Instead, we performed the rigid-body fit-into-map procedure of the individual predicted Jyvaskylavirus models into the previously deposited Melbournevirus maps using ChimeraX, demonstrating a fold-map alignment and assignment (see for example the individual stereo views in Supplementary Figure 6).

      In the revised version, we now provide the refinement statistics for the complete Jyvaskylavirus pentameric assembly (inclusive of peripentonal major capsid and minor capsid proteins) rigid-body fitted as a whole into the Melbournevirus 5-block reconstruction map using PHENIX, resulting in a CC<sub>mask</sub> of 57.3% (this is also stated in Supplementary Figure 7).

      The same pentameric assembly model was then placed into our lower-resolution 6.3 Å Jyvaskylavirus 3D density map in ChimeraX and rigid-body refined as a whole in PHENIX, yielding a predictably lower CC<sub>mask</sub> of 33%. This pentameric assembly model has now also been included in the BioStudies entry.

      The procedure for this rigid body fitting and refinement has been clarified and added to the 'Materials and Methods' section as follows:

      “Then, the corresponding full 3D models were predicted using AlphaFold3 and fitted into the Melbournevirus and Jyvaskylavirus cryoEM density using the fit-into-map routine in ChimeraX together with the peripentonal capsomers (Meng et al 2023). To assess the metric of this fitting (Supplementary Figure 7), the 3.5 Å five-fold Melbournevirus block 3D density (EMDB-37190) was boxed around the pentameric assembly model and refined as a whole using rigid-body refinement in PHENIX, yielding a CC<sub>mask</sub> of 57.3%. The same pentameric model was subsequently fitted into the 6.3 Å Jyvaskylavirus 3D cryo-EM density (previously boxed around the model), resulting in a lower CC<sub>mask</sub> of 33%, consistent with the limited resolution of the capsid map and below regions.”

      (3) Figure 2A: Could ORFs that encode structural proteins discussed in the paper, be somehow highlighted?

      We have updated Figure 2A to include this information.

      (4) Figure 2C: Could the members on which structural characterisation has been conducted be highlighted somehow (e.g. by a symbol next to the name)?

      We have updated Figure 2C to include this information.

      (5) Figure 5A: Could the central bid be shown in a lower threshold (you can retain the threshold for the protein shell)? It would be interesting to see some details of the interior, rather than a massive blob.

      We have decreased the threshold level of the map as suggested.

      (6) Figure 6: the density corresponding to MCPs, minor capsid, and penton proteins respectively could be colour-zoned in Chimera(X). This would better visualise where each entity lies.

      About ORF142 - what other virus protein possesses this fold? Is it similar to the penton protein in other PRD1/Adenoviridae viruses? Maybe some comparison could be presented?

      We have incorporated the feedback from reviewer #2 by modifying the corresponding panel A in Figure 6. We have colour-zoned the penton (ORF142), some of the density region corresponding to the MCPs (ORF184) and to the minor capsid proteins (ORF121). We have kept in grey the density corresponding to other minor proteins, and those we were able to identify are logically introduced later and shown as individual coloured cartoon tube models fitted into the density in panel A of Figure 7.

      Regarding ORF142, we have included a reference in the Discussion section to a new Supplementary Figure 9, where we provide a side-by-side comparison of the predicted Jyvaskylavirus penton protein model with experimentally derived penton protein models of PRD1 and HCIV-1. In light of this comparison, we have also added a brief clarification in the Discussion as follows:

      “However, in ORF142, the CHEF strands are predicted to be tilted relative to the BIDG strands, with an estimated angle of approximately 60° based on visual inspection (Supplementary Figure 9).”

      (7) Figure 7B: Could the density around the protein be zoned (rather than side view clipped), as this would better showcase how it fits the density?

      Initially, we presented a side view of the clipped surface to highlight the correspondence between the wall-shaped density, characteristic of a low-resolution beta-barrel, and the beta-barrel of the predicted model. Following the Reviewer’s suggestion, we have now surface-zoned the density and provided a stereo view of the density with the model fitted into the map using ChimeraX. While we recognize that stereo views are no longer commonly used in main text figures, we believe they remain valuable for visually assessing the overall match in low-resolution 3D density maps.

      (8) The authors did not try to reconstruct the asymmetric feature of the virion by classifying pentons, which may have identified a special vertex, one they claim might be required for genome packaging in "open particles". I understand the number of particles is low, but even low-resolution classification in C5 might be of interest in the field.

      We thank reviewer #2 for this valuable comment. The potential existence of a unique vertex in Marseilleviruses remains an open and intriguing question. Further investigations, including a significant increase in the number of particles, may help clarify this issue, and we plan to explore this topic in future structural studies.

      (9) Supplementary Figure 2: It would be interesting how the titre changes after the 12 hours, will it plateau? Could you add a bar showing the original titre to the chart showing stability after 109 days? I like the data in this figure and think it should be transferred to the main text.

      The titre at the 12 h time point is very close to the titre we usually obtain in our stocks, indicating that it is indeed close to peaking. For comparison: the titre at the 12-hour time point was 10<sup>11.55</sup> TCID50/ml, whereas our stock has a titre of 10<sup>11.66</sup> TCID50/ml. Our growth curve included more time points, up to 48 h, but we lost the later ones because the viral load was higher than predicted and could not be counted with the dilutions used. Showing the first 12 hours was enough for our initial purpose, which was to show a quick replication cycle for Jyvaskylavirus, in accordance with the other marseilleviruses for which the timing of the replication cycle has been observed (see the answer to point 10 below).
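
      To make the comparison of the two log-scale titres concrete, the following minimal sketch (an illustration only, not part of the original analysis; the function name is hypothetical) converts the difference between two log10 TCID50/ml values into a linear ratio:

```python
def titre_ratio(log10_titre_a, log10_titre_b):
    """Linear ratio a/b of two titres given as log10(TCID50/ml) values."""
    return 10 ** (log10_titre_a - log10_titre_b)


if __name__ == "__main__":
    # 12-hour time point (10^11.55) relative to the stock (10^11.66).
    ratio = titre_ratio(11.55, 11.66)
    print(f"12 h titre is {ratio:.0%} of the stock titre")
```

      A difference of 0.11 log10 units corresponds to the 12-hour titre being roughly three quarters of the stock titre, which is the sense in which the two values are described as very close.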

      We have added a bar representing the original titre of the stock used for the stability experiment as suggested.

      While preparing the draft, we were divided on whether to place the growth and stability figure in the main text or in the supplementary material. Our decision was to move these data to the supplementary material and keep the focus of the main text on the discovery, genome analysis and structural data, as these are the main findings of our work. The specifics regarding stability, growth and other uncharacterized VLPs went to the supplementary material for those in the field who are interested in looking deeper. That being said, we will keep these data as supplementary material if you and the editor agree.

      (10) In the Discussion, the authors should focus on how our perception of giant viruses changes by this study - compare with other growth curves, stability assays, and structures of giant viruses, showcasing how prevalent those stabilising minor capsid proteins are, etc. My impression is that in the current form, it is just not clear if/how substantial these findings are and such a comparison and putting the results in a bigger picture would considerably increase the impact of the paper.

      Our comparisons with other marseilleviruses were based on genomic and structural characteristics, the two areas for which data were available in the literature and databases. Unfortunately, there is little information on the stability and growth of other isolates that could be used for an in-depth comparison. For example, although marseilleviruses are known to have a fast replication cycle, this has been measured by DAPI staining of DNA inside infected cells to evaluate viral factory formation (Boyer et al 2009), or by time-series observations of viral cycle stages by electron microscopy (Fabre et al 2017), and not by viral titration as done here. We included a mention of these references in the results:

      “A fast replication cycle is a feature also shown for other marseilleviruses (Boyer et al 2009 ; Fabre et al 2017).”

      The literature also does not report virion stability for other isolates, making a comparison with Jyvaskylavirus impossible. A comparative study testing different isolates side by side is definitely of relevance and interest, but it would be difficult to carry out in a short time because other isolates would first need to be obtained. We believe the results in this manuscript might set some parameters to be used for comparison with other marseilleviruses, by our groups and others, in the future.

      Regarding the prevalence of the minor capsid proteins, we have expanded and clarified the identification of ORFs in Melbournevirus in the ‘Results’ and ‘Discussion’ sections. The revised Supplementary Table 4 has been updated accordingly and referenced in the results to clarify that the identification of Melbournevirus ORFs was carried out in BLASTp by querying the Jyvaskylavirus minor protein sequences exclusively against the Melbournevirus isolate 1 (NCBI Reference Sequence: NC_025412.1). BLASTp was then performed against the full sequence database, and homologous sequences were primarily retrieved from other marseilleviruses. These results have been compiled in a new Supplementary Table 5.

      However, Supplementary Table 5 also shows that the hits for Melbournevirus are not ranked at the top, and in some cases, they do not appear among the top hits.

      The ‘Results’ section now contains the following text:

      “To this end, we identified the corresponding Jyvaskylavirus ORFs in Melbournevirus through sequence comparison with Melbournevirus isolate 1 (NCBI Reference Sequence: NC_025412.1) (Supplementary Table 4). However, when the identified Jyvaskylavirus ORF sequences were analyzed using BLASTp without restricting the search to the Melbournevirus reference, many hits were observed in other giant viruses, primarily marseilleviruses. Remarkably, some of these hits scored higher than those for Melbournevirus, supporting the presence of homologous proteins in these viruses (Supplementary Table 5).”

      The ‘Discussion’ section now contains the following text:

      “Additionally, the observation that the identified Jyvaskylavirus minor capsid protein sequences are shared across other marseilleviruses supports their essential structural and stabilizing roles in these viruses.”

      At the same time, we have modified the ‘Materials and Methods’ section to include a reference to Supplementary Figure 5, where the use of ModelAngelo is mentioned. Additionally, a new Supplementary Figure 10 has been included to clarify how the residues built into the Melbournevirus density using ModelAngelo (without prior knowledge of any sequence) are subsequently matched with the Jyvaskylavirus sequences.

      (11) Based on the author's statement, Iker Arriaga did all the cryoEM experiments. It is strange to me they are not placed higher on the author's list.

      We thank you for this observation and agree with your comment. This manuscript has been in preparation for a few years, and the first draft had the author order defined before the structural data collection and analyses were completed. Iker's participation was indeed important and substantial from the first draft to the submitted version, and he definitely deserves a better author placement. We have modified the author order to accommodate this. Note that only the author order changed and that no author has been added or removed.

    1. Author response:

      We thank the reviewers and editors for these careful and constructive comments. Based on these comments, we plan to perform new experiments and revised analyses, summarized as follows:

      (1) A more thorough analysis and experimental test of the effects of YW->SR variants on baseline AP excitability in neurons in the absence of any pharmacology.

      (2) More details on modeling of selective block of Na<sub>V</sub>1.2 and Na<sub>V</sub>1.6.

      (3) Revisions to text, figure contents, and figure order to better convey key points and better frame these findings in the context of current clinically available anti-seizure medications that interact with sodium channels.

    1. Author response:

      We thank both reviewers for their comments on our manuscript. We are pleased that the value of this research has been communicated effectively, and that the reviewers agree that whilst our sample size of individuals is relatively small, it offers a unique perspective for understanding the effects of aging for wild chimpanzees’ technological behaviors. Whilst only yielding data on a few individuals, the Bossou archive is the only available data source with which we can currently address these questions over extended timescales, and is key for understanding longitudinal effects of aging for specific individuals. This is particularly true if we are to understand the life-long dynamics of chimpanzees’ technical skills during tasks which require the organization of multiple movable elements. Bossou is the only community where chimpanzees both perform nut cracking with moveable hammer and anvil stones, and have been systematically studied over a period of decades. Moreover, given the dwindling population at Bossou (N = 3 as of 2025), we must make every effort to understand these effects with existing data. We agree that this work will likely form a valuable foundation for future studies, which may aim to either replicate our results, or use our findings to design more specific research questions and approaches.

      In the next iteration of the manuscript, we will explain our choice of field seasons more clearly. This was a logistical tradeoff between needing to sample across a long lifespan using fine-granularity behavior coding, and the time constraints for our project and the likely yield of data collection. We sampled from the middle of individuals’ prime age up until the oldest recorded ages of individuals’ lifespans (17 years). Where possible we aimed to use consistent time intervals (approximately 4 years); however, this was not always possible, as in some years data were not collected by researchers at Bossou (for example, during years when Ebola outbreaks affected the region). In such instances, we sampled the closest available year that offered sufficient data to meet our sampling requirements.

      Reviewer 2 raises the point that there may be a disconnect between how human observers and chimpanzees conceive of efficiency when nut cracking, and supports this idea with a citation to previous work on efficiency of Oldowan stone knapping. We agree that knowing precisely how chimpanzees perceive their own efficiency during tool use is not available through observation alone, nor can we assess the true extent to which chimpanzees are concerned about the efficiency of their nut-cracking. However, following previous studies, it is reasonable to assume that adult chimpanzees embody some level of efficiency, given that adults often select tools which aid efficient nut cracking (Braun et al. 2025, J. Hum. Evol.; Carvalho et al. 2008, J. Hum. Evol.; Sirianni et al. 2015, Animal Behav.); perform nut cracking using more streamlined combinations of actions than less experienced individuals (Howard-Spink et al. 2024, PeerJ; Inoue-Nakamura & Matsuzawa 1997, J. Comp. Psychol.), and consequently end up cracking nuts using fewer hammer strikes, indicating a higher level of skill (Biro et al. 2003, Animal Cogn.; Boesch et al. 2019, Sci. Rep.). Ultimately, these factors suggest that across adulthood, experienced chimpanzees perform nut cracking with a level of efficiency which exceeds that of novice individuals, including across the chaîne opératoire.

      To account for the multiple ways in which reduced efficiency may manifest later in life, we provide one of the most flexible measures of efficiency in wild chimpanzee tool use to date, which incorporates more classical measures of time and hammer strikes (see previous examples of Biro et al. 2003, Animal Cogn.; Boesch et al. 2019 Sci. Rep.) as well as additional variables which aim to characterize how streamlined behavioral sequences are (tool rotations, tool swaps, nut replacements, etc. see Berdugo et al. 2024 Nat. Hum. Behav for other analyses using similar metrics). In the case of swapping out tools, Reviewer 2 suggests that some of these tool swaps may in fact be to aid nut cracking, by maintaining kernel integrity (a key result relating to Yo’s coula nut cracking efficiency). This however seems unlikely, given that these behaviors were performed extremely rarely by chimpanzees in early field seasons, and were not performed more frequently by other individuals with aging. We will provide additional information behind our metrics for measuring efficiency, with reference to earlier work, and also will incorporate the points raised by Reviewer 2 concerning the limitations with which we can infer chimpanzees’ goals, and how efficiently they meet them.

      Reviewer 1 questioned why we did not sample efficiency data for younger individuals, and compare this data with older individuals to detect the effects of aging. Throughout our manuscript, we compared aging individuals’ nut-cracking efficiency with their efficiency in previous years (thus, at younger ages). This offered each individual a personalized benchmark of efficiency in earlier life, and allowed us to identify aging effects whilst controlling for long-term interindividual variation in skill levels. Indeed, previous analyses at Bossou find that across the majority of adulthood, efficiency varies between individuals, but is relatively stable within individuals (see Berdugo et al. 2024, Nat. Hum. Behav.). As focal aging chimpanzees cracked multiple nuts each field season (and each encounter), we had ample data to fit models that examine individuals’ efficiency over field seasons, using random slopes to model correlations for each individual. By taking this approach, our paper offers a novel perspective by being able to report the longitudinal effects of aging on tool-using efficiency, rather than averaged cross-sectional effects between young and old cohorts. As random slope models (and not just random intercept models) offered the best explanation for variation in aging individuals’ efficiency over our sample period, this implies that focal chimpanzees were experiencing individual-level changes in efficiency over time, thus giving us key evidence that interindividual variation in tool-using efficiency can be compounded by aging.
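
      The distinction between the two model classes can be written out as a generic sketch. The notation below is illustrative only and is an assumption, not the fitted model: y_{ij} stands for an efficiency measure for individual i on observation j, t_{ij} for the field season, and the response distribution and link function are left unspecified.

```latex
% Random-intercept model: individuals differ in baseline efficiency only,
% and all share a single slope over field seasons.
y_{ij} = \beta_0 + u_{0i} + \beta_1 t_{ij} + \varepsilon_{ij}

% Random-slope model: each individual also has its own rate of change,
% so efficiency can decline for some individuals and stay flat for others.
y_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\, t_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix}
\sim \mathcal{N}\!\left(\mathbf{0}, \boldsymbol{\Sigma}\right)
```

      In this notation, a better fit for the second form indicates non-negligible variance in u_{1i}, that is, individual-specific trajectories over field seasons.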

      We argue that the reductions in efficiency observed for some individuals (e.g. Yo & Velu) are unlikely to be due to environmental changes (e.g. nuts becoming harder in later field seasons), as if this was the case, these effects would be detected across the behaviors of all individuals (which was not observed). Additionally, in the specific case of the hardness of nuts, nuts used in our experiment were sourced from local communities, and were moderately aged. This avoided the use of young nuts which are harder to crack, or older nuts which are often worm-eaten or can be empty (Sakura & Matsuzawa, 1991; Ethology). We will update our manuscript with this information.

      Whilst other factors may introduce general variation into our efficiency data (such as different stones used on different encounters, or more general variation in nut hardness across encounters), very few of these factors predict directional long-term changes in efficiency. Rather, if these factors were driving the majority of variation in our data, we would expect them to lead to variation across visits during earlier field seasons (such as 1999-2008) and later field seasons (2011 onwards) equally, and in a way which does not necessarily correlate with age. This does not match the pattern we observed in our data, where for some individuals (e.g. Yo & Velu), efficiency in nut cracking reduced in later field seasons only, and was relatively consistent across field seasons prior to 2011. Moreover, for Yo – the individual who exhibited the greatest reductions in tool-using efficiency – efficiency continued to decrease across the three latest sampled field seasons. Thus, it is more likely that Yo was experiencing deleterious effects of aging. We do however agree that additional data on these variables would help us to rule out confounding factors more rigorously – we will include recommendations for this data to be collected in future studies.

      When modelling the effect of aging on attendance at the outdoor laboratory, we could not use the same approach we used when modelling tool-using efficiency, as we could only acquire one datapoint (attendance rate) per individual for each field season. We therefore had to adapt our analysis, and introduce attendance rates for younger individuals as a baseline to compare against the attendance rates of older individuals across years. We observed a significant interaction effect, where across field seasons, attendance dropped significantly more rapidly for older individuals than younger ones. Reviewer 2 has asked why we do not consider inter-annual variability across this time period, and suggested that we ignored intervening years. This is not the case. When fitting models that examined the effects of aging on attendance, we used all data across all field seasons. We reported an approximate effect size for this significant correlation using a digestible comparison of the attendance rates in the initial and final field seasons sampled. We will ensure that this is clear in the next iteration of our manuscript.

      Reviewer 2 noted that many factors may have influenced the decision for chimpanzees to attend the outdoor laboratory in older field seasons, and the current data may not be used to make strong arguments for changes in attendance rates being due to dietary preferences. We agree that many factors may have influenced these attendance rates, and that is what we have aimed to transparently report within our discussion where we raise an extensive, non-exhaustive list of hypotheses for why we have observed this age-related change in our data. We will aim to ensure that this is exceptionally clear prior to resubmission, and where relevant, will further emphasize points raised by Reviewer 2. We consider some points raised by Reviewer 2 to be unlikely to apply for our study; for example, it is unlikely neophobia has influenced the behaviors of chimpanzees, as these chimpanzees habitually attended the outdoor laboratory at their own accord for over a decade prior to the earliest year we sampled in this study (reflecting extremely high levels of habituation to the experimental set up). Previous studies at Bossou have surveyed the ecology of stone tool use across the home range, and confirm that the outdoor laboratory is visited by chimpanzees during ranging as a food patch (Almeida-Warren et al. 2022 Int. J. Primatol.).

      Reviewer 2 suggested that it would be helpful to have additional data on variables such as hand grip, as this may reveal further information about how cognitive and physiological senescence influences reductions in tool-using efficiency. We agree that whilst further data on hand grips are not required to detect reductions in efficiency per se, it would be profitable for future analyses to collect similar data – we will add this as a recommendation to our discussion.

      Finally, Reviewer 2 commented that they found our discussion of coula-nut cracking disruptive to the flow of the manuscript, given that we could not compare with coula-nut cracking in earlier years. We reported the coula-nut cracking of Yo in 2011 as it was part of our sampled data, and we felt that the comparison with other individuals in the same year was an interesting discussion point; however, we acknowledge this limitation. We will move all data and discussion of coula-nut cracking to the Supplementary Materials, which we will present as an interesting additional observation which may warrant further investigation using additional data from the Bossou archive. Data collection for this future project could include collecting data on the additional variables raised by both reviewers (e.g. hand grips).

      We thank both reviewers for their comments. We believe that their feedback will improve the quality of our reporting, and the validity of our interpretations.

    1. Author response:

      eLife Assessment

      The conclusions of this work are based on valuable simulations of a detailed model of striatal dopamine dynamics. Establishing that a lower dopamine uptake rate can lead to a 'tonic' level of dopamine in the ventral but not dorsal striatum, and that dopamine concentration changes at short delays can be tracked by D1 but not D2 receptor activation, is of value and will be of interest to dopamine aficionados. However, the simulations are incomplete, providing only partial support for the key claims. Several things can be done to strengthen the conclusions, including, for example, but not exclusively, a demonstration of how the results would change as a function of changes in D2 affinity.

      We sincerely thank the Editors and Reviewers for their insightful comments on our manuscript. We are pleased that our simulations are recognized as interesting, sophisticated and valuable. Moreover, we fully agree that many of the findings will be of particular interest to dopamine aficionados. While we maintain that our simulations provide a solid basis for the key claims, we acknowledge that the conclusions can be further strengthened by the revisions suggested below.

      Reviewer #1 (Public review):

      Ejdrup, Gether, and colleagues present a sophisticated simulation of dopamine (DA) dynamics based on a substantial volume of striatum with many DA release sites. The key observation is that a reduced DA uptake rate in the ventral striatum (VS) compared to the dorsal striatum (DS) can produce an appreciable "tonic" level of DA in VS and not DS. In both areas they find that a large proportion of D2 receptors are occupied at "baseline"; this proportion increases with simulated DA cell phasic bursts but has little sensitivity to simulated DA cell pauses. They also examine, in a separate model, the effects of clustering dopamine transporters (DAT) into nanoclusters and say this may be a way of regulating tonic DA levels in VS. I found this work of interest and I think it will be useful to the community. At the same time, there are a number of weaknesses that should be addressed, and the authors need to more carefully explain how their conclusions are distinct from those based on prior models.

      (1) The conclusion that even an unrealistically long (1s) and complete pause in DA firing has little effect on DA receptor occupancy is potentially important. The ability to respond to DA pauses has been thought to be a key reason why D2 receptors (may) have high affinity. This simulation instead finds evidence that DA pauses may be useless. This result should be highlighted in the abstract and discussed more.

      We appreciate that the reviewer finds our work interesting and useful to the community. However, we acknowledge that in the revised version we need to better describe how our conclusions differ from those reached based on previous models.

      We will also carry out new simulations across a range of D2R affinities to assess how this affects the finding that even a long pause in DA firing has little effect on D2 receptor occupancy. As also suggested, the results will be highlighted and further discussed.

      (2) The claim of "DAT nanoclustering as a way to shape tonic levels of DA" is not very well supported at present. None of the panels in Figure 4 simply show mean steady-state extracellular DA as a function of clustering. Perhaps mean DA is not the relevant measure, but then the authors need to better define what is and why. This issue may be linked to the fact that DAT clustering is modeled separately (Figure 4) to the main model of DA dynamics (Figures 1-3) which per the Methods assumes even distribution of uptake. Presumably, this is because the spatial resolution of the main model is too coarse to incorporate DAT nanoclusters, but it is still a limitation.

      We will improve our definitions and descriptions relating to nanoclustering of DAT in the revised version of the manuscript. We fully agree that the spatial resolution of the main model is a limitation and, ideally, that the nanoclustering should be combined with the large-scale release simulations. Unfortunately, this would require many orders of magnitude more computational power than currently available.

      As it stands it is convincing (but too obvious) that DAT clustering will increase DA away from clusters, while decreasing it near clusters. I.e. clustering increases heterogeneity, but how this could be relevant to striatal function is not made clear, especially given the different spatial scales of the models.

      Thank you for raising this important point. While it is true that DAT clustering increases heterogeneity in DA distribution at the microscopic level, the diffusion rate is, in most circumstances, too fast to permit concentration differences on a spatial scale relevant for nearby receptors. Accordingly, we propose that the primary effect of DAT nanoclustering is to decrease the overall uptake capacity, which in turn increases overall extracellular DA concentrations. Thus, homogeneous changes in extracellular DA concentrations can arise from regulating heterogenous DAT distribution. An exception to this would be the circumstance where the receptor is located directly next to a dense cluster – i.e. within nanometers. In such cases, local DA availability may be more directly influenced by clustering effects. This will be further discussed in the revised manuscript.

      (3) I question how reasonable the "12/40" simulated burst firing condition is, since to my knowledge this is well outside the range of firing patterns actually observed for dopamine cells. It would be better to base key results on more realistic values (in particular, fewer action potentials than 12).

      We fully agree that this typically is outside the physiological range. The values are included to showcase what extreme situations would look like.

      (4) There is a need to better explain why "focality" is important, and justify the measure used.

      We will expand on the intention of this measure in the revised manuscript. Thank you for pointing out this lack of clarification.

      (5) Line 191: " D1 receptors (-Rs) were assumed to have a half maximal effective concentration (EC50) of 1000 nM" The assumptions about receptor EC50s are critical to this work and need to be better justified. It would also be good to show what happens if these EC50 numbers are changed by an order of magnitude up or down.

      We agree that these assumptions are critical. Simulations on effective off-rates across a range of EC50 values will be included in the revised version.

      (6) Line 459: "we based our receptor kinetics on newer pharmacological experiments in live cells (Agren et al., 2021) and properties of the recently developed DA receptor-based biosensors (Labouesse & Patriarchi, 2021). Indeed, these sensors are mutated receptors but only on the intracellular domains with no changes of the binding site (Labouesse & Patriarchi, 2021)”

      This argument is diminished by the observation that different sensors based on the same binding site have different affinities (e.g. in Patriarchi et al. 2018, dLight1.1 has Kd of 330nM while dlight1.3b has Kd of 1600nM).

      We sincerely thank the reviewer for highlighting this important point. We fully recognize the fundamental importance of absolute and relative DA receptor kinetics for modeling DA actions and acknowledge that differences in affinity estimates from sensor-based measurements highlight the inherent uncertainty in selecting receptor kinetics parameters. While we have based our modeling decisions on what we believe to be the most relevant available data, we acknowledge that the choice of receptor kinetics is a topic of ongoing debate. Importantly, we are making our model available to the research community, allowing others to test their own estimates of receptor kinetics and assess their impact on the model’s behavior. In our revised manuscript, we will further discuss the rationale behind our parameter choices, including (i) our selection of a Kd value of 1000 nM for D1R (based on the observed affinities for D1R sensors) and an extrapolated Koff of 19.5 s⁻¹ (Labouesse & Patriarchi, 2021), and (ii) our use of a Kd value of 7 nM and an extrapolated Koff of 0.2 s⁻¹ for D2R, consistent with recent binding studies (Ågren et al., 2021).
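
      What these parameter choices imply can be illustrated with a minimal sketch, assuming simple one-site binding at a constant DA concentration (an assumption made here for illustration; it is not a description of the model's implementation). Under that assumption the on-rate follows from Kd = Koff / Kon, and occupancy relaxes exponentially toward C / (C + Kd). Only the Kd and Koff values are taken from the text above; the function and variable names are hypothetical.

```python
import math

# Kd and Koff as quoted in the response; all names here are illustrative.
RECEPTORS = {
    "D1R": {"kd_nM": 1000.0, "koff_per_s": 19.5},
    "D2R": {"kd_nM": 7.0, "koff_per_s": 0.2},
}


def kon_per_nM_s(kd_nM, koff_per_s):
    """On-rate implied by one-site binding, where Kd = Koff / Kon."""
    return koff_per_s / kd_nM


def equilibrium_occupancy(da_nM, kd_nM):
    """Steady-state bound fraction at a constant DA concentration."""
    return da_nM / (da_nM + kd_nM)


def occupancy_after(t_s, start, da_nM, kd_nM, koff_per_s):
    """Analytic solution of dB/dt = Kon*C*(1 - B) - Koff*B for constant C."""
    rate = kon_per_nM_s(kd_nM, koff_per_s) * da_nM + koff_per_s
    target = equilibrium_occupancy(da_nM, kd_nM)
    return target + (start - target) * math.exp(-rate * t_s)


if __name__ == "__main__":
    # Fraction of initially bound receptors still bound after a complete
    # 1 s pause (DA = 0), relative to the starting occupancy.
    for name, p in RECEPTORS.items():
        remaining = occupancy_after(1.0, 1.0, 0.0, p["kd_nM"], p["koff_per_s"])
        print(f"{name}: unbinding time constant {1.0 / p['koff_per_s']:.3f} s, "
              f"fraction remaining after 1 s pause {remaining:.3g}")
```

      Under these assumptions the unbinding time constants are roughly 50 ms for D1R and 5 s for D2R, so about 82% of the initial D2R occupancy would persist through a complete 1 s pause. The sketch ignores diffusion and uptake, which the full simulations account for.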

      (7) Estimates of Vmax for DA uptake are entirely based on prior fast-scan voltammetry studies (Table S2). But FSCV likely produces distorted measures of uptake rate due to the kinetics of DA adsorption and release on the carbon fiber surface.

      We fully agree that this is a limitation of FSCV. However, most of the cited papers attempt to correct for this by way of fitting the output to a multi-parameter model for DA kinetics. If newer literature brings the Vmax values estimated into question, we have made the model publicly available to rerun the simulations with new parameters.

      (8) It is assumed that tortuosity is the same in DS and VS - is this a safe assumption?

      The original paper cited does not specify which region the values are measured in. However, a separate paper estimates the rat cerebellum has a comparable tortuosity index (Nicholson and Phillips, J Physiol. (1981)), suggesting it may be a rather uniform value across brain regions.

      (9) More discussion is needed about how the conclusions derived from this more elaborate model of DA dynamics are the same, and different, to conclusions drawn from prior relevant models (including those cited, e.g. from Hunger et al. 2020, etc).

      As part of our revision, we will expand the current discussion of our findings in the context of previous models.

      Reviewer #2 (Public review):

      The work presents a model of dopamine release, diffusion, and reuptake in a small (100 micrometer^2 maximum) volume of striatum. This extends previous work by this group and others by comparing dopamine dynamics in the dorsal and ventral striatum and by using a model of immediate dopamine-receptor activation inferred from recent dopamine sensor data. From their simulations, the authors report two main conclusions. The first is that the dorsal striatum does not appear to have a sustained, relatively uniform concentration of dopamine driven by the constant 4Hz firing of dopamine neurons; rather that constant firing appears to create hotspots of dopamine. By contrast, the lower density of release sites and lower rate of reuptake in the ventral striatum creates a sustained concentration of dopamine. The second main conclusion is that D1 receptor (D1R) activation is able to track dopamine concentration changes at short delays but D2 receptor activation cannot.

      The simulations of the dorsal striatum will be of interest to dopamine aficionados as they throw some doubt on the classic model of "tonic" and "phasic" dopamine actions, further show the disconnect between dopamine neuron firing and consequent release, and thus raise issues for the reward-prediction error theory of dopamine.

      There is some careful work here checking the dependence of results on the spatial volume and its discretisation. The simulations of dopamine concentration are checked over a range of values for key parameters. The model is good, the simulations are well done, and the evidence for robust differences between dorsal and ventral striatum dopamine concentration is good.

      However, the main weakness here is that neither of the main conclusions is strongly evidenced as yet. The claim that the dorsal striatum has no "tonic" dopamine concentration is based on the single example simulation of Figure 1 not the extensive simulations over a range of parameters. Some of those later simulations seem to show that the dorsal striatum can have a "tonic" dopamine concentration, though the measurement of this is indirect. It is not clear why the reader should believe the example simulation over those in the robustness checks, for example by identifying which range of parameter values is more realistic.

      We appreciate that the reviewer finds our work interesting and carefully performed. The reviewer is correct that DA dynamics, including the presence and level of tonic DA, are parameter-dependent in both the dorsal striatum (DS) and ventral striatum (VS). Indeed, our simulations across a broad range of biological parameters were intended to help readers understand how such variation would impact the model’s outcomes, particularly since many of the parameters remain contested. Naturally, altering these parameters results in changes to the observed dynamics. However, to derive possible conclusions, we selected a subset of parameters that we believe best reflect the physiological conditions, as elaborated in the manuscript. This is eventually required in computational modelling of biological systems. In response to the reviewer’s comment, we will place greater emphasis on clarifying which parameter regimes produce a "tonic" versus "non-tonic" DA state in the DS. Additionally, we will underscore that the distinction between tonic and non-tonic states is not a binary outcome but a parameter-dependent continuum—one that our model now allows researchers to explore systematically. Finally, we will highlight how our simulations across parameter space not only capture this continuum but also identify the regimes that produce the most heterogeneous DA signaling, both within and across striatal regions.

      The claim that D1Rs can track rapid changes in dopamine is not well supported. It is based on a single simulation in Figure 1 (DS) and 2 (VS) by visual inspection of simulated dopamine concentration traces - and even then it is unclear that D1Rs actually track dynamics because they clearly do not track rapid changes in dopamine that are almost as large as those driven by bursts (cf Figure 1i).

      We would also like to draw attention to Fig. S1, where the claim that D1Rs track rapid changes is supported in more depth. According to this figure, upon coordinated burst firing, the D1R occupancy rapidly increased as diffusion no longer equilibrated the extracellular concentrations on a timescale faster than the receptors – and D1R receptor occupancy closely tracked extracellular DA with a delay on the order of tens of milliseconds. Note that the brief increases in [DA] from uncoordinated stochastic release events from tonic firing in Fig. 1i are too brief to drive D1 signaling, as the DA concentration diffuses into the remaining extracellular space on a timescale of 1-5 ms. This is faster than the receptors’ response rate, and does not lead to any downstream signaling according to our simulations. This means D1 kinetics are rapid enough to track coordinated signaling on a ~50 ms timescale and slower, but not fast enough to respond to individual release events from tonic activity. In our revised manuscript we will expand the discussion of this topic to provide greater clarity.
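
      The timescale argument above can be made concrete with a small sketch that treats the receptor as a first-order filter. It assumes one-site binding and a square DA pulse of fixed amplitude, which is a deliberate simplification of the diffusing transients in the simulations. The D1R Kd (1000 nM) and Koff (19.5 s⁻¹) are the values stated elsewhere in this response; the pulse amplitude and the function name are hypothetical.

```python
import math

D1R_KD_NM = 1000.0      # D1R affinity stated in this response
D1R_KOFF_PER_S = 19.5   # D1R off-rate stated in this response
PULSE_NM = 500.0        # hypothetical pulse amplitude, chosen for illustration


def fraction_of_step_reached(duration_s, da_nM,
                             kd_nM=D1R_KD_NM, koff_per_s=D1R_KOFF_PER_S):
    """Fraction of the way occupancy moves toward its new equilibrium
    during a square DA pulse of the given duration and amplitude."""
    kon = koff_per_s / kd_nM                 # one-site binding: Kd = Koff / Kon
    relaxation_rate = kon * da_nM + koff_per_s
    return 1.0 - math.exp(-relaxation_rate * duration_s)


brief = fraction_of_step_reached(0.003, PULSE_NM)   # 3 ms single-release transient
burst = fraction_of_step_reached(0.100, PULSE_NM)   # 100 ms coordinated burst

if __name__ == "__main__":
    print(f"3 ms pulse:   {brief:.1%} of the way to equilibrium")
    print(f"100 ms pulse: {burst:.1%} of the way to equilibrium")
```

      With these illustrative numbers, a 3 ms transient moves D1R occupancy less than 10% of the way toward its new equilibrium, whereas a 100 ms elevation moves it more than 90% of the way.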

      The claim also depends on two things that are poorly explained. First, the model of binding here is missing from the text. It seems to be a simple bound-fraction model, simulating a single D1 or D2 receptor. It is unclear whether more complex models would show the same thing.

      We realize that this is not made clear in the methods and, accordingly, we will update the method section to elaborate on how we model receptor binding. The model simulates occupied fraction of D1R and D2R in every single voxel of the simulation space.

      Second, crucial to the receptor model here is the inference that D1 receptor unbinding is rapid; but this inference is made based on the kinetics of dopamine sensors and is superficially explained - it is unclear why sensor kinetics should let us extrapolate to receptor kinetics, and unclear how safe is the extrapolation of the linear regression by an order of magnitude to get the D1 unbinding rate.

      We chose to use the sensors because it was possible to estimate precise affinities/off-rates from the fluorescent measurements. Although there might be some variation in affinities attributable to the mutations introduced in the sensors, the data clearly separated D1R and D2R, with a D1R affinity of ~1000 nM and a D2R affinity of ~7 nM (Labouesse & Patriarchi, 2021), consistent with earlier predictions of receptor affinities. From our assessment of the literature we found that this was the most reasonable way to estimate affinities and thereby off-rates. Importantly, the model has been made publicly available, so should new measurements arise, the simulations can be rerun with tweaks to the input parameters.

    1. Author response:

      We thank the reviewers for their thoughtful feedback. Below we provide an initial response to the central concerns that they have raised. In general, as part of our revisions, we plan to perform additional analyses to strengthen our conclusions, tone down more speculative interpretations, and clarify the novel contributions of our work. A full, point-by-point reply will follow alongside the revised manuscript.

      Briefly, the reviewers’ central concerns are that some of the conclusions are not sufficiently supported by the experimental evidence, specifically (1) the involvement of sharp-wave ripple (SWR)-unmodulated PFC neurons in signaling upcoming choice and (2) the absence of SWR time-locking of PFC non-local representations. They further suggest that (3) the spatial tuning in the PFC may reflect other cognitive processes rather than encoding spatial information; and (4) the manuscript is ambiguous as to which results are novel or corroborating previous work.

      (1) SWR-unmodulated PFC neurons signaling upcoming choice

      Reviewer 1 suggests that our finding that SWR-modulated neurons relate to hippocampal non-local representations contradicts the manuscript’s main conclusion. However, in our view, there is no contradiction and the finding highlights the distinction between the two sub-populations, namely the SWR-modulated neurons linked to hippocampal non-local representations, and the SWR-unmodulated neurons that are more active during prefrontal non-local representations.

      We do agree with the reviewer that the observation of higher firing rates of SWR-unmodulated neurons in the expression of non-local representations does not mean that these neurons are the sole or even main contributors to the non-local decoding. To address both comments, we will perform additional analyses to further disentangle the contributions of SWR-modulated and SWR-unmodulated PFC neurons to the non-local representations of upcoming choice.

      (2) Time-locking of PFC non-local representations to hippocampal SWRs

      Reviewer 1 comments that in the analysis of time-locking to hippocampal SWRs and theta phase, the behavior of the animals needs to be taken into account (i.e., immobility or running). We confirm that this was indeed done in our analysis and we will clarify this point in the revised manuscript.

      The reviewer further requested that PFC decoding during SWRs be performed at shorter timescales as in previous studies. We would like to point out that (1) we found no increase in non-local decoding in the PFC around SWR onset (see Fig 5a), and (2) most of the non-local representations in the PFC occurred during the expression of local representations in the hippocampus (see Fig 4d). These data suggest that the non-local representations in both brain regions are expressed independently. To further strengthen this idea, we plan to (1) include the result of decoding PFC activity during SWRs at fine timescales as the reviewer suggested, and (2) look at the firing rates of PFC neurons during non-local representations exclusively when the hippocampus is encoding the actual (local) position.

      Following a suggestion by reviewer 2, we will also add a statistical assessment of how strongly the data supports the absence of time-locking.

      (3) Spatial tuning in the mPFC

      Reviewer 2 points out that the spatial tuning in the prefrontal cortex may be related to cognitive processes (e.g., attention or decision-making) rather than spatial encoding. However, our results show that decoded mPFC activity reliably differentiates between the two start and goal arms (Fig 4a), rate maps show little evidence of mirroring (Fig 3a), and the activity predicts turns in the cue-based task during which goal arms switch pseudo-randomly (meaning that the non-local representations encode the North and South arm alternatingly and correctly, rather than encoding a general rewarded goal arm; Fig. 4b). While it is likely that mPFC encodes several task-related variables, our data suggest that it also encodes distinct locations.

      The reviewer further claims that the results of Jadhav et al. (2016) contradict our findings because they supposedly showed that mPFC neurons unmodulated by SWRs are less tuned to space. However, this is incorrect, as Jadhav et al. (2016) showed that SWR-unmodulated PFC neurons have lower spatial coverage and consequently are more spatially selective, which is consistent with our observations. We will rephrase this in the text to improve clarity.

      (4) Novelty

      We thank reviewer 2 for pointing out the significance of several novel findings in our work that deserve to be highlighted. This includes the dorsal-ventral profile of SWR-modulation and theta phase locking in the PFC and our observation that the neural representations in the PFC precede the behavioral switch in reversal learning. In our revised manuscript, we will rewrite the text to better emphasize our novel contributions, clearly distinguish new findings from confirmatory observations, and add missing citations where appropriate.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public Review):

      O’Neill et al. have developed a software analysis application, miniML, that enables the quantification of electrophysiological events. They utilize a supervised deep learning-based method to optimize the software. miniML is able to quantify and standardize the analyses of miniature events, using both voltage and current clamp electrophysiology, as well as optically driven events using iGluSnFR3, in a variety of preparations, including in the cerebellum, calyx of Held, Golgi cell, human iPSC cultures, zebrafish, and Drosophila. The software appears to be flexible, in that users are able to hone and adapt the software to new preparations and events. Importantly, miniML is an open-source software free for researchers to use and enables users to adapt new features using Python.

      Overall this new software has the potential to become widely used in the field and an asset to researchers. However, the authors fail to discuss or even cite a similar analysis tool recently developed (SimplyFire), and determine how miniML performs relative to this platform. There are a handful of additional suggestions to make miniML more user-friendly, and of broad utility to a variety of researchers, as well as some suggestions to further validate and strengthen areas of the manuscript:

      (1) miniML relative to existing analysis methods: There is a major omission in this study, in that a similar open source, Python-based software package for event detection of synaptic events appears to be completely ignored. Earlier this year, another group published SimplyFire in eNeuro (Mori et al., 2024; doi: 10.1523/eneuro.0326-23.2023). Obviously, this previous study needs to be discussed and ideally compared to miniML to determine if SimplyFire is superior or similar in utility, and to underscore differences in approach and accuracy.

      We thank the reviewer for bringing this interesting publication to our attention. We have included SimplyFire in our benchmarking for comprehensive comparison with miniML. The approach taken by SimplyFire differs from miniML in a number of ways. Our results show that miniML provides higher recall and precision than SimplyFire (revised Figure 3). We appreciate that SimplyFire provides a user-interface similar to the commonly used MiniAnalysis software. In addition, the peak-finding-based approach of SimplyFire makes it relatively robust to event shape, which facilitates analysis of diverse data. However, we noted a strong threshold-dependence and long run time of SimplyFire (revised Figure 3 and Figure 3—figure supplement 1). In addition, SimplyFire is not robust against various types of noise typically encountered in electrophysiological recordings. Our extended benchmark analysis thus indicates that AI-based event detection is superior to existing algorithmic approaches, including SimplyFire.

      (2) The manuscript should comment on whether miniML works equally well to quantify current clamp events (voltage; e.g. EPSP/mEPSPs) compared to voltage clamp (currents, EPSC/mEPSCs), which the manuscript highlights. Are rise and decay time constants calculated for each event similarly?

      miniML works equally well for current and voltage events (Figure 5, Figure 9). In general, events of opposite polarity can be analyzed by simply inverting the data. Transfer learning models may further improve the detection.

      For each detected event, independent of data/recording type, rise times are calculated as 10–90% times (baseline–peak), and decay times are calculated as time to 50% of the peak. In addition, event decay time constants are calculated from a fit to the event average. With miniML being open-source, researchers can adapt the calculations of event statistics to their needs, if desired. In the revised manuscript, we have expanded the Methods section that describes the quantification of event statistics (Methods, Quantification).
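      The event statistics described above can be sketched as follows. This is a minimal illustration, not the miniML implementation: the function names (`crossing_time`, `event_kinetics`), the baseline estimate from the first `n_baseline` samples, and the linear interpolation between samples are all assumptions made for the example. The event is assumed to be positive-going (i.e., already inverted if necessary).

```python
def crossing_time(samples, dt, level, start, stop, rising):
    """Linearly interpolated time of the first crossing of `level`
    between sample indices `start` and `stop`; None if no crossing."""
    for i in range(start, stop):
        a, b = samples[i], samples[i + 1]
        crossed = (a < level <= b) if rising else (a > level >= b)
        if crossed:
            return (i + (level - a) / (b - a)) * dt
    return None


def event_kinetics(samples, dt, n_baseline):
    """10-90% rise time and time to 50% decay for one positive-going event.

    Hypothetical helper: baseline is the mean of the first `n_baseline`
    samples, and the peak is the maximum of the baseline-subtracted trace.
    """
    baseline = sum(samples[:n_baseline]) / n_baseline
    rel = [s - baseline for s in samples]
    peak_idx = max(range(len(rel)), key=rel.__getitem__)
    amplitude = rel[peak_idx]

    # Rise: first crossings of 10% and 90% of the amplitude before the peak.
    t10 = crossing_time(rel, dt, 0.1 * amplitude, 0, peak_idx, rising=True)
    t90 = crossing_time(rel, dt, 0.9 * amplitude, 0, peak_idx, rising=True)
    # Decay: first time after the peak at which the trace falls to 50%.
    t50 = crossing_time(rel, dt, 0.5 * amplitude, peak_idx, len(rel) - 1,
                        rising=False)

    rise = None if t10 is None or t90 is None else t90 - t10
    half_decay = None if t50 is None else t50 - peak_idx * dt
    return {"amplitude": amplitude, "rise_10_90": rise, "half_decay": half_decay}
```

      The decay time constant obtained from a fit to the event average is not covered by this sketch.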

      (3) The interface and capabilities of miniML appear quite similar to Mini Analysis, the free software that many in the field currently use. While the ability and flexibility for users to adapt and adjust miniML for their own uses/needs using Python programming is a clear potential advantage, can the authors comment, or better yet, demonstrate, whether there is any advantage for researchers to use miniML over Mini Analysis or SimplyFire if they just need the standard analyses?

      Following the reviewer’s suggestion, we developed a graphical user interface (GUI) for miniML to enhance its usability (Figure 2—figure supplement 2), which is provided on the GitHub repository. Our comprehensive benchmark analysis demonstrated that miniML outperforms existing tools such as MiniAnalysis and SimplyFire. The main advantages are (i) increased reliability of results, which eliminates the need for visual inspection; (ii) fast runtime and easy automation; (iii) superior detection performance as demonstrated by higher recall in both synthetic and real data; (iv) open-source Python-based design. We believe that these advantages make miniML a valuable tool for researchers recording various types of synaptic events, offering a more efficient and reliable solution compared to existing methods.

      (4) Additional utilities for miniML: The authors show miniML can quantify miniature electrophysiological events both current and voltage clamp, as well as optical glutamate transients using iGluSnFR. As the authors mention in the discussion, the same approach could, in principle, be used to quantify evoked (EPSC/EPSP) events using electrophysiology, Ca2+ events (using GCaMP), and AP waveforms using voltage indicators like ASAP4. While I don’t think it is reasonable to ask the authors to generate any new experimental data, it would be great to see how miniML performs when analysing data from these approaches, particularly to quantify evoked synaptic events and/or Ca2+ (ideally postsynaptic Ca2+ signals from miniature events, as the Drosophila NMJ have developed nice approaches).

      In the revised manuscript, we have extended the application examples of miniML. We applied miniML to detect mEPSPs recorded with the novel voltage-sensitive indicator ASAP5 (Figure 9 and Figure 9—figure supplement 1). We performed simultaneous recordings of membrane voltage through electrophysiology and ASAP5 voltage imaging in rat cultured neurons at physiological temperature. Data were analyzed using miniML, with electrophysiology data being used as ground-truth for assessing detection performance in imaging data. Our results demonstrate that miniML robustly detects mEPSPs in current-clamp, and can localize corresponding transients in imaging data. Furthermore, we observed that miniML performs better than template matching and deconvolution on ASAP5 imaging data (Figure 9 and Figure 9—figure supplement 2).

      Reviewer 2 (Public Review):

      This paper presents miniML as a supervised method for the detection of spontaneous synaptic events. Recordings of such events are typically of low SNR, where state-of-the-art methods are prone to high false positive rates. Unlike current methods, training miniML requires neither prior knowledge of the kinetics of events nor the tuning of parameters/thresholds.

      The proposed method comprises four convolutional networks, followed by a bi-directional LSTM and a final fully connected layer which outputs a decision event/no event per time window. A sliding window is used when applying miniML to a temporal signal, followed by an additional estimation of events’ time stamps. miniML outperforms current methods for simulated events superimposed on real data (with no events) and presents compelling results for real data across experimental paradigms and species.

      Strengths:

      The authors present a pipeline for benchmarking based on simulated events superimposed on real data (with no events). Compared to five other state-of-the-art methods, miniML leads to the highest detection rates and is most robust to specific choices of threshold values for fast or slow kinetics. A major strength of miniML is the ability to use it for different datasets. For this purpose, the CNN part of the model is held fixed and the subsequent networks are trained to adapt to the new data. This Transfer Learning (TL) strategy reduces computation time significantly and more importantly, it allows for using a substantially smaller data set (compared to training a full model) which is crucial as training is supervised (i.e. uses labeled examples).

      Weaknesses:

      The authors do not indicate how the specific configuration of miniML was set, i.e. number of CNNs, units, LSTM, etc. Please provide further information regarding these design choices, whether they were based on similar models or if chosen based on performance.

      The data for the benchmark system was augmented with equal amounts of segments with/without events. Data augmentation was undoubtedly crucial for successful training.

      (1) Does a balanced dataset reflect the natural occurrence of events in real data? Could the authors provide more information regarding this matter?

      In a given recording, the event frequency determines the ratio of event-containing vs. nonevent-containing data segments. Whereas many synapses have a skew towards non-events, high event frequencies as observed, e.g., in pyramidal cells or Purkinje neurons, can shift the ratio towards event-containing data.

      For model training, we extracted data segments from mEPSC recordings in cerebellar granule cells, which have a low mEPSC frequency (about 0.2 Hz, Delvendahl et al. 2019). Unbalanced training data may complicate model training (Drummond and Holte 2003; Prati et al. 2009; Tyagi and Mittal 2020). We therefore decided to balance the training dataset for miniML by down-sampling the majority class (i.e., non-event segments), so that the final datasets for model training contained roughly equal amounts of events and non-events.
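      As an illustration of the balancing step, the following sketch down-samples every class to the size of the smallest one. It is an assumption-laden example rather than the code used for the manuscript: the function name `balance_by_downsampling`, the fixed random seed, and the use of exactly equal class sizes are choices made here for clarity.

```python
import random


def balance_by_downsampling(segments, labels, seed=0):
    """Randomly down-sample every class to the size of the smallest class.

    Hypothetical helper: `segments` is a list of data segments and `labels`
    the matching class labels (e.g., 1 = event, 0 = non-event).
    """
    rng = random.Random(seed)

    # Group segments by class label.
    by_class = {}
    for segment, label in zip(segments, labels):
        by_class.setdefault(label, []).append(segment)

    # Keep the same number of randomly chosen segments from each class.
    n_keep = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        balanced.extend((segment, label) for segment in rng.sample(group, n_keep))

    # Shuffle so that classes are interleaved in the training set.
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [l for _, l in balanced]
```

      With a low-frequency recording, the non-event class is the one that gets reduced; the minority (event) segments are all retained.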

      (2) Please provide a more detailed description of this process as it would serve users aiming to use this method for other sub-fields.

      We thank the reviewer for raising this point. In the revised manuscript, we present a systematic analysis of the impact of imbalanced training data on model training (Figure 1—figure supplement 2). In addition, we have revised the description of model training and data augmentation in the Methods section (Methods, Training data and annotation).

      The benchmarking pipeline is indeed valuable and the results are compelling. However, the authors do not provide comparative results for miniML for real data (Figures 4-8). TL does not apply to the other methods. In my opinion, presenting the performance of other methods, trained using the smaller dataset would be convincing of the modularity and applicability of the proposed approach.

      Quantitative comparison of synaptic detection methods on real-world data is challenging because the lack of ground-truth data prevents robust, quantitative analyses. Nevertheless, we compared miniML to common template-based and finite-threshold based methods on four different types of synapses. We noted that miniML generally detects more events, whereas other methods are susceptible to false-positives (Figure 4—figure supplement 1). In addition, we analyzed the performance of miniML on voltage imaging data (Figure 9). Simultaneous recordings of electrophysiological and imaging data allowed a quantitative comparison of detection methods in this dataset. Our results demonstrate that miniML provides higher recall for optical minis recorded using ASAP5 (Figure 9 and Figure 9—figure supplement 2; F1 score, Cohen’s d 1.35 vs. template matching and 5.1 vs. deconvolution).
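      When ground-truth event times are available, as with the simultaneous recordings described above, precision, recall, and F1 score can be computed by matching detected events to true events within a time tolerance. The sketch below is one simple way to do this and is not taken from the manuscript: the function names, the greedy one-to-one matching, and the `tolerance` parameter are assumptions for illustration.

```python
def match_events(detected, truth, tolerance):
    """Greedy one-to-one matching of sorted event times within `tolerance`.

    Returns (true positives, false positives, false negatives).
    """
    detected = sorted(detected)
    truth = sorted(truth)
    i = j = tp = 0
    while i < len(detected) and j < len(truth):
        diff = detected[i] - truth[j]
        if abs(diff) <= tolerance:
            tp += 1          # matched pair: consume both events
            i += 1
            j += 1
        elif diff < 0:
            i += 1           # detection too early to match: false positive
        else:
            j += 1           # true event without detection: false negative
    return tp, len(detected) - tp, len(truth) - tp


def detection_scores(detected, truth, tolerance):
    """Precision, recall, and F1 score for a set of detected event times."""
    tp, fp, fn = match_events(detected, truth, tolerance)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

      The tolerance should be chosen relative to the event kinetics and the temporal resolution of the recording; a value that is too generous can pair unrelated events.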

      Impact:

      Accurate detection of synaptic events is crucial for the study of neural function. miniML has a great potential to become a valuable tool for this purpose as it yields highly accurate detection rates, it is robust, and is relatively easily adaptable to different experimental setups.

      Additional comments:

      Line 73: the authors describe miniML as "parameter-free". Indeed, miniML does not require the selection of pulse shape, rise/fall time, or tuning of a threshold value. Still, I would not call it "parameter-free" as there are many parameters to tune, starting with the number of CNNs, and number of units through the parameters of the NNs. A more accurate description would be that as an AI-based method, the parameters of miniML are learned via training rather than tuned by the user.

      We agree that a deep learning model is not parameter-free, and this term may be misleading. We have therefore changed this sentence in the introduction as follows: "The method is fast, robust to threshold choice, and generalizable across diverse data types [...]"

      Line 302: the authors describe miniML as "threshold-independent". The output trace of the model has an extremely high SNR so a threshold of 0.5 typically works. Since a threshold is needed to determine the time stamps of events, I think a better description would be "robust to threshold choice".

      To detect event localizations, a peak search is performed on the model output, which uses a minimum peak height parameter (or threshold). Extreme values for this parameter do indeed have a small impact on detection performance (Figure 3J). We have changed the description in the introduction and discussion according to the reviewer’s suggestion.
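      The role of the minimum peak height can be illustrated with a simplified peak search on a prediction trace. This sketch is an assumption, not the miniML peak-finding code: it treats each contiguous run of samples at or above `min_height` as one candidate event and reports the index of the maximum within that run.

```python
def find_event_peaks(prediction, min_height=0.5):
    """Return one peak index per contiguous run of samples >= min_height.

    Hypothetical simplification of a peak search with a minimum peak
    height; `prediction` holds per-window event probabilities in [0, 1].
    """
    prediction = list(prediction)
    peaks = []
    run_start = None
    # A trailing sentinel below any threshold closes a run at the end.
    for i, p in enumerate(prediction + [float("-inf")]):
        if p >= min_height:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            run = prediction[run_start:i]
            peaks.append(run_start + run.index(max(run)))
            run_start = None
    return peaks
```

      When the prediction trace is well separated from baseline, a broad range of thresholds yields the same peaks, which is the sense in which detection is robust to threshold choice; only extreme values change the result.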

      Reviewer 3 (Public Review):

      This manuscript presents miniML as a novel supervised deep learning-based method for detecting and analyzing spontaneous synaptic events. The authors demonstrate the advantages of using their methods in comparison with previous approaches. The possibility to train the architecture on different tasks using transfer learning approaches is also an added value of the work. There are some technical aspects that would be worth clarifying in the manuscript:

      (1) LSTM Layer Justification: Please provide a detailed explanation for the inclusion of the LSTM layer in the miniML architecture. What specific benefits does the LSTM layer offer in the context of synaptic event detection?

      Our model design choice was inspired by similar approaches in the literature (Donahue et al. 2017; Islam et al. 2020; Passricha and Aggarwal 2019; Tasdelen and Sen 2021; Wang et al. 2020). Convolutional and recurrent neural networks are often combined for time-series classification problems as they allow learning spatial and temporal features, respectively. Combining the strengths of both network architectures can thus help improve the classification performance. Indeed, a CNN-LSTM architecture proved to be superior in both training accuracy and detection performance (Figure 1—figure supplement 2). Further, this architecture requires fewer free parameters than comparable model designs using fully connected layers instead. The revised manuscript shows a comparison of different model architectures (Figure 1—figure supplement 2), and we added the following description to the text (Methods, Deep learning model architecture):

      "The combination of convolutional and recurrent neural network layers helps to improve the classification performance for time-series data. In particular, LSTM layers allow learning temporal features."

      (2) Temporal Resolution: Can you elaborate on the reasons behind the lower temporal resolution of the output? Understanding whether this is due to specific design choices in the model, data preprocessing, or post-processing will clarify the nature of this limitation and its impact on the analysis.

      When running inference on a continuous recording, we choose to use a sliding window approach with stride. Therefore, the model output has a lower temporal resolution than the raw data, which is determined by the stride length (i.e., how many samples to advance the sliding window). While using a stride is not required, it significantly reduces inference time (cf. Figure 2—figure supplement 1). We recommend a stride of 20 samples, which does not impact the detection of events. Any subsequent quantification of events (amplitude, area, risetimes, etc.) is performed on raw data. Based on the reviewer’s comment, we have adapted the code to resample the prediction trace to the sampling rate of the original data. This maintains temporal precision and avoids confusion.
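      The relationship between stride, output resolution, and resampling can be sketched as follows. This is an illustrative example under stated assumptions, not the miniML code: `score_fn` is a hypothetical stand-in for the trained classifier, the window length is arbitrary, and linear interpolation is only one possible way to resample the prediction trace.

```python
def sliding_window_scores(trace, window, stride, score_fn):
    """Apply `score_fn` to windows of `trace`, advancing by `stride` samples.

    The output has roughly len(trace) / stride values, i.e., a lower
    temporal resolution than the raw data whenever stride > 1.
    """
    starts = range(0, len(trace) - window + 1, stride)
    return [score_fn(trace[s:s + window]) for s in starts]


def resample_linear(values, n_out):
    """Linearly interpolate `values` onto `n_out` evenly spaced points."""
    if len(values) == 1 or n_out == 1:
        return [values[0]] * n_out
    scale = (len(values) - 1) / (n_out - 1)
    out = []
    for k in range(n_out):
        x = k * scale
        i = min(int(x), len(values) - 2)   # left neighbour index
        frac = x - i
        out.append(values[i] * (1 - frac) + values[i + 1] * frac)
    return out
```

      For example, a trace of 100 samples with a window of 20 and a stride of 20 gives five predictions, which `resample_linear(scores, 100)` stretches back to the original number of samples. The sketch ignores the offset introduced by the window length.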

      The Methods now include the following statement:

      "To maintain temporal precision, the prediction trace is resampled to the sampling frequency of the raw data."

      (3) Architecture optimization: how was the architecture CNN+LSTM optimized in terms of a number of CNN layers and size?

      We performed a Bayesian optimization over a defined range of hyperparameters in combination with empirical hyperparameter tuning. We now describe this in the Methods section as follows:

      "To optimise the model architecture, we performed a Bayesian optimisation of hyperparameters. Hyperparameter ranges were chosen for the free parameters of all layers. Optimisation was then performed with a maximum number of trials of 50. Models were evaluated using the validation dataset. Because higher number of free parameters tended to increase inference times, we then empirically tuned the chosen hyperparameter combination to achieve a trade-off between number of free parameters and accuracy."

      Recommendations For The Authors

      Reviewing Editor (Recommendations For The Authors):

      Overall suggestions to the authors:

      (1) Directly compare miniML with SimplyFire (which was not cited or discussed in the original manuscript), with both idealized and actual data. Discuss the pros/cons of each software.

      We have conducted an extensive comparison between miniML and SimplyFire using both simulated and actual experimental data. This analysis is now presented in the revised Figure 3, Figure 3—figure supplement 1, and Figure 4—figure supplement 1. In addition, we have included relevant citations for SimplyFire in our manuscript. These additions provide a more comprehensive and balanced view of the available tools in the field, positioning our work within the broader context of existing solutions.

      (2) Generate a better user interface akin to MiniAnalysis or SimplyFire.

      We thank the editor and reviewers for the suggestion to improve the user interface. We have created a user-friendly graphical user interface (GUI) for miniML that is available on our GitHub repository. This GUI is now showcased in Figure 2—figure supplement 2 of the manuscript. The new interface allows users to load and analyze data through an intuitive point-and-click system, visualize results in real-time, and adjust parameters easily without coding knowledge. We have incorporated user feedback to refine the interface and improve user experience. These improvements significantly enhance the accessibility of miniML, making it more user-friendly for researchers with varying levels of programming expertise.

      Reviewer 1 (Recommendations For The Authors):

      Related to point (1) of the Public Review, we have taken the liberty to compare electrophysiological data using miniAnalysis, SimplyFire, and miniML. In our comparison, we note the following in our experience:

      (1.1) In contrast to both SimplyFire and miniAnalysis, miniML does not currently have a user-friendly interface where the user can directly control or change the parameters of interest, nor does miniML have a user control center, so the user cannot simply type or select the mini manually. Rather, if any parameter needs to be changed, the user needs to read, understand, and change the original source code to generate the preferred change. This level of "activation energy" and required user coding expertise in computer science, which many researchers do not have, renders miniML much less accessible when directly compared to SimplyFire and miniAnalysis. Hence, unless miniML’s interface can be made more user-friendly, this is a major disadvantage, especially when compared to SimplyFire, which has many of the same features as miniML but with a much easier interface and user controls.

      As suggested by the reviewer, we have created a graphical user interface (GUI) for miniML. The GUI allows easy data loading, filtering, analysis, event inspection, and saving of results without the need for writing Python code. Figure 2—figure supplement 2 illustrates the typical workflow for event analysis with miniML using the GUI and a screenshot of the user interface. Code to use miniML via the GUI is now included in the project’s GitHub repository. The GUI provides a simple and intuitive way to analyze synaptic events, whereas running miniML as a Python script allows for more customization and a high degree of automation.

      (1.2) We compared electrophysiological miniature events between miniML, SimplyFire, and miniAnalysis. All three achieved similar mean amplitudes in "wild type" conditions, and conditions in which mini events were enhanced and diminished, so the overall means and utilities are similar, with miniML and SimplyFire being preferred given the flexibility and much faster analysis. We did note a few differences, however. SimplyFire tends to capture a higher number of mini-events than miniML, especially in conditions of diminished mini amplitude (e.g., miniML found 76 events, while SimplyFire found 587). The mean amplitudes, however, were similar. It seems that in data with low SNR, SimplyFire captures many more events as real minis that are probably noise, while miniML is more selective, which might be an advantage of miniML. That being said, we found SimplyFire to be superior in many respects, not least of which are the user interface and experience.

      We appreciate the reviewer’s thorough comparison of miniML, SimplyFire, and MiniAnalysis. While we acknowledge SimplyFire’s user-friendly interface, our study highlights several advantages of AI-based event analysis over conventional algorithmic approaches. Our updated benchmark analysis revealed better detection performance of miniML compared with SimplyFire (revised Figure 3), which had similar performance to deconvolution. As already noted by the reviewer, high false positive rates are a major issue of the SimplyFire approach. Although a minimum amplitude cutoff can partially resolve this problem, detection performance is highly sensitive to threshold setting (revised Figure 3). Another apparent disadvantage of SimplyFire is its relatively slow runtime (Figure 3—figure supplement 1). Finally, we have enhanced miniML’s accessibility by providing a graphical user interface that is easy to use and provides additional functionality.

      Some technical comments:

      (1) Dependency versions for miniML: The Python and TensorFlow versions used in this study and stated on GitHub need to be clarified. We used Python version 3.8.19 to load the miniML model. However, with Python >=3.9, as described on the GitHub repository, it is difficult to install a matching h5py version. It is also inaccurate to state Python >=3.9, because the TensorFlow version for this framework needs to be around 2.13, whereas Python >=3.10 only offers TensorFlow 2.16 for download. Therefore, the dependency versions need to be specified on GitHub so that researchers can access the model and use the entire work.

      Thank you for highlighting this issue. We have now included specific version numbers in the requirements to avoid version conflicts and to ensure proper functioning of the code.

      (2) Due to the intrinsic characteristics of the trained model, every model is only suitable for analyzing data with similar attributes. It is hard for researchers without a strong computer science background to train a new model themselves for their specific data. Therefore, it would be preferred if there were more available transfer learning models on GitHub accessible for researchers to adapt to their data.

      We would like to thank the reviewer for this feedback. Trained models (such as the default model) can often be used on different data (see, e.g., Figure 4, where data from four distinct synaptic preparations were analyzed with the base model, and Figure 5—figure supplement 1). However, changes in event waveform and/or noise characteristics may necessitate transfer learning to obtain optimal results with miniML. We have revised the description and tutorial for model training on the project’s GitHub repository to provide more guidance in this process. In addition, we now provide a tutorial on how to use existing models on out-of-sample data with distinct kinetics, using resampling. We hope these updates to the miniML GitHub repository will facilitate the use of the method.

      Following the suggestion by the reviewer, we have provided the transfer learning models used for the manuscript on the project’s GitHub repository to increase the number of available machine learning models for event detection. In addition, users of miniML are encouraged to supply their custom models. We hope that this will facilitate model exchange between laboratories in the future.

      Reviewer 3:

      I congratulate all authors for the convincing demonstration of their methodology, I do not have additional recommendations.

      We would like to thank the reviewer for the positive assessment of our manuscript.

      References

      Delvendahl, I., Kita, K., & Müller, M. (2019). Rapid and sustained homeostatic control of presynaptic exocytosis at a central synapse. Proceedings of the National Academy of Sciences, 116(47), 23783–23789. https://doi.org/10.1073/pnas.1909675116

      Donahue, J., Hendricks, L. A., Rohrbach, M., Venugopalan, S., Guadarrama, S., Saenko, K., & Darrell, T. (2017). Long-term recurrent convolutional networks for visual recognition and description. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(4), 677–691. https://doi.org/10.1109/tpami.2016.2599174

      Drummond, C., & Holte, R. C. (2003). C4.5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. https://api.semanticscholar.org/CorpusID:204083391

      Islam, M. Z., Islam, M. M., & Asraf, A. (2020). A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using x-ray images. Informatics in Medicine Unlocked, 20, 100412. https://doi.org/10.1016/j.imu.2020.100412

      Passricha, V., & Aggarwal, R. K. (2019). A hybrid of deep CNN and bidirectional LSTM for automatic speech recognition. Journal of Intelligent Systems, 29(1), 1261–1274. https://doi.org/10.1515/jisys-2018-0372

      Prati, R. C., Batista, G. E. A. P. A., & Monard, M. C. (2009). Data mining with imbalanced class distributions: Concepts and methods. Indian International Conference on Artificial Intelligence. https://api.semanticscholar.org/CorpusID:16651273

      Tasdelen, A., & Sen, B. (2021). A hybrid CNN-LSTM model for pre-miRNA classification. Scientific Reports, 11(1). https://doi.org/10.1038/s41598-021-93656-0

      Tyagi, S., & Mittal, S. (2020). Sampling approaches for imbalanced data classification problem in machine learning. In P. K. Singh, A. K. Kar, Y. Singh, M. H. Kolekar, & S. Tanwar (Eds.), Proceedings of ICRIC 2019 (pp. 209–221). Springer International Publishing.

      Wang, H., Zhao, J., Li, J., Tian, L., Tu, P., Cao, T., An, Y., Wang, K., & Li, S. (2020). Wearable sensor-based human activity recognition using hybrid deep learning techniques. Security and Communication Networks, 2020, 1–12. https://doi.org/10.1155/2020/2132138

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This is a new and important system that can efficiently train mice to perform a variety of cognitive tasks in a flexible manner. It is innovative and opens the door to important experiments in the neurobiology of learning and memory.

      Strengths:

      Strengths include: high n's, a robust system, task flexibility, comparison of manual-like training vs constant training, circadian analysis, comparison of varying cue types, long-term measurement, and machine teaching.

      Weaknesses:

      I find no major problems with this report.

      (1) Line 219: Water consumption per day remained the same, but number of trials triggered was more as training continued. First, is this related to manual-type training? Also, I'm trying to understand this result quantitatively, since it seems counter-intuitive: I would assume that with more trials, more water would be consumed since accuracy should go up over training (so more water per average trial). Am I understanding this right? Can the authors give more detail or understanding to how more trials can be triggered but no more water is consumed despite training?

      Thanks for the thoughtful comment. We would like to clarify the phenomenon described in Line 219: As the training advanced, the number of trials triggered by mice per day gradually decreased (rather than increased, as mentioned in the comment) for both the manual and autonomous groups of mice (Fig. 2H left). Performance, as you mentioned, improved over time, leading to an increased probability of obtaining water and thus a relatively stable daily water intake (Fig. 2H left). We believe the stable daily intake reflects the minimum amount of water required by the mice under the conditions of autonomous behavioral training.

      (2) Figure 2J: The X-axis should have some label: at least "training type". Ideally, a legend with colors can be included, although I see the colors elsewhere in the figure. If a legend cannot be added, then the color scheme should be explained in the caption.

      (3) Figure 2K: What is the purple line? I encourage a legend here. The same legend could apply to 2J.

      (4) Supplementary Figure S2 D: I do not think the phrase "relying on" is correct. Instead, I think "predicted by" or "correlating with" might be better.

      We thank the reviewer for the valuable suggestion. We will address all these points and make the necessary revisions in the next version of our manuscript.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Yu et al. describes a novel approach for collecting complex and different cognitive phenotypes in individually housed mice in their home cage. The authors report a simple yet elegant design that they developed for assessing a variety of complex and novel behavioral paradigms autonomously in mice.

      Strengths:

      The data are strong, the arguments are convincing, and I think the manuscript will be highly cited given the complexity of behavioral phenotypes one can collect using this relatively inexpensive ($100/box) and high throughput procedure (without the need for human interaction). Additionally, the authors include a machine learning algorithm to correct for erroneous strategies that mice develop which is incredibly elegant and important for this approach as mice will develop odd strategies when given complete freedom.

      Weaknesses:

      (1) A limitation of this approach is that it requires mice to be individually housed for days to months. This should be discussed in depth.

      Thank you for raising this important point. We agree that the requirement for individual housing of mice during the training period is a limitation of our approach, and we appreciate the opportunity to discuss this in more depth. In the revised manuscript, we will add a dedicated section to the Discussion to address this limitation, including the potential impact of individual housing on the mice, the rationale for individual housing in our study, and efforts or alternatives made to mitigate the effects of individual housing.

      (2) A major issue with continuous self-paced tasks such as the autonomous d2AFC used by the authors is that the inter-trial intervals can vary significantly. Mice may do a few trials, lose interest, and disengage from the task for several hours. This is problematic for data analysis that relies on trial duration to be similar between trials (e.g., reinforcement learning algorithms). It would be useful to see the task engagement of the mice across a 24-hour cycle (e.g., trials started, trials finished across a 24-hour period) and approaches for overcoming this issue of varying inter-trial intervals.

      Thank you for your insightful comment regarding the variability in inter-trial intervals and its potential impact on data analysis. We agree that this is an important consideration for continuous self-paced tasks like the autonomous d2AFC paradigm used in our study. In the original manuscript, we showed the general task engagement across the 24-hour cycle (Fig. 2K). The distribution of inter-trial intervals was also illustrated (Fig. S3H); it shows that most trials are separated by short intervals, although a few intervals are extremely long. We will include a more detailed analysis and discuss the challenges this poses for data analysis.

      To mitigate the effects of varying inter-trial intervals, we will also discuss strategies such as trial selection and restricting engagement to a defined period (e.g., making the task available only during a fixed 2-hour window each day).
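
      As a minimal sketch of what trial selection could look like, the Python snippet below keeps only trials that start within a chosen interval of the previous trial. The function name, the 60-second threshold, and the example timestamps are illustrative assumptions and are not taken from the HABITS code or data.

```python
from typing import List


def select_engaged_trials(start_times: List[float], max_iti: float) -> List[int]:
    """Return indices of trials whose gap from the previous trial is <= max_iti.

    The first trial has no preceding interval and is always kept.
    start_times must be sorted in ascending order (seconds).
    """
    kept = []
    for i, t in enumerate(start_times):
        if i == 0 or t - start_times[i - 1] <= max_iti:
            kept.append(i)
    return kept


# Hypothetical trial start times in seconds: three trials, a ~2-hour pause,
# then two more trials.
starts = [0.0, 12.0, 25.0, 7225.0, 7240.0]
print(select_engaged_trials(starts, max_iti=60.0))  # -> [0, 1, 2, 4]
```

      In this sketch the trial that follows the long pause is dropped, while the one after it is kept. Whether to discard, down-weight, or separately model such trials is an analysis choice that would depend on the model being fitted.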

      (3) Movies - it would be beneficial for the authors to add commentary to the video (hit, miss trials). It was interesting watching the mice but not clear whether they were doing the task correctly or not.

      Thanks for the reminder. We will add subtitles to the videos in the next version.

      (4) The strength of this paper (from my perspective) is the potential utility it has for other investigators trying to get mice to do behavioral tasks. However, not enough information was provided about the construction of the boxes, interface, and code for running the boxes. If the authors are not willing to provide this information through eLife, GitHub, or their own website then my evaluation of the impact and significance of this paper would go down significantly.

      Thanks for this important comment. We would like to clarify that the construction methods, GUI, and code for our system, together with the newly uploaded PCB and CAD files, are already publicly available at https://github.com/Yaoyao-Hao/HABITS. Additionally, we have open-sourced all the code and raw data for all training protocols (https://doi.org/10.6084/m9.figshare.27192897). We will continue to maintain these resources in the future.

      Minor concerns:

      Learning rate is confusing for Figure 3 results as it actually refers to trials to reach the criterion, and not the actual rate of learning (e.g., slope).

      Thanks for pointing this out. We will make the revision in the next version.

      Reviewer #3 (Public review):

      Summary:

      In this set of experiments, the authors describe a novel research tool for studying complex cognitive tasks in mice, the HABITS automated training apparatus, and a novel "machine teaching" approach they use to accelerate training by algorithmically providing trials to animals that provide the most information about the current rule state for a given task.

      Strengths:

      There is much to be celebrated in an inexpensively constructed, replicable training environment that can be used with mice, which have rapidly become the model species of choice for understanding the roles of distinct circuits and genetic factors in cognition. Lingering challenges in developing and testing cognitive tasks in mice remain, however, and these are often chalked up to cognitive limitations in the species. The authors' findings, however, suggest that instead, we may need to work creatively to meet mice where they live. In some cases, it may be that mice may require durations of training far longer than laboratories are able to invest with manual training (up to over 100k trials, over months of daily testing) but the tasks are achievable. The "machine teaching" approach further suggests that this duration could be substantially reduced by algorithmically optimizing each trial presented during training to maximize learning.

      Weaknesses:

      (1) Cognitive training and testing in rodent models fill a number of roles. Sometimes, investigators are interested in within-subjects questions - querying a specific circuit, genetically defined neuron population, or molecule/drug candidate, by interrogating or manipulating its function in a highly trained animal. In this scenario, a cohort of highly trained animals that have been trained via a method that aims to make their behavior as similar as possible is a strength.

      However, often investigators are interested in between-subjects questions - querying a source of individual differences that can have long-term and/or developmental impacts, such as sex differences or gene variants. This is likely to often be the case in mouse models especially, because of their genetic tractability. In scenarios where investigators have examined cognitive processes between subjects in mice who vary across these sources of individual difference, the process of learning a task has been repeatedly shown to be different. The authors do not appear to have considered individual differences except perhaps as an obstacle to be overcome.

      The authors have perhaps shown that their main focus is highly-controlled within-subjects questions, as their dataset is almost exclusively made up of several hundred young adult male mice, with the exception of 6 females in a supplemental figure. It is notable that these female mice do appear to learn the two-alternative forced-choice task somewhat more rapidly than the males in their cohort.

      Thank you for your insightful comments and for highlighting the importance of considering both within-subject and between-subject questions in cognitive training and testing in rodent models.

      We acknowledge that our study primarily focused on highly controlled within-subject questions. However, the datasets we provided offer some evidence relevant to between-subject questions: for example, the large variability in learning rates among mice (Fig. 2I), the overall difference in learning rate between male and female subjects (Fig. 2D vs. Fig. S2G, as the reviewer already mentioned), and the varying nocturnal behavioral patterns (Fig. 2K). We recognize the value of exploring between-subject differences, and in the revised version we will discuss these points more systematically.

      (2) Considering the implications for mice modeling relevant genetic variants, it is unclear to what extent the training protocols and especially the algorithmic machine teaching approach would be able to inform investigators about the differences between their groups during training. For investigators examining genetic models, it is unclear whether this extensive training experience would mitigate the ability to observe cognitive differences, or select the animals best able to overcome them - eliminating the animals of interest. Likewise, the algorithmic approach aims to mitigate features of training such as side biases, but it is worth noting that the strategic uses of side biases in mice, as in primates, can benefit learning, rather than side biases solely being a problem. However, the investigators may be able to highlight variables selected by the algorithm that are associated with individual strategies in performing their tasks, and this would be a significant contribution.

      Thank you for the insightful comments. We acknowledge that the extensive training experience, particularly through the algorithmic machine teaching approach, could potentially influence the ability to observe cognitive differences between groups of mice with relevant genetic variants. However, our study design and findings suggest that this approach can still provide valuable insights into individual differences and the strategies used by the animals during training. First, the behavioral readouts mentioned above (including learning rate and engagement pattern) can reveal a number of differences among mice. Second, detailed modelling analysis (with logistic regression) can further dissect the strategies that mice use over the course of training (Fig. S2B). We have in fact highlighted some variables selected by the regression that are associated with individual strategies in performing the tasks (Fig. S2C), and these strategies can differ between the manual and autonomous training groups (Fig. S2D). We will discuss these points further in the next version of the manuscript.

      (3) A final, intriguing finding in this manuscript is that animal self-paced training led to much slower learning than "manual" training, by having the experimenter introduce the animal to the apparatus for a few hours each day. Manual training resulted in significantly faster learning, in almost half the number of trials on average, and with significantly fewer omitted trials. This finding does not necessarily argue that manual training is universally a better choice because it leads to more limited water consumption. However, it suggests that there is a distinct contribution of experimenter interactions and/or switching contexts in cognitive training, for example by activating an "occasion setting" process to accelerate learning for a distinct period of time. Limiting experimenter interactions with mice may be a labor-saving intervention, but may not necessarily improve performance. This could be an interesting topic of future investigation, of relevance to understanding how animals of all species learn.

      Thank you for your insightful comments. We agree that the finding that manual training led to significantly faster learning compared to self-paced training is both intriguing and important. One possible reason is the limited duration of engagement imposed by the experimenter in manual training, which may have forced the mice to concentrate more on the trials (and thus omit fewer trials) than in autonomous training. Your suggestion that experimenter interactions might activate an "occasion setting" process is particularly interesting. In the context of our study, we could introduce, for example, a light that serves as a cue prompting the animals to engage; when the light is off, the task would no longer be accessible to the mice, simulating the manual training situation. We agree that this could be an interesting topic for future investigation and might create a more conducive environment for learning, thereby accelerating learning.

    1. Author response:

      Reviewer #1:

      Summary:

      The authors aim to predict ecological suitability for the transmission of highly pathogenic avian influenza (HPAI) using ecological niche models. This class of models identify correlations between the locations of species or disease detections and the environment. These correlations are then used to predict habitat suitability (in this work, ecological suitability for disease transmission) in locations where surveillance of the species or disease has not been conducted. The authors fit separate models for HPAI detections in wild birds and farmed birds, for two strains of HPAI (H5N1 and H5Nx) and for two time periods, pre- and post-2020. The authors also validate models fitted to disease occurrence data from pre-2020 using post-2020 occurrence data.

      Strengths:

      The authors follow the established methods of Dhingra et al., 2016 to provide an updated spatial assessment of HPAI transmission suitability for two time periods, pre- and post-2020. They explore further methods of model cross-validation and consider the diversity of the bird species that HPAI has been detected in.

      Weaknesses:

      The precise ecological niche that the authors are modelling here is ambiguous: if we treat the transmission of HPAI in the wild bird population and in poultry populations as separate transmission cycles, linked by spillover events, then these transmission cycles are likely to have fundamentally different ecological niches.

      We apologise if this aspect was not clear enough in the previous version of our manuscript. Our analyses do not assume distinct transmission cycles in wild and domestic bird species; these transmission cycles are indeed interconnected by frequent spillover events. We do, however, conduct independent ecological niche modelling analyses to estimate the ecological suitability for the risk of local circulation in domestic birds and, separately, in wild birds. This distinction does not imply that the virus circulates exclusively within one of these populations; rather, it allows us to identify potential differences in the environmental conditions associated with virus occurrences in each context.

      Our results indicate that these two ecological niche models capture distinct environmental patterns. Virus occurrences in wild birds were primarily associated with factors such as open water and proximity to urban areas, while occurrences in domestic birds were more strongly linked to variables like poultry density and cultivated vegetation. This finding supports the existence of two distinct ecological niches for the virus, corresponding to virus circulation in wild and domestic bird populations. We thank the Reviewer for their feedback and we will take this opportunity to further clarify this aspect in the text.

      While an "index case" in farmed poultry is relevant to the wildlife transmission cycle, further within-farm and farm-to-farm transmission is likely to be contingent on anthropogenic factors, rather than the environment. Similarly, we would expect "index cases" in outbreaks of HPAI in mammals to be relevant to transmission risk in wild birds - this data is not included in this manuscript. Such "index cases" in farmed poultry occur under separate ecological conditions to subsequent transmission in farmed poultry, so should be separated if possible. Some careful editing of the language used in the manuscript may elucidate some of my questions related to model conceptualisation.

      We agree, but index cases are particularly difficult to separate from secondary spread in the absence of field investigation. Identification of index cases based on space-time filtering has been investigated previously but depends strongly on the quality of the surveillance: an “apparent” primary case can be a secondary case of previously undetected ones, and surveillance quality cannot be assumed to be homogeneous across countries. Our ecological niche modelling approach is based on HPAI cases reported in the EMPRES-i database, which includes all documented outbreaks without distinguishing primary introductions from subsequent farm-to-farm transmissions. Thus, our ecological niche models are trained on confirmed cases that result from a combination of different transmission dynamics, including introduction events in poultry populations (which can be impacted by ecological factors) and persistence within and between poultry populations (which can be impacted by anthropogenic factors).

      We will revise the manuscript to clarify that, while our study primarily aims to assess the environmental suitability for HPAI occurrences, the dataset does not exclude cases resulting from farm-to-farm spread. This means that our models capture the environmental variables associated with the risk of both primary introductions (e.g., spillover from wild birds) and secondary transmission events within poultry systems, although the latter is also influenced by anthropogenic factors such as biosecurity practices and poultry trade networks. These latter factors are not included in our models, which will be highlighted in the limitations (Discussion section) of the revised manuscript.

      In addition, we note the Reviewer's comment regarding the relevance of “index cases” in mammalian outbreaks to understanding the risk of HPAI transmission in wild birds. Although these data are not included in our current study, we will highlight the potential value of incorporating these cases into future models in order to refine risk predictions, provided that they can be identified with some reasonable level of certainty.

      The authors' handling of sampling bias in disease detection data in poultry is possibly inappropriate: one would expect the true spatial distribution of disease surveillance in poultry to be more closely correlated with poultry farming density, in contrast to human population density. This shortcoming in the modelling workflow possibly dilutes a key finding of the Results, that the transmission risk of HPAI in poultry is greatest in areas where poultry farming density is high.

      The Reviewer raises a valid point that poultry surveillance efforts could be more closely correlated with poultry farm density than with human population density. While human population density can serve as a reasonable proxy for surveillance intensity (given that disease detection is often more active in areas with stronger veterinary notification systems), we acknowledge that poultry disease surveillance can also be influenced by the spatial distribution of poultry farms, as high-density poultry areas could be prioritised for monitoring. Please note that in our study, we followed a previously established approach (Dhingra et al. 2016) and weighted pseudo-absence sampling by human population density to account for general surveillance biases. However, we do not agree that this choice dilutes the finding on poultry density. In fact, assuming a sampling bias correlated with poultry density would reduce the estimated effect of poultry density as a risk factor, whereas the current approach does not.
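
      To illustrate what weighting pseudo-absence sampling by a covariate means in general terms, the sketch below draws grid-cell indices with probability proportional to a per-cell weight, which here stands in for human population density. The function name, the toy weights, and the use of sampling with replacement are assumptions made for illustration only; this is not the code used by Dhingra et al. (2016) or in our study.

```python
import random
from typing import List, Sequence


def sample_pseudo_absences(weights: Sequence[float], n: int, seed: int = 0) -> List[int]:
    """Draw n grid-cell indices (with replacement), each cell being chosen
    with probability proportional to its weight."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.choices(range(len(weights)), weights=weights, k=n)


# Hypothetical per-cell weights (e.g., human population density per cell).
density = [0.0, 5.0, 50.0, 500.0]
cells = sample_pseudo_absences(density, n=1000)

# Cells with zero weight are never drawn; heavier cells are drawn more often.
counts = [cells.count(i) for i in range(len(density))]
print(counts)
```

      Swapping the weight vector for poultry density would concentrate pseudo-absences in poultry-dense cells, which is why that choice would tend to reduce the apparent contrast between presences and pseudo-absences along that variable.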

      Reviewer #2:

      Summary:

      This study aimed to determine which spatial factors (conceived broadly as environmental, agronomic and socio-economic) explain greater avian influenza case numbers reported since 2020 (2020--2022) by comparing similar models built with data from the period 2015--2020. The authors have chosen an environmental niche modelling approach, where detected infections are modelled as a function of spatial covariates extracted at the location of each case. These covariates are available over the entire world so that the predictions can be projected back to space in the form of a continuous map.

      Strengths:

      The authors use boosted regression trees as the main analytical tool, which always feature among the best-performing models for environmental niche models (also known as habitat suitability models). They run replicate sets of the analysis for each of their model targets (wild/domestic x pathogen variant), which can help produce stable predictions. The authors take steps to ameliorate some forms of expected bias in the detection of cases, such as geographic variation in surveillance efforts, and in general more detections near areas of higher human population density.

      Weaknesses:

      The study is not altogether coherent with respect to time. Data sets for the response (H5N1 or H5Nx case data in domestic or wild birds) are divided into two periods; 2015-2020, and 2020-2022. Each set is modelled using a common suite of covariates that are not time-varying. That suggests that causation is inferred by virtue of cases being in different geographic areas in those two time periods. Furthermore, important predictors such as chicken density appear to be informed (in the areas of high risk) from census data from before 2010. The possibility for increased surveillance effort *through time* is overlooked, as is the possibility that previously high-burden locations may implement practice changes to reduce vulnerability.

      We acknowledge the Reviewer's comments regarding the consistency of time periods in our study. Our approach is to divide the HPAI case data into two time periods (2015-2020 and 2020-2022) and to train ecological niche models using a common set of covariates that do not explicitly account for temporal variation. We will further clarify these aspects in the revised version of our manuscript:

      (1) Our primary objective is to assess changes in ecological suitability over time rather than infer direct causation. By comparing models trained on pre-2020 data with post-2020 occurrences, we evaluate whether pre-2020 environmental conditions can predict recent HPAI suitability. However, we acknowledge that this does not capture dynamic changes in surveillance efforts, biosecurity measures, or host-pathogen interactions over time.

      (2) Regarding predictor variables, we used poultry density data from 2015, rather than pre-2010 data. However, this dataset is not based on a single census year; instead, it represents a median estimate derived from subnational poultry census data collected between 2000 and 2019. This median year approach provides a more stable representation of poultry density than any single-year snapshot. Furthermore, while poultry production systems may exhibit some temporal variation, these changes are generally minor compared to the inter-annual variability observed in HPAI occurrence, which is largely driven by epidemic dynamics. Given the current limitations of global poultry data, distinguishing distributions from different years is not feasible with the available GLW dataset. We will clarify these points in the manuscript.

      (3) We recognise that increased surveillance efforts and adaptive changes in poultry farming practices could influence the observed HPAI case distribution. While our current models do not incorporate time-varying surveillance intensity or biosecurity policies, we will address this limitation in the Discussion section and suggest that future work integrates dynamic surveillance data to improve risk assessments.

    1. Author response:

      Reviewer #1 (Public review):

      Wang et al. investigated how sexual failure influences sweet taste perception in male Drosophila. The study revealed that courtship failure leads to decreased sweet sensitivity and feeding behavior via dopaminergic signaling. Specifically, the authors identified a group of dopaminergic neurons projecting to the suboesophageal zone that interacts with sweet-sensing Gr5a+ neurons. These dopaminergic neurons positively regulate the sweet sensitivity of Gr5a+ neurons via DopR1 and Dop2R receptors. Sexual failure diminishes the activity of these dopaminergic neurons, leading to reduced sweet-taste sensitivity and sugar-feeding behavior in male flies. These findings highlight the role of dopaminergic neurons in integrating reproductive experiences to modulate appetitive sensory responses.

      Previous studies have explored the dopaminergic-to-Gr5a+ neuronal pathways in regulating sugar feeding under hunger conditions. Starvation has been shown to increase dopamine release from a subset of TH-GAL4 labeled neurons, known as TH-VUM, in the suboesophageal zone. This enhanced dopamine release activates dopamine receptors in Gr5a+ neurons, heightening their sensitivity to sugar and promoting sucrose acceptance in flies. Since the function of the dopaminergic-to-Gr5a+ circuit motif has been well established, the primary contribution of Wang et al. is to show that mating failure in male flies can also engage this circuit to modulate sugar-feeding behavior. This contribution is valuable because it highlights the role of dopaminergic neurons in integrating diverse internal state signals to inform behavioral decisions.

      An intriguing discrepancy between Wang et al. and earlier studies lies in the involvement of dopamine receptors in Gr5a+ neurons. Prior research has shown that Dop2R and DopEcR, but not DopR1, mediate starvation-induced enhancement of sugar sensitivity in Gr5a+ neurons. In contrast, Wang et al. found that DopR1 and Dop2R, but not DopEcR, are involved in the sexual failure-induced decrease in sugar sensitivity in these neurons. I wish the authors had further explored or discussed this discrepancy, as it is unclear how dopamine release selectively engages different receptors to modulate neuronal sensitivity in a context-dependent manner.

      Our immunostaining experiments showed that three dopamine receptors, DopR1, Dop2R, and DopEcR, were expressed in Gr5a<sup>+</sup> neurons in the proboscis, consistent with previous RT-PCR findings (Inagaki et al 2012). As the reviewer pointed out, we found that DopR1 and Dop2R were required for courtship failure-induced suppression of sugar sensitivity, whereas Marella et al 2012 and Inagaki et al 2012 found that Dop2R and DopEcR were required for starvation-induced enhancement of sugar sensitivity. These results may suggest that different internal states (courtship failure vs. starvation) modulate the peripheral sensory system via different signaling pathways (e.g. different subsets of dopaminergic neurons, different dopamine release mechanisms, and different dopamine receptors). We will further discuss these possibilities in the revised manuscript.

      The data presented by Wang et al. are solid and effectively support their conclusions. However, certain aspects of their experimental design, data analysis, and interpretation warrant further review, as outlined below.

      (1) The authors did not explicitly indicate the feeding status of the flies, but it appears they were not starved. However, the naive and satisfied flies in this study displayed high feeding and PER baselines, similar to those observed in starved flies in other studies. This raises the concern that sexually failed flies may have consumed additional food during the 4.5-hour conditioning period, potentially lowering their baseline hunger levels and subsequently reducing PER responses. This alternative explanation is worth considering, as an earlier study demonstrated that sexually deprived males consumed more alcohol, and both alcohol and food are known rewards for flies. To address this concern, the authors could remove food during the conditioning phase to rule out its influence on the results.

      We think this is a valid concern. We will conduct courtship conditioning in the absence of food and test if courtship failure can still suppress sugar sensitivity in the revised manuscript.

      (2) Figure 1B reveals that approximately half of the males in the Failed group did not consume sucrose yet Figure 1-S1A suggests that the total volume consumed remained unchanged. Were the flies that did not consume sucrose omitted from the dataset presented in Figure 1-S1A? If so, does this imply that only half of the male flies experience sexual failure, or that sexual failure affects only half of males while the others remain unaffected? The authors should clarify this point.

      Here is a brief clarification of our experimental design and we will further clarify the details in the revised manuscript:

      After the behavioral conditioning, male flies were divided between two assays. In the first, we quantified PER responses of individual flies. As shown in Figure 1C, Failed males exhibited decreased sweet sensitivity (as demonstrated by the right shift of the response curve).

      On the other hand, we sought to quantify food consumption of individual flies by using the MAFE assay (Qi et al 2005). When presented with 400 mM sucrose, approximately 100% of the flies in the Naïve and Satisfied groups, and 50% of the flies in the Failed group, extended their proboscis and started feeding (Figure 1B). For these flies, we could quantify the consumed volumes and found there was no change (Figure 1, S1A). We should also note the consistency of these two experiments, e.g. in Figure 1C, only 50-60% of Failed males responded to 400 mM stimulation.  

      These two experiments in combination suggest that sexual failure suppressed sweet sensitivity of the Failed males. Meanwhile, as long as they still initiated feeding, the volume of food consumption remained unchanged. These results led us to focus on the modulatory effect of sexual failure on the sensory system, the main topic of this present study.

      In addition, to further clarify the potential misunderstanding, we plan to examine food consumption by using 800 mM sucrose in the revised manuscript. As shown in Figure 1C, 800 mM sucrose was adequate to induce feeding in ~100% of the flies.

      (3) The evidence linking TH-GAL4 labeled dopaminergic neurons to reduced sugar sensitivity in Gr5a+ neurons in sexually failed males could be further strengthened. Ideally, the authors would have activated TH-GAL4 neurons and observed whether this restored GCaMP responses in Gr5a+ neurons in sexually failed males. Instead, the authors performed a less direct experiment, shown in Figures 3-S1C and D. The manuscript does not describe the condition of the flies used in this experiment, but it appears that they were not sexually conditioned. I have two concerns with this experiment. First, no statistical analysis was provided to support the enhancement of sucrose responses following activation of TH-GAL4 neurons. Second, without performing this experiment in sexually failed males, the authors lack direct evidence to confirm that the dampened response of Gr5a+ neurons to sucrose results from decreased activity in TH-GAL4 neurons.

      We think this is also a valid suggestion. We will directly examine whether activating TH<sup>+</sup> neurons enhances the sugar responses of Gr5a<sup>+</sup> neurons in sexually failed males. We will also add statistical analysis.

      Nevertheless, we would still argue that our current experiments using Naive males (Figure 3, S1C-D) are adequate to show a functional link between TH<sup>+</sup> neurons and Gr5a<sup>+</sup> neurons. Combined with the results that these neurons form active synapses (Figure 3, S1B) and that the activity of TH<sup>+</sup> neurons was dampened in sexually failed males (Figure 3G-I), our current data support the notion that sexual failure suppresses sweet sensitivity via the TH-Gr5a circuitry.

      (4) The statistical methods used in this study are poorly described, making it unclear which method was used for each experiment. I suggest that the authors include a clear description of the statistical methods used for each experiment in the figure legends. Furthermore, as I have pointed out, there is a lack of statistical comparisons in Figures 3-S1C and D, a similar problem exists for Figures 6E and F.

      We will add detailed information on the statistical analysis to each figure legend.

      (5) The experiments in Figure 5 lack specificity. The target neurons in this study are Gr5a+ neurons, which are directly involved in sugar sensing. However, the authors used the less specific Dop1R1- and Dop2R-GAL4 lines for their manipulations. Using Gr5a-GAL4 to specifically target Gr5a+ neurons would provide greater precision and ensure that the observed effects are directly attributable to the modulation of Gr5a+ neurons, rather than being influenced by potential off-target effects from other neuronal populations expressing these dopamine receptors.

      We agree with the reviewer that manipulating Dop1R1 and Dop2R genes (Figure 4) and the neurons expressing them (Figure 5) might have broader impacts. In fact, we have also tested the role of Dop1R1 and Dop2R in Gr5a<sup>+</sup> neurons by RNAi experiments (Figure 6). As shown by both behavioral and calcium imaging experiments, knocking down Dop1R1 and Dop2R in Gr5a<sup>+</sup> neurons both eliminated the effect of sexual failure to dampen sweet sensitivity, further confirming the role of these two receptors in Gr5a<sup>+</sup> neurons.

      (6) I found the results presented in Fig. 6F puzzling. The knockdown of Dop2R in Gr5a+ neurons would be expected to decrease sucrose responses in naive and satisfied flies, given the role of Dop2R in enhancing sweet sensitivity. However, the figure shows an apparent increase in responses across all three groups, which contradicts this expectation. The authors may want to provide an explanation for this unexpected result.

      We agree that there might be some potential discrepancies. However, our current data are not adequate to clarify this, given that the experiments shown in Figure 6E-F and the apparent control (Figure 3C) were not conducted under identical settings at the same time (which is why we did not directly compare these results). One way to address the issue is to repeat these calcium imaging experiments with a head-to-head comparison with the control group (Gr5a-GCaMP, +/- Dop1R1 and Dop2R RNAi). We will conduct the experiments and present the data in the revised manuscript.

      (7) In several instances in the manuscript, the authors described the effects of silencing dopamine signaling pathways or knocking down dopamine receptors in Gr5a neurons with phrases such as 'no longer exhibited reduced sweet sensitivity' (e.g., L269 and L288), 'prevent the reduction of sweet sensitivity' (e.g., L292), or 'this suppression was reversed' (e.g. L299). I found these descriptions misleading, as they suggest that sweet sensitivity in naive and satisfied groups remains normal while the reduction in failed flies is specifically prevented or reversed. However, this is not the case. The data indicate that these manipulations result in an overall decrease in sweet sensitivity across all groups, such that a further reduction in failed flies is not observed. I recommend revising these descriptions to accurately reflect the observed phenotypes and avoid any confusion regarding the effects of these manipulations.

      We will change our expressions in the revised manuscript. In brief, we think that these manipulations (suppressing Dop1R1<sup>+</sup> and Dop2R<sup>+</sup> neurons) have two consequences: suppressing the overall sweet sensitivity and eliminating the effect of sexual failure.

      Reviewer #2 (Public review):

      Summary:

      The authors exposed naïve male flies to different groups of females, either mated or virgin. Male flies can successfully copulate with virgin females; however, they are rejected by mated females. This rejection reduces sugar preference and sensitivity in males. Investigating the underlying neural circuits, the authors show that dopamine signaling onto GR5a sensory neurons is required for reduced sugar preference. GR5a sensory neurons respond less to sugar exposure when they lack dopamine receptors.

      Strengths:

      The findings add another strong phenotype to the existing dataset about brain-wide neuromodulatory effects of mating. The authors use several state-of-the-art methods, such as activity-dependent GRASP to decipher the underlying neural circuitry. They further perform rigorous behavioral tests and provide convincing evidence for the local labellar circuit.

      Weaknesses:

      The authors focus on the circuit connection between dopamine and gustatory sensory neurons in the male SEZ. Therefore, it is still unknown how mating modulates dopamine signaling and what possible implications on other behaviors might result from a reduced sugar preference.

      We agree with the reviewer that in the current study, we did not examine how mating experience suppressed the activity of dopaminergic neurons in the SEZ. The current study mainly focused on the behavioral characterization (sexual failure suppresses sweet sensitivity) and the downstream mechanism (TH-Gr5a pathway). We think that examining the upstream modulatory mechanism may be more suitable for a separate future study.

      We believe that a sustained reduction in sweet sensitivity (not limited to sucrose but extending to other sweet compounds, Figure 1, S1B-C) upon sexual failure suggests a generalized and sustained consequence on reward-related behaviors. Sexual failure may thus resemble a state of “primitive emotion” in fruit flies. We will further discuss this possibility in the revised manuscript.

      Reviewer #3 (Public review):

      Summary

      In this work, the authors asked how mating experience impacts reward perception and processing. For this, they employ fruit flies as a model, with a combination of behavioral, immunostaining, and live calcium imaging approaches.

      Their study allowed them to demonstrate that courtship failure decreases the fraction of flies motivated to eat sweet compounds, revealing a link between reproductive stress and reward-related behaviors. This effect is mediated by a small group of dopaminergic neurons projecting to the SEZ. After courtship failure, these dopaminergic neurons exhibit reduced activity, leading to decreased Gr5a+ neuron activity via Dop1R1 and Dop2R signaling, and leading to reduced sweet sensitivity. The authors therefore showed how mating failure influences broader behavioral outputs through suppression of the dopamine-mediated reward system and underscores the interactions between reproductive and reward pathways.

      Concern

      My main concern regarding this study lies in the way the authors chose to present their results. If I understood correctly, they provided evidence that mating failure induces a decrease in the fraction of flies exhibiting PER. However, they also showed that food consumption was not affected (Fig. 1, supplement), suggesting that individuals who did eat consumed more. This raises questions about the analysis and interpretation of the results. Should we consider the group as a whole, with a reduced sensitivity to sweetness, or should we focus on individuals, with each one eating more? I am also concerned about how this could influence the results obtained using live imaging approaches, as the flies being imaged might or might not have been motivated to eat during the feeding assays. I would like the authors to clarify their choice of analysis and discuss this critical point, as the interpretation of the results could potentially be the opposite of what is presented in the manuscript.

      Here is a brief clarification of our experimental design; we will further clarify the details in the revised manuscript:

      After the behavioral conditioning, male flies were divided for two assays. On the one hand, we quantified PER responses of individual flies. As shown in Figure 1C, Failed males exhibited decreased sweet sensitivity (as demonstrated by the right shift of the response curve).

      On the other hand, we sought to quantify food consumption of individual flies by using the MAFE assay (Qi et al 2005). When presented with 400 mM sucrose, approximately 100% of the flies in the Naïve and Satisfied groups, and 50% of the flies in the Failed group, extended their proboscis and started feeding (Figure 1B). For these flies, we could quantify the consumed volumes and found there was no change (Figure 1, S1A). We should also note the consistency of these two experiments, e.g. in Figure 1C, only 50-60% of Failed males responded to 400 mM stimulation.  

      These two experiments in combination suggest that sexual failure suppressed sweet sensitivity of the Failed males. Meanwhile, as long as they still initiated feeding, the volume of food consumption remained unchanged. These results led us to focus on the modulatory effect of sexual failure on the sensory system, the main topic of this present study.

      In addition, to further clarify the potential misunderstanding, we plan to examine food consumption by using 800 mM sucrose instead. As shown in Figure 1C, 800 mM sucrose was adequate to induce feeding in ~100% of the flies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study combined whole-head magnetoencephalography (MEG) and subthalamic (STN) local field potential (LFP) recordings in patients with Parkinson's disease undergoing deep brain stimulation surgery. The paper provides solid evidence that cortical and STN beta oscillations are sensitive to movement context and may play a role in the coordination of movement redirection.

      We are grateful for the expert assessment by the editor and the reviewers. Below we provide point-by-point replies to both public and private reviews. We have tried to keep the answers in the public section short and concise, not citing the changed passages unless the point does not re-appear in the recommendations. There, we did include all of the changes to the manuscript, such that the reviewers need not go back and forth between replies and manuscript.

      The reviewer comments have not only led to numerous improvements of the text, but also to new analyses, such as Granger causality analysis, and to methodological improvements e.g. including numerous covariates in the statistical analyses. We believe that the article improved substantially through the feedback, and we thank the reviewers and the editor for their effort.

      Public Reviews

      Reviewer #1 (Public review):

      Summary:

      Winkler et al. present brain activity patterns related to complex motor behaviour by combining whole-head magnetoencephalography (MEG) with subthalamic local field potential (LFP) recordings from people with Parkinson's disease. The motor task involved repetitive circular movements with stops or reversals associated with either predictable or unpredictable cues. Beta and gamma frequency oscillations are described, and the authors found complex interactions between recording sites and task conditions. For example, they observed stronger modulation of connectivity in unpredictable conditions. Moreover, STN power varied across patients during reversals, which differed from stopping movements. The authors conclude that cortex-STN beta modulation is sensitive to movement context, with potential relevance for movement redirection.

      Strengths:

      This study employs a unique methodology, leveraging the rare opportunity to simultaneously record both invasive and non-invasive brain activity to explore oscillatory networks.

      Weaknesses:

      It is difficult to interpret the role of the STN in the context of reversals because no consistent activity pattern emerged.

      We thank the reviewer for the valuable feedback to our study. We agree that the interpretation of the role of the STN during reversals is rather difficult, because reversal-related STN activity was highly variable across patients. Although there seem to be consistent patterns in sub-groups of the current cohort, with some patients showing event-related increases (Fig. 3b) and others showing decreases, the current dataset is not large enough to substantiate or even explain the existence of such clusters. Thus, we limit ourselves to acknowledging this limitation and discussing potential reasons for the high variability, namely variability in electrode placement and insufficient spatial resolution for the separation of specialized cell ensembles within the STN (see Discussion, section Limitations and future directions).

      Reviewer #2 (Public review):

      Summary:

      This study examines the role of beta oscillations in motor control, particularly during rapid changes in movement direction among patients with Parkinson's disease. The researchers utilized magnetoencephalography (MEG) and local field potential (LFP) recordings from the subthalamic nucleus to investigate variations in beta band activity within the cortex and STN during the initiation, cessation, and reversal of movements, as well as the impact of external cue predictability on these dynamics. The primary finding indicates that beta oscillations more effectively signify the start and end of motor sequences than transitions within those sequences. The article is well-written, clear, and concise.

      Strengths:

      The use of a continuous motion paradigm with rapid reversals extends the understanding of beta oscillations in motor control beyond simple tasks. It offers a comprehensive perspective on subthalamocortical interactions by combining MEG and LFP.

      Weaknesses:

      (1) The small and clinically diverse sample size may limit the robustness and generalizability of the findings. Additionally, the limited exploration of causal mechanisms reduces the depth of its conclusions and focusing solely on Parkinson's disease patients might restrict the applicability of the results to broader populations.

      We thank the reviewer for the insightful feedback. We address these issues one by one in our responses to points 2, 4 and 6, respectively.

      (2) The small sample size and variability in clinical characteristics among patients may limit the robustness of the study's conclusions. It would be beneficial for the authors to acknowledge this limitation and propose strategies for addressing it in future research. Additionally, incorporating patient-specific factors as covariates in the ANOVA could help mitigate the confounding effects of heterogeneity.

      Thank you for this comment. The challenges associated with recording brain activity peri-operatively can be a limiting factor when it comes to sample size and cohort stratification. We now acknowledge this in the revised discussion (section Limitations and future directions). Furthermore, we suggest using sensing-capable devices in the future as a measure to increase sample sizes (Discussion, section Limitations and future directions). Lastly, we appreciate the idea of adding patient-specific factors as covariates to the ANOVAs and have thus included age, disease duration and pre-surgical UPDRS score into our models. This did not lead to any qualitative changes of statistical effects.

      (3) The author may consider using standardized statistics, such as effect size, that would provide a clearer picture of the observed effect magnitude and improve comparability.

      Thanks for the suggestion. As measures of effect size, we have added partial eta squared (η<sub>p</sub><sup>2</sup>) to the results of all ANOVAs and Cohen’s d to all follow-up t-tests.

      (4) Although the study identifies relevance between beta activity and motor events, it lacks causal analysis and discussion of potential causal mechanisms. Given the valuable datasets collected, exploring or discussing causal mechanisms would enhance the depth of the study.

      We appreciate this idea and have conducted Granger causality analyses in response to this comment. This new analysis reveals that there is a strong cortical drive to the STN for all movements of interest and predictability conditions in the beta band. The detailed results can be viewed on p. 16 in the section on Granger causality. For statistical testing, we conducted an rmANCOVA, similar to those for power and coherence (see p. 46-48 and 54-56 for the corresponding tables), as well as t-tests assessing directionality (Figure 6-figure supplement 2 on p. 35). In the discussion section, we connect these results with prior findings suggesting that the frontal cortex drives the STN in the beta band, likely through hyperdirect pathway fibers (p. 17).

      (5) The study cohort focused on senior adults, who may exhibit age-related cortical responses during movement planning in neural mechanisms. These aspects were not discussed in the study.

      We appreciate the comment and agree that age may have impacted neural oscillatory activity of patients in the present study. We now acknowledge this in the limitations section, and point out that our approach to handling these effects was including age as a covariate in the statistical analyses.

      (6) Including a control group of patients with other movement disorders who also undergo DBS surgery would be beneficial. Because we cannot exclude the possibility that the observed findings are specific to PD or can be generalized. Additionally, the current title and the article, which are oriented toward understanding human motor control, may not be appropriate.

      We thank the reviewer for this comment and fully agree that it cannot be ruled out that the present findings are, in part, specific to PD. We acknowledge this limitation in the Limitations and future directions section (p. 20-21). Indeed, including a control group of patients with other disorders would be ideal, but the scarcity of patients with diseases other than PD who receive STN DBS in our centre makes this an unfeasible option in practical terms. We do suggest that future research may address this issue by extending our approach to different disorders or healthy participants on the cortical level (p. 21). Lastly, we appreciate the idea to adjust the title of the present article. The adjusted title is: “Context-Dependent Modulations of Subthalamo-Cortical Synchronization during Rapid Reversals of Movement Direction in Parkinson’s Disease”.

      That being said, we do believe that our findings at least approximate healthy functioning and are not solely related to PD. For one, patients were on their usual dopaminergic medication and dopamine has been found to normalize pathological alterations of beta activity. Further, the general pattern of movement-related beta and gamma oscillations reported here has been observed in numerous diseases and brain structures, including cortical beta oscillations measured non-invasively in healthy participants.

      Reviewer #3 (Public review):

      Summary:

      The study highlights how the initiation, reversal, and cessation of movements are linked to changes in beta synchronization within the basal ganglia-cortex loops. It was observed that different movement phases, such as starting, stopping briefly, and stopping completely, affect beta oscillations in the motor system.

      It was found that unpredictable cues lead to stronger changes in STN-cortex beta coherence. Additionally, specific patterns of beta and gamma oscillations related to different movement actions and contexts were observed. Stopping movements was associated with a lack of the expected beta rebound during brief pauses within a movement sequence.

      Overall, the results underline the complex and context-dependent nature of motor-control and emphasize the role of beta oscillations in managing movement according to changing external cues.

      Strengths:

      The paper is very well written, clear, and appears methodologically sound.

      The use of continuous movement (turning) with reversals is more naturalistic than many previous button-push paradigms.

      Weaknesses:

      The generalizability of the findings is somewhat curtailed by the fact that this was performed perioperatively during the period of the microlesion effect. Given the availability of sensing-enabled DBS devices now and HD-EEG, does MEG offer a significant enough gain in spatial localizability to offset the fact that it has to be done shortly postoperatively with externalized leads, with an attendant stun effect? Specifically, for paradigms that are not asking very spatially localized questions as a primary hypothesis?

      We appreciate the reviewer’s feedback and acknowledge the valid point raised on the timing of our measurements. Indeed, sensing-enabled devices offer a valid alternative to peri-operative recordings, circumventing the stun effect. We acknowledge this in the revised discussion, section Limitations and future directions (p. 23): “Additionally, future research could capitalize on sensing-capable devices to circumvent the necessity to record brain activity peri-operatively, facilitating larger sample sizes and circumventing the stun effect, an immediate improvement in motor symptoms arising as a consequence of electrode implantation (Mann et al., 2009).” This alternative strategy, however, was not an option here because we did not have a sufficient number of patients implanted with sensing-enabled devices at the time when the data collection was initialized.

      That being said, we would like to highlight that in the present study, our goal was not to study pathology related to Parkinson’s disease. Rather, we aimed to learn about motor control in general. The stun effect may have facilitated motor performance in our patients, which is actually beneficial to the research goals at hand.

      Further investigation of the gamma signal seems warranted, even though it has a slightly lower proportional change in amplitude than beta. Given that the changes in gamma here are relatively wide band, this could represent a marker of neural firing that could be interestingly contrasted against the rhythm account presented.

      We appreciate the reviewer’s interest and we have extended the investigation of gamma oscillations. We now provide statistics regarding the influence of predictability on gamma power and gamma coherence (no significant effects) and explore Granger causality in the gamma (and beta) band (see comment 4 of reviewer 2). Unfortunately, we cannot measure spiking via the DBS electrode, and therefore we cannot investigate correlations between gamma oscillatory activity and action potentials. We do agree with the reviewer, however, that action potentials rather than oscillations form the basis of motor control in the brain. This view of ours is now reflected in the revised discussion, section Limitations and future directions (p. 21): “Lastly, given the present study’s focus on understanding movement-related rhythms, particularly in the beta range, future research could further explore the role of gamma oscillations in continuous movement and their relation to action potentials in motor areas (Fischer et al., 2020; Igarashi, Isomura, Arai, Harukuni, & Fukai, 2013), which form the basis of movement encoding in the brain.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This is a well-conducted study and overall the results are clear. I only have one minor suggestion for improvement of the manuscript. I found the order of appearance of the results somewhat confusing, switching from predictability-related behavioral effects to primarily stopping and reversal-related neurophysiological effects, back to predictability but starting with coherence. I would suggest that the authors try to follow a systematic order focused on the questions at hand. E.g. perhaps readability could be improved if the results section is split into reversal vs. stopping related effects, reporting behavior, power, and coherence in this order, followed by a predictability section, again reporting behavior, power, and coherence. Obviously, this is an optional suggestion. Apart from that, I just missed a more direct message related to the absence of statistical significance related to STN power changes during reversal. I think this could be made more clear in the text.

      We thank the reviewer for the feedback to our study. In order to ease reading, we modified the order and further added additional sub-titles to the results section. We start with Behavior (p. 4) and then move on to Power (general movement effects on power – movement effects on STN power – movement effects on cortical power – predictability effects on power). Next, we move on to Connectivity (movement effects on connectivity – predictability effects on connectivity – Granger causality). We hope that these adaptations will help guide the reader.

      Additionally, we thank the reviewer for noting that we did not explicitly mention the lack of statistical significance of reversal-related beta power modulations in the STN. We have adapted the section on modulation of STN beta power associated with reversals (p. 8) to: “In the STN, reversals were associated with a brief modulation of beta power, which was weak in the group-average spectrum and did not reach significance (Fig. 3a).”

      Reviewer #2 (Recommendations for the authors):

      (1) The small sample size and variability in clinical characteristics among patients may limit the robustness of the study's conclusions. It would be beneficial for the authors to acknowledge this limitation and propose strategies for addressing it in future research. Additionally, incorporating patient-specific factors as covariates in the ANOVA could help mitigate the confounding effects of heterogeneity.

      Thank you for this comment. The challenges associated with recording brain activity peri-operatively can be a limiting factor when it comes to sample size. We now acknowledge this in the revised discussion, section Limitations and future directions (p. 20):

      “Invasive measurements of STN activity are only possible in patients who are undergoing or have undergone brain surgery. Studies drawing from this limited pool of candidate participants are typically limited in terms of sample size and cohort stratification, particularly when carried out in a peri-operative setting. Here, we had a sample size of 20, which is rather high for a peri-operative study, but still low in terms of absolute numbers.”

      Furthermore, we suggest using sensing-capable devices in the future as a measure to increase sample sizes (p. 21):

      “Additionally, future research could capitalize on sensing-capable devices to circumvent the necessity to record brain activity peri-operatively, facilitating larger sample sizes and circumventing the stun effect, an immediate improvement in motor symptoms arising as a consequence of electrode implantation (Mann et al., 2009).”

      Lastly, we appreciate the idea of adding patient-specific factors as covariates to the ANOVAs and have thus included age, disease duration and pre-surgical UPDRS score into our models. This did not lead to any qualitative changes of statistical effects.

      Revised article

      Methods, Statistical analysis:

      “To account for their potential influence on brain activity, we added age, pre-operative UPDRS score, and disease duration as covariates to all ANOVAs. Covariates were standardized by means of z-scoring.”
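The standardization step described above can be illustrated with a minimal sketch. The covariate values below are invented for illustration and are not data from the study; the use of the sample standard deviation (`ddof=1`) is also an assumption, since the response does not state which convention was used.

```python
import numpy as np

def zscore(values, ddof=1):
    """Standardize a covariate to zero mean and unit standard deviation.

    ddof=1 (sample standard deviation) is an assumption; the response
    does not specify the convention.
    """
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std(ddof=ddof)

# Hypothetical covariate values for five patients (not study data).
covariates = {
    "age": [58, 63, 67, 71, 60],
    "updrs_pre_op": [32, 41, 28, 45, 37],
    "disease_duration_years": [7, 10, 12, 9, 6],
}

# Each standardized covariate is now on a common, unitless scale.
standardized = {name: zscore(vals) for name, vals in covariates.items()}
```

Putting all covariates on the same scale keeps their model coefficients comparable, whatever the original units were.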

      (2) The author may consider using standardized statistics, such as effect size, that would provide a clearer picture of the observed effect magnitude and improve comparability.

      Thanks for this useful suggestion. As measures of effect size, we have added partial eta squared (η<sub>p</sub><sup>2</sup>) to the results of all ANOVAs and Cohen’s d to all follow-up t-tests.
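As a rough illustration of how these two effect sizes relate to the test statistics, the sketch below applies the standard conversion formulas to the F and t values quoted in the Granger causality results of this response. Two assumptions are made here and are not stated by the authors: the t-tests are treated as paired (one-sample) tests, and all 20 patients mentioned in the limitations section are taken to have entered them. The function names are illustrative only.

```python
import math

def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def cohens_d_from_t(t_value, n):
    """Cohen's d for a one-sample or paired t-test (assumed design): d = t / sqrt(n)."""
    return t_value / math.sqrt(n)

# Values quoted in the Granger causality results (beta and gamma band ROI effects).
eta_beta = partial_eta_squared(3.443, df_effect=7, df_error=9)   # about 0.728
eta_gamma = partial_eta_squared(0.338, df_effect=7, df_error=9)  # about 0.208

# M1 -> STN, contralateral hemisphere; n = 20 is an assumption.
d_m1_contra = cohens_d_from_t(3.609, n=20)                       # about 0.807
```

Under these assumptions the conversions agree with the reported η<sub>p</sub><sup>2</sup> and d values to three decimals.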

      (3) Although the study identifies relevance between beta activity and motor events, it lacks causal analysis and discussion of potential causal mechanisms. Given the valuable datasets collected, exploring or discussing causal mechanisms would enhance the depth of the study.

      We appreciate this idea and have conducted Granger causality analyses in response to this comment. This new analysis reveals that there is a strong cortical drive to the STN for all movements of interest and predictability conditions in the beta band, but no directed interactions in the gamma band. For statistical testing, we conducted an rmANCOVA, similar to the analysis of power and coherence (see p. 46-48 and 54-56 for the corresponding tables), as well as t-tests assessing directionality (Figure 6-figure supplement 2 on p. 35). In the discussion section, we connect these results with prior findings suggesting that the frontal cortex drives the STN in the beta band, likely through hyperdirect pathway fibers (p. 17).

      Revised article

      Methods Section, Granger Causality Analysis

      “We computed beta and gamma band non-parametric Granger causality (Dhamala, Rangarajan, & Ding, 2008) between cortical ROIs and the STN in the hemisphere contralateral to movement for the post-event time windows (0 – 2 s with respect to start, reversal, and stop). Because estimates of Granger causality are often biased, we compared the original data to time-reversed data to suppress non-causal interactions. True directional influence is reflected by a higher causality measure in the original data than in its time-reversed version, resulting in a positive difference between the two, the opposite being the case for a signal that is “Granger-caused” by the other. Directionality is thus reflected by the sign of the estimate (Haufe, Nikulin, Müller, & Nolte, 2013). Because rmANCOVA results indicated no significant effects for predictability and movement type, and post-hoc tests did not detect significant differences between hemispheres, we averaged Granger causality estimates over movement types, hemispheres and predictability conditions in Figure 6-figure supplement 2.”
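The time-reversal contrast described above can be sketched on simulated data. This is not the authors' pipeline: instead of non-parametric spectral Granger causality, the sketch uses a simpler time-domain, least-squares autoregressive estimate. The model order, the simulated coupling, and all function names are assumptions made for illustration.

```python
import numpy as np

def lagged_design(x, y, order):
    """Return the targets y[t] and the lagged values of y and x used to predict them."""
    n = len(y)
    target = y[order:]
    own = np.column_stack([y[order - k:n - k] for k in range(1, order + 1)])
    other = np.column_stack([x[order - k:n - k] for k in range(1, order + 1)])
    return target, own, other

def granger(x, y, order=5):
    """Time-domain Granger causality x -> y: log ratio of residual variances."""
    target, own, other = lagged_design(x, y, order)
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, own])       # past of y only
    full = np.hstack([ones, own, other])      # past of y and past of x
    res_r = target - restricted @ np.linalg.lstsq(restricted, target, rcond=None)[0]
    res_f = target - full @ np.linalg.lstsq(full, target, rcond=None)[0]
    return np.log(res_r.var() / res_f.var())

def time_reversed_contrast(x, y, order=5):
    """Granger causality x -> y in the original data minus the time-reversed data."""
    return granger(x, y, order) - granger(x[::-1], y[::-1], order)

# Simulated example in which x drives y with a one-sample delay (hypothetical signals).
rng = np.random.default_rng(0)
n = 5000
x, y = np.zeros(n), np.zeros(n)
ex, ey = rng.standard_normal(n), rng.standard_normal(n)
for t in range(2, n):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + ex[t]
    y[t] = 0.4 * y[t - 1] + 0.6 * x[t - 1] + ey[t]

x_to_y = time_reversed_contrast(x, y)  # expected positive: x is the driver
y_to_x = time_reversed_contrast(y, x)  # expected negative: y is the driven signal
```

In this toy case the sign of the contrast gives the direction of influence, as in the quoted passage: positive for the driving signal and negative for the signal that is "Granger-caused".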

      Results, Granger causality

      “In general, cortex appeared to drive the STN in the beta band, regardless of the movement type and predictability condition. This was reflected in a main effect of ROI on Granger causality estimates (F<sub>ROI</sub>(7,9) = 3.443, p<sub>ROI</sub> = 0.044, η<sub>p</sub><sup>2</sup> = 0.728; refer to Supplementary File 4 for the full results of the ANOVA). In the hemisphere contralateral to movement, follow-up t-tests revealed significantly higher Granger causality estimates from M1 to the STN (t = 3.609, one-sided p < 0.001, d = 0.807) and from MSMC to the STN (t = 2.051, one-sided p < 0.027, d = 0.459) than the other way around. The same picture emerged in the hemisphere ipsilateral to movement (M1 to STN: t = 3.082, one-sided p = 0.003, d = 0.689; MSMC to STN: t = 1.833, one-sided p < 0.041, d = 0.410). In the gamma band, we did not detect a significant drive from one area to the other (F<sub>ROI</sub>(7,9) = 0.338, p<sub>ROI</sub> = 0.917, η<sub>p</sub><sup>2</sup> = 0.208, Supplementary File 6). Figure 6-figure supplement 2 demonstrates the differences in Granger causality between original and time-reversed data for the beta and gamma band.”

      Discussion, The dynamics of STN-cortex coherence

      “Considering the timing of the increase observed here, the STN’s role in movement inhibition (Benis et al., 2014; Ray et al., 2012) and the fact that frontal and prefrontal cortical areas are believed to drive subthalamic beta activity via the hyperdirect pathway (Chen et al., 2020; Oswal et al., 2021), it seems plausible that the increase of beta coherence reflects feedback of sensorimotor cortex to the STN in the course of post-movement processing. In line with this idea, we observed a cortical drive of subthalamic activity in the beta band.”

      (4) The study cohort focused on senior adults, who may exhibit age-related cortical responses during movement planning in neural mechanisms. These aspects were not discussed in the study.

      We appreciate the comment and agree that age may have impacted neural oscillatory activity of patients in the present study. We now acknowledge this in the limitations section, and point out that our approach to handling these effects was including age as a covariate in the statistical analyses.

      Revised article

      Discussion, Limitations and Future Directions

      “Further, most of our participants were older than 60 years. To diminish any confounding effects of age on movement-related modulations of neural oscillations, such as beta suppression and rebound (Bardouille & Bailey, 2019; Espenhahn et al., 2019), we included age as a covariate in the statistical analyses.”

      (5) Including a control group of patients with other movement disorders who also undergo DBS surgery would be beneficial. Because we cannot exclude the possibility that the observed findings are specific to PD or can be generalized. Additionally, the current title and the article, which are oriented toward understanding human motor control, may not be appropriate.

      We thank the reviewer for this comment and fully agree that it cannot be ruled out that the present findings are, in part, specific to PD. We acknowledge this limitation in the Limitations and future directions section (p. 20-21). Indeed, including a control group of patients with other disorders would be ideal, but the scarcity of patients with diseases other than PD who receive STN DBS makes this an unfeasible option. We do suggest that future research may address this issue by extending our approach to different disorders or healthy participants on the cortical level (p. 21). Lastly, we appreciate the idea to adjust the title of the present article. The adjusted title is: “Context-Dependent Modulations of Subthalamo-Cortical Synchronization during Rapid Reversals of Movement Direction in Parkinson’s Disease”.

      That being said, we do believe that our findings at least approximate healthy functioning and are not solely related to PD. For one, patients were on their usual dopaminergic medication for the study and dopamine has been found to normalize pathological alterations of beta activity. More importantly, the general pattern of movement-related beta and gamma oscillations has been observed in numerous diseases and brain structures, including cortical beta oscillations measured non-invasively in healthy participants. Thus, it is not unlikely that the new aspects discovered here are also general features of motor processing.

      Revised article

      Discussion, Limitations and future directions

      “Furthermore, we cannot be sure to what extent the present study’s findings relate to PD pathology rather than general motor processing. We suggest that our approach at least approximates healthy brain functioning as patients were on their usual dopaminergic medication. Dopaminergic medication has been demonstrated to normalize power within the STN and globus pallidus internus, as well as STN-globus pallidus internus and STN-cortex coherence (Brown et al., 2001; Hirschmann et al., 2013). Additionally, several of our findings match observations made in other patient populations and healthy participants, who exhibit the same beta power dynamics at movement start and stop (Alegre et al., 2004) that we observed here. Notably, our finding of enhanced cortical involvement in face of uncertainty aligns well with established theories of cognitive processing, given the cortex' prominent role in managing higher cognitive functions (Altamura et al., 2010). Yet, transferring our approach and task to patients with different disorders, e.g. obsessive compulsive disorder, or examining young and healthy participants solely at the cortical level, could contribute to elucidating whether the synchronization dynamics reported here are indeed independent of PD and age.”

      Reviewer #3 (Recommendations for the authors):

      Despite the strengths of the "rhythm" account of cognitive processes, the paper could possibly be improved by making it less skewed to rhythms explaining all of the movement encoding.

      Thank you for this comment - the point is well taken. There is a large body of literature relating neural oscillations to spiking in larger neural populations, which itself is likely the most relevant signal with respect to motor control. In our eyes, it is this link that justifies the rhythm account, i.e. we agree with the reviewer that action potentials are the basis of movement encoding in the brain, not oscillations. Unfortunately, we cannot measure spiking with the method at hand.

      To better integrate this view into the current manuscript, we make the following suggestion for future research in the Limitations and future directions section (p. 21): “Lastly, given the present study’s focus on understanding movement-related rhythms, particularly in the beta range, future research could further explore the role of gamma oscillations in continuous movement and their relation to action potentials in motor areas (Fischer et al., 2020; Igarashi, Isomura, Arai, Harukuni, & Fukai, 2013), which form the basis of movement encoding in the brain.”

      In Figure 5 - is the legend correct? Is it really just a 0.2% change in power only? That would be a very surprisingly small effect size.

      We thank the reviewer for noting this. Indeed, the numbers on the scale quantify relative change (post - pre)/pre and should be multiplied by 100 to obtain %-change. We have adjusted the color bars accordingly.
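
      The distinction between the two scales can be illustrated with a short sketch. The function names and the power values below are hypothetical and chosen only to show why a color-bar value of 0.2 corresponds to a 20% change, not a 0.2% change.

```python
def relative_change(pre, post):
    """Relative change (post - pre) / pre, the quantity originally shown on the color bars."""
    return (post - pre) / pre


def percent_change(pre, post):
    """The same quantity expressed as %-change, i.e. multiplied by 100."""
    return 100.0 * relative_change(pre, post)


# Hypothetical pre- and post-event power values (arbitrary units)
pre_power, post_power = 10.0, 12.0
print(relative_change(pre_power, post_power))  # value on the original scale
print(percent_change(pre_power, post_power))   # value on the adjusted %-scale
```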

      The dissociation between the effects of unpredictable cues in coherence versus raw power is interesting and could potentially be directly contrasted further in the discussion (here they are presented separately with separate discussions, but this seems like a pretty important and novel finding as beta coherence and power usually go in the same direction).

      We appreciate the reviewer’s interest in our findings on the predictability of movement instructions. In case of coherence, the difference between pre- and post-event was generally more positive in the unpredictable condition, meaning that suppressions (negative pre-post difference) were diminished whereas increases (positive pre-post difference) were enhanced. With respect to power, we also observed less suppression in the unpredictable condition at movement start. Therefore, the direction of change is in fact the same. We made this clearer in the revised version by adapting the corresponding sections of the abstract, results and discussion (see below).

      The only instance of coherence and power diverging (on a qualitative level) was observed during reversals: here, we noted post-event increases in coherence and post-event decreases in M1 power in the group-average spectra. However, when comparing the pre- and post-event epochs statistically by means of permutation testing, the coherence increase did not reach significance. Hence, we did not highlight this aspect.

      Revised version

      Abstract

      “… Event-related increases of STN-cortex beta coherence were generally stronger in the unpredictable than in the predictable condition. … “

      Results, Effects of predictability on beta power  

      “With respect to the effect of predictability of movement instructions on beta power dynamics (research aim 2), we observed an interaction between movement type and condition (F<sub>cond*mov</sub> (2,14) = 4.206, p<sub>cond*mov</sub> = 0.037, η<sub>p</sub><sup>2</sup> = 0.375), such that the beta power suppression at movement start was generally stronger in the predictable (M = -0.170, SD = 0.065) than in the unpredictable (M = -0.154, SD = 0.070) condition across ROIs (t = -1.888, one-sided p = 0.037, d = -0.422). We did not observe any modulation of gamma power by the predictability of movement instructions (F<sub>cond</sub> (1,15) = 0.792, p<sub>cond</sub> = 0.388, η<sub>p</sub><sup>2</sup> = 0.050, Supplementary File 5).”
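
      For a paired (within-subject) comparison, one common definition of Cohen's d is the t statistic divided by the square root of the number of pairs. The sketch below applies that definition to the t values quoted in this response. Two assumptions are made here that this response does not state: that d was computed as t / √n, and that n = 20. The function name is hypothetical.

```python
import math


def cohens_d_from_paired_t(t_value, n_pairs):
    """Cohen's d for a paired comparison, using d = t / sqrt(n).

    This is one common convention; other definitions of d exist.
    """
    return t_value / math.sqrt(n_pairs)


ASSUMED_N = 20  # assumption: inferred from the reported t/d pairs, not stated in the text

for t in (3.609, 2.051, 3.082, 1.833, -1.888):
    print(t, round(cohens_d_from_paired_t(t, ASSUMED_N), 3))
```

      Under these assumptions the computed values agree with the reported effect sizes to three decimals, which suggests, but does not prove, that this convention was used.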

      Effects of predictability on STN-cortex coherence

      “With respect to the effect of predictability of movement instructions on beta coherence (research aim 2), we found that the pre-post event differences were generally more positive in the unpredictable condition (main effect of predictability condition; F<sub>cond</sub>(1,15) = 8.684, p<sub>cond</sub> = 0.010, η<sub>p</sub><sup>2</sup> = 0.367; Supplementary File 3), meaning that the suppression following movement start was diminished and the increases following stop and reversal were enhanced in the unpredictable condition (Fig. 6a). This effect was most pronounced in the MSMC (Fig. 6b). When comparing region-average TFRs between the unpredictable and the predictable condition, we observed a significant difference only for stopping (t<sub>clustersum</sub> = 142.8, p = 0.023), suggesting that the predictability effect was mostly carried by increased beta coherence following stops. When repeating the rmANCOVA for pre-event coherence, we did not observe an effect of predictability (F<sub>cond</sub>(1,15) = 0.163, p<sub>cond</sub> = 0.692, η<sub>p</sub><sup>2</sup> = 0.011), i.e. the effect was most likely not due to a shift of baseline levels. The increased tendency for upward modulations and decreased tendency for downward modulations rather suggests that the inability to predict the next cue prompted intensified event-related interaction between STN and cortex. STN-cortex gamma coherence was not modulated by predictability (F<sub>cond</sub>(1,15) = 0.005, p<sub>cond</sub> = 0.944, η<sub>p</sub><sup>2</sup> = 0.000, Supplementary File 5).”

      Discussion, Beta coherence and beta power are modulated by predictability

      “In the present paradigm, patients were presented with cues that were either temporally predictable or unpredictable. We found that unpredictable movement prompts were associated with stronger upward modulations and weaker downward modulations of STN-cortex beta coherence, likely reflecting the patients adopting a more cautious approach, paying greater attention to instructive cues. Enhanced STN-cortex interactions might thus indicate the recruitment of additional neural resources, which might have allowed patients to maintain the same movement speed in both conditions. […]”

      With respect to power, we observed reduced beta suppression in the unpredictable condition at movement start, consistent with the effect on coherence, likely demonstrating a lower level of motor preparation.

      Given that you have a nice continuous data task here - the turning of the wheel, it might be interesting to cross-correlate the circular position (and separately - velocity) of the turning with the envelope of the beta signal. This would be a nice finding if you could also show that the beta is modulated continuously by the continuous movements. In the natural world, we rarely do a continuous movement with a sudden reversal, or stop, most of the time we are in continuous movement. Looking at this might also be a strength of your dataset.

      We could not agree more. In fact, having a continuous behavioral output was a major motivation for choosing this particular task. We are very interested in state space models such as preferential subspace identification (Sani et al., 2021), for example. These models relate continuous brain signals to continuous behavioral target variables and should be of great help for questions such as: do oscillations relate to moment-by-moment adaptations of continuous movement? Which frequency bands and brain areas are important? Is angular position encoded by different brain areas/frequency bands than angular speed? These analyses are in fact ongoing. This project, however, is too large to fit into the current article.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study is an important follow-up to their prior work - Wong et al. (2019), starting with clear questions and hypotheses, followed by a series of thoughtful and organized experiments. The method and results are convincing. Experiment 1 demonstrated the sensory preconditioned fear with few (8) or many (32) sound-light pairings. Experiments 2A and 2B showed the role of PRh NMDA receptors during conditioning for online integration, revealing that this contribution is present only after a few sound-light pairings, not after many sound-light pairings. Experiments 3A and 3B showed the contribution of PRh-BLA communication to online integration, again only after a few but not after many. Contrary to Experiments 3A and 3B, Experiments 4A and 4B showed the contribution of PRh-BLA communication to integration at test only after many but not few sound-light pairings.

      Strengths:

      Throughout the manuscript, the methods and results are clearly organized and described, and the use of statistics is solid, all contributing to the overall clarity of the research. The discussion section was also well-written, effectively comparing the current research with the prior work and offering insightful interpretations and potential future directions for this line of research. I have only a limited amount of concerns about some results and some details of experiments/statistics.

      We thank the reviewer for their positive assessment.

      Weaknesses:

      Could you provide further interpretation regarding line 171: the observation that sensory preconditioned fear increased with the number of sound-light pairings? Was this increase due to better sound-light association learning during Stage 1? Additionally, were there any experimental differences between Experiment 1 and the other experiments that might explain why freezing was higher in the P32 group compared to the P8 group? This pattern seemed to be absent in the other experiments. If we consider the hypothesis that the online integration mechanism is more active with fewer pairings and the chaining mechanism at the test is more prominent with many pairings, we wouldn't expect a difference between the P8 and P32 groups. Given the relatively small sample size in Experiment 1, the authors might consider conducting a cross-experiment analysis or something similar to investigate this further.

      We appreciate the reviewer’s point and thank them for the question. The heightened level of sensory preconditioned fear among rats that received many sound-light pairings in the initial control experiment (Group P32) may reflect the combined effects of both mediated learning and chaining at test. We are, however, reluctant to offer a strong interpretation of this result as it was not replicated in the subsequent experiments: i.e., the levels of freezing to the sensory preconditioned stimulus at test were almost identical among vehicle-injected controls that received either few (8) or many (32) sound-light pairings in Experiments 2A and 2B; and this was also true in Experiments 3A and 3B, and again in Experiments 4A and 4B. A key difference between the initial and subsequent experiments is that, in contrast to the initial experiment, rats in subsequent experiments underwent surgery for one reason or another (implantation of cannulas, lesion of the perirhinal cortex). The implication is that surgical interventions in the perirhinal cortex and/or basolateral amygdala might affect the way that rats integrate the sound-light and light-shock associations in sensory preconditioning: i.e., they may force rats to rely on one type of integration strategy or the other. This is, of course, purely speculative – it will be addressed in future research.

      Reviewer #2 (Public review):

      This manuscript builds on the authors' earlier work, most recently Wong et al. 2019, in which they showed the importance of the perirhinal cortex (PRh) during the first-order conditioning stage of sensory preconditioning. Sensory preconditioning requires learning between two neutral stimuli (S2-S1) and subsequent development of a conditioned response to one of the neutral stimuli after pairing of the other stimulus with a motivationally relevant unconditioned stimulus (S1-US). One highly debated question regarding the mechanisms of learning of sensory preconditioning has been whether conditioned responses evoked by the indirectly trained stimulus (S2) occur through a mediated representation at the time of the first-order US training, or whether the conditioned responses develop through a chained evoked representation (S2--> S1 --> US) at the time of test. The authors' prior findings provided strong evidence for PRh being involved in mediated learning during the first-order training. They showed that protein synthesis was required during the first-order S1-US learning to support the conditioned response to the indirectly trained stimulus (S2) at the test.

      One question remaining following the previous paper was whether certain conditions may promote a chaining mechanism over mediated learning, as there is some evidence for chained representations at the time of the test. In this paper, the authors directly address this important question and find unambiguous results that the extent of training during the preconditioning stage impacts the involvement of PRh during the first-order conditioning or stage 2. They show that putative blockade of synaptic changes in PRh, using an NMDA antagonist, disrupts responding to the preconditioned cue at test during shorter duration preconditioning training (8 trials), but not during extended training (32 trials). They also show that this is the case for communication between the PRh and BLA during the same stage of training using a contralateral inactivation approach. This confirms their previous findings in 2019 of connectivity between these regions for the short-duration training, while they observe here for the first time that this is not the case for extended training. Finally, they show that with extended training, communication between BLA and the PRh is required at the final test of the preconditioned stimulus, but not for the short duration training.

      The results are clear and extremely consistent across experiments within this paper as well as with earlier work. The experiments here are thorough, and well-conceived, and address an important and highly debated question in the field regarding the neural and psychological mechanisms underlying sensory preconditioning. This work is highly impactful for the field as the debate over mediated versus chaining mechanisms has been an important topic for more than 70 years.

      We thank the reviewer for their kind assessment.

      Reviewer #3 (Public review):

      The authors tested whether the number of stimulus-stimulus pairings alters whether preconditioned fear depends on online integration during the formation of the stimulus-outcome memory or during the probe test/mobilization phase, when the original stimulus, which was never paired with aversive events, elicits fear via chaining of stimulus-stimulus and stimulus-outcome memories. They found that sensory preconditioning was successful with either 8 or 32 stimulus-stimulus pairings. Perirhinal cortex NMDA receptor blockade during stimulus-outcome learning impaired preconditioning following 8 but not 32 pairings during preconditioning. Therefore, perirhinal cortex NMDA activity is required for online integration or mediated learning. Perirhinal-basolateral amygdala had nearly identical effects with the same interpretation: these areas communicate during stimulus-outcome learning, and this online communication is required for later expressing preconditioned fear. Disconnection prior to the probe test, when chaining might occur, had different effects: it impaired the expression of preconditioned fear in rats that received 32, but not 8, pairings during preconditioning. The study has several strengths and provides a thoughtful discussion of future experiments. The study is highly impactful and significant; the authors were successful in describing the behavioral and neurobiological mechanisms of mediated learning versus chaining in sensory preconditioning, which is often debated in the learning field. Therefore this study will have a significant impact on the behavioral neurobiology and learning fields.

      Strengths:

      Careful, rigorous experimental design and statistics.

      The discussion leaves open questions that are very much worth exploring. For example - why did perirhinal-amygdala disconnection prior to the probe have no effect in the 8-pairing group, when bilateral perirhinal inactivation did (in Wong et al, 2019)? The authors propose that perirhinal cortex outputs bypass the amygdala during the probe test, which is an excellent hypothesis to test.

      The authors provide evidence that both mediated learning and chaining occur.

      Thank you for the positive assessment – we fully intend to identify the circuitry that regulates retrieval/expression of sensory preconditioned fear when it is based on mediated learning in stage 2.

      Weaknesses:

      This is inherent to all neural interference and behavioral experiments: biological/psychological functions do not typically operate binarily. There is no single clear number or parameter at which mediated learning or chaining happens, and both probably happen to some extent. Addressing this is even more difficult given behavioral variability across subjects, implant sites, etc. Thus, this is not so much a weakness particular to this study as much as an existential problem, which the authors were able to work around with careful experimental design and appropriate controls.

      We completely agree with the point raised here and thank the reviewer for their assessment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It appears that the method description for Sensory Preconditioning was copied from their previous Wong et al. (2019) paper, which is fine, but in the current research, the authors use 8 or 32 presentations, which is not reflected in the description.

      Thank you for bringing this to our attention. This is now addressed in the method section on page 27 (beginning at line 655):

      “Rats received either eight presentations of the sound and eight of the light in a single session, or 32 presentations of the sound and 32 of the light across four daily sessions. On Day 3, all rats received eight presentations of the sound and eight of the light. Each presentation of the sound was 30 s in duration and each presentation of the light was 10 s in duration. The first stimulus presentation occurred five min after rats were placed into the chambers. The offset of one stimulus co-occurred with the onset of the other stimulus for groups that received paired presentations of the sound and the light, while these stimuli were presented separately for groups that received explicitly unpaired presentations. The interval between each paired presentation was five min while the interval between each separately presented stimulus was 150 s. After the last stimulus presentation, rats remained in the chambers for an additional one min. They were then returned to their home cages. This training was repeated on Days 4-6 for rats that received 32 presentations of the sound and 32 of the light. All rats proceeded to first-order conditioning (details below) the day after their final session of sound and light exposures, which was Day 4 for rats exposed to eight presentations of the sound and light and Day 7 for rats exposed to 32 presentations of the sound and light.”

      (2) Line 148: Could the authors clarify how the "significant linear increase" was assessed? From similar descriptions in later experiments, it seems it was based on a comparison of freezing across the four presentations, but the F(1,26) statistic suggests there seemed to be a half-split test. The same questions exist in all the experiments. Please clarify.

      Conditioning data were analysed using contrasts with repeated measures in ANOVA. The repeated measures (or within-subject) factor was “trial” as all rats were exposed to four light-shock pairings in this stage of training. We examined whether there was a significant linear increase in freezing across trials using a standard within-subject contrast. The specific coefficients for this contrast, given the four trials, were -3, -1, 1, and 3. The reason that the degrees of freedom remain 1 and 26 in this analysis is because the within-subject contrast is part of a set of planned orthogonal contrasts. That is, in any planned analysis of the sort conducted here, the df1 will always be 1, indicating the very nature of the analysis. There was no splitting of the data, or comparisons between the split halves.
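
      The logic of this contrast can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code used for the study: the function names and the toy freezing scores are hypothetical, and group means are averaged with equal weight. Each rat's four trial scores are collapsed into a single linear-trend score, and the average of those scores is tested against the pooled within-group error, which gives df1 = 1 and df2 = N minus the number of groups.

```python
from statistics import mean

# Orthogonal polynomial coefficients for four equally spaced trials
LINEAR = (-3, -1, 1, 3)
QUADRATIC = (1, -1, -1, 1)
CUBIC = (-1, 3, -3, 1)


def contrast_score(trials, coeffs=LINEAR):
    """Collapse one rat's trial-by-trial scores into a single contrast score."""
    return sum(c * y for c, y in zip(coeffs, trials))


def linear_trend_f(groups, coeffs=LINEAR):
    """Test the trend averaged over groups; returns (F, df1, df2).

    groups maps a group label to a list of per-rat lists of trial scores.
    """
    scores = [[contrast_score(rat, coeffs) for rat in rats] for rats in groups.values()]
    n_total = sum(len(s) for s in scores)
    n_groups = len(scores)
    # Pooled within-group variability of the contrast scores (error term)
    ss_error = sum(sum((x - mean(s)) ** 2 for x in s) for s in scores)
    df_error = n_total - n_groups
    ms_error = ss_error / df_error
    # Equal-weight average of the group means of the contrast scores
    estimate = mean(mean(s) for s in scores)
    variance = ms_error * sum(1 / len(s) for s in scores) / n_groups ** 2
    return estimate ** 2 / variance, 1, df_error


# Hypothetical freezing scores (%), three rats per group, four trials each
toy = {
    "P8": [[10, 20, 30, 40], [0, 20, 40, 50], [10, 10, 30, 30]],
    "U8": [[0, 10, 20, 30], [5, 15, 25, 45], [0, 0, 20, 40]],
}
f_value, df1, df2 = linear_trend_f(toy)
print(f"F({df1},{df2}) = {f_value:.2f}")
```

      Because the whole trial factor is reduced to one score per rat, df1 is 1 regardless of the number of trials. An error df of 26 would arise, for example, from 30 rats in four groups.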

      (3) Line 154: Could the authors clarify what is meant by "other main effects and their interactions"? It is not clearly inferable from the context.

      Apologies for the confusion here. “Other main effects” refer to the two between-subject factors in isolation: i.e., the overall comparison of freezing to the light (averaged across the four trials) between groups that received either paired or unpaired stimulus presentations in stage 1 (factor 1 → main effect 1), and between groups that received either eight or 32 sound and light exposures in stage 1 (factor 2 → main effect 2). “Their interaction” refers to the assessment of whether the overall difference in freezing to the light (averaged across the four trials) between Groups P8 and U8 differs from the overall difference in freezing to the light (averaged across the four trials) between Groups P32 and U32. We have edited the text near line 153 to indicate that:

      “The overall comparisons of freezing to the light (averaged across the four conditioning trials) between groups that received either paired or unpaired stimulus presentations in stage 1 (factor 1), and between groups that received either eight or 32 sound and light exposures in stage 1 (factor 2), were not significant (Fs < .45, p > .508). The interaction between these two between-subject factors was also not significant (F < .45, p > .508).”

      (4) The use of sound and light as preconditioned and conditioned cues are counterbalanced. Was there any difference in the increase of freezing during conditioning depending on the type of conditioned cues? Was there any difference in the preconditioned fear? While it is hard to assess statistical significance due to the sample size limit, even observing a trend could be interesting.

      We examined whether the levels of freezing to the conditioned and preconditioned stimuli depend on their physical identity. In general, there was a slight trend towards more freezing to the preconditioned stimulus when it was a tone, and less freezing to the conditioned stimulus when it was a tone. These are, however, simply indications. None of the statistical comparisons between rats for which the preconditioned stimulus was the tone (and, thereby, conditioned stimulus was the light) and rats for which the preconditioned stimulus was the light (and, thereby, conditioned stimulus was the tone) reached the conventional level of significance.

      (5) General suggestion on reporting non-significant statistics: the authors reported a small F statistic value a few times to suggest non-significance. But without clearly specifying degrees of freedom, it is hard to get a sense of statistical significance (e.g. Line 227, largest F<3.10). I recommend adding p values alongside the F statistics and reporting exact statistics whenever possible.

      Apologies for the omission. The p values have now been included alongside all non-significant F statistics.

      (6) Another general suggestion is to use non-parametric statistical testing with such small sample sizes. I recommend using the Kruskal-Wallis H test (the non-parametric equivalent of F-statistic) to replace the ANOVA result. Also, given many tests only involve comparing two independent groups, using Mann-Whitney U test (the non-parametric equivalent of independent t-test) would be sufficient.

      We understand that small sample sizes can occasionally lead to unequal variances between groups, which necessitates the use of non-parametric statistics. However, as non-parametric statistics raise a different set of issues for data analysis (e.g., power) and interpretation, our general view for the type of data collected in this study is that parametric analyses are appropriate and should be retained (particularly in the absence of unequal variances between groups). We hold this view for two reasons. First, the hypotheses tested in the present series were derived from past work in which parametric analyses revealed meaningful patterns of results at the same level of statistical power. Second, the application of these analyses then yielded results consistent with our hypotheses: for the most part, we observed between-group differences where we expected there to be such differences and did not observe between-group differences where we did not expect there to be such differences. As such, we have not switched from a parametric to non-parametric analysis strategy. We do, however, appreciate the suggestion and will apply a non-parametric approach where it is warranted in our future work.

      Reviewer #2 (Recommendations for the authors):

      I have a few very minor comments for the authors regarding the discussion and interpretation of the very nice experimental results.

      (1) In Figures 4 and 5, the authors provide a schematic of the experiment. It's very clearly indicated whether the BLA inactivation is ipsi- or contralateral, but the unilateral PRh lesion isn't mentioned. I'd recommend including that here so that someone reading through the figures can more easily understand the experiment. The hypothesis is clear and the experiment is so well designed that a read through of the figures can relay most information to an experienced reader.

      Thank you for this suggestion – we have included information about the unilateral PRh lesion in the schematic for Figures 4 and 5.

      (2) The authors have an extended description of backward conditioning in the discussion. It seems like the authors are suggesting this as an important future direction, but they never explicitly say this, resulting in a bit of confusion as to what this section refers to. Also, Ward-Robinson and Hall 1996 showed backward sensory preconditioning using a serial auditory-visual association and argued for a mediated solution based on their results. It may be worth citing that paper here.

      Apologies for the lack of clarity. We have revised this point in the discussion (page 18, beginning line 434) and referenced Ward-Robinson and Hall (1996):

      “Why does increasing the number of sound-light pairings change the way that rats integrate the sound-light and light-shock memories? One possibility is that increasing the number of sound-light pairings in stage 1 reduces the ability of each stimulus to activate the memory of the other. This is consistent with findings by Holland (1998), who showed that the likelihood of mediated learning in rats decreases with the amount of training (see also Holland, 2005); but inconsistent with our findings that, after extended training, rats continue to integrate the sound-light and light-shock associations through chaining at the time of testing (as chaining is predicated on the sound activating the memory of the light after extended training). Instead, we propose that the change in integration occurs because the increased number of sound-light pairings allows the rats to learn about the order in which the sound and light are presented (Figure 1; for evidence that rats acquire order information in sensory preconditioning, see Barnet et al., 1997; Hart et al., 2022; Leising et al., 2007; Miller & Barnet, 1993). This order hypothesis is consistent with evidence showing that the way in which animals represent an audio-visual compound changes across repeated compound exposures (e.g., Bellingham & Gillette, 1981; Holmes & Harris, 2009). It can be tested using a so-called “backward” sensory preconditioning protocol, which reverses the order of stimulus presentations in stage 1 (e.g., Ward-Robinson & Hall, 1996). That is, rather than rats being exposed to the “forward” sound-light pairings used here and by Wong et al. (2019), rats in a backward protocol are exposed to light-sound pairings. Increasing the number of light-sound pairings in this protocol should result in rats learning that the light is followed by the sound (light→sound) and that the sound is followed by nothing (sound→nothing). 
Hence, during the session of light-shock pairings in stage 2, the light should continue to activate the memory of the sound, resulting in formation of the mediated sound-shock association (e.g., Ward-Robinson & Hall, 1996). That is, if our order hypothesis is correct, increasing the number of light-sound pairings in the backward protocol should preserve the likelihood of mediated learning in stage 2 and, if anything, diminish the likelihood of chaining at test in stage 3 (as the sound is never followed by a light). Hence, PRh manipulations that fail to affect fear of the sound when administered after many sound-light pairings (e.g., infusion of DAP5) should disrupt that fear when administered after many light-sound pairings in the backward protocol. This will be assessed in future work.”

      (3) Line 467 in the discussion suggests that the results are surprising that PRh-BLA communication is not needed at test when learning putatively occurs through a mediated mechanism during first-order conditioning. I was a bit surprised by this comment since I was under the assumption that only BLA was required at this point after consolidation of the mediated learning. Holmes et al., 2013 showed that BLA is required for extinction to S2 after first-order conditioning. In that experiment they inactivated BLA during S2- presentations (typically considered the extinction test), and showed that reduction to S2 did not occur the subsequent day, indicating the memory was stored in BLA and may not necessarily require PRh-BLA communication.

      The result noted here was somewhat surprising as our past studies showed that silencing activity in the PRh prior to testing attenuates freezing to a sensory preconditioned stimulus (i.e., an S2). We took this to mean that the PRh is necessary for retrieval/expression of fear to S2 and supposed that this retrieval/expression would be achieved through communication between the PRh and BLA. However, the results of the PRh-BLA disconnection at test show that this communication is not required, leaving us to speculate that retrieval/expression of fear to S2 may be achieved through communication between the PRh and CeA.

      We have edited the opening of the relevant paragraph to clarify why the result noted here was surprising (page 20, beginning line 485):

      “While the PRh and BLA clearly communicate to support mediated learning about the sound, this communication is not required for retrieval/expression of the mediated sound-shock association at the time of testing. This result is somewhat surprising as activity in the PRh is needed for expression of fear to the sound (Holmes et al., 2013; Wong et al., 2019) and raises the question: how does the PRh-dependent sound-shock association come to be expressed in fear responses?”

      (4) The authors reference Holland 1981 and 1998, yet there's not much discussion of these findings. I think there should be a bit more emphasis on these studies since they show how mediated learning greatly depends on the extent of training. Also, it may be worth considering Holland's theory of why mediated conditioning is more effective with shorter training. His theory may be consistent with the authors, but I believe he suggests that early in training a stronger mediated representation is evoked which tends to dissipate with time. I think this is a valid hypothesis to consider in this paper.

      The Holland papers show that rats form mediated associations (Holland, 1981) and that the likelihood of them doing so decreases with the amount of training (Holland, 1998). These findings are paralleled by those reported in the present series of experiments. However, the protocols used by Holland were very different to those used in the present study; and the explanation for his 1998 findings (which is the more relevant of the two papers) simply does not apply to the case of sensory preconditioning.

      To be clear: Holland (1998) exposed rats to either “few” or “many” tone-food pairings in stage 1, tone-lithium chloride pairings in stage 2 and, finally, tested rats with the food alone in stage 3. He predicted and showed that those exposed to few tone-food pairings showed an aversion to the food at test (i.e., they consumed less of the food than controls) whereas those exposed to many tone-food pairings showed no such aversion (i.e., they consumed the same amount of food as the controls). This was taken to mean that, across the series of tone-lithium pairings, the tone activated the memory of food among rats in the few condition, resulting in a mediated food-lithium association; but failed to do so among rats in the many condition, resulting in no food-lithium association. According to Holland, the tone failed to activate the memory of food in the many condition because, by the end of training in stage 1, it was not needed for them to know what to do when the tone was presented: they simply had to run to the magazine to collect the food when delivered. That is, the tone eventually associated with the responses that rats emitted in the training situation, thereby obviating any need for activation of the food memory.

      While this explanation is both elegant and interesting, it cannot be applied to the results obtained in the present study where the initial stage of training involved few or many sound-light pairings. That is, unlike in the Holland study where rats in the many condition eventually learned a stimulus-“run to magazine” association that maintained performance in the absence of any mental image of food, in the present study, any stimulus-response association acquired in stage 1 (e.g., orienting responses towards the sources of the auditory and visual stimuli) cannot have contributed to the expression of sensory preconditioned fear at test. Hence, stimulus-response learning in the many condition cannot be invoked to explain the pattern of results in the present study, even if it adequately explains what-appears-to-be a similar finding in the Holland study.

      Nonetheless, we have included a reference to the general style of explanation that was considered and rejected by Holland in his 1998 and 2005 papers. This appears on page 18 (beginning line 434) and reads:

      “Why does increasing the number of sound-light pairings change the way that rats integrate the sound-light and light-shock memories? One possibility is that increasing the number of sound-light pairings in stage 1 reduces the ability of each stimulus to activate the memory of the other. This is consistent with findings by Holland (1998), who showed that the likelihood of mediated learning in rats decreases with the amount of training (see also Holland, 2005); but inconsistent with our findings that, after extended training, rats continue to integrate the sound-light and light-shock associations through chaining at the time of testing (as chaining is predicated on the sound activating the memory of the light after extended training). Instead, we propose that the change in integration occurs because the increased number of sound-light pairings allows the rats to learn about the order in which the sound and light are presented (Figure 1; for evidence that rats acquire order information in sensory preconditioning, see Barnet et al., 1997; Hart et al., 2022; Leising et al., 2007; Miller & Barnet, 1993)…”

      (5) There is also a Holland 2005 paper in which he tests whether extended training of the initial stimulus associations may result in a reduced associability of those stimuli. This would potentially result in lower mediated learning due to a decreased associability of the mediated representation, thereby explaining why extended training reductions in mediated learning occur. Using a probabilistic design, Holland shows that this reduction in mediated learning is likely not due to a change in associability.

      We appreciate the note re Holland (2005) and have included a reference to it in our General Discussion. We agree with Holland that the reduction in mediated learning across extended training is not due to reduced associability of the retrieved stimulus representation. If this were the case, it would remain to explain why stimulus representations continue to be activated at test, which must occur for successful chaining of the sound-light and light-shock associations upon presentations of the sound alone. This is included in the modified text on page 18 (beginning line 434), which is part of our response to point 4.

      Reviewer #3 (Recommendations for the authors):

      (1) I think the 4th intro paragraph is essentially saying that more pairings during preconditioning encourage chaining as opposed to mediated learning - I might recommend clarifying this a bit. It took me a while to put it together.

      Apologies for the confusion. We have clarified the argument at this point in the Introduction with the following insertion on page 4 (beginning line 84):

      “That is, increasing the number of sound-light pairings may allow rats to encode information about stimulus order in stage 1 and, thereby, shift the locus of integration from mediated conditioning in stage 2 to chaining at test in stage 3 (Holmes et al., 2022).”

      (2) In analyzing test data I am assuming percent freezing is the average of the entire 30s or 10s CS period - could this be clarified?

      This is correct and has been clarified in the section for ‘Scoring and Statistics’ on page 29 (beginning line 708):

      “Freezing data were collected using a time-sampling procedure in which each rat was scored as either ‘freezing’ or ‘not freezing’ every two seconds by an observer blind to the rat’s group allocation. A percentage score was then calculated by dividing the number of samples scored as freezing by the total number of samples. The baseline level of freezing was established by scoring the first two min at the start of each experimental session: i.e., we divided the total number of samples scored as freezing by the total number of observed samples, which was 60. The levels of freezing to the 10 s conditioned stimulus and 30 s preconditioned stimulus were established in a similar manner: we scored the entire period of each stimulus presentation and divided the number of samples scored as freezing by the total number of observed samples, which was 5 for each presentation of the conditioned stimulus and 15 for each presentation of the preconditioned stimulus.”
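      The scoring arithmetic quoted above can be illustrated with a minimal sketch. It assumes only what the quoted text states (one observation every two seconds, and a percentage equal to the samples scored as freezing divided by the total samples). The function names `n_samples` and `percent_freezing` are hypothetical and are not taken from the authors' materials.

```python
def n_samples(duration_s, interval_s=2):
    """Number of observations in a scoring window sampled every `interval_s` seconds."""
    return duration_s // interval_s


def percent_freezing(samples):
    """Percentage of observations scored as freezing.

    `samples` is a sequence of booleans, one per observation (True = freezing).
    """
    if not samples:
        raise ValueError("no samples were scored")
    return 100.0 * sum(samples) / len(samples)


# Window lengths described in the quoted text (hypothetical variable names).
baseline_n = n_samples(120)        # first two minutes of a session
conditioned_n = n_samples(10)      # 10 s conditioned stimulus
preconditioned_n = n_samples(30)   # 30 s preconditioned stimulus

# Illustrative scores for one 10 s presentation: freezing on 3 of 5 samples.
example_score = percent_freezing([True, False, True, True, False])
```

      Under these assumptions the three windows yield 60, 5 and 15 samples, matching the totals given in the quoted text.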

      (3) Complementary to the above - during the probe test is there a difference during the first/last 2s of the CS? This would be interesting with respect to understanding the associative structure encoded.

      We have previously examined whether freezing responses change across the duration of a 30 s preconditioned stimulus and a 10 s conditioned stimulus. We have never seen any such changes: in our past work and in the present series of experiments, the expression of freezing is largely uniform across each presentation of a preconditioned or conditioned stimulus.

      (4) It is sort of unclear to me why more CS-CS pairings produced stronger preconditioned fear - is it that both mediated learning and chaining occur and giving 32 pairings permits both processes more than 8 pairings?

      This is a very reasonable explanation for the heightened level of sensory preconditioned fear among rats that received many sound-light pairings in the initial control experiment. We are, however, reluctant to offer a strong interpretation of this result as it was not replicated across subsequent experiments in the series: i.e., the levels of freezing to the sensory preconditioned stimulus at test were largely the same among vehicle-injected controls that received either few (8) or many (32) sound-light pairings in Experiments 2A and 2B, and again in Experiments 3A and 3B as well as Experiments 4A and 4B.

      (5) I would suggest individual data points overlaid on the bars, violin plots, or box and whisker plots to provide a better visualization of the data.

      We appreciate the suggestion. Individual data points have been overlaid on the bars in each histogram.

      (6) There are other citations that would strengthen arguments for the idea that unidirectional/temporal associative structure can be acquired during (appetitive) sensory preconditioning: Leising 2007 Learning and Behavior, Hart 2022 Current Biology, for example.

      Thank you for these citations. We have included references to the Leising et al. (2007) and Hart et al. (2022) papers in our discussion on pages 18-19 (beginning line 442):

      “Instead, we propose that the change in integration occurs because the increased number of sound-light pairings allows the rats to learn about the order in which the sound and light are presented (Figure 1; for evidence that rats acquire order information in sensory preconditioning, see Barnet et al., 1997; Hart et al., 2022; Leising et al., 2007; Miller & Barnet, 1993)…”

      Editor's note:

      We agree with the suggestions about full statistical reporting for non-significant results and about putting individual data points, perhaps coded to identify sex, on top of the bar graphs. Both will increase the transparency of the rigor of the work for readers.

      We thank the editors and reviewers for their suggestions. We have included full statistical reporting for non-significant results and overlaid individual data points on the bars in each histogram.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Summary:

      The behavioral switch between foraging and mating is important for resource allocation in insects. This study investigated the role of the neuropeptide, sulfakinin, and of its receptor, the sulfakinin receptor 1 (SkR1), in mediating this switch in the oriental fruit fly, Bactrocera dorsalis. The authors use genetic disruption of sulfakinin and of SkR1 to provide strong evidence that changes in sulfakinin signaling alter odorant receptor expression profiles and antennal responses and that these changes mediate the behavioral switch. The combination of molecular and physiological data is a strength of the study. Additional work would be needed to determine whether the physiological and molecular changes observed account for the behavioral changes observed.

      Strengths:

      (1) The authors show that sulfakinin signaling in the olfactory organ mediates the switch between foraging and mating, thereby providing evidence that peripheral sensory inputs contribute to this important change in behavior.

      (2) The authors' development of an assay to investigate the behavioral switch and their use of different approaches to demonstrate the role of sulfakinin and SkR1 in this process provides strong support for their hypothesis.

      (3) The manuscript is overall well-organized and documented.

      Weaknesses:

      (1) The authors claim that sulfakinin acts directly on SkR1-positive neurons to modulate the foraging and mating behaviors in B. dorsalis. The authors also indicated in the schematic that satiation suppresses SkR1 expression. Additional experiments and a more detailed discussion of the results would help support these claims.

      (2) The findings reported could be strengthened with additional experimental details regarding time of day versus duration of starvation effects and additional genetic controls, amongst others.

      Recommendations for the authors:

      Major issues

      (1) As written the introduction is somewhat fragmented and does not lay out a clear rationale for the current study in the species used by the authors. Others, including Guo et al. (2021) and Wang et al. (2022), have previously shown that sulfakinin signaling pathways are important for feeding and receptivity regulation in D. melanogaster. Thus, the novelty of this study should be more clearly articulated.

      The introduction has been substantially revised to better describe the rationale of the study (lines 60-66 in the revision).

      (2) In addition, the Introduction should provide more specific background information on the pheromonal activity of oriental fruit fly body extract, the odor-preferences, and the sex pheromone of this species compared to that of model insects such as Drosophila melanogaster.

      The revised introduction now contains a paragraph on the chemical ecology of the oriental fruit fly as it relates to this study (lines 67-75).

      (3) It isn't clear what the first image in Figure 1C represents - is this a schematic of the area or does it represent data?

      Fig. 1C and the associated figure caption have been revised. The tracks are now easier to see because we changed their colors. The figure caption now reads: “Representative foraging trajectories in the 100 mm diameter arenas within a 15-min observation period of flies starved for different durations.”

      (4) The authors should include examples of the EAG recordings following the stimulation with food volatiles or pheromones, not only the results of their analyses. This could be included in the main figures or even in supporting information.

      As suggested, we added examples of the EAG recordings following stimulation with food odors and body extracts to Figure 1 and Figure 3.

      (5) The demonstration that removal of the antennae severely impairs mating is dispensable because the antennae are required for other functions in addition to olfaction.

      We agree that the roles of the antennae are likely more than the olfactory function. As suggested, we removed the data.

      (6) It is currently difficult to understand how the authors measured successful rates of foraging. Please provide more details.

      In the revision, we added a sentence describing the measurement method in detail. See lines 269-273.

      (7) The expression of sulfakinin does not change significantly in the antennae following starvation (Figure 2A). Do the authors know whether they change in the central nervous system under these conditions? Have the authors (or has anyone else) checked the expression pattern of sulfakinin in the antennae? This information would help determine whether the sulfakinin signal that acts on SkR1 is released from neurons in the central nervous system (Figure S4C) or whether it is also released from the neurons in the olfactory organs. Based on the immunochemistry results shown in Figure S4C, it would also be interesting to determine whether the intensity of anti-sulfakinin immunoreactivity changes before versus after starvation. This could help establish whether sulfakinin is released during starvation.

      We added expression data showing that the mRNA level of Sk in the head is higher after refeeding (Fig. S3). The change in Sk expression is also described in the text (lines 107-110). We were unable to identify Sk neurons in the antennae, which suggests the possibility of a direct action of humoral Sk on the antennae.

      (8) In Figure 2A, the authors show that the expression levels of some neuropeptides system components change during starvation. However, it would be helpful if the authors could include more detailed information on how the results are shown in the figure legends (e.g., the expression level of each candidate in fed flies was set as 1, etc).

      We revised the legend of Figure 2 to explain how the expression values are presented.

      (9) In Figure 2D, null mutant males of sulfakinin and SkR1 consume more food at all times compared to the wild type. However, the corresponding mutant females consume more food only at night. Is this because the wild-type female flies eat more food during the day? In a related issue, Figure 2D shows differences in food consumption measured at different times of day, however, this is not directly addressed in the text, which instead mentions that "the amount of excess food consumed by the mutants was dependent on the duration of the starvation period in both sexes".

      Thank you for the important suggestions. We speculate that the difference in food consumption in females occurs only at night because the high basal feeding rate of females during the daytime masks the increase in feeding caused by the knockout of Sk signaling. As suggested, we have added a description of the difference in food consumption. In addition, we changed the Y-axis scale in the figure to allow a fair comparison between males and females. See lines 123-128.

      (10) It isn't clear how the time of day relates to the duration of starvation. This suggests that mutant females only consume more at 21:00 (presumably at night) whereas males consume more throughout the day. Does this suggest an interaction with the circadian system? What is the duration of starvation in Figure 3A? In a related issue, in Figure 4 it would be useful to know what time of day the EAG analysis was done because the data shown in Figure 2D suggests that the time of day significantly impacts behavioral responses. And does the red versus blue color scheme of the OR subunits represent up/downregulated levels in wild-type animals? Please define this for the reader.

      In addition to our response to point 9 regarding food consumption in females: as the reviewer noted, there was indeed a diurnal difference in the amount of food consumed by B. dorsalis. However, we have not investigated in depth whether this is related to circadian rhythms. When measuring food intake at these three times of day, we ensured that the duration of starvation was the same (12 h) in all cases. The duration of starvation in Figure 3A is 12 h. We have mentioned this in the manuscript. See lines 267-268.

      The EAG responses to sex pheromones and body surface extracts were measured from 21:00-23:00, and responses to food odor were measured from 9:00-11:00. The times of the experiments are described in the revision. See lines 309-311.

      Accordingly, we revised the figure caption to explain the colored fonts. Red denotes a set of ORs related to foraging and blue denotes a set of ORs related to mating. Thus, the ORs in red were upregulated and the ORs in blue were downregulated in starved wild-type flies. We have defined this in the revised manuscript. See lines 672-673.

      (11) The authors convincingly show that SKR1 is present in the antennae and is co-expressed with orco. It would be useful to discuss whether this receptor is also expressed in other tissues where there may be additional sites of action of this pathway.

      Indeed, SkR1 is also expressed in the Drosophila brain. We added a discussion of the expression and additional sites of action of SkR1 within the central nervous system. See lines 200-205.

      (12) It isn't clear what the dotted arrows in the model shown in Figure 5 represent.

      Dashed arrows represent additional possible pathways that were not tested in this study but are not excluded by the model. Please see the discussion for details of additional possible factors modulating odorant sensitivity relevant to satiety. See lines 210-229.

      (13) In Figure 5, the authors indicate that satiation suppresses SkR1 expression. It would be helpful if the authors tested the expression level of SkR1 in re-fed flies (by feeding the flies after 12h starvation) to see whether levels of expression are rapidly restored to the levels seen in satiated animals. Such a result could further support the claims made by the authors.

      Thank you for your suggestion. Indeed, refeeding after 12 h of starvation significantly decreased SkR1 expression. We added this result to the supporting information (Fig. S3; see line 713). For the results, see lines 107-110.

      (14) The authors show that locomotor activity is unaffected in the mutants but body size comparison would be more useful here since this could also contribute to baseline differences in meal size.

      In the revision, we provide a body size comparison between WT and Sk-/- flies in the supplementary data. The results show that mutant flies have the same body size as WT flies (Fig. S7; see line 742). For the results, see lines 120-121.

      (15) Have the authors tested the behavioral phenotypes of heterozygotes mutant of both Sk and SkR1 flies? This may reveal whether a reduced expression of Sk-SkR1 will also cause significant changes in the foraging and mating behaviors seen during starvation.

      We tested the behavioral phenotypes of flies heterozygous for the Sk knockout. The results showed that the foraging and mating behaviors of Sk heterozygous mutants were unaffected during starvation, suggesting that the mutation is completely recessive. We have added the results to the supporting information (Fig. S8; see line 746). For the results, see lines 132-135.

      (16) It would be useful to provide information about which SK peptide is detected by the antibody used in Figure S4C. In Figures S4C and S5D, it would be useful to include a counterstain to show that the general morphology is unaffected in the mutants.

      As suggested, we added a detailed description of the rabbit anti-BdSk antibody. See lines 362-363. We have also improved the background of the images so that the general structure is visible; counterstaining should therefore not be essential.

      (17) The figure legends for supporting figures need to be improved as they are currently difficult to understand. For example, in S2: what is the meaning of "different removal of antennae"? In S3: it isn't clear how the authors evaluated the responses in EAG experiments; in S4A: there are several DNA sequences that do not appear in the main text of the manuscript; in S4C: the meaning of the boxes and the dots is unclear, as is the figure to the left; in S5D, the authors explain only the suppression of SKR1, yet the figure indicates some images for SKR IHC. These are only a few examples; we ask that the authors revise and improve the legends for supporting figures.

      For S2, we removed the data as suggested. For S3, we added a sentence describing the measurement method in detail. See lines 707-709. For S4, the figure has been substantially changed in the revision and a detailed description has been added to the legend (lines 717-724 in the revision). For S5, we have improved our description. See lines 731-734. In addition, we have checked all the figure legends in our manuscript; the changes are shown in the tracked-changes version.

      Minor issues

      (1) It isn't clear what the meaning of "the complexity of sulfakinin pathways" is. Please explain.

      We have rewritten the sentence in the revised manuscript by adding the description “…complexity of Sk pathways, spatial and temporal dynamics and multiple ligands and receptors, is…”. See lines 61-65.

      (2) Please double-check the calls to the various figures in the text.

      We have double-checked the calls to all the figures in the text to make sure they were correct.

      (3) L125: What is the meaning of "olfactory reprogramming"? Please explain.

      We rephrased it to “alteration of olfactory sensitivities”. See line 145.

      (4) L135: After mentioning qRT-PCR the authors should include a call to a figure that shows these results.

      Thank you for your suggestion. The qRT-PCR results are shown in Figure 4B, and we have added a call to this figure as suggested. See line 154.

      (5) L270: Details are provided for the extraction of the pheromone. However, more details are needed on how the EAG and other functional assays were done.

      We have described the assay procedures in detail in the Materials and Methods section. See lines 298-311.

      (6) Figure 2B. Please remove the period(".") at the C-terminal end of WT sk.

      We are sorry for our mistake. We have corrected it.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #3 (Public review):

      Human and simian immunodeficiency viruses (HIV and SIV, respectively) evolved numerous mechanisms to compromise effective immune responses but the underlying mechanisms remain incompletely understood. Here, Yamamoto and Matano examined the humoral immune response in a large number of rhesus macaques infected with the difficult-to-neutralize SIVmac239 strain and identified a subgroup of animals showing significant neutralizing Ab responses. Sequence analyses revealed that in most of these animals (7/9) but only a minority in the control group (2/19) SIVmac variants containing a CD8+ T-cell escape mutation of G63E/R in the viral Nef gene emerged. Functional analyses revealed that this change attenuates the ability of Nef to stimulate PI3K/Akt/mTORC2 signalling. The authors propose that this improved induction of SIVmac239 nAb is reciprocal to antibody dysregulation caused by a previously identified human PI3K gain-of-function mutation associated with impaired anti-viral B-cell responses. Altogether, the results suggest that PI3K signalling plays a role in B-cell maturation and generation of effective nAb responses. Preliminary data indicate that Nef might be transferred from infected T cells to B cells by direct contact. However, the exact mechanism and the relevance for vaccine development require further studies.

      Strengths of the study are that the authors analyzed a large number of SIVmac-infected macaques to unravel the biological significance of the known effect of the interaction of Nef with PI3K/Akt/mTORC2 signaling. This is interesting and may provide a novel means to improve humoral immune responses to HIV. In the revised version the authors made an effort to address previous concerns. Especially, they provide data supporting that Nef might be transferred to B cells by direct cell-cell contact. In addition, they provide some evidence that G63R, which also emerged in most animals, does not share the disruptive effect of G63E, although experimental examination and discussion of why G63R might emerge remain poor. Another weakness that remains is that some effects of the G63E mutation are modest and effects were not compared to SIVmac constructs lacking Nef entirely. The evidence for a role of Nef G63E mutation on PI3K and the association with improved nAb responses was largely convincing and it is appreciated that the authors provide additional evidence for a potential impact of "soluble" Nef on neighboring B cells. However, the experimental set-up and the results are difficult to comprehend. It seems that direct cell-cell contact is required and membranes are exchanged. Since Nef is associated with cellular membranes this might lead to some transfer of Nef to B cells. However, the immunological and functional consequences of this remain largely elusive. Alternatively, Nef-mediated manipulation of helper CD4 T cells might also impact B cell function and effective humoral immune responses. As previously noted, the presentation of the results and conclusions was in part very convoluted and difficult to comprehend. While the authors made attempts to improve the writing, parts of the manuscript are still challenging to follow. 
This applies even more to the rebuttal (complex words combined with poor grammar), which made it difficult to assess which concerns have been satisfactorily addressed.

      We are grateful for the insightful comments. Based on these suggestions, we have edited the writing throughout and added remarks on certain of the points raised to the Discussion section. For points that require further experimentation, we would like to address them in a follow-up study now in preparation.

      Reviewer #3 (Recommendations for the authors):

      Additional editing of the manuscript is highly recommended to make the results accessible for a broad readership.

      We are grateful for the important suggestion. Accordingly, we have edited the manuscript to make it accessible to a broad readership.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Corso-Diaz et al. focus on the NRL transcription factor (TF), which is critical for retinal rod photoreceptor development and function. The authors profile NRL's protein interactome, revealing several RNA-binding proteins (RBPs) among its components. Notably, many of these RBPs are associated with R-loop biology, including DHX9 helicase, which is the primary focus of this study. R-loops are three-stranded nucleic acid structures that frequently form during transcription. The authors demonstrate that R-loop levels increase during photoreceptor maturation and establish an interaction between NRL TF and DHX9 helicase. The association between NRL and RBPs like DHX9 suggests a cooperative regulation of gene expression in a cell-type-specific manner, an intriguing discovery relevant to photoreceptor health. Since DHX9 is a key regulator of R-loop homeostasis, the study proposes a potential mechanism where a cell-type-specific TF controls the expression of certain genes by modulating R-loop homeostasis. This study also presents the first data on R-loop mapping in mammalian retinas and shows the enrichment of R-loops over intergenic regions as well as genes encoding neuronal function factors. While the research topic is very important, there is some concern regarding the data presented: there are substantial data supporting the interaction between NRL and DHX9, including pull-down experiments and proximity ligation assay (PLA); however, the data showing an interaction between NRL and DDX5, another R-loop-associated helicase, are inadequate. Importantly, the data supporting the claim that NRL interacts with R-loops are absolutely insufficient and, at best, correlative. The next concerns are regarding the R-loop mapping data analysis and visualization.

      Strengths:

      There is compelling evidence that the NRL transcription factor interacts with several RNA binding proteins, and specifically, sufficient data supporting the interaction of NRL with DHX9 helicase.

      A major strength is the use of the single-stranded R-loop mapping method in the mouse retina.

      Weaknesses:

      (1) Figure S1A: There is a strong band in GST-IP (control IP) for either HNRNPUL1 or HNRNPU, although the authors state in their results that there is a strong interaction of these two RBPs with NRL.

      Under our experimental conditions, most RNA-binding proteins displayed higher binding to glutathione beads (Fig. S1A). However, GST-NRL purifications showed much stronger signals for the respective RBPs. In the case of HNRNPU and HNRNPUL1, white bands that are indicative of substrate depletion due to higher protein levels are observed in GST-NRL lanes. Additionally, in Figures 1B and 1C, there is a clear enrichment of HNRNPU and HNRNPUL1 above the background signal. We added this to the text. See page 5.

      Both DHX9 and DDX5 samples have a faint band in the GST-IP.

      RNA-binding proteins may display some background as observed in other studies (e.g. PMID: 32704541). We think that showing the raw data without decreasing the exposure time is useful and that there is a clear enrichment compared to controls.  In addition, we tested the interaction in multiple systems.

      There is an extremely faint band for HNRNPA2B1 in the GST-NRL IP lane. Given this is a pull-down with added benzonase treatment to remove all nucleic acids, these data suggest that previously observed NRL interactions with these particular RBPs are mediated via nucleic acids. Similarly, there is a loss of band signal for HNRNPM in this assay, although it was identified as an NRL-interacting protein in three assays, which again suggests that nucleic acids mediate the interaction.

      Thank you for highlighting this point. We mention in the manuscript that the interactions of NRL with HNRNPM and A1 depend on nucleic acids, as noted by the reviewer, since there is no obvious band after the pull-down. We have now added that the interaction of NRL with HNRNPA2B1 is likely dependent on nucleic acids as well, given its weak signal. See page 5.

      (2) The data supporting NRL-DDX5 interaction in rod photoreceptor nuclei is very weak. In Figure 2D, the PLA signal for DDX5-NRL is very weak in the adult mouse retina and is absent in the human retina, as shown in Figure 2H.

      We agree with the reviewer. We think that the signal for DDX5 is weak, and we addressed this in the text. We noted on page 7: “Taken together, these findings suggest a strong interaction between NRL and DHX9 throughout the nuclear compartment in the retina and that a transient and/or more regulated interaction of NRL with DDX5 may require additional protein partners.”  We have modified this sentence to add that the data also suggest transient interaction or the requirement of additional protein partners for stable interaction. See page 7.

      Given that there is no NRL-KO available for the human PLA assay, the control experiments using single-protein antibodies should be included in the assay. Similarly, the single-protein antibody control PLA experiments should be included in the experimental data presented in Figure 2J.

      Thank you for the suggestion. We performed PLAs using both DHX9 and IgG in the human retina and observed no specific amplification signal. Some background is observed outside the nucleus and in the extracellular space. We added these results to the text and to the supplementary information. See page 7 and Fig.S2B.

      (3) The EMSA experiment using a probe containing NRL binding motif within the DHX9 promoter should include incubation with retina nuclear extracts depleted for NRL as a control.

      In EMSA experiments, we used bovine retina to obtain sufficient quantities of protein. As suggested by the reviewer, using NRL-depleted extract would help establish the specificity of the observed gel shift and complement our pre-immune serum negative control. However, removing all of the NRL protein with the available antibodies was not feasible. In the future, we will use enough mice to obtain large quantities of protein for this experiment and will collect retinas from Nrl-knockout mice as a negative control.

      (4) There is a reduced amount of DHX9 pulled down in NRL-IP in HEK293 cells, but there is no statistically significant difference in the reciprocal IP (DHX9-IP and blotting for NRL) (Figure 4C).

      We believe the reviewer is referring to the data in Figure 4C showing that RNase H treatment led to significantly reduced pulldown of DHX9 as compared to control, but the reciprocal IP in Figure 4D showed no statistical significance between control and RNase H treatment. In Figure 4D, we hypothesize that NRL may account for only a small proportion of DHX9’s interactome, so the change in NRL levels could not be detected due to the sensitivity of our assay. DHX9 likely constitutes a large proportion of NRL’s interactome in HEK293 cells, hence the change in DHX9 level was more obvious when pulling down with NRL. We added this information to the results. See page 8.

      (5) The only data supporting the claim that NRL interacts with R-loops are presented in Figure 5A.

      Additional evidence that NRL interacts with R-loops comes from DRIP-Seq experiments where signals from R-loops overlap with NRL ChIP-Seq signals (Figure 7A). This shows that R-loops and NRL co-occur on multiple genomic regions. In addition, indirect evidence of NRL and R-loops’ interaction is shown in pull down experiments and PLA assays where R-loops influence DHX9 and NRL binding. We clarified this in the discussion. See page 14.

      This is a co-IP of R-loops and then blotting for NRL, DHX9, and DDX5. Here, there is no signal for DDX5, quantification of DHX9 signal shows no statistically significant difference between RNase H treated and untreated samples, while NRL shows a signal in RNase H treated sample. These data are not sufficient to make the statement regarding the interaction of NRL with R-loops.

      Thank you for this comment. We respectfully disagree, as we observe statistically significant enrichment for both NRL and DHX9 in these experiments (see Fig. 5A). Some NRL continues to bind to DNA that is pulled down nonspecifically, which may be expected since NRL is a transcription factor. See, for example, R-loop binding by the transcription factor Sox2 (PMID: 32704541). However, binding to R-loops is evidenced by an enrichment compared to the RNase H-treated sample. We clarified this in the Results section (see page 9).

      (6) Regarding R-loop mapping, the data analysis is quite confusing. The authors perform two different types of analyses: either overall narrow and broad peak analysis or strand-specific analysis. Given that the authors used ssDRIP-seq, which is a method designed to map R-loops strand specifically, it is confusing to perform different types of analyses.

      Thank you for highlighting this point. This has enhanced the clarity of the methods and enriched the discussion. We aimed to identify R-loops as accurately as possible. We conducted two types of analyses to capture different aspects of R-loops: one that looks at overall patterns (narrow and broad peaks) and another that focuses on specific strands of DNA.

      Using ssDRIP-seq, which is designed to map R-loops on specific strands, allowed us to examine R-loops formed in only one strand and those formed on both strands. To identify strand-specific R-loops, we filtered our RNase-H enriched peaks for those enriched on one strand compared to the opposite strand. We clarified the analysis in the results section, and Figure 6B. See page 10 and methods section page 25.

      Next, the peak analysis is usually performed based on the RNase H treated R-loop mapping; what does it mean then to have a pool of "Not R-loops", see Figure 6B?

      The “Not R-loop” group refers to peaks called using the opposite strand that are not observed when calling peaks using RNase H as control. We modified this figure for clarity (Figure 6B).

      In that regard, what does the term "unstranded" R-loops mean? Based on the authors' definition, these are R-loops that do not fall within the group of strand-specific R-loops. The authors should explain the reasons behind these types of analyses and explain, what the biological relevance of these different types of R-loops is.

      Thank you for helping us clarify this point. Unstranded R-loops are DNA regions containing DNA:RNA hybrids on both the plus and minus strands, possibly representing bidirectional transcription by Pol II. We observed that unstranded R-loops are enriched only in intergenic regions, H3K9me3 regions, and downstream of the transcriptional termination site (TTS). We added to the discussion the possible implications of these enrichments, including regulation of Pol II termination and transcription of long genes. See page 13.
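
      The distinction between strand-specific and unstranded R-loops can be illustrated with a minimal sketch. The response does not state the exact filtering criterion, so the fold-change rule, the `min_fold` threshold, the pseudocount, and the toy signal values below are all assumptions made purely for illustration, not the authors' pipeline.

```python
def classify_peak(plus_signal, minus_signal, min_fold=2.0, pseudocount=1.0):
    """Label a peak 'plus', 'minus', or 'unstranded'.

    Hypothetical rule: a peak is strand-specific when the signal on one
    strand exceeds the signal on the opposite strand by at least `min_fold`.
    """
    plus = plus_signal + pseudocount    # pseudocount avoids division by zero
    minus = minus_signal + pseudocount
    if plus / minus >= min_fold:
        return "plus"
    if minus / plus >= min_fold:
        return "minus"
    return "unstranded"                 # comparable signal on both strands


# Toy (plus, minus) signal values, invented for illustration only.
toy_peaks = {"peak_a": (40.0, 3.0), "peak_b": (5.0, 30.0), "peak_c": (22.0, 18.0)}
labels = {name: classify_peak(p, m) for name, (p, m) in toy_peaks.items()}
print(labels)
```

      In practice the threshold would have to be chosen against the RNase H control and the noise level of the assay; the sketch only shows the shape of the decision.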

      (7) It would be more useful to show the percent distribution of R-loops over the different genomic regions, instead of showing p-value enrichment, see Figure 6C.

      Since most of the genome is non-coding, plotting the distribution as a proportion was not informative, because the vast majority of the data falls in intergenic regions. However, we created a new figure showing the observed vs. expected ratio, which seems to be more informative, and moved the current p-value figure to the supplement in the revised version. See Figure 6C and S6D.
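
      One common way to compute an observed vs. expected ratio of this kind is sketched below. The response does not specify how the expected value was derived, so this sketch assumes it is the fraction of the genome covered by each annotation class; the function name and the toy counts and region sizes are hypothetical.

```python
def observed_expected_ratio(peak_counts, region_bp):
    """Return observed/expected enrichment per annotation class.

    observed = share of peaks falling in the class
    expected = share of the genome (in bp) covered by the class (assumption)
    """
    total_peaks = sum(peak_counts.values())
    total_bp = sum(region_bp.values())
    ratios = {}
    for region, count in peak_counts.items():
        observed = count / total_peaks
        expected = region_bp[region] / total_bp
        ratios[region] = observed / expected
    return ratios


# Toy numbers, invented for illustration only.
toy_counts = {"intergenic": 500, "intron": 300, "promoter": 200}
toy_bp = {"intergenic": 600, "intron": 350, "promoter": 50}
ratios = observed_expected_ratio(toy_counts, toy_bp)
print(ratios)
```

      In this toy case half of the peaks are intergenic, yet the class is slightly depleted (ratio below 1) because it covers 60% of the sequence, whereas the small promoter class is enriched. This is the effect that a plain percentage plot would hide.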

      (8) Based on the model presented, NRL regulates R-loop biology via interaction with RBPs, such as DHX9, a known R-loop resolution helicase. Given that the gene targets of NRL TF are known, it would be useful to then analyze the R-loop mapping data across this gene set.

      Thank you for this suggestion. We performed an analysis of R-loops on NRL-regulated genes. Interestingly, NRL target genes have an enrichment of stranded R-loops at the promoter/TSS and unstranded R-loops on the gene body compared to all Ensembl genes (Figure S7B). We added a table containing all NRL-regulated genes we used for this analysis (Table S5) and a figure showing this result (Fig. S7B).

      Reviewer #2 (Public review):

      Summary:

      The authors utilize biochemical approaches to determine and validate NRL protein-protein interactions to further understand the mechanisms by which the NRL transcription factor controls rod photoreceptor gene regulatory networks. Observations that NRL displays numerous protein-protein interactions with RNA-binding proteins, many of which are involved in R-loop biology, led the authors to investigate the role of RNA and R-loops in mediating protein-protein interactions and profile the co-localization of R-loops with NRL genomic occupancy.

      Strengths:

      Overall, the manuscript is very well written, providing succinct explanations of the observed results and potential implications. Additionally, the authors use multiple orthogonal techniques and tissue samples to reproduce and validate that NRL interacts with DHX9 and DDX5. Experiments also utilize specific assays to understand the influence of RNA and R-loops on protein-protein interactions. The authors also use state-of-the-art techniques to profile R-loop localization within the retina and integrate multiple previously established datasets to correlate R-loop presence with transcription factor binding and chromatin marks in an attempt to understand the significance of R-loops in the retina.

      Weaknesses:

      In general, the authors provide superficial interpretations of the data that fit a narrative but fail to provide alternative explanations or address caveats of the results. Specifically, many bands are present in interaction studies either in control lanes (GST controls) of Westerns or large amounts of background in PLA experiments.

      We have added additional information to the text regarding the presence of background signals in pull downs. We wish to note that experimental samples always exceeded background signals.  We believe that reporting these raw findings (rather than showing shorter exposures) is valuable for the scientific community. We did not observe any background in the proximity ligation assay (PLA) that exceeded what is typically expected, and the signals were clearly discernible. Cases where signals are weaker, such as with DDX5, have been highlighted. In addition, we added a DHX9-IgG negative control for the human PLA experiment. See page 5 and Fig. S2B.

      Additionally, the lack of experiments testing the functional significance of Nrl interactions or R-loops within the developing retina fails to provide novel biological insights into the regulation of gene regulatory networks other than, 'This could be a potentially important new mechanism'.

      We agree that functional experiments are necessary to understand the molecular mechanisms behind R-loop regulation in the retina; however, we believe it goes beyond the scope of this initial characterization (as this is the first report on R-loops in the retina). We are currently pursuing these studies.

      We performed a new analysis on NRL-regulated genes as suggested by reviewer 1. We show that NRL target genes have an enrichment of stranded R-loops at the promoter/TSS and unstranded R-loops on the gene body compared to all Ensembl genes (Figure S7B), providing further evidence of the functional interaction between NRL and R-loops. See Table S5, Fig. S7B, and the discussion.

      Additionally, the authors test the necessity of RNA for NRL/DHX9 interactions but don't show RNA binding of NRL or DHX9 or the sufficiency of RNA to interfere/mediate protein-protein interactions. Recent work has highlighted the prevalence of RNA binding by transcription factors through Arginine Rich Motifs that are located near the DNA binding domains of transcription factors.

      We agree that the role of RNA in these complexes is very exciting, and we are currently pursuing these studies. However, we believe that they fall outside the scope of this initial report on R-loops in the retina.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are a couple of minor comments:

      (1) Unfinished sentence; page 11, the end of the first paragraph.

      Thank you for catching this error. We removed the unfinished text.

      (2) Page 6: Figure S2A should be Figure S2.

      In general, the manuscript would benefit from a deeper explanation of the biological relevance of R-loop formation and the connection to NRL TF and the expression of genes regulated by NRL. In this regard, a more substantial description of the model would be useful.

      We have modified the discussion for clarity and included new ideas on possible roles of R-loops in gene regulation of photoreceptors.

      Reviewer #2 (Recommendations for the authors):

      (1) The specificity of interactions needs to be addressed:

      - Figure 1B - HNRNPUL1 bands present in GST control.

      - Figure 1C - Bands present in the Empty Vector control IP for HNRNPU and DHX9.

      - Supplemental Figure 1A - most proteins are present in GST control suggesting prevalent binding to GST and lack of specificity for other interactions.

      Thank you for your comment. RNA-binding proteins can have more background as observed in other studies (e.g. PMID: 32704541) but there is always a higher signal in experimental samples compared to controls. While we agree that we can enhance the conditions for immunoprecipitation (IP) by optimizing washing buffers, exposure and other parameters, we believe the current methods tell the story. We have added additional text explaining this. See page 5.

      (2) Use of the term 'Strongest' interaction - IPs don't directly address the strength of interaction, but depend on levels of expression AND affinity. The strength of interaction should be tested using techniques like an OCTET or SPR assay. One can also quantify the effect that RNA would have in such an assay.

      Thank you for your suggestion. We replaced the term 'stronger' with “higher signal” and “robust” in most places. The source of protein lysates is the same for experiments and controls; thus, the amount of protein is consistent in both conditions and does not depend on the level of gene expression.

      (3) In supplemental tables, please use the proper gene names, not the UniProt peptide name. For example, there are no genes named ELAV1-ELAV4. These should be ELAVL1-ELAVL4. A short glance identifies >10 gene name errors.

      Thank you for the suggestion. We updated all tables to use current gene names.

      (4) Please provide the rationale for the choice of DNA sequence for the DHX9 nucleotide sequence used for EMSA assays. In the human DHX9 locus, the NRL ChIP-seq peak looks to be contained in Intron1 whereas the NRL ChIP-seq peak in mouse DHX9 looks to be in the proximal upstream promoter. Did the authors choose an evolutionarily conserved sequence in the promoter region that contained the NRL motif or does the probe sequence arise from the sequence that has known NRL binding as assayed by NRL ChIP-seq? A zoomed-in image of the NRL ChIP-seq pile-ups in the DHX9 locus in each species would be beneficial.

      Thank you for this suggestion. The probe was chosen by scanning for NRL binding motifs in the ChIP-Seq peak at the human DHX9 promoter. We added a zoomed-in image of the ChIP-Seq or CUT&RUN reads for NRL in both human and mouse retinas. Figure 3D shows NRL binding in both species in regions containing the homologous motif. The sequence is partially conserved and shown in the figure.

      (5) Normalization in RNase H/RNase A co-IP experiments. Why does RNase H treatment result in increased NRL IP (increased NRL expression?) or does RNase A treatment cause reduced IP of DHX9? These differences seem to cause a 'denominator' effect, leading the authors to conclude decreased co-IP of DHX9 with NRL when R-loops are inhibited or increased co-IP of NRL with DHX9 when RNA is degraded. An alternate interpretation would be that inhibiting the R-loop binding of NRL unmasks the epitope for antibody recognition. The authors should test NRL binding to RNA and determine if RNA binding affects the co-IP of NRL with DHX9.

      We agree that removing total RNA by RNase A or R-loops by RNase H may alter the accessibility of our antibodies to the epitopes, resulting in the differences in the level of total protein pulled down. However, we quantified the level of the associating protein relative to the total protein and confirmed, in reciprocal assays, that RNase A treatment led to increased interaction between NRL and DHX9. The quantification was not, however, consistent between the reciprocal IPs upon RNase H treatment. We reason that in Figure 4D, as NRL may account for only a small proportion of DHX9’s interactome, the change in NRL level could not be detected given the sensitivity of our assay. Conversely, DHX9 may constitute a larger proportion of NRL’s interactome in HEK293 cells, hence the change in DHX9 level was more obvious. We added this information to the text. See page 8.

      (6) Figure 7 - Malat1 - there doesn't seem to be an overlap of NRL with Stranded R-loop peaks in this image. Nrl seems to flank the region of R-loops.

      We replaced Malat1 with Mplkip, which shows a direct overlap of Nrl binding and R-loops. See Figure 7C.

      (7) Results end with 'A Model'. Seems like some concluding remarks and references to Figure 8 were mistakenly left out.

      Thank you for catching this typo. We removed the misplaced text.

      (8) Model and Discussion - authors should show raw data for RHO with respect to NRL binding and R-loops. No evidence was provided regarding R-loops (or lack thereof) in the Rhodopsin locus. Additionally, conclusions stating that "R-loops... are specifically depleted from genes, such as Rhodopsin, with high expression levels" go against Figures 7B and 7C. Malat1 is one of the highest expressed genes in the retina and contains R-loops.

      Thank you for helping us clarify our hypothesis. We added a genome browser view of Rhodopsin showing the absence of R-loops (Fig. S8). We hypothesize that R-loops could interfere with achieving higher rates of transcription; however, we did not mean to say that all highly expressed genes lack R-loops. We have rephrased the discussion to clarify this point.

      (9) Neuronal genes, particularly those involved in synaptic transmission are known to be, on average, longer than most genes (Gabel, 2015; PMID: 25762136). Is it possible that R-loops are detected at genes involved in synaptic function/structure solely because of transcript length, as it takes longer for transcription termination to resolve in genes that are longer? A plot showing R-loop enrichment and transcript length would address this.

      We added a plot showing gene length in relation to R-loops and expression levels. We observed that R-loops are more common over long genes regardless of their expression levels. We also observed that the concomitant presence of stranded and unstranded R-loops is restricted to the longest genes in most cases. We added this to Figure 7D.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their valuable feedback and comments. Based on the feedback, we revised the manuscript and believe that we have addressed most of the points raised by the reviewers. Below we include a summary of key revisions and point-by-point responses to the reviewers' comments.

      Abstract/Introduction

      We further emphasized EP-GAN's strength in inferring the parameters of detailed neuron models vs. specialized models with reduced parameters.

      Results

      We further elaborated on the method of training EP-GAN on synthetic neurons and validating on both synthetic and experimental neurons.

      We added a new section Statistical Analysis and Loss Extension which includes:

      - Statistical evaluation of baseline EP-GAN and other methods on neurons with multi-recording membrane potential response/steady-state current data: AWB, URX, HSN

      - Evaluation of EP-GAN with added resting potential loss + longer simulations to ensure stability of membrane potential (EP-GAN-E)

      Methods

      We added a detailed explanation of the "inverse gradient process"

      We added detailed current/voltage-clamp protocols for both synthetic and experimental validation and prediction scenarios (Table 6)

      Supplementary

      We added error distribution and representative samples for synthetic neuron validations (Fig S1)

      We added membrane potential response statistical analysis plots for existing methods for AWB, URX, HSN (Fig S6)

      We added steady-state currents statistical analysis plots on EP-GAN + existing methods for AWB, URX, HSN (Fig S7)

      We added mean membrane potential errors for AWB, URX, HSN normalized by empirical standard deviations for all methods (Table S4)

      Please see our point-by-point responses to specific feedback and comments below.

      Reviewer 1:

      First, at the methodological level, the authors should explain the inverse gradient operation in more detail, as the reconstructed voltage will not only depend on the evaluation of the right-hand side of the HH-equations, as they write but also on the initial state of the system. Why did the authors not simply simulate the responses?

      We thank the reviewer for the feedback regarding the need for further explanation. We have revised the Methods section to provide a more detailed description of the inverse gradient process. The process uses a discrete integration method, similar to Euler’s method, which takes the system’s initial conditions into account. For the EP-GAN baseline, the initial states were picked soon after the start of the stimulus to reconstruct the voltage during the stimulation period. For EP-GAN with extended loss (EP-GAN-E), introduced in this revision in the sub-section Statistical Analysis and Loss Extension, initial states before/after stimulation were also taken into account to incorporate resting voltage states into the target loss.

      Since EP-GAN is a neural network and we want the inverse gradient process to be part of the training process (i.e., making EP-GAN a “model informed network”), the process needs to be implemented as a differentiable function of the generated parameters p. This enables the derivatives from reconstructed voltages to be traced back to all network components via the back-propagation algorithm.

      Computationally, this requires the implementation of the process as a combination of discrete array operations with “auto-differentiation”, which allows automatic computation of derivatives for each operation. While explicit simulation of the responses using ODE solvers provides more accurate solutions, the algorithms used by these solvers typically do not support such specialized arrays nor are they compatible with neural network training. We thus utilized PyTorch tensors [54], which support both auto-differentiation and vectorization to implement the process.
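
      The discrete, Euler-like reconstruction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single passive leak conductance instead of the full Hodgkin-Huxley-type model, plain Python lists instead of PyTorch tensors, and hypothetical names and parameter values throughout. It only shows how a voltage trace is rebuilt step by step from an initial state and the right-hand side of the membrane equation.

```python
def reconstruct_voltage(v0, currents, g_leak, e_leak, capacitance, dt):
    """Rebuild a voltage trace with forward Euler steps.

    Assumed toy model: C * dV/dt = -g_leak * (V - e_leak) + I_ext.
    `v0` is the initial state; `currents` holds the injected current at
    each time step. All names and units here are illustrative.
    """
    voltages = [v0]
    for i_ext in currents[:-1]:
        v = voltages[-1]
        dv_dt = (-g_leak * (v - e_leak) + i_ext) / capacitance
        voltages.append(v + dt * dv_dt)   # V[k+1] = V[k] + dt * f(V[k], I[k])
    return voltages


# Zero-stimulus case: the trace should relax from v0 toward e_leak.
trace = reconstruct_voltage(
    v0=-70.0, currents=[0.0] * 1000,
    g_leak=0.1, e_leak=-60.0, capacitance=1.0, dt=0.1,
)
print(trace[0], trace[-1])
```

      Because every step is an elementary arithmetic operation on the parameters, the same loop written with tensors from an auto-differentiation library would let gradients of a voltage loss flow back to `g_leak` and the other parameters, which is the property the response relies on.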

      The authors did not allow the models time to equilibrate before starting their reconstruction simulations, as testified by the large transients observed before stimulation onset in their plots. To get a sense of whether the models reproduce the equilibria of the measured responses to a reasonable degree, the authors should allow sufficient time for the models to equilibrate before starting their stimulation protocol.

      In the added Statistical Analysis and Loss Extension sub-section under the Results section, we added results for EP-GAN-E, where we simulate the voltage responses with a 5-second stabilization period added at the beginning of the simulations. The added period mitigates the voltage fluctuations observed during the initial simulation phase, and we observe that the simulated voltage responses indeed reach a stable equilibrium both prior to stimulation and under the zero-stimulus current-clamp protocol (Figure 5 bottom, Column 3).

      In fact, why did the authors not explicitly include the equilibrium voltage as a target loss in their set of loss functions? This would be an important quantity that determines the opening level of all the ion channels and therefore would influence the associated parameter values.

      The EP-GAN baseline does include equilibrium voltage as a target loss, since all current-clamp protocols used in the study (both synthetic and experimental) include a membrane potential trace where the stimulus amplitude is zero throughout the entire recording duration (see the added Table 6 for current-clamp protocols). This forces EP-GAN to optimize the resting membrane potential alongside the other, non-zero stimulus current-clamp scenarios.

      To further study EP-GAN’s accuracy in resting potential, we added a supplemental resting potential target loss and evaluated its performance in the sub-section Statistical Analysis and Loss Extension. The added loss, combined with a 5-second additional stabilization period, improved accuracy in predicting resting potentials by mitigating voltage fluctuations during the early simulation phase. It also significantly improved the prediction of AWB membrane potential responses, where the EP-GAN baseline overshot the resting potential.

      The authors should provide a more detailed evaluation of the models. They should explicitly provide the IV curves (this should be easy enough, as they compute them anyway), and clearly describe the time-point at which they compute them, as their current figures suggest there might be strong transient changes in them.

      We included predicted IV-curve vs ground truth plots in addition to the voltages in the supplementary materials (Figure S2, S5) in the original submitted version of the manuscript. In this revision, we added additional IV-curve plots with statistical analysis for the neurons with multi-recording data (AWB, URX, HSN) in the supplementary materials (Figure S7).

      For the evaluation of predicted membrane potential responses, we added further details in Validation Scenarios (Synthetic) under the Results section, so that it clearly explains the current-clamp protocols used for both synthetic and experimental neurons and the time interval over which the RMSE evaluations were performed.

      In the sub-section Statistical Analysis and Loss Extension, we introduced new statistical metrics in addition to RMSE, applied to neurons AWB, URX, and HSN: the percentage of predicted voltages that fall within the empirical range (i.e., mean ± 2 std) and the voltage error normalized by empirical standard deviations (Table S4).
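
      The two metrics named above can be sketched as follows. This is an illustrative reading of the description, not the authors' code: it assumes the empirical mean and standard deviation are computed per time point across repeated recordings, and the function name and toy voltage values are hypothetical.

```python
from statistics import mean, stdev


def empirical_range_metrics(recordings, predicted, n_std=2.0):
    """Return (percent of points within mean +/- n_std*std, mean normalized error).

    `recordings` is a list of repeated traces of equal length; `predicted`
    is one simulated trace. Assumes at least two recordings and a non-zero
    standard deviation at every time point.
    """
    inside = 0
    normalized_errors = []
    for t, v_pred in enumerate(predicted):
        samples = [rec[t] for rec in recordings]
        mu, sigma = mean(samples), stdev(samples)
        error = abs(v_pred - mu)
        if error <= n_std * sigma:
            inside += 1
        normalized_errors.append(error / sigma)
    return 100.0 * inside / len(predicted), mean(normalized_errors)


# Toy traces (mV), invented for illustration only.
toy_recordings = [[-70, -50, -30], [-72, -48, -34], [-68, -52, -32]]
toy_predicted = [-71, -45, -33]
pct_inside, norm_error = empirical_range_metrics(toy_recordings, toy_predicted)
print(pct_inside, norm_error)
```

      Unlike RMSE, both quantities are expressed relative to the trial-to-trial variability of the recordings, so a deviation counts as large only when it exceeds what the biological data themselves show.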

      The authors should assess the stability of the models. Some of the models exhibit responses that look as if they might be unstable if simulated for sufficiently long periods of time. Therefore, the authors should investigate whether all obtained parameter sets lead to stable models.

      In the sub-section Statistical Analysis and Loss Extension, we included individual voltage traces generated by both EP-GAN baseline and EP-GAN-E (extended) with longer simulation (+5 seconds) to ensure stability. EP-GAN-E is able to produce equilibrium voltages that are indeed stable and within empirical bounds throughout the simulations for the zero-stimulus current-clamp scenario (column 3) for the 3 tested neurons (AWB, URX, HSN).

      Minor:

      The authors should provide a description of the model and its trainable parameters. At the moment, it is unclear which parameters of the ion channels are actually trained by the methodology.

      The detailed description of the model and its ion channels can be found in [7]. The supplementary materials also include an Excel table of predicted parameters, which lists all EP-GAN fitted parameters for the 9 neurons included in the study (+3 new parameter sets for AWB, URX, HSN using EP-GAN-E), the labels for trainability, and their respective lower/upper bounds used during training data generation. In the revised manuscript, we further elaborated on the above information in the second paragraph of the Results section.

      Reviewer 2:

      Major 1: While the models generated with EP-GAN reproduce the average voltage during current injections reasonably well, the dynamics of the response are not well captured. For example, for the neuron labeled RIM (Figure 2), the most depolarized voltage traces show an initial 'overshoot' of depolarization, i.e. they depolarize strongly within the first few hundred milliseconds but then fall back to a less depolarized membrane potential. In contrast, the empirical recording shows no such overshoot. Similarly, for the neuron labeled AFD, all empirically recorded traces slowly ramp up over time. In contrast, the simulated traces are mostly flat. Furthermore, all empirical traces return to the pre-stimulus membrane potential, but many of the simulated voltage traces remain significantly depolarized, far outside of the ranges of empirically observed membrane potentials. While these deviations may appear small in the Root mean Square Error (RMSE), the only metric used in the study to assess the quality of the models, they likely indicate a large mismatch between the model and the electrophysiological properties of the biological neuron.

      EP-GAN’s main contribution is compute-efficient inference of the parameters of detailed neuron models. This is a difficult problem to address even with current state-of-the-art fitting algorithms. While EP-GAN is not perfect in capturing the dynamics of the responses and RMSE does not fully reflect the quality of predicted electrophysiological properties, RMSE is a generic error metric for time series that is easily interpretable and applicable to all methods. Using this metric, our studies show that EP-GAN’s overall prediction quality exceeds that of existing methods when given identical optimization goals in a compute-normalized setup.

      In our revised manuscript, we included a new section, Statistical Analysis and Loss Extension, under the Results section, where we performed additional statistical evaluations (e.g., % of predicted responses within the empirical range) of EP-GAN’s predictions for neurons with multi-recording data. The results show that predicted voltage responses from the EP-GAN baseline (introduced in the original manuscript) are, in general, within the empirical range, with ~80% of its responses falling within ± 2 empirical standard deviations, which is higher than for existing methods: DEMO (57.9%), GDE3 (37.9%), NSDE (38%), NSGA2 (60.2%).

      Major 2: Other metrics than the RMSE should be incorporated to validate simulated responses against electrophysiological data. A common approach is to extract multiple biologically meaningful features from the voltage traces before, during and after the stimulus, and compare the simulated responses to the experimentally observed distribution of these features. Typically, a model is only accepted if all features fall within the empirically observed ranges (see e.g. https://doi.org/10.1371/journal.pcbi.1002107). However, based on the deviations in resting membrane potential and the return to the resting membrane potential alone, most if not all the models shown in this study would not be accepted.

      In our original manuscript, because every neuron had only a single set of recording data, RMSE was chosen as the most generic and interpretable error metric. We conducted additional electrophysiological recordings for 3 neurons in prediction scenarios (AWB, URX, HSN) and performed statistical analysis of generated models in the sub-section Statistical Analysis and Loss Extension. Specifically, we evaluated the percentage of predicted voltage responses that fall within the empirical range (empirical mean ± 2 std, p ~ 0.05), encompassing the responses before, during and after stimulus (Figure 5, Table 5), and the mean membrane potential error normalized by empirical standard deviations (Table S4).

      The results show that EP-GAN baseline achieves an average of ~80% of its predicted responses falling within the empirical range, which is higher than the other methods: DEMO (57.9%), GDE3 (37.9%), NSDE (38%), NSGA2 (60.2%). Supplementing EP-GAN with an additional resting potential loss (EP-GAN-E) increased the percentage to ~85%, with noticeable improvements in reproducing dynamical features for AWB (Figure 5). Evaluations of membrane potential errors normalized by empirical standard deviations showed similar results, where EP-GAN baseline and EP-GAN-E have average errors of 1.0 std and 0.7 std respectively, outperforming DEMO (1.7 std), GDE3 (2.0 std), NSDE (3.0 std) and NSGA2 (1.5 std) (Table S4).
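
      The two evaluation metrics described above can be illustrated with a minimal sketch. It assumes each recording is a list of membrane potentials sampled at the same time points, and that the normalized error is the per-time-point absolute deviation from the empirical mean divided by the empirical standard deviation, averaged over time. The function names and the toy data are hypothetical, and the exact definitions used for Table 5 and Table S4 may differ.

```python
from statistics import mean, stdev


def empirical_band(recordings, n_std=2.0):
    """Per-time-point (lower, upper) bounds: empirical mean +/- n_std * std."""
    band = []
    for samples in zip(*recordings):  # one tuple of repeated measurements per time point
        mu, sd = mean(samples), stdev(samples)
        band.append((mu - n_std * sd, mu + n_std * sd))
    return band


def percent_within_band(predicted, band):
    """Percentage of predicted time points that lie inside the empirical band."""
    inside = sum(1 for v, (lo, hi) in zip(predicted, band) if lo <= v <= hi)
    return 100.0 * inside / len(predicted)


def normalized_error(predicted, recordings):
    """Mean absolute deviation from the empirical mean, in units of empirical std.

    Assumed definition for illustration; time points with zero spread are skipped.
    """
    errors = []
    for v, samples in zip(predicted, zip(*recordings)):
        sd = stdev(samples)
        if sd > 0:
            errors.append(abs(v - mean(samples)) / sd)
    return mean(errors)


# Toy data (hypothetical): three repeated recordings, three time points, in mV.
toy_recordings = [
    [-60.0, -40.0, -58.0],
    [-62.0, -38.0, -60.0],
    [-58.0, -42.0, -62.0],
]
toy_prediction = [-61.0, -30.0, -57.0]

toy_band = empirical_band(toy_recordings)
toy_percent = percent_within_band(toy_prediction, toy_band)
toy_error = normalized_error(toy_prediction, toy_recordings)
```

      In this toy case the second time point lies far outside the band, so it lowers the within-range percentage and dominates the normalized error. An RMSE averaged over the whole trace would not show where the mismatch occurs.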

      Major 3: Abstract and introduction imply that the 'ElectroPhysiome' refers to models that incorporate both the connectome and individual neuron physiology. However, the work presented in this study does not make use of any connectomics data. To make the claim that ElectroPhysiomeGAN can jointly capture both 'network interaction and cellular dynamics', the generated models would need to be evaluated for network inputs, for example by exposing them to naturalistic stimuli of synaptic inputs. It seems likely that dynamics that are currently poorly captured, like slow ramps, or the ability of the neuron to return to its resting membrane potential, will critically affect network computations.

      In the paper, EP-GAN is introduced as a parameter estimation method that can aid the development of ElectroPhysiome, which is a network model - these are two different method types and we do not claim EP-GAN is a model that can capture network dynamics. To avoid possible confusion, we made further clarifications in the abstract/introduction that EP-GAN is a machine learning approach for neuron HH-parameter estimation.

      I find it hard to believe that the methods EP-GAN is compared to could not perform any better. For example, multi-objective optimization algorithms are often successful in generating models that match empirical observations very well, but features used as target of the optimization need to be carefully selected for the optimization to succeed. Likely, each method requires extensive trial and error to achieve the best performance for a given problem. It is therefore hard to do a fair comparison. Given these complications, I would like to encourage the authors to rethink the framing of the story as a benchmark of EP-GAN vs. other methods. Also, the number of parameters does not seem that relevant to me, as long as the resulting models faithfully reproduce empirical data. What I find most interesting is that EP-GAN learns general relationships between electrophysiological responses and biophysical parameters, and likely could also be used to inspect the distribution of parameters that are consistent with a given empirical observation.

      We thank the reviewer for providing this perspective. While it is indeed difficult to have a completely fair comparison between existing optimization methods vs EP-GAN due to the fundamental differences in their algorithms, we believe that the current comparisons with other methods are justified as they provide baseline performance metrics to test EP-GAN for its intended use cases.

      The main strength of EP-GAN, as previously mentioned, is in its ability to efficiently navigate large detailed HH-models with many parameters so that it can aid in the development of nervous system models such as ElectroPhysiome, potentially fitting hundreds of neurons in a time efficient manner.

      While EP-GAN’s ability to learn the general relationship between electrophysiological responses and parameter distribution is indeed interesting and warrants a more careful examination, it is not the main focus of the paper, since in this work we focus on introducing EP-GAN as a methodology for parameter inference.

      In this context, we believe the comparisons with other methods, conducted in a compute-normalized manner (i.e., each method is given the same # of simulations) and with identical optimization targets, provide an adequate framework for evaluating the aforementioned EP-GAN aim. Indeed, while EP-GAN excels with larger HH-models, it performs slightly worse than DE for smaller models such as the one used by [16], despite being more compute efficient (Table S2).

      To emphasize the EP-GAN aim, we revised the main manuscript description to focus on its intended use in parameter inference of detailed neuron parameters vs specialized models with reduced parameters.

      I could not find important aspects of the methods. What are the 176 parameters that were targeted as trainable parameters? What are the parameter bounds? What are the remaining parameters that have been excluded? What are the Hodgkin-Huxley models used? Which channels do they represent? What are the stimulus protocols?

      The detailed description and development of the HH-model that we use and its ion channel list can be found in [7]. Supplementary materials also include an Excel table (predicted parameters) that lists all EP-GAN fitted parameters for 9 neurons (+3 new parameter sets for AWB, URX, HSN using EP-GAN-E), the labels for trainability, and the parameter bounds used during the generation of training data.

      We also added a new Table which details the current/voltage clamp protocols used for 9 neurons including the ones used for evaluating EP-GAN-E, which was supplemented with longer simulation time to ensure voltage stability (please see Table 6).

      I could not assess the validation of the EP-GAN by modeling 200 synthetic neurons based on the data presented in the manuscript since the only reported metric is the RMSE (5.84mV and 5.81mV for neurons sampled from training data and testing data respectively) averaged over all 200 synthetic neurons. Please report the distribution of RMSEs, include other biologically more relevant metrics, and show representative examples. The responses should be carefully investigated for the types of mismatches that occur, and their biological relevance should be discussed. For example, is the EP-GAN biased to generate responses with certain characteristics, like the 'overshoot' discussed in Major 1? Is it generally poor at fitting the resting potential?

      We thank the reviewer for the feedback regarding the need for additional supporting data for synthetic neuron validations. In the revised supplementary materials Figure S1, we included the distribution of RMSE errors for both groups of synthetic neuron validations (validation/test set) and representative samples for both EP-GAN baseline and EP-GAN-E. Notably, the inaccuracies observed during the experimental neuron predictions (e.g., resting potential, voltage overshoot) do not necessarily generalize to synthetic neurons, indicating that such mismatches could stem from the differences between synthetic neurons used for training and experimental neurons for predictions. While synthetic neurons are generated according to empirically determined parameter bounds, some experimental neuron types are rarer than the others and may also involve other channels that have not been recorded or modeled in [7], which can affect the quality of predicted parameters (see 2nd and 4th paragraphs of Discussions section for more detail). Also, properties such as recording error/noise that are often present in experimental neurons are not fully accounted for in synthetic neurons.

      To further study how these mismatches can be mitigated, in the revision we added an extended version of EP-GAN where the target loss was supplemented with an additional resting potential term and a 5-second stabilization period during simulations (EP-GAN-E, described in Statistical Analysis and Loss Extension). With these extensions, EP-GAN-E improved its accuracy on both resting potentials and dynamical features, with the most notable improvements on AWB, where predicted voltage responses closely match the slowly rising voltage response during stimulation. EP-GAN-E is an example of further extensions to the loss function that account for additional experimental features.

      Furthermore, the conclusion of the ablation study ('EP-GAN preserves reasonable accuracy up to a 25% reduction in membrane potential responses') does not seem to be justified given the voltage traces shown in Figure 3. For example, for RIM, the resting membrane potential stays around 0 mV, but all empirical traces are around -40mV. For AFD, all simulated traces have a negative slope during the depolarizing stimuli, but a positive slope in all empirically observed traces. For AIY, the shape of hyperpolarized traces is off.

      Since EP-GAN baseline optimizes voltage responses during the stimulation period, RMSE was also evaluated with respect to this period. From these errors, we evaluated whether the predicted voltage error for each ablation scenario fell within 2 standard deviations of the mean error obtained from synthetic neuron test data (i.e., the baseline performance). We found that for input ablation of voltage responses, the error was within this range up to a 25% reduction, whereas for steady-state current input ablation, the 25%, 50% and 75% reductions all resulted in errors within the range.

      We extended the “Ablation Studies” sub-section so that the above reasoning is better communicated to the readers.
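
      The acceptance criterion for the ablation scenarios can be sketched as follows. This is a simplified illustration that assumes the RMSE is computed over a stimulation window given as sample indices, and that "within 2 standard deviations" is read as a two-sided interval around the mean baseline error. The function names and numbers are hypothetical and are not taken from the manuscript.

```python
from math import sqrt
from statistics import mean, stdev


def rmse_in_window(predicted, target, start, stop):
    """RMSE between two traces, restricted to samples start..stop-1."""
    diffs = [(p - t) ** 2 for p, t in zip(predicted[start:stop], target[start:stop])]
    return sqrt(sum(diffs) / len(diffs))


def within_baseline(ablation_rmse, baseline_rmses, n_std=2.0):
    """True if an ablation error lies within n_std std of the mean baseline error."""
    mu, sd = mean(baseline_rmses), stdev(baseline_rmses)
    return abs(ablation_rmse - mu) <= n_std * sd


# Toy numbers (hypothetical): baseline errors from synthetic test neurons, in mV.
toy_baseline = [5.0, 6.0, 7.0]  # mean 6.0, sample std 1.0
toy_window_rmse = rmse_in_window([0.0, 3.0, 4.0, 0.0], [0.0, 0.0, 0.0, 0.0], 1, 3)
toy_accept = within_baseline(7.5, toy_baseline)
toy_reject = within_baseline(8.5, toy_baseline)
```

      A one-sided version, accepting any error below the mean plus 2 standard deviations, would be an equally reasonable reading. The choice only matters for ablations whose error is unusually low.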

      Additionally, I found a number of minor issues:

      Minor 1: Table 1 lists the number of HH simulations as '32k (11k · 3)'. Should it be 33k, since 11.000 times 3 is 33.000? Please specify the exact number of samples.

      Minor 2: x- and y-ticks are missing in Fig 2, Fig 3, Fig S1, Fig S2, Fig S3 and Fig S4.

      Minor 3: All files in the supplementary zip file should be listed and described.

      Minor 4: Code for training the GAN, generation of training datasets and for reproducing the figures should be provided.

      Minor 5: In the reference (Figure 3A, Table 1 Row 2): should this refer to Table 2?

      Minor 6: 'the ablation is done on stimulus space where a 50% reduction corresponds to removing half of the membrane potential responses traces each associated with a stimulus.' - which half is removed?

      We thank the reviewer for pointing out these errors in the original manuscript. The revised manuscript includes corrections for these items. We will publish the Python code reproducing the results in a public repository in the near future.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Since multiple Reviewers requested that the results describing effects of TTX treatment on GluA2 receptor levels detected by immunofluorescence and confocal imaging be revised, we have made substantial changes, which are described below. We believe the changes have greatly improved the manuscript and thank the reviewers for their comments.

      Lack of significant increase in GluA2 receptor data is due to too few cultures sampled; anything could have happened [in one] particular dissociation. A concern that the TTX effect might vary greatly from culture to culture was why we felt it was important to match the receptor measurements on the same cultures that we recorded mEPSCs. We now present the culture means in Figure 5A (mEPSCs) and 5B (GluA2 receptor cluster size). These plots make it clear that the variability in the GluA2 receptor cluster size effect is not attributable to a failure of that culture to show a homeostatic effect. That is, the variability in GluA2 receptor effect is independent of the variability in mEPSC effect. To increase sample size, we examined 2 additional cultures for synaptic GluA2 receptor levels in control vs. TTX treatment. These cultures showed very modest increases (Figure 5C). When cell means from these experiments were pooled with those from the 3 matched cultures, the TTX effect was still not statistically significant (Figure 5G).

      Lack of significant increase in GluA2 receptor data is due to the choice to restrict our analysis to the primary dendrite, close to the cell body. We restricted our analysis to the primary dendrite because Figure 3 in Turrigiano et al, 1998, shows the increased response to exogenously applied glutamate after TTX treatment is greatest close to the cell body and wanes as the glutamate is applied further away (added to Results, new lines 388-389).

      Variability in GluA2 receptor data is due to the much smaller number of synapses sampled, compared to mEPSCs. We matched the sampling for mEPSC amplitude data to that of imaging data by taking only 20 samples from each electrophysiological recording. Each mEPSC represents one synapse; in a set of 20 mEPSCs some might come from the same synapse, so that we are sampling from ≤ 20 synapses. The effect of TTX on mEPSC amplitudes remained significant despite the reduced samples per cell (Figure 5A).

      Why do we fail to show a significant increase in receptors when this has been shown in many studies?

      We have added to our discussion the point that several studies, including Wang et al. 2019, use the number of puncta, rather than the number of cells, as the sample number. We ran an analysis of GluA2 receptor cluster size where we sampled multiple synapses per cell, and used the number of clusters as the sample n. We found that even with as few as 6 synapses randomly selected from each cell, the effect of TTX on GluA2 receptor cluster size became highly significant (p = 0.001 for data from 3 cultures and p = 0.005 for data from 5 cultures) (see new lines 400-406 in Discussion). Our data are therefore not very different from those of some previous studies. We are not arguing that receptors do not increase. Instead, our point is that the increase is more variable than the increase in mEPSC amplitude and thus takes a much bigger sample size to detect. In sum, the difference between the mEPSC data and the receptor data is that the mEPSC data consistently show a ~20-25% increase, whereas the receptor data do not always show an increase and sometimes the increase is only ~10%. Finally, we added two matched culture experiments examining synaptic GluA1 receptor cluster characteristics. GluA1 receptor cluster size decreased in one culture, and increased very modestly in the other (Supplemental Figure 1B), whereas mEPSC amplitude robustly increased (Supplemental Figure 1A; Results, new lines 265-268).

      We conclude that these data support the idea that there is another contributor to the TTX-induced increase in quantal size.
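
      The effect of the choice of sampling unit on the sample number can be illustrated with a short sketch. It assumes each cell is represented by a list of cluster sizes, draws a fixed number of synapses per cell at random, and then forms either one mean per cell or one value per cluster. The function names, the seed, and the toy values are hypothetical, and this is not the analysis code used for the manuscript.

```python
import random
from statistics import mean


def subsample_per_cell(cells, k, seed=0):
    """Randomly pick k cluster sizes from each cell; cells with fewer than k are skipped."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [rng.sample(values, k) for values in cells if len(values) >= k]


def cell_means(samples):
    """One value per cell: the sample number n equals the number of cells."""
    return [mean(s) for s in samples]


def pooled_clusters(samples):
    """One value per cluster: the sample number n equals cells * k."""
    return [v for s in samples for v in s]


# Toy data (hypothetical): 4 cells with 10 cluster sizes each, arbitrary units.
toy_cells = [[0.10 * (i + 1) + 0.01 * j for j in range(10)] for i in range(4)]

toy_samples = subsample_per_cell(toy_cells, k=6)
n_cells = len(cell_means(toy_samples))          # n when the cell is the unit
n_clusters = len(pooled_clusters(toy_samples))  # n when the cluster is the unit
```

      With the same underlying measurements, treating clusters as the unit multiplies n by the number of synapses drawn per cell. A test based on the pooled clusters can therefore reach significance even when a test on the cell means does not.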

      Other changes in presentation of GluA2 receptor results: Since the effects on intensity and integral are of lesser magnitude than that on cluster size, we have removed these results from the graphs, although they are presented in Table 1. We have removed Figure 6, the presentation of individual culture results, since these results are now conveyed in Figure 5A-C. We have removed graphs depicting GluA2 receptor cluster size in response to TTX in Rab3A-/- cultures, but these data are still presented in Table 1.

      We address other detailed comments below.

      Public Reviews:

      Reviewer #1 (Public review):

      (2) The effects of Rab3A on TTX-induced mini frequency modulation remains unclear, because TTX does not induce a change in mini frequency in the Rab3A+/Ebd control (Fig. 2). The respective conclusions should be revised accordingly (l. 427).

      The effects on mini frequency were added for completeness, but given the lack of consistently significant changes with TTX treatment or changes in the KO or Rab3A<sup>Ebd/Ebd</sup> cultures, we have removed comment on these results from the Discussion.

      (3) The model is still not supported by the data. In particular, data supporting a negative regulation of Rab3A by APs, Rab3A-dependent release of a tropic factor, or a Rab3A-dependent increase in GluA2 abundance are not presented.

      We have removed the model from the manuscript.

      (4) Data points are not overlapping and appear "quantal" in most box plots. How were the data rounded?

      The appearance of quantal variation in cell amplitude means is due to the binning that is part of the creation of the box plot. We have not remade the figures without binning, because the binning provides a visual depiction of the distribution of the data points. We have added the bin sizes to the appropriate figure legends.

      Reviewer #2 (Public review):

      However, the authors still have not provided further investigation of the mechanisms behind the role of Rab3A in this form of plasticity, and the revision therefore has added little to the significance of the study. Moreover, the experimental design for the investigation of the mismatch between mEPSC amplitude and GluA2 cluster fluorescence remains questionable, making it difficult to draw any credible conclusions from groups of data that not only look similar to the eye but also show no significance statistically.

      To our knowledge, no other study has matched measurements of mEPSC amplitude in the same cultures where synaptic receptor levels were assessed. As stated above, we have revised the presentation of GluA2 receptor results, concluding from the lack of significant effects on receptor levels that the mEPSC amplitude increase cannot be fully explained by the receptor data (which is strengthened by addition of two more cultures analyzed for GluA2 immunofluorescence). This is an important addition to the significance of the study.

      In summary, this study establishes that neuronal Rab3A plays a role in homeostatic synaptic plasticity, but so do a number of other molecules that have been implicated in homeostatic synaptic plasticity in the past two decades (only will grow with the new techniques such as RNAseq). Without going beyond this finding and demonstrating how exactly Rab3A participates in the induction and/or expression of this form of plasticity, or maybe the potential Rab3A-mediated functional and behavioral defects in vivo, the contribution of the current study to the field is limited. However, given the presynaptic location of Rab3A, this finding could serve as a starting point for researchers interested in pre-postsynaptic cross-talk during homeostatic plasticity in general.

      We previously published a review in which we list 19 molecules known at that time to be important for homeostatic synaptic plasticity (see Table 2, Koesters et al., 2024), and they fall into two categories: molecules involved in glutamate receptor expression or trafficking, and signaling molecules. Rab3A is the first synaptic vesicle protein to be implicated in homeostatic plasticity of quantal size. We have added this point to the Discussion, new lines 473-476. By demonstrating that Rab3A is not acting in glia (which release TNF, which regulates receptor expression), and that GluA2 receptor levels do not explain the homeostatic mEPSC increase in our experimental conditions, we have ruled out two major mechanisms.

      Reviewer #3 (Public review):

      Other questions arise from the NASPM experiments, used to justify looking at GluA2 (and not GluA1) in the immunostaining. First, there is a frequency effect that is unclear in origin. One would expect NASPM to merely block some fraction of the post-synaptic current, and not affect pre-synaptic release or block whole synapses. However the change in frequency seems to argue (as the authors do) that some synapses only have CP-AMPARs, while the rest of the synapses have few or none. Another possibility is that there are pre-synaptic NASPM-sensitive receptors that influence release probability. Further, the amplitude data show a strong trend towards smaller amplitude following NASPM treatment (Fig 3B). The p value for both control and TTX neurons was 0.08 - it is very difficult to argue that there is no effect. The decrease on average is larger in the TTX neurons, and some cells show a strong effect. It is possible there is some heterogeneity between neurons on whether GluA1/A2 heteromers or GluA1 homomers are added during HSP. This would impact the weakly supported conclusions about the GluA2 imaging vs mEPSC amplitude data.

      We cannot rule out that the NASPM-induced decrease in mEPSC frequency is due to a loss of presynaptic glutamate receptor enhancement of release probability, and have added this statement to the Results, new lines 202-204. Regarding the p value of 0.08: we are not arguing that NASPM has no effect on mEPSC amplitude, only that it has no effect on the homeostatic increase in amplitude after TTX treatment. An increase in GluA1/A2 heteromers should have been detected in our imaging studies.

      Unaddressed issues that would greatly increase the impact of the paper:

      (1) Is Rab3A acting pre-synaptically, post-synaptically or both? The authors provide good evidence that Rab3A is acting within neurons and not astrocytes. But where it is acting (pre or post) would aid substantially in understanding its role. They could use sparse knockdown of Rab3A, or simply mix cultures from KO and WT mice (with appropriate tags/labels). The general view in the field has been that HSP is regulated post-synaptically via regulation of AMPAR trafficking, and considerable evidence supports this view. The more support for their suggestion of a pre-synaptic site of control, the better.

      We agree that doing co-cultures of Rab3A-/- and Rab3A+/+ neurons is the definitive experiment to determine the locus of action of Rab3A in homeostatic synaptic plasticity. We hope to examine this question in a future manuscript.

      (2) Rab3A is also found at inhibitory synapses. It would be very informative to know if HSP at inhibitory synapses is similarly affected. This is particularly relevant as at inhibitory synapses, one expects a removal of GABARs (ie the opposite of whatever is happening at excitatory synapses). If both processes are regulated by Rab3A, this might suggest a role for this protein more upstream in the signaling; an effect only at excitatory synapses would argue for a more specific role just at these synapses.

      We agree that it would be very interesting to determine if the homeostatic decrease in mIPSCs after activity blockade depends on Rab3A. We hope to address this question in the future.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      The abstract is a bit repetitive in places. Some editing would be advised.

      We did not identify anything repetitive in the abstract except the parallel construction referring to the previous findings at the NMJ and current findings in cortical neurons. However, we have eliminated a section in the introduction which went into detail about the receptor imaging results (previous lines 103-110).

      Line 77: 'shift toward early awakening' is unclear; do you mean shorter sleep/wake cycle? Other circadian issues? A more complete description is needed.

      We have moved the additional detail about the Earlybird mutation’s effect on circadian period from the Results to the Introduction, new lines 77 to 79.

      The results section has many passages that seem more like discussion, offering various interpretation and alternatives for the data. While some commentary is appropriate, to justify the next series of experiments and maintain a logical flow, this manuscript has rather a high amount of this. Some editing and shifting material to the discussion might be warranted.

      We have reduced the commentary in the Results section.

      Line 245: GluA2 homomers are really unlikely, as they won't pass current (unless unedited) and don't often if ever form. But GluA2/A3 heteromers are likely (and detected by their methods).

      GluA2 homomers do conduct current, albeit less than heteromers (Swanson et al., 1997; Oh and Derkach, 2005; Coombs et al., 2019). [The Oh and Derkach paper shows a GluA2 homomer current in Supplementary Figure 3]. We have modified the text to acknowledge that the GluA2 receptor imaging will detect heteromers and homomers (Results, new lines 214 to 215).

      Line 258: If the number of synaptic pairs analyzed was usually <20, what was the average and range of pairs? This gets into the sampling issue.

      We have added the average number of synaptic sites (20.4 ± 6.5) and range (11-38) to the text, Results, new line 229.

      Are the stats of the baseline mEPSC amplitude and frequency shifts (WT vs KO on WT feeder layer) given somewhere (lines 398-402)? If not, please add them.

      These stats have been added to the text: mEPSC amplitude (CON, WT on WT, 13.3 ± 0.5 pA; CON, KO on WT, 15.2 ± 1.1 pA, p = 0.23, Kruskal-Wallis test), new lines 325-326, and frequency (CON, WT on WT, 2.54 ± 0.57 sec<sup>-1</sup>; CON, KO on WT, 4.46 ± 1.21 sec<sup>-1</sup>, p = 0.23, Kruskal-Wallis test), new lines 329-330.

      25mM K+ is going to be much more than 'mildly' depolarizing (line 697). Should just skip that word.

      ‘mildly’ has been removed.

      The section on MiniAnalysis seems overly argumentative, and there is no need to discuss flaws in the Wu paper. The important thing (a bit buried at the end of this section) is that the manual mini selection was done blind to condition, which is the normal way of dealing with potential bias. It would be better to limit the methods to describing what was done.

      The bulk of the justification of manual analysis has been removed from the text.

      The discussion of potential conductance changes (lines 534-6) seems somewhat unwarranted.

      Modification of GluA1 phosphorylation in the GluA1/A2 heteromer would not be detected by NASPM (and the NASPM data being a bit inconclusive anyway). Further, auxiliary subunits (like TARPs) can alter conductance of any of the AMPARs. So I don't think they have enough data to exclude such a possibility.

      The discussion of contributions of conductance has been removed from the text.

      Coombs ID, Soto D, McGee TP, Gold MG, Farrant M, Cull-Candy SG (2019) Homomeric GluA2(R) AMPA receptors can conduct when desensitized. Nat Commun 10:4312.

      Oh MC, Derkach VA (2005) Dominant role of the GluR2 subunit in regulation of AMPA receptors by CaMKII. Nat Neurosci 8:853-854.

      Swanson GT, Kamboj SK, Cull-Candy SG (1997) Single-channel properties of recombinant AMPA receptors depend on RNA editing, splice variation, and subunit composition. J Neurosci 17:5869.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigate the role of BEND2, a novel regulator of meiosis, in both male and female fertility. Huang et al have created a mouse model where the full-length BEND2 transcript is depleted but the truncated BEND2 version remains. This mouse model is fertile, and the authors used it to study the role of BEND2 on both male and female meiosis. Overall, the full-length BEND2 appears dispensable for male meiosis. The more interesting phenotype was observed in females. Females exhibit a lower ovarian reserve suggesting that full-length BEND2 is involved in the establishment of the primordial follicle pool.

      Strengths:

      The authors generated a mouse model that enabled them to study the role of BEND2 in meiosis. The role of BEND2 in female fertility is novel and enhances our knowledge of genes involved in the establishment of the primordial follicle pool.

      Weaknesses:

      The manuscript extensively explores the role of BEND2 in male meiosis; however, a more interesting result was obtained from the study of female mice. Only a few experiments were performed using female mice, therefore, more experiments should be performed to complete the story of the role of BEND2 on female fertility. In addition, the title and abstract of the manuscript do not align with the story, as female fertility is only a small portion of the data compared to the male fertility section.

      We appreciate the reviewer’s thoughtful summary, recognition of the strengths of our study, and constructive feedback. In the revised manuscript, we have performed additional experiments to enhance our understanding of the role of BEND2 in female gametogenesis. These new experiments provide further insights into the establishment of the ovarian reserve and the role of BEND2 in female fertility.

      Additionally, we have rewritten the title, abstract, and introduction to better align with the content of the manuscript and to reflect the balance between the male and female fertility results. We believe these changes address the reviewer’s concerns and improve the overall clarity and focus of the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      • I recommend that the authors re-organize their abstract and introduction to accurately reflect the manuscript's primary focus on male fertility. Right now, the title of the manuscript is misleading. The manuscript does not investigate reproductive aging; rather, it primarily describes the depletion of primordial follicle number. The mechanism behind this depletion and whether this phenotype accelerates reproductive aging, are not explored. Clarifying these points will help align the title and content of the manuscript more accurately.

      We thank the reviewer for this suggestion. We agree that the original title and abstract did not fully capture the focus of the study. In response, we have rewritten the title, abstract, and introduction to better align with the results presented, focusing more clearly on the implications of the effects of the full-length BEND2 depletion for spermatogenesis and oogenesis. These revisions ensure that the title, the abstract, and the manuscript's introduction are now more accurately reflective of the work performed.

      • Figure 1: I couldn't find the validation of the polyclonal antibody against BEND2 that the authors generated.

      Regarding this query about the validation of the polyclonal antibody against BEND2, we apologize for any confusion. We would like to clarify that this validation is indeed presented in Figure 2 of our manuscript. To ensure this information is easily accessible, we have revised the text to explicitly mention the validation in Figure 2.

      • Figure 2A: Could you provide the actual numbers for the weight of the mice testis?

      In response to this question regarding Figure 2A and the testis weights of the mice, we have now included these data in a graph in Fig. 2A and in Table S1, and added this information to the Results section.

      • Figure 2C and D: I am confused by the fact that in the WB we can appreciate a high expression of the p75 protein, but the signal is very low in the IF (Figure 2D).

      We thank the reviewer for raising this point. We acknowledge the apparent discrepancy between the strong p75 signal observed in the Western blot (Fig. 2C) and the weaker signal seen in the immunofluorescence (Fig. 2D). We think several factors could contribute to this difference, such as differences in sensitivity and detection methods, epitope accessibility, protein localization or differences in sample preparation, antibody affinity, and experimental conditions between Western blot and IF.

      • In the same figure, the authors also mention that the p75 protein is functional. On what basis do they rely on reaching this conclusion?

      We acknowledge that we cannot definitively confirm the functionality of the p75 protein. Our assumption was based on the observed fertility of the male mice and existing literature indicating that BEND2 is essential for completing meiosis (Ma et al., 2022). However, we understand the importance of clarity in our claims. To avoid any potential confusion, we have revised the sentence to read: "The p75 BEND2 protein—likely corresponding to an exon 11-skipped transcript—is present and might be functional in our mutant testis, based on the observed phenotype (see below)."

      • The phenotype in females is very interesting. The authors conclude that BEND2 influences primordial follicle formation, oocyte quality, fertility, and reproductive aging by (1) performing follicle counts, (2) analyzing the litter size, and (3) analyzing meiotic progression. Given that the authors build their story around these experiments, I strongly encourage them to expand the section on female fertility, or reorganize the manuscript, or be more cautious with some of their conclusions. They might consider performing additional experiments such as:

      - Oocyte quality: To determine whether BEND2 impacts oocyte quality, mice should be stimulated with hormones and oocyte quality should be analyzed (GV, MI, MII progression, spindle morphology and/or fertilization, and embryo development). Does the decrease in primordial follicles correlate with the number of ovulated oocytes, or is the impact only on oocyte quality?

      We appreciate the reviewer's suggestion to assess the impact of BEND2 on oocyte quality. Following the reviewer’s recommendation, we stimulated three control and three mutant mice. We analyzed the number of ovulated oocytes, their fertilization rate, and the percentage of embryos that developed to the blastocyst stage. These new results are included in the revised manuscript (see Results section and new Table 1). Our analyses indicate that for all parameters assessed, control and mutant oocytes behaved similarly. Specifically, there were no significant differences in the number of ovulated oocytes, fertilization rates, or the ability of embryos to progress to the blastocyst stage between the control and mutant groups. These findings suggest that mutant oocyte quality is comparable to control mice of a similar age. We have incorporated these new results into the manuscript.

      - Reproductive aging: A fertility trial would provide more information on whether BEND2 depletion triggers an acceleration of reproductive aging. In addition, the oldest mice used by the authors are 9 months old, and at this point, fertility has not declined yet.

      We appreciate the reviewer's suggestion regarding the assessment of reproductive aging. However, we respectfully disagree with the assertion that fertility has not declined by 9 months of age. In our colony, we have observed a significant decline in fertility around 10 months of age. Specifically, out of eighteen 10-month-old female mice placed in breeding cages, we observed only three pregnancies within the first 30 days (N.N. and I.R., data not published). Based on these observations, we determined that fertility begins to decline around this age in our colony, which informed our decision to use 9-month-old mice as the oldest age group for our analysis. Thus, this age is appropriate for evaluating the potential effects of BEND2 depletion on reproductive aging in our specific mouse population.

      - The observation that the primordial follicle pool is already diminished in mice that are 1 week old is very interesting. Some experiments that the authors could perform to figure out the mechanism are: (1) Analyzing apoptosis. Are the primordial follicles dying during the pool's establishment, or is this an ongoing apoptotic process throughout the mice's lifespan? (2) If the authors still have ovaries from mice younger than 1 week of age (when the primordial pool is forming), they could perform DDX4 staining and quantify the number of oocytes in follicles and the total number of oocytes. These experiments would provide mechanistic insights into whether BEND2 impacts the formation of the primordial follicle pool or if the pool forms but is then depleted.

      We appreciate the reviewer's suggestion to further explore the mechanism behind the reduced primordial follicle pool. In response, we have analyzed the number of DDX4-positive cells (DDX4 labels oocytes) in newborn mutant and wild-type animals. Our results show that mutant ovaries contain significantly fewer oocytes compared to controls (see new Fig. 5). This finding supports the hypothesis that BEND2 is critical for the establishment of a normal ovarian reserve. We are grateful for this suggestion, as these additional data reinforce our conclusion that BEND2 is required to determine a normal ovarian reserve in mice.

      • What is the red signal in Supplementary Figure 1C?

      This image depicts the BEND2 staining pattern in 16 days post-coitum (dpc) wild-type mouse ovaries. To clarify this and prevent any confusion, we have updated the figure legend to explicitly state that the sample shown is from a wild-type mouse.

      • Please spell out the full term of all the acronyms.

      We apologize for the oversight in not fully spelling out some acronyms in the original manuscript. We have carefully reviewed the entire manuscript and have ensured that all acronyms are now spelled out in full upon their first use in the revised version. We want to thank the reviewer for bringing this to our attention.

      • Is Line-1 also dysregulated in the ovary? This was one of the main findings from the male part. It would be interesting to perform the same analysis in the ovary since Line-1 has a role in establishing the ovarian reserve (PMID: 31949138).

      We thank the reviewer for this insightful suggestion. We have analyzed the number of LINE-1- and SYCP3-positive cells in wild-type and mutant newborn ovaries (new Fig. S4). Our results show no significant difference between the two genotypes, suggesting that LINE-1 is not dysregulated in newborn Bend2 mutant oocytes. These findings indicate that, at least in the context of the newborn ovary, LINE-1 does not appear to be affected by BEND2 depletion.

      Reviewer #2 (Public Review):

      In their manuscript entitled "BEND2 is a crucial player in oogenesis and reproductive aging", the authors present their findings that full-length BEND2 is important for meiotic double-strand break repair in spermatocytes, regulation of LINE-1 elements in spermatocytes, and proper oocyte meiosis and folliculogenesis in females. The manuscript utilizes an elegant system to specifically ablate the full-length form of BEND2, which has been historically difficult to study due to its location on the X chromosome and the male sterility of global knockout animals.

      While the manuscript is an overall excellent addition to the field, it would significantly benefit from a few additional experiments, as well as some additional clarification/elaboration.

      The claim that BEND2 is required for ovarian reserve establishment is not supported, as the authors only look at folliculogenesis and oocyte abundance starting at one week of age, after the reserve is formed. Analysis of earlier time points would be much more convincing and would parse the role of BEND2 in the establishment vs. maintenance of this cell population. In spermatocytes, the authors demonstrate a loss of nuclear BEND2 in their mutant but do not comment on the change in localization (which is now cytoplasmic) of the remaining protein in these animals. This may have true biological significance and a discussion of this should be more thoroughly explored.

      We thank the reviewer for their thoughtful feedback and constructive suggestions to improve our manuscript.

      In response to the comment regarding the establishment of the ovarian reserve, we have now analyzed Bend2 mutant and control newborn ovaries. Our results show a significant reduction in the number of DDX4-positive cells in mutant ovaries compared to controls. These findings demonstrate that BEND2 is required for the establishment of the ovarian reserve, as the reduction is evident at birth.

      Regarding the cytoplasmic staining of BEND2 in mutant spermatocytes, we did perform secondary-antibody-only controls using goat anti-rabbit Cy3 to address the specificity of the signal. The staining observed in the Bend2 mutants closely resembles background staining, suggesting that the cytoplasmic signal is nonspecific. Therefore, we do not believe this represents a meaningful change in the localization of BEND2 protein in the mutants. We have clarified this in the revised manuscript to address this point.

      We hope these additional experiments and clarifications strengthen the manuscript and address the reviewer’s concerns.

      Reviewer #2 (Recommendations For The Authors):

      Major points:

      (1) The title of the manuscript does not accurately capture the content of the work. The vast majority of the data presented here is from the male, which is not reflected at all in the title - perhaps considering revising it?

      Thank you for your valuable suggestion. We agree that the original title did not fully reflect the focus of the manuscript. In response, we have revised the title, along with the abstract and introduction, to more accurately capture the content of the study and the emphasis on the male data. These changes ensure that the manuscript more clearly aligns with the results presented.

      (2) In Figure 2D, the authors demonstrate that WT BEND2 expression and localization are lost in the mutant, but staining is still apparent, just in the cytoplasm. Did the authors perform secondary-antibody-only controls to determine if this was background staining or real staining? If real, can they comment on the change in localization of the protein?

      We thank the reviewer for this insightful question. We have indeed performed secondary antibody-only controls using goat anti-rabbit Cy3. The staining observed in the Bend2 mutants closely resembles background staining, suggesting that the signal in the cytoplasm is not specific. Therefore, we do not believe this staining represents any real or meaningful expression of the BEND2 protein in the mutants.

      (3) In Figure S2A, the authors show Ku70 staining and describe that it is similar between the genotypes, but - to my eye - it looks quite distinctly different. It appears to stain in patches in WT SYCP3+ spermatocytes, versus staining in patches in the more mature, SYCP3- germ cells closer to the lumen in the mutant. Can the authors please clarify, or provide arrows to point which foci they are referring to?

      We apologize for the confusion caused by the image provided in the original submission. Upon review, we realized that the mutant image was not fully representative of the staining pattern observed in the majority of mutant samples. We have replaced this image with a new one in the revised manuscript, which more accurately reflects the similarity in Ku70 staining between wild-type and mutant testis. In this updated Figure S2, we have also included arrowheads to indicate the relevant foci, making it clearer to the reader. We have updated the figure legend to correspond with these changes as well.

      (4) The authors state that BEND2 is "required to establish the ovarian reserve during oogenesis" but this has not been demonstrated. The authors do show a reduced density of primordial follicles at one week of age. While this is compelling data, the ovarian reserve is established earlier in the mouse, around postnatal days 0-1, so it is not clear from this manuscript whether BEND2 is required for the maintenance of this population after PND1, leading to reduced numbers by 1 week of age, OR if it is required for the establishment of this population, which would result in reduced numbers of oocytes around the time of birth. This is a critical experiment that should be performed in order to determine which of these possibilities is likely the case. Ideally, looking at embryonic through early postnatal time points during ovarian development would be very helpful.

      We thank the reviewer for raising this important point. As mentioned earlier in response to Reviewer 1, we have performed the experiment suggested by Reviewer 2 and analyzed the number of DDX4-positive cells in newborn ovaries. Our results show that Bend2 mutant ovaries have fewer oocytes at birth than wild-type controls (Fig. 5H). This finding reinforces our conclusion that BEND2 is indeed required to establish the ovarian reserve, as the reduction in oocyte number is evident at the time of birth. We agree that this additional data strengthens our original claim, so we have included these results in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      Huang et al. investigated the phenotype of Bend2 mutant mice which expressed a truncated isoform. This mutant male showed increasing apoptosis due to unrepaired double-strand breaks. However, this mutant male has fertility, and this enabled them to analyze Bend2 function in females. They revealed that Bend2 mutation in females showed decreasing follicle numbers which leads to loss of ovarian reserve.

      Strengths:

      Since their Bend2 mutant males were fertile, they were able to analyze the function of Bend2 in females and they revealed that loss of Bend2 causes less follicle formation.

      Weaknesses:

      Why the phenotype of their mutant male is different from previous work (Ma et al.) is not clear enough although they discuss it.

      We appreciate the reviewer’s comment regarding the differences between our Bend2 mutant male phenotype and the phenotype previously reported by Ma et al., 2022. We believe this discrepancy arises because the Bend2 locus encodes two BEND2 isoforms: p140 and p80. In contrast to the previous study, where both proteins were ablated by the mutation employed (the deletion of exons 12 and 13), our exon 11 deletion specifically ablates p140 expression while allowing the expression of p80 in the testis.

      Based on the distinct phenotypes observed in the two Bend2 mutant mouse models, we hypothesize that p80 is sufficient to fulfill BEND2’s roles in meiosis, which could explain why our Bend2 mutant males remain fertile. We have rewritten the relevant sections in the results and discussion to better articulate this hypothesis and clarify the potential mechanisms behind the observed phenotypic differences.

      We hope these clarifications and additional details adequately address the reviewer’s concerns.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors showed that Bend2 mutant females had decreased fertility. This may be due to decreased ovarian reserve. Did the authors check if the mutant mice decreased or lost fertility faster than WT? If the authors have the data, please refer to it in the manuscript.

      We followed the breeding performance of a small number of control and Bend2 mutant females, and preliminary observations suggested no clear differences between the two groups. However, due to the limited sample size, we felt that these data were not conclusive enough to be included in the manuscript. We agree that a more thorough analysis of fertility decline over time would be valuable, and we plan to address this question in a future study.

      (2) In Figure 1 A, there is no exon1 in the upper figure.

      We thank the reviewer for pointing this out. We have revised Figure 1A to include exon 1 and ensure the schematic is accurate. The updated figure is included in the revised version of the manuscript.

      (3) Figure 3A, it would be nice to show several tubules of the testis section as well as an enlarged one.

      Following the reviewer's advice, we have revised Figure 3A to include new images showing several tubules and an enlarged view of one section of a tubule. These updates are included in the revised manuscript to better represent the testis sections.

      (4) Please be consistent with the format of the graph, especially Supplemental figures 2C and 4D.

      We have revised the figures, including Supplemental Figures 2C and 4D, to ensure consistency in the format throughout the manuscript. We have made modifications to the figures to align them more closely and improve the overall presentation.

    1. Author response:

      We are grateful to the reviewers and editors for their time and positive assessment of our manuscript. We will incorporate all their comments to further improve our work. In the revised version of the manuscript, we will provide a more detailed description of the quantification of the wrapping index and further explain the differential roles of Htl and Uif during cell growth versus the role of Notch during axon wrapping. In addition, we will perform further experiments using combinations of reporters and antibodies to further explore the relationship between Htl, Uif and Notch. The discussion will be expanded and possible mechanisms by which Uif 'stabilises' a specific membrane domain will be included.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewing Editor Comment: Please note that all three reviewers suggested this manuscript would best fit as a resource paper at eLife.

      Reviewer #1 (Public review):

      Summary:

      This impressive study presents a comprehensive scRNAseq atlas of the cranial region during neural induction, patterning, and morphogenesis. The authors collected a robust scRNAseq dataset covering six distinct developmental stages. The analysis focused on the neural tissue, resulting in a highly detailed temporal map of neural plate development. The findings demonstrate how different cell fates are organized in specific spatial patterns along the anterior-posterior and medial-lateral axes within the developing neural tissue. Additionally, the research utilized high-density single-cell RNA sequencing (scRNAseq) to reveal intricate spatial and temporal patterns independent of traditional spatial techniques.

      The investigation utilized diffusion component analysis to spatially order cells based on their positioning along the anterior-posterior axis, corresponding to the forebrain, midbrain, hindbrain, and medial-lateral axis. By cross-referencing with MGI expression data, the identification of cell types was validated, affirming the expression patterns of numerous known genes and implicating others as differentially expressed along these axes. These findings significantly advance our understanding of the spatially regulated genes in neural tissues during early developmental stages. The emphasis on transcription factors, cell surface, and secreted proteins provides valuable insights into the intricate gene regulatory networks underpinning neural tissue patterning. Analysis of a second scRNAseq dataset where Shh signaling was activated by culturing embryos in SAG identified known and previously unknown transcripts regulated by Shh, including the Wnt pathway.

      The data includes the neural plate and captures all major cell types in the head, including the mesoderm, endoderm, non-neural ectoderm, neural crest, notochord, and blood. With further analyses, this high-quality data promises to significantly advance our understanding of how these tissues develop in conjunction with the neural tissue, paving the way for future breakthroughs in developmental biology and genomics.

      Strengths:

      The data is well presented in the figures and thoroughly described in the text. The quality of the scRNAseq data and bioinformatic analysis is exceptional.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      Reviewer #2 (Public review):

      Summary:

      Brooks et al. generate a gene expression atlas of the early embryonic cranial neural plate. They generate single-cell transcriptome data from early cranial neural plate cells at 6 consecutive stages between E7.5 and E9. Utilizing computational analysis, they infer temporal gene expression dynamics and spatial gene expression patterns along the anterior-posterior and mediolateral axes of the neural plate. Subsequent comparison with known gene expression patterns revealed a good agreement with their inferred patterns, thus validating their approach. They then focus on Sonic Hedgehog (Shh) signalling, a key morphogen signal, whose activities partition the neural plate into distinct gene expression domains along the mediolateral axis. Single-cell transcriptome analysis of embryos in which the Shh pathway was pharmacologically activated throughout the neural plate revealed characteristic changes in gene expression along the mediolateral axis and the induction of distinct Shh-regulated gene expression programs in the developing fore-, mid-, and hindbrain.

      Strengths:

      This manuscript provides a comprehensive transcriptomic characterisation of the developing cranial neural plate, a part of the embryo that to my knowledge has not been extensively analysed by single-cell transcriptomic approaches. The single-cell sequencing data appears to be of high quality and will be a great resource for the wider scientific community. Moreover, the computational analysis is well executed and the validation of the sequencing data using published gene expression patterns is convincing. Taken together, this is a well-executed study that describes a relevant scientific resource for the wider scientific community.

      Weaknesses:

      Conceptually, the findings that gene expression patterns differ along the rostrocaudal, mediolateral, and temporal axes of the neural plate and that Shh signalling induces distinct target genes along the anterior-posterior axis of the nervous system are more expected than surprising. However, the strength of this manuscript is again the comprehensive characterization of the spatiotemporal gene expression patterns and how they change upon ectopic activation of the Shh pathway.

      Reviewer #3 (Public review):

      Summary:

      The authors performed a detailed single-cell analysis of the early embryonic cranial neural plate with unprecedented temporal resolution between embryonic days 7.5 and 8.75. They employed diffusion analysis to identify genes that correspond to different temporal and spatial locations within the embryo. Finally, they also examined the global response of cranial tissue to a Smoothened agonist.

      Strengths:

      Overall, this is an impressive resource, well-validated against sets of genes with known temporal and spatial patterns of expression. It will be of great value to investigators examining the early stages of neural plate patterning, neural progenitor diversity, and the roles of signaling molecules and gene regulatory networks controlling the regionalization and diversification of the neural plate.

      Weaknesses:

      The manuscript should be considered a resource. Experimental manipulation is limited to the analysis of neural plate cells that were cultured in vitro for 12 hours with SAG. Besides the identification of a significant set of previously unreported genes that are differentially expressed in the cranial neural plate, there is little new biological insight emerging from this study. Some additional analyses might help to highlight novel hypotheses arising from this remarkable resource.

      We thank all three reviewers for their thoughtful and constructive public reviews and believe they nicely capture the contributions of our study. We agree that this article represents a valuable resource for the community and agree with its designation as a Tools and Resources article.

      We also thank the reviewers for their useful suggestions for improving the manuscript. In addition to addressing most of their comments, described below, we note that we have changed midbrain-hindbrain boundary (MHB) to rhombomere 1 (r1) throughout the paper and in Tables S4, S7, S10, and S11, as this designation is more closely aligned with the literature on this region. In addition, we added the anterior-posterior and mediolateral cluster identities from our wild-type analysis for the genes that were differentially expressed in SAG-treated embryos in Table S11. Lastly, we have added a new figure (Figure 5—figure supplement 2), as suggested by Reviewer 2, in which we compare our results with the published expression of genes in neural progenitor domains along the dorsal-ventral axis of the spinal cord.

      Reviewer #1 (Recommendations for the authors):

      I have a few small suggestions for improving the presentation of the data.

      (1) It would be helpful to show illustrations and embryo images of all the stages utilized in the analysis in Figures 1A and B.

      (2) It was difficult to distinguish all the different colors in Figures 3B and 4B. Could you label, as in Figure 4, supplements 1D, F?

      (3) I was confused by the position of the color code key for Figure 7D-J, thinking it belonged to panels B and C. Could you put it under the figure/heatmap key so that it is clearly linked to panels D-J?

      Thank you for these suggestions. We have incorporated the third suggestion to improve readability, but were not able to make the first two changes due to space limitations.

      Reviewer #2 (Recommendations for the authors):

      I only have a couple of minor additional suggestions/questions for the authors:

      (1) The authors state that nearly half of the transcripts they found as differentially regulated in SAG-treated embryos were also characterized as spatially regulated in the wild-type embryos. It would be great if the authors could provide more detail here. How many of the transcripts that are differentially regulated along the mediolateral axis of the wild-type are characterized as differentially regulated in the SAG-treated embryos? How does this further break down into where these genes are expressed along the mediolateral and the anterior-posterior axes? I am aware that the authors answer some of these questions already by providing examples, but a more systematic characterisation would be appreciated here.

      We have updated Table S11 to include the anterior-posterior and mediolateral cluster identities of differentially expressed genes in SAG-treated embryos, where applicable. In addition, we have added more discussion of the genes from our SAG analysis that were also found to be spatially patterned in wild-type embryos to the fourth paragraph of the last results section.

      (2) Related to the previous question, the authors nicely demonstrate that SAG treatment of embryos causes many transcriptional changes, including the expression/repression of several transcription factors well-known to mediate spatial patterning, raising the question of which of these effects are directly due to gene regulation by the Shh pathway and which effects are secondary consequences of transcriptional changes of other transcription factors. Similarly, the authors' results also suggest that some genes are only induced in specific parts along the neuraxis, raising the question of why. The authors could attempt some type of regulon-interference approaches to identify further candidates that may mediate these effects.

      This is an excellent suggestion for a future extension of this work, as we agree that validation of the predicted SHH targets, including which targets are direct, indirect, or region-specific, would be required to evaluate the predictions of this scRNA-seq analysis.

      (3) The authors report that they observed 'a previously unreported inhibition of Scube2' upon SAG treatment of the embryos. At least in the spinal cord, Scube2 is well-known to be expressed at a distance from the source of Shh secretion (e.g. Kawakami et al. Curr. Biol. 2005), thus the direct or indirect repression by Shh signalling is strongly expected. Moreover, a recent preprint (Collins et al. bioRxiv, https://doi.org/10.1101/469239) suggests that the interaction between Shh and Scube2 can mediate the scale-invariance of Shh patterning. Of note, the authors of this preprint also state that 'upregulation of Shh represses scube2 expression while Shh downregulation increases scube2 expression thus establishing a negative feedback loop.'

      Thank you for this suggestion. We have added these references.

      (4) The authors partition genes based on different diffusion components as being differentially expressed along the mediolateral axis. However, starting from ~e8.5, neural progenitors in the neural tube can be partitioned based on the expression of well-characterised combinatorial sets of transcription factors into molecularly defined progenitor domains that subsequently give rise to functionally distinct types of neurons. How much of this patterning process can the authors capture with their diffusion component analysis and does their data also allow them to capture these finer-grained differences in gene expression along the mediolateral and prospective dorsal-ventral axis of the neural tube that are known to exist?

      This is a very interesting point. We have added a new figure showing UMAPs of the E8.5-9.0 cranial neural plate for a subset of 29 genes (described in Delile et al., 2019) that define distinct neural progenitor domains along the dorsal-ventral axis of the spinal cord (Figure 5—figure supplement 2). We observed that 18 of 20 genes that were detected in the midbrain/r1 region in our dataset were expressed in broad domains along the mediolateral axis of the cranial neural plate that were roughly consistent with their expression domains along the dorsal-ventral axis of the spinal cord. Of these 18 genes, 14 were patterned along both anterior-posterior and mediolateral axes, 2 were patterned only along the mediolateral axis, and 2 were patterned only along the anterior-posterior axis. These results suggest a general correspondence between mediolateral patterning in the cranial neural plate and dorsal-ventral patterning in the spinal cord. However, less refinement of these domains along the mediolateral axis was observed in the cranial neural plate, possibly because the relatively early, pre-closure stages captured by our dataset may be before the establishment of secondary feedback systems that lead to fine-scale patterning of mutually exclusive neural precursor domains. These results are described in the last paragraph of the results section titled “An integrated framework for analyzing cell identity in multiscale space.”

      (5) The authors state that they will not only make the raw sequencing data but also the processed intermediate data files available. This is greatly appreciated as it strongly facilitates the re-use of the data. However, it would be also appreciated if the authors made the computational code publicly available that was used to analyze the data and generate the figure panels in the manuscript.

      We have deposited the processed h5ad files in the GEO database, accession number GSE273804. Additionally, we have made interactive Python notebooks available on our lab GitHub page (https://github.com/ZallenLab). These contain the code used to analyze gene expression and generate the figures in this study, as well as code used to automatically generate customizable links to gene expression images in the Mouse Genome Informatics Gene Expression database. We have updated the Data availability section to reflect these changes.

      Reviewer #3 (Recommendations for the authors):

      (1) Considering that individual progenitor domains in the developing neural tube are typically sharply delineated with few cells exhibiting mixed identities, it is interesting that clustering of single-cell data results in a largely continuous “cloud” of cells. Is this because the early neural plate cells have not yet crystallized their identity, or would clustering based on a smaller set of genes that exhibit high variance across only neural plate cells result in improved granularity, allowing for better characterization and quantification of distinct progenitor subtypes?

      Thank you for raising this interesting point. The apparent continuity of gene expression in the cranial neural plate could reflect a gene signature shared by cranial neural plate cells, and cells may not yet be extensively regionalized into unique populations at these early stages. We now discuss these possibilities in the third paragraph of the discussion.

      (2) Can the authors clarify how neural plate cells were identified and how they were distinguished from the anterior epiblast?

      Cell typing was performed by supervised clustering based on known markers of fate. Cranial neural plate cells were identified by their expression of pan-neural factors (Sox2 and Sox3), early or late neural plate markers (Cdh1 or Cdh2), and the lack of markers associated with non-neural ectodermal cell fates (Grhl2, Krt18, Tfap2a) or other cell types (Ets1, T, Tbx6). Full gene sets used to identify all cell types in our analysis are provided in Supplementary Table 13.
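
      The marker logic described above can be sketched as a simple rule applied to per-cluster mean expression. This is only an illustrative sketch, not the authors' pipeline: the function name `is_cranial_neural_plate`, the expression threshold, the requirement that both Sox2 and Sox3 be detected, and the example clusters are all assumptions.

```python
# Marker sets taken from the response text; how they are combined is an assumption.
PAN_NEURAL = {"Sox2", "Sox3"}
NEURAL_PLATE = {"Cdh1", "Cdh2"}          # early or late neural plate markers
EXCLUDED = {"Grhl2", "Krt18", "Tfap2a",  # non-neural ectoderm
            "Ets1", "T", "Tbx6"}         # other cell types

def is_cranial_neural_plate(mean_expr, threshold=0.5):
    """Classify a cluster from its mean expression per gene.

    `threshold` is a hypothetical cutoff for calling a gene 'expressed'.
    """
    expressed = {gene for gene, value in mean_expr.items() if value >= threshold}
    return (
        PAN_NEURAL <= expressed                # both pan-neural factors present
        and bool(NEURAL_PLATE & expressed)     # at least one neural plate marker
        and not (EXCLUDED & expressed)         # no excluded marker present
    )

# Hypothetical cluster profiles, for illustration only.
cluster_a = {"Sox2": 2.1, "Sox3": 1.4, "Cdh2": 1.0, "Tfap2a": 0.0}
cluster_b = {"Sox2": 1.8, "Sox3": 1.1, "Cdh1": 0.9, "Tfap2a": 1.6}

labels = {
    "cluster_a": is_cranial_neural_plate(cluster_a),
    "cluster_b": is_cranial_neural_plate(cluster_b),
}
print(labels)
```

      In this toy case the second cluster is rejected because it expresses a non-neural ectoderm marker, which mirrors the exclusion step described in the response.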

      (3) Did the study identify cells with cranial placode identity? Cranial placodes emerge during the same period, and it would be useful to highlight them in Figure 1.

      Thank you for highlighting this point. Examination of the early placode markers Six1 and Eya1 indicates that cranial placode cells are a subset of the cells in PhenoGraph cluster 17 in our full dataset (Figure 1—figure supplement 1). We now mention this along with other cell types of interest in the last paragraph of the discussion.

      (4) It could be interesting to provide more information about the novel genes identified as differentially expressed along the AP or mediolateral axes. Do they belong to gene families that were not previously implicated in neural patterning, or do they point to novel biological mechanisms controlling neural patterning?

      Diverse gene families are represented by the genes that are patterned along the anterior-posterior and mediolateral axes of the cranial neural plate at these stages, likely due to the large number of genes that are spatially patterned in this tissue. Further investigation of the biological mechanisms suggested by these patterns is an important direction for future work, both in terms of molecularly classifying the genes identified as well as directly investigating their roles in neural patterning using genetic analysis.

      (5) It would be helpful to discuss how the data presented here compare to other relevant single-cell analyses, such as PMC10901739. This would help to highlight aspects that are unique to this study.

      We have added this reference as well as an earlier study from these authors and we discuss how our study complements this work in the introduction.

      (6) The inclusion of single-cell data from control embryos that were cultured for 12 hours is of great interest. The authors should identify the set of genes that are deregulated in cultured cells and, taking advantage of their detailed temporal series, examine whether the maturation of cultured embryos progresses normally or whether there are genes that fail to mature correctly in vitro.

      We agree that an analysis of the impact of ex vivo culture on gene expression would be useful. However, the large difference in the number of cells in our wild-type and cultured embryo datasets, as well as the lack of time-course data for the cultured embryos, could make a comparison between our current cultured and non-cultured embryo datasets difficult to interpret.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors studied hippocampal connectivity gradients across the lifespan, and how these relate to memory function and neurotransmitter distributions. They observed that older age was associated with less distinct transitions, and observed an association between gradient de-differentiation and cognitive decline.

      This is overall an innovative and interesting study to assess gradient alterations across the lifespan and its associations to cognition.

      The paper is well-written, and the methods appear sound and thoughtful. There are several strengths, including the inclusion of two independent cohorts, the use of gradient mapping and alignment techniques, and an overall sound statistical and analysis framework. There are several areas for potential improvements in the paper, and these are listed below:

      We thank the Reviewer for their positive assessment and summary of our work. We address each of the Reviewer’s comments below, and outline the revisions we have made to the manuscript based on the Reviewer’s suggestions.

      (1) The reported D1 associations appear a bit post-hoc in the current work and I was unclear why the authors specifically focussed on dopamine here, as other transmitter systems are similarly present at the level of the hippocampus and implicated in aging.

      Other neurotransmitter systems may indeed be relevant in the context of hippocampal function in aging. In this study, however, we included a specific research question about the DA D1 receptor (D1DR) based on previous research 1) emphasizing the role of DA neuromodulation in maintaining functional network segregation in aging to support cognition (Pedersen et al., 2023), 2) reporting heterogeneous distribution of DA markers across the hippocampus, supporting efficient modulation of distinct behaviors (Dubovyk & Manahan-Vaughan, 2019; Edelmann & Lessmann, 2018; Gasbarri et al., 1994; Kempadoo et al., 2016), and 3) demonstrating the spatial distribution of D1DRs as varying across neocortex along a unimodal-transmodal gradient (Pedersen et al., 2024). To what degree this variation might be reflected in cortico-hippocampal connectivity, however, remained to be investigated. As such, one of the study’s specific aims was to evaluate the spatial distribution of D1DRs as a molecular correlate of the hippocampus’ functional organization. Importantly, we were interested in mapping associations between individual differences in the organization of connectivity and D1DRs. This was uniquely enabled by utilizing the DyNAMiC sample, as it includes structural and functional MRI data in combination with D1DR PET in the same individuals across the adult lifespan (n=180). However, after observing significant spatial correspondence between functional organization and D1DR expressed by the second hippocampal gradient (G2), we did indeed perform complementary analyses with group-averaged data of additional dopamine markers (D2DR from a subsample of our participants, as well as DAT and FDOPA from open sources) to test the generalizability of the original finding. Taken together, the original analyses based on subject-level data and complementary group-level analyses provided support for the interpretation of G2 as a dopaminergic mode.

      We have updated the manuscript to clarify the focus on the D1 receptor and the contribution of including additional DA markers.

      Updated paragraph in the Introduction, pages 5-6:

      “Dopamine (DA) is one of the most important modulators of hippocampus-dependent function(47,48), and influences the brain’s functional architecture through enhancing specificity of neuronal signaling(49). Consistently, there is a DA-dependent aspect of maintained functional network segregation in aging which supports cognition(50). Animal models suggest heterogeneous patterns of DA innervation(51,52) and postsynaptic DA receptors(53), across both transverse and longitudinal hippocampal axes, likely allowing for separation between DA modulation of distinct hippocampus-dependent behaviors(47). Moreover, the human hippocampus has been linked to distinct DA circuits on the basis of long-axis variation in functional connectivity with midbrain and striatal regions(54,55). Taken together with recent findings revealing a unimodal-transmodal organization of the most abundantly expressed DA receptor subtype, D1 (D1DR), across cortex(56), we tested the hypothesis that the organization of hippocampal-neocortical connectivity partly reflects the underlying distribution of hippocampal DA receptors, predicting predominant spatial correspondence for any hippocampal gradient conveying a unimodal-transmodal pattern across cortex.”

      Updated sections in the Results, page 13-14:

      “Our next aim was to investigate to which extent the distribution of hippocampal DA D1 receptors (D1DRs), measured by [<sup>11</sup>C]SCH23390 PET in the DyNAMiC(58) sample, may serve as a molecular correlate of the hippocampus’ functional organization.”

      “Complementary analyses were then conducted to further evaluate G2 as a dopaminergic hippocampal mode by utilizing additional DA markers at group-level.”

      Moreover, the authors may be aware that multiple PET tracers are somewhat challenged in the mesiotemporal region. Is this the case for the D1 receptor as well? The hippocampus is a small and complex structure, and PET more of a low res technique so one would want to highlight and discuss the limitations of the correlations with PET maps here and/or evaluate whether the analysis adds necessary findings to the study.

      We thank the Reviewer for raising this point. The lower resolution of PET is indeed a relevant aspect to consider when quantifying D1DR availability in the hippocampus, even though previous research indicates high test-retest reliability of [<sup>11</sup>C]SCH23390 PET measurement in this region (Kaller et al., 2017). We have now elaborated on PET limitations in the Discussion of the revised manuscript.

      In our study, we made efforts to reduce potential partial volume effects (PVE) by correcting our PET data, and tested spatial associations between our functional gradients and D1DR maps using trend-surface modelling (TSM), rather than through voxel-wise comparisons. This allowed us to evaluate the spatial correspondence between functional connectivity and D1DRs at a level of spatial trends, estimated using TSM models computed at increasing levels of complexity. The results showed consistent spatial overlap between G2 and D1DRs across these models, that is, across spatial trends described at coarser-to-finer scales. Furthermore, this was replicated across several DA markers with PET and SPECT data from independent samples.
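
      As a rough illustration of the trend-surface idea described above, the sketch below fits polynomial functions of voxel coordinates to a map by least squares at increasing model orders. It is a minimal sketch using NumPy on synthetic data, not the authors' implementation: the function names `tsm_design` and `fit_tsm`, the omission of cross terms, the coordinate standardization, and the toy data are all assumptions.

```python
import numpy as np

def tsm_design(coords, order):
    """Polynomial basis in x, y, z up to `order` (no cross terms; an assumption)."""
    cols = [coords ** k for k in range(1, order + 1)]  # each block is (n_voxels, 3)
    return np.hstack(cols)                             # (n_voxels, 3 * order)

def fit_tsm(coords, values, order):
    """Least-squares trend-surface fit; returns spatial coefficients and R^2."""
    # Standardize coordinates so higher powers stay numerically well-behaved.
    z = (coords - coords.mean(axis=0)) / coords.std(axis=0)
    X = np.hstack([np.ones((len(z), 1)), tsm_design(z, order)])
    beta, *_ = np.linalg.lstsq(X, values, rcond=None)
    resid = values - X @ beta
    r2 = 1.0 - resid.var() / values.var()
    return beta[1:], r2  # drop the intercept; keep the spatial parameters

# Synthetic "gradient" that changes mainly along the y (long) axis.
rng = np.random.default_rng(0)
coords = rng.uniform(-10, 10, size=(500, 3))
values = 0.8 * coords[:, 1] + 0.05 * coords[:, 1] ** 2 + rng.normal(0, 0.5, 500)

fits = {order: fit_tsm(coords, values, order) for order in (1, 2, 3)}
for order, (beta, r2) in fits.items():
    print(f"order {order}: {beta.shape[0]} parameters, R^2 = {r2:.3f}")
```

      With this particular basis, a model of order 3 has 9 spatial parameters and a model of order 4 has 12, which matches the parameter counts quoted in Reviewer #2's recommendations; whether the authors used exactly this basis is an assumption. Comparing two modalities then amounts to relating their fitted parameters rather than their voxel values.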

      Taken together, we agree with the Reviewer that the spatial correspondence observed between G2 and hippocampal D1DRs should be interpreted in the context of resolution-related limitations inherent to PET imaging. However, we strongly believe that our DA analyses offer valuable insight into the molecular underpinnings of hippocampal functional organization.

      Updated paragraph in the Discussion, pages 25-26:

      “We discovered that G2, specifically, manifested organizational principles shared among function, behavior, and neuromodulation. Meta-analytical decoding reproduced a unimodal-associative axis across G2 (Figure 3B), and analyses in relation to the distribution of D1DRs – which vary across cortex along a unimodal-transmodal axis(76,77) – demonstrated topographic correspondence both at the level of individual differences and across the group. It should, however, be acknowledged that PET imaging in the hippocampus is associated with resolution-related limitations, although previous research indicates high test-retest reliability of [<sup>11</sup>C]SCH23390 PET to quantify D1DR availability in this region(78). As such, mapping the distribution of hippocampal D1DRs at a fine spatial scale remains challenging, and replication of our results in terms of overlap with G2 is needed in independent samples. Here, we evaluated the observed spatial overlap between G2 topography and D1DRs across multiple TSM model orders, showing correspondence between modalities from simple to more complex parameterizations of their spatial properties. Topographic correspondence was additionally observed between G2 and other DA markers from independent datasets (Figure 3B), suggesting that G2 may constitute a mode reflecting a dopaminergic phenotype, which contributes to the currently limited understanding of its biological underpinnings.”

      From my (perhaps somewhat biased) perspective, it might be valuable to instead or in addition look at measures of hippocampal microstructure and how these relate to the functional aging effects. This could be done, if available, using data from the same subjects (eg based on quantitative MRI contrasts and/or structural MRI) and/or using contextualization findings as implemented in eg hippomaps.readthedocs.io

      We thank the Reviewer for this suggestion. We performed additional analyses investigating the spatial overlap between our connectivity gradients and estimates of hippocampal microstructure, computed as the ratio of T1- over T2-weighted (T1w/T2w) images (Glasser & Von Essen, 2011; vos de Wael et al., 2018). Analyses of spatial correspondence then followed the TSM-based method used to test the spatial overlap between functional connectivity gradients and D1DR distribution. Applying TSM to the T1w/T2w image computed for each participant yielded subject-level model parameters describing microstructure topography, which were then entered as predictors of connectivity topography in multivariate GLMs (separate models for each gradient and hemisphere, 6 models in total).

      Analyses revealed that microstructure of the right hippocampus significantly predicted gradient topography of right-hemisphere G1 (F = 1.325, p = 0.034), while no other links between connectivity gradients and microstructure emerged as significant (F = 0.930-1.184, ps = 0.706-0.079).

      These results, suggesting an association along the anteroposterior axis, deviate from previous findings linking hippocampal microstructure to G3-like, medial-lateral, connectivity organization (vos de Wael et al., 2018). As we believe that comprehensive analyses of our gradients in relation to microstructure across the lifespan would be best addressed in future work, we have not included these analyses of microstructure in the revised manuscript.

      (2) Can the authors clarify why they did not replicate based on cohorts that are more widely used in the community and open access, such as CamCAN and/or HCP-Aging? It might connect their results with other studies if an attempt was made to also show that findings persist in either of these repositories.

      We agree with the Reviewer that replication in samples such as CamCAN and/or HCP-Aging would provide valuable opportunities to connect our findings with those of other studies using those datasets. Here, we included the Betula dataset (Nilsson et al., 2004) as our replication sample, as it was immediately available to us and included a large sample of adults of comparable age, as well as a word recall episodic memory task closely aligned with the one included in DyNAMiC. Importantly, leveraging the Betula dataset as our replication sample allows us to link our findings to a wide range of previous studies central to the understanding of neurocognitive aging in general, and hippocampal aging in particular (Nyberg, 2017; Nyberg et al., 2020). Betula is a large longitudinal project that has been tracking individuals since 1988, and is part of the National E-infrastructure for Aging Research (NEAR: www.near-aging.se), through which data from several Swedish studies are made available to both national and international researchers. While we acknowledge the value of extending replication efforts to datasets like CamCAN and HCP-Aging, we emphasize the significant contribution of having replicated our connectivity gradients in the Betula dataset.

      (3) The authors applied TSM and related these parameters to topographic changes in the gradients. I was wondering whether and how such an approach controls for autocorrelation present in both the PET map and gradients. Could the authors clarify?

      The Reviewer raises an important topic in spatial autocorrelation. The TSM approach used to parameterize the topography of the functional gradients and D1DR distribution, and to test the spatial correspondence between modalities, did not include any specific method to control for autocorrelation. Here, we highlight two aspects of our study in relation to this point. First, we demonstrated in the Supplementary information (S. Figure 4) that autocorrelation induced by spatial smoothing likely has limited effects on overall gradient topography and the ability of TSM parameters to capture meaningful inter-individual differences in terms of age. Second, in the case of spatial overlap effects being significantly impacted by autocorrelation, we would expect the association between right-hemisphere G2 and D1DR topography to similarly emerge for G2 in the left hemisphere. The absence of such an association may speak to a limited effect of spatial autocorrelation.

      (4) The TSM approach quantifies the gradients in terms of x/y/z direction in a cartesian coordinate system. Wouldn't a shape intrinsic coordinate system in the hippocampus also be interesting, and perhaps even be more efficient to look at here (see eg DeKraker 2022 eLife or Paquola et al 2020 eLife)?

      This is a very relevant question and we appreciate the Reviewer’s suggestion. We recognize that there may be several benefits associated with adopting a shape-intrinsic coordinate system when characterizing effects in the hippocampus, given its curved/folded anatomy. Approaches like those adopted in DeKraker et al. (2022) and Paquola et al. (2020) utilize geodesic coordinate frameworks to represent the hippocampus in surface space, enabling mapping of connectivity onto the hippocampal surface while respecting its inherent curvature and topology. We anticipate that quantifying gradients within such a framework would especially benefit identification of connectivity change across the hippocampal surface relative to reference points such as subfield boundaries, while minimizing effects of interindividual differences in hippocampal shape and folding. In our study, hippocampal gradients and their associated cortical patterns were computed in volumetric space, with TSM subsequently used to parameterize the change in connectivity along these gradients. This indeed yields a description of connectivity change within a coordinate system less specific to hippocampal anatomy, but may favor generalizability and integration with previous gradient findings within and beyond the hippocampus (e.g., Przeździk et al., 2019; Tian et al., 2020; Katsumi et al., 2023; Navarro-Schröder et al., 2015), as well as connections with broader neuroimaging frameworks through techniques such as meta-analytical decoding. In our view, the different coordinate frameworks offer complementary insight into hippocampal organization, and while we have opted not to undertake novel analyses to explore our gradients within a geodesic coordinate system for the purposes of this paper, we recognize the importance of such evaluation of our gradients in future analyses. We have made updates to the Discussion in the revised manuscript on this topic (pages 23-24):

      “Greater anatomical specificity, with more precise characterization of connectivity in relation to subfield boundaries while minimizing effects of inter-individual differences in hippocampal shape and folding, might be achieved by adopting techniques implementing a geodesic coordinate system to represent effects within the hippocampus(68,69).”

      Reviewer #2 (Public Review):

      Summary:

      This paper derives the first three functional gradients in the left and right hippocampus across two datasets. These gradient maps are then compared to dopamine receptor maps obtained with PET, associated with age, and linked to memory. Results reveal links between dopamine maps and gradient 2, age with gradients 1 and 2, and memory performance.

      Strengths:

      This paper investigates how hippocampal gradients relate to aging, memory, and dopamine receptors, which are interesting and important questions. A strength of the paper is that some of the findings were replicated in a separate sample.

      Weaknesses:

      The paper would benefit from added clarification on the number of models/comparisons for each test. Furthermore, it would be helpful to clarify whether or not multiple comparison correction was performed and - if so - what type or - if not - to provide a justification. The manuscript would furthermore benefit from code sharing and clarifying which results did/did not replicate.

      We thank the Reviewer for their positive assessment and suggestions regarding further clarifications. We have addressed the Reviewer’s comments in a point-by-point manner under the “Recommendations for the authors” section.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors analyzed the complex functional organization of the hippocampus using two separate adult lifespan datasets. They investigated how individual variations in the detailed connectivity patterns within the hippocampus relate to behavioral and molecular traits. The findings confirm three overlapping hippocampal gradients and reveal that each is linked to established functional patterns in the cortex, the arrangement of dopamine receptors within the hippocampus, and differences in memory abilities among individuals. By employing multivariate data analysis techniques, they identified older adults who display a hippocampal gradient pattern resembling that of younger individuals and exhibit better memory performance compared to their age-matched peers. This underscores the behavioral importance of maintaining a specific functional organization within the hippocampus as people age.

      Strengths:

      The evidence supporting the conclusions is overall compelling, based on a unique dataset, rich set of carefully unpacked results, and an in-depth data analysis. Possible confounds are carefully considered and ruled out.

      Weaknesses:

      No major weaknesses. The transparency of the statistical analyses could be improved by explicitly (1) stating what tests and corrections (if any) were performed, and (2) justifying the elected statistical approaches. Further, some of the findings related to the DA markers are borderline statistically significant and therefore perhaps less compelling but they line up nicely with results obtained using experimental animals and I expect the small effect sizes to be largely related to the quality and specificity of the PET data rather than the derived functional connectivity gradients.

      We thank the Reviewer for the thoughtful summary and positive assessment of our work. To increase transparency of the statistical analyses, we have in the revised manuscript added information regarding statistical tests and corrections for multiple comparisons. In the Results, p-values were reported at an uncorrected statistical threshold, and we have in the revised manuscript included the corresponding p-values adjusted for multiple comparisons using the Benjamini-Hochberg method to control the false discovery rate (FDR). Finally, in the revised manuscript, we have now elaborated on the potential limitations of our PET analyses and we include the updated paragraph below.
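
      For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, a minimal sketch of the procedure follows. It is a generic illustration in plain Python, not the code used in the study, and the p-values are hypothetical placeholders rather than results from the manuscript.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical uncorrected p-values for six models (not values from the study).
raw = [0.004, 0.030, 0.041, 0.120, 0.350, 0.800]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"uncorrected p = {p:.3f}  ->  FDR-adjusted p = {q:.3f}")
```

      In this toy example the second and third tests fall below 0.05 before adjustment but not after, which illustrates why reporting both uncorrected and adjusted values can change which effects are treated as significant.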

      Addition made to the Results section, page 13:

      “Individual maps of D1DR binding potential (BP) were also submitted to TSM, yielding a set of spatial model parameters describing the topographic characteristics of hippocampal D1DR distribution for each participant. D1DR parameters were subsequently used as predictors of gradient parameters in one multivariate GLM per gradient (in total 6 GLMs, controlled for age, sex, and mean FD). Results are reported with p-values at an uncorrected statistical threshold and p-values after adjustment for multiple comparisons using the Benjamini-Hochberg method to control the false discovery rate (FDR).”

      Addition made to the Results section, page 15:

      “Effects of age on gradient topography were assessed using multivariate GLMs including age as the predictor and gradient TSM parameters as dependent variables (controlling for sex and mean frame-wise displacement; FD). One model was fitted per gradient and hemisphere, each model including all TSM parameters belonging to a gradient (in total, 6 GLMs).”

      Addition made to the Results section, page 17:

      “Models were assessed separately for left and right hemispheres, across the full sample and within age groups, yielding eight hierarchical models in total. Results are reported with p-values at an uncorrected statistical threshold and p-values after FDR adjustment.”

      Updated paragraph in the Discussion, pages 25-26:

      “We discovered that G2, specifically, manifested organizational principles shared among function, behavior, and neuromodulation. Meta-analytical decoding reproduced a unimodal-associative axis across G2 (Figure 3B), and analyses in relation to the distribution of D1DRs – which vary across cortex along a unimodal-transmodal axis(76,77) – demonstrated topographic correspondence both at the level of individual differences and across the group. It should, however, be acknowledged that PET imaging in the hippocampus is associated with resolution-related limitations, although previous research indicates high test-retest reliability of [<sup>11</sup>C]SCH23390 PET to quantify D1DR availability in this region(78). As such, mapping the distribution of hippocampal D1DRs at a fine spatial scale remains challenging, and replication of our results in terms of overlap with G2 is needed in independent samples. Here, we evaluated the observed spatial overlap between G2 topography and D1DRs across multiple TSM model orders, showing correspondence between modalities from simple to more complex parameterizations of their spatial properties. Topographic correspondence was additionally observed between G2 and other DA markers from independent datasets (Figure 3B), suggesting that G2 may constitute a mode reflecting a dopaminergic phenotype, which contributes to the currently limited understanding of its biological underpinnings.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Please see the comments in the public review.

      We thank the Reviewer for their comments and recommendations, and have addressed them in the “Public review” section.

      Reviewer #2 (Recommendations For The Authors):

      (1) All statistical analyses are based on linear regressions using trend surface modeling (TSM) parameters that parameterize gradients at the subject level. These models resulted in 9 parameters for gradient 1 and 12 parameters each for gradients 2 and 3. The text states that 'Effects of age on gradient topography was assessed using multivariate GLMs including age as the predictor and gradient TSM parameters as dependent variables (controlling for sex and mean frame-wise displacement; FD)'. Please clarify whether these GLMs were fitted separately for each TSM parameter (i.e., 9+12+12=33 models for both left and right = 66 total models) or on the overall model?

      We appreciate the Reviewer’s request for clarification on this matter. These GLMs were fitted on the overall TSM model, that is, through one GLM per gradient (3) and hemisphere (2), each one including all TSM parameters belonging to a gradient (in total, 6 GLMs).

      In the revised manuscript, we have added more details to the Results section, page 15: “Effects of age on gradient topography were assessed using multivariate GLMs including age as the predictor and gradient TSM parameters as dependent variables (controlling for sex and mean frame-wise displacement; FD). One model was fitted per gradient and hemisphere, each model including all TSM parameters belonging to a gradient (in total, 6 GLMs).”

      (2) Similarly, for memory it appears that multiple models were performed (left and right, young, middle-aged, old, whole groups). Please clarify whether and how multiple comparison correction was performed in this case.

      In the revised manuscript, we have now specified the number of analyses conducted in relation to memory performance. We have also clarified that p-values were reported at an uncorrected statistical threshold, and we have in the revised manuscript included the corresponding p-values adjusted for multiple comparisons using the Benjamini-Hochberg method to control the FDR.

      Updated section in the Results, page 17:

      “Models were assessed separately for left and right hemispheres, across the full sample and within age groups, yielding eight hierarchical models in total. Results are reported with p-values at an uncorrected statistical threshold and p-values after FDR adjustment.”

      (3) Although I applaud the authors for their replication efforts, the results do not appear to replicate well. For example, memory was linked to gradient 2 in the whole group but to gradient 1 in the young group. Furthermore, dopamine was linked to gradient 2 in the right but not the left hemisphere. Although the overall group-level gradients were very stable between the two datasets, it is not clear whether the age findings replicated and the memory subgroup findings only replicated at trend level for memory and only partially replicated at the TSM parameter level.

      We thank the Reviewer for highlighting the inclusion of a replication dataset as a strength of our study, and we appreciate the recommendation to clarify to which extent results replicated. We provide a response to the Reviewer’s points below, and specify the revisions made to the manuscript in relation to this topic.

      The main aim of our study was to characterize the topographic organization of functional hippocampal-neocortical connectivity within the hippocampus across the adult lifespan, as previous studies have limited their focus to younger adults. Given the lack of previous studies for comparison, together with our identification of a novel secondary long-axis connectivity gradient (G2) taking precedence over the previously established medial-lateral G3, we included the Betula sample (Nilsson et al., 2004) for the purpose of replication. There was a high level of consistency between our main dataset and our replication dataset, with gradients 1-3 in left and right hemispheres identified in both samples.

      Further use of the replication dataset, beyond the identification of the connectivity gradients, was originally not planned. As such, not all subsequent analyses in the main dataset were conducted in the replication dataset. However, we found it critical to evaluate, in an independent sample, the observation that older individuals who maintained a youth-like gradient topography also exhibited higher levels of memory performance. This was possible given that the replication dataset included a comparable number of participants in similar ages and a word recall episodic memory task corresponding well to the one used in DyNAMiC. Overall, we conclude that these analyses replicated well across samples. Firstly, topography of left-hemisphere G1 informed the classification of older adults into youth-like and aged subgroups in both samples. Furthermore, in both samples, we observed that the older subgroups identified based on G1 topography also exhibited the youth-like vs. aged pattern in G2 topography. This pattern was, however, also evident in G3 only in the main sample, possibly suggesting a limited contribution of G3 topography in determining overall functional profiles in older age. In terms of the behavioral relevance of maintaining youth-like gradient topography in older age, we observed effects on word recall performance in both samples, although, as the Reviewer correctly points out, the difference between subgroups was significant only at trend level (p = 0.058) in the replication dataset. While this indeed underscores the importance of replication efforts in additional samples, we argue that the pattern observed in our replication dataset is overall consistent with, and conveys effects in the expected direction based on, the original observations in our main dataset.

      In revising the manuscript, we have performed additional analyses for replication purposes in terms of memory. Originally, we observed a significant association between G2 topography and episodic memory across the main sample. However, this effect did not remain significant after FDR adjustment for multiple comparisons. To evaluate this association further, we conducted a corresponding hierarchical multiple regression analysis in the replication dataset, which supported a role of G2 in memory (Adj. R<sup>2</sup> = 0.368, ΔR<sup>2</sup> = 0.081, F= 1.992, p = 0.028). Together, these analyses suggest that inter-individual differences in episodic memory performance may in part be explained by the spatial characteristics of G2 across the adult lifespan, although increased statistical power in relation to the large number of TSM parameters included in the hierarchical regression models may be needed to explore this association in smaller, age-stratified, groups. Relatedly, it is worth mentioning that higher levels of memory performance in older age were linked to the maintenance of youth-like G2 topography in both our main and replication datasets.
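
As a hedged illustration of how an R² increase (ΔR²) in a hierarchical regression is commonly converted into an F statistic, the sketch below applies the standard F-change formula. The function name and all numeric inputs are hypothetical; the response does not report the sample size or the number of predictors per step, so the reported values are not reproduced here. The function expects unadjusted R² values.

```python
def f_change(r2_reduced, r2_full, n_added, n_obs, n_predictors_full):
    """F statistic for the R^2 gained by adding a block of predictors.

    r2_reduced / r2_full: unadjusted R^2 before and after adding the block.
    n_added: number of predictors in the added block.
    n_obs: number of observations.
    n_predictors_full: number of predictors in the full model.
    """
    df_num = n_added
    df_den = n_obs - n_predictors_full - 1
    delta_r2 = r2_full - r2_reduced
    f_stat = (delta_r2 / df_num) / ((1.0 - r2_full) / df_den)
    return f_stat, df_num, df_den


# Hypothetical example: five gradient parameters added to five existing predictors.
f_stat, df1, df2 = f_change(r2_reduced=0.30, r2_full=0.40,
                            n_added=5, n_obs=106, n_predictors_full=10)
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

Because the denominator degrees of freedom shrink as predictors are added, a model with many TSM parameters needs a larger sample to detect the same ΔR², in line with the point about statistical power in smaller, age-stratified groups.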

      In parallel, topographic parameters of G1 predicted memory performance in the younger adults, which successfully replicates TSM-based results previously reported in Przeździk et al., 2019. Although similar associations were not evident within the other age groups, a link between G1 topography and memory was demonstrated in older age based on a) the identification of individuals maintaining a youth-like G1 profile and higher levels of memory, within which b) memory performance was, as in young adults, significantly predicted by G1 topography.

      The spatial correspondence between G2 topography and distribution of hippocampal D1DRs was lateralized to the right, and as the Reviewer points out, as such did not replicate across hemispheres. To what extent replication across hemispheres should be expected in this case is, however, difficult to determine. Lateralization and/or hemispheric asymmetry is commonly observed in numerous hippocampal features, from the molecular level to its functional involvement in behavior (Nemati et al., 2023; Persson & Söderlund, 2015), including various dopaminergic markers tested in the animal literature (Afonso et al., 1993; Sadeghi et al., 2017). Yet, potential differences between hemispheres in D1DR availability and the spatial distribution of receptors along hippocampal axes remain less studied in humans. More data is therefore needed to determine the nature of this right-hemisphere lateralization.

      In sum, we argue that our results show a good level of replication across independent datasets and across analyses in our main dataset. Whereas this study did not attempt replication of all analyses conducted in the main dataset, it has through replication across independent samples provided support for its main findings – the organization of hippocampal-neocortical connectivity along three main hippocampal gradients across the adult lifespan, and the gradient topography-based identification of older individuals maintaining a youth-like hippocampal organization in older age.

      The revised manuscript includes edits made to incorporate the new analyses and clarifications of observations in relation to memory.

      In the Results, page 17:

      “Observing that the association between G2 and memory did not remain significant after FDR adjustment, we performed the same analysis in our replication dataset, which also included episodic memory testing. Consistent with the observation in our main dataset, G2 significantly predicted memory performance (Adj. R<sup>2</sup> = 0.368, ΔR<sup>2</sup> = 0.081, F= 1.992, p = 0.028) over and above covariates and topography of G1. Here, the analysis also showed that G1 topography predicted performance across the sample (Adj. R<sup>2</sup> = 0.325, ΔR<sup>2</sup> = 0.112, F= 3.431, p < 0.001).”

      In the Discussion, page 26:

      “Results linked both G1 and G2 to episodic memory, suggesting complementary contributions of these two overlapping long-axis modes. Considered together, analyses in the main and replication datasets indicated a role of G2 topography in memory across the adult lifespan, independent of age. A similar association with G1 was only evident across the entire sample in the replication dataset, whereas results in the main sample seemed to emphasize a role of youth-like G1 topography in memory performance. In line with previous research, memory was successfully predicted by G1 topography in young adults (30), and similarly predicted by G1 in older adults exhibiting a youth-like functional profile.”

      (4) Please share the data and code and add a description of data and code availability in the manuscript.

      We have now made our code available, and added a statement on data and code availability in the revised manuscript.

      On page 37: “Data from the DyNAMiC study are not publicly available. Access to the original data may be shared upon request from the Principal investigator, Dr. Alireza Salami. The Matlab, R, and FSL codes used for analyses included in this study are openly available at https://github.com/kristinnordin/hcgradients. Computation of gradients was done using the freely available toolbox ConGrads: https://github.com/koenhaak/congrads.”

      Reviewer #3 (Recommendations For The Authors):

      Please see the comments in the public review.

      We thank the Reviewer for their comments and recommendations, and have addressed them in the “Public review” section.

      References

      Afonso, D., Santana, C., & Rodriguez, M. (1993). Neonatal lateralization of behavior and brain dopaminergic asymmetry. Brain Research Bulletin, 32(1), 11–16. https://doi.org/10.1016/0361-9230(93)90312-Y

      DeKraker, J., Haast, R. A., Yousif, M. D., Karat, B., Lau, J. C., Köhler, S., & Khan, A. R. (2022). Automated hippocampal unfolding for morphometry and subfield segmentation with HippUnfold. eLife, 11, e77945. https://doi.org/10.7554/eLife.77945

      Dubovyk, V., & Manahan-Vaughan, D. (2019). Gradient of expression of dopamine D2 receptors along the dorso-ventral axis of the hippocampus. Frontiers in Synaptic Neuroscience, 11. https://doi.org/10.3389/fnsyn.2019.00028

      Edelmann, E., & Lessmann, V. (2018). Dopaminergic innervation and modulation of hippocampal networks. Cell and Tissue Research, 373(3), 711–727. https://doi.org/10.1007/s00441-018-2800-7

      Gasbarri, A., Verney, C., Innocenzi, R., Campana, E., & Pacitti, C. (1994). Mesolimbic dopaminergic neurons innervating the hippocampal formation in the rat: A combined retrograde tracing and immunohistochemical study. Brain Research, 668(1), 71–79. https://doi.org/10.1016/0006-8993(94)90512-6

      Glasser, M. F., & Essen, D. C. V. (2011). Mapping Human Cortical Areas In Vivo Based on Myelin Content as Revealed by T1- and T2-Weighted MRI. Journal of Neuroscience, 31(32), 11597–11616. https://doi.org/10.1523/JNEUROSCI.2180-11.2011

      Kaller, S., Rullmann, M., Patt, M., Becker, G.-A., Luthardt, J., Girbardt, J., Meyer, P. M., Werner, P., Barthel, H., Bresch, A., Fritz, T. H., Hesse, S., & Sabri, O. (2017). Test–retest measurements of dopamine D1-type receptors using simultaneous PET/MRI imaging. European Journal of Nuclear Medicine and Molecular Imaging, 44(6), 1025–1032. https://doi.org/10.1007/s00259-017-3645-0

      Katsumi, Y., Zhang, J., Chen, D., Kamona, N., Bunce, J. G., Hutchinson, J. B., Yarossi, M., Tunik, E., Dickerson, B. C., Quigley, K. S., & Barrett, L. F. (2023). Correspondence of functional connectivity gradients across human isocortex, cerebellum, and hippocampus. Communications Biology, 6(1), Article 1. https://doi.org/10.1038/s42003-023-04796-0

      Kempadoo, K. A., Mosharov, E. V., Choi, S. J., Sulzer, D., & Kandel, E. R. (2016). Dopamine release from the locus coeruleus to the dorsal hippocampus promotes spatial learning and memory. Proceedings of the National Academy of Sciences, 113(51), 14835–14840. https://doi.org/10.1073/pnas.1616515114

      Navarro Schröder, T., Haak, K. V., Zaragoza Jimenez, N. I., Beckmann, C. F., & Doeller, C. F. (2015). Functional topography of the human entorhinal cortex. eLife, 4, e06738. https://doi.org/10.7554/eLife.06738

      Nemati, S. S., Sadeghi, L., Dehghan, G., & Sheibani, N. (2023). Lateralization of the hippocampus: A review of molecular, functional, and physiological properties in health and disease. Behavioural Brain Research, 454, 114657. https://doi.org/10.1016/j.bbr.2023.114657

      Nilsson, L.-G., Adolfsson, R., Bäckman, L., Frias, C. M. de, Molander, B., & Nyberg, L. (2004). Betula: A Prospective Cohort Study on Memory, Health and Aging. Aging, Neuropsychology, and Cognition, 11(2–3), 134–148. https://doi.org/10.1080/13825580490511026

      Nyberg, L. (2017). Functional brain imaging of episodic memory decline in ageing. Journal of Internal Medicine, 281(1), 65–74. https://doi.org/10.1111/joim.12533

      Nyberg, L., Boraxbekk, C.-J., Sörman, D. E., Hansson, P., Herlitz, A., Kauppi, K., Ljungberg, J. K., Lövheim, H., Lundquist, A., Adolfsson, A. N., Oudin, A., Pudas, S., Rönnlund, M., Stiernstedt, M., Sundström, A., & Adolfsson, R. (2020). Biological and environmental predictors of heterogeneity in neurocognitive ageing: Evidence from Betula and other longitudinal studies. Ageing Research Reviews, 64, 101184. https://doi.org/10.1016/j.arr.2020.101184

      Paquola, C., Benkarim, O., DeKraker, J., Larivière, S., Frässle, S., Royer, J., Tavakol, S., Valk, S., Bernasconi, A., Bernasconi, N., Khan, A., Evans, A. C., Razi, A., Smallwood, J., & Bernhardt, B. C. (2020). Convergence of cortical types and functional motifs in the human mesiotemporal lobe. eLife, 9, e60673. https://doi.org/10.7554/eLife.60673

      Pedersen, R., Johansson, J., Nordin, K., Rieckmann, A., Wåhlin, A., Nyberg, L., Bäckman, L., & Salami, A. (2024). Dopamine D1-Receptor Organization Contributes to Functional Brain Architecture. Journal of Neuroscience, 44(11). https://doi.org/10.1523/JNEUROSCI.0621-23.2024

      Pedersen, R., Johansson, J., & Salami, A. (2023). Dopamine D1-signaling modulates maintenance of functional network segregation in aging. Aging Brain, 3, 100079. https://doi.org/10.1016/j.nbas.2023.100079

      Persson, J., & Söderlund, H. (2015). Hippocampal hemispheric and long-axis differentiation of stimulus content during episodic memory encoding and retrieval: An activation likelihood estimation meta-analysis. Hippocampus, 25(12), 1614–1631. https://doi.org/10.1002/hipo.22482

      Przeździk, I., Faber, M., Fernández, G., Beckmann, C. F., & Haak, K. V. (2019). The functional organisation of the hippocampus along its long axis is gradual and predicts recollection. Cortex, 119, 324–335. https://doi.org/10.1016/j.cortex.2019.04.015

      Sadeghi, L., Rizvanov, A. A., Salafutdinov, I. I., Dabirmanesh, B., Sayyah, M., Fathollahi, Y., & Khajeh, K. (2017). Hippocampal asymmetry: Differences in the left and right hippocampus proteome in the rat model of temporal lobe epilepsy. Journal of Proteomics, 154, 22–29. https://doi.org/10.1016/j.jprot.2016.11.023

      Tian, Y., Margulies, D. S., Breakspear, M., & Zalesky, A. (2020). Topographic organization of the human subcortex unveiled with functional connectivity gradients. Nature Neuroscience, 1–12. https://doi.org/10.1038/s41593-020-00711-6

      vos de Wael, R., Larivière, S., Caldairou, B., Hong, S.-J., Margulies, D. S., Jefferies, E., Bernasconi, A., Smallwood, J., Bernasconi, N., & Bernhardt, B. C. (2018). Anatomical and microstructural determinants of hippocampal subfield functional connectome embedding. Proceedings of the National Academy of Sciences, 115(40), 10154–10159. https://doi.org/10.1073/pnas.1803667115

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This study presents valuable findings regarding the role of life history differences in determining population size and demography. The evidence for the claims is still partially incomplete, with concerns about generation times and population structure. Nonetheless, the work will be of considerable interest to biologists thinking about the evolutionary consequences of life history changes.

      Thank you. We have addressed the generation time and population structure issues in detail in our revision and hope that you, like us, find them to be of sufficiently low concern (i.e., they are not driving the results) that they do not overshadow the main findings and conclusions.

      The opportunity to make in-depth revisions also helped the manuscript in two ways unanticipated by both us and the reviewers. First, KW made a mistake in the original analysis of phylogenetic signal, and catching that error simplifies that aspect of the study (there is none in our measured variables). Second, in June 2024 Hilgers et al. (2024; https://doi.org/10.1101/2024.06.17.599025) posted an important manuscript to bioRxiv noting the possibility of false population size peaks in PSMC analyses using the standard default settings. Our results had three of those, which we have eliminated. Neither of these issues affects the overall conclusions, but their resolution improves the work.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This interesting study applies the PSMC model to a set of new genome sequences for migratory and nonmigratory thrushes and seeks to describe differences in the population size history among these groups. The authors create a set of summary statistics describing the PSMC traces - mean and standard deviation of N<sub>e</sub>, plus a set of metrics describing the shape of the oldest N<sub>e</sub> peak - and use these to compare across migratory and resident species (taking single samples sequenced here as representative of the species). The analyses are framed as supporting or refuting aspects of a biogeographic model describing colonization dynamics from tropical to temperate North and South America. 

      Strengths: 

      At a technical level, the sequencing and analysis up through PSMC looks good and the paper is engaging and interesting to read as an introduction to some verbal biogeographic models of avian evolution in the Pleistocene.

      The core findings - higher and more variable N<sub>e</sub> in migratory species - seem robust, and the biogeographic explanation is plausible.  

      Thanks. We thought so as well. Our analyses go beyond being simply descriptive and test some simple hypotheses, including a biogeographic+ecological expansion opportunity gained in some lineages through the adoption of a seasonal migration life-history strategy.  

      Weaknesses: 

      I did not find the analyses particularly persuasive in linking specific aspects of clade-level PSMC patterns causally to evolutionary driving forces. To their credit, the authors have anticipated my main criticism in the discussion. This is that variation in population size inferred by methods like PSMC is in "effective" terms, and the link between effective and census population size is a morass of bias introduced by population structure and selection so robustly connecting specific aspects of PSMC traces to causal evolutionary forces is somewhere between extremely difficult and impossible.  

      As R1 notes, we do not attempt to link effective population sizes and census sizes (though we do discuss this), and we are also careful to discuss correlated rather than causative factors when going beyond the overarching hypotheses regarding life-history strategy.

      Population structure is the most obvious force that can generate large N<sub>e</sub> changes mimicking the census-size-focused patterns the authors discuss. The authors argue in the discussion that since they focus on relatively deep time (>50kya at least, with most analyses focusing on the 5mya - 500kya range) population structure is "likely to become less important", and the resident species are usually more structured today (true) which might bias the findings against the observed higher N<sub>e</sub> in migrants.

      To clarify, the patterns we discuss are entirely related to effective population size, not census size. But, yes, this is why we’ve given population structure its own section in the Discussion.

      But is structure really unimportant in driving PSMC results at these specific timescales? There is no numerical analysis presented to support the claim in this paper. The biogeographic model of increased temperate-latitude land area supporting higher populations could yield high N<sub>e</sub> via high census size, but shifts in population structure (for example, from one large panmictic population to a series of isolated refugial populations as a result of glaciation-linked climate changes) could plausibly create elevated and more variable N<sub>e</sub>. Is it more land area and ecological release leading to a bigger and faster initial N<sub>e</sub> bump, or is it changes in population connectivity over time at expanding range edges, or is the whole single-bump PSMC trace an artifact of the dataset size, or what? The authors have convinced me that the N<sub>e</sub> history of migratory thrushes is on average very different from nonmigrant thrushes, but beyond that it's unclear what exactly we've learned here about the underlying process.  

      We do not argue that population structure is unimportant, only that it is less important as one goes into deeper time. Further, we agree with the reviewer’s observation above that structure is more likely to bias nonmigrant estimates of N<sub>e</sub>. In other words, following Li & Durbin’s (2011) simulations, we interpret that an inflated N<sub>e</sub> due to structure should occur more often among residents. We have clarified this in the revision. We also agree that what we’ve learned about the underlying process is not entirely clear, but as we stated, population structure does not seem to be the main driver, and there is evidence that both biogeographic and ecological factors are involved. With this being the first time that these questions have been asked, we think we’ve made an important advance and that we’ve opened a number of avenues for future study.

      It is also important to consider the time scales involved and the sampling regime. Glacial-interglacial cycles averaged ~100 Kyr back to 0.74 Mya and then averaged ~41 Kyr from then back to 2.47 Mya; about 50-60 of these cycles occurred (Lisiecki & Raymo 2005: fig. 4). This probably caused a lot of population structuring and mixing in these lineages. In addition, in the PSMC output from one of our lineages, C. ustulatus swainsonii, we find that there are 54 time segments sampled for the Pleistocene, indicating the inadequacy of this method to reflect fine-scale changes and suggesting that each estimate is capturing a lot of both phenomena, structuring and mixing. We have added this to the revision.
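
The arithmetic behind this time-scale argument can be sketched as follows. The cycle lengths, time boundaries, and segment count are the figures quoted in the response; dividing one by the other is only a rough illustration, since it treats cycle lengths as constant within each period and treats the span back to 2.47 Mya as the interval covered by the 54 segments.

```python
# Figures quoted in the response (attributed there to Lisiecki & Raymo 2005).
RECENT_CYCLE_KYR = 100.0   # average cycle length back to 0.74 Mya
EARLY_CYCLE_KYR = 41.0     # average cycle length from 0.74 back to 2.47 Mya
BOUNDARY_KYR = 740.0       # 0.74 Mya
OLDEST_KYR = 2470.0        # 2.47 Mya
PSMC_SEGMENTS = 54         # Pleistocene time segments in the C. ustulatus swainsonii output

# Approximate number of cycles in each period, assuming constant cycle length.
recent_cycles = BOUNDARY_KYR / RECENT_CYCLE_KYR
early_cycles = (OLDEST_KYR - BOUNDARY_KYR) / EARLY_CYCLE_KYR
total_cycles = recent_cycles + early_cycles

# Average number of PSMC time segments available per glacial-interglacial cycle.
segments_per_cycle = PSMC_SEGMENTS / total_cycles

print(f"approximate cycles: {total_cycles:.1f}")
print(f"PSMC segments per cycle: {segments_per_cycle:.2f}")
```

With roughly one time segment per cycle on average, a single PSMC estimate cannot separate the structuring phase of a cycle from its mixing phase.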

      I generally agree with the authors that "at present there is no way to fully disentangle the effects of population structure and geographic space on our results". But given that, I think there are two options - either we can fully acknowledge that oversimplified demographic models like PSMC cannot be interpreted as supporting evidence of any particular mechanistic or biogeographic hypothesis and stop trying to use them to do that, or we have to do our best to understand specifically which models can be distinguished by the analyses we're employing. 

      Short of developing some novel theory deep in the PSMC model, I think readers would need to see simulations showing that the analyses employed in this paper are capable of supporting or refuting their biogeographic hypothesis before viewing them as strongly supporting a specific biogeographic model. Tools like msprime and stdpopsim can be used to simulate genome-scale data with fairly complex biogeographic models. Running simulations of a thrush-like population under different biogeographic scenarios and then using PSMC to differentiate those patterns would be a more convincing argument for the biogeographic aspects of this paper. The other benefit of this approach would be to nail down a specific quantitative version of the taxon cycles model referenced in the abstract, and it would allow the authors to better study and explain the motivation behind the specific summary statistics they develop for PSMC post hoc analysis.

      These could very well be fruitful pursuits for future work, but they are beyond the scope of this paper. The impossibility of reconstructing ranges through deep time makes anything other than the very general biogeographic hypothesis we’ve posed an uncertain pursuit. Also, a purely biogeographic approach neglects the likelihood of ecological expansion also being involved. We get at the importance of the latter in the “Geography and evolutionary ecology” section of the Discussion. Below, the editor states that discussions among reviewers indicate that simulations are not warranted at this time. We agree that the complexities involved are substantial, to the point of making direct relevance to this empirical study uncertain (especially in such an among-lineage context). Regarding taxon cycles, we merely point out that that conceptual framework seems relevant given our findings. This was not even remotely anticipated at the outset of the study, so we are reluctant to do anything more than point out its possible relevance in several aspects of the results. Finally, the study’s summary statistics were motivated entirely by the hypotheses, as given in Methods, and due to an earlier error (noted above), there are no post hoc analyses in the revision. Sorry for the needless confusion.

      Reviewer #2 (Public Review): 

      Summary: 

      Winker and Delmore present a study on the demographic consequences of migratory versus resident behavior by contrasting the evolutionary history of lineages within the same songbird group (thrushes of the genus Catharus). 

      Strengths: 

      I appreciate the test-of-hypothesis design of the study and the explicit formulation of three main expectations to test. The data analysis has been done with appropriate available tools. 

      Weaknesses: 

      The current version of the paper, with the case study chosen, the results, and the relative discussion, is not satisfying enough to support or reject the hypotheses here considered.  

      Given the stated strengths, the weaknesses noted seem a little incongruous, but we understand from the comments below that the reviewer would like to see the study redesigned and expanded.  

      The authors hypothesized that the wider realized breeding and ecological range characterising migrants versus resident lineages could be a major driver of increased effective population size and population expansion in migrants versus residents. I understand that this pattern (wider range in migrants) is a common characteristic across bird lineages and that it is viewed as a result of adapting to migration. A problem that I see in their dataset is that the breeding ranges of the two groups are located in very different geographic areas (mainly South versus North America). The authors could have expanded their dataset to include species whose breeding grounds are from the two areas, regardless of their migratory behaviour, as a comparison to disentangle whether ecological differences of these two areas can affect the population sizes or growth rates.

      Because the questions are about the migratory life history strategy and the best way to get at this is in a phylogenetic framework, we’re not sure how we could effectively add species “regardless of their migratory behavior.” Further, we know that migration causes lineages to experience variable ecological conditions that include breeding, migration, and wintering conditions. Obligate migrants are going to have different breeding ranges from their close relatives, and the more distantly related species are, the less likely it is that they respond to particular ecological conditions the same way. So we do not think that an approach that included miscellaneous species from northern and southern regions would strengthen this study. Here, the comparative framework of closely related lineages that possess or lack the trait of interest is a study design strength. We do agree, however, that future work is needed that does encompass more lineages (we would argue in a phylogenetic context), and that disentangling the effects of geography and ecology will also be an important future endeavor. 

      As I understand from previous literature, the time-scale to population growth and estimates of effective population sizes considered in the present paper for the resident versus migratory clades seem to widely predate the times to speciation for the same lineages, which were reported in previous work of the same authors (Everson et al 2019) and others (Termignoni-Garcia et al 2022). This piece of information makes the calculation of species-specific population size changes difficult to interpret in the light of lineages' comparison. It is unclear what the authors consider to be lineage-specific in these estimates, as the clades were likely undergoing substantial admixture during the time predating full isolation.  

      We do recognize that timing estimates vary among studies. Differences among studies in important variables like markers, methods, generation time, and mutation or substitution rates create much of this uncertainty. Also, we are not confident in prior dating efforts in this group, largely because of gene flow and its effects on bringing estimates closer to the present. As we point out (line 485), differences among studies on these issues do not detract from the strengths here for within-study, among-lineage contrasts. In short, the timing could be off in an among-study context (and likely is with prior work, given gene flow), but relative performance of among-lineage N<sub>e</sub> differences is less susceptible to these factors. This was shown fairly well in Li & Durbin’s initial use of the method among human populations. Regarding substantial admixture, PSMC curves often unite at their origins with sister lineages (when they were the same lineage). A good example is with the two C. guttatus E & W curves in Fig. S3, which still have substantial gene flow today (they are subspecies and in contact), yet they show remarkably different N<sub>e</sub> curves through their history. It is not possible to mark a cutoff point for each lineage that represents the cessation of admixture with another lineage (e.g., Everson et al. 2019 showed substantial admixture between three full species in this group); that period can be very long (Price et al. 2008), varies among lineages, and will not be available for deeper lineage divergences in the phylogeny. We therefore chose to use all of the time intervals retrievable from the genomic data in each lineage, considering that this uniform treatment is the best approach for our among-lineage comparison. And note that we were careful to label these as “the lineages’ PSMC inception” (line 190).  

      Regarding the methodological difficulties in interpreting the impact of population structure on the estimates of effective population sizes with the PSMC approach, I would think that performing simulations to compare different scenarios of different degrees of structured populations would have helped substantially understand some of the outcomes.  

      The complexities of such modeling in a system like this are daunting. The different degrees of structuring among all of these lineages across just a single glacial-interglacial cycle would necessitate a lot of guesswork; projecting that back across 50-60 such cycles just in the Pleistocene would probably end up being fiction. Disentangling the effects of structure versus changes in N<sub>e</sub> in a system like this would probably not be possible with that approach and these data. As noted above and below, there was agreement among reviewers and the editor that simulations in this case are not warranted for revision. We have added the nature of the glacial-interglacial cycles and the PSMC sampling time segments to help readers understand this better (see above in response to R1, and lines 272-278).

      Additionally, I have struggled to understand if migratory behaviour in birds is considered to be acquired to relieve species competition, or as a consequence of expanded range (i.e., birds expand their range but their feeding ground is kept where speciation occurred as to exploit a ground with higher quality and abundance of seasonal local resources).  

      The origins of migration have been a struggle for researchers since the subject was taken up. But how the trait was acquired among these species does not really matter for our study. Here, migratory lineages possess different biogeographic+ecological attributes than their close relatives that are sedentary. Our focus is on the presence and absence of this life-history trait.

      The points raised above could be considered to improve the current version of the paper. 

      Thank you. We appreciate the opportunity to guide our revision using your comments.  

      Reviewer #3 (Public Review): 

      Summary: 

      This paper applies PSMC and genomic data to test interesting questions about how life history changes impact long-term population sizes. 

      Strengths: 

      This is a creative use of PSMC to test explicit a priori hypotheses about seasonal migration and N<sub>e</sub>. The PSMC analyses seem well done and the authors acknowledge much of the complexity of interpretation in the discussion.

      Weaknesses: 

      The authors use an average generation time for all taxa, but the citations imply generation time is known for at least some of them. Are there differences in generation time associated with migration? I am not a bird biologist, but quick googling suggests maybe this is the case (https://doi.org/10.1111/1365-2656.13983). I think it important the authors address this, as differences in generation time I believe should affect estimates of N<sub>e</sub> and growth.  

      Good point. The study cited by the reviewer encompasses a much higher degree of variation in body size and thus generation time. Differences in generation time in similarly sized close relatives, as in our study, should be small, and our approach has been to average those that are known. Unfortunately, generation times are not known for all of these species, but given their similarity in size we can have reasonable confidence in their being similar. We used data from the life-history research available (as cited) to obtain our average; there are not appropriate data for the residents, though. However, there is thought to be a generation time cost to seasonal migration in birds, and Bird et al. (2020) included this in their estimates to provide modeled values for all of the lineages we studied. We’re leery of using modeled values where good data for the nonmigrants in this group don’t exist (and the basis for quantifying this cost is tiny), but we recognize that this second approach is available and could leave some doubt in our results if not pursued. So we re-did everything with the modeled generation times of Bird et al. (2020). As expected, most of the differences are time-related. Importantly, our overall results are not different. We present them as Table S2 and have added the details on this to the Methods.
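      To illustrate why swapping generation times mostly shifts the time axis, here is a minimal sketch of the standard PSMC rescaling. It assumes the mutation rate is specified per generation and held fixed; under that assumption, generation time enters only the conversion to years. All numerical values and the function name are hypothetical placeholders, not values from the study.

```python
def rescale_psmc(theta0, scaled_times, scaled_sizes, mu_per_gen,
                 gen_time_years, bin_size=100):
    """Convert PSMC's scaled output to years and effective population size.

    Assumes theta0 = 4 * N0 * mu_per_gen * bin_size, the usual PSMC scaling.
    """
    # Reference effective size implied by theta0 and the per-generation rate.
    n0 = theta0 / (4.0 * mu_per_gen * bin_size)
    # Scaled time is in units of 2*N0 generations; multiply by g to get years.
    years = [2.0 * n0 * t * gen_time_years for t in scaled_times]
    # Scaled sizes are relative to N0; generation time does not appear here.
    ne = [n0 * lam for lam in scaled_sizes]
    return years, ne


# Hypothetical inputs chosen only to show the scaling behaviour.
theta0 = 0.02
t_k = [0.01, 0.1, 1.0]
lambda_k = [0.5, 2.0, 1.0]
mu = 4.6e-9  # placeholder per-generation mutation rate

years_a, ne_a = rescale_psmc(theta0, t_k, lambda_k, mu, gen_time_years=2.0)
years_b, ne_b = rescale_psmc(theta0, t_k, lambda_k, mu, gen_time_years=3.0)
```

      With these inputs the two runs return identical N<sub>e</sub> values and time axes that differ by the ratio of the generation times. If a per-year mutation rate were held fixed instead, the per-generation rate (and therefore N<sub>e</sub>) would also change.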

      The writing could be improved, both in the introduction for readers not familiar with the system and in the clarity and focus of the discussion.  

      We have added a phylogeny (new Fig. 1) to help readers better understand the system, and we’ve re-worked the Discussion to make it clearer what is clarified by our results and what remains unclear.  

      Recommendations for the authors:

      Reviewing Editor comment: 

      I note that discussion among the reviewers made clear that simulations are probably not the right answer given the complexity of the modeling required.  

      We appreciate this conclusion, with which we agree.  

      Reviewer #2 (Recommendations For The Authors): 

      Apologies for the delay with the review, which came at a very busy time. I hope you will find my comments helpful.

      Thanks. Your comments are helpful, and we fully understand how reviews (and our revisions!) have to wait until more pressing needs are addressed.

      I enjoyed reading the manuscript but I believe that the discussion sections could be heavily rewritten for better clarity. The discussion is sometimes redundant and lacks some flow/clarity. In a nutshell, I had the feeling that a bit of everything is thrown in the discussion but clear conclusions are not made.  

      Yes, the Discussion has been difficult to write, because more issues arose in the Results than we anticipated at the outset. We feel that discussing them is relevant, but we agree that much remains unclear. This coupling of paleodemographics with geography and ecology is a new area, which opens some important new (and relevant) areas to consider. So clarity is not possible in some areas. We’ve revised to point out where we do have clarity (e.g., in migrant lineages having different paleodemographic attributes than nonmigrants) and where only further study can provide clarity (e.g., in the roles of geography versus ecology). The journal format does not seem to have secondary subheaders, but we’ve used bold in one place to highlight ‘ecological mechanisms’ to offset that section, one of the more complex. We’ve also added a paragraph in the conclusions to clarify where we have clear takeaways and where uncertainties remain. 

      Reviewer #3 (Recommendations For The Authors): 

      The introduction should engage the reader with biology, not the use of demographic methods or genomics (both of which have been around for more than a decade). I would drop the first paragraph and considerably expand the second. What has previous research on ecology/behavior/genetics found regarding the demographic effects of seasonal migration?

      There are two important aspects to our study: 1) using paleodemographic methods to test hypotheses about adoption of a major life-history trait—an important biological question regardless of system, and so far (surprisingly) unaddressed; and 2) using this novel approach to study the effects of one such trait, seasonal migration. At these timescales, nothing exists on this subject, so there is really nothing to expand with. If there is relevant literature that we’ve missed, we’d be happy to add it.

      What is the missing bit of information or angle the current study addresses (other than just doing it larger and fancier with genomics)?  

      The effects of major life-history traits on paleodemographics has not been addressed before, to our knowledge. The whole context is new, so we’re not doing something “larger and fancier” with genomics. We are doing something that has not been done before: testing hypotheses about the effects of a major life-history trait on population sizes in evolutionary time. We’re not sure how this can be made clearer. To us this seems like a very engaging biological question with wide applicability. We hope that this study is just the first of many to come, in a diversity of biological systems.

      A figure showing the phylogenetic relationships of these taxa which are migratory would help the reader immensely. Although this is shown in Fig S3 I think it might be nice to have a map of the species and their ranges alongside a phylogeny as a main figure early on.  

      Thank you. This is a good suggestion. We can’t fit a phylogeny and all the distribution maps (Fig. S1) onto a page, but we can include a phylogeny as one of the main figures with nonmigrants highlighted. We’ve inserted this as a new Fig. 1. 

      If I understand correctly, the authors' arguments for why migratory species should show more growth hinge on large range size and geographic expansion. Yet they argue in the discussion that these forces are unlikely to be important (L226). I found the discussion on this confusing (e.g. L231 then says maybe it does matter). I think more clarity here would be helpful.

      Our argument and predictions are based both on geographic and ecological expansion. This was clearly stated as our third prediction “3) early population growth would be higher as seasonal migration opens novel ecological and geographic space…” We have gone back through and reiterated the coupling of these two factors. The line mentioned concludes the first paragraph in the section ‘Geography and evolutionary ecology,’ which focuses on the difficulty of decoupling these in this system. As the paragraph relates, geography alone does not seem to be driving our results (we do not argue that it is unimportant). 

      I also would have liked more time in the discussion addressing why variation in N<sub>e</sub> may be higher in migratory lineages.

      In addition to re-clarifying this in the Introduction, we have touched back on this now at line 221: “We attribute the higher variation in N<sub>e</sub> among migrants to be the result of the relative instability of northern biomes compared with tropical ones through glacial-interglacial cycles (e.g., Colinvaux et al., 2000; Pielou, 1991).”

      Minor comments: 

      L 62: Presumably PSMC is limited by the coalescent depth of the genealogy, which may be younger or older than population "origins" depending on the history of colonization, lineage splitting, gene flow, etc.  

      We were careful to phrase these as “the lineages’ PSMC inception” (line 190), and responded to this issue in more detail above in response to R2’s public review. 

      L 338: I think a few more details on PSMC would be helpful. Was no maskfile used?  

      We did not use a maskfile, choosing instead to generate data of decent coverage and aligning reads to a single closely related relative. 

      Did the consensus fasta include all species?  

      No, we used a single reference high-quality fasta of Catharus ustulatus, as reported (lines 434-37). We have added that “Identical treatment of all lineages in these respects should provide a strong foundation for a comparative study like this among close relatives.” 

      L 361: Fair to assume the authors used a weighted average of N<sub>e</sub> from the output, rather than just averaging the N<sub>e</sub> values from each time segment?  

      No – we used all the values of N<sub>e</sub> produced by PSMC output. The PSMC method uses nonoverlapping portions of the genome in its analyses (which we’ve added to make that clear), and portions in juxtaposition will often provide data for very different periods in the time segments. Further, time segments are uneven within and among taxa, so it is not clear how a uniform and comparable weighting scheme could be implemented. We consider a uniform approach to be of primary importance, including for future comparisons among studies. 
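      The distinction the reviewer raises can be sketched as follows: an unweighted mean treats every time segment equally, whereas a duration-weighted mean scales each segment's N<sub>e</sub> by the span of time it covers. The numbers and function names below are hypothetical and serve only to show how the two summaries can diverge when segments are uneven; this is not the analysis code used in the study.

```python
def unweighted_mean(ne_values):
    """Plain arithmetic mean over all time segments."""
    return sum(ne_values) / len(ne_values)


def duration_weighted_mean(ne_values, segment_bounds):
    """Mean of Ne weighted by the length of each time segment.

    segment_bounds holds len(ne_values) + 1 boundaries, oldest last.
    """
    widths = [end - start
              for start, end in zip(segment_bounds[:-1], segment_bounds[1:])]
    weighted_sum = sum(n * w for n, w in zip(ne_values, widths))
    return weighted_sum / sum(widths)


# Hypothetical Ne values for three uneven time segments (in years).
ne = [10000.0, 20000.0, 40000.0]
bounds = [0.0, 1000.0, 3000.0, 10000.0]

plain = unweighted_mean(ne)
weighted = duration_weighted_mean(ne, bounds)
```

      Because segment boundaries differ within and among taxa, any weighting scheme would need an extra decision about how to make the weights comparable across lineages, which the unweighted summary avoids.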

      L 383 "delta" typo

      Thank you for catching this.

      L 93: I'd be tempted to present the questions (how does seasonal migration affect population size trajectory, means, and variation) and rationale before presenting the hypotheses. I found myself reading the hypotheses and wondering "why?"  

      We’ve tried this change in the revision. It makes the hypotheses a little harder to pull out (they are no longer numbered in a short sequence), but it is shorter and solves this concern.  

      L 337 read depth is usually expressed as X (e.g. "23X") rather than bp.

      Changed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study further validates DNAH12 as a causative gene for asthenoteratozoospermia and male infertility in humans and mice. The data supporting the notion that DNAH12 is required for proper axonemal development are generally convincing, although more experiments would solidify the conclusions. This work will interest reproductive biologists working on spermatogenesis and sperm biology, as well as andrologists working on male fertility.

      We thank the editor and the two reviewers for their time and careful evaluation of our manuscript. We sincerely appreciate their encouraging feedback and insightful guidance on improving our study. In the revised manuscript, we have performed additional experiments and provided quantitative data regarding the reviewers' comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Even though this is not the first report that the mutation in the DNAH12 gene causes asthenoteratozoospermia, the current study explores the sperm phenotype in-depth. The authors show experimentally that the said mutation disrupts the proper axonemal arrangement and recruitment of DNALI1 and DNAH1 - proteins of inner dynein arms. Based on these results, the authors propose a functional model of DNAH12 in proper axonemal development. Lastly, the authors demonstrate that the male infertility caused by the studied mutation can be rescued by ICSI treatment at least in the mouse. This study furthers our understanding of male infertility caused by a mutation of axonemal protein DNAH12, and how this type of infertility can be overcome using assisted reproductive therapy.

      Strengths:

      This is an in-depth functional study, employing multiple, complementary methodologies to support the proposed working model.

      Thank you for your recognition of the strength of this study. Your positive feedback motivates us to continue refining our research and methodological rigor in future studies.

      Weaknesses:

      The study strength could be increased by including more controls such as peptide blocking of the in-house raised mouse and rat DNAH12 antibodies, and mass spectrometry of control IP with beads/IgG only to exclude non-specific binding. Objective quantifications of immunofluorescence images and WB seem to be missing. At least three technical replicates of western blotting of sperm and testis extracts could have been performed to demonstrate that the decrease of the signal intensity between WT and mutant was not caused by a methodological artifact.

      Thank you for your comments. To design the antibodies, we analyzed the sequence features of the DNAH12 protein and selected amino acids 1-200 as the antigen because of its favorable properties: (1) high immunogenicity; (2) high hydrophilicity; (3) good surface leakage groups; and (4) sequence homology analysis to avoid nonspecific recognition of other proteins. The two different anti-DNAH12 antibodies were developed with the help of Dia-An Biotech in 2022. We tried to obtain the polypeptide fragments of the target protein for peptide blocking, but the material had been discarded after the service. Nevertheless, western blotting detected the target DNAH12 band, which was absent in the knockout mice, and the DNAH12 immunofluorescence signals were strong in controls but absent in the knockout mice. In addition, we tested the in-house raised rabbit antibody and found it suitable for IP experiments: it immunoprecipitated the DNAH12 band in Dnah12<sup>+/+</sup> mice but not in Dnah12<sup>-/-</sup> mice. Collectively, these data support the specificity of the DNAH12 antibodies. For the IP assay, we added an IgG group to the IP-mass spectrometry to exclude non-specific binding; the experimental design is described in Figure 6B. The raw data were deposited in the iProX partner repository (accession number: PXD051681), and we have coordinated with the repository manager to make the data publicly accessible (https://www.iprox.cn/page/subproject.html?id=IPX0008674001).

      In addition, we repeated the western blotting of sperm and testis extracts at least three times and added objective quantifications of the immunofluorescence signals and WB images. The blot quantifications are shown in the figures to help readers interpret these results.

      Reviewer #2 (Public Review):

      Summary:

      The authors first conducted whole exome sequencing for infertile male patients and families where they co-segregated the biallelic mutations in the Dynein Axonemal Heavy Chain 12 (DNAH12) gene.

      Sperm from patients with biallelic DNAH12 mutations exhibited a wide range of morphological abnormalities in both tails and heads, reminiscent of a prevalent cause of male infertility, asthenoteratozoospermia. To deepen the mechanistic understanding of DNAH12 in axonemal assembly, the authors generated two distinct DNAH12 knockout mouse lines via CRISPR/Cas9, both of which showed more severe phenotypes than observed in patients. Ultrastructural observations and biochemical studies revealed the requirement of DNAH12 in recruiting other axonemal proteins and that the lack of DNAH12 leads to the aberrant stretching in the manchette structure as early as stage XI-XII. At last, the authors proposed intracytoplasmic sperm injection as a potential measure to rescue patients with DNAH12 mutations, where the knockout sperm culminated in the blastocyst formation with a comparable ratio to that in WT.

      Strengths:

      The authors convincingly showed the importance of DNAH12 in assembling cilia and flagella in both human and mouse sperm. This study is not a mere enumeration of the phenotypes, but a strong substantiation of DNAH12's essentiality in spermiogenesis, especially in axonemal assembly.

      The analyses conducted include basic sperm characterizations (concentration, motility), detailed morphological observations in both testes and sperm (electron microscopy, immunostaining, histology), and biochemical studies (co-immunoprecipitation, mass-spec, computational prediction). Molecular characterizations employing knockout animals and recombinant proteins beautifully proved the interactions with other axonemal proteins.

      Many proteins participate in properly organizing flagella, but the exact understanding of the coordination is still far from conclusive. The present study gives the starting point to untangle the direct relationships and order of manifestation of those players underpinning spermatogenesis. Furthermore, comparing flagella and trachea provides a unique perspective that attracts evolutional perspectives.

      Thank you for your thoughtful and positive feedback. We are delighted that you found our study to be a strong substantiation of DNAH12's essential role in spermiogenesis, particularly in axonemal assembly. We believe that this study represents a meaningful step toward unraveling the intricate coordination of axonemal proteins during spermatogenesis, and your comments further inspire us to continue exploring these complex mechanisms in future work. Thank you once again for your valuable insights and summary of this work.

      Weaknesses:

      Seemingly minor, but the discrepancies found in patients and genetically modified animals were not fully explained. For example, both knockout mice vastly reduced the count of sperm in the epididymis and the motility, while phenotypes in patients were rather milder. Addressing the differences in the roles that the orthologs play in spermatogenesis would deepen the comprehensive understanding of axonemal assembly.

      This is an interesting question. Although humans and mice share male infertility phenotypes when dynein proteins essential for sperm flagellar development are deficient, the phenotypes differ in some respects. For instance, deficiency in DNAH17 (Clin Genet. 2021. PMID: 33070343) or DNAH8 (Am J Hum Genet. 2020. PMID: 32619401; PMCID: PMC7413861), two other members of the Dynein Axonemal Heavy Chain family, has also been reported to cause a more severe phenotype in mice than in human patients carrying bi-allelic DNAH17 or DNAH8 loss-of-function mutations. In knockout mice, sperm counts are lower and the proportion of sperm with abnormal morphology is higher, whereas the phenotypes in human patients tend to be milder. These observations suggest that orthologs may influence spermatogenesis to slightly different extents in humans and mice. We plan to investigate the mechanisms underlying these discrepancies in future studies, which will provide deeper insights into axonemal assembly and the evolutionary aspects of spermatogenesis. Thank you again for bringing up this important issue.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This reviewer is impressed by the study's depth and the extent of the methodology used in the study. The study is well-designed, and the results are very interesting. The reviewer's enthusiasm was reduced by the lack of some controls (provided that the reviewer did not miss them). Further are point-to-point suggestions that this reviewer believes will increase the merit of the present study.

      Title:

      (1) Why a "special" dynein? What makes it special when compared to other dyneins? I suggest removing the word special.

      Through phylogenetic and protein domain analyses of the DNAH family, we found that DNAH12 is the shortest member and the only one that lacks a typical microtubule-binding domain (MTBD), which is why we described it as a “special” dynein. We have considered your suggestion and removed the word from the title.

      Abstract:

      (2) L23: same as above, why special?

      We identified DNAH12 as the shortest member of the DNAH family and the only one lacking the typical microtubule-binding domain (MTBD). This distinct characteristic prompted us to describe it as a 'special' dynein in the abstract.

      (3) L37: the reviewer did not find a figure (neither main nor supplementary) that would demonstrate the proper organization of microtubules in cilia. Figure S11 only shows the presence of cilia in DNAH12-/- mouse. A TEM image of cilia is required to confirm or reject the claim that DNAH12 does not play a crucial role in proper microtubule organization in cilia.

      We have now added TEM images of cilia in wild-type and Dnah12<sup>-/-</sup> mice. The ultrastructures of the ciliary axonemes were comparable in the wild-type and Dnah12<sup>-/-</sup> groups, suggesting that DNAH12 may not play a crucial role in proper microtubule organization in cilia. The results have now been added to Supplemental Figure 11F.

      (4) L122-6: Did the authors also confirm these structures by cryo-EM? If not, this needs to be pointed out as a shortcoming in the discussion, that the structures and interactions are predicted in silico only.

      Thank you for your comment. Because of limited resources, we did not perform cryo-EM to confirm these structures. We will pursue the structural details at atomic resolution in a future study. We understand this point and have now addressed it as a shortcoming in the discussion.

      (5) L134: Be more specific about what characteristics of DNAH12 were analyzed.

      Thank you for your comment. We have now updated this in the Methods. The characteristics of DNAH12 that were analyzed include its regional immunogenicity, hydrophilicity, surface leakage groups, and sequence homology.

      (6) L137: Be more specific about how the antibodies were validated. Were the antibodies validated for both immunofluorescence and western blotting? I suggest doing peptide blocking of the antibody, for instance for ICC, preincubation of ab with immunizing peptide followed by primary ab incubation with studied cells/tissues.

      Thank you for your comments and suggestions. We validated the antibodies for both immunofluorescence and western blotting to ensure their effectiveness in our experiments. The two different anti-DNAH12 antibodies were developed with the help of Dia-An Biotech in 2022. We attempted to obtain the polypeptide fragments of the target protein for peptide blocking, but the material had been disposed of after the service. Nevertheless, western blotting detected the target DNAH12 band with a strong signal, and the band was absent in the knockout mice; likewise, the DNAH12 immunofluorescence signals were strong in controls but absent in the knockout mice. In addition, the IP experiment showed that the in-house raised rabbit antibody immunoprecipitated the DNAH12 band in Dnah12<sup>+/+</sup> mice but not in Dnah12<sup>-/-</sup> mice. Collectively, these data support the specificity of the DNAH12 antibodies. We appreciate your suggestion and will request the peptide material when we develop new antibodies.

      (7) L142: This reviewer is unfamiliar with using TRIzol for sperm protein extraction. Is there a specific reason for not using PAGE loading buffer for human sperm protein extraction?

      Thanks for your suggestions. TRIzol reagent can be used for small samples (5×10<sup>6</sup> cells) as well as large samples (>10<sup>7</sup> cells), and it allows RNA and proteins to be extracted at the same time. Our lab adopted this method in previous work (Hum Reprod Open. 2023; PMID: 37325547; PMCID: PMC10266965.). It is very useful for processing small amounts of valuable samples. SDS sample buffer (PAGE loading buffer) was added to the human sperm protein extracts before SDS-PAGE separation. We have added this detail to the Methods and apologize for the confusion.

      (8) L144: Were these the final concentrations of the SDS loading buffer? 1 × Laemmli buffer contains 62.5 mM TRIS, 2% (w/w) SDS, 10 % (w/v) glycerol, and 5% 2-mercaptoethanol. Please, amend accordingly.

      Thanks for your suggestions. We apologize for the incorrect labelling of the concentrations (the previous values were for the 3× SDS loading buffer). We have now amended the SDS loading buffer to 1 × Laemmli buffer as suggested.
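      The relationship between a concentrated stock and the 1× working concentrations can be checked with simple dilution arithmetic. The sketch below uses the 1× Laemmli composition quoted by the reviewer; the function names and the 20 µL sample volume are hypothetical, and the exact recipe of the buffer used in the study is not assumed.

```python
# 1x Laemmli composition as quoted in the reviewer's comment.
LAEMMLI_1X = {
    "Tris (mM)": 62.5,
    "SDS (%)": 2.0,
    "glycerol (%)": 10.0,
    "2-mercaptoethanol (%)": 5.0,
}


def stock_composition(fold):
    """Concentrations in a stock that is `fold` times the 1x buffer."""
    return {name: value * fold for name, value in LAEMMLI_1X.items()}


def buffer_volume_for_1x(sample_volume_ul, stock_fold):
    """Volume of stock to add so the final mixture is 1x.

    Solves stock_fold * v / (v + sample_volume) = 1 for v.
    """
    return sample_volume_ul / (stock_fold - 1.0)


three_x = stock_composition(3.0)
# Hypothetical 20 uL protein sample mixed with a 3x stock.
volume_to_add = buffer_volume_for_1x(20.0, 3.0)
```

      For a 3× stock this works out to one volume of buffer for every two volumes of sample.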

      (9) L151: Table S2 contains other homemade antibodies than DNAH12. Please, include references to the studies where the generation and validation of these antibodies is described.

      Thank you for your suggestions. We have developed a DNAH1 antibody for use in Western blot assays, with its generation and validation detailed in Frontiers in Endocrinology (Lausanne), 2021 (PMID: 34867808; PMCID: PMC8635859). Additionally, we have produced a DNAH17 antibody for both immunofluorescence (IF) and Western blot, as described in Journal of Experimental Medicine, 2020 (PMID: 31658987; PMCID: PMC7041708). These references have now been included.

      (10) L167: Please, spell out ICR at its first appearance.

      Done as suggested, thank you. The full name of ICR is Institute of Cancer Research.

      (11) L169: This reviewer is confused. It seems that the mouse encodes DNAH12 on exons 5 and 18 simultaneously. Each mouse model has only one exon targeted for a knockout. Would not this mean that the expression of DNAH12 in both models is not completely knocked down? Please, give more background in this paragraph for those less familiar with CRISPR/Cas9.

      Thank you for your insightful comment. We appreciate your attention to detail. To clarify, while the mouse model does indeed encode DNAH12 on exons 5 and 18 simultaneously, we specifically targeted the key exon 5 or exon 18 in each model to achieve different knockout strategies. This approach allows us to assess the functional implications of any remaining DNAH12 expression in both models. We checked DNAH12 expression in both models and found no detectable DNAH12 protein in either, indicating that DNAH12 was completely knocked out in both. Additionally, we will revise the manuscript to include further details on the CRISPR/Cas9 methodology, ensuring accessibility for readers less familiar with this technique. Thank you again for your valuable feedback, which we believe will greatly enhance our manuscript.

      (12) L201: 50 % PBS? As in 0.5 x concentrated PBS? Please, rewrite for clarity.

      The term "50% PBS" refers to a 1:1 dilution of phosphate-buffered saline (PBS) with an appropriate diluent, resulting in a final concentration of 0.5x PBS. We will revise the text to explicitly clarify this, ensuring it is clear to all readers. Thank you for highlighting this point.

      (13) L224: Please, state what beads those were (magnetic/agarose, conjugated to protein A/G...) Include catalog # and manufacturer.

      Thank you for your suggestion. We have updated the manuscript to include this information. The beads used were Protein A/G Magnetic Beads (Catalog #B23202, Bimake, Texas, USA).

      (14) L227: What was the reason for adding a proteasomal inhibitor? What concentration was used? Please, add this information to the text.

      We added MG132 in the cell immunoprecipitation (IP) experiments to inhibit proteasomal activity, thereby preventing degradation of the target protein. This helps maintain the stability of the target protein during the experiment (Sci Adv. 2022. PMID: 35020426; PMCID: PMC8754306.), enhancing its detectability in subsequent analyses. MG132 was used at 5 μM. We have added this information to the revised manuscript.

      (15) L233: in vivo IP of mouse testis lysate? This does not make sense. I suggest removing "in vivo".

      Thank you for your careful review and comments on our manuscript. We have modified as suggested.

      (16) L317: Supplemental Figure 6 precedes Supplemental Figure 5 in the text, which is neither logical nor orderly.

      Thank you for your suggestion. Since the N-terminal DNAH12 antibody is already described in the Methods section (L317), we propose removing Supplemental Figure 6 from the content to improve the logical flow and maintain an orderly presentation.

      (17) L345 and elsewhere: how did the authors quantify the decrement of the signal? This needs to be measured objectively.

      Thank you for your valuable suggestion. We quantified the signal intensity using Fiji (Nat Methods. 2012. PMID: 22743772; PMCID: PMC3855844), which allows for precise analysis of pixel intensity. The results are presented in the figures to effectively illustrate the decrement in signal intensity. We appreciate your suggestion, and we have provided a description of the method in our methodology section.
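      Once band or pixel intensities have been measured, a common way to express a decrement is to normalize each target signal to a loading control and then scale to the control group mean. The sketch below illustrates that arithmetic only; the intensity values, the choice of loading control, and the function names are hypothetical and are not taken from the authors' Fiji workflow.

```python
def normalize_to_loading(target, loading_control):
    """Divide each target intensity by its lane's loading-control intensity."""
    return [t / c for t, c in zip(target, loading_control)]


def fold_of_control(sample_ratios, control_ratios):
    """Express normalized signals relative to the mean of the control group."""
    control_mean = sum(control_ratios) / len(control_ratios)
    return [r / control_mean for r in sample_ratios]


# Hypothetical intensities from three replicate blots (arbitrary units).
wt_target, wt_loading = [100.0, 110.0, 90.0], [100.0, 110.0, 90.0]
ko_target, ko_loading = [20.0, 22.0, 18.0], [100.0, 110.0, 90.0]

wt_ratio = normalize_to_loading(wt_target, wt_loading)
ko_ratio = normalize_to_loading(ko_target, ko_loading)
ko_fold = fold_of_control(ko_ratio, wt_ratio)
```

      Reporting the per-replicate fold values, rather than a single pooled number, keeps the variation between replicates visible in the figure.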

      (18) L371: I recommend: ...and elongated spermatids; the abnormal...

      Done as suggested. Thank you.

      (19) L412-4: Cilia in both Dnah12<sup>mut/mut</sup> and Dnah12<sup>-/-</sup> are developed, but are they motile or immotile? This needs to be investigated. Is the DNAH12 in cilia truncated while still fulfilling its function?

      Thanks for your comment. We checked ciliary motility using an inverted microscope and observed no significant difference between the knockout group and the control group. These results indicate that ciliary motility was not affected by DNAH12 deficiency. The N-terminal DNAH12 antibody was developed to test whether a truncated protein is present in mouse tissues, and we did not detect DNAH12 signals by immunofluorescence on trachea sections of the Dnah12<sup>-/-</sup> mice. These results indicate that DNAH12 may exert little influence on cilia compared with its important function in flagella.

      (20) L414-6: The results do not support this claim as the authors do not show that cilia are motile.

      Thanks for your comment. Supplemental videos 3-4, recorded from live trachea of Dnah12<sup>+/+</sup> and Dnah12<sup>-/-</sup> mice, have been uploaded to support this conclusion.

      (21) L421-3: Did the authors perform a negative test, where they let the testis lysate interact with beads/IgG only and performed the MS to identify non-specific binding? This is a crucial specificity test for this approach.

      We have performed the negative test. In the IP assay, we added an IgG group to the IP-mass spectrometry to exclude non-specific binding; the experimental design is described in Figure 6B. The raw data were deposited in the iProX partner repository (PXD051681), and we have asked the repository manager to change the status to public so that the data will be visible to readers.

      (22) L462: same as #18 the authors need to show that cilia are also motile. The mere presence of cilia in DNAH12-/- as shown in Fig S11C&D is not sufficient to conclude that the mice do not manifest PCD symptoms.

      Thanks for your comment. We did not observe obvious differences between the cilia of Dnah12<sup>+/+</sup> and Dnah12<sup>-/-</sup> mice. Supplemental videos 3-4, recorded from live trachea of Dnah12<sup>+/+</sup> and Dnah12<sup>-/-</sup> mice, have been uploaded to show the ciliary motility.

      (23) L529: MTBD region instead of domain, as "domain" is already part of the abbreviation.

      Done as suggested

      (24) L875: Sperm is both the singular and plural form. Spermatozoon vs spermatozoa can be used where the distinction between singular and plural needs to be made.

      Thanks for your suggestion. We have checked and changed this usage.

      (25) Figure 3H: Is there a specific reason why P11 is not shown?

      Because only a limited number of smear slides from P11 were available, the P11 samples were not previously stained with the DNAH17 antibody. We have now performed the experiment, which showed that DNAH17 expression was not affected in patient P11. We have added this result to Figure 3H.

      (26) Figure 8H: The authors in their MS do not describe what is happening to N-DRC proteins, yet they suggest in their model that it's unaffected in the mutant mouse/human. Please, address this in the MS and clearly state in the model that N-DRC needs further exploration in future studies.

      Thanks for your suggestion. We checked the MS data but did not observe enrichment of nexin-dynein regulatory complex (N-DRC) proteins; only one known N-DRC protein, DRC1, was present, with a single unique peptide. Instead, enrichment of inner dynein arm proteins and radial spoke proteins was observed. However, we cannot determine whether the N-DRC structures are affected. We have stated this in the discussion and will pursue it with high-resolution technology such as cryo-EM in the future.

      (27) Figure 5F: Is it possible to choose a different Dnah12<sup>-/-</sup> spermatozoon to see a reduced level of DNALI1 so that it corresponds with the WB detection in Fig 5B?

      Thanks for your suggestion, we have chosen a Dnah12<sup>-/-</sup> spermatozoon with faint remnants of the DNALI1 signal as the representative picture.

      (28) Figure S2 and elsewhere: How were the authors able to resolve and calibrate a 356 kDa protein using SDS-PAGE? Agarose protein electrophoresis is more suitable for resolving high molecular weight proteins. Most protein standards go no higher than 250 kDa.

      We have found that high molecular weight proteins (such as the 356 kDa DNAH12) can be resolved on 4-12% gradient polyacrylamide gels by using appropriate voltages and longer run times during electrophoresis. The DNAH12 protein was calibrated using a HiMark™ Pre-Stained High Molecular Weight Protein Standard (30-460 kDa). We have now updated the blot images to show the size of the DNAH12 protein (Fig S6B). The target band is clearly visible between 268 kDa and 460 kDa, which makes it easy to identify the band detected by the DNAH12 antibody elsewhere. Thanks for your suggestion.

      (29) Figure S5: similar to #24: Why P10 and P11 are not shown?

      Because only a limited number of smear slides from P10 and P11 were available, we did not previously stain them with the ODF2 antibody. We have now performed these experiments, which showed that ODF2 expression was not affected in patients P10 or P11. We have added this result to Figure S5.

      (30) Figure S6B: The specificity of the anti-DNAH12 antibody against mouse DNAH12 seems to be questionable since the authors detect multiple bands on WB. I recommend doing peptide blocking to show that these are non-specific binding as opposed to off-target binding.

      Thank you for your comments. To study this in depth, we analyzed the sequence features of the DNAH12 protein, and amino acids 1-200 of DNAH12 were selected as the ideal antigen considering their good performance (1. high immunogenicity; 2. high hydrophilicity; 3. good surface leakage groups; 4. sequence homology analysis to avoid nonspecific recognition of other proteins). Two different anti-DNAH12 antibodies were developed with the help of the Dia-An Biotech company in 2022. We attempted to acquire the polypeptide fragments of the target protein for peptide blocking, but the material was disposed of after the service. Fortunately, the target DNAH12 band showed a strong signal in western blotting and was not detected in the knockout mice; likewise, the DNAH12 immunofluorescence signals were strong but absent in the knockout mice. In addition, we confirmed that the in-house rabbit antibody was suitable for IP experiments: it immunoprecipitated the DNAH12 band in Dnah12<sup>+/+</sup> mice but not in Dnah12<sup>-/-</sup> mice. Collectively, these data support the specificity of the raised DNAH12 antibodies. We appreciate your suggestion and will request the peptide material if we develop new antibodies.

      Reviewer #2 (Recommendations For The Authors):

      Recruitment of DNAH1 and DNALI1 to the flagella is dependent on DNAH12 expression, according to the data. What would be the mechanism that locates DNAH12 which lacks MTBD to the flagella?

      Thank you for your insightful question. We are currently investigating the mechanisms that facilitate the loading of DNAH12 to the flagella. Based on existing data, we hypothesize that CCDC39 and/or CCDC40 may play a critical role in the recruitment of DNAH12 to sperm flagella during spermiogenesis (Nat Genet. 2011, PMID: 21131972; PMCID: PMC3509786; Nat Genet. 2011, PMID: 21131974; PMCID: PMC3132183). Furthermore, a structural study by Walton et al. showed that DNAH12 associates with CCDC39/CCDC40 proteins (Nature. 2023, PMID: 37258679; PMCID: PMC10266980). These findings suggest that CCDC39 and/or CCDC40 may play a role in facilitating the localization of DNAH12 to the flagella. Additional studies are needed to identify other potential factors involved in this process and to further elucidate the mechanisms underlying this complex biological phenomenon.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Regarding the manuscript's clarity, the sentence on page 5, "We also stained VTA sections for Tyrosine hydroxylase (TH) to estimate the rate of ChR2 colocalization with DA neurons," reads awkwardly. Removing the word "rate" could improve clarity.

      We have made the recommended clarifying edit (page 5, lines 30-31).

      Additionally, the anatomical data and findings are largely non-quantitative in nature. However, solid microscopy images are presented to support each claim. Additional quantification would strengthen the paper, specifically the quantification of projection density for each population and the proportion of each subpopulation that projects to their regions of interest.

      To rigorously quantify the projection density of each subpopulation would require a level of exhaustivity our study was not designed for. This is because during microscopy we focused efforts on imaging regions containing dense signals but did not exhaustively image regions receiving apparently weak or no input. While we considered including a semi-quantitative table of projection density, based on the data available we could not discriminate with confidence between, e.g., regions recipient of minimal input versus no input from VTA populations. Thus, while we stand by our descriptive statements we do not expand on those further.

      The authors should consider discussing the possibility that subpopulations of these cells could still be true interneurons especially if cells were looked at the single neuron level of resolution.

      We agree that some of the VTA populations we studied could include subpopulations that are bona fide interneurons. The identification of alternate markers or combinations of markers, or use of single-cell imaging approaches may indeed support this possibility in future. This is discussed in the context of currently available evidence on page 5 lines 32-34, page 11 lines 2-4, page 12 lines 2-11, and page 12 lines 15-16.

      Overall, the paper is well-written and important for the field and beyond.

      Thank you!

      Reviewer #2:

      Weaknesses:

      While the authors use several Cre driver lines to identify GABAergic projection neurons, they then use wild-type mice to show that projection neurons synapse onto neighboring cells within the VTA. This does not seem to lend evidence to the idea that previously described "interneurons" are projection neurons that collateralize within the VTA.

      We think the use of WT mice is a strength because it allows us to measure both GABA and non-GABA synapses made by VTA projections onto the same cells within VTA. However, we have also done this experiment targeting NAc-projecting VTA VGAT-Cre neurons, and VP-projecting VTA MOR-Cre neurons. Consistent with the WT dataset, we find that these defined projection neurons also make intra-VTA synapses. These data are now included as Figure 7.

      More broadly, our review of the literature finds very little evidence to support the notion of a VTA interneuron as we define it: VTA neurons that make only local connections. But the absence of evidence need not imply evidence of absence, thus we do not claim that all VTA neurons previously presumed to be interneurons must be projection neurons. We do express confidence in our findings that VTA projection neurons (that include GABA-releasing neurons) make local synapses in VTA. We argue that in the absence of compelling positive evidence for the existence of VTA interneurons, such as a selective marker, we (the field) should not presume their existence.

      Other suggestions:

      (1) While the authors present evidence that some projection neurons also synapse locally, there is no quantification as to the proportion of each neuronal subtype that collateralizes within the VTA. This would be a useful analysis.

      We agree this would be useful information. But our experiments were not designed to answer this question. Indeed, we have not conceived of a feasible method to discriminate between collateralizing and non-collateralizing VTA projection neurons at the single-cell level, thus we do not know how we would calculate such proportions.

      (2) There is significant interest in the molecular heterogeneity and spatial topography of the VTA. Additional analyses of the spatial topography of labeled projectors would be useful. For example, knowing if Pvalb+ projection neurons are distributed throughout the VTA or located along the midline would be a useful analysis.

      Prior studies and public databases (e.g., Allen brain atlas, GENSAT) allow one to visualize the location of VTA neurons positive for Pvalb and the other markers we investigated (Olson & Nestler, 2007). However, these label the entire population of neurons and thereby include those that project to any of the various projection targets. There are also studies that have used retrograde labeling approaches to map the distribution of labeled VTA cells projecting to one or another target (Beier et al., 2015; Lammel et al., 2008; Margolis et al., 2006). For example, neurons projecting to LHb (a major target of Pvalb+ VTA neurons) were found to be enriched in medial VTA (Root et al., 2014). From this evidence we might infer that Pvalb+ VTA neurons that project to LHb are likely to be medially biased. Future studies may more carefully map the intersection of specific projection targets for each VTA subpopulation.

      Reviewer #3 (Recommendations For The Authors):

      Weaknesses:

      This study has a few modest shortcomings, of which the first is likely addressable with the authors' existing data, while the latter items will likely need to be deferred to future studies:

      (1) Some key anatomical details are difficult to discern from the images shown. In Figure 1, the low-magnification images of the VTA in the first column, while essential for seeing what overall section is being shown, are not of sufficient resolution to distinguish soma from processes. A supplemental figure with higher-resolution images could be helpful.

      We uploaded a higher resolution file for figure 1.

      Also, where are the insets shown in the second column obtained from? There is not a corresponding marked region on the low-magnification images. Is this an oversight, or are these insets obtained from other sections that are not shown?

      This was an oversight; we added the corresponding marked region to the low-magnification images.

      Lastly, there is a supplemental figure showing the NAc injection sites corresponding to Figure 5, but not one showing VP or PFC injection sites in Figure 6. Why not?

      We added a figure with histology examples for the VP and the PFC injection sites as done for Figure 5, included as Supplemental Figure 3.

      (2) Because multiple ChR2 neurons are activated in the optogenetic experiments, it is not clear how common is it for any specific projection neuron to make local connections. Are the observed synaptic effects driven by just a few neurons making extensive local collateralizations (while other projection neurons do not), or do most VTA projection neurons have local collaterals? I realize this is a complex question, that may not have an easy answer.

      This is a great question but, indeed, we don’t know the answer. As mentioned in response to Reviewer #2, we are not convinced there is a currently feasible way to discriminate between collateralizing and non-collateralizing cells at the single cell level.

      (3) There is something of a conceptual disconnect between the early and later portions of this paper. Whereas Figures 1-4 examine forebrain projections of genetic subtypes of VTA neurons, the optogenetic studies do not address genetic subtypes at all. I do realize that is outside of the scope of the author's intent, but it does give the impression of somewhat different (but related) studies being stitched together. For example, the MOR-expressing neurons seem to project strongly to the VP, but it is not addressed whether these are also the ones making local projections. Also, after showing that PV neurons project to the LHb, the opto experiments do not examine the LHb projection target at all.

      This too was raised by Reviewer #2. While addressing this question for all the populations we investigated feels redundant, we now include optogenetic data showing that NAc-projecting VTA VGAT-Cre and VP-projecting VTA MOR-Cre neurons also make local collaterals (Figure 7). We think this allows us to connect the two approaches to a greater degree. Based on our findings using a dual virus approach to express Syn:Ruby in each population of VTA projection neuron, we think it very likely that we’d continue to find similar results using optogenetics-assisted slice electrophysiology for each population.

      Other suggestions:

      (1) I appreciated the extensive and high-quality anatomical figures shown in Figures 2-4. However, the layout was sometimes left-to-right, and sometimes right-to-left, which felt distracting. At some point, the text refers to "Fig. 3KJ", i.e. with the letters being in backward alphabetical order, and Figures 3I and 3L do not appear mentioned anywhere in the main text, leading me to wonder if that text was intended to read "Fig. 3I-L".

      Thank you for noting this. We have harmonized the layout of Figures 2-4 and adjusted the in-text Figure call-outs.

      Also, the inset in Figure 3J appears to show local collaterals of NTS neurons in the VTA, since there is no soma in that inset. This is interesting, and worth reporting, but is not explained in either the main text or Figure legend.

      We added a more complete description in the Results section (page 6, lines 25-30).

      (2) Perhaps I missed it, but I could not find any mention of the intensity of the LED light delivered during the optogenetic experiments. While acknowledging that this can be variable, do the authors have at least a rough range?

      We have added this information to the methods, page 17 line 8.

      Editor's Note:

      Should you choose to revise your manuscript, please double check that you have fully reported all statistics including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals.

      We confirm that we have fully reported all statistics including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals.

      Note to Editor and Readers

      While reanalyzing our data for resubmission, we discovered that some of the short-latency optogenetic evoked postsynaptic currents (oPSCs) we detected were erroneously categorized. Specifically, some VTA cells that showed large outward currents (oIPSCs) when held at 0 mV also had small inward currents when held at -60 mV. These small inward currents were initially categorized as oEPSCs, suggesting these VTA cells received input from populations of VTA projection neurons that released GABA and/or glutamate. However, the kinetics of these small inward currents were slow and aligned with the within-cell kinetics of the oIPSCs, indicating that these were very likely mediated by GABA<SUB>A</SUB> receptors. In one case the opposite was apparent, with a small PSC initially miscategorized as an oIPSC. These miscategorized oEPSCs and oIPSC were presumably detected because our holding potentials were not precisely identical to the reversal potentials for GABA<SUB>A</SUB> and AMPA receptors, respectively. For this reason, we removed these 14 oEPSCs and 1 oIPSC from our analyses in the revised version. The revised dataset suggests that VTA glutamate projection neurons may be less likely to collateralize widely within VTA compared to GABA projection neurons. Importantly, this correction does not affect any of our conclusions.

      Citations:

      Beier, K. T., Steinberg, E. E., DeLoach, K. E., Xie, S., Miyamichi, K., Schwarz, L., Gao, X. J., Kremer, E. J., Malenka, R. C., & Luo, L. (2015). Circuit Architecture of VTA Dopamine Neurons Revealed by Systematic Input-Output Mapping. Cell, 162(3), 622-634. https://doi.org/10.1016/j.cell.2015.07.015

      Lammel, S., Hetzel, A., Hackel, O., Jones, I., Liss, B., & Roeper, J. (2008). Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system. Neuron, 57(5), 760-773. https://doi.org/10.1016/j.neuron.2008.01.022

      Margolis, E. B., Lock, H., Chefer, V. I., Shippenberg, T. S., Hjelmstad, G. O., & Fields, H. L. (2006). Kappa opioids selectively control dopaminergic neurons projecting to the prefrontal cortex. Proc Natl Acad Sci U S A, 103(8), 2938-2942. https://doi.org/10.1073/pnas.0511159103

      Olson, V. G., & Nestler, E. J. (2007). Topographical organization of GABAergic neurons within the ventral tegmental area of the rat. Synapse, 61(2), 87-95. https://doi.org/10.1002/syn.20345

      Root, D. H., Mejias-Aponte, C. A., Zhang, S., Wang, H. L., Hoffman, A. F., Lupica, C. R., & Morales, M. (2014). Single rodent mesohabenular axons release glutamate and GABA. Nat Neurosci, 17(11), 1543-1551. https://doi.org/10.1038/nn.3823

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This is by far the phylogenetic analysis with the most comprehensive coverage for the Nemacheilidae family in Cobitoidea. It is a much-lauded effort. The conclusions derived using phylogenetic tools coincide with geological events, though not without difficulties (Africa pathway).

      Strengths:

      Comprehensive use of genetic tools

      Weaknesses:

      Lack of more fossil records

      Thank you for appreciating the comprehensiveness of our study.

      We agree that additional nemacheilid fossils would have provided valuable support for reconstructing the evolutionary history of the family. However, the nemacheilid fossil used in our study is currently the only fossil species of the family, which precludes the possibility of including more. To address this limitation, we incorporated fossils from closely related fish families, as well as a geological event, to calibrate the time tree. We have added further details on this point in the “Divergence time estimations and ancestral range reconstruction” section of the Methods. The reconstruction of the pathway by which loaches reached northeast Africa is further complicated by the extensive aridification of the Arabian Peninsula and the Nile valley, which has left no fossil or extant species of Nemacheilidae to provide insights into the distribution of the family during the late Miocene.

      Reviewer #2 (Public review):

      Summary:

      The authors present the results of molecular phylogenetic analysis with very comprehensive samplings including 471 specimens belonging to 250 species, trying to give a holistic reconstruction of the evolutionary history of freshwater fishes (Nemacheilidae) across Eurasia since the early Eocene. This is of great interest to general readers.

      Strengths:

      They provide very vast data and conduct comprehensive analyses. They suggested that Nemacheilidae contain 6 major clades, and the earliest differentiation can be dated to the early Eocene.

      Weaknesses:

      The analysis is incomplete, and the manuscript discussion is not well organized. The authors did not discuss the systematic problems that widely exist. They also did not use the conventional way to discuss the evolutionary process of branches or clades, but just chronologically described the overall history.

      In the revised version, we address the systematic issues within Nemacheilidae in a new paragraph. The polyphyly of the genus Schistura and the polyphyly or paraphyly of many other nemacheilid genera are well-known challenges in ichthyology. However, the large size of the family Nemacheilidae and the absence of a clear basal classification system have made systematic work difficult.

      The chronological concept in the description of events is in accordance with the sequence in which the events occurred over time and corresponds with Figure 8. Additionally, a clade-by-clade description would make it challenging to capture the periods before all clades were formed. As a compromise, the revised version includes a new table where each clade is represented by a column, allowing readers to trace the history of each clade in a clear overview. With this table, we provide both the chronological and clade-by-clade perspectives to enhance reader understanding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have no major comments, except for Figure 8, where the colour code for Sunda is not consistent, appearing as light purple and then dark purple. I was trying to locate the colour legend, maybe include this for all figures or refer to it.

      Figure 8 has been revised to improve matching of the colours.

      Reviewer #2 (Recommendations for the authors):

      (1) It is better to discuss the evolutionary history of the major inner groups. For example, why the Branch A and B differentiated? How are the 6 major clades differentiated?

      As mentioned above, the new table provides an overview of the evolutionary history of the major clades and, where known, the mechanism that led to their differentiation. For branches A and B, the underlying causes of differentiation remain unknown. Currently, the extensive morphological variability within each clade prevents a definitive morphological diagnosis, but such a study is planned for the future.

      (2) In this study, there are still some phylogenetic or systematic problems unresolved. For example, the Genus Schistura remains polyphyletic even in different major clades. The situation is similar for the Genus Triplophysa though not so serious. These need to be discussed or at least partially solved before discussing the evolutionary history.

      We discuss these topics now in a new paragraph ‘Taxonomic implications’.

      (3) In Table S1, what is the meaning of "-". Does this mean no data available? If so, how do the authors treat this in their phylogenetic analysis?

      Indeed, in Table S1, a ‘-’ indicates that no sequence was available for the given species and gene. In the phylogenetic analyses, these cases were treated as missing data.
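
      As an illustration of what treating an absent gene as missing data can look like in a concatenated multi-gene matrix, the minimal Python sketch below pads a missing gene with `?` characters so that every taxon row has the same length. The gene names, sequences, and the `build_supermatrix` helper are hypothetical, and the response does not state which software or encoding was actually used.

```python
from typing import Dict, List, Optional

MISSING = "?"  # a common missing-data symbol in alignment formats (assumption)


def build_supermatrix(
    genes: List[str],
    gene_lengths: Dict[str, int],
    sequences: Dict[str, Dict[str, Optional[str]]],
) -> Dict[str, str]:
    """Concatenate aligned per-gene sequences, padding absent genes with '?'."""
    matrix: Dict[str, str] = {}
    for taxon, per_gene in sequences.items():
        parts = []
        for gene in genes:
            seq = per_gene.get(gene)
            if seq is None:
                # No sequence available for this taxon and gene: fill the
                # whole gene partition with the missing-data symbol.
                seq = MISSING * gene_lengths[gene]
            elif len(seq) != gene_lengths[gene]:
                raise ValueError(f"{taxon}/{gene}: unexpected alignment length")
            parts.append(seq)
        matrix[taxon] = "".join(parts)
    return matrix


# Hypothetical toy data: taxon2 has no sequence for geneB.
example = build_supermatrix(
    genes=["geneA", "geneB"],
    gene_lengths={"geneA": 4, "geneB": 3},
    sequences={
        "taxon1": {"geneA": "ACGT", "geneB": "TTA"},
        "taxon2": {"geneA": "ACGA", "geneB": None},
    },
)
```

      In this toy case the row for `taxon2` keeps its four `geneA` sites and carries three `?` characters in the `geneB` partition, so the taxon still contributes to the analysis through the gene it does have.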

      (4) What is the source of Figure 8? There are different opinions on the geological events. The authors need to indicate the source of their information.

      The sources of Fig. 8 are now provided in the figure caption.

      (5) The Eastern Clade forms continuous distribution in Figure 6, but discontinuous in Figure 8. Is this correct?

      Figure 6 does not display the distribution areas for the clades, but illustrates the biogeographic regions used in the biogeographic analysis.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Recommendations for the authors):

      A good number of sentences in the introduction, page two, refer to a figure, 'Fig. 2a', which appears to be the copy-paste effect of these sentences from another location (please see below):

      "Notably, SPHK2 does not directly contribute to levels of secreted S1P (Thuy et al., 2022), nor is it annotated in the chick genome. S1P can be exported from cells by a transporter (MFSD2A and SPNS2) or converted to sphingosine by a phosphatase (SGPP1) (Fig. 2a). Levels of sphingosine are increased by ASAH1 by conversion of ceramide or decreased by CERS2/5/6 by conversion to ceramide (Fig. 2a). S1P is known to activate G-protein coupled receptors, S1PR1 through S1PR5 (Fig. 2a). S1PRs are known to activate different cell signaling pathways including MAPK and PI3K/mTor, and crosstalk with pro-inflammatory pathways such as NFκB (Fig. 2a) (Hu et al., 2020)."

      We have removed references to Fig. 2a, which was from a previous draft of this manuscript.

      Please correct the typo in the following sentence (Fid.)

      "S1PR1 was most prominently expressed by resting MG and MG returning to a resting state, whereas S1PR3 was detected in relatively few scattered cells in clusters of MG, ganglion cells, horizontal cells, bipolar cells, amacrine cells, photoreceptors, oligodendrocytes, microglia and NIRG cells (Fid. 1d)."

      We have corrected this typo.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses: 

      It is not always clear what the novel findings are that this manuscript is presenting. It appears to be largely similar to the analysis done by McKey et al. (2022) but with more time points and molecular markers. The novelty of the present study's findings needs to be better articulated. 

      The previous study focused on placing the Rete Ovarii in the context of ovarian development. The current study focuses on the novel finding that the EOR is an active structure that sends fluid/information to the ovary. We show this by characterizing the presence of secretory proteins in the RO epithelial cells, by dye injections into the EOR and observing transport of the dye to the ovary, and by collection of EOR fluid followed by proteomic analysis. We also show that the RO is embedded in an elaborate vascular network and contacted by neurons. None of these data were discussed in the McKey 2022 paper.

      Reviewer #2 (Public Review):

      Clarifications: 

      (1) Is there any comparative data on the proteomics of RO and rete testis in early development? With some molecular markers also derived from rete testis, it would be better to provide the data or references.

      To the best of our knowledge, there are no available proteomic datasets of the embryonic or early postnatal mouse Rete Testis or Epididymis. The authors agree that having this information would be very useful. 

      (2) Although the size of RO and its components is quite small and difficult to operate, the researchers in this article had already been able to perform intracavitary injection of EOR and extract EOR or CR for mass spectrometry analysis. Therefore, can EOR, CR, or IOR be damaged or removed, providing further strong evidence of ovarian development function?

      We attempted to genetically ablate the RO by expressing the diphtheria toxin receptor (DTR) in RO cells and adding DT. This approach was not successful in ablating the RO. We also tried to use Pax2/8 homo- and heterozygous mutants for ablation (as used in the McKey 2022 paper), but so far, we cannot find a genetic combination that ablates the RO, but not the oviduct, uterus and/or kidneys. We have also embarked on a study to surgically remove the RO. This assay is taking some time to optimize. The goal of the current study was to characterize the cells along the length of the RO and to present evidence that it is a secretory appendage of the ovary.

      (3) Although IOR is shown on the schematic diagram, it cannot be observed in the immunohistochemistry pictures in Figure 1 and Figure 3. The authors should provide a detailed explanation.

      An annotation has been added to Figure 1 to indicate the IOR. As the images within the panels are of maximum intensity projections, it is often difficult to clearly see the IOR as it is deeper within the ovary. In Figure 3, the view of the ovary is from the ventral side:  this view does not allow for clear visualization of the IOR.

      Reviewer #3 (Public Review):

      Weaknesses: 

      There is a lack of conclusive data supporting many conclusions in the manuscript. Therefore, the paper's overall conclusions should be moderated until functional validations are conducted.

      We have moderated the conclusions where appropriate.

      Reviewer #1 (Recommendations For The Authors):

      (1) The introduction is relatively brief and does not mention some historical data/hypotheses on the role of the RO in ovarian function (e.g. regulation of meiotic entry) or development (e.g. Mayère et al., 2022).

      Mayère 2022 was cited in line 57. Stein's hypothesis about entry into meiosis has been added at line 58.

      (2) L82-84: It is stated that KRT8 was first identified as a potential RO marker by sc/snRNAseq (Anbarci et al., 2023) and then validated in this manuscript. However, KRT8 was used by McKey et al. (2022) as a RO marker, and they noted there that KRT8 was enriched in the EOR. It is not clear why McKey et al. is not cited as the primary reference validating KRT8 as an EOR marker.

      The embryonic and neonatal timecourse of KRT8 expression is first described in this paper. McKey 2022 only highlights KRT8 at E18.5. A reference has been added to address this (line 85).

      (3) Figure 1: Can the IOR be seen in these images? If so, please label. 

      The label has been added.

      (4) L107: It is hypothesized that "the RO may respond to or interpret homeostatic cues." Can transcriptomics data shed light on what signals the RO may be capable of responding to? E.g. what receptors are expressed by cells of the RO (e.g. ER, LHCGR, FSHR)?

      The RO expresses ESR1, PGR, INSR, and IGF1R. The IOR exclusively expresses LHCGR and FSHR. This has been added to the manuscript (line 309).

      (5) L152: Mass spec was used to identify proteins secreted into the lumen of the RO. These proteins were then compared to the mammalian secretome to filter out possible nonsecreted protein contaminants. Finally, the candidates were compared to the RO scRNAseq data from Anbarci et al., (2023). This method gives a very conservative candidate list. However, it may also be informative to compare the sc/snRNA-seq gene list directly to the secretome to ID other possible candidate-secreted proteins that may not have been detected in the mass spec data set. 

      Quite a number of proteins annotated as secreted are not actively secreted. This is a good suggestion for future analysis. For the current study we wanted to take a more conservative approach, and chose to use proteomics to determine which proteins are actively secreted.
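
      The two filtering strategies discussed in this exchange can be sketched as set operations. The Python snippet below is only an illustration under stated assumptions: the protein identifiers are placeholders, and the function names `conservative_candidates` and `transcript_only_candidates` are hypothetical rather than part of the authors' pipeline.

```python
from typing import Iterable, List


def conservative_candidates(
    mass_spec: Iterable[str],
    secretome: Iterable[str],
    expressed: Iterable[str],
) -> List[str]:
    """Proteins detected in luminal fluid, annotated as secreted,
    and expressed in the single-cell data (intersection of all three)."""
    return sorted(set(mass_spec) & set(secretome) & set(expressed))


def transcript_only_candidates(
    mass_spec: Iterable[str],
    secretome: Iterable[str],
    expressed: Iterable[str],
) -> List[str]:
    """Reviewer's suggested extension: annotated-secreted and expressed,
    but not detected in the mass spectrometry data."""
    return sorted((set(secretome) & set(expressed)) - set(mass_spec))


# Placeholder identifiers, not real results.
mass_spec = {"PROT_A", "PROT_B", "PROT_C"}
secretome = {"PROT_A", "PROT_C", "PROT_D"}
expressed = {"PROT_A", "PROT_D", "PROT_E"}

strict = conservative_candidates(mass_spec, secretome, expressed)
extra = transcript_only_candidates(mass_spec, secretome, expressed)
```

      The first list requires direct evidence of the protein in the fluid, which matches the conservative approach described in the response; the second list would contain candidates supported by annotation and expression only, and would need independent validation.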

      (6) L195: It is not clear if IGFBP2 is expressed by both OR and granulosa cells or only granulosa cells. It would be informative to know what ovarian cell types express both IGFBP2 and IGF1R (e.g. from sc/snRNA-seq)? This information is referenced in the discussion (L285-287) but would be better to reference it in the results section for clarity.

      Both RO and granulosa cells express IGFBP2 and IGF1R. A sentence has been added to results for clarity. (Line 197)

      (7) L295: "...the RO participates in endocrine signaling..." might be more accurate to say "...the RO responds to endocrine signaling...".

      The authors agreed that this statement is more accurate and the changes have been made. 

      Reviewer #3 (Recommendations For The Authors): 

      Several issues significantly affect the paper's quality in the current version. Firstly, there is a lack of conclusive data supporting many conclusions in the manuscript. For instance, the assertion in line 105 that "EOR was directly innervated by neurons" lacks substantial evidence beyond basic immunofluorescent staining. 

      We agree that the term “innervated” might be a step too far since we rely on IF evidence.  We changed the wording of this sentence to say, “The EOR was directly contacted by neurons”.

      In another pivotal experiment illustrated in Figure 3, the provided images lack temporal continuity and quantitative analysis, suggesting the incorporation of time-lapse imaging for improved sequential presentation in Figure 3.

      The microscope where we can perform injections cannot record movies.  We have tried moving the rete to another microscope after injection, but so far, we have been unable to capture dextran moving through the RO. We therefore believe that transport is rapid, but future experiments will be needed to optimize this imaging.

      Moreover, relying solely on proteomics analysis, as seen in lines 188-189, makes it challenging to assert conclusions such as "EOR actively secretes proteins." Therefore, the paper's overall conclusions should be moderated until functional validations are conducted. 

      The findings that (1) the cells of the EOR express SNARE complex proteins at their apical surfaces and (2) luminal fluid expelled from the EOR contains abundant secreted proteins strongly suggest that the RO is involved in active secretion. We use the word “suggest” in this sentence, lines 188-189 as we realize that further experiments should be done to validate this conclusion.

      Furthermore, the predominant methods in this study involve immunostaining and imaging. However, the current images exhibit a notable inconsistency in color definitions for different markers by the authors. For instance, in Figure 2.A/C, PAX8 is portrayed as cyan, while in D, it is represented in yellow. Similarly, in Figure 4, E-CAD is depicted using both cyan and yellow. Utilizing different colors for the same protein within a figure can significantly confuse readers' interpretation of the experiments. Rectifying these inconsistencies is essential to enhance the clarity and comprehension of the experimental results.

      These colors were chosen to be visible to those with color vision impairments. We typically used cyan and magenta to emphasize the most important markers in the image. E-Cad and KRT8 were often used either to emphasize a structure or as a landmark, based on the localization of these proteins. When KRT8 and E-Cad were highlighted, they were represented in cyan and magenta for visibility. When these proteins were used as a landmark to orient the reader and not as the main point, they were labeled in yellow.

      Lastly, many markers in this study are derived from bulk and single-cell sequencing of the developing RO. However, it seems that these important data were separated into another paper as a preprint. If these data were incorporated into the current manuscript, the manuscript would become more comprehensive for guiding future research on the RO.

      Since we have single cell and single nuclei data from fetal and adult estrus and metestrus stages, we found that incorporating all this data into the present manuscript was overwhelming. Instead, we devoted another manuscript to presenting and validating that data. We believe a quick look at the sequencing manuscript will make this clear.

    1. Author response:

      We appreciate the reviewers’ thoughtful and constructive feedback, which has provided valuable insights to refine our manuscript. Below, we outline the planned revisions in response to the public reviews.

      Response to Reviewer #1

      We are grateful for the reviewer’s recognition of our methodological approach and the potential significance of CD47 as a novel MSC marker for cartilage repair. To address the concerns raised:

      (1) Clarifying the proteomics data supporting CD47 as an MSC marker

      · The manuscript will be revised to clearly indicate where the proteomics data demonstrate elevated CD47 expression in MSCs compared to non-MSCs.

      · Additional figure annotations or a supplemental figure may be included to enhance clarity.

      (2) Providing further details on CD47hi and CD47lo MSC populations

      · Information on the number of isolated CD47hi and CD47lo cells, along with any necessary expansion steps before in vivo use, will be explicitly detailed.

      (3) Expanding the characterization of CD47hi MSCs in vitro

      · A more comprehensive analysis of the chondrogenic differentiation capacity of CD47hi MSCs will be incorporated to strengthen the findings.

      (4) Clarifying experimental details of the in vivo rat OA model

      · The methodology section will be updated to specify the number of injected cells and their labeling strategies.

      · Representative histological images will be added to support the results.

      · To further substantiate the cartilage repair potential of CD47hi MSCs, additional staining for Collagen Type II will be included alongside Sox9 expression.

      Response to Reviewer #2

      We appreciate the reviewer’s enthusiasm for the study and recognition of its rigor and translational significance. The following revisions are planned to address the feedback:

      (1) Addressing additional assessments for OA phenotype in the rat model

      · While this study primarily relied on histology, the limitations of this approach will be acknowledged in the discussion.

      · The absence of microCT and behavioral assessments will be explained, with suggestions for incorporating these methods in future studies.

      (2) Justifying the focus on CD47

      · The rationale behind prioritizing CD47 over other proteomics-identified markers will be expanded to provide better context for this choice.

      (3) Clarifying MSC engraftment patterns

      · The manuscript will include a discussion on whether CD47hi MSCs specifically engraft in articular cartilage or contribute to ectopic cartilage formation (e.g., osteophytes).

      (4) Contextualizing findings within recent research on synovial progenitors

      · Additional discussion will highlight recent studies on DPP4+ PI16+ CD34+ stromal cells and how the identified MSC populations may relate to these universal fibroblasts.

      We are confident that these revisions will strengthen the manuscript and enhance its clarity and impact. The reviewers’ insights have been invaluable, and we look forward to refining the study accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      While CRISPR/Cas technology has greatly facilitated the ability to perform precise genome edits in Leishmania spp., the lack of a non-homologous DNA end-joining (NHEJ) pathway in Leishmania has prevented researchers from performing large-scale Cas-based perturbation screens. With the introduction of base editing technology to the Leishmania field, the Beneke lab has begun to address this challenge (Engstler and Beneke, 2023).

      In this study, the authors build on their previously published protocols and develop a strategy that:

      (1) allows for very high editing efficiency. The cell editing frequency of 1 edit per 70 cells reported in this study represents a 400-fold improvement over the previously published protocol,

      (2) reduces the negative effects of high sgRNA levels on parasite growth by using a weaker T7 promoter to drive sgRNA transcription.

      The combination of these two improvements should open the door to exciting large-scale screens and thus be of great interest to researchers working with Leishmania and beyond.

      We thank reviewer #1 for these encouraging comments.

      Reviewer #2 (Public Review):

      Summary:

      Previously, the authors published a Leishmania cytosine base editor (CBE) genetic tool that enables the generation of functionally null mutants. This works by utilising a CAS9-cytidine deaminase variant that is targeted to a genetic locus by a small guide RNA (sgRNA) and causes cytosine to thymine conversion. This has the potential to generate a premature stop codon and therefore a loss of function mutant.

      CBE has advantages over existing CAS-based knockout tools because it allows the targeting of multicopy gene families and, potentially, the easier generation of pooled loss of function mutants in complex population experiments. Although successful, the first generation of this genetic tool had several limitations that may have prevented its wider adoption, especially in complex genome-wide screens. These include nonspecific toxicity of the sgRNAs, low transfection efficiencies, low editing efficiencies, a proportion of transfectants that express multiple different sgRNAs, and insufficient effectivity in some Leishmania species.

      Here, the authors set out to systematically solve each of these limitations. By trialling different transfection conditions and different CAS12a cut sites to promote sgRNA expression cassette integration, they increase the transfection efficiency 400-fold and ensure that only a single sgRNA expression cassette integrates that edits with high efficiencies. By trialling different T7 promoters, they significantly reduce the non-specific toxicity of sgRNA expression whilst retaining high editing efficiencies in several Leishmania species (Leishmania major, L. mexicana and L. donovani). By improving the sgRNA design, the authors predict that null mutants will be more efficiently produced after editing.

      This tool will find adoption for producing null mutants of single-copy genes, multicopy gene families, and potentially genome-wide mutational analyses.

      Strengths:

      This is an impressive and thorough study that significantly improves the previous iteration of the CBE. The approach is careful and systematic and reflects the authors' excellent experience developing CRISPR tools. The quality of data and analysis is high and data are clearly presented.

      Weaknesses:

      Figure 4 shows that editing of PF16 is 'reversed' between day 6 and day 16 in L. mexicana WTpTB107 cells. The authors reasonably conclude that in drug-selected cells there is a mixed population of edited and non-edited cells, possibly due to mis-integration of the sgRNA expression construct, and non-edited cells outcompete edited cells due to a growth defect in PF16 loss of function mutants. However, this suggests that the CBE tool will not work well for producing mutants with strong fitness phenotypes without incorporating a limiting dilution cloning step (at least in L. mexicana and quite possibly other Leishmania species). Furthermore, it suggests it will not be possible to incorporate genes associated with a growth defect into a pooled drop-out screen as described in the paper. This issue is not well explored in the paper and the authors have not validated their tool on a gene associated with a severe growth defect, or shown that their tool works in a mixed population setting.

      We would like to thank reviewer #2 for this helpful comment and valid point. We have now included a small-scale loss-of-function screen in L. mexicana, targeting nine known essential genes with 24 CBE sgRNAs and 15 non-targeting control sgRNAs. This approach successfully detected all known included growth-associated phenotypes in a pooled screening format. This experiment is now shown in Figure 5 and described in section “Detection of fitness-associated phenotypes in a pooled loss-of-function screen”.

      In addition, we would like to reiterate our initial public response to this comment. We believe that escapes or reversals of mutant phenotypes can also be observed with other genetic tools used for loss-of-function screening, including lentiviral CRISPR approaches in mammalian systems and RNAi in Trypanosoma brucei (e.g. Ariyanayagam et al., 2005 and Schlecker et al., 2005). Notably, in lentivirally delivered CRISPR screens, sgRNA expression cassettes are integrated in random places within the genome and multiple cassettes can be integrated depending on the viral titre. In these types of screens, cells can escape phenotypes through various mechanisms, such as promoter silencing or selection of non-deleterious mutations. Additionally, not every CRISPR guide is efficient in generating a mutant phenotype, and RNAi constructs can also vary in their effectiveness. Despite these challenges, genome-wide loss-of-function screens have been successfully carried out in mammalian cells and Trypanosoma parasites. Therefore, we believe that the observed escape of one mutant phenotype does not preclude the detection of growth-associated or other phenotypes in pooled screens. Moreover, we did not observe a reversal of the mutant phenotype in L. mexicana, L. donovani, and L. major parasites expressing tdTomato from an expression cassette integrated into the 18S rRNA SSU locus (Figure 4). The small-scale fitness screen now included (Figure 5) confirms these assumptions and shows that we can detect “strong” growth-associated phenotypes. We would also like to point out that we have recently successfully conducted several genome-wide loss-of-function screens in vivo and in vitro, ultimately confirming the feasibility of this type of screen on a genome-wide scale (manuscript in preparation).

      We have included a discussion of these points under section “Integration of CBE sgRNA expression cassettes via AsCas12a ultra-introduced DSBs increase editing rates” and section “Detection of fitness-associated phenotypes in a pooled loss-of-function screen” in our revised manuscript.

      Although welcome, the improvements to the crRNA CBE design tool are hypothetical and untested.

      We agree that the improvements to the CBE sgRNA design are currently hypothetical. We plan to systematically test our guide design principles in future studies. Since this will require testing hundreds of guides to draw robust conclusions, we believe that this aspect is beyond the scope of the current study. In section “Improved CBE sgRNA design to prioritize edits resulting only in STOP codons” of our revised manuscript we now discuss these future plans.

      The Sanger and Oxford Nanopore Technology analyses on integration sites of the sgRNA expression cassette integration will not detect the mis-integration of the sgRNA expression construct into an entirely different locus.

      We have now re-analysed our ONT data and have extracted all ONT contigs that match the CBE sgRNA expression cassette. All extracted contigs align to the 18S rRNA SSU locus, showing integration of the cassette into this locus. It is important to note that here a population was sequenced and not a clone. Despite this, no contigs could be found that would link the CBE sgRNA expression cassettes to another locus. This is now shown in Figure 4 S2 and described in section “Cas12a-mediated DSB ensures the integration of one CBE sgRNA per L. mexicana transfectant”.

      Reviewer #3 (Public Review):

      Genetic manipulation of Leishmania has some challenges, including some limitations in the DNA repair strategies that are present in the organism and the absence of RNA interference in many species. The senior author has contributed significantly to expanding the available routes towards Leishmania genetic manipulation by developing and adapting CRISPR-Cas9 tools to allow gene manipulation via DNA double-strand break repair and, more recently, base modification. This work seeks to improve on some limitations in the tools previously described for the latter approach of base modification leading to base change.

      The work in the paper is meticulously described, with solid evidence for most of the improvements that are claimed: Figure1 clearly describes reduced impairment in the growth of parasites expressing sgRNAs via changes in promoters; Figures 2 and 3 compellingly document the usefulness of using AsCas12a for integration after transformation; and Figures 1 and 4 demonstrate the capacity of the combined modifications to efficiently edit a gene in three different Leishmania species. There is little doubt these new tools will be adopted by the Leishmania community, adding to the growing arsenal of approaches for genetic manipulation.

      There are two weaknesses the authors may wish to address, one smaller and one larger.

      (1) The main advance claimed here is in this section title: 'Integration of CBE sgRNA expression cassettes via AsCas12a ultra-introduced DSBs increase editing rates', with the evidence for this presented in Figure 4. It is hard work in the submission to discern what direct evidence there is for editing rates being improved relative to earlier, Cas9-based approaches. Did they directly compare the editing by the new and old approach? If not, can they more clearly explain how they are able to make this claim, either by adding text or a new figure? A side-by-side comparison would emphasise the advance of the new approach more clearly.

      We would like to thank reviewer #3 for this helpful comment. We have directly compared our improved method to our previous base editing method in Figures 1E and 4, demonstrating higher editing rates in a much shorter time. In particular, the L. major panel in Figure 4B shows that, in a direct comparison between the previously published system (Engstler and Beneke, eLife 2023) and the new system presented here, editing can only be observed with the new version. However, to clarify the improvements we made, we now compare data from our previous screen in Engstler and Beneke, eLife 2023 with a loss-of-function screen carried out with our updated method (see Figure 5 and section “Detection of fitness-associated phenotypes in a pooled loss-of-function screen”).

      In addition, we feel that our title might have been misleading in the sense that it suggested that Cas12a editing is more efficient than Cas9-based approaches, which is not something we want to claim here. Given that we have now included a small-scale CRISPR screen and that we generally show improved base editing compared to our previous method (lower toxicity, more editing in a shorter time, higher transfection rates and less species-specific variation), we have rephrased our title to: “Improved base editing and functional screening in Leishmania via co-expression of the AsCas12a ultra, a T7 RNA Polymerase, and a cytosine base editor”.

      (2) The ultimate, stated goal of this work is (abstract) to 'enable a variety of loss-of-function screens', as the older approach had some limitations. This goal is not tested for the new tools that have been developed here; the experiment in Figure 5 merely shows that they can, not unexpectedly, make a gene mutant, which was already possible with available tools. Thus, to what extent is this paper describing a step forward? Why have the authors not run an experiment - even the same one that was described previously in Engstler and Beneke (2023) - to show that the new approach improves on previous tools in such a screen, either in scale or accuracy?

      We have now included a small-scale loss-of-function screen in L. mexicana, targeting nine known essential genes with 24 CBE sgRNAs and 15 non-targeting control sgRNAs. This approach successfully detected all known included growth-associated phenotypes in a pooled screening format. This experiment is now shown in Figure 5 and described in section “Detection of fitness-associated phenotypes in a pooled loss-of-function screen”. We believe that this underscores our claims made here and believe therefore that our updated toolbox will indeed enable a variety of loss-of-function screens.

      As pointed out in our response to reviewer #2, we have recently successfully conducted several genome-wide loss-of-function screens in vivo and in vitro, ultimately confirming the feasibility of this type of screen on a genome-wide scale (manuscript in preparation). Without the improvements presented here, such as the higher transfection and base editing rates, these genome-wide screens could not have been carried out.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I would like to compliment Tom Beneke and his lab on their continued efforts to develop tools to facilitate genome editing in Leishmania.

      I have no doubt that the toolkit presented in this study will be very useful for the community. The submitted paper is very well written and contains all the necessary controls to support the author's claims. There is only one point that left me a bit concerned if this strategy is to be used for large-scale screens, and that is the potential for integration of multiple sgRNA expression cassettes in a single cell.

      We would like to thank reviewer 1 for helpful comments. We have addressed the major concern raised by including a small-scale loss-of-function screen in our revised manuscript. By targeting nine known essential genes with 24 CBE sgRNAs and 15 non-targeting control sgRNAs, this approach successfully detected growth-associated phenotypes in a pooled format (see section “Detection of fitness-associated phenotypes in a pooled loss-of-function screen” and Figure 5). Regarding the point of multiple sgRNA expression cassette integration, please see the next comment below.

      Major points:

      Integration of multiple sgRNA expression cassettes:

      While Illumina-based gDNA-seq is well suited to determine changes in ploidy, I don't think it is sensitive enough to draw conclusions about possible double integration in a small percentage of cells. In fact, the data shown in Figure 4 S1D show a normalized coverage >1.5 for sgRNA cassette and NeoR, suggesting that they may have integrated >1 times in some cells.

      To verify that the integration of the CBE sgRNA expression cassette is specific, we have re-analysed our ONT results and confirmed that only ONT contigs can be detected that link the CBE sgRNA expression to the 18S rRNA locus. No other integration sites can be found. We also do not detect any contigs containing multiple CBE sgRNA expression cassettes. This is now shown in Figure 4 S2 and described in section “Cas12a-mediated DSB ensures the integration of one CBE sgRNA per L. mexicana transfectant”.

      Nevertheless, it is a valid concern that the sequencing depth is not sufficient to detect a small percentage of cells that have integrated the CBE sgRNA expression cassette multiple times. However, we would also like to make the point that such a small percentage of cells within a screen is unlikely to be relevant. To support this claim, we have now added a small-scale pooled loss-of-function screen, targeting essential genes, to the manuscript (see new Figure 5). If the integration of multiple sgRNAs into one cell had any measurable combinatorial effect, the non-targeting controls in our screen would have been depleted as well. However, there is no detectable difference between the 15 included controls in our small-scale screen.

      We have addressed all points in sections “Cas12a-mediated DSB ensures the integration of one CBE sgRNA per L. mexicana transfectant“ and “Detection of fitness-associated phenotypes in a pooled loss-of-function screen”.

      To avoid double integration, wouldn't it be easiest to just create an allele-specific "landing pad" on one chromosome? I believe that a double integration rate of ~20% could severely complicate the analysis of any large-scale screen later on.

      We thank the reviewer for this suggestion. We have tried to use an allele-specific "landing pad" and already described this in our first manuscript version (see section “DSBs introduced by AsCas12a ultra increase integration rates of donor DNA constructs”). Specifically, we integrated CBE sgRNA expression cassettes into the neomycin resistance marker contained in the tdTomato expression cassette (Figure 2 S1D, Cas12a crRNA-5 and 6), but this resulted in lower transfection rates (Figure 2F: crRNA-5 1 in ~47,000; crRNA-6 1 in ~32,000) than when using a Cas12a crRNA that targets the 18S rRNA locus directly (Figure 2F: crRNA-4 1 in ~2,000). As we believe a high transfection rate is key for pooled large-scale screens, we pursued further experiments with crRNA-4. However, since a different crRNA can easily be selected for our tool, simply by changing the Cas12a crRNA during transfection, users can choose a different integration site or other “landing pads” if they want to. We have updated section “Cas12a-mediated DSB ensures the integration of one CBE sgRNA per L. mexicana transfectant” to clarify these details.
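
      The size of the difference between the integration sites quoted above can be made explicit with a short calculation. The sketch below is illustrative only: it uses the approximate "1 in N" transfection rates cited from Figure 2F, and the variable names are hypothetical.

```python
# Approximate transfection rates quoted in the response (Figure 2F),
# expressed as "1 transfectant per N cells". Values are rounded estimates.
one_in_n = {
    "crRNA-4": 2_000,    # targets the 18S rRNA locus directly
    "crRNA-5": 47_000,   # targets the neomycin resistance marker
    "crRNA-6": 32_000,   # targets the neomycin resistance marker
}

reference = one_in_n["crRNA-4"]

# Fold reduction in transfection rate relative to crRNA-4.
fold_lower = {name: n / reference for name, n in one_in_n.items()}

for name, fold in fold_lower.items():
    print(f"{name}: 1 in ~{one_in_n[name]:,} cells, {fold:.1f}-fold lower than crRNA-4")
```

      On these rounded figures, the two landing-pad crRNAs yield transfection rates more than an order of magnitude below that of crRNA-4.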

      Also, it is not clear to me how the integration of tdTomato could affect the integration of the sgRNA expression cassette 400 bp downstream.

      As stated above, our ONT data clearly show that we can only see integration into one locus (Figure 4 S1 and S2). Given that the recognition site of crRNA-4 is contained in the homology flank used to integrate tdTomato into the 18S rRNA locus, this may contribute to the effect we observe. But since the homology sequences match the original sequences within the locus, the reasons why this affects integration of the CBE sgRNA expression cassettes also remain elusive to us. We now discuss this in more detail in the section “Cas12a-mediated DSB ensures the integration of one CBE sgRNA per L. mexicana transfectant”.

      Data accessibility:

      The Illumina and ONT data should be made publicly available.

      ONT and Illumina fastq reads are now available at the European Nucleotide Archive (ENA Accession Number: PRJEB83088).

      Minor point:

      Line 30: It would be easier for readers if the authors could briefly explain what bar-seq is.

      We have added more details: […] and bar-seq screens, which involve individually deleting, barcoding, and pooling mutants for analysis, have facilitated […].

      Lines 114, 120: I think the authors are referring to Figures 1E and F, not Figures 2E and F.

      Many thanks for picking this up, we have corrected the Figure reference.

      Reviewer #2 (Recommendations For The Authors):

      This has the potential to be a valuable tool for the community if it is efficiently distributed. If the authors have not yet done so they should make their plasmids available to the community via Addgene.

      We have started the deposit process with Addgene and plasmids will be available soon. In the meantime, all plasmid maps are available on our website www.leishbaseedit.net and can be requested for shipment from our lab.

      Line 162-165, 400-401: The potential for using AsCAS12a's intrinsic RNase activity for "multiplexing" would benefit from a little more explanation (i.e. how this would work, and what multiplexing means in this context).

      We have added further details on multiplexing with Cas12a and point out potential applications.

      “For example, Cas12a crRNA arrays with four or more guides can be assembled and transfected to introduce multiple DSBs within one gene. Since Cas12a generates sticky DNA ends that facilitate recombination via microhomology-mediated end joining and homologous recombination (Zhang et al., 2021), this approach could effectively disrupt target genes without requiring the addition of donor DNA and this may provide an alternative approach to our here presented base editing method in the future. Moreover, CBE sgRNAs could be multiplexed by interspacing them with Cas12a direct repeats (DRs), enabling simultaneous targeting of multiple genes in one cell.”

      Line 193-194: can the authors offer an explanation for the reduction in mNG editing observed with 30nt homology flanks?

      We assume this is caused by imprecise recombination events in some cells and have revised the original sentence.

      In several places in the manuscript, it is unclear if an analysis has been done on an individual clone or a population derived from multiple transfected cells. If on mixed population, clarify this and calculate the number of clones that the mixture represents. E.g. lines 195-196 and 221-223 (Sanger sequencing of integration site); Line 333-352 (ONT analysis of CBE expression cassette integration).

      Only when we tested whether multiple CBE sgRNAs are integrated, we generated and analysed clones (Figure 4 S3). In all other experiments we analysed parasite populations. For better clarity, we have where possible indicated this in the revised manuscript (e.g. at the lines requested). 

      Line 259: "site by site" should presumably be "side by side".

      Many thanks for pointing this out. We have changed this typo.

      Lines 315-317: Clarify why the mis-integration of the CBE sgRNA expression cassette might cause a lack of editing (e.g. lack of expression?).

      We have added: “This could potentially result in the silencing of the CBE sgRNA expression or even lead to the deletion of the guide cassette”

      Line 364 - 367: it is unlikely there is the statistical power to state that 2/10 represents lower than the previously observed 38% of double integrants.

      We agree that the statistical power is low and have therefore changed our phrasing to an overall estimation.
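
      The point about statistical power can be illustrated with a short binomial calculation. The sketch below is not part of the manuscript. It assumes the 10 clones are independent and asks how often 2 or fewer double integrants would be observed if the true rate were still the previously reported 38%. The function name is hypothetical.

```python
from math import comb

def binom_cdf(k_max: int, n: int, p: float) -> float:
    """Probability of observing at most k_max successes in n independent trials."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))

# Assumption: true double-integration rate equals the earlier estimate of 38%.
p_at_most_2 = binom_cdf(2, 10, 0.38)
print(f"P(<= 2 double integrants in 10 clones | rate = 0.38) = {p_at_most_2:.3f}")
```

      Under these assumptions the probability is roughly 0.2, so an observation of 2/10 would not by itself distinguish a lower rate from 38%, which is consistent with rephrasing the result as an overall estimate.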

      Reviewer #3 (Recommendations For The Authors):

      I suggest that the authors make clearer to the reader the evidence for improved editing efficiency in the new CBE system described here relative to the system described in Engstler and Beneke, 2023. Such clarification could be as simple as an extra paragraph or figure, clearly comparing the editing rates with the two systems in, as far as possible, equivalent conditions.

      We have directly compared our improved method to our previous base editing method in Figures 1E and 4, demonstrating higher editing rates in a much shorter time. In particular, the L. major panel in Figure 4B shows that, in a direct comparison between the previously published system (Engstler and Beneke, eLife 2023) and the new system, editing can only be observed with the version presented here. However, to clarify the improvements we made, we now compare data from our previous screen in Engstler and Beneke, eLife 2023 with a loss-of-function screen carried out with our updated method (see Figure 5 and section “Detection of fitness-associated phenotypes in a pooled loss-of-function screen”).

      The significance of this work would be improved by running the type of loss of fitness screen described previously in Engstler and Beneke (2023), thereby showing that the new approach improves on previous tools. Without such data, questions remain about potential confounding effects that might not be anticipated from the targeted experiments provided in the current manuscript.

      We thank the reviewer for this suggestion. The requested experiment is now presented in Figure 5 and described in section “Detection of fitness-associated phenotypes in a pooled loss-of-function screen”.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This important study provides empirical evidence of the effects of genetic diversity and species diversity on ecosystem functions across multi-trophic levels in an aquatic ecosystem. The support for these findings is solid, but a more nuanced interpretation of the results could make the conclusions more convincing. The work will be of interest to ecologists working on multi-trophic relationships and biodiversity.

      Thanks for this new assessment. Below, we reply to the comments that you and the reviewer have made. We understand the criticism regarding the interpretation of causal relationships from observational data. We have now added an entire paragraph (the second paragraph of the Discussion) that explicitly calls for a cautious interpretation of our results. We also tried to refrain from using certain words (e.g., “we demonstrate”) where we think it is hard to draw firm conclusions. This is a tricky exercise: on the one hand, we gathered a large and strong database (as underlined by the reviewers) that should strengthen statistical inferences, but on the other hand, the inferences we have made are based on observational data, which obviously come with biases (even if partially controlled statistically). We hope that you will find our additions appropriate for striking the right balance between a strong dataset and a fragile interpretation.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work used a comprehensive dataset to compare the effects of species diversity and genetic diversity within each trophic level and across three trophic levels. The results stated that species diversity had negative effects on ecosystem functions, while genetic diversity had positive effects. Additionally, these effects were observed only within each trophic level and not across the three trophic levels studied. Although the effects of biodiversity, especially genetic diversity across multi-trophic levels, have been shown to be important, there are still very few empirical studies on this topic due to the complex relationships and difficulty in obtaining data. This study collected an excellent dataset to address this question, enhancing our understanding of genetic diversity effects in aquatic ecosystems.

      Strengths:

      The study collected an extensive dataset that includes species diversity of primary producers (riparian trees), primary consumers (macroinvertebrate shredders), and secondary consumers (fish). It also includes genetic diversity of the dominant species in each trophic level, biomass production, decomposition rates, and environmental data. The writing is logical and easy to follow.

      Weaknesses:

      The two main conclusions are overly generalized: (1) species diversity had negative effects on ecosystem functions, while genetic diversity had positive effects, and (2) these effects were observed only within each trophic level, not across the three levels. Analysis of the raw data shows that species and genetic diversity have different effects depending on the ecosystem function. For example, neither affected invertebrate biomass, but species diversity positively influenced fish biomass, while genetic diversity had no effect. Furthermore, Table S2 reveals that only four effect sizes were significant (P < 0.05): one positive genetic effect, one negative genetic effect, and two negative species effects, with two effects within a trophic level and two across trophic levels. Additionally, using a P < 0.2 threshold to omit lines in the SEMs is uncommon and was not adequately justified. A more cautious interpretation of the results, with acknowledgment of the variability observed in the raw data, would strengthen the manuscript.

      There is actually no objective justification for having chosen p<0.20. This is a subjective threshold that was chosen to simplify the visual interpretation of causal graphs while highlighting the most biologically relevant links. We have now added a sentence stating explicitly the subjective nature of the threshold. We understand the point you raised regarding a cautious interpretation of the results. We have now added a paragraph (just before the detailed discussion) explicitly calling for a cautious interpretation of the results (see l. 414-424). We think this paragraph applies to the entire discussion. Our message in this paragraph is that the inferences we’ve made can arise from both a biological reality and statistical artefacts. We cannot really tease these apart at this stage, and our interpretation of the results therefore has to be taken with care. We hope you’ll find the statement adequate. We prefer to alert readers from the start rather than including cautionary notes all over the discussion; we feel this is more logical. We have also modified the text in places to avoid strong statements such as “we demonstrated” when we think the demonstration cannot be considered solid.

      Recommendations for the authors:

      Reviewing Editor:

      In addition to the comments from the reviewer, we have the following comments on your paper:

      (1) It would be important to clarify that there could be different interpretations about one of the major findings: for within-trophic BEF relationships, genetic and species diversity have the opposite effects on ecosystem functions (i.e., positive and negative effects for genetic and species diversity, respectively). (1) One possibility is that for each specific ecosystem function, genetic and species diversity have the opposite effects. (2) The other possibility is that genetic diversity has positive effects on some functions, while species diversity has negative effects on other functions. These two possibilities can have quite different implications about the generalizability of the conclusion, mechanisms involved, and practices for ecosystem management. Therefore, it would be important to clarify that the findings from this paper are more about the second rather than the first possibility both in the discussion and conclusion sections.

      Yes, true, this is an important distinction and we agree with your conclusion. We have added a section in the Discussion (l. 537-545) and a note in the Conclusion (l. 625-627).

      (2) Please take special caution when comparing the findings from this observational study vs. previous experimental works. (1) The different ranges of diversity in the observational vs. experimental works, together with the nonlinear nature of the BEF relationship, challenge the direct comparisons of their results. That is, even if their true BEF relationships are identical, focusing on different sections of a nonlinear curve can give us different results of the estimated BEF relationships. This challenge is further aggravated when involving both genetic and species diversity because these two facets have different biological meanings as the authors have already noted. Using standardized effect size or explained variance, as this paper did, may partially get around but not truly resolve this issue. It would be important to add clarifications to make the comparisons between genetic and species diversity effects more understandable in a biological or ecological context. One possibility could be to state that both genetic and species diversity measured in this study well represent their natural gradients in this aquatic ecosystem, so that the standardized effect sizes quantify how these natural diversity gradients associate with ecosystem functions. This further points to the issue about the representatives of the genetic diversity sampled from up to 32 individuals for each species per site, which would also need clarification. We suggest that the authors identify these challenges in the discussion, so that future studies can be aware of these or even find alternative solutions. (2) The species diversity effects have quite different meanings between this study and previous observational and experimental studies. The negative effects are for the biomass of one target species from this study, while the species diversity effects are usually for the biomass of all species within a community. These two scenarios are not directly comparable. The negative relationship between species diversity and a target species' biomass can simply arise from a sampling process: for example, given the same community biomass, the more species occur in a community, the less biomass is allocated to a single species, without assuming any biological interactions or species differences. And this study cannot exclude this possibility. Note that this null, sampling process is not equal to a negative covariance between biomass of a focal species and biomass of the community involving the species as stated in lines 446-448. To avoid possible misinterpretation, we suggest that the authors revise or remove the comparison appearing in the paragraph starting from line 515.
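      The null sampling process described in this comment can be illustrated with a minimal simulation sketch. It is not the authors' analysis: the function names, the total biomass of 100 units, and the uniform random split of biomass among species are all assumptions made for illustration only. Under these assumptions, the mean biomass of a focal species falls roughly as total biomass divided by species richness, even though no interaction or species difference is modelled.

```python
import random


def focal_biomass(total_biomass, n_species, rng):
    """Biomass of one focal species when a fixed community biomass is split
    at random among n_species (hypothetical null model: no interactions,
    no differences between species)."""
    # Normalised exponential draws give a uniform random split of the total.
    weights = [rng.expovariate(1.0) for _ in range(n_species)]
    return total_biomass * weights[0] / sum(weights)


def mean_focal_biomass(total_biomass, n_species, n_sites=5000, seed=1):
    """Average focal-species biomass over n_sites simulated communities."""
    rng = random.Random(seed)
    draws = [focal_biomass(total_biomass, n_species, rng) for _ in range(n_sites)]
    return sum(draws) / n_sites


if __name__ == "__main__":
    # Community biomass is held constant; only species richness changes.
    for richness in (1, 2, 4, 8, 16):
        print(richness, round(mean_focal_biomass(100.0, richness), 1))
```

      In this sketch the negative richness-biomass relationship for the focal species is produced by the partitioning alone, which is the possibility the comment asks the authors to acknowledge.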

      Thanks for these comments. Although we agree with the two points raised by the Editor, we must admit that we found them difficult to answer properly.  See our detailed responses hereafter.

      Point (1): it is true that comparisons with previous studies are tricky, especially when these comparisons also include both genetic and species components. This is a problem (a limit) for almost all comparisons in biology. We added a few lines to warn readers that these comparisons have their limits (see l. 414-424). Regarding the fact that “genetic and species diversity measured in this study well represent their natural gradients in this aquatic ecosystem”: it is all a matter of scale. The genetic and species diversity measured in this study are obviously representative of communities and populations of the upstream (piedmont) part of the Garonne River basin, as our sampling design covers the whole east-west gradient. On the other hand, these communities and populations are not representative of the entire Garonne River basin, as we lack the whole downstream part of the network. We added a sentence to specify that the sampled communities are specific to this ecosystem (rivers from the piedmont, see l. 224-226). Regarding “the issue about the representatives of the genetic diversity sampled from up to 32 individuals”, we must admit that we are surprised by this comment, as this is a very classical way of estimating genomic diversity. Although there is no clear rule, 30 individuals per site is generally assumed (and has been shown) to be an appropriate sample size (especially given that we used a genome-wide approach here). We added a reference to justify the sample size.

      Point (2): We understand the point raised by the Editors. Regarding your note “Note that this null, sampling process is not equal to a negative covariance between biomass of a focal species and biomass of the community involving the species as stated in lines 446-448.”: this is true, and we have rephrased this sentence to be more neutral. Regarding the paragraph starting l. 515 (now 550), we chose not to remove this paragraph, as it provides some mechanistic explanation for the underlying patterns, which we think is important even if incomplete or speculative. The confusion probably arises because here we discuss all types of negative BEFs, including the effect of species diversity on the biomass of the community, on the biomass of focal species (including those from other trophic levels) and on litter degradation. Our discussion is very general, whereas you seem to focus on a specific case of negative species-BEFs. To highlight this further and warn readers about possible conclusions, we added the following sentence: “Given the empirical nature of our study and the fact that our meta-regressive approach includes several types of BEFs (e.g., species richness acting either on the biomass of a single focal species or on the biomass of an entire focal community), it is hard to tease apart specific and underlying mechanisms” (l. 573-576).

      (3) Please clarify how you derived the 95% CI in Fig. 5. For example, how did you involve the uncertainties of each raw effect size (e.g. each black triangle in Fig. 5a) when calculating their mean and 95% CI in each group (e.g., the red triangles and error bars in Fig. 5a)?

      Estimates and 95%-CI from Figure 5 are derived from the mixed-effect models described from l. 314. They are hence marginal effects derived from the models, and 95%-CI include all error terms (fixed and random). We now specify in the Figure caption that estimates and 95%-CI are marginal effects derived from the mixed-effect models.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors examined whether aberrantly projecting retinal ganglion cells in albino mice innervate a separate population of thalamocortical neurons, as would be predicted for Hebbian learning rules. The authors find support for this hypothesis in correlated light and electron microscopy (CLEM) reconstructions of retinal ganglion cell axons and thalamocortical neurons. In a second line of investigation, the authors ask the same question about retinal ganglion cell innervation of local inhibitory interneurons of the mouse LGN. The authors conclude that these connections are less specific.

      Strengths:

      The authors make good use of CLEM to test a circuit-level hypothesis, and they find an interesting difference in RGC synaptic innervation patterns for thalamocortical neurons vs. local interneurons.

      Weaknesses:

      The conclusions about the local interneuron innervation are a little more difficult to interpret. One would expect to only capture a small part of the local interneuron dendritic field, as compared to the smaller thalamocortical neurons, right? Doesn't that imply that finding some evidence of promiscuous connectivity means that other dendrites that were not observed probably connect to many different RGCs?

      We will try to clarify this point

      Reviewer #2 (Public review):

      In this article, the authors examined the organization of misplaced retinal inputs in the visual thalamus of albino mice at electron-microscopic (EM) resolution to determine whether these synaptic inputs are segregated from the rest of the retinogeniculate circuitry.

      The study's major strengths include its high resolution, achieved through serial EM and confocal microscopy, which enabled the identification of all synaptic inputs onto neurons in the dorsolateral geniculate nucleus (dLGN).

      The experiments are very precise and demanding; thus, only the synaptic inputs of a few neurons were fully reconstructed in one animal. A few figures could be improved in their presentation.

      Despite this, the authors clearly demonstrate the synaptic segregation of misrouted retinal axons onto dLGN neurons, separate from the rest of the retinogeniculate circuitry.

      This finding is impactful because retinal inputs typically do not segregate within the mouse dLGN, and it was previously thought that this was due to the nucleus's small size, which might prevent proper segregation. The study shows that in cases where axons are misrouted and exhibit a different activity pattern than surrounding retinal inputs, segregation of inputs can indeed occur. This suggests that the normal system has the capacity to segregate inputs, despite the limited volume of the mouse dLGN.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Please include page numbers and line numbers in future submissions.

      Done

      (2) I am red-green colorblind, and I had a lot of trouble seeing the red channels when they were mixed with green. I recommend using magenta when possible.

      Thanks for the heads up. We have switched to green and magenta where possible. In the tinted EM where switching colors did not seem helpful, we added an asterisk to RGC boutons so that red and green would not be the only identifiers.

      (3) It would help if the figure captions also stated the conclusions that can be drawn from the figures. I recommend stating the main conclusion in the first sentence of the caption, rather than stating only what we are viewing. Similarly, the last sentence of the caption can help summarize what has been seen.

      We have included summary sentences at the beginning and end of figure legends.

      (4) In the text when discussing Figure 2J, do the authors mean to cite Supplementary Figure 2?

      Yes, thanks.

      (5) I don't think TC was ever defined (or I didn't find it).

      Corrected

      (6) In the subsection "An exclusive set..." cite Liang et al. as more evidence of non-specific innervation.

      We cite Liang et al in the discussion, but I don’t see a good place to cite it in the referenced results section. Please elaborate if we are missing something.

      (7) Supplementary Figure 3 is never cited.

      We have added the citation to Figure 3.

      (8) I found myself unsure of what to conclude after the results on LIN. A few more sentences of interpretation and restating what was found would help.

      We have added additional clarification in the Results:

      “The LIN results are consistent with our prediction that shaft dendrites would be indifferent to island/non-island boundaries while individual targeted dendrites would target either the island or non-island RGC boutons. However, the restriction of the targeted dendrites to one or the other RGC field does not appear to be an absolute rule. Rather the scale of targeted dendrite exploration and the size of the exclusion zone is likely to reduce the chances that a targeted dendrite would find partners both in the island and outside of the island. This matching between the exploration of targeted LIN dendrites and the segregation of retinogeniculate connectivity means that targeted LIN dendrites will have an RGC input profile (island/non-island) that matches the TCs they innervate.”

      Reviewer #2 (Recommendations for the authors):

      (1) The abbreviation TC is used in the text without a definition.

      Corrected

      (2) The features that allow for labeling the different dendrites/cells (TC and LIN) in Serial EM data (Figure 1) are necessary. While the explanation is provided for RGC boutons, the labeling for thalamic cells is not discussed.

      We added the sentence:

      “Thalamocortical dendrites were distinguished from local inhibitory neuron dendrites by the presence of spines and the absence of synaptic outputs.”

      (3) Image 2C (EM) appears blurry or pixelated. Enhancing its resolution could improve clarity.

      Image 2C is a demonstration of how much we felt we could sacrifice image quality and still reconstruct TC arbors and RGC inputs.

      (4) The gray circles that show the innervation of TC17 in Figure 2E are barely visible, especially on-screen without high magnification. A more contrasting color and wider lines would enhance visibility. It would also be helpful to indicate TC17 in Figure 2H and 2G, as this cell is special and highlighted in the main text.

      We have made the requested changes

      (5) A TC with no RGC input is mentioned. Have you identified other synaptic inputs, potentially related to SC or the cortex?

      Both TC17 (a few exclusion zone RGC inputs) and TC5 (no RGC inputs) were innervated by some large, dark mitochondria boutons that could be SC inputs.  However, we did not perform enough reconstruction of the axons to confidently describe their non-RGC input profile. I have previously observed occasional TCs in the same region of the dLGN where RGC inputs are almost entirely replaced by SC inputs, so finding two such cells was not surprising.

      (6) Two fully reconstructed TCs are mentioned. Please specify their exact number in the text, as citing Figure 2J or Supplementary Figure 1 alone is not sufficient for identification.

      Clarified as “(TC3, TC4, Figure 2J, Supplementary Figure 2,3).”

      (7) A correlation between the position of the dendrites and the location of RGC inputs would provide additional insights. This is somewhat reminiscent of the dendrite orientation of Layer IV spiny stellate neurons in the somatosensory cortex that receive inputs from the thalamocortical axons and could be mentioned in the discussion.

      We believe that the images provided are a strong argument for TC arbors being shaped by RGC bouton distributions. We agree that reporting the correlation between dendrites and RGC boutons would be useful, but we found this correlation difficult to quantify. One of the challenges is that we would need to perform several-fold more reconstruction of dendrites and RGC boutons to have an unbiased mapping of both. Currently, most of the reconstructions stop when the dendrites assume a distal morphology and stop interacting with RGC boutons. Likewise, the EM of the RGC boutons are only those that innervate the reconstructed cells. We considered simply quantifying the asymmetry of the TC arbors relative to a symmetrical distribution and a random distribution, but we felt that quantification would be difficult to interpret without a similar analysis performed in the same region of dLGN on wild-type TCs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper is an incremental follow-up to the authors' recent paper which showed that Purkinje cells make inhibitory synapses onto brainstem neurons in the parabrachial nucleus which project directly to the forebrain. In that preceding paper, the authors used a mouse line that expresses the presynaptic marker synaptophysin in Purkinje cells to identify Purkinje cell terminals in the brainstem and they observed labeled puncta not only in the vestibular and parabrachial nuclei, as expected, but also in neighboring dorsal brainstem nuclei, prominently the central pontine grey. The present study, motivated by the lack of thorough characterization of PC projections to the brainstem, uses the same mouse line to anatomically map the density of Purkinje cell synapses, and a PC-specific channelrhodopsin mouse line to electrophysiologically assess their strength, in dorsal brainstem nuclei. The main findings are (1) the density of Purkinje cell synapses is highest in vestibular and parabrachial nuclei and correlates with the magnitude of evoked inhibitory synaptic currents, and (2) Purkinje cells also synapse in the central pontine grey nucleus but not in the locus coeruleus or mesencephalic nucleus.

      Strengths:

      The complementary use of anatomical and electrophysiological methods to survey the distribution and efficacy of Purkinje cell synapses on brainstem neurons in mouse lines that express markers and light-sensitive opsins specifically in Purkinje cells is the major strength of this study. By systematically mapping presynaptic terminals and light-evoked inhibitory postsynaptic currents in the dorsal brainstem, the authors provide convincing evidence that Purkinje cells do synapse directly onto pontine central grey and nearby neurons but do not synapse onto trigeminal motor or locus coeruleus neurons. Their results also confirm previously documented heterogeneity of Purkinje cell inputs to the vestibular nucleus and parabrachial neurons.

      Weaknesses:

      Although the study provides strong evidence that Purkinje cells do not make extensive synapses onto LC neurons, which is a helpful caveat given previous reports to the contrary, it falls short of providing the comprehensive characterization of Purkinje cell brainstem synapses which seemed to be the primary motivation of the study. The main information provided is a regional assessment of PC density and efficacy, which seems of limited utility given that we are not informed about the different sources of PC inputs, variations in the sizes of PC terminals, the subcellular location of synaptic terminals, or the anatomical and physiological heterogeneity of postsynaptic cell types. The title of this paper would be more accurate if "characterization" were replaced by "survey".

      Several of the study's conclusions are quite general and have already been made for vestibular nuclei, including the suggestions in the Abstract, Results, and Discussion that PCs selectively influence brainstem subregions and that PCs target cell types with specific behavioral roles.

      We agree that we did not provide an in-depth characterization of PC synapses onto all identified types of brainstem neurons. With so many types of neurons in the brainstem, this would be a monumental task. Despite this limitation we prefer to keep our original title, since our study makes the following advances:

      • We provide a comprehensive map of all PC synaptic boutons across the brainstem, and corresponding maps of PC synaptic input sizes. The input sizes vary widely, but are often multiple nanoamps, indicating that the cerebellum is an important regulator of activity in these regions. These maps will be indispensable for future investigations of cerebellar outputs.

      • We find that PC projections and the synapses they make are spatially restricted within most target nuclei such as the vestibular and parabrachial nuclei. This suggests that the influence of the cerebellum is spatially segregated within these nuclei, and likely allows the cerebellum to regulate specific behaviors.  While some aspects of these gradients have been described previously, our study is comprehensive, and has a higher degree of specificity than can be achieved with immunohistochemistry. 

      • We discover that PCs form functional synapses in the pontine central grey and nearby nuclei. Much of this region’s function is unknown, but certain subregions are important for micturition and valence. PCs make large synapses onto a small fraction of cells in this region, which suggests that PCs may target specific cell types to control novel nonmotor behaviors.

      • We provide clarification regarding PC projections to the locus coeruleus. Multiple high-profile, highly influential studies using rabies tracing (Schwarz et al., Nature 2015; Breton-Provencher and Sur, Nature Neuroscience 2019; and others) described a prominent PC input to the locus coeruleus. We showed that this projection is essentially nonexistent, both anatomically and functionally. We previously addressed this issue, but the PC-specific optogenetic approach we used here provides the most compelling evidence against a prominent PC-LC connection. This is an important finding for the cerebellum and a cautionary tale for conclusions based solely on viral tracing methods. We will expand on this issue in response to the comments of reviewer #3.

      Reviewer #2 (Public review):

      Summary:

      While it is often assumed that the cerebellar cortex connects, via its sole output neuron, the Purkinje cell, exclusively to the cerebellar nuclei, axonal projections of the Purkinje cells to dorsal brainstem regions have been well documented. This paper provides comprehensive mapping and quantification of such extracerebellar projections of the Purkinje cells, most of which are confirmed with electrophysiology in slice preparation. A notable methodological strength of this work is the use of highly Purkinje cell-specific transgenic strategies, enabling selective and unbiased visualization of Purkinje terminals in the brainstem. By utilizing these selective mouse lines, the study offers compelling evidence challenging the general assumption that Purkinje cell targets are limited to the cerebellar nuclei. While the individual connections presented are not entirely novel, this paper provides a thorough and unambiguous demonstration of their collective significance. Regarding another major claim of this paper, "characterization of direct Purkinje cell outputs (Title)", however, the depth of electrophysiological analysis is limited to the presence/absence of physiological Purkinje input to postsynaptic brainstem neurons whose known cell types are mostly blinded. Overall, conceptual advance is largely limited to confirmatory or incremental, although it would be useful for the field to have the comprehensive landscape presented.

      Strengths:

      (1) Unsupervised comprehensive mapping and quantification of the Purkinje terminals in the dorsal brainstem are enabled, for the first time, by using the current state-of-the-art mouse lines, BAC-Pcp2-Cre and synaptophysin-tdTomato reporter (Ai34).

      (2) Combinatorial quantification with vGAT puncta and synaptophysin-tdTomato labeled Purkinje terminals clarifies the anatomical significance of the Purkinje terminals as an inhibitory source in each dorsal brainstem region.

      (3) Electrophysiological confirmation of the presence of physiological Purkinje synaptic input to 7 out of 9 dorsal brainstem regions identified.

      (4) Pan-Purkinje ChR2 reporter provides solid electrophysiological evidence to help understand the possible influence of the Purkinje cells onto LC.

      Weaknesses:

      (1) The present paper is largely confirmatory of what is presented in a previous paper published by the authors' group (Chen et al., 2023, Nat Neurosci). In this preceding paper, the authors' group used AAV1-mediated anterograde transsynaptic strategy to identify postsynaptic neurons of the Purkinje cells. The experiments performed in the present paper are, by nature, complementary to the AAV1 tracing which can also infect retrogradely and thus is not able to demonstrate the direction of synaptic connections between reciprocally connected regions. Anatomical findings are all consistent with the preceding paper. The likely absence of robust physiological connections from the Purkinje to LC has also been evidenced in the preceding paper by examining c-Fos response to Purkinje terminal photoinhibition at the PBN/LC region.

      We agree that we previously dealt with the issue of PC-LC synapses (Chen et al., 2023, Nat Neurosci), but our conclusions differed from several high-profile publications (Schwarz et al., Nature 2015; Breton-Provencher and Sur, Nature Neuroscience 2019), and still met considerable resistance. We felt that the optogenetic approach provided the most definitive means of evaluating the presence and strength of PC-LC synapses, which will hopefully settle this issue. These experiments also set a standard for future studies assessing the presence of PC synapses onto other target neurons in the brainstem.

      (2) Although the authors appear to assume uniform cell type and postsynaptic response in each of the dorsal brainstem nuclei (as noted in the Discussion, "PCs likely function similarly to their inputs to the cerebellar nuclei, where a very brief pause in firing can lead to large and rapid elevations in target cell firing"), we know that the responses to the Purkinje cell input are cell type dependent, which vary in neurotransmitter, output targets, somata size, and distribution, in the cerebellar and vestibular nuclei (Shin et al., 2011, J Neurosci; Najac and Raman, 2015, J Neurosci; Özcan et al., 2020, J Neurosci). This consideration impacts the interpretation of two key findings: (a) "Large ... PC-IPSCs are preferentially observed in subregions with the highest densities of PC synapses (Abstract)". For example, we know that the terminal sparse regions reported in the present paper do contain Floccular Targeted Neurons that are sparse yet have dense somatic terminals with profound postinhibitory rebound (Shin et al.). Despite their sparsity, these postsynaptic neurons play a distinct and critical role in proper vestibuloocular reflex. Therefore, associating broad synaptic density with "PC preferential" targets, as written in the Abstract, may not fully capture the behavioral significance of Purkinje extracerebellar projections. (b) "We conclude ... only a small fraction of cell. This suggests that PCs target cell types with specific behavioral roles (Abstract, the last sentence)". Prior research has already established that "PCs target cell types with specific behavioral roles in brainstem regions". Also, whether 23% (for PCG), for example, is "a small fraction" would be subjective: it might represent a numerically small but functionally important cell type population. The physiological characterization provided in the present cell type-blind analysis could, from a functional perspective, even be decremental when compared to existing cell type-specific analyses of the Purkinje cell inputs in the literature.

      We now cite the papers suggested by the reviewer (Shin et al., 2011, J Neurosci; Najac and Raman, 2015, J Neurosci; Özcan et al., 2020, J Neurosci) and add to the discussion.

      (3) The quantification analyses used to draw conclusions about (a) the significance of PC terminals among all GABAergic terminals and the fractions of electrophysiologically responsive postsynaptic brainstem neurons may have potential sampling considerations:

      (a.i) this study appears to have selected subregions from each brainstem nucleus for quantification (Figure 2). However, the criteria for selecting these subregions are not explicitly detailed, which could affect the interpretation of the results.

      Additional explanation has been added to the Results in the section “Quantification of PC synapses in the brainstem.”

      (a.ii) the mapping of recorded cells (Figure 3) seems to show a higher concentration in terminal-rich regions of the vestibular nuclei.

      In Figure 3, we strived to record in an unbiased manner. However, there may have been a slight bias to recordings in areas of lower myelination where patching is easier. We now clarify this issue in the text.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Chen and colleagues explores the connections from cerebellar Purkinje cells to various brainstem nuclei. They combine two methods - presynaptic puncta labeling as putative presynaptic markers, and optogenetics, to test the anatomical projections and functional connectivity from Purkinje cells onto a variety of brainstem nuclei. Overall, their study provides an atlas of sorts of Purkinje cell connectivity to the brainstem, which includes a critical analysis of some of their own data from another publication. Overall, the value of this work is to both provide neural substrates by which Purkinje cells may influence the brainstem and subsequent brain regions independent of the deep cerebellar nuclei and also, to provide a critical analysis of viral-based methods to explore neuronal connectivity.

      Strengths:

      The strengths lie in the simplicity of the study, the number of cells patched, and the relationship between the presence of putative presynaptic puncta and electrophysiological results. This type of study is important and should provide a foundation for future work exploring cerebellar inputs and outputs. Overall, I think that the critique of viral-based methods to define connectivity, and a more holistic assessment of what connectivity is and how it should be defined, is timely and warranted, as I think this is under-appreciated by many groups and overall, there is a good deal of research being published that does not properly consider the issues that this manuscript raises about what viral-based connectivity maps do and do not tell us.

      We thank the reviewer for highlighting this important aspect of this work, and for agreeing with our thesis concerning viral-based connectivity maps.

      Weaknesses:

      While I overall liked the manuscript, I do have a few concerns that relate to interpretation of results, and discussion of technological limitations. The main concerns I have relate to the techniques that the authors use, and an insufficient discussion of their limitations. The authors use a Cre-dependent mouse line that expresses a synaptophysin-tomato marker, which the authors confidently state is a marker of synapses. This is misleading. Synaptophysin is a vesicle marker, and as such, labels axons, where vesicles are present in transit, and likely cell bodies where the protein is being produced. As such, the presence of tdtomato should not be interpreted definitively as the presence of a synapse. The use of vGAT as a marker, while this helps to constrain the selection of putative pre-synaptic sites, is also a vesicle marker and will likely suffer the same limitations (though in this case, the expression is endogenous and not driven by the ROSA locus). A more conservative interpretation of the data would be that the authors are assessing putative pre-synaptic sites with their analysis. This interpretation is wholly consistent with their findings showing the presence of tdtomato in some regions but only sparse connectivity - this would be expected in the event that axons are passing through. If the authors wish to strongly assert that they are specifically assessing synapses, a marker better restricted to synapses and not vesicles may be more appropriate.

      We agree that synaptophysin-tdTomato is an imperfect marker, although it is vastly superior to cytosolic tdTomato. We found that viral expression of synaptophysin-GFP gives much more punctate labelling, but an appropriate synaptophysin-GFP line is not available. We carefully point out this issue, and threshold the images to avoid faint labeling associated with fibers of passage. The intersection of VGAT labelling and synaptophysin-tdTomato labelling provides us with superior identification of PC boutons. We will add additional clarification to point out that these are putative presynaptic boutons, but that this alone does not establish the existence or the strength of functional synapses.

      Similarly, while optogenetics/slice electrophysiology remains the state of the art for assessing connectivity between cell populations, it is not without limitations. For example, connections that are not contained within the thickness of the slice (here, 200 µm, which is not particularly thick for slice ephys preps) will not be detected. As such, the absence of connections is harder to interpret than the presence of connections. Slices were only made in the coronal plane, which means that if there is a particular topology to certain connections that is orthogonal to that plane, those connections may be under-represented. As such, all connectivity analyses likely are under-representations of the actual connectivity that exists in the intact brain. Therefore, perhaps the authors should consider revising their assessments of connections, or lack thereof, of Purkinje cells to e.g., LC cells. While their data do make a compelling case that the connections between Purkinje cells and LC cells are not particularly strong or numerous, especially compared to other nearby brainstem nuclei, their analyses do indicate that at least some such connections do exist. Thus, rather than saying that the viral methods such as rabies virus are not accurate reflections of connectivity, perhaps a more circumspect argument would be that the quantitative connectivity maps reported by other groups using rabies virus do not always reflect connectivity defined by other means, e.g., functional connections with optogenetics. In some cases, the authors do suggest this (e.g., "Together, these findings indicate that reliance on anatomical tracing experiments alone is insufficient to establish the presence and importance of a synaptic connection"), but in other cases, they are more dismissive of viral tracing results (e.g., "it further suggests that these neurons project to the cerebellum and were not retrogradely labeled"). 
Furthermore, some statements are a bit misleading e.g., mentioning that rabies methods are critically dependent on starter cell identity immediately following the citation of studies mapping inputs onto LC cells. While in general, this claim has merit, the studies cited (19-21) use Dbh-Cre to define LC-NE cells which does have good fidelity to the cells of interest in the LC. Therefore, rewording this section in order to raise these issues generally without proximity to the citations in the previous sentence may maintain the authors' intention without suggesting that perhaps the rabies studies from LC-NE cells that identified inputs from Purkinje cells were inaccurate due to poor fidelity of the Cre line. Overall, this manuscript would certainly not be the first report indicating that the rabies virus does not provide a quantitative map of input connections. In my opinion, this is still under-appreciated by the broad community and should be explicitly discussed. Thus, an acknowledgment of previous literature on this topic and how their work contributes to that argument is warranted.

      We have a different take on connectivity and the use of optogenetics. Based on our years of experience studying synapses in brain slices, axons survive very well even when they are cut. It is not necessary to preserve intact axons that extend for long distances. It is also true that activation of these axons, with either extracellular electrical stimulation or optogenetics, is sufficient to evoke synaptic inputs. Robust synaptic responses are evoked with optogenetic activation regardless of the slice orientation. We thank the reviewer for raising this issue, and we have added a couple of sentences to clarify this point under the section “Characterization of functional properties of PC synapses in the brainstem.”

      The discussion on starter cell specificity was not referring to the specificity of Cre in transgenic animals, but to the TVA/G helper proteins that are introduced by AAV and used in conjunction with the rabies virus. The issues related to this have recently been discussed in eLife (Beier, 2022) in addition to citations 58 and 59 in the manuscript. We have more explicitly highlighted this issue in the revised manuscript in the section “Lack of significant PC inputs to LC neurons.”

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) Methods need detail to be replicable, particularly in how PC synapses were identified and automatically counted. It is not clear what was the variation within subregions across mice. How were neurons selected or rejected for recordings and analyses? Was each subregion sampled at equal spacing? Methods for anatomy should mention sagittal sections.

      The wording in the Methods section “Anatomy” was changed to better reflect how PC synapses were identified as colabeled segments of vGAT and tdTomato labeling.

      Each data point in Figure 2D-F is the quantification of a region for each section and each mouse. The color of the data point indicates the anterior-posterior location of the section. The violin plot shows the median and quartile values for all points across sections and mice. The variability captured by the violin plot reflects variability across the anterior-posterior axis.
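
      The pooling described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the manuscript: the function name, the (mouse, section) keys, and all numerical values are hypothetical, and the quartile convention (the default of Python's `statistics.quantiles`) is an assumption, since the response does not state which convention was used.

```python
from statistics import quantiles


def violin_summary(values):
    """Return (lower quartile, median, upper quartile) of the pooled values."""
    # n=4 yields the three cut points that split the data into quartiles.
    q1, median, q3 = quantiles(values, n=4)
    return q1, median, q3


# Hypothetical measurements: one value per region, per section, per mouse.
# The numbers are made up purely to exercise the function.
measurements = {
    ("mouse1", "anterior"): 1.0,
    ("mouse1", "middle"): 4.0,
    ("mouse1", "posterior"): 7.0,
    ("mouse2", "anterior"): 2.0,
    ("mouse2", "middle"): 5.0,
    ("mouse2", "posterior"): 6.0,
    ("mouse3", "middle"): 3.0,
}

# Pool every point across sections and mice before summarizing.
pooled = list(measurements.values())
summary = violin_summary(pooled)
```

      Because all sections are pooled, the spread between the quartiles in such a summary mixes mouse-to-mouse and section-to-section variability, which is why the color coding of individual points is needed to read out the anterior-posterior trend.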

      Neurons were mostly randomly selected in each slice, and rejected based on unstable holding current or access resistance. Cell locations were recorded and updated with each experiment so that we minimized oversampling of easier-to-patch regions.

      Sagittal sections are now mentioned in the Methods.

      (2) Figure 2D-F: what are the black line and grey region?

      Additional text was added to the caption for Figure 2D-F.

      (3) MEV is confusing given LAV stands for lateral vestibular - perhaps call it ME5?

      We will remain consistent with the abbreviations in the Allen Brain Reference Atlas.

      Reviewer #2 (Recommendations for the authors):

      (1) What are the criteria for distinguishing large, small, and non-responders?

      Large are in the nA range, small are in the hundreds of pA, and non-responders are effectively zero. Manual curation of these responses indicated that a current amplitude threshold of 45 pA clearly separated non-responders from responders. To be clear, the average response (as stated in text and displayed in Figure 3D) includes all cells.
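
      The categories above amount to a simple amplitude threshold rule, which can be sketched as follows. This is an illustration only, not the code used for the analysis: the 45 pA cutoff comes from the response above, whereas the 1000 pA boundary between small and large responses (a literal reading of "nA range"), the function names, and the example amplitudes in the usage note are assumptions.

```python
NON_RESPONDER_CUTOFF_PA = 45.0  # threshold stated in the response above
LARGE_CUTOFF_PA = 1000.0        # assumed boundary: "nA range" taken as >= 1 nA


def classify_response(amplitude_pa):
    """Label a PC-IPSC amplitude (in pA) as non-responder, small, or large."""
    magnitude = abs(amplitude_pa)  # ignore the sign convention of the current
    if magnitude < NON_RESPONDER_CUTOFF_PA:
        return "non-responder"
    if magnitude < LARGE_CUTOFF_PA:
        return "small"
    return "large"


def mean_amplitude(amplitudes_pa):
    """Average amplitude over all cells, non-responders included."""
    return sum(abs(a) for a in amplitudes_pa) / len(amplitudes_pa)
```

      For example, with hypothetical amplitudes of 0, 300, and 2700 pA, the three cells would be labeled non-responder, small, and large, and the average over all three cells would be 1000 pA, because non-responders are kept in the denominator.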

      (2) p1. "Unexpectedly": it would not be unexpected, rather, expected, because it was reported in Chen et al., 2023, Nat Neurosci.

      The PCG was hinted at, but an actual functional, anatomical connection was not reported in our previous manuscript.

      (3) p1. "We combined electrophysiological recordings with immunohistochemistry to assess the molecular identities of these PC targets": please clarify "these" here. It could be read that it refers to "pontine central gray and nearby subnuclei" but it doesn't make sense. Immuno has only been performed for MeV and LC.

      Corrected

      (4) p1. "but only inhibit a small fraction of cells in many nuclei": as far as I read Fig.3, it seems that ~50% for PBN/VN and ~25% for PCG: would this be "a small fraction"?

      The small fraction of cells was in reference to subnuclei within the PCG, but we agree this statement is too broad to be useful and have eliminated it.

      (5) p2. "conventional tracer": viral tracer is becoming a standard, so dye tracer could be better here.

      Corrected

      (6) p3. "rostral/cauda": typo.

      Corrected.  

      (7) p3. Quantification of PC synapses in the brainstem: it would be helpful to introduce why synapto-tdT alone is not sufficient, and the purpose of adding vGAT immunostaining.

      We have added more on vGAT labeling putative presynaptic sites and quantifying only synaptic labeling instead of axonal tdTomato in the Results, “Quantification of PC synapses in the brainstem.” In addition, vGAT staining allows us to examine the PC contribution to total inhibition in each region.

      (8) p7. "PB and are": typo.

      Corrected, and all instances of PBN were changed to PB.

      (9) p7. "they are likely a mix of excitatory and inhibitory inputs 54,55": Bagnall et al., 2009, J Neurosci, would be critically relevant here.

      Added, thank you

      (10) Figures 2-3: Yellow/Blue color scheme is hard to distinguish, and having two colors could be read as implying two distinct regions.

      We are unsure what the reviewer is referring to exactly here, but the colors refer to the sections in 2C (see the color bar on the bottom right of each atlas schematic). The points represent an individual section that was quantified, and thus do represent distinct samples from distinct regions.

      (11) Figure 2D-F: what is indicated by each point?

      Each data point is the number of PC boutons (D), density of boutons (E), or percentage of synaptophysin/vGAT (F) quantified for each region per section. Each color represents a coronally distinct section of a region. Additional text was added to the captions to clarify this and point 10.

      (12) Figure 3E, right: what is the correlation coefficient?

      The correlation coefficient was found to be 0.74.

      Reviewer #3 (Recommendations for the authors):

      Some minor grammatical errors and typos need to be cleaned up (e.g., "To quantifying the densities...", "The medial-ventral region of the PBN...have extensive...").

      These errors have been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract

      I don't think you need the first two sentences of the abstract. This is not a grant and your results are exciting enough to justify a full basic science-based approach.

      We fully understand this perspective.  However, we prefer to introduce the work in the broader context of sleep medicine.  This manuscript is part of our long-standing efforts to develop cavefish as a model for sleep disorders and we believe this provides important context.

      Last sentence of the abstract: the subject is missing. "That have developed..." who has developed?

      Thank you. We have corrected this error, the sentence now reads “...these findings suggest that cavefish have developed resilience to sleep loss...”

      Introduction

      First paragraph. Worth explaining in a sentence what is the link between DNA damage and ROS.

      We now state ‘Further, chronic sleep loss results in elevated reactive oxygen species (ROS), a known mediator of DNA damage, in the gut and/or brain that contribute to mortality in Drosophila and mice [11,16].’

      "A. mexicanus exists as blind cave populations and an extant surface population that are interfertile". This needs rephrasing. As it is, it sounds like the surface population is infertile.

      We have rephrased for clarity; the line now reads: “while the surface and cave populations are geographically isolated, they remain interfertile and capable of hybridization in nature as well as laboratory settings”.

      "Further, the evolved differences in DNA repair genes, including links between mechanisms regulating sleep, light responsiveness, and DNA repair across all three cave populations studied to date [27,29]" This sentence is incomplete.

      We have corrected the phrasing, which now reads “...evolved differences in DNA repair genes have been identified across all three cave populations studied to date, including links between mechanisms regulating sleep, light responsiveness, and DNA repair.”

      Figure 1

      I recommend improving the legibility of the figure copying some of the information provided in the legend directly within the figure itself.

      A, B: label in the panel itself what is blue and what is green.

      Thank you, we have made this change.

      C: Make it clear in the figure itself that you are measuring yH2AX. Also, probably you have enough room in the figure to avoid abbreviations for Rhomb, mes, and tele. It may also help if you could add a little cartoon that explains what those three brain regions are.

      We have added text to the y-axis indicating that yH2AX fluorescence is being measured, and replaced the abbreviations with the full names of the regions.

      G: again, explain that DHE is being measured here. And perhaps pick a different colour choice to highlight the difference from C?

      We have added clarification to the y-axis of the figure, but have retained the color scheme for consistency; in all surface-cave comparisons in the manuscript, gray is used for surface fish and red for cavefish.

      In the text: I would recommend adding some quantitative reminder of what is the difference in sleep amount between the two species (cave vs surface).

      We have added the following to highlight the magnitude of the difference in sleep: “Strikingly, cavefish sleep as little as 1-2 hours per day, in contrast to their surface counterparts, which sleep as much as 6-10 hours a day”

      "Together, these findings fortify the notion that cellular stress is elevated in the gut of cavefish relative to surface fish." Were the two populations fed the same diet and raised in the same lab conditions? If this is pinpointed to sleep amount, it's worth ruling out possible confounding factors.

      We have added a sentence to the results underlining this point: “Prior to imaging, both surface and cavefish had been reared in a temperature-controlled incubator, and relied solely on their yolk sac for nutrients; so, differences in gut ROS cannot be attributed to differences in rearing or feeding conditions.”

      Figure 2

      Spell out, somewhere in the figure itself, that the 30s and 60s refer to UV treatment protocols.

      We have added X-axis titles to clarify this in Fig 2 and supp. Fig 1.

      It would be worth providing a cartoon of the experimental setup that shows for instance what time of the day UV was given (it's only specified in the text) and which subsequent sleep period was selected for comparisons.

      We have added arrows to all sleep plots indicating the time of UV treatment, and brackets indicating the time period used for statistical comparisons, as well as text in the figure legends indicating this.

      Figure 3

      A. I don't think this is needed, to be honest, and if you want to keep it, it needs a better legend.

      We have edited the figure legend to increase clarity.

      B. I would make it clear in the figure that this refers to transcriptomics analysis. Perhaps you could change the order and show C, D, and then B.

      We have added text to the figure legend and the results text to state more explicitly that the PCA plot is of the transcriptional response. We have, however, retained the original figure order, as we feel this figure is important to establish that both populations have strong, but distinct, responses to the UV treatment.

      Figure 4

      A. Spell it out in the figure itself that you're staining for CPD.

      Thank you, we have made this change.

      B. You are using the same colour combination you had in Figure 1 but for yet another pairing. This is a bit confusing.

      Thank you for bringing this to our attention.  We have added descriptions of the colors in the figure legend.

      Discussion

      "Beyond the Pachón cavefish population, all three other cavefish populations have been found to have reduced sleep (Cite)." Citation missing here.

      Thank you.  We have now clarified this sentence and included a citation.

      Reviewer #2 (Recommendations For The Authors):

      Consideration of Environmental Conditions:

      Evaluate whether the lab conditions, which may more closely resemble surface environments, could influence the observed increase in neuronal DNA damage and gut ROS levels in cavefish. Adjusting these conditions or discussing their potential impact in the manuscript would strengthen the findings.

      We are very excited about these experiments.  We have a paper that will be submitted to BioRxiv this week where we record wild-caught fish, as well as fish in caves.  The conclusion is that sleep loss is present in both populations.  This field work took over 10 years to come together and still lacks the power of the lab based assays.  Nevertheless, we can conclusively say that the phenotypes we have observed for the last ~15 years in the lab are present in a natural setting.  We have included a statement about the need for future work to test these findings in a natural setting.

      Alternative Stressors:

      Given that cavefish are albino and blind (to my knowledge), consider using alternative sources of genotoxic stress beyond UV-induced damage. This could include chemical agents or other forms of environmental stress to provide a more comprehensive assessment of DDR.

      We agree and are enthusiastic about looking more generally at stress.  We note that we have previously found that cavefish rebound following sleep deprivation (McGaugh et al, 2020) suggesting that they are responsive to sleep disruption.  This will be a major research focus area moving forward.

      Broader Stress Responses:

      Investigate whether other forms of stress, such as dietary changes or temperature fluctuations, elicit similar differences in sleep patterns and DDR responses. This could provide additional insights into the robustness of the observed phenomena.

      We fully agree. This will be the primary focus of this research area moving forward. We hypothesize that cavefish are generally less responsive to their environment. Unpublished data reveal that temperature stress, circadian changes, and aging (presented here) do little to impact gene expression in surface fish. We would like to test the hypothesis that transcriptional stability of cavefish contributes to their longevity.

      Potential Protective Mechanisms:

      Discuss the possibility that lower levels of gamma-H2AX in cavefish might be protective, as DDR can lead to cellular senescence or cancer. This perspective could add depth to the interpretation of the results.

      This was the hypothesis underlying this manuscript.  However, we found elevated levels of gamma-H2AX.  We believe there may be additional protective mechanisms that have evolved in cavefish, but cannot identify them to date.  Our hope is future functional studies by our group, as well as other groups’ access to this published work, may help address these questions.

      Strengthening the Sleep-DNA Damage Link:

      Further experiments are needed to directly link sleep differences to the observed variations in DNA damage and DDR. This could involve manipulating sleep patterns in surface fish and cavefish to observe corresponding changes in DNA repair mechanisms.

      We agree. We have referenced work that conclusively showed this relationship in zebrafish. Our current method for limiting sleep involves shaking, and this has too many confounds. We are working on developing genetic tools, and applying the gentle rocking methods used previously in zebrafish to address these questions.

      Clarification of Causal Directionality:

      Address the potential that sleep patterns and DDR responses may both be downstream effects of a common cause or independent adaptations to the cave environment. Clarifying this in the manuscript would provide a more nuanced understanding of the evolutionary adaptations.

      Thank you for this suggestion.  We have now added a paragraph describing how these experiments (and the ones described above) are necessary for understanding the relationship between sleep and DDR.

      Clarification and Presentation:

      Fix the many typos, and improve the clarity of the figures and their legends to ensure they are easily interpretable. Additional context in the discussion section would help readers understand the significance and potential implications of the findings.

      Thank you, we have now included this.

      Reviewer #3 (Recommendations For The Authors):

      There are a number of suggestions that I have made in the public review, but there are a few things that I would like to add here.

      The methods section is missing many important details, for instance, the intensity of the illumination used in the UV exposure in larvae is not reported but is vital for the interpretation/replication of these experiments. In general, this section should be redone with a greater effort to include all important information. Similarly, the figure legends could be greatly improved, with important details like n-number and significance thresholds defined (e.g., see Figure 1C and G).

      We have added greater detail to the methods section to specify the spectral peak and power output of the bulbs used.

      There are a number of passages in the manuscript that do not make sense, which suggests that a future version of record should be carefully proofread. I know that this can be a case of reading multiple versions of a manuscript so many times that one doesn't really see it anymore, but, for example, phrases like "To differentiate between these two possibilities" are confusing to the reader when there has been no introduction of alternate possibilities.

      Thank you for this comment.  We have fixed this mistake and proofread the manuscript.

      Additionally, there are multiple examples of errors in citations/references. A few examples are below:

      "Further, chronic sleep loss results in elevated reactive oxygen species (ROS) in the gut and/or brain that contribute to mortality in Drosophila and mice [11, 16]". Reference 16 does not include mice at all, and reference 11 is Vaccaro et al. 2020, where Drosophila mortality is assessed, but mouse mortality is not.

      We have added the appropriate citations and revised this sentence.

      References 13 and 15 are the same.

      Thank you, we have fixed.

      References 24 and 26 are the same.

      Thank you, we have fixed.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Lloyd et al. employ an evolutionary comparative approach to study how sleep deprivation affects DNA damage repair in Astyanax mexicanus, using the cave vs surface species evolution as a playground. The work shows, convincingly, that the cavefish population has evolved an impaired DNA damage response following either sleep deprivation or a classical paradigm of DNA damage (UV).

      Strengths:

      The study employs a thorough multidisciplinary approach. The experiments are well conducted and generally well presented.

      Weaknesses:

      Having a second experimental means to induce DNA damage would strengthen and generalise the findings.

      Overall, the study represents a very important addition to the field. The model employed underlines once more the importance of using an evolutionary approach to study sleep and provides context and caveats to statements that perhaps were taken a bit too much for granted before. At the same time, the paper manages to have an extremely constructive approach, presenting the platform as a clear useful tool to explore the molecular aspects behind sleep and cellular damage in general. The discussion is fair, highlighting the strengths and weaknesses of the work and its implications.

      We fully agree with this assessment.  We are currently performing experiments to test the effects of additional DNA damaging agents.  We hope to extend these studies beyond DNA-damage agents to look more generally at how animals respond to stress including ROS, sleep deprivation, and high temperature.  This will be a major direction of the laboratory moving forward.

      Reviewer #2 (Public Review):

      The manuscript investigates the relationship between sleep, DNA damage, and aging in the Mexican cavefish (Astyanax mexicanus), a species that exhibits significant differences in sleep patterns between surface-dwelling and cave-dwelling populations. The authors aim to understand whether these evolved sleep differences influence the DNA damage response (DDR) and oxidative stress levels in the brain and gut of the fish.

      Summary of the Study:

      The primary objective of the study is to determine if the reduced sleep observed in cave-dwelling populations is associated with increased DNA damage and altered DDR. The authors compared levels of DNA damage markers and oxidative stress in the brains and guts of surface and cavefish. They also analyzed the transcriptional response to UV-induced DNA damage and evaluated the DDR in embryonic fibroblast cell lines derived from both populations.

      Strengths of the Study:

      Comparative Approach:

      The study leverages the unique evolutionary divergence between surface and cave populations of A. mexicanus to explore fundamental biological questions about sleep and DNA repair.

      Multifaceted Methodology:

      The authors employ a variety of methods, including immunohistochemistry, RNA sequencing, and in vitro cell line experiments, providing a comprehensive examination of DDR and oxidative stress.

      Interesting Findings:

      The study presents intriguing results showing elevated DNA damage markers in cavefish brains and increased oxidative stress in cavefish guts, alongside a reduced transcriptional response to UV-induced DNA damage.

      Weaknesses of the Study:

      Link to Sleep Physiology:

      The evidence connecting the observed differences in DNA damage and DDR directly to sleep physiology is not convincingly established. While the study shows distinct DDR patterns, it does not robustly demonstrate that these are a direct result of sleep differences.

      We agree with this assessment. We are currently working to apply tools developed in zebrafish to examine the physiology of sleep. While this is important, and our results are promising, we will note that functional analysis of sleep physiology in fish has been limited to zebrafish. We hope future studies will allow us to integrate approaches that examine the physiology of sleep.

      Causal Directionality:

      The study fails to establish a clear causal relationship between sleep and DNA damage. It is possible that both sleep patterns and DDR responses are downstream effects of a common cause or independent adaptations to the cave environment.

      We agree; however, we note that this could be the case for all animals in which sleep has been linked to DNA damage. We believe the most likely explanation for Astyanax and the other animals studied is that sleep and DDR are downstream of, or interface with, the sleep homeostat.

      Environmental Considerations:

      The lab conditions may not fully replicate the natural environments of the cavefish, potentially influencing the results. The impact of these conditions on the study's findings needs further consideration.

      This is correct. We have considered this carefully.  After nearly a decade of effort,  we have completed analysis of sleep in the wild.  These will be uploaded to BioRxiv within the next week.

      Photoreactivity in Albino Fish:

      The use of UV-induced DNA damage as a primary stressor may not be entirely appropriate for albino, blind cavefish. Alternative sources of genotoxic stress should be explored to validate the findings.

      We have addressed this above.  Future work will examine additional stressors. Both fish are transparent at 6dpf and so it is unlikely that albinism impacts the amount of UV that reaches the brain.

      Assessment of the Study's Achievements:

      The authors partially achieve their aims by demonstrating differences in DNA damage and DDR between surface and cavefish. However, the results do not conclusively support the claim that these differences are driven by or directly related to the evolved sleep patterns in cavefish. The study's primary claims are only partially supported by the data.

      Impact and Utility:

      The findings contribute valuable insights into the relationship between sleep and DNA repair mechanisms, highlighting potential areas of resilience to DNA damage in cavefish. While the direct link to sleep physiology remains unsubstantiated, the study's data and methods will be useful to researchers investigating evolutionary biology, stress resilience, and the molecular basis of sleep.

      Reviewer #3 (Public Review):

      Lloyd, Xia, et al. utilised the existence of surface-dwelling and cave-dwelling morphs of Astyanax mexicanus to explore a proposed link between DNA damage, aging, and the evolution of sleep. Key to this exploration is the behavioural and physiological differences between cavefish and surface fish, with cavefish having been previously shown to have low levels of sleep behaviour, along with metabolic alterations (for example chronically elevated blood glucose levels) in comparison to fish from surface populations. Sleep deprivation, metabolic dysfunction, and DNA damage are thought to be linked and to contribute to aging processes. Given that cavefish seem to show no apparent health consequences of low sleep levels, the authors suggest that they have evolved resilience to sleep loss. Furthermore, as extended wake and loss of sleep are associated with increased rates of damage to DNA (mainly double-strand breaks) and sleep is linked to repair of damaged DNA, the authors propose that changes in DNA damage and repair might underlie the reduced need for sleep in the cavefish morphs relative to their surface-dwelling conspecifics.

      To fulfill their aim of exploring links between DNA damage, aging, and the evolution of sleep, the authors employ methods that are largely appropriate, and comparison of cavefish and surface fish morphs from the same species certainly provides a lens by which cellular, physiological and behavioural adaptations can be interrogated. Fluorescence and immunofluorescence are used to measure gut reactive oxygen species and markers of DNA damage and repair processes in the different fish morphs, and measurements of gene expression and protein levels are appropriately used. However, although the sleep tracking and quantification employed are quite well established, issues with the experimental design relate to attempts to link induced DNA damage to sleep regulation (outlined below). Moreover, although the methods used are appropriate for the study of the questions at hand, there are issues with the interpretation of the data and with these results being over-interpreted as evidence to support the paper's conclusions.

      This study shows that a marker of DNA repair molecular machinery that is recruited to DNA double-strand breaks (γH2AX) is elevated in brain cells of the cavefish relative to the surface fish and that reactive oxygen species are higher in most areas of the digestive tract of the cavefish than in that of the surface fish. As sleep deprivation has been previously linked to increases in both these parameters in other organisms (both vertebrates and invertebrates), their elevation in the cavefish morph is taken to indicate that the cavefish show signs of the physiological effects of chronic sleep deprivation.

      It has been suggested that induction of DNA damage can directly drive sleep behaviour, with a notable study describing both the induction of DNA damage and an increase in sleep/immobility in zebrafish (Danio rerio) larvae by exposure to UV radiation (Zada et al. 2021 doi:10.1016/j.molcel.2021.10.026). In the present study, an increase in sleep/immobility is induced in surface fish larvae by exposure to UV light, but there is no effect on behaviour in cavefish larvae. This finding is interpreted as representing a loss of a sleep-promoting response to DNA damage in the cavefish morph. However, induction of DNA damage is not measured in this experiment, so it is not certain if similar levels of DNA damage are induced in each group of intact larvae, nor how the amount of damage induced compares to the pre-existing levels of DNA damage in the cavefish versus the surface fish larvae. In both this study with A. mexicanus surface morphs and the previous experiments from Zada et al. in zebrafish, observed increases in immobility following UV radiation exposure are interpreted as following from UV-induced DNA damage. However, in interpreting these experiments it is important to note that the cavefish morphs are eyeless and blind. Intense UV radiation is aversive to fish, and it has previously been shown in zebrafish larvae that (at least some) behavioural responses to UV exposure depend on the presence of an intact retina and UV-sensitive cone photoreceptors (Guggiana-Nilo and Engert, 2016, doi:10.3389/fnbeh.2016.00160). It is premature to conclude that the lack of behavioural response to UV exposure in the cavefish is due to a different response to DNA damage, as their lack of eyes will likely inhibit a response to the UV stimulus.

      We believe that in A. mexicanus, like in zebrafish, it is highly unlikely that the effects of UV are mediated through visual processing. Even if this were the case, the timeframe of UV activation is very short compared to the time-scale of sleep measurements so this is unlikely to be a confound.

      Indeed, were the equivalent zebrafish experiment from Zada et al. to be repeated with mutant larval fish lacking the retinal basis for UV detection, it might be found that in this case too, the effects of UV on behaviour are dependent on visual function. Such a finding should prompt a reappraisal of the interpretation that UV exposure's effects on fish sleep/locomotor behaviour are mediated by DNA damage.

      We prefer not to comment on Zada et al, as that is a separate manuscript.

      An additional note, relating to both Lloyd, Xia, et al., and Zada et al., is that though increases in immobility are induced following UV exposure, in neither study have assays of sensory responsiveness been performed during this period. As a decrease in sensory responsiveness is a key behavioural criterion for defining sleep, it is, therefore, unclear that this post-UV behaviour is genuinely increased sleep as opposed to a stress-linked suppression of locomotion due to the intensely aversive UV stimulus.

      We understand this concern and are working on improved methodology for measuring sleep.  However, behavioral measurements are the standard for almost every manuscript that has studied sleep in zebrafish, flies, and worms to date. 

      The effects of UV exposure, in terms of causing damage to DNA, inducing DNA damage response and repair mechanisms, and in causing broader changes in gene expression are assessed in both surface and cavefish larvae, as well as in cell lines derived from these different morphs. Differences in the suite of DNA damage response mechanisms that are upregulated are shown to exist between surface fish and cavefish larvae, though at least some of this difference is likely to be due to differences in gene expression that may exist even without UV exposure (this is discussed further below).

      UV exposure induced DNA damage (as measured by levels of cyclobutane pyrimidine dimers) to a similar degree in cell lines derived from both surface fish and cave fish. However, γH2AX shows increased expression only in cells from the surface fish, suggesting induction of an increased DNA repair response in these surface morphs, corroborated by their cells' increased ability to repair damaged DNA constructs experimentally introduced to the cells in a subsequent experiment. This "host cell reactivation assay" is a very interesting assay for measuring DNA repair in cell lines, but the power of this approach might be enhanced by introducing these DNA constructs into larval neurons in vivo (perhaps by electroporation) and by tracking DNA repair in living animals. Indeed, in such a preparation, the relationship between DNA repair and sleep/wake state could be assayed.

      Comparing gene expression in tissues from young (here 1 year) and older (here 7-8 years) fish from both cavefish and surface fish morphs, the authors found that there are significant differences in the transcriptional profiles in brain and gut between young and old surface fish, but that for cavefish being 1 year old versus being 7-8 years old did not have a major effect on transcriptional profile. The authors take this as suggesting that there is a reduced transcriptional change occurring during aging and that the transcriptome of the cavefish is resistant to age-linked changes. This seems to be only one of the equally plausible interpretations of the results; it could also be the case that alterations in metabolic cellular and molecular mechanisms, and particularly in responses to DNA damage, in the cavefish mean that these fish adopt their "aged" transcriptome within the first year of life.

      This is indeed true.  However, one could also interpret this as a lack of aging.  If the profile does not change over time, the difference seems largely semantic.

      A major weakness of the study in its current form is the absence of sleep deprivation experiments to assay the effects of sleep loss on the cellular and molecular parameters in question. Without such experiments, the supposed link of sleep to the molecular, cellular, and "aging" phenotypes remains tenuous. Although the argument might be made that the cavefish represent a naturally "sleep-deprived" population, the cavefish in this study are not sleep-deprived, rather they are adapted to a condition of reduced sleep relative to fish from surface populations. Comparing the effects of depriving fish from each morph on markers of DNA damage and repair, gut reactive oxygen species, and gene expression will be necessary to solidify any proposed link of these phenotypes to sleep.

      We agree this would be beneficial. We note that relatively few papers have sleep-deprived fish. While we have done this before in A. mexicanus, the assay is less than ideal and likely induces generalizable stress. We are working on adapting more recently developed methods in zebrafish.

      A second important aspect that limits the interpretability and impact of this study is the absence of information about circadian variations in the parameters measured. A relationship between circadian phase, light exposure, and DNA damage/repair mechanisms is known to exist in A. mexicanus and other teleosts, and differences exist between the cave and surface morphs in these phenomena (Beale et al. 2013, doi: 10.1038/ncomms3769). Although the present study mentions that their experiments do not align with these previous findings, they do not perform the appropriate experiments to determine if such a misalignment is genuine. Specifically, Beale et al. 2013 showed that white light exposure drove enhanced expression of DNA repair genes (including cpdp which is prominent in the current study) in both surface fish and cavefish morphs, but that the magnitude of this change was less in the cave fish because they maintained an elevated expression of these genes in the dark, whereas the darkness suppressed the expression of these genes in the surface fish. If such a phenomenon is present in the setting of the current study, this would likely be a significant confound for the UV-induced gene expression experiments in intact larvae, and undermine the interpretation of the results derived from these experiments: as samples are collected 90 minutes after the dark-light transition (ZT 1.5) it would be expected that both cavefish and surface fish larvae should have a clear induction of DNA repair genes (including cpdp) regardless of 90 s of UV exposure. The data in Supplementary Figure 3 is not sufficient to discount this potentially serious confound, as for larvae there is only gene expression data for time points from ZT2 to ZT 14, with all of these time points being in the light phase and not capturing any dynamics that would occur at the most important timepoints from ZT0-ZT1.5, in the relevant period after dark-light transition. Indeed, an appropriate control for this experiment would involve frequent sampling at least across 48 hours to assess light-linked and developmentally-related changes in gene expression that would occur in 5-6dpf larvae of each morph independently of the exposure to UV.

      We agree that this would be useful, however, frequent sampling is not feasible given the experiments presented here and the challenges of working with an emerging model.

      On a broader point, given the effects of both circadian rhythm and lighting conditions that are thought to exist in A. mexicanus (e.g. Beale et al. 2013) experiments involving measurements of DNA damage and repair, gene expression, and reactive oxygen species, etc. at multiple times across >1 24 hour cycle, in both light-dark and constant illumination conditions (e.g. constant dark) would be needed to substantiate the authors' interpretation that their findings indicate consistently altered levels of these parameters in the cavefish relative to the surface fish. Most of the data in this study is taken at only single time points.

      Again, see comment above. The goal was to identify whether there are differences in the DNA damage response between A. mexicanus morphs. Extending this to examine interactions with the circadian system could be a useful path to pursue in the future.

      In summary, the authors show that there are differences in gene expression, activity of DNA damage response and repair pathways, response to UV radiation, and gut reactive oxygen species between the Pachón cavefish morph and the surface morph of Astyanax mexicanus. However, the data presented does not make the precise nature of these differences very clear, and the interpretation of the results appears to be overly strong. Furthermore, the evidence of a link between these morph-specific differences and sleep is unconvincing.

    1. Author response:

      Reviewer #1 (Public review):

      Point 1. The authors postulate a synergistic role for Itgb1 and Itgb3 in the intravasation phenotype, because the single KOs did not replicate the phenotype of the DKO. However, this is not a correct interpretation in the opinion of this reviewer. The roles appear rather to be redundant. Synergistic roles would rather demonstrate a modest effect in the single KO with potentiation in the DKO.

      We agree that the interaction between Itgb1 and Itgb3 appears redundant and we will correct this point in the revised manuscript.

      Point 2. The experiment does not explain how these integrins influence the interaction of the MK with their microenvironment. It is not surprising that attachment will be impacted by the presence or absence of integrins. However, it is unclear how activation of integrins allows the MK to become "architects for their ECM microenvironment" as the authors posit. A transcriptomic analysis of control and DKO MKs may help elucidate these effects.

      We do not currently understand how activation of α5β1 or αvβ3 integrins would contribute to ECM remodelling by megakaryocytes. Integrins are well-known key regulators of ECM remodelling (https://doi.org/10.1016/j.ceb.2006.08.009). They can transmit traction forces that provoke ECM remodelling (https://doi.org/10.1016/j.bpj.2008.10.009). We will discuss our previous study on the observed reduction in RhoA activation in double knockout (DKO) mice (Guinard et al., 2023, PMID: 37171626), which likely impacts the organization of the ECM microenvironment. Alternatively, integrin signalling could contribute to the regulation of genes involved in ECM remodelling (ECM proteins, proteases, etc.). We do agree with the reviewer that a transcriptomic analysis could provide strong evidence; however, it is challenging to perform this analysis in vivo. Isolating native megakaryocytes (MKs) from DKO mice is difficult because of their reduced numbers: obtaining sufficient RNA would require too many mice and carries a risk of cell contamination. An alternative approach will be to analyze platelets, which are more abundant and easier to isolate, while still mimicking the characteristics of bone marrow MKs. We will use PCR array technology for selected ECM panels and adhesion molecules (from all players currently known to contribute to ECM remodelling), providing a practical way to address the reviewer's suggestions and provide valuable insights.

      Point 3. Integrin DKO mice have a 50% reduction in platelet counts, as reported previously; however, laminin α4 deficiency leads to only a 20% reduction in counts. This suggests a more nuanced and subtle role of the ECM in platelet growth. To this end, functional assays of the platelets in the KO and wildtype mice may provide more information.

      The difference in platelet counts between integrin DKO and laminin α4 KO mice is not fully understood. Although our study specifically focuses on MK-ECM interactions in the bone marrow, we recognize the importance of providing additional information on platelet functionality. To address this, we will use flow cytometry to examine the levels of P-selectin surface expression and fibrinogen binding under basal conditions and after stimulation with collagen-related peptide and TRAP.

      Point 4. There is insufficient information in the Methods Section to understand the BM isolation approach. Did the authors flush the bone marrow and then image residual bone, or the extruded bone marrow itself as described in PMID: 29104956?

      Additional information on the methodology will be provided to clarify the BM isolation.

      Point 5. The references in the Methods section were very frustrating. The authors reference Eckly et al 2020 (PMID: 32702204) which provides no more detail but references a previous publication (PMID: 24152908), which also offers no information and references a further paper (PMID: 22008103), which, as far as this reviewer can tell, did not describe the methodology of in situ bone marrow imaging.

      To address this confusion, we will add the reference "In Situ Exploration of the Major Steps of Megakaryopoiesis Using Transmission Electron Microscopy" by C. Scandola et al. (PMID: 34570102), which provides a standardized protocol for bone marrow isolation.

      Therefore, this reviewer cannot tell how the preparation was performed and, importantly, how can we be sure that the microarchitecture of the tissue did not get distorted in the process?

      Thank you for pointing this out. While we cannot completely rule out the possibility of distortion, we will clarify the precautions taken to minimize it. We utilized a double fixation process immediately after extruding the bone marrow, followed by embedding it in agarose to preserve its integrity as much as possible. We will address this point in greater detail in the Methods section of the revised version.

      Reviewer #2 (Public review):

      Point 1. ECM cage imaging

      a) The value or additional information provided by the staining on nano-sections (A) is not clear, especially considering that the thick vibratome sections already display the entirety of the laminin γ1 cage structure effectively. Further clarification on the unique insights gained from each approach would help justify its inclusion.

      Ultrathin cryosections allow high-resolution imaging (a 10-fold increase in Z resolution), facilitating the analysis of signal superposition. This study explores the interactions between MKs and their immediate ECM microenvironment, located at a distance of less than one micrometer, making nano-sections optimal for precise analysis of ECM distribution both within and surrounding MKs. This high-resolution approach has revealed the presence of collagen IV, laminin, fibronectin, and fibrinogen near MKs. More importantly, ultrathin cryosections allow us to show clearly, at high resolution, the presence of activated integrin in contact with laminin and collagen IV fibers (see Fig. 3).

      We employed large-volume whole-mount imaging to clarify the overall three-dimensional architecture of the ECM interface, allowing us to identify the cages. Our findings emphasize the role of specific ECM components in facilitating proplatelet passage through the sinusoid barrier, an essential step for platelet production. Further details will be addressed in the revised manuscript.

      b) The sMK shown in Supplementary Figure 1C appears to be linked to two sinusoids, releasing proplatelets to the more distant vessels. Is this observation representative, and if so, can further discussion be provided?

      This observation is not representative; MKs can also be associated with just one sinusoid.

      c) Freshly isolated BM-derived MKs are reported to maintain their laminin γ1 cage. Are the proportions of MKs with/without cages consistent with those observed in microscopy?   

      In the revised manuscript, we will include the quantification of the proportion of BM-derived MKs with/without cages.

      Point 2.  ECM cage formation

      a) The statement "the full assembly of the 3D ECM cage required megakaryocyte interaction with the sinusoidal basement membrane" on page 7 is too strong given the data presented at this stage of the study. Supplemental Figure 1C shows that approximately 10% of pMKs form cages without direct vessel contact, indicating that other factors may also play a role in cage formation.

      The reviewer is correct. We will modify the text to reflect a more cautious interpretation of our results.

      b) The data supporting the statement that "pMK represent a small fraction of the total MK population" (cell number or density) could be shown to help contextualize the 10% of them with a cage.

      New bar graphs will be provided to represent the density of MK in the parenchyma against the total MK in the bone marrow.

      c) How "the full assembly of the 3D ECM cage" is defined at this stage of the study should be clarified, specifically regarding the ECM components and structural features that characterize its completion.

      We recognize that the term "full assembly" of the 3D ECM cage can be misleading, as it might suggest different stages of cage formation, such as a completed cage, one that is in the process of formation, or an incomplete cage. Since we have not yet studied this concept, we will eliminate the term "full assembly" from the manuscript to avoid any confusion. Instead, we will simply mention the presence of a cage.

      Point 3. Data on MK Circulation and Cage Integrity: Does the cage require full component integrity to prevent MK release in circulation? Are circulating MKs found in Lama4-/- mice? Is the intravasation affected in these mice? Are the ~50% sinusoid associated MK functional?  

      These are very valid points. We will answer all these questions by performing a detailed analysis of MK localization, vessel association and intravascular MK detection using IF and high-resolution EM imaging of Lamα4-/- mice. Additionally, we will analyze data from Lamα4-/- bone marrow explants to assess the capacity of MKs to extend proplatelets.

      Point 4. Methodology

      a) Details on fixation time are not provided, which is critical as it can impact antibody binding and staining. Including this information would improve reproducibility and feasibility for other researchers.

      We will add this information to the Methods section.

      b) The description of 'random length measuring' is unclear, and the rationale behind choosing random quantification should be explained. Additionally, in the shown image, it appears that only the branching ends were measured, which makes it difficult to discern the randomness in the measurements.

      The random length measurement method uses random sampling to provide unbiased data on laminin/collagen fibers in a 3D cage. Contrary to what the initial image might have suggested, measurements go beyond just the branching ends; they include intervals between various branching points throughout the cage.

      To clarify this process, we will outline these steps: 1) acquire 3D images, 2) project onto 2D planar sections, 3) select random intersection points for measurement, 4) measure intervals using ImageJ software, and 5) repeat the process for a representative dataset. This will better illustrate the randomness of our measurements.
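
      The sampling step in this procedure can be illustrated with a minimal sketch. It is not the authors' ImageJ workflow: the function name, the input layout (branch-point coordinates on the 2D projection plus a list of intervals between them) and the choice of uniform sampling without replacement are all assumptions made for illustration only.

```python
import math
import random


def sample_interval_lengths(points, segments, n_samples, seed=None):
    """Return lengths of randomly chosen fiber intervals.

    points:   dict of branch-point id -> (x, y) position on the 2D projection
              (hypothetical input, e.g. coordinates exported from ImageJ)
    segments: list of (id_a, id_b) pairs, one per fiber interval between
              two branching points
    """
    rng = random.Random(seed)
    k = min(n_samples, len(segments))
    # Sample without replacement so no interval is measured twice.
    chosen = rng.sample(segments, k)
    return [math.dist(points[a], points[b]) for a, b in chosen]


# Toy cage: three branch points forming a 3-4-5 right triangle (units: µm).
points = {"p0": (0.0, 0.0), "p1": (3.0, 0.0), "p2": (3.0, 4.0)}
segments = [("p0", "p1"), ("p1", "p2"), ("p0", "p2")]
lengths = sample_interval_lengths(points, segments, n_samples=3, seed=1)
mean_length = sum(lengths) / len(lengths)
```

      Fixing the seed makes the random selection reproducible, so the same intervals can be re-measured when the quantification is repeated across cages.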

      Point 5.  Figures

      a) Overall, the figures and their corresponding legends would benefit from greater clarity if some panels were split, such as separating images from graph quantifications.

      Following the reviewer’s suggestion, we will fully update all the Figures and separate images from graph quantifications.

      Reviewer #3 (Public review):

      Point 1. The data linking ECM cage formation to MK maturation raises several interesting questions. As the authors mention, MKs have been suggested to mature rapidly at the sinusoids, and both integrin KO and laminin KO MKs appear mislocalized away from the sinusoids. Additionally, average MK distances from the sinusoid may also help separate whether the maturation defects could be in part due to impaired migration towards CXCL12 at the sinusoid. Presumably, MKs could appear mislocalized away from the sinusoid given the data presented suggesting they are leaving the BM and entering circulation. Additional data or commentary on intrinsic (ex-vivo) MK maturation phenotypes may help strengthen the authors' conclusions and shed light on whether an essential function of the ECM cage is integrin activation at the sinusoid.

      The hypothesis of MK migration towards CXCL12 is interesting, although it has recently been challenged by Stegner et al. (2017), who found that MKs are primarily sessile. However, we cannot exclude this possibility. To address the reviewer's concerns, we will quantify the distance of MKs from the sinusoids. This could help to determine whether the maturation defects are due to impaired migration towards CXCL12 at the sinusoids or other factors, such as the ECM cage.

      We would appreciate some clarification regarding the second point raised by the reviewer. Is the question  specifically addressing whether the ECM cage has an effect on the activation of integrins in the sinusoids? If so, we will use immunofluorescence (IF) to investigate the relationship between the presence of an ECM cage and the activation of integrins on the surface of endothelial cells within the sinusoids. Thank you for your guidance on this matter.

      Point 2. The data demonstrating that intact MKs enter circulation is intriguing - can the authors comment or provide evidence as to whether MKs are detectable in blood? A quantitative metric may strengthen these observations.

      We will conduct flow cytometry experiments and prepare blood smears to determine whether intact MKs are detectable in blood.

      Point 3. Supplementary Figure 6 shows no effect on in vitro MK maturation, proplatelet formation, or MK area, but Figures 6B/6C demonstrate an increase in total MK number in MMP-inhibitor treated mice compared to control. Some additional clarification in the text may substantiate the authors' conclusions as to either the source of the MMPs or the in vitro environment not fully reflecting the complex and dynamic niche of the BM ECM in vivo.

      This is a valid point. We will revise the text to include further clarification.

      Point 4.  Similarly, one function of the ECM discussed relates to MK maturation but in the B1/3 integrin KO mice, the presence of the ECM cage is reduced but there appears to be no significant impact upon maturation (Supplementary Figure 4). By contrast, MMP inhibition in vivo (but not in vitro) reduces MK maturation. These data could be better clarified in the text, or by the addition of experiments addressing whether the composition and quantity of ECM cage components directly inhibit maturation versus whether effects of MMP-inhibitors perhaps lead to over-activation of the integrins (as with the B4galt KO in the discussion) are responsible for the differences in maturation.

      These are very good questions, but they are difficult to assess in situ. To approach this, we will perform in vitro experiments:

      (1) We will vary collagen IV and laminin 411 concentrations in the culture conditions to determine how this affects MK maturation; and

      (2) We will assess the integrin activation states on cultured MKs treated with MMP inhibitors to determine if MMP inhibitors could influence MK maturation through over-activation of integrins.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This paper contains what could be described as a "classic" approach towards evaluating a novel taste stimulus in an animal model, including standard behavioral tests (some with nerve transections), taste nerve physiology, and immunocytochemistry of the tongue. The stimulus being tested is ornithine, from a class of stimuli called "kokumi", which are stimuli that enhance other canonical tastes, increasing essentially the hedonic attributes of these other stimuli; the mechanism for ornithine detection is thought to be GPRC6A receptors expressed in taste cells. The authors showed evidence for this in an earlier paper with mice; this paper evaluates ornithine taste in a rat model.

      Strengths:

      The data show the effects of ornithine on taste: in two-bottle and briefer intake tests, adding ornithine results in a higher intake of most, but not all, stimuli tested. Bilateral nerve cuts or the addition of GPRC6A antagonists decrease this effect. Small effects of ornithine are shown in whole-nerve recordings.

      Weaknesses:

      The conclusion seems to be that the authors have found evidence for ornithine acting as a taste modifier through the GPRC6A receptor expressed on the anterior tongue. It is hard to separate their conclusions from the possibility that any effects are additive rather than modulatory. Animals did prefer ornithine to water when presented by itself. Additionally, the authors refer to evidence that ornithine is activating the T1R1-T1R3 amino acid taste receptor, possibly at higher concentrations than they use for most of the study, although this seems speculative. It is striking that the largest effects on taste are found with the other amino acid (umami) stimuli, leading to the possibility that these are largely synergistic effects taking place at the tas1r receptor heterodimer.

      We would like to thank Reviewer #1 for the valuable comments. Our basis for considering ornithine as a taste modifier stems from our observation that a low concentration of ornithine (1 mM), which does not elicit a preference on its own, enhances the preference for umami substances, sucrose, and soybean oil through the activation of the GPRC6A receptor. Notably, this receptor is not typically considered a taste receptor. The reviewer suggested that the enhancement of umami taste might be due to potentiation occurring at the TAS1R receptor heterodimer. However, we propose that a different mechanism may be at play, as an antagonist of GPRC6A almost completely abolished this enhancement. In the revised manuscript, we will endeavor to provide additional information on the role of ornithine as a taste modifier acting through the GPRC6A receptor.

      Reviewer #2 (Public review):

      Summary:

      The authors used rats to determine the receptor for a food-related perception (kokumi) that has been characterized in humans. They employ a combination of behavioral, electrophysiological, and immunohistochemical results to support their conclusion that ornithine-mediated kokumi effects are mediated by the GPRC6A receptor. They complemented the rat data with some human psychophysical data. I find the results intriguing, but believe that the authors overinterpret their data.

      Strengths:

      The authors examined a new and exciting taste enhancer (ornithine). They used a variety of experimental approaches in rats to document the impact of ornithine on taste preference and peripheral taste nerve recordings. Further, they provided evidence pointing to a potential receptor for ornithine.

      Weaknesses:

      The authors have not established that the rat is an appropriate model system for studying kokumi. Their measurements do not provide insight into any of the established effects of kokumi on human flavor perception. The small study on humans is difficult to compare to the rat study because the authors made completely different types of measurements. Thus, I think that the authors need to substantially scale back the scope of their interpretations. These weaknesses diminish the likely impact of the work on the field of flavor perception.

      We would like to thank Reviewer #2 for the valuable comments and suggestions. Regarding the question of whether the rat is an appropriate model system for studying kokumi, we have chosen this species for several reasons: it is readily available as a conventional experimental model for gustatory research; the calcium-sensing receptor (CaSR), known as the kokumi receptor, is expressed in taste bud cells; and prior research has demonstrated the use of rats in kokumi studies involving gamma Glu-Val-Gly (Yamamoto and Mizuta, Chem. Senses, 2022).

      We acknowledge that fundamentally different types of measurements were conducted in the human psychophysical study and the rat study. Kokumi can indeed be assessed and expressed in humans; however, we do not currently have the means to confirm that animals experience kokumi in the same way that humans do. Therefore, human studies are necessary to evaluate kokumi, a conceptual term denoting enhanced flavor, while animal studies are needed to explore the potential underlying mechanisms of kokumi. We believe that a combination of both human and animal studies is essential, as is the case with research on sugars. While sugars are known to elicit sweetness, it is unclear whether animals perceive sweetness identically to humans, even though they exhibit a strong preference for sugars. In the revised manuscript, we will incorporate additional information to address the comments raised by the reviewer. We will also carefully review and revise our previous statements to ensure accuracy and clarity.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to investigate whether GPRC6A mediates kokumi taste initiated by the amino acid L-ornithine. They used Wistar rats, a standard laboratory strain, as the primary model and also performed an informative taste test in humans, in which miso soup was supplemented with various concentrations of L-ornithine. The findings are valuable and overall the evidence is solid. L-Ornithine should be considered to be a useful test substance in future studies of kokumi taste and the class C G protein-coupled receptor known as GPRC6A (C6A) along with its homolog, the calcium-sensing receptor (CaSR) should be considered candidate mediators of kokumi taste.

      Strengths:

      The overall experimental design is solid based on two-bottle preference tests in rats. After determining the optimal concentration for L-Ornithine (1 mM) in the presence of MSG, it was added to various tastants, including inosine 5'-monophosphate; monosodium glutamate (MSG); mono-potassium glutamate (MPG); intralipos (a soybean oil emulsion); sucrose; sodium chloride (NaCl); citric acid and quinine hydrochloride. Robust effects of ornithine were observed in the cases of IMP, MSG, MPG, and sucrose, and little or no effects were observed in the cases of sodium chloride, citric acid, and quinine HCl. The researchers then focused on the preference for Ornithine-containing MSG solutions. The inclusion of the C6A inhibitors Calindol (0.3 mM but not 0.06 mM) or the gallate derivative EGCG (0.1 mM but not 0.03 mM) eliminated the preference for solutions that contained Ornithine in addition to MSG. The researchers next performed transections of the chorda tympani nerves (with sham operation controls) in anesthetized rats to identify the role of the chorda tympani branches of the facial nerves (cranial nerve VII) in the preference for Ornithine-containing MSG solutions. This finding implicates the anterior half to two-thirds of the tongue in ornithine-induced kokumi taste. They then used electrical recordings from intact chorda tympani nerves in anesthetized rats to demonstrate that ornithine enhanced MSG-induced responses following the application of tastants to the anterior surface of the tongue. They went on to show that this enhanced response was insensitive to amiloride, selected to inhibit 'salt tastant' responses mediated by the epithelial Na+ channel, but eliminated by Calindol.
      Finally, they performed immunohistochemistry on sections of rat tongue demonstrating C6A positive spindle-shaped cells in fungiform papillae that partially overlapped in their distribution with the IP3 type-3 receptor, used as a marker of Type-II cells, but not with (i) gustducin, the G protein partner of Tas1 receptors (T1Rs), used as a marker of a subset of type-II cells; or (ii) 5-HT (serotonin) and Synaptosome-associated protein 25 kDa (SNAP-25) used as markers of Type-III cells.

      Weaknesses:

      The researchers undertook what turned out to be largely confirmatory studies in rats with respect to their previously published work on Ornithine and C6A in mice (Mizuta et al Nutrients 2021).

      The authors point out that animal models pose some difficulties of interpretation in studies of taste and raise the possibility in the Discussion that umami substances may enhance the taste response to ornithine (Line 271, Page 9). They miss an opportunity to outline the experimental results from the study that favor their preferred interpretation that ornithine is a taste enhancer rather than a tastant.

      At least two other receptors in addition to C6A might mediate taste responses to ornithine: (i) the CaSR, which binds and responds to multiple L-amino acids (Conigrave et al, PNAS 2000), and which has been previously reported to mediate kokumi taste (Ohsu et al., JBC 2010) as well as responses to Ornithine (Shin et al., Cell Signaling 2020); and (ii) T1R1/T1R3 heterodimers which also respond to L-amino acids and exhibit enhanced responses to IMP (Nelson et al., Nature 2001). While the experimental results as a whole favor the authors' interpretation that C6A mediates the Ornithine responses, they do not make clear either the nature of the 'receptor identification problem' in the Introduction or the way in which they approached that problem in the Results and Discussion sections. It would be helpful to show that a specific inhibitor of the CaSR failed to block the ornithine response. In addition, while they showed that C6A-positive cells were clearly distinct from gustducin-positive, and thus T1R-positive cells, they missed an opportunity to clearly differentiate C6A-expressing taste cells and CaSR-expressing taste cells in the rat tongue sections.

      It would have been helpful to include a positive control kokumi substance in the two-bottle preference experiment (e.g., one of the known gamma-glutamyl peptides such as gamma-glu-Val-Gly or glutathione), to compare the relative potencies of the control kokumi compound and Ornithine, and to compare the sensitivities of the two responses to C6A and CaSR inhibitors.

      The results demonstrate that enhancement of the chorda tympani nerve response to MSG occurs at substantially greater Ornithine concentrations (10 and 30 mM) than were required to observe differences in the two bottle preference experiments (1.0 mM; Figure 2). The discrepancy requires careful discussion and if necessary further experiments using the two-bottle preference format.

      We would like to thank Reviewer #3 for the valuable comments and helpful suggestions. We propose that ornithine has two stimulatory actions: one acting on GPRC6A, particularly at lower concentrations, and another on amino acid receptors such as T1R1/T1R3 at higher concentrations. Consequently, ornithine is not preferable at lower concentrations but becomes preferable at higher concentrations. For our study on kokumi, we used a low concentration (1 mM) of ornithine. The possibility mentioned in the Discussion that 'the umami substances may enhance the taste response to ornithine' is entirely speculative. We will reconsider including this description in the revised version. As the reviewer suggested, in addition to GPRC6A, ornithine may bind to CaSR and/or T1R1/T1R3 heterodimers. However, we believe that ornithine mainly binds to GPRC6A, as a specific inhibitor of this receptor almost completely abolished the enhanced response to umami substances, and our immunohistochemical study indicated that GPRC6A-expressing taste cells are distinct from CaSR-expressing taste cells (see Supplemental Fig. 3). We conducted essentially the same experiments using gamma-Glu-Val-Gly in Wistar rats (Yamamoto and Mizuta, Chem. Senses, 2022) and compared the results in the Discussion. The reviewer may have misunderstood the chorda tympani results: we added the same concentration (1 mM) used in the two-bottle preference test to MSG (Fig. 5-B). Fig. 5-A shows nerve responses to five concentrations of plain ornithine. In the revised manuscript, we will strive to provide more precise information reflecting the reviewer’s comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The behavioral effects found with the GPRC6A antagonists are not entirely convincing, as the antagonist is seemingly just mixed up in the solution with the stimuli. There are no control experiments demonstrating that the antagonists do not have a taste themselves.

      We mixed the antagonists into both liquids used in the two-bottle preference test to eliminate any potential taste effects of the antagonists themselves. In the electrophysiological experiments, the antagonist was incorporated into the solution after confirming that it did not elicit any appreciable response in the taste nerve.

      (2) The effects of ornithine found with quinine did not have a satisfying explanation - if there is some taste cell-taste cell modulation that accounts for the taste enhancement, why is the quinine less aversive? Why is it not enhanced like the other compounds?

      The effects of ornithine on quinine responses remain difficult to explain. A previous study (Tokuyama et al., Chem Pharm Bull, 2006) proposed that ornithine prevents bitter substances from binding to bitter receptors, although this hypothesis lacks definitive evidence. In the present study, our findings suggest that the binding of quinine to bitter receptors is essential, as another agonist, gallate, also enhanced the preference for quinine, but this effect was abolished by EGCG, a GPRC6A antagonist (see Supplemental Fig. 2).

      (3) Unless I am missing something, there appears to be no quantitative analysis of the immunocytochemical data, just assertions.

      We have made quantitative analyses in the revised text, and the following sentences have been added: “Approximately 11% of GPRC6A-positive cells overlapped with IP3R3 (9 double-positive cells/80 GPRC6A-positive cells), while approximately 8.3% of IP3R3-positive cells expressed GPRC6A (9 double-positive cells/109 IP3R3-positive cells). In addition, GPRC6A-positive cells were unlikely to colocalize with α-gustducin, another marker for a subset of type II cells, in single taste cells (0 double-positive cells/93 GPRC6A-positive cells). Regarding type III cell markers, GPRC6A-positive cells were unlikely to colocalize with 5-HT in single taste cells (0 double-positive cells/75 GPRC6A-positive cells).”
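
      As a quick arithmetic check on the proportions quoted above, the following minimal sketch recomputes each overlap percentage from the reported cell counts. The function name `percent` and the dictionary labels are illustrative assumptions, not part of the authors' analysis; only the counts are taken from the response.

```python
# Recompute the colocalization percentages from the cell counts quoted in the
# response. The helper name and labels are hypothetical, chosen for illustration.

def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be a positive cell count")
    return 100.0 * part / whole


# (double-positive cells, marker-positive cells) as reported in the response.
counts = {
    "GPRC6A+ cells also IP3R3+": (9, 80),
    "IP3R3+ cells also GPRC6A+": (9, 109),
    "GPRC6A+ cells also alpha-gustducin+": (0, 93),
    "GPRC6A+ cells also 5-HT+": (0, 75),
}

overlap = {label: percent(p, w) for label, (p, w) in counts.items()}

for label, value in overlap.items():
    print(f"{label}: {value:.2f}%")
```

      Under these counts, the first two ratios round to the "approximately 11%" and "approximately 8.3%" given in the quoted text, and the two zero-count comparisons give 0%.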

      (4) The hallmarks of Kokumi taste include descriptors such as "thickness", and "mouthfeel", which sound like potential somatosensory attributes. Perhaps the authors should consider this possibility for at least some of the effects found.

      The term kokumi, a Japanese word, refers to a phenomenon in which the flavor of complexly composed foods is enhanced through certain processes, making them more delicious. To date, kokumi has been described using the representative terms thickness, mouthfulness, and continuity, originally introduced in the first paper on kokumi by Ueda et al. (1990). However, these terms are derived from Japanese and may not fully convey the nuances of the original language when translated into these simple English words. In particular, thickness is often interpreted as referring to physical properties such as viscosity or somatosensory sensations. Since kokumi inherently lacks somatosensory elements, this revised paper adopts alternative terms and explanations for the three components of kokumi to prevent misunderstanding and confusion.

      Therefore, to clarify that kokumi attributes are inherently gustatory, thickness is replaced with intensity of whole complex tastes (rich flavor with complex tastes), emphasizing the synergistic effects of a variety of tastes rather than the mere enhancement of a single flavor. Mouthfulness is clarified as not referring to mouthfeel (the tactile sensation a food gives in the mouth) but rather as spread of taste and flavor throughout the oral cavity, describing how the flavor fills the mouth. Continuity is replaced with persistence of taste (lingering flavor).

      (5) I don't think the human experiment (S1) belongs to the paper, even as a supplementary bit of data. It's only 17 subjects, they are all female, and we don't know anything about how they were selected, even though it states they are all students/staff at Kio. Were any of them lab members? Were they aware of the goals of the experiment? Could simply increasing the amount of solute in the soup make it seem thicker? This (sparse) data seems to have been shoehorned into the paper without enough detail/justification.

      Despite the reviewer’s suggestion, we would like to include the human experiment because the rationale of the present study is to confirm, through a human sensory test, that the kokumi of a complex solution (in this case, miso soup) is enhanced by the addition of ornithine. This is followed by basic animal experiments to investigate the underlying mechanisms. Therefore, this human study serves an important role.

      The total number of participants increased to 22 (19 women and three men) following an additional experiment with 5 new participants. The new results are shown in Supplemental Figure 1 with statistical analyses. The rewritten parts are as follows:

      We recruited 22 participants (19 women and three men, aged 21-28 years) from Kio University who were not affiliated with our laboratory, including students and staff members. All participants passed a screening test based on taste sensitivity. According to the responses obtained from a pre-experimental questionnaire, we confirmed that none of the participants had any sensory abnormalities, eating disorders, or mental disorders, or were taking any medications that may potentially affect their sense of taste. All participants were instructed not to eat or drink anything for 1 hour prior to the start of the experiment. We provided them with a detailed explanation of the experimental procedures, including safety measures and personal data protection, without revealing the specific goals of the study.

      (6) The introduction could be more concise - for example, when describing Kokumi stimuli such as ornithine and its possible receptors, the authors do not need to add the detail about how this stimulus was deduced from adding clams to the soup. Details like this can be reserved for the discussion.

      Thank you for this comment. We have tried to shorten the Introduction.

      (7) Line 86: awkward phrasing - this doesn't need to be a rhetorical question.

      We have deleted the sentence.

      (8) Supplementary Figure 1: The labels on the figure say "Miso soup in 1 mM Orn" when the Orn is dissolved into the soup.

      Thank you for pointing out our mistake. We have changed the label to “1 mM Orn in miso soup”.

      Reviewer #2 (Recommendations for the authors):

      Major concerns

      (1) The impact of "kokumi" taste ligands on food perception appears to be profound in humans. This observation is fascinating because it implies that molecules like ornithine impact a variety of flavor perceptions, some of which are non-gustatory in nature (e.g., spread, mouthfulness and harmony). What remains unclear is whether "kokumi" ligands produce analogous sensations in rodents. If they don't, then rodents are an inappropriate model system for studying the impact of kokumi on flavor perceptions. The authors fail to address this key issue, and uncritically assume that kokumi ligands produce sensations like thickness, mouthfulness, and continuity in rodents. For this reason, the authors' reference to GPRC6A as a kokumi receptor is inappropriate.

      Thank you very much for the valuable comments. The term kokumi refers to a phenomenon in which the flavor of complexly composed foods is enhanced through certain processes, making them more delicious. It is an important concept in the field of food science, which studies how to make prepared dishes more enjoyable. Kokumi is also considered a higher-order, profound cognitive function evaluated by humans who experience a wide variety of foods. However, it is unclear whether animals, particularly experimental animals, can perceive kokumi in the same way humans do.

      To date, kokumi has been described using the representative terms thickness, mouthfulness, and continuity, originally introduced in the first paper on kokumi by Ueda et al. (1990). However, these terms are derived from Japanese and may not fully convey the nuances of the original language when translated into these simple English words. In particular, thickness is often interpreted as referring to physical properties such as viscosity or somatosensory sensations. Since kokumi inherently lacks somatosensory elements, this revised paper adopts alternative terms and explanations for the three components of kokumi to prevent misunderstanding and confusion.

      Therefore, to clarify that kokumi attributes are inherently gustatory, thickness is replaced with intensity of whole complex tastes (rich flavor with complex tastes), emphasizing the synergistic effects of a variety of tastes rather than the mere enhancement of a single flavor. Mouthfulness is clarified as not referring to mouthfeel (the tactile sensation a food gives in the mouth) but rather as spread of taste and flavor throughout the oral cavity, describing how the flavor fills the mouth. Continuity is replaced with persistence of taste (lingering flavor).

      Rodents are thought to possess basic taste functions similar to humans, such as the expression of taste receptors, including kokumi receptors, in taste cells. Regardless of whether rodents can perceive kokumi, findings from studies on rodents may provide insights into aspects of the kokumi concept as experienced by humans.

      Indeed, the results of this study indicate that ornithine enhances umami, sweetness, fat taste, and saltiness, leading to the enhancement of complex flavors—referred to as intensity of whole taste. The activation of various taste cells, resulting in the enhancement of multiple tastes, may contribute to the sensation of flavors spreading throughout the oral cavity. Furthermore, the strong enhancement of MSG and MPG suggests that glutamate contributes to the mouthfulness and persistence of taste characteristic of kokumi.

      (2) A related concern is that the authors did not make any measurements that model kokumi sensations documented in the literature. For example, they would need to develop behavioral/electrophysiological measurements that reflect the known effects of kokumi ligands on flavor perception (i.e., increases in intensity, spread, continuity, richness, harmony, and punch). For example, ornithine is thought to produce more "punch" (i.e., a more rapid rise in intensity). This could be manifested as a more rapid rise in peripheral taste response or a more rapid fMRI response in the taste cortex. Alternatively, ornithine is thought to increase "continuity" (i.e., make the taste response more persistent). This response would presumably be manifested as a peripheral taste response that adapts more slowly or a more persistent fMRI response. As it stands, the authors have documented that ornithine increases (i) the preference of rats for some chemical stimuli, but not others; and (ii) the response of the CT nerve to some but not all taste stimuli.

      In animal experiments, it is challenging to examine each attribute of kokumi. The increase of complex tastes can be investigated through behavioral experiments and neural activity recordings. However, phenomena such as spread or harmony, which arise from profound human judgments, are difficult to validate in animal studies.

      While it was possible to examine persistence through neural responses to tastants, all stimuli were rinsed at 30 seconds after onset of stimulation, so the exact duration of persistence was not investigated. However, since the MSG response was enhanced approximately 1.5 times with the addition of ornithine, it is strongly suggested that the duration might also have been prolonged.

      Regarding punch, no differences were observed in the neural responses when ornithine was added, likely because the phasic response already had a rapid onset.

      In the context of fMRI studies, there has been a report that adding glutathione to mixtures of umami and salt solutions increases responses (Goto et al. Chem Senses, 2016). However, research specifically examining the attributes of kokumi has not yet been reported.

      (3) The quality of the SNAP-25 immunohistochemistry is poor (see Figure 7D), with lots of seemingly nonspecific staining in and outside the taste bud.

      The quality of the SNAP-25 immunostaining is not poor. It is known that SNAP-25 labels not only type III cells but also the dense network of intragemmal nerve fibers (Tizzano et al., Immunohistochemical Analysis of Human Vallate Taste Buds. Chem Senses 40:655-60, 2015). Therefore, the seemingly nonspecific staining is due to intense SNAP-25 immunoreactivity of the nerve fibers.

      (4) The authors need to drastically scale back the scope of their conclusions. What they can say is that ornithine appears to enhance the taste responses of rats to a variety of taste stimuli and that this effect appears to be mediated by the GPRC6A receptor. They cannot use their data to address kokumi effects in humans, as they have not attempted to model any of these effects. Given the known problems with pharmacological blocking agents (e.g., nonspecificity), the authors would significantly strengthen their case if they could generate similar results in a GPRC6A knockout mouse.

      Our research approach begins with confirming in humans that the addition of ornithine to complex foods (such as miso soup) induces kokumi. Based on this confirmation, we conduct fundamental studies using animal models to investigate the peripheral taste mechanisms underlying the expression of kokumi.

      It is possible that the key to kokumi expression lies in the enhancement of desirable tastes (particularly umami) and the suppression of unpleasant tastes. Moving forward, we will deepen our fundamental research on the action of ornithine mediated through GPRC6A, including studies using knockout mice.

      (5) The introduction is too long. Much of the discussion of kokumi perception in humans should either be removed or shortened considerably.

      Following the reviewer’s suggestion, the introduction has been shortened.

      (6) I recommend that the authors break up the Methods and Results sections into different experiments. This would enable the authors to provide separate rationales for each procedure. For instance, the authors conducted a variety of different behavioral procedures (e.g., long- and short-term preference tests, and preference tests with and without GPRC6A receptor antagonists).

      Rather than following the reviewer’s suggestion, we have added subheadings to describe the purpose of each experiment. This approach should help readers better understand the experimental flow, as each experiment is relatively straightforward.

      (7) The inclusion of the human data is odd for two reasons. First, the measurements used to assess the impact of ornithine on flavor perception in humans were totally different than those used in rats. This makes it impossible to compare the human and rat datasets. Second, the human study was rather limited in scope, had small effect sizes, and had a lot of individual variation. For these reasons, the human data are not terribly helpful. I recommend that the authors remove the human data from this paper, and publish them as part of a more extensive study on humans.

      Despite the reviewer’s suggestion, we would like to include the human experiment because the rationale of the present study is to confirm, through a human sensory test, that the kokumi of a complex solution (in this case, miso soup) is enhanced by the addition of ornithine. This is followed by basic animal experiments to investigate the underlying mechanisms. Therefore, this human study serves an important role. The considerable variation in the scores suggests that evaluating the three kokumi attributes is challenging and likely influenced by differences in judgment criteria among participants.

      The total number of participants increased to 22 (19 women and three men) following an additional experiment with 5 new participants. The new results are shown in Supplemental Figure 1 with statistical analyses. The rewritten parts are as follows:

      We recruited 22 participants (19 women and three men, aged 21-28 years) from Kio University who were not affiliated with our laboratory, including students and staff members. All participants passed a screening test based on taste sensitivity. According to the responses obtained from a pre-experimental questionnaire, we confirmed that none of the participants had any sensory abnormalities, eating disorders, or mental disorders, or were taking any medications that may potentially affect their sense of taste. All participants were instructed not to eat or drink anything for 1 hour prior to the start of the experiment. We provided them with a detailed explanation of the experimental procedures, including safety measures and personal data protection, without revealing the specific goals of the study.

      (8) While the use of English is generally good, there are many instances where the English is a bit awkward. I recommend that the authors ask a native English speaker to edit the text.

      Thank you for this comment. The text has been edited by a native English speaker.

      Minor concerns

      (1) Lines 13-14: The authors state that "the concept of 'kokumi' has garnered significant attention in gustatory physiology and food science." This is an exaggeration. Kokumi has generated considerable interest in food science but has yet to generate much interest in gustatory physiology.

      We have rewritten this part: “The concept of “kokumi” has generated considerable interest in food science but kokumi has not been well studied in gustatory physiology.”

      (2) Line 20: The use of "specific taste" is unclear in this context. The authors indicate (in Figure 5A) that 1 mM ornithine generates a CT nerve response. They also reveal (in Figure 1A) that rats do not prefer 1 mM ornithine over water. The results from a preference test do not provide insight into whether a solution can be tasted; they merely demonstrate a lack of preference for that solution. Based on these data, the authors cannot infer that 1 mM ornithine cannot be tasted.

      We agree with the reviewer’s comment. Ornithine at 1 mM concentration may have a weak taste because this solution elicited a small neural response (Fig. 5-A). We have rewritten the text: “… at a concentration without preference for this solution.”

      (3) Line 44: Sensory information from foods enters the oral and the nasal cavity.

      The nasal cavity has been added.

      (5) Line 59: The terms "thickness", "mouthfulness" and "continuity" are not intuitive in English, and may reflect, at least in part, a failure in translation. The word thickness implies a tactile sensation (e.g., owing to high viscosity), but the authors use it to indicate a flavor that is more intense and onsets more quickly. The word mouthfulness is supposed to indicate that a flavor is experienced throughout the oral cavity. The problem here is that this happens with all tastants, independent of the presence of substances like ornithine. Indeed, taste buds occur in a limited portion of the oral epithelium, but we nevertheless experience tastes throughout the oral cavity, owing to a phenomenon called tactile referral (see the following reference: Todrank and Bartoshuk, 1991, "A taste illusion: taste sensation localized by touch," Physiology & Behavior 50:1027-1031). The word continuity does not imply that the taste is long-lasting or persistent.

      These three attributes were originally introduced by Ueda et al. (1990), who translated Japanese terms describing the profound characteristics of kokumi, which are deeply rooted in Japanese culinary culture. However, these simply translated terms have caused global misunderstanding and confusion, because they sound like somatosensory rather than gustatory descriptions. Therefore, to clarify that kokumi attributes are inherently gustatory, in the revised version we use the terms “intensity of whole complex tastes (rich flavor with complex tastes)” instead of thickness, “mouthfulness (spread of taste and flavor throughout the oral cavity),” and “persistence of taste (lingering flavor)” instead of continuity.

      The results of this study indicate that ornithine enhances umami, sweetness, fat taste, and saltiness, leading to the enhancement of complex flavors—referred to as intensity of whole taste. The activation of various taste cells, resulting in the enhancement of multiple tastes, may contribute to the sensation of flavors spreading throughout the oral cavity. Furthermore, the strong enhancement of MSG and MPG suggests that glutamate contributes to the mouthfulness and persistence of taste characteristic of kokumi.

      (6) Figure legends: The authors provide results of statistical comparisons in several of the figures. They need to explain what statistical procedures were performed. As it stands, it is impossible to interpret the asterisks provided.

      We have explained statistical procedures in each Figure legend.

      (7) I did not see any reference to the sources of funding or any mention of potential conflicts of interest.

      We have added the following information:

      Funding: JSPS KAKENHI Grant Numbers JP17K00935 (to TY) and JP22K11803 (to KU).

      Declaration of interests: The authors declare that they have no competing interests.

      Reviewer #3 (Recommendations for the authors):

      (1) I suggest that the authors increase their level of interest in glutathione and gamma-glutamyl peptides. This might include an appropriate gamma-glutamyl control substance in the two-bottle preference study (see Public Review). It might also include more careful attention to the work that identified glutathione as an activator of the CaSR (Wang et al., JBC 2006) and the nature of its binding site on the CaSR which overlaps with its site for L-amino acids (Broadhead et al., JBC 2011). This latter article also identified S-methyl glutathione, in which the free-SH group is blocked, as a high-potency activator of the CaSR. It would be expected to show comparable potency to gamma-glu-Val-Gly in assays of kokumi taste.

      We have appropriately referenced glutathione and gamma-Glu-Val-Gly, potent agonists of CaSR, where necessary. In our previous study (Yamamoto and Mizuta, Chem Senses, 2022), we examined the additive effects of these substances on basic taste stimuli in rodents, and the results were compared in greater detail with those obtained from the addition of ornithine in the present study. We have also discussed the potential binding of ornithine to other receptors, including CaSR and T1R1/T1R3 heterodimers.

      (2) Figures:

      -None of the figures were labelled with their Figure numbers. I have inferred the Figure numbers from the legends and their positions in the pdf.

      We are sorry for this inconvenience.

      - The labelling of Figure 1 and Figure 2 are problematic. In Figure 1 it should be made clear that the horizontal axes refer to the Ornithine concentration. In Figure 2 it should be made clear that the horizontal axes refer to the tastant concentrations (MSG, IMP, etc) and that the Ornithine concentrations were fixed at either zero or 1.0 mM.

      We are sorry for the lack of information about the horizontal axes. We have explained the horizontal axes in figure legends in Figs. 1 and 2. The labelling of both figures has also been modified to make this clear.

      - Figure 3B: 'Control' should appear at the top of this panel since the panels that follow all refer to it.

      Following the reviewer’s suggestion, we have added ‘Control’ at the top of Figure 3B.

      - Figure 5A. Provide a label for the test substance, presumably Ornithine.

      Yes, we have added ‘Ornithine’.

      - Figure 7 would be strengthened by the inclusion of immunohistochemistry analyses of the CaSR.

      We did not perform immunohistochemistry for the CaSR because a previous study had already analyzed CaSR expression on taste cells in rats in detail. We have analyzed co-expression of GPRC6A and CaSR (see Supplemental Figure 3).

      (3) Other Matters:

      - Line 38: list the five basic taste modalities here.

      Yes, we have included the five basic taste modalities here.

      - Line 107: 'even if ... kokumi ... is less developed in rodents' - if there is evidence that kokumi is less developed in rodents it should be cited here.

      We cannot cite any references here because no studies have compared the perception of kokumi between humans and rodents.

      - Line 308: 'recently we conducted experiments in rats using gallate ...' - the authors appear to imply that they performed the research in Reference 43, however, I was unable to find an overlap between the two lists of authors.

      We did not perform the research described in Reference 43 (Reference 40 in the revised paper). Because Reference 43 showed that gallate is an agonist of GPRC6A, we were interested in performing similar behavioral experiments using gallate instead of ornithine.

      The sentences have been rewritten to avoid misunderstanding.

      - Line 506: the sections are said to be 20 mm thick - should this read 20 micrometers?

      Thank you. We have changed to 20 micrometers.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      The Bagnat and Rawls groups' previous published work (Park et al., 2019) described the kinetics and genetic basis of protein absorption in a specialized cell population of young vertebrates termed lysosome-rich enterocytes (LREs). In this study they seek to understand how the presence and composition of the microbiota impacts the protein absorption function of these cells and reciprocally, how diet and intestinal protein absorption function impact the microbiome. 

      Strengths of the study include the functional assays for protein absorption performed in live larval zebrafish, which provides detailed kinetics on protein uptake and degradation with anatomic precision, and the gnotobiotic manipulations. The authors clearly show that the presence of the microbiota or of certain individual bacterial members slows the uptake and degradation of multiple different tester fluorescent proteins. 

      To understand the mechanistic basis for these differences, the authors also provide detailed single-cell transcriptomic analyses of cells isolated based on both an intestinal epithelial cell identity (based on a transgenic marker) and their protein uptake activity. The data generated from these analyses, presented in Figures 3-5, are valuable for expanding knowledge about zebrafish intestinal epithelial cell identities, but of more limited interest to a broader readership. Some of the descriptive analysis in this section is circular because the authors define subsets of LREs (termed anterior and posterior) based on their fabp2 expression levels, but then go on to note transcriptional differences between these cells (for example in fabp2) that are a consequence of this initial subsetting. 

      Inspired by their single-cell profiling and by previous characterization of the genes required for protein uptake and degradation in the LREs, the authors use quantitative hybridization chain reaction RNA-fluorescent in situ hybridization to examine transcript levels of several of these genes along the length of the LRE intestinal region of germ-free versus mono-associated larvae. They provide good evidence for reduced transcript levels of these genes that correlate with the reduced protein uptake in the mono-associated larval groups. 

      The final part of the study (shown in Figure 7) characterized the microbiomes of 30-day-old zebrafish reared from 6-30 days on defined diets of low and high protein and with or without homozygous loss of the cubn gene required for protein uptake. The analysis of these microbiomes notes some significant differences between fish genotypes by diet treatments, but the discussion of these data does not provide strong support for the hypothesis that "LRE activity has reciprocal effects on the gut microbiome". The most striking feature of the MDS plot of Bray Curtis distance between zebrafish samples shown in Figure 7B is the separation by diet independent of host genotype, which is not discussed in the associated text. Additionally, the high protein diet microbiomes have a greater spread than those of the low protein treatment groups, with the high protein diet cubn mutant samples being the most dispersed. This pattern is consistent with the intestinal microbiota under a high protein diet regimen and in the absence of protein absorption machinery being most perturbed in stochastic ways than in hosts competent for protein uptake, consistent with greater beta dispersal associated with more dysbiotic microbiomes (described as the Anna Karenina principle here: https://pubmed.ncbi.nlm.nih.gov/28836573/). It would be useful for the authors to provide statistics on the beta dispersal of each treatment group. 

      Overall, this study provides strong evidence that specific members of the microbiota differentially impact gene expression and cellular activities of enterocyte protein uptake and degradation, findings that have a significant impact on the field of gastrointestinal physiology. The work refines our understanding of intestinal cell types that contribute to protein uptake and their respective transcriptomes. The work also provides some evidence that microbiomes are modulated by enterocyte protein uptake capacity in a diet-dependent manner. These latter findings provide valuable datasets for future related studies. 

      We thank the reviewer for their thorough and kind assessment. We appreciate the suggestion for edits and for pointing out areas that need further clarification.

      One point that clearly needs further explanation is the use of fabp6 (referred to as fabp2 by the reviewer) to define anterior LREs and their gene expression pattern, which includes high levels of fabp6. This was deemed by the reviewer to be a “circular argument”. We would like to clarify that the rationale for using fabp6 as an anchor is that we had previously reported overlap between fabp6 and LREs (see Fig. 6C-E in Wen et al. PMID: 34301599) and thus were able here to define fabp6’s spatial pattern in relation to other LRE markers and the neighboring ileocyte population using transgenic markers and HCR. Thus, far from being a circular argument, using fabp6 allowed us to identify other markers that are differentially expressed between anterior and posterior LREs, which share a core program that we highlight in our study. In the revised manuscript we will clarify this point.

      We will also add the analysis suggested for the 16S rRNA gene sequencing data, include statistics on beta dispersal, and expand the discussion of these data as suggested.
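
      As an illustration of the kind of within-group spread statistic discussed above, the following is a minimal sketch, not the authors' analysis. It computes Bray-Curtis dissimilarity between abundance vectors and uses the mean pairwise dissimilarity within a group as a simple proxy for beta dispersion. The function names and the toy count vectors are hypothetical, and a formal test (for example a permutation test on distances to the group centroid) would be needed to assess significance.

```python
from itertools import combinations
from statistics import mean


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def mean_pairwise_dissimilarity(samples):
    """Mean Bray-Curtis dissimilarity over all sample pairs in one group.

    This is a simple proxy for beta dispersion (an assumption of this
    sketch), not the centroid-based statistic used by formal tests.
    """
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return mean(bray_curtis(a, b) for a, b in pairs)


# Hypothetical toy taxon counts for two treatment groups.
tight_group = [[10, 10, 0], [10, 10, 0], [9, 11, 0]]
dispersed_group = [[20, 0, 0], [0, 20, 0], [0, 0, 20]]

print(mean_pairwise_dissimilarity(tight_group))
print(mean_pairwise_dissimilarity(dispersed_group))
```

      A group whose samples share similar composition yields a value near 0, while a group whose samples share no taxa yields 1.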

      Reviewer #2 (Public review): 

      Summary: 

      The authors set out to determine how the microbiome and host genotype impact host protein-based nutrition. 

      Strengths: 

      The quantification of protein uptake dynamics is a major strength of this work and the sensitivity of this assay shows that the microbiome and even mono-associated bacterial strains dampen protein uptake in the host by causing down-regulation of genes involved in this process rather than a change in cell type. 

      The use of fluorescent proteins in combination with transcript clustering in the single cell seq analysis deepens our understanding of the cells that participate in protein uptake along the intestine. In addition to the lysosome-rich enterocytes (LRE), subsets of enteroendocrine cells, acinar, and goblet cells also take up protein. Intriguingly, these non-LRE cells did not show lysosomal-based protein degradation; importantly, however, the transcripts upregulated in these cells include dab2 and cubn, genes shown previously to be essential for protein uptake. 

      The derivation of zebrafish mono-associated with single strains of microbes paired with HCR to localize and quantify the expression of host protein absorption genes shows that different bacterial strains suppress these genes to variable extents. 

      The analysis of microbiome composition, when host protein absorption is compromised in cubn-/- larvae or by reducing protein in the food, demonstrates that changes to host uptake can alter the abundance of specific microbial taxa like Aeromonas. 

      Weaknesses: 

      The finding that neurons are positive for protein uptake in the single-cell data set is not adequately discussed. It is curious because the cldn:GFP line used for sorting does not mark neurons and if the neurons are taking up mCherry via trans-synaptic uptake from EECs, those neurons should be mCherry+/GFP-; yet methods indicate GFP+ and GFP+/mCherry+ cells were the ones collected and analyzed. 

      We thank the Reviewer for the kind and positive assessment of our work, for suggestions to improve the accessibility and clarity of the manuscript, and for pointing out an issue related to a neuronal population that needs further clarification.

      We confirm that there is a population of neurons that express cldn15la (and cldn15la:GFP). They are not easily visualized by microscopy because IECs express this gene at a relatively much higher level. However, the endogenous cldn15la transcript can be found in a recently published dataset (PMID: 35108531) as well as in ours. We will add a Discussion point to clarify this issue.

      Reviewer #3 (Public review): 

      Summary: 

      Childers et al. address a fundamental question about the complex relationship within the gut: the link between nutrient absorption, microbial presence, and intestinal physiology. They focus on the role of lysosome-rich enterocytes (LREs) and the microbiota in protein absorption within the intestinal epithelium. By using germ-free and conventional zebrafishes, they demonstrate that microbial association leads to a reduction in protein uptake by LREs. Through impressive in vivo imaging of gavaged fluorescent proteins, they detail the degradation rate within the LRE region, positioning these cells as key players in the process. Additionally, the authors map protein absorption in the gut using single-cell sequencing analysis, extensively describing LRE subpopulations in terms of clustering and transcriptomic patterns. They further explore the monoassociation of ex-germ-free animals with specific bacterial strains, revealing that the reduction in protein absorption in the LRE region is strain-specific. 

      Strengths: 

      The authors employ state-of-the-art imaging to provide clear evidence of the protein absorption rate phenotype, focusing on a specific intestinal region. This innovative method of fluorescent protein tracing expands the field of in vivo gut physiology. 

      Using both conventional and germ-free animals for single-cell sequencing analysis, they offer valuable epithelial datasets for researchers studying host-microbe interactions. By capitalizing on fluorescently labelled proteins in vivo, they create a new and specific atlas of cells involved in protein absorption, along with a detailed LRE single-cell transcriptomic dataset. 

      Weaknesses: 

      While the authors present tangible hypotheses, the data are primarily correlative, and the statistical methods are inadequate. They examine protein absorption in a specific, normalized intestinal region but do not address confounding factors between germ-free and conventional animals, such as size differences, transit time, and oral gavage, which may impact their in vivo observations. This oversight can lead to bold conclusions, where the data appear valuable but require more nuance. 

      The sections of the study describing the microbiota or attempting functional analysis are elusive, with related data being overinterpreted. The microbiome field has long used 16S sequencing to characterize the microbiota, but its variability due to experimental parameters limits the ability to draw causative conclusions about the link between LRE activity, dietary protein, and microbial composition. Additionally, the complex networks involved in dopamine synthesis and signalling cannot be fully represented by RNA levels alone. The authors' conclusions on this biological phenomenon based on single-cell data need support from functional and in vivo experiments. 

      We thank the reviewer for their assessment and for pointing out some areas that need to be explained better and/or discussed further.

      The reviewer mentions some potential confounding factors (i.e., size differences, transit time, oral gavage) in the gnotobiotic experiments. We would like to convey that these aspects have been addressed in our experimental design and will be clarified in full in the revised manuscript by adding information to Methods or by adding data statements. Briefly: (1) Larval sizes were recorded and found to be similar between GF and monoassociated larvae; a statement will be added to the text. (2) While intestinal transit time has been reported to be affected by microbes in larval zebrafish (PMIDs: 16781702, 28207737, 33352109) and is a topic of interest, it does not represent a confounding factor for our experiments. In our assay, luminal cargo is present at high concentrations throughout the gut and is not limiting at any point during the assay. (3) Gavage, which is necessary for quantitative assays, is indeed an experimental manipulation that may somehow alter the subjects (the same is true for microscopy and virtually any research method). However, any potential effects of gavage manipulation would not explain differences between GF and CV animals or alter our conclusions about microbial or dietary effects. We will elaborate on this in the revised Discussion.

      We acknowledge that microbiota composition is prone to relatively high degrees of interindividual and interexperimental variation, and that measuring microbiota composition using 16S rRNA gene sequencing is accompanied by inherent technical limitations such as limited taxonomic resolution, primer bias, etc. It is important to note that comparable assays such as shotgun metagenomic DNA sequencing are not currently suitable for samples such as larval zebrafish or their dissected digestive tracts where the relative superabundance of host DNA prevents adequate coverage of microbial DNA. However, 16S rRNA gene sequencing remains a mainstream assay in the larger microbial ecology field and has proven effective at revealing important impacts of environmental factors on the gut microbiota (PMIDs: 21346791, 31409661, 31324413). Our results here also illustrate how 16S rRNA gene sequencing can be a useful method to detect perturbations to the zebrafish gut microbiome. Reproducing previous findings, we detected in our samples many of the core zebrafish microbiota taxa that have been identified by other studies (PMIDs: 26339860, 21472014, 17055441). To increase the robustness of our results, we included several biological replicates for each condition, co-housed genotypes and included large sample sizes to minimize environmental variation between groups. Importantly, replicates housed in different tanks showed similar results. We will emphasize these points in the revised Discussion. To further underscore this in the revised manuscript, we will add a beta diversity plot and statistical analysis showing that the microbiome was not significantly affected by our experimental replicates.

      Regarding dopamine pathways, we thank the reviewer for pointing out that the language we used in our interpretation of this and other pathways enriched in our scRNAseq data was too strong. In the revised manuscript, we will soften those conclusions, and instead indicate that these may be areas worthy of future dedicated investigation.

      Finally, the reviewer mentions the use of inadequate statistical methods for some analyses but without specifying or indicating alternative analyses. Only the need to justify the use of two-way ANOVA was made explicit. On this point, we respectfully disagree and would like to emphasize that we use statistical methods that are standard in the field. We will nevertheless add a justification for the use of two-way ANOVA where appropriate. Briefly, the two-way ANOVA test was used to compare fluorescence profiles of gavaged cargoes or HCR probes at each level along the length of the LRE region. This test accounts for differences in fluorescence between experimental conditions at each level (binned 30 μm areas) along the LRE region (~300 μm). This test allows us to capture differences in fluorescence between experimental conditions while accounting for heterogeneity in the LRE region.
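
      To make the structure of such a test concrete, the following is a minimal sketch of a balanced two-way ANOVA with interaction, with experimental condition as one factor and position bin along the LRE region as the other. It is an illustration under stated assumptions, not the authors' pipeline: the function name, the balanced design, and the toy fluorescence values are all hypothetical, and the sketch reports only F statistics and degrees of freedom (p-values would require an F distribution, which is omitted here).

```python
from statistics import mean


def two_way_anova(data):
    """Balanced two-way ANOVA with interaction.

    data[i][j] is the list of replicate measurements for level i of
    factor A (e.g. condition) and level j of factor B (e.g. position bin).
    Every cell must hold the same number (>= 2) of replicates.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    if n < 2 or any(len(row) != b for row in data) or any(
        len(cell) != n for row in data for cell in row
    ):
        raise ValueError("a balanced design with >= 2 replicates is required")

    cell_means = [[mean(cell) for cell in row] for row in data]
    a_means = [mean(row) for row in cell_means]
    b_means = [mean(cell_means[i][j] for i in range(a)) for j in range(b)]
    grand = mean(a_means)

    # Sums of squares for main effects, interaction, and residual error.
    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_cells = n * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_ab = ss_cells - ss_a - ss_b
    ss_err = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in data[i][j]
    )

    df_a, df_b = a - 1, b - 1
    df_ab, df_err = df_a * df_b, a * b * (n - 1)
    ms_err = ss_err / df_err
    return {
        "F_A": (ss_a / df_a) / ms_err,
        "F_B": (ss_b / df_b) / ms_err,
        "F_AB": (ss_ab / df_ab) / ms_err,
        "df": (df_a, df_b, df_ab, df_err),
    }


# Hypothetical toy data: 2 conditions x 3 position bins x 2 replicates.
toy = [
    [[10, 12], [20, 22], [30, 32]],
    [[5, 7], [10, 12], [15, 17]],
]
print(two_way_anova(toy))
```

      In this layout, the condition effect (F_A) reflects overall differences between conditions, the bin effect (F_B) reflects heterogeneity along the region, and the interaction term (F_AB) reflects whether the condition difference changes with position.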

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, Harpring et al. investigated divisome assembly in Chlamydia trachomatis serovar L2 (Ct), an obligate intracellular bacterium that lacks FtsZ, the canonical master regulator of bacterial cell division. They find that divisome assembly is initiated by the protein FtsK in Ct by showing that it forms discrete foci at the septum and future division sites. Additionally, knocking down ftsK prevents divisome assembly and inhibits cell division, further supporting their hypothesis that FtsK regulates divisome assembly. Finally, they show that MreB is one of the last chlamydial divisome proteins to arrive at the site of division and is necessary for the formation of septal peptidoglycan rings but does not act as a scaffold for division assembly as previously proposed.

      Strengths:

      The authors use microscopy to clearly show that FtsK forms foci both at the septum as well as at the base of the progenitor cell where the next septum will form. They also show that the Ct proteins PBP2, PBP3, MreC, and MreB localize to these same sites suggesting they are involved in the divisome complex.

      Using CRISPRi the authors knock down ftsK and find that most cells are no longer able to divide and that PBP2 and PBP3 no longer localized to sites of division suggesting that FtsK is responsible for initiating divisome assembly. They also performed a knockdown of pbp2 using the same approach and found that this also mostly inhibited cell division. Additionally, FtsK was still able to localize in this strain, however PBP3 did not, suggesting that FtsK acts upstream of PBP2 in the divisome assembly process while PBP2 is responsible for the localization of PBP3.

      The authors also find that performing a knockdown of ftsK also prevents new PG synthesis further supporting the idea that FtsK regulates divisome assembly. They also find that inhibiting MreB filament formation using A22 results in diffuse PG, suggesting that MreB filament formation is necessary for proper PG synthesis to drive cell division.

      Overall the authors propose a new hypothesis for divisome assembly in an organism that lacks FtsZ and use a combination of microscopy and genetics to support their model that is rigorous and convincing. The finding that FtsK, rather than a cytoskeletal or "scaffolding" protein is the first division protein to localize to the incipient division site is unexpected and opens up a host of questions about its regulation. The findings will progress our understanding of how cell division is accomplished in bacteria with non-canonical cell wall structure and/or that lack FtsZ.

      Weaknesses:

      No major weaknesses were noted in the data supporting the main conclusions. However, there was a claim of novelty in showing that multiple divisome complexes can drive cell wall synthesis simultaneously that was not well-supported (i.e. this has been shown previously in other organisms). In addition, there were minor weaknesses in data presentation that do not substantially impact interpretation (e.g. presenting the number of cells rather than the percentage of the population when quantifying phenotypes and showing partial western blots instead of total western blots).

      We agree with the weaknesses identified by the reviewer. We removed the statements in the Results and Discussion that multiple independent divisome complexes can simultaneously direct PG synthesis. We presented the data in Figs. 3-5 as % of the cells in the population, and complete western blots are shown in Supp. Fig. S1.

      Reviewer #2 (Public review):

      Summary:

      Chlamydial cell division is a peculiar event, whose mechanism was mysterious for many years. C. trachomatis division was shown to be polar and to involve a minimal divisome machinery composed of homologues of both divisome and elongasome components, in the absence of a homologue of the classical division organizer FtsZ. In this paper, Harpring et al. show that FtsK is required at an early stage of chlamydial divisome formation.

      Strengths:

      The manuscript is well-written and the results are convincing. Quantification of divisome component localization is well performed, number of replicas and number of cells assessed are sufficient to get convincing data. The use of a CRISPRi approach to knock down some divisome components is an asset and allows a mechanistic understanding of the hierarchy of divisome components.

      Weaknesses:

      The authors did not analyse the role of all potential chlamydial divisome components and did not show how FtsK may initiate the positioning of the divisome. Their conclusion that FtsK initiates the assembly of the divisome is an overinterpretation and is not backed by the data. However, data show convincingly that FtsK, if perhaps not the initiator of chlamydial division, is definitely an early and essential component of the chlamydial divisome.

      The following statement has been included in the Discussion (pg. 16 of the revised manuscript)  “Although we focused our study on a subset of the divisome and elongasome proteins that Chlamydia expresses (bolded in Fig. 6G), our results support our conclusion that chlamydial budding is dependent upon a hybrid divisome complex and that FtsK is required for the assembly of this hybrid divisome. At this time, we cannot rule out that other proteins act upstream of FtsK to initiate divisome assembly in this obligate intracellular bacterial pathogen.”

      We will soon be submitting another manuscript that addresses how FtsK specifies the site of divisome assembly. This work is too extensive to be included in this manuscript.

      Reviewer #3 (Public review):

      Summary:

      The obligate intracellular bacterium Chlamydia trachomatis (Ct) divides by binary fission. It lacks FtsZ, but still has many other proteins that regulate the synthesis of septal peptidoglycan, including FtsW and FtsI (PBP3) as well as divisome proteins that recruit and activate them, such as FtsK and FtsQLB. Interestingly, MreB is also required for the division of Ct cells, perhaps by polymerizing to form an FtsZ-like scaffold. Here, Harpring et al. show that MreB does not act early in division and instead is recruited to a protein complex that includes FtsK and PBP2/PBP3. This indicates that Ct cell division is organized by a chimera between conserved divisome and elongasome proteins. Their work also shows convincingly that FtsK is the earliest known step of divisome activity, potentially nucleating the divisome as a single protein complex at the future division site. This is reminiscent of the activity of FtsZ, yet fundamentally different.

      Strengths:

      The study is very well written and presented, and the data are convincing and rigorous. The data underlying the proposed localization dependency order of the various proteins for cell division is well justified by several different approaches using small molecule inhibitors, knockdowns, and fluorescent protein fusions. The proposed dependency pathway of divisome assembly is consistent with the data and with a novel mechanism for MreB in septum synthesis in Ct.

      Weaknesses:

      The paper could be improved by including more information about FtsK, the "focus" of this study. For example, if FtsK really is the FtsZ-like nucleator of the Ct divisome, how is the Ct FtsK different sequence-wise or structurally from FtsK of, e.g. E. coli? Is the N-terminal part of FtsK sufficient for cell division in Ct like it is in E. coli, or is the DNA translocase also involved in focus formation or localization? Addressing those questions would put the proposed initiator role of FtsK in Ct in a better context and make the conclusions more attractive to a wider readership.

      We will be submitting another manuscript soon that details the conserved domain organization of FtsK from different bacteria, and the role of the various domains of chlamydial FtsK (including the N-terminus and the C-terminal translocase domain) in directing its localization in dividing Chlamydia. We have added text to the discussion (pg. 16 of the revised manuscript) that describes the sequence homology of chlamydial FtsK to FtsK from E. coli.

      Another weakness is that the title of the paper implies that FtsK alone initiates divisome assembly. However, the data indicate only that FtsK is important at an early stage of divisome assembly, not that it is THE initiator. I suggest modifying the title to account for this--perhaps "FtsK is required to initiate....".

      We agree with the reviewer and modified the title to “FtsK is Critical for the Assembly of the Unique Divisome Complex of the FtsZ-less Chlamydia trachomatis”. We have also modified the text throughout to indicate that FtsK is required for the assembly of the hybrid divisome of Chlamydia.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improvement (mostly minor):

      (1) For several of the graphs, the authors plot the number of cells with a given phenotype on the y-axis, but then describe percentages of cells in the text. It would make it clearer if all the graphs had the percentage of cells on the y-axis instead.

      We have modified the figures to indicate the percentage of cells on the y-axis with a given phenotype.

      (2) In Figures 3, 4, and 5 the authors show separate graphs for plus/minus drug or inducer. These should be on the same graph as they are directly comparing these two different conditions. Having them on separate graphs makes it less clear whether these differences are significant between the two conditions.

      We modified Fig. 4 to show +/- inducer in ftsK and pbp2 knockdown strains in the same graph. Regarding Figures 3 and 5, we believe the figures in the original submission effectively demonstrate the +/- drug conditions, so these figures remain unchanged in the revised manuscript.

      (3) In Figure 2 the authors show microscopy of the colocalization of FtsK with several other divisome proteins from Ct. Quantification of the colocalization of FtsK with these other proteins would provide a more holistic understanding of their colocalization and help further support their argument that FtsK initiates the assembly of the divisome.

      Supp. Fig. S4A of the revised manuscript contains images showing the colocalization of FtsK with the fusions at the septum and the base of dividing cells, and the colocalization of FtsK with the fusions that are only at the base of dividing cells. Supp. Fig. S4B quantified the percentage of dividing cells where FtsK overlaps the localization of each of the fusions at the septum, at the septum and the base, and at the base alone.

      (4) In Figure 6 the authors mention that the PG ring was at a slight angle relative to the MOMP-stained septum. What is the significance of this? The authors mention it several times but do not explain its relevance to divisome assembly. It is not really evident in the images presented.

      We mention in the Discussion (pgs. 17-18 of the revised manuscript) that “The relevance of the angled orientation of PG and MreC rings relative to the MOMP-stained septum in division intermediates is unclear. However, it appears to be a conserved feature of the cell division process and may arise because the divisome proteins are often positioned slightly above or below the plane of the MOMP-stained septum. The positioning of divisome proteins above or below the septum is indicated in Figs. 1 and 2.”

      We included cartoons in Fig. 6C of the revised manuscript to assist the reader in visualizing the angled orientation of the PG ring relative to the MOMP-stained septum.

      (5) In line 270 the authors claim that "these are the first data in any system to suggest that septal PG synthesis/modification is simultaneously directed by multiple independent divisome complexes." However, their experiments do not demonstrate that multiple divisome complexes are active at the same time. They show that multiple foci of FtsK etc. are present at sites where PG synthesis has occurred, but that does not necessarily mean that each focus/complex was actively synthesizing PG at the same time. Moreover, similar approaches were used to support a claim that septal PG synthesis is directed by multiple discrete divisome complexes previously (e.g. in Figure 1 of Bisson-Filho et al. 2017 (PMID: 28209898) in Bacillus subtilis and in Perez et al 2021 (PMID: 33269494) in Streptococcus pneumoniae). This claim is not central to the main conclusions of the study and could just be removed.

      This statement has been removed from the Results and the Discussion.

      (6) In Figure 6B the authors see three distinct FtsK foci. Why is this the only place in the manuscript where they see three foci? They mentioned previously that they saw foci at the septum and at the base of the progenitor mother cell, but why are there three foci here?

      The vast majority of dividing cells displayed one focus at the septum and/or the base. Representative images were chosen that reflected the localization profiles observed in the majority of cells. While we observed cells with multiple foci, as shown in Figure 6C, these cells were relatively rare (~2% of cells for all the divisome proteins in 3 independent experiments). Since cells with multiple foci were relatively rare, we chose to group them with the cells that had single foci in the septum, septum and base, or base alone categories in the quantification shown in Fig. 2C. This is stated in the legend of Fig. 2 of the revised manuscript.

      (7) The Discussion section is lacking a couple of things that would put the data in a broader context. Can the authors speculate on how FtsK knows how to find the division site? I.e. what might be upstream of FtsK localization? Additionally, the authors do not talk about the FtsK sequence or domains at any point in the paper. Does Ct FtsK have a similar sequence/structure to FtsKs from other bacteria? Are there any differences in sequence/structure that might tell us about its function in Ct?

      We will be submitting another manuscript soon that examines how the site of assembly of the divisome is defined in dividing Chlamydia. This manuscript will also define the localization of the different sub-domains of chlamydial FtsK during cell division.  For this manuscript, we added a paragraph in the Discussion (pg. 16 of the revised manuscript) that states the domain organization is conserved in FtsK proteins from different bacteria. This paragraph includes information regarding the % sequence identity of the C-terminus and the N-terminus of chlamydial FtsK when compared to E. coli FtsK.

      (8) For Supplementary Figure S1B-C. The authors should show the full blots rather than just the single band of the protein of interest to show that the antibodies are specific. Additionally, the authors should include a loading control to show that they loaded the same amount of protein for each sample.

      We have included the full blots in Supp. Fig. S1 of the revised manuscript. We do not see the need for including a loading control for these blots because we are not making arguments about the relative level of the proteins that were assayed. We only use the blots to show that the fusion proteins are primarily a single species of the predicted molecular mass.

      (9) In Supplementary Figure S4A the authors use RT-qPCR to measure ftsK and pbp2 transcript levels. Since they have antibodies against these proteins, they should also include Western blots to show that the proteins are not being produced when targeted using CRISPRi.

      We have included data in Supp. Fig. S5E of the resubmission that indicate foci of FtsK and PBP2 could not be detected following the knockdown of ftsk and pbp2. We feel that these data support our conclusion that the induced expression of dCas12 in the ftsk and pbp2 knockdown strains results in the downregulation of the endogenous FtsK and PBP2 polypeptides.

      (10) In lines 261-262 the authors say that "PG organization was the same or differed at the septum." What is the PG organization being compared to? Same or different from what?

      We agree with the reviewer that the text in lines 261-262 in the original submission was confusing.  The text has been modified.

      (11) Lines 201-215 the authors refer to Supplementary Figure S3 throughout this section, but they should refer to Supplementary Figure S4.

      This has been corrected.

      Reviewer #2 (Recommendations for the authors):

      I am not convinced that this paper shows that FtsK initiates the assembly of the divisome since the authors did not analyse the role and localization of all other chlamydial divisome components. Out of the ten homologues of divisome and elongasome components encoded by C. trachomatis genome, only five are investigated in this study. There is no explanation about how these five were chosen.

      We state on pg. 16 of the revised manuscript that “Although we focused our study on a subset of the divisome and elongasome proteins that Chlamydia expresses (bolded in Fig. 6G), our results support our conclusion that chlamydial budding is dependent upon a hybrid divisome complex and that FtsK is required for the assembly of this hybrid divisome. At this time, we cannot rule out that other proteins act upstream of FtsK to initiate divisome assembly in this obligate intracellular bacterial pathogen.”

      Results convincingly indicate that FtsK is an early divisome component, but proofs are lacking to indicate that it initiates the divisome formation. Indeed, the authors do not show how FtsK would be the first protein to selectively accumulate at a given location to initiate the divisome formation. For this reason, the model they propose at the end of their study is not backed by sufficient data, to my opinion.

      We agree with the reviewer that our data do not show that FtsK initiates divisome assembly. The title of the manuscript has been modified to “FtsK is Critical for the Assembly of the Unique Divisome Complex of the FtsZ-less Chlamydia trachomatis” and the text throughout has been modified to indicate that FtsK is the first protein we assayed that associates with nascent divisomes at the base of dividing cells. We will soon be submitting another manuscript that details how FtsK is recruited to a specific site to initiate nascent divisome assembly. This work is too extensive to be included in this manuscript.

      There are also discrepancies in the number of cells analysed to quantify the localization of divisome components, ranging from 50 to 250 cells. The authors could better explain why there are such variations.

      There were differences in the number of cells analyzed in the various experiments, but in every instance the effect of inhibitors (A22 and mecillinam) or ftsk and pbp2 knockdown on divisome assembly was statistically significant.

      There are a few mistakes in the text regarding figure numbering (Figure S4 is mentioned as S3 in the text). Figures 5B and D are not specifically cited.

      These mistakes have been corrected in the revised manuscript.

      Line 261-262: the sentence starting "Our imaging analysis.." is not clear to me.

      We agree with the reviewer that the text in lines 261-262 was confusing.  The text has been modified (pg. 14 of the revised manuscript).

      Line 270-271: there are insufficient proofs to say that there are multiple independent divisome complexes. This is in my opinion an overinterpretation of the data, since there is no proof that these complexes are independent.

      This statement has been removed from the text.

      A few details are lacking in the figure legends:

      Figure 2C: when was the expression of the different mCherry and 6xHis constructs induced?

      The onset and length of the induction of the fusions have been included in the legend of Fig. 2.

      Bars are sometimes mentioned as uM and should be um. Bars sizes, number of replicates, and/or meaning of the error bars are lacking in legends of Figures S2, S3, and S4

      This has been corrected in the revised manuscript.

      The consistency of Figures could be improved between Figures 3A, 4A, B, and 5A. The results of treated cells could be always shown as dark grey. It would help the reader.

      We have used consistent coloring in Figs. 3-5 to indicate the treated cells.

      Reviewer #3 (Recommendations for the authors):

      (1) Lines 113-118: do Ct cells increase in size as they get closer to starting division? If so, could a pseudo-time course (demograph) be done to bolster the evidence that the base foci formed mainly in predivisional cells and not newborn cells? This evidence might be more convincing than the data in Figures 1F and G.

      Chlamydial cells in the population were heterogeneous in size at the timepoint we are studying. This observation is consistent with previous reports in the literature (Liechti et al., 2021). While we agree that a pseudo-time course could potentially bolster the evidence about when FtsK foci appear, we believe our current analysis sufficiently demonstrates that basal foci of FtsK appear prior to the appearance of new buds at the base of dividing cells.

      (2) Figure 3E: It looks like MreC localization to foci doesn't strictly require MreB polymerization. Is this known for E. coli or other species?

      To our knowledge, MreC assembly into a filament has not been shown to be dependent upon MreB in other bacteria.  In Caulobacter crescentus, MreC forms a helical structure that is not dependent upon MreB or MreB filament formation (Dye et al., 2005. PNAS; Divakaruni et al., 2005. PNAS).

      (3) Figure 5E: why is nearly half of PBP2 and PBP3 still localized to foci at the membrane even after treatment with mecillinam? This suggests, as the authors mention, that mecillinam reduces the efficiency of localization to the divisome but does not eliminate it. Any ideas why?

      At this time, we do not know why inhibiting the catalytic activity of PBP2 with mecillinam does not fully prevent the association of PBP2 with the chlamydial divisome. We have included a statement in the Results (pg. 13 of the revised manuscript) that inhibiting the catalytic activity of PBP2 prevents it from efficiently associating with or maintaining its association with polarized divisome complexes.

      (4) Line 262-263: This sentence is confusing-please rephrase. The same as what? Differed from what?

      We agree with the reviewer. The wording in lines 262-263 of the original submission has been modified.  

      (5) Lines 265-267 and Figure 6: Adding cartoon schematics might help readers visualize cell orientations in Fig. 6 (especially 6B).

      Cartoons have been added to Fig. 6C (Fig. 6B in the original submission) to orient the reader.

      (6) Line 294-298: Do the authors think that the residual 5-10% of PG foci after FtsK knockdown is due to the ability of residual FtsK to organize divisomes?

      We show that knockdown of FtsK is not complete, and while we cannot be certain, it is likely that the PG foci detected in FtsK knockdown cells are due to the ability of the residual FtsK to organize divisomes that direct PG synthesis.

      (7) Do the authors have any evidence that FtsK foci are mobile like treadmilling FtsZ?

      We have not performed real-time imaging studies, and we currently have no evidence that FtsK foci are mobile.

      (8) FtsK foci here are reminiscent of mobile foci formed by the FtsK-like SpoIIIE at the Bacillus subtilis sporulation septum. This might be a good idea to mention in the Discussion. Is it possible that Ct FtsK is also involved in coordinating chromosome partitioning through the developing septum? (That is another reason why it would be useful to know if the translocase domain was dispensable for localization/activity).

      We are currently preparing another manuscript that documents the contribution of the various domains of FtsK to its localization profile and whether the division defect in ftsk knockdown cells can be suppressed by specific subdomains of FtsK. This manuscript not only will include these data, it will also include experiments that address how the site of polarized budding is defined. In the revised manuscript, we have included a description of how the domain organization of chlamydial FtsK is similar to E. coli FtsK (pg. 16 of revision). Chlamydial FtsK also has a similar domain organization as SpoIIIE from B. subtilis. The C-terminal catalytic domain of SpoIIIE is 45% identical to chlamydial FtsK. The N-terminus of SpoIIIE is predicted to encode 4 transmembrane spanning helices, like chlamydial FtsK. However, the N-terminus of SpoIIIE shares no sequence homology with the N-terminus of chlamydial FtsK.  We have not included the similar domain organization of SpoIIIE and chlamydial FtsK in the revised manuscript.

      (9) It seems that FtsK foci localize to a particular spot opposite from the active septum, although how this spot is specified is not clear. Is there any geometric clue for FtsK's localization like there is for Min-specified FtsZ localization?

      As mentioned above, we are currently preparing another manuscript that documents our efforts to understand how the site of polarized budding is defined.  This analysis is too extensive to include in this study.

      (10) As mentioned in the Summary, do the authors know whether the N-terminal membrane binding part of FtsK (FtsKn) is sufficient for localization/divisome assembly in Ct as it is in other species? Ouellette et al. 2012 showed that FtsKn could interact with MreB in BACTH.

      We are currently preparing another manuscript that documents the contribution of the various domains of FtsK to its localization profile.

      (11) The previous BACTH result with MreB and FtsKn implies that this interaction is direct, yet the current data suggest that this is not the case. Can the authors comment on this? Is this due to bridging effects inherent in the BACTH system?

      We have not presented any data to indicate that FtsK and MreB do not interact. We have only shown that FtsK localization is not dependent upon MreB filament formation (Fig. 3).

      (12) The FtsZ-independent role of FtsK in nucleating the divisome suggests that Ct FtsK may differ from other FtsKs structurally - can this be explored, perhaps with AlphaFold 3?

      As mentioned above, we have included a paragraph in the discussion of the revised manuscript (pg. 16 of the revised manuscript) that states the domain organization of chlamydial FtsK is similar to E. coli FtsK. This conserved domain organization is evident when we view the structures of the proteins using AlphaFold.

      (13) Typo on line 559: should be HeLa.

      This has been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a comprehensive exploration of the role of liver-specific Survival Motor Neuron (SMN) depletion in peripheral and central nervous system tissue pathology through a well-constructed mouse model. This study is pioneering in its approach, focusing on the broader physiological implications of SMN, which has traditionally been associated predominantly with spinal muscular atrophy (SMA).

      Strengths:

      (1) Novelty and Relevance: The study addresses a significant gap in understanding the role of liver-specific SMN depletion in the context of SMA. This is a novel approach that adds valuable insights into the multi-organ impact of SMN deficiency.

      (2) Comprehensive Methodology: The use of a well-characterized mouse model with liver-specific SMN depletion is a strength. The study employs a robust set of techniques, including genetic engineering, histological analysis, and various biochemical assays.

      (3) Detailed Analysis: The manuscript provides a thorough analysis of liver pathology and its potential systemic effects, particularly on the pancreas and glucose metabolism.

      (4) Clear Presentation: The manuscript is well written. The results are presented clearly with well-designed figures and detailed legends.

      We thank the reviewer for their positive comments. They had some concerns for us to consider (see below). We provide a point-by-point response to their comments.

      Weaknesses:

      (1) Limited Time Points: The study primarily focuses on a single time point (P19). This limits the understanding of the temporal progression of liver and pancreatic pathology in the context of SMN depletion. Longitudinal studies would provide a better understanding of disease progression.

      We thank the reviewer for the suggestion. We extended our analysis to include P60 mice and performed both liver and pancreatic analyses at this time point to address this suggestion.

      (2) Incomplete Recombination: The mosaic pattern of Cre-mediated excision leads to variability in SMN depletion, which complicates the interpretation of some results. Ensuring more consistent recombination across samples would strengthen the conclusions.

      The variability in Cre-mediated excision is inherently stochastic, influenced by factors such as Cre expression levels, timing of recombination, and the accessibility of the target locus in individual cells. Achieving complete consistency across samples is particularly challenging, especially given the complexity of our breeding scheme, which occasionally results in litters without any animals of the desired genotype. Importantly, our study not only demonstrates that liver-specific SMN depletion results in liver alterations and pancreatic dysfunction but also highlights the limitations and challenges associated with this mouse model. By doing so, we aim to provide valuable insights for other researchers considering similar approaches in future studies.

      Reviewer #2 (Public review):

      Summary:

      Marylin Alves de Almeida et al. developed a novel mouse cross via conditionally depleting functional SMN protein in the liver (AlbCre/+;Smn2B/F7). This mouse model retains a proportion of SMN in the liver, which better recapitulates SMN deficiency observed in SMA patients and allows further investigation into liver-specific SMN deficiency and its systemic impact. They show that AlbCre/+;Smn2B/F7 mice do not develop an apparent SMA phenotype as mice did not develop motor neuron death, neuromuscular pathology or muscle atrophy, which is observed in the Smn2B/- controls. Nonetheless, at P19, these mice develop mild liver steatosis, and interestingly, this conditional depletion of SMN in the liver impacts cells in the pancreas.

      Strengths:

      The current model has clearly delineated the apparent metabolic perturbations which involve a significantly increased lipid accumulation in the liver and pancreatic cell defects in AlbCre/+;Smn2B/F7 mice at P19. Standard methods like H&E and Oil Red-O staining show that in AlbCre/+;Smn2B/F7 mice, their livers closely mimic the livers of Smn2B/- mice, which have the full body knockout of SMN protein. Unlike previous work, this liver-specific conditional depletion of SMN is superior in that it is not lethal to the mouse, which allows an opportunity to investigate the long-term effects of liver-specific SMN on the pathology of SMA.

      We thank the reviewer for their positive comments. They had some concerns for us to consider (see below). We provide a point-by-point response to their comments (review comments in black, our response in red).

      Weaknesses:

      Given that SMA often involves fatty liver, dyslipidemia and insulin resistance, using the current mouse model, the authors could have explored the long-term effects of liver-specific depletion of SMN on metabolic phenotypes beyond P19, as well as systemic effects like glucose homeostasis. Given that the authors also report pancreatic cell defects, the long-term effect on insulin secretion and resistance could be further explored. The mechanistic link between a liver-specific SMN depletion and apparent pancreatic cell defects is also unclear.

      We extended our analysis to include P60 mice and performed both liver and pancreatic analyses at this time point to address this suggestion. In addition, we discussed the liver-pancreas axis in the Discussion.

      Discussion:

      This current work explores a novel mouse cross in order to specifically deplete liver SMN using an Albumin-Cre driver line. This provides insight into the contribution of liver-specific SMN protein to the pathology of SMA, which is relevant for understanding metabolic perturbations in SMA patients. Nonetheless, given that SMA in patients involves a systemic deletion or mutation of the SMN gene, the authors could emphasize the utility of this liver-specific mouse model, as opposed to using in vitro models, which have been recently reported (Leow et al, 2024, JCI). Authors should also discuss why a mild metabolic phenotype is observed in this current mouse model, as opposed to other SMA mouse models described in literature.

      We appreciate the reviewer’s insightful comment. We have thoroughly addressed this suggestion in the Discussion section, particularly in lines 284-298; 309-322 and 334-359.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Longitudinal Studies: Conducting studies at maybe one more time points postnatally to provide a clearer picture of how liver-specific SMN depletion affects tissue pathology over time.

      We extended our analysis to include P60 mice and performed both liver and pancreatic analyses at this time point to address this suggestion.

      (2) Functional Assays: Incorporate glucose tolerance tests, insulin sensitivity tests, and more detailed metabolic profiling to better understand the physiological consequences of liver-specific SMN depletion on glucose metabolism and pancreatic function.

      We sincerely thank the reviewer for this suggestion. We have included a full panel of metabolic hormones associated with glucose metabolism from animals at P19 and P60. These new data, along with additional figures, have now been provided in our revised manuscript.

      (3) Mechanism: Discuss the molecular pathways affected by SMN depletion in the liver and pancreas. Mechanistic studies including transcriptomic or proteomic analyses to identify dysregulated pathways will help.

      We appreciate the reviewer’s insightful comment. We have thoroughly addressed this suggestion in the Discussion section, particularly in lines 284-298 and 334-359.

      (4) Typos in the abstract: beta cells secret insulin and alpha cells produce gulcagon. 

      Thank you for catching this error. It has been corrected to reflect that insulin is produced by beta cells and glucagon by alpha cells.

      (5) Efficiency and specificity of the Alb-Cre: if possible, cross the Alb-Cre with the Rosa26 reporter line to test the efficiency and specificity of the Alb-Cre.

      We agree that this would provide valuable insights. However, initiating a new breeding program to generate the required genotypes would take over a year and is beyond the scope of this study. To address this in part, we performed Cre immunostaining of the liver, pancreas, and spinal cord at P19, as well as the liver at P60. These results, now included in Supplemental Figure 1, demonstrate liver-specific expression and variability across hepatocytes.

      Reviewer #2 (Recommendations for the authors):

      The title of this manuscript is potentially misleading. The manuscript largely investigates the involvement of SMN protein on peripheral organs such as the liver, muscles, neuromuscular junction, and the pancreas. Yet, the title could be interpreted that the peripheral nervous system or central nervous system is the main focus. The title should be edited to indicate key terms such as "motor neuron and peripheral tissue pathology".

      Thank you for pointing this out. We have revised the title to better represent the study’s focus. It is now “Impact of liver-specific survival motor neuron (SMN) depletion on central nervous system and peripheral tissue pathology”.

      Suggestions:

      Please clarify and explain clearly the various mouse lines (Smn2B/+, Smn2B/- and +/+; Smn2B/F7 ) used as controls as the nomenclature used is confusing. In addition, authors could consider the use of a wild-type mouse line to be used as a control to validate changes in AlbCre/+;Smn2B/F7 mice.

      We have now provided clarification on mouse line nomenclature in the Results section (lines 104–124). Full-body heterozygous mice (Smn2B/+) are used as controls due to their slightly reduced SMN protein levels and absence of phenotypic changes compared to wild-type mice.

      Given that the main phenotype implicated by the liver-specific depletion of SMN protein in AlbCre/+;Smn2B/F7 mice is pancreatic abnormalities (changes in alpha- and beta- cell numbers and blood glucose levels), authors should expand further on the pancreatic phenotype.  

      We added a full panel of metabolic hormones related to glucose metabolism in animals at P19 and P60. Furthermore, this has been discussed in detail in lines 284-298 and 334-344 of the Discussion.

      A pancreas-specific depletion of SMN would provide this current manuscript with a better understanding of the role of SMN in regulating SMA pathology and provide more definitive conclusions on the contribution of liver-specific SMN depletion on normal pancreatic function.

      We agree that this would be very informative. However, to do this would require initiation of a new breeding program that will take more than a year to arrive at the right genotypes. Although valuable, it is beyond the scope of the present study.

      The authors should also delineate the role of hepatic SMN in pancreatic function, and how the intrinsic liver-specific loss of SMN directly impacts the pancreas. Currently, literature demonstrates that the fatty liver phenotype in SMA patients is a primary SMN-dependent hepatocyte-intrinsic liver defect associated with mitochondrial and other hepatic metabolism implications (see Leow et al, 2024 J Clin Invest). Given that the authors describe that SMN protein levels are not altered in the pancreas of AlbCre/+;Smn2B/F7 mice at P19, the authors ought to clarify how pancreas development and function is impacted in this mouse model, whether in-utero or postnatally. This could potentially underscore the cross-talk between liver SMN and pancreas function.

      We have discussed the relationship between hepatic SMN and pancreatic function in the Discussion at lines 284-298 and 334-359.

      Authors should also perform some metabolic tolerance tests to both oral glucose and insulin at an older age (e.g. P60) to study their homeostasis in these mice. These would help to substantiate the authors' conclusion and provide the paper with a greater level of novelty.

      We thank the reviewer for this suggestion. A full panel of metabolic hormones related to glucose metabolism at P19 and P60 has been included, supported by additional figures that enhance the manuscript's novelty and depth.

      Authors mentioned in the Discussion in lines 238 to 240: "Altogether, our findings underscore the necessity of conducting further investigations at later time points to unveil potential modifications in other pathways and their repercussions on liver physiology". Please elucidate the effects of longer term liver-specific depletion of SMN beyond P19, such as the onset of NAFLD or a diabetic phenotype due to pancreatic dysfunctions.

      We extended our data to include P60 mice and performed liver and pancreatic analyses at this time point. The observed effects were transient, possibly due to the stochastic nature of Cre expression.

      In addition, while AlbCre/+;Smn2B/F7 mice had similar weight gain trends as controls, it does appear that AlbCre/+;Smn2B/F7 mice weigh more than their controls by P60 (Figure 9C). This data would provide more convincing evidence of the metabolic defects observed in these mice.

      As per the reviewer’s suggestion, we included new data (Figure 9D) showing % weight gain at P60 normalized to basal weight at P7. However, no statistically significant differences were detected.

      Other than protein quantification, authors should perform immunohistochemistry or in-situ hybridization of SMN and imaging of AlbCre/+;Smn2B/F7 organs to validate the loss of liver-specific SMN. It is unclear from western blots that the expression of SMN is only in hepatocytes.

      We thank the reviewer for the suggestion. Unfortunately, SMN antibodies have not produced reliable tissue immunostaining. To address this, we performed Cre immunostaining of the liver, pancreas, and spinal cord at P19, and the liver at P60, which demonstrated liver-specific expression. These results are now included in Supplemental Figure 1.

      Authors should consider re-wording lines 228 through 231: "While our current analysis did not reveal significant differences in AlbCre/+;Smn2B/F7 mice, the observed upward trend in transferrin and HO levels suggests ongoing changes in iron metabolism, which may not be fully manifested at P19". Alternatively, a higher number of mouse samples would allow them to qualify this statement. Authors should also consider comparing levels of liver biomarkers such as ALT and AST, to check for liver homeostatic function.

      We have removed speculative statements to avoid unsupported claims.

      Recommendations:

      The methods and additional details to generate the AlbCre/+;Smn2B/F7 should be explained better in section 2.1 of the Results. It is potentially confusing as to why these mice had to carry both 2B and F7 alleles. Additionally, the role of the F7 allele is not deliberately clear in the Introduction.

      Additional details are now included in the Introduction (lines 87-90) and the Results section (lines 104-124).

      Authors should refer to Leow et al 2024 (J Clin Invest) and discuss how their current findings compare with their hepatocyte-intrinsic SMN deficiency IPSCs model.

      We note a previous publication (Deguise et al 2021 Cell Mol Gastroenterol Hepatol) by the authors which characterized the Smn2B/- mouse model and its NAFLD/NASH features. From our understanding, the Smn2B/- mouse model appears to recapitulate SMA phenotype well, such as the early onset of hepatic steatosis and neurological conditions. As a follow-up to this publication, authors should discuss why this current study of a liver-specific SMN depletion is important and relevant to the study of SMA pathology.

      We thank the reviewer for the insightful suggestions. We have included a discussion of these findings and their relevance to the study of SMA pathology in lines 284-298 and 309-322.

      Minor corrections:

      Abstract (line 32) reads: "a decrease in insulin producing alpha-cells and an increase in glucagon producing beta-cells". The authors should clarify and correct as insulin producing beta-cells and glucagon producing alpha-cells.

      Thank you for catching the error. We corrected the description of insulin- and glucagon-producing cells.

      Please clarify the number and gender of mice used for weight tracking and motor function experiments up to P60 (Figure 9C). It would be inappropriate if male and female mice were plotted together. If so, authors should stratify data by gender.

      We thank the reviewer for the suggestion. Unfortunately, we did not stratify the animals by sex due to the unequal and insufficient number of males and females in our study. To address this, we normalized weight gain to each animal’s starting weight, and no significant differences were observed (now shown in Figure 9D).

      The number of figures should be reduced. We recommend merging Figures 1 and 2 (generation of AlbCre/+;Smn2B/F7 mouse line and validation) and Figures 3 and 4 (liver function). Figures 5 through 9 may be supplemental figures instead.

      We thank the reviewer for the suggestions. We merged Figures 1 and 2, and Figures 3 and 4, as requested. However, we would prefer to keep the other figures within the main results as they assess the impact of liver-specific depletion of SMN on other pathologies within the mouse model.

      Standardize the use of asterisks and reporting p-values in Figure 2. All other figures in the manuscript utilize asterisks, but Figures 2C', 2D' and 2E' use p-values across comparisons.

      P-values were included only when they approached statistical significance, providing additional clarity to the results.

      It is unclear what the white arrow in Figure 7A indicates.

      It is meant to point out the absence of an innervating axon. Please see Figure 5 legend, lines 801-802.

      Note spelling errors in Figures 8B and 8C: 'Muscle flber'.

      Thank you for catching this. We have corrected the typo to indicate muscle fiber instead.

      Please clarify if muscle fiber size should be indicated as µm2 instead of µ2 in Figures 8B and 8C, as written in Materials and Methods under line 394.

      Thank you for catching this. We corrected the typo to indicate µm2 instead.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) The overall conclusion, as summarized in the abstract as "Together, our study documents the diversification of locomotor and oculomotor adaptations among hunting teleost larvae" is not that compelling. What would be much more interesting would be to directly relate these differences to different ecological niches (e.g. different types of natural prey, visual scene conditions, height in water column etc), and/or differences in neural circuit mechanisms. While I appreciate that this paper provides a first step on this path, by itself it seems on the verge of stamp collecting, i.e. collecting and cataloging observations without a clear, overarching hypothesis or theoretical framework.

      There are limited studies on the prey capture behaviors of larval fishes, and ours is the first to compare multiple species systematically using a common analysis framework. Our analysis approach could have uncovered a common set of swim kinematics and capture strategies shared by all species; but instead, we found that medaka used a monocular strategy rather than the binocular strategy of cichlids and zebrafish. Our analysis similarly could have revealed first-feeding larvae of all species go through a “bout” stage, which was previously proposed as important for sensorimotor decision making (Bahl et al., 2019), but instead we found that medaka and some cichlids have more continuous swimming from an early life stage. Finally, the rate at which prey capture kinematics evolves is not known. Our approach could have revealed rapid diversification of feeding strategies in cichlids (similarly to how adult feeding behavior evolves), but instead we found smaller differences within cichlids than between cichlids and medaka.

      (2) The data to support some of the claims is either weak or lacking entirely.

      Highlighted timestamps in videos, new stats in fig 1H and fig 2, updated supplementary figures now provide additional support for claims.

      - It would be helpful to include previously published data from zebrafish for comparison.

      We appreciate the suggestion. Mearns et al. (2020) provided a comprehensive account of prey capture in zebrafish larvae in an almost identical setup with similar analyses. We do not feel it is necessary to recount all the findings in that paper here. There are many studies on prey capture in zebrafish from the past 20 years, and reproducing these here would not add anything to that extensive pre-existing literature.

      - Justification is required for why it is meaningful to compare hunting strategies when both fish species and prey species are being varied. For instance, artemia and paramecia are different sizes and have different movement statistics.

      We added text explaining why different food was chosen for medaka/cichlids. There is no easy way to stage match fishes as evolutionarily diverged as cichlids, medaka, and zebrafish. Size is a reasonable metric within a species, but there is no guarantee that size-matched larvae of two different species are at the same level of maturity. Therefore, we thought the most appropriate stage to address is when larvae first start feeding, as this enables us to study innate prey capture behavior before any learning or experience-dependent changes have taken place. Given that zebrafish, medaka and cichlid larvae are different sizes when they first start feeding, it was necessary to study their hunting behavior with different prey items.

      - It would be helpful in Figure 1A to add the abbreviations used elsewhere in the paper. I found it slightly distracting that the authors switch back and forth in the paper between using "OL" and "medaka" to refer to the same species: please pick one and then remain consistent.

      Medaka is the common name for the Japanese rice fish, O. latipes. Cichlids do not have common names and are only referred to by their scientific names. Since readers are more likely to be familiar with the common name, medaka, we now use medaka (OL) throughout the manuscript, which we hope makes the text clearer.

      - The conceptual meaning of behavioral segmentation is somewhat unclear. For zebrafish, the bouts already come temporally segmented. However in medaka for instance, swimming is more continuous, and the segmentation is presumably more in terms of "behavioral syllables" as have been discussed for example mouse or drosophila behavior (in the last row of Figure S1 it is not at all obvious why some of the boundaries were placed at their specific locations). It's not clear whether it's meaningful to make an equivalence between syllables and bouts, and so whether for instance Figure 1H is making an apples-to-apples comparison.

      We clarified the text to say we are comparing syllables, rather than bouts.

      - The interpretation of 1H is that "medaka exhibited significantly longer swims than cichlids"; however this is not supported by the appropriate statistical test. The KS test only says that two probability distributions are different; to say that one quantity is larger than another requires a comparison of means.

      Updated Fig 1H: we now use a bootstrap test (difference of medians) and have replotted the data as violin plots.
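
      For readers unfamiliar with the approach, a bootstrap comparison of medians can be sketched as follows. This is a minimal illustration rather than the analysis code used for Fig 1H: the function name, the synthetic duration data, the number of resamples, and the use of a percentile confidence interval are all assumptions.

```python
import numpy as np


def bootstrap_median_diff(a, b, n_boot=5000, seed=0):
    """Bootstrap the difference of medians, median(a) - median(b).

    Hypothetical helper: each group is resampled with replacement,
    and a 95% percentile interval of the difference is returned.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = np.median(a) - np.median(b)
    # Each row is one bootstrap resample of the corresponding group.
    boot_a = rng.choice(a, size=(n_boot, a.size), replace=True)
    boot_b = rng.choice(b, size=(n_boot, b.size), replace=True)
    diffs = np.median(boot_a, axis=1) - np.median(boot_b, axis=1)
    ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])
    return observed, ci_low, ci_high


# Synthetic syllable durations in seconds (assumed values, not real data).
rng = np.random.default_rng(1)
medaka = rng.lognormal(mean=np.log(0.4), sigma=0.3, size=300)
cichlid = rng.lognormal(mean=np.log(0.2), sigma=0.3, size=300)

observed, ci_low, ci_high = bootstrap_median_diff(medaka, cichlid)
print(f"median difference = {observed:.3f} s, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

      If the 95% interval for the difference excludes zero, the medians differ in the indicated direction, which is the directional statement that a KS test alone cannot support.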

      - I think the evidence that there are qualitatively different patterns of eye convergence between species is weak. In Figure 2A I admire the authors addressing this using BIC, and the distributions are clearly separated in LA (the Hartigan dip test could be a useful additional test here). However for LO, NM, and AB the distributions only have one peak, and it's therefore unclear why it's better to fit them with two Gaussians rather than e.g. a gamma distribution. Indeed the latter has fewer parameters than a two-gaussian model, so it would be worthwhile to use BIC to make that comparison. The positions of the two Gaussians for LO, NM, and AB are separated by only a handful of degrees (cf LA, where the separation is ~20 degrees), which further supports the idea that there aren't really two qualitatively different convergence states here.

      Added explanation to text.
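
      The model comparison the reviewer suggests can be sketched as below. This is an illustrative example under stated assumptions, not the authors' analysis: the data are synthetic, the gamma distribution is fitted with its location fixed at zero (so angles must be positive), and the two-Gaussian mixture is fitted with a small hand-written EM loop. All function names are hypothetical.

```python
import numpy as np
from scipy import stats


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values are preferred."""
    return n_params * np.log(n_obs) - 2.0 * log_likelihood


def gamma_bic(x):
    # Location fixed at 0, leaving 2 free parameters (shape, scale).
    shape, loc, scale = stats.gamma.fit(x, floc=0)
    ll = stats.gamma.logpdf(x, shape, loc=loc, scale=scale).sum()
    return bic(ll, 2, x.size)


def two_gaussian_bic(x, n_iter=300):
    # 1D two-component mixture: 2 means + 2 variances + 1 weight = 5 parameters.
    mu = np.percentile(x, [25, 75]).astype(float)
    sigma = np.full(2, x.std())
    w = np.full(2, 0.5)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = w * stats.norm.pdf(x[:, None], loc=mu, scale=sigma)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and standard deviations.
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        sigma = np.sqrt(np.maximum(var, 1e-6))
    mix = (w * stats.norm.pdf(x[:, None], loc=mu, scale=sigma)).sum(axis=1)
    return bic(np.log(mix).sum(), 5, x.size)


# Synthetic angles in degrees (assumed values, not real data).
rng = np.random.default_rng(0)
unimodal = rng.gamma(shape=9.0, scale=2.0, size=1000)
bimodal = np.concatenate([rng.normal(20, 4, 500), rng.normal(45, 4, 500)])
bimodal = bimodal[bimodal > 0]  # the gamma fit with floc=0 needs positive data

bic_uni_gamma, bic_uni_gmm = gamma_bic(unimodal), two_gaussian_bic(unimodal)
bic_bi_gamma, bic_bi_gmm = gamma_bic(bimodal), two_gaussian_bic(bimodal)
print(f"unimodal: gamma {bic_uni_gamma:.1f} vs two Gaussians {bic_uni_gmm:.1f}")
print(f"bimodal:  gamma {bic_bi_gamma:.1f} vs two Gaussians {bic_bi_gmm:.1f}")
```

      Because BIC penalizes each extra parameter by ln(n), the two-Gaussian model is only preferred when its gain in log-likelihood outweighs its three additional parameters.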

      - Figure S2 is unfortunately misleading in this regard. I don't claim the authors aimed to mislead, but they have made the well-known error of using colors with very different luminances in a plot where size matters (see e.g. https://www.r-project.org/conferences/DSC2003/Proceedings/Ihaka.pdf). Thus, to the eye, it appears there's a big valley between the red and blue regions, but actually, that valley is full of points: it's really just one big continuous blob.

      Kernel density estimates of eye convergence angles were added to Figure S2. The point we wish to make is that there is higher density when both eyes are rotated inwards (converged) in cichlids, but not medaka (O. latipes). The valley between converged and unconverged states being full of points is due to (1) slight variation in the placement of key points in SLEAP, which blurs the boundary between states and (2) the eye convergence angle must pass through the valley in order to become converged, so necessarily there are points in between the two extremes of eye convergence.

      - In Figure 2D please could the authors double-check the significance of the difference between LO and NM: they certainly don't look different in the plot.

      Thank you for flagging this. We realize the way we previously reported the stats was open to misinterpretation. We have updated figure 2C, D and F to use letters to indicate statistical groupings, which hopefully makes it clearer which species are statistically different from each other.

      - In Figure 2G it's not clear why AB is not included. It is mentioned that the artemia was hard to track in the AB videos, but the supplementary videos provided do not support this.

      The contrast of the artemia in the AB videos is sufficiently different from the other cichlid videos that our pre-trained YOLO model fails. Retraining the model would be a lot of extra work and we feel like a comparison of three species is sufficient to address the sensorimotor transformations that occur over the course of prey capture in cichlids.

      - The statement "Zebrafish larvae have a unique swim repertoire during prey capture, which is distinct from exploratory swim bouts" is not supported by the work of others or indeed the authors' own work. In Figure 4F all types of bouts can occur at any time, it's just the probability at which they occur that varies during prey capture versus other times (see also Mearns et al (2020) Figure S4B).

      The point is well taken that there probably is not a hard separation between spontaneous and prey capture swims based on tail kinematics alone, which is also shown in Marques et al. (2018). However, we think that figure 2I of Mearns et al., which plots the probability of swims being drawn from different parts of the behavior space during prey capture (eyes converged) or not (eyes unconverged), shows that the repertoire of swims during the two states is substantially different. Points are blue or red; there are very few pale blue/pale red points in that figure panel. Figure S4B is showing clustered data, and clustering is a notoriously challenging problem for which there exists no perfect solution (Kleinberg, 2002). The clusters in Mearns et al. incorporated information about transition structure, as this was necessary for obtaining interpretable clusters for subsequent analyses. However, a different clustering approach could have yielded different boundaries, which may have shown more (or less) separation of bout types during prey capture/exploratory swimming. Therefore, we have updated the text to say that zebrafish preferentially perform different swim types during prey capture and exploration, and re-interpreted the behavior of cichlids similarly.

      - More discussion is warranted of the large variation in the number of behavioral clusters found between species (11-32). First, how much is this variation really to be trusted? I appreciate the affinity propagation parameters were the same in all cases, but what parameters "make sense" is somewhat dependent on the particular data set. Second, if one does believe this represents real variation, then why? This is really the key question, and it's unsatisfying to merely document it without trying to interpret it.

      Extended paragraph with more interpretation.

      - What is the purpose of "hovers"? Why not stay motionless? Could it be a way of reducing the latency of a subsequent movement? Is this an example of the scallop theorem?

      Added a couple of sentences speculating on function.

      - I'm not sure "spring-loaded" is a good term here: the tension force of a coiled tail is fairly negligible since there's little internal force actively trying to straighten it.

      Rewrote this part to highlight that fish spring toward the prey, without the implication that tension forces in the tail are responsible for the movement. However, we are not aware of any literature measuring passive forces within the tail of fishes. Presumably the notochord is relatively stiff and may provide an internal force trying to straighten the tail.

      - There are now several statements for which no direct evidence is presented. We shouldn't have to rely on the author's qualitative impressions of what they observed: show us quantitative analysis.

      * "often hover"

      * "cichlids often alternate between approaches and hover swims"

      * "over many hundreds of milliseconds"

      * "we have also observed suction captures and ram-like attacks"

      * "may swim backwards"

      * "may expel prey from their mouth"

      * "cichlid captures often occur in two phases"

      Added references to supplementary videos with timestamps to highlight these behaviors.

      - I don't find it plausible that sated fish continue hunting prey that they know they're not going to eat just for the practice.

      Removed the speculation.

      - In Figure 3 is it not possible to include medaka, based on the hand-tracked paramecia?

      The videos are recorded at high frame rate, so it would be a lot of additional work to track these manually. Furthermore, earlier in prey capture it is very difficult to tell by watching videos which prey the medaka are tracking, especially as single paramecia can drift in and out of focus in the videos. Since there is no eye convergence, it is very difficult to determine with certainty when tracking of a given prey begins. In Fig 4, it was only possible to track paramecia by hand since it is immediately prior to the strike and from the video it is possible to see which paramecium the fish targeted. Our analysis of heading changes was performed over the 200 ms prior to a strike, which we think is a conservative enough cutoff to say that fish were probably pursuing prey in this window (it is shorter than the average behavioral syllable duration in medaka).

      - Figure 3 (particularly 3D) suggests the interesting finding that LA essentially only hunt prey that is directly in front of them (unlike LO and NM, the distribution of prey azimuth actually seems to broaden slightly over the duration of hunting events).

      This is worthy of discussion.

      We offer a suggestion for the many instances of prey capture being initiated in the central visual field in LA later in the manuscript when we discuss spitting behavior. We have added text to make this point earlier in the manuscript. The increase in azimuthal range at the end of prey capture may be due to abort swims (e.g. supp. vid. 1, 00:21). The widening of azimuthal angles is present in LO and NM also and is not unique to LA.

      - The reference Ding et al (2016) is not in the reference list.

      Wrong paper was referenced. Should be Ding 2019, which has been added to bibliography.

      - I am not convinced that medaka exhibit a unique side-swing behavior. I agree there is this tendency in the example movie, however, the results of the quantification (Figure 4) are underwhelming. First, cluster 5 in 4K appears to include a proportion of cases from LA and AB. These proportions may be small, but anything above zero means this is not unique to medaka. Second, the heading angle (4N) starts at 4 degrees for LA and 8 degrees for medaka. This difference is genuine but very small, much smaller than what's drawn in the schematic (4M). I'm not sure it's justifiable to call a difference of 4 degrees a qualitatively different strategy.

      We have changed the text to highlight that side swing is highly enriched in medaka. Comparing 4J to 3B we would argue that there is a qualitative difference in the strategy used to capture prey in the cichlid larvae we study here and medaka. We agree that further work is required to understand distance estimation behaviors in different species. In this manuscript, we use heading angle as a proxy for how prey position might change on the retina over a hunting sequence. But as the heading and distance are changing over time, the actual change in angle on the retina for prey may be much larger than the ~8 degree shift reported here. The actual position of the prey is also important here, which, for reasons mentioned above, we could not track. Given the final location of prey in the visual field prior to the strike (Fig 4J), the most parsimonious explanation of the data is that the prey is always in the monocular visual field. In cichlids, the prey is more-or-less centered in the 200 ms preceding the strike. While it is true that the absolute difference in heading is 4 degrees, when converted to an angular velocity (4N, right), the medaka (OL) effectively rotate twice as fast as LA (40 deg/s vs 20 deg/s), which we think is a substantial difference and evidence of a different targeting strategy.
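
      The conversion from heading change to angular velocity is simple arithmetic, sketched below. It assumes the heading change accrues uniformly over the full 200 ms analysis window and uses the approximate heading values quoted by the reviewer (4 degrees for LA, 8 degrees for medaka); the function name is hypothetical.

```python
import math


def mean_angular_velocity(heading_change_deg, window_s):
    """Average angular velocity in deg/s over the analysis window."""
    return heading_change_deg / window_s


window_s = 0.2  # 200 ms before the strike
la_velocity = mean_angular_velocity(4.0, window_s)  # LA
ol_velocity = mean_angular_velocity(8.0, window_s)  # medaka (OL)

print(f"LA: {la_velocity:.0f} deg/s, OL: {ol_velocity:.0f} deg/s, "
      f"ratio: {ol_velocity / la_velocity:.1f}")
```

      Under these assumptions a 4 degree difference in heading corresponds to a twofold difference in mean rotation rate.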

      - 4K: This is referred to in the caption as a confusion matrix, which it's not.

      Fixed.

      - 4N right panel: how many fish contributed to the points shown?

      Added to figure legend (n=113, LA; n=36, OL). Same data in left and right panels.

      - In the Discussion it is hypothesized that medaka use their lateral line in hunting more than in other species. Testing this hypothesis (even just compared to one other species) would be fairly straightforward, and would add significant interest to the paper overall.

      We agree that this is an interesting experiment for follow up studies, but it is beyond the scope of the current manuscript as we do not have the appropriate animal license for this experiment.

      Reviewer 2:

      The paper is rather descriptive in nature, although more context is provided in the discussion. Most figures are great, but I think the authors could add a couple of visual aids in certain places to explain how certain components were measured.

      Added new supplemental figure (Supp Fig 2)

      Figure 1B- it could be useful to add zebrafish and medaka to the scientific names (I realize it's already in Figure A but I found myself going back and forth a couple of times, mostly trying to confirm that O. latipes is medaka).

      Added common names to 1B, sprinkled reminders of OL/medaka throughout text.

      Figure 1G. I wasn't sure how to interpret the eye angle relative to the midline. Can they rotate their eyes or is this due to curvature in the 'upper' body of the fish? Adding a schematic figure or something like that could help a reader who is not familiar with these methods. Related to this, I was a bit confused by Figure 2A. After reading the methods section, I think I understand - but a little cartoon to describe this would help. It also reminds the reader (especially if they don't work with fish) that fish eyes can rotate. I also wanted to note that initially, I thought convergence was a measure of how the two eyes were positioned relative to the prey given the emphasis given on binocular vision, and only after reading certain sections again did I realize convergence was a measure of eye rotation/movement.

      New supplemental figure explaining how eye tracking is performed

      Figure 3. It was not immediately clear to me what onset, middle, and end represented - although it is explained in the caption. I think what tripped me up is the 'eye convergence' title in the top right corner of Figure 3A.

      Updated figure with schematic illustrating that time is measured relative to eye convergence onset and end.

      The result section about attack swim, S-strike, capture spring, etc. was a bit confusing to read and could benefit from a couple of concise descriptions of these behaviors. For example, I am not familiar with the S strike but a couple of paragraphs into this section, the reader learns more about the difference between S strike vs. attack swim. This can be mentioned in the first paragraph when these distinct behaviors are mentioned.

      Added description of behavior earlier in text.

      Figure 4. Presents lots of interesting data! I wonder if using Figure 1E could help the reader better understand how these measurements were taken.

      New supplemental figure added, explaining how tail tracking is performed.

      I probably overlooked this, but I wonder why so many panels are just focused on one species.

      Added explanation to the text.

      Is the S-shaped capture strategy the same as an S strike?

      Clarified in text to say "S-strike-like". This is a description of prey capture from adult largemouth bass in New et al. (2002). From the still frames shown in that paper, the kinematics look similar to an S-strike or capture spring. The important point we wish to make is that the tail is coiled in an S-shape prior to a strike, which indicates that a kinematically similar behavior exists in fishes beyond just larval cichlids and zebrafish.

      At the end of the page, when continuous swimming versus interrupted swimming is discussed, please remind the reader that medaka shows more continuous swimming (longer bouts).

      Added "while medaka swim continuously with longer bouts ("gliding")".

      After reading the discussion, it looks like many findings are unique. For example, given that medaka is such a popular model species in biology, it strikes me that nobody has ever looked into their hunting movements before. If their findings are novel, perhaps they should state so it is clear that the authors are not ignoring the literature.

      We have highlighted what we believe to be the novelty of our findings (first description of prey capture in larval cichlids and medaka). To our knowledge, we are the first to describe hunting in medaka; but there is an extensive literature on medaka dating back to the early 20th century, some of which is only published in Japanese. We have done our best to review the literature, but we cannot rule out that there are papers that we missed. No English language article or review we found mentions literature on hunting behavior in medaka larvae.

      Reviewer 3:

      More evidence is needed to assess the types of visual monocular depth cues used by medaka fish to estimate prey location, but that is beyond the scope of this compelling paper. For example, medaka may estimate depth through knowledge of expected prey size, accommodation, defocus blur, ocular parallax, and/or other possible algorithms to complement cues from motion parallax.

      Added sentence to discussion highlighting that other cues may also contribute to distance estimation in cichlids and medakas. Follow-up studies will require new animal license.

      None. It's quite nice, timely, and thorough work! For future work, one could use 3D pose estimation of eye and prey kinematics to assess the dynamics of the 2D image (prey and background) cast onto the retina. This sort of representation could be useful to infer which monocular depth cues may be used by medaka during hunting.

      Great suggestion for follow up studies. Bolton et al. and Mearns et al. both find changes in z associated with prey capture, and it would be interesting to see how other fish species use the full 3-dimensional water column during prey capture, especially considering the diversity of hunting strategies in adult cichlids (ranging from piscivorous species, like LA, to algal grazers).

      In Figure 4N, you use "change in heading leading up to a strike as a proxy for the change in visual angle of the prey for cichlids and medaka." This proxy makes sense, but you also have the eye angles and (in some cases) the prey positions. One could estimate the actual change in visual angle from this information, which would also allow one to measure whether the fish are trying to stabilize the position of the prey on a high-acuity patch of the retina during the final moments of the hunt. This information may also shed light on which monocular depth cues are used.

      As addressed in comment to reviewer 1, this would require actually manually tracking individual paramecia over hundreds of frames. It is not possible to determine exactly when hunting begins in medaka, and it is prone to errors if medaka switch between targets over the course of a hunting episode. This question is better addressed with psychophysics experiments in embedded animals where it is possible to precisely control the stimulus, but this requires new animal licenses and is beyond the scope of this paper.

      In Figure 5, you could place the prey object a little farther from the D. rerio fish for the S-strike diagram.

      Fixed.

      Figure 4F legend should read "...at the peak of each bout."

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Thank you for your constructive feedback and recognition of our work. We followed your suggestion and improved the accuracy of the language used to interpret some of our findings. 

      Summary:

      The present study by Mikati et al demonstrates an improved method for in-vivo detection of enkephalin release and studies the impact of stress on the activation of enkephalin neurons and enkephalin release in the nucleus accumbens (NAc). The authors refine their pipeline to measure met and leu enkephalin using liquid chromatography and mass spectrometry. The authors subsequently measured met and leu enkephalin in the NAc during stress induced by handling, and fox urine, in addition to calcium activity of enkephalinergic cells using fiber photometry. The authors conclude that this improved tool for measuring enkephalin reveals experimenter handling stress-induced enkephalin release in the NAc that habituates and is dissociable from the calcium activity of these cells, whose activity doesn't habituate. The authors subsequently show that NAc enkephalin neuron calcium activity does habituate to fox urine exposure, is activated by a novel weigh boat, and that fox urine acutely causes increases in met-enk levels, in some animals, as assessed by microdialysis.

      Strengths:

      A new approach to monitoring two distinct enkephalins and a more robust analytical approach for more sensitive detection of neuropeptides. A pipeline that potentially could help for the detection of other neuropeptides.

      Weaknesses:

      Some of the interpretations are not fully supported by the existing data or would require further testing to draw those conclusions. This can be addressed by appropriately tempering interpretations and acknowledging other limitations the authors did not cover brought by procedural differences between experiments.

      We have taken time to go through the manuscript ensuring we are more detailed and precise with our interpretations as well as appropriately acknowledging limitations. 

      Reviewer #2 (Public Review):

      Thank you for your constructive and thorough assessment of our work. In our revised manuscript, we adjusted the text to reflect the references you mentioned regarding the methionine oxidation procedure. Additionally, we expanded the methods section to include the key details of the statistical tests and procedures that you outlined. 

      Summary:

      The authors aimed to improve the detection of enkephalins, opioid peptides involved in pain modulation, reward, and stress. They used optogenetics, microdialysis, and mass spectrometry to measure enkephalin release during acute stress in freely moving rodents. Their study provided better detection of enkephalins due to the implementation of previously reported derivatization reaction combined with improved sample collection and offered insights into the dynamics and relationship between Met- and Leu-Enkephalin in the Nucleus Accumbens shell during stress.

      Strengths:

      A strength of this work is the enhanced opioid peptide detection resulting from an improved microdialysis technique coupled with an established derivatization approach and sensitive and quantitative nLC-MS measurements. These improvements allowed basal and stimulated peptide release with higher temporal resolution, lower detection thresholds, and native-state endogenous peptide measurement.

      Weaknesses:

      The draft incorrectly credits itself for the development of an oxidation method for the stabilization of Met- and Leu-Enk peptides. The use of hydrogen peroxide reaction for the oxidation of Met-Enk in various biological samples, including brain regions, has been reported previously, although the protocols may slightly vary. Specifically, the manuscript writes about "a critical discovery in the stabilization of enkephalin detection" and that they have "developed a method of methionine stabilization." Those statements are incorrect and the preceding papers that relied on hydrogen peroxide reaction for oxidation of Met-Enk and HPLC for quantification of oxidized Enk forms should be cited. One suggested example is Finn A, Agren G, Bjellerup P, Vedin I, Lundeberg T. Production and characterization of antibodies for the specific determination of the opioid peptide Met5-Enkephalin-Arg6-Phe7. Scand J Clin Lab Invest. 2004;64(1):49-56. doi: 10.1080/00365510410004119. PMID: 15025428.

      Thank you for highlighting this. It was not our intention to imply that we developed the oxidation method, rather that we were able to improve the detection of Met-enkephalin by oxidation of the methionine without compromising the detection resolution of Leu-enkephalin, enabling the simultaneous detection of both peptides. We have addressed this in the manuscript and included the suggested citation.

      Another suggestion for this draft is to make the method section more comprehensive by adding information on specific tools and parameters used for statistical analysis:

      (1) Need to define "proteomics data" and explain whether calculations were performed on EIC for each m/z corresponding to specific peptides or as a batch processing for all detected peptides, from which only select findings are reported here. What type of data normalization was used, and other relevant details of data handling? Explain how Met- and Leu-Enk were identified from DIA data, and what tools were used.

      Thank you for pointing out this source of confusion. We believe it is because we use a different DIA method than is typically used in other literature. Briefly, we use a DIA method with a targeted inclusion list to ensure MS2 triggering, as opposed to using large isolation widths to capture all precursors for fragmentation, as is typically done with MS1 features. For our method, MS2 is triggered based on the 4 selected m/z values (heavy and light versions of Leu- and Met-Enkephalin peptides) at specific retention time windows with an isolation width of 2 Da, regardless of the MS1 intensity of the peptides.

      (2) Simple Linear Regression Analysis: The text mentions that simple linear regression analysis was performed on forward and reverse curves, and line equations were reported, but it lacks details such as the specific variables being regressed (although figures have labels) and any associated statistical parameters (e.g., R-squared values). 

      Additional detail about the linear regression process was added to the methods section, please see lines 614-618. The R squared values are also now shown on the figure. 

      ‘For the forward curves, the regression was applied to the measured concentration of the light standard as the theoretical concentration was increased. For plotting purposes, we show the measured peak area ratios for the light standards in the forward curves. For the reverse curves, the regression was applied to the measured concentration of the heavy standard, as the theoretical concentration was varied.’
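
      The regression described above can be sketched as an ordinary least-squares fit of measured against theoretical concentration, reporting the slope, intercept, and R squared. This is a minimal illustration only: the function name `fit_line` and the data values are hypothetical, and the authors' actual analysis software may differ.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - residual sum of squares / total sum of squares
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical forward-curve data (not from the manuscript):
# theoretical vs. measured light-standard concentration, amol/sample
theoretical = [10.0, 20.0, 40.0, 80.0, 160.0]
measured = [11.0, 19.0, 42.0, 78.0, 163.0]

slope, intercept, r2 = fit_line(theoretical, measured)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.4f}")
```

      For a simple linear regression with one predictor, R squared equals the square of the Pearson correlation coefficient, so the two quantities carry the same information about goodness of fit.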

      (3) Violin Plots: The proteomics data is represented as violin plots with quartiles and median lines. This visual representation is mentioned, but there is no detail regarding the software/tools used for creating these plots.

      We used Graphpad Prism to create these plots. This detail has been added to the statistical analysis section. See line 630.

      (4) Log Transformation: The text states that the data was log-transformed to reduce skewness, which is a common data preprocessing step. However, it does not specify the base of the logarithm used or any information about the distribution before and after transformation.

      We have added the requested details about the log transformation, and how the data looked before and after, into the statistical analysis section. We followed the convention that log is base 10 unless otherwise specified as natural log (base e) or a different base. See lines 622-625.

      ‘The data was log10 transformed to reduce the skewness of the dataset caused by the variable range of concentrations measured across experiments/animals. Prior to log transformation, the measurements failed normality testing for a Gaussian distribution. After the log transformation, the data passed normality testing, which provided the rationale for the use of statistical analyses that assume normality.’
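
      To illustrate why a log10 transformation reduces skewness, the sketch below computes a simple moment-based skewness before and after transforming a right-skewed set of values. The numbers are hypothetical, and skewness alone is not the normality test the authors refer to; it only shows the direction of the effect.

```python
import math
from statistics import mean

def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed concentrations (amol/sample), not from the manuscript
raw = [12.0, 15.0, 18.0, 25.0, 40.0, 90.0, 300.0, 1200.0]
logged = [math.log10(v) for v in raw]

print(f"skewness before: {skewness(raw):.2f}")
print(f"skewness after log10: {skewness(logged):.2f}")
```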

      (5) Two-Way ANOVA: Two-way ANOVA was conducted with peptide and treatment as independent variables. This analysis is described, but there is no information regarding the software or statistical tests used, p-values, post-hoc tests, or any results of this analysis.

      Information about the two-way ANOVA analysis has been added to the statistical analysis section. Additionally, more detailed information has been added to the figure legends about the statistical results. Please see lines 625-628.

      ‘Two-way ANOVA testing with peptide (Met-Enk or Leu-Enk) and treatment (buffer or stress for example) as the two independent variables. Post-hoc testing was done using Šídák's multiple comparisons test and the p values for each of these analyses are shown in the figures (Figs. 1F, 2A).’ 

      (6) Paired T-Test: A paired t-test was performed on predator odor proteomic data before and after treatment. This step is mentioned, but specific details like sample sizes, and the hypothesis being tested are not provided.

      The sample size is included in the figure legend to which we have included a reference. We have also included the following text to highlight the purpose of this test. See lines 628-630

      A paired t-test was performed on the predator odor proteomic data before and after odor exposure to test the hypothesis that Met-Enk increases following exposure to predator odor (Fig. 3F). These analyses were conducted using Graphpad Prism.
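
      As a sketch of what a paired t-test computes, the statistic is the mean of the within-animal differences divided by its standard error, with n - 1 degrees of freedom. The function name `paired_t` and the values are hypothetical; the p-value would then come from the t distribution in statistical software, which is not reproduced here.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample SD of the differences
    return mean(diffs) / standard_error, n - 1

# Hypothetical per-animal values before and after odor exposure
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [14.0, 15.0, 13.0, 12.0, 18.0]

t_stat, df = paired_t(before, after)
print(f"t({df}) = {t_stat:.3f}")
```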

      (7) Correlation Analysis: The text mentions a simple linear regression analysis to correlate the levels of Met-Enk and Leu-Enk and reports the slopes. However, details such as correlation coefficients, and p-values are missing.

      We apologize for the use of the word correlation as we think it may have caused some confusion and have adjusted the language accordingly. Since this was a linear regression analysis, there is no correlation coefficient. The slope of the fitted line is reported on the figures to show the fitted values of Met-Enk to Leu-Enk. 

      (8) Fiber Photometry Data: Z-scores were calculated for fiber photometry data, and a reference to a cited source is provided. This section lacks details about the calculation of z-scores, and their use in the analysis. 

      These details have been added to the statistical analysis section. See lines 634-637

      ‘For the fiber photometry data, the z-scores were calculated as described in using GuPPy which is an open-source Python toolbox for fiber photometry analysis. The z-score equation used in GuPPy is z = (ΔF/F − mean of ΔF/F) / (standard deviation of ΔF/F), where F refers to fluorescence of the GCaMP6s signal.’
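
      The z-score equation quoted above can be sketched in a few lines. This is not GuPPy's code: the function name `zscore_trace` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, pstdev

def zscore_trace(dff):
    """z = (dF/F - mean(dF/F)) / SD(dF/F), applied to every sample of a trace.

    Assumption: population standard deviation (divide by n).
    """
    mu = mean(dff)
    sd = pstdev(dff)
    return [(value - mu) / sd for value in dff]

# Hypothetical dF/F samples from one recording
trace = [0.8, 1.1, 0.9, 1.0, 3.2, 2.5, 1.2, 0.9]
z = zscore_trace(trace)
print([round(v, 2) for v in z])
```

      By construction the resulting trace has mean 0 and standard deviation 1, so each value expresses how far the signal deviates from the trace average in units of standard deviation.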

      (9) Averaged Plots: Z-scores from individual animals were averaged and represented with SEM. It is briefly described, but more details about the number of animals, the purpose of averaging, and the significance of SEM are needed.

      We have added additional information about the averaging process in the statistical analysis section. See lines 639-643.

      ‘The purpose of the averaged traces is to show the extent of concordance of the response to experimenter handling and predator odor stress among animals with the SEM demonstrating that variability. The heatmaps depict the individual responses of each animal. The heatmaps were plotted using Seaborn in Python and mean traces were plotted using Matplotlib in Python.’
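
      The averaged traces with SEM described above amount to taking, at each time point, the mean across animals and the sample standard deviation divided by the square root of the number of animals. A minimal sketch follows, with a hypothetical function name `mean_and_sem` and made-up z-score traces; plotting with Matplotlib or Seaborn is omitted.

```python
import math
from statistics import mean, stdev

def mean_and_sem(traces):
    """traces: one equal-length list of z-scores per animal.

    Returns the mean trace and the SEM trace across animals.
    """
    n_animals = len(traces)
    means, sems = [], []
    for samples in zip(*traces):  # one tuple of values per time point
        means.append(mean(samples))
        sems.append(stdev(samples) / math.sqrt(n_animals))
    return means, sems

# Hypothetical z-score traces from three animals, three time points each
traces = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
avg, sem = mean_and_sem(traces)
print(avg, [round(s, 3) for s in sem])
```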

      A more comprehensive and objective interpretation of results could enhance the overall quality of the paper.

      We have taken this opportunity to improve our manuscript following comments from all the reviewers that we hope has resulted in a manuscript with a more objective interpretation of results. 

      Reviewer #3 (Public Review):

      Thank you for your thoughtful review of our work. To clarify some of the points you raised, we revised the manuscript to include more detail on how we distinguish between the oxidized endogenous and standard signal, and we refined the language concerning the spatial resolution. We also edited the manuscript regarding the concentration measurements. We did conduct technical replicates, so we appreciate you raising this point and have clarified that in the main text. 

      Summary:

      This important paper describes improvements to the measurement of enkephalins in vivo using microdialysis and LC-MS. The key improvement is the oxidation of met- to prevent having a mix of reduced and oxidized methionine in the sample which makes quantification more difficult. It then shows measurements of enkephalins in the nucleus accumbens in two different stress situations - handling and exposure to predator odor. It also reports the ratio of released met- and leu-enkephalin matching what is expected from the digestion of proenkephalin. Measurements are also made by photometry of Ca2+ changes for the fox odor stressor. Some key takeaways are the reliable measurement of met-enkephalin, the significance of directly measuring peptides as opposed to proxy measurements, and the opening of a new avenue into the research of enkephalins due to stress based on these direct measurements.

      Strengths:

      -Improved methods for measurement of enkephalins in vivo.

      -Compelling examples of using this method.

      -Opening a new area of looking at stress responses through the lens of enkephalin concentrations.

      Weaknesses:

      (1) It is not clear if oxidized met-enk is endogenous or not and this method eliminates being able to discern that.

      We clarified our wording in the text copied below to provide an explanation on how we distinguish between the two. Even after oxidation, the standard signal has a higher m/z ratio due to the presence of the Carbon and Nitrogen isotopes as described in the Chemicals section of the methods ‘For Met Enkephalin, a fully labeled L-Phenylalanine (<sup>13</sup>C<sub>9</sub>, <sup>15</sup>N) was added (YGGFM). The resulting mass shift between the endogenous (light) and heavy isotope-labeled peptide are 7Da and 10Da, respectively.’, so they can still be differentiated from the endogenous signal. We have clarified the language in the results section. See lines 82-87. 

      ‘After each sample collection, we add a consistent known concentration of isotopically labeled internal standard of Met-Enk and Leu-Enk of 40 amol/sample to the collected ISF for the accurate identification and quantification of endogenous peptide. These internal standards have a different mass/charge (m/z) ratio than endogenous Met- and Leu-Enk. Thus, we can identify true endogenous signal for Met-Enk and Leu-Enk (Suppl Fig. 1A,C) versus noise, interfering signals, and standard signal (Suppl. Fig. 1B,D).’

      (2) It is not clear if the spatial resolution is really better as claimed since other probes of similar dimensions have been used.

      Apologies for any confusion here. To clarify, we primarily state that our approach improves temporal resolution and in a few cases refer to improved spatiotemporal resolution, which we believe we show. The dimensions of the microdialysis probe used in these experiments allow us to target the nucleus accumbens shell, and the probe is smaller than a fiber photometry probe, especially at the membrane level. 

      (3) Claims of having the first concentration measurement are not quite accurate.

      Thank you for your feedback. To clarify, we do not claim that we have the first concentration measurements, rather we are the first to quantify the ratio of Met-Enk to Leu-Enk in vivo in freely behaving animals in the NAcSh. 

      (4) Without a report of technical replicates, the reliability of the method is not as well-evaluated as might be expected.

      We have added these details in the methods section, please see lines 521-530. 

      ‘Each sample was run in two technical replicates and the peak area ratio was averaged before concentration calculations of the peptides were conducted. Several quality control steps were conducted prior to running the in vivo samples. 1) Two technical replicates of a known concentration were injected and analyzed – an example table from 4 random experiments included in this manuscript is shown below. 2) The buffers used on the day of the experiment (aCSF and high K+ buffer) were also tested for any contaminating Met-Enk or Leu-Enk signals by injecting two technical replicates for each buffer. Once these two criteria were met, the experiment was analyzed through the system. If either step failed, which happened a few times, the samples were frozen and the machines were cleaned and restarted until the quality control measures were met.’
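
      One way to picture the replicate averaging and the role of the 40 amol/sample internal standard is single-point isotope-dilution quantification: average the light/heavy peak area ratio over the technical replicates, then scale by the spiked amount. This is a sketch under the assumption that the light and heavy peptides respond equally; the function name `endogenous_amol` and the ratios are hypothetical, and the authors' exact calculation is not given in this response.

```python
from statistics import mean

SPIKE_AMOL = 40.0  # heavy internal standard per sample, as stated in the text

def endogenous_amol(ratios, spike_amol=SPIKE_AMOL):
    """Average light/heavy peak area ratios across technical replicates,
    then scale by the spiked amount.

    Assumption: equal detector response for light and heavy peptides.
    """
    return mean(ratios) * spike_amol

# Hypothetical peak area ratios from two technical replicates of one sample
replicate_ratios = [0.50, 0.54]
amount = endogenous_amol(replicate_ratios)
print(f"{amount:.1f} amol/sample")
```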

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      • The authors should provide appropriate citations of a study that has validated the Enkephalin-Cre mouse line in the nucleus accumbens or provide verification experiments if they have any available.

      Thank you for your comment. We have added a reference validating the Enk-Cre mouse line in the nucleus accumbens to the methods section and is copied here. 

      D.C. Castro, C.S. Oswell, E.T. Zhang, C.E. Pedersen, S.C. Piantadosi, M.A. Rossi, A.C. Hunker, A. Guglin, J.A. Morón, L.S. Zweifel, G.D. Stuber, M.R. Bruchas, An endogenous opioid circuit determines state-dependent reward consumption, Nature 598 (2021) 646–651. https://doi.org/10.1038/s41586-021-04013-0.

      • Better definition of the labels y1,y2,b3 in Figures 1 and S1 would be useful. I may have missed it but it wasn't described in methods, results, or legends.

      Thank you for this comment. We have added this information to the Fig. 1 legend: ‘Y1, y2, b3 refer to the different elution fragments resulting from Met-Enk during LC-MS.’

      • It is interesting that the ratio of KCl-evoked release is what changes differentially for Met- vs Leu. Leu enk increases to the range of met-enk. There is non-detectable or approaching being non-detectable leu-enk (below the 40 amol / sample limit of quantification) in most of the subjects that become apparent and approach basal levels of met-enkephalin. This suggests that the K+ evoked response may be more pronounced for leu-enk. This is something that should be considered for further analysis and should be discussed.

      Thank you for this astute observation, and you make a great point. We have added some discussion of this finding in the results and discussion sections; see lines 111-112 and lines 253-257. 

      ‘Interestingly, Leu-Enk showed a greater fold change compared to baseline than did Met-Enk with the fold changes being 28 and 7 respectively based on the data in Fig.1F.’

      ‘We also noted that Leu-Enk showed a greater fold increase relative to baseline after depolarization with high K+ buffer as compared to Met-Enk. This may be due to increased Leu-Enk packaging in dense core vesicles compared to Met-Enk or due to the fact that there are two distinct precursor sources for Leu-Enk, namely both proenkephalin and prodynorphin while Met-Enk is mostly cleaved from proenkephalin (see Table 1 [48]).’

      • For example in 2E, it would be helpful to label in the graph axis what samples correspond to the manipulation and also in the text provide the reader with the sample numbers. The authors interpret the relationship between the last two samples of baseline and posthandling stress as the following in the figure legend "the concentration released in later samples is affected; such influence suggests that there is regulation of the maximum amount of peptide to be released in NAcSh. E. The negative correlation in panel d is reversed by using a high K+ buffer to evoke Met-Enk release, suggesting that the limited release observed in D is due to modulation of peptide release rather than depletion of reserves." However, the correlations are similar between 2D and E and it appears that two mice are mediating the difference between the two groups. The appropriate statistical analysis would be to compare the regressions of the two groups. Statistics for the high K+ (and all other graphs where appropriate) need to be reported, including the r2 and p-value.

      Thank you for your constructive critique. To elucidate the effect of high K+, we have plotted the regression line and reported the slope for Fig. 2E. Notably, the slope is reduced by a factor of 2 and appears to be driven by a large subset of the animals. The statistics for the high K+ graph are shown on the figure (Fig 1F) which test the hypothesis of whether high K+ leads to the release of Leu-Enk and Met-Enk respectively compared to baseline with aCSF. We have added the test statistics to the figure legend for additional clarity. Fig. 1G has no statistics because it is only there to elucidate the ratio between Met-Enk and Leu-Enk in the same samples. We did not test any hypotheses related to whether there are differences between their levels as that is not relevant to our question. The correlation on the same data is depicted in Fig. 1H, and we have added the R<sup>2</sup> value per your request. 

      • The interpretation that handling stress induces enkephalin release from microdialysis experiments is also confounded by other factors. For instance, from the methods, it appears that mice were connected and sample collection started 30 min after surgery, therefore recovery from anesthesia is also a confounding variable, among other technical aspects, such as equilibration of the interstitial fluid to the aCSF running through the probe that is acting as a transmitter and extracellular molecule "sink". Did the authors try to handle the mice post hookup similar to what was done with photometry to have a more direct comparison to photometry experiments? This procedural difference, recording from recently surgerized animals (microdialysis) vs well-recovered animals with photometry should be mentioned in addition to the other caveats the authors mention.

      Thank you for your comment. We are aware of this technical limitation, and it is largely why we sought to conduct the fiber photometry experiments to get at the same question. As you requested, we have included additional language in the discussion to acknowledge this limitation and how we chose to address it by measuring calcium activity in the enkephalinergic neurons, which would presumably be the same cell population whose release we are quantifying using microdialysis. See lines 262-273.  

      ‘Our findings showed a robust increase in peptide release at the beginning of experiments, which we interpreted as due to experimenter handling stress that directly precedes microdialysis collections. However, there are other technical limitations to consider such as the fact that we were collecting samples from mice that were recently operated on. Another consideration is that the circulation of aCSF through the probe may cause a sudden shift in oncotic and hydrostatic forces, leading to increased peptide release to the extracellular space. As such, we wanted to examine our findings using a different technique, so we chose to record calcium activity from enkephalinergic neurons - the same cell population leading to peptide release. Using fiber photometry, we showed that enkephalinergic neurons are activated by stress exposure, both experimenter handling and fox odor, thereby adding more evidence to suggest that enkephalinergic neurons are activated by stress exposure which could explain the heightened peptide levels at the beginning of microdialysis experiments.’

      • The authors should provide more details on handling stress manipulation during photometry. For photometry what was the duration of the handling bout, what was the interval between handling events, and can the authors provide a description of what handling entailed? Were mice habituated to handling days before doing photometry recording experiments?

      Thank you for your suggestion. We have addressed all of your points in the methods section. See lines 564-570. 

      ‘The handling bout which mimicked traditional scruffing lasted about 3-5 seconds. The mouse was then let go and the handling was repeated another two times in a single session with a minimum of 1-2 minutes between handling bouts. Mice were habituated to this manipulation by being attached to the fiber photometry rig, for 3-5 consecutive days prior to the experimental recording. Additionally, the same maneuver was employed when attaching/detaching the fiber photometry cord, so the mice were subjected to the same process several times.’

      • For the novel weigh boat experiments, the authors should explicitly state when these experiments were done in relation to the fox urine, was it a different session or the same session? Were they the same animals? Statements like the following (line 251) imply it was done in the same animals in the same session but it should be clarified in the methods "We also showed using fiber photometry that the novelty of the introduction of a foreign object to the cage, before adding fox odor, was sufficient to activate enkephalinergic neurons."

      As shown in supplementary figure 4, individual animal data is shown for both water and fox urine exposure (overlaid) to depict whether there were differences in their responses to each manipulation – in the same animal. And yes, you are correct, the animals were first exposed to water 3 times in the recording session and then exposed to fox urine 3 times in the same session. We have added that to the methods section describing in vivo fiber photometry. See lines 575-576.  

      • Statistical testing would be needed to affirm the conclusions the authors draw from the fox urine and novel weigh boat experiments. For example, it shows stats that the response attenuates, that it is not different between fox urine and novel (it looks like the response is stronger to the fox urine when looking at the individual animals), etc. These data look clear but stats are formally needed. Formal statistics are also missing in other parts of the manuscript where conclusions are drawn from the data but direct statistical comparisons are not included (e.g. Fig 2.G-I).

      The photometry data is shown as z-scores which is a formal statistical analysis. ANOVA would be inappropriate to run to compare z-scores. We understand that this is erroneously done in fiber photometry literature, however, it remains incorrect. The z-scores alone provide all the information needed about the deviation from baseline. We understand that this is not immediately clear to readers, and we thank you for allowing us to explain why this is the case. We have added test statistics to figure legends where hypothesis testing was done and p-values were reported. 

      • Did the authors try to present the animals with repeated fox urine exposure to see if this habituates like the photometry?

      No, we did not do that experiment due to the constrained timing within which we had to run our microdialysis/LC-MS timeline, but it is a great point for future exploration. 

      • It would be useful to present the time course of the odor experiment for the microdialysis experiment.

      The timeline is shown in Fig.1a and Fig.3e. To reiterate, each sample is 13 minutes long.

      • Can the authors determine if differences in behavior (e.g. excessive avoidance in animals with one type of response) or microdialysis probe location dictate whether animals fall into categories of increased release, no release, or no-detection? From the breakdown, it looks like it is almost equally split into three parts but the authors' descriptions of this split are somewhat misleading (line 210). "The response to predator odor varies appreciably: although most animals show increased Met-Enk release after fox odor exposure, some show continued release with no elevation in Met-Enk levels, and a minority show no detectable release".

      Thank you for your constructive feedback. We do not believe the difference in behavior is correlated with probe placement. The hit map can be found in suppl. Fig 3 and shows that all mice included in the manuscript had probes in the NAcSh. We purposely did not distinguish between dorsal and ventral because of our 1 mm membrane would make it hard to presume exclusive sampling from one subregion. That is a great point though, and we have thought about it extensively for future studies. We have edited the language to reflect the almost even split of responses for Met-Enk and appreciate you pointing that out. 

      • Overall, given the inconsistencies in experimental design and overall caveats associated, I think the authors are unable to draw reasonable conclusions from the repeated stressor experiments and something they should either consider is not trying to draw strong conclusions from these observations or perform additional experiments that provide the grounds to derive those conclusions.

      We have included additional language on the caveats of our study, and our use of a dual approach using fiber photometry and microdialysis was largely driven by a desire to offer additional support of our conclusions. We expected pushback about our conclusions, so we wanted to offer a secondary analysis using a different technique to test our hypothesis. To be honest, the tone and content of this comment are not particularly constructive (especially for trainees), nor do they offer a space to realistically address anything. This work took multiple years to optimize, it was led by a graduate student, and required a multidisciplinary team. As highlighted, we believe it offers an important contribution to the literature and pushes the field of peptide detection forward.  

      Reviewer #2 (Recommendations For The Authors):

      A more comprehensive and objective interpretation of results could enhance the overall quality of the paper. The manuscript contains statements like "we are the first to confirm," which can be challenging to substantiate and may not significantly enhance the paper. It's essential to ensure that novelty statements are well-founded. For example, the release of enkephalins from other brain regions after stress exposure is well-documented but not addressed in the paper. Similarly, the role of the NA shell in stress has been extensively studied but lacks coverage in this manuscript.

      We have edited the language to reflect your feedback. We have also included relevant literature expanding on the demonstrated roles of enkephalins in the literature. We would like to note that most studies have focused on chronic stress, and we were particularly interested in acute stress. See lines 129-134.

      ‘These studies have included regions such as the locus coeruleus, the ventral medulla, the basolateral nucleus of the amygdala, and the nucleus accumbens core and shell. Studies using global knockout of enkephalins have shown varying responses to chronic stress interventions where male knockout mice showed resistance to chronic mild stress in one study, while another study showed that enkephalin-knockout mice showed delayed termination of corticosteroid release. [33,34]’ 

      Finally, not a weakness but a clarification suggestion: the method description mentions the use of 1% FA in the sample reconstitution solution and LC solvents, which is an unusually high concentration of acid. If this concentration is intentional for maintaining the peptides' oxidation state, it would be beneficial to mention this in the text to assist readers who might want to replicate the method.

      This is correct and has been clarified in the methods section.

      Reviewer #3 (Recommendations For The Authors):

      -The Abstract should state the critical improvements that are made. Also, quantify the improvements in spatiotemporal resolution.

      Thank you for your comment. We have edited the abstract to reflect this. 

      - The use of "amol/sample" as concentration is less informative than SI units (e.g., pM concentration) and should be changed. Especially since the volume used was the same for in vivo sampling experiments.

      Thank you for your comment. We chose to report amol/sample because we are measuring such a small concentration and wanted to account for any slight errors in volume that can make drastic differences on reported concentrations especially since samples are dried and resuspended.  

      -Please check this sentence: "After each collection, the samples were spiked with 2 µL of 12.5 fM isotopically labeled Met-Enkephalin and Leu-Enkephalin" This dilution would yield a concentration of ~2 fM. In a 12 uL sample, that would be ~0.02 amol, well below the detection limit. (note that fM would femtomolar concentration and fmol would be femtomoles added).

      -"liquid chromatography/mass spectrometry (LC-MS) [9-12]"... Reference 9 is a RIA analysis paper, not LC-MS as stated.

      Thank you for catching these. We have corrected the unit and citation. 
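
      The reviewer's unit check can be reproduced with a short calculation. The sketch below takes the 2 µL of 12.5 fM spike and the 12 µL sample volume from the reviewer's comment as given and shows that the spike would contribute only about 0.025 amol, at a final concentration near 2 fM. The variable names are illustrative.

```python
FEMTO = 1e-15
ATTO = 1e-18
MICRO = 1e-6

spike_conc_molar = 12.5 * FEMTO   # 12.5 fM, as originally written
spike_volume_l = 2 * MICRO        # 2 uL spike
sample_volume_l = 12 * MICRO      # 12 uL sample, per the reviewer's comment

# Amount of standard added = concentration x volume
spike_mol = spike_conc_molar * spike_volume_l
spike_amol = spike_mol / ATTO

# Concentration after mixing the spike into the sample
final_conc_fm = spike_mol / (spike_volume_l + sample_volume_l) / FEMTO

print(f"amount added: {spike_amol:.3f} amol")
print(f"final concentration: {final_conc_fm:.2f} fM")
```

      Both figures are far below the 40 amol/sample internal standard level quoted elsewhere in the response, which is consistent with the unit in the original sentence having been a typographical error.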

      -Given that improvements in temporal resolution are claimed, the lack of time course data with a time axis is surprising. Rather, data for baseline and during treatment appear to be combined in different plots. Time course plots of individuals and group averages would be informative.

      Due to the expected variability between individual animal time course data, where for example, we measure detectable levels in one sample followed by no detection, it was very difficult to combine data across time. Therefore, to maximize data inclusion from all animals that showed baseline measurements and responses to individual manipulations, we opted to report snapshot data. Our improvement in temporal resolution refers to the duration of each sample rather than continuous sampling, so those two are unrelated. Thank you for your feedback and allowing us to clarify this.

      - I do not understand this claim "We use custom-made microdialysis probes, intentionally modified so they are similar in size to commonly used fiber photometry probes to avoid extensive tissue damage caused by traditional microdialysis probes (Fig. 1B)." The probes used are 320 um OD and 1 mm long. This is not an uncommon size of microdialysis probes and indeed many are smaller, so is their probe really causing less damage than traditional probes?

      Thank you for your comment. We are only trying to make the point that the tissue damage from these probes is comparable to commonly used fiber photometry probes. We only point that out because tissue damage is used as a point to dissuade the usage of microdialysis in some literature, and we just wanted to disambiguate that. We have clarified the statement you pointed out.  

      -The oxidation procedure is a good idea, as mentioned above. It would be interesting to compare met-enk with and without the oxidation procedure to see how much it affects the result (I would not say this is necessary though). It is not uncommon to add antioxidants to avoid losses like this. Also, it should be acknowledged that the treatment does prevent the detection of any in vivo oxidation, perhaps that is important in met-enk metabolism?

      The comparison between oxidized and unoxidized Met-Enk detection is in figure 1C. 

      -It would be a best practice to report the standard deviation of signal for technical replicates (say near in vivo concentrations) of standards and repeated analysis of a dialysate sample to be able to understand the variability associated with this method. Similarly, an averaged basal concentration from all rats.

      Thank you for your comment. We have included a table showing example quality control standard injections from 4 randomly selected experiments included in the manuscript that were run before and after each experiment and descriptive statistics associated with these technical replicates. We also added some detail to the methods section to describe how quality control is done. See lines 521-530. 

      ‘Each sample was run in two technical replicates and the peak area ratio was averaged before concentration calculations of the peptides were conducted. Several quality control steps were conducted prior to running the in vivo samples. 1) Two technical replicates of a known concentration were injected and analyzed – an example table from 4 random experiments included in this manuscript is shown below. 2) The buffers used on the day of the experiment (aCSF and high K+ buffer) were also tested for any contaminating Met-Enk or Leu-Enk signals by injecting two technical replicates for each buffer. Once these two criteria were met, the experiment was analyzed through the system. If either step failed, which happened a few times, the samples were frozen and the machines were cleaned and restarted until the quality control measures were met.’

      EDITORS NOTE

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      Thank you for your suggestion. We have included more detail about statistical analysis in the figure legends per this comment and reviewer comments.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Responses to Reviewer #1:

      We thank the reviewer for these additional comments, and more generally for their extensive engagement with our work, which is greatly appreciated. Here, we respond to the three points in their latest review in turn.

      The results of these experiments support a modest but important conclusion: If sub-optimal methods are used to collect retrospective reports, such as simple yes/no questions, inattentional blindness (IB) rates may be overestimated by up to ~8%.

      It is true, of course, that we think the field has overstated the extent of IB, and we appreciate the reviewer characterizing our results as important along these lines. Nevertheless, we respectfully disagree with the framing and interpretation the reviewer attaches to them. As explained in our previous response, we think this interpretation — and the associated calculations of IB overestimation ‘rates’ — perpetuates a binary approach to perception and awareness which we regard as mistaken.

      A graded approach to IB and visual awareness 

      Our sense is that many theorists interested in IB have conceived of perception and awareness as ‘all or nothing’: You either see a perfectly clear gorilla right in front of you, or you see nothing at all. This is implicit in the reviewer’s characterization of our results as simply indicating that fewer subjects fail to see the critical stimulus than previously assumed. To think that way is precisely to assume the orthodox binary position about perception, i.e., that any given subject can neatly be categorized into one of two boxes, saw or didn’t see.

      Our perspective is different. We think there can be degraded forms of perception and awareness that fall neatly into neither of the categories “saw the stimulus perfectly clearly” or “saw nothing at all”. On this graded conception, the question is not: “What proportion of subjects saw the stimulus?” but: “What is the sensitivity of subjects to the stimulus?” This is why we prefer signal detection measures like d′ over % noticing and % correct. This powerful framework has been successful in essentially every domain to which it has been applied, and we think perception and visual awareness are no exception. We understand that the reviewer may not think the same way about this foundational issue, but since part of our goal is to promote a graded approach to perception, we are keen to highlight our disagreement here and so resist the reviewer’s interpretation of our results (even to the extent that it is a positive one!).

      Finally, we note that given this perspective, we are correspondingly inclined to reject many of the summary figures following below in Point (1) by the reviewer. These calculations (given in terms of % noticing and not noticing) make sense on the binary conception of awareness, but not on the SDT-based approach we favor. We say more about this below. 

      (1) In experiment 1, data from 374 subjects were included in the analysis. As shown in figure 2b, 267 subjects reported noticing the critical stimulus and 107 subjects reported not noticing it. This translates to a 29% IB rate if we were to only consider the "did you notice anything unusual Y/N" question. As reported in the results text (and figure 2c), when asked to report the location of the critical stimulus (left/right), 63.6% of the "non-noticer" group answered correctly. In other words, 68 subjects were correct about the location while 39 subjects were incorrect. Importantly, because the location judgment was a 2-alternative-forced-choice, the assumption was that if 50% (or at least not statistically different than 50%) of the subjects answered the location question correctly, everyone was purely guessing. Therefore, we can estimate that ~39 of the subjects who answered correctly were simply guessing (because 39 guessed incorrectly), leaving 29 subjects from the non-noticer group who were correct on the 2AFC above and beyond the pure guess rate. If these 29 subjects are moved from the non-noticer to the noticer group, the corrected rate of IB for Experiment 1 is 20.86% instead of the original 28.61% rate that would have been obtained if only the Y/N question was used. In other words, relying only on the "Y/N did you notice anything" question led to an overestimate of IB rates by 7.75% in Experiment 1.
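
      The arithmetic in this comment can be sketched as follows. This is only an illustration of the reviewer's guessing correction using the subject counts quoted above; the function name and its arguments are hypothetical and do not come from the manuscript or its archive.

```python
def guess_corrected_ib(n_total, n_nonnoticers, n_correct_2afc):
    """Reviewer-style guessing correction for a 2AFC follow-up question.

    Assumes (as the comment does) that for every non-noticer who answered
    incorrectly there is one who answered correctly by pure guessing.
    """
    n_incorrect = n_nonnoticers - n_correct_2afc
    # Correct responders beyond the assumed pure-guess count.
    n_above_guess = max(n_correct_2afc - n_incorrect, 0)
    original = n_nonnoticers / n_total
    corrected = (n_nonnoticers - n_above_guess) / n_total
    return original, corrected, original - corrected


# Experiment 1 counts as quoted in the comment: 374 total, 107 non-noticers, 68 correct.
original, corrected, overestimate = guess_corrected_ib(374, 107, 68)
print(f"{original:.2%} {corrected:.2%} {overestimate:.2%}")  # 28.61% 20.86% 7.75%
```

      Note that this correction is itself a binary, high-threshold calculation, which is exactly the framing contested in the authors' reply.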

      In the revised version of their manuscript, the authors provided the data that was missing from the original submission, which allows this same exercise to be carried out on the other 4 experiments.  

      (To briefly interject: All of these data were provided in our public archive since our original submission and remain available at https://osf.io/fcrhu. The difference now is only that they are included in the manuscript itself.)

      Using the same logic as above, i.e., calculating the pure-guess rate on the 2AFC, moving the number of subjects above this pure-guess rate to the noticer group, and then re-calculating a "corrected IB rate", the other experiments demonstrate the following:

      Experiment 2: IB rates were overestimated by 4.74% (original IB rate based only on Y/N question = 27.73%; corrected IB rate that includes the 2AFC = 22.99%)

      Experiment 3: IB rates were overestimated by 3.58% (original IB rate = 30.85%; corrected IB rate = 27.27%)

      Experiment 4: IB rates were overestimated by ~8.19% (original IB rate = 57.32%; corrected IB rate for color* = 39.71%, corrected IB rate for shape = 52.61%, corrected IB rate for location = 55.07%)

      Experiment 5: IB rates were overestimated by ~1.44% (original IB rate = 28.99%; corrected IB rate for color = 27.56%, corrected IB rate for shape = 26.43%, corrected IB rate for location = 28.65%)

      *note: the highest overestimate of IB rates was from Experiment 4, color condition, but the authors admitted that there was a problem with 2AFC color guessing bias in this version of the experiment which was a main motivation for running experiment 5 which corrected for this bias.
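
      The summary figures for Experiments 4 and 5 appear to be the mean of the three per-feature overestimates. The review does not state this explicitly, so the sketch below treats it as an assumption and simply checks that the quoted percentages are consistent with it; the function name is hypothetical.

```python
def mean_overestimate(original, corrected_by_feature):
    """Mean difference between the original and per-feature corrected IB rates (in %)."""
    diffs = [original - corrected for corrected in corrected_by_feature.values()]
    return sum(diffs) / len(diffs)


# Percentages exactly as quoted in the review.
exp4 = mean_overestimate(57.32, {"color": 39.71, "shape": 52.61, "location": 55.07})
exp5 = mean_overestimate(28.99, {"color": 27.56, "shape": 26.43, "location": 28.65})
print(f"{exp4:.2f} {exp5:.2f}")  # 8.19 1.44
```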

      Taken as a whole, this data clearly demonstrates that even with a conservative approach to analyzing the combination of Y/N and 2AFC data, inattentional blindness was evident in a sizeable portion of the subject populations. An important (albeit modest) overestimate of IB rates was demonstrated by incorporating these improved methods.

      We appreciate the work the reviewer has put into making these calculations. However, as noted above, such calculations implicitly reflect the binary approach to perception and awareness that we reject. 

      Consider how we’d think about the single subject case where the task is 2afc detection of a low contrast stimulus in noise. Suppose that this subject achieves 70% correct. One way of thinking about this is that the subject fully and clearly sees the stimulus on 40% of trials (achieving 100% correct on those) and guesses completely blindly on the other 60% (achieving 50% correct on those) for a total of 40% + 30% = 70% overall. However, this is essentially a ‘high threshold’ approach to the problem, in contrast to an SDT approach. On an SDT approach (an approach with tremendous evidential support), on every trial the subject receives samples from probabilistic distributions corresponding to each interval (one noise and one signal + noise) and determines which is higher according to the 2afc decision rule. Thus, across trials, they have access to differentially graded information about the stimulus. Moreover, on some trials they may have significant information from the stimulus (perhaps, well above their single interval detection criterion) but still decide incorrectly because of high noise from the other spatial interval. From this perspective, there is no non-arbitrary way of saying whether the subject saw/did not see on a given trial. Instead, we must characterize the subject’s overall sensitivity to the stimulus/its visibility to them in terms of a parameter such as d′ (here, ~ 0.7).
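
      The two readings of the 70%-correct example can be put side by side in a short sketch. It assumes the standard equal-variance Gaussian model, under which 2afc sensitivity is d′ = √2 · z(proportion correct); the function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def high_threshold_seen(pc, guess=0.5):
    """High-threshold reading: fraction of trials on which the stimulus was 'seen'."""
    return (pc - guess) / (1 - guess)


def dprime_2afc(pc):
    """SDT reading: d' for 2afc, assuming equal-variance Gaussian noise and signal."""
    return sqrt(2) * NormalDist().inv_cdf(pc)


pc = 0.70
print(f"high-threshold 'seen' fraction: {high_threshold_seen(pc):.2f}")  # 0.40
print(f"SDT sensitivity d': {dprime_2afc(pc):.2f}")                      # 0.74
```

      The first number sorts trials into "seen" and "guessed"; the second describes a single graded sensitivity with no such partition, which is the sense in which d′ is roughly 0.7 here.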

      We take the same attitude to the subjects in our experiments (and specifically to our ‘super subject’). Instead of calculating the proportion of subjects who saw or failed to see the stimulus (with some characterized as aware and some as unaware), we think the best way to characterize our results is that, across subjects (and so trials also), there was differential graded access to information from the stimulus, and this is best represented in terms of the group-level sensitivity parameter d′. This is why we frame our results as demonstrating that subjects traditionally considered inattentionally blind exhibit significant residual visual sensitivity to the critical stimulus.

      (2) One of the strongest pieces of evidence presented in this paper was the single data point in Figure 3e showing that in Experiment 3, even the super subject group that rated their non-noticing as "highly confident" had a d' score significantly above zero. Asking for confidence ratings is certainly an improvement over simple Y/N questions about noticing, and if this result were to hold, it could provide a key challenge to IB. However, this result can most likely be explained by measurement error.

      In their revised paper, the authors reported data that was missing from their original submission: the confidence ratings on the 2AFC judgments that followed the initial Y/N question. The most striking indication that this data is likely due to measurement error comes from the number of subjects who indicated that they were highly confident that they didn't notice anything on the critical trial, but then when asked to guess the location of the stimulus, indicated that they were highly confident that the stimulus was on the left (or right). There were 18 subjects (8.82% of the high-confidence non-noticer group) who responded this way. To most readers, this combination of responses (high confidence in correctly judging a stimulus feature that one is highly confident in having not seen at all) indicates that a portion of subjects misunderstood the confidence scales (or just didn't read the questions carefully or made mistakes in their responses, which is common for experiments conducted online).

      In the authors' rebuttal to the first round of peer review, they wrote, "it is perfectly rationally coherent to be very confident that one didn't see anything but also very confident that if there was anything to be seen, it was on the left." I respectfully disagree that such a combination of responses is rationally coherent. The more parsimonious interpretation is that a measurement error occurred, and it's questionable whether we should trust any responses from these 18 subjects.

      In their rebuttal, the authors go on to note that 14 of the 18 subjects who rated their 2AFC with high confidence were correct in their location judgment. If these 14 subjects were removed from analysis (which seems like a reasonable analysis choice, given their contradictory responses), d' for the high-confidence non-noticer group would most likely fall to chance levels. In other words, we would see a data pattern similar to that plotted in Figure 3e, but with the first data point on the left moving down to zero d'. This corrected Figure 3e would then provide a very nice evidence-based justification for including confidence ratings along with Y/N questions in future inattentional blindness studies.

      We appreciate the reviewer’s highlighting of this particular piece of evidence as amongst our strongest. (At the same time, we must resist its characterization as a “single data point”: it derives from a large pre-registered experiment involving some 7,000 subjects total, with over 200 subjects in the relevant bin — both figures being far larger than a typical IB experiment.) We also appreciate their raising the issue of measurement error.

      Specifically, the reviewer contends that our finding that even highly confident non-noticers exhibit significant sensitivity is “most likely … explained by measurement error” due to subjects mistakenly inverting our confidence scale in giving their response. In our original reply, we gave two reasons for thinking this quite unlikely; the reviewer has not addressed these in this revised review. First, we explicitly labeled our confidence scale (with 0 labeled as ‘Not at all confident’ and 3 as ‘Highly confident’) so that subjects would be very unlikely simply to invert the scale. This is especially so as it is very counterintuitive to treat “0” as reflecting high confidence. More importantly, however, we reasoned that any measurement error due to inverting or misconstruing the confidence scale should be symmetric. That is: If subjects are liable to invert the confidence scale, they should do so just as often when they answer “yes” as when they answer “no” – after all the very same scale is being used in both cases. This allows us to explore evidence of measurement error in relation to the large number of high-confidence “yes” subjects (N = 2677), thus providing a robust indicator as to whether subjects are generally liable to misconstrue the confidence scale. Looking at the number of such high confidence noticers who subsequently respond to the 2afc question with low confidence (a pattern which might, though need not, suggest measurement error), we found that the number was tiny. Only 28/2677 (1.05%) of high-confidence noticers subsequently gave the lowest level of confidence on the 2afc question, and only 63/2677 (2.35%) subjects gave either of the two lower levels of confidence. For these reasons, we consider any measurement error due to misunderstanding the confidence scale to be extremely minimal.

      The reviewer is correct to note that 18/204 (9%) subjects reported both being highly confident that they didn't notice anything and highly confident in their 2afc judgment, although only 14/18 were correct in this judgment. Should we exclude these 14? Perhaps if we agree with the reviewer that such a pattern of responses is not “rationally coherent” and so must reflect a misconstrual of the scale. But such a pattern is in fact perfectly and straightforwardly intelligible. Specifically, in a 2afc task, two stimuli can individually fall well below a subject’s single interval detection criterion, leading to a high confidence judgment that nothing was presented in either interval. Quite consistent with this, the left-hand stimulus may produce a signal that is much higher than the right-hand stimulus, leading to a high confidence forced-choice judgment that, if something was presented, it was on the left. (By analogy, consider how a radiologist could look at a scan and say the following: “We’re 95% confident there’s no tumor. But even on the 5% chance that there is, our tests completely rule out that it’s a malignant one, so don’t worry.”)
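
      The point can be illustrated with a small simulation. This is a toy model, not an analysis of the study's data: the sensitivity, criterion, and trial count are hypothetical, and the rule that a subject answers "no" whenever both samples fall below a single criterion is a simplifying assumption.

```python
import random


def simulate(d_prime=1.0, criterion=2.0, n_trials=20000, seed=0):
    """Toy equal-variance SDT model of a yes/no question followed by a 2afc question."""
    rng = random.Random(seed)
    n_no = 0
    n_no_correct = 0
    for _ in range(n_trials):
        signal = rng.gauss(d_prime, 1.0)  # sample from the side containing the stimulus
        noise = rng.gauss(0.0, 1.0)       # sample from the empty side
        if max(signal, noise) < criterion:
            # Neither sample reaches the yes/no criterion: the subject reports "no".
            n_no += 1
            # The forced choice picks the larger sample; correct if it is the signal.
            n_no_correct += signal > noise
    return n_no / n_trials, n_no_correct / n_no


p_no, accuracy_given_no = simulate()
print(f"P('no') = {p_no:.2f}; 2afc accuracy among 'no' responders = {accuracy_given_no:.2f}")
```

      Under these assumptions, most simulated trials yield a "no" response, yet forced-choice accuracy on those same trials stays well above 50%, because the comparison between the two samples remains informative even when both fall below the criterion.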

      (3) In most (if not all) IB experiments in the literature, a partial attention and/or full attention trial is administered after the critical trial. These control trials are very important for validating IB on the critical trial, as they must show that, when attended, the critical stimuli are very easy to see. If a subject cannot detect the critical stimulus on the control trial, one cannot conclude that they were inattentionally blind on the critical trial, e.g., perhaps the stimulus was just too difficult to see (e.g., too weak, too brief, too far in the periphery, too crowded by distractor stimuli, etc.), or perhaps they weren't paying enough attention overall or failed to follow instructions. In the aggregate data, rates of noticing the stimuli should increase substantially from the critical trial to the control trials. If noticing rates are equivalent on the critical and control trials, one cannot conclude that attention was manipulated in the first place.

      In their rebuttal to the first round of peer review, the authors provided weak justification for not including such a control condition. They cite one paper that argues such control conditions are often used to exclude subjects from analysis (those who fail to notice the stimulus on the control trial are either removed from analysis or replaced with new subjects) and such exclusions/replacements can lead to underestimations of inattentional blindness rates. However, the inclusion of a partial or full attention condition as a control does not necessitate the extra step of excluding or replacing subjects. In the broadest sense, such a control condition simply validates the attention manipulation, i.e., one can easily compare the percent of subjects who answered "yes" or who got the 2AFC judgment correct during the critical trial versus the control trial. The subsequent choice about exclusion/replacement is separate, and researchers can always report the data with and without such exclusions/replacements to remain more neutral on this practice.

      If anyone were to follow-up on this study, I highly recommend including a partial or full attention control condition, especially given the online nature of data collection. It's important to know the percent of online subjects who answer yes and who get the 2AFC question correct when the critical stimulus is attended, because that is the baseline (in this case, the "ceiling level" of performance) to which the IB rates on the critical trial can be compared.

      We agree with the reviewer that future studies could benefit from including a partial or full attention condition. They are surely right that we might learn something additional from such conditions. 

      Where we differ from the reviewer is in thinking of these conditions as “controls” appropriate to our research question. This is why we offered the justification we did in our earlier response. When these conditions are used as controls, they are used to exclude subjects in ways that serve to inflate the biases we are concerned with in our work. For our question, the absence of these conditions does not impact the significance of the findings, since such conditions are designed to answer a question which is not the one at the heart of our paper. Our key claim is that subjects who deny noticing an unexpected stimulus in a standard inattentional blindness paradigm nonetheless exhibit significant residual sensitivity (as well as a conservative bias in their response to the noticing question); the presence or absence of partial- or full-attention conditions is orthogonal to that question.

      Moreover, we note that our tasks were precisely chosen to be classic tasks widely used in the literature to manipulate attention. Thus, by common consensus in the field, they are effective means to soak up attention, and have in effect been tested in partial- and full-attention control settings in a huge number of studies. Second, we think it very doubtful that subjects in a full-attention trial would not overwhelmingly have detected our critical stimuli. The reviewer worries that they might have been “too weak, too brief, too far in the periphery, too crowded by distractor stimuli, etc.” But consider E5 where the stimulus was a highly salient orange or green shape, present on the screen for 5 seconds. The reviewer also suggests that subjects in the full-attention control might not have detected the stimulus because they “weren't paying enough attention overall”. But evidently if they weren’t paying attention even in the full-attention trial this would be reason for thinking that there was inattentional blindness even in this condition (a point made by White et al. 2018) and certainly not a reason for thinking there was not an attentional effect in the critical trial. Lastly, the reviewer suggests that a full-attention condition would have helped ensure that subjects were following instructions. But we ensured this already by (as per our pre-registration) excluding subjects who performed poorly in the relevant primary tasks.

      Thus, both in principle and in practice, we do not see the absence of such conditions as impacting the interpretation of our findings, even as we agree that future work posing a different research question could certainly learn something from including such conditions.

      Responses to Reviewer #2:

      We note that this report is unchanged from an earlier round of review, and not a response to our significantly revised manuscript. We believe our latest version fully addresses all the issues which the reviewer originally raised. The interested reader can see our original response below. We again thank the reviewer for their previous report which was extremely helpful.

      ---

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents valuable findings to the field interested in inattentional blindness (IB), reporting that participants indicating no awareness of unexpected stimuli through yes/no questions still show above-chance sensitivity to specific properties of these stimuli through follow-up forced-choice questions (e.g., its color). The results suggest that this is because participants are conservative and biased to report not noticing in IB. The authors conclude that these results provide evidence for residual perceptual awareness of inattentionally blind stimuli and that therefore these findings cast doubt on the claim that awareness requires attention. Although the samples are large and the analysis protocol novel, the evidence supporting this interpretation is still incomplete, because effect sizes are rather small, the experimental design could be improved and alternative explanations have not been ruled out.

      We are encouraged to hear that eLife found our work “valuable”. We also understand, having closely looked at the reviews, why the assessment also includes an evaluation of “incomplete”. We gave considerable attention to this latter aspect of the assessment in our revision. In addition to providing additional data and analyses that we believe strengthen our case, we also include a much more substantial review and critique of existing methods in the IB literature to make clear exactly the gap our work fills and the advance it makes. (Indeed, if it is appropriate to say this here, we believe one key aspect of our work that is missing from the assessment is our inclusion of ‘absent’ trials, which is what allows us to make the crucial claims about conservative reporting of awareness in IB for the first time.) Moreover, we refocus our discussion on only our most central claims, and weaken several of our secondary claims so that the data we’ve collected are better aligned with the conclusions we draw, to ensure that the case we now make is in fact complete. Specifically, our two core claims are (1) that there is residual sensitivity to visual features for subjects who would ordinarily be classified as inattentionally blind (whether this sensitivity is conscious or not), and (2) that there is a tendency to respond conservatively on yes/no questions in the context of IB. We believe we have very compelling support for these two core claims, as we explain in detail below and also through revisions to our manuscript.

      Given the combination of strengthened and clarified case, as well as the weakening of any conclusions that may not have been fully supported, we believe and hope that these efforts make our contribution “solid”, “convincing”, or even “compelling” (especially because the “compelling” assessment characterizes contributions that are “more rigorous than the current state-of-the-art”, which we believe to be the case given the issues that have plagued this literature and that we make progress on).

      Reviewer #1 (Public review):

      Summary:

      In the abstract and throughout the paper, the authors boldly claim that their evidence, from the largest set of data ever collected on inattentional blindness, supports the views that "inattentionally blind participants can successfully report the location, color, and shape of stimuli they deny noticing", "subjects retain awareness of stimuli they fail to report", and "these data...cast doubt on claims that awareness requires attention." If their results were to support these claims, this study would overturn 25+ years of research on inattentional blindness, resolve the rich vs. sparse debate in consciousness research, and critically challenge the current majority view in cognitive science that attention is necessary for awareness.

      Unfortunately, these extraordinary claims are not supported by extraordinary (or even moderately convincing) evidence. At best, the results support the more modest conclusion: If sub-optimal methods are used to collect retrospective reports, inattentional blindness rates will be overestimated by up to ~8% (details provided below in comment #1). This evidence-based conclusion means that the phenomenon of inattentional blindness is alive and well as it is even robust to experiments that were specifically aimed at falsifying it. Thankfully, improved methods already exist for correcting the ~8% overestimation of IB rates that this study successfully identified.

      We appreciate here the reviewer’s recognition of the importance of work on inattentional blindness, and the centrality of inattentional blindness to a range of major questions. We also recognize their concerns with what they see as a gap between our data and the claims made on their basis. We address this in detail below (as well as, of course, in our revised manuscript). However, from the outset we are keen to clarify that our central claim is only the first one the reviewer mentions — and the one which appears in our title — namely that, as a group, participants can successfully report the location, color, and shape of stimuli they deny noticing, and thus that there is “Sensitivity to visual features in inattentional blindness”. This is the claim that we believe is strongly supported by our data, and all the more so after revising the manuscript in light of the helpful comments we’ve received.

      By contrast, the other claims the reviewer mentions, concerning awareness (as opposed to residual sensitivity, which might be conscious or unconscious), were intended as both secondary and tentative. We agree with the referee that these are not as strongly supported by our data (and indeed we say so in our manuscript), whereas we do think our data strongly support the more modest (and, to us, central) claim that, as a group, inattentionally blind participants can successfully report the location, color, and shape of stimuli they deny noticing.

      We also feel compelled to resist somewhat the reviewer’s summary of our claims. For example, the reviewer attributes to us the claim that “subjects retain awareness of stimuli they fail to report”; but while that phrase does appear in our abstract, what we in fact say is that our data are “consistent with an alternative hypothesis about IB, namely that subjects retain awareness of stimuli they fail to report”. We do in fact believe that our data are consistent with that hypothesis, whereas earlier investigations seemed not to be. We mention this only because we had used that careful phrasing precisely for this sort of reason, so that we wouldn’t be read as saying that our results unequivocally support that alternative.

      Still, looking back, we see how we may have given more emphasis than we intended to some of these more secondary claims. So, we’ve now gone through and revised our manuscript throughout to emphasize that our main claim is about residual sensitivity, and to make clear that our claims about awareness are secondary and tentative. Indeed, we now say precisely this, that although we favor an interpretation of “our results in terms of residual conscious vision in IB … this claim is tentative and secondary to our primary finding”. We also weaken the statements in the abstract that the reviewer mentions, to better reflect our key claims.

      Finally, we note one further point: Dialectically, inattentional blindness has been used to argue (e.g.) that attention is required for awareness. We think that our data concerning residual sensitivity at least push back on the use of IB to make this claim, even if (as we agree) they do not provide decisive evidence that awareness survives inattention. In other words, we think our data call that claim into question, such that it’s now genuinely unclear whether awareness does or does not survive inattention. We have adjusted our claims on this point accordingly as well.

      Comments:

      (1) In experiment 1, data from 374 subjects were included in the analysis. As shown in figure 2b, 267 subjects reported noticing the critical stimulus and 107 subjects reported not noticing it. This translates to a 29% IB rate, if we were to only consider the "did you notice anything unusual Y/N" question. As reported in the results text (and figure 2c), when asked to report the location of the critical stimulus (left/right), 63.6% of the "non-noticer" group answered correctly. In other words, 68 subjects were correct about the location while 39 subjects were incorrect. Importantly, because the location judgment was a 2-alternative-forced-choice, the assumption was that if 50% (or at least not statistically different than 50%) of the subjects answered the location question correctly, everyone was purely guessing. Therefore, we can estimate that ~39 of the subjects who answered correctly were simply guessing (because 39 guessed incorrectly), leaving 29 subjects from the non-noticer group who may have indeed actually seen the location of the stimulus. If these 29 subjects are moved to the noticer group, the corrected rate of IB for experiment 1 is 21% instead of 29%. In other words, relying only on the "Y/N did you notice anything" question leads to an overestimate of IB rates by 8%. This modest level of inaccuracy in estimating IB rates is insufficient for concluding that "subjects retain awareness of stimuli they fail to report", i.e. that inattentional blindness does not exist.

      In addition, this 8% inaccuracy in IB rates only considers one side of the story. Given the data reported for experiment 1, one can also calculate the number of subjects who answered "yes, I did notice something unusual" but then reported the incorrect location of the critical stimulus. This turned out to be 8 subjects (or 3% of the "noticer" group). Some would argue that it's reasonable to consider these subjects as inattentionally blind, since they couldn't even report where the critical stimulus they apparently noticed was located. If we move these 8 subjects to the non-noticer group, the 8% overestimation of IB rates is reduced to 6%.
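
      The step from an 8% to a 6% overestimate can be reproduced from the counts quoted in these two paragraphs. The sketch below is only a check of that arithmetic; the variable names are illustrative.

```python
# Experiment 1 counts as quoted in the review.
n_total, n_nonnoticers = 374, 107
nonnoticers_correct = 68  # non-noticers who reported the correct location
noticers_wrong = 8        # noticers who reported the wrong location

# Non-noticers correct beyond the assumed pure-guess count: 68 - 39 = 29.
above_guess = nonnoticers_correct - (n_nonnoticers - nonnoticers_correct)

original = n_nonnoticers / n_total
one_sided = (n_nonnoticers - above_guess) / n_total
# Two-sided: additionally count the 8 mislocating noticers as inattentionally blind.
two_sided = (n_nonnoticers - above_guess + noticers_wrong) / n_total

print(f"one-sided overestimate: {original - one_sided:.1%}")  # 7.8%
print(f"two-sided overestimate: {original - two_sided:.1%}")  # 5.6%
```

      Rounded to whole percentages, these are the 8% and 6% figures given in the review.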

      The same exercise can and should be carried out on the other 4 experiments; however, the authors do not report the subject numbers for any of the other experiments, i.e., how many subjects answered Y/N to the noticing question and how many in each group correctly answered the stimulus feature question. From the limited data reported (only total subject numbers and d' values), the effect sizes in experiments 2-5 were all smaller than in experiment 1 (d' for the non-noticer group was lower in all of these follow-up experiments), so it can be safely assumed that the ~6-8% overestimation of IB rates was smaller in these other four experiments. In a revision, the authors should consider reporting these subject numbers for all 5 experiments.

      We now report, as requested, all these subject numbers in our supplementary data (see Supplementary Tables 1 and 2 in our Supplementary Materials).

      However, we wish to address the larger question the reviewer has raised: Do our data only support a relatively modest reduction in IB rates? Even if they did, we still believe that this would be a consequential result, suggesting a significant overestimation of IB rates in classic paradigms. However, part of our purpose in writing this paper is to push back against a certain binary way of thinking about seeing/awareness. Our sense is that the field has conceived of awareness as “all or nothing”: You either see a perfectly clear gorilla right in front of you, or you see nothing at all. Our perspective is different: We think there can be degraded forms of awareness that fall into neither of those categories. For that reason, we are disinclined to see our results in the way that the reviewer suggests, namely as simply indicating that fewer subjects fail to see the stimulus than previously assumed. To think that way is, in our view, to assume the orthodox binary position about awareness. If, instead, one conceives of awareness as we do (and as we believe the framework of signal detection theory should compel us to), then it isn’t quite right to think of the proportion of subjects who were aware, but rather (e.g.) the sensitivity of subjects to the relevant stimulus. This is why we prefer measures like d′ over % noticing and % correct. We understand that the reviewer may not think the same way about this issue as we do, but part of our goal is to promote that way of thinking in general, and so some of our comments below reflect that perspective and approach.

      For example, consider how we’d think about the single subject case where the task is 2afc detection of a low contrast stimulus in noise. Suppose that this subject achieves 70% correct. One way of thinking about that is that the subject sees the stimulus on 40% of trials (achieving 100% correct on those) and guesses blindly on the other 60% (achieving 50% correct on those) for a total of 40% + 30% = 70% overall. However, this is essentially a “high threshold” approach to the problem, in contrast to an SDT approach. On an SDT approach (an approach with tremendous evidential support), on every trial the subject receives samples from probabilistic distributions corresponding to each interval (one noise and one signal + noise) and determines which is higher according to the 2afc decision rule. Thus, across trials they have access to differentially graded information about the stimulus. Moreover, on some trials they may have significant information from the stimulus (perhaps, well above their single interval detection criterion) but still decide incorrectly because of high noise from the other spatial interval. From this perspective, there is no non-arbitrary way of saying whether the subject saw/did not see on a given trial. Instead, we must characterize the subject’s overall sensitivity to the stimulus/its visibility to them in terms of a parameter such as d′ (here, ~ 0.7).
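
      The two readings of a 70% score can be made concrete with a short sketch. It assumes the textbook equal-variance Gaussian model, under which 2afc sensitivity is d′ = √2 · z(proportion correct). The function names are hypothetical, and this is an illustration rather than the analysis pipeline used in the paper.

```python
from math import sqrt
from statistics import NormalDist


def high_threshold_accuracy(p_seen: float, n_alternatives: int = 2) -> float:
    """Accuracy if the stimulus is 'seen' on a fraction p_seen of trials
    (always answered correctly) and blindly guessed on the remainder."""
    return p_seen + (1.0 - p_seen) / n_alternatives


def dprime_2afc(p_correct: float) -> float:
    """Equal-variance Gaussian SDT (assumed): d' = sqrt(2) * z(p_correct)."""
    return sqrt(2.0) * NormalDist().inv_cdf(p_correct)


# High-threshold reading: seen on 40% of trials, guessing on the other 60%.
print(f"high-threshold accuracy: {high_threshold_accuracy(0.40):.2f}")

# SDT reading of the same 70% correct: one graded sensitivity value
# (roughly 0.74, in line with the ~0.7 quoted above).
print(f"SDT d' for 70% correct: {dprime_2afc(0.70):.2f}")
```

      Both descriptions fit the same percent correct; they differ only in whether that number is read as a mixture of "seen" and "unseen" trials or as a single graded sensitivity.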

      We take the same attitude to our super subject. Rather than saying that some subjects saw or failed to see the stimuli, we suggest that the best way to characterize our results is that across subjects (and so also across trials) there was differentially graded access to information from the stimulus, best represented in terms of the group-level sensitivity parameter d′.

      We acknowledge that (despite ourselves) we occasionally fell into an all-too-natural binary/high threshold way of thinking, as when we suggested that our data show that “inattentionally blind subjects consciously perceive these stimuli after all” and “the inattentionally blind can see after all.” (p.17) We have removed such problematic phrasing as well as other problematic phrasing as noted below.

      (2) Because classic IB paradigms involve only one critical trial per subject, the authors used a "super subject" approach to estimate sensitivity (d') and response criterion (c) according to signal detection theory (SDT). Some readers may have issues with this super subject approach, but my main concern is with the lack of precision used by the authors when interpreting the results from this super subject analysis.

      Only the super subject had above-chance sensitivity (and it was quite modest, with d' values between 0.07 and 0.51), but the authors over-interpret these results as applying to every subject. The methods and analyses cannot determine if any individual subject could report the features above-chance. Therefore, the following list of quotes should be revised for accuracy or removed from the paper as they are misleading and are not supported by the super subject analysis: "Altogether this approach reveals that subjects can report above-chance the features of stimuli (color, shape, and location) that they had claimed not to notice under traditional yes/no questioning" (p.6)

      "In other words, nearly two-thirds of subjects who had just claimed not to have noticed any additional stimulus were then able to correctly report its location." (p.6)

      "Even subjects who answer "no" under traditional questioning can still correctly report various features of the stimulus they just reported not having noticed, suggesting that they were at least partially aware of it after all." (p.8)

      "Why, if subjects could succeed at our forced-response questions, did they claim not to have noticed anything?" (p.8)

      "we found that observers could successfully report a variety of features of unattended stimuli, even when they claimed not to have noticed these stimuli." (p.14)

      "our results point to an alternative (and perhaps more straightforward) explanation: that inattentionally blind subjects consciously perceive these stimuli after all... they show sensitivity to IB stimuli because they can see them." (p.16)

      "In other words, the inattentionally blind can see after all." (p.17)

      We thank the reviewer for pointing out how these quotations may be misleading as regards our central claim. We intended them all to be read generically as concerning the group, and not universally as claiming that all subjects could report above-chance/see the stimuli etc. We agree entirely that the latter universal claim would not be supported by our data. In contrast, we do contend that our super-subject analysis shows that, as a group, subjects traditionally considered inattentionally blind exhibit residual sensitivity to features of stimuli (color, shape, and location) that they had all claimed not to notice, and likewise that as a group they could succeed at our forced-choice questions.

      To ensure this claim is clear throughout the paper, and that we are not interpreted as making an unsupported universal claim, we have revised the language in all of the quotations above, as follows, as well as in numerous other places in the paper.

      “Altogether this approach reveals that subjects can report above-chance the features of stimuli (color, shape, and location) that they had claimed not to notice under traditional yes/no questioning” (p.6) => “Altogether this approach reveals that as a group subjects can report above-chance the features of stimuli (color, shape, and location) that they had all claimed not to notice under traditional yes/no questioning” (p.6)

      “Even subjects who answer “no” under traditional questioning can still correctly report various features of the stimulus they just reported not having noticed, suggesting that they were at least partially aware of it after all.” (p.8) => “... even subjects who answer “no” under traditional questioning can, as a group, still correctly report various features of the stimuli they just reported not having noticed, indicating significant group-level sensitivity to visual features. Moreover, these results are even consistent with an alternative hypothesis about IB, that as a group, subjects who would traditionally be classified as inattentionally blind are in fact at least partially aware of the stimuli they deny noticing.” (p.8)

      “Why, if subjects could succeed at our forced-response questions, did they claim not to have noticed anything?” (p.8) => “Why, if subjects could succeed at our forced-response questions as a group, did they all individually claim not to have noticed anything?” (p.8)

      “we found that observers could successfully report a variety of features of unattended stimuli, even when they claimed not to have noticed these stimuli.” (p.14) => “we found that groups of observers could successfully report a variety of features of unattended stimuli, even when they all individually claimed not to have noticed those stimuli.” (p.14)

      “our results point to an alternative (and perhaps more straightforward) explanation: that inattentionally blind subjects consciously perceive these stimuli after all... they show sensitivity to IB stimuli because they can see them.” (p.16) => “our results just as easily raise an alternative (and perhaps more straightforward) explanation: that inattentionally blind subjects may retain a degree of awareness of these stimuli after all.” (p.16) Here deleting: “they show sensitivity to IB stimuli because they can see them.”

      “In other words, the inattentionally blind can see after all.” (p.17) => “In other words, as a group, the inattentionally blind enjoy at least some degraded or partial sensitivity to the location, color and shape of stimuli which they report not noticing.” (p.17)

      In one case, we felt the sentence was correct as it stood, since it simply reported a fact about our data:

      “In other words, nearly two-thirds of subjects who had just claimed not to have noticed any additional stimulus were then able to correctly report its location.” (p.6)

      After all, if subjects were entirely blind and simply guessed, it would be true to say that 50% of subjects would be able to correctly report the stimulus location (by guessing).

      In addition to these and numerous other changes, we also added the following explicit statement early in the paper to head off any confusion on this point: “Note that all analyses reported here relate to this super subject as opposed to individual subjects”.

      (3) In addition to the d' values for the super subject being slightly above zero, the authors attempted an analysis of response bias to further question the existence of IB. By including in some of their experiments critical trials in which no critical stimulus was presented, but asking subjects the standard Y/N IB question anyway, the authors obtained false alarm and correct rejection rates. When these FA/CR rates are taken into account along with hit/miss rates when critical stimuli were presented, the authors could calculate c (response criterion) for the super subject. Here, the authors report that response criteria are biased towards saying "no, I didn't notice anything". However, the validity of applying SDT to classic Y/N IB questioning is questionable.

      For example, with the subject numbers provided in Box 1 (the 2x2 table of hits/misses/FA/CR), one can ask, 'how many subjects would have needed to answer "yes, I noticed something unusual" when nothing was presented on the screen in order to obtain a non-biased criterion estimate, i.e., c = 0?' The answer turns out to be 800 subjects (out of the 2761 total subjects in the stimulus-absent condition), or 29% of subjects in this condition.

      In the context of these IB paradigms, it is difficult to imagine 29% of subjects claiming to have seen something unusual when nothing was presented. Here, it seems that we may have reached the limits of extending SDT to IB paradigms, which are very different than what SDT was designed for. For example, in classic psychophysical paradigms, the subject is asked to report Y/N as to whether they think a threshold-level stimulus was presented on the screen, i.e., to detect a faint signal in the noise. Subjects complete many trials and know in advance that there will often be stimuli presented and the stimuli will be very difficult to see. In those cases, it seems more reasonable to incorrectly answer "yes" 29% of the time, as you are trying to detect something very subtle that is out there in the world of noise. In IB paradigms, the stimuli are intentionally designed to be highly salient (and unusual), such that with a tiny bit of attention they can be easily seen. When no stimulus is presented and subjects are asked about their own noticing (especially of something unusual), it seems highly unlikely that 29% of them would answer "yes", which is the rate of FAs that would be needed to support the null hypothesis here, i.e., of a non-biased criterion. For these reasons, the analysis of response bias in the current context is questionable and the results claiming to demonstrate a biased criterion do not provide convincing evidence against IB.

      We are grateful to the reviewer for highlighting this aspect of our data. We agree with several of these points. For example, it is indeed striking that — given the corresponding hit rate — a false alarm rate of 29% would be needed to obtain an unbiased criterion. At the same time, we would respectfully push back on other points above. In our first experiment that uses the super-subject analysis, for example, d′ is 0.51 and highly significant; to describe that figure, as the reviewer does, as “slightly above zero” seemed not quite right to us (and all the more so given that these experiments involve very large samples and preregistered analysis plans). 

      We also respectfully disagree that our data call into question the validity of applying SDT to classic yes/no IB questioning. The mathematical foundations of SDT are rock solid, and have been applied far more broadly than we have applied them here. In fact, in a way we would suggest that exactly the opposite attitude is appropriate: rather than thinking that IB challenges an immensely well-supported, rigorously tested and broadly applicable mathematical model of perception, we think that the conflict between our SDT-based model of IB and the standard interpretation constitutes strong reason to disfavor the standard interpretation. Several points are worth making here.

      First, it is already surprising that 11.03% of our subjects in E2 (46/417) and 7.24% of our subjects in E5 (200/2761) reported noticing a stimulus when no stimulus was present. But while this may have seemed unlikely in advance of inquiry, this is in fact what the data show, and it forms the basis of our criterion calculations. Thus, our criterion calculations already factor in a surprising but empirically verified high false alarm rate of subjects answering “yes” when no stimulus was presented and they were asked about their noticing. (We also note that the only paper we know of to report a false alarm rate in an IB paradigm, though not one used to calculate a response criterion, found a very consistent false alarm rate of 10.4%. See Devue et al. 2009.)
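
      The criterion arithmetic behind these figures can be sketched as follows. The sketch assumes the standard equal-variance definition c = −½[z(H) + z(F)], so that c = 0 exactly when F = 1 − H. The false alarm counts are those reported above. The hit rate of 0.71 is an illustrative assumption, back-derived from the reviewer's 29% figure rather than read from Box 1, and the function names are hypothetical.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def criterion(hit_rate: float, fa_rate: float) -> float:
    """Equal-variance SDT criterion (assumed form): c = -(z(H) + z(F)) / 2.
    Positive values indicate a conservative bias toward answering 'no'."""
    return -0.5 * (z(hit_rate) + z(fa_rate))


def unbiased_fa_rate(hit_rate: float) -> float:
    """False alarm rate at which c = 0 for a given hit rate (F = 1 - H)."""
    return 1.0 - hit_rate


# False alarm rates from the stimulus-absent trials reported above.
fa_e2 = 46 / 417
fa_e5 = 200 / 2761
print(f"E2 false alarm rate: {fa_e2:.2%}")
print(f"E5 false alarm rate: {fa_e5:.2%}")

# ASSUMPTION: illustrative hit rate, not a value taken from Box 1.
assumed_hit_rate = 0.71
print(f"FA rate needed for c = 0: {unbiased_fa_rate(assumed_hit_rate):.0%}")
print(f"criterion at the observed E5 FA rate: "
      f"{criterion(assumed_hit_rate, fa_e5):.2f}")
```

      Under these assumptions, the gap between the observed false alarm rate and the rate needed for c = 0 is what shows up as a positive (conservative) criterion.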

      Second, while the reviewer is of course correct that a common psychophysical paradigm involves detection of a “threshold-level”/faint stimulus in noise, it is widely recognized that SDT has an extremely broad application, being applicable to any situation in which two kinds of event are to be discriminated (Pastore & Scheirer 1975) and being “almost universally accepted as a theoretical account of decision making in research on perceptual detection and recognition and in numerous extensions to applied domains” quite generally (Estes 2002, see also: Wixted 2020). Indeed, cases abound in which SDT has been successfully applied to situations which do not involve near-threshold stimuli in noise. To pick two examples at random, SDT has been used in studying acceptability judgments in linguistics (Huang and Ferreira 2020) and the assessment of physical aggression in child-student interactions (Lerman et al. 2010; for more general discussion of practical applications, see Swets et al. 2000). Given that the framework of SDT is so widely applied and well supported, and that we see no special reason to make an exception, we believe it can be relied on in the present context.

      Finally, we note that inattentional blindness can in many ways be considered analogous to “near threshold” detection since inattention is precisely thought to degrade or even abolish awareness of stimuli, meaning that our stimuli can be construed as near threshold in the relevant sense. Indeed, our relatively modest d′ values suggest that under inattention stimuli are indeed hard to detect. Thus, even were SDT more limited in its application, we think it still would be appropriate to apply to the case of IB.

      (4) One of the strongest pieces of evidence presented in the entire paper is the single data point in Figure 3e showing that in Experiment 3, even the super subject group that rated their non-noticing as "highly confident" had a d' score significantly above zero. Asking for confidence ratings is certainly an improvement over simple Y/N questions about noticing, and if this result were to hold, it could provide a key challenge to IB. However, this result hinges on a single data point, it was not replicated in any of the other 4 experiments, and it can be explained by methodological limitations. I strongly encourage the authors (and other readers) to follow up on this result, in an in-person experiment, with improved questioning procedures.

      We agree that our finding that even the super-subject group that rated their non-noticing as “highly confident” had a d′ score significantly above zero is an especially strong piece of evidence, and we thank the reviewer for highlighting that here. At the same time, we note that while the finding is represented by a single marker in Figure 3e, it seemed not quite right to call this a “single data point”, as the reviewer does, given that it derives from a large preregistered experiment involving some 7,000 subjects total, with over 200 subjects in the relevant bin (both figures being far larger than in a typical IB experiment). It would of course be tremendous to follow up on this result, and we certainly hope our work inspires various follow-up studies. That said, we note that recruiting the necessary numbers of in-person subjects would be an absolutely enormous, career-level undertaking: it would involve bringing more than the entire undergraduate population at our own institution, Johns Hopkins, into our laboratory! While those results would obviously be extremely valuable, we wouldn’t want to read the reviewer’s comments as implying that only an experiment of that magnitude, requiring thousands upon thousands of in-person subjects, could make progress on these issues. Indeed, because every subject can only contribute one critical trial in IB, it has long been recognized as an extremely challenging paradigm to study in a sufficiently well-powered and psychophysically rigorous way. We believe that our large preregistered online approach represents a major leap forward here, even if it involves certain trade-offs.

      In the current Experiment 3, the authors asked the standard Y/N IB question, and then asked how confident subjects were in their answer. Asking back-to-back questions, the second one with a scale that pertains to the first one (including a tricky inversion, e.g., "yes, I am confident in my answer of no"), may be asking too much of some subjects, especially subjects paying half-attention in online experiments. This procedure is likely to introduce a sizeable degree of measurement error.

      An easy fix in a follow-up study would be to ask subjects to rate their confidence in having noticed something with a single question using an unambiguous scale:

      On the last trial, did you notice anything besides the cross?

      (1): I am highly confident I didn't notice anything else

      (2): I am confident I didn't notice anything else

      (3): I am somewhat confident I didn't notice anything else

      (4): I am unsure whether I noticed anything else

      (5): I am somewhat confident I noticed something else

      (6): I am confident I noticed something else

      (7): I am highly confident I noticed something else

      If we were to re-run this same experiment, in the lab where we can better control the stimuli and the questioning procedure, we would most likely find a d' of zero for subjects who were confident or highly confident (1-2 on the improved scale above) that they didn't notice anything. From there on, the d' values would gradually increase, tracking along with the confidence scale (from 3-7 on the scale). In other words, we would likely find a data pattern similar to that plotted in Figure 3e, but with the first data point on the left moving down to zero d'. In the current online study with the successive (and potentially confusing) retrospective questioning, a handful of subjects could have easily misinterpreted the confidence scale (e.g., inverting the scale) which would lead to a mixture of genuine high-confidence ratings and mistaken ratings, which would result in a super subject d' that falls between zero and the other extreme of the scale (which is exactly what the data in Fig 3e shows).

      One way to check on this potential measurement error using the existing dataset would be to conduct additional analyses that incorporate the confidence ratings from the 2AFC location judgment task. For example, were there any subjects who reported being confident or highly confident that they didn't see anything, but then reported being confident or highly confident in judging the location of the thing they didn't see? If so, how many? In other words, how internally (in)consistent were subjects' confidence ratings across the IB and location questions? Such an analysis could help screen-out subjects who made a mistake on the first question and corrected themselves on the second, as well as subjects who weren't reading the questions carefully enough.

      As far as I could tell, the confidence rating data from the 2AFC location task were not reported anywhere in the main paper or supplement.

      We are grateful to the reviewer for raising this issue and for requesting that we report the confidence rating data from our 2afc location task in Experiment 3. We now report all this data in our Supplementary Materials (see Supplementary Table 3).

      We of course agree with the reviewer’s concern about measurement error, which is a concern in all experiments. What, then, of the particular concern that some subjects might have misunderstood our confidence question? It is surely impossible in principle to rule out this possibility; however, several factors bear on the plausibility of this interpretation. First, we explicitly labeled our confidence scale (with 0 labeled as ‘Not at all confident’ and 3 as ‘Highly confident’) so that subjects would be very unlikely simply to invert the scale. This is especially so as it is very counterintuitive to treat “0” as reflecting high confidence. However, we accept that it is a possibility that certain subjects might nonetheless have been confused in some other way.

      So, we also took a second approach. We examined the confidence ratings on the 2afc question of subjects who reported being highly confident that they didn't notice anything.

      Reassuringly, the large majority of these high-confidence “no” subjects (~80%) reported low confidence of 0 or 1 on the 2afc question, and the majority (51%) reported the lowest confidence of 0. Only 18/204 subjects (9%) reported high confidence on both questions.

      Still, the numbers of subjects here are small and so may not be reliable. This led us to take a third approach. We reasoned that any measurement error due to inverting or misconstruing the confidence scale should be symmetric. That is: If subjects are liable to invert the confidence scale, they should do so just as often when they answer “yes” as when they answer “no”; after all, the very same scale is being used in both cases. This allows us to explore evidence of measurement error in relation to the much larger number of high-confidence “yes” subjects (N = 2677), thus providing a much more robust indicator as to whether subjects are generally liable to misconstrue the confidence scale. Looking at the number of such high-confidence noticers who subsequently responded to the 2afc question with low confidence, we found that the number was tiny. Only 28/2677 (1.05%) of high-confidence noticers subsequently gave the lowest level of confidence on the 2afc question, and only 63/2677 (2.35%) gave either of the two lower levels of confidence. In this light, we consider any measurement error due to misunderstanding the confidence scale to be extremely minimal.
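
      The proportions quoted in this consistency check can be reproduced from the counts given above. As a rough guide to how much the smaller sample could vary, the sketch also attaches a 95% Wilson score interval to each proportion. That interval is an illustrative addition under a simple binomial assumption, not an analysis reported in the paper, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion (illustrative only)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Counts taken from the text above: (subjects in the cell, group size).
counts = {
    "high-confidence 'no', high confidence on 2afc": (18, 204),
    "high-confidence 'yes', lowest confidence on 2afc": (28, 2677),
    "high-confidence 'yes', two lowest levels on 2afc": (63, 2677),
}

for label, (k, n) in counts.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.2%} "
          f"(95% Wilson interval {low:.2%} to {high:.2%})")
```

      The interval for the 204-subject group is much wider than those for the 2677-subject group, which is the sense in which the larger sample gives the more robust indicator.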

      What should we make of the 18 subjects who were highly confident non-noticers but then also highly confident on the 2afc question? Importantly, we do not think that these 18 subjects necessarily made a mistake on the first question and so should be excluded. There is no a priori reason why one’s confidence criterion in a yes/no question should carry over to a 2afc question. After all, it is perfectly rationally coherent to be very confident that one didn’t see anything but also very confident that if there was anything to be seen, it was on the left. Moreover, these 18 subjects were not all correct on the 2afc question despite their high confidence (4/18 or 22% getting the wrong answer).

      Nonetheless, and again reassuringly, we found that the above-chance patterns in our data remained the same even excluding these 18 subjects. We did observe a slight reduction in percent correct and d′ but this is absolutely what one should expect since excluding the most confident performers in any task will almost inevitably reduce performance.

      In this light, we consider it unlikely that measurement error fully explains the residual sensitivity found even amongst highly confident non-noticers. That said, we appreciate this concern. In our revised manuscript, we now raise the issue together with the analysis of high-confidence noticers that addresses it. We also thank the reviewer for pressing us to think harder about this issue, which led directly to these new analyses that we believe have strengthened the paper.

      (5) In most (if not all) IB experiments in the literature, a partial attention and/or full attention trial (or set of trials) is administered after the critical trial. These control trials are very important for validating IB on the critical trial, as they must show that, when attended, the critical stimuli are very easy to see. If a subject cannot detect the critical stimulus on the control trial, one cannot conclude that they were inattentionally blind on the critical trial, e.g., perhaps the stimulus was just too difficult to see (e.g., too weak, too brief, too far in the periphery, too crowded by distractor stimuli, etc.), or perhaps they weren't paying enough attention overall or failed to follow instructions. In the aggregate data, rates of noticing the stimuli should increase substantially from the critical trial to the control trials. If noticing rates are equivalent on the critical and control trials one cannot conclude that attention was manipulated.

      It is puzzling why the authors decided not to include any control trials with partial or full attention in their five experiments, especially given their online data collection procedures where stimulus size, intensity, eccentricity, etc. were uncontrolled and variable across subjects. Including such trials could have actually helped them achieve their goal of challenging the IB hypothesis, e.g., excluding subjects who failed to see the stimulus on the control trials might have reduced the inattentional blindness rates further. This design decision should at least be acknowledged and justified (or noted as a limitation) in a revision of this paper.

      We acknowledge that other studies in the literature include divided and full attention trials, and that they could have been included in our work as well. However, we deliberately decided not to include such control trials for an important reason. As the referee comments, the main role of such trials in previous work has been to exclude from analysis subjects who failed to report the unexpected stimulus on the divided and/or full attention control trials.

      (For example, as Most et al. 2001 write: “Because observers should have seen the object in the full-attention trial (Mack & Rock, 1998), we used this trial as a control … Accordingly, 3 observers who failed to see the cross on this trial were replaced, and their data were excluded from the analyses.”) As the reviewer points out, excluding such subjects would very likely have ‘helped’ us. However, the practice is controversial. Indeed, in a review of 128 experiments, White et al. 2018 argue that the practice has “problematic consequences” and “may lead researchers to understate the pervasiveness of inattentional blindness”. Since we wanted to offer as simple and demanding a test of residual sensitivity in IB as possible, we thus decided not to use any such exclusions, and for that reason decided not to include divided/full attention trials.

      As recommended, we discuss this decision not to include divided/full attention trials and our logic for not doing so in the manuscript. As we explain, not having those conditions makes it more impressive, not less impressive, that we observed the results we in fact did — it makes our results more interpretable, not less interpretable, and so absence of such conditions from our manuscript should not (in our view) be considered any kind of weakness.

      (6) In the discussion section, the authors devote a short paragraph to considering an alternative explanation of their non-zero d' results in their super subject analyses: perhaps the critical stimuli were processed unconsciously and left a trace such that when later forced to guess a feature of the stimuli, subjects were able to draw upon this unconscious trace to guide their 2AFC decision. In the subsequent paragraph, the authors relate these results to above-chance forced-choice guessing in blindsight subjects, but reject the analogy based on claims of parsimony.

      First, the authors dismiss the comparison of IB and blindsight too quickly. In particular, the results from experiment 3, in which some subjects adamantly (confidently) deny seeing the critical stimulus but guess a feature at above-chance levels (at least at the super subject level and assuming the online subjects interpreted and used the confidence scale correctly), seem highly analogous to blindsight. Importantly, the analogy is strengthened if the subjects who were confident in not seeing anything also reported not being confident in their forced-choice judgments, but as mentioned above this data was not reported.

      Second, the authors fail to mention an even more straightforward explanation of these results, which is that ~8% of subjects misinterpreted the "unusual" part of the standard IB question used in experiments 1-3. After all, colored lines and shapes are pretty "usual" for psychology experiments and were present in the distractor stimuli everyone attended to. It seems quite reasonable that some subjects answered this first question, "no, I didn't see anything unusual", but then when told that there was a critical stimulus and asked to judge one of its features, adjusted their response by reconsidering, "oh, ok, if that's the unusual thing you were asking about, of course I saw that extra line flash on the left of the screen". This seems like a more parsimonious alternative compared to either of the two interpretations considered by the authors: (1) IB does not exist, (2) super-subject d' is driven by unconscious processing. Why not also consider: (3) a small percentage of subjects misinterpreted the Y/N question about noticing something unusual. In experiments 4-5, they dropped the term "unusual" but do not analyze whether this made a difference nor do they report enough of the data (subject numbers for the Y/N question and 2AFC) for readers to determine if this helped reduce the ~8% overestimate of IB rates.

      Our primary ambition in the paper was to establish, as our title suggests, residual sensitivity in IB. The ambition is quite neutral as to whether the sensitivity reflects conscious or unconscious processing (i.e. is akin to blindsight as traditionally conceived). We were evidently not clear about this, however, leading to two referees coming away with an impression of our claims that is different than we intended. We have revised our manuscript throughout to address this. But we also want to emphasize here that we take our data primarily to support the more modest claim that there is residual sensitivity (conscious or unconscious) in the group of subjects who are traditionally classified as inattentionally blind. We believe that this claim has solid support in our data.

      We do in the discussion section offer one reason for believing that there is residual awareness in the group of subjects who are traditionally classified as inattentionally blind. However, we acknowledge that this is controversial and now emphasize in the manuscript that this claim “is tentative and secondary to our primary finding”. We also emphasize that part of our point is dialectical: Inattentional blindness has been used to argue (e.g.) that attention is required for awareness. We think that our data concerning residual sensitivity at least push back on the use of IB to make this claim, even if they do not provide decisive evidence (as we agree) that awareness survives inattention. (Cf. here, Hirshhorn et al. 2024 who take up a common suggestion in the field that awareness is best assessed by using both subjective and objective measures, with claims about lack of awareness ideally being supported by both; our data suggest at a minimum that in IB objective measures do not neatly line up with subjective measures.)

      We hope this addresses the referee’s concern that we dismiss “the comparison of IB and blindsight too quickly”. We do not intend to dismiss that comparison at all; indeed, we raise it because we consider it a serious hypothesis. Our aim is simply to raise one possible consideration against it. But, again, our main claim is quite consistent with sensitivity in IB being akin to “blindsight”.

      We also agree with the referee that one possible explanation of why some subjects say they do not notice something unusual in IB paradigms is not that they didn’t notice anything but that they didn’t consider the unexpected stimulus sufficiently unusual. However, the reviewer is incorrect that we did not mention this interpretation; to the contrary, it was precisely the kind of concern which led us to be dissatisfied with standard IB methods and so motivated our approach. As we wrote in our main text: “However, yes/no questions of this sort are inherently and notoriously subject to bias… For example, observers might be under-confident whether they saw anything (or whether what they saw counted as unusual); this might lead them to respond “no” out of an excess of caution.” On our view, this is exactly the kind of reason (among other reasons) that one cannot rely on yes/no reports of noticing unusual stimuli, even though the field has relied on just these sorts of questions in just this way.

      We do not, however, think that this explanation accounts for why all subjects fail to report noticing, nor do we think that it accounts for our finding of above-chance sensitivity amongst non-noticers. This is for two critical reasons. First, whereas the word “unusual” did appear in the yes/no question in our Experiments 1-3, it did not appear in our Experiments 4 and 5 on dynamic IB. (In both cases, we used the exact wording of such questions in the experiments we were basing our work on.) And, of course, we still found significant residual sensitivity amongst non-noticers in Experiments 4 and 5. Second, in relation to our confidence experiment, we think it unlikely that subjects who were highly confident that they did not notice anything unusual only said that because they thought what they had seen was insufficiently unusual. Yet even in this group of subjects who were maximally confident that they did not notice anything unusual, we still found residual sensitivity.

      (7) The authors use sub-optimal questioning procedures to challenge the existence of the phenomenon this questioning is intended to demonstrate. A more neutral interpretation of this study is that it is a critique on methods in IB research, not a critique on IB as a manipulation or phenomenon. The authors neglect to mention the dozens of modern IB experiments that have improved upon the simple Y/N IB questioning methods. For example, in Michael Cohen's IB experiments (e.g., Cohen et al., 2011; Cohen et al., 2020; Cohen et al., 2021), he uses a carefully crafted set of probing questions to conservatively ensure that subjects who happened to notice the critical stimuli have every possible opportunity to report seeing them. In other experiments (e.g., Hirschhorn et al., 2024; Pitts et al., 2012), researchers not only ask the Y/N question but then follow this up by presenting examples of the critical stimuli so subjects can see exactly what they are being asked about (recognition-style instead of free recall, which is more sensitive). These follow-up questions include foil stimuli that were never presented (similar to the stimulus-absent trials here), and ask for confidence ratings of all stimuli. Conservative, pre-defined exclusion criteria are employed to improve the accuracy of their IB-rate estimates. In these and other studies, researchers are very cautious about trusting what subjects report seeing, and in all cases, still find substantial IB rates, even to highly salient stimuli. The authors should consider at least mentioning these improved methods, and perhaps consider using some of them in their future experiments.

      The concern that we do not sufficiently discuss the range of “improved” methods in IB studies is well-taken. A similar concern is raised by Reviewer #2 (Dr. Cohen). To address the concern, we have added to our manuscript a substantial new discussion of such improved methods. However, although we do agree that these methods can be helpful and may well address some of the methodological concerns which our paper raises, we do not think that they are a panacea. Thus, our discussion of these methods also includes a substantial discussion of the problems and pitfalls with such methods which led us to favor our own simple forced-response and 2afc questions, combined with SDT analysis. We think this approach is superior both to the classic approach in IB studies and to the approach raised by the reviewers.

      In particular, we have four main concerns about the follow up questions now commonly used in the field:

      First, many follow up questions are used not to exclude people from the IB group but to include people in the IB group. Thus, Most et al. 2001 asked follow up questions but used these to increase their IB group, only excluding subjects from the IB group if they both reported seeing and answered their follow ups correctly: “Observers were regarded as having seen the unexpected object if they answered 'yes' when asked if they had seen anything on the critical trial that had not been present before and if they were able to describe its color, motion, or shape.” This means that subjects who saw the object but failed to see its color, say, would be treated as inattentionally blind. This has the effect of inflating IB rates, in exactly the way our paper is intended to critique. So, in our view this isn’t an improvement but rather part of the approach we take issue with.

      Second, many follow up questions remain yes/no questions or nearby variants, all of which are subject to response bias. For example, in Cohen’s studies which the reviewer mentions, it is certainly true that “he uses a carefully crafted set of probing questions to conservatively ensure that subjects who happened to notice the critical stimuli have every possible opportunity to report seeing them.” We agree that this improves over a simple yes/no question in some ways. However, such follow up probes nonetheless remain yes/no questions, subject to response bias, e.g.:

      (1) “Did you notice anything strange or different about that last trial?”

      (2) “If I were to tell you that we did something odd on the last trial, would you have a guess as to what we did?”

      (3) “If I were to tell you we did something different in the second half of the last trial, would you have a guess as to what we did?”

      (4) “Did you notice anything different about the colors in the last scene?”

      Indeed, follow up questions of this kind can be especially susceptible to bias, since subjects may be reluctant to “take back” their earlier answers and so be conservative in responding positively to avoid inconsistency or acknowledgement of earlier error. This may explain why such follow up questions produce remarkable consistency despite their rather different wording. Thus, Simons and Chabris (1999) report: “Although we asked a series of questions escalating in specificity to determine whether observers had noticed the unexpected event, only one observer who failed to report the event in response to the first question (“did you notice anything unusual?”) reported the event in response to any of the next three questions (which culminated in “did you see a ... walk across the screen?”). Thus, since the responses were nearly always consistent across all four questions, we will present the results in terms of overall rates of noticing.” Thus, while there are undoubtedly merits to these follow ups, they do not resolve problems of bias.
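
      The point about response bias in yes/no questions can be illustrated with a minimal sketch. It assumes the textbook equal-variance Gaussian signal detection model; the function name and the parameter values are hypothetical and are not taken from any of the studies discussed. It shows how the rate of "yes" responses changes with the response criterion while sensitivity (d′) is held fixed.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def yes_no_rates(d_prime, criterion):
    """Return (hit_rate, false_alarm_rate) for an equal-variance Gaussian observer.

    `criterion` is c, measured from the midpoint between the noise and
    signal distributions; positive values describe a conservative observer.
    """
    hit_rate = _STD_NORMAL.cdf(d_prime / 2 - criterion)
    false_alarm_rate = _STD_NORMAL.cdf(-d_prime / 2 - criterion)
    return hit_rate, false_alarm_rate


# Same (hypothetical) sensitivity, three increasingly conservative criteria.
for c in (0.0, 0.5, 1.0):
    hit, fa = yes_no_rates(d_prime=1.0, criterion=c)
    print(f"c={c:.1f}: P(yes | stimulus)={hit:.2f}, P(yes | no stimulus)={fa:.2f}")
```

      Under these assumptions, a more conservative criterion lowers the proportion of "yes" answers even though the underlying sensitivity is unchanged, which is why a "no" answer on its own does not establish an absence of sensitivity.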

      This same basic issue affects the follow up question used in Pitts et al. 2012 which the reviewer mentions. Pitts et al. write: “If a participant reported not seeing any patterns and rated their confidence in seeing the square pattern (once shown the sample) as a 3 or less (1 = least confident, 5 = most confident), she or he was placed in Group 1 and was considered to be inattentionally blind to the square patterns.” The confidence rating follow-up question here remains subject to bias. Moreover, and strikingly, the inclusion criterion used means that subjects who were moderately confident that they saw the square pattern when shown (i.e. answered 3) were counted as inattentionally blind (!). We do not think this is an appropriate inclusion criterion.

      The third problem is that follow up questions are often free/open-response. For instance, Most et al. (2005) ask the follow up question: "If you did see something on the last trial that had not been present during the first two trials, what color was it? If you did not see something, please guess." This is a much more difficult and to that extent less sensitive question than our binary forced-response/2afc questions. For this reason, we believe our follow up questions are more suitable for ascertaining low levels of sensitivity.

      The fourth and final issue is that whereas 2afc questions are criterion free (in that they naturally have an unbiased decision rule), this is in fact not true of n-afc questions in general, nor is it true in general of delayed n-alternative match to sample designs. Thus, even when limited response options are given, they are not immune to response biases and so require SDT analysis. Moreover, some such tasks can involve decision spaces which are often poorly understood or difficult to analyze without making substantial assumptions about observer strategy.
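
      For the 2afc case, the relation between proportion correct and sensitivity can be sketched as follows. This is a minimal illustration assuming an unbiased observer and the standard equal-variance Gaussian model, under which d′ = √2 · z(PC); the function names are hypothetical and the sketch is not the analysis code used in the paper.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def d_prime_from_2afc(proportion_correct):
    """d' for an unbiased 2afc observer: d' = sqrt(2) * z(PC)."""
    if not 0.0 < proportion_correct < 1.0:
        raise ValueError("proportion_correct must lie strictly between 0 and 1")
    return sqrt(2) * _STD_NORMAL.inv_cdf(proportion_correct)


def proportion_correct_from_d_prime(d_prime):
    """Inverse mapping: PC = Phi(d' / sqrt(2)) for an unbiased 2afc observer."""
    return _STD_NORMAL.cdf(d_prime / sqrt(2))


# Chance performance (PC = 0.5) corresponds to zero sensitivity.
print(d_prime_from_2afc(0.5))
# A hypothetical proportion correct above chance maps to a positive d'.
print(d_prime_from_2afc(0.60))
```

      No comparably simple mapping is available for n-alternative designs once observers may favor some alternatives over others, which is the concern raised above.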

      This last point (as well as the first) is relevant to Hirschhorn et al. 2024. Hirschhorn et al. write that they “used two awareness measures. Firstly, participants were asked to rate stimulus visibility on the Perceptual Awareness Scale (PAS, a subjective measure of awareness: Ramsøy & Overgaard, 2004), and then they were asked to select the stimulus image from an array of four images (an objective measure: Jakel & Wichmann, 2006).”

      While certainly an improvement on simple yes/no questioning, the PAS remains subject to response bias. On the other hand, we applaud Hirschhorn et al.’s use of objective measures in the context of IB, which of course our design implements. However, while Hirschhorn et al. 2024 suggest that their task is a spatial 4afc following the recommendation of this design by Jakel & Wichmann (2006), it is strictly a 4-alternative delayed match to sample task, so it is doubtful whether it can be considered a preferred psychophysical task for the reasons Jakel & Wichmann offer. Regardless, the more crucial point is that observers in such a task might be biased towards one alternative as opposed to another. Thus, use of d′ (as opposed to percent correct as in Hirschhorn et al. 2024) is crucial in assessing performance in such tasks.
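
      Why percent correct can mislead when observers favor one alternative can be sketched in a simplified setting. The example below reduces the problem to two alternatives (A and B) presented equally often, assumes the equal-variance Gaussian model, and uses hypothetical response rates; it is an illustration only and does not model the four-alternative task discussed above.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def two_alternative_summary(p_say_a_given_a, p_say_a_given_b):
    """Summarize a two-alternative task in which the observer may favor "A".

    Returns (percent_correct, d_prime, bias_free_percent_correct), assuming
    A and B are presented equally often.
    """
    # Observed accuracy: correct "A" responses plus correct "B" responses.
    percent_correct = (p_say_a_given_a + (1 - p_say_a_given_b)) / 2
    # Two-alternative d' that separates sensitivity from the preference for "A".
    d_prime = (_STD_NORMAL.inv_cdf(p_say_a_given_a)
               - _STD_NORMAL.inv_cdf(p_say_a_given_b)) / sqrt(2)
    # Accuracy an unbiased observer with the same d' would achieve.
    bias_free_percent_correct = _STD_NORMAL.cdf(d_prime / sqrt(2))
    return percent_correct, d_prime, bias_free_percent_correct


# Hypothetical observer who says "A" on 90% of A trials and 60% of B trials.
pc, d, pc_unbiased = two_alternative_summary(0.9, 0.6)
print(f"observed PC={pc:.3f}, d'={d:.3f}, bias-free PC={pc_unbiased:.3f}")
```

      In this sketch the observed percent correct falls below what the same sensitivity would yield without the preference for "A", so percent correct alone understates sensitivity when such a bias is present.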

      For all these reasons, then, while we agree that the field has taken significant steps to move beyond the simple yes/no question traditionally used in IB studies (and we have revised our manuscript to make this clear), we do not think it has resolved the methodological issues which our paper seeks to highlight and address, and we believe that our approach contributes something additional that is not yet present in the literature. We have now revised our manuscript to make these points much more clearly, and we thank the reviewer for prompting these improvements.

      Reviewer #2 (Public review):

      In this study, Nartker et al. examine how much observers are conscious of using variations of classic inattentional blindness studies. The key idea is that rather than simply asking observers if they noticed a critical object with one yes/no question, the authors also ask follow-up questions to determine if observers are aware of more than the yes/no questions suggest. Specifically, by having observers make forced choice guesses about the critical object, the authors find that many observers who initially said "no" they did not see the object can still "guess" above chance about the critical object's location, color, etc. Thus, the authors claim, that prior claims of inattentional blindness are mistaken and that using such simple methods has led numerous researchers to overestimate how little observers see in the world. To quote the authors themselves, these results imply that "inattentionally blind subjects consciously perceive these stimuli after all... they show sensitivity to IB stimuli because they can see them."

      Before getting to a few issues I have with the paper, I do want to make sure to explicitly compliment the researchers for many aspects of their work. Getting massive amounts of data, using signal detection measures, and the novel use of a "super subject" are all important contributions to the literature that I hope are employed more in the future.

      We really appreciate this comment and that the reviewer found our work to make these important contributions to the literature. We wrote this paper expecting not everyone to accept our conclusions, but hoping that readers would see the work as making a valuable contribution to the literature promoting an underexplored alternative in a compelling way. Given that this reviewer goes on to express some skepticism about our claims, it is especially encouraging to see this positive feedback up top!

      Main point 1: My primary issue with this work is that I believe the authors are misrepresenting the way people often perform inattentional blindness studies. In effect, the authors are saying, "People do the studies 'incorrectly' and report that people see very little. We perform the studies 'correctly' and report that people see much more than previously thought." But the way previous studies are conducted is not accurately described in this paper. The authors describe previous studies as follows on page 3:

      "Crucially, however, this interpretation of IB and the many implications that follow from it rest on a measure that psychophysics has long recognized to be problematic: simply asking participants whether they noticed anything unusual. In IB studies, awareness of the unexpected stimulus (the novel shape, the parading gorilla, etc.) is retroactively probed with a yes/no question, standardly, "Did you notice anything unusual on the last trial which wasn't there on previous trials?". Any subject who answers "no" is assumed not to have any awareness of the unexpected stimulus."

      If this quote were true, the authors would have a point. Unfortunately, I do not believe it is true. This is simply not how many inattentional blindness studies are run. Some of the most famous studies in the inattentional blindness literature do not simply ask observers a yes/no question (e.g., the invisible gorilla (Simons et al. 1999), the classic door study where the person changes (Simons and Levin, 1998), the study where observers do not notice a fight happening a few feet from them (Chabris et al., 2011)). Instead, these papers consistently ask a series of follow-up questions and even tell the observers what just occurred to confirm that observers did not notice that critical event (e.g., "If I were to tell you we just did XYZ, did you notice that?"). In fact, after a brief search on Google Scholar, I was able to relatively quickly find over a dozen papers that do not just use a yes/no procedure, and instead ask a series of multiple questions to determine if someone is inattentionally blind. In no particular order, some papers (full disclosure: including my own):

      (1) Most et al. (2005) Psych Review

      (2) Drew et al. (2013) Psych Science

      (3) Drew et al. (2016) Journal of Vision

      (4) Simons et al. (1999) Perception

      (5) Simons and Levin (1998) Perception

      (6) Chabris et al. (2011) iPerception

      (7) Ward & Scholl (2015) Psych Bulletin and Review

      (8) Most et al. (2001) Psych Science

      (9) Todd & Marois (2005) Psych Science

      (10) Fougnie & Marois (2007) Psych Bulletin and Review

      (11) New and German (2015) Evolution and Human Behaviour

      (12) Jackson-Nielsen (2017) Consciousness and cognition

      (13) Mack et al. (2016) Consciousness and cognition

      (14) Devue et al. (2009) Perception

      (15) Memmert (2014) Cognitive Development

      (16) Moore & Egeth (1997) JEP:HPP

      (17) Cohen et al. (2020) Proc Natl Acad Sci

      (18) Cohen et al. (2011) Psych Science

      This is a critical point. The authors' key idea is that when you ask more than just a simple yes/no question, you find that other studies have overestimated the effects of inattentional blindness. But none of the studies listed above only asked simple yes/no questions. Thus, I believe the authors are mis-representing the field. Moreover, many of the studies that do much more than ask a simple yes/no question are cited by the authors themselves! Furthermore, as far as I can tell, the authors believe that if researchers do these extra steps and ask more follow-ups, then the results are valid. But since so many of these prior studies do those extra steps, I am not exactly sure what is being criticized.

      To make sure this point is clear, I'd like to use a paper of mine as an example. In this study (Cohen et al., 2020, Proc Natl Acad Sci USA) we used gaze-contingent virtual reality to examine how much color people see in the world. On the critical trial, the part of the scene they fixated on was in color, but the periphery was entirely in black and white. As soon as the trial ended, we asked participants a series of questions to determine what they noticed. The list of questions included:

      (1) "Did you notice anything strange or different about that last trial?"

      (2) "If I were to tell you that we did something odd on the last trial, would you have a guess as to what we did?"

      (3) "If I were to tell you we did something different in the second half of the last trial, would you have a guess as to what we did?"

      (4) "Did you notice anything different about the colors in the last scene?"

      (5) We then showed observers the previous trial again and drew their attention to the effect and confirmed that they did not notice that previously.

      In a situation like this, when the observers are asked so many questions, do the authors believe that "the inattentionally blind can see after all?" I believe they would not say that and the reason they would not say that is because of the follow-up questions after the initial yes/no question. But since so many previous studies use similar follow-up questions, I do not think you can state that the field is broadly overestimating inattentional blindness. This is why it seems to me to be a bit of a strawman: most people do not just use the yes/no method.

      We appreciate this reviewer raising this issue. As he (Dr. Cohen) states, his “primary issue” concerns our discussion of the broader literature (which he worries understates recent improvements made to the IB methodology), rather than, e.g., the experiments we’ve run. We take this concern very seriously and address it comprehensively here.

      A very similar issue is identified by Reviewer #1, comment (7). To review some of what we say in reply to them: To address the concern we have added to our manuscript a substantial new discussion of such improved methods. However, although we do agree that these methods can be helpful and may well address some of the methodological concerns which our paper raises, we do not think that they are a panacea. Thus, our discussion of these methods also includes a substantial discussion of the problems and pitfalls with such methods which led us to favor our own simple forced-response and 2afc questions, combined with SDT analysis. We think this approach is superior both to the classic approach in IB studies and to the approach raised by the reviewers.

      In particular, we have three main concerns about the follow up questions now commonly used in the field:

      First, many follow up questions are used not to exclude subjects from the IB group but to include subjects in the IB group. Thus, Most et al. (2001) asked follow up questions but used these to increase their IB group, only excluding subjects from the IB group if they both reported seeing and answered their follow ups correctly: “Observers were regarded as having seen the unexpected object if they answered 'yes' when asked if they had seen anything on the critical trial that had not been present before and if they were able to describe its color, motion, or shape.” This means that subjects who saw the object but failed to describe it in these respects would be treated as inattentionally blind. This is problematic since failure to describe a feature (e.g., color, shape) does not imply a complete lack of information concerning that feature; and even if a subject did lack all information concerning these features of an object, this would not imply a complete failure to see the object. Similarly, Pitts et al. (2012) asked subjects to rate their confidence in their initial yes/no response from 1 = least confident to 5 = most confident, and used these ratings to include in the IB group those who rated their confidence in seeing at 3 or less. This is evidently problematic, since there is a large gap between being under confident that one saw something and being completely blind to it. More generally, using follow ups to inflate IB rates in such ways raises precisely the kinds of issues our paper is intended to critique. So in our view this isn’t an improvement but rather part of the approach we take issue with.

      Second, many follow up questions remain yes/no questions or nearby variants, all of which are subject to response bias. For example, in the reviewer’s own studies (Cohen et al. 2020, 2011; see also: Simons et al., 1999; Most et al., 2001, 2005; Drew et al., 2013; Memmert, 2014) a series of follow up questions are used to try and ensure that subjects who noticed the critical stimuli are given the maximum opportunity to report doing so, e.g.:

      (1) “Did you notice anything strange or different about that last trial?”

      (2) “If I were to tell you that we did something odd on the last trial, would you have a guess as to what we did?”

      (3) “If I were to tell you we did something different in the second half of the last trial, would you have a guess as to what we did?”

      (4) “Did you notice anything different about the colors in the last scene?”

      We certainly agree that such follow up questions improve over a simple yes/no question in some ways. However, such follow up probes nonetheless remain yes/no questions, intrinsically subject to response bias. Indeed, follow up questions of this kind can be especially susceptible to bias, since subjects may be reluctant to “take back” their earlier answers and so be conservative in responding positively to avoid inconsistency or acknowledgement of earlier error. This may explain why such follow up questions produce remarkable consistency despite their rather different wording. Thus, Simons and Chabris (1999) report: “Although we asked a series of questions escalating in specificity to determine whether observers had noticed the unexpected event, only one observer who failed to report the event in response to the first question (“did you notice anything unusual?”) reported the event in response to any of the next three questions (which culminated in “did you see a ... walk across the screen?”). Thus, since the responses were nearly always consistent across all four questions, we will present the results in terms of overall rates of noticing.” Thus, while there are undoubtedly merits to these follow ups, they do not resolve problems of bias.

      It is also important to recognize that whereas 2afc questions are criterion free (in that they naturally have an unbiased decision rule), this is not true of n-afc nor delayed n-alternative match to sample designs in general. Performance in such tasks thus requires SDT analysis – which itself may be problematic if the decision space is not properly understood or requires making substantial assumptions about observer strategy.

      Third, and finally, many follow up questions are insufficiently sensitive (especially with small sample sizes). For instance, Todd, Fougnie & Marois (2005) used a 12-alternative match-to-sample task (see similarly: Fougnie & Marois, 2007; Devue et al., 2009). And Most et al. (2005) asked an open-response follow-up: “If you did see something on the last trial that had not been present during the first two trials, what color was it? If you did not see something, please guess.” These questions are more difficult and to that extent less sensitive than binary forced-response/2afc questions of the sort we use in our own studies – a difference which may be critical in uncovering degraded perceptual sensitivity.

      For all these reasons, then, while we agree that the field has taken significant steps to move beyond the simple yes/no question traditionally used in IB studies (and we have revised our manuscript to make this clear), we do not think it has resolved the methodological issues which our paper seeks to highlight and address, and we believe that our approach of using 2afc or forced-response questions combined with signal detection analysis is an important improvement on prior methods and contributes something additional that is not yet present in the literature. We have now revised our manuscript to make these points much clearer.

      Other studies that improve on the standard methodology

      This reviewer adds something else, however: A very helpful list of 18 papers which include follow ups and that he believes overcome many of the issues we raise in our paper. To just state our reaction bluntly: We are familiar with every one of these papers (indeed, one of them is a paper by one of us!), and while we think these are all very valuable contributions to the literature, it is our view that none of these 18 papers resolves the worries that led us to conduct our work.  

      Here we briefly comment on the relevant pitfalls in each case. We hope this serves to underscore the importance of our methodological approach.

      (1) Most et al. (2005) Psych Review

      Either a 2-item or 5-item questionnaire was used. The 2-item questionnaire ran as follows:

      (1) On the last trial, did you see anything other than the 4 circles and the 4 squares (anything that had not been present on the original two trials)? Yes No 

      (2) If you did see something on the last trial that had not been present during the original two trials, please describe it in as much detail as possible.

      This clearly does not substantially improve on the traditional simple yes/no question. Moreover, the second question (as well as being open-ended) was used to include additional subjects in the IB group, in that participants were counted as having seen the object only if they responded “yes” to Q1 and in addition “were able to report at least one accurate detail” in response to Q2. In other words, either a subject says “no” (and is treated as unaware), or says “yes” and then is asked to prove their awareness, as it were. If anything, this intensifies the concerns we raise, by inflating IB rates. 

      The 5-item questionnaire looked like this: 

      (1) On the last trial, did you see anything other than the black and white L’s and T’s (anything that had not been present on the first two trials)?

      (2) If you did see something on the last trial that had not been present during the first two trials, please describe it.

      (3) If you did see something on the last trial that had not been present during the first two trials, what color was it? If you did not see something, please guess. (Please indicate whether you did see something or are guessing)

      (4) If you did see something during the last trial that had not been present in the first two trials, please draw an arrow on the “screen” below showing the direction in which it was moving. If you did not see something, please guess. (Please indicate whether you did see something or are guessing)

      (5) If you did see something during the last trial that had not been present during the first two trials, please circle the shape of the object below [4 shapes are presented to choose from]. If you did not see anything, please guess. (Please indicate whether you did see something or are guessing)

      Q5 was not used for analysis purposes. (It suffers from the second issue raised above.) Q1 is the traditional y/n question. Qs 2&3 are open ended. It is unclear how responses to Q4 were analyzed (at the limit it could be considered a helpful, forced-choice question – though it again would suffer from the second issue raised above). However, as noted with respect to the 2-item questionnaire, these responses were not used to exclude people from the IB group but to include people in it. So again, this approach does not in any way address the issues we are concerned about, and if anything, only makes them worse. 

      (2) Drew et al. (2013) Psych Science

      All follow ups were yes/no: “we asked a series of questions to determine whether they noticed the gorilla: ‘Did the final trial seem any different than any of the other trials?’, ‘Did you notice anything unusual on the final trial?’, and, finally, ‘Did you see a gorilla on the final trial?’”. So, this paper essentially implements the standard methodology we mention (and criticize). 

      (3) Drew et al. (2016) Journal of Vision

      Follow up questions were used, but the reported procedure does not provide sufficient details to evaluate them (we are only told: “After the final trial, they were asked: ‘On that last trial of the task, did you notice anything that was not there on previous trials?’ They then answered questions about the features of the unexpected stimulus on a separate screen (color, shape, movement, and direction of movement).”). It is not clear that these follow ups were used to exclude any subjects from the analysis. Finally, given that the unexpected object could be the same color as the targets/distractors, it is clear that biases would have been introduced which would need to be considered (but which were not).

      (4) Simons & Chabris (1999) Perception

      All follow ups were yes/no: “observers were … asked to provide answers to a surprise series of additional questions. (i) While you were doing the counting, did you notice anything unusual on the video? (ii) Did you notice anything other than the six players? (iii) Did you see anyone else (besides the six players) appear on the video? (iv) Did you see a gorilla [woman carrying an umbrella] walk across the screen? After any “yes” response, observers were asked to provide details of what they noticed. If at any point an observer mentioned the unexpected event, the remaining questions were skipped.” As noted previously, the analyses in fact did not use these questions to exclude subjects since answers were so consistent.

      (5) Simons and Levin (1998) Perception

      This is a change detection paradigm, not a study of inattentional blindness. And in any case, one yes/no follow up was used: “Did you notice that I'm not the same person who approached you to ask for directions?”

      (6) Chabris et al. (2011) iPerception

      Two yes/no questions were asked: “we asked whether the subjects had seen anything unusual along the route, and then whether they had seen anyone fighting.” It seems that follow up questions (a request to describe the fight) were asked only of those who said yes.

      This is in fact a common procedure – follow up questions only being asked of the “yes” group. As discussed, it is sometimes used to increase rates of IB, compounding the problem we identify in our paper. So this is another example of a follow-up question that makes the problem we identify worse, not better.

      (7) Ward & Scholl (2015) Psych Bulletin and Review

      Two yes/no questions were used: “...observers were asked whether they noticed ‘anything … that was different from the first three trials’ — and if so, to describe what was different. They were then shown the gray cross and asked if they had noticed it—and if so, to describe where it was and how it moved. Only observers who explicitly reported not noticing the cross were counted as ‘nonnoticers’ to be included in the final sample (N = 100).” In each case, combining the traditional noticing question with a request to describe and identify may have induced conservative response biases in the noticing question, since a subject might consider being able to describe or identify the unexpected stimulus a precondition of giving a positive answer to the noticing question.

      (8) Most et al. (2001) Psych Science

      The same 5-item questionnaire discussed above in relation to Most et al. (2005) was used: 

      (1) On the last trial, did you see anything other than the black and white L’s and T’s (anything that had not been present on the first two trials)?

      (2)   If you did see something on the last trial that had not been present during the first two trials, please describe it.

      (3) If you did see something on the last trial that had not been present during the first two trials, what color was it? If you did not see something, please guess. (Please indicate whether you did see something or are guessing)

      (4) If you did see something during the last trial that had not been present in the first two trials, please draw an arrow on the “screen” below showing the direction in which it was moving. If you did not see something, please guess. (Please indicate whether you did see something or are guessing)

      (5) If you did see something during the last trial that had not been present during the first two trials, please circle the shape of the object below [4 shapes are presented to choose from]. If you did not see anything, please guess. (Please indicate whether you did see something or are guessing)

      Q5 was not used for analysis purposes. (It suffers from the second issue raised above.) Q1 is the traditional yes/no question. Qs 2&3 are open ended. It is unclear how responses to Q4 were analyzed (at the limit it could be considered a helpful, forced-choice question – though it again would suffer from the second issue raised above). However, as noted with respect to the two item questionnaire in Most et al. 2005, these responses were not used to exclude people from the IB group but to include people in it. So again this approach does not in any way address the issues we are concerned about, and if anything only makes them worse.

      (9) Todd, Fougnie & Marois (2005) Psych Science

      “participants were probed with three questions to determine whether they had detected the critical stimulus ... . The first question assessed whether subjects had seen anything unusual during the trial; they responded ‘yes’ or ‘no’ by pressing the appropriate key on the keyboard. The second question asked participants to select which stimulus they might have seen among 12 possible objects and symbols selected from MacIntosh font databases. The third question asked participants to select the quadrant in which the critical stimulus may have appeared by pressing one of four keys, each of which corresponded to one of the quadrants.”

      These follow ups were used to include people in the IB group: “In keeping with previous studies (Most et al., 2001), participants were considered to have detected the critical stimulus successfully if they (a) reported seeing an unexpected stimulus and (b) correctly selected its quadrant location.” In line with our third point about sensitivity, the object identity test transpired to be “too difficult even under full-attention conditions … Thus, performance with this question was not analyzed further.”

      (10) Fougnie & Marois (2007) Psych Bulletin and Review

      Exactly the same methods and problems as in Todd, Fougnie & Marois (2005) Psych Science, just discussed.

      (11) New and German (2015) Evolution and Human Behaviour

      “After the fourth trial containing the additional experimental stimulus, the participant was asked, “Did you see anything in addition to the cross on that trial?” and which quadrant the additional stimulus appeared in. They were then asked to identify the stimulus in an array which in Experiment 1 included two variants chosen randomly from the spider stimuli and the two needle stimuli. Participants in Experiment 2 picked from all eight stimuli used in that experiment.”

      Our second concern about response biases and the need for appropriate SDT analysis of the 4/8 alternative tasks applies to all these questions. We also note that analyses were only performed on groups separately (those who detected/failed to detect, those who located/failed to locate, and those who identified/failed to identify) and on the group which did all three/failed to do any one of the three. Especially in light of the fact that some subjects could clearly detect the stimulus without being able to identify it (e.g.), the most stringent test given our concerns (which were not obviously New and German’s comparative concerns) would be to consider the group which could not detect, identify or localize.

      (12) Jackson-Nielsen (2017) Consciousness and cognition

      This is a very interesting example of a follow-up which used a 3-AFC recognition test:

      “participants were immediately asked, ‘which display looks most like what you just saw?’ from 3 alternatives”. However, though such an objective test is definitely to be preferred in our view to an open-ended series of probes, the 3-AFC test administered clearly had issues with response biases, as discussed, and actually yielded significantly below chance performance in one of the experiments.

      (13) Mack et al. (2016) Consciousness and cognition

      The follow ups here were essentially yes/no combined with an assessment of surprise. Participants were asked to enter letters into a box, and if they did so “were immediately asked by the experimenter whether they had noticed anything different about the array on this last trial and if they did not, they were told that there had been no letters and their responses to that news were recorded. Clearly, if they expressed surprise, this would be compelling evidence that they were unaware of the absence of the letters. Those observers who did not enter letters and realized there were no letters present were considered aware of the absence.” So, this again has all of the same problems we identify, considering subjects unaware because they expressed surprise.

      (14) Devue et al. (2009) Perception

      An 8-alternative task was used. The authors were primarily interested in a comparative analysis and so did not use this task to exclude subjects. We note that an 8-alternative task is very demanding – compare the 12-alternative task used in Todd, Fougnie & Marois (2005). There was an attempt to investigate biases in a separate bias trial; however, SDT measures were not used.
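As an illustration of why many-alternative tasks are demanding (this sketch is not taken from any of the papers discussed), predicted accuracy in an m-alternative forced-choice task can be computed under the standard equal-variance Gaussian signal detection model. The sketch assumes an unbiased observer who simply picks the alternative with the largest internal response; the function name `mafc_proportion_correct` and the sensitivity value used in the usage lines are hypothetical choices made for illustration only.

```python
from statistics import NormalDist


def mafc_proportion_correct(d_prime: float, m: int, steps: int = 4000) -> float:
    """Predicted proportion correct in an m-alternative forced-choice task.

    Assumes equal-variance Gaussian signal detection theory and an unbiased
    "pick the largest response" rule (both are modelling assumptions).
    Evaluates PC = integral of pdf_signal(x) * cdf_noise(x)**(m - 1) dx
    with a simple midpoint rule.
    """
    noise = NormalDist(0.0, 1.0)       # internal response to each foil
    signal = NormalDist(d_prime, 1.0)  # internal response to the target
    lo, hi = -8.0, 8.0 + d_prime       # range wide enough to hold nearly all mass
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        # Target response equals x, and all m - 1 foil responses fall below x.
        total += signal.pdf(x) * noise.cdf(x) ** (m - 1) * dx
    return total


if __name__ == "__main__":
    # Hypothetical sensitivity value, chosen only to show the trend across m.
    for m in (2, 8, 12):
        pc = mafc_proportion_correct(0.5, m)
        print(f"m={m:2d}  chance={1 / m:.3f}  predicted accuracy={pc:.3f}")
```

With sensitivity held fixed, predicted accuracy falls as the number of alternatives grows, and at zero sensitivity it equals chance (1/m). This covers only the difficulty of many-alternative tasks; the separate concern about response bias in such tasks is not modelled here.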

      (15) Memmert (2014) Cognitive Development

      “After watching the video and stating the number of passes, participants answered four questions (following Simons & Chabris, 1999): (1) While you were counting, did you perceive anything unusual on the video? (2) Did you perceive anything other than the six players? (3) Did you see anyone else (besides the six players) appear on the video? (4) Did you notice a gorilla walk across the screen? After any “yes” reply, children were asked to provide details of what they noticed. If at any point a child mentioned the unexpected event, the remaining questions were omitted.” All of these follow-up questions are yes/no judgments, used to determine awareness in exactly the way we critique as problematic.

      (16) Moore & Egeth (1997) JEP:HPP

      This study (which includes one of us, Egeth, as author) did use forced choice questions. In one case, the question was 2-alternative, in the other it was 4-alternative. In the latter case, SDT would have been appropriate but was not used. In the former case, it may have been that a larger sample would have revealed evidence of sensitivity to the background pattern (as it stood 55% answered the 2-alternative question correctly). Although these results have been replicated, unfortunately the replication in Wood and Simons 2019 used a 6-alternative recognition task and this was not analyzed using SDT. We also note that the task is rather difficult in this study. Wood and Simons report: “Exclusion rates were much higher than anticipated, primarily due to exclusions when subjects failed to correctly report the pattern on the full-attention trial; we excluded 361 subjects, or 58% of our sample.”

      (17) Cohen et al. (2020) Proc Natl Acad Sci

      While this paper improves over a simple yes/no question in some ways, especially in that it used the follow up questions to exclude subjects from the unaware (IB) group, the follow up probes nonetheless remain yes/no questions, subject to response bias, e.g.:

      (1) “Did you notice anything strange or different about that last trial?”

      (2) “If I were to tell you that we did something odd on the last trial, would you have a guess as to what we did?”

      (3) “If I were to tell you we did something different in the second half of the last trial, would you have a guess as to what we did?”

      (4) “Did you notice anything different about the colors in the last scene?”

      Follow up questions of this kind can be especially susceptible to bias, since subjects may be reluctant to “take back” their earlier answers and so be conservative in responding positively to avoid inconsistency or acknowledgement of earlier error. This may explain why such follow up questions can produce remarkable consistency despite their rather different wording. 

      (18) Cohen et al. (2011) Psych Science

      Here are the probes used in this study:

      (1) Did you notice anything different on that trial?

      (2) Did you notice something different about the background stream of images?

      (3) Did you notice that a different type of image was presented in the background that was unique in some particular way?

      (4) Did you see an actual photograph of a natural scene in that stream?

      (5) If I were to tell you that there was a photograph in that stream, can you tell me what it was a photograph of?

      Qs 1-4 are yes/no. Q5 is yes/no with an open-ended response. After this, a 5- or 6-alternative recognition test was administered. So again, this faces the same issues, since yes/no questions are subject to bias in the way we have described, and many-alternative tests are more problematic than 2afc tests.

      In summary

      We really appreciate the care that went into compiling this list, and we agree that these papers and the improved methods they contain are relevant. But as hopefully made clear above, the approaches in each of these papers simply don’t solve the foundational issues our critique is aimed at (though they may address other issues). This is why we felt our new approach was necessary. And we continue to feel this way even after reading and incorporating these comments from Dr. Cohen.

      Nevertheless, there is clearly lots for us to do in light of these comments. And so, as noted earlier, we have now added a very substantial new section to our discussion section to more fairly and completely portray the state of the art in this literature. This is really to our benefit in the end, since we now not only better acknowledge the diverse approaches present, but also set ourselves up to make our novel contribution exceedingly clear.

      Main point 2: Let's imagine for a second that every study did just ask a yes/no question and then would stop. So, the criticism the authors are bringing up is valid (even though I believe it is not). I am not entirely sure that above chance performance on a forced choice task proves that the inattentionally blind can see after all. Could it just be a form of subliminal priming? Could there be a significant number of participants who basically would say something like, "No I did not see anything, and I feel like I am just guessing, but if you want me to say whether the thing was to the left or right, I will just 100% guess"? I know the literature on priming from things like change and inattentional blindness is a bit unclear, but this seems like maybe what is going on. In fact, maybe the authors are getting some of the best priming from inattentional blindness because of their large sample size, which previous studies do not use.

      I'm curious how the authors would relate their studies to masked priming. In masked priming studies, observers say they did not see the target (like in this study) but still are above chance when forced to guess (like in this study). Do the researchers here think that that is evidence of "masked stimuli are truly seen" even if a participant openly says they are guessing?

      We’re grateful to the reviewer for raising this question. As we say in response to Reviewer #1, our primary ambition in the paper is to establish, as our title suggests, residual sensitivity in IB. The ambition is quite neutral as to whether the sensitivity reflects conscious or unconscious processing (i.e. is akin to blindsight as traditionally conceived, or what the reviewer here suggests may be happening in masked priming). Since we were evidently insufficiently clear about this we have revised our manuscript in several places to clarify that we take our data primarily to support the more modest claim that there is residual sensitivity (conscious or unconscious) in the group of subjects who are traditionally classified as inattentionally blind. We believe that this claim has much more solid support in our data than our secondary and tentative suggestion about awareness.

      This said, we do consider masked priming studies to be susceptible to the critique that performance may reflect degraded conscious awareness which is unreported because of conservative response criteria. There is good evidence that response criteria tend to be conservative near threshold (Björkman et al. 1993; see also: Railo et al. 2020), including specifically in masked priming studies (Sand 2016, cited in Phillips 2021). So, we consider it a perfectly reasonable hypothesis that subjects who say they feel they are guessing in fact have conscious access to a degraded signal which is insufficient to reach a conservative response criterion but nonetheless sufficient to perform above chance in 2afc detection. Of course, we appreciate that this hypothesis is controversial, so it is not one we argue for in our paper (though we are happy to share our feelings about it here).
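The distinction between sensitivity and response criterion in a yes/no task can be made concrete with a short sketch. It uses the textbook equal-variance formulas d′ = z(H) − z(FA) and c = −(z(H) + z(FA))/2; the function name and the hit and false-alarm rates below are hypothetical and are not data from the manuscript or from the studies cited.

```python
from statistics import NormalDist


def dprime_and_criterion(hit_rate: float, false_alarm_rate: float) -> tuple[float, float]:
    """Return (d_prime, criterion) for a yes/no detection task.

    Equal-variance Gaussian signal detection theory is assumed. Both rates
    must lie strictly between 0 and 1, since inv_cdf is undefined at 0 and 1.
    A positive criterion indicates a conservative bias towards answering "no".
    """
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion


if __name__ == "__main__":
    # Hypothetical rates: "yes" on 30% of stimulus-present trials
    # and on 5% of stimulus-absent trials.
    d, c = dprime_and_criterion(0.30, 0.05)
    print(f"d' = {d:.2f}, c = {c:.2f}")
```

In this made-up case most stimulus-present trials receive a "no" answer even though sensitivity is well above zero, because the criterion is positive (conservative). A yes/no question cannot separate low sensitivity from conservative responding without this kind of analysis.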

      Main point 3: My last question is about how the authors interpret a variety of inattentional blindness findings. Previous work has found that observers fail to notice a gorilla in a CT scan (Drew et al., 2013), a fight occurring right in front of them (Chabris et al., 2011), a plane on a runway that pilots crash into (Haines, 1991), and so forth. In a situation like this, do the authors believe that many participants are truly aware of these items but simply failed to answer a yes/no question correctly? For example, imagine the researchers made participants choose if the gorilla was in the left or right lung and some participants who initially said they did not notice the gorilla were still able to correctly say if it was in the left or right lung. Would the authors claim "that participant actually did see the gorilla in the lung"? I ask because it is difficult to understand what it means to be aware of something as salient as a gorilla in a CT scan, but say "no" you didn't notice it when asked a yes/no question. What does it mean to be aware of such important, ecologically relevant stimuli, but not act in response to them and openly say "no" you did not notice them?

      Our view is that in such cases, observers may well have a “degraded” percept of the relevant feature (gorilla, plane, fight etc.). But crucially we do not suggest that this percept is sufficient for observers to recognize the object/event as a gorilla, plane, fight etc. Our claim is only that, in our studies at least, observers (as a group) do have enough information about the unexpected stimuli to locate them, and discriminate certain low level features better than chance. Crudely, it may be that subjects see the gorilla simply as a smudge or the plane as a shadowy patch etc. (One of us who is familiar with the gorilla CT scan stimuli notes that the gorilla is in fact rather hard to see even when you know which slide it is on, suggesting that it is not as “salient” as the reviewer suggests!)

      More precisely, in the paper we write that in our view perhaps “...unattended stimuli are encoded in a partial or degraded way. Here we see a variety of promising options for future work to investigate. One is that unattended stimuli are only encoded as part of ensemble representations or summary scene statistics (Rosenholtz, 2011; Cohen et al., 2016). Another is that only certain basic “low-level” or “preattentive” features (see Wolfe & Utochkin, 2019 for discussion) can enter awareness without attention. A final possibility consistent with the present data is that observers can in principle be aware of individual objects and higher-level features under inattention but that the precision of the corresponding representations is severely reduced. Our central aim here is to provide evidence that awareness in inattentional blindness is not abolished. Further work is needed to characterize the exact nature of that awareness.” We hope this sheds light on our perspective while still being appropriately cautious not to go too far beyond our data.

      Overall: I believe there are many aspects of this set of studies that are innovative and I hope the methods will be used more broadly in the literature. However, I believe the authors misrepresent the field and overstate what can be interpreted from their results. While I am sure there are cases where more nuanced questions might reveal inattentional blindness is somewhat overestimated, claims like "the inattentionally blind can see after all" or "Inattentionally blind subjects consciously perceive these stimuli after all" seem to be incorrect (or at least not at all proven by this data).

      Once again, we would like to thank this reviewer for his feedback, which obviously comes from a place of tremendous expertise on these issues. We appreciate his assessment that our studies are innovative and that our methodological advances will be of use more broadly. We also hear the reviewer loud and clear about the passages in question, which on reflection we agree are not as central to our case as the other claims we make (regarding residual sensitivity and conservative responding), and so we have now edited them accordingly to refocus our discussion on only those claims that are central and supported. Thank you for making our paper stronger!

      Reviewer #3 (Public review):

      Summary:

      Authors try to challenge the mainstream scientific as well as popularly held view that Inattentional Blindness (IB) signifies subjects having no conscious awareness of what they report not seeing (after being exposed to unexpected stimuli). They show that even when subjects indicate NOT having seen the unexpected stimulus, they are at above chance level for reporting features such as location, color or movement of these stimuli. Also, they show that 'not seen' responses are in part due to a conservative bias of subjects, i.e. they tend to say no more than yes, regardless of actual visibility. Their conclusion is that IB may not (always) be blindness, but possibly amnesia, uncertainty etc.

      We just wanted to say that we felt this was a very accurate summary of our claims, and that in some ways it underscores the modesty we had hoped to convey. This is especially true of the reviewer’s final sentence: “Their conclusion is that IB may not (always) be blindness, but possibly amnesia, uncertainty etc.”; as we noted in response to other reviewers, our claim is not that IB doesn’t exist, that subjects are always conscious of the stimulus, etc.; it is only that the cohort of IB subjects show sensitivity to the unattended stimulus in ways that suggest they are not as blind as traditionally conceived. Thank you for reading us as intended!

      Strengths:

      A huge pool of subjects (25,000) is used. They perform several versions of the IB experiments, both with briefly presented stimuli (as the classic Mack and Rock paradigm), as well as with prolonged stimuli moving over the screen for 5 seconds (a bit like the famous gorilla version), and all these versions show similar results, pointing in the same direction: above chance detection of unseen features, as well as conservative bias towards saying not seen.

      We’re delighted that the reviewer appreciated these strengths in our manuscript!

      Weaknesses:

      Results are all significant but effects are not very strong, typically a bit above chance. Also, it is unclear what to compare these effects to, as there are no control experiments showing what performance would have been in a dual task version where subjects have to also report features etc. for stimuli that they know will appear in some trials.

      The backdrop to the experiments reported here is the “consensus view” (Noah & Mangun, 2020) according to which inattention completely abolishes perception, such that subjects undergoing IB “have no awareness at all of the stimulus object” (Rock et al., 1992) and that “one can have one’s eyes focused on an object or event … without seeing it at all” (Carruthers, 2015). In this context, we think our findings of significant above-chance sensitivity (e.g., d′ = 0.51 for location in Experiment 1; chance, of course, would be d′ = 0 here) are striking and constitute strong evidence against the consensus view. We of course agree that the residual sensitivity is far lower than amongst subjects who noticed the stimulus. For this reason, we certainly believe that inattention has a dramatic impact on perception. To that extent, our data speak in favor of a “middle ground” view on which inattention substantially degrades but crucially does not abolish perception/explicit encoding. We see this as an importantly neglected option in a literature which has overly focused on seen/not seen binaries (see our section ‘Visual awareness as graded’).

      Regarding the absence of a control condition, we think those conditions wouldn’t have played the same role in our experiments as they typically play in other experiments. As Reviewer #1 comments, the main role of such trials in previous work has been to exclude from analysis subjects who failed to report the unexpected stimulus on the divided and/or full attention control trials. As Reviewer #1 points out, excluding such subjects would very likely have ‘helped’ us. However, the practice is controversial. Indeed, in a review of 128 experiments, White et al. 2018 argue that the practice has “problematic consequences” and “may lead researchers to understate the pervasiveness of inattentional blindness”. Since we wanted to offer as simple and demanding a test of residual sensitivity in IB as possible, we thus decided not to use any exclusions, and for that reason decided not to include divided/full attention trials.

      As recommended, we discuss this decision not to include divided/full attention trials and our logic for not doing so in the manuscript. As we explain, not having those conditions makes it more impressive, not less impressive, that we observed the results we in fact did — it makes our results more interpretable, not less interpretable, and so absence of such conditions from our manuscript should not (in our view) be considered any kind of weakness.

      There are quite some studies showing that during IB, neural processing of visual stimuli continues up to high visual levels, for example, Vandenbroucke et al 2014 doi:10.1162/jocn_a_00530 showed preserved processing of perceptual inference (i.e. seeing a Kanizsa illusion) during IB. Scholte et al 2006 doi:10.1016/j.brainres.2005.10.051 showed preserved scene segmentation signals during IB. Compared to the strength of these neural signatures, the reported effects may be considered not all that surprising, or even weak.

      We agree that such evidence of neural processing in IB is relevant to — and perhaps indeed consistent with — our picture, and we’re grateful to the reviewer for pointing out further studies along those lines. Previously, we mentioned a study from Pitts et al., 2012 in which, as we wrote, “unexpected line patterns have been found to elicit the same Nd1 ERP component in both noticers and inattentionally blind subjects (Pitts et al., 2012).” We have added references to both the studies which the reviewer mentions – as well as an additional relevant study – to our manuscript in this context. Thank you for the helpful addition.

      We do however think that our studies are importantly different to this previous work. Our question is whether processing under IB yields representations which are available for explicit report and so would constitute clear evidence of seeing, and perhaps even conscious experience. As we discuss, evidence for this kind of processing remains wanting: “A handful of prior studies have explored the possibility that inattentionally blind subjects may retain some visual sensitivity to features of IB stimuli (e.g., Schnuerch et al., 2016; see also Kreitz et al., 2020, Nobre et al., 2020). However, a recent meta-analysis of this literature (Nobre et al., 2022) argues that such work is problematic along a number of dimensions, including underpowered samples and evidence of publication bias that, when corrected for, eliminates effects revealed by earlier approaches, concluding “that more evidence, particularly from well-powered pre-registered experiments, is needed before solid conclusions can be drawn regarding implicit processing during inattentional blindness” (Nobre et al., 2022).” Our paper is aimed at addressing this question which evidence of neural processing can only speak to indirectly.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) Please report all of the data, especially the number of subjects in each experiment that answered Y/N and the numbers of subjects in each of the Y and N groups that guessed a feature correctly/incorrectly on the 2AFC tasks. And also the confidence ratings for the 2AFC task (for comparison with the confidence ratings on the Y/N questions).

      We now report all this data in our (revised) Supplementary Materials. We agree that this information will be helpful to readers.

      (2) Consider adding a control condition with partial attention (dual task) or full attention (single task) to estimate the rates of seeing the critical stimulus when it's expected.

      This is the only recommendation we have chosen not to implement. The reason, as we explain in detail above (especially in response to Reviewer #1 comment 5), is that this would not in fact be a “control condition” in our studies, and indeed would only inflate the biases we are concerned with in our work. As the referee comments, the main role of such trials in previous work has been to exclude from analysis subjects who failed to report the unexpected stimulus on the divided and/or full attention control trials. And the practice is controversial: Indeed, in a review of 128 experiments, White et al. 2018 argue that the practice has “problematic consequences” and “may lead researchers to understate the pervasiveness of inattentional blindness” (emphasis added). So, our choice not to have such conditions ensures an especially stringent test of our central claim. Not having those conditions (and their accompanying exclusions) makes our results more interpretable, not less interpretable, and so the absence of such conditions from our manuscript should not (in our view) be considered any kind of weakness.

      We have added a paragraph to our “Design and analytical approach” section explaining the logic behind our deliberate decision not to include divided or full attention trials in our experiments. (For even fuller discussion, see our response to Reviewer #1’s comment 5 above.)

      (3) Consider revising the interpretations to be more precise about the distinction between the super subject being above chance versus each individual subject who cannot be at chance or above chance because there was only a single trial per subject.

      We have now done this throughout the manuscript, as discussed above. We have also added a substantive additional discussion to our “Design and analytical approach” section discussing what should be said about individual subjects in light of our group level data.

      This was a very helpful point, and greatly clarifies the claims we wish to make in the paper. Thank you for this comment, which has certainly made our paper stronger.
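The group-level ("super subject") logic can be sketched as follows. With a single forced-choice trial per subject, no individual can be classified as above or at chance, but the pooled count of correct responses across subjects can be compared against chance with an exact binomial tail probability. The function name and the counts below are hypothetical illustrations and are not figures from the manuscript.

```python
from math import comb


def binomial_tail(successes: int, n: int, p: float = 0.5) -> float:
    """Exact P(X >= successes) for X ~ Binomial(n, p).

    Here n is the number of subjects (one trial each), successes is the
    number who answered correctly, and p is chance accuracy
    (0.5 for a two-alternative forced choice).
    """
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(successes, n + 1)
    )


if __name__ == "__main__":
    # Hypothetical pooled data: 200 non-noticers, one 2AFC trial each,
    # 120 of whom chose the correct alternative.
    p_value = binomial_tail(120, 200)
    print(f"one-sided p = {p_value:.4f}")
```

A small tail probability supports a claim about the group as a whole. It does not show that any particular subject was sensitive to the stimulus, which is why individual-level and group-level claims need to be stated separately.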

      Reviewer #2 (Recommendations for the authors):

      I would be curious to hear the authors' response to two points:

      (1) What do they have to say about prior studies that do more than just ask yes/no questions (and ask several follow-ups)? Are those studies "valid"?

      A very substantial new discussion of this important point has been added. As you will see above, we comment on every one of the 18 papers this reviewer raised (as well as the general argument made); we contend that while many of these papers improve on past methodology in various ways, most in fact do “just ask yes/no questions”, and none of them makes the methodological advance we offer in our manuscript. However, this discussion has helped us clarify that very advance, and so working through this issue has really helped us improve our paper and make its relation to existing literature that much clearer. Thank you for raising this crucial point.

      (2) Do the authors think it is possible that in many cases, people are just guessing about a critical item's location or color and this is at least in part a form of priming?

      We have clarified our discussion in numerous places to further emphasize that our main point concerns above-chance sensitivity, not awareness. Given this, we take very seriously the hypothesis that something like priming of a kind sometimes proposed to occur in cases of blindsight or other putative cases of unconscious perception could be what is driving the responses in non-noticers.

      Reviewer #3 (Recommendations for the authors):

      (1) Control dual task version with expected stimuli would be nice

      We have added a paragraph to our “Design and analytical approach” section explaining the logic behind our deliberate decision not to include divided or full attention trials, which would not in fact be a “control” task in our experiments. For full discussion, see our response to Reviewer 3 above, as well as our summary here in the Recommendations for Authors section in responding to Reviewer 1, recommendation (2).

      (2) Please do a better job in discussing and introducing experiments about neural signatures during IB.

      A discussion of Vandenbroucke et al. 2014 and Scholte et al. 2006 has been added to our discussion of neural signatures in IB, as well as an additional reference to an important early study of semantic processing in IB (Rees et al., 1999). Thank you for these very helpful suggestions!

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Dong et al. study the directed cell migration of tracheal stem cells in Drosophila pupae. The migration of these cells which are found in two nearby groups of cells normally happens unidirectionally along the dorsal trunk towards the posterior. Here, the authors study how this directionality is regulated. They show that inter-organ communication between the tracheal stem cells and the nearby fat body plays a role. They provide compelling evidence that Upd2 production in the fat body and JAK/STAT activation in the tracheal stem cells play a role. Moreover, they show that JAK/STAT signalling might induce the expression of apicobasal and planar cell polarity genes in the tracheal stem cells which appear to be needed to ensure unidirectional migration. Finally, the authors suggest that trafficking and vesicular transport of Upd2 from the fat body towards the tracheal cells might be important.

      Strengths:

      The manuscript is well written. This novel work demonstrates a likely link between Upd2-JAK/STAT signalling in the fat body and tracheal stem cells and the control of unidirectional cell migration of tracheal stem cells. The authors show that hid+rpr or Upd2RNAi expression in the fat body or Dome RNAi, Hop RNAi, or STAT92E RNAi expression in tracheal stem cells results in aberrant migration of some of the tracheal stem cells towards the anterior. Using ChIP-seq as well as analysis of GFP-protein trap lines of planar cell polarity genes in combination with RNAi experiments, the authors show that STAT92E likely regulates the transcription of planar cell polarity genes and some apicobasal cell polarity genes in tracheal stem cells which appear to be needed for unidirectional migration. Moreover, the authors hypothesise that extracellular vesicle transport of Upd2 might be involved in this Upd2-JAK/STAT signalling in the fat body and tracheal stem cells, which, if true, would be quite interesting and novel.

      Overall, the work presented here provides some novel insights into the mechanism that ensures unidirectional migration of tracheal stem cells that prevents bidirectional migration. This might have important implications for other types of directed cell migration in invertebrates or vertebrates including cancer cell migration.

      Weaknesses:

      It remains unclear to what extent Upd2-JAK/STAT signalling regulates unidirectional migration. While there seems to be a consistent phenotype upon genetic manipulation of Upd2-JAK/STAT signalling and planar cell polarity genes, as in the aberrant anterior migration of a fraction of the cells, the phenotype seems to be rather mild, with the majority of cells migrating towards the posterior.

      We agree that the phenotype is mild, as perturbing JAK/STAT signaling in the progenitors specifically affects the coordinated migration of the cells rather than altering their direction or completely blocking migration. Our data indicate that inter-organ communication ensures coordinated behavior of the progenitor cells, although the differential responses exhibited by individual cells represent an interesting unresolved issue that awaits future in-depth investigation.

      While I am not an expert on extracellular vesicle transport, the data presented here regarding Upd2 being transported in extracellular vesicles do not appear to be very convincing.

      We performed additional PLA experiments, which support the interaction between Upd2 and the core components of extracellular vesicles (revised Figure 8). Furthermore, we performed electron microscopy to visualize the Lbm-containing vesicles in the fat body (Figure 8-figure supplement 1D).

      These data are now provided in the revised manuscript.

      Major comments:

      (1) The graphs showing the quantification of anterior (and in some cases also posterior migration) are quite confusing. E.g. Figure 1F (and 5E and all others): These graphs are difficult to read because the quantification for the different conditions is not shown separately. E.g. what is the migration distance for Fj RNAi anterior at 3h in Fig5E? Around -205micron (green plus all the other colors) or around -70micron (just green, even though the green bar goes to -205micron). If it's -205micron, then the images in C' or D' do not seem to show this strong phenotype. If it's around -70, then the way the graph shows it is misleading, because some readers will interpret the result as -205. Moreover, it's also not clear what exactly was quantified and how it was quantified. The details are also not described in the methods. It would be useful, to mark with two arrowheads in the image (e.g. 5 A' -D') where the migration distance is measured (anterior margin and point zero).

      Overall, it would be better, if the graph showed the different conditions separately. Also, n numbers should be shown in the figure legend for all graphs.

      We apologize for the inappropriate presentation and insufficient description, and thank you for kindly pointing them out. We used different colors to represent different genotypes, and the columns were superimposed. We now show the quantification for the different conditions separately in the revised figures. The anterior migration distance for Fj RNAi is around 70 µm.

      We now provide a detailed description in the revised methods. For migration distance measurement, we took snapshots at 0 hr, 1 hr, 2 hr and 3 hr, and measured the distance from the starting point (the junction of TC and DT) to the leading edge of the progenitor clusters. The velocity was calculated as v = d (micrometers)/t (min). As you kindly suggested, we indicated the anterior margin and point zero in the corresponding panels. We have added n numbers to the legends.
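
      The velocity calculation described above can be sketched as follows. This is only an illustration of v = d/t applied to hourly snapshots; the function names and the distances are hypothetical and are not taken from the manuscript or its source data.

```python
# Hypothetical leading-edge distances (micrometers) from the TC/DT junction,
# one snapshot per hour; these numbers are illustrative, not measured data.
snapshot_times_hr = [0, 1, 2, 3]
leading_edge_um = [0.0, 30.0, 75.0, 120.0]


def migration_velocity(times_hr, distances_um):
    """Return the mean velocity in micrometers per minute, v = d / t."""
    if len(times_hr) != len(distances_um) or len(times_hr) < 2:
        raise ValueError("need at least two matched snapshots")
    elapsed_min = (times_hr[-1] - times_hr[0]) * 60  # hours -> minutes
    if elapsed_min <= 0:
        raise ValueError("snapshots must span a positive time interval")
    travelled_um = distances_um[-1] - distances_um[0]
    return travelled_um / elapsed_min


def interval_velocities(times_hr, distances_um):
    """Return the velocity (micrometers/min) between consecutive snapshots."""
    points = list(zip(times_hr, distances_um))
    return [
        (d1 - d0) / ((t1 - t0) * 60)
        for (t0, d0), (t1, d1) in zip(points, points[1:])
    ]
```

      Reporting the per-interval values alongside the overall mean would show whether the leading edge advances at a steady rate across the 3 hr window.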

      (2) Figure 2-figure supplement 1: C-L and M: From these images and graph it appears that Upd2 RNAi results in no aberrant anterior migration. Why is this result different from Figures 2D-F where it does?

      The fat body-expressing lsp2-Gal4 was used in Figure 2-figure supplement 1C-L and Figure 2D-F, while the trachea-specific btl-Gal4 was used in Figure 2-figure supplement 1K-L. The lsp2-Gal4-driven but not btl-Gal4-driven upd2RNAi causes aberrant anterior migration, suggesting that fat body-derived Upd2 plays a role. We have further clarified this in the text.

      (3) Figure 5F: The data on the localisation of planar cell polarity proteins in the tracheal stem cell group is rather weak. Figure 5G and J should at least be quantified for several animals of the same age for each genotype. Is there overall more Ft-GFP in the cells on the posterior end of the cell group than on the opposite side? Or is there a more classic planar cell polarity in each cell with Ft-GFP facing to the posterior side of the cell in each cell? Maybe it would be more convincing if the authors assessed what the subcellular localisation of Ft is through the expression of Ft-GFP in clones to figure out whether it localises posteriorly or anteriorly in individual cells.

      We staged the animals, measured several animals for each genotype and provided the quantifications in the revised manuscript. The level of Ft-GFP is higher in the cells at the frontal edge. We tried to examine the expression of Ft-GFP at the single-cell level. However, this turned out to be technically difficult because the tracheal stem cells are not as regularly arranged as epithelial cells and the proximal-distal axis of the tracheal stem cells remains unclear. We thus decided to measure the fluorescence signal of groups of stem cells along the DT regardless of the polarity of individual cells.

      (4) Regarding the trafficking of Upd2 in the fat body, is it known, whether Grasp65, Lbm, Rab5, and 7 are specifically needed for extracellular vesicle trafficking rather than general intracellular trafficking? What is the evidence for this?

      In our experiments, knocking down rab5, rab7, grasp65 or lbm in the trachea using btl-Gal4 did not cause abnormalities in disciplined migration, which excludes an intracellular contribution in the trachea (Figure 7-figure supplement 1). Perturbation of Grasp65 or Lbm in the fat body increased intracellular Upd2-containing vesicles, indicating that intracellular production is functional (Figure 6J). Grasp65 is specifically required for Upd2 production. Lbm, Rab5 and Rab7 are important for vesicle trafficking. Our conclusion does not pertain specifically to either the extracellular or the intracellular compartment.

      (5) Figure 8A-B: The data on the proximity of Rab5 and 7 to the Upd2 blobs are not very convincing.

      The confocal images indicate the proximity of Rab5 and Rab7 to the Upd2 vesicles. We interpret this proximity together with the Co-IP and PLA results (Figure 8E-K).

      (6) The authors should clarify whether or not their work has shown that "vesicle-mediated transport of ligands is essential for JAK/STAT signaling". In its current form, this manuscript does not appear to provide enough evidence for extracellular vesicle transport of Upd2.

      Lbm belongs to the tetraspanin protein family that contains four transmembrane domains, which are the principal components of extracellular vesicles. We show that Lbm interacts with Upd2. JAK/STAT signaling depends on Upd2 in the fat body as well as on the vesicle trafficking machinery. Furthermore, we performed electron microscopy and showed the presence of Lbm-containing vesicles in the fat body (Figure 8-figure supplement 1D).

      (7) What is the long-term effect of the various genetic manipulations on migration? The authors don't show what the phenotype at later time points would be, regarding the longer-term migration behaviour (e.g. at 10h APF when the cells should normally reach the posterior end of the pupa). And what is the overall effect of the aberrant bidirectional migration phenotype on tracheal remodelling?

      We observed that the integrity of the tracheal network, especially the dorsal trunk, was impaired, which may be due to incomplete regeneration (Figure 3-figure supplement 1E-I).

      (8) The RNAi experiments in this manuscript are generally done using a single RNAi line. To rule out off-target effects, it would be important to use two non-overlapping RNAi lines for each gene.

      We validated the phenotype using several independent RNAi alleles.

      Reviewer #2 (Public review):

      Summary:

      This work by Dong and colleagues investigates the directed migration of tracheal stem cells in Drosophila pupae, essential for tissue homeostasis. These cells, found in two nearby groups, migrate unidirectionally along the dorsal trunk towards the posterior to replenish degenerating branches that disperse the FGF mitogen. The authors show that inter-organ communication between tracheal stem cells and the neighboring fat body controls this directionality. They propose that the fat body-derived cytokine Upd2 induces JAK/STAT signaling in tracheal progenitors, maintaining their directional migration. Disruption of Upd2 production or JAK/STAT signaling results in erratic, bidirectional migration. Additionally, JAK/STAT signaling promotes the expression of planar cell polarity genes, leading to asymmetric localization of Fat in progenitor cells. The study also indicates that Upd2 transport depends on Rab5- and Rab7-mediated endocytic sorting and Lbm-dependent vesicle trafficking. This research addresses inter-organ communication and vesicular transport in the disciplined migration of tracheal progenitors.

      Strengths:

      This manuscript presents extensive and varied experimental data to show a link between Upd2-JAK/STAT signaling and tracheal progenitor cell migration. The authors provide convincing evidence that the fat body, located near the trachea, secretes vesicles containing the Upd2 cytokine. These vesicles reach tracheal progenitors and activate the JAK-STAT pathway, which is necessary for their polarized migration. Using ChIP-seq, GFP-protein trap lines of planar cell polarity genes, and RNAi experiments, the authors demonstrate that STAT92E likely regulates the transcription of planar cell polarity genes and some apicobasal cell polarity genes in tracheal stem cells, which seem to be necessary for unidirectional migration.

      Weaknesses:

      Directional migration of tracheal progenitors is only partially compromised, with some cells migrating anteriorly and others maintaining their posterior migration.

      Our results suggest that Upd2-JAK/STAT signaling is required for the consistency of disciplined migration. Although only a few tracheal progenitors display anterior migration, these cells lose their commitment to directional movement. We acknowledge that the phenotype is moderate.

      Additionally, the authors do not examine the potential phenotypic consequences of this defective migration.

      We examined the long-term effects of the aberrant migration and observed an impairment of tracheal integrity and melanized tracheal branches (Figure 3-figure supplement 1E-I).

      It is not clear whether the number of tracheal progenitors remains unchanged in the different genetic conditions. If there are more cells, this could affect their localization rather than migration and may change the proposed interpretation of the data.

      We examined the progenitor cell number in the bidirectional movement samples and the control group. The results show that the cell number does not differ significantly between the control and bidirectional movement groups (Figure 3-figure supplement 1).

      Upd2 transport by vesicles is not convincingly shown.

      We performed additional PLA experiments to further support the interaction between Upd2 and the core components of extracellular vesicles. Furthermore, we performed electron microscopy and showed the presence of Lbm-containing vesicles in the fat body (Figure 8-figure supplement 1D). Additional experiments, such as colocalization and Co-IP assays, and better quantification are provided in the revised manuscript (see revised Figure 8).

      Data presentation is confusing and incomplete.

      We used different colors to represent different genotypes, and the columns were superimposed. We changed the graphs to show the quantification for the different conditions separately, and revised the data presentation to avoid confusion.

      Reviewer #3 (Public review):

      Summary:

      Dong et al tackle the mechanism leading to polarized migration of tracheal progenitors during Drosophila metamorphosis. This work fits in the stem cell research field and its crucial role in growth and regeneration. While it has been previously reported by others that tracheal progenitors migrate in response to FGF and Insulin signals emanating from the fat body in order to regenerate tracheal branches, the authors identified an additional mechanism involved in the communication of the fat body and tracheal progenitors.

      Strengths:

      The data presented were obtained using a wide range of complementary techniques combining genetics, molecular biology, quantitative, and live imaging techniques. The authors provide convincing evidence that the fat body, found in close proximity to the trachea, secrete vesicles containing the Upd2 cytokine that reach tracheal progenitors leading to JAK-STAT pathway activation, which is required for their polarized migration. In addition, the authors show that genes regulating planar cell polarity are also involved in this inter-organ communication.

      Weaknesses:

      (1) Affecting this inter-organ communication leads to a quite discrete phenotype where polarized migration of tracheal progenitors is partially compromised. The study lacks data showing the consequences of this phenotype on the final trachea morphology, function, and/or regeneration capacities at later pupal and adult stages. This could potentially increase the significance of the findings.

      Following your kind suggestion, we examined the long-term effects of the aberrant migration and observed an impairment of tracheal integrity and melanized tracheal branches (Figure 3-figure supplement 1E-I).

      (2) The conclusions of this paper are mostly well supported by data, but some aspects of data acquisition and analysis need to be clarified and corrected, such as recurrent errors in plotting of tracheal progenitor migration distance that mislead the reader regarding the severity of the phenotype.

      We used different colors to represent different genotypes, and the columns were superimposed. We changed the graphs to show the quantification for the different conditions separately. We thank you for kindly pointing this out.

      (3) The number of tracheal progenitors should be assessed since they seem to be found in excess in some genetic conditions that affect their behavior. A change in progenitor number could lead to crowding, thus affecting their localization rather than migration capacities, thereby changing the proposed interpretation. In addition, the authors show data suggesting a reduced progenitor migration speed when the fat body is affected, which would also be consistent with a crowding of progenitors.

      We examined cell number and cell proliferation in the bidirectional movement samples and the control group, and observed no significant difference between the control and bidirectional movement groups (Figure 3-figure supplement 2).

      (4) The authors claim that tracheal progenitors display a polarized distribution of PCP proteins that is controlled by JAK-STAT signaling. However, this conclusion is made from a single experiment that is not quantified and for which there is no explanation of how the plot profile measurements were performed. It also seems that this experiment was done only once. Altogether, this is insufficient to support the claim. Finally, a quantification of the number of posterior edges presenting filopodia rather than the number of filopodia at the anterior and posterior leading edges would be more appropriate.

      We staged the animals, measured several animals for each genotype and provided the quantifications in the revised manuscript. The level of Ft-GFP is higher in the cells at the frontal edge. We tried to examine the expression of Ft-GFP at the single-cell level. However, this turned out to be difficult because the tracheal stem cells are not as regularly patterned as epithelial cells and the proximal-distal axis of tracheal stem cells is not well defined. We thus decided to measure the fluorescence signal of groups of stem cells along the DT regardless of their individual polarity.

      (5) The authors demonstrate that Upd2 is transported through vesicles from the fat body to the tracheal progenitors where they propose they are internalized. Since the Upd2 receptor Dome ligand binding sites are exposed to the extracellular environment, it is difficult to envision in the proposed model how Upd2 would be released from vesicles to bind Dome extracellularly and activate the JAK-STAT pathway. Moreover, data regarding the mechanism of the vesicular transport of Upd2 are not fully convincing since the PLA experiments between Upd2 and Rab5, Rab7, and Lbm are not supported by proper positive and negative controls and co-immunoprecipitation data in the main figure do not always correlate to the raw data.

      We used molecular modeling to show that Upd2 and Lbm intermingle, and that Upd2 is not entirely encapsulated in vesicles (Figure 8-figure supplement 1E). We performed PLA experiments using animals not expressing upd2-Cherry as a negative control (Figure 8E-J). We corrected the Co-IP panel and apologize for this error.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor comments:

      (1) Figure 1-figure supplement 1: E: How was the migration velocity assessed? By live imaging individual cells or following the cell front of the group? Over what time period? Do the data points in the graph correspond to individual cells or the cell group? It would be important to show confocal images that go along with this quantification.

      We took snapshots of pupae at 0 hr, 1 hr, 2 hr and 3 hr, and measured the distance covered by the migrating progenitor cells from the starting point (the junction of TC and DT) to the leading edge of the progenitor groups. We then calculated the migration rate as v = d (micrometers)/t (min). As the progenitor cells revolve around and migrate along the DT, tracking single tracheoblasts through the intact cuticle is technically challenging. We have therefore measured the leading edge as a proxy for the whole cell group. We agree with you that time-lapse imaging is preferable for the analysis of migration.

      (2) Figure 1-figure supplement 1: F: Why is there Gal80ts in the genotype? (and in Figure 1H). Also, what pupal age was used for this quantification?

      Expression of hid and rpr at the L3 stage impaired fat body integrity and adipocyte abundance, and caused lethality. Gal80ts was used to control the expression of rpr.hid. Pupae at 0 hr APF were used in the EdU experiment.

      (3) Figure 2C: what is shown in the 6 columns (why 3 each for control and rpr/hid)?

      We conducted 3 replicates of each group for control and rpr.hid.

      (4) In the methods, several Drosophila stocks are listed as 'source:" from a particular person (e.g. Dr Ma). Please list the real source of this stock, e.g. Bloomington stock number, or the lab and publication in which the stock was originally made.

      We provide the information on these stocks in the revised methods.

      (5) The SKOV3 carcinoma cell and S2 cell work is not described in the methods.

      We added a detailed description of this experiment to the revised methods section “Cell culture and transfection”.

      (6) Figure 6 (F) 'Bar graph plots the abundance of Upd2-mCherry-containing vesicles in progenitors.' What does abundance mean? What was quantified, the number of vesicles, or the mean intensity? This is also not mentioned in the methods.

      We counted the number of Upd2-mCherry-containing vesicles in fat body cells and tracheal progenitors, and added a description of the measurement to the methods.

      (7) There are a few language mistakes throughout the manuscript. E.g.

      (a) Line 117 and other places: Language: 'fat body' should be 'the fat body'.

      We thank you for pointing out these errors and have corrected them accordingly.

      (b) Line 1276 Language mistakes: 'Video 1 3D-view of confocal image stacks of tracheal progenitors and fat body. Scale bar: 100 μm. Genotypes: UAS-mCD8-GFP/+;lsp2-Gal4,P[B123]-RFP-moe/+.' :stacks and genotypes should be singular.

      We fixed these errors and thank you for kindly pointing them out. We also proofread the entire manuscript to ensure accuracy.

      (8) In general, it is hard to figure out the exact genotypes used in experiments. This is mostly not written very clearly in the figure legends. E.g. Figure 2: genotype for A-C missing in figure legend (is B from control animals?)

      We added genotypes to the figure legends. For Figure 2A and C: lsp2-Gal4,P[B123]-RFP-moe/+ for control and UAS-rpr-hid/+;Gal80ts/+;lsp2-Gal4,P[B123]-RFP-moe/+ for rpr.hid; B is from control animals.

      Reviewer #2 (Recommendations for the authors):

      Major comments:

      (1) The phenotype resulting from Upd2 downregulation by RNAi is subtle and shown by unconvincing images. In addition, these phenotypes are analyzed using only one RNAi line.

      We used two independent alleles of upd2RNAi from THFC (THU1288 and THU1331), and observed a similar phenotype. For RNAi experiments, we always use multiple independent alleles.

      (2) The authors should analyze the phenotypic consequences of directional migration changes. Is there an effect on tracheal remodeling?

      We observed that the integrity of the tracheal network, especially the dorsal trunk, was impaired and that melanized tracheal branches were present, which may be due to incomplete regeneration (Figure 3-figure supplement 1E-I).

      (3) The number of tracheal progenitors should be quantified, as some genetic conditions may affect cell numbers, as is apparent in some panels.

      We examined cell number and cell proliferation and observed no significant difference between the control and bidirectional movement groups (Figure 3-figure supplement 1).

      (4) The data on PCP protein distribution are unconvincing, unquantified, and insufficient to support one of the main conclusions of the study, which is stated in the abstract: "JAK/STAT signaling promotes the expression of genes involved in planar cell polarity, leading to asymmetric localization of Fat in progenitor cells."

      We staged the animals, measured several animals for each genotype and provided the quantifications in the revised manuscript. The level of Ft-GFP is higher in the cells at the frontal edge. We tried to examine the expression of Ft-GFP at the single-cell level. However, this turned out to be difficult because the tracheal stem cells are not as regularly patterned as epithelial cells and the proximal-distal axis of tracheal stem cells is not well defined. We thus decided to measure the fluorescence signal of groups of stem cells along the DT regardless of their individual polarity.

      Minor comments:

      (1) Language should be revised. In many places in the manuscript, starting in line 113, "fat body" should be "the fat body".

      Thank you for pointing out this error. We corrected it accordingly.

      (2) Genotypes used in experiments should be described.

      We added all the genotypes. We proofread the entire manuscript to complete the figure legends for genotypes.

      (3) Line 67, the reference to "The progenitor cells reside in Tr4 and Tr5 metameres and start to move along the tracheal branch" should include (Chen and Krasnow, Science 2014).

      We added the reference in the manuscript.

      (4) Line 1081, Figure 7 Legend. "Bar graph plots the abundance of Upd2-mCherry-containing vesicles" Abundance is the number of vesicles? The graph displays the average number of vesicles? Please explain and describe the quantification.

      The bar graph represents the number of Upd2-mCherry-containing vesicles in different conditions. We quantified the number of vesicles per area.

      (5) Figure 1 (I-J) What is shown on the panels? Progenitors marked with? This information is not present in the figure or figure legend. Same for Figure 2 (D-E).

      Figure 1I-J show the vector of migrating progenitors. We added the information in the legends. The tracheal cells were labeled by nls-mCherry in Figure 1I-J. In Figure 2D-E, the progenitors were marked with P[B123]-RFP-moe.

      (6) Figure 3 Q, Stat92E-GFP values in the graph are not well-explained. What do the numbers in the y-axis refer to?

      The y-axis represents the intensity of Stat92E-GFP normalized to control. We have changed the y-axis label to ‘normalized Stat92E-GFP intensity’ in the legends.
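
      One common way to normalize intensities to control is sketched below, assuming that each measurement is divided by the mean of the control measurements. The response does not state the exact procedure, so this choice, the function name and the numbers are assumptions made for illustration only.

```python
from statistics import mean


def normalize_to_control(control_intensities, sample_intensities):
    """Divide each sample intensity by the mean control intensity.

    Assumption: normalization means scaling so that the control mean
    equals 1; the actual procedure used in the manuscript may differ.
    """
    control_mean = mean(control_intensities)
    if control_mean == 0:
        raise ValueError("control mean is zero; cannot normalize")
    return [value / control_mean for value in sample_intensities]


# Hypothetical raw fluorescence values (arbitrary units), not real data.
control_raw = [100, 120, 80]
knockdown_raw = [50, 150]
normalized = normalize_to_control(control_raw, knockdown_raw)
```

      With this convention, a value of 1 on the y-axis corresponds to the average control intensity.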

      (7) In general, figures and figure legends must be revised. Sometimes stainings are not well-defined, some scale bars are missing and plots do not say what the values are.

      We apologize for the inadequate information and have revised the figures and legends accordingly.

      Reviewer #3 (Recommendations for the authors):

      Several points should be addressed by the authors in order to improve their manuscript.

      Major points:

      (1) The phenotype obtained from decreasing the inter-organ signaling is quite discrete. It is further weakened by the fact that the images chosen to illustrate the measures are not really convincing. No image at 1h APF shows any clear anterior migration. Based on the scale, most of the images at 3h APF do not show a striking difference compared to the control, and in any case, stronger phenotypes would be missed anteriorly since they would thus be out of frame. In addition, at 3h APF, progenitors migrating anteriorly from Tr5 position get mixed with those migrating posteriorly from Tr4 so it is not clear how measurements were made. Given that most phenotypes are observed upon the use of RNAis, it is possible that phenotypes are weak due to persistent gene expression. Using null clones for dome, hop, or stat in progenitors could therefore aggravate the phenotypes and support further the significance of the study. Finally, assessing the consequences of compromised fat body-tracheal communication on trachea morphology, function, and regeneration later in pupal development and on adult flies would also help strengthen the importance of the findings.

      We agree with you that anteriorly migrating Tr5 progenitors adjoining Tr4 progenitors hinder measurements and that mutants may give a stronger phenotype than RNAi lines. We measured only Tr4 progenitors (not Tr5) when assessing anterior migration. We also performed experiments using mutant alleles, which gave aberrant migration of tracheal progenitors (Figure 3-figure supplement 1A-D). We can now show that the integrity of the tracheal network, especially the dorsal trunk, was impaired, which may be due to incomplete regeneration (Figure 3-figure supplement 1E-I).

      (2) Although the authors did not observe defects in tracheal progenitor proliferation, progenitors seem to be present in excess in some key genetic background (e.g, upon expression of rpr.hid, statRNAi, Rab-RNAi or in the presence of BFA). This excess could be the result of another mechanism than proliferation (recruitment of extra progenitors since it is not clear how they originate, defect in apoptosis...) and could impact the localization of progenitors, those being pushed anteriorly as a consequence of crowding. A proper characterization of tracheal progenitor number would thus help to discriminate between defects in migration or crowding. This point could also be addressed by performing individual tracking of tracheal progenitors, to find out whether each progenitor is indeed migrating in the wrong direction or if the movement assessed by the global tracking method that is used is just a consequence of progenitor excess.

      We examined the cell number in the bidirectional movement samples and the control group. The results show no significant difference between the control and bidirectional movement groups (Figure 3-figure supplement 1). We also tried to follow every progenitor, but were unable to obtain convincing results with P[B123]-RFP-moe, as tracking single tracheoblasts through the intact cuticle is technically challenging.

      (3) Regarding the ChIP-seq experiment, an explanation of why choosing the "establishment of planar polarity" family should be provided since data indicate a quite low GeneRatio. Indeed, the "cell adhesion" family seems a more obvious candidate, which would be further supported by the fact that the JAK-STAT pathway has been shown to affect cell adhesion components such as E-Cadherin and FAK (Silver and Montell 2001, Mallart et al 2024). Also, have these known targets of JAK-STAT signaling been found in the ChIP-seq data? Since filopodia polarization is affected in tracheal progenitors when JAK-STAT signaling is decreased, the same question also applies to enabled, which is involved in filopodia formation and has been recently identified as a target of JAK-STAT signaling.

      As you kindly suggested, we tested a number of cell adhesion-related genes such as E-Cadherin (shg), fak, robo2 and enabled (ena). We did not observe any apparent aberrancy in the migration of tracheal progenitors (Figure 5-figure supplement 1J).

      (4) Data investigating PCP protein distribution is not convincing, not quantified, and not sufficient to draw one of the main conclusions of the study, which is even written in the abstract "JAK/STAT signaling promotes the expression of genes involved in planar cell polarity leading to asymmetric localization of Fat in progenitor cells."

      We better quantified the abundance of Ft in the progenitors at the frontal edge and in those lagging behind. The traces in the figures plot multiple replicates. The level of Ft-GFP is higher in the cells at the frontal edge.

      (5) Overall, the figures together with their caption and/or the material and methods section lack some important information for the reader to fully understand the data. In addition, some errors are found in multiple plots throughout the article and must be corrected. Here are some examples:

      According to your suggestion, we revised the legends and the methods section to include sufficient information.

      (a) Migration distance plots from Figure 3E do not match the data presented in the source data file. It seems that, when creating the plot, instead of superimposing the bars, bars were stacked. This should be corrected for all migration distance plots from Figure 3E onward, including in supplementary figures.

      We apologize for the misleading representation. We revised the plots accordingly and now show the quantification for the different conditions separately.

      (b) The number of analyzed flies and/or clusters of tracheal progenitors from different flies should be stated for all quantification or observations made on images. This information is lacking for all migration distance plots, for progenitor migration tracking (Figure 1 I, J), for DIPF reporter in Figure 2J, for plot profiles (Figure 5G, J), for Upd2-Rab5/Rab7/Lbm co-detections, PLA, CoIP, and lbm-pHluorin experiments. This also applies to RNA seq, ChIP seq, and surface proteomics, for which the number of pupae and number of replicates is not indicated.

      We changed the graphs to show the quantification and n number in different conditions separately.

      We also added the n number of replicates in methods.

      (c) How quantifications were performed is not sufficiently explained. For example, the reference point for migration distance measurement is not defined, and neither is whether the measures were made on fixed or live imaging samples. In fluorescence intensity measurements and Upd2 vesicle counting, information on whether measures were made on a single z slice or on a projection of several z slices should be stated together with what ROI and which FIJI tool for quantification were used. For plot profiles, the same information regarding z slices misses together with how the orientation, the thickness, and the length of the line were chosen, and again the number of times the experiment was conducted should be mentioned and error bars should appear on graphs.

      We thank this reviewer for the suggestions which help clarify the methodology of our experiments and improve presentation of our data. We have made the changes according to the suggestions and modified our methods section and the related figures to incorporate these changes.

      To measure the migration distance of tracheal progenitors, we took snapshots of living pupae at 0 hr, 1 hr, 2 hr and 3 hr APF, and measured the distance from the starting point (the junction of TC and DT) to the leading edge of the progenitor groups.

      For the measurements of fluorescent intensity of stat92E-GFP and DIPF, we took z-stack confocal images of samples and quantified the fluorescent intensity using FIJI. Specifically, intensity was quantified for regions of interest, using the Analysis and Measurement tools. To quantify Upd2-mCherry vesicles, z-stack confocal images of the fat body were taken and the cell counting function of FIJI was used to measure the vesicle number.

      To quantify the fluorescent intensity of in vivo tagged Ds, Ft and Fj proteins, a single z slice was used. The expression level of the protein was assessed as the integrated fluorescent intensity normalized to area.

      For the measurement of Ft-GFP distribution, a single z slice of the progenitors immediately proximal to the DT was imaged. An arbitrary line was drawn along the migration direction from the starting TC-DT junction to the leading front (the length of the line corresponds to the distribution range of tracheal stem cell clusters). The fluorescent intensity along the line was then calculated automatically with the built-in measurement function of the Zeiss confocal software.
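
      As an illustration of the two measurements described above (integrated intensity normalized to area, and an intensity profile along a line), the following Python sketch works on a plain 2D list of pixel values. It is not the authors' pipeline, which used FIJI and the Zeiss software; the function names, the boolean ROI mask, and the nearest-pixel sampling are assumptions made for this example.

```python
def integrated_intensity_per_area(image, roi_mask):
    """Sum the pixel values inside the ROI and divide by the ROI area (pixel count)."""
    total = 0.0
    area = 0
    for row, mask_row in zip(image, roi_mask):
        for value, inside in zip(row, mask_row):
            if inside:
                total += value
                area += 1
    if area == 0:
        raise ValueError("ROI is empty")
    return total / area


def line_profile(image, start, end, n_samples):
    """Sample intensities at evenly spaced points between two (row, col) positions.

    Each sample is read from the nearest pixel (no interpolation).
    """
    if n_samples < 2:
        raise ValueError("need at least two samples")
    (r0, c0), (r1, c1) = start, end
    profile = []
    for i in range(n_samples):
        t = i / (n_samples - 1)          # 0.0 at the start, 1.0 at the end
        r = round(r0 + t * (r1 - r0))
        c = round(c0 + t * (c1 - c0))
        profile.append(image[r][c])
    return profile
```

      Here the line would run from the TC-DT junction to the leading front, so a rising profile corresponds to higher signal in the cells at the front.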

      Minor points:

      (1) In several instances, the authors generalize that stem cells migrate to leave their niche, but this is not the case for all stem cells.

      The phenomenon that stem cells leave their niche when they are activated is commonly observed. We interpreted the general mechanism from our system of tracheal stem cells. We fully agree with you that it may not be the case for all stem cells. We modified the text accordingly.

      (2) Line 122 -a reference paper or an image showing the expression pattern of the lsp2-Gal4 driver is missing.

      We added the reference in the manuscript.

      (3) Line 136 - The term "traces of individual progenitors" is overstated and should be reformulated as the method used does not seem to be individual cell tracking.

      We rephrased accordingly in the revised manuscript.

      (4) Line 146 - Fat body and tracheal progenitors are qualified as interdependent organs, in which aspect do tracheal progenitors affect the fat body?

      Current knowledge suggests close inter-organ crosstalk between the trachea and the fat body. The fly trachea provides oxygen to the body and influences the oxidation and metabolism of the whole body. When the trachea is perturbed, the body becomes hypoxic, which causes an inflammatory response in adipose tissue, an important immune organ (Shin et al., 2024).

      (5) Line 163 - Not all the genes tested are cytokines, so the sentence should be reformulated. In addition, in supplementary Fig2-1 C-J, the KD of hh seems to abolish completely tracheal progenitor migration, which is not commented on.

      According to your suggestion, we revised the description of the genes tested. We also added comments on the phenotype of the hh knockdown in the revised manuscript.

      (6) Line 180 - Conclusion is made on Dome expression while using a dome-Gal4 construct, which does not necessarily recapitulate the endogenous pattern of dome expression, so it should be reformulated. Ideally, dome expression should be assessed in another way. Also, it is not clear whether GFP is present only in progenitors since images are zoomed.

      We revised the statement and provided a larger view of dome>GFP that shows enriched expression in the tracheal progenitors (Figure 2-figure supplement 2E), an expression pattern that is consistent with FlyBase.

      (7) Line 199 - Is it upd-Gal4 or upd2-Gal4 that is used? Since the conclusion of the experiment is made on upd2, the use of upd-gal4 would not be relevant. If upd2-gal4 is used, it should be corrected. In general, the provenance of the Gal4 lines should be provided. In addition, a strong GFP signal in the trachea is visible on the image in Supplementary Figure 2-2F but not commented on and seems contradictory with the conclusion mentioning that fat body and gut are the main source of Upd2 production.

      We removed data obtained from the use of this irrelevant upd-Gal4 line.

      (8) Figures:

      -  Figure 1 G, H - Scale bar is missing.

      We added it accordingly.

      -  Figure 1 I, J - The information on the staining is missing.

      We added it in the revised manuscript.

      -  Figure 2A - Providing explanations of the terms "Count" and "Gene ratio" in the caption would be helpful for readers who are not used to this kind of data. In addition, the color code is confusing since the same color is used for the selected gene family and for high p-values (the same applies to other similar graphs).

      Gene ratio refers to the proportion of genes in a dataset that are associated with a particular biological process, function, or pathway. Count indicates the number of genes from the input gene list that are associated with a specific GO term. A redder color indicates a smaller p-value and thus higher significance.
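
      The two quantities defined above can be made concrete with a short Python sketch. It assumes that the gene ratio is taken relative to the whole input gene list; enrichment tools differ on the exact denominator, so this is an illustration rather than the calculation used for the figure. The function name and the placeholder gene identifiers are hypothetical.

```python
def go_term_stats(input_genes, term_genes):
    """Return (count, gene_ratio) for one GO term.

    count      : number of input genes annotated with the term
    gene_ratio : count divided by the size of the input gene list
                 (assumed denominator; some tools use only annotated genes)
    """
    input_set = set(input_genes)
    count = len(input_set & set(term_genes))
    gene_ratio = count / len(input_set) if input_set else 0.0
    return count, gene_ratio


# Placeholder identifiers, not real annotation data.
count, ratio = go_term_stats(
    ["geneA", "geneB", "geneC", "geneD"],
    ["geneB", "geneD", "geneX"],
)
```

      A term with a low gene ratio can still have a small p-value if few genes in the genome carry that annotation, which is why both values are usually reported together.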

      -  Figure 2 B, C - What does the color scale represent? What do the columns in C correspond to, different time points, different replicates?

      The color scale represents the normalized expression. The columns in C correspond to different replicates of control and rpr.hid.

      -  Figure 2 F - The error bars on the 3h APF posterior bars are missing.

      We added error bars accordingly.

      -  Figure 2 G - The legend "Down-Stable-Up" is in comparison to what?

      The control group was generated from the reaction without H2O2. The comparison was relative to the control group.

      -  Figure 2 J - The specificity of the DIPF tool that has been created should be validated in other tissues displaying known JAK-STAT activity and/or in conditions of decreased JAK-STAT signaling. In addition, the added value of the tool as compared to the JAK-STAT activity reporter used later, which has been well characterized, is not obvious.

      We added the signal of DIPF in fat body and salivary gland, both of which harbor active JAK/STAT signaling (Figure 2-figure supplement 2F-H). As opposed to the well characterized Stat92E-GFP reporter that assays the downstream transcription activity, the DIPF reporter measures the upstream event of receptor dimerization.

      -  Figure 3 I-P - Reporter tool validation in Images I-L could be moved to supplementary data. In images M-P, staining of nuclei and/or membranes would be useful to assess cell integrity.

      We revised the figures accordingly.

      -  Figure 3Q and similar plots in the following figures do not explain the normalization performed and how it can be higher than 1 in control conditions.

      In these figures, we normalized the signal relative to the control groups, e.g., the value of Stat92E-GFP in the btl-GFP control group was set to 1 in the previous Figure 3Q (revised Figure 3-supplementary Figure B-J).
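
      A minimal Python sketch of this kind of normalization shows why individual control measurements can exceed 1. It assumes that the mean of the control group is the value set to 1, which the response does not state explicitly; the function name and the example numbers are made up for illustration.

```python
from statistics import mean


def normalize_to_control(control_values, sample_values):
    """Divide every measurement by the mean of the control group.

    The control mean becomes 1 by construction, but single control
    measurements scatter around 1 and can lie above it.
    """
    reference = mean(control_values)
    if reference == 0:
        raise ValueError("control mean is zero; cannot normalize")
    norm_control = [v / reference for v in control_values]
    norm_sample = [v / reference for v in sample_values]
    return norm_control, norm_sample


# Made-up intensities, for illustration only.
norm_control, norm_sample = normalize_to_control([8.0, 10.0, 12.0], [5.0, 15.0])
```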

      -  Figure 4C - These representations lack explanations to be fully understood by a broad audience.

      The figure shows that Stat92E binding was detected in the promoters and intronic regions (the orange peaks) of genes functioning in distal-to-proximal signaling, such as ds, fj, fz, stan, Vang and fat2. We added this information to the figure legend according to your suggestion.

      -  Figure 5 K,L - What is the x-axis missing, together with the method of tracking used?

      The x-axis shows the recording time from a time-lapse (t-stack) series with a time interval of 5 min. We revised the methods section and now provide a detailed procedure for this experiment.

      -  Figures 6 and 8- The overall figures lack a wider view of the cells/tissues/organs and/or additional staining to understand what is presented.

      The images show fat body preparations. We used high magnification in order to resolve the vesicles. We have now added wider views of the tissues under investigation (e.g. Figure 6-figure supplement 1).

      -  Figure 6 D,E - The scale bar is missing.

      We added it accordingly.

      -  Figure 8 O-S - What is the blue staining?

      The blue staining shows DAPI-stained nuclei. We have added the information in the legend.

      -  PLA experiments can give a lot of non-specific background. What kind of controls have been used in Figure 8 F-J? Negative controls should be done on cells that do not express upd2-mCherry using both antibodies to detect non-specific background, which does not usually appear completely black.

      If possible, a positive control using a known protein interacting with Rab5-GFP should be included.

      In the previous Figure 8, we used control samples lacking one of the primary antibodies. In the revised Figure 8, we conducted the experiment as you suggested, with controls that do not express upd2-mCherry (Figure 8 E-J).

      -  Co-IP experiments - The raw data file for blots is quite hard to read through. Some legends are not facing the right lane and some blots presented in the main figure are difficult to track since several blots are presented in the raw data file. e.g.

      (a)  Raw blot for Figure 8 K: the band for mCherry in the IP anti-GFP blot (lane one in K) is not convincing, it is not distinguishable from other aspecific bands. On the reverse IP presented only in raw data, on the input from blot IB anti-mCherry, both lanes present exactly the same bands at 72kb when one of the lanes corresponds to extract from flies not expressing upd2-mCherry.

      We thank you for pointing out the incorrect labels. We apologize for the errors and have corrected them.

      (b)  Raw blot for Figure 8 L: on the input blot IB anti-GFP, there is a band corresponding to Rab7-GFP in the lane of the extract from flies not expressing Rab7-GFP.

      We corrected it.

      (c)  Raw data for Figure 8 M: on the last blot, legends are missing above the input Ib anti-GFP blot.

      We added the missing legends in the figure.

      Shin, M., Chang, E., Lee, D., Kim, N., Cho, B., Cha, N., Koranteng, F., Song, J.J., and Shim, J. (2024). Drosophila immune cells transport oxygen through PPO2 protein phase transition. Nature 631, 350-359.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) Summary:

      In this manuscript, the model's capacity to capture epistatic interactions through multi-point mutations and its success in finding the global optimum within the protein fitness landscape highlights the strength of deep learning methods over traditional approaches.

      We thank the reviewer for his/her recognition of our model’s potential and advantages.

      (2) Strengths:

      It is impressive that the authors used AI combined with limited experimental validation to achieve such significant enhancements in protein performance. Besides, the successful application of the designed antibody in industrial settings demonstrates the practical and economic relevance of the study. Overall, this work has broad implications for future AI-guided protein engineering efforts.

      We are thankful for the editor’s appreciation of our work, and especially for the acknowledgment of the practical application of our model.

      (3) Weaknesses:

      However, the authors should conduct a more thorough computational analysis to complement their manuscript. While the identification of improved multi-point mutants is commendable, the manuscript lacks a detailed investigation into the mechanisms by which these mutations enhance protein properties. The authors briefly mention that some physicochemical characteristics of the mutants are unusual, but they do not delve into why these mutations result in improved performance. Could computational techniques, such as molecular dynamics simulations, be employed to explore the effects of these mutations?

      We thank the reviewer for this good question, which allows us to investigate more deeply the mechanisms by which the mutations significantly enhance the alkali resistance of the protein. Following the reviewer’s suggestion, we have expanded our analysis by incorporating molecular dynamics (MD) simulations to understand the impact of the mutations. As an example, we focused on the representative alkali-resistant mutant, A57D;P29T, and examined its MD simulation results. As shown in Figure S4A, the two-point mutant A57D;P29T has a Tm increase of around 8 ℃ and a much stronger binding affinity than the WT. Our analysis of the MD trajectories indicates that the A57D;P29T mutant has a more rigid structure than the WT, as shown by its lower root mean squared deviation (RMSD) (Figure S4B). Furthermore, we calculated the root mean squared fluctuation (RMSF) for each residue and found that the mutant displayed less fluctuation at residue 29 but similar flexibility at residue 57. Interestingly, residues at positions 10, 108 and 118, which are spatially distant from residues 29 and 57, exhibited markedly weaker fluctuations in the mutant than in the WT (Figure S4C), implying that a more rigid structure contributes to the mutant’s improved resistance to high temperature and strong alkalinity. However, Figure S4D shows that the AlphaFold3-predicted structures of the WT and the mutant are quite similar.

      To unveil the origin of the change in structural flexibility, we computed the intramolecular interactions, such as salt bridges and hydrogen bonds, for both the WT and the mutant. We observed that the mutations increased the number of hydrogen bonds between the mutation sites and the rest of the protein (Figure S4E). However, the overall structure of the mutant did not change significantly, which is also evident from the solvent-accessible surface area (SASA) analysis (Figure S4F). We also analyzed changes in salt bridges and found that although residue 57 mutated to Histidine, no new salt bridges were formed. Additionally, RMSF results showed that residues 10, 108, and 118 became more rigid, but further analysis revealed no significant change in hydrogen bonds or other interactions in these regions. Overall, the MD results suggest that the additional hydrogen bonds introduced by the A57D;P29T mutations stabilize the protein, leading to the enhanced alkali resistance observed in the mutant. These results are now presented in Figure S4 and discussed in detail in the revised manuscript.

      Specifically, we have added the following discussion in the main text:

      “In order to gain deeper insights into the mechanisms by which the identified mutations enhance protein properties, we performed molecular dynamics (MD) simulations on the best alkali-resistant mutant. The simulation results revealed several key observations that help explain the observed improvements in protein stability and alkali resistance. As shown in Figure S4A, the two-point mutant A57D;P29T has a Tm increase of around 8 ℃ and a much stronger binding affinity than the WT. Our analysis of the MD trajectories indicates that the A57D;P29T mutant has a more rigid structure than the WT, as shown by its lower root mean squared deviation (RMSD) (Figure S4B). Furthermore, we calculated the root mean squared fluctuation (RMSF) for each residue and found that the mutant displayed less fluctuation at residue 29 but similar flexibility at residue 57. Interestingly, residues at positions 10, 108 and 118, which are spatially distant from residues 29 and 57, exhibited markedly weaker fluctuations in the mutant than in the WT (Figure S4C), implying that a more rigid structure contributes to the mutant’s improved resistance to high temperature and strong alkalinity. However, Figure S4D shows that the AlphaFold3-predicted structures of the WT and the mutant are quite similar. To unveil the origin of the change in structural flexibility, we computed the intramolecular interactions, such as salt bridges and hydrogen bonds, for both the WT and the mutant. We observed that the mutations increased the number of hydrogen bonds between the mutation sites and the rest of the protein (Figure S4E). However, the overall structure of the mutant did not change significantly, which is also evident from the solvent-accessible surface area (SASA) analysis (Figure S4F). We also analyzed changes in salt bridges and found that although residue 57 mutated to Histidine, no new salt bridges were formed.
Additionally, RMSF results showed that residues 10, 108, and 118 became more rigid, but further analysis revealed that there were no significant changes in hydrogen bonds or other interactions in these regions. Taken together, these findings suggest that the enhanced alkali resistance of the mutant is likely due to an overall increase in protein stability, rather than a dramatic change in its structural conformation. The MD simulation results, which are detailed in Figure S4, provide a deeper understanding of how specific mutations can improve protein properties and offer valuable insights for future protein engineering applications.”

      And we also included the following content in the SI:

      “Molecular Dynamics (MD) simulations

      The initial structures for molecular dynamics (MD) simulations of both the wild type and the mutant were predicted using AlphaFold3. To simulate experimental conditions, each protein was placed in a cubic water box containing 0.1 M NaCl. The CHARMM27 force field and the TIP4P water model were applied throughout the simulations. After an initial energy minimization of 50,000 steps, the systems were heated and equilibrated for 1 ns in the NVT ensemble at 300 K, followed by an additional 1 ns in the NPT ensemble at 1 atm. The production phase then involved 200-ns simulations with periodic boundary conditions, using a 2 fs integration time step. The LINCS algorithm was used to constrain covalent bonds involving hydrogen atoms, while Lennard-Jones interactions were cut off at 10 Å. Electrostatic interactions were computed with the particle mesh Ewald method, using a 10 Å cutoff and a grid spacing of approximately 1.6 Å with a fourth-order spline. Temperature and pressure were regulated by the velocity rescaling thermostat and the Parrinello-Rahman algorithm, respectively. All simulations were performed using the GROMACS 2020.4 software package. Both systems reached equilibrium according to the root mean squared deviation (RMSD) analysis.”
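
      For readers less familiar with the two trajectory metrics used in this analysis, the definitions can be sketched in a few lines of Python. This is an illustration of the formulas only, not the analysis code behind Figure S4; it assumes that all frames have already been superimposed on a common reference, and the function names and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two conformations given as lists of (x, y, z) tuples.

    Assumes both conformations contain the same atoms in the same order
    and have already been superimposed.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("conformations must be non-empty and equally sized")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def rmsf(trajectory):
    """Per-atom RMSF; trajectory[f][i] is the (x, y, z) of atom i in frame f."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean_pos = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared deviation from that average position.
        msf = sum(
            (frame[i][k] - mean_pos[k]) ** 2 for frame in trajectory for k in range(3)
        ) / n_frames
        result.append(math.sqrt(msf))
    return result


# Toy two-atom, two-frame trajectory: atom 0 is fixed, atom 1 moves along x.
toy = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]]
```

      RMSD compares one whole conformation with a reference, so it tracks global drift over time, whereas RMSF is computed per atom or residue over the whole trajectory, which is what allows statements about individual positions such as 29 or 57.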

      (4) Additionally, the authors claim that their method is efficient. However, the selected VHH is relatively short (<150 AA), resulting in lower computational costs. It remains unclear whether the computational cost of this approach would still be acceptable when designing larger proteins (>1000 AA). Besides, the design process involves a large number of prediction tasks, including the properties of both single-site saturation and multi-point mutants. The computational load is closely tied to the protein length and the number of mutation sites. Could the authors analyze the model's capability boundaries in this regard and discuss how scalable their approach is when dealing with larger proteins or more complex mutation tasks?

      In our prior work, we have demonstrated that our method is applicable to larger proteins as well [Jiang et al., Sci. Adv. 10, eadr2641 (2024)]. For instance, when engineering a protein with 1000 amino acids, inferring the fitness of one million mutants using the model on a single 4090 GPU takes approximately 20 hours. However, it remains infeasible to explore all possible mutations when designing multi-point mutants due to the vast space. To address this challenge, we propose the design of a reliable mutant library. In the first round of experiments, we used the model to score all single-point mutations, and then constructed the multi-point mutant library by combining experimentally tested single-point mutations. In this way, even when designing five-point mutants, we only need to score on the order of millions of mutants, making the inference process time-efficient and fully acceptable. As a result, the number of single-point mutations selected for combination into the multi-point mutant library becomes a crucial parameter that affects both inference time and scope. We limited the number of single-point mutations to between 30 and 50 to strike a balance between efficiency and accuracy.
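
      A rough back-of-the-envelope check of the library size and inference time discussed above can be sketched in Python. The counts are upper bounds, because they ignore that two substitutions at the same residue cannot be combined, and the throughput is simply the figure quoted above (about one million mutants in 20 hours for a 1000-residue protein on one GPU), assumed here to be constant. The function names are hypothetical.

```python
from math import comb


def library_size(n_single, max_order):
    """Upper bound on the number of 2..max_order-point combinations
    that can be drawn from n_single candidate single-point mutations."""
    return sum(comb(n_single, k) for k in range(2, max_order + 1))


def inference_hours(n_mutants, mutants_per_hour=1_000_000 / 20):
    """Estimated scoring time, assuming the quoted throughput holds."""
    return n_mutants / mutants_per_hour


five_point_30 = comb(30, 5)   # five-point combinations from 30 singles
five_point_50 = comb(50, 5)   # five-point combinations from 50 singles
```

      With 30 candidate mutations there are 142,506 five-point combinations and with 50 there are 2,118,760, which is consistent with the statement that the number of mutants to score stays on the order of millions.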

      These points are discussed in the revised manuscript. Specifically, we have added the following discussion to Section 2.2 of the main text:

      “Although the model inference is fast, it is not feasible to explore all possible mutations when designing multi-point mutants due to the exponential increase in the number of potential combinations. To manage this challenge, we constructed a mutant library based on a two-stage design process. In the first stage, we scored all single-point mutations using the model, and in the second stage, we combined experimentally validated single-point mutations to create the multi-point mutant library. This approach ensures that even when designing multi-point mutants (e.g., five-point mutants), the number of mutants to score remains in the millions, which is computationally efficient and practical. The number of single-point mutations selected for the multi-point mutant library is a key factor influencing both the computational load and the scope of the design space. To maintain a balance between efficiency and accuracy, we limited the number of single-point mutations to between 30 and 50. This strategic approach allows us to achieve both scalability and precision in our protein engineering tasks.”
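
      The two-stage process described above can be sketched as follows. This is a hedged illustration, not the Pro-PRIME interface: `score_fn` stands in for any model that returns a fitness score for a mutant label, and the helper names, the label format (taken from the "A57D;P29T" notation used in this response) and the rule of one substitution per position are assumptions of this example.

```python
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_point_mutants(sequence):
    """Stage 1 input: every single substitution, labelled like 'A57D' (1-based)."""
    for pos, wild_type in enumerate(sequence, start=1):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield f"{wild_type}{pos}{aa}"


def top_k(mutants, score_fn, k):
    """Keep the k highest-scoring mutants; score_fn is a hypothetical model call."""
    return sorted(mutants, key=score_fn, reverse=True)[:k]


def position(label):
    """Extract the residue number from a label such as 'A57D'."""
    return int(label[1:-1])


def multi_point_library(validated_singles, order):
    """Stage 2: combine experimentally validated singles, one per position."""
    for combo in combinations(validated_singles, order):
        if len({position(m) for m in combo}) == order:
            yield ";".join(combo)
```

      A sequence of length L gives 19 × L single-point candidates in the first stage, so the expensive combinatorial step is only ever run on the small validated subset.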

      Reviewer #2 (Public review):

      In this paper, the authors aim to explore whether an AI model trained on natural protein data can aid in designing proteins that are resistant to extreme environments. While this is an interesting attempt, the study's computational contributions are weak, and the design of the computational experiments appears arbitrary.

      The reviewer’s comments give us an opportunity to further state the novelty of this study. Although the AI model has been reported in our previous work [Sci. Adv. 10, eadr2641 (2024)], the unnatural physicochemical properties of proteins have, to the best of our knowledge, never been predicted using AI models. Our preceding work [Sci. Adv. 10, eadr2641 (2024)] demonstrated that the large language model can predict the performance of mutants in terms of thermostability, catalytic activity, binding affinity, etc. However, whether AI models are able to evaluate the unnatural properties of mutants remained unexplored. Our work shows that AI models trained on natural proteins can be used to design mutants that resist extreme conditions, such as strong alkalinity, substantially expanding the application of AI in bioengineering. Moreover, our design of the computational experiments was driven by the nature of the task and the availability of experimental data. We employed different strategies for designing single-point and multi-point mutants: a zero-shot approach for single-point mutations, to overcome the scarcity of data, and fine-tuning of the model for multi-point mutations, to leverage the experimental data on single-point mutations.

      (1) The writing throughout the paper is poor. This leaves the reader confused.

      The manuscript has been revised accordingly, and we would be happy to address any points that remain confusing.

      (2) The main technical issue the authors address is whether AI can identify protein mutations that adapt to extreme environments based solely on natural protein data. However, the introduction could be more concise and focused on the key points to better clarify the significance of this question.

      We thank the reviewer for this comment. We have revised the manuscript, particularly the introduction, where we focused on the research questions, methods, and main findings, while removing excessive background information to improve the manuscript’s conciseness and clarity.

      “Protein engineering, situated at the nexus of molecular biology, bioinformatics, and biotechnology, focuses on the design of proteins to introduce novel functionalities or enhance existing attributes[1-3]. With the exponential growth of biological data and computational power, protein engineering has experienced a significant shift towards advanced computational methodologies, particularly deep learning, to expedite the design process and unravel complex protein-function relationships[4-9]. However, a significant challenge in industrial protein engineering is designing proteins with inherent resistance to extreme conditions, such as high temperature and extreme pH environments (acidic or alkaline)[17, 18]. Unlike proteins in natural ecosystems, those used in industrial processes often encounter harsh physical and chemical conditions, necessitating exceptional resilience to maintain functionality[19, 20]. Previous efforts to enhance protein resistance have often relied on rational design and mutant library screening. These methods are typically labor-intensive and inefficient, and yield limited improvements[23-26]. Moreover, despite the industrial demand, proteins resilient to harsh environments are notably absent from the training datasets of Artificial Intelligence (AI) models. Exploring whether AI can achieve the evolution of protein resistance to extreme environments is crucial for broadening protein applications and improving modification efficiency.

      Recent advances in large-scale protein language models (LLMs) have enabled zero-shot predictions of protein mutants based on self-supervised learning from natural protein sequences. Although AI-guided protein design has been applied to predict mutants with greater thermostability and higher activity[34-36], it remains unexplored whether these models, which are based on natural protein information, can find mutants that adapt to unnatural extreme environments, such as an alkaline solution with a pH higher than 13.

      Here, we employed an LLM (large language model) developed by our group, the Pro-PRIME model[27], to predict dozens of mutants of a nano-antibody against growth hormone (a VHH antibody), and examined their fitness, including alkali resistance and thermostability, to evaluate their performance under extreme environments.

      We utilized the Pro-PRIME model to score saturated single-point mutations of the VHH in a zero-shot setting, and selected the top 45 mutants for experimental testing. Some mutants exhibited improved alkali resistance, while others demonstrated higher thermal stability or affinity. Subsequently, we fine-tuned the Pro-PRIME model to predict dozens of multi-point mutations. As a result, we obtained three multi-point mutants with enhanced alkali resistance, higher thermostability, and strong affinity to the target protein. In addition, the dynamic binding capacity of the selected mutant did not decline significantly after more than 100 cycles, making it suitable for practical application in industrial production. The selected mutant has been used in practical production and has lowered costs by over one million dollars in a year. To the best of our knowledge, this is the first protein product developed by an LLM that has been successfully applied in mass production. Because the Pro-PRIME model can predict multi-point mutations precisely from a small amount of experimental data, our two-round design process involved experimental validation of only 65 mutants in two months, demonstrating remarkably high efficiency. Furthermore, we performed a systematic analysis of these findings and determined that the model can yield more valuable predictive outcomes while remaining consistent with rational design principles. Specifically, within the framework of multi-point combinations, the model's incorporation of negative single-point mutations into the combinatorial space led to exceptional results, showcasing its capacity to capture epistatic interactions. Notably, in striving for the global optimum, deep learning methods offer distinct advantages over traditional rational design approaches.”

      (3) The authors did not develop a new model but instead used their previously developed Pro-PRIME model. This significantly weakens the novelty and contribution of this work.

      While it is true that the Pro-PRIME model was previously developed, the contribution of this work lies in its novel application to the design of proteins with properties that are absent or rare in nature. In our original work, the Pro-PRIME model was used to optimize proteins for existing, well-established properties, such as thermal stability, enzymatic activity, and affinity. In this study, however, we extended the model’s capabilities to design proteins that exhibit resilience to extreme environments, such as high pH, a property that is not inherently present in most natural proteins. To our knowledge, no existing model has addressed the challenge of engineering alkali-resistant proteins, nor is there a relevant dataset available for training such models.

      This shift from optimizing existing characteristics to engineering entirely new properties represents a significant step forward in the field of protein design. By focusing on the design of proteins that can survive and function in harsh, unnatural environments, we have demonstrated the broader applicability of the Pro-PRIME model beyond its initial scope. This expansion of the model's application is a novel contribution that has the potential to accelerate the development of proteins for industrial, agricultural, and biotechnological applications.

      Thus, while the Pro-PRIME model itself is not new, its application to the new challenge of engineering proteins with alkali resistance and other novel properties significantly enhances the impact and novelty of this work. Moreover, this work is groundbreaking not only in terms of the model’s novel application but also because no previous studies have specifically targeted alkali resistance or provided data for training models on such extreme properties. Therefore, our approach is unique, marking a new direction in protein engineering.

      We have made the following revisions to the conclusions section of the manuscript:

      “Through two rounds of evolution, we successfully designed a VHH antibody with strong resistance to extreme environments and enhanced affinity using the Pro-PRIME model. Although rare case can tolerate the extreme pH and saline conditions in our pre-training dataset, the Pro-PRIME model showed impressive performance after supervised learning with limited data, especially on capturing the epistatic effects. The analysis of these 65 mutants revealed that the Pro-PRIME model is adept at exploring the large space of protein fitness, being less susceptible to local optima, and having greater potential to find the global optimum. Our efficient method of designing mutants that consider multiple properties improvement holds promise for industrial application of proteins. Specifically, the VHH antibody has been deployed in practical production and significantly enhancing the efficiency of the entire production line after our design. While the Pro-PRIME model itself has been reported, this work demonstrates its first-time application to the challenge of designing proteins with alkali resistance and other extreme properties that are not found in natural proteins, nor have previous studies addressed or provided data for such applications. This shift from optimizing existing protein properties to engineering entirely new, unnatural traits is a significant advance in the field. This study shows that the AI models, such as Pro-PRIME, can not only guide the evolution of protein thermal stability, enzymatic activity, ligand affinity, etc., but also enable to develop the mutants adapting the harsh unnatural environments, such as extreme pH and concentrated salt, largely expanding its application. The novelty of this work lies in the ability to design and engineer proteins with novel properties, specifically alkali resistance, which is an unprecedented achievement in AI-assisted protein engineering. 
The great potential of AI model is expected to significantly accelerate the development of proteins for diverse applications in medicine, agriculture, bioengineering, etc.”

      (4) The computational experiments are not well-justified. For instance, the authors used a zero-shot setting for single-point mutation experiments but opted for fine-tuning in multiple-point mutation experiments. There is no clear explanation for this discrepancy. How does the model perform in zero-shot settings for multiple-point mutations? How would fine-tuning affect single-point mutation results? The choice of these strategies seems arbitrary and lacks sufficient discussion.

      We appreciate the reviewer’s comment regarding the use of zero-shot and fine-tuning settings for single-point and multi-point mutation experiments, and we are grateful for the opportunity to further clarify this aspect of our work.

      In the first round of design, we used the zero-shot approach for single-point mutations because the number of possible single-point mutations is limited, and no prior experimental data was available. In the absence of relevant data, the zero-shot approach allows the model to make predictions based on the learned sequence patterns from the pre-trained protein language model. Given that single-point mutations are relatively fewer in number and computationally feasible to evaluate, the zero-shot approach was deemed appropriate for this task.

      However, when it comes to designing multi-point mutants, the number of potential combinations increases exponentially, making it computationally impractical to explore all possible mutations in a reasonable timeframe. Furthermore, since we had already obtained some experimental data for single-point mutations in the first round, we fine-tuned the model with this data in the second round to improve the accuracy of predictions for multi-point mutants. Fine-tuning helps the model better capture the specific features that contribute to protein functionality, which are critical when dealing with multi-point mutations where multiple residues interact. This allows the model to produce more reliable and targeted predictions for multi-point mutants, ultimately leading to better design outcomes.
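      The scaling argument above can be made concrete with a short count. This is only a sketch: the sequence length of 120 residues is an assumed, typical VHH length chosen for illustration, not a figure taken from the manuscript, and `count_mutants` is a hypothetical helper name.

```python
from math import comb

AMINO_ACIDS = 20  # the standard amino acid alphabet


def count_mutants(seq_len: int, n_sites: int) -> int:
    """Number of distinct variants with exactly n_sites substituted positions."""
    # Choose which positions are mutated, then pick one of the 19
    # alternative residues at each chosen position.
    return comb(seq_len, n_sites) * (AMINO_ACIDS - 1) ** n_sites


if __name__ == "__main__":
    seq_len = 120  # assumed length, for illustration only
    for k in range(1, 5):
        print(f"{k}-point mutants: {count_mutants(seq_len, k):,}")
```

      Under this assumption, the single-point space is small enough to score exhaustively, whereas the space grows by orders of magnitude with each additional mutated site.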

      Regarding the model's performance in zero-shot settings for multi-point mutations, we tested this approach, and the results did not align well with the experimental data for multi-point mutants. Specifically, the Spearman correlation coefficient between the zero-shot predictions and experimental results was -0.71, indicating that zero-shot predictions for multi-point mutations were not as accurate as those from the fine-tuned model.
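      For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a Spearman rank correlation between predicted scores and measured values. The input numbers are made up for illustration and are not data from this study; the function names are hypothetical. A coefficient near -1 means the predicted ranking is close to the reverse of the measured ranking.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical model scores and hypothetical measurements.
    predicted = [0.8, 0.6, 0.5, 0.3, 0.1]
    measured = [12.0, 30.0, 25.0, 41.0, 38.0]
    print(f"Spearman rho = {spearman(predicted, measured):.2f}")
```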

      In summary, the choice of using zero-shot for single-point mutations and fine-tuning for multi-point mutations was driven by the nature of the task and the availability of experimental data. Fine-tuning the model improves its predictive performance, especially for more complex multi-point mutation tasks. We have now clarified these choices in the manuscript and have added further discussion on the trade-offs between zero-shot and fine-tuning approaches.

      Specifically, we have added the following discussion to Section 2.2 of the main text:

      “Note that we employed different strategies for designing single-point and multi-point mutants, specifically using a zero-shot approach for single-point mutations and fine-tuning the model for multi-point mutations. These choices were made based on the distinct characteristics of the two tasks and the availability of experimental data. For single-point mutations, the number of possible mutations is relatively limited, and at the outset, there were no experimental data available. In such cases, the zero-shot setting was chosen because it allows the model to predict the fitness of mutants based solely on the information learned during pre-training on a large protein sequence dataset. Since single-point mutations are computationally manageable, this approach was deemed appropriate to generate initial predictions for protein engineering. However, when designing multi-point mutants, the situation changes significantly. The potential combinations of mutations increase exponentially, and without prior data, it becomes computationally infeasible to evaluate every possible combination within a reasonable timeframe. Moreover, by the time we reached the multi-point mutation design stage, experimental data for several single-point mutations had already been obtained. This data enabled us to fine-tune the model to better capture the specific structural and functional features that contribute to protein stability and resistance, especially in the context of multiple interacting mutations. Fine-tuning improves the model’s accuracy by adjusting its parameters to align more closely with the experimental data, ensuring that the predicted multi-point mutants are more likely to meet the desired engineering goals. After the second round of design, the fitness of the mutants was further improved. 
In improving alkali resistance, experimental results showed that 15 of the 45 designed mutants exhibited positive responses, yielding a success rate of 30%, close to the 35% success rate achieved in the second round. Compared to the wild type, the best single-point mutant improved alkali resistance by approximately 44.7%, while the best multi-point mutant achieved a 67.7% increase. For thermal stability enhancement, the success rate in the first round was 77.8%, rising to 100% in the second round. The top single-point mutant exhibited a Tm increase of 6.37°C over the wild type, while the best multi-point mutant had a Tm increase of 10.02°C. We also tested the performance of the zero-shot approach for multi-point mutants, and the results showed that this method did not yield satisfactory predictions. The Spearman correlation coefficient between the zero-shot predictions and experimental results for multi-point mutants was -0.71, indicating a significant discrepancy. This further highlights the importance of fine-tuning the model for multi-point mutations, as the fine-tuned model provided more accurate and reliable results. In summary, the choice of zero-shot for single-point mutations and fine-tuning for multi-point mutations was driven by practical considerations regarding computational feasibility and the availability of experimental data. Fine-tuning the model significantly enhances its predictive performance, particularly for complex multi-point mutations where multiple residues interact. We believe this strategy strikes an optimal balance between computational efficiency and predictive accuracy, making it well-suited for practical protein engineering applications.”

    1. Author response:

      We would like to thank the reviewers and the editors for carefully reading and commenting on our manuscript, and we plan to prepare a revised manuscript. In particular, we want to thank reviewer 2 for spotting a major oversight regarding the use of the TKO (TRiP-CRISPR knockout) and TOE (TRiP-CRISPR Over Expression) systems and the MiMIC alleles. As the reviewer pointed out, these lines were not used as intended; therefore, our results and conclusions regarding the genetic interactions between Pink1 and several of the genes in the paper that we attempted to target (PIG-A, Rab7, Ccz1, CG10646, Mon1, FASN2, CG17712) are incorrect and based on a technical mistake. These results need to be removed from the manuscript.

    1. Author response:

      Reviewer 1:

      Summary: This work presents an Interpretable protein-DNA Energy Associative (IDEA) model for predicting binding sites and affinities of DNA-binding proteins. Experimental results demonstrate that such an energy model can predict DNA recognition sites and their binding strengths across various protein families and can capture the absolute protein-DNA binding free energies.

      We appreciate the reviewer’s careful assessment of the paper, and we thank the reviewer for the insightful suggestions and comments.

      Strengths:

      (1) The IDEA model integrates both structural and sequence information, although such an integration is not completely original. (2) The IDEA predictions seem to have agreement with experimental data such as ChIP-seq measurements.

      We appreciate the reviewer’s comments on the strength of the paper.

      Weaknesses:

      (1) The authors claim that the binding free energy calculated by IDEA, trained using one MAX-DNA complex, correlates well with experimentally measured MAX-DNA binding free energy (Figure 2) based on the reported Pearson Correlation of 0.67. However, the scatter plot in Figure 2A exhibits distinct clustering of the points and thus the linear fit to the data (red line) may not be ideal. As such, the use of the Pearson correlation coefficient, which measures linear correlation between two sets of data, may not be appropriate and may provide misleading results for non-linear relationships.

      We thank the reviewer for the insightful comments and agree that the linear fit between our predictions and the experimental data may not be ideal. The primary utility of the IDEA model is for assessing the relative binding affinities of different DNA sequences. To further support this, we plan to conduct additional statistical analyses that are independent of the linear correlation assumption but instead focus on the ranked order of DNA sequence binding affinities.

      (2) In the same vein, the linear Pearson Correlation analysis performed in Figure 5A and the conclusion drawn may be misleading.

      We thank the reviewer for the insightful comments. We will perform the same analysis for Figure 5A as detailed in our response to the previous comments.

      (3) The authors included the sequences of the protein and DNA residues that form close contacts in the structure in the training dataset, whereas a series of synthetic decoy sequences were generated by randomizing the contacting residues in both the protein and DNA sequences. In particular, synthetic decoy binders were generated by randomizing either the DNA (1000 sequences) or protein sequences (10,000 sequences) from the strong binders. However, the justification for such randomization and how it might impact the model’s generalizability and transferability remain unclear.

      We thank the reviewer for the insightful comments. We will perform additional analyses to assess the robustness of our model predictions with respect to the number of randomized decoys. Additionally, we will examine how randomization would potentially affect the model’s generalizability and transferability.

      (4) The authors performed Receiver Operating Characteristic (ROC) analysis and reported the Area Under the Curve (AUC) scores in order to quantitate the successful identification of the strong binders by IDEA. It would be beneficial to analyze the precision-recall (PR) curve and report the PRAUC metric which could be more robust.

      We agree with Reviewer 1 that more statistical metrics should be used to evaluate our model’s performance. We will include a more robust approach, such as PRAUC, to evaluate our model.
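      To illustrate why the two metrics can diverge when strong binders are rare, here is a minimal sketch that computes ROC AUC and average precision (a common summary of the precision-recall curve) from labels and scores. The labels and scores are invented for illustration and the function names are hypothetical; this is not the evaluation code used for IDEA.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a positive outscores a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Count positive-negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each positive.

    Tied scores are ordered arbitrarily in this simple version.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


if __name__ == "__main__":
    # Two hypothetical binders among ten candidate sequences.
    labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    print(f"ROC AUC           = {roc_auc(labels, scores):.4f}")
    print(f"Average precision = {average_precision(labels, scores):.4f}")
```

      In this toy case a single highly ranked decoy barely moves the ROC AUC but lowers the average precision noticeably, which is the sensitivity the reviewer is asking for.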

      Reviewer 2:

      Summary:

      Zhang et al. present a methodology to model protein-DNA interactions via learning an optimizable energy model, taking into account a representative bound structure for the system and binding data. The methodology is sound and interesting. They apply this model for predicting binding affinity data and binding sites in vivo. However, the manuscript lacks discussion of/comparison with state-of-the-art and evidence of broad applicability. The interpretability aspect is weak, yet over-emphasized.

      We appreciate the reviewer’s excellent summary of the paper, and we thank the reviewer for the insightful suggestions and comments.

      Strengths:

      The manuscript is well organized with good visualizations and is easy to follow. The methodology is discussed in detail. The IDEA energy model seems like an interesting way to study a protein-DNA system in the context of a given structure and binding data. The authors show that an IDEA model trained on one system can be transferred to other structurally similar systems. The authors show good performance in discriminating between binding-vs-decoy sequences for various systems, and binding affinity prediction. The authors also show evidence of the ability to predict genome-wide binding sites.

      We appreciate the reviewer’s strong assessment of the strengths of this paper.

      Weaknesses:

      An energy-based model that needs to be optimized for specific systems is inherently an uncomfortable idea. Is this kind of energy model superior to something like Rosetta-based energy models, which are generally applicable? Or is it superior to family-specific knowledge-based models? It is not clear.

      We thank the reviewer for the insightful comments. We will include predictions by generic protein-DNA energy models, such as the Rosetta-based energy model or family-specific knowledge-based model, to compare with our model performance.

      Prediction of binding affinity is a well-studied domain and many competitors exist, some of which are well-used. However, no quantitative comparison to such methods is presented. To understand the scope of the presented method, IDEA, the authors should discuss/compare with such methods (e.g. PMID 35606422).

      We thank the reviewer for the insightful comments. In our initial submission, Figure S5 presents a comparison between our model’s prediction and those of an existing method using 10-fold cross-validation. We agree a more comprehensive comparison with other methods is needed and will include a discussion and comparison of the IDEA model’s performance with additional state-of-the-art models.

      The term “interpretable” has been used lavishly in the manuscript while providing little evidence on the matter. The only evidence shown is the family-specific residue-nucleotide interaction/energy matrix and speculations on how these values are biologically sensible. Recent works already present more biophysical, fine-grained, and sometimes family-independent interpretability (e.g. PMID 39103447, 36656856, 38352411, etc.). The authors should put into context the scope of the interpretability of IDEA among such works.

      We agree that “interpretability” should be discussed in a relevant context. We will discuss the scope of IDEA's interpretability within the context of recent works, including those suggested by the reviewer.

      The manuscript disregards subtle yet important differences in commonly used terminology in the field. For example, the authors use the terms “specificity” and “affinity” almost interchangeably (for example, the caption for Figure 3A uses “specificity” although the Methods text describes the prediction as about “affinity”). If the authors are looking to predict specificity, IDEA needs to be put in the context of the corresponding state-of-the-art (PMID 36123148, 39103447, 38867914, 36124796, etc).

      We really appreciate the reviewer for pointing out our conflation of “specificity” and “affinity” in the manuscript. To clarify, IDEA’s primary function is to predict the binding affinities of protein-DNA pairs in a sequence-specific manner. The acquired binding affinities of target DNA sequences can then be used to assess the specific binding motifs. We will revise our text to clarify this point.

      It is not clear how much the learned energy model is dependent on the structural model used for a specific system/family. It would be interesting to see the differences in learned model based on different representative PDB structures used. Similarly, the supplementary figures show a lack of discriminative power for proteins like PDX1 (homeodomain family), POU, etc. Can the authors shed some light on why such different performances?

      We thank the reviewer for the insightful comments and agree that the family-specific energy model could provide insight into the model predictions. We will examine different energy models based on the protein family, and especially investigate whether they can explain the lack of discriminative power for certain proteins.

      It is also not clear if IDEA’s prediction for reverse complement sequences is the same for a given sequence. If so, how is this property being modelled? Either this description is lacking or I missed it.

      We thank the reviewer for the insightful comments. The IDEA model treats reverse complementary sequences separately. We will provide additional details on how these sequences are modeled.
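      As background for this point, the sketch below shows how a reverse complement is computed and how one can check whether a site is its own reverse complement, in which case scoring the two strands separately cannot give different answers. It is a generic illustration with hypothetical function names and does not represent how IDEA models the two strands.

```python
# Watson-Crick pairing for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(seq: str) -> bool:
    """True if the sequence reads the same on both strands."""
    return seq == reverse_complement(seq)


if __name__ == "__main__":
    for site in ("CACGTG", "CACGTT"):
        rc = reverse_complement(site)
        print(f"{site} -> {rc} (palindromic: {is_palindromic(site)})")
```

      The canonical E-box CACGTG is palindromic in this sense, so strand treatment only matters for flanking bases or for non-palindromic sites.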

      Reviewer 3:

      Summary:

      Protein-DNA interactions and sequence readout represent a challenging and rapidly evolving field of study. Recognizing the complexity of this task, the authors have developed a compact and elegant model. They have applied well-established approaches to address a difficult problem, effectively enhancing the information extracted from sparse contact maps by integrating an artificial decoy sequence set and available experimental data. This has resulted in the creation of a practical tool that can be adapted for use with other proteins.

      We appreciate the reviewer’s excellent summary of the paper, and we thank the reviewer for the insightful suggestions and comments.

      Strengths:

      (1) The authors integrate sparse information with available experimental data to construct a model whose utility extends beyond the limited set of structures used for training. (2) A comprehensive methods section is included, ensuring that the work can be reproduced. Additionally, the authors have shared their model as a GitHub project, reflecting their commitment to transparency of research.

      We appreciate the reviewer’s strong assessment of the strengths of this paper.

      Weaknesses:

      (1) The coarse-graining procedure appears artificial, if not confusing, given that full-atom crystal structures provide more detailed information about residue-residue contacts. While the selection procedure for distance threshold values is explained, the overall motivation for adopting this approach remains unclear. Furthermore, since this model is later employed as an empirical potential for molecular modeling, the use of P and C5 atoms raises concerns, as the interactions in 3SPN are modeled between C<sub>α</sub> and the nucleic base, represented by its center of mass rather than P or C5 atoms.

      We appreciate the reviewer’s insightful comments. The selection of P and C5 atoms will augment our model prediction, but the prediction is robust without this selection scheme. We will provide more details on the motivation behind this selection.

      Regarding the simulation model, we acknowledge a potential disconnection between the coarse-grained level of the 3SPN model (3 coarse-grained sites per nucleotide) and the data-driven model (1 coarse-grained site per nucleotide). The selection of nucleic bases for molecular interactions in the 3SPN model follows the PI’s previous work [PMID: 34057467] and its code implementation. We will test the simulation model by incorporating interactions between C<sub>α</sub> and P atoms. In the future, we will work on implementing IDEA model output for 1-bead-per-nucleotide DNA simulation models.

      (2) Although the authors use a standard set of metrics to assess model quality and predictive power, some ∆∆G predictions compared to MITOMI-derived ∆∆G values appear nonlinear, which casts doubt on the interpretation of the correlation coefficient.

      We thank the reviewer for the insightful comments and agree that the linear fit between our model’s prediction and the experimental data may not be ideal. The primary utility of the IDEA model is for assessing the relative binding affinities of different DNA sequences. To this end, we plan to perform additional statistical analyses that are independent of the linear correlation assumption but instead focus on the ranked order of DNA sequence binding affinities.

      (3) The discussion section lacks information about the model’s limitations and a comprehensive comparison with other models. Additionally, differences in model performance across various proteins and their respective predictive powers are not addressed.

      We thank the reviewer for the insightful comments and will compare the performance of the IDEA model with state-of-the-art methods. We will also perform detailed analyses of the learned energy models across different proteins and examine their correlation with the model’s predictive powers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Hahn et al use bystander BRET, NanoBiT assays, and APEX2 proteomics to investigate endosomal signaling of CCR7 by two agonists, CCL19 and CCL21. The authors suggest that CCR7 signals from early endosomes following internalisation. They use spatial proteomics to try to identify novel interacting partners that may facilitate this signaling and use this data to specifically enhance a Rac1 signaling pathway. Many of the results in the first few figures showing simultaneous recruitment of Barr and G proteins by CCR7 have been shown previously (Laufer et al, 2019, Cell Reports), as has signaling from endomembranes, and Rac1 activation at intracellular sites. The new findings are the APEX2 proteomics studies, which could be useful to the scientific community. Unfortunately, the authors only follow up on a single finding, and the expansion of this section would improve the manuscript.

      First of all, we would like to thank the reviewer for helping with the manuscript. The summary is mostly accurate except for the statement that simultaneous recruitment of barr and G protein to CCR7 has been shown before. It should also be noted that neither the activation of G proteins by CCR7 from endosomes nor the functional role of this signaling mechanism has been demonstrated previously. However, as the reviewer correctly points out, the Laufer et al. study did demonstrate that CCR7 activity at endomembranes is associated with Rac1 signaling.

      Strengths:

      (1) The APEX2 resource will be valuable to the GPCR and immunology community. It offers many opportunities to follow up on findings and discover new biology. The resource could also be used to validate earlier findings in the current manuscript and in previous manuscripts. Was there enrichment of early endosomal markers, Barr and Gi as this would provide further evidence for their earlier claims regarding endosomal signaling? Previous studies have suggested signaling from the TGN, so it is possible that the different ligands also direct to different sites. This could easily be investigated using the APEX2 data.

      Thank you for your comment. We do in fact observe enrichment of TGN/Golgi markers in response to chemokine stimulation, which we now have highlighted in the manuscript (fourth paragraph on page 7).

      (2) The results section is well written and can be followed very easily by the reader.

      We are glad that the reviewer found the results section very readable.

      (3) Some findings verify previous studies (e.g. endomembrane signalling). This should be acknowledged as this shows the validity of the findings of both studies.

      This is correct. We have now included more discussion of previous work related to CCR7 signaling at endomembranes (third paragraph on page 10).

      Weaknesses:

      (1) The findings are interesting although the studies are almost all performed in HEK293 cells. I understand that these are commonly used in GPCR biology and are easy to transfect and don't express many GPCRs at high concentrations, but their use is still odd when there are many cell-lines available that express CCR7 and are more reflective of the endogenous state (e.g. they are polarised, they can perform chemotaxis/ migration). Some of the findings within the study should also be verified in more physiologically relevant cells. At the moment only the final figure looks at this, but findings need to be verified elsewhere.

      We thank the reviewer for raising this point and giving us an opportunity to elaborate in further detail. The major goal of our study was to investigate whether CCR7 activates G protein from endosomes, the underlying mechanism, and functions of this potential signaling mechanism. The reason we chose CCR7 as our model receptor was that it belongs to a group of GPCRs, the chemokine receptors, that most often have features associated with the ability to promote endosomal G protein activation (phosphorylation site clusters in the C-terminal region).

      Specific detection of G protein activation at distinct subcellular compartments is currently very challenging in truly endogenous systems despite the new, innovative biosensors that are available (this applies to GPCRs in general, not just CCR7). To our knowledge, most if not all studies that detect direct activation of G protein at a specific compartment, whether at the plasma membrane, endosome, Golgi, or other compartments, have overexpressed either the receptor, G protein, or both. This is why we chose the HEK293 cell system, which is easy to manipulate, for most of our experiments. That being said, we did confirm major findings in an indirect manner using Jurkat T-cells, which express CCR7 endogenously and are physiologically relevant. Our hope is that in the future we will be able to use highly sensitive biosensors to directly confirm our findings in such a cell system, as the reviewer wisely suggests.

      (2) The authors acknowledge that the kinetic patterns of the signals at the early endosome are not consistent with the rates of internalisation. They mention that this could be due to trafficking elsewhere. This could be easily looked at in their APEX2 data. Is there evidence of proximity to markers of other membranes? Perhaps this could be added to the discussion. Similarly, previous studies have shown that CCR7 signaling may involve the TGN. Was there enrichment of these markers? If not, this could also be an interesting finding and should be discussed. It is also possible that the Rab5 reporter is just not as efficient as the trafficking one, especially as in later figures the very convincing differences in the two ligands are not as robust as the differences in trafficking.

      Excellent point. We have now highlighted the possibility of CCR7 being further trafficked to the trans-Golgi network (TGN) as a possible explanation for the transient translocation of activated CCR7 to the early endosome in Fig. 1G-H (second paragraph on page 3).

      Furthermore, in the APEX2 experiment we observe enrichment of proteins involved in lysosomal trafficking (LAMP1, VPS16, VAMP7, WDR91, and PP4P1) by CCL19 stimulation at 25 min, and recycling endosomes/TGN markers (SNX6, RAB7L, and GGA) by CCL21 stimulation at 25 min. In addition to this, several markers of TGN/Golgi (SNX3, COG5, YIF1A, SC22B, and AP3S1) were enriched as well in response to both CCL19 and CCL21 stimulation. We have now included a statement in the manuscript, which describes the likely trafficking of CCR7 to the TGN/Golgi in response to CCL19 and CCL21 stimulation (fourth paragraph on page 7).

      (3) In the final sentence of paragraph 2 of the results the authors state that the internalisation is specific to CCR7 as there isn't recruitment to V2R. I'm not sure this is the best control. The authors can only really say it doesn't recruit to unrelated receptors. The authors could have used a different chemokine receptor which does not respond to these ligands to show this.

      The point with this control experiment was to demonstrate that the loss of NanoBiT signal in response to CCL19 in CCR7-SmBiT/LgBiT-CAAX expressing cells, but not in V2R-SmBiT/LgBiT-CAAX expressing cells, was a result of bona fide CCR7 internalization rather than potential artifactual effects of CCL19 on the NanoBiT system. Our intent was not to demonstrate specificity of CCL19 among chemokine receptors, which already has been thoroughly tested in previous studies. We have now modified the sentence (second paragraph on page 3) “Moreover, CCL19/CCL21-stimulation of receptor internalization to endosomes is specific to CCR7 as none of the chemokines promote internalization or trafficking to endosomes of the vasopressin type 2 receptor (V<sub>2</sub>R)-SmBiT construct (Fig. S1E-F)” to “Moreover, CCL19/CCL21-stimulation did not promote internalization or trafficking to endosomes of the vasopressin type 2 receptor (V<sub>2</sub>R)-SmBiT construct, which validates that these chemokines act specifically via the CCR7-SmBiT system (Fig. S1E-F).”

      (4) The miniGi-Barr1 and imaging showing co-localisation could be more convincing if it was also repeated in a more physiological cell line as in the final figure. Imaging of CCR7, miniGi, and Barr1 would also provide further evidence that the receptor is also present within the complex.

      We agree with the reviewer’s assessment. However, as mentioned above, it is currently extremely challenging to detect endogenous G protein coupling to, or activation by, endogenous receptors. In addition, we are not sure that overexpressing fluorophore-tagged receptor, miniG, and barr1 in a physiologically relevant cell line would provide truly physiological conditions, as the expression of these proteins would still be artificially high. This is why we chose to conduct these mechanistic experiments in HEK293 cells and then indirectly verify key findings in an endogenous and physiologically relevant cell line.

      (5) The findings regarding Rac1 are interesting, although an earlier paper found similar results (Laufer et al, 2019, Cell Reports), so perhaps following up on another APEX2-identified protein pathway would have been more interesting. The authors' statement that Rac1 is specifically activated, and RhoA and Cdc42 are not, is unconvincing from the current data. Only a single NanoBiT assay was used, and as raw values are not reported it is difficult for the reader to glean some essential information. The authors should show evidence that these reporters work well for other receptors (or cite previous studies) and also need evidence from an independent (i.e. non-NanoBiT or BRET) assay.

      The major focus of the study was to investigate whether CCR7 can activate G protein after having been internalized into endosomes via formation of CCR7-Gi/o-barr megaplexes, and to dissect potential functions of this endosomal G protein signaling. To do this, we used CCL19 and CCL21, which stimulate G protein to the same extent but differ in their ability to promote barr recruitment and receptor internalization, with CCL19 being superior to CCL21. We found that CCL19 also promotes endosomal G protein activation to a greater extent than CCL21, and therefore we specifically looked for proteins enriched by CCL19 in our APEX experiment. This led us to some Rho GTPase regulators that were differentially enriched by CCL19 and CCL21. We agree that other interesting effectors related to CCR7 biology were identified in the APEX experiment, such as EYA2, GRIP2, and EI24. However, those proteins were enriched similarly by CCL19 and CCL21 challenge, and thus do not seem to be activated specifically at endosomes. Following the same argument, we did not observe any difference in the activity of RhoA or Cdc42 when stimulated with CCL19 or CCL21, so we cannot conclude that these signaling proteins are activated specifically in endosomes. On the other hand, Rac1 was stimulated to a larger degree by CCL19 than by CCL21, and its activity was inhibited by the Gi/o inhibitor PTX and the endocytosis inhibitors Dyngo-4a and PitStop2. CCR7-mediated Rac1 signaling was also inhibited by expression of a dominant negative dynamin mutant that inhibits receptor internalization, and Rac1 was not activated by an internalization-deficient CCR7-DS/T mutant. Finally, the involvement of Rac1 in CCR7-mediated chemotaxis of Jurkat T cells was also demonstrated. We believe that these findings together provide a strong basis for the claim that endosomal Gi/o protein signaling by CCR7 activates Rac1.

      Following the reviewer’s suggestion, we have now included experiments to show that the activation of RhoA, Rac1, and Cdc42 by CXCR4 also can be detected by the NanoBiT biosensors (Fig. S7D-F). We have also added the appropriate references to the original studies where these biosensors were developed in the results section (first paragraph on page 8).

      (6) At present, the studies in Figure 7 do not go beyond those in the previous Laufer et al study in which they showed blocking endocytosis affected Rac1 signalling. The authors could show that Rac1 signalling is from early endosomes to improve this, otherwise, it could be from the TGN as previously reported.

      The major purpose of Figure 7 was to indirectly confirm findings from the HEK293 cell experiments and to tie them to physiological functions. Our experiments using Jurkat T cells show that CCL19 promotes a stronger chemotactic response than CCL21 despite a similar Gi/o response. In addition, we showed that CCR7-mediated Gi/o activation, receptor endocytosis, and Rac1 activity are all required to drive chemotaxis. The Laufer et al. study did not investigate whether CCR7 activates G protein after having been internalized into endosomes via formation of CCR7-Gi/o-barr megaplexes, and thus did not focus on functional outcomes of this signaling mechanism. Based on this, we believe our work provides new and valuable knowledge to the field.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript describes a comprehensive analysis of signalling downstream of the chemokine receptor CCR7. A comprehensive dataset supports the authors' hypothesis that G protein and beta-arrestin signalling can occur simultaneously at CCR7 with implications for continued signalling following receptor endocytosis.

      We would like to thank the reviewer for helping with the manuscript. We agree on all points made and have now updated the manuscript accordingly.

      Strengths:

      The experiments are well controlled and executed, employing a wide range of assays using - in the main - CCR7 transfectants. Data are well presented, with the authors' claims supported by the data. The paper also has an excellent narrative which makes it relatively easy to follow. I think this would certainly be of interest to the readership of the journal.

      We appreciate the positive assessment of strengths.

      Weaknesses:

      Since the authors show a differential enrichment of RhoGTPases by CCR7 stimulation with CCL19 versus CCL21, I think that they also need to show that the Gi/o coupling of HEK-292-CCR7-APEX2 cells to both CCL19 and CCL21 is not perturbed by the modification. Currently, the authors only show data for CCL19 signalling, which leaves the potential for a false negative finding in terms of CCL21 signalling being selectively impaired. This should be relatively easy to do and should strengthen the authors' conclusions.

      We agree with the reviewer and have now included experiments to show that both CCL19- and CCL21-mediated CCR7-APEX2 stimulation leads to Gi/o activation (Fig. S4C). In addition, our proteomics experiments show strong effects of both CCL19 and CCL21 stimulation, which suggest that the receptor is activated by both ligands.

      The authors conclude the discussion by suggesting that their findings highlight endosomal signalling as a general mechanism for chemokine receptors in cell migration. I think this is an overreach. The authors chose several studies of CXC chemokine receptors to support their argument that C-terminal truncation or mutation of the C-terminal phosphorylation sites impairs endocytosis and chemotaxis (refs 40-42). However, in some instances e.g. at the related chemokine receptor CCR4, C-terminal removal of these sites impairs endocytosis but promotes chemotaxis (Nakagawa et al, 2014; Anderson et al, 2020). I therefore think that either the final statement needs to be tempered down or the counterargument discussed a little.

      We appreciate the reviewer highlighting this point. We have now modified the concluding sentence from “Thus, the findings from our study highlight endosomal G protein signaling by chemokine receptors as a potential general mechanism that regulates key aspects of cell migration” to “Thus, the findings from our study highlight endosomal G protein signaling by some chemokine receptors as a potential mechanism that regulates key aspects of cell migration.” We hope that the tone of this sentence is now more appropriate.

      References:

      Anderson, C. A. et al. A degradatory fate for CCR4 suggests a primary role in Th2 inflammation. J Leukocyte Biol 107, 455-466 (2020).

      Nakagawa, M. et al. Gain-of-function CCR4 mutations in adult T cell leukaemia/lymphoma. Journal of Experimental Medicine 211, 2497-2505 (2014).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The results section is well written, although the introduction needs more information on what is known about CCR7 trafficking and endomembrane signaling. I understand this is because the authors wanted to focus on GPCR signaling, but the study will equally be of interest to researchers in the immunology and chemokine fields, and therefore more CCR7-focussed discussion in the introduction would be useful. Similarly, the discussion would benefit from more discussion of previous studies of CCR7 trafficking and endomembrane signaling (in particular the Laufer et al paper) to acknowledge that many of the findings within this paper verify previous studies.

      We have now included additional immunology/endomembrane background information about CCR7 at the place where the receptor is introduced (first paragraph on page 3). We have also expanded our discussion of our work in relation to the Laufer et al. study (third paragraph on page 10).

      (2) On page 5, the authors state that 'The response to chemokine stimulation was not observed in mock transfected HEK293 cells'. Figure S4D does not have a legend so it is difficult to see what they mean by mock transfected. Do they mean not transfecting with anything or not with the receptor? The better control would be transfecting the reporters but not the receptor. This may have been done, but the wording needs clarifying and S4D needs a legend.

      Thanks for pointing this out. We believe the reviewer refers to Figure S2D and we have now highlighted/clarified the legend better. Mock transfected conditions refer to HEK293 cells transfected with the reporter, but not the receptor. This is written in the legend as “(D) Change in luminescence signal generated between SmBiT-barr1 and LgBiT-miniGi in response to 100 nM CCL19 or 100 nM CCL21 in mock transfected HEK293 cells (no CCR7)”, which we believe should be clear to the audience.

      (3) The validation of the APEX2 receptor construct relies on a single assay with one ligand. The authors should show that the receptor expresses at the cell surface, is internalised normally, and that both ligands activate the receptor.

      We have now included additional data to show that (1) the receptor is expressed at the cell surface, (2) CCR7-APEX2 recruits barr1 to the plasma membrane, (3) this association leads to barr1 translocation to the early endosomes, as an indirect measurement of receptor internalization, and (4) both CCL19 and CCL21 stimulation inhibit forskolin-induced cAMP production (Fig. S4A-C, described in the fifth paragraph on page 6).

      (4) The APEX2 section is very short, especially as this is novel data. It lacks some important information, e.g. when the authors state that 'we identified a total of 579 proteins', is this in total for both ligands, separately or were some shared? More information on each ligand separately and combined would make this clearer.

      We have now specified that the total number of enriched proteins identified by our APEX2 approach refers to cells stimulated with either CCL19 or CCL21 (third paragraph on page 7). Furthermore, we have included a Venn diagram in Fig. S5C to show how many proteins were enriched by CCL19 or CCL21 stimulation and how many of those were shared at different time points.

      (5) The discussion would benefit from some further work. The current first two paragraphs just reiterate the introduction and don't discuss the current paper so could be removed completely. The Laufer et al study needs much more discussion as they report many of the findings of the current paper (signaling following endocytosis, Rac1 endomembrane signaling) five years ago. The APEX2 findings that are discussed, though interesting, are not followed up by further experimental evidence and there is little discussion of why the two ligands have different responses or what the physiological effects could be.

      We appreciate the reviewer’s effort in helping with the discussion. To this end, we have now expanded our discussion of the mentioned paper further as suggested (third paragraph on page 10). We agree that the findings from our APEX experiment are interesting, but the focus of this study relates to proteins enriched specifically at endosomes. Several of the most enriched proteins did not show this localization bias, which is why these proteins were not further investigated.

      Minor changes:

      (1) The authors should remove the word 'recent' at the start of the first sentence of the third paragraph. Endosomal signaling by GPCRs was described 15 years ago so cannot really be seen as recent anymore.

      We have now adjusted the manuscript accordingly.

      (2) Tukey defaulted to Turkey in some places.

      We thank the reviewer for pointing out these typos, which now have been corrected.

      Reviewer #2 (Recommendations For The Authors):

      Minor Points:

      (1) ACKRs do not couple to G proteins so it is peculiar to see them in this table. I would limit the table to the conventional CCR1-10, CXCR1-6 and XCR1. The ligand for XCR1 is XCL1 which is absent from the table.

      We have now modified the table accordingly.

      (2) CCL19 (formerly known as ELC) has been long known to be a more efficacious and potent ligand in chemotaxis assays (Bardi et al, 2001). This earlier reference should be added to the citations in the preceding statement on page 10.

      This is an important study showing that CCL19 is more efficacious than CCL21 in promoting chemotaxis, a finding that has been known for decades. We have now included the reference accordingly (reference 59 in the second paragraph on page 11).

      (3) Figure 6, Panel Q. I think the legends for CCR7 and CCR7 delta ST might be flipped.

      We thank the reviewer for pointing out this error. We have now corrected the figure panel.

      (4) Figure S5 (or 5) might benefit from simple Venn diagrams showing the numbers of differentially enriched proteins following treatment with the two ligands at different time points.

      We have included a Venn diagram in Fig. S5C to show how many proteins were enriched by CCL19 or CCL21 stimulation and how many of those were shared.

      Reference:

      Bardi, G., Lipp, M., Baggiolini, M. & Loetscher, P. The T cell chemokine receptor CCR7 is internalized on stimulation with ELC, but not with SLC. European Journal of Immunology 31, 3291-3297 (2001).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Understanding the mechanisms of how organisms respond to environmental stresses is a key goal of biological research. Assessment of transcriptional responses to stress can provide some insights into those underlying mechanisms. The researchers quantified traits, fitness, and gene expression (transcriptional) response to salinity stress (control vs stress treatments) for 130 accessions of rice (three replicates for each accession), which were grown in the field in the Philippines. This experimental design allowed for many different types of downstream analyses to better understand the biology of the system. These analyses included estimating the strength of selection imposed on transcription in each environment, evaluating possible trade-offs in gene expression, testing whether salinity induces transcriptional decoherence, and conducting various eQTL-type analyses.

      Strengths:

      The study provides an extensive analysis of gene expression responses to stress in rice and offers some insights into underlying mechanisms of salinity responses in this important crop system. The fact that the study was conducted under field conditions is a major plus, as the gene expression responses to soil salinity are more realistic than if the study was conducted in a greenhouse or growth chamber. The preprint is generally well-written and the methods and results are mostly well-described.

      Weaknesses:

      While the study makes good use of analyzing the dataset, it is not clear how the current work advances our understanding of gene regulatory evolution or plant responses to soil salinity generally. Overall, the results are consistent with other prior studies of gene expression and studies of selection across environmental conditions. Some of the framing of the paper suggests that there is more novelty to this study than there is in reality. That said, the results will certainly be useful for those working in rice and should be interesting to scientists interested in how gene expression responses to stress occur under field conditions. I detail other concerns I had about the preprint below:

      The abstract on lines 33-35 illustrates some of my concerns about the overstatement of the novelty of the current study. For example, is it really true that the role of gene expression in mediating stress response and adaptation is largely unexplored? There have been numerous studies that have evaluated gene expression responses to stresses in a wide range of organisms. Perhaps, I am missing something critically different about this study. If so, I would recommend that the authors reword this sentence to clarify what gap is being filled by this study. Further, is it really the case that none of them have evaluated how the correlational structure of gene expression changes in response to stresses in plants, as implied in lines 263-265? Don't the various modules and PC analyses of gene expression get at this question?

      We have re-worded these sentences, and highlighted the novelty of our work.

      There were some places in the methods of the preprint that required more information to properly evaluate. For example, more information should be provided on lines 664-668 about how G, E, and GxE effects were established, especially since this is so central to this study. What programs/software (R? SAS? Other?) were used for these analyses? If R, how were the ANOVAs/models fit? What type of ANOVA was used? How exactly was significance determined for each term? Which effects were considered fixed and which were random? If the goal was to fit mixed models, why not use an approach like voom-limma (Law et al. 2014 Genome Biology)? More details should also be added to lines 688-709 about these analyses, including what software/programs were used for these analyses.

      We have added more details to the methods. Although we could in principle use voom-limma to fit our mixed model, partitioning variance into G, E and G×E requires the function fitExtractVarPartModel (from the package variancePartition), which requires all categorical variables to be modeled as random effects. Therefore, we could not model environment as a fixed effect.
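
      As an illustration only, and not the mixed model described in the response, the idea of splitting expression variance into G, E, G×E and residual shares can be sketched for a single transcript with a sum-of-squares decomposition on a balanced design. The function name, the array layout (genotype × environment × replicate) and the simulated values are assumptions made for this sketch.

```python
import numpy as np

def partition_variance(y):
    """Fraction of the total sum of squares attributable to G, E, GxE, residual.

    y: array of shape (n_genotypes, n_environments, n_replicates),
       balanced (same number of replicates in every cell). Hypothetical layout.
    """
    y = np.asarray(y, dtype=float)
    n_g, n_e, n_r = y.shape
    grand = y.mean()
    g_mean = y.mean(axis=(1, 2))   # one mean per genotype
    e_mean = y.mean(axis=(0, 2))   # one mean per environment
    cell = y.mean(axis=2)          # one mean per genotype-environment cell

    ss_g = n_e * n_r * np.sum((g_mean - grand) ** 2)
    ss_e = n_g * n_r * np.sum((e_mean - grand) ** 2)
    ss_gxe = n_r * np.sum((cell - g_mean[:, None] - e_mean[None, :] + grand) ** 2)
    ss_res = np.sum((y - cell[:, :, None]) ** 2)
    total = ss_g + ss_e + ss_gxe + ss_res
    return {"G": ss_g / total, "E": ss_e / total,
            "GxE": ss_gxe / total, "Residual": ss_res / total}

# Simulated transcript: 130 accessions, 2 environments, 3 replicates each.
rng = np.random.default_rng(1)
sim = (rng.normal(size=(130, 1, 1))            # genotype effect
       + np.array([0.0, 1.0])[None, :, None]   # environment effect
       + 0.5 * rng.normal(size=(130, 2, 1))    # genotype-by-environment effect
       + 0.5 * rng.normal(size=(130, 2, 3)))   # replicate noise
shares = partition_variance(sim)
```

      The four shares sum to one by construction. A random-effects model such as the one the response describes estimates variance components rather than sums of squares, so the numbers from this sketch are not directly comparable to its output.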

      One thing that I found a bit confusing throughout was the intermixing of different terms and types of selection. In particular, there seemed to be some inconsistencies with the usage of quantitative genetics terms for selection (e.g. directional, stabilizing) vs molecular evolution terms for selection (e.g. positive, purifying). I would encourage the authors to think carefully about what they mean by each of these terms and make sure that those definitions are consistently applied here.

      We have defined the selection terms used in the study and used these terms consistently throughout the manuscript.

      It would be useful to clarify the reasons for the inherent bias in the detection of conditional neutrality (CN) and antagonistic pleiotropy (AP; Lines 187-196). It is also not clear to me what the authors did to deal with the bias in terms of adjusting P-value thresholds for CN and AP the way it is currently written. Further, I found the discussion of antagonistic pleiotropy and conditional neutrality to be a bit confusing for a couple of reasons, especially around lines 489-491. First of all, does it really make sense to contrast gene expression versus local adaptation, when lots of local adaptation likely involves changes in gene expression? Second, the implication that antagonistic pleiotropy is more common for local adaptation than the results found in this study seems questionable. Conditional neutrality appears to be more common for local adaptation as well: see Table 2 of Wadgymar et al. 2017 Methods in Ecology and Evolution. That all said, it is always difficult to conclude that there are no trade-offs (antagonistic pleiotropy) for a particular locus, as the detecting trade-offs may only manifest in some years and not others and can require large sample sizes if they are subtle in effect.

      We have now explained the cause of the inherent bias in the detection of CN, and also elaborated on how we deal with this bias. We have also edited our discussion and added relevant citations to indicate that both conditional neutrality and antagonistic pleiotropy can lead to local adaptation, and added the caveat regarding detecting antagonistic pleiotropy.

      Reviewer #2 (Public Review):

      The authors investigate the gene expression variation in a rice diversity panel under normal and saline growth conditions to gain insight into the underlying molecular adaptive response to salinity. They present a convincing case to demonstrate that environmental stress can induce selective pressure on gene expression, which is in agreement with their earlier study (Groen et al, 2020). The data seem to be a good fit for their study and overall the analytic approach is robust.

      (1) The work started by investigating the effect of genotype and their interaction at each transcript level using 3'-end-biased mRNA sequencing, and detecting a wide-spread GXE effect. Later, using the total filled grain number as a proxy of fitness, they estimated the strength of selection on each transcript and reported stronger selective pressure in a saline environment. However, this current framework relies on precise estimation of fitness and, therefore can be sensitive to the choice of fitness proxy.

      We now acknowledge this caveat in the discussion.

      (2) Furthermore, the authors decomposed the genetic architecture of expression variation into cis- and trans-eQTL in each environment separately and reported more unique environment-specific trans-eQTLs than cis-. The relative contribution of cis- and trans-eQTL depends on both the abundance and effect size. I wonder why the latter was not reported while comparing these two different genetic architectures. If the authors were to compare the variation explained by these two categories of eQTL instead of their frequency, would the inference that trans-eQTLs are primarily associated with expression variation still hold?

      We have now also reported the effect sizes for both cis- and trans-eQTLs in the two environments and showed that the trans-eQTLs have higher effect sizes than cis-eQTLs, indicating that they are able to explain a higher proportion of the variation in transcript abundances in the two environments.

      (3) Next, the authors investigated the relationship between cis- and trans-eQTLs at the transcript level and revealed an excess of reinforcement over the compensation pattern. Here, I struggle to understand the motivation for testing the relationship by comparing the effect of cis-QTL with the mean effect of all trans-eQTLs of a given transcript. My concern is that taking the mean can diminish the effect of small trans-eQTLs potentially biasing the relationship towards the large-effect eQTLs.

      We wanted to estimate compensating vs reinforcing effects, which essentially entails identifying genes whose cis- and trans-effects have opposing directionality. To get the total trans-effect, we decided to take the mean effect of the trans-eQTLs. This mean was only used to identify the compensating/reinforcing genes, and although taking the mean diminishes the effect of small trans-eQTLs, it was not used in downstream analyses.
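
      The classification described in this reply can be sketched as a sign comparison between the cis effect and the mean trans effect of a transcript. This is a minimal illustration under assumed inputs; the function name, the labels and the effect values are hypothetical and are not taken from the manuscript.

```python
from statistics import mean

def classify_cis_trans(cis_effect, trans_effects):
    """Label a transcript by comparing the sign of its cis-eQTL effect
    with the sign of the mean effect of its trans-eQTLs."""
    if not trans_effects:
        return "no trans-eQTL"
    total_trans = mean(trans_effects)   # summary used only for classification
    product = cis_effect * total_trans
    if product > 0:
        return "reinforcing"            # cis and trans act in the same direction
    if product < 0:
        return "compensating"           # cis and trans act in opposite directions
    return "undetermined"               # one of the two effects is exactly zero

same_direction = classify_cis_trans(0.8, [0.3, 0.1])
opposite_direction = classify_cis_trans(0.8, [-0.5, 0.1])
# One large negative trans-eQTL outweighs two small positive ones in the mean,
# which is the sensitivity to large-effect eQTLs raised by the reviewer.
dominated_by_large = classify_cis_trans(0.5, [-0.9, 0.1, 0.1])
```

      Because only the sign of the mean is used, the label for a transcript is driven by whichever trans-eQTLs have the largest effects.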

      Reviewer #3 (Public Review):

      In this work, the authors conducted a large-scale field trial of 130 indica accessions in normal vs. moderate salt stress conditions. The experiment consists of 3 replicates for each accession in each treatment, making it 780 plants in total. Leaf transcriptome, plant traits, and final yield were collected. Starting from a quantitative genetics framework, the authors first dissected the heritability and selection forces acting on gene expression. After summarizing the selection force acting on gene expression (or plant traits) in each environment, the authors described the difference in gene expression correlation between environments. The final part consists of eQTL investigation and categorizing cis- and trans-effects acting on gene expression.

      Building on the group's previous study and using a similar methodology (Groen et al. 2020, 2021), the unique aspect of this study is in incorporating large-scale empirical field works and combining gene expression data with plant traits. Unlike many systems biology studies, this study strongly emphasizes the quantitative genetics perspective and investigates the empirical fitness effects of gene expression data. The large amounts of RNAseq data (one sample for each plant individual) also allow heritability calculation. This study also utilizes the population genetics perspective to test for traces of selection around eQTL. As there are too many genes to fit in multiple regression (for selection analysis) and to construct the G-matrix (for breeder's equation), grouping genes into PCs is a very good idea.

      Building on large amounts of data, this study conducted many analyses and described some patterns, but a central message or hypothesis would still be necessary. Currently, the selection analysis, transcript correlation structure change, and eQTL parts seem to be independent. The manuscript currently looks like a combination of several parallel works, and this is reflected in the Results, where each part has its own short introduction (e.g., 185-187, 261-266, 349-353). It would be great to discuss how these patterns observed could be translated to larger biological insights. On a related note, since this and the previous studies (focusing on dry-wet environments) use a similar methodology, one would also wonder what the conclusions from these studies would be. How do they agree or disagree with each other?

      We acknowledge that the manuscript currently presents some analyses in a somewhat independent manner. Although it would be ideal to have a central hypothesis/message, our study is meant to broadly outline the various responses and fitness effects of salinity stress in rice. Throughout the manuscript, we have also included comparisons between our findings and those of our previous studies on drought stress to highlight any consistent themes or novel insights.

      Many analyses were done separately for each environment, and results from these two environments are listed together for comparison. Especially for the eQTL part, no specific comparison was discussed between the two environments. It would be interesting to consider whether one could fit the data in more coherent models specifically modeling the X-by-environment effects, where X might be transcripts, PCs, traits, transcript-transcript correlation, or eQTLs.

      We do plan to consider fitting models that explicitly incorporate X-by-environment interactions to provide a more detailed understanding of the genetics of plasticity between the two environments, but it is beyond the scope of this paper. This will be explored in a separate report.

      As stated, grouping genes into PCs is a good idea, but although in theory, the PCs are orthogonal, each gene still has some loadings on each PC (ie. each PC is not controlled by a completely different set of genes). Another possibility is to use any gene grouping method, such as WGCNA, to group genes into modules and use the PC1 of each module. There, each module would consist of completely different sets of genes, and one would be more likely to separate the biological functions of each module. I wonder whether the authors could discuss the pros and cons of these methods.

      We recognize that individual genes can contribute to multiple PCs, and this is precisely why we chose PCA over WGCNA, in which one gene can belong to only one module. Our aim was to recognize all biological processes that could be under selection in either environment, and since one gene can be involved in several different processes, we wanted to identify the contribution of these genes to different processes, which can be done effectively by PCA.
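
      The contrast drawn in this reply, where every gene carries a loading on every PC whereas a module-based grouping gives each gene a single label, can be illustrated on simulated data. This is a sketch under stated assumptions: the expression matrix is random, and assigning each gene to the PC with its largest absolute loading is only a stand-in for a hard grouping, not an implementation of WGCNA.

```python
import numpy as np

def pca_loadings(expr, n_components):
    """Return the first n_components principal axes (rows) of a
    samples x genes matrix, computed by SVD of the centered data."""
    centered = expr - expr.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]

rng = np.random.default_rng(0)
expr = rng.normal(size=(30, 8))        # 30 samples, 8 hypothetical genes
loadings = pca_loadings(expr, 3)       # shape: (3 PCs, 8 genes)

# Soft membership: count genes with a nonzero loading on all three PCs.
genes_on_every_pc = int(np.all(np.abs(loadings) > 1e-12, axis=0).sum())

# Hard membership (stand-in for module assignment): one label per gene.
hard_assignment = np.abs(loadings).argmax(axis=0)
```

      In the soft view a gene involved in several processes can contribute to several PCs, while the hard view forces a single label per gene, which is the trade-off discussed in the reply.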

      Reviewer #4 (Public Review):

      The manuscript examines how patterns of selection on gene expression differ between a normal field environment and a field environment with elevated salinity based on transcript abundances obtained from leaves of a diverse panel of rice germplasm. In addition, the manuscript also maps expression QTL (eQTL) that explains variation in each environment. One highlight from the mapping is that a small group of trans-mapping regulators explains some gene expression variation for large sets of transcripts in each environment. The overall scope of the datasets is impressive, combining large field studies that capture information about fecundity, gene expression, and trait variation at multiple sites. The finding related to patterns indicating increased LD among eQTLs that have cis-trans compensatory or reinforcing effects is interesting in the context of other recent work finding patterns of epistatic selection. However, other analyses in the manuscript are less compelling or do not make the most of the value of collected data. Revisions are also warranted to improve the precision with which field-specific terminology is applied and the language chosen when interpreting analytical findings.

      Selection of gene expression:

      One strength of the dataset is that gene expression and fecundity were measured for the same genotypes in multiple environments. However, the selection analyses are largely conducted within environments. The addition of phenotypic selection analyses that jointly analyze gene expression across environments and/or selection on reaction norms would be worthwhile.

      We do plan to consider fitting models that explicitly incorporate G×E interactions to provide a more detailed understanding of the genetics of plasticity between the two environments, but it is beyond the scope of this paper. This will be explored in a separate report.

      Gene expression trade-offs:

      The terminology and possibly methods involved in the section on gene expression trade-offs need amendment. I specifically recommend discontinuing reference to the analysis presented as an analysis of antagonistic pleiotropy (rather than more general trade-offs) because pleiotropy is defined as a property of a genotype, not a phenotype. Gene expression levels are a molecular phenotype, influenced by both genotype and the environment. By conducting analyses of selection within environments as reported, the analysis does not account for the fact that the distribution of phenotypic values, the fitness surface, or both may differ across environments. Thus, this presents a very different situation than asking whether the genotypic effect of a QTL on fitness differs across environments, which is the context in which the contrasting terms antagonistic pleiotropy and conditional neutrality have been traditionally applied. A more interesting analysis would be to examine whether the covariance of phenotype with fitness has truly changed between environments or whether the phenotypic distribution has just shifted to a different area of a static fitness surface.

      We recognize that pleiotropy is a property of a genotype, not a phenotype, but since our phenotype (gene expression) is strongly coupled with the genotype, we choose to refer to these trade-offs as antagonistic pleiotropy. That being said, we did test whether the covariance of gene expression with phenotype significantly varies between environments, and found that to indeed be the case.

      Biological processes under selection / Decoherence: PCs are likely not the most ideal way to cluster genes to generate consolidated metrics for a selection gradient analysis. Because individual genes will contribute to multiple PCs, the current fractional majority-rule method applied to determine whether a PC is under direct or indirect selection for increased or decreased expression comes across as arbitrary and with the potential for double-counting genes. A gene co-expression network analysis could be more appropriate, as genes only belong to one module and one can examine how selection is acting on the eigengene of a co-expression module. Building gene co-expression modules would also provide a complementary and more concrete framework for evaluating whether salinity stress induces "decoherence" and which functional groups of genes are most impacted.

      We recognize that individual genes can contribute to multiple PCs, and this is precisely why we chose PCA clustering over WGCNA, where one gene can belong to only one module. Our aim was to recognize all biological processes that could be under selection in either environment, and since one gene can be involved in several different processes, we wanted to identify the contribution of these genes to different processes, which can be done effectively by a PCA analysis. But as the reviewer points out, our PCs did contain a contribution (even if negligible) from each gene, so to identify the ‘primary’ biological processes represented by the PCs, we chose the majority rule. As for testing decoherence, we agree that a co-expression module analysis would have provided additional support for the specific test performed in our manuscript, but since it would only be additional support, we choose not to add it to the manuscript.

      But based on the recommendation of the reviewer(s), we did perform a WGCNA analysis and found a total of 14 and 13 modules in normal and saline conditions, of which 0 and 2 modules (with no significant GO enrichment) were under directional selection. This supports our reasoning that a module-based approach could miss processes under selection.

      Selection of traits:

      Having paired organismal and molecular trait data is a strength of the manuscript, but the organismal trait data are underutilized. The manuscript as written only makes weak indirect inferences based on GO categories or assumed gene functions to connect selection at the organismal and molecular levels. Stronger connections could be made for instance by showing a selection of co-expression module eigengene values that are also correlated with traits that show similar patterns of selection, or by demonstrating that GWAS hits for trait variation co-localize to cis-mapping eQTL.

      We did perform a GWAS for all the traits collected in both the normal and saline environments, and only found significant hits for fecundity (in both the normal and saline environments) and chlorophyll_a content (in the saline environment). But these regions did not overlap with any candidate genes or cis-mapping eQTL. Hence we choose to mention it in the manuscript. Additionally, using the WGCNA modules, we found that the only two modules under selection in the saline environment were not significantly correlated with any of the traits measured.

      Genetic architecture of gene expression variation:

      The descriptive statistics of the eQTL analysis summarize counts of eQTLs observed in each environment, but these numbers are not broken down to the molecular trait level (e.g., what are the median and range of cis- and trans-eQTLs per gene). In addition, genetic architecture is a combination of the numbers and relative effect sizes of the QTLs. It would be useful to provide information about the relative distributions of phenotypic variance explained by the cis- vs. trans- eQTLs and whether those distributions vary by environment. The motivation for examining patterns of cis-trans compensation specifically for the results obtained under high salinity conditions is unclear to me. If the lines sampled have predominantly evolved under low salinity conditions and the hypothesis being evaluated relates to historical experience of stabilizing selection, then my intuition is that evaluating the eQTL patterns under normal conditions provides the more relevant test of the hypothesis.

      We have added the median number of eQTLs per gene in each environment. Additionally, we recognize that genetic architecture is a combination of numbers and effect sizes, and we have added information regarding the effect sizes of eQTLs by type and by environment, as recommended by another reviewer. We did explore the distributions of phenotypic variance explained by the cis- vs. trans-eQTLs as recommended here, and found that trans-eQTLs explain more phenotypic variance than cis-eQTLs in both environments and that the distribution of either type of eQTL does not vary by environment. We are choosing not to add this to the main text due to space limitations. Lastly, we examined the patterns of cis-trans compensation/reinforcement under both normal and salinity conditions and have compared and contrasted the results from both in the main text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Lines 126: I would recommend citing those who originally developed the 3' end targeted RNA sequencing methods (e.g. Meyer et al 2011 Molecular Ecology).

      We have cited the recommended paper.

      Lines 128-130: It would be useful to include a description here of what models were fit to the data to partition out G, E, and GxE effects.

      Due to space limitations, we have in brief added a sentence to this effect.

      Line 139: I would suggest changing "found little" to "no" since the test was not significant.

      The sentence has been modified to say no evidence.

      Line 313: I think you mean directional selection instead of positive selection.

      We have corrected the text

      Lines 362-363: Would the authors also expect an enrichment of reinforcing genes for most scenarios where that has been divergent selection, such as local adaptation among populations?

      Based on our hypothesis, we would indeed expect an enrichment of reinforcing genes for scenarios of local adaptation where different alleles are maintained in different populations due to local adaptation.

      Reviewer #3 (Recommendations For The Authors):

      Figures 1d-e are not mentioned in the Results.

      The figures have been referenced in appropriate places.

      Lines 41-45: Terms such as reinforcement and compensation need to be explained in this specific context. Also "different selection regimes" is a bit broad and vague.

      Due to word-count limitations, we are choosing not to elaborate on the terms reinforcement and compensation in the abstract (these are commonly used in the literature, and we have also defined them in the main text). Additionally, we now explicitly state the selection pressures associated with cis and trans eQTLs.

      Table 1: Please explain S and C in the footnote.

      We have added the recommended footnote

      Figures: Some panel labels (a, b, c...) are mingled with the graphs.

      We have re-made our figures such that the panel labels do not mingle with the plots.

      Lines 588-591: font.

      Modified

      Lines 620-633: Please describe how these RNAseq libraries were allocated/pooled into different sequencing lanes to avoid potential batch effects among sequencing lanes.

      The sequencing was performed on the same Illumina NextSeq 500 machine and we have added the sequencing libraries/pool plan in the methods (lines 688-689). 

      Lines 690-692: At the beginning of this paragraph, it was mentioned that the un-standardized coefficients were estimated. But here, it seems like the transcript data were already standardized in the data preparation step. What do lines 687-688 refer to? Further standardizing those estimated coefficients so that the whole distribution has mean=0 and sd=1?

      Thank you for pointing out our oversight. We checked our scripts and data preparation did not include transcript standardization, and we have removed the above line from the manuscript.

      Lines 705-711: Please explain why assigning the positive/negative selection status for each gene is important. "Positive selection" here is defined as genes whose increased expression also increases fitness, but traditionally positive selection was defined as "the derived state is favored over the ancestral state". For a gene whose ancestral expression is high but lower expression increases fitness in this experiment, could we also say this gene is under positive selection? Given that we don't know the ancestral state here, maybe the authors could explain whether this definition is necessary. Also, given that many genes positively or negatively regulate each other in a pathway, it is also unclear whether it is necessary to assign the positive/negative status for a PC using the majority rule (lines 710-711).

      We have now defined the different selection terms with respect to our study and use them consistently throughout the manuscript.

      Lines 711-715: If I understand correctly, PCs were used as traits, and by definition PCs should all be orthogonal. Is this section saying only retaining PCs whose correlation < 0.6 with each other? What is the rationale?

      PCA were performed on transcript abundance and the resulting orthogonal PCs explaining over 0.5% variance were all retained for selection analyses.

      We also performed selection analyses on the functional traits measured in the field, but since these functional traits are correlated (and as such would not satisfy the independent variable requirement of regression analyses), we retained only those functional traits which had a Pearson correlation coefficient < 0.6.
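      The correlation-based trait filter described above can be sketched as follows. This is a minimal illustration, not the authors' script: the greedy order in which traits are considered, the use of the absolute value of the coefficient, and the example trait names and values are all assumptions made here for demonstration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def filter_correlated(traits, threshold=0.6):
    """Greedily keep a trait only if |r| < threshold with every trait kept so far.

    Assumption: traits are visited in dictionary order; the response does not
    say which member of a correlated pair was dropped.
    """
    kept = []
    for name, values in traits.items():
        if all(abs(pearson(values, traits[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical trait measurements on five plants.
traits = {
    "height": [1.0, 2.0, 3.0, 4.0, 5.0],
    "biomass": [2.1, 3.9, 6.2, 8.0, 9.9],  # almost proportional to height
    "chlorophyll": [3.0, 1.0, 4.0, 1.0, 5.0],
}
kept = filter_correlated(traits)
print(kept)
```

      In this toy input, the second trait is dropped because it is almost perfectly correlated with the first, while the third is only weakly correlated with the first and is retained.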

      Line 729: Please briefly describe what CLIP is doing.

      We have added the required description.

      Lines 736-741: The accession numbers do not add up to 125.

      Thank you for catching our oversight. We have edited the text, and now the numbers add up to 125.

      Line 796: Please remind readers where these 247k SNPs come from. Supposedly all accessions have been whole-genome sequenced, so the total number of SNPs should be larger than this.

      We have described how the SNPs were obtained and processed in the lines preceding this. Indeed, the number of SNPs would have been much larger, but the stringent cutoffs and linkage disequilibrium pruning reduced our dataset to about 247k SNPs.

      Lines 154-160: This is a bit confusing. The authors first mentioned, for the raw selection differentials, the mean and variance differ between environments, meaning they are misleading (why?). The next sentence then says non-standardized selection differentials will be used.

      The mean and variance for transcript abundances vary between the two environments. Because traits are usually measured in different scales, it is recommended to standardize trait values using variance or mean before estimating selection coefficients. Multiplying this variance (or mean) standardized selection differential with heritability gives the expected response to selection in standard deviation (or mean) units. But if the trait variance (or mean) varies between traits or environments, it leads to a conflation between the standardized selection differential and trait variance (or mean), which can be misleading. So to avoid this, and given that our traits (transcript abundance in this case) were all measured on the same scale, we chose to not standardize our trait values and estimated raw selection differentials.
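      The distinction between raw and standardized selection differentials can be sketched numerically. The block below assumes the common quantitative-genetic definition of the selection differential as the covariance between relative fitness and the trait; the function name and the toy data are hypothetical and are not taken from the manuscript.

```python
from statistics import mean, stdev


def selection_differentials(trait, fitness):
    """Return raw, variance-standardized and mean-standardized differentials.

    Assumes S = Cov(w, z), where w is fitness divided by mean fitness.
    """
    mean_fitness = mean(fitness)
    rel_fitness = [f / mean_fitness for f in fitness]  # relative fitness, mean 1
    z_bar = mean(trait)
    n = len(trait)
    # Sample covariance between relative fitness and the trait.
    s_raw = sum((z - z_bar) * (w - 1.0) for z, w in zip(trait, rel_fitness)) / (n - 1)
    return {
        "raw": s_raw,
        "variance_standardized": s_raw / stdev(trait),
        "mean_standardized": s_raw / z_bar,
    }


# Hypothetical transcript abundances and fecundity for four genotypes.
result = selection_differentials([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
print(result)
```

      Because the variance-standardized value divides by the trait's standard deviation, two environments with the same raw differential but different trait variances would yield different standardized values, which is the conflation described above.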

      Figure 1 c-e: Please explain how the horizontal axis values were obtained. Is it assuming these selection differentials have a normal distribution of mean=0 & sd=1?

      Yes, the horizontal axis represents theoretical quantiles for the selection differentials, assuming they follow a normal distribution with mean=0 and sd=1. This has been added to the figure legend.
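      For readers unfamiliar with how such an axis is built, the following sketch pairs sorted observed values with standard normal quantiles. The plotting positions (i - 0.5)/n are an assumption; the response does not state which convention the authors' plotting software used, and the example values are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each sorted value with its theoretical standard normal quantile."""
    n = len(values)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(values)))


# Hypothetical selection differentials for three transcripts.
points = normal_qq_points([0.2, -0.1, 0.05])
for theoretical_q, observed in points:
    print(f"{theoretical_q:+.3f}  {observed:+.3f}")
```

      Points falling along a straight line in such a plot indicate that the observed values are approximately normally distributed.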

      Line 162-168: Please clarify this part. What does “general trend towards stronger positive compared to negative selection on gene expression” mean? Does it mean the whole distribution of S is significantly different from 0, the difference in the number of genes in the S>0 vs S<0 category, or the a-bit-higher median |S| in the S>0 vs S<0 category? If it is the last one, are the small differences biological meaningful (0.053 vs. 0.047 for control & 0.051 vs. 0.050 for salt conditions), given that the authors defined |S|<0.1 as neutral?

      By “general trend towards stronger positive compared to negative selection on gene expression”, we mean that more transcripts were under positive directional selection as compared to negative directional selection. We have also clarified this in the text now.

      Line 177-178: This sentence implies disruptive selection is more important than stabilizing selection in the saline environment, but the test was not significant (line 176).

      Although there was no significant difference in the magnitude of stabilizing vs disruptive selection within the saline environment, the number of transcripts experiencing stronger disruptive selection in the saline condition was greater than the number of transcripts experiencing disruptive selection in the normal conditions. And so comparing between conditions, disruptive selection plays an important role in the saline conditions.

      Line 188-190: How CN vs. AP was statistically defined was not mentioned in the Methods section.

      We have added this to the main text within the Results section.

      Line 203-214: How do these results fit with the previous observations that almost all transcripts have significant heritability?

      Although we do find that all but three transcripts have a significant genetic effect (and thus have significant heritability), the median broad-sense heritability for the 51 antagonistically pleiotropic genes is 0.23. Given that, we would only be able to detect SNPs regulating gene expression with a large effect size, since our sample size is n=130. Additionally, we used a very stringent criterion (FDR < 0.001) to define eQTLs. These two factors in combination could explain why we did not detect significant eQTLs for AP genes.

      Line 246-250: Please explain why the current conclusion would be opposite from the previous study. Supposedly the PCA, G matrix, and breeder’s equation were done for each environment separately. It makes sense that the G matrix and response to selection could be different between saline and drought treatments, but for the control treatments in the two studies, do they still differ? Why? Also in Table S7, it would be nice to show the % variation explained by each PC.

      Although both our studies had largely overlapping samples, about 20% of samples were unique to each study. Additionally, although the site where the study was performed was the same across the two studies, we found significant temporal differences in gene expression due to micro-environmental differences. Both these factors can lead to changes in direct and indirect selection and its response, and we are examining these differences as part of a separate study. We also highlight these caveats in our discussion.

      Information on the percent variance explained by each PC is given in Table S5.

      Figure 2b: The vertical axis was labeled as “selection gradient”, but I think the responses to selection (D, I, T) have different units.

      We have re-labeled the vertical axis as “selection”.

      Reviewer #4 (Recommendations For The Authors):

      The manuscript mixes terminology for selection from quantitative genetics with that from population genetics. This is problematic, and the adjectives positive and negative should be replaced as descriptors of selection by instead rewording, for example, positive directional selection as directional selection for higher transcript abundance.

      Lines 193-196: The phrasing here reads as if the selection is solely acting on the presence/absence of expression rather than on quantitative variation in expression. During revision, it would be worth considering including an analysis of genes that parses genes that show the presence/absence of variation of expression within or across environments separately from genes that are expressed to non-trivial levels in both environments.

      We have modified the sentence in question now. Also, we pre-processed RNA-seq data to remove all transcripts with low expression signals (sigma signal < 20), and further retained only transcripts that had non-trivial expression in at least 10% of the population, which we believe represents presence/absence of variation of expression within or across environments.
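      The two-step expression filter described above could look roughly like the sketch below. Several details are assumptions made only for illustration: "sigma signal" is read here as the signal summed across samples, the per-sample cutoff for "non-trivial" expression (`min_count`) is hypothetical because the response does not give one, and the transcript names and counts are invented.

```python
def filter_transcripts(counts, min_total=20, min_count=1, min_fraction=0.10):
    """Keep transcripts with enough total signal and broad enough expression.

    counts: dict mapping transcript name -> list of per-sample signal values.
    min_total: assumed reading of the "sigma signal < 20" removal rule.
    min_count: hypothetical per-sample cutoff for non-trivial expression.
    min_fraction: minimum fraction of samples that must pass min_count.
    """
    kept = []
    for transcript, values in counts.items():
        if sum(values) < min_total:
            continue  # low overall expression signal
        expressed = sum(1 for v in values if v >= min_count)
        if expressed / len(values) >= min_fraction:
            kept.append(transcript)
    return kept


# Hypothetical counts for four transcripts across 20 samples.
counts = {
    "t1": [5] * 20,               # expressed everywhere
    "t2": [1] + [0] * 19,         # total signal below 20
    "t3": [25] + [0] * 19,        # enough signal, but in only 5% of samples
    "t4": [15, 15] + [0] * 18,    # enough signal, in exactly 10% of samples
}
kept_transcripts = filter_transcripts(counts)
print(kept_transcripts)
```

      A filter of this shape retains transcripts that are expressed in only a subset of the population, which is why it still captures presence/absence variation in expression.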

      Lines 216-231: Is this analysis solely for directional selection? Not clear since previous sections examined both directional and stabilizing selection.

      Yes, we performed this analysis for only directional selection, and have clarified this in the text too.

      Lines 224-226: The meaning of this sentence is unclear and should be written more concretely.

      We have rephrased the sentence to be more clear.

      Lines 232-241: The description of the scientific logic here could be read as implying that genes interacting in networks are the sole source of indirect selection. I recommend revising the language to indicate this cause is one of several potential causes.

      We have reworded the sentence such that we indicate selection acting on interacting genes is just one of the causes of indirect selection.

      The strength of the conclusions of the decoherence analysis should be evaluated in light of caveats with such analyses (see Cai and Des Marais New Phytologist 2023).

      We have added the caveat with relevant citation in the manuscript.

      Rename this section as "Selection on Organismal Traits", as the previous sections have also been investigating selection on traits, just molecular traits.

      We have renamed the section as recommended

      Lines 314-318: Rewrite for clarity. Most environments select for an optimal phenotype; it is just the case here that the phenotypic distribution in the high salinity environment overlaps with the optimum.

      We have rephrased and clarified the statement.

      Lines 343-345: Rephrase to "These results indicate that natural variation in gene regulation under..."

      Rephrased.

      Line 354: "most" reads as too strong a descriptor here if the majority is ~60%.

      We have reworded the sentence to read “more than half”

      Lines 359-361: It is unclear to me how this interpretation follows from the above analysis.

      We have reworded the sentence so that the claim follows our analysis.

      Line 372: Is the expectation here more specifically one of epistatic selection? Other processes could stochastically lead to the genetic fixation of compensatory/reinforcing variants, but I think only epistasis for fitness would cause the interesting patterns of LD observed.

      The expectation here is that certain cis and trans variants only exist to compensate/reinforce, potentially through epistasis. We have clarified this in the text.

      Line 405: Change "adaptive organismal responses of organisms" to "organismal responses." As written, the sentence reads as being about plasticity rather than evolutionary responses, which are by populations, not organisms. None of the analyses included in the manuscript specifically test for adaptive plasticity.

      Rephrased.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      The conserved AAA-ATPase PCH-2 has been shown in several organisms including C. elegans to remodel classes of HORMAD proteins that act in meiotic pairing and recombination. In some organisms the impact of PCH-2 mutations is subtle but becomes more apparent when other aspects of recombination are perturbed. Patel et al. performed a set of elegant experiments in C. elegans aimed at identifying conserved functions of PCH-2. Their work provides such an opportunity because in C. elegans meiotically expressed HORMADs localize to meiotic chromosomes independently of PCH-2. Work in C. elegans also allows the authors to focus on nuclear PCH-2 functions as opposed to cytoplasmic functions also seen for PCH-2 in other organisms. 

      The authors performed the following experiments: 

      (1) They constructed C. elegans animals with SNPs that enabled them to measure crossing over in intervals that cover most of four of the six chromosomes. They then showed that double crossovers, which were common on most of the four chromosomes in wild-type, were absent in pch-2. They also noted shifts in crossover distribution in the four chromosomes. 

      (2) Based on the crossover analysis and previous studies they hypothesized that PCH-2 plays a role at an early stage in meiotic prophase to regulate how SPO-11 induced double-strand breaks are utilized to form crossovers. They tested their hypothesis by performing ionizing irradiation and depleting SPO-11 at different stages in meiotic prophase in wild-type and pch-2 mutant animals. The authors observed that irradiation of meiotic nuclei in zygotene resulted in pch-2 nuclei having a larger number of nuclei with 6 or greater crossovers (as measured by COSA-1 foci) compared to wild-type. Consistent with this observation, SPO-11 depletion, starting roughly in zygotene, also resulted in pch-2 nuclei having an increase in 6 or more COSA-1 foci compared to wild-type. The increased number at this time point appeared beneficial because a significant decrease in univalents was observed. 

      (3) They then asked if the above phenotypes correlated with the localization of MSH-5, a factor that stabilizes crossover-specific DNA recombination intermediates. They observed that pch-2 mutants displayed an increase in MSH-5 foci at early times in meiotic prophase and an unexpectedly higher number at later times. They conclude based on the differences in early MSH-5 localization and the SPO-11 and irradiation studies that PCH-2 prevents early DSBs from becoming crossovers and early loading of MSH-5. By analyzing different HORMAD proteins that are defective in forming the closed conformation acted upon by PCH-2, they present evidence that MSH-5 loading was regulated by the HIM-3 HORMAD. 

      (4) They performed a crossover homeostasis experiment in which DSB levels were reduced. The goal of this experiment was to test if PCH-2 acts in crossover assurance. Interestingly, in this background PCH-2 negative nuclei displayed higher levels of COSA-1 foci compared to PCH-2 positive nuclei. This observation and a further test of the model suggested that "PCH-2's presence on the SC prevents crossover designation." 

      (5) Based on their observations indicating that early DSBs are prevented from becoming crossovers by PCH-2, the authors hypothesized that the DNA damage kinase CHK-2 and PCH-2 act to control how DSBs enter the crossover pathway. This hypothesis was developed based on their finding that PCH-2 prevents early DSBs from becoming crossovers and previous work showing that CHK-2 activity is modulated during meiotic recombination progression. They tested their hypothesis using a mutant synaptonemal complex component that maintains high CHK-2 activity that cannot be turned off to enable crossover designation. Their finding that the pch-2 mutation suppressed the crossover defect (as measured by COSA-1 foci) supports their hypothesis. 

      Based on these studies the authors provide convincing evidence that PCH-2 prevents early DSBs from becoming crossovers and controls the number and distribution of crossovers to promote a regulated mechanism that ensures the formation of obligate crossovers and crossover homeostasis. As the authors note, such a mechanism is consistent with earlier studies suggesting that early DSBs could serve as "scouts" to facilitate homolog pairing or to coordinate the DNA damage response with repair events that lead to crossing over. The detailed mechanistic insights provided in this work will certainly be used to better understand functions for PCH-2 in meiosis in other organisms. My comments below are aimed at improving the clarity of the manuscript. 

      We thank the reviewer for their concise summary of our manuscript and their assessment of our work as “convincing” and providing “detailed mechanistic insight.”

      Comments 

      (1) It appears from reading the Materials and Methods that the SNPs used to measure crossing over were obtained by mating Hawaiian and Bristol strains. It is not clear to this reviewer how the SNPs were introduced into the animals. Was crossing over measured in a single animal line? Were the wild-type and pch-2 mutations made in backgrounds that were isogenic with respect to each other? This is a concern because it is not clear, at least to this reviewer, how much of an impact crossing different ecotypes will have on the frequency and distribution of recombination events (and possibly the recombination intermediates that were studied). 

      We have clarified these issues in the Materials and Methods of our updated preprint. The control and pch-2 mutants were isogenic in either the Bristol or Hawaiian backgrounds. Control lines were the original Bristol and Hawaiian lines and pch-2 mutants were originally made in the Bristol line and backcrossed at least 3 times before analysis. Hawaiian pch-2 mutants were made by backcrossing pch-2 mutants at least 8 times to the Hawaiian background and verifying the presence of Hawaiian SNPs on all chromosomes tested in the recombination assay. To perform the recombination assays, these lines were crossed to generate the relevant F1s.

      (2) The authors state that in pch-2 mutants there was a striking shift of crossovers (line 135) to the PC end for all of the four chromosomes that were tested. I looked at Figure 1 for some time and felt that the results were more ambiguous. Map distances seemed similar at the PC end for wildtype and pch-2 on Chrom. I. While the decrease in crossing over in pch-2 appeared significant for Chrom. I and III, the results for Chrom. IV, and Chrom. X. seemed less clear. Were map distances compared statistically? At least for this reviewer the effects on specific intervals appear less clear and without a bit more detail on how the animals were constructed it's hard for me to follow these conclusions. 

      We hope that the added details above make the results of these assays clearer. Map distances were compared and did not reach statistical significance, except where indicated. While we agree that the comparisons between control animals and pch-2 mutants may seem less clear with individual chromosomes, we argue that more general, consistent patterns become clear when analyzing multiple chromosomes. Indeed, this is why we expanded our recombination analysis beyond Chromosome III and the X Chromosome, as reported in Deshong, 2014. We have edited this sentence to: “Moreover, there was a striking and consistent shift of crossovers to the PC end of all four chromosomes tested.”

      (3) Figure 2. I'm curious why non-irradiated controls were not tested side-by-side for COSA-1 staining. It just seems like a nice control that would strengthen the authors' arguments. 

      We have added these controls in the updated preprint as Figure 2B.

      (4) Figure 3. It took me a while to follow the connection between the COSA-1 staining and DAPI staining panels (12 hrs later). Perhaps an arrow that connects each set of time points between the panels or just a single title on the X-axis that links the two would make things clearer. 

      To make this figure more clear, we have generated two different cartoons for the assay that scores GFP::COSA-1 foci and the assay that scores bivalents. We have also edited this section of the results to make it more clear.

      Reviewer #2 (Public review): 

      Summary: 

      This paper has some intriguing data regarding the different potential roles of Pch-2 in ensuring crossing over. In particular, the alterations in crossover distribution and Msh-5 foci are compelling. My main issue is that some of the models are confusingly presented and would benefit from some reframing. The role of Pch-2 across organisms has been difficult to determine, the ability to separate pairing and synapsis roles in worms provides a great advantage for this paper. 

      Strengths: 

      Beautiful genetic data, clearly made figures. Great system for studying the role of Pch-2 in crossing over. 

      We thank the reviewers for their constructive and useful summary of our manuscript and the analysis of its strengths. 

      Weaknesses: 

      (1) For a general audience, definitions of crossover assurance, crossover eligible intermediates, and crossover designation would be helpful. This applies to both the proposed molecular model and the cytological manifestation that is being scored specifically in C. elegans. 

      We have made these changes in an updated preprint.

      (2) Line 62: Is there evidence that DSBs are introduced gradually throughout the early prophase? Please provide references. 

      We have referenced Woglar and Villeneuve 2018 and Joshi et al. 2015 to support this statement in the updated preprint.

      (3) Do double crossovers show strong interference in worms? Given that the PC is at the ends of chromosomes don't you expect double crossovers to be near the chromosome ends and thus the PC? 

      Despite their rarity, double crossovers do show interference in worms. However, the PC is limited to one end of the chromosome. Therefore, even if interference ensures the spacing of these double crossovers, the preponderance of one of these crossovers toward one end (and not both ends) suggests something functionally unique about the PC end.

      (4) Line 155 - if the previous data in Deshong et al is helpful it would be useful to briefly describe it and how the experimental caveats led to misinterpretation (or state that further investigation suggests a different model etc.). Many readers are unlikely to look up the paper to find out what this means. 

      We have added this to the updated preprint: “We had previously observed that meiotic nuclei in early prophase were more likely to produce crossovers when DSBs were induced by the Mos transposon in pch-2 mutants than in control animals but experimental caveats limited our ability to properly interpret this experiment.”

      (5) Line 248: I am confused by the meaning of crossover assurance here - you see no difference in the average number of COSA-1 foci in Pch-2 vs. wt at any time point. Is it the increase in cells with >6 COSA-1 foci that shows a loss of crossover assurance? That is the only thing that shows a significant difference (at the one time point) in COSA-1 foci. The number of dapi bodies shows the loss of Pch-2 increases crossover assurance (fewer cells with unattached homologs). So this part is confusing to me. How does reliably detecting foci vs. DAPI bodies explain this? 

      We have removed this section to avoid confusion.

      (6) Line 384: I am confused. I understand that in the dsb-2/pch2 mutant there are fewer COSA-1 foci. So fewer crossovers are designated when DSBs are reduced in the absence of PCH-2.

      How then does this suggest that PCH-2's presence on the SC prevents crossover designation? Its absence is preventing crossover designation at least in the dsb-2 mutant. 

      We have tried to make this more clear in the updated preprint. In this experiment, we had identified three possible explanations for why PCH-2 persists on some nuclei that do not have GFP::COSA-1 foci: 1) PCH-2 removal is coincident with crossover designation; 2) PCH-2 removal depends on crossover designation; and 3) PCH-2 removal facilitates crossover designation. The decrease in the number of GFP::COSA-1 foci in dsb2::AID;pch-2 mutants argues against the first two possibilities, suggesting that the third might be correct. We have edited the sentence to read: “These data argue against the possibility that PCH-2’s removal from the SC is simply in response to or coincident with crossover designation and instead, suggest that PCH-2’s removal from the SC somehow facilitates crossover designation and assurance.”

      (7) Discussion Line 535: How do you know that the crossovers that form near the PCs are Class II and not the other way around? Perhaps early forming Class I crossovers give time for a second Class II crossover to form. In budding yeast, it is thought that synapsis initiation sites are likely sites of crossover designation and class I crossing over. Also, the precursors that form class I and II crossovers may be the same or highly similar to each other, such that Pch-2's actions could equally affect both pathways. 

      We do not know that the crossovers that form near the PC are Class II, but we hypothesize that they are, based on the close, functional relationship that exists between Class I crossovers and synapsis and the apparent antagonistic relationship that exists between Class II crossovers and synapsis. We agree that Class I and Class II crossover precursors are likely to be the same or highly similar, that they exhibit extensive crosstalk that may complicate straightforward analysis, and that PCH-2 is likely to affect both, as strongly suggested by our GFP::MSH-5 analysis. We present this hypothesis based on the apparent relationship between PCH-2 and synapsis in several systems but agree that it needs to be formally tested. We have tried to make this argument clearer in the updated preprint.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript describes an in-depth analysis of the effect of the AAA+ ATPase PCH-2 on meiotic crossover formation in C. elegans. The authors reach several conclusions, and attempt to synthesize a 'universal' framework for the role of this factor in eukaryotic meiosis.

      Strengths: 

      The manuscript makes use of the advantages of the 'conveyor belt' system within the C. elegans reproductive tract to enable a series of elegant genetic experiments.

      We thank this reviewer for the useful assessment of our manuscript and the articulation of its strengths.

      Weaknesses: 

      A weakness of this manuscript is that it heavily relies on certain genetic/cell biological assays that can report on distinct crossover outcomes, without clear and directed control over other aspects and variables that might also impact the final repair outcome. Such assays are currently out of reach in this model system. 

      In general, this manuscript could be more accessible to non-C. elegans readers. Currently, the manuscript is hard to digest for non-experts (even if meiosis researchers). In addition, the authors should be careful to consider alternative explanations for certain results. At several steps in the manuscript, results could ostensibly be caused by underlying defects that are currently unknown (for example, can we know for sure that pch-2 mutants do not suffer from altered DSB patterning, and how can we know what the exact functional and genetic interactions between pch-2 and HORMAD mutants tell us?). Alternative explanations are possible and it would serve the reader well to explicitly name and explain these options throughout the manuscript.

      We have made the manuscript more accessible to non-C. elegans readers and discuss alternate explanations for specific results in the updated preprint. 

      Recommendations for the authors:  

      Reviewing Editor Comments: 

      (1) Please provide 'n' values for each experiment. 

      n values are now included in the Figure legends for each experiment.

      (2) Line 129: Please represent the DCOs as percent or fraction (1%-9.8%, instead of 1-13). 

      We have made this change.

      (3) Figure 3A legend: the grey bar should read 20hr. COSA-1/ 32 hr DAPI. In Figure 3E, it is not clear why 36hr Auxin and 34hr Auxin show a significant difference in DAPI bodies between control and pch-2, but 32hr Auxin treatment does not. Here again 'n' values will help. 

      We have made this change. We also are not sure why the 32 hour auxin treatment did not show a significant difference in DAPI stained bodies. We have included the n values, which are not very different between timepoints and therefore are unlikely to explain the difference. The difference may reflect the time that it takes for SPO-11 function to be completely abrogated.

      (4) Line 360: Please provide the fraction of PCH-2 positive nuclei in dsb-2.

      We have made this change. 

      Please also address all reviewer comments. 

      Reviewer #1 (Recommendations for the authors): 

      (1) Page 3, line 52. While I agree that crossing over is important to generate new haplotypes, work has suggested that the contribution by an independent assortment of homologs to generate new haplotypes is likely to be significantly greater. One reference for this is: Veller et al. PNAS 116:1659. 

      We deeply appreciate this reviewer pointing us to this paper, especially since it argues that controlling crossover distribution contributes to gene shuffling, and we now cite it in our introduction! While we agree that this paper concludes that independent assortment likely explains the generation of new haplotypes to a greater degree than crossovers, the authors performed this analysis with human chromosomes and explicitly include the caveat that their modeling assumes uniform gene density across chromosomes. We know, for example, that this is not true in C. elegans. It would be interesting to perform the same analysis with C. elegans chromosomes in control and pch-2 mutants, taking into account this important difference.

      (2) Figure 2. It would really help the reader if an arrow and text were shown below each irradiation sign to indicate the stage in meiosis in which the irradiation was done as well as another arrow in the late pachytene box to show when the COSA-1 foci were analyzed. In general, having text in the figures that help stage the timing in meiosis would help the non C. elegans reader. This is also an issue where staging of C. elegans is shown (Figure 4). 

      We have made these changes to Figure 2. To help readers interpret Figure 4, we have added TZ and LP to the graphs in Figure 4B and 4D and indicated what these acronyms (transition zone and late pachytene, respectively) are in the Figure legend.

      (3) Page 12, line 288. It would be valuable to first outline why the him3-R93Y and htp-3H96Y alleles were chosen. This was eventually done on Page 13, but introducing this earlier would help the reader. 

      We have introduced these mutations earlier in the manuscript.

      (4) Page 13, line 323. A one sentence description of the OLLAS tagging system would be useful. 

      We have added this sentence: “we generated wildtype animals and pch-2 mutants with both GFP::MSH-5 and a version of COSA-1 that has been endogenously tagged at the N-terminus with the epitope tag, OLLAS, a fusion of the E. coli OmpF protein and the mouse Langerin extracellular domain”

      Reviewer #2 (Recommendations for the authors): 

      (1) The title is a little awkward. Consider: PCH-2 controls the number and distribution of crossovers in C. elegans by antagonizing their formation 

      We have made this change.

      (2) Abstract: 

      Consider removing "that is observed" from line 20. 

      We have made this change.

      I'm confused by the meaning of "reinforcement of crossover-eligible intermediates" from line 27. 

      We have removed this phrase from the abstract.

      A definition of crossover assurance would be helpful in the abstract. 

      We have added this to the abstract: “This requirement is known as crossover assurance and is one example of crossover control.”

      (3) Line 36: I know I am a stickler, but many meioses only produce one haploid gamete (mammalian oocytes, for example).

      Thanks for the reminder! We have removed the “four” from this sentence.

      (4) Line 284 - are you defining MSH-5 foci as crossover-eligible intermediates? If so, please state this earlier. 

      We have added this to the introduction to this section of the results: “In C. elegans, these crossover-eligible intermediates can be visualized by the loading of the pro-crossover factor MSH-5, a component of the meiosis-specific MutSγ complex that stabilizes crossover-specific DNA repair intermediates called joint molecules”

      (5) Can the control be included in Figure S1? 

      We have made this change.

      (6) Can you define that crossover designation is the formation of a COSA-1 focus? 

      We did this in the section introducing GFP::MSH-5: “In the spatiotemporally organized meiotic nuclei of the germline, a functional GFP tagged version of MSH-5, GFP::MSH-5, begins to form a few foci in leptotene/zygotene (the transition zone), becoming more numerous in early pachytene before decreasing in number in mid pachytene to ultimately colocalize with COSA-1 marked sites in late pachytene in a process called designation” 

      (7) Would it be easier to see the effect of DSB to crossover eligible intermediates in Spo-11, Pch-2 vs. Spo-11 mutant with irradiation using your genetic maps? At least for early vs. late breaks? 

      Unfortunately, irradiation does not show the same bias towards genomic location that endogenous double strand breaks do so it is unlikely to recapitulate the effects on the genetic map.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      In my estimation, the following would improve this manuscript:

      (1) The physiological relevance of these data could be better highlighted. For instance, future work could revolve around incubating oocytes with oviduct fluid (or OVGP1) to reduce polyspermy in porcine IVF, and naturally improve sperm selection in human IVF.

      Thank you for the suggestions. We have added these physiological relevance points at the end of the discussion.

      (2) Biological and technical replicate values for each experiment are unclear - for semen, oocytes, and oviduct fluid pools. I suggest providing in the Materials and Methods and/or Figure legends.

      Biological and technical replicates are now indicated in M&M. The numbers of oocytes or ZPs used were already indicated in every Supplementary Table.

      (3) Although differences presented in the bar charts seem obvious, providing statistical analyses would strengthen the manuscript.

      Statistical analyses are now indicated in each bar chart.

      (4) Results are presented as ± SEM (line 677); however, I believe standard deviation is more appropriate.

      This was a mistake; all the results are indicated as standard deviation.

      (5) Given the many independent experimental variables and combinations, a schematic depiction of the experimental design may benefit readers.

      A schematic depiction of the experimental design is now included as Figure 1. This new figure shifts the numbering of the remaining figures.

      (6) Attention to detail can be improved in parts, as delineated in the "author recommendation" review section.

      Done

      Reviewer #2 (Public review):

      Weaknesses:

      The authors postulate a role for oviductal fluid in species-specific fertilization, but in my opinion, they cannot rule out hormonal effects or differences in the method of oocyte maturation employed.

      As we indicate below, the effect of hormones has been analyzed, and we have demonstrated that it is not the cause of zona pellucida specificity.

      They also cannot unequivocally prove that OVGP1 is the oviductal protein involved in the effect. Additional experiments are necessary to rule out these alternative explanations.

      Our work does not rule out the involvement of other proteins, but it does show that OVGP1 is involved in the process.

      When performing the EZPT assay on mouse oocytes obtained either from the ovary or from the oviduct, the oocytes obtained from the ovary came from mice primed with eCG, whereas the ones collected from the oviduct were obtained from superovulated mice (eCG plus hCG). This difference in the hormonal environment may make a difference in the properties of the ZP. Additionally, the ones obtained from the ovary were in vitro matured, which is also different from the freshly ovulated eggs and, again, may change the properties of the ZP. I suggest doing this experiment superovulating both groups of mice but collecting the fully matured MII eggs from the ovary before they get ovulated. In that way the hormonal environment will be the same in both groups and in both groups, oocytes will be matured in vivo. Hence, the only difference will be the exposure to oviductal fluids.

      In Figure 2, we compare ZPs from murine oocytes obtained from the ovary using only PMSG with ZPs from oviductal oocytes treated with both hCG and PMSG. In Figure 7, however, we compared ZPs from murine oocytes exposed only to PMSG, with the only difference being whether or not they had been in contact with OVGP1. This shows that it is not the effect of the hormone but rather the contact with OVGP1 that determines their specificity.

      Mice with OVGP1 deletion are viable and fertile. It would be quite interesting to investigate the species-specificity of sperm-ZP binding in this model. That would indicate whether OVGP1 is the only glycoprotein involved in determining species-specificity. Alternatively, the authors could immunodeplete OVGP1 from oviductal fluid and then ascertain whether this depleted fluid retains the ability to impede cross-species fertilization.

      We agree with the reviewer that it would be interesting to investigate sperm-ZP binding in this model. Unfortunately, we do not have the OVGP1 knockout mouse strain. We also believe that immunodepletion of OVGP1 would not completely remove the protein, so its effect would likely not be entirely eliminated.

      What is the concentration of OVGP1 in the oviduct? How did the authors decide what concentration of protein to use in the experiments where they exposed ZPs to purified OVGP1? Why did they use this experimental design to check the structure of the ZP by SEM? Why not do it on oocytes exposed to oviductal fluid, which would be more physiological?

      We have included in the manuscript that the concentration of OVGP1 in the oviductal fluid was quantified using ImageJ software by comparing the mean gray value of the band in the oviductal fluid lane to that of the band in the recombinant protein lane. From this ratio, together with the known protein concentration of the recombinant preparation and the total protein amount of the oviductal fluid, the concentration of OVGP1 in the oviductal fluid was determined as the average of three western blots. The concentration of OVGP1 in oviductal fluid was in the range of 100-150 ng/µL in mice and 150-200 ng/µL in cows. We have also included in the manuscript the concentration that we used for the EZPTs: 30 ng/µL of recombinant OVGP1 (bovine, murine and human) for 30 minutes in 20 µL drops. With this concentration, we observed a clear effect on zona specificity with no negative impact on the gametes.
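      As an illustration only, the densitometric estimate described in this response can be sketched in a few lines. This is not the authors' analysis script: the function name, the assumption that band intensity scales linearly with protein mass, and every number below are hypothetical and chosen purely to show the arithmetic (ratio of gray values, scaled by the loaded recombinant amount, averaged over three blots).

```python
from statistics import mean


def ovgp1_ng_per_ul(gray_fluid, gray_recombinant, recombinant_ng, fluid_ul):
    """Estimate OVGP1 concentration (ng/uL) in oviductal fluid from one blot.

    Assumes band intensity is proportional to protein mass (a hypothetical
    simplification: no background subtraction or saturation check).
    """
    # Mass of OVGP1 in the fluid lane, inferred from the recombinant standard.
    ovgp1_ng = recombinant_ng * (gray_fluid / gray_recombinant)
    # Convert mass to concentration using the volume of fluid loaded.
    return ovgp1_ng / fluid_ul


# Hypothetical readings from three blots (made-up numbers, not study data):
# (gray value of fluid band, gray value of recombinant band,
#  ng of recombinant protein loaded, uL of oviductal fluid loaded)
blots = [
    (90.0, 60.0, 80.0, 1.0),
    (100.0, 80.0, 100.0, 1.0),
    (66.0, 60.0, 100.0, 1.0),
]

# One estimate per blot, then the average across the three blots.
estimates = [ovgp1_ng_per_ul(*blot) for blot in blots]
average_ng_per_ul = mean(estimates)
```

      A linear relationship between gray value and protein mass only holds within the dynamic range of the blot, so a real analysis would normally check this with a dilution series of the recombinant standard.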

      As you can see in Supplementary Fig. S8B, we already performed SEM on oocytes exposed to oviductal fluid.

      None of the figures show any statistical analysis. Please perform analysis for all the data presented, include p values, and indicate which statistical tests were performed. The Statistical analysis section in the Methods indicating that repeated measures ANOVA was used must refer to the tables. Was normality tested? I doubt all the data are normally distributed, in which case using ANOVA is not appropriate.

      Statistical results are now included in each Figure and Table, and all the statistical analyses are described. All the data passed tests of normality, homogeneity of variance and independence; for this reason, the data were analyzed using a one-way ANOVA followed by Tukey's post hoc test. The significance level was set at p < 0.05.
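      For readers less familiar with the test named in this response, the F statistic of a one-way ANOVA can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the authors' analysis: the function name and the example groups are hypothetical, and the p value and Tukey's post hoc comparisons (which require the F and studentized range distributions from a statistics package) are deliberately not computed here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three treatment groups (not study data).
example_groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
]
f_stat, df_between, df_within = one_way_anova_f(example_groups)
```

      The F statistic is then compared against the F distribution with the two returned degrees of freedom; only if that overall test is significant are pairwise differences examined with a post hoc test such as Tukey's.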

      Why was OVGP1 selected as the probable culprit of the species specificity? In the Results section entitled "Homology of bovine, human and murine OVGP1 proteins..." the authors delve into the possible role of this protein without any rationale for investigating it. What about other oviductal proteins?

      A sentence indicating this rationale for investigating OVGP1 has been introduced in this paragraph.

      Reviewer #3 (Public review):

      Weaknesses:

      The manuscript began with a well-written introduction, but problems started to surface in the Results section, in the Discussion, as well as in the Materials and Methods. Major concerns include inconsistencies, misinterpretation of results, lacking up-to-date literature search, numerous errors found in the figure legends, misleading and incorrect information given in the Materials and Methods, missing information regarding statistical analysis, and inadequate discussion. These concerns raise questions regarding the authenticity of the study, reliability of the findings, and interpretation of the results. The manuscript does not provide solid and convincing findings to support the conclusion.

      We have modified and clarified all the issues, some of which are misunderstandings. We have also performed the suggested experiment of putting sperm in contact with OVGP1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Ensure consistency in (past) tense, for example, "decondensed" (line 102), "induced" (line 103), and elsewhere.

      Done

      (2) Replace "table" with "Table" throughout.

      Done

      (3) The authors often refer to "co-incubation". I believe this should read "incubation". My understanding is that oocytes were incubated with oviduct fluid or sperm but never both simultaneously as "co-incubation" implies.

      Done

      (4) Synonymous terms "OVGP1" and "oviductin" are used interchangeably. Consider using one or the other for consistency.

      We believe that by using both terms, reading is more fluid.

      (5) Delete "around" on line 256 and "approximately" on line 263 and provide actual percentages.

      Done

      (6) The point of the sentence on lines 311-313 is unclear to me.

      Rewritten

      (7) Suggest specifying "wildtype" on line 419.

      All the mice used in this work are wildtype

      (8) Do the authors have details regarding cattle oocyte donor breeds?

      Done

      (9) What do the authors mean by "strengthen" on line 500?

      The word “strengthen” has been changed to “carefully isolated”.

      (10) Ponceau and vinculin (Figure 3) details are not provided in the manuscript.

      Ponceau and vinculin details are now included in the manuscript

      (11) Address formatting issues (e.g. citation 26 among others).

      Done

      (12) Primary and secondary antibody controls for immunofluorescent imaging (to fully exclude autofluorescence) are lacking.

      Controls for immunofluorescent imaging are indicated in Supplementary Figure S7.

      (13) The corresponding author on the manuscript and in the eLife submission system are different

      This was a problem during submission; it is now corrected.

      Reviewer #2 (Recommendations for the authors):

      (1) For the experiment depicted in Figures 3C and D, the authors need to perform a negative control to demonstrate that this fluorescent signal is specific. What happens if they express a different FLAG-tagged protein instead of bOVGP1 and mOVGP1? FLAG antibodies give quite strong non-specific binding. Or if they expressed untagged bovine and mouse OVGP1?

      The negative controls are shown in Supplementary Figure S7. A rabbit polyclonal antibody to human OVGP1 was used on murine and bovine IVM ZPs from ovaries and on murine superovulated ZPs recovered from mouse oviducts. Given the specificity of the antibody, there is a remarkable difference between the ZPs that were not incubated with any OVGP1 and those carrying endogenous OVGP1.

      Also, IVM mouse and bovine oocytes incubated or not with OF were immunoblotted with anti-Flag-tag antibody. Since none of them contains Flag-tagged OVGP1, there is no signal in the immunofluorescence.

      (2) For the Western blots of recombinant proteins, why are the authors not showing the blots using His and FLAG tag antibodies? Is the 50-kDa band observed for the mouse OVGP1 detected with His-Tag antibody?

      We have included a Supplementary Figure S6 showing the western blot with anti-His and anti-Flag antibodies. The protein around 50 kDa is not a specific band (there is no signal with anti-Flag). This new figure shifts the numbering of the remaining supplementary figures (S6-S8).

      (3) How was the estrous cycle stage determined in mice? It is not described in the Methods.

      Estrous cycle stage was determined in mice by visual examination of the vaginal opening and cytological examination of the vaginal smear. This is now included in the M&M.

      (4) For sperm binding, what does the percentage mean?

      This was a mistake: the percentages referred to pronuclear formation and cleavage, not to sperm binding. This is now corrected.

      (5) In Figure 3A, the labels for regions C, D, and E are mixed up. It is regions A and C that are conserved (or orange and blue, if the letters are incorrect). The purple region is only present in the mouse (E?), and the red region (D?) is only in the human form. Also, the legend for this panel is repeated verbatim in the Results section. Please remove one of them.

      Errors in Figure 3a have been corrected. Legend repetition is removed.

      (6) In the title of Figure 1B and in different places in the text, it should be mouse (not mice) oocytes.

      Done

      (7) In line 140, I would change the part indicating "We extracted the cytoplasmic contents from the oocytes". It is not only the cytoplasm, but all the oocyte, including the nucleus and membranes, that are being removed.

      Done

      (8) Please rephrase the sentence in lines 245-247, as it is quite confusing.

      Done

      (9) In line 236, the authors indicate that "During in vitro maturation (IVM), oocytes displayed a porous ZP structure...". Do they mean after IVM? When were those oocytes collected for SEM?

      The sentence has been modified to read “after IVF”. Bovine oocytes were collected from slaughterhouse ovaries and were similar to those used in the rest of the experiments in the manuscript.

      (10) In the legend of Figure 1, please indicate what the parthenogenic group is.

      Done

      (11) In the legend to Figure 1G, the text indicates "Note sperm only appear outside the zona". However, I cannot see any sperm in that image.

      The phrase has been removed: when the image was enlarged to better show the sperm that are inside the area, the sperm that are outside were no longer visible.

      (12) In the legend to Figure 2 describing the different zona pictures, the letters of the panels are not correct.

      Done

      (13) In line 999, please provide the right concentration for NMase (it indicates 10 μ/mL).

      Done

      (14) Where does the model depicted at the end of the manuscript go? Is it a Figure? A graphical abstract? In that model, please correct some typos: it should be "ZP obtained from ovarian oocytes"; and change specie for species in all three panels.

      Done. It is a model (Fig. 10)

      (15) The FITC-PNA staining to visualize acrosomes is not described in the Methods section.

      Done

      Reviewer #3 (Recommendations for the authors):

      The present study reports findings from a series of experiments suggesting that bovine oviductal fluid and species-specific oviductal glycoprotein (OVGP1 or oviductin) from bovine, murine, or human sources modulate the species specificity of bovine and murine oocytes. The manuscript began with a well-written introduction, but problems started to surface in the Results section, Discussion as well as in the Materials and Methods. Major concerns include inconsistencies, misinterpretation of results, lacking up-to-date literature search, numerous errors found in the figure legends, misleading and incorrect information given in the Materials and Methods, missing information regarding statistical analysis, and inadequate discussion.

      We have modified and clarified all the issues, some of which are misunderstandings. We have also performed the suggested experiment of putting sperm in contact with OVGP1.

      Specific comments:

      (1) Lines 142 to 143 on page 5: It is stated that "Because this experiment was done on empty ZPs, we called this test "empty zona penetration test" (EZPT)". In fact, the experiment was not actually done on empty ZPs, but on oocytes with the ooplasm extracted. Therefore, the zona pellucidae used in the experiment were not empty but contained an intact zona matrix of glycoproteins. The term "EZPT" used by the authors in the manuscript is a misnomer. A better term should be used to reflect the ZPs which were intact and not empty.

      We extracted the cytoplasmic contents, including all the organelles, the nucleus and membranes, and the polar body. This has been clarified in the text.

      (2) The authors need to distinguish between sperm penetration and sperm binding in the manuscript. In lines 169 to 177 on page 6, the authors mixed up the terms "penetration" and "binding" in the text. In writing about events leading to fertilization in reproductive biology, the term "sperm binding" refers to the interaction between the sperm plasma membrane and the oocyte zona pellucida (ZP), whereas the term "sperm penetration" refers to the passage of the sperm through the ZP. Therefore, the statements in lines 169 to 177 describing the binding of bovine, murine, and human sperm to bovine oocytes with and without prior treatment with oviductal fluid are misleading and not correct. In fact, Figure 2 and Table 6 show sperm penetration and not sperm binding.

      Figures 2A and 2B (now 3A and 3B) and Table S6 show both sperm penetration (% penetration rate and average sperm in penetrated ZPs) and sperm binding (average sperm bound to ZPs). Throughout the manuscript, a clear distinction is made between sperm attached to the ZP and sperm that have penetrated it.

      (3) Lines 182 to 187 on page 6: What is being described in the text here does not match what is being shown in Figure 3A. As a result, the information provided in lines 182 to 187 is not correct and misleading. For example, it is stated in lines 182 to 183 that "As depicted in Fig. 3A, the sequences of these three OVGP1 have five distinct regions (A, B, C, D and E)." However, Figure 3A shows that hOVGP1 and mOVGP1 both have only 4 regions and bOVGP1 has only 3 regions. None of the three has 5 regions. In lines 183 to 184, the authors continued to state that "Regions A and D are conserved in the different mammals." This statement is also not true because Figure 3A shows that only region A is conserved in all three species but not region D which is found only in the human. What is stated in lines 186 to 187 is also not correct based on the information provided in Figure 3A. It is stated here that "Region C is an insertion present only in the mouse (Mus) and region E is typical of human oviductin." However, based on the color codes provided in Figure 3A, region C is present in all three species while region E is present only in the mouse.

      Errors with naming regions in Figure 3A (now 4A) have been corrected.

      (4) In lines 195 to 197 on page 6, the authors stated that "Western blots of the three OVGP1 recombinants indicated expected sizes based on those of the proteins: 75 kDa for human and murine OVGP1 and around 60 kDa for bovine OVGP1 (Fig. 3B)." However, the expected size of the recombinant human OVGP1 is not in agreement with what has been published in literature regarding the molecular weight of recombinant human OVGP1. It has been previously reported that a single protein band of approximately 110-150 kDa was detected for recombinant human OVGP1 using an antibody against human OVGP1. The authors provided Western blots of murine oviductal fluid and bovine oviductal fluid in Figure 3B but not a Western blot of native human oviductal fluid. The latter should have been included for a comparison with the recombinant human OVGP1.

      We do not have human oviductal fluid, but we have now included a Supplementary Figure S6 of a western blot with antibodies against His and Flag (both present in the recombinant OVGP1), which shows that the size of the recombinant protein is as indicated in Figure 3B (now 4B).

      (5) Lines 220 to 229 on page 7: In this experiment, the authors conducted the EZPT using ZPs from bovine oocytes that were either treated with or without bOVGP1 followed by incubation, respectively, with homologous sperm (bovine) and heterologous sperm (human and murine). This is a logical experiment to determine if OVGP1 plays a species-specific role in setting the specificity of the zona pellucida. However, in the in vivo situation, sperm that reach the lumen of the ampulla region of the oviduct where fertilization takes place are also exposed to oviductal fluid of which OVGP1 is a major constituent. Therefore, an additional experiment in which sperm are treated with OVGP1 prior to incubation with ZP should be carried out for a comparison.

      The additional experiment in which sperm are treated with OVGP1 prior to incubation with ZP has been done (Table S9). No effects were observed. This is now included in the manuscript.

      (6) Regarding the results obtained with the use of neuraminidase (lines 278 to 293 on pages 8 to 9), if neuraminidase treatment of bovine ZP prevented bovine sperm penetration regardless of whether ZPs had been or had not been in contact with OVGP1, that means OVGP1 is not responsible for penetration despite the description of earlier findings in the manuscript. Sialic acid is likely associated with the sugar side chains of ZP glycoproteins and not sugar side chains of OVGP1. To attribute the species-specific property of sialic acid to OVGP1 for sperm binding, an experiment in which OVGP1 will be treated with neuraminidase prior to performing the EZPT is needed.

      We conducted the experiment by treating only OVGP1 with neuraminidase and then separating OVGP1 from the enzyme before incubating the treated OVGP1 with ZPs. The results agree with our previous findings, indicating the importance of sialic acid on OVGP1 for sperm binding and penetration, and confirming that OVGP1 is responsible for species-specific penetration. Results are shown in Fig. 9 and Table S14.

      (7) The Discussion appears superficial and a more in-depth discussion regarding the results obtained in the present study in relation to other reports about OVGP1 published in literature is needed (e.g. a recent paper published by Kenji Yamatoya et al. (2023) Biology of Reproduction https://doi.org/10.1093/biolre/ioad159). Lines 317 to 342 of the Discussion on pages 10 to 11 should belong to the Introduction.

      The results of Yamatoya et al. are now included in the Discussion. Part of the Discussion (lines 317 to 342) is now in the Introduction.

      (8) It is not clear what the authors exactly want to say in lines 343 to 344 of the Discussion on page 11. It is stated here that "The empty zona penetration test (EZPT) enables heterologous sperm to overcome the oocyte's second barrier, the plasma membrane or oolemma." Do the authors mean that the sperm can now enter the empty space encircled by the ZP without having to go through the plasma membrane or oolemma? In Figure S4 which depicts the method used to empty the ooplasm in the bovine oocyte, does the method extract only the ooplasm (or cytoplasmic contents) leaving behind the intact plasma membrane or oolemma? This needs to be clearly shown and clearly explained. High magnifications of the zona pellucida are also needed to show whether the plasma membrane (or oolemma) is still present and intact after extraction of the ooplasm.

      This is clearly explained in the text. To obtain empty ZP, everything except ZP (nucleus, organelles, membranes and cytoplasmic contents of the oocytes) was removed using a micromanipulator, following the procedure outlined in Figure S4.

      (9) The authors stated in the Discussion in lines 383 to 383 on page 12 that "After ovulation, the changes reported in the carbohydrate composition of the ZP (3, 25) are likely induced by the addition of glycoproteins of oviductal origin, as we have seen here with OVGP1." There is no evidence in the present study to suggest that OVGP1 or glycoproteins of oviductal origin have changed or can change the carbohydrate composition of the ZP. At present, it is not known if OVGP1 or glycoproteins of oviductal origin directly interact with ZP glycoproteins (including ZP1, ZP2, ZP3 and/or ZP4) that make up the zona matrix.

      There is scientific evidence suggesting that oviductal glycoproteins, including OVGP1, interact with the zona pellucida (ZP) glycoproteins of the oocyte. Studies have shown that OVGP1 binds to the ZP of the oocyte. Specifically, OVGP1 is thought to interact with ZP glycoproteins, such as ZP2 and ZP3, in a way that may help stabilize the oocyte or modify the ZP structure during its passage through the oviduct. This interaction is believed to influence processes like sperm binding, oocyte maturation, and potentially the prevention of polyspermy during fertilization. For example, in several studies, the absence of OVGP1 in knockout animals (such as in Ovgp1-KO hamsters) has been associated with impaired fertilization and embryonic development, which indicates the importance of this interaction. However, the detailed molecular mechanisms and functional significance of these interactions require further exploration. We have used the word “likely” to soften this statement.

      Velásquez, J. G., Canovas, S., Barajas, P., Marcos, J., Jiménez‐Movilla, M., Gallego, R. G., ... & Coy, P. (2007). Role of sialic acid in bovine sperm–zona pellucida binding. Molecular reproduction and development, 74(5), 617-628.

      Kunz, P., et al. (2013). "The role of oviductal glycoprotein 1 in sperm–egg interaction and early embryonic development." Reproduction, 145(3), 225-233. DOI: 10.1530/REP-12-0300

      Yamatoya, K., Kurosawa, M., Hirose, M., Miura, Y., Taka, H., Nakano, T., ... & Araki, Y. (2024). The fluid factor OVGP1 provides a significant oviductal microenvironment for the reproductive process in golden hamster. Biology of reproduction, 110(3), 465-475.

      (10) Lines 390 to 391 page 12: The statement "This determines that OVGP1 modifications are critical to define the barrier among the different species of mammals." needs to be rephrased because there is no evidence in the present study showing that OVGP1 has been modified. There are many concerns with errors, important information that is missing, and inconsistencies as well as wrong and misleading information in the Materials and Methods which are troublesome. These concerns raise questions regarding the authenticity and reliability of the study. Some of the major concerns are listed below:

      All concerns have been addressed.

      (11) It says in line 399 on page 13 that "Human semen samples were obtained from a normozoospermic donor...". Do the authors really mean that the semen samples were obtained from only one donor?

      Samples were obtained from three normozoospermic donors; this is now indicated in the M&M.

      (12) In lines 409 to 411 on page 13, what do the authors mean by "...the samples were frozen into pellets..."? Was centrifugation of the samples carried out prior to freezing the samples? Secondly, what do the authors mean by "....and stored in liquid nitrogen at -196°C or lower.", particularly what do the authors mean by "or lower"? The temperature of liquid nitrogen is -196°C. What is the "lower" temperature?

      Centrifugation of the samples was not carried out at this stage. A more detailed protocol is now included. The word “lower” has been removed.

      (13) Line 424 on page 13: Provide the full name of "M2" when it is first used in the text then followed by the abbreviation.

      Done

      (14) Is there a reason why different counting chambers were used to determine sperm concentrations? In line 432 on page 13, a Thomas cell counting chamber was used to determine the sperm count of epididymal mouse sperm whereas it is mentioned in line 441 on page 14 that a Neubauer cell counting chamber was used to determine epididymal cat sperm. Furthermore, where did the cat's sperm come from?

      The cat sperm was obtained and processed at the Faculty of Veterinary Medicine and the rest of the samples were processed in the INIA-CSIC lab, and different chambers were used in both places.

      (15) The mention of the use of cat spermatozoa in line 439 on page 14 is a worrisome problem of the manuscript. The present study used bovine, mouse, and human sperm and not cat. Therefore, the sudden mentioning of the use of cat spermatozoa in the Materials and Methods is troublesome and worrisome. It appears that the paragraph from lines 439 to 450 was directly copied and pasted from previously published work. Furthermore, lines 441 to 445 do not flow and do not make sense. In fact, what is described in this paragraph (lines 439 to 450) does not appear to correspond to the method(s) used to obtain the results presented in the Results section of the manuscript.

      We do not understand why the reviewer says that we did not use cat sperm. This study uses cat sperm, and the results are shown in Figure 1A (now 2A). We have modified the M&M to clarify the description of the freezing procedure.

      (16) Similarly, several problems are also found in the paragraphs (lines 453-478 on page 14) describing the methods and procedures to obtain homologous and heterologous IVF of bovine oocytes. Firstly, it is mentioned here (in line 460) that COCs were co-incubated with selected sperm without removing the cumulus cells. However, the results of the sperm penetration experiment indicated otherwise. Figures 2 and 3 show that the oocytes were denuded of cumulus cells. Secondly, it is very worrisome and troublesome to read what is written in line 468 on page 14 that "...from other species (cat, human, mouse, and rabbit)." One wonders where the cat and rabbit came from. Again, it appears that this paragraph was directly copied and pasted from previously published work.

      Cat sperm was used in this manuscript, and it is correctly indicated in every section and figure. Regarding the IVF and EZPT protocols: in the IVF protocol for bovine oocytes, COCs were used without removing the cumulus cells. For the EZPT, cumulus cells were removed; this is described in the subsequent sections of the Materials and Methods. The word “rabbit” was a mistake and has been removed.

      (17) In lines 468 to 469 on page 14, it is mentioned that "Sperm-egg interactions were assessed through a sperm-ZP binding assay...". The authors only examined sperm penetration in their study. Therefore, this needs to be specified in the Materials and Methods. Secondly, the authors did not use the conventional sperm-ZP binding assay in their study. Instead, they used the EZPT in their study. There appear to be many inconsistencies throughout the manuscript.

      In the IVF experiments using bovine COCs (Fig 2A and C, Fig 1S to 3S, and Tables 1S to 4S), conventional sperm-egg interaction was assessed 2.5 hours after IVF. The EZPT was used in the rest of the experiments. IVF with COCs and EZPT with ZPs are different experiments.

      (18) Lines 480 to 489 on page 15 under the sub-heading of "In vitro culture of presumptive zygotes to first cleavage embryos on Day 2" do not provide the correct methodology used for obtaining the results presented in the manuscript. In line 482, it is not clear where the "synthetic oviductal fluid" came from. In fact, in the Results section, none of the results came from the use of synthetic oviductal fluid. In line 487, humans and rabbits are mentioned here. However, human and rabbit oocytes were not used in the present study. It is very strange indeed to read human and rabbit in the sentence.

      A reference for SOF is now included. Human results are in Fig 1A; the sentence refers to the cultures of bovine oocytes inseminated with bull, human, mouse, or cat sperm. The word “rabbit” was a mistake and has been removed from the manuscript.

      (19) In line 500 on page 15, what do the authors mean by "Each oviduct was strengthen by removing the adjacent tissue..."?

      The sentence has been modified.

      (20) On page 15 in the Materials and Methods, the authors described the collection of bovine and mouse oviductal fluid. However, there is no mention of human oviductal fluid and how it was collected. This important information is missing.

      We did not use human oviductal fluid in this manuscript.

      (21) Line 510 on page 15: The sub-heading of "Preparation of empty zonae pellucidae from bovine ovarian oocytes" should be rephrased. As pointed out earlier in my review, the ZPs prepared by the authors were intact and not "empty". It was the oocyte which was empty after extraction of the ooplasm.

      Everything except the ZP was removed from the oocyte; this is now clarified in the manuscript.

      (22) Line 518 on page 16 and line 553 on page 17: "Figure S5" should be "Figure 4S".

      Done

      (23) Line 538 and line 547 on page 16: "mice oocytes" should be "mouse oocytes".

      Done

      (24) On page 17, the procedures for in vitro fertilization, sperm penetration, and binding assessment in mice were described here in lines 560 to 574. Several problems are noted in this paragraph as listed below:

      a. As mentioned earlier the authors in the present manuscript mixed up sperm penetration and sperm binding which are two separate events. Based on the results presented in the manuscript, they represent sperm penetration and not sperm binding. Therefore, the authors need to precisely explain in the manuscript whether the results presented refer to sperm penetration or sperm binding.

      Both sperm penetration and binding have been analyzed in this work.

      b. In line 570 on page 17, the term "insemination" is wrongly used here. Insemination is the introduction of semen into the female reproductive tract either through sexual intercourse or through an instrument. The procedure used in the present study was carried out in vitro in a co-incubation manner and not by transferring sperm into the female reproductive tract.

      The word “insemination” has been changed to “incubation”.

      c. Information regarding procedures for treatment with various oviductal fluid and OVGP1s are all missing in the Materials and Methods.

      This information is now in M&M

      d. The concentrations of various oviductal fluids and OVGP1s used and the number of ZPs used in each incubation are also missing.

      Concentrations are now indicated in the manuscript. All the numbers and ZPs used are indicated in supplementary figures.

      (25) Lines 577 to 603 on pages 17 to 18: Were recombinant bovine and murine glycoproteins prepared using the same methodology? In line 595 on page 18, it is stated that "Supernatant was saved in subsequent experiments." It is not clear exactly what experiments the supernatant was subsequently used in.

      Details about how the bovine and murine glycoproteins were prepared are now included. The sentence about subsequent experiments has been deleted; the supernatant was used for the next steps of protein purification.

      (26) What is being described in lines 604 to 609 on page 18 is problematic. The paragraph starts by saying that "Human recombinant oviductin was obtained from Origene Technologies....". Strangely, the paragraph continues by saying that the recombinant proteins were produced by transfection in HEK293T...". If recombinant human OVGP1 had already been obtained from Origene Technologies, why did the authors want to produce it again? It does not make sense.

      We briefly described the method that Origene used for the production of the human recombinant OVGP1

      (27) In lines 626 to 627 on page 18, it is stated that "Zonae pellucidae previously incubated with OVGP1 proteins from several species and murine oviductal fluid...". Were the zonae pellucidae previously incubated with only murine oviductal fluid or also with others?

      The ZPs were incubated either with OVGP1 or with oviductal fluid; this is now clarified in the text.

      (28) In lines 638 and 639 on page 19, can the authors please explain the difference between "endogenous OVGP1 and bOVGP1" and "exogenous recombinant hOVGP1 and mOVGP1"?

      This is now clarified

      (29) As stated in lines 676 to 679 on page 20, statistical analysis was performed in the study. Strangely, no "n" numbers and p values were provided in any of the figures that require statistical analysis. This is problematic.

      Statistical analysis and significant differences are now included in the figures; all the numbers used are given in the supplementary tables that correspond to the figures.

      There are also many errors noted in the Figure Legends. These concerns raise questions regarding the reliability of the findings and interpretation of the results. Some major ones that require attention are listed below:

      (30) Figure legend 1 on page 27: In line 912, where did the "cat sperm" come from? In line 913, where did the "feline sperm" come from? In line 918, as pointed out earlier, the term "empty zona penetration test (EZPT)" is a misnomer and should be replaced with a better term. In line 924, it is stated that "Note sperm only appear outside the zona." However, no sperm can be seen outside the zona pellucida shown in Figure 1.

      Cat sperm is used in this manuscript. The term EZPT is now clarified. The sentence about sperm outside the ZP has been removed.

      (31) Figure legend 2 on page 27 (lines 928 to 940) needs to be rewritten. Some of the sentences are not clearly written. Authors, please check all the capital labeling letters some of which appear to be wrong.

      Done

      (32) As is written, Figure legend 3 on pages 28 and 29 (lines 943 to 959) presents many problems:

      a. Contrary to what is stated in the figure legend, not all five regions are present in the hOVGP1, mOVGP1, and bOVGP1.

      Done

      b. Contrary to what is stated in line 946, region D is not conserved in the mouse and bull as shown in Figure 3A, and region C is not present only in the mouse.

      Done

      c. Based on what is shown in Figure 3A, region E is present only in the mouse and not in the human.

      Done

      d. What is stated in line 951 that "Proteins were expressed in mammalian cells..." is not correct. Based on the information provided in the manuscript, recombinant human OVGP1 was obtained from Origene Technologies and was not expressed in mammalian cells as claimed.

      All the recombinant proteins were produced in mammalian cells.

      (33) Figure legend 6 on page 28: In lines 985 to 986, what do the authors mean by "...and combinations of the three oviductins with sperm of the three species."? As is written, it appears that the bovine ZPs were pretreated with a combination of all three oviductins and then co-incubated with sperm from the bull, mouse and human together.

      We have clarified this sentence

      (34) What is described in the figure legend for the supplemental figure (Figure S7) does not make sense.

      The legend of Fig S7 (now S8) refers to pictures A to E; the legend is now clarified.

      (35) In addition to the figures and supplemental figures provided in the manuscript, there is also an additional figure labeled with "Model" showing three diagrams. Strangely, there is no mention of this additional figure in the manuscript. There is no figure legend for or description of this figure. It is not clear what is being shown in this figure, and it is not clear about the purpose of the use of this figure.

      We have added a legend for the model, which is now Figure 10.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, a chromosome-level genome of the rose-grain aphid M. dirhodum was assembled with high quality, and A-to-I RNA-editing sites were systematically identified. The authors then demonstrated that: 1) Wing dimorphism induced by crowding in M. dirhodum is regulated by 20E (ecdysone signaling pathway); 2) an A-to-I RNA editing prevents the binding of miR-3036-5p to CYP18A1 (the enzyme required for 20E degradation), thus elevating CYP18A1 expression, decreasing 20E titer, and finally regulating the wing dimorphism of offspring.

      Strengths:

      The authors present both genome and A-to-I RNA editing data. An interesting finding is that an A-to-I RNA editing site in CYP18A1 disrupts the miRNA binding site of miR-3036-5p, and that loss of miR-3036-5p regulation leads to less 20E and to winged offspring.

      Weaknesses:

      How crowding represses miR-3036-5p is still unclear.

      Reviewer #2 (Public Review):

      Summary:

      Environmental influences on development are ubiquitous, affecting many phenotypes in organisms. However, the molecular genetic and cellular mechanisms that transduce environmental signals are still only barely understood. This study examines part of one such intracellular mechanism in a polyphenic (or dimorphic) aphid.

      Strengths:

      While other published reports have linked phenotypic plasticity to RNA editing before, this study reports such an interaction in insects. The study uses a wide array of molecular tools to identify connections upstream and downstream of the RNA editing to elucidate the regulatory mechanism, which is illuminating.

      Weaknesses:

      While this system is intriguing, this report does not foster confidence in its conclusions. Many of the analyses seem based on very small sample sizes. It is itself problematic that sample sizes are not obvious in most figures, although based on the Methods section covering RNAseq, they seem to be either 3, 6 or 9, depending on whether stages were pooled, but that point is not made clear. With such small sample sizes, statistical tests of any kind are unreliable. Besides the ambiguity on sample sizes, it's unclear what error bars or whiskers show in plots throughout this study. When sample sizes are small, estimates of variance are not reliable. Student's t-test is not appropriate for comparisons with such small sample sizes. Presently, it is not possible to replicate the tests shown in Figures 3, 4 and 6. (Besides the HT-seq reads, other data should also be made publicly available, following the journal's recommendations.) Regardless, effect sizes in some comparisons (Fig 3J, 4A-C, 6E, H) are clearly not large, making confidence in conclusions low. The authors should be cautious about over-interpreting these data.

      We very much appreciate the time the reviewers spent on our manuscript, and we thank them for their valuable suggestions and comments.

      To Reviewer #1:

      At present, research on miRNAs mainly focuses on their role in gene regulation through binding to the mRNA of target genes; how miRNAs themselves are regulated has received less attention.

      Recent research indicates that the expression of miRNAs is also regulated at the transcriptional and post-transcriptional levels. Transcriptional regulation, including changes in the promoters of microRNA genes, and post-transcriptional mechanisms, such as changes in miRNA processing and stability, can both affect the final expression level of miRNAs.

      This article does not address how crowding treatment regulates miRNA expression. However, this is a very interesting question, and we will pay attention to it in our future research.

      Thank you for this suggestion.

      To Reviewer #2:

      (1) “Transgenerational wing dimorphism was observed in M. dirhodum in which crowding of the parent (100 mother aphids in a 10 cm³ tube) increased the winged offspring (Fig 3E).” In this experiment, over 250 offspring were used to calculate the proportion of winged and wingless individuals in the normal (277), crowding (255) and crowding+20E (272) groups, respectively.

      “The RNAi-mediated knockdown of CYP18A1 and ADAR2 can significantly increase the titer of 20E (Fig. 4E) and reduce the number of winged offspring by 29.6% and 24.4% (Fig. 4F), respectively.” In this experiment, over 245 offspring were used to calculate the proportion of winged and wingless individuals in the dsEGFP (273), dsCYP18A1 (248), and dsADAR2 (250) groups, respectively.

      “miR-3036-5p agomir and antagomir treatments could affect the proportion of winged offspring under normal conditions (Fig. 6F), but have no effect on the wing dimorphism of offspring under crowded conditions (Fig. 6L).” In this experiment, over 235 offspring were used to calculate the proportion of winged and wingless individuals in each group.

      We therefore consider our conclusion that crowding treatment, A-to-I RNA editing, and miRNAs can affect the wing dimorphism of offspring in M. dirhodum to be reliable, because the number of aphids counted in each group is sufficient.
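
      As a rough illustration of what these group sizes imply, the sketch below computes the widest possible 95% confidence interval for a proportion at each reported n. It is a minimal sketch that assumes a normal approximation and treats every offspring as an independent observation; neither assumption comes from the response itself, and the function name is hypothetical.

```python
import math

def worst_case_half_width(n, z=1.96):
    """Half-width of the normal-approximation 95% CI for a proportion,
    evaluated at p = 0.5, where the interval is widest."""
    return z * math.sqrt(0.25 / n)

# Offspring counts per group, as reported in the response above.
group_sizes = {
    "normal": 277, "crowding": 255, "crowding+20E": 272,
    "dsEGFP": 273, "dsCYP18A1": 248, "dsADAR2": 250,
}

half_widths = {g: worst_case_half_width(n) for g, n in group_sizes.items()}

for group, hw in half_widths.items():
    n = group_sizes[group]
    print(f"{group}: n={n}, 95% CI half-width <= {100 * hw:.1f} percentage points")
```

      Under these assumptions, every group's proportion is bounded to within roughly six to seven percentage points. If offspring from the same mother or tube are correlated, the effective sample size is smaller and the interval is wider.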

      (2) Quantitative PCR was used to detect changes in the expression levels of CYP18A1 and ADAR2 after treatment with crowding, 20E, dsRNA, miRNA agomir and antagomir, and the results are shown in Fig. 3J, 4A-C, 5B, 6B, H, respectively. Five biological replicates (more than 100 aphids per biological replicate) were used for each sample, which should be sufficient for qPCR experiments. Among these biological replicates, the differences in gene expression levels are relatively small.

      (3) The titer of 20E was measured after treatment with crowding, 20E, dsRNA, miRNA agomir and antagomir, and the results are shown in Fig. 3I, 4E, 6E, K, respectively. Eight biological replicates (more than 100 aphids per biological replicate) were used for each sample.

      The number of biological replicates used in each analysis and the number of aphids included in each biological replicate have been added in the Materials and Methods section. Thank you very much for pointing out this important issue.

      Reviewer #1 (Recommendations For The Authors):

      Several questions:

      (1) This study was conducted on the rose-grain aphid M. dirhodum. However, pea aphid Acyrthosiphon pisum seems to be a better object in wing dimorphism and development studies. Have the authors also identified the A-to-I RNA editing on pea aphids or other aphids?

      Wheat is one of the main grain crops in China and worldwide. Metopolophium dirhodum is one of the most important wheat aphids in China and poses a significant threat to grain production. The current study was conducted to determine the regulatory mechanism of wing dimorphism in M. dirhodum, which might be of great significance for better control of this pest in wheat production.

      The pea aphid certainly offers more established experimental tools and genomic resources. However, with the development of high-throughput sequencing technology, chromosome-level genomes of many insect species have been assembled. This means that a wide range of insects can now be studied as model species, not only Drosophila melanogaster, Acyrthosiphon pisum, etc.

      We did not identify A-to-I RNA editing in pea aphids or other aphids. A recent study has shown that editing events are poorly conserved across different Xenopus species. Even sites that are detected in both X. laevis and X. tropicalis show largely divergent editing levels or developmental profiles. In protein-coding regions, only a small subset of sites that are found mostly in the brain are well conserved between frogs and mammals. The conservation of RNA editing in aphids is still unknown, and we will continue to pay attention to this issue in our future research.

      Reference: Nguyen TA, Heng JWJ, Ng YT, Sun R, Fisher S, Oguz G, Kaewsapsak P, Xue S, Reversade B, Ramasamy A, Eisenberg E, Tan MH. Deep transcriptome profiling reveals limited conservation of A-to-I RNA editing in Xenopus. BMC Biology. 2023, 21(1):251.

      (2) "Two miRNA-target prediction software programs, miRanda and RNAhybrid, were used to identify the miRNAs that potentially act on CYP18A1. The results showed that miR-3036-5p could bind to the sequence containing edited position (editing site 528) of CYP18A1 in M. dirhodum." Is there any other miRNA that can also act on CYP18A1, thereby regulating its expression?

      The prediction results indicate that several other miRNAs can act on CYP18A1, but none of them can bind to this editing site (editing site 528). Therefore, we did not focus on the other miRNAs.

      (3) 11678 A-to-I RNA-editing sites were systematically identified in M. dirhodum. Does that mean RNAi-mediated knockdown of ADAR2 may affect the RNA-editing and expression of a large number of genes? Please clarify.

      It is of course possible that RNAi-mediated knockdown of ADAR2 affects the RNA editing and expression of a large number of genes. A-to-I RNA editing was also observed in 5 other genes that are involved in the 20E biosynthesis and signaling pathway, but no evident difference was identified in the RNA editing and expression levels of these 5 genes after crowding treatment (Fig. S2, Table S5). This suggests that the A-to-I RNA editing of CYP18A1 might be crucial in 20E-mediated wing dimorphism in M. dirhodum.

      (4) It is interesting that "the transcriptional level of ADAR2 was 2.19 fold higher in the crowding+20E treatment parent than that in the normal group, but no significant difference was identified between the crowding and normal groups". ADAR2 can be induced by 20E, rather than crowding. How should the author explain? It seems that 20E induction can also cause many RNA editing events.

      20-hydroxyecdysone (20E) can affect the growth and development, molting, metamorphosis, and reproductive processes of insects. According to this result, 20E induction can also cause RNA editing events by regulating the expression of ADAR2, which may provide a valuable reference for future studies on 20E. We will also continue to pay attention to this issue in our future research.

      (5) Authors provided a lot of text to describe the genome assembly. I don't think it's necessary, authors can make appropriate deletions.

      Thank you for this suggestion. This is the first high-quality chromosome-level genome of M. dirhodum, which will be very helpful for the cloning, functional verification, and evolutionary analysis of genes in this important species or even other Hemiptera insects. Therefore, I think it is necessary to provide a detailed description. We will also make appropriate deletions in the “Result and Discussion” sections.

      Reviewer #2 (Recommendations For The Authors):

      Additional concerns

      - With an existing genome sequence available for the pea aphid *Acyrthosiphon pisum*, why have these authors chosen to use the rose-grain aphid for this study? It would be helpful to address any limitations in *Acyrthosiphon pisum* or advantages in *Metopolophium dirhodum* that explain that decision.

      Wheat is one of the main grain crops in China and worldwide. Metopolophium dirhodum is one of the most important wheat aphids in China and poses a significant threat to grain production. The current study was conducted to determine the regulatory mechanism of wing dimorphism in M. dirhodum, which might be of great significance for better control of this pest in wheat production.

      The pea aphid certainly offers more established experimental tools and genomic resources. However, with the development of high-throughput sequencing technology, chromosome-level genomes of many insect species have been assembled. This means that a wide range of insects can now be studied as model species, not only Drosophila melanogaster, Acyrthosiphon pisum, etc.

      - In Figure 5E, what anatomy is being shown in FISH? Moreover, this represents a single sample. It would be preferable to include a supplemental figure with comparable images from at least 3 additional specimens.

      The image shows the whole aphid body. We have uploaded two additional FISH images as supplementary Fig. S5. Thank you for this suggestion.

      - L190: Conservation alone seems inadequate to conclude that a chromosome functions as a sex chromosome. It would be fine to note the homology between Chr1 and the X of other Aphidini, but there are other explanations for that. Inference that Chr 1 is a sex chromosome might come from observations in karyotypes (by relative size comparisons or ideally from FISH) or from comparison of reads mapped to the chromosomes, suggesting Chr1 is hemizygous in males.

      Karyotype analysis was not conducted in this research, so the sex chromosome was assigned based on chromosome homology between the M. dirhodum and A. pisum genomes. We have modified the description in the article accordingly. Thank you for this suggestion.

      - L205: It's unclear to me how to interpret RNA editing results, based on RNAseq data, that map to "intergenic regions", especially when this is such a large fraction (37.3%) of the total result. Does this suggest a fundamental problem with the analysis, that so much RNAseq data maps to parts of the genome that are not annotated as genes?

      Non-coding RNA regions often account for a large proportion of the genome, and these RNAseq reads (37.3%) map to non-coding RNA transcription regions located between protein-coding genes (intergenic regions).

      - L288-290: What degrees of confidence are attached to the predictions of these miRNA targets?

      There is no clear research indicating the accuracy of miRNA target prediction software. However, combining multiple prediction tools with experimental verification can significantly improve the accuracy and reliability of the predictions.

      The prediction of miRNA targets is only a preliminary identification step; we subsequently demonstrated that miR-3036-5p can act on CYP18A1 through dual-luciferase reporter assays, RNA immunoprecipitation, and FISH.

      - L296-298: The mechanism proposed in this study seems to imply that miR-3036-5p should be absent (not expressed) in aphids under crowded conditions. Therefore, relative realtime PCR is not particularly useful here. Finding that the miR relative expression is reduced by 48.8% is meaningless, because in *relative* expression, zero has no special meaning. In this case, absolute quantitative PCR, measuring actual transcript numbers, would be far more informative.

      miR-3036-5p is not absent in aphids under crowded conditions. Only a significant decrease of miR-3036-5p in expression level under crowded conditions was identified compared to normal feeding conditions (Fig. 5B). So it should be reasonable to use relative quantitative methods for expression level analysis.
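      For readers less familiar with relative quantification, the reported reduction can be read as a fold change between conditions. The sketch below assumes the widely used 2^-ΔΔCt calculation; the response does not state which calculation was applied, and the Ct values and the reference gene are hypothetical, chosen only so that the result lands close to the reported reduction of roughly one half.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target in treated vs control samples (2^-ddCt).

    Assumes the target and the reference amplify with ~100% efficiency.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated   # normalise treated
    d_ct_control = ct_target_control - ct_ref_control   # normalise control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values (not taken from the manuscript):
# crowded = treated, normal feeding = control.
fold = relative_expression(25.97, 18.0, 25.0, 18.0)
reduction_pct = (1.0 - fold) * 100.0
print(f"fold change = {fold:.3f}, reduction = {reduction_pct:.1f}%")
```

      A fold change near 0.5 corresponds to a delay of about one cycle, which is why a relative measure can report a decrease without requiring the transcript to be absent.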

      - L361: Isn't alternative mRNA splicing a more common post-transcriptional modification?

      I'm very sorry, this sentence has been modified to “A-to-I RNA editing is one of the most prevalent forms of posttranscriptional modification in animals, plants, and other organisms.” Thank you for this suggestion.

      - L372: "Functional wing polymorphism is commonly observed in insects as a form of adaptation and a source of variation for natural selection (14)." The relationship between plastic phenotypic variation and natural selection is complex, and there is a large theoretical literature in evolutionary biology and evo-devo on this topic, but it is not a focus in the cited review by Zhang et al.. It would be helpful if the authors could expand on this idea with reference to some of this literature (e.g. Levins 1968; Harrison 1980; Moran 1992; Roff 1996; West-Eberhard 2003; Zera 2009).

      I have changed the citation and expanded on this idea. “Wing polymorphism is commonly observed in insects, resulting from variation in both genetic factors and environmental factors (Zera 2009).”

      - L404: Use of the word "accurate" seems inappropriate in this context. Both morphs are equally "accurate".

      This sentence has been modified to “resulting in the alteration of CYP18A1 expression and wing dimorphism of offspring regulated by miR-3036-5p”. Thank you for this suggestion.

      - L412: Reference 67 seems irrelevant to this point.

      References have been changed and added.

      67. E.J. Duncan, C.B. Cunningham, P.K. Dearden. Phenotypic plasticity: what has DNA methylation got to do with it? Insects. 13(2):110 (2022).

      68. K.J. Rangan, S.L. Reck-Peterson, RNA recoding in cephalopods tailors microtubule motor protein function. Cell 186, 2531-2543 (2023).

      - L443: Is this referring to "mixed stage" aphids?

      Yes. To make it clearer, this sentence has been modified to “Approximately 200 mg of fresh M. dirhodum with mixed stages (including first- to fourth-instar nymphs and winged and wingless adults)”.

      - L483: What mass or number of individual aphids was used? I assume multiple individuals were pooled?

      Each sample contains approximately 200 aphids.

      - L499: Why was k = 17 used? The default is k = 21.

      The selection of k is usually an odd number between 15 and 21, which ensures that the types of k-mers can cover the genome while being small enough to avoid erroneous effects. Therefore, using 17 is very reasonable.
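      The trade-off described above can be made concrete by comparing the number of possible k-mers with the genome size. This is a minimal sketch: the genome size is a hypothetical placeholder, not the M. dirhodum assembly size, and the function name is illustrative only.

```python
def canonical_kmer_space(k):
    """Number of distinct canonical k-mers for odd k.

    For odd k, no k-mer can equal its own reverse complement (the middle
    base would have to be its own complement), so k-mers pair up exactly
    and the canonical count is 4**k / 2.
    """
    if k % 2 == 0:
        raise ValueError("this shortcut only holds for odd k")
    return 4 ** k // 2


HYPOTHETICAL_GENOME_SIZE = 450_000_000  # placeholder value, in bp

for k in (15, 17, 19, 21):
    space = canonical_kmer_space(k)
    ratio = space / HYPOTHETICAL_GENOME_SIZE
    print(f"k={k}: {space} canonical k-mers, {ratio:.1f}x the genome size")
```

      A k-mer space well above the genome size keeps most genomic k-mers unique, while a smaller k reduces the fraction of k-mers that contain a sequencing error.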

      - L574: what does it mean "multiple editing types"? What different types are possible? Are you referring to things other than A-to-I editing?

      It means that, besides A-to-I, a locus may also show other editing types, such as A-to-C. Such loci were discarded.
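      The filter described in this answer can be sketched as follows. Inosine is read as guanosine during sequencing, so A-to-I editing appears as A-to-G mismatches. The site names, read counts and function name are hypothetical, and a real pipeline would also apply coverage and quality thresholds that are not shown here.

```python
def keep_single_type_sites(sites, ref_base="A", expected_alt="G"):
    """Keep sites whose only non-reference base is the expected one.

    `sites` maps a site identifier to a dict of base -> read count.
    Sites with no mismatch, or with more than one mismatch type,
    are dropped.
    """
    kept = {}
    for site, counts in sites.items():
        alt_bases = {base for base, n in counts.items()
                     if base != ref_base and n > 0}
        if alt_bases == {expected_alt}:
            kept[site] = counts
    return kept


# Hypothetical pileup counts at three reference-A positions.
example = {
    "chr1:100": {"A": 40, "G": 12},          # A-to-G only: kept
    "chr1:250": {"A": 35, "G": 9, "C": 4},   # A-to-G and A-to-C: discarded
    "chr2:77": {"A": 50},                    # no mismatch: not a candidate
}
kept = keep_single_type_sites(example)
print(sorted(kept))
```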

      - L635: Which luciferase construct or plasmid has been used in this experiment? Citation to that source is necessary.

      The pmirGLO vector (Promega, Leiden, Netherlands) was used in this experiment, and a reference has been added.

      B. Zhu, L. Li, R. Wei, P. Liang, X. Gao. Regulation of GSTu1-mediated insecticide resistance in Plutella xylostella by miRNA and lncRNA. PLoS Genetics. 17(10), e1009888 (2021).

      - L644: Did cDNA synthesis employ random primers or a poly-dT primer?

      This kit provides mixed primers, including random and poly-dT primers. (PrimeScript™ RT reagent Kit with gDNA Eraser (Perfect Real Time), Takara Biotechnology, Dalian, China).

      - Fig 4D: Seems like this panel should be divided to cover the two sites, as in Fig 3F. Right now the x-axis labels seem redundant.

      Done. Thank you for this suggestion.

      - Fig 7: Consider adding ADAR2 to this figure.

      Done. Thank you for this suggestion.

      - Table 1: It would be helpful to represent this data in a figure where the phylogenetic relationships among the species can be shown.

      The phylogenetic relationships among the species are shown in Fig. 1D; Table 1 presents the genome information in more detail.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This paper focuses on secondary structure and homodimers in the HIV genome. The authors introduce a new method called HiCapR which reveals secondary structure, homodimer, and long-range interactions in the HIV genome. The experimental design and data analysis are well-documented and statistically sound. However, the manuscript could be further improved in the following aspects.

      Major comments:

      (1) Please give the full name of an abbreviation the first time it appears in the paper, for example, in L37, "5' UTR" "RRE".

      Thank you for your suggestion. We have added the full name of these abbreviations.

      (2) The introduction could be strengthened by discussing the limitations of existing methods for studying HIV RNA structures and interactions and highlighting the specific advantages of the HiCapR method.

      Thank you for your insightful suggestion. We have modified sentences in the introduction section (lines 66-71 and lines 80-81 in the revised manuscript).

      (3) Please reorganize Results Part 1.

      Thank you for your advice. We have reorganized Results Part 1. We hope the revision provides a logical flow and clarity to the results, making it easier for readers to follow the progression of the study and the significance of the findings regarding the HiCapR method.

      (4) Is there any reason that the authors mention "genome structure of SARS-CoV-2" in L95?

      Thank you for your insightful question. We have deleted this sentence in the revised paper.

      Initially, the mention of our previous work on SARS-CoV-2 served two purposes: firstly, to demonstrate our capability to perform proximity ligation assays on viral samples; and secondly, to underscore the necessity of the hybridization step, which is particularly relevant for the study of HIV.

      Unlike SARS-CoV-2, which is highly abundant in infected cells and does not require post-library hybridization, HIV-1 presents a unique challenge due to its typically low viral RNA input within cells. The simplified SPLASH protocol, while effective for more abundant viral RNAs, does not provide the necessary coverage for high-resolution analysis when applied directly to HIV samples.

      Now, we have deleted this sentence according to your comments, and discuss the technical difference elsewhere.

      (5) L102: Please clarify the purpose of comparing "NL4-3" and "GX2005002." Additionally, could you explain what NL4-3 and GX2005002 are? The connection between NL4-3, GX2005002, and HIV appears to be missing.

      Thank you for your question, and we apologize for the confusion. "NL4-3" and "GX2005002" are two distinct HIV-1 strains that exhibit different prevalence patterns in various geographical regions. The NL4-3 strain is a well-characterized laboratory strain that is widely used in HIV research and is representative of the HIV-1 subtype B, which is highly prevalent in Europe and the Americas. On the other hand, GX2005002 is a primary isolate of the CRF01_AE subtype, which is one of the most prevalent strains in Southeast Asia, particularly in China.

      The reason for comparing these two strains in our study is twofold. Firstly, it allows us to assess the applicability and versatility of our HiCapR method across different HIV-1 strains that may have distinct genetic and structural features. This is crucial for understanding the potential broad utility of our method in studying various HIV-1 strains globally. Secondly, by comparing these strains, we can begin to elucidate any strain-specific differences in RNA structure, homodimer formation, and long-range interactions, which may have implications for viral pathogenesis, transmission, and response to therapeutic interventions.

      The connection between NL4-3, GX2005002, and HIV lies in their representation of different subtypes of the HIV-1 virus, which exhibit genetic diversity and are associated with different geographical distributions. This diversity is epidemiologically and clinically relevant, as it may be associated with different pathogenesis and resistance mechanisms, and might have implications for vaccine development and treatment strategies.

      (6) Figure 1A is not able to clearly present the innovation point of HiCapR.

      Thank you for your comment. We have revised this figure to more clearly illustrate the steps and principles of the post-library capture process using HIV pooled probes hybridization and streptavidin pull down to enrich HIV RNA-derived chimeras.

      (7) Please compare the contact metrics detected by HiCapR and current techniques like SHAPE on the local interactions to assess the accuracy of HiCapR in capturing local RNA interactions relative to established methods.

      Thank you for this suggestion to compare HiCapR with established techniques such as SHAPE on local interactions.

      In this study, HiCapR has demonstrated its ability to identify key structural elements within the HIV genome, including TAR, polyA, SL1, SL2, and SL3, as well as the polyA-SL1 in the monomeric conformation. These elements are crucial for understanding the local RNA structures involved in HIV replication and pathogenesis. By visualizing the base pairing probability as a heatmap, we have identified the most stable base pairs in the 5’ UTR of HIV, which are consistent across both NL4-3 and GX2005002 strains (Figure 2D). This consistency suggests robustness in the overall structure despite sequence variations and alternative RNA conformations, indicating a high level of agreement between HiCapR and SHAPE methods in detecting local interactions.

      Furthermore, HiCapR not only confirms the presence of known structural elements but also reveals alternative conformations of the 5'UTR that support the alternative conformations found in SHAPE analysis. This additional layer of information provides a more comprehensive view of the RNA structures, highlighting HiCapR's ability to capture local RNA interactions with a high degree of accuracy comparable to established methods like SHAPE.

      (8) The paper needs further language editing.

      We have thoroughly revised the paper. We hope it’s improved significantly.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript "Mapping HIV-1 RNA Structure, Homodimers, Long-Range Interactions and persistent domains by HiCapR", Zhang et al. report results from an omics-type approach to mapping RNA crosslinks within the HIV RNA genome under different conditions, i.e. in infected cells and in virions. Reportedly, they used a previously published method which, in the present case, was improved for application to RNAs of low abundance.

      Their claims include the detection of numerous long-range interactions, some of which differ between cellular and virion RNA. Further claims concern the detection and analysis of homodimers.

      Strengths:

      (1) The method developed here works with extremely little viral RNA input and allows for the comparison of RNA from infected cells versus virions.

      (2) The findings, if validated properly, are certainly interesting to the community.

      Thank you for your comprehensive review and insightful comments on our manuscript. We appreciate your recognition of the strengths of our HiCapR method and the potential interest of our findings to the scientific community.

      Weaknesses:

      (1) On the communication level, the present version of the manuscript suffers from a number of shortcomings. I may be insufficiently familiar with habits in this community, but for RNA aficionados just a little bit outside of the viral-RNA-X-link community, the original method (reference 22) and the presumed improvement here are far too little explained, namely in something like three lines (98-100). This is not at all conducive to further reading.

      Thank you for your feedback on the clarity of our manuscript, particularly regarding the explanation of the HiCapR method and its improvements over the original method mentioned in reference 22.

      In response to your feedback, we will expand the description of the HiCapR method in the revised manuscript to ensure that it is accessible to a broader audience. We will provide a more thorough comparison between HiCapR and the original method, detailing the specific improvements and how they enable the analysis of low-abundance viral RNAs like HIV. This will include:

      Post-library Hybridization: Unlike the original method, HiCapR incorporates a post-library hybridization step. This innovation allows for the capture of target RNA involved in interactions after library construction, offering additional flexibility and enhancing the resolution of the analysis.

      Enhanced Sensitivity: HiCapR has been optimized to work with extremely low viral RNA input, which is a significant advancement over the original method. This is crucial for studying viruses like HIV, where obtaining high quantities of viral RNA can be challenging. As a matter of fact,

      (2) Experimentally, the manuscript seems to be based on a single biological replicate, so there is strong concern about reproducibility.

      Thank you for raising the issue of reproducibility in our study. We understand the importance of experimental replication in ensuring the reliability of our findings. In response to your concern, we would like to provide the following clarification and additional details regarding the reproducibility of our HiCapR experiments:

      Replicates in HiCapR Experiments: All ligation and control samples in our HiCapR experiments were performed in three biological replicates. This was done to ensure the high reproducibility of our results. The high degree of correlation (r > 0.99) between these replicates underscores the reliability of our findings.

      Dimer Validation Experiments: To validate the dimer formation of RRE and 5’-UTR, we employed multiple independent methods, including Native agarose gel electrophoresis, Agilent 4200 TapeStation Capillary electrophoresis, and Biomolecular Binding Kinetics Assays. These methods provide complementary perspectives on the dimer formation, enhancing the robustness of our validation process. The data presented in Figure 3C and Supplementary figure S12 are representative results from these experiments, which consistently support our findings on dimer formation.

      Agreement Between Cellular and Virion RNA: Our study also demonstrates a significant similarity between virions in the supernatant and infected cells from the same viral strain, as shown in Supplementary Figure S3. This consistency further validates the reproducibility and reliability of our HiCapR method in capturing RNA structures and interactions under different conditions.

      Consistency across two strains: Our study includes a comprehensive analysis of two distinct HIV-1 strains, NL4-3 and GX2005002, which are prevalent in Europe and Southeast Asia, respectively. The consistency in our findings across these strains serves as a strong indicator of the reproducibility and general applicability of our HiCapR method. Specifically, the presence of key structural elements such as TAR, polyA, SL1, SL2, and SL3 in both the NL4-3 and GX2005002 strains suggests a robust structural framework that is conserved across different strains, despite sequence variations. Additionally, our study reveals approximately 20 candidate dimer peaks conserved between the NL4-3 and GX2005002 strains along the genome. The conservation of these dimer peaks across strains indicates a reproducible pattern of dimerization.
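      The replicate agreement quoted above (r > 0.99) is a correlation coefficient between replicates. As a minimal sketch of how such a value can be computed, the example below applies the Pearson formula to two vectors of contact counts. The counts are hypothetical, and the response does not specify which features or binning were correlated.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical chimeric-read counts for the same five bin pairs
# in two replicates (values are illustrative only).
rep1 = [120, 45, 8, 300, 77]
rep2 = [118, 47, 9, 296, 80]
r = pearson_r(rep1, rep2)
print(f"r = {r:.4f}")
```

      For a full contact map, the same function can be applied to the flattened matrices of two replicates, provided both use identical binning.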

      (3) The authors perform an extensive computational analysis from a limited number of datasets, which are in thorough need of experimental validation.

      Thank you for your comment.

      In response to your concern, we would like to clarify that while our manuscript does present an extensive computational analysis, we have also conducted a series of experiments. Specifically, we have validated dimer formation using multiple independent methods (discussed above).

      Given the time-consuming nature of additional experiments, we have chosen to share the HiCapR data with the community in a timely manner. This approach allows for more immediate communication and evaluation of the data on HIV structure, which we believe is valuable for advancing the field.

      We are committed to further investigating the functional implications of our structural findings. We plan to conduct more experiments to link these structural insights to HIV function, which will help to deepen our understanding of the virus's replication and potential antiviral strategies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I suggest a major revision of the manuscript.

      Minor comments:

      (1) The article lacks consistency in its presentation. The expression of the proper noun is wrong in the paper. For example, (a) L89, "RNA:RNA interaction" →RNA-RNA interaction; (b) L431, "SARS-COV-2" → SARS-CoV-2;

      We are sorry for the inconsistency. We have corrected the mistakes.

      (2) "We identified dimers based on the methodology described in23." is not a complete sentence.

      Thank you for your insightful comment. We have revised the sentence to provide a complete and clear description of our methodology. The revised sentence is as follows: "Homodimers were identified in accordance with the methods previously reported in the literature."

      Reviewer #2 (Recommendations for the authors):

      (1) The authors perform an extensive computational analysis from a limited number of datasets, which are in thorough need of experimental validation. There is a single series on in vitro validation of the interaction of a homodimerization site, described in five lines (278-283) plus the Figure panel 3c with a very brief legend, and an extremely minimalist Figure S12. The panel to Figure 3c contains Kd values which have not been assessed for significant digits.

      Thank you for your constructive feedback on our manuscript.

      We acknowledge that our computational analysis is based on a limited number of datasets. Due to the initial exploratory nature of our study and the logistical challenges of generating additional datasets, we have focused on in-depth analysis of the available data. We are currently working on further validating our findings and are committed to publishing these results in a follow-up study.

      Regarding Experimental Validation:

      We agree that the initial description of our in vitro validation of the homodimerization site was concise. To address this, we have expanded the description of our experimental procedures. Specifically, we have detailed the methods used for the in vitro transcription, the preparation of RNA samples, and the use of the Octet R8 platform for biomolecular binding kinetics assays.

      For the Kd values presented in Figure 3c, we have now included the standard error of the mean and have defined the significant digits in the figure legend. This revision provides a more accurate representation of the binding affinities.

      (2) As a further example to be experimentally validated, splice sites are discussed after line 354, for which unsophisticated validation techniques such as targeted RT-PCR are widely accepted.

      In response to your comment, we would like to clarify that the splice sites mentioned in our study are well-established and widely recognized in the literature. They have been previously characterized and are considered canonical within the HIV research community. Given their established nature, we have relied on this foundational knowledge in our analysis.

      However, we concur with the importance of validating the regulatory role of homodimers in splicing, which is a significant aspect of HIV biology. While we have provided evidence for the presence of these homodimers and their potential implications for splicing, we acknowledge the need for further functional studies to elucidate their mechanistic role.

      Due to the scope and length constraints of the current manuscript, we have chosen to focus on the structural and interaction analyses provided by HiCapR. The functional validation of these homodimers and their impact on splicing will be pursued in subsequent studies, which we plan to initiate promptly. We believe that a dedicated follow-up study will allow for a more in-depth exploration of this complex and important aspect of HIV gene regulation.

      We are committed to advancing our understanding of the functional significance of these homodimers in the context of HIV splicing and will ensure that this line of investigation is thoroughly addressed in our future work.

      Thank you again for your valuable feedback. We look forward to contributing further to the field with our ongoing research.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      “This work presents valuable data demonstrating that a camelid single-domain antibody can selectively inhibit a key glycolytic enzyme in trypanosomes via an allosteric mechanism. The claim that this information can be exploited for the design of novel chemotherapeutics is incomplete and limited by the modest effects on parasite growth, as well as the lack of evidence for cellular target engagement in vivo.”

      We agree with this assessment. In this re-worked version, we implemented the textual changes suggested by the reviewers and performed additional in silico work. The reviewers also presented valuable suggestions for additional experiments. However, we currently don’t have dedicated hands and funding for this project, which renders it impossible for us to perform additional “wet lab” experiments at this stage. We have thus not included new experimental “wet lab” data. Finally, the claim that our results may be exploited for the design of novel chemotherapeutics perhaps came across more strongly than we intended. We still believe our findings indicate a potential for such an endeavor, but this clearly requires further investigation and experimental evidence. We have softened this statement by removing it from the abstract and have edited the discussion to end as follows.

      “Based on the presented results, we propose that sdAb42 may pinpoint a site of vulnerability on trypanosomatid PYKs that could potentially be exploited for the design of novel chemotherapeutics. Indeed, antibodies (or fragments thereof) are valuable drug discovery tools. Antibodies (and camelid sdAbs especially) are known for their ability to "freeze out" specific conformations of highly dynamic antigens, thereby exposing target sites of interest, which could be exploited for rational drug design (the development of so-called "chemo-superiors", (Lawson, 2012; Khamrui et al., 2013; van Dongen et al., 2019)). While the design of a "chemo-superior" inspired on the sdAb42-mediated allosteric inhibition mechanism will require further investigation, the results presented here provide a foundation to fuel such an endeavour.”

      REVIEWER 1:

      Summary:

      The authors identified nanobodies that were specific for the trypanosomal enzyme pyruvate kinase in previous work seeking diagnostic tools. They have shown that a site involved in the allosteric regulation of the enzyme is targeted by the nanobody and used elegant structural approaches to pinpoint where binding occurs, opening the way to the design of small molecules that could also target this site.

      Strengths:

      The structural work shows the binding of a nanobody to a specific site on Trypanosoma congolense pyruvate kinase and provides a good explanation as to how binding inhibits enzyme activity. The authors go on to show that by expressing the nanobodies within the parasites they can get some inhibition of growth. Although this inhibition is rather weak, they make a case for how it could point to targeting the same site with small molecules as potential trypanocidal drugs.

      Weaknesses:

      The impact on growth is rather marginal. Although explanations are offered on the reasons for that, including the high turnover rate of the expressed nanobody and the difficulty in achieving the high levels of inhibition of pyruvate kinase required to impact energy production sufficiently to kill parasites, this aspect of the work doesn't offer great support to developing small molecule inhibitors of the same site.

      Recommendations for authors:

      Generally, the paper is very well written and the figures and their legends are clear.

      Comment 1.1: I thought the Introduction could give more focus to the need for new drugs for veterinary trypanosomiasis. The reality is that with fexinidazole now available and acoziborole soon to be available, with <1,000 cases of human African trypanosomiasis in each of the last five years, the case for needing new drugs is difficult to make. For Animal trypanosomiasis, however, the need for novel drugs is much more pressing.

      We agree with this comment and have included an additional section in the Introduction’s second paragraph, which reads as follows.

      “Hence, there is a need for alternative compounds, preferably with novel modes of action and/or designed based on mechanistic insights of the target’s structure-function relationship (Field et al., 2017; De Rycker et al., 2018). This need is especially pressing for AAT, which strongly impedes sustainable livestock rearing in Sub-Saharan Africa. AAT results in drastic reductions of draft power, meat, and milk production by the infected animals (small and large ruminants), and its control relies mainly on vector control and chemotherapy, with only few drugs currently available. The lack of routine field diagnosis has resulted in the misuse of trypanocidal drugs, thereby accelerating the rise of parasite resistance and further exacerbating the problem (Richards et al., 2021). As such, AAT-inflicted annual losses are estimated at around $5 billion (and the necessity to invest another $30 million each year to control AAT through chemotherapy), thereby having a devastating impact on the socio-economic development of Sub-Saharan Africa (Fetene et al., 2021). In contrast, HAT is perceived as a minor threat as it has reached a post-elimination phase as a public health problem with less than 1,000 yearly documented cases (Franco et al., 2022). In addition, new and effective drugs for HAT treatment have recently become available (De Rycker et al., 2023). HAT control currently relies on case detection and treatment, and vector control (Büscher et al., 2017).”

      Comment 1.2: A few pedantic things can be tidied up too, for example on line 61 it is stated tsetse flies are part of the life cycle for all trypanosomes while some veterinary species e.g. T. evansi and some T. vivax strains use other biting flies for transmission. I'd also add in the Introduction that pyruvate kinase is not a glycosomal enzyme (it is covered in the legend to figure 1 but I think it is quite important to clarify in the Introduction too so as to assure readers aren't wondering if "intrabodies" can get targeted there).

      We agree with this comment and have included an additional section in the Introduction’s third paragraph to expand on the life cycles of African trypanosomes, which reads as follows.

      “African trypanosomes are extracellular parasites that have a bipartite life cycle involving insect vectors and mammals as hosts (Radwanska et al., 2018). Most HAT (T. brucei gambiense and T. b. rhodesiense) and AAT (T. b. brucei and T. congolense) causing trypanosomes are uniquely vectored by tsetse flies (Glossina spp.) and are confined to Sub-Saharan Africa. T. b. evansi and T. vivax (both causative agents of AAT) have expanded beyond the tsetse belt due to their ability to be mechanically transmitted by a variety of biting flies (Glossina, Stomoxys, and Tabanus spp.). Finally, T. b. equiperdum infects equids and represents an exception as it is transmitted directly from animal to animal through sexual contact.”

      The introduction now also explicitly mentions that pyruvate kinase is not a glycosomal enzyme.

      Comment 1.3: The introduction would also be a good place to include some more information on what is known about the allosteric effectors of pyruvate kinase in trypanosomes, and emphasize where gaps in knowledge exist too.

      We agree with this comment and have included an additional section in the Introduction’s third paragraph, which reads as follows.

      “Pyruvate kinase (PYK) represents another attractive glycolytic target. This non-glycosomal enzyme catalyses the last step of the glycolysis (the irreversible conversion of phosphoenolpyruvate (PEP) to pyruvate; Figure 1A). The importance of this reaction is two-fold: i) the generation of ATP through the transfer of a phosphoryl group from PEP to ADP and ii) the formation of pyruvate, a crucial metabolite of the central metabolism. Like most PYKs, trypanosomatid PYKs are homotetramers. The PYK monomer is a ∼55 kDa protein organized into four domains termed ’N’, ’A’, ’B’, and ’C’ (Figure 1B). The A domain constitutes the largest part of the PYK monomer and is characterized by an (α/β)8-TIM barrel fold that contains the active site. Together with the N-terminal domain, it is also involved in the formation of the PYK tetramer AA’ dimer interfaces. The B domain is known as the flexible ’lid’ domain that shields the active site during enzyme-mediated phosphotransfer. Finally, the C domain harbors the binding pocket for allosteric effectors and stabilizes the PYK tetramer by formation of CC’ dimer interfaces. Because of their role in ATP production and distribution of fluxes into different metabolic branches, the activity of trypanosomatid PYKs is tightly regulated through an allosteric mechanism known as the "rock and lock" model (Morgan et al., 2010, 2014; Pinto Torres et al., 2020). In this model (which is detailed in Figure 1C), the binding of substrates and/or effectors (and analogs thereof) to the active and effector sites, respectively, trigger a conformational change from the enzymatically inactive T state to the catalytically active R state. Known effector molecules for trypanosomatid PYKs are fructose 2,6-bisphosphate (F26BP), fructose 1,6-bisphosphate (F16BP) and sulfate (SO₄²⁻), with F26BP being the most potent one (van Schaftingen et al., 1985; Callens and Opperdoes, 1992; Ernest et al., 1994; Tulloch et al., 2008). Interestingly, trypanosomatid PYKs seem to be largely unresponsive to the allosteric regulation of enzyme activity by free amino acids (Callens et al., 1991), which contrasts with human PYKs (Chaneton et al., 2012; Yuan et al., 2018). Known trypanosomatid PYK inhibitors impair enzymatic activity through occupation of the PYK active site (Morgan et al., 2011).”

      In the Results, although I am not qualified to analyse the structural data in detail I am confident in the ability of the authors to do so.

      Comment 1.4: Differences in nanobody binding kinetics to the T. congolense enzyme when compared to T. brucei and Leishmania enzymes are attributed to the relatively few amino acid differences in those sites. It is desirable to test site-directed mutagenesis of those residues.

      This is a highly valuable suggestion from the reviewer. However, we currently don’t have dedicated hands and funding for this project, which renders it impossible for us to perform additional experiments at this stage.

      Comment 1.5: In the section on slow-binding inhibition kinetics (lines 194-220) I found it difficult to follow whether it was just the R<>T transition that slowed nanobody inhibition, or whether competition with effectors at the site would also impact on those inhibition kinetics. Can this be clarified?

      Since the sdAb42 epitope is located relatively far away from both active and effector sites (~20 and ~40 Å, respectively), it seems highly unlikely the observed “slow-binding inhibition” kinetics are the result of a competition between sdAb42 on one hand and substrates and/or effectors on the other for enzyme binding. Instead, given that sdAb42 selectively binds and locks the enzyme’s inactive T state, these data can be explained by the idea that sdAb42 can only bind to trypanosomatid PYKs after having undergone an R- to T-state transition. To clarify this matter, we slightly reformulated the discussion as indicated below. We also included a small discussion on the observation that there is a 400-fold difference between the Kd and the IC50.

      “Since the sdAb42 epitope is located relatively far away from both active and effector sites (~20 and ~40 Å, respectively), it seems highly unlikely that the observed “slow-binding inhibition” kinetics are the result of a direct competition between sdAb42 and substrates and/or effectors. Instead, given that sdAb42 selectively binds and locks the enzyme’s inactive T state, these data can be explained by the idea that sdAb42 can only bind to trypanosomatid PYKs after having undergone an R- to T-state transition. An additional observation in this context is the 400-fold difference between the K<sub>D</sub> and IC<sub>50</sub> values. Although we currently do not have a mechanistic explanation, similar differences have been observed for the sdAb-mediated allosteric inhibition of other kinases (Singh et al., 2022).”

      For the intrabody expression work, the reference cited on line 230 actually points to a growing ability to genetically modify T. congolense. However, it is justifiable to work on T. brucei given the much wider availability and advanced status of the genetic tools.

      The growth inhibition data shown in Figure 7 is weak, albeit significant and the case is made as to why that might be.

      Comment 1.6: The authors do point to the fact that inhibiting other parts of the glycolytic pathway might be helpful in getting a better growth inhibitory effect. It would be useful, in this regard, to test the ability of the PFK inhibitors in the Macnae et al. paper in the intrabody expressing line, and possibly other inhibitors e.g. 2-deoxy-D-glucose to see if these combinations do have the desired impacts. Also, looking at the metabolome of the intrabody expressors under induction could also give some further insights into changes in flux (although perhaps not on its own given the weak effects on the growth seen).

      This is a highly valuable suggestion from the reviewer. However, we currently don’t have dedicated hands and funding for this project, which renders it impossible for us to perform additional experiments at this stage. We would like to point out that, in our experience, studying the effect of enzyme inhibition on the metabolome is usually only useful shortly after the onset of inhibition. The system adapts to the lowered flux and relevant changes are mostly transient. Since the induced expression of sdAb42 is almost certainly slow, we expect the metabolic changes will be minimal.

      REVIEWER 2:

      Summary:

      In this work, the authors show that the camelid single-chain antibody sdAb42 selectively inhibits Trypanosome pyruvate kinase (PYK) but not human PYK. Through the determination of the crystal structure and biophysical experiments, the authors show that the nanobody binds to the inactive T-state of the enzyme, and in silico analysis shows that the binding site coincides with an allosteric hotspot, suggesting that nanobody binding may affect the enzyme active site. Binding to the T-state of the enzyme is further supported by non-linear inhibition kinetics. PYK is an important enzyme in the glycolytic pathway, and inhibition is likely to have an impact on organisms such as trypanosomes, that heavily rely on glycolysis for their energy production. The nanobody was generated against Trypanosoma congolense PYK, but for technical reasons the authors progressed to testing its impact on cell viability in Trypanosoma brucei brucei. First, they show that sdA42 is able to inhibit Tbb PYK, albeit with lower potency. Cell-based experiments next show that expression of sdA42 has a modest, and dose-dependent effect on the growth rate of Tbb. The authors conclude that their data indicates that targeting this allosteric site affects cell growth and is a valuable new option for the development of new chemotherapeutics for trypanosomatid diseases.

      Strengths:

      The work clearly shows that sdA42A inhibits Trypanosome and Leishmania PYK selectively, with no inhibition of the human orthologue. The crystal structure clearly identifies the binding site of the nanobody, and the accompanying analysis supports that the antibody acts as an allosteric inhibitor of PYK, by locking the enzyme in its apo state (T-state).

      Weaknesses:

      (1) The most impactful claim of this work is that sdAb42-mediated inhibition of PYK negatively affects parasite growth and that this presents an opportunity to develop novel chemotherapeutics for trypanosomatid diseases. For the following reasons I think this claim is not sufficiently supported:

      Comment 2.1: The authors do not provide evidence of target-engagement in cells, i.e. they do not show that sdA42A binds to, or inhibits, Tbb PYK in cells and/or do not provide a functional output consistent with PYK inhibition (e.g. effect on ATP production). Measuring the extent of target engagement and inhibition is important to draw conclusions from the modest effect on growth.

      The authors do not explore the selectivity of sdA42A in cells. Potentially sdA42A may cross-react with other proteins in cells, which would confound interpretation of the results.

      We understand the reviewer’s concern. While it is theoretically possible that sdAb42 may non-specifically (cross-)react with other proteins in the cell, this would be highly unlikely based on the following arguments. First, many studies have employed sdAbs as intrabodies and reported on specific sdAb-mediated effects (outstanding reviews on the topic are Cheloha et al. (PMID 32868455) and Soetens et al. (PMID 33322697)). Second, it has been demonstrated that selecting sdAbs from an immune library through phage display or “bacteriomatch” (a bacterial system similar to yeast two hybrid) yields highly similar results (Pellis et al., PMID 22583807), thereby indicating that sdAbs interact specifically with their target antigens in an intracellular environment. Third, we identified TcoPYK as the target for sdAb42 by employing sdAb42 as bait in a pull-down from a parasite whole cell lysate (Pinto Torres et al., PMID 29899344). The pull-down fractions were analysed by SDS-PAGE and we observed a clear prominent band, which was further analysed by mass spectrometry and revealed TcoPYK as the target with great certainty. Even though the affinity of sdAb42 for TbrPYK is lower, it still remains high (nM affinity) and we expect it to bind TbrPYK with high specificity.

      Regarding measuring the effect on ATP production, we would like to state that such experiments are not obvious. Instead of measuring ATP levels, one should measure ATP turnover as ATP levels may not necessarily be decreased. The latter was observed to be the case for the specific inhibition of trypanosomal PFK (Nare et al. PMID 36864883). The specific trypanosomal PFK inhibitor inhibits motility (and growth) of T. congolense IL3000 at concentrations that only slightly affect ATP levels. One could think of repeating the sdAb42 experiments in a T. congolense model. However, T. congolense BSF metabolism is more complicated than that of T. brucei BSF. First, the T. congolense glucose metabolic network is more expanded, allowing a lower glucose consumption rate to produce ATP and metabolites for growth. Second, pyruvate is not excreted but further metabolised, in part in the mitochondrion. Steketee et al. (PMID 34310651) have shown that T. congolense also takes up pyruvate from the medium. One can thus check if (increased) external pyruvate (partially) rescues the growth inhibition by sdAb42. It will not provide proof, but maybe an indication. As mentioned above, we are currently unable to perform such additional experiments due to lack of dedicated hands and funding.

      Comment 2.2: sdA42A only affects minor growth inhibition in Tbb. The growth defect is used as the main evidence to support targeting this site with chemotherapeutics, however based on the very modest effect on the parasites, one could reasonably claim that PYK is actually not a good drug target. The strongest effect on growth is seen for the high expressor clone in Figure 4a, however here the uninduced cells show an unusual profile, with a sudden increase in growth rate after 4 days, something that is not seen for any of the other control plots. This unexplained observation accentuates the growth difference between induced and uninduced, and the growth differences seen in all other experiments, including those with the highest expressors (clones 54 and 55) are much more modest. The loss of expression of sdA42A over time is presented as a reason for the limited effect, and used to further support the hypothesis that targeting the allosteric site is a suitable avenue for the development of new drugs. However, strong evidence for this is missing.

      We agree that the growth effect of sdAb42 expression is modest, and we have provided several explanations as to why this could be the case. In addition, as mentioned at the start of this rebuttal, the claim that our results may be exploited for the design of novel chemotherapeutics was perhaps expressed stronger than we intended to. We still believe our findings indicate a potential for such an endeavor, but this clearly requires further investigation and experimental evidence as mentioned by the reviewer.

      We, however, disagree that PYK would not be a good drug target. Its potential to serve as a drug target is related to its fundamentally important role in trypanosomal glycolysis and not to the features of sdAb42. Steketee et al. (PMID 34310651) have shown that glycolysis is essential for T. congolense BSF, despite a lower glycolytic flux than in T. brucei BSF. The T. congolense glucose metabolic network is more expanded, allowing a lower glucose consumption rate to produce ATP and metabolites for growth. Also here, PYK is thus almost certainly essential and from that perspective a good drug target.

      Comment 2.3: For chemotherapeutic interventions to be possible, a ligandable site is required. There is no analysis provided of the antibody binding site to indicate that small molecule binding is indeed feasible.

      We agree with the reviewer’s comment and have included APOP analysis on the TcoPYK T state crystal structure (see also reply to Comment 3.1). Briefly, APOP works by detecting pockets and then perturbing each pocket in the protein's elastic network (GNM) by adding stiffer springs between the surrounding residues. The pockets are scored and ranked based on the calculated shifts in the eigenvalues of the global GNM modes and their local hydrophobic densities, thereby also considering the pocket’s surface accessibility, which renders it suitable for the identification of allosteric (and druggable) pockets. The APOP analysis identifies pockets overlapping with the sdAb42 epitope as highly ranking allosteric ligand binding pockets. The data have been summarized in an additional supplementary figure (Figure 4 – figure supplement 1). The manuscript also contains details on the performed APOP analysis in the Materials and Methods section.
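
The perturbation idea described above (stiffening the springs between the residues lining a pocket and looking at the shift in the slowest GNM eigenvalues) can be illustrated with a minimal sketch. This is not the APOP implementation: the 7 Å cutoff, the spring constants, the toy coordinates and the function names `kirchhoff` and `mode_shifts` are all assumptions made for illustration, and the hydrophobic-density term of the score is left out.

```python
import numpy as np

def kirchhoff(coords, cutoff=7.0, extra_springs=None, gamma=1.0):
    """Build a GNM Kirchhoff (connectivity) matrix from C-alpha coordinates.

    extra_springs is an optional list of (i, j, stiffness) tuples that adds
    stiffness on top of the uniform spring constant gamma (hypothetical API).
    """
    n = len(coords)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dists <= cutoff) & ~np.eye(n, dtype=bool)
    k = -gamma * contact.astype(float)          # off-diagonal: -spring constant
    for i, j, stiffness in (extra_springs or []):
        k[i, j] -= stiffness
        k[j, i] -= stiffness
    np.fill_diagonal(k, 0.0)
    np.fill_diagonal(k, -k.sum(axis=1))         # diagonal: total stiffness per node
    return k

def mode_shifts(coords, pocket, stiffness=10.0, n_modes=3):
    """Shift of the n_modes slowest non-zero eigenvalues after stiffening a pocket."""
    base = np.linalg.eigvalsh(kirchhoff(coords))
    springs = [(i, j, stiffness)
               for a, i in enumerate(pocket) for j in pocket[a + 1:]]
    perturbed = np.linalg.eigvalsh(kirchhoff(coords, extra_springs=springs))
    # Index 0 is the trivial zero mode of a connected network, so skip it.
    return perturbed[1:n_modes + 1] - base[1:n_modes + 1]

# Toy example: a straight 10-residue chain with 3.8 Å spacing (not a real structure).
toy_coords = np.array([[3.8 * i, 0.0, 0.0] for i in range(10)])
toy_shifts = mode_shifts(toy_coords, pocket=[2, 7])
```

In this simplified picture, a pocket whose stiffening moves the global modes more strongly would rank higher; the published method combines such shifts with additional terms.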

      Comment 2.4: The authors comment on the modest growth inhibition, and refer to the need to achieve over 88% reduction in Vmax of PYK to see a strong effect, something that may or may not be achieved in the cell-based model (no target-engagement or functional readout provided). The slow binding model and switch of species are also raised as potential explanations. While these may be plausible explanations, they are not tested which leaves us with limited evidence to support targeting the allosteric site on PYK.

      In our understanding of this remark, we believe it to be related to Comments 2.1 and 2.2 and thus refer to our answers formulated above.

      Comment 2.5: The evidence to support an allosteric mechanism is derived from structural studies, including the in silico allosteric network predictions. Unfortunately, standard enzyme kinetics mode of inhibition studies are missing. Such studies could distinguish uncompetitive from non-competitive behaviour and strengthen the claim that sdAb42 locks the enzyme complex in the apo form.

      We agree with the referee that a thorough kinetic analysis could distinguish between uncompetitive (i.e., sdAb only binds to the enzyme if substrate is bound) or non-competitive (i.e., sdAb can bind to apo enzyme and substrate-bound enzyme) inhibition. In both cases, however, this would correspond to an allosteric mechanism of inhibition. Although such a thorough kinetic analysis would be interesting in its own right, we would like to argue that this type of very detailed kinetics is outside the scope of this paper. This is especially the case taking into account that this analysis could be complicated by the slow-onset inhibition behavior.
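
For reference, the two modes of inhibition mentioned above differ in how they change the apparent kinetic parameters. The sketch below uses the textbook steady-state rate laws for a simple Michaelis-Menten enzyme; this is an assumption made purely for illustration, since PYK is an allosteric enzyme and the slow-onset behaviour would complicate such an analysis. All parameter values are hypothetical.

```python
def rate_uncompetitive(s, i, vmax, km, ki):
    """Inhibitor binds only the enzyme-substrate complex."""
    return vmax * s / (km + s * (1.0 + i / ki))

def rate_noncompetitive(s, i, vmax, km, ki):
    """Inhibitor binds free enzyme and enzyme-substrate complex with equal affinity."""
    return vmax * s / ((km + s) * (1.0 + i / ki))

def apparent_parameters(mode, i, vmax, km, ki):
    """Return (apparent Vmax, apparent Km) at inhibitor concentration i."""
    alpha = 1.0 + i / ki
    if mode == "uncompetitive":
        return vmax / alpha, km / alpha   # both decrease by the same factor
    if mode == "noncompetitive":
        return vmax / alpha, km           # Vmax decreases, Km is unchanged
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical values, arbitrary units.
VMAX, KM, KI = 100.0, 50.0, 20.0
unc_vmax, unc_km = apparent_parameters("uncompetitive", 40.0, VMAX, KM, KI)
non_vmax, non_km = apparent_parameters("noncompetitive", 40.0, VMAX, KM, KI)
```

Under these assumptions, measuring rates across substrate and inhibitor concentrations and checking whether the apparent Km falls together with Vmax or stays constant is what would separate the two cases.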

      Comment 2.6: As general comment, the graphical representation of the data could be improved in line with recent recommendations: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002128, https://elifesciences.org/inside-elife/5114d8e9/webinar-report-transforming-data-visualisation-to-improve-transparency-and-reproducibility.

      - Bar-charts for potency are ideally presented as dot plots, showing the individual data points, or box plots with datapoints shown.

      - Images in Figure 7 show significant heterogeneity of nanobody expression, but the extent of this can not be gleaned from Figure 7B. It would be much better to use box plots or violin plots for each cell line on this figure panel. The same applies to Figure 10.

      We thank the reviewer for these suggestions but have taken the decision not to act upon these as the other reviewers explicitly mentioned that our figures are very clear.

      Recommendations for authors:

      Please find below some minor comments:

      Comment 2.7: Line 24: "increasing number of drug failures": This does not really reflect the current situation for human African trypanosomiasis, with NECT treatment retaining efficacy, fexinidazole now being registered, and acoziborole progressing towards registration. It may be worth considering focusing the introduction more on Nagana, as all Trypanosome species used in the paper are animal infective, and the nanobody was discovered for T. congolense.

      We refer to our answer formulated in response to Comment 1.1.

      Comment 2.8: Line 55: "alarming number of reports describing ..." While resistance is a big problem, this mainly applies to malaria, bacterial and fungal diseases. For kinetoplastids, the number of reports describing resistance in the clinic is pretty limited. However, the drug discovery pipeline for these diseases is sparse, so I definitely agree there is a need to develop new compounds with differentiated mechanisms.

      We agree with the reviewer and have slightly adapted our wording here as follows.

      “Unfortunately, a number of reports describe treatment failure or parasite resistance to the currently available drugs (De Rycker et al., 2018).”

      Comment 2.9: This manuscript is about pyruvate kinase, but the enzyme is not properly introduced. I suggest a short paragraph introducing PYK at line 77 (without duplicating Figure 1), covering its role in glycolysis, the importance of pyruvate, any essentiality data from the literature, and any known inhibitors.

      We refer to our answer formulated in response to Comment 1.3.

      Comment 2.10: Figure 6: For the top insets it would be useful to somehow show the increasing antibody concentration, either by using a changing intensity or size for each line.

      We thank the reviewer for this suggestion but decided not to act upon it, as we found that the inclusion of this information in the figure made it “too crowded”, which is why we opted to provide this information in the figure legend.

      “Only a subset of the traces is shown for the sake of clarity. The following curves are shown (from bottom to top): TcoPYK (0.15 nM sdAb42, 500 nM sdAb42, 750 nM sdAb42, 1000 nM sdAb42, 1500 nM sdAb42, 2000 nM sdAb42, no enzyme control), LmePYK (5 nM sdAb42, 750 nM sdAb42, 1250 nM sdAb42, 1500 nM sdAb42, 2500 nM sdAb42, 3000 nM sdAb42, no enzyme control), and TbrPYK (1 nM sdAb42, 1000 nM sdAb42, 1750 nM sdAb42, 2000 nM sdAb42, 3500 nM sdAb42, 4000 nM sdAb42, no enzyme control).”

      Comment 2.11: You refer to the curves as biphasic, but they look like 1st order kinetics, and there are no clear 1st and 2nd phases (or at least they are not marked). It may be more appropriate to label these as non-linear.

      We agree that the term “biphasic” is potentially an over-simplification of the actual situation. What we mean is that the formation of product as a function of time ([P] versus [t] curve) is not linear at short time ranges but evolves from an initial “weakly inhibited” rate (v<sub>0</sub>) to a “strongly inhibited” steady-state rate (v<sub>ss</sub>). This conversion from v<sub>0</sub> to v<sub>ss</sub> indeed occurs in a fashion following single exponential behavior. With the term “biphasic” we thus meant a non-linear phase (before v<sub>ss</sub> is reached) followed by a linear phase (after v<sub>ss</sub> is reached). To avoid any confusion, we replaced the term “biphasic” by “non-linear”.
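
The progress curve described above, with a single-exponential transition from v<sub>0</sub> to v<sub>ss</sub>, corresponds to the standard integrated rate equation for slow-onset inhibition, [P](t) = v_ss·t + (v_0 − v_ss)/k_obs · (1 − exp(−k_obs·t)). A minimal sketch follows; the rate constant name `k_obs` and all numerical values are assumptions for illustration and are not taken from the manuscript.

```python
import math

def product_formed(t, v0, vss, k_obs):
    """Product concentration at time t for slow-onset inhibition."""
    return vss * t + (v0 - vss) / k_obs * (1.0 - math.exp(-k_obs * t))

def instantaneous_rate(t, v0, vss, k_obs):
    """Time derivative of product_formed: decays exponentially from v0 to vss."""
    return vss + (v0 - vss) * math.exp(-k_obs * t)

# Hypothetical parameters (arbitrary units).
V0, VSS, K_OBS = 1.0, 0.1, 0.05
curve = [(t, product_formed(t, V0, VSS, K_OBS)) for t in range(0, 301, 30)]
```

At early times the curve bends (the non-linear phase), and once exp(−k_obs·t) has decayed it becomes a straight line with slope v_ss, which matches the "non-linear followed by linear" description.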

      Comment 2.12: IC50s - would be useful to provide a comparison with IC50s generated in the pre-incubation experiments - is the antibody less potent without pre-incubation? I could not find IC50s for the pre-incubation experiments shown in Figure 2.

      We determined an IC50 value for sdAb42 against TcoPYK under pre-incubation conditions, but initially decided not to include this into the manuscript. We agree with the reviewer that a comparison between IC50 values determined under pre- and post-incubation conditions would be of interest, and have therefore included the pre-incubation IC50 data for TcoPYK in Figure 2 (panel B). The data indeed show that sdAb42 far more efficiently inhibits an enzyme that is not continuously cycling between R and T states (IC50 values of 15 nM and 359 nM under pre- and post-incubation conditions, respectively). This is now discussed in the results section of the manuscript. We did not determine IC50 values for sdAb42 against TbrPYK and LmePYK under pre-incubation conditions, but suspect that a similar observation will be made upon comparing these values to IC50 under post-incubation conditions.
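
To put the two reported IC50 values side by side, the short sketch below computes the fold shift and the residual activity predicted by a simple dose-response curve. The Hill slope of 1 and the 100 nM test concentration are assumptions for illustration only; just the two IC50 values come from the response above.

```python
def residual_activity(inhibitor_nm, ic50_nm, hill_slope=1.0):
    """Fraction of uninhibited activity remaining at a given inhibitor concentration."""
    return 1.0 / (1.0 + (inhibitor_nm / ic50_nm) ** hill_slope)

IC50_PRE_NM = 15.0    # reported for TcoPYK under pre-incubation conditions
IC50_POST_NM = 359.0  # reported for TcoPYK under post-incubation conditions

# Ratio of the two potencies (roughly 24-fold).
fold_shift = IC50_POST_NM / IC50_PRE_NM

# Residual activity at an arbitrary, hypothetical concentration of 100 nM sdAb42.
pre_at_100 = residual_activity(100.0, IC50_PRE_NM)
post_at_100 = residual_activity(100.0, IC50_POST_NM)
```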

      REVIEWER 3:

      Summary:

      Out of the 20 Neglected Tropical Diseases (NTD) highlighted by the WHO, three are caused by members of the trypanosomatids, namely Leishmaniasis, Trypanosomiasis, and Chagas disease. Trypanosomal glycolytic enzymes including pyruvate kinase (PyK) have long been recognised as potential targets. In this important study, single-chain camelid antibodies have been developed as novel and potent inhibitors of PyK from T. congolense. To gain structural insight into the mode of action, binding was further characterised by biophysical and structural methods, including crystal structure determination of the enzyme-nanobody complex. The results revealed a novel allosteric mechanism/pathway with significant potential for the future development of novel drugs targeting allosteric and/or cryptic binding sites.

      Strengths:

      This paper covers an important area of science towards the development of novel therapies for three of the Neglected Tropical Diseases. The manuscript is very clearly written with excellent graphics making it accessible to a wide readership beyond experts. Particular strengths are the wide range of experimental and computational techniques applied to an important biological problem. The use of nanobodies in all areas from biophysical binding experiments and X-ray crystallography to in-vivo studies is particularly impressive. This is likely to inspire researchers from many areas to consider the use of nanobodies in their fields.

      Weaknesses:

      There is no particular weakness, but I think the computational analysis of allostery, which basically relies on a single server could have been more detailed.

      Recommendations for authors:

      Overall an excellent paper, there are just a couple of points that the authors could consider, if time allows.

      Comment 3.1: As mentioned above the computational analysis of allostery appears to be based on a single server based on coordinates alone with no in-depth analysis. It would be extremely interesting to see if more sophisticated methods based on elastic network model and/or molecular dynamics simulation gave similar results. I realize that this would require quite a lot of work though.

      We agree with the reviewer’s comment and have complemented the perturbation analysis (previously presented in the manuscript) with dGNM and APOP analyses to identify allosteric communication pathways and allosteric binding pockets, respectively. dGNM, which is based on transfer entropy, allows for a detailed characterization of the dynamic coupling and information transfer between residues. Meanwhile, APOP employs a perturbation-based approach to detect and rank allosteric pockets. The findings are in good agreement with the previously presented perturbation data and have been summarized in an additional supplementary figure (Figure 4 – figure supplement 1). The manuscript also contains details on the performed transfer entropy and APOP analyses in the Materials and Methods section.

      Comment 3.2: The figures are excellent and really help the reader - with the exception of the screenshots (Figure 8). Using pymol or chimera (or any other more expensive commercial package) would really help the reader and will not take much time.

      We agree with the referee that this is not the most beautiful figure. However, we find the quality and clarity of the figure to be adequate for its purpose (i.e., a supplemental figure).

      Comment 3.3: Finally, I would have liked to see at least the PDB validation files. This is a highly regarded and experienced team, nevertheless, the resolution is rather mediocre. As the crystal coordinates were used as input for the computational analysis, any experimental inaccuracies will affect the computational results.

      We agree with the reviewer that we could have provided the validation report together with the submitted manuscript and we apologise for the inconvenience. The validation reports will be released together with the structures following final manuscript publication. Regarding the resolution of the crystal structures, we agree with the reviewer’s comment, but we obviously employed data sets from our best diffracting crystals and could not obtain a higher resolution despite our best efforts.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Recommendations for the authors):

      While the authors have responded to most of the comments, a number of issues remain, most of which pertain to imprecise writing, as previously mentioned.

      In the second revision of our manuscript, we tried our best to make our writing more precise.

      For example, at high concentrations of PRG-GEF, the authors repeatedly state that RhoA is inhibited (including in the summary). While this may be functionally valid, it is imprecise. RhoA is activated (not inhibited), but its ability to promote contractility is impaired, presumably as a consequence of sequestration of the active GTPase by the PH domain of PRG-GEF. To put a finer point on this, the activity of RhoA•GTP is to bind to proteins that selectively bind active RhoA. One such binding partner is the PH domain of PRG. In the case where PRG is overexpressed, RhoA•GTP binds to PRG. Due to the high concentrations of PRG in some cells, this outcompetes the ability of RhoA•GTP to bind other effectors such as formins or ROCK. However, there is no strong evidence that RhoA is inhibited. The only hint of such evidence is a reduction in the biosensor for active RhoA, but this too is likely outcompeted by the overexpressed active GEF. There does not appear to be any disagreement about the mechanism, but rather a semantic difference.

      We thank Reviewer #2 for emphasizing this semantic concern, which indeed requires clarification. We agree that RhoA is not chemically inactivated; rather, the protein remains active but is functionally sequestered. Our use of the term “inhibition” was intended to describe functional inhibition, consistent with the definition of inhibition as the act of reducing, preventing, or blocking a process, activity, or function. However, we recognize that this terminology could be interpreted as imprecise. To address this, we have clarified the text by explicitly referring to "functional inhibition of RhoA signaling" where appropriate, or by rewording to terms such as "competitive inhibition of RhoA effector binding" to more accurately reflect the mechanism.
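
The sequestration argument in this exchange can be made concrete with a simple competitive-binding sketch: active RhoA is shared between a downstream effector and the GEF PH domain, and raising the GEF concentration lowers the effector-bound fraction without changing the amount of RhoA•GTP. The model below assumes one-to-one binding at equilibrium with both partners in excess over RhoA•GTP; the function name and every concentration and affinity are hypothetical, not values from the manuscript.

```python
def effector_bound_fraction(effector, kd_effector, competitor, kd_competitor):
    """Fraction of active RhoA bound to the effector when a competitor is present.

    Concentrations and dissociation constants share one unit (e.g. nM) and are
    treated as free concentrations, i.e. both partners are in excess.
    """
    e = effector / kd_effector
    p = competitor / kd_competitor
    return e / (1.0 + e + p)

# Hypothetical numbers: effector at 100 nM with Kd 100 nM; PH-domain Kd 50 nM.
low_gef = effector_bound_fraction(100.0, 100.0, competitor=10.0, kd_competitor=50.0)
high_gef = effector_bound_fraction(100.0, 100.0, competitor=1000.0, kd_competitor=50.0)
```

In this picture the drop from `low_gef` to `high_gef` reflects competition for effector binding rather than any loss of GTP-bound RhoA, which is the distinction both the reviewer and the authors draw.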

      Overall, the manuscript is written in a conversational style, not with the precision expected of a scientific manuscript.

      We acknowledge Reviewer #2’s comment regarding the style of our manuscript. While our manuscript adopts a somewhat conversational tone, this was a deliberate choice. We believe this style helps engage the reader and facilitates understanding of our reasoning, guided by the philosophy that science is conducted by humans and should be communicated in a way that resonates with them. That said, we fully agree that this approach should not compromise scientific precision. In response to this feedback, we have revised the manuscript to ensure greater clarity and precision while maintaining the approachable style we have chosen.

      To exemplify this, I provide an alternative phrasing of one such paragraph.

      Lines 51-62:

      Here, contrarily to previous optogenetic approaches, we report a serendipitous discovery where the optogenetic recruitment at the plasma membrane of GEFs of RhoA triggers both protrusion and retraction in the same cell type, polarizing the cell in opposite directions. In particular, one GEF of RhoA, PDZ-RhoGEF (PRG), also known as ARHGEF11, was most efficient in eliciting both phenotypes. We show that the outcome of the optogenetic perturbation can be predicted by the basal GEF concentration prior to activation. At high concentration, we demonstrate that Cdc42 is activated together with an inhibition of RhoA by the GEF leading to a cell protrusion. Thanks to the prediction of a minimal mathematical model, we can induce both protrusion and retraction in the same cell by modulating the frequency of light pulses. Our ability to control both phenotypes with a single protein on timescales of second provides a clear and causal demonstration of the multiplexing capacity of signaling circuits.

      Here, we report that the phenotypic consequences of plasma membrane recruitment of a guanine nucleotide exchange factor (GEF), PDZ-RhoGEF (PRG, aka ARHGEF11), depend on the level of expression and degree of recruitment of the GEF. At low concentrations, recruitment of PRG induces cell retraction, consistent with the expected function of a GEF for RhoA. However, at high concentrations, Cdc42 is activated, leading to cell protrusion. A minimal mathematical model predicts, and experimental observations confirm, that the extent of recruitment determines the consequences of GEF recruitment. The ability of a single GEF to induce disparate outcomes demonstrates the multiplexing capacity of signaling circuits.

      We thank Reviewer #2 for providing an alternative phrasing for lines 51–62. We appreciate the effort to enhance clarity and precision in this key section of the manuscript. While we agree with many aspects of the suggested revision and have incorporated several elements to improve the text, we have also retained aspects of our original phrasing that align with the overall tone and structure of the manuscript. Specifically, we have ensured that the balance between precision and accessibility is maintained while integrating the reviewer's suggestions. We hope that the revised text now addresses the concerns raised.

      Key points to correct throughout the manuscript are:

      -  overexpression of PRG does not "inhibit" RhoA.

      -  retraction and protrusion are distinct phenotypes, they are not opposite phenotypes. One results from RhoA activation, the other results from Cdc42 activation.

      Regarding the term “inhibition,” we agree with the reviewer’s point and have addressed this in our earlier comment.

      Regarding the terminology of "opposite phenotypes," we believe this description is valid. While protrusion and retraction arise from distinct signaling pathways (Cdc42 activation and RhoA activation, respectively), we describe them as opposite phenotypes because they represent mutually exclusive cellular behaviors. A cell cannot protrude and retract at the same location simultaneously; instead, these behaviors represent opposing ends of the dynamic spectrum of cell morphology.

      Here are some other places where editing would improve the manuscript (a noncomprehensive list).

      We went through the whole manuscript to improve the scientific precision, following Reviewer #2’s comment on the term “inhibition”.

      line 15 "inhibition of RhoA by the PH domain of the GEF at high concentrations."

      We modified the wording: “sequestration of active RhoA by the GEF PH domain at high concentrations”

      line 51 "Here, contrarily to previous optogenetic approaches"

      We removed “contrarily to previous optogenetic approaches"

      line 141 "We next wonder what could differ in the activated cells that lead to the two opposite phenotypes." (the state of mind of the authors is not relevant)

      As explained earlier, we made the choice to keep our writing style.

      line 185 "Very surprised by this ability of one protein to trigger opposite phenotypes"

      As explained earlier, we made the choice to keep our writing style.

      lines 206 ff "As our optogenetic tool prevented us from using FRET biosensors because of spectral overlap, we turned to a relocation biosensor that binds RhoA in its GTP form. This highly sensitive biosensor is based on the multimeric TdTomato, whose spectrum overlaps with the RFPt fluorescent protein used for quantifying optoPRG recruitment. We thus designed a new optoPRG with iRFP, which could trigger both phenotypes *but was harder to transiently express* (?? what does this have to do with the spectral overlap), giving rise to a majority of retracting phenotype. *Looking at the RhoA biosensor*, we saw very different responses for both phenotypes (Figure 3G-I). "

      We have clarified.

      lines 231ff "RhoA activity shows a very different behavior: it first decays, and then rises. It seems that, adding to the well-known activation of RhoA, PRG DH-PH can also negatively regulate RhoA activity." again, RhoA activity may appear to decay, but this is a limitation of the measurements. RhoA is likely activated to the GTP-bound form. PRG is not negatively regulating RhoA activity. An activity that prevents nucleotide exchange by RhoA or accelerates its hydrolysis would constitute negative regulation of RhoA.

      We modified the wording to clarify the sentence.

      The attempts to quantify the degree of overexpression, though rough, should be included in the version of record. It is not clear how that estimate was generated.

      The estimate of absolute concentration (switch at 200nM) was obtained by comparing fluorescent intensities of purified RFPt and cells under a spinning disk microscope while keeping the exact same acquisition settings. The whole procedure will be described in a manuscript in preparation, focused on Rac1 GEFs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ghone et al show that HIV-1 Vif causes a pseudo-metaphase arrest rather than a G2 arrest. The metaphase arrest correlates with misregulation of the kinetochore which could be explained by the loss of phosphatase functions that determine chromosome-microtubule interactions.

      Strengths:

      The single-cell imaging using different reporters of cell cycle progression is very elegant and the quantitation is convincing. The authors clearly show that what others have characterized as a G2 arrest by flow cytometry is somewhat later in metaphase and correlates with kinetochore misregulation.

      We sincerely appreciate the reviewer recognizing the quality and precision of our study, particularly our use of long-term live cell imaging combined with single-cell resolution analysis.

      Weaknesses:

      (1) The major problem with the paper is trying to connect what is observed in tumor cell lines with actual infections in primary T cells. While all of the descriptive work in cell lines is convincing, none of these cells are relevant targets and tumor cells have different cell death and cell cycle regulation than primary T cells. Thus, while Vif might well do all of the things described in the manuscript, it is a stretch to connect any of it to what happens in vivo.

      We fully agree with this point. It is indeed technically challenging to perform 48-120 hours of live-cell imaging at high magnification at short intervals using primary T cells because of their non-adherent nature. We also agree that Vif’s functions in pseudo-metaphase arrest and the consequent induction of cell death, observed in cancer cells (e.g., Cal51, HeLa, and MDA-MB-231 cell lines) or normal non-transformed epithelial cells (e.g., the RPE1 cell line), may differ in T cells. Further studies and refined approaches will be required to address this important question. We have revised the manuscript to include a discussion of this issue in the “Limitations of this study” section.

      (2) Line 109 and elsewhere. The ability of Vif to cause cell cycle arrest and bind PP2A subunits is not a completely conserved feature. Rather, it is quite variable in different HIV-1 strains. (e.g. https://doi.org/10.1016/j.bbrc.2020.04.123 and https://elifesciences.org/articles/53036). Therefore, it is necessary for the authors to quite clearly use strain designations in the manuscript rather than a generic "Vif", and to more clearly describe the viruses being used.

      Thank you for raising this important point. We utilized the NL4-3 strain in our study and have revised the manuscript to specify this detail. While this study uncovered part of the mechanism by which Vif modulates phosphatase regulation during mitosis, further research is required to elucidate the full mechanism, particularly how this degradation induces a robust pseudo-metaphase arrest.

      (3) Figure 5: This figure shows disruption of PP2A-B56 at the kinetochores. However, is this specific to the kinetochores? Since Vif has been described to more broadly degrade PP2A-B56, could this not be a result of a more general decrease in PP2A activity throughout the cell?

      Thank you for highlighting this critical point. PP2A is a major serine/threonine phosphatase that regulates numerous essential cell cycle processes. To the best of our knowledge, Vif selectively targets the degradation of the B56 family of PP2A regulatory subunits, without affecting the other three B-type subunits or the catalytic core of PP2A itself. During early mitosis, all five members of the B56 family (B56α, B56β, B56γ, B56δ, and B56ε) accumulate at kinetochores and centromeres, where they play critical roles in chromosome alignment. Many PP2A-B56 substrates are also localized to kinetochores and chromosomes during mitosis. Depletion of specific B56 isoforms or introduction of phosphorylation-deficient mutants of PP2A-B56 substrates at kinetochores has been shown to result in mitotic defects, underscoring the crucial roles of PP2A-B56 in regulating kinetochore, centromere, and chromosomal functions during mitosis. Interestingly, we observed no significant cell cycle arrest during G1, S, or G2 phases in Vif-expressing cells. While PP2A-B56 likely has important roles outside of mitosis, Vif-mediated degradation of PP2A-B56 appears to selectively disrupt its mitotic functions, particularly at the kinetochore. This finding highlights a targeted mechanism by which Vif interferes with PP2A-B56-mediated regulation of mitotic processes. However, further experiments are required to elucidate the precise mechanisms underlying Vif's inhibition of the specific mitotic roles of PP2A-B56.

      Reviewer #2 (Public review):

      Summary

      The authors characterize the cell-cycle arrest induced by HIV-1 Vif in infected cells. They show this arrest is not at G2/M as previously thought but during metaphase. They show that the metaphase plate forms normally but progression to anaphase is massively delayed, and chromosome segregation is dysregulated in a manner consistent with impaired assembly of microtubules at the kinetochore. This correlates with the lack of recruitment of B56 subunits of the PP2A phosphatase, which are known degradation targets of Vif, suggesting that this weakens and unbalances the microtubule-mediated forces on the separating chromosomes.

      Strengths

      The authors present a very well-performed set of quantitative live cell imaging experiments that convincingly show a difference between Vif and Vpr-mediated cell cycle arrests. Through an in-depth characterization of the Vif-mediated block in metaphase, they make a strong case for this phenotype being tied to the degradation of PP2A-B56 by Vif. Furthermore, it is important that they have performed most of these experiments with virally infected cells, meaning that their observations are observable at relevant viral expression levels of Vif.

      We appreciate the reviewer’s recognition of the importance and significance of our study.

      Weaknesses

      Experimentally there is very little to criticize with respect to the cellular systems used. Data from 10.1016/j.bbrc.2020.04.123 has identified selective mutants that fail to degrade B56 while maintaining A3G degradation by Cul5, and it would be nice to confirm that such a mutant behaves like the delta-Vif virus when examining metaphase, but I would expect selective ablation of B56 during mitosis to mimic Vif to be very challenging and beyond the scope.

      Thank you for your valuable suggestion. As also highlighted by Reviewer #1, it is true that certain variants of Vif, as discussed in 10.1016/j.bbrc.2020.04.123, differentially impact B56 degradation. Notably, some variants degrade A3G without inducing cell cycle arrest. We agree that investigating whether Vif's effects on B56 are directly linked to the mitotic arrest phenotype is an important direction for future research. Equipped with our advanced imaging tools, we are now preparing to extend our studies to include Vif variants from additional HIV-1 subtypes, including primary isolates. As you rightly pointed out, depletion of B56 is expected to be challenging as the B56 family comprises multiple isoforms, each with distinct and partially redundant roles in mitosis, particularly in microtubule assembly and spindle assembly checkpoint regulation. The functions of PP2A-B56 in mitosis are well-documented compared to the relatively new studies on Vif’s role in PP2A-B56 degradation. In human cells, the B56 family comprises 5 isoforms (B56α, B56β, B56γ, B56δ, and B56ε). While all B56 isoforms localize to kinetochores or centromeres during early mitosis, the reasons for their slightly different localization patterns (to either kinetochores or centromeres) remain unclear (Vallardi et al., eLife, 2019). Notably, these isoforms exhibit functional redundancy; thus, the depletion of any single isoform does not result in severe mitotic defects (Foley et al., Nature Cell Biology, 2011; Neumann et al., Nature, 2010). Supporting this redundancy, the overexpression of a single isoform (tested only B56α and B56γ) can rescue kinetochore function when all other isoforms are depleted (Foley et al., Nature Cell Biology, 2011; Vallardi et al., eLife, 2019). This complexity poses significant challenges to modulating the relative levels of individual B56 isoforms experimentally. 
      While these specific experiments are beyond the current scope of our study, we remain committed to advancing our understanding of the mechanisms driving Vif-induced pseudo-metaphase arrest. Your suggestion aligns with our ongoing efforts, and we will consider these experiments as we further explore this fascinating area.

      Where I would raise some criticism is in the relevance of these observations to the replication and pathogenesis of the virus itself, which the authors do not address or discuss. Firstly, despite clear data that both Vpr and Vif can lead to a cell cycle arrest in cycling cells, it has never been particularly clear why the virus does this. While I would agree with the authors that Vif results in the metaphase arrest through targeting B56-PP2A, this may not be the reason WHY the virus targets one of the cell's major phosphatases, but rather a knock-on effect of doing so. I appreciate that this is beyond the scope of the study, but it is something I feel should be discussed rather than the narrow mechanistic points made in the discussion. Secondly, the authors suggest that this activity of Vif is a major cause of apoptosis in infected cells and perhaps CD4+ T cell depletion in vivo. It would be good to quantify how much apoptosis is Vif-dependent in infected primary human CD4+ T cells rather than transformed tumor cells, and whether this correlates with the Vif-mediated induction of a pseudometaphase.

      Thank you for highlighting this important point. We completely agree that the full scope of Vif’s bi-functional roles, in both degrading the APOBEC3 family, which is essential for HIV-1 infection, and inducing cell cycle arrest, is not yet fully understood. The connection between Vif’s role in cell cycle arrest and the HIV-1 life cycle remains unclear. One possible explanation, as discussed in our study, is that Vif-induced pseudo-metaphase arrest may contribute to cell death, suggesting that Vif could play a role in the reduction of CD4+ T cells. Alternatively, Vif’s impact on cell cycle arrest, or its disruption of phosphatase activity, could facilitate HIV-1 virus production. However, further experiments, especially using primary human CD4+ T cells with similar approaches as in this study, are essential to gain deeper insights. This discussion has been included in the Limitations section of our study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The first paragraph of the Introduction is not necessary and anyway is quite outdated about the current state of HIV pathogenesis. Likewise, the discussion implies that HIV pathogenesis is due to virally-induced cell death, which is also outdated by more than a decade of work demonstrating that chronic immune activation is the driver of CD4 cell decline rather than direct cytotoxicity due to viral proteins.

      We have revised the first paragraph of the Introduction.

      (2) Line 134. I do not know what are Cal51 cells, and why they are being used for an HIV study here. Some rationale for being the cell of choice for this study should be included.

      Thank you for this suggestion. We have revised the text to clearly articulate the rationale for selecting the Cal51 cell line in this study. Briefly, this study focuses on the robust mitotic arrest induced by Vif. To capture this phenomenon, long-term live-cell imaging over 48–120 hours was required, with imaging intervals of 6–12 minutes and 3–4 z-stacks per time point. These parameters presented considerable technical challenges. The Cal51 cell line was chosen because it has been genetically engineered using CRISPR-Cas9 to express mScarlet-tagged Histone H2B and mNeonGreen-tagged Tubulin, enabling extended live-cell imaging. Furthermore, the Cal51 cell line exhibits wild-type p53 expression and maintains a stable near-diploid karyotype, making it an ideal model for studying cell cycle progression.
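      To give a sense of the acquisition load implied by these parameters, the following is a rough back-of-the-envelope sketch. It assumes that “3–4 z-stacks per time point” means 3–4 focal planes per time point, that two fluorescence channels are acquired (one per tagged reporter), and that a frame is taken at t = 0; the function names are hypothetical and none of these assumptions are stated in the response.

```python
def n_timepoints(duration_h, interval_min):
    """Number of acquisition time points, counting the frame at t = 0 (assumption)."""
    return int(duration_h * 60 // interval_min) + 1


def n_images(duration_h, interval_min, z_planes, channels=2):
    """Images per field of view; channels=2 is an assumption (H2B and tubulin reporters)."""
    return n_timepoints(duration_h, interval_min) * z_planes * channels


# Lightest and heaviest combinations of the ranges quoted above.
lightest = n_images(duration_h=48, interval_min=12, z_planes=3)
heaviest = n_images(duration_h=120, interval_min=6, z_planes=4)
print(lightest, heaviest)
```

      Under these assumptions, a single field of view spans roughly one and a half thousand to nearly ten thousand images, which illustrates why a stably tagged, adherent cell line simplifies such experiments.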

      (3) A description of the viruses being used is necessary. Although the authors cite a previous paper, the names in that paper do not exactly match the names used here. I presume that is the NL4.3 strain?

      Thank you for raising this important point. We utilized the subtype B HIV-1 NL4-3 strain in our study and have revised the manuscript to specify this detail.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Reviews):

      Summary:

      This study examines to what extent this phenomenon varies based on the visibility of the saccade target. Visibility is defined as the contrast level of the target with respect to the noise background, and it is related to the signal-to-noise ratio of the target. A more visible target facilitates the planning and execution of oculomotor behavior; however, as speculated by the authors, it can also benefit foveal prediction even if the foveal stimulus visibility is kept constant. Remarkably, the authors show that presenting a highly visible saccade target is beneficial for foveal vision, as the detection of stimuli with an orientation similar to that of the saccade target is improved; the lower the saccade target visibility, the less prominent the effect.

      Strengths:

      The results are convincing and the research methodology is technically sound.

      Weaknesses:

      Discussion on how this phenomenon may unfold in natural viewing conditions when the foveal and saccade target stimuli are complex and are constituted by different visual properties is lacking. Some speculations regarding feedforward vs feedback neural processing involved in the phenomenon and the speed of the feedforward signal in relation to the visibility of the target, are not well justified and not clearly supported by the data.

      We thank the reviewer for their comment. In general, we tried to address conceptual points only briefly in this Research Advance if we had discussed them in depth in our main article, to which this advance will be linked (Kroell & Rolfs, 2022: https://elifesciences.org/articles/78106). However, the reviews showed us that this made our theoretical reasoning in the current manuscript appear incomplete. In the revised Discussion section, we have elaborated on several conceptual questions. In particular, we expand on the transferability of our findings to natural viewing conditions:

      “Foveal prediction in natural visual environments

      As noted above, human observers typically move their eyes towards the most conspicuous objects in their environment (’t Hart, Schmidt, Roth, & Einhäuser, 2013). Foveal prediction seems to benefit from this strategy as the strength of the predicted signal increases with the conspicuity of the eye movement target. Nonetheless, natural visual environments as well as naturalistic viewing behavior pose several challenges for the foveal prediction mechanism (see Kroell & Rolfs, 2022, for an initial discussion).

      First, naturalistic saccade target stimuli will likely exhibit complex shapes and, more often than not, will include feature conjunctions rather than isolated features. Previous findings suggest that the foveal feedback mechanism is capable of operating at this level of complexity: High-level peripheral information such as the category of novel, rendered objects (Williams et al., 2008) has been successfully decoded from activation in foveal retinotopic cortex. If, indeed, temporal object-specific areas such as area TE send feedback, the foveal prediction mechanism may even be specialized for the transfer of complex visual properties.

      Second, foveal input will often be of high contrast in natural visual environments. If fed-back predictive signals can influence foveal perception in the presence of high-contrast feedforward input remains to be established. In our main investigation (Kroell & Rolfs, 2022; Figure 2B) as well as in previous studies (Hanning & Deubel, 2022b), pre-saccadic foveal detection performance decreased markedly in the course of saccade preparation, presumably because visuospatial attention gradually shifted towards the saccade target and away from the foveal location. This presaccadic decrease in foveal sensitivity may boost the relative weight of fed-back signals by attenuating the conspicuity of high-contrast feedforward input. In other words, the strength of feedforward input to the fovea is reduced gradually across saccade preparation. At the same time, the strength of the fed-back predictive signal should profit from the high contrast of naturalistic saccade targets.

      Third, while foveal and peripheral information was congruent on 50% of all ‘probe present’ trials in our investigation, peripheral and foveal features will often be weakly correlated or even uncorrelated in natural environments (see Samonds, Geisler, & Priebe, 2018). Again, the presaccadic attenuation of foveal feedforward processing may allow fed-back peripheral signals to influence perception even if they are uncorrelated with foveal information. Moreover, in piloting variations of our paradigm, we observed that the subjective impression of perceiving the saccade target at the pre-saccadic foveal location is most pronounced if the foveal noise region is replaced with a black Gaussian blob at certain time points before saccade onset (unpublished phenomenological accounts). In consequence, fed-back signals do not seem to require correlated feedforward input to influence perception. Quantitative evidence, however, remains to be established.

      Lastly, pre-saccadic foveal input is likely less relevant during natural viewing behavior than it is in our task. It is possible that this task-induced prioritization of the foveal location facilitated the emergence of congruency effects. In a previous experiment (Kroell & Rolfs, 2022; Figure 1D), however, the perceptual probe could appear anywhere on a horizontal axis of 9 dva length around the fixation location. Despite this spatial unpredictability, congruency effects peaked at the presaccadic foveal location, even after peripheral baseline performances had been raised to a foveal level through an adaptive increase in probe opacity. On a similar note, the orientation of the saccade target is irrelevant to the behavioral task in our design, mirroring naturalistic situations: The eye movement can be planned and executed based on local contrast variations alone, and observers are never required to report on the orientation of the peripheral target stimulus. Ultimately, however, an influence of task demands on visual processing can only be fully excluded through techniques that provide a direct readout of perceptual contents without requiring overt responses. In psychophysical investigations, a prediction of saccade target motion may be read out from observers’ eye velocities (Kroell, Mitchell, & Rolfs, 2023; Kwon, Rolfs, & Mitchell, 2019). In electroencephalographic (EEG) and electrophysiological studies, foveal predictions should manifest in early visually evoked potentials (e.g., Creel, 2019) and increased firing rates of feature-selective foveal neurons in early visual areas, respectively. In conclusion, previous findings (Williams et al., 2008), the assumed properties of the neuronal feedback mechanism (Williams et al., 2008; Bullier, 2001) and characteristics of our current and previous experimental paradigms collectively suggest that foveal feature predictions are likely to transfer to naturalistic environments and viewing situations. Experimental evidence remains to be established.”

      We have furthermore modified the Abstract to emphasize the connection of the current manuscript to the main article.

      With respect to the reviewer’s point that “speculations regarding feedforward vs feedback neural processing involved in the phenomenon and the speed of the feedforward signal in relation to the visibility of the target, are not well justified”: 

      Again, we understand that we should have elaborated on our theoretical reasoning in this Research Advance. The assumption that our initial findings rely on neuronal feedback to foveal retinotopic cortex is derived from Williams et al.’s (2008) seminal findings: In an fMRI study, the category of peripherally presented objects could be decoded from voxels in foveal retinotopic cortex, suggesting that peripheral visual information was available to neurons with strictly foveal receptive fields. We extended these findings to saccade preparation, suggesting that feedback from higher-order, non-retinotopically organized visual areas may transmit information without the requirement of efference copies (see Kroell, 2023; Dissertation; https://doi.org/10.18452/27204, pp. 54-59): Irrespective of the vector of the upcoming saccade, the features of the attended saccade target would invariably be relayed to foveal retinotopic cortex. Ultimately, only anatomical and functional studies in non-human primates can conclusively establish the role of feedback connections in the observed foveal prediction effects. At present, however, this parsimonious model could account for all of our current and previous findings, that is, a temporally, spatially and feature-specific anticipation of saccade target properties in the presaccadic center of gaze. Nonetheless, we are open to considering any other mechanism that may account for our findings, and have integrated the explanation provided by the reviewer into the paragraph on potential thalamic mechanisms (see the reviewer’s Major Point 1).

      Concerning the point that “some speculations regarding feedforward vs feedback neural processing […] and the speed of the feedforward signal in relation to the visibility of the target are not well justified and not clearly supported by the data”:

      Theoretical considerations on the impact of peripheral target contrast on feedforward processing speed were a main motivation for the current study. We apologize if our theoretical reasoning was incomplete and have added additional references and elaborations to the Introduction: 

      “In particular, neuronal response latencies decrease systematically as the contrast of visual input increases. While this phenomenon is reliably observed at varying stages of the visual processing hierarchy—such as the lateral geniculate nucleus (Lee, Elepfandt, & Virsu, 1981b), primary visual cortex (e.g., Albrecht, 1995; Carandini & Heeger, 1994; Carandini, Heeger, & Movshon, 1997; Carandini, Heeger, & Senn, 2002), and anterior superior temporal sulcus (STSa; Oram, Xiao, Dritschel, & Payne, 2002; van Rossum, van der Meer, Xiao, & Oram, 2008)—influences of contrast on neuronal response latency are particularly pronounced in higher-order visual areas: A doubling of stimulus contrast has been shown to decrease the latency of V1 neurons by 8 ms, compared to a reduction of 33 ms in area STSa (Oram et al., 2002; van Rossum et al., 2008). Assuming that the peripheral target is processed in a bottom-up fashion until it reaches higher-order object processing areas, the time point at which peripheral signals are available for feedback should be dictated by the temporal dynamics of visual feedforward processing.”

      Concerning the interpretation of the observed time courses, and regarding the reviewer’s Major points 3 & 6, we substantially revised the Results and Discussion section. In brief, we deemphasized the claim/interpretation of faster enhancement with increasing target opacity and instead focus on describing the oscillatory pattern mentioned by the reviewer. We provide a more temporally resolved pre-saccadic time course using a moving-window analysis and discuss all suggested and further alternative explanations (i.e., saccade-locked perceptual or attentional oscillations, longer signal accumulation intervals for low-contrast information, oscillatory nature of feedback signaling). Details and full revised paragraphs are provided in the response to this reviewer’s Major points 3 & 6.

      Unfortunately, there is no line numbering in the manuscript version I downloaded so I cannot refer to the specific lines of text here.

      We apologize for the inconvenience and have added line numbers.

      Major:

      (1) The authors speculate that the phenomenon of pre-saccadic foveal prediction arises from feedback connections from higher-order visual areas, which relay relevant saccade target features to the foveal retinotopic cortex. These feedback signals are then presumably combined with feedforward foveal input to the early visual cortex and facilitate the detection of target-congruent features at the center of gaze. This interpretation is sensible, however, it may not be the only plausible scenario. The thalamus receives copies of feedforward and feedback connections between all visual areas and is a likely candidate hub for combining information across visual space. In this latter case, the phenomenon of pre-saccadic foveal prediction may not arise from feedback from higher-order visual areas, but rather from a combination of signals occurring at the level of the thalamus. The authors should either acknowledge this possibility and the fact that this phenomenon is not necessarily the result of a feedback loop, or they should explain their rationale for excluding this scenario.

      We thank the reviewer for their highly thoughtful suggestion, and for alerting us to relevant literature. We have added the following paragraph to the Discussion section. In brief, we discuss the thalamic pulvinar as either an intermediate modulatory region or as the final receiver of the fed-back signal. Yet, we assume that, to solve the combinatorial issue associated with a transfer of feature information before saccades with any possible direction and amplitude, the contribution of non-retinotopic, higher-order object processing areas is likely required.

      “Neural implementation of foveal prediction

      Based on the body of our findings as well as previous literature, we suggested a parsimonious feedback mechanism to underlie the observed effects: the preparation of a saccadic eye movement, and the concomitant shift of pre-saccadic attention (e.g., Kowler, Anderson, Dosher, & Blaser, 1995; Deubel & Schneider, 1996), selects the peripheral target stimulus among competing information. Higher-order visual areas feed selected feature input back to early retinotopic areas, specifically to neurons with foveal receptive fields. Fed-back feature information combines with congruent, foveal feedforward input, resulting in the enhancement effects we observe. Especially in the context of active vision, this feedback mechanism is appealing as it resolves a combinatorial issue associated with feature-specific information transfer before saccades. Consider a simplified case in which, right before a saccadic eye movement, the activation of a feature-selective neuron that encodes a certain retinal location is transferred to a neuron within the same brain area that will encode said retinal location after saccade landing. For this mechanism to function for any possible saccade direction and amplitude, most neurons would need to be connected to most other neurons (or, in a simplified version, to neurons with foveal receptive fields) in a given brain area. Assuming an information transmission via feedback rather than horizontal connections significantly reduces this dimensionality: Higher-order visual areas that encode object properties (largely) detached from retinotopic or spatiotopic reference frames selectively transfer feature information to neurons with foveal receptive fields, irrespective of the vector of the upcoming saccade. This parsimonious mechanism would have shortcomings. In particular, foveal feedback should become less effective during saccade sequences where several peripheral targets are simultaneously attended. Feature information at both attended target locations may be fed back in temporal succession or weighted and erroneously combined into a single fed-back signal. In most cases, however, foveal feedback may reasonably achieve what established transsaccadic mechanisms struggle to explain: An anticipation of the features of a single saccade target (which typically constitutes the currently most relevant object in the visual field) in foveal vision.

      While direct feedback connections from higher-order to early visual areas would constitute the most straightforward implementation, it is conceivable that feedback signals are relayed through and modulated by subcortical areas. In particular, the thalamic pulvinar has been identified as a connection hub for visual processing that receives copies of feedforward and feedback connections from different visual areas and may even combine information across visual space (Cortes, Ladret, Abbas-Farishta, & Casanova, 2024). In the case of foveal prediction, thalamic neurons may receive fed-back signals from higher-order areas and enhance those signals before passing them on to cortical neurons with foveal receptive fields. Perhaps, a modification of foveal activation within the thalamic pulvinar itself is sufficient to influence perception. To the best of our understanding, however, the fed-back signal must originate in non-retinotopic, higher-order object processing areas to reduce the number of necessary neuronal connections.”
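      The combinatorial argument in the quoted passage above can be made concrete with a toy count of directed connections. This is only an illustrative sketch: the unit counts below are arbitrary placeholders, the function names are hypothetical, and real cortical connectivity is far sparser and more structured than this model suggests.

```python
def horizontal_all_to_all(n_locations):
    """Every retinotopic unit connects to every other unit in the same area."""
    return n_locations * (n_locations - 1)


def horizontal_to_fovea(n_locations, n_foveal):
    """Simplified version: every non-foveal unit connects to every foveal unit."""
    return (n_locations - n_foveal) * n_foveal


def feedback_to_fovea(n_feature_units, n_foveal):
    """Non-retinotopic feature units project only to foveal units.

    The count does not depend on the number of peripheral locations,
    and therefore not on the vector of the upcoming saccade.
    """
    return n_feature_units * n_foveal


# Placeholder sizes, chosen only to show how the counts scale.
n_locations, n_foveal, n_feature_units = 1000, 10, 8
print(horizontal_all_to_all(n_locations))
print(horizontal_to_fovea(n_locations, n_foveal))
print(feedback_to_fovea(n_feature_units, n_foveal))
```

      In this toy setting the horizontal schemes grow with the number of encoded retinal locations, whereas the feedback scheme grows only with the number of feature units and foveal units, which is the sense in which feedback "reduces this dimensionality".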

      (2) The results presented are very compelling. I wonder to what extent they generalize to situations in which the foveal input and the peripheral input are more heterogeneous (e.g., faces or complex objects composed of many different features, orientations, and other visual properties). I think the current research raises a number of interesting questions. In general, it would help readers if the authors elaborated more on how the mechanism of pre-saccadic foveal prediction may play out in normal viewing conditions or in conditions in which the foveal input is completely irrelevant to the task.

      We agree and have reiterated this point in the current manuscript (see our first reply to “Weaknesses”). We also explicitly refer to Kroell & Rolfs (2022) for an extensive initial discussion of this question.

      (3) On page 10 the authors state that their data suggest that foveal enhancement emerges in earlier stages of saccade preparation as target opacity increases. However, this is not clear from the figures. When performance is locked to saccade onset (Fig 3 C), performance for the highest-opacity targets seems to oscillate, yet the authors do not comment on that. There is literature showing how saccades can reset perceptual oscillations, and maybe what is observed here is just a stronger performance oscillation when the saccade target is more visible. Why would performance drop systematically 75 ms before saccade onset and then increase again 25 ms before the onset? Can the authors elaborate more on this?

      In response to this comment, we inspected the pre-saccadic time course of enhancement effects in a more temporally resolved fashion and, indeed, observed pronounced oscillations for the two higher target opacity conditions (see Results): 

      “Especially at higher target opacities, the temporal development of foveal enhancement appears to exhibit an oscillatory pattern. To inspect this incidental observation in a more temporally resolved fashion, we determined mean enhancement values in a boxcar window of 50 ms duration sliding along all saccade-locked probe offset time points (step size = 10 ms; x-axis values in Figure 4 indicate the latest time point in a certain window). We then fitted 6th order polynomials (with no constraints on parameters) to the resulting time courses and compared the fitted values against zero using bootstrapping (see Methods). The average foveal enhancement across target opacities reached significance starting 115 ms before saccade onset (gray curve in Figure 4; all ps < .046). For every individual target opacity condition, we observed significant enhancement immediately before saccade onset, although only very briefly for the lowest opacity (-2–0 ms for 25%; -39–0 ms for 39%, -106–0 ms for 59% &  -13–0 ms for 90%; all ps < .050; yellow to dark red curves in Figure 4). Especially for the higher two target opacities, we observed a local maximum preceding eye movement onset by approximately 80 ms. Interestingly, assuming a peak in enhancement in approximately 80 ms intervals (i.e., at x-axis values of -80 and 0 ms in Figure 4) would correspond to an oscillation frequency of 12.5 Hz. In contrast to rapid feedforward processing, feedback signaling is associated with neural oscillations in the alpha and beta range (i.e., between 7 and 30 Hz; Bastos et al., 2015; Jensen, Bonnefond, Marshall, & Tiesinga, 2015; van Kerkoerle et al., 2015).”
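      The windowing step described in this passage can be sketched as follows. This is only an illustration under stated assumptions: the function names `boxcar_means` and `period_to_hz` are hypothetical, the window is taken to be closed at both ends, and the data are placeholders rather than values from the study. The polynomial fit and the bootstrap test are not reproduced here.

```python
def boxcar_means(times_ms, values, width_ms=50, step_ms=10):
    """Average `values` in a sliding boxcar window.

    Each result is labelled with the latest time point of its window,
    mirroring the x-axis convention described for Figure 4.
    Assumption: the window includes both of its end points.
    """
    first_end = min(times_ms) + width_ms
    out = []
    for end in range(first_end, max(times_ms) + 1, step_ms):
        in_window = [v for t, v in zip(times_ms, values)
                     if end - width_ms <= t <= end]
        if in_window:  # skip windows that contain no samples
            out.append((end, sum(in_window) / len(in_window)))
    return out


def period_to_hz(period_ms):
    """Convert a peak-to-peak interval in milliseconds to a frequency in Hz."""
    return 1000.0 / period_ms


# Placeholder data: saccade-locked probe offsets from -200 to 0 ms.
times = list(range(-200, 1, 10))
windows = boxcar_means(times, times)  # values == times, purely for illustration

# Peaks spaced ~80 ms apart imply the frequency quoted in the text.
freq = period_to_hz(80)
```

      With peaks at roughly -80 and 0 ms, the conversion gives 1000 / 80 = 12.5 Hz, which is the value quoted in the passage.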

      We had observed an oscillatory pattern in multiple previous investigations, and in both Hit Rates to foveal orientation content and reflexive gaze velocities in response to peripheral motion information. So far, we have been unsure how to explain it. The literature on thalamic visual processing mentioned by the reviewer alerted us to the oscillatory nature of feedback signaling itself. Interestingly, the temporal frequency range of feedback oscillations includes the frequency of ~12.5 Hz observed in our data. We have included this and alternative explanations in the Discussion section (see below). Throughout, we highlight that we are aware that our analysis approach is purely descriptive and that the potential explanations we give are speculative.

      “Moreover, foveal congruency effects appear to exhibit an oscillatory pattern, with peaks in a medium saccade preparation stage (~80 ms before the eye movement) and immediately before saccade onset. We have noticed this pattern in several investigations with substantially different visual stimuli and behavioral readouts. For instance, using a full-screen dot motion paradigm, we observed a pre-saccadic, small-gain ocular following response to coherent motion in the saccade target region (Kroell, Rolfs, & Mitchell, 2023, conference abstract; Kroell, 2023, dissertation). Predictive ocular following first reached significance ~125 ms before the eye movement, then decreased and subsequently ramped up again ~25 ms before saccade onset. Several explanatory mechanisms appear conceivable. Unlike rapid feedforward processing, feedback propagation has been shown to follow an oscillatory rhythm in the alpha and beta range, that is, between 7 and 30 Hz (Bastos et al., 2015; Jensen, Bonnefond, Marshall, & Tiesinga, 2015; van Kerkoerle et al., 2015). In our case, it is possible that the object-processing areas that send feedback to retinotopic visual cortex do so at a temporal frequency of ~12.5 Hz. At higher stimulus contrasts, feedforward signals may be fed back instantaneously and without the need for signal accumulation in feedback-generating areas. The resulting perceptual time courses may reflect innate temporal feedback properties most veridically. Alternatively, the initial enhancement peak may be related to the sudden onset of the saccade target stimulus and not to movement preparation itself. In this case, the initial peak should become particularly apparent if enhancement is aligned to the onset of the target stimulus. Yet, Figure 3 and Figure 4 suggest more prominent oscillations in saccade-locked time courses. In accordance with this, perceptual and attentional processes have been shown to exhibit oscillatory modulations that are phase-locked to action onset (e.g., Tomassini, Spinelli, Jacono, Sandini, & Morrone, 2015; Hogendoorn, 2016; Wutz, Muschter, van Koningsbruggen, Weisz, & Melcher, 2016; Benedetto & Morrone, 2017; Tomassini, Ambrogioni, Medendorp, & Maris, 2017; Benedetto, Morrone & Tomassini, 2019). Whether the oscillatory pattern of foveal enhancement, as well as its increased prominence at higher target contrasts, relies on innate temporal properties of feedback signaling, signal accumulation, saccade-locked oscillatory modulations of feedforward processing or attention, or a combination of these factors, one conclusion remains: task-induced cognitive influences suggested to underlie the considerable variability in temporal characteristics of foveal feedback during passive fixation (e.g., Fan et al., 2016; Weldon et al., 2016; 2020) are not the only possible explanation. Low-level target properties such as its luminance contrast modulate the resulting time course and should be equally considered, at least in our paradigm.”

      In the revised Abstract, we removed our claim on an earlier emergence of enhancement at higher opacities and have added this summary instead:

      “Second, the time course of foveal enhancement appeared to show an oscillatory pattern that was particularly pronounced at higher target opacities. Interestingly, the temporal frequency of these oscillations corresponded to the frequency range typically associated with neural feedback signaling.”

      (4) What was the average difference in latency between short and long latencies? It would be good to report it in the main text.

      We apologize for the oversight. The difference was 61 ms, with latencies of md = 247±18 ms for short- and md = 308±18 ms for long-latency saccades. We have added this information to the main text.

      (5) From the saccade latency graphs in Figure S1 it seems there is some variability in the latency of saccades across subjects, I wonder if there is a correlation between saccade latency and the magnitude of the foveal prediction effect across subjects.

      We had inspected a connection between saccade latency and congruency in our first investigation (Kroell & Rolfs, 2022; not reported) and observed that participants with lower latencies tended to show more enhancement, albeit non-significantly. Likewise, we observed a non-significant negative correlation between the median saccade latency and the mean foveal prediction effect (across opacities and time points) in the current investigation, r = -0.22, p = .572. While our study involved a small number of observers (n = 9), the analysis approach illustrated in Figure 2 A-C instead makes use of the large number of trials collected per participant (mean n = 2841 trials per observer) and demonstrates a reliable influence of saccade latency on an individual-observer level.
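      As a rough plausibility check of the reported statistics, the two-sided p-value for a Pearson correlation can be recomputed from r and n alone via the standard t transform, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below is not the authors' analysis code; the function names are hypothetical, and the t distribution is integrated numerically with Simpson's rule so that only the Python standard library is needed.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def pearson_p_two_sided(r, n, steps=2000):
    """Two-sided p-value for a Pearson r computed from n paired observations."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule on [0, t]; `steps` must be even.
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * area * h / 3  # P(|T| <= t), using symmetry around zero
    return 1 - central_mass


# Values quoted in the response: r = -0.22 across n = 9 observers.
p = pearson_p_two_sided(-0.22, 9)
```

      With the rounded r of -0.22 this yields a p-value of roughly 0.57, close to the reported p = .572; a small discrepancy is expected because the correlation coefficient is rounded to two decimals.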

      (6) Page 14, the authors state that their findings suggest that the feedforward processing of the peripheral saccade target is accelerated when it is presented at high contrast. I find this a bit too speculative, both in terms of assuming that there is a feedforward vs a feedback process (see my point 1) and in terms of speculating that the feedforward process is accelerated as I do not see a clear hint of this in the data (see my point 3) and it is a bit of a stretch to speculate on delays or accelerations of neural processing. It is possible that the feedforward signal is always delivered at the same speed but it is weaker in one case and the effect needs more time to build up.

      We fully agree and hope to have addressed the reviewer’s arguments in the sections preceding this point. We included the reviewer’s last sentence in the Discussion section as well: 

      “Alternatively, or in addition, it is conceivable that weaker feedforward signals require a longer accumulation interval before the feedback process can be initiated.”

      Minor:

      (1) I think the description of the linear mixed-effects model can go in the supplemental methods, if possible, and its results can be briefly mentioned in the text.

      In previous work, we have been asked to move linear mixed-effects model descriptions from supplemental to main method (or even results) sections for clarity. We have followed this suggestion ever since and, due to the relevance of the models for the interpretation of the presented results, would like to keep their description in the methods section.

      (2) This is just a minor point, but I would suggest using a different word instead of opacity (maybe visibility?).

      We had gone back and forth on this. We decided to use the term ‘conspicuity’ when we discuss our findings conceptually and the term ‘opacity’ when we refer to the experimental manipulation (since we directly manipulate the transparency, i.e., 1-opacity, of the target patch against the background). To compute the slopes in Figures 2 and 5, we ordered observers’ performances by the linearly spaced opacity conditions. Since the term ‘opacity’ is closest to both the experimental manipulation and the variable entered into analysis, we would like to adhere to this terminology. However, we have added an explicit note to the end of our introduction to avoid confusion: 

      “Throughout the paper, we use the term ‘opacity’ when we refer to the experimental manipulation (that is, a variation of the transparency, i.e., 1-opacity of the target patch against the background noise) and the term ‘conspicuity’ when we discuss our findings conceptually.”

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors ran a dual task. Subjects monitored a peripheral location for a target onset (to generate a saccade to), and they also monitored a foveal location for a foveal probe. The foveal probe could be congruent or incongruent with the orientation of the peripheral target. In this study, the authors manipulated the conspicuity of the peripheral target, and they saw changes in performance in the foveal task. However, the changes were somewhat counterintuitive.

      Strengths:

      The authors use solid analysis methods and careful experimental design.

      Weaknesses:

      I have some issues with the interpretation of the results, as explained below. In general, I feel that a lot of effects are being explained by attention and target-probe onset asynchrony etc, but this seems to be against the idea put forth by the authors of "foveal prediction for visual continuity across saccades". Why would foveal prediction be so dependent on such other processes? This needs to be better clarified and justified.

      We address the described weaknesses in the respective sections below. In general, as we point out in response to Reviewer 1 as well, the current submission is a Research Advance article meant to supplement our main article (Kroell & Rolfs, 2022, https://doi.org/10.7554/eLife.78106). To comply with the eLife recommendations for Research Advance submissions, we addressed conceptual points only briefly, especially if they had been explained in detail in our main article. To make the nature and format of the current submission as explicit as possible, and to emphasize its connection to our previous work, we refer to the submission format in our abstract and introduction now.

      Specifics:

      The explanation of decreased hit rates with increased peripheral target opacity is not convincing. The authors suggest that higher contrast stimuli in the periphery attract attention. But, then, why are the foveal results occurring earlier (as per the later descriptions in the manuscript)? And, more importantly, why would foveal prediction need to be weaker with stronger pre-saccadic attention to the periphery? What is the function of foveal prediction? What of the other interpretation that could be invoked in general for this type of task used by the authors: that the dual task is challenging and that subjects somehow misattribute what they saw in the peripheral task when planning the saccade. i.e. foveal hit rates are misperceptions of the peripheral target. When the peripheral target is easier to see, then the foveal hit rate drops.

      We will address these comments one by one:

      The authors suggest that higher contrast stimuli in the periphery attract attention. But, then, why are the foveal results occurring earlier (as per the later descriptions in the manuscript)?

      We consider these observations to rely on separate processes. Already in the main publication (Kroell & Rolfs, 2022), we had observed a continuous decrease of target-congruent and target-incongruent foveal Hit Rates (HRs) during saccade preparation, and suggested that this decrease (similarly observed in Hanning & Deubel, 2022b) is likely caused by the pre-saccadic shift of visuospatial attention to the target. In other words, as attentional resources shift towards the periphery, foveal detection performance is hampered, irrespective of peripheral and foveal feature (in-)congruency. In the current investigation, we again observed a pronounced pre-saccadic decrease of foveal HRs, irrespective of foveal probe orientation. Our argument that high-contrast peripheral saccade targets attract more attention relies on the clear observation that this decrease becomes more pronounced as the contrast of the saccade target increases. To the best of our judgment and experience with doing the task ourselves, this interpretation appears very conceivable. We explain this rationale in the Abstract and the Results sections of the manuscript (see below).

      Our hypotheses and interpretations concerning the time course of foveal prediction refer to the difference between target-congruent and target-incongruent foveal HRs (i.e., to predictive foveal feature enhancement). Irrespective of the general, feature-unspecific decrease of foveal detection performances, we had hypothesized that the peripheral target is processed faster if it exhibits a high contrast. This assumption is based on temporal processing properties of many visual neurons that we have expanded on in our revision: 

      “In particular, neuronal response latencies decrease systematically as the contrast of visual input increases. While this phenomenon is reliably observed at varying stages of the visual processing hierarchy—such as the lateral geniculate nucleus (Lee et al., 1981b), primary visual cortex (e.g., Albrecht, 1995; Carandini et al., 1997, 2002; Carandini and Heeger, 1994), and anterior superior temporal sulcus (STSa; Oram et al., 2002; van Rossum et al., 2008)— influences of contrast on neuronal response latency are particularly pronounced in higher-order visual areas: A doubling of stimulus contrast has been shown to decrease the latency of V1 neurons by 8 ms, compared to a reduction of 33 ms in area STSa (Oram et al., 2002; van Rossum et al., 2008). Assuming that the peripheral target is processed in a bottom-up fashion until it reaches higher-order object processing areas, the time point at which peripheral signals are available for feedback should be dictated by the temporal dynamics of visual feedforward processing.”

      Of note, both reviewers asked us to explore the oscillatory nature of the difference between target-congruent and target-incongruent HRs. We will post our changes in response to the reviewer’s remark below.

      And, more importantly, why would foveal prediction need to be weaker with stronger pre-saccadic attention to the periphery?

      We hope that our previous reply has cleared up that the opposite is true: In general, and irrespective of the feature congruency of target and foveal probe, foveal HRs decrease as target contrast increases. As we have stated in our Abstract and Results, “foveal Hit Rates for target-congruent and incongruent probes decreased as target opacity increased, presumably since attention was increasingly drawn to the target the more salient it became. Crucially, foveal enhancement defined as the difference between congruent and incongruent Hit Rates increased with opacity”. This finding did not appear counterintuitive to us and was, in fact, pre-registered as a main hypothesis (see https://osf.io/wceba).

      We are unsure if this goes beyond the reviewer’s concern but we, in fact, speculate in the revised Discussion section as well as in our original eLife article that the overall, feature-unspecific decrease in foveal detection performances may aid feature-specific foveal prediction: 

      “This pre-saccadic decrease in foveal sensitivity may boost the relative weight of fed-back signals by attenuating the conspicuity of high-contrast feedforward input. In other words, the strength of feedforward input to the fovea is reduced gradually across saccade preparation. At the same time, the strength of the fed-back predictive signal should profit from the high contrast of naturalistic saccade targets.”

      What is the function of foveal prediction?

      Please refer to the section ‘What is the function of foveal prediction?’ in our main article. We have pasted this paragraph below for the reviewer’s convenience. 

      “What is the function of foveal prediction?

      As stated above, previous investigations on foveal feedback required observers to make peripheral discrimination judgments. We, in contrast, did not ask observers to generate a perceptual judgment on the orientation of the saccade target. Instead, detecting the target was necessary to perform the oculomotor task. While the identification of local contrast changes would have sufficed to direct the eye movement, the orientation of the target enhanced foveal processing of congruent orientations. The automatic nature of foveal enhancement showcases that perceptual and oculomotor processing are tightly intertwined in active visual settings: planning an eye movement appears to prioritize the features of its target; commencing the processing of these features before the eye movement is executed may accelerate post-saccadic target identification and ultimately provide a head start for corrective gaze behavior (Deubel et al., 1982; Ohl and Kliegl, 2016; Tian et al., 2013).”

      What of the other interpretation that could be invoked in general for this type of task used by the authors: that the dual task is challenging and that subjects somehow misattribute what they saw in the peripheral task when planning the saccade. i.e. foveal hit rates are misperceptions of the peripheral target. When the peripheral target is easier to see, then the foveal hit rate drops.

      Alternative explanations in general: In our main article, we ruled out—either through direct experimentation or by considering relevant properties of our findings—the following alternative explanations: i) spatially global feature-based attention to the target orientation, ii) a multiplicative combination of spatial and feature-based attention, and iii) shifts of decision criterion. While dual tasks (i.e., simultaneous oculomotor planning and perceptual detection) are standard in psychophysical investigations of active vision, we acknowledge the potential influence of an explicit foveal task in the revised manuscript, and in response to both reviewers: 

      “Lastly, pre-saccadic foveal input is likely less relevant during natural viewing behavior than it is in our task. It is possible that this task-induced prioritization of the foveal location facilitated the emergence of congruency effects. In a previous experiment (Kroell & Rolfs, 2022; Figure 2D), the perceptual probe could appear anywhere on a horizontal axis of 9 dva length around the screen center. Despite this spatial unpredictability, however, congruency effects peaked at the pre-saccadic foveal location, even after peripheral baseline performances had been raised to a foveal level through an adaptive increase in probe opacity. Ultimately, an influence of task demands on visual processing can only be fully excluded through techniques that provide a direct readout of perceptual contents without requiring keyboard responses. In psychophysical investigations, a prediction of saccade target motion may be read out from observers’ eye velocities (Kroell, Mitchell, & Rolfs, 2023; Kwon, Rolfs, & Mitchell, 2019). In electroencephalographic (EEG) and neurophysiological studies, foveal predictions should manifest in early visual evoked potentials (e.g., Creel, 2019) and increased firing rates of feature-selective foveal neurons in early visual areas, respectively.”

      Difficulty of the task: Concerning the perceptual detection task, every experimental session was preceded by an adaptive staircase procedure that adjusted the transparency of the foveal probe (and, thus, task difficulty) depending on the respective observer’s performance (see Methods for details). Concerning the oculomotor task, observers were able to perform accurate saccades with typical movement latencies for all target opacity conditions (see Results, Supplements & Figure S1). In general, we are unsure how high task difficulty could produce a feature-, temporally and spatially specific enhancement of both filtered and incidental target-congruent foveal orientation information. In fact, a main finding of our current submission is that foveal HRs decrease as the target becomes easier to see and the oculomotor task thus becomes easier to perform.

      Perceptual confusion of target and probe stimulus: We observe a specific increase in HRs for foveal probes that exhibit the same orientation as the peripheral saccade target. Just like in our main article, a response is defined as a ‘Hit’ if a foveal probe is presented and the observer generates a ‘present’ judgment. To our understanding, the suggestion that a confusion of target and probe stimuli may account for these effects necessarily implies that this confusion hinges on the congruency between peripheral and foveal feature inputs. In other words, peripheral and foveal signals should be more readily “confused” if they exhibit similar features. We assume that peripheral feature information is fed back to neurons with foveal receptive fields and combines with feature-congruent feedforward input. Whether this combination of signals can be described as low-level perceptual “confusion” likely depends on individual linguistic judgments (it would certainly be a novel description of feedback-feedforward interactions). Perhaps a defining difference between the reviewer’s concern and our assumed mechanism is the spatial specificity of the resulting congruency effects. We suggest that only neurons with foveal receptive fields receive feature information via feedback. And indeed, we demonstrate a clear spatial specificity of congruency effects around the pre-saccadic foveal location, even after parafoveal performances had been raised to a foveal level by an adaptive increase in probe opacity (see Kroell & Rolfs, 2022; Figure 2C & Figure 3). In other words, observers’ perception is altered in their pre-saccadic center of gaze while the target is presented peripherally. We struggle to conceive a scenario in which a confusion of signals should be feature-specific as well as specific to an interaction between peripheral and foveal signals without being meaningful at the same time. If the reviewer is referring to confusions on the response or decision level, we would like to point them towards the Discussion section ‘Can our findings be explained by established mechanisms other than foveal prediction?’ in our main article. In this paragraph, we provide detailed arguments for a dissociation between our findings and shifts in decision criterion that would exceed the scope of a Research Advance.

      When the peripheral target is easier to see, then the foveal hit rate drops.

      We agree. Target-congruent and incongruent foveal HRs decreased as the contrast of the probe increased. However, and as we stated in response to the reviewer’s first comment, the difference between target-congruent and target-incongruent foveal HRs (and, thus, foveal enhancement of the target orientation) increased with peripheral target contrast.

      The analyses of Fig. 3C appear to be overly convoluted. They also imply an acknowledgment by the authors that target-probe temporal difference matters. Doesn't this already negate the idea that the foveal effects are associated with the saccade generation process itself? If the effect is related to target onset, how is it interpreted as related to a foveal prediction that is associated with the saccade itself? 

      We indeed conducted analyses that can reveal an influence of target presentation duration at probe onset, the saccade preparation stage at probe offset, as well as a combination of both factors. The fact that target presentation duration may have an influence on foveal prediction would not negate a simultaneous influence of saccade preparation and vice versa. In the main article, we directly investigated the influence of saccade preparation on foveal enhancement by introducing a passive fixation condition (Kroell & Rolfs, 2022; Figure 5). At identical target-probe offset durations, pre-saccadic foveal enhancement was significantly more pronounced and accelerated compared to enhancement during passive fixation. We have added a purely saccade-locked time course (uncorrected by target-probe interval) to our Results section and to Figure 3 (second row). We still believe that the target-locked, saccade-locked and combined analyses are informative for future investigations and would like to present them all for completeness.

      Also, the oscillatory nature of the effect in Fig. 3C for 59% and 90% opacity is quite confusing and not addressed. The authors simply state that enhancement occurs earlier before the saccade for higher contrasts. But, this is not entirely true. The enhancement emerges then disappears and then emerges again leading up to the saccade. Why would foveal prediction do that?

      In response to this comment and a suggestion by Reviewer 1, we inspected the pre-saccadic time course of enhancement effects in a more temporally resolved fashion and, indeed, observed pronounced oscillations for the two higher target opacity conditions (see Results): 

      “Especially at higher target opacities, the temporal development of foveal enhancement appears to exhibit an oscillatory pattern. To inspect this incidental observation in a more temporally resolved fashion, we determined mean enhancement values in a boxcar window of 50 ms duration sliding along all saccade-locked probe offset time points (step size = 10 ms; x-axis values in Figure 4 indicate the latest time point in a certain window). We then fitted 6th order polynomials to the resulting time courses and compared the fitted values against zero using bootstrapping (see Methods). The average foveal enhancement across target opacities reached significance starting 115 ms before saccade onset (gray curve in Figure 4; all ps < .046). For every individual target opacity condition, we observed significant enhancement immediately before saccade onset, although only very briefly for the lowest opacity (-2–0 ms for 25%; -39–0 ms for 39%, -106–0 ms for 59% &  -13–0 ms for 90%; all ps < .050; yellow to dark red curves in Figure 4). Especially for the higher two target opacities, we observed a local maximum preceding eye movement onset by approximately 80 ms. Interestingly, assuming a peak in enhancement in approximately 80 ms intervals (i.e., at x-axis values of -80 and 0 ms in Figure 4) would correspond to an oscillation frequency of 12.5 Hz. In contrast to rapid feedforward processing, feedback signaling is associated with neural oscillations in the alpha and beta range (i.e., between 7 and 30 Hz; Bastos et al., 2015; Jensen, Bonnefond, Marshall, & Tiesinga, 2015; van Kerkoerle et al., 2015).”

      We had observed an oscillatory pattern in multiple previous investigations, and in both Hit Rates to foveal orientation content and reflexive gaze velocities in response to peripheral motion information. So far, we have been unsure how to explain it. The literature on thalamic visual processing mentioned by the reviewer alerted us to the oscillatory nature of feedback signaling itself. Interestingly, the temporal frequency range of feedback oscillations includes the frequency of ~12.5 Hz observed in our data. We have included this and alternative explanations in the Discussion section (see below). We are aware, and acknowledge in the manuscript, that our analysis approach is purely descriptive, and that the potential explanations we give are speculative. 

      “Moreover, foveal congruency effects appeared to exhibit an oscillatory pattern, with peaks in a medium saccade preparation stage (~80 ms before the eye movement) and immediately before saccade onset. We have noticed this pattern in several investigations with substantially different visual stimuli and behavioral readouts. For instance, using a full-screen dot motion paradigm, we observed a pre-saccadic, small-gain ocular following response to coherent motion in the saccade target region (Kroell, Rolfs, & Mitchell, 2023, conference abstract; Kroell, 2023, dissertation). Predictive ocular following first reached significance ~125 ms before the eye movement, then decreased and subsequently ramped up again ~25 ms before saccade onset. Several explanatory mechanisms appear conceivable. Unlike rapid feedforward processing, feedback propagation has been shown to follow an oscillatory rhythm in the alpha and beta range, that is, between 7 and 30 Hz (Bastos et al., 2015; Jensen, Bonnefond, Marshall, & Tiesinga, 2015; van Kerkoerle et al., 2015). In our case, it is possible that the object-processing areas that send feedback to retinotopic visual cortex do so at a temporal frequency of ~12.5 Hz. At higher stimulus contrasts, feedforward signals may be fed back instantaneously and without the need for signal accumulation in feedback-generating areas. The resulting perceptual time courses may reflect innate temporal feedback properties most veridically. Alternatively, the initial enhancement peak may be related to the sudden onset of the saccade target stimulus and not to movement preparation itself. In this case, the initial peak should become particularly apparent if enhancement is aligned to the onset of the target stimulus. Yet, Figure 3 and Figure 4 suggest more prominent oscillations in saccade-locked time courses. In accordance with this, perceptual and attentional processes have been shown to exhibit oscillatory modulations that are phase-locked to action onset (e.g., Tomassini, Spinelli, Jacono, Sandini, & Morrone, 2015; Hogendoorn, 2016; Wutz, Muschter, van Koningsbruggen, Weisz, & Melcher, 2016; Benedetto & Morrone, 2017; Tomassini, Ambrogioni, Medendorp, & Maris, 2017; Benedetto, Morrone & Tomassini, 2019). Whether the oscillatory pattern of foveal enhancement, as well as its increased prominence at higher target contrasts, relies on innate temporal properties of feedback signaling, signal accumulation, saccade-locked oscillatory modulations of feedforward processing or attention, or a combination of these factors, one conclusion remains: task-induced cognitive influences suggested to underlie the considerable variability in temporal characteristics of foveal feedback during passive fixation (e.g., Fan et al., 2016; Weldon et al., 2016; 2020) are not the only possible explanation. Low-level target properties such as its luminance contrast modulate the resulting time course and should be equally considered, at least in our paradigm.”

      The interpretation of Fig. 4 is also confusing. Doesn't the longer latency already account for the lapse in attention, such that visual continuity can proceed normally now that the saccade is actually eventually made? In all results, it seems that the effects are all related to the dual nature of the task and/or attention, rather than to the act of making the saccade itself. Why should visual continuity (when a saccade is actually made, whether with short or long latency) have different "fidelity"? And, isn't this disruptive to the whole idea of visual continuity in the first place?

      We are unsure if we grasp the unifying concern behind these remarks. For the reviewer’s point on the dual-task nature of our paradigm, please consider our answer above. Perhaps it is important to note that we do not (and would never) claim that foveal prediction is the only mechanism underlying visual continuity. We believe that multiple mechanisms, including but not limited to pre-saccadic shifts of attention, predictive remapping of attention pointers and the perception of intra-saccadic signals, interact and jointly contribute to visual continuity. It appears highly conceivable that, like most processes in biological systems, motor and perceptual performances are subject to fluctuations. We argue that saccade latencies as well as the magnitude of foveal prediction constitute read-outs of these variations. We also suggest that those read-outs are innately correlated beyond their common moderator of, perhaps, attentional state; we have previously presented clear evidence for a link between eye movement preparation and foveal prediction (Kroell & Rolfs, 2022; Figure 2). To the best of our judgment, we consider it reasonable that the effectiveness of movement-contingent perceptual processes varies with the effectiveness (in programming or execution) of the very movement motivating them. We present evidence for this assumption in our submission. We would also like to make clear that we do not assume our vision to fail entirely, even if every single well-known mechanism of visual continuity were to break down at once. Upon saccade landing, the visual system receives reliable visual input. Nonetheless, the visual system has undeniably developed mechanisms to optimize this process. We believe foveal prediction to rank among them.

      Small question: is it just me or does the data in general seem to be too excessively smoothed?

      We did not apply any smoothing to either the analysis or visualization of our data in the initial manuscript.

      Every observer completed a large number of trials (mean n = 2841 trials per observer; total trial number > 25,500), which likely contributes to the clarity of our data. To inspect the oscillatory pattern of enhancement in a more temporally resolved fashion (in response to the reviewer’s point above), we applied a moving window analysis in this revision. Because neighboring windows overlap, this analysis introduces a certain degree of smoothing. Nonetheless, data patterns are comparable to the time course with only a few non-overlapping time bins (Figure 3B; second row). In general, we have described all steps of our analysis routine extensively in the Methods section and will make our data publicly available upon publication of the Reviewed Preprint.
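      To illustrate why overlapping windows smooth a time course while non-overlapping bins do not, the following is a minimal sketch. The function name, window widths and data values are hypothetical and chosen only for illustration; they are not the parameters used in the manuscript.

```python
def sliding_window_means(times_ms, values, start_ms, stop_ms, width_ms, step_ms):
    """Average `values` in windows of `width_ms`, advancing by `step_ms`.

    step_ms == width_ms gives non-overlapping bins;
    step_ms <  width_ms gives overlapping windows, so each sample
    contributes to several neighboring averages (implicit smoothing).
    """
    centers, means = [], []
    left = start_ms
    while left + width_ms <= stop_ms:
        in_window = [v for t, v in zip(times_ms, values)
                     if left <= t < left + width_ms]
        if in_window:  # skip empty windows rather than dividing by zero
            centers.append(left + width_ms / 2)
            means.append(sum(in_window) / len(in_window))
        left += step_ms
    return centers, means


# Hypothetical time course (ms relative to saccade onset) with one sharp peak.
times = [-175, -125, -75, -25]
effect = [0.0, 0.0, 1.0, 0.0]

binned = sliding_window_means(times, effect, -200, 0, width_ms=50, step_ms=50)
moving = sliding_window_means(times, effect, -200, 0, width_ms=100, step_ms=50)
```

      In this toy example the non-overlapping bins preserve the single sharp peak, whereas the overlapping windows spread it across two neighboring estimates at half the height.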

      General comment: it is important to include line numbers in manuscripts, to help reviewers point to specific parts of the text when writing their comments. Otherwise, the peer review process is rendered unnecessarily complicated for the reviewers.

      We apologize and have added line numbers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The study presents important findings on inositol-requiring enzyme (IRE1α) inhibition on diet-induced obesity (overnutrition) and insulin resistance where IRE1α inhibition enhances thermogenesis and reduces the metabolically active and M1-like macrophages in adipose tissue. The evidence supporting the conclusions is convincing but can be enhanced with information/data on the validity, specificity, selectivity, and toxicity of the IRE1α inhibitor and supported with more detail on the mechanisms by which adipose tissue macrophages influence adipocyte metabolism. The work will be of interest to cell biologists and biochemists working in metabolism, insulin resistance, and inflammation.

      We thank the editors for the assessment and appreciation of our findings in this study. In the revision, we have added information on the validity, selectivity and toxicity of the IRE1α inhibitor. In addition, we discuss the likelihood that suppression of the metabolically activated pro-inflammatory macrophage population in adipose tissue contributes to the reversal of adipose remodeling and to enhanced thermogenesis. We have also improved the manuscript significantly throughout the text and figures, following the reviewers' recommendations.

      Public Reviews:

      Reviewer #1 (Public review):

      First, the authors confirm the up-regulation of the main genes involved in the three branches of the Unfolded Protein Response (UPR) system in diet-induced obese mice in AT, observations that have been extensively reported before. Not surprisingly, IRE1a inhibition with STF led to an amelioration of the obesity and insulin resistance of the animals. Moreover, non-alcoholic fatty liver disease was also improved by the treatment. More novel are their results in terms of thermogenesis and energy expenditure, where IRE1a seems to act via activation of brown AT. Finally, mice treated with STF exhibited significantly fewer metabolically active and M1-like macrophages in the AT compared to those under vehicle conditions. Overall, the authors conclude that targeting IRE1a has therapeutical potential for treating obesity and insulin resistance.

      The study has some strengths, such as the detailed characterization of the effect of STF in different fat depots and a thorough analysis of macrophage populations. However, the lack of novelty in the findings somewhat limits the study's impact on the field.

      We thank the reviewer for the appreciation of our findings and the comments about novelty. We would emphasize several novel aspects of this manuscript. First, as the reviewer correctly pointed out, we discovered that IRE1 inhibition by STF activates brown AT and promotes thermogenesis, and we show for the first time that IRE1 inhibition not only significantly attenuates the newly discovered CD9+ ATMs and the “M1-like” CD11c+ ATMs but also diminishes the M2 ATMs. These discoveries are important and novel. In obesity, it was originally proposed that ATMs undergo M1/M2 polarization from an anti-inflammatory M2 state to a classical pro-inflammatory M1 state. It was further reported that IRE1 deletion improves thermogenesis by boosting the M2 population, which then synthesizes and secretes catecholamines to promote thermogenesis. It is now known that M2 macrophages do not synthesize catecholamines or promote thermogenesis. In this study, we discovered that IRE1 inhibition does not increase (but instead decreases) the M2 population and that IRE1 inhibition likely promotes thermogenesis by suppressing pro-inflammatory macrophage populations, including the M1-like ATMs and, most importantly, the newly identified metabolically active macrophages, given that ATM inflammation has been reported to suppress thermogenesis. Second, this study presents the first characterization of the relationship between the more classical M1-like ATMs and the newly discovered metabolically active ATMs, showing that the CD11c+ M1-like ATMs largely overlap with, yet are not identical to, the CD9+ ATMs in the eWAT under HFD. Third, although upregulation of ER stress response genes in the adipose tissues of diet-induced obese mice has been extensively reported, this does not necessarily mean that targeting IRE1a or ER stress can reverse existing insulin resistance and obesity. It is not uncommon for a therapy to fail to yield the expected effect. For instance, because amyloid plaques are a hallmark of Alzheimer's disease (AD), interventions that prevent or reverse beta amyloid deposition were expected to prevent progression of, or even reverse, cognitive impairment in AD patients. However, clinical trials of such therapies have been disappointing. In essence, experimental demonstration of effectiveness or feasibility for any potential therapeutic target is a first step toward any future clinical implementation.

      Reviewer #2 (Public review):

      The manuscript by Wu et al demonstrated that IRE1a inhibition mitigated insulin resistance and other comorbidities through increased energy expenditure in DIO mice. In this reviewer's opinion, this timely study has high significance in the field of metabolism research for the following reasons.

      (1) The authors' findings are significant and may offer a new therapeutic target to treat metabolic diseases, including diabetes, obesity, NAFLD, etc.

      (2) The authors carefully profiled the ATMs and examined the changes in gene expression after STF treatment.

      (3) The authors presented evidence collected from both systemic indirect calorimetry and individual tissue gene expression to support the notion of increased energy expenditure.

      Overall, the authors have presented sufficient background in a clear and logically organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, and made a justified conclusion.

      We thank the reviewer for the appreciation of our work.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Wu D. et al. explores an innovative approach to immunometabolism and obesity by investigating the potential of targeting macrophage Inositol-requiring enzyme 1α (IRE1α) in cases of overnutrition. Their findings suggest that pharmacological inhibition of IRE1α could influence key aspects such as adipose tissue inflammation, insulin resistance, and thermogenesis. Notable discoveries include the identification of High-Fat Diet (HFD)-induced CD9+ Trem2+ macrophages and the reversal of metabolically active macrophages' activity with IRE1α inhibition using STF. These insights could significantly impact future obesity treatments.

      Strengths:

      The study's key strengths lie in its identification of specific macrophage subsets and the demonstration that inhibiting IRE1α can reverse the activity of these macrophages. This provides a potential new avenue for developing obesity treatments and contributes valuable knowledge to the field.

      Weaknesses:

      The research lacks an in-depth exploration of the broader metabolic mechanisms involved in controlling diet-induced obesity (DIO). Addressing this gap would strengthen the understanding of how targeting IRE1α might fit into the larger metabolic landscape.

      Impact and Utility:

      The findings have the potential to advance the field of obesity treatment by offering a novel target for intervention. However, further research is needed to fully elucidate the metabolic pathways involved and to confirm the long-term efficacy and safety of this approach. The methods and data presented are useful, but additional context and exploration are required for broader application and understanding.

      We thank the reviewer for the appreciation of strengths in our manuscript. In particular, we appreciate the reviewer’s recommendation on the exploration of broader metabolic landscape, such as the effect of IRE1 inhibition on non-adipose tissue macrophages and metabolism. We agree that achieving these will certainly broaden the therapeutic potential of IRE1 inhibition to larger metabolic disorders and we will pursue these explorations in future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      A list of recommendations for the authors is presented below:

      (1) Please, update the literature review to include more recent studies relevant to the topic.

      We thank the reviewer for the suggestion. We have added more references from recent studies.

      (2) Please, provide a detailed explanation of how STF functions, including potential off-target effects or issues related to specificity.

      We thank the reviewer for the suggestion. STF is a small-molecule inhibitor designed to selectively inhibit the RNase activity of IRE1a. Once IRE1a is activated (e.g., in obesity), its RNase domain initiates the unconventional splicing of the mRNA encoding the transcription factor X-box binding protein 1 (XBP1) and the Regulated IRE1-Dependent Decay (RIDD) of microRNAs, which is detrimental if prolonged. IRE1a RNase inhibitors, including STF, engage the RNase active site of IRE1a with high affinity and specificity by exploiting a shallow complementary pocket through pi-stacking interactions with His910 and Phe889 and an essential Schiff base interaction between the aldehyde moiety of the inhibitor and the side-chain amino group of Lys907 (Sanches et al., NComm 2014, PMID: 25164867). This specific, high-affinity binding blocks IRE1a RNase activity, preventing the splicing of XBP1 mRNA and RIDD. As IRE1a has been shown to be activated in multiple tissues under various pathological conditions and to be responsible for their progression, pharmacological inhibition of IRE1a, including with STF, has great potential for the treatment of various pathological disorders. Several studies have reported that STF shows no overt toxicity when administered systemically (Madhavan et al., 2022, PMID 35105890; Herlea-Pana et al., 2021, PMID 34675883; Papandreou et al., 2011, PMID 21081713; Tufanli et al., 2017, PMID 28137856).

      (3) Lines 263-266 require a reference.

      We thank the reviewer for the suggestion. A reference has been added.

      (4) Stromal vascular fraction (SVF) also contains a significant amount of preadipocytes and stem cells, not only macrophages, which might affect the conclusions reached by the authors.

      We thank the reviewer for the comments. It is true that SVF consists of multiple cell types, including endothelial cells, macrophages, preadipocytes, and various stem cell populations. In HFD-induced obesity, adipose tissue undergoes significant remodeling, and the percentage of macrophages in the SVF of obese adipose tissue increases significantly relative to other cell types. In our studies, SVFs from adipose tissues of obese mice were isolated, cultured, and treated with STF overnight. We observed that IRE1 RNase activity in SVFs was inhibited by STF treatment, and that the ATM population and the expression of pro-inflammatory genes were downregulated by STF. Given the short-term treatment, the most parsimonious interpretation of the data is that STF acts directly on ATMs. However, we note that we cannot completely rule out the possibility that effects of STF on other cell types influence the ATMs and inflammatory gene expression. As such, we have modified our conclusion from “these results indicate that STF acts directly on ATMs to regulate inflammation” to “these results indicate that STF likely acts directly on ATMs to regulate inflammation”.

      (5) Figures 1A and G: It is common practice to present the XBP1s/XBP1u ratio; consider using this standard measure.

      We thank the reviewer for the comments. Regarding XBP1 mRNA splicing, both ways of presentation are found in the literature. Quite a number of papers (for instance, PMID 25018104, Cell 2014; PMID 23086298, NCB 2012) used the XBP1s/(XBP1s+XBP1u) ratio. We prefer this presentation because it shows spliced XBP1 (XBP1s) relative to the total XBP1 mRNA (XBP1s+XBP1u).
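      The two measures carry the same information and can be converted into each other. As a minimal sketch (function names and the band intensities below are hypothetical, chosen only for illustration), writing f = XBP1s/(XBP1s+XBP1u) and r = XBP1s/XBP1u gives r = f/(1 - f):

```python
def spliced_fraction(xbp1s, xbp1u):
    """XBP1s / (XBP1s + XBP1u): spliced signal as a share of total XBP1 mRNA (0 to 1)."""
    total = xbp1s + xbp1u
    if total <= 0:
        raise ValueError("total XBP1 signal must be positive")
    return xbp1s / total


def spliced_to_unspliced_ratio(xbp1s, xbp1u):
    """XBP1s / XBP1u: the alternative presentation, unbounded above."""
    if xbp1u <= 0:
        raise ValueError("unspliced XBP1 signal must be positive")
    return xbp1s / xbp1u


def fraction_to_ratio(fraction):
    """Convert f = s/(s+u) into r = s/u using r = f / (1 - f)."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must lie in [0, 1)")
    return fraction / (1.0 - fraction)


# Hypothetical band intensities in arbitrary units.
f = spliced_fraction(30.0, 70.0)
r = spliced_to_unspliced_ratio(30.0, 70.0)
```

      One practical difference is that the fraction stays bounded between 0 and 1 even when the unspliced band is very faint, whereas the s/u ratio grows without bound in that case.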

      (6) Figure 1F: please indicate the type of AKT phosphorylation assessed.

      We thank the reviewer for the comment. We have added Ser473 as the phosphorylation site in both the figure legend and the figure.

      (7) Figures 2E-H: please clearly indicate the specific fat depots analyzed in each figure.

      We thank the reviewer for the comment. We have added the information to the figure legends and figures.

      (8) Figures 1I and 3A, and Supplementary Figures 6D-E: please include a quantification analysis of the images presented.

      We thank the reviewer for the suggestion. We have added quantifications of the images.

      (9) In Figure 3D the image corresponding to the merge for the STF condition is a duplication of the control, please correct this.

      We thank the reviewer for pointing this out. We have replaced it with the correct image.

      (10) Figures 4B-F: please provide individual data points in the graphs to show variability and sample distribution.

      We thank the reviewer for the suggestion. We have re-plotted the graphs in Fig. 4B-F with the individual data points.

      (11) Figure 4I: it is rather unusual to have such a strong signal of UCP1 in ND conditions, please explain.

      We thank the reviewer for the comment. We wish to point out that the images were taken from BAT slides. UCP1 is expected to show strong staining in BAT under ND conditions, and this staining is, as expected, weakened under HFD conditions. STF treatment was able to correct the HFD-induced weakening of UCP1 staining in BAT.

      (12) Supplementary Figures 2C-D: please provide representative images for better clarity and interpretation.

      We thank the reviewer for the comment. The representative images for Supplementary Figures 2C-D are shown in Figures 2C and F. Supplementary Figures 2C-D simply present the quantification of adipocyte areas for Figures 2C and F.

      (13) Supplementary Table 3 is repeated, please remove.

      We thank the reviewer for the comment. We have deleted this repetition.

      Reviewer #2 (Recommendations for the authors):

      The manuscript can be further strengthened with more clarification on the following points.

      (1) The use of IRE1a pharmacological inhibitor STF-083010 (STF) needs to be validated. How was the dose determined? Were there any dose-dependent studies? Under the current dosing regimen, what are the specificity, selectivity, and toxicity of STF? Also, were the serine/threonine kinase and RNase activities measured in the adipocytes and ATMs of the animals dosed with the compound? What's the PK data?

      We thank the reviewer for the comments. In the animal study, we administered STF at 10 mg/kg by intraperitoneal injection. This dose was adopted from several recent studies (Madhavan et al., 2022, PMID 35105890; Herlea-Pana et al., 2021, PMID 34675883; Papandreou et al., 2011, PMID 21081713; Tufanli et al., 2017, PMID 28137856), in which STF treatment showed beneficial effects in the respective disease models. STF didn’t compromise cell viability or induce any other toxicity at the dose or concentration used in these studies (Papandreou I, et al., 2011; Upton JP, et al., 2012; Lerner AG, et al., 2012; Kemp KL, et al., 2013; Cross BC, et al., 2012). In our study, we didn’t observe any apparent toxicity in mice at this dose. Importantly, we did observe that STF inhibited IRE1 RNase activity in adipose tissues (Fig. 1G, Fig. S1D) and ATMs (Fig. 6Q; Fig. S8C, G, I) of the animals at this dose. As IRE1 inhibitors including STF have been extensively examined and shown to have no effect on the kinase function of IRE1 (Cross et al., 2012, PMID: 22315414; Tufanli et al., 2017, PMID 28137856), we didn’t perform an assay of IRE1 kinase activity. Additionally, as the compound has been administered in several animal models with significant beneficial effects, one would assume that adequate pharmacokinetic parameters are achieved with the current dose. Systematic PK studies will be important and necessary in the future if clinical trials are to be considered.

      (2) The statistical method for individual panels in each figure needs to be specified.

      We thank the reviewer for the suggestion. We have specified the statistical method in the figure legends.

      (3) In Figure 1E, there's no difference in fasting insulin levels, though a difference was detected after the glucose load. This suggests an effect on insulin secretion but not insulin sensitivity.

      We thank the reviewer for the comments. Fasting insulin levels did differ between the Veh and STF groups, although the difference did not reach statistical significance. Under glucose stimulation, insulin levels showed the same trend, that is, they were lower in the STF group than in the Veh group. Even if insulin levels had shown no difference between the two groups, with or without glucose stimulation, the fact that blood glucose levels were lower in the STF group than in the Veh group at all time points (Fig. 1C) indicates that insulin sensitivity is improved. In our study, insulin levels were lower in the STF group, yet blood glucose levels were still lowered by STF, further strengthening the notion that STF treatment improves insulin sensitivity. This is further corroborated by the ITT results (Fig. 1D).

      (4) Figure 2 and S2A did not show a decrease in BW but rather BW gain. The statement (line 308) needs to be edited. As a result of this, the relative fat mass measurement (% of BW) needs to be presented in addition to Figure 2B.

      We thank the reviewer for the comments and suggestions. As shown in Figs. 2A and S2A, at the end of 4 weeks of treatment we observed a slight decrease in body weight (~2 g reduction) in STF-treated mice, whereas the Veh group gained ~3.5 g. As shown in Fig. 2B, this difference in body weight between the Veh and STF groups was primarily due to a reduction in fat tissue. In the revision, we also added the percentages of fat and lean mass relative to total body weight in Supplemental Fig. 2B, which show a similar trend.

      (5) The measurement of blood lipid levels in Figure 3F-H is informative. More importantly, hepatic lipid content needs to be measured.

      We thank the reviewer for the comments, with which we agree. As this study is focused on insulin resistance and adipose tissue remodeling, we didn’t go deep into the comorbidities beyond the reported observations. It will be interesting to explore the effects of IRE1 inhibition on the comorbidities of obesity and insulin resistance, including measurement of hepatic lipid content, in future studies.

      Minor corrections:

      (1) Line 261: "(spliced".

      Done. We have corrected it.

      (2) Line 334: spell out "PEPCK".

      We have added the full name “Phosphoenolpyruvate carboxykinase”. Thanks!

      (3) Line 478: please rephrase.

      We thank the reviewer for the comment. We have rephrased the sentence as follows: “These results reveal that STF treatment suppresses the adipose tissue inflammation and the accumulation of pro-inflammatory ATM without augmenting (suppressing instead) M2-like ATMs.”

      (4) Figure 4L: "pGC1-a".

      We thank the reviewer for pointing this out. We have corrected the name.

      (5) Figure 4O: missing Y-axis label.

      We have added the label. Thanks!

      Reviewer #3 (Recommendations for the authors):

      The observations presented by Wu D. et al. in the manuscript are potentially interesting and relevant. The current study seeks to build upon previous findings, specifically from the work titled, "Silencing IRE1α using myeloid-specific cre suppresses alternative activation of macrophages and impairs energy expenditure in obesity." By using a pharmacological inhibitor to modulate IRE1α activity in adipose tissue macrophages (ATMs), the authors aim to develop therapeutics that could significantly impact the treatment of obesity and metabolic disease.

      The authors have performed some satisfactory experiments related to liver steatosis. However, the manuscript would benefit from a more comprehensive exploration of the mechanisms by which ATMs influence adipocyte metabolism, particularly in epididymal white adipose tissue (eWAT). In particular, the study should investigate how adiposity and lipid droplet size change in response to alterations in lipolysis and adipogenesis, as this could provide insights into how these processes contribute to the amelioration of the obesity phenotype.

      Several issues should be addressed to strengthen the manuscript and make the study more convincing. Below are specific comments and recommendations:

      Major:

      (1) The indirect calorimetric data should be normalized for dependent variables such as body weight, lean mass, and fat mass+ lean mass to accurately interpret the results. The results for 24-hour energy expenditure should be included in Figure 4B-F to provide a more comprehensive analysis. It is recommended to plot bar graphs with all individual data points for the energy expenditure (EE) results shown in Figure 4B-F, to offer a clearer and more detailed presentation of the data (Figure 4B-F).

      We thank the reviewer for the comments. Data analysis for indirect calorimetry studies has evolved over the years. One common practice has been to normalize the data by body weight. However, this approach was deemed improper some years ago (Tschop et al., Nature Methods 2012, PMID: 22205519). The Tschop et al. paper also pointed out the shortcomings associated with normalization by lean mass. Instead, it concludes that “generalized linear model is the most appropriate statistical approach to accommodate discrete (genotype) and continuous (body mass) traits, rather than using a simple division by BW or lean BW”. In our study, we used CalR, which applies an improved generalized linear model (including ANOVA and ANCOVA) (Mina et al., Cell Metabolism 2018, PMID: 30017358), for all our energy expenditure data analysis (shown in Fig. 4A-E). In the revision, we also included data analysis normalized by BW (Fig. S2F-H’), which shows an even wider difference between the Veh and STF groups than the data shown in Fig. 4A-F. As STF decreased fat mass and had little effect on lean mass, the difference would be more pronounced for normalization by fat mass or by fat mass + lean mass than in the data shown in Fig. 4A-E, and would be similar to the data shown in Fig. 4A-E for normalization by lean mass. In addition, we replotted the graphs in Fig. 4B, D, F-H with the individual data points.
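      To make the distinction between ratio normalization and covariate adjustment concrete, the following is a minimal ANCOVA-style sketch with a common slope. It is not the CalR implementation; the function name and all numbers are hypothetical and serve only as illustration. Energy expenditure (EE) is modeled as EE = a + b * mass + c * group, where b is the pooled within-group slope and c is the mass-adjusted group effect.

```python
def ancova_group_effect(mass_a, ee_a, mass_b, ee_b):
    """Estimate the mass-adjusted difference in EE between group B and group A.

    Common-slope ANCOVA: the slope is pooled within groups, and the group
    effect is the difference in mean EE after removing what the difference
    in mean body mass would predict.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    def within_group_sums(mass, ee):
        m, e = mean(mass), mean(ee)
        sxx = sum((x - m) ** 2 for x in mass)
        sxy = sum((x - m) * (y - e) for x, y in zip(mass, ee))
        return m, e, sxx, sxy

    m_a, e_a, sxx_a, sxy_a = within_group_sums(mass_a, ee_a)
    m_b, e_b, sxx_b, sxy_b = within_group_sums(mass_b, ee_b)

    slope = (sxy_a + sxy_b) / (sxx_a + sxx_b)      # pooled within-group slope
    effect = (e_b - e_a) - slope * (m_b - m_a)     # adjusted group difference
    return slope, effect


# Hypothetical data: group B is lighter and has a true adjusted EE gain of 1.5.
mass_veh = [40.0, 42.0, 44.0, 46.0]
mass_trt = [34.0, 36.0, 38.0, 40.0]
ee_veh = [5.0 + 0.2 * m for m in mass_veh]
ee_trt = [5.0 + 0.2 * m + 1.5 for m in mass_trt]

slope, effect = ancova_group_effect(mass_veh, ee_veh, mass_trt, ee_trt)
ratio_difference = (sum(e / m for e, m in zip(ee_trt, mass_trt)) / 4
                    - sum(e / m for e, m in zip(ee_veh, mass_veh)) / 4)
```

      In this toy data set the covariate-adjusted estimate recovers the simulated group effect, whereas the difference in EE divided by body mass depends on the intercept and on how far apart the two groups are in mass, which is the concern raised against simple division.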

      (2) At the thermoneutral point (30 °C), the study could benefit from testing the indirect calorimetric models of human energy physiology. Future studies could also explore this to evaluate the implications for drug development.

      We agree with the reviewer on these comments. In future studies, it will be very informative to investigate the effects of STF under thermoneutral conditions, which could provide more consistent data on how drugs affect metabolic processes in humans and thereby improve translational research.

      (3) The current study missed the opportunity to investigate the effects of STF on non-adipose tissue (non-AT) resident macrophage populations, such as those in bone marrow or lymph-node macrophages. Understanding how STF modulates macrophage metabolism in these contexts would be valuable.

      We thank the reviewer for the comments, with which we agree. As this study is focused on insulin resistance and adipose tissue remodeling, we mostly restricted our analysis to adipose tissue macrophage populations. In the future, it would be interesting to investigate the effect of STF on macrophages in non-adipose tissues. This would provide a more comprehensive understanding of STF's effects on immune cell metabolism and could inform its application in various therapeutic areas.

      (4) The study should explore how STF influences the expression of CD9, Trem2, (positive subpopulations), and the secretion of pro-inflammatory cytokines by macrophages, particularly in response to LPS and IFNγ activation in stromal vascular fraction (SVF) cells and bone marrow-derived macrophages (BM-Macrophages).

      We thank the reviewer for the comments. In obesity, ATMs do not undergo the classical M1/M2 polarization; instead, both M1-like/pro-inflammatory macrophages and M2 macrophages increase drastically. It will be interesting to investigate in the future the effects of STF on the newly identified CD9- and Trem2-positive macrophage subpopulations in SVF and bone marrow macrophages in response to LPS and IFNγ stimulation, although such studies might not faithfully reflect the changes in adipose tissue in obesity, as these stressors typically induce classical M1/M2 polarization.

      (5) Additional macrophage gating is necessary to better understand adipose tissue macrophage (ATM) inflammation. Specifically, CD11c−MHC2 low macrophages represent a newly identified inflammatory and dynamic subset in murine adipose tissue. These ATMs accumulate rapidly after ten days of a high-fat diet (HFD) and should increase further with prolonged HFD. For this study, CD11c−MHC2 low ATMs could be subdivided for flow cytometry analysis based on their MHC2 expression, distinguishing them from CD11c−MHC2 high ATMs. All macrophage subtypes categorized here can be studied for metabolic health using Seahorse analysis as well.

      We thank the reviewer for the comments. It will be interesting to investigate the effects of STF on the newly identified CD11c−MHC2 low macrophage subpopulation in the future. Future studies can certainly include metabolic analysis with Seahorse, which can relate energy metabolism at the cellular level to organismal thermogenesis.

      (6) All flow cytometry histograms - are they showing mean fluorescence intensity or cell# per population? Please specify. All flow cytometry dot plots - It would be helpful for readers to see populations plotted as bar graphs next to respective flow plots, as opposed to being shown as supplemental tables. Additionally, labeling dot plots with the parent population from which cells were gated on would also help readers understand faster what we're looking at.

      We thank the reviewer for the comments. The flow cytometry histograms are displayed as “normalized to mode”. This normalization is often used to compare the distribution of fluorescence intensity between different samples. It focuses on the shape of the distribution (with a maximum of 100%) rather than on absolute cell counts, which removes variation caused by different cell numbers or sample sizes and makes it easier to compare populations based on fluorescence intensity. When normalizing to the mode, the highest peak in the histogram is scaled to 100%, and all other values are scaled relative to that peak. This allows for easy comparison of multiple histograms, even if the total number of cells (or events) differs between samples.
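      The scaling described above can be sketched in a few lines. This is an illustration only, assuming the histogram has already been binned; the function name and the event counts are hypothetical, and the snippet does not reproduce any particular flow cytometry software.

```python
def normalize_to_mode(bin_counts):
    """Scale histogram bin counts so that the tallest bin (the mode) equals 100%."""
    peak = max(bin_counts)
    if peak == 0:
        return [0.0 for _ in bin_counts]  # empty histogram: nothing to scale
    return [100.0 * count / peak for count in bin_counts]


# Two hypothetical samples with the same distribution shape
# but a tenfold difference in the number of acquired events.
small_sample = normalize_to_mode([10, 40, 20])
large_sample = normalize_to_mode([100, 400, 200])
```

      Both samples yield the same normalized histogram, which is why the y-axis in such plots reflects the shape of the distribution and not the number of cells.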

      (7) The results appear to confuse the actual sample size and p-value. Please carefully review the statistical analyses to ensure that biological replicates are accurately represented. Additionally, include p-values alongside fold change data in the text for clarity.

      We thank the reviewer for the comments. We have rechecked the statistical analyses and confirmed that the biological replicates are now properly represented. The exact number of biological replicates for each experiment is now clearly specified in both the Methods section and the figure legends.

      (8) To further validate the findings, consider using Seahorse analysis at the cellular level in future experiments. This could confirm indirect calorimetric data and thermogenesis responses to cold stimulation.

      We thank the reviewer for the comments. Yes, Seahorse analysis at the cellular level will be conducted in future experiments.

      (9) Please ensure the use of person-first language, avoiding labels or adjectives that define individuals based on a condition or characteristic.

      We appreciate the reviewer for the comments. We have changed the descriptions by using person-first language.

      (10) The manuscript does not demonstrate how STF inhibition of IRE1α in ATM, specifically through CD9 and Trem2, controls diet-induced obesity. This aspect should be further elucidated.

      We appreciate the reviewer for the comment. In this study, we observed that STF inhibits IRE1α RNase activity in SVF and in sorted ATMs as well as in adipose tissue. The improvement in diet-induced obesity can be attributable to IRE1α inhibition in both adipocytes and macrophages as shown previously by myeloid and adipocyte-specific knockouts of IRE1α. To conclude whether the IRE1α in CD9- and/or Trem2-positive ATMs controls diet-induced obesity, genetic means would be needed to generate CD9- and/or Trem2-positive ATMs-specific deletion of IRE1α, which will be technically challenging at this moment as there is no CD9 or Trem2-specific Cre lines available.

      Minor:

      (1) Line 43-44: Update terminology to "MASLD" instead of "NAFLD."

      We thank the reviewer for pointing these out. We have changed the terminology in the revision.

      (2) Line 58-59: Add a reference for the mentioned text.

      We thank the reviewer for the comment. We have added a reference in the revision.

      (3) Was the antibody used to detect CD9 and Trem2 validated for FACS and other analyses?

      We thank the reviewer for the comment. In our studies, we determined CD9 and Trem2 expression by flow cytometry and immunostaining. For flow cytometry, we used PE/Dazzle™ 594 anti-mouse CD9 (BioLegend Cat# 124821, RRID:AB_2800601) and APC-conjugated Trem2 (R&D Systems Cat# FAB17291N, RRID:AB_3646995), both of which were validated for FACS. For immunostaining, we used CD9 (Abcam Cat# ab223052, RRID:AB_2922392) and Trem2 (R&D Systems Cat# MAB17291, RRID:AB_2208679).

      (4) Studies were limited to male mice; this should be noted in the title and discussed as a limitation.

      We thank the reviewer for the comment. We have modified the wording in the revision.

      (5) Ensure all reagents are fully described with preparation details and identifiable numbers for reproducibility and/or submit the FACS protocol to any protocol archives.

      We thank the reviewer for the suggestions. Yes, we have modified the wording in the revision.

      (6) Provide the correct version numbers for all software used (FlowJo, Prism, etc.).

      We thank the reviewer for the suggestions. We have provided the correct version numbers for FlowJo and Prism.

      (7) Specify section size (µm) and blocking agent used for eWAT immunofluorescence (Line 207).

      We thank the reviewer for the suggestions. We have added this information.

      (8) Add gene accession numbers to Supplementary Table 3.

      We thank the reviewer for the suggestions. We have added this information.

      (9) Figure 2: Clarify HFD and treatment timelines with a schematic diagram.

      We thank the reviewer for the suggestions. We have added a schematic diagram in Supplemental Figure 1C.

      (10) For histology analysis, the minimum combined data from triplicate images is shown in Figure 2C-2H. For Figures 2E and H, provide complete methods for histology analysis.

      We thank the reviewer for the comments. For the histology analysis shown in Figures 2C–2H, we used a minimum of three mice per treatment group. For each mouse, 3–5 images were taken for analysis. All histology analyses were conducted using ImageJ for image quantification, and the data were processed and organized using Excel and GraphPad.

      (11) Figure 3D: Macrophage marker F4/80 stained differently in Figure 5B; to avoid false positive staining, show isotype control to confirm actual staining. For eWAT immunofluorescence (Figures 3D, 5B, 6E), counterstaining is needed in addition to macrophages, such as perilipin for adipocytes and phalloidin for total cells.

      We thank the reviewer for the comments. Yes, the F4/80 staining in Figure 3D differs from that in Figure 5B because the two figures show different tissues: Figure 3D shows liver samples, whereas Figure 5B shows adipose tissue. In the liver, subsets of macrophages are known as Kupffer cells. Kupffer cells have distinct morphology and behavior compared to other tissue-resident macrophages. When stained with F4/80 in the liver, the pattern may reflect the specialized role of Kupffer cells, typically showing a more diffuse or localized staining around blood vessels and sinusoids. In adipose tissue, macrophages tend to accumulate around dead or dying adipocytes, forming what is known as "crown-like structures" (CLS). The F4/80 staining in adipose tissue shows a more clustered pattern, particularly around areas of fat tissue undergoing remodeling or inflammation. In adipose tissue, clear, defined cells can still be seen even without a counterstain such as perilipin, and, importantly, adipocytes are generally much larger than macrophages. We agree that counterstaining would enhance accuracy. In future studies, we will use perilipin staining to make it easier to differentiate adipocytes from other structures and provide stronger data.

      (12) Insert scale bars in the original images for Figures 3D, 4I, 4M, 5B, 6E, S3B, S6D-E, and S7A-B. All images added a scale bar not inserted while acquiring the image or using imaging software.

      We thank the reviewer for the suggestions. The scale bars embedded in the images during acquisition are not clearly visible at the original resolution and can only be seen when the images are enlarged. In the revision, we have manually added scale bars for clarity.

      (13) Figure 5E: Please label X-axis as F4/80.

      We thank the reviewer for pointing this out. The label has been added in the revision.

      (14) Figure 5F: It is specified in the legend that cells were gated on F4/80+CD11b+CD11c+, but there is a CD11c- population shown in the histogram...How is this population appearing if all cells should be CD11c+?

      We thank the reviewer for pointing this out. We gated against CD11c in F4/80+CD11b+ population. As such, we have corrected the description in the legend.

      (15) Figure 5G: What is the F4/80+CD11b+CD11c-CD206- population gated in quadrants?

      We thank the reviewer for the comment. The F4/80+CD11b+CD11c-CD206- population was shown in Figure 5G on the lower left side, with the percentages being 15.7% for ND, 5.54% for Veh-HFD, and 26% for STF-HFD.

      (16) Figure 6J: Flow cytometry gates seem slightly misplaced and the sample appears to be overcompensated - were FMOs included in this experiment to establish proper gates? If so, please include.

      We thank the reviewer for the comment. In the study, we did include Fluorescence Minus One (FMO) control in the experiment to establish proper gating. We have included this information in the methods section.

      (17) Table 1-3: Indicate the number of replicates (n=) used in all tables.

      We thank the reviewer for the suggestion. We have provided the specific number of mice used in the study within the figure legends.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      The analysis of the dormancy rates is interesting and offers some intriguing questions related to the higher dormancy rate found for the L2 isolates and lower for the L3 ones. It will be interesting in the future to expand the data generated in this advanced in vitro platform to in vivo studies.

      Indeed, an increased dormancy propensity of L2 isolates was previously reported in broth culture and associated with specific genetic polymorphisms. The opposite phenotype observed in the L3 isolates is indeed particularly intriguing and was not described to date. Hence, we fully agree that it would be very interesting to find out whether these phenotypes are also observed in vivo.

      The authors propose that ‘strains exhibiting greater proliferative capacity are more prone to induce macrophage apoptosis, thereby contributing to the extent of the granulomatous response.’ It would be interesting to know what happens if the macrophage apoptotic response is blocked.

      This is an interesting suggestion that would deserve a dedicated comprehensive investigation covering other cell death pathways. Even though the trend is significant, the correlation coefficient is rather low in this interaction, which appears to be due to substantial inter-host variability in the apoptotic propensity of macrophages from individual donors in response to a given strain. In addition, such blocking experiments may require performing isolated macrophage infections that would fall outside of the scope of this study, or considering the extent and the contribution of the apoptosis of other cell subsets.

      In contrast to macrophage apoptosis, T cell activation correlated with less replicative bacteria. Are these two findings related, ie, are the granulomas showing more (apoptotic) macrophages the ones with a lower percentage of activated T cells? This would shed light on what distinguishes granulomas that are protective from those that support bacterial growth. 

      Indeed, a significant negative correlation between macrophage apoptosis induction and T cell activation can be observed, specifically with activated CD4 T cells expressing CD38 (rS = -0.36, p < 0.05) or CD69 (rS = -0.40, p < 0.01). We have added this additional result in the manuscript text (line 217).

      It would also be interesting to know the functional impact of blocking early CXCL9 or IL1b on the outcome of granulomatous response/bacteria growth.

      We have performed the suggested early blocking experiments and added the expected negative effect on granuloma formation upon neutralization of IL-1β (current Fig. 6E) in the revised version of the manuscript, and furthermore discussed the null effect on bacterial growth of the treatment with an anti-CXCL-9 specific antibody (current Fig. 6H).

      The authors acknowledge the absence of neutrophils in this model. However, this could be discussed in more detail, as neutrophils play an important part in TB pathogenesis as shown in different models of infection and human TB. 

      We concur and have expanded the importance of neutrophils in TB pathogenesis (including references) in the discussion section (line 260). 

      Related to neutrophils and TB pathogenesis, another important player is type I IFN. The multiplex assay used included IFN-alpha, was this molecule detected? If so, was there any difference in the levels of type I IFN detected among the different infections?

      We agree, and that is why we had originally included IFN-α in our screen. However, this cytokine remained under the limit of quantification at both studied time points, preventing us from drawing conclusions on the effect of Mtb strain diversity on the secretion of type I IFNs in in vitro granulomas.

      Reviewer #2:

      In Figure 1b/c, it is not clear what comparisons are being made to give the p-value annotations.

      In Figure 2a/b, it is not clear what comparisons are being made to give the p-value annotations.

      In Figure 3a, again it is not clear what comparisons are being made to give the p-value annotation.

      The p-values formerly present in the upper left corner of the panels resulted from either Friedman (Figures 1C, 2A and 3A) or Kruskal-Wallis (Figures 1B and 2B) tests and indicated whether there was a significant difference between the analyzed groups overall. To avoid confusion, those values have been removed to only leave the post-test comparison between specific groups.

      In the results narrative related to Figure 1 (lines 93-103), the authors refer to lineage heterogeneity without providing any objective quantification of this - I suggest they do so, by providing variance or standard deviations. 

      Thank you very much for this relevant suggestion, we have now included the coefficients of variation as a quantitative measure of the within-lineage heterogeneity in the manuscript (line 97). 
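      For readers unfamiliar with the measure, the coefficient of variation is the standard deviation expressed as a percentage of the mean. The sketch below is illustrative only: the function name and the growth-rate values are hypothetical and are not taken from the manuscript, and the use of the sample standard deviation is an assumption.

```python
import statistics


def coefficient_of_variation(values):
    """Return the coefficient of variation (%) of a list of measurements.

    Uses the sample standard deviation (n - 1 denominator); this choice is
    an assumption for illustration.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical growth rates for strains of one lineage.
lineage_growth_rates = [2.0, 4.0, 6.0]
print(coefficient_of_variation(lineage_growth_rates))
```

      Because it is scaled by the mean, this measure allows the within-lineage spread to be compared between lineages whose average growth rates differ.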

      I also suggest the authors explain what the data points actually represent in this figure - do I assume each data point = cfu from a well of 'granuloma'? Are they all from the same donor PBMC? What is the sample N for each lineage? If the data are not from the same donor PBMC, I think more informative to present the results of paired statistical analyses, stratified by donor cells. In addition, the authors should include a summary table of the demographic characteristics of the donors (at least sex, ethnicity, and age). If the data are derived from a single donor, I'd advocate providing data from at least one further donor.

      In the new supplementary figure requested by Reviewer 3 (Figure 1—figure supplement 1; actual CFU data on days 1 and 8 p.i. used to calculate the growth rate), it is now indicated that bacterial load was quantified as CFU per well.

      Regarding the number of donors used, as stated in the Material and Methods section (current line 418) and depicted by the four different shapes used when data are grouped by individual infecting strain, all figures in our manuscript have been generated using PBMCs from 4 independent donors. For greater clarity, “n = 4” has now been included in the figure legends. Regarding the statistical analyses, paired statistical analyses stratified by donor were already performed in the original version of the manuscript whenever appropriate. 

      As stated in the methods section, the buffy coats used for PBMC isolation are anonymized so demographic data are unavailable.

      The premise of the analysis in Figure 2C and the results narrative ("This finding suggests that an increased ability to enter dormancy is not necessarily associated with a more pronounced growth phenotype", line 132) is not clear to me. Why would increased dormancy relate to increased growth in the same context? I suggest this analysis be removed.

      We apologize for the confusion in our original statement. We now rephrased it as “This finding suggests that an increased tendency to remain in a metabolically active state is not necessarily associated with a more pronounced growth phenotype”.

      In Figure 3b, I think it may be more informative if the data points from the same donor were linked. Likewise in Figure 3c, I'd like to see a donor-paired statistical analysis.

      For all figures, the choice of using individual symbols to identify data points from the same donor but not connecting lines was made to provide a neater image. Nevertheless, we have now modified the figure linking the data points from the same donor. The statistical analysis performed is always donor-paired whenever appropriate. 

      The causal inference suggested in the results narrative between ‘macrophage apoptosis’ and granulomatous response (lines 173-175) is not tested directly by the experiment – I suggest the authors exclude this statement.

      Fair point, the statement has been removed.

      To what extent have the authors considered whether variation in T cell responses between lineages may be confounded by variation in Mtb reactive T cell frequencies in donor PBMC. Can this be disentangled at all? This should be acknowledged as a potential limitation of the study.

      We did characterize the presence of mycobacterial antigen-specific reactive T cells in the PBMCs from the investigated donors. To do so, we performed in vitro stimulations with purified protein derivative (PPD) or an ESAT-6/CFP-10 peptide pool and quantified the frequency of IFN-γ-positive CD4 T cells by flow cytometry. The percentage of IFN-γ-positive CD4 T cells recalled by PPD stimulation ranged from 0.02% to 0.13%, while no ESAT-6/CFP-10 reactive T cells were detected. As such, we can attest that the PBMC donors never encountered Mtb even though some levels of memory recalled by PPD may be due to cross-reactivity with BCG or pre-exposure to non-tuberculous mycobacteria. We have now added a panel in Figure 5—figure supplement 2 representing the frequency of mycobacteria-specific CD4 T cells and, as suggested, discussed the impact on the extent of the T cell responses observed in granulomas in the revised version of the manuscript. Nevertheless, the observed MTBC strain-specific trends are consistent across the donors, as depicted in Figure 5B and Figure 5—figure supplement 2A-B.

      Moreover, the experimental design does not really test cause and effect for the relationship between T cell proliferation/activation and bacterial growth. What is the impact of T-cell depletion from PBMC on bacterial growth?

      The increased TB susceptibility of HIV patients demonstrated that T cells play a critical part in the control of Mtb infection. We agree and did envisage such a depletion experiment. However, depleting T cells from PBMCs would imply removing up to 70% of the cells present in the specimen, which would lead to a situation from which results cannot be compared to the original sample and therefore would not be interpretable. 

      Reviewer #3:

      Data presentation:

      - In Figure 1 (replication rate), actual cumulative CFU means from each strain for both days 1 and 8 with statistical analysis should be presented as panels in this figure.

      Agreed. We are providing the requested representation of the data and the corresponding paired statistical analysis as supplementary material Figure 1—figure supplement 1.

      - In Figure 2 (dormancy), a panel comparing the mean number of bacteria that are single positive for either Auramine-O, Nile Red, or are double positive should be included for each strain, with statistical analysis. Representative photomicrographs of phenotypes from the staining should also be included. Electron microscopy could be conducted to compare the presence of intermediate lipid inclusions within organoid-bound mycobacteria.

      As requested, percentages of single stained as well as double positive bacilli in each sample are now represented in Figure 2—figure supplement 1. In addition, we have now also followed the request and included a photomicrograph picturing representative Mtb staining phenotypes. Lastly, it would certainly be very elegant to visualize the presence of Mtb lipid inclusions within cellular aggregates by electron microscopy. However, we do not currently have the means for such investigations and the implementation of such a protocol under BSL3 conditions appears unrealistic in the context of this study.  

      - In Figure 3 (granulomatous response), the number, circularity, and size of immune aggregates are presented as "granuloma score" in which the mean ratio of size to circularity is divided by the number of inclusions. To their credit, in Supplementary Figure 2, the authors provide the data in a straightforward manner. However, the granuloma score metric is reduced as the number of observed "granulomas" increases, which is counterintuitive. Additionally, circularity is not a definitive aspect of human granulomas (Wells et al., Am J Respir Crit Care Med, 2021, PMID: 34015247). I am skeptical that the "granuloma score" is an accurate predictor of granulomatous inflammation. Is there precedent for this metric in the literature? If so, a reference should be provided. A high magnification inset of 1 representative granuloma from each strain should be included in Figure 3A.

      As requested, insets of a representative average granuloma for each strain have been included in Figure 3A. The formulation of the “granuloma score” has no precedent and cannot be referenced. With this score, we meant to integrate within a single parameter the visual differences represented in the current Figure 3—figure supplement 2. We intentionally sought to assign the highest score to the massive aggregation that some strains may promote, unlike others that trigger several small, dispersed and diffuse aggregates.
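      The behavior discussed in this exchange can be illustrated with a minimal sketch of the score as the reviewer describes it (mean ratio of size to circularity, divided by the number of aggregates). This is a reconstruction from that description only, not the authors' analysis code; the function name, units and values are hypothetical, and the exact formulation is the one given in the manuscript.

```python
def granuloma_score(aggregates):
    """Score one well from a list of (area, circularity) tuples.

    Reconstructed from the reviewer's wording: the mean area/circularity
    ratio divided by the number of aggregates. Hypothetical helper.
    """
    if not aggregates:
        return 0.0
    ratios = [area / circularity for area, circularity in aggregates]
    mean_ratio = sum(ratios) / len(ratios)
    return mean_ratio / len(aggregates)


# Hypothetical wells: one massive aggregate versus five small ones.
one_massive = [(1000.0, 0.5)]
five_small = [(100.0, 0.5)] * 5

print(granuloma_score(one_massive))
print(granuloma_score(five_small))
```

      Under these assumptions, a single massive aggregate scores far higher than several small ones, which is the intended weighting described in the response and also the reason the score falls as the number of aggregates rises.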

      - In Figure 4 (macrophage apoptosis), a panel showing the percentage of dual Annexin V and 7-AAD positive cells should be included to provide the reader with the relative scope of ongoing apoptotic vs necrotic/secondary necrotic death in the model. If the data is readily available, including a control of uninfected PBMCs would also allow the reader to evaluate donor-dependent differences of in vitro cell death at baseline.

      No significant differences were observed in the percentage of dual Annexin V- and 7-AAD-positive macrophages (necrosis/secondary necrosis) between the MTBC strains at this time-point. Nevertheless, we have disclosed this result in the revised manuscript as Figure 4—figure supplement 2.

      - In Figures 5 and 6 (lymphocyte activation and soluble mediator secretion), panels showing unscaled data should be included. Panels depicting the unscaled immunoassay protein readings (pg/mL) by strain for CXCL9, granzyme B, and TNF with statistical analysis should be included in Figure 6.

      As requested, unscaled lymphocyte activation and soluble mediator data have been included as Figure 5—figure supplement 2 and Figure 6—figure supplement 1, respectively (replacing former supplementary figures 5 and 7). In addition, the updated Figure 6G panel now depicts correlation analysis with the unscaled cytokine concentrations.

      The DosR-regulon:

      The authors hypothesize that differences in the prevalence of the dormancy metrics (acid-fastness or lipid inclusion prevalence) are due to strain-specific increases in expression of the DosR regulon within the model's hypoxic conditions (lines 107-114, 126-127). The claim that their model is equipped to evaluate dosR-dependent mycobacterial phenotypes was also previously proposed (Arbués et al., 2021) and should be tested. A comparison of the dosR-dependent gene expression of each strain in PBMC aggregates and broth culture by qRT-PCR would test this idea at a very basic level.

      We agree. Actually, a similar request was made during the revision of our first in vitro granuloma study, for which such qPCR data were generated and presented in Fig. 1D (PMID: 32069329). In addition, the work of Kapoor et al., who originally developed the in vitro granuloma model, also demonstrated the induction of most of the DosR-regulated genes by qPCR (PMID: 23308269). We trust that the reviewer will agree that this does not need to be repeated.

      The modern Beijing lineage strain L2C:

      The authors claim (Line 101-102) that the results of Figure 1 "confirm the higher virulence propensities of strains from modern lineages". From the data presented, it appears that strain L2C (Modern-Beijing) dominates the modern vs ancestral and inter/intra-lineage phenotypes of replication, dormancy, and apoptosis. Are significant differences between modern and ancestral lineages or between strains simply a facet of the distinct profile of L2C? Do the statistical differences disappear when the L2C group is excluded?

      Indeed, among the modern lineages’ isolates, L2C exhibits a hypervirulent profile in terms of bacterial replication. However, the difference between modern and ancestral strains remains statistically significant when L2C is excluded from the analysis (p = 0.002). That is also the case when we analyze the proportion of dormant bacteria. Exclusion of L2C strain results in a Kruskal-Wallis overall p = 0.005, and p = 0.0002 when we compare L2 vs. L3. Lastly, regarding the percentage of apoptotic macrophages, if we use L2B (instead of L2C) to compare, the difference is still significant vs. L1A (p = 0.008) although there is no longer a trend for L2A (p = 0.1).

      "Dormancy":

      Dormancy is definitively a non-replicative state, where bacterial growth is absent. The authors' findings and claims appear to be incompatible with that definition, which they acknowledge (Lines 130-135). The lack of correlation between growth and dormancy in their model is supported with reference to Figure 2C, a Spearman's analysis of dormancy ratio with growth rate (inclusive of all strains under consideration). The figure supports a model where "dormancy" and "growth rate" are disjunct but also appears to show high "dormancy" accompanying increasing "growth" in the L2C group. How are strains able to grow if they are in a non-replicative state? Are the "growth rate" assays actually measures of survival? Are there different rates of infectivity? Are the bacteria growing cellularly in the serum-rich ECM, etc. etc? We need to see the hard CFU and Nile Red, and Auramine-O data to contextualize these findings. Alternatively, could the accumulation of inclusions in the model not be a reliable dormancy metric (Fines et al., BioRxiv [Preprint], 2023, PMID: 37609245)?

      We fully agree. The Nile red profiles are always relative and only depict the proportion of the population that has entered a dormant state. Nevertheless, dormancy can be dynamic and bacteria may swiftly resuscitate in that model. Furthermore, and as depicted in Figure 2—figure supplement 1, despite showing an increased tendency to enter a dormant-like state, a considerable population of lineage 2 bacilli still remains metabolically active and in a replicative state. The referred preprint is very interesting and we will follow it up closely.

      Specificity of responses to PBMC aggregation:

      The authors claim that their results "reveal a broad spectrum of granulomatous responses" (Line 73) but do not show any aggregation specificity of PBMC responses beyond the model's intrinsic metrics of area and circularity. To establish that their phenotypes such as lymphocyte activation, cytokine release, cell death, or mycobacterial acid-fastness/lipid inclusion prevalence, are aspects of the granulomatous response the authors could infect PBMCs from the same donors with the same strains and perform the same assays using established Mtb-PBMC models in which the cells do not aggregate. This would answer many important questions, for example, does the rate of macrophage infection account for variability in apoptosis percentage? Phagocytosis assay and quantification of stained intracellular mycobacteria within recently infected PBMCs could be conducted to determine if phenotypes are an aspect of granulomatous aggregation or due to strain-specific differences in cell-intrinsic macrophage immunity. It would also be very informative to know what percentage of PBMCs and mycobacteria are granuloma-bound in the ECM.

      We are not aware of Mtb-PBMC models in which the cells do not aggregate. We previously compared PBMC infection models in the presence or absence of the collagen matrix and cells also spontaneously coalesced around infection foci (PMID: 34603299). Regarding the last point, the melting step of the collagen matrix requires enzymatic digestion and pipetting that dislocate the aggregates. Accordingly, we cannot distinguish the bacteria that would remain within the matrix from those replicating within cellular aggregates. However, we did resolve this question by demonstrating that the bacteria were not able to grow in the absence of cells in this culture condition (Supplementary material, PMID: 34603299).

      Minor recommendations

      - The term TNF-a should be replaced with TNF throughout the manuscript.

      We acknowledge that the term TNF-a can be interchangeable with TNF. However, we chose to use the TNFα terminology to differentiate it from lymphotoxin α, which is also referred to as TNF-β.

      - The authors cite studies conducted in murine and NHP models to support the claim that "understanding of immune protective traits in TB remains insufficient and yet dominated by data from mouse and non-human primate studies" (Lines 63-64) but ignore an abundance of data from other in vivo and in vitro models that have provided numerous valuable insights in the field of TB immunology. This line should be revised or omitted.

      For us, the term “dominate” implies that these models are widely used, not that they are the only ones. Other models indeed provided additional relevant data. We are citing the lung-on-chip model of McKinney’s lab and the in vitro granuloma model of Elkington’s lab (line 66). We would be very happy to include more references upon further specifications even though we cannot build an extensive review here.

      - The authors claim that their model "encompasses, with the exception of neutrophils, all immune cell types involved in TB" (Lines 67-68). To support this claim, they should provide additional references or data demonstrating that the PBMC aggregates include eosinophils, mast cells, dendritic cells, yolk-sac-derived alveolar macrophages, and Langhans giant cells.

      With the aim of providing more accurate and detailed information regarding the cell types present in the model, the sentence has been reformulated as: “The model encompasses all PBMC-derived cell types involved in TB immune responses, but lacks granulocytes (i.e. neutrophils, eosinophils, basophils and mast cells)” (line 260). Of note, the presence of multinucleated giant cells was reported in Kapoor’s paper describing the in vitro granuloma model for the first time (PMID: 23308269).

      -  As an additional note, the title can be improved and made more broadly accessible by revising the use of the acronyms CXCL9, granzyme B, and TNF-α.

      To render the title more broadly accessible we propose to replace the listed acronyms by “soluble immune mediators”, but we remain open to more appropriate and specific suggestions.

      Answers to the reviewers’ public comments

      Reviewer #1:

      First of all, we would like to thank the reviewers for their feedback and suggestions to improve our manuscript. To strengthen the findings of our study, we have performed and added results from IL-1β and CXCL9 blocking experiments evaluating the impact on the granulomatous response and bacterial load, respectively. In the revised version of the manuscript, while we discuss the null effect on bacterial growth of the treatment with an anti-CXCL-9 antibody and the potential reason behind it, we are now reporting a negative effect on the magnitude of granuloma formation upon neutralization of IL-1β that the correlation analysis had initially suggested.

      Reviewer #2:

      The revised version of our manuscript now incorporates all the points detailed in the private answers to the reviewer, including clarifications on the statistical tests performed, additional supplementary materials to transparently disclose the raw data behind the normalization approach, as well as flow cytometry data on the immune memory status of the blood donors. In addition, and as stated in the answer to reviewer #1, to test causal relationships between some host and pathogen traits, we have now performed and provided data and interpretation of IL-1β and CXCL9 blocking experiments.

      Reviewer #3:

      We are thankful and concur with these constructive comments and insights. We have now consistently revisited the statistics in the figures to improve clarity and included new supplementary figures reporting the raw data that were missing in the initial version of the manuscript. In addition, and as mentioned in the answers to reviewers #1 and #2, we have now performed and added IL-1β and CXCL9 blocking experiments to test causal relationship between specific host and pathogen traits. In particular, we are now reporting a negative effect on the magnitude of granuloma formation upon neutralization of IL-1β that the correlation analysis had initially suggested.

      More specifically, regarding the point that our method for bacterial collection calls into question whether all Mtb plated for CFU assay resided within granulomatous aggregates, we previously reported that Mtb growth strictly required the presence of human cells in our culture conditions (Supplementary material, Arbués et al, 2021, PMID: 34603299). In the presence of cells, our microscopy read-out does allow us to observe extra-cellular growth if infections are carried on beyond an 8-day limit, which we applied in the current study to exclude this particular caveat. 

      Concerning the apparently conflicting observation that those strains displaying an increased tendency to enter a dormant-like state are the ones exhibiting the highest replication rates, we would like to point out that a considerable population of bacilli still remains metabolically active and in a replicative state. For instance, and as depicted in Figure 2—figure supplement 1, despite showing an increased tendency to enter a dormant-like state, a considerable population of lineage 2 bacilli does remain metabolically active. Moreover, dormancy can be dynamic and bacteria may swiftly resuscitate.

      Regarding the mentioned limitations of our study that we have discussed in the revised version of our manuscript, we fully concur that PBMC-based in vitro granuloma models lack tissue structure as well as some important stromal and immune cellular players. Nevertheless, we and others demonstrated the particular relevance of the 3-dimensional infection approach within a matrix of collagen and fibronectin by providing mechanistic insights into Mtb resuscitation previously associated with treatment with various immunomodulatory drugs (Arbués et al., 2020, PMID: 32069329; Tezera et al., 2020, PMID: 32091388).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This manuscript describes the impact of modulating signaling by a key regulatory enzyme, Dual Leucine Zipper Kinase (DLK), on hippocampal neurons. The results are interesting and will be important for scientists interested in synapse formation, axon specification, and cell death. The methods and interpretation of the data are solid, but the study can be further strengthened with some additional studies and controls.

      We greatly appreciate the thorough review and thoughtful suggestions from the reviewers and editors on our original manuscript. We provide a point-by-point response below. We added new studies on P10 mice and controls as suggested, and revised the figures and text for clarification. The revised manuscript includes three new supplemental figures; major text revisions are copied under each response.

      Reviewer #1 (Public Review):

      Summary:

      In this work, Ritchie and colleagues explore functional consequences of neuronal over-expression or deletion of the MAP3K DLK that their labs and others have strongly implicated in both axon degeneration, neuronal cell death, and axon regeneration. Their recent work in eLife (Li, 2021) showed that inducible over-expression of DLK (or the related LZK) induces neuronal death in the cerebellum. Here, they extend this work to show that inducible over-expression in Vglut1+ neurons also kills excitatory neurons in hippocampal CA1, but not CA3. They complement this very interesting finding with translatomics to quantify genes whose mRNAs are differentially translated in the context of DLK over-expression or knockout, the latter manipulation having little to no effect on the phenotypes measured. The authors note that several genes and pathways are differentially regulated according to whether DLK is over-expressed or knocked out. They note DLK-dependent changes in genes related to synaptic function and the cytoskeleton and ultimately relate this in cultured neurons to findings that DLK over-expression negatively impacts synapse number and changes microtubules and neurites, though with a less obvious correlation.

      Strengths:

      This work represents a conceptual advance in defining DLK-dependent changes in translation. Moreover, the finding that DLK may differentially impact neuronal death will become the basis for future studies exploring whether DLK contributes to differential neuronal susceptibility to death, which is a broadly important topic.

      We thank the reviewer for the comments on the value of our work.

      Weaknesses:

      This seems like two works in parallel that the authors have not yet connected. First is that DLK affects the translation of an interesting set of genes, and second, that DLK(OE) kills some neurons, disrupts their synapses, and affects neurite growth in culture.

      Specific questions:

      (1) Is DLK effectively knocked out? The authors reference the floxxed allele in their 2016 work (PMID: 27511108), however, the methods of this paper say that the mouse will be characterized in a future publication. Has this ever been published? The major concern is that here the authors show that Cre-mediated deletion results in a smaller molecular weight protein and the maintenance of mRNA levels.

      We apologize for the out-of-date citation of the DLK(cKO)<sup>fl/fl</sup> mice. The DLK(cKO)<sup>fl/fl</sup> mice have been published in (Li et al., 2021; Saikia et al., 2022); excision of the flox-ed exon was verified using several Cre drivers (Pv-Cre, AAV-Cre, and VGlut1-Cre in this study). The flox-ed exon contains the initiation ATG and 148 amino acids. By western blot analysis using antibodies against C-terminal peptides of DLK on cerebellar extracts (in Li et al., 2021) and hippocampal extracts (this study), the full-length DLK protein was significantly reduced (Fig 1A-B); DLK is expressed in other hippocampal cells, in addition to glutamatergic neurons, which explains the remaining full-length DLK detected.

      Our Ribo-seq of VGlut1-Cre; DLK(cKO)<sup>fl/fl</sup> detected remaining Dlk mRNAs lacking the floxed exon (Fig.S1C), which have several candidate ATGs at amino acid 223 and after (Fig.S1C1). We detected a very faint band for smaller molecular weight proteins on western blots, only when the membrane was exposed 5X longer, using Pico PLUS Chemiluminescent Substrate (Thermo Scientific, 34580) and a Licor Odyssey XF Imager (revised Fig. S1B). This smaller molecular weight protein might be produced using any of the candidate ATGs, but would represent an N-terminal truncated DLK protein lacking the ATP binding site and ~1/4 of the kinase domain, i.e. not a functional kinase.

      The revised manuscript has an updated citation for DLK(cKO)<sup>fl/fl</sup>. Revised Fig.S1B includes images of a western blot under normal exposure vs longer exposure using anti-DLK antibodies. New Fig.S1C1 shows the effects of the floxed exon on DLK.

      (2) Why does DLK(OE) not kill CA3 neurons? The phenomenon is clear but there is no link to gene expression changes. In fact, the highlighted transcript in this work, Stmn4, changes in a DLK-dependent manner in CA3.

      We agree that this is a very interesting question not answered by our gene expression analysis. While we verified that Stmn4 expression levels correlate with the levels of DLK, we do not think that increased Stmn4 per se in DLK(iOE) is a major factor accounting for CA1 death vs CA3 survival. Several published studies have also reported regulation of Stmn4 mRNAs in other cell types, in the contexts of cell death (Watkins et al., 2013; Le Pichon et al., 2017) and axon regeneration and cytoskeleton disruption (Asghari Adib et al., 2024; DeVault et al., 2024; Hu et al., 2019; Shin et al., 2019). As Stmns show significant redundancy in expression and function, conventional knockdown or overexpression of an individual Stmn generally does not lead to detectable effects on cellular function. As CA3 neurons are widely known for their dense connections and show resilience to NMDA-mediated neurotoxicity (Sammons et al., 2024; Vornov et al., 1991), we speculate that the differential vulnerability of CA1 and CA3 under DLK(iOE) is a reflection of both their intrinsic properties, such as gene expression, and their circuit connections.

      In the revised manuscript, we have included the following statement on pg 18:

      ‘While our data does not pinpoint the molecular changes explaining why CA3 would show less vulnerability to increased DLK, we may speculate that DLK(iOE) induced signal transduction amplification may differ in CA1 vs CA3. CA1 genes appear to be more strongly regulated than CA3 genes, consistent with our observation that increased c-Jun expression in CA1 is greater than that in CA3. Other parallel molecular factors may also contribute to resilience of CA3 neurons to DLK(iOE), such as HSP70 chaperones, different JNK isoforms, and phosphatases, some of which showed differential expression in our RiboTag analysis of DLK(iOE) vs WT (shown in File S2. WT vs DLK(iOE) DEGs). Together with other genes that show dependency on DLK, the DLK and Jun regulatory network contributes to the regional differences in hippocampal neuronal vulnerability under pathological conditions.’

      Further we state in ‘Limitation of our study’ on pg 20:

      ‘Our analysis also does not directly address why CA3 neurons are less vulnerable to increased DLK expression. Future studies using cell-type specific RiboTag profiling and other methods at a refined time window will be required to address how DLK dependent signaling interacts with other networks underlying hippocampal regional neuron vulnerability to pathological insults.’

      We hope our data will stimulate continued interest in testable hypotheses for future studies.

      (3) Why are whole hippocampi analyzed to IP ribosome-associated mRNAs? The authors nicely show a differential effect of DLK on CA1 vs CA3, but then - at least according to their methods - lyse whole hippocampi to perform IP/sequencing. Their data are therefore a mix of cells where DLK does and does not change cell death. The key issue is whether DLK does/does not have an effect based on the expression changes it drives.

      At the time of planning the Ribo-Tag experiment several years ago, we focused on the hippocampal glutamatergic neurons. Due to technical difficulty in micro-dissecting individual hippocampal regions from this early timepoint, we opted to use whole hippocampi to isolate ribosome-associated mRNAs. We agree with the reviewer that it is important to sort out DLK-dependent general gene expression changes vs those specific to a particular cell type where DLK impacts its survival. With emerging CA1, CA3 and other cell-type specific Cre drivers and advanced RNAseq technology, we hope that our work will stimulate broad interest in these questions in future studies. 

      In the revised manuscript, we have included new analysis comparing our Vglut1-RiboTag profiling (P15) with CamK2-RiboTag (for CA1) and Grik4-RiboTag (for CA3) (P42) published in Traunmüller et al., 2023 (GSE209870). We find that >80% of the top ranked genes in their CamK2-RiboTag (for CA1) and Grik4-RiboTag (for CA3) were detected in our VGlut1-RiboTag (revised methods and Supplemental Excel File S3). CA1-enriched genes tended to be expressed higher in DLK(cKO), compared to control, whereas CA3-enriched genes showed less significant correlation to DLK expression levels. Additionally, many genes known to specify CA1 fate do not show significant downregulation in DLK(iOE). This analysis, along with other data in our manuscript, is consistent with the idea that DLK does not regulate neuronal fate.

      In the revised manuscript, we present this additional analysis in Fig. S6K-L and have expanded the text description on page 9:

      ‘Additionally, we compared our Vglut1-RiboTag datasets with CamK2-RiboTag and Grik4-RiboTag datasets from 6-week-old wild type mice reported by (Traunmüller et al., 2023; GSE209870). We defined a list of genes enriched in CamK2-expressing CA1 neurons relative to Grik4-expressing CA3 neurons (CA1 genes), and those enriched in Grik4-expressing CA3 neurons (CA3 genes) (File S3). When compared with the entire list of Vglut1-RiboTag profiling in our control and DLK(cKO), we found CA1 genes tended to be expressed more in DLK(cKO) mice, compared to control (Fig.S6K), while CA3 genes showed a slight enrichment in control though the trend was less significant, and were less clustered towards one genotype (Fig.S6L). Moreover, many CA1 genes related to cell-type specification, such as FoxP1, Satb2, Wfs1, Gpr161, Adcy8, Ndst3, Chrna5, Ldb2, Ptpru, and Ntm, did not show significant downregulation when DLK was overexpressed. These observations imply that DLK likely specifically down-regulates CA1 genes both under normal conditions and when overexpressed, with a stronger effect on CA1 genes, compared to CA3 genes. Overall, the informatic analysis suggests that decreased expression of CA1 enriched genes may contribute to CA1 neuron vulnerability to elevated DLK, although it is also possible that the observed down-regulation of these genes is a secondary effect associated with CA1 neuron degeneration’.
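      The ">80% of the top ranked genes were detected" comparison described above reduces to a set-membership calculation. The following is a minimal sketch only, not the authors' pipeline: the function name `fraction_detected`, the case-insensitive matching of gene symbols, and the placeholder gene names are all assumptions made for illustration.

```python
def fraction_detected(top_genes, detected_genes):
    """Return the fraction of `top_genes` that also appear in `detected_genes`.

    Gene symbols are compared case-insensitively (an assumption; real
    pipelines usually match on stable gene IDs instead of symbols).
    """
    top = [g.upper() for g in top_genes]
    if not top:
        raise ValueError("top_genes must not be empty")
    detected = {g.upper() for g in detected_genes}
    hits = sum(1 for g in top if g in detected)
    return hits / len(top)


# Placeholder symbols only; these are not genes or results from the study.
top_reference = ["GeneA", "GeneB", "GeneC", "GeneD", "GeneE"]
our_dataset = ["genea", "GeneB", "GENEC", "GeneE", "GeneX"]
fraction = fraction_detected(top_reference, our_dataset)
```

      In this toy input, four of the five reference genes are found in the second list, so `fraction` is 0.8.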

      (4) Is the subtle decrease in synapse number (Bassoon/Homer co-loc.) in the DLK (OE) simply a function of neurons (and their synapses, presumably) having died? At the P15 time point that the authors choose because cell death is minimal, there is still a ~25% reduction in CA1 thickness (Figure 2B), which is larger than the ~15% change in synapses (Figure 5H) they describe.

      We thank the reviewer for the question. To address this, we have analyzed synapses in the CA1 region at P10 in DLK(iOE) mice when there was no detectable loss of neurons. At P10, we did not detect significant changes in Bassoon, Homer1, or colocalized puncta in CA1 (Fig.S11A-F). In P15 DLK(iOE) mice, Homer1 puncta were slightly smaller (Fig.5L) and showed a significant decrease in CA1 SR (Fig.5I).

      In the revised manuscript we have also redone our statistical analysis of synapses, using mice rather than ROIs (revised Fig. 5), as recommended by R3. We also analyzed synapses in CA3, and found no significant differences in P10 or P15 (Fig.S12).  We would interpret the data to mean that the effects of DLK(OE) on synapses in CA1 may represent an early step in neuronal death. We hope that future studies will shed clarity on this question.

      Reviewer #2 (Public Review):

      This manuscript describes the impact of deleting or enhancing the expression of the neuronal-specific kinase DLK in glutamatergic hippocampal neurons using clever genetic strategies, which demonstrates that DLK deletion had minimal effects while overexpression resulted in neurodegeneration in vivo. To determine the molecular mechanisms underlying this effect, ribotag mice were used to determine changes in active translation which identified Jun and STMN4 as DLK-dependent genes that may contribute to this effect. Finally, experiments in cultured neurons were conducted to better understand the in vivo effects. These experiments demonstrated that DLK overexpression resulted in morphological and synaptic abnormalities.

      Strengths:

      This study provides interesting new insights into the role of DLK in the normal function of hippocampal neurons. Specifically, the study identifies:

      (1) CA1 vs CA3 hippocampal neurons have differing sensitivity to increased DLK signaling.

      (2) DLK-dependent signaling in these neurons is similar to but distinct from the downstream factors identified in other cell types, highlighted by the identification of STMN4 as a downstream signal.

      (3) DLK overexpression in hippocampal neurons results in signaling that is similar to that induced by neuronal injury.

      The study also provides confirmatory evidence that supports previously published work through orthogonal methods, which adds additional confidence to our understanding of DLK signaling in neurons. Taken together, this is a useful addition to our understanding of DLK function.

      We thank the reviewer for careful reading and positive comments.

      Weaknesses:

      There are a few weaknesses that limit the impact of this manuscript, most of which are pointed out by the authors in the discussion. Namely:

      (1) It is difficult to distinguish whether the changes in the translatome identified by the authors are DLK-dependent transcriptional changes, DLK-dependent post-transcriptional changes or secondary gene expression changes that occur as a result of the neurodegeneration that occurs in vivo. Additional expression analysis at earlier time points could be one method to address this concern.

      We appreciate the reviewer’s comment, and have performed new analysis on c-Jun and p-c-Jun levels in CA1, CA3, and DG in P10 DLK(OE) mice. Our data suggest that in CA3 elevations in p-c-Jun and c-Jun occur separately from cell death in a DLK-dependent manner, though the high elevation of both p-c-Jun and c-Jun in CA1 correlates with cell death.

      The data is presented in revised Fig.S7A,B, and described in revised text on pg 9-10:

      ‘In control mice, glutamatergic neurons in CA1 had low but detectable c-Jun immunostaining at P10 and P15, but reduced intensity at P60; those in CA3 showed an overall low level of c-Jun immunostaining at P10, P15 and P60; and those in DG showed a low level of c-Jun immunostaining at P10 and P15, and an increased intensity at P60 (Fig.S7A,C,E). In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice at P10 when no discernable neuron degeneration was seen in any regions of hippocampus, only CA3 neurons showed a significant increase of immunostaining intensity of c-Jun, compared to control (Fig.S7A). In P15 mice, we observed further increased immunostaining intensity of c-Jun in CA1, CA3, and DG, with the strongest increase (~4-fold) in CA1, compared to age-matched control mice (Fig.S7C). The overall increased c-Jun staining is consistent with RiboTag analysis.’

      Also, on pg.10:

      ‘In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice, we observed increased p-c-Jun positive nuclei in CA1 at P10, and strong increase in CA1 (~10-fold), CA3 (~6-fold), and DG (~8-fold) at P15 (Fig.S7B,D).’

      (2) Related to the above, it is difficult to conclusively determine from the current data whether the changes in synaptic proteins observed in vivo are a secondary result of neuronal degeneration or a primary impact on synapse formation. The in vitro studies suggest this has the potential to be a primary effect, though the difference in experimental paradigm makes it impossible to determine whether the same mechanisms are present in vitro and in vivo.

      We appreciate the comment, which is related to R1 point 4. We have performed further analysis and revised the text on pg.12 with the following text:

      ‘To assess effects of DLK overexpression on synapses, we immunostained hippocampal sections from both P10 and P15, with age-matched littermate controls. Quantification of Bassoon and Homer1 immunostaining revealed no significant differences in CA1 SR and CA3 SR and SL in P10 mice of Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> and control (Fig.S11A-F, S12A-J). In P15, Bassoon density and size in CA1 SR were comparable in both mice (Fig 5G, H, K), while Homer1 density and size were reduced in DLK(iOE) (Fig.5G,I, L). Overall synapse number in CA1 SR was similar in DLK(iOE) and control mice (Fig.5J). Similar analysis on CA3 SR and SL detected no significant difference from control (Fig.S12M-V).’

      We would interpret the data to mean that the effects of DLK(OE) on synapses in CA1 may represent an early step in neuronal death. We hope that future studies will shed clarity on this question.

      Additionally, to address whether the same mechanisms are present in vitro, we have performed further analysis on cultured hippocampal neurons. As described in the Methods, we made hippocampal neuron cultures from P1 pups of the following crosses:

      For control: Vglut1<sup>Cre/+</sup> X Rosa26<sup>tdT/+</sup> 

      For DLKcKO: Vglut1<sup>Cre/+</sup>;DLK(cKO)<sup>fl/fl</sup>  X Vglut1<sup>Cre/+</sup>;DLK(cKO)<sup>fl/fl</sup>;Rosa26<sup>tdT/+</sup> 

      For DLKiOE: H11-DLK<sup>iOE/iOE</sup> X Vglut1<sup>Cre/+</sup>;Rosa26<sup>tdT/+</sup> 

      Dissociated cells from a given litter were pooled into the same culture. Because there were different proportions of neurons with our genotype of interest in each culture, it is not simple to know whether DLK was causing significant cell death.

      On pg 13, we stated our observation:

      ‘We did not notice an obvious effect of DLK(iOE) or DLK(cKO) on neuron density in cultures at DIV2. To assess neuronal type distribution in our cultures, we immunostained DIV14 neurons with antibodies for Satb2, as a CA1 marker (Nielsen et al., 2010), and Prox1, as a marker of DG neurons (Iwano et al., 2012). We did not observe significant differences in the proportion of cells labeled with each marker in DLK(cKO) or DLK(iOE) cultures (Fig.S13E). These data are consistent with the idea that DLK signaling does not have a strong role in neuron-type specification both in vivo and in vitro’.

      (3) The phenotype of DLK cKO mice is very subtle (consistent with previous reports) and while the outcome of increased DLK levels is interesting, the relevance to physiological DLK signaling is less clear. What does seem possible is that increased DLK may phenocopy other neuronal injuries but there are no real comparisons to directly address this in the manuscript. It would be helpful for the authors to provide this analysis as well as a table with all of the translational changes along with fold changes.

      Thank you for the suggestion. The fold changes of genes showing significantly altered expression in DLK(cKO) and DLK(iOE) are provided in the excel files (Supplementary excel File S1 WT vs DLK(cKO) DEGs and File S2. WT vs DLK(iOE) DEGs, highlighted columns B and F).  

      On pg 6, we revised the text as follows to include a comparison of DLK levels in other physiological conditions and in our mice:

      ‘Several studies have reported that DLK protein levels increase under a variety of conditions, including optic nerve crush (Watkins et al., 2013), NGF withdrawal (~2 fold) (Huntwork-Rodriguez et al., 2013; Larhammar et al., 2017), and sciatic nerve injury (Larhammar et al., 2017). Induced human neurons show increased DLK abundance about ~4 fold in response to ApoE4 treatment (Huang et al., 2019). Increased expression of DLK can lead to its activation through dimerization and autophosphorylation (Nihalani et al., 2000)’.

      And,

      ‘Additional analysis at the mRNA level (supplemental excel, File S2. WT vs DLK(iOE) DEGs) and at the protein level (Fig.S8E) suggest that the increase in DLK abundance was around 3 times the control level. The localization patterns of DLK protein appeared to vary depending on region of hippocampus and age of animals in both control and Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice (Fig.S3C).’

      In Discussion, we state (pg. 16): ‘The levels of DLK in our DLK(iOE) mice model appear comparable to those reported under traumatic injury and chronic stress.’

      (4) For the in vivo experiments, it is unclear whether multiple sections from each animal were quantified for each condition. More information here would be helpful and it is important that any quantification takes multiple sections from each animal into account to account for natural variability.

      We apologize that this was unclear in the original manuscript.

      In the revised methods, under Confocal imaging and quantification (pg 33), we stated: “For brain tissue, three sections per mouse were imaged with a minimum of three mice per genotype for data analysis.”

      In revised figure legends, we made it clear that multiple sections from each animal have been used for quantification in all instances, i.e. “Each dot represents averaged thickness from 3 sections per mouse, N≥4 mice/genotype per timepoint.” 

      In Fig.1F-H: “Each dot represents averaged intensity from 3 sections per mouse”

      In Fig.S3B “Data points represent individual mice, averages taken across 3 sections per mouse”
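      The figure-legend statements above (three sections averaged per mouse, with the mouse as the unit of analysis) can be illustrated with a short sketch. This is an assumption-laden illustration, not the authors' analysis code: the function name `per_mouse_means`, the tuple layout, and all numeric values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(measurements):
    """Collapse section-level values to one mean per mouse.

    `measurements` is an iterable of (mouse_id, genotype, value) tuples,
    one tuple per imaged section. Returns {genotype: [mean per mouse]},
    so that downstream statistics treat each mouse as a single data point.
    """
    by_mouse = defaultdict(list)
    for mouse_id, genotype, value in measurements:
        by_mouse[(genotype, mouse_id)].append(value)

    by_genotype = defaultdict(list)
    # Sort keys so the output order is deterministic.
    for (genotype, _mouse_id), values in sorted(by_mouse.items()):
        by_genotype[genotype].append(mean(values))
    return dict(by_genotype)


# Hypothetical section-level measurements (arbitrary units).
rows = [
    ("m1", "control", 10.0), ("m1", "control", 12.0), ("m1", "control", 11.0),
    ("m2", "control", 9.0), ("m2", "control", 9.0), ("m2", "control", 12.0),
    ("m3", "iOE", 6.0), ("m3", "iOE", 8.0), ("m3", "iOE", 7.0),
]
summary = per_mouse_means(rows)
```

      Averaging within each animal first avoids treating sections or ROIs from the same mouse as independent replicates, which would inflate the apparent sample size.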

      Reviewer #3 (Public Review):

      Dr Jin and colleagues revisit DLK and its established multifactorial roles in neuronal development, axonal injury, and neurodegeneration. The ambitious aim here is to understand the DLK-dependent gene network in the brain and, to pursue this, they explore the role of DLK in hippocampal glutamatergic neurons using conditional knockout and induced overexpression mice. They produce evidence that dorsal CA1 and dentate gyrus neurons are vulnerable to elevated expression of DLK, while CA3 neurons appear unaffected. Then they identify the DLK-dependent translatome featured by conserved molecular signatures and cell-type specificity. Their evidence suggests that increased DLK signaling is associated with possible STMN4 disruptions to microtubules, among else. They also produce evidence on cultured hippocampal neurons showing that expression levels of DLK are associated with changes in neurite outgrowth, axon specification, and synapse formation. They posit that downstream translational events related to DLK signaling in hippocampal glutamatergic neurons are a generalizable paradigm for understanding neurodegenerative diseases.

      Strengths

      This is an interesting paper based on a lot of work and a high number of diverse experiments that point to the pervasive roles of DLK in the development of select glutamatergic hippocampal neurons. One should applaud the authors for their work in constructing sophisticated molecular cre-lox tools and their expert Ribotag analysis, as well as technical skill and scholarly treatment of the literature. I am somewhat more skeptical of interpretations and conclusions on spatial anatomical selectivity without stereological approaches and also going directly from (extremely complex) Ribotag profiling patterns to relevance based on immunohistochemistry and no additional interventions to manipulate (e.g. by knocking down or blocking) their top Ribotag profile hits. Also, it seems to this reviewer that major developmental claims in the paper are based on gene translational profiling dependent on DLK expression, not DLK activation, despite some evidence in the paper that there is a correlation between the two. Therefore, observed patterns and correlations may or may not be physiologically or pathologically relevant. Generalizability to neurodegenerative diseases is an overreach not justified by the scope, approach, and findings of the paper.

      We thank the reviewer for the encouraging and constructive comments on the manuscript.

      Weaknesses and Suggestions:

      The authors state that the rationale for the translatomic studies is "to gain molecular understanding of gene expression associated with DLK in glutamatergic neurons" and to characterize the "DLK-dependent molecular and cellular network". However, a problem with the experimental design is the selection of an anatomical region at a time point characterized by active neurodegeneration. Therefore, it cannot be excluded that the differentially expressed genes or pathways attributed to DLK overexpression are instead due to processes related to neurodegeneration. Indeed, the authors find enrichment of signals related to pathways involved in extracellular matrix organization, apoptosis, unfolded protein responses, the complement cascade, DNA damage responses, and depletion of signals related to mitochondrial electron transport, etc., all of which could be the consequence of neurodegeneration regardless of cause. A more appropriate design to discover DLK-dependent pathways might be to look at a region and/or a time point that is not confounded by neurodegeneration.

      We appreciate the reviewer’s comment. We included our thoughts in ‘Limitation of the study’ (pg 20):

      ‘Future studies using cell-type specific RiboTag profiling and other methods at a refined time window will be required to address how DLK dependent signaling interacts with other networks underlying hippocampal regional neuron vulnerability to pathological insults.’

      In a related vein, the authors ask "if the differentially expressed genes associated with DLK(iOE) might show correlation to neuronal vulnerability" and, to answer this question, they select the set of differentially expressed genes after DLK overexpression and assess their expression patterns in various regions under normal conditions. It looks to me that this selection is already confounded by neurodegeneration which could be the cause for their downregulation. Therefore, such gene profiles may not be directly linked to neuronal vulnerability. A similar issue also relates to the conclusion that "...the enrichment of DLK-dependent translation of genes in CA1 suggests that the decreased expression of these genes may contribute to CA1 neuron vulnerability to elevated DLK".

      We agree with the reviewer’s concern that it is difficult to separate neurodegenerative consequences from changes caused by DLK solely based on our translatomics studies on P15 DLK(iOE) mice. As noted in our responses to reviewer 1 (point 4) and reviewer 2 (point 1), we have included new analysis of P10 mice (Fig.S7A,B) when neurons did not show detectable signs of degeneration.

      Several lines of evidence support the view that some differentially expressed genes in DLK(iOE) vs control are likely specific to increased DLK signaling.

      First, the genes identified in DLK(iOE) vs control represent a small set of genes (260), which is comparable to other DLK dependent datasets (Asghari Adib et al., 2024) but shows cell-type specificity.

      Second, our analysis using rank-rank hypergeometric overlap (RRHO) detects a significant correlation between upregulated genes from DLK(iOE) vs downregulated genes in DLK(cKO), and vice versa, suggesting that expression of a similar set of genes depends on DLK (Fig.3C, S6C-E). Consistently, GO term analysis using the list of genes coordinately regulated by DLK, derived from our RRHO analysis, leads to identification of similar GO terms related to up- and downregulated genes as using DLK(iOE)-RiboTag data alone. SynGO analysis of DLK(iOE) regulated genes and DLK(cKO) regulated genes also identified similar synaptic processes among the significantly regulated genes (Fig.3F and S6J).
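      For readers unfamiliar with RRHO, the core of the method is a hypergeometric test on the overlap between the top of two ranked gene lists, repeated over a grid of rank thresholds. The sketch below computes a single cell of such a grid. It is an illustration under stated assumptions, not the authors' implementation (which presumably used an established RRHO package): the function names, thresholds, and gene labels are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def top_overlap_pvalue(ranked_a, ranked_b, top_a, top_b):
    """Overlap of the top `top_a` genes of list A with the top `top_b` of list B.

    Both lists are restricted to their shared genes first. Returns the
    overlap size and the one-sided hypergeometric p-value for observing
    at least that many shared genes by chance.
    """
    universe = set(ranked_a) & set(ranked_b)
    a = [g for g in ranked_a if g in universe][:top_a]
    b = [g for g in ranked_b if g in universe][:top_b]
    overlap = len(set(a) & set(b))
    return overlap, hypergeom_sf(overlap, len(universe), len(a), len(b))


# Hypothetical rankings: list A ordered by upregulation in one condition,
# list B ordered by downregulation in the other. Labels are placeholders.
ranked_a = [f"g{i}" for i in range(1, 11)]
ranked_b = ["g2", "g1", "g3", "g7", "g9", "g4", "g10", "g5", "g8", "g6"]
overlap, p_value = top_overlap_pvalue(ranked_a, ranked_b, top_a=3, top_b=3)
```

      A full RRHO analysis evaluates this test at many pairs of thresholds and corrects for multiple comparisons; the single comparison here only shows the underlying calculation.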

      Third, we performed additional analysis comparing our Vglut1-RiboTag dataset with CamK2-RiboTag and Grik4-RiboTag datasets from 6-week-old wild type mice reported by (Traunmüller et al., 2023; GSE209870). We observed >80% overlap among the top ranked genes (revised Methods). We described this analysis on pg 9 and Fig. S6K-L (and Supplemental Excel File S3):

      ‘Additionally, we compared our Vglut1-RiboTag datasets with CamK2-RiboTag and Grik4-RiboTag datasets from 6-week-old wild type mice reported by (Traunmüller et al., 2023; GSE209870). We defined a list of genes enriched in CamK2-expressing CA1 neurons relative to Grik4-expressing CA3 neurons (CA1 genes), and those enriched in Grik4-expressing CA3 neurons (CA3 genes) (File S3). When compared with the entire list of Vglut1-RiboTag profiling in our control and DLK(cKO), we found CA1 genes tended to be expressed more in DLK(cKO) mice, compared to control (Fig.S6K), while CA3 genes showed a slight enrichment in control though the trend was less significant, and were less clustered towards one genotype (Fig.S6L). Moreover, many CA1 genes related to cell-type specification, such as FoxP1, Satb2, Wfs1, Gpr161, Adcy8, Ndst3, Chrna5, Ldb2, Ptpru, and Ntm, did not show significant downregulation when DLK was overexpressed. These observations imply that DLK likely specifically down-regulates CA1 genes both under normal conditions and when overexpressed, with a stronger effect on CA1 genes, compared to CA3 genes. Overall, the informatic analysis suggests that decreased expression of CA1 enriched genes may contribute to CA1 neuron vulnerability to elevated DLK, although it is also possible that the observed down-regulation of these genes is a secondary effect associated with CA1 neuron degeneration.’

      To understand the role and relevance of the DLK overexpression model, there should be a discussion of to what extent it corresponds to endogenous levels of DLK expression or DLK-MAPK pathway activation under baseline or pathological conditions.

      We appreciate the suggestion, which is similar to R2 point 3. We have revised the text and discussion to include how DLK levels may be altered in other physiological conditions vs our mice.

      Pg. 6: ‘Several studies have reported that DLK protein levels increase under a variety of conditions, including optic nerve crush (Watkins et al., 2013), NGF withdrawal (~2 fold) (Huntwork-Rodriguez et al., 2013; Larhammar et al., 2017), and sciatic nerve injury (Larhammar et al., 2017). Induced human neurons show increased DLK abundance about ~4 fold in response to ApoE4 treatment (Huang et al., 2019). Increased expression of DLK can lead to its activation through dimerization and autophosphorylation (Nihalani et al., 2000)’.

      And,

      ‘Additional analysis at the mRNA level (supplemental excel, File S2. WT vs DLK(iOE) DEGs) and at the protein level (Fig.S8E) suggest that the increase in DLK abundance was around 3 times the control level. The localization patterns of DLK protein appeared to vary depending on region of hippocampus and age of animals in both control and Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice (Fig.S3C).’

      In Discussion (pg. 16): ‘The levels of DLK in our DLK(iOE) mice model appear comparable to those reported under traumatic injury and chronic stress.’

      The authors posit that "dorsal CA1 neurons are vulnerable to elevated DLK expression, while neurons in CA3 appear largely resistant to DLK overexpression". This statement assumes that DLK expression levels start at a similar baseline among regions. Do the authors have any such data? Ideally, they should show whether DLK expression and p-c-Jun (as a marker of downstream DLK signaling) are the same or different across regions in both WT and overexpression mice. For example, what are the DLK/p-c-Jun expression levels in regions other than CA1 in Supplementary Figures 2-3 and how do they compare with each other? Normalization to baseline for each region does not allow such a comparison. Also, in Supplementary Figure 6, analyses and comparisons between regions are done at a time point when degeneration has already started. Ideally, these should be done at P10.

      We thank the reviewer for raising these points. In the revised manuscript we have included protein expression analysis of DLK (Fig S3), c-Jun, and p-c-Jun at P10 (Fig. S7).

      We provided a quantification of DLK immunostaining intensity in CA1 and CA3 in Fig.S3D,E and found roughly comparable levels between regions.

      Pg. 6: ‘Additional analysis at the mRNA level (supplemental excel, File S2. WT vs DLK(iOE) DEGs) and at the protein level (Fig.S8E) suggest that the increase in DLK abundance was around 3 times the control level. The localization patterns of DLK protein appeared to vary depending on region of hippocampus and age of animals in both control and Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice (Fig.S3C).’

      We provided our quantifications without normalization to baseline in each region for c-Jun and p-c-Jun, and revised the text accordingly:

      Pg. 9-10: ‘In control mice, glutamatergic neurons in CA1 had low but detectable c-Jun immunostaining at P10 and P15, but reduced intensity at P60; those in CA3 showed an overall low level of c-Jun immunostaining at P10, P15 and P60; and those in DG showed a low level of c-Jun immunostaining at P10 and P15, and an increased intensity at P60 (Fig.S7A,C,E). In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice at P10 when no discernable neuron degeneration was seen in any regions of hippocampus, only CA3 neurons showed a significant increase of immunostaining intensity of c-Jun, compared to control (Fig.S7A). In P15 mice, we observed further increased immunostaining intensity of c-Jun in CA1, CA3, and DG, with the strongest increase (~4-fold) in CA1, compared to age-matched control mice (Fig.S7C). The overall increased c-Jun staining is consistent with RiboTag analysis’.

      Pg. 10: ‘In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice, we observed increased p-c-Jun positive nuclei in CA1 at P10, and strong increase in CA1 (~10-fold), CA3 (~6-fold), and DG (~8-fold) at P15 (Fig.S7B,D).’

      Illustration of proposed selective changes in hippocampal sector volume needs to be very carefully prepared in view of the substantial claims on selective vulnerability. In 2A under P15 and especially P60, it is difficult to see the difference - this needs lower magnification and a lot of care that anteroposterior levels are identical because hippocampal sector anatomy and volumes of sectors vary from level to level. One wonders if the cortex shrinks, too. This is important.

      Thank you for raising the point. We have provided images to view the anteroposterior level in Fig.S2A-C. We noticed that the cortex in DLK(iOE) mice becomes thinner, along with expansion of the ventricles in some animals at later timepoints (Fig.S2C).

      One cannot be sure that there is selective death of hippocampal sectors with DLK overexpression versus, say, rearrangement of hippocampal architecture. One may need stereological analysis, otherwise this substantial claim appears overinterpreted.

      We appreciate the comment.

      In the revised manuscript, we included a new supplemental figure (Fig. S2) showing lower magnification images of coronal sections, and used cautionary wording, such as ‘CA3 is less vulnerable, compared to CA1’, to minimize the impression of over-interpretation. By NeuN staining, at P10, P15, and P60, we did not observe a detectable difference in overall hippocampus architecture, apart from the noted cell death in CA1 and DG and the associated thinning of each of the layers. At 46 weeks, some animals showed differences in the overall shape of the dorsal hippocampus, though this appeared to reflect a disproportionately large CA3 region compared to other regions (Fig S2). Increased GFAP staining (Fig.S5A-C) was detected in CA1 but not in CA3, and microglia by IBA1 staining (Fig.S5E) also displayed less reactivity in CA3, compared to CA1. Thus, based on NeuN staining, GFAP staining, IBA1 staining and analysis of the differentially regulated genes, we infer that the effect of DLK(iOE) on CA1 differs from its effect on CA3.

      Is the GFAP excess reflective of neuroinflammation? What do microglial markers show? The presence of neuroinflammation does not bode well with apoptosis. Speaking of which, TUNEL in one cell in Supplementary Figure 4E is not strong evidence of a more widespread apoptotic event in CA1.

      We have included staining data for the microglia marker IBA1. Both GFAP and IBA1 showed evidence of reactivity, particularly in the CA1 region (Fig.S5A-E), supporting the differential vulnerability of different regions, though whether cell death is primarily due to apoptosis is unclear.

      We agree that our data of sparse TUNEL staining at P15 (Fig S5F,G) do not rule out that other mechanisms of cell death may also occur. We have included this in our limitations (pg.20): “While we find evidence for apoptosis, other forms of cell death may also occur.”

      In several places in the paper (as illustrated in Figure 4B, Supplementary Figure 2B, etc.): the unit of biological observation in animal models is typically not a cell, but an organism, in which averaged measures are generated. This is a significant methodological problem because it is not easy to sample neurons without involving stereological methods. With the approach taken here, there is a risk that significance may be overblown.

      We appreciate the reviewer’s point. We used the same region for quantification of RNAscope, and quantified genotype-blind when possible. We revised the graphs to show mean values for individual mice in Fig.4B, 4C, and Fig.S3B (previously Fig.S2B).

      Other Comments and Questions:

      Supplementary Figure 9: The authors state that data points are shown for individual ROIs - ideally, they should also show averages for biological replicates. Can the authors confirm that statistical analyses are based on biological replicates (mice) and not ROIs?

      We have revised the graphs to show averages from individual mice in Fig.5B-D, Fig.5E-F (previously Fig.S9G-I), Fig.5H-J, Fig.5K-L (previously Fig.S9J-L), and Fig.S10B,C,E,F (previously Fig.S9B,C,E,F). The statistical analyses are based on biological replicates (mice).
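Collapsing ROI-level measurements to one value per animal, so that statistics are computed on biological replicates, can be sketched as follows. This is an illustration only: the function name `per_mouse_means`, the mouse identifiers, and the numbers are hypothetical and are not taken from the manuscript.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(roi_values):
    """Average ROI-level measurements so that each mouse contributes one value.

    roi_values: iterable of (mouse_id, measurement) pairs (hypothetical format).
    Returns a dict mapping mouse_id to the mean of that mouse's ROIs.
    """
    grouped = defaultdict(list)
    for mouse_id, value in roi_values:
        grouped[mouse_id].append(value)
    return {mouse_id: mean(values) for mouse_id, values in grouped.items()}


# Made-up numbers: two ROIs from mouse "m1", one ROI from mouse "m2".
rois = [("m1", 1.0), ("m1", 3.0), ("m2", 4.0)]
means = per_mouse_means(rois)  # n = 2 mice, not n = 3 ROIs
```

Any downstream test would then take the per-mouse means as its observations, so the sample size equals the number of animals rather than the number of ROIs.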

      For in vitro experiments, what is the effect of DLK overexpression on neuronal viability and density? Could these variables confound effects on synaptogenesis/synapse maturation?

      As described in the Methods, we made hippocampal neuron cultures from P1 pups of the following crosses:

      For control: Vglut1<sup>Cre/+</sup> X Rosa26<sup>tdT/+</sup> 

      For DLKcKO: Vglut1<sup>Cre/+</sup>;DLK(cKO)<sup>fl/fl</sup>  X Vglut1<sup>Cre/+</sup>;DLK(cKO)<sup>fl/fl</sup>;Rosa26<sup>tdT/+</sup> 

      For DLKiOE: H11-DLK<sup>iOE/iOE</sup> X Vglut1<sup>Cre/+</sup>;Rosa26<sup>tdT/+</sup> 

      Dissociated cells from a given litter were pooled into the same culture. Because there were different proportions of neurons with our genotype of interest in each culture, it is not simple to know whether DLK was causing significant cell death.

      On pg 13, we stated our observation:

      ‘We did not notice an obvious effect of DLK(iOE) or DLK(cKO) on neuron density in cultures at DIV2. To assess neuronal type distribution in our cultures, we immunostained DIV14 neurons with antibodies for Satb2, as a CA1 marker (Nielsen et al., 2010), and Prox1, as a marker of DG neurons (Iwano et al., 2012). We did not observe significant differences in the proportion of cells labeled with each marker in DLK(cKO) or DLK(iOE) cultures (Fig.S13E). These data are consistent with the idea that DLK signaling does not have a strong role in neuron-type specification both in vivo and in vitro’.

      We cannot rule out that variable factors in our cultures may confound effects on synaptogenesis/synapse maturation, and hope that future studies will clarify this.

      Correlations between c-jun expression and phosphorylation are extremely important and need to be carefully and convincingly documented. I am a bit concerned about Supplementary Figure 6 images, especially 6B-CA1 (no difference between control and KO, too small images) and 6D (no p-c-Jun expression at all anywhere in the hippocampus at P15?).

      At P10, P15, and P60 we stained for p-c-Jun using the Rabbit monoclonal p-c-Jun (Ser73) (D47G9) antibody from Cell Signaling (cat# 3270) at a 1:200 dilution and imaged using an LSM800 confocal microscope with a 20x objective. We observed p-c-Jun to be quite low generally in control animals. We have replaced the images in Fig.S7F (previously S6D), and adjusted the brightness/contrast to enable better visualization of the low signal in Fig.S7B,D,F (previously Fig.S6B,D).

      We revised our text to present the data carefully as stated above:

      Pg. 9-10: ‘In control mice, glutamatergic neurons in CA1 had low but detectable c-Jun immunostaining at P10 and P15, but reduced intensity at P60; those in CA3 showed an overall low level of c-Jun immunostaining at P10, P15 and P60; and those in DG showed a low level of c-Jun immunostaining at P10 and P15, and an increased intensity at P60 (Fig.S7A,C,E). In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice at P10 when no discernable neuron degeneration was seen in any regions of hippocampus, only CA3 neurons showed a significant increase of immunostaining intensity of c-Jun, compared to control (Fig.S7A). In P15 mice, we observed further increased immunostaining intensity of c-Jun in CA1, CA3, and DG, with the strongest increase (~4-fold) in CA1, compared to age-matched control mice (Fig.S7C). The overall increased c-Jun staining is consistent with RiboTag analysis’.

      Pg. 10: ‘In Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> mice, we observed increased p-c-Jun positive nuclei in CA1 at P10, and strong increase in CA1 (~10-fold), CA3 (~6-fold), and DG (~8-fold) at P15 (Fig.S7B,D).’

      Recommendations for the authors:

      Several major and minor reservations were raised. The major issues are the need for more information about the over-expression of DLK and a need to extrapolate to an in vivo condition with DLK. A considerable amount of useful information is presented with some very nicely done experiments but it is not yet a coherent or integrated story. The lack of impact of DLK overexpression in some neurons is perhaps the most impactful observation of the study and would be great to have more information around the differential transcriptional/signaling response in these cell types. There is also a need for more experimental details and to address several questions about the mouse genetic and translatome analysis. They are valid concerns that require attention by the authors.

      We thank the editors and reviewers for their thoughtful evaluation and suggestions. We hope that the editors and reviewers find that the new data and text changes in our revised manuscript, along with the above point-by-point response, have addressed the concerns and strengthened our findings.

      Minor points:

      (1) The authors state that deletion of DLK has no effect on CA1 at 1yr, however, the image of CA1 in Figure S1D shows substantially fewer NeuN+ neurons. Is this a representative field of view?

      We have re-examined the images and observed no effect on hippocampal morphology at 1 yr. We now include representative images in the revised Fig S1D.

      (2) Is the DLK protein section staining in Figure 2C a real signal? The staining looks like speckles and is purely somatic. Axonal staining is widely expected based on the literature and the authors' own work. There should be a specificity control.

      To our knowledge, axonal staining of DLK reported in the literature is mostly based on cultured DRG neurons. In addition to the reported axonal localization, DLK is present in the cell soma, near the Golgi (Hirai et al., 2002), and in the post-synaptic density (Pozniak et al., 2013).

      In the revised manuscript, we addressed this point by including controls with no primary antibody, and by using an antibody against the closely related kinase, LZK. These additional data are shown in Fig.S3C,D (previously Fig.S2C), supporting that DLK protein staining represents real signal. At P10 and P15, DLK immunostaining around CA3 showed axonal staining of the mossy fibers, as well as staining in the soma and dendritic layers (Fig.S3C,D). A similar pattern was also seen in primary cultured neurons (Fig 6A).

      (3) The protein expression of DLK in the transgenic overexpressor (Figure S7C) looks, to the resolution of this blot, to be at least 50kD heavier than 'WT' DLK. Can the authors explain this discrepancy?

      The Cre-induced DLK(iOE) transgene has T2A and tdTomato in frame at the C-terminus of DLK. It is known that T2A ‘self-cleavage’ is often incomplete. DLK-T2A-tdTomato would be about 50 kD bigger than WT DLK. We now include the transgene design in revised Fig S1D, and also state in the figure legend of Fig.S8C (previously S7C) that ‘Larger molecular weight band of DLK in Vglut1<sup>Cre/+</sup>;H11-DLK<sup>iOE/+</sup> would match the predicted molecular weight of DLK-T2A-tdTomato if T2A-peptide induced ‘self-cleavage’ due to ribosomal skipping is ineffective (Fig.S1D).’

      (4) Expression changes in DLK affect various aspects of neurites in CA1 cultures (Figure 6), and changes in DLK also modestly affect STMN4 (and 2, perhaps indirectly) levels (Figure S7C), but there is no indication that DLK acts via STMN4 to cause these changes. It is not clear what to make of these data. Of note, Stmn4 levels change in response to DLK in CA3, without DLK affecting cell death in this region.

      We appreciate and agree with the comment. Other studies (Asghari Adib et al., 2024; DeVault et al., 2024; Hu et al., 2019; Larhammar et al., 2017; Le Pichon et al., 2017; Shin et al., 2019; Watkins et al., 2013) reported expression changes in Stmn4 mRNAs in other cell types and cellular contexts, which appeared to depend on DLK. Hippocampal neurons express multiple Stmns (Fig.S8A). While we present our analysis of the effects of DLK dosage on Stmn4, and also Stmn2, we do not think that DLK-induced changes of Stmn4 expression per se are a major factor underlying CA1 cell death vs CA3 survival.

      In the revised manuscript, we addressed this point in ‘Limitation of our study’ (pg 20):

      ‘Additional experiments will be needed to elucidate in vivo roles of STMN4 and its interaction with other STMNs’.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The article by Piersma et al. aims to reduce the complex process of NK cell licensing to the action of a single inhibitory receptor for MHC class I. This is achieved using a mouse strain lacking all of the Ly49 receptors expressed by NK cells and inserting the Ly49a gene into the Ncr1 locus, leading to expression on the majority of NK cells.

      Strengths:

      The mouse model used represents a precise deletion of all NK-expressed genes within the Ly49 cluster. The re-introduction of the Ly49a gene into the Ncr1 locus allows expression by most NK cells. Convincing effects of Ly49a expression on in vitro activation and in vivo killing assay are shown.

      Weaknesses:

      The choice of Ly49a provides a clear picture of H-2D<sup>d</sup> recognition by this Ly49. It would be valuable to perform additional studies investigating Ly49c and Ly49i receptors for H-2b. This is of interest because there are reports indicating that Ly49c may not be a functional receptor in B6 mice due to strong cis interactions.

      We agree with the reviewer that it will be important to extend our findings to H-2b haplotypes with individual cognate Ly49 receptors (Ly49C and Ly49I). While these experiments are the subject of our ongoing studies, they are beyond the scope of the current manuscript considering the significant time, effort and cost to generate these new Ly49C and Ly49I knockin mice.

      This work generates an excellent mouse model for the study of NK cell licensing by inhibitory Ly49s that will be useful for the community. It provides a platform whereby the functional activity of a single Ly49 can be assessed.

      Reviewer #2 (Public review):

      Piersma et al. continue to work on deciphering the role and function of Ly49 NK cell receptors. This manuscript shows that a single inhibitory Ly49 receptor is sufficient to license NK cells and eliminate MHC-I-deficient target cells in mice. In short, they refined the mouse model ∆Ly49-1 (Parikh et al., 2020) into the Ly49KO model in which all Ly49 genes are disrupted. Using this model, they confirmed that NK cells from Ly49KO mice cannot be licensed, produce lower levels of IFN-gamma, and cannot reject MHC-I-deficient cells. To study the effect of a single Ly49 receptor in the function of NK cells, the authors backcrossed Ly49KO mice to H-2D<sup>d</sup> transgenic KODO (D8-KODO) Ly49A knock-in mice in which a single inhibitory Ly49A receptor that recognizes H-2D<sup>d</sup> ligands is expressed. By doing so, they demonstrate that a single inhibitory Ly49 receptor expressed by all NK cells is sufficient for licensing and missing-self killing.

      While the results of the study are largely consistent with the conclusions, it is important to address some discrepancies. For instance, in the title of Figure 1, the authors state that NK cells in Ly49KO mice compared to WT mice have a less mature phenotype, which is not consistent with the corresponding text in the Results section (lines 170-171) that states there is no difference in maturation. These differences are not evident in Figure 1, panel D. It is crucial to acknowledge these inconsistencies to ensure a comprehensive understanding of the research findings.

      We thank the reviewer for pointing this out. We have corrected the figure legend title to: “Mice generated to lack all NK-related Ly49 molecules using CRISPR have NK cells that display alterations in select surface molecules.”

      In the legend of Figure 2, the text related to panel C indicates the use of dyes to label the splenocytes, and CFSE, CTV, and CTFR were mentioned. However, only CTV and CTFR are shown on the plots and mentioned in the corresponding text in the Results section. Similarly, in the legend of Figure 4, which is related to panel C, the authors write that splenocytes were differentially labeled with CFSE and CTV as indicated; however, in Figure 4C and the Results section text, there is no mention of CFSE.

      We thank the reviewer for pointing out these inconsistencies. We did label target cells with CFSE to distinguish them from host cells. To clarify, we have done the following:

      We have removed CFSE from figure legends of Figure 2 and 4.

      We included the following on CFSE labeling in the Materials and Methods section: “Target splenocytes were additionally labeled with CFSE to identify transferred target splenocytes from host cells.”

      The authors should clarify why they assume that KLRG1 expression is influenced by the expression of inhibitory Ly49 receptors and not by manipulations on chromosome 6, where the genes for both KLRG1 and Ly49 receptors are located.

      The effect on KLRG1 expression is phenocopied in the Ly49A KI mice (on a Ly49 KO background). The Ly49A KI allele is encoded by the Ncr1 locus, which is located on chromosome 7 and not on chromosome 6 where KLRG1 is located, thus excluding involvement of cis-regulatory elements encoded by the Ly49 locus on chromosome 6.

      We have clarified this in the discussion section (lines 350-358):

      “The Ly49 gene family as well as Klrg1 is located within the NKC on chromosome 6 (Yokoyama and Plougastel, 2003) ….  expression of only Ly49A, encoded in the Ncr1 locus located on chromosome 7, in Ly49KO mice on a H-2D<sup>d</sup> background restored KLRG1 expression”

      However, a better explanation for the possible influence of other inhibitory NK cell receptors still needs to be included. In the study by Zhang et al. (doi: 10.1038/s41467-019-13032-5), the authors showed the synergized regulation of NK cell education by the NKG2A receptor and the specific Ly49 family members. Although in this study, Piersma and colleagues show the control of MHC-I deficient cells by Ly49A+ NKG2A- NK cells in Figure 4, this receptor is not mentioned in the Results or in the Discussion section, so its role in this story needs to be clarified. Therefore, the reader would benefit from more information regarding NKG2A receptor and NKG2A+/- populations in their results.

      We agree with the reviewer that it is important to describe our results in the context of other inhibitory receptors. To clarify the role of NKG2A and potentially other inhibitory receptors we have made the following improvements to our manuscript:

      We discuss the role of NKG2A in the discussion section, which now includes (lines 259-266):

      “While our results did not interrogate licensing by inhibitory receptors outside of the Ly49 receptor family, such as has been reported for NKG2A (Anfossi et al., 2006; Zhang et al., 2019), they do demonstrate that expression of Ly49A without other Ly49 family members can mediate NK cell licensing. Moreover, we found that Ly49 receptors are required and sufficient for missing-self rejection under steady-state conditions. However, these observations do not rule out involvement of other inhibitory receptors under specific inflammatory conditions. For example, NKG2A contributes to rejection of missing-self targets in poly(I:C)-treated mice (Zhang et al., 2019).”

      We also added the following to the result section (lines 179-182):

      “NKG2A has been implicated in NK cell licensing by the non-classical MHC-I molecule Qa1 (Anfossi et al., 2006), to eliminate potential confounding effects by this interaction, effector functions of NKG2A- NK cells were evaluated as described before (Bern et al., 2017).”

      Reviewer #3 (Public review):

      Summary:

      In this study, Piersma et al. successfully generated a mouse model with all Ly49 genes knocked out, resulting in the complete absence of Ly49 receptor expression on the cell surface. The absence of Ly49 expression led to the loss of NK cell education/licensing and consequently, a failure in responsiveness against missing-self target cells. The experimental work and findings are partially overlapping with the previous work by Zhang et al. (2019), who also performed knockout of the entire Ly49 locus in mice and demonstrated that loss of NK responsiveness was due to the removal of inhibitory, and not activating Ly49 genes. The authors demonstrate the restoration of NK cell licensing by knocking in a single Ly49 gene, Ly49A, in a mouse expressing the H-2D<sup>d</sup> ligand for this receptor, which is a novel and important finding.

      Strengths:

      The authors established a novel mouse model enabling them to have a clean and thorough study on the function of Ly49 on NK cell licensing. Also, by knocking in a single Ly49, they were able to investigate the function of a given Ly49 receptor excluding the "contamination" of co-expression of any other Ly49 genes. Their idea and method were novel though the mouse model was somehow genetically similar to a previous study. The experiment design and data interpretation were logically clear and the evidence was solid.

      Weaknesses:

      The paper is very poorly written and confusing. The authors should be more accurate in the usage of terminology, provide more details on experimental procedures, and revise much of the text to improve clarity and coherence. A thorough revision aiming to clarify the paper would be helpful.

      We regret that the manuscript was confusing to the reviewer. We have made thorough revisions to the different sections, which we hope will improve the clarity of the manuscript.

      We have made changes to all sections of the manuscript, including the title. These revisions include improved clarity on description of NK cell licensing and consistent usage throughout the manuscript, per the reviewer recommendations. We hope that all our improvements help the clarity of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I was confused by lines 262-270 in the discussion. The data from Hanke et al. is presented as contradictory to the observation that Ly49s bind more efficiently to H2-Kb than -Db, but they showed that Ly49c/i did not bind Kb-deficient cells, supporting the preferred binding to Kb.

      We have clarified this issue and the paragraph now reads: “This is further supported by early studies using Ly49 transfectants binding to Con A blasts showing that Ly49C and Ly49I can bind to H-2D<sup>b</sup>-deficient but not H-2K<sup>b</sup>-deficient cells (Hanke et al., 1999), despite the caveat of testing binding to cells overexpressing Ly49s in these studies.”

      Reviewer #2 (Recommendations for the authors):

      The authors' conclusion that one type of inhibitory Ly49 receptor expressed on NK cells is sufficient for successful licensing and rejection of missing self-cells is a significant step forward. However, it would be beneficial to complement this with additional data. For instance, exploring the role of a single inhibitory Ly49 receptor responsible for licensing in a mouse model with a different haplotype (e.g. Ly49C or Ly49I on H-2b MHC I haplotype in C57BL/6J mice) could provide valuable insights and open new avenues for research in the field.

      We agree with the reviewer that it will be important to extend our findings to additional MHC-I haplotypes with single cognate Ly49 receptors. While these experiments are the subject of our ongoing studies, they are beyond the scope of the current manuscript considering the significant effort, time, and cost to generate these new Ly49C and Ly49I knockin mice.

      Reviewer #3 (Recommendations for the authors):

      Specific issues that should be addressed are as follows:

      (1) The title of the paper: "Expression of a single inhibitory Ly49 receptor is sufficient to license NK cells for effector functions" is ambiguous. When I first read the title, I thought the authors meant that only a single Ly49 molecule on the NK cell surface was necessary to induce licensing. It might be better to replace "single inhibitory receptor" with "single member of Ly49 receptor family".

      We have changed the title to: “Expression of a single inhibitory member of the Ly49 receptor family is sufficient to license NK cells for effector functions”

      (2) In the abstract, introduction, and results, the authors distinguish "licensing" and "rejection of missing-self targets" as two distinct phenomena. An example includes Abstract, lines 51-53: "Herein, we showed mice lacking expression of all Ly49s were unable to reject missing-self target cells in vivo, were defective in NK cell licensing, and displayed lower KLRG1 on the surface of NK cells". Similarly, the title of the second subsection of the Results states: "Ly49-deficient NK cells are defective in licensing and rejection of cognate MHC-I deficient target cells" (line 176). In these instances, it seems that by "licensing", they mean only response to plate-bound anti-NK1.1 stimulation and not a response to missing-self targets. Alternatively, in the first paragraph of the Discussion, it sounds as if licensing includes both anti-NK1.1 and missing-self responses (lines 258-260): "...NK cells were fully licensed in terms of their functional phenotype, including the capacity to be activated by an activation receptor in vitro and efficient rejection of MHC-I deficient target cells in vivo". Please define the terms and use the terms consistently throughout the paper.

      We were the first to describe the term licensing and have defined this as acquisition of NK cell functional competence by self-MHC molecules (Kim et al., 2005), which is characterized by increased NK cell effector functions to activating signals. Thus, licensed NK cells are prevented from attacking normal MHC-I<sup>+</sup> cells by the same self-MHC-I-specific receptor that conferred licensing, while unlicensed NK cells without appropriate Ly49 receptors are functionally incompetent.

      To clarify we made changes throughout the manuscript including the following:

      Lines 91-101:

      “In addition to effector function in missing-self, Ly49 receptors that recognize their cognate MHC-I ligands are involved in licensing or education of NK cells to acquire functional competence. NK cell licensing is characterized by potent effector functions including IFNγ production and degranulation in response to activation receptor stimulation (Elliott et al., 2010; Kim et al., 2005). Like missing-self recognition, inhibitory Ly49s require SHP-1 for NK cell licensing which interacts with the ITIM-motif encoded in the cytosolic tail of inhibitory Ly49s (Bern et al., 2017; Kim et al., 2005; Viant et al., 2014). Moreover, lower expression of SHP-1, particularly within the immunological synapse, is associated with licensed NK cells (Schmied et al., 2023; Wu et al., 2021). Thus, inhibitory Ly49s have a second function that licenses NK cells to self-MHC-I thereby generating functionally competent NK cells but it has not been possible to exclude contributions from other co-expressed Ly49s.”

      Lines 268-271 (previously 258-260):

      “Yet the NK cells were fully licensed in terms of IFNγ production and degranulation in vitro and efficiently rejected MHC-I deficient target cells in vivo. Thus, a single Ly49 receptor is capable to confer the licensed phenotype and missing-self rejection in vitro and in vivo.”

      Lines 309-312:

      “In conclusion, these data show that expression of a single inhibitory Ly49 receptor is necessary and sufficient to license NK cells and mediate missing self-rejection under steady state conditions in vivo.”

      (3) Introduction, lines 76-79. Please provide the C57BL/6 MHC-I genotype. It is difficult to follow the text here without this information. In general, please provide information to help the reader who may not be working in this precise field.

      We thank the reviewer for pointing this out. We have included this and the lines now read: “For example, in the C57BL/6 background, Ly49C and Ly49I can recognize H-2<sup>b</sup> MHC-I molecules that include H-2K<sup>b</sup> and H-2D<sup>b</sup>, while Ly49A and Ly49G cannot recognize H-2<sup>b</sup> molecules and instead they recognize H-2<sup>d</sup> alleles.”

      (4) Introduction, lines 85-97. Please use commas: "...the MHC-I specificities of other Ly49s have been primarily studied with MHC tetramers containing human b2m, which is not recognized by Ly49A, on cells overexpressing Ly49s" in order to clarify the sentence.

      Commas have been added as suggested by the reviewer.

      (5) Introduction, lines 91-101. The whole paragraph starting with the following sentence does not make sense and should be re-written. "In addition to effector function in missing-self, when inhibitory Ly49 receptors recognize their cognate MHC-I ligands in vivo, they license or educate NK cells for potent effector functions including IFNγ production and degranulation in response to activation receptor stimulation".

      We regret that this paragraph was not clear to the reviewer. We have changed this paragraph to:

      “In addition to effector function in missing-self, Ly49 receptors that recognize their cognate MHC-I ligands are involved in licensing or education of NK cells to acquire functional competence. NK cell licensing is characterized by potent effector functions including IFNγ production and degranulation in response to activation receptor stimulation (Elliott et al., 2010; Kim et al., 2005). Like missing-self recognition, inhibitory Ly49s require SHP-1 for NK cell licensing which interacts with the ITIM-motif encoded in the cytosolic tail of inhibitory Ly49s (Bern et al., 2017; Kim et al., 2005; Viant et al., 2014). Moreover, lower expression of SHP-1, particularly within the immunological synapse, is associated with licensed NK cells (Schmied et al., 2023; Wu et al., 2021). Thus, inhibitory Ly49s have a second function that licenses NK cells to self-MHC-I thereby generating functionally competent NK cells but it has not been possible to exclude contributions from other co-expressed Ly49s.”

      (6) Results, line 181. Please edit: "...MHC-I-deficient H-2K<sup>b</sup> x H-2D<sup>b</sup> deficient (KODO) mice".

      This sentence now reads “... NK cells from H-2K<sup>b</sup> and H-2D<sup>b</sup> double deficient (KODO) mice”

      (7) Results, line 192. Please re-word the following phrase: "missing-self is dominated by H-2K<sup>b</sup> in the C57BL/6 background", as it is unclear. Do you mean that H-2K<sup>b</sup> is protected from lysis as opposed to H-2D<sup>b</sup>?

      We thank the reviewer for pointing this out, line 192 now reads: “..missing-self recognition in the C57BL/6 background depends on the absence of H-2K<sup>b</sup> rather than H-2D<sup>b</sup>.”

      (8) Please briefly describe the Ncr1-Ly49A knockin procedure so that the reader understands the link between NKp46 and Ly49A expression without going to the earlier paper. Also, it needs to be mentioned that Ncr1 is the gene encoding NKp46.

      Lines 201-205 now read: “To investigate the potential of a single inhibitory Ly49 receptor on mediating NK cell licensing and missing-self rejection, the Ly49KO mice were backcrossed to H-2D<sup>d</sup> transgenic KODO (D8-KODO) Ly49A KI mice that express Klra1 cDNA encoding the inhibitory Ly49A receptor in the Ncr1 locus encoding NKp46 and its cognate ligand H-2D<sup>d</sup> but not any other classical MHC-I molecules (Parikh et al., 2020).”

      In the Materials and Methods section, the following has been added (lines 324-326):

      “In Ly49A KI mice the stop codon of Ncr1 encoding NKp46 is replaced with a P2A peptide-cleavage site upstream of the Ly49A cDNA, while maintaining the 3’ untranslated region.”

      (9) Figure 4C, legend. There is no CFSE staining in this experiment. Please correct.

      We did label target cells with CFSE to distinguish them from host cells. To clarify this, we have done the following:

      We have removed CFSE from the legends of Figures 2 and 4.

      We included the following on CFSE labeling in the Materials and Methods section (lines 377-379): “Target splenocytes were additionally labeled with CFSE to identify transferred target splenocytes from host cells.”

      (10) Discussion, lines 262-270. This paragraph sounds as if data by Hanke et al. does not agree with the data presented in the paper. On the contrary, Hanke et al. demonstrate that Ly49C and Ly49I detectably bind to H-2K<sup>b</sup>, but poorly to H-2D<sup>b</sup>, supporting observations shown in Figure 2C.

      We have clarified this issue and the paragraph now reads: “This is further supported by early studies using Ly49 transfectants binding to Con A blasts showing that Ly49C and Ly49I can bind to H-2D<sup>b</sup>-deficient but not H-2K<sup>b</sup>-deficient cells (Hanke et al., 1999), despite the caveat of testing binding to cells overexpressing Ly49s in these studies.”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Hua et al show how targeting amino acid metabolism can overcome Trastuzumab resistance in HER2+ breast cancer.

      Strengths:

      The authors used metabolomics, transcriptomics and epigenomics approaches in vitro and in preclinical models to demonstrate how trastuzumab-resistant cells utilize cysteine metabolism.

      Thank you for your valuable comments. We would like to extend our appreciation for your efforts. Your constructive suggestions will help improve our research.

      Weaknesses:

      However, there are some key aspects that need to be addressed.

      Major:

      (1) Patient Samples for Transcriptomic Analysis: It is unclear from the text whether tumor tissues or blood samples were used for the transcriptomic analysis. This distinction is crucial, as these two sample types would yield vastly different inferences. The authors should clarify the source of these samples.

      Thank you for your valuable comments. In the transcriptomic analysis, we included data from HER2-positive breast cancer patients who received trastuzumab in the I-SPY2 trial (GSE181574). Tumor tissues were used in this dataset.

      (2) The study only tested one trastuzumab-resistant and one trastuzumab-sensitive cell line. It is unclear whether these findings are applicable to other HER2-positive tumor cell lines, such as HCC1954. The authors should validate their results in additional cell lines to strengthen their conclusions.

      Thank you for your valuable comments. We agree that exploring multiple cell lines would make our research findings more comprehensive. This is a limitation of our study, and we will continue to improve our design and methods in future experiments.

      (3) Relevance to Metastatic Disease: Trastuzumab resistance often arises in patients during disease recurrence, which is frequently associated with metastasis. However, the mouse experiments described in this paper were conducted only in the primary tumors. This article would have more impact if the authors could demonstrate that the combination of Erastin or cysteine starvation with trastuzumab can also improve outcomes in metastasis models.

      Thank you for your valuable comments. We agree with your suggestions. The exploration of metastatic disease would make our research more meaningful and help better address key clinical issues. In our future studies, we will continue to investigate the association between cysteine metabolism and the invasive and metastatic capabilities of trastuzumab-resistant HER2-positive breast cancer.

      Minor:

      (1) The figures lack information about the specific statistical tests used. Including this information is essential to show the robustness of the results.

      Thank you for your valuable comments. We will include the statistical information in our figure legends.

      (2) Figure 3K Interpretation: The significance asterisks in Figure 3K do not specify the comparison being made. Are they relative to the DMSO control? This should be clarified.

      Thank you for your valuable comments. We will clarify the comparison information in our figure legends.

      Reviewer #2 (Public review):

      In this manuscript, Hua et al. proposed SLC7A11, a protein facilitating cellular cystine uptake, as a potential target for the treatment of trastuzumab-resistant HER2-positive breast cancer. If this claim holds true, the finding would be of significance and might be translated to clinical practice. Nevertheless, this reviewer finds that the conclusion was poorly supported by the data.

      Notably, most of the data (Figures 2-6) were based on two cell lines - JIMT1 as a representative of trastuzumab-resistant cell line, and SKBR3 as a representative of trastuzumab sensitive cell line. As such, these findings could be cell-line specific while irrelevant to trastuzumab sensitivity at all. Furthermore, the authors claimed ferroptosis simply based on lipid peroxidation (Figure 3). Cell viability was not determined, and the rescuing effects of ferroptosis inhibitors were missing. The xenograft experiments were also suspicious (Figure 4). The description of how cysteine starvation was performed on xenograft tumors was lacking, and the compound (i.e., erastin) used by the authors is not suitable for in vivo experiments due to low solubility and low metabolic stability. Finally, it is confusing why the authors focused on epigenetic regulations (Figures 5 & 6), without measuring major transcription factors (e.g., NRF2, ATF4) which are known to regulate SLC7A11.

      To sum up, this reviewer finds that the most valuable data in this manuscript is perhaps Figure 1, which provides unbiased information concerning the metabolic patterns in trastuzumab-sensitive and primary resistant HER2-positive breast cancer patients.

      Thank you for your valuable comments. We agree with your suggestions. Your feedback will help enhance the quality of our research.

      (1) Our research was mainly conducted in JIMT1 (trastuzumab-resistant) and SKBR3 (trastuzumab-sensitive) cells, and this is a limitation of our study. Experimental validation using different cell lines would make our research findings more persuasive. In our future research, we will continue to optimize experimental design and methods to make our findings more comprehensive.

      (2) The detection of ferroptosis in our research was mainly performed by evaluating lipid peroxidation. Experiments measuring cell viability and the rescuing effects of ferroptosis inhibitors would help provide more evidence.

      (3) In xenograft experiments, cysteine starvation was performed by feeding a cysteine-free diet. The drug dissolution and other conditions were optimized by referring to previous relevant literature. We will clarify these details in our article.

      (4) Epigenetic modifications have been recognized as crucial factors in the development of drug resistance. An increasing number of studies have emphasized the importance of epigenetic changes in regulating the abnormal expression of oncogenes and tumor suppressor genes related to drug resistance. Currently, the role of epigenetic changes in the development of trastuzumab resistance in HER2-positive breast cancer is still being explored. We tried to investigate the dysregulation of histone modifications and DNA methylation in trastuzumab-resistant HER2-positive breast cancer. Our findings indicated that targeting H3K4me3 and DNA methylation could decrease SLC7A11 expression and induce ferroptosis. This would provide more evidence in exploring trastuzumab resistance mechanisms. We will provide a more detailed discussion in the article.

      We would like to extend our appreciation for your constructive suggestions and continue to improve our research in future experiments.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors report that GPR55 activation in presynaptic terminals of Purkinje cells decreases GABA release at the PC-DCN synapse. The authors use an impressive array of techniques (including highly challenging presynaptic recordings) to show that GPR55 activation reduces the readily releasable pool of vesicles without affecting presynaptic AP waveform and presynaptic Ca2+ influx. This is an interesting study, which is seemingly well-executed and proposes a novel mechanism for the control of neurotransmitter release. However, the authors' main conclusions are heavily, if not solely, based on pharmacological agents that more often than not demonstrate affinity at multiple targets. Below are points that the authors should consider in a revised version.

      We thank the reviewer for the encouraging comments, and will fully address the reviewer’s concerns as detailed below.

      Major points:

      (1) There is no clear evidence that GPR55 is specifically expressed in presynaptic terminals at the PC-DCN synapse. The authors cited Ryberg 2007 and Wu 2013 in the introduction, mentioning that GPR55 is potentially expressed in PCs. Ryberg (2007) offers no such evidence, and the expression in PC suggested by Wu (2013) does not necessarily correlate with presynaptic expression. The authors should perform additional experiments to demonstrate the presynaptic expression of GPR55 at PC-DCN synapse.

      We agree with the reviewer’s concern that the present manuscript lacks evidence for the localization of GPR55 at PC axon terminals. Honestly, our previous attempt to immunolabel GPR55 did not work well. We now realize that different antibodies are commercially available, and we are going to test them. Hopefully, in the revised manuscript, we will present immunocytochemical images showing GPR55 at terminals of PCs.

      (2) The authors' conclusions rest heavily on pharmacological experiments, with compounds that are sometimes not selective for single targets. Genetic deletion of GPR55 would be a more appropriate control. The authors should also expand their experiments with occlusion experiments, showing if the effects of LPI are absent after AM251 or O-1602 treatment. In addition, the authors may want to consider AM281 as a CB1R antagonist without reported effects at GPR55.

      We thank the reviewer for pointing out the essential issue regarding the specificity of GPR55 activation in our study. Regarding the direct manipulation of GPR55, such as genetic deletion, we will try acute knock-down of its expression, considering the possibility of compensation, which sometimes occurs when a complete knock-out is performed. In addition, following the reviewer’s suggestion, we will examine whether the effects of LPI and AM251 occlude each other, and also perform control experiments showing the lack of CB1R involvement.

      (3) It is not clear how long the different drugs were applied, and at what time the recordings were performed during or following drug application. It appears that GPR55 agonists can have transient effects (Sylantyev, 2013; Rosenberg, 2023), possibly due to receptor internalization. The timeline of drug application should be reported, where IPSC amplitude is shown as a function of time and drug application windows are illustrated.

      As suggested, the timing and duration of drug application will be indicated together with the time course of changes of IPSC amplitudes. This change will make things much clearer. Thank you for the suggestion.

      (4) A previous investigation on the role of GPR55 in the control of neurotransmitter release is neither cited nor discussed: Sylantyev et al. (2013, PNAS, Cannabinoid- and lysophosphatidylinositol-sensitive receptor GPR55 boosts neurotransmitter release at central synapses). Similarities and differences should be discussed.

      We are really sorry for failing to cite and discuss this important study. In the revised version, we will discuss their findings in relation to our data.

      Minor point:

      (1) What is the source of LPI? What isoform was used? The multiple isoforms of LPI have different affinities for GPR55.

      We apologize for the insufficient explanation of the LPI used in our study. We used LPI derived from soy (Merck, catalog #L7635) that was estimated to contain 58% C16:0 and 42% C18:0 or C18:2 LPI. This information will be added to the Materials and Methods in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This paper investigates the mode of action of GPR55, a relatively understudied type of cannabinoid receptor, in presynaptic terminals of Purkinje cells. The authors use demanding techniques of patch clamp recording of the terminals, sometimes coupled with another recording of the postsynaptic cell. They find a lower release probability of synaptic vesicles after activation of GPR55 receptors, while presynaptic voltage-dependent calcium currents are unaffected. They propose that the size of a specific pool of synaptic vesicles supplying release sites is decreased upon activation of GPR55 receptors.

      Strengths:

      The paper uses cutting-edge techniques to shed light on a little-studied, potentially important type of cannabinoid receptor. The results are clearly presented, and the conclusions are for the most part sound.

      We are really happy to hear the encouraging comments from the reviewer.

      Weaknesses:

      The nature of the vesicular pool that is modified following activation of GPR55 is not definitively characterized.

      During revision, we will perform further analysis and additional experiments to obtain deeper insights into the vesicle pools affected by GPR55 as much as possible.

      Reviewer #3 (Public review):

      Summary:

      Inoshita and Kawaguchi investigated the effects of GPR55 activation on synaptic transmission in vitro. To address this question, they performed direct patch-clamp recordings from axon terminals of cerebellar Purkinje cells and fluorescent imaging of vesicular exocytosis utilizing synapto-pHluorin. They found that exogenous activation of GPR55 suppresses GABA release at Purkinje cell to deep cerebellar nuclei (PC-DCN) synapses by reducing the readily releasable pool (RRP) of vesicles. This mechanism may also operate at other synapses.

      Strengths:

      The main strength of this study lies in combining patch-clamp recordings from axon terminals with imaging of presynaptic vesicular exocytosis to reveal a novel mechanism by which activation of GPR55 suppresses inhibitory synaptic strength. The results strongly suggest that GPR55 activation reduces the RRP size without altering presynaptic calcium influx.

      We thank the reviewer for the positive evaluation on our conclusions.

      Weaknesses:

      The study relies on the exogenous application of GPR55 agonists. It remains unclear whether endogenous ligands released due to physiological or pathological activities would have similar effects. There is no information regarding the time course of the agonist-induced suppression. There is also little evidence that GPR55 is expressed in Purkinje cells. This study would benefit from using GPR55 knockout (KO) mice. The downstream mechanism by which GPR55 mediates the suppression of GABA release remains unknown.

      We agree with the reviewer on all of the points raised as weaknesses. Most of these issues will be clarified by the additional experiments and analyses described above in response to the other reviewers. Whether endogenous GPR55 ligands cause this synaptic depression, and through which downstream mechanism, are very important questions; we will discuss these points in the revised manuscript and would like to address them in future studies.

    1. Author response:

      Reviewer #1:

      In their paper entitled "Combined transcriptomic, connectivity, and activity profiling of the medial amygdala using highly amplified multiplexed in situ hybridization (hamFISH)" Edwards et al. present a new method designated as hamFISH (highly amplified multiplexed in situ hybridization) that enables sequential detection of ≤32 genes using multiplexed branched DNA amplification. As proof-of-principle, the authors apply the new technique - in conjunction with connectivity, and activity profiling - to the medial amygdala (MeA) of the mouse, which is a critical nucleus for innate social and defensive behaviors.

      As mentioned by Edwards et al., hamFISH could prove beneficial as an affordable alternative to other in situ transcriptomic methods, including commercial platforms, that are resource-intensive and require complex analysis pipelines. Thus, the authors envision that the method they present could democratize in situ cell-type identification in individual laboratories.

      The data presented by Edwards et al. is convincing. The authors use the appropriate and validated methodology in line with the current state-of-the-art. The paper makes a strong case for the benefits of hamFISH when combining transcriptomics studies with connectivity tracing and immediate early gene-based activity profiling. Notably, the authors also discuss the caveats and limitations of their study/approach in an open and transparent manner.

      In its current state, the manuscript touches upon a number of most intriguing, yet rather preliminary findings. For example, the roles of inhibitory neuron cluster i3 or of the selective and apparently MeA neuron-specific projections (Figure 3 - Figure Supplement 2D) remain elusive. As it is the authors' prime intent to provide "a proof-of-principle example of overlaying transcriptomic types, projection, and activity in a behaviorally relevant manner and demonstrates the usefulness of hamFISH in multiplexed in situ gene expression profiling", such studies might be beyond the scope of the present manuscript. The absence of such more in-depth hypothesis-based analysis, however, prevents an even more enthusiastic overall assessment.

      We thank the reviewer for their positive assessment and agree that further studies are needed to explore and understand the MeA circuit further.

      Reviewer #2:

      The authors describe the development and implementation of hamFISH, a sensitive multiplexed ISH method. They leverage a pre-existing scRNA-seq dataset for the MeA to design 32 probes that combinatorically represent MeA neuronal populations - ~80% of MeA neurons express three of these markers. Using these markers to assess the spatial organization of the MeA, the authors identify a novel population of Ndnf+ projection neurons and characterize their connectivity with anterograde and retrograde labeling. They additionally combine hamFISH with CTB labeling of three principal MeA projection sites to show that 75% of MeA neurons have only a single projection target. Finally, they engage adult male mice in encounters with other adult males (aggression), females (mating), and pups (infanticide), followed by hamFISH and c-fos labeling to relate cell identity to behavior. Their overall conclusion is that hamFISH-defined cell types are broadly active to multiple sensory stimuli. However, the data presented are not sufficient to conclude that no selectivity exists within the MeA. A weakness of the study is that the selected hamFISH genes contain only Lhx6 as a lineage-marking transcription factor. Instead, the authors predominately use neuropeptides as markers. Genes such as Tac1, Cartpt, Adcyap1, Calb1, and Gal are expressed throughout the MeA, and many other brain regions; they are not restricted to a single transcriptomic cell type and they do not denote any developmental origins. By design, the panel has low cell type specificity as all MeA neurons express at least three of the genes. Therefore, the authors' conclusions may not hold with a more stringent classification of cell type or cell identity.

      We agree with the reviewer that a deeper level of cell type classification may reveal the selectivity of cell types that may have been missed. The design of our hamFISH bridge-readout probes allows modification to be compatible with a barcoded readout system such as MERFISH, which would substantially increase the number of genes that can be included in the gene panel. This would, however, increase the complexity of the analysis pipeline and reduce throughput, but would be a potential avenue to explore to define MeA cell types at a deeper level. An advantage of hamFISH is the ease of including and reading out alternative gene panels. For example, one panel could examine developmental-lineage-specific genes. Overall, our panel captures the highest hierarchical level (similar to the subclass level of the Allen taxonomy) of MeA transcriptomic types, based on published data available at the time of our gene panel design. Genes including Tac1, Cartpt, Adcyap1, Calb1, and Gal are expressed in specific patterns within the MeA and are useful for classification. In the original manuscript, we also included our rationale for dropping Foxp2, a lineage-specific marker gene in the MeA.

      Reviewer #3:

      In this manuscript, Edwards et al. describe hamFISH, a customizable and cost-efficient method for performing targeted spatial transcriptomics. hamFISH utilizes highly amplified multiplexed branched DNA amplification, and the authors extensively describe hamFISH development and its advantages over prior variants of this approach.

      The authors then used hamFISH to investigate an important circuit in the mouse brain for social behavior, the medial amygdala (MeA). To develop a hamFISH probe set capable of distinguishing MeA neurons, the authors mined published single-cell RNA-sequencing datasets of the MeA, ultimately creating a panel of 32 hamFISH probes that mostly cover the identified MeA cell types. They evaluated over 600,000 MeA cells and classified neurons into 16 inhibitory and 10 excitatory types, many of which are spatially clustered. The authors combined hamFISH with viral and other circuit tracer injections to determine whether the identified MeA cell populations sent and/or received unique inputs from connected brain regions, finding evidence that several cell types had unique patterns of input and output. Finally, the authors performed hamFISH on the brains of male mice that were placed in behavioral conditions that elicit aggressive, infanticidal, or mating behaviors, finding that some cell populations are selectively activated (as assessed by c-fos mRNA expression) in specific social contexts.

      Strengths:

      (1) The authors developed an optimized tissue preparation protocol for hamFISH and implemented oligopools instead of individually synthesized oligonucleotides to reduce costs. The branched DNA amplification scheme improved smFISH signal compared to previous methods, and multiple variants provide additional improvements in signal intensity and specificity. Compared to other spatial transcriptomics methods, the pipeline for imaging and analysis is streamlined and is compatible with other techniques like fluorescence-based circuit tracing. This approach is cost-effective and has several advantages that make it a valuable addition to the list of spatial transcriptomics toolkits.

      (2) Using 31 probes, hamFISH was able to detect 16 inhibitory and 10 excitatory neuron types in the MeA subregions, including the vast majority of cell types identified by other transcriptomics approaches. The authors quantified the distributions of these cell types along the anterior-posterior, dorsal-ventral, and medial-lateral axes, finding spatial segregation among some, but not all, MeA excitatory and inhibitory cell types. The authors additionally identified a class of inhibitory neurons expressing Ndnf (and a subset of these that express Chrna7) that project to multiple social chemosensory circuits.

      (3) The authors combined hamFISH with MeA input and output mapping, finding cell-type biases in the projections to the MPOA, BNST, and VMHvl, and inputs from multiple regions.

      (4) The authors identified excitatory and inhibitory cell types, and patterns of activity across cell types, that were selectively activated during various social behaviors, including aggression, mating, and infanticide, providing new insights and avenues for future research into MeA circuit function.

      Weaknesses:

      (1) Gene selection for hamFISH is likely to still be a limiting factor, even with the expanded (32-probe) capacity. This may have contributed to the lack of ability to identify sexually dimorphic cell types (Figure S2B). This is an expected tradeoff for a method that has major advantages in terms of cost and adaptability.

      We recognise that the 32-plex gene detection might not be sufficient to address key questions in the transcriptomic organization of innate social behavior circuits, and that the study fell short of addressing more quantitative gene expression differences between sexes. Detecting sexually dimorphic gene expression likely requires a more targeted approach, as the dimorphism consists of differences in expression level rather than binary expression of marker genes, and the gene panel needs to be specifically configured for this purpose.

      (2) Adaptation of hamFISH, for example, to adapt it to other brain regions or tissues, may require extensive optimization.

      We have successfully performed hamFISH on at least two other mouse brain regions without needing to optimize further, suggesting that compatibility with other mouse brain regions is not an issue. We recognise, however, that optimization of hamFISH may be required for its application in other types of tissue or species. Human brain tissue, for example, typically suffers from high autofluorescence, and different tissue preparation methods may need to be employed. We note that the hamFISH signal boost obtained with v2 amplifiers may be useful to this end.

      (3) Pairing this method with behavioral experiments is likely to require further optimization, as c-fos mRNA expression is an indirect and incomplete survey of neuronal activity (e.g. not all cell types upregulate c-fos when electrically active). As such, there is a risk of false negative results that limit its utility for understanding circuit function.

      We acknowledge that c-fos is not the only readout of neuronal activity and that a panel of immediate early genes would allow a more comprehensive readout of activity-dependent gene expression. We fully agree that immediate early gene induction is an indirect readout of neural activity, and alternative methods such as in vivo physiology would provide a complementary insight into the selectivity of MeA neuron responses.

      (4) The limited compatibility of hamFISH with thicker tissue samples and lack of optical sectioning introduce additional technical limitations. For example, it would be difficult to densely sample larger neural circuits using serial 20 micron sections. Also, because the imaging modality is not clear from the methods, it is difficult to know whether the analysis methods introduce the risk of misattributing gene expression to overlapping cells.

      We agree that the use of hamFISH as described here is restricted to thin (<20 µm) sections. We have shown, however, that our encoding probe and bridge-readout probe design are compatible with HCR-based mRNA detection, which is compatible with thicker sections. Regarding the misattribution of gene expression to overlapping cells in the z-axis, we used epifluorescence microscopy with 14 × 500 nm z-steps to collect our raw data and generate maximum intensity projections for further analysis. Because of the thin sections (10 µm) used for the imaging, the overlap between cells in z is expected to be minimal. Regarding throughput, we agree that hamFISH is likely not suitable for brain-wide questions that require large volume coverage, but its major advantage is that it allows routine use of low-level multiplexing for targeted brain areas.
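
      For readers less familiar with the projection step mentioned above, the following is a minimal illustrative sketch of a maximum intensity projection, not the authors' actual pipeline. The function name, the toy pixel values, and the reading of "14 × 500 nm" as 14 steps of 500 nm each are assumptions made only for this example.

```python
# Illustrative sketch only: a maximum intensity projection (MIP) over a z-stack.
# The function name and toy data are hypothetical, not taken from the hamFISH pipeline.

def max_intensity_projection(stack):
    """Collapse a z-stack (a list of equally sized 2D planes) into one 2D image.

    Each output pixel holds the maximum value of that pixel across all z-planes.
    """
    if not stack:
        raise ValueError("stack must contain at least one z-plane")
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [max(plane[r][c] for plane in stack) for c in range(cols)]
        for r in range(rows)
    ]


# Assumption: "14 x 500 nm z-steps" is read as 14 steps of 500 nm each.
N_STEPS = 14
STEP_NM = 500
depth_um = N_STEPS * STEP_NM / 1000  # nominal sampled depth in micrometres

# Toy two-plane stack of 2 x 2 pixels.
toy_stack = [
    [[1, 2], [3, 4]],
    [[5, 0], [1, 9]],
]
projection = max_intensity_projection(toy_stack)
```

      Under this reading, the nominal sampled depth is about 7 µm, which is of the same order as the 10 µm section thickness. Because a projection discards z-position, signals from cells stacked in z would merge in the output, which is why section thickness matters for the attribution question raised by the reviewer.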

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      Summary:  

      The study by Pudlowski et al. investigates how the intricate structure of centrioles is formed by studying the role of a complex formed by delta- and epsilon-tubulin and the TEDC1 and TEDC2 proteins. For this, they employ knockout cell lines, EM, and ultrastructure expansion microscopy as well as pull-downs. Previous work has indicated a role of delta- and epsilon-tubulin in triplet microtubule formation. Without triplet microtubules centriolar cylinders can still form, but are unstable, resulting in futile rounds of de novo centriole assembly during S phase and disassembly during mitosis. Here the authors show that all four proteins function as a complex and knockout of any of the four proteins results in the same phenotype. They further find that mutant centrioles lack inner scaffold proteins and contain an extended proximal end including markers such as SAS6 and CEP135, suggesting that triplet microtubule formation is linked to limiting proximal end extension and formation of the central region that contains the inner scaffold. Finally, they show that mutant centrioles seem to undergo elongation during early mitosis before disassembly, although it is not clear if this may also be due to prolonged mitotic duration in mutants.  

      Strengths:  

      Overall this is a well-performed study, well presented, with conclusions mostly supported by the data. The use of knockout cell lines and rescue experiments is convincing.  

      Weaknesses:  

      In some cases, additional controls and quantification would be needed, in particular regarding cell cycle and centriole elongation stages, to make the data and conclusions more robust. 

      We thank the reviewer for these comments and have improved our analyses of these as detailed below.

      Reviewer #2 (Public Review):  

      Summary:  

      In this article, the authors study the function of TEDC1 and TEDC2, two proteins previously reported to interact with TUBD1 and TUBE1. Previous work by the same group had shown that TUBD1 and TUBE1 are required for centriole assembly and that human cells lacking these proteins form abnormal centrioles that only have singlet microtubules that disintegrate in mitosis. In this new work, the authors demonstrate that TEDC1 and TEDC2 depletion results in the same phenotype with abnormal centrioles that also disintegrate into mitosis. In addition, they were able to localize these proteins to the proximal end of the centriole, a result not previously achieved with TUBD1 and TUBE1, providing a better understanding of where and when the complex is involved in centriole growth.  

      Strengths:  

      The results are very convincing, particularly the phenotype, which is the same as previously observed for TUBD1 and TUBE1. The U-ExM localization is also convincing: despite a signal that's not very homogeneous, it's clear that the complex is in the proximal region of the centriole and procentriole. The phenotype observed in U-ExM on the elongation of the cartwheel is also spectacular and opens the question of the regulation of the size of this structure. The authors also report convincing results on direct interactions between TUBD1, TUBE1, TEDC1, and TEDC2, and an intriguing structural prediction suggesting that TEDC1 and TEDC2 form a heterodimer that interacts with the TUBD1-TUBE1 heterodimer.

      Weaknesses:  

      The phenotypes observed in U-ExM on cartwheel elongation merit further quantification, enabling the field to appreciate better what is happening at the level of this structure.  

      We thank the reviewer for these comments and have improved our analyses of cartwheel elongation as detailed below.

      Reviewer #3 (Public Review):  

      Summary:  

      Human cells deficient in delta-tubulin or epsilon-tubulin form unstable centrioles, which lack triplet microtubules and undergo a futile formation and disintegration cycle. In this study, the authors show that human cells lacking the associated proteins TEDC1 or TEDC2 have these identical phenotypes. They use genetics to knock out TEDC1 or TEDC2 in p53-negative RPE-1 cells and expansion microscopy to structurally characterize mutant centrioles. Biochemical methods and AlphaFold-multimer prediction software are used to investigate interactions between tubulins and TEDC1 and TEDC2.

      The study shows that mutant centrioles are built only of A tubules, which elongate and extend their proximal region, fail to incorporate structural components, and finally disintegrate in mitosis. In addition, they demonstrate that delta-tubulin or epsilon-tubulin and TEDC1 and TEDC2 form one complex and that TEDC1 and TEDC2 can interact independently of tubulins. Finally, they show that the localization of four proteins is mutually dependent.

      Strengths:  

      The results presented here are mostly convincing, the study is exciting and important, and the manuscript is well-written. The study shows that delta-tubulin, epsilon-tubulin, TEDC1, and TEDC2 function together to build a stable and functional centriole, significantly contributing to the field and our understanding of the centriole assembly process.  

      Weaknesses:  

      The ultrastructural characterization of TEDC1 and TEDC2 obtained by U-ExM is inconclusive. Improving the quality of the signals is paramount for this manuscript.  

      We thank the reviewer for these comments and have improved our imaging of TEDC1 and TEDC2 localization, as detailed below.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):  

      The reviewers agreed that the conclusions are largely supported by solid evidence, but felt that improving the following aspects would make some of the data more convincing:  

      (1) The UExM localizations of TEDC1/2 are not very convincing and the reviewers suggest complementing these with alternative super-resolution approaches (e.g. SIM) and/or different labeling techniques such as pre-expansion labeling using STAR red/orange secondaries (also robust for SIM and STED), use of the Halo tag, different tag antibodies, etc.

      We thank the reviewers for these recommendations and have adapted two of these strategies to improve our imaging of TEDC1 and TEDC2 localization. First, we used an alternative super-resolution approach, a Yokogawa CSU-W1 SoRA confocal scanner (resolution = 120 nm) and imaged cells grown on coverslips (not expanded). We found that TEDC1 and TEDC2 localize to procentrioles and the proximal end of parental centrioles (Fig 2 – Supplementary Figure 1a, b). Second, we used a recently described expansion gel chemistry (Kong et al., Methods Mol Biol 2024) combined with Abberior Star red and orange secondary antibodies. This technique resulted in robust signal at centrosomes and in the cytoplasm and indicated that TEDC1 and TEDC2 localize near the centriole walls of procentrioles and the proximal region of parental centrioles, near CEP44 (Fig 2 – Supplementary Figure 1c, d). These results complement and support our initial observations (Fig 2C, D) and we have edited the text to reflect this (lines 157-163). We also note that these Flag tag and V5 tag primary antibodies are specific and have little background signal in all applications (Fig 2 – Supplementary Fig 1E-J), while other commercially available antibodies against these tags did exhibit non-specific signal. 

      (2) The cell cycle classifications of centrioles would strongly benefit, apart from a better description, from adding quantifications of average centriole length at a given stage based on tubulin staining (not acTub). 

      We thank the reviewers for these recommendations. We have added an improved description of our cell cycle analyses (lines 234-237). We have also added new analyses for centriole length as measured by staining with alpha-tubulin (Fig 4 – Supp 3 and Fig 4 – Supp 4). We find that in all mutants, acetylated tubulin elongates along with alpha-tubulin in a similar way as control centrioles.

      Reviewer #1 (Recommendations For The Authors):  

      Specific points:  

      (1) The introduction is a bit oddly structured. About halfway through it summarizes what is going to be presented in the study, giving the impression that it is about to conclude, but then continues with additional, detailed introduction paragraphs. Overall, the authors may also want to consider making it more concise.

      We thank the reviewer for these suggestions and have shortened and restructured the introduction for clarity and conciseness.

      (2) The text should explain to the non-expert reader why endogenous proteins are not detected and why exogenously expressed, tagged versions are used. Related to this, the authors state overexpression, but what is this assessment based on? Does expression at the endogenous level also rescue? At least by western blot, these questions should be addressed. 

      In the text, we have added clarification about why endogenous proteins were not detected for immunofluorescence (lines 149-151). To quantify the overexpression, we have added Western blots of TEDC1 and TEDC2 to Fig 1 – Supplementary Figure 1E,F. We note that endogenous levels of both proteins are very low, and the rescue constructs are overexpressed 20 to 70 fold above endogenous levels.  

      (3) The figures should clearly indicate when tagged proteins are used and detected. Currently, this info is only found in the legends but should be in the figure panels as well.

      We have made these changes to the figure panels in Fig 2, Fig 2 – Supp 1, and Fig 3.

      (4)  I could not find a description and reference to Figure 2 Supplement 2 and 3. 

      We have replaced these supplements with new supplementary figures for TEDC1 and TEDC2 localization (Fig 2 – Supp 1).

      (5) The multiple bands including unspecific (?) bands should be labeled to guide the reader in the western blots. 

      We have labeled nonspecific bands in our Western blots with asterisks (Fig 1 – Supp 1, Fig 3).

      (6) The AlphaFold prediction suggests that TUBD1 can bind to the TED complex in the absence of TUBE1; can this be shown? This would be a nice validation of the predicted architecture of the complex. I also missed a bit of a discussion of the predicted architecture. How could it be linked to triplet microtubule formation? Is the latest AlphaFold version 3 adding anything to this analysis?

      In our pulldown experiments, we found that TUBD1 cannot bind to TEDC1 or TEDC2 in the absence of TUBE1 (Fig 3C, D, IB: TUBD1). We performed this experiment with three biological replicates and found the same result. It is possible that TUBD1 and TUBE1 form an intact heterodimer, similar to alpha-tubulin and beta-tubulin, and this will be an exciting area of future research.

      We have added new analysis from AlphaFold3 (Fig 3 – Supp 1B). AlphaFold3 predicts a similar structure as AlphaFold Multimer.

      We have also added additional discussion about the AlphaFold prediction to the text (lines 220-222, 365-367). Thanks to the reviewer for pointing out this oversight.

      (7) I suggest briefly explaining in the text how cells and centrioles at different cell cycle stages were identified. I found some info in the legend of Figure 1, but no info for other figures or in the text. Related to this, how are procentrioles defined in de novo formation? There is no parental centriole to serve as a reference. 

      We have added a brief explanation of the synchronization and identification in lines 234-237. We have also clarified the text regarding de novo centrioles, and now term these “de novo centrioles in the first cell cycle after their formation” (lines 271-272).

      (8) Related to point 7: using acetylated tubulin as a universal length and width marker seems unreliable since it is a PTM. The authors should use general tubulin staining to estimate centriole dimensions, or at least establish that acetylated tubulin correlates well with the overall tubulin signal in all mutants. 

      We have added two supplementary data figures (Fig 4 – supp 3 and Fig 4 – supp 4) in which we co-stain control and mutant centrioles with alpha-tubulin. We found that acetylated tubulin marked mutant centrioles well and as alpha-tubulin length increased, acetylated tubulin length also increased. 

      (9) Presence and absence of various centriolar proteins. These analyses lack a clear reference for the precise centriole elongation stage. This is particularly problematic for proteins that are recruited at specific later stages (such as inner scaffold proteins). The staining should be correlated with centriole length measurements, ideally using general tubulin staining.  

      As described for point 8, we have added two supplementary data figures in which we co-stain control and mutant centrioles with alpha-tubulin and found that acetylated tubulin also increases as overall tubulin length increases in all mutants. We note that inner scaffold proteins are absent in all our mutant centrioles at all stages of the cell and centriole cycle, as also previously reported for POC5 in Wang et al., 2017.

      Reviewer #2 (Recommendations For The Authors):  

      Here's a list of points I think could be improved:  

      -  As the authors previously published, the centriole appears to have a smaller internal diameter than mature centrioles. Could the authors measure to see if the phenotype is identical? Is the centriole blocked in the bloom phase (Laporte et al. 2024)? 

      We have added an additional supplementary figure (Fig 4 – supp 5) to show that mutant centrioles have smaller diameters than mature centrioles, as we previously reported for the delta-tubulin and epsilon-tubulin mutant centrioles by EM. We thank the reviewers for the additional question of the bloom phase. Given the comparatively smaller number of centrioles we analyzed in this paper compared to Laporte et al (50 to 80 centrioles per condition here, versus 800 centrioles in Laporte et al), it is difficult to definitively conclude whether there is a block in bloom phase. This would be an interesting area for future research.  

      -  The images of the centrioles in EM are beautiful. Would it be possible to apply a symmetrisation on it to better see the centriolar structures? For example, is the A-C linker present? 

      We thank the reviewer for this excellent suggestion. Using centrioleJ, we find that the A-C linker is absent from mutant centrioles. The symmetrized images have been added to Fig 1 – Supplementary Fig 2, and additional discussion has been added to the text (line 143-144, line 368-374).  

      -  How many EM images were taken? Did the centrioles have 100% A-microtubule only or sometimes with B-MT? 

      For TEM, we focused on centrioles that were positioned to give perfect cross-section images of the centriolar microtubules, and thus did not take images of off-angle or rotated centrioles. Given the difficulty of this experiment (centrioles are small structures within the cell, centrosomes are single-copy organelles, and off-angle centrioles were not imaged), we were lucky to image 3 centrioles that were in perfect cross-section – 2 for Tedc1<sup>-/-</sup> and 1 for Tedc2<sup>-/-</sup>. Our images indicate that these centrioles only have A-tubules (Fig 1 – Supp Fig 2).

      -  In Figure 2 - it would be preferable to write TEDC2-flag or TEDC1-flag and not TEDC2/1. 

      We have made this change.

      -  It seems that Figures 2C and D aren't cited, and some of the data in the supplemental data are not described in the main text. 

      We have replaced these supplements with new supplementary figures for TEDC1 and TEDC2 localization (Fig 2 – Supp 1).

      -  The signal in U-ExM with the anti-Flag antibody is heterogeneous. Did the authors test several anti-FLAG antibodies in U-ExM? 

      We tested several anti-Flag and anti-V5 antibodies for our analyses, and chose these because they have little background signal in all applications (Fig 2 – Supplementary Fig 1E-J). Other commercially available antibodies against these tags did exhibit non-specific signal.

      -  The AlphaFold prediction is difficult to interpret, the authors should provide more views and the PDB file. 

      We have added 2 additional views of the AlphaFold prediction in Fig 3 – Supp 1A.

      -  In general, but particularly for Figure 4: the length doesn't seem to be divided by the expansion factor, it is therefore difficult to compare with known EM dimensions. Can the authors correct the scale bars? 

      We have corrected the scale bars for all figures to account for the expansion factor.
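      The correction described above amounts to dividing each post-expansion measurement by the expansion factor. A minimal sketch follows; the expansion factor and the measured length are hypothetical placeholder values, since the response does not state them, and the function name is introduced only for illustration.

```python
def to_pre_expansion_nm(measured_nm, expansion_factor):
    """Convert a length measured in an expanded gel to its pre-expansion size."""
    if expansion_factor <= 0:
        raise ValueError("expansion factor must be positive")
    return measured_nm / expansion_factor


# Hypothetical values, not reported in the response.
EXPANSION_FACTOR = 4.0        # assumed linear expansion of the gel
measured_length_nm = 1600.0   # assumed length measured after expansion

corrected_length_nm = to_pre_expansion_nm(measured_length_nm, EXPANSION_FACTOR)
```

      Lengths corrected this way can be compared directly with dimensions known from EM.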

      -  Concerning Gamma-tubulin that is "recruited to the lumen of centrioles by the inner scaffold, had localization defects in mutant centrioles. However, we were unable to reliably detect gamma-tubulin within the lumen of control or de novo-formed centrioles in S or G2-phase (Figure 4 - Supplement 1E), and thus were unable to test this hypothesis". In Laporte et al 2024, Gamma-tubulin arrives later than the inner scaffold and only on mature centrioles, so this result appears to be in line with previous observation. However, the authors should be able to detect a proximal signal under the microtubules of the procentriole, is this the case? 

      We agree that this is an exciting question. However, in our expansion microscopy staining, we frequently observe that gamma-tubulin surrounds centrioles, corresponding to its role in the pericentriolar material (PCM). In our hands, we find it difficult to distinguish between centriolar gamma-tubulin at the base of the A-tubule from gamma-tubulin within the PCM.  

      -  In the signal elongation of SAS-6, STIL, CEP135, CPAP, and CEP44, would it be possible to quantify the length of these signals (with dimensions divided by the expansion factor for comparison with known TEM distances)? 

      We have quantified the lengths of SAS-6 and CEP135 in new Fig 4 – Supp 3 and Fig 4 – Supp 4.  

      -  The authors observe that centrin is present, but only as a SFI1 dot-like localization (which is another protein that would be interesting to look at), and not an inner scaffold localization. Can the authors elaborate? These results suggest that the distal part is correctly formed with only a microtubule singlet. 

      We agree with the reviewer’s interpretation that the centriole distal tip is likely correctly formed with only singlet microtubules, as both distal centrin and CP110 are present. We have added this point to the discussion (line 415).

      -  The authors observe that CPAP is elongated, but CPAP has two locations, proximal and distal. Is it distal or proximal elongation? Is the proximal signal of CPAP longer than that of CEP44 in the mutants? The authors discuss that the elongation could come from overexpression of CPAP, but here it seems that the centriole is not overlong, just the structures around the cartwheel.

      We thank the reviewer for this point. It is difficult for us to conclude whether the proximal or distal region is extended in the mutants, as our mutant centrioles lack a visible separation between these two regions. It would be interesting to probe this question in the future by testing whether subdomains of CPAP may be differentially regulated in our mutants.

      Reviewer #3 (Recommendations For The Authors):  

      It isn't apparent to me what was counted in Figure 1C. Were all centrioles (mother centrioles and procentrioles) counted? Where is the 40% in control cells coming from? Can this set of data be presented differently? 

      We apologize for the confusion. In this figure, all centrioles were counted. We have updated the figure legend for clarity. We performed this analysis in a similar way as in Wang et al., 2017 to better compare phenotypes.  

      Figure 2C and the text lines 182-187: The ultrastructural characterization of TEDC1 and TEDC2 suffers from the low quality of the TEDC1 and TEDC2 signals obtained post-expansion. In comparison with robust low-resolution immunosignal, it appears that most of the signal cannot be recovered after expansion. Another sub-resolution imaging method to re-analyze TEDC1 and TEDC2 localization would be essential. The same concern applies to Figures 2 - Supplement 2 and 3. Also, Figure 2 - Supplement 2 and Supplement 3 do not seem to be cited.

      We thank the reviewer for these recommendations. As also mentioned above, we used an alternative super-resolution approach, a Yokogawa CSU-W1 SoRA confocal scanner (resolution = 120 nm), and found that TEDC1 and TEDC2 localize to procentrioles and the proximal end of parental centrioles (Fig 2 – Supplementary Figure 1a, b). Second, we used a recently described expansion gel chemistry (Kong et al., Methods Mol Biol 2024) combined with Abberior Star red and orange secondary antibodies. This technique resulted in robust signal at centrosomes and in the cytoplasm and indicated that TEDC1 and TEDC2 localize near the centriole walls of procentrioles and the proximal region of parental centrioles, near CEP44 (Fig 2 – Supplementary Figure 1c, d). These stainings complement and support our initial observations (Fig 2C, D) and we have edited the text to reflect this (lines 157-163). We have also removed the supplementary figures that were uncited in the text.

      TUBD1 and TUBE1 form a dimer and TEDC2 and TEDC1 can interact. Any speculation as to why TEDC2 does not pull down both TUBE1 and TUBD1? 

      We apologize for the confusion. TEDC2 does pull down both TUBE1 and TUBD1 (Fig 3D, pull-down, second column, Tedc2-V5-APEX2 rescuing the Tedc2<sup>-/-</sup> cells pulls down TUBD1, TUBE1, and TEDC1).  

      Figure 4A and B. The authors use acetylated tubulin to determine the length of procentrioles in the S and G2 phases. However, procentrioles are not acetylated on their distal ends in these cell cycle phases (as the authors also mention further in the text). Why has alpha tubulin not been used since it works well in U-ExM? The average size of the control G2 procentrioles seems too small in Figure 4A and not consistent with other imaging data (for instance, in Figure 4 - Supplement 1 C, CP110, and CPAP staining). There is no statistical analysis in Figure 4A.

      We have added two supplementary data figures (Fig 4 – supp 3 and Fig 4 – supp 4) in which we co-stain control and mutant centrioles with alpha-tubulin. We found that acetylated tubulin correlates well with overall tubulin signal in all mutants. We have added statistical analysis to the figure legend of Fig 4A.

      Lines 260 - 262: "These results indicate that centrioles with singlet microtubules can elongate to the same length as controls, and therefore that triplet microtubules are not essential for regulating centriole length." It is hard to agree with this statement. Mutant procentrioles show aberrantly elongated proximal signals of several tested proteins. In addition, in lines 326 - 328, the authors state that "Together, these results indicate that centrioles lacking compound microtubules are unable to properly regulate the length of the proximal end."  

      We thank the reviewer and have clarified the statement to state that these results indicate that centrioles with singlet microtubules can elongate to the same overall length as control centrioles in G2 phase.  

      Line 353: The authors suggest that elongated procentriole structure in mitosis may represent intermediates in centriole disassembly. Another interpretation, more in line with the EM data from Wang et al., 2017, would be that these mutant procentrioles first additionally elongate before they disassemble in late mitosis. The aberrant intermediate structure concept would need further exploration. For instance, anti-alpha/beta-tubulin antibodies could be used to investigate centriole microtubules.  

      We apologize for the confusion and have edited this section for clarity (lines 341-343): “We conclude that in our mutant cells, centrioles elongate in early mitosis to form an aberrant intermediate structure, followed by fragmentation in late mitosis.”

      References need to be included in lines 122, 277, 279. 

      We have added these references.

      Line 281: Add references PMID: 30559430 and PMID: 32526902.  

      We have added these references (lines 265-266).

      Line 289: "Moreover, our results suggest that centriole glutamylation is a multistep process, in which long glutamate side chains are added later during centriole maturation." This does not seem like an original observation. For instance, see PMID: 32526902.  

      We have added this reference (lines 273-274).

    1. Author response:

      Reviewer 1:

      (1) Provide RMSD and DALI scores to show how similar the AlphaFold-predicted structures of BrrG are to other anti-termination factors. This should be done for Fig. 1B and also for Suppl. Fig. 1 to support the claim that BrrG, GafA, GafZ, and Q21 share structural features.

      In the revised manuscript we will provide RMSD and DALI scores.
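      For readers unfamiliar with the first of these metrics, the sketch below computes a root-mean-square deviation (RMSD) between two sets of matched atomic coordinates. It assumes the structures have already been superimposed and that atoms are paired one to one; the coordinates are invented for the example, and DALI Z-scores, which come from a separate structural alignment procedure, are not reproduced here.

```python
import numpy as np


def rmsd(coords_a, coords_b):
    """RMSD between two pre-aligned (N, 3) coordinate arrays with matched atoms."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    squared_distances = np.sum((a - b) ** 2, axis=1)  # per-atom squared deviation
    return float(np.sqrt(np.mean(squared_distances)))


# Invented coordinates: the second structure is the first shifted by 2 units in z.
structure_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
structure_b = structure_a + np.array([0.0, 0.0, 2.0])

example_rmsd = rmsd(structure_a, structure_b)
```

      Lower values indicate closer agreement between the compared backbones, in the same length units as the input coordinates.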

      (2) Throughout the manuscript, flow cytometry data of gfp expression was used and shown as single replicate. Korotaev et al wrote in the legends that error bars are shown (that is not true for e.g. Figs. 3, 4, and 5). It is difficult for reviewers/readers to gauge how reliable are their experiments.

      As stated in the manuscript all flow cytometry data were performed in triplicate. In the revised manuscript we will include the two replicates not presented in the main figures as supplementary information.

      (3) I am unsure how ChIP-seq in Fig. 2A was performed (with anti-FLAG or anti-HA antibodies? I cannot tell from the Materials & Methods). More importantly, I did not see the control for this ChIP-seq experiment. If a FLAG-tagged BrrG was used for ChIP-seq, then a WT non-tagged version should be used as a negative control (not sequencing INPUT DNA), this is especially important for anti-terminator that can co-travel with RNA polymerase. Please also report the number of replicates for ChIP-seq experiments.

      Fig. 2A presents a coverage plot from the ChIP-Seq of ∆brrG +pTet:brrG-3xFLAG (N’). A replicate of this N-terminally tagged construct will be added as supplementary data in the revised version. As anticipated by the referee, we had used ∆brrG +pTet:brrG (untagged) as control.

      (4) Korotaev et al mentioned that BrrG binds to DNA (as well as to RNA polymerase). With the availability of existing ChIP-seq data, the authors should be able to locate the DNA-binding element of BrrG, this additional information will be useful to the community.

      We will mine the ChIP-Seq data to define the BrrG binding site as closely as possible and include the analysis in the revised version of the manuscript.

      (5) Mutational experiments to break the potential hairpin structure are required to strengthen the claim that this putative hairpin is the potential transcriptional terminator.

      We did not claim that the identified hairpin is a terminator but rather suggested it as a candidate terminator. We agree with the referee that the proposed experiment would be necessary to definitively prove its terminator function. However, our primary aim was to demonstrate that BrrG acts as a processive antiterminator, which we have shown by replacing the putative terminator with a well-characterized synthetic terminator that BrrG successfully overcame. Therefore, we prefer not to conduct the proposed experiment and will instead further tone down our conclusions regarding the putative terminator function of the identified hairpin structure.

      Reviewer 2:

      (1) The authors wrote "GTAs are not self-transmitting because the DNA packaging capacity of a GTA particle is too small to package the entire gene cluster encoding it" (page 3). I thought that at least the Bartonella capsid gene cluster should be self-transmissible within the 14 kb packaged DNA (https://doi.org/10.1371/journal.pgen.1003393, https://doi.org/10.1371/journal.pgen.1000546). This was also concluded by Lang et al (https://doi.org/10.1146/annurev-virology-101416-041624). In this case the presented results would have important implications. As the gene cluster and the anti-terminator required for its expression are separated on the chromosome, it would not be possible to transfer an active GTA gene cluster, although the DNA coding for the genes required for making the packaging agent itself, theoretically fits into a BaGTA particle. Could the authors comment on that? I think it would be helpful to add the sizes of the different gene clusters and the distance between them in Fig. 2A. The ROR amplified region spans 500kb, is the capsid gene cluster within this region?

      We thank the reviewer for bringing up this interesting point. The bgt cluster (capsid cluster) is approximately 9.2 kb in size and could feasibly be packaged in its entirety into a GTA particle. In contrast, the ror gene cluster, which encodes the antiterminator BrrG, is approximately 20 kb in size—exceeding the packaging limit of GTA particles—and is separated from the bgt cluster by approximately 35 kb. Consequently, if the bgt cluster is transferred via a GTA particle into a recipient host that does not encode the ror gene cluster, the bgt cluster would not be expressed.
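      The size argument above can be checked with simple arithmetic. In the sketch below, the 14 kb packaging capacity is taken from the reviewer's comment rather than from the authors' reply, the cluster sizes are the approximate values quoted above, and the combined span assumes the two clusters lie on either side of the 35 kb gap with nothing else counted.

```python
PACKAGING_CAPACITY_KB = 14.0  # from the reviewer's comment on packaged DNA

BGT_CLUSTER_KB = 9.2   # capsid (bgt) cluster, approximate
ROR_CLUSTER_KB = 20.0  # ror cluster encoding BrrG, approximate
GAP_KB = 35.0          # approximate separation between the two clusters


def fits_in_particle(size_kb, capacity_kb=PACKAGING_CAPACITY_KB):
    """Return True if a DNA segment of this size fits in one GTA particle."""
    return size_kb <= capacity_kb


bgt_fits = fits_in_particle(BGT_CLUSTER_KB)
ror_fits = fits_in_particle(ROR_CLUSTER_KB)

# Minimum contiguous stretch that would carry both clusters together.
combined_span_kb = BGT_CLUSTER_KB + GAP_KB + ROR_CLUSTER_KB
both_fit = fits_in_particle(combined_span_kb)
```

      Under these assumptions only the bgt cluster fits in a single particle, which is consistent with the conclusion that a transferred bgt cluster would arrive without its antiterminator.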

      (2) Another side-note regarding the introduction: On page three the authors write: "GTAs encode bacteriophage-like particles and in contrast to phages transfer random pieces of host bacterial DNA". While packaging is not specific, certain biases in the packaging frequency are observed in both studied GTA families. For Bartonella this is ROR. In the two GTA-producing strains D. shibae and C. crescentus origin and terminus of replication are not packaged and certain regions are overrepresented (https://doi.org/10.1093/gbe/evy005, https://doi.org/10.1371/journal.pbio.3001790). Furthermore, D. shibae plasmids are not packaged but chromids are. I think the term "random" does not properly describe these observations. I would suggest using "not specific" instead.

      We thank the reviewer for this suggestion and will adjust the wording accordingly.

      (3) Page 5: Remove "To address this". It is not needed as you already state "To test this hypothesis" in the previous sentence.

      We will adjust the wording accordingly.

      (4) I think the manuscript would greatly benefit from a summary figure to visualize the Q-like antiterminator-dependent regulatory circuit for GTA control and its four components described on pages 15 and 16.

      We thank the reviewer for this valuable suggestion and will include a summary figure illustrating the deduced regulatory mechanism in the revised manuscript.

      (5) Page 17: It might be worth noting that GafA is highly conserved among GTAs in Rhodobacterales (https://doi.org/10.3389/fmicb.2021.662907) and so, probably, is its regulatory integration into the ctrA network (https://doi.org/10.3389/fmicb.2019.00803). It's an old mechanism. It would also be interesting to know if it is a common feature of the two archetypical GTAs that the regulator is not part of the cluster itself.

      We agree with the points raised by the reviewer and will address them in the revised manuscript. Specifically, we will highlight the high conservation of GafA in GTAs across Rhodobacterales and its regulatory integration within the ctrA network. Additionally, we will analyze whether the GafA-like antitermination regulator is typically located outside the regulated gene cluster, as we have demonstrated for BrrG of BaGTA in the Bartonellae.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, Huang et al used SMRT sequencing to identify methylated nucleotides (6mA, 4mC, and 5mC) in Pseudomonas syringae genome. They show that the most abundant modification is 6mA and they identify the enzymes required for this modification as when they mutate HsdMSR they observe a decrease of 6mA. Interestingly, the mutant also displays phenotypes of change in pathogenicity, biofilm formation, and translation activity due to a change in gene expression likely linked to the loss of 6mA. Overall, the paper represents an interesting set of new data that can bring forward the field of DNA modification in bacteria.

      Thank you for your valuable feedback on our paper exploring the impact of 6mA modification in P. syringae.

      Major Concerns:

      Most of the authors' data concern the Psph pathovar. I am not sure that the authors' conclusions are supported by the two other pathovars they used in the initial 2 figures. If the authors want to broaden their conclusions to Pseudomonas syringae and not restrict them to Psph, the authors should have stronger methylation data using replicates. Additionally, they should discuss why Pss is so different from Pst and Psph. Could they do a blot to confirm it is really the case and not a sequencing artifact? Is the change of methylation during bacterial growth conserved between the pathovars? The authors should obtain mutants in the other pathovars to see if they have the same phenotype. The authors have a nice set of data concerning Psph but the broadening of the results to other pathovars requires further investigation.

      We appreciate the reviewer’s insightful comments. While the majority of our data focuses on the Psph, we recognize the importance of validating these findings in Pss and Pst. To this end, we have performed additional experiments using dot blot and mutant construction to enhance our conclusions in other pathovars.

      We agree that we should discuss why Pss is different from Psph and Pst. We performed a dot blot assay using genomic DNA from Pss and Pst, presented in Figure S5A. We also compared the 6mA modification levels of Pss and Pst in different growth phases. As shown in Figure S5A, the change of methylation during bacterial growth is conserved in Pst. The change was not obvious in Pss, which might be due to the lack of a type I R-M system.

      “In accordance with previous studies showing that growth conditions affect the bacterial methylation status, we applied dot blot experiments using the same amount of DNA (1 μg) from these three P. syringae strains to detect the 6mA levels during both logarithmic and stationary phases. The results revealed that 6mA levels in the stationary phase were much higher compared to the logarithmic phase in Psph and Pst, but showed no significant change in Pss. Additionally, we found that during the stationary phase, 6mA methylation levels in Psph and Pst were higher than those in Pss. These findings were consistent with the MTase prediction for these three strains, since Pss does not harbor any type I R-M systems, which are important for 6mA modification in bacteria.”

      Please see Figure S5A and Lines 220-228 in the revised manuscript.

      We also tried to construct an HsdM mutant in Pst to explore whether the influence of 6mA methylation was conserved in P. syringae, but it failed after multiple attempts. We did not construct a Pss mutant because no type I R-M system was predicted, and few methylation sites were identified via SMRT-seq in this strain. Therefore, we overexpressed HsdM in Pst instead. We have performed additional experiments in WT and the HsdM overexpression strains, including dot blot and growth curve assays.

      Please see Figures S5B-C and Lines 228-232 in the revised manuscript.

      The authors should include proper statistical analysis of their data. A lot of terms are descriptive but not supported by a deeper analysis to sustain the conclusions. For example, in Figure 4E, we do not know if the overlap is significant or not. Are DEGs more overlapping to 6mA sites than non-DEGs? Here is a non-exhaustive list of terms that need to be supported by statistics: different level (L145), greater conservation (L162), significant conservation (L165), considerable similarity (L175), credible motifs (L189), Less strong (L277) and several "lower" and "higher" throughout the text.

      Thank you for the insightful feedback. We have made the following revisions in the manuscript to ensure that the terms are more precise and do not require statistical significance testing.

      (1) Statistical analysis: We have added statistical tests for the overlap between DEGs and 6mA sites in Figure 4E. We performed the Fisher test and found that the overlap was not significant (p > 0.05). Neither DEGs nor non-DEGs overlapped significantly with 6mA sites. Please see Figure 4E and Lines 261-262.

      “Less strong” was used to indicate the influence of HsdM on biofilm in Figure 5D. All figures with “*” labels were analyzed using two-tailed Student's t-tests, with “*” marking a significant change (p < 0.05).

      (2) Revised wording: For terms used to describe comparisons, we have revised the wording to be clearer and ensure that the terminology used did not imply the need for statistical significance testing when not required. For example:

      “Different level” has been removed. Please see Lines 143-144.

      “Greater conservation” has been revised to “more conserved functional terms”. Please see Lines 161-162.

      “Significant conservation” has been revised to “notable conservation”. Please see Line 165.

      “Credible motifs” has been revised to “identified motifs”. Please see Line 186.
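
      The overlap test mentioned in point (1) above can be illustrated with a minimal sketch. It assumes the comparison is a one-sided Fisher's exact test on a 2x2 table of genes (DEG or non-DEG, with or without a 6mA site); the response does not specify the exact test variant, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment (illustrative sketch).

    Assumed 2x2 table layout (hypothetical, not taken from the manuscript):
        a = DEGs with a 6mA site        b = DEGs without a 6mA site
        c = non-DEGs with a 6mA site    d = non-DEGs without a 6mA site

    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing at least this many DEGs carrying a 6mA site by chance.
    """
    n_deg = a + b              # total DEGs
    n_marked = a + c           # total genes with a 6mA site
    n_total = a + b + c + d    # all genes
    upper = min(n_deg, n_marked)
    favourable = sum(
        comb(n_marked, k) * comb(n_total - n_marked, n_deg - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n_total, n_deg)


# Classic small table used only to exercise the function.
p_value = fisher_exact_greater(3, 1, 1, 3)
```

      A p-value above 0.05 from such a test would correspond to the "not significant" overlap reported for Figure 4E.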

      The authors performed SMRT sequencing of the delta hsdMSR showing a reduction of 6mA. Could they include a description of their results similar to Figures 1-2. How reduced is the 6mA level? Is it everywhere in the genome? Does it affect other methylation marks? This analysis would strengthen their conclusions.

      Yes, we agree. We have provided additional analysis and descriptions to strengthen the conclusions in response to these valuable comments. We determined the three types of methylation sites in the HsdMSR mutant strain and compared the overlapping genes across these modification patterns. Specifically, we focused on the 6mA sites in Psph WT, the HsdMSR mutant, and the HsdM motif CAGCN<sub>(6)</sub>CTC. As expected, we found that almost all of the 6mA sites lost in ΔhsdMSR were from the motif CAGCN<sub>(6)</sub>CTC. We also noticed that the 5mC and 4mC sites in the mutant were relatively similar to those in WT, and the slight difference might be caused by sequencing errors. Overall, we propose that HsdMSR catalyzes only the 6mA located on the motif CAGCN<sub>(6)</sub>CTC, and does not affect other 6mA sites or other modification types.

      Please see Figures S4D-E and Lines 212-218 in the revised manuscript.

      In Figure 6E to conclude that methylation is required on both strands, the authors are missing the control CAGCN6CGC construct otherwise the effect could be linked to the A on the complementary strand.

      Thank you for your valuable suggestions. We have provided the control result on the complementary strand. Please see Figure 6C. The new result supports the conclusion that 6mA methylation regulates gene transcription based on methylation on both strands.

      Please see Figure 6C and Lines 329-330 in the revised manuscript.

      Reviewer #2 (Public Review):

      In the present manuscript, Huang et al. revealed the significant roles of the DNA methylome in regulating virulence and metabolism within Pseudomonas syringae, with a particular focus on the HsdMSR system in this model strain. The authors used SMRT-seq to profile the DNA methylation patterns (6mA, 5mC, and 4mC) in three P. syringae strains (Psph, Pss, and Pst) and displayed the conservation among them. They further identified the type I restriction-modification system (HsdMSR) in P. syringae, including its specific motif sequence. The HsdMSR participated in the process of metabolism and virulence (T3SS & Biofilm formation), as demonstrated through RNA-seq analyses. Additionally, the authors revealed the mechanisms of the transcriptional regulation by 6mA. Strictly from the point of view of the interest of the question and the work carried out, this is a worthy and timely study that uses third-generation sequencing technology to characterize the DNA methylation in P. syringae. The experimental approaches were solid, and the results obtained were interesting and provided new information on how epigenetics influences transcription in P. syringae. The conclusions of this paper are mostly well supported by data, but some aspects of data analysis and discussion need to be clarified and extended.

      Thank you for your positive feedback and recognition of the importance of our study. We appreciate the suggestions for further clarification and extension of some aspects of data analysis and discussion. We added further analysis of the SMRT-seq result of ΔhsdMSR and overexpressed HsdM in Pst to provide more information on conservation. We added this content to the discussion in the revised manuscript. Please see Figure 6C and Figure S5.

      Reviewer #3 (Public Review):

      Summary:

      The article by Huang et al. presents an in-depth study on the role of DNA methylation in regulating virulence and metabolism in Pseudomonas syringae, a model phytopathogenic bacterium. This comprehensive research utilized single-molecule real-time (SMRT) sequencing to profile the DNA methylation landscape across three model pathovars of P. syringae, identifying significant epigenetic mechanisms through the Type-I restriction-modification system (HsdMSR), which includes a conserved sequence motif associated with N6-methyladenine (6mA). The study provides novel insights into the epigenetic mechanisms of P. syringae, expanding the understanding of bacterial pathogenicity and adaptation. The use of SMRT sequencing for methylome profiling, coupled with transcriptomic analysis and in vivo validation, establishes a robust evidence base for the findings.

      Strengths:

      The results are presented clearly, with well-organized figures and tables that effectively illustrate the study's findings.

      Weaknesses:

      It would be helpful to add more details, especially in the methods, which make it easy to evaluate and enhance the manuscript's reproducibility.

      Thank you for the positive evaluation of our study, as well as the constructive feedback provided. We have added more details in methods for RNA-seq analysis and Ribo-seq analysis. Please see Lines 484-515.

      “Briefly, bacteria were cultured to an OD<sub>600</sub> of 0.4, at which point chloramphenicol was added to a final concentration of 100 µg/mL for 2 minutes. Cells were then pelleted and washed with pre-chilled lysis buffer [25 mM Tris-HCl, pH 8.0; 25 mM NH4Cl; 10 mM MgOAc; 0.8% Triton X-100; 100 U/mL RNase-free DNase I; 0.3 U/mL Superase-In; 1.55 mM chloramphenicol; and 17 mM GMPPNP]. The pellet was resuspended in lysis buffer, followed by three freeze-thaw cycles using liquid nitrogen. Sodium deoxycholate was then added to a final concentration of 0.3% before centrifugation. The resulting supernatant was adjusted to 25 A260 units and mixed with 2 mL of 500 mM CaCl<sub>2</sub> and 12 µL MNase, making up a total volume of 200 µL. After the digestion, the reaction was quenched with 2.5 mL of 500 mM EGTA. Monosomes were isolated using Sephacryl S400 MicroSpin columns, and RNA was purified using the miRNeasy Mini Kit (Qiagen). rRNA was removed using the NEBNext rRNA Depletion Kit, and the final library was constructed with the NEBNext Small RNA Library Prep Kit. For each sample, ribosome footprint reads were mapped to the Psph 1448A reference genome, and the translational efficiency was calculated by dividing the normalized Ribo-seq counts by the normalized RNA counts. Two biological replicates were performed for all experiments.”
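
      The translational efficiency calculation described at the end of the quoted methods can be sketched as follows. The normalization method is not stated in the response, so this example assumes simple library-size scaling (counts per million); the gene names and helper functions are hypothetical.

```python
def counts_per_million(counts):
    """Scale raw read counts by library size (an assumed normalization)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def translational_efficiency(ribo_counts, rna_counts):
    """Normalized Ribo-seq signal divided by normalized RNA-seq signal.

    Genes with no RNA-seq signal are skipped, because the ratio is
    undefined for them.
    """
    ribo = counts_per_million(ribo_counts)
    rna = counts_per_million(rna_counts)
    efficiency = {}
    for gene, rna_value in rna.items():
        if rna_value <= 0:
            continue
        efficiency[gene] = ribo.get(gene, 0.0) / rna_value
    return efficiency


# Invented counts for two hypothetical genes, for illustration only.
example_te = translational_efficiency(
    {"geneA": 30, "geneB": 70},
    {"geneA": 60, "geneB": 40},
)
```

      In practice the ratio would be computed per biological replicate and then compared between WT and mutant strains.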

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      I would recommend the authors limit their manuscript to Psph pathovar and include statistical analysis supporting their conclusions.

      Thank you for your suggestion.

      Minor

      • L104: "significantly" please add a p-value and explain the analysis.

      Sorry for the confusion. We have added the p-value and explained the analysis in the method section. The p-value used for SMRT-seq was the modification quality value (QV) score, which was used to call the modified bases A (QV=50) and C (QV=100). Please see Lines 452-454.

      • Figures 1B, D, F, and Figure 2A: make the Venn diagram to scale

      Yes, we have revised.

      • L110-111: missing p-value to say that the authors observe a bigger overlap in Pst than Psph as they observed more modified sites in Pst

      Sorry for the confusion. We said it had a bigger overlap in Pst because the number 17.7 was bigger than the number of 15 in Psph. To avoid misunderstanding, we revised the wording to “more genes equipped with all three modification types were detected in Pst than Psph”. Please see Lines 110-111.

      • L112: missing description of their Pss analysis (IDP, sites...)

      We have added the information for Pss in the revised manuscript.

      “Additionally, the methylome atlas of Pss revealed a lower incidence of methylation than those of Psph and Pst, particularly in terms of 6mA modifications, which were only seen in 457 significant 6mA occurrences under the same threshold (IPD > 1.5) and a total of 2,853 and 1,438 methylation sites were detected as 5mC and 4mC, respectively”. Please see Lines 114-116.

      • L118: "modification" to "modified "

      We have revised. Please see Line 119.

      • L120: "modification sites" to "modified nucleotides"

      We have revised. Please see Line 121.

      • L142: correct the title "Methylated genes revealed highly functional conservation among three P. syringae strains" maybe to "Methylated genes are functionally conserved among ..."

      We have revised. Please see Line 142.

      • Figure 2C: not easy to read and interpret

      Sorry for the confusion. Figure 2C revealed the significantly enriched functional pathways in GO and KEGG databases among three P. syringae strains. The specific names of each pathway were listed on the left, and each column with dots indicated the number of genes within one kind of methylation in one of three P. syringae strains. The larger the size, the bigger the number.

      We have revised the legend of Figure 2C. Please see Lines 575-579.

      “The dot plot revealed the significantly enriched functional pathways in GO and KEGG databases among three P. syringae strains. The specific names of each pathway were listed on the left, and each column with dots indicated the number of genes within one kind of methylation in one of three P. syringae strains. The size of the dots indicates the number of related genes.”

      • Figure 6B-C: what is the difference between B 24h and C?

      Figure 6B revealed the expression difference between WT and mutant over 24 hours. Figure 6C only showed the 24-hour time point. To avoid repetition, we have removed Figure 6C.

      • Figure 6C-D: if the same maybe remove Figure 2C

      We have removed Figure 6D.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript could be improved by addressing the following concerns:

      (1) In line 146: How to understand the percentage conserved in "more than two of the strains"?

      Sorry for the confusion. We intended to indicate the patterns that were conserved in two or three strains. We have revised it to: “Notably, about 25% to 45% of methylated genes were conserved in two and three strains”. Please see Line 145.

      (2) In line 178: Five conserved sequence motifs should be replaced by "Six conserved sequence motifs".

      We have revised. Please see Line 176.

      (3) In Figure 2B, specify the C1, C2 and C3. "m6A" should be replaced by "6mA".

      Yes, we have revised.

      (4) In Figure S2, "m6A" should be replaced by "6mA".

      Yes, we have revised.

      (5) In line 212, please add references for the previous studies showing that growth conditions affect bacterial methylation status.

      Thank you for your suggestion. We have added the relevant references (Gonzalez and Collier, 2013), (Krebes et al., 2014), (Sanchez-Romero and Casadesus, 2020).

      (6) In line 217, "illustrate" should be "illustrated".

      Yes, we have revised. Please see Line 210.

      (7) There are some genes colored in grey, revealing bigger differences between the two strains than those related to ribosomal protein, T3SS, and alginate synthesis in Fig. 4A. Do they have important functional roles as well?

      Thank you for your suggestion. A total of 116 genes showed bigger differences (|Log<sub>2</sub>FC| > 2), excluding genes related to ribosomal protein, T3SS, and alginate synthesis. Among these genes, 31 were annotated as hypothetical proteins and 4 as transcription factors with unknown functions, and the remaining genes mostly encoded metabolism-related enzymes. These enzymes might contribute to the growth defects in ΔhsdMSR. We added this information in the revised manuscript. Please see Lines 249-254.

      (8) The authors should discuss what will be the potential signals or factors that can regulate the activity of HsdMSR. In other words, what can decide the extent of methylation through activating or suppressing the expression of HsdMSR?

      Thank you for your valuable suggestion. We have added this to the Discussion. Please see Lines 404-415.

      “Apart from the established roles of 6mA and HsdMSR in P. syringae, certain signals or factors may influence HsdMSR expression. For instance, we confirmed that the growth phase affects methylation levels in P. syringae. Previous studies have shown that increased temperatures can reduce methylation levels, as observed in PAO1 (Doberenz et al., 2017). These findings suggest that HsdMSR expression may be responsive to both intrinsic cellular states and extrinsic environmental conditions. To further explore potential upstream TFs regulating the expression of HsdMSR, we searched for TF-binding sites in the HsdMSR promoter using our own databases (Fan et al., 2020; Shao et al., 2021; Sun et al., 2024). As a result, we found three candidate TFs (PSPPH_0061, PSPPH_3268, and PSPPH_3504) that might directly bind and regulate HsdMSR expression. Future studies on these TFs and their interactions with the HsdMSR promoter would help clarify the regulatory network governing HsdMSR activity.”

      Reviewer #3 (Recommendations For The Authors):

      (1) Some figures contain dense information, which may be overwhelming for readers. Streamlining the legend for Figure 1 and resizing the Venn diagrams within it could enhance clarity and visual appeal.

      Thank you for your suggestion. We have scaled all the Venn plots in the revised version.

      (2) Incorporating a discussion about the role of the restriction-modification (RM) system in bacterial defense against phage infection into the discussion section could enrich the manuscript's context and relevance.

      Thank you for your valuable suggestion. We have added this to the Discussion. Please see Lines 416-427.

      “RM systems are known for their intrinsic role as innate immune systems against phage infection, and are present in around 90% of bacterial genomes (Oliveira et al., 2014). RM systems protect the bacterial cell by recognizing and degrading foreign phage DNA via methylation-specific sites and restriction endonucleases (REases) (Loenen et al., 2014). In addition, other phage-resistance systems are similar to RM systems but carry extra genes. One is called the phage growth limitation (Pgl) system, which modifies and cleaves phage DNA. However, Pgl only modifies the phage DNA in the first infection cycle and cleaves phage DNA in subsequent infections, which gives a warning to the neighboring cells (Hampton et al., 2020; Hoskisson et al., 2015). To counteract RM and RM-like systems, phages have evolved strategies, including unusual modifications such as hydroxymethylation, glycosylation, and glucosylation. They can also encode their own MTases to protect their DNA or employ strategies to evade restriction systems and other anti-RM defenses (Iida et al., 1987; Murphy et al., 2013; Vasu and Nagaraja, 2013).”

      (3) In line 152: What is the importance of the mentioned example of Cro/CI family TF?

      Thank you for your comments. Cro and CI are important TFs present in phages. The interaction between Cro and CI affects the bacterial immunity status in Enterohemorrhagic Escherichia coli (EHEC) strains (Jin et al., 2022). RM systems are known as a kind of phage-defense system, and hence we mentioned it here. We have added this description in the revised manuscript. Please see Lines 152-154.

      Reference:

      (1) Doberenz, S., Eckweiler, D., Reichert, O., Jensen, V., Bunk, B., Sproer, C., Kordes, A., Frangipani, E., Luong, K., Korlach, J., et al. (2017). Identification of a Pseudomonas aeruginosa PAO1 DNA Methyltransferase, Its Targets, and Physiological Roles. mBio 8. 10.1128/mBio.02312-16.

      (2) Fan, L., Wang, T., Hua, C., Sun, W., Li, X., Grunwald, L., Liu, J., Wu, N., Shao, X., Yin, Y., et al. (2020). A compendium of DNA-binding specificities of transcription factors in Pseudomonas syringae. Nat Commun 11, 4947. 10.1038/s41467-020-18744-7.

      (3) Gonzalez, D., and Collier, J. (2013). DNA methylation by CcrM activates the transcription of two genes required for the division of Caulobacter crescentus. Mol Microbiol 88, 203-218. 10.1111/mmi.12180.

      (4) Hampton, H.G., Watson, B.N., and Fineran, P.C. (2020). The arms race between bacteria and their phage foes. Nature 577, 327-336.

      (5) Hoskisson, P.A., Sumby, P., and Smith, M.C. (2015). The phage growth limitation system in Streptomyces coelicolor A3(2) is a toxin/antitoxin system, comprising enzymes with DNA methyltransferase, protein kinase and ATPase activity. Virology 477, 100-109.

      (6) Iida, S., Streiff, M.B., Bickle, T.A., and Arber, W. (1987). Two DNA antirestriction systems of bacteriophage P1, darA, and darB: characterization of darA− phages. Virology 157, 156-166.

      (7) Jin, M., Chen, J., Zhao, X., Hu, G., Wang, H., Liu, Z., and Chen, W.-H. (2022). An engineered λ phage enables enhanced and strain-specific killing of enterohemorrhagic Escherichia coli. Microbiology Spectrum 10, e01271-01222.

      (8) Krebes, J., Morgan, R.D., Bunk, B., Sproer, C., Luong, K., Parusel, R., Anton, B.P., Konig, C., Josenhans, C., Overmann, J., et al. (2014). The complex methylome of the human gastric pathogen Helicobacter pylori. Nucleic Acids Res 42, 2415-2432. 10.1093/nar/gkt1201.

      (9) Loenen, W.A., Dryden, D.T., Raleigh, E.A., Wilson, G.G., and Murray, N.E. (2014). Highlights of the DNA cutters: a short history of the restriction enzymes. Nucleic Acids Res 42, 3-19.

      (10) Murphy, J., Mahony, J., Ainsworth, S., Nauta, A., and van Sinderen, D. (2013). Bacteriophage orphan DNA methyltransferases: insights from their bacterial origin, function, and occurrence. Appl Environ Microb 79, 7547-7555.

      (11) Oliveira, P.H., Touchon, M., and Rocha, E.P. (2014). The interplay of restriction-modification systems with mobile genetic elements and their prokaryotic hosts. Nucleic Acids Res 42, 10618-10631.

      (12) Sanchez-Romero, M.A., and Casadesus, J. (2020). The bacterial epigenome. Nature reviews. Microbiology 18, 7-20. 10.1038/s41579-019-0286-2.

      (13) Shao, X., Tan, M., Xie, Y., Yao, C., Wang, T., Huang, H., Zhang, Y., Ding, Y., Liu, J., Han, L., et al. (2021). Integrated regulatory network in Pseudomonas syringae reveals dynamics of virulence. Cell Rep 34, 108920. 10.1016/j.celrep.2021.108920.

      (14) Sun, Y., Li, J., Huang, J., Li, S., Li, Y., Lu, B., and Deng, X. (2024). Architecture of genome-wide transcriptional regulatory network reveals dynamic functions and evolutionary trajectories in Pseudomonas syringae. bioRxiv, 2024.01.18.576191.

      (15) Vasu, K., and Nagaraja, V. (2013). Diverse functions of restriction-modification systems in addition to cellular defense. Microbiol Mol Biol Rev 77, 53-72. 10.1128/MMBR.00044-12.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editors and reviewers for the comments and suggestions on our manuscript.  The main point that we wished to convey in this paper was the concept and the kinetic model that enabled the estimation of nuclear export rate from an image of single mRNAs localised in single cells.  By studying the influenza viral transcripts with this model, we report the variation in the mRNA nuclear export rate of the eight viral segments.  Of note, the hemagglutinin and neuraminidase mRNAs were the slowest among the eight segments in exiting the nucleus.  We agree that the potential mechanism and the biological impact of this observation require further validation, as the reviewers pointed out.  We revised our manuscript to describe these points separately (Lines 21-25, Abstract; Lines 86-91, Introduction; Lines 316-320, Results; Lines 372-381, Discussion).  Below, we also highlight the revisions that we made to address the specific points raised by the reviewers.

      Influenza viral transcription

      The authors used specific settings for their virology experiments and several assumptions regarding their mathematical modelling, so it's extremely important that the reader has the viral life cycle clearly understood before immersing themselves in the results. Thus, a detailed explanation of the viral life cycle, including the kinetics of each step, would be extremely helpful if included in the introduction section.  Reviewer #1

      We have included the molecular composition of influenza vRNP and the mechanism of viral transcription in the revised manuscript (Lines 46-53).  

      Line 45: "Eight viral RNA segments are transcribed by the same set of molecular machinery" (Ref. 7). What's known about the arrival of the viral RNA segments in the nucleus? Is it synchronized? The authors will understand that my concern is related to the fact that a differential arrival would indeed impact the transcription and export processes.  Reviewer #1

      The arrival of eight vRNPs in the nucleus is not synchronised, with each of the eight vRNPs arriving independently (Chou et al. PLOS Pathogens 2013) (Lakadamyali et al, PNAS 2003).  This does not compromise our model, as our model estimates the export rate of each mRNA species individually (also please see our response in Model assumption below).  This is included in the second paragraph of the Discussion section (Lines 390-400).  

      Model assumption

      Even though I do not have the expertise to assess the authors' mathematical model, I do not doubt its robustness. Even so, I find some virological concerns related to the set-up of their experiments. According to what I understand, the authors performed non-synchronized 2 h-long infections with the WSN strain of influenza A virus. They did this to avoid cRNA production (and cross-reaction of the probes), which they claim to occur "much later than mRNA synthesis". Then they omit the degradation of the mRNAs for their model without giving an explanation for having done so. So, taking all these into account, it seems to me that too many assumptions are made without a strong argument. I understand that they are made in order to simplify their model, but I strongly consider that the model would gain strength if some of these events were experimentally considered. Thus, would it be possible to perform synchronized infections? Would it be possible to empirically demonstrate that cRNA production does not occur within the first 2 hours of infection and/or separate transcription and replication? Would it be possible to incorporate a degradation inhibitor of the mRNAs into their infections? If all these could be achieved, then the results coming out of the mathematical model would be enormously reinforced.  Reviewer #1

      * The study lacks experimental data that would help support the conclusions. For instance, perturbations are many times used to prove a point related to gene expression. An example for Fig. 2 for such an experiment could be to treat the cells with transcription inhibitors (e.g. DRB, 5,6-dichloro1-beta-D-ribofuranosylbenzimidazole). Preventing transcription leaves only mature RNAs in the nucleus, and then using this system one can compare the export rate of different RNAs.  Reviewer #2

      We agree that the primary concern in our model was the assumption that the mRNA degradation could be omitted.  Synchronised infection is not necessary; in fact, non-synchronised infection is preferred, as we explain later in our response.  Additionally, the dominance of mRNA production over cRNA production has been documented elsewhere.  To address mRNA degradation and validate our model estimation, we performed a time-course measurement using baloxavir.  Baloxavir efficiently blocks viral transcription by inhibiting the nuclease activity in PA.  DRB, suggested by the reviewer, allows influenza viral transcription and causes viral transcripts to accumulate in the nucleus through unknown mechanisms (Amorim et al. Traffic 2007 and our observation using smFISH, not shown).  The additional experiment, now presented in Fig. 5 in the revised manuscript, indicated that the mRNA degradation is minimal, and the export rates estimated in our model and in the time-course experiment agreed well for the HA segment.  The experiment raised the possibility that the time-course measurement underestimates the export rate of transcripts that exit the nucleus rapidly, such as NP.  Real-time imaging of single transcripts would be necessary to directly measure the true nuclear export rate; however, this is beyond the scope of our paper.  The new result is now presented in Fig. 5, Supplementary figures 3 and 4, and in the main text (Lines 322-360).  An alteration was also made in Line 286 to refer to Fig. 5.  The Materials and Methods section was updated (Lines 478-482).

      We note that our model does not require synchronised infection.  Even under synchronised infection, such as incubating cells with the virus at 4°C to facilitate attachment and subsequently shifting to 37°C to allow viral entry, the inherent heterogeneity in vRNP migration to the nucleus still remains.  This randomness does not compromise our model; rather, our model exploits this random arrival of each vRNP in each cell in the system.  This variation, in turn, generates cells carrying varying amounts of transcripts, enabling the estimation of nuclear export rate.  Importantly, more variation ensures the broader distribution of transcript levels, enabling more precise parameter fitting in our model.  It is also important to note that our model does not require the correlation between segments.  Our model estimates the export rate of each mRNA species individually.  These important points were explained in the Discussion section (Lines 390-400).  

      * There is no concrete value given for the export rates and what they might mean biologically (e.g. time present/stuck in the nucleus) - Fig. 4D. This leaves the reader in the dark.  Reviewer #2

      The export rate lambda (previously denoted as k) in our model (Fig. 4) and the decay constant k in the time-course measurement (Fig. 5) represent the proportion of mRNAs exported from the nucleus in an infinitesimal time, defining the nuclear export rate.  This has been clarified in the revised manuscript (Lines 314-316), with some alterations to make the use of the parameters clearer.

      -  The Greek letter k previously used in Fig. 4 and the associated equations was consistently replaced with lambda to avoid the confusion with the parameter k that is subsequently used for the exponent decay in Fig. 5 in the revised manuscript.  

      -  The Greek letter epsilon (previously used to represent export) was replaced with mu, slightly more common for representing the rate of transport.  

      -  The term “velocity” was consistently replaced with “rate” in the context of the nuclear export (Lines 163, 215, 320, 441).  

      -  The phrase “molar concentrations of mRNAs” was corrected to “molecules of mRNAs” (Line 282).

      Also, we have now described our model in two sections: “Conceiving the model” and “Implementing a kinetic model to estimate the nuclear export rate” in the Results.  The first section outlines the conceptual framework of the model, and the second focuses on its implementation and the parameter extraction (Lines 227 and 277).
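
      To illustrate what a decay constant of this kind means in practice, the following is a minimal sketch, assuming first-order export, negligible degradation, and fully blocked transcription (as in the baloxavir time-course described above).  The function names and the numbers are hypothetical and are not taken from the manuscript; the sketch only shows how a constant k relates to the fraction of transcripts remaining in the nucleus and to a nuclear half-time.

```python
import math


def nuclear_remaining(n0, k, t):
    """Transcripts left in the nucleus after time t under first-order export.

    Assumes no new transcription and no degradation: N(t) = N0 * exp(-k * t).
    """
    return n0 * math.exp(-k * t)


def nuclear_half_time(k):
    """Time after which half of the nuclear transcripts have been exported."""
    return math.log(2) / k


def fit_decay_constant(times, counts):
    """Estimate k by least squares on ln N = ln N0 - k * t (all counts > 0)."""
    logs = [math.log(c) for c in counts]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    return -sxy / sxx  # the slope of the log-linear fit is -k


# Hypothetical time-course (hours) generated with k = 0.5 per hour.
times = [0.0, 1.0, 2.0, 3.0]
counts = [nuclear_remaining(100.0, 0.5, t) for t in times]
k_hat = fit_decay_constant(times, counts)
```

      Under these assumptions, a larger k corresponds to a shorter nuclear residence time, which is one way to read the export rates biologically.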

      Applicability of the model

      Lines 27-29. "Our framework presented in this study can be widely used for investigating the nuclear retention of nascent transcripts produced in a transcription burst." In my opinion, this is the strongest point of the manuscript: developing a mathematical model to analyze nuclear export retention as a mechanism of protein expression control, which could lay the foundation for further biological processes. The authors revisit this idea in the Discussion section. However, which would be those processes for which the model could be helpful? I consider that a more conspicuous discussion on this topic would broaden the readers scope, a crucial point under the eLife scope.  Reviewer #1

      * Could this framework be used to quantify the nuclear export rate of cellular RNAs? According to the explanation in the Discussion, it would seem that this approach is limited to quantifying the export rate of influenza RNAs.  Reviewer #2

      Our model is not limited to influenza virus infection.  It is applicable to systems where transcription is initiated concurrently, such as when stimuli trigger the activation of a certain set of genes for transcription.  This makes it particularly valuable for quantifying the nuclear retention of mRNAs in a transcription burst.  This point is reiterated in Lines 383-390.

      Potential mechanisms for differential nuclear export rate of viral segments

      * There is no mechanistic insight in the study. The idea driven by this study is that gene expression is regulated by the RNA export rate. But how is that explained? Is there any molecular pathway or explanation for this model? If the transcripts are ready for export, why do the mRNAs stay inside the nucleus? One option to consider are the export factors. Viral RNAs are exported by different pathways as mentioned (line 362), or by TREX2 (Bhat P et al Nat Comm 2023). The data shows that there is no difference observed in the export rate of different pathways. How about knocking down an important export factor to show how this affects the export rates. Or the opposite, overexpress a certain factor, would this change the nucleus/cytoplasm distribution of the retained RNAs.  Reviewer #2

      As we discussed in the paper, we are beginning to consider that each viral segment has an intrinsic sequence that determines its nuclear export rate, because previous studies on the export factors do not fully explain the variation in the nuclear export rate observed in our study.  As the reviewer suggested, a recent study (Bhat et al. Nature Communications 2023) pointed to an internal sequence in the HA segment, in line with our working hypothesis.  This point is discussed, and their work (Bhat et al. 2023) has been cited, in the Discussion section of the revised manuscript (Lines 446-449).

      Biological impact of the nuclear retention

      The authors mention several times throughout the manuscript that the virus might use the nuclear retention of mRNA for HA and NA to postpone the expression of these antigenic molecules. At this point, I need to admit that a great question mark appeared in my mind, maybe related to the fact that some knowledge is lacking in my analysis. Lines 328-330: "On the other hand, pushing back the expression of viral antigens HA and NA would be beneficial for the virus to delay the host immune response against the infected cells in which the virus is being replicated." As I tend to understand, the host immune response recognizes HA and NA within the viral particle, if so and independently of the time that HA and Na arrive at the virus assembly step, the progeny' viral particles that are complete and extruded from the cells would be those awakening the host immunity response. If this is right, how would the delayed export of those proteins from the nucleus (and their late expression) be beneficial for delaying the immune response? I would appreciate an explanation for this point, and if I am wrong, then there could exist a relationship between nuclear export rate and the pathogenicity of different strains of influenza A virus. If so, could the authors challenge their model with additional viral strains showing a differential immune response pattern? A deeper analysis in this direction would greatly strengthen the message in their manuscript.  Reviewer #1

      * Is the timing of viral protein appearance in accordance with the time the mRNA is exported to the cytoplasm. It is logical that the first mRNA to go to the cytoplasm would be the first to become a protein. Can the authors show that nuclear retention of mRNA would push back the expression of the viral antigens HA and NA.  Reviewer #2

      Three types of immune reactions are being studied extensively.  The first is the humoral immune response, where antibodies target the viral antigens HA and NA on the viral envelope, coating and inactivating the viral particles.  The second is the cytotoxic T cell response.  There is growing evidence that cytotoxic T cells react against NP, eliciting cross-reaction to a broader range of influenza viral strains.  This reaction is not specific to HA and NA, and antigens are processed in the cytoplasm and presented on the MHC.  The third is antibody-dependent cellular cytotoxicity (ADCC), where antibodies recognise the viral proteins (HA and NA) on the surface of infected cells, facilitating their elimination by NK cells.  Although protein translation may begin as soon as the first mRNA exits the nucleus, the virus may delay the peak of antigen production and therefore postpone the NK-mediated ADCC.  This specific point, along with references to ADCC in influenza virus infection, has been clarified in the Discussion section (Lines 377-381).

      Data analysis and presentation

      Lines 99-101. "Viral mRNAs were detected as single diffraction-limited spots in the three-dimensional image stacks, allowing for absolute mRNA quantification (Fig. 1B)". What do the authors mean to say by "absolute mRNA quantification"? Do they refer to the total spots or the total mRNAs? Is it assumed that one spot corresponds to a single mRNA transcript? This is not clear at all for this reviewer, which could be the situation for a potential reader. Since it's the beginning of the story, this should be clearly stated in the manuscript.  Reviewer #1

      Each spot of fluorescent signal corresponds to a single molecule of viral mRNA.  We quantified the absolute number of transcripts in each cell.  This is clarified in the revised manuscript (Lines 104-106).  

      * Line 151: does the baseline change according to the RNA in question? The authors say that the "baseline is defined by the median of the Z distribution of peripheral mRNAs" - it seems that the number 0.731 refers only to one type of RNA (which is not mentioned at all not in the text and not in the legend). Reviewer #2

      The baseline was set using the NP mRNAs in the cytoplasm because the NP mRNA showed the widest distribution across the cytoplasm (Line 157).  

      * Also, what is all the signal that is seen outside the marked cells in Fig. 2B? There seems to be significant background in the field, does this mean much false-positive in the multiplex FISH? If so, then how do the authors know that the staining inside the cells isn't to some degree non-specific? It would be necessary to back this up with some other type of quantitative assay like qRT-PCR.  Reviewer #2

      The cells were removed from the analysis if the cytoplasmic boundary touched any edge of the field-of-view, while the signals were recovered across the entire field-of-view.  This is clarified in the figure legend (Lines 194-195).  

      Others

      * The meaning and explanation for Figure 1H -are unclear. Rephrase and make the legend more reader friendly.  Reviewer #2

      We made alterations to the legend (Lines 132-134) and the relevant lines in the main text (Lines 148-151).  

      * Fig. 2E: Is this the total transcript count or only in the nucleus? Would it be possible to find some correlation between the segments if a pair-wise analysis is performed according to nuclear-cytoplasm distribution?  Reviewer #2

      The total counts are presented.  This is clarified in the legend (Lines 199-200).  

      * Abstract -"A mathematical modelling indicated that the relationship between the nuclear ratio and the total count of mRNAs in single cells is dictated by a proxy for the nuclear export rate." - this sentence is very unclear.  Reviewer #2

      The sentence was removed in the revised manuscript (Line 21).  This removal did not affect the overall meaning in the abstract.  We also made an alteration to Line 279 that contained a similar phrase.  

      * The use of the word "acutely" (lines 16 and 35) is strange.  Reviewer #2

      They have been removed (now Lines 15, 33).  

      * Line 157 - "This result indicates that the velocity of viral mRNA export from the nucleus varies according to the viral segments." - not velocity, maybe timing.  Reviewer #2

      We consistently replaced “velocity” with “rate” (Lines 163, 215, 320, 441).

      * Reference for line 41.  Reviewer #2

      A reference (Waker et al. Trends Microbiol. 2019) has been cited (Line 39).  

      * Reference for lines 105-106.  Reviewer #2

      The gene length of each segment was indicated in the sentence (Line 137).  

      * Line 264- why here is 0.02 M.O.I used compared to line 97 where 2 is used?  Reviewer #2

      We used M.O.I. of 0.02 to allow for spot quantification over longer periods of observation (Lines 269-270).  

      * NS1 is expressed at late infection times and might alter the nuclear export of viral mRNAs (line 352). Need to show that indeed it is not expressed in the experiments done here.  Reviewer #2

      It is not possible to definitively prove that NS1 is not expressed, owing to sensitivity limitations.  However, we minimised its impact by investigating at the early time point (Lines 415-416).

      * Line 459- 30% formamide? Is this correct or should it be 10%?  Reviewer #2

      This is correct.  The probes used were longer than the others for smFISH.  Therefore, we washed away the probes under stringent conditions.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Giménez-Orenga et al. investigate the origin and pathophysiology of myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) and fibromyalgia (FM). Using RNA microarrays, the authors compare the expression profiles and evaluate the biomarker potential of human endogenous retroviruses (HERV) in these two conditions. Altogether, the authors show that HERV expression is distinct between ME/CFS and FM patients, and HERV dysregulation is associated with higher symptom intensity in ME/CFS. HERV expression in ME/CFS patients is associated with impaired immune function and higher estimated levels of plasma cells and resting CD4 memory T cells. This work provides interesting insights into the pathophysiology of ME/CFS and FM, creating opportunities for several follow-up studies.

      Strengths:

      (1) Overall, the data is convincing and supports the authors' claims. The manuscript is clear and easy to understand, and the methods are generally well-detailed. It was quite enjoyable to read.

      (2) The authors combined several unbiased approaches to analyse HERV expression in ME/CFS and FM. The tools, thresholds, and statistical models used all seem appropriate to answer their biological questions.

      (3) The authors propose an interesting alternative to diagnosing these two conditions. Transcriptomic analysis of blood samples using an RNA microarray could allow a minimally invasive and reproducible way of diagnosing ME/CFS and FM.

      Weaknesses:

      (1) The cohort analysed in this study was phenotyped by a single clinician. As ME/CFS and FM are diagnosed based on unspecific symptoms and are frequently misdiagnosed, this raises the question of whether the results can be generalised to external cohorts.

      Thank you for your comment. Studies of larger cohorts will certainly be needed to determine the external validity of these results in a clinical scenario. However, this pilot study, the first of its kind, was designed to maximize homogeneity across participants, which we sought to ensure primarily by including only females diagnosed by a single experienced observer.

      (2) The analyses performed to unravel the causes and effects of HERV expression in ME/CFS and FM are solely based on sequencing data. Experimental approaches could be used to validate some of the transcriptomic observations.

      Certainly, experimental approaches may add robustness to our findings. We are in fact considering this avenue to explore the observations presented here in greater depth. However, the limited knowledge of HERV-mediated physiological functions may hinder the task of revealing causes and effects of HERV expression in ME/CFS and FM in the short term.

      Reviewer #2 (Public review):

      Summary:

      Giménez-Orenga carried out this study to assess whether human endogenous retroviruses (HERVs) could be used to improve the diagnosis of Myalgic Encephalomyelitis/Chronic Fatigue Syndrome (ME/CFS) and Fibromyalgia (FM). To this end, they used the HERV-V3 array developed previously, to characterize the genome-wide changes in the expression of HERVs in patients suffering from ME/CFS, FM, or both, compared to controls. In turn, they present a useful repertoire of HERVs that might characterize ME/CFS and FM. For the most part, the paper is written in a manner that allows a natural understanding of the workflow and analyses carried out, making it compelling. The figures and additional tables present solid support for the findings. However, some statements made by the authors seem incomplete and would benefit from a more thorough literature review. Overall, this work will be of interest to the medical community seeking in better understanding of the co-occurrence of these pathologies, hinting at a novel angle by integrating HERVs, which are often overlooked, into their assessment.

      Strengths:

      (1) The work is well-presented, allowing the reader to understand the overall workflow and how the specific aims contribute to filling the knowledge gap in the field.

      (2) The analyses carried out to understand the potential impact on gene expression mediated by HERVs are in line with previous works, making it solid and robust in the context of this study.

      Weaknesses:

      (1) The authors claim to obtain genome-wide HERV expression profiles. However, the array used was developed using hg19, while the genomic analysis of this work are carried out using a liftover to hg38. It would improve the statement and findings to include a comparison of the differences in HERVs available in hg38, and how this could impact the "genome-wide" findings.

      This is an important point. However, fewer than 100 of the 1,290,800 probesets were excluded from our analysis for lack of correspondence with hg38, a number we considered negligible for "genome-wide" claims. We will detail this aspect in the revised version of this manuscript.

      (2) The authors in some points are not thorough with the cited literature. Two examples are:

      a) Lines 396-397 the authors say "the MLT1, usually found enriched near DE genes (Bogdan et al., 2020)". I checked the work by Bogdan, and they studied bacterial infection. A single work in a specific topic is not sufficient to support the statement that MLT1 is "usually" in close vicinity to differentially expressed genes. More works are needed to support this.

      b) After the previous statement, the authors go on to mention "contributing to the coding of conserved lncRNAs (Ramsay et al., 2017)". First, lnc = long non-coding, so this doesn't make sense. Second, in the work by Ramsay they mention "that contributed a significant amount of sequence to primate lncRNAs whose expression was conserved", which is different from what the authors in this study are trying to convey. Again, additional work and a rephrasing might help to support this idea.

      Certainly, these two sentences need rephrasing to better adjust statements to current evidence and will be replaced in the revised version of this manuscript.

      (3) When presenting the clusters, the authors overlook the fact that cluster 4 is clearly control-specific, and fail to discuss what this means. Could this subset of HERV be used as bona fide markers of healthy individuals in the context of these diseases? Are they associated with DE genes? What could be the impact of such associations?

      Using control DE HERVs as bona fide markers of healthy individuals seems like an interesting possibility worth exploring. Control DE HERVs (cluster 4) are indeed associated with DE genes involved in apoptosis, T cell activation and cell-cell adhesion (modules 1 and 6) (Figure 3A). The impact of these associations deserves further study.

      Appraisals on aims:

      The authors set specific questions and presented the results to successfully answer them. The evidence is solid, with some weaknesses discussed above that will methodologically strengthen the work.

      Likely impact of work on the field:

      This work will be of interest to the medical community looking for novel ways to improve clinical diagnosis. Although future works with a greater population size, and more robust techniques such as RNA-Seq, are needed, this is the first step in presenting a novel way to distinguish these pathologies.

      It would be of great benefit to the community to provide a table/spreadsheet indicating the specific genomic locations of the HERVs specific to each condition. This will allow proper provenance for future researchers interested in expanding on this knowledge, as these genomic coordinates will be independent of the technique used (as was the array used here).

      We agree with the reviewer that sharing genomic locations of DE HERVs in these pathologies would contribute to further development of our findings. Unfortunately, we do not hold the rights to share probe coordinates from this custom HERV-V3 microarray, which we used under an MTA with its developer.

      Reviewer #3 (Public review):

      The authors find that HERV expression patterns can be used as new criteria for differential diagnosis of FM and ME/CFS and patient subtyping. The data are based on transcriptome analysis by microarray for HERVs using patient blood samples, followed by differential expression of ERVs and bioinformatic analyses. This is a standard and solid data processing pipeline, and the results are well presented and support the authors' claim.

    1. Author response:

      Thank you to the reviewers and editors for their positive and constructive comments. Based on this feedback, we can see that we need to clarify that the primary goal of this paper is a test of potential changes in public health policy rather than a test of technical improvements to forecasting models. We briefly summarize the primary goal below to address these public reviews and list our proposed revisions to the manuscript based on reviewer feedback.

      All real-time forecasting models contend with 2 major constraints:

      (1) How far into the future they have to predict

      (2) How rapidly the data used for predictions become available in real time

      In the case of evolutionary influenza forecasts, the current values of these constraints are 1) 12 months into the future and 2) an average lag of ~3 months for hemagglutinin (HA) sequences to become available after sample collection. Regardless of the predictors we use in these models (genetic or phenotypic), our units of prediction always depend on HA: the HA protein is the primary target of our immunity, HA is the only gene whose composition is determined by the vaccine selection process, and influenza diversity is historically defined by clades in HA phylogenies.
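
      As a rough illustration of how these two constraints interact, the sketch below filters a set of samples down to those whose sequences would already be available on a given forecast date, and computes the date being predicted.  It is a simplified sketch under stated assumptions: the record fields (`collected`, `lag_days`), the function names, and the example dates are hypothetical and do not come from our analysis workflow.

```python
from datetime import date, timedelta


def available_at(samples, forecast_date):
    """Return samples whose sequences were submitted on or before forecast_date.

    Each sample is a dict with a collection date and a submission lag in days
    (both field names are hypothetical).
    """
    return [
        s for s in samples
        if s["collected"] + timedelta(days=s["lag_days"]) <= forecast_date
    ]


def target_date(forecast_date, horizon_days):
    """Date of the future population the forecast is trying to predict."""
    return forecast_date + timedelta(days=horizon_days)


# Two hypothetical samples collected on the same day with different lags.
samples = [
    {"name": "A", "collected": date(2024, 1, 1), "lag_days": 90},
    {"name": "B", "collected": date(2024, 1, 1), "lag_days": 30},
]
usable = available_at(samples, date(2024, 3, 1))
usable_names = [s["name"] for s in usable]
```

      In this toy case only the sample with the shorter lag informs the forecast, while the forecast horizon separately determines how far ahead of the forecast date the predicted population lies.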

      Our primary goal of this study was to understand the relative effect sizes of these two common constraints on forecasting while holding all other variables as constant as possible. With this understanding, we hoped to better inform public health priorities and set realistic expectations for current and future forecasting efforts regardless of the technical specifications of each forecasting model. In other words, the goal of this study was not to optimize prediction methods but to estimate the effects of potential policy changes on forecast accuracy.

      We found that reducing how far into the future we need to predict consistently reduced our forecasting error in simulated populations (where we knew the true fitness of each virus) and in natural populations (where we either estimated fitness from genetic predictors or we knew the true fitness of each virus based on its future success). Figure 6 and its first supplemental figure show these effect sizes for natural and simulated populations, respectively, when the future fitness of each virus is known at the time of prediction. By definition, we cannot hope to improve our estimates of viral fitness for these forecasts by using other genetic or phenotypic information.

      Figure 6 shows that reducing how far into the future we need to predict from 12 to 6 months improves our forecasting accuracy 3 times as much as reducing the lag between sample collection and HA sequence submission to public databases. The impact of this finding is the confirmation that a faster vaccine development process would improve our forecast accuracy substantially more than faster turnaround between sample collection and sequence submission. If our public health goal is to make better predictions of future influenza populations, then this result indicates that our main priority is to speed up the vaccine development process.

      If our public health goal is to better understand the composition of currently circulating influenza populations (the units of our forecasts), then Figure 3 shows that reducing the lag between sample collection and HA sequence submission from ~3 months on average to 1 month on average reduces our uncertainty in current clade frequency estimates by half. This impact is also independent of the predictors we use in our forecasting models and is not lessened by the lack of other genetic or phenotypic information in our analyses.

      We realize that neither a 6-month vaccine development process nor a 1-month average sequence submission lag exists yet, but we believe that these are realistic and achievable goals for scientific and public health communities. We also realize that these public health goals are not mutually exclusive. By measuring the effects of these realistic changes to current policy through our forecasting experiments, we hope to inspire and motivate researchers and decision-makers who are empowered to make both of these goals a reality.

      Finally, we want to emphasize that the use of phenotypic data in forecasts introduces additional delays caused by the lag between when genetic sequences become available and when serological experiments can be performed. Most WHO influenza collaborating centers use a "sequence-first" approach where they characterize the genetic sequence and use available sequences to prioritize phenotypic experiments with serology. This additional lag in availability of phenotypic data means that a forecasting model based on genetic and phenotypic data will necessarily have a greater lag in data availability than a model based on genetic data only. This lag is important for practical forecasts, too, but because the lag reflects specific characteristics of each collaborating center and not a global policy change, we believe this topic falls outside of the scope of this study.

      Based on these public reviews and the private recommendations from reviewers, we plan to make the following revisions to this manuscript.

      ● Clarify the introduction, discussion, and abstract to emphasize the primary goal for this study to test effects of realistic changes to public health policy and note that this study does not cover improvements to forecasting models. As part of these changes, we will include a rationale for our choice of a genetic-information-only approach rather than a model that integrates phenotypic data. We will also refine Figure 1 to more clearly communicate the two factors we tested in this study.

      ● Provide a clearer explanation for the subsampling approach we use, include supplemental materials to communicate the geographic and temporal biases that exist in available HA sequence data, and discuss potential effects of different subsampling strategies.

      ● Evaluate the robustness of our results to different randomly subsampled data. We will perform additional technical replicates of our analysis workflow for natural populations, and summarize the effects of realistic interventions across replicates in a supplemental figure and the main text of the results.

      ● Investigate time-dependent effects of forecast horizons and submission lags on model accuracy to identify any potential biases in accuracy during specific historical epochs or any seasonal trends in accuracy associated with predicting future populations for the Northern or Southern Hemispheres.

      ● In the discussion, clarify how reducing submission lags would practically improve the WHO's ability to select vaccine candidate viruses and minimize jargon that currently makes the discussion less accessible to the average reader.

      ● Investigate how changes in forecast horizons and submission lags change the distance between predicted and observed future populations at antigenic positions (i.e., "epitope" positions) to understand whether we see the same effects with that subset of positions as we see across all HA positions.

    1. Author Response:

      We greatly appreciate the feedback provided by reviewers on this manuscript. One of our key objectives was to provide a comprehensive, detailed resource for researchers using single-cell transcriptomics to study arthritis, especially immune cells like macrophages. We strived to perform thorough, wide-ranging analyses that are both accessible and useful to other scientists in the field, and that we hope will serve as the basis for many future avenues of study. As such, we acknowledge that this work is a “first step”, providing a strong descriptive foundation with some mechanistic insight that we and others will continue pursuing. Preliminary studies in our laboratory seeking to dissect signaling mechanisms associated with the M-CSF pathway have illuminated how complex and context-dependent this signaling is, which is an important consideration for future in vivo investigations. Further, it is indeed true that attempting to harmonize transcriptomic data across studies, models, laboratories, and dissection/processing methods is fraught with difficulty and prone to misinterpretation – and we made an effort to highlight this in our manuscript, particularly with respect to where synovial immune cells were recovered from, and how. We encourage healthy discussion within the field for developing shared, unified protocols for harvests and processing upstream of transcriptomic experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review)

      Summary:

      The authors wanted to use AlphaFold-multimer (AFm) predictions to reduce the challenge of physics-based protein-protein docking.

      Strengths:

      They found that two features of AFm predictions are very useful. 1) pLDDT is predictive of flexible residues, which they could target for conformational sampling during docking; 2) the interface-pLDDT score is predictive of the quality of AFm predictions, which allows the authors to decide whether to do local or global docking.

      Weaknesses:

      (1) As admitted by the authors, the AFm predictions for the main dataset are undoubtedly biased because these structures were used for AFm training. Could the authors find a way to assess the extent of this bias?

      Indeed, AFm training included most of the structures in the DB5 benchmark, as many structures (either unbound or bound) were deposited before the training cut-off date. One of the challenges of estimating this bias is the limited availability of new structures, both bound and unbound, deposited after the training cut-off. Estimating the extent of training bias is therefore conditional on these factors and difficult. A few studies have attempted to address this bias (Yin et al, 2022, https://doi.org/10.1002/pro.4379).

      In our study, we assess this bias by comparing the AFm structures to the bound and unbound forms and calculating their Cα RMSDs and TM-scores (new addition). We now elaborate in the Results:Dataset curation section and we have added a figure comparing the TM-scores in the supplement.

      We added a clarifying text and a note about the TM-score calculation in the manuscript as follows:

      “Since most of the benchmark targets in DB5.5 were included in AlphaFold training, there would be training bias associated with their predictions (i.e. our measured success rates are an upper bound).”

      “We also calculated the TM-scores of the AFm predicted complex structures with respect to the bound and the unbound crystal structures (Supplementary Figure S2). As TM-scores reflect a global comparison between structures and are less sensitive to local structural deviations, no strong conclusions could be derived. This is in agreement with our intuition that since both unbound and bound states of proteins will share a similar fold, and AlphaFold can predict structures with high TM-scores in most cases, gauging the conformational deviations with TM-scores would be inconclusive.”

      (2) For the CASP15 targets where this bias is absent, the presentation was very brief. In particular, it would be interesting to see how AFm helped with the docking. The authors may even want to do a direct comparison with docking results without the help of AFm.

      Unfortunately, since this was a CASP-CAPRI round, the structures of the unbound antigen and the nanobodies were unavailable. Thus, we cannot perform a comparison without using AF2 at all, since we need a structure prediction tool to produce the unbound nanobody and the nanobody-antigen complex template structure to dock. This has been clarified in the main text.

      “Since the nanobody-antigen complexes were CASP targets, we did not have unbound structures, rather only the sequences of individual chains. Therefore, for each target, we employed the AlphaRED strategy as described in Fig 7.”

      Reviewer #1 (Recommendations For The Authors):

      For suggestions for major improvements, see comments under weaknesses. One additional suggestion: the authors found that pLDDT is predictive of flexible residues. Can they try to find AFm features that are predictive of the interface site? Such information may guide their docking to a local site.

      This is a great idea that we and others have been thinking about considerably. Prior work by Burke et al. (Towards a structurally resolved human protein interaction network) examines AlphaFold’s ability to predict PPIs. For high-confidence predicted models of interacting protein complexes, the authors showed that pDockQ correlated reasonably well with correct protein interactions.

      That being said, binding site identification, particularly in a partner-agnostic fashion, i.e. determining binding patches on a given protein, is an area of ongoing research. We hope a future study examines AlphaFold3 or ESM3 specifically for this task.

      “Further, we tested multiple thresholds to estimate the optimum cut-off for distinguishing near-native structures (defined as an interface-RMSD < 4 Å) from the predictions. Figure 3.B summarizes the performance with a confusion matrix for the chosen interface-pLDDT cutoff of 85. 79 % of the targets are classified accurately with a precision of 75%, thereby validating the utility of interface-pLDDT as a discriminating metric to rank the docking quality of the AFm complex structure predictions. With AlphaFold3 and ESM3 being released, investigating features that could predict flexible residues or interface site would be valuable, as this information may guide local docking.”

      Minor:

      Page 3, lines 73-77, state how many targets were curated from DB5.5.

      We have now clarified this in the manuscript. All 254 targets were curated from DB5.5 at the time of this benchmark study.

      “For each protein target, we extracted the amino acid sequences from the bound structure and predicted a corresponding three-dimensional complex structure with the ColabFold implementation of the AlphaFold multimer v2.3.0 (released in March 2023) for the 254 benchmark targets from DB5.5.”

      In Figure 1, the color used for medium is too difficult to distinguish from the grey color used for rigid.

      We thank you for this suggestion. We have updated the color to olive. Further, based on Reviewer 2’s suggestions, we have moved this plot to the Supplementary.

      Reviewer #2 (Public Review):

      Summary:

      In short, this paper uses a previously published method, ReplicaDock, to improve predictions from AlphaFold-multimer. The method generated about 25% more acceptable predictions than AFm, but more important is improving an Antibody-antigen set, where more than 50% of the models become improved.

      When looking at the results in more detail, it is clear that for the models where the AFm models are good, the improvement is modest (or not at all). See, for instance, the blue dots in Figure 6. However, in the cases where AFm fails, the improvement is substantial (red dots in Figure 6), but no models reach a very high accuracy (Fnat ~0.5 compared to 0.8 for the good AFm models). So the paper could be summarized by claiming, "We apply ReplicaDock when AFm fails", instead of trying to sell the paper as an utterly novel pipeline. I must also say that I am surprised by the excellent performance of ReplicaDock - it seems to be a significant step ahead of other (not AlphaFold) docking methods, and from reading the original paper, that was unclear. Having a better benchmark of it alone (without AFm) would be very interesting.

      We thank the reviewer for highlighting the performance of ReplicaDock. ReplicaDock alone is benchmarked in the original paper (10.1371/journal.pcbi.1010124), with full details on the 2022 version of DB5.5 in the supplement. Indeed, ReplicaDock2 achieves the highest success rates on flexible docking targets reported in the literature (until this AlphaRED paper!).

      Regarding the statement that “the paper could be summarized…”, it might be helpful to give more context. ReplicaDock is a replica exchange Monte Carlo sampling approach for protein docking that incorporates flexibility in an induced-fit fashion. However, the choice of which backbone residues to move is solely dependent on contacts made during each docking trajectory. In the last section of the ReplicaDock paper, we introduced “Directed Induced-fit” where we biased the backbone sampling only towards those residues where we knew the backbone is flexible (this information is obtained because for the benchmark set, we had both unbound and bound structures and hence could cherry-pick the specific residues which are mobile). We agree with the reviewers that AlphaRED is essentially a derivative of ReplicaDock; however, the two major claims that we make in this paper are:

      (1) AlphaFold pLDDT is an effective predictor of backbone flexibility for practical use in docking.

      (2) We can automate the Directed InducedFit approach within ReplicaDock by utilizing this pLDDT information per residue for conformational sampling in protein docking; and in doing so, create a pipeline that would allow us to go from sequence-to-structure-to-complex, specifically capturing conformational changes.

      To address these claims, we pose the following questions in the Introduction:

      “(1) Do the residue-specific estimates from AF/AFm relate to potential metrics demonstrating conformational flexibility?

      (2) Can AF/AFm metrics deduce information about docking accuracy?

      (3) Can we create a docking pipeline for in-silico complex structure prediction incorporating AFm to convert sequence-to-structure-to-docked complexes?”

      This work requires a pipeline, the center of which lies in ReplicaDock as a docking method, but has functionalities that were absent in prior work. The goal is also to develop a one-stop shop without manual intervention (a prerequisite for biasing backbone sampling in ReplicaDock) that could be utilized by structural biologists efficiently.

      We clarify these points in the abstract and main text as follows:

      Abstract: “In this work, we combine AlphaFold as a structural template generator with a physics-based replica exchange docking algorithm to better sample conformational changes.”

      Introduction:

      “The overarching goal is to create a one-stop, fully-automated pipeline for simple, reproducible, and accurate modeling of protein complexes. We investigate the aforementioned questions and create a protocol to resolve AFm failures and capture binding-induced conformational changes. We first assess the utility of AFm confidence metrics to detect conformational flexibility and binding site confidence.”

      These results also highlight several questions I try to describe in the weakness section below. In short, they boil down to the fact that the authors must show how good/bad ReplicaDock is at all targets (not only the ones where AFm fails). In addition, I have several more technical comments.

      Strengths:

      Impressive increase in performance on AB-AG set (although a small set and no proteins).

      We thank the reviewer for their comments.

      Weaknesses:

      The presentation is a bit hard to follow. The authors mix several measures (Fnat, iRMS, RMSDbound, etc). In addition, it is not always clear what is shown. For instance, in Figure 1, is the RMSD calculated for a single chain or the entire protein? I would suggest that the author replace all these measures with two: TM-score when evaluating the quality of a single chain and DockQ when evaluating the results for docking. This would provide a clearer picture of the performance. This applies to most figures and tables.

      We apologize for the lack of clarity owing to different metrics. Irms and fnat are standard performance metrics in the docking field, but we agree that DockQ would be simpler when the details of the other metrics are not required. We have updated Figures 5 and 8 to also show DockQ comparisons.

      Regarding Figure 1, as highlighted in Line 90 of the main-text, “Figure 1 shows the Ca-RMSD of all protein partners of the AFm predicted complex structures with respect to the bound and the unbound.” As suggested by the reviewer in their further comments, we have moved this Figure to the Supplementary. We have also included a TM-score comparison in the Supplementary (Supplementary Figure S2) and included clarifying statements in the main text:

      “We also tested TM-scores to measure the structural deviations of the AFm predicted complex structures with respect to the bound and unbound structures (Supplementary Figure S2). However, this metric is not sensitive enough to detect the subtle, local conformational changes upon binding.”

      For instance, Figure 9 could be shown as a distribution of DockQ scores.

      We have now updated Figure 5 to include DockQ scores in Panel D. Since DockQ is a function of iRMSD, fnat and L-RMSD, it shows the cumulative improvement in performance. However, it hides some nuanced details, for example that the protocol improves iRMSD considerably while the fnat improvement is lacking, which can indicate whether the challenge lies in backbone sampling or in sidechain refinement. We have therefore retained the iRMSD and fnat metrics in Panels A-C. We have incorporated this in the main text as follows:

      “Finally, to evaluate docking success rates, we calculate DockQ for top predictions from AFm and AlphaRED respectively (Figure 5D). AlphaRED demonstrates a success rate (DockQ>0.23) for 63% of the benchmark targets. Particularly for Ab-Ag complexes, AFm predicted acceptable or better quality docked structures in only 20% of the 67 targets. In contrast, the AlphaRED pipeline succeeds in 43% of the targets, a significant improvement.”
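
      To make the relationship between DockQ and its three components concrete, the following is a minimal sketch of the published DockQ definition (Basu and Wallner, 2016), not the authors' evaluation code. The function names are hypothetical, and the class boundaries (0.23, 0.49, 0.80) are the standard DockQ ones, applied here with inclusive lower bounds as an assumption.

```python
def dockq(fnat, irmsd, lrmsd, d_i=1.5, d_l=8.5):
    """DockQ as the mean of fnat and two scaled RMSD terms.

    fnat  : fraction of native contacts recovered (0..1)
    irmsd : interface RMSD in Angstrom
    lrmsd : ligand RMSD in Angstrom
    d_i, d_l : scaling constants from the DockQ definition
    """
    def scaled(rmsd, d):
        # Maps an RMSD in [0, inf) onto (0, 1]; 1.0 means a perfect match.
        return 1.0 / (1.0 + (rmsd / d) ** 2)

    return (fnat + scaled(irmsd, d_i) + scaled(lrmsd, d_l)) / 3.0


def capri_class(score):
    """Map a DockQ score onto CAPRI-like quality classes (assumed inclusive bounds)."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"
```

      Because the three terms are averaged, a model with a much improved iRMSD but a poor fnat can still cross the 0.23 threshold, which is why the component metrics remain informative alongside the combined score.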

      Further, we have reevaluated success rates in Figure 8 (previously Figure 9) and have updated the manuscript to report these updated success rates.

      “By utilizing the AlphaRED strategy, we show that failure cases in AFm predicted models are improved for all targets (lower Irms for 97 of 254 failed targets) with CAPRI acceptable-quality or better models generated for 62% of targets overall (Fig 8)”.

      The improvements on the models where AFm is good are minimal (if at all), and it is unclear how global docking would perform on these targets, nor exactly why the pLDDT<0.85 cutoff was chosen.

      We agree with the reviewers that the improvement on the models with good AFm predictions is minimal. We acknowledge this in the text now as follows:

      “Most of the improvements in the success rates are for cases where AFm predictions are worse. For targets with good AFm predictions, AlphaRED refinement results in minimal improvements in docking accuracy.”

      The choice of pLDDT cutoff = 85 is elaborated in the “Interface-pLDDT correlates with DockQ and discriminates poorly docked structures” section, paragraph 3. Briefly, we tested multiple metrics and the interface pLDDT had the highest AUC, indicating that it is the best metric for this task. For interface-pLDDT we tested multiple thresholds, and the cutoff of 85 resulted in the highest percentage of true-positive and true-negative rates. This is illustrated with the confusion matrix in Figure 3.B with the precision scores. We now clarify this in the text as follows:

      “With interface-pLDDT as a discriminating metric, we tested multiple thresholds to estimate the optimum cut-off for distinguishing near-native structures (defined as an interface-RMSD < 4 Å) from the predictions. Figure 3B summarizes the performance with a confusion matrix for the chosen interface-pLDDT cutoff of 85. 79% of the targets are classified accurately with a precision of 75%, thereby validating the utility of interface-pLDDT as a discriminating metric to rank the docking quality of the AFm complex structure predictions.”
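
      The thresholding described above can be sketched as follows. This is an illustrative example rather than the authors' analysis script: the function name is hypothetical, and treating a prediction as "good" when interface-pLDDT is greater than or equal to the cutoff (rather than strictly greater) is an assumption. Near-native is defined as interface-RMSD < 4 Å, as in the quoted text.

```python
def confusion_at_cutoff(interface_plddt, irmsd, plddt_cutoff=85.0, irmsd_cutoff=4.0):
    """Confusion matrix for interface-pLDDT as a classifier of near-native models.

    interface_plddt : per-target mean pLDDT over interface residues
    irmsd           : per-target interface RMSD (Angstrom) to the bound structure
    """
    tp = fp = tn = fn = 0
    for plddt, rmsd in zip(interface_plddt, irmsd):
        predicted_good = plddt >= plddt_cutoff   # assumed inclusive cutoff
        actually_good = rmsd < irmsd_cutoff      # near-native definition
        if predicted_good and actually_good:
            tp += 1
        elif predicted_good and not actually_good:
            fp += 1
        elif not predicted_good and not actually_good:
            tn += 1
        else:
            fn += 1

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "accuracy": accuracy, "precision": precision}
```

      Scanning `plddt_cutoff` over a range of values and comparing the resulting accuracy and precision is one way to reproduce the kind of threshold selection the response describes.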

      To better understand the performance of ReplicaDock, the authors should therefore (i) run global and local docking on all targets and report the results, (ii) report the results if AlphaFold (not multimer) models of the chains were used as input to ReplicaDock (I would assume it is similar). These models can be downloaded from AlphaFoldDB.

      The performance of ReplicaDock on DB5.5 is tabulated in our prior work (https://doi.org/10.1371/journal.pcbi.1010124) and we direct the reviewers there for the detailed performance and results. In our opinion, the benchmark suggested by the reviewer would be redundant and not worth the computational expense.

      The scope of this paper is to highlight a structure prediction + physics-based modeling pipeline for docking to adapt to the accuracy of up-and-coming structure prediction tools.

      Using AlphaFold monomer chains as input and benchmarking on that, albeit interesting scientifically, will not be useful for either the pipeline or biologists who would want a complex structure prediction. We thank the reviewer for their comments but want to reemphasize that the end goal of this work is to increase the accuracy of complex structure predictions and PPIs obtained from computational tools.

      Further, it would be interesting to see if ReplicaDock could be combined with AFsample (or any other model to generate structural diversity) to improve performance further.

      We would like to highlight that ReplicaDock is a stand-alone tool for protein docking and here we demonstrate the ability of adapting it with metrics derived from AlphaFold or other structure prediction tools (say ESMFold) such as pLDDT for conformational sampling and improving docking accuracy. We definitely agree that adapting it to use with tools such as AFSample will be interesting but it is out of scope of this work.

      The estimates of computing costs for the AFsample are incorrect (check what is presented in their paper). What are the computational costs for ReplicaDock global docking?

      The authors of the AFSample paper report that “AFsample requires more computational time than AF2, as it generates 240 models, and including the extra recycles, the overall timing is 1000× more costly than the baseline.” We have reported these exact numbers in our manuscript.

      The computational costs of ReplicaDock are 8-72 CPU hours on a single node with 24 processors as reported in our prior work.

      For AlphaRED, the costs are slightly higher owing to the structure prediction module in the beginning and are up to 100 CPU hrs for our largest (max Nres) target.

      It is unclear strictly what sequences were used as input to the modelling. The authors should use full-length UniProt sequences if they were not done.

      We report this in the methods section of the manuscript as well as in Figure 5. Full length complex sequences were used for the models that we extracted from DB5.5.

      “As illustrated in Fig. 5, given a sequence of a protein complex, we use the ColabFold implementation of AF2-multimer to obtain a predictive template.”

      We clarify this in the methods section as:

      “For each target in the DB5.5 dataset, we first extracted the corresponding FASTA sequence for the bound complex and then obtained AlphaFold predicted models with the ColabFold v1.5.2 implementation of AlphaFold and AlphaFold-multimer (v.2.3.0).”

      The antibody-antigen dataset is small. It could easily be expanded to thousands of proteins. It would be interesting to know the performance of ReplicaDock on a more extensive set of Antibodies and nanobodies.

      This work demonstrates the performance on the docking benchmark, i.e. given the unbound structures, can one predict the bound complex? In this regard, our analysis has focused on targets where both the unbound and bound structures are available, so that we could evaluate the ability of AlphaRED to model protein flexibility and docking accuracy. For antibody-antigen complexes, there are only 67 targets with both unbound and bound structures available, and they constituted our dataset. Benchmarking AlphaRED on all antibody-antigen targets can give biased results as most Ab-Ag complexes are in the AlphaFold training set. Further, our work is aimed at predicting conformational flexibility in docking and not rigid-body docked complexes, so benchmarking on existing bound Ab-Ag structures is out of scope for this work.

      Using pLDDT on the interface region to identify good/bad models is likely suboptimal. It was acceptable (as a part of the score) for AlphaFold-2.0 (monomer), but AFm behaves differently. Here, AFm provides a direct score to evaluate the quality of the interaction (ipTM or Ranking Confidence). The authors should use these to separate good/bad models (for global/local docking), or at least show that these scores are less good than the one they used.

      We thank the reviewers for this suggestion.

      Reviewer #2 (Recommendations For The Authors):

      Some Figures could be skipped/improved

      Fig 1: Use TM-score instead, a much better measure (and the figure is not necessary).

      Figure 1 compares the bias of AlphaFold towards unbound or bound forms of the proteins. We believe that this figure highlights the slight inherent bias of AlphaFold towards bound structures over unbound.

      As the reviewers have suggested we have included a plot comparing the TM-scores for the structures. Further, we have moved this figure to the Supplementary.

      Fig 2. Skip B (why compare RMSD with pLDDT?). Add a figure to see how this correlates over all targets not just two.

      RMSD and LDDT both represent metrics to evaluate conformational variability between two structures, such as the bound and unbound forms of the same protein structure. Whereas RMSD measures the overall deviation of residues, LDDT allows the estimation of relative domain orientations and concerted proteins. We have elaborated this in Methods as well as in the Results section titled “AlphaFold pLDDT provides a predictive confidence measure for backbone flexibility”.

      The data for the benchmark targets is now included in the Supplementary (Supplementary Figures S3-S4).

      Fig 3. Color the different chains of a protein differently. Thereby the Receptor/Ligand/Bound labels can be omitted.

      We thank the reviewers for this suggestion. However, the color scheme is chosen to highlight (1) the relative orientation of protein partners relative to each other. We have ensured that the alignment is over one partner (Receptor) so that you could see the relative orientation of the other partner (Ligand) in the modeled protein over the bound structure (in one color). (2) The coloring of the receptor and ligand chain is by pLDDT (from red to blue) to highlight that for decoys with incorrectly predicted interfaces, the pLDDT scores of the interface residues are indeed lower and can be a discriminating metric. We elaborate this in the caption of Figure 3 as well as in the section “Interface-pLDDT correlates with DockQ and discriminates poorly docked structures”. Coloring the chains of a protein differently will obfuscate the point that we are aiming to make and will be inconclusive for the readers as they would need to rely only on quantitative metrics (Irms and DockQ) reported but won’t be able to visualize the interface pLDDT of the incorrectly bound structures. We hope that this justifies the choice of our color scheme.

      Fig 4. Include RankConf, ipTM, pDockQ, and other measures in the plots (they are likely better). Include DockQ for the top targets. It is difficult to estimate for multi chain complexes.

      We thank the reviewer for this suggestion. We have now included the DockQ performances for all targets in Figure 5 (previously Figure 6) as well as re-evaluated our final success rates based on the DockQ calculations in Figure 8 (previously Figure 9).

      Fig 5. use a better measure to split (see above).

      We have elaborated on the choice of the split for the comments above and the interface pLDDT threshold of 85 is a decision made post observation on the docking benchmark. We do want to highlight that the cut-off is arbitrary and in our online server (ROSIE) as well as in custom scripts, this cut-off can be tuned by the user as required. We would suggest a cut-off of 85 based on our observations but the users are welcome to tune this as per their needs.
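
      A tunable cutoff of this kind can be sketched in a few lines. This is a hypothetical illustration, not the ROSIE server's or the authors' scripts: the function name and return labels are invented here, and routing low-confidence interfaces to global docking and high-confidence ones to local refinement is an assumed reading of the global/local split discussed above.

```python
def choose_docking_mode(interface_plddt, cutoff=85.0):
    """Pick a docking mode from the mean interface pLDDT of an AFm model.

    cutoff is user-tunable; 85 is the value suggested in the response.
    """
    if not 0.0 <= interface_plddt <= 100.0:
        raise ValueError("pLDDT is expected on a 0-100 scale")
    # Assumed routing: trust confident interfaces and only refine them locally;
    # otherwise search rigid-body space globally before refinement.
    return "local_refinement" if interface_plddt >= cutoff else "global_docking"
```

      Exposing `cutoff` as a parameter lets a user be stricter (for example 90) when a wrong binding site would be costly, at the price of running more global docking.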

      Fig 6. Replace lrms/fnat with DockQ.

      We have now included DockQ scores in our manuscript.

      Fig 7. Color the different chains of a protein differently.

      We have colored the protein chains differently. AlphaFold models are in Orange, Bound complexes are in Gray, and predicted proteins from AlphaRED are in Blue-Green indicating the two partners. All models are aligned over the receptor so relative orientations of the ligand protein can be observed.

      Fig 8 Color the different chains of a protein differently.

      The chains are colored differently. We would like the reviewer to elaborate more on what they would like to observe as we believe our color scheme makes intuitive sense for readers.

      Fig 9. Use DockQ instead of CAPRI criteria.

      The figure has been updated based on DockQ. To elaborate, the CAPRI criteria are set based on DockQ scores, as described in the figure caption.

    1. Author response:

      eLife Assessment

      This manuscript reports important findings that the methyltransferase METTL3 is involved in the repair of abasic sites and uracil in DNA, mediating resistance to floxuridine-driven cytotoxicity. The presented evidence for the involvement of m6A in DNA is incomplete and requires further validation with orthogonal approaches to conclusively show the presence of 6mA in the DNA and exclude that the source is RNA or bacterial contamination.

      We thank the editors for recognizing the importance of our work and the relevance of METTL3 in DNA repair. However, we wholly disagree with the second sentence in the eLife assessment, and we want to clarify why our evidence for the involvement of 6mA in DNA is complete.  

      The identification of 6mA in DNA, upon DNA damage, is based first on immunofluorescence observations using an anti-m6A antibody. In this setting, removal of RNA with RNase treatment fails to reduce the 6mA signal, excluding the possibility that the source of signal is RNA. In contrast, removal of DNA with DNase treatment removes all 6mA signal, strongly suggesting that the species carrying the N6-methyladenosine modification is DNA (Figure 3D, E). Importantly, in Figure 3F, we provide orthogonal, quantitative mass spectrometry data that independently confirm this finding. Mass spectrometry-liquid chromatography of DNA analytes, conclusively shows the presence of 6mA in DNA upon treatment with DNA damaging agents and excludes that the source is RNA, based on exact mass. Reviewer #2 recognized the strengths of this approach to generate solid evidence for 6mA in DNA.

      Cells only show the 6mA signal when treated with DNA damaging agents, and the 6mA is absent from untreated cells (Figure 3D, E, F). This provides strong evidence that the 6mA signal is not a result of bacterial contamination in our cell lines. Moreover, our cell lines are routinely tested for mycoplasma contamination. It could be possible that stock solutions of DNA damaging agents may be contaminated, but this would need to be true for all individual drugs and stocks tested. The data showing 6mA signal is not significantly different from untreated cells when a DNA damaging agent is combined with a METTL3 inhibitor (Figure 3G, H) provides strong evidence against bacterial contamination in our stocks.  

      In summary, we provide conclusive evidence, based on orthogonal methods, that the METTL3-dependent N6-methyladenosine modification is deposited in DNA, not RNA, in response to DNA damage. 

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors sought to identify unknown factors involved in the repair of uracil in DNA through a CRISPER knockout screen. 

      Typo above: “CRISPER” should be “CRISPR”.

      Strengths: 

      The screen identified both known and unknown proteins involved in DNA repair resulting from uracil or modified uracil base incorporation into DNA. The conclusion is that the protein activity of METTL3, which converts A nucleotides to 5mA nucleotides, plays a role in the DNA damage/repair response. The importance of METTL3 in DNA repair, and its colocalization with a known DNA repair enzyme, UNG2, is well characterized. 

      Typo above: “5mA” should be “6mA”.

      Weaknesses:

      This reviewer identified no major weaknesses in this study. The manuscript could be improved by tightening the text throughout, and more accurate and consistent word choice around the origin of U and 6mA in DNA. The dUTP nucleotide is misincorporated into DNA, and 6mA is formed by methylation of the A base present in DNA. Using words like 6mA "deposition in DNA" seems to imply it results from incorporation of a methylated dATP nucleotide during DNA synthesis.

      The increased presence of 6mA during DNA damage could result from methylation at the A base itself (within DNA) or from incorporation of pre-modified 6mA during DNA synthesis. Our data do not directly discriminate between these two mechanisms, and we will clarify this point in the discussion.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors performed a CRISPR knockout screen in the presence of floxuridine, a chemotherapeutic agent that incorporates uracil and fluoro-uracil into DNA, and identified unexpected factors, such as the RNA m6A methyltransferase METTL3, as required to overcome floxuridine-driven cytotoxicity in mammalian cells. Interestingly, the observed N6-methyladenosine was embedded in DNA, which has been reported as DNA 6mA in mammalian genomes and is currently confirmed with mass spectrometry in this model. Therefore, this work consolidated the functional role of mammalian genomic DNA 6mA, and supported with solid evidence to uncover the METTL3-6mA-UNG2 axis in response to DNA base damage.

      Strengths:

      In this work, the authors took an unbiased, genome-wide CRISPR approach to identify novel factors involved in uracil repair with potential clinical interest.

      The authors designed elegant experiments to confirm the METTL3 works through genomic DNA, adding the methylation into DNA (6mA) but not the RNA (m6A), in this base damage repair context. The authors employ different enzymes, such as RNase A, RNase H, DNase, and liquid chromatography coupled to tandem mass spectrometry to validate that METTL3 deposits 6mA in DNA in response to agents that increase genomic uracil.

      They also have the Mettl3-KO and the METTL3 inhibition results to support their conclusion.

      Weaknesses:

      Although this study demonstrates that METTL3-dependent 6mA deposition in DNA is functionally relevant to DNA damage repair in mammalian cells, there are still several concerns and issues that need to be improved to strengthen this research.

      First, in the whole paper, the authors never claim or mention the mammalian cell lines contamination testing result, which is the fundamental assay that has to be done for the mammalian cell lines DNA 6mA study.

      Our cell lines are routinely tested for bacterial contamination, specifically mycoplasma, and we plan to state this information in a revised version of the manuscript.

      Importantly, we do not observe 6mA in untreated cells, strongly suggesting that the 6mA signal observed is dependent on the presence of DNA damage and not caused by contamination in the cell lines (Figure 3D, E, F). While it could be possible that stock solutions of DNA damaging agents may be contaminated, this would need to be the case for all individual drugs and stocks tested that induce 6mA, which seems very unlikely. Finally, the data showing 6mA signal is not significantly different from untreated cells when a DNA damaging agent is combined with a METTL3 inhibitor (Figure 3 G, H) provides strong evidence against bacterial contamination in our drug stocks.

      Second, in the whole work, the authors have not supplied any genomic sequencing data to support their conclusions. Although the sequencing of DNA 6mA in mammalian models is challenging, recent breakthroughs in sequencing techniques, such as DR-Seq or NT/NAME-seq, have lowered the bar and improved a lot in the 6mA sequencing assay. Therefore, the authors should consider employing the sequencing methods to further confirm the functional role of 6mA in base repair.

      While we agree that it could be important to understand the precise genomic location of 6mA in relation to DNA damage, this is outside the scope of the current study. Moreover, this exercise may prove unproductive. If 6mA is enriched in DNA at damage sites or as DNA is replicated, the genomic mapping of 6mA is likely to be stochastic. If stochastic, it would be impossible to obtain the read depth necessary to map 6mA accurately.

      Third, the authors used the METTL3 inhibitor and Mettl3-KO to validate the METTL3-6mA-UNG2 functional roles. However, the catalytic mutant and rescue of Mettl3 may be the further experiments to confirm the conclusion. 

      We believe this to be an excellent suggestion from Reviewer #2 but we are unable to perform the proposed experiment at this time. We encourage future studies to explore the rescue experiment.

      Reviewer #3 (Public review):

      Summary:

      The authors are showing evidence that they claim establishes the controversial epigenetic mark, DNA 6mA, as promoting genome stability.

      Strengths:

      The identification of a poorly understood protein, METTL3, and its subsequent characterization in DDR is of high quality and interesting.

      Weaknesses:

      (1) The very presence of 6mA (DNA) in mammalian DNA is still highly controversial and numerous studies have been conclusively shown to have reported the presence of 6mA due to technical artifacts and bacterial contamination. Thus, to my knowledge there is no clear evidence for 6mA as an epigenetic mark in mammals, and consequently, no evidence of writers and readers of 6mA. None of this is mentioned in the introduction. Much of the introduction can be reduced, but a paragraph clearly stating the controversy and lack of evidence for 6mA in mammals needs to be added, otherwise, the reader is given an entirely distorted view of the field.

      These concerns must also be clearly stated in the limitations section and even in the results section, which fails to nuance the authors' findings.

      We agree with the reviewer that the presence and potential function of 6mA in mammalian DNA has been debated. Importantly, the debate regarding the presence and quantity of 6mA in DNA has been previously restricted to undamaged, baseline conditions. In complete agreement with this notion, we do not detect appreciable levels of 6mA in untreated cells. We will revise the introduction to introduce the debate about 6mA in DNA. We, however, want to highlight that our study provides, for the first time, convincing evidence (based on orthogonal methods) that 6mA is present in DNA in response to a stimulus, DNA damage.

      (2) What is the motivation for using HT-29 cells? Moreover, the materials and methods do not state how the authors controlled for bacterial contamination, which has been the most common cause of erroneous 6mA signals to date. Did the authors routinely check for mycoplasma?

      HT-29 is a cell line of colorectal origin and chemotherapeutic agents that introduce uracil and uracil derivatives in DNA, as those used in this study, are relevant for the treatment of colorectal cancer. As indicated above, we do not observe 6mA in untreated cells, strongly suggesting that the 6mA signal observed is dependent on DNA damage and not caused by a potential bacterial contamination (Figure 3D, E, F). Additionally, our cell lines are routinely tested for bacterial contamination, specifically mycoplasma.

      (3) The single-cell imaging of 6mA in various cells is nice but must be confirmed by orthogonal approaches. PacBio would provide an alternative and quantitative approach to assessing 6mA levels. Similarly, it is unclear why the authors have not performed dot-blots of 6mA for genomic DNA from the given cell lines.

      We are confused by this point since an orthogonal approach to detect 6mA, mass spectrometry-liquid chromatography, was employed. This method does not use an antibody and confirms the increase of 6mA in DNA when cells were treated with DNA damaging agents. This data is presented in Figure 3F.

      It is sensible to hypothesize that the localization of 6mA is consistent with DNA replication (like uracil deposition). In this event, the genomic mapping of 6mA is likely to be stochastic. This would make quantification with PacBio sequencing difficult because it would be very challenging to achieve the appropriate read depth to call a modified base.

      Dot blots rely on an antibody and thus are not truly orthogonal to our immunofluorescence-based measurements. We preferred the mass spectrometry-liquid chromatography approach we took as a true orthogonal approach.

      (4) The results of Figure 3 need further investigation and validation. If the results are correct the authors are suggesting that the majority of 6mA in their cell lines is present in the DNA, and not the RNA, which is completely contrary to every other study of 6mA in mammalian cells that I am aware of. This could suggest that the antibody is not, in fact, binding to 6mA, but to unmodified adenine, which would explain why the signal disappears after DNAse treatment. Indeed, binding of 6mA to unmethylated DNA is a commonly known problem with most 6mA antibodies and is well described elsewhere.

      Based on this and the following comment, we are convinced that Reviewer #3 has overlooked two critical elements of our study:

      First, the immunofluorescence work presented in Figure 3, showing 6mA signal in response to DNA damage, uses cells that were pre-extracted to remove excess cytoplasmic RNA. This method is often used in immunofluorescence experiments of this kind. The pre-extraction method removes most of the cytoplasmic content, and the majority of the cytoplasmic m6A RNA signal. Supplementary Figure 3D shows cells that have not been pre-extracted prior to staining. These images show the cytoplasmic m6A signal is abundant if we do not perform the pre-extraction step.

      If the antibody used to label 6mA significantly reacted with unmodified adenine, we would expect a large signal in untreated or untreated and denatured conditions. In contrast, an increase in 6mA is not observed in either case.

      Second, the orthogonal approach we employed, mass spectrometry coupled with liquid chromatography, measures 6mA DNA analytes specifically by exact mass. This approach does not depend on an antibody and yields results consistent with those from the immunofluorescence experiments.

      (5) Given the lack of orthogonal validation of the observed DNA 6mA and the lack of evidence supporting the presence of 6mA in mammalian DNA and consequently any functional role for 6mA in mammalian biology, the manuscript's conclusions need to be toned down significantly, and the inherent difficulty in assessing 6mA accurately in mammals acknowledged throughout.

      As discussed in response to prior comments, Figure 3 does provide two independent and orthogonal methods that demonstrate 6mA presence in DNA specifically, and not RNA, in response to DNA damage. Complementary datasets are presented using either immunofluorescence microscopy or mass spectrometry-liquid chromatography of extracted DNA. The latter method does not rely on an antibody and can discriminate 6mA DNA versus RNA based on exact mass. We will revise the text to clarify that Figure 3F is a completely orthogonal approach.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Hotinger et al. explore the population dynamics of Salmonella enterica serovar Typhimurium in mice using genetically tagged bacteria. In addition to physiological observations, pathology assessments, and CFU measurements, the study emphasizes quantifying host bottleneck sizes that limit Salmonella colonization and dissemination. The authors also investigate the genetic distances between bacterial populations at various infection sites within the host.

      Initially, the study confirms that pretreatment with the antibiotic streptomycin before inoculation via orogastric gavage increases the bacterial burden in the gastrointestinal (GI) tract, leading to more severe symptoms and heightened fecal shedding of bacteria. This pretreatment also significantly reduces between-animal variation in bacterial burden and fecal shedding. The authors then calculate founding population sizes across different organs, discovering a severe bottleneck in the intestine, with founding populations reduced by approximately 10^6-fold compared to the inoculum size. Streptomycin pretreatment increases the founding population size and bacterial replication in the GI tract. Moreover, by calculating genetic distances between populations, the authors demonstrate that, in untreated mice, Salmonella populations within the GI tract are genetically dissimilar, suggesting limited exchange between colonization sites. In contrast, streptomycin pretreatment reduces genetic distances, indicating increased exchange.

      In extraintestinal organs, the bacterial burden is generally not substantially increased by streptomycin pretreatment, with significant differences observed only in the mesenteric lymph nodes and bile. However, the founding population sizes in these organs are increased. By comparing genetic distances between organs, the authors provide evidence that subpopulations colonizing extraintestinal organs diverge early after infection from those in the GI tract. This hypothesis is further tested by measuring bacterial burden and founding population sizes in the liver and GI tract at 5 and 120 hours post-infection. Additionally, they compare orogastric gavage infection with the less injurious method of infection via drinking, finding similar results for CFUs, founding populations, and genetic distances. These results argue against injuries during gavage as a route of direct infection. 

      To bypass bottlenecks associated with the GI tract, the authors compare intravenous (IV) and intraperitoneal (IP) routes of infection. They find approximately a 10-fold increase in bacterial burden and founding population size in immune-rich organs with IV/IP routes compared to orogastric gavage in streptomycin-pretreated animals. This difference is interpreted as a result of "extra steps required to reach systemic organs."

      While IP and IV routes yield similar results in immune-rich organs, IP infections lead to higher bacterial burdens in nearby sites, such as the pancreas, adipose tissue, and intraperitoneal wash, as well as somewhat increased founding population sizes. The authors correlate these findings with the presence of white lesions in adipose tissue. Genetic distance comparisons reveal that, apart from the spleen and liver, IP infections lead to genetically distinct populations in infected organs, whereas IV infections generally result in higher genetic similarity. 

      Finally, the authors investigate GI tract reseeding, identifying two distinct routes. They observe that the GI tracts of IP/IV-infected mice are colonized either by a clonal or a diversely tagged bacterial population. In clonally reseeded animals, the genetic distance within the GI tract is very low (often zero) compared to the bile population, which is predominantly clonal or pauciclonal. These animals also display pathological signs, such as cloudy/hardened bile and increased bacterial burden, leading the authors to conclude that the GI tract was reseeded by bacteria from the gallbladder bile. In contrast, animals reseeded by more complex bacterial populations show that bile contributes only a minor fraction of the tags. Given the large founding population size in these animals' GI tracts, which is larger than in orogastrically infected animals, the authors suggest a highly permissive second reseeding route, largely independent of bile. They speculate that this route may involve a reversal of known mechanisms that the pathogen uses to escape from the intestine. 

      The manuscript presents a substantial body of work that offers a meticulously detailed understanding of the population dynamics of S. Typhimurium in mice. It quantifies the processes shaping the within-host dynamics of this pathogen and provides new insights into its spread, including previously unrecognized dissemination routes. The methodology is appropriate and carefully executed, and the manuscript is well-written, clearly presented, and concise. The authors' conclusions are well-supported by experimental results and thoroughly discussed. This work underscores the power of using highly diverse barcoded pathogens to uncover the within-host population dynamics of infections and will likely inspire further investigations into the molecular mechanisms underlying the bottlenecks and dissemination routes described here.

      Major point:

      Substantial conclusions in the manuscript rely on genetic distance measurements using the Cavalli-Sforza chord distance. However, it is unclear whether these genetic distance measurements are independent of the founding population size. I would anticipate that in populations with larger founding population sizes, where the relative tag frequencies are closer to those in the inoculum, the genetic distances would appear smaller compared to populations with smaller founding sizes independent of their actual relatedness. This potential dependency could have implications for the interpretation of findings, such as those in Figures 2B and 2D, where antibiotic-pretreated animals consistently exhibit higher founding population sizes and smaller genetic distances compared to untreated animals.

      Thank you for raising this important point regarding reliance on chord distances for gauging genetic distance in barcoded populations. The reviewer is correct that samples with more founders will be more similar to the inoculum and thus inherently more similar to other samples that also have more founders. However, creating libraries that contain very large numbers of unique barcodes can often circumvent this issue, and here the effect size of chance-based similarity is not large enough to change the interpretation of the data in Figures 2B and 2D. Our library has ~6x10^4 barcodes, and the founding populations in Figure 2B are ~10^3. Randomly resampling to create two populations of 10^3 cells from an initial population with 6x10^4 barcodes is expected to yield largely distinct populations with very little similarity. Thus, the similarity between streptomycin-treated populations in Figure 2D is likely the result of biology rather than chance.
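
      The resampling argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions: every barcode is taken to be equally frequent in the inoculum (real libraries are not perfectly even), founders are drawn independently, and genetic distance uses one common form of the Cavalli-Sforza chord distance, GD = (2/pi) * sqrt(2 * (1 - sum_i sqrt(p_i * q_i))), whose maximum is about 0.90. The function names are illustrative and are not taken from the STAMPR code.

```python
import math
import random
from collections import Counter


def sample_founders(n_barcodes, n_founders, rng):
    """Draw founders uniformly at random, with replacement, from the library.

    Assumption: all barcodes are equally abundant in the inoculum.
    """
    return Counter(rng.randrange(n_barcodes) for _ in range(n_founders))


def chord_distance(pop_a, pop_b):
    """Cavalli-Sforza chord distance between two barcode count tables."""
    total_a = sum(pop_a.values())
    total_b = sum(pop_b.values())
    # Only barcodes present in both populations contribute to the sum.
    overlap = sum(
        math.sqrt((pop_a[k] / total_a) * (pop_b[k] / total_b))
        for k in pop_a.keys() & pop_b.keys()
    )
    # max() guards against tiny negative values from floating-point error.
    return (2 / math.pi) * math.sqrt(2 * max(0.0, 1 - overlap))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
pop_a = sample_founders(60_000, 1_000, rng)
pop_b = sample_founders(60_000, 1_000, rng)

shared_barcodes = len(pop_a.keys() & pop_b.keys())
gd = chord_distance(pop_a, pop_b)
print(f"shared barcodes: {shared_barcodes}, chord distance: {gd:.3f}")
```

      Under these assumptions, the expected number of shared barcodes is on the order of 1,000 x 1,000 / 60,000, i.e. fewer than twenty out of a thousand, so the chord distance between two chance-derived populations sits close to its maximum.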

      Reviewer #2 (Public review):

      In this paper, Hotinger et al. propose an improved barcoded library system, called STAMPR, to study Salmonella population dynamics during infection. Using this system, the authors demonstrate significant diversity in the colonization of different Salmonella clones (defined by the presence of different barcodes) not only across different organs (liver, spleen, adipose tissues, pancreas, and gall bladder) but also within different compartments of the same gastrointestinal tissue. Additionally, this system revealed that microbiota competition is the major bottleneck in Salmonella intestinal colonization, which can be mitigated by streptomycin treatment. However, this has been demonstrated previously in numerous publications. They also show that there was minimal sharing between populations found in the intestine and those in the other organs. Upon IV and IP infection to bypass the intestinal bottleneck, they were able to demonstrate, using this library, that Salmonella can re-enter the intestine through two possible routes. One route is essentially the reverse path used to escape the gut, leading to a diverse intestinal population, while the other, through the bile, typically results in a clonal population. Although the authors showed that the STAMPR pipeline improved the ability to identify founder populations and their diversity within the same animal during infections, some of the conclusions appear speculative and not fully supported.

      (1) It's particularly interesting how the authors, using this system, demonstrate the dominant role of the microbiota bottleneck in Salmonella colonization and how it is widened by antibiotic treatment (Figure 1). Additionally, the ability to track Salmonella reseeding of the gut from other organs starting with IV and IP injections of the pathogen provides a new tool to study population dynamics (Figure 5). However, I don't think it is possible to argue that the proximal and distal small intestine, Peyer's patches (PPs), cecum, colon, and feces have different founder populations for reasons other than stochastic variations. All the barcoded Salmonella clones have the same fitness, and the fact that some are found or expanded in one region of the gastrointestinal tract rather than another likely results from random chance (such as being forced into a specific region of the gut for physical or spatial reasons) and subsequent expansion, rather than any inherent biological cause. For example, some bacteria may randomly adhere to the mucus, some may swim toward the epithelial layer, while others remain in the lumen; all will proliferate in those respective sites. In this way, different founder populations arise based on random localization during movement through the gastrointestinal tract, which is an observation, but it doesn't significantly contribute to understanding pathogen colonization dynamics or pathogenesis. Therefore, I would suggest placing less emphasis on describing these differences or better discussing this aspect, especially in the context of the gastrointestinal tract.

      Thank you for helping us identify this area for further clarification. We agree with the reviewer’s interpretation that seeding of proximal and distal small intestine, Peyer's patches (PPs), cecum, colon, and feces with different founder populations is likely caused by stochastic variations, consistent with separate stochastic bottlenecks to establishing these separate niches. To clarify this point we have modified the text in the results section, “Streptomycin treatment decreases compartmentalization of S. Typhimurium populations within the intestine”.

      Change to text:

      “Except for the cecum and colon, in untreated animals the S. Typhimurium populations in different regions of the intestine were dissimilar (Avg. GD ranged from 0.369 to 0.729, 2D left); i.e., there is little sharing between populations in the intestine. These data suggest that there are separate bottlenecks in different regions of the intestine that cause stochastic differences in the identity of the founders. Interestingly, when these founders replicate, they do not mix, remaining compartmentalized with little sharing between populations throughout the intestinal tract (i.e., barcodes found in one region are not in other regions, Figure S3). This was surprising as the luminal contents, an environment presumably conducive to bacterial movement, were not removed from these samples.”

      In this section we are interested in the underlying biology that occurs after the initial bottleneck to preserve this compartmentalization during outgrowth of the intestinal population. In other words, what prevents these separate populations from merging (e.g., what prevents the bacteria replicating in the proximal small intestine from traveling through the intestine and establishing a niche in the distal small intestine)? While we do not explore the mechanisms of compartmentalization, we observe that it is disrupted by streptomycin pretreatment, suggesting a microbiota-dependent biological cause. 

      (2) I do think that STAMPR is useful for studying the dynamics of pathogen spread to organs where Salmonella likely resides intracellularly (Figure 3). The observation that the liver is colonized by an early intestinal population, which continues to proliferate at a steady rate throughout the infection, is very interesting and may be due to the unique nature of the organ compared to the mucosal environment. What is the biological relevance during infection? Do the authors observe the same pattern (Figures 3C and G) when normalizing the population data for the spleen and mesenteric lymph nodes (mLN)? If not, what do the authors think is driving this different distribution?

      Thank you for raising this interesting point. These data indicate that the liver is seeded from the intestine early during infection. The timing and source of dissemination have relevance for understanding how host and pathogen variables control the spread of bacteria to systemic sites. For example, our conclusion (early dissemination) indicates that the immune state of a host at the time of exposure to a pathogen, and for a short period thereafter, are what primarily influence the process of dissemination, not the later response to an active infection. 

      We observe that the liver and mucosal environments within the intestine have similar colonization behaviors. Both niches are seeded early during infection, followed by steady pathogen proliferation and compartmentalization that apparently inhibits further seeding. This results in the identity of barcodes in the liver population remaining distinct from the intestinal populations, and the intestinal populations remaining distinct from each other.

      We observe a similar pattern to the liver in the spleen and MLN (the barcodes in the spleen and MLN are dissimilar to the population in the intestine). To clarify this point, we have modified the text (below) and added this analysis as a supplemental figure (S4).

      Change to text:

      Genetic distance comparison of liver samples to other sites revealed that, regardless of streptomycin treatment, there was very little sharing of barcodes between the intestine and extraintestinal sites (Avg. GD >0.75, Figure 3C). Furthermore, the MLN and spleen populations also lacked similarity with the intestine (Figure S4). These analyses strongly support the idea that S. Typhimurium disseminates to extraintestinal organs relatively early following inoculation, before it establishes a replicative niche in the intestine.

      (3) Figure 6: Could the bile pathology be due to increased general bacterial translocation rather than Salmonella colonization specifically? Did the authors check for the presence of other bacteria (potentially also proliferating) in the bile? Do the authors know whether Salmonella's metabolic activity in the bile could be responsible for gallbladder pathology?

      The reviewer raises interesting points for future work. We did not check whether other bacterial species are translocating during S. Typhimurium infection. The relevance of Salmonella’s metabolic activity is also very interesting, and we hope these questions will be answered by future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      (1) P. 9/10 "... the marked delay in shedding after IP and IV relative to orogastric inoculation suggest that the S. Typhimurium population encounters substantial bottleneck(s) on the route(s) from extraintestinal sites back to the intestine.": Can you conclude that from the data? It could also be possible that there is a biological mechanism (other than chance events) that delays the re-entry to the intestine.

      We propose that the delay in shedding indicates additional obstacles that bacteria face when re-entering the intestine, and that there are likely biological mechanisms that cause this delay. However, these unknown mechanisms effectively act as additional bottlenecks by causing a stochastic loss of population diversity. 

      (2) P. 11 "...both organs would likely contain all 10 barcodes. In contrast, a library with 10,000 barcodes can be used to distinguish between a bottleneck resulting in Ns = 1,000 and Ns = 10,000, since these bottlenecks result in a different number of barcodes in output samples. Furthermore, high diversity libraries reduce the likelihood that two tissue samples share the same barcode(s) due to random chance, enabling more accurate quantification of bacterial dissemination.": I agree with the general analysis, but I find it misleading to talk about the presence of barcodes when the analyses in this manuscript are based on the much more powerful comparison of relative abundance of individual tags instead of their presence or absence.

      The reviewer raises an excellent point, and the distinction between relative abundance versus presence/absence is discussed extensively in the original STAMPR manuscript. Although relative abundance is powerful, the primary metric used in this study (Ns) is calculated principally from the number of barcodes, corrected (via simulations) for the probability of observing the same barcode across distinct founders. Although this correction procedure does rely on barcode abundance, the primary driver of founding population quantification is the number of barcodes.
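
      The point about library diversity can be made concrete with a short calculation. The sketch below is a simplified illustration, not the STAMPR procedure: it assumes equally abundant barcodes and independent founders, and it uses the closed-form expectation B * (1 - (1 - 1/B)^N) for the number of distinct barcodes seen among N founders drawn from a library of B barcodes. The function name is hypothetical.

```python
def expected_distinct_barcodes(library_size, founders):
    """Expected number of distinct barcodes among `founders` uniform draws.

    Assumption: every barcode in the library is equally likely to be drawn.
    """
    return library_size * (1 - (1 - 1 / library_size) ** founders)


# Compare a 10-barcode library with a 10,000-barcode library
# for bottlenecks of 1,000 and 10,000 founders.
for library_size in (10, 10_000):
    for founders in (1_000, 10_000):
        distinct = expected_distinct_barcodes(library_size, founders)
        print(f"library={library_size:>6}  founders={founders:>6}  "
              f"expected distinct barcodes={distinct:8.1f}")
```

      With 10 barcodes, both bottlenecks are expected to return essentially all 10 barcodes and so cannot be told apart, whereas with 10,000 barcodes the two bottlenecks return clearly different numbers of distinct barcodes. The saturation visible at the larger bottleneck is one reason a correction of the kind described above is needed.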

      (3) P.14 "the library in LB supplemented with SM was not significantly different than the parent strain" and Figure 2C: How was significance tested? How many times were the growth curves recorded? On my print-out, the red color has different shades for different growth curves.

      Significance was tested with a Mann-Whitney test, and growth curves were performed 5 times. Growth curves are displayed with 50% opacity, and as a result multiple curves directly on top of each other appear darker. The legend to S2 has been modified accordingly.

      (4) P.16: close bracket in the equation for FRD calculation.

      Done

      (5) Figure 2C "Average CFU per founder": I found the wording confusing at first as I thought you divided the average bacterial burden per organ by Ns, instead of averaging the CFU/Ns calculated for each mouse.

      The wording has been clarified. 

      (6) Figure 3B: It would be helpful to include expected genetic distances in the schematic as it is difficult to infer the genetic distance when only two of three, respectively, different "barcode colors" are used. While I find the explanation in the main text intuitive, a graphical representation would have helped me.

      Thank you for the suggestion. Unfortunately, using colors to represent barcodes is imperfect and limits the diversity that can be depicted. We have modified Figure 3B to further clarify. 

      (7) Figure 3C: Why do you compare the genetic distance to the liver, when you discuss the genetic distance of the intestinal population? Is it not possible that the intestinal populations are similar to the extraintestinal organs except the liver?

      For clarity, we chose to highlight exclusively the liver. However, we observed a similar pattern to the liver in other extraintestinal organs. To clarify the generalizability of this point we have added a supplemental figure with comparisons to MLN and Spleen (Supplemental figure S4) as well as further text.

      (8) Figure 3C & S5A: I found "+SM" and "+SM, Drinking" confusing and would have preferred "+SM, Gavage" and "+SM, Drinking" for clarity.

      Done, thank you for the suggestion.

      (9) Figure 3G&H: I find it worthy of discussion that the bacterial burden increases over time, while the founding population decreases. Does that not indicate that replication only occurs at specific sites leading to the amplification of only a few barcodes and thereby a larger change of the relative barcode abundance compared to the inoculum?

      From 5h to 120h the size of the founding population decreases in multiple intestinal sites. This likely indicates that the impact of the initial bottleneck is still ongoing at 5h, although further temporal analysis would be required to define the exact timing of the bottleneck. Notably, the passage time through the mouse intestine is ~5h. Many of the founders observed at 5h could belong to a population that will never establish a replicative niche and, failing to colonize, will be shed in the feces, bottlenecking the population between 5h and 120h. To clarify this point we have added the following text:

      Section “S. Typhimurium disseminates out of the intestine before establishing an intestinal replicative niche”.

      “In contrast to the liver, there were more founders present in samples from the intestine (particularly in the colon) at 5 hours versus 120 hours (Figure 3H). These data likely indicate that many of the founders observed in the intestine at 5 hours are shed in the feces prior to establishing a replicative niche, and demonstrate that the forces restricting the S. Typhimurium population in the intestine act over a period of > 5 hours.”

      (10) Figure S2A: I do not understand this figure. Why are there more than 70.000 tags listed? I was under the impression the barcode library in S. Typhimurium had 55.000 tags while only the plasmid pSM1 had more than 70.000 (but the plasmid should not be relevant here). Why are there distinct lines at approximately 10^-5 and a bit lower? I would have expected continuously distributed barcode frequencies.

      During barcode analysis, each library is mapped to the total barcode list in the barcode donor pSM1, which contains ~70,000 barcodes. This enables consistent analysis across different bacterial libraries. The designation “barcode number” refers to the barcode number in pSM1, meaning many of the barcodes in the Salmonella library are at zero reads. This graph type was chosen to show there was no bias toward a particular barcode, however there is significant overlap of the points, making individual barcode frequencies difficult to see. We have changed the x-axis to state “pSM1 Barcode Number” and clarified in the figure legend.

      Since the y-axis on these graphs is on a log10 scale, the lines represent barcodes with 1 read, 2 reads, 3 reads, etc. As the number of reads per barcode increases linearly, the space between them decreases on logarithmic axes.
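
      The banding described above follows directly from the logarithm. As a rough sketch, assuming a hypothetical sequencing depth of 100,000 reads per sample (chosen only so that a single read corresponds to a frequency of 10^-5; it is not the actual depth), the positions of the first few bands and the gaps between them can be computed as follows:

```python
import math

total_reads = 100_000  # hypothetical depth, for illustration only

# Frequencies of barcodes supported by 1, 2, 3, 4 and 5 reads.
read_counts = [1, 2, 3, 4, 5]
frequencies = [count / total_reads for count in read_counts]

# Position of each band on a log10 y-axis.
log_positions = [math.log10(f) for f in frequencies]

# Vertical gap between consecutive bands: log10((k + 1) / k).
gaps = [upper - lower for lower, upper in zip(log_positions, log_positions[1:])]

for count, position in zip(read_counts, log_positions):
    print(f"{count} read(s): log10(frequency) = {position:.3f}")
print("gaps between bands:", [round(gap, 3) for gap in gaps])
```

      Each gap equals log10((k + 1) / k), which shrinks as k grows, so low-count barcodes form distinct lines while higher-count barcodes blend into a continuous cloud.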

      (11) There are a few typos in the figure legends of the supplementary material. For example, Figure S2: S. Typhimurium not italicized, ~7x105 no superscript. Fig. S4&5: in ", Open circles" the "O" is capitalized.

      Typos have been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This is an interesting manuscript where the authors systematically measure rG4 levels in brain samples at different ages of patients affected by AD. To the best of my knowledge this is the first time that BG4 staining is used in this context, and the authors provide compelling evidence to show an association of BG4 staining with age or AD progression, which interestingly indicates that such RNA structure might play a role in regulating protein homeostasis as previously speculated. The methods used and the results reported seem robust and reproducible. There were two main things that needed addressing:

      (1) Usually in BG4 staining experiments, to ensure that the signal detected is genuinely due to rG4, an RNase treatment experiment is performed. This does not have to be extended to all the samples presented, but having a couple of controls where the authors observe loss of staining upon RNase treatment will be key to ensure with confidence that rG4s are detected under the experimental conditions. This is particularly relevant for these brain tissue samples, where BG4 staining has never been performed before.

      (2) The authors have an association between rG4-formation and age/disease progression. They also observe distribution dependency of this, which is great. However, this is still an association which does not allow the model to be supported. This is not something that can be fixed with an easy experiment and it is what it is, but my point is that the narrative of the manuscript should be more fair and reflect the fact that, although interesting, what the authors are observing is a simple correlation. They should still go ahead and propose a model for it, but they should be more balanced in the conclusion and do not imply that this evidence is sufficient to demonstrate the proposed model. It is absolutely fine to refer to the literature and comment on the fact that similar observations have been reported and this is in line with those, but still this is not an ultimate demonstration.

      Comments on current version:

      The authors have now addressed my concerns.

      We thank the reviewer for their support!

      Reviewer #2 (Public review):

      RNA guanine-rich G-quadruplexes (rG4s) are non-canonical higher order nucleic acid structures that can form under physiological conditions. Interestingly, cellular stress is positively correlated with rG4 induction.

      In this study, the authors examined human hippocampal postmortem tissue for the formation of rG4s in aging and Alzheimer Disease (AD). rG4 immunostaining strongly increased in the hippocampus with both age and with AD severity. 21 cases were used in this study (age range 30-92).

      This immunostaining co-localized with hyper-phosphorylated tau immunostaining in neurons. The BG4 staining levels were also impacted by APOE status. rG4 structure was previously found to drive tau aggregation. Based on these observations, the authors propose a model of neurodegeneration in which chronic rG4 formation drives proteostasis collapse.

      This model is interesting, and would explain different observations (e.g., RNA is present in AD aggregates and rG4s can enhance protein oligomerization and tau aggregation).

      Main issue from the previous round of review:

      There is indeed a positive correlation between Braak stage severity and BG4 staining, but this correlation is relatively weak and borderline significant (R = 0.52, p value = 0.028). This is probably the main limitation of this study, which should be clearly acknowledged (together with a reminder that "correlation is not causality"). Related to this, there is no clear justification to exclude the four individuals in Fig 1d (without them R increases to 0.78). Please remove this statement. On the other hand, the difference based on APOE status is more striking.

      Comments on current version:

      The authors have made laudable efforts to address the criticisms I made in my evaluation of the original manuscript.

      We thank the reviewer for their support!

      Recommendations for the authors:

      Reviewing Editor:

      I would suggest two minor edits:

      - The findings are correlative and descriptive, but the title implies functionality (A New Role for RNA G-quadruplexes in Aging and Alzheimer’s Disease). I would suggest toning down this title.

      - While I understand the limitations in performing additional biochemical experiments to validate the immunofluorescence study, I think this is worth mentioning as a limitation in the text.

      We have made these two changes as requested, altering the title to remove the word “Role”, which may imply more meaning than intended, and adding a line to the discussion on the need for future additional biochemical experiments.

      Reviewer #1 (Recommendations for the authors):

      Thanks for addressing the concerns raised.

      We thank the reviewer for their support!

      Reviewer #2 (Recommendations for the authors):

      Minor point:

      Related to the "correlation is not causality" remark I made in my evaluation of the original manuscript: the authors' answer is reasonable. Still, I would suggest modifying the abstract: "we propose a model of neurodegeneration in which chronic rG4 formation drives proteostasis collapse" => "we propose a model of neurodegeneration in which chronic rG4 formation is linked to proteostasis collapse"

      All other remarks I made have been answered properly.

      We thank the reviewer for their support! We have made the change exactly as requested by the reviewer.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates lipid scrambling mechanisms across TMEM16 family members using coarse-grained molecular dynamics (MD) simulations. While the study presents a statistically rigorous analysis of lipid scrambling events across multiple structures and conformations, several critical issues undermine its novelty, impact, and alignment with experimental observations.

      Critical issues:

      (1) Lack of Novelty:

      The phenomenon of lipid scrambling via an open hydrophilic groove is already well-established in the literature, including through atomistic MD simulations. The authors themselves acknowledge this fact in their introduction and discussion. By employing coarse-grained simulations, the study essentially reiterates previously known findings with limited additional mechanistic insight. The repeated observation of scrambling occurring predominantly via the groove does not offer significant advancement beyond prior work.

      We agree with the reviewer’s statement regarding the lack of novelty when it comes to our observations of scrambling in the groove of open Ca<sup>2+</sup>-bound TMEM16 structures. However, we feel that the inclusion of closed structures in this study, which attempts to address the as yet unanswered question of how scrambling by TMEM16s occurs in the absence of Ca<sup>2+</sup>, offers new observations for the field. In our study we specifically address to what extent the induced membrane deformation, which has been theorized to help lipids cross the bilayer, especially in the absence of Ca<sup>2+</sup>, contributes to the rate of scrambling (see references 36, 59, and 66). There are also several TMEM16F structures solved under activating conditions (bound to Ca<sup>2+</sup> and in the presence of PIP2) which feature structural rearrangements to TM6 that may be indicative of an open state (PDB 6P48) and had not been tested in simulations. We show that these structures do not scramble and thereby present evidence against an out-of-the-groove scrambling mechanism for these states. Although we find a handful of examples of lipids being scrambled by Ca<sup>2+</sup>-free structures of TMEM16 scramblases, none of our simulations suggest that these events are related to the degree of deformation.

      (2) Redundancy Across Systems:

      The manuscript explores multiple TMEM16 family members in activating and non-activating conformations, but the conclusions remain largely confirmatory. The extensive dataset generated through coarse-grained MD simulations primarily reinforces established mechanistic models rather than uncovering fundamentally new insights. The effort, while statistically robust, feels excessive given the incremental nature of the findings.

      Again, we agree with the reviewer’s statement that our results largely confirm those published by other groups and our own. We think there is, however, value in comparing the scrambling competence of these TMEM16 structures in a consistent manner in a single study, to reduce inconsistencies that may be introduced by the different simulation methods, parameters, and environmental variables (such as lipid composition) used in other published works on single family members. The consistency across our simulations and the high number of observed scrambling events have allowed us to confirm that the mechanism of scrambling is shared by multiple family members and relies most obviously on groove dilation.

      (3) Discrepancy with Experimental Observations:

      The use of coarse-grained simulations introduces inherent limitations in accurately representing lipid scrambling dynamics at the atomistic level. Experimental studies have highlighted nuances in lipid permeation that are not fully captured by coarse-grained models. This discrepancy raises questions about the biological relevance of the reported scrambling events, especially those occurring outside the canonical groove.

      We thank the reviewer for bringing up the possible inaccuracies introduced by coarse graining our simulations. This is also a concern for us, and we address this issue extensively in our discussion. As the reviewer pointed out above, our CG simulations have largely confirmed existing evidence in the field which we think speaks well to the transferability of observations from atomistic simulations to the coarse-grained level of detail. We have made both qualitative and quantitative comparisons between atomistic and coarse-grained simulations of nhTMEM16 and TMEM16F (Figure 1, Figure 4-figure supplement 1, Figure 4-figure supplement 5) showing the two methods give similar answers for where lipids interact with the protein, including outside of the canonical groove. We do not dispute the possible discrepancy between our simulations and experiment, but our goal is to share new nuanced ideas for the predicted TMEM16 scrambling mechanism that we hope will be tested by future experimental studies.

      (4) Alternative Scrambling Sites:

      The manuscript reports scrambling events at the dimer-dimer interface as a novel mechanism. While this observation is intriguing, it is not explored in sufficient detail to establish its functional significance. Furthermore, the low frequency of these events (relative to groove-mediated scrambling) suggests they may be artifacts of the simulation model rather than biologically meaningful pathways.

      We agree with the reviewer that our observed number of scrambling events in the dimer interface is too low to present it as strong evidence for it being the alternative mechanism for Ca<sup>2+</sup>-independent scrambling. This will require additional experiments and computational studies which we plan to do in future research. However, we are less certain that these are artifacts of the coarse-grained simulation system as we observed a similar event in an atomistic simulation of TMEM16F.

      Conclusion:

      Overall, while the study is technically sound and presents a large dataset of lipid scrambling events across multiple TMEM16 structures, it falls short in terms of novelty and mechanistic advancement. The findings are largely confirmatory and do not bridge the gap between coarse-grained simulations and experimental observations. Future efforts should focus on resolving these limitations, possibly through atomistic simulations or experimental validation of the alternative scrambling pathways.

      Reviewer #2 (Public review):

      Summary:

      Stephens et al. present a comprehensive study of TMEM16-members via coarse-grained MD simulations (CGMD). They particularly focus on the scramblase ability of these proteins and aim to characterize the "energetics of scrambling". Through their simulations, the authors interestingly relate protein conformational states to the membrane's thickness and link those to the scrambling ability of TMEM members, measured as the trespassing tendency of lipids across leaflets. They validate their simulation with a direct qualitative comparison with Cryo-EM maps.

      Strengths:

      The study demonstrates an efficient use of CGMD simulations to explore lipid scrambling across various TMEM16 family members. By leveraging this approach, the authors are able to bypass some of the sampling limitations inherent in all-atom simulations, providing a more comprehensive and high-throughput analysis of lipid scrambling. Their comparison of different protein conformations, including open and closed groove states, presents a detailed exploration of how structural features influence scrambling activity, adding significant value to the field. A key contribution of this study is the finding that groove dilation plays a central role in lipid scrambling. The authors observe that for scrambling-competent TMEM16 structures, there is substantial membrane thinning and groove widening. The open Ca<sup>2+</sup>-bound nhTMEM16 structure (PDB ID 4WIS) was identified as the fastest scrambler in their simulations, with scrambling rates as high as 24.4 ± 5.2 events per μs. This structure also shows significant membrane thinning (up to 18 Å), which supports the hypothesis that groove dilation lowers the energetic barrier for lipid translocation, facilitating scrambling.

      The study also establishes a correlation between structural features and scrambling competence, though analyses often lack statistical robustness and quantitative comparisons. The simulations differentiate between open and closed conformations of TMEM16 structures, with open-groove structures exhibiting increased scrambling activity, while closed-groove structures do not. This finding aligns with previous research suggesting that the structural dynamics of the groove are critical for scrambling. Furthermore, the authors explore how the physical dimensions of the groove qualitatively correlate with observed scrambling rates. For example, TMEM16K induces increased membrane thinning in its open form, suggesting that membrane properties, along with structural features, play a role in modulating scrambling activity.

      Another significant finding is the concept of "out-of-the-groove" scrambling, where lipid translocation occurs outside the protein's groove. This observation introduces the possibility of alternate scrambling mechanisms that do not follow the traditional "credit-card model" of groove-mediated lipid scrambling. In their simulations, the authors note that these out-of-the-groove events predominantly occur at the dimer interface between TM3 and TM10, especially in mammalian TMEM16 structures. While these events were not observed in fungal TMEM16s, they may provide insight into Ca<sup>2+</sup>-independent scrambling mechanisms, as they do not require groove opening.

      Weaknesses:

      A significant challenge of the study is the discrepancy between the scrambling rates observed in CGMD simulations and those reported experimentally. Despite the authors' claim that the rates are in line with experiment, the observed differences can mean large energetic discrepancies in describing scrambling (larger than a 1 kT barrier in reality). For instance, the authors report scrambling rates of 10.7 events per μs for TMEM16F and 24.4 events per μs for nhTMEM16, which are several orders of magnitude faster than experimental rates. While the authors suggest that this discrepancy could be due to the Martini 3 force field's faster diffusion dynamics, this explanation does not fully account for the large difference in rates. A more thorough discussion of how the choice of force field and simulation parameters influences the results, and how these discrepancies can be reconciled with experimental data, would strengthen the conclusions. Likewise, rate calculations in the study are based on 10 μs simulations, while experimental scrambling occurs over seconds. This timescale discrepancy limits the study's accuracy, as the simulations may not capture rare or slow scrambling events that are observed experimentally and therefore might underestimate the kinetics of scrambling. It is, however, important to recognize that it is hard (borderline unachievable) to pinpoint reasonable kinetics for systems like this using the currently available computational power and force field accuracy. The faster diffusion in simulations may lead to overestimated scrambling rates, making the simulation results less comparable to real-world observations. I would therefore read the findings qualitatively rather than quantitatively. An interesting observation is the asymmetry in the scrambling rates of the two monomers. Since MARTINI is known to be limited in correctly sampling protein dynamics, the authors, in order to preserve the fold, have applied a strong (500 kJ mol<sup>-1</sup> nm<sup>-2</sup>) elastic network. However, I am wondering how the ENM applies across the dimer and whether any asymmetry can be noticed in the application of restraints for each monomer and at the dimer interface. How might this have biased the asymmetry in the scrambling rates observed between the monomers? Is this artificially obtained from restraining the initial structure, or is the asymmetry somehow gatekeeping the scrambling mechanism to occur mainly through a single monomer? Answering this question would have far-reaching implications for better describing the mechanism of scrambling.

      The main aim of our computational survey was to directly compare all relevant published TMEM16 structures in both open and closed states using the Martini 3 CGMD force field. Our standardized simulation and analysis protocol allowed us to quantitatively compare scrambling rates across the TMEM16 family, something that has never been done before. We do acknowledge that direct comparison between simulated and experimental scrambling rates is complicated and that such comparisons are best interpreted qualitatively. In line with other reports (e.g., Li et al, PNAS 2024), lipid scrambling in CGMD is 2-3 orders of magnitude faster than typical experimental findings. In the CG simulation field, these accelerated dynamics, which result from the smoother energy landscape, are a well-known phenomenon. In our view, this is a valuable trade-off for being able to capture statistically robust scrambling dynamics and gain mechanistic understanding in the first place, since these are currently challenging to obtain otherwise. For example, with all-atom MD it would have been near-impossible to conclude that groove openness and high scrambling rates are closely related, simply because one would only measure a handful of scrambling events in (at most) a handful of structures.
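
      As a rough illustration of the scale involved (a sketch, not an analysis taken from the manuscript), a ratio of rates can be translated into an apparent barrier difference. The assumption here is simple Arrhenius-type kinetics with identical prefactors, so that the rate ratio equals exp(ΔΔG/kT); the function name `barrier_shift_kT` is hypothetical.

```python
import math


def barrier_shift_kT(rate_ratio: float) -> float:
    """Apparent barrier difference, in units of kT, implied by a ratio of rates.

    Assumes Arrhenius-type kinetics with equal prefactors:
    k_fast / k_slow = exp(ddG / kT), hence ddG / kT = ln(k_fast / k_slow).
    """
    if rate_ratio <= 0:
        raise ValueError("rate_ratio must be positive")
    return math.log(rate_ratio)


# A speed-up of 2 to 3 orders of magnitude, as quoted for CGMD versus experiment.
for orders in (2, 3):
    ratio = 10.0 ** orders
    print(f"{orders} orders of magnitude faster: about {barrier_shift_kT(ratio):.1f} kT lower barrier")
```

      Under these assumptions, a 100-fold to 1000-fold speed-up corresponds to an apparent barrier that is roughly 4.6 to 6.9 kT lower, which is why absolute rates from such simulations are better read qualitatively.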

      Considering the elastic network: the reviewer is correct in that the elastic network restrains the overall structure to the experimental conformation. This is necessary because the Martini 3 force field does not accurately model changes in secondary (and tertiary) structure. In fact, by retaining the structural information from the experimental structures, we argue that the elastic network helped us arrive at the conclusion that groove openness is the major contributing factor in determining a protein’s scrambling rate. This is best exemplified by the asymmetric X-ray structure of TMEM16K (5OC9), in which the groove of one subunit is more dilated than the other. In our simulation, this information was stored in the elastic network, yielding a 4x higher rate in the open groove than in the closed groove, within the same trajectory.

      Notably, the manuscript does not explore the impact of membrane composition on scrambling rates. While the authors use a specific lipid composition (DOPC) in their simulations, they acknowledge that membrane composition can influence scrambling activity. However, the study does not explore how different lipids or membrane environments, or varying membrane curvature and tension, could alter scrambling behaviour. I appreciate that this might have been beyond the scope of this particular paper and that the authors plan to pursue these questions further, as this work sets a strong protocol for such studies. Considering scrambling in the context of membrane composition is particularly relevant since the authors note that TMEM16K's scrambling rate increases tenfold in thinner membranes, suggesting that lipid-specific or membrane-thickness-dependent effects could play a role.

      Considering different membrane compositions: for this study, we chose to keep the membranes as simple as possible. We opted for pure DOPC membranes because DOPC (1) has negligible intrinsic curvature, (2) forms fluid membranes, and (3) was used previously by others (Li et al, PNAS 2024). As mentioned by the reviewer, we believe our current study defines a good standardized protocol and a solid baseline for future efforts looking into the additional effects of membrane composition, tension, and curvature, all of which could affect TMEM16-mediated lipid scrambling.

      Reviewer #3 (Public review):

      Strengths:

      The strength of this study emerges from a comparative analysis of multiple structural starting points and from understanding global/local motions of the protein with respect to lipid movement. Although the protein is well-studied, both experimentally and computationally, the understanding of conformational events in different family members, especially the reduced membrane thinning compared to fungal scramblases, offers good insights.

      We appreciate the reviewer recognizing the value of the comparative study. In addition to valuable insights from previous experimental and computational work, we hope to put forward a unifying framework that highlights various TMEM16 structural features and membrane properties that underlie scrambling function.

      Weaknesses:

      A weakness of the work is that it does not fully reconcile its findings with the experimental evidence of Ca<sup>2+</sup>-independent scrambling rates observed in prior studies, although this is challenging to do with coarse-grained molecular simulations. Previous reports have identified lipid crossing, packing defects, and other associated events, so it is difficult to place this paper in that context. However, the absence of validation leaves certain claims, like alternative scrambling pathways, speculative.

      It is generally difficult to quantitatively compare bulk measurements of scrambling phenomena with simulation results. The advantage of simulations is to directly observe the transient scrambling events at a spatial and temporal resolution that is currently unattainable for experiments. The current experimental evidence for the precise mechanism of Ca<sup>2+</sup>-independent scrambling is still under debate. We therefore hope to leverage the strength of MD and statistical rigor of coarse-grained simulations to generate testable hypotheses for further structural, biochemical, and computational studies.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review):

      This experiment sought to determine what effect congenital/early-onset hearing loss (and associated delay in language onset) has on the degree of inter-individual variability in functional connectivity to the auditory cortex. Looking at differences in variability rather than group differences in mean connectivity itself represents an interesting addition to the existing literature. The sample of deaf individuals was large, and quite homogeneous in terms of age of hearing loss onset, which are considerable strengths of the work. The experiment appears well conducted and the results are certainly of interest.

      Thank you for your positive and thoughtful feedback.

      Reviewer #3 (Public review):

      Summary:

      This study focuses on changes in brain organization associated with congenital deafness. The authors investigate differences in functional connectivity (FC) and differences in the variability of FC. By comparing congenitally deaf individuals to individuals with normal hearing, and by further separating congenitally deaf individuals into groups of early and late signers, the authors can distinguish between changes in FC due to auditory deprivation and changes in FC due to late language acquisition. They find larger FC variability in deaf than normal-hearing individuals in temporal, frontal, parietal, and midline brain structures, and that FC variability is largely driven by auditory deprivation. They suggest that the regions that show a greater FC difference between groups also show greater FC variability.

      Strengths:

      The manuscript is well-written, and the methods are clearly described and appropriate. Including the three different groups enables the critical contrasts distinguishing between different causes of FC variability changes. The results are interesting and novel.

      Weaknesses:

      Analyses were conducted for task-based data rather than resting-state data. The authors report behavioral differences between groups and include behavioral performance as a nuisance regressor in their analysis. This is a good approach to account for behavioral task differences, given the data. Nevertheless, additional work using resting-state functional connectivity could remove the potential confound fully.

      The authors have addressed my concerns well.

      Thank you for your thoughtful feedback. We appreciate your acknowledgment of the strengths of our study and the approaches taken to address potential confounds. As noted, we discuss the limitation of not including resting-state data in the manuscript, and we agree that this represents an important avenue for future research. We hope to address this question in future studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The paper proposes that the placement of criteria for determining whether a stimulus is 'seen' or 'unseen' can significantly impact the validity of neural measures of consciousness. The authors found that conservative criteria, which require stronger evidence to classify a stimulus as 'seen,' tend to inflate effect sizes in neural measures, making conscious processing appear more pronounced than it is. Conversely, liberal criteria, which require less evidence, reduce these effect sizes, potentially underestimating conscious processing. This variability in effect sizes due to criterion placement can lead to misleading conclusions about the nature of conscious and unconscious processing.

      Furthermore, the study highlights that the Perceptual Awareness Scale (PAS), a commonly used tool in consciousness research, does not effectively mitigate these criterion-related confounds. This means that even with PAS, the validity of neural measures can still be compromised by how criteria are set. The authors emphasize the need for careful consideration and standardization of criterion placement in experimental designs to ensure that neural measures accurately reflect the underlying cognitive processes. By addressing this issue, the paper aims to improve the reliability and validity of findings in the field of consciousness research.
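
      The criterion effect summarized above can be illustrated with a minimal equal-variance signal detection sketch. This is an illustration under stated assumptions, not the simulation used in the paper: internal evidence on target-present trials is taken to be normally distributed with mean d' and unit variance, target-absent trials have mean 0, and a trial is sorted as 'seen' when the evidence exceeds the criterion. The function name `sorted_target_means` and all parameter values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the pdf and cdf


def sorted_target_means(d_prime: float, criterion: float) -> tuple[float, float]:
    """Mean internal evidence on target-present trials after post hoc sorting.

    Target-present evidence ~ N(d_prime, 1). Trials with evidence above the
    criterion count as 'seen', the rest as 'unseen'. Returns the two means
    (seen, unseen), using the closed-form mean of a truncated normal.
    """
    a = criterion - d_prime            # criterion relative to the target mean
    pdf = STD_NORMAL.pdf(a)
    p_unseen = STD_NORMAL.cdf(a)       # proportion of misses
    seen_mean = d_prime + pdf / (1.0 - p_unseen)
    unseen_mean = d_prime - pdf / p_unseen
    return seen_mean, unseen_mean


D_PRIME = 1.5  # hypothetical sensitivity, held constant across conditions
for label, criterion in (("liberal", 0.25), ("conservative", 1.25)):
    seen, unseen = sorted_target_means(D_PRIME, criterion)
    # Target-absent trials have mean 0, so these means are also the
    # target-minus-nontarget differences in this toy model.
    print(f"{label:>12}: seen = {seen:.2f}, unseen = {unseen:.2f}")
```

      With sensitivity held fixed, moving the criterion in the conservative direction raises the mean evidence in both the 'seen' and the 'unseen' bins, which is the sense in which a conservative criterion can inflate effect sizes after post hoc sorting.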

      Strengths:

      (1) This research provides a fresh perspective on how criterion placement can significantly impact the validity of neural measures in consciousness research.

      (2) The study employs robust simulations and EEG experiments to demonstrate the effects of criterion placement, ensuring that the findings are well-supported by empirical evidence.

      (3) By highlighting the limitations of the PAS and the impact of criterion placement, the study offers practical recommendations for improving experimental designs in consciousness research.

      Weaknesses:

      The study focuses primarily on the PAS, which is a commonly used tool, but there are other measures of consciousness that were not evaluated and that might also be subject to similar or different criterion limitations. A simulation could be applied to these metrics to show how generalizable the conclusion of the study is.

      We would like to thank reviewer 1 for their positive words and for taking the time to evaluate our manuscript. We agree that it would be important to gauge generalization to other metrics of consciousness. Note, however, that the most commonly used alternative methods are post-decision wagering and confidence, both of which are known to behave quite similarly to the PAS (Sandberg, Timmermans, Overgaard & Cleeremans, 2010). Indeed, we have confirmed in other work that confidence is also sensitive to criterion shifts (see https://osf.io/preprints/psyarxiv/xa4fj). Although it has been claimed that confidence-derived aggregate metrics like meta-d’ or metacognitive efficiency may overcome criterion shifts, it would require empirical data rather than simulation to settle whether this is true or not (also see the discussion in https://osf.io/preprints/psyarxiv/xa4fj). Furthermore, out of these metrics, the PAS seems to be the preferred one amongst consciousness researchers (see figure 4 in Francken, Beerendonk, Molenaar, Fahrenfort, Kiverstein, Seth, van Gaal, 2022; as well as https://osf.io/preprints/psyarxiv/bkxzh). Thus, given that other metrics are expected to behave in similar ways, and that more empirical work would be required to determine along which dimension(s) criterion shifts would operate in alternative metrics, we see no clear path to implement the suggested simulations. We anticipate that attempting this would require a considerable amount of additional work and would involve resolving many open questions, which we believe would better suit a future project. We would of course be open to doing this if the reviewer has more specific suggestions for how to go about the proposed simulations.

      Reviewer #2 (Public review):

      Summary:

      The study investigates the potential influence of the response criterion on neural decoding accuracy for conscious and unconscious processing, either using simulated data or reanalyzing experimental data with post hoc sorting.

      Strengths:

      When comparing the neural decoding performance of Target versus NonTarget with or without post-hoc sorting based on subject reports, it is evident that response criterion can influence the results. This was observed in simulated data as well as in two experiments that manipulated the subject response criterion to be either more liberal or more conservative. One experiment involved a two-level response (seen vs unseen), while the other included a more detailed four-level response (ranging from 0 for no experience to 3 for a clear experience). The findings consistently indicated that adopting a more conservative response criterion could enhance neural decoding performance, whether in conscious or unconscious states, depending on the sensitivity or overall response threshold.

      Weaknesses:

      (1) The response criterion plays a crucial role in influencing neural decoding because a subject's report may not always align with the actual stimulus presented. This discrepancy can occur in cases of false alarms, where a subject reports seeing a target that was not actually there, or in cases where a target is present but not reported. Some may argue that only using data from consistent trials (those with correct responses) would not be affected by the response criterion. However, the authors' analysis suggests that a conservative response criterion not only reduces false alarms but also impacts hit rates. It is important for the authors to further investigate how the response criterion affects neural decoding even when considering only correct trials.

      We would like to thank reviewer 2 for taking the time to evaluate our manuscript. We appreciate the suggestion to investigate neural decoding on only correct trials. What we in fact did is consider target trials that are 'correct' (hits = seen target present trials) and 'incorrect' (misses = unseen target present trials) separately, see figure 4A and figure 4B. This shows that the response criterion also affects the neural measure of consciousness when only considering correct target present trials. Note however, that one cannot decode 'unseen' (target present) trials if one only aims to decode 'correct' trials, because those are all incorrect by definition. We did not analyze false alarms (these would be the 'seen' trials on the noise distribution of Figure 1A), as there were not enough trials of those, especially in the conservative condition (see Figure 2C and 2D), making comparisons between conservative and liberal impossible. However, the predictions for false alarms are pretty straightforward, and follow directly from the framework in Figure 1.
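
      The trial categories discussed above (hits, misses, false alarms) map onto the standard equal-variance signal detection measures of sensitivity (d') and criterion (c). The sketch below uses the textbook formulas; the rates are hypothetical numbers chosen for illustration and are not taken from the experiments, and the function name `dprime_and_criterion` is likewise an assumption.

```python
from statistics import NormalDist


def dprime_and_criterion(hit_rate: float, fa_rate: float) -> tuple[float, float]:
    """Equal-variance SDT sensitivity (d') and criterion (c).

    d' = z(H) - z(FA) and c = -(z(H) + z(FA)) / 2, where z is the inverse
    of the standard normal cdf. Both rates must lie strictly between 0 and 1;
    rates of exactly 0 or 1 (e.g., no false alarms at all) need a correction
    before they can be used here.
    """
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


# Hypothetical rates: similar sensitivity, different criterion placement.
for label, hit_rate, fa_rate in (("liberal", 0.80, 0.20), ("conservative", 0.50, 0.05)):
    d_prime, c = dprime_and_criterion(hit_rate, fa_rate)
    print(f"{label:>12}: d' = {d_prime:.2f}, c = {c:.2f}")
```

      In this toy example the two conditions have nearly the same d' but clearly different c, and the conservative condition yields far fewer false alarms, which is the practical reason why false-alarm trials can become too scarce to analyze.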

      (2) The author has utilized decoding target vs. nontarget as the neural measures of unconscious and/or conscious processing. However, it is important to note that this is just one of the many neural measures used in the field. There are an increasing number of studies that focus on decoding the conscious content, such as target location or target category. If the author were to include results on decoding target orientation and how it may be influenced by response criterion, the field would greatly benefit from this paper.

      We thank the reviewer for the suggestion to decode the orientation of the target. In our experiments, the target itself does not have an orientation, but the texture of which it is composed does. We used four orientations, which were balanced within and across conditions such that presence-absence decoding is never driven by orientation, but rather by texture-based figure-ground segregation (for similar logic, see for example Fahrenfort et al, 2007; 2008 etc). There are a couple of things to consider when wanting to apply a decoding analysis to the orientation of these textures:

      (1) Our behavioral task was only on the presence or absence of the target, not on the orientation of the textures. This makes it impossible to draw any conclusions about the visibility of the orientation of the textures. Put differently: based on behavior there is no way of identifying seen or unseen orientations, correctly or incorrectly identified orientations etc. For example, it is easy to envision that an observer detects a target without knowing the orientation that defines it, or vice versa a situation in which an observer does not detect the target while still being aware of the orientation of a texture in the image (either of the figure, or of the background). The fact that we have no behavioral response to the orientation of the textures severely limits the usefulness of a hypothetical decoding effect on these orientations, as such results would be uninterpretable with respect to the relevant dimension in this experiment, which is visibility.

      (2) This problem is further exacerbated by the fact that the orientation of the background is always orthogonal to the orientation of the target. Therefore, one would not only be decoding the orientation of the texture that constitutes the target itself, but also the texture that constitutes the background. Given that we also have no behavioral metric of how/whether the orientation of the background is perceived, it is similarly unclear how one would interpret any observed effect.

      (3) Finally, it is important to note that, even though categorization/content is sometimes used as an auxiliary measure in consciousness research (often as a way to assay objective performance), consciousness is most commonly conceptualized on the presence-absence dimension. A clear illustration of this is the concept of blindsight. Blindsight is the ability of observers to discriminate stimuli (i.e. identify content) without being able to detect them. Blindsight is often considered the bedrock of the cognitive neuroscience of consciousness as it acts as proof that one can dissociate between unconscious processing (the categorization of a stimulus, i.e. the content) and conscious processing of that stimulus (i.e. the ability to detect it).

      Given the above, we do not see how the suggested analysis could contribute to the conclusions that the manuscript already establishes. We hope that – given the above - the reviewer agrees with this assessment.

      Reviewer #3 (Public review):

      Summary:

      Fahrenfort et al. investigate how liberal or conservative criterion placement in a detection task affects the construct validity of neural measures of unconscious cognition and conscious processing. Participants identified instances of "seen" or "unseen" in a detection task, a method known as post hoc sorting. Simulation data convincingly demonstrate that, counterintuitively, a conservative criterion inflates effect sizes of neural measures compared to a liberal criterion. While the impact of criterion shifts on effect size is suggested by signal detection theory, this study is the first to address this explicitly within the consciousness literature. Decoding analysis of data from two EEG experiments further shows that different criteria lead to differential effects on classifier performance in post hoc sorting. The findings underscore the pervasive influence of experimental design and participants' reports on neural measures of consciousness, revealing that criterion placement poses a critical challenge for researchers.
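
      The criterion effect summarized above can be illustrated with a minimal sketch under textbook equal-variance signal detection assumptions. This is not the authors' simulation: the function name, the unit-variance Gaussian evidence model, and the example values for d' and the two criteria are all assumptions chosen for illustration. The sketch computes the mean internal evidence on target-present trials that end up sorted as "unseen" and "seen".

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def mean_evidence_by_report(d_prime, criterion):
    """Mean internal evidence on target-present trials, split by report.

    Assumption: evidence on target trials ~ Normal(d_prime, 1), and a trial
    is reported "seen" whenever the evidence exceeds the criterion.
    Returns (mean evidence on unseen trials, mean evidence on seen trials),
    using the standard formulas for the mean of a truncated normal.
    """
    z = criterion - d_prime
    p_unseen = STD_NORMAL.cdf(z)   # proportion of target trials sorted as unseen
    density = STD_NORMAL.pdf(z)
    mean_unseen = d_prime - density / p_unseen
    mean_seen = d_prime + density / (1.0 - p_unseen)
    return mean_unseen, mean_seen


# Hypothetical values: same sensitivity, two different criterion placements.
liberal_unseen, liberal_seen = mean_evidence_by_report(d_prime=1.0, criterion=0.0)
conservative_unseen, conservative_seen = mean_evidence_by_report(d_prime=1.0, criterion=1.5)
```

      In this toy model, moving the criterion in the conservative direction raises the mean evidence in both the "unseen" and the "seen" bins, even though sensitivity is unchanged. If a neural measure scales with internal evidence (an assumption), post hoc sorted effects would grow accordingly.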

      Strengths and Weaknesses:

      One of the strengths of this study is the inclusion of the Perceptual Awareness Scale (PAS), which allows participants to provide more nuanced responses regarding their perceptual experiences. This approach ensures that responses at the lowest awareness level (selection 0) are made only when trials are genuinely unseen. This methodological choice is important as it helps prevent the overestimation of unconscious processing, enhancing the validity of the findings.

      A potential area for improvement in this study is the use of single time-points from peak decoding accuracy to generate current source density topography maps. While we recognize that the decoding analysis employed here differs from traditional ERP approaches, the robustness of the findings could be enhanced by exploring current source density over relevant time windows. Event-related peaks, both in terms of timing and amplitude, can sometimes be influenced by noise or variability in trial-averaged EEG data, and a time-window analysis might provide a more comprehensive and stable representation of the underlying neural dynamics.

      We thank reviewer 3 for their positive words and for taking the time to evaluate our manuscript. If we understand the reviewer correctly, he/she suggests that the signal-to-noise ratio could be improved by averaging over time windows rather than taking the values at singular peaks in time. Before addressing this suggestion, we would like to point out that we plotted the relevant effects across time in Supplementary Figure S1A and S1B. These show that the observed effects were not somehow limited in time, i.e. only occurring around the peaks, but that they consistently occurred throughout the time course of the trial. In line with this observation, one might argue that the results could be improved further by averaging across windows of interest rather than taking the peak moments alone, as the reviewer suggests. Although this might be true, there are many analysis choices that one can make, each of which could have a positive (or negative) effect on the signal-to-noise ratio. For example, when taking a window of interest, one is faced with a new choice to make, this time regarding the number of consecutive samples to average across (i.e. the size of the window), etc. More generally there is a long list of choices that may affect the precise outcome of analyses, either positively or negatively. Having analyzed the data in one way, the problem with adding new analysis approaches is that there is no objective criterion for deciding which analysis would be ‘best’, other than looking at the outcome of the statistical analyses themselves. Doing this would constitute an explorative double-dipping-like approach to analyzing the results, which – aside from potentially increasing the signal-to-noise ratio – is likely to also result in an increase of the type I error rate. In the past, when the first author of this manuscript has attempted to minimize the number of statistical tests, he has lowered the number of EEG time points by simply taking the peaks (for example see https://doi.org/10.1073/pnas.1617268114), and that is the approach that was taken here as well. Given the above, we prefer not to further ‘try out’ additional analytical approaches on this dataset, simply to improve the results. We hope the reviewer sympathizes with our position that it is methodologically most sound to stick to the analyses we have already performed and reported, without further exploration.

      It is helpful that the authors show the standard error of the mean for the classifier performance over time. A similar indication of a measure of variance in other figures could improve clarity and transparency.

      That said, the paper appears solid regarding technical issues overall. The authors also do a commendable job in the discussion by addressing alternative paradigms, such as wagering paradigms, as a possible remedy to the criterion problem (Peters & Lau, 2015; Dienes & Seth, 2010). Their consideration of these alternatives provides a balanced view and strengthens the overall discussion.

      We thank the reviewer for this suggestion. Note that we already have a measure of variance in the other figures too, namely showing the connected data points of individual participants. Individual data points as a visualization of variance are preferred by many journals (e.g., see https://www.nature.com/documents/cr-gta.pdf), and also show the spread of relevant differences when paired points are connected. For example, in Figures 2, 3 and 4, the relevant difference is between the liberal and conservative condition. When wanting to show the spread of the differences between these conditions, one option would be to first subtract the two measures in a pairwise fashion (e.g., liberal-conservative), and then plot the spread of those differences using some metric (e.g. standard error/CI of the mean difference). However, this has the disadvantage of no longer separately showing the raw scores on the conditions that are being compared. Showing conditions separately provides clarity to the reader about what is being compared to what. The most common approach to visualizing the variance of the relevant difference in such cases is to plot the connected individual data points of all participants in the same plot. The uniformity of the slope of these lines in such a visualization provides direct insight into the spread of the relevant difference. Plotting the standard error of the mean on the raw scores of the conditions in these plots would not help, because this would not visualize the spread of the relevant difference (liberal-conservative). We therefore opted in the manuscript to show the mean scores on the conditions that we compare, while also showing the connected raw data points of individual participants in the same plot. One might argue that we should then use that same visualization in Figure 3A, but note that this figure is merely intended to identify the peaks, i.e. it does not compare liberal to conservative. Furthermore, plotting the decoding time lines of individual participants would greatly diminish the clarity of this figure. Given our explanation, we hope the reviewer agrees with the approach that we chose, although we are of course open to modifying the figures if the reviewer has a suggestion for doing so while taking into account the points we raise here in our response.
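
      The distinction drawn above between the spread of the raw condition scores and the spread of the paired difference can be made concrete with a short sketch. The numbers below are made up purely for illustration and are not data from the manuscript; the function and variable names are likewise hypothetical.

```python
from math import sqrt
from statistics import stdev


def sem(values):
    """Standard error of the mean of a list of numbers."""
    return stdev(values) / sqrt(len(values))


# Hypothetical per-participant scores (NOT data from the manuscript).
liberal = [0.55, 0.62, 0.70, 0.58, 0.66]
conservative = [0.58, 0.65, 0.74, 0.61, 0.70]

# Pairwise differences: one value per participant.
differences = [c - l for l, c in zip(liberal, conservative)]

sem_liberal = sem(liberal)            # between-participant spread of raw scores
sem_conservative = sem(conservative)  # between-participant spread of raw scores
sem_difference = sem(differences)     # spread of the within-participant difference
```

      In this toy example the participants differ a lot from one another, but the within-participant difference is nearly constant, so the standard error of the difference is much smaller than that of either condition. Error bars on the raw scores would hide this, whereas connected individual data points show it directly as near-parallel lines.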

      Impact of the Work:

      This study effectively demonstrates a phenomenon that has been largely unexplored within the consciousness literature. Subjective measures may not reliably capture the construct they aim to measure due to criterion confounds. Future research on neural measures of consciousness should account for this issue, and no-report measures may be necessary until the criterion problem is resolved.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors could further elaborate on the results of the PAS to provide a clearer insight into the impact of response criteria, which is notably more complex than in other experiments. Specifically, the results demonstrate that conservative response criterion condition displays a considerably higher sensitivity compared to those with a liberal response criterion. It would be interesting to explore whether this shift in sensitivity suggests a correlation between changes in response criteria and conscious experiences, and how the interaction between sensitivity and response criteria can affect the neural measure of consciousness.

      We thank the reviewer for this suggestion. Note that the change in sensitivity that we observed is minor compared to the change we observed in response criterion (Hedges' g criterion in exp 2 = 2.02, compared to Hedges' g sensitivity/d’ in exp 2 = 0.42). However, we do investigate the effect of sensitivity (disregarding response criterion) on decoding accuracy. To this end we devised Figure 3C (for the full decoding time course see Supplementary Figure S1B). These figures show that the small behavioral sensitivity effects observed in both experiments (Hedges' g sensitivity in exp 1 = 0.30, exp 2 = 0.42) did not translate into significant decoding differences between conservative and liberal in either experiment. This comes as no surprise given the small corresponding behavioral effects. Note that small sensitivity differences between liberal and conservative conditions are commonplace, plausibly driven by the fact that being liberal also involves being more noisy in one’s response tendencies (i.e. sometimes randomly indicating presence). Further, the reviewer suggests that we might correlate changes in response criteria to changes in conscious experience. The only relevant metric of conscious experience for which we have data in this manuscript is the Perceptual Awareness Scale (PAS), so we assume the reviewer asks for a correlation between experimentally induced changes in response criterion with the equivalent changes in d’. To this end we computed the difference in the PAS-based d’ metric between conservative and liberal, as well as the difference in the PAS-based criterion metric between conservative and liberal, and correlated these across subjects (N=26) using a Spearman rank correlation. The result shows that these metrics do not correlate (r(24)=0.04, p=0.85). Note, however, that small-N correlations like these are only somewhat reliable for large effect sizes. With an N of 26 and a power of merely 80%, an effect size of at least r=0.5 is required for a correlation to be detectable, so even if a correlation were to exist we may not have had enough power to detect it. Due to these caveats we opted to not report this null-correlation in the manuscript, but we are of course willing to do so if the reviewer and/or editor disagrees with this assessment.
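
      The power statement above can be checked approximately with a short calculation. The sketch below is not the computation used for the response: it assumes a two-sided test at alpha = 0.05 and uses the Fisher z approximation for a Pearson correlation (a Spearman correlation has a slightly larger standard error, so the true threshold would be somewhat higher). The function name and default parameters are illustrative.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_detectable_r(n, alpha=0.05, power=0.80):
    """Smallest correlation detectable with the given power.

    Uses the Fisher z approximation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3). Assumes a two-sided test.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return tanh((z_alpha + z_power) / sqrt(n - 3))


r_needed = min_detectable_r(26)
```

      Under these assumptions, the minimum detectable correlation for N = 26 comes out a little above 0.5, in line with the figure quoted in the response.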

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tubert C. et al. investigated the role of dopamine D5 receptors (D5R) and their downstream potassium channel, Kv1, in the striatal cholinergic neuron pause response induced by thalamic excitatory input. Using slice electrophysiological analysis combined with pharmacological approaches, the authors tested which receptors and channels contribute to the cholinergic interneuron pause response in both control and dyskinetic mice (in the L-DOPA off state). They found that activation of Kv1 was necessary for the pause response, while activation of D5R blocked the pause response in control mice. Furthermore, in the L-DOPA off state of dyskinetic mice, the absence of the pause response was restored by the application of clozapine. The authors claimed that (1) the D5R-Kv1 pathway contributes to the cholinergic interneuron pause response in a phasic dopamine concentration-dependent manner, and (2) clozapine inhibits D5R in the L-DOPA off state, which restores the pause response.

      Strengths:

      The electrophysiological and pharmacological approaches used in this study are powerful tools for testing channel properties and functions. The authors' group has well-established these methodologies and analysis pipelines. Indeed, the data presented were robust and reliable.

      Thank you for your comments.

      Weaknesses:

      Although the paper has strengths in its methodological approaches, there is a significant gap between the presented data and the authors' claims.

      There was no direct demonstration that the D5R-Kv1 pathway is dominant when dopamine levels are high. The term 'high' is ambiguous, and it raises the question of whether the authors believe that dopamine levels do not reach the threshold required to activate D5R under physiological conditions.

      We acknowledge that further work is necessary to clarify the role of the D5R in physiological conditions. While we haven’t found effects of the D1/D5 receptor antagonist SCH23390 on the pause response in control animals (Fig. 3), it is still possible that dopamine levels reach the threshold to stimulate D5R when burst firing of dopaminergic neurons contributes to dopamine release. We believe the pause response depends, among other factors, on the relative stimulation levels of SCIN D2 and D5 receptors, which is likely not an all-or-nothing phenomenon. To reduce ambiguity, we have eliminated the labels referring to dopamine levels in Figure 6F.

      Furthermore, the data presented in Figure 6 are confusing. If clozapine inhibits active D5R and restores the pause response, the D5R antagonist SCH23390 should have the same effect. The data suggest that clozapine-induced restoration of the pause response might be mediated by other receptors, rather than D5R alone.

      Thank you for letting us clarify this issue. Please note that the levels of endogenous dopamine 24 h after the last L-DOPA challenge in severe parkinsonian mice are expected to be very low. In the absence of an agonist, a pure D1/D5 antagonist would not exert an effect, as demonstrated with SCH23390 alone, which did not have an impact on the SCIN response to thalamic stimulation (Fig. 6). While clozapine can also act as a D1/D5 receptor antagonist, its D1/D5 effects in the absence of an agonist are attributed to its inverse agonist properties (PMID: 24931197). Notably, SCH23390 prevented the effect of clozapine, allowing us to conclude that ligand-independent D1/D5 receptor-mediated mechanisms are involved in suppressing the pause response in dyskinetic mice. We have now made this clearer in the third paragraph of the Discussion.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Tubert et al presents the role of the D5 receptor in modulating the striatal cholinergic interneuron (CIN) pause response through D5R-cAMP-Kv1 inhibitory signaling. Their model elucidates the on / off switch of CIN pause, likely due to the different DA affinity between D2R and D5R. This machinery may be crucial in modulating synaptic plasticity in cortical-striatal circuits during motor learning and execution. Furthermore, the study bridges their previous finding of CIN hyperexcitability (Paz et al., Movement Disorder 2022) with the loss of pause response in LID mice.

      Strengths:

      The study had solid findings, and the writing was logically structured and easy to follow. The experiments are well-designed, and they properly combined electrophysiology recording, optogenetics, and pharmacological treatment to dissect/rule out most, if not all, possible mechanisms in their model.

      Thank you for your comments.

      Weaknesses:

      The manuscript is overall satisfying with only some minor concerns that need to be addressed. Manipulation of intracellular cAMP (e.g. using pharmacological analogs or inhibitors) can add additional evidence to strengthen the conclusion.

      Thank you for the suggestion. While we acknowledge that we are not providing direct evidence of the role of cAMP, we chose not to conduct these experiments because cAMP levels influence several intrinsic and synaptic currents beyond Kv1, significantly affecting membrane oscillations and spontaneous firing, as shown in Paz et al. 2021. However, we are modifying the fourth paragraph of the Discussion to avoid any misinterpretation of our findings in the current work.

      Reviewer #3 (Public review):

      Summary:

      Tubert et al. investigate the mechanisms underlying the pause response in striatal cholinergic interneurons (SCINs). The authors demonstrate that optogenetic activation of thalamic axons in the striatum induces burst activity in SCINs, followed by a brief pause in firing. They show that the duration of this pause correlates with the number of elicited action potentials, suggesting a burst-dependent pause mechanism. The authors demonstrated that this burst-dependent pause relied on Kv1 channels. The pause is blocked by SKF81297 and partially by sulpiride and mecamylamine, implicating D1/D5 receptor involvement. The study also shows that ZD7288 does not reduce the duration of the pause and that lesioning dopamine neurons abolishes this response, which can be restored by clozapine.

      Weaknesses:

      While this study presents an interesting mechanism for SCIN pausing after burst activity, there are several major concerns that should be addressed:

      (1) Scope of the Mechanism:

      It is important to clarify that the proposed mechanism may apply specifically to the pause in SCINs following burst activity. The manuscript does not provide clear evidence that this mechanism contributes to the pause response observed in behavioral animals. While the thalamus is crucial for SCIN pauses in behavioral contexts, the exact mechanism remains unclear. Activating thalamic input triggers burst activity in SCINs, leading to a subsequent pause, but this mechanism may not be generalizable across different scenarios. For instance, approximately half of TANs do not exhibit initial excitation but still pause during behavior, suggesting that the burst-dependent pause mechanism is unlikely to explain this phenomenon. Furthermore, in behavioral animals, the duration of the pause seems consistent, whereas the proposed mechanism suggests it depends on the prior burst, which is not aligned with in vivo observations. Additionally, many in vivo recordings show that the pause response is a reduction in firing rate, not complete silence, which the mechanism described here does not explain. Please address these in the manuscript.

      Thank you for your valuable feedback. While the absence of an initial burst in some TANs in vivo may suggest the involvement of alternative or additional mechanisms, this does not exclude the participation of Kv1 currents. We have seen that subthreshold depolarizations induced by thalamic inputs are sufficient to produce an afterhyperpolarization (AHP) mediated by Kv1 channels (see Tubert et al., 2016, PMID: 27568555). Although such subthreshold depolarizations are not captured in current recordings from behaving animals, intracellular in vivo recordings have demonstrated an intrinsically generated AHP after subthreshold depolarization of SCIN caused by stimulation of excitatory afferents (PMID: 15525771). Additionally, when pause duration is plotted against the number of spikes elicited by thalamic input (Fig. 1G), we found that one elicited spike is followed by an interspike interval 1.4 times longer than the average spontaneous interspike interval. We acknowledge the potential involvement of additional factors, including a decrease of excitatory thalamic input coinciding with the pause, followed by a second volley of thalamic inputs (Fig. 1J-K, after observations by Matsumoto et al., 2001; PMID: 11160526), as well as the timing of elicited spikes relative to ongoing spontaneous firing (Fig. 1D-E). Dopaminergic modulation (Fig. 3) and regional differences among striatal regions (PMID: 24559678) may also contribute to the complexity of these dynamics.

      (2) Terminology:

      The use of "pause response" throughout the manuscript is misleading. The pause induced by thalamic input in brain slices is distinct from the pause observed in behavioral animals. Given the lack of a clear link between these two phenomena in the manuscript, it is essential to use more precise terminology throughout, including in the title, bullet points, and body of the manuscript.

      While we acknowledge that our study does not include in vivo evidence, we believe ex vivo preparations have been instrumental in elucidating the mechanisms underlying the responses observed in vivo. We also agree with previous ex vivo studies in using consistent terminology. However, we will clarify the ex vivo nature of our work in the abstract and bullet points for greater transparency.

      (3) Kv1 Blocker Specificity:

      It is unclear how the authors ruled out the possibility that the Kv1 blocker did not act directly on SCINs. Could there be an indirect effect contributing to the burst-dependent pause? Clarification on this point would strengthen the interpretation of the results.

      Thank you for letting us clarify this issue. In our previous work (Tubert et al., 2016) we showed that the Kv1.3 and Kv1.1 subunits are selectively expressed in SCIN throughout the striatum. Moreover, GABAergic transmission is blocked in our preparations. We are including a sentence to make this clearer in the manuscript (Results section, subheading “The pause response to thalamic stimulation requires activation of Kv1 channels”).

      (4) Role of D1 Receptors:

      While it is well-established that activating thalamic input to SCINs triggers dopamine release, contributing to SCIN pausing (as shown in Figure 3), it would be helpful to assess the extent to which D1 receptors contribute to this burst-dependent pause. This could be achieved by applying the D1 agonist SKF81297 after blocking nAChRs and D2 receptors.

      Thank you for letting us clarify this point. We show that blocking D2R or nAChR reduces the pause only for strong thalamic stimulation eliciting 4 SCIN spikes (Figure 3G), whereas the D1/D5 agonist SKF81297 is able to reduce the pause induced by weaker stimulation as well (Figure 3C). In addition, the D1/D5 receptor antagonist SCH23390 does not modify the pause response (Figure 3C). This may indicate that nAChR-mediated dopamine release induced by thalamic-induced bursts more efficiently activates D2R compared to D5R. We speculate that, in this context, lack of D5R activation may be necessary to keep normal levels of Kv1.3 currents necessary for SCIN pauses.

      (5) Clozapine's Mechanism of Action:

      The restoration of the burst-dependent pause by clozapine following dopamine neuron lesioning is interesting, but clozapine acts on multiple receptors beyond D1 and D5.

      Although it may be challenging to find a specific D5 antagonist or inverse agonist, it would be more accurate to state that clozapine restores the burst-dependent pause without conclusively attributing this effect to D5 receptors.

      Thank you for your insightful observation. We acknowledge the difficulty of targeting dopamine receptors pharmacologically due to the lack of highly selective D1/D5 inverse agonists. We used SCH23390, which is a highly selective D1/D5 receptor antagonist devoid of inverse agonist effects, to block clozapine’s ability to restore SCIN pauses (Figure 6C). This indicates that the restoration of SCIN pauses by clozapine depends on D1/D5 receptors. Furthermore, in a previous study, we demonstrated that clozapine’s effect on restoring SCIN excitability in dyskinetic mice (a phenomenon mediated by Kv1 channels in SCIN; Tubert et al., 2016) was not due to its action on serotonin receptors (Paz, Stahl et al., 2022). While our data do not rule out the potential contribution of other receptors, such as muscarinic acetylcholine receptors, we believe they strongly support the role of D1/D5 receptors. To reflect this, we added a statement discussing the potential contribution of receptors beyond D1/D5 in the last paragraph of the Discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The effect of MgTx was not consistent with the previous study (Tubert, 2016). I expected MgTx to increase the basal firing rate of cholinergic interneurons.

      Thank you for highlighting this. In our previous study we used ACSF in the recording pipette, instead of the intracellular solution (higher in potassium) used in the present study. This is likely related to the higher spontaneous firing rates of SCIN observed in the present study, which made the SCIN response stand out. In addition, our previous study analyzed the effect of MgTx on spontaneous firing frequency of SCIN isolated from major circuit regulation by adding CNQX and picrotoxin to the bath, while in this study we needed to preserve the thalamic input and only picrotoxin was used in the bath. Given these differences, the two conditions are not strictly comparable but rather give complementary information.

      (2) In the text, the authors claim that "SCINs recorded in the parkinsonian OFF-L-DOPA condition show an increase in membrane excitability that mimics changes acutely induced by SKF81297 in SCINs from control mice." However, the data for SKF81297 do not support this claim.

      We modified the text to make it clearer that the cited phrase refers to a previous publication (PMID: 35535012) in which SCIN intrinsic excitability was characterized via analysis of responses to somatic current injection in whole-cell recordings. In the present study Fig. 3D shows SKF81297 effects on interspike intervals during spontaneous activity with a trend towards increased firing, and Fig. 4E a lack of effect on “burst duration” for responses with different numbers of spikes elicited by thalamic afferent stimulation. 

      (3) I recommend testing whether other receptors, such as D2R, contribute to the clozapine-induced pause response in the L-DOPA off state.

      Thank you for your suggestion. We acknowledge that studying the role of D2R is important. However, our preliminary data suggest that a comprehensive follow-up study, which is beyond the scope of this manuscript, is necessary to understand their role.

      Reviewer #2 (Recommendations for the authors):

      (1) For Figure 1D-E, I understand that the authors are trying to state that the previous spontaneous spike contributes to a hyperpolarized window that induces a delay in the evoked spikes. However, it is almost impossible to discriminate between spontaneous and evoked spikes in this experiment. Furthermore, considering the tonic firing property, I highly suspect that even a sham control design (no optogenetic light) will give you a similar distribution as in Figure 1E (the longer in X1, the shorter in X2).

      We agree that some spikes following stimulus onset may have occurred independently of the light stimulus, as is also possible during behavioral tasks. We used the baseline recordings to estimate the effects of a sham stimulus as requested and included the data in Fig. 1E-F. As expected, the sham stimulation data showed a similar inverse relationship with the time elapsed from the preceding spike, but latencies were longer than with the stimulus (except for times close to the average ISI), suggesting that the optical stimulation increased the probability of evoking a spike (Fig. 1F). Remarkably, the pause following this threshold stimulation was significantly longer than the baseline ISI, as reported in the main text (Results section, last sentence of first paragraph).

      (2) The authors used optogenetics to induce thalamic inputs to induce the pause after bursts. Considering CINs also receive inputs from different brain regions, e.g. cortex, does this phenomenon/pause after bursts also exist following cortical inputs?

      We did not study the SCIN response to cortical inputs, but both thalamic and cortical inputs seem to drive SCIN pause-responses as observed by others (PMID: 24553950).  

      (3) The effect of the D5R inverse agonism, and the further combined with D5R agonist and antagonist, faithfully reveal/confirm the increase of ligand-independent activity of D5R in LID reported previously. It would be ideal to also directly modulate intracellular cAMP (as in the 2022 paper) to confirm the rescue effects on the CIN pause response.

      Please, see our response in the public review.

      (4) In healthy conditions, the balance between D2R and D5R signaling (shown in Figure 6F left) switches the pause and no pause modes which potentially contributes to cortical-striatal plasticity. How about in LID off L-DOPA condition? Is it possible to rescue/modulate the pause on/off mode by D2R agonism in LID?

      We haven’t tested the effect of D2 agonists yet, but this is scheduled for follow-up studies.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors use the ratio of pause duration to baseline ISI to describe the pause, which is useful for detecting significant differences. However, it would be beneficial to also report the actual duration of the burst-dependent pause to provide readers with a clearer understanding of the variation in pauses.

      In all figures we report the average baseline ISI duration for each experiment / experimental condition, allowing readers to estimate actual pause durations. We added in the main text actual average pause durations corresponding to Fig. 1H, which are representative of those observed along the study.

      (2) In Figure 3D, a more detailed comparison would be helpful, as there appears to be a significant difference between the SKF81297 group and others.

      We acknowledge that there might be a trend towards reduced ISIs; however, it was statistically non-significant (see legend of Figure 3). In addition, the effect of SKF81297 seems unrelated to this trend, as its effect is also seen under the effect of ZD7288, which substantially prolongs the baseline ISI (Fig. 4E-F).
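
      The relation between the normalized pause measure and the actual pause duration discussed in point (1) above can be sketched as follows. This is an illustrative reconstruction, not the analysis code used in the study: the function name, the fixed `burst_window` used to decide which spikes count as evoked, and the example spike times are all assumptions.

```python
from statistics import mean


def pause_metrics(spike_times, stim_time, burst_window=0.1):
    """Return (pause duration in s, pause / mean baseline ISI).

    Assumptions (hypothetical, for illustration only):
    - spike_times is a sorted list of spike times in seconds;
    - spikes before stim_time define the baseline interspike intervals (ISIs);
    - spikes within burst_window after stim_time count as the evoked burst;
    - the pause is the interval from the last evoked spike to the next spike.
    """
    baseline = [t for t in spike_times if t < stim_time]
    evoked = [t for t in spike_times if stim_time <= t <= stim_time + burst_window]
    later = [t for t in spike_times if t > stim_time + burst_window]
    if len(baseline) < 2 or not evoked or not later:
        raise ValueError("need baseline ISIs, an evoked spike and a later spike")

    baseline_isis = [b - a for a, b in zip(baseline, baseline[1:])]
    pause_duration = later[0] - evoked[-1]
    return pause_duration, pause_duration / mean(baseline_isis)


# Made-up spike train: regular 5 Hz firing, a two-spike burst after the
# stimulus at t = 1.0 s, then a long interval before firing resumes.
example_spikes = [0.0, 0.2, 0.4, 0.6, 0.8, 1.02, 1.04, 1.6, 1.8]
duration, ratio = pause_metrics(example_spikes, stim_time=1.0)
```

      Returning both values side by side reflects the point made in the exchange: the ratio is convenient for comparisons across cells with different firing rates, while the duration in seconds is what a reader needs to judge the size of the pause. Given the mean baseline ISI, either quantity can be recovered from the other.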

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment  

      This manuscript reports important findings that the methyltransferase METTL3 is involved in the repair of abasic sites and uracil in DNA, mediating resistance to floxuridine-driven cytotoxicity. Convincing evidence shows the involvement of m6A in DNA based on single cell imaging and mass spec data. The authors present evidence that the m6A signal does not result from bacterial contamination or RNA, but the text does not make this overly clear.

      We thank the editors for recognizing the importance of our work and the relevance of METTL3 and 6mA in DNA repair. We agree the evidence presented can be regarded as convincing, in that it includes validation with orthogonal approaches and excludes the source of 6mA being RNA or bacterial contamination.

      To clarify, the identification of 6mA in DNA, upon DNA damage, is based first on immunofluorescence observations using an anti-m6A antibody. In this setting, removal of RNA with RNase treatment fails to reduce the 6mA signal, excluding the possibility that the source of signal is RNA. In contrast, removal of DNA with DNase treatment removes all 6mA signal, strongly suggesting that the species carrying the N6-methyladenosine modification is DNA (Figure 3D, E). Importantly, in Figure 3F, G, we provide orthogonal, quantitative mass spectrometry data that independently confirm this finding. Mass spectrometry-liquid chromatography of DNA analytes conclusively shows the presence of 6mA in DNA upon treatment with DNA damaging agents and excludes that the source is RNA, based on exact mass.

      Cells show the 6mA signal only when treated with DNA damaging agents, and 6mA is absent from untreated cells (Figure 3D, E, H, I). This provides strong evidence that the 6mA signal is not a result of bacterial contamination in our cell lines. Furthermore, our cell lines are routinely tested for mycoplasma contamination. It is possible that stock solutions of DNA damaging agents could be contaminated, but this would need to be true for all individual drugs and stocks tested, which is highly unlikely. Moreover, the finding that the 6mA signal is not significantly different from that of untreated cells when a DNA damaging agent is combined with a METTL3 inhibitor (Figure 3H, I) provides strong evidence against bacterial contamination in our stocks.

      In summary, we provide conclusive evidence, based on orthogonal methods, that the METTL3-dependent N6-methyladenosine modification is deposited in DNA, not RNA, in response to DNA damage and have now clarified these points in the results and discussion. 

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      The authors sought to identify unknown factors involved in the repair of uracil in DNA through a CRISPR knockout screen.  

      Strengths:  

      The screen identified both known and unknown proteins involved in DNA repair resulting from uracil or modified uracil base incorporation into DNA. The conclusion is that the protein activity of METTL3, which converts A nucleotides to 6mA nucleotides, plays a role in the DNA damage/repair response. The importance of METTL3 in DNA repair, and its colocalization with a known DNA repair enzyme, UNG2, is well characterized.  

      Weaknesses:  

      This reviewer identified no major weaknesses in this study. The manuscript could be improved by tightening the text throughout, and more accurate and consistent word choice around the origin of U and 6mA in DNA. The dUTP nucleotide is misincorporated into DNA, and 6mA is formed by methylation of the A base present in DNA. Using words like 6mA "deposition in DNA" seems to imply it results from incorporation of a methylated dATP nucleotide during DNA synthesis.  

      The increased presence of 6mA during DNA damage could result from methylation at the A base itself (within DNA) or from incorporation of pre-modified 6mA during DNA synthesis. Our data do not directly discriminate between these two mechanisms, and we clarified this point in the discussion.  

      Reviewer #2 (Public review):  

      Summary:  

      In this work, the authors performed a CRISPR knockout screen in the presence of floxuridine, a chemotherapeutic agent that incorporates uracil and fluoro-uracil into DNA, and identified unexpected factors, such as the RNA m6A methyltransferase METTL3, as required to overcome floxuridine-driven cytotoxicity in mammalian cells. Interestingly, the observed N6-methyladenosine was embedded in DNA, which has been reported as DNA 6mA in mammalian genomes and is currently confirmed with mass spectrometry in this model. Therefore, this work consolidated the functional role of mammalian genomic DNA 6mA, and supported with solid evidence to uncover the METTL3-6mA-UNG2 axis in response to DNA base damage.  

      Strengths:  

      In this work, the authors took an unbiased, genome-wide CRISPR approach to identify novel factors involved in uracil repair with potential clinical interest.  

      The authors designed elegant experiments to confirm the METTL3 works through genomic DNA, adding the methylation into DNA (6mA) but not the RNA (m6A), in this base damage repair context. The authors employ different enzymes, such as RNase A, RNase H, DNase, and liquid chromatography coupled to tandem mass spectrometry to validate that METTL3 deposits 6mA in DNA in response to agents that increase genomic uracil.  

      They also have the Mettl3-KO and the METTL3 inhibition results to support their conclusion.  

      Weaknesses:  

      Although this study demonstrates that METTL3-dependent 6mA deposition in DNA is functionally relevant to DNA damage repair in mammalian cells, there are still several concerns and issues that need to be improved to strengthen this research.  

      First, in the whole paper, the authors never claim or mention the mammalian cell lines contamination testing result, which is the fundamental assay that has to be done for the mammalian cell lines DNA 6mA study.  

      Our cell lines are routinely tested for bacterial contamination, specifically mycoplasma, and we state this information in the revised manuscript. 

      Importantly, we do not observe 6mA in untreated cells, strongly suggesting that the 6mA signal observed is dependent on the presence of DNA damage and not caused by contamination in the cell lines (Figure 3D, E, H, I). While it is possible that stock solutions of DNA damaging agents could be contaminated, this would need to be the case for all individual drugs and stocks tested that induce 6mA, which is very unlikely. Finally, the finding that the 6mA signal is not significantly different from that of untreated cells when a DNA damaging agent is combined with a METTL3 inhibitor (Figure 3H, I) provides strong evidence against bacterial contamination in our drug stocks.

      Second, in the whole work, the authors have not supplied any genomic sequencing data to support their conclusions. Although the sequencing of DNA 6mA in mammalian models is challenging, recent breakthroughs in sequencing techniques, such as DR-Seq or NT/NAME-seq, have lowered the bar and improved a lot in the 6mA sequencing assay. Therefore, the authors should consider employing the sequencing methods to further confirm the functional role of 6mA in base repair.  

      While we agree that it could be important to understand the precise genomic location of 6mA in relation to DNA damage, this is outside the scope of the current study. Moreover, this exercise may prove unproductive. If 6mA is enriched in DNA at damage sites or as DNA is replicated, the genomic mapping of 6mA is likely to be stochastic. If stochastic, it would be impossible to obtain the read depth necessary to map 6mA accurately. 

      Third, the authors used the METTL3 inhibitor and Mettl3-KO to validate the METTL3-6mA-UNG2 functional roles. However, the catalytic mutant and rescue of Mettl3 may be the further experiments to confirm the conclusion.

      We believe this to be an excellent suggestion from Reviewer #2 but we are unable to perform the proposed experiment at this time. We encourage future studies to explore the rescue experiment.  

      Reviewer #3 (Public review):  

      Summary:  

      The authors are showing evidence that they claim establishes the controversial epigenetic mark, DNA 6mA, as promoting genome stability.  

      Strengths:  

      The identification of a poorly understood protein, METTL3, and its subsequent characterization in DDR is of high quality and interesting.  

      Weaknesses:  

      (1) The very presence of 6mA (DNA) in mammalian DNA is still highly controversial and numerous studies have been conclusively shown to have reported the presence of 6mA due to technical artifacts and bacterial contamination. Thus, to my knowledge there is no clear evidence for 6mA as an epigenetic mark in mammals, and consequently, no evidence of writers and readers of 6mA. None of this is mentioned in the introduction. Much of the introduction can be reduced, but a paragraph clearly stating the controversy and lack of evidence for 6mA in mammals needs to be added, otherwise, the reader is given an entirely distorted view of the field.  

      These concerns must also be clearly in the limitations section and even in the results section which fails to nuance the authors' findings. 

      We agree with the reviewer that the presence and potential function of 6mA in mammalian DNA has been debated. Importantly, the debate regarding the presence and quantity of 6mA in DNA has been previously restricted to undamaged, baseline conditions. In complete agreement with this notion, we do not detect appreciable levels of 6mA in untreated cells. We revised the introduction section to present the debate about 6mA in DNA. We, however, want to highlight that our study provides, for the first time, convincing evidence (based on two orthogonal methods) that 6mA is present in DNA in response to a stimulus, DNA damage. We do not claim or provide any data that suggest 6mA is a baseline epigenetic mark.  

      (2) What is the motivation for using HT-29 cells? Moreover, the materials and methods do not state how the authors controlled for bacterial contamination, which has been the most common cause of erroneous 6mA signals to date. Did the authors routinely check for mycoplasma? 

      HT-29 is a cell line of colorectal origin, and chemotherapeutic agents that introduce uracil and uracil derivatives into DNA, such as those used in this study, are relevant for the treatment of colorectal cancer. As indicated above, we do not observe 6mA in untreated cells, strongly suggesting that the 6mA signal observed is dependent on DNA damage and not caused by potential bacterial contamination (Figure 3D, E, H, I). Additionally, our cell lines are routinely tested for bacterial contamination, specifically mycoplasma.

      (3) The single cell imaging of 6mA in various cells is nice. The results are confirmed by mass spec as an orthogonal approach. Another orthogonal and quantitative approach to assessing 6mA levels would be PacBio. Similarly, it is unclear why the authors have not performed dot-blots of 6mA for genomic DNA from the given cell lines.

      We are confused by this point, since an orthogonal approach to detect 6mA, mass spectrometry-liquid chromatography, was employed. This method does not use an antibody and confirms the increase of 6mA in DNA when cells are treated with DNA damaging agents. These data are presented in Figure 3F, G.

      It is sensible to hypothesize that the localization of 6mA is consistent with DNA replication (like uracil deposition). In this event, the genomic mapping of 6mA is likely to be stochastic. This would make quantification with PacBio sequencing difficult because it would be very challenging to achieve the appropriate read depth to call a modified base. 

      Dot blots rely on an antibody and thus are not truly orthogonal to our immunofluorescence-based measurements. We preferred the mass spectrometry-liquid chromatography approach we took as a true orthogonal approach.  

      (4) The results of Figure 3 need further investigation and validation. If the results are correct the authors are suggesting that the majority of 6mA in their cell lines is present in the DNA, and not the RNA, which is completely contrary to every other study of 6mA in mammalian cells that I am aware of. This could suggest that the antibody is not, in fact, binding to 6mA, but to unmodified adenine, which would explain why the signal disappears after DNAse treatment. Indeed, binding of 6mA to unmethylated DNA is a commonly known problem with most 6mA antibodies and is well described elsewhere.  

      Based on this and the following comment, we are convinced that Reviewer #3 has overlooked two critical elements of our study:

      First, the immunofluorescence work presented in Figure 3, showing 6mA signal in response to DNA damage, uses cells that were pre-extracted to remove excess cytoplasmic RNA. This method is often used in immunofluorescence experiments of this kind. The pre-extraction method removes most of the cytoplasmic content, and the majority of the cytoplasmic m6A RNA signal. Supplementary Figure 3D shows cells that have not been pre-extracted prior to staining. These images show the cytoplasmic m6A signal is abundant if we do not perform the pre-extraction step. 

      If the antibody used to label 6mA significantly reacted with unmodified adenine, we would expect a large signal in untreated or untreated and denatured conditions. In contrast, an increase in 6mA is not observed in either case.

      Second, the orthogonal approach we employed, mass spectrometry coupled with liquid chromatography, measures 6mA DNA analytes specifically by exact mass. This approach does not depend on an antibody and yields results consistent with those from the immunofluorescence experiments. 

      (5) Given the lack of orthologous validation of the observed DNA 6mA and the lack of evidence supporting the presence of 6mA in mammalian DNA and consequently any functional role for 6mA in mammalian biology, the manuscript's conclusions need to be toned down significantly, and the inherent difficulty in assessing 6mA accurately in mammals acknowledged throughout.  

      As discussed in response to prior comments, Figure 3 does provide two independent and orthogonal methods that demonstrate 6mA presence in DNA specifically, and not RNA, in response to DNA damage. Complementary and orthogonal datasets are presented using either immunofluorescence microscopy or mass spectrometry-liquid chromatography of extracted DNA. The latter method does not rely on an antibody and can discriminate 6mA in DNA from that in RNA based on exact mass. We revised the text to clarify that Figure 3F, G is a completely orthogonal approach.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):  

      The authors cited most of the related publications; however, the reviewer suggested that three 2015 papers in Cell (Dahua Chen's, Yang Shi's, and Chuan He's) and the 2016 Nature (Andrew Xiao's) article are worth citing here because those are the milestone works reported the genomic DNA 6mA, for the first wave, in eukaryotic and mammalian genomes.  

      Furthermore, in Tao P. Wu and Andrew Z. Xiao's 2016 Nature article, the result has already emphasized the genomic DNA 6mA is enriched in the H2A.X sites; therefore, that work indicated the link between DNA damage and repair and 6mA's functional role. The authors may add some comments or discussion on this point.  

      Last but not least, the authors may also need to discuss the reported evidence of DNA 6mA's function in mitochondria.  

      We thank the reviewer for these suggestions. We revised our introduction and included additional references and discussion points, as suggested by the reviewer.

      Reviewer #3 (Recommendations for the authors):  

      Minor points:  

      (1) In general, the manuscript is too verbose, and the amount of text can be dramatically reduced/sharpened. The introduction in particular is too long. 

      We revised the manuscript and reduced text when appropriate.

      (2) Each results section can also be condensed to improve clarity significantly. Indeed the results section reads like a 'Result & Discussion' section, which is then followed by a Discussion. Maybe the discussion section can be shortened to a 'conclusion'.

      We revised the results section when appropriate and reworked the discussion.

      Importantly, we revised the text related to Figure 3 as it does appear that Reviewer #3 did not appreciate key results present in this figure, specifically the orthogonal, mass spectrometry approach validating the discovery of 6mA DNA species (Figure 3F, G). We added a schematic as Figure 3F to further clarify this point as well. 

      (3) The accession number for sequencing data in GEO data should be provided.  

      The accession number is now provided in the manuscript: GSE282260.

      (4) All figures are unnecessarily small and in some cases, supporting figures from the supplementary data should be moved into the main figure to improve clarity. 

      The figures are of high image quality and can be enlarged easily. If there are specific figures that the reviewer believes will improve clarity, we would be happy to move them.

  2. Jan 2025
    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important work proposes a neural network model of interactions between the prefrontal cortex and basal ganglia to implement adaptive resource allocation in working memory, where the gating strategies for storage are adjusted by reinforcement learning. Numerical simulations provide convincing evidence for the superiority of the model in improving effective capacity, optimizing resource management, and reducing error rates, as well as solid evidence for its human-like performance. The paper could be strengthened further by a more thorough comparison of model predictions with human behavior and by improved clarity in presentation. This work will be of broad interest to computational and cognitive neuroscientists, and may also interest machine-learning researchers who seek to develop brain-inspired machine-learning algorithms for memory.

      We thank the reviewers for their thorough and constructive comments, which have helped us clarify, augment and solidify our work. Regarding the suggestion to include a “more thorough comparison with human behavior”, we believe this comment reflects one reviewer’s suggestion to compare with sequential order effects. We now include a new section with simulations showing that the network exhibits clear recency effects in accordance with the literature, where such recency effects are known to be related to WM interference rather than passive decay. Overall, our work makes substantial contact with behavioral patterns that have been documented in the human literature (and which, as far as we know, have not been jointly captured by any one model), such as the shape of the error distributions, including probability of recall and variable precision; attraction to recently presented items; sensitivity to reinforcement history; set-size dependent chunking; recency effects; and dopamine manipulation effects, as well as a range of human data linking capacity limitations to frontostriatal function. It also provides a theoretical proposal for the well-established phenomenon of capacity limitations in humans, suggesting that they arise due to difficulty in WM management.

      Below we address each reviewer individually, responding to each comment and providing the relevant location in the paper that the changes and additions were made. Reviewer responses are included in blue/bold for clarity.  

      Public Reviews:

      Reviewer 1:

      Thank you for your comments. We appreciate your statements of the strengths of this paper and your suggestions to improve this paper.

      First, the method section appears somewhat challenging to follow. To enhance clarity, it might be beneficial to include a figure illustrating the overall model architecture. This visual aid could provide readers with a clearer understanding of the overall network model.

      Additionally, the structure depicted in Figure 2 could be potentially confusing. Notably, the absence of an arrow pointing from the thalamus to the PFC and the apparent presence of two separate pathways, one from sensory input to the PFC and another from sensory input to the BG and then to the thalamus, may lead to confusion. While I recognize that Figure 2 aims to explain network gating, there is room for improvement in presenting the content accurately.

      As suggested, we added a figure (new figure 2) illustrating the overall model architecture before expanding it to show the chunking circuitry. This figure also shows the projections from thalamus to PFC (we preserve the previous figure 2, now figure 3, as an example sequence of network gating decisions, in more abstract form to help facilitate a functional understanding of the sequence of events without too much clutter). We also made several other general clarifications to the methods sections to make it more transparent and easier to follow, as per your suggestions.   

      Still, for the method part, it would enhance clarity to explicitly differentiate between predesigned (fixed) components and trainable components. Specifically, does the supplementary material state that synaptic connection weights in striatal units (Go&NoGo) are trained using XCAL, while other components, such as those in the PFC and lateral inhibition, are not trained (I found some sentences in 'Limitations and Future Directions')?

      We have now explicitly specified learned and fixed components. We have further explained the role of XCAL and how striatal Go/NoGo weights are trained. We have also added clarification on how gating policies are learned via eligibility traces and synaptic tags.

      I'm not sure about the training process shown in Figure 8. It appears that the training may not have been completed, given that the blue line representing the chunk stripe is still ascending at the endpoint. The weights depicted in panel d) seem to correspond with those shown in panels b) and c), no? Then, how is the optimization process determined to be finished? Alternatively, could it be stated that these weight differences approach a certain value asymptotically? It would be better to clarify the convergence criteria of the optimization process.

      The training process has been clarified and we specify (in the last paragraph of the Base PBWM Model) how we determine when training is complete. We also can confirm that the network behavior has stabilized in learning even if the Go/NoGo weights continue to grow over time for the chunked layer (due to imperfect performance and reinforcement of the chunk gating strategy).

      Reviewer 2:

      Thank you for your comments. We appreciate your notes on the strengths of the paper and your suggestions to help improve the paper.

      The model employs a spiking neural network, which is relatively complex. Additionally, while this paper validates the effectiveness of chunking strategies used by the brain to enhance working memory efficiency through computational simulations, further comparison with related phenomena observed in cognitive neuroscience experiments on limited working memory capacity, such as the recency effect, is necessary to verify its generalizability.

      Thank you for proposing that we add more connections with human WM. Based on your specific recommendation, we have included the section “Network recapitulates human sequential effects in working memory”, where we discuss recency effects in human working memory and how our model recapitulates this effect. We have also made the connections to human data and human work more explicit throughout the manuscript (Figure 4c). As noted in response to the assessment, we believe our model does make contact with a wide variety of cognitive neuroscience data in human WM, such as the shape of the error distributions, including probability of recall and variable precision; attraction to recently presented items; sensitivity to reinforcement history; set-size dependent chunking; recency effects; and dopamine manipulation effects, as well as a range of human data linking capacity limitations to frontostriatal function. It also provides a theoretical proposal for the well-established phenomenon of capacity limitations in humans, suggesting that they arise due to difficulty in WM management.

      Recommendations For The Authors:

      Reviewer 1:

      I appreciate the authors' clear discussion of the limitations of this work in the section "Limitations and Future Directions". The development of a comprehensive model framework to overcome these constraints should require a separate paper, though, I am curious if the authors have attempted any experiments, such as using two identically designed chunking layers, that could partially support the assumptions presented in the paper.

      Expanding the number of chunking layers is a great future direction. We felt that it was most effective for this paper to begin with a minimal setup as a proof of concept. We hypothesize that, given our results, a reinforcement learning algorithm would be able to learn to select the best level of abstraction (degree of chunking) in more continuous form, but would require more experience across a range of tasks to do so.

      I'm not sure whether it's appropriate that "Frontostriatal Chunking Gating..." precedes "Dopamine Balance is...", maybe it would be better to reverse the order thus avoiding the need to mention the role of dopamine before delving into the details. Additionally, including a summary at the end of the Introduction, outlining how the paper is organized, could provide readers with a clear roadmap of the forthcoming content.

      We appreciate this suggestion. After careful thought, we wanted to preserve the order because we felt it was important to make the direct connection between set size and stripe usage following the discussion on performance based on increasing stripes.  

      The authors could improve the overall polish of the paper. The equations in the Method section are somewhat confusing: Eq. (2) appears incorrect, as it lacks a weight w_i and n should presumably be in the denominator. For Eq. (3), the comma should be replaced with ']'... It would be advisable to cross-reference these equations with the original O'Reilly and Frank paper for consistency.

      Thank you for pointing out the errors in the method equations; those equations were indeed rendering incorrectly. We have fixed this problem.

      Additionally, there are frequent instances of missing figure and reference citations (many '?'s), and it would be beneficial to maintain consistent citation formatting throughout the paper: sometimes citations are presented as "key/query coding (Traylor, Merullo, Frank, and Pavlick, 2024; see also Swan and Wyble, 2014)", while other times they are written as "function (O'Reilly & Frank, 2006)"...

      Lastly, there is an empty '3.1' section in the supplementary material that should be addressed.

      The citation issues were fixed. The supplementary information was cleaned and the missing section was removed. Thank you for mentioning these errors.  

      Reviewer 2:

      Thank you for the following recommendations and suggestions. We respond to each individual point based on the numbering system used in your review.  

      (1) This paper utilizes the experimental paradigm of visual working memory, in which different visual stimuli are sequentially loaded into the working memory system, and the accuracy of memory for these stimuli is calculated.

      The authors could further plot the memory accuracy curve as the number of items (N) increases, under both chunking and non-chunking strategies. This would allow for the examination of whether memory accuracy suddenly declines at a specific value of N (denoted as Nc), thereby determining the limited capacity of working memory within this experimental framework, which is about 4 different items or chunks. Additionally, it could be investigated whether the value of Nc is larger when the chunking strategy is applied.

      We have included an additional plot (Probability of Recall) as a supplemental figure to Figure 5 to explore the probability of recall as a function of set size for both chunking and no chunking models.  This plot shows that the chunking model increases probability of recall when set size exceeds allocated capacity (but that nevertheless both models show decreases in recall with set size, consistent with the literature).

      (2) The primacy effect or recency effect observed in the experiments and traditional working memory models, including the slot model and the limited resource model, should be examined to see if it also appears in this model.

      The literature on human working memory shows a prevalent recency effect (but not a primacy effect, which is thought to be due to episodic memory, and which is not included in our model). We have added a section showing that our model demonstrates clear recency effects.

      (3) The construction of the model and the single neuron dynamics involved need further refinement and optimization:

      Model Description: The details of the model construction in the paper need to be further elaborated to help other researchers better understand and apply the model in reproducing or extending research. Specifically:

      a) The construction details of different modules in the model (such as Input signal, BG, striatum, superficial PFC, deep PFC) and the projection relationships between different modules. Adding a diagram to illustrate the network construction would be beneficial.

      To aid in the understanding of the model construction and model components, we have included an additional figure (Figure 1: Base Model) that explains the key layers and components of the model.  We have also altered the overall model figures to show more clearly that the inputs project to both PFC and striatum, to highlight that information is temporarily represented in superficial PFC layers even before striatal gating, which is needed for storage after the input decays.

      We have expanded the methods and equations and we also provide a link to the model github for purposes of reproducibility and sharing.  

      A base model figure was added to specify key connections.  

      b) The numbers of excitatory and inhibitory neurons within different modules and the connections between neurons.

      We added clarification on the type of connections between layers (specifying which are fixed and which are learned). We have also added the sizes of the layers in a new appendix section, “Layer Sizes and Inner Mechanics”.

      c) The dynamics of neurons in different modules need to be elaborated, including the description of the dynamic equations of variables (such as x) involved in single neuron equations.

      Single neuron dynamics are explained in equations 1-4. Equations 5-6 explain how activation travels between layers. The specific inhibitory dynamics in the chunking layer are elaborated in Figure 4 (PBWM Model and Chunking Layer Details). The Appendix section “Neural model implementational details” states the key equations, neural information and connectivity. Since there is a large corpus of background information underlying these models, we have linked the Emergent github and specifically the Computational Cognitive Neuroscience textbook, which has a detailed description of all equations. For the sake of paper length and readability, we chose the most relevant equations that distinguish our model.

      d) The selection of parameters in the model, especially those that significantly affect the model's performance.

      The Appendix section on the hyperparameter search details some of the key parameters and why those values were chosen.

      e) The model employs a sequential working memory paradigm, the forms of external stimuli involved in the encoding and recalling phases (including their mathematical expressions, durations, strengths, and other parameters) need to be elaborated further.

      We appreciate this comment. We have expanded the Appendix section “Continuous Stimuli” to include the details of stimulus presentation (including durations, etc.).

      (4) The figures in the paper need optimization. For example, the size of the schematic diagram in Figure 2 needs to be enlarged, while the size of text such as "present stimulus 1, 2, recall stimulus 1" needs to be reduced. Additionally, the citation of figures in the main text needs to be standardized. For example, Figure 1b, Figure 1c, etc., are not cited in the main text.

      The task sequence figure (original Figure 2) has been modified and, following your suggestions, the text sizes have been adjusted.

      (5) Section 3.1 in the appendix is missing.

      Supplemental section 3.1 has been removed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      MacDonald et al. investigated the consequence of double knockout of substance P and CGRPα on pain behaviors using a newly created mouse model. The investigators used two methods to confirm knockout of these neuropeptides: traditional immunolabeling and a neat in vitro assay where sensory neurons from either wildtype or double knockout mice are co-cultured with substance P "sniffer cells", HEK cells stably expressing NKR1 (a substance P receptor), GCaMP6s and Gα15. It should be noted that functional assays confirming CGRPα knockout were not performed. Subsequently, the authors assayed double knockout (DKO) and wildtype (WT) mice in numerous behavioral assays using different pain models, including acute pain and itch stimuli, intraplantar injection of Complete Freund's Adjuvant, prostaglandin E2, capsaicin, AITC, oxaliplatin, as well as the spared nerve injury model. Surprisingly, the authors found that pain behaviors did not differ between DKO and WT mice in any of the behavioral assays or pain paradigms. Importantly, female and male mice were included in all analyses. These data are important and significant, as both substance P and CGRPα have been implicated in pain signaling, though the magnitude of the effect of a single knockout of either gene has been variable and/or small between studies.

      The conclusions of the study are largely supported by the data; however, additional experimental controls and analyses would strengthen the authors' claims.

      We thank the reviewer for their insightful comments and have answered them below.

      (1) The authors note that single knockout models of either substance P or CGRPα have produced variable effects on pain behaviors that are study-dependent. Therefore, it would have strengthened the study if the authors included these single knockout strains in a side-by-side analysis (in at least some of the behavioral assays), as has been done in prior studies in the field when using double- or triple-knockout mouse models (for example, see PMID: 33771873). If, in the authors' hands, single knockouts of either peptide also show no significant differences in pain behaviors, then the finding that double knockouts also do not show significant differences would be less surprising.

      In our study, we found no phenotypic differences between WT and DKO mice, suggesting Substance P and CGRPα are largely dispensable for pain behavior. We agree that if we had observed significant changes in behavior, it would have been interesting to examine the effects of knocking out each gene individually to determine which peptide is responsible for the phenotype. However, given the double deletion had no effect, we can predict that loss of each alone would have no or minor effects. In line with this, a more recent study that comprehensively phenotyped the Calca KO mouse found no deficits in a range of danger-related behaviors (PMID: 34376756). Overall, as we are reporting negative data about the double KO, we do not believe extensive studies of the single KOs are necessary to support the findings of our paper.

      (2) It is unclear why the authors only show functional validation of substance P knockout using "sniffer" cells, but not CGRPα. Inclusion of this experiment would have added an additional layer of rigor to the study.

      Imaging of CGRPα release is more challenging using the ‘sniffer’ approach because functional CGRP receptors require the expression of two genes: Calcrl (or Calcr) along with Ramp1. We now have succeeded in generating a new stable cell line expressing Calcrl and Ramp1, along with GCaMPs and human Galpha15 and include new data in the revised Figure 1F-H and Figure Supplement 1B. These cells respond robustly to CGRPalpha, but not to SP. In contrast, the existing SP cell line responds to SP but not CGRPalpha. Capsaicin evokes a strong response in these cells in co-culture with DRGs. This response is dramatically reduced in the DKO. This data therefore confirms our mice have a loss of CGRPalpha signaling as indicated by IHC.

      (3) The authors should be a bit more reserved in the claims made in the manuscript. The main claim of the study is that "CGRPα and substance P are not required for pain transmission." However, the authors also note that neuropeptides can have opposing effects that may produce a net effect of no change. In my view, the data presented show that double knockout of substance P and CGRPα does not affect somatic pain behaviors, but they do not preclude a role for either of these molecules in pain signaling more generally. Indeed, the authors also note that these neuropeptides could be involved in nociceptor crosstalk with the immune or vascular systems to promote headache. The authors only assayed pain responses to glabrous skin stimulation. How the DKO mice would behave in orofacial pain assays, migraine assays, visceral pain assays, or bone/joint pain assays, for example, was not tested. I do not suggest the authors include these experiments, only that they address the limitations/weaknesses of their study more thoroughly.

      The reviewer makes an important point that we agree with. Our study assesses acute and chronic pain in peptide DKO mice lacking Substance P and CGRPα. Most of our data focuses on the hindpaw as pain in the paw is the gold-standard approach for phenotyping pain targets and numerous well-validated chronic pain models have been developed for this body site.  However, to extend the conclusions to other tissues, we did also look at visceral pain and GI distress using acetic acid and LiCl models (Figure 2J and Figure 2 supplement). We agree with the reviewer that given the utility of CGRP monoclonal antibodies, migraine experiments would be interesting for future studies using these mice, a point we highlight in the discussion. Bone/joint pain is also clearly important from a translational perspective, but outside the scope of the current study.

      (4) A more minor but important point: the authors do not describe the nature of the WT animals used. Are they littermates or a separately maintained colony of WT animals? The WT strain background should be included in the methods section.

      The WT mice are C57BL/6J from The Jackson Laboratory. This has been added to the methods.

      Reviewer #2 (Public Review):

      Summary:

      The paper aimed to examine the effect of co-ablating Substance P and CGRPα peptides on pain using Tac1 and Calca double knockout (DKO) mice. The authors observed no significant changes in acute, inflammatory, and neuropathic pain. These results suggest that Substance P and CGRPα peptides do not play a major role in mediating pain in mice. Moreover, they reveal that the lack of behavioral phenotype cannot be explained by the redundancy between the two peptides, which are often co-expressed in the same neuron.

      Strengths:

      The paper uses a straightforward approach to address a significant question in the field. The authors confirm the absence of Substance P and CGRPα peptides at the levels of DRG, spinal cord, and midbrain. Subsequently, they employ a comprehensive battery of behavioral tests to examine pain phenotypes, including acute, inflammatory, and neuropathic pain. Additionally, they evaluate neurogenic inflammation by measuring edema and extravasation, revealing no changes in DKO mice. The data are compelling, and the study's conclusions are well-supported by the results. The manuscript is succinct and well-presented.

      We thank the reviewer for their enthusiasm for the importance of our work.

      Reviewer #3 (Public Review):

      In this study, the authors assessed the role of double global knockout of substance P and CGRPα in the transmission of acute and chronic pain. The authors first generated the double knockout (DKO) mice and validated their animal model. This was then followed by a series of acute and chronic pain assessments to evaluate whether the global DKO of these neuropeptides is important in modulating acute and chronic pain behaviors. The authors found that, in these DKO mice, Substance P and CGRPα are not required for the transmission of acute and chronic pain, although both neuropeptides are strongly implicated in chronic pain. This study does provide more insight into the role of these neuropeptides in chronic pain processing; however, more work still needs to be done (see the comments below).

      We thank the reviewer for their detailed and constructive feedback, and below outline the steps we have taken to answer their concerns.

      (1) In assessing the double KO (result #1), why are different regions of the brain shown for substance P and CGRPα (for example, midbrain for substance P and amygdala for CGRPα)? Since the authors mentioned that these peptides are co-expressed in the brain (as in the introduction), shouldn't the same brain regions be shown for both IHC? It would be ideal if the authors could show both regions (midbrain and amygdala) in addition to the DRG and spinal cord for both peptides in their findings. In addition, since this is a double KO, the authors should show more representative IHC-stained brain regions (spanning from the anterior to posterior).

      We could not co-stain both SP and CGRP in the same sections as the DKO mouse has endogenous GFP and RFP fluorescence, limiting us to one channel (far red). Specifically, we use a Calca KO that is a Cre:GFP knock-in/knockout (Chen et al 2018, PMID: 30344042) and the Tac1 KO is a tagRFP knock-in/knockout (Wu et al 2018, PMID: 29485996). This is why we show different brain sections.

      (2) It is also unclear as to why the authors only assessed the loss of substance P signaling in the double KO mice. Shouldn't the same be done for CGRPα signaling? Either the authors assess this, or the authors have to provide clear explanations as to why only substance P signaling was assessed.

      As noted in our response to Reviewer 1, imaging of CGRP release is more challenging using the ‘sniffer’ approach because functional CGRP receptors require the expression of two genes: Calcrl (or Calcr) along with Ramp1. We have now generated this cell line and performed the experiment (see revised Figure 1 and Figure 1 Supplement).

      (3) Has these animals' naturalistic behavior been assessed after the double KO (food intake, sleep, locomotion for example)? I think this is important as changes to these naturalistic behaviors can affect pain processes or outcomes.

      We agree that assessment of naturalistic behavior including food intake, sleep and locomotion would be interesting to look at in DKO mice. However, our study is focused on acute and chronic pain behavior of these animals, and therefore a comprehensive phenotypic assessment of naturalistic home-cage behavior is outside the scope of our study.

      (4) Figure 2H: The authors acknowledge that there is a trend to decrease with capsaicin-evoked coping-like responses. However, a close look at the graph suggests that the lack of significance could be driven by 1 mouse. Have the authors run an outlier test? Alternatively, the authors should consider adding more n to these experiments to verify their conclusions.

      We were reluctant to add more animals searching for significance. Instead, we investigated the potential phenotype further by looking at cFos staining in the cord and found no differences (Figure 2, supplement 1). This result suggests loss of the two peptides does not grossly disrupt capsaicin-evoked pain signal transmission between the nociceptor and post-synaptic dorsal neurons in the spinal cord.

      (5) Similarly, the values for WT in the evoked cFos activity (Figure 2- Suppl Figure 1) are pretty variable. Considering that the n number is low (n = 5), authors should consider adding more n. Also, since the n number is low in this experiment (e.g. 5 vs 4), does this pass the normality test to run a parametric unpaired t-test? Either the authors increase their n numbers or run the appropriate statistical test.

      As described in the statistical tables, the Shapiro-Wilk test indicates these data do pass the normality test. Therefore, we retain the use of the unpaired t test, which demonstrates no significant difference between the groups.

      (6) In most of the results, authors ran a parametric test despite the low n number. Authors have to ensure that they are carrying out the appropriate statistical test for their dataset and n number.

      We now provide a table of the statistical results, which provides detailed information about all statistical tests performed in this study. For experiments where we make a single comparison between the two distributions (WT vs DKO), we have run a Shapiro-Wilk test. Where the data from both groups pass the normality test, we retain the use of the unpaired t test. Where the Shapiro-Wilk test indicates data from either group are unlikely to be normally distributed, we now use a Mann-Whitney U test to compare the groups, as this non-parametric test makes no assumptions about the underlying distribution.
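
      As an illustration of the non-parametric comparison described above, the following minimal Python sketch computes the Mann-Whitney U statistic and an exact two-sided p-value by enumerating every possible split of the pooled data, which is feasible for small groups such as 5 vs 4 animals. It is not the analysis code used in the study, which relied on the statistical software cited in the response; the function names and the example values are hypothetical.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: number of (xi, yj) pairs with xi > yj; ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def exact_two_sided_p(x, y):
    """Exact two-sided p-value: fraction of all relabelings of the pooled data
    whose U is at least as far from its null mean as the observed U."""
    pooled = list(x) + list(y)
    n_x, n_total = len(x), len(pooled)
    mean_u = n_x * len(y) / 2.0  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - mean_u)
    extreme = 0
    total = 0
    for idx in combinations(range(n_total), n_x):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(n_total) if i not in chosen]
        deviation = abs(mann_whitney_u(group_x, group_y) - mean_u)
        if deviation >= observed - 1e-12:  # tolerance guards float comparison
            extreme += 1
        total += 1
    return extreme / total


if __name__ == "__main__":
    # Hypothetical response scores for two small groups (not data from the study).
    wt = [12.0, 15.5, 9.0, 14.0, 11.0]
    dko = [10.0, 13.0, 8.5, 12.5]
    print("U =", mann_whitney_u(wt, dko))
    print("exact two-sided p =", exact_two_sided_p(wt, dko))
```

      Because the p-value is obtained by counting relabelings rather than from a distributional formula, the sketch makes explicit why this test needs no normality assumption; for larger groups a library routine with a normal approximation would be the practical choice.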

      Many experiments involved two factors (genotype, and e.g. temperature, drug, time-point). These data were analyzed in the original submission using 2-WAY ANOVA or Repeated Measures 2-WAY ANOVA, followed by post-hoc Sidak’s tests to compute p values adjusted for multiple comparisons. Because there is no widely agreed non-parametric alternative to 2-WAY ANOVA for analyzing data with two factors and that enables us to account for multiple comparisons, we used 2-WAY ANOVA as is typically used in the field for these kinds of experiments. We reasoned sticking with the 2-WAY ANOVA was the best course of action based on information provided by the statistical software used for this study - https://www.graphpad.com/support/faq/with-two-way-anova-why-doesnt-prism-offer-a-nonparametric-alternative-test-for-normality-test-for-homogeneity-of-variances-test-for-outliers/
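
      For reference, the Šidák adjustment named above has a simple closed form: with m comparisons, an unadjusted p-value p becomes 1 - (1 - p)^m. The minimal Python sketch below applies only that generic formula. It does not reproduce the two-way ANOVA or the post-hoc procedure of the statistical software used in the study, and the function name and example p-values are hypothetical.

```python
def sidak_adjust(p_values):
    """Return Sidak-adjusted p-values for a family of m comparisons.

    Uses the generic formula p_adj = 1 - (1 - p) ** m, where m is the
    number of comparisons in the family.
    """
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


if __name__ == "__main__":
    # Hypothetical unadjusted p-values for three genotype comparisons.
    raw = [0.01, 0.04, 0.5]
    for p, p_adj in zip(raw, sidak_adjust(raw)):
        print(f"raw p = {p:.3f}, Sidak-adjusted p = {p_adj:.6f}")
```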

      We note that, regardless of the test, our conclusion that there are no major changes in acute or chronic pain behaviors is clear and strongly supported.

      (7) Along the same line of comment with the previous, authors should increase the n number for DKO for staining (Figure 4) as n number is only 3 and there is variability in the cFos quantification in the ipsilateral side.

      We believe this is not necessary as the finding is clear that there is no difference.

      (8) Authors should provide references for the statement made in Lines 319-321, where the authors mention that there is accumulating evidence indicating that secretion of these neuropeptides from nociceptor peripheral terminals modulates immune cells and the vasculature in diverse tissues.

      We now provide several references to primary papers and reviews supporting this statement.

      (9) Authors state that the sample size used was similar to those from previous studies, but no references were provided. Also, even though the sample sizes used were similar, I believe that the right statistical test should be used to analyze the data.

      We have now cited several classic studies phenotyping mouse KOs in pain in the methods that used similar sample sizes. As detailed above, we have taken the reviewer’s feedback on board and performed normality testing to ensure the correct statistical test is used for each experiment.

      (10) In the discussion, the authors noted that knocking out of a gene remains the strongest test of whether the molecule is essential for a biological phenomenon. At the same time, it was acknowledged that Substance P infusion into the spinal cord elicits pain, but it is analgesic in the brain. The authors might want to expand more on this discussion, including how we can selectively assess the role of these neuropeptides in areas of interest. For example, knocking out both Substance P and CGRPα in selected areas instead of the global KO since there are reported compensatory effects.

      This is highlighted in the closing paragraph: “Emerging approaches to image and manipulate these molecules (Girven et al., 2022; Kim et al., 2023), as well as advances in quantitating pain behaviors (Bohic et al., 2023; MacDonald and Chesler, 2023), may ultimately reveal the fundamental roles of neuropeptides in generating our experience of pain.” The Kim preprint (now published, and so the citation has been updated in the text) describes a method of inactivating neuropeptide transmission in select brain regions in a cell-type specific manner.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I do not have any major comments. My minor comments are as follows:

      (1) What was the control group for all behavioral studies? Was it WT from an independent colony or one of the littermates was used for generating controls?

      We used C57BL/6J mice from The Jackson Laboratory (Jax). This is now mentioned in the methods.

      (2) In Fig. 2H, it seems that the effect will become significant if several mice are added.

      We are reluctant to add mice searching for significance. Sample sizes were determined before data collection, and the data were collected blind.

      (3) There is no figure 3, but two figures 4.

      Thank you. This has been corrected.

      (4) Multiple typos in the legend for figure 4 (lines 234-254). Line 242 (& n=8 (3M, 3F)), line 243 (swelling and plasma), line 252 ((n=8 for) & n=6 for DKO (4M, 4F)).

      Thank you. This has been corrected.

      (5) In Figure 4 (lines 273-285), the contralateral side is mentioned in B but no images are shown.

      Thank you. We removed the mention.

      (6) Although ligand knockouts cannot be compared directly with receptor inhibition, the readers could benefit from discussing studies of receptor ablation and/or pharmacological inhibition.

      We do discuss the classic studies of receptor KO, and the clinical data on receptor blockers here –

      “However, selective antagonists of the Substance P receptor NKR1 failed to relieve chronic pain in human clinical trials (Hill, 2000). Although CGRP monoclonal antibodies and receptor blockers have proven effective for subsets of migraine patients, their usefulness for other types of pain in humans is unclear (De Matteis et al., 2020; Jin et al., 2018). In line with this, knockout mice deficient in Substance P, CGRPα or their receptors have been reported to display some pain deficits, but the analgesic effects are neither large nor consistent between studies (Cao et al., 1998; De Felipe et al., 1998; Guo et al., 2012; Salmon et al., 2001, 1999; Zimmer et al., 1998).” 

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      (1) Figure 1E: What does chambers mean? Additionally, are the 12 chambers equally from the male and female samples (6 from male and 6 from female)?

      We have changed this to “well”. Each replicate is an individual well from an 8-well chamber slide. In all these experiments, the wells are approximately evenly distributed by mouse, because from each mouse we cultured around 8 wells’ worth of DRGs.

      (2) Figure 1D: What does low and high mean in the Hargreaves test?

      These refer to low and high active intensities of the radiant heat stimulus: 40 and 55, in the intensity units used by the instrument. These values are now given in the methods.

      (3) Figure 2-Suppl Figure 1: Authors should provide a bigger image of the image so that it is clearer to the readers.

      We think the image is of a reasonable size and comparable to the images used elsewhere in the paper.

      (4) Authors should consider labeling their supplementary figures in running numbers or combining supplementary figures together to avoid confusion. For example, Figure 2-Supplementary Figure 1 and Figure 2- Supplementary Figure 2 can be combined as just Supplementary Figure 2.

      We agree with the reviewer this would be clearer, but we have followed eLife’s convention for labelling and numbering supplements.

      (5) Figure 3 is mislabeled as Figure 4.

      Thank you. We have corrected this.

      (6) Only female mice were used in the CFA experiment, which does not go in line with the rest of the results which consist of both sexes.

      We have repeated the experiment with additional male mice. To be consistent with the von Frey data, these were followed for 7 days, and so the figure now shows a 7-day time course.

      (7) Typo in line 243. The word "and" is subscript.

      Thank you. We have corrected this.

      (8) There is a typo in the legend for Figure 4 where E is labeled I, G is labeled as F, and J is labeled as J.

      Thank you. We have corrected this.

      (9) Authors should specify what "several weeks" means (Line 263).

      It means three weeks; we tested up to 21 days. We will replace “several” with “three”.

      (10) Authors should specify what "one day" means (Line 267). For example, how many days after the intraplantar oxaliplatin treatment? Also, authors should justify why that specific time point was selected or have a reference for it.

      This means one day (24 hours) after treatment. Please see PMID: 33693512. Two references are provided in the methods.

      (11) Figure 4 legend: authors should again be specific on what "prolonged" entails (Line 277).

      We have replaced prolonged with 30 minutes brushing. Specifically, 3 x 10 min stim period, with 1 min rest between stim. It is in the methods.

      (12) In the methods section, authors state that both male and female mice were used for all experiments. However, only female mice were used in the CFA experiment (see minor comment #6). Authors should verify and correct this.

      This is correct. We only used female mice for one of the groups. We have since repeated with males, now included in the data.

      (13) Authors should be more specific in the methods section on how long the habituation lasted per day, for how many days, and what the mice were habituated to (experimenter, room, chamber, etc.).

      As noted in the methods, mice are habituated for at least an hour to the chambers, and thus implicitly to the room. We do not perform explicit habituation to the investigator such as repeated handling.

      (14) Authors need to provide more information on the semi-automated procedure they are referring to in Line 397. Also, authors should also provide the criteria for cFos quantification (eg. Intensity, etc). If this has been published before, they should provide the reference.

      We have added this. We used the ‘Find maxima’ and ‘Analyze particles’ functions in FIJI, followed by a manual curation step.

      (15) How much acetone was applied and how was it applied to the paw? (Line 495)

      We used the same applicator (1 ml syringe with a well at the top) to generate a droplet of acetone that was used for all mice. This has been added to the methods.

      (16) Authors should specify the amount of capsaicin injected in μl (Line 500).

      20 μl. We have added this.

      (17) Authors should explain or reference why they are analyzing the 15 min interval between 5 and 20 minutes after injection (Lines 507-508).

      Acetic acid behaviour lasts around 30 mins in our hands. We chose the 15 minute interval because it reduces burdensome hand scoring time by 50% versus doing the whole 30 mins. We reasoned that in the first 5 mins post injection the animal behaviour may be contaminated by stress related to handling, injection and return to chamber. Thus, 5 to 20 minutes provided a sensible time-frame for scoring the behavior when it is at its peak.

      (18) Authors have to provide more information/explanation on how they decided on the conditioned taste aversion protocol, for example why they do 30 mins exposure to a single water-containing bottle followed by 90 mins exposure to both bottles. If this has been published before, they should provide the reference.

      We read dozens of different published protocols in the literature, and piloted one that was something of an amalgam of some of them with various adaptations of convenience. Because it worked on our first attempt, we stuck to it. The advantage of the CTA assay is that it is incredibly robust to changes in the specifics of the paradigm, evincing the clear survival value of learning to avoid tastes that make you sick.

      (19) Authors again should provide more detail in their methods section.

      a. Specify the time frame that they are assessing here (Line 533).

      This can be seen in the Figure. 0 to 120 mins. We have added it to the methods.

      b. How long were the mice allowed to recover post-SNI before mechanical allodynia was assessed (Line 545)?

      This is apparent in the figures. 2 days to 21 days. We have added it to the methods.

      c. How much of the oxaliplatin was injected into the mice?

      40 μg / 40 μl (see PMID: 33693512)

      Editor's note: Reviewers agreed that addressing the concerns about power, outliers, and statistics, as well as functional validation of CGRPα, would raise the strength of evidence to compelling, and inclusion of comparison to single KO would raise it to exceptional.

      Should you choose to revise your manuscript, please check to ensure full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

    1. Author response:

      Regarding a future revised version, we plan to:

      • refer to the "MoMac-VERSE" study according to the original report.

      • modify incorrectly formatted references.

      • modify the text to acknowledge the heterogeneity and variability in the response of primary cells to the GSK3 inhibitor.

      • improve the explanation of the reanalysis of single cell RNAseq data in Figure 7 (ref. 47, GSE120833), and re-adapt the graphs of the scRNA-Seq data using different plot parameters (e.g., reduction = "umap.scvi") to provide a more user-friendly visualization including bona fide macrophage markers for each subpopulation.

      • include statistical analyses in each one of the figure legends.

      • perform additional analyses (e.g., dose-response and kinetics of CHIR-99021 effects) and mechanistic studies (e.g., role of proteasome) to further dissect the re-programming ability of the GSK3/MAFB axis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Insects and their relatives are commonly infected with microbes that are transmitted from mothers to their offspring. A number of these microbes have independently evolved the ability to kill the sons of infected females very early in their development; this male killing strategy has evolved because males are transmission dead-ends for the microbe. A major question in the field has been to identify the genes that cause male killing and to understand how they work. This has been especially challenging because most male-killing microbes cannot be genetically manipulated. This study focuses on a male-killing bacterium called Wolbachia. Different Wolbachia strains kill male embryos in beetles, flies, moths, and other arthropods. This is remarkable because how sex is determined differs widely in these hosts. Two Wolbachia genes have been previously implicated in male-killing by Wolbachia: oscar (in moth male-killing) and wmk (in fly male-killing). The genomes of some male-killing Wolbachia contain both of these genes, so it is a challenge to disentangle the two.

      This paper provides strong evidence that oscar is responsible for male-killing in moths. Here, the authors study a strain of Wolbachia that kills males in a pest of tea, Homona magnanima. Overexpressing oscar, but not wmk, kills male moth embryos. This is because oscar interferes with masculinizer, the master gene that controls sex determination in moths and butterflies. Interfering with the masculinizer gene in this way leads the (male) embryo down a path of female development, which causes problems in regulating the expression of genes that are found on the sex chromosomes.

      We would like to thank you for evaluating our manuscript.

      Strengths:

      The authors use a broad number of approaches to implicate oscar, and to dissect its mechanism of male lethality. These approaches include:

      (1) Overexpressing oscar (and wmk) by injecting RNA into moth eggs.

      (2) Determining the sex of embryos by staining female sex chromosomes.

      (3) Determining the consequences of oscar expression by assaying sex-specific splice variants of doublesex, a key sex determination gene, and by quantifying gene expression and dosage of sex chromosomes, using RNASeq.

      (4) Expressing oscar along with masculinizer from various moth and butterfly species, in a silkmoth cell line.

      This extends recently published studies implicating oscar in male-killing by Wolbachia in Ostrinia corn borer moths, although the Homona and Ostrinia oscar proteins are quite divergent. Combined with other studies, there is now broad support for oscar as the male-killing gene in moths and butterflies (i.e. order Lepidoptera). So an outstanding question is to understand the role of wmk. Is it the master male-killing gene in insects other than Lepidoptera and if so, how does it operate?

      Thank you for your comments. Wolbachia strains often carry wmk genes, but as observed in this study, the homologs in Homona showed no apparent MK ability. These showed strong male lethality in D. melanogaster, but it is still unclear whether they are the master male-killing genes in Diptera. It is also possible that the genes show toxicities in other lepidopteran insects as well as in other insect taxa. Further functional validation assays in different insects are warranted to clarify whether wmk shows toxicity in different insect taxa. We have also discussed the functions of wmk in the Discussion section (lines 301-306).

      Weaknesses:

      I found the transfection assays of oscar and masculinizer in the silkworm cell line (Figure 4) to be difficult to follow. There are also places in the text where more explanation would be helpful for non-experts (see recommendations).

      Thank you for your suggestion. We have thoroughly revised the manuscript to address all the questions, comments and suggestions you raised in “recommendations”. In particular, we have revised the section on the transfection assays of Oscar and Masc in Bm-N4 cells (Results section “Hm-oscar suppresses the masculinizing functions of lepidopteran masc genes”, starting on line 214, and Fig. 4; Materials and Methods section “Transfection assays and quantification of BmIMP<sup>M</sup>”, starting on line 483). We have also provided more detailed explanations for non-experts in some contexts (in response to your recommendation). We believe that the resulting revisions have significantly improved the quality and comprehensiveness of our manuscript.

      Reviewer #2 (Public review):

      Summary:

      Wolbachia are maternally transmitted bacteria that can manipulate host reproduction in various ways. Some Wolbachia induce male killing (MK), where the sons of infected mothers are killed during development. Several MK-associated genes have been identified in Homona magnanima, including Hm-oscar and wmk-1-4, but the mechanistic links between these Wolbachia genes and MK in the native host are still unclear.

      In this manuscript, Arai et al. show that Hm-oscar is the gene responsible for Wolbachia-induced MK in Homona magnanima. They provide evidence that Hm-Oscar functions through interactions with the sex determination system. They also found that Hm-Oscar disrupts sex determination in male embryos by inducing female-type dsx splicing and impairing dosage compensation. Additionally, Hm-Oscar suppresses the function of Masc. The manuscript is well-written and presents intriguing findings. The results support their conclusions regarding the diversity and commonality of MK mechanisms, contributing to our understanding of the mechanisms and evolutionary aspects of Wolbachia-induced MK.

      We would like to thank you for evaluating our manuscript.

      Strengths/weaknesses:

      (1) The authors found that transient overexpression of Hm-oscar, but not wmk-1-4, in Wolbachia-free H. magnanima embryos induces female-biased sex ratios. These results are striking and mirror the phenotype of the wHm-t infected line (WT12). However, Table 1 lists the "male ratio," while the text presents the "female ratio" with standard deviation. For consistency, the calculation term should be uniform, and the "ratio" should be listed for each replicate.

      We have revised the first results section (Hm-oscar induces female-biased sex ratios, starting from line 147) so that the calculation term is consistent. In the revised manuscript, the 'male ratio' is now used throughout, in alignment with Fig. 1. In addition, we have included all sex ratio information (number of males and females) in the supplementary data file for transparency and clarity.

      (2) The error bars in Figure 3 are quite large, and the figure lacks statistical significance labels. The authors should perform statistical analysis to demonstrate that Hm-oscar-overexpressed male embryos have higher levels of Z-linked gene expression.

      The large error bars on each chromosome (Fig. 3a-d) likely reflect the overall variation in expression levels across different transcripts. Accordingly, we have included statistical data for Figure 3 based on the Steel-Dwass test for expression levels. However, displaying statistical significance directly on the whisker plots would make the figure too cluttered due to the numerous combinations. Instead, we have provided all the statistical data in the supplementary data file. To further support the claim that Z-linked genes are more highly expressed in wHm-t-infected/Hm-Oscar-injected embryos, we have included the expression data for a Z-linked gene tpi, along with its statistical data in the revised manuscript (Fig. 3e, lines 210-212).

      (3) The authors demonstrated that Hm-Oscar suppresses the masculinizing functions of lepidopteran Masc in BmN-4 cells derived from the female ovaries of Bombyx mori. They should clarify why this cell line was chosen and its biological relevance. Additionally, they should explain the rationale for evaluating the expression levels of the male-specific BmIMP variant and whether it is equivalent to dsx.

      Thank you for your suggestion. We selected the BmN-4 cell line because previous studies have established it as a reliable model for investigating the functions of lepidopteran masc genes and the interactions between masc and Oscar genes (Katsuma et al., 2019; 2022). In addition, BmIMP<sup>M</sup> is a male-specific regulator of the male-type dsx, making it an ideal target for assessing the 'maleness' induced by transfection of the masc gene in female-derived BmN-4 cells (Suzuki et al., 2010; Katsuma et al., 2015). We have included more detailed background information in the revised manuscript and have thoroughly revised this section (Hm-oscar suppresses the masculinizing functions of lepidopteran masc genes, starting at line 214) and Figure 4 for better clarity.

      (4) Although the authors show that Hm-oscar is involved in Wolbachia-induced MK in Homona magnanima and interacts with the sex determination system in lepidopteran insects, the precise molecular mechanism of Hm-oscar-induced MK remains unclear. Further studies are needed to elucidate how Hm-oscar regulates Homona magnanima genes to induce MK, though this may be beyond the scope of the current manuscript.

      Based on our findings and previous studies in Homona, Ostrinia and Bombyx (Arai et al., 2023a; Katsuma et al., 2023; Kiuchi et al., 2014), we hypothesize that the molecular mechanisms underlying _w_Hm-induced MK are likely linked to impaired dosage compensation caused by the inhibition of Masc function by the Hm-Oscar protein. While the precise mechanisms remain unclear, unbalanced Z-linked gene expression due to the impaired dosage compensation (i.e., 2-fold higher Z-linked gene expression compared to normal males) is known to be lethal for lepidopteran males (Kiuchi et al., 2014; Fukui et al., 2015; Visser et al., 2021). We have outlined this hypothesis in the Discussion section (lines 245-254).

      Reviewer #3 (Public review):

      Summary:

      Overall, this is a clearly written manuscript with nice hypothesis testing in a non-model organism that addresses the mechanism of Wolbachia-mediated male killing. The authors aim to determine how five previously identified male-killing genes (encoded in the prophage region of the wHm Wolbachia strain) impact the native host, Homona magnanima moths. This work builds on the authors' previous studies in which:

      (1) They tested the impact of these same wHm genes via heterologous expression in Drosophila melanogaster.

      (2) They examined the activity of other male-killing genes (e.g., from the wFur Wolbachia strain in its native host: Ostrinia furnacalis moths).

      Advances here include identifying which wHm gene most strongly recapitulates the male-killing phenotype in the native host (rather than in Drosophila), and the finding that the Hm-Oscar protein has the potential for male-killing in a diverse set of lepidopterans, as inferred by the cell-culture assays.

      Strengths:

      Strengths of the manuscript include the reverse genetics approaches to dissect the impact of specific male-killing loci, and the use of a "masculinization" assay in Lepidopteran cell lines to determine the impact of interactions between specific masc and oscar homologs.

      We would like to thank you for evaluating our manuscript.

      Weaknesses:

      My major comments are related to the lack of statistics for several experiments (and the data normalization process), and opportunities to make the manuscript more broadly accessible.

      Thank you for your suggestions. We have thoroughly revised the manuscript to provide clearer explanations for non-experts. In addition, we have included more detailed statistical data for Figure 3 and Figure 4 based on the Steel-Dwass tests. For Figure 3a-d, displaying statistical significance directly on the whisker plots would make the figure too cluttered due to the numerous combinations. Therefore, we have provided all the statistical data in the supplementary data file. To further support the claim that Z-linked genes are more highly expressed in _w_Hm-t-infected/Hm-Oscar-injected embryos, we have included the expression data for a Z-linked gene tpi, along with its statistical data in the revised manuscript (Fig. 3e, lines 210-212). Regarding Figure 4, we have revised the Figure based on the reviewer’s suggestions, and provided more detailed information on how the expression data were analyzed (Transfection assays and quantification of BmIMP<sup>M</sup>, lines 495-520). We have also included more detailed background information on the assay system (Hm-oscar suppresses the masculinizing functions of lepidopteran masc genes, lines 215-237). Although we did not observe statistical significance based on the Steel-Dwass test, likely due to limited replicates, the observed changes in the IMP gene expression remain clearly evident.

      The manuscript I think would be much improved by providing more details regarding some of the genes and cross-lineage comparisons. I know some of this is reported in previous publications, but some summary and/or additional analysis would make this current manuscript much more approachable for a broader audience, and help guide readers to specific important findings. For example, a graphic and/or more detail on how the wmk/oscar homologs (within and across Wolbachia strains) differ (e.g., domains, percent divergence, etc) would be helpful for contextualizing some of the results. I recognize the authors discuss this in parts (e.g., lines 223-227), but it does require some bouncing between sections to follow. Similarly, the experiments presented in Figure 4 indicate that Hm-oscar has broad spectrum activity: how similar are the masc proteins from these various lepidopterans? Are they highly conserved? Rapidly evolving? Do the patterns of masc protein evolution provide any hints at how Oscar might be interacting with masc?

      Thank you for your valuable suggestion. To address this, we have included a visualization of the structural differences between the Oscar and wmk homologs in Figure 1a of the revised manuscript. In addition, we have included more detailed information for these genes and revised the introduction (lines 110-114; 124-137) and discussion (lines 255-266) to provide a clearer and more comprehensive overview. We have also described the similarity of the Masc proteins and Oscar proteins that we used, which is now reflected in the revised Figure 4b and 4d. More detailed information on these proteins is available in the supplementary data. Notably, Masc proteins exhibit high sequence variability with conserved domains (Figure 4d). A previous study identified the N-terminal region of Masc as crucial for Oscar function (Katsuma et al., 2022). The wide spectrum of Hm-Oscar activity likely stems from these conserved structures of Masc, but the effects might have undergone evolutionary tuning through interactions with the native host, as discussed in lines 293-294.

      It is clear from Figure 1 that the combinations of wmk homologs do not cause male killing on their own. Did the authors test if any of the wmk homologs impact the MK phenotype of oscar? It looks like a previous study tested this in wFur (noted in lines 250-252), but given that the authors also highlight the differences between the wFur-oscar and Hm-oscar proteins, this may be worth testing in this system. Related to this, what is the explanation for why there would be 4 copies of wmk in Hm?

      Thank you for your valuable suggestion. Unfortunately, we have not yet tested the effects of co-expression of wmk and Oscar. Due to a technical issue, mixing multiple constructs reduces the amount of each mRNA (i.e., mixing wmk-3 and Hm-Oscar at the same concentration results in a 2-fold lower mRNA concentration for both genes compared to mono-injected groups). In addition, we have previously tested injecting mRNA at a twofold higher concentration (i.e., 2 µg/µl mRNA), which resulted in very low hatchability regardless of the genes. Katsuma et al. (2022) tested the effect of wmk on the sex determination system, but did not test the effect of co-injection/transfection of wmk and Oscar. Considering the results of this and previous studies (Katsuma et al., 2022; Arai et al., 2023), it is likely that the targets of the wmk and oscar genes are different (as discussed in lines 267-289). Co-injection of wmk and oscar may not produce additive effects. Nevertheless, we would like to test this in future studies using the Drosophila system as well.

      As you point out, it is an interesting point that the moth-derived MK Wolbachia _w_Hm-t encodes four wmk genes, although they have no apparent effect on host survival. The exact functional relevance of these wmk homologs remains unclear. However, they may play a role in Wolbachia biology as transcriptional regulators, given that they encode HTH domains. Wolbachia generally encode several wmk homologs in their genome, regardless of whether they induce MK. This suggests that the functions of the wmk genes may be 'suppressed' in certain Wolbachia-host systems. The wmk and Hm-oscar genes are located within a prophage region, and some wmk genes are tandemly arrayed (as described in Arai et al., 2023). These wmk homologs may have increased in number by horizontal phage transfer, and the region containing wmk and adjacent sequences may act as a genomic island for virulence. So far, the function of wmk homologs has only been tested in D. melanogaster and H. magnanima, and further studies in other Wolbachia-host systems are highly warranted to test whether wmk exerts MK effects in other insect models. These points have been briefly discussed in the revised manuscript (lines 301-306; 318-320).

      Why are some of the broods male-biased (2/3) rather than ~50:50? (Lines 170-175, Figure 2a). For example, there is a strong male bias in un-hatched oscar-injected and naturally infected embryos, whereas the control uninfected embryos have normal 50:50 sex ratios. It is difficult to interpret the rate of male-killing given that the sex ratios of different sets of zygotes are quite variable.

      The observed male-biased sex ratios in unhatched embryos are due to the occurrence of MK during embryogenesis. In the unhatched groups, the skew towards males reflects the fact that the male embryos were targeted and killed by Wolbachia/Oscar, leading to a surplus of unhatched male embryos. Conversely, hatched individuals show a higher proportion of females because many of the males were eliminated during embryogenesis. Thus, the unhatched embryos are more male-biased, while the hatched individuals are more female-biased in the Hm-oscar/_w_Hm-t treated groups. We have revised the relevant section (Males are killed mainly at the embryonic stage, lines 179-186) and provided more detailed information to clarify this explanation.

      Figure 2b - it appears there are both male and female bands in the HmOsc male lane. I think this makes sense (likely a partial phenotype due to the nature of the overexpression approach), but this is worth highlighting, especially in the context of trying to understand how much of the MK phenotype might be recapitulated through these methods. Related, there is no negative control for this PCR.

      Thank you for your suggestion. As you noted, a faint dsx-M band is visible in the Hm-oscar treated group in Figure 2b. This is consistent with previous findings by Arai et al. (2023), which reported that male embryos with low-density _w_Hm-t showed double bands of dsx-M and dsx-F, similar to what we observed in this study. This information has been included in the revised manuscript in lines 196-198, as follows:

      “Notably, male embryos expressing Hm-oscar also exhibited weak male-type dsx splicing in addition to the female-type splicing, resembling the previously observed pattern in male embryos infected with low-titer _w_Hm-t (Arai et al., 2023a).”

      We also appreciate your comment regarding the missing negative control. We realised that the negative control lane had been lost during the preparation of the figure, and the figure has now been revised to restore it. We have also included the relevant molecular marker information in both the figure legends and Figure 2b.

      It appears the RNA-seq analysis (Figure 3) is based on a single biological replicate for each condition. And, there are no statistical comparisons that support the conclusions of a shift in dosage compensation. Finally, it is unclear what exactly is new data here: the authors note "The expression data of the wHm-t-infected and non-infected groups were also calculated based on the transcriptome data included in Arai et al. (2023a)" - So, are the data in Figure 3c and 3d a re-print of previous data? The level of dosage compensation inferred by visually comparing the control conditions in 3b and 3d does not appear consistent. With only one biological replicate library per condition, what looks like a re-print of previous data, and no statistical comparisons, this is a weakly supported conclusion.

      Thank you for your suggestion. In this study, we generated the RNA-seq data for the Hm-oscar/GFP-injected groups, but did not sequence the _w_Hm-t-infected/NSR lines. Instead, the previously generated RNA-seq data of _w_Hm-t-infected/NSR (Arai et al., 2023) were re-analyzed (rather than simply reprinted) to evaluate whether the expression patterns of Hm-oscar-injected and _w_Hm-t-infected groups are similar. We have revised the Results section (Hm-oscar impairs dosage compensation in male embryos, lines 200-212), the Materials and methods section (Quantification of Z chromosome-linked genes, lines 454-456), and the figure legends to provide more precise information about this analysis.

      Although we did not perform replicates for the RNA-seq comparisons, it is important to note that each RNA-seq sample contains 50-60 male/female individuals. We believe the results are still robust and clearly indicative of the trends we observe. This was further supported by the quantification of Hmtpi gene expression, which we have visualized in Figure 3e (and lines 210-212). As you noted, the expression patterns in Figure 3b (GFP injected) and Figure 3d (NSR) are not completely identical. This discrepancy may be due to the differences between injection treatments and natural infections. Nevertheless, both treatments are consistent in showing that gene expressions on the Z chromosome (Chr01 and Chr15) are not upregulated.

      We have also added more detailed statistical data for Figure 3 based on the Steel-Dwass tests. For Figure 3a-d, however, showing the statistical significance directly on the whisker plots would create excessive clutter due to the numerous combinations of chromosomes. Instead, we have provided the full statistical data in the supplementary data file. Furthermore, to strengthen our conclusion that Z-linked genes are highly expressed in _w_Hm-t-infected/Hm-Oscar-injected embryos, we have included expression data for the Z-linked gene tpi, along with statistical data, in the revised manuscript (Fig. 3e, lines 210-212).

      In Figure 4: There are no statistics to support the conclusions presented here. Additionally, the data have gone through a normalization process, but it is difficult to follow exactly how this was done. The control conditions appear to always be normalized to 100 ("The expression levels of BmImpM in the Masc and Hm-Oscar/Oscar co-transfected cells were normalized by setting each Masc-transfected cell as 100"). I see two problems with this approach:

      (1) This has eliminated all of the natural variation in BmImpM expression, which is likely not always identical across cells/replicates.

      (2) How then was the percentage of BmImpM calculated for each of the experimental conditions? Was each replicate sample arbitrarily paired with a control sample? This can lead to very different outcomes depending on which samples are paired with each other. The most appropriate way to calculate the change between experimental and control would be to take the difference between every single sample (6 total, 3 control, 3 experimental) and the mean of the control group. The mean of the control can then be set at 100 as the authors like, but this also maintains the variability in the dataset and then eliminates the issue of arbitrary pairings. This approach would also then facilitate statistical comparisons which is currently missing.

      Thank you for your suggestion. As you pointed out in (1), the previous analysis did indeed eliminate the natural variation in BmIMP-M expression. In the revised manuscript and Figure 4, we have reanalyzed the data following your suggestion and have described the variation across replicates.

      For (2), the data shown in the previous manuscript were normalized to 100 for each Masc-treated group. In doing so, each replicate sample was arbitrarily paired with a control sample from the same cell lot to account for variations that might occur due to differences in cell lots. However, following your recommendation, we have revised the figure to set the average of the Hm-masc treated group to 100, rather than using arbitrary pairings. More detailed normalization procedures have been provided in the section 'Transfection assays and quantification of BmIMP' (lines 483-520). Additionally, we have provided more detailed background information on the assay system in lines 218-223. Although we did not observe statistical significance based on the Steel-Dwass test, likely due to the limited number of replicates, the differences in IMP gene expression between the Masc-treated and Masc&Hm-oscar-treated groups remain evident.
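      The normalization discussed above (scaling all replicates so that the mean of the control group equals 100, rather than pairing each experimental replicate with one control) can be sketched as follows. This is only an illustrative sketch: the function name and the expression values are hypothetical and are not taken from the manuscript, and the authors' actual analysis pipeline is not described here.

```python
from statistics import mean

def normalize_to_control_mean(control, experimental):
    """Scale all replicates so that the control-group mean becomes 100.

    Every replicate (control and experimental) is multiplied by the same
    factor, so the replicate-to-replicate variation is preserved and no
    arbitrary control/experimental pairing is needed.
    """
    scale = 100.0 / mean(control)
    return [v * scale for v in control], [v * scale for v in experimental]

# Hypothetical relative expression values, three replicates per group
# (illustrative numbers only, not data from the study).
masc_only = [8.0, 10.0, 12.0]        # control: Masc-transfected cells
masc_plus_oscar = [2.0, 3.0, 4.0]    # experimental: Masc + Hm-oscar

ctrl_norm, exp_norm = normalize_to_control_mean(masc_only, masc_plus_oscar)
print("control (mean = 100):", ctrl_norm)
print("experimental:", exp_norm)
```

      Because the control replicates keep their spread around 100, the normalized values can be passed directly to a statistical comparison between groups.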

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Line 38: change to: 'Wolbachia are maternally transmitted'.

      Revised accordingly (line 38).

      Line 69: remove 'seemingly'.

      Revised accordingly (line 69).

      Paragraph starting line 123: I don't think this is so clear to a reader who is not familiar with the work and system. It would be helpful to more clearly explain that candidate male-killing genes from Wolbachia that infect Homona were inserted into Drosophila melanogaster, and that their expression was then induced, with interesting patterns (and that it can be a bit difficult to interpret the transgenic expression of genes from a moth male-killer that are inserted into a fly). Also, the sentence about the combined action of cifA and cifB in Drosophila cytoplasmic incompatibility is also confusing to a non-expert. I would suggest removing it.

      Thank you for your suggestion. We have revised the paragraph (lines 124-139) to provide clearer background information, making it easier for non-experts to follow. We have also removed the sentence regarding the combined effect of cifA and cifB to improve the flow and overall clarity.

      Line 170: what is the explanation for the male-biased sex ratio instead of 50-50?

      The male-biased sex ratio occurs because MK happens during embryogenesis. Unhatched embryos include males that were killed by Wolbachia/Oscar, resulting in a higher proportion of unhatched male embryos. Conversely, the hatched individuals display a female bias, as most of the males were eliminated during embryogenesis. Thus, the unhatched embryos are more male-biased, while the hatched individuals are more female-biased in the Hm-oscar/_w_Hm-t treated groups. We have revised the section “Males are killed mainly at the embryonic stage” (lines 170-186) to include more detailed information explaining this phenomenon.

      Line 190: please explain what are the Z chromosomes in Bombyx and Homona and Lepidoptera in general (chromosomes 1 and 15?), as this is not so clear for a non-expert.

      Thank you for your suggestion. We have revised the section (lines 200-212) to include more precise background information about the chromosome constitutions in lines 202-204 as follows:

      “Unlike other lepidopteran species, Tortricidae, including H. magnanima, generally possess a large Z chromosome that is homologous to B. mori chromosomes 1 (Z) and 15 (autosome).”

      Line 222: please explain oscar diversity and classification in more detail, as this is not so clear for a non-expert.

      Thank you for your suggestion. We have revised the sentences to provide clearer background information on the diversity of oscar genes (lines 255-264).

      Figure 4: I found this difficult to follow. Why are there 2 rows (HmOscar and Oscar)? Does oscar here refer to oscar from Ostrinia? I am also a bit confused about the baseline control of Masc in these cell lines. If I understand Lepidoptera sex determination, then these cell lines are expressing high levels of female-specific piRNAs that suppress Masc. How specific are these piRNAs (i.e. do Bombyx piRNAs suppress Mascs from other Lepidoptera)? How much extra Masc will override endogenous piRNA? Information is lost by setting Masc expression to 100% in each separate comparison.

      Yes, Oscar here indicates the _w_Fur-encoded oscar (Oscar from Ostrinia), which was tested to compare its function with that of the Homona-derived Hm-oscar gene. In addition, following the reviewer's suggestions, we have revised the figure and included more detailed information on how we adjusted the expression levels in the M&M section.

      A previous study (Shoji et al., 2017, RNA 23:86–97) demonstrated that the Fem piRNA (29 bp) in Bombyx mori requires a 17 bp complementary sequence from its 5' region for its function. However, in species other than B. mori, no significant homology (i.e., over 17 bp matches) was found between the B. mori Fem piRNA and the masc genes analyzed in this study. Therefore, it is likely that the Fem piRNA expressed in BmN-4 cells is unable to suppress the masculinizing function driven by masc genes in other lepidopteran species. In addition, we did not quantify the levels of piRNA in this system, but the expression levels of masc are probably too high to be suppressed.

      Figure 4 legend: spelling of Spodoptera.

      Revised accordingly.

      Reviewer #2 (Recommendations for the authors):

      In Figure 2, what is the dsx splicing type for the hatched male in the Hm-oscar-injected group and the wHm-t infected line? Dsx-F or dsx-M?

      Thank you for your suggestion. Unfortunately, we have not tested splicing in the hatched male neonates (1st instar larvae), partly due to difficulties in obtaining sufficient material for RNA extraction. Based on the previous publication in the Ostrinia system, where Oscar-bearing _w_Sca induces MK, the hatched males (ZZ) exhibit female-type dsx as observed in the male embryos (Herran et al., 2022). The hatched Homona males may show double bands for dsx-M and dsx-F as observed in this study.

      The size of the markers (in kilobase pairs) should be indicated in Figure 2.

      We have accordingly included the marker information in the revised Figure 2b and the figure legends.

      In Figure 3, could the authors identify which genes exhibit higher expression levels in the Hm-oscar-injected group and the wHm-t infected line? Could they provide hints for the possible mechanism of male-killing?

      In the RNA-seq data shown in Figure 3a-d, we observed that both the Hm-oscar-injected and _w_Hm-infected groups generally exhibited upregulated expression of Z-linked genes. Rather than the upregulation or downregulation of a specific gene, we consider that global upregulation of Z-linked genes, caused by improper dosage compensation, is lethal for males. The Z chromosome contains various genes involved in key biological processes such as endocrine function and detoxification, and disruption of these processes may contribute to male lethality. Additionally, in this revised manuscript, we have provided more detailed information on the expression level of the Z-linked gene tpi. We have also discussed the potential mechanisms of MK in the Discussion section (lines 245-254).

      The format of the references should be consistent. Gene and species names should be italicized.

      We have formatted the references accordingly.

      Reviewer #3 (Recommendations for the authors):

      The authors use the term "upstream" (e.g., Oscar suppressed the function of masculinizer, the upstream male sex determinant...), which was sometimes confusing. In many cases, it reads as though the masculinizer was upstream of oscar, but what I think the authors are trying to convey is that masculinizer is a primary sex-determining factor.

      Thank you for your suggestion. We have accordingly revised the term.

      Line 101: which insect is wFur from?

      It is from Ostrinia furnacalis - line 104 has been revised.

      Figure 1: it would be helpful to indicate the statistical results on the figure.

      Accordingly, we have added statistical data (binomial test) for Figure 1. The data for the Steel-Dwass test have been included in the supplementary data.
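      As a rough illustration of the kind of test mentioned in this response, an exact two-sided binomial test of an observed male count against a 50:50 expectation can be sketched as below. The null proportion of 0.5, the function name, and the counts are assumptions made for illustration; the response does not state which software or settings the authors used.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k under the null proportion p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so that ties are not lost to float rounding.
    total = sum(x for x in pmf if x <= observed * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical brood: 12 males among 60 adults (illustrative counts only).
males, total_adults = 12, 60
print("male ratio:", males / total_adults)
print("two-sided p-value vs 50:50:", binomial_two_sided_p(males, total_adults))
```

      A small p-value here would indicate that the observed male ratio is unlikely under an even sex ratio.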

      Figure 2b: please label the ladder on the gel.

      Thank you for your suggestion. We have accordingly labeled the DNA ladder on the gel.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      The paper by Auer et al. makes several contributions: (1) The study developed a novel approach to map the microstructural organization of the human amygdala by applying radiomics and dimensionality reduction techniques to high-resolution histological data from the BigBrain dataset. (2) The method identified two main axes of microstructural variation in the amygdala, which could be translated to in vivo 7 Tesla MRI data in individual subjects. (3) Functional connectivity analysis using resting-state fMRI suggests that microstructurally defined amygdala subregions had distinct patterns of functional connectivity to cortical networks, particularly the limbic, frontoparietal, and default mode networks. (4) Meta-analytic decoding was used to suggest that the superior amygdala subregion's connectivity is associated with autobiographical memory, while the inferior subregion was linked to emotional face processing. (5) Overall, the data-driven, multimodal approach provides an account of amygdala microstructure and possibly function that can be applied at the individual subject level, potentially advancing research on amygdala organization.

      We thank the Reviewer for the positive comments and insightful evaluation of the work.

      (1.1) Although these are meritorious contributions there are some concerns that I will summarize below. The paper makes little-to-no contact with the monkey literature regarding the anatomy of amygdala subregions, their functionality, and their patterns of anatomical connectivity. This is surprising because such literature on non-human primates is a very important starting point for understanding the human amygdala. I recommend taking a careful look at the work by Helen Barbas, among others. There are too many papers to cite but a notable example is: Ghashghaei, H. T., Hilgetag, C. C., & Barbas, H. (2007). Sequence of information processing for emotions based on the anatomic dialogue between prefrontal cortex and amygdala. Neuroimage, 34(3), 905-923. The work of Amaral is also highly relevant.

      As suggested, we included the important work of Amaral et al. as well as Ghashghaei et al. highlighting its contribution to mapping the intricate anatomy and function of the amygdala in non-human primates. We comment on this in the Introduction of the manuscript. Please see P.3.

      “Early research on the amygdala in non-human primates has been instrumental in understanding its intricate structure, function and patterns of anatomical connectivity (Amaral and Price 1984; Ghashghaei et al. 2007). This foundational study highlights the amygdala’s different subdivisions, most notably the basomedial nucleus (BM), basolateral nucleus (BL), and central nucleus (Ce) (Amaral et al. 1992). Furthermore, this work describes a dense network between these subdivisions and the prefrontal cortex, most strongly found in the posterior orbitofrontal and anterior cingulate areas.”

      (1.2) Furthermore, the authors subscribe to a model with LB, CM, and SF sectors. How does the SF sector relate to monkey anatomy?

      The overall organization of these subregions is largely conserved between humans and monkeys, reflecting their evolutionary relationship. While the basic subregional organization is conserved, there are still some important structural and functional differences between human and monkey amygdalae. For example, the SF subregion, as often described in humans, includes parts of the cortical nuclei (VCo), anterior amygdaloid area (AAA), amygdalohippocampal transition area (AHi), amygdalopiriform transition area (APir) as well as the lateral olfactory tract (LOT). This remark was added in the Discussion, on P.12:

      “However, this region has been previously described as consisting of three main subdivisions: LB, CM, and SF, each composed of smaller subnuclei with distinct connectivity patterns and functions (Amunts et al. 2005; Ball et al. 2007; Bzdok et al. 2013; de Olmos and Heimer 1999). These subregions are largely conserved between humans and monkeys, reflecting their evolutionary relationship. However, there are still some considerable differences such as in the SF subregion, where its description in monkeys additionally contains the lateral olfactory tract (LOT) (De Olmos 1990).”

      (1.3) The authors use meta-analytical decoding via NeuroSynth. If the authors like those results of course they should keep them but the quality of coordinate reporting in the literature is insufficient to conclude much in the context of amygdala subregion function in my opinion. I believe the results reported are at most "somewhat suggestive".

      We agree with the Reviewer that use of data from NeuroSynth poses unique challenges, particularly relating to investigations of a small structure such as the amygdala. However, to clarify, these analyses decode the cortex-wide functional connectivity patterns of amygdala subregions and not activations within subregions defined by our microanatomical analyses. Additionally, comments from Reviewer 2 suggested expanding the NeuroSynth decoding to the contralateral hemisphere. As such, we decided to keep this analysis in the main manuscript but rephrase the interpretation of these findings in the Discussion to emphasize their exploratory nature on P.13:

      “Functional decoding of subregional functional connectivity patterns indicated possible dissociations in cognitive (e.g., memory) and affective (e.g., emotional face processing) functions of the amygdala, echoing previous accounts of this region’s involvement in associative processing of emotional stimuli. Notably, these findings link the functional connectivity profile of a subregion partially co-localizing with LB to emotional face processing. The LB subregion has been previously linked to associative processing related to the integration of sensory information (Bzdok et al. 2013; Ghods-Sharifi, St Onge, and Floresco 2009; Pessoa 2010; Winstanley et al. 2004; Boyer 2008), which is consistent with the association with visual emotional information processing identified in the present work.”

      (1.4) Another significant concern has to do with the results in Figure 3. The red and yellow clusters identified are quite distinct but the differences in functional connectivity are very modest. Figure 3C reveals very similar functional connectivity with the networks investigated. This is very surprising, and the authors should include a careful comparison with related findings in the literature. Overall, there is limited comparison between the observed results and those obtained via other methods. On a more pessimistic note, the results of Figure 3 seem to question the validity of the general approach.

      We agree with the Reviewer that we can indeed observe considerable overlap between functional connectivity profiles of amygdala subregions. The amygdala is a relatively small structure, which likely leads to interconnectivity between its subregions (Bzdok et al. 2013) as well as BOLD signal autocorrelation within this region. In addition, functional signals in the amygdala are affected by a relatively low signal-to-noise ratio (SNR), a limitation extending to temporobasal and mesiotemporal regions. Despite these challenges, our technique remained sensitive enough to detect subtle differences in connectivity patterns, even in this small group of subjects and in this restricted subcortical territory.

      In the revised manuscript, we further highlight these caveats in the Discussion (P.13):

      “Although these findings are promising, we also observe considerable overlap between functional connectivity networks of both our defined subregions. Indeed, the amygdala is a relatively small structure, leading to likely interconnectivity between its subregions and locally high signal autocorrelation. Functional connectivity and microstructure in the amygdala are certainly related, however previous work suggests they do not perfectly overlap (Bzdok et al. 2013). In addition, this region is affected by relatively low signal-to-noise ratio (SNR), as is observed in broader temporobasal and mesiotemporal territories.”

      (1.5) Some statements in the Discussion feel unwarranted. For example, "significant dissociation in functional connectivity to prefrontal structures that support self-referential, reward-related, and socio-affective processes." This feels way beyond what can be stated based on the analyses performed.

      We agree that this interpretation may reach beyond the analyses performed and reported findings. We have adjusted this portion of the text accordingly in our Discussion on functional connectivity findings (P.13):

      “Qualitatively, we found that the subregion defined by the highest 25% of U1 values mainly overlapped with what is commonly defined as the superficial and centromedial subregions, whereas the lowest 25% U1 values subregion overlapped mostly with the laterobasal division. Interestingly, CM and SF characterized subregions showed significantly stronger functional connectivity to prefrontal structures. This finding aligns with previous work demonstrating unique affiliations between the CM subregion and anterior cingulate and frontal cortices (Kapp, Supple, and Whalen 1994; Barbour et al. 2010), as well as between the SF subregion and the orbitofrontal cortex (Goossens et al. 2009; Caparelli et al. 2017; Pessoa 2010; Klein-Flügge et al. 2022).”

      Additionally, we have also edited our Discussion to ensure that our interpretations are grounded in the analyses conducted, while framing the findings as potential avenues for future work. Please see P.13.

      “Functional decoding of functional connectivity results indicated possible dissociations in cognitive (e.g., memory) and affective (e.g., emotional face processing) functions of the amygdala, echoing previous accounts of this region’s functional specialization and subregional segregation of associative processing of emotional stimuli.”

      Recommendations for the authors:

      (1.6) Figure 1 has panels A-I but only A-D are discussed in the caption. The orientation of the slices is not indicated which makes it very hard to follow for most readers.

      The subpanels are now referred to in the revised Results. We also added a notation on the orientation of the slices and described them accordingly in our Figure 1 description. (P.5-6):

      “(A) The amygdala was segmented from the 100-micron resolution BigBrain dataset using an existing subcortical parcellation (Xiao et al. 2019). Slice orientation is consistent across all panels in this figure.”

      (1.7) Some figure references in the text seem to be incorrect; please check that the text refers to the correct figure number and panel.

      We thank the Reviewer for pointing this out. We thoroughly revised the correspondence between figure panel labels and their referencing in the text.

      Reviewer #2:

      This study bridges a micro- to macroscale understanding of the organization of the amygdala. First, using a data-driven approach, the authors identify structural clusters in the human amygdala from high-resolution post-mortem histological data. Next, they use multimodal imaging data to identify structural subunits of the amygdala and the functional networks in which they are involved. This approach is exciting because it permits the identification of both structural amygdalar subunits, and their functional implications, in individual subjects. There are, however, some differences in the macro and microscale levels of organization that should be addressed.

      Strengths:

      The use of data-driven parcellation on a structure that is important for human emotion and cognition, and the combination of this with high-resolution individual imaging-based parcellation, is a powerful and exciting approach, addressing both the need for a template-level understanding of organization as well as a parcellation that is valid for individuals. The functional decoding of rsfMRI permits valuable insight into the functional role of structural subunits. Overall, the combination of micro to macro, structure, and function, and general organization to individual relevance is an impressive holistic approach to brain mapping.

      We thank the Reviewer for their constructive and helpful feedback on our work.

      Weaknesses:

      (2.1) UMAP 1, as calculated from the histological data, appears to correlate well across individuals, and decently with the MRI data, although the medial-lateral coordinate axis is an outlier. UMAP 2, on the other hand, does not appear to correlate well with imaging data or across individuals. This does pose a problem with the claim that this paper bridges micro- and macroscale parcellations. One might certainly expect, however, that different levels of organization might parcellate differently, but the authors should address this in the discussion and offer ways forward.

      Data-driven methods hold several advantages for the quantitative extraction of signal from the underlying data in an observer-independent manner. However, these techniques are also sensitive to potential idiosyncrasies in the data. In the present work, our main analyses rely on the processing of a histological dataset (BigBrain) providing a unique opportunity for high-resolution analysis of amygdala histology and in vivo translation of findings leveraging ultra-high field MRI (n=10). However, both datasets are limited by their small sample size (n=1 for BigBrain and n=10 for MICA-PNI). As a result, we speculate that signal variations captured by U2 may be sensitive to artifacts or subject-specific sources of variance. Moving forward, this hypothesis could be assessed in future work via the analysis of larger histological and neuroimaging datasets to better track recurring features picked up by U2 or the association of these unique topographies with behavioural markers.

      As suggested, we included a section in our Discussion highlighting this shortcoming and the importance of larger datasets moving forward. Please see P.11-12.

      “However, it is important to note that both datasets analyzed in this work are limited by their small sample size (n=1 for BigBrain and n=10 for MICA-PNI). We speculate that the signal variations captured by U2 may be sensitive to artifacts or subject-specific sources of variance, potentially explaining why it was not consistent between subjects and modalities. Moving forward, this hypothesis could be assessed in future work via the analysis of larger histological and neuroimaging datasets to better track recurring features picked up by U2 or the association of these unique topographies with behavioural markers.”

      (2.2) It would be interesting to see functional decoding for the right amygdala. This could be included in the supplementary material. A discussion of differences in the results in the two hemispheres could be illuminating.

      In accordance with the Reviewer’s suggestion, we added Supplementary figure S2 exploring the decoding of connectivity profiles of the right amygdala stratified by its cytoarchitectural embedding with UMAP.

      Upon analysis, dissociations in functional connectivity patterns over the right amygdala were less evident, leading to overall similar functional decoding across the two clusters. We refer to this Supplementary Figure in our Discussion on P.13.

      “For the right amygdala, dissociations in functional connectivity patterns were more subtle, leading to overall similar functional decoding across the two clusters (Figure S2).”

      (2.3) The authors acknowledge that this mapping matches some but not all subunits that have been previously described in the amygdala. It would be helpful to neuroanatomists if the authors could discuss these differences in more detail in the discussion, to identify how this mapping differs and what the implications of this are.

      In our work, we focus on mapping the three well characterized amygdala subregions, specifically the superficial (SF), centromedial (CM) and laterobasal (LB) subdivisions. Qualitative histological accounts have indeed delineated multiple subunits within these subregions which we now describe in the revised manuscript. Due to the lower resolution of in vivo MRI data used in this work relative to post mortem histology, we focused our analyses on larger subregions that could be more reliably mapped to native quantitative T1 spaces of each participant. We now overview this issue in the Discussion. Please see P.12.

      “Although qualitative histological accounts have indeed delineated multiple subunits within these general regions, the present work focuses on three subdivisions (Amunts et al. 2005) to account for resolution disparities when translating our findings to in vivo MRI data. The LB subdivision includes the basomedial nucleus (Bm), basolateral nucleus (BL), lateral nucleus (LA) and paralaminar nucleus (PL). Moving medially, the CM subdivision includes the central (Ce) and medial nuclei (Me), while the SF subdivision includes the anterior amygdaloid area (AAA), amygdalohippocampal transition area (AHi), amygdalopiriform transition area (APir), and ventral cortical nucleus (VCo) (Heimer et al. 1999). However, disagreement on the precise attribution of nuclei to broader subdivisions motivated our investigations of probabilistic subunits of the amygdala (Kedo et al. 2018). The development of new tools to segment amygdala subnuclei in vivo opens opportunities for future work to further validate our framework at the precision of these nuclei within subjects (Saygin et al. 2017).”

      (2.4) The acronym UMAP is not explained. A brief explanation and description would be useful to the reader.

      We moved the expanded acronym from the Methods to the first instance of the term UMAP in our paper, found in the Introduction. As suggested, we also added a sentence describing the technique. Please see P.6.

      “We then applied Uniform Manifold Approximation and Projection (UMAP), a non-linear dimensionality reduction technique that preserves the local and global structure of high-dimensional data by projecting it into a lower-dimensional space (Becht et al. 2018), to the resulting 20-feature matrix to derive a 2-dimensional embedding of amygdala cytoarchitecture (Figure 1D).”

    1. Author response:

      Reviewer 1:

      (1) Reward Interpretation and Skin Conductance Responses (SCR):

      The reviewer raises a valid point, as the model from which we derive prediction errors describes predictive learning—specifically, the occurrence of shock—without incorporating additional reward learning effects. SCRs are used to fit the model’s hyperparameters but do not directly measure reward; rather, they serve as a marker of arousal.

      In our paradigm, SCRs are measured during CS presentation and primarily reflect predictive learning, as they are closely linked to contingency awareness. The association between estimated prediction errors during unexpected US omissions and reward remains reliant on existing literature.

      In the revised manuscript, we will further elaborate on these points to clarify the distinction between predictive learning and direct reward processing, while contextualizing our findings within the broader literature on reward signaling and fear extinction.

      (2) Reinforcement Agent and SCR Modeling:

      Notably, we do not use SCR as a personalized expectation measure due to its limited reliability at the individual level; instead, the model's hyperparameters are fitted on the entire SCR dataset, yielding per-trial prediction and prediction error estimates for each CS sequence rather than for individual participants.

      (3) Clarity and Visualization of Results:

      We recognize that the presentation of our results can be improved and will take steps to enhance figure clarity, also ensuring that trend-level results are clearly distinguished.

      (4) Theoretical Context for Paradigm Phases:

      Regarding the differences across experimental phases, we recognize the theoretical significance of these distinctions. However, our primary focus is on identifying commonalities in unexpected US omission responses across phases rather than emphasizing phase-specific differences. Nevertheless, we will provide a brief clarification on phase differences to enhance the manuscript’s interpretability.

      (5) Cerebellum-VTA Connectivity Analysis:

      Furthermore, we acknowledge that our conclusion regarding the modulation of the dopaminergic system by the cerebellum should be framed more cautiously. We will temper our claims to better reflect the bidirectional and potentially indirect nature of cerebellum-VTA interactions. Additionally, we plan to include PPI results using a cerebellar seed showing the VTA, potentially in the supplementary material.

      Reviewer 2:

      (1) Success of extinction learning based on Self-reports and SCRs?

      The reviewer points to a problem, which is inherent to extinction learning: The initial fear association is not erased, but merely inhibited, and is prone to return. Although the recall phase follows the extinction phase, we did not expect a complete inhibition of the conditioned response; instead, spontaneous recovery is expected. In fact, the spontaneous recovery observed in the recall phase provided us with an additional opportunity to investigate unexpected US omissions, which was our primary focus.

      (2) Concerns on reliability of event-based contrasts using three events:

      Regarding concerns about the reliability of analyses based on three events, we believe that the consistency of our parametric modulation analysis, which incorporates all events, combined with the three-event analysis results, provides further support for the observed patterns. We are currently discussing additional analyses to further verify the reliability of using three events.

      (3) Deviations from preregistration:

      Finally, we will carefully review all deviations from our preregistration to ensure transparency. Any methodological or analytical changes will be explicitly addressed in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In the present work the authors explore the molecular driving events involved in the establishment of constitutive heterochromatin during embryo development. The experiments have been carried out in a very accurate manner and clearly fulfill the proposed hypotheses.

      Regarding the methodology, the use of: i) an efficient system for conversion of ESCs to 2C-like cells by Dux overexpression; ii) a global approach through iPOTD that reveals the chromatome at each stage of development and iii) the STORM technology that allows visualization of DNA decompaction at high resolution, helps to provide clear and comprehensive answers to the questions raised.

      The contribution of the present work to the field is very important as it provides valuable information on chromatin-bound proteins at key stages of embryonic development that may help to understand other relevant processes beyond heterochromatin maintenance.

      The study could be improved through a more mechanistic approach that focuses on how SMARCAD1 and TOPBP1 cooperate and how they functionally connect with H3K9me3, HP1b and heterochromatin regulation during embryonic development. For example, addressing why topoisomerase activity is required or whether it connects (or not) to SWI/SNF function and the latter to heterochromatin establishment, are questions that would help to understand more deeply how SMARCAD1 and TOPBP1 operate in embryonic development.

      We would like to thank the reviewer for the positive evaluation of our work and the methodology we employed. We greatly appreciated the reviewer’s recognition of our study to “provide valuable information on chromatin-bound proteins at key stages of embryonic development that may help to understand other relevant processes beyond heterochromatin maintenance”. While we acknowledge the value of including mechanistic studies, such an addition would require a substantial amount of experimental work that exceeds our current resources.

      Reviewer #1 (Recommendations For The Authors):

      In my opinion, the authors could improve the study by deciphering -to a certain extent- the possible mechanism by which SMARCAD1 and TOPBP1 are cooperating in their system to establish H3K9me3 and consequently heterochromatin; and whether it is different (or not) from that already reported in yeast (ref 27). In fact, is it only SMARCAD1 that participates in this process or the whole SWI/SNF complex? Could the lack of SMARCAD1 compromise the proper assembly of the SWI/SNF complex? In this regard, a model describing the main findings of the study and the discussion of the possible mechanisms involved -based on the current bibliography- would be appreciated. This, although speculative, would illustrate the range of possibilities that could be operating in the maintenance of heterochromatin during embryonic development. In conclusion, it would be great if the authors could link -mechanistically- the dots connecting SMARCAD1, TOPBP1, H3K9me3/HP1/heterochromatin.

      As suggested by the reviewer and to enrich the discussion, we have included some additional sentences and references in the revised discussion section.

      As a minor point, in Figure 3A (left panel), it appears that the protein precipitating with H3K9me3 reacts with TOPBP1, but its molecular weight does not exactly match that of the TOPBP1 band found in the input. The authors should clarify this point and it is also recommended that IPs and inputs are run in the same gel. Please replace Figure 3A right panel.

      Following the reviewer’s suggestion and to improve the reading flow, we have restructured the order of the figures and removed the original Figure 3A. The revised Figure 3A-C panel illustrates the SMARCAD1 association with H3K9me3 in ESCs and 2C- cells, while capturing the reduced SMARCAD1-H3K9me3 association in 2C+ cells.

      Reviewer #2 (Public Review):

      The manuscript by Sebastian-Perez describes determinants of heterochromatin domain formation (chromocenters) at the 2-cell stage of mouse embryonic development. They implement an inducible system for transition from ESC to 2C-like cells (referred to as 2C+) together with proteomic approaches to identify temporal changes in associated proteins. The conversion of ESCs to 2C+ is accompanied by dissolution of chromocenter domains marked by HP1b and H3K9me3, which reform upon exit from the 2C-like state. The innovation in this study is the incorporation of proteomic analysis to identify chromatin-associated proteins, which revealed SMARCAD1 and TOPBP1 as key regulators of chromocenter formation.

      In the model system used, doxycycline induction of DUX leads to activation of an EGFP reporter regulated by the MERVL-LTR in 2C+ cells that can be sorted for further analysis. A doxycycline-inducible luciferase cell line is used as a control and does not activate the MERVL-LTR GFP reporter. The authors do see groups of proteins anticipated for each developmental stage that suggest the overall strategy is effective.

      The major strengths of the paper involve the proteomic screen and initial validation. From there, however, the focus on TOPBP1 and SMARCAD1 is not well justified. In addition, how data is presented in the results section does not follow a logical flow. Overall, my suggestion is that these structural issues need to be resolved before engaging in comprehensive review of the submission. This may be best achieved by separating the proteomic/morphological analyses from the characterization of TOPBP1 and SMARCAD1.

      We appreciate the reviewer’s positive evaluation of our inducible system to trigger the transition from ESCs to 2C-like cells, and the strength of the chromatin proteomics we conducted. In response to the reviewer’s suggestion, we have reorganized the order of the figures, particularly Figure 1 and Figure 2, and revised the text to improve readability and flow.

      Reviewer #2 (Recommendations For The Authors):

      There are some very interesting components to the study but, as noted, the narrative requires changes and the rationale for focusing on TOPBP1 and SMARCAD1 is not strong at present. Specific comments are noted below.

      (1) Inclusion of authentic 2C cells for comparative chromocenter analysis (or at least a more fulsome discussion of how the system has been benchmarked in previous studies).

      We have included more detail in the revised methods section, in the “Cell lines and culture conditions” paragraph. We have added: “The Dux overexpression system was benchmarked according to previously reported features. Dux overexpression resulted in the loss of DAPI-dense chromocenters and the loss of the pluripotency transcription factor OCT4 (fig. S1E) (6, 7), upregulation of specific genes of the 2-cell transcriptional program such as endogenous Dux, MERVL, and major satellites (MajSat) (fig. S1F) (6, 7, 11, 26, 58), and accumulation in the G2/M cell cycle phase (fig. S1G), with a reduced S phase consistent in several clonal lines (fig. S1H) (15).”

      (2) In Figure 1A, the text indicates a loss of chromocenters, but it may be better described as decompaction because the DAPI/H3K9me3 staining shows diffuse/expanded structures (this is in fact how it is described in relation to Figure 2).

      We have changed the text accordingly, now describing it as “decompaction”.

      (3) Table S1 has 6 separate tabs but these are not specified in the text. It would be useful to separate the 397 proteins unique to Luc and 2C- cells since they form much of the basis for the remaining analysis. This approach also assumes it is the absence of a protein in the 2C+ that accounts for the lack of chromocenters (noting there are 510 proteins unique to the 2C+ state that are not discussed).

      We have referenced the supplementary table as Table S1 in the text for simplicity. It includes: Table S1A - List of Protein Groups identified by mass spectrometry in -EdU, Luc, 2C- and 2C+ cells; Table S1B - Input data for SAINT analysis; Table S1C - SAINT results of the comparison 2C- vs Luc and 2C+ vs Luc; Table S1D - SAINT results of the comparison Luc vs 2C- and 2C+ vs 2C-; Table S1E - SAINT results of the comparison Luc vs 2C+ and 2C- vs 2C+; and Table S1F - Total number of PSM per protein in the different cells and conditions tested.

      (4) Since there is no change in H3K9me3 levels, loss of SUV420H2 from 2C+ chromatin (figure 1G) coupled with potential changes in H4K20me3 could contribute to the morphological differences. SUV420H2 is known to regulate chromocenter clustering in a way that requires H4K20me3 but this is not addressed or cited (PUBMED: 23599346).

      As suggested by the reviewer, we have added additional sentences and references in the revised manuscript.

      (5) In Figure 1C, there does appear to be overlap between the 2C+ and 2C- populations (while the Luc population is distinct) even though they are morphologically distinct when imaged in Figure 2A. The 2C- cells are thought to be an intermediate, low Dux expressing population.

      Chromatome profiling through genome capture provides a snapshot of the chromatin-bound proteome in the analyzed samples (shown in revised Fig. 2B). As indicated by the reviewer and previously reported in the literature, 2C- cells are an intermediate population before reaching 2C+ cells. For this study, we have focused on H3K9me3 morphological changes. Even though 2C- and 2C+ cells are distinct with respect to H3K9me3 morphology (shown in revised Fig. 1B), analysis of the chromatome data from hundreds of chromatin-bound proteins revealed some overlap between these two populations. However, replicates from the same population tend to cluster together, for example, 2C+ rep1 and 2C+ rep3, and 2C- rep1 and 2C- rep2. Collectively, these data suggest that a defined subset of coordinated changes in the chromatome likely triggers the transition from 2C- to 2C+ cells. Further experimental investigation of the chromatome dataset during the 2C-like transition would be interesting; however, we believe it is beyond the scope of this study.

      (6) Data with SUV39H1 and 2 is difficult to accommodate; what about other H3K9 methyltransferases or proteins such as TRIM28 (KAP1) and SETDB1 (this comes up in the discussion but is not assessed in the results section).

      We agree that investigating the role of TRIM28 (KAP1) and SETDB1 in this experimental setting could be of interest; however, we believe that these experiments go beyond the scope of the presented study.

      (7) Rationale for choosing TOPBP1 needs to be improved. How do TOPBP1 levels relate to TOP1/TOP2A/TOP2B levels across the 3 cell populations? By what criteria does topoisomerase inhibitor treatment increase 2C+-like cells? Moreover, to what extent will inhibiting topoisomerases lead to global heterochromatin and cell cycle changes regardless of cell type?

      Following the reviewer’s suggestion, we have included some additional references throughout the text to strengthen our rationale for selecting TOPBP1, given its well-established critical role in DNA replication and repair. Additionally, we have revised the results and discussion sections to include new sentences that propose a potential mechanism by which topoisomerase inhibitors may indirectly recruit TOPBP1 to facilitate DNA repair, ultimately leading to an increase in 2C+ cells.

      (8) Likewise, the decision to look at SMARCAD1 based solely on its interaction with TOPBP1 seems somewhat arbitrary and it did not seem to come up as of interest in the iPOTD analysis. Moreover, they were not able to validate the interaction with their own analyses.

      We have revised the text to clarify the connection further.

      (9) The flow of results is confusing. The first section concludes with a focus on TOPBP1 and SMARCAD1, then progresses to morphological characterization of heterochromatin regions in the next two sections before returning to TOPBP1 and SMARCAD1. It seems like it would make more sense to describe the model system and morphological characterization at the beginning of the results section and then transition to the proteomic analysis and characterization of TOPBP1 and SMARCAD1 (with the expectation that the rationale be improved).

      As suggested by the reviewer, we have reordered the figures, particularly Figure 1 and Figure 2, and rephrased the text to improve the overall reading flow.

      (10) There has been considerable work done on characterizing chromatin structure, epigenetic changes, and morphology during early embryonic development. It is therefore difficult to see how validating some of these changes in the inducible model adds much in the way of new knowledge. It may, but this is not articulated in the current text.

      As detailed before, we have rephrased the text to improve the overall reading flow, which we hope has improved the understanding of the impact of our results.

      (11) It is difficult to disentangle broader effects of both TOPBP1 and SMARCAD1 from those described here; they may induce phenotypes, but these may not be unique to this model system.

      We agree with the reviewer, but addressing this point would require additional experiments that go beyond the scope of the presented study.

      (12) One of the issues with this assay is global chromatin recovery; it is not focused on heterochromatin compartments. The statement "We identified a total of 2396 proteins, suggesting an efficient pull-down of chromatin-associated factors (fig. S2D and Table S1)" does not demonstrate efficiency. Additional functional annotation would be required to establish this claim, including what fraction are known chromatin-associated proteins (with a focus on the heterochromatin compartment).

      We have changed the text accordingly. The resulting statement reads as: “We identified a total of 2396 proteins, suggesting an effective pull-down of putative chromatin-associated factors (fig. S2D and Table S1)”.

      Reviewer #3 (Public Review):

      The manuscript entitled "SMARCAD1 and TOPBP1 contribute to heterochromatin maintenance at the transition from the 2C-like to the pluripotent state" by Sebastian-Perez et al. adopted the iPOTD method to compare the chromatin-bound proteome in ESCs and 2C-like cells generated by Dux overexpression. The authors identified 397 chromatin-bound proteins enriched only in ESC and 2C- cells, among which they further investigated TOPBP1 due to its potential role in controlling chromocenter reorganization. SMARCAD1, a known interacting protein of TOPBP1, was also investigated in parallel. The authors observed increased size and decreased number of H3K9me3-heterochromatin foci in Dux-induced 2C<sup>+</sup> cells. Interestingly, depletion of TOPBP1 or SMARCAD1 also led to increased size and decreased number of H3K9me3 foci. However, depletion of these proteins did not affect entry into or exit from the 2C-like state. Nevertheless, the authors showed that both TOPBP1 and SMARCAD1 are required for early embryonic development.

      Although this manuscript provides new insights into the features of 2C-like cells regarding H3K9me3-heterochromatin reorganization, it remains largely descriptive at this stage. It does not provide new insights into the following important aspects: 1) how SMARCAD1 associates with H3K9me3 and contributes to heterochromatin maintenance, 2) how TOPBP1 regulates the expression of SMARCAD1 and facilitates its localization in heterochromatin foci, 3) whether the remodelling of chromocenters is causally related to the mutual transitions between ESCs and 2C-like cells. Furthermore, some results are over-interpreted. Additional experiments and analyses are needed to increase the strength of mechanistic insights and to support all claims in the manuscript.

      We would like to thank the reviewer for their positive and thorough evaluation of our manuscript. We have revised the text and hope that the overall flow is now clearer. Moreover, while we acknowledge the value of including mechanistic studies, such an addition would require a substantial amount of experimental work that exceeds our current resources. 

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      (1) Fig.2: the DNA decompaction of the chromatin fibers shown in 2C<sup>+</sup> cells may be more related to a relaxed 3D chromatin conformation (Zhu, NAR 2021; Olbrich, Nat Commun 2021) than chromatin accessibility. The authors should discuss this point.

      As suggested by the reviewer, we have included some additional sentences and references in the revised manuscript to address this concern.

      (2) Chemical inhibition of topoisomerases resulted in an increase in the percentage of 2C<sup>+</sup> cells. Does depletion of TOPBP1 also result in an increased percentage of 2C<sup>+</sup> cells? Please include this result in Fig. 3E. Additionally, it should be noted that DDR and p53 have been reported to activate Dux (Stashpaz, eLife 2020; Grow, Nat Genet 2021), and thus, may contribute to the increased percentage of 2C<sup>+</sup> cells observed upon topoisomerase inhibition. This point should be discussed in the manuscript.

      To address this concern, we have included some additional sentences and references in the revised manuscript.

      (3) Fig 3A: the TOPBP1 band in the IP sample is questionable, and therefore the conclusion that TOPBP1 is associated with H3K9me3 is difficult to draw from Fig 3A. Additionally, the authors mentioned that association of TOPBP1 and SMARCAD1 is undetected in ESCs, likely due to the suboptimal efficiency of available antibodies. As these are key conclusions in this study, the authors are suggested to try other commercially available TOPBP1 antibodies (e.g., Abcam #ab-105109, used by ElInati, PNAS 2017) or knock-in tags to perform the co-IP experiment.

      Following the reviewer’s suggestion and to improve the reading flow, we have restructured the order of figures and removed the original Figure 3A. The revised Figure 3A-C panel illustrates the SMARCAD1 association with H3K9me3 in ESCs and 2C- cells, while capturing the reduced SMARCAD1-H3K9me3 association in 2C<sup>+</sup> cells.

      (4) Fig. 3C-D, Fig. S3D: the authors claimed reduction of both SMARCAD1 expression and its co-localization with H3K9me3 foci in 2C<sup>+</sup> cells, but did not perform mechanistic studies. It is important to know if TOPBP1 expression also decreases in 2C<sup>+</sup> cells. Additionally, it is unclear if the reduced co-localization of SMARCAD1 with H3K9me3 foci results from its altered nuclear localization or simply from reduced expression level? In either case, please provide some mechanistic insights.

      While we acknowledge the value of including mechanistic studies, such an addition would require a substantial amount of experimental work that exceeds our current resources. 

      (5) Fig. 3K, Fig. S4D-E: does SMARCAD1 expression decrease upon TOPBP1 depletion? Statistical analysis of SMARCAD1 intensity in Fig. S4E is needed, and a Western blot analysis is strongly suggested. Additionally, it is unclear if the reduced co-localization of SMARCAD1 with H3K9me3 foci results from its altered nuclear localization or simply from reduced expression level? In Fig. 3K, TOPBP1-depleted cells appear to show decreased size and increased number of H3K9me3 foci, which is inconsistent with Fig. S4B-C. The authors should clarify this discrepancy. Furthermore, statistics should be performed to determine whether Smarcad1/Topbp1 knockdown could further increase the size and decrease the number of H3K9me3 foci in 2C<sup>+</sup> cells. This would provide additional evidence for the involvement of these proteins in heterochromatin maintenance.

      We did not observe Smarcad1 downregulation after Topbp1 knockdown (shown in fig. S4A). In Figs. S4B and S4C, we observed that the number of H3K9me3 foci decreased, and their area became larger after knocking down either Smarcad1 or Topbp1, compared to scramble controls. These results align with the reviewer’s comment. Additionally, it should be noted that these findings were derived from the quantification of tens of cells and hundreds of foci, as indicated in the figure legend. This resulted in statistical significance after applying the test indicated in the figure legend.

      (6) Fig. 3J is suggested to be moved to Fig. 4. Additionally, performing immunostaining of SMARCAD1, TOPBP1, and H3K9me3 during pre-implantation development would provide valuable information on their protein-level dynamics, interactions, and functions in early embryos. This would further strengthen the conclusions drawn in the manuscript.

      We agree that these additional experiments would provide valuable information; however, they would require a substantial amount of experimental work that exceeds our current resources.

      (7) Fig. 4 and Fig. S5: the authors observed reduced H3K9me3 signal in the Smarcad1 MO embryos at the 8-cell stage, but claim that they failed to examine Topbp1 MO embryos at the 8-cell stage due to their developmental arrest at the 4-cell stage. However, based on Fig. 4A, not all Topbp1 MO embryos were arrested at the 4-cell stage, and it is still possible to examine the H3K9me3 signal in 8-cell Topbp1 MO embryos, which is critical for demonstrating its function in early embryos. Also, how to interpret the increased HP1b signal in Topbp1 MO embryos?

      For Topbp1 silencing, we observed an even more severe phenotype compared to Smarcad1 MO. All the Topbp1 MO-injected embryos (100 %) arrested at the 4-cell stage and did not develop further (shown in Fig. 4A and 4B). Therefore, the severity of the Topbp1 morpholino phenotype posed a technical challenge in evaluating the H3K9me3 signal in 8-cell Topbp1 MO embryos, as none of the injected embryos developed beyond the 4-cell stage.

      We believe the increased HP1b signal in Topbp1 MO embryos could indicate potential alterations in chromatin organization and heterochromatin stability. Specifically, we observed remodeling of heterochromatin in both 2-cell and 4-cell Topbp1 MO arrested embryos compared to controls, as evidenced by the spreading and increased HP1b signal (shown in fig. S5F-S5I). Further investigations could enhance our understanding of the underlying defects in Topbp1 knockdown embryos, extending beyond heterochromatin-related errors.

      Minor points:

      (1) Page 4, the third row from the bottom: please revise the sentence.

      We have reviewed the text and it now reads correctly in the revised manuscript.

      (2) Fig. 1C: The authors claimed "Luc replicates clustered separately from 2C<sup>+</sup> and 2C- conditions", however, Luc rep3 is apparently clustered with 2C conditions.

      (3) The GFP signal in Fig. S1E is confusing.

      (4) Please include ESC in Fig. 2D-E. Also label the colors in Fig. 2E.

      As indicated in the figure legend of the revised Fig. 1F: “Cells with a GFP intensity score > 0.2 are colored in green. Black dots indicate 2C- cells and green dots indicate 2C<sup>+</sup> cells.”

      (5) Fig. 2G: Transposition of the heatmap (show genes in rows) is suggested to improve readability.

      (6) Page 7, the third row from the bottom: incorrect citation of Fig. 1K.

      Thank you for spotting this incorrect citation. We have corrected it in the revised manuscript.

      (7) Page 8, row 15, Fig. S3D should be cited to support the decreased expression of SMARCAD1 in 2C<sup>+</sup> cells.

      We have cited the corresponding supplementary figure S3D in the mentioned sentence.

      (8) Fig. 2H: what is the difference between "2C-" and "ESC-like"?

      We use "2C-" for cells that do not express the GFP reporter during the transition from ESCs to 2C<sup>+</sup> cells. We use "ESC-like" for cells that do not express the GFP reporter during exit, that is, cells that have gone from a sorted and purified 2C<sup>+</sup> state to a GFP-negative state.

      (9) Fig. S4A-C: compared with shTopbp1#2, shTopbp1#1 appears to be slightly more effective in knockdown, but less dramatic changes in the size/number of H3K9me3 foci.

      (10) Fig. 4: please show the effectiveness of Topbp1 MO by Immunostaining of TOPBP1.

      (11) Fig. 4C: please label the developmental stage as in Fig. 4E and 4G.

      We have added an “8-cell” label to Figure 4C, as suggested by the reviewer.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Zhao and colleagues investigate inflammasome activation by E. tarda infections. They show that E. tarda induces the activation of the NLRC4 inflammasome as well as the non-canonical pathway in human THP1 macrophages. Further dissecting NLRC4 activation, they find that T3SS translocon components eseB, eseC and eseD are necessary for NLRC4 activation and that delivery of purified eseB is sufficient to trigger NAIP-dependent NLRC4 activation. Sequence analysis reveals that eseB shares homology within the C-terminus with T3SS needle and rod proteins, leading the authors to test if this region is necessary for inflammasome activation. They show that the eseB CT is required and that it mediates interaction with NAIP. Finally, they show that homologs of eseB in other bacteria also share the same sequence and that they can activate NLRC4 in a HEK293T cell overexpression system.

      Strengths:

      This is a very nice study that convincingly shows that eseB and its homologs can be recognized by the human NAIP/NLRC4 inflammasome. The experiments are well designed, controlled and described, and the paper is convincing as a whole.

      Weaknesses:

      The authors need to discuss their study in the context of previous papers that have shown an important role for E. tarda flagellin in inflammasome activation and test whether flagellin and/or E. tarda T3SSs needle or rod can activate NLRC4.

      The authors show that eseB and its homologs can activate NLRC4, but there are also other translocon proteins, such as YopB or PopB, that are very different and share little homology with eseB. It would be nice to include a section comparing the different type 3 secretion systems. Are there two different families of T3SSs, those that feature translocon components that are recognized by NAIP-NLRC4 and those that cannot be recognized?

      (1) The authors need to discuss their study in the context of previous papers that have shown an important role for E. tarda flagellin in inflammasome activation and test whether flagellin and/or E. tarda T3SSs needle or rod can activate NLRC4.

      According to the reviewer’s suggestion, we added the relevant discussion (lines 326-334) and carried out additional experiments to examine whether E. tarda flagellin, needle, and rod could activate NLRC4. The relevant results are shown in Figure S3, Figure S5, and lines 226-230 and 269-274.

      (2) The authors show that eseB and its homologs can activate NLRC4, but there are also other translocon proteins, such as YopB or PopB, that are very different and share little homology with eseB. It would be nice to include a section comparing the different type 3 secretion systems. Are there two different families of T3SSs, those that feature translocon components that are recognized by NAIP-NLRC4 and those that cannot be recognized?

      According to the reviewer’s suggestion, additional experiments were performed to examine the NLRC4-activating potentials of 14 translocator proteins that share low sequence identities with EseB. The relevant results and discussion are shown in Figure S8 and lines 289-301, 364-372, and 377-379.

      Reviewer #2 (Public Review):

      Summary:

      This work by Zhao et al. demonstrates the role of the Edwardsiella tarda type 3 secretion system translocon in activating human macrophage inflammation and pyroptosis. The authors show the requirement of both the bacterial translocon proteins and particular host inflammasome components for E. tarda-induced pyroptosis. In addition, the authors show that the C-terminal region of the translocon protein, EseB, is both necessary and sufficient to induce pyroptosis when present in the cytoplasm. The most terminal region of EseB was determined to be highly conserved among other T3SS-encoding pathogenic bacteria and a subset of these exhibited functionally similar effects on inflammasome activation. Overall, the data support the conclusions and interpretations and provide interesting insights into interactions between bacterial T3SS components and the host immune system.

      Strengths:

      The authors use established and reliable molecular biology and bacterial genetics strategies to characterize the roles of the bacterial T3SS translocon and host inflammasome pathways to E. tarda-induced pyroptosis in human macrophages. These observations are naturally expanded upon by demonstrating the specific regions of EseB that are required for inflammasome activation and the conservation of this sequence among other pathogenic bacteria.

      Weaknesses:

      The functional assessment of EseB homologues is limited to inflammasome activation at the protein level but does not include the effects on cell viability as shown for E. tarda EseB. Confirmation that EseB homologues have similar effects on cell death would strengthen this portion of the manuscript.

      According to the reviewer’s suggestion, the effects of representative EseB homologs on cell death were examined in the revised manuscript (Figure 5D, Figure S7 and line 289).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I only have a few suggestions on how to improve the study:

      Activation of caspase-4 requires entry into the host cytosol. Can this be observed with E. tarda and is it T3SS dependent? The fact that deleting the translocon components abrogates all GSDMD activation (see Fig. 2D) suggests that also Casp4 activation requires an active T3SS. It would be useful for the reader to include some more information on the cellular biology of E. tarda.

      In our study, we found that E. tarda could enter THP-1 cells (Figure S1), and host cell entry was not affected by deletion of eseB-D (Δ_eseB-D_) in the T3SS (Figure 2B, C). Additional experiments showed that Δ_eseB-D_ abolished the ability of E. tarda to activate Casp4 (Figure S2), implying that Casp4 activation required an active T3SS. Relevant changes in the revised manuscript: lines 223-224 and 341-342.

      The data presented by the authors suggest that eseB is sensed by NLRC4 when overexpressed. They do not, however, prove that eseB is the main factor that drives NLRC4 activation during an infection, since deficiency in eseB also abrogated translocation of other potential activators of NLRC4, e.g. flagellin and T3SS needle and rod subunits. I would thus find it essential to properly test if E. tarda flagellin can activate NLRC4 by comparing a WT and flagellin-deficient strain, and/or by transfecting or expressing E.t. flagellin in these cells, as well as testing whether E.t. rod and needle subunits act as NLRC4 activators. This is important as previous studies suggested that flagellin is the main activator of cytotoxicity during E. tarda infection.

      Previous studies have shown that flagellin is required for E. tarda-induced macrophage death in fish [1] but not in mice [2]. In the revised manuscript, we performed additional experiments to examine whether E. tarda flagellin, needle, and rod could activate NLRC4. The relevant results are shown in Figure S3, Figure S5, and lines 226-230, 269-274, and 326-334.

      References

      (1) Xie HX, Lu JF, Rolhion N, Holden DW, Nie P, Zhou Y, et al. Edwardsiella tarda-induced cytotoxicity depends on its type III secretion system and flagellin. Infect Immun. 2014;82(8):3436-45. doi: 10.1128/IAI.01065-13.

      (2) Chen H, Yang D, Han F, Tan J, Zhang L, Xiao J, et al. The bacterial T6SS effector EvpP prevents NLRP3 inflammasome activation by inhibiting the Ca<sup>2+</sup>-dependent MAPK-JNK pathway. Cell Host Microbe. 2017;21(1):47-58. doi: 10.1016/j.chom.2016.12.004.

      Figure 5/S4, please list the names of the eseB homologs. It is cumbersome to have to access GenBank with the accession number to be able to understand what proteins the authors define as homologs of eseB.

      The names were added to the revised Table S2, Figure 5 and Figure S6 (the original Figure S4).

      The authors mention that other translocon proteins, such as YopB/D and PopB/D, were suggested to cause inflammasome activation. How do these compare to eseB and its homologs? Do they share the CT motif?

      Additional experiments were performed to compare the inflammasome activation abilities of EseB and other translocator proteins including YopD and PopD. The relevant results and discussion are shown in Figure S8 and lines 289-301, 364-372, and 377-379.

      It would be nice to show that there are potentially two groups of translocon proteins, one group sharing homology to needle subunits within the CT region and another that is different. A quick look at the sequence of these proteins suggests that they are quite different and much larger than eseB.

      In our study, additional experiments with more translocator proteins indicated that the possession of EseB T6R-like terminal residues does not necessarily guarantee the protein to activate the NLRC4 inflammasome. Relevant results and discussion are shown in lines 289-301, 364-372, and 377-379.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Satouh et al. report giant organelle complexes in oocytes and early embryos. Although these structures have often been observed in oocytes and early embryos, their exact nature has not been characterized. The authors named these structures "endosomal-lysosomal organelles form assembly structures (ELYSAs)". ELYSAs contain organelles such as endosomes, lysosomes, and probably autophagic structures. ELYSAs are initially formed in the perinuclear region and then migrate to the periphery in an actin-dependent manner. When ELYSAs are disassembled after the 2-cell stage, the V-ATPase V1 subunit is recruited to make lysosomes more acidic and active. The ELYSAs are most likely the same as the "endolysosomal vesicular assemblies (ELVAs)", reported by Elvan Böke's group earlier this year (Zaffagnini et al. doi.org/10.1016/j.cell.2024.01.031). However, it is clear that Satouh et al. identified and characterized these structures independently. These two studies could be complementary. Although the nature of the present study is generally descriptive, this paper provides valuable information about these giant structures. The data are mostly convincing, and only some minor modifications are needed for clarification and further explanation to fully understand the results.

      Reviewer #2 (Public Review):

      Satouh et al report the presence of spherical structures composed of endosomes, lysosomes, and autophagosomes within immature mouse oocytes. These endolysosomal compartments have been named as Endosomal-LYSosomal organellar Assembly (ELYSA). ELYSAs increase in size as the oocytes undergo maturation. ELYSAs are distributed throughout the oocyte cytoplasm of GV stage immature oocytes but these structures become mostly cortical in the mature oocytes. Interestingly, they tend to avoid the region which contains metaphase II spindle and chromosomes. They show that the endolysosomal compartments in oocytes are less acidic and therefore non-degradative but their pH decreases and becomes degradative as the ELYSAs begin to disassemble in the embryos post-fertilization. This manuscript shows that lysosomal switching does not happen during oocyte development, and the formation of ELYSAs prevents lysosomes from being activated. Structures similar to these ELYSAs have been previously described in mouse oocytes (Zaffagnini et al, 2024) and these vesicular assemblies are important for sequestering protein aggregates in the oocytes but facilitate proteolysis after fertilization. The current manuscript, however, provides further details of endolysosomal disassembly post-fertilization. Specifically, the V1-subunit of V-ATPase targeting the ELYSAs increases the acidity of lysosomal compartments in the embryos. This is a well-conducted study and their model is supported by experimental evidence and data analyses.

      Reviewer #3 (Public Review):

      Fertilization converts a cell defined as an egg to a cell defined as an embryo. An essential component of this switch in cell fate is the degradation (autophagy) of cellular elements that serve a function in the development of the egg but could impede the development of the embryo. Here, the authors have focused on the behavior during the egg-to-embryo transition of endosomes and lysosomes, which are cytoplasmic structures that mediate autophagy. By carefully mapping and tracking the intracellular location of well-established marker proteins, the authors show that in oocytes endosomes and lysosomes aggregate into giant structures that they term Endosomal LYSosomal organellar Assembl[ies] (ELYSA). Both the size distribution of the ELYSAs and their position within the cell change during oocyte meiotic maturation and after fertilization. Notably, during maturation, there is a net actin-dependent movement towards the periphery of the oocyte. By the late 2-cell stage, the ELYSAs are beginning to disintegrate. At this stage, the endo-lysosomes become acidified, likely reflecting the activation of their function to degrade cellular components.

      This is a carefully performed and quantified study. The fluorescent images obtained using well-known markers, using both antibodies and tagged proteins, support the interpretations, and the quantification method is sophisticated and clearly explained. Notably, this type of quantification of confocal z-stack images is rarely performed and so represents a real strength of the study. It provides sound support for the conclusions regarding changes in the size and position of the ELYSAs. Another strength is the use of multiple markers, including those that indicate the activity state of the endo-lysosomes. Altogether, the manuscript provides convincing evidence for the existence of ELYSAs and also for regulated changes in their location and properties during oocyte maturation and the first few embryonic cell cycles following fertilization.

      At present, precisely how the changes in the location and properties of the ELYSAs affect the function of the endo-lysosomal system is not known. While the authors' proposal that they are stored in an inactive state is plausible, it remains speculative. Nonetheless, this study lays the foundation for future work to address this question.

      Minor point: l. 299. If I am not mistaken, there is a typo. It should read that the inhibitors of actin polymerization prevent redistribution from the cytoplasm to the cortex during maturation.

      Minor point: A few statements in the Introduction would benefit from clarification. These are noted in the comments to the authors.

      We sincerely appreciate the editorial board of eLife and the reviewers for their helpful and constructive comments on our manuscript. We are pleased that the reviewers acknowledged that we identified and characterized this assembly structure independently. In the revised manuscript, we have carefully considered the reviewers’ comments and conducted additional analysis to address each of them.

      Regarding the typographical errors, we revised the description to fit with our findings and the reviewers’ comments. We also found that the primer sequence was correct, and we carefully checked the accuracy of the entire manuscript.

      We hope that the revised version will now be deemed suitable for publication in eLife.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Q. 1) The authors state in the Abstract that ELYSAs contain autophagosome-like membranes in the outer layer. However, this seems to be just speculation based on the LC3 staining results and is not directly shown. Are there autophagosome-like double membrane structures in ELYSAs?

      We appreciate this comment and agree with this concern; it was difficult to assert from the electron micrographs that these structures are autophagosomes. For this reason, we rephrased the statement as “Most ELYSAs are also positive for an autophagy regulator, LC3.” (line 33). In addition, we revised the notation to "LC3-positive structures" in the Results and Discussion sections (lines 165-169 and 286).

      Q. 2) The data in Figure 2A, showing a decrease in the number of LAMP1 structures, seems to contradict the data in Figure 1B, showing an apparent increase in LAMP1 structures. Please explain this discrepancy. If the authors did not count structures just below the plasma membrane, please explain the rationale for this.

      We really appreciate the valuable comment. Regarding the number of LAMP1-positive structures, it is not suitable for comparison with Figure 1B, etc., as pointed out by the reviewer, since the distribution of the LAMP1 signal differs from plane to plane. To avoid any potential confusion, we added new images of the Z-projection of the immunostained images that can better reflect the number of positive structures in the whole oocyte/embryo in Figure 2.

      In addition, as the reviewer pointed out, there is a technical difficulty in measuring the LAMP1-positive signal on the plasma membrane or just below it. We explained how and why we had to delete plasma membrane signals in our response #21.

      Q. 3) The actin dependence is not observed in Figure 5C. What is the difference between Figure 5C and 5E? Please explain further.

      We apologize for the lack of clarity. Figure 5C shows the average number of LAMP1-positive structures and Figure 5E shows the percentage of the summed granule volume in LAMP1-positive structures, in both cases after classifying the LAMP1-positive granules by diameter.

      We removed Figure 5E for the sake of conciseness since we already mentioned a similar fact in Figure 5C. To clarify the corresponding explanations, we moved figures that were not classified by diameter to Supplementary Figure 8 to improve readability. Moreover, we have rewritten the main text on lines 200–211.

      Q. 4) While the actin inhibitors reduce the number of peripheral LAMP1 structures (Figure 5F), they do not affect their number in the central region (Figure 5G). How can the authors conclude that actin inhibitors inhibit the migration of LAMP1 structures?

      We appreciate the comment. As pointed out, the number of large LAMP1-positive structures in the medial region did not change. Therefore, we no longer state that ELYSAs migrate from the medial region to the cell periphery, and we now consistently describe the results in terms of whether large structures appear in the periphery. Please refer to the subsection title (line 188), the following descriptions (lines 189–199), the related description in the Results (lines 200–211), and the title and the legend of Figure 5.

      Q. 5) The authors show that the V1A subunit associates with the surface of LAMP1 structures as punctate structures (Figure 6B). What are these V1A-positive structures? Is V1A recruited to some specific domains of ELYSAs, or are V1A-positive active lysosomes recruited to ELYSAs? Please provide an interpretation of these data. The phrase "The V1-subunit of V-ATPase is targeted to these structures" (line 262) is not appropriate because it is indistinguishable whether only the V1 subunits are recruited or active lysosomes containing the V1 subunit are recruited.

      Thank you for the valuable comment. Indeed, our analysis, including the analysis of Fig. 8 described on line 262, did not clarify whether free V1A-mCherry molecules accessed the ELYSA periphery or whether lysosomes with V1A-mCherry molecules newly merged into the ELYSA. Therefore, we added this interpretation to lines 232–234 of the Results and revised the Discussion as "The number of membrane structures positive for V1A-mCherry increase upon ELYSA disassembly, indicating further acidification of the endosomal/lysosomal compartment" (lines 292–294).

      Q. 6) Why did the authors use LysoSensor as a marker for ELYSA instead of LAMP1 in Figure 8 and 9? Some reasons should be given.

      There is a clear technical reason for this: when LAMP1-EGFP was expressed in a zygote, it largely migrated to the plasma membrane before and after the 2-cell stage, making it difficult to capture changes in ELYSAs. To circumvent this difficulty, we used LysoSensor instead of LAMP1-EGFP to visualize ELYSAs. This explanation was added to lines 258–260.

      Q. 7) In Figure 9A, it is not clear whether the activity of LysoSensor-positive structures is lower at this stage compared to other stages. It may be shown in Figure S7, but the data are not clearly visible. A direct comparison would be ideal.

      A new analysis similar to that shown in Fig. 9 for early 2-cells and 4-cells was performed and added to Figure S7. To support direct comparison, the ranges of axes were set to be similar.

      As a result, the quantified MagicRed signal on the isolated LysoSensor-positive punctate structures in MII oocytes was nearly the same as that in early 2-cell and 4-cell embryos. In early 2-cell embryos, LysoSensor gave a signal at the cellular boundary, where MagicRed staining was not observed, confirming that MagicRed activity is higher in the interior than in the cell periphery in post-fertilization embryos. We have included an additional description in the main text (lines 280–282).

      Q. 8) In the phrase "pregnant mare serum gonadotropin or an anti-inhibin antibody" (line 382), is "or" correct?

      For superovulation, an anti-inhibin antibody (distributed as CARD HyperOva) can be used as a substitute for PMSG (after additional stimulation with hCG), yielding eggs of similar quality to those obtained with PMSG. The antibody was used in most experiments. To address the lack of clarity, a reference (Takeo and Nakagata, PLoS One, 2015) was added to the description of HyperOva (line 417).

      Q. 9) In almost all graphs, please indicate what the X-axis is indicating (not just "number") so that readers can understand what number is being represented without reading the legends.

      We revised the axis titles in all figures.

      Q. 10) Since grayscale images provide better contrast than color images, it is recommended that single-color images be shown in grayscale.

      We replaced all single-color images with grayscale images.

      Reviewer #2 (Recommendations For The Authors):

      Specific comments:

      Q. 11) Figure 1 and S1- Both Rab5 and Rab7 co-localize with LAMP1. However, there seem to be many more LAMP1-free Rab5 dots than LAMP1-free Rab7 dots. As a result, LAMP1 and Rab7 are co-localized more frequently than LAMP1 and Rab5 (video1). Could it be that early endosomes (Rab5+) are yet to be incorporated into ELYSAs? If so, a brief discussion of this phenomenon would be nice.

      Thank you very much for the comment. We agree with the reviewer’s interpretation. In accordance with this suggestion, we clearly stated in the main text: “Although small punctate structures that are RAB5-positive but LAMP1-negative also spread over the cytosol, most giant structures were positive for RAB5 and LAMP1 (Video 1)” (lines 91–93). In the Discussion section, a brief statement was included: “Considering the large number of RAB5-positive and LAMP1-negative punctate structures in MII oocytes, these layers may also reflect the assembly mechanism of the ELYSA” (lines 318–320).

      Q. 12) Video 3 (and Figure 6) clearly shows the dynamics of LAMP1-labelled vesicles during maturation, which is impressive. In contrast to the live cell imaging after LAMP1 mRNA injection, Figure 1 used anti-LAMP1 Ab to detect endogenous levels of LAMP1. It appears that mRNA microinjection causes LAMP1 overexpression, which in turn causes more (but smaller) vesicles to form. It should be easy to quantify and compare the vesicles in Figures 1 and 6.

      We appreciate the comment. As mentioned, injections of EGFP-LAMP1 mRNA are useful for the visualization of LAMP1 dynamics during the maturation phase from GV to MII by live cell imaging, which is not feasible with immunostaining. However, the fluorescence emitted by EGFP-LAMP1 is only a few tenths of that of antibody staining, and because of the technical difficulty of microinjection into GV oocytes, only one in ten oocytes yielded a signal-to-noise ratio sufficient for imaging. In addition, live cell imaging of oocytes in Figure 6 had to be carried out with very low excitation light exposure to reduce toxicity. It was also performed with a low magnification lens and a longer step size in the z-axis. For these reasons, to examine the point raised, we performed an additional 3D object analysis, in the same way as in Figure 2, on the data of IVM oocytes injected with EGFP-LAMP1 mRNA, using the same lens as in Figure 1 and a longer exposure time than in live imaging. The results were compared with the MII data of Figures 1 and 2.

      As a result, as shown in the new Figure S8, more objects with a diameter of 0.2–0.4 µm were found than in the immunostaining data, which fits the reviewer’s point. In addition, the counts were lower for the 0.6–1.0 µm diameter, but there was no significant difference in the number of larger LAMP1 positive structures corresponding to the ELYSA size. We consider that this was appropriate for the original purpose of characterizing the ELYSA formation process. A description of these points has been added to lines 221–225.

      Q. 13) In Figure 4A and B- Seems like not all LAMP1-positive structures were LC3-positive. Is there any size or location within the oocyte that determines LC3 positivity?

      We appreciate the valuable comment. To answer this comment, we performed a new 3D object-based co-localization analysis on LAMP1 and LC3, determined the number, volume, and distribution within the oocyte, and incorporated the results as Supplementary Figure 6. To examine the positivity, we further analyzed the percentage of double-positive structures among all the LAMP1-positive structures. The results showed that their average diameter significantly shifted from 2.36 µm (GV) to 3.78 µm (MII). Moreover, LAMP1-positive structures smaller than 2 µm in diameter were rarely positive for LC3. In terms of location, measuring the distance of the double-positive structures from the oocyte center (the cellular geometric center) indicated that they tend to be observed at the periphery in both stages of oocytes (more than 80% at > 30 µm in the MII oocyte). Of note, no clear tendency of double positivity was observed. A description of these points has been added to lines 174–186.
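
      The size and position summary described in this response can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used for Supplementary Figure 6: it assumes each LAMP1-positive object has already been segmented and reduced to a record with its volume, centroid, and a flag for LC3 overlap. The record fields, the sphere-equivalent diameter, the oocyte center at the origin, and all sample values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Lamp1Object:
    volume_um3: float    # volume of the segmented object (hypothetical input)
    centroid_um: tuple   # (x, y, z) position in micrometres
    lc3_positive: bool   # True if the object overlaps an LC3-positive object


def sphere_volume(diameter_um):
    """Volume of a sphere with the given diameter (used only to build sample data)."""
    return math.pi * diameter_um ** 3 / 6.0


def equivalent_diameter(volume_um3):
    """Diameter of the sphere that has the same volume as the object."""
    return (6.0 * volume_um3 / math.pi) ** (1.0 / 3.0)


def fraction_lc3_positive(objects, min_d, max_d):
    """Fraction of LAMP1 objects with min_d <= diameter < max_d that are LC3-positive."""
    in_bin = [o for o in objects
              if min_d <= equivalent_diameter(o.volume_um3) < max_d]
    if not in_bin:
        return None  # no objects in this size class
    return sum(o.lc3_positive for o in in_bin) / len(in_bin)


def distance_from_center(obj, center_um):
    """Euclidean distance of the object centroid from the oocyte center."""
    return math.dist(obj.centroid_um, center_um)


# Made-up objects: two small LC3-negative, one mid-sized LC3-negative,
# and two large LC3-positive objects near the periphery.
objects = [
    Lamp1Object(sphere_volume(1.0), (5.0, 0.0, 0.0), False),
    Lamp1Object(sphere_volume(1.5), (0.0, 10.0, 0.0), False),
    Lamp1Object(sphere_volume(2.5), (12.0, 0.0, 0.0), False),
    Lamp1Object(sphere_volume(3.0), (32.0, 0.0, 0.0), True),
    Lamp1Object(sphere_volume(4.0), (0.0, 0.0, 35.0), True),
]

center = (0.0, 0.0, 0.0)  # assumed geometric center of the oocyte
small = fraction_lc3_positive(objects, 0.0, 2.0)
large = fraction_lc3_positive(objects, 2.0, float("inf"))
peripheral = [o for o in objects
              if o.lc3_positive and distance_from_center(o, center) > 30.0]
```

      The 2 µm and 30 µm cut-offs are taken from the response; everything else is illustrative.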

      Q. 14) In discussion, line 256- Small ELYSAs are formed in GV oocytes. Since you haven't checked the smaller-sized, growing oocytes, I suggest rephrasing this sentence as 'are present' rather than 'are formed'.

      We agree with the reviewer’s suggestion and changed it to "present" (line 287).

      Q. 15) Line 188- ELISA should instead be ELYSA

      Thank you for pointing this out. We have found a few more typographical errors, and all of them have been corrected (lines 213 and 321).

      Reviewer #3 (Recommendations For The Authors):

      Q. 16) Line 42: What do you mean by 'zygotic gene expression following the degradation of the cellular components of each maternal and paternal gamete'? ZGA requires this degradation? Please provide supporting references from the literature.

      We apologize for the confusing wording. We meant to say that both ZGA and degradation of parental components are required. To avoid misunderstanding, we have revised the text to “zygotic gene expression as well as the degradation of the cellular components of each maternal and paternal gamete” and inserted a new reference (line 44).

      Q. 17) 50: MII means metaphase II, not meiosis II.

      We corrected the clerical mistake (line 50).

      Q. 18) 51: Define LC3.

      We added the definition of LC3 (lines 51–52).

      Q. 19) 60: 'lysosomal activity in oocytes is upregulated by sperm-derived factors as the oocytes grow and mature'. As written, the sentence implies that oocytes grow and mature after fertilization. This may be true for maturation, but I would be surprised to learn that there is growth of the oocyte after fertilization.

      We appreciate this valuable comment.

      C. elegans lives mainly as a hermaphrodite, which contains a pair of U-shaped gonad arms including the ovary, spermatheca, and uterus. Oocytes grow in the ovary, mature upon receiving major sperm proteins secreted from sperm, and are ovulated into the spermatheca for fertilization. In 2017, Kenyon’s group reported that major sperm proteins act as sperm-secreted hormones to upregulate lysosomal activity in oocytes during oocyte growth and maturation. To avoid misunderstanding, we have revised our manuscript to 'lysosomal activity in oocytes is upregulated by major sperm proteins secreted from sperm as the oocytes grow and mature' (lines 61–66).

      Q. 20) 94 and Figure 1B: While it is clear that many LAMP1 foci at the late 2-cell stage do not also contain RAB5, it seems that the majority of RAB5 loci also stain for LAMP1. This may be a minor point in the context of the paper but could be clarified.

      We could not easily agree with the suggestion because of the possibility that the images might give different impressions on each plane. Therefore, as a way to verify this point, we attempted to quantify the co-localization by reconstructing the 3D puncta information based on the two types of antibody staining data. Unfortunately, as shown in Fig. 1AB, Rab5 had a high cytoplasmic background, and although we were able to extract peaks, we could not reliably recalibrate the three-dimensional punctate structure (please refer to the new Supplementary Fig. 6). Therefore, co-localization on each other's punctate structure (LAMP1/RAB5 vs. RAB5/LAMP1) could not be verified. The validation using specific planes also showed large differences between planes, with overlapping punctate structures counted separately in adjacent planes, making reliable quantification difficult. This is an issue that will be addressed in the future.

      On the other hand, the newly added Z-projection figure (Fig. 1AB) shows that RAB5-positive and LAMP1-negative punctate structures tend to accumulate along the LAMP1-positive punctate structures larger than 1 µm at the late 2-cell stage in all observed embryos; we added this statement on lines 99–101.

      Q. 21) 100-102 and Figure 2A: Does the decrease in the total number of LAMP1 foci refer just to cytoplasmic or also to membrane foci? If the former, what was the reason for not including the membrane in the analysis?

      We appreciate the critical question. The LAMP1 signal on the plasma membrane interfered with the measurement of the signals just below the plasma membrane. This increased signal on the plasma membrane, shown in Fig. 2E, appeared to be caused by the migration of LAMP1 to the plasma membrane post-fertilization, which was also reported in a previous paper by Zaffagnini et al. (2024), published in Cell.

      Oocytes are giant cells, and confocal imaging has a technical limitation in obtaining the same fluorescence intensity along the z-axis. However, 3D-object analysis requires thresholding based on absolute values. As a result, the plasma membrane signal caused punctate structures located close to the membrane to be captured and recognized as a single, very large LAMP1-positive structure, resulting in the loss of the punctate structures that should be measured.

      To avoid this issue, we tried several programs to correct the fluorescence difference along the z-axis; nonetheless, these attempts were unsuccessful. Therefore, as described in the Materials and Methods section, we applied only background subtraction at each z-position and then manually removed the plasma membrane signal (which was thin and continuous at the edges). Furthermore, when the plasma membrane and punctate structure signals overlapped, we took care to separate the signals rather than remove them. Thus, we believe that the decrease in the number and volume of LAMP1-positive structures after fertilization is still a phenomenon associated with the shift of LAMP1 to the plasma membrane.
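
      As a rough illustration of what "background subtraction at each z-position" can look like, the sketch below treats each z-slice independently. It is an assumption-laden example, not the procedure from the Materials and Methods: the background is estimated as the median pixel value of each slice (one common choice; the estimator actually used is not stated in this response), and the tiny image stack is made up.

```python
from statistics import median


def subtract_background_per_slice(stack):
    """Subtract a per-slice background estimate from every pixel.

    stack: list of z-slices, each slice a list of rows of pixel intensities.
    The background of a slice is assumed to be its median pixel value.
    Negative results are clipped to zero.
    """
    corrected = []
    for plane in stack:
        pixels = [value for row in plane for value in row]
        background = median(pixels)  # assumption: most pixels are background
        corrected.append(
            [[max(value - background, 0) for value in row] for row in plane]
        )
    return corrected


# Hypothetical two-slice stack: the deeper slice is dimmer overall.
stack = [
    [[10, 10, 10], [10, 50, 10], [10, 10, 10]],  # shallow slice
    [[4, 4, 4], [4, 20, 4], [4, 4, 4]],          # deeper slice
]
corrected = subtract_background_per_slice(stack)
```

      This removes only the per-slice offset. The punctum in the deeper slice remains dimmer than the one in the shallow slice, which is the depth-dependent intensity difference that the response reports could not be corrected.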

      Q. 22) Figure 2B, F, G: As the x-axis does not represent a continuous variable, adjacent data points should not be connected by a line. The histogram representations in A, C, and E are much easier to understand. I suggest presenting all data in this format.

      We changed the line graphs to bar graphs. In addition, significant differences among populations are now indicated with letters to make them clearer.

      Q. 23) Figure 2B, C: It seems that the values for the different stages are expressed relative to the value at MII. Why not use the GV value at the base-line? This would follow the developmental trajectory of the oocyte/embryo more directly and would not (I believe) change the conclusions.

      We appreciate the comment. We meant to express that ELYSA develops most in the MII phase and that it decreases after fertilization. Considering the reviewer’s suggestion, we now express GV-MII changes relative to GV and changes after fertilization relative to the MII phase (Fig. 2C, D).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript is dedicated heavily to cell type mapping and identification of sub-type markers in the human testis but does not present enough results from cross-investigation between NOA cases versus control. Their findings are mostly based on transcriptome and the authors do not make enough use of the scATAC-seq data in their analyses as they put forward in the title. Overall, the authors should do more to include the differential profile of NOA cases at the molecular level - specific gene expression, chromatin accessibility, TF binding, pathway, and signaling that are perturbed in NOA patients that may be associated with azoospermia.

      Strengths:

      (1) The establishment of single-cell data (both RNA and ATAC) from the human testicular tissues is noteworthy.

      (2) The manuscript includes extensive mapping of sub-cell populations with some claimed as novel, and reports marker gene expression.

      (3) The authors present inter-cellular cross-talks in human testicular tissues that may be important in adequate sperm cell differentiation.

      Weaknesses:

      (1) A low sample size (2 OA and 3 NOA cases). There are no control samples from healthy individuals.

      Thank you for your comments. We recognize that the small sample size in this study somewhat limits its generalizability. However, in transcriptomic research, limited sample sizes are a common issue due to the complexities involved in acquiring samples, particularly in studies about the reproductive system. Healthy testicular tissue samples are difficult to obtain, and studies (doi: 10.18632/aging.203675) have used obstructive azoospermia as a control group in which spermatogenesis and development are normal.

      (2) Their argument about interactions between germ and Sertoli cells is not based on statistical testing.

      Thank you for your comments. Due to limited funding, we have not yet carried out in-depth validation experiments, but we plan to perform them at a later stage. We hope that the publication of this study will help to obtain more financial support to further investigate the interactions between germ cells and Sertoli cells.

      (3) Rationale/logic of the study. This study, in its present form, seems to be more about the role of sub-Sertoli population interactions in sperm cell development and does not provide enough insights about NOA.

      Thank you for your comments. In Figure 6, we conducted an in-depth analysis and comparison of the differences between the Sertoli cell subtypes and the germ cell subtypes involved in spermatogenesis in the OA and NOA groups. The results revealed that in the NOA group, especially in the NOA3 group, which has a lower sperm count compared to NOA2 and NOA1, there is a significant loss of Sertoli cell subtypes including SC3, SC4, SC5, SC6, and SC8. The NOA1 group, with a sperm count close to that of the OA group, also had a Sertoli cell profile similar to the OA group. The NOA2 group, with a sperm count between that of NOA1 and NOA3, also exhibited an intermediate profile of Sertoli cell subtypes. Therefore, we suggest that change in Sertoli cell subtypes is a key factor affecting sperm count, rather than just the total number of Sertoli cells. We believe that through these analyses, we can provide in-depth insights into NOA, and we hope that the publication of this study will help obtain more funding support to further validate and expand on these findings.

      (4) The authors do not make full use of the scATAC-seq data.

      Thank you for your comments. We have added an analysis of the scATAC-seq data, which is shown in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Shimin Wang et al. investigated the role of Sertoli cells in mediating spermatogenesis disorders in non-obstructive azoospermia (NOA) through stage-specific communications. The authors utilized scRNA-seq and scATAC-seq to analyze the molecular and epigenetic profiles of germ cells and Sertoli cells at different stages of spermatogenesis.

      Strengths:

      By understanding the gene expression patterns and chromatin accessibility changes in Sertoli cells, the authors sought to uncover key regulatory mechanisms underlying male infertility and identify potential targets for therapeutic interventions. They emphasized that the absence of the SC3 subtype would be a major factor contributing to NOA.

      Weaknesses:

      Although the authors used cutting-edge techniques to support their arguments, it is difficult to find conceptual and scientific advances compared to Zeng S et al.'s paper (Zeng S, Chen L, Liu X, Tang H, Wu H, and Liu C (2023) Single-cell multi-omics analysis reveals dysfunctional Wnt signaling of spermatogonia in non-obstructive azoospermia. Front. Endocrinol. 14:1138386.). Overall, the authors need to improve their manuscript to demonstrate the novelty of their findings in a more logical way.

      Thank you for your detailed review of our work. We greatly appreciate your feedback and have made revisions to our manuscript accordingly.

      Regarding the novelty of our research, we believe our study offers conceptual and scientific advances in several ways:

      We have systematically revealed the stage-specific roles of Sertoli cell subtypes in different stages of spermatogenesis, particularly emphasizing the crucial role of the SC3 subtype in non-obstructive azoospermia (NOA). Additionally, we identified that other Sertoli cell subtypes (SC1, SC2, SC3...SC8, etc.) also collaborate in a stage-specific manner with different subpopulations of spermatogenic cells (SSC0, SSC1/SSC2/Diffed, Pa...SPT3). These findings provide new insights into the understanding of spermatogenesis disorders.

      Compared to the study by Zeng S et al., our research not only focuses on the functional alterations in Sertoli cells but also comprehensively analyzes the interaction patterns between Sertoli cells and spermatogenic cells using scRNA-seq and scATAC-seq technologies. We uncovered several novel regulatory networks that could serve as potential targets for the diagnosis and treatment of NOA.

      We sincerely appreciate your constructive comments and will continue to explore this area further, aiming to make a more significant contribution to the understanding of NOA mechanisms.

      Reviewer #3 (Public Review):

      Summary:

      This study profiled the single-cell transcriptome of human spermatogenesis and provided many potential molecular markers for developing testicular puncture-specific marker kits for NOA patients.

      Strengths:

      Perform single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) on testicular tissues from two OA patients and three NOA patients.

      Weaknesses:

      Most results are analytical and lack specific experiments to support these analytical results and hypotheses.

      Thank you for your thorough review of our work. We highly value your feedback and have made revisions to our manuscript accordingly. Indeed, we have conducted immunofluorescence (IF) experiments to validate the data obtained from single-cell sequencing and have expanded the sample size to enhance the reliability of our results. To better present these validation experiments, we have reorganized and renamed the sample information, making it easier for you to understand which samples were used in the specific experiments. Following the publication of this paper, we plan to secure additional funding to deepen our research, particularly in the area of experimental validation. We sincerely appreciate your support and insightful suggestions, which have greatly helped guide our future research directions.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should include results from cross-investigation comparing NOA/OA patients versus controls.

      Thank you for your comments. In this study, OA was the control group. Healthy testicular tissue samples are difficult to obtain, and studies (doi: 10.18632/aging.203675) have used OA as a control group in which spermatogenesis and development are normal.

      (2) In Table S1, the authors should also include the metric for scATAC-seq, and do more to show the findings the authors obtained in RNA is replicated with chromatin accessibility.

      Thank you for your comments. We have added Table S2, which includes the metric for scATAC-seq.

      (3) A single sample from each OA and NOA group may not be enough to confirm colocalization. The authors should include results from all available samples and use quantitative measures.

      Thank you for your comments. I apologize that the sample size in this study was less than three and we could not conduct quantitative analysis. We will increase the sample size and conduct corresponding experiments in subsequent research.

      (4) The Methods section does not include enough description to follow how the analyses were carried out, and is missing information on some of the key procedures such as velocity and cell cycle analyses.

      Thank you for your comments. Methods for the velocity and cell cycle analyses were added to the revised manuscript. The description is as follows:

      “Velocity analysis

      RNA velocity analysis was conducted using scVelo's (version 0.2.1) generalized dynamical model. The spliced and unspliced mRNA were quantified by velocyto (version 0.17.17).”

      “Cell cycle analysis

      To quantify the cell cycle phases for individual cells, we employed the CellCycleScoring function from the Seurat package. This function computes cell cycle scores using established marker genes for cell cycle phases as described in a previous study by Nestorowa et al. (2016). Cells showing a strong expression of G2/M-phase or S-phase markers were designated as G2/M-phase or S-phase cells, respectively. Cells that did not exhibit significant expression of markers from either category were classified as G1-phase cells.”
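
      The assignment rule described in the quoted paragraph can be written out as a short Python sketch. It is an illustration only: it assumes that an S score and a G2/M score have already been computed for each cell, and it mirrors, to the best of our understanding, the decision rule that Seurat applies after scoring (both scores negative gives G1, otherwise the higher score wins). The score calculation itself is not reproduced, and the cell names and score values are made up.

```python
def assign_phase(s_score, g2m_score):
    """Assign a cell cycle phase from precomputed S and G2/M scores.

    Rule (assumed to match the Seurat convention):
    - both scores negative  -> G1 (no marker set is enriched)
    - scores exactly equal  -> Undecided
    - otherwise             -> phase with the higher score
    """
    if s_score < 0 and g2m_score < 0:
        return "G1"
    if s_score == g2m_score:
        return "Undecided"
    return "S" if s_score > g2m_score else "G2M"


# Hypothetical per-cell scores: (S score, G2/M score)
cells = {
    "cell_a": (0.8, -0.1),
    "cell_b": (-0.2, 0.5),
    "cell_c": (-0.3, -0.4),
    "cell_d": (0.2, 0.6),
}
phases = {name: assign_phase(s, g2m) for name, (s, g2m) in cells.items()}
```

      Under this rule, G1 is a default category for cells in which neither marker set scores above zero, rather than a phase identified by its own markers.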

      (5) For the purpose of transparency, the authors should upload codes used for analyses so that each figure can be reproduced. All raw and processed data should be made publicly available.

      Thank you for your comments. We have deposited scRNA-seq and scATAC-seq data in NCBI. ScRNA-seq data have been deposited in the NCBI Gene Expression Omnibus with the accession number GSE202647, and scATAC-seq data have been deposited in the NCBI database with the accession number PRJNA1177103.

      Reviewer #2 (Recommendations For The Authors):

      The detailed points the authors need to improve are attached below.

      The results presented in the study have several weaknesses:

      In Figure 1A, the HE staining results of all patients who underwent single-cell analysis should be provided.

      Thank you very much for your valuable suggestions. In Figure 1, we present the HE staining results paired with the single-cell data, covering all patients involved in the single-cell analysis.

      - Saying "identification of novel potential molecular markers for distinct cell types" seems unsupported by the data.

      Thank you for your comments. I'm sorry for the inaccuracy of my description. We have revised this sentence. The description is as follows: These findings indicate that the scRNA-seq data from this study can serve for cellular classification.

      - The methods suggest an integrated analysis of scRNA-seq and scATAC-seq, but from the figures, it seems like separate analyses were performed. It's necessary to have data showing the integrated analysis.

      Thank you for your comments. We have added an integrated analysis of scRNA-seq and scATAC-seq. The results are shown in Figure S2.

      Figure 2 does not seem to well cover the diversity of germ cell subtypes. The main content appears to be about the differentiation process, and it seems more focused on SSCs (stem cell types), but the intended message is not clearly conveyed.

      Thank you for your comments. Figure S1 revealed the diversity of germ cell subtypes. The second part of the results described the integrated findings from Figures 2 and S1.

      - In Figure 2B, pseudotime could be shown, and I wonder if the pseudotime in this analysis shows a similar pattern as in Figure 2D.

      Thank you for your comments. Figure 2B shows the pseudotime analysis of the 12 germ cell subpopulations, and Figure 2D shows their RNA velocity. Both methods are used for cell trajectory analysis. The pseudotime in Figure 2B showed a pattern similar to that in Figure 2D.

      - While staining occurs within one tissue, saying they are co-expressed seems inaccurate as the staining locations are clearly distinct. For example, the staining patterns of A2M and DDX4 (a classical marker) are quite different, so it's hard to claim A2M as a new potential marker just because it's expressed. Also, TSSK6 was separately described as having a similar expression pattern to DDX4, but from the IF results, it doesn't seem similar.

      Thank you for your comments. We have revised the Figure.

      - It was described that A2M (expressed in SSC0-1), and ASB9 (expressed in SSC2) have open promoter sites in SSC0, SSC2, and Diffing_SPG, but it doesn't seem like they are only open in the promoters of those cell types. For example, there doesn't seem to be a peak in Diffing for either gene. The promoter region of the tracks is not very clear, so overall figure modification seems necessary.

      Thank you for your comments. We have revised the Figure.

      - The ATAC signal scale for each genomic region should be included, and clear markings for the TSS location and direction of the genes are needed.

      Thank you for your comments. We have revised the figure, as shown in the revised manuscript.

      Figure 3A mostly shows the SSC2 in the G2/M phase, so it seems questionable to call SSC0/1 quiescent. Also, I wonder if the expression of EOMES and GFRA1 is well distinguished in the SSC subtypes as expected.

      Thank you for your comments. We will validate in subsequent experiments whether the expression of EOMES and GFRA1 is clearly distinguished in the SSC subtypes.

      - In Figure 3C, it would be good to have labels indicating what the x and y axes represent. The figure seems complex, and the description does not seem to fully support it.

      Thank you for your comments. We have added labels indicating what the x and y axes represent in Figure 3C. The x and y axes represent spliced and unspliced mRNA ratios, respectively.

      - While TFs are the central focus, it's disappointing that scATAC-seq was not used.

      Thank you for your comments. TF analysis using scATAC-seq will be carried out in the future.

      Figure 4: It would be good to have a more detailed discussion of the differences between subtypes, such as through GO analysis. The track images need modification like marking the peaks of interest and focusing more on the promoter region, similar to the previous figures.

      Thank you for your comments. GO analysis results were put in Figure S5. The description is as follows:

      As shown in Figure S5, SC1 were mainly involved in cell differentiation, cell adhesion and cell communication; SC2 were involved in cell migration, and cell adhesion; SC3 were involved in spermatogenesis, and meiotic cell cycle; SC4 were involved in meiotic cell cycle, and positive regulation of stem cell proliferation; SC5 were involved in cell cycle, and cell division; SC6 were involved in obsolete oxidation−reduction process, and glutathione derivative biosynthetic process; SC7 were involved in viral transcription and translational initiation; SC8 were involved in spermatogenesis and sperm capacitation.

      In Figure 5, it would be good to have criteria for the novel Sertoli cell subtype presented. CCDC62 is presented as a representative marker for the SC8 cluster, but from Figure 4C, it seems to be quite expressed in the SC3 cluster as well. Therefore, in Figure 5E's protein-level check, it's unclear if this truly represents a novel SC8 subtype.

      Thank you for your comments. CCDC62 expression was higher in the SC8 cluster than in SC3. Since some molecular markers were not commercially available, CCDC62 was selected as the SC8 marker for immunofluorescence verification. Immunofluorescence results showed that CCDC62 is a novel SC8 marker.

      - It might have been more meaningful to use SOX9 as a control and show that markers in the same subtype are expressed in the same location.

      Thank you for your comments. To determine PRAP1, BST2, and CCDC62 as new markers for the SC subtype, we co-stained them with SOX9 (a well-known SC marker).

      - Figures 4 and 5 could potentially be combined into one figure.

      Thank you for your comments. Since combining Figures 4 and 5 into a single image would cause the image to be unclear, two images are used to show it.

      In Figure 6, it would be good to support the results with more NOA patient data.

      Thank you for your comments. Patient clinical and laboratory characteristics have been presented in Table 1.

      - Rather than claiming the importance of SC3 based on 3 single-cell patient data, it would be better to validate using public data with SC3 signature genes (e.g., showing the correlation between germ cell and SC3 ratios).

      Thank you for your comments. I'm sorry I didn't find public data with SC3 signature genes. In the future, we will verify the importance of SC3 through in vivo and in vitro experiments.

      - 462: It seems to be referring to Figure 6G, not 6D.

      Thank you for your comments. We have revised it. The description is as follows: As shown in Figure 6G, State 1 SC3/4/5 tended to be associated with PreLep, SSC0/1/2, and Diffing and Diffed-SPG sperm cells (R > 0.72).

      In Figure 7, the spermatogenesis process is basically well-known, so it would be better to emphasize what novel content is being conveyed here. Additionally, emphasizing the importance of SC3 in the overall process based on GO results leaves room for a better approach.

      Thank you for your valuable suggestions. Regarding Figure 7, we recognize that the spermatogenesis process is well-known, and we will focus on highlighting the novel content, particularly the role and significance of the SC3 subtype in spermatogenesis disorders. As for the importance of SC3 in the overall process based on GO results, we have validated this in Figure 8 through co-staining experiments between Sertoli cells and spermatogenic cells in OA and NOA groups. The results demonstrate a significant correlation between the number of SC3-positive cells and SPT3 spermatogenic cells, particularly in the NOA5-P8 group, where both SC3 and SPT3 cell counts are notably lower than in the NOA4-P7 group. This further supports the critical role of SC3 in the spermatogenesis process. Your suggestions have prompted us to refine our data presentation and more clearly emphasize the novel aspects of our research. We will continue to strive to ensure that every part of our research contributes meaningfully to the academic community. Thank you again for your guidance.

      In Figure 8, only the contents of the IF-stained proteins are listed, which seems slightly insufficient to constitute a subsection on its own. It might have been better to conclude by emphasizing some subtypes.

      Thank you for your comments. We have combined this part of the results with other results into one section. The description is as follows:

      “Co-localization of subpopulations of Sertoli cells and germ cells

      To determine the interaction between Sertoli cells and spermatogenesis, we applied Cell-PhoneDB to infer cellular interactions according to a ligand-receptor signalling database. As shown in Figure 6G, compared with other cell types, germ cells mainly interacted with Sertoli cells. We further performed Spearman correlation analysis to determine the relationship between Sertoli cells and germ cells. As shown in Figure 6H, State 1 SC3/4/5 tended to be associated with PreLep, SSC0/1/2, and Diffing and Diffed-SPG sperm cells (R > 0.72). Interestingly, SC3 was significantly positively correlated with all sperm subpopulations (R > 0.5), suggesting an important role for SC3 in spermatogenesis and that SC3 is involved in the entire process of spermatogenesis. Subsequently, to understand whether the functions of germ cells and Sertoli cells correspond to each other, GO term enrichment analysis of germ cells and Sertoli cells was carried out (Figure S3, S4). We found that the functions could be divided into 8 categories, namely, material energy metabolism, cell cycle activity, the final stage of sperm cell formation, chemical reaction, signal communication, cell adhesion and migration, stem cells and sex differentiation activity, and stress reaction. These different events were labeled with different colors in order to quickly capture the important events occurring in the cells at each stage. 
As shown in Figure S3, we discovered that SSC0/1/2 was involved in SRP-dependent cotranslational protein targeting to membrane, and cytoplasmic translation; Diffing SPG was involved in cell division and cell cycle; Diffed SPG was involved in cell cycle and RNA splicing; Pre-Leptotene was involved in cell cycle and meiotic cell cycle; Leptotene_Zygotene was involved in cell cycle and meiotic cell cycle; Pachytene was involved in cilium assembly and spermatogenesis; Diplotene was involved in spermatogenesis and cilium assembly; SPT1 was involved in cilium assembly and flagellated sperm motility; SPT2 was involved in spermatid development and flagellated sperm motility; SPT3 was involved in spermatid development and spermatogenesis. As shown in Figure S4, SC1 were mainly involved in cell differentiation, cell adhesion and cell communication; SC2 were involved in cell migration, and cell adhesion; SC3 were involved in spermatogenesis, and meiotic cell cycle; SC4 were involved in meiotic cell cycle, and positive regulation of stem cell proliferation; SC5 were involved in cell cycle, and cell division; SC6 were involved in obsolete oxidation−reduction process, and glutathione derivative biosynthetic process; SC7 were involved in viral transcription and translational initiation; SC8 were involved in spermatogenesis and sperm capacitation. The above analysis indicated that the functions of 8 Sertoli cell subtypes and 12 germ cell subtypes were closely related.

      To further verify that Sertoli cell subtypes have "stage specificity" for each stage of sperm development, we first performed HE staining using testicular tissues from OA3-P6, NOA4-P7 and NOA5-P8 samples. The OA3-P6 group showed some sperm, with reduced spermatogenesis, thickened basement membranes, and a high number of Sertoli cells without spermatogenic cells. The NOA4-P7 group had no sperm initially, but a few malformed sperm were observed after sampling, leading to the removal of affected seminiferous tubules. The NOA5-P8 group showed no sperm in situ (Figure 7A). Immunofluorescence staining in Figure 7B was performed using these tissues for validation. ASB9 (SSC2) was primarily expressed in a wreath-like pattern around the basement membrane of testicular tissue, particularly in the OA group, while ASB9 was barely detectable in the NOA group. SOX2 (SC2) was scattered around SSC2 (ASB9), with nuclear staining, while TF (SC1) expression was not prominent. In NOA patients, SPATS1 (SC3) expression was significantly reduced. C9orf57 (Pa) showed nuclear expression in testicular tissues, primarily extending along the basement membrane toward the spermatogenic center, and was positioned closer to the center than DDX4, suggesting its involvement in germ cell development or differentiation. BEND4, identified as a marker for SC5, showed a developmental trajectory from the basement membrane toward the spermatogenic center. ST3GAL4 was expressed in the nucleus, forming a circular pattern around the basement membrane, similar to A2M (SSC1), though A2M was more concentrated around the outer edge of the basement membrane, creating a more distinct wreath-like arrangement. In cases of impaired spermatogenesis, this arrangement becomes disorganized and loses its original structure. SMCP (SC6) was concentrated in the midpiece region of the bright blue sperm cell tail. 
In the OA group, SSC1 (A2M) was sparsely arranged in a rosette pattern around the basement membrane, but in the NOA group, it appeared more scattered. SSC2 (ASB9) expression was not prominent. BST2 (SC7) was a transmembrane protein primarily localized on the cell membrane. In the OA group, A2M (SSC1) was distinctly arranged in a wreath-like pattern around the basement membrane, with expression levels significantly higher than ASB9 (SSC2). TSSK6 (SPT3) was primarily expressed in OA3-P6, while CCDC62 (SC8) was more abundantly expressed in NOA4-P7, with ASB9 (SSC2) showing minimal expression. Taken together, germ cells of a particular stage tended to co-localize with Sertoli cells of the corresponding stages. Germ cells and Sertoli cells at each differentiation stage were functionally heterogeneous and stage-specific (Figure 8). This suggests that each stage of sperm development requires the assistance of Sertoli cells to complete the corresponding stage of sperm development.”

      Reviewer #3 (Recommendations For The Authors):

      The authors revealed 11 germ cell subtypes and 8 Sertoli cell subtypes through single-cell analysis of two OA patients and three NOA patients, and found that the Sertoli cell SC3 subtype (marked by SPATS1) plays an important role in spermatogenesis. The study also suggests that Notch1/2/3 signaling and integrins are involved in germ cell-Sertoli cell interactions. This is an interesting and useful article that at least gives us a comprehensive understanding of human spermatogenesis. It provides a powerful tool for further research on NOA. However, there are still some issues and questions that need to be addressed.

      (1) Please explain in detail how the testicular tissue was collected and which part of the testicular tissue was extracted. A schematic diagram would be helpful.

      Thank you for your comments. The process is as follows: Testicular tissues were obtained from two OA patients (OA1-P1 and OA2-P2) and three NOA patients (NOA1-P3, NOA2-P4, NOA3-P5) using micro-dissection of testicular sperm extraction separately.

      (2) Whether the tissues of these patients are extracted simultaneously or separately, separated into single cells, and stored, and then single cell analysis is performed simultaneously. Please be specific.

      Thank you for your comments. The testicular tissues of these patients were extracted separately, then separated into single cells, and single cell analysis was performed simultaneously.

      (3) When performing single-cell analysis, cells from two OA patients were analyzed individually or combined. The same problem occurred in the cells of three NOA patients.

      Thank you for your comments. Cells from two OA patients and three NOA patients were analyzed individually.

      (4) Can you specifically point out the histological differences between OA and NOA in Figure 1A? This makes it easier for readers to understand the structure change between OA and NOA. Please also label representative supporting cells.

      Thank you for your comments. We have revised the description, and it is shown in the revised manuscript.

      (5) The authors demonstrate that "We speculate that this lack of differentiation may be due to the intense morphological changes occurring in the sperm cells during this period, resulting in relatively minor differences in gene expression." Please provide some verification of this hypothesis? For example, use immunofluorescence staining to observe morphological changes in sperm cells.

      Thank you for your comments. Due to limited funds, we will verify this hypothesis in future studies.

      (6) The authors demonstrate that " As shown in Figure 5E, we discovered that PRAP1, BST2, and CCDC62 were co-expressed with SOX9 in testes tissues." The staining in Figure 5D is unclear, and it is difficult to explain that SOX9 is co-expressed with PRAP1 BST2 CCDC62 based on the current staining results. The staining patterns of SOX9 (green) and SOX9 (red) are also different. (SOX9 (red) appears as dots, while the background for SOX9 (green) is too dark to tell whether its staining is also in the form of dots.) In summary, increasing the clarity of the staining makes it more convincing. Alternatively, use high magnification to display these results.

      Thank you for your comments. We have re-stained the samples and updated this part of the immunofluorescence staining results. Please refer to the files named Figure 1, Figure 2, Figure 5, and Figure 8.

      (7) In Figure 8, the author emphasized the co-localization of Sertoli cells and Germ cells at corresponding stages and did a lot of staining, but it was difficult to distinguish the specific locations of co-localization, which was similar to Figure 5E. If possible, please mark specific colocalizations with arrows or use high magnification to display these results, in order to facilitate readers to better understand.

      Thank you for your comments. We have re-stained and updated this part of the data. Please refer to the immunofluorescence staining data in the updated Figure 8.

      (8) The authors emphasize that macrophages may play an important role in spermatogenesis. Therefore, adding relevant macrophage staining to observe the differences in macrophage expression between NOA and OA should better support this idea.

      Thank you for your comments. Macrophage-related experiments will be further explored in the future.

      (9) Notch1/2/3 signaling and integrin were discovered to be involved in germ cell-Sertoli cell interaction. However there are currently no concrete experiments to support this hypothesis. At least simple verification experiments are needed.

      Thank you for your comments. Due to limited funding, studies will be carried out in the future.

      (10) Data availability statements should not be limited to the corresponding author, especially for big data analysis. This is crucial to the credibility of this data (Have the scRNA-seq and scATAC-seq in this study been deposited in GEO or other databases, and when will they be released to the public?) The data for such big data analysis needs to be saved in GEO or other databases in advance so that more research can use it.

      Thank you for your comments. We have deposited scRNA-seq and scATAC-seq data in NCBI. “ScRNA-seq data have been deposited in the NCBI Gene Expression Omnibus with the accession number GSE202647, and scATAC-seq data have been deposited in the NCBI database with the accession number PRJNA1177103.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Strengths:

      The paper is solidly based on the ability of the authors to master molecular simulations of highly complex systems. In my opinion, this paper shows no major weaknesses. The simulations are carried out in a technically sound way. Comparative analyses of different systems provide valuable insights, even within the well-known limitations of MD. Plus, the authors further investigate why xCas9 exhibits improved recognition of the TGG PAM sequence compared to SpCas9 via well-tempered metadynamics simulations focusing on the binding of R1335 to the G3 nucleobase and the DNA backbone in both SpCas9 and xCas9. In this context, the authors provide a free-energy profiling that helps support their final model.

      The implementation of FEP calculations to mimic directed evolution improvement of DNA binding is also interesting, original and well-conducted.

      We thank the reviewer for their positive evaluation of our computational strategy. To further substantiate our findings, we have incorporated additional molecular dynamics and Free Energy Perturbation (FEP) calculations for the system bound to GAT. These results corroborate our previous observations obtained with AAG, reinforcing our conclusions.

      Overall, my assessment of this paper is that it represents a strong manuscript, competently designed and conducted, and highly valuable from a technical point of view.

      Weaknesses:

      To make their impact even more general, the authors may consider expanding their discussion on entropic binding to other recent cases that have been presented in the literature recently (such as e.g. the identification of small molecules for Abeta peptides, or the identification of "fuzzy" mechanisms of binding to protein HMGB1). The point on flexibility helping adaptability and expansion of functional properties is important, and should probably be given more evidence and more direct links with a wider picture.

      We have expanded our discussion on the role of entropy in favoring TGG binding to xCas9. To this end, we performed entropy calculations using the Quasi-Harmonic approximation (details provided in the Materials and Methods section). This analysis reveals that R1335 in xCas9 experiences an entropy increase compared to SpCas9, enhancing its adaptability and interaction with the DNA. This analysis and its explanation are detailed on pages 8-9.

      Additionally, we have enriched the Discussion section by clarifying how DNA binding is entropically favored in xCas9, thereby facilitating the recognition of alternative PAM sequences. A refined explanation is also included in the Conclusions section, where we contextualize xCas9 within a broader evolutionary framework of protein-DNA recognition. This highlights how structural flexibility can enable sequence diversity while maintaining high specificity.

      Recommendations for the authors:

      Overall, this is a very interesting and elegant manuscript with compelling results that shed light on the atomistic determinants of genetic-editing technologies.

      Since the paper proposes new findings that may be helpful for experimentalists, it would be interesting if the authors point out (in their discussion/conclusions) specific amino acids to mutate/target for future tests by the experimental community. This should just appear as an open hypothesis/proposal for new experiments.

      In the Conclusions, we have incorporated a discussion on how modifications in the PAM-binding cleft can enhance the recognition of alternative PAM sequences. As an illustrative example, we reference the recently developed SpRY Cas9 variant, which is capable of recognizing a broader range of PAMs. This variant includes mutations within the PAM-binding cleft that likely increase the flexibility of the interacting residues, as suggested by recent cryo-EM structures (Hibshman et al. Nat. Commun. 2024). The importance of fine-tuning the flexibility of the PAM-interacting cleft for engineering strategies has also been highlighted in the abstract.

      Overall, in light of the reviewer’s comments and in consideration of our findings, we revised the manuscript title to: “Flexibility in PAM Recognition Expands DNA Targeting in xCas9.” This new title better highlights the key findings from our research and contextualizes them within the broader goal of expanding DNA targeting capabilities, a critical priority for developing enhanced CRISPR-Cas systems.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This study by Wu et al. provides valuable computational insights into PROTAC-related protein complexes, focusing on linker roles, protein-protein interaction stability, and lysine residue accessibility. The findings are significant for PROTAC development in cancer treatment, particularly breast and prostate cancers.

      The authors' claims about the role of PROTAC linkers and protein-protein interaction stability are generally supported by their computational data. However, the conclusions regarding lysine accessibility could be strengthened with more in-depth analysis. The use of the term "protein functional dynamics" is not fully justified by the presented work, which focuses primarily on structural dynamics rather than functional aspects.

      Strengths:

      (1) Comprehensive computational analysis of PROTAC-related protein complexes.

      (2) Focus on critical aspects: linker role, protein-protein interaction stability, and lysine accessibility.

      Weaknesses:

      (1) Limited examination of lysine accessibility despite its stated importance.

      (2) Use of RMSD as the primary metric for conformational assessment, which may overlook important local structural changes.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors' claims about the role of PROTAC linkers and protein-protein interaction stability are generally supported by their computational data. However, the conclusions regarding lysine accessibility could be strengthened with more in-depth analysis. Expand the analysis of lysine accessibility, potentially correlating it with other structural features such as linker length.

      We thank the reviewers for the suggestions! We performed a time-dependent correlation analysis to correlate the dihedral angles of the PROTACs and the Lys-Gly distance (Figures 6 and S17). We included a detailed explanation on page 16:

      “To further examine the correlation between PROTAC rotation and the Lys-Gly interaction, we performed a time-dependent correlation analysis. This analysis showed that PROTAC rotation translates motion over time, leading to the Lys-Gly interaction, with a correlation peak around 60-85 ns, marking the time of the interaction (Figure 6 and Figure S17). In addition, the pseudo dihedral angles also showed a high correlation (0.85 in the case of dBET1) with Lys-Gly distance. This indicated that the degradation complex undergoes structural rearrangement and drives the Lys-Gly interaction.”
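      As an illustration of what a time-dependent (lagged) correlation between a dihedral angle series and a distance series can look like, the following is a minimal sketch. It is not the authors' implementation, which is not described here. The function name `lagged_correlation`, the use of a Pearson coefficient, and the synthetic data are all assumptions made for the example; lags are counted in frames and would need to be multiplied by the trajectory's frame spacing to obtain nanoseconds.

```python
import numpy as np


def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x(t) and y(t + lag) for lag = 0..max_lag (in frames)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    if not 0 <= max_lag <= len(x) - 2:
        raise ValueError("max_lag must leave at least two overlapping frames")
    corr = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # Pair frame t of x with frame t + lag of y.
        corr[lag] = np.corrcoef(x[: len(x) - lag], y[lag:])[0, 1]
    return corr


# Synthetic data (hypothetical, not from the study): the "distance" series
# repeats the "dihedral" series 70 frames later, plus a little noise.
rng = np.random.default_rng(0)
n_frames, true_lag = 2000, 70
source = rng.normal(size=n_frames + true_lag)
dihedral = source[true_lag:]
distance = source[:n_frames] + 0.1 * rng.normal(size=n_frames)

corr = lagged_correlation(dihedral, distance, max_lag=150)
best_lag = int(np.argmax(np.abs(corr)))
print(f"strongest correlation at lag {best_lag} frames: r = {corr[best_lag]:.2f}")
```

      In this toy case the correlation peaks at the lag that was built into the data; with real trajectories the peak position would indicate how long after a linker rotation the distance responds.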

      (2) The use of the term "protein functional dynamics" is not fully justified by the presented work, which focuses primarily on structural dynamics rather than functional aspects. Consider changing "protein functional dynamics" to "protein dynamics" to more accurately reflect the scope of the study.

      Thanks to the reviewer for the suggestion to use the more accurate terminology! We agree with the reviewer that keeping “protein functional dynamics” in the title would require us to focus on how the overall protein dynamics link to function. The function is directly related to PROTAC-induced structural dynamics, as is commonly seen in protein structure-function relationships, but it is not our main focus. Therefore, we changed the title, replacing “functional” with “structural”.

      (3) Incorporate more local and specific characterization methods in addition to RMSD for a more comprehensive conformational assessment.

      We thank the reviewer for the suggestion. We performed a time-dependent correlation analysis to understand how the rotation of PROTACs can translate to the Lys-Gly interaction. In addition, we performed a dihedral entropy analysis for each dihedral angle in the linker of the PROTACs to better examine the flexibility of each PROTAC.

      We included a detailed explanation on page 18: “Our dihedral entropy analysis showed that dBET57 has ~0.3 kcal/mol lower entropies than the other three linkers, suggesting dBET57 is less flexible than other PROTACs (Figure S18).”

      Reviewer #2 (Public review):

      Summary:

      The manuscript reports the computational study of the dynamics of PROTAC-induced degradation complexes. The research investigates how different linkers within PROTACs affect the formation and stability of ternary complexes between the target protein BRD4BD1 and Cereblon E3 ligase, and the degradation machinery. Using computational modeling, docking, and molecular dynamics simulations, the study demonstrates that although all PROTACs form ternary complexes, the linkers significantly influence the dynamics and efficacy of protein degradation. The findings highlight that the flexibility and positioning of Lys residues are crucial for successful ubiquitination. The results also discussed the correlated motions between the PROTAC linker and the complex.

      Strengths:

      The field of PROTAC discovery and design, characterized by its limited research, distinguishes itself from traditional binary ligand-protein interactions by forming a ternary complex involving two proteins. The current understanding of how the structure of PROTAC influences its degradation efficacy remains insufficient. This study investigated the atomic-level dynamics of the degradation complex, offering potentially valuable insights for future research into PROTAC degradability.

      Reviewer #2 (Recommendations for the authors):

      (1) Regarding the modeling of the ternary complex, the BRD4 structure (3MXF) is from humans, whereas the CRBN structure in 4CI3 is derived from Gallus gallus. Is there a specific reason for not using structures from the same species, especially considering that human CRBN structures are available in the Protein Data Bank (e.g., 8OIZ, 4TZ4)?

      We appreciate the reviewer’s insightful comment regarding our choice of BRD4 and CRBN crystal structures from two different species. Our initial selection of 4CI3 for the CRBN structure was based on its high resolution and its publication in Nature. Furthermore, the Gallus gallus CRBN structure shares a high degree of sequence and structural similarity with Homo sapiens CRBN, especially in the ligand binding region. At the time of our study, we were aware of 4TZ4 as a Homo sapiens CRBN structure; however, we did not use it because no publication or detailed experimental description was associated with it. Additionally, PDB 8OIZ was not yet publicly available at the time.

      (2) Based on the crystal structure (PDB ID: 6BNB) discussed in Reference 6, the ternary complex of dBET57 exhibits a conformation distinct from other PROTACs, with CRBN adopting an "open" conformation. Using the same CRBN structure for dBET57 as for other PROTACs might result in inaccurate docking outcomes.

      Thank you for the reviewer’s comment! As noted by the authors in Reference 6, the observed open conformation of CRBN in the dBET57 ternary complex may result from the high salt crystallization conditions, which could drive structural rearrangement, and crystal contacts that may induce this conformation. The authors also mentioned that this open conformation could, in part, reflect CRBN’s intrinsic plasticity. However, they acknowledged that further studies are needed to determine whether this conformational flexibility is a characteristic feature of CRBN that enables it to accommodate a variety of substrates. Despite these observations, we believe that the compatibility of the observed BRD4<sup>BD1</sup> binding conformation with both open and closed CRBN states suggests that these conformational changes are all possible. Therefore, we believe using the same initial CRBN structure for dBET57 as for other PROTACs can still reasonably reveal the dynamic nature of the ternary complex and would not significantly affect the accuracy of our docking outcomes either.

      (3) Figure 2 displays only a single frame from the simulations, which might not provide a comprehensive representation. Could a contact frequency heatmap of PROTAC with the proteins be included to offer a more detailed view?

      We thank the reviewer for the suggestion! We performed a contact map analysis to observe the average distance between PROTACs and BRD4<sup>BD1</sup> over 400 ns of MD simulation (new Figure S4 added).

      We included a detailed explanation on pages 8 and 9: “The residue contact map throughout the 400 ns MD simulation also showed different patterns of protein-protein interactions, indicating that the linkers were able to adopt different conformations (Figure S4).”
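      For readers unfamiliar with this kind of analysis, a contact map of the sort mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: the function name `contact_map`, the 4.5 distance cutoff (in the same length unit as the coordinates), and the toy coordinates are hypothetical. The sketch returns both the time-averaged distance and the fraction of frames in contact for every pair of sites.

```python
import numpy as np


def contact_map(coords_a, coords_b, cutoff=4.5):
    """Return (mean_distance, contact_frequency), each of shape (n_a, n_b).

    coords_a: array of shape (n_frames, n_a, 3)
    coords_b: array of shape (n_frames, n_b, 3)
    cutoff:   contact threshold, same length unit as the coordinates (assumed value)
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    # Pairwise distances for every frame: shape (n_frames, n_a, n_b).
    dist = np.linalg.norm(a[:, :, None, :] - b[:, None, :, :], axis=-1)
    mean_distance = dist.mean(axis=0)
    contact_frequency = (dist < cutoff).mean(axis=0)
    return mean_distance, contact_frequency


# Toy trajectory (hypothetical): 2 frames, 1 site in group A, 2 sites in group B.
group_a = np.zeros((2, 1, 3))
group_b = np.array([
    [[3.0, 0.0, 0.0], [10.0, 0.0, 0.0]],
    [[5.0, 0.0, 0.0], [10.0, 0.0, 0.0]],
])
mean_distance, contact_frequency = contact_map(group_a, group_b)
print(mean_distance)
print(contact_frequency)
```

      Reporting the contact frequency alongside the mean distance helps distinguish a pair that is always moderately close from one that alternates between tight contact and separation.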

      (4) The conclusions in Figure 3 and S11 are based on a single 400 ns trajectory. The reproducibility of these results is therefore uncertain.

      We thank the reviewer for the suggestion! We added one more MD simulation with a different random seed for each PROTAC to ensure the reproducibility of the results. The results are shown in Figure S21, and the details for each MD run are updated in Table 1.

      (5) Figure 4 indicates significant differences between the first and last 100 ns of the simulations. Does this suggest that the simulations have not converged? If so, how can the statistical analysis presented in this paper be considered reliable?

      We thank the reviewers for the question. The simulation was initiated with a 10-15 Å gap between BRD4 and Ub to monitor the movement of the degradation machinery and the Lys-Gly interaction. The significant changes in the pseudo dihedral angles in Figure 4 show that the large-scale movement of the degradation complex can initiate Lys-Gly binding. They do not reflect unstable sampling, because the system remains very stable once BRD4 comes close to Ub.

      (6) In Figure 5, the dihedral angle of dBET57_#9MD1 is marked on a peptide bond. Shouldn't this angle have a high energy barrier for rotation?

      We thank the reviewers for catching the error! Indeed, it was an error that the dihedral angles were marked on the peptide bond. We reworked the figure and double-checked our dihedral correlation analysis. The updated dihedral angle selection and correlation coefficients are shown in Figure 5.

      (7) Given that crystal structures for dBET 70, 23, and 57 are available, why is there a need to model the complex using protein-protein docking?

      We thank the reviewer for the feedback. Only dBET23 has a crystal structure of the full ternary complex, containing the PROTAC and both proteins, while the structures for dBET1, dBET57 and dBET70 are not complete ternary complexes. Although dBET70 has a crystal structure, the conformation of its PROTAC is not resolved, and thus we decided to still perform protein-protein docking with dBET70.

      We included the explanation on page 8: “Only the dBET23 crystal structure is available with the PROTAC and both proteins, while the experimentally determined ternary complexes of dBET1, dBET57 and dBET70 are not available.”

      (8) On page 9, it is mentioned that "only one of the 12 PDB files had CRBN bound to DDB1 (PDB ID 4TZ4)." However, there are numerous structures of the DDB1-CRBN complex available, including those used for docking like 4CI3, as well as 4CI1, 4CI2, 8OIZ, etc.

      We thank the reviewers for the comment! We acknowledge the existence of several DDB1-CRBN complex crystal structures, such as PDB IDs 4CI1, 4CI2, 4CI3, and the more recent 8OIZ. For our study, we chose to use 4TZ4 to maintain consistency in complex construction and to align with the methodology established in a previously published JBC paper (https://doi.org/10.1016/j.jbc.2022.101653), which successfully utilized the same structure for a similar construct. At the time our study was conducted, the 8OIZ structure had not yet been released. We appreciate your suggestion and will consider incorporating alternative structures in future studies to further investigate our findings.

      (9) Table 2 is first referenced on page 8, while Table 1 is mentioned first on page 10. The numbering of these tables should be reversed to reflect their order of appearance in the text.

      We thank the reviewer for catching the error! We switched the order of Table 1 and Table 2.

      Reviewer #3 (Public review):

      The authors offer an interesting computational study on the dynamics of PROTAC-driven protein degradation. They employed a combination of protein-protein docking, structural alignment, atomistic MD simulations, and post-analysis to model a series of CRBN-dBET-BRD4 ternary complexes, as well as the entire degradation machinery complex. These degraders, with different linker properties, were all capable of forming stable ternary complexes but had been shown experimentally to exhibit different degradation capabilities. While in the initial models of the degradation machinery complex, no surface Lys residue(s) of BRD4 were exposed sufficiently for the crucial ubiquitination step, MD simulations illustrated protein functional dynamics of the entire complex and local side-chain arrangements to bring Lys residue(s) to the catalytic pocket of E2/Ub for reactions. Using these simulations, the authors were able to present a hypothesis as to how linker property affects degradation potency. They were able to roughly correlate the distance of Lys residues to the catalytic pocket of E2/Ub with observed DC50/5h values. This is an interesting and timely study that presents interesting tools that could be used to guide future PROTAC design or optimization.

      Reviewer #3 (Recommendations for the authors):

      (1) My most important comment refers to the MM/PBSA analysis, the results of which are shown in Figure S9: binding affinities of -40 to -50 kcal/mol are unrealistic. This would correspond to a dissociation constant of 10^-37 M. This analysis needs to be removed or corrected.

      We thank the reviewer for the comment! MM/PBSA analysis indeed cannot give a realistic binding free energy. It does not include the configurational entropy loss, which should be a large positive value. In addition, while the implicit PBSA solvent model computes solvation free energy, the absolute values may not be very accurate. However, because this is a commonly used energy calculation, and some readers may like to see quantitative values to ensure that the systems have stable intermolecular attractions, we kept the analysis in the SI. We edited the figure legend, moved Figure S10 to SI page 19, and added sentences to clearly state that the calculations did not include the configurational entropy loss: “Note that the energy calculations focus on non-bonded intermolecular interactions and solvation free energy calculations using MM/PBSA, where the configuration entropy loss during protein binding was not explicitly included.”
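      The reviewer's order-of-magnitude estimate can be reproduced with the standard relation between a binding free energy and a dissociation constant, ΔG = RT ln(Kd), taking a 1 M standard state. The short sketch below assumes a temperature of 298.15 K, which is not stated in the exchange above; the function name `log10_kd` is introduced only for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def log10_kd(delta_g_kcal, temperature=298.15):
    """log10 of the dissociation constant (in M) implied by a binding free energy.

    Uses delta_G = R * T * ln(Kd) with a 1 M standard state.
    The default temperature is an assumption for this example.
    """
    return delta_g_kcal / (R_KCAL * temperature * math.log(10.0))


for dg in (-10.0, -40.0, -50.0):
    print(f"dG = {dg:6.1f} kcal/mol  ->  Kd ~ 10^{log10_kd(dg):.1f} M")
```

      Under these assumptions, -50 kcal/mol corresponds to a Kd of roughly 10^-37 M, in line with the reviewer's figure, whereas a value near -10 kcal/mol corresponds to a Kd in the tens-of-nanomolar range.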

      (2) I think that the analysis of what in the different dBETx makes them cause different degradation potency is underdeveloped. The dihedral angle analysis (Figure 4B) did not explain the observed behavior in my opinion. Please add additional, clearer analysis as to what structural differences in the dBETx make them sample very different conformations.

      We thank the reviewer for the suggestions! Based on the suggestion, we further performed a dihedral entropy analysis for each dihedral angle in the linker part of the PROTAC to examine the flexibility of each PROTAC. Because each PROTAC has a different linker, we now clearly label them in a new Figure S18 on SI page 27. Low dihedral entropies indicate a more rigid structure and thus less flexibility, making it more difficult for a PROTAC to rearrange and facilitate the protein structural dynamics necessary for ubiquitination.

      We added a detailed explanation on page 18: “Our dihedral entropy analysis showed that dBET57 has ~0.3 kcal/mol lower configuration entropies than the other dBETs with three different linkers, suggesting that dBET57 is less flexible than the other PROTACs (Figure S18).”
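      One common way to estimate a per-dihedral entropy from a trajectory is to histogram the sampled angles and apply the Shannon (Gibbs) formula S = -R Σ p ln p, then report T·S in kcal/mol. The sketch below illustrates that idea only; the estimator actually used by the authors is not described here, and the function name `dihedral_ts`, the 36-bin histogram, and the 298.15 K temperature are assumptions.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def dihedral_ts(angles_deg, n_bins=36, temperature=298.15):
    """Histogram-based estimate of T*S (kcal/mol) for one dihedral angle.

    angles_deg: sampled dihedral values in degrees (any range; wrapped to 0-360).
    n_bins and temperature are assumed defaults for this illustration.
    """
    angles = np.mod(np.asarray(angles_deg, dtype=float), 360.0)
    counts, _ = np.histogram(angles, bins=n_bins, range=(0.0, 360.0))
    p = counts / counts.sum()
    p = p[p > 0]  # empty bins contribute nothing to the sum
    entropy = -R_KCAL * np.sum(p * np.log(p))
    return temperature * entropy


# Hypothetical samples: a dihedral locked in one rotamer versus one that
# visits every 10-degree bin equally often.
rigid = np.full(360, 60.0)
flexible = np.arange(360) + 0.5

ts_rigid = dihedral_ts(rigid)
ts_flexible = dihedral_ts(flexible)
print(f"T*S rigid:    {ts_rigid:.3f} kcal/mol")
print(f"T*S flexible: {ts_flexible:.3f} kcal/mol")
```

      The absolute values depend on the bin width, so only differences between dihedrals or between linkers computed with the same binning are meaningful.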

      (3) "The movement of the degradation machinery correlated with rotations of specific dihedrals of the linker region in dBETs (Figure 5).": this is not sufficiently clear from the figure. Definitely not in a quantitative way.

      We thank the reviewer for the suggestions! To further understand the correlation between the PROTAC dihedral angles and the movement of the degradation machinery, we performed a time-dependent correlation analysis between the dihedral angles of the PROTACs and the Lys-Gly distance (Figures 6 and S17).

      We included a detailed explanation on page 16:

      “To further examine the correlation between PROTAC rotation and the Lys-Gly interaction, we performed a time-dependent correlation analysis. This analysis showed that PROTAC rotation translates motion over time, leading to the Lys-Gly interaction, with a correlation peak around 60-85 ns, marking the time of the interaction (Figure 6 and Figure S17). In addition, the pseudo dihedral angles also showed a high correlation (0.85 in the case of dBET1) with Lys-Gly distance. This indicated that degradation complex undergoes structural rearrangement and drives the Lys-Gly interaction.”
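
      One common way to set up a time-dependent correlation of this kind is a lagged Pearson correlation between two per-frame series. The sketch below is a generic illustration, not the authors' implementation: the function name and the synthetic data are assumptions, and a periodic dihedral angle would normally be unwrapped or replaced by its sine and cosine before use.

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x(t) and y(t + lag) for lag = 0..max_lag frames."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D series of equal length")
    if max_lag < 0 or max_lag > x.size - 3:
        raise ValueError("max_lag must leave at least three overlapping frames")
    corr = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # Pair frame t of x with frame t + lag of y.
        corr[lag] = np.corrcoef(x[: x.size - lag], y[lag:])[0, 1]
    return corr

# Synthetic check: y repeats x five frames later, plus a little noise.
rng = np.random.default_rng(0)
x = rng.normal(size=500)          # stand-in for a (transformed) dihedral series
y = np.roll(x, 5) + 0.1 * rng.normal(size=500)  # stand-in for a distance series
corr = lagged_correlation(x, y, max_lag=20)
best_lag = int(np.argmax(np.abs(corr)))  # lag with the strongest correlation
```

      Multiplying the best lag by the trajectory's frame spacing converts it to a time delay.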

      (4) Cartoons are needed at multiple stages throughout the paper to enhance the clarity of what the modeled complexes looked like (e.g. which subunits they contained).

      We thank the reviewers for the suggestions. We added and remade several Figures with cartoons to better represent the stages. We also used higher resolution and included clearer labels for each protein system.

      (5) The difference between CRL4A E3 ligase and CRBN E3 ligase is not clear to the non-expert reader.

      We thank the reviewer for the comment! The terms "CRL4A E3 ligase" and "CRBN E3 ligase" refer to different levels of description of the protein complexes. To clarify them, we added a couple of sentences to the Figure 1 legend so that non-expert readers can clearly tell the two apart.

      As illustrated in Figure 1,

      • CRL4A E3 ligase refers to the full E3 ligase complex, which includes all protein components such as CRBN, DDB1, CUL4A, and RBX1.

      • CRBN E3 ligase, on the other hand, is a more colloquial term typically used to describe just the CRBN protein, often in isolation from the full CRL4A complex.

      (6) Figure 1, legend: unclear why it's E3 in A and E2 in B.

      We thank the reviewer for the question! E3 ligase in Figure 1A refers to CRBN E3 ligase, which researchers often simply call CRBN. We have added a sentence to specify that CRBN E3 ligase is also termed CRBN for simplicity. In Figure 1B, the meaning of E2 was unclear. The full name of E2 is E2 ubiquitin-conjugating enzyme; because the name is long, researchers also call it E2 enzyme. We have corrected the legend and now use "E2 enzyme" to make it clearer.

      (7) "Although the protein-protein binding affinities were similar, other degraders such as dBET1 and dBET57 had a DC50/5h of about 500 nM". It's unclear what experimental data supports the assertion that the protein-protein binding affinities are similar.

      We thank the reviewer for the question. Indeed, the statement is unclear.

      We corrected the sentence on page 6: “Although utilizing the exact same warheads, other degraders such as dBET1 and dBET57 had a DC<sub>50/5h</sub> of about 500 nM.”

      (8) Was the construction of the degradation machinery complex guided by experimental data (maybe cryo-EM or tomography)? If not, what is the accuracy of the starting complex for MD? This may impact the reliability of the obtained results.

      Thank you for your insightful comments! Yes, the construction of the degradation machinery complex was guided by available high-resolution crystal structures, which were selected to maintain consistency and align with the methodology established in a previously published JBC paper (https://doi.org/10.1016/j.jbc.2022.101653).

      We acknowledge that static crystal structures represent only a single snapshot of the system and may not capture the full conformational flexibility of the complex. To address this limitation, we performed MD simulations using multiple starting structures. This approach allowed us to explore a broader conformational landscape and reduced the dependence on any single starting configuration, thereby enhancing the reliability of the results.

      We hope this clarifies the robustness of our methodology and the steps taken to ensure accuracy in our simulations.

      (9) "With quantitative data, we revealed the mechanism underlying dBETx-induced degradation machinery": I think this may be too strong of an assertion. The authors may have developed a mechanistic hypothesis that can be tested experimentally in the future.

      We thank the reviewer for the suggestion. This is indeed a strong assertion and needs to be modified. We edited the sentence on page 7: “With quantitative data, we revealed the importance of the structural dynamics of dBETx-induced motions, which arrange positions of surface lysine residues of BRD4<sup>BD1</sup> and the entire degradation machinery.”

      (10) Figure S2: are the RMSDs calculated over all residues? Or just the BRD4 residues? Given that the structures are aligned with respect to CRBN, the reported RMSD numbers might be artificially low since there are many more CRBN residues than there are BRD4 residues. Also, why weren't the crystal structures used for dBET 23 and 70 for the modeling? Wouldn't you want to use the most accurate possible structures? Simulations were run for 23. Why not for 70?

      We thank the reviewer for the suggestion. We added a sentence to more clearly explain the RMSD calculations in Figure S2: “The structural superposition is performed based on the backbone of CRBN and RMSD calculation is conducted based on the backbone of BRD4<sup>BD1</sup>.”
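
      The calculation described in that sentence, fitting on one set of atoms and measuring RMSD on another, can be sketched with a Kabsch superposition. This is a minimal illustration under assumed inputs: the function names and the index arrays (standing in for CRBN and BRD4<sup>BD1</sup> backbone atoms) are hypothetical, and coordinates are taken to be N x 3 arrays with matching atom order.

```python
import numpy as np

def kabsch(mobile, reference):
    """Return rotation R and translation t minimizing |mobile @ R.T + t - reference|."""
    mob_c = mobile.mean(axis=0)
    ref_c = reference.mean(axis=0)
    H = (mobile - mob_c).T @ (reference - ref_c)  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ref_c - mob_c @ R.T
    return R, t

def rmsd_after_fit(coords, ref_coords, fit_idx, rmsd_idx):
    """Superpose on atoms fit_idx, then report the RMSD over atoms rmsd_idx."""
    R, t = kabsch(coords[fit_idx], ref_coords[fit_idx])
    moved = coords @ R.T + t
    diff = moved[rmsd_idx] - ref_coords[rmsd_idx]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Synthetic example: 20 "fit" atoms and 10 "measured" atoms.
rng = np.random.default_rng(1)
ref = rng.normal(scale=10.0, size=(30, 3))
fit_idx = np.arange(20)        # stand-in for the CRBN backbone selection
rmsd_idx = np.arange(20, 30)   # stand-in for the BRD4 BD1 backbone selection

shifted = ref.copy()
shifted[rmsd_idx] += np.array([3.0, 0.0, 0.0])  # move only the measured subunit
theta = np.deg2rad(40.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
model = shifted @ Rz.T + np.array([5.0, -2.0, 1.0])  # arbitrary rigid-body move
value = rmsd_after_fit(model, ref, fit_idx, rmsd_idx)
```

      Because the fit uses only the first selection, the rigid-body move is removed exactly and the reported value reflects only the 3 Å displacement of the second selection. This is why fitting on the larger subunit does not dilute the RMSD of the smaller one.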

      Although dBET70 has a crystal structure, its PROTAC structure is not resolved, and thus we decided to still perform protein-protein docking with dBET70. dBET1 and dBET57 do not have a crystal structure for the ternary complexes.

      We included the explanation on page 8: “Only dBET23 crystal structure is available with the PROTACs and both proteins, while the experimentally determined ternary complexes of dBET1, PROTACs of dBET57 and dBET70 are not available.”

      a. And there are no crystal structures available for 1 and 57? If so, please clearly say that. Otherwise please report the RMSD.

      We thank the reviewer for the suggestion. We included the explanation on page 8: “Only dBET23 crystal structure is available with the PROTACs and both proteins, while the experimentally determined ternary complexes of dBET1, PROTACs of dBET57 and dBET70 are not available.”

      (11) Table 2 is referenced before Table 1.

      We thank the reviewer for catching the error! We switched the order for Table 1 and Table 2.

      (12) Figure S3 is not referenced in the main paper.

      We thank the reviewer for catching the error! We now refer to Figure S3 on page 7.

      (13) Minor comments on grammar and sentence structure:

      a. It should be "binding of a ternary complex"

      b. "Our shows the importance": word missing.

      c. "...providing insights into potential orientations for ubiquitination. observe whether the preferred conformations are pre-organized for ubiquitination." Word or words missing.

      We thank the reviewer for catching the errors! We corrected grammatical errors and unclear sentences throughout the entire paper and revised the sentences to make them easily understandable for non-expert readers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work describes a convincingly validated non-invasive tool for in vivo metabolic phenotyping of aggressive brain tumors in mice brains. The analysis provides a valuable technique that tackles the unmet need for patient stratification and hence for early assessment of therapeutic efficacy. However, wider clinical applicability of the findings can be attained by expanding the work to include more diverse tumor models.

      We thank the Editors for their comments. This concern was also raised by Reviewer 1 in the Public Review, where we address it in more detail – please refer to comment PR-R1.C1. In brief, we agree that a more clinically relevant model should provide more translatable results to patients, and acknowledge this better in the revised manuscript: page 18 (lines 14-17), “While patient-derived xenografts and de novo models would be more suited to recapitulate human GBM heterogeneity and infiltration features, and genetic manipulation of glycolysis and mitochondrial oxidation pathways potentially relevant to ascertain DGE-DMI sensitivity for their quantification, (…)”. However, we also believe that the potential of DGE-DMI for application to different glioblastoma models or patients is demonstrated clearly enough with the two immunocompetent models we chose, extensively reported in the literature as reliable models of glioblastoma.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work introduces a new imaging tool for profiling tumor microenvironments through glucose conversion kinetics. Using GL261 and CT2A intracranial mouse models, the authors demonstrated that tumor lactate turnover mimicked the glioblastoma phenotype, and differences in peritumoral glutamate-glutamine recycling correlated with tumor invasion capacity, aligning with histopathological characterization. This paper presents a novel method to image and quantify glucose metabolites, reducing background noise and improving the predictability of multiple tumor features. It is, therefore, a valuable tool for studying glioblastoma in mouse models and enhances the understanding of the metabolic heterogeneity of glioblastoma.

      Strengths:

      By combining novel spectroscopic imaging modalities and recent advances in noise attenuation, Simões et al. improve upon their previously published Dynamic Glucose-Enhanced deuterium metabolic imaging (DGE-DMI) method to resolve spatiotemporal glucose flux rates in two commonly used syngeneic GBM mouse models, CT2A and GL261. This method can be standardized and further enhanced by using tensor PCA for spectral denoising, which improves kinetic modeling performance. It enables the glioblastoma mouse model to be assessed and quantified with higher accuracy using imaging methods.

      The study also demonstrated the potential of DGE-DMI by providing spectroscopic imaging of glucose metabolic fluxes in both the tumor and tumor border regions. By comparing these results with histopathological characterization, the authors showed that DGE-DMI could be a powerful tool for analyzing multiple aspects of mouse glioblastoma, such as cell density and proliferation, peritumoral infiltration, and distant migration.

      Weaknesses:

      (1) Although the paper provides clear evidence that DGE-DMI is a potentially powerful tool for the mouse glioblastoma model, it fails to use this new method to discover novel features of tumors. The data presented mainly confirm tumor features that have been previously reported. While this demonstrates that DGE-DMI is a reliable imaging tool in such circumstances, it also diminishes the novelty of the study.

      PR-R1.C1 – We thank the Reviewer for the detailed analysis and reply below to each point. PR-R1.C1.1 - novelty: We thank the Reviewer for the comments and understand their perspective. While we acknowledge that our paper is more methodologically oriented, we also believe that significant methodological advances are critical for new discoveries. This was our main motivation and is demonstrated in the present work, showing the ability to map in vivo metabolic fluxes in mouse glioma, a “hot topic” and very desirable in the cancer field. 

      PR-R1.C1.2 – additional tumor features: To strengthen the biological relevance of this methodologic novelty, we have now included immune cell infiltration among the tumor features assessed, besides perfusion, histopathology, cellularity and cell proliferation. For this, we performed Iba-1 immunostaining for microglia/macrophages, now included in Fig. 2-B. These new results demonstrate significantly higher microglia/macrophage infiltration in CT2A tumors compared to GL261, particularly at the tumor border. This is very consistent with the respective tumor phenotypes, namely differences in cell density and cellularity between the 2 cohorts and across pooled cohorts, as we now report: page 9 (lines 10-18), “Such phenotype differences were reflected in the regional infiltration of microglia/macrophages: significantly higher at the CT2A peritumoral rim (PT-Rim) compared to GL261, and slightly higher in the tumor region as well (Fig 2B). Further quantitative regional analysis of Tumor-to-PT-Rim ROI ratios revealed: (i) 47% lower cell density (p=0.004) and 32% higher cell proliferation (p=0.026) in GL261 compared to CT2A (Fig 2C, Table S3); and (ii) strong negative correlations in pooled cohorts between microglia/macrophage infiltration and cellularity (R=-0.91, p<0.001) or cell density (R=-0.77, p=0.016), suggesting more circumscribed tumor growth with higher peripheral/peritumoral infiltration of immune cells.”; and page 16 (lines 13-19), “GL261 tumors were examined earlier after induction than CT2A (17±0 vs. 30±5 days, p = 0.032), displaying similar volumes (57±6 vs. 60±14, p = 0.813) but increased vascular permeability (8.5±1.1 vs 4.3±0.5 10<sup>3</sup>/min: +98%, p=0.001), more disrupted stromal-vascular phenotypes and infiltrative growth (5/5 vs 0/5), consistent with significantly lower tumor cell density (4.9±0.2 vs. 8.2±0.3 10<sup>-3</sup> cells/µm<sup>2</sup>: -40%, p<0.001) and lower peritumoral rim infiltration of microglia/macrophages (2.1±0.7 vs. 10.0±2.3%: -77%, p=0.008)”.

      PR-R1.C1.3 – new tumor features and DGE-DMI: Importantly, such regional differences in cellularity/cell density and immune cell infiltration between the two cohorts were remarkably mirrored by the lactate turnover maps (Fig 3-C), as we now report in the manuscript: page 12 (lines 6-15), “GL261 tumors accumulated significantly less lactate in the core (1.60±0.25 vs 2.91±0.33 mM: -45%, p=0.013) and peritumor margin regions (0.94±0.09 vs 1.46±0.17 mM: -36%, p=0.025) than CT2A – Fig 3 A-B, Table S1. Consistently, tumor lactate accumulation correlated with tumor cellularity in pooled cohorts (R=0.74, p=0.014). Then, lower tumor lactate levels were associated with higher lactate elimination rate, k<sub>lac</sub> (0.11±0.1 vs 0.06±0.01 mM/min: +94%, p=0.006) – Fig 3B – which in turn correlated inversely with peritumoral rim infiltration of microglia/macrophages in pooled cohorts (R=-0.73, p=0.027) – Fig 3-C. Further analysis of Tumor/P-Margin metabolic ratios (Table S3) revealed: (i) +38% glucose (p=0.002) and -17% lactate (p=0.038) concentrations, and +55% higher lactate consumption rate (p=0.040) in the GL261 cohort; and (ii) lactate ratios across those regions reflected the respective cell density ratios in pooled cohorts (R=0.77, p=0.010) – Fig 3-C”. This is a novel, relevant feature compared to our previous work, as highlighted in our discussion: page 17 (lines 1-8), “Tumor vs peritumor border analyses further suggest that lactate metabolism reflects regional histologic differences: lactate accumulation mirrors cell density gradients between and across the two cohorts; whereas lactate consumption/elimination rate coarsely reflects cohort differences in cell proliferation, and inversely correlates with peritumoral infiltration by microglia/macrophages across both cohorts. This is consistent with GL261’s lower cell density and cohesiveness, more disrupted stromal-vascular phenotypes, and infiltrative growth pattern at the peritumor margin area, where less immune cell infiltration is detected and relatively lower cell division is expected [43]”.

      We trust that these new features recovered from DGE-DMI (Fig 2-B and Fig 3-C) show its potential for new discoveries in glioblastoma.

      (2) When using DGE-DMI to quantitatively map glycolysis and mitochondrial oxidation fluxes, there is no comparison with other methods to directly identify the changes. This makes it difficult to assess how sensitive DGE-DMI is in detecting differences in glycolysis and mitochondrial oxidation fluxes, which undermines the claim of its potential for in vivo GBM phenotyping.

      PR-R1.C2: We thank the reviewer for raising this important point. The validity of the method for mapping specific metabolic kinetics in mouse glioma was reported in our previous work, using the same animal models, as specified in the introduction (page 4, lines 10-13): “we recently (…) propose[d] Dynamic Glucose-Enhanced (DGE) 2H-MRS [31], demonstrating its ability to quantify glucose fluxes through glycolysis and mitochondrial oxidation pathways in vivo in mouse GBM (…)”. Therefore, this was not reproduced in the present work. 

      In brief, our DGE-DMI results are very consistent with our previous study, where DGE single voxel deuterium spectroscopy was performed in the same tumor models with higher temporal resolution and SNR (as stated on page 16, lines 9-10: glycolytic lactate synthesis rate, 0.59±0.04 vs. 0.55±0.07 mM/min; glucose-derived glutamate-glutamine synthesis rate, 0.28±0.06 vs. 0.40±0.08 mM/min), which in turn matched well the values reported by others for glucose consumption rate through:

      (i) glycolysis, in different tumor models including mouse lymphoma in vivo (0.99 mM/min, by DGE-DMI (Kreis et al. 2020)), rat breast carcinoma in situ (1.43 mM/min, using a biochemical assay (Kallinowski et al. 1988)), and even perfused GBM cells (1.35 fmol min<sup>−1</sup> cell<sup>−1</sup>, according to hyperpolarized 13C-MRS (Jeong et al. 2017)), the latter very similar to our previous in vivo measurements in GL261 tumors: 0.50 ± 0.07 mM min<sup>−1</sup> = 1.25 ± 0.16 fmol min<sup>−1</sup> cell<sup>−1</sup> (Simoes et al. 2022);

      (ii) mitochondrial oxidation, very similar to previous in vivo measurements in mouse GBM xenografts (0.33 mM min<sup>−1</sup>, using 13C spectroscopy (Lai et al. 2018)), and particularly to our in situ measurements in cell culture (GL261, 0.69 ± 0.09 fmol min<sup>−1</sup> cell<sup>−1</sup>; and CT2A, 0.44 ± 0.08 fmol min<sup>−1</sup> cell<sup>−1</sup>), remarkably similar to the in vivo measurements in the respective tumors (GL261, 0.32 ± 0.10 mM min<sup>−1</sup> = 0.77 ± 0.23 fmol min<sup>−1</sup> cell<sup>−1</sup>; and CT2A, 0.51 ± 0.11 mM min<sup>−1</sup> = 0.60 ± 0.12 fmol min<sup>−1</sup> cell<sup>−1</sup>) (Simoes et al. 2022).
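
      The equalities above between tissue rates (mM min<sup>−1</sup>) and per-cell rates (fmol min<sup>−1</sup> cell<sup>−1</sup>) imply a division by a volumetric cell density. The sketch below shows only that arithmetic. The function name is hypothetical, and the density used in the example is not taken from the response: it is an illustrative value chosen so that the calculation reproduces the 0.50 mM min<sup>−1</sup> and 1.25 fmol min<sup>−1</sup> cell<sup>−1</sup> pair quoted in (i).

```python
def mm_per_min_to_fmol_per_min_per_cell(rate_mM_per_min, cells_per_mL):
    """Convert a tissue flux in mM/min to a per-cell flux in fmol/min/cell."""
    if cells_per_mL <= 0:
        raise ValueError("cells_per_mL must be positive")
    # 1 mM = 1e-3 mol/L = 1e-6 mol/mL = 1e9 fmol/mL
    fmol_per_mL_per_min = rate_mM_per_min * 1e9
    return fmol_per_mL_per_min / cells_per_mL

# Illustrative density (assumption, not a measured value from the source).
assumed_density = 4.0e8  # cells per mL
per_cell = mm_per_min_to_fmol_per_min_per_cell(0.50, assumed_density)
```

      With a different density the same tissue rate maps to a different per-cell rate, so per-cell comparisons between tumor types depend on the cell density used for each.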

      (3) The study only used intracranial injections of two mouse glioblastoma cell lines, which limits the application of DGE-DMI in detecting and characterizing de novo glioblastomas. A de novo mouse model can show tumor growth progression and is more heterogeneous than a cell line injection model. Demonstrating that DGE-DMI performs well in a more clinically relevant model would better support its claimed potential usage in patients.

      PR-R1.C3: We agree that a more clinically relevant model, such as the one suggested by the Reviewer, would in principle be better suited to provide more translatable results to patients. We however believe that the potential of DGE-DMI for application to different glioblastoma models or patients, with GBM or any other types of brain tumors for that matter, is demonstrated clearly enough with the two syngeneic models we chose, given their robustness and general acceptance in the literature as reliable immunocompetent models of GBM, and for their different histologic and metabolic properties. This way we could fully focus on the novel metabolic imaging method, as compared to our previous single-voxel approach. While both tumor cohorts (GL261 and CT2A) were studied at more advanced stages of tumor progression, the metabolic differences depicted are consistent with the histopathologic features reported, as discussed in the manuscript; namely, the lower glucose oxidation rates. We have now modified the manuscript to highlight this point: page 18 (lines 12-14), “While patient-derived xenografts and de novo models would be more suited to recapitulate human GBM heterogeneity and infiltration features, and genetic manipulation of glycolysis and mitochondrial oxidation pathways could be relevant to ascertain DGE-DMI sensitivity for their quantification, (…)”.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors attempt to noninvasively image metabolic aspects of the tumor microenvironment in vivo, in 2 mouse models of glioblastoma. The tumor lesion and its surrounding appearance are extensively characterized using histology to validate/support any observations made with the metabolic imaging approach. The metabolic imaging method builds on a previously used approach by the authors and others to measure the kinetics of deuterated glucose metabolism using dynamic 2H magnetic resonance spectroscopic imaging (MRSI), supported by de-noising methods.

      Strengths:

      Extensive histological evaluation and characterization.

      Measurement of the time course of isotope labeling to estimate absolute flux rates of glucose metabolism.

      Weaknesses:

      (1) The de-noising method appears essential to achieve the high spatial resolution of the in vivo imaging to be compatible with the dimensions of the tumor microenvironment, here defined as the immediately adjacent rim of the mouse brain tumors. There are a few challenges with this approach. Often denoising methods applied to MR spectroscopy data have merely a cosmetic effect but the actual quantification of the peaks in the spectra is not more accurate than when applied directly to original non-denoised data. It is not clear if this concern is applicable to the denoising technique applied here. However, even if this is not an issue, no denoising method can truly increase the original spatial resolution at which data were acquired. A quick calculation estimates that the spatial resolution of the 2H MRSI used here is 30-40 times too low to capture the much smaller tumor rim volume, and therefore there is concern that normal brain tissue and tumor tissue will be the dominant metabolic signal in so-called tumor rim voxels. This means that the conclusions on metabolic features of the (much larger) tumor are much more robust than the observations attributed to the (much smaller) tumor microenvironment/tumor rim.

      PR-R2.C1: We thank the Reviewer for the constructive comments regarding resolution and tumor rim, and denoising. These issues were raised more extensively in the section Recommendations For The Authors, where they are addressed in detail (RA-R2.C2). In summary, we agree with the Reviewer that no denoising method can increase the nominal resolution; nor was that our purpose. Thus, we clarify the relevance of spectral matrix interpolation in MRSI, and how our display resolution should in principle provide a better approximation to the ground truth than the nominal resolution, relevant for ROI analysis in the tumor margin. While we further show relevant correlations between metabolic maps and histologic features in tumor core and margin, we agree with the reviewer that our observations in the tumor core are more robust than those in the margin, and acknowledge this in the Discussion: page 19, lines 6-10: “Therefore, further DGE-DMI preclinical studies aimed at detecting and quantifying relatively weak signals, such as tumor glutamate-glutamine, and/or increase the nominal spatial resolution to better correlate those metabolic results with histology findings (e.g. in the tumor margin), should improve basal SNR with higher magnetic field strengths, more sensitive RF coils, and advanced DMI pulse sequences [55]).”

      (2) To achieve their goal of high-level metabolic characterization the authors set out to measure the deuterium labeling kinetics following an intravenous bolus of deuterated glucose, instead of the easier measurement of steady-state after the labeling has leveled off. These dynamic data are then used as input for a mathematical model of glucose metabolism to derive fluxes in absolute units. While this is conceptually a well-accepted approach there are concerns about the validity of the included assumptions in the metabolic model, and some of the model's equations and/or defining of fluxes, that seem different than those used by others.

      PR-R2.C2: These concerns about the metabolic model were also raised in more detail in the section Recommendations For The Authors, where they are addressed more extensively – please refer to RA-R2.C3 (glucose infusion protocol) and RA-R2.C4 (equations). In brief, we explain that the total volume injected (100 µL/25 g animal) is standard for i.v. administration in mice, and clarify this better in the manuscript (page 24, line 23). We also explain the differences between our kinetic model and the original one reported by Kreis et al. (Radiology 2020), who quantified glycolysis kinetics in a subcutaneous mouse model of lymphoma, which is exclusively glycolytic, and thus estimated the maximum glucose flux rate from the lactate synthesis rate (Vmax = Vlac). Instead, we extended this model to account for glucose flux rates for lactate synthesis (Vlac) and also for glutamate-glutamine synthesis (Vglx) in mouse glioblastoma, where Vmax = Vlac + Vglx, also acknowledging its simplistic approach in the Discussion (page 20, lines 22-24: “(…) metabolic fluxes [estimations] through glycolysis and mitochondrial oxidation (…) could potentially benefit from an improved kinetic model simultaneously assessing cerebral glucose and oxygen metabolism, as recently demonstrated in the rat brain with a combination of 2H and 17O MR spectroscopy [62] (…)”).

      Reviewer #3 (Public Review):

      Summary:

      Simoes et al enhanced dynamic glucose-enhanced (DGE) deuterium spectroscopy with Deuterium Metabolic Imaging (DMI) to characterize the kinetics of glucose conversion in two murine models of glioblastoma (GBM). The authors combined spectroscopic imaging and noise attenuation with histological analysis and showcased the efficacy of metabolic markers determined from DGE DMI to correlate with histological features of the tumors. This approach is also potent to differentiate the two models from GL261 and CT2A.

      Strengths:

      The primary strength of this study is to highlight the significance of DGE DMI in interrogating the metabolic flux from glucose. The authors focused on glutamine/glutamate and lactate. They attempted to correlate the imaging findings with in-depth histological analysis to depict the link between metabolic features and pathological characteristics such as cell density, infiltration, and distant migration.

      Weaknesses:

      (1) A lack of genetic interrogation is a major weakness of this study. It was unclear what underlying genetic/epigenetic aberrations in GL261 and CT2A account for the metabolic difference observed with DGE DMI. A correlative metabolic confirmation using mass spectrometry of the two tumor specimens would give insight into the observed imaging findings.

      PR-R3.C1: We thank the Reviewer for the helpful comments, which we break down below.

      PR-R3.C1.1 - genetic interrogation/manipulation: While we did not have access to conditional models for key enzymes of each metabolic pathway, for their genetic manipulation, we did assess the mitochondrial function in each cell line, showing a significantly higher respiration buffer capacity and more efficient metabolic plasticity between glycolysis and mitochondrial oxidation in GL261 cells compared to CT2A (Simoes et al. NIMG:Clin 2022). This could drive e.g. more active recycling of lactate through mitochondrial metabolism in GL261 cells, aligned with our observations of increased glucose-derived lactate consumption rate in those tumors compared to CT2A. We have now included this in the discussion (page 17, lines 8-12): “our results suggest increased lactate consumption rate (active recycling) in GL261 tumors with higher vascular permeability, e.g. as a metabolic substrate for oxidative metabolism [44] promoting GBM cell survival and invasion [45], aligned with the higher respiration buffer capacity and more efficient metabolic plasticity of GL261 cells than CT2A [31].”

      PR-R3.C1.2 - correlation with post-mortem metabolic assessment: implementing this validation step would require additional equipment that is also not accessible to us: a focalized irradiator, to instantly halt all metabolic reactions during animal sacrifice. We do believe that DGE-DMI could guide further studies of such nature, aimed at validating the spatio-temporal dynamics of regional metabolite concentrations in mouse brain tumors. Thus, the importance of end-point validation is now stressed more clearly in the manuscript (page 20, lines 13-16): “(…) mapping pathway fluxes alongside de novo concentrations (…) may be determinant for the longitudinal assessment of GBM progression, with end-point validation (…)”.

      These concerns and recommendations were also raised by the Reviewer in the Recommendations to Authors section, where we address them more extensively – please see RA-R1.C3 and RA-R1.C2, respectively.

      (2) A better depiction of the imaging features and tumor heterogeneity would support the authors' multimodal attempt.

      PR-R3.C2: We agree with the Reviewer that including more imaging features would improve the non-invasive characterization of each tumor. Due to the RF coil design and time constraints, we did not acquire additional data, such as diffusion MRI to assess tissue microstructure. Instead, our multi-modal protocol included two dynamic MRI studies on each animal, for multiparametric assessment of tumor volume, metabolism and vascular permeability, using 1H-MRI, 2H-spectroscopy during 2H-labelled glucose injection, and 1H-imaging during Gd-DOTA injection, respectively. Rather than aiming at tumor radiomics, we focused on the dynamic assessment of tumor metabolic turnover with heteronuclear spectroscopy, which is challenging per se and particularly in mouse brain tumors, given their very small size. For such multi-modal studies we used a previously developed dual-tuned RF coil: the deuterium coil (2H) was positioned on the mouse head, for optimal SNR; whereas the proton coil (1H) had suboptimal performance compared to a conventional single-tuned coil, and was used only for basic localization and adjustments, reference imaging and tumor volumetry (T2-weighted), and DCE-T1 MRI (T1-weighted). The latter was analyzed pixel-wise to assess spatial correlations between tumor permeability and metabolic metrics, as shown in Fig S3. The limited T2w MRI data collected were analyzed only for tumor volume assessment; no additional imaging features were extracted (e.g. kurtosis/skewness), since such assessment did not show any differences between the two tumor cohorts in our previous study (Simoes et al NIMG:Clin 2022).

      (3) Integration of the various cell types in the tumor microenvironment, as allowed with the resolution of DGE DMI, will explain the observed difference between GL261 and CT2A. Is there a higher percentage of infiltrative "other cells" observed in GL261 tumor?

      PR-R3.C3: While DGE-DMI resolution is far larger than brain and brain tumor cell sizes, we now performed additional analysis to assess the percentage of microglia/macrophages in both cohorts. The results are now included in the manuscript, namely Fig. 2B, as previously explained in PR-R1.1. Interestingly though, we observed a lower percentage of infiltrative "other cells" in GL261 tumors compared to CT2A, which we discuss in the manuscript: pages 19-20 (lines 20-24 and 1-4), “Finally, our results are indicative of higher microglia/macrophage infiltration in CT2A than GL261 tumors, which is inconsistent with another study reporting higher immunogenicity of GL261 tumors than CT2A for microglia and macrophage populations [56]. Such discrepancy could be related to methodologic differences between the two studies, namely the endpoint-guided assessment of tumor growth (bioluminescence vs MRI, more precise volumetric estimations) and the stage when tumors were studied (GL261 at 23-28 vs 16-18 days post-injection, i.e. less time for immune cell to infiltration in our case), presence/absence of a cell transformation step (GFP-Fluc engineered vs we used original cell lines), or perhaps media conditioning effects during cell culture due to the different formulations used (DMEM vs RPMI).”

      (4) This underlying technology with DGE DMI is capable of identifying more heterogeneous GBM tumors. A validation cohort of additional in vivo models will offer additional support to the potential clinical impact of this study.

      PR-R3.C4: We agree with the Reviewer that applying DGE-DMI to more clinically-relevant models of human brain tumors will enhance its translational impact to patients, as also suggested by Reviewer 1 and addressed in PR-R1.C3. We also believe that the feasibility and potential of DGE-DMI for application to different glioblastoma models or patients, with GBM or any other primary or secondary brain tumors, is clearly demonstrated in our work, using two reliable and well-described immunocompetent models of GBM. In any case, we have now modified the manuscript to better acknowledge this point: page 18 (lines 14-16), “(…) patient-derived xenografts and de novo models would be more suited to recapitulate human GBM heterogeneity and infiltration features (…)”.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors utilize longitudinal MRI to track tumor volumes but perform DMI at endpoint with late-stage tumors. Their previous publication applied metabolic imaging in tumors before the presence of necrosis. It would be valuable to perform longitudinal DMI to examine the evolution of glucose flux metabolic profile over time in the same tumor.

      RA-R1.C1: We thank the Reviewer for the very useful comments to our manuscript. We agree – in this work, we aimed at “extending” our previous DGE-2H single-voxel methodology to multivoxel (DMI), thoroughly demonstrating (1) its in vivo application to the same immunocompetent models of glioblastoma and (2) the ability to depict their phenotypic differences, and therefore (3) the potential for the metabolic characterization of more advanced models of GBM and/or their progression stages. We believe these objectives were achieved. Our results indeed open several possibilities, from longitudinal assessment of the spatio-temporal metabolic changes during GBM progression (and treatment-response) to its application to other models recapitulating more closely the human disease. Now that we have comprehensively demonstrated a protocol for DGE-DMI acquisition, processing and analysis in mouse GBM (a very challenging methodology), and demonstrate it in different mouse GBM cell lines, new studies can be designed to tackle more specific questions, like the one suggested here by the Reviewer. We have modified the manuscript to make this point clearer: page 20 (lines 15-17), “This may be determinant for the longitudinal assessment of GBM progression, with end-point validation; and/or treatment-response, to help selecting among new therapeutic modalities targeting GBM metabolism (…)”; page 21 (lines 5-8), “(…) we report a DGE-DMI method for quantitative mapping of glycolysis and mitochondrial oxidation fluxes in mouse GBM, highlighting its importance for metabolic characterization and potential for in vivo GBM phenotyping in different models and progression stages.”.

      (2) The authors demonstrate a promising correlation between metabolic phenotypes in vivo and key histopathological features of GBM at the endpoint. Directly assessing metabolites involved in glucose fluxes on endpoint tumor samples would strengthen this correlation.

      RA-R1.C2: While we acknowledge the Reviewer’s point, there were two main limitations to implementing such validation step in our protocol: 

      (1) Since we performed dynamic experiments, at the end of each study most 2H-glucose-derived metabolites were already below their maximum concentration (or barely detectable in some cases), as depicted by the respective kinetic curves (Fig 1-D and Fig S7), and thus no longer detectable in the tissues. Importantly, DGE-DMI could guide further studies towards selecting the ideal time-point for validating different metabolite concentrations in specific brain regions.

      (2) Such validation would require sacrificing the animals with a focalized irradiator (which we did not have), to instantly halt all metabolic reactions. Only then we could collect and analyze the metabolic profile of specific brain regions, either by in vitro MS or high-resolution NMR following extraction, or by ex vivo HRMAS analysis of the intact tissue, as reported previously by some of the authors for validation of glucose accumulation in different regions of mouse GL261 tumors (Simões et al. NMRB 2010: https://doi.org/10.1002/nbm.1421). Importantly, even if we did have access to a focalized irradiator, such protocols for metabolic characterization would compromise tissue integrity and thus the histopathologic analysis performed in this study. 

      We do agree with the importance of end-point validation and therefore stress it more clearly in the revised manuscript (page 20, lines 14-16): “(…) mapping pathway fluxes alongside de novo concentrations (…) may be determinant for the longitudinal assessment of GBM progression, with end-point validation (…)”.

      (3) Genetic manipulation of key players in the metabolic pathways studied in this paper (glycolysis and mitochondrial oxidation) would offer a strong validation for the sensitivity of DGE-DMI in accurately distinguishing metabolites (lactate, glutamate-glutamine) and their dynamics.

      RA-R1.C3: Thank you for this comment, we agree. This would be particularly relevant in the context of treatment-response monitoring. While such models were not available to us (conditional spatio-temporal manipulation of metabolic pathway fluxes), we believe our results can still demonstrate this point: We previously used in vivo DGE 2H-MRS to show evidence of decreased glucose oxidation fraction (Vglx/Vlac) in GL261 tumors under acute hypoxia (FiO2=12 %) compared to regular anesthesia conditions (FiO2=31 %), consistent with the inhibition of OXPHOS due to lower oxygen tensions (Simoes et al. NIMG:Clin 2022). In the present work, enhanced glycolysis in tumors vs peritumoral brain regions was clearly observed in all the animals studied, from both cohorts, as shown in Fig 1-B and Fig S4. Moreover, the spectral background (before glucose injection) is limited to a single peak in all the voxels: basal DHO, used as internal reference for spatio-temporal quantification of glucose, glutamine-glutamate, and lactate, all de novo and extensively characterized in healthy and glioma-bearing rodent brain (Lu et al. JCBFM 2018; Zhang et al. NMR Biomed 2024; de Feyter et al. SciAdv 2018; Batsios et al ClinCancerRes 2022; Simoes et al. NIMG:Clin 2022) and other rodent tumors (Kreis et al. Radiology 2020; Montrazi et al. SciRep 2023). We have modified the manuscript to clarify this point (page 18, lines 14-17): “(…) patient-derived xenografts and de novo models would be more suited to recapitulate human GBM heterogeneity and infiltration features, and genetic manipulation of glycolysis and mitochondrial oxidation pathways could be relevant to ascertain DGE-DMI sensitivity for their quantification (…)”.

      (4) Please explain in more detail why DGE-DMI can distinguish different glucose metabolites and how accurate it is.

      RA-R1.C4: DGE-DMI is the imaging extension of our previous work based on single-voxel deuterium spectroscopy, therefore relying on the same fundamental technique and analysis pipeline but moving from a temporal analysis to a spatio-temporal analysis for each metabolite, and thus dealing with more data. Unlike conventional proton spectroscopy (1H), only metabolites carrying the deuterium label (2H) will be detected in this case, including the natural abundance DHO (~0.03%), the deuterated glucose injected and its metabolic derivatives, namely deuterated lactate and deuterated glutamate-glutamine. Due to their different molecular structures, the deuterium atoms will resonate at specific frequencies (chemical shifts, ppm) during a 2H magnetic resonance spectroscopy experiment, as illustrated in Fig 1-A. The method is fully reproducible and accurate, and has been extensively reported in the literature from high-resolution NMR spectroscopy to in vivo spectroscopic imaging of different nuclei, such as proton (1H), deuterium (2H), carbon (13C), phosphorous (31P), and fluorine (19F). Since the fundamental principles of DMI and its application to brain tumors have been very well described in the flagship article by de Feyter et al., we have now highlighted this in the manuscript: page 4 (lines 4-7), “Deuterium metabolic imaging (DMI) has been (…) demonstrated in GBM patients, with an extensive rationale of the technique and its clinical translation [18], and more recently in mouse models of patient-derived GBM subtypes (…)”.

      (5) When mapping glycolysis and mitochondrial oxidation fluxes, add a control method to compare the reliability of DGE-DMI.

      RA-R1.C5: This concern (“lack of a control method”) was also raised by the Reviewer in the Public Reviews section, where we already addressed it (PR-R1.2).

      (6) If using peritumoral glutamate-glutamine recycling as a marker of invasion capacity, what would be the correct rate of the presence of secondary brain lesions?

      RA-R1.C6: While our results suggest the potential of peritumoral glutamate-glutamine recycling as a marker for the presence of secondary brain lesions, this remains to be ascertained with higher sensitivity for glutamate-glutamine detection. Therefore, we cannot make further conclusions in this regard.  

      To make this point clear, we state in different sections of the discussion: page 19 (lines 1-2), “(…) recycling of the glutamate-glutamine pool may reflect a phenotype associated with secondary brain lesions.”; and page 19 (lines 6-10), “Therefore, further DGE-DMI preclinical studies aimed at detecting and quantifying relatively weak signals, such as tumor glutamate-glutamine, and/or increase spatial resolution to correlate those metabolic results with histology findings (e.g. in the tumor margin), should improve basal SNR with higher magnetic field strengths, more sensitive RF coils, and advanced DMI pulse sequences [55]).”

      (7) There are duplicated Vlac in Figure S3 B.

      RA-R1.C7: This was a typo that has now been corrected. Thank you.

      (8) Figure 4, it would be better to add a metabolic map of a tumor without secondary brain lesions to compare.

      RA-R1.C8: We fully agree and have modified Fig 4 accordingly, together with its legend.

      Particularly, we have included tumors C4 (without secondary lesions) vs G4 (with) for this “comparison”, since details of their histology, including the secondary lesions, are provided in Fig 2.

      (9) Full name of SNR and FID should be listed when first mentioned.

      RA-R1.C9: Agreed and modified accordingly, on pages 6-7 (lines 22-1), “signal-to-noise-ratio (SNR)”, and page 19 (lines 5-6), “free induction decay (FID)”.

      (10) Page 2, Line 14: (59±7 mm^3) is not needed in the abstract.

      RA-R1.C10: As requested we have removed this specification from the Abstract.

      (11) Page 4, Line 22: Closing out the Introduction section with a statement on broader implications of the present work would enhance the effectiveness of the section.

      RA-R1.C11: We have added an additional sentence in this regard – pages 4-5 (lines 24-2): “Since DMI is already performed in humans, including glioblastoma patients [18], DGE-DMI could be relevant to improve the metabolic mapping of the disease.”

      (12) Define all acronyms to facilitate comprehension. For example, principal component analysis (PCA) and signal-to-noise ratio (SNR).

      RA-R1.C12: Thank you for the comment. We have now defined all the acronyms when first used, including PCA (page 4 (line 11), “Marchenko-Pastur Principal Component Analysis (MP-PCA)”) and SNR (pages 6-7 (lines 22-1), as indicated above in comment RA-R1.C9).

      (13) Some elements within the figures have lower resolution, specifically bar graphs.

      RA-R1.C13: We apologize for this oversight. All the Figures have been revised accordingly, to correct this problem. Thank you.

      (14) Page 13, Line 8: "underly" should be spelled "underlie."

      RA-R1.C14: The typo has been corrected on page 15 (line 8), thank you.

      (15) Page 14, Line 13: "better vascular permeability" would be more effectively phrased as "increased vascular permeability."

      RA-R1.C15: This has also been corrected on page 16 (line 14), thank you.

      Reviewer #2 (Recommendations For The Authors):

      (1) I strongly suggest adding a scale bar in the histology figures.

      RA-R2.C1: Thank you for spotting our oversight! This has now been added as requested to Fig 2.

      (2) The 2H MRSI data were acquired at a nominal resolution of 2.25 x 2.27 x 2.25 mm^3, resulting in a nominal voxel volume of 11.5 uL. (In reality, this is larger due to the point spread function leading to signal bleeding from adjacent voxels.) If we estimate the volume of the tumor rim, as indicated by the histology slides, as (generously) ~ 50 um in width, 3.2 mm long (the diagonal of a 2.25 x 2.25 mm^2 square), and 2.27 mm high, we get a volume of 0.36 uL. Therefore the native spatial resolution of the 2H MRSI is at least 30 times larger than the volume occupied by the tumor rim/microenvironment. Normal tissue and tumor tissue will contribute the majority of the metabolic signal of that voxel. I feel an opposite approach could have been pursued: find out the spatial resolution needed to characterize the tumor rim based on the histology, then use a de-noising method to bring the SNR of those data to an acceptable level. (This is just a thought experiment that assumes de-noising actually works to improve quantification for MRS data instead of merely cosmetically improving the data; so far the jury is still out on that, in my view.)

      RA-R2.C2 – We thank the Reviewer for the detailed analysis and reply below to each point.

      RA-R2.C2.1 – spatial resolution and tumor rim: Our nominal voxel volume was indeed 11.5 uL, defined in-plane by the PSF which explains signal bleeding effects, as in any other imaging modality. The DMI raw data were Fourier interpolated before reconstruction, rendering a final in-plane resolution of 0.56 mm (0.72 uL voxel volume). The tumor rim (margin) analyzed was roughly 0.1 mm wide (please note, not 0.05 mm), as explained in the methods section (page 28, line 16) and now more clearly defined with the scale bars in Fig 2. According to the Reviewer’s analysis, this would correspond to 0.1*3.2*2.27 = 0.73 uL, which we approximated with 1 voxel (0.72 uL), as displayed in Fig 3-A. Importantly, it has long been demonstrated that Fourier interpolation provides a better approximation to the ground truth compared to the nominal resolution, and even to more standard image interpolation performed after FT - see for instance Vikhoff-Baaz B et al. (MRI 2001. 19: 1227-1234), now cited in the Methods section: page 24, line 24 ([69]). While we do agree that both normal brain and tumor should contribute significantly to the metabolic signal in this relatively small region, we rely on extensive literature to maintain that despite its smoothing effect, the display resolution provides a better approximation to the ground truth and is therefore more suited than the nominal resolution for ROI analysis in this region. Still, we acknowledge this potential limitation in the Discussion: page 19, lines 6-10: “Therefore, further DGE-DMI preclinical studies aimed at detecting and quantifying relatively weak signals, such as tumor glutamate-glutamine, and/or increase the nominal spatial resolution to better correlate those metabolic results with histology findings (e.g. in the tumor margin), should improve basal SNR with higher magnetic field strengths, more sensitive RF coils, and advanced DMI pulse sequences [55]).”
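
      The volumes quoted in this exchange can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and the 4x in-plane interpolation factor is an assumption inferred from 2.25 mm / 0.56 mm rather than a value stated in the reply.

```python
import math


def box_volume_ul(x_mm, y_mm, z_mm):
    """Volume of a rectangular box in microlitres (1 mm^3 equals 1 uL)."""
    return x_mm * y_mm * z_mm


IN_PLANE_MM = 2.25    # nominal in-plane voxel size, from the Reviewer's comment
SLICE_MM = 2.27       # voxel height, from the Reviewer's comment
INTERP_FACTOR = 4     # ASSUMPTION: inferred from 2.25 / 0.56, not stated in the reply

# Nominal voxel volume (quoted as 11.5 uL)
nominal_voxel_ul = box_volume_ul(IN_PLANE_MM, IN_PLANE_MM, SLICE_MM)

# Voxel volume after Fourier interpolation (quoted as 0.72 uL)
display_mm = IN_PLANE_MM / INTERP_FACTOR
display_voxel_ul = box_volume_ul(display_mm, display_mm, SLICE_MM)

# The rim length is the diagonal of the nominal in-plane square (~3.18 mm),
# which both parties round to 3.2 mm.
rim_length_mm = round(math.hypot(IN_PLANE_MM, IN_PLANE_MM), 1)

# Rim volume for the Reviewer's 0.05 mm width and the authors' 0.1 mm width
rim_reviewer_ul = box_volume_ul(0.05, rim_length_mm, SLICE_MM)
rim_authors_ul = box_volume_ul(0.10, rim_length_mm, SLICE_MM)

print(f"nominal voxel:      {nominal_voxel_ul:.2f} uL")
print(f"interpolated voxel: {display_voxel_ul:.2f} uL")
print(f"rim (0.05 mm wide): {rim_reviewer_ul:.2f} uL")
print(f"rim (0.10 mm wide): {rim_authors_ul:.2f} uL")
print(f"nominal voxel / 0.05 mm rim: {nominal_voxel_ul / rim_reviewer_ul:.1f}x")
```

      Under these assumptions, the 0.1 mm rim and one interpolated voxel have nearly the same volume, which is the approximation the reply relies on, while the nominal voxel remains roughly 30 times larger than the 0.05 mm rim estimated by the Reviewer.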

      RA-R2.C2.2 – metabolic and histologic features at the tumor rim: Furthermore, ROI analysis of lactate metabolic maps in tumor and peritumoral rim areas closely reflected regional differences in cellularity and cell density, and in immune cell infiltration, between the 2 tumor cohorts and across pooled cohorts, as explained in the Public Review section (PR-R1.1) and now reported in the manuscript: page 12 (lines 6-16), “GL261 tumors accumulated significantly less lactate in the core (1.60±0.25 vs 2.91±0.33 mM: -45%, p=0.013) and peritumor margin regions (0.94±0.09 vs 1.46±0.17 mM: -36%, p=0.025) than CT2A – Fig 3 A-B, Table S1. Consistently, tumor lactate accumulation correlated with tumor cellularity in pooled cohorts (R=0.74, p=0.014). Then, lower tumor lactate levels were associated with higher lactate elimination rate, k<sub>lac</sub> (0.11±0.1 vs 0.06±0.01 mM/min: +94%, p=0.006) – Fig 3B – which in turn correlated inversely with peritumoral margin infiltration of microglia/macrophages in pooled cohorts (R=-0.73, p=0.027) - Fig 3-C. Further analysis of Tumor/P-Margin metabolic ratios (Table S3) revealed: (i) +38% glucose (p=0.002) and -17% lactate (p=0.038) concentrations, and +55% higher lactate consumption rate (p=0.040) in the GL261 cohort; and (ii) lactate ratios across those regions reflected the respective cell density ratios in pooled cohorts (R=0.77, p=0.010) – Fig 3-C”; page 17 (lines 1-8), “Tumor vs peritumor border analyses further suggest that lactate metabolism reflects regional histologic differences: lactate accumulation mirrors cell density gradients between and across the two cohorts; whereas lactate consumption/elimination rate coarsely reflects cohort differences in cell proliferation, and inversely correlates with peritumoral infiltration by microglia/macrophages across both cohorts. This is consistent with GL261’s lower cell density and cohesiveness, more disrupted stromal-vascular phenotypes, and infiltrative growth pattern at the peritumor margin area, where less immune cell infiltration is detected and relatively lower cell division is expected [43]”.

      RA-R2.C2.3 – alternative method: Regarding the alternative method suggested by the Reviewer, we have tested a similar approach in another region (tumor) and it did not work, as explained in the Discussion section (page 19, lines 5-6) and Fig S11. Essentially, Tensor PCA performance improves with the number of voxels and therefore limiting it to a subregion hinders the results. In any case, if we understand correctly, the Reviewer suggests a method to further interpolate our data in the spatial dimension, which would deviate even more from the original nominal resolution and thus sounds counter-intuitive based on the Reviewer’s initial comment about the latter. More importantly, we would like to stress the importance of spectral denoising in this work, questioned by the Reviewer. There are several methods reported in the literature, most of them demonstrated only for MRI. We previously demonstrated how MPPCA denoising objectively improved the quantification of DCE-2H MRS in mouse glioma by significantly reducing the CRLBs: 19% improved fitting precision. In the present study, Tensor PCA denoising was applied to DGE-DMI, which led to an objective 63% increase in pixel detection based on the quality criteria defined, unambiguously reflecting the improved quantification performance due to higher spectral quality.

      (3) Concerns re. the metabolic model: 2g/kg of glucose infused over 120 minutes already leads to hyperglycemia in plasma. Here this same amount is infused over 30 seconds... such a supraphysiological dose could lead to changes in metabolite pool sizes -which are assumed to not change since they are not measured, and also fractional enrichment which is not measured at all. Such assumptions seem incompatible with the used infusion protocol.

      RA-R2.C3: We understand the concern. However, the protocol was reproduced exactly as originally reported by Kreis et al (Radiology 2020), who performed the measurements in mice and measured the fraction of deuterium enrichment (f=0.6). Since we also worked with mice, we adopted the same value for our model. The total volume injected was 100uL/25g animal, and adjusted for animal weight (96uL/24g average – Table S1), as we reported before (Simões et al. NIMG:Clin 2022), which is standard for i.v. bolus administration in mice as it corresponds to ~10% of the total blood volume. This volume is therefore easily diluted and not expected to introduce significant changes in the metabolic pool sizes. Continuous infusion protocols on the other hand will administer higher volumes, easily approaching the mL range when performed over periods as long as 120 min. This would indeed be incompatible with our bolus infusion protocol. We have now clarified this in the manuscript – page 24 (line 23): “i.v. bolus of 6,6′-<sup>2</sup>H<sub>2</sub>-glucose (2 mg/g, 4 µL/g injected over 30 s (…)”.
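
      As a quick consistency check, the weight-adjusted bolus can be computed directly from the per-gram values quoted from the manuscript (2 mg/g and 4 µL/g). This is a minimal sketch; the function name and dictionary keys are hypothetical.

```python
DOSE_MG_PER_G = 2.0     # glucose dose per gram of body weight, as quoted in the reply
VOLUME_UL_PER_G = 4.0   # injected volume per gram of body weight, as quoted in the reply


def bolus_for_weight(weight_g):
    """Return the glucose mass and injection volume for one animal."""
    return {
        "glucose_mg": DOSE_MG_PER_G * weight_g,
        "volume_ul": VOLUME_UL_PER_G * weight_g,
    }


# Concentration implied by the two per-gram values (mg per uL)
implied_conc_mg_per_ul = DOSE_MG_PER_G / VOLUME_UL_PER_G

for weight in (24.0, 25.0):
    b = bolus_for_weight(weight)
    print(f"{weight:.0f} g animal: {b['glucose_mg']:.0f} mg glucose in {b['volume_ul']:.0f} uL")
```

      The 24 g and 25 g cases give 96 µL and 100 µL, matching the volumes stated in the reply.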

      (4) Vmax = Vlac + Vglx. This is incorrect: Vmax = Vlac.

      RA-R2.C4: Thank you for raising this concern. As indicated in RA-R2.C3, our model (Simões et al. NIMG:Clin 2022) was adapted from the original model proposed by Kreis et al. (Radiology 2020), where the authors quantified glycolysis kinetics in a subcutaneous mouse model of lymphoma, which is exclusively glycolytic, and thus estimated the maximum glucose flux rate from the lactate synthesis rate (Vmax = Vlac). However, we extended this model to account for glucose flux rates for lactate synthesis (Vlac) and also for glutamate-glutamine synthesis (Vglx), where Vmax = Vlac + Vglx, as explained in our 2022 paper. While we acknowledge the rather simplistic approach of our kinetic model compared to others - reported by 13C-MRS under continuous glucose infusion in healthy mouse brain (Lai et al. JCBFM 2018) and mouse glioma (Lai et al. IJC 2018) – and state this in the Discussion (page 20, lines 22-24: “(…) metabolic fluxes [estimations] through glycolysis and mitochondrial oxidation (…) could potentially benefit from an improved kinetic model simultaneously assessing cerebral glucose and oxygen metabolism, as recently demonstrated in the rat brain with a combination of 2H and 17O MR spectroscopy [62] (…)”), our Vlac and Vglx results are consistent with our previous DGE 2H-MRS findings in the same glioma models, and well aligned with the literature, as discussed in PR-R1.C2.1.
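
      The two formulations contrasted above can be written side by side. This is only a notational sketch of the flux bookkeeping stated in the reply, not the full kinetic model, which is described in the cited papers:

```latex
\begin{align}
  V_{\max} &= V_{lac} + V_{glx}
    && \text{(lactate plus glutamate-glutamine synthesis)} \\
  V_{\max} &= V_{lac}
    && \text{(exclusively glycolytic tissue, } V_{glx} \approx 0 \text{)} \\
  \text{glucose oxidation fraction} &= \frac{V_{glx}}{V_{lac}}
\end{align}
```

      Written this way, the Reviewer's expression is the special case of the extended model in which no glucose-derived label reaches the glutamate-glutamine pool.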

      (5) Some other items that need attention: 0.03 % is used as the value for the natural abundance of DHO. The natural abundance of 2H in water can vary somewhat regionally, but I have never seen this value reported. The highest seen is 0.015%.

      RA-R2.C5: The Reviewer is referring to the natural abundance of deuterium in hydrogen: 1 in ~6400 is D, i.e. 0.015 %. Since each water molecule has 2 hydrogen atoms, ~1 in 3200 water molecules is DHO, i.e. 0.03%. Indeed, the latter can vary slightly with the geographical region, as nicely reported by Ge et al (Front Oncol 2022), who showed a 16.35 mM natural abundance of DHO in the local tap water of St. Louis, MO, USA (16.35/55500 ≈ 1/3400 ≈ 0.03%).
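
      The factor of two between the abundance of deuterium in hydrogen and the abundance of DHO in water can be checked numerically. The sketch below assumes the 1-in-6400 deuterium fraction and the 55,500 mM water concentration used in the reply; the variable names are hypothetical.

```python
D_FRACTION = 1 / 6400       # fraction of hydrogen atoms that are deuterium (from the reply)
WATER_MM = 55_500           # molar concentration of water in mM (value used in the reply)
TAP_WATER_DHO_MM = 16.35    # DHO concentration attributed to Ge et al. in the reply

# Probability that exactly one of the two hydrogens in a water molecule is deuterium
dho_fraction = 2 * D_FRACTION * (1 - D_FRACTION)

deuterium_percent = 100 * D_FRACTION
dho_percent = 100 * dho_fraction
dho_mm = WATER_MM * dho_fraction
tap_water_percent = 100 * TAP_WATER_DHO_MM / WATER_MM

print(f"D in hydrogen:            {deuterium_percent:.4f} %")
print(f"DHO in water:             {dho_percent:.4f} %")
print(f"Expected DHO:             {dho_mm:.1f} mM")
print(f"Tap water value as a %:   {tap_water_percent:.4f} %")
```

      Both the theoretical estimate and the measured tap-water value round to about 0.03%, twice the 0.015% figure that applies to hydrogen atoms rather than to water molecules.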

      (6) Based on the color scale bar in Figure 1, the HDO concentration appears to go as high as 30 mM. Even if this number is off because of the previous concern (HDO), it appears to be a doubling of the HDO concentration. Is this real? What would be the origin of that? No study using [6,6'-2H2]-glucose that I'm aware of has reported such an increase in HDO.

      RA-R2.C6: As explained before (RA-R2.C3 and RA-R2.C4), we based our protocol and model on Kreis et al (Radiology 2020), who reported ~10 mM basal DHO levels rising to ~27 mM after 90 min, which is well within the ~30 mM range we report over a longer period (132 min).

      Similar DHO levels were mapped with DGE-DMI in mouse pancreatic tumors (Montrazi et al. SciRep 2023).

      (7) "...the central spectral matrix region selected (to discard noise regions outside the brain, as well as the olfactory bulb and cerebellum)". This reads as if k-space points correspond one-to-one with imaging pixels, which is not the case.

      RA-R2.C7: We rephrased the sentence to avoid such potential misinterpretation, specifically: page 25 (lines 19-21): “Each dataset was averaged to 12 min temporal resolution and the noise regions outside the brain, as well as the olfactory bulb and cerebellum, were discarded (…)”.

      (8) The use of the term "glutamate-glutamine recycling" is not really appropriate since these metabolites are not individually detected with 2H MRS, which is a requirement to measure this neurotransmitter cycling.

      RA-R2.C8: Thank you for this comment. To avoid this misinterpretation, we have now rephrased "glutamate-glutamine recycling" to “recycling of the glutamate-glutamine pool” in all the sentences, namely: page 2 (lines 14-15); page 15 (line 8); page 19 (line 1); page 21 (line 10).

      Reviewer #3 (Recommendations For The Authors):

      (1) One major issue is the lack of underlying genetics, and therefore it is hard for readers to put the observed difference between GL261 and CT2A into context. The authors might consider perturbing the genetic and regulatory pathways on glycolysis and glutamine metabolism, repeating DGE DMI measure, in order to enhance the robustness of their findings.

      RA-R3.C1: We thank the reviewer for the helpful revision and comments. The point made here is aligned with Reviewer 1’s, addressed in RA-R1.C3; and also with our previous reply to the Reviewer, PR-R3.C1. Thus, we agree that conditional spatio-temporal manipulation of metabolic pathway fluxes would be relevant to further demonstrate the robustness of DGE-DMI, particularly for treatment-response monitoring. While such models were not available to us, our previous findings seem compelling enough to demonstrate this point. Specifically, we previously showed a significantly higher respiration buffer capacity and more efficient metabolic plasticity between glycolysis and mitochondrial oxidation in GL261 cells compared to CT2A (Simoes et al. NIMG:Clin 2022), which could enhance lactate recycling through mitochondrial metabolism in GL261 cells and thus explain our observations of increased glucose-derived lactate consumption rate in those tumors compared to CT2A. We have now included this in the discussion (page 17, lines 8-12): “our results suggest increased lactate consumption rate (active recycling) in GL261 tumors with higher vascular permeability, e.g. as a metabolic substrate for oxidative metabolism [44] promoting GBM cell survival and invasion [45], aligned with the higher respiration buffer capacity and more efficient metabolic plasticity of GL261 cells than CT2A [31].” Moreover, we previously showed evidence of DGE-2H MRS’ ability to detect decreased glucose oxidation fraction (Vglx/Vlac) in GL261 tumors under acute hypoxia (FiO2=12 %) compared to regular anesthesia conditions (FiO2=31 %), consistent with the inhibition of OXPHOS due to lower oxygen tensions (Simoes et al. NIMG:Clin 2022).

      (2) Is increased resolution possible for DGE DMI to correlate with histological findings?

      RA-R3.C2: The resolution achieved with DGE DMI, or any other MRI method, is limited by the signal-to-noise ratio (SNR), which in turn depends on the equipment (magnetic field strength and radiofrequency coil), the pulse sequence used, and post-processing steps such as noise removal. Thus, increased resolution could be achieved with higher magnetic field strengths, more sensitive RF coils, more advanced DMI pulse sequences, and improved methods for spectral denoising if available. We have used the best configuration available to us and discussed such limitations in the manuscript, including now a few modifications to address the Reviewer’s point more clearly – page 19 (lines 6-10): “Therefore, further DGE-DMI preclinical studies aimed at detecting and quantifying relatively weak signals, such as tumor glutamate-glutamine, and/or increase the nominal spatial resolution to better correlate those metabolic results with histology findings (e.g. in the tumor margin), should improve basal SNR with higher magnetic field strengths, more sensitive RF coils, and advanced DMI pulse sequences [55])”.

      (3) The authors might consider measuring the contribution of stromal cells and infiltrative immune cells in the analysis of DGE DMI data, to construct a more comprehensive picture of the microenvironment.

      RA-R3.C3: Thank you for this important point. We have now added Iba-1 stainings of infiltrating microglia/macrophages, for each tumor, as suggested by the Reviewer; stromal cells would be more difficult to detect and we did not have access to a validated staining method for doing so. Our new data and results - now included in Fig 2B – indicate significantly higher levels of Iba-1 positive cells in CT2A tumors compared to GL261, which are particularly noticeable in the periphery of CT2A tumors and consistent with their better-defined margins and lower infiltration in the brain parenchyma. This has been explained more extensively in PR-R1.1.

      (4) Additional GBM models with improved understanding of the genetic markers would serve as an optimal validation cohort to support the potential clinical translation.

      RA-R3.C4: We agree with the Reviewer and refer again to RA-R1.C3, where we already addressed this suggestion in detail and introduced modifications to the manuscript accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this report, the authors investigated the effects of reproductive secretions on sperm function in mice. The authors attempt to weave together an interesting mechanism whereby a testosterone-dependent shift in metabolic flux patterns in the seminal vesicle epithelium supports fatty acid synthesis, which they suggest is an essential component of seminal plasma that modulates sperm function by supporting linear motility patterns.

      Strengths:

      The topic is interesting and of general interest to the field. The study employs an impressive array of approaches to explore the relationship between mouse endocrine physiology and sperm function mediated by seminal components from various glandular secretions of the male reproductive tract.

      Thank you for your positive evaluation of our study's topic and approach. We are pleased that you found our investigation into the effects of reproductive secretions on sperm function to be of general interest to the field. We appreciate your positive feedback on the diverse methods we employed to explore this complex relationship.

      Weaknesses:

      Unfortunately, the proposed mechanism is not convincingly supported by the data. The experimental design and methodology need more rigor and detail, and the presence of numerous (uncontrolled) confounding variables in almost every experimental group significantly reduces confidence in the overall conclusions of the study.

      The methodological detail as described is insufficient to support replication of the work. Many of the statistical analyses are not appropriate for the apparent designs (e.g. t-tests without corrections for multiple comparisons). This is important because the notion that different seminal secretions will affect sperm function would likely have a different conclusion if the correct controls were selected for post hoc comparison. In addition, the HTF condition was not adjusted to match the protein concentrations of the secretion-containing media, likely resulting in viscosity differences as a major confounding factor on sperm motility patterns.

      We appreciate you highlighting concerns regarding our weak points and apologize for our unclear description. We revised the manuscript to be as rigorous and detailed as possible. In addition, some experimental designs were changed to simpler direct comparisons, and additional experiments were conducted (New Figure 1A-F, lines 103-113). We have made our explanations more consistent with the provided data, which includes further experimentation with additional controls and larger sample sizes to increase the reliability of the findings.

      To address the multiple testing problem, we applied a multiple testing correction, making the statistical tests more stringent (please see Statistical analysis in the Methods section and the Figure legends). With the revised statistical methods, the results did not require significant revision of our previous conclusions.

      Because the experiments on mixing extracts from the seminal vesicles were exploratory, we planned to avoid correcting for multiple comparisons. Repeating the t-test could lead to a Type I error in some results, so we apologize for not interpreting and annotating them. In the revised version, we removed the dataset for experiments on mixing extracts from the seminal vesicles and prostate, and we changed the description to refer to the clearer dataset mentioned above.

      The viscosity of the secretion-containing medium was measured with a viscometer, confirming that secretions did not significantly affect the viscosity of the solution. In addition, as the reviewer pointed out, we addressed the issue that the HTF condition could not be used as a control because of the heterogeneity in protein concentration (New Fig.1G, lines 110-111).

      Overall, we concluded that seminal vesicle secretion improves the linear motility of sperm more than prostate secretion.

      There is ambiguity in many of the measurements due to the lack of normalization (e.g. all Seahorse Analyzer measurements are unnormalized, making cell mass and uniformity a major confounder in these measurements). This would be less of a concern if basal respiration rates were consistently similar across conditions and there were sufficient independent samples, but this was not the case in most of the experiments.

      We apologize for the many ambiguities in the first manuscript. Cell culture experiments in the paper, including the flux analysis, were performed under conditions normalized or fixed by the number of viable cells. The description has also been revised to emphasize that the measurement values are standardized by cell count (lines 183-185, 189-190, 194-197). We emphasize that testosterone affects metabolism under the same number of viable cells (New Fig.4). This change in basal respiration is thought to be due to the shift in the metabolic pathway of seminal vesicle epithelial cells to a “non-normal TCA cycle” in which testosterone suppresses mitochondrial oxygen consumption, even under aerobic conditions (New Figs.3, 4, 5).

      The observation that oleic acid is physiologically relevant to sperm function is not strongly supported. The cellular uptake of 10-100uM labeled oleic acid is presumably due to the detergent effects of the oleic acid, and the authors only show functional data for nM concentrations of exogenous oleic acid. In addition, the effect sizes in the supporting data were not large enough to provide a high degree of confidence given the small sample sizes and ambiguity of the design regarding the number of biological and technical replicates in the extracellular flux analysis experiments.

      Thank you for your important critique. As you noted, the oleic acid concentration was too high to reflect physiological conditions. Therefore, we redesigned the oleic acid uptake study and repeated it. We also added an in vitro fertilization experiment corresponding to the functional data for exogenous oleic acid at nM concentrations (New Fig.7J,K, lines 274-282).

      For the flux data to determine the effect of oleic acid on sperm metabolism, we have indicated in the text that the data were obtained from eight male mice and two technical replicates. Pooled sperm isolated and cultured from multiple mice were placed in one well. The measurements were taken in three different wells, and each experiment was repeated four times. We did not use the extracellular flux analyzers XFe24 or XFe96. The measurements were also repeated because the XF HS Mini uses an 8-well plate (a maximum of only 6 samples per run, since 2 wells were used for calibration).

      Overall, the most confident conclusion of the study was that testosterone affects the distribution of metabolic fluxes in a cultured human seminal vesicle epithelial cell line, although the physiological relevance of this observation is not clear.

      Thank you for the comment that this finding is one of the more robust conclusions of our study. Below we set out our thoughts on the physiological relevance of the observations and our proposed revisions. In the mouse experiments, when the action of androgens was inhibited by flutamide, oleic acid was no longer synthesized in the seminal vesicles. The results of the experiments using cultured seminal vesicle epithelial cells showed that oleic acid was not being synthesized because of a change in metabolism dependent on testosterone. We have also added IVF data on the effects of oleic acid on sperm function (New Fig.7 and Supplementary Fig. 5, lines 274-282).

      As you can see, we have obtained consistent data in vitro and in vivo in mice. Our data also showed that the effects of testosterone on metabolic fluxes in vitro are similar in mouse and human seminal vesicle epithelial cells (New Fig.9). Therefore, it can be assumed that a decrease in testosterone levels causes abnormalities in the components of human semen. However, the conclusion was overstated in the original manuscript, so we changed the wording as follows: It could be assumed that a decrease in testosterone levels causes abnormalities in the components of human semen. (lines 422-423)

      In the introduction, the authors suggest that their analyses "reveal the pathways by which seminal vesicles synthesize seminal plasma, ensure sperm fertility, and provide new therapeutic and preventive strategies for male infertility." These conclusions need stronger or more complete data to support them.

      We appreciate your comments about the suggestion presented in the introduction.

      We also removed our conclusions regarding treatment and prevention strategies for male infertility (lines 96-98). We wanted to discuss our findings not conclusively but as future applications that could result from further research based on our initial findings.

      The last sentence of the introduction has been revised to tone down these assertions as follows: These analyses revealed that testosterone promotes the synthesis of oleic acid in seminal vesicle epithelial cells and its secretion into seminal plasma, and the oleic acid ensures the linear motility and fertilization ability of sperm.

      We are grateful for your suggestions, which have prompted us to refine our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Using a combination of in vivo studies with testosterone-inhibited and aged mice with lower testosterone levels, as well as isolated mouse and human seminal vesicle epithelial cells, the authors show that testosterone induces an increase in glucose uptake. They find that testosterone induces differential gene expression with a focus on metabolic enzymes. Specifically, they identify increased expression of enzymes that regulate cholesterol and fatty acid synthesis, leading to increased production of 18:1 oleic acid.

      Strength:

      Oleic acid is secreted by seminal vesicle epithelial cells and taken up by sperm, inducing an increase in mitochondrial respiration. The difference in sperm motility and in vivo fertilization in the presence of 18:1 oleic acid and the absence of testosterone is small but significant, suggesting that the authors have identified one of the fertilization-supporting factors in seminal plasma.

      Thank you for your positive comments regarding our work on the role of testosterone in regulating metabolic enzymes and the subsequent production of 18:1 oleic acid in seminal vesicle epithelial cells. We are pleased that the strength of our findings, particularly identifying oleic acid as a factor influencing sperm motility and mitochondrial respiration, has been recognized.

      Weaknesses:

      Further studies are required to investigate the effect of other seminal vesicle components on sperm capacitation to support the authors' conclusions. The authors' experiments on potential testosterone-induced changes in the rate of seminal vesicle epithelial cell glycolysis and oxphos, however, provide conflicting results, and a potential correlation with seminal vesicle epithelial cell proliferation should be confirmed by additional experiments.

      Thank you very much for your valuable criticism. Although we fully agree with your comment, conducting experiments to investigate the effects of other seminal vesicle components on the fertilization potential of sperm would be a great challenge for us. This is because it has taken us the last three years to identify oleic acid as a key factor in seminal plasma. We are considering a follow-up study to explore the effect of other seminal vesicle components on sperm capacitation. Therefore, we have revised the Introduction and conclusions to tone down our assertions.

      The revised manuscript also includes additional data showing a correlation between changes in metabolic flux and the proliferation of seminal vesicle epithelial cells using shRNA. As a result, it was shown that cell proliferation is promoted when mitochondrial oxidative phosphorylation is promoted by ACLY knockdown (New Fig.8D, lines 303-305). This shows a close relationship between the metabolic shift in seminal vesicle epithelial cells and cell proliferation. The revised manuscript includes an interpretation and discussion of these results (lines 369-379).

      We are grateful for your suggestions, which have prompted us to refine our manuscript.

      Reviewer #3 (Public Review):

      Summary:

      Male fertility depends on both sperm and seminal plasma, but the functional effect of seminal plasma on sperm has been relatively understudied. The authors investigate the testosterone-dependent synthesis of seminal plasma and identify oleic acid as a key factor in enhancing sperm fertility.

      Strengths:

      The evidence for changes in cell proliferation and metabolism of seminal vesicle epithelial cells and the identification of oleic acid as a key factor in seminal plasma is solid.

      Weaknesses:

      The evidence that oleic acids enhance sperm fertility in vivo needs more experimental support, as the main phenotypic effect in vitro provided by the authors remains simply as an increase in the linearity of sperm motility, which does not necessarily correlate with enhanced sperm fertility.

      We appreciate the positive feedback on the solid evidence of cell proliferation and metabolic changes in seminal vesicle epithelial cells and the identification of oleic acid as an important factor in seminal plasma. We fully agree with the assessment that the evidence linking oleic acid and increased sperm fertility in vivo needs further experimental support. To address this concern, we redesigned the oleic acid study to use more physiological conditions and repeated it, increased the number of artificial insemination replicates, and added in vitro fertilization assessments (New Fig.7 and supplementary Fig.5, lines 274-282). The revised manuscript describes these experiments and discusses the association between oleic acid and fertility.

      We are grateful for your suggestions, which have prompted us to refine our manuscript.

      Recommendations for the authors:

      Reviewing Editor's note:

      As you can see from the three reviewers' comments, the reviewers agree that this study can be potentially important if major concerns are adequately addressed. The major concern common to all the reviewers is the incomplete mechanistic link between the physiological androgen effect on the production of oleic acid and its effect on sperm function. More rigorous statistical analyses and consideration of other important capacitation parameters are needed to address these concerns and to improve the manuscript so that it supports the current conclusions.

      Thank you for summarizing the reviewers' feedback and for your insights regarding the major concerns raised. We appreciate the reviewers' understanding of the potential importance of our work and have addressed the issues highlighted to strengthen the manuscript. We believe these changes will improve the quality of the manuscript and provide a clearer and more complete understanding of the role of androgens and oleic acid in sperm function.

      Reviewer #1 (Recommendations For The Authors):

      The following comments are provided with the hope of aiding the authors in improving the alignment between the data and their interpretations.

      Thank you for allowing us to strengthen our manuscript with your valuable comments and queries. We have made our best efforts to reflect your feedback.

      Major Comments:

      (1) The methodological detail is not sufficient to reproduce the work. For example:

      a. Manufacturer protocols are referred to extensively. These protocols are neither curated nor version-controlled. Please consider describing the underlying components of the assays. If information is not available, please consider providing catalog numbers and lot numbers in the methods (if appropriate for journal style requirements).

      We appreciate this suggestion, which we believe is important to ensure reproducibility. We described the catalog number in our Methodology and included as much information as possible.

      b. Please consider describing the analyses in full, with consideration given to whether blinding was part of the design. For example, line 492: "apoptotic cells were quantified using ImageJ". How was this quantified? How were images pre-processed? Etc.

      Although blinding was not performed, experiments and analyses based on Fisher's three principles were conducted to eliminate bias (lines 549-552). To avoid false-positive or false-negative results, we now state clearly that tissue sections treated with DNase were used as positive controls, and tissue sections without TdT were used as negative controls for apoptosis. We have added detailed quantification methods (lines 544-546).

      c. Please consider providing versions of all acquisition and analysis software used.

      We have added software version information in Materials and Methods.

      (2) Please consider revisiting the statistical analyses. Many of the analyses don't seem appropriate for the design. For example, the use of a t-test with multiple comparisons for repeated measures design in Figure 2 and the use of t-test for two-factor design in Figure 8. etc.

      To address the multiple testing issues, the statistical methodology was changed to a more rigorous one. Details are given in the Statistical analysis in the Methods section and the Figure legends.
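
      As an illustration only of what a multiple-comparison adjustment does (the response does not name the correction that was applied, so the Holm-Bonferroni procedure and the p-values below are assumptions chosen for the sketch), a step-down adjustment can be written as:

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the original order.

    Illustrative sketch: the source does not state which correction
    the authors used; Holm-Bonferroni is an assumption.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise t-tests (not from the study).
raw = [0.01, 0.04, 0.03]
adj = holm_adjust(raw)
```

      A comparison is then called significant only if its adjusted p-value falls below the chosen alpha, which is stricter than applying the same alpha to each uncorrected t-test.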

      (3) The increase in % LIN in Figure 1 may be confounded by differences in viscosity between HTF and the fluid secretion mixtures. For this reason, HTF may not be an appropriate control for the ANOVA post hoc analysis. HTF protein was not adjusted to the same concentration as the secretion mixtures, correct? Ultimately, it does not appear that there would be a significant statistical effect of the different fluid mixtures if appropriate statistical comparisons were made. This detracts from the notion that the secretions impact sperm function.

      (4) Figure 1, the statistical analysis in the legend suggests that the experiments were analyzed with a t-test. Were corrections made for multiple comparisons in B-D? An ANOVA would probably be more appropriate.

      We used a viscometer to measure the viscosity of a solution of prostate and seminal vesicle secretions adjusted to a protein concentration of 10 mg/mL. The results showed that the secretions did not cause any significant viscosity changes (New Fig.1G, Lines 110-111).

      As you pointed out, the protein levels in the HTF medium and the secretion mixture are not adjusted to the same concentration. In addition, the original manuscript was not a controlled experiment because the two factors, seminal vesicle and prostate extracts, were modified. Therefore, to investigate the effect of prostate and seminal vesicle secretions on sperm motility, we modified the experimental design to directly compare the effects of the two groups: seminal vesicle and prostate extracts (New Fig.1A-G, lines 101-113). To show the sperm quality used in this study, motility data from sperm cultured in the HTF medium are presented independently in New Supplemental Fig.1A.

      (5) Additionally in Figure 1, there is no baseline quality control data to show that there are no intrinsic differences between sperm sampled from the two treatment groups. So baseline differences in sperm quality/viability remain a potential confounder.

      We thank you for this important point. Epididymal sperm were collected from healthy mice. We recovered only the seminal vesicle secretions from the flutamide-treated mice to pursue its role in the accessory reproductive glands, since testosterone targets the testes and accessory reproductive organs. So, there was no qualitative difference between the epididymal sperm before treatment. Nevertheless, incubation with seminal vesicle secretion for one hour altered the sperm motility pattern and in vivo fertilization results. Sperm function was altered by seminal vesicle secretion in a short period of culture time. We apologize for the confusion, and we have revised the text and figure to carry a clearer message (lines 128-132).

      (6) Figure 1E, did the authors confirm that flutamide-treated mice had decreased serum androgens? How often were mice treated with flutamide? This is important because flutamide has a relatively short half-life and is rapidly metabolized to inert hydroxyflutamide.

      Serum testosterone levels were unchanged. Flutamide was administered every 24 hours for 7 consecutive days. Although there was no change in blood testosterone levels (New Supplemental Fig.1B), a decrease in the weight of the seminal vesicles, prostate, and epididymis was confirmed. This is thought to be due to the pharmacological activity of flutamide.

      (7) Figure 1H, the meaning of 'relative activity of mitochondria' isn't clear. JC-1 does not measure 'activity'. A decreased average voltage potential across the inner mitochondrial membrane may indicate that more of the sperm from the flutamide group were dead. Additionally, J-aggregates are slow to form, generally requiring long incubation periods of at least 90 minutes or more. Additional positive and negative controls for predictable mitochondrial transmembrane voltage potential polarization states would have improved the quality of this experiment.

      Thank you for pointing this out. We have replaced the relative activity of mitochondria with high mitochondrial membrane potential (New Fig.1M, lines 125-128). We think that the sperm cultured in seminal vesicle secretions from flutamide-treated mice did in fact die, because their motility was also significantly reduced. Since antimycin reduces mitochondrial membrane potential, we have added an experiment in which sperm treated with 10 µM antimycin were used as a control to confirm that the JC-1 reaction is sensitive to changes in membrane potential.

      (8) Figure 4, the extracellular flux data appear to be unnormalized. The Seahorse instruments are extremely sensitive to the mass and uniformity of the cells at the bottom of the well. This may be a significant confounder in these results. For example, all of the observed differences between groups could simply be a product of differential cell mass, which is in line with the reduced growth potential of testosterone-treated cells indicated by the authors in the results section.

      We thank you for this important point. After correcting for cell viability, we seeded the same number of viable cultured cells into the wells of each experimental group before measuring them in the flux analyzer. There were no significant differences in survival rates in any experiment. As a result, an increase in glucose-induced ECAR and a suppression of mitochondrial respiration were observed. We would like to emphasize that this difference in the metabolic data does not imply a reduction in the growth potential of the cells due to testosterone treatment.

      We described that these measurements are normalized based on cell count and viability (lines 184, 190, 195).

      (9) How did the authors know that the isolated mouse primary cells were epithelial cells? Was this confirmed? What was the relative sample purity?

      The cells were labeled with multiple epithelial cell markers (cytokeratin) and confirmed using immunostaining and flow cytometry. The percentage of cells positive for epithelial cell markers was approximately 80%. A stromal cell marker (vimentin) was also used to confirm purity, but only a few percent of cells were positive. The contaminating cell type was considered to be mainly muscle cells because the gene expression levels of muscle cell markers verified by RNA-seq were relatively high.

      (10) It is misleading to include the lactate/pyruvate media measurements in the middle of the figure in Figure 4 D and E because it seems at first glance like these measurements were made in the seahorse media but they are completely unrelated. Additionally, these measures are not normalized and are sensitive to confounding differences in cell viability, seeding density, mass, etc.

      Thank you for pointing this out. We have placed the lactate and pyruvate measurement graphs after the flux data of ECAR. We noted that these measurements are normalized based on cell count and viability (lines 189-190). The doubling time of seminal vesicle epithelial cells was approximately 3 days, and testosterone inhibited cell proliferation. Therefore, the seeding concentration of cells was increased 4-fold in the testosterone-treated group compared to the control, and experiments were conducted to ensure that the confluency at the time of measurement after 7 days of culture was comparable between groups.
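
      As a rough check of the seeding arithmetic described above, the sketch below assumes idealized, uninterrupted exponential growth (an assumption; the response reports only an approximate 3-day doubling time, a 7-day culture, and a 4-fold seeding ratio). Function and variable names are hypothetical.

```python
def fold_expansion(days, doubling_time_days):
    """Fold increase in cell number under idealized exponential growth."""
    return 2.0 ** (days / doubling_time_days)


# Values taken from the response: ~3-day doubling time, 7 days of culture.
control_fold = fold_expansion(days=7, doubling_time_days=3)

# The testosterone-treated group was seeded at 4x the control density.
seeding_ratio = 4.0

# Fold expansion the treated cells would need over 7 days for both groups
# to end at the same cell number (an inference under the stated assumption).
required_treated_fold = control_fold / seeding_ratio
```

      Under these assumptions the control cells expand about 5-fold over the culture period, so a 4-fold higher seeding density is consistent with treated cells that proliferate only slightly during the same 7 days.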

      (11) The flux analyzer assays sold by Agilent have many ambiguities and problems of interpretation. Unfortunately, Agilent's interest in marketing/sales has outpaced their interest in scientific rigor. Please consider revising some of the language regarding the measurements. For example, 'ATP production rate' is not directly measured. Rather, oligomycin-sensitive respiration rate is measured. The conversion of OCR to ATP production rate is an estimation that depends on complex assumptions often requiring additional testing and validation. The same is true for other ambiguous terms such as 'maximal respiration' referring to FCCP uncoupled respiration, and glycolytic rate- which is also not measured directly. If the authors are interested in a more detailed description of the problems with Agilent's interpretation of these assays please see the following reference (PMID: 34461088).

      Thank you for your critical comments and thoughtful advice, as well as for sharing the excellent reference. We agree with you on the flux analyzer ambiguities and data interpretation problems. The description of the measured values has been revised as follows.

      We have replaced the “ATP production rate” with the “oligomycin-sensitive respiratory rate.” Similarly, we have replaced “maximal respiration” with “FCCP-induced unbound respiration.” (lines 197-202) We chose not to deal with the conversion of OCR to ATP production rate because it is outside the scope of interest in our study.

      We now avoid using the term "glycolytic capacity" and instead use “Oligomycin-sensitive ECAR” (line 186). We recognize that the ECARs measured in this study reflect experimental conditions and may not fully represent physiological glycolytic flux in vivo. The main section therefore includes a data set of glucose uptake studies to emphasize the significance of the changes obtained with the flux analyzer assay (New Fig.6, lines 230-254).

      Figure 6, it's not surprising to see the accumulation of labeled oleic acid in the cells, however, this does not mean that oleic acid is participating in normal metabolic processes. Oleic acid will have detergent effects at high (uM) concentrations. The observation that sperm 'take up' OA at 10-100 uM concentrations should also be validated against sperm function, as the health of the cells is very likely to be negatively impacted. Additionally, no apparent accumulation is noted in the fluorescence imaging at 1uM, but the authors insinuate that uptake occurs at low nM concentrations. The effects in Figure 6D-F are nominal at best and are likely a result of the small sample sizes.

      Thank you for your good suggestion. We agree with the reviewer that high concentrations of oleic acid had a detergent effect. To improve the consistency of functional data and observations, oleic acid uptake tests were performed under the same concentration range as the sperm motility tests (New Fig.7A-C). To reflect in vivo conditions, the oleic acid concentration used here was chosen with reference to the oleic acid concentration in seminal fluid recovered from mice, as detected by GCMS.

      Epididymal sperm were incubated with fluorescently labeled oleic acid and observed after quenching of extracellular fluorescence. Fluorescent signals were detected selectively in the midpiece of the sperm. The fluorescence intensity of sperm quantified by flow cytometry increased significantly in a dose-dependent manner (New Fig.7A-C, lines 261-264).

      Furthermore, increasing the sample size did not change the trend of the sperm motility data. Although the effect size of oleic acid on sperm motility was small (New Fig.7D-G, lines 265-268), an improvement in fertilization ability was observed both in vitro (IVF) and in vivo (AI) (New Fig.7J-L, lines 274-282, 286-291). We conclude that the effect of oleic acid on sperm is of substantial significance. These data and interpretations have been revised in the text in the Results section.

      (12) Figure 6H, I applaud the authors for attempting intrauterine insemination experiments to test their previous findings. That said, there is no supporting data included to show that the sperm from the treatment groups had comparable starting viability/quality. Additionally, it is difficult to tell if the results are due to the small sample sizes and particularly the apparent outlier in the flutamide-only group.

      Thank you for the praise and the suggestions for improvement. As noted in our response to comment #5 above, the epididymal sperm were collected from healthy mice. Therefore, there is no qualitative difference in the epididymal sperm before treatment. This is described in the figure legend (lines 1130-1131). We apologize again for the confusion. We also more than doubled the number of replicates of the experiment. The impact of the outlier would therefore have been minimal.

      (13) One final question related to Figure 6H: how did the authors know they were retrieving all of the possible 2-cell embryos from the uterus? Perhaps the authors could provide the raw counts of unfertilized eggs and 2-cell embryos so we can see if there were differences between the mice.

      We retrieved the pronuclear stage embryos from the fallopian tubes. It is not certain whether all embryos were recovered. Therefore, we added the number of embryos in the graph and in the supplementary data.

      (14) Figure 7 has the same seahorse assay normalization problem as mentioned earlier. Without normalization, it is difficult to tell if the effects are simply due to differences in cell mass. Were the replicates indicated in the graphs run on the same plate? If so, it would be much more convincing to see a nested design, with technical replicates within plates, and additional replicates run on separate plates.

      As noted in our response to comment #8 above, these measurements were normalized based on sperm count. This is now stated in the text and the figure legend (lines 1123-1124).

      Pooled sperm isolated and cultured from multiple mice were placed in one well. The measurements were taken in three different wells, and each experiment was repeated four times. We did not use the extracellular flux analyzers XFe24 or XFe96. The measurements were also repeated because the XF HS Mini uses an 8-well plate (a maximum of only 6 samples per run, since 2 wells were used for calibration).

      (15) The statistical test in Figures 8E and F described in the legend is inappropriate (t-test), this appears to be a two-factor design.

      Thank you for pointing this out. Differences between groups were assessed using a two-way analysis of variance (ANOVA). When the two-way ANOVA was significant, differences among values were analyzed using Tukey's honest significant difference test for multiple comparisons.
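
      For readers unfamiliar with the two-factor analysis mentioned above, the following is a minimal sketch of how the F statistics of a balanced two-way ANOVA are computed. It assumes equal replicate numbers per cell and uses hypothetical values, not data from the study. Obtaining p-values and running Tukey's post hoc test additionally require the F and studentized-range distributions, which are left to a statistics package.

```python
from statistics import mean


def two_way_anova_f(cells):
    """F statistics for a balanced two-way ANOVA with interaction.

    cells[i][j] is the list of replicate values for level i of factor A
    and level j of factor B. Equal replicates per cell are assumed.
    """
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])

    grand = mean(v for row in cells for cell in row for v in cell)
    mean_a = [mean(v for cell in row for v in cell) for row in cells]
    mean_b = [mean(v for row in cells for v in row[j]) for j in range(b)]
    mean_cell = [[mean(cell) for cell in row] for row in cells]

    # Sums of squares for the main effects, interaction, and residual error.
    ss_a = b * n * sum((m - grand) ** 2 for m in mean_a)
    ss_b = a * n * sum((m - grand) ** 2 for m in mean_b)
    ss_ab = n * sum(
        (mean_cell[i][j] - mean_a[i] - mean_b[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_err = sum(
        (v - mean_cell[i][j]) ** 2
        for i in range(a) for j in range(b) for v in cells[i][j]
    )

    # Mean squared error uses a*b*(n-1) residual degrees of freedom.
    ms_err = ss_err / (a * b * (n - 1))
    return {
        "A": (ss_a / (a - 1)) / ms_err,
        "B": (ss_b / (b - 1)) / ms_err,
        "AxB": (ss_ab / ((a - 1) * (b - 1))) / ms_err,
    }


# Hypothetical 2 x 2 design (e.g. treatment x condition), 3 replicates each.
example = [
    [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]],
    [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0]],
]
f_stats = two_way_anova_f(example)
```

      Unlike repeated t-tests, this design separates the two main effects from their interaction and uses a single pooled error term, which is why it suits a two-factor experiment.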

      (16) The data in Figure 8 are interesting, and the effects appear to be a little more consistent compared with the mouse primary cells, potentially due to cell uniformity. However, the data are unnormalized, causing significant ambiguity, and there are no measures of cell viability to determine if the effects are due to cell death (or at least relative cell mass).

      As noted in our responses to comments #8 and #14 above, these measurements were normalized based on cell count and viability. This is now stated in the figure legend (lines 1185-1186).

      Minor Comments:

      (1) The section title indicating the beginning of the results section is missing.

      A section title has been added to indicate the beginning of the results section.

      (2) There were several typos and confusingly worded statements throughout. Please consider additional editing.

      We used a proofreading service and corrected as much as possible.

      (3) In the introduction, a brief description of seminal fluid physiology is provided, but the reference is directed toward human physiology. Given that the research is performed solely in the mouse, a brief comparative description of mouse physiology would be helpful. For example, what is the role of mouse seminal fluid in the formation of the mating plug? What are the implications of the relative size disparity in seminal vesicles in mice versus humans? Etc.

      The third paragraph of the introduction has been revised (lines 57-60).

      Reviewer #2 (Recommendations For The Authors):

      Thank you for allowing us to strengthen our manuscript with your valuable comments and queries. We have made our best efforts to reflect your feedback.

      (1) The abstract is confusing and partly misleading and should be revised to more clearly and accurately summarize the study.

      The abstract was revised to be clearer and more accurate (lines 20-34).

      (2) The introduction should be revised to more accurately describe the sperm life cycle. Spermatogenesis, per definition, for example, exclusively takes place in the testis, sperm do not gain fertilization competence in the epididymis, sperm isolated from the epididymis cannot fertilize an oocyte unless in vitro capacitated, etc. In the last paragraph the connection between changes in fructose and citrate concentration, sperm metabolism and testicular-derived testosterone and AR remain unclear.

      The introduction was revised to be clearer and more accurate (lines 44-45).

      Citric acid and fructose are chemical components that are the subject of biochemical testing and are commonly used as semen testing items for humans and livestock. This is because the secretory function of the prostate and seminal vesicles is dependent on androgens. The measurement of citric acid and fructose concentrations in semen is routinely used to indicate testicular androgen production function (ISBN: 978-1-4471-1300-3, 978 92 4 0030787).

      (3) Throughout the manuscript the concept of (in vitro) capacitation is missing. Mixing sperm with seminal plasma is not the only way to achieve sperm that can fertilize the oocyte. Since media containing bicarbonate and albumin is the standard procedure in the field to capacitate epididymal mouse sperm in vitro, the manuscript would gain value from a comparison between the effect of seminal plasma and in vitro capacitating media. Interesting readouts in addition to motility would be, e.g., sAC activation, PKA-substrate phosphorylation, and acrosomal exocytosis.

      Thank you for raising this important point. As the reviewer points out, fertilization can be achieved in artificial insemination and in vitro fertilization using epididymal sperm which have not been exposed to seminal plasma. This has historically led to an underestimation of the role of accessory reproductive glands, such as the prostate and seminal vesicles. However, it has been reported that the removal of seminal vesicles in rodents decreases the fertilization rate after natural mating. This has been shown to be due to multiple factors affecting sperm motility rather than factors involved in plug formation (PMID: 3397934), but details of these factors and the whole picture of the role of the accessory glands were not known. This led us to become interested in the effects of seminal plasma on sperm other than fertilization and to begin research on the role of the accessory glands that synthesize seminal plasma.

      Early in our study, we found that simply exposing sperm to seminal vesicle extracts for 1 hour before IVF dramatically reduced fertilization rates, even in HTF medium containing bicarbonate and albumin. The experiment was designed on the assumption that seminal plasma contains factors that inhibit sperm from acquiring fertilizing ability. Therefore, we conducted experiments using modified HTF without albumin to avoid unintended motility patterns.

      However, we also respect the reviewer's opinion, and we have added our preliminary data related to IVF (New supplementary Fig.5).

      (4) In the introduction and throughout the manuscript it is unclear what the authors mean by "linear motility". An increase in VSL doesn't mean that the sperm swim in a more linear or straight way, or even that the sperm are 'straightened', it means that they swim faster from point A to point B. Do the authors mean progressive or hyperactivated motility? Please clarify.

      For all conditions tested the authors should follow the standard in the field and include the % of motile, progressively motile, and hyperactivated sperm.

      Thank you for pointing this out. We appreciate your feedback regarding the terminology. In our manuscript, "linear motility" refers to the degree to which sperm move in a straight line. We have clarified this by explaining that VSL (Straight-Line Velocity) and LIN (Linearity) are used to quantify and describe linear motility in sperm analysis: Higher VSL values indicate more direct, linear movement. A higher LIN value indicates a straighter path, thus representing greater linear motility. These terms have been standardized, and explanations have been added to the main text (lines 111-113).
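      To make the distinction between these parameters concrete, the following is a minimal sketch of the standard CASA definitions: curvilinear velocity (VCL) is path length over time, VSL is net displacement over time, and LIN is their ratio. It is an illustration only and not the analysis software used in the study; the function name `casa_kinematics`, the frame interval, and both example tracks are hypothetical.

```python
import math


def casa_kinematics(track, frame_interval_s):
    """Return VCL, VSL and LIN for one sperm track.

    track: list of (x, y) head positions in micrometres, one per frame.
    frame_interval_s: time between consecutive frames, in seconds.
    """
    if len(track) < 2:
        raise ValueError("need at least two positions")
    duration = frame_interval_s * (len(track) - 1)
    # Curvilinear path: sum of frame-to-frame distances.
    path_length = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    # Straight line: distance from the first to the last position.
    net_displacement = math.dist(track[0], track[-1])
    vcl = path_length / duration
    vsl = net_displacement / duration
    lin = vsl / vcl if vcl > 0 else 0.0
    return {"VCL": vcl, "VSL": vsl, "LIN": lin}


# Hypothetical tracks sampled every 0.1 s (illustrative values only).
slow_straight = [(0, 0), (2, 0), (4, 0), (6, 0)]
fast_zigzag = [(0, 0), (3, 4), (6, 0), (9, 4)]

slow = casa_kinematics(slow_straight, 0.1)
fast = casa_kinematics(fast_zigzag, 0.1)
print(slow)
print(fast)
```

      In this toy case the zigzag track has the higher VSL but the lower LIN, which is why the two values are best read together when describing how straight a trajectory is.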

      In response to your suggestion, we have included the percentage of motility and progressive motility for all conditions tested. However, since the experiment was performed using modified HTF without albumin, we have decided not to report the percentage of hyperactivation to avoid confusion.

      (5) Did the authors confirm that the injection of flutamide decreases androgen levels? That control needs to be included in the experiment to validate the conclusion.

      Injection of flutamide did not reduce androgen levels (see reviewer #1, comment 6). This is because flutamide's mechanism of action is based on antagonizing androgen and inhibiting its binding to the androgen receptor (New Fig.2A).

      (6) The role of mitochondrial activity in sperm progressive motility is still under investigation. For example, PMID: 37440924 showed that inhibition of the ETC affects hyperactivated but not progressive motility. The authors should either include additional experiments to confirm the correlation between mitochondrial activity and sperm progressive motility or tone down that conclusion.

      We have previously shown that treatment with D-chloramphenicol, an inhibitor of mitochondrial translation, significantly reduced sperm mitochondrial membrane potential, ATP levels, and linear motility (PMID: 31212063). Also, in the previous manuscript, we did not address progressive motility or hyperactivated motility in our analysis. We have chosen to discuss the effect of mitochondrial activity on linear motility rather than on progressive motility and hyperactivation of sperm.

      Was mitochondrial activity also altered in epididymal sperm incubated with and without seminal plasma or in aged mice?

      The mitochondrial membrane potential of epididymal sperm cultured with seminal vesicle extract (SV) was higher than that of epididymal sperm cultured without seminal vesicle extract (without SV: 67.3 ± 0.8%, with SV: 83.4 ± 1.8%). On the other hand, the mitochondrial membrane potential of epididymal sperm cultured with seminal vesicle extract recovered from aged mice was decreased (SV from aged: 60.3 ± 2.7%). It should be noted that the epididymal spermatozoa used in these experiments came from healthy individuals, different from those from which seminal vesicle extracts were collected. (See also the response to reviewer 1's comment #5.)

      (7) The quality of the provided images showing AR, Ki67, and TUNEL staining should be improved or additional images should be included. Especially the AR staining is hard to detect in the provided images. The authors should also include a co-staining between AR and vesicle epithelial cells. That epithelial cells are multilayered does not come across in the pictures provided.

      We apologize for any inconvenience caused. The image has been replaced with one of higher resolution, in which the multilayered structure of the epithelial cells is also visible.

      For the 12-month-old mice, an age-matched control should be included to support the authors' conclusion.

      To clarify the seminal vesicle changes associated with aging, we included images of 3-month-old mice as controls (New Supplementary Fig.2D).

      Overall, the rationale for the experiment does not become clear. How are the amount of seminal vesicle epithelial cells, testosterone, and AR expression connected to seminal plasma secretions? Why is it a disadvantage to have proliferating seminal vesicle epithelial cells? How is proliferation connected to the proposed switch in metabolic pathway activity?

      We have added some explanations and supporting data to the manuscript (New Fig.8D, lines 303-305, 315-319, 369-379). Cell proliferation stopped when the metabolic shift occurred, redirecting glucose toward fatty acid synthesis. Fatty acid synthesis is an important function of the seminal vesicle, and in the presence of testosterone, fatty acid synthesis enhancement and arrest of proliferation occur simultaneously. The connection between metabolism and cell proliferation was further demonstrated when ACLY was knocked down by shRNA, which stopped fatty acid synthesis and released the proliferative arrest induced by testosterone, allowing the cells to proliferate again. However, we do not know what effects occur when cell proliferation is stopped.

      (8) The experiments provided for glycolysis and oxphos are inconsistent and insufficient to support the authors' conclusion that testosterone shifts glycolytic and oxphos activity of seminal vesicle epithelial cells. Multiple groups (PMID 37440924, 37655160, 32823893) have shown that the increased flux through central carbon metabolism during capacitation is accompanied by an accumulation of intracellular lactate and increased secretion of lactate into the surrounding media. How do the authors explain that they see an increase in glucose uptake and ECAR but not in lactate and a decrease in pyruvate? Did the authors additionally quantify intracellular pyruvate and lactate? Since pyruvate and lactate are in constant equilibrium, it is odd that one metabolite is changing and the other one is not.

      Thank you for pointing this out. Since ECAR is often used as an alternative to lactate production but does not directly measure lactate levels, we measured changes in lactate and pyruvate concentrations in the culture medium. Under our experimental conditions, glucose appeared to be directed primarily towards anabolic processes, such as fatty acid synthesis, rather than the OXPHOS pathway, which may explain the lack of lactate production. The observed decrease in pyruvate might indicate its conversion to acetyl-CoA in the mitochondria, supporting both fatty acid synthesis and the TCA cycle. This shift would be consistent with the metabolic reprogramming toward anabolic activity.

      What do the authors mean by "the glycolytic pathway was not enhanced despite the activation of glycolysis"? Seahorse, especially using a series of pathway inhibitors, only provides an indirect measurement of glycolysis and oxphos since the instrument does not provide a distinction from which pathways the detected protons are originating. The authors should consider a more optimized experimental design, e.g., the authors could monitor ECAR and OCR in the presence of glucose over time with and without the addition of testosterone. That would be less invasive since the sperm are not starved at the beginning of the experiment and would provide a more direct read-out. Did the authors normalize cell numbers in their experiment? Alternatively, the authors could consider performing metabolomics experiments.

      I agree with the reviewer. Buzzwords such as “glycolytic capacity” simply do not make sense, so we have removed them from the phrases noted by the reviewer. Please refer to the response to some of reviewer 1's points regarding the ambiguity of the data measured by the flux analyzer. Nevertheless, the assay design of the flux analysis could be used as a good “starting point” and provide information on the glycolytic system and respiratory control. Therefore, the interpretation of the flux analysis is supported by subsequent data sets.

      (9) The authors would strengthen their results by confirming their gene expression data by quantifying the expression of the respective proteins.

      Does testosterone treatment increase GLUT4 protein levels in isolated seminal vesicle epithelial cells? Or does it change the localization of the transporter? Are GLUT4 gene and protein levels altered in flutamide-treated cells? How do the authors explain that testosterone increases glucose uptake without changing Glut gene expression?

      We performed Western blot analysis to measure GLUT4 protein levels in seminal vesicle epithelial cells after testosterone treatment. The results showed that testosterone does not alter the expression of GLUT4 protein but simply changes its subcellular localization (New Fig.6C,D, lines 238-244).

      The discussion includes the interpretation of the observation that testosterone increases glucose uptake by altering localization without altering GLUT4 gene expression, a phenomenon commonly seen in other cells, such as cardiomyocytes (lines 362-364). The revised main figure also includes a data set of changes in GLUT4 localization, including flutamide-treated data. See also Reviewer 3's main comment #1.

      (10) Considering that the authors claim that SV secretions are crucial for sperm fertilization capacity, how do they explain that fertilization rates are still at 40 % when sperm are treated with flutamide?

      With HTF alone, the fertilization rate is about 50%, because fertilization can occur without SV secretions. Against this baseline, we found that seminal vesicle secretions positively affect in vivo fertilization by sperm. On the other hand, seminal plasma from flutamide-treated mice reduced the fertilizing ability of healthy sperm. These points are described in the text (lines 283-294).

      (11) It would be beneficial for the reader to include a schematic summarizing the results.

      Thank you for your advice from the reader's point of view. We have visualized the summaries of this study and added them to the manuscript (New Fig.10).

      Minor comments:

      Line 38: Male fertility, no article, please revise.

      I have changed “The male fertility” to “Male fertility” and added some references (lines 42-43).

      Line 55: Seminal plasma or TGFb? Please clarify.

      Corrected as follows. “TGFβ, a component of seminal plasma, increases antigen-specific Treg cells in the uterus of mice and humans, which induces immune tolerance, resulting in pregnancy.” (lines 60-62)

      Line 63: Why do the authors find it surprising that blood and seminal plasma have different compositions?

      This is because seminal plasma contains unique biochemical components that are not normally found in blood or only in small quantities. The intention was to emphasize the unique function of seminal plasma in supporting the physiological functions of sperm and to highlight its complex role by comparing it to blood. We clarified these intentions and reflected them in the revised text (lines 62-67).

      Line 94: The headline causes confusion. Seminal plasma does not induce sperm motility, it increases progressive sperm motility.

      Corrected as follows. “The effect of androgen-dependent changes in mouse seminal vesicle secretions on the linear motility of sperm” (lines 101-102)

      Reviewer #3 (Recommendations For The Authors):

      Thank you for allowing us to strengthen our manuscript with your valuable comments and queries. We have made our best efforts to reflect your feedback.

      Major:

      Figure 4 and Figure 5: The trend shows that GLUT3 is up-regulated and GLUT4 is downregulated although both of them are not statistically significant. However, GLUT4 is picked for all the following experiments based on protein localization. Providing other evidence/discussion why not to further consider other GLUTs will help to justify. Also, this reviewer suggests including GLUT4 localization data in the main figure as it is important data for the logical flow to link the following figures.

      We focused on GLUT4 because it was known that testosterone increases glucose uptake by changing the localization of GLUT4 without changing its expression (lines 230-231). In the revised manuscript, the increasing trend in Glut3 gene expression was also mentioned in the discussion, in addition to GLUT4 (lines 360-362). In any case, the results showed that testosterone increased glucose uptake by regulating the function of glucose transporters.

      Immunostaining of GLUT1~4 was performed to compare seminal vesicles from flutamide-treated mice with controls, and localization changes were observed only in GLUT4. Therefore, we hypothesized that GLUT4 is regulated by testosterone and performed the experiment. Fortunately, we were able to obtain a GLUT4-specific inhibitor, which dramatically inhibited the testosterone-dependent glucose uptake and subsequent lipid synthesis in seminal epithelial cells, leading us to believe that GLUT4 is a major glucose transporter.

      Increasing sperm linearity by oleic acid is observed and interpreted as enhanced sperm fertilizing potential. It is not clear why and how sperm linearity can be a determinant factor for enhancing sperm fertility in vivo. Providing an explanation of the effect of oleic acid on another key motility parameter more proven to be directly correlated with fertility (i.e., hyperactivation), and more direct evidence of oleic acid on enhancing sperm linearity indeed increasing sperm fertilization using IVF, is strongly recommended to support the author's main conclusion.

      Thank you for pointing this out. It is known that proteins derived from the seminal vesicles inhibit the hyperactivation of sperm and the acrosome reaction. Therefore, we conducted an experiment to add oleic acid, focusing on fatty acid synthesis caused by the metabolic shift of the seminal vesicles, which had not been known until now.

      Sperm were pretreated with an oleic acid-containing medium before IVF and oleic acid enhanced sperm linearity. When the sperm number was sufficient, there was no change in the cleavage rate after in vitro fertilization, but when the sperm count was reduced to one-tenth of the normal number, the cleavage rate increased compared to the control (lines 274-282). In other words, the physiological role of oleic acid is to increase the probability of fertilization by keeping the sperm motility pattern linear or progressive. This increases the likelihood of the sperm passing through the female reproductive tract and environments that are unfavorable to sperm survival. Our research has uncovered significant insights into the role of seminal vesicle fluid and oleic acid in sperm fertilization. Due to the strong effect of the decapacitation factor, we found that seminal vesicle fluid reduces the fertilization rate in IVF. However, it does not interfere with the fertilization rate in vivo during artificial insemination. This emphasizes the importance of oleic acid, along with other protein components of seminal plasma, in ensuring the in vivo fertilization ability of sperm.

      Minor:

      Please correct a typo in Line 173: sifts to shifts

      All typographical errors have been corrected.

    1. Author response:

      We plan to submit a revised version of our manuscript eLife-RP-RA-2024-105013, in which we address all comments raised by the two expert reviewers.

      Below we describe what we would like to address in this revision. We understand that the provisional response is not meant to be a point-by-point reply. Therefore, our revision plan summarizes the reviewers' comments more generally and describes how we plan to address them.

      Reviewer #1:

      This reviewer is overall very positive and states that our ‘work is likely to become the go-to resource for quantification in this field’. This reviewer raises a few weaknesses of the manuscript that are explicitly described as minor.

      Microscopic resolution sufficient to support quantitative spine assessments?

      In the detailed revision, we will provide quantification of microscopic resolution and will relate this to the spine comparisons offered. Where needed, we will add caveats discussing measurement limits.

      Age of the human tissue.

      Most of the analysis is based on the study of three brains from elderly individuals. For the analysis of dendritic spines, we added measures from a younger brain (37 years old). We will make clearer which datasets contained these measures and what our comparative analysis showed.

      Genetic diversity contributing to species differences?

      We will provide an updated discussion on this interesting topic.

      Reviewer #2:

      This reviewer also expresses a largely positive view of the manuscript, noting that ‘..the data will be of widespread interest to the cerebellar field…’. 

      Microscopic resolution:

      see above.

      Figure panels / Fig. 3:

      We will make sure that the figures are readable and will provide a clarification of gray scales used in Fig. 3.

      Vertical vs horizontal dendrite orientation:

      This is a point that requires clarification. Per our definition, all dendrites fall either into the vertical or horizontal category. We will make sure that this is defined sufficiently well.

    1. Author response:

      Response to Referee 1

      We agree that convex walls increase the time that consortia remain trapped in pores at high magnetic fields. Since the non-monotonic behavior of the drift velocity with the Scattering number arises largely due to these long trapping times, we agree that experiments using concave pores are likely to show a peak drift velocity that is diminished or erased.

      However, we disagree that a random packing of spheres or similar particles provides an appropriate model for natural sediment, which is not composed exclusively of hard particles in a pure fluid. Pore geometry is also influenced by clogging. Biofilms growing within a network of convex pillars in two-dimensional microfluidic devices have been observed to connect neighboring pillars, thereby forming convex pores. Similar pore structures appear in simulations of biofilm growth between spherical particles in three dimensions. Moreover, the salt marsh sediment in which MMB live is more complex than simple sand grains, as cohesive organic particles are abundant. Experiments in microfluidic channels show that cohesive particles clog narrow passageways and form pores similar to those analyzed here. Thus, we expect convex pores to be present and even common in natural sediment where clogging plays a role.

      The concentration of convex pores in the experiments presented here is almost certainly much higher than in nature. Nonetheless, since magnetotactic bacteria continuously swim through the pore space, they are likely to regularly encounter such convexities. Efficient navigation of the pore space thus requires that magnetotactic bacteria be able to escape these traps. In the original version of this manuscript, this reasoning was reduced to only one or two sentences. That was a mistake, and we thank the reviewer for prompting us to expand on this point. As the reviewer notes, this reasoning is central to the analysis and should have been featured more prominently. In the final version, we will devote considerable space to this hypothesis and provide references to support the claims made above.

      The reviewer suggests that the generality of this work depends on our finding a "positive correlation between the swimming speed and alignment [rate] based on parameters derived from literature." We wish to emphasize that, in addition to predicting this correlation, our theory also predicts the function that describes it. The black line in Figure 3 is not fitted to the parameters found in the literature review; it is a pure prediction.

      Response to Referee 2

      In the "Recommendations for the Authors," this reviewer drew our attention to a manuscript that absolutely should have been prominently cited. As the reviewer notes, our manuscript meaningfully expands upon this work. We are pleased to learn that the phenomena discussed here are more general than we initially understood. It was an oversight not to have found this paper earlier. The final version will better contextualize our work and give due credit to the authors. We sincerely appreciate the reviewer for bringing this work to our attention.

      We disagree that the use of non-culturable organisms and our unrealistic array should be considered serious weaknesses. While any methodological choice comes with trade-offs, we believe these choices best advance our aims. First, the goal of our research, both within and beyond this manuscript, is to understand the phenotypes of magnetotactic bacteria in nature. While using pure cultures enables many useful techniques, phenotypic traits may drift as strains undergo domestication. We therefore prioritize studying environmental enrichments.

      Clearly, an array of obstacles does not fully represent natural heterogeneity. However, using regular pore shapes allows us to average over enough consortium-wall collisions to enable a parameter-free comparison between theory and experiment. Conducting an analysis like this with randomly arranged obstacles would require averaging over an ensemble of random environments, which is practically challenging given the experimental constraints. Since we find good agreement between theory and experiment in simple geometries, we are now in a position to justify extending our theory to more realistic geometries. Additionally, we note that a microfluidic device composed of a random arrangement of obstacles would also be a poor representation of environmental heterogeneity, as pore shape and network topology differ between two and three dimensions.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      Cruz-González and colleagues draw on DNA methylation and paired genetic data from 621 participants (n=308 controls; n=313 participants with Alzheimer's Disease). The authors generate a panel of epigenetic biomarkers of aging with a primary focus on the Horvath multi-tissue clock. The authors find weaker correlations between predicted epigenetic age and chronological age in subgroups with higher African ancestry than within a subgroup identified as White. The authors then examine genetic variation as a potential source for between-group differences in epigenetic clock performance. The authors draw on a large collection of publicly available methylation quantitative trait loci datasets and find evidence for substantial overlap between clock CpGs located within the Horvath clock and methQTLs. Going further, the authors show that methQTLs that overlap with Horvath clock CpGs show greater allelic variation in African ancestral groups pointing to a potential explanation for poorer clock performance within this group. 

      Thank you for this summary.

      Strengths:  

      This is an interesting dataset and an important research question. The authors cite issues of portability regarding polygenic risk scores as a motivation to examine between-group differences in the performance of a panel of epigenetic clocks. The authors benefit from a diverse cohort of individuals with paired genetic data and focus on a clinical phenotype, Alzheimer's disease, of clear relevance for studies evaluating age-related biomarkers.  

      Weaknesses:  

      While the authors tackle an important question using a diverse cohort the current manuscript is lacking some detail that may diminish the potential impact of this paper. For example:  

      (1) Information on chronological ages across groups should be reported to ensure there are no systematic differences in ages or age ranges between groups (see point below).  

      Thank you for pointing out this omission. The age ranges are similar across cohorts. No individuals under 60 were considered, and the average ages per cohort ranged from 72 to 76. Neither average age nor age range was consistently higher or lower in the admixed cohorts for which the clocks had lower performance compared to the White cohort. We will report the age distributions in supplementary material in the revision.

      (2) The authors compare correlations between chronological age and epigenetic age within sub-groups to correlations reported by Horvath (2013). Attempting to draw comparisons between these two datasets is problematic. The current study has a much smaller N (particularly for sub-group analyses) and has a more restricted age range (60-90 yrs versus 0-100 yrs). Thus, is an alternative explanation simply that any weaker correlations observed in this study are driven by sample size and a restricted age range? Reporting the chronological ages (and ranges) across subgroups in the current study would help in this regard. Similarly, given the lack of association between AD status and epigenetic age (and very small effect in the white group), it may be of interest to examine the correlation between chronological age and epigenetic age in each group including the AD participants: would the between-group differences in correlations between chronological age and epigenetic age be altered by increasing the sample size?

      Our conclusions about the reduced accuracy of the clocks in admixed individuals are based on comparisons within the MAGENTA cohorts, not on the comparisons to previous reports. We show significantly reduced accuracy on African American and Puerto Rican cohorts in MAGENTA compared to the White MAGENTA cohort. The reviewer is correct that the lower correlation in each of the cohorts compared to those in the Horvath study is due to the older age range of our cohort. Indeed, other studies applying the Horvath clock have seen similar correlations to those observed on the White MAGENTA cohort (Marioni et al., 2015, Horvath 2013, and Shireby et al., 2020). Following the suggestion to increase sample size, we conducted the chronological age vs. epigenetic age correlation analysis with the inclusion of AD cases. The significantly lower performance of the clock on Puerto Ricans and African Americans relative to White individuals remains after including all individuals in each cohort. We will include these results on the full cohorts in MAGENTA in the revision.
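      The effect of age range on correlation can be illustrated with a small, fully synthetic sketch. It assumes a hypothetical clock whose error is always 5 years in magnitude and compares Pearson's r over a 0-100 year range with r over a 60-90 year range. None of the values come from MAGENTA or from Horvath (2013); the helper names are placeholders.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def simulated_clock(ages, error_years=5):
    """Hypothetical clock: overshoots even ages and undershoots odd ages."""
    return [a + (error_years if a % 2 == 0 else -error_years) for a in ages]


wide_ages = list(range(0, 101))    # 0-100 years
narrow_ages = list(range(60, 91))  # 60-90 years

r_wide = pearson_r(wide_ages, simulated_clock(wide_ages))
r_narrow = pearson_r(narrow_ages, simulated_clock(narrow_ages))
print(f"r over 0-100 yrs: {r_wide:.3f}")
print(f"r over 60-90 yrs: {r_narrow:.3f}")
```

      The per-individual error is identical in both cases, yet r is lower in the narrower range because there is less variance in chronological age to explain. This is why the comparison of interest is between cohorts with similar age ranges.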

      (3) The correlation between chronological age and epigenetic age, while helpful is not the most informative estimate of accuracy. Median absolute error (and an analysis of MAE across subgroups) would be a helpful addition.  

      We used correlation because this is commonly used to evaluate the performance of epigenetic age clocks, but we agree that direct error quantification provides a complementary perspective. We confirm that the African American and Puerto Rican cohorts have higher error than the White cohort, and we will report these comparisons in the revision.
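      As a hedged illustration of why the two metrics are complementary, the sketch below computes Pearson's r and the median absolute error for a hypothetical clock that overestimates every age by exactly 5 years. The ages are invented for illustration and the function names are placeholders, not part of any published clock software.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def median_absolute_error(chronological, predicted):
    """Median of |predicted age - chronological age|, in years."""
    return statistics.median(
        abs(p - c) for c, p in zip(chronological, predicted)
    )


# Hypothetical chronological ages and a clock with a constant +5 year bias.
chronological = [62, 68, 71, 75, 80, 88]
biased_prediction = [age + 5 for age in chronological]

r = pearson_r(chronological, biased_prediction)
mae = median_absolute_error(chronological, biased_prediction)
print(f"r = {r:.3f}, median absolute error = {mae} years")
```

      A constant bias leaves the correlation perfect while the median absolute error equals the bias, so reporting both captures systematic offsets that correlation alone would miss.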

      (4) More information should be provided about how DNAm data were generated. Were samples from each ancestral group randomized across plates/slides to ensure ancestry and batch are not associated? How were batch effects considered? Given the relatively small sample sizes, it would be important to consider the impact of technical variation on measures of epigenetic age used in the current study. The use of principal component-based versions of these clocks (Higgins Chen et al., 2023; Nature Aging https://doi.org/10.1038/s43587-022-00248-2) may help address such concerns.  

      Thank you for pointing out the need for additional context on data generation. All omics data from the MAGENTA study were generated using protocols that aim to minimize technical artifacts and batch effects. We will add detailed protocol information in the revision. We also thank the reviewer for their suggestion on applying the principal component clock to account for potential technical variation. We are planning to perform these analyses and include them in the revision.

      (5) Marioni et al., (2015) found a very weak cross-sectional association between DNAm Age and cognitive function (r~0.07) in a cohort of >900 participants. Given these effect sizes, I would not interpret the absence of an effect in the current study to reflect issues of portability of epigenetic biomarkers. 

      We agree that previous links between DNAm Age and AD/cognitive function have been small in magnitude. For example, the PhenoAge paper (Levine et al., 2018) and a study using the Horvath clock (Levine et al., 2015) found age acceleration of less than a year in AD patients relative to non-demented individuals. These effects have been detected in studies with relatively small sample sizes (e.g., 700 for Levine et al. 2015 and 604 for Levine et al. 2018). Our study is of similar size, but the cohort-specific analyses have lower power. Nonetheless, we replicate the modest, but significant association with AD in the white MAGENTA cohort. We have performed power calculations and find that we have 26% power to detect an effect of this size in the Cubans, 46% for the Peruvians, 66% for the Whites, 74% for the Puerto Ricans, and 84% for the African Americans. Given the relatively high power in the Puerto Rican and African American cohorts, we suggest that the reduced accuracy of the clocks contributes to the lack of association. We will also add caveats about power and the small sample size in the revision.

      (6) The methQTL analyses presented are suggestive of potential genetic influence on DNAm at some Horvath CpGs. Do the authors see differences in DNAm across ancestral groups at these potentially affected CpGs? This seems to be a missing piece (e.g., estimating the likely impact of methQTL on clock CpG DNAm).

      Thank you for this excellent suggestion. We will add this analysis in the revision. This will enable us to test for further evidence for our hypothesis about the role of ancestry-specific meQTL on clock accuracy.

      Reviewer #2 (Public review):

      Summary:  

      This paper seeks to characterize the portability of methylation clocks across groups. Methylation clocks are trained to predict biological aging from DNA methylation but have largely been developed in datasets of individuals with primarily European ancestries. Given that genetic variation can influence DNA methylation, the authors hypothesize that methylation clocks might have reduced accuracy in non-European ancestries.  

      Strengths:  

      The authors evaluate five methylation clocks in 621 individuals from the MAGENTA study. This includes approximately 280 individuals sampled in Puerto Rico, Cuba, and Peru, as well as approximately 200 self-identified African American individuals sampled in the US. To understand how methylation clock accuracy varies with proportion of non-European ancestry, the authors inferred local ancestry for the Puerto Rican, Cuban, Peruvian, and African American cohorts. Overall, this paper presents solid evidence that methylation clocks have reduced accuracy in individuals with non-European ancestries, relative to individuals with primarily European ancestries. This should be of great interest to those researchers who seek to use methylation clocks as predictors of age-related, late-onset diseases and other health outcomes.

      Thank you for this summary.

      Weaknesses:  

      One clear strength of this paper is the ability to do more sophisticated analyses using the local ancestry calls for the MAGENTA study. It would be valuable to capitalize on this strength and assess portability across the genetic ancestry spectrum, as was recently advocated by Ding et al. in Nature (2023). For example, the authors could regress non-European local ancestry fraction on measures of prediction accuracy. This could paint a clearer picture of the relationship between genetic ancestry and clock accuracy, compared to looking at overall correlations within each cohort. 

      Thank you for this excellent suggestion. We agree that modeling portability across genetic ancestry as a spectrum would help support our conclusions. We will add this to the revision.
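
      One simple way to treat ancestry as a spectrum, as suggested above, is to regress a per-individual measure of clock error on the non-European ancestry fraction. The sketch below fits an ordinary least-squares line in plain Python. All values are made up for illustration, the variable names are hypothetical, and the absolute age error is only one of several possible accuracy measures; it is not the analysis the authors report.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical individuals: ancestry fraction, true age, clock-predicted age
ancestry_fraction = [0.05, 0.20, 0.40, 0.60, 0.85]
chronological_age = [70, 65, 72, 68, 75]
predicted_age = [71, 62, 77, 61, 85]

# Accuracy measure: absolute difference between predicted and true age
abs_error = [abs(p - a) for p, a in zip(predicted_age, chronological_age)]

slope, intercept = ols_fit(ancestry_fraction, abs_error)
print(f"change in absolute error per unit ancestry fraction: {slope:.2f} years")
```

      A positive slope in such a fit would indicate larger prediction errors at higher non-European ancestry fractions; a real analysis would also need covariates and uncertainty estimates.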

      The authors present two possible reasons that methylation clocks might have reduced accuracy in individuals with non-European ancestries: genetic variants disrupting methylation sites (i.e., "disruptive variants") and genetic variants influencing methylation sites (i.e., meQTLs). The authors conclude disruptive variants do not contribute to poor methylation clock portability, but the evidence in support of this conclusion is incomplete. The site frequency spectrum of disruptive variants in Figure 4 is estimated from all gnomAD individuals, and gnomAD is comprised of primarily European individuals. Thus, the observation that disruptive variants are generally rare in gnomAD does not rule them out as a source of poor clock portability in admixed individuals with non-European ancestries. 

      Thank you for this question. The allele frequencies were so low that even if they all occurred in individuals of non-European ancestries, they would still be incredibly rare. Nonetheless, in the revision, we will make this clear by reporting ancestry-specific allele frequencies.

      It is also unclear to what extent meQTLs impact methylation clock portability. The authors find that the frequency of meQTLs is higher in African ancestry populations, but this could reflect the fact that some of the analyzed meQTLs were ascertained in African Americans. The number of meQTL-affected methylation sites also varies widely between clocks, ranging from 6 to 271; thus, meQTLs likely impact the portability of different clocks in different ways. Overall, the paper would benefit from a more quantitative assessment of the extent to which meQTLs influence clock portability. 

      We agree that the meQTL likely influence the clocks in different ways and that the ascertainment of the meQTLs in different populations makes direct comparisons challenging. To provide mechanistic insights into the ways that meQTL influence the methylation clocks, we plan to leverage the individual-level genetic data generated for the MAGENTA individuals. This will allow us to explore whether the individuals who have the specified clock-influencing meQTL receive less accurate predictions from the methylation clocks. In addition, the new analysis of whether individuals from different cohorts have different methylation levels at clock CpGs with ancestry-variable meQTLs will help establish the differences between groups (see response to Reviewer #1 point 6). Finally, to resolve potential bias due to ascertaining some of the meQTL in African Americans, we will conduct the same analyses from the manuscript, holding out the set of meQTL from African Americans. These results will be included in the revision.

      The paper implies that methylation clocks have an inferior ability to predict AD risk in admixed populations relative to white individuals, but the difference between white AD patients and controls is not significant when correcting for multiple testing. This nuance should be made more explicit. 

      We agree that the signal is not particularly strong in the white cohort, but the effect size is in line with previous studies. We will add power calculations and discussion to help the interpretation of these results (see response to Reviewer #1 point 5).  

      Finally, this paper overlooks the possibility that environmental exposures co-vary with genetic ancestry and play a role in decreasing the accuracy of methylation clocks in genetically admixed individuals. Quantifying the impact of environmental factors is almost certainly outside of the scope of this paper. However, it is worth acknowledging the role of environmental factors to provide the field with a more comprehensive overview of factors influencing methylation clock portability. It is also essential to avoid the assumption that correlations with genetic ancestry necessarily arise from genetic causes.  

      We entirely agree about the importance of discussing environmental exposures. We did not intend to discount them in our manuscript. We will clarify their potential role and the scope of our analyses in the revision. We expect that environmental factors certainly contribute to differences between groups. The revisions outlined above may help us better quantify the genetic contribution.

      Reviewer #3 (Public review):

      This manuscript examines the accuracy of DNA methylation-based epigenetic clocks across multiple cohorts of varying genetic ancestry. The authors find that clocks were generally less accurate at predicting age in cohorts with large proportions of non-European (especially African) ancestry, compared to cohorts with high European ancestry proportions. They suggest that some of this effect might be explained by meQTLs that occur near CpG sites included in clocks, because these variants may be at higher frequencies (or at least different frequencies) in cohorts with high proportions of non-European ancestry relative to the training set. They also provide discussions of potential paths forward to alleviate bias and improve portability for future clock algorithms.

      The topic is timely due to the increasing popularity of DNA methylation-based clocks and the acknowledgment that many algorithms (e.g., polygenic risk scores) lack portability when applied to cohorts that substantially differ in ancestry or other characteristics from the training set. This has been discussed to some degree for DNA methylation-based clocks, but could of course use more discussion and empirical attention which the authors nicely provide using an impressive and diverse collection of data.  

      The manuscript is clear and well-written; however, some key background was missing (e.g., what we know already about the ancestry composition of clock training sets) and, most importantly, several analyses would benefit from being taken one step further. For example, the main argument of the paper is that ancestry impacts clock predictions, but this is determined by subsetting the data by recruitment cohort rather than analyzing ancestry as a continuous variable. Extending some of the analyses could really help the authors nail down their hypothesized sources of lack of portability, which is critical for making recommendations to the community and understanding the best paths forward.

      Thank you for these suggestions. As noted in our response to reviewer #2, we will analyze ancestry as a continuous variable in the revision. We will also add details on the training of previous clocks and previous work on clock accuracy.

    1. Author response:

      We thank the reviewers for the careful review of our manuscript. Overall, they were positive about our use of cutting-edge methods to identify six inversions segregating in Lake Malawi. Their distribution in ~100 Lake Malawi species demonstrated that they were differentially segregating in different ecogroups/habitats and could potentially play a role in local adaptation, speciation, and sex determination. Reviewers were positive about our finding that the chromosome 10 inversion was associated with sex determination in a deep benthic species and its potential role in regulating traits under sexual selection. They agree that this work is an important starting point in understanding the role of these inversions in the amazing phenotypic diversity found in the Lake Malawi cichlid flock.

      There were two main criticisms that were made which we summarize:

      (1) Lack of clarity. It was noted that the writing could be improved to make many technical points clearer. Additionally, certain discussion topics were not included that should be.

      We will rewrite the text and add additional figures and tables to address the issues that were brought up in a point-by-point response. We will improve/include (1) the nomenclature to understand the inversions in different lineages, (2) improved descriptions for various genomic approaches, (3) a figure to document the samples and technologies used for each ecogroup, and (4) integration of LR sequences to identify inversion breakpoints to the finest resolution possible.

      (2) We overstate the role that selection plays in the spread of these inversions and neglect other evolutionary processes that could be responsible for their spread.

      We agree with the overarching point. We did not show that selection is involved in the spread of these inversions, and other forces can be at play. Additionally, there were concerns about our model in which the inversions introgressed from a Diplotaxodon ancestor into benthic ancestors, and it was noted that incomplete lineage sorting or balancing selection (via sex determination) could instead be at play. Overall, we agree with the reviewers, with the following caveats. (1) Our analysis of the genetic distance between Diplotaxodons and benthic species in the inverted regions is more consistent with their spread through introgression than with incomplete lineage sorting or balancing selection. (2) This question of selection is much more complicated in the context of the Lake Malawi cichlid radiation with ~800 different species. We believe the role of these inversions must be considered in a species- and time-specific way. In other words, the evolutionary forces acting on these inversions at the time of their formation are likely different from those acting now. Further, the role of these inversions is likely different in different species. For example, the inversions on chromosomes 10 and 11 play a role in sex determination in some species but not others, and the potential pressures acting on the inverted and non-inverted haplotypes will be very different. These are very interesting and important questions both for understanding the adaptive radiations in Lake Malawi and in general, and we are actively studying crosses to understand the role of these inversions in phenotypic variation between two species. We will modify the text to make all of these points clearer.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Koren et al. derive and analyse a spiking network model optimised to represent external signals using the minimum number of spikes. Unlike most prior work using a similar setup, the network includes separate populations of excitatory and inhibitory neurons. The authors show that the optimised connectivity has a like-to-like structure, which leads to the experimentally observed phenomenon of feature competition. The authors also examine how various (hyper)parameters-such as adaptation timescale, the excitatory-to-inhibitory cell ratio, regularization strength, and background current-affect the model. These findings add biological realism to a specific implementation of efficient coding. They show that efficient coding explains, or at least is consistent with, multiple experimentally observed properties of excitatory and inhibitory neurons. 

      As discussed in the first round of reviews, the model's ability to replicate biological observations such as the 4:1 ratio of excitatory vs. inhibitory neurons hinges on somewhat arbitrary hyperparameter choices. Although this may limit the model's explanatory power, the authors have made significant efforts to explore how these parameters influence their model. It is an empirical question whether the uncovered relationships between, e.g., metabolic cost and the fraction of excitatory neurons are biologically relevant.

      The revised manuscript is also more transparent about the model's limitations, such as the lack of excitatory-excitatory connectivity. Further improvements could come from explicitly acknowledging additional discrepancies with biological data, such as the widely reported weak stimulus tuning of inhibitory neurons in the primary sensory cortex of untrained animals.

      We thank the Reviewer for their insightful characterization of our paper and for further suggestions on how to improve it. We have now further improved the transparency about the model’s limitations and we explicitly acknowledge the discrepancy with biological data about connection probability and about the selectivity of inhibitory neurons (pages 4 and 15).

      Reviewer #2 (Public review): 

      Summary: 

      In this work, the authors present a biologically plausible, efficient E-I spiking network model and study various aspects of the model and its relation to experimental observations. This includes a derivation of the network into two (E-I) populations, the study of single-neuron perturbations and lateral-inhibition, the study of the effects of adaptation and metabolic cost, and considerations of optimal parameters. From this, they conclude that their work puts forth a plausible implementation of efficient coding that matches several experimental findings, including feature-specific inhibition, tight instantaneous balance, a 4 to 1 ratio of excitatory to inhibitory neurons, and a 3 to 1 ratio of I-I to E-I connectivity strength.

      Strengths: 

      While many network implementations of efficient coding have been developed, such normative models are often abstract and lacking sufficient detail to compare directly to experiments. The intention of this work to produce a more plausible and efficient spiking model and compare it with experimental data is important and necessary in order to test these models. In rigorously deriving the model with real physical units, this work maps efficient spiking networks onto other more classical biophysical spiking neuron models. It also attempts to compare the model to recent single-neuron perturbation experiments, as well as some long-standing puzzles about neural circuits, such as the presence of separate excitatory and inhibitory neurons, the ratio of excitatory to inhibitory neurons, and E/I balance. One of the primary goals of this paper, to determine if these are merely biological constraints or come from some normative efficient coding objective, is also important. Lastly, though several of the observations have been reported and studied before, this work arguably studies them in more depth, which could be useful for comparing more directly to experiments.

      Weaknesses: 

      This work is the latest among a line of research papers studying the properties of efficient spiking networks. Many of the characteristics and findings here have been discussed before, thereby limiting the new insights that this work can provide. Thus, the conclusions of this work should be considered and understood in the context of those previous works, as the authors state. Furthermore, the number of assumptions and free parameters in the model, though necessary to bring the model closer to biophysical reality, make it more difficult to understand and to draw clear conclusions from. As the authors state, many of the optimality claims depend on these free parameters, such as the dimensionality of the input signal (M=3), the relative weighting of encoding error and metabolic cost, and several others. This raises the possibility that it is not the case that the set of biophysical properties measured in the brain are accounted for by efficient coding, but rather that theories of efficient coding are flexible enough to be consistent with this regime. With this in mind, some of the conclusions made in the text may be overstated and should be considered in this light.

      Conclusions, Impact, and additional context: 

      Notions of optimality are important for normative theories, but they are often studied in simple models with as few free parameters as possible. Biophysically detailed and mechanistic models, on the other hand, will often have many free parameters by their very nature, thereby muddying the connection to optimality. This tradeoff is an important concern in neuroscientific models. Previous efficient spiking models have often been criticized for their lack of biophysically-plausible characteristics, such as large synaptic weights, dense connectivity, and instantaneous communication. This work is an important contribution in showing that such networks can be modified to be much closer to biophysical reality without losing their essential properties. Though the model presented does suffer from complexity issues which raise questions about its connections to "optimal" efficient coding, the extensive study of various parameter dependencies offers a good characterization of the model and puts its conclusions in context.

      We thank the Reviewer for their thorough and accurate assessment of our paper.  

      Reviewer #3 (Public review): 

      Summary: 

      In their paper the authors tackle three things at once in a theoretical model: how can spiking neural networks perform efficient coding, how can such networks limit the energy use at the same time, and how can this be done in a more biologically realistic way than previous work. 

      They start by working from a long-running theory on how networks operating in a precisely balanced state can perform efficient coding. First, they assume split networks of excitatory (E) and inhibitory (I) neurons. The E neurons have the task to represent some lower dimensional input signal, and the I neurons have the task to represent the signal represented by the E neurons. Additionally, the E and I populations should minimize an energy cost represented by the sum of all spikes. All this results in two loss functions for the E and I populations, and the networks are then derived by assuming E and I neurons should only spike if this improves their respective loss. This results in networks of spiking neurons that live in a balanced state, and can accurately represent the network inputs. 
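
      The spiking rule summarized above (a neuron fires only if the spike lowers the loss) can be sketched in a few lines. This is a deliberately simplified, single-population illustration, not the authors' E-I model: it assumes a linear readout with hypothetical decoding weights, a loss equal to the squared encoding error plus a fixed cost per spike, and no membrane dynamics, leak, or delays.

```python
def squared_error(signal, estimate):
    """Squared Euclidean distance between the signal and its estimate."""
    return sum((s - e) ** 2 for s, e in zip(signal, estimate))


def greedy_spike(signal, estimate, decoders, spike_cost):
    """Return the index of the neuron whose spike most reduces the loss.

    decoders[i] is the (hypothetical) decoding vector that a spike of
    neuron i adds to the estimate. spike_cost is the metabolic cost of
    one spike. Returns None if no spike lowers the loss.
    """
    best_index = None
    best_loss = squared_error(signal, estimate)  # loss if nobody spikes
    for i, weights in enumerate(decoders):
        # Estimate after a spike of neuron i
        candidate = [e + w for e, w in zip(estimate, weights)]
        loss = squared_error(signal, candidate) + spike_cost
        if loss < best_loss:
            best_index, best_loss = i, loss
    return best_index


# Hypothetical two-dimensional signal and three neurons
decoders = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
chosen = greedy_spike([1.0, 0.0], [0.0, 0.0], decoders, spike_cost=0.1)
print("neuron selected to spike:", chosen)
```

      Raising the cost per spike makes the rule more conservative: a spike is emitted only when its reduction in encoding error exceeds its metabolic cost.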

      They then investigate in depth different aspects of the resulting networks, such as responses to perturbations, the effect of following Dale's law, spiking statistics, the excitation (E)/inhibition (I) balance, optimal E/I cell ratios, and others. Overall, they expand on previous work by taking a more biological angle on the theory and show the networks can operate in a biologically realistic regime.

      Strengths: 

      * The authors take a much more biological angle on the efficient spiking networks theory than previous work, which is an essential contribution to the field

      * They make a very extensive investigation of many aspects of the network in this context, and do so thoroughly

      * They put sensible constraints on their networks, while still maintaining the good properties these networks should have

      Weaknesses: 

      * One of the core goals of the paper is to make a more biophysically realistic network than previous work using similar optimization principles. One of the important things they consider is a split into E and I neurons. While this works fine, and they consider the coding consequences of this, it is not clear from an optimization perspective why the split into E and I neurons and following Dale's law would be beneficial. This would be out of scope for the current paper however.

      * The theoretical advances in the paper are not all novel by themselves, as most of them (in particular the split into E and I neurons and the use of biophysical constants) had been achieved in previous models. However, the authors discuss these links thoroughly and do more in-depth follow-up experiments with the resulting model. 

      Assessment and context: 

      Overall, although much of the underlying theory is not necessarily new, the work provides an important addition to the field. The authors succeeded well in their goal of making the networks more biologically realistic, and incorporate aspects of energy efficiency. For computational neuroscientists this paper is a good example of how to build models that link well to experimental knowledge and constraints, while still being computationally and mathematically tractable. For experimental readers the model provides a clearer link of efficient coding spiking networks to known experimental constraints and provides a few predictions.

      We thank the Reviewer for a positive assessment and for pointing out the merits of our work.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      The authors have addressed my previous concerns, and I agree that the manuscript has improved. However, I believe they could still do more to acknowledge two notable mismatches between the model and experimental data.

      (1) Stimulus selectivity of excitatory and inhibitory neurons 

      In the model, excitatory and inhibitory neurons exhibit similar stimulus selectivity, which appears inconsistent with most experimental findings. The authors argue that whether inhibitory neurons are less selective remains an open question, citing three studies in support. However, only one of these studies (Runyan) was conducted in primary sensory cortex and it is, to my knowledge, one of the few papers showing this (indeed, it's often cited as an exception). The other two studies (Kuan and Najafi) recorded from the parietal cortex of mice trained on decision making tasks, and therefore seem less relevant to the model.

      In contrast to the cited studies, the overwhelming majority of the work has found that inhibitory neurons in sensory cortex, in particular those expressing Parvalbumin, are less stimulus selective than excitatory cells. And this is indeed the prevailing view, as summarized by the review from Hu et al. (Science, 2014): "PV+ interneurons exhibit broader orientation tuning and weaker contrast specificity than pyramidal neurons." This view emerged from numerous classical studies, including Sohya et al. (J. Neurosci., 2007), Cardin (J. Neurosci., 2007), Nowak (Cereb. Cortex, 2008), Niell et al. ( J. Neurosci., 2008), Liu (J. Neurosci., 2009), Kerlin (Neuron, 2010), Ma et al. (J. Neurosci., 2010), Hofer et al. (Nature Neurosci. 2011), and Atallah et al. (Neuron 2012). Weak inhibitory tuning has been confirmed by recent studies, such as Sanghavi & Kar (biorxiv 2023), Znamenskiy et al. (Neuron 2024), and Hong et al. (Nature, 2024).

      The authors should acknowledge this consensus and cite the conflicting evidence. Failing to do so is cherry picking from the literature. Since training can increase the stimulus selectivity of PV+ neurons to that of Pyr levels, also in primary visual cortex (Khan et al. Neuron 2018), a favourable interpretation of the model is that it represents a highly optimized, if not overtrained, state.

      We have carefully considered the literature cited by the Reviewer. We agree with the interpretation that stimulus selectivity of inhibitory neurons in our model is higher than the stimulus selectivity of Parvalbumin-positive inhibitory neurons in the primary sensory cortex of naïve animals. We have edited the text in Discussion (page 14).

      (2) Connection probability 

      The manuscript claims that "rectification sets the overall connection probability to 0.5, consistent with experimental results (Pala & Petersen; Campagnola et al.)." However, the cited studies, and others, report significantly lower probabilities, except for Pyr-PV (E-I connections in the model). For example, Campagnola et al. measured PV-Pyr connectivity at 34% in L2/3 and 20% in L5.

      It's perfectly acceptable that the model cannot replicate every detail of biological circuits. But it's important to be cautious when claiming consistency with experimental data.

      Here as well, we agree with the Reviewer that the connection probability of 0.5 is consistent with reported connectivity of Pyr-PV neurons, but less so with reported connectivity of PV-Pyr neurons. We have now made our claim about the compatibility of the connection probability in our model with empirical observations more precise (page 4).

      Reviewer #2 (Recommendations for the authors): 

      I commend the authors for an extremely thorough and detailed rebuttal, and for all of the additional work put in to address the reviewer concerns. For the most part, I am satisfied with the current state of the manuscript. 

      We thank the Reviewer for recognizing our effort to address the first round of reviews to the best of our ability.

      Here are some small points still remaining that I think the authors should address: 

      (1) Pg. 8, "We verified the robustness of the model to small deviations from the optimal synaptic weights" - while the authors now cite Calaim et al. 2022 in the discussion, its relevance to several of the results justify its inclusion in other places. Here is one place where the authors test something that was also studied in this previous paper.

      The Reviewer is correct that Calaim et al. (eLife 2022) addressed the robustness of synaptic weights, and we now cite this study when describing our results on jittering of synaptic connections (page 8).

      (2) Pg. 9, "In our optimal E-I network we indeed found that optimal coding efficiency is achieved in absence of within-neuron feedback or with weak adaptation in both cell types" Pg. 10, "the absence of within-neuron feedback or the presence of weak and short-lasting spike-triggered adaptation in both E and I neurons are optimally efficient solutions" The authors seem to state that both weak adaptation and no adaptation at all are optimal. In contrast to the rest of the results presented, this is very vague and does not give a particular level of adaptation as being optimal. The authors should make this more clear. 

      We agree that the text about optimal level of adaptation was unclear. The optimal solution is no adaptation, while weak and short-lasting adaptation define a slightly suboptimal, yet still efficient, network state, as now stated on page 10.

      (3) Pg. 13, "In summary our analysis suggests that optimal coding efficiency is achieved with four times more E neurons than I neurons and with mean I-I synaptic efficacy about 3 times stronger..." --- claims such as these are still too strong, in my opinion. It is rather the case that the particular ratio of E to I neurons and connections strengths can be made consistent with an optimally efficient regime.

      We agree here as well. We have revised the text (page 13) to better explain our results.

      (4) Pg. 14, "firing rates in the 1CT model were highly sensitive to variations in the metabolic constant" (Fig. 8I, as compared to Fig. 6C). This difference between the 1CT and E-I networks is striking, and I would suspect it is due to some idiosyncrasies in the difference between the two models (e.g., the relative amount of delay that it takes for lateral inhibition to take effect, or the fact that E-E connections have not been removed in this model). The authors should ideally back up this result with some justified explanation. 

      We agree with the Reviewer that the delay for lateral inhibition in the E-I model is twice that of the 1CT model and that the E-I model gains stability from the lack of E-E connectivity. Furthermore, the tuning is stronger in I compared to E neurons in the E-I model, which contributes to making the E-I network inhibition-dominated (Fig. 1H). In contrast, the average excitation and inhibition in the 1CT model are of exactly the same magnitude. The property of being inhibition-dominated makes the E-I model more stable. We report these observations in the revised text (pages 14-15).

      Reviewer #3 (Recommendations for the authors): 

      Overall my points were very well responded to and I removed most of my weaknesses.

      I appreciate the authors implementing my suggested analysis change for Figure 8, and I find the result very clear. I would further suggest they add a bit of text for the reader as to why this is done. For a new reader without much knowledge of these networks at first it seems the inhibitory population is very good at representation in fig 8G: so why is it not further considered in fig 8H?

      We thank the Reviewer for providing further suggestions. We have now clarified in the text why only the excitatory population of the E-I model is considered in the comparison between the E-I and 1 cell type models (page 14).

      Thanks for sharing the code. From a quick browse through it looks very manageable to implement for follow up work, although some more guidance for how to navigate the quite complicated codebase and how to reproduce specific paper results would be helpful.

      We have also updated the code repository, where we have included more complete instructions on how to reproduce the results of each figure. We renamed the folders with the computer code so that they point to a specific figure in the paper. The repository now also includes the output of the numerical simulations we ran, which allows immediate replotting of all figures. We have deposited the repository at Zenodo to have the final version of the code associated with the DOI https://doi.org/10.5281/zenodo.14628524. This is mentioned in the section Code availability (page 17).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 Public Review:

      Summary

      This very short paper shows a greater likelihood of C->U substitutions at sites predicted to be unpaired in the SARS-CoV-2 RNA genome, using previously published observational data on mutation frequencies in  SARS-CoV-2 (Bloom and Neher, 2023).

      General comments

      A preference for unpaired bases as a target for APOBEC-induced mutations has been demonstrated previously in functional studies so the finding is not entirely surprising. This of course assumes that A3A or other APOBEC is actually the cause of the majority of C->U changes observed in SARS-CoV-2 sequences.

      I'm not sure why the authors did not use the published mutation frequency data to investigate other potential influences on editing frequencies, such as 5' and 3' base contexts. The analysis did not contribute any insights into the potential mechanisms underlying the greater frequency of C->U (or G->U) substitutions in the SARS-CoV-2 genome.

      I have added additional discussions of mechanisms focusing on the question of whether basepairing bias is primarily driven by secondary structure dependence of underlying mutation rates or by conservation of secondary structure (Discussion lines 178–192) and I added a brief analysis of the 5′ and 3′ contexts of the relationship between being basepaired in a secondary structure model and apparent mutational fitness (Figures S1 and S2, Results lines 85–97). I found that the 5′ context of unpaired, but not paired basepairs influences apparent mutational fitness (preference for 5′ U), and that the  is also . Additionally, there is a 3′ preference for G, indicating some CpG suppression. This contrasts to some degree with another analysis based on counting lineage frequencies that may have lacked power to detect relatively small effects (Simmonds mBio 2024).
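      A context analysis of this kind can be illustrated with a minimal sketch. It assumes a genome string, a per-position mutational fitness estimate for C→U, and a per-position paired/unpaired flag. The function name, the 0-based indexing and the toy values are all hypothetical and are not taken from the manuscript's analysis scripts.

```python
from collections import defaultdict
from statistics import median


def median_fitness_by_context(seq, fitness_by_pos, paired_by_pos, side="5"):
    """Median fitness of C positions, grouped by (neighbouring base, paired flag).

    seq            -- RNA sequence as a string (0-based positions, an assumption)
    fitness_by_pos -- dict: position -> estimated mutational fitness of C->U
    paired_by_pos  -- dict: position -> True if basepaired in the structure model
    side           -- "5" for the upstream neighbour, "3" for the downstream one
    """
    groups = defaultdict(list)
    for i, base in enumerate(seq):
        if base != "C" or i not in fitness_by_pos:
            continue
        j = i - 1 if side == "5" else i + 1
        if j < 0 or j >= len(seq):
            continue  # no neighbour on this side at the sequence ends
        groups[(seq[j], bool(paired_by_pos[i]))].append(fitness_by_pos[i])
    return {key: median(values) for key, values in groups.items()}


# Hypothetical toy data, for illustration only
toy_seq = "UCGACAUCC"
toy_fitness = {1: 1.2, 4: 0.1, 7: 0.9, 8: 0.3}
toy_paired = {1: False, 4: False, 7: False, 8: True}
five_prime = median_fitness_by_context(toy_seq, toy_fitness, toy_paired, side="5")
```

      Comparing the groups keyed by the same neighbouring base but different paired flags is one simple way to ask whether a context preference is specific to unpaired positions.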

      Reviewer 1 Author recommendations:

      There are at least 5 publications describing the mapping/prediction of SARS-CoV-2 RNA secondary structure from 2022-2023 and their predictions are not entirely consistent. Why did the authors only refer to the Lan et al. paper?

      I have added comparisons when the Lan et al secondary structure model is replaced by one of two others derived from SHAPE data (Results lines 110–122). Unsurprisingly, similar secondary structure models give similar results and performance is modestly higher for the models from Lan et al. This is consistent with their observations that DMS reactivities performed better as classifiers of SL5 and ORF1 secondary structure (the reason I compared to this secondary structure model and reactivity data set rather than others), but I did not go into detail on this in the revision since there are many differences in methods beyond class of reactivity probe. For example, somewhat stronger correlation for the Vero than the Huh7 dataset in Lan et al could arise from combining data from two replicates, from cell type, or from differences in data analysis methods. It’s also a small difference and cannot be confidently distinguished from noise.

      I conducted a preliminary comparison of the performance of DMS and SHAPE data for predicting mutations  where DMS data is available, but I opted against including this analysis in the manuscript for the same  reasons. Instead, I included in results and discussion comments on how, in general, reactivity data contains  information that is predictive of substitution rates that is not captured by binary secondary structure models.  I also discuss how multiple data sources can potentially be integrated to more accurately predict the impact  of a substitution on fitness (Discussion lines 195–201).

      Specific substitutions are referred to as C->T and C29085T for example, but as the genome of SARS-CoV-2 is RNA, the T should be a U.

      I agree and I have changed all “T” to “U” in the paper and analysis scripts. The choice of “T” was motivated  by what seemed to appear most frequently in papers on SARS-CoV-2 mutational spectra, but “U” is nearly  universal in papers on secondary structure and mutation mechanisms, so I agree it makes more sense in  this paper.

      The C29085T substitution is somewhat non-canonical as it is a single base bulge in a longer duplex section of dsRNA, very unlike the favoured sites for mutation in the Nakata et al paper.

      I have added a discussion of Nakata et al (NAR 2023) (Introduction lines 29–32). I did not go into this depth in the revision, but the analysis of ~2M patient sequences in Nakata et al also noted a high rate of UUC→UUU substitution, so the UUUC context of C29095 (shared by 3 of the 10 positions highlighted in Nakata et al that had high mutation frequencies with exogenous APOBEC3A expression) could be interesting to investigate further.

      High C29095U substitution frequency is indeed somewhat at odds with the results in that work, which found UC→UU substitutions to be elevated in longer single-stranded regions than the context of C29095U in SARS-CoV-2 secondary structure models (a single unpaired base opposing three unpaired bases in an asymmetric internal loop).

      I'm not sure why DMS reactivity is considered a separate variable from pairing likelihood as one informs the other.

      The intent here, which was not clear, was to show that a binary basepairing model that uses DMS reactivities as constraints does not capture all of the information available. I have clarified this, as described above, in discussing the information in different reactivity datasets.

      The C29095U substitution is also relevant to the consideration of DMS reactivity in addition to the resulting secondary structure model. These are not considered as separate predictors and the reason for showing both is mentioned in the paper: “DMS reactivity was more strongly correlated with estimated mutational fitness than basepairing when analysis was limited to positions with detectable DMS reactivity.” I have clarified this in the revised manuscript, and it is also relevant to the discussion of a potential model integrating all available datasets.

      Reviewer 2 Public Review:

      Hensel investigated the implications of SARS-CoV-2 RNA secondary structure in synonymous and nonsynonymous mutation frequency. The analysis integrated estimates of mutational fitness generated by Bloom and Neher (from publicly available patient sequences) and a population-averaged model of RNA basepairing from Lan et al (from DMS mutational profiling with sequencing, DMS-MaPseq).

      The results show that base-pairing limits the frequency of some synonymous substitutions (including the most common C→T), but not all: G→A and A→G substitutions seem unaffected by base-pairing.

      The author then addressed nonsynonymous C→T substitutions at base-paired positions. While there is still a generally higher estimated mutational fitness at unpaired positions, they propose a coarse adjustment to disentangle base-pairing from inherent mutational fitness at a given position. This adjustment reveals that nonsynonymous substitutions at base-paired positions, which define major variants, have higher mutational fitness.

      Overall, this manuscript highlights the importance of considering RNA secondary structure in viral evolution studies.

      The conclusions of this work are generally well supported by the data presented. Particularly, the author acknowledges most limitations of the analyses, and addresses them. Even though no new sequencing results were generated, the author used available data generated from the analysis of roughly seven million sequenced patient samples. Finally, the author discusses ways to improve the current available models.

      There are a number of limitations of this work that should be highlighted, specifically in regard to the secondary structure data used in this paper. The Lan et al. dataset was generated using a multiplicity of infection (MOI) of 0.05, 24 hours post-infection (h.p.i.). At such a low MOI and late timepoint, viral replication is not synchronous and sequencing artifacts might be generated by cell debris and viral RNA degradation, therefore impacting the population-averaged results. In addition, the nonsynonymous base-paired positions in Figure 2 have relatively high population-averaged DMS reactivity, which suggests those positions are dynamic. Therefore, the proposed adjustment could result in an incorrect estimation of their inherent mutational fitness.

      I would go further than this to say that the proposed adjustment will usually result in an incorrect estimate. My intent is to propose an improved, but still likely incorrect, estimate by utilizing in vitro data to refine baseline mutation rates in order to obtain improved, but only coarsely adjusted, estimates of mutational fitness. I added a note in the discussion that in vitro reactivities (and, consequently, secondary structure models) may not reflect secondary structures in vivo (Discussion lines 204–205). I did not go into detail regarding the specific technical considerations mentioned here because they are outside the scope of my expertise.

      I am not sure that top-ranked non-synonymous C→U positions have particularly high DMS values after  coarse adjustment for basepairing (labeled amino acid mutations in Figure 2). Of the six common mutations  used as examples, three have minimum values in the dataset considered (which is processed  normalized/filtered data rather than raw data) and three do not have very high DMS reactivity.

      However, there is clearly information in base reactivity that is not captured by a binary basepairing model,  which is indicated by residual positive correlation between DMS reactivity and mutational fitness after  adjustment. I now include a figure demonstrating this for synonymous C→U substitutions as Figure S3, and  I have tried to clarify the language throughout the manuscript to make it clear that a more accurate  adjustment is possible.

      Additionally, like all such RNA probing experiments within cells, it remains difficult to deconvolve DMS/SHAPE low reactivity with RNA accessibility (e.g. from protein binding).

      I agree, and in revising this manuscript it was interesting to see that Nakata et al (discussed above) identified relatively large single-stranded regions with enhanced UC→UU substitution frequencies with exogenous APOBEC3A expression, while C29095U, for example, is a single unpaired base with high DMS reactivity and high empirical C→U substitution frequency (discussed briefly in the introduction of the revised manuscript). Future analyses could consider heterogeneity in secondary structure as well as secondary structures with low heterogeneity where strained conformations could have higher reactivity.

      This work presents clear methods and an easy-to-access bioinformatic pipeline, which can be applied to other RNA viruses. Of note, it can be readily implemented in existing datasets. Finally, this study raises novel mechanistic questions on how mutational fitness is not correlated to secondary structure in the same way for every substitution.

      Overall, this work highlights the importance of studying mutational fitness beyond an immune evasion perspective. On the other hand, it also adds to the viral intrinsic constraints to immune evasion.

      Reviewer 2 Author recommendations:

      Even though the experiment was not performed in this manuscript, it would be helpful for the readers if it was briefly explained how secondary structure is inferred from DMS reactivity, as this technique is not broadly used.

      It is not objective to refer to the Lan et al. model of RNA structure as "high quality" given the limitations of their experimental approach (low MOI, asynchronous infection, DMS-only, no long-range interactions) and the lack of external validation of the structure of the genome they propose.

      I removed “high-quality” from the abstract. Since a result of the paper is that secondary structure correlates with synonymous substitution rates, this is an observation that can be used to retrospectively compare the quality of secondary structure models in this respect. I updated the manuscript to include such a comparison, and did not find a large difference between secondary structure models (Results lines 110–122). I added a discussion of how multiple data sources can potentially be integrated to more accurately predict the impact of a substitution on viral fitness.

      I have also added a brief discussion of constraints on how much we can confidently infer from these  experiments given limitations of the experimental approach. I note that DMS and SHAPE data provide  information that can be combined to make a stronger model, and that predictions can be rapidly tested  given observations by Gout (Symonds?) et al that  in  vitro  substitution rates correlate with those observed  during the pandemic (Discussion lines 195–201).

      Mutational fitness from Bloom & Neher was derived throughout the pandemic, much of which came from a period with the most active surveillance (Delta / Omicron waves). Consequently, these viruses differ from the WA1 strain used by Lan et al. far more than the 3 nt differences between lineage A and B that the author refers to. The following sentence should therefore be revised to avoid misleading the reader:

      "Additionally, note that DMS data was obtained in experiments using the WA1 strain in Lineage A, which differs from the more common Lineage B at 3 positions and could have different secondary structure."

      Revised:

      “Additionally, note that DMS data was obtained in experiments using the WA1 strain in Lineage A,  which differs from the more common Lineage B at 3 positions and could have different secondary  structure. Furthermore, mutational fitness is estimated from the phylogenetic tree of published  sequences (the public UShER tree (Turakhia et al., 2021) additionally curated to filter likely artifacts  such as branches with numerous reversions) that are typically far more divergent and subsequently  will have somewhat different secondary structures. Since the dataset used for mutational fitness  aggregates data across viral clades, my analysis will not capture secondary structure variation  between clades or indels and masked sites that were not considered in that analysis (Bloom and  Neher, 2023).”

      To determine the extent to which the results depend on the single RNA structure model, it would be informative "turn the crank again" on the analysis with one of the other RNA structure datasets for SARS-CoV-2 (though most other datasets suffer from similar problems of asynchronicity of infection).

      I have added comparisons when the Lan et al secondary structure model is replaced by one of two others derived from SHAPE data as described above. Also, I conducted preliminary comparisons of underlying DMS and SHAPE reactivity data as described above, but I opted not to include these in the revised manuscript given that the methods differ beyond the chemical probe used. I also discuss how multiple data sources can potentially be integrated to more accurately predict the impact of a substitution on viral fitness.

      In Figure 1 it would be helpful to add the values of the unpaired/basepaired ratios in the plot for clarity.

      Furthermore, a similar analysis using the substitution frequency, which strengthens the conclusions, is mentioned in the text, however, it is not shown. It could be shown as part of Figure 1, or as a supplementary figure.

      This was a good suggestion since numbers around 1 are not perceived as being very significant. I added  the ratio of median unpaired:paired rates to Figure 1, updated the corresponding manuscript text and the  figure caption, and note that the numbers are somewhat changed from the first version of my manuscript  because of updating to use the most up-to-date mutational fitness estimates.
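      For readers who want to see what a ratio of this kind involves, the following is a minimal sketch of a median unpaired:paired rate ratio. The function name and the toy rates are hypothetical, and the manuscript's own scripts may compute the statistic differently.

```python
from statistics import median


def median_rate_ratio(rates, paired):
    """Ratio of the median rate at unpaired positions to that at paired positions."""
    unpaired_rates = [r for r, p in zip(rates, paired) if not p]
    paired_rates = [r for r, p in zip(rates, paired) if p]
    return median(unpaired_rates) / median(paired_rates)


# Hypothetical toy rates, for illustration only
toy_rates = [2.0, 4.0, 6.0, 1.0, 2.0, 3.0]
toy_paired = [False, False, False, True, True, True]
ratio = median_rate_ratio(toy_rates, toy_paired)
```

      A ratio above 1 in such a calculation would indicate higher typical rates at unpaired positions.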

      It is not clear how the two constants were calculated to obtain the "adjusted mutational fitness". It could be shown as part of Figure 2, or as a supplementary figure.

      I added dashed lines and arrows to Figure 2 showing median paired/unpaired mutational fitnesses and the  adjustment made to normalize to the overall median. I also added Figure S3 showing this for synonymous  substitutions, where it is more clear given the lower fraction of mutations with substantial fitness impacts.
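      One plausible reading of this coarse adjustment is sketched below: each group (paired, unpaired) is shifted by a constant so that its median coincides with the overall median. This is an illustration under that assumption, with a hypothetical function name and toy values, and not the manuscript's analysis script.

```python
from statistics import median


def adjust_fitness(fitness, paired):
    """Shift paired and unpaired groups so each group median equals the overall median."""
    overall = median(fitness)
    paired_vals = [f for f, p in zip(fitness, paired) if p]
    unpaired_vals = [f for f, p in zip(fitness, paired) if not p]
    # The two constants: one additive offset per group
    offsets = {
        True: overall - median(paired_vals),
        False: overall - median(unpaired_vals),
    }
    return [f + offsets[bool(p)] for f, p in zip(fitness, paired)]


# Hypothetical toy values, for illustration only
toy_fitness = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
toy_paired = [True, True, True, False, False, False]
adjusted = adjust_fitness(toy_fitness, toy_paired)
```

      After such a shift, a paired position with high adjusted fitness stands out relative to other paired positions rather than being masked by the lower baseline of its group.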

      Minor comments

      Statements like "the current fast-growing lineage JN.1.7" never age well... please revise to state the period of time to which this refers.

      Revised:

      “…lineage JN.1.7, which had over 20% global prevalence in Spring 2024…”

      Also, I checked the list of mutations and the examples given remain in the top 15 ranked basepaired,  non-synonymous C→U mutations (BA.2-defining C26060U is added to the list, but I did not update to  include this). It replaces C9246U, which was not mentioned in the first version of the manuscript.

      Similarly, please provide context for the reader in the phrase: "This was one mutation that characterized the B.1.177 lineage" (e.g. add its early reference as "EU1" and that it predominated in Europe in autumn 2020, prior to the emergence of the Alpha variant).

      Revised to add detail:

      This was one of the mutations that characterized the B.1.177 lineage. This lineage, also known as EU1, characterized a majority of sequences in Spain in summer 2020 and eventually in several other countries in Europe prior to the emergence of the Alpha variant. However, it was unclear whether this lineage had higher fitness than other lineages or if A222V specifically conferred a fitness advantage.

      "massive sequencing of SARS-CoV-2" - the meaning of the word "massive" is unclear. Revise.

      Revised: “…millions of patient SARS-CoV-2 sequences published during the pandemic…”

    1. Author response:

      The following is the authors’ response to the original reviews.

      We were pleased that many of the critical comments of the reviewers have allowed us to improve our manuscript. In addition to revising the originally submitted figures, we performed new experiments (e.g. new Fig.2, Fig.3, Fig.4, and Fig.6) and revised the manuscript substantially following the reviewers’ comments and suggestions on our initial submission. A point-by-point response to the reviewers’ critiques is provided below, and new supportive data are provided in this revised manuscript. In response to the Reviewers’ comments, we revised the title to “Cold induces brain region-selective cell activity-dependent lipid metabolism”.

      Reviewer #1:

      Strengths:

      A strength of the study is trying to better understand how metabolism in the brain is a dynamic process, much like how it has been viewed in other organs. The authors also use a creative approach to measuring in vivo lipid peroxidation via delivery of a BD-C11 sensor through a cannula to the region in conjunction with fiber photometry to measure fluorescence changes deep in the brain.

      We thank the Reviewer so much for the positive comments on this interesting study on metabolism in the brain.

      Weaknesses:

      One weakness was many of the experiments were done in a manner that could not distinguish between the contributions of neurons and glial cells, limiting the extent of conclusions that could be made. While this is not easily doable for all experiments, it can be done for some. For example, the Fos experiments in Figure 3 would be more conclusive if done with the labeling of neuronal nuclei with NeuN, as glial cells can also express Fos. To similarly show more conclusively that neurons are being activated during cold exposure, the calcium imaging experiments in Figure S3 can be done with cold exposure. 

      We agree with the Reviewer’s comments. We revised the original Figure 3 (new Figure 6) and Figure S3 (new Figure S4). Our data show that cold increased Fos-positive cells in the PVH (Figure 6) and increased neuronal Ca2+ signals (new Figure S4). Because it is difficult to exclude the involvement of astrocytes in cold-induced lipid metabolism, we revised the title and the text, replacing “neuronal” activity with “cell” activity, and we conclude that cold induces lipid metabolism depending on “cell activity” rather than “neuronal activity”. Studying cell type-specific contributions to the cold-induced effects on lipid metabolism, to which we assume both neurons and glial cells contribute, will require substantial effort beyond the scope of this study.

      Additionally, many experiments are only done with the minimal three animals required for statistics and could be more robust with additional animals included.  

      We thank this reviewer for the comments. We increased the sample sizes accordingly in this revised manuscript.

      Another weakness is that the authors do not address whether manipulating lipid droplet accumulation or lipid peroxidation has any effect on PVH function (e.g. does it change neuronal activity in the region?).

      We thank this reviewer for bringing up this interesting point. The focus of this study was to examine how cold modulates lipid metabolism in the brain. How brain lipid metabolism (e.g. manipulating LD accumulation or lipid peroxidation) modulates neuronal activity is another interesting project, which would require substantial effort beyond the scope of this study. Manipulating LDs or peroxidation would affect multiple cellular signaling pathways, and physiological experimental conditions would need to be developed. However, to address this reviewer’s question, we performed preliminary studies in which we treated brain slices with the lipid peroxidation inhibitor a-TP and recorded PVH neurons, but we did not observe differences in firing rates between a-TP-treated brain slices and controls (data not shown).

      Reviewer #2:

      Strengths:

      A set of relatively novel and interesting observations. Creative use of several in vivo sensors and techniques.

      We thank the Reviewer so much for the positive comments on our studies in both concept and techniques. 

      Weaknesses:  

      (1) The physiological relevance of lipolysis and thermogenesis genes in the PVH. The authors need to provide quantitative and substantial characterizations of lipid metabolism in the brain beyond a panel of qPCRs, especially considering these genes are likely expressed at very low levels. mRNA and protein level quantification of genes in Fig 1, in direct comparison to BAT/iWAT, should be provided. Besides bulk mRNA/protein, IHC/ISH-based characterization should be added to confirm to cellular expression of these genes.

      We agree with the Reviewer’s comments and thank this reviewer for the constructive suggestions. To address them, we performed additional experiments to verify the cold-induced expression of lipolytic genes and proteins. For example, we stained ATGL and HSL in both neurons and astrocytes in the PVH. Consistent with the increased gene expression, cold increased protein expression of ATGL (new Figure 2) and HSL (new Figure 3) in both neurons and astrocytes. We also performed western blots of p-HSL and HSL and observed that cold increased the expression level of p-HSL (new Figure 4). These new results support our conclusions and further demonstrate that cold increases lipid metabolism in the PVH.

      (2) The fiberphotometry work they cited (Chen 2022, Andersen 2023, Sun 2018) used well-established, genetically encoded neuropeptide sensors (e.g., GRABs). The authors need to first quantitatively demonstrate that adapting BD-C11 and EnzCheck for in vivo brain FP could effectively and accurately report peroxidation and lipolysis. For example, the sensitivity, dynamic range, and off-time should all be calibrated with mass spectrometry measurements before any conclusions can be made based on plots in Figures 4, 5, and 6. This is particularly important because the main hypothesis heavily relies on this unvalidated technique.

      We thank this reviewer for the comments. Fiber photometry has been well demonstrated to detect fluorescently labelled biomolecules in my laboratory and other labs, as indicated in the publications cited above. In this study, we combined photometry with commercially developed and validated fluorescent probes of lipid metabolism to monitor lipid metabolic dynamics in vivo. We verified this approach in both brain (this study) and peripheral adipose tissues (another project). In particular, our data in this study show that the lipid peroxidation inhibitor a-TP blocked the cold-induced lipid peroxidation signals (Fig. 7A-C) and the pan-lipase inhibitor DEUP blocked the cold-induced lipolytic signals (Fig. 8A-C). These results demonstrate that the signals detected by photometry indeed reflect lipid peroxidation and lipolysis, respectively, in the brain. We agree with the reviewer’s suggestion regarding mass spectrometry measurements, but it is not feasible for us to perform mass spectrometry in the brain in vivo at this moment.

      (3) Generally, the histology data need significant improvement. It was not convincing, for example, in Figure 3, how the Fos+ neurons can be quantified based on the poor IF images where most red signals were not in the neurons. 

      We thank this reviewer for this comment. We performed additional experiments to increase the sample size and now present higher-quality images.

      (4) The hypothesis regarding the direct role of brain temperature in cold-induced lipid metabolism is puzzling. From the introduction and discussion, the authors seem to suggest that there are direct brain temperature changes in responses to cold, which could be quite striking. However, this was not supported by any data or experiments. The authors should consolidate their ideas and update a coherent hypothesis based on the actual data presented in the manuscript. 

      We thank this reviewer for bringing up this comment and constructive suggestions. To make this study more concise on the cold-induced lipid metabolism, we removed the statements related to the brain temperature.

      Reviewer #1 (Recommendations For The Authors):

      An additional minor weakness is that the authors are redundant in their discussion, sometimes repeating sections from the introduction (e.g. this line in the discussion "Evidence shows that the brain's energy expenditure efficiency largely depends on the temperature (Yu et al., 2012), and temperature gradients between different brain regions exist (Anderson and Moser, 1995; Delgado and Hanai, 1966; Hayward and Baker, 1968; McElligott and Melzak, 1967; Moser and Mathiesen, 1996; Thornton, 2003)"). 

      We thank the Reviewer for these comments. We revised the text following the suggestions accordingly and removed the statements and references related to brain temperatures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary: 

      IPF is a disease lacking regressive therapies which has a poor prognosis, and so new therapies are needed. This ambitious phase 1 study builds on the authors' 2024 experience in Sci Tran Med with positive results with autologous transplantation of P63 progenitor cells in patients with COPD. The current study suggests that P63+ progenitor cell therapy is safe in patients with ILD. The authors attribute this to the acquisition of cells from a healthy upper lobe site, removed from the lung fibrosis. There are currently no cell-based therapies for ILD and in this regard the study is novel with important potential for clinical impact if validated in Phase 2 and 3 clinical trials. 

      Strengths: 

      This study addresses the need for an effective therapy for interstitial lung disease. It offers good evidence that the cells used for therapy are safe. In so doing it addresses a concern that some P63+ progenitor cells may be proinflammatory and harmful, as has been raised in the literature (articles which suggested some P63+ cells can promote honeycombing fibrosis; references 26 & 35). The authors attribute the safety they observed (without proof) to the high HOPX expression of administered cells (a marker found in normal Type 1 AECs). The totality of the RNASeq suggests the cloned cells are not fibrogenic. They also offer exploratory data suggesting a relationship between clone roundness and PFT parameters (and a negative association between patient age and clone roundness).

      We thank the reviewer for the important comments.

      Weaknesses: 

      The authors can conclude they can isolate, clone, expand, and administer P63+ progenitor cells safely; but with the small sample size and lack of a placebo group, no efficacy should be implied.

      We thank the reviewer for the suggestion and agree that we should be more cautious in discussing efficacy in the current study.

      Specific points: 

      (1) The authors acknowledge most study weaknesses including the lack of a placebo group and the concurrent COVID-19 in half the subjects (the high-dose subjects). They indicate a phase 2 trial is underway to address these issues. 

      N/A

      (2) The authors suggest an efficacy signal on pages 18 (improvement in 2 subjects' CT scans) and 21 (improvement in DLCO) but with such a small phase 1 study and such small increases in DLCO (+5.4%) the authors should refrain from this temptation (understandable as it is). 

      We believe that exploring potential efficacy signals is also one aim of this study. All these efficacy endpoint analyses were planned prior to the start of the clinical trial (as registered at ClinicalTrials.gov), and the data need to be analyzed regardless.

      (3) Likewise most CT scans were unchanged and those that improved were in the mid-dose group (albeit DLCO improved in the 2 patients whose CT scans improved). 

      Yes, it is.

      (4) The authors note an impressive 58m increase in 6MWTD in the high-dose group but again there is no placebo group, and the low-dose group has no net change in 6MWTD at 24 weeks. 

      Yes.

      (5) I also raise the question of the enrollment criteria in which 5 patients had essentially normal DLCO/VA values. In addition there is no discussion as to whether the transplanted stem cells are retained or exert benefit by a paracrine mechanism (which is the norm for cell-based therapies).

      Thank you for your detailed feedback.  The enrollment criteria are based on DLCO instead of DLCO/VA. And we would like to further discuss the possible benefit by paracrine mechanism in the revised manuscript.

      Recommendations for the authors: 

      (1) Four of the enrolled subjects had normal DLCO/VA (% of predicted) (>90% of predicted). This raises questions about the severity of their illness see: Table 1: Subjects 103, 105, 112, and 204 have DLCO/VA % predicted >90% of predicted and would appear not to qualify for the study. While technically enrollment criteria for DLCO are satisfied, DLCO/VA is an equally valid measure of ILD severity, and these 4 cases seem very mild. 

      Thank you for your detailed feedback. Yes, the current inclusion criterion is based on DLCO, not DLCO/VA. We believe that improvements in both DLCO and DLCO/VA are meaningful. In future trials, we will consider DLCO/VA as an inclusion criterion as well.

      (2) The authors state "Resolution of honeycomb lesion was also observed in patients of higher dose groups". This appears inaccurate as only 2 subjects in the study showed CT improvement and they were not in the highest dose group. This statement is an overreach for a Phase 1 study and should be removed from the abstract and more balance inserted in the text. The phase 2 study they are doing will answer these questions. 

      Thank you. We have changed our statement about efficacy in the abstract.

      a) Under exclusion criteria: More detail is required as to what defines "subjects who cannot tolerate cell therapy". 

      Patients who could not tolerate previous cell therapy, for example mesenchymal stem cell transplantation, would not be included in the current trial.

      b) Figure S6 is important and should be in the main manuscript. This Figure shows that many (6) subjects had COVID at some trial measurement time points. This is an unfortunate confounder for efficacy signals (but efficacy is not the point of this study). Second, Figure 6 (in my view) shows little efficacy signal, which is a reminder to the authors that efficacy should not be implied in a study that was not powered to detect efficacy. 

      We agree that efficacy should be discussed very carefully.

      (3) Figure S3: It appears at some doses there is a significant rise in monocytes (1M cells) and neutrophils (3 M cells). 

      Thank you for your reasonable concerns regarding the safety of the treatment. The monocyte counts in the S3 patients, even after an increase, remain within the reference range, and we therefore consider this elevation not clinically meaningful. One patient exhibited a significant increase in neutrophils at 24 weeks, which was attributed to a grade II adverse event, acute bronchitis, that was unrelated to cell therapy. The symptoms resolved within three days following treatment with appropriate medication.

      (4) Figure 3: I wonder about the statistical significance of the 6MWD. Was this done by repeat measure ANOVA? The analysis suggests a p=0.08 but all error bars between low and high dose overlap and the biggest difference is at 24 weeks, and that appears to be labelled as not significant.

      Thank you for the reminder. The 6MWD result with a p-value of 0.008 was derived by comparing 6MWD at the 24-week time point with baseline within the higher-dose group. A paired t-test was therefore used for this analysis. In the revised version, we have labelled these comparisons more clearly.
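
      As an illustration only (this is not the analysis code used in the study), a within-group paired t-test of the kind described above can be sketched as follows. The function name and the numeric values are hypothetical placeholders, not trial data; the sketch returns the t statistic and degrees of freedom and leaves the p-value lookup to a statistics package.

```python
import math
import statistics


def paired_t_statistic(baseline, follow_up):
    """Return (t, degrees_of_freedom) for a paired t-test on matched measurements."""
    if len(baseline) != len(follow_up):
        raise ValueError("paired samples must have the same length")
    # Work on the per-subject differences (follow-up minus baseline).
    diffs = [after - before for before, after in zip(baseline, follow_up)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical placeholder values (arbitrary units), NOT data from the trial.
baseline = [1.0, 2.0, 3.0, 4.0]
week_24 = [2.0, 4.0, 5.0, 5.0]
t_stat, dof = paired_t_statistic(baseline, week_24)
```

      Because each subject serves as their own control, the test depends only on the spread of the within-subject differences, which is why overlapping error bars between dose groups do not by themselves contradict a significant within-group change.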

      Reviewer #2 (Public review):

      Summary: 

      This manuscript describes a first-in-human clinical trial of autologous stem cells to address IPF. The significance of this study is underscored by the limited efficacy of standard-of-care anti-fibrotic therapies and increasing knowledge of the role p63+ stem cells in lung regeneration in ARDS. While models of acute lung injury and p63+ stem cells have benefited from widespread and dynamic DAD and immune cell remodeling of damaged tissue, a key question in chronic lung disease is whether such cells could contribute to the remodeling of lung tissue that may be devoid of acute and dynamic injury. A second question is whether normal regions of the lung in an otherwise diseased organ can be identified as a source of "normal" p63+ stem cells, and how to assess these stem cells given recently identified p63+ stem cell variants emerging in chronic lung diseases including IPF. Lastly, questions of feasibility, safety, and efficacy need to be explored to set the foundation for autologous transplants to meet the huge need in chronic lung disease. The authors have addressed each of these questions to different extents in this initial study, which has yielded important if incomplete information for many of them. 

      Strengths: 

      As with a previous study from this group regarding autologous stem cell transplants for COPD (Ref. 24), they have shown that the stem cells they propagate do not form colonies in soft agar or cancers in these patients. While a full assessment of adverse events was confounded by a wave of Covid19 infections in the study participants, aside from brief fevers it appears these transplants are tolerated by these patients. 

      We thank the reviewer for the important comments.

      Weaknesses: 

      The source of stem cells for these autologous transplants is generally bronchoscopic biopsies/brushings from 5th-generation bronchi. Although stem cells have been cloned and characterized from nasal, tracheal, and distal airway biopsies, the systematic cloning and analysis of p63+ stem cells across the bronchial generations is less clear. For instance, p63+ stem cells from the nasal and tracheal mucosa appear committed to upper airway epithelia marked by 90% ciliated cells and 10% goblet cells (Kumar et al., 2011. Ref. 14). In contrast, p63+ stem cells from distal lung differentiate to epithelia replete with Club, AT2, and AT1 markers. The spectrum of p63+ stem cells in the normal bronchi of any generation is less studied. In the present study, cells are obtained by bronchoscopy from 3-5 generation bronchi and expanded by in vitro propagation. Single-cell RNA-seq identifies three clusters they refer to as C1, C2, and C3, with the major C1 cluster said to have characteristics of airway basal cells and C2 possibly the same cells in states of proliferation. Perhaps the most immediate question raised by these data is the nature of the C1/C2 cells. Whereas they are clearly p63/Krt5+ cells as are other stem cells of the airways, do they display differentiation character of "upper airway" marked by ciliated/goblet cell differentiation or those of the lung marked by AT2 and AT1 fates? This could be readily determined by 3-D differentiation in so-called air-liquid interface cultures pioneered by cystic fibrosis investigators and should be done as it would directly address the validity of the sourcing protocol for autologous cells for these transplants. This would more clearly link the present study with a previous study from the same investigators (Shi et al., 2019, Ref. 9) whereby distal airway stem cells mitigated fibrosis in the murine bleomycin model.

      The authors should also provide methods by which the autologous cells are propagated in vitro as these could impact the quality and fate of the progenitor cells prior to transplantation.

      We fully agree that the sub-populations of the progenitor cells should be further analyzed, and we will attempt this in the revised manuscript. The methods to expand P63+ lung progenitor cells have been described in full detail by the Frank McKeon/Wa Xian group (Rao et al., STAR Protocols, 2020) and were adapted to the pharmaceutical-grade technology patented by Regend Therapeutics, Ltd.

      The authors should also make a more concerted effort to compare Clusters 1, 2, and 3 with the variant stem cell identified in IPF (Wang et al., 2023, Ref. 27). While some of the markers are consistent with this variant stem cell population, others are not. A more detailed informatics analysis of normal stem cells of the airways and any variants reported could clarify whether the bronchial source of autologous stem cells is the best route to these transplants.  

      We thank the reviewer for the good suggestion and will make a more detailed comparison in the revised manuscript.

      Other than these issues the authors should be commended for these first-in-human trials for this important condition.

      Thank you so much for the kind compliment.

      Recommendations for the authors: 

      Described in the review text but the authors need to be clear about how they propagated autologous stem cells in vitro.

      (1) Perhaps the most immediate question raised by these data is the nature of the C1/C2 cells. Whereas they are clearly p63/Krt5+ cells as are other stem cells of the airways, do they display differentiation character of "upper airway" marked by ciliated/goblet cell differentiation or those of the lung marked by AT2 and AT1 fates?

      The differentiation potential of P63+/KRT5+ basal progenitor cells has been analyzed in multiple previous studies, which are cited in the revised Introduction. In brief, human P63+ progenitor cells can differentiate into airway epithelial cells in the airway, while giving rise to immature but functional AT1 cells in the alveolar area.

      (2) The authors should also provide methods by which the autologous cells are propagated in vitro as these could impact the quality and fate of the progenitor cells prior to transplantation.

      The methods to expand P63+ lung progenitor cells have been described in full detail by the Frank McKeon/Wa Xian group (Rao et al., STAR Protocols, 2020) and were adapted to the pharmaceutical-grade technology patented by Regend Therapeutics, Ltd.

      (3) A more detailed informatics analysis of normal stem cells of the airways and any variants reported could clarify whether the bronchial source of autologous stem cells is the best route to these transplants.

      We thank the reviewer for the kind suggestion and have included the comparative analysis in revised Figure S2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public review)

      Weaknesses: 

      The main weakness of the manuscript is that to a large degree, one of its main conclusions (MAP symmetry underlies differences in regenerative capacity) relies mainly on a correlation, without firmly establishing a causal link. However, this weakness is relatively minor because (1) it is partially addressed with the Spastin KO and (2) there isn't a trivial way to show a causal relationship in this case.

      We thank Reviewer #1 for their positive assessment of our manuscript. To further strengthen the claim that MAP asymmetry underlies differences in regenerative capacity, we could investigate the effect of depleting other MAPs that lose asymmetry after conditioning lesion (CRMP5 and katanin). One would expect that similarly to spastin, this would disrupt the physiological asymmetry of DRG axons and impair axon regeneration. We further discussed this issue in the revised version of the manuscript (page 17, line 381).

      Reviewer #2 (Public review)

      Weaknesses:

      In order for the method to be used it needs to be better described. For instance what proportion of neurons develop just two axonal branches, one of which is different? How selective are the researchers in finding appropriate neurons?

      We thank Reviewer #2 for their positive assessment of our manuscript. As suggested, we included further methodological details on the in vitro system in the revised version of the manuscript. We have previously evaluated the percentage of DRG neurons exhibiting different morphologies in our cultures: multipolar (4±1%), bipolar (35±8%), bell-shaped (17±5%), and pseudo-unipolar neurons (43±3%). This was included in the revised manuscript in Figure 1B and on page 5, line 107. All the pseudo-unipolar neurons analysed had distinct axonal branches in terms of diameter and microtubule dynamics. For imaging purposes, we selected pseudo-unipolar neurons with axons unobstructed by other cells or neurites within a distance of at least 20–30 μm from the bifurcation point, to ensure optimal imaging. In the case of laser axotomy experiments, this distance was increased to 100–200 μm to ensure clear analysis of regeneration. These selection criteria are now detailed in the Methods (page 19, line 417, and page 21, line 474).

      Reviewer #3 (Public review):

      (1) Weaknesses:

      While some of the data are compelling, experimental evidence only partially supports the main claims. In its current form, the study is primarily descriptive and lacks convincing mechanistic insights. It misses important controls and further validation using 3D in vitro models.

      We recognize the importance of further exploring the contribution of other MAPs to microtubule asymmetry and regenerative capacity of DRG axons. In future work, we plan to investigate this issue using knockout mice for katanin and CRMP5. Regarding the mechanisms underlying the differential localization of proteins in DRG axons, we performed in-situ hybridization to evaluate the availability of axonal mRNA but no differences were found between central and peripheral DRG axons (Figure 4 – figure supplement 2). To address whether differences in protein transport exist, we attempted to transduce DRG neurons with GFP-tagged spastin both in vitro and in vivo. However, these experiments were inconclusive as very low levels of spastin-GFP were detected. We are actively optimizing these approaches and will address this challenge in future studies. These points were further discussed in the revised manuscript (page 15, line 330 and page 17, line 381).

      (2) Given the heterogeneity of dorsal root ganglion (DRG) neurons, it is unclear whether the in vitro model described in this study can be applied to all major classes of DRG neurons. 

      We acknowledge the diversity of DRG neurons and agree that assessing the presence of different DRG subtypes in our culture system will enrich its future use. Despite this heterogeneity, we focused on DRG neuron features that are common to all subtypes, i.e., pseudo-unipolarization and higher regenerative capacity of peripheral branches. This point was addressed on page 14, line 309 of the revised manuscript.

      (3) Also unclear is the inconsistency with embryonic DRG cultures with embryonic (E)16 from rats and E13 from mice (spastin knockout and wild-type controls). 

      Given our previous experience in establishing DRG neuron cultures from E16 Wistar rats and E13 C57BL/6 mice, these developmental stages are equivalent, yielding cultures of DRG neurons with similar percentages of different morphologies. Of note, in our colonies, gestation length is ~19 days in C57BL/6 mice (background of the spastin knockout line) and ~22 days in Wistar Han rats. This was further clarified in the Methods (page 18, line 404).

      (4) Furthermore, the authors stated (line 393) that only a small subset of cultured DRG neurons exhibited a pseudo-unipolar morphology. The authors should include the percentage of the neurons that exhibit a pseudo-unipolar morphology.

      We have previously evaluated the percentage of DRG neurons exhibiting different morphologies in our cultures: multipolar (4±1%), bipolar (35±8%), bell-shaped (17±5%), and pseudo-unipolar neurons (43±3%). This was included in the revised manuscript in Figure 1B and on page 5, line 107. In line 393, we referred specifically to an experimental setup where DRG neuron transduction was done, and 30 transduced neurons were randomly selected for longitudinal imaging. From these, the number of viable pseudo-unipolar DRG neurons was limited by both the random nature of viral transduction and light-induced toxicity throughout continuous imaging over seven consecutive days at hourly intervals. This was clarified in the revised manuscript (page 20, line 438).

      (5) The significance of studying microtubule polymerization to DRG asymmetry in vitro is questionable, especially considering the model's validity. The authors might consider eliminating the in vitro data and instead focus on characterizing DRG asymmetry in vivo both before and after a conditioning lesion. If the authors choose to retain the in vitro data, classifying the central and peripheral-like branches in cultured DRG neurons will require further in-depth characterization. Additional validation should be performed in adult DRG neuron cultures not aged in vitro.

      The in vitro system presented here reliably reproduces several key features of DRG neurons observed in vivo, including asymmetry in axon diameter, regenerative capacity, axonal transport, and microtubule dynamics. Of note, most studies in the field have been done using multipolar DRG neurons that do not recapitulate in vivo morphology and asymmetries. Thus, the current in vitro model serves as a versatile tool for advancing our understanding of DRG biology and associated diseases. This system is particularly suited to study axon regeneration asymmetries, and enables the investigation of mechanisms occurring at the stem axon bifurcation, such as asymmetric protein transport and microtubule dynamics, which are challenging to examine in vivo due to the length of the stem axon and the difficulty of locating the DRG T-junction. It will be important to optimize similar cultures using adult DRG neurons. However, this comes with challenges, such as lower cell viability, as is the case for multiple other neuron types, for which the vast majority of cultures are obtained from embryonic tissue. These concerns were addressed in the revised version of the manuscript (page 13, line 296 and page 14, line 302).

      (6) The comparison of asymmetry associated with a regenerative response between in vitro and in vivo paradigms has significant limitations due to the nature of the in vitro culture system. When cultured in isolation, DRG neurons fail to form functional connections with appropriate postsynaptic target neurons (the central branch) or to differentiate the peripheral domains associated with the innervation of target organs. Rather than growing neurons on a flat, hard surface like glass, more physiologically relevant substrates and/or culturing conditions should be considered. This approach could help eliminate potential artifacts caused by plating adult DRG neurons on a flat surface. Additionally, the authors should consider replicating their findings in a 3D culture model or using dorsal root ganglia explants, where both centrally and peripherally projecting axons are present.

      We agree that a more sophisticated system, such as a compartmentalized culture, holds great potential for future research, and we are currently developing such models. A compartmentalized system would enable the separation of three compartments: central nervous system neurons, DRG neurons, and peripheral targets. While previous efforts to create compartmentalized DRG cultures have been reported (e.g., PMID: 11275274 and PMID: 37578145), these systems have not demonstrated the development of pseudo-unipolar morphology. Incorporating non-neuronal DRG cells into the DRG neuron compartment may successfully support the development of a pseudo-unipolar morphology.

      We also recognize the importance of dimensionality in fostering pseudo-unipolar morphology. Of note, our model provides a 3D-like environment, as DRG glial cells are continuously replicating over the 21 days in culture. In relation to DRG explants, we attempted their use but encountered limitations with confocal microscopy as the axial resolution was insufficient to resolve processes at the DRG T-junction or within individual branches. The above issues are now discussed in the revised manuscript (page 14, line 312).

      (7) Panels 5H-J require additional processing with astrocyte markers to accurately define the lesion borders. Furthermore, including a lower magnification would facilitate a direct comparison of the lesion site. 

      In our study, we relied on the alignment of nuclei to delineate the lesion site because, in our accumulated experience, this provides an accurate definition of the lesion border. Outside the lesion, the nuclei are well-aligned, while at the lesion site, they become randomly distributed. Additionally, CTB staining further supports the identification of the rostral border of the lesion, as most injured central DRG axons stop their growth at the injury site. This was further detailed in the Methods of the revised manuscript (page 32, line 730).

      (8) The use of cholera toxin subunit B (CTB) to trace dorsal column sensory axons is prone to misinterpretation, as the tracer accumulates at the axon's tip. This limitation makes it extremely challenging to distinguish between regenerating and degenerating axons.

      While alternative methods to trace or label regenerating axons exist, CTB is a well-established and widely used tracer for central sensory projections, as shown in different studies (PMID: 22681683, PMID: 26831088 and PMID: 33349630). Regarding the concern of possible CTB labeling in degenerating axons, we believe this is unlikely to be the case in our system, as in spinal cord injury controls, CTB-positive axons are nearly absent. Also, as regeneration was investigated six weeks after injury, axon degeneration has most likely already occurred, as shown in PMID: 15821747 and PMID: 25937174.

      Recommendations for the authors: 

      Reviewer #1:

      (1) Figure 1 can be improved by adding a quantification of the fraction of neurons at each stage as a function of time.

      We have updated Figure 1 to include the quantification of the percentages of different DRG neuron morphologies at DIV21 (Figure 1B), which corresponds to the stage at which all in vitro experiments were conducted.

      (2) Figure 3A: why are retrograde transport events not shown?

      Retrograde transport events are not displayed as results did not reach statistical significance.

      (3) Figure 3 and 4: Combine the quantifications of with/without lesion, such that not only the differences between branches are apparent, but also the differences induced in each branch by the lesion.

      As requested, only combined quantifications of microtubule dynamics for naive and conditioning lesion are provided in the revised version of Figure 3 (Figures 3H and 3K), to highlight both branch-specific differences and lesion-induced changes. However, for Figure 4, as the western blots for naive and conditioning lesion were performed on separate gels, it is unfeasible to combine their quantification.

      (4) Figure 5: does spastin KO lead to a difference in the "MAP signature" of each branch? Also, if in addition to MAPs there are other known molecules (and an antibody is available) that show differential localization to peripheral/central branches, it would be nice to check if this asymmetry is also lost in spastin KO.

      Evaluating the MAP signature in DRG axons from spastin KO mice will be important to explore in future experiments. Despite some scattered reports in the literature, our study is the first to identify a distinct protein signature of central and peripheral DRG axons. This is especially relevant in the case of Tau, as irrespective of the experimental conditions, its levels are always increased in the peripheral DRG axon.

      Reviewer #2:

      (1) Please provide a more complete description of the culture method. Do all neurons develop two asymmetric branches or just a few, and how are they selected? Does the timing of the events in vitro correspond with what is happening to the neurons in embryos?

      We have included the percentages of the various DRG neuron morphologies at DIV21 in the revised manuscript (Figure 1B and on page 5, line 107). Additionally, a more detailed description of the culture method is now provided in the Methods, including the criteria used to select pseudo-unipolar neurons (page 19, line 417, and page 21, line 474). 

      Regarding the timing of events, upon DRG dissociation, neurons reinitiate polarization, taking 21 days to reach approximately 40% pseudo-unipolar morphology. A similar percentage is reached at E16.5 during rat development in vivo (PMID: 8729965).

      (2) Are the neurons and their branches resting on the glia? Is there any relation to the presence of glia and the type of growth that is seen?

      Yes, neurons and their branches rest on glia. This is required for DRG pseudo-unipolarization. In future studies, we plan to further investigate neuron-extrinsic mechanisms leading to pseudo-unipolarization, and to identify the specific glial cell type(s) needed throughout this process. This is now discussed in the revised manuscript (page 14, line 306).

      (3) Is it possible to trace microtubules so as to see whether the microtubules of the two branches mix, or whether they remain separate all the way to the cell bodies?

      We used DRG neurons transduced with EB3-GFP to examine microtubule polymerization at the T-junction through live imaging. This revealed a high continuum of polymerization from the stem axon to the central-like axon (Figure 4 – figure supplement 2D-G). To further determine whether microtubules from both branches mix or remain separate, alternative techniques such as FIB-SEM could be used. This point is now further discussed in the revised manuscript (page 16, line 352).

      (4) Using the term MAPs would lead readers to expect to see an analysis of different levels of MAP1, MAP2, etc. It would be interesting to see this if the authors have done it, but it is not necessary for the paper.

      We assessed the expression of MAP2 via western blot in DRG peripheral and central axons and no differences were found. This is now referred to in the Discussion (page 15, line 327).

      (5) The regeneration experiments on the spastin knockouts are complicated by the lesion being in CNS tissue, which introduces various issues. Is there a difference in regeneration after dorsal root crush?

      We have not yet examined whether regeneration differs after dorsal root crush in the spastin knockout model. However, this presents an interesting question, as Schwann cells in the dorsal root may support regeneration of central DRG axons.

      Reviewer #3:

      The authors stated that the normality of the datasets was tested using the Shapiro-Wilk or D'Agostino-Pearson omnibus normality test. Given the low sample size (n=4) for some of the experiments presented (e.g., Figure 3B), it is not clear how normality was assessed which justifies the use of parametric tests.

      We followed GraphPad’s recommendations for selecting the appropriate normality test (https://www.graphpad.com/support/faqid/959/). The D'Agostino-Pearson omnibus K2 test, recommended for its versatility, was used when sample size was 8 or more. For smaller sample sizes (n < 8), we used the Shapiro-Wilk test, which is also widely used in biological research and can be employed with datasets of at least 3 values. These tests guided our decision-making regarding the use of parametric or non-parametric statistical tests.
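
      For illustration, the sample-size rule described above can be written as a small helper. This is only a sketch of the stated decision rule; the function name and the returned labels are hypothetical, and the thresholds are those quoted in the response (8 or more values for the D'Agostino-Pearson test, at least 3 values for the Shapiro-Wilk test).

```python
def choose_normality_test(n):
    """Pick a normality test from the sample size, following the rule stated above.

    n >= 8      -> D'Agostino-Pearson omnibus K2 test
    3 <= n < 8  -> Shapiro-Wilk test
    n < 3       -> neither test is applicable
    """
    if n < 3:
        raise ValueError("normality cannot be tested with fewer than 3 values")
    if n >= 8:
        return "D'Agostino-Pearson omnibus K2"
    return "Shapiro-Wilk"


# Example: a dataset with n = 4 (as in the sample size the reviewer mentions)
# falls under the Shapiro-Wilk branch of the rule.
test_for_n4 = choose_normality_test(4)
```

      Note that a rule of this kind only selects which test is run; with very small samples any normality test has limited power, which is the concern the reviewer raises.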

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      The manuscript by Zhang et al. explores the effect of autophagy regulator ATG6 on NPR1-mediated immunity. The authors propose that ATG6 directly interacts with NPR1 in the nucleus to increase its stability and promote NPR1-dependent immune gene expression and pathogen resistance. This novel role of ATG6 is proposed to be independent of its role in autophagy in the cytoplasm. The authors demonstrate through biochemical analysis that ATG6 interacts with NPR1 in yeast and very weakly in vitro. They further demonstrate using overexpression transgenic plants that in the presence of ATG6-mcherry the stability of NPR1-GFP and its nuclear pool is increased.

      Comments on latest version:

      The term "invasion" has to be replaced with infection, as it doesn't have much meaning to this particular study. I already explained this point in the first review, but authors did not address it throughout the manuscript.

      Thank you for your constructive feedback. We have taken your suggestion into account and replaced "invasion" with "infection" in the revised manuscript (Lines 44,45,99,100,298,341,387,415,461,463,464,1002).

      In fig. 1e there's no statistical analysis. How can one show measurements from multiple samples without statistical analysis? All the data points have to be shown in the graph and statistics performed. In the arg6-npr1 and snrk-npr1 pairs no nuclear marker is included. How can one know where the nucleus is, particularly in such poor quality low res. images? The nucleus marker has to be included in this analysis and shown. This is an important aspect of the study as nuclear localization of ATG6 is proposed to be essential for its new function.

      Thank you for bringing this to our attention. We conducted the BiFC experiments again using nls-mCherry transgenic tobacco, which yielded clearer images. The results clearly demonstrate that ATG6 interacts with NPR1 in both the cytoplasm and nucleus. The YFP signal in the nucleus co-localizes with nls-mCherry (a nuclear localization marker). SnRK2.8 was employed as a positive control for NPR1 interaction. The relative fluorescence intensity of YFP was analyzed using ImageJ software; n = 15 independent images were analyzed to quantify YFP fluorescence. All data points are displayed in the figure, and we also conducted a Student's t-test analysis. We have incorporated these results into the revised manuscript (Fig 1d and e).

      Co-localization provided in the fig. S2 cannot complement this analysis, particularly since no cytoplasmic fraction is present for NPR1-GFP in fig. S2.

      Thank you for your observation. We repeated the experiment and confirmed that NPR1 and ATG6 co-localize in both the nucleus and cytoplasm. The image in Figure S2 has been updated accordingly.

      In the alignment in fig 2c, it is not explained what are the species the atg6 is taken from. The predicted NLS has to be shown in the context of either the entire protein sequence alignment or at least individual domain alignment with the indication of conserved residues (consensus). They have to include more species in the analysis, instead of including 3 proteins from a single species. Also, the predicted NLS in atg6 doesn't really have the classical type architecture, which might be an indication that it is a weak NLS, consistent with the fact that the protein has significant cytoplasmic accumulation. They also need to provide the NLS prediction cut-off score, as this parameter is a measure of NLS strength.

      Line 150: the NLS sequence "FLKEKKKKK" is a wrong sequence.

      Thank you for your suggestion. In both plants and animals, proteins are transported to the nucleus via specific nuclear localization signals (NLSs), which are typically characterized by short stretches of basic amino acids (Dingwall and Laskey, 1991, Raikhel, 1992, Nigg, 1997). Following your recommendation, we re-predicted potential NLS sequences in the ATG6 protein using NLSExplorer (http://www.csbio.sjtu.edu.cn/bioinf/NLSExplorer). Although we did not identify a classical monopartite NLS, we discovered a bipartite NLS similar to the consensus bipartite sequence (KRX(10-12)K(KR)(KR)) (Kosugi et al., 2009) in the carboxy-terminal region (475-517 aa) of ATG6, with a cut-off score of 2.6. These findings are consistent with substantial accumulation of ATG6 in the cytoplasm and minimal accumulation in the nucleus. Additionally, our comparison of ATG6 C-terminal sequences across several species, including Microthlaspi erraticum, Capsella rubella, Brassica carinata, Camelina sativa, Theobroma cacao, Brassica rapa, Eutrema salsugineum, Raphanus sativus, Hirschfeldia incana and Brassica napus, indicates that this bipartite NLS is relatively conserved. We have incorporated these results into the revised manuscript (lines 450-160).
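
      To make the consensus notation concrete, the bipartite pattern quoted above can be expressed as a regular expression. This is only an illustrative sketch and not how NLSExplorer scores sequences: it assumes that KRX(10-12)K(KR)(KR) means "K, R, a spacer of 10 to 12 arbitrary residues, K, then two residues that are each K or R". The function name and the test sequence are hypothetical; the sequence is synthetic and is not taken from ATG6.

```python
import re

# Assumed reading of the consensus: K R, 10-12 residue spacer, K, then [K/R][K/R].
# The lookahead lets overlapping candidate motifs be reported as well.
BIPARTITE_NLS = re.compile(r"(?=(KR[A-Z]{10,12}K[KR][KR]))")


def find_bipartite_nls(sequence):
    """Return (0-based start, matched motif) for each consensus-like bipartite NLS."""
    seq = sequence.upper()
    return [(m.start(1), m.group(1)) for m in BIPARTITE_NLS.finditer(seq)]


# Synthetic example sequence (NOT a real protein): KR + 10-residue spacer + KKR.
synthetic = "MA" + "KR" + "A" * 10 + "KKR" + "GS"
hits = find_bipartite_nls(synthetic)
```

      A strict pattern match of this kind reports only exact consensus hits, whereas the response describes a motif that is "similar to" the consensus, so a scoring-based predictor may flag sequences that this expression would miss.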

      In fig. 3d no explanation for the error bars is included, and what type of statistical analysis is performed is not explained.

      Thank you for bringing this to our attention. In Figure 3d, a Student's t-test was conducted to analyze the data. The mean and standard deviation were calculated from three biological replicates, and the relevant description has been included in the figure notes.

      Reference

      Dingwall, C. and Laskey, R.A. (1991) Nuclear targeting sequences--a consensus? Trends Biochem Sci, 16, 478-481.

      Kosugi, S., Hasebe, M., Matsumura, N., Takashima, H., Miyamoto-Sato, E., Tomita, M. and Yanagawa, H. (2009) Six classes of nuclear localization signals specific to different binding grooves of importin alpha. J Biol Chem, 284, 478-485.

      Nigg, E.A. (1997) Nucleocytoplasmic transport: signals, mechanisms and regulation. Nature, 386, 779-787.

      Raikhel, N. (1992) Nuclear targeting in plants. Plant Physiol, 100, 1627-1632.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      However, given that S1P is upstream of NF-κB signaling, it is unclear if it offers conceptual innovations as compared to previous studies from the same team (Palazzo et al. 2020; 2022, 2023).

      We find distinct differences between the impacts of S1P- and NFκB-signaling on glial activation, neuronal differentiation of the progeny of MGPCs and neuronal survival in damaged retinas. In the current study we demonstrate that 2 consecutive daily intravitreal injections of S1P selectively activated mTor (pS6) and Jak/Stat3 (pStat3), but not MAPK (pERK1/2) signaling in Müller glia. Further, inhibition of S1P synthesis (SPHK1 inhibitor) decreased ATF3, mTor (pS6) and pSmad1/5/9 levels in activated Müller glia in damaged retinas. Inhibition of NFκB-signaling in damaged chick retinas did not impact the above-mentioned cell signaling pathways (Palazzo et al., 2020). Thus, S1P-signaling impacts cell signaling pathways in MG that are distinct from NFκB, but we cannot exclude the possibility of cross-talk between NFκB and these pathways. Further, inhibition of NFκB-signaling potently decreases numbers of dying cells and increases numbers of surviving ganglion cells (Palazzo et al., 2020). Consistent with these findings, a TNF orthologue, which presumably activates NFκB-signaling, exacerbates cell death in damaged retinas (Palazzo et al., 2020). By contrast, 5 different drugs targeting S1P-signaling had no effect on numbers of dying cells and only one S1PR1 inhibitor modestly decreased numbers of dying cells (current study). Although two different inhibitors of NFκB-signaling suppressed the proliferation of microglia in damaged retinas (Palazzo et al., 2020), none of the S1P-targeting drugs had any effect upon the proliferation of microglia (current study). In addition, inhibition of NFκB does not influence the neurogenic potential of MGPCs in damaged chick retinas (Palazzo et al., 2020), whereas inhibition of S1P receptors (S1PR1 and S1PR3) and inhibition of S1P synthesis (SPHK1) significantly increased the differentiation of amacrine-like neurons in damaged retinas (current study).
      Collectively, in comparison to the effects of pro-inflammatory cytokines and NFκB-signaling, our current findings indicate that S1P-signaling through S1PR1 and S1PR3 in Müller glia has distinct effects upon cell signaling pathways, neuronal regeneration and cell survival in damaged retinas. We will revise text in the Discussion (pages 33-34) to better highlight these important distinctions between NFκB- and S1P-signaling.

      Reviewer #2 (Public review):

      Weaknesses:

      The methodology is not very clean. A number of drugs (inhibitors/ antagonists/agonists signal modulators) are used to modulate S1P expression or signaling in the retina without evidence that these drugs are reaching the target cells. No alternative evaluation if the drugs, in fact, are effective. The drug solubility in the vehicle and in the vitreous is not provided, and how did they decide on using a single dose of each drug to have the optimal expected effect on the S1P pathway?

      Müller glia are the predominant retinal cell type that expresses S1P receptors. Consistent with these patterns of expression, we report Müller glia-specific effects of different agonists and antagonists that increase or decrease S1P-signaling. Since we compare cell-level changes within contralateral eyes wherein one retina is exposed to vehicle and the other is exposed to vehicle plus drug, it seems highly probable that the drugs are eliciting effects upon the Müller glia. It is possible that the responses we observed resulted from drugs acting on extra-retinal tissues, which might secondarily release factors that elicit cellular responses in Müller glia. However, this seems unlikely given the distinct patterns of expression for different S1P receptors in Müller glia, and the outcomes of inhibiting Sphk1 or S1P lyase on retinal levels of S1P.

      For example, we provide evidence that S1PR1 and S1PR3 expression is predominant in Müller glia in the chick retina using single cell-RNA sequencing and fluorescence in situ hybridization (FISH). Thus, we expect S1PR1/3-targeting small-molecule inhibitors to act directly on Müller glia, which is consistent with our read-outs of cell signaling with injections of S1P in undamaged retinas. We show that SPHK1 and SGPL1, which encode the enzymes that synthesize or degrade S1P, are expressed by different retinal cell types, including the Müller glia. The efficacy of the drugs that target SPHK1 and SGPL1 was assessed by measuring levels of S1P in the retina. By using liquid chromatography and tandem mass spectrometry (LC-MS/MS), we provide data that inhibition of S1P synthesis (inhibition of SPHK1) significantly decreased levels of S1P in normal retinas, whereas inhibition of S1P degradation (inhibition of SGPL1) increased levels of S1P in damaged retinas (Fig. 5). These data suggest that the SPHK1 inhibitor and the SGPL1 inhibitor specifically act at the intended target to influence retinal levels of S1P. Further, inhibition of SPHK1 (to decrease levels of S1P) results in decreased levels of ATF3, pS6 (mTor) and pSMAD1/5/9 in Müller glia, consistent with the notion that reduced levels of S1P in the retina impact signaling in Müller glia. Finally, we find similar cellular responses to chemically different agonists or antagonists, and we find opposite cellular responses to agonists and antagonists, which are expected to be complementary if the drugs are specifically acting at the intended targets in the retina. We will revise the Discussion to better address caveats and concerns regarding the actions and specificity of different drugs within the retina following intravitreal delivery.

      We will provide the drug solubility specifications and estimates of the initial maximum dose per eye for each drug. For chick eyes between P7 and P14, these estimates will assume a volume of about 100 µl of liquid vitreous, 800 µl of gel vitreous and an average eye weight of 0.9 grams. We will revise Table 1 (pharmacological compounds) with ranges of reported in vivo ED50s (mg/kg) for drugs and we will list the calculated initial maximum dose (mg/kg equivalent) per eye. Doses were chosen based on estimates of the initial maximum ocular dose that were within the range of reported ED50s. However, as is the case for any in vivo model system, it is difficult to predict rates of drug diffusion out of the vitreous, how quickly the drugs are cleared from the entire eye, how much of the compound enters the retina, and how quickly the drug is cleared from the retina. Accordingly, we assessed drug specificity and sites of activation by relying upon readouts of cell signaling pathways that are parsed with patterns of expression of different S1P receptors and measurements of retinal levels of S1P following exposure to drugs targeting enzymes that synthesize or degrade S1P, as described above.
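      The arithmetic behind such an estimate can be sketched as follows. This is a minimal sketch, not the calculation used for Table 1: it takes the eye weight (0.9 g) and total vitreous volume (100 µl liquid plus 800 µl gel) stated above, while the function names, the injected amount and the ED50 range in the example are hypothetical placeholders.

```python
def initial_max_dose(injected_mg, eye_weight_g=0.9, vitreous_ul=900.0):
    """Estimate the initial maximum ocular dose for one intravitreal injection.

    Returns (mg per kg of eye tissue, mg per ml of vitreous). Assumes the whole
    injected amount is present in the eye at time zero, before any clearance.
    """
    mg_per_kg = injected_mg / (eye_weight_g / 1000.0)   # grams -> kilograms
    mg_per_ml = injected_mg / (vitreous_ul / 1000.0)    # microlitres -> millilitres
    return mg_per_kg, mg_per_ml


def within_reported_range(mg_per_kg, ed50_low, ed50_high):
    """Check whether the estimate falls inside a reported ED50 range (mg/kg)."""
    return ed50_low <= mg_per_kg <= ed50_high


# Hypothetical example: 9 micrograms injected, hypothetical ED50 range of 1-30 mg/kg.
dose_mg_per_kg, dose_mg_per_ml = initial_max_dose(injected_mg=0.009)
print(round(dose_mg_per_kg, 3), round(dose_mg_per_ml, 4))
print(within_reported_range(dose_mg_per_kg, 1.0, 30.0))
```

      Because the estimate ignores diffusion out of the vitreous and clearance from the eye, it is an upper bound on exposure rather than a measure of how much compound reaches the retina.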

      Reviewer #1 (Recommendations for the authors):

      I am wondering if Muller glia can be considered as fully differentiated at early postnatal stages as those used in this study. Is this mechanism operative in adult retinas? Could the authors perform studies in older animals, just to have the proof of principle that the proposed mechanism is retained.

      Chickens are considered to be adult at about 4 months of age, when the females start laying eggs. Unfortunately, housing, maintenance, handling and experimentation on large adult chickens have proven to be challenging. Nevertheless, there is evidence that Müller glia reprogramming remains robust in chick retinas from P1 through P30, but the zones of proliferation shift away from central retina and become increasingly confined to the retinal periphery (Fischer, 2005). MG “maturation” appears to occur in a central-to-peripheral gradient, much like the process of embryonic retinal differentiation, but a zone of regeneration-competent MG remains in the periphery during adolescent development (Fischer, 2005).

      We have defined central vs peripheral retina in the Methods.

      To partially address this question, we have generated a new supplemental Figure 6 showing (i) SPHK1 fluorescent in-situ labeling of central and peripheral regions at P10, and (ii) analysis of EdU+Sox2+ MGPCs in central versus peripheral regions treated with NMDA +/- S1PR1 inhibitor or NMDA +/- SPHK1 inhibitor. We find that patterns of S1PR1 transcription in the central region are similar to the peripheral region (not shown), and S1PR1 inhibition modestly increased numbers of MGPCs in central regions. Unlike the peripheral regions of retina, SPHK1 FISH signal in the central region remains low at 48 hours post-injury (supplemental Fig. 6). Additionally, we found that the SPHK1 inhibitor had no effect on numbers of proliferating MGPCs in the central regions of retina, whereas SPHK1 inhibitors stimulated proliferation of MGPCs in the periphery (Fig. 4). It is likely that mature MG in central retinal regions are not responsive to SPHK1 inhibition due to low levels of expression.

      We have previously shown that Notch-related genes show unique patterns of expression in the central and peripheral retinas, and expression levels significantly change at P0, P7, and P21 (Ghai et al., 2010). We found that Notch inhibition reduced cell death and numbers of MGPCs in central regions but not peripheral regions. Recent sc-RNA sequencing analysis of murine macula and peripheral retinal regions has revealed interesting differences in NFKBIA/Z and NFIA expression, possibly indicating a difference in the early inflammatory transcriptional response to retinal damage (Zhang et al., 2024, bioRxiv). We believe that spatial sequencing of peripheral “immature” and central “mature” chick Müller glia will be a useful tool in the future to reveal key differences in signaling pathway-related gene expression which confer a competence for regeneration in the periphery.

      We have added text to the Results (pages 20-21) and Discussion (page 32) to address the S1P-signaling in central (mature MG) vs peripheral (immature MG) regions of the retina.

      Minor points.

      The abstract is difficult to follow and consists of a list of what activates or represses the formation of MGPC. Please rewrite the abstract to integrate information and provide a clearer message. Also, please include the species of study in the abstract and mention it again at the beginning of the results, at least.

      We have rewritten the abstract to simplify and clarify our main points (p 2).

      Lines 65-69. The sentence is unclear, perhaps there are words either missing or in excess and there is a need to check the spelling.

      We have simplified this sentence to improve clarity and referenced our recently published review to support it.

      Lines 112-113. Please explain why " retinas were treated with saline, NMDA, or 2 or 3 doses insulin+FGF2 and the combination of NMDA and insulin+FGF2". There is a reference but readers will appreciate understanding right away why.

      We have added a sentence to clarify the purpose of comparing gene expression patterns in MG and MGPCs in NMDA-damaged retinas versus retinas treated with insulin+FGF2.

      Lines 223-257. This list of experiments is difficult to follow and perhaps should be summarized better. Somehow lines 257-261 say it all.

      We have revised this section to clarify differences in outcomes between S1PR1/3 activators and inhibitors. We also stated the enzymatic functions of SPHK1 and SGPL1 to improve clarity.

      Lines 392-441. Comparative expression analysis should be summarized as the message is somehow simple but the description is rather lengthy.

      We have revised our comparative expression analyses to be more concise.

      Reviewer #2 (Recommendations for the authors):

      (1) Only a single dose of the drugs (inhibitor/ antagonists/agonists signal modulators) is used for each drug, as shown in Table 1. How do they know this is an effective dose?

      We estimated the appropriate dose based on the initial maximum dose, which we based on the reported ED50 values for each drug. We have revised Table 1 to include this information.

      (2) Most of the drugs appeared to be hydrophobic, but except for sphingosine and S1P, all are described to be injected with sterile saline. They must provide solubility characteristics of these drugs in solvents. For example, FTY720 is not water-soluble, which raises the question of all of their drugs' solubility, bioavailability to the cells of interest, and their effectivity in signal transduction in the retinal cells.

      Some S1P-targeting compounds were delivered in 20% DMSO in saline to support the solubility of the different lipophilic small molecule agonists/antagonists. We have added information to the Methods (p 5 and p 6) and to Table 1 to describe the use of DMSO to solubilize these drugs. We have also revised Table 1 with ranges of reported ED50s (mg/kg) for all drugs and listed the calculated initial maximum dose (mg/kg) per eye.

      (3) Drugs were delivered to the vitreous chamber, but there was no information on how they would cross the inner limiting membrane to affect or modulate S1P metabolism in retinal MG or to bind the S1P receptors on MG or other retinal cell types.

      All selected compounds are small-molecule drugs, many of which are structural analogues of sphingosine or S1P. These drugs would be classified as BDDCS Class II drugs, meaning they have low solubility but high cell permeability. Thus, it is highly probable that they diffuse across the ILM to act on S1P receptors on MG, but it is also likely that their bioavailability is more limited, requiring a higher dose, repeated doses, and the use of solubilizing agents. We have clarified our use of DMSO to solubilize these drugs (p 6) according to vendor recommendations (p 5). This information has been added to the Methods.

      (4) Gene expression is a very dynamic process; without providing more evidence that the expression changes are the direct effect of the drug treatment, the conclusions made based on the gene expression profiles are not strong. Additional points:

      We do not make assertions that changes in scRNA-seq expression profiles are the direct result of S1P-targeting drugs. We report significant changes in cellular expression profiles following NMDA-induced retinal damage or ablation of microglia. We feel that new experiments to assess the gene expression profiles of retinal cells that are directly downstream of the different S1P-targeting drugs are better suited for future studies.

      (5) Please add in the introduction that there is only one sphingosine kinase in chicken, as no SPHK2 is known to be present.

      We have added additional information regarding the expression of SPHK1 and SPHK2 genes in the chick genome (p 4).

      (6) Fig 1d and in many other UMAP clusters, the low expressing genes are barely visible (Ex. 1d, S1PR2, and S1PR3); please extract them in separate UMAP clusters and provide them in supplements.

      We have revised supplemental Figure 1 to include separate panels for each of the S1P-related genes.

      (7) The Figure References for SPHK1 (Fig. 2e), SGPL1 (Fig. 2e), ASAH1 (Fig. 2f), CERS6 (Fig. 2f), and CERS5 (Fig. 2f) in the line # 124- 132 should belong to Figure 1, not Figure 2.

      We have corrected these figure references (p 14).

      (8) The description of the expression of zebrafish genes does not match the figures. For example, 'Similarly, sphk1 was detected in very few cells in the retina (Fig. 10j). By comparison, sphk2 was detected in a few bipolar cells and rod photoreceptors (Fig. 10j). Similar to patterns of expression seen in chick and human retinas, sgpl1 was detected in microglia and a few cells scattered among the different clusters of inner retinal neurons and rod photoreceptors (Fig. 10j)', the expression of these genes are not in very few or few scattered cells rather in many cells.

      We have revised these statements to improve clarity and more accurately describe the data in Figure 10 (p 28).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors employed a combinatorial CRISPR-Cas9 knockout screen to uncover synthetically lethal kinase genes that could play a role in drug resistance to kinase inhibitors in triple-negative breast cancer. The study successfully reveals FYN as a mediator of resistance to depletion and inhibition of various tyrosine kinases, notably EGFR, IGF-1R, and ABL, in triple-negative breast cancer cells and xenografts. Mechanistically, they demonstrate that KDM4 contributes to the upregulation of FYN and thereby is an important mediator of drug resistance. All together, these findings suggest FYN and KDM4A as potential targets for combination therapy with kinase inhibitors in triple-negative breast cancer. Moreover, the study may also have important implications for other cancer types and other inhibitors, as the authors suggest that FYN could be a general feature of drug-tolerant persister cells.

      Strengths:

      (1) The authors used a large combination matrix of druggable tyrosine kinase gene knockouts, enabling studying of co-dependence of kinase genes. This approach mitigates off-target effects typically associated with kinase inhibitors, enhancing the precision of the findings.

      (2) The authors demonstrate the importance of FYN in drug resistance in multiple ways. They demonstrate synergistic interactions using both knockouts and inhibitors, while also revealing its transcriptional upregulation upon treatment, strengthening the conclusion that FYN plays a role in the resistance.

      (3) The study extends its impact by demonstrating the potent in vivo efficacy of certain combination treatments, underscoring the clinical relevance of the identified strategies.

      Weaknesses:

      (1) The methods and figure legends are incomplete, posing a barrier to the reproducibility of the study and hindering a comprehensive understanding and accurate interpretation of the results.

      We thank the reviewer for pointing this out. We have added as much detail as possible to the methods and figure legends to maximize reproducibility and accurate interpretation of our results, as described in our responses to the recommendations for authors.

      (2) The authors make use of a large quantity of public data (Fig. 2D/E, Fig. 3F/L/M, Fig 4C, Fig 5B/H/I), whereas it would have strengthened the paper to perform these experiments themselves. While some of this data would be hard to generate (e.g. patient data) other data could have been generated by the authors. The disadvantage of the use of public data is that it merely comprises associations, but does not have causal/functional results (e.g. FYN inhibition in the different cancer models with various drugs). Moreover, by cherry-picking the data from public sources, the context of these sources is not clear to the reader, and thus harder to interpret correctly. For example, it is not directly clear whether the upregulation of FYN in these models is a very selective event or whether it is part of a very large epigenetic re-programming, where other genes may be more critical. While some of the used data are from well-known curated databases, others are from individual papers that the reader should assess critically in order to interpret the data. Sometimes the public data was redundant, as the authors did do the experiments themselves (e.g. lung cancer drug-tolerant persisters), in this case, the public data could also be left out.

      More importantly, the original sources are not properly cited. While the GEO accession numbers are shown in a supplementary table, the articles corresponding to this data should be cited in the main text, and preferably also in the figure legend, to clarify that this data is from public sources, which is now not always the case (e.g. line 224-226). If these original papers do already mention the upregulation of FYN, and the findings from the authors are thus not original, these findings should be discussed in the Discussion section instead of shown in the Results.

      We welcome the reviewer’s concern. As the reviewer pointed out, our analysis of FYN expression levels across multiple studies of drug-tolerant cells may merely reflect association and not a causal relationship. We have at least shown that FYN inhibition may reduce drug tolerance in TNBC and in EGFR inhibitor-treated lung cancer cells (figures 2H, 5E). The causal role of FYN in the emergence of drug tolerance in other cancers treated with different drugs (such as irinotecan-treated colon adenocarcinoma and gemcitabine-treated pancreatic adenocarcinoma) may be beyond the scope of this study. We added a brief discussion addressing this concern in lines 273-275.

      We also added proper citations of the public data used in this study in the main text and figure legends in lines 267-269. The GEO accession numbers are listed in supplementary table S2. Importantly, none of the referenced studies identified FYN as a key factor in generating drug-tolerant cells.

      (3) The claim in the abstract (and discussion) that the study "highlights FYN as broadly applicable mediator of therapy resistance and persistence", is not sufficiently supported by the results. The current study only shows functional evidence for this for an EGFR, IGF1R, and Abl inhibitor in TNBC cells. Further, it demonstrates (to a limited extent) the role of FYN in gefitinib and osimertinib resistance (also EGFR inhibitors) in lung cancer cells. Thus, the causal evidence provided is only limited to a select subset of tyrosine kinase inhibitors in two cancer types. While the authors show associations between FYN and drug resistance in other cancer types and after other treatments, these associations are not solid evidence for a causal connection as mentioned in this statement. Epigenetic reprogramming causing drug resistance can be accompanied by altered gene expression of many genes, and the upregulation of FYN may be a consequence, but not a cause of the drug resistance. Therefore, the authors should be more cautious in making such statements about the broad applicability of FYN as a mediator of therapy resistance.

      We fully agree with the reviewer’s concern that FYN upregulation is simply an association, and may not be the cause of drug tolerance and resistance. Therefore, to accurately convey our findings, we edited our manuscript in lines 34-36 in the abstract to “FYN expression is associated with therapy resistance and persistence by demonstrating its upregulation in various experimental models of drug-tolerant persisters and residual disease following targeted therapy, chemotherapy, and radiotherapy” and lines 288-290 in the discussion to “Upregulation of FYN is a general feature of drug tolerant cancer cells, suggesting the association of FYN expression with drug resistance and tumor recurrence after treatment.” We hope this satisfies the reviewer.

      (4) The rationale for picking and validating FYN as the main candidate gene over other genes such as FGFR2, FRK2, and TEK is not clear.

      a. While gene pairs containing FGFR2 knockouts seemed to be equally effective as FYN gene pairs in the primary screening, these could not be validated in the validation experiment. It is unclear whether multiple individual or a pool of gRNAs were used for this validation, or whether only 1 gRNA sequence was picked per gene for this validation. If only 1 gRNA per gene was used, this likely would have resulted in variable knockout efficiencies. Moreover, the T7 endonuclease assay may not have been the best method to check knockout efficiency, as it only implies endonuclease activity around a gene (but not to the extent of indels that can cause frameshifts, such as by TIDE analysis, or extent of reduction in protein levels by western blot).

      b. Moreover, FRK2 and TEK, also demonstrated many synergistic gene pairs in the primary screen. However, many of these gene pairs were not included in the validation screening. The selection criteria of candidate gene pairs for validation screening is not clear. Still, TEK-ABL2 was also validated as a strong hit in the validation screen. The authors should better explain the choice of FYN over other hits, and/or mention that TEK and FRK2 may also be important targets for combination treatment that can be further elucidated.

      We thank the reviewer for improving our manuscript. We had concerns about the generalizability of FGFR2, FRK and TEK in TNBC, as their expression is very low in MDA-MB-231 cells and they are not enriched in TNBC compared to cancer cell lines of other subtypes. We added a brief comment on this concern in the results and discussion sections (lines 150-154, figure S3). Although we acknowledge that the validation in figure 2B relies on only one guide RNA, we hope that, together with the validation by pharmacological inhibition of FYN (figure 2F-I), the reader and reviewer can be convinced of our key findings on synthetic lethality between FYN and other tyrosine kinases.

      (5) On several occasions, the right controls (individual treatments, performed in parallel) are not included in the figures. The authors should include the responses to each of the single treatments, and/or better explain the normalization that might explain why the controls are not shown.

      a. Figure 2G: The effect of PP2 treatment, without combined treatment, is not shown.

      b. Figure 2H/3G: The effect of the knockouts on growth alone, compared to sgGFP, is not demonstrated. It is unclear whether the viability of knockouts is normalized to sgGFP, or to each untreated knockout.

      c. Figure 2L: The effect of SB203580 as a single treatment is not shown.

      We thank the reviewer for pointing this out. The data shown in all figures listed in these concerns were normalized to the change in viability caused by the pharmacological or genetic perturbation that synergized with the TKIs (NVP-ADW742, gefitinib, etc.) used in the figures of the original manuscript. As the reviewer suggested, we have added the effect of SB203580 and PP2 treatment on cell viability in supplementary figures S4A and S4K. SB203580 had no significant effect on cell viability, while PP2 treatment caused a significant decrease in cell viability, which is expected as PP2 can inhibit the activity of multiple Src family kinases. Regardless of the effect of SB203580 and PP2 on cell viability as single agents, it is evident that treatment with TKIs synergistically decreased cell viability in cancer cell lines. The change in viability caused by FYN or histone lysine demethylase knockout is also provided in newly added figures S4D and S6C. Notably, genetic ablation of FYN or histone lysine demethylases had modest, if any, influence on cell viability.

      (6) The study examines the effects at a single, relatively late time point after treatment with inhibitors, without confirming the sequential impact on KDM4A and FYN. The proposed sequence of transcriptional upregulation of KDM4A followed by epigenetic modifications leading to FYN upregulation would be more compellingly supported by demonstrating a consecutive, rather than simultaneous, occurrence of these events. Furthermore, the protein level assessment at 48 hours (for RNA levels not clearly described), raises concerns about potential confounding factors. At this late time point, reduced cell viability due to the combination treatment could contribute to observed effects such as altered FYN expression and P38 MAPK phosphorylation, making it challenging to attribute these changes solely to the specific and selective reduction of FYN expression by KDM4A.

      We thank the reviewer for pointing this out. We performed a time course experiment for NVP-ADW742 treatment of MDA-MB-231 cells, shown in our newly added figure 3E. Surprisingly, treatment with NVP-ADW742 increased KDM4A protein level within two hours. FYN protein accumulation followed KDM4A accumulation after 24 hours. This observation, together with our chromatin immunoprecipitation data in figure 3O, provides evidence that FYN accumulation is a consequence of KDM4A accumulation and H3K9me3 demethylation upon TKI treatment. We newly discussed these data in the results and discussion sections in lines 214-216.

      (7) The cut-off for considering interactions "synergistic" is quite low. The manual of the used "SynergyFinder" tool itself recommends values above >10 as synergistic and between -10 and 10 as additive ( https://synergyfinder.fimm.fi/synergy/synfin_docs/). Here, values between 5-10 are also considered synergistic. Caution should be taken when discussing those results. Showing the actual dose response (including responses to each single treatment) may be required to enable the reader to critically assess the synergy, along with its standard deviation.

      We thank the reviewer for the careful comments. We reanalyzed our data with the SynergyFinder Plus tool (Zheng, Genomics, Proteomics, and Bioinformatics 2022), which implements mathematical models distinct from those of SynergyFinder 3 for a more faithful implementation of the Bliss and Loewe models and, more critically, calculates the statistical significance of the synergy. We provide updated synergy plots with statistics in figures 2F, 3J, and S4B. All drug combinations show statistically significant synergy (p<0.01). We have also added the raw data used to calculate synergy in figures 2F, 3J and S4B to supplementary dataset S2.
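      For readers unfamiliar with how a Bliss-type synergy score is obtained, the sketch below computes the Bliss excess for a single dose pair and labels it using the thresholds quoted by the reviewer (above 10 synergistic, between -10 and 10 additive). It is a minimal sketch of the textbook Bliss independence formula, not the SynergyFinder Plus implementation; the function names and the inhibition values in the example are hypothetical.

```python
def bliss_excess(inhibition_a, inhibition_b, inhibition_combo):
    """Bliss excess in percentage points for one dose pair.

    Inputs are fractional inhibitions between 0 and 1. Under Bliss independence
    the expected combined inhibition is a + b - a * b; a positive excess means
    the combination inhibited more than expected.
    """
    expected = inhibition_a + inhibition_b - inhibition_a * inhibition_b
    return 100.0 * (inhibition_combo - expected)


def classify_interaction(score, threshold=10.0):
    """Label a synergy score using the +/-10 convention quoted in the review."""
    if score > threshold:
        return "synergistic"
    if score < -threshold:
        return "antagonistic"
    return "additive"


# Hypothetical dose pair: 20% and 30% inhibition alone, 60% in combination.
score = bliss_excess(0.20, 0.30, 0.60)
print(round(score, 2), classify_interaction(score))
```

      A single-point score like this carries no uncertainty estimate, which is why replicate dose-response matrices and a significance test are needed before calling a combination synergistic.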

      (8) As the effect size on Western blots is quite limited and sometimes accompanied by differences in loading control, these data should be further supported by quantifications of signal intensities of at least 3 biological replicates (e.g. especially Figure 3A/5A). The figure legends should also state how many independent experiments the blots are representative of.

      We added quantifications for figures 3A and 5A to better depict our results. Figure legends were edited to indicate that the blots are representative of three independent experiments.

      (9) While the article provides mechanistic insights into the likely upregulation of FYN by KDM4A, this constitutes only a fragment of the broader mechanism underlying drug resistance associated with FYN. The study falls short in investigating the causes of KDM4A upregulation and fails to explore the downstream effects (except for p38 MAPK phosphorylation, which may not be complete) of FYN upregulation that could potentially drive sustained cell proliferation and survival. These omissions limit the comprehensive understanding of the complete molecular pathway, and the discussion section does not address potential implications or pathways beyond the identified KDM4A-FYN axis. A more thorough exploration of these aspects would enhance the study's contribution to the field.

      We appreciate the reviewer’s careful comment. We agree that our delineation of the mechanisms underlying TKI resistance in TNBC involving KDM4 and FYN is far from complete. Increases in the expression of histone demethylases have been observed in cancers treated with different drugs. The mechanisms governing the increase in histone demethylase expression are not known and are beyond the scope of this paper. We have added this point to the discussion section in lines 299-304.

      (10) FYN has been implied in drug resistance previously, and other mechanisms of its upregulation, as well as downstream consequences, have been described previously. These were not evaluated in this paper, and are also not discussed in the discussion section. Moreover, the authors did not investigate whether any of the many other mechanisms of drug resistance to EGFR, IGF1R, and Abl inhibitors that have been described, could be related to FYN as well. A more comprehensive examination of existing literature and consideration of alternative or parallel mechanisms in the discussion would enhance the paper's contribution to understanding FYN's involvement in drug resistance.

      FYN has been implicated in TKI resistance in CML cell lines (Irwin, Oncotarget, 2015). In that study, FYN was similarly transcriptionally upregulated in imatinib-resistant CML, and this upregulation depended on the EGR1 transcription factor. To address this concern, we generated EGR1 KO MDA-MB-231 cells and tested whether these cells retain the ability to accumulate FYN. Consistent with the previous study, imatinib treatment increased the EGR1 protein level. However, EGR1 knockout did not influence FYN accumulation in MDA-MB-231 cells. EGR1-mediated accumulation of FYN may be a context-specific phenomenon in CML (Figure S5B). We now discuss this result in the Results section (lines 187-190). We also acknowledge that SRC family kinases are generally involved in drug resistance in many cancers. We discuss recent findings regarding SRC family kinases in drug resistance in the Results section (lines 145-147) and the Discussion section (lines 315-317).

      Reviewer #2 (Public Review):

      Summary:

      Kim et al. conducted a study in which they selected 76 tyrosine kinases and performed CRISPR/Cas9 combinatorial screening to target 3003 genes in Triple-negative breast cancer (TNBC) cells. Their investigation revealed a significant correlation between the FYN gene and the proliferation and death of breast cancer cells. The authors demonstrated that depleting FYN and using FYN inhibitors, in combination with TKIs, synergistically suppressed the growth of breast cancer tumor cells. They observed that TKIs upregulate the levels of FYN and the histone demethylase family, particularly KDM4, promoting FYN expression. The authors further showed that KDM4 weakens the H3K9me3 mark in the FYN enhancer region, and the inhibitor QC6352 effectively inhibits this process, leading to a synergistic induction of apoptosis in breast cancer cells along with TKIs. Additionally, the authors discovered that FYN is upregulated in various drug-resistant cancer cells, and inhibitors targeting FYN, such as PP2, sensitize drug-resistant cells to EGFR inhibitors.

      Strengths:

      This study provides new insights into the roles and mechanisms of FYN and KDM4 in tumor cell resistance.

      Weaknesses:

      It is important to note that previous studies have also implicated FYN as a potential key factor in drug resistance of tumor cells, including breast cancer cells. While the current study is comprehensive and provides a rich dataset, certain experiments could be refined, and the logical structure could be more rigorous. For instance, the rationale behind selecting FYN, KDM4, and KDM4A as the focus of the study could be more thoroughly justified.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The methods and figure legends are incomplete, posing a barrier to the reproducibility of the study and hindering a comprehensive understanding and accurate interpretation of the results. A critical revision of these aspects is needed, for example:

      a. Catalogue numbers of certain products critical to reproduce the study (e.g. antibodies) and/or at what company they have been purchased (e.g. used compounds)

      b. On several occasions the used concentrations of drugs or exposure time are not mentioned (e.g. Figure 2H, G (PP2), I, J, K, L, etc.)

      c. Figure legend of figure panels E-I in Figure 5 seems to be completely incorrect and not consistent with the figure axis etc.

      d. RT-qPCR methodology is not described in Methods.

      e. Western blot methods are very limited: these should be described in more detail or cite an article that does.

      f. Organoid culture: Information about the source of tumour cells (e.g. pre-treatment biopsy, material after surgery), isolation of tumor cells (e.g. methodology, characterization of material) and culture conditions (e.g. culture time before the experiment) is lacking.

      g. Information about how gefitinib/osimertinib-resistant PC9 and HCC827 cells are generated (as well as culture conditions and where they are from) is missing.

      We thank the reviewer for pointing these out. We have done our best to add the experimental details needed for reproducibility to the Methods section and figure legends (lines 343-348, 408-426, 431-432, 439-453, 648-650, 671-672 and 691-693).

      (2) Figure 1B/C/D: it would be more meaningful if the most important hits (at least in one of these panels) were highlighted (e.g. line with gene-pair named), or visualized separately, so that the reader does not have to read the supplementary table to know what the most important hits were.

      We thank the reviewer for this suggestion. We added labels for the key synergistic gene pairs in Figure 1D, as the reviewer suggested.

      (3) qPCR data shown in Figure S4 is from 1 independent experiment. As these experiments (especially qPCR) can be rather variable and the effect size is not very large, I would highly recommend repeating these experiments, or excluding them, as conclusions from them are not solid.

      We judged that performing qPCR with the many drugs that did not cause substantial synergistic cell death with NVP-ADW742 in Figure S5C (Figure S4A in the previous version of the manuscript) would not provide much additional insight. Also, because we were more interested in finding direct regulators of FYN expression, we focused on drugs that inhibit epigenetic regulators that activate transcription. We therefore performed FYN qPCR with drug combinations involving GSK-J4 (KDM6 inhibitor) and pinometostat (DOT1L inhibitor). As shown in our newly added Figure S5D, GSK-J4 inhibited FYN expression, whereas pinometostat failed to do so. We also confirmed that knockout of KDM5 or KDM6 reproducibly failed to decrease FYN expression upon TKI treatment (Figures S5E and S5G). The new results are discussed in lines 193-198. We hope these additions satisfy the reviewer.

      (4) For validation of synergistic knockouts, it would be helpful for the interpretation to also show the viability/growth of each knockout (or treatment), instead of mostly normalized scores. For example, the reader now has no insight into whether FYN knockout itself already affects cell viability, or not. If it (or EGFR/IGF1R/ABL knockout) would already substantially affect cell viability, a further reduction in cell viability may not be as relevant as when it would not affect cell viability at all.

      We thank the reviewer for pointing this out. We revised Figure 2A and now show the raw changes in cell viability for each single and double knockout in Figure S2A. We hope this satisfies the reviewer.

      (5) The curve fitting as in Figure 2G is somewhat misleading. While the curve seems to be forced to go from 1-0, the +PP2 dose-response curve does actually not seem to start at 1, but rather at 0.8, likely resulting from the effect of PP2 as a single treatment, thus, effects may be interpreted as more synergistic than that they truly are.

      The results shown in Figure 2G are normalized within each condition (cells treated with PP2 or not) to better reflect the effect of NVP-ADW742, gefitinib and imatinib in the presence of PP2. A viability value starting at 0.8 is therefore not due to the effect of PP2 as a single agent (because the values are normalized to PP2-treated cells); rather, it arises because even a very small dose, particularly of NVP-ADW742, caused a modest decrease in viability. To depict our findings more accurately, we added a data point to Figure 2G at a TKI dose of 0 µM with a viability of 1. We also added details on the normalization of viability to the figure legends.
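
      The normalization described in this response can be illustrated with a minimal sketch. It assumes that each dose-response series is divided by the 0 µM TKI reading from the same PP2 condition; the function name and all numbers are hypothetical and are not taken from the manuscript.

```python
def normalize_viability(raw, tki_doses):
    """Divide each raw reading by the 0 uM TKI reading of the same PP2 arm."""
    baseline = raw[tki_doses.index(0.0)]
    return [value / baseline for value in raw]


# Hypothetical raw readouts (arbitrary units), for illustration only.
tki_doses_um = [0.0, 0.1, 1.0, 10.0]
raw_without_pp2 = [1000.0, 950.0, 800.0, 500.0]
raw_with_pp2 = [800.0, 640.0, 400.0, 160.0]  # PP2 alone lowers the raw signal

norm_without_pp2 = normalize_viability(raw_without_pp2, tki_doses_um)
norm_with_pp2 = normalize_viability(raw_with_pp2, tki_doses_um)

# Both curves start at 1 by construction, so the single-agent effect of PP2
# is removed; a value below 1 at the lowest nonzero dose reflects the TKI.
print(norm_without_pp2)
print(norm_with_pp2)
```

      Under this assumption, a curve that appears to start near 0.8 is one whose lowest nonzero TKI dose already reduces viability, which is why adding the 0 µM point makes the plot easier to read.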

      (6) The readability of the paper could be enhanced by higher-quality images (now the text is quite pixelated).

      We had technical difficulties in converting file types. We have replaced figures for better resolution for all main and supplementary figures.

      (7) The discussion now contains one paragraph about the selectivity of kinase inhibitors, and that repurposing of inhibitors with more relaxed specificity or multi-kinase inhibitors can be beneficial. This does not seem to fall within the scope of the study, as there was no comparison between selective and non-selective inhibitors. It was also not clearly mentioned that the non-selective inhibitors worked better than the gene knockouts, or that for example, KDM3 and KDM4 knockout together worked better than only KDM4 knockout. It is recommended to either remove this paragraph, or rephrase it so that it better fits the actual results

      We agree with the reviewer. We chose to remove this paragraph in lines 308-313.

      (8) The entire paper does not discuss any known functions of FYN. Its function could be very briefly introduced in the results section when highlighting it as an important hit. More importantly, its known role in cancer and especially drug resistance should be discussed in the discussion (see also Public review).

      We thank the reviewer for pointing this out. We added a brief description of the role of FYN in cancer malignancy and drug resistance in lines 145-147. In particular, FYN accumulation driven by the EGR1 transcription factor has been described in the context of imatinib-resistant chronic myeloid leukemia (Irwin, Oncotarget, 2015). To address this, we tested whether EGR1 knockout decreases the FYN level in MDA-MB-231 cells (Figure S5A). Notably, EGR1 knockout failed to decrease the FYN protein level. This result is discussed in lines 187-190.

      (9) Textual changes including:

      a. Line 29 (and others) "Massively parallel combinatorial CRISPR screens": I would rather choose a more descriptive term, such as "combinatorial tyrosine kinase knockout CRISPR screen", which already clarifies the screen used knockouts of (druggable) tyrosine kinases only. Using both "Parallel" and "combinatorial" is somewhat redundant, and "massively" is subjective, in my opinion.

      Manuscript edited as suggested (lines 29, 63, 86, 283). The term “massively parallel” has been removed, as it does not affect our scientific findings.

      b. Line 67 (and others): "to identify ... for elimination of TNBC": while this may be its potential implication, this study has identified genes in (mostly) TNBC cell lines and cell line xenografts. Please rephrase to something more within the scope of this research.

      Manuscript edited as suggested (lines 68-69) as “we utilize CombiGEM-CRISPR technology to identify tyrosine kinase inhibitor combinations with synergistic effect in TNBC cell line and xenograft models for potential combinatorial therapy against TNBC.” We hope it satisfies the reviewer.

      c. Line 31 (and others): Please check the capitals of words describing inhibitors, and make them consistent (e.g. Imatinib written with capital I, other inhibitors without capitals).

      We thank the reviewer for catching this error. We changed all “imatinib” and “osimertinib” to lowercase.

      d. Line 71: "... combining PP2, saracatinib (FYN inhibitor), .." Here it is not clear PP2 is a FYN inhibitor, and, as saracatinib is a well-known Src-inhibitor, it is not correct to just say "FYN inhibitor". Better to rephrase to something such as: "combining PP2 (Lck/Fyn inhibitor), saracatinib (Src/FYN inhibitor)".

      As the reviewer noted, most Src family kinase inhibitors are not selective for a specific member of the Src family. Therefore, we changed line 73 to “PP2, saracatinib (Src family kinase / FYN inhibitor).”

      e. Line 81: "The resulting library enabled massively parallel screens of pairwise knockouts, .." To clarify this is for the selected kinases only: "The resulting library enabled screens of pairwise knockouts of the 76 tyrosine kinase genes, .."

      Manuscript edited as suggested by the reviewer in line 86.

      f. Line 88 (and others): "after infection" consider rephrasing to "after transduction" as this is more commonly used when using lentiviral vectors only.

      We thank the reviewer for this. Every instance of “infection” that designated lentiviral transduction was changed to “transduction”.

      g. Line 97-99: While being described as "good" correlation, a correlation of the same sgRNA pair, yet in a different order, of r=0.5 does not seem to be very good, neither does a correlation of r=0.74 for biological replicates. Please consider describing in a less subjective way.

      We removed the subjective terms and changed the manuscript as follows: “sgRNA pair (e.g., sgRNA-A + sgRNA-B and sgRNA-B + sgRNA-A) were positively correlated (r = 0.50) and were combined when calculating Z (Fig. S1D). The Z scores for three biological replicates were also correlated with r = 0.74 between replicates #2 and #3 (Fig. S1E).” in lines 97-101.

      h. Lines 92-96 and lines 102-115: The results section here contains quite a lot of technical information. While some information may be directly needed to understand the described results (such as a very short and simple explanation of how to interpret gene interaction score), other information may be more appropriate for the Methods section, to enhance the readability of the paper. Consider simplifying here and giving a more detailed overview in the Methods section. Also, the text is not entirely clear. You seem to give two separate explanations of how the GI scores were calculated (Starting in lines 106 and 111): please rephrase and clearly indicate the connections between those two explanations (in the Methods section).

      We thank the reviewer for this valuable suggestion. We moved significant portions of the technical descriptions to the Methods section. We also clarified the text regarding the procedures for calculating GI scores in lines 385-387.

      i. Line 142: "These findings suggest that gene A could represent an attractive drug target.." "Gene A" should be "FYN"?

      We thank the reviewer for catching this. Indeed, it is “FYN” and we changed it in line 154.

      j. Line 149: Introduce Saracatinib, and make the reader aware that it actually mostly targets Src, and FYN with lower affinity.

      We added text in lines 73 and 164 to indicate that saracatinib is an inhibitor of Src family kinases.

      k. Line 469: "by the two sgRNA." "by the two sgRNAs".

      Corrected

      l. Throughout text/figures/figure legends, please check for consistency in the naming of cell lines, compounds, referring to figures etc. (E.g. MDA-MB-231/MDA MB 231/MDAMB-231 ; Fig. 1/Figure 1).

      Corrected. Thank you for catching this error.

      m. In Methods, frequently ug or uL are used instead of µg or µL

      Corrected.

      n. Legend Figure 5: Clarify what A, G, I, D, and P mean.

      Corrected in lines 685-686 to: “A: NVP-ADW742, G: gefitinib, I: imatinib, D: doxorubicin, P: Paclitaxel.”

      o. Line 303: What is meant by: "The six variable nucleotides were added in reverse primer for multiplexing". Could you clarify this in the text?

      We apologize for the confusion. The six nucleotides are an index sequence for multiplexed NGS runs. The text in lines 373-374 was edited to: “The six nucleotides described as “NNNNNN” in reverse primer above represents unique index to identify biological replicates in multiplexed NGS run.”

      Reviewer #2 (Recommendations For The Authors):

      To enhance the robustness of the conclusions drawn from this study, certain concerns merit attention.

      Concerns:

      (1) Line 130 indicates that eight synergistic target gene combinations were validated. It would be helpful to clarify the criteria used to select these gene pairs and provide the rationale for studying these specific combinations of genes.

      In fact, we selected the gene pairs for which sgRNAs were available to us when we performed the experiments, so we do not have a strong rationale for these selections. Instead, we added a brief discussion in lines 304-306 noting that further validation is required for the gene pairs that were not experimentally tested.

      (2) According to Figure 2C, FYN was identified as crucial among the 30 gene pairs, and its upregulation in TNBC prompted further investigation. It would be informative to discuss the expression levels of TEK, FRK, and FGFR2 in TNBC and explain why these nodes were not studied. Is there existing evidence demonstrating the superiority of FYN over these other genes?

      A similar concern was raised by reviewer #1. The expression levels of TEK, FRK and FGFR2 were relatively low in MDA-MB-231 cells and in TNBCs in general, and we were concerned about the generalizability of these targets for treating TNBC. While validating these genes for possible synthetic lethality may yield valuable insight, this is beyond the scope of this paper. This concern is now discussed in lines 150-154.

      (3) The screening process employed only one cell line, and validation was conducted with only one cell line (Figure 2A). Consider supplementing the findings with more convincing evidence from other breast cancer cell lines to strengthen the conclusions.

      Although the CRISPR screens and primary validations were done with only one cell line, further validations with drug combinations were done in independent cancer cell lines such as Hs578T (Figures S4E-J). In addition, the possible association of FYN expression with drug tolerance was also demonstrated in lung cancer cells. We hope this satisfies the reviewer.

      (4) The network analysis in Figure 2C lacks a description of the methodology used. It would be beneficial to provide a brief explanation of the methods employed for this analysis.

      The network analysis was done manually, with the size of each node proportional to the number of gene pairs. We added text to the figure legend (line 638) to clarify this.

      (5) The significance of gene A mentioned in line 142 is unclear. Please provide a clear explanation or context for the importance of this gene.

      This is a mistake that was also pointed out by reviewer #1. The “gene A” should have been “FYN”. We corrected this in line 154.

      (6) In Figure 2J and Figure 2K, it would be more informative to measure the phosphorylation levels of FYN and SRC rather than just their baseline levels. Consider revising the figures accordingly.

      We thank the reviewer for this careful comment. We now provide supplementary Figure S5A to show that the phosphorylation level of FYN increased, but in proportion to the increase in FYN protein level, so the pFYN/FYN ratio did not change significantly. We discuss this result in lines 187-190.

      (7) Figure S4B lacks biological replicates, which could impact the reliability of the experimental results. Consider adding biological replicates to enhance the robustness of the findings.

      This was also pointed out by reviewer #1. Instead of performing qPCR for all drugs, we focused on validating the decrease in FYN mRNA level for drug combinations that synergistically kill cancer cells. We also aimed to identify direct mediators of FYN mRNA upregulation, so we focused on drug combinations involving inhibitors of epigenetic regulators that promote transcription. To this end, we tested the impact of GSK-J4 (KDM6 inhibitor) and pinometostat (DOT1L inhibitor), in combination with a TKI, on the FYN expression level. Notably, GSK-J4 attenuated FYN mRNA accumulation upon NVP-ADW742 treatment, whereas pinometostat failed to do so (Figure S5C). We describe these results in the Results section (lines 192-197).

      (8) Line 186 indicates that KDM3 knockout was not tested in Figure S5A. It would be helpful to provide an explanation for this omission or consider including the data if available.

      We thank the reviewer for pointing this out. The T7 endonuclease assay results for KDM3, KDM4 and PHF8 have been added in Figure S6B. All guide RNAs used in the study efficiently generated indel mutations.

      (9) In line 206, KDM4A is introduced, but Figures 3J and 3M had already pointed to KDM4A. The authors did not analyze the ChIP results for other members of the KDM4 family at this point. Please address this inconsistency and provide a rationale for focusing on KDM4A. Additionally, in Figure 3M, consider adding peak labeling to the enriched portion for clarity.

      We appreciate the reviewer’s concern. KDM4 family enzymes catalyze identical reactions and are thought to be redundant. We therefore judged that the most abundantly expressed gene in the KDM4 family should be the primary target. To this end, we analyzed the expression levels of KDM4 family genes in supplementary Figure S6A. Indeed, KDM4A expression was the highest among the KDM4 family genes. We discuss this in the Results section (lines 218-220).

      (10) The author only indicated the relationship between the H3K9me3 level in the enhancer region and FYN expression. It would be valuable to verify the activity of the enhancers and investigate additional markers such as H3K27ac and H3K4me1. Consider discussing these aspects to provide a more comprehensive understanding.

      Since we and others have shown that histone demethylases are increased upon drug treatment, we focused on histone methylation marks that are associated with gene repression and whose removal by demethylases is associated with drug resistance. In this respect, KDM6 demethylases, which remove H3K27me3, are an attractive alternative candidate. In our newly added supplementary Figure S6E, ADW742 treatment did not decrease the H3K27me3 level at the FYN promoter, indicating that H3K9me3 may be the dominant epigenetic change that modulates FYN expression upon drug treatment. This is briefly discussed in lines 233-235.

      (11) In Figure 4A, the addition of the drug alone does not inhibit tumor growth. Please provide an explanation for this result and consider discussing potential reasons for the observed lack of inhibition.

      The drug doses were adjusted carefully to minimize tumor shrinkage by each single drug, so that synergistic tumor shrinkage would be more clearly visible.

      (12) Line 208 indicates missing parentheses in the text describing Figure 4C. Please correct the text accordingly to ensure clarity.

      Corrected. Thank you for catching this error.

      (13) The figure legends for Figures 5E, F, G, and H contain errors. Please correct the figure legends to accurately describe the respective figures.

      We thank the reviewer for catching this error. We have changed the figure legends in lines 691-697 to accurately describe the figures.

      (14) It may be beneficial for the authors to divide the results section into several subsections and add headings to improve the overall understanding of the findings.

      This is an excellent suggestion. We divided our results section into subsections and added headings in lines 80, 141, 181, 237 and 251 to help readers understand our findings.

      (15) The authors should include the sgRNA sequences used for gene targeting, along with details of the target genes and negative/positive controls, in the Supplementary Materials to enhance reproducibility and transparency.

      This is a critical point for improving reproducibility of our work. The sgRNA sequences used in the study are newly added in supplementary table S3.

      (16) The resolution of the figures in the Supplementary Materials is too low, which may impede the authors' ability to interpret the data. Consider providing higher-resolution figures for better readability.

      A similar concern was raised by reviewer #1. We have provided higher-resolution images for all main and supplementary figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors constructed a novel HSV-based therapeutic vaccine to cure SIV in a primate model. The novel HSV vector is deleted for ICP34.5. Evidence is given that this protein blocks HIV reactivation by interference with the NF-kB pathway. The deleted construct supposedly would reactivate SIV from latency. The SIV genes carried by the vector ought to elicit a strong immune response. Together the HSV vector would elicit a shock and kill effect. This is tested in a primate model.

      Thank you for your kind comments and suggestions, which are very helpful in improving our manuscript. We have carefully revised our manuscript and performed additional experiments accordingly, and we now think this version has been substantially improved for your reconsideration.

      Strengths and weaknesses:

      (1) Deleting ICP34.5 from the HSV construct has a very strong effect on HIV reactivation. Why is no eGFP readout given in Figure 1C as for WT HSV? The mechanism underlying increased activation by deleting ICP34.5 is only partially explored. Overexpression of ICP34.5 has a much smaller effect (reduction in reactivation) than deletion of ICP34.5 (strong activation); so the story seems incomplete.

      Thank you for your careful review and kind reminder.

      (1) We apologize for the confusion regarding Figure 1C. In the experiment shown in Figure 1C, we used an HSV-1 17 strain containing GFP (HSV-GFP) and HSV-ΔICP34.5 (a recombinant HSV-1 17 strain with ICP34.5 deleted, based on HSV-GFP) to reactivate the HIV latency cell line (J-Lat 10.6 cells). Since detecting GFP cannot distinguish between HSV infection and HIV reactivation, we assessed reactivation by measuring the mRNA levels of HIV LTR upon stimulation with either HSV-GFP or HSV-ΔICP34.5. In Figure 1B, we had already verified the reactivation efficacy by infecting J-Lat 10.6 cells with the HSV-1 17 strain containing GFP (HSV-GFP) and found significant upregulation of the mRNA levels of HIV-1 LTR, Tat, Gag, Vif, and Vpr. We have adjusted the corresponding descriptions accordingly in the revised manuscript.

      (2) We agree with your insightful comment that the mechanism underlying the increased activation by HSV-ΔICP34.5 is worth exploring further in future studies. In this study, we found that ICP34.5 antagonizes the reactivation of HIV latency by HSV-1 mainly through the modulation of the host NF-κB and HSF1 pathways, while HSV-1 (especially HSV-ΔICP34.5) might reactivate HIV latency through NF-κB, HSF1, and other yet-to-be-determined mechanisms. Thus, ICP34.5 overexpression has only a partial effect on reducing the reactivation of HIV latency by HSV-1. We have addressed this issue in the revised Discussion section: Intriguingly, these findings collectively indicated that ICP34.5 might play an antagonistic role in the reactivation of HIV by HSV-1, and thus our modified HSV-ΔICP34.5 constructs can effectively reactivate HIV/SIV latency through the release of imprisonment from ICP34.5. However, ICP34.5 overexpression had only a partial effect on the reduction of the HIV latency reactivation, indicating that HSV-ΔICP34.5-based constructs can also reactivate HIV latency through other yet-to-be-determined mechanisms. (Lines 334 to 340).

      (2) No toxicity data are given for deleting ICP34.5. How specific is the effect for HIV reactivation? An RNA seq analysis is required to show the effect on cellular genes.

      Thank you for your questions and suggestions.

      (1) It is well known that ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and previous studies (in gene therapy and oncolytic virotherapy) have shown that the safety of recombinant HSV-based vectors can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced lower levels of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH than HSV-GFP stimulation did, likely due to its lower virulence and replication ability (Figure 1I-K). Furthermore, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that HSV-ΔICP34.5 might have a tolerable safety profile in PLWH. We have added the corresponding description in the revised manuscript.

      (2) In our study, we found that neither adenovirus nor vaccinia virus can reactivate HIV latency (Figure S3). In addition, deleting the ICP0 gene from HSV-1 diminished the reactivation of HIV latency by HSV-1 (Figure S4). These data suggest that the reactivation of HIV latency by HSV-1 might be virus-specific, although this should be investigated further in future studies. We have added the corresponding description in the revised manuscript.

      (3) To explore the mechanism of reactivating viral latency by HSV-ΔICP34.5-based constructs, we performed RNA-seq analysis (Figure S5). We have added the corresponding description accordingly in the revised manuscript.

      (3) The primate groups are too small and the results too variable to make averages. In Figure 5, the group with ART and saline has two slow rebounders. It is not correct to average those with a single quick rebounder. Here the interpretation is NOT supported by the data.

      We agree that this is a pilot study with a limited number of rhesus macaques. Although the number of macaques was relatively limited, these nine macaques were distributed evenly across groups based on baseline age, sex, weight, CD4 count, and viral load (VL) (Table S2). All SIV-infected macaques used in this study had a long history of SIV infection and had received several courses of ART, which mimics the treatment of chronic HIV-1 infection in humans. These macaques had been infected with SIVmac239 for more than 5 years, and macaques infected with highly pathogenic SIV have been well validated as a stringent model that recapitulates HIV-1 pathogenesis and persistence during ART in humans. Indeed, in our Chinese rhesus model, ART effectively suppressed SIV to undetectable levels in plasma, and upon ART discontinuation the virus rapidly rebounded, which is very similar to what is observed in ART-treated HIV patients. We think the results of this pilot study are promising and justify further studies with larger numbers of animals, followed by preclinical and clinical studies in our next projects. Thank you for your understanding.

      As for your question regarding “the two animals with low VL and slow rebound”, our explanation is as follows. As mentioned above, these macaques were distributed evenly based on baseline CD4 count and VL (Table S2), and the groups subsequently showed different changes in viral load and viral rebound. Thus, we think these data support our interpretation. Moreover, our conclusion is supported by at least three lines of evidence.

      (1) The VL in the ART+saline group promptly rebounded after ART discontinuation, with an average 8.63-fold increase in the rebounded peak VL compared with the pre-ART VL (Figure 5A, D and E). However, plasma VL in the ART+HSV-sPD1-SIVgag/SIVenv group exhibited a delayed rebound interval (Figure 5B-D).

      (2) There was a lower rebounded peak VL than pre-ART VL in the ART+HSV-sPD1-SIVgag/SIVenv group (average 12.20-fold decrease), while a higher rebounded peak VL than pre-ART VL in the ART+HSV-empty group (average 2.74-fold increase) (Figure 5E).

      (3) We found significant suppression of total SIV DNA and integrated SIV DNA provirus in the ART+HSV-sPD1-SIVgag/SIVenv group. However, the copies of the SIV DNA provirus were significantly increased in the ART+HSV-empty group and ART+saline group (Figure 5F-G).

      Thank you for your understanding.
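
      The fold changes quoted in points (1) and (2) can be read as simple ratios between the rebounded peak VL and the pre-ART VL. The sketch below only illustrates that arithmetic: the function name, the sign convention, and the input values are hypothetical and are not taken from the study, and it does not attempt to reproduce how the per-group averages were computed.

```python
def signed_fold_change(pre_art_vl, rebound_peak_vl):
    """Return a signed fold change between two viral loads (copies/mL).

    Positive value: rebound peak is that many times HIGHER than pre-ART VL.
    Negative value: rebound peak is that many times LOWER than pre-ART VL.
    This sign convention is an assumption made for illustration only.
    """
    if pre_art_vl <= 0 or rebound_peak_vl <= 0:
        raise ValueError("viral loads must be positive")
    if rebound_peak_vl >= pre_art_vl:
        return rebound_peak_vl / pre_art_vl
    return -(pre_art_vl / rebound_peak_vl)


# Hypothetical animals, NOT data from the study.
increase_example = signed_fold_change(pre_art_vl=1e4, rebound_peak_vl=8e4)
decrease_example = signed_fold_change(pre_art_vl=1e5, rebound_peak_vl=1e4)
print(increase_example, decrease_example)
```

      Under this convention, an "x-fold increase" and an "x-fold decrease" are both ratios greater than one, taken in opposite directions.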

      Discussion

      HSV vectors are mainly used in cancer treatment partially due to induced inflammation. Whether these are suitable to cure PLWH without major symptoms is a bit questionable to me and should at least be argued for.

      Thank you for your kind comment and question. We confirmed the enhanced reactivation of HIV latency by HSV-ΔICP34.5 in primary CD4+ T cells from people living with HIV (PLWH) (Figure S2). As mentioned above, previous studies have shown that the safety of recombinant HSV-based vectors can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). Furthermore, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that the safety profile of HSV-ΔICP34.5 in PLWH might be acceptable. We have added the corresponding description in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this article, Wen et al. describe the development of a 'proof-of-concept' bi-functional vector based on HSV-deltaICP-34.5's ability to purge latent HIV-1 and SIV genomes from cells. They show that co-infection of latent J-lat T-cell lines with an HSV-deltaICP-34.5 vector can reactivate HIV-1 from a latent state. Over- or stable expression of ICP 34.5 ORF in these cells can arrest latent HIV-1 genomes from transcription, even in the presence of latency reversal agents. ICP34.5 can co-IP with- and de-phosphorylate IKKa/b to block its interaction with NF-kB transcription factor. Additionally, ICP34.5 can interact with HSF1 which was identified by mass-spec. Thus, the authors propose that the latency reversal effect of HSV-deltaICP-34.5 in co-infected JLat cells is due to modulatory effects on the IKKa/b-NF-kB and PP1-HSF-1 pathway.

      Next, the authors cleverly construct a bifunctional HSV-based vector with deleted ICP34.5 and 47 ORFs to purge latency and avoid immunological refluxes, and additionally, expand the application of this construct as a vaccine by introducing SIV genes. They use this 'vaccine' in mouse models and show the expected SIV-immune responses. Experiments in rhesus macaques (RM), further elicit the potential for their approach to reactivate SIV genomes and at the same time block their replication by antibodies. What was interesting in the SIV experiments is that the dual-functional vector vaccine containing sPD1- and SIV Gag/Env ORFs effectively delayed SIV rebound in RMs and in some cases almost neutralized viral DNA copy detection in serum. Very promising indeed, however, there are some questions I wish the authors had explored to get answers to, detailed below.

      Overall, this is an elegant and timely work demonstrating the feasibility of reducing virus rebound in animals, with the potential to expand to clinical studies. The work was well-written, and sections were clearly discussed.

      Strengths:

      The work is well designed, rationale explained, and written very clearly for lay readers. Claims are adequately supported by evidence and well-designed experiments including controls.

      Thank you for your nice comments regarding our work.

      Weaknesses:

      (1) While the mechanism of ICP34.5 interaction and modulation of the NF-kB and HSF1 pathways are shown, this only proves ICP34.5 interactions but does not give away the mechanism of how the HSV-deltaICP-34.5 vector purges HIV-1 latency. What other components of the vector are required for latency reversal? Perhaps serial deletion experiments of the other ORFs in the HSV-deltaICP-34.5 vector might be revealing.

      Thank you for your valuable suggestion. In fact, we are currently further exploring some potential viral genes of HSV-1 that might play a role in the reactivation of HIV latency. We have found that the deletion of ICP0 gene from HSV-1 diminished the reactivation effect of HIV latency by HSV-1 (Figure S4), showing that ICP0 might play a vital role for the reactivation. Of course, this might be further investigated in future studies. We have added the corresponding description in the revised manuscript.

      (2) The efficacy of the HSV vaccine vectors was evaluated in Rhesus Macaque model animals. Animals were chronically infected with SIV (a parent of HIV), treated with ART, challenged with bi-functional HSV vaccine or controls, and discontinued treatment, and the resulting virus burden and immune responses were monitored. The animals showed SIV Gag and Env-specific immune responses, and delayed virus rebound (however rebound is still there), and below-detection viral DNA copies. What would make a more convincing argument to this reviewer will be data to demonstrate that after the bi-functional vaccine, the animals show overall reduction in the number of circulating latent cells. The feasibility of obtaining such a result is not clearly demonstrated.

      Thank you for your valuable comment. We have now provided more data about this issue. We found significant suppression of total SIV DNA and integrated SIV DNA provirus in the ART+HSV-sPD1-SIVgag/SIVenv group. However, the copies of the SIV DNA provirus were significantly increased in the ART+HSV-empty group and ART+saline group (Figure 5F-G). We have added the corresponding description in the revised manuscript.

      (3) The authors state that the reduced virus rebound detected following bi-functional vaccine delivery is due to latent genomes becoming activated and steady-state neutralization of these viruses by antibody response. This needs to be demonstrated. Perhaps cell-culture experiments from specimens taken from animals might help address this issue. In lab cultures one could create environments without antibody responses, under these conditions one would expect a higher level of viral loads to be released in response to the vaccine in question.

      Thanks for your kind mention and suggestion. We performed the following cell experiment to address this issue. Primary CD4+ T cells from people living with HIV (PLWH) were isolated, and then infected with HSV or HSV-∆ICP34.5 constructs. As expected, we confirmed the enhanced reactivation of HIV latency by HSV-∆ICP34.5 (Figure S2). Thank you.

      (4) How do the authors imagine neutralizing HIV-1 envelope epitopes by a similar strategy? A discussion of this point may also help.

      Thank you for your kind comment. We have added the corresponding discussion in the revised manuscript. “The current consensus on HIV/AIDS vaccines emphasizes the importance of simultaneously inducing broadly neutralizing antibodies and cellular immune responses. Therefore, we believe that incorporating the induction of broadly neutralizing antibodies into our future optimizing approaches may lead to better therapeutic outcomes.” (Lines 384 to 388)

      (5) I thought the empty HSV-vector control also elicited somewhat delayed kinetics in virus rebound and neutralization, can the authors comment on why this is the case?

      Thank you for your careful review and comment. We agree with you that the HSV-1 empty vector does exhibit a somewhat delayed rebound. We think the possible reason is as follows: although the empty HSV vector cannot elicit SIV-specific CTL responses, it effectively activates the latent SIV reservoirs, and these activated virions can then be partially killed by ART drugs. Therefore, even without carrying HIV/SIV antigens, somewhat delayed kinetics in virus rebound may be observed. Thank you.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should provide toxicity data for HSV transduction after deleting ICP34.5 and provide an explanation of why overexpression of ICP34.5 has such a small effect.

      Thank you for your questions and suggestions. As mentioned above, we have now provided data on the safety of HSV-ΔICP34.5-based constructs.

      (1) It is well known that ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and previous studies (in gene therapy and oncolytic virotherapy) have shown that the safety of recombinant HSV-based vectors can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). Furthermore, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that the safety profile of HSV-ΔICP34.5 in PLWH might be acceptable. We have added the corresponding description in the revised manuscript.

      (2) We agree with your insightful comment that the mechanism underlying increased activation by HSV-ΔICP34.5 is worth exploring further in future studies. In this study, we found that ICP34.5 plays an antagonistic role in the reactivation of HIV latency by HSV-1 mainly through the modulation of host NF-κB and HSF1 pathways, while HSV-1 (especially HSV-ΔICP34.5) might reactivate HIV latency through NF-κB, HSF1, and other yet-to-be-determined mechanisms. Thus, ICP34.5 overexpression can have only a partial effect on the reduction of the HIV latency reactivation by HSV-1. We have mentioned this issue in the revised Discussion section. “Intriguingly, these findings collectively indicated that ICP34.5 might play an antagonistic role in the reactivation of HIV by HSV-1, and thus our modified HSV-ΔICP34.5 constructs can effectively reactivate HIV/SIV latency through the release of imprisonment from ICP34.5. However, ICP34.5 overexpression had only a partial effect on the reduction of the HIV latency reactivation, indicating that HSV-ΔICP34.5-based constructs can also reactivate HIV latency through other yet-to-be-determined mechanisms.” (Lines 334 to 340).

      (2) How specific is the effect for HIV reactivation? An RNA seq analysis is required to show the effect on cellular genes.

      Thank you for your questions and suggestions.

      (1) In our study, we found both adenovirus and vaccinia virus cannot reactivate HIV latency (Figure S3). In addition, the deletion of ICP0 gene from HSV-1 diminished the reactivation effect of HIV latency by HSV-1 (Figure S4). Thus, these data suggested the reactivation of HIV latency by HSV-1 might be virus-specific. Of course, this might be further investigated in future studies. We have added the corresponding description in the revised manuscript.

      (2) To explore the mechanism of reactivating viral latency by HSV-ΔICP34.5-based constructs, we performed RNA-seq analysis (Figure S5). Results showed that there were numerous differentially expressed genes (DEGs) in response to HSV-ΔICP34.5 infection. Among them, 2288 genes were upregulated, and 611 genes were downregulated. GO analysis showed the enrichment of these DEGs in cell cycle, cellular development, and cellular proliferation, and KEGG enrichment analysis indicated the enrichment in pathways such as cell cycle and cytokine-cytokine receptor interaction. We have added the corresponding description accordingly in the revised manuscript.

      (3) A comparison in primates has to be given for constructs with or without ICP34.5 to validate cell culture data (what is an empty vector?)

      Thank you for your reminder. In the revised manuscript, we performed the following cell experiment to address this issue. Primary CD4+ T cells from people living with HIV (PLWH) were isolated, and then infected with HSV or HSV-∆ICP34.5 constructs. As expected, we confirmed the enhanced reactivation of HIV latency by HSV-∆ICP34.5 (Figure S2). Thank you.

      (4) Legends should be improved in writing and content.

      Thank you for your kind comment. In the revised version, we have improved the manuscript content and carefully revised the legends of all figures in both writing and content. Thank you.

      (5) The primate groups should be enlarged before any reliable conclusions can be made. Inflammatory/tox data should be provided.

      Thank you for your question.

      (1) As mentioned above, we agree with you that this is a pilot study with a limited number of rhesus macaques. Although the number of macaques was relatively limited, these nine macaques were distributed evenly based on the background level of age, sex, weight, CD4 count, and viral load (VL) (Table S2). All SIV-infected macaques used in this study had a long history of SIV infection and had received several courses of ART, which mimics treatment of chronic HIV-1 infection in humans. These macaques had been infected with SIVmac239 for more than 5 years, and highly pathogenic SIV-infected macaques have been well validated as a stringent model to recapitulate HIV-1 pathogenesis and persistence during ART in humans. Indeed, in our Chinese rhesus model, ART effectively suppressed SIV infection to undetectable levels in plasma, and upon ART discontinuation, virus rapidly rebounded, which is very similar to what is observed in ART-treated HIV patients. We think the results of this pilot study are very promising for further studies, which will expand the number of animals and then proceed to preclinical and clinical studies in our next projects. Thank you for your understanding.

      (2) As is well known, ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and previous studies have shown that the safety of recombinant HSV-based vectors can be improved by deleting ICP34.5. In this study, we also found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (HSV-GFP) (Figure 1D, Figure S1). In addition, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). Furthermore, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that the safety profile of HSV-ΔICP34.5 in PLWH might be acceptable. We have added the corresponding description in the revised manuscript.

      (6) Discuss the potential of inflammatory HSV vaccines to be used in PLWH without clinical symptoms.

      Thank you for your comment. As discussed above, we found that HSV-ΔICP34.5 exhibited lower virulence and replication ability than its parental strain (Figure 1D, Figure S1), and we also found that HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV-GFP stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). In addition, the CD4+/CD8+ T cell ratio (Figure 5I) and body weight (Figure S9) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-sPD1-SIVgag/SIVenv group (Figure S10). Thus, these data suggest that the safety profile of HSV-ΔICP34.5 in PLWH might be acceptable. We have added the corresponding description in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      I think the authors have done due diligence to the experimental system, and collected evidence to show the feasibility of delaying virus rebound in macaques. However, I would encourage the authors to perform experiments that can back up the claim that delayed virus rebound is due to neutralization effects, or perhaps due to a reduction in viral reservoir. I believe insights into this process will add rigor, and push the relevance of the study to the next level.

      Thank you for your nice comment and valuable suggestion. We have now provided more data about this issue. We found significant suppression of total SIV DNA and integrated SIV DNA provirus in the ART+HSV-sPD1-SIVgag/SIVenv group. However, the copies of the SIV DNA provirus were significantly increased in the ART+HSV-empty group and ART+saline group (Figure 5F-G). In the revised Discussion section, we also discussed that incorporating the induction of broadly neutralizing antibodies into our future optimized approaches may lead to better therapeutic outcomes. We have added the corresponding description in the revised manuscript. Thank you.

      Altogether, all of the above comments and suggestions were very helpful in improving our manuscript. We have taken these comments seriously into account and tried our best to address these questions point by point. After making extensive revisions, we now submit this revised manuscript for your reconsideration. Thank you again for all of your comments and suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study uses single nucleus multiomics to profile the transcriptome and chromatin accessibility of mouse XX and XY primordial germ cells (PGCs) at three time-points spanning PGC sexual differentiation and entry of XX PGCs into meiosis (embryonic days 11.5-13.5). They find that PGCs can be clustered into sub-populations at each time point, with higher heterogeneity among XX PGCs and more switch-like developmental transitions evident in XY PGCs. In addition, they identify several transcription factors that appear to regulate sex-specific pathways as well as cell-cell communication pathways that may be involved in regulating XX vs XY PGC fate transitions. The findings are important and overall rigorous. The study could be further improved by a better connection to the biological system, including the addition of experiments to validate the 'omics-based findings in vivo and putting the transcriptional heterogeneity of XX PGCs in the context of findings that meiotic entry is spatially asynchronous in the fetal ovary. Overall, this study represents an advance in germ cell regulatory biology and will be a highly used resource in the field of germ cell development.

      Strengths:

      (1) The multiomics data is mostly rigorously collected and carefully interpreted.

      (2) The dataset is extremely valuable and helps to answer many long-standing questions in the field.

      (3) In general, the conclusions are well anchored in the biology of the germ line in mammals.

      Weaknesses:

      (1) The nature of replicates in the data and how they are used in the analysis are not clearly presented in the main text or methods. To interpret the results, it is important to know how replicates were designed and how they were used. Two "technical" replicates are cited but it is not clear what this means.

      The two independent technical replicates comprised different pools of paired gonads. This sentence was added to the methods section of the revised manuscript.

      (2) Transcriptional heterogeneity among XX PGCs is mentioned several times (e.g., lines 321-323) and is a major conclusion of the paper. It has been known for a long time that XX PGCs initiate meiosis in an anterior-to-posterior wave in the fetal ovary starting around E13.5. Some heterogeneity in the XX PGC populations could be explained by spatial position in the ovary without having to invoke novel subpopulations.

      We thank the reviewer for pointing out this important biological phenomenon. We also recognize that transcriptional heterogeneity among XX PGCs is likely due to the anterior-to-posterior wave of meiotic initiation in E13.5 ovaries and highlight this possibility in our manuscript. However, since our study utilizes single-nucleus RNA-sequencing and not spatial transcriptomics, we are not able to capture the spatial location of the XX PGCs analyzed in our dataset. As such, our analysis applied clustering tools to classify the populations of XX PGCs captured in our dataset. 

      (3) There is essentially no validation of any of the conclusions. Heterogeneity in the expression of a given marker could be assessed by immunofluorescence or RNAscope.

      In our revised manuscript, we included immunofluorescence staining of potential candidate factors involved in PGC sex determination, such as PORCN and TFAP2C. Testing and optimizing antibodies for the targets identified in this study are ongoing efforts in our lab and we look forward to sharing our results with the research community.

      (4) The paper sometimes suffers from a problem common to large resource papers, which is that the discussion of specific genes or pathways seems incomplete. An example here is from the analysis of the regulation of the Bnc2 locus, which seems superficial. Relatedly, although many genes and pathways are nominated for important PGC functions, there is no strong major conclusion from the paper overall.

      In this manuscript, we set out to use computational tools to identify candidate factors, some already known and many others unknown, involved in the developmental pathways of PGC sex determination. Our goal, as a research group and with future collaborators, is to screen these interesting candidates and discover their function in the primordial germ cell. Our research, presented in this study, represents a launching pad from which to identify future projects that will investigate these factors in further detail.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Alexander et al describes a careful and rigorous application of multiomics to mouse primordial germ cells (PGCs) and their surrounding gonadal cells during the period of sex differentiation.

      Strengths:

      In thoughtfully designed figures, the authors identify both known and new candidate gene regulatory networks in differentiating XX and XY PGCs and sex-specific interactions of PGCs with supporting cells. In XY germ cells, novel findings include the predicted set of TFs regulating Bnc2, which is known to promote mitotic arrest, as well as the TFs POU6F1/2 and FOXK2 and their predicted targets that function in mitosis and signal transduction. In XX germ cells, the authors deconstruct the regulation of the premeiotic replication regulator Stra8, which reveals TFs involved in meiosis, retinoic acid signaling, pluripotency, and epigenetics among predictions; this finding, along with evidence supporting the regulatory potential of retinoic acid receptors in meiotic gene expression is an important addition to the debate over the necessity of retinoic acid in XX meiotic initiation. In addition, a self-regulatory network of other TFs is hypothesized in XX differentiating PGCs, including TFAP2c, TCF5, ZFX, MGA, and NR6A1, which is predicted to turn on meiotic and Wnt signaling targets. Finally, analysis of PGC-support cell interactions during sex differentiation reveals more interactions in XX, via WNTs and BMPs, as well as some new signaling pathways that predominate in XY PGCs including ephrins, CADM1, Desert Hedgehog, and matrix metalloproteases. This dataset will be an excellent resource for the community, motivating functional studies and serving as a discovery platform.

      Weaknesses:

      My one major concern is that the conclusion that PGC sex differentiation (as read out by transcription) involves chromatin priming is overstated. The evidence presented in the figures includes a select handful of genes including Porcn, Rimbp1, Stra8, and Bnc2 for which chromatin accessibility precedes expression. Given that the authors performed all of their comparisons between XX versus XY datasets at each timepoint, have they missed an important comparison that would be a more direct test of chromatin priming: between timepoints for each sex? Furthermore, it remains possible that common mechanisms of differentiation to XX and XY could be missing from this analysis that focused on sex-specific differences.

      We thank the reviewer for their thoughtful assessment and suggestions, as stated here. We note that chromatin priming in PGCs prior to sex determination is a well-documented research finding (see references below) that is further supported by our single-nucleus multiomics data. To support these findings previously stated in the scientific literature, we included data demonstrating the asynchronous correlation between chromatin accessibility and gene expression during PGC sex determination. Specifically, we investigated the associations of differentially accessible chromatin peaks with differentially expressed genes for each PGC type (between sexes and across embryonic stages) using computational tools and methods that are well established and applied by the research community. In our manuscript, we note that the patterns we identified support the potential role of chromatin priming in PGC sex determination. Nevertheless, we further highlight that a comprehensive profile of 3D chromatin structure and enhancer-promoter contacts in differentiating PGCs is needed to fully understand how changes to chromatin facilitate PGC sex determination.

      References:

      (1) Chen, M., et al. Integration of single-cell transcriptome and chromatin accessibility of early gonads development among goats, pigs, macaques, and humans. Cell Reports 41 (2022).

      (2) Huang, T.-C. et al. Sex-specific chromatin remodelling safeguards transcription in germ cells. Nature 600, 737–742 (2021).

      Reviewer #3 (Public Review):

      Summary:

      Alexander et al. reported the gene-regulatory networks underpinning sex determination of murine primordial germ cells (PGCs) through single-nucleus multiomics, offering a detailed chromatin accessibility and gene expression map across three embryonic stages in both male (XY) and female (XX) mice. It highlights how regulatory element accessibility may precede gene expression, pointing to chromatin accessibility as a primer for lineage commitment before differentiation. Sexual dimorphism in these elements and gene expression increases over time, and the study maps transcription factors regulating sexually dimorphic genes in PGCs, identifying sex-specific enrichment in various transcription factors.

      Strengths:

      The study includes step-wise multiomic analysis with some computational approach to identify candidate TFs regulating XX and XY PGC gene expression, providing a detailed timeline of chromatin accessibility and gene expression during PGC development, which identifies previously unknown PGC subpopulations and offers a multimodal reference atlas of differentiating PGC clusters. Furthermore, the study maps a complex network of transcription factors associated with sex determination in PGCs, adding depth to our understanding of these processes.

      Weaknesses:

      While the multiomics approach is powerful, it primarily offers correlational insights between chromatin accessibility, gene expression, and transcription factor activity, without direct functional validation of identified regulatory networks.

      As stated in our response above to a similar concern, we note that our research study represents a launching pad from which to identify future projects that will investigate, in further detail, candidates that may be involved in PGC sex determination. With this rich dataset in hand, our goal in future research projects is to screen these candidates and discover their function in PGCs.

      Response to Recommendations

      Reviewer #1 (Recommendations For The Authors):

      (1) Clarify at first introduction how combined ATAC-seq/RNA-seq multiomics libraries were prepared, including if ATAC and RNA-seq data are from the same cell.

      This information was added to the introduction of the revised manuscript.

      (2) Clarify what the two technical replicates represent. Are they two libraries from the same gonad or the same pool of gonads? Are they from 2 different gonads?

      The two independent technical replicates comprised different pools of paired gonads. This sentence was added to the methods section of the revised manuscript.

      (3) In Supplemental Figure 1, there is substantial variation in the number of unique snATAC-seq fragments between some conditions. Could this create a systematic bias that affects clustering?

      We recognize the concern that substantial variation in the number of unique snATAC-seq fragments between conditions could potentially create a systematic bias that affects clustering. However, we analyzed our snATAC-seq dataset with Signac, which performs term frequency-inverse document frequency (TF-IDF) normalization. This is a process that normalizes across cells to correct for differences in cellular sequencing depth. Given that sequencing depth was taken into account in our normalization and clustering procedures, and that the unbiased clustering of PGCs also reflects the sex and embryonic stage of PGCs, we are confident that the clustering of the snATAC-seq datasets closely reflects the biological variability present in the PGCs collected.

      References:

      Signac Website: https://stuartlab.org/signac/articles/pbmc_vignette

      Stuart, T., Srivastava, A., Madad, S., Lareau, C. A., & Satija, R. (2021). Single-cell chromatin state analysis with Signac. Nature methods, 18(11), 1333-1341.

      (4) In Figures 2a, 2e, 3a, and 3e, the visualization scheme is very difficult to follow. It's very hard to see the colors corresponding to average expression for many genes because the circles are so small. In addition, the yellow color is hard to see and makes it hard to estimate the size of the circle since the boundaries can be indistinct. I recommend using a different visualization scheme and/or set of size scales be used.

      In Figures 2a, 2e, 3a, and 3e, we chose this color palette to be inclusive of viewers who are colorblind. The chosen colors are visible on both a computer screen and on printed paper. We also included a legend of the color scale and dot size representing the average expression and percent of cells expressing the gene, respectively. If the color cannot be seen, it is because the cell population is not expressing the gene.

      (5) Perform in vivo validation (immunofluorescence or RNAscope) of at least some targets implicated in PGC development by this study.

      Such validations (immunofluorescence staining of PORCN and TFAP2C) are now included in Figure 4 and the supplement.

      (6) In line 351, the authors state that "we observed a strong demarcation between XX and XY PGCs at E12.5-E13.5." But in Figure 1j it looks like a reasonably high fraction of both XX and XY E12.5 cells are in cluster 1, which should mean that there is some overlap.

      While it is true that Figure 1j shows overlap of both XX and XY E12.5 cells in cluster 1, we were commenting on the separation of E12.5 XX (clusters 4 and 5) and E12.5 XY (clusters 8 and 9) PGCs. We have modified the sentence beginning at line 351 to state that the separation between XX and XY PGCs occurs at E13.5.

      (7) In lines 404-405: "We first linked snATAC-seq peaks to XY PGC functional genes". It is important to know how the peaks were linked to genes.

      We added the following sentence to address this comment: “Peak-to-gene linkages were determined using Signac functionalities and were derived from the correlation between peak accessibility and the intensity of gene expression.”
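
      As a rough illustration of correlation-based peak-to-gene linkage, the sketch below computes the Pearson correlation between the accessibility of a peak and the expression of a gene across the same cells. All values and names (`expression`, `peak_linked`, `peak_unlinked`, `pearson`) are hypothetical, and the sketch covers only the correlation step; a full workflow would typically also test each correlation against background peaks.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical measurements across six cells (or cell groups).
expression = np.array([0, 1, 2, 3, 4, 5], dtype=float)     # gene expression
peak_linked = np.array([0, 1, 1, 2, 3, 3], dtype=float)    # tracks expression
peak_unlinked = np.array([2, 0, 3, 1, 2, 0], dtype=float)  # does not track it

r_linked = pearson(expression, peak_linked)
r_unlinked = pearson(expression, peak_unlinked)
print(r_linked, r_unlinked)
```

      A peak whose accessibility rises and falls with expression gets a correlation near 1 and would be a candidate link; the second peak would not.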

      (8) In Supplemental Figure 5c, the XX E11.5 condition has a substantially higher fraction of ATAC peaks at promoter regions compared to the others. Does this have statistical and biological significance?

      This is an interesting observation beyond the scope of our manuscript. Many interesting questions arise from this study and it is our plan to investigate further in the future. 

      (9) Line 885: "The increased number of DA peaks at E13.5 may be the result of changes to chromatin structure as XX PGCs enter meiotic prophase I"; but in Figure 4b, there's only a modest increase in DAP number from E12.5 to E13.5 in XX PGCs, compared to a massive gain in XY PGCs.

      In our manuscript, we comment on both phenomena: the doubling of differentially accessible peaks in XX PGCs from E12.5 to E13.5 and the massive increase in differentially accessible peaks in XY PGCs from E12.5 to E13.5. In our description of these results, we propose several hypotheses leading to these increases in differentially accessible peaks. As such, it cannot be ruled out that the changes to chromatin structure that occur during meiotic prophase I contribute to the gain in differentially accessible peaks in XX PGCs at E13.5, and we included this statement in the manuscript accordingly.

      Reviewer #2 (Recommendations For The Authors):

      (1) The methods state at line 141 that nuclei with mitochondrial reads of more than 25% were removed, however our understanding from the Bioconductor manual and companion manuscript (Amezquita, R.A., Lun, A.T.L., Becht, E. et al. Orchestrating single-cell analysis with Bioconductor. Nat Methods 17, 137-145 (2020). https://doi.org/10.1038/s41592-019-0654-x) is that snRNA-seq approaches remove mitochondrial transcripts entirely and datasets containing mitochondrial transcripts are thought to feature incompletely stripped nuclei. It is thought that mitochondrial transcripts participating in nuclear import may remain hanging on to the nuclear envelope and get encapsulated into GEMs. If the mitochondrial read cutoff of 25% was used intentionally to keep this potentially contaminating signal, please justify why this was done for this dataset.

      We agree with the reviewer that the presence of mitochondrial transcripts may be potentially contaminating signal. In our preprocessing steps, we removed the mitochondrial genes and transcripts from our datasets so that they would not influence or affect our analyses. The following sentence was added to the methods section on snRNA-seq data processing: “Mitochondrial genes and transcripts were removed from the snRNA-seq datasets to eliminate any potentially contaminating signal.”

      (2) Methods line 227: please include log2fold change and p-adjusted value cutoffs for GO enrichment.

      We used clusterProfiler for our GO enrichment analysis. Our GO enrichment analysis did not include a log2 fold change analysis, and the p-adjusted value cutoff is stated in the methods.

      (3) Results line 310: the claim that "At E12.5-E13.5, XY PGCs converged onto a single distinct population (cluster 7), indicating less transcriptional diversity among E12.5-E13.5 XY PGCs when compared to E12.5-E13.5 XX PGCs (Fig. 1d)" would be strengthened if the authors quantified transcriptional distance with distance metrics such as Euclidean or cosine distance.

      We used a clustering approach to gain insights into the transcriptional diversity of PGC populations. Using an additional metric, such as Euclidean or cosine distance, would neither provide meaningful information beyond what clustering already offers nor change the conclusions presented in the manuscript.
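
      For readers unfamiliar with the metric the reviewer mentions, here is a minimal sketch of how mean pairwise cosine distance could summarize the spread of a cell population. It was not part of the authors' analysis; the function name and the toy expression vectors are hypothetical.

```python
import numpy as np

def mean_pairwise_cosine_distance(cells):
    """Average cosine distance over all pairs of rows (cells x features)."""
    cells = np.asarray(cells, dtype=float)
    # Scale every cell to unit length so dot products are cosine similarities.
    unit = cells / np.linalg.norm(cells, axis=1, keepdims=True)
    similarity = unit @ unit.T
    # Keep each unordered pair once (upper triangle, diagonal excluded).
    upper = np.triu_indices(len(cells), k=1)
    return float(np.mean(1.0 - similarity[upper]))

# Hypothetical populations: one transcriptionally tight, one spread out.
tight = [[1.0, 0.0, 0.1], [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

d_tight = mean_pairwise_cosine_distance(tight)
d_spread = mean_pairwise_cosine_distance(spread)
print(d_tight, d_spread)
```

      A lower value indicates a more homogeneous population, which is the kind of comparison the reviewer had in mind for XY versus XX PGCs.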

      (4) Results line 317: the authors allude to Lars2 defining clusters 2 & 3 as a marker gene, but it is not clear why this is highlighted until the reader reaches the discussion, which alludes to the published role of Lars2 in reproduction. Please consider moving this sentence to the results section for clarity and perhaps expanding the discussion on the meaning.

      To provide clarity, we added the statement “genes with reported roles in reproduction” to the results section.

      (5) In Figure 2a, why do the authors choose to focus on Zkscan5 in XY PGCs when it is expressed by such a small portion of cells (<25%)? Do they assume that this is due to dropouts?

      We chose to focus on Zkscan5 as an example because of its enriched and differential expression in male PGCs, the absence of Zkscan5 motif enrichment in female PGCs, and the reported roles of Zkscan5 in regulating cellular proliferation and growth. Zkscan5 is an example of how candidate genes can be identified for further investigation.

      (6) Line 461: "the population of E13.5 XX PGCs displaying the strongest Stra8 expression levels corresponded to the same population of XX PGCs with the highest module score of early meiotic prophase I genes (Figure 3c; Supplementary Fig. 3a-b)". However did the authors also consider examining the Stra8+ XX PGCs that do not robustly express meiotic genes to understand more about their differentiation potential?

      We are thankful to the reviewer for this suggestion. However, this research question is beyond the scope of the manuscript. We plan to investigate further in future research studies.

      (7) Line 505: "when we searched for the presence of RA receptor motifs in peaks linked to genes related to meiosis and female sex determination, we found that Stra8, Rec8, Rnf2, Sycp1, Sycp2, Ccnb3, and Zglp1 contain the RA receptor motifs in their regulatory sequences (Supplementary Figure 4g)." My read of the text is that the authors are not taking a side on the RA and meiosis controversy, but rather trying to reveal what the data can tell us, and the answer is that there is a strong signature linking RA to meiotic genes, which supports this as a valid biological pathway. But what is the strength of the RA>meiosis pathway compared to other mechanisms (which must be functioning in the triple receptor KO)? Perhaps the authors could take this analysis further with the following questions: (1) ask whether meiotic genes are more enriched in RA motifs compared to other expressed genes or other motifs (2) compare the strength of peak-gene correlations for all peaks containing RA receptor motifs vs. those with peaks for Zglp1, Rnf2, etc binding. The strengths of these correlations could provide clues to how much gene expression varies in response to RA exposure vs. modulation of these other factors and thus tell us something about how much RA is playing a role.

      We agree with the reviewer that this is a very interesting and important question. We also thank the reviewer for their thoughtful suggestions on the types of bioinformatics analyses that could answer this question. However, the section on RA signaling during PGC sex determination is only a small part of the manuscript and would be better analyzed in greater detail in a future research study or publication.

      (8) The shift from promoters in E11.5 XX PGCs to distal intergenic regions is fascinating. What can we learn about epigenetic reprogramming/methylation changes across gene bodies? 

      We agree with the reviewer that this is an interesting question about gene regulation in E11.5 XX PGCs. However, we prefer to analyze the epigenetic reprogramming changes across gene bodies in this cell population in additional research studies. Our purpose and goal for this section was to link differentially accessible chromatin peaks with differentially expressed genes to identify putative gene regulatory networks.

      (9) Line 581: why did the authors choose to highlight and validate PORCN1 in PGCs? Please elaborate.

      As stated in the manuscript, we chose to highlight and validate PORCN1 in PGCs because of its role in WNT signaling and because of the visibly strong correlation between chromatin accessibility at the XX-enriched DAP in Fig. 4c (dashed box) and gene expression of PORCN1.

      (10) Figure 5f would be easier to interpret if presented as two columns rather than a circle; show one line of the proteins and the other line with the transcripts so that each is on the same line and there are connections between them.

      This comment is related to stylistic preferences. The purpose of Fig. 5f is to demonstrate that the candidate transcription factors may regulate the expression of other enriched transcription factors. Figure 5f accomplishes this goal.

      (11) Line 640: "The predicted target genes of TCFL5 totaled 74% (367/494) of all DEGs with peak-to-gene linkages in XX PGCs". This seems like a high number and a lot of work for just TCFL5; given the overlap between other TFs and target genes, how many of these 367 target genes overlap with other TFs?

      We agree with the reviewer that this is an important point to clarify. We added the following sentence to the results section on TCFL5: “A large majority of the predicted target genes of TCFL5 were also predicted to be the target genes of the enriched TFs presented in Fig. 5e, e.g., the predicted target genes of these TFs overlapped with 4%-100% of the predicted target genes of TCFL5.”
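
      The overlap percentages in the added sentence can be understood as simple set intersections. The sketch below shows one way such a percentage could be computed; the gene names and the helper `percent_shared` are hypothetical placeholders, and only the counts 367 and 494 come from the text quoted by the reviewer.

```python
def percent_shared(reference_targets, other_targets):
    """Percent of the reference TF's targets also predicted for another TF."""
    reference = set(reference_targets)
    if not reference:
        raise ValueError("reference target set is empty")
    shared = reference & set(other_targets)
    return 100.0 * len(shared) / len(reference)

# Share of DEGs with peak-to-gene linkages predicted as TCFL5 targets,
# using the counts quoted in the reviewer's comment (367 of 494).
tcfl5_share_of_degs = 100.0 * 367 / 494

# Hypothetical target sets; the gene names are placeholders.
reference = {"gene_a", "gene_b", "gene_c", "gene_d"}
tf_x = {"gene_a", "gene_b", "gene_c", "gene_d", "gene_e"}
tf_y = {"gene_d", "gene_z"}

print(round(tcfl5_share_of_degs), percent_shared(reference, tf_x), percent_shared(reference, tf_y))
```

      Here `tf_x` shares all of the reference targets and `tf_y` shares one in four, mirroring how a range as wide as 4%-100% can arise across different TFs.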

      (12) The presentation of TCFL5 in the results section would make more sense with the additional mention of reproductive phenotypes already known (currently in the discussion Lines 914-917). I would furthermore suggest that the discussion goes into more depth on the difference between the regulatory network of TCFL5 in XX meiosis vs XY.

      We thank the reviewer for this comment; however, we already state in the results section that TCFL5 is known to influence XX PGC sex determination.

      (13) In the Methods, please state more clearly for those not familiar that the genetic background of mice is mixed.

      We described the mice with their official names, which provides the context of their genetic backgrounds.

      (14) Please specify which morphologic criteria were used to verify the stage of embryos in the methods.

      We added the following text to the methods section of the revised manuscript: “Plug date was used to determine the stage of embryos collected for single-nucleus RNA-seq and ATAC-seq. The stage of E11.5 embryos was confirmed by counting somites. The stage of embryos collected at E12.5 was confirmed by the morphological presence of the vessel and cords of the testes collected from XY embryos. Similarly, we confirmed the stage of embryos collected at E13.5 by the size of the gonads, the presence of more distinct cords in the testes of XY embryos, and the elongation of the ovaries of XX embryos.”

      (15) The total number of cells and PGCs that passed QC and are included in UMAPS should be stated.

      The requested information was added to the legend for Fig. 1 of the revised manuscript: “The number of PGCs per sex and embryonic stage are: 375 E11.5 XX PGCs; 1,106 E12.5 XX PGCs; 750 E13.5 XX PGCs; 110 E11.5 XY PGCs; 465 E12.5 XY PGCs; and 348 E13.5 XY PGCs.”
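
      Because the reviewer asked for totals, the per-group counts quoted above can be tallied as follows. This is a small arithmetic sketch; the dictionary layout and variable names are illustrative, and the six counts are copied from the legend text.

```python
# PGC counts per sex and embryonic stage, as quoted in the Fig. 1 legend text.
pgc_counts = {
    ("XX", "E11.5"): 375,
    ("XX", "E12.5"): 1106,
    ("XX", "E13.5"): 750,
    ("XY", "E11.5"): 110,
    ("XY", "E12.5"): 465,
    ("XY", "E13.5"): 348,
}

total = sum(pgc_counts.values())

# Subtotal per sex.
per_sex = {}
for (sex, _stage), n in pgc_counts.items():
    per_sex[sex] = per_sex.get(sex, 0) + n

print(total, per_sex)
```

      The six groups sum to 3,154 PGCs, of which 2,231 are XX and 923 are XY.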

      (16) The order of timepoints changes between figures, and this is not for any obvious reason. Please make it consistent. Figures 1 and 6 list XX 11.5, 12.5, 13.5, and the same for XY, but Figures 2, 3, and 4 use the reverse order: XY E13.5, E12.5, E11.5, and then XX. 

      We thank the reviewer for this comment. However, we chose this order for each of the figures to match the coordinates of the graphs and where we would expect the reader to begin reading the graph first. For example, in Figure 3a, XX E11.5 is closest to the x-axis and would be expected to be read first.   

      (17) In Figure S2 the colors of clusters are hard to distinguish, and it is suggested that the cluster numbers should be listed above each colored bar to avoid frustration.

      We made the suggested correction to Figure S2.

      (18) In Figures 2e and 3e: what do the dashed boxes indicate?

      The dashed boxes are to guide the reader’s eyes to the fact that the order of transcription factors/genes under the Cistrome DB regulatory potential score and gene expression plots are the same.

      (19) In Figure 5a: break panels into i-iv so that the in-text call-outs are not all the same.

      We made the suggested correction to Figure 5a and modified the in-text call-outs.

      (20) Please indicate XX in Figure 5e and XY in Figure 5l.

      We made the suggested correction to Figure 5e and 5l.

      (21) In Figure S5c: Please reorganize DA chromatin peak charts so that columns are XX and XY with rows at the same timepoint.

      We made the suggested correction to Figure S5c.

      (22) In Figure S7a: please make images larger so that the overlapping expression of PORCN and TRA98 is more visible, and consider adding a more magnified panel.

      This image is now included in the main text, with expanded panels.

      (23) Line 742-754: this seems like a long introduction for the results section; please consider tightening it up.

      We believe this text is important and necessary to provide context to the bioinformatics analyses of cell signaling pathways in PGCs. Not all readers will be familiar with the ligand-receptor signals between gonadal support cells and PGCs, and this text provides details on which signaling pathways are known to direct sex determination of PGCs.

      (24) For UMAP plots in Figures 2c, 3c, S3b, and S4b, the text overlaid with the timepoints and sexes onto the UMAP plots is misleading, as it allows the reader to presume that the entire group of cells for a given sex/timepoint is located in the location of the text overlay. However, from the UMAP plots in Figure 1i-j, it is clear that the cells from a given sex/timepoint are actually spread across multiple identified clusters. Thus, the overlaid text obscures the important heterogeneity detected. To better represent the actual locations on the UMAP plot of cells from each sex/timepoint, it would be better to show inset density plots alongside these UMAP plots so the reader can locate the cells for themselves. 

      We thank the reviewer for this comment. However, we chose this formatting to offer simplicity and ease of understanding to our UMAPs in addition to highlighting the general biological patterns of gene expression. If the reader is interested in discerning more of the heterogeneity of the UMAPs, they may refer back to Figure 1.

      Reviewer #3 (recommendations for the authors):

      There are some errors or places that need clarification or corrections:

      (1) Figure 1f, according to the graph, it should be 8 clusters, not 9.

      There are 9 clusters because the numbering for the clusters starts at ‘0’.

      (2) Why did cluster 8 have so many different states of cells from both sexes?

      Cluster 8 is likely an artifact of sequencing, and determining why it contains cells in many different states from both sexes would require several additional analyses. Although such analyses would address a technical issue associated with the dataset, they would not change any major conclusions of the study.

      (3) Figure 1i, shouldn't that be ten instead of eleven?

      There are 11 clusters because the numbering for the clusters starts at ‘0’.
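
      The counting behind both cluster answers is ordinary zero-based indexing. In the sketch below, the helper `cluster_count` is hypothetical, and the highest labels (8 and 10) are inferred from the stated cluster totals rather than read from the figures.

```python
def cluster_count(highest_label, first_label=0):
    """Number of clusters when labels run from first_label to highest_label."""
    return highest_label - first_label + 1

# Labels 0..8 give 9 clusters; labels 0..10 give 11 clusters.
labels_nine = list(range(9))
labels_eleven = list(range(11))

print(cluster_count(8), cluster_count(10))
print(labels_nine[-1], labels_eleven[-1])
```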

      (4) Figure 2a, zkscan expression level comparison was not so obvious as the bubble size was small. How many folds of differences from xx pgc?

      There is a 1.5 fold increase in the expression of Zkscan5 between XY and XX PGCs at E13.5. We included this information in the revised manuscript.
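
      Fold changes are often reported on a log2 scale, so it may help to see what a 1.5-fold difference corresponds to. The sketch assumes that "1.5 fold" means the XY-to-XX expression ratio is 1.5; that reading is an assumption, and the variable names are illustrative.

```python
import math

# Assumed interpretation: expression in XY PGCs divided by expression in XX PGCs.
fold_change = 1.5

log2_fc = math.log2(fold_change)            # value on the log2 scale
percent_increase = (fold_change - 1) * 100  # relative increase over XX

print(round(log2_fc, 3), percent_increase)
```

      Under that assumption the difference is a 50% increase, or a log2 fold change of roughly 0.58.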

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors constructed a novel HSV-based therapeutic vaccine to cure SIV in a primate model. The novel HSV vector is deleted for ICP34.5. Evidence is given that this protein blocks HIV reactivation by interference with the NFkappaB pathway. The deleted construct supposedly would reactivate SIV from latency. The SIV genes carried by the vector ought to elicit a strong immune response. Together the HSV vector would elicit a shock and kill effect. This is tested in a primate model.

      Strengths and weaknesses:

      (1) Deleting ICP34.5 from the HSV construct has a very strong effect on HIV reactivation. The mechanism underlying increased activation by deleting ICP34.5 is only partially explored. Overexpression of ICP34.5 has a much smaller effect (reduction in reactivation) than deletion of ICP34.5 (strong activation); this is acknowledged by the authors that no full mechanistic explanation can be given at this moment.

      Thank you for your comments. We agree that the mechanism underlying the increased reactivation after deleting ICP34.5 is only partially explored. As you pointed out, the deletion of ICP34.5 leads to significant reactivation, while the overexpression of ICP34.5 has a relatively weak inhibitory effect on reactivation. This difference prompts us to further consider the role of HSV-1 in regulating HIV latency and reactivation. Our data (Figure S4), along with previous literature (Mosca et al., 1987, Nabel et al., 1988), indicate that the ICP0 protein might play a crucial role in the reactivation of HIV latency. However, we found for the first time that ICP34.5 can antagonize this reactivation. This is a very interesting topic for understanding the complicated interactions between host cells and different viruses. We will investigate the mechanism in more depth in future studies, and we have mentioned this limitation in the revised Discussion section. Thank you!

      (2) No toxicity data are given for deleting ICP34.5. How specific is the effect for HIV reactivation? An RNA-seq analysis is required to show the effect on cellular genes.

      An RNA-seq analysis was done in the revised manuscript comparing the effect of HSV-1 and the deleted vector in J-Lat cells (Fig S5). More than 2000 genes are upregulated after transduction with the modified vector in comparison with the WT vector. Hence, the specificity of upregulation of SIV genes is questioned. The authors do NOT comment on these findings. In my view it questions the utility of this approach.

      Thank you for your comments.

      (1) As for the toxicity of HSV-ΔICP34.5, it is well known that ICP34.5 is a neurotoxicity factor that can antagonize host immune responses, and thus deleting ICP34.5 improves the safety of HSV-based constructs. As expected, we demonstrated experimentally that HSV-ΔICP34.5 exhibited lower virulence and replication ability than wild-type HSV-1 (Figure S1). Importantly, we also observed a significant decrease in the expression of inflammatory factors in PLWH when compared to wild-type HSV-1 (Figure 1I-K). These data suggest that HSV-ΔICP34.5 should be safer and better tolerated than the wild-type HSV vector.

      (2) The RNA-seq analysis was intended to explore the signaling pathways induced by HSV-ΔICP34.5; these data are not suitable for assessing the toxicity of HSV-ΔICP34.5 constructs. We think it is reasonable to observe many upregulated genes (which are involved in a variety of signaling pathways), since HSV-ΔICP34.5 constructs reactivated HIV latency more effectively than wild-type HSV by modulating the IKKα/β-NF-κB pathway and the PP1-HSF1 pathway.

      (3) To further validate whether HSV-ΔICP34.5 can specifically activate the HIV latent reservoir, we conducted additional experiments using vaccinia virus and adenovirus as controls. Neither vaccinia virus nor adenovirus effectively reactivated HIV latency (Figure S3). Moreover, deleting the ICP0 gene from HSV-1 diminished the reactivation of HIV latency by HSV-1, and overexpressing ICP0 strongly reactivated latent HIV (Figure S4, Figure S5), implying that this reactivation is virus-specific and that ICP0 is an important factor in reversing HIV latency. Interestingly, we found that ICP34.5 can antagonize this reactivation of HIV latency by HSV-1. Thus, after the deletion of ICP34.5, the ability of HSV to reverse HIV latency was significantly enhanced. Our research group will investigate the underlying mechanism in future studies. Thank you for your insightful comment.

      (3) The primate groups are too small and the results too variable to make averages. In Fig 5, the group with ART and saline has two slow rebounders. It is not correct to average those with the single quick rebounder. Here the interpretation is NOT supported by the data.

      Although the authors provided some promising SIV DNA data, no additional animals were added. Groups of 3 animals are too small to make any conclusion, especially given the huge variability in response. The average numbers out of 3 are still presented in the paper, which is not proper science.

      No data are given of the effect of the deletion in primates. Now the deleted construct is compared with an empty vector containing no SIV genes. The authors provide new data in Fig S2 on the comparison of WT and modified vector in cells from PLWH, but the data are not that convincing. A significant difference in reactivation is seen for LTR in only 2/4 donors and in Gag in 3/4 donors. (Additional question: what is the meaning of LTR mRNA? Do the authors mean genomic RNA?)

      Thank you for your careful review and kind reminder.

      (1) We agree that it is not appropriate to use averages for this pilot study with a limited number of macaques. We are currently unable to conduct another experiment with a larger number of macaques, but we think the results of this pilot study are very promising for further studies. Following your kind suggestion, we have removed the averages and now present the data for each monkey individually in the revised manuscript. We have also modified the corresponding description accordingly (Lines 254 to 262). Thank you for your understanding.

      (2) Regarding your comment about the lack of data on the deletion of ICP34.5 from HSV-1, we apologize for the previously unclear description. In fact, the empty vector used in our animal experiments not only lacks SIV antigens but also carries the ICP34.5 deletion. We have revised the corresponding description accordingly (for example, we now use HSV-ΔICP34.5ΔICP47-empty and HSV-ΔICP34.5ΔICP47-sPD1-SIVgag/SIVenv instead of HSV-empty and HSV-sPD1-SIVgag/SIVenv). We hope this revision addresses your question.

      (3) As for the reactivation effects observed in PLWH samples, the data may not be perfect, but we think this result (a significant difference in reactivation for LTR in 2/4 donors and for Gag in 3/4 donors; LTR RNA was measured to evaluate the level of virus replication) supports our conclusion that HSV-ΔICP34.5 reactivates latent virus in primary CD4+ T cells more effectively than wild-type HSV. Of course, we recognize the need for more samples to gain a comprehensive understanding of the reactivation effect in different individuals in future studies. In addition, we corrected the description of LTR RNA (Lines 99-106 and 115-116). Thank you for the reminder!

      Discussion

      HSV vectors are mainly used in cancer treatment partially due to induced inflammation. Whether these are suitable to cure PLWH without major symptoms is a bit questionable to me and should at least be argued for.

      The RNA seq data add on to this worry and should at least be discussed.

      Thank you for your comment. As mentioned above, the RNA-seq analysis was intended to explore the signaling pathways induced by HSV-ΔICP34.5; these data are not suitable for assessing the toxicity of HSV-ΔICP34.5 constructs. ICP34.5 is a neurotoxicity factor that can antagonize innate immune responses, and thus ICP34.5 deletion improves the safety of HSV-based constructs. As expected, our data demonstrated experimentally that HSV-ΔICP34.5 exhibited lower virulence and replication ability than wild-type HSV-1 (Figure S1). Importantly, HSV-ΔICP34.5 induced a lower level of inflammatory cytokines (including IL-6, IL-1β, and TNF-α) in primary CD4+ T cells from PLWH compared to HSV stimulation, likely due to its lower virulence and replication ability (Figure 1I-K). In addition, the CD4+/CD8+ T cell ratio (Figure 5H) and body weight (Figure S10) after treatment were effectively ameliorated in the SIV-infected macaques of the ART+HSV-ΔICP34.5ΔICP47-sPD1-SIVgag/SIVenv group. Our data also demonstrated that there was no significant effect on the cell composition of peripheral blood in the SIV-infected macaques of the ART+HSV-ΔICP34.5ΔICP47-sPD1-SIVgag/SIVenv group (Figure S11). These data suggest that HSV-ΔICP34.5 should be safer and better tolerated than the wild-type HSV vector. We have added a more comprehensive description in the revised Discussion (Lines 328-334). Thank you again for all of your kind comments and suggestions.

      Reviewer #2 (Public review):

      Summary:

      In this article Wen et al. describe the development of a 'proof-of-concept' bi-functional vector based out of HSV-deltaICP-34.5's ability to purge latent HIV-1 and SIV genomes from cells. They show that co-infection of latent J-Lat T-cell lines with a HSV-deltaICP-34.5 vector can reactivate HIV-1 from a latent state. Over- or stable expression of ICP 34.5 ORF in these cells can arrest latent HIV-1 genomes from transcription, even in the presence of latency reversal agents. ICP34.5 can co-IP with- and de-phosphorylate IKKα/β to block its interaction with NF-κB transcription factor. Additionally, ICP34.5 can interact with HSF1 which was identified by mass-spec. Thus, the authors propose that the latency reversal effect of HSV-deltaICP-34.5 in co-infected J-Lat cells is due to modulatory effects on the IKKα/β-NF-κB and PP1-HSF-1 pathway.

      Next the authors cleverly construct a bifunctional HSV based vector with deleted ICP34.5 and 47 ORFs to purge latency and avoid immunological refluxes, and additionally expand the application of this construct as a vaccine by introducing SIV genes. They use this 'vaccine' in mouse models and show the expected SIV-immune responses. Experiments in rhesus macaques (RM), further elicit potential for their approach to reactivate SIV genomes and at the same time block their replication by antibodies. What was interesting in the SIV experiments is that the dual-functional vector vaccine containing sPD1- and SIV Gag/Env ORFs effectively delayed SIV rebound in RMs and in some cases almost neutralized viral DNA copy detection in serum. Very promising indeed, however there are some questions I wish the authors explored to answer, detailed below.

      Overall, this is an elegant and timely work demonstrating the feasibility of reducing virus rebound in animals, and potentially expand to clinical studies. The work was well written, and sections were clearly discussed.

      Strengths:

      The work is well designed, rationale explained and written very clearly for lay readers.

      Claims are adequately supported by evidence and well designed experiments including controls.

      We appreciate your positive comments on our work.

      Weaknesses:

      (1) It looks like ICP0 is also involved in latency reversal effects. More follow-up work will be required to test if this is in fact true.

      Both our data (Figure S4, Figure S5) and previous literature (Nabel et al., 1988, Mosca et al., 1987) indicate that HSV ICP0 may play a role in reversing HIV latency. However, the exact mechanisms behind this effect have not yet been fully elucidated. Of note, we report here for the first time that ICP34.5 can antagonize this reactivation of HIV latency by HSV-1. Thus, after the deletion of ICP34.5, the ability of HSV to reverse HIV latency was significantly enhanced. Our research group will investigate the underlying mechanism in future studies. Thank you for your insightful comment.

      (2) It is difficult to estimate the depletion of the latent viral reservoir. The authors have tried to address this issue. A more convincing argument to this reviewer will be data to demonstrate that after the bi-functional vaccine, the animals show overall reduction in the number of circulating latent cells. The feasibility to obtain such a result is not clearly demonstrated.

      Thank you for your comment. As you mentioned, we have indeed measured both total DNA and integrated DNA (iDNA) in blood cells (see Figure 5E-F), which can provide support for the reduction of the latent viral reservoir. Thank you for your kind reminder.

      (3) The authors state that the reduced virus rebound detected following bi-functional vaccine delivery is due to latent genomes becoming activated and steady-state neutralization of these viruses by antibody response. This needs to be demonstrated. Perhaps cell-culture experiments from specimen taken from animals might help address this issue. In lab cultures one could create environments without antibody responses, under these conditions one would expect higher level of viral loads being released in response to the vaccine in question.

      Thank you for your valuable suggestion. We believe that the reduced virus rebound observed may be influenced by immune responses from T cells and antibodies induced by both ART and the vaccine. We appreciate your insight and agree that future studies should focus on investigating the activation effects of the vaccine under controlled conditions that simulate the absence of immune responses in primary animal cells. This will help us better understand the mechanisms involved and address your concerns more comprehensively.

      Reviewer #2 (Recommendations for the authors):

      The Authors have sufficiently addressed my comments. Below are a few minor changes that can help with clarity.

      Lines 126-127: This sentence should be changed. Perhaps, "these data suggests that .... Safety of... in PLWH might be tolerable, at least in vitro."

      Thanks for your suggestion. We have revised it accordingly. (Line 130).

      Lines 128-132: Would this not mean that reactivation is due to ICP0 gene? Have the authors tried to express ICP0-gene into J-Lat cells and see if that is the reason for reactivation? This seems somewhat incomplete. At the end of 132, please add ", in the presence of ICP0". Also a sentence describing this effect is warranted.

      Thank you for your insightful suggestion. Yes, both our data and previous literature support that the ICP0 gene can play a significant role in the reactivation of HIV latency (Figure S4, Figure S5). Of note, we report here for the first time that ICP34.5 can antagonize this reactivation of HIV latency by HSV-1. Thus, after the deletion of ICP34.5, the ability of HSV to reverse HIV latency was significantly enhanced. We have described this effect in the revised version accordingly. Additionally, we have added the phrase “in the presence of ICP0” to the results section (Line 137) to clarify this point.

      MOSCA, J. D., BEDNARIK, D. P., RAJ, N. B., ROSEN, C. A., SODROSKI, J. G., HASELTINE, W. A., HAYWARD, G. S. & PITHA, P. M. 1987. Activation of human immunodeficiency virus by herpesvirus infection: identification of a region within the long terminal repeat that responds to a trans-acting factor encoded by herpes simplex virus 1. Proc Natl Acad Sci U S A 84: 7408. DOI: https://doi.org/10.1073/pnas.84.21.7408, PMID: 2823260

      NABEL, G. J., RICE, S. A., KNIPE, D. M. & BALTIMORE, D. 1988. Alternative mechanisms for activation of human immunodeficiency virus enhancer in T cells. Science 239:  1299.DOI: https://doi.org/10.1126/science.2830675, PMID: 2830675

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      By using biophysical chromosome stretching, the authors measured the stiffness of chromosomes of mouse oocytes in meiosis I (MI) and meiosis II (MII). This study was the follow-up of previous studies in spermatocytes (and oocytes) by the authors (Biggs et al. Commun. Biol. 2020; Hornick et al. J. Assist. Rep. and Genet. 2015). They showed that MI chromosomes are much stiffer (~10 fold) than mitotic chromosomes of mouse embryonic fibroblast (MEF) cells. MII chromosomes are also stiffer than the mitotic chromosomes. The authors also found that oocyte aging increases the stiffness of the chromosomes. Surprisingly, the stiffness of meiotic chromosomes is independent of meiotic chromosome components, Rec8, Stag3, and Rad21L.

      Strengths:

      This provides a new insight into the biophysical property of meiotic chromosomes, that is chromosome stiffness. The stiffness of chromosomes in meiosis prophase I is ~10-fold higher than that of mitotic chromosomes, which is independent of meiotic cohesin. The increased stiffness during oocyte aging is a novel finding.

      Weaknesses:

      A major weakness of this paper is that it does not provide any molecular mechanism underlying the difference between MI and MII chromosomes (and/or prophase I and mitotic chromosomes).

      We acknowledge that our study does not provide a comprehensive explanation for the stage-related alterations in chromosome stiffness; however, we believe that the observation of these changes is itself of broad interest. Initially, we hypothesized that DNA damage or depletion of meiosis-specific cohesin might contribute to the observed increase in chromosome stiffness. However, our experimental findings did not support these hypotheses, indicating that neither DNA damage nor cohesin depletion is responsible for the stiffness increase. The molecular basis underlying the stage-related stiffness increase remains elusive and requires exploration in future studies. In the Discussion, we propose that factors such as condensin, nuclear proteins, and histone methylation may play a role in regulating meiotic chromosome stiffness. The involvement of these factors in stage-related chromosome stiffening requires future investigation.

      Reviewer #2 (Public Review):

      This paper reports investigations of chromosome stiffness in oocytes and spermatocytes. The paper shows that prophase I spermatocytes and MI/MII oocytes yield high Young Modulus values in the assay the authors applied. Deficiency in each one of three meiosis-specific cohesins they claim did not affect this result and increased stiffness was seen in aged oocytes but not in oocytes treated with the DNA-damaging agent etoposide.

      The paper reports some interesting observations which are in line with a report by the same authors of 2020 where increased stiffness of spermatocyte chromosomes was already shown. In that sense, the current manuscript is an extension of that previous paper, and thus novelty is somewhat limited. The paper is also largely descriptive as it does neither propose a mechanism nor report factors that determine the chromosomal stiffness.

      There are several points that need to be considered.

      (1) Limitations of the study and the conclusions are not discussed in the "Discussion" section and that is a significant gap. Even more so as the authors rely on just one experimental system for all their data - there is no independent verification - and that in vitro system may be prone to artefacts.

      Our experimental system has been used to study different types of chromosome stiffness as well as nuclear stiffness. We have compared our results with previously published data and found that the data are consistent across different experiments. To address the reviewer’s concern, we describe the limitations of our in vitro experimental approach in the Discussion section.

      (2) It is somewhat unfortunate that they jump between oocytes and spermatocytes to address the cohesin question. Prophase I (pachytene) spermatocytes chromosomes are not directly comparable to MI or MII oocyte chromosomes. In fact, the authors report Young Modulus values of 3700 for MI oocytes and only 2700 for spermatocyte prophase chromosomes, illustrating this difference. Why not use oocyte-specific cohesin deficiencies?

      In this study, our goal was to investigate the mechanism underlying the increased chromosome stiffness observed during prophase I. Ideally, we would have compared wild-type and cohesin-deleted mouse oocytes at the metaphase I (MI) stage. However, experimental constraints made this approach unfeasible: spermatocytes and oocytes from Rec8<sup>-/-</sup> and Stag3<sup>-/-</sup> mutant mice cannot reach the MI stage, and Rad21l<sup>-/-</sup> mutant mice are sterile in males and subfertile in females, because cohesin proteins are crucial for germline cell development.

      Additionally, collecting prophase I chromosomes from oocytes is exceptionally challenging and requires fetal mice as prophase I oocyte sources because female oocytes progress to the diplotene stage during fetal development. The process is further complicated by the difficulty of genotyping fetal mice, making the study of female prophase I impracticable. By contrast, spermatocytes are continuously generated in males throughout life, with meiotic stages readily identifiable, making them more accessible for analysis.

      Our findings consistently showed increased chromosome stiffness in both prophase I spermatocytes and MI oocytes, suggesting that the phenomenon is not sex-specific. This observation implies that similar effects on chromosome stiffness may occur across meiotic stages, from prophase I to MI.

      (3) It remains unclear whether the treatment of oocytes with the detergent TritonX-100 affects the spindle and thus the chromosomes isolated directly from the Triton-lysed oocytes. In fact, it is rather likely that the detergent affects chromatin-associated proteins and thus structural features of the chromosomes.

      Regarding the use of Triton X-100, it is important to emphasize that the concentration used (0.05%) is very low and unlikely to significantly affect chromosome stiffness. To support this assertion, we have provided additional evidence in the revised manuscript demonstrating that this low concentration of Triton X-100 has a negligible effect on chromosome stiffness (Supplement Fig. 5, Right panel).

      (4) Why did the authors use mouse strains of different genetic backgrounds, CD-1, and C57BL/6? That makes comparison difficult. Breeding of heterozygous cohesin mutants will yield the ideal controls, i.e. littermates.

      The genetic mutant mice, all in a C57BL/6 background, were generously provided by Dr. Philip Jordan and delivered to our lab. As our lab does not currently maintain a C57BL/6 colony, and given that this strain typically produces small litter sizes, which would have complicated the remainder of the study, we chose CD-1 mice as the control group and used C57BL/6 mice specifically for the cohesin study. To address potential concerns regarding genetic background differences, we compared our results with previously published data from C57BL/6 mice and found no significant differences (2710 ± 610 Pa versus 3670 ± 840 Pa, P = 0.4809) (Biggs et al., 2020). Furthermore, prophase I spermatocytes from CD-1 mice showed no significant difference compared to any of the three cohesin-deleted C57BL/6 mutant mice, suggesting that chromosome stiffness is not significantly influenced by genetic background.

      (5) How did the authors capture chromosome axes from STAG3-deficient spermatocytes which feature very few if any axes? How representative are those chromosomes that could be captured?

      We isolated chromosomes from prophase I mutant spermatocytes, which were identified by their large size, round shape, and thick chromosomal threads, characteristics indicative of advanced condensation and a zygotene-like stage during prophase I (Supplemental Fig. 3). The methodology for isolating these chromosomes has been described in detail in our previous publication (Biggs et al., 2020), which is referenced in the current manuscript.

      Reviewer #3 (Public Review):

      Summary:

      Understanding the mechanical properties of chromosomes remains an important issue in cell biology. Measuring chromosome stiffness can provide valuable insights into chromosome organization and function. Using a sophisticated micromanipulation system, Liu et al. analyzed chromosome stiffness in MI and MII oocytes. The authors found that chromosomes in MI oocytes were ten-fold stiffer than mitotic ones. The stiffness of chromosomes in MI mouse oocytes was significantly higher than that in MII oocytes. Furthermore, the knockout of the meiosis-specific cohesin component (Rec8, Stag3, Rad21l) did not affect meiotic chromosome stiffness. Interestingly, the authors showed that chromosomes from old MI oocytes had higher stiffness than those from young MI oocytes. The authors claimed this effect was not due to the accumulated DNA damage during the aging process because induced DNA damage reduced chromosome stiffness in oocytes.

      Strengths:

      The technique used (isolating the chromosomes in meiosis and measuring their stiffness) is the authors' specialty. The results are intriguing and informative to the chromatin/chromosome and other related fields.

      Weaknesses:

      (1) How intact the measured chromosomes were is unclear.

      Currently, a well-calibrated chromosome mechanics experiment requires the extracellular isolation of chromosomes. In experiments conducted parallel to those in our previous study (Biggs et al., 2020), we obtained quantitatively consistent results, including measurements of the Young modulus for prophase I spermatocyte chromosomes. Our isolation approach is significantly gentler than bulk methods that rely on hypotonic buffer-driven cell lysis and centrifugation. If substantial chromosomal damage had occurred during isolation, we would expect greater variation between experiments, as different amounts or types of damage could influence the results.

      (2) Some control data needs to be included.

      We used wild-type prophase I spermatocytes and metaphase I (MI) oocytes as controls. To validate our findings, we compared some of our results with those reported in a previous study and observed consistent outcomes (Biggs et al., 2020).

      (3) The paper was not well-written, particularly the Introduction section.

      We have revised the paper and improved the overall quality of the manuscript.

      (4) How intact were the measured chromosomes? Although the structural preservation of the chromosomes is essential for this kind of measurement, the meiotic chromosomes were isolated in PBS with Triton X-100 and measured at room temperature. It is known that chromosomes are very sensitive to cation concentrations and macromolecular crowding in the environment (PMID: 29358072, 22540018, 37986866). It would be better to discuss this point.

      As suggested, we investigated the impact of PBS and Triton X-100 on chromosome stiffness. Our findings indicate that neither PBS nor Triton X-100 caused significant changes in chromosome stiffness (Supplemental Fig. 5).

      Recommendations For The Authors:

      Major points of Reviewers that the Editor indicated should be addressed

      (1) Reviewer's point 3, the effect of the high concentration of etoposide: It would be advisable to use lower concentrations of etoposide to observe the effect of DNA damage on chromosome stiffness more accurately.

      The effect of etoposide on oocytes is dose-dependent (Collins et al., 2015). Oocytes are generally not highly sensitive to DNA damage, and even at relatively high concentrations, not all may exhibit a response. To ensure sufficient DNA damage in the oocytes we isolated, we used a relatively high concentration of etoposide for the experiment. This concentration (50 μg/ml) falls within the typical range reported in the literature (Marangos and Carroll, 2012; Cai et al., 2023; Lee et al., 2023). As the reviewer suggested, we tested two additional lower concentrations of etoposide (5 μg/ml and 25 μg/ml) (see Fig. 5C). We did not observe any significant difference in chromosome stiffness in oocytes treated with 5 μg/ml etoposide compared to the control. However, the higher concentration of etoposide (25 μg/ml) significantly reduced oocyte chromosome stiffness compared to the control.

      Revision to manuscript:

      “Results at lower etoposide concentrations revealed that chromosome stiffness in untreated control oocytes was not significantly different from that in oocytes treated with 5 μg/ml etoposide (3780 ± 700 Pa versus 3930 ± 400 Pa, P = 0.8624). However, chromosome stiffness in untreated oocytes was significantly higher than that in oocytes treated with 25 μg/ml etoposide (3780 ± 700 Pa versus 1640 ± 340 Pa, P = 0.015) (Figure 5C).”

      (2) Reviewer's point 3, the effect of Triton X-100: This is related to the concern of Reviewer #3. It is critical to check whether or not the detergent affects the stiffness indirectly.

      To demonstrate that the low concentration of Triton X-100 does not influence chromosome stiffness, we conducted additional experiments. First, we isolated chromosomes and measured their stiffness. Then, we treated the chromosomes with 0.05% Triton X-100 via micro-spraying and remeasured the stiffness. The results showed no significant difference (see Supplement Fig. 5 right panel).

      Revision to manuscript:

      “In addition to past experiments indicating that mitotic chromosomes are stable for long periods after their isolation (Pope et al., 2006), we carried out control experiments on mouse oocyte chromosomes where we incubated them for 1 hour in PBS, or exposed them to a flow of Triton X-100 solution for 10 minutes; there was no change in chromosome stiffness in either case (Methods and Supplementary Fig. 5).”

      (3) Reviewer's point 1, the effect of the buffer composition: Please describe how the composition affects the stiffness of the chromosomes.

      PBS is an economical and effective buffer solution that closely mimics the osmotic conditions of the cytoplasm, which is crucial for maintaining chromosomal structural integrity. Appropriate ion concentrations are essential for preserving chromosome integrity, as ion concentrations that are either too high or too low can alter chromosome morphology (Poirier and Marko, 2002). When chromosomes are stored in PBS, their stiffness remains relatively stable, even with prolonged exposure, ensuring minimal changes to their physical properties. To confirm this, we isolated chromosomes and measured their stiffness. After a one-hour incubation in PBS, we remeasured stiffness and observed no significant differences, which demonstrated that chromosomes remain stable in PBS (see Supplement Fig. 5 left panel).

      Revision to manuscript:

      “In this study, we developed a new way to isolate meiotic chromosomes and measure their stiffness. However, one concern is that the measurements were conducted in PBS solution, which is different from the intracellular environment. To address this, we monitored chromosome stiffness over time in PBS solution and found that it remained stable over a period of one hour (Supplement Fig. 5 Left panel).”

      Reviewer #1 (Recommendations For The Authors):

      Major points:

      (1) Previously, the role of condensin complexes in chromosome stiffness was shown (Sun et al. Chromosome Research, 2018). Thus, the authors should at least describe the condensin staining on MI and MII chromosomes.

      We have added sentences in the discussion to elaborate on the role of condensin.

      Revision to manuscript:

      “Several factors, including condensin, have been found to affect chromosome stiffness (Sun et al., 2018). Condensin exists in two distinct complexes, condensin I and condensin II, and both are active during meiosis. Published studies indicate that condensin II is more sharply defined and more closely associated with the chromosome axis from anaphase I to metaphase II (Lee et al., 2011). Additionally, condensin II appears to play a more significant role in mitotic chromosome mechanics compared to condensin I (Sun et al., 2018). Thus, condensin II likely contributes more significantly to meiotic chromosome stiffness than condensin I.”

      (2) Although the authors nicely showed the difference in the stiffness between MI and MII chromosomes (Figure 2), as known, MI chromosomes are bivalent (with four chromatids) while MII chromosomes are univalent (with two chromatids). The physical property of the chromosomes would be affected by the number of chromatids. It would be essential for the authors to measure the physical properties of a univalent of MI chromosomes from mice defective in meiotic recombination such as Spo11 and/or Mlh3 KO mice.

      The reviewer correctly pointed out that the number of chromatids in chromosomes differs between the metaphase I (MI) and metaphase II (MII) stages. We have addressed this difference by calculating Young’s modulus (E), a mechanical property that describes the elasticity of a material independent of its geometry. Young’s modulus describes the intrinsic properties of the material itself, rather than the specific characteristics of the object being tested. It is calculated as E = (F/A)/(∆L/L0), where F is the force applied to stretch the chromosome, A is the cross-sectional area, ∆L is the change in chromosome length, and L0 is the original length of the chromosome. An increase in chromosome or chromatid number results in a larger cross-sectional area and therefore a higher doubling force (F), but this variation in chromosome number or cross-sectional area does not affect the calculated chromosome stiffness, that is, Young’s modulus (E). While study of the mutants suggested by the referee would certainly be interesting, the absence of these key recombination factors would likely impact chromosome stiffness in a more complex way than just changing chromosome thickness; this type of study is beyond the scope of the present manuscript and is an exciting direction for future studies.
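
      As a minimal sketch of the calculation described in this response, the following Python snippet evaluates E = (F/A)/(∆L/L0) and the corresponding doubling force. The function names and all numerical inputs are hypothetical illustrations, not the authors' analysis code or measured values. The sketch also assumes an idealized, linearly elastic rod and takes the doubling force to be the force at which ∆L equals L0. With force in pN and area in µm², the stress comes out directly in Pa.

```python
def youngs_modulus(force_pn, area_um2, delta_l_um, l0_um):
    """Return Young's modulus E = (F/A) / (dL/L0) in pascals.

    Units: 1 pN / 1 um^2 = 1e-12 N / 1e-12 m^2 = 1 Pa, so no conversion
    factor is needed when force is in pN and area is in um^2.
    """
    stress = force_pn / area_um2      # force per unit cross-sectional area (Pa)
    strain = delta_l_um / l0_um       # fractional length change (dimensionless)
    return stress / strain


def doubling_force(modulus_pa, area_um2):
    """Return the force (pN) at strain = 1, i.e. when the length has doubled.

    Assumes a linear force-extension response (an idealization).
    """
    return modulus_pa * area_um2


# Hypothetical example: two rods of the same material, the second with
# twice the cross-sectional area (a stand-in for more chromatids).
thin = youngs_modulus(force_pn=500.0, area_um2=1.0, delta_l_um=2.0, l0_um=4.0)
thick = youngs_modulus(force_pn=1000.0, area_um2=2.0, delta_l_um=2.0, l0_um=4.0)

print(thin, thick)                      # same modulus for both rods
print(doubling_force(thin, 1.0))        # doubling force of the thin rod
print(doubling_force(thick, 2.0))       # doubling force of the thick rod
```

      In this idealized picture, doubling the cross-sectional area doubles the force needed for a given strain while the computed modulus is unchanged, which is the sense in which the response treats E as independent of chromatid number.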

      (3) In Figure 5, the authors measure the stiffness of etoposide-treated MI chromosomes. The concentration of the drug was 50 ug/ml, which is very high. The authors should analyze the different concentrations of the drug to check the chromosome stiffness. Moreover, etoposide is an inhibitor of Topoisomerase II. The effect of the drug might be caused by the defective Top2 activity, rather than Top2-adducts, thus DNA damage. It is very important to check the other Top2 inhibitors or DNA-damaging agents to generalize the effect of DNA damage on chromosome stiffness. Moreover, DNA damage induces the DNA damage response. It is important to check the effect of DDR inhibitors on the damage-induced change of stiffness.

      The reviewer is correct in noting that etoposide can induce DNA damage and inhibit Top2 activity. To address this concern, we point to our previous DNase experiment, which supports the results of this study (Biggs et al., 2020). That experiment was conducted in vitro, where DNase treatment caused DNA damage on chromosomes without affecting Top2 activity or triggering a DNA damage response. The results demonstrated that DNase treatment led to reduced chromosome stiffness, which aligns with the findings presented in our manuscript.

      (4) In the same line as the #3 point, the authors also need to check the effect of etoposide on the stiffness of mitotic chromosomes from MEF.

      Experiments on MEF mitotic chromosomes were designed to serve as a reference for the meiotic chromosome studies. The etoposide experiments on meiotic chromosomes specifically aimed to investigate how DNA damage affects meiotic chromosome structure. While it would be interesting to explore the effects of etoposide-induced DNA damage on mitotic chromosomes, it represents a distinct research question that falls outside the scope of the current study.

      Minor points:

      (1) Line 141-142: Previous studies by the author analyzed the stiffness of mitotic chromosomes from pro-metaphase. Which stage of cell cycles did the authors analyze here?

      To ensure consistency in our experiments, we also measured the stiffness of mitotic chromosomes at the prometaphase stage. The precise stage used is very near to metaphase, at the very end of the prometaphase stage. We have modified the manuscript to clarify this point.

      Revision to manuscript:

      “For comparison with the meiotic case, we measured the chromosome stiffness of Mouse Embryonic Fibroblasts (MEFs) at late pro-metaphase (just slightly before their attachment to the mitotic spindle) and found that the average Young’s modulus was 340 ± 80 Pa (Figure 2B). The value is consistent with our previously published data, where the modulus for MEFs was measured to be 370 ± 70 Pa (Biggs et al., 2020).”

      (2) Line 157: Here, the doubling force of MI (and MII) oocytes should be described in addition to those of spermatocytes.

      The purpose of this paragraph is to demonstrate the reproducibility and consistency of our experiments. In this section, we compared our data with previously published findings. Published data do not include chromosome stiffness measurements from MI mouse oocytes; our experiment is the first to assess this. Therefore, we did not include MI mouse oocytes in that comparison. To clarify this, we have added sentences to highlight the comparison of doubling force.

      Revision to manuscript:

      “Here, we found that the doubling forces of chromosomes from MI and MII oocytes are 3770 ± 940 pN and 510 ± 50 pN, respectively. We conclude that chromosomes from MI oocytes are much stiffer than those from both mitotic cells and MII oocytes (Supplement Fig. 2), in terms of either Young’s modulus or doubling force.”

      (3) Line 202: What stage of prophase I do the authors mean by the spermatocyte stage here? Diakinesis, Metaphase I or prometaphase I? I am not sure how the authors can determine a specific stage of prophase I by only looking at the thickness of the chromosomes. Please show the thickness distribution of WT and Rec8<sup>-/-</sup> chromosomes.

      We have reworded the sentence and clarified that the spermatocyte stage is the prophase I stage. Since Rec8<sup>-/-</sup> spermatocytes cannot progress beyond the pachytene stage of prophase I, the isolated chromosomes must be in prophase I rather than diakinesis, metaphase I, prometaphase I, or any later stages (Xu et al., 2005). Based on the cell size and degree of chromosome condensation (Biggs et al., 2020), it is most likely that the measured chromosomes are at the zygotene-like stage. However, as we cannot definitively determine the exact substage of prophase I, we have referred to them simply as prophase I.

      Revision to manuscript:

      “We isolated chromosomes from Rec8<sup>-/-</sup> prophase I spermatocytes, which displayed large and round cell size and thick chromosomal threads, indicative of advanced chromosome compaction after stalling at a zygotene-like prophase I stage (Supplement Fig. 3). The combination of large cell size and degree of chromosome compaction allowed us to reliably identify Rec8<sup>-/-</sup> prophase I chromosomes. Using micromanipulation, we measured chromosome stiffness by stretching the chromosomes (Supplement Fig. 3) (Biggs et al., 2019).”

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 135: that statement is not substantiated; better to show retraction data and full reversibility.

      We added a figure showing oocyte chromosome stretching, which demonstrates that the oocyte chromosome is elastic and that the stretching process is reversible (Supplement Fig. 1).

      (2) Line 144: the authors claim that the Young Modulus of MII oocytes is "slightly" higher than that of mitotic cells (MEFs). Well, "slightly" means it is rather similar, and therefore the commonly used statement that MII is similar to mitosis is OK - contrary to the authors' claim.

      We have removed the word “slightly” in the manuscript. The difference is statistically significant.

      Revision to manuscript:

      “Surprisingly, despite this reduction, the stiffness of MII oocyte chromosomes was still significantly higher than that for mitotic cells (Figure 2B).”

      (3) There are a lot of awkward sentences in this text. Some sentences lack words, are not sufficiently precise in wording and/or logic, and there are numerous typos. Some examples can be found in lines 89 (grammar), 94, 95 ("looked"), 98, 101 ("difference" - between what?), and some are commonplaces or superficial (lines 92/93, 120..., ). Occasionally the present and past tense are mixed (e.g. in M&M). Thus the manuscript is quite poorly written.

      We thank the reviewer for these comments. We have revised all the sentences highlighted by the reviewer and polished the entire manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) Line 48. "We then investigated the contribution of meiosis-specific cohesin complexes to chromosome stiffness in MI and MII oocytes." There is no data on oocytes with meiosis-specific cohesin KO. This part should be corrected.

      We have corrected this error.

      Revision to manuscript:

      “We examined the role of meiosis-specific cohesin complexes in regulating chromosome stiffness.”

      (2) Lines 155-157. The result of MI mouse oocyte chromosomes should also be mentioned here (Supplementary Figure 1).

      Please see our response to Reviewer 1 – Minor Point 2.

      (3) Line 163. "The stiffness of chromosomes in MI mouse oocytes is significantly higher compared to MII oocytes." Is this because two homologs are paired in MI chromosomes (but not in MII chromosomes)? The authors may want to discuss the possible mechanism.

      Please see our response to Reviewer 1 – Major Point 2.

      (4) Line 188: "We hypothesized that MI oocytes... would have higher chromosome stiffness than MII oocytes." Why did the authors measure chromosomes from spermatocytes but not MI oocytes?

      Both spermatocytes and oocytes from Rec8<sup>-/-</sup>, Stag3<sup>-/-</sup>, and Rad21l<sup>-/-</sup> mutant mice cannot reach the MI stage because cohesin proteins are crucial for germline-cell development. We chose to use spermatocytes in our study because collecting fetal meiotic oocytes is extremely difficult, and genotyping fetal mice adds another layer of complexity to the experiments. In females, all oocytes progress through prophase I to the dictyotene stage during the fetal stage, and obtaining individual oocytes at this stage is challenging. In contrast, spermatocytes are continuously generated at all stages in males.

      (5) To support the authors' conclusion, verifying the KO of REC8, STAG3, and RAD21L by immunostaining or other methods is essential.

      These mice were provided by one of the authors, Dr. Philip Jordan, who has published several papers using these knockout mice (Hopkins et al., 2014; Ward et al., 2016). The immunostaining of these models has already been well characterized in those previous studies. In addition to performing double genotyping, we also used the size of the collected testes as an additional verification of the mutant genotype. These knockout mice have significantly smaller testes compared to their wild-type counterparts, providing a clear physical indicator of the mutation.

      (6) Some of the cited papers and descriptions in the Introduction are not appropriate and confusing. This part should be improved:

      Line 79. Recent studies have revealed that the 30-nm fiber is not considered the basic structure of chromatin (e.g., review, PMID: 30908980; original papers, PMID: 19064912, 22343941, 28751582). This point should be included.

      We have corrected the references as needed. Additionally, thank you for the updated information regarding the 30-nm fiber. We have removed all the descriptions about the 30-nm fiber to ensure the information is accurate and up to date.

      (7) Line 83. Reviews on mitotic chromosomes, rather than Ref. 9, should be cited here. For instance, PMID: 33836947, 31230958.

      We have corrected it and added references according to the reviewer’s suggestion.

      (8) Line 85. Refs. 10 and 11 are not on the "Scaffold/Radial-Loop" model. For instance, PMID: 922894, 277351, 12689587. The other popular model is the hierarchical helical folding model (PMID: 98280, 15353545).

      We have corrected it and added appropriate references according to the reviewer’s suggestion. Regarding the hierarchical helical folding model, our experiments do not provide data that either support or refute this model. Thus, we have opted not to include any discussion of this model in our manuscript.

      (9) Figure legends. There is no description of the statistical test.

      We have added the description of the statistical test at the end of the figure legends for clarity.

      (10) Line 156. The authors should mention which stages in spermatocyte prophase I (pachytene?) were used for their measurement.

      We cannot precisely determine the substage of prophase I in the spermatocytes although it is most likely in the pachytene stage.

      (11) Line 241. "DNA damage reduces chromosome stiffness in oocytes." It would be better to show how much damage was induced in aged and etoposide-treated chromosomes, for example, by gamma-H2AX immunostaining. In addition, there are some papers that show DNA damage makes chromatin/chromosomes softer (e.g., PMID: 33330932). The authors need to cite these papers.

      The effects of etoposide and age on meiotic oocytes have been published (Collins et al., 2015; Marangos et al., 2015; Winship et al., 2018).

      We are grateful for the citation information provided by the reviewer and have added it to our manuscript.

      Revision to manuscript:

      “Overall, these findings suggest that DNA damage reduces chromosome stiffness in oocytes instead of increasing it, which aligns with other studies showing that DNA damage can make chromosomes softer (Dos Santos et al., 2021). These results suggest that the increased chromosome stiffness observed in aged oocytes is not due to DNA damage.”

      (12) Line 328. Senescence?

      This error has been corrected in the revised manuscript.

      Revision to manuscript:

      “Defective chromosome organization is often related to various diseases, such as cancer, infertility, and senescence (Thompson and Compton, 2011; Harton and Tempest, 2012; He et al., 2018).”

      References:

      Biggs, R., P.Z. Liu, A.D. Stephens, and J.F. Marko. 2019. Effects of altering histone posttranslational modifications on mitotic chromosome structure and mechanics. Mol. Biol. Cell. 30:820–827. doi:10.1091/mbc.E18-09-0592.

      Biggs, R.J., N. Liu, Y. Peng, J.F. Marko, and H. Qiao. 2020. Micromanipulation of prophase I chromosomes from mouse spermatocytes reveals high stiffness and gel-like chromatin organization. Commun. Biol. 3:1–7. doi:10.1038/s42003-020-01265-w.

      Cai, X., J.M. Stringer, N. Zerafa, J. Carroll, and K.J. Hutt. 2023. Xrcc5/Ku80 is required for the repair of DNA damage in fully grown meiotically arrested mammalian oocytes. Cell Death Dis. 14:1–9. doi:10.1038/s41419-023-05886-x.

      Collins, J.K., S.I.R. Lane, J.A. Merriman, and K.T. Jones. 2015. DNA damage induces a meiotic arrest in mouse oocytes mediated by the spindle assembly checkpoint. Nat. Commun. 6. doi:10.1038/ncomms9553.

      Harton, G.L., and H.G. Tempest. 2012. Chromosomal disorders and male infertility. Asian J. Androl. 14:32–39. doi:10.1038/aja.2011.66.

      He, Q., B. Au, M. Kulkarni, Y. Shen, K.J. Lim, J. Maimaiti, C.K. Wong, M.N.H. Luijten, H.C. Chong, E.H. Lim, G. Rancati, I. Sinha, Z. Fu, X. Wang, J.E. Connolly, and K.C. Crasta. 2018. Chromosomal instability-induced senescence potentiates cell non-autonomous tumourigenic effects. Oncogenesis. 7. doi:10.1038/s41389-018-0072-4.

      Hopkins, J., G. Hwang, J. Jacob, N. Sapp, R. Bedigian, K. Oka, P. Overbeek, S. Murray, and P.W. Jordan. 2014. Meiosis-Specific Cohesin Component, Stag3 Is Essential for Maintaining Centromere Chromatid Cohesion, and Required for DNA Repair and Synapsis between Homologous Chromosomes. PLoS Genet. 10:e1004413. doi:10.1371/journal.pgen.1004413.

      Lee, C., J. Leem, and J.S. Oh. 2023. Selective utilization of non-homologous end-joining and homologous recombination for DNA repair during meiotic maturation in mouse oocytes. Cell Prolif. 56:1–12. doi:10.1111/cpr.13384.

      Lee, J., S. Ogushi, M. Saitou, and T. Hirano. 2011. Condensins I and II are essential for construction of bivalent chromosomes in mouse oocytes. Mol. Biol. Cell. 22:3465–3477. doi:10.1091/mbc.E11-05-0423.

      Marangos, P., and J. Carroll. 2012. Oocytes progress beyond prophase in the presence of DNA damage. Curr. Biol. 22:989–994. doi:10.1016/j.cub.2012.03.063.

      Marangos, P., M. Stevense, K. Niaka, M. Lagoudaki, I. Nabti, R. Jessberger, and J. Carroll. 2015. DNA damage-induced metaphase I arrest is mediated by the spindle assembly checkpoint and maternal age. Nat. Commun. 6:1–10. doi:10.1038/ncomms9706.

      Poirier, M.G., and J.F. Marko. 2002. Mitotic chromosomes are chromatin networks without a mechanically contiguous protein scaffold. Proc. Natl. Acad. Sci. U. S. A. 99:15393–15397. doi:10.1073/pnas.232442599.

      Pope, L.H., C. Xiong, and J.F. Marko. 2006. Proteolysis of Mitotic Chromosomes Induces Gradual and Anisotropic Decondensation Correlated with a Reduction of Elastic Modulus and Structural Sensitivity to Rarely Cutting Restriction Enzymes. Mol. Biol. Cell. 17:104. doi:10.1091/MBC.E05-04-0321.

      Dos Santos, Á., A.W. Cook, R.E. Gough, M. Schilling, N.A. Olszok, I. Brown, L. Wang, J. Aaron, M.L. Martin-Fernandez, F. Rehfeldt, and C.P. Toseland. 2021. DNA damage alters nuclear mechanics through chromatin reorganization. Nucleic Acids Res. 49:340–353. doi:10.1093/nar/gkaa1202.

      Sun, M., R. Biggs, J. Hornick, and J.F. Marko. 2018. Condensin controls mitotic chromosome stiffness and stability without forming a structurally contiguous scaffold. Chromosom. Res. 26:277–295. doi:10.1007/s10577-018-9584-1.

      Thompson, S.L., and D.A. Compton. 2011. Chromosomes and cancer cells. Chromosom. Res. 19:433–444. doi:10.1007/s10577-010-9179-y.

      Ward, A., J. Hopkins, M. Mckay, S. Murray, and P.W. Jordan. 2016. Genetic Interactions Between the Meiosis-Specific Cohesin Components, STAG3, REC8, and RAD21L. G3 (Bethesda). 6:1713–24. doi:10.1534/g3.116.029462.

      Winship, A.L., J.M. Stringer, S.H. Liew, and K.J. Hutt. 2018. The importance of DNA repair for maintaining oocyte quality in response to anti-cancer treatments, environmental toxins and maternal ageing. Hum. Reprod. Update. 24:119–134. doi:10.1093/humupd/dmy002.

      Xu, H., M.D. Beasley, W.D. Warren, G.T.J. van der Horst, and M.J. McKay. 2005. Absence of Mouse REC8 Cohesin Promotes Synapsis of Sister Chromatids in Meiosis. Dev. Cell. 8:949–961. doi:10.1016/j.devcel.2005.03.018.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      (1) I still think that the authors need to set the importance of the differences in aggregation in the context of toxicity arising from protein misfolding/aggregation. While the authors state the limitation in the response, and I agree that a single manuscript cannot complete a field of investigation, I still think that this is an important point missing from this manuscript.

      We thank the reviewer for the comments. We are working to address this issue and will elucidate it in future studies.

      (2) I retain my reservations about the fluorescence intensity data shown for Rho123, DCF, JC1, and MitoSox. The errors are much lower than what we typically achieve in biological experiments in our as well as our collaborator's lab. A glimpse at published literature would also support our statement. Specifically, Rho123 shows a large difference in errors between Figure 5 and Figure 5 Supplement 2. The point to note is that the absolute intensities do not vary between these figures, but the errors are an order of magnitude lower in the main figures. I, therefore, accept these figures in good faith without further interrogation.

      We really value these comments from the reviewer and also do not want to cause any potential misleading interpretations of the data. We have therefore asked a more experienced author to redo all the experiments on the physiological indicators (Rho123, JC1 and MitoSox) that directly reflect mitochondrial function, and left out the DCF data. The new experimental data are in line with our previous results. We have clearly described these changes in the Results, Materials and Methods and Figure legends sections.

      The new data from the repeated experiments are: Rho123 fluorescence intensity data in Figure 5A, B and C; Figure 6B; JC1 staining in Figure 6E; JC1 staining in Figure 7A, B and D.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper introduces a new approach to modeling human behavioral responses using image-computable models. They create a model (VAM) that is a combination of a standard CNN coupled with a standard evidence accumulation model (EAM). The combined model is then trained directly on image-level data using human behavioral responses. This approach is original and can have wide applicability. However, many of the specific findings reported are less compelling.

      Strengths:

      (1) The manuscript presents an original approach to fitting an image-computable model to human behavioral data. This type of approach is sorely needed in the field.

      (2) The analyses are very technically sophisticated.

      (3) The behavioral data are large both in terms of sample size (N=75) and in terms of trials per subject.

      Weaknesses:

      Major

      (1) The manuscript appears to suggest that it is the first to combine CNNs with evidence accumulation models (EAMs). However, this was done in a 2022 preprint (https://www.biorxiv.org/content/10.1101/2022.08.23.505015v1) that introduced a network called RTNet. This preprint is cited here, but never really discussed. Further, the two unique features of the current approach discussed in lines 55-60 are both present to some extent in RTNet. Given the strong conceptual similarity in approach, it seems that a detailed discussion of similarities and differences (of which there are many) should feature in the Introduction.

      Thanks for pointing this out—we agree that the novel contributions of our model (the VAM) with respect to prior related models (including RTNet) should be clarified, and have revised the Introduction accordingly. We include the following clarifications in the Introduction:

      “The key feature of the VAM that distinguishes it from prior models is that the CNN and EAM parameters are jointly fitted to the RT, choice, and visual stimulus data from individual participants in a unified Bayesian framework. Thus, both the visual representations learned by the CNN and the EAM parameters are directly constrained by behavioral data. In contrast, prior models first optimize the CNN to perform the behavioral task, then separately fit a minimal set of high-level CNN parameters [RTNet, Rafiei et al., 2024] and/or the EAM parameters to behavioral data [Annis et al., 2021; Holmes et al., 2020; Trueblood et al., 2021]. As we will show, fitting the CNN with human data—rather than optimizing the model to perform a task—has significant consequences for the representations learned by the model.”

      E.g. in the case of RTNet, the variability of the Bayesian CNN weight distribution, the decision threshold, and the magnitude of the noise added to the images are adjusted to match the average human accuracy (separately for each task condition). RTNet is an interesting and useful model that we believe has complementary strengths to our own work.

      Since there are several other existing models in addition to the VAM and RTNet that use CNNs to generate RTs or RT proxies (by our count, at least six that we cite earlier in the Introduction), we felt it was inappropriate to preferentially include a detailed comparison of the VAM and RTNet beyond the passage quoted above.

      (2) In the approach here, a given stimulus is always processed in the same way through the core CNN to produce activations v_k. These v_k's are then corrupted by Gaussian noise to produce drift rates d_k, which can differ from trial to trial even for the same stimulus. In other words, the assumption built into VAM appears to be that the drift rate variability stems entirely from post-sensory (decisional) noise. In contrast, the typical interpretation of EAMs is that the variability in drift rates is sensory. This is also the assumption built into RTNet where the core CNN produces noisy evidence. Can the authors comment on the plausibility of VAM's assumption that the noise is post-sensory?

      In our view, the VAM is compatible with a model in which the drift rate variability for a given stimulus is due to sensory noise, since we do not specify the origin of the Gaussian noise added to the drift rates. As the reviewer notes, the CNN component of the VAM processes a given stimulus deterministically, yielding the mean drift rates. This does not preclude us from imagining an additional (unmodeled) sensory process that adds variability to the drift rates. The VAM simply represents this and other hypothetical sources of variability as additive Gaussian noise. We agree however that it is worthwhile to think about the origin of the drift rate variability, though it is not a focus of our work.
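
      To make the role of the additive noise concrete, the following is a minimal illustrative sketch of a single LBA-style trial in which fixed mean drift rates (standing in for the CNN outputs v_k) are perturbed by Gaussian noise on each trial. It is not the VAM implementation: the function name, the parameter values, and the choice to resample when no accumulator has a positive drift are all assumptions made for illustration.

```python
import random

def simulate_lba_trial(mean_drifts, noise_sd=1.0, threshold=2.0,
                       max_start=1.0, non_decision=0.2, rng=random):
    """Simulate one LBA-style trial; returns (choice index, RT in seconds).

    All default parameter values are hypothetical, chosen only for illustration.
    """
    while True:
        finish_times = []
        for v in mean_drifts:
            drift = rng.gauss(v, noise_sd)       # trial-to-trial Gaussian noise
            start = rng.uniform(0.0, max_start)  # uniform starting point
            # An accumulator with non-positive drift never reaches threshold.
            t = (threshold - start) / drift if drift > 0 else float("inf")
            finish_times.append(t)
        best = min(finish_times)
        if best != float("inf"):                 # resample if nobody finishes
            return finish_times.index(best), non_decision + best

# Same stimulus (same mean drift rates) on every trial, different outcomes.
rng = random.Random(0)
trials = [simulate_lba_trial([3.0, 1.0, 0.5, 0.5], rng=rng) for _ in range(5)]
```

      Because the mean drift rates are identical on every call, any variability in the simulated choices and RTs comes from the noise term and the random starting point. The sketch is agnostic about whether that noise is read as sensory or decisional.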

      (3) Figure 2 plots how well VAM explains different behavioral features. It would be very useful if the authors could also fit simple EAMs to the data to clarify which of these features are explainable by EAMs only and which are not.

      In our view, fitting simple EAMs to the data would not be especially informative and poses a number of challenges for the particular task we study (LIM) that are neatly avoided by using the VAM. In particular, as we show in Figure 2, the stimuli vary along several dimensions that all appear to influence behavior: horizontal position, vertical position, layout, target direction, and flanker direction. Since the VAM is stimulus-computable, fitting the VAM automatically discovers how all of these stimulus features influence behavior (via their effect on the drift rates outputted by the CNN). In contrast, fitting a simple EAM (e.g. the LBA model) necessitates choosing a particular parameterization that specifies the relationship between all of the stimulus features and the EAM model parameters. This raises a number of practical questions. For example, should we attempt to fit a separate EAM for each stimulus feature, or model all stimulus features simultaneously?

      Moreover, while we could in principle navigate these issues and fit simple EAMs to the data, we do not intend to claim that simple EAMs fail to explain the relationship between stimulus features and behavior as well as the VAM. Rather, the key strength of the VAM relative to simple EAMs is that it includes a detailed and biologically plausible model of human vision. The majority of the paper capitalizes on this strength by showing how behavioral effects of interest (namely congruency effects) can be explained in terms of the VAM’s visual representations.

      (4) VAM is tested in two different ways behaviorally. First, it is tested to what extent it captures individual differences (Figure 2B-E). Second, it is tested to what extent it captures average subject data (Figure 2F-J). It wasn't clear to me why for some metrics only individual differences are examined and for other metrics only average human data is examined. I think that it will be much more informative if separate figures examine average human data and individual difference data. I think that it's especially important to clarify whether VAM can capture individual differences for the quantities plotted in Figures 2F-J.

      We would like to clarify that Fig. 2J in fact already shows how well the VAM captures individual differences for the average subject data shown in Fig. 2H (stimulus layout) and Fig. 2I (stimulus position). For a given participant and stimulus feature, we calculated the Pearson's r between model/participant mean RTs across each stimulus feature value. Fig. 2J shows the distribution of these Pearson’s r values across all participants for stimulus layout and horizontal/vertical position.
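
      As a rough sketch of the per-participant computation described above, the correlation can be written as follows. The data layout (one dictionary per source, mapping a participant identifier to mean RTs ordered by stimulus feature value) and the function names are assumptions for illustration, not the analysis code used for the figure.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def per_participant_r(participant_rts, model_rts):
    """Correlate model and participant mean RTs across feature values.

    Both arguments map a (hypothetical) participant id to a list of mean RTs,
    one entry per value of the stimulus feature (e.g. each layout).
    """
    return {pid: pearson_r(participant_rts[pid], model_rts[pid])
            for pid in participant_rts}
```

      The distribution of the returned values across participants is the kind of quantity summarized in Fig. 2J.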

      Fig. 2G also already shows how well the VAM captures individual differences in behavior. Specifically, this panel shows individual differences in mean RT attributable to differences in age. For Fig. 2F, which shows how the model drift rates differ on congruent vs. incongruent trials, there is no sensible way to compare the models to the participants at any level of analysis (since the participants do not have drift rates). 

      (5) The authors look inside VAM and perform many exploratory analyses. I found many of these difficult to follow since there was little guidance about why each analysis was conducted. This also made it difficult to assess the likelihood that any given result is robust and replicable. More importantly, it was unclear which results are hypothesized to depend on the VAM architecture and training, and which results would be expected in performance-optimized CNNs. The authors train and examine performance-optimized CNNs later, but it would be useful to compare those results to the VAM results immediately when each VAM result is first introduced.

      Thanks for pointing this out—we apologize for any confusion caused by our presentation of the CNN analyses. We have added in additional motivating statements, methodological clarifications, and relevant references to our Results, particularly for Figure 3 in which we first introduce the analyses of the CNN representations/activity. In general, each analysis is prefaced by a guiding question or specific rationale, e.g. “How do the models' visual representations enable target selectivity for stimuli that vary along several irrelevant dimensions?” We also provide numerous references in which these analysis techniques have been used to address similar questions in CNNs or the primate visual cortex.

      We chose to maintain the current organization of our results in which the comparison between the VAM and the task-optimized models are presented in a separate figure. We felt that including analyses of both the VAM and task-optimized models in the initial analyses of the CNN representations would be overwhelming for many readers. As the reviewer acknowledges, some readers may already find these results challenging to follow. 

      (6) The authors don't examine how the task-optimized models would produce RTs. They say in lines 371-2 that they "could not examine the RT congruency effect since the task-optimized models do not generate RTs." CNNs alone don't generate RTs, but RTs can easily be generated from them using the same EAM add-on that is part of VAM. Given that the CNNs are already trained, I can't see a reason why the authors can't train EAMs on top of the already trained CNNs and generate RTs, so these can provide a better comparison to VAM.

      We appreciate this suggestion, but we judge the suggestion to “train EAMs on top of the already trained CNNs and generate RTs” to be a significant expansion of the scope of the paper with multiple possible roads forward. In particular, one must specify how the outputs of the task-optimized CNN (logits for each possible response) relate to drift rates, and there is no widely-accepted or standard way to do this. Previously proposed methods include transforming representation distances in the last layer to drift rates (https://doi.org/10.1037/xlm0000968), fitting additional subject-specific parameters that map the logits to drift rates (https://doi.org/10.1007/s42113-019-00042-1), or using the softmax-scored model outputs as drift rates directly (https://doi.org/10.1038/s41562-024-01914-8), though in the latter case the RTs are not on the same scale as human data. In our view, evaluating these different methods is beyond the scope of this paper. An advantage of the VAM is that one does not have to fit two separate models (a CNN and an EAM) to generate RTs.

      Nonetheless, we agree that it would be informative to examine something like RTs in the task-optimized models. Our revised Results section now includes an analysis of the confidence of the task-optimized models’ decisions, which we use as a proxy for RTs:

      “Since the task-optimized models do not generate RTs, it is not possible to directly measure RT congruency effects in these models without making additional assumptions about how the CNN's classification decisions relate to RTs. However, as a coarse proxy for RT, we can examine the confidence of the CNN's decisions, defined as the softmax-scored logit (probability) of the most probable direction in the final CNN layer. This choice of RT proxy is motivated by some prior studies that have combined CNNs with EAMs [Annis et al., 2021; Holmes et al., 2020; Trueblood et al., 2021]. These studies explicitly or implicitly derive a measure of decision confidence from the activity of the last CNN layer. The confidence measure is then mapped to the EAM drift rates, such that greater decision confidence generally corresponds to higher drift rates (and therefore shorter RTs).

      We calculated the average confidence of each task-optimized CNN separately for congruent vs. incongruent trials. On average, the task-optimized models showed higher confidence on congruent vs. incongruent trials (W = 21.0, p < 1e-3, Wilcoxon signed-rank test; Cohen's d = 0.99; n = 75 models). These analyses therefore provide some evidence that task-optimized CNNs have the capacity to exhibit congruency effects, though an explicit comparison of the magnitude of these effects with human data requires additional modeling assumptions (e.g., fitting a separate EAM).”
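
      For readers who want the confidence measure spelled out, the following is a minimal sketch of the softmax-scored probability of the most probable response, together with one possible paired effect size. The function names are hypothetical, and the effect-size convention shown (mean of the paired differences divided by their standard deviation) is an assumption, since several definitions of Cohen's d are in use.

```python
import math
from statistics import mean, stdev

def softmax_confidence(logits):
    """Probability assigned to the most probable response after softmax."""
    m = max(logits)                            # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)

def paired_cohens_d(a, b):
    """Paired effect size: mean difference over SD of differences.

    This is one convention among several; treat it as an assumption.
    """
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / stdev(diffs)

def mean_confidence(trial_logits):
    """Average confidence over a list of per-trial logit vectors."""
    return mean(softmax_confidence(z) for z in trial_logits)
```

      Under this sketch, one would compute `mean_confidence` separately for the congruent and incongruent trials of each model and then compare the two sets of per-model averages with a paired test.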

      (7) The Discussion felt very long and mostly a summary of the Results. I also couldn't shake the feeling that it had many just-so stories related to the variety of findings reported. I think that the section should be condensed and the authors should be clearer about which explanations are speculations and which are air-tight arguments based on the data.

      We have shortened the Discussion modestly and added language to distinguish the arguments that are more speculative from those directly supported by our data.

      Specifically, we added the phrase “we speculate that…” to two suggestions in the Discussion (paragraphs 3 and 5), and we ensured that any other speculative suggestions contain similar clarifying language. We have also added subheadings to the Discussion to help readers navigate this section.

      (8) In one of the control analyses, the authors train different VAMs on each RT quantile. I don't understand how it can be claimed that this approach can serve as a model of an individual's sensory processing. Which of the 5 sets of weights (5 VAMs) captures a given subject's visual processing? Are the authors saying that the visual system of a given subject changes based on the expected RT for a stimulus? I feel like I'm missing something about how the authors think about these results.

      We agree that these particular analyses may cause confusion and have removed them from our revised manuscript.

      Reviewer #2 (Public Review):

      In an image-computable model of speeded decision-making, the authors introduce and fit a combined CNN-EAM (a 'VAM') to flanker-task-like data. They show that the VAM can fit mean RTs and accuracies as well as the congruency effect that is present in the data, and subsequently analyze the VAM in terms of where in the network congruency effects arise.

      Overall, combining DNNs and EAMs appears to be a promising avenue to seriously model the visual system in decision-making tasks compared to the current practice in EAMs. Some variants have been proposed or used before (e.g., doi.org/10.1016/j.neuroimage.2017.12.078 , doi.org/10.1007/s42113-019-00042-1), but always in the context of using task-trained models, rather than models trained on behavioral data. However, I was surprised to read that the authors developed their model in the context of a conflict task, rather than a simpler perceptual decision-making task. Conflict effects in human behavior are particularly complex, and thereby, the authors set a high goal for themselves in terms of the to-be-explained human behavior. Unfortunately, the proposed VAM does not appear to provide a great account of conflict effects that are considered fundamental features of human behavior, like the shape of response time distributions, and specifically, delta plots (doi.org/10.1037/0096-1523.20.4.731). The authors argue that it is beyond the scope of the presented paper to analyze delta plots, but as these are central to studies of human conflict behavior, models that aim to explain conflict behavior will need to be able to fit and explain delta plots.

      Theories on conflict often suggest that negative/positive-trending delta plots arise through the relative timing of response activation related to relevant and irrelevant information.

      Accumulation for relevant and irrelevant information would, as a result, either start at different points in time or the rates vary over time. The current VAM, as a feedforward neural network model, does not appear to be able to capture such effects, and perhaps fundamentally not so: accumulation for each choice option is forced to start at the same time, and rates are a static output of the CNN.

      The proposed solution of fitting five separate VAMs (one for each of five RT quantiles) is not satisfactory: it does not explain how delta plots result from the model, for the same reason that fitting five evidence accumulation models (one per RT quantile) does not explain how response time distributions arise. If, for example, one would want to make a prediction about someone's response time and choice based on a given stimulus, one would first have to decide which of the five VAMs to use, which is circular. But more importantly, this way of fitting multiple models does not explain the latent mechanism that underlies the shape of the delta plots.

      As such, the extensive analyses on the VAM layers and the resulting conclusions that conflict effects arise due to changing representations across layers (e.g., "the selection of task-relevant information occurs through the orthogonalization of relevant and irrelevant representations"), while inspiring, remain hard to weigh, as they are contingent on the assumption that the VAM can capture human behavior in the conflict task, which it struggles with. That said, the promise of combining CNNs and EAMs is clearly there. A way forward could be to either adjust the proposed model so that it can explain delta plots, which would potentially require temporal dynamics and time-varying evidence accumulation rates, or perhaps to start simpler and combine CNN-EAMs that are able to fit more standard perceptual decision-making tasks without conflict effects.

      We thank the reviewer for their thoughtful comments on our work. However, we note that the VAM does in fact capture the positive-trending RT delta plot observed in the participant data (Fig. S4A), though the intercepts for models/participants differ somewhat. On the other hand, the conditional accuracy functions (Fig. S4B) reveal a more pronounced difference between model and participant behavior. As the reviewer points out, capturing these effects is likely to require a model that can produce time-varying drift rates, whereas our model produces a fixed drift rate for a given stimulus. We also agree that fitting a separate VAM to each RT quantile is not a satisfactory means of addressing this limitation and have removed these analyses from our revised manuscript.

      However, while we agree that accurately capturing these dynamic effects is a laudable goal, it is in our view also worthwhile to consider explanations for the mean behavioral effect (i.e. the accuracy congruency effect), which can occur independently of any consideration of dynamics. One of our main findings is that across-model variability in accuracy congruency effects is better attributed to variation in representation geometry (target/flanker subspace alignment) vs. variation in the degree of flanker suppression. This finding does not require any consideration of dynamics to be valid at the level of explanation we pursue (across-user variability in congruency effects), but also does not preclude additional dynamic processes that could give rise to more specific error patterns. Our revised discussion now includes a section where we summarize and elaborate on these ideas:

      “It is not difficult to imagine how the orthogonalization mechanism described above, which explains variability in accuracy congruency effects across individuals, could act in concert with other dynamic processes that explain variability in congruency effects within individuals (e.g., as a function of RT). In general, any process that dynamically gates the influence of irrelevant sensory information on behavioral outputs could accomplish this, for example ramping inhibition of incorrect response activation [https://doi.org/10.3389/fnhum.2010.00222], a shrinking attention spotlight [https://doi.org/10.1016/j.cogpsych.2011.08.001], or dynamics in neural population-level geometry [https://doi.org/10.1038/nn.3643]. To pursue these ideas, future work may aim to incorporate dynamics into the visual component and decision component of the VAM with recurrent CNNs [https://doi.org/10.48550/arXiv.1807.00053, https://doi.org/10.48550/arXiv.2306.11582] and the task-DyVA model [https://doi.org/10.1038/s41562-022-01510-8], respectively.”

      Reviewer #3 (Public Review):

      Summary:

      In this article, the authors combine a well-established choice-response time (RT) model (the Linear Ballistic Accumulator) with a CNN model of visual processing to model image-based decisions (referred to as the Visual Accumulator Model - VAM). While this is not the first effort to combine these modeling frameworks, it uses this combination of approaches uniquely.

      Specifically, the authors attempt to better understand the structure of human information representations by fitting this model to behavioral (choice-RT) data from a classic flanker task. This objective is made possible by using a very large (by psychological modeling standards) industry data set to jointly fit both components of this VAM model to individual-level data. Using this approach, they illustrate (among other results) (1) how the interaction between target and flanker representations influence the presence and strength of congruency effects, (2) how the structure of representations changes (distributed versus more localized) with depth in the CNN model component, and (3) how different model training paradigms change the nature of information representations. This work contributes to the ML literature by demonstrating the value of training models with richer behavioral data. It also contributes to cognitive science by demonstrating how ML approaches can be integrated into cognitive modeling. Finally, it contributes to the literature on conflict modeling by illustrating how information representations may lead to some of the classic effects observed in this area of research.

      Strengths:

      (1) The data set used for this analysis is unique and is made publicly available as part of this article. Specifically, they have access to data for 75 participants with >25,000 trials per participant. This scale of data/individual is unusual and is the foundation on which this research rests.

      (2) This is the first time, to my knowledge, that a model combining a CNN with a choice-RT model has been jointly fit to choice-RT data at the level of individual people. This type of model combination has been used before but in a more restricted context. This joint fitting, and in particular, learning a CNN through the choice-RT modeling framework, allows the authors to probe the structure of human information representations learned directly from behavioral data.

      (3) The analysis approaches used in this article are state-of-the-art. The training of these models is straightforward given the data available. The interesting part of this article (opinion of course) is the way in which they probe what CNN has learned once trained. I find their analysis of how distractor and target information interfere with each other particularly compelling as well as their demonstration that training on behavioral data changes the structure of information representations when compared to training models on standard task-optimized data.

      Weaknesses:

      (1) Just as the data in this article is a major strength, it is also a weakness. This type of modeling would be difficult, if not impossible to do with standard laboratory data. I don't know what the data floor would be, but collecting tens of thousands of decisions for a single person is impractical in most contexts. Thus this type of work may live in the realm of industry. I do want to re-iterate that the data for this study was made publicly available though!

      We suspect (but have not systematically tested) that the VAMs can be fitted with substantially less data. We use data augmentation techniques (various randomized image transformations) during training to improve the generalization capabilities of the VAMs, and these methods are likely to be particularly important when training on smaller datasets. One could consider increasing the amount of image data augmentation when working with smaller datasets, or pursuing other forms of data augmentation like resampling from estimated RT distributions (see https://doi.org/10.1038/s41562-022-01510-8 for an example of this). In general, we don’t think that prospective users of our approach should be discouraged if they have only a few hundred trials per subject (or fewer); it’s worth trying!

      (2) While this article uses choice-RT data it doesn't fully leverage the richness of the RT data itself. As the authors point out, this modeling framework, the LBA component in particular, does not account for some of the more nuanced but well-established RT effects in this data. This is not a big concern given the already nice contributions of this article and it leads to an opportunity for ongoing investigation.

      We agree that fully capturing the more nuanced behavioral effects you mention (e.g. RT delta plots and conditional accuracy functions) is a worthwhile goal for future research; see our response to Reviewer #2 for a more detailed discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The phrase in the Abstract "convolutional neural network models of visual processing and traditional EAMs are jointly fitted" made me initially believe that the two models were fitted independently. You may want to re-word to clarify.

      We think that the phrase “jointly fitted” already makes it clear that both the CNN and EAM parameters are estimated simultaneously, in agreement with how this term is usually used. But we have nonetheless appended some additional clarifying language to that sentence (“in a unified Bayesian framework”).

      (2) Lines 27-28: EAMs "are the most successful and widely-used computational models of decision-making." This is only true for the specific type of decision-making examined here, namely joint modeling of choice and response times. Signal detection theory is arguably more widely-used when response times are not modeled.

      Thanks for pointing this out - we have revised the referenced sentence accordingly.

      (3) Could the authors clarify what is plotted in Figure 2F?

      Fig. 2F shows the drift rates for the target, flanker, and “other” (non-target/non-flanker) accumulators averaged over trials and models for congruent vs. incongruent trials. In case this was a source of confusion, we do not show the value of the flanker drift rates on congruent trials because the flanker and target accumulators are identical (i.e. the flanker/congruent drift rates are equivalent to the target/congruent drift rates).

      (4) Lines 214-7: "The observation that single-unit information for target direction decreased between the fourth and final convolutional layers while population-level decoding remained high is especially noteworthy in that it implies a transition from representing target direction with specialized "target neurons" to a more distributed, ensemble-level code." Can the authors clarify why this is the only reasonable explanation for these results? It seems like many other explanations could be construed.

      We have added additional clarification to this section and now use more tentative language:

      “The observation that single-unit information for target direction decreased between the fourth and final convolutional layers indicates that the units become progressively less selective for particular target directions. Since population-level decoding remained high in these layers, this suggests a transition from representing target direction with specialized "target neurons" to a more distributed, ensemble-level code.”

      (5) Lines 372-376: "Thus, simply training the model to perform the task is not sufficient to reproduce a behavioral phenomenon widely-observed in conflict tasks. This challenges a core (but often implicit) assumption of the task-optimized training paradigm, namely that to do a task well, a training model will result in model representations that are similar to those employed by humans." While I agree with the general sentiment, I feel that its application here is strange. Unless I'm missing something, in the context of the preceding sentence, the authors seem to be saying that researchers in the field expect that CNNs can produce a behavioral phenomenon (RTs) that is completely outside of their design and training. I don't think that anyone actually expects that.

      We moved the discussion/analyses of RTs to the next paragraph. It should now be clear that this statement refers specifically to the absence of an accuracy congruency effect in the task-optimized models.

      (6) Lines 387-389: "As a result, the VAMs may learn richer representations of the stimuli, since a variety of stimulus features-layout, stimulus position, flanker direction-influence behavior (Figure 2)." That is certainly true of tasks like this one where an optimal model would only focus on a tiny part of the image, whereas humans are distracted by many features. I'm not sure that this distractibility is the same as "richer representations". When CNNs classify images based on the background, would the authors claim that they have richer representations than humans?

      We agree that “richer” may not be the best way to characterize these representations, and have changed it to “more complex”.

      (7) Is it possible that drift rate d_k for each response happens to be negative on a given trial? If so, how is the decision given on such trials (since presumably none of the accumulators will ever reach the boundary)?

      It is indeed possible for all of the drift rates to be negative, though we found that this occurred for a vanishingly small number of trials (mean ± s.e.m. percent trials/model: 0.080 ± 0.011%, n = 75 models), as reported in the Methods. These trials were excluded from analyses.
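
      For illustration, the following is a minimal sketch of a single trial under the standard LBA formulation, showing why a trial with no positive drift rate yields no decision. It is not the authors' implementation: the function name and all parameter values are hypothetical, and the per-trial normal sampling of drift rates is assumed from the standard LBA rather than taken from the response above.

```python
import numpy as np

def simulate_lba_trial(mean_drifts, drift_sd, threshold, max_start, t0, rng):
    """Simulate one LBA trial (hypothetical helper, standard LBA assumed).

    Returns (winning accumulator index, response time), or None when no
    sampled drift rate is positive, so that no accumulator can ever reach
    the threshold.
    """
    # Per-trial drift rates are drawn from normal distributions.
    drifts = rng.normal(mean_drifts, drift_sd)
    # Start points are drawn uniformly from [0, max_start].
    starts = rng.uniform(0.0, max_start, size=len(drifts))

    if np.all(drifts <= 0):
        return None  # undefined decision; such trials would be excluded

    # Accumulation is linear, so time to threshold is distance / rate.
    times = np.full(len(drifts), np.inf)
    positive = drifts > 0
    times[positive] = (threshold - starts[positive]) / drifts[positive]

    winner = int(np.argmin(times))
    return winner, t0 + float(times[winner])
```

      Under these assumptions, accumulators with non-positive drift never finish, so the decision is made by the fastest accumulator among those with positive drift.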

      (8) Can the authors comment on how they chose the CNN architecture and whether they expect that different architectures will produce similar results?

      Before establishing the seven-layer CNN architecture used throughout the paper, we conducted some preliminary experiments using other architectures that differed primarily in the number of CNN layers. We found that models with significantly fewer than seven layers typically failed to reach human-level accuracy on the task while larger models achieved human-level accuracy but (unsurprisingly) took longer to train.

      Reviewer #3 (Recommendations For The Authors):

      - In the introduction to this paper (particularly the paragraph beginning in line 33), the authors note that EAMs have typically been used in simplified settings and that they do not provide a means to account for how people extract information from naturalistic stimuli. While I agree with this, the idea of connecting CNNs of visual processing with EAMs for a joint modeling framework has been done. I recommend looking at and referencing these two articles as well as adjusting the tenor of this part of an introduction to better reflect the current state of the literature. For full disclosure, I am one of the authors on these articles. https://link.springer.com/article/10.1007/s42113-019-00042-1 https://www.sciencedirect.com/science/article/abs/pii/S0010027721001323

      We agree—thanks for pointing this out. The revised Introduction now discusses prior related models in more detail (including those referenced above) and better clarifies the novel contributions of our model. We specifically highlight that a novel contribution of the VAM is that “the CNN and EAM parameters are jointly fitted to the RT, choice, and visual stimulus data from individual participants in a unified Bayesian framework.”

      - The statement in lines 56-58 implies that this is the first article to glue CNNs together with EAMs. I would edit this accordingly based on the prior comment here and references provided. I will note that the second feature of the approach in this paper is still novel and really nice, namely the fact that the CNN and the EAM are jointly fitted. In the aforementioned references, the CNN is trained on the image set, and individual level Bayesian estimation was only applied to the EAM. Thus, it may be useful to highlight the joint estimation aspect of this investigation as well as how the uniqueness of the data available makes it possible.

      Agreed—see above.

      - Figure 3c and associated text. I understand the MI analysis you are performing here, however it is difficult to interpret as it stands. In the figure, what does a MI of 0.1 mean?? Can you give some context to that scale? I do find the interpretation of the hunchback shape in lines 210-222 to be somewhat of a stretch. The discussion that precedes (lines 199-209) this is clear and convincing. Can this discussion be strengthened more? And more interpretability of Figure 3c would be helpful; entropic scales can be hard to interpret without some context or scale associated.

      The MI analyses in Fig. 3C (and also Figs. 4C and 6E) show normalized MI, in which the raw MI has been divided by the entropy of the stimulus feature distribution. This normalization facilitates comparing the MI for different stimulus features, which is relevant for Figs. 4C and 6E. The normalized MI has a possible range of [0, 1], where 1 indicates perfect correlation between the two variables and 0 indicates complete independence. We now note in the legend of these figures that the possible normalized MI range is [0, 1], which should help with interpreting these values. Our revised results section for Fig. 3C now also includes some additional remarks on our interpretation of the hunchback shape of the MI.
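
      As a rough illustration of this normalization, the sketch below computes MI between two discrete variables and divides it by the entropy of the stimulus feature. It is a simplified plug-in estimator on already-discretized values; the function names are hypothetical, and how the authors discretize unit activations or estimate MI is not specified here.

```python
import numpy as np

def entropy_bits(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def normalized_mi(feature, response):
    """MI(feature; response) divided by H(feature).

    Assumes both inputs are discrete and that `feature` takes at least
    two distinct values (otherwise H(feature) is zero).
    """
    # Map arbitrary labels to integer codes 0..K-1.
    _, f_idx = np.unique(np.asarray(feature), return_inverse=True)
    _, r_idx = np.unique(np.asarray(response), return_inverse=True)
    # Encode each (feature, response) pair as a single joint label.
    joint = f_idx * (r_idx.max() + 1) + r_idx
    h_feature = entropy_bits(f_idx)
    # MI = H(F) + H(R) - H(F, R)
    mi = h_feature + entropy_bits(r_idx) - entropy_bits(joint)
    return mi / h_feature
```

      With this definition, a response that fully determines the stimulus feature gives a value of 1, and a response that is statistically independent of the feature gives 0, so a value of 0.1 corresponds to recovering about 10% of the feature's entropy.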

      - Lines 244-248 and the analyses in Figure 3 suggest a change in the behavior of the CNN around layer 4. This is just a musing, but what would happen if you just used a 4 layer CNN, or even a 3 layer? This is not just a methods question. Your analysis suggests a transition from localized to distributed information representation. Right now, the EAM only sees the output of the distributed representation. What if it saw the results of the more local representations from early layers? Of course, a shallower network may just form the distributed representations earlier, but it would be interesting if there were a way to tease out not just the presence of distributed vs local representations, but the utility of those to the EAM.

      Thanks for this interesting suggestion. We did do some preliminary experiments in models with fewer layers, though we only examined the outputs of these models and did not assess their representations. We found that models with 3–5 layers generally failed to achieve human-level accuracy on the task. In principle, one could relate this observation to the representations of these models as a means of assessing the relative utility of distributed/local representations. However, there are confounding factors that one would ideally control for in order to compare models with different numbers of layers in this fashion (namely, the number of parameters).

      - Section Line 359 (Task optimized models) - It would be helpful to clarify here what these task-optimized models are being trained to do. As I understand it, they are being trained to directly predict the target direction. But are you asking them to learn to predict the true target direction? Or are you training them to predict what each individual responds? I think it is the second (since you have 75 of these), but it's not clear. I looked at the methods and still couldn't get a clear description of this. Also, are you just stripping the LBA off of the end of the CNN and then essentially putting a softmax in its place? If so, it would be helpful to say so.

      The task-optimized models were actually trained to output the true target direction in each stimulus, rather than trained to match the decisions of the human participants. We trained 75 such models since we wanted to use exactly the same stimuli as were used to train each VAM. The task-optimized CNNs were identical to those used in the VAMs, except that the outputs of the last layer were converted to softmax-scored probabilities for each direction rather than drift rates. The Results and Methods sections now include additional commentary that clarifies these points.
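
      To make the difference in output heads concrete, here is a minimal sketch of the softmax conversion, in which final-layer outputs become a probability for each direction. The function and the four example values are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def softmax(logits):
    """Convert final-layer outputs to probabilities that sum to 1."""
    logits = np.asarray(logits, dtype=float)
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Hypothetical final-layer outputs for four response directions.
probs = softmax([2.0, 0.5, -1.0, 0.0])
predicted_direction = int(np.argmax(probs))
```

      In a VAM, by contrast, the same final-layer outputs would be interpreted as drift rates and passed to the LBA rather than normalized into probabilities.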

      - Line 373-376: This statement is pretty well established at this point in the similarity judgement literature. I recommend looking at and referencing https://onlinelibrary.wiley.com/doi/full/10.1111/cogs.13226 https://www.nature.com/articles/s41562-020-00951-3 https://link.springer.com/article/10.1007/s42113-020-00073-z

      Thanks for pointing this out. For reference, the statement in question is “Thus, simply training the model to perform the task is not sufficient to reproduce a behavioral phenomenon widely-observed in conflict tasks. This challenges a core (but often implicit) assumption of the task-optimized training paradigm, namely that training a model to do a task well will result in model representations that are similar to those employed by humans.”

      We agree that the first and third reference you mention are relevant, and we now cite them along with some other relevant work. In our view, the second reference you mention is not particularly relevant (that paper introduces a new computational model for similarity judgements that is fit to human data, but does not comment on training models to perform tasks vs. fitting to human data).

      - Line 387-388: "VAMs may learn richer representations". This is a bit of a philosophical point, but I'll go ahead and mention it. The standard VAM does not necessarily learn "richer" feature representations. Rather, you are asking the VAM and task-optimized models to do different things. As a result, they learn different representations. "Better" or "richer" is in the eye of the beholder. In one view, you could view the VAM performance as sub-par since it exhibits strange artifacts (congruency effects) and the expansion of dimensionality in the VAM representations is merely a side-effect of poor performance. I'm not advocating this view, just playing devil's advocate and suggesting a more nuanced discussion of the difference between the VAM and task-optimized models.

      We agree—this is a great point. We have changed this statement to read “the VAMs may learn more complex [rather than richer] representations of the stimuli”.

      - Lines 567-570: Here you discuss how the LBA backend of the VAM can't account for shrinking spotlight-like RT effects but that fitting models to different RT quantiles helps overcome this. I find this to be one of the weakest points of the paper (the whole process of fitting RT quantiles separately to begin with). This is just a limitation of the RT component of the model. This is a great paper but this is just a limitation inherent in the model. I don't see a need to qualify this limitation and think it would be better to just point out that this is a limitation of the LBA itself (be more clear that it is the LBA that is the limiting factor here) and that this leaves room for future research. From your last sentence of this paragraph, I agree that recurrent CNNs would be interesting. I will note that RNN choice-RT models are out there (though not with CNNs as part of the model).

      We agree and have revised this section of the Discussion accordingly (see our response to Reviewer #2 for more detail). We also removed the analyses of models trained on separate RT quantiles.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      The study presents a potentially valuable approach to genetically modify cells to produce extracellular matrices with altered compositions, termed cell-laid, engineered extracellular matrices (eECM). The evidence supporting the authors' conclusions regarding the utility of eECM for endogenous repair is solid, although there are some disagreements on the chondrogenicity of lyophilized constructs which was viewed as lacking robust evidence for endochondral ossification.

      We thank the reviewers for the assessment of our work. We however strongly contest the lack of evidence for chondrogenicity and endochondral ossification. This is robustly demonstrated and a clear strength of our study.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to modify the characteristics of the extracellular matrix (ECM) produced by immortalized mesenchymal stem cells (MSCs) by employing the CRISPR/Cas9 system to knock out specific genes. Initially, they established VEGF-KO cell lines, demonstrating that these cells retained chondrogenic and angiogenic properties. Additionally, lyophilized cartilage tissues produced by these cells exhibited retained osteogenic properties.

      Subsequently, the authors established RUNX2-KO cell lines, which exhibited reduced COLX expression during chondrogenic differentiation and notably diminished osteogenic properties in vitro. Transplantation of lyophilized cartilage tissues produced by RUNX2-KO cell lines into osteochondral defects in rat knee joints resulted in the regeneration of articular cartilage tissues as well as bone tissues, a phenomenon not observed with tissues derived from parental cells. This suggests that gene-edited MSCs represent a valuable cell source for producing ECM with enhanced quality.

      Strengths:

      The enhanced cartilage regeneration observed with ECM derived from RUNX2-KO cells supports the authors' strategy of creating gene-edited MSCs capable of producing ECM with superior quality. Immortalized cell lines offer a limitless source of off-the-shelf material for tissue regeneration.

      Weaknesses:

      Most of the data align with anticipated outcomes, offering limited novelty to advance scientific understanding. Methodologically, the chondrogenic differentiation properties of immortalized MSCs appeared deficient, evidenced by Safranin-O staining of 3D tissues and histological findings lacking robust evidence for endochondral differentiation. This presents a critical limitation, particularly as authors propose the implantation of cartilage tissues for in vivo experiments. Instead, the bulk of data stemmed from type I collagen scaffold with factors produced by MSCs stimulated by TGFβ.

      We thank the reviewer for the thorough evaluation. We appreciate the highlighted novelty but overall disagree with key points of the provided assessment. The most important one concerns the contested in vitro cartilage formation and endochondral ossification by engineered ECMs, for which we have provided compelling evidence. Of note, the reviewer points to the “osteogenic” properties of our tissues; the wording is incorrect since cells are absent from the final grafts. Here, the term “osteoinductivity” should be employed, in line with the model of ectopic ossification used to demonstrate de novo bone formation.

      In the revised version, the authors presented Safranin-O staining results of pellets prior to lyophilization. The inset of figures showing entire pellets revealed that Safranin-O-positive areas were limited, suggesting that cells in the negative regions had not differentiated into chondrocytes. In Figure 3F, DAPI staining showed devitalized cells in the outer layer but was negative in the central part, indicating the absence of cells in these areas and incomplete differentiation induction.

      We strongly disagree with the reviewer on the lack of demonstrated chondrogenicity. We have provided evidence of Safranin-O positivity, GAGs quantification, as well as collagen type 2 and collagen type X stainings (also quantified). Frankly, those are gold standard assays in the field and we do not understand the reviewer's point of view. We however agree that our grafts are not entirely composed of cartilage matrix. There are areas where cartilage is absent, in particular in the core of the tissues. This is expected from in vitro engineered cartilage pellets, even from primary BM-MSC donors. By selecting primary donors it is possible to obtain superior cartilage formation. Our MSOD-B cells remain, to the best of our knowledge, the only human line capable of in vitro chondrogenesis, even if considered moderate.

      We agree with the absence of cells in the core area of our tissues, as correctly pointed out by the reviewer. This has been reported in other studies whereby the lack of media diffusion can lead to necrotic core formation.

      The rationale for establishing VEGF-KO cell lines remains unclear, and the authors' explanation in the revised manuscript is still equivocal. While they mention that VEGF is a late marker for endochondral ossification, the data in Figures 1D and 1E clearly show that VEGF-KO affects the early phase of endochondral ossification.

      We feel that the rationale for a VEGF-KO is sufficiently conveyed. In our study, VEGF-KO affects GAGs content in the tissue, but not the efficiency of ossification.

      Insufficient depth was given to elucidate the disparity in osteogenic properties between those observed in ectopic bone formation and those observed in transplantation into osteochondral defects.

      We here agree with the reviewer on the limited depth of our osteochondral assessment. However, this was performed as a proof-of-concept and we clearly conveyed both limitations and need of a follow-up study to demonstrate the repair efficacy of our tissue in such defect context.

      In the ectopic bone formation study, most of the collagenous matrix observed at 2 weeks was resorbed by 6 weeks, with only a small amount contributing to bone formation in MSOD-B cells (Figs. 2I and 4C). This finding does not align with the micro-CT data presented in Figures 2H and 4B. For the micro-CT experiments, it would be more appropriate to use a standard window for bone and present the data accordingly.

      Stainings report the deposition of collagens and can be misleading because they do not exclusively indicate frank bone formation. This is the reason why we provided microCT data, offering a quantitative assessment of the full grafts and more reliably evaluating mineralized/bone tissue. We feel that our results matched our conclusions.

      While the regeneration of articular cartilage in RUNX2-KO ECM presents intriguing results, the study lacked an exploration into underlying mechanisms, such as histological analyses at earlier time points.

      We do agree with the reviewer regarding this limitation. In addition to mechanisms and early timepoints, we are also interested in longer in vivo evaluation. This represents a significant amount of work which is beyond the scope of our present manuscript.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors have started off using an immortalized human cell line and then gene edited it to decrease the levels of VEGF1 (in order to influence vascularization), and the levels of Runx2 (to decrease osteogenesis). They first transplanted these cells with a collagen scaffold. The modified cells showed a decrease in vascularization when VEGF1 was decreased, and suggested an increase in cartilage formation.

      In another study, matrix generated by these cells subsequently remodeled into a bone marrow organ. When RUNX2 was decreased, the cells did not mineralize in vitro, and their matrices expressed types I and II collagen but not type X collagen in vitro, in comparison with unedited cells. In vivo, the author claims that remodeling of the matrices into bone was somewhat inhibited. Lastly, they utilized matrices generated by RUNX2-edited cells to regenerate chondro-osteal defects. They suggest that the edited cells regenerated cartilage in comparison with unedited cells.

      Strengths:

      - The notion of inducing changes in the ECM by genetically editing the cells is a novel one, as it has long been thought that ECM composition influences cell activity.

      - If successful, it may be possible to make off-the-shelf ECMs to carry out different types of tissue repair.

      Weaknesses:

      - The authors have not demonstrated robust cartilage formation (quantitation would be useful).

      - Measuring total GAG content does not prove the presence of cartilage

      - There are numerous overstatements about forming and implanting cartilage.

      - Although it is implied, RUNX2 deletion did not improve cartilage formation by the modified cells.

      - In the control line, MSOD-B, there was variability in the amount of safranin O positive material in various histological panels in the figures; more quantitation is needed.

      - In the in vivo articular defect experiments, an untreated injured joint is needed as a negative control.

      - Statements about bone generation are often not reflective of the microCT data presented.

      - The discussion over-interprets the results.

      We thank the reviewer for the further assessment of our work. We respectfully disagree with most of the provided statements. The chondrogenicity of our graft is robustly demonstrated using multiple readouts, including quantitative ones. Beyond GAGs, we provided clear Safranin-O stainings, as well as collagen type 2 and X stainings indicating the presence of hypertrophic cartilage matrix. Those are the gold standards in the field and we thus do not understand the reviewer's scepticism. We do agree that our grafts are not fully composed of cartilage matrix, with areas (in the core) deprived of cartilage. This does not impact the core findings of our study and its conclusions, and we strongly feel our statements about forming in vitro cartilage fully stand.

      We do not claim in the manuscript an increased cartilage formation following RUNX2 deletion. We report in vitro an impaired hypertrophy (collagen type X) and maintenance of collagen type 2 and GAGs content.

      We are confident in our data regarding de novo bone formation by priming endochondral ossification, confirmed both by stainings and microCT. We feel that our claims are well-supported.


      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors aimed to modify the characteristics of the extracellular matrix (ECM) produced by immortalized mesenchymal stem cells (MSCs) by employing the CRISPR/Cas9 system to knock out specific genes. Initially, they established VEGF-KO cell lines, demonstrating that these cells retained chondrogenic and angiogenic properties. Additionally, lyophilized cartilage tissues produced by these cells exhibited retained osteogenic properties.

      Subsequently, the authors established RUNX2-KO cell lines, which exhibited reduced COLX expression during chondrogenic differentiation and notably diminished osteogenic properties in vitro. Transplantation of lyophilized cartilage tissues produced by RUNX2-KO cell lines into osteochondral defects in rat knee joints resulted in the regeneration of articular cartilage tissues as well as bone tissues, a phenomenon not observed with tissues derived from parental cells. This suggests that gene-edited MSCs represent a valuable cell source for producing ECM with enhanced quality.

      Strengths: 

      The enhanced cartilage regeneration observed with ECM derived from RUNX2-KO cells supports the authors' strategy of creating gene-edited MSCs capable of producing ECM with superior quality. Immortalized cell lines offer a limitless source of off-the-shelf material for tissue regeneration. 

      We thank the reviewer for the interest in our work. We however want to clarify that the present manuscript does not report the generation of ECM with “superior quality”, but rather of modulated composition and thus function.  

      Weaknesses: 

      Most data align with anticipated outcomes, offering limited novelty to advance scientific understanding. Methodologically, the chondrogenic differentiation properties of immortalized MSCs appeared deficient, evidenced by Safranin-O staining of 3D tissues and histological findings lacking robust evidence for endochondral differentiation. This presents a critical limitation, particularly as authors propose the implantation of cartilage tissues for in vivo experiments. Instead, the bulk of data stemmed from type I collagen scaffold with factors produced by MSCs stimulated by TGFβ. 

      The chondrogenic differentiation of our MSOD-B line and its capacity to undergo endochondral ossification have been robustly demonstrated in previous studies (Pigeot et al., Advanced Materials 2021 and Grigoryan et al., Science Translational Medicine 2022). In the present manuscript, we thus compare the chondrogenic capacity of the newly established VEGF-KO and RUNX2-KO lines to that of MSOD-B cells. We demonstrate by qualitative (Safranin-O staining, Collagen type 2 and Collagen type X immuno-stainings) and quantitative (glycosaminoglycans assay) assays that the generated tissues consist of cartilage grafts of similar quality to the MSOD-B counterpart. Of note, the safranin-O stainings were performed on lyophilized tissues, which can alter the staining quality/intensity. We now provide additional stainings of generated tissues pre-lyophilization. This is implemented in Figure 1D and Figure 3D.

      The rationale behind establishing VEGF-KO cell lines remains unclear. What specific outcomes did the authors anticipate from this modification? 

      VEGF is a known master regulator of angiogenesis and a key mediator of endochondral ossification. It has also been extensively used in bone tissue engineering studies as a supplemented factor – primarily in the form of VEGFα – to increase the vascularization and thus outcome of bone formation of engineered grafts (https://www.nature.com/articles/s42003-020-01606-9, https://www.sciencedirect.com/science/article/pii/S8756328216301752). In our study, it was thus identified as a natural candidate to demonstrate the possibility to generate VEGF-KO cartilage and subsequently assess the functional impact on both the angiogenic and osteogenic potential of resulting cartilage tissue. This is now clarified in the manuscript (page 3, paragraph 4).

      Insufficient depth was given to elucidate the disparity in osteogenic properties between those observed in ectopic bone formation and those observed in transplantation into osteochondral defects. While the regeneration of articular cartilage in RUNX2-KO ECM presents intriguing results, the study lacked an exploration into underlying mechanisms, such as histological analyses at earlier time points. 

      Using RUNX2-KO ECM, we aimed at demonstrating the impact on cartilage remodeling and bone formation. This was performed ectopically but also in the rat osteochondral defect as a regenerative set-up of higher clinical relevance. We agree with the reviewer that additional experimental groups and time-points (not only earlier but also longer ones) would offer a better mechanistic understanding of the ECM contribution to the joint repair. However, as stated in our manuscript this is a proof-of-concept study that successfully demonstrated the influence of the cartilage ECM modification on the in vivo skeletal regeneration. A follow-up study would need to be performed to complement existing evidence and strengthen the relevance of our approach for cartilage repair. This is now further emphasized in the discussion (page 11, paragraph 3).  

      Reviewer #2 (Public Review): 

      The manuscript submitted by Sujeethkumar et al. describes an alternative approach to skeletal tissue repair using extracellular matrix (ECM) deposited by genetically modified mesenchymal stromal/stem cells. Here, they generate loss-of-function mutations in VEGF or RUNX2 in a BMP2-overexpressing MSC line and define the differences in the resulting tissue-engineered constructs following seeding onto a type I collagen matrix in vitro, and following lyophilization and subcutaneous and orthotopic implantation into mice and rats. Some strengths of this manuscript are the establishment of a platform by which modifications in cell-derived ECM can be evaluated both in vitro and in vivo, the demonstration that genetic modification of cells results in complexity of in vitro cell-derived ECM that elicits quantifiable results, and the admirable goal to improve endogenous cartilage repair. However, I recommend the authors clarify their conclusions and add more information regarding reproducibility, which was one limitation of primary-cell-derived ECMs.

      We thank the reviewer for the positive evaluation of our work.  

      Overcoming the limitations of native/autologous/allogeneic ECMs such as complete decellularization and reduction of batch-to-batch variability was not specifically addressed in the data provided herein. For the maintenance of ECM organization and complexity following lyophilization, evidence of complete decellularization was not addressed, but could be easily evaluated using polarized light microscopy and quantification of human DNA for example in constructs pre and post-lyophilization. 

      We appreciate the reviewer's comments and acknowledge the lack of information in the first version of our manuscript. In line with our previous study (Pigeot et al., Advanced Materials 2021), the ectopic evaluation of our cartilage pellets was strictly done with lyophilized tissues using immunocompromised animals. Lyophilized tissues are thus considered devitalized, and not decellularized. In contrast, the osteochondral defect experiment was performed with decellularized tissues so that the grafts could be implanted in the immunocompetent rat model. This is now specified consistently throughout the manuscript. The decellularization process is also now described in the methods section (page 14, paragraph 2). We also provide quantifications of GAGs and DNA from tissues pre- and post-decellularization (Supplementary figure 6A and 6B), described in the results section of the manuscript (page 9, paragraph 1). The decellularization step led to 97-98% DNA removal.

      Importantly, we do not claim full maintenance of ECM integrity following lyophilization or decellularization. This is now clarified in the discussion (page 12, paragraph 2). However, we report their capacity to instruct skeletal regeneration in multiple contexts despite extensive processing.

      It would be ideal to see minimization of batch-to-batch variability using this approach, as mitigation of using a sole cell line is likely not sufficient (considering that the sole cell line-derived Matrigel does exhibit batch-to-batch and manufacturer-to-manufacturer variability). I recommend adding details regarding experimental design and outcomes not initially considered. Inter- and intraexperimental reproducibility was not adequately addressed. The size of in vitro-derived cartilage pellets was not quantified, and it is not clear that more than one independent 'differentiation' was performed from each gene-edited MSC line to generate in vitro replicates and constructs that were implanted in vivo. 

      We thank the Reviewer for the comment on variability and reproducibility. Using a cell line does confer higher robustness but indeed does not guarantee unlimited consistency of batch production. We now temper our claims in the discussion and mention the need to regularly recharacterize cell line properties across passages (page 12, paragraph 2). Using our edited lines, we have generated multiple batches of cartilage grafts for their in vitro characterization or in vivo performance assessment. We have now compiled batch variations of GAG content and pellet volume, provided as Supplementary figure 5. This revealed that batches are indeed not identical (nor is each pellet), but the production remains consistent.

      The use of descriptive language in describing conclusions may mislead the reader and should be modified accordingly throughout the manuscript. For example, although this reviewer agrees with the comparative statements made by the authors regarding parental and gene-edited MSC lines, non-quantifiable terms such as 'frank' 'superior' (example, line 242) are inappropriate and should rather be discussed in terms of significance. Another example is 'rich-collagenous matrix,' which was not substantiated by uniform immunostaining for type II collagen (line 189). 

      We thank the Reviewer for the constructive suggestions. We have revised the language accordingly throughout the manuscript. 

      I have similar recommendations regarding conclusive statements from the rat implantation model, which was appropriately used for the purpose of evaluating the response of native skeletal cells to the different cell-derived ECMs. Interpretations of these results should be described with more accuracy. For example, increased TRAP staining does not indicate reduced active bone formation (line 237). Many would not conclude that GAGs were retained in the RUNX2-KO line graft subchondral region based on the histology. Quantification of % chondral regeneration using histology is not accurate as it is greatly influenced by the location in the defect from which the section was taken. Chondral regeneration is usually semi-quantified from gross observations of the cartilage surface immediately following excision. The statements regarding integration (example line 290) are not founded by histological evidence, which should show high magnification of the periphery of the graft adjacent to the native tissue. 

      We have revised our language regarding the TRAP staining description (page 9, paragraph 2). We also agree with the reviewer on the semi-quantitative nature of our methodology, which we transparently disclose both in the main text (page 9, paragraph 3) and the methods section (page 18, paragraph 2). The sectioning location does influence the analysis; to limit this effect, we performed the assessment at different depths (top, middle, and bottom for each sample). This is now described in our methods section (page 18, paragraph 3). On tissue integration, we now provide higher magnification images of the implant/host tissue area (Figure 5F).

      Reviewer #3 (Public Review): 

      Summary: 

      In this study, the authors have started off using an immortalized human cell line and then gene-edited it to decrease the levels of VEGF1 (in order to influence vascularization), and the levels of Runx2 (to decrease chondro/osteogenesis). They first transplanted these cells with a collagen scaffold. The modified cells showed a decrease in vascularization when VEGF1 was decreased, and suggested an increase in cartilage formation.

      In another study, the matrix generated by these cells was subsequently remodeled into a bone marrow organ. When RUNX2 was decreased, the cells did not mineralize in vitro, and their matrices expressed types I and II collagen but not type X collagen in vitro, in comparison with unedited cells. In vivo, the author claims that remodeling of the matrices into bone was somewhat inhibited. Lastly, they utilized matrices generated by RUNX2 edited cells to regenerate chondro-osteal defects. They suggest that the edited cells regenerated cartilage in comparison with unedited cells. 

      Strengths: 

      - The notion that inducing changes in the ECM by genetically editing the cells is a novel one, as it has long been thought that ECM composition influences cell activity. 

      - If successful, it may be possible to make off-the-shelf ECMs to carry out different types of tissue repair.

      We thank the Reviewer for the critical evaluation of our work and the highlighted novelty of it.  

      Weaknesses: 

      - The authors have not generated histologically identifiable cartilage or bone in their transplants of the cells with a type I scaffold. 

      The chondrogenic differentiation of our MSOD-B line and its capacity to undergo endochondral ossification have been robustly demonstrated in previous studies (Pigeot et al., Advanced Materials 2021 and Grigoryan et al., Science Translational Medicine 2022). In the present manuscript, we thus compare the chondrogenic capacity of the newly established VEGF-KO and RUNX2-KO lines to that of MSOD-B. We demonstrate by qualitative (Safranin-O staining, Collagen type II and Collagen type X immunostainings) and quantitative (glycosaminoglycan assay) assays that the generated tissues consist of cartilage of similar quality to that of MSOD-B. Of note, the Safranin-O stainings were performed on lyophilized tissues, which can alter the staining quality and intensity. We now provide additional stainings of generated tissues pre-lyophilization. These are included in Figure 1D and Figure 3D.

      On the contested formation of bone in vivo by our ECM grafts, we have provided compelling qualitative evidence via Masson's Trichrome stainings and quantification of mineralized volume by µCT. Both cortical bone and trabecular structures were identified ectopically. These are standard evaluation methods in the field; we would be happy to receive additional suggestions from the Reviewer.

      - In many cases, they did not generate histologically identifiable cartilage with their cell-free-edited scaffold. They did generate small amounts of bone but this is most likely due to BMPs that were synthesized by the cells and trapped in the matrix. 

      We now appreciate that the Reviewer agrees on the successful formation of bone induced by our engineered grafts. We however still respectfully disagree with the “small amount of bone” statement since our MSOD-B and MSOD-B VEGF KO cartilage grafts led to the full generation of a mature ectopic bone organ (that is, also composed of extensive marrow). This has been assessed qualitatively and quantitatively. 

      We agree with the Reviewer on the key role of BMP-2 in the remodeling process into bone and bone marrow, which we have extensively described in our previous publication (Pigeot et al., Advanced Materials 2021). However, the low amount of BMP-2 embedded in the matrix (in the range of dozens of nanograms per tissue) is not sufficient per se to induce ectopic endochondral ossification. It is the combined presence of GAGs in the matrix (and thus of cartilage) that allows successful bone formation.

      - There is a great deal of missing detail in the manuscript. 

      We have incorporated additional methodological details describing the lyophilization/decellularization process of our tissues prior to evaluation (see Material and Methods section). We also have included a description of the MSOD-B line and implemented genetic elements (Supplementary Figure 1A).  

      - The in vivo study is underpowered, the results are not well documented pictorially, and are not convincing. 

      We believe our group size supports our conclusions confirmed by statistical assessment. We have provided additional stainings and images of higher magnifications (Figure 5) for both the ectopic and orthotopic in vivo evaluation.  

      - Given the fact that they have genetically modified cells, they could have done analyses of ECM components to determine what was different between the lines, both at the transcriptome and the protein level. Consequently, the study is purely descriptive and does not provide any mechanistic understanding of what mixture of matrix components and growth factors works best for cartilage or bone. But this presupposes that they actually induced the formation of bona fide cartilage, at least. 

      We thank the Reviewer for the suggestion. However, our study did not aim to determine which ECM graft composition works best for cartilage or bone regeneration. Instead, we propose the exploitation of our cellular tools to interrogate the function of key ECM constituents and their impact on skeletal regeneration. We once more confirm that we generated cartilage grafts, which is now better supported by additional histological assessment before lyophilization.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In a previous work, Prut and colleagues had shown that during reaching, high-frequency stimulation of the cerebellar outputs resulted in reduced reach velocity. Moreover, they showed that the stimulation produced reaches that deviated from a straight line, with the shoulder and elbow movements becoming less coordinated. In this report, they extend their previous work by the addition of modeling results that investigate the relationship between the kinematic changes and torques produced at the joints. The results show that the slowing is not due to reductions in interaction torques alone, as the reductions in velocity occur even for movements that are single joints. More interestingly, the experiment revealed evidence for the decomposition of the reaching movement, as well as an increase in the variance of the trajectory.

      Strengths:

      This is a rare experiment in a non-human primate that assessed the importance of cerebellar input to the motor cortex during reaching.

      Weaknesses:

      My major concerns are described below.

      If I understand the task design correctly, the monkeys did not need to stop their hand at the target. I think this design may be suboptimal for investigating the role of the cerebellum in control of reaching because a number of earlier works have found that the cerebellum's contributions are particularly significant as the movement ends, i.e., stopping at the target. For example, in mice, interposed nucleus neurons tend to be most active near the end of the reach that requires extension, and their activation produces flexion forces during the reach (Becker and Person 2019). Indeed, the inactivation of interposed neurons that project to the thalamus results in overshooting of reaching movements (Low et al. 2018). Recent work has also found that many Purkinje cells show a burst-pause pattern as the reach nears its endpoint, and stimulation of the mossy fibers tends to disrupt endpoint control (Calame et al. 2023). Thus, the fact that the current paper has no data regarding endpoint control of the reach is puzzling to me.

      We appreciate the reviewer’s point that cerebellar contributions can be particularly critical near the endpoint of a reach. In our current task design, monkeys were indeed required to hold at the target briefly—100 ms for Monkeys S and P, and 150 ms for Monkeys C and M—before receiving a reward. However, given the size of the targets and the velocity of movements, it often happened that the monkey didn’t have to stop its movement to obtain a reward. Importantly, we relaxed the task’s requirements (by increasing target size and reducing temporal constraints) to allow monkeys to perform the task under cerebellar block conditions as we found that the strict criteria in these conditions yield a low success rate. This design is suboptimal for studying endpoint accuracy which, as we now appreciate, is an important aspect of cerebellar control. In our revision, we will clarify these aspects of the task design and acknowledge that it is sub-optimal for examining the role of cerebellum in end-point control. Future studies will explicitly address this point more carefully.

      Because stimulation continued after the cursor had crossed the target, it is interesting to ask whether this disruption had any effects on the movements that were task-irrelevant. The reason for asking this is because we have found that whereas during task-relevant eye or tongue movements the Purkinje cells are strongly modulated, the modulations are much more muted when similar movements are performed but are task-irrelevant (Pi et al., PNAS 2024; Hage et al. Biorxiv 2024). Thus, it is interesting to ask whether the effects of stimulation were global and affected all movements, or were the effects primarily concerned with the task-relevant movements.

      This is a very interesting suggestion. Although our main analysis focused on target-directed reaching movements, we have the data for the between-trial movements under continuous stimulation (e.g., return to center movements). In our revised supplementary material, we will examine the effect of cerebellar block on endpoint velocities in inter-trial movements versus task-related movements.

      If the schematic in Figure 1 is accurate, it is difficult for me to see how any of the reaching movements can be termed single joint. In the paper, T1 is labeled as a single joint, and T2-T4 are labeled as dual-joint. The authors should provide data to justify this.

      The reviewer is right: movements to all targets engage both the shoulder and the elbow, but the degree of single-joint participation varied in a target-specific manner. In the manuscript, we used the term “single-joint” to indicate a target direction in which one joint remains stationary, resulting in minimal coupling torque at the adjacent joint. Specifically, for Targets 1 and 5 in our experiments, the net torque (and thus acceleration) at the elbow was negligible, and hence the shoulder experienced correspondingly low coupling torque (as illustrated in Figure 3c of our manuscript). To avoid confusion, we will use the term ‘predominantly single-joint’ movements in our revised manuscript to indicate targets with low coupling torques. We will also include an additional figure in the revised supplementary material displaying the net torques at the shoulder and elbow, similar to Figures 2c and 3c. Our goal is to demonstrate that movements to Targets 1 and 5 are characterized by predominantly one-joint engagement (i.e., the elbow is stationary with low net torque) and low coupling torques, rather than implying a purely isolated, single-joint motion.
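
      To make the dependence on elbow motion concrete, the textbook rigid-body equations for a two-link planar arm moving in the horizontal plane are sketched below. This is an illustrative assumption, not necessarily the inverse-dynamics model used in the manuscript. The symbols are introduced here for illustration only: \theta_s is the shoulder angle, \theta_e the elbow angle measured relative to the upper arm, \tau_s and \tau_e the net joint torques, m_i the segment masses, I_i the segment moments of inertia about their centers of mass, l_1 the upper-arm length, and r_i the distance from each segment's proximal joint to its center of mass (i = 1 for the upper arm, i = 2 for the forearm).

```latex
\begin{aligned}
\tau_s &= M_{11}\,\ddot{\theta}_s + M_{12}\,\ddot{\theta}_e
          - h\,\bigl(2\,\dot{\theta}_s\dot{\theta}_e + \dot{\theta}_e^{2}\bigr),\\
\tau_e &= M_{12}\,\ddot{\theta}_s + M_{22}\,\ddot{\theta}_e + h\,\dot{\theta}_s^{2},\\
M_{11} &= I_1 + I_2 + m_1 r_1^{2}
          + m_2\bigl(l_1^{2} + r_2^{2} + 2\,l_1 r_2\cos\theta_e\bigr),\\
M_{12} &= I_2 + m_2\bigl(r_2^{2} + l_1 r_2\cos\theta_e\bigr),\\
M_{22} &= I_2 + m_2 r_2^{2},\qquad h = m_2\,l_1 r_2\sin\theta_e .
\end{aligned}
```

      In the shoulder equation, every term other than M_{11}\ddot{\theta}_s depends on elbow velocity or acceleration. Under these assumptions, those terms vanish when the elbow is stationary, which is consistent with the low coupling torque at the shoulder described for Targets 1 and 5.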

      Because at least part of this work was previously analyzed and published, information should be provided regarding which data are new.

      We will include a clear statement in the Methods section specifying which components of the dataset and analyses are entirely new. While some of the same animals and stimulation protocol were presented in prior work, the inverse-dynamics modeling, analyses of progressive movement changes across trials under stimulation and invariance of motor noise to movement velocity are newly reported in this manuscript.

      Reviewer #2 (Public review):

      This manuscript asks an interesting and important question: what part of 'cerebellar' motor dysfunction is an acute control problem vs a compensatory strategy to the acute control issue? The authors use a cerebellar 'blockade' protocol, consisting of high-frequency stimuli applied to the cerebellar peduncle, which is thought to interfere with outflow signals. This protocol was applied in monkeys performing center-out reaching movements and has been published from this laboratory in several preceding studies. I found the take-home message broadly convincing and clarifying: cerebellar block acutely reduces muscle activation, particularly in movements that involve multiple joints and therefore invoke interaction torques, and movements progressively slow down to in effect 'compensate' for these acute tone deficits. The manuscript was generally well written, and the data were clear, convincing, and novel. My comments below highlight suggestions to improve clarity and sharpen some arguments.

      Primary comments:

      (1) Torque vs. tone: Is it known whether this type of cerebellar blockade is reducing muscle tone or inducing any type of acute co-contraction that could influence limb velocity through mechanisms different than 'atonia'? If so, the authors should discuss this information in the discussion section starting around line 336, and clarify that this motivates (if it does) the focus on 'torques' rather than muscle activation. Relatedly, besides the fact that there are joints involved, is there a reason there is so much emphasis on torque per se? If the muscle is deprived of sufficient drive, it would seem that it would be more straightforward to conceptualize the deficit as one of insufficient timed drive to a set of muscles than joint force. Some text better contextualizing the choices made here would be sufficient to address this concern. I found statements like those in the introduction "hand velocity was low initially, reflecting a primary muscle torque deficit" to be lacking in substance. Either that statement is self-evident or the alternative was not made clear. Finally, emphasize that it is a loss of self-generated torque at the shoulder that accounts for the velocity deficits. At times the phrasing makes it seem that there is a loss of some kind of passive torque.

      We appreciate the reviewer’s emphasis on distinguishing reduced muscle tone and altered co-contraction patterns as possible explanations for decreased limb velocity. Our focus on torques arises from previous studies suggesting that the core deficit in cerebellar ataxia is impaired prediction of coupling torques. This point will be added in the discussion section of our revised manuscript where we will explain why we prioritize muscle torques and how muscle-level activation collectively contributes to net joint torques. Also, we will underscore that the observed velocity deficits primarily reflect a reduction of self-generated torque at the shoulder (whether acute or adaptive), rather than any reduction in passive torques.

      (2) Please clarify some of the experimental metrics: Ln 94 RESULTS. The success rate is used as a primary behavioral readout, but what constitutes success is not clearly defined in the methods. In addition to providing a clear definition in the methods section, it would also be helpful for the authors to provide a brief list of criteria used to determine a 'successful' movement in the results section before the behavioral consequences of stimulation are described. In particular, the time and positional error requirements should be clear.

      Successful trials were trials in which monkeys didn’t leave the center position before the go signal and reached the peripheral target within specific time criteria. These values varied between monkeys. We will include detailed definitions of our success criteria in the revised methods section of our manuscript. Specifically, we will update our methods section to include (i) the timing criteria of each phase of the trials and (ii) the size of the peripheral targets indicating the tolerance for endpoint accuracy.

      (3) Based on the polar plot in Figure 1c, it seemed odd to consider Targets 1-4 outward and 5-8 inward movements, when 1 and 5 are side-to-side. Is there a rationale for this grouping or might results be cleaner by cleanly segregating outward (targets 2-4) and inward (targets 6-8) movements? Indeed, by Figure 3 where interaction torques are measured, this grouping would seem to align with the hypothesis much more cleanly since it is with T2, T3, and T4 where clear coupling torque deficits are seen with cerebellar block.

      We acknowledge the reviewer’s observation regarding Targets 1 and 5 being side-to-side rather than strictly “outward” or “inward.” In the first section of our results, we grouped the targets in this way to emphasize the notably stronger effect of the cerebellar block on targets involving shoulder flexion (‘outward’) as compared to those involving shoulder extension (‘inwards’). For subsequent analyses we focused on the effects of cerebellar block on outward targets where movements were single-joint (Target 1) vs. multi-joint (Targets 2-4). To clarify this aspect, in our revised manuscript we will explain the rationale for grouping T1–T4 as “outward” and T5–T8 as “inward,” including how we defined them.

      (4) I did not follow Figure 3d. Both the figure axis labels and the description in the main text were difficult to follow. Furthermore, the color code per animal made me question whether the linear regression across the entire dataset was valid, or would be better performed within animal, and the regressions summarized across animals. The authors should look again at this section and figure.

      We will revise the figure labels and legend to clarify how each axis is defined. Please note that the data were pooled only after confirming that data from each animal showed a similar trend. Specifically, the correlation coefficients were positive in all 4 monkeys and statistically significant in 3 of them. Moreover, following the reviewer’s feedback, we also performed a partial correlation analysis (which controls for the variability across monkeys) and found a significant correlation (r = 0.33, p < 0.001). These points will be described in the revised manuscript.
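
      As an illustration of what controlling for variability across monkeys can mean, the sketch below computes a partial correlation by removing each animal's mean from both variables before correlating the residuals. The response does not state how the partial correlation was actually computed, so this is only one common approach; the function names `pearson` and `partial_corr_by_group`, and the toy numbers, are hypothetical and are not the study's data.

```python
from collections import defaultdict
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def partial_corr_by_group(x, y, groups):
    """Correlate x and y after removing each group's mean from both.

    Subtracting group means is equivalent to regressing x and y on
    group indicator variables and correlating the residuals.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # sum_x, sum_y, count
    for xi, yi, g in zip(x, y, groups):
        totals[g][0] += xi
        totals[g][1] += yi
        totals[g][2] += 1
    x_res = [xi - totals[g][0] / totals[g][2] for xi, g in zip(x, groups)]
    y_res = [yi - totals[g][1] / totals[g][2] for yi, g in zip(y, groups)]
    return pearson(x_res, y_res)


# Toy example (hypothetical values): within each animal the relation is
# positive, but a between-animal offset makes the pooled correlation negative.
x = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
y = [0.0, 1.0, 2.0, -10.0, -9.0, -8.0]
animals = ["A", "A", "A", "B", "B", "B"]

pooled_r = pearson(x, y)
partial_r = partial_corr_by_group(x, y, animals)
```

      In this toy case the pooled coefficient is negative while the within-animal coefficient is positive, which is the kind of discrepancy the reviewer's question about pooling across animals is meant to rule out.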

      (5) Line 206+ The rationale for examining movement decomposition with a cerebellar block is presented as testing the role of the cerebellum in timing. Yet it is not spelled out what movement decomposition and trajectory variability have to do with motor timing per se.

      The reviewer is right and the relations between timing, decomposition and variability need to be explicitly presented. In our revision, we will explain how decomposed movements may reflect impaired temporal coordination across multiple joints—a critical cerebellar function. We will also clarify how increased variability in joint coordination can result in increased trial-to-trial variability of trajectories.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript, "Disentangling acute motor deficits and adaptive responses evoked by the loss of cerebellar output," Sinha and colleagues aim to identify distinct causes of motor impairments seen when perturbing cerebellar circuits. This goal is an important one, given the diversity of movement-related phenotypes in patients with cerebellar lesions or injuries, which are especially difficult to dissect given the chronic nature of the circuit damage. To address this goal, the authors use high-frequency stimulation (HFS) of the superior cerebellar peduncle in monkeys performing reaching movements. HFS provides an attractive approach for transiently disrupting cerebellar function previously published by this group. First, they found a reduction in hand velocities during reaching, which was more pronounced for outward versus inward movements. By modeling inverse dynamics, they find evidence that shoulder muscle torques are especially affected. Next, the authors examine the temporal evolution of movement phenotypes over successive blocks of HFS trials. Using this analysis, they find that in addition to the acute, specific effects on muscle torques in early HFS trials, there was an additional progressive reduction in velocity during later trials, which they interpret as an adaptive response to the inability to effectively compensate for interaction torques during cerebellar block. Finally, the authors examine movement decomposition and trajectory, finding that even when low-velocity reaches are matched to controls, HFS produces abnormally decomposed movements and higher than expected variability in trajectory.

      Strengths:

      Overall, this work provides important insight into how perturbation of cerebellar circuits can elicit diverse effects on movement across multiple timescales.

      The HFS approach provides temporal resolution and enables analysis that would be hard to perform in the context of chronic lesions or slow pharmacological interventions. Thus, this study describes an important advance over prior methods of circuit disruption, and their approach can be used as a framework for future studies that delve deeper into how additional aspects of sensorimotor control are disrupted (e.g., response to limb perturbations).

      In addition, the authors use well-designed behavioral approaches and analysis methods to distinguish immediate from longer-term adaptive effects of HFS on behavior. Moreover, inverse dynamics modeling provides important insight into how movements with different kinematics and muscle dynamics might be differentially disrupted by cerebellar perturbation.

      Weaknesses:

      The argument that there are acute and adaptive effects to perturbing cerebellar circuits is compelling, but there seems to be a lost opportunity to leverage the fast and reversible nature of the perturbations to further test this idea and strengthen the interpretation. Specifically, the authors could have bolstered this argument by looking at the effects of terminating HFS - one might hypothesize that the acute impacts on muscle torques would quickly return to baseline in the absence of HFS, whereas the longer-term adaptive component would persist in the form of aftereffects during the 'washout' period. As is, the reversible nature of the perturbation seems underutilized in testing the authors' ideas.

      We agree that our approach could more explicitly exploit the rapid reversibility of high-frequency stimulation (HFS) by examining post-stimulation ‘washout’ periods. However, for the present dataset, we ended the session after the set of cerebellar block trials. We plan to study the effect of cerebellar block on immediate post-block washout trials in the future.  

      The analysis showing that there is a gradual reduction in velocity during what the authors call an adaptive phase is convincing. That said, the argument is made that this is due to difficulty in compensating for interaction torques. Even if the inward targets (i.e., targets 6-8) do not show a deficit during the acute phase, these targets still have significant interaction torques (Figure 3c). Given the interpretation of the data as presented, it is not clear why disruption of movement during the adaptive phase would not be seen for these targets as well since they also have large interaction torques. Moreover, it is difficult to delve into this issue in more detail, as the analyses in Figures 4 and 5 omit the inward targets.

      The reviewer is right that movements to Targets 6–8 (inward) were seemingly unaffected despite also involving significant interaction torques. We attempted to address this issue in the discussion section of version 1 of our manuscript. Specifically, we note that while outward targets (2–4) tend to involve higher coupling torque impulses on average, this alone does not fully explain the differential impact of cerebellar block, as illustrated by discrepancies at the individual target level (e.g., target 7 vs. target 1). We proposed two possible explanations: (1) a bias toward shoulder flexion in the effect of cerebellar block, consistent with earlier studies showing ipsilateral flexor activation or tone changes following stimulation or lesioning of the deep cerebellar nuclei; and (2) a posture-related facilitation of inward (shoulder extension) movements from the central starting position.

      The text in the Introduction and in the prior work developing the HFS approach overstates the selectivity of the perturbations. First, there is an emphasis on signals transmitted to the neocortex. As the authors state several times in the Discussion, there are many subcortical targets of the cerebellar nuclei as well, and thus it is difficult to disentangle target-specific behavioral effects using this approach. Second, the superior cerebellar peduncle contains both cerebellar outputs and inputs (e.g., spinocerebellar). Therefore, the selectivity in perturbing cerebellar output feels overstated. Readers would benefit from a more agnostic claim that HFS affects cerebellar communication with the rest of the nervous system, which would not affect the major findings of the study.

      The reviewer is right that the superior cerebellar peduncle carries both descending and ascending fibers, and that cerebellar nuclei project to subcortical as well as cortical targets. However, it is also important to note that in primates the cerebello-thalamo-cortical (CTC) pathway greatly expanded (at the expense of the cerebello-rubro-spinal tract) in mediating cerebellar control of voluntary movements (Horne and Butler, 1995). The cerebello-subcortical pathways lost their importance over the course of evolution (Nathan and Smith, 1982, Padel et al., 1981, ten Donkelaar, 1988). In our previous study we found that the ascending spinocerebellar axons which enter the cerebellum through the SCP are weakly task-related and the descending system is quite small (Cohen et al., 2017). However, we cannot rule out an effect of HFS mediated in part through other systems. In the revised introduction section, we will clarify this point and use more careful language about the scope of our stimulation, emphasizing that HFS disrupts cerebellar communication broadly, rather than solely the cerebello-thalamo-cortical pathway.

      The text implies that increased movement decomposition and variability must be due to noise. However, this assumption is not tested. It is possible that the impairments observed are caused by disrupted commands, independent of whether these command signals are noisy. In other words, commands could be low noise but still faulty.

      We recognize the reviewer’s concern about linking movement decomposition and trial-to-trial trajectory variability with motor noise. As presented in our discussion section, we interpret these motor abnormalities as a form of motor noise in the sense that they are generated by faulty motor commands. We base our interpretation on previous research showing that the cerebellum aids in the state estimation of the limb and the subsequent generation of accurate feedforward commands. Therefore, disruption of the cerebellar output may lead to faulty motor commands resulting in the observed asynchronous joint activations (i.e., movement decomposition) and unpredictable trajectories (i.e., increased trial-to-trial variability). Both observed deficits resemble increased motor noise.

      Throughout the text, the use of the term 'feedforward control' seems unnecessary. To dig into the feedforward component of the deficit, the authors could quantify the trajectory errors only at the earliest time points (e.g., in Figure 5d), but even with this analysis, it is difficult to disentangle feedforward- and feedback-mediated effects when deficits are seen throughout the reach. While outside the scope of this study, it would be interesting to explore how feedback responses to limb perturbation are affected in control versus HFS conditions. However, as is, these questions are not explored, and the claim of impaired feedforward control feels overstated.

      We agree that to strictly focus on feedforward control, we could have examined the measured variables in the first 50-100 ms of the movement which has been shown to be unaffected by feedback responses (Pruszynski et al. 2008, Todorov and Jordan 2002, Pruszynski and Scott 2012, Crevecoeur et al. 2013). However, in our task the amplitude of movements made by our monkeys was small and therefore the response measures we used were too small in the first 50-100 ms for a robust estimation. Also, fixing a time window led to an unfair comparison between control and cerebellar block trials, in which velocity was significantly reduced and therefore movement time was longer. Therefore, we used the peak velocity, torque-impulse at the peak velocity and maximum deviation of the hand trajectory as response measures. We will acknowledge this point in the discussion section of our revised manuscript. We will also tone down references to feedforward control throughout the text of our revised manuscript as suggested by the reviewer.

      The terminology 'single-joint' movement is a bit confusing. At a minimum, it would be nice to show kinematics during different target reaches to demonstrate that certain targets are indeed single joint movements. More of an issue, however, is that it seems like these are not actually 'single-joint' movements. For example, Figure 2c shows that target 1 exhibits high elbow and shoulder torques, but in the text, T1 is described as a 'single-joint' reach (e.g. lines 155-156). The point that I think the authors are making is that these targets have low interaction torques. If that is the case, the terminology should be changed or clarified to avoid confusion.

      Indeed, as reviewer #1 also noted, movements to targets 1 and 5 are not purely single-joint but rather have relatively low coupling torques. Our intention in using the term “single-joint” was to indicate a target direction in which one joint remains stationary, resulting in minimal coupling torque at the adjacent joint. Specifically, for targets 1 and 5 in our experiments, the net torque (and thus acceleration) at the elbow was negligible, and hence the shoulder experienced correspondingly low coupling torque (as illustrated in Figure 3c of our manuscript). To avoid confusion, we will use the term ‘predominantly single-joint’ movements in our revised manuscript to indicate targets with low coupling torques. We will also include an additional figure in the revised supplementary material displaying the net torques at the shoulder and elbow, similar to Figures 2c and 3c. Our goal is to demonstrate that movements to targets 1 and 5 are characterized by predominantly one-joint engagement (i.e., the elbow is stationary with low net torque) and low coupling torques, rather than implying a purely isolated, single-joint motion.

      The labels in Figure 3d are confusing and could use more explanation in the figure legend.

      In Figure 3d, it is stated that data from all monkeys is pooled. However, if there is a systematic bias between animals, this could generate spurious correlations. Were correlations also calculated for each animal separately to confirm the same trend between velocity and coupling torques holds for each animal?

      We will revise the figure legend and main-text explanation for Figure 3d. Please note that the data were pooled only after confirming that data from each animal showed a similar trend. Specifically, the correlation coefficients were positive, but significant for only 3 out of the 4 monkeys. Moreover, following the reviewers’ feedback, we also performed a partial correlation analysis (which controls for the variability across monkeys) and found a significant correlation (r = 0.33, p < 0.001). These points will be described in the revised manuscript.
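
      As an illustration only (this is not the analysis code used in the study), one common way to obtain a correlation that controls for animal identity is to subtract each animal's mean from both variables and then correlate the centered values. The sketch below assumes hypothetical, synthetic numbers and hypothetical function names; it shows how a between-animal offset can produce a strong pooled correlation that disappears or reverses once the offset is removed. A significance test would additionally need degrees of freedom reduced by the number of animals.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def center_within_groups(values, groups):
    """Subtract each group's own mean from the values belonging to it."""
    group_means = {
        g: mean(v for v, h in zip(values, groups) if h == g)
        for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


def partial_corr_by_group(x, y, groups):
    """Correlation of x and y after removing group (e.g. animal) offsets."""
    return pearson(center_within_groups(x, groups),
                   center_within_groups(y, groups))


# Synthetic data (assumption): two animals with different baselines.
animal = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 11, 12, 13]
y = [3, 2, 1, 13, 12, 11]

pooled_r = pearson(x, y)                          # driven by the offset
within_r = partial_corr_by_group(x, y, animal)    # offset removed
```

      In this toy example the pooled coefficient is strongly positive while the within-animal coefficient is negative, which is the kind of spurious pooled correlation the reviewer describes.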

      In Table S1, it would be nice to see target-specific success rates. The data would suggest that targets with the highest interaction torques will have the largest reduction in success rates, especially during later HFS trials. Is this the case?

      We will provide a breakdown of the success rates as a function of target. However, one should note that success/failure may depend on several factors beyond impaired limb dynamics. In a previous study (Nashef et al. 2019) we identified several causes of failure, such as (i) not entering the central target in time, (ii) moving out too early from the peripheral target, (iii) a reaction time longer than permitted, or (iv) premature exit from the central target before permitted.

    1. Author response:

      We thank all reviewers for the highly detailed review and the time and effort which has been invested in this review. We have read their perspectives, questions and suggested improvements with great interest. We have reflected on the public review in detail and have made the first provisional responses which are outlined below. First, we would like to respond to four main issues pointed out by the editor and reviewers:

      (1) Lack of yield data in the manuscript: There have been yield data collected in most of the sites and years of our study, and these have already been published and cited in our manuscript. In the appendix of our manuscript, we included a table with yield data for the sites and years in which the beetle diversity was studied. These data show that strip cropping does not cause a systematic yield reduction.

      (2) Sampling design clarification: Our paper combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight inconsistencies in how data were collected or processed (e.g. taxonomic level of species identification). We will explain the sampling design and data analysis in more detail to increase clarity and transparency.

      (3) Additional data analysis: In the revised manuscript we will present an analysis of the responses of abundances of the 12 most common ground beetle genera to strip cropping. This will give better insight into the variation in responses among ground beetle taxa.

      (4) Restrict findings to our system: We will nuance our findings further and will focus more strongly on the implications of our data on ground beetle communities, rather than on agrobiodiversity in a broader sense.

      We will further work on improving the manuscript based on the reviewers' feedback in the coming weeks, aiming to submit a revised version of the manuscript at the end of February.

      Detailed response to editor and reviewers:

      Editor Comments:

      (1) You only have analyzed ground beetle diversity, it would be important to add data on crop yields, which certainly must be available (note that in normal intercropping these would likely be enhanced as well).

      Most yield data have been published in three previous papers, which we already cited or will cite (one was not yet published at the time of submission). Our argumentation is based on these studies. We had also already included a table in the appendix that showed the yield data that relates specifically to our locations and years of measurement. The finding that strip cropping does not majorly affect yield is based on these findings. We will consider changing the title of our manuscript to remove the explicit focus on yield.

      (2) Considering the heterogeneous data involving different experiments it is particularly important to describe the sampling design in detail and explain how various hierarchical levels were accounted for in the analysis.

      We agree that some important details of our analysis were not described sufficiently. Reviewer 2 in particular pointed out several relevant points that we did account for in our analyses, but which were not clear from the text in the methods section. We are convinced that our data analyses are robust and that our conclusions are supported by the data. We will revise the methods section to make our approach clearer and more transparent.

      (3) In addition to relative changes in richness and density of ground beetles you should also present the data from which these have been derived. Furthermore, you could also analyze and interpret the response of the different individual taxa to strip cropping.

      With our heterogeneous dataset it was quite complicated to show overall patterns of absolute changes in ground beetle abundance and richness, especially for the field-level analyses. As the sampling design was not always the same and occasionally samples were missing, the number of year series that made up a datapoint differed among locations and years. However, for each comparison of a paired monoculture and strip cropping field, we always made the number of year series equal through rarefaction. That is, the numbers of ground beetle individuals and species are always expressed per 2 to 6 samples. Therefore, we prefer to stick to relative changes, as we are convinced that this gives a fairer representation of our complex dataset.

      We agree with the second point that both the editor and several reviewers pointed out. The indicator species analyses that we used were biased by rare species, and we now omit this analysis. Instead, we will include a GLM analysis of the responses of abundances of the 12 most common ground beetle genera to strip cropping. We chose genera here (and not species) because we could then include all locations and years within the analysis, and in most cases a genus was dominated by a single species (notable exceptions were Amara and Harpalus, which were made up of several species). We will still illustrate these findings in a similar fashion as we did for the indicator species analysis.

      (4) Keep to your findings and don't overstate them but try to better connect them to basic ecological hypotheses potentially explaining them.

      After careful consideration of the important points that the reviewers raised, we decided to nuance our points about biodiversity conservation along two key lines: (1) the extent to which ground beetles can be indicators of wider biodiversity changes; and (2) the fact that our findings are not as straightforwardly positive as our narrative suggests. We still believe that strip cropping contributes positively to carabid communities, and will carefully check the text to avoid overstatements.

      Reviewer 1:

      Summary:

      This study demonstrates that strip cropping enhances the taxonomic diversity of ground beetles across organically-managed crop systems in the Netherlands. In particular, strip cropping supported 15% more ground beetle species and 30% more individuals compared to monocultures.

      Strengths:

      A well-written study with well-analyzed data of a complex design. The data could have been analyzed differently e.g. by not pooling samples, but there are pros and cons for each type of analysis and I am convinced this will not affect the main findings. A strong point is that data were collected for 4 years. This is especially strong as most data on biodiversity in cropping systems are only collected for one or two seasons. Another strong point is that several crops were included.

      We thank reviewer 1 for their kind words and agree with this strength of the paper. The paper combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight inconsistencies in how data were collected or processed (e.g. taxonomic level of species identification).  

      Weaknesses:

      This study focused on the biodiversity of ground beetles and did not examine crop productivity. Therefore, I disagree with the claim that this study demonstrates biodiversity enhancement without compromising yield. The authors should present results on yield or, at the very least, provide a stronger justification for this statement.

      We acknowledge that we indeed did not formally analyze yield in our study, but we have good reason for this. The claim that strip cropping does not compromise yield comes from several extensive studies (Juventia et al., 2024; Ditzler et al., 2023; Carillo-Reche et al., 2023) that were conducted in nearly all the sites and years that we included in our study. We chose not to include formal analyses of productivity for two key reasons: (1) a yield analysis would duplicate already published analyses, and (2) we prefer to focus more on the ecology of ground beetles and the effect of strip cropping on biodiversity, rather than diverging our focus also towards crop productivity. Nevertheless, we have shown the results on yield in Table S6 and refer extensively to the studies that have previously analyzed this data.

      Reviewer 2:

      Summary:

      The authors aimed to investigate the effects of organic strip cropping on carabid richness and density as well as on crop yields. They find on average higher carabid richness and density in strip cropping and organic farming, but not in all cases.

      Strengths:

      Based on highly resolved species-level carabid data, the authors present estimates for many different crop types, some of them rarely studied, at the same time. The authors did a great job investigating different aspects of the assemblages (although some questions remain concerning the analyses) and they present their results in a visually pleasing and intuitive way.

      We appreciate the kind words of reviewer 2 and their acknowledgement of the extensiveness of our dataset. In our opinion, the inclusion of many different crops is indeed a strength, rarely seen in similar studies; and we are happy that the figures are appreciated.

      Weaknesses:

      The authors used data from four different strip cropping experiments and there is no real replication in space as all of these differed in many aspects (different crops, different areas between years, different combinations, design of the strip cropping (orientation and width), sampling effort and sample sizes of beetles (differing more than 35 fold between sites; L 100f); for more differences see L 237ff). The reader gets the impression that the authors stitched data from various places together that were not made to fit together. This may not be a problem per se but it surely limits the strength of the data as results for various crops may only be based on small samples from one or two sites (it is generally unclear how many samples were used for each crop/crop combination).

      The paper indeed combines data from trials conducted at different locations and years. On the one hand this allows an analysis of a comprehensive dataset, but on the other hand in some cases there were slight differences in the experimental design. At the time that we did our research, there were only a handful of farmers that were employing strip cropping within the Netherlands, which greatly reduced the number of fields for our study. Therefore, we worked in the sites that were available and studied as many crops as possible on these sites. Since there was variation in the crops grown in the sites, for some crops we have limited replication. In the revision we will explain this more clearly.

      One of my major concerns is that it is completely unclear where carabids were collected. As some strips were 3m wide, some others were 6m and the monoculture plots large, it can be expected that carabids were collected at different distances from the plot edge. This alone, however, was conclusively shown to affect carabid assemblages dramatically and could easily outweigh the differences shown here if not accounted for in the models (see e.g. Boetzl et al. (2024) or Knapp et al. (2019) among many other studies on within field-distributions of carabids).

      Point well taken and we will present a more detailed description of the sampling design in the methods. Samples were always taken at least 10 meters into the field, and always in the middle of the strip. This would indeed mean that there is a small difference between the 3 m and 6 m wide strips regarding distance from another strip, but this was then only a difference between 1.5 and 3 meters from the edge. Based on our own extensive experience with ground beetle communities, such a difference will not have a large impact on the ground beetles found. The distance from field/plot edges was similar between monocultures and strip cropped fields.

      The authors hint at a related but somewhat different problem in L 137ff - carabid assemblages sampled in strips were sampled in closer proximity to each other than assemblages in monoculture fields which is very likely a problem. The authors did not check whether their results are spatially autocorrelated and this shortcoming is hard to account for as it would have required a much bigger, spatially replicated design in which distances are maintained from the beginning. This limitation needs to be stated more clearly in the manuscript.

      This is a limitation that is hard to avoid in comparisons between strip cropping and monoculture systems, because combining a statistically robust design with sufficient replication and field sizes that are representative of farming practice is often not possible. We will acknowledge this limitation in the revised manuscript. To allow a fair comparison based on a sufficient number of replications, we chose to combine data from several years and locations (despite this not being the ideal experimental design). This approach has the drawback that ground beetle communities are difficult to compare. Therefore, we chose to further investigate two years of data from Wageningen, as the factorial design allowed a fair comparison between monocultures and strip cropping. We analyzed three crop combinations during two years, but we still cannot exclude a potential influence of spatial autocorrelation. We acknowledged this limitation in our original submission, and we will clarify this point further in the revision.

      Similarly, we know that carabid richness and density depend strongly on crop type (see e.g. Toivonen et al. (2022)) which could have biased results if the design is not balanced (this information is missing but it seems to be the case, see e.g. Celeriac in Almere in 2022).

      The sample size ranges between 2 and 6 per combination of cropping design, crop, location and year. We believe that this will allow a meaningful analysis. Moreover, our main focus is the comparison between monoculture and strip cropping, and not the comparison between different crops. Even though we show that crop types have different ground beetle communities, we are most interested in the contrast between ground beetle communities in strip cropping and monoculture systems.

      A more basic problem is that the reader learns neither where traps were located, how missing traps were treated for analyses, how many samples there were per crop or crop combination (in a simple way, not through Table S7 - there has to have been a logic in each of these field trials), nor why there are differences in the number of samples from the same location and year (see Table S7). This information needs to be added to the methods section.

      Point well taken. We will clarify this further in the revised manuscript. As we combined data from several experimental designs that originally had slightly different research questions, this in part caused differences between numbers of rounds or samples per crop, location or year.

      As carabid assemblages undergo rapid phenological changes across the year, assemblages that are collected at different phenological points within and across years cannot easily be compared. The authors would need to standardize for this and make sure that the assemblages they analyze are comparable prior to analyses. Otherwise, I see the possibility that the reported differences might simply be biased by phenology.

      We agree and we dealt with this issue by using year series instead of using individual samples of different rounds. While this approach is not perfect, it allows us to get the best possible impression of the entire ground beetle community across seasons. For our analyses we had the choice to only include data from sampling rounds that were conducted at the same time, or to include all available data. We chose to analyze all data, and made sure that the number of samples between strip cropping and monoculture fields per location, year and crop was always the same by pooling and rarefaction. In this way we have analyzed a complex multi-year, multi-crop and multi-location dataset as well as we could.

      Surrounding landscape structure is known to affect carabid richness and density and could thus also bias observed differences between treatments at the same locations (lower overall richness => lower differences between treatments). Landscape structure has not been taken into account in any way.

      We did not include landscape structure as there are only 4 sites, which does not allow a meaningful analysis of potential effects of landscape structure. Studying how landscape interacts with strip cropping to influence insect biodiversity would require at least, say, 15 to 20 sites, which was not feasible for this study. However, such an analysis may be possible in an ongoing project (CropMix) which includes many farms that work with strip cropping.

      In the statistical analyses, it is unclear whether the authors used estimated marginal means (as they should) - this needs to be clarified.

      In the revised manuscript we will further clarify this point.

      In addition, and as mentioned by Dr. Rasmann in the previous round (comment 1), the manuscript, in its current form, still suffers from simplified generalizations that 'oversell' the impact of the study and should be avoided. The authors restricted their analyses to ground beetles and based their conclusions on a design with many 'heterogeneities' - they should not draw conclusions for farmland biodiversity but stick to their system and report what they found. Although I understand the authors have previously stated that this is 'not practically feasible', the reason for this comment is simply to say that the authors should not oversell their findings.

      In the revised manuscript, we will nuance our findings by explaining that strip cropping is a potentially useful tool to support ground beetle biodiversity in agricultural fields, but the effects on other taxa still need to be explored further.

      Reviewer 3:

      Summary:

      In this paper, the authors made a sincere effort to show the effects of strip cropping, a technique of alternating crops in small strips of several meters wide, on ground beetle diversity. They state that strip cropping can be a useful tool for bending the curve of biodiversity loss in agricultural systems as strip cropping shows a relative increase in species diversity (i.e. abundance and species richness) of the ground beetle communities compared to monocultures. Moreover, strip cropping has the added advantage of not having to compromise on agricultural yields.

      Strengths:

      The article is well written; it has an easily readable tone of voice without too much jargon or overly complicated sentence structure. Moreover, as far as reviewing the models in depth without raw data and R scripts allows, the statistical work done by the authors looks good. They have thought carefully about how to handle heterogeneous, yet spatially and temporally correlated field data. The models applied and the model checks performed are appropriate for the data at hand. Combining RDA and PCA axes together is a nice touch.

      We thank reviewer 3 for their kind words and appreciation for the simple language and analysis that we used.

      Weaknesses:

      The evidence for strip cropping bringing added value for biodiversity is mixed at best. Yes, there is an increase in relative abundance and species richness at the field level, but it is not convincingly shown this difference is robust or can be linked to clear structural and hypothesised advantages of the strip cropping system. The same results could have been used to conclude that there are only very limited signs of real added value of strip cropping compared to monocultures.

      Point well taken. We agree that the effects of strip cropping on carabid beetle communities are subtle and we will nuance the text in the revised version to reflect this.

      There are a number of reasons for this:

      (1) Significant differences disappear at crop level, as the authors themselves clearly acknowledge, meaning that there are no differences between pairs of similar crops in the strip cropping fields and their respective monoculture. This would mean the strips effectively function as "mini-monocultures".

      This is indeed in line with our conclusions. Based on our data and results, the advantages of strip cropping seem to arise mostly because crops with different communities are now in the same field, rather than because mixtures of communities related to different crops occur within the strips. We discussed this in the first paragraph of the discussion in the original submission.

      The significant relative differences at the field level could be an artifact of aggregation instead of structural differences between strip cropping and monocultures; with enough data points things tend to get significant despite large variance. This should have been elaborated further upon by the authors with additional analyses, designed to find out where differences originate and what it tells about the functioning of the system. Or it should have provided ample reason for cautioning in drawing conclusions about the supposed effectiveness of strip cropping based on these findings.

      We believe that this is a misunderstanding of our approach. In the field-level analyses we pooled samples from the same field (i.e. pseudo-replicates were pooled), resulting in a relatively small sample size of 50 samples. We will explain this better in the methods section. Therefore, the statement “with enough data points things tend to get significant” is not applicable here.

      (2) The authors report percentages calculated as relative change of species richness and abundance in strip cropping compared to monocultures after rarefaction. This is in itself correct, however, it can be rather tricky to interpret because the perspective on actual species richness and abundance in the fields and treatments is completely lost; the reported percentages are dimensionless. The authors could have provided the average cumulative number of species and abundance after rarefaction. Also, range and/or standard error would have been useful to provide information as to the scale of differences between treatments. This could provide a new perspective on the magnitude of differences between the two treatments which a dimensionless percentage cannot.

      We agree that this would be the preferred approach if we had had a perfectly balanced dataset. However, this approach is not feasible with our unbalanced design and differences in sampling effort. While we acknowledge the limitation of the interpretation of percentages, it does allow reporting relative changes for each combination of location, year and crop. The number of samples on which the percentages were based was always kept equal (through rarefaction) between the cropping systems (for each combination of location, year and crop), but not among crops, years and locations. The reason for this is that we did not always have an equal number of samples available for both cropping systems, and this approach allowed us to make a better estimation whenever more samples were available. For example, sometimes we had 2 samples from a strip cropped field and 6 from the monoculture; here we would use rarefaction to 2 samples (where we would simply have a better estimation from the monoculture). In other cases, we had 4 samples in both the strip cropped and the monoculture field; here we chose to use rarefaction to 4 samples to get a better estimation altogether. Adding a value for actual richness or abundance to the figures would have distorted these findings, as the variation would be huge (it would represent the number of ground beetle species or individuals per 2 to 6 pitfall samples). Furthermore, the dimension that reviewer 3 describes would thus be “the number of ground beetle species / individuals per 2 to 6 samples”, which is not a very informative unit either. We chose to prioritize better estimates of the difference between cropping systems over a more readily interpretable unit.
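
      To make the idea of equalizing sampling effort concrete, the following is a minimal sketch of sample-based rarefaction followed by a relative change calculation. It is an assumption-laden illustration, not the procedure or code used in the study: the function names, the species labels and the toy samples are hypothetical, and richness is rarefied here by averaging over all subsets of pooled samples, which may differ from the rarefaction method actually applied.

```python
from itertools import combinations
from statistics import mean


def rarefied_richness(samples, n):
    """Mean number of species found when pooling n samples.

    `samples` is a list of sets of species labels; the mean is taken
    over every possible subset of n samples.
    """
    return mean(len(set().union(*combo)) for combo in combinations(samples, n))


def relative_change(strip_samples, mono_samples):
    """Relative difference in rarefied richness, strip cropping vs monoculture.

    Both systems are rarefied to the smaller of the two sample counts,
    so the comparison is based on equal sampling effort.
    """
    n = min(len(strip_samples), len(mono_samples))
    strip = rarefied_richness(strip_samples, n)
    mono = rarefied_richness(mono_samples, n)
    return (strip - mono) / mono


# Toy data (assumption): 2 strip cropping samples, 3 monoculture samples.
strip_field = [{"sp_a", "sp_b", "sp_c"}, {"sp_b", "sp_d"}]
mono_field = [{"sp_a", "sp_b"}, {"sp_a", "sp_c"}, {"sp_b"}]

change = relative_change(strip_field, mono_field)
```

      Here both fields are compared at 2 samples each; the result is a dimensionless proportion, which illustrates why the unit behind it ("species per 2 samples" in this toy case) varies with the number of samples available.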

      (3) The authors appear to not have modelled the abundance of any of the dominant ground beetle species themselves. Therefore it becomes impossible to assess which important species are responsible (if any) for the differences found in activity density between strip cropping and monocultures and the possible life history traits related reasons for the differences, or lack thereof, that are found. A big advantage of using ground beetles is that many life history traits are well studied and these should be used whenever there is reason, as there clearly is in this case. Moreover, it is unclear which species are responsible for the difference in species richness found at the field level. Are these dominant species or singletons? Do the strip cropping fields contain species that are absent in the monoculture fields and are not the cause of random variation or sampling? Unfortunately, the authors do not report on any of these details of the communities that were found, which makes the results much less robust.

      Thank you for raising this point. We have reconsidered our indicator species analysis and found that it is rather sensitive to rare species and insensitive to changes in common species. Therefore, we will replace the indicator species analyses with a GLM analysis for the 12 most common genera of ground beetles in the revised manuscript. This will allow us to go into more depth on specific traits of the genera whose abundances change depending on the cropping system. In the revised manuscript, we will also discuss these common genera in more depth, rather than focusing on rarer species. Furthermore, we will add information on rarity and habitat preference to the table that shows species abundances per location (Table S2).

      (4) In the discussion they conclude that there is only a limited amount of interstrip movement by ground beetles. Otherwise, the results of the crop-level statistical tests would have shown significant deviation from corresponding monocultures. This is a clear indication that the strips function more like mini-monocultures instead of being more than the sum of its parts.

      This is in line with our point in the first paragraph of the discussion and an important message of our manuscript.

      (5) The RDA results show a modelled variable of differences in community composition between strip cropping and monoculture. Percentages of explained variation of the first RDA axis are extremely low, and even then, the effect of location and/or year appears to peek through (Figure S3), even though these are not part of the modelling. Moreover, there is no indication of clustering of strip cropping on the RDA axis, or in fact on the first principal component axis in the larger RDA models. This means the explanatory power of different treatments is also extremely low. The crop-level RDAs show some clustering, but hardly any consistent pattern in either communities of crops or species correlations, indicating that differences between strip cropping and monocultures are very small.

      We agree and we make a similar point in the first paragraph of the discussion.

      Furthermore, there are a number of additional weaknesses in the paper that should be addressed:

      The introduction lacks focus on the issues at hand. Too much space is taken up by facts on insect decline and land sharing vs. land sparing and not enough attention is spent on the scientific discussion underlying the statements made about crop diversification as a restoration strategy. They are simply stated as facts or as hypotheses with many references that are not mentioned or linked to in the text. An explicit link to the results found in the large number of references should be provided.

      We will streamline the introduction by omitting the land sharing vs. land sparing topic and better linking references to our research findings.

      The mechanistic understanding of strip cropping is what is at stake here. Does strip cropping behave similarly to intercropping, a technique that has been proven to be beneficial to biodiversity because of added effects due to increased resource efficiency and greater plant species richness? This should be the main testing point and agenda of strip cropping. Do the biodiversity benefits that have been shown for intercropping also work in strip cropping fields? The ground beetles are one way to test this. Hypotheses should originate from this and should be stated clearly and mechanistically.

      We agree with the reviewer and will state this research direction more clearly in the introduction of the revised manuscript.

      One could question how useful indicator species analysis (ISA) is for a study in which predominantly highly eurytopic species are found. These are by definition uncritical of their habitat. Is there any mechanistic hypothesis underlying a suspected difference to be found in preferences for either strip cropping or monocultures of the species that were expected to be caught? In other words, did the authors have any a priori reasons to suspect differences, or has this been an exploratory exercise from which unexplained significant results should be used with great caution?

      Point well taken. We agree that the indicator species analysis has limitations and have therefore replaced it with a GLM analysis for the 12 most common ground beetle genera.

      However, setting these objections aside there are in fact significant results with strong species associations both with monocultures and strip cropping. Unfortunately, the authors do not dig deeper into the patterns found a posteriori either. Why would some species associate so strongly with strip cropping? Do these species show a pattern of pitfall catches that deviate from other species, in that they are found in a wide range of strips with different crops in one strip cropping field and therefore may benefit from an increased abundance of food or shelter? Also, why would so many species associate with monocultures? Is this in any way logical? Could it be an artifact of the data instead of a meaningful pattern? Unfortunately, the authors do not progress along these lines in the methods and discussion at all.

      We thank reviewer 3 for these valuable perspectives. In the revised manuscript, we will further explore the species/genera that respond to cropping systems and discuss these findings in more detail.

      A second question raised in the introduction is whether the arable fields that form part of this study contain rare species. Unfortunately, the authors do not elaborate further on this. Do they expect rare species to be more prevalent in the strip cropping fields? Why? Has it been shown elsewhere that intercropping provides room for additional rare species?

      The answer is simply no: we did not find more rare species in strip cropping. In the revised manuscript, we will add a column for rarity (according to waarneming.nl) in the table showing abundances of species per location. We only found two rare species; of one we found only a single individual, and the other was more related to the open habitat created by a failed wheat field. We will discuss this more in depth in the discussion.

      Considering the implications the results of this research can have on the wider discussion of bending the curve and the effects of agroecological measures, bold claims should be made with extreme restraint and be based on extensive proof and robust findings. I am not convinced by the evidence provided in this article that the claim made by the authors that strip cropping is a useful tool for bending the curve of biodiversity loss is warranted.

      We believe that strip cropping can be a useful tool because farmers readily adopt it and it can result in modest biodiversity gains without yield loss. However, strip cropping is indeed not a silver bullet (which we also don’t claim). We will nuance the implications of our study in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Goal: Find downstream targets of cmk-1 phosphorylation, identify one that also seems to act in thermosensory habituation, test for genetic interactions between cmk-1 and this gene, and assess where these genes are acting in the thermosensory circuit during thermosensory habituation.

      Methods: Two in vitro analyses of cmk-1 phosphorylation of C. elegans proteins. Thermosensory habituation of cmk-1 and tax-6 mutants and double mutants was assessed by measuring the rate of heat-evoked reversals (reversal probability) of C. elegans before and after 20s ISI repeated heat pulses over 60 minutes.

      Conclusions: cmk-1 and tax-6 act in separate habituation processes, primarily in AFD, that interact complexly, but both serve to habituate the thermosensory reversal response. They found that cmk-1 primarily acts in AFD and tax-6 primarily acts in RIM (and FLP for naïve responses). They also identified hundreds of potential cmk-1 phosphorylation substrates in vitro.

      Strengths:

      The effect size in the genetic data is quite strong and a large number of genetic interaction experiments between cmk-1 and tax-1 demonstrate a complex interaction.

      Thanks a lot for these positive remarks.

      Weaknesses:

      The major concern about this manuscript is the assumption that the process they are observing is habituation. The two previously cited papers using this (or a very similar) protocol, Lia and Glauser 2020 and Jordan and Glauser 2023, both use the word 'adaptation' to describe the observed behavioral decrement. Jordan and Glauser 2023 use the words 'habituation' or 'habituation-like' 10 times; however, they use 'adaptation' over 100 times. It is critical to distinguish habituation from sensory adaptation (or fatigue) in this thermal reversal protocol. These processes are often confused/conflated; however, they are very different. Sensory adaptation is a process that decreases how much the nervous system is activated by a repeated stimulus; therefore it can even occur outside of the nervous system. Habituation is a learning process where the nervous system responds less to a repeated stimulus, despite the nervous system (or at least part of it) still being similarly activated by the stimulus. Habituation is considered an attentional process, while adaptation is due to the fatigue of sensory transduction machinery. Control experiments such as tests for dishabituation (where the application of a different stimulus causes recovery of the decremented response) or rate of spontaneous recovery (more rapid recovery after short inter-stimulus intervals) are required to determine if habituation or sensory adaptation is occurring. These experiments will allow the results to be interpreted with clarity; without them, it isn't clear what biological process is actually being studied.

      Thanks for the comment. As this reviewer points out, “adaptation” and “habituation” are often conflated. Many scientists (maybe not the majority though) use a less stringent definition for the word habituation than the one presented by this reviewer. More particularly, the term habituation is used in human pain research to refer solely to the reduction of response to repeated stimuli, in the absence of a detailed assessment of the more stringent criteria mentioned here. In addition to the practice in pain research, the main reason why we steered toward ‘habituation’ from our previous publication is that it immediately conveys the idea of a response reduction, whereas ‘adaptation’ could in principle be either an up-regulation or a down-regulation of the response (again, based on various definitions). But we agree that using the word “habituation” came at the cost of causing confusion about the exact nature of the process for those considering the stricter definition of the word “habituation”. In the manuscript under revision, we are changing this terminology to “adaptation”. Also following suggestions from Reviewer 2, we are strengthening the description of the protocol in the Results section and clarifying why the adaptation phenomenon is not a ‘thermal damage’ effect or ‘fatigue’ effect in the neuro-muscular circuit controlling reversal.

      While the discrepancy between the in vitro phosphorylation experiments and the in silico predictions was discussed, the substantial discrepancy (over 85% of the substrates in the smaller in vitro dataset were not identified in the larger dataset) between the two different in vitro datasets was not discussed. This is surprising, as these approaches were quite similar, and it may indicate a measure of unreliability in the in vitro datasets (or high false negative rates).

      Thanks for the comment. This is an important aspect which we will more extensively cover in the Discussion section of the revised manuscript.

      The strong consistency of the CMK-1 recognition consensus sequences across the two in vitro datasets speaks against the unreliability of the analyses. Instead, there are a few points to highlight that explain the somewhat low degree of overlap between the two datasets, which indeed relate to the false negative rates as this reviewer suggests.

      (1) In the peptide library analysis, Trypsin cleavage prior to kinase treatment will leave a charged N- or C-terminus and in addition remove part of the protein context required for efficient kinase recognition. This will have a variable effect across the different substrates in the peptide library, depending on the distance between the cleavage site and the phosphosite, but will not affect the native protein library. This effect increases the false negative rate in the peptide library.

      (2) The number and distribution of “available substrate phosphosites” diverge in the two libraries. Indeed, the peptide library is expected to contain a markedly larger diversity of potential CMK-1 substrate sites than the protein library (because the Trypsin digestion will reveal substrates that are normally buried in a native protein), but the depth of MS analysis is the same for the two libraries. In somewhat simplistic terms, the peptide-library analysis is prone to be saturated with abundant phosphorylated peptides, which prevents the detection of all phosphosites. If the peptide analysis could have been made deeper, we would probably have increased the overlap (at the cost of increasing the number of false positives too).

      (3) We have chosen quite strict criteria and applied them separately to define each hit list; therefore, we know we have many false negatives in each list, which will naturally reduce the expected overlap.

      As we will clarify in the revised manuscript, we tend to give more trust to the protein-library dataset (since substrates are in a configuration closer to that in vivo), with those hits also present in the peptide dataset (like TAX-6 was) as the most convincing hits, as they could be validated in a second type of experiment.

      Additionally, the rationale for, and distinction between, the two separate in vitro experiments is not made clear.

      We reasoned that both substrate types have their own benefits and limitations (as discussed in the manuscript), so it was an added value to run both. We proposed that the subset of targets present in both datasets is the most solid list of candidates. We will also reinforce our point in the revised discussion that the protein library is likely to contain far fewer false positives.

      Line 207: After reporting that both tax-6 and cnb-1 mutants have high spontaneous reversals, it is not made clear why cnb-1 is not further explored in the paper. Additionally, this spontaneous reversal data should be in a supplementary figure.

      We kept the focus of the article primarily on TAX-6, because it was identified as a CMK-1 target in vitro; CNB-1 was not. Moreover, we didn’t have cnb-1(gf) mutants to pursue the analysis, and the constitutively high reversal rate of cnb-1(lf) mutants prevented any further follow-up. We have added a supplementary file to present the spontaneous reversal rates.

      Figure 3 -S1: This model doesn't explain why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement (presumably by reducing the inhibition by tax-6) but the +cyclo A group (inhibited tax-6) showed weaker response decrement, as here there is even further weakened inhibition of tax-6 on this process. Also, the cmk-1(lf) +cyclo A group is labeled as constitutive habituation, however, this doesn't appear to be the case in Figure 3 (seems like a similar initial level and response decrement phenotype to wildtype).

      Thanks a lot for the comment. We are glad that the presentation of our complex dataset was clear enough to bring the reader to that level of detailed reflection and interpretation on the proposed model. To address the two points raised in this reviewer’s comment, we are modifying the model presentation and provide additional clarifications below, where we use the term adaptation instead of habituation (as in the revised Figure):

      Regarding the first point, “why the cmk-1(gf) group and the cmk-1(gf) +cyclo A group cause enhanced response decrement … but the +cyclo A group showed weaker response decrement”. This is really a very good point, which cannot be easily explained if all the branches (arrows) in the model have the same weight or work as ON/OFF switches. We tried to convey the relative importance of the regulation effect via the thickness of the arrow lines (which we will clarify in the legend in the revised ms). The main ‘quantitative’ nuances to take into consideration here originate from two assumptions of the model (which we are clarifying in the revised manuscript):

      Assumption 1: the inhibitory effect of TAX-6 on the CMK-1 anti-adaptation branch and the inhibitory effect of TAX-6 on the CMK-1 pro-adaptation branch are not of the same magnitude (we have further enhanced the line thickness differences in the revised model, top left panel for wild type).

      Assumption 2: the two antagonistic direct effects of CMK-1 on adaptation are not of the same magnitude, most strikingly in the context of CMK-1(gf) mutants.

      In our model, the cyclosporin A treatment alone (bottom left panel) causes a strong boost on the CMK-1 inhibitory branch and a less marked boost on the CMK-1 activator branch (following assumption 1). This causes an imbalance between the two antagonistic direct CMK-1-dependent drives, which reduces (but doesn’t fully block) adaptation. Indeed, we don’t observe a total block of adaptation with cyclosporin A in wild type, the effect being significantly milder than the totally non-adapting phenotypes seen, e.g., in TAX-6(gf) mutants. From there, the question is: what happens in the CMK-1(gf) background that would mask the anti-adaptation effect of cyclosporin A? Here assumption 2 is relevant: the CMK-1(gf) pro-adaptation direct branch is always prevalent and shifts the regulation toward faster adaptation (the role of TAX-6 becoming negligible in the CMK-1(gf) background, and ipso facto that of cyclosporin A).

      Regarding the second point, “the cmk-1(lf) +cyclo A group is labeled as constitutive habituation”. We regret a confusing word choice in the first version of the manuscript; we intended to mean “normal habituation phenotype” but in the joint absence of antagonistic CMK-1 and TAX-6 regulatory signaling (so the regulation is not like in wild-type, but the phenotype ends up like in wild type). We are modifying the label to “normal adaptation” and will leave a note in the legend that an apparently normal adaptation phenotype seems to be the “default” situation when the two antagonistic regulatory pathways are shut off.

      More discussion of the significance of the sites of cmk-1 and tax-6 function in the neural circuit should take place. Additionally, incorporating the suspected loci of cmk-1 and tax-6 in the neural circuit into the model would be interesting (using proper hypothetical language). For example, as it seems like AFD is not required for the naïve reversal response but just its reduction, cmk-1 activity in AFD might be generating inhibition of the reversal response by AFD. It certainly would be understandable if this isn't workable, given extrasynaptic signaling and other unknowns, but it potentially could also be helpful in generating a working model for these complex interactions. For example, cmk-1 induces AIZ inhibition of AVA (AIZ is electrically coupled to AFD), and tax-6 reduces RIM activation of AVA (these neurons are also electrically coupled according to the diagram). RIM is also a neuropeptide-rich neuron, so this could allow it to interact with the cmk-1-related process(es) in AFD. Some discussion of possibilities like this could be informative.

      Thanks for the comment. These hypothetical inter-cellular communication pathways are indeed nice possibilities. On the other hand, we could envision several additional pathways. Following this helpful suggestion, we will expand the discussion of hypothetical models in the revised manuscript.

      Provide an explanation for why some of the experiments in Figure 4 have such a high N, compared to other experiments.

      The conditions with the highest n correspond to conditions that we have also used as ‘control’ conditions for other types of experiments in the lab and as part of side projects, and whose data could be pooled for the present article. We have been working with cmk-1(lf) and tax-6(gf) mutants for many years, and the robust non-adapting phenotype was a reference point and a quality control when analyzing other non-adapting mutants.

      Because the loss of function and gain of function mutations in cmk-1 have a similar effect, it is likely that this thermosensory plasticity phenotype is sensitive to levels of cmk-1 activity. Therefore, it is not surprising that the cmk-1 promoter failed to rescue very well as these plasmid-driven rescues often result in overexpression. Given this and that the cmk-1p rescue itself was so modest, these rescue experiments are not entirely convincing (and very hard to interpret; for example, is the AFD rescue or the ASER rescue more complete? The ASER one is actually closer to the cmk-1p rescue). Given the sensitivity to cmk-1 activity levels, a degradation strategy would be more likely to deliver clear results (or perhaps even the overactivation approach used for tax-6).

      Thanks for the comment. We respectfully disagree with this reviewer’s statement “the loss of function and gain of function mutations in cmk-1 have a similar effect”. We suspect a confusion here, because our data clearly show that these two mutant types have opposite phenotypes. That being said, we interpret the weak rescue effect with cmk-1p as a probable result of overexpression or incomplete/imbalanced expression across neurons (as the promoter used might not include all the relevant regulatory regions). We dedicated considerable effort to establishing an endogenous CMK-1::degron knock-in for tissue-specific auxin-induced degradation (AID), but we were not able to obtain consistent results. Unfortunately, the only useful data regarding the CMK-1 place of action are the cell-specific rescue data already included in the report.

      Reviewer #2 (Public review):

      Summary:

      The reduction in a response to a specific stimulus after repeated exposures is called habituation. Alterations in habituation to noxious stimuli are associated with chronic pain in humans, however, the underlying molecular mechanisms involved are not clear. This study uses the nematode C. elegans to study genes and mechanisms that underlie habituation to a form of noxious stimuli based on heat, termed thermo-noxious stimuli. The authors previously showed that the Calcium/Calmodulin-dependent protein kinase (CMK-1) regulates thermo-nociceptive habituation in the nematode C. elegans. Although CMK-1 is a kinase with many known substrates, the downstream targets relevant for thermo-nociceptive habituation are not known. In this study, the authors use two different kinase screens to identify phosphorylation targets of CMK-1. One of the targets they identify is Calcineurin (TAX-6). The authors show that CMK-1 phosphorylates a regulatory domain of Calcineurin at a highly conserved site (S443). In a series of elegant experiments, the authors use genetic and pharmacological approaches to increase or decrease CMK-1 and Calcineurin signaling to study their effects on thermo-nociceptive habituation in C. elegans. They also combine these various approaches to study the interactions between these two signaling proteins. The authors use specific promoters to determine in which neurons CMK-1 and Calcineurin function to regulate thermo-nociceptive habituation. The authors propose a model based on their findings illustrating that CMK-1 and Calcineurin act mostly in different neurons to antagonistically regulate habituation to thermo-nociceptive stimuli in a complex manner.

      Strengths:

      (1) Given the conservation of habituation across phylogeny, identifying genes and mechanisms that underlie nociceptive habituation in C. elegans may be relevant for understanding chronic pain in humans.

      (2) The identification of canonical CaM Kinase phosphorylation motifs in the substrates identified in the CMK-1 substrate screen validates the screen.

      (3) The use of loss and gain of function approaches to study the effects of CMK-1 and Calcineurin on thermo-nociceptive responses and habituation is elegant.

      (4) The ability to determine the cellular place of action of CMK-1 and Calcineurin using neuron-specific promoters in the nematode is a clear strength of the genetic model system.

      Thanks a lot for these positive remarks.

      Weaknesses:

      (1) The manuscript begins by identifying Calcineurin as a direct substrate of CMK-1 but ends by showing that CMK-1 and Calcineurin mostly act in different neurons to regulate nociceptive habituation which disrupts the logical flow of the manuscript.

      We understand this point and we have carefully considered (and re-considered) the way to articulate the report. However, we could not present the story much differently, as we would have had no justification to investigate the role of TAX-6 and its interaction with CMK-1 had we not first identified it as a phospho-target in vitro. Carefully considering this point, we found that the abstract of the first manuscript version was probably too cursory and liable to trigger wrong expectations among readers. We will extensively revise the abstract to clarify this point. Furthermore, we will reinforce this point in the last paragraph of the introduction.

      (2) The physiological relevance of CMK-1 phosphorylation of Calcineurin is not clear.

      We do agree and will explicitly discuss this aspect in the revised Discussion section, and also make it clear from the abstract on.

      (3) It is not clear if Calcineurin is already a known substrate of CaM Kinases in other systems or if this finding is new.

      We are not aware of any studies having shown that Calcineurin is a direct target of CaM kinase I. But it was found to be a substrate of CaM kinase II as well as of other kinases, as we explicitly presented in the discussion section. We will complement the text by mentioning that we are not aware of Calcineurin having so far been reported to be a CaM kinase I substrate.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper by Lee and Ouellette explores the role of cyclic-di-AMP in chlamydial developmental progression. The manuscript uses a collection of different recombinant plasmids to up- and down-regulate c-di-AMP production, and then uses classical molecular and microbiological approaches to examine the effects of expression induction in each of the transformed strains.

      Strengths:

      This laboratory is a leader in the use of molecular genetic manipulation in Chlamydia trachomatis and their efforts to make such approaches mainstream are commendable. Overall, the model described and defended by these investigators is thorough and significant.

      Weaknesses:

      The biggest weakness in the document is their reliance on quantitative data that is statistically not significant, in the interpretation of results. These challenges can be addressed in a revision by the authors.

      Thank you for these comments. We have generated new data, which we hope the reviewer will find more compelling. These will be included in a revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes the role of the production of c-di-AMP in the chlamydial developmental cycle. Chlamydia are obligate intracellular bacterial pathogens that rely on eukaryotic host cells for growth. The chlamydial life cycle depends on a cell form developmental cycle that produces phenotypically distinct cell forms with specific roles during the infectious cycle. The RB cell form replicates, amplifying chlamydia numbers, while the EB cell form mediates entry into new host cells, disseminating the infection to new hosts. Regulation of cell form development is a critical question in chlamydia biology and pathogenesis. Chlamydia must balance amplification (RB numbers) and dissemination (EB numbers) to maximize survival in its infection niche. The main findings in this manuscript show that overexpression of the dacA-ybbR operon results in increased production of c-di-AMP and early expression of the transitionary gene hctA and late gene omcB. The authors also knocked down the expression of the dacA-ybbR operon and reported a reduction in the expression of both hctA and omcB. The authors conclude with a model suggesting the amount of c-di-AMP determines the fate of the RB: continued replication or EB conversion. Overall, this is a very intriguing study with important implications; however, the data is very preliminary and the model is very rudimentary and is not well supported by the data.

      Thank you for your comments. Chlamydia is not an easy experimental system, but we will do our best to address the reviewer’s concerns in a revised submission.

      Describing the significance of the findings:

      The findings are important and point to very exciting new avenues to explore the important questions in chlamydial cell form development. The authors present a model that is not quantified and does not match the data well.

      Describing the strength of evidence:

      The evidence presented is incomplete. The authors do a nice job of showing that overexpression of the dacA-ybbR operon increases c-di-AMP and that knockdown or overexpression of the catalytically dead DacA protein decreases the c-di-AMP levels. However, the effects on the developmental cycle and how they fit the proposed model are less well supported.

      dacA-ybbR ectopic expression:

      For the dacA-ybbR ectopic expression experiments they show that hctA is induced early but there is no significant change in OmcB gene expression. This is problematic as when RBs are treated with Pen (this paper) and (DOI 10.1128/MSYSTEMS.00689-20) hctA is expressed in the aberrant cell forms but these forms do not go on to express the late genes suggesting stress events can result in changes in the developmental expression kinetic profile. The RNA-seq data are a little reassuring as many of the EB/Late genes were shown to be upregulated by dacA-ybbR ectopic expression in this assay.

      As the reviewer notes, we also generated RNAseq data, which validate that late gene transcripts (including sigma28 and sigma54 regulated genes) are statistically significantly increased earlier in the developmental cycle in parallel to increased c-di-AMP levels. The lack of statistical significance in the RT-qPCR data for omcB, which shows a trend of higher transcripts, is less concerning given the statistically significant RNAseq dataset. We have reported the data from three replicates for the RT-qPCR and do not think it would be worthwhile to attempt more replicates in an attempt to “achieve” statistical significance.

      The authors also demonstrate that this ectopic expression reduces the overall growth rate but produces EBs earlier in the cycle but overall fewer EBs late in the cycle. This observation matches their model well as when RBs convert early there is less amplification of cell numbers.

      dacA knockdown and dacA(mut)

      The authors showed that dacA knockdown and ectopic expression of the dacA mutant both reduced the amount of c-di-AMP. The authors show that for both of these conditions, hctA and omcB expression is reduced at 24 hpi. This was also partially supported by the RNA-seq data for the dacA knockdown as many of the late genes were downregulated. However, a shift to an increase in RB-only genes was not readily evident. This is maybe not surprising as the chlamydial inclusion would just have an increase in RB forms and changes in cell form ratios would need more time points.

      Thank you for this comment. We agree that it is not surprising given the shift in cell forms. The reduction in hctA transcripts argues against a stress state as noted above by the reviewer, and the RNAseq data from dacA-KD conditions indicates at least that secondary differentiation has been delayed. We will try to clarify this in a revision.

      Interestingly, the overall growth rate appears to differ in these two conditions, growth is unaffected by dacA knockdown but is significantly affected by the expression of the mutant. In both cases, EB production is repressed. The overall model they present does not support this data well as if RBs were blocked from converting into EBs then the growth rate should increase as the RB cell form replicates while the EB cell form does not. This should shift the population to replicating cells.

      We agree that it seems that perturbing c-di-AMP production, whether by knockdown or overexpressing the mutant DacA(D164N), has an overall negative impact on chlamydial growth. We have generated new data, which we think will address this. These new data will be included in a revised manuscript.

      Overall this is a very intriguing finding that will require more gene expression data, phenotypic characterization of cell forms, and better quantitative models to fully interpret these findings.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors Eapen et al. investigated the peptide inhibitors of Cdc20. They applied a rational design approach, substituting residues found in the D-box consensus sequences to better align the peptides with the Cdc20-degron interface. In the process, the authors designed and tested a series of more potent binders, including ones that contain unnatural amino acids, and verified binding modes by elucidating the Cdc-20-peptide structures. The authors further showed that these peptides can engage with Cdc20 in the cellular context, and can inhibit APC/C<sup>Cdc20</sup> ubiquitination activity. Finally, the authors demonstrated that these peptides could be used as portable degron motifs that drive the degradation of a fused fluorescent protein.

      Strengths:

      This manuscript is clear and straightforward to follow. The investigation of different peptide variations was comprehensive and well-executed. This work provided the groundwork for the development of peptide drug modalities to inhibit degradation or apply peptides as portable motifs to achieve targeted degradation. Both of which are impactful.

      Weaknesses:

      A few minor comments:

      (1) In my opinion, more attention to the solubility issue needs to be discussed and/or tested. On page 10, what is the solubility of D2 before a modification was made? The authors mentioned that position 2 is likely solvent exposed, it is not immediately clear to me why the mutation made was from one hydrophobic residue to another. What was the level of improvement in solubility? Are there any affinity data associated with the peptide that differ with D2 only at position 2?

      The reviewer is correct that we have not done any detailed solubility characterisation; we refer only to observations rather than quantitative analysis. We wrote that we reverted from Leu to Ala due to solubility - we will clarify this statement to say that we reverted to Ala, as it was the residue present in D1, for which we observed a measurable affinity by SPR and saw a concentration-dependent response in the thermal shift analysis. We do not have any peptides or affinity data that explore single-site mutations with the parental peptide of D2. D2 is included in the paper because of its link to the consensus D-box sequence and thus was the logical path to the investigations into positions 3 and 7 that come later in the manuscript.

      (2) I'm not entirely convinced that the D19 density not observed in the crystal structure was due to crystal packing. This peptide is peculiar as it also did not induce any thermal stabilization of Cdc20 in the cellular thermal shift assay. Perhaps the binding of this peptide could be investigated in more detail (i.e., NMR?) Or at least more explanation could be provided.

      This section will be clarified. The lack of observed density was likely due to the relatively low affinity of D19 and also to the lack of binding of the three C-terminal residues in the crystal, and consequently it has a further reduced affinity. The current wording in the manuscript puts greater emphasis on this second aspect being a D19-specific issue, even though it applies to all four soaked peptides. The extent of peptide-induced thermal stabilisations observed by TSA and CETSA is different, with the latter experiment consistently showing smaller shifts. This observation may be due to the more complex medium (cell lysate vs. purified protein) and/or different concentrations of the proteins in solution. In the CETSA, we over-expressed a HiBiT-tagged Cdc20, which is present in addition to any endogenously expressed Cdc20. Although we did not investigate it, the near identical D-box binding sites on Cdc20 and Cdh1 would suggest that there will be cross-specificity, which could further influence the CETSA experiments.

      Reviewer #2 (Public review):

      Summary:

      The authors took a well-characterised (partly by them), important E3 ligase, in the anaphase-promoting complex, and decided to design peptide inhibitors for it based on one of the known interacting motifs (called D-box) from its substrates. They incorporate unnatural amino acids to better occupy the interaction site, improve the binding affinity, and lay foundations for future therapeutics - maybe combining their findings with additional target sites.

      Strengths:

      The paper is mostly strengths - a logical progression of experiments, very well explained and carried out to a high standard. The authors use a carefully chosen variety of techniques (including X-ray crystallography, multiple binding analyses, and ubiquitination assays) to verify their findings - and they impressively achieve their goals by honing in on tight-binders.

      Weaknesses:

      Some things are not explained fully and it would be useful to have some clarification. Why did the authors decide to model their inhibitors on the D-box motif and not the other two SLiMs that they describe?

      For completeness, in addition to the D-box we did originally construct peptides based on the ABBA and KEN-box motifs, but they did not show any shift in melting temperature of Cdc20 in the thermal shift assay whereas the D-box peptides did; consequently, we focused our efforts on the D-box peptides. Moreover, there is much evidence from the literature that points to the unique importance of the D-box motif in mediating productive interactions of substrates with the APC/C (i.e. those leading to polyubiquitination & degradation). One of the clearest examples is a study by Mark Hall’s lab (described in Qin et al. 2016), which tested the degradation of 15 substrates of yeast APC/C in strains carrying alleles of Cdh1 in which the docking sites for D-box, KEN or ABBA were mutated. They observed that whereas degradation of all 15 substrates depended on D-box binding, only a subset required the KEN binding site on Cdh1 and only one required the ABBA binding site. A more recent study from David Morgan’s lab (Hartooni et al. 2022) looking at binding affinities of different degron peptides concluded that the KEN motif has very low affinity for Cdc20 and is unlikely to mediate degradation of APC/C-Cdc20 substrates. Engagement of substrate with the D-box receptor is therefore the most critical event mediating APC/C activity and the interaction that needs to be blocked for the most effective inhibition of substrate degradation.

      What exactly do they mean when they say their 'observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast 'pseudo-substrate' inhibitor Acm1, acts to impede polyubiquitination of the bound protein'? It's an interesting thing to think about, and probably the paper they cite explains it more but I would like to know without having to find that other paper.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100-fold increase in affinity of a peptide by adding cooperativity to the interaction of D-box with co-activator.

      After further reading on this topic, we will modify the relevant piece of text from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation. Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Re Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

      Reviewer #3 (Public review):

      Summary:

      Eapen and coworkers use a rational design approach to generate new peptide-inspired ligands at the D-box interface of cdc20. These new peptides serve as new starting points for blocking APC/C in the context of cancer, as well as manipulating APC/C for targeted protein degradation therapeutic approaches.

      Strengths:

      The characterization of new peptide-like ligands is generally solid and multifaceted, including binding assays, thermal stability enhancement in vitro and in cells, X-ray crystallography, and degradation assays.

      Weaknesses:

      One important finding of the study is that the strongest binders did not correlate with the fastest degradation in a cellular assay, but explanations for this behavior were not supported experimentally. Some minor issues regarding experimental replicates and details were also noted.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100-fold increase in affinity of a peptide by adding cooperativity to the interaction of D-box with co-activator.

      After further reading on this topic, we will modify the relevant piece of text from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation. Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Re Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In the manuscript entitled "A VgrG2b fragment cleaved by caspase-11/4 promotes Pseudomonas aeruginosa infection through suppressing the NLRP3 inflammasome", Qian et al. found an activation of the non-canonical inflammasome, but not the downstream NLRP3 inflammasome, during the infection of macrophage by P. aeruginosa, which is in sharp contrast to that by E. coli (Figure 1). In realizing that the suppression of the NLRP3 inflammasome is Caspase-11 dependent, the authors performed a screening among P. aeruginosa proteins and identified VgrG2b being a major substrate of Caspase-11 (Figure 2). Next, the authors mapped the cleavage site on VgrG2b to D883, and demonstrated that cleavage of VgrG2b by Caspase-11 is essential for the suppression of the NLRP3 inflammasome (Figure 3). Furthermore, they found that a binding between the C-terminal fragment of the cleaved VgrG2b and NLRP3 existed (Figure 4), which was then proved to block the association of NLRP3 with NEK7 (Figure 5). Finally, the authors demonstrated that blocking of VgrG2b cleavage, by either mutation of the D883 or administration of a designed peptide, effectively improved the survival rate of the P. aeruginosa-infected mice (Figure 6). This is a well-designed and executed study, with the results clearly presented and stated.

      We are deeply grateful for your recognition and positive comments on our article. Thank you for your effort and dedication in reviewing our manuscript. We are honored to have the opportunity to receive feedback from professional reviewers like you.

      Reviewer #2 (Public review):

      Summary:

      In their manuscript, Qian and colleagues identified a novel mechanism by which Pseudomonas control inflammatory responses upon inflammasome activation. They identified a caspase-11 substrate (VgrG2b) which, upon cleavage, binds and inhibits the NLRP3 to reduce the production of pro-inflammatory cytokines. This is a unique mechanism that allows for the tailoring of the innate immune response upon bacterial recognition.

      Strengths:

      The authors are presenting here a novel conceptual framework in host-pathogen interactions. Their work is supported by a range of approaches (biochemical, cellular immunology, microbiology, animal models), and their conclusions are supported by multiple independent evidences. The work is likely to have an important impact on the innate immunity field and host-pathogen interactions field and may guide the development of novel inhibitors.

      Weaknesses:

      Although quite exhaustive, a few of the authors' conclusions are not fully supported (e.g., caspase-11 directly cleaving VgrG2b, the unique affinity of VgrG2b-C for NLRP3) and would require complementary approaches to validate their findings fully. This is minimal.

      We sincerely appreciate your professional review and kind appraisal on our article. These comments are really valuable and helpful for improving our manuscript. According to your suggestions, we have made some modifications and added some supplemental data to make our results more convincing. The detailed responses are listed point-by-point below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I really enjoyed reading your manuscript and believe this is an important conceptual advance for the innate immunity field. Your conclusions are in general well-supported, you used a range of methodologies and the quality of the presentation of the results is excellent. I have a few comments here that I hope will contribute to improving an already great piece of work:

      Elements to be improved:

      Line 109-110: the authors claim that the release of mito DNA is required for NLRP3 activation. I would support this with a reference. I believe this may not be fully agreed on in the field. Cleavage of GSDMD by caspase-4/11 is required, however. A few groups showed the requirement for K+ efflux in this context (Broz, Brough, Schroder labs).

      This is a very good suggestion. Indeed, there is still controversy over this issue, and we have revised our text to make our manuscript more neutral. We have also cited these important references to help readers understand where the controversy lies.

      I disagree that OMV + Pseudomonas is a natural way to simulate natural infection. I would argue it is even quite artificial. Pseudomonas alone should be sufficient to generate OMV without the addition of extra OMVs.

      This is a good point. Before we infected BMDM cells with PAO1 strains, we washed with PBS at least three times to exclude interference from the contents of the LB medium. Moreover, in our experimental system, the time for co-incubation between bacteria and host cells is very limited. During this time, the amount of OMV secreted by bacteria may not reach the level needed to activate inflammasomes, and this concentration is also relatively low compared to the OMV concentration secreted by bacteria under physiological conditions. Therefore, we added extra OMVs to simulate chronic infection conditions within a short time.

      The authors co-express caspase with VgrG2b and assume the cleavage is direct. However, the study lacks experiments with recombinant proteases (commercially available), which would strengthen their conclusions regarding the ability of caspase-4/11 to directly cleave the protein. Based on the recognised sequence (DXXD), I believe caspase-4/11 is not directly responsible for this. These caspases were shown to cleave caspase-3/7, which can cleave such sequences (DXXX). As caspase-4 can cleave caspase-3/7 in their lysates, I would recommend testing this hypothesis to further strengthen the authors' conclusions.

      These are very good points. As shown in Fig. 3F, we used recombinant VgrG2b and caspase-11 p22/p10 to demonstrate direct cleavage by caspase-11. To exclude the effect of caspase-3/7, we treated cells with inhibitors of caspase-3/7 and found that caspase-3/7 are not the executors of VgrG2b cleavage (new Fig. S3E, F).

      The affinity between caspase-11 and VgrG2b-C is puzzling as one would normally expect the caspase and its substrates to quickly dissociate. Does VgrG2b-C impact the activity of caspase-4/11 upon cleavage? Can VgrG2b-C also interact with p20/p10 caspase-1? I believe the authors only tried the full-length version of caspase-1 in supplemental.

      These are very good questions. We agree that enzymes and substrates normally have only transient interactions, which are not easy to capture. However, we used the caspase-11(C254A) mutant, in which substrate cleavage is blocked, so that the binding of VgrG2b or VgrG2b-C to caspase-11(C254A) could be detected. This mutation is frequently used in immunoprecipitation (Wang K, Cell, 2020). We tested the impact of VgrG2b-C on the enzyme activity of caspase-4/11 and showed that VgrG2b-C did not affect the cleavage of GSDMD by caspase-11 (Fig. 5C). We also tested caspase-1 p20/p10 and found that it had no interaction with VgrG2b-C (new Fig. S4G).

      Can more details be provided about the generation of recombinant caspase-11, VgrG2b-C, and other recombinant proteins tested?

      Thanks for your suggestion. We have revised our description in the new version.

      The authors assumed that VgrG2b-C does not impact other inflammasomes (such as NLRC4) based on their X-gal assay. I would also confirm this with a functional assay (e.g., transfection of flagellin in macrophages).

      This is a good suggestion. We have tested the impact of VgrG2b-C on the NLRC4 inflammasome and found that VgrG2b-C does not affect NLRC4 activation upon transfection of flagellin (new Fig. S5K).

      Often, representative experiments are shown. For Elisa, cell death assays and quantitative experiments, pooling the data would be appropriate. Appropriate statistical analysis should be conducted based on this as well.

      Thanks for your suggestions. In the revised manuscript, we pooled the data of three independent experiments for our analysis of ELISA and cell death assays. We also added descriptions of statistical analysis in our revised text.

      VgrG2b has been suggested to be a metalloprotease (PMID: 31577948). Is its protease activity required for the phenomenon observed?

      This is a very good question. The active region of the metalloprotease VgrG2b-C is aa932-941, especially the core sequence HEXXH. Structural data also confirm that H935, E936, H939, and E983 play key roles in coordinating Zn ions (Sana TG, mBio, 2015; Wood TE, Cell reports, 2019). In our study, the cleavage of VgrG2b by caspase-4/11 depends on the recognition of the tetrapeptide sequence at aa880-883. We added data showing that the cleavage of VgrG2b and the inhibition of the NLRP3 inflammasome were not affected by VgrG2b enzymatic activity (new Fig. S4I-K).

      What is the affinity of VgrG2b-C for NLRP3? Is it higher than NEK7? A quantitative experiment would be required to claim this.

      This is a great point. In the revised manuscript, we added quantitative data showing that VgrG2b-C has a higher affinity for NLRP3 than NEK7 (326 nM vs. 681 nM).

      The Material and Method section is a bit light and would benefit from adding more information (e.g. cell density, microscopy details, number of cells imaged, etc).

      Thanks for your suggestion. We have added more details to the Material and Method section in the revised manuscript.

    1. Author response:

      We thank the reviewers for their concise and detailed summaries, and appreciate the constructive feedback on the article’s strengths and weaknesses. In response, we plan to strengthen our work in a revised version by presenting the model assumptions for the electrocyte more explicitly and by further elaborating on the generalisability of the results to other cell types with different ion channels, including calcium and chloride.

      Experimental work is beyond the scope of our modelling-based study. However, we would like our work to serve as a framework for future experimental studies into the role of the electrogenic pump current (and its possible compensatory currents) in disease, and its role in evolution of highly specialised excitable cells (such as electrocytes).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      The authors demonstrate that, while the loss of Ezrin increases lysosomal biogenesis and function, its presence is required for the specific endocytosis of EGFR. Upon further investigation, the authors reveal that Ezrin is a crucial intermediary protein that links EGFR to AKT, leading to the phosphorylation and inhibition of TSC. TSC is a critical negative regulator of the mTORC1 complex, which is dysregulated in various diseases, making their findings a valuable addition to multiple fields of study. Their cell signaling findings are translatable to an in vivo Medaka fish model and suggest that Ezrin may play a crucial role in retinal degeneration.

      Strengths: 

      Giamundo, Intartaglia, et al. utilized unbiased proteomic and transcriptomic screens in Ezrin KO cells to investigate the mechanistic function of Ezrin in lysosome and cell signaling pathways. The authors' findings are consistent with past literature demonstrating Ezrin's role in the EGFR and mTORC1 signaling pathways. They used several cell lines, small molecule inhibitors, and cellular and in vivo knockout models to validate signaling changes through biochemical and microscopy assays. Their use of multiple advanced microscopy techniques is also impressive.

      We are grateful to the Editor and the Reviewers for their important and constructive comments, which helped us to improve our manuscript. We have now carried out new experiments and analyses to further support our findings.

      Weaknesses: 

      While the authors demonstrated activation of TSC1 (lysosomal accumulation) and inactivation of Akt (decreased phosphorylation in TSC1), as well as decreased mTORC1 signaling in Ezrin knockout cells, direct experiments showing the rescue of mTORC1 activity by AKT and TSC1 mutants are required to confirm the linear signaling pathway and establish Ezrin as a mediator of EGFR-AKT-TSC1-mTORC1 signaling. Although the authors presented representative images from advanced microscopy techniques to support their claims, there is insufficient quantification of these experiments. Additionally, several immunoblots in the manuscript lack vital loading controls, such as input lanes for immunoprecipitations and loading controls for western blots.

      We wish to thank the Reviewer for his/her important and constructive comments on our manuscript and for considering that our study provides new information for understanding the mechanism regulating the TSC/mTORC1 pathway. We have now extensively revised the manuscript according to his/her suggestions. Indeed, to expand on the evidence demonstrating Ezrin as a mediator of EGFR-AKT-TSC1-mTORC1 signaling, the revised manuscript includes quantification of all advanced microscopy images, rescue experiments demonstrating the role of Ezrin in the AKT/TSC/mTORC1 molecular network, and controls for WBs and immunoprecipitations.

      Reviewer #2 (Public Review):

      Summary: 

      The authors begin with the stated goal of gaining insight into the known repression of autophagy by Ezrin, a major membrane-actin linker that assembles signaling complexes on membranes. RNA and protein expression analysis is consistent with upregulation of lysosomal proteins in Ezrin-deficient MEFs, which the authors confirm by immunostaining and western blotting for lysosomal markers. Expression analysis also implicates EGF signaling as being altered downstream of Ezrin loss, and the authors demonstrate that Ezrin promotes relocalization of EGFR from the plasma membrane to endosomes. Ezrin loss impacts downstream MAPK/Akt/mTORC1 signaling, although the mechanistic links remain unclear. An Ezrin mutant Medaka fish line was then generated to test Ezrin's role in retinal cells, which are known to be sensitive to changes in autophagy regulation. Phenotypes in this model appear generally consistent with observations made in cultured cells, though mild overall. 

      Strengths: 

      Data on the impact of Ezrin-loss on relocalization of EGFR from the plasma membrane are extensive, and thoroughly demonstrate that Ezrin is required for EGFR internalization in response to EGF. 

      A new Ezrin-deficient in vivo model (Medaka fish) is generated.

      Strong data demonstrates that Ezrin loss suppresses Akt signaling. Ezrin loss also clearly suppresses mTORC1 signaling in cell culture, although examination of mTORC1 activity is notably missing in Ezrin-deficient fish. 

      We thank the Reviewer for the recognition of our study and apologize for the insufficient evidence reported in the previous version of the manuscript. As requested by the Reviewer, we considerably expanded the number of experiments supporting the role of the EZRIN/EGFR/TSC molecular network in regulating the autophagy pathway in the revised manuscript. Furthermore, following the Reviewer’s comment, we have expanded the interpretation of our findings in the “Discussion” section. We hope the new version of our manuscript will address the Reviewer’s concerns.

      Weaknesses: 

      LC3 is used as a readout of autophagy, however the lipidated/unlipidated LC3 ratio generally does not appear to change, thus there does not appear to be evidence that Ezrin loss is affecting autophagy in this study. 

      We certainly agree with the Reviewer on the importance of this issue and apologize for the lack of clarity. Ezrin is an already well-characterized protein that participates in the autophagy pathway. Several studies, including our previous studies, demonstrated that both silencing and pharmacological inhibition of Ezrin may promote autophagy by promoting activation of TFEB, in part through the TRPML1-calcineurin signaling pathway (Naso et al 2020; Intartaglia et al 2022; Lou et al 2024). However, how Ezrin controls autophagy has not yet been fully elucidated. As suggested by the Reviewer, to reinforce our data, we have now addressed this point by clarifying this aspect in the revised manuscript. Accordingly, we have monitored the autophagic flux and LC3 expression level following the guidelines for the use and interpretation of assays for monitoring autophagy (4th edition) by Klionsky et al. 2021. The data presented in the new Figure supplement 1 now better support the notion that depletion of Ezrin increases autophagic flux. We hope the new version of our manuscript will address the Reviewer’s concerns.

      The conclusion is drawn that Ezrin loss suppresses EGF signaling, however this is complicated by a strong increase in phosphorylation of the p38 MAPK substrate MK2. Without additional characterization of MAPK and Erk signaling, the effect of Ezrin loss remains unclear.  Causative conclusions between effects on MAPK, Akt, and mTORC1 signaling are frequently drawn, but the data only demonstrate correlations. For example, many signaling pathways can activate mTORC1 including MAPK/Erk, thus reduced mTORC1 activity upon Ezrin-loss cannot currently be attributed to reduced Akt signaling. Similarly, other kinases can phosphorylate TSC2 at the sites examined here, so the conclusion cannot be drawn that Ezrin-loss causes a reduction in Akt-mediated TSC2 phosphorylation.

      We agree with the Reviewer that this is an interesting and important question. However, we respectfully disagree that it should be addressed here, and feel that additional studies on both MAPK and ERK pathways, as the Reviewer suggests, are outside the scope of this manuscript. We therefore prefer to address these questions in future studies. Nevertheless, following the Reviewer’s comment, we have expanded the interpretation of our findings in the “Discussion” section. We hope the new version of our manuscript will address the Reviewer’s concerns.

      In Figure 7, the conclusion cannot be drawn that retinal degeneration results from aberrant EGFR signaling.

      We certainly agree with the Reviewer on the importance of this issue. We have now fixed this inaccuracy by adding TUNEL staining that showed retinal degeneration in Ezrin KO medaka fish. The results of these assays are described in the Results section and documented in revised Figure 7, panel H.

      It is unclear why TSC1 is highlighted in the title, as there does not appear to be any specific regulation of TSC1 here. 

      We modified the title accordingly.

      In Figure 1 the conclusion is drawn that there is an increase in lysosome number with Ezrin KO, however it does not appear that the current analysis can distinguish an increased number from increased lysosome size or activity. Similarly, conclusions about increased lysosome "biogenesis" could instead reflect decreased turnover.

      Following this Reviewer’s observation, we changed the text according to his/her suggestion.

      Immunoprecipitation data for a role for Ezrin as a signaling scaffold appear minimal and seem to lack important controls.

      We apologize for these inaccuracies. We have now carried out new experiments to further support our findings. Moreover, all blots were replaced with better-exposed images. The controls are shown in the revised Figures.

      In Figure 3A it seems difficult to conclude that EGFR dimerization is reduced since the whole blot, including the background between lanes, is lighter on that side.

      We have now fixed this inaccuracy. The blots were replaced with better-exposed images in revised Figure 3, panel A, and quantified.

      In Figure 6C specificity controls for the TSC1 and TSC2 antibodies are not included but seem necessary since their localization patterns appear very different from each other in WT cells.

      We apologize for having created some confusion. We have now corrected this mistake and revised all panels in Figure 6C (now Figure 6D) for consistency between figures and text. Concerning the specificity of the TSC1 and TSC2 antibodies and staining, the antibody labelling showed the expected TSC pattern in the cells, as reported in Menon et al. 2014. We would like to point out that the antibodies are the same as those indicated in Menon et al. 2014, and that our data are based not only on TSC1 and TSC2 staining but also on a considerable number of in vivo and in vitro experiments in which many different markers were used in several complementary approaches (i.e. immunofluorescence, western blot analysis, Omics, etc.).

      Menon S, Dibble CC, Talbott G, Hoxhaj G, Valvezan AJ, Takahashi H, Cantley LC, Manning BD. Spatial control of the TSC complex integrates insulin and nutrient regulation of mTORC1 at the lysosome. Cell. 2014 Feb 13;156(4):771-85.

      In Figure 7 the signaling effects in Ezrin-deficient fish are mild compared to cultured cells, and effects on mTORC1 are not examined. Further data on the retinal cell phenotypes would strengthen the conclusions.

      We thank the Reviewer for his/her comment. We have now fixed this inaccuracy in the revised manuscript. We added the analysis for p4EBP1 (S65), an mTORC1 substrate (Figure 7, panel D).

      In Figure 7F there appears to be more EGFR throughout the cell, so it is difficult to conclude that more EGFR at the PM in Ezrin-/- fish means reduced internalization. 

      We agree with the Reviewer that this is an important question that helped us to improve the quality of the data presented. As correctly noted by the Reviewer, EGFR protein level is increased due to EZRIN deletion. This is evident in Figure 7 panel F, in line with both proteomic analysis and in vitro experiments (Figure 2I; Figure 3E; Figure 5C). We also agree that the increase of EGFR protein level could strengthen the background of immunofluorescence. Therefore, to better represent the EGFR membrane translocation on flat mount RPE from medaka lines, we have added a highlighting box showing it in both WT and KO medaka lines in the revised Figure 7 panel F.

      Reviewer #3 (Public Review): 

      Summary: 

      In this study, the authors have attempted to demonstrate a critical role for the cytoskeletal scaffold protein Ezrin, in the upstream regulation of EGFR/AKT/MTOR signaling. They show that in the absence of Ezrin, ligand-induced EGFR trafficking and activation at the endosomes is perturbed, with decreased endosomal recruitment of the TSC complex, and a corresponding decrease in AKT/MTOR signaling. 

      Strengths: 

      The authors have used a combination of novel imaging techniques, as well as conventional proteomic and biochemical assays to substantiate their findings. The findings expand our understanding of the upstream regulators of the EGFR/AKT MTOR signaling and lysosomal biogenesis, appear to be conserved in multiple species, and may have important implications for the pathogenesis and treatment of diseases involving endo-lysosomal function, such as diabetes and cancer, as well as neuro-degenerative diseases like macular degeneration. Furthermore, pharmacological targeting of Ezrin could potentially be utilized in diseases with defective TFEB/TFE3 functions like LSDs. While a majority of the findings appear to support the hypotheses, there are substantial gaps in the findings that could be better addressed. Since Ezrin appears to directly regulate MTOR activity, the effects of Ezrin KO on MTOR-regulated, TFEB/TFE3 -driven lysosomal function should be explored more thoroughly. Similarly, a more convincing analysis of autophagic flux should be carried out. Additionally, many immunoblots lack key controls (Control IgG in co-IPs) and many others merit repetition to either improve upon the quality of the existing data, validate the findings using orthogonal approaches, or provide a more rigorous quantitative assessment of the findings, as highlighted in the recommendation for authors. 

      We thank the Reviewer for the recognition of our study and apologize for the previous inaccuracies. We also greatly appreciate the Reviewer’s efforts, support and help in improving our manuscript. As requested by the Reviewer, we considerably expanded the number of experiments to support the role of the EZRIN/EGFR/AKT network in controlling the mTORC1 pathway in the revised manuscript. We hope the new version of our manuscript will address the Reviewer’s concerns.

      Reviewer #1 (Recommendations for The Authors):

      Major comments: 

      (1) While the authors show that, in the absence of Ezrin, TSC accumulates on the lysosome and suppresses mTORC1 signaling, they should perform additional genetic experiments to strengthen their conclusions. Can they knockout or knockdown TSC1/2 in Ezrin-deficient cells to rescue mTORC1 activity? Can they mutate the lysosomal localization signal on TSC1 (TSC1Q149E/R204E/K238E) in Ezrin-deficient cells to rescue mTORC1 activity? Does constitutively active AKT (myr-AKT or AKT-E40K) restore mTORC1 activity in Ezrin-deficient cells? 

      We agree with the Reviewer that this is an important concern that helped us to improve the quality of the data presented. We now provide in the revised version of Figure supplement 4F the results of pharmacological inhibition of Ezrin on MEF-TSC2 KO cells. In line with our findings, the lack of TSC2 is able to rescue mTORC1 signaling in the absence of Ezrin activity. Thus, these data strongly support that Ezrin is required for the mTORC1 pathway via TSC complex targeting.

      (2) In the absence of Ezrin, TSC1 constitutively localizes on the lysosome and suppresses mTORC1. Does this suppression hold in the presence of other mTORC1-activating signals (i.e., amino acids, insulin, oxygen)? 

      Following the Reviewer’s suggestion, we now provide this information in the revised Figure 6C, in which we show that stimulation with insulin does not exert its activating effect on mTORC1 signaling (i.e. phosphorylation of pP70 S6 - pT389). These new data, together with the experiments on MEF TSC2 KO cells, clearly support the model by which Ezrin works as a scaffold protein connecting AKT signaling to the TSC complex. The lack of Ezrin induces a disconnection between AKT and the TSC complex, which is translocated on lysosomes and insensitive to inhibition of AKT signaling.

      (3) In Figure 3A, the authors showed EGFR dimerization through a western blot of a crosslinking assay. However, the western blot data are unclear and do not strongly support their statement. Additionally, the authors mentioned that the dimerization is confirmed by immunofluorescence analysis, but this statement should be revised since the imaging analysis only indirectly shows the copresence of EZR and EGFR, not necessarily the dimerized EGFR. The authors should perform additional experiments to strengthen their claim or tone down their statements in the text and model figure. 

      We certainly agree with the Reviewer on the importance of this issue and now we have fixed this inaccuracy in the revised manuscript. The blots of crosslinking were changed for better exposed images in revised Figure 3, panel A. Moreover, we also properly quantified signals to support our conclusion.

      (4) It is interesting that Ezrin binds EGFR, AKT, and TSC as a scaffolding protein. To define the mechanisms by which Ezrin interacts with AKT, EGFR, and TSC, can the authors perform domain analyses to determine which regions of Ezrin are required for its binding with AKT, EGFR, and TSC in mediating EGFR-AKT-TSC-mTORC1 signaling? 

      We thank the Reviewer for his/her comment that improves our manuscript. Conducting domain analysis in the lab would be ideal, although this seems to us a long tour de force that might be associated with several technical and experimental issues. However, in silico approaches provide a helpful alternative for generating initial hypotheses about domain-domain interactions, though they should be seen as a starting point rather than a complete solution. Recent advances in fold prediction suggest that AlphaFold3 could be used to predict dimer formation and, consequently, domain-domain interactions. However, such an approach is challenging in this case because some of the considered proteins are transmembrane, and all are prone to form multimeric complexes with multiple partners, making them poor candidates for reliable fold predictions. In fact, the predicted dimers are poorly supported, and AlphaFold3 lacks confidence in the relative positioning of interactors, limiting its interpretability. Alternatively, database mining and machine-learning methods, such as HINT, Domine, and PPIDomainMiner, provide more robust evidence. Indeed, these tools allow us to consistently identify a strong interaction between Ezrin's FERM central domain and EGFR's PK domain, now shown in Figure Supplement 2C and Supplement Figure 3C-H. Importantly, these findings generate valuable hypotheses, but experimental validation is still necessary, and we prefer to leave it for future studies.

      Minor Comments: 

      (1) There are several immunoblots that did not have adequate controls:  - In Figure 2D, an input lane should be shown for each of the cell lysates to demonstrate the presence of other proteins in the cell lysate used for the IP.

      We have now fixed this inaccuracy in the revised manuscript.

      - Figure 3A does not have a loading control. Also, immunoblot quality should be significantly improved.

      We have now fixed this inaccuracy in the revised manuscript.

      - The HER2 western blot in Figure 5C does not accurately represent the data shown in the quantification graph.

      We have now fixed this inaccuracy by replacing HER2 western blot in the revised Figure 5C.

      - In Figure 6A, the authors should include an input as a control for the IP. To further support their claim in the model figure, can the authors also probe the IP lysate for Ezrin and Tsc2? If all are indeed in a complex together, they should be present. 

      Following this Reviewer’s observation, we have added the input as a control for the IP in the revised Figure 6A. Moreover, we have included the immunoprecipitation data for the EZRIN and TSC2 interaction (Figure 6A).

      - Phosphorylation sites across figures should be uniformly annotated for consistency and ease of understanding, e.g., pTSC2(S939), pS6K1(T389), and pAKT(S473).

      We have now fixed this inaccuracy in the revised text.

      (2) There are several microscopy data that lack adequate quantification. For instance, Figures 2E, 2F, 3C, 4A, 5A, and 6F only show very few cells as representative images, which is not sufficient to support their claims. 

      We thank the Reviewer for his/her comment that improves our manuscript. Accordingly, we have added adequate quantification and statistical analysis in the revised Figures.

      (3) Some suggestions to improve the readability of the manuscript: 

      -  In the abstract (line 32): "Loss of Ezrin was deficient in TSC repression by EGF and culminated in translocation of TSC to lysosomes triggering suppression of mTORC1 signaling." The wording is somewhat confusing, please change such as "Loss of Ezrin was not sufficient to repress TSC by EGF and culminated..." or "Loss of Ezrin blunted EGF-induced TSC suppression and culminated..." 

      We apologize for the lack of clarity and now we have fixed this inaccuracy by better elucidating this aspect in the revised manuscript.

      -  Figure 3D has a typo in the western blot labeling. Please change Citosol to Cytosol. 

      We have now fixed this inaccuracy in the revised text.

      -  Line 291: "Moreover, TSC2 resulted activated and AKT/mTOR signaling..." The wording is confusing. 

      We have now fixed this inaccuracy in the revised text. The text now reads: “Moreover, we found that TSC2 was dephosphorylated in response to light in the retina, when inactive Ezrin (Naso et al., 2020) and EGFR are weakly expressed (Figure supplement 6C) as a consequence of a decrease of the AKT/mTORC1 signaling…”

      -  The model in Figure 8 indicates that upon EGF stimulation, the activated Ezrin interacts with EGFR, causing its dissociation from actin filaments and leading to its endosome incorporation. However, the authors did not provide supporting data for this claim. Can the authors either cite literature or provide data for this? Otherwise, the model should be edited to remove actin filaments in the model. 

      We have now fixed this inaccuracy by removing actin filaments in the revised model.

      Reviewer #2 (Recommendations For The Authors):

      The data and written text seem to deal entirely with mTORC1, rather than mTORC2, thus it seems "mTOR" should be changed to "mTORC1" throughout. 

      We have now fixed this inaccuracy in the revised manuscript.

      For clarification, the TSC protein complex should be referred to as the "TSC complex", whereas "TSC" generally refers to the tumor syndrome Tuberous Sclerosis Complex.

      We have now fixed this inaccuracy in the revised manuscript.

      Quantification of colocalization would be helpful in all the panels where it is currently missing.

      We thank the Reviewer for his/her comment that improves our manuscript. Accordingly, we have added adequate quantification of colocalization for each immunofluorescence in the revised Figures.

      Line 84 typo "thorough" should be "through" 

      We have now fixed this inaccuracy in the revised manuscript.

      Line 178 - typo 

      We have now fixed this inaccuracy in the revised manuscript.

      Line 209 - typo 

      We have now fixed this inaccuracy in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      Fig. 1 The data showing an increase in lysosomal biogenesis suggests an increase in transcriptional activity. This should be confirmed by one or more of the following: 1) Increased TFEB/TFE3 nuclear localization following EZR loss, 2) Increased CLEAR promoter luciferase activity assays, 3) Increased expression of multiple CLEAR transcripts (https://www.science.org/doi/10.1126/science.1174447) or 4) Increased TFEB/ TFE3/ CLEAR gene signatures by RNA seq. Similarly, data showing increased autophagic flux should be confirmed in the presence of chloroquine or bafilomycin. 

      We agree with the Reviewer that this is an important concern that helped us to improve the quality of the data presented. It is well established that nuclear translocation is a major mechanism regulating TFEB activity. We have now carried out new experiments demonstrating that depletion of Ezrin induces TFEB nuclear translocation in Ezrin<sup>-/-</sup> cells. These findings are in line with our previous data in which pharmacological inhibition and silencing of Ezrin induced the same cellular phenotype. We also apologize for having created some confusion: we had already carried out experiments with Bafilomycin to confirm the increase of autophagic flux. Therefore, the blots of autophagic flux were replaced with better-exposed images in revised Figure supplement 1H and the text was modified to emphasize these findings.

      Fig 2D, the lanes with EZR -/- cells expressing the EZR mutants should be repeated on the same gel as the first 2 lanes (with the WT and EZR<sup>-/-</sup> cells) 

      We thank the Reviewer for his/her comment that improves our manuscript. To avoid any confusion in describing the results in Figure 2D, we have now modified Figure 2D, providing the controls required in the response to Reviewers #1 and #2. We hope the new version of our data will address the Reviewer’s concerns.

      Fig 2F- The presence of reduced EGFR in intracellular compartments in Ezrin KO/ -/- cells should be quantified, and shown for a 2nd EZR null cell line as well (Ezrin null MEFs) 

      We added EGFR quantification in Figure 2F. We have also carried out new experiments demonstrating that EGFR is localized on the cytoplasmic membrane in MEF Ezrin KO cells (Figure supplement 2H).

      Fig 2G, did the authors test the effects of EZR depletion on basal and EGF stimulated EGFR autophosphorylation on Y1068 and Y1045 as well as downstream activation of p42/44 ERK MAPK?  Those should be tested in the HeLa system as well as the MEFs cells with EZR KO. 

      Following the Reviewer’s request, we have now added western blot data for EGFR autophosphorylation on Y1068 and p42/44 ERK MAPK in Figure 5C. Moreover, we have now added western blot data for p42/44 ERK MAPK on MEF cells in Figure supplement 2F. In contrast, we cannot provide any data for EGFR autophosphorylation on Y1068, because the antibody was not working on proteins from MEF cells.

      Also, why would HER3 levels be expected to decrease? There seems to be minimal change in HER3 expression. Also, the significance of increased MK2 phosphorylation should be further elaborated. 

      The Reviewer raised justified concerns about HER3 and MK2. We have discussed these aspects in the “Results” section, accordingly.

      Fig 3A- Crosslinking of EGFR is not very apparent in this blot. The crosslinking blots should be repeated 3 times and quantified. 

      We certainly agree with the Reviewer on the importance of this issue and now we have fixed this inaccuracy in the revised manuscript. The blots of crosslinking were changed for better exposed images in revised Figure 3, panel A. Moreover, we also properly quantified signals to support our conclusion.

      Fig 3D- How were membrane endosomes isolated? This should be stated in the methods. Membrane/ Cytosol and Endosome fractionation showing EGFR levels should be shown in Ezrin null MEFs as well, and membrane expression should be further substantiated with surface biotinylation for cell surface EGFR. 

      We now report more information about the method that we used for membrane endosomes isolation in the Materials and Methods section. Following the Reviewer’s request, we also show that EGFR was not localized on endosomes upon EGF on Ezrin null MEFs. This data was reported in the new revised Figure Supplement 2G. Moreover, we have now carried out new experiments demonstrating the membrane localization of EGFR in MEF Ezrin KO cells. These findings are shown in Figure supplement 2H.

      Fig 5C: Similar to 2G, EGFR autophosphorylation on Y1068 and Y1045 should also be measured, as well as downstream activation of p42/44 ERK MAPK? 

      Following the Reviewer’s request, we have now carried out new experiments to assess the EGFR autophosphorylation on Y1068 and Y1045, as well as downstream activation of p42/44 ERK MAPK.  We added these new data in the revised Figure 5C, accordingly. 

      Fig 5D: Similar to 3D, Membrane/ Cytosol and Endosome fractionation showing EGFR levels should be shown in Ezrin null MEFs as well, and further substantiated with surface biotinylation for cell surface EGFR. 

      Following the Reviewer’s request, we show that EGFR was not localized on endosomes upon EGF (Figure Supplement 2G). 

      Supplement 2E: The blots show lower expression of EGFR and higher MAPK activation in EZR KO cells, contradicting the data in the other cells. 

      We apologize for having created some confusion. The error occurred during the preparation of Figure supplement 2E, which showed an image from a previous, non-finalized version of the Figure. We have now removed the error and replaced it with the correct WB panel.

      Supplement 2F: The authors should repeat the NSC668394 experiment using: 1) multiple doses, 2) In both the Ezrin KO and null cell lines 3) and repeat 3X to quantify differences in total EGFR. 

      We respectfully disagree with the Reviewer and feel that addressing this point by additional studies on the dose response of NSC668394, as the Reviewer suggests, is outside the scope of this manuscript. However, we would like to point out that we have already conducted extensive studies on the dose-response effects of NSC668394 administration in vitro (Patent: WO2020070333A1). 

      Moreover, we apologize for not having provided enough information about the number of independent biological replicates for WB analyses. Therefore, to fill this gap, we have expanded the Materials and Methods section accordingly.

      Patent: WO2020070333A1 - Ezrin inhibitors and uses thereof

      Fig 6A: The IP experiments should be repeated with Control IgG 

      We have now fixed this inaccuracy in the revised manuscript.

      Typos: 

      (1) Figure 3D: Citosol 

      We have now fixed this inaccuracy in the revised manuscript.

      (2) Line 216-217: "increased EGFR protein 217 levels on purified membranes and endosomes (Figure 3D and E)" - That should be decreased EGFR on endosomes in accordance with Figure 3D (lower panels) 

      We have now fixed this inaccuracy in the revised manuscript.

      (3) Abstract: "Consistently, Medaka fish deficient for Ezrin exhibit defective endo-lysosomal pathway" 

      We have now fixed this inaccuracy in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Shen et al. conducted three experiments to study the cortical tracking of the natural rhythms involved in biological motion (BM), and whether these involve audiovisual integration (AVI). They presented participants with visual (dot) motion and/or the sound of a walking person. They found that EEG activity tracks the step rhythm, as well as the gait (2-step cycle) rhythm. The gait rhythm specifically is tracked superadditively (power for A+V condition is higher than the sum of the A-only and V-only condition, Experiments 1a/b), which is independent of the specific step frequency (Experiment 1b). Furthermore, audiovisual integration during tracking of gait was specific to BM, as it was absent (that is, the audiovisual congruency effect) when the walking dot motion was vertically inverted (Experiment 2). Finally, the study shows that an individual's autistic traits are negatively correlated with the BM-AVI congruency effect.

      Strengths:

      The three experiments are well designed and the various conditions are well controlled. The rationale of the study is clear, and the manuscript is pleasant to read. The analysis choices are easy to follow, and mostly appropriate.

      Weaknesses:

      On revision, the authors are careful not to overinterpret an analysis where the statistical test is not independent from the data (channel) selection criterion.

      Thanks for the suggestion and we have done this according to your recommendations below.

      Reviewer #1 (Recommendations for the authors):

      Re: the double-dipping concern: I appreciate the revision. Just to clarify: my concern rests with the selection of *electrodes* based on the interaction test for the 1Hz condition. The 2Hz condition analogous test yields no significant electrodes. You perform subsequent tests (t-tests and 3-way interaction) on the data averaged across the electrodes that were significant for the 1Hz condition. Therefore, these tests will be biased to find a pattern reflecting an interaction at 1Hz, while no similar bias exists for an effect at 2Hz. Therefore, there is a bias to observe a 3-way interaction, and simple effects compatible with a 2-way interaction only for 1Hz, not for 2Hz (which is exactly what you found). There is no good statistical alternative here, I appreciate that, but the bias exists nonetheless. I think the wording is improved in this revision, and the evidence is convincing even in light of this bias.

      We are grateful for your thoughtful comments on the analytical methods. We appreciate your concerns regarding the potential bias of examining 3-way interaction based on electrodes yielding a 2-way interaction effect. To address this issue, we have conducted a bias-free analysis based on electrodes across the whole brain. The results showed a similar pattern of 3-way interaction as previously reported (p = 0.051), suggesting that the previous findings might not be caused by electrode selection. Given that the main results of Experiment 2 were not based on whole-brain analysis, we did not involve this analysis in the main text, and we have removed the three-way interaction results based on selected electrodes from the manuscript to reduce potential concerns. It is also noteworthy that, when performing analyses based on channels independent of the interaction effect at 1 Hz (i.e., significant congruency effects in the upright and inverted conditions, respectively, at 2Hz), we got similar results as reported in the main text (i.e., non-significant interaction and correlation at 2 Hz). These results were presented in the supplementary file in previous versions and mentioned in the correlation part of the Results section (see Fig. S2). Once again, we sincerely appreciate your careful review of our research. We hope the abovementioned points adequately address your concern.

      Reviewer #2 (Public review):

      Summary:

      The authors evaluate spectral changes in electroencephalography (EEG) data as a function of the congruency of audio and visual information associated with biological motion (BM) or non-biological motion. The results show supra-additive power gains in the neural response to gait dynamics, with trials in which audio and visual information was presented simultaneously producing higher average amplitude than the combined average power for auditory and visual conditions alone. Further analyses suggest that such supra-additivity is specific to BM and emerges from temporoparietal areas. The authors also find that the BM-specific supra-additivity is negatively correlated with autism traits.

      Strengths:

      The manuscript is well-written, with a concise and clear writing style. The visual presentation is largely clear. The study involves multiple experiments with different participant groups. Each experiment involves specific considered changes to the experimental paradigm that both replicate the previous experiment's finding yet extend it in a relevant manner.

      In the first revisions of the paper, the manuscript better relays the results and anticipates analyses, and this version adequately resolves some concerns I had about analysis details. In a further revision, it is clarified better how the results relate to the various competing hypotheses on how biological motion is processed.

      Weaknesses:

      Still, it is my view that the findings of the study are basic neural correlate results that offer only minimal constraint towards the question of how the brain realizes the integration of multisensory information in the service of biological motion perception, and the data do not address the causal relevance of observed neural effects towards behavior and cognition. The presence of an inversion effect suggests that the supra-additivity is related to cognition, but that leaves open whether any detected neural pattern is actually consequential for multi-sensory integration (i.e., correlation is not causation). In other words, the fact that frequency-specific neural responses to the [audio & visual] condition are stronger than those to [audio] and [visual] combined does not mean this has implications for behavioral performance. While the correlation to autism traits could suggest some relation to behavior and is interesting in its own right, this correlation is a highly indirect way of assessing behavioral relevance. It would be helpful to test the relevance of supra-additive cortical tracking on a behavioral task directly related to the processing of biological motion to justify the claim that inputs are being integrated in the service of behavior. Under either framework, cortical tracking or entrainment, the causal relevance of neural findings toward cognition is lacking.

      Overall, I believe this study finds neural correlates of biological motion that offer some constraint toward mechanism, and it is possible that the effects are behaviorally relevant, but based on the current task and associated analyses this has not been shown (or could not have been, given the paradigm).

      Reviewer #2 (Recommendations for the authors):

      Thank you for your revisions; I have updated the Strengths section, and reworded the weaknesses section. I now concede that the neural effects observed offer some constraint towards what the neural mechanisms for AV integration for BM are, whereas in my previous review, I said too strongly that these results do not offer any information about mechanism.

      Thank you again for your insightful thoughts and comments on our research. They have contributed greatly to enhancing the discussion of the article and provided valuable inspiration for future exploration of causal mechanisms.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the mechanism of axon growth directed by the conserved guidance cue UNC-6/Netrin. Experiments were designed to distinguish between alternative models in which UNC-6/Netrin functions as either a short-range (haptotactic) cue or a diffusible (chemotactic) signal that steers axons to their final destinations. In each case, axonal growth cones execute ventrally directed outgrowth toward a proximal source of UNC-6/Netrin. This work concludes that UNC-6/Netrin functions as both a haptotactic and chemotactic cue to polarize the UNC-40/DCC receptor on the growth cone membrane facing the direction of growth. Ventrally directed axons initially contact a minor longitudinal nerve tract (vSLNC) at which UNC-6/Netrin appears to be concentrated before proceeding in the direction of the ventral nerve cord (VNC) from which UNC-6/Netrin is secreted. Time-lapse imaging revealed that growth cones appear to pause at the vSLNC before actively extending ventrally directed filopodia that eventually contact the VNC. Growth cone contacts with the vSLNC were unstable in unc-6 mutants but were restored by the expression of a membrane-tethered UNC-6 in vSLNC neurons. In addition, the expression of membrane-tethered UNC-6/Netrin in the VNC was not sufficient to rescue initial ventral outgrowth in an unc-6 mutant. Finally, dual expression of membrane-tethered UNC-6/Netrin in both vSLNC and VNC partially rescued the unc-6 mutant axon guidance defect, thus suggesting that diffusible UNC-6 is also required. This work is important because it potentially resolves the controversial question of how UNC-6/Netrin directs axon guidance by proposing a model in which both of the competing mechanisms, e.g., haptotaxis vs chemotaxis, are successively employed. The impact of this work is bolstered by its use of powerful imaging and genetic methods to test models of UNC-6/Netrin function in vivo thereby obviating potential artifacts arising from in vitro analysis.

      Strengths:

      A strength of this approach is the adoption of the model organism C. elegans to exploit its ready accessibility to live cell imaging and powerful methods for genetic analysis.

      Weaknesses:

      A membrane-tethered version of UNC-6/Netrin was constructed to test its haptotactic role, but its neuron-specific expression and membrane localization are not directly determined although this should be technically feasible. Time-lapse imaging is a key strength of multiple experiments but only one movie is provided for readers to review.

      Thank you for your comments. We have now used SNAP labeling to directly visualize the localization of membrane tethered UNC-6 and confirmed UNC-6 is only detectable on the sublateral and ventral nerve cords (Figure S3A). These data have been added to the manuscript on page 15, lines 342-347. We have also provided a representative movie for each imaged genotype (Videos S2-10).

      Reviewer #2 (Public Review):

      Nichols et al studied the role of axon guidance molecules and their receptors and how these work as long-range and/or local cues, using in-vivo time-lapse imaging in C. elegans. They found that the Netrin axon guidance system works in different modes when acting as a long-range (chemotaxis) cue vs local cue (haptotaxis). As an initial context, they take advantage of the postembryonic-born neuron, PDE, to understand how its axon grows and then is guided into its target. They found that this process occurs in various discrete steps, during which the growth cone migrates and pauses at specific structures, such as the vSLNC. The role of the UNC-6/Netrin and UNC-40/DCC axon guidance ligand-receptor pair was then looked at in terms of its requirement for

      (1) initial axon outgrowth direction

      (2) stabilization at the intermediate target

      (3) directional branching from the sublateral region or

      (4) ventral growth from the intermediate target to the VNC.

      They found that each step is disrupted in the unc-6/Netrin and unc-40/DCC mutants and observed how the localization of these proteins changed during the process of axon guidance in wild-type and mutant contexts. These observations were further supported by analysis of a mutant important for the regulation of Netrin signaling, the E3 ubiquitin ligase madd-2/Trim9/Trim67. Remarkably, the authors identified that this mutant affected axonal adhesion and stabilization, but not directional growth. Using membrane-tethered UNC-6 to specific localities, they then found this to be a consequence of the availability of UNC-6 at specific localities within the axon growth path. Altogether, this data and in-vivo analysis provide compelling evidence of the mechanistic foundation of Netrin-mediated axon guidance and how it works step by step.

      The conclusions are well-supported, with both imaging and quantification of each step of axon guidance and localization of UNC-6 and UNC-40. Using a different type of neuron to validate their findings further supports their conclusions and strengthens their model. It's not yet known whether this model holds true for other ligand-receptor pairs, but the current work sets the stage for future analysis of other axon guidance molecules using time-lapse in-vivo imaging. There are still two outstanding questions that are important to address to support the authors' model and conclusions.

      (1) The results of UNC-6-TM expression at different locations are clear and support the conclusions but need to consider that there's no diffusible UNC-6 available. What would happen if UNC-6 is tethered to the membrane in an otherwise completely 'normal' UNC-6 gradient. Does the axon guidance ensue normally or does it get stuck in the respective site of the membrane tethered-UNC-6 and doesn't continue to outgrow properly? This is an important control (expression of the UNC-6-TM at the vSLNC or VNC in the wild type background) that would help clarify this question and gain a better insight into the separability of both axon guidance steps and the ability to manipulate these.

      Thank you for your comments. We expressed UNC-6<sup>TM</sup> at the vSLNC and VNC in wild-type animals and examined the adult morphology of both HSN and PDE in the control conditions you suggested. These data are available in Tables 1 and 2 and show no statistically significant differences compared to wild-type animals. We also provide still images of developing PDE axons near the vSLNC (Figure S3D) to confirm that this axon guidance step is intact when UNC-6<sup>TM</sup> is overexpressed in specific regions. Together, these data suggest that the TM rescue constructs do not interfere with endogenous axon guidance pathways. We have added these results to the manuscript on page 15, lines 347-349.

      (2) Axon guidance systems do not work in a vacuum and are generally competing against each other. For example, the SLT-1/Slit and SAX-3/ROBO axon guidance ligand-receptor pair is also required for PDE, and other post-embryonic neurons, axon guidance. It would be interesting to test mutants for these genes with the membrane tethered-UNC-6 to determine if the different steps of axon guidance are disrupted and if so, to what degree these are disrupted.

      Thank you for this suggestion. We have performed time-lapse imaging on slt-1 mutants and unc-6; slt-1 double mutants. These data are available in a new figure, Figure 3. Indeed, we found that slt-1 mutants showed abnormal direction of axon emergence and abnormal stabilization at the VNC but normal stabilization at the vSLNC and normal axonal branching (Fig. 3). These data can be found in the manuscript on pages 11-12, lines 248-269.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript from Nichols, Lee, and Shen tackles an important question of how unc6/netrin promotes axon guidance: i.e. haptotaxis vs chemotaxis. This has recently been a large topic of investigation and discussion in the axon guidance field. Using live cell imaging of unc6/netrin and unc40/DCC in several neurons that extend axons ventrally during development, as well as TM localized mutants of Unc6, they suggest that unc6 promotes first haptotaxis of the emerging growth cone followed by chemotaxis of the growth cone. This is timely, as a recent preprint from the Lundquist group, using a similar strategy to make only a TM anchored unc6 similarly found that this could rescue only the haptotaxis-like growth of the PDE neuron, but not the second phase of growth. However, their conclusions were quite different based on the overexpression of unc6 everywhere rescuing the second phase, and thus they conclude that a gradient is not present.

      Strengths:

      As this has been quite a controversy in both the invertebrate and vertebrate field, one strength of this paper is that they use an unc6-neon green to demonstrate unc6 localization, and show a gradient of localization.

      Weaknesses:

      This is important, although it could be strengthened by first showing a more zoomed-out image of unc6 in the animal, and second demonstrating the localization of the transmembrane anchored unc6 mutants, to help define what may be the "diffusible Unc6".

      Thank you for your comments. We have performed both of these experiments. In Figure 6A, we provide a zoomed out image of PDE growth cone interacting with UNC-6::mNG prior to reaching the vSLNC. Notably, we do not observe an obvious gradient that extends into this more dorsal region of the animal. We have also shown the membrane localization of UNC-6<sup>TM</sup> through SNAP labeling in Figure S3A. These data have been added to the manuscript on page 15, lines 342-347.

      I have two additional experimental or analysis suggestions. First, the authors should clarify the phenotype of ventral emergence of the growth cone. The manuscript images suggest that, no matter the mutant, there is ventral emergence of the growth cone followed by later defects, yet the authors claim ventral emergence defects with the UNC6 tethered mutants, and there is no comparison of rose plots. This is confusing and needs to be addressed.

      Thank you for your comment. We have now included images (i.e. slt-1(eh15) and unc-6(ev400); slt-1(eh15) genotypes in Figure 3) and movies showing misoriented axon emergence. We have also provided an additional quantification that allows for statistical comparison of emergence angle across genotypes. This quantification takes the sine function of the angle to quantify the relative emergence trajectory across the dorsal-ventral axis. A value of 1 indicates 90° dorsal emergence, and -1 indicates 90° ventral emergence. Statistical comparisons across genotypes demonstrate that axons in both unc-6 and slt-1 mutants are misoriented relative to wild-type axons. These comparisons can be found in Figures S1B, 3C, S2B, S3C.
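
      As a rough illustration of the sine-based quantification described in this response, the Python sketch below maps an emergence angle to a dorsal-ventral score. The angle convention (0° along the anterior-posterior axis, positive angles toward dorsal) and the function name `emergence_score` are assumptions made for illustration; they are not taken from the manuscript.

```python
import math


def emergence_score(angle_deg: float) -> float:
    """Return the sine of an axon emergence angle given in degrees.

    Assumed convention (hypothetical, for illustration only):
    0 degrees lies along the anterior-posterior axis, positive angles
    point dorsally and negative angles point ventrally, so that
    +90 -> 1 (dorsal emergence) and -90 -> -1 (ventral emergence).
    """
    return math.sin(math.radians(angle_deg))


# Made-up example angles, not measurements from the study.
example_angles = [90.0, -90.0, 0.0, -45.0]
example_scores = [emergence_score(a) for a in example_angles]
```

      Collapsing each angle to a single value between -1 and 1 discards the anterior-posterior component, which is what permits a simple statistical comparison along the dorsal-ventral axis across genotypes.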

      Second, I have concerns that the analysis of unc40 polarization may be misleading in some cases when there appears to indeed be accumulation in the growth cone, but since the only analysis shown is relative to the rest of the cell, that can be lost.

      Thank you for sharing your concerns about the UNC-40 polarization quantifications. We have separately compared the value of the integrated density of UNC-40::GFP in each cellular domain (vSLNC-contacting area and the dorsal soma) between genotypes. While we did not include these comparisons in the original manuscript, we have now included them in the revised manuscript. Overall, these data support our conclusions that UNC-40 mispolarization occurs across the entire cell (Fig. S1F,G; S2E-H; S3E,F).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      Comment 1: Within the scope of the current work there are no major weaknesses. That said, the authors themselves note pressing questions beyond the scope of this study that remain unanswered. For instance, the mechanistic nature of the interactions between FMO-4 and the other players in this story, for example in terms of direct protein-protein interactions, is not at all understood yet.

      We thank the reviewer for the positive review, and fully agree and acknowledge that there are unanswered questions for future studies that are beyond the scope of this manuscript.

      Reviewer 2:

      Comment 1: The effects of carbachol and EDTA on intracellular calcium levels are inferred, especially in the tissues where fmo-4 is acting. Validating that these agents and fmo-4 itself have an impact on calcium in relevant subcellular compartments is important to support conclusions on how fmo-4 regulates and responds to calcium.

      We thank the reviewer for this important suggestion. We agree that carbachol and EDTA can be broad agents and validating that they are altering calcium levels is very useful. While this is technically challenging, we attempted to address this by using neuronally expressed GCaMP7f calcium indicator worms and measuring their GFP fluorescence upon exposure to carbachol and EDTA. Assessing both short term and long term exposure to these agents, we were able to show that carbachol increases GFP fluorescence, indicating an increase in calcium levels, and EDTA decreases GFP fluorescence, indicating a decrease in calcium levels. Unfortunately, because FMO-4 is not neuronally expressed, we were not able to test the effects of FMO-4 on calcium in this strain, which would require hypodermal expression and possibly short-term modification of fmo-4 expression to test. We have made sure to temper our language about the indirect measures we used.

      Comment 2: Experiments are generally reliant on RNAi. While in most cases experiments reveal positive results, indicating RNAi efficacy, key conclusions could be strengthened with the incorporation of mutants.

      We appreciate and value this suggestion and agree that mutants could be helpful to strengthen our conclusions. We address this caveat in the discussion of the revised manuscript. We explain that we were concerned about knocking out key calcium regulating genes like itr-1 and mcu-1 that either already result in some level of sickness in the worms when knocked down (itr-1) or could lead to confounding metabolic changes if knocked out. We do find that our RNAi lifespan results are robust and reproducible, but we also understand and recognize the caveats that come with using RNAi knockdown instead of full deletion mutants.

      Reviewer 3:

      Comment 1: no obvious transcriptomic evidence supporting a link between fmo-4 and calcium signaling: either for knockout worms or fmo-4 overexpressing strains.

      We thank the reviewer for this feedback. While there is some transcriptomic evidence, we agree that it is not overwhelming evidence. We do think that this evidence, combined with the phenotype observed under thapsigargin (i.e., significant reduction in worm size and significant delay or prevention of development), in addition to the genetic connections to calcium regulation, provide additional compelling evidence that FMO-4 interacts with calcium signaling.

      Comment 2: no direct measures of alterations in calcium flux, signalling or binding that strongly support a connection with fmo-4.

      As described in reviewer 2 comment 1, we have successfully used GCaMP7f worms to assess calcium flux upon exposure to carbachol and EDTA. This approach confirmed the changes in calcium expected from these compounds. Unfortunately, because FMO-4 is not neuronally expressed, we were not able to test the effects of FMO-4 on calcium in this strain, which would require hypodermal expression and possibly short-term modification of fmo-4 expression to test. We have made sure to temper our language about the indirect measures we used.

      Comment 3: no measures of mitochondrial morphology or activity that strongly support a connection with fmo-4.

      This is a great point, and something we are currently working on to include for a future manuscript. 

      Comment 4: lack of a complete model that places fmo-4 function downstream of DR and mTOR signalling (first Results section), fmo-2 (second Results section) and at the same time explains connection with calcium signalling.

      We thank the reviewer for this helpful feedback. We have included a more complete working model in our revision.

      Recommendations for the authors:

      Reviewer 1:

      Comment 1: "We utilized fmo-4 (ok294) knockout (KO) animals on five conditions reported to extend lifespan in C. elegans." Here I believe "fmo-4 (ok294)" should be "fmo-4(ok294)". (No space).

      We thank the reviewer for this helpful revision. We have made this change as suggested.

      Comment 2: "Wild-type (WT) worms on DR experience a ~35% lifespan extension compared to fed WT worms, but when fmo-4 is knocked out this extension is reduced to ~10% and this interaction is significant by cox regression (p-value < 4.50e-6)." Here "cox regression" should be "Cox regression".

      We have made this change as suggested.

      Comment 3: "Having established this role, we continued lifespan analyses of fmo-4 KO worms exposed to RNAi knockdown of the S6-kinase gene rsks-1 (mTOR signaling), the von hippel lindau gene vhl-1 (hypoxic signaling), the insulin receptor daf-2 (insulin-like signaling), and the cytochrome c reductase gene cyc-1 (mitochondrial electron transport chain, cytochrome c reductase) (Fig 1C-F)." Here "von hippel lindau" should be "Von Hippel-Lindau".

      We have made this change as suggested.

      Comment 4: In three instances in the caption of Figure 5, the "4" in fmo-4 is not italicized when it should be.

      We have made this change as suggested.

      Comment 5: In two instances in the caption of Figure 7, the "4" in fmo-4 is not italicized when it should be, and in one instance in the caption of Figure 7, the "6" in atf-6 is not italicized when it should be.

      We have made this change as suggested.

      Comment 6: "Supplemental Data 3 provides the results of the Log-rank test and Cox regression analysis, which were run in Rstudio." Here Rstudio should be RStudio.

      We have made this change as suggested.

      Comment 7: In the references, within article titles italicization (e.g. of Caenorhabditis elegans) is frequently missing. While this is often an artifact introduced by reference management software, it should be corrected in the final manuscript.

      We thank the reviewer for all the helpful revision suggestions. We have made sure all the references are properly italicized where necessary.

      Reviewer 2:

      Comment 1: While FMO-4 is clearly placed in the ER calcium pathway genetically, the molecular mechanism by which FMO-4 would alter ER calcium is unclear. Notably, Tuckowski et al. highlight this gap in the discussion as well.

      We thank the reviewer for identifying this important caveat. We hope to address the molecular mechanism by which FMO-4 alters ER calcium in upcoming projects.

      Comment 2: Determining whether overexpression of catalytically dead FMO-4 or introduction of an inactivating point mutant into the endogenous locus phenocopy FMO-4 OE and KO animals would help distinguish between mechanisms involving protein-protein interactions or downstream metabolic regulation.

      We thank the reviewer for this valuable suggestion. This is an experiment we are hoping to do in the near future to better understand molecular mechanisms and protein-protein interactions.

      Reviewer 3:

      Comment 1: When measuring the effect of thapsigargin on development of fmo-4 mutants it would be great to use a developmental assay rather than quantifying normalized worm area. Also please add scale bars to Figure 3G and 4H, it seems that fmo-4 overexpression decreases worm size even in control conditions, clarify if this is the case.

      We thank the reviewer for this feedback. In addition to quantifying normalized worm area in Figure 3G-I, we have added a developmental assay (Figure 3J) that shows the development time of wild-type worms on DMSO or thapsigargin as well as the fmo-4 OE worms on DMSO or thapsigargin. These data validate that the fmo-4 OE worm development is either delayed significantly or even prevented when the worms are treated with thapsigargin.

      We have added scale bars to Figure 3G and 4H as suggested.

      We also appreciate the reviewer’s observation of the fmo-4 overexpression worms appearing smaller than wild-type worms in control conditions. We looked through the replicates and found that just one replicate showed a significant decrease in worm size, as observed in our unrevised manuscript. We repeated this experiment twice more to gather more data and determined that the fmo-4 overexpression worms were ultimately not significantly different in size compared to wild-type worms. We have included the new images and quantifications in Figure 3G-I and Figure 4H-J in the revised manuscript.

      Comment 2: correct or replace Supplementary Table 2, which is not showing a DAVID analysis as the title and text would suggest. We should see biological/molecular processes, effect sizes, p-values, ...

      We thank the reviewer for identifying this issue. We have added more detail to the Supplementary Table 2 so that it is clearer what is being shown in each tab.

      Comment 3: clarify the data presented in Supplementary Data 2 because it does not clearly explain what is shown

      This is a great point, and we have added more detail to the Supplementary Data 2 to make sure the data are more clearly explained in each tab.

      Comment 4: in Figure 5B the fluorescent images do not seem to reflect the quantification in panel 5C.

      Thank you for this feedback. We re-analyzed our data to make sure the proper fluorescent images are included with their matching quantifications in Figure 5B-C.

      Comment 5: where is Supplementary Data 3?

      We thank the reviewer for noticing this. Supplementary Data 3 was accidentally missing from the first submission, and has now been added.

      Comment 6: conceptually the last results section (regarding atf-6) does not add much to the story, I would consider removing these results

      We appreciate this feedback. We have decided to keep Figure 7 because we think it helps to validate fmo-4’s role in calcium movement from the ER. While we show genetic interactions between fmo-4 and key genes involved in calcium regulation (crt-1, itr-1, and mcu-1), we think that showing how fmo-4 also interacts with atf-6, a known regulator of calcium homeostasis, strengthens and supports the genetic mechanisms of fmo-4 proposed in this manuscript.

      Comment 7: the model proposed in Figure 7E is not convincingly supported by the results: the arrows connecting atf-6, fmo-4 and crt-1 (calreticulin) suggest that fmo-4 is downstream of atf-6 and upstream of crt-1: Berkowitz 2020 showed that atf-6 knockdown downregulates calreticulin, so unless the authors show that this downregulation is mediated directly by fmo-4, the more likely explanation is that atf-6 knockdown affects calcium levels which in turn induces fmo-4 expression.

      We thank the reviewer for this helpful feedback. We have addressed this by updating our proposed model. We used a solid arrow leading from the reduction of atf-6 to induction of fmo-4, as this is supported by our data in Figure 7A-B. We then used dashed arrows between fmo-4 and crt-1 as well as between atf-6 and crt-1 to indicate that more data is needed to clarify this part of the pathway.

      Comment 8: Avoid pointing at a mitochondrial connection in the title as the only evidence supporting this interaction comes from the mcu-1 RNAi epistasis.

      We appreciate the reviewer’s suggestion. We added another piece of evidence suggesting an interaction between fmo-4 and the mitochondria to Supplementary Figure 7G-H. Here we show that while fmo-4 OE worms are resistant to paraquat stress, knocking down vdac-1 (a calcium regulator located in the outer mitochondrial membrane) abrogates this effect. We have kept mitochondria in our title but have made sure to temper our language in the main text to avoid pointing to a strong mitochondrial connection, since we have only two pieces of evidence connecting fmo-4 to the mitochondria.

    1. Author response:

      Reviewer #1 (Public review):  

      Hüppe and colleagues had already developed an apparatus and an analytical approach to capture swimming activity rhythms in krill. In a previous manuscript they explained the system, and here they employ it to show a circadian clock, supplemented by exogenous light, produces an activity pattern consistent with "twilight" diel vertical migration (DVM; a peak at sunset, a midnight sink, and a peak in the latter half of the night). 

      They used light:dark (LD) followed by dark:dark (DD) photoperiods at two times of the year to confirm the circadian clock, coupled with DD experiments at four times of year to show rhythmicity occurs throughout the year along with DVM in the wild population. The individual activity data show variability in the rhythmic response, which is expected. However, their results showed rhythmicity was sustained in DD throughout the year, although the amplitude decayed quickly. The interpretation of a weak clock is reasonable, and they provide a convincing justification for the adaptive nature of such a clock in a species that has a wide distributional range and experiences various photic environments. These data also show that exogenous light increases the activity response and can explain the morning activity bouts, with the circadian clock explaining the evening and late-night bouts. This acknowledgement that vertical migration can be driven by multiple proximate mechanisms is important. 

      The work is rigorously done, and the interpretations are sound. I see no major weaknesses in the manuscript. Because a considerable amount of processing is required to extract and interpret the rhythmic signals (see Methods and previous AMAZE paper), it is informative to have the individual activity plots of krill as a gut check on the group data. 

      The manuscript will be useful to the field as it provides an elegant example of looking for biological rhythms in a marine planktonic organism and disentangling the exogenous response from the endogenous one. Furthermore, as high latitude environments change, understanding how important organisms like krill have the potential to respond will become increasingly important. This work provides a solid behavioral dataset to complement the earlier molecular data suggestive of a circadian clock in this species. 

      We appreciate the positive evaluation of our work by Reviewer 1, acknowledging our approach to record locomotor activity in krill as well as the importance of the findings in assessing krill’s potential to respond to environmental change in their habitat.  

      Reviewer #2 (Public review):  

      Summary: 

      This manuscript provides experimental evidence on circadian behavioural cycles in Antarctic krill. The krill were obtained directly from krill fishing vessels and the experiments were carried out on board using an advanced incubation device capable of recording activity levels over a number of days. A number of different experiments were carried out where krill were first exposed to simulated light:dark (L:D) regimes for some days followed by continuous darkness (DD). These were carried out on krill collected during late autumn and late summer. A further set of experiments was performed on krill across three different seasons (summer, autumn, winter), where incubations were all DD conditions. Activity was measured as the frequency by which an infrared beam close to the top of the incubation tube was broken over unit time. Results showed that patterns of increased and decreased activity that appeared synchronised to the LD cycle persisted during the DD period. This was interpreted as evidence of the operation of an internal (endogenous) clock. The amplitude of the behavioural cycles decreased with time in DD, which further suggests that this clock is relatively weak. The authors argued that the existence of a weak endogenous clock is an adaptation to life at high latitudes since allowing the clock to be modulated by external (exogenous) factors is an advantage when there is a high degree of seasonality. This hypothesis is further supported by seasonal DD experiments which showed that the periodicity of high and low activity levels differed between seasons. 

      Strengths 

      Although there have been many field observations of various circadian-type behaviours in Antarctic krill, relatively few experimental studies have been published considering this behaviour in terms of circadian patterns of activity. Krill are not a model organism and obtaining them and incubating them in suitable conditions are both difficult undertakings. Furthermore, there is a need to consider what their natural circadian rhythms are without the overinfluence of laboratory-induced artefacts. For this reason alone, the setup of the present study is ideal to consider this aspect of krill biology.

      Furthermore, the equipment developed for measuring levels of activity is well-designed and likely to minimise artefacts. 

      We would like to thank Reviewer 2 for their positive assessment of our approach to study the influence of the circadian clock on krill behavior. We are delighted that Reviewer 2 found our mechanistic approach to understanding daily behavioral patterns of Antarctic krill using the AMAZE set-up convincing, and that the challenging circumstances of working with a polar, non-model species are acknowledged.

      Weaknesses 

      I have little criticism of the rationale for carrying out this work, nor of the experimental design. Nevertheless, the manuscript would benefit from a clearer explanation of the experimental design, particularly aimed at readers not familiar with research into circadian rhythms. Furthermore, I have a more fundamental question about the relationship between levels of activity and DVM on which I will expand below. Finally, it was unclear how the observational results made here related to the molecular aspects considered in the Discussion. 

      (1) Explanation of experimental design - I acknowledge that the format of this particular journal insists that the Results are the first section that follows the Introduction. This nevertheless presents a problem for the reader since many of the concepts and terms that would generally be in the Methods are yet to be explained to the reader. Hence, right from the start of the Results section, the reader is thrown into the detail of what happened during the LD-DD experiments without being fully aware of why this type of experiment was carried out in the first place. Even after reading the Methods, further explanation would have been helpful. Circadian cycle type research of this sort often entrains organisms to certain light cycles and then takes the light away to see if the cycle continues in complete darkness, but this critical piece of knowledge does not come until much later (e.g. lines 369-372), leaving the reader guessing until this point why the authors took the approach they did. I would suggest the following: (1) that more effort is made in the Introduction to explain the exact LD/DD protocols adopted; (2) that a schematic figure is placed early on in the manuscript where the protocol is explained, including some logical flow charts of e.g. if behavioural cycle continues in DD then internal clock exists versus if cycle does not continue in DD, the exogenous cues dominate - followed by - major decrease in cyclic amplitude = weak clock versus minor decrease = strong clock and so on.

      We would like to thank Reviewer 2 for pointing out that the experimental design and the rationale behind it do not become clear early in the manuscript, especially for people outside the field of chronobiology. We think that the suggestion to include a schematic figure early in the manuscript is excellent and we plan to implement this in a revised version of the manuscript.

      (2) Activity vs kinesis - in this study, we are shown data that (i) krill have a circadian cycle - incubation experiments; (ii) that krill swarms display DVM in this region - echosounder data (although see my later point). My question here is regarding the relationship between what is being measured by the incubation experiments and the in situ swarm behaviour observations. The incubation experiments are essentially measuring the propensity of krill to swim upwards since they log the number of times an individual (or group) breaks a beam towards the top of the incubation tube. I argue that krill may be still highly active in the rest of the tube but just do not swim close to the surface, so this approach may not be a good measure of "activity". Otherwise, I suggest a more correct term for what is being measured is the level of "upward kinesis". As the authors themselves note, krill are negatively buoyant and must always be active to remain pelagic. What changes over the day-night cycle is whether they decide to expend that activity on swimming upwards, downwards or remaining at the same depth. Explaining the pattern as upward kinesis then also explains why swarms move upwards during the night. Just being more active at night may not necessarily result in them swimming upwards.

      We believe that there is a slight misunderstanding about how what we call “activity” is measured. The experimental columns are equipped with five detector modules, evenly distributed over the height of the column. In our analysis we count all beam breaks that are caused by upward movement, i.e. every time a detector module is triggered after a detector module at a lower position has been triggered, and not only when the top detector module is triggered. In this way, we record upward swimming movements throughout the column, and not only when the krill swims all the way to the top of the column. This still means that what we are measuring is swimming activity, caused by upward swimming. We use this measure to deliberately separate increased swimming activity from baseline activity (i.e. swimming which solely compensates for negative buoyancy) and inactivity (i.e. passive sinking).

      A higher activity is thus at first interpreted as an increase in swimming activity, which in the field may result in upward-directed swimming but could also mean a horizontal increase in activity, for example representing increased foraging and feeding activity. This would explain the daily activity pattern observed under LD cycles (Fig. 2), which shows a general increase in activity during the dark phase. This nighttime increase could be used both for upward-directed migration during sunset and for horizontally directed swimming for feeding and foraging throughout the night.

      We will formulate the description of the activity metric more clearly in the revised version of the manuscript.
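
      The counting rule described in this response can be sketched as follows. This is only an illustrative reading, not the authors' analysis code: the function name, the numbering of detector modules from 1 (bottom) to 5 (top), and the choice to compare each trigger with the immediately preceding one are all assumptions made for the example.

```python
from typing import Iterable, Optional


def count_upward_beam_breaks(triggers: Iterable[int]) -> int:
    """Count triggers that directly follow a trigger at a lower detector.

    `triggers` is the time-ordered sequence of detector positions.
    Assumptions for illustration only: detectors are numbered
    1 (bottom) to 5 (top), and "after a lower detector" means the
    immediately preceding trigger was at a lower position.
    """
    count = 0
    previous: Optional[int] = None
    for position in triggers:
        # An upward beam break: this detector sits above the last one hit.
        if previous is not None and position > previous:
            count += 1
        previous = position
    return count


# Hypothetical trigger sequence for one krill in one time bin.
example = [1, 2, 3, 2, 1, 1, 2, 5, 4]
upward = count_upward_beam_breaks(example)
print(upward)
```

      Under this reading, downward steps (passive sinking) and repeated triggers of the same module add nothing to the count, which matches the stated aim of separating increased swimming activity from baseline activity and inactivity.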

      (3) Molecular relevance - Although I am interested in molecular clock aspects behind these circadian rhythms, it was not made clear how the results of the present study allow any further insight into this. In lines 282 to 284, the findings of the study by Biscontin et al (2017) are discussed with regard to how TIM protein is degraded by light via the clock photoreceptor CRYPTOCHROME 1. This element of the Discussion would be a lot more relevant if the results of the present study were considered in terms of whether they supported or refuted this or any other molecular clock model. As it stands, this paragraph is purely background knowledge and a candidate for deletion in the interest of shortening the Discussion.

      We agree that this part is not directly related to the data presented in the manuscript and will therefore omit this part in the revised version of the manuscript to keep the discussion concise and focused on the results. 

      Other aspects 

      (i) 'Bimodal swimming' was used in the Abstract and later in the text without the term being fully explained. I could interpret it to mean a number of things so some explanation is required before the term is introduced. 

      We thank the Reviewer for pointing this out and will provide an explanation for the term “bimodal swimming” in a revised version of the manuscript. 

      (ii) Midnight sinking - I was struck by Figure 2b with regards to the dip in activity after the initial ascent, as well as the rise in activity predawn. Cushing (1951) Biol Rev 26: 158-192 describes the different phases of a DVM common to a number of marine organisms observed in situ where there is a period of midnight sinking following the initial dusk ascent and a dawn rise prior to dawn descent. Tarling et al (2002) observe midnight sinking pattern in Calanus finmarchicus and consider whether it is a response to feeding satiation or predation avoidance (i.e. exogenous factors). Evidence from the present study indicates that midnight sinking (and potential dawn rise) behaviour could alternatively be under endogenous control to a greater or lesser degree. This is something that should certainly be mentioned in the Discussion, possibly in place of the molecular discussion element mentioned above - possibly adding to the paragraph Lines 303-319. 

      We would like to thank the Reviewer for pointing this out and agree that it would be interesting to add the idea of an endogenous control of midnight sinking to the discussion. We plan to implement this in a revised version of the manuscript. 

      (iii) Lines 200-207 - I struggled to follow this argument regarding Piccolin et al identifying a 12 h rhythm whereas the present study indicates a ~24 h rhythm. Is one contradicting the other - please make this clear. 

      In our study we found that the circadian clock drives a bimodal pattern of swimming activity in krill, meaning it controls two bouts of activity in a 24 h cycle. Piccolin et al. (2020) identified a swimming activity pattern of ~12 h (i.e. two peaks in 24 h) at the group level, which is in line with our findings at the individual level. We will revisit the mentioned section for more clarity in a revised version.   
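
      The equivalence between "two bouts of activity per 24 h" and a "~12 h rhythm" can be illustrated with a small numerical sketch. Everything here is synthetic and assumed for the example (hourly sampling, four days of data, two equal Gaussian-shaped bouts centred at hours 6 and 18); it is not the analysis used in either study.

```python
import math


def power_at_period(signal, period_h, dt_h=1.0):
    """Fourier power of a mean-centred signal at a single period (hours)."""
    n = len(signal)
    mean = sum(signal) / n
    omega = 2 * math.pi * dt_h / period_h
    re = sum((x - mean) * math.cos(omega * i) for i, x in enumerate(signal))
    im = sum((x - mean) * math.sin(omega * i) for i, x in enumerate(signal))
    return (re * re + im * im) / n


def circular_distance(t, centre, cycle=24.0):
    """Shortest distance between two times of day on a 24 h clock."""
    d = abs(t - centre) % cycle
    return min(d, cycle - d)


def synthetic_activity(t, centres=(6.0, 18.0), width_h=2.0):
    """Hypothetical bimodal activity: one Gaussian bout per centre, each day."""
    return sum(
        math.exp(-0.5 * (circular_distance(t, c) / width_h) ** 2)
        for c in centres
    )


# Four days of hourly samples (assumed sampling scheme).
activity = [synthetic_activity(t) for t in range(96)]
p12 = power_at_period(activity, 12.0)
p24 = power_at_period(activity, 24.0)
print(p12 > p24)
```

      With two equal, evenly spaced bouts the power concentrates at 12 h even though the pattern repeats on a 24 h cycle; if the two bouts differed in height, a 24 h component would appear alongside the 12 h one.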

      (iv) Although I agree that the hydroacoustic data should be included and is generally supportive of the results, I think that two further aspects should be made clear for context (a) whether there was any groundtruthing that the acoustic marks were indeed krill and not potentially some other group known to perform DVM such as myctophids (b) how representative were these patterns - I have a sense that they were heavily selected to show only ones with prominent DVM as opposed to other parts of the dataset where such a pattern was less clear - I am aware of a lot of krill research where DVM is not such a clear pattern and it is disingenuous to provide these patterns as the definitive way in which krill behaves. I ask this be made clear to the reader (note also that there is a suggestion of midnight sinking in Fig 5b on 28/2).

      To clarify the mentioned points concerning the hydroacoustic data:

      a) As mentioned in the Methods section, only hydroacoustic data during active fishing was included in the analysis. E. superba occurs in large monospecific aggregations and the fishery is actively targeting E. superba and monitoring their catch and the proportion of non-target species continuously with cameras. Krill fishery bycatch rates are very low (0.1–0.3%, Krafft et al. 2018), and fishing operations would stop if non-target species were being caught in significant proportions at any time. Therefore, and supported by our own observations when we conducted the experiments, we argue that it is a valid assumption that the backscattering signal shown in Figure 5 is predominantly caused by E. superba. 

      b) We are aware of the fact that DVM patterns of Antarctic krill are highly variable and that normal DVM patterns do not need to be the rule (e.g. see our cited study on the plasticity of krill DVM by Bahlburg et al. 2023). The visualized data were not selected for their DVM pattern but represent the period directly preceding the sampling for behavioral experiments in four different seasons (namely S1-S4), including the day of sampling. These periods were chosen to assess the DVM behavior of krill swarms in the field in the days before and during the sampling for behavioral experiments. 

      We will include these aspects in the Methods section in a revised version of the manuscript in order to improve understanding.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors' research group had previously demonstrated the release of large multivesicular body-like structures by human colorectal cancer cells. This manuscript expands on their findings, revealing that this phenomenon is not exclusive to colorectal cancer cells but is also observed in various other cell types, including different cultured cell lines, as well as cells in the mouse kidney and liver. Furthermore, the authors argue that these large multivesicular body-like structures originate from intracellular amphisomes, which they term "amphiectosomes." These amphiectosomes release their intraluminal vesicles (ILVs) through a "torn-bag mechanism." Finally, the authors demonstrate that the ILVs of amphiectosomes are either LC3B positive or CD63 positive. This distinction implies that the ILVs either originate from amphisomes or multivesicular bodies, respectively.

      Strengths:

      The manuscript reports a potential origin of extracellular vesicle (EV) biogenesis. The reported observations are intriguing.

      Weaknesses:

      It is essential to note that the manuscript has issues with experimental designs and lacks consistency in the presented data. Here is a list of the major concerns:

      (1) The authors culture the cells in the presence of fetal bovine serum (FBS) in the culture medium. Given that FBS contains a substantial amount of EVs, this raises a significant issue, as it becomes challenging to differentiate between EVs derived from FBS and those released by the cells. This concern extends to all transmission electron microscopy (TEM) images (Figure 1, 2P-S, S5, Figure 4 P-U) and the quantification of EV numbers in Figure 3. The authors need to use an FBS-free cell culture medium.

      Although FBS indeed contains bovine EVs, the presence of the very large multivesicular EVs (amphiectosomes) that our manuscript focuses on has never been observed or reported in FBS. For reported size distributions of EVs in FBS, please find a few relevant references below:

      PMID: 29410778, PMID: 33532042, PMID: 30940830 and PMID: 37298194

      All the above publications show that the number of lEVs > 350-500 nm is negligible in FBS. The average diameter of MV-lEVs (amphiectosomes) described in our manuscript is around 1.00-1.50 micrometer.

      Reviewer #1: These papers evaluated the effectiveness of various methods to eliminate EVs from FBS, emphasizing the challenges associated with the presence of EVs in FBS. They also caution against using FBS in EV studies due to these issues. However, I did not find a clear indication regarding the size distributions of EVs in FBS in these papers.

      Please provide accurate reference supporting the claim that 'lEVs > 350-500 nm are negligible in FBS.' The papers cited by the authors do not address this specific point.

      In the revised manuscript, we addressed the point that, owing to sterile filtering, FBS cannot contain large (>0.22 µm) EVs.

      Our response to Reviewer #1 point 2. When we demonstrated the TEM of isolated EVs, we consistently used serum-free conditioned medium (Fig2 P-S, Fig2S5 J, O) as described previously (Németh et al 2021, PMID: 34665280).

      Reviewer #1: This is an important point that is not mentioned in the original main text, figure legend or method. Please address.

      We agree and we apologize for it. We added this information to the revised manuscript.

      Our response to Reviewer #1 point 3. Our TEM images show cells captured in the process of budding and scission of large multivesicular EVs excluding the possibility that these structures could have originated from FBS.

      Reviewer #1: These images may also depict the engulfment of EVs in FBS. Hence, it is crucial to utilize EV-free or EV-depleted FBS.

      As we mentioned earlier, we added the information to the revised manuscript that sterile filtering of the FBS presumably removed EVs >0.22 µm.

      Our response to Reviewer #1 point 4. In addition, in our confocal analysis, we studied Palm-GFP positive, cell-line derived MV-lEVs. Importantly, in these experiments, FBS-derived EVs are non-fluorescent, therefore, the distinction between GFP positive MV-lEVs and FBS-derived EVs was evident.

      Reviewer #1: I agree that these fluorescent-labeled assays conclusively indicate that the MV-lEVs are originating from the cells. However, the images of concern are the non-fluorescent-labeled images (Figure 1, 2P-S, S5, Figure 4 P-U and Figure 3). The MV-lEVs may derive from both the cells and FBS.

      Please see above our response to points 1-3.

      Our response to Reviewer #1 point 5. In addition, culturing cells in FBS-free medium (serum starvation) significantly affects autophagy. Given that in our study, we focused on autophagy related amphiectosome secretion, we intentionally chose to use FBS supplemented medium.

      Reviewer #1: If this is a concern, the authors should use EV-depleted FBS.

      As we discussed above, sterile filtration of FBS removes particles >0.22 µm. In addition, based on our preliminary experiments, EV-depleted serum may affect cell physiology.

      Our response to Reviewer #1 point 6. Even though the authors of this manuscript are not familiar with the technological details of how FBS is processed before commercialization, it is reasonable to assume that the samples are subjected to sterile filtration (through a 0.22 micron filter) after which MV-lEVs cannot be present in the commercial FBS samples.

      Reviewer #1: This is a fair comment that needs to be included in the manuscript.

      As you suggested, this comment is now included in the revised manuscript.

      (2) The data presented in Figure 2 is not convincingly supportive of the authors' conclusion. The authors argue that "...CD81 was present in the plasma membrane-derived limiting membrane (Figures 2B, D, F), while CD63 was only found inside the MV-lEVs (Fig. 2A, C, E)." However, in Figure 2G, there is an observable CD63 signal in the limiting membrane (overlapping with the green signals), and in Figure 2J, CD81 also exhibits overlap with MV-lEVs.

      Both CD63 and CD81 are tetraspanins known to be present both in the membrane of sEVs and in the plasma membrane of cells (for references, please see Uniprot subcellular location maps: https://www.uniprot.org/uniprotkb/P08962/entry#subcellular_location https://www.uniprot.org/uniprotkb/P60033/entry#subcellular_location). However, according to the feedback of the reviewer, for clarity, we will delete the implicated sentence from the text.

      Reviewer #1: Please also justify the statement questioned in (3) as these arguments are interconnected.

      We hope you find our above responses to your comment acceptable.

      (3) Following up on the previous concern, the authors argue that CD81 and CD63 are exclusively located on the limiting membrane and MV-lEVs, respectively (Figure 2-A-M). However, in lines 104-106, the authors conclude that "The simultaneous presence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs..." This statement indicates that CD63 and CD81 co-localize to the MV-lEVs. The authors need to address this apparent discrepancy and provide an explanation.

      There must be a misunderstanding because we did not claim or imply in the text that “CD81 and CD63 are exclusively located on the limiting membrane and MV-lEVs”. Here we studied co-localization of the above proteins in the case of intraluminal vesicles (ILVs). In Fig 2. we did not show any analysis of limiting membrane co-localization.

      Reviewer #1: I have indicated that this statement is found in lines 104-106, where the authors argue, 'The simultaneous presence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs...' If the authors acknowledge the inaccuracy of this statement, please provide a justification for this argument.

      For clarity, we modified the description of data shown in Fig2 in the revised manuscript.

      (4) The specificity of the antibodies used in Figure 2 should be validated through knockout or knockdown experiments. Several of the antibodies used in this figure detect multiple bands on western blots, raising doubts about their specificity. Verification through additional experimental approaches is essential to ensure the reliability and accuracy of all the immunostaining data in this manuscript.

      We will consider this suggestion during the revision of the manuscript.

      Reviewer #1: Please do so.

      We carefully considered the suggestion, but we realized that it was not feasible for us to perform gene silencing for all of the antibodies we used before resubmission of our revised manuscript. However, we repeated the Western blot for mouse anti-CD81 (Invitrogen MAA5-13548) and replaced the previous Western blot with it in the revised manuscript (Fig.2-S4H).

      (5) In Figures 2P-R, the morphology of the MV-lEVs does not resemble those shown in Figures 1-A, H, and D, indicating a notable inconsistency in the data.

      EM images in Figure2 P-R show sEVs separated from serum-free conditioned media as opposed to MV-lEVs, which were captured in situ in fixed tissue cultures (Fig1). Therefore, the two EV populations necessarily have different size and structure. Furthermore, Fig. 1 shows images of ultrathin sections while in Figure 2P-R, we used a negative-positive contrasting of intact sEVs without embedding and sectioning.

      (6) There are no loading controls provided for any of the western blot data.

      Not even the latest MISEV 2023 guidelines give recommendations for proper loading control for separated EVs in Western blot (MISEV 2023, DOI: 10.1002/jev2.12404, PMID: 38326288). Here we applied our previously developed method (PMID: 37103858), which in our opinion, is the most reliable approach to be used for sEV Western blotting. For whole cell lysates, we used actin as loading control (Fig3-S2B).

      Reviewer #1: The blots referenced here (Fig2-S3; Fig2-S4B; Fig3-S2B) were conducted using total cell lysates, not EV extracts. Only one blot in Fig3-S2B includes an actin control. All remaining blots should incorporate actin controls for consistency.

      Fig2-S3 (corresponding to Fig2-S4 in the revised manuscript) only shows reactivity of the used antibodies. This Western blot is not intended to serve as a basis of any quantitative conclusions. Fig2-S4 (corresponding to Fig2-S5 in the revised manuscript) includes the actin control. Fig3-S2B shows the complete membrane, which was cut into 4 pieces, and the immune reactivity of different antibodies was tested. The actin band was included on the anti-LC3B blot. For clarity, we rephrased the figure legend.

      Additionally, for Figures 2-S4B, the authors should run the samples from lanes i-iii in a single gel.

      Please note that in Figure 2- S4B, we did run a single gel, and the blot was cut into 4 pieces, which were tested by anti-GFP, anti-RFP, anti-LC3A and anti-LC3B antibodies. Full Western blots are shown in Fig.3_S2 B, and lanes “1”, “2” and “3” correspond to “i”, “ii” and “iii” in Fig.2-S4, respectively.

      Reviewer #1: In the original Figure 2- S4B, the blots were sectioned into 12 pieces. If lanes "i," "ii," and "iii" were run on the same blot, the authors are advised to eliminate the grids between these lanes.

      Grids separating the lanes have been eliminated on Fig.2_S4 (now Fig.2_S5 in the revised manuscript).

      (7) In Figure 2-S4, is there co-localization observed between LC3RFP (LC3A?) with other MV-lEV markers? How about LC3B? Does LC3B co-localize with other MV-lEV markers?

      In Supplementary Figure 2-S4, we showed successful generation of HEK293T-PalmGFP-LC3RFP cell line. In this case we tested the cells, and not the released MV-lEVs. LC3A co-localized with the RFP signal as expected.

      Reviewer #1: Does LC3RFP colocalize with MV-lEV markers in HEK293T-PalmGFP-LC3RFP cell line? This experiment aims to clarify the conclusion made in lines 104-106, where the authors assert that 'The concurrent existence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs...'

      In the case of PalmGFP-LC3RFP cells, LC3-RFP is overexpressed. Simultaneous assessment of this overexpressed protein with non-overexpressed molecules detected by fluorescent antibodies proved to be challenging because of spectral overlaps and inappropriate signal-to-noise ratios. Furthermore, in association with EVs, the number of antibody-detected molecules is substantially lower than in cells. Therefore, even though we tried, we could not successfully perform these experiments.

      (8) The TEM images presented in Figure 2-S5, specifically F, G, H, and I, do not closely resemble the images in Figure 2-S5 K, L, M, N, and O. Despite this dissimilarity, the authors argue that these images depict the same structures. The authors should provide an explanation for this observed discrepancy to ensure clarity and consistency in the interpretation of the presented data.

      As indicated in Material and Methods, Fig 2-S5 F, G, H and I are conventional TEM images fixed by 4% glutaraldehyde 1% OsO<sub>4</sub> 2h and embedded into Epon resin with a post contrasting of 3.75% uranyl acetate 10 min and 12 min lead citrate. Samples processed this way have very high structure preservation and better image quality, however, they are not suitable for immune detection. In contrast, Fig.2.-S5 K,L,M,N shows immunogold labelling of in situ fixed samples. In this case we used milder fixation (4% PFA, 0.1% glutaraldehyde, postfixed by 0.5% OsO<sub>4</sub> 30 min) and LR-White hydrophilic resin embedding. This special resin enables immunogold TEM analysis. The sections were exposed to H<sub>2</sub>O<sub>2</sub> and NaBH<sub>4</sub> to render the epitopes accessible in the resin. Because of the different applied techniques, the preservation of the structure is not the same. In the case of Fig.2 J, O, separated sEVs were visualised by negative-positive contrast and immunogold labelling as described previously (PMID: 37103858).

      Reviewer #1: Please include this justification in the revised version.

      We included this justification in the revised manuscript.

      (9) For Figures 3C and 3-S1, the authors should include the images used for EV quantification. Considering the concern regarding potential contamination introduced by FBS (concern 1), it is advisable for the authors to employ an independent method to identify EVs, thereby confirming the reliability of the data presented in these figures.

      In our revised manuscript, we will provide all the images used for EV quantification in Figure 3C. Given that Figures 3C and 3-S1 show MV-lEVs released by HEK293T-PalmGFP cells, the possible interference by FBS-derived non-fluorescent EVs can be excluded.

      Reviewer #1: Please provide all the images.

      Original LASX files are provided (DOI: 10.6019/S-BIAD1456).

      Reviewer #1: The images raising concerns regarding the contamination of EVs in FBS primarily consist of transmission electron microscopy (TEM) images, namely, Figure 1, 2P-S, S5, and Figure 4 P-U, along with the quantification of EV numbers in Figure 3. These concerns persist despite the use of fluorescent-labeled experiments. While fluorescent-labeled MV-lEVs are conclusively identified as originating from the cells, the MV-lEVs observed in Figure 1, 2P-S, S5, and Figure 4 P-U and Figure 3 may derive from both the cells and FBS.

      Large EVs (with diameter >800 nm) derived from FBS were not present in our experiments, as discussed above.

      (10) Do the amphiectosomes released from other cell types as well as cells in mouse kidneys or liver contain LC3B positive and CD63 positive ILVs?

      Based on our confocal microscopic analysis, in addition to the HEK293T-PalmGFP cells, HT29 and HepG2 cells also release similar LC3B and CD63 positive MV-lEVs. Preliminary evidence shows MV-lEV secretion by additional cell types.

      The response of Reviewer #1: Please show these data in the revised manuscript. Moreover, do cells in mouse kidneys or liver contain LC3B positive and CD63 positive ILVs?

      We have added new confocal microscopic images to Fig2-S3 showing amphiectosomes released also by the H9c2 (ATCC) cardiomyoblast cell line. To preserve the ultrastructure of MV-lEVs in complex organs like kidney and liver, fixation with 4% glutaraldehyde with 1% OsO4 appears to be essential. This fixation does not allow for immune detection to assess LC3B and CD63 positive MV-lEVs in the ultrathin sections.

      Reviewer #2 (Public Review):

      Summary:

      The authors had previously identified that a colorectal cancer cell line generates small extracellular vesicles (sEVs) via a mechanism where a larger intracellular compartment containing these sEVs is secreted from the surface of the cell and then tears to release its contents. Previous studies have suggested that intraluminal vesicles (ILVs) inside endosomal multivesicular bodies and amphisomes can be secreted by the fusion of the compartment with the plasma membrane. The 'torn bag mechanism' considered in this manuscript is distinctly different because it involves initial budding off of a plasma membrane-enclosed compartment (called the amphiectosome in this manuscript, or MV-lEV). The authors successfully set out to investigate whether this mechanism is common to many cell types and to determine some of the subcellular processes involved.

      The strengths of the study are:

      (1) The high-quality imaging approaches used seem to show good examples of the proposed mechanism.

      (2) They screen several cell lines for these structures, also search for similar structures in vivo, and show the tearing process by real-time imaging.

      (3) Regarding the intracellular mechanisms of ILV production, the authors also try to demonstrate the different stages of amphiectosome production and differently labelled ILVs using immuno-EM.

      Several of these techniques are technically challenging to do well, and so these are critical strengths of the manuscript.

      The weaknesses are:

      (1) Most of the analysis is undertaken with cell lines. In fact, all of the analysis involving the assessment of specific proteins associated with amphiectosomes and ILVs are performed in vitro, so it is unclear whether these processes are really mirrored in vivo. The images shown in vivo only demonstrate putative amphiectosomes in the circulation, which is perhaps surprising if they normally have a short half-life and would need to pass through an endothelium to reach the vessel lumen unless they were secreted by the endothelial cells themselves.

      Our previous results analyzing PFA-fixed, paraffin embedded sections of colorectal cancer patients provided direct evidence that MV-lEV secretion also occurs in humans in vivo (PMID: 31007874). Regarding your comment on the presence of amphiectosomes in the circulation despite their short half-lives, we would like to point out that Fig1.X shows a circulating lymphocyte which releases MV-lEV within the vessel lumen. Furthermore, in the revised manuscript, an additional Fig.1-S1 is provided. Here, we show the release of MV-lEVs both by an endothelial and a sub-endothelial cell (Fig.1-S1G). In addition, these images show the simultaneous presence of MV-lEVs and sEVs in the circulation (Fig.1-S1.A,C,D,H and I). The transmission electron micrographs of mouse kidney and liver sections provide additional evidence that the MV-lEVs are released by different types of cells, and the “torn bag release” also takes place in vivo (Fig.1.V).

      (2) The analysis of the intracellular formation of compartments involved in the secretion process (Figure 2-S5) relies on immuno-EM, which is generally less convincing than high-/super-resolution fluorescence microscopy because the immuno-labelling is inevitably very sporadic and patchy. High-quality EM is challenging for many labs (and seems to be done very well here), but high-/super-resolution fluorescence microscopy techniques are more commonly employed, and the study already shows that these techniques should be applicable to studying the intracellular trafficking processes.

      As you suggested, in the revised manuscript, we present additional super-resolution microscopy (STED) data. The intracellular formation of amphisomes, the fragmentation of LC3B-positive membranes and the formation of LC3B-positive ILVs were captured (Fig. 3B-F).

      (3) One aspect of the mechanism, which needs some consideration, is what happens to the amphisome membrane, once it has budded off inside the amphiectosome. In the fluorescence images, it seems to be disrupted, but presumably, this must happen after separation from the cell to avoid the release of ILVs inside the cell. There is an additional part of Figure 1 (Figure 1Y onwards), which does not seem to be discussed in the text (and should be), that alludes to amphiectosomes often having a double membrane.

      We agree with your comment regarding the amphisome membrane and we added a sentence to the Discussion of the revised manuscript. Fig1Y onwards is now discussed in the manuscript. In addition, we labelled the surface of living HEK293 cells with wheat germ agglutinin (WGA), which binds to sialic acid and N-acetyl-D-glucosamine. After removing the unbound WGA by washes, the cells were cultured for an additional 3 hours, and the release of amphiectosomes was studied. The budding amphiectosome had a WGA positive membrane, providing evidence that the external limiting membrane had a plasma membrane origin (Fig.3G).

      (4) The real-time analysis of the amphiectosome tearing mechanism seemed relatively slow to me (over three minutes), and if this has been observed multiple times, it would be helpful to know if this is typical or whether there is considerable variation.

      Thank you for this comment. In the revised manuscript, we highlight that the first released LC3 positive ILV was detected within as little as 40 sec.

      Overall, I think the authors have been successful in identifying amphiectosomes secreted from multiple cell lines and demonstrating that the ILVs inside them have at least two origins (autophagosome membrane and late endosomal multivesicular body) based on the markers that they carry. The analysis of intracellular compartments producing these structures is rather less convincing and it remains unclear what cells release these structures in vivo.

      I think there could be a significant impact on the EV field and consequently on our understanding of cell-cell signalling based on these findings. It will flag the importance of investigating the release of amphiectosomes in other studies, and although the authors do not discuss it, the molecular mechanisms involved in this type of 'ectosomal-style' release will be different from multivesicular compartment fusion to the plasma membrane and should be possible to be manipulated independently. Any experiments that demonstrate this would greatly strengthen the manuscript.

      We appreciate these comments of the reviewer. Experiments are on their way to elucidate the mechanism of the “ectosomal style” exosome release and will be the topic of our next publication.

      In general, the EV field has struggled to link up analysis of the subcellular biology of sEV secretion and the biochemical/physical analysis of the sEVs themselves, so from that perspective, the manuscript provides a novel angle on this problem.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors describe a novel mode of release of small extracellular vesicles. These small EVs are released via the rupture of the membrane of so-called amphiectosomes that resemble "morphologically" Multivesicular Bodies.

      These structures have been initially described by the authors as released by colorectal cancer cells (https://doi.org/10.1080/20013078.2019.1596668). In this manuscript, they provide experiments that allow us to generalize this process to other cells. In brief, amphiectosomes are likely released by ectocytosis of amphisomes that are formed by the fusion of multivesicular endosomes with autophagosomes. The authors propose that their model puts forward the hypothesis that LC3 positive vesicles are formed by "curling" of the autophagosomal membrane which then gives rise to an organelle where both CD63 and LC3 positive small EVs co-exist and would be released then by a budding mechanism at the cell surface that appears similar to the budding of microvesicles /ectosomes. Very correctly the authors make the distinction from migrasomes because these structures appear very similar in morphology.

      Strengths:

      The findings are interesting despite that it is unclear what would be the functional relevance of such a process and even how it could be induced. It points to a novel mode of release of extracellular vesicles.

      Weaknesses:

      This reviewer has comments and concerns concerning the interpretation of the data and the proposed model. In addition, in my opinion, some of the results in particular micrographs and immunoblots (even shown as supplementary data) are not of quality to support the conclusions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Highlight MV-IEV, ILV and limiting membrane in Figure-1G, N, and U.

      Based on the suggestion, we revised Figure1

      (2) Figure 1-Y-AF are not mentioned in the text.

      In the revised manuscript, we discuss Figure 1Y-AF

      (3) The term "IEVs" in Figure 2-S2 is not defined.

      We modified the figure legend: we changed MV-lEV to amphiectosome

      (4) Need to quantify co-localization in Figure 2-S2.

      As suggested, we carried out the co-localisation analysis (Fig2-S2I), and Fig2-S2 was re-edited

      Reviewer #2 (Recommendations For The Authors):

      I have two recommendations for improving the manuscript through additional experiments:

      (1) I think the description of the intracellular processes taking place in order to form amphiectosomes would be much stronger if some super-resolution imaging could be undertaken. This should label the different compartments before and after fusion with specific markers that highlight the protein signature of the different limiting and ILV membranes much more clearly than immuno-EM. It will also help in characterising the double-membrane structure of amphiectosomes at the point of budding and reveal whether the patchy labelling of the inner membrane emerges after amphiectosome release (the schematic model currently suggests that it happens before).

      Thank you for your suggestion. STED microscopy was applied and results are shown in new Fig3 and the schematic model was modified accordingly.

      (2) The implications of the manuscript would be more wide-ranging if the authors could test genetic manipulations that are believed to block exosome or ectosome release, eg. Rab27a or Arrdc1 knockdown. This may allow them to determine whether MV-lEVs can be released independently of the classical exosome release mechanism because they use a different route to be released from the plasma membrane. This experiment is not essential, but I think it would start to address the core regulatory mechanisms involved, and if successful, would easily allow the authors to determine the ratio of CD63-positive sEVs being secreted via classical versus amphiectosome routes.

      The suggestion is very valuable for us and these studies are being performed in a separate project.

      I think there are several other ways in which the manuscript could be improved to better explain some of the approaches, findings and interpretation:

      (1) Include some explanation in the text of certain key tools, particularly:

      a. Palm-GFP and whether its expression might alter the properties of the plasma membrane since this is used in a lot of experiments and is the only marker that seems to uniformly label the outer membrane of amphiectosomes. One concern might be that its expression drives amphiectosome secretion.

      We found evidence for amphiectosome release also in the case of several different cells not expressing Palm-GFP. We believe this excludes the possibility that Palm-GFP expression is the inducer of amphiectosome release. By both fluorescence and electron microscopy, cells not expressing Palm-GFP showed very similar MV-lEVs. In addition, we made similar observations in non-transduced HEK293 cells labelled with fluorescent WGA.

      b. Lactadherin - does this label the amphiectosomes after their release or does the wash-off step mean that it only labels cells, which subsequently release amphiectosomes?

      Lactadherin labels the amphiectosomes after their release and fixation. Living cells cannot be labelled by lactadherin as PS is absent in the external plasma membrane layer of living cells. We used WGA on HEK293 cells to further support the plasma membrane origin of the external membrane of amphiectosomes.

      (2) Explain the EM and confocal imaging approaches more clearly. Most importantly, is a 3D reconstruction always involved to confirm that 'separated' amphiectosomes are not joined to cells in another Z-plane.

      Thank you for your suggestion. We have modified the manuscript accordingly

      (3) Presenting triple-labelled images with red, green and yellow channels does not allow individual labelling to be determined without single-channel images and even then, it is much more informative to use three distinguishable colours that make a different colour with overlap, eg. CMY? Fig.2_S2D and E do not display individual channels, so definitely need to be changed.

      For Fig.2_S2D, we now show the individual channels; the earlier E image has been removed. For the STED images, CMY colors were used, as you suggested.

      (4) Please discuss in the text the data in Figure 1Y onwards concerning single/double membranes on MV-lEVs.

      In the revised manuscript, we discuss the question on single/double membranes and we refer to Figure 1Y-AF

      (5) On line 162, reword 'intraluminal TSPAN4 only' to 'one in which TSPAN4 is only intraluminal' to make it clear that other proteins are also marking the intraluminal region, not TSPAN4 only.

      We modified the text accordingly.

      (6) Points for further discussion and further conclusions:

      a. In vivo experiments - discuss the limitations of this part of the analysis - it seems that none of the amphiectosome markers have been analysed in this part of the study and the MV-lEVs are only in the circulation.

      b. Can the authors give any further indication of the levels of MV-lEVs relative to free sEVs from any of their studies?

      Using our current approach, it is not possible to determine the levels of MV-lEVs relative to free sEVs. Without analyzing serial ultrathin sections, determination of the relative ratio of MV-lEVs and sEVs would depend on the actual section plane. In future projects, we will determine the ratio of LC3 positive and negative sEVs by single EV analysis techniques (such as SP-IRIS). In the revised manuscript, additional TEM images are included to provide evidence for the simultaneous presence of sEVs and MV-lEVs, and for the presence of MV-lEVs both inside and outside of the circulation.

      c. Please discuss the single versus double membrane issue (relating to experiments proposed above).

      We discuss this question in more detail in the revised manuscript.

      d. Please point out that the release mechanism (plasma membrane budding) will involve different molecular mechanisms to establish exosome release, and this might provide a route to determine relative importance.

      We are currently running a systematic analysis of the release mechanism of amphiectosomes, and this will be the topic of a separate manuscript.

      Reviewer #3 (Recommendations For The Authors):

      * The model is not supported.

      * The data is not of quality.

      * The appropriate methods are not exploited.

      We are sorry, but we cannot respond to these unsupported critiques.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This important study showing that sleep deprivation increases functional synapses while depleting silent synapses supports previous findings that excitatory signaling increases during wakefulness. This manuscript focuses in particular on AMPA/NMDA ratios. An interesting, although speculative, aspect of the manuscript is the inclusion of a model for the accumulation of sleep need that is based upon the MEF2C transcription factor but also links to the sleep-regulating SIK3-HDAC4/5 pathway. The authors have clarified some questions raised in the previous review, but the evidence for major claims was still found to be incomplete, requiring additional experimentation.

      The major claims of this study are: 1) SD increases the AMPA/NMDA receptor ratio and RS restores it; 2) SD decreases silent synapses compared to CS and RS restores their number after SD; 3) the majority of SD-induced DEGs are found in ExIT cells (glutamate pyramidal neurons projecting within the telencephalon); 4) ExIT SD-induced DEGs are enriched for genes encoding synaptic shaping components and for autism spectrum disorder risk; and 5) these DEGs are also enriched for DEGs induced by Mef2c loss of function restricted to forebrain glutamate neurons (ExIT cells comprise a subset of these) and by over-expression of constitutively nuclear HDAC4 that represses MEF2C transcriptional function. The last claim is consistent with an intracellular signaling model (presented as a hypothesis to be tested, in figure 4B).

      [The above is added to the start of the discussion section.]

      The specific claims are supported by solid evidence provided in this manuscript. The statistical support is now more clearly presented, with several changes in response to queries by reviewer 1.

      The technical issues raised by reviewer 1 do not detract from the claims, which are thus supported. The rationale for this assessment is expanded below in response to reviewer 1.

      Summary:

      This manuscript by Vogt et al examines how the synaptic composition of AMPA and NMDA receptors changes over sleep and wake states. The authors perform whole-cell patch clamp recordings to quantify changes in silent synapse number across conditions of spontaneous sleep, sleep deprivation, and recovery sleep after deprivation. They also perform single nucleus RNAseq to identify transcriptional changes related to AMPA/NMDA receptor composition following spontaneous sleep and sleep deprivation. The findings of this study are consistent with a decrease in silent synapse number during wakefulness and an increase during sleep. However, these changes cannot be conclusively linked to sleep/wake states. Measurements were performed in motor cortex, and sleep deprivation was achieved by forced locomotion, raising the possibility that recent patterns of neuronal activity, rather than sleep/wake states, are responsible for the observed results.

      Strengths:

      This study examines an important question. Glutamatergic synaptic transmission has been a focus of studies in the sleep field, but AMPA receptor function has been the primary target of these studies. Silent synapses, which contain NMDA receptors but lack AMPA receptors, have important functional consequences for the brain. Exploring the role of sleep in regulating silent synapse number is important to understanding the role of sleep in brain function. The electrophysiological approach of measuring the failure rate ratio, supported by AMPA/NMDA ratio measurements, is a rigorous tool to evaluate silent synapse number.

      The authors also perform snRNAseq to identify genes differentially expressed in the spontaneous sleep and sleep deprivation groups. This analysis reveals an intriguing pattern of upregulated genes controlled by HDAC4 and Mef2c, along with synaptic shaping component genes and genes associated with autism spectrum disorder, across cell types in the sleep deprivation group. This unbiased approach identifies candidate genes for follow-up studies. The finding that ASD-risk genes are differentially expressed during SD also raises the intriguing possibility that normal sleep function is disrupted in ASD.

      Weaknesses:

      A major consideration to the interpretation of this study is the use of forced locomotion for sleep deprivation. Measurements are made from motor cortex, and therefore the effects observed could be due to differences in motor activity patterns across groups, rather than lack of sleep per se.

      Experimentally induced lack of sleep always involves differences in motor activity. As previously noted in revision 1, motor learning is unlikely to occur in this paradigm and inspection of the video (in supplementary materials) shows no repetitive motor behavioral sequences during the sleep deprivation, nor can this be considered exercise due to the very slow speed of treadmill movement employed. The obvious major difference between groups is a lack of sleep per se. (See below in the “Recommendations for authors”, reviewer 1 for comments on localized wake activity inducing localized sleep-need responses)

      Considering that other groups have failed to find a difference in AMPA/NMDA ratio in mice with different spontaneous sleep/wake histories (Bridi et al., Neuron 2020), confirmation of these findings in a different brain region would greatly strengthen the study.

      The study of Bridi et al., Neuron 2020, is not comparable to our study for several important reasons. First, their compared groups were from different circadian phases (180 degrees out of phase), whereas in our study, the circadian times for each group were matched (ZT = 6 hours). Second, experimentally induced sleep loss did not occur in their study, whereas it was a focus of ours. Third, spontaneous sleep/wake cannot be accurately matched amongst subjects, whereas in our study, sleep loss was matched exactly between groups.

      We agree that assessment of AMPA/NMDA ratio and silent synapse number in sleep deprived compared to ad libitum sleep in other areas of the neocortex is of great interest and something we hope to pursue. It would not be surprising to find differences as preliminarily reported by Bahl, et al., Nat Commun. 2024 Jan 26;15(1):779. However, such data would not further strengthen our already well supported evidence for the differences we report in the motor cortex.

      The electrophysiological measurements and statistical analyses raise several questions. Input resistance (cutoffs and actual values) are not provided, making it difficult to assess recording quality.

      As stated in our first reply, these data were omitted (an admitted oversight on our part) but are now supplied in the methods section as, “Series resistance values for the recording pipette ranged between 8 and 15 MOhm and experiments with changes larger than 25% were not used for further analyses”. We have now also added the Rs/Rm (as a separate column) for each recorded neuron in table 1.

      Parametric one-way ANOVAs were used, although the data do not appear to be normally distributed.

      We have now removed all the one-way ANOVA tests for clarity (non-parametric tests were previously supplied in addition to the one-way ANOVA tests). Determination of significance with the Kruskal-Wallis non-parametric test has not altered statistical support for our conclusions.

      Reviewer 1 correctly points out that we had not tested for normality of our distributions. The distributions are likely to be normal, but the sample size is too small to confidently make this call for the ratio data, which is why we removed the one-way ANOVAs entirely from table 1.
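
      For readers less familiar with the Kruskal-Wallis test referred to above, the following is a minimal sketch of how its H statistic can be computed for three groups (such as CS, SD and RS). It is not the analysis code used in the study: the function names are hypothetical, the example values are invented for illustration, and the p-value relies on the chi-square approximation with 2 degrees of freedom, which is only rough for samples as small as those discussed here.

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of squared rank totals, each divided by its group size.
    scaled, offset = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        scaled += rank_sum ** 2 / len(group)
        offset += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * scaled - 3.0 * (n_total + 1)

    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over sets of tied values.
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in tie_counts.values()) / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_three_groups(h):
    """Chi-square approximation with 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2.0)


# Invented ratios for illustration only; these are NOT data from the study.
cs = [1.1, 1.3, 1.2]
sd = [2.4, 2.9, 2.6]
rs = [1.0, 1.4, 1.25]
h_stat = kruskal_wallis_h([cs, sd, rs])
print(h_stat, p_value_three_groups(h_stat))
```

      Because the test uses only ranks, it makes no assumption about the shape of the underlying distributions, which is why it suits small samples whose normality cannot be established.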

      Two-way ANOVAs are used to assess AMPA and NMDA EPSC amplitudes and failure rates (table 1, tabs 2&5) across sleep conditions. As now indicated (table 1, tabs 2&5), the distributions of AMPA and NMDA amplitudes and FRs passed the D'Agostino & Pearson test for normality, and QQ plots illustrate this claim.

      In addition, for the AMPA/NMDA and FRR measurements (Figures 1E, F), the SD group (rather than the control sleep group) was used as the control group for post-hoc comparisons, but it is unclear why.

      The label of “control group” is arbitrary. CS and RS groups are similar (sleep density for RS>CS as expected). Since this appears to be confusing, we now compare all groups to one another in table 1 with the same statistical outcome (additional comparison of CS to RS).

      While the data appear in line with the authors' conclusions, the number of mice (3/group) and cells recorded is low, and adding more would better account for inter-animal variability and increase the robustness of the findings.

      Of course, the larger the sample, the better the approximation to the population. Our sample sizes yielded significant differences at the usual p<=0.05 threshold with non-parametric testing. A larger sample size could allow for normality testing of the distributions of the data, but fortunately, this was not necessary to support our conclusions.

      The snRNAseq data are intriguing. However, several genes relevant to the AMPA/NMDA ratio are mentioned, but the encoded proteins would be expected to have variable effects on AMPA/NMDA receptor trafficking and function, making the model presented in Figure 4C oversimplified. A more thorough discussion of the candidate genes and pathways that are upregulated during sleep deprivation, the spatiotemporal/posttranslational control of protein expression, and their effects on AMPA/NMDA trafficking vs function is warranted.

      We have not studied the candidate genes at this point and do not yet understand their potential role(s) in sleep-related AMPA/NMDA functional ratio, only that their expression levels are altered with sleep condition. We agree with the reviewer that the data are intriguing and in need of further investigation. An important first step that can help direct such studies is the identification and preliminary characterization of good candidate genes with respect to their cell type specificity, significance and fold change, as we have done. Their potential roles likely depend on “the spatiotemporal/posttranslational control” and other factors as reviewer 1 notes.

      Reviewer #2 (Public review):

      Here Vogt et al., provide new insights into the need for sleep and the molecular and physiological response to sleep loss. The authors expand on their previously published work (Bjorness et al., 2020) and draw from recent advances in the field to propose a neuron-centric molecular model for the accumulation and resolution of sleep need and basis of restorative sleep function. While speculative, the proposed model successfully links important observations in the field and provides a framework to stimulate further research and advances on the molecular basis of sleep function. In my review, I highlight the important advances of this current work, the clear merits of the proposed model, and indicate areas of the model that can serve to stimulate further investigation.

      Strengths:

      Reviewer comment on new data in Vogt et al., 2024

      Using classic slice electrophysiology, the authors conclude that wakefulness (sleep deprivation (SD)) drives a potentiation of excitatory glutamate synapses, mediated in large part by "un-silencing" of NMDAR-active synapses to AMPAR-active synapses. Using a modern single nuclear RNAseq approach the authors conclude that SD drives changes in gene expression primarily occurring in glutamatergic neurons. The two experiments combined highlight the accumulation and resolution of sleep need centered on the strength of excitatory synapses onto excitatory neurons. This view is entirely consistent with a large body of extant and emerging literature and provides important direction for future research.

      Consistent with prior work, wakefulness/SD drives an LTP-type potentiation of excitatory synaptic strength on principal cortical neurons. It has been proposed that LTP associated with wake, leads to the accumulation of sleep need by increasing neuronal excitability, and by the "saturation" of LTP capacity. This saturation subsequently impairs the capacity for further ongoing learning. This new data provides a satisfying mechanism of this saturation phenomenon by introducing the concept of silent synapses. The new data show that in mice well rested, a substantial number of synapses are "silent", containing an NMDAR component but not AMPARs. Silent synapses provide a type of reservoir for learning in that activity can drive the un-silencing, increasing the number of functional synapses. SD depletes this reservoir of silent synapses to essentially zero, explaining how SD can exhaust learning capacity. Recovery sleep led to restoration of silent synapses, explaining how recovery sleep can renew learning capacity. In their prior work (Bjorness et al., 2020) this group showed that SD drives an increase in mEPSC frequency onto these same cortical neurons, but without a clear change in pre-synaptic release probability, implying a change in the number of functional synapses. This prediction is now borne out in this new dataset.

      The new snRNAseq dataset indicates the sleep need is primarily seen (at the transcriptional level) in excitatory neurons, consistent with a number of other studies. First, this conclusion is corroborated by an independent, contemporary snRNAseq analysis recently available as a pre-print (Ford et al., 2023 BioRxiv https://doi.org/10.1101/2023.11.28.569011). A recently published analysis on the effects of SD in Drosophila imaged synapses in every brain region in a cell-type dependent manner (Weiss et al., PNAS 2024), concluding that SD drives brain wide increases in synaptic strength almost exclusively in excitatory neurons. Further, Kim et al., Nature 2022, heavily cited in this work, show that the newly described SIK3-HDAC4/5 pathway promotes sleep depth via excitatory neurons and not inhibitory neurons.

      The new experiments provided in Fig1-3 are expertly conducted and presented. This reviewer has no comments of concern regarding the execution and conclusions of these experiments.

      Reviewer comment on model in Vogt et al., 2024

      To the view of this reviewer the new model proposed by Vogt et al., is an important contribution. The model is not definitively supported by new data, and in this regard should be viewed as a perspective, providing mechanistic links between recent molecular advances, while still leaving areas that need to be addressed in future work. New snRNAseq analysis indicates SD drives expression of synaptic shaping components (SSCs) consistent with the excitatory synapse as a major target for the restorative basis of sleep function. SD induced gene expression is also enriched for autism spectrum disorder (ASD) risk genes. As pointed out by the authors, sleep problems are commonly reported in ASD, but the emphasis has been on sleep amount. This new analysis highlights the need to understand the impact on sleep's functional output (synapses) to fully understand the role of sleep problems in ASD.

      Importantly, SD induced gene expression in excitatory neurons overlap with genes regulated by the transcription factor MEF2C and HDAC4/5 (Fig. 4). In their prior work, the authors show loss of MEF2C in excitatory neurons abolished the SD transcriptional response and the functional recovery of synapses from SD by recovery sleep. Recent advances identified HDAC4/5 as major regulators of sleep depth and duration (in excitatory neurons) downstream of the recently identified sleep promoting kinase SIK3. In Zhou et al., and Kim et al., Nature 2022, both groups propose a model whereby "sleep-need" signals from the synapse activate SIK3, which phosphorylates HDAC4/5, driving cytoplasmic targeting, allowing for the de-repression and transcriptional activation of "sleep genes". Prior work shows that HDAC4/5 are repressors of MEF2C. Therefore, the "sleep genes" derepressed by HDAC4/5 may be the same genes activated in response to SD by MEF2C. The new model thereby extends the signaling of sleep need at synapses (through SIK3-HDAC4/5) to the functional output of synaptic recovery by expression of synaptic/sleep genes by MEF2C. The model thereby links aspects of expression of sleep need with the resolution of sleep need by mediating sleep function: synapse renormalization.

      Weaknesses:

      Areas for further investigation.

      In the discussion section Vogt et al., explore the links between excitatory synapse strength, arguably the major target of "sleep function", and NREM slow-wave activity (SWA), the most established marker of sleep need. SIK3-HDAC4/5 have major effects on the "depth" of sleep by regulating NREM-SWA. The effects of MEF2C loss of function on NREM SWA activity are less obvious, but clearly impact the recovery of glutamatergic synapses from SD. The authors point out how adenosine signaling is well established as a mediator of SWA, but the links with adenosine and glutamatergic strength are far from clear. The mechanistic links between SIK3/HDAC4/5, adenosine signaling, and MEF2C, are far from understood. Therefore, the molecular/mechanistic links between a synaptic basis of sleep need and resolution with NREM-SWA activity require further investigation.

      Additional work is also needed to understand the mechanistic links between SIK3-HDAC4/5 signaling and MEF2C activity. The authors point out that constitutively nuclear (cn) HDAC4/5 (acting as a repressor) will mimic MEF2C loss of function. This is reasonable, however, there are notable differences in the reported phenotypes of each. Notably, cnHDAC4/5 suppresses NREM amount and NREM SWA but had no effect on the NREM-SWA increase following SD (Zhou et al., Nature 2022).

      We speculate that the effect of cnHDAC4/5 to reduce NREM-SWA together with the reduction of NREM amount may be due to a localized increase in neuronal excitability of arousal centers, which would be expected to mask NREM-SWA. Rebound NREM-SWA may reflect the relative rebound increase of NREM-SWA still present under chronic masking conditions (induced by cnHDAC4/5) of increased arousal system excitability. A similar effect to overcome NREM-SWA masking was reported in a Kcna2 KO mouse (a Shaker homologue) by Douglas, et al. (2007, BMC Biol).

      Loss of MEF2C in CaMKII neurons had no effect on NREM amount and suppressed the increase in NREM-SWA following SD (Bjorness et al., 2020). These instances indicate that cnHDAC4/5 and loss of MEF2C do not exactly match suggesting additional factors are relevant in these phenotypes. Likely HDAC4/5 have functionally important interactions with other transcription factors, and likewise for MEF2C, suggesting areas for future analysis.

      This is not a surprising outcome since both MEF2C and HDAC4/5 are transcription factors whose function(s) are determined by multiple other factors, a subset of which are relevant to sleep conditions while others are not necessarily relevant to sleep. These factors can include their phosphorylation state, genomic accessibility, and interaction with other transcription factors. All these other factors are known to be both cell type specific and determined by intracellular conditions that, in turn, are affected by extracellular conditions and ligands. We certainly agree there is much future analysis needed.

      One emerging theme may be that the SIK3-HDAC4/5 axis are major regulators of the sleep state, perhaps stabilizing the NREM state once the transition from wakefulness occurs. MEF2C is less involved in regulating sleep per se, and more involved in executing sleep function, by promoting restorative synaptic modifications to resolve sleep need.

      A useful way to restate the above might be to distinguish between control of arousal levels determining the behavioral states, wake or sleep (including REM sleep) and control of sleep function. The term, sleep, is typically used to describe the behavioral state of sleep that acts as a permissive gate to sleep function (that resolves sleep need). The sleep state should not be conflated with sleep function. There is abundant evidence that control of arousal can be dissociated from sleep need and sleep function.

      Finally, advances in the roles of the respective SIK3-HDAC4/5 and MEF2C pathways point towards transcription of "sleep genes", as clearly indicated in the model of Fig.4. Clearly more work is needed to understand how the expression of such genes ultimately lead to resolution of sleep need by functional changes at synapses.

      We are in full agreement. We also note the SIK3-HDAC4/5 pathway may have more than one role, i.e., to affect arousal centers to alter behavioral state and, more generally, to control MEF2C’s transcriptional activity, thus controlling the sleep-related glutamate synaptic phenotype.

      What are these sleep genes and how do they mechanistically resolve sleep need? Thus, the current work provides a mechanistic framework to stimulate further advances in understanding the molecular basis for sleep need and the restorative basis of sleep function.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) I appreciate the authors' thoughtful discussion of the use of forced locomotion for their sleep deprivation technique in their response, as well as the additional information that was provided regarding use of the treadmill in the manuscript. However, given that previous studies have failed to find a difference in AMPA/NMDA ratio following spontaneous sleep vs wake, confirmation of the findings in a non-motor brain region with the same SD technique (or confirmation within motor cortex with a different technique, although the authors correctly point out that other techniques also increase locomotor activity) would greatly strengthen the paper.

      Addressed above

      Notably, differences in motor activity patterns, not necessarily overall amount of locomotion, may induce differential synaptic changes between groups. This point at least warrants acknowledgement and discussion, but this has not been incorporated into the text of the manuscript.

      We will incorporate the following into the discussion:

      There is evidence that learning of a motor task or experience of forced altered motor activity can result in localized increases in NREM (slow wave sleep)-slow wave activity in the motor cortex (Huber R, Ghilardi MF, Massimini M, Tononi G. Local sleep and learning. Nature. 2004;430(6995):78-81; Huber et al., 2006). Since SWS-SWA is considered a marker for sleep homeostasis, the increase of SWS-SWA induced by altered motor activity was considered evidence for sleep-related function. Our earlier work has clearly shown that the treadmill method of SD increases frontal cortical SWS-SWA rebound, indicating a sleep-homeostatic process (Bjorness et al., 2016; Bjorness et al., 2020). Furthermore, we have also shown that this means of experimental SD causes similar glutamate synaptic changes as those observed using other means of SD like gentle handling (Liu, et al., JoNS 2010).

      (2) The number of mice and cells used for electrophysiology in this study remains low; more animals should be included to account for inter-animal variability.

      For this study, increasing the number of mice and cells will have p<0.05 chance of altering our conclusions by rejecting the null hypotheses of the electrophysiology findings.

      (3) The additional methodological information provided allays some of my concerns regarding the electrophysiological data. However, information about the input resistance (cutoffs used and/or actual values) is still not provided, which is important for assessing recording quality.

      We have now supplied the experimentally determined input resistance for each neuron used in this study (a separate column in Table 1, in the tabs marked “data”).

      (4) It is not meaningful to compare raw AMPA or NMDA responses because stimulus electrode placement will differ between cells, potentially activating different numbers of afferents. Presenting these comparisons (Figure 1C) has the potential to mislead the reader.

      This is not misleading (it didn’t mislead reviewer 1) as we described the conditions. As expected by reviewer 1, the variability using “raw AMPA or NMDA responses…” was too great, but it did indicate an interaction between receptor responses and sleep condition. This provided (as stated in the results section) a rationale to examine, and to draw conclusions only from, the AMPA/NMDA amplitude and FR ratios.

      (5) I appreciate clarification on the statistics and the authors' response has answered some of my questions. However, this also raises additional questions. What test was used to determine normality (and therefore whether to perform a parametric vs nonparametric test)?

      Described above.

      Why was the FRR data analysis changed to a parametric test, when it does not appear that the data are normally distributed?

      Showing the parametric test was a mistake on our part: there are not enough samples to conclude that the distributions are normal, as reviewer 1 correctly suspects. However, the non-parametric Kruskal-Wallis tests that we also show in Table 1 indicate significant differences between conditions, and the non-parametric, two-stage linear step-up procedure of Benjamini, Krieger and Yekutieli indicates significant differences for CS-SD and RS-SD but not for CS-RS, supporting our conclusions. The (unsupported) parametric tests have now been removed from Table 1, leaving only the non-parametric tests.
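
      For readers unfamiliar with the procedure named above, a minimal sketch of the two-stage linear step-up idea follows. It assumes the standard formulation (a Benjamini-Hochberg pass at q/(1+q), then a second pass at a level rescaled by the estimated number of true nulls). The p-values are hypothetical, not those in Table 1, and this is not the software used for the reported analysis.

```python
def benjamini_hochberg(pvalues, q):
    """Return booleans marking which hypotheses BH rejects at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


def bky_two_stage(pvalues, q=0.05):
    """Two-stage linear step-up procedure (sketch of the standard form)."""
    m = len(pvalues)
    q1 = q / (1 + q)                       # stage 1 uses a slightly stricter level
    stage1 = benjamini_hochberg(pvalues, q1)
    r1 = sum(stage1)
    if r1 == 0 or r1 == m:                 # nothing or everything rejected: stop
        return stage1
    m0_hat = m - r1                        # estimated number of true nulls
    return benjamini_hochberg(pvalues, q1 * m / m0_hat)


# Hypothetical p-values for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.6]
decisions = bky_two_stage(example_p, q=0.05)
plain_bh = benjamini_hochberg(example_p, 0.05)
```

      With these made-up inputs the second stage rejects two hypotheses that a single Benjamini-Hochberg pass at the same q does not, which is the extra power the two-stage design is meant to provide.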

      Why were post-hoc tests chosen to compare to a control group rather than all pairwise comparisons,

      We now provide post-hoc all-pairwise comparisons using the BKY analysis, which give the same results.

      and why was the SD rather than CS group used as the control in Figures 1E and F?

      Why were different post-hoc tests chosen for the data in Figures 1E, F?

      There was no need for this, and we now show only the statistics that are used to draw our conclusions for the AMPA/NMDA EPSC ratio data shown in Figure 1E and the Failure Rate Ratio data shown in Figure 1F (the conclusions are supported by the non-parametric post-hoc test and remain unchanged).

      (6) Genes in the SSC, ASD, Mef2cKO, and HD4cn categories are almost exclusively upregulated in the SD group compared to the CS group (Figure 4A). As the authors point out in their response, "No claim of mechanism linking the changed expression to altered AMPAR or NMDAR activity can be made at this point," largely due to the fact that we do not know the spatiotemporal or posttranslational modification patterns of the translated proteins, and how they affect receptor trafficking vs function. This is in agreement with my original point: as written (and as illustrated in Figure 4C), the manuscript implies that upregulation during SD increases the AMPA/NMDA ratio via receptor trafficking,

      The model indicates a likely (but not necessarily exclusive) role for AMPA/NMDA trafficking in explaining the functional electrophysiological data that we do report and which are not in dispute. The SSC-DEGs in ExIT cells are consistent with sleep-altered AMPA/NMDA trafficking but remain only a correlation. However, the point is taken, and Figure 4C has been revised to reflect only what we have observed electrophysiologically; the speculated mechanism(s) mediated by the observed SSC-DEGs are illustrated with “?’s”.

      while in reality the picture is likely much more complicated, and therefore a more thorough discussion is warranted. Some discussion was provided in the authors' response but does not appear to have been incorporated into the text or Figure 4C.

      As indicated above, the proposed model in Figure 4C has been changed to indicate more explicitly which aspects reflect our electrophysiological data and which aspects reflect only an association of observations.

      Minor comments:

      (1) Please justify only using male mice

      We had to start somewhere with our limited resources. Our intentions are to follow up with similar experiments using female mice, should funding be realized.

      (2) The model in Figure 4C is oversimplified and remains problematic, for the reasons stated in comment #6, above.

      See responses above.

      (3) Figure 4D remains confusing

      We agree. The unnecessary addition of adenosine effects on cholinergic arousal centers (experimentally well supported) has been removed from the figure to provide a more focused indication of how SWS-SWA can be related to MEF2c and/or ADORA1 activation through reduction of glutamate synaptic strength. ADORA1 activation reduces glutamate synaptic activity through pre- and postsynaptic inhibition, whereas MEF2c activation is essential for the sleep-elicited reduction of glutamate EPSCs. Reduced glutamate synaptic strength, whatever the cause, is associated with increased SWS-SWA.

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The study by Aguirre-Botero et al. shows the dynamics of 3D11 anti-CSP monoclonal antibody (mAb) mediated elimination of rodent malaria Plasmodium berghei (Pb) parasites in the liver. The authors show that the anti-CSP mAb could protect against intravenous (i.v.) Pb sporozoite challenge along with the cutaneous challenge, but requires higher concentration of antibody. Importantly, the study shows that the anti-CSP mAb not only affects sporozoite motility, sinusoidal extravasation, and cell invasion but also partially impairs the intracellular development inside the liver parenchyma, indicating a late effect of this antibody during liver stage development. While the study is interesting and conducted well, the only novel yet very important observation made in this manuscript is the effect of the anti-CSP mAb on liver stage development.

      Major

      This observation is highlighted in the manuscript title but is supported by only limited data. As such, it needs to be substantiated and a mechanism should be investigated. The phenomenon of intracellular effects of the anti-CSP mAb should be analyzed in much more detail. For example, can the authors demonstrate uptake of the Ab together with the parasite during hepatocyte invasion? What cellular mechanism leads to elimination?

      Lines 234 - 243; 308 - 325: These results are the gist of the entire study and also defined the title of the manuscript. Thus, it would be premature to claim a substantial effect of the 3D11 antibody in late killing of the parasite in the infected hepatocytes just by looking at the decreased GFP fluorescence. The authors need to at least verify the fitness of the liver stages by measuring the size of the developing parasites as well as using different parasite-specific markers (UIS4, MSP1, HSP70 etc.) in immunofluorescence assays on the infected liver sections and in vitro infections.

      We greatly appreciate the comments. We have taken the suggestions into consideration and deepened the characterization of 3D11's late killing of parasites. We first analyzed the presence of 3D11 in the intracellular parasite after the invasion and compared it with the CSP expression on the surface of control parasites (new Fig. 4F). Next, we tested a potential action of 3D11 added in the cell culture after the invasion (new Fig. 4G). The two new panels and the text accompanying them are shown below.

      “Post-invasion labeling of 3D11 bound to the membrane of intracellular parasites revealed a strong staining surrounding the parasite at 2 and 15h, but only punctual traces of 3D11 at 44h (Figure 4F, 3D11, 3D11). Of note, CSP was detected surrounding the control parasites at all time-points indicating that the lack of staining at 44h is not due to a decrease in the CSP amount on the parasite surface (Figure 4F, CSP, Control).  To evaluate the potential post-invasion entry of 3D11 into the PV of infected cells and posterior neutralization of intracellular parasites, we incubated invaded cells from 2 to 44 h with 3D11, but no effect on the parasite intracellular development was observed (Figure 4G, 2h p.i.). 3D11 incubated for 2 h with sporozoites and cells elicited, as expected, a dose-dependent inhibition of parasite development. Altogether, our results indicate that the late inhibition of parasite development is already achieved at 15h and likely caused by antibodies dragged inside cells bound to sporozoites before or during the invasion.”

      Finally, we better characterized the parasite loss of fitness caused by 3D11 in infected cells by quantifying the parasite size, GFP intensity and the presence and intensity of UIS4, a parasitophorous vacuole membrane developmental marker at 2, 4 and 44h as described below in the new figure 5 and accompanying text.

      “To further characterize the killing of intracellular parasites by 3D11 in HepG2 cells, we next evaluated the expression of the parasitophorous vacuole membrane (PVM) marker, UIS4(37), to infer the parasite intracellular development at 2, 4 and 44h. HepG2 cells were incubated with Pb-GFP expressing sporozoites in the absence (Control, Figure 5) or presence of 1.25 µg/mL of 3D11 during the first two hours of incubation (3D11, Figure 5). The chosen 3D11 concentration led to ~50% decrease in cell invasion (Figure 4C, 2h) and ~30% decrease in the post-invasion number of EEFs (Figure 4D), leaving enough parasites to be analyzed by microscopy. To distinguish between extracellular and intracellular parasites at 2h, washed and fixed samples were incubated with mouse 3D11 mAb (1µg/mL) and revealed with a fluorescent anti-mouse secondary antibody (Figure 5A, 3D11 in blue). Samples were then permeabilized and incubated with a goat anti-UIS4 polyclonal antibody revealed with a fluorescent anti-goat secondary antibody (Figure 5A, UIS4 in red). DNA was stained with Hoechst (Figure 5A, DNA in white).

      Extracellular GFP+ sporozoites were identified by their 3D11+UIS4- phenotype (Figure 5A, 2h, extracellular). Conversely, intracellular parasites were identified by their 3D11- phenotype and stained positive or negative for UIS4 (Figure 5A, 2h and 44h, intracellular). UIS4+ PVM is normally associated with a productive cell infection(37). However, a small number of EEFs can develop in the absence of UIS4(37), likely inside the host cell nucleus (Figure 5A, 44h, intranuclear).

      In the control and 3D11-treated groups, the percentage of intracellular UIS4- parasites decreased 2 to 3-fold from 2 to 44h, as expected of a parasite population negative for a marker of productive infection (Figure 5B). However, while at 2h in the control group, this population represented 14% of intracellular parasites, in the 3D11-treated group, it reached 48% (Figure 5B). This ~3-fold increase in the UIS4 negative population could explain the late killing of intracellular sporozoites by 3D11. Whether this population is constituted by intracellular transmigratory sporozoites lacking a PVM or parasites surrounded by a PVM, but incapable of secreting UIS4, still needs to be determined. At 44h, surviving EEFs in the 3D11-treated samples presented an area and UIS4 staining intensity similar to those of control parasites (Figure 5C, D). However, as observed by flow cytometry (Figure 4D), the GFP intensity of 3D11-treated parasites was significantly lower than that of control EEFs, indicating that 3D11 can somehow affect protein expression with undetermined effects in the genesis of red blood cell infecting stages.”

      Minor

      • Line 44 - 43: The statement is applicable only to the rodent infecting Plasmodium parasites. The authors need to clarify that.

      This is an important clarification. We have modified the text that now reads:

      “The sporozoite surface is covered by a dense coat of the circumsporozoite protein (CSP), shown to be an immunodominant protective antigen using a rodent malaria model”

      • Line 68: Replace the second 'against' after the CSP with 'of'.

      It is done.

      • Line 141 - 143: The 3D11 mAb does affect the homing and killing in the blood of cutaneous injected sporozoites. The authors need to clearly state that the statement is true only for i.v. injected sporozoites.

      Thank you for the comment. Now the text reads:

      “Altogether, these data indicate that 3D11 rather than having an early effect on i.v. inoculated sporozoites in the blood circulation, e.g. by inhibiting the homing or killing the parasite in the blood, requires more than 4 h to eliminate most parasites in the liver.”

      • Figure 3B: The numbers of sporozoites detected in the experiment varies from 0 h (line 172) to 2 h (line 184). Therefore, the numbers need to be mentioned on all the bars of each timepoint.

      We have now added the numbers at the top of the graph from Figure 3B.

      • Figure 3C: If the authors have used flk1-GFP mice, then how well they were able to detect the Pb-PfCSP GFP parasites in the vessel vs. parenchyma in the intravital imaging? The representative images for Pb-PfCSP GFP should also be included.

      Since 3D11 does not target PbPf parasites, most of them are motile in the movies, making them easily distinguishable from the endothelial cells. In addition, the stronger GFP intensity of sporozoites makes them detectable in the sinusoids. Representative images were added in the new Figure S3.

      • It is not mentioned anywhere how the viability of the sporozoites was determined. This has to be described especially in the methods section.

      • Also, the flow acquisition and data analysis of the sporozoites and infected HepG2 cells must be described in the method section.

      We briefly mentioned it in the results (line 228- 230): “In addition, by comparing the total number of recovered GFP+ sporozoites at 2 h in the two studied conditions, we measured the early lethality (%viable sporozoites, Figure 4B) of the anti-CSP Ab on the extracellular forms of the parasite (Figure 4A).”

      A more detailed description has been added in the methods section that now reads:

      “After 2 h, the supernatant was collected, and the culture was washed 2x with 0.5 volume of PBS. The cells were subsequently trypsinized. The supernatant plus the washing steps and the trypsinized cells were analyzed by flow cytometry to quantify the amount of GFP+ events inside and outside cells (Figure 3A and Figure S4). Viability was then quantified by the sum of the total number of sporozoites (GFP+ events) in the supernatant, inside and outside the cells. We calculated the percentage of parasite viability by dividing the average of the total number of sporozoites in the treated samples by the average in controls using three technical replicates for each condition. Additionally, we quantified the percentage of infected cells using the total number of GFP+ events in the HepG2 gate (Figure S4). To compare the biological replicates, we further normalized to the control of each experiment. For the samples used to analyze parasite development, the cells were incubated for 15 or 44 h after sporozoite addition, and the medium was changed after 2 and 24 h. The cells were trypsinized and the percentage of intracellular parasites was determined by flow cytometry as described above (Figure S4). The prolonged effect between 2 h and 15/44 h was calculated by normalizing the percentage of infected cells at 15/44 h to that of 2 h. For all flow cytometry measurements, the same volume was acquired.”
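
      The arithmetic described in this methods paragraph can be sketched as follows. This is an illustrative reading of the text, not the analysis script used for the study: the function names are hypothetical and the event counts are invented for the example.

```python
from statistics import mean


def total_sporozoites(supernatant, outside_cells, inside_cells):
    """Total GFP+ events for one replicate: supernatant plus events outside and inside cells."""
    return supernatant + outside_cells + inside_cells


def percent_viability(treated_totals, control_totals):
    """Mean total of treated replicates divided by mean total of controls, as a percentage."""
    return 100.0 * mean(treated_totals) / mean(control_totals)


def prolonged_effect(pct_infected_late, pct_infected_2h):
    """Percentage of infected cells at 15 or 44 h normalized to the 2 h value."""
    return pct_infected_late / pct_infected_2h


# Hypothetical GFP+ event counts, three technical replicates per condition.
control = [total_sporozoites(600, 250, 150),
           total_sporozoites(650, 280, 170),
           total_sporozoites(550, 220, 130)]
treated = [total_sporozoites(300, 120, 60),
           total_sporozoites(320, 130, 70),
           total_sporozoites(310, 125, 65)]

viability_pct = percent_viability(treated, control)
late_ratio = prolonged_effect(pct_infected_late=1.4, pct_infected_2h=2.0)
```

      Normalizing each biological replicate to its own control, as the text describes, would amount to repeating this calculation per experiment before pooling.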

      • Figure 4: The flow layouts should be included for at least comparing the 0 vs. 5 μg/ml of 3D11 mAb concentrations.

      Flow layouts were added in the supplementary figure 4.

      • Line 651 (Figure S1 legend): Typographical error '14'.

      Thank you for noticing. We corrected it.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Aguirre-Botero and collaborators report on the dynamics of Plasmodium parasite elimination in the liver using the 3D11 anti-CSP monoclonal antibody (mAb). By using microscopy and bioluminescence imaging in the P. berghei rodent malaria model, the authors first demonstrate that higher antibody concentrations are required for protection against intravenous sporozoite challenge, when compared to cutaneous challenge, which is not surprising. The study also shows that the 3D11 mAb reduces sporozoite motility, impairs hepatic sinusoidal barrier crossing, and more relevantly inhibits intracellular development of liver stages through its cytotoxic activity. These findings highlight the role of this specific monoclonal antibody, 3D11 mAb against CSP, in targeting sporozoites in the liver.

      Major Comments

      The study provides valuable insights into the mechanisms of protection conferred by the 3D11 anti-CSP monoclonal antibody against P. berghei sporozoites, and these findings allow the field to speculate that other monoclonal antibodies against CSP of P. falciparum may act similarly. However, an important experiment is missing that would significantly strengthen the conclusions. Specifically, the authors should perform experiments where the monoclonal antibody is added immediately after the sporozoites have completed invasion. This should be done both in vitro and in vivo to show whether the antibody has any effect on intracellular development of liver stages when added after invasion.

      While the claims are generally supported by the data presented, to comprehensively conclude the late cytotoxic effects of 3D11, the additional experiment of post-invasion antibody application is relevant. This would help determine if the observed effects are due to the antibody's action during invasion or its continued action post-invasion.

      The data and methods are presented in a manner that allows for reproducibility. The use of microscopy and bioluminescence imaging is well-documented. The experiments appear adequately replicated, and statistical analyses are appropriate.

      We thank reviewer 2 for these important suggestions. To determine whether the effect might come from the internalization of the antibodies after sporozoite invasion, we tested the amount of 3D11 bound to the parasite following invasion (new Fig. 4F) and the potential post-invasion neutralizing effect of 3D11 in vitro. The results obtained are presented below.

      “Post-invasion labeling of 3D11 bound to the membrane of intracellular parasites revealed a strong staining surrounding the parasite at 2 and 15h, but only punctual traces of 3D11 at 44h (Figure 4F, 3D11, 3D11). Of note, CSP was detected surrounding the control parasites at all time-points indicating that the lack of staining at 44h is not due to a decrease in the CSP amount on the parasite surface (Figure 4F, CSP, Control).  To evaluate the potential post-invasion entry of 3D11 into the PV of infected cells and posterior neutralization of intracellular parasites, we incubated invaded cells from 2 to 44 h with 3D11, but no effect on the parasite intracellular development was observed (Figure 4G, 2h p.i.). 3D11 incubated for 2 h with sporozoites and cells elicited, as expected, a dose-dependent inhibition of parasite development. Altogether, our results indicate that the late inhibition of parasite development is already achieved at 15h and likely caused by antibodies dragged inside cells bound to sporozoites before or during the invasion.”

      Minor Comments

      The text and figures are clear and accurate. Some minor typographical errors should be corrected.

      Thank you for the remark; we have verified the text again to remove typographical errors.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Aguirre-Botero et al have studied the effect of a potent monoclonal antibody against the circumsporozoite protein, the major surface protein of the malaria sporozoite. This is an elegantly designed, performed, and analyzed study. They have efficiently delineated the mode of action of anti-CSP repeat mAb and confirmed previous in vitro work (not cited) that demonstrated the same intracellular effect. 

      Specific comments

      Line 51: The authors claim a correlation between high antibody levels and protection. However, they did not provide direct proof that these antibodies were responsible for protection, nor did they establish a cut-off level of anti-CSP antibodies that would distinguish between protected and unprotected individuals.

      We thank reviewer 3 for the comments. Indeed, we agree with reviewer 3, these are correlative studies where the causality cannot be established. We modified the ensuing sentence to specify the causality between anti-CSP mAbs and in vivo protection against sporozoite infection. Now the text reads:

      “Extensive research has demonstrated a positive correlation between high levels of anti-CSP antibodies (Abs) induced by the RTS,S/AS01 vaccine and efficacy against malaria(11-13). Remarkably, anti-CSP monoclonal Abs (mAbs) have been proven to protect in vivo against malaria in various experimental settings, including mice(14-21), monkeys(23), and humans(24-26)”

      Line 326: The late intrahepatic effect of mAb against the CSP repeat has been previously reported (see Figure 2, Nudelman et al, J Immunol, 1989). The effect was shown to affect the transition from liver trophozoites to liver schizonts. This study should be cited and discussed.

      Thank you for this important remark. We included this seminal reference and now the modified text reads:

      “Notably, a similar effect has been previously reported using sera from mice immunized with PfCSP or mAb against P. yoelii (Py) CSP. Incubation of Pf or Py sporozoites with the immune sera or mAbs not only affected sporozoite invasion in vitro but continued to affect intracellular forms for several days after invasion(38,39). Additionally, using anti-PfCSP sera, it was also observed that late EEFs from sera-treated sporozoites had abnormal morphology(38). Altogether, it was thus concluded that the anti-CSP Abs present in the sera had a long-term effect on the parasites(38,39).”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Kaya et al. studies the effect of food consumption on hippocampal sharp wave ripples (SWRs) in mice. The authors use multiple foods and forms of food delivery to show that the frequency and power of SWRs increase following food intake, and that this effect depends on the caloric content of food. The authors also studied the effects of the administration of various food-intake-related hormones on SWRs during sleep, demonstrating that ghrelin negatively affects SWR rate and power, but not GLP-1, insulin, or leptin. Finally, the authors use fiber photometry to show that GABAergic neurons in the lateral hypothalamus increase activity during a SWR event.

      Strengths:

      The experiments in this study seem to be well performed, and the data are well presented, visually. The data support the main conclusions of the manuscript that food intake enhances hippocampal SWRs. Taken together, this study is likely to be impactful to the study of the impact of feeding on sleep behavior, as well as the phenomena of hippocampal SWRs in metabolism.

      Weaknesses:

      Details of experiments are missing in the text and figure legends. Additionally, the writing of the manuscript could be improved.

      We thank the reviewer for their favorable assessment of the work and its potential impact. We will add all requested details in the text and figure legends and will revise the wording of the manuscript to improve its clarity.

      Reviewer #2 (Public review):

      Summary:

      Kaya et al uncover an intriguing relationship between hippocampal sharp wave-ripple production and peripheral hormone exposure, food intake, and lateral hypothalamic function. These findings significantly expand our understanding of hippocampal function beyond mnemonic processes and point a direction for promising future research.

      Strengths:

      Some of the relationships observed in this paper are highly significant. In particular, the inverse relationship between GLP1/Leptin and Insulin/Ghrelin is particularly compelling, as this aligns well with opposing hormone functions on satiety.

      Weaknesses:

      I would be curious if there were any measurable behavioral differences that occur with different hormone manipulations.

      We thank the reviewer for their favorable assessment of the work and its contribution to our understanding of non-mnemonic hippocampal function. Whether there are behavioral differences that occur following administration of the different hormones is a great question, yet unfortunately our study design did not include fine behavioral monitoring to the degree that would allow answering it. While some previous studies have partially addressed the behavioral consequences of the delivery of these hormones (we will include a reference to these studies in the revised manuscript), how these changes may interact with the hippocampal and hypothalamic effects we observe is a very interesting next step.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Kaya et al. explores the effects of feeding on sharp wave-ripples (SWRs) in the hippocampus, which could reveal a better understanding of how metabolism is regulated by neural processes. Expanding on prior work that showed that SWRs trigger a decrease in peripheral glucose levels, the authors further tested the relationship between SWRs and meal consumption by recording LFPs from the dorsal CA1 region of the hippocampus before and after meal consumption. They found an increase in SWR magnitude during sleep after food intake, in both food restricted and ad libitum fed conditions. Using fiber photometry to detect GABAergic neuron activity in the lateral hypothalamus, they found increased activity locked to the onset of SWRs. They conclude that the animal's satiety state modulates the amplitude and rate of SWRs, and that SWRs modulate downstream circuits involved in regulating feeding. These experiments provide an important step forward in understanding how metabolism is regulated in the brain. However, currently, the paper lacks sufficient analyses to control for factors related to sleep quality and duration; adding these analyses would further support the claim that food intake itself, as opposed to sleep quality, is primarily responsible for changes in SWR activity. Adding this, along with some minor clarifications and edits, would lead to a compelling case for SWRs being modulated by a satiety state. The study will likely be of great interest in the field of learning and memory while carrying broader implications for understanding brain-body physiology.

      Strengths:

      The paper makes an innovative foray into the emerging field of brain-body research, asking how sharp wave-ripples are affected by metabolism and hunger. The authors use a variety of advanced techniques including LFP recordings and fiber photometry to answer this question. Additionally, they perform comprehensive and logical follow-up experiments to the initial food-restricted paradigm to account for deeper sleep following meal times and the difference between consumption of calories versus the experience of eating. These experiments lay the groundwork for future studies in this field, as the authors pose several follow-up questions regarding the role of metabolic hormones and downstream brain regions.

      We thank the reviewer for their appreciation and constructive review of the work.

      Weaknesses:

      Major comments:

      (1) The authors conclude that food intake regulates SWR power during sleep beyond the effect of food intake on sleep quality. Specifically, they made an attempt to control for the confounding effect of delta power on SWRs through a mediation analysis. However, a similar analysis is not presented for SWR rate. Moreover, this does not seem to be a sufficient control. One alternative way to address this confound would be to subsample the sleep data from the ad lib and food restricted conditions (or high calorie and low calorie, etc), to match the delta power in each condition. When periods of similar mean delta power (i.e. similar sleep quality) are matched between datasets, the authors can then determine if a significant effect on SWR amplitude and rate remains in the subsampled data.

      This is an important point that we believe we addressed in a few complementary ways. First, the mediation analysis we implemented measures the magnitude and significance of the contribution of food on SWR power after accounting for the effects of delta power, showing a highly significant food-SWR contribution. While the objective of subsampling is similar, mediation is a more statistically robust approach as it models the relationship between food, SWR power, and delta power in a way that explicitly accounts for the interdependence of these variables. Further, subsampling introduces the risk of losing statistical power by reducing the sample size, due to exclusion of data that might contain relevant and valuable information. Mediation analysis, on the other hand, uses the full dataset and retains statistical power while modeling the relationships between variables more holistically. However, as we were not satisfied with a purely analytical approach to test this issue, we carried out a new set of experiments in ad-libitum fed mice, where there is no potential issue of food restriction impairing sleep quality in the pre-sleep session. In these conditions food amount also significantly correlated with, and showed significant mediation of, the SWR power change. Finally, we acknowledge and discuss this point in the Discussion, highlighting that given the known relationship between cortical delta and SWRs, it is challenging to fully disentangle these signals.
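
      To make the logic of this argument concrete, a minimal single-mediator sketch is given here. It assumes a simple linear model with food amount as the predictor, delta power as the mediator and SWR power change as the outcome. The variable names and numbers are hypothetical, and the actual mediation model and its significance testing (for example by bootstrap) may differ from this illustration.

```python
from statistics import mean


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def mediation_paths(x, m, y):
    """Least-squares path coefficients for x -> m -> y with a direct x -> y path."""
    sxx, smm = cov(x, x), cov(m, m)
    sxm, sxy, smy = cov(x, m), cov(x, y), cov(m, y)
    det = sxx * smm - sxm ** 2            # zero if x and m are perfectly collinear
    a = sxm / sxx                         # effect of x on the mediator
    b = (smy * sxx - sxy * sxm) / det     # effect of mediator on y, adjusting for x
    direct = (sxy * smm - smy * sxm) / det  # effect of x on y, adjusting for mediator
    total = sxy / sxx                     # effect of x on y ignoring the mediator
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": total}


# Hypothetical data built so that y = 2 * food + 3 * delta exactly.
food = [1, 2, 3, 4, 5, 6]
delta = [2, 1, 4, 3, 6, 5]
swr = [2 * f + 3 * d for f, d in zip(food, delta)]

paths = mediation_paths(food, delta, swr)
```

      A direct effect that stays clearly non-zero after adjusting for the mediator is the pattern the response describes for food and SWR power. In this linear setting the total effect equals the direct effect plus the indirect effect, which gives a quick internal check on any implementation.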

      (2) Relatedly, are the animals spending the same amount of time sleeping in the ad lib vs. food restricted conditions? The amount of time spent sleeping could affect the probability of entering certain stages of sleep and thus affect SWR properties. A recent paper (Giri et al., Nature, 2024) demonstrated that sleep deprivation can alter the magnitude and frequency of SWRs. Could the authors quantify sleep quantity and control for the amount of time spent sleeping by subsampling the data, similar to the suggestion above?

      We will include a comparison of sleep amount in the revised manuscript.

      Additionally, we will add details to the Methods section that were missing in the original submission that are relevant to this point. Specifically, within the sleep sessions, the ongoing sleep states were scored using the AccuSleep toolbox (https://github.com/zekebarger/AccuSleep) using the EEG and EMG signals. NREM periods were detected based on high EEG delta power and low EMG power, REM periods were detected based on high EEG theta power and low EMG power, and Wake periods were detected based on high EMG power. Importantly, only NREM periods were included for subsequent SWR detection, quantification and analyses (in particular, reported SWR rates reflect the number of SWRs per second of NREM sleep).
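
      The SWR rate definition in the last sentence (events per second of NREM sleep) can be sketched as below. The sketch assumes that sleep states have already been scored into fixed-length epochs. The epoch length, labels and event times are hypothetical, and the code is not part of the AccuSleep toolbox.

```python
def nrem_seconds(epoch_labels, epoch_len_s):
    """Total time scored as NREM, in seconds."""
    return sum(1 for label in epoch_labels if label == "NREM") * epoch_len_s


def swr_rate_in_nrem(swr_times_s, epoch_labels, epoch_len_s):
    """Number of SWRs falling in NREM epochs divided by total NREM seconds."""
    n_in_nrem = 0
    for t in swr_times_s:
        idx = int(t // epoch_len_s)       # epoch that contains this event
        if 0 <= idx < len(epoch_labels) and epoch_labels[idx] == "NREM":
            n_in_nrem += 1
    total = nrem_seconds(epoch_labels, epoch_len_s)
    return n_in_nrem / total if total > 0 else float("nan")


# Hypothetical scoring: five 4-second epochs and six detected events.
labels = ["Wake", "NREM", "NREM", "REM", "NREM"]
events = [1.0, 4.5, 5.0, 9.9, 13.0, 17.0]

rate = swr_rate_in_nrem(events, labels, epoch_len_s=4.0)
```

      Dividing by NREM time rather than by total session time keeps the rate comparable between sessions in which animals sleep for different amounts of time.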

      (3) Plot 5I only reports significance but does not clearly show the underlying quantification of LH GABAergic activity. Upon reading the methods for how this analysis was conducted, it would be informative to see a plot of the pre-SWR and post-SWR integral values used for the paired t-test whose p-values are currently shown. For example, these values could be displayed as individual points overlaid on a pair of box-and-whisker plots of the pre- and post-distribution within the session (perhaps for one example session per mouse with the p-value reported, to supplement a plot of the distribution of p-values across sessions and mice). If these data are non-normal, the authors should also use a non-parametric statistical test.

      We will include this quantification and visual representation in the revised manuscript.

      Minor comments:

      (4) A brief explanation (perhaps in the discussion) of what each change in SWR property (magnitude, rate, duration) could indicate in the context of the hypothesis may be helpful in bridging the fields of metabolism and memory. For example, by describing the hypothesized mechanistic consequence of each change, could the authors speculate on why ripple rate may not increase in all the instances where ripple power increases after feeding? Why do the authors speculate that ripple duration does not increase, given that prior work (Fernandez-Ruiz et al. 2019) has shown that prolonged ripples support enhanced memory?

      We will include a discussion of these points in the revised manuscript.

      (5) The authors suggest that "SWRs could modulate peripheral metabolism" as a future implication of their work. However, the lack of clear effects from GLP-1, leptin and insulin complicates this interpretation. It might be informative for readers if the authors expanded their discussion of what specific role they speculate that SWRs could play in regulating metabolism, given these negative results.

      While we provided potential explanations for the lack of effects of the hormone administrations, we will further elaborate on this point in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors identified that

      (1) CDK4/6i treatment attenuates the growth of drug-resistant cells by prolongation of the G1 phase; 

      (2) CDK4/6i treatment results in an ineffective Rb inactivation pathway and suppresses the growth of drug-resistant tumors;

      (3) Addition of endocrine therapy augments the efficacy of CDK4/6i maintenance;

      (4) Addition of CDK2i to CDK4/6i treatment as second-line treatment can suppress the growth of resistant cells;

      (5) The role of cyclin E as a key driver of resistance to CDK4/6 and CDK2 inhibition.

      Strengths:

      To prove their complicated proposal, the authors employed orchestration of several kinds of live cell markers, timed in situ hybridization, IF and Immunoblotting. The authors strongly recognize the resistance of CDK4/6 + ET therapy and demonstrated how to overcome it.

      Weaknesses:

      The authors need to underscore their proposed results from what is to be achieved by them and by other researchers. 

      Thank you for your thoughtful review and for highlighting both the strengths and weaknesses of our manuscript. We appreciate your recognition of the methodological rigor and the significance of our findings in addressing resistance to CDK4/6 inhibitors combined with endocrine therapy.

      To address your concern regarding the need to delineate our results from those achieved by other researchers, we will incorporate clarifications in the revised manuscript. Specifically, we will:

      (1) Clearly distinguish our novel contributions from prior findings in the field.

      (2) Explicitly cite and discuss relevant studies to contextualize our work, ensuring that our contributions are appropriately framed within the broader body of knowledge.

      These revisions will enhance the transparency and impact of our manuscript, as well as highlight the originality and significance of our findings. Thank you again for your constructive feedback.

      Reviewer #2 (Public review):

      Summary:

      This study elucidated the mechanism underlying drug resistance induced by CDK4/6i as a single agent and proposed a novel and efficacious second-line therapeutic strategy. It highlighted the potential of combining CDK2i with CDK4/6i for the treatment of HR+/HER2- breast cancer.

      Strengths:

      The study demonstrated that CDK4/6 induces drug resistance by impairing Rb activation, which results in diminished E2F activity and a delay in G1 phase progression. It suggests that the synergistic use of CDK2i and CDK4/6i may represent a promising second-line treatment approach. Addressing critical clinical challenges, this study holds substantial practical implications.

      Weaknesses: 

      (1) Drug-resistant cell lines: Was a drug concentration gradient treatment employed to establish drug-resistant cell lines? If affirmative, this methodology should be detailed in the materials and methods section. 

      We greatly appreciate the reviewer for raising this important question. In the revised manuscript, we will update the methods section to include a detailed description of how the drug-resistant cell lines were developed. Specifically, we will clarify whether a drug concentration gradient treatment was employed and provide step-by-step details to ensure reproducibility.

      (2) What rationale informed the selection of MCF-7 cells for the generation of CDK6 knockout cell lines? Supplementary Figure 3. A indicates that CDK6 expression levels in MCF-7 cells are not notably elevated. 

      We appreciate the reviewer’s insightful question about the rationale for selecting MCF-7 cells to generate CDK6 knockout cell lines. This choice was guided by prior studies highlighting the significant role of CDK6 in mediating resistance to CDK4/6 inhibitors (1-4). Moreover, we observed a 4.6-fold increase in CDK6 expression in CDK4/6 inhibitor-resistant MCF-7 cells compared to their drug-naïve counterparts (Supplementary Figure 3A). While we did not detect notable differences in CDK4/6 activity between wild-type and CDK6 knockout cells under CDK4/6 inhibitor treatment, these findings point to a potential non-canonical function of CDK6 in conferring resistance to CDK4/6 inhibitors.

      (3) For each experiment, particularly those involving mice, the author must specify the number of individuals utilized and the number of replicates conducted, as detailed in the materials and methods section. 

      We sincerely thank the reviewer for bringing this to our attention. In the revised manuscript, we will provide explicit details regarding the number of replicates and mice used for each experiment. This information will be included in the materials and methods section, figure legends, and relevant text to ensure transparency and clarity.

      (4) Could this treatment approach be extended to triple-negative breast cancer? 

      We greatly appreciate the reviewer’s inquiry about extending our findings to triple-negative breast cancer (TNBC). Based on our data presented in Figure 1 and Supplementary Figure 2, which include the TNBC cell line MDA-MB-231, we anticipate that the benefits of maintaining CDK4/6 inhibitors could indeed be applied to TNBC with an intact Rb/E2F pathway.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript, Armand and colleagues investigate the potential of continuing CDK4/6 inhibitors or combining them with CDK2 inhibitors in the treatment of breast cancer that has developed resistance to initial therapy. Utilizing cellular and animal models, the research examines whether maintaining CDK4/6 inhibition or adding CDK2 inhibitors can effectively control tumor growth after resistance has set in. The key findings from the study indicate that the sustained use of CDK4/6 inhibitors can slow down the proliferation of cancer cells that have become resistant, and the combination of CDK2 inhibitors with CDK4/6 inhibitors can further enhance the suppression of tumor growth. Additionally, the study identifies that high levels of Cyclin E play a significant role in resistance to the combined therapy. These results suggest that continuing CDK4/6 inhibitors along with the strategic use of CDK2 inhibitors could be an effective strategy to overcome treatment resistance in hormone receptor-positive breast cancer.

      Strengths:

      (1) Continuous CDK4/6 Inhibitor Treatment Significantly Suppresses the Growth of Drug-Resistant HR+ Breast Cancer: The study demonstrates that the continued use of CDK4/6 inhibitors, even after disease progression, can significantly inhibit the growth of drug-resistant breast cancer.

      (2) Potential of Combined Use of CDK2 Inhibitors with CDK4/6 Inhibitors: The research highlights the potential of combining CDK2 inhibitors with CDK4/6 inhibitors to effectively suppress CDK2 activity and overcome drug resistance.

      (3) Discovery of Cyclin E Overexpression as a Key Driver: The study identifies overexpression of cyclin E as a key driver of resistance to the combination of CDK4/6 and CDK2 inhibitors, providing insights for future cancer treatments.

      (4) Consistency of In Vitro and In Vivo Experimental Results: The study obtained supportive results from both in vitro cell experiments and in vivo tumor models, enhancing the reliability of the research.

      (5) Validation with Multiple Cell Lines: The research utilized multiple HR+/HER2- breast cancer cell lines (such as MCF-7, T47D, CAMA-1) and triple-negative breast cancer cell lines (such as MDA-MB-231), validating the broad applicability of the results.

      Weaknesses:

      (1) The manuscript presents intriguing findings on the sustained use of CDK4/6 inhibitors and the potential incorporation of CDK2 inhibitors in breast cancer treatment. However, I would appreciate a more detailed discussion of how these findings could be translated into clinical practice, particularly regarding the management of patients with drug-resistant breast cancer. 

      We greatly appreciate this opportunity to further contextualize our findings within clinical practice. In the revised manuscript, we will expand the discussion to explore how the identified mechanisms can inform patient stratification and therapeutic combinations. We will also highlight the potential of integrating CDK2 inhibitors with continued CDK4/6 inhibition as a second-line strategy for HR+ breast cancer patients who exhibit resistance to CDK4/6 inhibitors, leveraging insights from current and ongoing clinical trials. This will provide a clearer framework for translating our findings into actionable therapeutic strategies.

      (2) While the emergence of resistance is acknowledged, the manuscript could benefit from a deeper exploration of the molecular mechanisms underlying resistance development. A more thorough understanding of how CDK2 inhibitors may overcome this resistance would be valuable. 

      Thank you for this insightful suggestion. In the revised manuscript, we will delve deeper into the molecular mechanisms by which CDK2 inhibitors counteract resistance to CDK4/6 inhibitors and endocrine therapy. We will emphasize the role of the non-canonical Rb inactivation pathway and upregulated transcriptional activity in reactivating CDK2, which contribute to resistance under CDK4/6 inhibition. Furthermore, we will discuss how dual inhibition of CDK4/6 and CDK2 effectively suppresses this resistance pathway, offering a mechanistic rationale for the therapeutic potential of this combination strategy.

      (3) The manuscript supports the continued use of CDK4/6 inhibitors, but it lacks a discussion on the long-term efficacy and safety of this approach. Additional studies or data to support the safety profile of prolonged CDK4/6 inhibitor use would strengthen the manuscript. 

      We greatly appreciate the reviewer for raising this important point. To address this, we will incorporate a discussion on the long-term safety and efficacy of CDK4/6 inhibitor maintenance therapy. Drawing from clinical trials and retrospective analyses (5-9), we will highlight data supporting the tolerability of prolonged CDK4/6i treatment, particularly in combination with endocrine therapy. We will also discuss its clinical benefits over chemotherapy or endocrine therapy alone, contextualizing these findings with our proposed therapeutic approach (6,8-11).

      References:

      (1) Yang C, Li Z, Bhatt T, Dickler M, Giri D, Scaltriti M, et al. Acquired CDK6 amplification promotes breast cancer resistance to CDK4/6 inhibitors and loss of ER signaling and dependence. Oncogene 2017;36:2255-64

      (2) Li Q, Jiang B, Guo J, Shao H, Del Priore IS, Chang Q, et al. INK4 Tumor Suppressor Proteins Mediate Resistance to CDK4/6 Kinase Inhibitors. Cancer Discov 2022;12:356-71

      (3) Ji W, Zhang W, Wang X, Shi Y, Yang F, Xie H, et al. c-myc regulates the sensitivity of breast cancer cells to palbociclib via c-myc/miR-29b-3p/CDK6 axis. Cell Death & Disease 2020;11:760

      (4) Wu X, Yang X, Xiong Y, Li R, Ito T, Ahmed TA, et al. Distinct CDK6 complexes determine tumor cell response to CDK4/6 inhibitors and degraders. Nature Cancer 2021;2:429-43

      (5) Martin JM, Handorf EA, Montero AJ, Goldstein LJ. Systemic Therapies Following Progression on First-line CDK4/6-inhibitor Treatment: Analysis of Real-world Data. Oncologist 2022;27:441-6

      (6) Xi J, Oza A, Thomas S, Ademuyiwa F, Weilbaecher K, Suresh R, et al. Retrospective Analysis of Treatment Patterns and Effectiveness of Palbociclib and Subsequent Regimens in Metastatic Breast Cancer. J Natl Compr Canc Netw 2019;17:141-7

      (7) Basile D, Gerratana L, Corvaja C, Pelizzari G, Franceschin G, Bertoli E, et al. First- and second-line treatment strategies for hormone-receptor (HR)-positive HER2-negative metastatic breast cancer: A real-world study. Breast 2021;57:104-12

      (8) Kalinsky K, Accordino MK, Chiuzan C, Mundi PS, Sakach E, Sathe C, et al. Randomized Phase II Trial of Endocrine Therapy With or Without Ribociclib After Progression on Cyclin-Dependent Kinase 4/6 Inhibition in Hormone Receptor–Positive, Human Epidermal Growth Factor Receptor 2–Negative Metastatic Breast Cancer: MAINTAIN Trial. Journal of Clinical Oncology;0:JCO.22.02392

      (9) Kalinsky K, Bianchini G, Hamilton EP, Graff SL, Park KH, Jeselsohn R, et al. Abemaciclib plus fulvestrant vs fulvestrant alone for HR+, HER2- advanced breast cancer following progression on a prior CDK4/6 inhibitor plus endocrine therapy: Primary outcome of the phase 3 postMONARCH trial. Journal of Clinical Oncology 2024;42:LBA1001-LBA

      (10) Mayer EL, Wander SA, Regan MM, DeMichele A, Forero-Torres A, Rimawi MF, et al. Palbociclib after CDK and endocrine therapy (PACE): A randomized phase II study of fulvestrant, palbociclib, and avelumab for endocrine pre-treated ER+/HER2- metastatic breast cancer. Journal of Clinical Oncology 2018;36:TPS1104-TPS

      (11) Llombart-Cussac A, Harper-Wynne C, Perello A, Hennequin A, Fernandez A, Colleoni M, et al. Second-line endocrine therapy (ET) with or without palbociclib (P) maintenance in patients (pts) with hormone receptor-positive (HR[+])/human epidermal growth factor receptor 2-negative (HER2[-]) advanced breast cancer (ABC): PALMIRA trial. Journal of Clinical Oncology 2023;41:1001-

    1. Author response:

      We appreciate the time and thoughtful reviews of all 3 reviewers. Ahead of a full revision of the paper, we would like to address a couple of points the reviewers have raised that we plan to address in more detail in our full revision.

      (1) The relationship between membrane tension and interfacial tension: The major request by reviewers was for a better explanation of the relationship between measured mechanical parameters and membrane interfacial tension. We plan to include a schematic of the different forces at play in the membrane and to clarify our discussion. Here, we provide a brief explanation.

      In our study, we identified a relationship between channel activation pressure and two membrane mechanical properties (area expansion modulus (K<sub>A</sub>) and bending rigidity (K<sub>c</sub>)), though we did not find a correlation between channel activation pressure and a third mechanical property (membrane fluidity). Through further computational analysis of the membranes, we identified an additional property called interfacial tension that helps unify and explain our results. Interfacial tension (γ) is a property akin to surface tension that reflects the chemical composition at the interface of the membrane (between the polar headgroups of the lipids and the hydrophobic acyl chains of the lipids) and balances the repulsive interaction of the nonpolar hydrocarbon chains with the polar headgroup regions of the lipids. In the established polymer brush model, the expansion modulus is proportional to the interfacial tension (W. Rawicz, Biophysical Journal, 2000)

      γ = K<sub>A</sub>/C,

      where C is a constant. Interfacial tension occurs at the boundary between the lipid bilayer and external aqueous environment and is different from mechanical tension. While mechanical membrane tension (t) reflects a physical force in plane with the membrane, interfacial tension reflects the chemical composition at each interface of the membrane. While mechanical membrane tension depends on the size and shape of the membrane, interfacial tension is independent of these features and depends on the molecular composition of the liquid-liquid interface. An expanded discussion on this topic was recently provided (Lipowsky. Faraday Discussions. 2024). While distinct, these two properties can be related to one another via the area expansion modulus (K<sub>A</sub>). Typically, one would imagine that reducing interfacial tension, and correspondingly reducing K<sub>A</sub>, means it takes less energy to stretch the membrane to the same extent, which should reduce the activation pressure (and corresponding in-plane mechanical tension) required to open an embedded mechanosensitive channel. Interestingly though, interfacial tension also works to pull the channel open, so a reduction in interfacial tension also means more energy will be required to open the channel. We find that reductions in interfacial tension, and the corresponding increase in energy required to open embedded channels, outweigh the reduced tension that should be required to stretch the membrane. We plan to explain this tradeoff more clearly in our revision. Overall, our findings identify the exact properties driving mechanosensitive channel behavior in our study. Further, they provide a guide to understanding how and why shifts in mechanosensitive channel activation occur by connecting chemical composition changes to the changes in membrane tension propagation in a given membrane.

      (2) Data presentation to support determined area expansion modulus and bending rigidity values: We will show the stress-strain curves used to derive the K<sub>A</sub> and K<sub>c</sub> values.

      (3) Address why membrane tension data was not shown for ephys experiments: The micropipette and patch clamp setups are different, and we did not use the same system for both measurements. In fact, limitations in tools that would allow for concurrent tension measurements while conducting channel activation measurements have limited our understanding of the role of membrane tension on mechanosensation to date. While recent studies have attempted to resolve this limitation through the design of new tools that enable concurrent monitoring of mechanosensitive channel activation and membrane tension (Lüchtefeld et al. Nature Methods. 2024), these tools were not available to us during our study or now. Because our study also attempted to connect these two features (membrane tension and channel activation) but we lacked tools to do so simultaneously, we used two sets of measurements to separately uncover membrane mechanical properties and channel activation pressure.

      One reason it is difficult to measure membrane tension during a typical patch clamp study is the limitations of the imaging equipment and pipettes used for this assay. The experiment is usually done by looking through the eyepiece, and the pipette angle is around 45 degrees from the plane of the stage, so it would be hard to visualize changes in the patch geometry in the tip of the pipette. Basically, we are able to see the pipette touch the GPMV, but cannot resolve the patch moving up the pipette. In response to the reviewer comment that tension = pressure difference times pipette radius divided by 2, we were unable to measure the radius and changes in radius of a patch upon increases in applied pressure due to the above-mentioned imaging constraints. This limitation is why we were unable to directly measure applied tension with our current patch clamp setup.

      (4) Interfacial tension is not experimentally measured: Interfacial tension = K<sub>A</sub>/C where C is a constant (typically C=4 for bilayer membranes). The best way to measure interfacial tension is to determine K<sub>A</sub> (the area expansion modulus), which we have experimentally done by generating stress vs strain curves for GPMVs. In the literature, reductions in interfacial tension of a membrane are typically experimentally determined by measuring a corresponding reduction in the associated K<sub>A</sub> value (e.g. Ly and Longo. Biophys J. 2004). We have similarly followed this approach.
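      The two relations quoted in this response, interfacial tension = K<sub>A</sub>/C and the reviewer's tension = pressure difference × radius / 2, can be written out as a short sketch. The function names and all numerical inputs below are illustrative assumptions, not values measured in the study.

```python
def interfacial_tension(k_a, c=4.0):
    """Interfacial tension gamma = K_A / C.

    k_a: area expansion modulus (e.g. in mN/m); gamma has the same units.
    c: proportionality constant; C = 4 is the bilayer value quoted above.
    """
    if c <= 0:
        raise ValueError("C must be positive")
    return k_a / c


def patch_tension(delta_p, radius):
    """Relation cited by the reviewer: tension = pressure difference * radius / 2.

    delta_p in Pa and radius in m give tension in N/m.
    """
    return delta_p * radius / 2.0


# Illustrative inputs only (hypothetical, not data from the study).
gamma = interfacial_tension(200.0)       # K_A = 200 mN/m -> gamma = 50 mN/m
tension = patch_tension(4000.0, 1.0e-6)  # 4 kPa across a 1 um radius patch
```

      The second function makes the practical obstacle explicit: without a measured patch radius, the applied pressure alone cannot be converted into a tension.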

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Therefore, their tool may be useful for stimulating multiple populations using a blue excitatory opsin in neuron A and their tool for red excitation of neuron B… Yet, there are no data presented that showcases their new tool for this purpose

      We agree with the reviewer that in this manuscript we have not experimentally shown the applicability of our system for dual optical stimulation. However, the suppression of blue-light excitation of ZipV/T-IvfChr-expressing neurons strongly suggests that it can be used in experiments exciting separate populations of neurons, as shown for BiPOLES. We see no theoretical reason why this experiment cannot be done if sufficient cell-targeting mechanisms (such as the use of cre-lox or retroAAV) are utilized. We have started several projects pursuing these applications in the meantime.

      While they do show that red light = excitation and blue light = inhibition, they neither show 1) all-optical on/off modulation of the same cell; nor 2) high-frequency inhibition or excitation (max stim rate of 20hz, which is the same as the BiPOLES paper used for their LC stimulation paradigm; Vierock, as above, Figure 7a-d).

      Regarding point 1, we understand that the reviewer asks if we have optically excited (with red light) and inhibited (with blue light) the same neurons. If so, figure 4B1 (optical excitation of ZipT-IvfChr with red light) and figure 5A (optical inhibition of ZipT-IvfChr with blue light) represent largely the same set of neurons.

      Regarding point 2, we respectfully disagree with the reviewer’s interpretation of Figure 7a-d in Vierock et al. As we understand it, in this part the authors apply a 20 Hz optical stimulation protocol to the LC neurons in vivo. However, there are no data showing that individual neurons follow this stimulation protocol. To be clear, we are not saying that BiPOLES cannot drive 20 Hz APs. Very likely it can. It is based on ChrimsonR, which is capable of doing so (Klapoetke et al., Figure 2). Although in this manuscript we have not shown data for optical stimulation above 20 Hz, our system is based on vfChrimson, which is known to drive APs at 100 Hz and above (Mager et al., Figures 2 and 3).

      they must revise the manuscript to show that their approach is both 1) different in some way when compared to BiPOLES (it is my understanding that they did not do this, as per the supplementary alignment of the BiPOLES sequence and the sequence of the BiPOLES-like construct that they did test) and 2) that the properties that the investigators specifically tailored their construct to have confer some sort of experimental advantage when compared to the existing standard.

      In the latest version of the manuscript, we have compared our ZipV-IvfChr and the BiPOLES construct adapted with vfChrimson (Fig. 2 Suppl 1). The mean photocurrent amplitude of IvfChr in the ZipV-IvfChr construct is ~2.7 x higher than BiPOLES adapted with vfChrimson (14 randomly selected HEK293 cells in each group) (Fig. 2 Suppl 1B). We conducted this experiment in HEK293 cells to ensure accurate voltage-clamping and less biased cell selection. Even adjusting for the smaller photocurrent of vfChrimson vs ChrimsonR, this would still translate to ~1.6 x greater photocurrent with ZipV-IvfChr compared to the original BiPOLES utilizing ChrimsonR. We believe the increased efficiency of excitation is an important aspect of adapting vfChrimson for red-light excitation of neurons.
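      The adjustment from ~2.7× to ~1.6× can be made explicit with a short calculation. This sketch back-computes the vfChrimson-to-ChrimsonR photocurrent ratio that the two quoted fold-changes imply; the response does not state that ratio, so it is an inferred quantity, and the variable and function names are assumptions.

```python
# Fold-changes quoted in the response above.
fold_vs_vf_bipoles = 2.7  # ZipV-IvfChr vs BiPOLES adapted with vfChrimson
fold_vs_bipoles = 1.6     # ZipV-IvfChr vs original BiPOLES (ChrimsonR)

# Implied vfChrimson / ChrimsonR photocurrent ratio (inferred, not reported).
implied_vf_to_chrimsonr = fold_vs_bipoles / fold_vs_vf_bipoles


def adjusted_fold(measured_fold, variant_to_reference_ratio):
    """Rescale a fold-change measured against a weaker variant so that it is
    expressed against the stronger reference construct."""
    return measured_fold * variant_to_reference_ratio


check = adjusted_fold(fold_vs_vf_bipoles, implied_vf_to_chrimsonr)
```

      Under these assumptions the implied ratio is roughly 0.59, i.e. vfChrimson would carry a little under 60% of the ChrimsonR photocurrent in the BiPOLES context.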

      Reviewer #2 (Public Review):

      (1) In the Introduction or Discussion, the authors could better motivate the need for a red-shifted actuator that lacks blue crosstalk, by giving some specific examples of how the tool could be productively used, e.g. pairing with another blue-shifted excitatory opsin in a different population, or pairing with a GFP-based fluorescent indicator, e.g. GCaMP. The motivation for the current tool is not obvious to non-experts.

      In the discussion, we now provide examples of potential uses of the tool. For example, one key application is the induction of spike-timing-dependent plasticity with two wavelengths of light, in which a blue-light channelrhodopsin such as oChIEF is used to evoke presynaptic release and ZipT-IvfChr is expressed in the postsynaptic neuron. In this situation, the rapid termination of the inhibitory response is critical so that it does not interfere with the induction of LTP or LTD. Other experiments include the alternating control of projection neurons and interneurons in cortical areas, and the independent control of neurons of the direct and indirect pathways in the striatum to manipulate behavior.

      (2) Simultaneous excitation and inhibition are not the same as non-excitation. The authors mentioned shunting briefly. Another possible issue is changes in osmotic balance. Activation of a Na+ channel and a Cl- channel will lead to net import of NaCl into the cell, possibly changing osmotic pressure. Please discuss.

      We agree that osmotic, ionic and pH changes in small neuronal structures can disrupt physiology. This is why we built our approach on the fastest channelrhodopsins, which minimizes the channel opening time and the flux of ions through the channels when brief light illuminations are applied. Not only are the fluxes of protons, sodium ions and calcium ions minimized, but the flux of chloride should be minimal as well (as the membrane potential should be close to the chloride reversal potential, hence low ion flow). Our approach should therefore be minimally disruptive compared to most other existing channelrhodopsin-based approaches when short or minimal light pulses are used in conjunction with our tools. This recommendation is included in the updated manuscript.

      (3) The authors showed that in ZipT-IvfChr, orange light drives excitation and blue light does not. But what about simultaneous blue and orange light? Can the blue light overwhelm the effect of the orange light? Since the stated goal is to open the blue part of the spectrum for other applications, one is now worried about "negative" crosstalk. Please discuss and, ideally, characterize this phenomenon.

      We have now performed this experiment. Simultaneous blue (470 nm) and red (635 nm) light stimulation does not produce APs (Fig. 4 Suppl 1A). This suggests that the inhibitory effect of the ACR is more efficient than the excitatory effect of IvfChr, owing to the higher conductance of the ACR. It also re-emphasizes that rapid termination of the ACR effect is critical for minimal disruption of physiology in such a pairing strategy.

      (3.1) Does the use of the new tool require careful balancing of the expression levels of the ZipT and the IvfChr? Does it require careful balancing of blue and orange light intensities?

      As with any optogenetic tool, users should validate the efficacy of the tool in their own system. Our tool relies solely on the balanced expression of the 2A system, the efficiency of the two opsins and their degradation over the time-span of expression. These aspects would be better addressed in future versions of the tool or through improvement of the BiPOLES-type tandem expression in subsequent versions. On the instrumentation side, the light intensity and differential penetration depth require careful consideration. However, this holds true for most optogenetic and fluorescence imaging-based approaches as well. In the current update of the manuscript, we have included further discussion of these aspects.

      (3.2) Also, many opsins show complex and nonlinear responses to dual-wavelength illumination, so each component should be characterized individually under simultaneous blue + orange light.

      We have now performed this experiment (please see our response to point 3).

      (3.3) I was expecting to see photocurrents at different holding potentials as a function of illumination wavelength for the coexpressed construct (i.e. to see at what wavelength it switches from being excitatory to inhibitory); and also to see I-V curves of the photocurrent at blue and orange wavelengths for the co-expressed constructs (i.e. to see the reversal potential under blue excitation). Overall, the patch clamp and spectroscopic characterization of the individual constructs was stronger than that of the combined constructs.

      We have added the IV curves for the co-expressed construct at different holding potentials for the 470 nm and 635 nm wavelengths. These show the reversal potentials for the two wavelengths that are intended for in vitro and in vivo applications. Performing a similar experiment for a variety of wavelengths would not be as valuable, in part due to the enormous amount of data generated. As we have shown in the study, the response of any channelrhodopsin varies with light duration and light intensity in addition to wavelength and holding potential. The results for each recorded cell could include stimulation by different wavelengths, different illumination intensities and different light durations, in addition to different holding potentials. Not only would the results be highly variable from cell to cell, there would potentially be hundreds or thousands of combinations to be tested per cell (e.g., 5 light intensities @ 1, 2.5, 5, 10 and 20 mW/mm<sup>2</sup>, 8 different wavelengths @ 450nm, 475nm, 500nm, 525nm, 550nm, 575nm, 600nm and 625nm, 7 light durations @ 1ms, 5ms, 10ms, 50ms, 100ms, 500ms and 1s, and 6 holding potentials @ -80mV, -70mV, -60mV, -40mV, -20mV and 0mV would result in 1680 stimulation conditions per recorded cell). Technically, the significant lowering of membrane resistance when both IvfChr and ZipACR variants are activated simultaneously would compromise the quality of voltage-clamping even in HEK293 cells with series resistance compensation. We have yet to see any other study that has included such an ambitious electrophysiology experiment for channelrhodopsin characterization, likely because such an experiment is not feasible.
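      The count of stimulation conditions quoted in this response can be checked by enumerating the full parameter grid. The sketch below uses only the values listed in the response; the variable names are assumptions.

```python
from itertools import product

# Parameter values as listed in the response above.
intensities_mw_per_mm2 = [1, 2.5, 5, 10, 20]
wavelengths_nm = [450, 475, 500, 525, 550, 575, 600, 625]
durations_ms = [1, 5, 10, 50, 100, 500, 1000]
holding_potentials_mv = [-80, -70, -60, -40, -20, 0]

# Every combination of intensity, wavelength, duration and holding potential.
conditions = list(
    product(intensities_mw_per_mm2, wavelengths_nm, durations_ms, holding_potentials_mv)
)
n_conditions = len(conditions)  # 5 * 8 * 7 * 6
```

      A full factorial design grows multiplicatively, which is why adding even one more parameter level to any axis adds hundreds of conditions per cell.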

      Reviewer #3 (Public Review):

      (1) The enhanced vf-Chrimson could potentially be a highlight of the manuscript, serving broader applications. Yet, gauging the overall improvements of ivf-Chrimson in comparison to other Chrimson variants remains intricate due to several reasons. First, photocurrents from ivf-Chrimson seem smaller than those from C-Chrimson (Supplemental Figure 3), and a direct comparison with standard vf-Chrimson is absent.

      We appreciate the reviewer’s positive view of our modified variant. We did not emphasize this particular modification as it was identical to our previously published modification and similar to those previously published by others (CsChrimson and C1Chrimson). In all these cases, improved membrane expression was consistently detected. We believe that the expression data and our comparison of C-Chrimson and IvfChr are sufficient to demonstrate the improved membrane expression and function.

      Second, while membrane expression of ivf-Chrimson appears enhanced in provided brightfield recordings, the quantitative analysis would necessitate confocal microscopy and a membrane marker (Supplemental Figure)

      We have now quantified the results with a membrane palmitoylated mCherry using confocal microscopy shown in Fig 2 Suppl1 A. We measured the Pearson Correlation Coefficient of the mCherry with EGFP or Citrine signal for the 6 constructs (vfChrimson, vfChrimson with trafficking sequence, vfChrimson with N-terminal signaling peptide from oChIEF (C-vfChrimson), vfChrimson with trafficking sequence and N-terminal signaling peptide from oChIEF (IvfChr), BiPOLES with EGFP or citrine and vfChrimson) and the results were identical and consistent with the prior results using epifluorescence microscopy.
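
      For readers unfamiliar with the metric, the Pearson correlation coefficient between two fluorescence channels can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the function name and the toy 3×3 "images" are hypothetical, and it assumes both channels are registered, background-corrected and of the same size.

```python
import math

def pearson_colocalization(channel_a, channel_b):
    """Pearson correlation between two equally sized 2D intensity images.

    Each image is a list of rows of pixel intensities. The function name
    and input format are illustrative assumptions.
    """
    a = [float(v) for row in channel_a for v in row]
    b = [float(v) for row in channel_b for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a flat image has no defined correlation
    return cov / math.sqrt(var_a * var_b)

# Toy data: a ring-like "membrane marker" and two hypothetical opsin signals.
membrane_marker = [[10, 50, 10], [50, 0, 50], [10, 50, 10]]
opsin_at_membrane = [[2 * v + 5 for v in row] for row in membrane_marker]
opsin_in_cytosol = [[0, 5, 0], [5, 90, 5], [0, 5, 0]]

r_membrane = pearson_colocalization(membrane_marker, opsin_at_membrane)
r_cytosol = pearson_colocalization(membrane_marker, opsin_in_cytosol)
```

      In this toy case the membrane-localized signal is a linear function of the marker and gives a coefficient close to 1, whereas the signal concentrated in the cell interior gives a negative coefficient.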

      (2) Finally, other N-terminal modified Chrimson variants, like CsChrimson by Klapoetke et al. in 2014 and C1Chrimson by Oda et al. in 2018, have been generated. Comparing ivf-Chrimson to vf-CsChrimson or vf-C1Chrimson would be important to evaluate the benefits of the applied N-terminal modification.

      Our development of IvfChrimson is similar to the approach of vf-CsChrimson and identical to that of vf-C1Chrimson and we do not claim these modifications to be unique or superior. However, we have developed our design independently of these other studies and we have more extensive functional comparison and characterization data of our IvfChrimson variant than the other studies.

      (2.1) The action spectra of ZipACR suggest peak absorption of ZipACR WT and its mutant at 525 - 550 nm (Fig. 3). This is even further red-shifted than previously reported by Govorunova et al. Further action spectra recordings differ for all constructs between recordings initiated with blue or red light (Supplementary Fig. 5). This discrepancy is unexpected and should be discussed.

      We thank the reviewer for the comment. This was a mistake in the traces used for the figure. The example traces were the spectral responses measured in the 400 nm to 650 nm order instead of the 650 nm to 400 nm order shown in the spectral data. This has now been corrected.

      Additionally, the representative photocurrents of Zip(151V) in Fig. 3D1 do not align with the corresponding action spectrum in Fig. 3D2 as they show maximal photocurrents for 400 nm excitation.

      Please see the point above.

      (3) The authors introduce two different bicistronic expression cassettes-ZipT-IvfChR and ZipV-IvfChR-without providing clear guidelines on their conditions of use. Although the authors assert that ZipT is slower and further red-shifted than ZipV, the differences in the data for both ACR mutants are small and the benefits of the different final constructs should be explained.

      In our testing in neurons, ZipT produced fewer ‘escaped’ spikes after the termination of the light pulses in the cells we tested. However, this depends on the membrane properties of the cells, such as capacitance and resistance. ZipV has a faster termination time, which may be necessary in some situations and reduces disruption of physiological processes.

      We have now included this discussion in our updated manuscript.

      (4) The ZipT/V-IvfChRs are designed as bicistronic constructs; yet, disparities in membrane trafficking and protein degradation between the two channels could lead to divergences in blue and red light photoresponses. For future applicants, understanding the extent of expression ratio variations across cells using the presented expression cassettes could be of significance and should be discussed.

      We now have included this discussion in our responses above.

      Reviewer #1 (Recommendations For The Authors):

      (1) The Figure 1a mV cartoon traces for chloride are confusing. The chloride currents are depolarizing, not hyperpolarizing. As noted by the authors, these channels largely generate AP blockade through shunting inhibition (division), not hyperpolarization (subtraction).

      The figure has been corrected.

      (2) Figure 2A does not show where the light is applied. Why are some of the bars blue and some of them not filled?

      This has been corrected.

      (3) Figure 2C1 does not show where the light is applied. There should be an inset to detail the blue-light-cessation-evoked AP. Also doesn't give the holding potential.

      The requested details are added.

      (4) Figure 2C2 inset is described as showing that "Light-induced currents with 470 nm illumination were initially outward but turned inward immediately following light offset." Is that correct? It looks to me like the current turns inward about half-way through the light pulse and then becomes even stronger after the light turns off. That is also consistent with the CC traces, which appear to show a transition toward depolarization during the light pulse before the AP initiation at light offset.

      Yes, the reviewer's observation is correct. There are blue light-induced outward and inward current peaks at the onset and offset of the light. Accordingly, we have modified the phrasing for Fig. 2C2.

      (5) Figure 3D1 shows that Zip(151V) has a peak current at 400nm, with a steady increase in current from red to blue, however, this is not the case in the summary data in 3D2. It's also not shown in Supplementary Figure 5B. What's going on?

      We apologize for the prior version of the figure associated with the first submission. The example traces from 400nm -> 650 nm were incorrectly included in the figure whereas the 650nm -> 400 nm example traces should be included. This has been corrected.

      (6) Figure 3D1 has no time scale.

      It has now been included.

      (7) Figure 3E1 should read "Transduced" and not "Transfected"

      This has been corrected.

      (8) IvfChr fidelity drops off dramatically at 20hz...down to 50% efficiency of generating APs. This is described in the legend as "high frequency". Maybe the cart came before the horse in this figure...as it looks like in panel C that using less light power density improves fidelity in the dual opsin configuration with red light stimulation...why not use that power for the characterization? Did you try any higher frequencies? Or longer pulse widths? This is an important characterization to inform further use of the tool. This shortcoming isn't a cell-intrinsic limitation, as the 470nm stim with IVfChr was 100% successful at both 10hz and 20hz.

      It is known that red but not blue light pulses induce desensitization (optical fatigue) in red-shifted ChR variants. Indeed, one can reinstate the response to red light by giving violet-blue light pulses (Fig 4. Suppl 2). We think this is the reason that the 470nm stimulation was more effective in inducing APs in cells expressing IvfChR. Higher light intensities induce greater desensitization, but are preferred for faster opening of channels and depolarization of neurons. This can explain why, in some situations, lower light intensities were more effective in producing APs when pulse trains were used. We have recordings from cells firing APs at 40Hz (not included). All these cells had high expression levels of the opsin.

      (9) Figure 4D: why use 100ms pulse width? How do you know that this isn't causing depol block? Or some of the nefarious concerns that are raised in the discussion, such as "...disrupt[ion of] normal neuronal physiology and signal processing that occurs in millisecond time scale"?

      We used 100ms pulse duration to follow the published protocol that this experiment is based on (Lin et al., 2013, Nature Neuroscience). 

      (10) Figure 4E-bottom: What is the blue peak at light onset? Is the tool driving early activation before silencing?

      There seems to be an early, sharp and brief activation by blue light. We don’t know the definite cause of this, but we speculate that it is driven by blue-light activation of ZipACR and not the IvfChr portion of the construct. The reason is that such a sharp rise is absent when only IvfChr is expressed (Fig. 4E, upper panel). A soma-targeting motif tethered to channelrhodopsins is known to result in preferential expression of channels close to the soma but does not exclude expression of channelrhodopsin in axonal and dendritic compartments, especially when animals are allowed to recover for a long period of time after viral injection. We believe that ZipACR at axonal terminals, where the chloride concentration is high, can still cause blue-light-evoked depolarization and transmitter release. We observed this phenomenon in two mice in their first trial. The data for individual trials for each mouse are included in a supplementary table.

      (11) Figure 4G: Earlier in this same figure (B2, C), 470nm light was more effective at stimulating IvfChr than 635nm light. Is it unexpected that 638nm light would in this in vivo context be more effective at driving IvfChr responses than 450 nm light (at least as reflected by the AUC measurements)? Does this reflect fiber placement and light penetration/scattering?

      The spectral peaks of Chrimson-based variants, including vfChrimson, are all centered around 600 nm; at 635 / 638 nm, the amplitudes of the photo-response decline, channel onset slows, and the channels suffer greater desensitization. In isolated preparations where light penetration is similar between 635 / 638 nm and 470 nm, 470 nm responses can outperform 635 / 638 nm responses owing to their lack of desensitization and higher consistency. This is also a strong reason why we developed our current approach. In the in vivo preparation shown in Fig. 4D-G, the much higher tissue penetration of 638 nm light, due to reduced absorption and reduced scattering, can offset the performance advantage of IvfChr with 450 nm light.

      (12) In the methods, it is noted that different viral batches appear to generate different levels of neuronal toxicity. If that is the case, how did you differentiate between true differences between constructs vs. differential cell health effects?

      For figure 4D-F (whisker movement), we determined virus toxicity using NeuN staining. In slice recordings, we used the electrophysiological property of the neurons to assess their health. For this manuscript, we had one batch of virus that produced toxicity. We did not include any data from this batch.

      Reviewer #2 (Recommendations For The Authors):

      ● Define AUC on first use.

      It is now defined.

      ● Figure 3C2: Please explain how the photocurrents were normalized. As presented, it looks like under strong orange light, the ZipACR has higher photocurrent than the ivfChr.

      This is due to the fact that vfChrimson and other Chrimson-based variants do not fully recover in the dark after 590 nm stimulation. We tested IvfChrimson both with and without a 405 nm reconditioning light pulse, and we consistently reached a greater ‘maximal’ response from the same cell after 405 nm reconditioning (see Fig. 4 Suppl 2). We therefore normalized the response to the maximal recorded response of the cell, which was often achieved with 10 or 20 mW/mm<sup>2</sup> 590 nm stimulation after 405 nm reconditioning. We understand this can be confusing and have now replaced the light-intensity response in Fig. 3C2 with the one obtained with 405 nm reconditioning, which is easier for readers to interpret.

      ● P. 3: "As expected, blue light pulses induce transient membrane suppression..." Unclear what "suppression" means. Shunting? Hyperpolarization?

      We rephrased this to “As expected, blue light pulses transiently suppress APs…”

      ● P. 3: "illumination at 470 nm and 590 nm wavelengths led to similar amounts of courtship song (110.1 ± 12.8 and 78.5 ± 11.6, n = 16-17, respectively)". What are the units of "courtship song"?

      The unit for courtship song is the number of pulses per 10 seconds. This has been clarified in the figure.

      ● P. 5: The quantification of photocurrent in terms of pA/pF/A.U. is non-standard. I understand the impetus to normalize by expression to give something proportional to per-molecule conductance, but a user cares about overall photocurrent. Please also give the real photocurrents, either pA or pA/pF.

      We have provided the real photocurrent in pA or pA/pF where scientifically appropriate. To avoid selection and experimenter bias in our data, we did not set criteria for eliminating cells based on fluorescence intensity or photocurrent amplitude. As a result, responses to the same construct can vary by up to 20-fold in many experiments. We do not believe that averaging absolute photocurrent amplitudes would be justified, because cells with large responses would be weighted disproportionately in the results. We acknowledge that not selecting or eliminating data points introduces higher noise from recordings with smaller responses, but this is preferable to the selection or experimenter bias that would likely be introduced otherwise.

      ● Please quote illumination intensities wherever possible.

      ● P. 7: why was the red light crosstalk into Zip(151T) tested at 635 nm instead of 590 nm? Isn't the relevant parameter 590 nm, since that will be used for the excitatory opsin?

      In all our characterizations of the constructs using slice electrophysiology recordings, we used 635nm instead of 590nm. The reason is that compared to 590nm wavelength, at 635nm the photocurrent for Zip(151T) and Zip(151V) is significantly reduced (Fig. 3D1,D2).

      ● P. 10: "we examined the power at which responses to 470 nm and 635 nm lights induce APs in neurons expressing ZipT-IvfChr, ZipV-IvfChr, or IvfChr", but the preceding sentence says you didn't test the ZipT-IvfChr. This is confusing, please clarify.

      The previous paragraph refers to the photocurrent recordings in HEK293 cells where our fast LED based illumination system is limited to 590 nm light, whereas the subsequent paragraph refers to the brain slice neuronal recordings. We have now emphasized the difference of the experiments in the rewrite.

      ● Fig. 4B1, top: Why don't the blue traces return to the same baseline after the stimulus epochs?

      We observed this shift in baseline (~4mV more depolarized) in cells expressing IvfChR (or vfChR) only with blue light stimulation. This was observed in the neurons recorded in the CA1 as well (data not shown). There was no such change following red light stimulation (Fig. 4B1). Therefore, this should not affect the applicability of our construct. The original paper introducing vfChR did not test the responses of their constructs to blue light. There could be another photocycle state that is activated more strongly by 470nm than by 590nm and has a slow off-rate, but this is only speculation on our part. It must be noted that we did not observe such a phenomenon in cells expressing ChrimsonR (Fig. 1 Suppl 1C).

      ● Fig. S3B, right: The two colors are barely distinguishable on the graph. Consider more distinct colors and/or different symbols.

      It has been changed accordingly.

      ● P. 15: "However, we do not recommend the use of orange light pulses, as we observed a significant photocurrent in this wavelength." Not clear what this is referring to. Which construct? Under which circumstances shouldn't one use orange light pulses? Where's the data showing this?

      This refers to Fig. 3D1,D2 and Fig. 4 Suppl 2, which show a normalized photocurrent of ~40-50% at 590nm. The figure references for these data have now been added to the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Audio et al. measured cerebral blood volume (CBV) across cortical areas and layers using high-resolution MRI with contrast agents in non-human primates. While the non-invasive CBV MRI methodology is often used to enhance fMRI sensitivity in NHPs, its application for baseline CBV measurement is rare due to the complexities of susceptibility contrast mechanisms. The authors determined the number of large vessels and the areal and laminar variations of CBV in NHP and compared those with various other metrics.

      Strengths:

      Non-invasive mapping of relative cerebral blood volume is novel for non-human primates. A key finding was the observation of variations in CBV across regions; primary sensory cortices had high CBV, whereas other higher areas had low CBV. The measured CBV values correlated with previously reported neuronal and receptor densities.

      Weaknesses:

      A weakness of this manuscript is that the quantification of CBV with postprocessing approaches to remove susceptibility effects from pial and penetrating vessels, as well as orientation dependency, is not fully validated, especially on a laminar scale. Further specific comments follow.

      We suspect that the comment regarding the lack of validation at the laminar level stems from an error made by the corresponding author in the original bioRxiv submission (v1, May 17th https://www.biorxiv.org/content/10.1101/2024.05.16.594068v1?versioned=true), where Figure 3, which contains the laminar validation, was lost during pdf conversion. After submitting to eLife, this mistake was quickly identified, and a corrected manuscript was re-uploaded to bioRxiv (v2, June 5th, https://doi.org/10.1101/2024.05.16.594068). Although we informed the eLife staff about the update, it appears that the revised manuscript may not have reached reviewer #1 in time. We sincerely apologize for any confusion or inconvenience this may have caused.

      (1) Baseline CBV indices were determined using contrast agent-enhanced MRI (deltaR2*). Although this approach is suitable for areal comparisons, its application on a laminar scale has not been validated in the literature or in this study. By comparing with histological vascular information of V1, the authors attempted to validate their approach. However, the generalization of their method is questionable. The main issue is whether the large vessel contribution is minimized by processing approaches properly in various cortical areas (such as clusters 1-3 in Figure 5). It would be beneficial to compare deltaR2* with deltaR2 induced by contrast agents in a few selected slices, as deltaR2 is supposed to be sensitive to microvessels, not macrovessels. Please discuss this issue.

      The requested validation is presented in Figure 3F, which compares our deltaR2* measurements with previous invasive estimates of large vessel, capillary and cytochrome oxidase (CO) levels in V1 (Weber et al., 2008; doi.org/10.1093/cercor/bhm259). Our deltaR2* values show a stronger correspondence with microvascularity and CO levels than with large vessels. Moreover, Figure 3D illustrates relative differences between V1 and V2, which closely align with the relative vascular volume differences reported by Zheng et al., 1991. It is important to note that Weber and colleagues averaged across V2-V5 due to similar vascularity across these areas. In our material, we also observed similar vascularity in these areas, though V5 (e.g., MT) has slightly denser vascularity, in agreement with reports of CO staining.

      Additionally, we report similar GM/WM vascular density, and high vascular density in primary sensory areas. Unfortunately, available ground-truth data on vascularity does not provide further (general) validation data for laminar vasculature in macaques (such as those in cluster 1-3; Fig. 5). That said, we have provided substantial evidence linking whole-brain vascular measures with variations in neuron (for data distribution, see Supp. Fig. 6F) and receptor densities, which we believe provides strong support for our approach.

      We would like to clarify that the authors do not assert that gradient-echo MRI is exclusively sensitive to microvessels and not macrovessels. This is not stated anywhere in the manuscript. If any sentence appears misleading, please let us know, and we will consider revising it. It is well-established that large vessels contribute to ΔR2* (Ogawa et al., 1993; Boxerman et al., 1995), and this is clearly stated in the manuscript (introduction, methods, results and discussion) and demonstrated in Figures 2A, B, and Supp. Figs. 2, 3, and 4. The primary concern, as the reviewer also noted, is whether we have sufficiently minimized the contribution of large vessels in our parcellated data analysis.

      At the parcellated level, we used the median value to avoid skewness in the data distribution, which primarily arises from large vessels, as regions near these vessels exhibit higher ΔR2*. The skewness of ΔR2* is also visible in Figure 1F, G. While this approach mitigates the large-small vessel issue, it does not entirely resolve it, as a slight linear increase toward the cortical surface remains (in all parcels). This is likely due to our inability to delineate all penetrating vessels, as shown in Figure 2E, and because contrast agents accumulate toward superficial layers, where blood originates from and returns to the pial surface. To mitigate this issue, we detrended the parcellated profiles across layers, obtaining results similar to the ground-truth measures of vascularity in V1-V5 and CO histology in V1.
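
      The two steps described in this response (a per-parcel median, then detrending across layers) can be sketched as follows. This is only an illustration under stated assumptions: the function names and toy values are hypothetical, the trend is assumed to be a straight line fitted by ordinary least squares over layer index, and the parcel mean is kept so that differences between parcels are preserved. The actual procedure used in the manuscript may differ.

```python
from statistics import fmean, median

def parcel_layer_profile(values_by_layer):
    """Median value per layer for one parcel.

    values_by_layer: one list of vertex values per cortical layer.
    The median limits the influence of high values near large vessels.
    """
    return [median(layer_values) for layer_values in values_by_layer]

def detrend_linear(profile):
    """Remove a least-squares linear trend across layers, keeping the mean.

    Assumption: layers are equally spaced, so the layer index is the regressor.
    """
    n = len(profile)
    if n < 2:
        return list(profile)
    depths = range(n)
    x_mean = fmean(depths)
    y_mean = fmean(profile)
    sxx = sum((x - x_mean) ** 2 for x in depths)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(depths, profile))
    slope = sxy / sxx
    return [y - slope * (x - x_mean) for x, y in zip(depths, profile)]

# Toy parcel with three layers; one vertex in the top layer sits near a vessel.
toy_parcel = [[3.0, 3.2, 9.0], [2.0, 2.1, 2.2], [1.0, 1.1, 0.9]]
profile = parcel_layer_profile(toy_parcel)
flattened = detrend_linear(profile)
```

      In the toy parcel the outlier of 9.0 does not shift the top-layer median, and the remaining surface-to-depth slope is removed by the detrending step.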

      (2) High-resolution MRI with a critical sampling frequency estimated from previous studies (Weber 2008, Zheng 1991) was performed to separate penetrating vessels, which is considered one of the major advancements in this study. However, this approach is still insufficient to accurately identify the number of vessels due to the blooming effects of susceptibility and insufficient spatial resolution. There was no detailed description of the detection criteria. More importantly, the number of observable penetrating vessels is dependent on imaging parameters and the dose of the contrast agent. If imaging slices were obtained in parallel to the cortex with higher in-plane resolution, it would likely improve the detection of penetrating vessels. Using higher-field MRI would further enhance the detection of penetrating vessels. Therefore, the reported value is only applicable to the experimental and processing conditions used in this study. Detailed selection criteria should be mentioned, and all potential pitfalls should be discussed.

      We believe that Figure 2 represents a significant conceptual and data analysis advancement in the field of vascular imaging. To the best of our knowledge, this is the first MRI study attempting to assess vessel density across cortical layers and compare the number of vessels to the known ground-truth. While we do not claim to have achieved a perfect solution (as shown in Figure 2), we offer a robust challenge to the imaging community by introducing this novel benchmarking approach. Our hope is that this conceptual framework will inspire the MR imaging community to tackle this challenge.

      Regarding imaging parameters, TE did not have much effect on our results, with a slight effect observed in the superficial layers due to the presence of large pial vessels (blooming effect; Fig. 2C). This also suggests that similar results could be achieved by changing the contrast agent dose, though there are, of course, CNR requirements and limitations at either end of the spectrum.

      We completely agree with the reviewer that spatial resolution is critical in resolving the arterio-venous networks, and we have dedicated significant attention to this topic in the introduction, results and discussion sections. We also agree with the reviewer that if imaging slices were obtained in parallel to the cortex with higher in-plane resolution, it would improve the detection of vessels. However, while this approach is ideal for counting vessels in a single plane and isolated region of cortex, it is less suited to the surface mapping of vessels, which is the focus of our study.

      Regarding the exclusion of vessels, based on visual comparison of vessels in volume space, Frangi-filter detection of vessels in volume space, and surface detection of vessels, we found no evidence to develop additional exclusion criteria (Supp. Fig. 3). On the contrary, we identified a number of false negatives in both the surface maps and volume maps. Notable exceptions to this rule seemed to occur at premotor areas F2 and F3 (Matelli et al., 1984; Patterns of cytochrome oxidase activity in the frontal agranular cortex of the macaque monkey). In these regions, we observed peculiar “pockets” of signal drop-out in equivolumetric layers 4-5. It is unclear what these signal-voids represent but it is interesting to note that these cortical areas F1-F5 were originally delineated by distinct CO+ positive large cells (Matelli et al., 1984).

      (3) Attempts to obtain pial vascular structures were made (Figure 2). As mentioned in this manuscript, the blooming effect of susceptibility contrasts is problematic. In the MRI community, T1-based Gd contrast agents have been used for mapping large vasculature, which is a better approach for obtaining pial vascular structures. Alternatively, computer tomography with a blood contrast agent can be used for mapping blood vasculature noninvasively. This issue should be discussed.

      We agree with the reviewer that T1-based contrast agents may offer more precise direct localization of large vessels in the pial vasculature. However, the primary focus of our study was not on visualizing pial vascular structures, but rather on measuring vascular volume across cortical layers. For this purpose, we opted to use ferumoxytol, which provides superior T2*-contrast and an approximately ten times longer plasma half-life than gadolinium. While we anticipated artifacts from the pial network, we developed a novel method to indirectly map these long-distance susceptibility artifacts arising from large vessels onto the cortical surface (Fig. 2A). If the goal were to specifically visualize pial vessels, we applaud the high-resolution TOF angiography developed for direct vessel visualization (Bollmann et al., 2022; https://doi.org/10.7554/eLife.71186).

      Changes in text:

      “4.1 Methodological considerations - vessel density informed MRI

      While the pial vessels can be directly visualized using high-resolution time-of-flight MRI (Bollmann et al., 2022), and computed tomography (Starosolski et al., 2015), imaging of the dense vascularity within the large and highly convoluted primate gray matter presents other formidable challenges. Here, we used a combination of ferumoxytol contrast agent and cortical layer resolution 3D gradient-echo MRI to map cerebrovascular architecture in macaque monkeys. These methods allowed us to indirectly delineate large vessels and indirectly estimate translaminar variations in cortical microvasculature.”

      (4) Since baseline R2* is related to baseline R2, vascular volume, iron content, and susceptibility gradients, it is difficult to correlate it with physiological parameters. Baseline R2* is also sensitive to imaging parameters; higher spatial resolution tends to result in lower R2* values (closer to the R2 value). Therefore, baseline R2* findings need to be emphasized.

      We agree with the reviewer's comment on the complexity of correlating baseline R2* with vasculature, given its sensitivity to multiple factors such as venous oxygenation, iron content, and imaging parameters such as image resolution. While our study focuses on vascular measurements, one could also highlight iron’s role in brain energy metabolism. Deoxygenated blood affects R2*, iron in oligodendrocytes supports myelination and neuronal signaling, and iron’s role in cytochrome c oxidase during electron transport impacts mitochondrial energy production. These metabolic factors collectively affect baseline R2* and link it to vasculature. Though quantitative susceptibility mapping (QSM) could help differentiate these different factors, it is beyond the scope of this study.

      (5) CBV-weighted deltaR2* is correlated with various other metrics (cytoarchitectural parcellation, myelin/receptor density, cortical thickness, CO, cell-type specificity, etc.). While testing the correlation between deltaR2* and these other metrics may be acceptable as an exploratory analysis, it is challenging for readers to discern a causal relationship between them. A critical question is whether CBV-weighted deltaR2* can provide insights into other metrics in diseased or abnormal brain states. If this is the case, then high-resolution deltaR2* will be useful. Please comment on this possibility.

      We agree with the reviewer that correlating deltaR2* with other metrics, such as myelin and cortical thickness, receptors and interneuron types, remains exploratory. Establishing causal relationships requires advanced multivariate analysis across cortical layers, but mapping histological stains to cortical layers is still under development. While this exploratory approach is promising, the ability to apply these insights to diseased or abnormal brain states is not yet clear. Layer-specific analysis of vasculature and function in disease is a future goal, and ongoing work aims to expand this line of inquiry. For now, while high-resolution deltaR2* may indeed offer diagnostic potential, we prefer to refrain from overstating its clinical utility at this stage. We agree that multimodal studies integrating neuroanatomy, function, and vascular metrics will be valuable for deeper insights into brain abnormalities.

      Changes in text:

      “4.3 The vascular network architecture is intricately connected to the neuroanatomical organization within cerebral cortex

      …To comprehensively understand the factors contributing to the vascular organization of the brain, experimental disentanglement through multivariate analysis of laminar cell types and receptor densities is needed (Hayashi et al., 2021, Froudist-Walsh et al., 2023).”

      (6) There is no discussion about the deltaR2* difference across subcortical areas (Figure 1). This finding is intriguing and warrants a thorough discussion in the context of the cortical findings.

      We thank the reviewer for this comment. We have expanded discussion on subcortical structures:

      Section 4.3, 1st paragraph:

      “In the cerebral cortex, neurons account for a significant portion (≈80-90%) of energy demand, with most of this energy allocated to signaling (≈80%) and maintaining membrane resting potentials (≈20%) (Attwell and Laughlin, 2001; Howarth et al., 2012). Since firing frequency is modulatory and the neural networks utilize distributed coding, the maintenance of resting-state membrane potential determines the minimal energy budget and the lower-limit for cerebral perfusion. Based on neuronal variability and energy dedicated to maintaining surface potential, this suggests an approximate (4 × 20% ≈) 80% variation in CBF and a resultant 25% variation in CBV across the cortex, in line with Grubb's law (CBV = 0.80 × CBF<sup>0.38</sup>) (Grubb et al., 1974). In the cerebellar cortex, neuron density is higher, and the resting potentials are thought to account for more than 50% of energy usage (Howarth et al., 2012), aligning with its higher vascular volume compared to the cerebral cortex (Fig. 1F). However, this is a simplified estimation, and a more comprehensive assessment would need to consider an aggregate of biophysical factors such as…”
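
      The step from an 80% variation in CBF to a 25% variation in CBV follows from the power law quoted in the passage above. A short numerical check, assuming the coefficient 0.80 and exponent 0.38 given there and an arbitrary baseline flow (only the ratio matters; the function name is ours):

```python
def grubb_cbv(cbf, coefficient=0.80, exponent=0.38):
    """CBV predicted from CBF by the power law quoted above (Grubb et al., 1974)."""
    return coefficient * cbf ** exponent

baseline_cbf = 1.0   # arbitrary units; the coefficient cancels in the ratio
cbf_ratio = 1.80     # an 80% higher CBF, as estimated in the passage

cbv_ratio = grubb_cbv(baseline_cbf * cbf_ratio) / grubb_cbv(baseline_cbf)
cbv_variation_percent = (cbv_ratio - 1.0) * 100.0
print(f"CBV variation: {cbv_variation_percent:.0f}%")  # roughly 25%
```

      Because the coefficient cancels, the result depends only on the exponent: 1.80 raised to the power 0.38 is about 1.25.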

      Section 4.3, 4th paragraph:

      “When viewed in terms of information flow, CBV appears to decrease along the canonical circuit pathway (e.g., L4→L2/3→L5) in the primary visual cortex (Douglas and Martin, 2007) and as one ascends the hierarchy (e.g., V1→V2→V3&4→MT→7A) from primary sensory areas (Fig. 3F, Supp. Fig. 8) (Felleman and Van Essen, 1991; Markov et al., 2014). A similar pattern is observed in the auditory hierarchy, where the inferior colliculus, an early processing hub, exhibits the highest vascular volume, followed by a gradual reduction along cortical auditory ‘where’ and ‘what’ pathways (Fig. 1F, Fig. 3B).”

      (7) Figure 3 is missing. Several statements in the manuscript require statistics (e.g., bimodality in Figure 2D, Figure 3F).

      We apologize to the reviewer for the absence of Figure 3 in the initial submission.

      As for statistical testing of bimodality, we respectfully disagree and feel that this would not add much value to the manuscript. We think a descriptive, rather than rigorous, approach is sufficient in this context.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents a new approach for non-invasive, MRI-based measurements of cerebral blood volume (CBV). Here, the authors use ferumoxytol, a high-contrast agent, and apply specific sequences to infer CBV. The authors then move to statistically compare measured regional CBV with the known distribution of different types of neurons, markers of metabolic load, and others. While the presented methodology captures an estimated 30% of the vasculature, the authors corroborated previous findings regarding the lack of vascular compartmentalization around functional neuronal units in the primary visual cortex.

      Strengths:

      Non-invasive methodology geared to map vascular properties in vivo.

      Implementation of a highly sensitive approach for measuring blood volume.

      Ability to map vascular structural and functional vascular metrics to other types of published data.

      Weaknesses:

      The key issue here is the underlying assumption about the appropriate spatial sampling frequency needed to capture the architecture of the brain vasculature. Namely, ~7 penetrating vessels / mm2 as derived from Weber et al 2008 (Cer Cor). The cited work begins by characterizing the spacing of penetrating arteries and ascending veins using a vascular cast of 7 monkeys (Macaca mulatta, same as in the current paper). The ~7 penetrating vessels / mm2 are computed by dividing the total number of identified vessels by the area imaged. The problem here is that all measurements were made in a "non-volumetric" manner and only in V1. Extrapolating from here to the entire brain seems like an over-assumption, particularly given the region-dependent heterogeneity that the current paper reports.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      - For broader readership, it would be beneficial to provide a guide on how to interpret baseline R2* versus ΔR2*.

      The text was edited as follows:

      “…For quantitative assessment, R<sub>2</sub>* values were estimated from multi-echo gradient-echo images acquired both before and after the administration of ferumoxytol contrast agent (Table 1). Subsequently, the baseline R<sub>2</sub>* and ΔR<sub>2</sub>*, an indirect proxy measure of CBV (Boxerman et al., 1995), volume maps for each subject were mapped onto the twelve native equivolumetric layers (ELs) (Fig. 1C). Each vertex was then corrected for the orientation of the cortical normal relative to the B<sub>0</sub> direction (Supp. Fig. 1). Surface maps for each subject were registered onto a Mac25Rhesus average surface using cortical curvature landmarks and then averaged across the subjects (Fig. 1D, E). Around cortical midthickness, the distribution of R<sub>2</sub>*, an aggregate measure for ferritin-bound iron, myelin content and venous oxygenation levels (Langkammer et al., 2012), resembled the spatial pattern of ΔR<sub>2</sub>* vascular volume. However, across cortical layers, these measures exhibited reversed patterns: R<sub>2</sub>* increased toward the white matter surface, whereas ΔR<sub>2</sub>* decreased (Fig. 1E, G).”

      - The legends in Figure 1 describe green/cyan arrows, which are not visible in the figure itself.

      We thank the reviewer for noting this discrepancy. The reference to green/cyan arrows was removed from the Figure 1 legend.

      - There are typos in Section 3.3: "(Figure 4A, E)" and "(cluster 3; Figure 3)" should be corrected to Figure 5.

      We thank the reviewer for noting this error. The references to the Figures were corrected.

      Reviewer #2 (Recommendations for the authors):

      The work is elegantly presented and very easy to follow. The figures and the data presented there are compelling and well-organized. I have enjoyed reading the paper, despite my disagreement with the validity of the methodology presented.

      Validation against MRA methods (high resolution needed here, Bolan et al 2006, cited also by the authors). Certainly, that work used a much higher magnetic field. This could be done through collaboration if such a magnet is not available. In my humble opinion, the current arguments provided in the paper as validation fall short in convincing future readers. Other TOF approaches might be better suited (in combination with line scanning or single plane sequences) for the 3T used in this work.

      We appreciate the reviewer’s suggestion regarding time-of-flight (TOF) angiography at ultra-high magnetic fields, such as 9.4T for improved visualization of fast-flowing blood in arterial vessels, as elegantly demonstrated in Bolan et al., 2006. However, our focus was on mapping vasculature across cortical layers and TOF is not optimal for imaging slow capillary blood inflow. To enhance CNR also at capillary level, we used ferumoxytol-contrast agent to create quantitative CBV-weighted cortical layer maps (Boxerman et al., 1995).

      We are open to collaborative opportunities to revisit this work using ultra-high magnetic field strengths and more detailed neuroanatomical ground-truth measures. However, the recommended line scanning or single-plane sequences, at least on first impression, seem inadequate for whole-brain coverage and cortical surface mapping.

      Some of the methodology can be made more accessible to non-MRI readers. For example, a more elaborate explanation of R2* and ΔR2 could benefit future readers.

      Elaborated as requested (see above reply).

      A more detailed discussion of the limitations of the methodology could also be beneficial here. Explain the potential implications of under-sampling denser vascular areas (i.e. with potentially more than 7 penetrating vessels per mm2).

      V1, with its highest neuronal density, likely also has the highest feeding/draining vessel density. Based on this, we hypothesized that a 0.23 mm isotropic image resolution would sufficiently capture cortical arterio-venous networks, but we did not achieve the expected detection of 7 penetrating vessels per mm<sup>2</sup>. Consequently, we refrained from quantifying vessel density in other areas, albeit we did report the total vessel count.

      This under-sampling likely biases our ΔR2* estimates, skewing them toward larger vessels. To address this, we used median parcel values to avoid over-representing large vessels (the long tail in the ΔR2* parcel data distribution represents large vessels) and corrected for the cortical surface bias where blood originates from and returns to the pial network. These steps helped mitigate large vessel bias as described in the methods, results and discussion (see also our response to Reviewer #1, question #1).

      To improve clarity for readers, we further clarified:

      Methods:

      “The effect of blood accumulation in large feeding arteries and draining veins toward the superficial layers was estimated using a linear model and regressed out from the parcellated ΔR<sub>2</sub>* maps.”

      Results:

      “To mitigate bias resulting from undersampling the large-caliber vessels (Fig. 2A, B), median parcel values were obtained and M132 parcellated ΔR2* profiles were then detrended across ELs in each subject and then averaged.”

      Discussion:

      “This methodology, however, has known limitations. First, gradient-echo imaging is more sensitized toward large pial vessels running along the cortical surface and large penetrating vessels, which could differentially bias the estimation of ΔR<sub>2</sub>* across cortical layers (Fig. 2A, 2B) (Boxerman et al., 1995; Zhao et al., 2006). Additionally, vessel orientation relative to the B<sub>0</sub> direction introduces strong layer-specific biases in quantitative ΔR<sub>2</sub>* measurements (Supp. Fig. 1C) (Ogawa et al., 1993; Viessmann et al., 2019; Lauwers et al., 2008). To address these concerns, we conducted necessary corrections for B<sub>0</sub>-orientation, obtained parcel median values and regressed out the linear trend, thereby mitigating the effect of undersampling large-caliber vessels across ELs (Fig. 2C, Supp. Fig. 1).”
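
      The two mitigation steps described above (taking the parcel median, then regressing out a linear trend across layers) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the function names, the array layout (vertices × layers) and the choice to remove both slope and intercept are assumptions made for the example.

```python
import numpy as np


def parcel_median_profile(vertex_values):
    """Median across vertices for each layer of one parcel.

    vertex_values: array of shape (n_vertices, n_layers).
    The median limits the influence of the few vertices dominated by large vessels.
    """
    return np.median(np.asarray(vertex_values, dtype=float), axis=0)


def detrend_across_layers(profile):
    """Remove a least-squares linear trend (slope and intercept) across layers."""
    profile = np.asarray(profile, dtype=float)
    depth = np.arange(profile.size)
    slope, intercept = np.polyfit(depth, profile, 1)
    return profile - (slope * depth + intercept)


# Synthetic parcel: 200 vertices, 12 layers, values falling linearly with depth.
rng = np.random.default_rng(0)
n_layers = 12
depth = np.arange(n_layers)
vertices = 10.0 - 0.5 * depth + rng.normal(0.0, 0.2, size=(200, n_layers))
vertices[:5] += 40.0  # a handful of vertices sitting on hypothetical large vessels

profile = parcel_median_profile(vertices)
residual = detrend_across_layers(profile)
```

      In this toy case the median profile stays close to the bulk of the vertices despite the outliers, and the detrended profile has no remaining linear dependence on layer index.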

      Please note, we are currently unable to create BALSA links to the figures due to maintenance issues at the data repository. As a result, we have opted to remove the links.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the effects of the explicit recognition of statistical structure and sleep consolidation on the transfer of learned structure to novel stimuli. The results show a striking dissociation in transfer ability between explicit and implicit learning of structure, finding that only explicit learners transfer structure immediately. Implicit learners, on the other hand, show an intriguing immediate structural interference effect (better learning of novel structure) followed by successful transfer only after a period of sleep.

      Strengths:

      This paper is very well written and motivated, and the data are presented clearly with a logical flow. There are several replications and control experiments and analyses that make the pattern of results very compelling. The results are novel and intriguing, providing important constraints on theories of consolidation. The discussion of relevant literature is thorough. In summary, this work makes an exciting and important contribution to the literature.

      Weaknesses:

      There have been several recent papers that have identified issues with alternative forced choice (AFC) tests as a method of assessing statistical learning (e.g. Isbilen et al. 2020, Cognitive Science). A key argument is that while statistical learning is typically implicit, AFC involves explicit deliberation and therefore does not match the learning process well. The use of AFC in this study thus leaves open the question of whether the AFC measure benefits the explicit learners in particular, given the congruence between knowledge and testing format, and whether, more generally, the results would have been different had the method of assessing generalization been implicit. Prior work has shown that explicit and implicit measures of statistical learning do not always produce the same results (eg. Kiai & Melloni, 2021, bioRxiv; Liu et al. 2023, Cognition).

      We agree that numerous papers in the Statistical Learning literature discuss how different test measures can lead to different results and, in principle, using a different measure could have led to varying results in our study. In addition, we believe there are numerous further factors relevant to this issue, including the dichotomous vs. continuous nature of implicit vs. explicit learning and the complexity of the interactions between the (degree of) explicitness of the participants' knowledge and the applied test method. These factors transcend a simple labeling of tests as implicit or explicit and strongly constrain the type of variation that the results of different tests would produce. Therefore, running the same experiments with different learning measures in future studies could provide additional interesting data with potentially different results.

      However, the most important aspect of our reply concerning the reviewer's comment is that although quantitative differences between the learning rate of explicit and implicit learners are reported in our study, they are not of central importance to our interpretations. What is central are the different qualitative patterns of performance shown by the explicit and the implicit learners, i.e., the opposite directions of learning differences for “novel” and “same” structure pairs, which are seen in comparisons within the explicit group vs. within the implicit group and in the reported interaction. Following the reviewer's concern, any advantage an explicit participant might have in responding to 2AFC trials using “novel” structure pairs should also be present in the replies of 2AFC trials using the “same” structure pairs and this effect, at best, could modulate the overall magnitude of the across groups (Expl/Impl.) effect but not the relative magnitudes within one group. Therefore, we see no parsimonious reason to believe that any additional interaction between the explicitness level of participants and the chosen test type would impede our results and their interpretation.

      Given that the explicit/implicit classification was based on an exit survey, it is unclear when participants who are labeled "explicit" gained that explicit knowledge. This might have occurred during or after either of the sessions, which could impact the interpretation of the effects.

      We agree that this is a shortcoming of the current design, and obtaining the information about participants’ learning immediately after Phase 1 would have been preferable. However, we made this choice deliberately, as the disadvantage of assessing the level of learning at the end of the experiment is far less damaging than the alternative of exposing the participants to the exit survey questions earlier and thereby letting them achieve explicitness, or otherwise influencing their mindset, through contemplating the survey questions before Phase 2. Our Experiment 5 shows how realistic this danger of unwanted influence is: with a single sentence alluding to pairs in the instructions of Exp 5, we could completely change participants' quantitative performance and qualitative response pattern. Unfortunately, there is no implicit assessment of explicitness we could use in our experimental setup. We also note that, given the cumulative nature of statistical learning, we expect that using an exit survey for this assessment only shifts absolute magnitudes (i.e. the fraction of people who would fall into the explicit vs. implicit groups) but not aspects of the results that would influence our conclusions.

      Reviewer #2 (Public Review):

      Summary:

      Sleep has not only been shown to support the strengthening of memory traces but also their transformation. A special form of such transformation is the abstraction of general rules from the presentation of individual exemplars. The current work used large online experiments with hundreds of participants to shed further light on this question. In the training phase, participants saw composite items (scenes) that were made up of pairs of spatially coupled (i.e., they were next to each other) abstract shapes. In the initial training, they saw scenes made up of six horizontally structured pairs, and in the second training phase, which took place after a retention phase (2 min awake, 12 h incl. sleep, 12 h only wake, 24 h incl. sleep), they saw pairs that were horizontally or vertically coupled. After the second training phase, a two-alternatives-forced-choice (2-AFC) paradigm, where participants had to identify true pairs versus randomly assembled foils, was used to measure the performance of all pairs. Finally, participants were asked five questions to identify, if they had insight into the pair structure, and post-hoc groups were assigned based on this. Mainly the authors find that participants in the 2-minute retention experiment without explicit knowledge of the task structure were at chance level performance for the same structure in the second training phase, but had above chance performance for the vertical structure. The opposite was true for both sleep conditions. In the 12 h wake condition these participants showed no ability to discriminate the pairs from the second training phase at all.

      Strengths:

      All in all, the study was performed to a high standard and the sample size in the implicit condition was large enough to draw robust conclusions. The authors make several important statistical comparisons and also report an interesting resampling approach. There is also a lot of supplemental data regarding robustness.

      Weaknesses:

      My main concern regards the small sample size in the explicit group and the lack of experimental control.

      The sample sizes of the explicit participants in our experiments are, indeed, much smaller than those of the implicit participants due to the process by which we obtain the members of the two groups. However, these sample sizes of the explicit groups are not small at all compared to typical experiments reported in Visual Statistical Learning studies; rather, they tend to be average to large. It is the sizes of the implicit subgroups that are unusually high due to the aforementioned data collection process. Moreover, the explicit subgroups have significantly larger effect sizes than the implicit subgroups, bolstering the achieved power, which is also confirmed by the reported Bayes Factors, which range from substantial to very strong in support of the “effect” or “no effect” conclusions in the various tests. Based on these statistical measures, we think the sample sizes of the explicit participants in our studies are adequate.

      As for the lack of experimental control, indeed, we could not fully randomize consolidation condition assignment. Instead, the assignment was a product of when the study was made available on the online platform Prolific. This method could, in theory, lead to an unobserved covariate, such as morningness, being unbalanced between conditions. We do not have any reasons to believe that such a condition would critically alter the effects reported in our study, but as it follows from the nature of unobserved variables, we obviously cannot state this with certainty. Therefore, we added an explicit discussion of these potential pitfalls in the revised version of the manuscript.

      Reviewer #3 (Public Review):

      In this project, Garber and Fiser examined how the structure of incidentally learned regularities influences subsequent learning of regularities, that either have the same structure or a different one. Over a series of six online experiments, it was found that the structure (spatial arrangement) of the first set of regularities affected the learning of the second set, indicating that it has indeed been abstracted away from the specific items that have been learned. The effect was found to depend on the explicitness of the original learning: Participants who noticed regularities in the stimuli were better at learning subsequent regularities of the same structure than of a different one. On the other hand, participants whose learning was only implicit had an opposite pattern: they were better in learning regularities of a novel structure than of the same one. This opposite effect was reversed and came to match the pattern of the explicit group when an overnight sleep separated the first and second learning phases, suggesting that the abstraction and transfer in the implicit case were aided by memory consolidation.

      These results are interesting and can bridge several open gaps between different areas of study in learning and memory. However, I feel that a few issues in the manuscript need addressing for the results to be completely convincing:

      (1) The reported studies have a wonderful and complex design. The complexity is warranted, as it aims to address several questions at once, and the data is robust enough to support such an endeavor. However, this work would benefit from more statistical rigor. First, the authors base their results on multiple t-tests conducted on different variables in the data. Analysis of a complex design should begin with a large model incorporating all variables of interest. Only then, significant findings would warrant further follow-up investigation into simple effects (e.g., first find an interaction effect between group and novelty, and only then dive into what drives that interaction). Furthermore, regardless of the statistical strategy used, a correction for multiple comparisons is needed here. Otherwise, it is hard to be convinced that none of these effects are spurious. Last, there is considerable variation in sample size between experiments. As the authors have conducted a power analysis, it would be good to report that information per each experiment, so readers know what power to expect in each.

      Answering the questions we were interested in required us to investigate two related but separate types of effects within our data: general above-chance performance in learning, and within- and across-group differences.

      Above-chance performance: As typical in SL studies, we needed to assess whether learning happened at all and which types of items were learned. For this, a comparison to the chance level is crucial and, therefore, one-sample t-test is the statistical test of choice. Note that all our t-tests were subject to experiment-wise correction for multiple comparisons using the Holm-Bonferroni procedure, as reported in the Supplementary Materials.

      Within- and across-group differences: To obtain our results regarding group and pair-type differences and their interactions, we used mixed ANOVAs and appropriate post-hoc tests as the reviewer suggested. These results are reported in the Methods section.

      Concerning power analysis, in the revised version of the manuscript we added analysis of achieved power for the statistical tests most critical to our arguments.
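
      For readers unfamiliar with the Holm-Bonferroni procedure mentioned above, the step-down logic can be sketched in a few lines of Python. This is a generic illustration, not the authors' analysis code: the function name is hypothetical and the p-values are made up for the example.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    P-values are visited from smallest to largest; the k-th smallest (k = 0, 1, ...)
    is compared with alpha / (m - k). Testing stops at the first non-rejection.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are also retained
    return reject


# Hypothetical p-values from four one-sample tests against chance.
example_p = [0.001, 0.04, 0.03, 0.2]
decisions = holm_bonferroni(example_p)
```

      With these made-up values only the first test survives correction: 0.001 is below 0.05/4, but the next smallest p-value, 0.03, exceeds 0.05/3, so the procedure stops there even though 0.03 and 0.04 would pass an uncorrected 0.05 threshold.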

      (2) Some methodological details in this manuscript I found murky, which makes it hard to interpret results. For example, the secondary results section of Exp1 (under Methods) states that phase 2 foils for one structure were made of items of the other structure. This is an important detail, as it may make testing in phase 2 easier, and tie learning of one structure to the other. As a result, the authors infer a "consistency effect", and only 8 test trials are said to be used in all subsequent analyses of all experiments. I found the details, interpretation, and decision in this paragraph to lack sufficient detail, justification, and visibility. I could not find either of these important design and analysis decisions reflected in the main text of the manuscript or in the design figure. I would also expect to see a report of results when using all the data as originally planned.

      We thank the reviewer for pointing out these critical open questions in our manuscript that need further clarification. The inferred “consistency effect” is based on patterns found in the data, which show an increase in negative correlation between test types during the test phase. As this is apparently an effect of the design of the test phase and not an effect of the training phase, which we were interested in, we decided to minimize this effect as far as possible by focusing on the early test trials. For the revised version of the manuscript, we revamped and expanded the discussion of how this issue was handled and also added a short comment in the main text, mentioning the use of only a subset of test trials and pointing the interested reader to the details.

      Similarly, the matched sample analysis is a great addition, but details are missing. Most importantly, it was not clear to me why the same matching method should be used for all experiments instead of choosing the best matching subgroup (regardless of how it was arrived at), and why the nearest-neighbor method with replacement was chosen, as it is not evident from the numbers in Supplementary Table 1 that it was indeed the best-performing method overall. Such omissions hinder interpreting the work.

      Since our approach provided four different balance metrics (see Supp. Tables 1-4) for each matching method, it is not completely straightforward to make a principled decision across the methods. In addition, selecting the best method for each experiment separately carries the suspicion of cherry-picking the most suitable results for our purposes. For the revised version, we expanded our description of the matching and decision process and added supplementary descriptive plots showing what our data look like under each matching method for each experiment. These plots highlight that the matching techniques produce qualitatively near-identical results and that picking one of them over the others does not alter the conclusions of the test. The plots give the interested reader all the necessary information to assess the extent to which our design decisions influence our results.
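
      As a point of reference for the matching discussion above, nearest-neighbor matching with replacement can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: it assumes matching on a single score per participant (for instance, Phase 1 performance), and the function name and the numbers are hypothetical.

```python
def match_with_replacement(target_scores, pool_scores):
    """For each target score, return the index of the closest score in the pool.

    Matching is done with replacement, so the same pool member may be chosen
    for more than one target.
    """
    matches = []
    for target in target_scores:
        closest = min(range(len(pool_scores)),
                      key=lambda j: abs(pool_scores[j] - target))
        matches.append(closest)
    return matches


# Hypothetical scores: a small target group matched against a larger pool.
target_group = [0.90, 0.88, 0.60]
pool_group = [0.50, 0.62, 0.91, 0.75]
matched_indices = match_with_replacement(target_group, pool_group)
```

      Here the first two targets are both matched to the same pool member (index 2), which is exactly what "with replacement" permits; matching without replacement would force the second target onto a more distant neighbor.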

      (3) To me, the most surprising result in this work relates to the performance of implicit participants when phase 2 followed phase 1 almost immediately (Experiment 1 and Supplementary Experiment 1). These participants had a deficit in learning the same structure but a benefit in learning the novel one. The first part is easier to reconcile, as primacy effects have been reported in statistical learning literature, and so new learning in this second phase could be expected to be worse. However, a simultaneous benefit in learning pairs of a new structure ("structural novelty effect") is harder to explain, and I could not find a satisfactory explanation in the manuscript.

      Although we might not have worded it clearly, we do not claim that our "structural novelty effect" comes from a “benefit” in learning pairs of the novel structure. Rather, we used the term “interference” and the lack of this interference. In other words, we believe that one possible explanation is that there is no actual benefit for learning pairs of the novel structure but simply unhindered learning for pairs of the novel structure and simultaneous interference for learning pairs of the same structure. Stronger interference for the same compared to the novel structure items seems a reasonable interpretation, as similarity-based interference is well established in the general (not SL-specific) literature under the label of proactive interference.

      After possible design and statistical confounds (my previous comments) are ruled out, a deeper treatment of this finding would be warranted, both empirically (e.g., do explicit participants collapsed across Experiment 1 and Supplementary Experiment 1 show the same effect?) and theoretically (e.g., why would this phenomenon be unique only to implicit learning, and why would it dissipate after a long awake break?).

      Across all experiments, the explicit participants showed the same pattern of results but no significant difference between pair types, probably due to the insufficient sample sizes available. We already included in the main text the collapsed explicit results across Experiments 1-4 and Supplementary Experiment 1 (p. 16). This analysis confirmed that, indeed, there was a significant generalization for explicit participants across the two learning phases. We could re-run the same analysis for only Experiment 1 and Supplementary Experiment 1, but due to the small sample of N=12 in Suppl. Exp. 1, this test would likely be completely underpowered. Obtaining a sufficient sample size for this one test would require an excessive number (several hundred) of new participants.

      In terms of theoretical treatment, we already presented our interpretation of our results in the discussion section, which we expanded on in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It would be very useful to add individual data points (and/or another depiction of the distribution) to the bar plots. If not in the main figures, as added figures in the supplement.

      We added violin plots for all results in the Supplementary.

      (2) It would be helpful to include in the supplement some examples of responses that led to the 'explicit' or 'implicit' classification. Specifically, what kind of response was considered to contain a partial recognition of the underlying structure vs. no recognition?

      We added example responses used for classification in the Supplementary.

      (3) It would be useful to show the results of Experiment 5 as well as the diagonal version as supplemental figures.

      We added the requested figures in the Supplementary.

      Typos: page 10: "in in the tests", page 15: "rerun"

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      (1) My strongest reservation relates to the small sample size in the explicit group. The authors do report stats for all experiments together in one analysis and I think this is the only robust finding for this group. I would suggest removing any comparisons between this smaller group and the larger implicit group since they do not make a lot of sense due to the imbalance in sample size in my opinion. If they do want to report the explicit group individually for each experiment, they should at least test for differences between the experiments also for this group using ANOVA.

      We do agree that the unbalanced nature of the sample sizes can be problematic for the between-group comparisons. The t-tests reported for between-group comparisons are in fact Welch’s t-tests, which are better suited for unequal sample sizes and variances. Previously, we failed to report that these were Welch’s t-tests, which we fixed in the revised version.
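
      To make the distinction concrete, Welch's t-test differs from Student's t-test in that it does not pool the two variances and uses the Welch-Satterthwaite approximation for the degrees of freedom. The following Python sketch computes both quantities using only the standard library; the function name and the accuracy values are hypothetical and serve only to illustrate two groups of unequal size.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of each group mean (unpooled sample variances).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical accuracy scores: a small group and a larger group.
small_group = [0.90, 0.80, 0.85, 0.95]
large_group = [0.50, 0.60, 0.55, 0.45, 0.50, 0.65, 0.40, 0.55]
t_stat, dof = welch_t(small_group, large_group)
```

      The resulting degrees of freedom always lie between the smaller group's n − 1 and the pooled value n<sub>a</sub> + n<sub>b</sub> − 2, which is how the test adjusts for unequal group sizes and variances.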

      In the Supplementary, we previously reported an ANOVA including all explicit participants from all experiments. This showed significant main effects of Experiment and test type, but no significant interaction. We take this as evidence that although specific levels of learning vary by experimental condition, the overall pattern of learning (i.e. which pairs are learned better) is the same across all experiments.

      (2) Moreover, the explicit group does not only differ in the explicitness of their memory but also regarding learning performance per se (as evidenced by performance differences for the first training). This important confound needs to be acknowledged and discussed more thoroughly!

      We agree that this topic is important; this is why the subsection “The Type of Transfer Depends on Quality of Knowledge, Not Quantity of Knowledge” deals exclusively with this issue. See our reply to the next point.

      (3) The resampling approach is somewhat interesting to solve the issue raised in 2. However, I doubt that the authors actually achieve what they are claiming. Since we have a 2-AFC task the possibility must be considered that participants who chose correctly in the implicit group did so by chance. This means that the assumption that the matched pairs actually have the same amount of memory for the first training period as the explicit group is likely false. Therefore, this analysis is still comparing apples and oranges.

      We address this idea in detail in the supplementary materials, first by pointing out that the matched results showed the same pattern as the full results, suggesting that Phase 1 and Phase 2 results are independent for this group, and second by arguing that a randomly selected subset of participants should not show a significant deviation from null performance in the Same vs. Novel comparison in Phase 2.

      (4) One important issue, when conducting online experiments is assuring random allocation of participants. How did the authors recruit participants to ensure they did not select participants for the different experiments that differed regarding their preference for wake vs. sleep retention intervals? If no care was taken in this regard, I would suggest reporting this and maybe briefly discussing it.

      This shortcoming was now reported and addressed in the discussion section of the revised manuscript.

      (5) I could not find any information about the exact questions that were asked about the task rules. Also, there was no information on how the answers were used to assign groups. Both should be added.

      The exact questions were added to the revised Supplementary.

      (6) I think that the literature on sleep and rule extraction is well-represented in the manuscript. However, I think also referring more thoroughly to the literature on how sleep leads to gist extraction, schemas, and insight would help understand the relevance of the present research.

      We subsumed references to the mentioned areas of research under the labels of abstraction and generalization. In the revised section, we listed the appropriate labels along with the already used references to make the connection to a vast literature treating generalization in related but distinct ways more explicit.

      (7) It is unclear to me why the items learned in the first learning phase interfere with those learned in the second learning phase (without sleep) and not vice versa. What is the author's explanation for this?

      We added a paragraph on this to our revised discussion section. In short, there may also be retroactive interference. However, we would need yet another variation of the paradigm to properly measure it, and this was outside the scope of the current work.

      (8) As far as I can tell the study lacks all of the usual control tasks that are used in the field of sleep and memory (especially subjective sleepiness and objective vigilance). In addition, this research has the circadian confound, and therefore additional controls would have been warranted, e.g., morningness-eveningness, retrieval capabilities. Also, performance immediately after training phase 1 was not tested, which would serve as an important control for circadian differences in initial learning of the rule.

      The study uses a number of the control measures established in the sleep and memory literature, such as habitual sleep quality and sleep quality during the night of and the night before the experiment. However, there are, of course, more potentially interesting measures, such as the ones named by the reviewer.

      Testing performance right after training phase 1 would have been very interesting indeed. However, due to the nature of statistical learning tasks, this would have completely confounded the implicitness of learning by presenting participants with segmented input; i.e. isolated pairs. Therefore, we opted for the lesser of two evils in our design decision.

      (9) As far as I can tell, there is no effect of sleep on correctly identifying pairs from training phase 1. This would be expected and thus should be discussed.

      As noted and referenced in the discussion section, the effect of sleep on statistical learning per se is a subject of controversy in the literature, where some studies apparently find effects, while others find no effect on statistical learning whatsoever.

      (10) The manuscript should explicitly mention if the study was preregistered.

      It was not.

      Reviewer #3 (Recommendations For The Authors):

      The topic of this project is close to my heart, and I commend the authors for conducting numerous variations of the experiment with large sample sizes. I have some suggestions I feel will make the paper stronger, and a few minor comments that caught my eye during reading:

      (1) First and foremost, I found the paper's structure cumbersome. For instance, different aspects of Experiment 1 results are reported in (1) the main text, (2) under methods, and (3) in Supplementary. This makes reading unnecessarily difficult. This relates not only to the analysis results - the sample size is reported as 226 in the main text, 226+3 in Methods, and 226+3+19 in Supplementary. I strongly suggest removing all results from the Methods section and merging the supplementary results with the main results.

      We overhauled the structure of the paper, moving much more information into the proper method section and out of the Supplementary.

      (2) "Attention checks" and "response bias" appear first in Supplementary Experiment 1 but are explained only later under Experiment 1. The same thing for the experimental procedure. I therefore suggest placing Experiment 1 before Supplementary Experiment 1, but related to my previous comment - have one paragraph dedicated to Subject Exclusion of all experiments.

      The new structure of the Method sections solves this.

      (3) Figure 4 is mentioned but does not appear in the manuscript.

      This has been fixed. The paragraph in question now references the correct supplementary figure.

      (4) OSF project includes only data with no README file on how to understand the data. The work would also benefit from sharing the experimental and analysis codes.

      A README file was added.

      (5) This sentence is repeated in relation to four experiments: "Bayes Factors from Bayesian t-tests for implicit participants reported for experiments 1, 2, and 3 used an r-scale parameter of 0.5 instead of the default √2/2, reflecting that Experiment 1 found small effect sizes for this group". First, it is missing an explanation of what the r-scale means. Second, it sounds as if this was a product of the procedure, but in fact it was a decision by the researcher if I am correct. If so, it is missing a description of how and why this choice was made.

      This was indeed a decision by the researchers, in line with a Bayesian logic of evidence accumulation. We made the explanation in the paper clearer.
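
      As a rough illustration of what the r-scale changes, the sketch below assumes (this is not stated in the response itself) that the r-scale is the scale of a zero-centred Cauchy prior on the standardized effect size, as in the commonly used default Bayesian t-test. Under that assumption, a smaller r places more prior mass on small effects. The function name `cauchy_mass_within` and the cut-off of 0.3 are hypothetical.

```python
import math


def cauchy_mass_within(d, r):
    """Prior probability that |delta| < d under a Cauchy(0, r) prior."""
    return (2.0 / math.pi) * math.atan(d / r)


DEFAULT_R = math.sqrt(2) / 2  # default scale mentioned by the reviewer
NARROW_R = 0.5                # scale chosen for the implicit group

# Compare how much prior mass each scale puts on small effects (|delta| < 0.3).
for r in (DEFAULT_R, NARROW_R):
    mass = cauchy_mass_within(0.3, r)
    print(f"r = {r:.3f}: P(|delta| < 0.3) = {mass:.3f}")
```

      By construction, half of the prior mass lies within |delta| < r, so moving from √2/2 to 0.5 encodes the expectation of smaller effects that Experiment 1 suggested for this group.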

      (6) Did I understand correctly that each pair was tested 4 times? Was it against the same foil? Did you make sure not to repeat the same pair in back-to-back trials? These details, in addition to what I noted in the public review, are needed.

      Each pair was tested 4 times. Each time against a different foil pair. Details have been added to the Method section.

      (7) Also in relation to my public review, I could not understand why the sample size was overshot by so much in Experiment 1 (229 instead of 198.15)?

      The calculated sample size of 198.15 was for the implicit subgroup alone, while 229 included explicit and implicit participants.

      (8) The correlation between phase 1 and phase 2 is only tested in explicit participants. Why is that? A test in implicit participants is needed for completeness.

      Correlations for implicit participants have been added.

      (9) There is a known asymmetry between the horizontal and vertical planes in our visual system (with a preference for horizontal stimuli). I was missing a comparison between learning in the two structures, and a report of how many participants received either in Phase 1.

      The allocation of participants to horizontal and vertical conditions was balanced. In the Method section we already report an ANOVA testing for a potential effect of orientation condition, which was not significant.

      Minor/aesthetic comments:

      (1) "In Phase 2, explicit participants performed above chance for learning pairs that shared their higher level orientation structure with that of pairs in Phase 1". This sounds as if there was a separate test following the two learning phases. Perhaps reword to "for phase 2 pairs".

      Fixed

      (2) "the two asleep-consolidation groups (Exp. 3 and 4)" - I think you mean Exp. 2 and 4.

      Fixed.

      (3) "acquiring explicitness in Experiment 5 as compared to 1" I think you mean Supplementary Experiment 1 as compared to 1.

      Fixed

      (4) "without such a redescription, the previously learned patterns in Phase 1 interfere with new ones in Phase 2, when redescription occurs..." The comma should be a dot.

      Fixed

      (5) In Experiment 4, did 168 or 169 participants survive exclusion? Both accounts exist, and so do reports of degrees of freedom that allow both 23 and 24 explicit participants.

      Fixed.

      (6) "Implicit learners also performed above chance.." in Experiment 2 is missing (n=XX).

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      We are grateful to the reviewers and the editorial team for their feedback and thorough revisions of our paper. We also appreciate their acknowledgement that this study represents a significant advancement in the field of reproductive neuroendocrinology and offers insights on the contribution of obesity vs melanocortin signaling in women’s fertility. In the revised version, we will provide a more detailed clarification of the data and methodology and adhere to the reviewers’ suggestions.

      Please find below our answers to specific concerns in the public review:

      Given the fact that mice lacking MC4R in Kiss1 neurons remained fertile despite some reproductive irregularities, the overall tone and some of the conclusions of the manuscript (e.g., from the abstract: "... Mc4r expressed in Kiss1 neurons is required for fertility in females") were overstated. Perhaps this can be described as a contributing pathway, but other mechanisms must also be involved in conveying metabolic information to the reproductive system.

      We will tone down these statements throughout the manuscript to indicate that MC4R in Kiss1 neurons plays a role in the metabolic control of fertility (rather than “…is required for fertility”)

      The mechanistic studies evaluating melanocortin signalling in Kiss1 neurons were all completed in ovariectomised animals (with and without exogenous hormones) that do not experience cyclical hormone changes. Such cyclical changes are fundamental to how these neurons function in vivo and may dynamically alter the way they respond to neuropeptides. Therefore, eliminating this variable makes interpretation difficult.

      Mice lack true follicular and luteal phases and therefore it is impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). Therefore, we use an ovariectomized female model in which we can generate an LH surge with an E2-replacement regimen [1]. This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Inclusion of cycling females would make interpretation much more difficult.

      (1) Bosch et al., 2013 Mol & Cell Endo; https://doi.org/10.1016/j.mce.2012.12.021

      Use of the POMC-Cre to target optogenetic inputs to Kiss1 neurons might have targeted a wider population of cells than intended.

      POMC is transiently expressed during embryonic development in a portion of cells fated to be Kiss1 or NPY/AgRP neurons [1-2]. Therefore, this is a valid concern when crossing with a floxed mouse. However, use of AAVs in adult animals avoids this issue and leads to specific expression in POMC neurons [3]. This POMC-Cre mouse has been used extensively with AAVs to drive specific expression in POMC neurons by other laboratories [4-7]. Therefore, we are confident that our optogenetic studies have narrowly targeted POMC inputs.

      (1) Padilla et al., 2010 Nat Med; https://doi.org/10.1038/nm.2126

      (2) Lam et al., 2017 Mol Metab; https://doi.org/10.1016/j.molmet.2017.02.007

      (3) Stincic et al., 2018 eNeuro; https://doi.org/10.1523/eneuro.0103-18.2018

      (4) Fenselau et al., 2017 Nat Neuro; https://doi.org/10.1038/nn.4442

      (5) Rau & Hentges, 2019 J Neuro; https://doi.org/10.1523/jneurosci.3193-18.2019

      (6) Fortin et al., 2021 Nutrients; https://doi.org/10.3390/nu13051642

      (7) Villa et al., 2024 J Neuro; https://doi.org/10.1523/jneurosci.0222-24.2024

      Recommendations for Authors

      We thank the reviewers and the editorial team for their comments and thorough revisions of our paper. We have now addressed their comments and edited the manuscript accordingly:

      Reviewer #1 (Recommendations For The Authors):

      L80 -This is an awkward sentence; it isn't an inverse agonist of the AgRP; this may read better just to say that the inverse agonist, AgRP.

      Thank you for this comment. This has now been changed in the text (L80).

      L86 - This text reads as if mice have an inherent obesity issue.

      This has also now been addressed in the text (L86).

      L131 - The numbers of digits past the decimal point should match for both mean and SEM.

      This has also now been addressed throughout the text.

      Figure 1D: Revise the bar graphs with distinct SEM bars, as these data are not generated within the same mice.

      The graphs are now changed, and they include distinct SEM and individual data points.

      Figure 2I-L - An n of 3 for controls is pretty minimal, though the clustering of data points is tight.

      We thank the reviewer for this comment. While we agree that an n=3 for controls is minimal, the mRNA level values of this group are close, and therefore the clustering of the data points is tight. We are happy to provide the raw data values for these groups if the reviewer wishes.

      L159 - The role of reduced dynorphin mRNA is pretty speculative with regard to basal levels of LH, especially since no other indices of LH secretion were affected. It should also be recognized that mRNA levels do not always equate to activity.

      We agree with the reviewer that our explanation of the role of reduced dynorphin in the elevated basal LH is speculative. However, we only report that the higher LH levels correlate with the lower expression of the Pdyn gene, which is in line with the well-documented role of dynorphin in inhibiting LH secretion. We also recognize that mRNA levels do not necessarily reflect activity. We have now added this statement to the text (L159).

      L164 - Given the ovary data, it seems that the increase seen in KO mice isn't quite sufficient, but is it known how much of a surge is necessary for ovulation in mice?

      We agree with the reviewer’s comment that the LH surge in the Kiss1MC4RKO group is not enough to consistently induce ovulation, which is supported by the decrease in the number of corpora lutea (Figure 2O).

      According to the literature, an LH surge in female mice is defined by an LH value >4 ng/ml (Bahougne et al., 2020). By this criterion, our data show that only two females out of six had an LH surge in the KO group, while four females out of five had an LH surge in the control group.
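
      The threshold rule described above can be sketched as follows. The peak LH values are hypothetical and were chosen only so that the counts match the proportions stated in this response (four of five controls, two of six KO females); they are not the study’s measurements, and the function name `count_surges` is an assumption.

```python
SURGE_THRESHOLD_NG_ML = 4.0  # criterion cited in the response (> 4 ng/ml)


def count_surges(peak_lh_values, threshold=SURGE_THRESHOLD_NG_ML):
    """Count animals whose peak LH strictly exceeds the surge threshold."""
    return sum(1 for value in peak_lh_values if value > threshold)


# Hypothetical peak LH values in ng/ml, one per animal (illustration only).
control_peaks = [6.1, 5.2, 4.8, 7.0, 2.9]
ko_peaks = [4.5, 5.0, 1.2, 2.0, 3.1, 0.9]

for label, peaks in (("control", control_peaks), ("KO", ko_peaks)):
    print(f"{label}: {count_surges(peaks)} of {len(peaks)} females with an LH surge")
```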

      L211 - According to the figure, LH pulses were not recovered and remained similar to KO levels. Looking at the LH secretory patterns presented, it seems like the pulse frequency data should be interpreted with some caution, given that some of the pulses identified are tenuous at best.

      We agree that the LH pulses identified by our software (criteria described in the methods) are variable in shape (LH pulses are difficult to detect clearly in gonad-intact females) and did not differ in number between groups; however, the reinsertion of Mc4r within Kiss1 neurons restored LH basal levels, amplitude and total secretory mass, which are clear indications of a significant improvement in the ability of these mice to release LH.

      L218 - Is there a reason why the surge was not looked at in these groups?

      Ovarian histology is the best indicator of ovulation. In these mice, corpora lutea were absent, indicating impaired ovulation; thus, we did not consider it necessary to perform an LH surge protocol.

      L244 - This would also fit with previous findings in sheep that not all Kiss neurons express MC receptors

      We agree with this comment.

      L329 - Given the rapidity of its actions, how would this membrane ER function during a normal surge?

      Rapid estrogen signaling can act to ease transitions between states. Membrane delimited E2 actions can quickly attenuate or enhance coupling between receptors and signaling cascades. These effects will precede E2-driven changes in gene expression that produce more stable alterations in signaling. This combination of mechanisms will reduce any lag between rises in serum E2 and physiological effects. Considering the abbreviated mouse reproductive cycle, parallel mechanisms acting on different timescales are particularly important.

      L365 - I'm a little confused as to how this particular work sheds light on a role for MC3R. Is the relative distribution of the two isoforms within Kiss neurons known?

      In the present study, we report that hypothalamic Mc3r expression decreases leading up to the age of puberty onset (p30), in line with the profile of expression of Mc4r and a recent publication involving Mc3r in puberty onset (Lam et al., 2021), suggesting that both receptors may be involved in the control of reproductive function, potentially through the direct regulation of Kiss1 neurons as characterized in our present study.

      L422 - While I understand the nature of this statement, the receptor may simply reflect the activity of what binds to it, i.e., AgRP vs. alpha-MSH, suggesting that maybe the prepubertal period is more AgRP-dominated.

      We agree with this statement, and this needs to be further investigated.

      L495 - Reinsertion of Mc4R in Kiss1 neurons

      Thank you for this comment. This is now corrected in the text (L501).

      L524 - Bilateral ovariectomy of 6-month

      Thank you for this comment. This is now corrected in the text (L530).

      L538 - Is it known what stage of the cycle these mice were in when samples were collected?

      Yes, the samples were collected in diestrus. This is now mentioned in the text (L548)

      L556 - Pulse amplitude is usually measured relative to the preceding nadir.

      The method that we have been consistently using in our lab is the average of the 4 highest LH values in the sample collection period for each animal. We have found this to be consistent and representative of the overall amplitude (McCarthy et al., 2021; Talbi et al., 2021).
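
      A minimal sketch of this amplitude measure, as described in the sentence above, is given below. The function name `lh_amplitude` and the LH series are hypothetical and serve only to show the arithmetic.

```python
def lh_amplitude(lh_values, n_highest=4):
    """Mean of the n highest LH values in one animal's sampling series."""
    if len(lh_values) < n_highest:
        raise ValueError("need at least n_highest samples")
    top = sorted(lh_values, reverse=True)[:n_highest]
    return sum(top) / n_highest


# Hypothetical LH series (ng/ml) from one animal's sampling period.
series = [0.4, 1.8, 0.9, 0.5, 2.2, 1.1, 0.6, 1.9, 0.7, 2.1]
amplitude = lh_amplitude(series)
print(f"amplitude = {amplitude:.2f} ng/ml")
```

      Unlike the nadir-relative definition raised by the reviewer, this measure does not depend on identifying individual pulses, so it is unaffected by ambiguous pulse boundaries but also does not subtract the baseline.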

      L594 - This is a little confusing - the whole MBH would contain the ARH, but only the ARH was collected from the KO mice. If the whole MBH, dynorphin and Tac3, and Tac3 are expressed outside of the ARC, making interpretation of changes specifically within the ARH is difficult.

      Here (L592), we describe two different experiments, as mentioned by i) and ii).

      For experiment 1 (i): MBH was used in the WT mice at ages P10, P15, P22 and P30 to investigate the expression of the melanocortin genes (Agrp, Pomc, Mc3r and Mc4r).

      For experiment 2 (ii): In both KO and control groups, only the micro-dissected ARH was used to investigate expression of the genes Pdyn, Kiss1, Tac2 and Tacr3.

      Reviewer #2 (Recommendations For The Authors):

      The validation experiments for the various manipulations are currently presented in the supplementary data. Still, in my opinion, these are critically important for interpreting the data, and it should be considered to present these more comprehensively in the main body of the manuscript. In Figure S1, it seems that the exposure of the two images is not the same, with a higher background in the control. Has this image been adjusted to highlight the staining, while the other has not? It looks like there remains a low level of expression still present in at least some of the KO cells - this may reflect difficulties using RNAscope (with its extreme amplification) to detect the absence of a signal, or it could also be that the knockout is incomplete. A percentage of cells still express MC4R. I think this should be acknowledged or discussed.

      We thank the reviewer for the feedback. While we agree that the validation of the mouse model is critical, we would like to keep it in the supplemental data.

      We also agree that the exposure looks different between the KO and WT controls, and we thank the reviewer for this comment. The quality of the photograph decreased when transferring to the manuscript. This has now been improved in the revised figure.

      As for the MC4R expression in some of the KO cells, we believe that MC4R is expressed in non-Kiss1 cells, as shown in the merged figure. Therefore, we believe that the knockout of Mc4r in Kiss1 neurons is complete in these mice.

      The clear difference from the PVN's lack of effect is convincing and indicates that a specific knockout has been achieved. Is equivalent data also available for the AVPV population of cells that are examined later in the manuscript? Do those Kiss1 neurons also express the MC4R? The same question applies to the knock-in experiment: Was the expression of MC4R also driven in the AVPV population using this approach?

      Yes, Kiss1 neurons in the AVPV also express MC4R as indicated in this study, and thus Mc4r is removed/reinserted in the AVPV as well in this mouse model.

      The quantitative RT-qPCR data on developmental changes in metabolic signaling molecules are really peripheral to the paper's main question. Relative to the validation experiments (as discussed above), I think these are less important data and could be placed into a supplementary figure. The discussion of these data becomes problematic, e.g., on line 359, the changes are described as "a low melanocortin tone..." but this seems problematic when referring to reduced expression of AgRP, an inverse agonist at the MC4R. If you are going to present these data, individual data points should be shown. Similarly, the question about whether this is a PCOS-like phenotype is perhaps worth asking. Still, the simple assessment of T and AMH could also be reported in a sentence without necessarily showing the data (or placing it in a supplementary figure). Better to focus on the key question - which is the role of MC4R signaling in Kiss1 neurons.

      We understand this reviewer’s concerns. However, given the impact of MC4R signaling (particularly in the context of AgRP) on puberty, we strongly believe that the reader will benefit from the expression profile across ages, so we respectfully disagree and will keep these data in the main figure.

      Per this reviewer’s comment, we have now added individual data points to Figure 1D.

      We also agree with the reviewer that the T and AMH data are not in the main scope of the paper, but since we uncovered a PCOS-like phenotype in female mice with specific deletion of Mc4r from Kiss1 neurons, it is important to keep these data in the main figure to show that the phenotype does not fully resemble a PCOS model.

      Having praised the experimental design, I think it is fair to acknowledge that the reproductive data from these experiments remain difficult to interpret. I understand that it is difficult to illustrate estrous cycles, but the "quantitative" data on percentages of time spent in any one stage are not as informative as seeing the actual individual patterns in Figure 2B. Were all of the animals consistently like the one illustrated, with persistent diestrus and only occasional evidence of ovulation?

      We agree that Figure 2C may be difficult to interpret, but it is the best way to capture all the data points for each group.

      All 5 Kiss1MC4RKO females had persistent diestrus phases with only one or two estrus days over 15 days (except for one female who had 4 estrus days), compared to control females, who had 7 to 9 days of estrus, as shown in the graph (except for one female who had 5 days of estrus over the 15-day period).

      Given that LH pulses appear to be normal, does this, in fact, suggest an ovarian problem? Is that possible? Are MC4R and Kiss1 co-expressed in the ovary? Or do you think this suggests an ovulation problem, perhaps driven by the impaired LH surge?

      This reviewer is correct in that our findings suggest a central defect in ovulation based on the deficit observed in the preovulatory LH surge. It is possible to have normal LH pulses, which are driven by one population of Kiss1 neurons (ARH), together with an impaired LH surge, which is driven by a distinct population of Kiss1 neurons (AVPV).

      Similarly, the response to the "LH surge induction protocol" is impaired (why not look at endogenous LH surges?). It seems that ovulation should be an all-or-none phenomenon in that if the LH surge is sufficient to induce ovulation, then all available follicles would be ovulated. If it is not, then no follicles will be ovulated. Why fewer follicles are ovulated in the gene-targeted animals seems more likely to be due to impaired follicular development rather than a subthreshold LH surge. So, this again points back to the ovary. Or perhaps we need a more thorough assessment of the pattern of LH pulses throughout the cycles in these animals.

      An LH surge induction protocol allows us to submit all female mice to the same conditions and expect a similar response, which is optimal for comparison with animals with an expected ovulation deficit, as it eliminates external factors. We disagree that ovulation is an all-or-none phenomenon: in mice, numerous follicles mature at the same time, and thus a decrease in the number of ovulated oocytes may be significant between groups even if the animals are not completely infertile.

      Collectively, my assessment of these data is that there are effects on reproduction, but they are actually relatively subtle. There were abnormal cycles and impaired LH surge in response to exogenous estrogen. But the animals are not actually infertile, so can ovulate and express normal reproductive behavior. So while there is a role for MC4R signalling in Kiss1 neurons, it may be a contributing modulatory role rather than a major regulatory mechanism. I think the tone of the descriptions should reflect this. I like the way it is framed in some parts of the discussion ("reproductive impairments...mediated by MC4R in Kiss1 neurons and not by their obese phenotype"), but the overall significance of this is overstated in some places, such as the abstract and in other parts of the discussion ("this population is tightly controlled by melanocortins").

      As mentioned in previous responses, ovulation in mice is not all-or-nothing. While the mice can reproduce, the disruption of the central mechanisms that control ovulation and the irregular estrous cycles represent a significant advancement in the field, with strong translational potential for species in which only one oocyte is usually ovulated, such as humans, where reproductive disorders in MC4R patients had been attributed to the obesity phenotype rather than to a central action of MC4R (as the reviewer captured in their comment). This is one of the main findings of this study.

      The overstatement has now been addressed throughout the text.

      For in vitro studies, all mice were ovariectomized and given estradiol "replacement." What was the rationale for this? Wouldn't this suppress the basal activity of these neurons? Then it appears that some of the animals were studied as ovariectomised (for an unspecified time but apparently ">7 days") without hormone replacement. The basal activity of these cells would be dramatically different. I think these artificial manipulations make these data quite difficult to interpret. How does this reflect the situation in a normal (or abnormal) estrous cycle? My understanding is that the brain slice approach already compromises the ability of this population of cells to function as a coordinated network (i.e., coordinated episodes of activity that are seen in vivo have not been observed in vitro in brain slices). Ovariectomizing and providing exogenous hormones also removes the additional regulatory elements of the cyclical changes in hormone inputs, so the cells may or may not behave like they would in vivo. Perhaps the authors could justify their choice of experimental model.

      We have clarified that the mice were ovariectomized for 7-10 days. A group of 3 mice is ovariectomized at once and then used on subsequent days a week later. This delay is both for the recovery of the animal and to allow for “washout” of endogenous ovarian hormones. For optogenetic studies, we were not measuring basal activity. Rather, we prioritized the ability to detect a postsynaptic response. While E2 decreases the networked activity of Kiss1<sup>ARH</sup> neurons, the Hcn channels, calcium channels, and Vglut2 expression are all increased. This leads to increased excitability and more glutamate release. Mice lack true follicular and luteal phases and therefore it is impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). Therefore, we use an ovariectomized female model in which we can generate an LH surge with an E2-replacement regimen (Bosch et al., J Mol Cell Endocrinology 2013). This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Finally, we have documented that Kiss1<sup>ARH</sup> neurons retain the synchronization of their neuronal firing in the hypothalamic slice preparation (Qiu et al., eLife 2016).

      Figure 4E shows neurons' staining after expressing a Cre-dependent channel rhodopsin vector into POMC-Cre mice. The number of labelled cells looks markedly larger than expected for adult POMC neurons. Was the specificity of this approach to neurons expressing POMC checked? I understand that the POMC-Cre mice have been criticised for ectopic expression of Cre during development in other populations of neurons in the arcuate nucleus that does not express POMC, such as the AgRP neurons (e.g., PMID: 22166984). Is it possible that this is not a problem in adult animals? Has that been validated in these animals? The description of the method suggests that it is acknowledged that some of the expression driven in these animals might be in AgRP neurons. Still, optogenetic activation of these cells will include all cells expressing Cre at the time of AAV administration.

      POMC is transiently expressed during embryonic development in a portion of cells fated to be Kiss1 or NPY/AgRP neurons. Therefore, this is a valid concern when crossing with a floxed mouse. However, use of AAVs in adult animals avoids this issue and leads to specific expression in POMC neurons. This POMC-Cre mouse has been used extensively with AAVs to drive specific expression in POMC neurons by other laboratories (Padilla et al., Nat Med 2010; Lam et al., Mol Metab 2017; Stincic et al., eNeuro 2018; Fenselau et al., Nat Neuro 2017). We have previously shown that AAV-driven mCherry expression is limited to cells labeled with a beta-endorphin antibody (Stincic et al., eNeuro 2018). Therefore, we are confident that our optogenetic studies have narrowly targeted POMC inputs.

      Some additional explanation of the electrophysiology result may be required. For example, on Line 292, I'm confused by Fig 4M. Why is the response to 20Hz stimulation different in this cell (compared to the one in 4L) before administering naloxone? What proportion of cells showed this opposite response? On line 307: Is 5 cells sufficient for testing the POMC inputs onto AVPV and PeN Kiss1 neurons? How many slices/animals are included in collecting these 5 cells? The rapid action of STX illustrates the ability to modulate the response to MTII, but I am struggling to understand the implications of this in a physiological context. Suppose this response is desensitized by longer-term treatment with E2 (as indicated in the manuscript). Is it relevant to normal regulation during the cycle (particularly in the AVPV, where the key regulatory step seems to be the prolonged exposure to high estradiol as part of the preovulatory signals leading up to the LH surge)?

      As stated in the text, E2 has been shown to increase POMC expression and beta-endorphin immunostaining. We do not know the effects of E2 on αMSH expression and release. E2 also tends to attenuate the coupling between inhibitory postsynaptic metabotropic (Gi,o-coupled) receptors and signaling cascades. So, there is likely a combination of pre- and post-synaptic mechanisms contributing to these responses. However, the focus of the current studies was on the predominant melanocortin signaling and, as such, we chose to eliminate the influence of opioid signaling. We have added two more cells to this group, both of which were successfully rescued, for a total of 5 of 6 cells (6 slices, 5 animals). Between the labeling of beta-endorphin fibers and the high rate of rescue, we do believe that this is sufficient evidence to support a direct POMC input to Kiss1<sup>AVPV/PeN</sup> neurons.

      Line 52: "Here, we show that Mc4r expressed in Kiss1 neurons is required for fertility in females." The knockout animals remain fertile, so this conclusion needs to be re-worded.

      Thank you for this comment. This has now been changed (L52).

      Line 80: "The melanocortin 4 receptor (MC4R) binds α-melanocyte stimulating hormone (αMSH), an agonist product of the pro-opiomelanocortin (Pomc) gene, and the inverse agonist of the agouti-related peptide (AgRP) to regulate food intake and energy expenditure" Is this the correct wording? I think it should be stated that AgRP is an inverse agonist at the MC4R, not that αMSH is the inverse agonist of AgRP. Re-work this sentence.

      Thank you for this comment. This has now been changed (L79-80).

      Line 88: "... however, conflicting reports exist". Describe what these conflicting reports show. Many MC4 variants ("mutations") are expressed in humans, but few will fully inactivate signalling like the mouse knockout.

      We thank the reviewer for this comment. By conflicting data, we refer to the studies that report no reproductive impairments in women with MC4R mutations. This may be either because the metabolic impairments (obesity, hyperphagia, hyperinsulinemia, hyperleptinemia, etc.) are so strong that the focus is skewed to these issues, without a full reproductive assessment in these women, or simply because, as the reviewer mentioned, not all MC4R mutations fully inactivate its signaling in humans, as opposed to mouse models, where reproductive disruption has been described previously in full-body MC4RKOs.

      Line 91: "...that largely affects females". Is this a genuine sex difference, or are reproductive deficits simply more overt in female rodents? I think the Coss paper (reference 19 in the manuscript) showed a greater effect of diet-induced obesity in males than in females.

      We believe that sex differences exist with regards to the role of MC4R in the regulation of fertility, as we show that most of this effect is mediated by MC4R signaling in Kiss1 AVPV neurons, a neuronal population that is specific to the female brain.

      As far as we can tell, the Coss paper (Villa et al., 2024) has only tested males but not females. Moreover, they investigated the effect of diet induced obesity in mice on their fertility (specifically LH secretion), while in this study we are specifically looking at the deletion of MC4R from Kiss1 neurons, and these mice were not obese (Figure 2A). While both these conditions induce impaired fertility, the mechanisms and signaling pathways are different (our mice lack MC4R signaling while the obese mice have a decrease in MC4R expression but the signaling is still functional).

      Line 392: also Hessler et al. PMID: 32337804.

      This reference is now added to the text (Line 393).

      Line 433. The discussion of how advanced puberty onset (seen in the Kiss1-specific KO animals) might be caused by MC4R signalling in AVPV Kiss1 neurons, which are sexually dimorphic, which might explain sex differences in puberty timing in mammals seems extremely speculative and based on limited data. More targeted experiments would be needed to address this, and I think this speculation should be removed here.

      This speculation has now been removed from the text.

      Line 438: "Furthermore, our findings suggest that metabolic cues, through the regulation of the melanocortin output onto Kiss1AVPV/PeN neurons, are essential for the timing and magnitude of the GnRH/LH surge." Again, I think this is overstating the present data, which has only looked at an artificial hormone administration regime. The animals are fertile and, thus, must be able to mount a sufficient LH surge. The major effect, in fact, seems to be on their cycle, perhaps leading to impaired follicular development. Please acknowledge that this will be one of the multiple pathways by which metabolic information is fed into the HPG axis.

      In addition to the effect on their cycles as mentioned by the reviewer, the Kiss1MC4RKO females also display impaired fertility (Figure 2, S-T) and fewer corpora lutea which is in line with the impaired mounting of LH surge (Figure 2, M). Even if the LH surge is induced by the hormone administration protocol, it only reflects the natural ability of the HPG axis to mount the surge, as this regimen is only there to mimic the endogenous hormonal changes leading to LH surge and therefore ovulation, in a controlled manner. Nonetheless, we agree with this reviewer that this is not the sole mechanism by which metabolism regulates reproductive function and this has been emphasized in the paper. (line 443)

      Reviewer #3 (Recommendations For The Authors):

      The decreased melanocortin tone drives puberty onset (Figure 1D), and this is correlative. The transgenic animals' hypothalamic expression of Agrp, Pomc, Mc4r, and Mc3r can be measured to strengthen the claim. Hprt expression should be demonstrated, as this housekeeping gene was used as a common denominator.

      We thank the reviewer for this comment. While we think that indeed, measuring Agrp, Pomc, Mc4r, and Mc3r gene expressions in the transgenic mice will strengthen our claim and give more insights into the melanocortins tone during pubertal maturation, this is unfortunately not feasible as it will involve generating a lot of mice (at least n=40 pups for an n=5/group, KO and control littermates, females only -which will require setting up lots of breeding pairs-) during different ages throughout puberty.

      As for the gene expression of Hprt, because we have 6 mice per age and 4 ages in total, every gene (Agrp, Pomc, Mc4r, Mc3r) was run in a separate plate with Hprt as its own housekeeping gene. Samples were run in duplicate for both Hprt and the melanocortin gene in a 96-well plate: 48 wells for Hprt and 48 wells for the melanocortin gene. Therefore, it is not possible to present a single Hprt expression for all four genes; however, every gene was normalized to the Hprt expression that was run in the same plate.

      In Figures 4 and 5, dot plots can be used (as opposed to the bar graphs) to better reflect the individual data points.

      Figures 4 and 5 have been revised to include individual data points.

      The electrophysiology experiment requires more details in the method section. In addition to the publication cited, a brief recap of the methodology used in this paper, such as the focal application of MTII (Figure 4B), is also needed.

      We have added more details to the Methods.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      In the manuscript the authors describe a new pipeline to measure changes in vasculature diameter upon optogenetic stimulation of neurons. The work is useful to better understand the hemodynamic response on a network /graph level.

      Strengths:

      The manuscript provides a pipeline that detects changes in vessel diameter and simultaneously locates the neurons driven by stimulation.

      The resulting data could provide interesting insights into the graph level mechanisms of regulating activity dependent blood flow.

      Weaknesses:

      (1) The manuscript contains (new) wrong statements and (still) wrong mathematical formulas.

      The symbols in these formulas have been updated to disambiguate them, and the accompanying statements have been adjusted for clarity.

      (2) The manuscript does not compare results to existing pipelines for vasculature segmentation (open-source or commercial). Comparing the performance of the pipeline to a random forest classifier (ilastik) on images that are not preprocessed (i.e. corrected for background etc.) does not seem a particularly useful comparison.

      We’ve now included comparisons to Imaris (commercial software) for segmentation and VesselVio (open-source software) for graph extraction.

      For the ilastik comparison, the images were preprocessed prior to ilastik segmentation, specifically by doing intensity normalization.

      Example segmentations utilizing Imaris have now been included. Imaris leaves gaps and discontinuities in the segmentation masks, as shown in Supplementary Figure 10. The Imaris segmentation masks also tend to be more circular in cross-section despite irregularities on the surface of the vessels observable in the raw data and identified in manual segmentation. This approach also requires days to months to generate per image stack.

      “Comparison with commercial and open-source vascular analysis pipelines

      To compare our results with those achievable on these data with other pipelines for segmentation and graph network extraction, we compared segmentation results qualitatively with Imaris version 9.2.1 (Bitplane) and vascular graph extraction with VesselVio [1]. For the Imaris comparison, three small volumes were annotated by hand to label vessels. Example slices of the segmentation results are shown in Supplementary Figure 10. Imaris tended to either over- or under-segment vessels, disregard fine details of the vascular boundaries, and produce jagged edges in the vascular segmentation masks. In addition to these issues with segmentation mask quality, manual segmentation of a single volume took days for a rater to annotate. To compare to VesselVio, binary segmentation masks (one before and one after photostimulation) generated with our deep learning models were loaded into VesselVio for graph extraction, as VesselVio does not have its own method for generating segmentation masks. This also facilitates a direct comparison of the benefits of our graph extraction pipeline to VesselVio. Visualizations of the two graphs are shown in Supplementary Figure 11. VesselVio produced many hairs at both time points, and the total number of segments varied considerably between the two sequential stacks: while the baseline scan resulted in 546 vessel segments, the second scan had 642 vessel segments. These discrepancies are difficult to resolve in post-processing and preclude a direct comparison of individual vessel segments across time. As the segmentation masks we used in graph extraction derive from the union of multiple time points, we could better trace the vasculature and identify more connections in our extracted graph.
      Furthermore, VesselVio relies on the distance transform of the user-supplied segmentation mask to estimate vascular radii; consequently, these estimates are highly susceptible to variations in the input segmentation masks. We repeatedly saw slight variations between boundary placements of all of the models we utilized (ilastik, UNet, and UNETR) and those produced by raters. Our pipeline mitigates this segmentation method bias by using intensity gradient-based boundary detection from centerlines in the image (as opposed to using the distance transform of the segmentation mask, as in VesselVio).”
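
      The sensitivity of distance-transform radii to mask boundary placement can be illustrated with a minimal sketch. It is not VesselVio's code: the synthetic disk, the helper names, and the one-voxel dilation are assumptions chosen for illustration, and only `scipy.ndimage.distance_transform_edt` and `binary_dilation` are real library calls.

```python
import numpy as np
from scipy import ndimage


def make_disk_mask(size: int, radius: float) -> np.ndarray:
    """Hypothetical vessel cross-section: a filled disk centred in a square image."""
    yy, xx = np.mgrid[:size, :size]
    centre = (size - 1) / 2.0
    return (yy - centre) ** 2 + (xx - centre) ** 2 <= radius ** 2


def radius_from_distance_transform(mask: np.ndarray) -> float:
    """Radius estimate: largest distance from an inside voxel to the background."""
    edt = ndimage.distance_transform_edt(mask)
    return float(edt.max())


# Two masks of the same vessel whose boundaries differ by about one voxel,
# as might happen between two segmentation models or two raters.
mask_a = make_disk_mask(size=41, radius=5.0)
mask_b = ndimage.binary_dilation(mask_a)

radius_a = radius_from_distance_transform(mask_a)
radius_b = radius_from_distance_transform(mask_b)
print(f"radius from mask A: {radius_a:.2f} voxels")
print(f"radius from mask B: {radius_b:.2f} voxels")
```

      In this toy case the boundary shift passes directly into the radius estimate, which is the bias that boundary detection on the image intensities is meant to avoid.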

      (3) The manuscript does not clearly visualize performance of the segmentation pipeline (e.g. via 2d sections, highlighting also errors etc.). Thus, it is unclear how good the pipeline is, under what conditions it fails or what kind of errors to expect.

      In response to the reviewer’s comment, 2D slices have been added to Supplementary Figure 4.

      (4) The pipeline is not fully open-source due to use of matlab. Also, the pipeline code was not made available during review contrary to the authors claims (the provided link did not lead to a repository). Thus, the utility of the pipeline was difficult to judge.

      All code has been uploaded to GitHub and is available at the following location: https://github.com/AICONSlab/novas3d

      The Matlab code for skeletonization is better at preserving centerline integrity during the pruning of hairs from centerlines than the currently available open-source methods.

      - Generalizability: The authors addressed the point of generalizability by applying the pipeline to other data sets. This demonstrates that their pipeline can be applied to other data sets and makes it more useful. However, from the visualizations it is difficult to see the performance of the pipeline, where the pipeline fails, etc. The 3D visualizations are not particularly helpful in this respect. In addition, the Dice measure seems quite low, indicating that roughly 20-40% of voxels do not overlap between inferred and ground truth. I did not notice this high discrepancy earlier. A thorough discussion of the errors appearing in the segmentation pipeline would be necessary in my view to better assess the quality of the pipeline.

      2D slices from the additional datasets have been added to Supplementary Figure 13 to aid in visualizing the models’ ability to generalize to other datasets.

      The Dice range we report (0.7-0.8) compares well with the range (0.56-0.86) reported for 3D segmentations of large microscopy datasets [2], [3], [4], [5], [6]. Furthermore, we had two additional raters segment three images from the original training set and found a mean inter-rater ICC of 0.73 [7]. Our model matched or exceeded this baseline on unseen data: Dice scores from our generalizability tests on C57 mice and Fischer rats were on par with or higher than it.
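
      For readers less familiar with the metric, the Dice score can be sketched as below. The function name and the toy masks are illustrative assumptions, not the pipeline's code; the example shows how a one-voxel boundary offset on each side of a thin, three-voxel-wide vessel already lowers Dice to 0.75.

```python
import numpy as np


def dice_score(pred, truth) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


# Hypothetical thin vessel: 3 voxels wide and 100 voxels long in the ground truth.
truth = np.zeros((11, 100), dtype=bool)
truth[4:7, :] = True

# Prediction with the boundary placed one voxel further out on each side.
pred = np.zeros((11, 100), dtype=bool)
pred[3:8, :] = True

print(dice_score(pred, truth))  # 2 * 300 / (500 + 300) = 0.75
```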

      Reviewer #2 (Public review):

      The authors have addressed most of my concerns sufficiently. There are still a few serious concerns I have. Primarily, the temporal resolution of the technique still makes me dubious about nearly all of the biological results. It is good that the authors have added some vessel diameter time courses generated by their model. But I still maintain that data sampling every 42 seconds - or even 21 seconds - is problematic. First, the evidence for long vascular responses is lacking. The authors cite several papers:

      Alarcon-Martinez et al. 2020 show and explicitly state that their responses (stimulus-evoked) returned to baseline within 30 seconds. The responses to ischemia are long lasting but this is irrelevant to the current study using activated local neurons to drive vessel signals.

      Mester et al. 2019 show responses that all seem to return to baseline by around 50 seconds post-stimulus.

      In Mester et al. 2019, diffuse stimulations with blue light showed a return to baseline around 50 seconds post-stimulus (cf. Figure 1E,2C,2D). However, focal stimulations, where the stimulation light is raster scanned over a small region focused in the field of view, show longer-lasting responses (cf. Figure 4) that have not returned to baseline by 70 seconds post-stimulus [8]. Alarcon-Martinez et al. do report that their responses return to baseline within 30 seconds; however, their physiological stimulation may lead to different neuronal and vessel response kinetics than those elicited by the optogenetic stimulations in the current work.

      O'Herron et al. 2022 and Hartmann et al. 2021 use opsins expressed in vessel walls (not neurons as in the current study) and directly constrict vessels with light. So this is unrelated to neuronal activity-induced vascular signals in the current study.

      We agree that optogenetic activation of vessel-associated cells is distinct from optogenetic activation of neurons, but we do expect the effects of such perturbations on the vasculature to have some commonalities.

      There are other papers including Vazquez et al 2014 (PMID: 23761666) and Uhlirova et al 2016 (PMID: 27244241) and many others showing optogenetically-evoked neural activity drives vascular responses that return back to baseline within 30 seconds. The stimulation time and the cell types labeled may be different across these studies which can make a difference. But vascular responses lasting 300 seconds or more after a stimulus of a few seconds are just not common in the literature and so are very suspect - likely at least in part due to the limitations of the algorithm.

      Vazquez et al. 2014 used diffuse photostimulation with a fiberoptic probe, similar to Mester et al. 2019, as opposed to the raster-scanned focal stimulation we used in this study and in Mester et al. 2019, where we observed focal photostimulation to elicit vascular responses lasting longer than a minute. Uhlirova et al. 2016 used photostimulation powers between 0.7 and 2.8 mW, likely lower than our 4.3 mW/mm2 photostimulation. Further, even with focal photostimulation, we do see light intensity dependence of the duration of the vascular responses. Indeed, in Supplementary Figure 2, 1.1 mW/mm2 photostimulation leads to briefer dilations/constrictions than does 4.3 mW/mm2; the 1.1 mW/mm2 responses are in line, duration-wise, with those in Uhlirova et al. 2016.

      Critically, as per Supplementary Figure 2, the analysis of the experimental recordings acquired at 3-second temporal resolution did likewise show responses in many vessels lasting for tens of seconds and even hundreds of seconds in some vessels.

      Another major issue is that the time courses provided show that the same vessel constricts at certain points and dilates later. So where in the time course the data is sampled will have a major effect on the direction and amplitude of the vascular response. In fact, I could not find how the "response" window is calculated. Is it from the first volume collected after the stimulation - or an average of some number of volumes? But clearly down-sampling the provided data to 42 or even 21 second sampling will lead to problems. If the major benefit to the field is the full volume over large regions that the model can capture and describe, there needs to be a better way to capture the vessel diameter in a meaningful way.

      In the main experiment (i.e. excluding the additional experiments presented in the Supplementary Figure 2 that were collected over a limited FOV at 3s per stack), we have collected one stack every 42 seconds. The first slice of the volume starts following the photostimulation, and the last slice finishes at 42 seconds. Each slice takes ~0.44 seconds to acquire. The data analysis pipeline (as demonstrated by the Supplementary Figure 2) is not in any way limited to data acquired at this temporal resolution and - provided reasonable signal-to-noise ratio (cf. Figure 5) - is applicable, as is, to data acquired at much higher sampling rates.

      It still seems possible that if responses are bi-phasic, then depth dependencies of constrictors vs dilators may just be due to where in the response the data are being captured - maybe the constriction phase is captured in deeper planes of the volume and the dilation phase more superficially. This may also explain why nearly a third of vessels are not consistent across trials - if the direction the volume was acquired is different across trials, different phases of the response might be captured.

      Alternatively, like neuronal responses to physiological stimuli, the vascular responses elicited by increases in neuronal activity may themselves be variable in both space and time.

      I still have concerns about other aspects of the responses but these are less strong. Particularly, these bi-phasic responses are not something typically seen and I still maintain that constrictions are not common. The authors are right that some papers do show constriction. Leaving out the direct optogenetic constriction of vessels (O'Herron 2022 & Hartmann 2021), the Alarcon-Martinez et al. 2020 paper and others such as Gonzales et al 2020 (PMID: 33051294) show different capillary branches dilating and constricting. However, these are typically found either with spontaneous fluctuations or due to highly localized application of vasoactive compounds. I am not familiar with data showing activation of a large region of tissue - as in the current study - coupled with vessel constrictions in the same region. But as the authors point out, typically only a few vessels at a time are monitored so it is possible - even if this reviewer thinks it unlikely - that this effect is real and just hasn't been seen.

      Uhlirova et al. 2016 (PMID: 27244241) observed biphasic responses in the same vessel with optogenetic stimulation in anesthetized and unanesthetized animals (cf Fig 1b and Fig 2, and section “OG stimulation of INs reproduces the biphasic arteriolar response”). Devor et al. (2007) and Lindvere et al. (2013) also reported on constrictions and dilations being elicited by sensory stimuli.

      I also have concerns about the spatial resolution of the data. It looks like the data in Figure 7 and Supplementary Figure 7 have a resolution of about 1 micron/pixel. It isn't stated so I may be wrong. But detecting changes of less than 1 micron, especially given the noise of an in vivo prep (brain movement and so on), might just be noise in the model. This could also explain constrictions as just spurious outputs in the model's diameter estimation. The high variability in adjacent vessel segments seen in Figure 6C could also be explained the same way, since these also seem biologically and even physically unlikely.

      Thank you for your comment. To address this important issue, we performed an additional validation experiment in which we placed a special order of fluorescent beads with a known diameter of 7.32 ± 0.27um, imaged them following our imaging protocol, and subsequently used our pipeline to estimate their diameter. Our analysis converged on the manufacturer-specified diameter, estimating it to be 7.34 ± 0.32um. The manuscript has been updated to detail this experiment, as below:

      Methods section insert

      “Second, our boundary detection algorithm was used to estimate the diameters of fluorescent beads of a known diameter imaged under similar acquisition parameters. Polystyrene microspheres labelled with Flash Red (Bangs Laboratories, Inc., CAT# FSFR007) with a nominal diameter of 7.32um and a specified range of 7.32 ± 0.27um, as determined by the manufacturer using a Coulter counter, were imaged on the same multiphoton fluorescence microscope set-up used in the experiment (identical light path, resonant scanner, objective, detector, excitation wavelength and nominal lateral and axial resolutions, with 5x averaging). The images of the beads had a higher SNR than our images of the vasculature, so Gaussian noise was added to the images to degrade the SNR to the same level as that of the blood vessels. The images of the beads were segmented with a threshold, centroids were calculated for individual spheres, and planes with a random normal vector were extracted from each bead and used to estimate the diameter of the beads. The same smoothing and PSF deconvolution steps were applied in this task. We then reported the mean and standard deviation of the distribution of the diameter estimates. A variety of planes were used to estimate the diameters.”

      Results Section Insert

      “Our boundary detection algorithm successfully estimated the radius of precisely specified fluorescent beads. The bead images had a signal-to-noise ratio of 6.79 ± 0.16 (about 35% higher than our in vivo images): to match their SNR to that of in vivo vessel data, following deconvolution, we added Gaussian noise with a standard deviation of 85 SU to the images, bringing the SNR down to 5.05 ± 0.15. The data processing pipeline was kept unaltered except for the bead segmentation, performed via image thresholding instead of our deep learning model (trained on vessel data). The bead boundary was computed following the same algorithm used on vessel data: i.e., by the average of the minimum intensity gradients computed along 36 radial spokes emanating from the centreline vertex in the orthogonal plane. To demonstrate an averaging-induced decrease in the uncertainty of the bead radius estimates on a scale that is finer than the nominal resolution of the imaging configuration, we tested four averaging levels in 289 beads. Three of these averaging levels were lower than that used on the vessels, and one matched that used on the vessels (36 spokes per orthogonal plane and a minimum of 10 orthogonal planes per vessel). As the amount of averaging increased, the uncertainty on the diameter of the beads decreased, and our estimate of the bead's diameter converged upon the manufacturer's Coulter counter-based specifications (7.32 ± 0.27um), as tabulated in Table 1.”
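
      The spoke-based boundary estimate described above can be sketched on a synthetic image. This is one reading of the description, not the authors' implementation: the function name, the sampling step, the blur width, and the synthetic disk are all assumptions. Each of 36 spokes is sampled outward from the centre, the boundary on that spoke is taken to be where the intensity gradient is most negative, and the per-spoke radii are averaged.

```python
import numpy as np
from scipy import ndimage


def radius_from_spokes(image, centre, n_spokes=36, max_radius=20.0, step=0.25):
    """Mean, over spokes, of the distance at which intensity drops most steeply."""
    distances = np.arange(0.0, max_radius, step)
    angles = np.linspace(0.0, 2.0 * np.pi, n_spokes, endpoint=False)
    radii = []
    for angle in angles:
        rows = centre[0] + distances * np.sin(angle)
        cols = centre[1] + distances * np.cos(angle)
        # Linear interpolation of the image along the spoke.
        profile = ndimage.map_coordinates(image, [rows, cols], order=1)
        gradient = np.gradient(profile, step)
        radii.append(distances[np.argmin(gradient)])
    return float(np.mean(radii))


# Synthetic bright disk of radius 8 pixels, blurred to mimic a soft edge.
size, true_radius = 64, 8.0
yy, xx = np.mgrid[:size, :size]
centre = (size / 2.0, size / 2.0)
disk = ((yy - centre[0]) ** 2 + (xx - centre[1]) ** 2 <= true_radius ** 2).astype(float)
image = ndimage.gaussian_filter(disk, sigma=1.5)

estimate = radius_from_spokes(image, centre)
print(f"estimated radius: {estimate:.2f} px (true radius: {true_radius} px)")
```

      Averaging over many spokes, and then over several planes, is what allows the estimate to vary on a scale finer than a single pixel, which is the effect the bead experiment quantifies.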

      Reviewer #1 (Recommendations for the authors):

      Comments to the authors replies to the reviews:

      - Supplementary Figure 13:

      As indicated before the 3d images + scale makes it impossible to judge the quality of the outputs.

      As aforementioned, 2D slices have been added to the Supplementary Figure 13.

      - Supplementary Table 3:

      There is a significant increase in the Hausdorff and Mean Surface Distance measures for the new data. Why?

      A single vessel being missed by either the rater or the model would significantly affect the Hausdorff distance (HD) and by extension Mean Surface Distance: this is particularly pertinent in the LSFM image with its much larger FOV and thus a potential for much larger max distances to result from missed vessels in the prediction or ground truth data. Large Hausdorff distances may indicate a vessel was missed in either the ground truth or the segmentation mask.
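
      A toy example illustrates why one missed vessel dominates the Hausdorff distance. The point sets below are hypothetical stand-ins for vessel surfaces, and the helper name is an assumption; `scipy.spatial.distance.directed_hausdorff` is the only library call relied upon.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff


def hausdorff_distance(points_a: np.ndarray, points_b: np.ndarray) -> float:
    """Symmetric Hausdorff distance between two point sets."""
    forward = directed_hausdorff(points_a, points_b)[0]
    backward = directed_hausdorff(points_b, points_a)[0]
    return max(forward, backward)


# Hypothetical ground truth: two parallel vessels, 100 units apart.
vessel_1 = np.column_stack([np.arange(50.0), np.zeros(50)])
vessel_2 = np.column_stack([np.arange(50.0), np.full(50, 100.0)])
truth = np.vstack([vessel_1, vessel_2])

# Prediction A: both vessels found, every point off by one unit.
pred_shifted = truth + np.array([0.0, 1.0])
# Prediction B: same small offset, but the second vessel is missed entirely.
pred_missing = vessel_1 + np.array([0.0, 1.0])

hd_shifted = hausdorff_distance(truth, pred_shifted)
hd_missing = hausdorff_distance(truth, pred_missing)
print(hd_shifted, hd_missing)  # 1.0 99.0
```

      Because the metric is a maximum over points, its value in the second case is set by the distance to the missed vessel, so it grows with the field of view rather than with boundary accuracy.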

      Of note, these additional datasets were annotated by a different rater from those who labeled the ground truth data, and boundary placement varies considerably between raters. In a test where three raters segmented the same three images from the original dataset, we computed an ICC of 0.73 across their segmentations. Our model's Dice scores on out-of-distribution datasets were on par with the inter-rater ICC on the Thy1ChR2 2PFM data.

      - Supplementary Figure 2: The authors provide useful data on the time responses. However, looking at those figures, it is puzzling why certain vessels were selected as responding as there seems almost no change after stimulation. In addition, some of the responses seem to actually start several tens of seconds before the actual stimulus (particularly in A).

      Only some traces in C and D (dark blue) seem to be actually responding vessels.

      This is not discussed and unclear.

      Supplementary Figure 2 displays the time courses of vessel calibre for all vessels in the FOV, not just those deemed responders.

      The aforementioned effects arose because, in the preliminary response, the loess smoothing filter was applied to the whole time course, across the stimulation; this has been rectified in the updated figures. In particular, Supplementary Figure 2 has been updated with separate loess smoothing before and after photostimulation. The apparent pre-stimulation effect is gone once the smoothing is separated.
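
      The leakage of a response into the pre-stimulation period can be reproduced with a minimal sketch. A centred moving average is used here as a simple stand-in for loess (it is not the filter used in the manuscript), and the step-shaped diameter trace, window length, and names are hypothetical.

```python
import numpy as np


def moving_average(values: np.ndarray, window: int) -> np.ndarray:
    """Centred moving average with edge padding (a simple stand-in for loess)."""
    pad = window // 2
    padded = np.pad(values, pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


# Hypothetical diameter trace: flat baseline, then a step dilation at stimulation.
stim_index = 50
trace = np.concatenate([np.full(stim_index, 5.0), np.full(50, 6.0)])

# Smoothing across the stimulation boundary leaks the response backwards in time.
smoothed_joint = moving_average(trace, window=11)

# Smoothing each segment on its own leaves the baseline untouched.
smoothed_split = np.concatenate([
    moving_average(trace[:stim_index], window=11),
    moving_average(trace[stim_index:], window=11),
])

print(smoothed_joint[stim_index - 1])  # above the 5.0 baseline before stimulation
print(smoothed_split[stim_index - 1])  # 5.0
```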

      - R Point 7: As indicated before and in agreement with the other reviewer, the quality of the results in 3D is difficult to judge. No 2D sections that compare 'ground truth' with inferred results are shown in the current manuscript, which would enable a much better judgment. The provided video is still 3D and not a video going through 2D slices. Also, in the video the overlap of vasculature and raw data seems to be very good and near 100%, so why is the Dice measure reported earlier so low? Is this a particularly good example?

      Some examples, indicating where the pipeline fails (and why) would be helpful to see, to judge its performance better (ideally in 2d slices).

      As discussed in the public comments, the 2D slices are now included in Suppl. Fig. 4 and Suppl. Fig. 13 to facilitate visual assessment. The vessels are long and thin, so slight dilations or constrictions affect the Dice scores without being easy to see.

      - Author response images 6 and 7. From the presented data, the constrictions measured in the smaller vessels may be a result (at least partly) of noise. This seems to be particularly the case in Author response image 7, left top and bottom, for example. It would be helpful to show the actual estimates of the vessel radii overlaid on the (raw) images. In some of the pictures the noise level seems to reach higher values than the 10-20% of noise used in the tests by the authors in the revision.

      The vessel radii are estimated as averages across all vertices of the individual vessels: it is thus not possible to overlay them meaningfully in 2D slices: in Figure 2B, we do show a rendering of sample vessel-wise radii estimates.

      - "We tested the centerline detection in Python, scipy (1.9.3) and Matlab. We found that the Matlab implementation performed better due to its inclusion of a branch length parameter for the identification of terminal branches, which greatly reduced the number of false branches; the Python implementation does not include this feature (in any version) and its output had many more such "hair" artifacts. Clearmap skeletonization uses an algorithm by Palagyi & Kuba(1999) to thin segmentation masks, which does not include hair removal. Vesselvio uses a parallelized version of the scipy implementation of Lee et al. (1994) algorithm which does not do hair removal based on a terminal branch length filter; instead, Vesselvio performs a threshold-based hair removal that is frequently overly aggressive (it removes true positive vessel branches), as highlighted by the authors."

      This statement is wrong. The removal of small branches in skeletons is algorithmically independent of the skeletonization algorithm itself. The authors cite a reference concerned with the algorithm they are currently employing for the skeletonization. Careful assessment of that reference shows that this algorithm removes small length branches after skeletonization is performed. This feature is available in open-source packages as well, or could be easily implemented.

      We appreciate that skeletonization is distinct from hair removal and have reworded this paragraph for clarity. We are currently working with SciPy developers to implement hair removal in their image processing pipeline so as to render our pipeline fully open-source.

      The removal of hairs after skeletonization with length based thresholding leads to the possibility of removing parts of centerlines in the main part of vessels after branch points with hairs. The Matlab implementation does not do this and leaves the main branches intact.

      This text has been updated to:

      “Hair” segments shorter than 20 μm and terminal on one end were iteratively removed, starting with the shortest hairs and merging the longest hairs at junctions with 2 terminal branches with the main vessel branch to reduce false positive vascular branches and minimize the amount of centerlines removed. This iterative hair removal functionality of the skeletonization algorithm is currently unavailable in Python, but is available in Matlab [9].
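
      The iterative pruning described in this passage can be sketched on a small graph. This is an illustrative reading of the text, not the Matlab implementation: the edge-dictionary representation, function name, and node labels are assumptions, and the sketch assumes a simple graph without loops. Only the shortest hair is removed per pass; when that leaves a junction with two branches, they are merged, so the longer terminal branch is kept as part of the main centerline.

```python
from collections import defaultdict


def prune_hairs(edges, max_hair_length=20.0):
    """Iteratively remove short terminal branches ("hairs") from a skeleton graph.

    `edges` maps frozenset({node_a, node_b}) to a branch length in micrometres.
    """
    edges = dict(edges)
    while True:
        neighbours = defaultdict(set)
        for edge in edges:
            a, b = tuple(edge)
            neighbours[a].add(b)
            neighbours[b].add(a)

        # A hair is a short branch with one free end, attached to a junction.
        hairs = []
        for edge, length in edges.items():
            a, b = tuple(edge)
            low, high = sorted((len(neighbours[a]), len(neighbours[b])))
            if low == 1 and high >= 3 and length < max_hair_length:
                hairs.append((length, sorted(edge)))
        if not hairs:
            return edges

        # Remove only the shortest hair, then re-evaluate the whole graph.
        _, nodes = min(hairs)
        shortest = frozenset(nodes)
        junction = next(n for n in shortest if len(neighbours[n]) >= 3)
        del edges[shortest]

        # If the junction now joins exactly two branches, merge them into one.
        remaining = [e for e in edges if junction in e]
        if len(remaining) == 2:
            ends = [next(n for n in e if n != junction) for e in remaining]
            merged = frozenset(ends)
            if len(merged) == 2 and merged not in edges:
                edges[merged] = sum(edges.pop(e) for e in remaining)


# Hypothetical skeleton: a main vessel a-j1-j2 with one hair at j1 and two
# terminal branches at j2 (lengths in micrometres).
skeleton = {
    frozenset({"a", "j1"}): 50.0,
    frozenset({"j1", "h"}): 5.0,
    frozenset({"j1", "j2"}): 40.0,
    frozenset({"j2", "t1"}): 8.0,
    frozenset({"j2", "t2"}): 15.0,
}
pruned = prune_hairs(skeleton)
print(pruned)  # one branch from "a" to "t2" with length 105.0
```

      A single-pass length threshold would instead delete both terminal branches at `j2` and shorten the centerline to 90 μm; the iterative version keeps the 15 μm continuation.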

      - "On the reviewer's comment, we did try inputting normalized images into Ilastik, but this did not improve its results." This is surprising. Reasonable standard preprocessing (e.g. background removal, equalization, and vessel enhancement) would probably restore most of ilastik's performance in the indicated panel.

      While an improvement may be present in a particular set of images, in our experience such improvement often generalizes poorly to other patches, as reflected by the aforementioned results and the widespread uptake of DL approaches to image segmentation. The in vivo datasets also contain artifacts, arising from e.g. bleeding into the FOV, to which ilastik is highly sensitive. This is an example of noise that is not easily removed by standard preprocessing.

      - "Typical pre-processing/standard computer vision techniques with parameter tuning do not generalize on out-of-distribution data with different image characteristics, motivating the shift to DL-based approaches."

      I disagree with this statement. DL approaches typically generalize when trained with a sufficient amount of diverse data. However, DL approaches can also fail on new out-of-distribution data. In that situation they can only be 'rescued' via new, time-intensive data generation and retraining. Simple standard image pre-processing steps (e.g. to remove background or boost vessel structures) have well-defined parameters that can be easily adapted to new out-of-distribution data, as clear interpretations are available. The time to adapt those parameters is typically much shorter than the time needed to retrain DL frameworks.

      We find that standard image processing approaches with parameter tuning work robustly only if fine-tuned on individual images; i.e., the fine-tuning does not generalize across datasets. This approach thus does not scale to experiments that yield large images or run at high throughput. While DL models may not generalize to out-of-distribution data, fine-tuning a DL model with a small subset of labels generally produces a model that is superior to parameter tuning and can be applied to entire studies. Moreover, DL fine-tuning is typically an efficient process because it requires very limited labelling and training time.

      - It is still unclear how the authors' pipeline performs compared with other (open-source or commercially) available pipelines. As indicated before, comparing to ilastik, particularly when feeding non-preprocessed data, does not seem to be a particularly high bar.

      This question has also been raised by the other reviewer who asked to compare to commercially available pipelines.

      This question was not answered by the authors; instead, the authors reply by claiming to provide an open-source pipeline. In fact, the use of Matlab in their pipeline does not make it fully open-source either. Moreover, as mentioned before, open-source pipelines for comparison do exist.

      As discussed above, the manuscript now includes comparisons to Imaris for segmentation and Vesselvio for graph extraction. The pipeline is available on GitHub.

      - "We agree with the reviewer that this question is interesting; however, it is not addressable using present data: activated neuronal firing will have effects on their postsynaptic neighbors, yet we have no means of measuring the spread of activation using the current experimental model."

      Distances to the closest neuron in the manuscript are measured without checking if it's active. Thus, distances to the first set of n neurons could be measured in the same way, ignoring activation effects.

      Shorter distances to an entire ensemble of neurons would still be (more) informative of metabolic demands.

      This could indeed be done within the existing framework. The connected-components-3d package can be used to extract individual neurons in the FOV from the neuron segmentation mask. The distance from each neuron to each point on the vessel centerlines could then be calculated.
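
      A rough sketch of this calculation is given below. It is an illustration under stated assumptions rather than the pipeline's code: `scipy.ndimage.label` is swapped in for connected-components-3d to keep the example dependency-light, and the function name, the `voxel_size` parameter and the (N, 3) layout of `centerline_points` are hypothetical.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree


def centerline_to_neuron_distances(neuron_mask, centerline_points, voxel_size=(1.0, 1.0, 1.0)):
    """Distance from every centerline point to the nearest voxel of every neuron.

    neuron_mask: 3D boolean array (neuron segmentation).
    centerline_points: (N, 3) array of voxel coordinates on the vessel centerlines.
    Returns an (N, n_neurons) array in the physical units of voxel_size.
    """
    # Label individual neurons using 26-connectivity.
    structure = np.ones((3, 3, 3), dtype=bool)
    labels, n_neurons = ndimage.label(neuron_mask, structure=structure)

    scale = np.asarray(voxel_size, dtype=float)
    points = np.asarray(centerline_points, dtype=float) * scale

    distances = np.empty((len(points), n_neurons))
    for k in range(n_neurons):
        voxels = np.argwhere(labels == k + 1) * scale
        nearest, _ = cKDTree(voxels).query(points)
        distances[:, k] = nearest
    return distances
```

      Sorting each row of the returned array gives the distances to the closest, second closest, and subsequent neurons, so the mean over the first n columns would summarize proximity to an ensemble rather than to a single cell.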

      - model architecture:

      It is unclear from the description if any positional encoding was used for the image patches.

      It is unclear whether the architecture/pipeline can handle arbitrary volume sizes or is trained on a fixed volume shape. In the latter case, how is the pipeline applied?

      The model includes positional encoding, as described in Hatamizadeh et al. 2021.

      The model can be applied to images of any size, as demonstrated on larger images in Supplementary Figure 9 and on smaller images in Supplementary Figure 2. The pipeline is applied in the same way. It will read in the size of an input image and output an image of the same size.

      - Transformer models often show better results when using a learning rate scheduler that adjusts the learning rate (typically with up and down ramps). Did the authors test such approaches?

      We did not use a learning rate scheduler, as we found we were getting good results without using one.

      - formula (4): The 95% percentile of two numbers is the max, and thus (5) is certainly not what the HD95 metric is. The formula is simply wrong as displayed.

      Thank you. The formula has been updated.
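
      For reference, one commonly used definition of the HD95 takes the 95th percentile over the full set of nearest-neighbour distances in each direction and then the larger of the two, which avoids the degenerate "percentile of two numbers" case noted by the reviewer. The sketch below illustrates that variant only. The manuscript's updated formula is not reproduced here and may differ (for example, some implementations pool both directions before taking the percentile). The function name and the point-set inputs are assumptions for this example.

```python
import numpy as np
from scipy.spatial import cKDTree


def hd95(points_a, points_b):
    """95th-percentile Hausdorff distance between two point sets (one common variant)."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Directed nearest-neighbour distances: every point in A to B, and every point in B to A.
    d_ab, _ = cKDTree(b).query(a)
    d_ba, _ = cKDTree(a).query(b)
    # Percentile over each full distance set, then the larger of the two directions.
    return float(max(np.percentile(d_ab, 95), np.percentile(d_ba, 95)))
```

      With this definition a single outlying boundary point among many does not dominate the score, unlike the classical (maximum) Hausdorff distance.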

      - formula (5): formula 5 is certainly wrong: n_X, n_y are either integer numbers as indicated by the sum indices or sets when used in the distances, but can't be both at the same time.

      Thank you for your comment. The formula has been updated.

      - The statement:

      "this functionality of the skeletonization algorithm is currently unavailable in any python implementation, but is available in Matlab [56]."

      is not correct (see reply above)

      Please see the response above. This text has been updated to:

      “Hair” segments shorter than 20 μm and terminal on one end were iteratively removed, starting with the shortest hairs and merging the longest hairs at junctions with 2 terminal branches with the main vessel branch to reduce false positive vascular branches and minimize the amount of centerlines removed. This iterative hair removal functionality of the skeletonization algorithm is currently unavailable in Python, but is available in Matlab [9].

      - the centerline extraction is performed after taking the union of smoothed masks. The union operation can induce novel 'irregular' boundaries that degrade skeletonization performance. I would expect to apply smoothing after the union?

      Indeed the images were smoothed via dilation after taking the union, as described in the previous set of responses to private comments.

      - "The radius estimate defined the size of the Gaussian kernel that was convolved with the image to smooth the vessel: smaller vessels were thus convolved with narrower kernels."

      It is unclear which images were filtered.

      We have updated this text for clarity:

      The radius estimate defined the size of the Gaussian kernel that was convolved with the 2D image slice to smooth the vessel: smaller vessels were thus convolved with narrower kernels.

      - Was deconvolution applied to the raw images or after Gaussian filtering?

      The deconvolution was applied before Gaussian filtering.

      - ",we extracted image intensities in the orthogonal plane from the deconvolved raw registered image. A 2D Gaussian kernel with sigma equal to 80% of the estimated vessel-wise radius was used to low-pass filter the extracted orthogonal plane image and find the local signal intensity maximum searching, in 2D, from the center of the image to the radius of 10 pixels from the center."

      Would it not be better to filter the 3D image first and then extract the 2D plane?

      That could be done, but would incur a significant computational speed penalty. 2D convolutions are faster, and produced excellent accuracy when estimating radii in our bead experiment.

      What algorithm was used to obtain the 2D images?

      The 2D images were obtained using scipy.ndimage.map_coordinates.
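
      As an illustration of how `scipy.ndimage.map_coordinates` can resample a plane orthogonal to a vessel, a minimal sketch follows. The function name, the choice of in-plane basis vectors, linear interpolation (`order=1`) and the `half_width` parameter are assumptions for this example and are not taken from the authors' pipeline.

```python
import numpy as np
from scipy import ndimage


def extract_orthogonal_plane(volume, center, tangent, half_width=10):
    """Sample a square 2D plane through `center`, perpendicular to `tangent`.

    volume: 3D array; center: voxel coordinates (3,); tangent: local vessel direction (3,).
    Returns a (2*half_width+1, 2*half_width+1) array of interpolated intensities.
    """
    t = np.array(tangent, dtype=float)
    t /= np.linalg.norm(t)
    # Build two unit vectors spanning the plane perpendicular to the tangent.
    helper = np.eye(3)[np.argmin(np.abs(t))]
    u = np.cross(t, helper)
    u /= np.linalg.norm(u)
    v = np.cross(t, u)

    offsets = np.arange(-half_width, half_width + 1)
    a, b = np.meshgrid(offsets, offsets, indexing="ij")
    c = np.asarray(center, dtype=float)
    # coords has shape (3, n, n): one 3D sampling position per output pixel.
    coords = c[:, None, None] + u[:, None, None] * a + v[:, None, None] * b
    return ndimage.map_coordinates(volume, coords, order=1, mode="nearest")
```

      The resulting 2D array is what a subsequent 2D Gaussian low-pass filter and radial boundary search would operate on.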

      - Figure 2H: Is this the filtered image or the raw data?

      Panel H is raw data.

      - It would be good to see a few examples of the raw data overlaid with the radial estimates to evaluate the approach (beyond the example in K).

      Additional examples are shown in Figure 5.

      - Figure 2 K: Why are boundary points greater than 2 standard deviations away from the mean excluded ?

      They are excluded to account for irregularities as vessels approach junctions [10], [11].

      - Figure 2L: What exactly is plotted here? What are vertex-wise changes: is that the difference between the minimum and maximum of all the detected radii for a single vertex? Why do some vessels (red) show high values consistently throughout the vessel?

      Figure 2L displays the change in the radius of vertices in this FOV following photostimulation, relative to baseline.

      - Assortativity: to calculate the assortativity, are radius changes binned in any form to account for the fact that otherwise, $e_{xy}$ and related measures will likely be based on single data points?

      Assortativity is not calculated from single data points. It can be calculated either by binning into categories or by computing it on scalars, i.e. the average radius across a vessel segment.

      See here for info on calculating assortativity from binned categories (i.e. classifying a vessel as a constrictor, dilator or non-responder):

      https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.assortativity.attribute_assortativity_coefficient.html#networkx.algorithms.assortativity.attribute_assortativity_coefficient

      And see here for calculating assortativity from scalar values:

      https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.assortativity.numeric_assortativity_coefficient.html#networkx.algorithms.assortativity.numeric_assortativity_coefficient

      We calculated the assortativity using scalar values.

      In both cases, one uses all nodes and calculates the correlation between each node and its neighbours with an attribute that can be binned or is a scalar. Binning the value on a given node would not affect the number of nodes in a graph.
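
      As a small illustration of the scalar case, the sketch below builds a toy graph and calls networkx's `numeric_assortativity_coefficient` on a node attribute. The helper name, the attribute key `radius_change` and the toy values are hypothetical. Non-integer attribute values assume a reasonably recent networkx release, since older releases documented this function for integer attributes only.

```python
import networkx as nx


def radius_change_assortativity(edges, radius_change):
    """Numeric assortativity of a per-node scalar over a vessel graph.

    edges: iterable of (node, node) pairs.
    radius_change: dict mapping each node to a scalar (e.g. mean radius change).
    """
    g = nx.Graph()
    g.add_edges_from(edges)
    nx.set_node_attributes(g, radius_change, "radius_change")
    # Pearson correlation of the attribute across the two ends of every edge.
    return nx.numeric_assortativity_coefficient(g, "radius_change")
```

      Every edge contributes one pair of values to the correlation, so the coefficient is estimated from all connected node pairs in the graph, whether or not the attribute is binned.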

      - "Ilastik tended to over-segment vessels, i.e. the model returned numerous false positives, having a high recall (0.89 ± 0.19) but low precision (0.37 ± 0.33) (Figure 3, Supplementary Table 3)."

      As indicated before, and looking at Figure 4, over segmentation seems due to too high background. A suggested preprocessing step on the raw images to remove background could have avoided this.

      The images were normalized in preprocessing.

      - Figure 4: The 3D panels are not much easier to read in the revised version. As suggested by other reviewers, 2D sections indicating the differences and errors would be much more helpful for judging the pipeline's quality appropriately.

      As discussed above, 2D sections are now available in a supplementary figure.

      - Figure 3: What would be the Dice score (and other measures) between two ground truths produced by two human annotators (assisted e.g. by ilastik)?

      Two additional human raters annotated the images. We observed an ICC of 0.73 across a total of three raters on the three images.

      - Figure 5: The authors only provide the absolute value of SU for the sigma noise levels. This only has some meaning when compared to the mean or median SU of the images. In the text the maximal intensity of 1023 SU is mentioned, but what are those values in images with weaker/smaller vessels (as provided in the constriction examples in the revision)?

      I am unclear why this validation figure should be part of the main manuscript while generalization performance is left out.

      The manuscript has been updated with the mean SNR value of 5.05 ± 0.15 to provide context for the quality of our images.

      Bibliography

      (1) J. R. Bumgarner and R. J. Nelson, “Open-source analysis and visualization of segmented vasculature datasets with VesselVio,” Cell Rep. Methods, vol. 2, no. 4, Apr. 2022, doi: 10.1016/j.crmeth.2022.100189.

      (2) G. Tetteh et al., “DeepVesselNet: Vessel Segmentation, Centerline Prediction, and Bifurcation Detection in 3-D Angiographic Volumes,” Front. Neurosci., vol. 14, Dec. 2020, doi: 10.3389/fnins.2020.592352.

      (3) N. Holroyd, Z. Li, C. Walsh, E. Brown, R. Shipley, and S. Walker-Samuel, “tUbe net: a generalisable deep learning tool for 3D vessel segmentation,” Jul. 24, 2023, bioRxiv. doi: 10.1101/2023.07.24.550334.

      (4) W. Tahir et al., “Anatomical Modeling of Brain Vasculature in Two-Photon Microscopy by Generalizable Deep Learning,” BME Front., vol. 2020, p. 8620932, Dec. 2020, doi: 10.34133/2020/8620932.

      (5) R. Damseh, P. Delafontaine-Martel, P. Pouliot, F. Cheriet, and F. Lesage, “Laplacian Flow Dynamics on Geometric Graphs for Anatomical Modeling of Cerebrovascular Networks,” ArXiv191210003 Cs Eess Q-Bio, Dec. 2019, Accessed: Dec. 09, 2020. [Online]. Available: http://arxiv.org/abs/1912.10003

      (6) T. Jerman, F. Pernuš, B. Likar, and Ž. Špiclin, “Enhancement of Vascular Structures in 3D and 2D Angiographic Images,” IEEE Trans. Med. Imaging, vol. 35, no. 9, pp. 2107–2118, Sep. 2016, doi: 10.1109/TMI.2016.2550102.

      (7) T. B. Smith and N. Smith, “Agreement and reliability statistics for shapes,” PLOS ONE, vol. 13, no. 8, p. e0202087, Aug. 2018, doi: 10.1371/journal.pone.0202087.

      (8) J. R. Mester et al., “In vivo neurovascular response to focused photoactivation of Channelrhodopsin-2,” NeuroImage, vol. 192, pp. 135–144, May 2019, doi: 10.1016/j.neuroimage.2019.01.036.

      (9) T. C. Lee, R. L. Kashyap, and C. N. Chu, “Building Skeleton Models via 3-D Medial Surface Axis Thinning Algorithms,” CVGIP Graph. Models Image Process., vol. 56, no. 6, pp. 462–478, Nov. 1994, doi: 10.1006/cgip.1994.1042.

      (10) M. Y. Rennie et al., “Vessel tortuousity and reduced vascularization in the fetoplacental arterial tree after maternal exposure to polycyclic aromatic hydrocarbons,” Am. J. Physiol.-Heart Circ. Physiol., vol. 300, no. 2, pp. H675–H684, Feb. 2011, doi: 10.1152/ajpheart.00510.2010.

      (11) J. Steinman, M. M. Koletar, B. Stefanovic, and J. G. Sled, “3D morphological analysis of the mouse cerebral vasculature: Comparison of in vivo and ex vivo methods,” PLOS ONE, vol. 12, no. 10, p. e0186676, Oct. 2017, doi: 10.1371/journal.pone.0186676.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      The authors explain that an action potential that reaches an axon terminal emits a small electrical field as it "annihilates". This happens at chemical synapses, even though there is no gap junction. The generated electrical field is simulated to show that it can affect a nearby, disconnected target membrane by tens of microvolts for tenths of a microsecond. Longer effects are simulated for target locations a few microns away.

      To simulate action potentials (APs), the paper does not use the standard Hodgkin-Huxley formalism because it fails to explain AP collision. Instead it uses the Tasaki and Matsumoto (TM) model, which is simplified to model APs with only three parameters and as a membrane transition between two states, resting versus excited. The authors expand the strictly binary, discrete TM method to a Relaxing Tasaki Model (RTM) that models the relaxation of the membrane potential after an AP. They find that the membrane leak can be neglected in determining AP propagation and that the capacitive currents dominate the process.

      The strength of the work is that the authors identified an important interaction between neurons that is neglected by the standard models. A weakness of the proposed approach is the assumptions that it makes. For instance, the external medium is modeled as a homogeneous conductive medium, which may be further explored to properly account for biological processes. To the authors’ credit, the external medium can vary widely and could be left out of the general model, to be modeled only in specific instances.

      The authors provide convincing evidence by performing experiments to record action potential propagation and collision properties and then developing a theoretical framework to simulate the effect of their annihilation on nearby membranes. They provide both experimental evidence and rigorous mathematical and computer simulation findings to support their claims. The work has the potential of explaining significant electrical interaction between nerve centers that are connected via a large number of parallel fibers.

      Comments on revisions:

      The authors responded to all of my previous concerns and significantly improved the manuscript.

      We thank the reviewer for his comments and are pleased that we were able to adequately address all of his previous concerns. As a small comment on the reviewer's remark “potential of explaining ... interaction ... via a large number of parallel fibers”, we would like to add: the ephaptic coupling is prominent when APs annihilate at axon terminals, as we illustrate in Figures 4 and 5. Across parallel fibers, the impact of propagating APs is much lower but may still result in synchronization of APs.

      Reviewer 2:

      In this study, the authors measured extracellular electrical features of colliding APs travelling in different directions down an isolated earthworm axon. They then used these features to build a model of the potential ephaptic effects of AP annihilation, i.e. the electrical signals produced by colliding/annihilating APs that may influence neighbouring tissue. The model was then applied to some different hypothetical scenarios involving synaptic connections. In a revised version of the manuscript, it was also applied, with success, to published experimental data on the cerebellar basket cell-to-Purkinje cell pinceau connection. The conclusion is that an annihilating AP at a presynaptic terminal can ephaptically influence the voltage of a postsynaptic cell (this is, presumably, the ’electrical coupling between neurons’ of the title), and that the nature of this influence depends on the physical configuration of the synapse.

      As an experimental neuroscientist who has never used computational approaches, I am unable to comment on the rigour of the analytical approaches that form the bulk of this paper. The experimental approaches appear very well carried out, and the data showing equal conduction velocity of anti- and orthodromically propagating APs in every preparation is now convincing.

      The conclusions drawn from the synaptic modelling have been considerably strengthened by the new Figure 5. Here, the authors’ model - including AP annihilation at a synaptic terminal - is used to predict the amplitude and direction of experimentally observed effects at the cerebellar basket cell-to-Purkinje cell synapse (Blot & Barbour 2014). One particular form of the model (RTM with tau=0.5ms and realistic non-excitability of the terminal) matches the experimental data extremely well. This is a much more convincing demonstration that the authors’ model of ephaptic effects can quantitatively explain key features of experimental data pertaining to synaptic function. As such, the implications for the relevance of ephaptic coupling at different synaptic contacts may be widespread and important.

      However, it appears that all of the models in the new Fig5 involve annihilating APs, yet only one fits the data closely. A key question, which should be addressed if at all possible, is what happens to the predictive power of the best-fitting model in Fig5 if the annihilation, and only the annihilation, is removed? In other words, can the authors show that it is specifically the ephaptic effects of AP annihilation, rather than other ephaptic effects of, say AP waveform/amplitude/propagation, that explain the synaptic effects measured in Blot & Barbour (2014)? This would appear to be a necessary demonstration to fully support the claims of the title.

      Reviewer 2 (Recommendations for the authors):

      Can you clarify whether all models shown in Fig5 involve an annihilating AP? Is it possible to plot the predicted effects of the most successful model (RTM 0.5ms in B) with *only* the annihilation selectively removed?

      We are grateful for the reviewer’s comments and the specific suggestion for improvement (’...can the authors show that it is specifically the ephaptic effects of AP annihilation, rather than other ephaptic effects...’). For illustrating the importance of annihilation, we added the results of our calculation when no annihilation occurs, i.e. for propagating APs in the source neuron (Figure 5A) and we modified the geometry of the source neuron in Figure 5B such that only the annihilation takes place. Together with the source neuron with similar properties to the Basket cell (Figure 5C), we now show the effect of annihilation and the effect of Basket cell specific geometry and physiology. We added and edited in the main text the following 4 sentences:

      ll 271: In our two models (TM and RTM), the modulation of not terminating but propagating APs along the source axon on the AP rate of the target cell is minute (Figure 5A). Note that this geometry does not correspond to the Purkinje cell-Basket cell connectivity. For annihilating APs at the axon terminal, with excitable segments up to the very end, our models reveal a moderate modulation, and only about half of what was reported for the Purkinje cell by Blot and Barbour (2014). This illustrates the importance of AP annihilation for ephaptic coupling (Figure 5B). We added and edited the figure legend:

      Figure 5. ... (A) excluding the annihilation of an AP at the source neuron, i.e. a propagating AP, causes only minute modulation of the predicted AP rate in the target neuron. Note that this example does not represent the Basket cell terminal with annihilating APs. (B) annihilation of an AP at the terminal of the source neuron, with all segments being excitable in our calculation, causes moderate modulation. (C) source neuron with similar properties to the Basket cell, i.e. a bouton and last segments non-excitable (corresponding to 15 µm with no switch from resting state to excited state), causes inhibition and rebound very similar to that described by Blot and Barbour (2014).

      In the discussion, we extended one sentence to refer to Figure 5:

      ll 346: This may cause synchronization of APs and our proposed model also can be used to study the observed phenomena of synchronization due to ephaptic coupling, even in the case of zero discharge (see Figure 4A, and local impact on the target, integrated on timescales >1 ms in Figure 5).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We sincerely appreciate the time and effort you and the reviewers have invested in evaluating our work.

      We are grateful for the reviewers' constructive criticism. Building on their feedback, we have made additions to the reviewed preprint. Specifically, we have added information to the supplementary materials to give additional context on the impact of the fixed experimental design on infants’ looking behavior. Further, we have adapted the text throughout the manuscript to incorporate a thorough discussion of the impact of the experimental design.

      We believe that these revisions and the inclusion of supplementary analyses provide a clearer understanding of our findings.